sql_magic: Jupyter Magic to Write SQL for Apache Spark and Relational Databases

July 13, 2017 Chris Rawles

Data scientists love Jupyter Notebook, Python, and Pandas. And they also write SQL. I created sql_magic to facilitate writing SQL code from Jupyter Notebook to use with both Apache Spark (or Hive) and relational databases such as PostgreSQL, MySQL, Pivotal Greenplum and HDB, and others. The library supports SQLAlchemy connection objects, psycopg connection objects, SparkSession and SQLContext objects, and other connections types. The %%read_sql magic function returns results as a Pandas DataFrame for analysis and visualization.

%%read_sql df_result
SELECT {col_names}
FROM {table_name}
WHERE age < 10

The sql_magic library expands upon current libraries such as ipython-sql with the following features:

  • Support for both Apache Spark and relational database connections simultaneously
  • Asynchronous execution (useful for long queries)
  • Browser notifications for query completion
# installation
pip install sql_magic

Check out the GitHub repository for more information.


Links:

About the Author

Chris Rawles

Chris Rawles is a senior data scientist at Pivotal in New York, New York, where he works with customers across a variety of domains, building models to derive insight and business value from their data. He holds an MS and BA in geophysics from UW-Madison and UC Berkeley, respectively. During his time as a researcher, Chris focused his efforts on using machine-learning to enable research in seismology.

Previous
Pivotal Data Science Technical Workshop - Dallas PAIR
Pivotal Data Science Technical Workshop - Dallas PAIR

Next
Detecting Risky Assets in an Organization Using Time-Variant Graphical Model
Detecting Risky Assets in an Organization Using Time-Variant Graphical Model