Tips for managing postgres/gpdb database connections

ipython-sql

An easy way to connect to a SQL database from Jupyter is with the ipython-sql package.

Installation

pip install --user ipython-sql==0.3.9

Setup

Including this at the top of your notebook allows you to manage all of your db connections in a single file in your home directory:

homedir = %env HOME

%config SqlMagic.dsn_filename='$homedir/.odbc.ini'
%reload_ext sql
%config SqlMagic.autopandas=True
%config SqlMagic.short_errors=False
#%config SqlMagic.column_local_vars=True

If you share the notebook with others, they can use their own config file, and not have to modify anything in the notebook.

Sample .odbc.ini

[ODBC Data Sources]
tunnel = "ssh tunnel on Port 80000 to remote pg or gpdb server"

local_madlib = "local gpdb database: madlib schema"
local_madlib_pg = "local pg database: madlib schema"
local_mnist = "local gpdb database: mnist schema"


ngd_places = "gpdb database on NGD ARM64 processors: places schema"

guest6_aa = "gpdb database on guest-six Centos6 cluster"
guest7_aa = "gpdb database on guest-seven Centos7 cluster"

tunnel = "connect through an ssh tunnel set up on local port 8000"

[ODBC]
Threading = 1

[ngd_places]
DriverName=postgresql
Host=node1
Port=5432
Username=ngd
Database=places

[local_madlib]
DriverName=postgresql
Host=localhost
Port=6000
Username=dominovaldano
Database=madlib

[local_madlib_pg]
DriverName=postgresql
Host=localhost
Port=5432
Username=gpadmin
Database=madlib

[local_mnist]
DriverName=postgresql
Host=localhost
Port=6000
Username=dominovaldano
Database=mnist

[tunnel]
DriverName=postgresql
Host=localhost
Port=8000
Username=gpadmin
Database=madlib

[aa_guest7]
DriverName=postgresql
Host=10.0.4.5
Port=5432
Username=guest
Database=datascience
Password=XXXXXX

[aa_guest6]
DriverName=postgresql
Host=10.0.4.2
Port=5432
Username=guest
Database=datascience
Password=XXXXXX

Usage

%%sql [aa_guest7]
SELECT * FROM my_table;

Reading and writing database tables from Pandas

It is useful to be able to read from a database table into a Pandas dataframe, and to write Pandas dataframe to a database table.  From the ipython-sql site:

Read:

In [3]: result = %sql SELECT * FROM character WHERE speechcount > 25
In [4]: dataframe = result.DataFrame()

Write:

In [5]: %sql --persist dataframe
In [6]: %sql SELECT * FROM dataframe;



  • No labels