THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
Tips for managing postgres/gpdb database connections
ipython-sql
An easy way to connect to a SQL database from Jupyter is with the ipython-sql package.
Installation
pip install --user ipython-sql==0.3.9
Setup
Including this at the top of your notebook allows you to manage all of your db connections in a single file in your home directory:
homedir = %env HOME
%config SqlMagic.dsn_filename='$homedir/.odbc.ini'
%reload_ext sql
%config SqlMagic.autopandas=True
%config SqlMagic.short_errors=False
#%config SqlMagic.column_local_vars=True
If you share the notebook with others, they can use their own config file, and not have to modify anything in the notebook.
Sample .odbc.ini
[ODBC Data Sources]
tunnel = "ssh tunnel on Port 80000 to remote pg or gpdb server"
local_madlib = "local gpdb database: madlib schema"
local_madlib_pg = "local pg database: madlib schema"
local_mnist = "local gpdb database: mnist schema"
ngd_places = "gpdb database on NGD ARM64 processors: places schema"
guest6_aa = "gpdb database on guest-six Centos6 cluster"
guest7_aa = "gpdb database on guest-seven Centos7 cluster"
tunnel = "connect through an ssh tunnel set up on local port 8000"
[ODBC]
Threading = 1
[ngd_places]
DriverName=postgresql
Host=node1
Port=5432
Username=ngd
Database=places
[local_madlib]
DriverName=postgresql
Host=localhost
Port=6000
Username=dominovaldano
Database=madlib
[local_madlib_pg]
DriverName=postgresql
Host=localhost
Port=5432
Username=gpadmin
Database=madlib
[local_mnist]
DriverName=postgresql
Host=localhost
Port=6000
Username=dominovaldano
Database=mnist
[tunnel]
DriverName=postgresql
Host=localhost
Port=8000
Username=gpadmin
Database=madlib
[aa_guest7]
DriverName=postgresql
Host=10.0.4.5
Port=5432
Username=guest
Database=datascience
Password=XXXXXX
[aa_guest6]
DriverName=postgresql
Host=10.0.4.2
Port=5432
Username=guest
Database=datascience
Password=XXXXXX
Usage
%%sql [aa_guest7]
SELECT * FROM my_table;
Reading and writing database tables from Pandas
It is useful to be able to read from a database table into a Pandas dataframe, and to write Pandas dataframe to a database table. From the ipython-sql site:
Read:
In [3]: result = %sql SELECT * FROM character WHERE speechcount > 25 In [4]: dataframe = result.DataFrame()
Write:
In [5]: %sql --persist dataframe In [6]: %sql SELECT * FROM dataframe;