To set up PostgreSQL and MADlib with Anaconda Python on OSX, follow the super quick start. Otherwise, follow the regular guides for installing from binaries or compiling from source.
Please note that VMs with MADlib pre-installed are available from the Pivotal download site: the Greenplum database sandbox VM and the Pivotal HDB sandbox VM. For some users, these may be an alternative to following the installation steps described on this page.
Super Quick Start
To set up PostgreSQL + MADlib with Anaconda Python on OSX:
PYTHON=/Users/janedoe/anaconda/bin/python brew install postgresql --with-python
brew services start postgresql
# Set up database and roles
# Install the .dmg of latest madlib downloaded from MADlib website
/usr/local/madlib/bin/madpack -s madlib -p postgres install
Quick Start With Binaries
Prerequisites
Install and configure your database of choice. MADlib currently supports the following platforms:
- PostgreSQL
- Greenplum database
- Pivotal HDB/Apache HAWQ (incubating)
Postgres platform notes:
- Ensure that you install Postgres with the Python extension specified (i.e., --with-python), as described here in the PostgreSQL documentation. If not, you will see an error message like the one below when you try to install MADlib with madpack:
/usr/local/madlib/bin/madpack -s madlib -p postgres install
madpack.py : INFO : Detected PostgreSQL version 9.5.
madpack.py : INFO : *** Installing MADlib ***
madpack.py : INFO : MADlib tools version = 1.9.1 (//usr/local/madlib/Versions/1.9.1/bin/../madpack/madpack.py)
madpack.py : INFO : MADlib database version = None (host=localhost:5432, db=postgres, schema=madlib)
madpack.py : INFO : Testing PL/Python environment...
madpack.py : INFO : > Creating language PL/Python...
madpack.py : ERROR : SQL command failed:
SQL: CREATE LANGUAGE plpythonu;
ERROR: could not access file "$libdir/plpython2": No such file or directory
madpack.py : ERROR : Cannot create language plpythonu. Stopping installation...
madpack.py : ERROR : MADlib installation failed.
- Defining the environment variables listed under "Defining environment variables" at the end of this page can save you some typing.
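One way to catch the missing PL/Python library before running madpack is to look for it in the directory that PostgreSQL substitutes for $libdir, which `pg_config --pkglibdir` reports. The sketch below is an illustration only: the helper name `has_plpython2` is hypothetical, and it assumes the library file is named `plpython2` plus a platform-specific extension, as the error message above suggests.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a file named plpython2.<ext> exists
# in the directory passed as the first argument.
has_plpython2() {
    libdir=${1:-}
    for f in "$libdir"/plpython2.*; do
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Usage: only runs the check when pg_config is available on PATH.
if command -v pg_config >/dev/null 2>&1; then
    if has_plpython2 "$(pg_config --pkglibdir)"; then
        echo "PL/Python library found"
    else
        echo "PL/Python library not found; reinstall Postgres with Python support"
    fi
fi
```

A positive result only shows that the file is present; CREATE LANGUAGE can still fail for other reasons, such as a Python runtime that cannot be loaded.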
Installing MADlib
- Download the MADlib binary
- Postgres: Get either the OSX or Redhat/CentOS binary from the MADlib download page
- Greenplum database and HDB/HAWQ : Download the .gppkg binary from Pivotal Network
Install the package at the OS level.
- Postgres:
- on OSX double click the installer package
on Redhat / CentOS run the following as root:
yum install <madlib_package> --nogpgcheck
- Greenplum Database:
on Redhat / CentOS run the following as gpadmin:
gppkg -i <madlib_package>
- HDB/HAWQ:
on Redhat / CentOS run the following as gpadmin:
gppkg -i <madlib_package>
- Ensure that the environment is set up for your database deployment and that the database is up and running.
Ensure that psql, postgres, and pg_config are in your path
which psql
which postgres
which pg_config
Ensure that the database is started and running
psql -c 'select version()'
The above may need user/port/password setting depending on how the database has been configured.
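The PATH check can also be scripted so that it names exactly which tools are missing. This is a minimal sketch; the function name `missing_tools` is made up for illustration and is not part of MADlib or PostgreSQL.

```shell
#!/bin/sh
# Hypothetical helper: print each command from the argument list that is
# not found on PATH, one per line. Returns 1 if anything is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check the three tools named in this guide.
if missing=$(missing_tools psql postgres pg_config); then
    echo "psql, postgres, and pg_config are all on PATH"
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
fi
```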
Run the MADlib deployment utility to install MADlib into each database in which you want to use it:
Postgres:
/usr/local/madlib/bin/madpack -s madlib -p postgres install
The command above is sufficient if the environment variables are defined. Otherwise, use a fully specified connection string:
/usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install
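To make the bracketed connection string format concrete, the sketch below assembles a value for `-c` from its parts, omitting each optional piece when it is empty. The helper `madpack_conn` is hypothetical, and the password `secret` in the usage lines is a placeholder; the user `gpadmin` and database `madlibtest` are taken from elsewhere on this page.

```shell
#!/bin/sh
# Hypothetical helper: build [user[/password]@][host][:port][/database]
# from five positional arguments; pass "" to leave a part out.
madpack_conn() {
    user=${1:-}
    password=${2:-}
    host=${3:-}
    port=${4:-}
    database=${5:-}
    conn=""
    if [ -n "$user" ]; then
        conn="$user"
        # A password is only meaningful when a user is given.
        if [ -n "$password" ]; then
            conn="$conn/$password"
        fi
        conn="$conn@"
    fi
    conn="$conn$host"
    if [ -n "$port" ]; then
        conn="$conn:$port"
    fi
    if [ -n "$database" ]; then
        conn="$conn/$database"
    fi
    printf '%s\n' "$conn"
}

# Usage: print two example strings.
madpack_conn gpadmin secret localhost 5432 madlibtest
madpack_conn gpadmin "" localhost 5432 madlibtest
```

The result would then be passed as, for example, `madpack -s madlib -p postgres -c "$(madpack_conn gpadmin "" localhost 5432 madlibtest)" install`.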
Greenplum Database:
/usr/local/madlib/bin/madpack -p greenplum install
The above may need user/port/password setting depending on how the database has been configured.
HDB/HAWQ:
/usr/local/madlib/bin/madpack -p hawq install
The above may need user/port/password setting depending on how the database has been configured.
For more information on madpack:
/usr/local/madlib/bin/madpack --help
Help output for madpack is also attached to this wiki page for your reference.
After installation, gpadmin should grant all privileges on schema madlib to users who will be accessing MADlib functions. Otherwise, users will get "ERROR: permission denied for schema MADlib." See the PostgreSQL docs for information on schemas and privileges.
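As an illustration of the grant step, the sketch below prints one GRANT statement per user so the SQL can be reviewed before it is applied. The helper name `grant_madlib_sql` and the role names `analyst1` and `analyst2` are hypothetical, and the role names are assumed to be plain lowercase identifiers that need no quoting.

```shell
#!/bin/sh
# Hypothetical helper: print a GRANT statement on schema madlib for each
# role name given as an argument. Names are inserted as-is, so they must
# be simple identifiers.
grant_madlib_sql() {
    for role in "$@"; do
        printf 'GRANT ALL ON SCHEMA madlib TO %s;\n' "$role"
    done
}

# Usage: review the generated SQL.
grant_madlib_sql analyst1 analyst2

# To apply it, pipe the output to psql as a sufficiently privileged user
# (left commented out because it needs a running database):
# grant_madlib_sql analyst1 analyst2 | psql -d <database>
```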
Test your installation
Postgres:
/usr/local/madlib/bin/madpack -s madlib -p postgres install-check
Greenplum Database:
/usr/local/madlib/bin/madpack -p greenplum install-check
The above may need user/port/password setting depending on how the database has been configured.
HDB/HAWQ:
/usr/local/madlib/bin/madpack -p hawq install-check
The above may need user/port/password setting depending on how the database has been configured.
Installing from PGXN (PostgreSQL)
Prerequisites
Requirements for installing MADlib:
- gcc (For OSX, Clang will work for compiling the source, but not for documentation.)
- pgxn installed
- PostgreSQL (64-bit) 9.2+ with plpython support enabled. Note: plpython may not be enabled in Postgres by default.
Use the commands below to install and load the latest MADlib package uploaded to PGXN.
pgxn install madlib
pgxn load madlib
Compiling From Source
Prerequisites
Requirements for installing MADlib:
- gcc (For OSX, Clang will work for compiling the source, but not for documentation.)
- An installed version of HDB/HAWQ, Greenplum Database 4.2+ or PostgreSQL (64-bit) 9.2+ with plpython support enabled. Note: plpython may not be enabled in Postgres by default.
Installing MADlib
In the $MADLIB_ROOT directory (location of MADlib source) run the following commands:
mkdir build
cd build
cmake ..
make
Above, we built the executables in the build folder. This can, however, be any user-named folder (henceforth called $BUILD_ROOT).
Deploying MADlib
Deploy MADlib into the database with the MADlib package manager madpack, located under $BUILD_ROOT/src/bin.
To install:
$BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install
To make sure that the installation is successful:
$BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install-check
For more information on the usage of madpack:
$BUILD_ROOT/src/bin/madpack --help
Defining environment variables
The variables below will be automatically used by the madpack installer if no connection string is provided:
- User: PGUSER or USER (defaults to OS username)
- Password: PGPASSWORD (defaults to empty)
- Host: PGHOST (defaults to 'localhost')
- Database: PGDATABASE (defaults to OS username)
- Port: PGPORT (defaults to 5432)
An example of deploying MADlib using the environment variables:
export PGPORT=5430
export PGHOST=127.0.0.1
export PGDATABASE=madlibtest
$BUILD_ROOT/src/bin/madpack -p postgres install
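To preview which connection values apply when no connection string is given, the fallback order listed in this section can be mimicked with shell parameter defaults. This is only a sketch of the documented defaults, not madpack's actual code; the function name `resolve_madpack_env` is hypothetical, and the password is deliberately not printed.

```shell
#!/bin/sh
# Hypothetical helper: print the user, host, port, and database that the
# documented defaults would produce from the current environment.
resolve_madpack_env() {
    os_user=$(id -un)
    user=${PGUSER:-${USER:-$os_user}}   # PGUSER, then USER, then OS username
    host=${PGHOST:-localhost}
    port=${PGPORT:-5432}
    database=${PGDATABASE:-$os_user}    # defaults to the OS username
    printf 'user=%s host=%s port=%s database=%s\n' \
        "$user" "$host" "$port" "$database"
}

# Usage: show the values for the current shell environment.
resolve_madpack_env
```

Running it after the export lines in the example above would report port 5430, host 127.0.0.1, and database madlibtest.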