A quick start guide for installing MADlib® from binaries or compiling it from source.
Quick Start With Binaries
Prerequisites
Install and configure your database of choice. MADlib currently supports the following platforms:
- PostgreSQL
- Greenplum database
- Apache HAWQ (incubating)
This guide describes the installation steps for Postgres and Greenplum. (HAWQ installation steps will be added at a later date.)
Postgres platform notes:
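- Ensure that your Postgres installation includes PL/Python (plpython) support. If you build Postgres from source, configure it with the --with-python option.
- Defining the connection environment variables (see Defining environment variables) can save you some typing.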
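One way to confirm that a Postgres build includes PL/Python is to look for its shared library in the directory reported by pg_config --pkglibdir. The helper below is a hypothetical sketch, not part of MADlib, and it assumes the library follows the usual plpython*.so naming. Finding the file only shows that Postgres was built with Python support, not that the language is enabled in a particular database.

```shell
# Hypothetical helper (not part of MADlib): succeed if the given Postgres
# library directory contains a PL/Python module such as plpython3.so.
# The plpython*.so naming pattern is an assumption about typical builds.
has_plpython() {
  local libdir="$1"
  local f
  for f in "$libdir"/plpython*.so; do
    # With no match the glob stays literal, so -e fails and nothing is found.
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}

# Usage (requires pg_config on your PATH):
#   has_plpython "$(pg_config --pkglibdir)" && echo "PL/Python module found"
```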
Installing MADlib
- Download the MADlib binary:
  - Postgres: get either the OS X or Red Hat/CentOS binary from the MADlib download page.
  - Pivotal Greenplum Database: download the .gppkg binary from Pivotal Network.
- Install the package at the OS level:
  - Postgres:
    - On OS X, double-click the installer package.
    - On Red Hat/CentOS, run the following as root:
      yum install <madlib_package> --nogpgcheck
  - Pivotal Greenplum Database: on Red Hat/CentOS, run the following as gpadmin:
    gppkg install <madlib_package>
- Postgres: ensure that the environment is set up for your database deployment and that the database is up and running.
  Ensure that psql, postgres, and pg_config are on your PATH:
  which psql
  which postgres
  which pg_config
  Ensure that the database server is running:
  psql -c 'select version()'
  The command above may need user, port, or password settings (for example, psql's -U and -p options), depending on how the database has been configured.
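The PATH checks above can also be scripted. This sketch relies on the POSIX command -v builtin; check_on_path is a hypothetical helper name, not part of MADlib or Postgres.

```shell
# Hypothetical helper: print each command that is not found on PATH and
# return non-zero if any are missing. Uses the POSIX "command -v" builtin.
check_on_path() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage for the Postgres tools listed above:
#   check_on_path psql postgres pg_config || echo "fix PATH before running madpack"
```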
Run the MADlib deployment utility, madpack, to install MADlib into each database in which you want to use it:
Postgres:
/usr/local/madlib/bin/madpack -s madlib -p postgres install
The command above works if the connection environment variables are defined. Otherwise, use a fully specified connection string:
/usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install
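madpack's -c argument packs all connection details into a single string. The hypothetical helper below assembles it from separate parts and leaves out empty ones, mirroring the optional brackets in the format. It is only a sketch: it does not escape characters such as @ or / inside a password.

```shell
# Hypothetical helper: build a madpack connection string of the form
#   [user[/password]@][host][:port][/database]
# Empty arguments are left out; a password without a user is ignored.
# Special characters are NOT escaped.
madpack_conn_str() {
  local user="$1" password="$2" host="$3" port="$4" database="$5"
  local s=""
  if [ -n "$user" ]; then
    s="$user"
    if [ -n "$password" ]; then
      s="$s/$password"
    fi
    s="$s@"
  fi
  s="$s$host"
  if [ -n "$port" ]; then
    s="$s:$port"
  fi
  if [ -n "$database" ]; then
    s="$s/$database"
  fi
  printf '%s\n' "$s"
}

# Usage:
#   /usr/local/madlib/bin/madpack -s madlib -p postgres \
#     -c "$(madpack_conn_str alice '' localhost 5432 madlibtest)" install
```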
Pivotal Greenplum Database:
/usr/local/madlib/bin/madpack -p greenplum install
The command above may need user, port, or password settings, depending on how the database has been configured.
For more information on madpack:
/usr/local/madlib/bin/madpack --help
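To install MADlib into more than one database, the install command can simply be repeated per database. This sketch assumes the default binary location used above and a connection prefix without a database part; install_into_dbs and the database names are hypothetical.

```shell
# Sketch: install MADlib into several databases, one madpack call each.
# MADPACK defaults to the binary location used above (an assumption about
# your install); install_into_dbs and the database names are hypothetical.
MADPACK="${MADPACK:-/usr/local/madlib/bin/madpack}"

install_into_dbs() {
  local conn_prefix="$1"   # connection string without /database, e.g. alice@localhost:5432
  shift
  local db
  for db in "$@"; do
    echo "Installing MADlib into $db"
    # Stop at the first database whose install fails.
    "$MADPACK" -s madlib -p postgres -c "$conn_prefix/$db" install || return 1
  done
}

# Usage:
#   install_into_dbs "alice@localhost:5432" analytics staging
```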
Test your installation
Postgres:
/usr/local/madlib/bin/madpack -s madlib -p postgres install-check
Pivotal Greenplum Database:
/usr/local/madlib/bin/madpack -p greenplum install-check
The commands above may need user, port, or password settings, depending on how the database has been configured.
Compiling From Source
Prerequisites
Requirements for compiling and installing MADlib:
- gcc (on OS X, Clang works for compiling the source but not for building the documentation)
- CMake (invoked in the build steps below)
- An installed version of HAWQ, Greenplum Database 4.2+, or PostgreSQL (64-bit) 9.2+ with PL/Python (plpython) support enabled. Note: plpython may not be enabled in Postgres by default.
Installing MADlib
In the $MADLIB_ROOT directory (the location of the MADlib source), run the following commands:
mkdir build
cd build
cmake ..
make
Above, we built the executables in the build folder. This can, however, be any user-named folder (henceforth called $BUILD_ROOT).
Deploying MADlib
Deploy MADlib into the database with the MADlib package manager, madpack, located under $BUILD_ROOT/src/bin.
To install:
$BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install
To make sure that the installation is successful:
$BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install-check
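The install and install-check steps can be chained so that the check only runs after a successful install. A minimal sketch, assuming BUILD_ROOT is set as described above; deploy_and_check is a hypothetical name, and its final exit status is simply whatever madpack install-check returns.

```shell
# Sketch: deploy from the build tree, then run install-check only if the
# install step succeeded. BUILD_ROOT is assumed to be set; MADPACK may
# override the madpack path. deploy_and_check is a hypothetical name.
deploy_and_check() {
  local conn="$1"   # [user[/password]@][host][:port][/database]
  local madpack="${MADPACK:-$BUILD_ROOT/src/bin/madpack}"
  "$madpack" -p postgres -c "$conn" install || return 1
  # The function's exit status is whatever install-check returns.
  "$madpack" -p postgres -c "$conn" install-check
}

# Usage:
#   deploy_and_check "alice@localhost:5432/madlibtest"
```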
For more information on the usage of madpack:
$BUILD_ROOT/src/bin/madpack --help
Defining environment variables
The variables below will be used automatically by the madpack installer if no connection string is provided:
- User: PGUSER or USER (defaults to the OS username)
- Password: PGPASSWORD (defaults to empty)
- Host: PGHOST (defaults to 'localhost')
- Database: PGDATABASE (defaults to the OS username)
- Port: PGPORT (defaults to 5432)
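To see which values madpack would pick up from the environment, the list above can be mirrored in a small shell function. This is a sketch that assumes the lookup order and defaults exactly as listed (madpack's own logic may differ in detail); effective_madpack_settings is a hypothetical name, and the password is deliberately not printed.

```shell
# Sketch: print the connection settings madpack is expected to use when no
# -c string is given, following the variable list and defaults above.
# (madpack's own lookup may differ in detail; the password is not printed.)
effective_madpack_settings() {
  local os_user
  os_user="$(id -un)"
  echo "user=${PGUSER:-${USER:-$os_user}}"
  echo "host=${PGHOST:-localhost}"
  echo "port=${PGPORT:-5432}"
  echo "database=${PGDATABASE:-$os_user}"
}

# Usage:
#   export PGPORT=5430 PGDATABASE=madlibtest
#   effective_madpack_settings
```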
An example of deploying MADlib using the environment variables:
export PGPORT=5430
export PGHOST=127.0.0.1
export PGDATABASE=madlibtest
$BUILD_ROOT/src/bin/madpack -p postgres install