
To set up PostgreSQL and MADlib with Anaconda Python on OSX, follow the super quick start.  Otherwise, follow the regular guides for installing from binaries or compiling from source.

Developers may want to use the Docker image described in the Developer Guide.

Some releases have release-specific variations of the installation procedure.  These exceptions are listed at the bottom of this page, in the section called Release Specific Installations.

Please note that VMs with MADlib pre-installed are available from the Pivotal download site:  the Greenplum database sandbox VM and the Pivotal HDB sandbox VM.  For some users, these VMs may be an alternative to following the installation steps described on this page.

MADlib works with Python 2.6 and 2.7.  Currently, Python 3.x is not supported.

Super Quick Start

To set up PostgreSQL + MADlib with Anaconda Python on OSX:

PYTHON=/Users/janedoe/anaconda/bin/python 
brew install postgresql --with-python
brew services start postgresql
# Set up database and roles
# Install the .dmg of the latest MADlib release, downloaded from the MADlib website: http://madlib.incubator.apache.org/download.html
/usr/local/madlib/bin/madpack -s madlib -p postgres install

Quick Start With Binaries

Prerequisites

Install and configure your database of choice. MADlib currently supports the following platforms:

  • PostgreSQL
  • Greenplum database
  • Pivotal HDB/Apache HAWQ (incubating)

Postgres platform notes:

  • Ensure that you install Postgres with the Python extension specified (i.e., --with-python), as described in the PostgreSQL documentation. If you do not, you will see an error message like the one below when you try to install MADlib with madpack:
 /usr/local/madlib/bin/madpack -s madlib -p postgres install
madpack.py : INFO : Detected PostgreSQL version 9.5.
madpack.py : INFO : *** Installing MADlib ***
madpack.py : INFO : MADlib tools version = 1.9.1 (//usr/local/madlib/Versions/1.9.1/bin/../madpack/madpack.py)
madpack.py : INFO : MADlib database version = None (host=localhost:5432, db=postgres, schema=madlib)
madpack.py : INFO : Testing PL/Python environment...
madpack.py : INFO : > Creating language PL/Python...
madpack.py : ERROR : SQL command failed:
SQL: CREATE LANGUAGE plpythonu;
ERROR: could not access file "$libdir/plpython2": No such file or directory
madpack.py : ERROR : Cannot create language plpythonu. Stopping installation...
madpack.py : ERROR : MADlib installation failed.
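
One way to catch this problem before running madpack is to look for the PL/Python 2 library in the directory that $libdir refers to, which pg_config --pkglibdir reports. The sketch below assumes the library file is named plpython2 followed by a platform-specific extension, as the error message above suggests. The helper name has_plpython2 is hypothetical and is not part of MADlib or PostgreSQL.

```shell
#!/bin/sh
# Sketch: check whether a PL/Python 2 library exists in a PostgreSQL
# library directory. has_plpython2 is a hypothetical helper name.

has_plpython2() {
    # $1: directory to inspect, e.g. the output of `pg_config --pkglibdir`
    libdir=$1
    for candidate in "$libdir"/plpython2.*; do
        # If nothing matches, the pattern stays literal and the -e test fails.
        if [ -e "$candidate" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, has_plpython2 "$(pg_config --pkglibdir)" || echo "PL/Python 2 library not found" prints a warning when the library is missing, which suggests that Postgres was installed without --with-python.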

Installing MADlib

  1. Download the MADlib binary

Install the package at the OS level.

  • Postgres:
    • on OSX double click the installer package
    • on Redhat / CentOS run the following as root:

      yum install <madlib_package> --nogpgcheck
  • Greenplum Database:
    • on Redhat / CentOS run the following as gpadmin:

      gppkg -i <madlib_package>
  • HDB/HAWQ:
    • on Redhat / CentOS run the following as gpadmin:

      gppkg -i <madlib_package>
  1. Ensure that the environment is set up for your database deployment and that the database is up and running.
    • Ensure that psql, postgres, and pg_config are in your path

      which psql
      which postgres 
      which pg_config
    • Ensure that the database is started and running

      psql -c 'select version()'

      The command above may need user, port, or password settings, depending on how the database has been configured.

  2. Run the MADlib deployment utility to install MADlib into each database where you want to use it:

    • Postgres:

      /usr/local/madlib/bin/madpack -s madlib -p postgres install

      This form works if the connection environment variables are defined. Otherwise, use a full connection string:

      /usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install
    • Greenplum Database:

      /usr/local/madlib/bin/madpack -p greenplum install

      The command above may need user, port, or password settings, depending on how the database has been configured.

    • HDB/HAWQ:

      /usr/local/madlib/bin/madpack -p hawq install

      The command above may need user, port, or password settings, depending on how the database has been configured.

    For more information on madpack:

    /usr/local/madlib/bin/madpack --help

    Help output for madpack is also attached to this wiki page for your reference.

    After installation, gpadmin should grant all privileges on schema madlib to users who will be accessing MADlib functions.  Otherwise, users will get "ERROR: permission denied for schema MADlib."  Also, install checks (see the next step below) will fail if CREATE TEMP TABLE privileges are not granted on the schema where MADlib is installed. See the PostgreSQL docs for information on schemas and privileges.


  3. Test your installation

    • Postgres:

      /usr/local/madlib/bin/madpack -s madlib -p postgres install-check
    • Greenplum Database:

      /usr/local/madlib/bin/madpack -p greenplum install-check

      The command above may need user, port, or password settings, depending on how the database has been configured.

      Please note that if the optimizer_control GUC is set to off in Greenplum, this can cause certain install checks to fail.  This is because some install checks turn the optimizer on or off for efficiency reasons.  However, these failures do not mean there is any problem with the installation.

    • HDB/HAWQ:

      /usr/local/madlib/bin/madpack -p hawq install-check

      The command above may need user, port, or password settings, depending on how the database has been configured.
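
Two parts of the steps above lend themselves to small scripts: the PATH check in step 1 and the privilege grant that follows step 2. The sketch below is one possible way to write them. The function names missing_tools and madlib_grant_sql are hypothetical, and the grant helper assumes simple role and schema names that need no quoting.

```shell
#!/bin/sh
# Sketch: helpers for the PATH check and the schema grant described above.
# Both function names are hypothetical and not part of MADlib.

# Print the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Print one GRANT statement per role for the given schema.
# $1: schema name; remaining arguments: role names (assumed unquoted).
madlib_grant_sql() {
    schema=$1
    shift
    for role in "$@"; do
        printf 'GRANT ALL PRIVILEGES ON SCHEMA %s TO %s;\n' "$schema" "$role"
    done
}
```

For example, missing_tools psql postgres pg_config prints nothing when all three are on the PATH. The output of madlib_grant_sql madlib analyst1 analyst2 (hypothetical role names) can be reviewed and then piped to psql by a sufficiently privileged user.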

Installing from PGXN (PostgreSQL)

Prerequisites

Requirements for installing MADlib:

  • gcc (For OSX, Clang will work for compiling the source, but not for documentation.)
  • pgxn installed
  • PostgreSQL (64-bit) 9.2+ with plpython support enabled. Note: plpython may not be enabled in Postgres by default.

 

Use the commands below to install and load the latest MADlib package uploaded to PGXN.

pgxn install madlib
pgxn load madlib 

 

Compiling From Source

Prerequisites

Requirements for installing MADlib:

  • gcc (For OSX, Clang will work for compiling the source, but not for documentation.)
  • An installed version of HDB/HAWQ, Greenplum Database 4.2+ or PostgreSQL (64-bit) 9.2+ with plpython support enabled.  Note: plpython may not be enabled in Postgres by default.  
  • MADlib works with Python 2.6 and 2.7.  Currently, Python 3.x is not supported.

Installing MADlib

In the $MADLIB_ROOT directory (location of MADlib source) run the following commands:

mkdir build 
cd build 
cmake .. 
make

Above, we built the executables in the build folder. This can, however, be any user-named folder (henceforth called $BUILD_ROOT).

Deploying MADlib

Deploy MADlib into the database using the MADlib package manager, madpack, located under $BUILD_ROOT/src/bin.

  • To install:

    $BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install
  • To make sure that the installation is successful:

    $BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install-check
  • For more information on the usage of madpack:

    $BUILD_ROOT/src/bin/madpack --help

Defining environment variables

The variables below will be automatically used by the madpack installer if no connection string is provided:

  1. User: PGUSER or USER (defaults to OS username)
  2. Password: PGPASSWORD (defaults to empty)
  3. Host: PGHOST (defaults to 'localhost')
  4. Database: PGDATABASE (defaults to OS username)
  5. Port: PGPORT (defaults to 5432)

An example of deploying MADlib using the environment variables:

export PGPORT=5430
export PGHOST=127.0.0.1
export PGDATABASE=madlibtest
$BUILD_ROOT/src/bin/madpack -p postgres install
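
To make the relationship between these variables and the -c connection string concrete, the sketch below assembles a string of the form user[/password]@host:port/database from the same variables, using the defaults listed above. It only illustrates the mapping and is not madpack's own implementation; madpack_conn_string is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build a madpack-style connection string from the PG* variables,
# applying the defaults listed above. Hypothetical helper, not part of MADlib.

madpack_conn_string() {
    user=${PGUSER:-${USER:-}}          # PGUSER, else the OS username
    password=${PGPASSWORD:-}           # empty by default
    host=${PGHOST:-localhost}
    port=${PGPORT:-5432}
    database=${PGDATABASE:-${USER:-}}  # defaults to the OS username

    conn=$user
    # Only add the /password part when a password is set.
    if [ -n "$password" ]; then
        conn="$conn/$password"
    fi
    printf '%s@%s:%s/%s\n' "$conn" "$host" "$port" "$database"
}
```

The result can be passed explicitly, for example $BUILD_ROOT/src/bin/madpack -p postgres -c "$(madpack_conn_string)" install, which should be equivalent to relying on the environment variables directly if the defaults behave as documented.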

 

Release Specific Installations

Some releases have release-specific variations of the installation procedure.  These exceptions are listed in this section.

11/30/16 - Installation of MADlib 1.9.1 on GPDB 4.3.11

The procedure is exactly the same as described below for installation of MADlib on GPDB 4.3.10.

11/28/16 - Installation of MADlib 1.9.1 on HDB/HAWQ 2.1.0.0

The procedure is exactly the same as described below for installation of MADlib on HDB/HAWQ 2.0.1.

10/19/16 - Installation of MADlib 1.9.1 on GPDB 4.3.10

This is an important note for installation of MADlib on GPDB 4.3.10.  It does not apply to any other releases.

1) Fix madpack install utility
* issue: After installing MADlib with gppkg, you must run the script
fix_madpack.sh BEFORE running the madpack utility (see below).  The script is downloadable from the Pivotal Network.

2) install checks
* issue: some failures may happen on MADlib install checks, even though the MADlib install actually completed OK.

This is a poor customer experience that will be fixed in the next release. On the positive side, once the installation is done, MADlib should work OK.

------------------------------

More on fixing madpack from #1 above:

After installing MADlib with gppkg, you must run the script
fix_madpack.sh BEFORE running the madpack utility.
The syntax for fix_madpack.sh is below.

This can be somewhat confusing because after gppkg
installation, you will get a message on the console
that says:

“Please run the following command to deploy MADlib
usage: madpack install [-s schema_name] -p hawq -c user@host:port/database
etc...”

So the correct order of operations is:

1. gppkg install of MADlib
2. run fix_madpack.sh
3. run madpack utility
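
The three steps above could be chained in a small wrapper so that madpack never runs before the fix script. The sketch below uses only the commands named on this page (gppkg -i, fix_madpack.sh, and madpack -p greenplum install). The function names, the DRY_RUN variable, and the argument order are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Sketch: run the GPDB 4.3.10 installation steps in the required order.
# Function names and the DRY_RUN variable are hypothetical.

# Print the command when DRY_RUN is set; otherwise execute it.
run_step() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: MADlib gppkg package, $2: path to fix_madpack.sh, $3: path to madpack
install_madlib_gpdb_4310() {
    package=$1
    fix_script=$2
    madpack=$3
    # Each step runs only if the previous one succeeded.
    run_step gppkg -i "$package" &&
        run_step "$fix_script" &&
        run_step "$madpack" -p greenplum install
}
```

Setting DRY_RUN=1 before calling install_madlib_gpdb_4310 prints the three commands without executing them, which is a cheap way to confirm the paths and the ordering first.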

*****************************************************
COMMAND NAME: fix_madpack.sh
*****************************************************

Script to fix a MADlib installation issue on GPDB 4.3.10.

This script patches a line in madpack.py, the MADlib installation
script. A backup of the original file is created in the same folder as
madpack.py called 'madpack.py.orig'.  The script is downloadable from the Pivotal Network.

*****************************************************
SYNOPSIS
*****************************************************

fix_madpack.sh [--prefix <MADLIB_INSTALL_PATH>]

fix_madpack.sh -h


*****************************************************
PREREQUISITES
*****************************************************

The following tasks should be performed prior to executing this script:

* Set $GPHOME to the correct GPDB installation directory containing MADlib
OR
* Set MADlib installation path using the --prefix option


*****************************************************
OPTIONS
*****************************************************

--prefix <MADLIB_INSTALL_PATH>
Optional. Expected MADlib installation path. If not set, the default value
${GPHOME}/madlib is used.

-h | -? | --help
Displays the online help.


*****************************************************
EXAMPLE
*****************************************************

/home/gpadmin/madlib/fix_madpack.sh --prefix /usr/local/gpdb/madlib

10/7/16 - Installation of MADlib 1.9.1 on HDB/HAWQ 2.0.1

This is an important note for installation of MADlib on HDB/HAWQ 2.0.1.

1) gppkg
* issue: gppkg does not end cleanly after installing MADlib, so you need to exit manually via Ctrl-Z. However, the MADlib install actually completed OK.

2) QuickLZ compression
* issue: After installing MADlib with gppkg, you must run the script
remove_compression.sh BEFORE running the madpack utility (see below).  The script is downloadable from the Pivotal Network.

3) install checks
* issue: some failures may happen on MADlib install checks, even though the MADlib install actually completed OK.

This is a poor customer experience that will be fixed in the next release. On the positive side, once the installation is done, MADlib should work OK.

------------------------------

More on removing compression from #2 above:

After installing MADlib with gppkg, you must run the script
remove_compression.sh BEFORE running the madpack utility.
The syntax for remove_compression.sh is below.

This can be somewhat confusing because after gppkg
installation, you will get a message on the console
that says:

“Please run the following command to deploy MADlib
usage: madpack install [-s schema_name] -p hawq -c user@host:port/database
etc...”

So the correct order of operations is:

1. gppkg install of MADlib
2. run remove_compression.sh
3. run madpack utility


*****************************************************
COMMAND NAME: remove_compression.sh
*****************************************************

MADlib install script for HDB/HAWQ 2.0.1+ to remove 'QUICKLZ' 
compression. Works on the current MADlib installation 
(but not all versions of MADlib in the case that multiple 
versions are installed). The script is downloadable from the Pivotal Network.


*****************************************************
SYNOPSIS
*****************************************************

remove_compression.sh [--prefix <MADLIB_INSTALL_PATH>]

remove_compression.sh -h


*****************************************************
PREREQUISITES
*****************************************************

The following tasks should be performed prior to executing this script:

* Set $GPHOME to the correct HAWQ installation directory
OR
* Set MADlib installation path using the --prefix option


*****************************************************
OPTIONS
*****************************************************

--prefix <MADLIB_INSTALL_PATH>
Optional. Expected MADlib installation path. If not set, the default value
${GPHOME}/madlib is used.

-h | -? | --help
Displays the online help.


*****************************************************
EXAMPLE
*****************************************************

/home/gpadmin/madlib/remove_compression.sh --prefix /usr/local/hdb/madlib
