...

Sometimes there are release specific variations of the installation procedures.  These exceptions are listed at the bottom of this page in the section called Release Specific Installations.

MADlib works with Python 2.6 and 2.7. Currently, Python 3.x is not supported.
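As a rough illustration of the version constraint above, the following shell sketch classifies a version string as supported or not. The function name is_supported_python is hypothetical and is not part of MADlib.

```shell
# Return success (0) if the given version string is Python 2.6.x or 2.7.x,
# and failure (1) otherwise. The function name is illustrative only.
is_supported_python() {
    case "$1" in
        2.6|2.6.*|2.7|2.7.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use against the interpreter on PATH (Python 2 prints its
# version to stderr, hence the 2>&1):
# ver="$(python -V 2>&1 | awk '{print $2}')"
# is_supported_python "$ver" || echo "unsupported Python version: $ver"
```

The example call is left commented out because it depends on which interpreter, if any, is named python on the local system.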

Currently supported database versions: Please see this page for supported databases and OS.

The following python libraries are required for their associated modules:

Deep Learning: dill, grpcio==1.39.0, protobuf==3.17.3, hyperopt==0.2.5, tensorflow==1.14, scikit-learn==0.19

XGBoost: pandas, xgboost==0.82

KNN: scipy==1.2.1

Unit tests: pgsanity
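The pins above could be passed to pip roughly as follows. This is only a sketch: the function name and the module labels (deep-learning, xgboost, knn, unit-tests) are made up for illustration, and using pip as the installer is an assumption rather than something this page prescribes.

```shell
# Print the pip requirement specifiers listed above for one module group.
# The labels accepted here are hypothetical shorthand, not MADlib names.
madlib_py_requirements() {
    case "$1" in
        deep-learning)
            echo "dill grpcio==1.39.0 protobuf==3.17.3 hyperopt==0.2.5 tensorflow==1.14 scikit-learn==0.19" ;;
        xgboost)
            echo "pandas xgboost==0.82" ;;
        knn)
            echo "scipy==1.2.1" ;;
        unit-tests)
            echo "pgsanity" ;;
        *)
            echo "unknown module group: $1" >&2
            return 1 ;;
    esac
}

# Example (assumes pip belongs to the Python used by the database):
# pip install $(madlib_py_requirements xgboost)
```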

Super Quick Start

To set up PostgreSQL + MADlib with Anaconda Python on OSX:

...

  • PYTHON=/Users/janedoe/anaconda/bin/

...

  • python 
  •  Install Postgres with the Python extension specified (i.e., --with-python), as described here in the PostgreSQL documentation. Note that previously you could install Postgres with Python support using brew by running 'brew install postgresql --with-python', but passing the '--with-python' flag is no longer supported.
  •  Set up database and roles

  •  Install the .dmg of the latest MADlib, downloaded from the MADlib website https://madlib.

...

...

  •  /usr/local/madlib/bin/madpack -s madlib -p postgres install

Quick Start With Binaries

...

  • Ensure that you install Postgres with the Python extension specified (i.e., --with-python), as described here in the PostgreSQL documentation. If not, you will see an error message like the one below when you try to install MADlib with madpack:
Code Block
 /usr/local/madlib/bin/madpack -s madlib -p postgres install
madpack.py : INFO : Detected PostgreSQL version 9.5.
madpack.py : INFO : *** Installing MADlib ***
madpack.py : INFO : MADlib tools version = 1.9.1 (//usr/local/madlib/Versions/1.9.1/bin/../madpack/madpack.py)
madpack.py : INFO : MADlib database version = None (host=localhost:5432, db=postgres, schema=madlib)
madpack.py : INFO : Testing PL/Python environment...
madpack.py : INFO : > Creating language PL/Python...
madpack.py : ERROR : SQL command failed:
SQL: CREATE LANGUAGE plpythonu;
ERROR: could not access file "$libdir/plpython2": No such file or directory
madpack.py : ERROR : Cannot create language plpythonu. Please check if you
                have configured and installed portid (your platform) with
                `--with-python` option. Stopping installation...
madpack.py : ERROR : MADlib installation failed

...

  1. Download the MADlib binary
    • For Postgres: OS X and Linux binaries can be found on the MADlib download page
    • For Greenplum: Linux .gppkg binaries can be found on Pivotal Network in the "Greenplum Advanced Analytics Group"
      • NOTE: the above .gppkg binaries work for both open and closed source Greenplum and can be downloaded by anybody (after creating a Pivotal Network account)
  2. Install the package

...

    1. Postgres:
      • on OSX double click the installer package
      • on Redhat / CentOS run the following as root:

        Code Block
        yum install <madlib_package> --nogpgcheck

        or

        Code Block
        languagebash
        rpm -i <madlib_package>


    2. Greenplum:

      • on Redhat / CentOS run the following as gpadmin:

        Code Block
        languagebash
        gppkg -i <madlib_package>

...


    1. NOTE: if you are using an rpm package on a CentOS 5 system, please add the --nodeps flag to the rpm command.
  1. Ensure that the environment is set up for your database deployment and that the database is up and running.
    • Ensure that psql, postgres, and pg_config are in your path

      Code Block
      languagebash
      which psql
      which postgres 
      which pg_config


    • Ensure that the database is started and running

      Code Block
      languagebash
      psql -c 'select version()'

      The above may need user/port/password setting depending on how the database has been configured.

  2. Run the MADlib deployment utility to deploy MADlib into each database in which you want to use it:
    • Postgres:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -s madlib -p postgres install

      The command above works if the connection environment variables are defined. Otherwise, use a fully defined connection string:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install


    • Greenplum Database:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -p greenplum install

      The above may need user/port/password setting depending on how the database has been configured.

    For more information on madpack:

    Code Block
    languagebash
    /usr/local/madlib/bin/madpack --help

    Help output for madpack is also attached to this wiki page for your reference.


  3. After installation gpadmin should grant all privileges on schema madlib to users who will be accessing MADlib functions. Otherwise, users will get "ERROR: permission denied for schema MADlib."  Also, install checks (see next step below) will fail if CREATE TEMP TABLE privileges are not granted on the schema where MADlib is installed. See the PostgreSQL docs for information on schemas and privileges.

  4. Test your installation

    • Postgres:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -s madlib -p postgres install-check


    • Greenplum Database:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -p greenplum install-check

      The above may need user/port/password setting depending on how the database has been configured. 

      Please note that if the optimizer_control GUC is set to off in Greenplum, the following install checks will fail, and these MADlib functions will not work: decision tree, random forest, LDA, k-Means, PMML export for decision tree, PMML export for random forest. This will be fixed in a future release (MADLIB-1109). The parameter optimizer_control controls whether the server configuration parameter optimizer can be changed. The parameter optimizer controls whether the GPORCA optimizer is enabled when running SQL queries.
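The privilege step described earlier in this list (granting access on the madlib schema after installation) could be scripted along the following lines. This is a sketch under assumptions: madlib_grant_sql and the MADLIB_SCHEMA override are hypothetical names, the role names are placeholders, and the generated statements should be reviewed before being run by gpadmin or another suitably privileged role.

```shell
# Print one GRANT statement per role name passed as an argument.
# The schema defaults to "madlib"; set MADLIB_SCHEMA to override it.
# Both the function and the variable name are illustrative only.
madlib_grant_sql() {
    local schema="${MADLIB_SCHEMA:-madlib}"
    local role
    for role in "$@"; do
        printf 'GRANT ALL PRIVILEGES ON SCHEMA %s TO %s;\n' "$schema" "$role"
    done
}

# Example (analyst1 and analyst2 are placeholder role names):
# madlib_grant_sql analyst1 analyst2 | psql -d <database>
```

Printing the statements rather than executing them directly keeps the sketch free of connection details and lets the output be inspected first.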

...

  • gcc and g++ (for OSX, Clang will work for compiling the source, but not for documentation). Note: C++11 is not fully supported yet.
  • m4
  • patch
  • cmake
  • pgxn installed
  • PostgreSQL (64-bit) 9.2+ with plpython support enabled. Note: plpython may not be enabled in Postgres by default.


Use the commands below to install and load the latest MADlib package uploaded to PGXN.

Code Block
languagebash
pgxn install madlib
pgxn load madlib 

If you see the following error, it's likely that you are using parallel execution flags for make.

Code Block
languagebash
[ 86%] Performing build step for 'EP_boost'
Ignored: make
[ 86%] Performing install step for 'EP_boost'
Ignored: make
[ 86%] Completed 'EP_boost'
[ 86%] Built target EP_boost
make[1]: *** [all] Error 2
make: *** [all] Error 2
ERROR: command returned 2: make PG_CONFIG=/usr/local/pg10/bin/pg_config all

You can run this as a workaround:

Code Block
languagebash
MAKEFLAGS='-j1' pgxn install madlib
pgxn load madlib 

...

Or, if you want to use parallel execution, you can also install Boost 1.60 yourself, and tell cmake where to find it.

For example, on OSX that looks like this:


Code Block
languagebash
brew install boost@1.60
export BOOST_INCLUDEDIR=/usr/local/opt/boost@1.60/include/

Compiling From Source

Prerequisites

Requirements for installing MADlib:

  • gcc and g++
    • For OS X, Clang will work for compiling the source, but not for the documentation. To compile on newer versions of XCode we need to enable the CXX11 flag. Setting -DCXX11=1 during cmake will auto-download Boost 1.75.0 if Boost > 1.65.0 is not found on the system.
      Note: Setting -DCXX11=1 will enable C++11, which is not fully supported, i.e., MADlib compiles but some install-check/dev-check tests may fail.
    • NOTE: On Ubuntu 16.04, you will need to use GCC 4 and not the default compiler (GCC 5) to run MADlib. Refer to MADLIB-1068 for details. The outstanding GCC 5 issue is being tracked under MADLIB-1145.
  • An installed version of Greenplum Database 4.2+ or PostgreSQL (64-bit) 9.2+ with plpython support enabled.
    • NOTE: plpython may not be enabled in Postgres by default.
  • python 2.6 or 2.7
    • python 3.x is not currently supported by MADlib.
  • cmake
    • NOTE: the latest version of cmake might cause issues. Please try cmake 3.5.2 in case you get an error or a segmentation fault.
    • NOTE: On CentOS 6 (possibly other Linux variants), we have seen occasions where cmake will have issues running (seg fault) if the greenplum_path.sh file has been sourced prior to the cmake execution. If you encounter issues, you can use ldd on the cmake executable to confirm whether dynamic libraries are picked up from the Greenplum installation directories. If this is the case, start a new shell session in which greenplum_path.sh has not been sourced. You can reference MADLIB-1093 for additional details.
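A quick way to confirm that the tools listed above are on the PATH is sketched below. check_tools is a hypothetical helper, not something shipped with MADlib, and it only tests for the presence of each command, not its version.

```shell
# Report which of the given commands cannot be found on PATH.
# Prints one "missing: <name>" line per absent command and returns
# non-zero if at least one command is missing.
check_tools() {
    local status=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Example use before configuring the build:
# check_tools gcc g++ cmake python psql pg_config
```

Because presence alone does not guarantee compatibility, the version notes above (GCC 4 on Ubuntu 16.04, cmake 3.5.2 as a fallback) still need to be checked by hand.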

Installing MADlib

In the $MADLIB_ROOT directory (location of the MADlib source) run the following commands:

Code Block
languagebash
mkdir build 
cd build 
cmake .. 
make -j8 # if this causes issues, switch back to a plain `make`

Above, we built the executables in the build folder. This can, however, be any user-named folder (henceforth called $BUILD_ROOT).

...

Deploy MADlib into the database with MADlib package manager madpack located under $BUILD_ROOT/src/bin.

Run the MADlib deployment utility to install MADlib into each database in which you want to use it:

  • Postgres:

    Code Block
    languagebash
    $BUILD_ROOT/src/bin/madpack -s madlib -p postgres install

    The command above works if the connection environment variables are defined. Otherwise, use a fully defined connection string:

    Code Block
    languagebash
    $BUILD_ROOT/src/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install


  • Greenplum Database:

    Code Block
    languagebash
    $BUILD_ROOT/src/bin/madpack -p greenplum install

    The above may need user/port/password setting depending on how the database has been configured.

  • To make sure that the installation is successful:

    Code Block
    languagebash
    $BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install-check


  • For more information on the usage of madpack:

    Code Block
    languagebash
    $BUILD_ROOT/src/bin/madpack --help


Defining environment variables

The variables below will be automatically used by the madpack installer if no connection string is provided:

  1. User: PGUSER or USER (defaults to OS username)
  2. Password: PGPASSWORD (defaults to empty)
  3. Host: PGHOST (defaults to 'localhost')
  4. Database: PGDATABASE (defaults to OS username)
  5. Port: PGPORT (defaults to 5432)

An example of deploying MADlib using the environment variables:

Code Block
languagebash
export PGPORT=5430
export PGHOST=127.0.0.1
export PGDATABASE=madlibtest
$BUILD_ROOT/src/bin/madpack -p postgres install
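To make the defaults listed above concrete, the sketch below assembles an explicit connection string in the user[/password]@host:port/database form accepted by the -c option. It mirrors the documented defaults only; it is not madpack's own resolution code, and the function name madpack_conn_string is hypothetical.

```shell
# Build a connection string from the PG* environment variables,
# falling back to the defaults listed above. Illustrative only.
madpack_conn_string() {
    local os_user
    os_user="$(id -un)"                        # OS username
    local user="${PGUSER:-${USER:-$os_user}}"  # PGUSER, then USER
    local password="${PGPASSWORD:-}"           # empty by default
    local host="${PGHOST:-localhost}"
    local port="${PGPORT:-5432}"
    local database="${PGDATABASE:-$os_user}"

    # Only include the "/password" part when a password is set.
    local cred="$user"
    if [ -n "$password" ]; then
        cred="$user/$password"
    fi
    printf '%s@%s:%s/%s\n' "$cred" "$host" "$port" "$database"
}

# Example:
# $BUILD_ROOT/src/bin/madpack -p postgres -c "$(madpack_conn_string)" install
```

Passing the string explicitly can make it easier to see which server and database a deployment will target, at the cost of exposing any password on the command line.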

Defining GPDB variables

The variables below can be set in GPDB in case memory-related issues show up. Feel free to adjust them based on the specifics of the installed system.

Code Block
languagesql
set max_statement_mem='50GB';
set statement_mem='50GB';
set memory_spill_ratio=80;
set gp_resqueue_memory_policy=auto;
set work_mem='4GB';
set gp_vmem_protect_limit=20000;

Upgrading MADlib gppkg

...