Versions Compared

Comment: [MADLIB-1399] Remove references to deprecated HAWQ/HDB

...

Some releases have release-specific variations of the installation procedure. These exceptions are listed at the bottom of this page in the section called Release Specific Installations.

Please note that VMs with MADlib pre-installed are available from the Pivotal download site: Greenplum database sandbox VM and Pivotal HDB sandbox VM. For some users, the VMs may be an alternative to following the installation steps described on this page.

MADlib works with Python 2.6 and 2.7. Currently, Python 3.x is not supported.

...

  • PostgreSQL
  • Greenplum database
  • Pivotal HDB/Apache HAWQ (incubating)

MADlib requires the GNU M4 Unix macro processor, which must be present for installation to succeed.
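A quick pre-flight check for M4 can be sketched as below. `require_tool` is a hypothetical helper name, not part of MADlib, and it only confirms that a program named `m4` is on the PATH; it does not verify that it is the GNU implementation.

```shell
# require_tool is a hypothetical helper, not part of MADlib.
# It succeeds if the named program is found on PATH and fails otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example: check for M4 before starting the installation.
require_tool m4 || echo "install GNU M4 before continuing"
```

The same helper could be reused for the psql, postgres, and pg_config checks described later on this page.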

...

  1. Download the MADlib binary

...

  • Postgres:
    • on OSX double click the installer package
    • on Redhat / CentOS run the following as root:

      Code Block
      yum install <madlib_package> --nogpgcheck

      or

      Code Block
      languagebash
      rpm -i <madlib_package>


  • Greenplum Database:
    • on Redhat / CentOS run the following as gpadmin:

      Code Block
      languagebash
      gppkg -i <madlib_package>
  • HDB/HAWQ:
    • on Redhat / CentOS run the following as gpadmin:

      Code Block
      languagebash
      gppkg -i <madlib_package>

If you are using an rpm package on a CentOS 5 system, please add the --nodeps flag to the rpm command.

  1. Ensure that the environment is set up for your database deployment and that the database is up and running.
    • Ensure that psql, postgres, and pg_config are in your path

      Code Block
      languagebash
      which psql
      which postgres 
      which pg_config


    • Ensure that the database is started and running

      Code Block
      languagebash
      psql -c 'select version()'

      The above may need user/port/password setting depending on how the database has been configured.

  2. Run the MADlib deployment utility to install MADlib into each database that you want to use it:

    • Postgres:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -s madlib -p postgres install

      if environment variables are defined. Otherwise use a fully defined connection string:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install


    • Greenplum Database:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -p greenplum install

      The above may need user/port/password setting depending on how the database has been configured.

    • HDB/HAWQ:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -p hawq install

      The above may need user/port/password setting depending on how the database has been configured.

    For more information on madpack:

    Code Block
    languagebash
    /usr/local/madlib/bin/madpack --help

    Help output for madpack is also attached to this wiki page for your reference.

    After installation gpadmin should grant all privileges on schema madlib to users who will be accessing MADlib functions.  Otherwise, users will get "ERROR: permission denied for schema MADlib."  Also, install checks (see next step below) will fail if CREATE TEMP TABLE privileges are not granted on the schema where MADlib is installed. See the PostgreSQL docs for information on schemas and privileges.


  3. Test your installation

    • Postgres:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -s madlib -p postgres install-check


    • Greenplum Database:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -p greenplum install-check

      The above may need user/port/password setting depending on how the database has been configured. 

      Please note that if the optimizer_control GUC is set to off in Greenplum, the install checks for the following MADlib functions will fail, and the functions themselves will not work: decision tree, random forest, LDA, k-Means, PMML export for decision tree, PMML export for random forest. This will be fixed in a future release (MADLIB-1109). The parameter optimizer_control controls whether the server configuration parameter optimizer can be changed. The parameter optimizer controls whether the GPORCA optimizer is enabled when running SQL queries.

    • HDB/HAWQ:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -p hawq install-check

      The above may need user/port/password setting depending on how the database has been configured.
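The install and install-check invocations in the steps above differ only in platform, action, and the optional connection string, so they can be generated by one small wrapper. This is a minimal sketch: `madpack_cmd` and the `MADPACK` variable are hypothetical names, the connection values in the last call are placeholders, and the function only prints the command line rather than running it. It always passes `-s madlib`, which the Postgres examples above use explicitly.

```shell
# madpack_cmd is a hypothetical wrapper that only prints a command line.
# Usage: madpack_cmd <platform> <action> [connection-string]
MADPACK=${MADPACK:-/usr/local/madlib/bin/madpack}

madpack_cmd() {
    platform=$1
    action=$2
    conn=${3:-}
    if [ -n "$conn" ]; then
        # Fully defined connection string: [user[/password]@][host][:port][/database]
        printf '%s -s madlib -p %s -c %s %s\n' "$MADPACK" "$platform" "$conn" "$action"
    else
        # Rely on environment variables for the connection settings.
        printf '%s -s madlib -p %s %s\n' "$MADPACK" "$platform" "$action"
    fi
}

madpack_cmd postgres install
madpack_cmd postgres install-check
madpack_cmd greenplum install gpadmin@localhost:5432/testdb
```

Printing first makes it easy to review the exact command (note the plain ASCII hyphen in `-p`) before pasting it into a shell.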

...

Installing from PGXN (PostgreSQL)

Prerequisites


Requirements for installing MADlib:

...

  • gcc and g++ (For OSX, Clang will work for compiling the source, but not for documentation.)
    • NOTE: On Ubuntu 16.04, you will need to use GCC 4 and not the default compiler (GCC 5) to run MADlib. Refer to MADLIB-1068 for details. The outstanding GCC 5 issue is being tracked under MADLIB-1145.
  • An installed version of HDB/HAWQ, Greenplum Database 4.2+ or PostgreSQL (64-bit) 9.2+ with plpython support enabled.  
    • NOTE: plpython may not be enabled in Postgres by default. 
  • MADlib works with Python 2.6 and 2.7.  Currently, Python 3.x is not supported.
  • cmake (the latest version of cmake might cause issues. Please try cmake 3.5.2 in case you get an error or a segmentation fault.)
    • NOTE: On CentOS 6 (and possibly other Linux variants), we have seen cmake fail with a segmentation fault when the greenplum_path.sh file was sourced before running cmake. If you encounter issues, you can use ldd on the cmake executable to check whether dynamic libraries are being picked up from the Greenplum installation directories. If they are, start a new shell session in which greenplum_path.sh has not been sourced and run cmake there. You can reference MADLIB-1093 for additional details.
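The ldd check mentioned in the cmake note can be sketched as a small filter. `libs_under_prefix` is a hypothetical helper name; it assumes the usual `name => /path (address)` layout of ldd output and simply prints the libraries whose resolved path starts with a given directory.

```shell
# libs_under_prefix is a hypothetical helper: it reads `ldd` output on stdin
# and prints the libraries whose resolved path starts with the given prefix.
libs_under_prefix() {
    prefix=$1
    # ldd lines look like: "libfoo.so.1 => /path/to/libfoo.so.1 (0x...)"
    awk -v p="$prefix" '$2 == "=>" && index($3, p) == 1 { print $1 " => " $3 }'
}

# Example (not run here; requires cmake on PATH and $GPHOME to be set):
# ldd "$(command -v cmake)" | libs_under_prefix "$GPHOME"
```

Any output from the example pipeline would suggest that cmake is loading libraries from the Greenplum installation, which is the situation the note describes.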

...

The procedure is exactly the same as described below for installation of MADlib on GPDB 4.3.10.

...


The procedure is exactly the same as described below for installation of MADlib on HDB/HAWQ 2.0.1.

10/19/16 - Installation of MADlib 1.9.1 on GPDB 4.3.10

This is an important note for installation of MADlib on GPDB 4.3.10.  It does not apply to any other releases.

1) Fix madpack install utility
* issue: After installing MADlib with gppkg, you must run the script 
fix_madpack.sh BEFORE running the madpack utility (see below).  The script is downloadable from the Pivotal Network.

2) install checks
* issue: some MADlib install checks may fail even though the MADlib install itself completed OK.

This is a poor customer experience that will be fixed in the next release. On the positive side, once the installation is done, MADlib should work OK.

------------------------------

More on fixing madpack from #1 above:

After installing MADlib with gppkg, you must run the script 
fix_madpack.sh BEFORE running the madpack utility.
The syntax for fix_madpack.sh is below.

This can be somewhat confusing because after gppkg
installation, you will get a message on the console
that says:

“Please run the following command to deploy MADlib
usage: madpack install [-s schema_name] -p hawq -c user@host:port/database
etc...”

So the correct order of operations is:

1. gppkg install of MADlib
2. run fix_madpack.sh
3. run madpack utility
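The order of operations above can be written down as a dry-run plan. This is only a sketch: `print_deploy_plan` is a hypothetical helper that prints the three steps and executes nothing, and the location of madpack under `<prefix>/bin` is an assumption based on the `/usr/local/madlib/bin/madpack` path used earlier on this page.

```shell
# print_deploy_plan is a hypothetical dry-run helper; it prints the three
# steps in order and executes nothing.
# Usage: print_deploy_plan <madlib_package> <madlib_install_path>
print_deploy_plan() {
    pkg=$1
    prefix=$2
    echo "1. gppkg -i $pkg"
    echo "2. fix_madpack.sh --prefix $prefix"
    # Assumption: madpack lives under <prefix>/bin.
    echo "3. $prefix/bin/madpack -p greenplum install"
}

print_deploy_plan '<madlib_package>' /usr/local/gpdb/madlib
```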

*****************************************************
COMMAND NAME: fix_madpack.sh
*****************************************************

Script to fix a MADlib installation issue on GPDB 4.3.10.

This script patches a line in madpack.py, the MADlib installation
script. A backup of the original file is created in the same folder as
madpack.py called 'madpack.py.orig'.  The script is downloadable from the Pivotal Network.

*****************************************************
SYNOPSIS
*****************************************************

fix_madpack.sh [--prefix <MADLIB_INSTALL_PATH>]

fix_madpack.sh -h

...

madpack install utility

The following tasks should be performed prior to executing this script:

* Set $GPHOME to the correct GPDB installation directory containing MADlib
OR
* Set MADlib installation path using the --prefix option

...

--prefix <MADLIB_INSTALL_PATH>
Optional. Expected MADlib installation path. If not set, the default value
${GPHOME}/madlib is used.

-h | -? | --help
Displays the online help.

...

/home/gpadmin/madlib/fix_madpack.sh --prefix /usr/local/gpdb/madlib
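Because the script leaves a backup named madpack.py.orig next to madpack.py, the presence of that file can serve as a rough indication that the fix has already been applied. The sketch below assumes exactly that; `has_madpack_backup` is a hypothetical helper, and since the exact location of madpack.py under the MADlib installation path is not stated here, it searches the whole directory tree.

```shell
# has_madpack_backup is a hypothetical helper. It succeeds if a file named
# madpack.py.orig exists anywhere under the given MADlib installation path.
has_madpack_backup() {
    [ -n "$(find "$1" -name madpack.py.orig -type f 2>/dev/null | head -n 1)" ]
}

# Example: decide whether fix_madpack.sh still needs to be run.
# The fallback path is a placeholder taken from the example above.
madlib_path="${GPHOME:-/usr/local/gpdb}/madlib"
if has_madpack_backup "$madlib_path"; then
    echo "backup found under $madlib_path; fix_madpack.sh appears to have been run"
else
    echo "no backup found under $madlib_path; run fix_madpack.sh before madpack"
fi
```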

10/7/16 - Installation of MADlib 1.9.1 on HDB/HAWQ 2.0.1

This is an important note for installation of MADlib on HDB/HAWQ 2.0.1.

1) gppkg
* issue: gppkg does not exit cleanly after installing MADlib, so you need to exit manually via Ctrl-Z; however, the MADlib install actually completed OK.

2) QuickLZ compression
* issue: After installing MADlib with gppkg, you must run the script 
remove_compression.sh BEFORE running the madpack utility (see below).  The script is downloadable from the Pivotal Network.

3) install checks
* issue: some MADlib install checks may fail even though the MADlib install itself completed OK.

...

------------------------------

More on removing compression from #2 above:

After installing MADlib with gppkg, you must run the script 
remove_compression.sh BEFORE running the madpack utility.
The syntax for remove_compression.sh is below.

This can be somewhat confusing because after gppkg
installation, you will get a message on the console
that says:

...

1. gppkg install of MADlib
2. run remove_compression.sh
3. run madpack utility

*****************************************************
COMMAND NAME: remove_compression.sh
*****************************************************

MADlib install script for HDB/HAWQ 2.0.1+ to remove 'QUICKLZ' 
compression. Works on the current MADlib installation 
(but not all versions of MADlib in the case that multiple 
versions are installed).  The script is downloadable from the Pivotal Network.

*****************************************************
SYNOPSIS
*****************************************************

remove_compression.sh [--prefix <MADLIB_INSTALL_PATH>]

remove_compression.sh -h


*****************************************************
PREREQUISITES
*****************************************************

...

* Set $GPHOME to the correct HAWQ installation directory containing MADlib
OR
* Set MADlib installation path using the --prefix option

...


*****************************************************
EXAMPLE
*****************************************************

/home/gpadmin/madlib/remove_compression.sh --prefix /usr/local/hdb/madlib