...

To set up PostgreSQL and MADlib with Anaconda Python on OSX, follow the super quick start.  Otherwise, follow the regular guides for installing from binaries or compiling from source.

Developers may want to use the Docker image described in the Developer Guide.

Sometimes there are release-specific variations of the installation procedures.  These exceptions are listed at the bottom of this page in the section called Release Specific Installations.

MADlib works with Python 2.6 and 2.7.  Currently, Python 3.x is not supported.

Currently supported database versions: Postgres 9 and 10, Greenplum 4.3 and 5.X.

Super Quick Start

To set up PostgreSQL + MADlib with Anaconda Python on OSX:

...

MADlib requires the GNU M4 Unix macro processor which must be present for installation to succeed.
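Whether M4 is available can be checked before starting the installation. The following is a minimal sketch, not part of the MADlib tooling; the helper name require_cmd is hypothetical.

```shell
# Return 0 if the named command is on PATH; otherwise print a message and return 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}

# m4 must be present before the MADlib installation is started.
if require_cmd m4; then
    echo "m4 found at: $(command -v m4)"
fi
```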

Defining the environment variables listed below can save you some typing.

...

  • Ensure that you install Postgres with the Python extension specified (i.e., --with-python), as described here in the PostgreSQL documentation. If not, you will see an error message like the one below when you try to install MADlib with madpack:
Code Block
 /usr/local/madlib/bin/madpack -s madlib -p postgres install
madpack.py : INFO : Detected PostgreSQL version 9.5.
madpack.py : INFO : *** Installing MADlib ***
madpack.py : INFO : MADlib tools version = 1.9.1 (//usr/local/madlib/Versions/1.9.1/bin/../madpack/madpack.py)
madpack.py : INFO : MADlib database version = None (host=localhost:5432, db=postgres, schema=madlib)
madpack.py : INFO : Testing PL/Python environment...
madpack.py : INFO : > Creating language PL/Python...
madpack.py : ERROR : SQL command failed:
SQL: CREATE LANGUAGE plpythonu;
ERROR: could not access file "$libdir/plpython2": No such file or directory
madpack.py : ERROR : Cannot create language plpythonu. Please check if you
                have configured and installed portid (your platform) with
                `--with-python` option. Stopping installation...
madpack.py : ERROR : MADlib installation failed
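One way to catch this before running madpack is to inspect the options PostgreSQL was built with, which pg_config --configure prints. The sketch below is an assumption-laden example: the helper name built_with_python is hypothetical, and on some packaged distributions PL/Python is shipped as a separate package, so a match here does not by itself guarantee that the plpython library is installed.

```shell
# Return 0 if the given configure-option string contains --with-python.
built_with_python() {
    case "$1" in
        *--with-python*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only run the check when pg_config is available on PATH.
if command -v pg_config >/dev/null 2>&1; then
    if built_with_python "$(pg_config --configure)"; then
        echo "PostgreSQL was configured with --with-python"
    else
        echo "PostgreSQL was NOT configured with --with-python" >&2
    fi
fi
```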

...

  1. Download the MADlib binary
    • For Postgres: use either the OSX or Redhat/CentOS Linux binary from the MADlib download page
    • For Greenplum Database: download the .gppkg binary from the Greenplum Advanced Analytics Group in Pivotal Network (NOTE: anyone can create an account on Pivotal Network and download the .gppkg)
  2. Install the package

...

    1. Postgres:
      • on OSX double click the installer package
      • on Redhat / CentOS run the following as root:

        Code Block
        yum install <madlib_package> --nogpgcheck

        or

        Code Block
        rpm -i <madlib_package>


    2. Greenplum:

...


      • on Redhat / CentOS run the following as gpadmin:

        Code Block
        gppkg -i <madlib_package>

...


    1. NOTE: if you are using an rpm package on a CentOS 5 system, add the --nodeps flag to the rpm command.
  1. Ensure that the environment is set up for your database deployment and that the database is up and running.
    • Ensure that psql, postgres, and pg_config are in your path

      Code Block
      which psql
      which postgres 
      which pg_config


    • Ensure that the database is started and running

      Code Block
      psql -c 'select version()'

      The above may need user/port/password setting depending on how the database has been configured.
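When the defaults do not match your deployment, the connection settings can be passed explicitly using psql's standard -h, -p, -U, and -d options. This is a minimal sketch; the function name check_db and the example values are hypothetical and should be replaced with your own.

```shell
# Run the version query against an explicitly specified server.
# Arguments: host, port, user, database.
check_db() {
    psql -h "$1" -p "$2" -U "$3" -d "$4" -c 'select version()'
}

# Usage against a running database (not executed here):
#   check_db localhost 5432 gpadmin postgres
```

If a password is required, psql prompts for it.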

  2. Run the MADlib madpack deployment utility to install MADlib into each database in which you want to use it:

    • Postgres:

      Code Block
      /usr/local/madlib/bin/madpack -s madlib -p postgres install

      Use the command above if the environment variables are defined. Otherwise, use a fully specified connection string:

      Code Block
      /usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install


    • Greenplum Database:

      Code Block
      /usr/local/madlib/bin/madpack -p greenplum install

      The above may need user/port/password setting depending on how the database has been configured.

    For more information on madpack:

    Code Block
    /usr/local/madlib/bin/madpack --help
    Help output for madpack is also attached to this wiki page for your reference.
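The connection string format [user[/password]@][host][:port][/database] used with the -c option can be assembled from its parts. The helper below is a hypothetical sketch, not part of madpack; it only builds the string, and all example values are placeholders.

```shell
# Build a connection string of the form
# [user[/password]@][host][:port][/database]; empty arguments are skipped.
# Arguments: user, password, host, port, database.
madpack_conn() {
    conn=""
    if [ -n "$1" ]; then
        conn="$1"
        if [ -n "$2" ]; then
            conn="$conn/$2"
        fi
        conn="$conn@"
    fi
    conn="$conn$3"
    if [ -n "$4" ]; then
        conn="$conn:$4"
    fi
    if [ -n "$5" ]; then
        conn="$conn/$5"
    fi
    printf '%s\n' "$conn"
}

# Usage (not executed here):
#   /usr/local/madlib/bin/madpack -s madlib -p postgres \
#       -c "$(madpack_conn gpadmin secret localhost 5432 testdb)" install
```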

  3. After installation, gpadmin should grant all privileges on schema madlib to users who will be accessing MADlib functions. Otherwise, users will get "ERROR: permission denied for schema MADlib."  Also, install checks (see the next step) will fail if CREATE TEMP TABLE privileges are not granted on the schema where MADlib is installed. See the PostgreSQL docs for information on schemas and privileges.
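The grants described in this step might look like the following sketch. It assumes MADlib was installed into the schema madlib, as in the commands above; the helper name madlib_grant_sql and the role and database names are hypothetical. Note that in PostgreSQL the TEMPORARY privilege is granted at the database level, which is why the second statement targets the database rather than the schema.

```shell
# Print the GRANT statements for one role.
# Arguments: role name, database name.
madlib_grant_sql() {
    printf 'GRANT ALL ON SCHEMA madlib TO "%s";\n' "$1"
    printf 'GRANT TEMPORARY ON DATABASE "%s" TO "%s";\n' "$2" "$1"
}

# Usage as a privileged user against a running database (not executed here):
#   madlib_grant_sql analyst testdb | psql -d testdb
```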

  4. Test your installation

    • Postgres:

      Code Block
      /usr/local/madlib/bin/madpack -s madlib -p postgres install-check


    • Greenplum Database:

      Code Block
      /usr/local/madlib/bin/madpack -p greenplum install-check

      The above may need user/port/password setting depending on how the database has been configured. 

      Please note that if the optimizer_control GUC is set to off in Greenplum, the following install checks will fail, and these MADlib functions will not work: decision tree, random forest, LDA, k-means, PMML export for decision tree, PMML export for random forest.  This will be fixed in a future release (MADLIB-1109).  The parameter optimizer_control controls whether the server configuration parameter optimizer can be changed. The parameter optimizer controls whether the GPORCA optimizer is enabled when running SQL queries.
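The current setting can be inspected before running install-check. The sketch below assumes the value can be read with SHOW optimizer_control and that it is reported as on or off; the helper name check_optimizer_control is hypothetical.

```shell
# Interpret the value of optimizer_control.
# Returns 1 and prints a warning when the value is "off".
check_optimizer_control() {
    case "$1" in
        off)
            echo "optimizer_control is off: some MADlib install checks are expected to fail (MADLIB-1109)" >&2
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}

# Usage against a running Greenplum database (not executed here):
#   check_optimizer_control "$(psql -At -c 'SHOW optimizer_control')"
```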

...