...

To set up PostgreSQL and MADlib with Anaconda Python on OSX, follow the super quick start.  Otherwise, follow the regular guides for installing from binaries or compiling from source.

Developers may want to use the Docker image described in the Developer Guide.

Sometimes there are release-specific variations of the installation procedures. These exceptions are listed at the bottom of this page in the section called Release Specific Installations.

MADlib requires Python 2.7. Python 3.x is currently not supported.
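Before installing, it can help to confirm which interpreter your database will use. The following is a minimal sketch, not part of MADlib: the function names are hypothetical, and falling back to `python` when `PYTHON` is unset is an assumption.

```shell
# Sketch only: checks whether an interpreter reports Python 2.7.
# Falling back to "python" when PYTHON is unset is an assumption.

# Print "major.minor" for the interpreter given as $1 (works on Python 2 and 3).
python_major_minor() {
    "$1" -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Return 0 only for the version MADlib requires.
is_supported_python_version() {
    case "$1" in
        2.7) return 0 ;;
        *)   return 1 ;;
    esac
}

# Report whether $PYTHON (or "python") is usable for MADlib.
check_madlib_python() {
    version=$(python_major_minor "${PYTHON:-python}") || return 1
    if is_supported_python_version "$version"; then
        echo "Python $version: OK"
    else
        echo "Python $version found; MADlib requires 2.7" >&2
        return 1
    fi
}
```

Run `check_madlib_python` after setting `PYTHON` to the interpreter you intend to build against.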

Currently supported database versions: please see this page for the supported databases and operating systems.

...

The following python libraries are required for their associated modules:

Deep Learning: dill, grpcio==1.39.0, protobuf==3.17.3, hyperopt==0.2.5, tensorflow==1.14, scikit-learn==0.19

XGBoost: pandas, xgboost==0.82

KNN: scipy==1.2.1

Unit tests: pgsanity
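The pinned versions above can be kept in one place and handed to pip. This is a minimal sketch; the module keys (`deep_learning`, `xgboost`, `knn`, `unit_tests`) are hypothetical names chosen for this example, not MADlib identifiers.

```shell
# Sketch: print the Python packages listed above for one MADlib module.
# The module keys are hypothetical labels used only in this example.
madlib_python_deps() {
    case "$1" in
        deep_learning) echo "dill grpcio==1.39.0 protobuf==3.17.3 hyperopt==0.2.5 tensorflow==1.14 scikit-learn==0.19" ;;
        xgboost)       echo "pandas xgboost==0.82" ;;
        knn)           echo "scipy==1.2.1" ;;
        unit_tests)    echo "pgsanity" ;;
        *)             echo "unknown module: $1" >&2; return 1 ;;
    esac
}

# Example (left unquoted on purpose so each requirement becomes its own argument):
#   "${PYTHON:-python}" -m pip install $(madlib_python_deps xgboost)
```

Installing with `"$PYTHON" -m pip` rather than a bare `pip` helps ensure the packages land in the same interpreter the database uses.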

Super Quick Start

To set up PostgreSQL + MADlib with Anaconda Python on OSX: 

  • export PYTHON=/Users/janedoe/anaconda/bin/python (so that the PostgreSQL build uses the Anaconda interpreter for PL/Python)
  • Install Postgres with the Python extension enabled (i.e., --with-python), as described here in the PostgreSQL documentation. Note that you could previously install Postgres with Python support through Homebrew by running 'brew install postgresql --with-python', but the '--with-python' flag is no longer supported.

  •  brew services start postgresql (if Postgres was installed with Homebrew)
  •  Set up the database and roles

  •  Install the .dmg of the latest MADlib release, downloaded from the MADlib website: https://madlib.apache.org/download.html

  •  /usr/local/madlib/bin/madpack -s madlib -p postgres install
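The scriptable parts of the Super Quick Start can be sketched with a dry-run switch, so you can review the commands before running them. Assumptions: Postgres was installed with Homebrew, the MADlib .dmg placed madpack under /usr/local/madlib, and `PYTHON` was exported before Postgres was built. The database and role setup and the .dmg install remain manual; `run` and `madlib_quick_start` are hypothetical helper names.

```shell
# Sketch: print (DRY_RUN=1, the default) or execute (DRY_RUN=0) a command.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Scriptable Super Quick Start steps; database/role setup and the .dmg
# install are manual and deliberately not included here.
madlib_quick_start() {
    run brew services start postgresql
    run /usr/local/madlib/bin/madpack -s madlib -p postgres install
}
```

Call `madlib_quick_start` to see the commands, then `DRY_RUN=0 madlib_quick_start` to execute them.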

...

MADlib requires the GNU M4 macro processor, which must be present for the installation to succeed.
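A quick pre-flight check can catch a missing m4 (or other tools used later on this page) before installation fails. This is a minimal sketch with hypothetical function names; it assumes GNU m4 identifies itself with "GNU M4" in the first line of `m4 --version`.

```shell
# Sketch: fail early if any required tool is missing from PATH.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumes GNU m4 reports "GNU M4" on the first line of its --version output.
is_gnu_m4() {
    m4 --version 2>/dev/null | head -n 1 | grep -q 'GNU M4'
}
```

For example, `require_tools m4 psql postgres pg_config && is_gnu_m4` succeeds only when all four tools are on the PATH and m4 is the GNU implementation.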

If the environment variables listed below are defined, it can save you some typing.
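The exact list of variables is not reproduced in this section. As a sketch, assuming it includes the standard PostgreSQL (libpq) connection variables PGHOST, PGPORT, PGUSER and PGDATABASE, you can check which of them are defined in the current shell. PGPASSWORD is deliberately not printed.

```shell
# Sketch: report which standard libpq connection variables are set.
# That these are the variables this page lists is an assumption.
report_pg_env() {
    for var in PGHOST PGPORT PGUSER PGDATABASE; do
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            echo "$var=$value"
        else
            echo "$var is not set"
        fi
    done
}
```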

...

  1. Download the MADlib binary
    • For Postgres: OS X and Linux binaries can be found on the MADlib download page
    • For Greenplum: Linux .gppkg binaries can be found on Pivotal Network in the "Greenplum Advanced Analytics Group"
      • NOTE: the above .gppkg binaries work for both open and closed source Greenplum and can be downloaded by anybody (after creating a Pivotal Network account)
  2. Install the package.
    1. Postgres:
      • on OS X, double-click the installer package
      • on Red Hat / CentOS, run the following as root:

        Code Block
        languagebash
        yum install <madlib_package> --nogpgcheck

        or

        Code Block
        languagebash
        rpm -i <madlib_package>


    2. Greenplum:

      • on Red Hat / CentOS, run the following as gpadmin:

        Code Block
        languagebash
        gppkg -i <madlib_package>


    3. NOTE: if you are using an rpm package on a CentOS 5 system, add the --nodeps flag to the rpm command.
  3. Ensure that the environment is set up for your database deployment and that the database is up and running.
    • Ensure that psql, postgres, and pg_config are in your PATH

      Code Block
      languagebash
      which psql postgres pg_config


    • Ensure that the database has been started and is running

      Code Block
      languagebash
      psql -c 'select version()'

      The command above may need user, port, or password settings, depending on how the database has been configured.

  4. Run the MADlib deployment utility (madpack) to install MADlib into each database where you want to use it:
    • Postgres:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -s madlib -p postgres install

      This works if the environment variables are defined. Otherwise, use a fully defined connection string:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install


    • Greenplum Database:

      Code Block
      languagebash
      /usr/local/madlib/bin/madpack -p greenplum install

      The command above may need user, port, or password settings, depending on how the database has been configured.


  5. After installation, the installing user (gpadmin on Greenplum) should grant all privileges on schema madlib to users who will be accessing MADlib functions. Otherwise, users will get "ERROR: permission denied for schema MADlib." Also, install checks (see the next step) will fail if the user cannot create temporary tables (CREATE TEMP TABLE) in the database where MADlib is installed. See the PostgreSQL docs for information on schemas and privileges.

  6. Test your installation
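The connection string and permission steps above can be sketched as two small helpers. Both are hypothetical, not part of madpack: one assembles the documented `[user[/password]@][host][:port][/database]` form from optional parts, and the other prints GRANT statements for one user. The user and database names in the examples (`alice`, `testdb`) are placeholders, and the SQL assumes plain lower-case identifiers because names are not quoted.

```shell
# Sketch: build madpack's -c argument, [user[/password]@][host][:port][/database].
# Empty arguments are left out of the result.
madpack_conn_string() {
    user=${1:-} password=${2:-} host=${3:-} port=${4:-} database=${5:-}
    conn=
    if [ -n "$user" ]; then
        conn=$user
        if [ -n "$password" ]; then conn="$conn/$password"; fi
        conn="$conn@"
    fi
    if [ -n "$host" ]; then conn="$conn$host"; fi
    if [ -n "$port" ]; then conn="$conn:$port"; fi
    if [ -n "$database" ]; then conn="$conn/$database"; fi
    printf '%s\n' "$conn"
}

# Sketch: GRANT statements for user $1 on database $2 (schema madlib assumed).
madlib_grant_sql() {
    printf 'GRANT ALL PRIVILEGES ON SCHEMA madlib TO %s;\n' "$1"
    printf 'GRANT TEMPORARY ON DATABASE %s TO %s;\n' "$2" "$1"
}

# Examples:
#   /usr/local/madlib/bin/madpack -s madlib -p postgres \
#       -c "$(madpack_conn_string gpadmin '' localhost 5432 testdb)" install
#   madlib_grant_sql alice testdb | psql -d testdb
```

Passing a password on the command line exposes it in the process list, so relying on the environment or a password file may be preferable where possible.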

...

  • Postgres:

    Code Block
    languagebash
    $BUILD_ROOT/src/bin/madpack -s madlib -p postgres install

    This works if the environment variables are defined. Otherwise, use a fully defined connection string:

    Code Block
    languagebash
    $BUILD_ROOT/src/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install


  • Greenplum Database:

    Code Block
    languagebash
    $BUILD_ROOT/src/bin/madpack -p greenplum install

    The command above may need user, port, or password settings, depending on how the database has been configured.
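The source-build commands differ from the binary-install ones only in where madpack lives. As a sketch, assuming `BUILD_ROOT` is set only when you are working from a source build, the following hypothetical helpers pick the madpack path and print the matching install command for each platform, with the same flags as shown on this page.

```shell
# Sketch: use the build tree's madpack when BUILD_ROOT is set,
# otherwise the binary-install location.
madpack_path() {
    if [ -n "${BUILD_ROOT:-}" ]; then
        printf '%s\n' "$BUILD_ROOT/src/bin/madpack"
    else
        printf '%s\n' "/usr/local/madlib/bin/madpack"
    fi
}

# Print the install command for a platform, mirroring the commands on this page.
madpack_install_command() {
    case "$1" in
        postgres)  printf '%s -s madlib -p postgres install\n' "$(madpack_path)" ;;
        greenplum) printf '%s -p greenplum install\n' "$(madpack_path)" ;;
        *)         echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}
```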

...