This guide explains all of the elements needed to successfully develop and plug in a new MADlib module.

...

Prerequisites

Install MADlib by following the steps in the Installation Guide for MADlib.

MADlib source code is organized such that the core logic of a machine learning or statistical module is located in a common location, and the database-port specific code is located in a ports folder.  Since all currently supported databases are based on Postgres, the postgres port contains all the port-specific files, with greenplum and hawq inheriting from it.  Before proceeding with this guide, it is recommended that you familiarize yourself with the MADlib architecture.

...

Let's add a new module called hello_world. Inside this module we implement a User-Defined SQL Aggregate (UDA) called avg_var, which computes the mean and variance for a given numerical column of a table. We'll implement a distributed version of Welford's online algorithm for computing the mean and variance.
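
To make the idea concrete before touching any MADlib code, here is a minimal standalone sketch of Welford's update together with a merge step for partial results. It is not the module's actual implementation: the names WelfordState, transition, merge and finalize are hypothetical, the state layout (count, mean, M2) is an assumption, and the choice of the sample variance (dividing by n - 1) is also an assumption.

```cpp
#include <cassert>

// Hypothetical running state (assumed layout, not MADlib's):
// row count, running mean, and M2 = sum of squared deviations from the mean.
struct WelfordState {
    double n = 0.0;
    double mean = 0.0;
    double m2 = 0.0;
};

// Transition step: fold one value into the state (Welford's update).
inline WelfordState transition(WelfordState s, double x) {
    s.n += 1.0;
    const double delta = x - s.mean;   // deviation from the old mean
    s.mean += delta / s.n;
    s.m2 += delta * (x - s.mean);      // uses old and new mean
    return s;
}

// Merge step: combine two partial states computed on disjoint row sets
// (the pairwise update commonly attributed to Chan et al.).
inline WelfordState merge(const WelfordState& a, const WelfordState& b) {
    if (a.n == 0.0) return b;
    if (b.n == 0.0) return a;
    WelfordState out;
    out.n = a.n + b.n;
    const double delta = b.mean - a.mean;
    out.mean = a.mean + delta * (b.n / out.n);
    out.m2 = a.m2 + b.m2 + delta * delta * (a.n * b.n / out.n);
    return out;
}

struct MeanVar {
    double mean;
    double var;
};

// Final step: turn the state into (mean, sample variance).
// Assumption: variance is reported as 0 when fewer than two rows were seen.
inline MeanVar finalize(const WelfordState& s) {
    assert(s.n >= 0.0);
    const double var = (s.n > 1.0) ? s.m2 / (s.n - 1.0) : 0.0;
    return {s.mean, var};
}
```

The merge step is what makes the algorithm distributable: each data segment can run the transition step over its own rows, and the partial states can then be combined in any order before the final step.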

Unlike an ordinary UDA in PostgreSQL, avg_var will also work on a distributed database and take advantage of the underlying distributed network for parallel computations. The usage of avg_var is very simple; users simply run the following command in psql:

...

Below are the main steps we will go through in this guide:

  1. Register the module.
  2. Define the SQL functions.
  3. Implement the functions in C++.
  4. Register the C++ header files.

...

Add the following line to the file called Modules.yml under ./src/config/

Code Block
languagetext
- name: hello_world

...

Code Block
languagesql
DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.avg_var(DOUBLE PRECISION);

CREATE AGGREGATE MADLIB_SCHEMA.avg_var(DOUBLE PRECISION) (
    SFUNC=MADLIB_SCHEMA.avg_var_transition,
    STYPE=double precision[],
    FINALFUNC=MADLIB_SCHEMA.avg_var_final,
    m4_ifdef(`__POSTGRESQL__', `', `PREFUNC=MADLIB_SCHEMA.avg_var_merge_states,')
    INITCOND='{0, 0, 0}'
);
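
As a rough mental model of how the database uses these parameters, the following standalone C++ sketch simulates the call order: every segment starts from INITCOND and applies SFUNC to each of its rows, the per-segment states are combined with PREFUNC, and FINALFUNC runs once on the combined state. This is a simplification under stated assumptions, not how the database engine is implemented. The name run_aggregate and the three callbacks are hypothetical, and the callbacks use a deliberately simple count/sum/sum-of-squares state rather than the Welford state, to keep the focus on the call order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors STYPE=double precision[] with INITCOND='{0, 0, 0}'.
using State = std::array<double, 3>;

// Hypothetical driver imitating the order in which the callbacks are invoked.
template <class SFunc, class PreFunc, class FinalFunc>
auto run_aggregate(const std::vector<std::vector<double>>& segments,
                   State initcond, SFunc sfunc, PreFunc prefunc,
                   FinalFunc finalfunc) {
    State total = initcond;
    for (const auto& rows : segments) {
        State local = initcond;            // each segment starts from INITCOND
        for (double x : rows) {
            local = sfunc(local, x);       // SFUNC: once per row
        }
        total = prefunc(total, local);     // PREFUNC: merge segment states
    }
    return finalfunc(total);               // FINALFUNC: once, at the end
}

// Illustrative callbacks; state is {count, sum, sum of squares}.
inline State sum_transition(State s, double x) {
    s[0] += 1.0;
    s[1] += x;
    s[2] += x * x;
    return s;
}

inline State sum_merge(State a, const State& b) {
    for (std::size_t i = 0; i < a.size(); ++i) a[i] += b[i];
    return a;
}

// Returns the mean; assumption: an empty input yields 0.
inline double mean_final(const State& s) {
    assert(s[0] >= 0.0);
    return s[0] > 0.0 ? s[1] / s[0] : 0.0;
}
```

On a single-node PostgreSQL build the m4_ifdef guard above leaves PREFUNC out, which corresponds to calling the driver with exactly one segment: only the transition and final functions do real work.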
 

We also define parameters passed to CREATE AGGREGATE:

...