...

The files for the examples can be found in the hello world folder of the source code.

  1. Install MADlib
  2. Module Anatomy Explained
  3. Adding A New Module
  4. Adding An Iterative UDF

Installation

Follow the steps in the MADlib Installation Guide.

Module Anatomy

MADlib source code is organized so that the core logic of a machine learning or statistical module resides in a common location, while the database-port-specific code is located in a ports folder. Since all currently supported databases are based on Postgres, the postgres port contains all the port-specific files, and the greenplum and hawq ports inherit from it. Before proceeding with this guide, it is recommended that you familiarize yourself with the [[Module anatomy| MADlib Module Anatomy]].

Adding A New Module

Let us add a new module called hello_world. Inside this module, we implement a User-Defined SQL Aggregate (UDA) called avg_var, which computes the mean and variance of a given numerical column of a table. We'll implement a distributed version of Welford's online algorithm for computing the mean and variance.
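A distributed aggregate of this kind is typically built from three pieces: a transition function that folds one row into a per-segment state, a merge function that combines the states computed on two segments, and a final function that turns the state into the result. The Python below is a minimal sketch of that shape, not the avg_var implementation itself. The names `AvgVarState`, `transition`, `merge`, `final`, and `aggregate` are hypothetical; the merge step uses the standard pairwise combination formula for running means and sums of squared deviations (Chan et al.), and the final step assumes population variance (dividing by n). The actual module may use a different state layout or variance definition.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class AvgVarState:
    """Hypothetical per-segment aggregate state."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean


def transition(state: AvgVarState, x: float) -> AvgVarState:
    """Welford update: fold one value into the running state."""
    count = state.count + 1
    delta = x - state.mean
    mean = state.mean + delta / count
    m2 = state.m2 + delta * (x - mean)
    return AvgVarState(count, mean, m2)


def merge(a: AvgVarState, b: AvgVarState) -> AvgVarState:
    """Combine two partial states computed on different segments."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return AvgVarState(count, mean, m2)


def final(state: AvgVarState) -> Tuple[Optional[float], Optional[float]]:
    """Return (mean, population variance); (None, None) for empty input."""
    if state.count == 0:
        return (None, None)
    return (state.mean, state.m2 / state.count)


def aggregate(values: Iterable[float]) -> AvgVarState:
    """Run the transition function over one segment's rows."""
    state = AvgVarState()
    for x in values:
        state = transition(state, x)
    return state


if __name__ == "__main__":
    # Two "segments" of one column, aggregated independently, then merged.
    seg1 = aggregate([2, 4, 4, 4])
    seg2 = aggregate([5, 5, 7, 9])
    print(final(merge(seg1, seg2)))
```

Merging the two segment states yields the same mean (5.0) and variance (4.0), up to floating-point rounding, as aggregating all eight values in a single pass. Welford's update is preferred over accumulating a raw sum of squares because it avoids the cancellation error of subtracting two large, nearly equal quantities.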

...

The files for the above exercise can be found in the examples folder of the source code.

Adding An Iterative UDF

In this section, we demonstrate a slightly more complicated example that requires invoking a UDA iteratively. Such cases are common in machine learning modules, where the underlying optimization algorithm takes iterative steps toward the optimum of the objective function. In this example, we implement a simple logistic regression solver as an iterative UDF. In particular, the user will be able to type the following command in psql to train a logistic regression classifier:

...
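The iterative pattern can be sketched outside the database as a driver loop that repeatedly runs one aggregate pass over the data and updates the coefficients until they converge. The Python below only illustrates that control flow and is not the MADlib solver: `logregr_step` and `logregr_train` are hypothetical names, each call to `logregr_step` stands in for one invocation of the per-iteration UDA, and plain gradient ascent with a fixed step size is used as the update rule for simplicity. The actual module may use a different optimizer and convergence criterion.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (feature vector incl. intercept, 0/1 label)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logregr_step(rows: Sequence[Row], coef: List[float]) -> List[float]:
    """One pass over the data (the role the UDA plays in the database):
    accumulate the gradient of the log-likelihood at the current coef."""
    grad = [0.0] * len(coef)
    for x, y in rows:
        p = sigmoid(sum(c * xj for c, xj in zip(coef, x)))
        for j, xj in enumerate(x):
            grad[j] += (y - p) * xj
    return grad


def logregr_train(rows: Sequence[Row], n_features: int, step_size: float = 0.1,
                  max_iter: int = 1000, tol: float = 1e-8) -> List[float]:
    """Driver loop: invoke the step aggregate until the gradient vanishes."""
    coef = [0.0] * n_features
    for _ in range(max_iter):
        grad = logregr_step(rows, coef)
        if math.sqrt(sum(g * g for g in grad)) < tol:
            break  # converged: gradient is numerically zero
        coef = [c + step_size * g for c, g in zip(coef, grad)]
    return coef


if __name__ == "__main__":
    # Toy, non-separable data; features are [intercept, t].
    data = [([1.0, float(t)], y) for t, y in
            [(-2, 0), (-1, 0), (-1, 1), (0, 0), (1, 1), (1, 0), (2, 1)]]
    print(logregr_train(data, n_features=2))
```

Keeping the convergence test in the driver rather than inside the aggregate mirrors the split described above: the aggregate only computes per-iteration statistics over the table, and the iterative UDF decides whether another pass is needed.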