This guide explains all the elements needed to successfully develop and plug in a new MADlib module.

  1. Prerequisites
  2. Adding a New Module
  3. Adding an Iterative UDF

The files for the examples in this guide can be found in the hello world folder of the source code repository.

Prerequisites

Follow the steps in the Installation Guide for MADlib.

The MADlib source code is organized so that the core logic of each machine learning or statistical module lives in a common location, while the database-specific code lives in a ports folder. Since all currently supported databases are based on Postgres, the postgres port contains all the port-specific files, and the greenplum and hawq ports inherit from it. Before proceeding with this guide, we recommend that you familiarize yourself with the MADlib architecture.

Adding a New Module

Let's add a new module called hello_world. Inside this module we implement a User-Defined SQL Aggregate (UDA) called avg_var, which computes the mean and variance of a given numerical column of a table. We'll implement a distributed version of Welford's online algorithm for computing the mean and variance.
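To make the algorithm concrete before looking at the module files, here is a minimal Python sketch. It is not MADlib's implementation, only an illustration under stated assumptions. It models the three pieces of a distributed aggregate: a transition step that applies Welford's update to one row, a merge step that combines partial states from different segments using the pairwise formula of Chan, Golub and LeVeque, and a final step that turns the state into results. The names AvgVarState, transition, merge, final and avg_var are hypothetical, and reporting the population variance (M2 / n) rather than the sample variance is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class AvgVarState:
    """Hypothetical aggregate state: row count, running mean, sum of squared deviations (M2)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0


def transition(state: AvgVarState, x: float) -> AvgVarState:
    """Welford's update: fold one value into the running state."""
    count = state.count + 1
    delta = x - state.mean
    mean = state.mean + delta / count
    # The second factor uses the *updated* mean, as in Welford's recurrence.
    m2 = state.m2 + delta * (x - mean)
    return AvgVarState(count, mean, m2)


def merge(a: AvgVarState, b: AvgVarState) -> AvgVarState:
    """Combine two partial states (e.g. from two segments) with Chan et al.'s pairwise formula."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return AvgVarState(count, mean, m2)


def final(state: AvgVarState) -> Tuple[float, float]:
    """Return (mean, population variance); NaN for an empty input (assumed behavior)."""
    if state.count == 0:
        return (math.nan, math.nan)
    return (state.mean, state.m2 / state.count)


def avg_var(partitions: Iterable[Iterable[float]]) -> Tuple[float, float]:
    """Simulate a distributed aggregate: transition within each partition, then merge."""
    total = AvgVarState()
    for part in partitions:
        local = AvgVarState()
        for x in part:
            local = transition(local, x)
        total = merge(total, local)
    return final(total)


# Three "segments" holding the values 2, 4, 4, 4, 5, 5, 7, 9.
print(avg_var([[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]))
```

For this data the exact mean is 5 and the population variance is 4, so the result should match those values up to floating-point rounding, regardless of how the rows are split across partitions. Keeping (count, mean, M2) as the state, rather than running sums of x and x², avoids the catastrophic cancellation that the naive sum-of-squares formula suffers when the variance is small relative to the mean.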
