This guide explains all the elements needed to successfully develop and plug in a new MADlib module.
...
The files for the examples in this guide can be found in the hello world folder of the source code repository.
...
Prerequisites
Follow the steps in the Installation Guide for MADlib.
...
MADlib source code is organized such that the core logic of a machine learning or statistical module is located in a common location and the database-port-specific code is located in a ports folder. Since all currently supported databases are based on Postgres, the postgres port contains all the port-specific files, with greenplum and hawq inheriting from it. Before proceeding with this guide, it is recommended that you familiarize yourself with the MADlib architecture.
Let's add a new module called hello_world. Inside this module we implement a User-Defined SQL Aggregate (UDA) called avg_var, which computes the mean and variance for a given numerical column of a table. We'll implement a distributed version of Welford's online algorithm for computing the mean and variance.
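To make the idea concrete before looking at any MADlib code, here is a minimal Python sketch of how a distributed Welford computation can be split into the three pieces a UDA typically needs: a transition step that folds one row into a running state, a merge step that combines the states of two segments, and a final step that turns the state into the result. This is not the MADlib implementation. The function names (transition, merge, final, aggregate) and the (n, mean, m2) state layout are assumptions made for illustration, and the choice of the sample variance (dividing by n - 1) is also an assumption.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical state layout: (row count, running mean, sum of squared
# deviations from the running mean).
State = Tuple[int, float, float]

EMPTY: State = (0, 0.0, 0.0)


def transition(state: State, x: float) -> State:
    """Fold one value into the running state (Welford's update)."""
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)  # uses the already-updated mean
    return (n, mean, m2)


def merge(a: State, b: State) -> State:
    """Combine the states of two disjoint sets of rows."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    if n_a == 0:
        return b
    if n_b == 0:
        return a
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def final(state: State) -> Tuple[Optional[float], Optional[float]]:
    """Return (mean, sample variance); None where the value is undefined."""
    n, mean, m2 = state
    if n == 0:
        return (None, None)
    if n == 1:
        return (mean, None)
    return (mean, m2 / (n - 1))  # assumption: sample variance


def aggregate(segments: Sequence[Sequence[float]]):
    """Simulate segments: each one runs transition locally, then states merge."""
    state = EMPTY
    for rows in segments:
        local = EMPTY
        for x in rows:
            local = transition(local, x)
        state = merge(state, local)
    return final(state)
```

For example, calling aggregate([[2, 4, 4], [4, 5, 5, 7, 9]]) simulates two segments holding eight rows in total; the merged result has mean 5 and sample variance 32/7, the same as a single pass over all eight values. Keeping the state small and mergeable is what allows each segment to work independently on its own rows.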
...