...
The files for the examples can be found in the hello world folder of the source code.
Follow the steps in the Installation Guide for MADlib.
MADlib source code is organized such that the core logic of a machine learning or statistical module is located in a common location, and the database-port-specific code is located in a ports folder. Since all currently supported databases are based on Postgres, the postgres port contains all the port-specific files, with greenplum and hawq inheriting from it. Before proceeding with this guide, it is recommended that you familiarize yourself with the [[Module anatomy| MADlib Module Anatomy]].
Let us add a new module called hello_world. Inside this module we implement a User-Defined SQL Aggregate (UDA) called avg_var, which computes the mean and variance of a given numerical column of a table. We'll implement a distributed version of Welford's online algorithm for computing the mean and variance.
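Before looking at the database code, it may help to see the shape of a distributed Welford computation outside the database. The following is a minimal Python sketch, not the module's actual implementation: the function names `transition`, `merge`, and `final` are hypothetical and only mirror the usual three parts of an aggregate, the state layout `(n, mean, m2)` is an assumption, and the choice of sample variance (dividing by `n - 1`) is also an assumption.

```python
from functools import reduce

# State is a tuple (n, mean, m2), where m2 is the running sum of squared
# deviations from the mean. This layout is an assumption for illustration.
EMPTY = (0, 0.0, 0.0)

def transition(state, x):
    """Fold one value into the running state (Welford's update)."""
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)  # uses the already-updated mean
    return (n, mean, m2)

def merge(a, b):
    """Combine two partial states computed on disjoint sets of rows."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    if n_a == 0:
        return b
    if n_b == 0:
        return a
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)

def final(state):
    """Turn the merged state into (mean, variance).

    Sample variance (n - 1 in the denominator) is an assumption here.
    """
    n, mean, m2 = state
    if n == 0:
        return None
    variance = m2 / (n - 1) if n > 1 else 0.0
    return (mean, variance)

def avg_var(segments):
    """Simulate a distributed run: one partial state per segment, then merge."""
    partials = [reduce(transition, rows, EMPTY) for rows in segments]
    return final(reduce(merge, partials, EMPTY))
```

For example, `avg_var([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]])` treats each inner list as the rows held by one segment. Keeping the state small and mergeable is what allows each segment to work independently before the partial results are combined.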
...
The files for the above exercise can be found in the examples folder of the source code.
In this section we demonstrate a slightly more complicated example, which requires invoking a UDA iteratively. Such cases are common in machine learning modules, where the underlying optimization algorithm takes iterative steps towards the optimum of the objective. In this example we implement a simple logistic regression solver as an iterative UDF. In particular, the user will be able to type the following command in psql to train a logistic regression classifier:
...
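The iterative pattern described in this section, in which a driver repeatedly invokes an aggregate and feeds the previous result back in until the solution stops changing, can be sketched outside the database as follows. This is a rough Python illustration under stated assumptions, not the module's code: it uses plain gradient ascent on the log-likelihood as a stand-in for whatever optimizer the module uses, and the names `step_transition`, `step_final`, `train`, and `predict`, as well as the learning rate and tolerance defaults, are hypothetical.

```python
import math

def step_transition(acc, row, coef):
    """Fold one (features, label) row into the gradient accumulator.

    Plays the role of the aggregate's per-row function; coef is the
    state carried over from the previous iteration.
    """
    x, y = row
    z = sum(c * xi for c, xi in zip(coef, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [g + (y - p) * xi for g, xi in zip(acc, x)]

def step_final(acc, coef, n_rows, lr):
    """Produce the next coefficients from the accumulated gradient."""
    return [c + lr * g / n_rows for c, g in zip(coef, acc)]

def train(rows, n_features, max_iter=5000, tol=1e-6, lr=0.1):
    """Driver loop: one full pass over the data per iteration."""
    coef = [0.0] * n_features
    for _ in range(max_iter):
        acc = [0.0] * n_features
        for row in rows:
            acc = step_transition(acc, row, coef)
        new_coef = step_final(acc, coef, len(rows), lr)
        change = max(abs(a - b) for a, b in zip(new_coef, coef))
        coef = new_coef
        if change < tol:  # stop once the coefficients stop moving
            break
    return coef

def predict(coef, x):
    """Predicted probability of the positive class for feature vector x."""
    z = sum(c * xi for c, xi in zip(coef, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Each row is a pair such as `([1.0, 2.0], 1)`, where the leading `1.0` acts as an intercept term. The point of the sketch is the division of labor: the per-row and final functions correspond to a single aggregate call, while the loop in `train` corresponds to the driver that calls it once per iteration.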