...

  1. Register the module.
  2. Define the SQL functions.
  3. Implement the functions in C++.
  4. Register the C++ header files.

The files for this exercise can be found in the hello world folder of the source code repository.

1. Register the module

Add the following line to the Modules.yml file under ./src/config/:

...

SELECT madlib.avg_var(second_attack) FROM patients;

    -- ************ --
    --    Result    --
    -- ************ --
    +-------------------+
    | avg_var           |
    |-------------------|
    | [0.5, 0.25, 20.0] |
    +-------------------+
-- (average, variance, count) --
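
To make the (average, variance, count) result easier to follow, here is a minimal Python sketch of how an aggregate of this shape can be assembled from transition, merge, and final functions. It is not the C++ code from the exercise: the function names, the state layout, and the use of Welford's update with Chan's pairwise merge are assumptions made for illustration. The input is a made-up column of ten 0s and ten 1s split across two hypothetical segments, not the actual patients table.

```python
# Sketch of a transition/merge/final aggregate (illustrative names only).
# State layout (assumed): (count, running mean, sum of squared deviations).

def transition(state, x):
    """Fold one value into the state using Welford's update."""
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    return (n, mean, m2)

def merge(a, b):
    """Combine two partial states computed on different segments."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    if n_a == 0:
        return b
    if n_b == 0:
        return a
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)

def final(state):
    """Turn the state into [average, population variance, count]."""
    n, mean, m2 = state
    if n == 0:
        return None
    return [mean, m2 / n, float(n)]

# Made-up data: two segments that together hold ten 0s and ten 1s.
segment_1 = [0] * 4 + [1] * 6
segment_2 = [0] * 6 + [1] * 4

partial_states = []
for segment in (segment_1, segment_2):
    state = (0, 0.0, 0.0)
    for value in segment:
        state = transition(state, value)
    partial_states.append(state)

result = final(merge(partial_states[0], partial_states[1]))
print(result)
```

The merge function is what lets each segment aggregate its own rows independently before the partial states are combined. Dividing by the count (population variance) rather than count minus one is a choice made here so that the sketch matches the shape of the result shown above.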

...

Adding An Iterative UDF

...

Compared to the steps presented in the previous section, we do not need to modify the Modules.yml file here because we are not creating a new module. Another difference is that we create an additional .py_in Python file along with the .sql_in file. Most of the iterative logic is implemented there.

The files for this exercise can be found in the hello world folder of the source code repository.

1. Overview

The overall logic is split into three parts. All the UDFs and UDAs are defined in simple_logistic.sql_in. The transition, merge, and final functions are implemented in C++; together they constitute the UDA called __logregr_simple_step, which takes one step from the current state to decrease the logistic regression objective. Finally, simple_logistic.py_in uses the plpy package to implement, in Python, a UDF called logregr_simple_train, which invokes __logregr_simple_step iteratively until convergence.
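
The division of labor described above can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not the exercise's code: in the real module the driver would run the step aggregate as a SQL query through plpy, whereas here a hypothetical simple_step function plays that role. The update rule (gradient ascent on the log-likelihood with a fixed step size), the convergence test, all function names, and the toy data are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def transition(acc, coef, x, y):
    """Fold one row into the accumulator (gradient, log-likelihood)."""
    grad, loglik = acc
    p = sigmoid(sum(c * xi for c, xi in zip(coef, x)))
    grad = [g + (y - p) * xi for g, xi in zip(grad, x)]
    loglik += math.log(p) if y == 1 else math.log(1.0 - p)
    return (grad, loglik)

def merge(a, b):
    """Combine accumulators computed on different segments."""
    return ([ga + gb for ga, gb in zip(a[0], b[0])], a[1] + b[1])

def final(acc, coef, step_size):
    """Apply one gradient-ascent update (assumed rule) to the coefficients."""
    grad, loglik = acc
    return ([c + step_size * g for c, g in zip(coef, grad)], loglik)

def simple_step(segments, coef, step_size):
    """Stand-in for the step aggregate: one pass over all rows."""
    partials = []
    for rows in segments:
        acc = ([0.0] * len(coef), 0.0)
        for x, y in rows:
            acc = transition(acc, coef, x, y)
        partials.append(acc)
    total = partials[0]
    for part in partials[1:]:
        total = merge(total, part)
    return final(total, coef, step_size)

def train(segments, n_features, step_size=0.1, tol=1e-8, max_iter=10000):
    """Driver loop: call the step repeatedly until the log-likelihood settles."""
    coef = [0.0] * n_features
    loglik = None
    previous = None
    for _ in range(max_iter):
        # loglik is evaluated at the coefficients the step started from.
        coef, loglik = simple_step(segments, coef, step_size)
        if previous is not None and abs(loglik - previous) < tol:
            break
        previous = loglik
    return coef, loglik

# Made-up rows of (features, label); features are [1, x] with an intercept term.
segments = [
    [([1.0, 0.0], 0), ([1.0, 1.0], 1), ([1.0, 1.0], 0)],
    [([1.0, 0.0], 1), ([1.0, 0.0], 0), ([1.0, 1.0], 1)],
]

coef, loglik = train(segments, n_features=2)
print(coef, loglik)
```

Keeping the per-row arithmetic in the step and only the loop and the stopping rule in the driver mirrors the split between the aggregate and the Python UDF: the driver never touches individual rows.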

...

SELECT madlib.logregr_simple_train( 
    'patients',                                 -- source table
    'logreg_mdl',                               -- output table
    'second_attack',                            -- labels
    'ARRAY[1, treatment, trait_anxiety]');      -- features
SELECT * FROM logreg_mdl;

-- ************ --
--    Result    --
-- ************ --
+--------------------------------------------------+------------------+
| coef                                             |   log_likelihood |
|--------------------------------------------------+------------------|
| [-6.27176619714, -0.84168872422, 0.116267554551] |         -9.42379 |
+--------------------------------------------------+------------------+

...
