Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

master build statusImage RemovedImage Added

This guide explains all of the elements needed to successfully develop and plug in a new MADlib® module.

...

Code Block
languagetext
## 1) Pull down the `madlib/postgres_9.6:latest` image from docker hub:
docker pull madlib/postgres_9.6:latest
## 2) Launch a container corresponding to the MADlib image, mounting the source code folder to the container:
docker run -d -it --name madlib -v (path to incubator-madlib directory):/incubator-madlib/ madlib/postgres_9.6
where incubator-madlib is the directory where the MADlib source code resides.
############################################## * WARNING * ##################################################
# Please be aware that when mounting a volume as shown above, any changes you make in the "incubator-madlib" 
# folder inside the Docker container will be reflected on your local disk (and vice versa). This means that
# deleting data in the mounted volume from a Docker container will delete the data from your local disk also.
#############################################################################################################
## 3) When the container is up, connect to it and build MADlib:
docker exec -it madlib bash
mkdir /incubator-madlib/build-docker
cd /incubator-madlib/build-docker
cmake ..
make
make doc
make install
## 4) Install MADlib:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install
## 5) Several other madpack commands can now be run:
# Run install check, on all modules:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install-check
# Run install check, on a specific module, say svm:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install-check -t svm
# Run dev check, on all modules (more comprehensive than install check):
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres dev-check
# Run dev check, on a specific module, say svm:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres dev-check -t svm
# Reinstall MADlib:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres reinstall
## 6) Kill and remove containers (after exiting the container):
docker kill madlib
docker rm madlib

...

Code Block
languagecpp
   /**
     * @brief Update state with a new data point
     */
    template <class OtherHandle>
    AvgVarTransitionState &operator+=(const double x){
        double diff = (x - avg);
        double normalizer = static_cast<double>(numRows + 1);
        // online update mean
        this->avg += diff / normalizer;
        // online update variance
        double new_diff = (x - avg);
        double a = static_cast<double>(numRows) / normalizer;
        this->var = (var * a) + (diff * new_diff) / normalizer;
    }
 
/**
 * @brief Merge with another State object
 *
 * We update mean and variance in a online fashion
 * to avoid intermediate large sum. 
 */
template <class OtherHandle>
AvgVarTransitionState &operator+=(
    const AvgVarTransitionState<OtherHandle> &inOtherState) {

    if (mStorage.size() != inOtherState.mStorage.size())
        throw std::logic_error("Internal error: Incompatible transition "
                               "states");
    double avg_ = inOtherState.avg;
    double var_ = inOtherState.var;
    uint16uint64_t numRows_ = static_cast<uint16cast<uint64_t>(inOtherState.numRows);
    double totalNumRows = static_cast<double>(numRows + numRows_);
    double p = static_cast<double>(numRows) / totalNumRows;
    double p_ = static_cast<double>(numRows_) / totalNumRows;
    double totalAvg = avg * p + avg_ * p_;
    double a = avg - totalAvg;
    double a_ = avg_ - totalAvg;

    numRows += numRows_;
    var = p * var + p_ * var_ + p * a * a + p_ * a_ * a_;
    avg = totalAvg;
    return *this;
}

...

Code Block
languagesql
SELECT madlib.avg_var(second_attack) FROM patients;

    -- ************ --
    --    Result    --
    -- ************ --
    +-------------------+
    | avg_var           |
    |-------------------|
    | [0.5, 0.25, 20.0] |
    +-------------------+
-- (average, variance, count) --

...


Anchor
Adding Iterative Module
Adding Iterative Module
Adding An Iterative UDF

...

The example below demonstrates the usage of madlib.logregr_simple_train on the patients table we used earlier. The trained classification model is stored in the table called logreg_mdl and can be viewed using standard SQL query. 


Code Block
languagesql
SELECT madlib.logregr_simple_train( 
    'patients',                                 -- source table
    'logreg_mdl',                               -- output table
    'second_attack',                            -- labels
    'ARRAY[1, treatment, trait_anxiety]');      -- features
SELECT * FROM logreg_mdl;

-- ************ --
--    Result    --
-- ************ --
+--------------------------------------------------+------------------+
| coef                                             |   log_likelihood |
|--------------------------------------------------+------------------|
| [-6.27176619714, -0.84168872422, 0.116267554551] |         -9.42379 |
+--------------------------------------------------+------------------+

...