This page lists starter projects that new contributors can work on to become more familiar with MADlib. These starter JIRAs are tagged with the label "starter" at https://issues.apache.org/jira/browse/MADLIB/.

Please also refer to the Contribution Guidelines and Quick Start Guide for Developers.  

Documentation

| No. | Item | Description | Link | Status |
|---|---|---|---|---|
| 1 | Improve module documentation | Review the latest MADlib documentation http://doc.madlib.net/latest/ and make any needed updates to content or accuracy. You can also add additional examples. | https://issues.apache.org/jira/browse/MADLIB-922 | Open |
| 2 | Improve online help | Standardize on-line help so syntax is the same for all modules. | https://issues.apache.org/jira/browse/MADLIB-923 | Open |
| 3 | | | | |

Bug Fixes and Improvements

| No. | Item | Description | Link | Status |
|---|---|---|---|---|
| 1 | Improved error message for Elastic Net predict() | When we pass the selected coefficients to elastic net's "predict()" function, it throws an ugly error message that does not indicate the real error. | https://issues.apache.org/jira/browse/MADLIB-835 | Open |
| 2 | Confusing Error Messages while running elastic net prediction function | Fix confusing error message. | https://issues.apache.org/jira/browse/MADLIB-787 | Open |
| 3 | LDA (parsed) model table and output table disagree | Investigate and determine if this is an issue. If it is, repair it. | https://issues.apache.org/jira/browse/MADLIB-899 | Open |
| 4 | PivotalR test failures indicate potential bugs in MADlib GLM | These problems may be just numerical issues caused by condition numbers that are too large or a training set that is too small. To be investigated. | https://issues.apache.org/jira/browse/MADLIB-896 | Open |
| 5 | Implement skipping of arrays-with-NULL for elastic net predict | Better NULL handling for elastic net predict. | https://issues.apache.org/jira/browse/MADLIB-919 | Open |
| 6 | Improve RF output format for variable importance | Provide an easier way to access the variable importance output from random forest, so that users can see which variables are the most important. | https://issues.apache.org/jira/browse/MADLIB-925 | Open |
| 7 | Covariance matrix | Add a parameter to Pearson's correlation function to output the covariance matrix. | https://issues.apache.org/jira/browse/MADLIB-924 | Open |
| 8 | | | | |

New Features

Other

| No. | Item | Description | Link | Status |
|---|---|---|---|---|
| 1 | Add PMML export modules* | Support additional MADlib modules for PMML export | | Open |
| 2 | | | | |


*Some notes on PMML:

  • MADlib models can be exported in PMML format for use in scoring by a PMML evaluator.  

  • The following MADlib 1.8 algorithms can be exported in PMML format:

    • Linear regression

    • Logistic regression

    • GLM

    • Multinomial regression

    • Ordinal regression

    • Decision trees

    • Random forest

    • Your contribution here...

  • The Predictive Model Markup Language (PMML) is an XML-based file format that provides a way for applications to describe and exchange models produced by data mining and machine learning algorithms.

  • For more information, please see  http://www.dmg.org/
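
To make the "XML-based file format" point above concrete, here is a minimal sketch of what a PMML evaluator does with a linear regression model. It is an illustration only, not MADlib's export code: the sample document, the field names (x1, x2, y), the coefficients, and the helper function `score_linear_regression` are all made up for this example, and the PMML 4.1 namespace is an assumption about the version in use.

```python
import xml.etree.ElementTree as ET

# Assumed PMML version/namespace; exported files may declare a different one.
PMML_NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Hand-written sample document (hypothetical model, not MADlib output).
SAMPLE_PMML = """
<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def score_linear_regression(pmml_text, record):
    """Score one record against the first RegressionTable in a PMML document.

    Only handles NumericPredictor terms with the default exponent of 1.
    """
    root = ET.fromstring(pmml_text.strip())
    table = root.find(".//pmml:RegressionTable", PMML_NS)
    if table is None:
        raise ValueError("no RegressionTable found in PMML document")

    # prediction = intercept + sum(coefficient * value)
    prediction = float(table.get("intercept", "0"))
    for predictor in table.findall("pmml:NumericPredictor", PMML_NS):
        name = predictor.get("name")
        prediction += float(predictor.get("coefficient")) * record[name]
    return prediction


if __name__ == "__main__":
    print(score_linear_regression(SAMPLE_PMML, {"x1": 3.0, "x2": 4.0}))
```

A contribution that adds PMML export for another module would need to produce the corresponding PMML model element for that algorithm, so that a standard evaluator can score it in the same way.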


Non-Iterative Modules

 

Iterative Modules

 

PivotalR

PivotalR is a package that enables users of R, the most popular open source statistical programming language and environment, to work with large data sets in Greenplum Database, HAWQ and PostgreSQL. It does so by providing an R interface to operations on tables and views in the database.

It would be very valuable to add support for more MADlib modules in PivotalR. Please refer to this PivotalR wiki page for more information on how to do this.