Please also refer to the Contribution Guidelines and Quick Start Guide for Developers.  

Documentation

No. | Item | Description | Link
1 | Improve module documentation | Review the latest MADlib documentation at http://madlib.apache.org/docs/latest/ and make any needed updates to content or accuracy. You can also add examples. | MADLIB-922
2 | Improve online help | Standardize the online help so that the syntax is the same for all modules. | MADLIB-923
3 | Create sample data science notebooks | Create Jupyter or Apache Zeppelin notebooks showing how to use various modules in Apache MADlib. These are maintained at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts | MADLIB-1127
 



Bug Fixes and Improvements

No. | Item | Description | Link
1 | Improved error message for Elastic Net predict() | When the selected coefficients are passed to elastic net's predict() function, it throws an ugly error message that does not indicate the real error. | MADLIB-835
2 | Confusing Error Messages while running elastic net prediction function | Fix the confusing error message. | MADLIB-787
3 | LDA (parsed) model table and output table disagree | Investigate and determine whether this is an issue. If it is, repair it. | MADLIB-899
4 | PivotalR test failures indicate potential bugs in MADlib GLM | These problems may be just numerical issues caused by condition numbers that are too large or a training set that is too small. To be investigated. | MADLIB-896
5 | Implement skipping of arrays-with-NULL for elastic net predict | Better NULL handling for elastic net predict. | MADLIB-919
6 | Improve RF output format for variable importance | Provide an easier way to access the variable importance output from random forest, so that users can see which variables are most important. | MADLIB-925
7 | Covariance matrix | Add a parameter to Pearson's correlation function to output the covariance matrix. | MADLIB-941
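
To illustrate the covariance matrix idea in the table above, here is a minimal sketch in plain Python of how a covariance matrix and a Pearson correlation matrix relate. It is not MADlib code: the function names are hypothetical, and the use of the sample covariance (dividing by n - 1) is an assumption made for the example.

```python
import math


def covariance_matrix(columns):
    """Sample covariance matrix of equal-length numeric columns.

    Assumption for this sketch: sample covariance (divide by n - 1).
    """
    n = len(columns[0])
    k = len(columns)
    means = [sum(col) / n for col in columns]

    def cov(i, j):
        # Sum of products of deviations from the column means.
        total = sum(
            (a - means[i]) * (b - means[j])
            for a, b in zip(columns[i], columns[j])
        )
        return total / (n - 1)

    return [[cov(i, j) for j in range(k)] for i in range(k)]


def correlation_from_covariance(cov):
    """Pearson correlation matrix derived from a covariance matrix."""
    k = len(cov)
    # Standard deviations are the square roots of the diagonal entries.
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]
```

Because the correlation matrix is just the covariance matrix rescaled by the standard deviations, a function that already computes one has everything it needs to return the other.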
   




New

...

Non-Iterative Modules

No. | Item | Description | Link
1 | Add PMML export modules* | Support additional MADlib modules for PMML export | MADLIB-926
1 | k-Nearest Neighbors | Initial implementation of k-NN | MADLIB-927
   

*Some notes on PMML below...

  • MADlib models can be exported in PMML format for use in scoring by a PMML evaluator.  

  • The following MADlib 1.8 algorithms can be exported in PMML format:

    • Linear regression

    • Logistic regression

    • GLM

    • Multinomial regression

    • Ordinal regression

    • Decision trees

    • Random forest

    • Your contribution here...

  • The Predictive Model Markup Language (PMML) is an XML-based file format that provides a way for applications to describe and exchange models produced by data mining and machine learning algorithms.

  • For more information, please see  http://www.dmg.org/
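
To make the notes above concrete, the following sketch builds a very small PMML document for a linear regression model using only the Python standard library. It does not show MADlib's own export output. The function name, field names, and header text are hypothetical, and the choice of PMML version 4.1 is an assumption made for the example. The element names (DataDictionary, MiningSchema, RegressionTable, NumericPredictor) follow the PMML regression model schema.

```python
import xml.etree.ElementTree as ET


def linear_regression_pmml(target, intercept, coefficients):
    """Return a minimal PMML string for a linear regression model.

    `coefficients` maps predictor names to fitted coefficients.
    This is an illustrative sketch, not MADlib's export format.
    """
    pmml = ET.Element(
        "PMML", {"version": "4.1", "xmlns": "http://www.dmg.org/PMML-4_1"}
    )
    ET.SubElement(pmml, "Header", {"description": "hypothetical example model"})

    # Data dictionary: declare every field the model reads or predicts.
    fields = list(coefficients) + [target]
    data_dict = ET.SubElement(
        pmml, "DataDictionary", {"numberOfFields": str(len(fields))}
    )
    for name in fields:
        ET.SubElement(
            data_dict,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(pmml, "RegressionModel", {"functionName": "regression"})

    # Mining schema: which fields are inputs and which one is predicted.
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "predicted"})

    # Regression table: intercept plus one numeric predictor per coefficient.
    table = ET.SubElement(
        model, "RegressionTable", {"intercept": repr(float(intercept))}
    )
    for name, coef in coefficients.items():
        ET.SubElement(
            table,
            "NumericPredictor",
            {"name": name, "coefficient": repr(float(coef))},
        )
    return ET.tostring(pmml, encoding="unicode")
```

A PMML evaluator would read the intercept and coefficients from the RegressionTable element to score new rows, which is what makes the format useful for exchanging models between systems.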

New Non-Iterative Modules

No. | Item | Description | Link
1 | k-Nearest Neighbors - Phase 2 | Add additional features to k-NN | MADLIB-1129, MADLIB-1059, MADLIB-1060, MADLIB-1061
1 | k-Nearest Neighbors | Initial implementation of k-NN | MADLIB-927
3 | Stratified sampling | Utility to perform stratified, randomized, proportional sampling and labeling. | MADLIB-986
4 | URI utilities | A set of utilities for parsing and extracting URIs from text. | MADLIB-910
5 | Anonymization | Utility for anonymization. | MADLIB-911
6 | Sessionization | Utility to partition event streams into sessions by timeouts and identifiers. | MADLIB-909
7 | Mixed Effects Modeling | Mixed-effects model containing fixed-effects and random-effects components. | MADLIB-987
8 | New Methods: Expectation Maximization | Gaussian mixture modeling and others | MADLIB-410
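
As an illustration of the sessionization idea in the table above, here is a minimal sketch of one plausible reading of "partition event streams into sessions by timeouts and identifiers". It is plain Python, not a MADlib interface: the function name, the (user_id, timestamp) event shape, and the rule that a gap strictly greater than the timeout starts a new session are all assumptions made for the example.

```python
from collections import defaultdict


def sessionize(events, timeout):
    """Assign a per-user session number to each (user_id, timestamp) event.

    A new session starts for a user whenever the gap since that user's
    previous event exceeds `timeout` (same units as the timestamps).
    Returns (user_id, timestamp, session_id) tuples sorted by user and time.
    """
    last_seen = {}                 # most recent timestamp per user
    session_no = defaultdict(int)  # current session counter per user
    result = []
    for user, ts in sorted(events):
        # First event for this user, or the gap is too long: open a session.
        if user not in last_seen or ts - last_seen[user] > timeout:
            session_no[user] += 1
        last_seen[user] = ts
        result.append((user, ts, session_no[user]))
    return result
```

In a database setting the same logic would typically be expressed with a window function partitioned by the identifier and ordered by time. The loop above only shows the rule itself.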
 




New Iterative Modules

No. | Item | Description | Link
1 | Model parameter weighting | Assign weights to training samples or observations. | MADLIB-988
2 | New clustering algorithm | OPTICS and/or DBSCAN clustering algorithm | MADLIB-1017
 




PivotalR

PivotalR is a package that enables users of R, the most popular open source statistical programming language and environment, to work with large data sets in the Greenplum database, HDB/HAWQ, and PostgreSQL. It does so by providing an interface to operations on tables and views in the database.
