...
Please also refer to the Contribution Guidelines and Quick Start Guide for Developers.
Documentation
No. | Item | Description | Link |
---|
1 | Improve module documentation | Review the latest MADlib documentation at http://madlib.incubator.apache.org/docs/latest/ and make any needed updates to content or accuracy. You can also add examples. | ASF JIRA MADLIB-922 |
2 | Improve online help | Standardize online help so the syntax is the same for all modules. | ASF JIRA MADLIB-923 |
3 | Create sample data science notebooks | Create Jupyter or Apache Zeppelin notebooks showing how to use various modules in Apache MADlib. These are maintained at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts | ASF JIRA MADLIB-1127 |
4 |
Bug Fixes and Improvements
No. | Item | Description | Link |
---|
1 | Improved error message for Elastic Net predict() | When the selected coefficients are passed to elastic net's "predict()" function, it throws an ugly error message that does not indicate the real error. | ASF JIRA MADLIB-835 |
2 | Confusing error messages while running elastic net prediction function | Fix the confusing error message. | ASF JIRA MADLIB-787 |
3 | LDA (parsed) model table and output table disagree | Investigate and determine whether this is an issue. If it is, repair it. | ASF JIRA MADLIB-899 |
4 | PivotalR test failures indicate potential bugs in MADlib GLM | These problems may be purely numerical issues caused by condition numbers that are too large or a training set that is too small. To be investigated. | ASF JIRA MADLIB-896 |
5 | Implement skipping of arrays-with-NULL for elastic net predict | Better NULL handling for elastic net predict. | ASF JIRA MADLIB-919 |
6 | Improve RF output format for variable importance | Provide an easier way to access the variable importance output from random forest, so that users can see which variables are most important. | ASF JIRA MADLIB-925 |
7 | Covariance matrix | Add a parameter to Pearson's correlation function to output the covariance matrix. | ASF JIRA MADLIB-941 |
8 |
| | ...
New Non-Iterative Modules
Add PMML export modules* | Support additional MADlib modules for PMML export | ASF JIRA MADLIB-926 |
*Some notes on PMML below:
MADlib models can be exported in PMML format for use in scoring by a PMML evaluator.
The following MADlib 1.9 algorithms can be exported in PMML format:
The Predictive Model Markup Language (PMML) is an XML-based file format that provides a way for applications to describe and exchange models produced by data mining and machine learning algorithms.
For more information, please see http://www.dmg.org/
JPMML is an open source PMML evaluator available under GPL license.
For more information, please see https://github.com/jpmml and https://github.com/jpmml/openscoring
Currently, MADlib can only export models to PMML; importing PMML is not supported.
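To make the evaluator idea concrete, here is a minimal Python sketch that reads a small PMML document and scores one row with it. The XML fragment is hand-written for illustration and is not actual MADlib output; the helper names `load_linear_model` and `score` are hypothetical. A real deployment would rely on a full PMML evaluator such as JPMML rather than this toy parser, which only understands a single linear `RegressionTable`.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML fragment (an assumption, NOT MADlib output).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_linear_model(pmml_text):
    """Return (intercept, {feature name: coefficient}) from a PMML string."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionTable found in PMML document")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, row):
    """Apply the linear model to one row given as a dict of feature values."""
    intercept, coefficients = model
    return intercept + sum(c * row[name] for name, c in coefficients.items())


model = load_linear_model(PMML_DOC)
prediction = score(model, {"x1": 3.0, "x2": 4.0})
print(prediction)
```

Because the model is plain XML, any tool that understands the PMML schema can score with it without access to the database that trained it.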
No. | Item | Description | Link |
---|
1 | k-Nearest Neighbors | Initial implementation of k-NN | ASF JIRA MADLIB-927 |
2 | k-Nearest Neighbors - Phase 2 | Add additional features to k-NN | ASF JIRA MADLIB-1129, MADLIB-1059, MADLIB-1060 |
3 | Stratified sampling | Utility to perform stratified, randomized, proportional sampling and labeling. | ASF JIRA MADLIB-986 |
4 | URI utilities | A set of utilities for parsing and extracting URIs from text. | ASF JIRA MADLIB-910 |
5 | Anonymization | Utility for anonymization. | ASF JIRA MADLIB-911 |
6 | Sessionization | Utility to partition event streams into sessions by timeouts and identifiers. | ASF JIRA MADLIB-909 |
7 | Mixed Effects Modeling | Mixed-effects model containing fixed-effects and random-effects components. | ASF JIRA MADLIB-987 |
8 | New Methods: Expectation Maximization | Gaussian mixture modeling and others. | ASF JIRA MADLIB-410 |
9 |
New Iterative Modules
No. | Item | Description | Link |
---|
1 | Model parameter weighting | Assign weights to training samples or observations. | ASF JIRA MADLIB-988 |
2 | New clustering algorithm | OPTICS and/or DBSCAN clustering algorithm. | ASF JIRA MADLIB-1017 |
3 |
PivotalR
PivotalR is a package that enables users of R, the most popular open source statistical programming language and environment, to interact with the Greenplum database, HDB/HAWQ and PostgreSQL on large data sets. It does so by providing an interface to the operations on tables/views in the database.
...