...
This page lists starter projects that new contributors can work on to become more familiar with MADlib®. These starter JIRAs are tagged with the label "starter" at https://issues.apache.org/jira/browse/MADLIB/.
Please also refer to the Contribution Guidelines and Quick Start Guide for Developers.
Documentation
No. | Item | Description | Link | Status |
---|---|---|---|---|
1 | Improve module documentation | Review the latest MADlib documentation at http://madlib.apache.org/docs/latest/ and make any needed updates to content or accuracy. You can also add examples. | https://issues.apache.org/jira/browse/MADLIB-922 | Open |
2 | Improve online help | Standardize the online help so that the syntax is the same for all modules. | https://issues.apache.org/jira/browse/MADLIB-923 | Open |
3 | Create sample data science notebooks | Create Jupyter or Apache Zeppelin notebooks showing how to use various modules in Apache MADlib. These are maintained at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts | https://issues.apache.org/jira/browse/MADLIB-1127 | |
Bug Fixes and Improvements
No. | Item | Description | Link | Status |
---|---|---|---|---|
1 | Improved error message for Elastic Net predict() | When the selected coefficients are passed to elastic net's "predict()" function, it throws an ugly error message that does not indicate the real error. | https://issues.apache.org/jira/browse/MADLIB-835 | Open |
2 | Confusing Error Messages while running elastic net prediction function | Fix confusing error message. | https://issues.apache.org/jira/browse/MADLIB-787 | Open |
3 | LDA (parsed) model table and output table disagree | Investigate and determine whether this is an issue. If it is, repair it. | https://issues.apache.org/jira/browse/MADLIB-899 | Open |
4 | PivotalR test failures indicate potential bugs in MADlib GLM | These problems may be purely numerical, caused by condition numbers that are too large or a training set that is too small. To be investigated. | https://issues.apache.org/jira/browse/MADLIB-896 | Open |
5 | Implement skipping of arrays-with-NULL for elastic net predict | Better NULL handling for elastic net predict. | https://issues.apache.org/jira/browse/MADLIB-919 | Open |
6 | Improve RF output format for variable importance | Provide an easier way to access the variable importance output from random forest, so that users can see which variables are most important. | https://issues.apache.org/jira/browse/MADLIB-925 | Open |
7 | Covariance matrix | Add a parameter to Pearson's correlation function to output the covariance matrix. | https://issues.apache.org/jira/browse/MADLIB-924 | Open |
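For contributors looking at the covariance matrix item, it may help to recall how covariance and Pearson's correlation relate: cov(x, y) = r(x, y) · σx · σy. The following is a minimal plain-Python sketch of that relationship, not MADlib code. The function names (`covariance`, `pearson`, `covariance_matrix`, `correlation_matrix`) and the sample data are hypothetical and chosen only for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equal-length columns (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def covariance_matrix(columns):
    return [[covariance(a, b) for b in columns] for a in columns]


def correlation_matrix(columns):
    return [[pearson(a, b) for b in columns] for a in columns]


# Three illustrative columns; the second is exactly twice the first.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 1.0, 2.0],
]
cov = covariance_matrix(columns)
corr = correlation_matrix(columns)
```

Because both matrices come from the same sums, a function that already computes the correlation matrix has what it needs to return the covariance matrix as well, which is presumably why the item suggests adding a parameter rather than a new function.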
...
Jira: https://issues.apache.org/jira/browse/MADLIB-941
New Non-Iterative Modules
No. | Item | Description | Link | Status |
---|---|---|---|---|
1 | k-Nearest Neighbors | Initial implementation of k-NN | https://issues.apache.org/jira/browse/MADLIB-927 | |
2 | k-Nearest Neighbors - Phase 2 | Add additional features to k-NN | https://issues.apache.org/jira/browse/MADLIB-1129 | |
 | | | https://issues.apache.org/jira/browse/MADLIB-1059 | |
 | Add PMML export modules* | Support additional MADlib modules for PMML export | https://issues.apache.org/jira/browse/MADLIB-926 | Open |
*Some notes on PMML below...
MADlib models can be exported in PMML format for use in scoring by a PMML evaluator.
The following MADlib 1.8 algorithms can be exported in PMML format:
The Predictive Model Markup Language (PMML) is an XML-based file format that provides a way for applications to describe and exchange models produced by data mining and machine learning algorithms.
For more information, please see http://www.dmg.org/
JPMML is an open source PMML evaluator available under GPL license.
For more information, please see https://github.com/jpmml and https://github.com/jpmml/openscoring
Currently, models can only be exported from MADlib to PMML; importing PMML is not supported.
No. | Item | Description | Link |
---|---|---|---|
 | | | https://issues.apache.org/jira/browse/MADLIB-1061 |
3 | Stratified sampling | Utility to perform stratified, randomized, proportional sampling and labeling. | https://issues.apache.org/jira/browse/MADLIB-986 |
4 | URI utilities | A set of utilities for parsing and extracting URIs from text. | https://issues.apache.org/jira/browse/MADLIB-910 |
5 | Anonymization | Utility for anonymization. | https://issues.apache.org/jira/browse/MADLIB-911 |
6 | Sessionization | Utility to partition event streams into sessions by timeouts and identifiers. | https://issues.apache.org/jira/browse/MADLIB-909 |
7 | Mixed Effects Modeling | Mixed-effects model containing fixed-effects and random-effects components. | https://issues.apache.org/jira/browse/MADLIB-987 |
8 | New Methods: Expectation Maximization | Gaussian mixture modeling and others. | https://issues.apache.org/jira/browse/MADLIB-410 |
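To make the sessionization item concrete, here is a minimal plain-Python sketch of partitioning an event stream into sessions by identifier and timeout. It is an illustration of the general idea only, not a proposed MADlib interface: the function name `sessionize`, the `(user_id, timestamp)` event shape, and the 1800-second timeout are all assumptions made for this example.

```python
from collections import defaultdict


def sessionize(events, timeout):
    """Assign a per-user session number to each event.

    events  -- iterable of (user_id, timestamp) pairs, in any order
    timeout -- a new session starts when the gap between two consecutive
               events of the same user is strictly greater than this value
    Returns a list of (user_id, timestamp, session_id) tuples, ordered by
    user and then by timestamp.
    """
    by_user = defaultdict(list)
    for user_id, timestamp in events:
        by_user[user_id].append(timestamp)

    result = []
    for user_id in sorted(by_user):
        session_id = 0
        previous = None
        for timestamp in sorted(by_user[user_id]):
            if previous is not None and timestamp - previous > timeout:
                session_id += 1
            result.append((user_id, timestamp, session_id))
            previous = timestamp
    return result


# Hypothetical events: timestamps are seconds since an arbitrary start.
events = [("a", 0), ("a", 100), ("b", 50), ("a", 2000), ("b", 4000), ("a", 2100)]
sessions = sessionize(events, timeout=1800)
```

In a database setting the same logic would typically be expressed with window functions partitioned by the identifier and ordered by time; the Python version only shows the rule being applied.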
New Iterative Modules
No. | Item | Description | Link |
---|---|---|---|
1 | Model parameter weighting | Assign weights to training samples or observations. | https://issues.apache.org/jira/browse/MADLIB-988 |
2 | New clustering algorithm | OPTICS and/or DBSCAN clustering algorithm | https://issues.apache.org/jira/browse/MADLIB-1017 |
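As a rough illustration of what assigning weights to training observations means, the sketch below fits a one-variable weighted least-squares line in plain Python. It is not MADlib code and says nothing about how the feature would be implemented there; the function name `weighted_linear_fit` and the sample data are hypothetical.

```python
def weighted_linear_fit(xs, ys, weights):
    """Weighted least-squares fit of y = slope * x + intercept.

    Each observation contributes to the fit in proportion to its weight;
    a weight of 0 removes the observation from the fit entirely.
    """
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The first three points lie on y = 2x + 1; the last one is an outlier.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 100.0]

# Down-weighting the outlier to zero recovers the underlying line.
weighted = weighted_linear_fit(xs, ys, [1.0, 1.0, 1.0, 0.0])
# With equal weights the outlier dominates the slope.
unweighted = weighted_linear_fit(xs, ys, [1.0, 1.0, 1.0, 1.0])
```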
...
PivotalR
PivotalR is a package that enables users of R, the most popular open source statistical programming language and environment, to interact with the Greenplum database, HDB/HAWQ and PostgreSQL on large data sets. It does so by providing an interface to the operations on tables/views in the database.
...