MADlib graduated to an Apache Top Level Project on July 19, 2017. Read the press release.
Apache MADlib® is an open-source library for scalable in-database analytics.
It provides data-parallel implementations of mathematical, statistical,
graph and machine learning methods for structured and unstructured data.
Quick Start Guides
...
General Information
Learn about MADlib.
- MADlib website
- Greenplum database YouTube channel with MADlib content, including step-by-step guides for common algorithms
- Module and algorithm documentation
- FAQ
Developer Documentation
- Source code repo
- Contribution Guidelines
- Documentation Guide (Doxygen)
- Ideas for contribution
- Algorithm technical design document
Architecture
See how the pieces fit together.
Release Notes
...
Third Party Components
...
- argparse 1.2.1 provides an easy, declarative interface for creating command line tools
- Boost 1.47.0 (or newer) provides peer-reviewed portable C++ source libraries
- Eigen 3.2.2 is a C++ template library for linear algebra
- PyYAML 3.10 is a YAML parser and emitter for Python
- PyXB 1.2.4 is a Python library for XML Schema Bindings
- Porter2 stemmer reduces words to common roots for comparison and further processing
- UseLATEX.cmake contains CMake commands to use the LaTeX compiler
Licensing
License information regarding MADlib and included third-party libraries can be found inside the license directory. ASF licensing guidance for MADlib pertaining to its pre-Apache history as an open source project with BSD licensing is described here.
Papers
MAD Skills: New Analysis Practices for Big Data (VLDB 2009)
Hybrid In-Database Inference for Declarative Information Extraction (SIGMOD 2011)
Towards a Unified Architecture for In-Database Analytics (SIGMOD 2012)
The MADlib Analytics Library or MAD Skills, the SQL (VLDB 2012)
Related Software
- PivotalR - lets the user run the functions of the open-source big-data machine learning package MADlib directly from R.
- PyMADlib - a nascent Python wrapper for MADlib, which combines the flexibility of Python with the number-crunching power of MADlib.