MADlib® is an open-source library for scalable in-database analytics.
It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
Quick Start Guides
Get going with a minimum of fuss.
Developer Documentation
Contribute to the project.
General Information
Learn about MADlib.
Architecture
See how the pieces fit together.
Third Party Components
MADlib incorporates material from the following third-party components:
argparse 1.2.1
"provides an easy, declarative interface for creating command line tools"
Boost 1.47.0 (or newer)
"provides peer-reviewed portable C++ source libraries"
doxypy 0.4.2
"is an input filter for Doxygen"
Eigen 3.2.2
"is a C++ template library for linear algebra"
PyYAML 3.10
"is a YAML parser and emitter for Python"
PyXB 1.2.4
"is a Python library for XML Schema Bindings"
Licensing
License information regarding MADlib and included third-party libraries can be found inside the license directory.
Release Notes
Historical release notes for releases prior to the move to the ASF.
Papers and Talks
MAD Skills: New Analysis Practices for Big Data (VLDB 2009)
Hybrid In-Database Inference for Declarative Information Extraction (SIGMOD 2011)
Towards a Unified Architecture for In-Database Analytics (SIGMOD 2012)
The MADlib Analytics Library or MAD Skills, the SQL (VLDB 2012)
Related Software
PivotalR
- PivotalR lets the user run the functions of the open-source big-data machine learning package MADlib directly from R.
PyMADlib
- PyMADlib is a Python wrapper for MADlib, which brings you the power and flexibility of Python with the number-crunching power of MADlib.