Supplemental Spark Projects

This page tracks software projects that supplement Apache Spark and add to its ecosystem. To add an item to this page, please send a note to user@spark.apache.org with the name of the project, a brief description, and a URL.

Spark Infrastructure

Spark Job Server - REST interface for managing and submitting Spark jobs on the same cluster (see the blog post for details)
SparkR - R frontend for Spark
MLbase - Machine learning research project built on top of Spark
Apache Mesos - Cluster management system that supports running Spark
Tachyon - In-memory storage system that supports running Spark
Spark Cassandra Connector - Easily load your Cassandra data into Spark and Spark SQL (from DataStax)
Elasticsearch - Spark SQL integration
Spark-Scalding - Easily transition Cascading/Scalding code to Spark
Zeppelin - an IPython-like notebook for Spark. There are also ISpark and the Spark Notebook.

Interesting Spark Applications

Apache Mahout - Previously built on Hadoop MapReduce, Mahout has switched to using Spark as its backend
Apache MRQL - A query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark
BlinkDB - a massively parallel, approximate query engine built on top of Shark and Spark
Spindle - Spark/Parquet-based web analytics query engine
Thunderain - a framework for combining stream processing with historical data (think Lambda architecture)
DF from Ayasdi - a Pandas-like data frame implementation for Spark
