This page tracks external software projects that supplement Apache Spark and add to its ecosystem.
Spark Packages
The Spark package index provides a community-managed list of libraries and applications that work with Spark. You can add a package as long as you have a GitHub repository.
Infrastructure Projects
- Spark Job Server - REST interface for managing and submitting Spark jobs on the same cluster (see blog post for details)
- SparkR - R frontend for Spark
- MLbase - Machine Learning research project on top of Spark
- Apache Mesos - Cluster management system that supports running Spark
- Tachyon - Memory-centric distributed storage system that supports running Spark
- Spark Cassandra Connector - Easily load your Cassandra data into Spark and Spark SQL; from DataStax
- Elasticsearch - Spark SQL integration
- Spark-Scalding - Easily transition Cascading/Scalding code to Spark
- Zeppelin - An IPython-like notebook for Spark. There are also ISpark and the Spark Notebook.
Applications Using Spark
- Apache Mahout - Previously built on Hadoop MapReduce, Mahout has switched to Spark as its backend
- Apache MRQL - A query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark
- BlinkDB - A massively parallel, approximate query engine built on top of Shark and Spark
- Spindle - Spark/Parquet-based web analytics query engine
- Thunderain - A framework for combining stream processing with historical data, in the style of the Lambda architecture
- DF from Ayasdi - A Pandas-like data frame implementation for Spark