
  • Spark Job Server - REST interface for managing and submitting Spark jobs on the same cluster (see blog post for details)
  • SparkR - R frontend for Spark
  • MLbase - Machine Learning research project on top of Spark
  • Apache Mesos - Cluster management system that supports running Spark
  • Alluxio (née Tachyon) - Memory-speed virtual distributed storage system that supports running Spark
  • Spark Cassandra Connector - Easily load your Cassandra data into Spark and Spark SQL; from DataStax
  • FiloDB - a Spark-integrated analytical/columnar database, with an in-memory option capable of sub-second concurrent queries
  • Elasticsearch - Spark SQL integration
  • Spark-Scalding - Easily transition Cascading/Scalding code to Spark
  • Zeppelin - an IPython-like notebook for Spark. There are also ISpark and the Spark Notebook.
  • IBM Spectrum Conductor with Spark - cluster management software that integrates with Spark
  • EclairJS - enables Node.js developers to code against Spark, and data scientists to use JavaScript in Jupyter notebooks.
  • SnappyData - an open source OLTP + OLAP database integrated with Spark on the same JVMs.
  • GeoSpark - Geospatial RDDs and joins

Applications Using Spark

  • Apache Mahout - Previously built on Hadoop MapReduce, Mahout has switched to Spark as its backend
  • Apache MRQL - A query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark
  • BlinkDB - a massively parallel, approximate query engine built on top of Shark and Spark
  • Spindle - Spark/Parquet-based web analytics query engine
  • Spark Spatial - Spatial joins and processing for Spark
  • Thunderain - a framework for combining stream processing with historical data; think Lambda architecture
  • DF from Ayasdi - a Pandas-like data frame implementation for Spark
  • Oryx - Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
  • ADAM - A framework and CLI for loading, transforming, and analyzing genomic data using Apache Spark