This page tracks the users of Spark. Feel free to add yourself to this list (you will need a wiki user account) and explain how you use Spark. Please add a short description (up to three bullet points) and a link to your organization or project.
Companies & Organizations
- UC Berkeley AMPLab - Big data research lab that initially launched Spark
  - We're building a variety of open source projects on Spark, including Shark, MLbase, and Spark Streaming, and developing new distributed systems techniques that improve the engine
  - We have both graduate students and a team of professional software engineers working on the stack
- Adatao, Inc. - Pervasive Data Science in the Enterprise
  - Team of ex-Googlers and ex-Yahoos with large-scale infrastructure experience (including both flavors of MapReduce, at Google and Yahoo) and PhDs in ML/data mining
  - We determined that Spark, among the many alternatives, addresses the right problems with the right design
- Amrita Center for Cyber Security Systems and Networks
- Autodesk
- Baidu
- Celtra
- Conviva - Experience Live
  - See our talk at AMP Camp on how we are using Spark to provide real-time video optimization
- Databricks
- Digby
- Exabeam
- Falkonry
- Freeman Lab at HHMI
  - We are using Spark for analyzing and visualizing patterns in large-scale recordings of brain activity in real time
- GraphFlow, Inc.
- Groupon
- Istanbul Sehir University
- Knoldus Software LLC
- Magine TV
- MediaCrossing - Digital Media Trading Experts in the New York and Boston areas
  - We are using Spark as a drop-in replacement for Hadoop MapReduce to answer our queries correctly in much less time
- NFLabs
- Nokia Solutions and Networks
- Ooyala, Inc. - Powering personalized video experiences across all screens
  - See our blog post on how we use Spark for Fast Queries
  - See our presentation on Cassandra, Spark, and Shark
- Peerialism
- PlanBMedia
- Premise
- Sohu
- Taobao
- TruEffect Inc
- Tuplejump
- UC Santa Cruz
Software and Research Projects
- Shark - Hive and SQL on top of Spark
- MLbase - Machine Learning project on top of Spark
- BlinkDB - a massively parallel, approximate query engine built on top of Shark and Spark
- GraphX - a graph processing & analytics framework on top of Spark
- Apache Mesos - Cluster management system that supports running Spark
- Tachyon - In-memory storage system that supports running Spark
- BigR - Native R (and other front-ends) for Big-Data-Science/Machine-Learning with open API on top of Spark+Hadoop (soon to be open sourced)
- Apache MRQL - A query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark
- OpenDL - Deep learning training on Spark (project just kicked off)