Dual streaming and batch engine
Description: Natively support both blocking and pipelined modes of execution for both batch (DataSet) and stream (DataStream) programs. Batch (DataSet) programs will be able to use a combination of blocking and pipelined execution. Stream (DataStream) programs will use pipelined execution. Interactive programs (programs that bring results back to the client) will use blocking execution. Note that batch/streaming is an API concept, whereas blocking/pipelining is a concept of the runtime engine. The two combine as follows:
Execution mode | Batch API (DataSet) | Streaming API (DataStream)
---|---|---
Blocking execution | yes | no
Pipelined execution | yes | yes
Associated JIRA:
Expected: Q1 2015
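To make the blocking/pipelined distinction concrete, here is a toy model of two operators exchanging data. It is a sketch under stated assumptions, not Flink's runtime: `producer`, `consumer`, `run_pipelined`, and `run_blocking` are hypothetical names, a Python generator stands in for a pipelined channel, and a fully built list stands in for a blocking (materialized) intermediate result. The event log records the order in which records are produced and consumed.

```python
from typing import Iterable, Iterator, List, Tuple

# Each log entry is ("produce", x) or ("consume", x), in execution order.
Event = Tuple[str, int]


def producer(data: Iterable[int], log: List[Event]) -> Iterator[int]:
    # A generator: each record is handed out as soon as it is produced.
    for x in data:
        log.append(("produce", x))
        yield x


def consumer(records: Iterable[int], log: List[Event]) -> List[int]:
    # Doubles every record it receives.
    out = []
    for x in records:
        log.append(("consume", x))
        out.append(x * 2)
    return out


def run_pipelined(data: Iterable[int]) -> Tuple[List[int], List[Event]]:
    # The consumer pulls records straight from the producer, so the steps interleave.
    log: List[Event] = []
    return consumer(producer(data, log), log), log


def run_blocking(data: Iterable[int]) -> Tuple[List[int], List[Event]]:
    # Materialize the producer's complete output before the consumer starts.
    log: List[Event] = []
    materialized = list(producer(data, log))
    return consumer(materialized, log), log
```

With input `[1, 2, 3]`, the pipelined log alternates produce/consume for each record, while the blocking log shows all three productions before the first consumption. The blocking variant delays the consumer but leaves a complete intermediate result that could be reread.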
Fine-grained fault tolerance for batch programs
Description: Currently, recovery upon failure backtracks all the way to the data sources. This item adds the option to checkpoint intermediate DataSets, so that recovery can backtrack only to the most recent checkpoint.
Associated JIRA:
Expected: Q2 2015
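A minimal sketch of the difference, assuming a linear chain of stages and a single simulated failure. `run_with_recovery`, `checkpoint_after`, and `fail_at` are hypothetical names; checkpoints are kept in an in-memory dict rather than durable storage, and the execution counter shows how much work recovery repeats.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Stage = Callable[[List[int]], List[int]]


def run_with_recovery(
    source: List[int],
    stages: List[Stage],
    checkpoint_after: Set[int],
    fail_at: Optional[int] = None,
) -> Tuple[List[int], int]:
    """Run `stages` in order, simulating one failure just before stage `fail_at`.

    Returns the final result and the total number of stage executions.
    """
    checkpoints: Dict[int, List[int]] = {}  # stage index -> saved output
    executions = 0
    failed = False
    data, i = source, 0
    while i < len(stages):
        if i == fail_at and not failed:
            failed = True
            # Backtrack to the latest checkpoint before stage i, else to the source.
            restart = max((c for c in checkpoints if c < i), default=-1)
            data = checkpoints[restart] if restart >= 0 else source
            i = restart + 1
            continue
        data = stages[i](data)
        executions += 1
        if i in checkpoint_after:
            checkpoints[i] = data  # in-memory stand-in for durable storage
        i += 1
    return data, executions
```

For four stages that each add 1 to every element, a failure just before the last stage costs 7 stage executions without checkpoints (3, then a full replay of 4), but only 5 with a checkpoint after the second stage (3, then a replay of 2).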
Interactive programs
Description: Programs that are executed partly in the cluster and partly in the client. They consist of many small programs submitted by the driver program, with driver-side logic in between.
Associated JIRA:
Expected: Q1 2015
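The pattern can be sketched as a driver loop. In this hypothetical example, each call to `submit_count_job` stands in for a small program run in the cluster (here it just counts records in a local list), and the driver binary-searches for the lower median, choosing each next job based on the previous job's result. None of these names are Flink APIs.

```python
from typing import List, Tuple


def submit_count_job(data: List[int], threshold: int) -> int:
    # Hypothetical stand-in for a small cluster program: count records <= threshold.
    return sum(1 for x in data if x <= threshold)


def find_median_interactively(data: List[int]) -> Tuple[int, int]:
    """Driver program: find the lower median of non-empty integer data.

    Returns (median, number of jobs submitted).
    """
    lo, hi = min(data), max(data)  # computed locally here for brevity
    target = (len(data) + 1) // 2  # rank of the lower median
    jobs = 0
    while lo < hi:
        mid = (lo + hi) // 2
        count = submit_count_job(data, mid)  # "cluster" side
        jobs += 1
        # Driver-side logic: the result of this job decides the next one.
        if count >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo, jobs
```

For `[5, 1, 9, 3, 7]` the driver returns 5 after submitting three count jobs. The sketch assumes non-empty integer data.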
Interactive Scala shell
Description: Enable running Flink interactive programs from a Scala shell.
Associated JIRA:
Expected: Q2/Q3 2015
Machine Learning library
Description: Create common code infrastructure (such as data types) and implementations of popular algorithms.
Associated JIRA:
Expected: Initial version with k-means, ALS, logistic regression, and optimization in Q1 2015
Integrate with Mahout linear algebra DSL
Description: Make Flink an execution backend for the Mahout linear algebra DSL.
Associated JIRA:
Expected: Q2 2015
Graph processing library
Description: Create a library of common graph operations on a distributed Graph data type. The library currently lives in this GitHub repository: https://github.com/project-flink/flink-graph
Associated JIRA:
Expected: Q1 2015
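As a rough illustration of what a Graph data type over distributed collections might look like, the sketch below represents a graph as a vertex collection of `(id, value)` pairs and an edge collection of `(source, target, value)` triples, with a few common operations. It uses local Python lists as a stand-in for distributed DataSets; the class and method names are hypothetical and are not the API of the flink-graph repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

K = TypeVar("K")    # vertex id type
VV = TypeVar("VV")  # vertex value type
EV = TypeVar("EV")  # edge value type
NV = TypeVar("NV")  # new vertex value type after a map


@dataclass(frozen=True)
class Graph(Generic[K, VV, EV]):
    # Two flat collections, which a distributed engine could partition
    # independently. Local lists stand in for distributed DataSets here.
    vertices: List[Tuple[K, VV]]
    edges: List[Tuple[K, K, EV]]  # (source, target, value)

    def out_degrees(self) -> Dict[K, int]:
        # Assumes every edge source also appears in `vertices`.
        degrees: Dict[K, int] = {vid: 0 for vid, _ in self.vertices}
        for src, _, _ in self.edges:
            degrees[src] += 1
        return degrees

    def map_vertices(self, f: Callable[[VV], NV]) -> "Graph[K, NV, EV]":
        # Transform vertex values; ids and edges are unchanged.
        return Graph([(vid, f(val)) for vid, val in self.vertices], self.edges)

    def reverse(self) -> "Graph[K, VV, EV]":
        # Flip the direction of every edge.
        return Graph(self.vertices, [(dst, src, val) for src, dst, val in self.edges])
```

For vertices 1, 2, 3 and edges 1→2, 1→3, 2→3, `out_degrees()` gives `{1: 2, 2: 1, 3: 0}`, and the reversed graph gives `{1: 0, 2: 1, 3: 2}`.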
Logical Query Integration
Description: Enable SQL-style queries that use a Row data type with a logical schema.
Associated JIRA:
Expected: Q2 2015
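A minimal sketch of the idea, assuming rows are plain tuples and the logical schema is a separate tuple of field names, so that queries can refer to fields by name rather than by position. `Table`, `where`, and `select` are hypothetical names, not a planned Flink API.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Row = Tuple[Any, ...]  # positional values; field names live in the schema


@dataclass(frozen=True)
class Table:
    schema: Tuple[str, ...]  # logical schema: field names in column order
    rows: List[Row]

    def position(self, field: str) -> int:
        # Resolve a field name to its column index (ValueError if unknown).
        return self.schema.index(field)

    def where(self, field: str, pred: Callable[[Any], bool]) -> "Table":
        # Keep rows whose value in `field` satisfies `pred`.
        i = self.position(field)
        return Table(self.schema, [r for r in self.rows if pred(r[i])])

    def select(self, *fields: str) -> "Table":
        # Project onto `fields`; the result carries a narrowed schema.
        idx = [self.position(f) for f in fields]
        return Table(tuple(fields), [tuple(r[i] for i in idx) for r in self.rows])
```

The chained call `Table(("name", "age"), [("ann", 31), ("bob", 17), ("cy", 45)]).where("age", lambda a: a >= 18).select("name")` corresponds to `SELECT name FROM t WHERE age >= 18` and yields rows `[("ann",), ("cy",)]` with schema `("name",)`.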
SQL on Flink
Description: Enable some variant of SQL (likely HiveQL) to run on top of Flink, both in embedded/mixed mode and by submitting queries from a client.
Associated JIRA:
Expected: Q3/Q4 2015
Integrate with Tachyon
Description:
Associated JIRA:
Expected:
Integrate with Zeppelin
Description:
Associated JIRA:
Expected:
Integrate with Tez
Description:
Associated JIRA:
Expected:
Integrate with Samoa
Description:
Associated JIRA:
Expected:
Semantic annotations for optimization
Description:
Associated JIRA:
Expected:
Improved statistics for the optimizer
Description:
Associated JIRA:
Expected:
Use off-heap memory
Description:
Associated JIRA:
Expected:
Dynamic memory allocation
Description:
Associated JIRA:
Expected: