Dual streaming and batch engine

Description: Natively support both blocking and pipelined execution for batch (DataSet) and stream (DataStream) programs. Batch (DataSet) programs will be able to use a combination of blocking and pipelining. Stream (DataStream) programs will use pipelining. Interactive programs (programs that bring results back to the client) will use blocking. Note that batch/streaming is an API notion, whereas blocking/pipelining is a runtime engine concept. The two combine as follows:

Execution mode	Batch API (DataSet)	Streaming API (DataStream)
Blocking execution	yes	no
Pipelined execution	yes	yes

Associated JIRA:

Expected: Q1 2015
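
The difference between the two runtime modes can be illustrated with a small, engine-independent sketch. It does not use any Flink API: `run_blocking` and `run_pipelined` are hypothetical names, blocking is modeled as fully materializing each intermediate result, and pipelining is modeled with Python generators. The `log` list only records the order in which operators touch records.

```python
from typing import Callable, Iterable, Iterator, List

Operator = Callable[[int], int]


def run_blocking(source: Iterable[int], operators: List[Operator],
                 log: List[str]) -> List[int]:
    """Blocking: an operator consumes its whole input before the next one starts."""
    data = list(source)
    for i, op in enumerate(operators):
        out = []
        for x in data:
            log.append(f"op{i}({x})")
            out.append(op(x))
        data = out  # intermediate result is fully materialized here
    return data


def run_pipelined(source: Iterable[int], operators: List[Operator],
                  log: List[str]) -> Iterator[int]:
    """Pipelined: each record flows through all operators before the next is read."""
    def stage(upstream: Iterator[int], i: int, op: Operator) -> Iterator[int]:
        for x in upstream:
            log.append(f"op{i}({x})")
            yield op(x)

    stream: Iterator[int] = iter(source)
    for i, op in enumerate(operators):
        stream = stage(stream, i, op)
    return stream
```

Both functions compute the same result. They differ only in the order of work: the blocking variant finishes one operator over all records before the next begins, while the pipelined variant pushes each record through the whole operator chain.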

Fine-grained fault tolerance for batch programs

Description: Currently, recovery upon failure backtracks until the data sources. This will add the option to checkpoint intermediate DataSets and backtrack from these checkpoints.

Associated JIRA:

Expected: Q2 2015
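
The idea of backtracking to a checkpointed intermediate result instead of to the sources can be sketched as follows. This is a conceptual model only, not Flink's recovery mechanism: `run_with_checkpoints`, the in-memory `store` dictionary, and the `executed` list are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Set

Stage = Callable[[List[int]], List[int]]


def run_with_checkpoints(source: Callable[[], List[int]],
                         stages: List[Stage],
                         checkpoint_after: Set[int],
                         store: Dict[int, List[int]],
                         executed: List[int]) -> List[int]:
    """Run stages in order, resuming from the latest checkpoint found in `store`."""
    start, data = 0, None
    # Backtrack only as far as the most recent checkpointed stage output.
    for i in sorted(store, reverse=True):
        if i < len(stages):
            start, data = i + 1, list(store[i])
            break
    if data is None:
        data = list(source())  # no checkpoint: backtrack to the data source

    for i in range(start, len(stages)):
        executed.append(i)  # record which stages actually ran
        data = stages[i](data)
        if i in checkpoint_after:
            store[i] = list(data)  # checkpoint this intermediate result
    return data
```

If a later stage fails, calling the function again with the same `store` re-executes only the stages after the last checkpoint. With an empty `store`, it falls back to reading the source, which corresponds to the current behavior described above.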

Interactive programs

Description: Programs that are executed partly in the cluster and partly in the client. They consist of many small programs submitted by a driver program, with driver-side logic in between.

Associated JIRA:

Expected: Q1 2015
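
The shape of such a program can be sketched as a driver loop that submits one small job at a time and decides what to do next based on the result it gets back. Everything here is hypothetical: `submit` merely stands in for executing a job on the cluster and returning its result to the client, and `find_threshold` is an invented example of driver-side logic.

```python
from typing import Callable, List, Tuple


def submit(job: Callable[[List[int]], int], data: List[int]) -> int:
    """Stand-in for running one small program on the cluster and
    bringing its result back to the client."""
    return job(data)


def find_threshold(data: List[int], min_matches: int, start: int) -> Tuple[int, int]:
    """Driver-side loop: halve the threshold until enough records match."""
    threshold = start
    submissions = 0
    while True:
        # One small "cluster" job per iteration: count matching records.
        count = submit(lambda d: sum(1 for x in d if x >= threshold), data)
        submissions += 1
        # Driver-side logic between jobs decides whether to continue.
        if count >= min_matches or threshold <= 0:
            return threshold, submissions
        threshold //= 2
```

For example, `find_threshold([1, 3, 5, 9, 20], 3, 16)` submits three jobs (thresholds 16, 8, and 4) before the driver stops.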

Interactive Scala shell

Description: Run interactive Flink programs from a Scala shell.

Associated JIRA:

Expected: Q2/Q3 2015

Machine Learning library

Description: Create common code infrastructure (data types) and popular algorithms.

Associated JIRA:

Expected: Initial version with k-means, ALS, optimization in Q1 2015

Machine Learning library

Description: Create common code infrastructure (data types) and popular algorithms.

Associated JIRA:

Expected: Initial version with k-means, ALS, logistic regression in Q1 2015

Integrate with Mahout linear algebra DSL

Description: Make Flink a backend for the Mahout DSL.

Associated JIRA:

Expected: Q2 2015

Graph processing library

Description: Create a library of common graph operations on a distributed Graph data type. The library currently lives in this GitHub repository: https://github.com/project-flink/flink-graph

Associated JIRA:

Expected: Q1 2015

Logical Query Integration

Description: Enable SQL-style queries that use a Row data type with a logical schema.

Associated JIRA:

Expected: Q2 2015

SQL on Flink

Description: Enable some variant of SQL (likely HiveQL) to run on top of Flink, both in embedded/mixed mode and by submitting queries from a client.

Associated JIRA:

Expected: Q3/Q4 2015

Integrate with Tachyon

Description:

Associated JIRA:

Expected:

Integrate with Zeppelin

Description:

Associated JIRA:

Expected:

Integrate with Tez

Description: Enable Flink programs to run on Tez rather than using Flink's network stack. For certain use cases, this will give the option of running Flink programs with the resource elasticity that Tez provides.

Associated JIRA:

Expected: First version supporting a subset of Flink API in Q1 2015

Integrate with Samoa

Description:

Associated JIRA:

Expected:

Semantic annotations for optimization

Description: Many optimizations are not possible in Flink because the optimizer does not know what happens inside user-defined functions. Adding semantic information that tells the optimizer how a user function behaves can overcome some of these limitations.

Associated JIRA:

Expected: Q1 2015
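
One way to picture the benefit is a toy planner that must decide whether data still has to be shuffled before a grouping. The sketch below is an assumption-laden illustration, not Flink's optimizer or annotation API: `MapOp`, its `forwarded_fields` attribute, and `plan_group_by` are hypothetical. A map function without an annotation is treated as opaque, so the planner must assume it may have changed the partitioning key.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class MapOp:
    name: str
    # Fields the function copies unchanged to its output.
    # None means no annotation: the function is a black box.
    forwarded_fields: Optional[FrozenSet[int]] = None


def plan_group_by(partitioned_on: int, ops: List[MapOp], group_key: int) -> List[str]:
    """Build a plan, adding a shuffle only if the partitioning may be lost."""
    plan: List[str] = []
    partition_known = True
    for op in ops:
        plan.append(f"map:{op.name}")
        if op.forwarded_fields is None or partitioned_on not in op.forwarded_fields:
            partition_known = False  # the key may have been modified
    if not (partition_known and partitioned_on == group_key):
        plan.append(f"shuffle:{group_key}")
    plan.append(f"group:{group_key}")
    return plan
```

With an unannotated map the plan contains a shuffle. Once the map declares that field 0 is forwarded unchanged, the planner can reuse the existing partitioning and skip it.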

Plan choice hints

Description: Query optimizers are largely black boxes and usually do a good job of finding efficient execution plans. However, in some cases the user/developer knows better and wants to guide the optimizer or help it find a better plan. Flink’s optimizer offers several hints, but they are not well exposed in the API. The documentation on how to write programs that optimize well also needs to be improved.

Associated JIRA:

Expected: Q2 2015
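
As a rough illustration of how a hint interacts with an optimizer's own choice, consider a toy join-strategy selector. This is a sketch under stated assumptions, not Flink's optimizer: the function `choose_join_strategy`, the strategy names, and the `broadcast_limit` parameter are all hypothetical.

```python
from typing import Optional

STRATEGIES = ("broadcast_left", "broadcast_right", "repartition")


def choose_join_strategy(left_size: Optional[int],
                         right_size: Optional[int],
                         hint: Optional[str] = None,
                         broadcast_limit: int = 1000) -> str:
    """Pick a join strategy from size estimates unless the user supplies a hint."""
    if hint is not None:
        if hint not in STRATEGIES:
            raise ValueError(f"unknown hint: {hint}")
        return hint  # the user's hint overrides the optimizer's estimate
    if left_size is None or right_size is None:
        return "repartition"  # no statistics: fall back to the safe choice
    if min(left_size, right_size) <= broadcast_limit:
        return "broadcast_left" if left_size <= right_size else "broadcast_right"
    return "repartition"
```

Without size estimates the selector falls back to repartitioning both inputs. A developer who knows that one input is small can pass a hint such as `"broadcast_left"` to get the cheaper plan.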

Improved statistics for the optimizer

Description:

Associated JIRA:

Expected:

Use off-heap memory

Description:

Associated JIRA:

Expected:

Dynamic memory allocation

Description:

Associated JIRA:

Expected:
