Dual streaming and batch engine

Description: Natively support both blocking and pipelined modes of execution for batch (DataSet) and stream (DataStream) programs. Batch (DataSet) programs will be able to use a combination of blocking and pipelining. Stream (DataStream) programs will use pipelining. Interactive programs (programs that bring results back to the client) will use blocking. Note that batch/streaming is an API notion, whereas blocking/pipelining is a runtime engine concept. The two combine as follows:

	Batch API (DataSet)	Streaming API (DataStream)
Blocking execution	yes	no
Pipelined execution	yes	yes
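
To make the distinction between the two execution modes concrete, here is a minimal sketch in plain Python. It is not Flink code: the names `pipelined`, `blocking`, `traced_source`, and `run` are made up for illustration. It only shows the difference in ordering, assuming a pipelined exchange hands each record downstream as soon as it is produced, while a blocking exchange materializes the whole intermediate result first.

```python
from typing import Callable, Iterable, Iterator, List


def pipelined(source: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    """Pipelined exchange: each record goes downstream as soon as it exists."""
    for record in source:
        yield fn(record)


def blocking(source: Iterable[int], fn: Callable[[int], int]) -> List[int]:
    """Blocking exchange: the full intermediate result is materialized first."""
    return [fn(record) for record in source]


def traced_source(n: int, log: List[str]) -> Iterator[int]:
    """Hypothetical source that records when each element is produced."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def run(mode: str) -> List[str]:
    """Run a two-stage job in the given mode and return the event order."""
    log: List[str] = []
    src = traced_source(3, log)
    double = lambda x: x * 2
    stage = pipelined(src, double) if mode == "pipelined" else blocking(src, double)
    for value in stage:
        log.append(f"consume {value}")
    return log
```

In pipelined mode the producer and consumer events interleave; in blocking mode all "produce" events come before the first "consume" event.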

Associated JIRA:

Expected: Q1 2015

Fine-grained fault tolerance for batch programs

Description: Currently, recovery upon failure backtracks until the data sources. This will add the option to checkpoint intermediate DataSets and backtrack from these checkpoints.
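
The intended effect can be sketched as follows. This is a toy model in plain Python, not the planned implementation: `run_with_recovery`, `checkpoint_after`, and `fail_once_at` are hypothetical names, a program is assumed to be a linear chain of stages, and a failure is assumed to lose all intermediate results that were not checkpointed.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Stage = Callable[[List[int]], List[int]]


def run_with_recovery(
    source: Sequence[int],
    stages: Sequence[Stage],
    checkpoint_after: Set[int],
    fail_once_at: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Run the stages in order; return (result, indices of executed stages)."""
    checkpoints: Dict[int, List[int]] = {}
    executed: List[int] = []
    pending_failure = fail_once_at
    data = list(source)
    i = 0
    while i < len(stages):
        if i == pending_failure:
            pending_failure = None  # the simulated failure happens only once
            # Backtrack to the latest checkpoint before stage i,
            # or to the data source if no checkpoint exists.
            restart = max((c for c in checkpoints if c < i), default=-1)
            data = list(checkpoints[restart]) if restart >= 0 else list(source)
            i = restart + 1
            continue
        data = stages[i](data)
        executed.append(i)
        if i in checkpoint_after:
            checkpoints[i] = list(data)  # checkpoint the intermediate result
        i += 1
    return data, executed


STAGES: List[Stage] = [
    lambda d: [x + 1 for x in d],
    lambda d: [x * 2 for x in d],
    lambda d: [x - 3 for x in d],
    lambda d: [x * x for x in d],
]
```

With a failure injected at the last stage, the run without checkpoints re-executes every stage from the source, while the run that checkpoints after stage 1 only repeats stage 2 before continuing.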

Associated JIRA:

Expected: Q2 2015

Interactive programs

Description: Programs that are executed partially in the cluster and partially in the client. They consist of many small programs submitted by the driver program, with driver-side logic in between.
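
A rough illustration of this pattern, in plain Python rather than any Flink API: `submit_job` is a hypothetical stand-in for running one small program on the cluster and bringing its result back to the client, and the binary search around it is the driver-side logic. The example (finding a median by repeated counting) is chosen only because each step depends on the previous result.

```python
from typing import Callable, List


def submit_job(data: List[int], job: Callable[[List[int]], int]) -> int:
    """Hypothetical stand-in: run one small program 'on the cluster' and
    return its result to the client."""
    return job(data)


def find_median_interactively(data: List[int]) -> int:
    """Driver program: many small jobs with client-side logic in between."""
    lo = submit_job(data, min)
    hi = submit_job(data, max)
    size = submit_job(data, len)
    target = (size + 1) // 2
    while lo < hi:
        mid = (lo + hi) // 2
        # One small job per iteration: count the values not above mid.
        count = submit_job(data, lambda d: sum(1 for x in d if x <= mid))
        # Driver-side decision based on the returned result.
        if count >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```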

Associated JIRA:

Expected: Q1 2015

Machine Learning library

Description: Create common code infrastructure (data types) and popular algorithms.

Associated JIRA:

Expected: Initial version with k-means, ALS, and optimization in Q1 2015

Integrate with Mahout linear algebra DSL

Description: Make Flink a backend of the Mahout DSL.

Associated JIRA:

Expected: Q2 2015

Graph processing library

Description: Create a library of common graph operations on a distributed Graph data type. The library currently lives in this GitHub repository: https://github.com/project-flink/flink-graph

Associated JIRA:

Expected: Q1 2015

Logical Query Integration

Description: Enable SQL-style queries that use a Row data type with a logical schema.

Associated JIRA: [FLINK-947]

Expected: Q2 2015

SQL on Flink

Description: Enable some variant of SQL (likely HiveQL) to run on top of Flink, both in standalone and in embedded mode.

Associated JIRA:

Expected: Q3/Q4 2015

Integrate with Distributed Memory Storage

Description: Integrate with distributed memory storage (such as Tachyon) to allow lineage-based recovery

Associated JIRA:

Expected:

Integrate with Zeppelin

Description:

Associated JIRA:

Expected:

Integrate with Tez

Description: Enable Flink programs to run on Tez rather than using Flink's network stack. For certain use cases, this will give the option of running Flink programs with the resource elasticity that Tez provides.

Associated JIRA:

Expected: First version supporting a subset of Flink API in Q1 2015

Integrate with Samoa

Description: Create a Samoa adaptor

Associated JIRA:

Expected: Q1 2015

Semantic annotations for optimization

Description: A lot of optimizations are not possible in Flink, because the optimizer does not know what is happening inside user-defined functions. By adding semantic information for user functions which tells the optimizer how a function behaves, some of these limitations can be overcome.
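
As a sketch of the idea, consider one kind of semantic information: which input fields a function passes through unchanged. The code below is plain Python with hypothetical names (`forwarded_fields`, `needs_repartition`); it is not the annotation mechanism this item proposes, only an illustration of how such a declaration could let an optimizer keep an existing partitioning instead of reshuffling the data.

```python
from typing import Callable, Tuple

Record = Tuple[str, int]


def forwarded_fields(*fields: int) -> Callable:
    """Hypothetical annotation: declare which field positions the function
    copies unchanged from its input to its output."""
    def decorate(fn: Callable) -> Callable:
        fn.forwarded_fields = frozenset(fields)
        return fn
    return decorate


@forwarded_fields(0)
def double_value(record: Record) -> Record:
    """Leaves the key (field 0) untouched and changes only the value."""
    key, value = record
    return (key, value * 2)


def rewrite_key(record: Record) -> Record:
    """Not annotated: the optimizer must assume any field may change."""
    key, value = record
    return (key.upper(), value)


def needs_repartition(fn: Callable, partition_field: int) -> bool:
    """Data partitioned on partition_field stays correctly partitioned after
    fn only if fn is known to forward that field unchanged."""
    forwarded = getattr(fn, "forwarded_fields", frozenset())
    return partition_field not in forwarded
```

Without the annotation the optimizer has to treat the function as a black box and repartition conservatively; with it, a grouping on field 0 after `double_value` can reuse the existing partitioning.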

Associated JIRA:

Expected: Q1 2015

Plan choice hints

Description: Query optimizers are largely black boxes and usually do a good job of finding efficient execution plans. However, in some cases the user/developer knows better and wants to guide the optimizer or help it find a better plan. Flink’s optimizer offers several hints, which are not well exposed in the API. The documentation on how to write programs that optimize well also needs to be improved.
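
The interplay between the optimizer's own choice and a user hint can be sketched like this. Everything here is an assumption made for illustration: the function name, the strategy names, the row-count estimates, and the `BROADCAST_LIMIT` threshold are hypothetical and do not describe Flink's optimizer or its hint API.

```python
from typing import Optional

# Hypothetical threshold below which broadcasting one input is assumed cheap.
BROADCAST_LIMIT = 1000


def choose_join_strategy(
    estimated_left_rows: int,
    estimated_right_rows: int,
    hint: Optional[str] = None,
) -> str:
    """Pick a join strategy from size estimates unless the user gives a hint."""
    if hint is not None:
        # The user knows better than the estimates: follow the hint.
        return hint
    if min(estimated_left_rows, estimated_right_rows) <= BROADCAST_LIMIT:
        if estimated_left_rows <= estimated_right_rows:
            return "broadcast-left"
        return "broadcast-right"
    return "repartition"
```

A hint matters most when the estimates are wrong, for example when a filter makes one input far smaller than its source statistics suggest.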

Associated JIRA:

Expected: Q2 2015

Improved statistics for the optimizer

Description: Improve data source statistics, integrate with data sources that already provide statistics (HCatalog)

Associated JIRA:

Expected: Q2 2015

Use off-heap memory

Description: Use off-heap memory for intermediate results, sorting, and hashing. This reduces the number of objects on the JVM heap and the size of the heap, making garbage collection more efficient.

Associated JIRA:

Expected: Q1 2015

Dynamic memory allocation

Description: Allocate memory to operators based on a need/benefit scheme. Improves memory utilization for pipelined operators.

Associated JIRA:

Expected: Q2/Q3 2015

Incremental ML Library

Description:

Associated JIRA:

Expected:

Unify batch and streaming APIs

Description:

Associated JIRA:

Expected:
