Dual streaming and batch engine

Description: Natively support both blocking and pipelined modes of execution in a single engine that serves both batch (DataSet) and stream (DataStream) programs. Batch (DataSet) programs will be able to use a combination of blocking and pipelining. Stream (DataStream) programs will use pipelining only. Interactive programs (programs that bring results back to the client) will use blocking. Note that batch versus streaming is an API-level distinction, whereas blocking versus pipelining is a runtime engine concept. The two combine as follows:

	Batch API (DataSet)	Streaming API (DataStream)
Blocking execution	yes	no
Pipelined execution	yes	yes
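
The difference between the two execution modes can be illustrated with a minimal sketch in plain Python. It does not use the Flink API; the names `source`, `consume`, `run_blocking` and `run_pipelined` are hypothetical and exist only for this illustration. In blocking mode the producer's full result is materialized before the consumer starts, while in pipelined mode each record is handed to the consumer as soon as it is produced.

```python
def source(n, log):
    """Producer: emits the records 0..n-1 one at a time."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def consume(records, log):
    """Consumer: sums the records it receives."""
    total = 0
    for r in records:
        log.append(f"consume {r}")
        total += r
    return total


def run_blocking(n):
    """Blocking: the producer's output is fully materialized first."""
    log = []
    materialized = list(source(n, log))  # producer runs to completion here
    total = consume(materialized, log)
    return total, log


def run_pipelined(n):
    """Pipelined: the consumer pulls records while the producer is running."""
    log = []
    total = consume(source(n, log), log)  # lazy generator, nothing materialized
    return total, log
```

Both functions return the same sum; only the order of the log entries differs. With `n = 3`, the blocking log lists all three `produce` entries before any `consume` entry, whereas the pipelined log alternates between them.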

Associated JIRA:

Expected: Q1 2015

Fine-grained fault tolerance for batch programs

Description: Currently, recovery upon failure backtracks all the way to the data sources. This feature will add an option to checkpoint intermediate DataSets and to backtrack only as far as the most recent checkpoint.
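
The effect of checkpointing on recovery can be sketched as follows, assuming a simple linear chain of stages where stage 0 is the data source. This is plain Python rather than Flink code, and the function `stages_to_rerun` and its parameters are hypothetical names chosen for illustration.

```python
def stages_to_rerun(failed_stage, checkpointed):
    """Return the stage indices that must be re-executed after a failure.

    failed_stage: index of the stage that failed (stage 0 is the data source).
    checkpointed: set of stage indices whose output was persisted.
    """
    restart_from = 0  # without a usable checkpoint, go back to the source
    for stage in sorted(checkpointed):
        if stage < failed_stage:
            # The output of this stage survives, so resume right after it.
            restart_from = stage + 1
    return list(range(restart_from, failed_stage + 1))
```

For a failure in stage 4, no checkpoints means stages 0 through 4 are re-executed, a checkpoint after stage 1 reduces this to stages 2 through 4, and a checkpoint after stage 3 leaves only stage 4 to be re-run.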

Associated JIRA:

Expected: Q2 2015

Interactive programs

Description: Programs that are executed partially in the cluster and partially in the client. They consist of many small programs submitted by the driver program, with driver-side logic in between.
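
The shape of such a program can be sketched as a driver loop, again in plain Python and under stated assumptions: `submit_job` is a hypothetical local stand-in for a small program executed on the cluster, and `find_threshold` is a hypothetical driver that inspects each result on the client before deciding what to submit next.

```python
def submit_job(data, threshold):
    """Stand-in for a small cluster job: counts records above the threshold."""
    return sum(1 for x in data if x > threshold)


def find_threshold(data, max_hits, start=0, step=1):
    """Driver program: submit jobs until at most max_hits records remain."""
    threshold = start
    jobs = 0
    while True:
        hits = submit_job(data, threshold)  # runs "in the cluster"
        jobs += 1
        if hits <= max_hits:  # driver-side logic between two jobs
            return threshold, jobs
        threshold += step
```

Each loop iteration corresponds to one small program whose result is brought back to the client. For the data `[1, 5, 7, 9, 12]` and `max_hits=2`, the driver submits eight jobs and stops at a threshold of 7.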

Associated JIRA:

Expected: Q1 2015

Interactive Scala shell

Description: Be able to run Flink interactive programs from a Scala shell.

Associated JIRA:

Expected: Q2/Q3 2015

Machine Learning library

Description: Create common code infrastructure (data types) and popular algorithms.

Associated JIRA:

Expected: Initial version with k-means, ALS, optimization in Q1 2015

Machine Learning library

Description: Create common code infrastructure (data types) and popular algorithms.

Associated JIRA:

Expected: Initial version with k-means, ALS, logistic regression in Q1 2015

Integrate with Mahout linear algebra DSL

Description: Make Flink a backend for the Mahout DSL.

Associated JIRA:

Expected: Q2 2015

Graph processing library

Description: Create a library of common graph operations on a distributed Graph data type. The library currently lives in this GitHub repository: https://github.com/project-flink/flink-graph

Associated JIRA:

Expected: Q1 2015

Logical Query Integration

Description: Enable SQL-style queries that use a Row data type with a logical schema.
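
As a rough illustration of what a Row with a logical schema enables, the sketch below resolves field names to positions so that a query can refer to columns by name. It is plain Python, not a proposed Flink interface; the function `project` and its parameters are hypothetical.

```python
def project(rows, schema, fields, predicate=None):
    """Select the named fields from rows that satisfy the predicate.

    rows: iterable of tuples (the physical rows).
    schema: sequence of field names, one per tuple position (the logical schema).
    fields: names of the fields to keep, in output order.
    predicate: optional function taking a dict of field name to value.
    """
    index = {name: i for i, name in enumerate(schema)}
    result = []
    for row in rows:
        named = {name: row[i] for name, i in index.items()}
        if predicate is None or predicate(named):
            result.append(tuple(row[index[f]] for f in fields))
    return result
```

With the schema `("user", "country", "clicks")`, a call such as `project(rows, schema, ["user", "clicks"], lambda r: r["country"] == "DE")` plays the role of a query that selects `user` and `clicks` for rows where `country` is `"DE"`.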

Associated JIRA:

Expected: Q2 2015

SQL on Flink

Description: Enable some variant of SQL (likely HiveQL) to run on top of Flink, both in embedded/mixed mode and by submitting queries from a client.

Associated JIRA:

Expected: Q3/Q4 2015

Integrate with Tachyon

Description:

Associated JIRA:

Expected:

Integrate with Zeppelin

Description:

Associated JIRA:

Expected:

Integrate with Tez

Description:

Associated JIRA:

Expected:

Integrate with Samoa

Description:

Associated JIRA:

Expected:

Semantic annotations for optimization

Description:

Associated JIRA:

Expected:

Improved statistics for the optimizer

Description:

Associated JIRA:

Expected:

Use off-heap memory

Description:

Associated JIRA:

Expected:

Dynamic memory allocation

Description:

Associated JIRA:

Expected:
