Dual streaming and batch engine
Description: Natively support both blocking and pipelined execution modes for batch (DataSet) and stream (DataStream) programs. Batch (DataSet) programs will be able to use a combination of blocking and pipelining. Stream (DataStream) programs will use pipelining. Interactive programs (programs that bring results back to the client) will use blocking. Note that batch/streaming is an API notion, whereas blocking/pipelining is a runtime engine concept. The two combine as follows:
| | Batch API (DataSet) | Streaming API (DataStream) |
|---|---|---|
| Blocking execution | yes | no |
| Pipelined execution | yes | yes |
Associated JIRA:
Expected: Q1 2015
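The difference between the two runtime modes can be illustrated with a minimal sketch. It is plain Python, not Flink code; `source`, `sink`, `run_pipelined`, and `run_blocking` are hypothetical names chosen for illustration. In the pipelined run the consumer receives each record as soon as it is produced, while in the blocking run the intermediate result is fully materialized first.

```python
from typing import Iterable, Iterator, List


def source(n: int, log: List[str]) -> Iterator[int]:
    """Producer: emits records one at a time and logs each emission."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def sink(records: Iterable[int], log: List[str]) -> int:
    """Consumer: logs every record it receives and returns their sum."""
    total = 0
    for r in records:
        log.append(f"consume {r}")
        total += r
    return total


def run_pipelined(n: int) -> List[str]:
    """The consumer pulls records while the producer is still running."""
    log: List[str] = []
    sink(source(n, log), log)
    return log


def run_blocking(n: int) -> List[str]:
    """The producer's output is fully materialized before the consumer starts."""
    log: List[str] = []
    materialized = list(source(n, log))  # intermediate result is complete here
    sink(materialized, log)
    return log
```

The event order in the returned log is the only observable difference: interleaved produce/consume entries for the pipelined run, all produce entries before any consume entry for the blocking run.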
Fine-grained fault tolerance for batch programs
Description: Currently, recovery upon failure backtracks until the data sources. This will add the option to checkpoint intermediate DataSets and backtrack from these checkpoints.
Associated JIRA:
Expected: Q2 2015
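As a rough illustration of the recovery behavior described for fine-grained fault tolerance, the sketch below runs a chain of stages, injects a single failure, and restarts either from the data source or from the most recent checkpointed intermediate result. It is a toy model under stated assumptions, not Flink's recovery mechanism; `run_job`, `checkpoint_after`, and `fail_once_at` are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple

Stage = Callable[[List[int]], List[int]]


def run_job(
    source: List[int],
    stages: List[Stage],
    checkpoint_after: Set[int],
    fail_once_at: int,
) -> Tuple[List[int], List[int]]:
    """Run stages in order and return (result, stage indices attempted).

    A failure is injected the first time stage `fail_once_at` is attempted.
    Recovery backtracks to the latest checkpoint, or to the source (-1).
    """
    checkpoints: Dict[int, List[int]] = {-1: list(source)}  # -1 is the source
    attempted: List[int] = []
    failed = False
    data = list(source)
    i = 0
    while i < len(stages):
        attempted.append(i)
        if i == fail_once_at and not failed:
            failed = True
            # Backtrack to the most recent checkpoint before the failed stage.
            restart = max(k for k in checkpoints if k < i)
            data = list(checkpoints[restart])
            i = restart + 1
            continue
        data = stages[i](data)
        if i in checkpoint_after:
            checkpoints[i] = list(data)  # persist the intermediate result
        i += 1
    return data, attempted
```

With no checkpoints, a failure in the last of three stages forces all three stages to run again; with a checkpoint after the second stage, only the failed stage is re-executed.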
Interactive programs
Description: Programs that are executed partially in the cluster and partially in the client. They consist of many small programs submitted by the driver program, with driver-side logic in between.
Associated JIRA:
Expected: Q1 2015
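The shape of such an interactive program can be sketched as follows. This is an illustrative sketch in plain Python; `submit` is a hypothetical stand-in for executing one small program on the cluster and returning its result to the client, and it is not a Flink API.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def submit(program: Callable[[List[float]], T], data: List[float]) -> T:
    """Hypothetical stand-in: run one small program 'in the cluster' and
    bring its result back to the client."""
    return program(data)


def driver(
    data: List[float], threshold: float, max_rounds: int
) -> Tuple[List[float], int]:
    """Driver program: submits small programs with client-side logic in between."""
    rounds = 0
    while rounds < max_rounds:
        largest = submit(max, data)  # small program 1: aggregate sent to client
        if largest <= threshold:     # driver-side decision between submissions
            break
        data = submit(lambda d: [x / 2 for x in d], data)  # small program 2
        rounds += 1
    return data, rounds
```

Each call to `submit` corresponds to one small program; the `if` between the two calls is the driver-side logic that decides whether another program is submitted.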
Interactive Scala shell
Description: Be able to run Flink interactive programs from a Scala shell.
Associated JIRA:
Expected: Q2/Q3 2015
Machine Learning library
Description: Create common code infrastructure (data types) and popular algorithms.
Associated JIRA:
Expected: Initial version with k-means, ALS, optimization in Q1 2015
Machine Learning library
Description: Create common code infrastructure (data types) and popular algorithms.
Associated JIRA:
Expected: Initial version with k-means, ALS, logistic regression in Q1 2015
Integrate with Mahout linear algebra DSL
Description: Make Flink a backend for the Mahout DSL.
Associated JIRA:
Expected: Q2 2015
Graph processing library
Description: Create a library of common graph operations on a distributed Graph data type. The library currently lives in this GitHub repository: https://github.com/project-flink/flink-graph
Associated JIRA:
Expected: Q1 2015
Logical Query Integration
Description: Enable SQL-style queries that use a Row data type with a logical schema.
Associated JIRA:
Expected: Q2 2015
SQL on Flink
Description: Enable some variant of SQL (likely HiveQL) to run on top of Flink, both in embedded/mixed mode and by submitting queries from a client.
Associated JIRA:
Expected: Q3/Q4 2015
Integrate with Tachyon
Description:
Associated JIRA:
Expected:
Integrate with Zeppelin
Description:
Associated JIRA:
Expected:
Integrate with Tez
Description: Enable Flink programs to run on Tez rather than using Flink's network stack. For certain use cases, this will give the option of running Flink programs with the resource elasticity that Tez provides.
Associated JIRA:
Expected: First version supporting a subset of Flink API in Q1 2015
Integrate with Samoa
Description:
Associated JIRA:
Expected:
Semantic annotations for optimization
Description: Many optimizations are not possible in Flink because the optimizer does not know what happens inside user-defined functions. Adding semantic information that tells the optimizer how a user function behaves can overcome some of these limitations.
Associated JIRA:
Expected: Q1 2015
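One way to picture what semantic information buys the optimizer: if a user function is known to forward the key field unchanged, an existing partitioning on that key survives the function and a later grouping needs no reshuffle. The sketch below is a simplified model of that single decision; `UdfInfo`, `forwarded_fields`, and `needs_repartition` are hypothetical names, not Flink's API or its optimizer logic.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class UdfInfo:
    """Hypothetical semantic information about a user-defined function."""
    # Field positions copied unchanged from input to output.
    # None means nothing is known: the function is a black box.
    forwarded_fields: Optional[FrozenSet[int]] = None


def needs_repartition(partitioned_on: int, group_key: int, udf: UdfInfo) -> bool:
    """Decide whether data must be reshuffled after the UDF, before a
    grouping on `group_key`."""
    if partitioned_on != group_key:
        return True  # the existing partitioning never matched the key
    if udf.forwarded_fields is None:
        return True  # black box: the key field may have been modified
    return group_key not in udf.forwarded_fields
```

Without any annotation the optimizer in this model must assume the worst and reshuffle; with field 0 declared as forwarded, the reshuffle on key 0 can be skipped.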
Plan choice hints
Description: Query optimizers are largely black boxes and usually do a good job of finding efficient execution plans. However, in some cases the user/developer knows better and wants to guide the optimizer or help it find a better plan. Flink’s optimizer offers several hints, but they are not well exposed in the API. The documentation on how to write programs that optimize well also needs to be improved.
Associated JIRA:
Expected: Q2 2015
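The interplay between the optimizer's own choice and a user-supplied hint can be sketched as below. This is a simplified model, assuming a size-based rule for choosing a join strategy; `JoinStrategy`, `choose_join`, and `broadcast_limit` are hypothetical names and do not describe the hints Flink actually exposes.

```python
from enum import Enum
from typing import Optional


class JoinStrategy(Enum):
    """Hypothetical join execution strategies."""
    BROADCAST_LEFT = "broadcast_left"
    BROADCAST_RIGHT = "broadcast_right"
    REPARTITION = "repartition"


def choose_join(
    est_left: int,
    est_right: int,
    broadcast_limit: int,
    hint: Optional[JoinStrategy] = None,
) -> JoinStrategy:
    """Pick a join strategy from size estimates unless the user gives a hint."""
    if hint is not None:
        return hint  # the user knows better: the hint overrides the estimates
    if min(est_left, est_right) <= broadcast_limit:
        # Broadcast the smaller input to avoid shuffling the larger one.
        if est_left <= est_right:
            return JoinStrategy.BROADCAST_LEFT
        return JoinStrategy.BROADCAST_RIGHT
    return JoinStrategy.REPARTITION
```

A hint matters most when the size estimates are wrong, for example when a heavily filtered input is far smaller than the optimizer assumes.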
Improved statistics for the optimizer
Description:
Associated JIRA:
Expected:
Use off-heap memory
Description:
Associated JIRA:
Expected:
Dynamic memory allocation
Description:
Associated JIRA:
Expected: