Dual streaming and batch engine
Description: Natively support both blocking and pipelined modes of execution for both batch (DataSet) and stream (DataStream) programs. Batch (DataSet) programs will be able to use a combination of blocking and pipelining. Stream (DataStream) programs will use pipelining. Interactive programs (programs that bring back results to the client) will use blocking. Note that batch/streaming is an API notion, whereas blocking/pipelining is a runtime engine concept. The two combine as follows:
| | Batch API (DataSet) | Streaming API (DataStream) |
|---|---|---|
| Blocking execution | yes | no |
| Pipelined execution | yes | yes |
Associated JIRA:
Expected: Q1 2015
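The difference between the two runtime modes can be illustrated without any Flink code. The following is a minimal plain-Java sketch, not the engine's implementation: the class `ExecutionModes` and its methods are hypothetical names introduced here. It records the order in which a producer and a consumer touch records. In blocking mode the intermediate result is fully materialized before the consumer starts; in pipelined mode each record is consumed as soon as it is produced.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; none of these names are Flink APIs.
public class ExecutionModes {

    /** Producer that emits 1..n lazily and logs each value when it is produced. */
    static Iterator<Integer> producer(final int n, final List<String> log) {
        return new Iterator<Integer>() {
            private int current = 1;

            @Override
            public boolean hasNext() {
                return current <= n;
            }

            @Override
            public Integer next() {
                log.add("produce " + current);
                return current++;
            }
        };
    }

    /** Blocking: the whole intermediate result is materialized before consumption starts. */
    public static List<String> runBlocking(int n) {
        List<String> log = new ArrayList<>();
        List<Integer> materialized = new ArrayList<>();
        Iterator<Integer> source = producer(n, log);
        while (source.hasNext()) {
            materialized.add(source.next());
        }
        for (int value : materialized) {
            log.add("consume " + value);
        }
        return log;
    }

    /** Pipelined: every value is handed to the consumer as soon as it is produced. */
    public static List<String> runPipelined(int n) {
        List<String> log = new ArrayList<>();
        Iterator<Integer> source = producer(n, log);
        while (source.hasNext()) {
            int value = source.next();
            log.add("consume " + value);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("blocking:  " + runBlocking(2));
        System.out.println("pipelined: " + runPipelined(2));
    }
}
```

The sketch also hints at why unbounded streams cannot run in blocking mode: materializing the intermediate result would never finish.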
Fine-grained fault tolerance for batch programs
Description: Currently, recovery upon failure backtracks all the way to the data sources. This will add the option to checkpoint intermediate DataSets and backtrack only to these checkpoints.
Associated JIRA:
Expected: Q2 2015
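To make the idea of backtracking to a checkpoint concrete, here is a small plain-Java sketch. It is an assumption-laden model, not Flink's recovery mechanism: the class `CheckpointedRecovery`, its fields, and the three demo stages are all hypothetical. A job is modeled as a chain of stages; the output of selected stages is persisted, and a rerun resumes from the latest persisted result instead of from the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical model of checkpointed recovery; not a Flink API.
public class CheckpointedRecovery {
    private final List<UnaryOperator<List<Integer>>> stages;
    private final Set<Integer> checkpointAfter;                 // stages whose output is persisted
    private final Map<Integer, List<Integer>> checkpoints = new HashMap<>();
    public int stageExecutions = 0;                             // counts executed stages

    public CheckpointedRecovery(List<UnaryOperator<List<Integer>>> stages,
                                Set<Integer> checkpointAfter) {
        this.stages = stages;
        this.checkpointAfter = checkpointAfter;
    }

    /** Computes the output of stage `target`, resuming from the latest checkpoint if any. */
    public List<Integer> compute(List<Integer> source, int target) {
        int start = 0;
        List<Integer> data = source;
        for (int i = target; i >= 0; i--) {                     // backtrack to nearest checkpoint
            if (checkpoints.containsKey(i)) {
                data = checkpoints.get(i);
                start = i + 1;
                break;
            }
        }
        for (int i = start; i <= target; i++) {
            data = stages.get(i).apply(data);
            stageExecutions++;
            if (checkpointAfter.contains(i)) {
                checkpoints.put(i, new ArrayList<>(data));
            }
        }
        return data;
    }

    /** Three demo stages: add one, double, keep values greater than four. */
    static List<UnaryOperator<List<Integer>>> demoStages() {
        List<UnaryOperator<List<Integer>>> stages = new ArrayList<>();
        stages.add(d -> d.stream().map(x -> x + 1).collect(Collectors.toList()));
        stages.add(d -> d.stream().map(x -> x * 2).collect(Collectors.toList()));
        stages.add(d -> d.stream().filter(x -> x > 4).collect(Collectors.toList()));
        return stages;
    }

    /** Number of stages re-executed when the last stage is rerun after a simulated failure. */
    public static int rerunCost(Set<Integer> checkpointAfter) {
        CheckpointedRecovery job = new CheckpointedRecovery(demoStages(), checkpointAfter);
        List<Integer> source = Arrays.asList(1, 2, 3);
        job.compute(source, 2);                                 // initial run
        int before = job.stageExecutions;
        job.compute(source, 2);                                 // rerun after the failure
        return job.stageExecutions - before;
    }
}
```

In this model, a rerun without checkpoints re-executes all three stages, while a checkpoint after the second stage reduces the rerun to the final stage only.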
Interactive programs
Description: Programs that are executed partially in the cluster and partially in the client. They consist of many small programs submitted by the driver program, with driver-side logic in between.
Associated JIRA:
Expected: Q1 2015
Interactive Scala shell
Description: Run Flink interactive programs from a Scala shell.
Associated JIRA:
Expected: Q2/Q3 2015
Machine Learning library
Description: Create common code infrastructure (data types) and popular algorithms.
Associated JIRA:
Expected: Initial version with k-means, ALS, and optimization in Q1 2015
Integrate with Mahout linear algebra DSL
Description: Make Flink a backend of the Mahout DSL.
Associated JIRA:
Expected: Q2 2015
Graph processing library
Description: Create a library of common graph operations on a distributed Graph data type. The library currently lives in this GitHub repository: https://github.com/project-flink/flink-graph
Associated JIRA:
Expected: Q1 2015
Logical Query Integration
Description: Enable SQL-style queries that use a Row data type with a logical schema.
Associated JIRA:
Expected: Q2 2015
SQL on Flink
Description: Enable some variant of SQL (likely HiveQL) to run on top of Flink, both in standalone and in embedded mode.
Associated JIRA:
Expected: Q3/Q4 2015
Integrate with Tachyon
Description: Integrate with Tachyon storage and lineage-based recovery
Associated JIRA:
Expected:
Integrate with Zeppelin
Description:
Associated JIRA:
Expected:
Integrate with Tez
Description: Enable Flink programs to run on Tez rather than using Flink's network stack. For certain use cases, this will give the option of running Flink programs with the resource elasticity that Tez provides.
Associated JIRA:
Expected: First version supporting a subset of Flink API in Q1 2015
Integrate with Samoa
Description: Create a Samoa adaptor
Associated JIRA:
Expected: Q1 2015
Semantic annotations for optimization
Description: Many optimizations are currently not possible in Flink because the optimizer does not know what happens inside user-defined functions. Adding semantic information that tells the optimizer how a user function behaves can overcome some of these limitations.
Associated JIRA:
Expected: Q1 2015
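One way to picture such semantic information is a class-level annotation that declares which fields a function passes through unchanged. The sketch below is plain Java and purely illustrative: the annotation `PreservesFields`, the two example functions, and `canReusePartitioning` are hypothetical names invented here, not Flink's annotation API. It shows how an optimizer could decide that an existing partitioning on a key field survives a user function and therefore needs no re-shuffle.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.function.UnaryOperator;

// Hypothetical illustration; the annotation below is NOT Flink's API.
public class SemanticAnnotations {

    /** Declares the field positions that a function copies unchanged to its output. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface PreservesFields {
        int[] value();
    }

    /** Doubles field 1; field 0 passes through unchanged, as the annotation states. */
    @PreservesFields({0})
    public static class Rescale implements UnaryOperator<long[]> {
        @Override
        public long[] apply(long[] row) {
            return new long[] {row[0], row[1] * 2};
        }
    }

    /** No annotation: an optimizer must assume that any field may have changed. */
    public static class Opaque implements UnaryOperator<long[]> {
        @Override
        public long[] apply(long[] row) {
            return new long[] {row[1], row[0]};
        }
    }

    /** True if data partitioned on `keyField` is still validly partitioned after `udf`. */
    public static boolean canReusePartitioning(Class<?> udf, int keyField) {
        PreservesFields hint = udf.getAnnotation(PreservesFields.class);
        if (hint == null) {
            return false;                       // black box: be conservative
        }
        for (int field : hint.value()) {
            if (field == keyField) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canReusePartitioning(Rescale.class, 0));
        System.out.println(canReusePartitioning(Opaque.class, 0));
    }
}
```

Note that the annotation is a promise made by the function author; in this sketch nothing verifies that the function body actually honors it.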
Plan choice hints
Description: Query optimizers are largely black boxes and usually do a good job of finding efficient execution plans. However, in some cases the user/developer knows better and wants to guide the optimizer or help it find a better plan. Flink’s optimizer offers several hints, but they are not well exposed in the API. The documentation on how to write programs that optimize well also needs to be improved.
Associated JIRA:
Expected: Q2 2015
Improved statistics for the optimizer
Description: Improve data source statistics, integrate with data sources that already provide statistics (HCatalog)
Associated JIRA:
Expected: Q2 2015
Use off-heap memory
Description: Use off-heap memory for intermediate results, sorting and hashing. This reduces the number of objects on the JVM heap and the size of the heap, which makes garbage collection more efficient.
Associated JIRA:
Expected: Q1 2015
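As a rough illustration of the off-heap idea, the following plain-Java sketch stores and sorts fixed-width records in a direct `ByteBuffer`, so the records live outside the garbage-collected heap and no `Long` objects are created. The class `OffHeapLongBuffer` is a hypothetical name introduced here, and the insertion sort is chosen only for brevity; it says nothing about how Flink's memory manager or sorters work.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of sorting records held in off-heap memory; not Flink code.
public class OffHeapLongBuffer {
    private final ByteBuffer buffer;
    private final int capacity;

    public OffHeapLongBuffer(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves memory outside the JVM heap
        this.buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES);
    }

    public void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value);
    }

    public long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }

    public boolean isOffHeap() {
        return buffer.isDirect();
    }

    /** In-place insertion sort over the off-heap records, using only primitive values. */
    public void sort() {
        for (int i = 1; i < capacity; i++) {
            long key = get(i);
            int j = i - 1;
            while (j >= 0 && get(j) > key) {
                set(j + 1, get(j));
                j--;
            }
            set(j + 1, key);
        }
    }

    public static void main(String[] args) {
        OffHeapLongBuffer records = new OffHeapLongBuffer(3);
        records.set(0, 42L);
        records.set(1, 7L);
        records.set(2, 19L);
        records.sort();
        System.out.println(records.get(0) + " " + records.get(1) + " " + records.get(2));
    }
}
```

Because the data is addressed by byte offset rather than by object reference, the garbage collector sees a single buffer object regardless of how many records it holds.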
Dynamic memory allocation
Description: Allocate memory to operators based on a need/benefit scheme. Improves memory utilization for pipelined operators.
Associated JIRA:
Expected: Q2/Q3 2015
Incremental ML Library
Description:
Associated JIRA:
Expected:
Unify batch and streaming APIs
Description:
Associated JIRA:
Expected: