Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Motivation

This FLIP addresses twothree, currently interconnected problems : The arbitrary structure of the existing iteration model and the inconsistency of the current job termination protocol due to pending iterative computation and possible deadlocks.

1) Loops API

The current implementation of loops (or Iterations) on the streaming API of Flink does not allow for a deterministic termination of jobs and depends on user-defined timeouts to enforce termination. This impacts correctness, resulting to incomplete processing or/and inconsistent state when iterations exist in the topology. We propose a new functional, yet compositional API (i.e. nested loops) for expressing asynchronous DataStream loops with proper scoping and a distributed coordination algorithm for graph termination that exploits scoping information. 

 

2) Cyclic Flow Control (Deadlock and Starvation-Free)

Currently, Flink's backpressure mechanism might have deadlocks. The heart of the problem is that there are finite resources (i.e., number of network buffers) along the conceptual cyclic backpressure graph. 


3) Termination

Currently, an iteration head and tail are two obscure, interconnected dataflow operators that utilize an in-memory blocking queue. A fixed timeout at the consumer side (head) terminates the head task and subsequently allows for a graceful, yet irregular, shut-down of the dataflow graph in topological operator order.

...