Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

For graphs with no loops, the termination happens in topological sequential order, from sources to sinks. Starting with the end-of-stream on all regular sources, each operator terminates in-order after all prior operators have already terminated. This is trivial when we have a DAG, but not likewise when loops exist in the pipeline. Computation might still be pending in the distributed dataflow graph, even when all regular sources are done. There is, however, a practical way to terminate any arbitrary dataflow graph with or without cycles. For a correct termination, the operators can determine in a decentralized manner that there is no pending data/computation in the pipeline. Here is a simple protocol that puts this logic into practice.


Technique Overview

The main idea of the decentralised graph termination algorithm is that when loops are scoped, we can trivially coordinate the termination decision recursively, starting from outermost loop of the graph down to the innermost and then back. The important thing is to get knowledge of the status of all operators within a scope and its inner scopes. Thus, we propose an abortable, distributed passing-the-baton approach to traverse the graph and collect all statuses of the operators in order to determine whether termination is feasible.

...

Fig.2 - A dataflow graph example that contains nested loops

Iteration-decentralized-alg3.png

 

In Fig. 2 we depict an example throughout all phases to use as a reference. The protocol works as follows:

...

Once all regular stream sources have reached the “end-of-stream” the system enters into a “Attempting” phase, upon which the following abortable protocol becomes active:

  1. The regular stream sources broadcast a TerminationAttempt event to downstream operators

  2. Three types of operators might receive this event

...

  1. Otherwise, the Operator will repeat protocol from step  2.5.1 with an increased attempt id

 

Compatibility, Deprecation, and Migration Plan

...

Describe in few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other wayYou can find the prior proposal that was rejected here (Part I). That is more general but requires more RPC communication which can be preferably avoided.