...

  • Unaligned checkpoints on non-keyed, forwarded operators are disabled by default. Because no in-flight data is persisted for these operators, checkpointing behaves exactly as it does today (status quo).
  • We add an explicit toggle for power users to force unaligned checkpoints on non-keyed operators. It should only be enabled when record ordering does not matter or when the job will never be rescaled from a checkpoint.
  • We devise and implement FLIP-? (light-weight partition reassignment), which will provide a fine-grained mechanism to transfer the ownership of input splits and key groups. Based on this mechanism, we can fully support unaligned checkpoints on non-keyed operators.
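The default and the toggle described above can be sketched as a per-channel decision. This is a minimal illustration, not Flink code: `Exchange`, `unaligned_enabled`, `inflight_to_persist` and `force_non_keyed` are hypothetical names, and the rule that keyed exchanges always allow unaligned checkpoints is an assumption inferred from the bullets.

```python
from enum import Enum
from typing import List, Sequence


class Exchange(Enum):
    """Hypothetical classification of how a channel is partitioned."""
    KEYED = "keyed"          # hash-partitioned by key group
    NON_KEYED = "non_keyed"  # forward, rebalance, rescale, broadcast, ...


def unaligned_enabled(exchange: Exchange, force_non_keyed: bool = False) -> bool:
    """Decide whether this channel may take part in an unaligned checkpoint."""
    # Assumption: keyed exchanges can always be checkpointed unaligned.
    if exchange is Exchange.KEYED:
        return True
    # Non-keyed exchanges: off by default, opt-in via the power-user toggle.
    return force_non_keyed


def inflight_to_persist(exchange: Exchange, buffers: Sequence[bytes],
                        force_non_keyed: bool = False) -> List[bytes]:
    """Return the in-flight buffers that would be written into the checkpoint.

    An empty list means no in-flight data is persisted for the channel,
    i.e. it is checkpointed as it is today.
    """
    if unaligned_enabled(exchange, force_non_keyed):
        return list(buffers)
    return []
```

With the toggle off, a non-keyed channel contributes no in-flight data, which is the status quo described in the first bullet.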

Open questions

Handling records spanning multiple buffers

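A single record can span multiple buffers, so when a checkpoint is taken, its parts can end up in different tasks: some buffers are still in the upstream task's output while the rest are already in the downstream task's input.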

Therefore:

  1. Without rescaling, even in a fan-in topology, the downstream task needs to match each incomplete record with its remaining part held by the upstream task; this can be handled by keeping a separate input channel per upstream subtask.
  2. When scaling in the upstream operator, the buffers of incomplete records need to be matched.
  3. When scaling in the downstream operator, the input buffers can contain multiple incomplete records (possibly within a single input channel after recovery).
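The cases above arise because deserialization state lives per input channel. The sketch below assumes a simple 4-byte big-endian length prefix per record (an illustrative format, not Flink's actual wire format) and uses hypothetical names `serialize` and `SpanningDeserializer`. It shows that a record cut at a checkpoint boundary can only be completed by the same deserializer state; a fresh deserializer fed the remaining buffers cannot recover it.

```python
import struct
from typing import Iterable, List


def serialize(records: Iterable[bytes], buffer_size: int) -> List[bytes]:
    """Length-prefix each record and cut the byte stream into fixed-size buffers."""
    stream = b"".join(struct.pack(">I", len(r)) + r for r in records)
    return [stream[i:i + buffer_size] for i in range(0, len(stream), buffer_size)]


class SpanningDeserializer:
    """Per-channel state holding the bytes of a record that is not yet complete."""

    def __init__(self) -> None:
        self.pending = b""

    def feed(self, buffer: bytes) -> List[bytes]:
        """Append a buffer and return every record that is now complete."""
        self.pending += buffer
        out = []
        while len(self.pending) >= 4:
            (length,) = struct.unpack(">I", self.pending[:4])
            if len(self.pending) < 4 + length:
                break  # the rest of this record is in a later buffer
            out.append(self.pending[4:4 + length])
            self.pending = self.pending[4 + length:]
        return out

    def has_incomplete_record(self) -> bool:
        return bool(self.pending)
```

For example, `serialize([b"hello", b"world!!"], 6)` yields four buffers. If the barrier falls after the second buffer, the downstream side holds a partial length header for `b"world!!"`, and only a deserializer carrying that pending state can decode the record from the remaining two buffers.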

There are two options:

  1. Recover each logical channel (upstream output plus downstream input) atomically, always on one side (the downstream side?). The upstream output should be blocked during recovery; one idea is to use negative credit while the upstream holds more buffers than it should, with those buffers taken from the upstream buffer pool.
  2. Join records on the fly: maintain metadata such as the subtask ID or a record ID, and probably sort by it on both sender and receiver.
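Option 2 can be sketched on the receiver side as follows, assuming each buffer fragment carries hypothetical metadata (upstream subtask ID, per-subtask record ID, fragment sequence number, and a last-fragment flag). `Fragment` and `join_fragments` are illustrative names only; the sender-side sorting mentioned above is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Fragment:
    subtask_id: int  # upstream subtask that produced the record (hypothetical tag)
    record_id: int   # per-subtask record sequence number (hypothetical tag)
    seq: int         # position of this fragment within the record
    last: bool       # True for the final fragment of the record
    payload: bytes


def join_fragments(
        fragments: Iterable[Fragment]) -> Tuple[List[bytes], List[Tuple[int, int]]]:
    """Reassemble records from fragments that may arrive in any order and channel.

    Returns the completed records ordered by (subtask_id, record_id), plus the
    (subtask_id, record_id) keys of records that are still incomplete.
    """
    groups: Dict[Tuple[int, int], List[Fragment]] = defaultdict(list)
    for f in fragments:
        groups[(f.subtask_id, f.record_id)].append(f)

    complete: List[bytes] = []
    incomplete: List[Tuple[int, int]] = []
    for key in sorted(groups):
        parts = sorted(groups[key], key=lambda f: f.seq)
        # Complete only if fragments 0..n-1 are all present and the last is flagged.
        if parts[-1].last and [p.seq for p in parts] == list(range(len(parts))):
            complete.append(b"".join(p.payload for p in parts))
        else:
            incomplete.append(key)
    return complete, incomplete
```

The cost of this approach is the extra metadata on every buffer and the sorting; in exchange, recovery does not need to be atomic per logical channel.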

Retaining subtask indices between restarts

Subtask indices must be retained across restarts to maintain the correspondence between upstream output buffers and downstream input buffers.
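A minimal sketch of why stable indices matter, assuming (hypothetically) that persisted channel state is keyed by the pair (upstream subtask index, downstream subtask index). `pair_channel_state` is an illustrative name. A restored downstream subtask can only rebuild its channels if it keeps the index under which the state was written.

```python
from typing import Dict, List, Tuple

# (upstream subtask index, downstream subtask index); hypothetical key layout
ChannelKey = Tuple[int, int]


def pair_channel_state(upstream_out: Dict[ChannelKey, List[bytes]],
                       downstream_in: Dict[ChannelKey, List[bytes]],
                       downstream_index: int) -> Dict[int, List[bytes]]:
    """Rebuild each logical channel for one restored downstream subtask.

    Buffers already in the downstream input come first, followed by the
    buffers that were still in the upstream output at checkpoint time.
    The result maps upstream subtask index to the channel's buffers.
    """
    keys = {k for k in list(upstream_out) + list(downstream_in)
            if k[1] == downstream_index}
    result: Dict[int, List[bytes]] = {}
    for up, down in sorted(keys):
        result[up] = downstream_in.get((up, down), []) + upstream_out.get((up, down), [])
    return result
```

If a restarted subtask were assigned a different index, the lookup would silently pair it with another subtask's buffers, which is exactly the correspondence this section requires to be preserved.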

Ad-hoc vs continuous spilling

Existing persistence mechanisms vs custom

Compatibility, Deprecation, and Migration Plan

...