...

  • Unaligned checkpoints on non-keyed, forwarded operators are disabled by default. If we do not have in-flight data, we arrive at the status quo.
  • We add an explicit toggle that lets power users force unaligned checkpoints on non-keyed operators. The toggle is meant for cases where ordering is not important or where no rescaling from a checkpoint will occur.
  • We devise and implement FLIP-? (light-weight partition reassignment), which will provide a fine-grained mechanism to transfer the ownership of input splits/key groups. Based on this mechanism, we can fully support unaligned checkpoints on non-keyed operators.
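
The first two bullets amount to a small decision rule, sketched below. This is only an illustration: the class, the field names, and the `keyed_exchange` flag are hypothetical and are not Flink configuration options or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointSettings:
    # Hypothetical names for illustration; not real Flink options.
    unaligned_enabled: bool = True
    force_unaligned_non_keyed: bool = False  # the explicit "power user" toggle


def use_unaligned(settings: CheckpointSettings, keyed_exchange: bool) -> bool:
    """Decide whether an input exchange may be checkpointed unaligned."""
    if not settings.unaligned_enabled:
        return False
    if keyed_exchange:
        # Keyed exchanges are not restricted by the default described above.
        return True
    # Non-keyed exchange: stay aligned (the status quo) unless explicitly forced.
    return settings.force_unaligned_non_keyed
```

With the defaults, a non-keyed exchange falls back to aligned checkpoints, so no in-flight data is persisted for it. Setting the hypothetical `force_unaligned_non_keyed` flag opts back in.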


Problem dimensions: keyed/non-keyed, scale-in/out, upstream/downstream

Possible issues: multi-buffer records, multi-record buffers, ordering (new/old data), pressure
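
To make the first two issues concrete, here is a toy sketch of how records and buffers can fail to line up. It assumes a simple 2-byte length-prefixed framing and an 8-byte buffer; both are chosen for illustration and are not Flink's actual wire format. The function names are hypothetical.

```python
import struct

BUFFER_SIZE = 8  # deliberately tiny so that records spill across buffers


def to_buffers(records):
    """Serialize length-prefixed records and cut the stream into fixed-size buffers."""
    stream = b"".join(struct.pack(">H", len(r)) + r for r in records)
    return [stream[i:i + BUFFER_SIZE] for i in range(0, len(stream), BUFFER_SIZE)]


def from_buffers(buffers):
    """Reassemble records; raises ValueError if the byte stream ends mid-record."""
    stream = b"".join(buffers)
    records, pos = [], 0
    while pos < len(stream):
        if pos + 2 > len(stream):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">H", stream, pos)
        pos += 2
        if pos + length > len(stream):
            raise ValueError("truncated record")
        records.append(stream[pos:pos + length])
        pos += length
    return records


# The first buffer holds two whole records (a multi-record buffer);
# the third record spans the second and third buffers (a multi-buffer record).
buffers = to_buffers([b"ab", b"cd", b"a-long-record"])
```

Reading all three buffers in order recovers the records, while handing only `buffers[:2]` to `from_buffers` fails because the last record is cut off. This is why state cannot be redistributed at arbitrary buffer boundaries.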

General algorithm:

  1. find relevant state
    1. what are the requirements? can we use any distribution as long as we can satisfy (3) (find correct channels)?
  2. filter out records/buffers
    1. after scaling out, state file can contain irrelevant records
  3. load data into the correct channels (InputChannel/SubPartition)
    1. this should resolve the multi-buffer record (MBR) issues
  4. ensure ordering
    1. epochs?
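
The four steps can be sketched for the keyed, downstream-only rescaling case. This is a minimal sketch under several assumptions, not the proposed implementation: key groups are split contiguously across subtasks, upstream parallelism is unchanged so channel indices stay valid, state holds whole records rather than raw buffers, and each record carries a hypothetical `epoch` and `seq` for ordering. All names are made up for illustration.

```python
from dataclasses import dataclass


def key_group_range(max_parallelism, parallelism, subtask):
    """Inclusive key-group range owned by `subtask` (contiguous split, assumed here)."""
    start = (subtask * max_parallelism + parallelism - 1) // parallelism
    end = ((subtask + 1) * max_parallelism - 1) // parallelism
    return range(start, end + 1)


@dataclass(frozen=True)
class InFlightRecord:
    key_group: int  # key group the record belongs to
    channel: int    # input channel (upstream subtask index) it was captured on
    epoch: int      # hypothetical epoch marker; older data has a smaller epoch
    seq: int        # hypothetical per-channel sequence number
    payload: str


def restore_channels(old_state, old_parallelism, new_parallelism,
                     max_parallelism, new_subtask):
    """old_state maps an old subtask index to its list of InFlightRecord."""
    mine = set(key_group_range(max_parallelism, new_parallelism, new_subtask))

    # Step 1: find relevant state, i.e. old subtasks whose key groups overlap ours.
    relevant = [
        s for s in range(old_parallelism)
        if mine & set(key_group_range(max_parallelism, old_parallelism, s))
    ]

    # Step 2: filter out records that now belong to a different subtask.
    kept = [r for s in relevant for r in old_state.get(s, []) if r.key_group in mine]

    # Step 3: load each record into the channel it was captured on.
    channels = {}
    for r in kept:
        channels.setdefault(r.channel, []).append(r)

    # Step 4: ensure ordering within each channel (older epoch first, then sequence).
    for records in channels.values():
        records.sort(key=lambda r: (r.epoch, r.seq))
    return channels
```

For example, with a maximum parallelism of 8 and a scale-out from 2 to 4 subtasks, new subtask 1 owns key groups 2 and 3, so it only needs the state file of old subtask 0 and drops that file's records for key groups 0 and 1.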

Open questions

Avoiding double processing downstream (when spilling continuously)

...