...
- Unaligned checkpoints on non-keyed, forwarded operators are disabled by default. If we do not have in-flight data, we arrive at the status quo.
- We add an explicit toggle to force unaligned checkpoints on non-keyed operators for power users. That toggle enables unaligned checkpoints when ordering is not important or no rescaling from a checkpoint will occur.
- We devise and implement FLIP-? (light-weight partition reassignment), which will provide a very fine-grained mechanism to transfer the ownership of input splits/key groups. Based on this mechanism, we can fully support unaligned checkpoints on non-keyed operators.
Problem dimensions: keyed/non-keyed, scale-in/out, upstream/downstream
Possible issues: multi-buffer records, multi-record buffers, ordering (new/old data), pressure
General algorithm:
1. find relevant state
   - what are the requirements? can we use any distribution as long as we can satisfy step 3 (find correct channels)?
2. filter out records/buffers
   - after scaling out, a state file can contain irrelevant records
3. load data into the correct channels (InputChannel/SubPartition)
   - this should resolve multi-buffer record (MBR) issues
4. ensure ordering
   - epochs?
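
The steps above can be illustrated with a minimal sketch for the keyed case. It is not the proposed implementation: the `InFlightRecord` type, its `epoch` and `seq` fields, and the proportional key-group-to-subtask mapping in `owner_subtask` are all assumptions made for illustration, and step 1 (finding the relevant state) is reduced to passing in the recovered records directly.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class InFlightRecord:
    """Hypothetical representation of one persisted in-flight record."""
    key_group: int  # key group the record belongs to
    channel: int    # index of the channel the record was read from
    epoch: int      # assumed checkpoint epoch in which it was persisted
    seq: int        # assumed position of the record within its epoch
    payload: str


def owner_subtask(key_group: int, parallelism: int, max_parallelism: int) -> int:
    # Assumption: key groups are assigned to subtasks proportionally.
    return key_group * parallelism // max_parallelism


def restore_in_flight(
    records: Iterable[InFlightRecord],
    subtask: int,
    parallelism: int,
    max_parallelism: int,
) -> Dict[int, List[InFlightRecord]]:
    # Step 2: drop records that the new subtask does not own. After scaling
    # out, a state file may contain records for other subtasks.
    relevant = [
        r for r in records
        if owner_subtask(r.key_group, parallelism, max_parallelism) == subtask
    ]
    # Step 4: order older epochs first, then by position within the epoch.
    relevant.sort(key=lambda r: (r.epoch, r.seq))
    # Step 3: load each record into the channel it belongs to.
    channels: Dict[int, List[InFlightRecord]] = {}
    for r in relevant:
        channels.setdefault(r.channel, []).append(r)
    return channels
```

Sorting before distributing keeps each channel internally ordered by epoch. Whether epochs are the right ordering mechanism is still an open point in the list above, and the non-keyed case has no key groups to filter on, so it would need a different ownership criterion.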
Open questions:
- Avoiding double processing in downstream (if continuous spilling)
...