THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
...
- not writing the same buffer multiple times (when the job is backpressured); can be addressed by implementing incremental checkpointing for FS backend
- incremental loading and processing of state
- no additional memory: ideally, existing network buffers should be reused
Open questions
Avoiding double processing in downstream (if continuous spilling)
With continuous spilling, upstream saves all its output which includes what was already processed/buffered by the downstream.
Possible solutions:
- downstream loads upstream output buffers file and filters out what it has already processed; files are named deterministically and indexed
- same, but offset in the file is calculated based on buffer size
- downstream communicates its offset to upstream during snapshotting
- downstream communicates its offset to upstream during recovery
Compatibility, Deprecation, and Migration Plan
...