...
- find relevant state
  - what are the requirements? can we use any distribution as long as we can satisfy (3) (find correct channels)?
- filter out records/buffers
  - after scaling out, state file can contain irrelevant records
- load data into the correct channels (InputChannel/SubPartition)
  - this should resolve MBR issues
  - solution: task IDs?
- ensure ordering
  - solution: epochs?
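
A minimal sketch of how the filter / load / order steps above could fit together, under stated assumptions. Everything here is hypothetical and not taken from an existing API: the `PersistedBuffer` type, the `channelKey` field (assumed to be derivable from task IDs), and the `epoch`/`sequence` fields (assumed to be written alongside each buffer). It only illustrates the idea that a state file may hold records for channels this subtask no longer owns, and that the remaining records must be replayed per channel in epoch order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only; none of these names come from an existing API. */
final class ChannelStateRestoreSketch {

    /** One persisted in-flight buffer with the metadata assumed by this sketch. */
    static final class PersistedBuffer {
        final int channelKey;   // assumption: identifies the target channel, e.g. derived from task IDs
        final long epoch;       // assumption: epoch in which the buffer was written
        final long sequence;    // assumption: position of the buffer within its epoch
        final String payload;   // stand-in for the actual buffer contents

        PersistedBuffer(int channelKey, long epoch, long sequence, String payload) {
            this.channelKey = channelKey;
            this.epoch = epoch;
            this.sequence = sequence;
            this.payload = payload;
        }
    }

    /**
     * Drops buffers for channels this subtask does not own, groups the rest
     * by channel, and sorts each group by (epoch, sequence).
     */
    static Map<Integer, List<PersistedBuffer>> restore(
            List<PersistedBuffer> stateFile, Set<Integer> ownedChannels) {

        Map<Integer, List<PersistedBuffer>> perChannel = new TreeMap<>();
        for (PersistedBuffer buffer : stateFile) {
            // After scaling out, the state file can contain irrelevant records.
            if (!ownedChannels.contains(buffer.channelKey)) {
                continue;
            }
            perChannel
                    .computeIfAbsent(buffer.channelKey, key -> new ArrayList<>())
                    .add(buffer);
        }

        Comparator<PersistedBuffer> replayOrder =
                Comparator.comparingLong((PersistedBuffer b) -> b.epoch)
                        .thenComparingLong(b -> b.sequence);
        for (List<PersistedBuffer> buffers : perChannel.values()) {
            buffers.sort(replayOrder);
        }
        return perChannel;
    }
}
```

Whether a per-buffer sequence number is needed on top of the epoch is itself an assumption of the sketch; if buffers within an epoch are already stored in order, a stable sort on the epoch alone would be enough.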
Open questions
Avoiding double processing in downstream (if continuous spilling)
...