...

Note that during downscaling, the “max inflight data” size limit might be temporarily exceeded while recovering (see the Recovery section).

Key decisions

  1. Snapshot channel state when a checkpoint arrives (as opposed to continuously writing all output buffers): the POC showed that this is more efficient and has lower latency.
  2. Use the existing checkpointing mechanism (see Persistence).
  3. Restore the upstream and downstream parts of a channel in their corresponding tasks (as opposed to restoring the whole channel in the downstream task): this gives less coupling and is easier to implement when using the existing checkpointing mechanism.
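
The decisions above can be illustrated with a minimal sketch. It is not the actual implementation: `ChannelPart`, `Task`, `on_checkpoint` and `on_restore` are hypothetical names, and plain Python lists stand in for the persisted state. It only shows that each task copies its own part of the channel at the moment a checkpoint arrives (decision 1), returns it as part of its regular checkpoint state (decision 2), and later restores only that part (decision 3).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChannelPart:
    """Hypothetical in-memory half of a channel.

    The upstream task owns the output part, the downstream task owns the input part.
    """
    buffers: deque = field(default_factory=deque)

    def snapshot(self) -> list:
        # Copy only the buffers that are in flight right now,
        # instead of logging every buffer as it is written.
        return list(self.buffers)

    def restore(self, state: list) -> None:
        # Replace the in-memory buffers with the snapshotted ones.
        self.buffers = deque(state)


@dataclass
class Task:
    """Hypothetical task that owns exactly one part of a channel."""
    name: str
    part: ChannelPart = field(default_factory=ChannelPart)

    def on_checkpoint(self, checkpoint_id: int) -> dict:
        # Channel state travels together with the task's regular checkpoint state.
        return {
            "checkpoint_id": checkpoint_id,
            "task": self.name,
            "channel_state": self.part.snapshot(),
        }

    def on_restore(self, state: dict) -> None:
        # Each task restores only its own part of the channel.
        self.part.restore(state["channel_state"])
```

In this sketch the upstream and downstream tasks never see each other's buffers, which is the reduced coupling that decision 3 refers to.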

Recovery

Assignment of state and mapping it to in-memory data structures (without rescaling)

...