...

These fixes only get the pipeline running again; the in-flight intermediate stream data and any Samza-managed state that was reset are lost. Aside from the data loss, this upgrade process is cumbersome. It is therefore important to support draining and stopping a pipeline smoothly before an upgrade that involves an incompatible intermediate schema change. The work here is scoped to draining only intermediate stream (shuffle) data. Clearing user state, if required, is the user's responsibility and can be done by switching to a fresh state.

Proposed Changes

We propose a drain operation for Samza pipelines that stops the pipeline from consuming new data, completes processing of buffered data, and then stops the pipeline. The high-level approach is outlined below, with finer details covered in the sections that follow:

...
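
As a rough illustration of the drain sequence proposed above (stop consuming new data, finish the buffered data, then stop), the following is a minimal toy sketch. It does not use the Samza API; `ToyPipeline`, `ingest`, and `drain` are hypothetical names invented for this example, and a single in-memory queue stands in for the intermediate (shuffle) stream.

```python
from collections import deque


class ToyPipeline:
    """Hypothetical stand-in for a pipeline with one intermediate buffer.

    This is NOT Samza code; it only models the ordering of the drain steps.
    """

    def __init__(self):
        self.buffer = deque()   # in-flight intermediate records
        self.processed = []     # records that finished processing
        self.accepting = True   # False once a drain has been requested
        self.running = True     # False once the pipeline has stopped

    def ingest(self, record):
        """Accept a new record unless draining; return True if accepted."""
        if not self.accepting:
            return False
        self.buffer.append(record)
        return True

    def drain(self):
        """Stop intake, process everything already buffered, then stop."""
        self.accepting = False  # step 1: stop consuming new data
        while self.buffer:      # step 2: finish processing buffered data
            self.processed.append(self.buffer.popleft())
        self.running = False    # step 3: stop the pipeline


pipeline = ToyPipeline()
for record in ["a", "b", "c"]:
    pipeline.ingest(record)

pipeline.drain()
late_accepted = pipeline.ingest("d")  # rejected: intake closed by the drain
```

The point of the ordering is that intake closes before the buffer is emptied, so no intermediate record is left behind when the pipeline stops and the incompatible schema change is deployed.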