...

Simplicity. Easy for developers and users to use.

Non-goals

Full snapshot of a Samza job's state at a particular point in time.

Rewinding or fast-forwarding state store changelogs in relation to the input streams. The challenge is that the changelog stream is typically log-compacted.

...

Different systems in Samza use different formats for checkpoint offsets, and no contract describes the offset format. To maintain backwards compatibility and to make setting starting offsets easier to operate, this solution defines the concept of Startpoints and stores them in a separate metadata storage layer, rather than manipulating the checkpoint offsets directly in the checkpoint stream.

A Startpoint indicates the offset position from which a particular SystemStreamPartition should start consuming. A Startpoint takes precedence over Checkpoints and defines both a position type and a position value that is interpreted according to that type. For example, if the Startpoint position type is TIMESTAMP, then the position value is an epoch value. The Startpoint enforces a stricter contract for external tools and services to follow than the string offset value in the Checkpoint.
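The relationship between position type, position value, and Checkpoint precedence can be sketched roughly as follows. This is an illustrative sketch only, not Samza's API: the class name `StartpointSketch`, the enum constants other than TIMESTAMP, and the `describeStart` helper are all hypothetical names introduced for this example.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch; none of these names are taken from Samza itself.
final class StartpointSketch {

  // Only TIMESTAMP is named in the text; the other constants are assumptions.
  enum PositionType { SPECIFIC_OFFSET, TIMESTAMP, EARLIEST, LATEST }

  private final PositionType positionType;
  private final String positionValue;

  StartpointSketch(PositionType positionType, String positionValue) {
    this.positionType = Objects.requireNonNull(positionType, "positionType");
    if (positionType == PositionType.TIMESTAMP) {
      // A TIMESTAMP position value must be an epoch value.
      // Long.parseLong throws NumberFormatException otherwise.
      Long.parseLong(positionValue);
    }
    this.positionValue = positionValue;
  }

  PositionType getPositionType() {
    return positionType;
  }

  String getPositionValue() {
    return positionValue;
  }

  // Describes where consumption would start for one partition.
  // A Startpoint, when present, wins over the checkpointed offset.
  static String describeStart(Optional<StartpointSketch> startpoint,
                              Optional<String> checkpointOffset) {
    if (startpoint.isPresent()) {
      StartpointSketch sp = startpoint.get();
      return sp.getPositionType() + ":" + sp.getPositionValue();
    }
    return checkpointOffset.map(offset -> "CHECKPOINT:" + offset)
        .orElse("DEFAULT");
  }
}
```

In this sketch the type is validated at construction time, which is one way a tool writing Startpoints could be held to a stricter contract than an opaque checkpoint string. How a given system actually resolves each position type to a concrete offset is outside the scope of the example.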

...