...

Startpoints are written to the metadata store using two key types: SSP-only and SSP+TaskName. For broadcast input streams, a single SSP may be consumed by multiple tasks, so Startpoints are applied at the task level. For Startpoints written with SSP-only keys, the JC will have a mechanism to fan out the Startpoint across all tasks that the SSP maps to. The diagram below illustrates the flow.
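The fan-out described above can be sketched as follows. This is a minimal illustration, not Samza's implementation: the in-memory dict standing in for the metadata store, the `SSP` tuple, the `fan_out` function, and the rule that an existing SSP+TaskName entry takes precedence over an SSP-only entry are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a system-stream-partition.
SSP = namedtuple("SSP", ["system", "stream", "partition"])


def fan_out(store, ssp_to_tasks):
    """Expand SSP-only Startpoints into one SSP+TaskName entry per task.

    store:        dict mapping (ssp, task_name or None) -> startpoint,
                  where task_name None marks an SSP-only key.
    ssp_to_tasks: dict mapping ssp -> list of task names consuming it.
    """
    fanned = {}
    # Entries already scoped to a task are kept as they are.
    for (ssp, task), startpoint in store.items():
        if task is not None:
            fanned[(ssp, task)] = startpoint
    # SSP-only entries are copied to every task the SSP maps to.
    # Assumption: a task-scoped entry wins over an SSP-only one.
    for (ssp, task), startpoint in store.items():
        if task is None:
            for task_name in ssp_to_tasks.get(ssp, ()):
                fanned.setdefault((ssp, task_name), startpoint)
    return fanned


ssp = SSP("kafka", "pageviews", 0)
store = {(ssp, None): "oldest", (ssp, "task-1"): "offset:42"}
result = fan_out(store, {ssp: ["task-0", "task-1"]})
```

In this sketch a broadcast SSP consumed by `task-0` and `task-1` ends up with one entry per task, so each task can apply and later remove its own Startpoint independently.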

As with Checkpoints, Startpoints are applied to the starting offset of an SSP in a task instance when the SamzaContainer starts up.

Committing Startpoints

Once a particular Startpoint is applied to the starting offset of a system-stream-partition in a task instance, it is subsequently removed at the next offset commit. 
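The apply-then-remove lifecycle can be illustrated with a small sketch. Everything here is hypothetical: the `TaskOffsets` class, its method names, and the shared dict standing in for the metadata store are invented for the example, and a Startpoint is simplified to a concrete offset string.

```python
class TaskOffsets:
    """Toy model of one task instance resolving and committing offsets."""

    def __init__(self, task_name, checkpoints, startpoints):
        self.task_name = task_name
        self.checkpoints = checkpoints  # ssp -> last checkpointed offset
        self.startpoints = startpoints  # (ssp, task_name) -> startpoint offset
        self.applied = set()            # startpoint keys used at startup

    def starting_offset(self, ssp):
        """Resolve the starting offset; a Startpoint overrides the checkpoint."""
        key = (ssp, self.task_name)
        if key in self.startpoints:
            self.applied.add(key)
            return self.startpoints[key]
        return self.checkpoints.get(ssp)

    def commit(self, ssp, offset):
        """Write the checkpoint, then remove any Startpoint applied for this SSP."""
        self.checkpoints[ssp] = offset
        key = (ssp, self.task_name)
        if key in self.applied:
            self.startpoints.pop(key, None)
            self.applied.discard(key)


store = {("pageviews-0", "task-0"): "100"}
task = TaskOffsets("task-0", {"pageviews-0": "250"}, store)
first_start = task.starting_offset("pageviews-0")  # Startpoint wins over checkpoint
task.commit("pageviews-0", "101")                  # Startpoint removed here
next_start = task.starting_offset("pageviews-0")   # falls back to the checkpoint
```

Because the Startpoint stays in the store until the commit, a container that fails between startup and the first commit would see the same Startpoint again on restart in this model.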

...

Previously explored solutions involved modifying the checkpoint offsets directly. Operationally, the Startpoint solution is safer because checkpoints are used for fault tolerance. To prevent human error, the framework should not allow an external source to manipulate the checkpointed offsets. Setting starting offsets out-of-band, as Startpoints are designed to do, provides this additional safety layer.

Another pain point of manipulating checkpoints is that it requires the Samza job to be stopped. Startpoints, in contrast, can be written while the job is running.

Startpoint Intent-ACK Model

...