...

JC - Job coordinator.

Requirements

Goals

  1. Provide a simple and generic interface to manipulate starting offsets per input stream partition instead of relying on system-specific checkpoint tools and services. This allows flexibility to build REST API layers and tools on top of the common interface.
  2. Allow defining starting offsets on an input stream either by SSP across all tasks or by SSP per task.

...

  1. Framework-level support for various offset types, such as specific offsets and timestamp-based offsets.

...

  2. Provide safety by setting starting offsets out-of-band and not directly in the checkpoints.
  3. Simplicity. Make it easy for developers and users to create tools and services that set the starting offsets of a given Samza job.

Non-goals

  1. Rewinding or fast-forwarding state store changelogs in relation to the input streams. The challenge is that the changelog stream is typically log-compacted.
  2. Providing a service that externally exposes a Startpoint API. Such services require other core changes and will be explored in another SEP or design document.

Proposed Implementation

Different systems in Samza have different formats for checkpoint offsets and lack any contract that describes the offset format. To maintain backwards compatibility and to have better operability for setting starting offsets, this solution introduces the concept of Startpoints and utilizes the abstract metadata storage layer.
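
As a rough illustration of the idea, the sketch below models a Startpoint as a value that is independent of any one system's checkpoint format and stores it out-of-band, keyed by SSP and an optional task name. All class and method names here (StartpointSketch, Startpoint, StartpointStore, write, read) are hypothetical and are not taken from the Samza API. The in-memory map only stands in for the metadata storage layer, and the rule that a per-task entry wins over an SSP-wide entry is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative, not the Samza API.
public class StartpointSketch {

    /** A requested starting position; exactly one of the two fields is non-null. */
    static final class Startpoint {
        final String specificOffset; // offset in the input system's own format
        final Long timestampMillis;  // epoch milliseconds

        private Startpoint(String specificOffset, Long timestampMillis) {
            this.specificOffset = specificOffset;
            this.timestampMillis = timestampMillis;
        }

        static Startpoint specific(String offset) {
            return new Startpoint(offset, null);
        }

        static Startpoint timestamp(long epochMillis) {
            return new Startpoint(null, epochMillis);
        }
    }

    /** In-memory stand-in for the metadata storage layer, kept apart from checkpoints. */
    static final class StartpointStore {
        private final Map<List<Object>, Startpoint> entries = new HashMap<>();

        // A null taskName means "this SSP across all tasks".
        private static List<Object> key(String system, String stream, int partition, String taskName) {
            return Arrays.<Object>asList(system, stream, partition, taskName);
        }

        void write(String system, String stream, int partition, String taskName, Startpoint startpoint) {
            entries.put(key(system, stream, partition, taskName), startpoint);
        }

        // Assumption: a per-task entry takes precedence over the SSP-wide entry.
        Startpoint read(String system, String stream, int partition, String taskName) {
            Startpoint perTask = entries.get(key(system, stream, partition, taskName));
            return perTask != null ? perTask : entries.get(key(system, stream, partition, null));
        }
    }

    public static void main(String[] args) {
        StartpointStore store = new StartpointStore();
        // SSP-wide startpoint: every task reading this partition starts from a timestamp.
        store.write("kafka", "page-views", 0, null, Startpoint.timestamp(1546300800000L));
        // Per-task startpoint: one task starts from a specific offset instead.
        store.write("kafka", "page-views", 0, "Partition 0", Startpoint.specific("4200"));

        System.out.println(store.read("kafka", "page-views", 0, "Partition 0").specificOffset);
        System.out.println(store.read("kafka", "page-views", 0, "Partition 1").timestampMillis);
    }
}
```

Because the store is separate from the checkpoints, a tool built on such an interface never has to know or rewrite a system-specific checkpoint format.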

...