Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Add section on "Offsets for Consumer Rebalances"

...

The existing .checkpoint files will be retained for any StateStore that does not set managesOffsets()  to true , and the existing offsets will be automatically migrated into StateStores that manage their own offsets, iff there is no offset returned by StateStore#getCommittedOffset.

Offsets for Consumer Rebalances

During Consumer rebalance, Streams directly checks all on-disk .checkpoint files, including those for Tasks/StateStores that are not currently "open". This is done to ensure that Tasks are assigned to the instance with the most complete local state. With Atomic Checkpointing, any StateStore that manages its own offsets (i.e. managesOffsets() is true) will first need to be opened in order to read its offsets, and then closed again.

The additional latency this would introduce to the rebalance is unacceptable. Instead, we will continue to write the offsets of all StateStores, including those that manage their own offsets, to the .checkpoint file. For StateStores that manage their own offsets, this file will only be read if that store is closed (i.e. not yet initialized). Since the store is not open, the above race condition does not apply, making this safe. If a StateStore returns true for both  managesOffsets  and isOpen , then the store will be queried for its offsets directly, via getCommittedOffset.

Interactive Queries

Interactive queries currently see every record, as soon as they are written to a StateStore. This can cause some consistency issues, as interactive queries can read records before they're committed to the Kafka changelog, which may be rolled-back. To address this, interactive queries will query the underlying StateStore, and will not be routed through a Transaction. This ensures that interactive queries see a consistent view of the store, as they will not be able to read any uncommitted records.

...