Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Today Kafka Streams stream time inference and synchronization is tricky to understand and also introduces much non-detereminism into the processing ordering when a task is fetching from multiple input streams (a.k.a. topic-partitions from Kafka).

More specifically, a stream task may contain multiple input topic-partitions. And hence we need to decide 1) which topic-partition to pick the next record to process for this task, 2) how stream time will be advanced when records are being processed. Today this logic is determined in the following way:

...

c) The task's stream time is defined as the smallest value across all of its partitions' timestamp, and hence is also monotonically increasing.


The above behavior is based on the motivations that for operations such as joins, we want to synchronize the timestamps across multiple input streams so that we can avoid "out-of-ordering" data across multiple topic-partitions in a best effort (note that within a single topic-partition we always process data following the offset ordering).

We have observed the above mechanism has the following issue:

...