...

Today most consumer-based processing models honor partition-level ordering. However, ETL operations such as join and aggregation work at the per-key level, so the relative order across different keys does not need to be maintained, except for user-customized operations. Many organizations are paying for a stronger system guarantee than they actually need.

...

  1. Data consumption and production scales are no longer coupled. This means we could save cost by configuring a topic with a reasonable number of partitions sized only for the input traffic.
  2. Better avoidance of partition-level hot keys. When a specific key is processed very slowly, decoupled key-based consumption can bypass it and make progress on other keys.
  3. No operational overhead for scaling out in extreme cases. Users just need to add more consumer/stream capacity to unblock processing, even when only a few partitions are available.
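To illustrate the second point, the following is a minimal sketch of key-based dispatch within a single partition. It is not the proposed implementation: `route_by_key`, the `(offset, key, value)` record shape, and the use of CRC32 as the key hash are all assumptions made for illustration. Each key is always routed to the same worker, so per-key order is preserved while a slow key only delays the worker that owns it.

```python
import zlib
from collections import defaultdict


def route_by_key(records, num_workers):
    """Split the records of ONE partition across workers by key hash.

    `records` is a list of (offset, key, value) tuples in offset order.
    This is a hypothetical helper, not part of any Kafka API.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    queues = defaultdict(list)
    for offset, key, value in records:
        # CRC32 is a stand-in for whatever stable key hash is used.
        worker = zlib.crc32(key.encode("utf-8")) % num_workers
        # Appending in offset order keeps the per-key order intact.
        queues[worker].append((offset, key, value))
    return dict(queues)


records = [
    (0, "user-a", "login"),
    (1, "user-b", "login"),
    (2, "user-a", "click"),
    (3, "user-c", "login"),
    (4, "user-b", "logout"),
]
queues = route_by_key(records, num_workers=2)
```

Under this sketch, the relative order between `user-a` and `user-b` records is no longer guaranteed once they land on different workers, which is exactly the guarantee the per-key operations above do not need.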

...

As stated above, the scaling cap for a consumer-based application is the number of input partitions. In an extreme scenario where there is a single input partition and two consumers, one consumer must remain idle. If the single active consumer cannot keep up with the processing load, there is no remedy and lag simply accumulates. Ideally, two consumers could co-process data within one partition when partition-level order is not required, so that we could add as many consumer instances as we want.
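The scaling cap can be made concrete with a small sketch. The round-robin assignment below is a simplification for illustration only, and `assign_round_robin` and `idle_consumers` are hypothetical helpers rather than Kafka APIs; the point is only that each partition has exactly one owner, so any consumers beyond the partition count receive nothing.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def idle_consumers(assignment):
    """Return the consumers that were assigned no partition."""
    return [c for c, owned in assignment.items() if not owned]


# One input partition, two consumers: the second consumer stays idle.
assignment = assign_round_robin([0], ["consumer-1", "consumer-2"])
idle = idle_consumers(assignment)
```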

...

In the individual commit mode, the offset metadata will grow much more quickly and be harder to predict. The log needs to be highly compacted to avoid wasting disk space, which is different from the existing consumer offset topic. Thus, we propose to add another internal topic called `__individual_commit_offsets` which stores the individual commits specifically, and call the current `__consumer_offsets` topic the primary offset topic. Furthermore, this isolation should make the individual commit (IC) feature rollout more controllable: it avoids messing up the stable offsets in the primary topic, and at-least-once can still be achieved when we need to delete a corrupted IC offset topic. The individual commit offset topic is required to co-locate with the primary offset topic, which means it has to share the primary offset topic's number of partitions, replication factor and min replicas.
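The co-location requirement can be sketched as follows, assuming that a consumer group's offsets are stored in the partition chosen by hashing the group id modulo the topic's partition count. The CRC32 hash and the helper names `offset_partition` and `is_colocated` are stand-ins for illustration and are not Kafka's actual functions. Under that assumption, a group maps to the same partition index in both topics only when the two partition counts match.

```python
import zlib


def offset_partition(group_id, num_partitions):
    """Hypothetical mapping from a group id to an offset-topic partition."""
    # CRC32 is a stand-in hash; the real broker uses its own function.
    return zlib.crc32(group_id.encode("utf-8")) % num_partitions


def is_colocated(group_id, primary_partitions, ic_partitions):
    """True if the group lands on the same partition index in both topics."""
    return (offset_partition(group_id, primary_partitions)
            == offset_partition(group_id, ic_partitions))


groups = [f"group-{i}" for i in range(100)]

# Same partition count in both topics: every group is co-located.
same_count = all(is_colocated(g, 50, 50) for g in groups)

# Different partition counts: the indices generally diverge.
mismatched = [g for g in groups if not is_colocated(g, 50, 7)]
```

Matching indices are only part of co-location; sharing the replication factor and min replicas keeps the durability guarantees of the two topics aligned as well.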

...