
From an infrastructure cost perspective, pre-defining a higher number of partitions increases network traffic, since more metadata and replication are needed. Besides the extra money paid, the operational overhead grows when a broker cluster must be kept in good shape with more topic partitions than necessary. This is a known pain point for the scalability of Kafka stream processing, and resolving it would be of great value.

Furthermore, taking Kafka Streams as an example, the processing model honors partition-level ordering. However, most operations, such as joins and aggregations, work at the per-key level, so the relative order across different keys does not matter, except for user-customized operations.

The proposal here is to decouple consumption from the physical partition count by making consumers capable of collaborating on the same topic partition. This has several benefits compared with the existing model:

  1. Data consumption and production scales are no longer coupled. This means we could save money by configuring an input topic with a reasonable number of partitions.
  2. Partition-level hot keys are easier to avoid. When a specific key is processed very slowly, decoupled key-based consumption can bypass it and make progress on other keys.
  3. No operational overhead for scaling out. Users just need to add more consumer/stream capacity to unblock.
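To make the idea of consumers collaborating on one partition more concrete, here is a minimal sketch in Python. It is not a Kafka API and not necessarily the mechanism this proposal will adopt: the hash-based split and the names `owner_for_key` and `split_partition` are assumptions made purely for illustration. It shows one partition's records being divided among several consumers by key, so that each key still has a single owner and its records keep their offset order.

```python
import zlib


def owner_for_key(key: bytes, num_consumers: int) -> int:
    """Map a record key to one of the consumers sharing a partition.

    Hypothetical scheme: a stable hash of the key modulo the consumer count.
    """
    return zlib.crc32(key) % num_consumers


def split_partition(records, num_consumers):
    """Group one partition's records by owning consumer.

    Records are visited in offset order, so each consumer's share
    preserves the relative order of records with the same key.
    """
    shares = [[] for _ in range(num_consumers)]
    for offset, (key, value) in enumerate(records):
        shares[owner_for_key(key, num_consumers)].append((offset, key, value))
    return shares


# One physical partition, read by two collaborating consumers.
partition = [(b"user-a", 1), (b"user-b", 2), (b"user-a", 3), (b"user-c", 4)]
shares = split_partition(partition, num_consumers=2)
for consumer_id, share in enumerate(shares):
    print(consumer_id, share)
```

In this sketch, adding consumer capacity only changes `num_consumers`; the partition count of the input topic stays the same.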

For streaming operations such as joins and aggregations, the relative order between different keys does not need to be maintained; however, topic partition data currently must be processed in order to keep offset management simple. If one hot key blocks the way, all subsequent data processing stops.
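The blocking effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not Kafka code: each record is a `(key, cost)` pair, `budget` stands in for the processing time available, and the functions `in_order` and `key_level` are hypothetical names. The first function processes strictly by offset; the second sets a slow key aside and keeps going on the other keys, while still holding back later records of the slow key so that per-key order is preserved.

```python
def in_order(records, budget):
    """Process strictly by offset; stop at the first record that does not fit."""
    done = []
    for offset, (key, cost) in enumerate(records):
        if cost > budget:
            break  # the slow record blocks everything behind it
        budget -= cost
        done.append(offset)
    return done


def key_level(records, budget):
    """Set aside a key whose record does not fit and continue with other keys.

    Later records of a set-aside key are skipped too, so order within
    each key is never violated.
    """
    done, blocked = [], set()
    for offset, (key, cost) in enumerate(records):
        if key in blocked:
            continue
        if cost > budget:
            blocked.add(key)
            continue
        budget -= cost
        done.append(offset)
    return done


records = [("a", 1), ("hot", 100), ("b", 1), ("hot", 1), ("c", 1)]
print(in_order(records, budget=10))   # [0]
print(key_level(records, budget=10))  # [0, 2, 4]
```

Note that the key-level result is a non-contiguous set of offsets, so a single committed offset per partition no longer describes the progress made. That is the offset management cost the in-order model avoids.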

Public Interfaces

The 

Proposed Changes

  1. even there are fewer consumers.

Describe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.

Compatibility, Deprecation, and Migration Plan

...