...

Of all the scaling needs discussed in KIP-X, scaling out consumers is indeed hard, because we have to struggle with server-side filtering and an offset commit mode that does not scale well. Furthermore, in most cases the number of consumers is not the bottleneck; the total parallelism is. The biggest win could in fact come from scaling a single consumer application, instead of adding more consumers.

We don't want to synchronize constantly. Constantly synchronizing state between threads carries a heavy performance penalty, so the individual commit feature is still the favored approach to the problem. What we need is just a different way of viewing the offset data format: instead of mapping from offset → committed boolean, we actually need to map from worker-id → committed offsets. In the old design, the coordination point was pushed down and hit far too frequently, which potentially hurts performance. If different subtasks get the chance to be assigned partial tasks, the
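To make the change of data format concrete, here is a minimal sketch of a worker-id → committed offset table for a single topic partition. The class and method names (`WorkerOffsetTable`, `commit`, `committedOffset`) are hypothetical and are not part of any Kafka API; the sketch only assumes that each worker commits for itself and that a stale commit should never move a worker's offset backwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: per-worker committed offsets for one topic partition. */
public class WorkerOffsetTable {
    // worker-id -> highest offset that worker has committed so far
    private final Map<String, Long> committed = new ConcurrentHashMap<>();

    /** Records a commit for one worker; an older offset never overwrites a newer one. */
    public void commit(String workerId, long offset) {
        committed.merge(workerId, offset, Math::max);
    }

    /** Returns the worker's committed offset, or -1 if it has not committed yet. */
    public long committedOffset(String workerId) {
        return committed.getOrDefault(workerId, -1L);
    }
}
```

Each worker only ever touches its own entry, so workers do not need to coordinate with each other on every record, which is the point of moving away from a per-offset committed flag.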

High Level Design

In the offset commit protocol, we always keep a topic partition → offset mapping to remember our progress. If we build a consumer with multi-threaded access, we could have the rebalance assign key ranges to workers and have those mappings returned and stored on the broker side. That way, if two workers A and B share the same consumer, they can commit their progress individually as (worker-id, offset) pairs. With a group assignment message that carries the key range mapping, we could easily do client-side filtering for the first generation if possible. This work also unblocks consumer-level scaling later: consumers could define their individual key ranges, which would allow concurrent commits.
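The key range idea can be sketched as follows. This is an illustration under stated assumptions, not the proposed protocol: `KeyRangeAssignment`, `Range`, `ownerOf`, and `shouldProcess` are hypothetical names, the hash space is cut into a fixed number of buckets, and `String.hashCode` stands in for whatever key hash the real design would use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a hashed key space into contiguous ranges, one per worker. */
public class KeyRangeAssignment {
    /** Half-open range [start, end) of hash buckets owned by one worker. */
    public static final class Range {
        final String workerId;
        final int start;
        final int end;

        Range(String workerId, int start, int end) {
            this.workerId = workerId;
            this.start = start;
            this.end = end;
        }

        boolean contains(int bucket) {
            return bucket >= start && bucket < end;
        }
    }

    private final int buckets;
    private final List<Range> ranges = new ArrayList<>();

    /** Gives each worker an (almost) equal, contiguous slice of the buckets. */
    public KeyRangeAssignment(List<String> workerIds, int buckets) {
        this.buckets = buckets;
        int n = workerIds.size();
        for (int i = 0; i < n; i++) {
            int start = (int) ((long) i * buckets / n);
            int end = (int) ((long) (i + 1) * buckets / n);
            ranges.add(new Range(workerIds.get(i), start, end));
        }
    }

    // Assumption: String.hashCode is only a placeholder for the real key hash.
    static int bucketOf(String key, int buckets) {
        return Math.floorMod(key.hashCode(), buckets);
    }

    /** Returns the id of the worker whose range covers the key's bucket. */
    public String ownerOf(String key) {
        int bucket = bucketOf(key, buckets);
        for (Range r : ranges) {
            if (r.contains(bucket)) {
                return r.workerId;
            }
        }
        throw new IllegalStateException("no range covers bucket " + bucket);
    }

    /** Client-side filter: a worker keeps only records whose key falls in its own range. */
    public boolean shouldProcess(String workerId, String key) {
        return workerId.equals(ownerOf(key));
    }
}
```

With workers A and B and four buckets, A owns buckets [0, 2) and B owns [2, 4). Every record key maps to exactly one worker, so each worker can filter the fetched records locally and commit its own (worker-id, offset) pair without consulting the other.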

...