...
We would start from supporting a new offset acknowledgement semantics commit semantics as this unblocks the potential to process regardless of partition level ordering. The stateful operation support is of more value in nature, so concurrently we could add supports on key based filtering with Fetch calls. The stateless and transactional supports have to be built on top of the first two steps.
Stage name | Goal | Dependency |
---|---|---|
Individual acknowledgementcommit | Create a generic offset acknowledgement commit model beyond current partition → offset mapping. | No |
Key filtering based fetch on topic level | Add capability to FetchRequest with specific hashed key range or specific keys | No |
Transactional support | Incorporate individual ack commit into the transaction processing model | Key based filtering |
Support cooperative fetch on broker level | Round robin assign data to cooperative consumer fetches when there is no key-range specified | Individual ackcommit |
High Level Design
The design will be broken down, aligning with the roadmap defined above.
Individual
...
commit
draw.io Diagram | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
The logic behind individual commit is very straightforward. For sequential commit each topic partition will only point to one offset number, while under individual commit it is possible to have "gaps" in between. The "stable offset" is a borrowed concept from transaction semantic which is just for illustration purpose, and the idea is that stable offset marks the position where all the messages before it are already committed. "Max commit offset" is the furtherest position this topic has advanced. It is allowed to have some commit gaps in between the stable offset and max offset.
In the individual commit mode, the offset metadata shall grow much quicker and harder to predict. To avoid messing up the stable offset, we propose to add another internal topic called `__individual_commit_offsets` which stores the gaps specifically
Public Interfaces
The
Compatibility, Deprecation, and Migration Plan
...
- KIP-253 proposed physical partition expansion, which is a fairly complex implementation and could be hard to reason about correctness.
- Some discussion around making Kafka Streams as a multi-threading model where consumers are completely decoupled from processing thread. This means we have to tackle tcommitle the concurrent processing challenge and there could be more inherent work to redesign state store semantics too.
...