...
This is the schema of the metadata advertised by each member.
Name | Type | Doc |
---|---|---|
ProcessId | uuid / static | Identity of the instance that may have multiple consumers. |
UserEndPoint | bytes / static | Used for cross-client communication. |
ClientTags | map / static | Used for rack-aware assignment algorithm. |
TopologyHash | uuid / dynamic | Only updatable when reason is not zero. |
TaskLag | array / dynamic | Only updatable when reason is not zero. |
Member Metadata Reasons
- None (0)
- Shutdown (1)
- WarmUpReady (2)
- WarmUpFailed (3)
- TopologyChanged (4)
Assignment Metadata Schema
Name | Type | Doc |
---|---|---|
ActiveTasks | list | Local assignment for this consumer. |
StandbyTasks | map | Local standby tasks for this consumer. |
WarmupTasks | map | Local warming up tasks for this consumer. |
PartitionsByHost | map | Global assignment information used for IQ. |
Assignment Metadata Errors
...
Compatibility, Deprecation, and Migration Plan
Cluster Upgrade
- Deploy the new version of the software
- Roll the cluster
- Enable the new protocol
- Roll the cluster
Group Upgrade
- Deploy the new software
- Roll the members
- Enable the new protocol
- Roll the members
Regular expression.
- Kafka 4.0 opt-in, consumer support.
- Kafka 4.x opt-in, streams support.
- Kafka 5.0, new protocol by default.
Test Plan
- Unit/Integration/System tests
- Simulation Tests
- TLA+
Rejected Alternatives
TODO
Future Work
...
The change is mainly backward compatible. The current protocol is still supported without any changes. The consumers and streams should be able to upgrade to the new protocol without any issues. The only part that may require adaptations is the regex based subscriptions. At the moment, those are validated on the client side based and that is language specific. We plan to use https://github.com/google/re2 on the server side so clients may have to adapt their regex based subscriptions when they migrate to the new protocol. We believe that the transition should be frictionless for most of the users.
The migration requires to first upgrades the servers to the new software and enable the new group coordinator. A roll is required for this. Then, the consumers must be upgraded to the new software as well and the new protocol must be enabled. This can be done with on roll if the consumer version is supported by the upgrade path. If not, two rolls are required.
We plan to release the feature in a Kafka 4.x release, possibly 4.0. The feature will be opt-in to start. We plan to make it the default in Kafka 5.0.
We don't plan to deprecate the current rebalance protocol anytime soon because Connect and others still rely on it. However, we will deprecate the consumer embedded protocol used in the current protocol in Kafka 5.0 and to remove it in Kafka 6.0. Consumers and Streams won't be able to use it any longer after this.
Test Plan
Our primary method for testing the implementation will be through Discrete Event Simulation (DES). DES allows us to test a large number of deterministically generated random scenarios which include various kinds of faults (such as network partitions). It allows us to define system invariants programmatically which are then checked after each step in the simulation. The protocol will be formally verified with a TLA+ model as well. Other than that, we will use the typical suite of unit/integration/system tests. System tests will be parameterised to run with both protocols.
Rejected Alternatives
TODO
Future Work
Eventually, we aim at deprecating the current membership/rebalance API. In order to get to this point, we would need to first move all the use cases away from it.
Connect Group/Rebalance Protocol
Kafka Connect is the second protocol type which is currently supported by Apache Kafka. We propose to use a similar approach that the one used by the current proposal for Connect in the future. We would introduce a new connect group type and introduce a new set of APIs for Connect. The rebalance protocol is very similar to the consumer rebalance protocol but works with different resource types.
Membership/Leader Election API
The group membership protocol is also used outside of Apache Kafka. For instance, the Confluent Schema Registry uses it for leader election. It is not clear whether we really want to suppose such cases in the future. If we do, we could also define a new set of APIs for it. That would be much cleaner in the long run.