Status
Current state: [One of "Under Discussion", "Accepted", "Rejected"]
Discussion thread: here [Change the link from the KIP proposal email archive to your own email thread]
JIRA: TBD
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Rebalancing during scale-up is always painful. Every newly joined member keeps the group in the rebalancing stage until all of the new instances have finished bootstrapping. Active tasks may be shuffled several times among existing and new instances, which reduces the availability of the entire system. This negative impact has been mitigated since we introduced KIP-345. Under static membership, users can provide a list of hard-coded `group.instance.id`s to pre-register their identities on the broker when the new host information is known, so that the broker coordinator can respond to scaling operations more intelligently. For example, when we scale up the fleet by defining 4 new client instance ids, the server will wait until all 4 new members have joined the group before kicking off a single rebalance, instead of four in the worst case.
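To make the scale-up example concrete, here is a minimal sketch of the counting argument. It is a toy model, not the coordinator's actual logic: `ScaleUpSketch` and both method names are hypothetical, and it assumes that without pre-registration each join triggers its own rebalance (the worst case described above).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the coordinator; not Kafka's implementation.
final class ScaleUpSketch {

    // Assumed worst case without pre-registration: one rebalance per joining member.
    static int rebalancesWithoutPreRegistration(List<String> joiningIds) {
        return joiningIds.size();
    }

    // With pre-registered ids, the rebalance is held until every expected id has joined.
    // Ids that were not pre-registered are ignored in this sketch.
    static int rebalancesWithPreRegistration(Set<String> expectedIds, List<String> joinOrder) {
        Set<String> pending = new HashSet<>(expectedIds);
        int rebalances = 0;
        for (String id : joinOrder) {
            if (pending.remove(id) && pending.isEmpty()) {
                rebalances++; // the last expected member has arrived
            }
        }
        return rebalances;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("instance-1", "instance-2", "instance-3", "instance-4");
        System.out.println(rebalancesWithoutPreRegistration(ids));
        System.out.println(rebalancesWithPreRegistration(new HashSet<>(ids), ids));
    }
}
```

With four pre-registered ids, the second method counts a single rebalance regardless of the order in which the members join.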
Proposed Changes
This change requires switching the JoinGroup protocol to batch mode so that multiple members can be added at once.
Public Interfaces
We will bump the JoinGroup request/response version to support adding members in batches.
```
JoinGroupRequest => GroupId SessionTimeout RebalanceTimeout MemberId GroupInstanceId ProtocolType GroupProtocols
  GroupId => String
  SessionTimeout => int32
  RebalanceTimeout => int32
  MemberId => String // removed
  GroupInstanceId => String // removed
  ProtocolType => String
  GroupProtocols => [Protocol MemberMetadata]
    Protocol => String // removed
    MemberMetadata => bytes // removed
  JoinGroupMembers => []JoinGroupRequestMember // new
    MemberId => String // new
    GroupInstanceId => String // new
    Protocol => String // new

JoinGroupResponse => ThrottleTime ErrorCode GenerationId ProtocolName LeaderId MemberId Members
  ThrottleTime => int16
  ErrorCode => int16 // removed
  GenerationId => int32
  ProtocolName => String
  LeaderId => String
  MemberId => String // removed
  Members => []JoinGroupResponseMember
    MemberId => String
    GroupInstanceId => String
    Metadata => bytes
  MemberJoinResponseList => []JoinGroupResult // new
    MemberInfo => JoinGroupResponseMember // new
    ErrorCode => int16 // new
```
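Because the error code moves from the top level of the response into each `JoinGroupResult`, a caller has to inspect the results one member at a time. The sketch below illustrates that idea under stated assumptions: `BatchJoinSketch` and its nested class are hypothetical stand-ins rather than generated protocol classes, `MemberInfo` is flattened to just the instance id, and an error code of 0 is taken to mean success.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the batched response; not a generated Kafka protocol class.
final class BatchJoinSketch {
    static final short NONE = 0; // assumed: error code 0 means success

    // Simplified JoinGroupResult: MemberInfo is reduced to the instance id.
    static final class JoinGroupResult {
        final String groupInstanceId;
        final short errorCode;

        JoinGroupResult(String groupInstanceId, short errorCode) {
            this.groupInstanceId = groupInstanceId;
            this.errorCode = errorCode;
        }
    }

    // Collect the instance ids whose individual join attempt failed.
    static List<String> failedInstanceIds(List<JoinGroupResult> results) {
        List<String> failed = new ArrayList<>();
        for (JoinGroupResult result : results) {
            if (result.errorCode != NONE) {
                failed.add(result.groupInstanceId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<JoinGroupResult> results = Arrays.asList(
            new JoinGroupResult("instance-1", NONE),
            new JoinGroupResult("instance-2", (short) 25)); // any non-zero code counts as a failure
        System.out.println(failedInstanceIds(results));
    }
}
```

Per-member error codes let one failed member be reported without failing the whole batch, which is presumably why the top-level `ErrorCode` is marked as removed.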
A new admin request will be added so that users can supply a list of `group.instance.id`s to join the group in a batch:
```java
public static AddMemberResult addMembersToGroup(String groupId, List<String> groupInstanceIdsToAdd, AddMemberToGroupOptions options);
```
In the meantime, to improve visibility into static members, we will also bump the DescribeGroup request/response protocol to include `group.instance.id`:
```
DescribeGroupResponse => ThrottleTime Groups
  ThrottleTime => int16
  Groups => []DescribeGroups
    ErrorCode => int16
    GroupId => String
    GroupState => String
    ProtocolType => String
    ProtocolData => int16
    Members => []DescribedGroupMember
      MemberId => String
      GroupInstanceId => String // new
      ClientId => String
      ClientHost => String
      MemberMetadata => bytes
      MemberAssignment => bytes
```
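As an illustration of the added visibility, the sketch below picks the static members out of a described group by checking `GroupInstanceId`. It assumes the field is null for dynamic members; `DescribedMembersSketch` and its trimmed-down `DescribedGroupMember` are hypothetical and model only two of the fields listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, trimmed-down view of a described group; not Kafka's actual classes.
final class DescribedMembersSketch {

    static final class DescribedGroupMember {
        final String memberId;
        final String groupInstanceId; // assumed to be null for dynamic members

        DescribedGroupMember(String memberId, String groupInstanceId) {
            this.memberId = memberId;
            this.groupInstanceId = groupInstanceId;
        }
    }

    // Return the group.instance.id of every static member, in response order.
    static List<String> staticInstanceIds(List<DescribedGroupMember> members) {
        List<String> ids = new ArrayList<>();
        for (DescribedGroupMember member : members) {
            if (member.groupInstanceId != null) {
                ids.add(member.groupInstanceId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<DescribedGroupMember> members = Arrays.asList(
            new DescribedGroupMember("member-a", "instance-1"),
            new DescribedGroupMember("member-b", null));
        System.out.println(staticInstanceIds(members));
    }
}
```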
Compatibility, Deprecation, and Migration Plan
- Users need to upgrade the broker to the latest version to use this new feature.
- Since we are only introducing a new admin API, the change should be backward compatible.
Rejected Alternatives
We could trigger multiple JoinGroup requests at the same time without changing the JoinGroup protocol. However, considering our change to LeaveGroupRequest, it is hard to handle multiple responses within a single admin client request. Changing the protocol to support batching is more consistent.