Status
Current state: Under Discussion
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Rebalancing during scale-up is always painful. Every newly joined member keeps the group in the rebalancing stage until all of the new instances have finished bootstrapping. Active tasks may be shuffled multiple times across existing and new instances, which decreases the availability of the entire system. This negative impact has been mitigated since we introduced KIP-345. Under static membership, users could provide a list of hard-coded `group.instance.id`s to pre-register their identities on the broker if the new host info is known, so that the broker coordinator could respond to scaling operations more intelligently. For example, when we scale up the fleet by defining 4 new client instance ids, the server shall wait until all 4 new members have joined the group before kicking off a single rebalance, instead of four in the worst case.
Proposed Changes
This change requires us to change the JoinGroup protocol to batch mode so that multiple members can be added at once.
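To make the effect of batching concrete, here is a minimal sketch of a toy coordinator that only counts rebalances. `BatchJoinSketch`, `ToyCoordinator`, `joinOne`, and `joinBatch` are hypothetical names used for illustration and are not part of Kafka. The model assumes that every join call adding at least one new member triggers exactly one rebalance, which corresponds to the worst case described in the Motivation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not Kafka code.
class BatchJoinSketch {

    static class ToyCoordinator {
        private final Set<String> members = new LinkedHashSet<>();
        private int rebalanceCount = 0;

        // Joining a single member is modeled as a batch of size one.
        void joinOne(String groupInstanceId) {
            joinBatch(List.of(groupInstanceId));
        }

        // Assumption: one rebalance per join call that changes membership,
        // no matter how many members the call carries.
        void joinBatch(List<String> groupInstanceIds) {
            boolean changed = members.addAll(groupInstanceIds);
            if (changed) {
                rebalanceCount++;
            }
        }

        int rebalanceCount() {
            return rebalanceCount;
        }

        int memberCount() {
            return members.size();
        }
    }

    public static void main(String[] args) {
        List<String> newIds = List.of("instance-1", "instance-2", "instance-3", "instance-4");

        // Members join one at a time: each join triggers its own rebalance.
        ToyCoordinator oneByOne = new ToyCoordinator();
        for (String id : newIds) {
            oneByOne.joinOne(id);
        }

        // All members join in a single batched call.
        ToyCoordinator batched = new ToyCoordinator();
        batched.joinBatch(newIds);

        System.out.println("one-by-one rebalances: " + oneByOne.rebalanceCount());
        System.out.println("batched rebalances: " + batched.rebalanceCount());
    }
}
```

Under these assumptions, both groups end up with the same four members, but the one-by-one path counts four rebalances while the batched path counts one.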
Public Interfaces
We will bump the JoinGroup request/response version to support adding members in batches.
...
A new admin request shall be created for users to supply a list of `group.instance.id`s to batch join the group:
    public static AddMemberResult addMemberToGroup(String groupId, List<String> groupInstanceIdsToAdd, AddMemberToGroupOptions options);
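As a rough illustration of how a caller might use this signature, the sketch below stubs out the types the proposal names but does not define. The bodies of `AddMemberResult` and `AddMemberToGroupOptions`, the fields `groupId` and `addedInstanceIds`, the empty-list check, and the enclosing `AddMemberSketch` class are all assumptions made for the example. The stub only echoes its input, whereas a real implementation would presumably send one batched request to the broker.

```java
import java.util.List;

// Hypothetical stand-in for the proposed admin API; nothing here is existing Kafka code.
class AddMemberSketch {

    // Placeholder: the proposal names this type but does not list its options.
    static class AddMemberToGroupOptions {
    }

    // Placeholder result type; both fields are assumptions for illustration.
    static class AddMemberResult {
        final String groupId;
        final List<String> addedInstanceIds;

        AddMemberResult(String groupId, List<String> addedInstanceIds) {
            this.groupId = groupId;
            this.addedInstanceIds = addedInstanceIds;
        }
    }

    // Same signature as in the proposal. This stub performs no network call
    // and simply reports the ids it was given.
    public static AddMemberResult addMemberToGroup(String groupId,
                                                   List<String> groupInstanceIdsToAdd,
                                                   AddMemberToGroupOptions options) {
        if (groupInstanceIdsToAdd.isEmpty()) {
            throw new IllegalArgumentException("at least one group.instance.id is required");
        }
        return new AddMemberResult(groupId, List.copyOf(groupInstanceIdsToAdd));
    }

    public static void main(String[] args) {
        // One call registers all four new instances together.
        AddMemberResult result = addMemberToGroup(
                "my-group",
                List.of("instance-1", "instance-2", "instance-3", "instance-4"),
                new AddMemberToGroupOptions());
        System.out.println(result.groupId + " <- " + result.addedInstanceIds);
    }
}
```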
...
    DescribeGroupRequest => ThrottleTime Groups
      ThrottleTime => int16
      Groups => []DescribeGroups
        ErrorCode => int16
        GroupId => String
        GroupState => String
        ProtocolType => String
        ProtocolData => int16
        Members => []DescribedGroupMember
          MemberId => String
          GroupInstanceId => String // new
          ClientId => String
          ClientHost => String
          MemberMetadata => bytes
          MemberAssignment => bytes
Compatibility, Deprecation, and Migration Plan
- Users need to upgrade their brokers to the latest version to use this new feature.
- Since we are only introducing a new admin API, the change should be backward compatible.
Rejected Alternatives
We could trigger multiple join group requests at the same time without changing the JoinGroup protocol. However, considering our change to LeaveGroupRequest, it is hard to handle multiple responses within a single admin client request. Changing the protocol to support batching would be more consistent.