...

  • Static Membership: the membership protocol where the consumer group will not trigger a rebalance unless:
    • A new member joins
    • A leader rejoins (possibly due to a topic assignment change)
    • An existing member's offline time exceeds the session timeout
    • The broker receives a leave group request containing a list of `group.instance.id`s (details later)
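
The trigger conditions above can be read as a single predicate. The following is a minimal sketch rather than coordinator code: the `GroupEvent` enum, the `StaticMembershipSketch` class, and both method names are hypothetical and exist only to illustrate the list.

```java
// Hypothetical event kinds; the first four correspond to the trigger conditions listed above.
enum GroupEvent {
    NEW_MEMBER_JOINED,
    LEADER_REJOINED,
    MEMBER_SESSION_EXPIRED,
    LEAVE_GROUP_WITH_INSTANCE_IDS,
    KNOWN_STATIC_MEMBER_REJOINED, // illustrative non-trigger: a non-leader member restarts in time
    HEARTBEAT                     // illustrative non-trigger
}

final class StaticMembershipSketch {
    private StaticMembershipSketch() {}

    // A member counts as expired once its offline time is over the session timeout.
    static boolean sessionExpired(long offlineMs, long sessionTimeoutMs) {
        return offlineMs > sessionTimeoutMs;
    }

    // Under static membership, only the four listed events trigger a rebalance.
    static boolean triggersRebalance(GroupEvent event) {
        switch (event) {
            case NEW_MEMBER_JOINED:
            case LEADER_REJOINED:
            case MEMBER_SESSION_EXPIRED:
            case LEAVE_GROUP_WITH_INSTANCE_IDS:
                return true;
            default:
                return false;
        }
    }
}
```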

...

The new `group.instance.id` config will be added to the join group request. A list of tuples containing `group.instance.id` and `member.id` will be added to the LeaveGroupRequest, replacing the single `member.id` field.

Code Block
JoinGroupRequest => GroupId SessionTimeout RebalanceTimeout MemberId GroupInstanceId ProtocolType GroupProtocols
  GroupId             => String
  SessionTimeout      => int32
  RebalanceTimeout    => int32
  MemberId            => String
  GroupInstanceId     => String // new
  ProtocolType        => String
  GroupProtocols      => [Protocol MemberMetadata]
  Protocol            => String
  MemberMetadata      => bytes

LeaveGroupRequest => GroupId MemberIdentityList
  GroupId             => String
  MemberId            => String // removed
  MemberIdentityList  => List[Tuple[String, String]] // new

In the meantime, we bump the join/leave group request/response versions to v4/v3, respectively.

...

Code Block: Errors.java (Java)
MEMBER_ID_MISMATCH(78, "This implies some group.instance.id is already in the consumer group, however the corresponding member.id was not matching the record on coordinator", MemeberIdMisMatchException::new),
GROUP_INSTANCE_ID_NOT_FOUND(79, "Some group.instance.id specified in the leave group request are not found", GroupInstanceIdInvalidException::new)

...

On the server side, the broker will keep handling leave group requests <= v3 as before. We extended the LeaveGroupRequest API with a new tuple list which pairs each `group.instance.id` with a `member.id`. The reason to include `member.id` instead of solely adding a list of `group.instance.id`s is to move LeaveGroupRequest towards a more consistent batch API in the long term. The processing rules are as follows:

  1. For a static member, `group.instance.id` must be provided. The client may optionally provide a `member.id` when `group.instance.id` is non-empty. If `member.id` is provided, the member is removed only if the `member.id` matches; otherwise, only the `group.instance.id` is used. The `member.id` serves as a validation here; it is currently unused (set to the empty string) but could be useful if we build a fully automated removal process.
  2. For leave group requests under dynamic membership, the member will supply a singleton list of one tuple containing the `member.id` that it is currently using and a `group.instance.id` set to the empty string. In this case, we simply remove the given dynamic member the same way as the current leave group logic does.
  3. Error cases expected are:
    1. Some instance ids (non-empty) are not found, which means the request is not valid (GROUP_INSTANCE_ID_NOT_FOUND, defined in the public changes section)
    2. The length of MemberIdList doesn't match length of GroupInstanceIdList (MEMBER_ID_MISMATCH, defined in the public changes section)
    3. A theoretical case is that both `member.id` and `group.instance.id` are set to the empty string. We shall report the error in the server log. If every entry in the batch request contains only empty strings, an UNKNOWN_MEMBER_ID error will be returned.
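
The processing rules above could be sketched as follows. This is an illustration under stated assumptions, not the broker implementation: `MemberIdentity`, `LeaveOutcome`, and `LeaveGroupSketch` are hypothetical names, and the tuple order (`group.instance.id` first) is assumed. Reporting one outcome per tuple is a simplification, since the text does not say how errors are reported for individual entries. The sketch also assumes that MEMBER_ID_MISMATCH applies when a supplied `member.id` differs from the recorded one, and that an unknown dynamic `member.id` yields UNKNOWN_MEMBER_ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical (group.instance.id, member.id) tuple; the field order is an assumption.
record MemberIdentity(String groupInstanceId, String memberId) {}

// Outcome names follow the error codes in Errors.java; NONE means the member was removed.
enum LeaveOutcome { NONE, GROUP_INSTANCE_ID_NOT_FOUND, MEMBER_ID_MISMATCH, UNKNOWN_MEMBER_ID }

final class LeaveGroupSketch {
    // Static members: group.instance.id -> member.id recorded by the coordinator.
    private final Map<String, String> staticMembers = new HashMap<>();
    // Dynamic members are identified by member.id only.
    private final Set<String> dynamicMembers = new HashSet<>();

    void addStatic(String groupInstanceId, String memberId) {
        staticMembers.put(groupInstanceId, memberId);
    }

    void addDynamic(String memberId) {
        dynamicMembers.add(memberId);
    }

    // Processes every tuple in the batch and returns one outcome per tuple (a simplification).
    List<LeaveOutcome> handle(List<MemberIdentity> batch) {
        List<LeaveOutcome> outcomes = new ArrayList<>();
        for (MemberIdentity id : batch) {
            outcomes.add(removeOne(id));
        }
        return outcomes;
    }

    private LeaveOutcome removeOne(MemberIdentity id) {
        if (id.groupInstanceId().isEmpty()) {
            if (id.memberId().isEmpty()) {
                // Error case 3: both ids empty; report it in the server log.
                System.err.println("leave group entry has neither group.instance.id nor member.id");
                return LeaveOutcome.UNKNOWN_MEMBER_ID;
            }
            // Rule 2: dynamic member, removed by member.id as in the existing logic.
            return dynamicMembers.remove(id.memberId())
                    ? LeaveOutcome.NONE
                    : LeaveOutcome.UNKNOWN_MEMBER_ID;
        }
        String recorded = staticMembers.get(id.groupInstanceId());
        if (recorded == null) {
            // Error case 1: the instance id is not part of the group.
            return LeaveOutcome.GROUP_INSTANCE_ID_NOT_FOUND;
        }
        // Rule 1: a non-empty member.id acts as validation and must match the record.
        if (!id.memberId().isEmpty() && !id.memberId().equals(recorded)) {
            return LeaveOutcome.MEMBER_ID_MISMATCH;
        }
        staticMembers.remove(id.groupInstanceId());
        return LeaveOutcome.NONE;
    }
}
```

With two static members and one dynamic member registered, a batch that mixes valid and invalid tuples would yield one outcome per entry, which makes it easy to see which rule each tuple hit.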

...

Currently, scale-down is controlled by the session timeout: if the user removes over-provisioned consumer members, the group waits until the session timeout expires before triggering the rebalance. This is not ideal and motivates us to change LeaveGroupRequest to include a list of tuples of `group.instance.id` and `member.id`, so that we can batch-remove offline members and trigger a rebalance immediately without them.
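
As a rough illustration of the batch removal described here, the tuple list for a scale-down could be assembled as below. The class and method names are hypothetical, and using `Map.Entry` with `group.instance.id` as the key is only an assumed stand-in for the `List[Tuple[String, String]]` field, not a proposed client API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LeaveGroupBatchSketch {
    private LeaveGroupBatchSketch() {}

    // One (group.instance.id, member.id) tuple per offline static member.
    // member.id is left empty, so only the instance id identifies the member.
    static List<Map.Entry<String, String>> forOfflineStaticMembers(List<String> instanceIds) {
        return instanceIds.stream()
                .map(id -> Map.entry(id, ""))
                .collect(Collectors.toList());
    }

    // A dynamic member sends a singleton list with an empty group.instance.id.
    static List<Map.Entry<String, String>> forDynamicMember(String memberId) {
        return List.of(Map.entry("", memberId));
    }
}
```

For example, `forOfflineStaticMembers(List.of("consumer-4", "consumer-5"))` would describe removing two over-provisioned instances in one request instead of waiting for each session timeout.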

...