
Status

Current state: Adopted

Discussion thread: TBD

JIRA: https://issues.apache.org/jira/browse/KAFKA-7610, KAFKA-7824

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

In the current consumer protocol, the `member.id` field is assigned by the broker to track group member status. A new consumer joins the group with `member.id` set to UNKNOWN_MEMBER_ID (an empty string), since it must first receive its identity assignment from the broker. For a request with an unknown member id, the broker blindly accepts the new join group request, stores the member metadata, and returns a UUID to the consumer. The edge case arises when the initial join group request keeps failing due to connection timeouts, when the consumer keeps restarting, or when the max.poll.interval.ms configured on the client is set to infinite (so no rebalance timeout kicks in to clean up the member metadata map). In these cases MemberMetadata entries accumulate in the group metadata cache and can eventually exhaust broker memory. Detecting and fencing invalid join group requests is therefore crucial for broker stability.

This KIP is parallel work to KIP-389, which aims to enforce a hard cap on the group metadata size, and an important complement to KIP-345, which introduces static membership.

...

Errors.java:

```java
// New error returned to a first-time joiner, together with its broker-assigned member id
MEMBER_ID_REQUIRED(79, "Consumer needs to have a valid member id before actually entering group", MemberIdRequiredException::new),
```

We shall also bump the JoinGroup protocol version so that the broker knows whether the consumer can safely handle this error. For example, if we bump the protocol version from m to m+1, every join group request with an unknown member id and version >= m+1 will receive the MEMBER_ID_REQUIRED error, while requests with version <= m will still be blindly accepted for backward compatibility.
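A minimal sketch of the broker-side gate, assuming a version threshold `firstVersionRequiringId` that stands in for m+1 and a session-timeout expiry for pre-allocated ids (as described under Proposed Changes). `JoinGate`, `Outcome`, and `Result` are hypothetical names, not Kafka's GroupCoordinator code, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical broker-side sketch; not Kafka's GroupCoordinator code.
class JoinGate {
    // Empty member id sent by a consumer joining for the first time.
    static final String UNKNOWN_MEMBER_ID = "";

    enum Outcome { ACCEPTED, MEMBER_ID_REQUIRED, UNKNOWN_MEMBER_ID }

    record Result(Outcome outcome, String memberId) {}

    private final int firstVersionRequiringId; // "m + 1" in the text
    private final long sessionTimeoutMs;
    private final Set<String> members = new HashSet<>();
    // Ids handed out with MEMBER_ID_REQUIRED, mapped to their expiry deadline.
    private final Map<String, Long> pendingMembers = new HashMap<>();

    JoinGate(int firstVersionRequiringId, long sessionTimeoutMs) {
        this.firstVersionRequiringId = firstVersionRequiringId;
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    Result join(int version, String memberId, long nowMs) {
        // Evict pre-allocated ids whose owner did not return within the session timeout.
        pendingMembers.values().removeIf(deadline -> deadline <= nowMs);

        if (memberId.equals(UNKNOWN_MEMBER_ID)) {
            String newId = UUID.randomUUID().toString();
            if (version < firstVersionRequiringId) {
                // Old clients cannot handle the new error: accept blindly, as before.
                members.add(newId);
                return new Result(Outcome.ACCEPTED, newId);
            }
            // New clients must bounce once; only the random id is stored, no member metadata.
            pendingMembers.put(newId, nowMs + sessionTimeoutMs);
            return new Result(Outcome.MEMBER_ID_REQUIRED, newId);
        }
        // Known member, or a pending id coming back for its second attempt.
        if (members.contains(memberId) || pendingMembers.remove(memberId) != null) {
            members.add(memberId);
            return new Result(Outcome.ACCEPTED, memberId);
        }
        return new Result(Outcome.UNKNOWN_MEMBER_ID, memberId);
    }

    int pendingCount() { return pendingMembers.size(); }

    int memberCount() { return members.size(); }
}
```

In this sketch, a consumer that times out or restarts before its second attempt leaves behind only a pending id, which expires after the session timeout instead of accumulating as MemberMetadata.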

Proposed Changes

When the client receives a MEMBER_ID_REQUIRED error, it retries the join using the member id returned in the join group response, and the broker is expected to accept the retry if the id matches. If the second join attempt fails with an UNKNOWN_MEMBER_ID error, the client handles it as before: it resets the generation and asks the broker for a new member id by sending an anonymous join group request. We also evict registered member ids through the session timeout so that the pre-allocation map cannot grow indefinitely, although the map should stay small since each entry stores only a randomly generated id.
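The client-side handling can be sketched as a retry loop over JoinGroup attempts. Here the coordinator is modelled as a plain function from the member id sent to a response; `JoinClient`, `JoinError`, `JoinResponse`, and the `maxAttempts` cap are hypothetical, illustrative names rather than the Kafka consumer's actual API.

```java
import java.util.function.Function;

// Hypothetical client-side sketch; not the Kafka consumer's actual API.
class JoinClient {
    // Empty member id used for an anonymous (first-time) join.
    static final String UNKNOWN_MEMBER_ID = "";
    static final int NO_GENERATION = -1;

    enum JoinError { NONE, MEMBER_ID_REQUIRED, UNKNOWN_MEMBER_ID }

    record JoinResponse(JoinError error, String memberId, int generationId) {}

    private String memberId = UNKNOWN_MEMBER_ID;
    private int generation = NO_GENERATION;

    /** Retries JoinGroup until accepted; returns the number of round trips used. */
    int joinGroup(Function<String, JoinResponse> coordinator, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            JoinResponse response = coordinator.apply(memberId);
            switch (response.error()) {
                case NONE -> {
                    memberId = response.memberId();
                    generation = response.generationId();
                    return attempt;
                }
                // Adopt the id assigned by the broker and retry the join with it.
                case MEMBER_ID_REQUIRED -> memberId = response.memberId();
                // The id is not recognised (e.g. it expired): reset the generation
                // and fall back to an anonymous join to obtain a fresh id.
                case UNKNOWN_MEMBER_ID -> {
                    memberId = UNKNOWN_MEMBER_ID;
                    generation = NO_GENERATION;
                }
            }
        }
        throw new IllegalStateException("JoinGroup not accepted after " + maxAttempts + " attempts");
    }

    String memberId() { return memberId; }

    int generation() { return generation; }
}
```

Against a broker that returns MEMBER_ID_REQUIRED, a successful first-time join in this sketch takes two round trips instead of one.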

In effect, where the broker previously accepted an anonymous join as a new member, it now requires one extra round trip to confirm the new member's identity.

Compatibility, Deprecation, and Migration Plan

  • This is a pure broker upgrade that should be transparent to client users, and its impact should be minimal. Clients sending older JoinGroup versions are still accepted without a member id.
  • No compatibility issues have been identified.

Rejected Alternatives

Jason proposed another approach that monitors the TCP connection. As he described, "During the initial JoinGroup, we can detect failed members when the TCP connection fails. This is difficult at the moment because we do not have a mechanism to propagate disconnects from the network layer. A potential option is to treat the disconnect as just another type of request and pass it to the handlers through the request queue." That approach is still under discussion, and we believe KIP-394 is the more intuitive solution.