Status
Current state: Under Discussion
Discussion thread:
JIRA: KAFKA-2687
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
The capability to inspect consumer group metadata such as the list of members in the group and their partition assignments is crucial for debugging, monitoring, and administration, but there is currently no API which exposes this information for the new consumer. In the initial design of the group management protocol for the new consumer, it was assumed that group metadata would be persisted in Zookeeper by group coordinators. This would have allowed tooling to inspect Zookeeper directly to obtain this metadata in the same way that it does for the old consumer. However, the alternative of storing group metadata in Kafka was suggested in KAFKA-2017, which has several advantages, such as reducing overall load on Zookeeper and maintaining a clean separation of client state (stored in Kafka) from broker state (stored in Zookeeper). At the time of writing, no general agreement has been reached on this issue, so to avoid forcing a hasty decision, the consensus seems to be to revisit the question after the 0.9 release.
With this issue temporarily tabled, we still have the problem of how group metadata can be viewed on an active cluster. Inspecting broker logs to find current group membership and assignments is not a viable solution, even in the short term. To address this problem, we propose to modify the GroupMetadataRequest (formerly known as the ConsumerMetadataRequest) to support returning group metadata. This request is currently used by clients to find the group coordinator. The idea is to allow this request to return group metadata when it is received by the group's coordinator. For tooling, the workflow would be to send one GroupMetadata request to find the coordinator for the group, and one further request to retrieve the metadata from the coordinator. Additionally, we extend this request to support returning multiple groups so that it can be used to get a list of the groups hosted by each coordinator. To get a list of all groups in the cluster, tools will have to query each broker separately and combine the results.
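The cluster-wide listing step above could be sketched as follows. This is an illustrative sketch, not a real client API: the `all_groups()` helper is hypothetical and stands in for sending a GroupMetadata request with the IncludeAllGroups flag set to a single broker.

```python
# Sketch of cluster-wide group discovery. FakeBroker simulates a broker
# answering a GroupMetadata request with IncludeAllGroups set; a real tool
# would issue the request over the wire.

class FakeBroker:
    def __init__(self, owned_groups):
        self.owned_groups = owned_groups  # group id -> metadata dict

    def all_groups(self):
        """Return metadata for all groups this broker coordinates."""
        return dict(self.owned_groups)

def list_all_groups(brokers):
    """Query every broker and merge the results into one cluster-wide view."""
    groups = {}
    for broker in brokers:
        # Each group is owned by exactly one coordinator, so merging is a
        # plain union; a key collision would indicate stale metadata.
        groups.update(broker.all_groups())
    return groups
```

A tool would call `list_all_groups` once per invocation, since no single broker knows about groups coordinated elsewhere.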
Public Interfaces
The specific changes to the GroupMetadata request/response schemas are given below. Briefly, the schemas are modified to support querying multiple groups, and the response is modified to return group and member metadata. The request includes a flag to return the group metadata for all groups managed by the coordinator. We could alternatively use an empty list to indicate this case, but that pattern seems to be frowned upon after experience from the TopicMetadata request.
We also add a flag to the request which indicates whether member metadata is desired (although we have listed them separately, these two flags could be combined into a single Option field). The purpose of this field is to allow the retrieval of member-specific information such as topic subscriptions and partition assignments for the new consumer. Since this data can be large, however, the flag can be disabled for use cases which do not want the overhead. For example, clients which are only trying to discover the current coordinator would disable the flag.
GroupMetadataRequest => IncludeAllGroups IncludeMemberMetadata Groups
  IncludeAllGroups => int8
  IncludeMemberMetadata => int8
  Groups => [GroupId]
    GroupId => string

GroupMetadataResponse => [ErrorCode GroupId Coordinator GroupMetadata MemberMetadata]
  ErrorCode => int16
  GroupId => string
  Coordinator => Id Host Port
    Id => int32
    Host => string
    Port => int32
  GroupMetadata => State ProtocolType Generation Protocol
    State => int8
    ProtocolType => string
    Generation => int32
    Protocol => string
  MemberMetadata => Leader Members
    Leader => string
    Members => [MemberId MemberHost ClientId MemberMetadata MemberAssignment]
      MemberId => string
      MemberHost => string
      ClientId => string
      MemberMetadata => bytes
      MemberAssignment => bytes
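To make the layout concrete, the request body above could be serialized roughly as follows, using Kafka's usual wire conventions (int16-length-prefixed strings, int32-count-prefixed arrays). This is an illustrative sketch of the proposed v1 body, not a normative encoding.

```python
import struct

def encode_string(s):
    """Kafka wire string: int16 length prefix followed by UTF-8 bytes."""
    b = s.encode("utf-8")
    return struct.pack(">h", len(b)) + b

def encode_group_metadata_request(include_all, include_members, group_ids):
    """Encode the proposed v1 GroupMetadataRequest body (sketch)."""
    buf = struct.pack(">b", 1 if include_all else 0)      # IncludeAllGroups
    buf += struct.pack(">b", 1 if include_members else 0)  # IncludeMemberMetadata
    buf += struct.pack(">i", len(group_ids))               # array count prefix
    for gid in group_ids:
        buf += encode_string(gid)                          # GroupId
    return buf
```

For example, a request for a single group with member metadata enabled would occupy 1 + 1 + 4 bytes of flags and count, plus 2 bytes of length prefix and the group id itself.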
Proposed Changes
We propose to implement the above request/response schema as version 1 of the group metadata request with the following semantics:
- Brokers will handle GroupMetadata requests with the IncludeAllGroups flag set by returning metadata for all groups currently registered with them.
- If the broker is not the coordinator for a requested groupId, it will return only the coordinator host information.
- If the broker is the coordinator for a requested groupId, it will return current group metadata and member metadata according to whether the IncludeMemberMetadata flag is set and whether any metadata is available (see below).
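The broker-side semantics listed above can be sketched as a small dispatch routine. The request/response shapes and helper names here are hypothetical; they illustrate the flag handling, not the actual broker implementation.

```python
def handle_group_metadata(request, this_broker, local_groups, coordinator_for):
    """Sketch of the proposed v1 broker-side semantics.

    `local_groups` maps group id -> metadata for groups this broker
    coordinates; `coordinator_for` resolves any group id to a broker id.
    """
    if request["include_all_groups"]:
        group_ids = list(local_groups)     # all groups registered here
    else:
        group_ids = request["groups"]
    responses = []
    for gid in group_ids:
        coord = coordinator_for(gid)
        if coord != this_broker:
            # Not the coordinator: return only the coordinator host info.
            responses.append({"group": gid,
                              "error": "NOT_COORDINATOR_FOR_GROUP",
                              "coordinator": coord})
            continue
        entry = {"group": gid, "error": "NONE", "coordinator": coord,
                 "group_metadata": local_groups[gid]["group"]}
        if request["include_member_metadata"]:
            # Member metadata is returned only when explicitly requested,
            # since assignments can be large.
            entry["member_metadata"] = local_groups[gid]["members"]
        responses.append(entry)
    return responses
```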
Group States: Below we list the possible group states and how each affects the associated metadata.
- Dead: There are no active members in the group. All other group metadata fields will be set to empty.
- Initializing: The group is loading metadata from its storage. All other group metadata fields will be set to empty and no member metadata will be returned.
- Rebalancing: The group is undergoing a rebalance. The generation, protocolType, and protocol will be set according to the prior generation. No member metadata will be returned.
- Stable: The group has a valid generation. Group and member metadata will be set based on the active generation.
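The state-dependent fields above can be summarized as a small mapping. This is a sketch of the described semantics with hypothetical field names, not broker code.

```python
# Empty placeholder values returned for Dead and Initializing groups (sketch).
EMPTY_GROUP = {"generation": -1, "protocol_type": "", "protocol": ""}

def visible_metadata(state, group):
    """Return (group metadata, member metadata) per the proposed semantics."""
    if state in ("Dead", "Initializing"):
        return EMPTY_GROUP, []                 # all fields empty, no members
    if state == "Rebalancing":
        return group["prior"], []              # prior generation, no members
    return group["current"], group["members"]  # Stable: active generation
```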
Member Metadata: Since the memberId is randomly generated, we must include additional information to help users identify the group member. The client host can be obtained from the session of the member's JoinGroup request and the clientId from the JoinGroup request itself. These fields will stay fixed until the next rebalance.
Since both Copycat and the new consumer use Kafka's group management, the same request can be used by administrators to inspect both types of clients (as well as any future use cases that may come up). The protocolType specifies what kind of group it is and how metadata/assignments should be decoded. For consumer-specific tooling, any groups which do not have a "consumer" protocol type can be ignored.
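For consumer-specific tooling, the protocolType filter described above amounts to a one-line predicate; the dictionary shape here is assumed for illustration.

```python
def consumer_groups(groups):
    """Keep only groups using the consumer protocol; Copycat groups and any
    future protocol types are ignored by consumer-specific tooling."""
    return [g for g in groups if g["protocol_type"] == "consumer"]
```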
Error Codes: The following error codes are possible with this request:
- COORDINATOR_NOT_AVAILABLE: The broker could not determine the coordinator for the associated groupId. Under the current implementation, this would happen if the leader of the consumer offsets topic associated with the groupId were unavailable.
- NOT_COORDINATOR_FOR_GROUP: The broker is not the coordinator for the associated group. The Coordinator field will be set to the host information of the coordinator. Clients should use this to determine if they need an additional query to satisfy their request.
- NONE: The request was satisfied successfully by the group's coordinator.
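On the client side, handling NOT_COORDINATOR_FOR_GROUP reduces to following the Coordinator field with one additional request. The `send` transport helper is hypothetical; a real tool would issue GroupMetadata requests over the network.

```python
def query_group(send, group_id, bootstrap_broker):
    """Sketch of client-side error handling for the proposed request.

    `send(broker, group_id)` is a hypothetical helper that issues a
    GroupMetadata request to the given broker and returns the response.
    """
    resp = send(bootstrap_broker, group_id)
    if resp["error"] == "NOT_COORDINATOR_FOR_GROUP":
        # The first broker was not the coordinator; it told us who is.
        resp = send(resp["coordinator"], group_id)
    return resp
```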
Compatibility, Deprecation, and Migration Plan
Version 0 of the GroupMetadata request supports only a single groupId and returns only coordinator host information in the response. Since this is a subset of the functionality provided by this request, there should be no problem continuing to support it. One notable difference between the two versions is the use of the NOT_COORDINATOR_FOR_GROUP error. The new version uses this to indicate to clients that the broker is not the coordinator for the group, while the old request would return no error (since the request was only used to locate the coordinator and that was successfully accomplished). An alternative would be to keep the old behavior and have clients check the brokerId of the coordinator in the response to know whether an additional request is needed. It might also be possible to use an UNKNOWN group state to indicate the same.
One thing worth noting for future compatibility is that this request assumes that the coordinator will always have group and member metadata available to it. While it may not always be necessary to store this information in memory, the broker must have some way to load it on demand. Relaxing this requirement would generally mean deprecating and removing this API.
Rejected Alternatives
- If the persistence question were settled, it may be possible to query the storage system directly. For example, if group metadata were stored in Zookeeper, then tools could query it directly in the same way that they currently do. However, in line with KIP-4, it seems preferable to have tools depend only on the Kafka API since this decouples them from the storage implementation and allows for simpler access control.
- Instead of extending the group metadata request, it would also be possible to use a new request type. The main drawback is simply that it requires a new request type. The preference in the Kafka community appears to be keeping the number of request types minimal, and extending the GroupMetadata request to return group metadata seems reasonable.