Status
Current state: "Under Discussion"
Discussion thread: https://www.mail-archive.com/dev@kafka.apache.org/msg77721.html
JIRA: KAFKA-5565
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Consumer group rebalancing can degrade client performance, and a rebalance sometimes takes longer than expected. Metrics showing how many rebalances are currently in progress would make both problems visible.
Public Interfaces
We should add new metrics identifying how many consumer groups are in each state.
- NumGroupsRebalancing: the number of consumer groups which are currently in the Rebalancing state.
- NumGroupsAwaitingSync: the number of consumer groups which are currently in the AwaitingSync state.
- NumGroupsStable: the number of consumer groups which are currently in the Stable state.
- NumGroupsDead: the number of consumer groups which are currently in the Dead state.
- NumGroupsEmpty: the number of consumer groups which are currently in the Empty state.
Combined with the existing NumGroups metric, these metrics show what percentage of groups are in a particular state at a given time.
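The relationship between the per-state counts and NumGroups can be sketched as follows. This is only an illustration, not the broker implementation: the class GroupStateGauges, its enum, and both method names are hypothetical, and it assumes the caller already has the current state of every group in hand.

```java
import java.util.Collection;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Kafka code base.
final class GroupStateGauges {

    // The five states named in the metric list above.
    enum GroupState { REBALANCING, AWAITING_SYNC, STABLE, DEAD, EMPTY }

    // Counts how many groups are in each state. States with no groups map to 0,
    // so every per-state metric always has a value to report.
    static Map<GroupState, Integer> countByState(Collection<GroupState> groupStates) {
        Map<GroupState, Integer> counts = new EnumMap<>(GroupState.class);
        for (GroupState state : GroupState.values()) {
            counts.put(state, 0);
        }
        for (GroupState state : groupStates) {
            counts.merge(state, 1, Integer::sum);
        }
        return counts;
    }

    // Percentage of groups in one state, given that state's count and NumGroups.
    // Returns 0.0 when there are no groups, to avoid dividing by zero.
    static double percentInState(int numGroupsInState, int numGroups) {
        if (numGroups == 0) {
            return 0.0;
        }
        return 100.0 * numGroupsInState / numGroups;
    }
}
```

A monitoring system would normally do the division itself from the reported values, for example NumGroupsRebalancing divided by NumGroups; percentInState only shows the arithmetic.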
Compatibility, Deprecation, and Migration Plan
None. This proposal only adds new metrics; existing metrics are unchanged.
Rejected Alternatives
Instead of adding a metric, we could look through the broker logs to see when consumer group rebalances begin and end. However, this would be more difficult for metrics monitoring systems to track, since they would have to parse the broker logs.
Another option would be to provide more information about groups through the AdminClient. While this would be useful, it would not give the at-a-glance summary of group states that a metric does.