Status
Current state: Under Discussion
Discussion thread: TBD
JIRA:
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Today the group coordinator accepts an unlimited number of join group requests into the membership metadata. There is a potential risk, described in the JIRA, where too many illegal joining members can exhaust broker memory before the session timeout garbage-collects them. To ensure broker stability, we propose to enforce a hard limit on the size of a consumer group in order to prevent explosion of the server-side cache/memory.
Public Interfaces
We propose to add a new configuration to KafkaConfig.scala; its behavior will affect the following coordinator APIs:
def handleJoinGroup(...)
def handleSyncGroup(...)
where we shall enforce the group size capping rules on incoming requests.
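To illustrate the intended enforcement point, below is a minimal, self-contained sketch (not actual GroupCoordinator code; names such as SimpleGroup and checkJoin are hypothetical stand-ins) of how a join request could be rejected once a group is at capacity, while existing members are still allowed to rejoin:
object GroupSizeCapSketch {
  // Illustrative stand-in for the coordinator's per-group membership state.
  final case class SimpleGroup(groupId: String, members: Set[String])

  sealed trait JoinResult
  case object Accepted extends JoinResult
  case object GroupMaxSizeReached extends JoinResult

  // Reject unknown members once the group has reached group.max.size;
  // members already in the group may still rejoin.
  def checkJoin(group: SimpleGroup, memberId: String, groupMaxSize: Int): JoinResult =
    if (!group.members.contains(memberId) && group.members.size >= groupMaxSize)
      GroupMaxSizeReached
    else
      Accepted

  def main(args: Array[String]): Unit = {
    val group = SimpleGroup("my-group", Set("member-1", "member-2"))
    println(checkJoin(group, "member-3", groupMaxSize = 2)) // GroupMaxSizeReached
    println(checkJoin(group, "member-1", groupMaxSize = 2)) // Accepted (rejoin of existing member)
  }
}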
Proposed Changes
We shall add a config called group.max.size on the coordinator side.
val GroupMaxSizeProp = "group.max.size"
...
val GroupMaxSize = 1000000
...
.define(GroupMaxSizeProp, INT, Defaults.GroupMaxSize, MEDIUM, GroupMaxSizeDoc)
The default value of 1,000,000 proposed here is based on a rough size estimate of 120 B of metadata per member, so the maximum memory usage per group is roughly 120 B * 1,000,000 ≈ 120 MB, which should leave 5x~10x headroom for most use cases I know of. Further discussion on the default value is welcome!
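For illustration, an operator who wants a tighter bound than the default could override the new property in the broker configuration like any other broker-level setting (the value below is purely an example, not a recommendation):
group.max.size=10000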
Implementation-wise, we shall block registration of new members once a group reaches its capacity, and define a new error type:
GROUP_MAX_SIZE_REACHED(77, "Consumer group is already at its full capacity.", GroupMaxSizeReachedException::new);
Since the cap should never be reached under normal operation, a consumer that receives this error will fail itself rather than retry, to reduce load on the broker; reaching the capacity limit is a red flag indicating a client-side logic bug, and treating the error as fatal helps protect server stability.
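A simplified, hypothetical sketch of this fail-fast client behavior follows (the real change would live in the consumer's join-group response handling; GroupMaxSizeReachedException here mirrors the proposed exception and the object name is illustrative):
object GroupMaxSizeErrorHandlingSketch {
  // Illustrative stand-in for the proposed fatal exception.
  class GroupMaxSizeReachedException(msg: String) extends RuntimeException(msg)

  // Error code proposed in this KIP.
  val GroupMaxSizeReachedCode: Short = 77

  // Retriable errors would normally trigger a rejoin; the new error is treated
  // as fatal so the consumer fails fast instead of retrying against the coordinator.
  def handleJoinGroupError(errorCode: Short): Unit =
    if (errorCode == GroupMaxSizeReachedCode)
      throw new GroupMaxSizeReachedException("Consumer group is already at its full capacity.")
    // other error codes keep their existing handling (retry, rebalance, etc.)

  def main(args: Array[String]): Unit =
    handleJoinGroupError(GroupMaxSizeReachedCode) // throws: the consumer fails itself
}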
Compatibility, Deprecation, and Migration Plan
- This is a backward compatible change.
Rejected Alternatives
Earlier discussion proposed other approaches, such as enforcing a memory limit or changing the initial rebalance delay. We believe those approaches are "either not strict or not intuitive" (quote from Stanislav), compared with a group size cap, which is easy for end users to understand and to configure for their own use cases.