...

Currently there is a config called rebalance timeout, which is set from the consumer's `max.poll.interval.ms`. We use the poll interval because a consumer can only send requests within a call to poll(), and we want to allow sufficient time for the join group request. When the rebalance timeout is reached, the group moves to the CompletingRebalance stage and removes the group members that have not rejoined. This conflicts with the design of static membership, because those temporarily unavailable members may reattempt the join group request and trigger extra rebalances. Internally we would optimize this logic by making the rebalance timeout responsible only for ending the prepare rebalance stage, without removing non-responsive members immediately. There would be no full rebalance if the lagging consumer sends a JoinGroup request within the session timeout.

In summary, a member will only be removed due to session timeout. When that happens, we shall remove it from both the in-memory static member name mapping and the member list.
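The intended split between the two timeouts can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the broker implementation: the `Group` and `Member` classes, their field names, and the `on_tick` method are hypothetical, and time is passed in explicitly as milliseconds.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    member_id: str       # broker-assigned id (hypothetical field)
    member_name: str     # static member name supplied by the client
    last_seen_ms: int    # time of the last heartbeat or JoinGroup
    joined: bool = False  # has it sent JoinGroup in the current rebalance?


@dataclass
class Group:
    session_timeout_ms: int
    rebalance_timeout_ms: int
    state: str = "Stable"
    rebalance_started_ms: int = 0
    members: dict = field(default_factory=dict)         # member_id -> Member
    static_members: dict = field(default_factory=dict)  # member_name -> member_id

    def add_member(self, member_id, member_name, now_ms):
        self.members[member_id] = Member(member_id, member_name, now_ms)
        self.static_members[member_name] = member_id

    def prepare_rebalance(self, now_ms):
        self.state = "PreparingRebalance"
        self.rebalance_started_ms = now_ms
        for member in self.members.values():
            member.joined = False

    def on_join_group(self, member_name, now_ms):
        member = self.members[self.static_members[member_name]]
        member.last_seen_ms = now_ms
        member.joined = True

    def on_tick(self, now_ms):
        # Rebalance timeout: only ends the prepare stage, removes nobody.
        if (self.state == "PreparingRebalance"
                and now_ms - self.rebalance_started_ms >= self.rebalance_timeout_ms):
            self.state = "CompletingRebalance"
        # Session timeout: the only path that removes a member, and it
        # clears both the member list and the static member name mapping.
        expired = [m for m in self.members.values()
                   if now_ms - m.last_seen_ms >= self.session_timeout_ms]
        for member in expired:
            del self.members[member.member_id]
            del self.static_members[member.member_name]
        return [m.member_name for m in expired]
```

In this model, a member that misses the rebalance timeout stays registered and can still rejoin under its existing name until its session expires.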

...

  1. It gives users more control of their member name; this would help for debugging purposes.
  2. It is more cloud-/k8s-and-alike-friendly: when we move an instance from one container to another, we can copy the member name to the config files.
  3. It does not require the consumer to be able to access another dir on the local disks (think of consumers deployed on AWS with remote disks mounted).
  4. By allowing consumers to optionally specify a member name, this rebalance benefit can be easily migrated to Connect and Streams as well, which rely on consumers, even in a cloud environment.

Future Works

Beyond static membership, we could unblock many interactive use cases between broker and consumer. We will initiate separate discussion threads once 345 is done. Examples are:

  1. Pre-registration (proposed by Jason). The client user could provide a list of hard-coded `member.name` values so that the server could respond to scaling operations more intelligently. For example, when we scale up the fleet by defining 4 new client member names, the server shall wait until all 4 new members have joined the group before kicking off the rebalance. The same applies to scaling down.
  2. Add hot standby hosts by defining `target.group.size` (proposed by Mayuresh). We shall keep some idle consumers within the group, and when one of the active members goes offline, we shall trigger a hot swap because the current group size is smaller than `target.group.size`. With this change we might not even need to extend the session timeout, since we could easily put a spare consumer to work.
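
The two ideas above reduce to simple decisions on the coordinator side, which can be sketched as follows. This is a minimal illustration only: the function names `ready_to_rebalance` and `promote_standbys` are hypothetical, and neither the pre-registration list nor `target.group.size` exists as an implemented feature in this proposal.

```python
def ready_to_rebalance(expected_member_names, joined_member_names):
    """Pre-registration: hold the rebalance until every pre-registered
    member name has joined the group."""
    return set(expected_member_names) <= set(joined_member_names)


def promote_standbys(active, idle, target_group_size):
    """Hot standby: while the active set is smaller than the target size,
    move idle members into it. Returns the new (active, idle) lists."""
    active, idle = list(active), list(idle)
    while len(active) < target_group_size and idle:
        active.append(idle.pop(0))
    return active, idle
```

For instance, with 4 pre-registered names and only 2 of them joined, `ready_to_rebalance` stays false, so the server would keep waiting instead of rebalancing twice.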