...

Adding new static memberships should be straightforward. Because this operation should happen fast enough for capacity to catch up quickly, we are defining another config called expansion timeout. Ideally, we could introduce a new status called "learner", in which newly started hosts first catch up with the progress of their assigned tasks before triggering the rebalance, so that we don't see a sudden dip in progress.

With the introduction of static membership

For scaling up from empty stage, we plan to deprecate 

GroupMetadata.scala
def expansionTimeoutMs: Int // Default 5 min

This is the timeout we count down, starting from the first joining member's request, before triggering exactly one rebalance (i.e., the estimated time to spin up the new hosts). We advise setting it to roughly the same value as the session timeout, so that the workload becomes balanced when you 2X or 3X your stream job. Example with expansion timeout 5 min: 
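The countdown described above can be sketched as a small state machine. This is only an illustration under stated assumptions: `ExpansionTimer`, `onJoin`, and `shouldRebalance` are hypothetical names that do not exist in Kafka, and the sketch assumes the countdown starts at the first join request and is not reset by later joins.

```scala
// Hypothetical sketch, not part of Kafka: models the expansion timeout countdown.
final class ExpansionTimer(expansionTimeoutMs: Long) {
  // Timestamp of the first join request in the current window, if one is open.
  private var firstJoinMs: Option[Long] = None

  /** Record a join request; only the first one starts the countdown. */
  def onJoin(nowMs: Long): Unit =
    if (firstJoinMs.isEmpty) firstJoinMs = Some(nowMs)

  /** True exactly once per window, as soon as the countdown has expired. */
  def shouldRebalance(nowMs: Long): Boolean =
    firstJoinMs match {
      case Some(start) if nowMs - start >= expansionTimeoutMs =>
        firstJoinMs = None // close the window so only one rebalance fires
        true
      case _ => false
    }
}
```

With a 5 min timeout, a first join at time 0 followed by more joins over the next few minutes would lead to a single rebalance at the 5 min mark, regardless of how many hosts joined in between.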

deprecate group.initial.rebalance.delay.ms since 

Rolling bounce

Currently there is a config called rebalance timeout, which is set from the consumer's max.poll.interval.ms. We set it to the poll interval because the consumer can only send requests within a call to poll(), and we want to wait long enough for the join group request. When the rebalance timeout is reached, the group moves to the CompletingRebalance stage and removes the members that have not rejoined. This conflicts with the design of static membership, because those temporarily unavailable members will reattempt to join the group and trigger extra rebalances. Effectively, we are using expansion timeout to replace rebalance timeout, which is configured by max.poll.interval.ms on the client side, and registration timeout to replace session timeout. We are also replacing the group.initial.rebalance.delay.ms config on the broker side to make sure the behavior is consistent for static membership.
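For orientation, the existing settings mentioned above could be collected as follows. This is a sketch only: the object name `CurrentTimeoutConfigs` is hypothetical, the values are illustrative rather than recommendations, and the comments restate the replacements proposed in this section.

```scala
import java.util.Properties

// Hypothetical holder for the existing configs this section refers to.
object CurrentTimeoutConfigs {
  // Client-side (consumer) settings; values are illustrative.
  val consumer: Properties = {
    val p = new Properties()
    // Doubles as the rebalance timeout today; to be replaced by expansion timeout.
    p.setProperty("max.poll.interval.ms", "300000")
    // Session timeout; to be replaced by registration timeout.
    p.setProperty("session.timeout.ms", "10000")
    p
  }

  // Broker-side setting that the proposal also plans to replace.
  val broker: Properties = {
    val p = new Properties()
    p.setProperty("group.initial.rebalance.delay.ms", "3000")
    p
  }
}
```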

Fault-tolerance of static membership

...