...

Registration timeout is the timeout after which we trigger a rebalance when a member has been offline for too long. It should usually be set much larger than the session timeout, which is used to detect consumer health. Like the session timeout, it is monitored through heartbeats. By setting it to 15 ~ 30 minutes, we loosen the tracking of static member progress and hand member management over to the client application's environment, such as Kubernetes (K8s). Of course, we should not wait forever for a member to come back online simply to reduce rebalances. Eventually the member will be kicked out of the group and a final rebalance is triggered. Note that we track the earliest offline member and compare how long it has been offline against the registration timeout. Example below with a registration timeout of 15 min:

...
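The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the coordinator's actual implementation: the class and method names are hypothetical, times are assumed to be in milliseconds, and `on_member_offline` is assumed to be called once heartbeats have stopped long enough for the session timeout to declare the member gone. Only the earliest offline member is compared against the registration timeout, and when it expires a single rebalance removes every member still offline.

```python
from typing import Dict, List, Optional


class RegistrationTimeoutTracker:
    """Hypothetical sketch of registration-timeout tracking for static members."""

    def __init__(self, registration_timeout_ms: int) -> None:
        self.registration_timeout_ms = registration_timeout_ms
        # member id -> time (ms) at which the member was first seen offline
        self.offline_since: Dict[str, int] = {}

    def on_member_offline(self, member_id: str, now_ms: int) -> None:
        # Assumed hook: session timeout marked the member dead. No rebalance yet;
        # keep the first offline time if the member is already being tracked.
        self.offline_since.setdefault(member_id, now_ms)

    def on_heartbeat(self, member_id: str) -> None:
        # The static member came back before the registration timeout: forget it.
        self.offline_since.pop(member_id, None)

    def earliest_offline_ms(self) -> Optional[int]:
        return min(self.offline_since.values(), default=None)

    def should_rebalance(self, now_ms: int) -> bool:
        # Only the earliest offline member is compared with the registration timeout.
        earliest = self.earliest_offline_ms()
        return earliest is not None and now_ms - earliest >= self.registration_timeout_ms

    def rebalance(self) -> List[str]:
        # One final rebalance kicks out every member that is still offline.
        removed = sorted(self.offline_since)
        self.offline_since.clear()
        return removed
```

With a 15 min timeout, a member that comes back online before its deadline simply drops out of tracking, and the deadline moves to the next earliest offline member.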

When scaling down the application, it is advisable to remove the members quickly, so that once the registration timeout has elapsed since the first member left, we trigger one single rebalance and get progress back on track. Note that here we sacrifice up to 15 min of liveness (the registration timeout) for the sake of minimizing state shuffling.
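To see why a fast scale-down matters, the timeline can be simulated. This sketch assumes (hypothetically) that departed members never return, that each rebalance fires at the earliest pending departure plus the registration timeout, and that it removes every member already offline at that moment; `rebalance_times` is an illustrative helper, not part of any Kafka API.

```python
from typing import List, Tuple


def rebalance_times(departures_ms: List[int],
                    registration_timeout_ms: int) -> List[Tuple[int, List[int]]]:
    """Return (rebalance time, departures removed by it) for a scale-down."""
    pending = sorted(departures_ms)
    schedule: List[Tuple[int, List[int]]] = []
    while pending:
        # The earliest offline member determines when the next rebalance fires.
        fire_at = pending[0] + registration_timeout_ms
        # Everyone who left by then is removed in the same rebalance.
        batch = [d for d in pending if d <= fire_at]
        pending = [d for d in pending if d > fire_at]
        schedule.append((fire_at, batch))
    return schedule
```

With a 15 min timeout, three members leaving within the first two minutes are removed by one rebalance at the 15 min mark, whereas departures spread 10 min apart need two rebalances, which is the extra state shuffling a slow scale-down causes.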

A corner case is when A and B drop off the group at nearly the same time. With static membership, we still need to sync the group to determine how many existing members are still alive; otherwise, an unnecessary rebalance will be triggered later.

Another case is adding new static members (scale up!). This operation should happen fast enough for capacity to catch up quickly, so we define another config called expansion timeout.

...