...

Example: Member leaves
Initial group and assignment: A(T1), B(T2), C(T3), D(T4)
D(T4) bounces. First, it leaves the group.
Rebalance is triggered. Remaining members rejoin with subscriptions: 
A(T,assigned:T1), B(T,assigned:T2), C(T,assigned:T3)
Leader detects "lost" partition T4. Sends empty assignments, without revocations and with a scheduled rebalance timeout of t1: 
A(assigned:,revoked:,t1), B(assigned:,revoked:,t1), C(assigned:,revoked:,t1)
Before t1 expires, member D joins again as D'
Rebalance is triggered. All members join with subscriptions: 
A(T,assigned:T1), B(T,assigned:T2), C(T,assigned:T3), D'(T,assigned:)
Leader sends updated assignment:
A(assigned:,revoked:,-), B(assigned:,revoked:,-), C(assigned:,revoked:,-), D'(assigned:T4,revoked:,-)
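
The leader's decisions in the example above can be sketched as follows. This is only an illustrative sketch, not the actual implementation: `LeaderState`, `compute_assignment`, and the rule that lost partitions go to members that joined without any assignment are all assumptions made for this example, and a remaining delay of 0.0 stands for the "-" shown above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class LeaderState:
    """Hypothetical leader-side bookkeeping (an assumption for this sketch)."""
    deadline: Optional[float] = None  # absolute time at which the timeout t1 expires


def compute_assignment(state: LeaderState, all_tasks: List[str],
                       members: Dict[str, Set[str]], now: float,
                       delay: float) -> Dict[str, dict]:
    """members maps each member to the tasks it reports as currently assigned.

    Returns, per member, the newly assigned tasks, the revoked tasks, and the
    remaining scheduled rebalance delay (0.0 means no rebalance is scheduled).
    """
    owned = set().union(*members.values())
    lost = sorted(set(all_tasks) - owned)
    result = {m: {"assigned": [], "revoked": [], "delay": 0.0} for m in members}
    if not lost:
        state.deadline = None
        return result

    # Assumption: a member that joins without any tasks is a bounced or new member.
    idle = sorted(m for m, tasks in members.items() if not tasks)
    expired = state.deadline is not None and now >= state.deadline
    if idle or expired:
        # Hand out the lost tasks round-robin, preferring members without tasks.
        targets = idle or sorted(members)
        for i, task in enumerate(lost):
            result[targets[i % len(targets)]]["assigned"].append(task)
        state.deadline = None
        return result

    # Otherwise hold the lost tasks back and schedule (or keep) the delayed rebalance.
    if state.deadline is None:
        state.deadline = now + delay
    for entry in result.values():
        entry["delay"] = state.deadline - now
    return result


state = LeaderState()
tasks = ["T1", "T2", "T3", "T4"]
# D has left: T4 is lost, everyone gets an empty assignment and the delay t1.
first = compute_assignment(state, tasks, {"A": {"T1"}, "B": {"T2"}, "C": {"T3"}},
                           now=0.0, delay=10.0)
# D' joins before t1 expires and receives T4.
second = compute_assignment(state, tasks,
                            {"A": {"T1"}, "B": {"T2"}, "C": {"T3"}, "D'": set()},
                            now=4.0, delay=10.0)
```

In this sketch no member is ever asked to revoke anything, which mirrors the empty `revoked:` fields in the example.
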
  • Leader exits and a new leader is elected in this rebalance - This case is treated as the loss of a member. Enough information is available to at least heuristically detect whether the leader bounced back, so that its partitions can be reassigned immediately, or whether the missing partitions should be put into purgatory for the duration of the timeout.
    • What if the previous leader was in the middle of waiting for a scheduled timeout?
      • If the new leader was already in the group, they can just use the timeout they should already know about and not override it.
      • If the new leader is new to the group, they should fall back to assuming there wasn't a wait period in effect.
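
The two sub-cases above can be sketched as a small helper. This is a hedged illustration only: `effective_deadline` and its parameters are hypothetical names, and treating an already expired timeout as "no wait in effect" is an assumption of this sketch rather than something stated above.

```python
from typing import Optional


def effective_deadline(was_in_group: bool, known_deadline: Optional[float],
                       now: float) -> Optional[float]:
    """Return the deadline the new leader should honor, or None if no wait applies."""
    if not was_in_group:
        # New to the group: fall back to assuming no wait period was in effect.
        return None
    if known_deadline is not None and known_deadline > now:
        # Already in the group: keep the timeout it knows about, do not override it.
        return known_deadline
    # Assumption: an expired or unknown timeout means nothing is pending.
    return None


kept = effective_deadline(was_in_group=True, known_deadline=10.0, now=4.0)
fresh = effective_deadline(was_in_group=False, known_deadline=10.0, now=4.0)
```
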

...