Co-ordinator failure during rebalance

...

A rebalance operation goes through several phases:

1. The co-ordinator receives notification of a rebalance trigger: a ZooKeeper watch fires for a topic/partition change, a new consumer registers, or an existing consumer dies.
2. The co-ordinator initiates a rebalance operation.
3. Consumers send a JoinGroupRequest.
4. The co-ordinator increments the group's generation id in ZooKeeper.
5. The co-ordinator sends a JoinGroupResponse.
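To make the ordering of these phases concrete, here is a minimal sketch of one group's rebalance as a strict state machine. The class and method names are hypothetical, and ZooKeeper is replaced by a plain dict mapping group id to generation id; this is an illustration under those assumptions, not the co-ordinator's actual implementation.

```python
from enum import Enum, auto


class RebalancePhase(Enum):
    # Hypothetical names for the five phases listed above, in order.
    NOTIFIED = auto()           # 1. rebalance trigger received
    INITIATED = auto()          # 2. co-ordinator starts the rebalance
    JOINS_RECEIVED = auto()     # 3. consumers have sent JoinGroupRequest
    GENERATION_BUMPED = auto()  # 4. generation id incremented in ZooKeeper
    RESPONDED = auto()          # 5. JoinGroupResponse sent


class GroupRebalance:
    """Tracks one group's progress through a single rebalance.

    `zk_generations` is a plain dict (group id -> generation id) standing in
    for ZooKeeper; it is an assumption of this sketch, not a real client.
    """

    def __init__(self, group_id, zk_generations):
        self.group_id = group_id
        self.zk = zk_generations
        self.phase = None  # no phase entered yet
        self.members = set()

    def _advance(self, expected, nxt):
        # Phases must be entered strictly in order.
        if self.phase is not expected:
            raise RuntimeError(f"cannot enter {nxt.name} from {self.phase}")
        self.phase = nxt

    def on_trigger(self):
        self._advance(None, RebalancePhase.NOTIFIED)

    def initiate(self):
        self._advance(RebalancePhase.NOTIFIED, RebalancePhase.INITIATED)

    def on_join_requests(self, consumer_ids):
        self._advance(RebalancePhase.INITIATED, RebalancePhase.JOINS_RECEIVED)
        self.members = set(consumer_ids)

    def bump_generation(self):
        self._advance(RebalancePhase.JOINS_RECEIVED, RebalancePhase.GENERATION_BUMPED)
        # The only step in this sketch that writes durable (ZooKeeper) state.
        self.zk[self.group_id] = self.zk.get(self.group_id, 0) + 1
        return self.zk[self.group_id]

    def send_join_responses(self):
        self._advance(RebalancePhase.GENERATION_BUMPED, RebalancePhase.RESPONDED)
        generation = self.zk[self.group_id]
        # One JoinGroupResponse per member, carrying the new generation id.
        return {consumer: generation for consumer in sorted(self.members)}
```

Calling `on_trigger()`, `initiate()`, `on_join_requests(["c1", "c2"])`, `bump_generation()` and `send_join_responses()` in that order on an empty dict yields generation 1 and the responses `{"c1": 1, "c2": 1}`; calling them out of order raises `RuntimeError`. A co-ordinator that crashes part-way leaves `phase` wherever it stopped, and in this sketch only the generation bump survives the crash, which is why the new co-ordinator has to reconstruct the rest from what ZooKeeper holds.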

The co-ordinator can fail at any of the above phases during a rebalance. This section discusses how failover handles each of these scenarios.

If the co-ordinator fails at step #1, after receiving a notification but before getting a chance to act on it, the new co-ordinator must be able to detect the need for a rebalance once the failover completes. During failover, the co-ordinator reads the group's metadata from ZooKeeper, including the list of topics the group has subscribed to and the previous partition ownership decision. If the number of topics, or the number of partitions of the subscribed topics, differs from that recorded in the previous partition ownership decision, the new co-ordinator detects the need for a rebalance and initiates one for the group. Similarly, if the consumers that connect to the new co-ordinator differ from those in the group's generation stored in ZooKeeper, it initiates a rebalance for the group. For example, if a consumer that is not in the current generation sends a HeartbeatRequest or …
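The two failover checks described in the paragraph above can be sketched as a single predicate. `GroupMetadata` and `needs_rebalance` are hypothetical names, and the sketch assumes the previous partition ownership decision lists every partition of every subscribed topic; it compares per-topic partition counts and consumer membership exactly as the text describes, nothing more.

```python
from dataclasses import dataclass


@dataclass
class GroupMetadata:
    """Hypothetical view of the group metadata the new co-ordinator reads from ZooKeeper."""
    subscribed_topics: set
    ownership: dict          # previous decision: (topic, partition) -> consumer id
    generation_members: set  # consumer ids in the group's current generation


def needs_rebalance(meta, partition_counts, connected_consumers):
    """Decide on failover whether the new co-ordinator must start a rebalance.

    partition_counts: topic -> current number of partitions (read from ZooKeeper)
    connected_consumers: ids of consumers that have contacted the new co-ordinator
    """
    # Partitions per topic as recorded in the previous ownership decision
    # (assumes that decision lists every partition of every subscribed topic).
    previous = {}
    for topic, _partition in meta.ownership:
        previous[topic] = previous.get(topic, 0) + 1

    # Partitions per subscribed topic as they exist now; deleted topics drop out.
    current = {
        topic: partition_counts[topic]
        for topic in meta.subscribed_topics
        if topic in partition_counts
    }

    # A different set of topics or a different partition count triggers a rebalance.
    if previous != current:
        return True

    # A different set of consumers than the stored generation also triggers one.
    return set(connected_consumers) != meta.generation_members
```

For a group subscribed to `orders` whose previous decision assigned partitions 0 and 1 to `c1` and `c2`, the predicate returns `False` when ZooKeeper still reports two partitions and exactly `c1` and `c2` reconnect, and `True` if a third partition appears, the topic disappears, or an unknown consumer such as `c3` connects.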
