THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
...
Co-ordinator failure during rebalance
...
A rebalance operation goes through several phases -
- Co-ordinator receives notification of a rebalance - either a zookeeper watch fires for a topic/partition change or a new consumer registers or an existing consumer dies.
- Co-ordinator initiates a rebalance operation
- Consumers send a JoinGroupRequest
- Co-ordinator increments the group's generation id in zookeeper
- Co-ordinator sends a JoinGroupResponse
Co-ordinator can fail at any of the above phases during a rebalance operation. This section discusses how the failover handles each of these scenarios.
- If the co-ordinator fails at step #1 after receiving a notification but not getting a chance to act on it, the new co-ordinator has to be able to detect the need for a rebalance operation on completing the failover. During failover, the co-ordinator reads a group's metadata from zookeeper, including the list of topics the group has subscribed to and the previous partition ownership decision. If the # of topics or # of partitions for the subscribed topics are different from the ones in the previous partition ownership decision, the new co-ordinator detects the need for a rebalance and initiates one for the group. Similarly if the consumers that connect to the new co-ordinator are different from the ones in the group's generation in zookeeper, it initiates a rebalance for the group. For example, if a consumer that is not in the current generation sends a HeartbeatRequest or
Consumer id assignment