...

By separating permanent state from transitory state, we can handle transitory issues more effectively. For example, in a 3-node cluster that is undergoing a rolling upgrade, one of the nodes might be down because it is being restarted. However, we should still allow users to create new topics with replication factor 3. Currently, that is not possible, because the node's registration information is wiped the moment its ZK registration goes away. With KIP-631, the registration remains, although the node becomes fenced. Another example is performing a reassignment on a cluster where one or more nodes are down. Currently, when a node is down, all of its ZK registration information is gone. But we need this information in order to understand things like whether the replicas of a particular partition are well balanced across racks.
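
To make the distinction concrete, here is a minimal Java sketch of a registry in which fencing a broker flips a transitory flag and leaves the permanent registration (ID and rack) in place. It is an illustration only, not the KIP's actual data structures or placement algorithm: the names `BrokerRegistration`, `ClusterMetadata`, `fence`, `placeReplicas`, and `distinctRacks` are hypothetical, and the policy of preferring unfenced brokers and requiring at least one of them is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

final class FencedRegistrationSketch {

    /** Hypothetical record of a broker: permanent data (id, rack) plus a transitory fenced flag. */
    static final class BrokerRegistration {
        final int id;
        final String rack;
        final boolean fenced;

        BrokerRegistration(int id, String rack, boolean fenced) {
            this.id = id;
            this.rack = rack;
            this.fenced = fenced;
        }
    }

    /** Hypothetical in-memory view of the cluster; not a real Kafka class. */
    static final class ClusterMetadata {
        private final Map<Integer, BrokerRegistration> brokers = new TreeMap<>();

        void register(int id, String rack) {
            brokers.put(id, new BrokerRegistration(id, rack, false));
        }

        /** Marks a broker as fenced; its id and rack stay registered. */
        void fence(int id) {
            BrokerRegistration r = brokers.get(id);
            if (r != null) {
                brokers.put(id, new BrokerRegistration(r.id, r.rack, true));
            }
        }

        /**
         * Picks replicas from all registered brokers, unfenced ones first.
         * Assumption for this sketch: at least one unfenced broker is required.
         */
        Optional<List<Integer>> placeReplicas(int replicationFactor) {
            if (replicationFactor < 1 || brokers.size() < replicationFactor) {
                return Optional.empty();
            }
            List<Integer> ordered = new ArrayList<>();
            for (BrokerRegistration b : brokers.values()) {
                if (!b.fenced) ordered.add(b.id);
            }
            if (ordered.isEmpty()) {
                return Optional.empty();
            }
            for (BrokerRegistration b : brokers.values()) {
                if (b.fenced) ordered.add(b.id);
            }
            return Optional.of(new ArrayList<>(ordered.subList(0, replicationFactor)));
        }

        /** Counts the racks covered by a replica set, including replicas on fenced brokers. */
        int distinctRacks(List<Integer> replicas) {
            Set<String> racks = new HashSet<>();
            for (int id : replicas) {
                BrokerRegistration b = brokers.get(id);
                if (b != null) racks.add(b.rack);
            }
            return racks.size();
        }
    }

    public static void main(String[] args) {
        ClusterMetadata cluster = new ClusterMetadata();
        cluster.register(1, "rack-a");
        cluster.register(2, "rack-b");
        cluster.register(3, "rack-c");
        cluster.fence(2); // broker 2 is restarting during a rolling upgrade

        Optional<List<Integer>> replicas = cluster.placeReplicas(3);
        System.out.println("placement possible: " + replicas.isPresent());
        replicas.ifPresent(r -> System.out.println("racks covered: " + cluster.distinctRacks(r)));
    }
}
```

Because the fenced broker is still registered, a replication-factor-3 request can be satisfied on a 3-node cluster while one node is restarting, and rack balance can still be evaluated for replicas hosted on the fenced node.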

...