...

Question: Between T1 and T2, the broker B containing incompatible features could linger in the cluster. This window is very small (milliseconds), and the situation is rare: it can only happen when an incompatible broker comes up in the cluster around the time that a feature version upgrade is finalized.
The question is: what if the controller finalizes features via E2 while the processing of the broker registration via E1 is still in-flight/queued on the controller? Would this harm the cluster?

Solution: We intend to handle the race condition by careful ordering of events in the controller. In the controller, the thread that handles the ApiKeys.UPDATE_FEATURES request (E2) will be the ControllerEventThread. This is also the same thread that updates the controller's cache of broker info whenever a new broker joins the cluster (E1). In this setup, if an ApiKeys.UPDATE_FEATURES request (E2) is processed ahead of a notification from ZK about an incompatible broker joining the cluster (E1), then the controller can detect the incompatibility when it processes E1 after E2 (since it knows the latest finalized features). The controller would handle the incompatible broker by blocking the remainder of the new broker's startup sequence: it refuses to send an UpdateMetadataRequest to bootstrap the new broker. Then, it is only a matter of time (milliseconds) before the new broker receives a ZK notification (E3) about a change to the '/features' node and automatically shuts itself down due to the incompatibility.
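The ordering argument can be illustrated with a minimal sketch. This is not the controller's actual code: the class, method, and field names below are hypothetical, the single event thread is modeled as a plain FIFO queue drained by one loop, and compatibility is simplified to "the finalized version lies within the broker's supported [min, max] range".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical model: every event runs on one logical thread, so E1 and E2 can never interleave.
class ControllerEventOrderingSketch {

    // Assumed shape of a broker's supported version range for one feature.
    static final class SupportedRange {
        final int min;
        final int max;
        SupportedRange(int min, int max) { this.min = min; this.max = max; }
    }

    private final Map<String, Integer> finalizedFeatures = new HashMap<>();
    private final Queue<Runnable> eventQueue = new ArrayDeque<>();
    final List<Integer> bootstrappedBrokers = new ArrayList<>();
    final List<Integer> blockedBrokers = new ArrayList<>();

    // E2: finalize a feature version.
    void enqueueUpdateFeatures(String feature, int version) {
        eventQueue.add(() -> finalizedFeatures.put(feature, version));
    }

    // E1: a broker registration notification arrives.
    void enqueueBrokerRegistration(int brokerId, Map<String, SupportedRange> supported) {
        eventQueue.add(() -> {
            if (isCompatible(supported)) {
                bootstrappedBrokers.add(brokerId); // stands in for sending UpdateMetadataRequest
            } else {
                blockedBrokers.add(brokerId);      // bootstrap request is withheld
            }
        });
    }

    // Checks the broker against the finalized features known at the time E1 is processed.
    private boolean isCompatible(Map<String, SupportedRange> supported) {
        for (Map.Entry<String, Integer> entry : finalizedFeatures.entrySet()) {
            SupportedRange range = supported.get(entry.getKey());
            int version = entry.getValue();
            if (range == null || version < range.min || version > range.max) {
                return false;
            }
        }
        return true;
    }

    // Drains the queue strictly in arrival order, one event at a time.
    void drain() {
        Runnable event;
        while ((event = eventQueue.poll()) != null) {
            event.run();
        }
    }

    public static void main(String[] args) {
        ControllerEventOrderingSketch controller = new ControllerEventOrderingSketch();
        controller.enqueueUpdateFeatures("feature_x", 2);       // E2 is queued first

        Map<String, SupportedRange> oldBroker = new HashMap<>();
        oldBroker.put("feature_x", new SupportedRange(1, 1));   // cannot support version 2
        controller.enqueueBrokerRegistration(7, oldBroker);     // E1 is queued second

        controller.drain();
        System.out.println("blocked=" + controller.blockedBrokers
                + " bootstrapped=" + controller.bootstrappedBrokers);
    }
}
```

Because E1 is evaluated only after E2 has been applied, broker 7 ends up in the blocked list. The sketch covers only the E2-before-E1 ordering; it does not model the ZK notification (E3) or the broker-side shutdown.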

...