...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

When a KRaft broker begins controlled shutdown, it immediately disables its metadata listener. As a result, metadata changes made as part of the controlled shutdown are not delivered to the broker's components. For partitions where the broker is a follower, that is what we want: it prevents the follower from rejoining the ISR while it is still shutting down. For partitions that the broker is leading, however, it means the leader remains active until controlled shutdown finishes and the socket server is stopped. That delay can be as much as 5 seconds, and possibly longer. Note that in the ZK world, an explicit `StopReplica` request serves the purpose of shutting down both followers and leaders, but KRaft has nothing similar.

Proposed Changes

This KIP proposes changing the ISR expansion logic on the leader and the ISR validation logic on the controller to avoid bringing fenced or shutting-down replicas back into the ISR. The leader will consider only unfenced replicas to be eligible to join the ISR. It will rely on the metadata cache, which is populated from the metadata log, to get this information. Because the metadata cache is eventually consistent, the leader might try to add a replica that was just removed by the controller back to the ISR, since it does not yet know that the controller has fenced the replica. To avoid this, the controller will validate the new ISR and reject any AlterPartition request containing an ineligible replica with the newly introduced INELIGIBLE_REPLICA error code. For backward compatibility, OPERATION_NOT_ATTEMPTED will be used for older versions. When the leader receives an INELIGIBLE_REPLICA error code, it is expected to revert its state to the last committed state, assuming that the state did not change in the meantime, and to retry the expansion. When a broker is unfenced by the controller, the leader does nothing, because subsequent fetch requests from that follower will bring it back into the ISR once it is caught up.
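The two checks described above can be illustrated with a minimal, self-contained Java sketch. It is not the Kafka implementation: the class, method, and parameter names are hypothetical, broker state is modelled as plain sets of broker ids, and only the error code names INELIGIBLE_REPLICA and OPERATION_NOT_ATTEMPTED come from this proposal.

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical, not Kafka classes.
final class IsrEligibilitySketch {

    // Error names taken from the proposal text; NONE is added for the sketch.
    enum Error { NONE, INELIGIBLE_REPLICA, OPERATION_NOT_ATTEMPTED }

    // Leader side: a follower may be proposed for the ISR only if it is caught up
    // and, according to the (eventually consistent) metadata cache, it is neither
    // fenced nor shutting down.
    static boolean leaderMayExpandIsr(int replicaId,
                                      boolean caughtUp,
                                      Set<Integer> fencedInCache,
                                      Set<Integer> shuttingDownInCache) {
        return caughtUp
            && !fencedInCache.contains(replicaId)
            && !shuttingDownInCache.contains(replicaId);
    }

    // Controller side: reject a proposed ISR that contains an ineligible replica.
    // Requests from older versions (assumed here to be flagged by the caller)
    // receive OPERATION_NOT_ATTEMPTED instead of the new error code.
    static Error validateNewIsr(List<Integer> newIsr,
                                Set<Integer> fenced,
                                Set<Integer> shuttingDown,
                                boolean requestSupportsNewError) {
        for (int replicaId : newIsr) {
            if (fenced.contains(replicaId) || shuttingDown.contains(replicaId)) {
                return requestSupportsNewError
                    ? Error.INELIGIBLE_REPLICA
                    : Error.OPERATION_NOT_ATTEMPTED;
            }
        }
        return Error.NONE;
    }
}
```

In this sketch the leader's check is only a best-effort filter, because its cache may lag behind the controller; the controller's check is the authoritative one. That is why a leader that passes its own check can still receive INELIGIBLE_REPLICA and must revert to the last committed state.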

With this change, a shutting-down broker can stop its metadata listener once controlled shutdown has completed rather than when it begins. This allows leaders hosted on that broker to step down while allowing followers to keep fetching until the broker shuts down.

...