...

If a partition fails, the replica fetcher thread would stop tracking that partition, and a set, failedPartitions, would be used to keep track of such partitions. Instead of throwing an exception that terminates the thread, the thread would log an error message and add the partition to the failedPartitions set. The partition would also be removed from fetcherLagStats and partitionStates, since its lag cannot be tracked accurately once fetching has stopped. The thread would then continue fetching the rest of its partitions, which under the current behavior are no longer fetched once the thread dies.
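The per-partition isolation described above can be sketched as follows. This is a simplified model, not Kafka's actual fetcher code: the class `FetcherThreadSketch`, the `fetch_fn` callback and the plain dicts standing in for partitionStates and fetcherLagStats are all hypothetical. The sketch catches any exception from a single partition, logs it, records the partition as failed, drops its fetch state and lag, and keeps looping over the remaining partitions.

```python
import logging
from typing import Callable, Dict, Iterable, Set, Tuple

log = logging.getLogger("fetcher-sketch")

# Hypothetical fetch callback: (partition, offset) -> (next_offset, lag); may raise.
FetchFn = Callable[[str, int], Tuple[int, int]]


class FetcherThreadSketch:
    """Simplified model of one replica fetcher thread (not Kafka's real class)."""

    def __init__(self, partitions: Iterable[str], fetch_fn: FetchFn) -> None:
        # Stand-ins for partitionStates and fetcherLagStats.
        self.partition_states: Dict[str, int] = {p: 0 for p in partitions}  # partition -> fetch offset
        self.fetcher_lag_stats: Dict[str, int] = {p: 0 for p in self.partition_states}  # partition -> lag
        self.failed_partitions: Set[str] = set()
        self.fetch_fn = fetch_fn

    def do_work(self) -> None:
        # Iterate over a copy because failed partitions are removed mid-loop.
        for partition in list(self.partition_states):
            try:
                offset, lag = self.fetch_fn(partition, self.partition_states[partition])
            except Exception as exc:
                # Log and isolate the partition instead of letting the exception
                # propagate and terminate the whole thread.
                log.error("Stopping fetch for %s: %r", partition, exc)
                self.failed_partitions.add(partition)
                # Lag is meaningless once fetching stops, so stop tracking it.
                self.partition_states.pop(partition, None)
                self.fetcher_lag_stats.pop(partition, None)
                continue
            self.partition_states[partition] = offset
            self.fetcher_lag_stats[partition] = lag


# Usage: one partition raises, the other two keep being fetched.
def fake_fetch(partition: str, offset: int) -> Tuple[int, int]:
    if partition == "orders-1":
        raise OSError("simulated corrupt segment")
    return offset + 10, 0


thread = FetcherThreadSketch(["orders-0", "orders-1", "orders-2"], fake_fetch)
thread.do_work()
print(thread.failed_partitions)   # {'orders-1'}
print(thread.partition_states)    # {'orders-0': 10, 'orders-2': 10}
```

Catching a broad `Exception` here is a simplification; a real implementation would likely distinguish error types rather than treat every failure the same way.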

If all partitions of a fetcher thread are marked as failed, the thread would be shut down. If a replica is deleted on a broker through a StopReplicaRequest while its partition is in the failedPartitions set, the partition would be removed from the set.
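The shutdown rule and the StopReplicaRequest cleanup can be modelled in a few lines. This is a hypothetical sketch: `mark_failed`, `maybe_shutdown`, `on_stop_replica` and the `running` flag are illustrative names, not Kafka APIs. The thread stops once it has no active partitions left and at least one failed one, and deleting a replica simply forgets the partition, failed or not.

```python
from typing import Dict, Iterable, Set


class FetcherThreadSketch:
    """Simplified model of the failure bookkeeping in one fetcher thread."""

    def __init__(self, partitions: Iterable[str]) -> None:
        self.partition_states: Dict[str, int] = {p: 0 for p in partitions}  # active partitions
        self.failed_partitions: Set[str] = set()
        self.running = True

    def mark_failed(self, partition: str) -> None:
        # Stop fetching the partition and remember that it failed.
        self.partition_states.pop(partition, None)
        self.failed_partitions.add(partition)
        self.maybe_shutdown()

    def maybe_shutdown(self) -> None:
        # Every partition still assigned to this thread has failed: nothing left to fetch.
        if not self.partition_states and self.failed_partitions:
            self.running = False

    def on_stop_replica(self, partition: str) -> None:
        # Hypothetical hook for a StopReplicaRequest that deletes the replica:
        # a deleted partition no longer needs to be tracked, failed or not.
        self.partition_states.pop(partition, None)
        self.failed_partitions.discard(partition)
        self.maybe_shutdown()


thread = FetcherThreadSketch(["logs-0", "logs-1"])
thread.mark_failed("logs-0")
print(thread.running)            # True: logs-1 is still being fetched
thread.on_stop_replica("logs-0")
print(thread.failed_partitions)  # set()
thread.mark_failed("logs-1")
print(thread.running)            # False: every remaining partition has failed
```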

The partition would remain in the failedPartitions set until the next leader epoch. When the leader epoch changes, failed partitions would be marked as un-failed by removing them from the failedPartitions set. From then on, the controller can make the broker's replica the leader or a follower for the partition, and the usual behavior would apply.
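The epoch-based reset can be sketched as a small state holder. Everything here is assumed for illustration: `ReplicaManagerSketch` and `on_leader_epoch_change` are hypothetical, and the new epoch is assumed to arrive through the controller's leadership-change request (in Kafka, a LeaderAndIsrRequest). Only a strictly newer epoch clears the failure; the sketch does not model the subsequent leader or follower transition.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ReplicaManagerSketch:
    """Simplified model: failed partitions and their last seen leader epoch."""

    leader_epochs: Dict[str, int] = field(default_factory=dict)
    failed_partitions: Set[str] = field(default_factory=set)

    def on_leader_epoch_change(self, partition: str, new_epoch: int) -> bool:
        """Hypothetical hook called when the controller sends a new leader epoch.

        Returns True if the epoch was newer and the partition was un-failed.
        """
        if new_epoch <= self.leader_epochs.get(partition, -1):
            # Same or stale epoch: the partition stays in failed_partitions.
            return False
        self.leader_epochs[partition] = new_epoch
        # New epoch: clear the failure; normal leader/follower handling resumes.
        self.failed_partitions.discard(partition)
        return True


rm = ReplicaManagerSketch(leader_epochs={"events-0": 5}, failed_partitions={"events-0"})
print(rm.on_leader_epoch_change("events-0", 5))  # False: still failed
print(rm.on_leader_epoch_change("events-0", 6))  # True
print(rm.failed_partitions)                      # set()
```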

...