...

  • FailedPartitionsCount - Count of partitions that have failed. Instead of separate metrics, clientId is used as a tag to distinguish between Replica and ReplicaAlterLogDir fetchers.

  • TotalReplicaFetcherThreads - Total number of replica fetcher threads. (We might add this if it proves useful.)
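
The tagging approach above can be sketched as follows. This is a minimal illustration rather than Kafka's metrics API: the `TaggedGauges` class, its methods, and the tag values used in the usage note are hypothetical. It only shows how a single metric name can serve both fetcher types when clientId is part of the gauge's identity.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntSupplier;

// Hypothetical stand-in for a metrics registry; not Kafka's actual metrics classes.
final class TaggedGauges {
    // Each gauge is identified by the metric name plus the value of its clientId tag.
    private final Map<String, IntSupplier> gauges = new ConcurrentHashMap<>();

    private static String key(String name, String clientId) {
        return name + ",clientId=" + clientId;
    }

    // Registers (or replaces) the gauge for the given name and clientId tag.
    void register(String name, String clientId, IntSupplier gauge) {
        gauges.put(key(name, clientId), Objects.requireNonNull(gauge));
    }

    // Reads the current value of the gauge for the given name and clientId tag.
    int read(String name, String clientId) {
        IntSupplier gauge = gauges.get(key(name, clientId));
        if (gauge == null) {
            throw new IllegalArgumentException("No gauge registered for " + key(name, clientId));
        }
        return gauge.getAsInt();
    }
}
```

For example, registering `FailedPartitionsCount` once with an illustrative tag value for the replica fetcher and once with another for the alter-log-dirs fetcher yields two independently readable values under one metric name.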

...

In case of a partition failure, the replica fetcher thread associated with it would stop tracking the failed partition. The thread would continue to monitor the rest of its partitions. The failedPartition set would keep track of failed partitions. Once the fetcher stops tracking it, the partition would be removed from the set of failed partitions. Thereafter, the controller may choose the partition as a leader or follower. If the partition has recovered and is healthy enough to lead, it would remain leader; otherwise the usual behavior for a leader going down would follow.
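
The bookkeeping described above might look roughly like the sketch below. It is a simplified model, not the actual fetcher thread code: `FetcherSketch` and its method names are hypothetical, partitions are represented as plain strings, and it assumes that a partition leaves the failed set when it is explicitly removed from, or re-added to, the fetcher.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a simplified model of one fetcher thread's partition bookkeeping.
final class FetcherSketch {
    private final Set<String> tracked = new HashSet<>();          // partitions currently fetched
    private final Set<String> failedPartitions = new HashSet<>(); // partitions marked as failed

    // Starts tracking a partition; a re-added partition is no longer considered failed (assumption).
    synchronized void addPartition(String partition) {
        failedPartitions.remove(partition);
        tracked.add(partition);
    }

    // On failure, stop fetching this partition but keep fetching all the others.
    synchronized void markFailed(String partition) {
        if (tracked.remove(partition)) {
            failedPartitions.add(partition);
        }
    }

    // Removing a partition from the fetcher also clears it from the failed set (assumption).
    synchronized void removePartition(String partition) {
        tracked.remove(partition);
        failedPartitions.remove(partition);
    }

    // Value that a FailedPartitionsCount gauge could report.
    synchronized int failedPartitionsCount() {
        return failedPartitions.size();
    }

    // Returns a snapshot copy so callers cannot mutate internal state.
    synchronized Set<String> trackedPartitions() {
        return new HashSet<>(tracked);
    }
}
```

The key design point is that a single failing partition only changes set membership; the thread itself keeps running, so healthy partitions on the same fetcher are unaffected.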

Since the two replica fetchers (ReplicaFetcherThread and ReplicaAlterLogDirsThread) are quite similar in behavior and extend the same class, we probably should not make one deviate much from the other.

...