...

Proposed Changes

Controllers

Today, any error during metadata processing on the active controller causes it to renounce quorum leadership. Such resignations differ from those triggered by ordinary Raft elections, for example during a roll. Repeated elections caused by errors in the active controller can point to problems in the metadata log generation or handling logic, so visibility into them is useful. The MetadataErrorCount metric reflects the number of times a controller node has had to renounce quorum leadership due to an error in the event processing logic.

The MetadataErrorCount metric is incremented whenever a controller hits an error during metadata event processing. For active controllers, MetadataErrorCount is incremented whenever they hit an error either while generating a metadata log record or while applying it to memory. For standby controllers, it is incremented when they hit an error while applying the metadata log to memory. The metric reflects the total number of errors that a controller (both leader and non-leader) has encountered in metadata log processing since its last restart.
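The counting rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `MetadataErrorCounter` and `ControllerSketch` are hypothetical names invented for this example, not classes from the Kafka code base, and the real controller's event loop and metrics registration are considerably more involved.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the MetadataErrorCount metric: a counter that
// starts at zero on process start and only ever increases.
final class MetadataErrorCounter {
    private final AtomicLong count = new AtomicLong();

    void increment() {
        count.incrementAndGet();
    }

    long count() {
        return count.get();
    }
}

// Hypothetical, heavily simplified controller. The Runnable passed to
// handleEvent stands for "generate and apply a record" on the active
// controller, or "apply a record" on a standby controller.
final class ControllerSketch {
    private final MetadataErrorCounter errors;
    private boolean active;

    ControllerSketch(MetadataErrorCounter errors, boolean active) {
        this.errors = errors;
        this.active = active;
    }

    boolean isActive() {
        return active;
    }

    void handleEvent(Runnable metadataWork) {
        try {
            metadataWork.run();
        } catch (RuntimeException e) {
            // Both active and standby controllers count the error.
            errors.increment();
            // Only the active controller has leadership to renounce.
            if (active) {
                active = false;
            }
        }
    }
}
```

In this sketch a standby controller's counter grows without any change of role, while an active controller both increments the counter and gives up leadership, which matches the distinction drawn in the paragraph above.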

https://github.com/apache/kafka/blob/14d2269471141067dc3c45300187f20a0a051777/metadata/src/main/java/org/apache/kafka/controller/QuorumController.java#L409 

...