...

In this example, AtMinIsr triggers when only 1 in-sync replica remains, which tells us that 1 more failure will take the partition completely offline.
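The check behind this categorization can be sketched as follows. This is an illustrative sketch, not Kafka's implementation: it assumes a partition counts as AtMinIsr when its in-sync replica count equals its configured min.insync.replicas, and that this example uses min.insync.replicas = 1. The function names and the partition map are hypothetical.

```python
from typing import Dict, Tuple


def is_at_min_isr(isr_size: int, min_isr: int) -> bool:
    """Return True when the ISR has shrunk to exactly the configured minimum."""
    return isr_size == min_isr


def count_at_min_isr(partitions: Dict[str, Tuple[int, int]]) -> int:
    """Count partitions at their minimum ISR.

    `partitions` maps a hypothetical partition name to (isr_size, min_isr).
    """
    return sum(1 for isr_size, min_isr in partitions.values()
               if is_at_min_isr(isr_size, min_isr))


# Assumed scenario: min.insync.replicas = 1 and the ISR shrinks from 3 to 1.
history = [is_at_min_isr(isr_size, 1) for isr_size in (3, 2, 1)]
```

Under these assumptions the category stays quiet while the ISR shrinks from 3 to 2 and only fires once a single in-sync replica is left.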

Usage

This new AtMinIsr categorization can be extremely powerful in detecting broker failures (example 2 from above) and determining when action should be taken without being too noisy.

A potential usage of this new AtMinIsr category is:

  1. Set up an alert for AtMinIsr > 0 for a period of time
  2. If the alert is triggered, then assess the health of the cluster
  3. If a broker failure cannot be fixed quickly, use the partition metric or the --at-min-isr-partitions option of TopicCommand to quickly determine the list of topics to repartition
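Step 1 above, alerting only when AtMinIsr stays above 0 for a period of time, can be sketched as a simple sustained-threshold check. This is a minimal sketch under stated assumptions: the samples are (timestamp in seconds, AtMinIsr count) pairs already scraped from the metric, and the function name, the 300-second default, and the sample data are hypothetical rather than part of the proposal.

```python
from typing import Iterable, Optional, Tuple


def should_alert(samples: Iterable[Tuple[float, int]],
                 min_duration_s: float = 300.0) -> bool:
    """Return True if AtMinIsr > 0 held continuously for min_duration_s.

    `samples` must be ordered by timestamp. A single sample at 0 resets the
    window, so brief dips to the minimum ISR do not raise an alert.
    """
    breach_start: Optional[float] = None
    for timestamp, at_min_isr_count in samples:
        if at_min_isr_count > 0:
            if breach_start is None:
                breach_start = timestamp  # first sample of this breach
            if timestamp - breach_start >= min_duration_s:
                return True
        else:
            breach_start = None  # recovered; start over
    return False


# Hypothetical samples: a transient blip versus a sustained breach.
transient = [(0, 1), (60, 0), (120, 1), (180, 0)]
sustained = [(0, 0), (60, 1), (120, 1), (360, 1)]
transient_alert = should_alert(transient)
sustained_alert = should_alert(sustained)
```

Requiring the condition to hold for a window is what keeps the alert from being noisy: short-lived ISR shrinks reset the timer, while a failed broker keeps the count above 0 long enough to trigger.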

AtMinIsr Values + Possible Explanations

...