Status

Current state: "Adopted" (2.2.0)

Discussion thread: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

A complex system like Kafka is non-trivial to operate. Metrics help users gauge how the system is behaving in real time. Kafka exposes metrics via JMX. Most users integrate these metrics with third-party systems that ease monitoring, detection of abnormal behavior, and alerting.
When starting up a broker, most metrics will naturally not be populated since not much activity has happened. In this case, said metrics will output their default value.

In particular, I want to shine a light on three classes which track the Max, Avg and Min statistics. Once values have been recorded, these classes keep track of the maximum, average and minimum value respectively just fine.
The problem is that these metrics give out default values that are inconsistent with each other. Max gives out `-Inf`, Avg gives out `0.0` and Min gives out `1.7976931348623157e+308` (Double.MAX_VALUE).

This is confusing to say the least and can cause third-party tool failures at worst. Most likely, users have explicit checks in their metrics monitoring software for default values, such as `metricValue != Double.NEGATIVE_INFINITY`.
It is more intuitive to have these values be consistent.
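
To illustrate the burden this places on monitoring code, here is a minimal sketch of the kind of per-suffix check a tool would need against the three current defaults. The class name, method name and the metric names used with it are hypothetical and not part of Kafka.

```java
// Hypothetical helper, not part of Kafka: detects the current
// (inconsistent) default values based on the metric name suffix.
final class LegacyDefaultCheck {
    static boolean isLegacyDefault(String metricName, double value) {
        if (metricName.endsWith("-max")) {
            return value == Double.NEGATIVE_INFINITY;
        }
        if (metricName.endsWith("-avg")) {
            // Ambiguous: a genuinely recorded average of 0.0 looks the same.
            return value == 0.0;
        }
        if (metricName.endsWith("-min")) {
            return value == Double.MAX_VALUE;
        }
        return false;
    }
}
```

Each statistic needs its own sentinel comparison, and the `-avg` case cannot distinguish "nothing recorded" from a real average of zero.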

Public Interfaces

All `-min`, `-avg` and `-max` metrics will now output `NaN` default values.

Proposed Changes

Change the Min, Avg and Max stats' default value to be `NaN` and therefore consistent with each other.
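
As a rough sketch of the intended behavior (not the actual Kafka classes, and ignoring any sampling or windowing the real stats perform), a stat that has recorded nothing reports `NaN`, and reports the usual value once something has been recorded. The class `NaNDefaultStats` is a hypothetical stand-in.

```java
// Hypothetical stand-in for the Max, Min and Avg stats; illustrates
// only the proposed default value, not Kafka's real implementation.
final class NaNDefaultStats {
    private double max = Double.NEGATIVE_INFINITY;
    private double min = Double.POSITIVE_INFINITY;
    private double sum = 0.0;
    private long count = 0;

    void record(double value) {
        max = Math.max(max, value);
        min = Math.min(min, value);
        sum += value;
        count++;
    }

    // With no recorded values, every stat reports the same default: NaN.
    double max() { return count == 0 ? Double.NaN : max; }
    double min() { return count == 0 ? Double.NaN : min; }
    double avg() { return count == 0 ? Double.NaN : sum / count; }
}
```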

It is worth noting that these are Kafka metrics, not Yammer metrics (Yammer being the third-party library).

Compatibility, Deprecation, and Migration Plan

The default value for all `-min`, `-avg` and `-max` metrics will change.
If users' tools cannot handle `NaN` as a default value, they will need to work around it (probably via an explicit check).
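
A minimal sketch of such an explicit check in Java follows; the class and method names are hypothetical. Note that `NaN` compares unequal to every value, including itself, so a comparison like `metricValue != Double.NaN` is always true and `Double.isNaN` should be used instead.

```java
// Hypothetical helper for client-side tooling, not part of Kafka.
final class NaNCheck {
    static boolean hasRecordedValue(double metricValue) {
        // Double.isNaN is the reliable test; == and != never match NaN.
        return !Double.isNaN(metricValue);
    }
}
```

With this check, a single test covers `-min`, `-avg` and `-max` metrics alike, and a legitimately recorded `0.0` is still treated as a real value.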

Rejected Alternatives

  • Change Max's default value to `4.9e-324` (Double.MIN_VALUE) (to be consistent with Min)
    • The Max stat is used in far more metrics than Min so it is better to change the one that is least used
    • -Inf and +Inf are more intuitive default values for Max and Min stats respectively
  • Change Min's default value to be `+Inf` (to be consistent with Max)
    • This would cause the least backwards compatibility problems since there are only 3 metrics that use `-min`.
    • As discussed in the thread, `NaN` is more correct semantically and more intuitively shows that nothing has been recorded