ID: IEP-64
Author:
Sponsor:
Created:
Status: DRAFT


Motivation

Infrastructure developed in IEP-35 introduces a convenient way to work with metrics. All metrics are registered in a centralized registry, where names, descriptions and types are stored.

However, it lacks a way to specify units of measurement for metrics. All metrics are exposed as raw numbers, and it is up to the user to interpret them. The lack of units specified in a standardized form makes this interpretation problematic.

Developers are expected to specify units in the metric description or in code comments, but not everybody does. In the general case, you currently have to find the code where a metric is implemented and see how it is calculated to determine its units.

Different metrics of the same kind use different units. For example, buckets in tx.nodeSystemTimeHistogram use milliseconds, while cache.<cacheName>.GetTime uses nanoseconds for some reason.

When setting up monitoring infrastructure, users need this information: it makes a big difference whether a metric uses milliseconds or nanoseconds, bytes or number of packets. Monitoring tools could also use it to scale measurements properly, converting bytes into megabytes, milliseconds into minutes, etc. There is currently no source of this information, so setting up a monitoring dashboard requires searching through the Ignite source code.

Description

A MetricUnit class should be introduced. It should be possible to acquire the following instances:

  • NUMBER – number of times something happened or number of elements. No specific units are assigned to this value.
    Example: io.dataregion.PagesReplaced 
  • NANOSECONDS, MILLISECONDS, SECONDS – for time periods.
    Example: tx.totalNodeSystemTime 
  • TIMESTAMP – for a specific moment in time.
    Example: cache.RebalanceStartTime 
  • BYTES, KILOBYTES, MEGABYTES – for amount of data.
    Example: cache.TotalRebalancedBytes 
  • PERCENT – fraction.
    Example: sys.CpuLoad 
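The instances listed above could be exposed as constants of a small immutable class. The following is only a minimal sketch under stated assumptions: the Kind grouping, the name() and kind() accessors and the lowercase unit names are hypothetical and are not part of this proposal or of any existing Ignite API. No scaling factors are included, because the proposal does not say, for example, whether KILOBYTES means 1000 or 1024 bytes.

```java
import java.util.Objects;

/** Hypothetical sketch of the proposed MetricUnit class; not an existing Ignite API. */
final class MetricUnit {
    /** Assumed grouping of units by the kind of quantity they measure. */
    enum Kind { COUNT, DURATION, INSTANT, DATA_SIZE, FRACTION }

    // Number of occurrences or elements, no specific unit.
    static final MetricUnit NUMBER = new MetricUnit("number", Kind.COUNT);

    // Time periods.
    static final MetricUnit NANOSECONDS = new MetricUnit("nanoseconds", Kind.DURATION);
    static final MetricUnit MILLISECONDS = new MetricUnit("milliseconds", Kind.DURATION);
    static final MetricUnit SECONDS = new MetricUnit("seconds", Kind.DURATION);

    // A specific moment in time.
    static final MetricUnit TIMESTAMP = new MetricUnit("timestamp", Kind.INSTANT);

    // Amounts of data.
    static final MetricUnit BYTES = new MetricUnit("bytes", Kind.DATA_SIZE);
    static final MetricUnit KILOBYTES = new MetricUnit("kilobytes", Kind.DATA_SIZE);
    static final MetricUnit MEGABYTES = new MetricUnit("megabytes", Kind.DATA_SIZE);

    // Fraction.
    static final MetricUnit PERCENT = new MetricUnit("percent", Kind.FRACTION);

    private final String name;
    private final Kind kind;

    private MetricUnit(String name, Kind kind) {
        this.name = Objects.requireNonNull(name, "name");
        this.kind = Objects.requireNonNull(kind, "kind");
    }

    /** Human-readable unit name that a monitoring tool could display. */
    String name() {
        return name;
    }

    /** Kind of quantity, so a tool can tell durations from data sizes. */
    Kind kind() {
        return kind;
    }

    @Override public String toString() {
        return name;
    }
}
```

With such constants, a monitoring tool could check MetricUnit.MILLISECONDS.kind() to learn that a metric is a duration before deciding how to scale it.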

It should also be possible to represent units like "bytes per second" and "number of times per second". A method MetricUnit.per(...) should be implemented that creates a new MetricUnit instance corresponding to the ratio.

Example metrics: io.dataregion.PagesReplaceRate, cache.RebalancingBytesRate.
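One possible shape for ratio units is sketched below. The proposal does not fix the signature of per(...), so the single-argument instance method, the isRatio(), numerator() and denominator() accessors and the "BYTES/SECONDS" naming scheme are all assumptions made for illustration. Only three base units are declared to keep the sketch short.

```java
import java.util.Objects;

/** Hypothetical sketch of ratio units; the per(...) signature is an assumption. */
final class MetricUnit {
    static final MetricUnit NUMBER = new MetricUnit("NUMBER", null, null);
    static final MetricUnit BYTES = new MetricUnit("BYTES", null, null);
    static final MetricUnit SECONDS = new MetricUnit("SECONDS", null, null);

    private final String name;
    private final MetricUnit numerator;   // null for base units
    private final MetricUnit denominator; // null for base units

    private MetricUnit(String name, MetricUnit numerator, MetricUnit denominator) {
        this.name = name;
        this.numerator = numerator;
        this.denominator = denominator;
    }

    /** Creates a new unit that represents the ratio of this unit to the given one. */
    MetricUnit per(MetricUnit denominator) {
        Objects.requireNonNull(denominator, "denominator");
        return new MetricUnit(name + "/" + denominator.name, this, denominator);
    }

    /** Returns true if this unit was created by per(...). */
    boolean isRatio() {
        return denominator != null;
    }

    MetricUnit numerator() {
        return numerator;
    }

    MetricUnit denominator() {
        return denominator;
    }

    // Structural equality, so that two separately built "BYTES/SECONDS" units are equal.
    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof MetricUnit))
            return false;
        MetricUnit other = (MetricUnit)o;
        return name.equals(other.name)
            && Objects.equals(numerator, other.numerator)
            && Objects.equals(denominator, other.denominator);
    }

    @Override public int hashCode() {
        return Objects.hash(name, numerator, denominator);
    }

    @Override public String toString() {
        return name;
    }
}
```

Under these assumptions, a rate metric such as cache.RebalancingBytesRate could be described with MetricUnit.BYTES.per(MetricUnit.SECONDS), and io.dataregion.PagesReplaceRate with MetricUnit.NUMBER.per(MetricUnit.SECONDS). Keeping the numerator and denominator as separate fields lets a monitoring tool rescale either part independently.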


Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

// Links to various reference documents, if applicable.

Tickets

// Links or report with relevant JIRA tickets.
