ID | IEP-64 |
Author | |
Sponsor | |
Created | |
Status | DRAFT |
The infrastructure developed in IEP-35 introduces a convenient way to work with metrics. All metrics are registered in a centralized registry, which stores their names, descriptions and types.
However, it lacks a way to specify units of measurement for metrics. All metrics are provided as raw numbers, and it is up to the user to interpret these numbers. The lack of units specified in a standardized form makes this interpretation problematic.
We hope that developers specify units in descriptions or comments, but not everybody does. Currently, in the general case, you need to find the code that implements a metric and check how it is calculated to determine its units.
Different metrics of the same kind use different units. For example, buckets in tx.nodeSystemTimeHistogram use milliseconds, while cache.<cacheName>.GetTime uses nanoseconds for some reason.
When setting up a monitoring infrastructure, users need this information, since it makes a big difference whether a metric uses milliseconds or nanoseconds, bytes or number of packets. Monitoring tools could use this information to scale measurements properly, converting bytes into megabytes, milliseconds into minutes, etc. Currently, this information is not available from any source, so setting up a monitoring dashboard requires searching through the Ignite source code.
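To illustrate the kind of scaling a monitoring tool could do once units are known, here is a minimal sketch. It assumes the unit of each metric is already known (in this example it is supplied by hand, which is exactly the information this proposal wants to expose). Time conversions use the JDK's java.util.concurrent.TimeUnit; the class name UnitScalingExample and the 1 MB = 1024 * 1024 bytes convention are assumptions for illustration only.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: normalizing raw metric values once their units are known.
 * The units are passed in by hand here; they are not read from Ignite metadata.
 */
public class UnitScalingExample {
    /** Converts a raw duration expressed in srcUnit to fractional minutes. */
    static double toMinutes(long rawValue, TimeUnit srcUnit) {
        // Go through nanoseconds so sub-minute precision is kept as a fraction.
        return srcUnit.toNanos(rawValue) / (double) TimeUnit.MINUTES.toNanos(1);
    }

    /** Converts a raw byte count to megabytes (assumed: 1 MB = 1024 * 1024 bytes). */
    static double bytesToMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // The same 90-second duration as reported by metrics using different units.
        long durationMs = 90_000L;          // milliseconds
        long durationNs = 90_000_000_000L;  // nanoseconds

        System.out.println(toMinutes(durationMs, TimeUnit.MILLISECONDS)); // 1.5
        System.out.println(toMinutes(durationNs, TimeUnit.NANOSECONDS));  // 1.5
        System.out.println(bytesToMegabytes(5L * 1024 * 1024));           // 5.0
    }
}
```

Without the unit, both raw values above would be indistinguishable numbers differing by a factor of a million.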
A MetricUnit class should be introduced. It should be possible to acquire MetricUnit instances for the units of metrics such as the following:
io.dataregion.PagesReplaced
tx.totalNodeSystemTime
cache.RebalanceStartTime
cache.TotalRebalancedBytes
sys.CpuLoad
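A minimal sketch of what such a class might look like, assuming a final class with a fixed set of predefined instances. The constant names (COUNT, MILLISECONDS, TIMESTAMP_MS, BYTES, PERCENT, etc.), their symbols and the unit assigned to each example metric are all hypothetical: they are illustrative guesses, not taken from Ignite's code or from this proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a MetricUnit class with predefined instances.
 * Constant names and symbols are assumptions, not Ignite's API.
 */
public final class MetricUnit {
    public static final MetricUnit COUNT = new MetricUnit("count");
    public static final MetricUnit MILLISECONDS = new MetricUnit("ms");
    public static final MetricUnit NANOSECONDS = new MetricUnit("ns");
    public static final MetricUnit TIMESTAMP_MS = new MetricUnit("epoch ms");
    public static final MetricUnit BYTES = new MetricUnit("bytes");
    public static final MetricUnit PERCENT = new MetricUnit("%");

    private final String symbol;

    // Private constructor: only the predefined instances exist in this sketch.
    private MetricUnit(String symbol) {
        this.symbol = symbol;
    }

    public String symbol() {
        return symbol;
    }

    @Override
    public String toString() {
        return symbol;
    }

    public static void main(String[] args) {
        // Assumed (not verified) units for the example metrics listed above.
        Map<String, MetricUnit> units = new LinkedHashMap<>();
        units.put("io.dataregion.PagesReplaced", COUNT);
        units.put("tx.totalNodeSystemTime", MILLISECONDS);
        units.put("cache.RebalanceStartTime", TIMESTAMP_MS);
        units.put("cache.TotalRebalancedBytes", BYTES);
        units.put("sys.CpuLoad", PERCENT);

        // A monitoring tool could print or consume the unit next to each metric name.
        units.forEach((metric, unit) -> System.out.println(metric + " [" + unit + "]"));
    }
}
```

Keeping the constructor private means every metric refers to one shared instance per unit, so tools can compare units by identity.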
It should also be possible to represent units like "bytes per second" and "number of times per second". A method MetricUnit.per(...) should be implemented that creates a new MetricUnit instance corresponding to the ratio.
Example metrics: io.dataregion.PagesReplaceRate, cache.RebalancingBytesRate.
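One way per(...) could work is sketched below. It assumes per is an instance method taking another MetricUnit and that units are identified by a textual symbol; the constants, the symbol format and the parenthesization rule are assumptions for illustration, not part of the proposal.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of MetricUnit.per(...) building ratio units.
 * The signature per(MetricUnit) and the symbol format are assumptions.
 */
public final class MetricUnit {
    public static final MetricUnit BYTES = new MetricUnit("bytes");
    public static final MetricUnit COUNT = new MetricUnit("count");
    public static final MetricUnit SECONDS = new MetricUnit("s");

    private final String symbol;

    private MetricUnit(String symbol) {
        this.symbol = symbol;
    }

    /** Creates a new unit equal to the ratio of this unit to the given one, e.g. bytes per second. */
    public MetricUnit per(MetricUnit other) {
        Objects.requireNonNull(other, "other");
        // Parenthesize compound denominators so a/(b/c) is not confused with (a/b)/c.
        String den = other.symbol.contains("/") ? "(" + other.symbol + ")" : other.symbol;
        return new MetricUnit(symbol + "/" + den);
    }

    public String symbol() {
        return symbol;
    }

    // per(...) returns a fresh instance each time, so units are compared by value.
    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof MetricUnit))
            return false;
        return symbol.equals(((MetricUnit) o).symbol);
    }

    @Override
    public int hashCode() {
        return symbol.hashCode();
    }

    @Override
    public String toString() {
        return symbol;
    }

    public static void main(String[] args) {
        MetricUnit bytesPerSec = BYTES.per(SECONDS); // e.g. cache.RebalancingBytesRate
        MetricUnit timesPerSec = COUNT.per(SECONDS); // e.g. io.dataregion.PagesReplaceRate

        System.out.println(bytesPerSec);                            // bytes/s
        System.out.println(timesPerSec);                            // count/s
        System.out.println(bytesPerSec.equals(BYTES.per(SECONDS))); // true
    }
}
```

Because derived units are created on demand rather than predefined, value-based equals and hashCode are what let a tool recognize two independently built "bytes per second" units as the same.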
// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.
// Links to discussions on the devlist, if applicable.
// Links to various reference documents, if applicable.
// Links or report with relevant JIRA tickets.