...

  • Total time is not very meaningful for long-running applications; only the rate of change (millisPerSec or similar) is
  • In most cases it's best to simply aggregate the time/count across the different GarbageCollectors; however, the specific collectors depend on the current Java runtime
  • With the existing metrics it's impossible to detect long GC pauses that may cause heartbeat timeouts

We propose to improve the current situation by:

  • Exposing rate metrics per GarbageCollector
  • Exposing aggregated Total time/count/rate metrics
  • Exposing an average GC time metric for the current measurement window (last 1 minute)

These new metrics are all derived from the existing ones with minimal overhead.
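As a rough illustration of the aggregation step, the sketch below sums collection time and count across whatever collectors the running JVM reports through the standard `java.lang.management` API. The class name `GcTotals` and its methods are hypothetical and exist only for this example; this is not the actual implementation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, for illustration only.
final class GcTotals {

    /** Sums the cumulative collection time (ms) across all given collectors. */
    static long totalTimeMs(List<GarbageCollectorMXBean> beans) {
        long sum = 0;
        for (GarbageCollectorMXBean bean : beans) {
            // getCollectionTime() returns -1 if the value is undefined for this collector.
            sum += Math.max(0, bean.getCollectionTime());
        }
        return sum;
    }

    /** Sums the cumulative collection count across all given collectors. */
    static long totalCount(List<GarbageCollectorMXBean> beans) {
        long sum = 0;
        for (GarbageCollectorMXBean bean : beans) {
            // getCollectionCount() returns -1 if the value is undefined for this collector.
            sum += Math.max(0, bean.getCollectionCount());
        }
        return sum;
    }

    public static void main(String[] args) {
        // The set of collectors depends on the Java runtime and its GC settings.
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        System.out.println("Total.Time  = " + totalTimeMs(beans) + " ms");
        System.out.println("Total.Count = " + totalCount(beans));
    }
}
```

Because the totals are computed from the list of collectors present at runtime, the caller does not need to know the collector names in advance.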

...

New GC metrics (in addition to the existing ones)

Job-/TaskManager

| Infix | Metric | Description | Type |
|---|---|---|---|
| Status.JVM.GarbageCollector | <GarbageCollector>.TimeMsPerSec | Milliseconds spent performing garbage collection per second. | Meter |
| Status.JVM.GarbageCollector | <GarbageCollector>.AverageTime | Average collection time in the current metric window. Delta(Time) / Delta(Count) | Gauge |
| Status.JVM.GarbageCollector | Total.Time | The total time spent performing garbage collection across all collectors. | Gauge |
| Status.JVM.GarbageCollector | Total.TimeMsPerSec | Milliseconds spent performing garbage collection per second across all collectors. | Meter |
| Status.JVM.GarbageCollector | Total.AverageTime | Average collection time in the current metric window across all collectors. Delta(Time) / Delta(Count) | Gauge |
| Status.JVM.GarbageCollector | Total.Count | The total number of collections that have occurred across all collectors. | Gauge |
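
The window-based metrics above can be sketched as a computation over two snapshots of the cumulative time and count. This is a minimal sketch only: the `GcWindowMetrics` and `Snapshot` names are hypothetical, and returning 0 for an empty window is an assumption of this example, not something the proposal specifies.

```java
// Hypothetical helper, for illustration only.
final class GcWindowMetrics {

    /** Cumulative GC time and count as observed at a given wall-clock timestamp. */
    static final class Snapshot {
        final long timestampMs;
        final long gcTimeMs;
        final long gcCount;

        Snapshot(long timestampMs, long gcTimeMs, long gcCount) {
            this.timestampMs = timestampMs;
            this.gcTimeMs = gcTimeMs;
            this.gcCount = gcCount;
        }
    }

    /** TimeMsPerSec: milliseconds of GC per second of wall-clock time in the window. */
    static double timeMsPerSec(Snapshot prev, Snapshot cur) {
        long elapsedMs = cur.timestampMs - prev.timestampMs;
        if (elapsedMs <= 0) {
            return 0.0; // assumption: report 0 for an empty or invalid window
        }
        return (cur.gcTimeMs - prev.gcTimeMs) * 1000.0 / elapsedMs;
    }

    /** AverageTime: Delta(Time) / Delta(Count) over the window. */
    static double averageTimeMs(Snapshot prev, Snapshot cur) {
        long deltaCount = cur.gcCount - prev.gcCount;
        if (deltaCount <= 0) {
            return 0.0; // assumption: report 0 when no collection happened in the window
        }
        return (double) (cur.gcTimeMs - prev.gcTimeMs) / deltaCount;
    }

    public static void main(String[] args) {
        // One-minute window: 3 collections taking 600 ms in total.
        Snapshot prev = new Snapshot(0L, 1_000L, 10L);
        Snapshot cur = new Snapshot(60_000L, 1_600L, 13L);
        System.out.println(timeMsPerSec(prev, cur));  // prints 10.0
        System.out.println(averageTimeMs(prev, cur)); // prints 200.0
    }
}
```

In this example the rate of 10 ms/s looks harmless, while the average of 200 ms per collection points at long individual pauses, which is the kind of signal the cumulative time and count alone do not provide.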

...