...

// deprecated APIs: use {@link Sensor#record(double)} directly instead.

@Deprecated
void recordLatency(final Sensor sensor, final long startNs, final long endNs);

@Deprecated
void recordThroughput(final Sensor sensor, final long value);

// updated API javadocs

  /**
   * Add a latency and rate sensor for a specific operation, which will include the following metrics:
   * <ol>
   * <li>average latency</li>
   * <li>max latency</li>
   * <li>invocation rate (num.operations / time unit)</li>
   * <li>total invocation count</li>
   * </ol>
   * Whenever a user records a value on this sensor via {@link Sensor#record(double)} etc.,
   * it will be counted as one invocation of the operation, and hence the rate / count metrics will be updated accordingly;
   * and the recorded latency value will be used to update the average / max latency as well. The time unit of the latency can be defined
   * by the user.
   *
   * Note that you can add more metrics to this sensor after creating it, which will then be updated upon {@link Sensor#record(double)} calls;
   * but additional user-customized metrics will not be managed by {@link StreamsMetrics}.
   *
   * @param scopeName          name of the scope, which will be used as part of the metrics type, e.g.: "stream-[scope]-metrics".
   * @param entityName         name of the entity, which will be used as part of the metric tags, e.g.: "[scope]-id" = "[entity]".
   * @param operationName      name of the operation, which will be used as the name of the metric, e.g.: "[operation]-latency-avg".
   * @param recordingLevel     the recording level (e.g., INFO or DEBUG) for this sensor.
   * @param tags               additional tags of the sensor
   * @return The added sensor.
   */
  Sensor addLatencyAndRateSensor(final String scopeName,
                                 final String entityName,
                                 final String operationName,
                                 final Sensor.RecordingLevel recordingLevel,
                                 final String... tags);

  /**
   * Add a rate sensor for a specific operation, which will include the following metrics:
   * <ol>
   * <li>invocation rate (num.operations / time unit)</li>
   * <li>total invocation count</li>
   * </ol>
   * Whenever a user records a value on this sensor via {@link Sensor#record(double)} etc.,
   * it will be counted as one invocation of the operation, and hence the rate / count metrics will be updated accordingly.
   *
   * Note that you can add more metrics to this sensor after creating it, which will then be updated upon {@link Sensor#record(double)} calls;
   * but additional user-customized metrics will not be managed by {@link StreamsMetrics}.
   *
   * @param scopeName          name of the scope, which will be used as part of the metrics type, e.g.: "stream-[scope]-metrics".
   * @param entityName         name of the entity, which will be used as part of the metric tags, e.g.: "[scope]-id" = "[entity]".
   * @param operationName      name of the operation, which will be used as the name of the metric, e.g.: "[operation]-rate".
   * @param recordingLevel     the recording level (e.g., INFO or DEBUG) for this sensor.
   * @param tags               additional tags of the sensor
   * @return The added sensor.
   */
  Sensor addRateSensor(final String scopeName,
                       final String entityName,
                       final String operationName,
                       final Sensor.RecordingLevel recordingLevel,
                       final String... tags);


Users can create a sensor via either `addLatencyAndRateSensor` or `addRateSensor`; the returned sensor comes pre-registered with the latency and rate metrics, or with the rate metrics only, respectively. More metrics can then be added to the returned sensor in addition to the pre-registered ones. When recording a value, users should call `Sensor#record()` directly on the sensor itself.
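To illustrate this usage pattern without depending on Kafka itself, here is a toy, in-memory model of a sensor: it is created with pre-registered latency stats, gets one extra user-added metric, and is then recorded on directly. `ToySensor`, `Stat` and `latencyAndRateSensor()` are hypothetical stand-ins, not Kafka classes. Kafka's rate stat is computed over sampled time windows; to keep the sketch deterministic it keeps only the invocation total. The `endNs - startNs` computation is an assumption about what callers of the deprecated `recordLatency` should now do themselves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: ToySensor, Stat and latencyAndRateSensor() are hypothetical
// stand-ins that mimic the behaviour described above; they are not Kafka classes.
public class SensorSketch {

    /** One statistic that is updated on every record() call. */
    interface Stat {
        void record(double value);
        double value();
    }

    static final class Avg implements Stat {
        private double sum;
        private long count;
        public void record(double v) { sum += v; count++; }
        public double value() { return count == 0 ? Double.NaN : sum / count; }
    }

    static final class Max implements Stat {
        private double max = Double.NEGATIVE_INFINITY;
        public void record(double v) { max = Math.max(max, v); }
        public double value() { return max; }
    }

    static final class Min implements Stat {
        private double min = Double.POSITIVE_INFINITY;
        public void record(double v) { min = Math.min(min, v); }
        public double value() { return min; }
    }

    /** Counts invocations; the recorded value itself is ignored. */
    static final class Total implements Stat {
        private long count;
        public void record(double v) { count++; }
        public double value() { return count; }
    }

    /** A sensor fans every recorded value out to all of its stats. */
    static final class ToySensor {
        private final Map<String, Stat> stats = new LinkedHashMap<>();

        void add(String metricName, Stat stat) { stats.put(metricName, stat); }

        void record(double value) {
            for (Stat stat : stats.values()) {
                stat.record(value);
            }
        }

        double metric(String metricName) { return stats.get(metricName).value(); }
    }

    /** Pre-registers latency avg/max and the invocation total for one operation. */
    static ToySensor latencyAndRateSensor(String operationName) {
        ToySensor sensor = new ToySensor();
        sensor.add(operationName + "-latency-avg", new Avg());
        sensor.add(operationName + "-latency-max", new Max());
        sensor.add(operationName + "-total", new Total());
        return sensor;
    }

    public static void main(String[] args) {
        ToySensor sensor = latencyAndRateSensor("my-op");
        // A user-added metric, updated by the same record() calls.
        sensor.add("my-op-latency-min", new Min());

        // Instead of the deprecated recordLatency(sensor, startNs, endNs),
        // compute the latency yourself and record it on the sensor directly.
        long startNs = 1_000L;
        long endNs = 4_000L;
        sensor.record(endNs - startNs);
        sensor.record(1_000.0);

        System.out.println(sensor.metric("my-op-latency-avg")); // 2000.0
        System.out.println(sensor.metric("my-op-latency-max")); // 3000.0
        System.out.println(sensor.metric("my-op-latency-min")); // 1000.0
        System.out.println(sensor.metric("my-op-total"));       // 2.0
    }
}
```

Both the pre-registered stats and the user-added one are driven by the same `record()` call, which is why the deprecated helper methods are no longer needed.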

Streams Built-in Metrics

For the Streams built-in metrics, we will clean them up by 1) adding a few instance-level metrics, 2) removing a few non-useful or functionally overlapping metrics, and 3) changing some metrics' recording levels. Note the symbol tags in the tables below; the descriptions of the metrics are omitted since their semantics are straightforward from their names ("rate", "total", "max", "avg", "static gauge", etc.).
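The recording-level changes matter because a sensor only records when its level is enabled by the Streams config `metrics.recording.level` (default `INFO`). The toy model below is a minimal sketch of that rule, assuming the ordering INFO < DEBUG (enabling DEBUG also enables INFO sensors); `Level` and `shouldRecord()` are hypothetical names, not Kafka's `Sensor.RecordingLevel` API.

```java
// Toy model, not Kafka's API: Level and shouldRecord() are hypothetical names.
// Assumes the ordering INFO < DEBUG, i.e. enabling DEBUG also enables INFO sensors.
public class RecordingLevelSketch {

    // Declaration order defines the ordering used by compareTo().
    enum Level { INFO, DEBUG }

    /** A sensor records only if its level is at or below the configured level. */
    static boolean shouldRecord(Level sensorLevel, Level configuredLevel) {
        return sensorLevel.compareTo(configuredLevel) <= 0;
    }

    public static void main(String[] args) {
        // Stands in for the value of metrics.recording.level (default "INFO").
        Level configured = Level.valueOf("INFO");

        System.out.println(shouldRecord(Level.INFO, configured));   // true: e.g. a LEVEL-1 (thread) sensor
        System.out.println(shouldRecord(Level.DEBUG, configured));  // false: e.g. most LEVEL-2/3 sensors
        System.out.println(shouldRecord(Level.DEBUG, Level.DEBUG)); // true once DEBUG is configured
    }
}
```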

...


LEVEL 0: Per-Client | LEVEL 1: Per-Thread | LEVEL 2: Per-Task | LEVEL 3: Per-Processor-Node, Per-State-Store, Per-Cache

TAGS

  • LEVEL 0 (Per-Client): type=stream-metrics,client-id=[client-id]
  • LEVEL 1 (Per-Thread): type=stream-thread-metrics,thread-name=[threadId] (! tag name changed)
  • LEVEL 2 (Per-Task): type=stream-task-metrics,thread-name=[threadId],task-id=[taskId] (! tag name changed)
  • LEVEL 3 (Per-Processor-Node): type=stream-processor-node-metrics,thread-name=[threadId],task-id=[taskId],processor-node-id=[processorNodeId] (! tag name changed)
  • LEVEL 3 (Per-State-Store): type=stream-state-metrics,thread-name=[threadId],task-id=[taskId],[storeType]-state-id=[storeName] (! tag name changed)
  • LEVEL 3 (Per-Cache): type=stream-record-cache-metrics,thread-name=[threadId],task-id=[taskId],record-cache-id=[storeName] (! tag name changed)

version | commit-id (static gauge)
INFO ($)




application-id (static gauge)
INFO ($)




topology-description (static gauge)
INFO ($)




state (dynamic gauge)
INFO ($)




rebalance-latency (avg | max)
INFO ($)




rebalance (rate | total)
INFO ($)




last-rebalance-time (dynamic gauge)
INFO ($)
active-task-process (ratio)
INFO ($)
standby-task-process (ratio)
INFO ($)




process-latency (avg | max)

INFODEBUG(! removed for now)

process (rate | total)

INFODEBUG ( → ) on source-nodes onlyDEBUG

punctuate-latency (avg | max)

INFODEBUG


punctuate (rate | total)

INFODEBUG


commit-latency (avg | max)

INFODEBUG


commit (rate | total)

INFODEBUG


poll-latency (avg | max)

INFO



poll (rate | total)

INFO



task-created | closed (rate | total)

INFO



active-task-process (ratio)
DEBUG
record-lateness (avg | max)
DEBUG

INFO ($)


standby-task-process (ratio)


INFO ($)


dropped-records (rate | total)


INFO * (→)

DEBUG * (a subset of processor only)

                 

(! name changed)



skipped-records (rate | total)

(! moved to lower level) INFO *




enforced-processing (rate | total)


DEBUG


record-lateness (avg | max)


DEBUG


suppression-emit (rate | total)



DEBUG * (suppress processor only)

suppression-buffer-size (avg | max)




DEBUG * (suppression buffer only)
suppression-buffer-count (avg | max)




DEBUG * (suppression buffer only)
expired-window-record-drop (rate | total)




DEBUG * (window store only)
put | put-if-absent .. | get-latency (avg | max)




DEBUG * (excluding suppression buffer)

                 (! name changed)


put | put-if-absent .. | get (rate)

DEBUG * (excluding suppression buffer)

                 (! name changed)

hit-ratio (avg | min | max)
DEBUG  (! name changed)
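To show how the per-level tags in the TAGS row identify where a metric lives, here is a sketch that assembles a JMX-style object name from a metric group (`type`) and its tags, then parses it back with the JDK's `javax.management.ObjectName`. The `kafka.streams` domain, the tag ordering, the helper names, and the sample thread and task ids are assumptions for illustration; only the tag keys come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Sketch: assembles a JMX-style name from a metric group ("type") and its tags.
// Assumptions: the "kafka.streams" domain and the key order used here; the tag
// keys themselves are taken from the TAGS row of the table.
public class MetricNameSketch {

    static String jmxName(String domain, String group, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder(domain).append(":type=").append(group);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            sb.append(',').append(tag.getKey()).append('=').append(tag.getValue());
        }
        return sb.toString();
    }

    /** Tags of a LEVEL 2 (per-task) metric, in insertion order. */
    static Map<String, String> taskTags(String threadId, String taskId) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("thread-name", threadId);
        tags.put("task-id", taskId);
        return tags;
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        String name = jmxName("kafka.streams", "stream-task-metrics",
                taskTags("app-StreamThread-1", "0_1"));
        System.out.println(name);
        // kafka.streams:type=stream-task-metrics,thread-name=app-StreamThread-1,task-id=0_1

        // ObjectName parses the key properties back, e.g. to group metrics by level.
        ObjectName parsed = new ObjectName(name);
        System.out.println(parsed.getKeyProperty("type"));    // stream-task-metrics
        System.out.println(parsed.getKeyProperty("task-id")); // 0_1
    }
}
```

Since every level uses a distinct `type` value, a reporter can tell the level of a metric from that single key without inspecting the remaining tags.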

A few philosophies behind this cleanup:

  1. We will remove most of the parent sensors with `level-tag=all`, with two exceptions. The main idea is to let users do the roll-ups themselves only when needed, so that we avoid unnecessary metric value aggregations. For the two exceptional cases, the parent-child sensor relationship is kept because it is tricky for users to do the roll-up correctly.
  2. We will keep all LEVEL-0 (instance) and LEVEL-1 (thread) sensors at the INFO recording level, and most of the lower-level sensors at the DEBUG recording level. The only exceptions are active/standby-task-process and dropped/skipped-records:
    1. active/standby-task-process indicate the percentage of time the hosting thread spends processing active / standby tasks, respectively.
    2. dropped/skipped-records indicate unexpected errors during processing and hence need users' attention. Their semantics differ slightly, though: skipped records are skipped at the very beginning of processing and hence never traverse the topology at all; dropped records are dropped in the middle of the topology, and do not necessarily map 1-1 to the source records, since one source record may be transformed into multiple intermediate records that are then dropped later.
  3. Some of the lower-level metrics, like "forward-rate" and "destroy-rate", are removed directly since they overlap with other existing metrics.
  4. For metrics that are only useful for a specific type of entity, like "expired-window-record-drop", we will create the sensors lazily, to avoid the unnecessary cost of metrics reporters iterating over empty sensors.
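Philosophy 1 leaves most roll-ups to users. The sketch below, with made-up numbers, illustrates why some roll-ups are easy and others are not: counts and maxima roll up by a plain sum and max, while an average must be weighted by each child's invocation count, so an unweighted mean of the children's averages is wrong. It assumes each child's average and count cover the same time window; `Child` and the helper methods are hypothetical, and the sketch does not say which two parent-child sensors are kept.

```java
// Illustration with made-up numbers; Child and the helpers are hypothetical.
// Assumes each child's average and count cover the same time window.
public class RollUpSketch {

    /** Latency summary of one child sensor, e.g. one task under a thread. */
    static final class Child {
        final double latencyAvg;
        final double latencyMax;
        final long count;

        Child(double latencyAvg, double latencyMax, long count) {
            this.latencyAvg = latencyAvg;
            this.latencyMax = latencyMax;
            this.count = count;
        }
    }

    /** Counts (and rates over the same window) roll up by summing. */
    static long rolledUpCount(Child[] children) {
        long sum = 0;
        for (Child c : children) {
            sum += c.count;
        }
        return sum;
    }

    /** Maxima roll up by taking the largest child maximum. */
    static double rolledUpMax(Child[] children) {
        double max = Double.NEGATIVE_INFINITY;
        for (Child c : children) {
            max = Math.max(max, c.latencyMax);
        }
        return max;
    }

    /** Averages must be weighted by each child's count. */
    static double rolledUpAvg(Child[] children) {
        double weightedSum = 0;
        long count = 0;
        for (Child c : children) {
            weightedSum += c.latencyAvg * c.count;
            count += c.count;
        }
        return count == 0 ? Double.NaN : weightedSum / count;
    }

    /** The tempting but wrong roll-up: an unweighted mean of the children's averages. */
    static double naiveAvg(Child[] children) {
        double sum = 0;
        for (Child c : children) {
            sum += c.latencyAvg;
        }
        return sum / children.length;
    }

    public static void main(String[] args) {
        Child[] tasks = {
            new Child(10.0, 40.0, 90),   // frequently invoked, fast
            new Child(100.0, 120.0, 10)  // rarely invoked, slow
        };
        System.out.println(rolledUpCount(tasks)); // 100
        System.out.println(rolledUpMax(tasks));   // 120.0
        System.out.println(rolledUpAvg(tasks));   // 19.0
        System.out.println(naiveAvg(tasks));      // 55.0
    }
}
```

The gap between 19.0 and 55.0 is the kind of mistake that makes keeping a parent sensor worthwhile where a correct roll-up needs data users may not have at hand.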

...