
Status

Current state: Accepted

Discussion thread: here

JIRA: KAFKA-9983, KAFKA-10054

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The following metrics would be added:

record-e2e-latency-min [ms]
record-e2e-latency-max [ms]
record-e2e-latency-avg [ms]
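As an illustration of what the three statistics summarize, here is a minimal plain-Java sketch of a per-node accumulator. It is not Kafka's implementation and does not use Kafka's metrics library: the class name `E2eLatencyStats` is hypothetical, the values are cumulative (real Kafka metrics are computed over sampled time windows), and returning `NaN` when nothing has been recorded is a choice made only for this sketch.

```java
// Hypothetical stand-in for the min/max/avg statistics of one processor node.
// Assumption: statistics are cumulative over all observations; the real metrics
// are windowed by Kafka's metrics library, which this sketch does not model.
final class E2eLatencyStats {
    private long count = 0;
    private long sumMs = 0;
    private long minMs = Long.MAX_VALUE;
    private long maxMs = Long.MIN_VALUE;

    /** Records one end-to-end latency observation, in milliseconds. */
    void record(long latencyMs) {
        count++;
        sumMs += latencyMs;
        minMs = Math.min(minMs, latencyMs);
        maxMs = Math.max(maxMs, latencyMs);
    }

    /** Corresponds to record-e2e-latency-min [ms]; NaN if nothing was recorded. */
    double min() {
        return count == 0 ? Double.NaN : minMs;
    }

    /** Corresponds to record-e2e-latency-max [ms]; NaN if nothing was recorded. */
    double max() {
        return count == 0 ? Double.NaN : maxMs;
    }

    /** Corresponds to record-e2e-latency-avg [ms]; NaN if nothing was recorded. */
    double avg() {
        return count == 0 ? Double.NaN : (double) sumMs / count;
    }
}
```

For example, after recording 10, 30 and 20 ms, the sketch reports a min of 10.0, a max of 30.0 and an avg of 20.0.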

These will be exposed on the processor-node-level at the recording level INFO with the following tags:

type = stream-processor-node-metrics
thread-id=[threadId]
task-id=[taskId]
processor-node-id=[processorNodeId]
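To show how these tags identify a metric group, the sketch below assembles them into a JMX object name. It assumes the `kafka.streams` JMX domain and the `type=...,thread-id=...,task-id=...,processor-node-id=...` layout used in the Kafka Streams monitoring documentation; the helper class `E2eMetricNames` is hypothetical. Tag values containing JMX special characters (such as `,`, `=`, `:`, `"`, `*`, `?`) would need `ObjectName.quote()`, which this sketch does not do.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: builds the JMX object name for the node-level metric group
// from the tags listed above. Assumes tag values contain no JMX special characters.
final class E2eMetricNames {
    static ObjectName nodeMetrics(String threadId, String taskId, String processorNodeId)
            throws MalformedObjectNameException {
        return new ObjectName("kafka.streams:type=stream-processor-node-metrics"
                + ",thread-id=" + threadId
                + ",task-id=" + taskId
                + ",processor-node-id=" + processorNodeId);
    }
}
```

Inside the same JVM, such a name could then be passed to `ManagementFactory.getPlatformMBeanServer().getAttribute(name, "record-e2e-latency-max")`, assuming the metrics are exposed through a JMX reporter with one attribute per metric name.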

These will be reported for source and terminal operators at the recording level INFO.
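The rule "source and terminal operators" can be made concrete with a small graph check, taking a terminal operator to be a node without successor nodes (as the Kafka Streams documentation phrases it). This is a hypothetical sketch, not Kafka code: it assumes the subtopology is given as an adjacency map in which every node appears as a key, together with the set of source node names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: selects the nodes that would report the e2e latency metrics
// at INFO level, i.e. all source nodes plus every node with no successors.
// Assumption: every node of the subtopology appears as a key in `successors`.
final class InfoLevelNodes {
    static Set<String> select(Map<String, List<String>> successors, Set<String> sources) {
        Set<String> result = new LinkedHashSet<>(sources);
        for (Map.Entry<String, List<String>> e : successors.entrySet()) {
            if (e.getValue().isEmpty()) {
                result.add(e.getKey()); // terminal node: nothing downstream of it
            }
        }
        return result;
    }
}
```

For a chain O -> F -> A -> I with source O, the selected nodes are O and I; the intermediate nodes F and A report nothing at INFO.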

We will also expose these metrics on the processor-node-level for stateful operators at the recording level TRACE with the following tags:

type = stream-processor-node-metrics
thread-id=[threadId]
task-id=[taskId]
processor-node-id=[processorNodeId]

(The TRACE recording level is also being added as part of this KIP.)
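The INFO and TRACE recording levels mentioned above act as a verbosity filter. The sketch below models that filter with a hypothetical enum; it is not Kafka's own recording-level type. It assumes the levels are ordered INFO < DEBUG < TRACE and that a metric is recorded when its level is no more verbose than the configured level.

```java
// Hypothetical stand-in for the metrics recording levels, ordered by verbosity.
// Assumption: a metric is recorded when its level is at or below the configured one.
enum RecordingLevelSketch {
    INFO, DEBUG, TRACE; // declaration order = increasing verbosity

    /** Returns true if a metric at this level is recorded under `configured`. */
    boolean shouldRecord(RecordingLevelSketch configured) {
        return this.compareTo(configured) <= 0;
    }
}
```

Under this model, the INFO-level metrics of source and terminal nodes are always recorded, while the TRACE-level metrics of stateful nodes are only recorded when the application opts in. In a Streams application the level is selected with the `metrics.recording.level` config (`StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG`).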

In all cases the metrics will be computed at the end of the operation or subtopology, that is, once the processing has been completed.
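Computing the metric at the end of the operation means the wall clock is read only after the node's processing step has returned, so that node's own processing time is included. The following hypothetical wrapper sketches this timing; the class name, the injected `LongSupplier` clock and the `LongConsumer` sink are assumptions made for illustration, not Kafka Streams internals.

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical wrapper: runs a node's processing step, then reads the wall clock
// and reports (now - record timestamp) as the end-to-end latency in milliseconds.
final class EndOfOperationTimer {
    private final LongSupplier wallClockMs; // injected so tests can control time
    private final LongConsumer latencySink; // receives each latency observation

    EndOfOperationTimer(LongSupplier wallClockMs, LongConsumer latencySink) {
        this.wallClockMs = wallClockMs;
        this.latencySink = latencySink;
    }

    void processAndRecord(long recordTimestampMs, Runnable processStep) {
        processStep.run(); // the node finishes its work first...
        long e2eMs = wallClockMs.getAsLong() - recordTimestampMs; // ...then the clock is read
        latencySink.accept(e2eMs);
    }
}
```

Reading the clock before `processStep.run()` instead would measure how old the record is when the node receives it, which excludes the node's processing time.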

Update

For the min and max task-level metrics, for example, this means the metric reflects the end-to-end latency at the time the record leaves the sink node. For those wondering what TRACE-level metrics are: this new metrics recording level will be added as part of this KIP.

The INFO-level metrics were added in 2.6, and the remaining metrics will ship in the next version.

Proposed Changes

Imagine a simple 4-node subtopology with source node O, filter node F, aggregation node A, and sink node I. For any record flowing through this subtopology with record timestamp t, let t_O be the system (wall-clock) time when it is sent from the source node after being read from the source topic, t_A be the time when it has finished being processed by the aggregation node, and t_I be the time when it leaves the sink node for the output or repartition topic. The end-to-end latency at operator O for a given record is defined as

...
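As a worked example for the subtopology above, the sketch below computes per-node latencies under the assumption, taken from how the Kafka Streams documentation describes these metrics, that the latency at node X is the wall-clock time t_X at which X has finished with the record minus the record timestamp t. The class name is hypothetical and all timestamps are illustrative values, not measurements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical worked example: latency at each node X is t_X - t, where t is the
// record timestamp and t_X the wall-clock time at which X finished with the record.
// All values are illustrative epoch milliseconds.
final class SubtopologyExample {
    static Map<String, Long> latencies(long t, Map<String, Long> completionTimes) {
        Map<String, Long> out = new LinkedHashMap<>(); // preserves topology order
        for (Map.Entry<String, Long> e : completionTimes.entrySet()) {
            out.put(e.getKey(), e.getValue() - t);
        }
        return out;
    }
}
```

With t = 1000, t_O = 1500, t_A = 1520 and t_I = 1530, the latencies are 500 ms at O, 520 ms at A and 530 ms at I. They can only grow along the path, since each node finishes with a record after its upstream nodes do.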

This idea was originally discussed but ultimately put to rest, as it does not address the specific goal set out in this KIP: to report the time for an event to be reflected in the output. This alternative metric, which we call "staleness", has some use as a gauge of the record time when the record is received by an operator, which may affect how some operators process it. However, this issue is orthogonal, so staleness was rejected in favor of measuring at the record output.
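The difference between the rejected "staleness" measurement and the chosen end-to-end latency comes down to when the wall clock is sampled. The hypothetical helpers below (names and signatures are illustrative only) make this explicit for a single operator; all times are epoch milliseconds.

```java
// Hypothetical helpers contrasting the two candidate measurements for one operator.
// Staleness samples the wall clock when the record is received by the operator;
// end-to-end latency samples it once the operator has finished with the record.
final class StalenessVsE2e {
    static long staleness(long recordTimestampMs, long receivedAtMs) {
        return receivedAtMs - recordTimestampMs;
    }

    static long endToEnd(long recordTimestampMs, long completedAtMs) {
        return completedAtMs - recordTimestampMs;
    }
}
```

For a record with timestamp 1000 that reaches an operator at 1200 and leaves it at 1250, staleness is 200 ms and end-to-end latency is 250 ms; the 50 ms gap is the operator's own processing time, which only the end-to-end metric reflects in the output.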

Reporting mean or median (p50)

Rejected because:

...


Versions Compared

Old Version 17

New Version Current
