Status

Current state: Accepted

Discussion thread: here 

JIRA: KAFKA-9983, KAFKA-10054

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The following metrics would be added:

  • record-e2e-latency-min [ms]
  • record-e2e-latency-max [ms]
  • record-e2e-latency-avg [ms]

These will be exposed with the following tags:

  • type = stream-processor-node-metrics
  • thread-id=[threadId]
  • task-id=[taskId]
  • processor-node-id=[processorNodeId]

These will be reported for source and terminal operators at the recording level INFO.

We will also expose these metrics on the operator-level for stateful operators at the recording level DEBUG with the following tags:

...

TRACE (which is also being added as part of this KIP)

In all cases the metrics will be computed at the end of the operation, once processing has completed.

Update

The min and max task-level INFO metrics have been added in 2.6, and the remaining metrics will ship in the next version.

...

Proposed Changes

Imagine a simple 3-node subtopology with source node O, aggregation node A, and sink node I. For any record flowing through this with record timestamp t, let t_O be the system (wallclock) time when it is sent from the source topic, t_A be the time when it is finished being processed by the aggregator node, and t_I be the time when it leaves the sink node for the output or repartition topic. The end-to-end latency at operator O for a given record is defined as

L_O(t) = t_O - t

and likewise for the other operator-level end-to-end latencies. This represents the age of the record at the time it was processed by operator O. The task-level end-to-end (e2e) latency L will be computed based on the sink node, i.e. L = L_I. The source node e2e latency reading from the user input topics therefore represents the consumption latency: the time it took for a newly created event to be read by Streams. This can be especially interesting in cases where some records may be severely delayed, for example by an IoT device with unstable network connections, or when a user's smartphone reconnects to the internet after a flight and pushes all the latest updates. On the other side, the sink node e2e latency, which is also the task-level e2e latency, reveals how long it takes for the record to be fully processed through that subtopology. If the task is the final one in the full topology, this is the full end-to-end latency: the time it took for a record to be fully processed through Streams.
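The definition above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class `E2eLatencyStats` and its methods are hypothetical names and are not part of the Kafka Streams API; the sketch simply computes the per-record latency as wallclock time minus record timestamp and keeps a running min, max, and average.

```java
/**
 * Hypothetical helper (not part of Kafka Streams): tracks min/max/avg of the
 * per-record end-to-end latency, defined as wallclock time minus record timestamp.
 */
final class E2eLatencyStats {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long sum = 0L;
    private long count = 0L;

    /** Latency of a single record at an operator, in milliseconds. */
    static long latencyMs(long wallclockMs, long recordTimestampMs) {
        return wallclockMs - recordTimestampMs;
    }

    /** Records one observation, taken once the operator has finished processing. */
    void record(long wallclockMs, long recordTimestampMs) {
        long latency = latencyMs(wallclockMs, recordTimestampMs);
        min = Math.min(min, latency);
        max = Math.max(max, latency);
        sum += latency;
        count++;
    }

    long min() { return min; }

    long max() { return max; }

    /** Mean latency, or NaN if nothing has been recorded yet. */
    double avg() { return count == 0 ? Double.NaN : (double) sum / count; }
}
```

Feeding the sink node's observations into such an accumulator would correspond to the task-level values, since the task-level latency is taken from the sink node.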

Note that for a given record, L_O <= L_A <= L_I. This holds true within and across subtopologies. A downstream subtopology will always have a task-level end-to-end latency greater than or equal to that of an upstream subtopology for a single task (which in turn implies the same holds true for the statistical measures exposed via the new metrics). Comparing the e2e latency across tasks (or across operators) will also be of interest, as this represents the processing delay: the amount of time it took for Streams to actually process the record from point A to point B within the topology. Seeing a large processing delay indicates a possible bottleneck in the topology and may help users debug their application performance. (Debugging topology bottlenecks is not the primary motivation of this KIP, but it is a nice side effect.)
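As a worked illustration of the processing delay, the sketch below subtracts an upstream latency from a downstream one for the same record. The class and method names are hypothetical and the timestamps are made up; the point is only that the record timestamp cancels out, leaving the wallclock time spent between the two operators.

```java
/** Hypothetical illustration (not Kafka Streams API) of the processing delay between two operators. */
final class ProcessingDelay {
    /** End-to-end latency at an operator: wallclock time at the operator minus the record timestamp. */
    static long e2eLatencyMs(long wallclockMs, long recordTimestampMs) {
        return wallclockMs - recordTimestampMs;
    }

    /**
     * Processing delay between an upstream and a downstream operator for the same record.
     * The record timestamp cancels, so this equals the wallclock difference between the operators.
     */
    static long processingDelayMs(long upstreamLatencyMs, long downstreamLatencyMs) {
        return downstreamLatencyMs - upstreamLatencyMs;
    }
}
```

For a record with timestamp 1000 that is observed at the source at 1200, at the aggregator at 1250, and at the sink at 1300, the latencies are 200, 250, and 300 ms, and the source-to-sink processing delay is 100 ms.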

Late arriving records will be included in this metric, even if they are otherwise dropped due to the grace period having passed. Although we already expose a metric for the number of late dropped records, there is no way for a user to find out how late the record was. Including them in the end-to-end latency metrics may, for one thing, help users to set a reasonable grace period if they see that a large number of records are being dropped. This metric is related but ultimately orthogonal to the concept of lateness. One difference, for example, is that late records are dropped based on stream time, whereas this metric is always reported with respect to the current time. The stream time may lag the system time in low traffic, for example, or when all records are considerably delayed. This might mean the user sees no dropped records even though the end-to-end latency is large.
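The difference between lateness and end-to-end latency can be illustrated with a deliberately simplified model. The sketch below assumes stream time is the maximum record timestamp seen so far and that a record counts as dropped when it trails stream time by more than the grace period; the actual Streams check is made against window boundaries, and `LatenessVsLatency` is a hypothetical class, not part of the API.

```java
/**
 * Hypothetical, simplified model (not Kafka Streams API): lateness is judged
 * against stream time, while end-to-end latency is judged against wallclock time.
 */
final class LatenessVsLatency {
    private final long graceMs;
    private long streamTimeMs = Long.MIN_VALUE;
    private long maxLatencyMs = Long.MIN_VALUE;
    private int dropped = 0;

    LatenessVsLatency(long graceMs) {
        this.graceMs = graceMs;
    }

    void observe(long recordTimestampMs, long wallclockMs) {
        // Stream time only ever advances, and only when a newer timestamp arrives.
        streamTimeMs = Math.max(streamTimeMs, recordTimestampMs);
        // The latency is recorded for every record, including ones that get dropped.
        maxLatencyMs = Math.max(maxLatencyMs, wallclockMs - recordTimestampMs);
        // Simplifying assumption: a record is "late" if it trails stream time by more than the grace period.
        if (streamTimeMs - recordTimestampMs > graceMs) {
            dropped++;
        }
    }

    int dropped() { return dropped; }

    long maxLatencyMs() { return maxLatencyMs; }
}
```

If every record arrives one hour after its timestamp but in order, stream time lags wallclock time by an hour, so this model reports no dropped records while the maximum latency is one hour.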

...

We define the staleness in terms of individual record timestamps, but we could have instead defined it as the difference between the system time and the stream time, i.e. S = t_O - s where s is the stream time. This approach has some drawbacks; first and foremost, we are losing information in the case of out-of-order data, and may not notice records with extreme delays. In the example above, an IoT device that regularly disconnects for long periods of time would push a lot of out-of-order data when it reconnects. Since these records would not advance or affect the stream time, this delay would not be reflected as an increase in the processing latency metric. But the end-to-end latency of these records is of course quite high.
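A small sketch makes the drawback concrete. It compares the two candidate definitions for the same record, assuming stream time is the maximum timestamp seen so far; the class and method names are hypothetical and the numbers are illustrative only.

```java
/** Hypothetical comparison (not Kafka Streams API) of the record-based and stream-time-based definitions. */
final class StreamTimeVsRecordTime {
    private long streamTimeMs = Long.MIN_VALUE;

    /** Advances stream time (assumed to be the max timestamp seen so far) and returns it. */
    long advance(long recordTimestampMs) {
        streamTimeMs = Math.max(streamTimeMs, recordTimestampMs);
        return streamTimeMs;
    }

    /** Definition used in this KIP: wallclock time minus the record's own timestamp. */
    static long recordBasedMs(long wallclockMs, long recordTimestampMs) {
        return wallclockMs - recordTimestampMs;
    }

    /** Rejected alternative: wallclock time minus the current stream time. */
    static long streamTimeBasedMs(long wallclockMs, long streamTimeMs) {
        return wallclockMs - streamTimeMs;
    }
}
```

Suppose a record with timestamp 10000 is seen at wallclock 10100, followed by an out-of-order record with timestamp 1000 at wallclock 10200. Stream time stays at 10000, so the stream-time-based value for the second record is 200 ms, whereas the record-based value is 9200 ms.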

Computing the metric at record intake time

This idea was originally discussed but ultimately put to rest, as it does not address the specific goal set out in this KIP: to report the time for an event to be reflected in the output. This alternative metric, which we call "staleness", has some use as a gauge of the record time when received by an operator, which may have implications for its processing for some operators. However, this issue is orthogonal and was thus rejected in favor of measuring at the record output.