...

kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}"

Attribute Name: records-latency-max

Description: The latency of the partition with the longest latency. This is the largest per-partition records-latency-max across all partitions of this topic assigned to the client.

kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}"

...

Report latency metrics in the Kafka Consumer at the client (max latency) and partition level, similar to how consumer lag is currently reported.

KIP-32 introduced a timestamp field to the Kafka message format. The timestamp can be provided by the user, the KafkaProducer, or the Broker, depending on how the Broker configuration message.timestamp.type is defined. When assigned by Kafka, the timestamp value is the current wall clock time represented as a unix epoch long in milliseconds, as returned by the Java Standard Library call System.currentTimeMillis(). We assume that all application and Broker machines run active clock synchronization services, so that latency may be calculated by taking the difference between the current wall clock time and the timestamp of a fetched ConsumerRecord. When the calculated latency is negative, the metric value will be reported as NaN.
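
As an illustration, the calculation described above might look like the following Java sketch. This is an illustrative helper only, not the actual KafkaConsumer implementation; the consumer's internal metric bookkeeping (sensors, sample windows) is omitted.

import org.apache.kafka.clients.consumer.ConsumerRecord;

public final class RecordLatencyExample {

    // Latency of a fetched record: current wall clock time minus the record's
    // timestamp. A negative difference (e.g. clock skew between machines) is
    // reported as NaN instead of a misleading negative latency.
    public static double recordLatencyMs(ConsumerRecord<?, ?> record) {
        long latencyMs = System.currentTimeMillis() - record.timestamp();
        return latencyMs < 0 ? Double.NaN : (double) latencyMs;
    }
}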

UNIX epoch time is always represented as UTC time, and is therefore agnostic of any machine's particular default timezone.

...

At the partition level we can provide the latest calculated latency as well as the max and average latency within the metrics sample window. At the topic and client level we only provide the max latency, which is the max(records-latency-max) of all partitions assigned to a client for a particular topic, or for all topics. An average or some other percentile could also be reported. A sum of partition latencies would not make sense because consumers are expected to consume partitions in parallel rather than serially.
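
For example, the client-level value could be derived from per-partition maxima along these lines. This is a minimal sketch; the input map of per-partition values is assumed bookkeeping, not part of the consumer API.

import java.util.Map;
import org.apache.kafka.common.TopicPartition;

public final class LatencyAggregationExample {

    // Derive the client-level records-latency-max as the maximum of the
    // per-partition records-latency-max values in the current sample window.
    public static double clientLatencyMax(Map<TopicPartition, Double> partitionLatencyMax) {
        return partitionLatencyMax.values().stream()
                .mapToDouble(Double::doubleValue)
                .max()
                .orElse(Double.NaN); // no partitions assigned yet
    }
}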

Info: Using Latency Metric for SLAs

If a message was produced a long time ago, and a new consumer group has been created, then the latency metrics will have very high values until the consumer group catches up. This is especially true in the context of KIP-405: Kafka Tiered Storage, which allows reading very old messages. Therefore, a consumer application that relies on reading all messages from the past will report a high records-latency for a while.

Using this metric for SLAs should only be done when a consumer group is expected to be continuously consuming (in a steady state), and not for bootstrapping new consumer groups.

Compatibility, Deprecation, and Migration Plan

...

  1. It only works in cases where offsets are committed back to Kafka. If an app or stream processor uses its own offset management then the current offset of a partition cannot be obtained from Kafka.
  2. It can only provide an estimate. Accuracy improves with higher-fidelity lookup tables (latest offsets looked up more frequently); see the sketch after this list.
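
As a rough illustration of the lookup-table approach, the produce time of a consumer's current offset can be estimated by interpolating between sampled (latest offset, wall clock time) measurements for a partition. The following sketch is hypothetical and is not taken from Kafka Lag Exporter's source.

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public final class InterpolatedLatencyExample {

    // Samples of the partition's log end offset -> wall clock time (ms).
    private final NavigableMap<Long, Long> samples = new TreeMap<>();

    public void record(long latestOffset, long timeMs) {
        samples.put(latestOffset, timeMs);
    }

    // Estimate when currentOffset was produced by interpolating between the
    // two surrounding samples, then subtract from "now" to estimate latency.
    public double estimateLatencyMs(long currentOffset, long nowMs) {
        Map.Entry<Long, Long> floor = samples.floorEntry(currentOffset);
        Map.Entry<Long, Long> ceil = samples.ceilingEntry(currentOffset);
        if (floor == null || ceil == null) {
            return Double.NaN; // not enough data to bracket the offset
        }
        if (floor.getKey().equals(ceil.getKey())) {
            return Math.max(0, nowMs - floor.getValue());
        }
        double fraction = (double) (currentOffset - floor.getKey())
                / (ceil.getKey() - floor.getKey());
        double producedAtMs = floor.getValue() + fraction * (ceil.getValue() - floor.getValue());
        return Math.max(0, nowMs - producedAtMs);
    }
}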

Disclosure: I (Sean Glover) am the author of the Kafka Lag Exporter project and the "Monitor Kafka Consumer Group Latency with Kafka Lag Exporter" blog post by Lightbend.