
...

Sum - Monotonic total count meter (Counter). Suitable for total number of X counters, e.g., total number of bytes sent.
Gauge - Non-monotonic current value meter (UpDownCounter). Suitable for current value of Y, e.g., current queue count.
Histogram - Value distribution meter (ValueRecorder). Suitable for latency values, etc.
For simplicity, a client implementation may choose to provide an average value as a Gauge instead of a Histogram. Such an average should use the original Histogram metric name with an ".avg60s" suffix (or whatever the averaging period is), e.g., "client.request.rtt.avg10s".
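
The naming rule for averaged Gauges can be sketched as below. This is only an illustration of the convention described above: the function names `averaged_gauge_name` and `window_average` are hypothetical, and returning 0.0 for an empty window is an assumption rather than something this page specifies.

```python
from typing import Sequence


def averaged_gauge_name(histogram_name: str, period_seconds: int) -> str:
    """Derive the Gauge name used when an average stands in for a Histogram."""
    if period_seconds <= 0:
        raise ValueError("averaging period must be positive")
    return f"{histogram_name}.avg{period_seconds}s"


def window_average(samples: Sequence[float]) -> float:
    """Average of the values observed during one averaging period.

    Assumption: an empty window reports 0.0; the page does not say
    what a client should report when nothing was observed.
    """
    if not samples:
        return 0.0
    return sum(samples) / len(samples)
```

For example, `averaged_gauge_name("client.request.rtt", 10)` yields the "client.request.rtt.avg10s" name used in the text.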

Client instance-level metrics

Metric name	Type	Labels	Description
client.connection.creations	Sum	FIXME: with broker_id label?	Total number of broker connections made.
client.connection.count	Gauge		Current number of broker connections.
client.connection.errors	Sum	reason	Total number of broker connection failures. Label ‘reason’ indicates the reason: disconnect - remote peer closed the connection, auth - authentication failure, TLS - TLS failure, timeout - client request timeout, close - client closed the connection.
client.request.rtt	Histogram	broker_id	Average request latency / round-trip-time to broker and back.
client.request.queue.latency	Histogram	broker_id	Average request queue latency waiting for request to be sent to broker.
client.request.queue.count	Gauge	broker_id	Number of requests in queue waiting to be sent to broker.
client.request.success	Sum	broker_id	Number of successful requests to broker, that is, where a response is received without a request-level error (but there may be per-sub-resource errors, e.g., errors for certain partitions within an OffsetCommitResponse).
client.request.errors	Sum	broker_id reason	Number of failed requests. Label ‘reason’ indicates the reason: timeout - client timed out the request, disconnect - broker connection was closed before response could be received, error - request-level protocol error.
client.io.wait.time	Histogram		Amount of time waiting for socket I/O writability (POLLOUT). A high number indicates socket send buffer congestion.
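
One way a client might keep the ‘reason’ label values of client.connection.errors and client.request.errors within the sets listed in the table is sketched below. The `LabeledSum` class is hypothetical and is not an API of any Kafka client or metrics library; it only illustrates a monotonic Sum keyed by a single label.

```python
from collections import Counter

# Reason values taken from the table above.
CONNECTION_ERROR_REASONS = {"disconnect", "auth", "TLS", "timeout", "close"}
REQUEST_ERROR_REASONS = {"timeout", "disconnect", "error"}


class LabeledSum:
    """Hypothetical monotonic counter keyed by a single 'reason' label."""

    def __init__(self, name, allowed_reasons):
        self.name = name
        self._allowed = frozenset(allowed_reasons)
        self._counts = Counter()

    def add(self, reason, amount=1):
        # Reject label values that the table does not define.
        if reason not in self._allowed:
            raise ValueError(f"unknown reason {reason!r} for {self.name}")
        # A Sum is monotonic, so it can never be decremented.
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self._counts[reason] += amount

    def value(self, reason):
        return self._counts[reason]
```

Rejecting unknown reasons keeps label cardinality bounded, which matters if the labels are later exported to a time-series backend.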

As the client will not know the broker id of its bootstrap servers, the broker_id label should be set to “bootstrap” for connections to them. FIXME: Should we have a broker_address (“host:port”) for this purpose?
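
The bootstrap rule can be sketched as a small label-building helper. The name `broker_labels` and the use of `None` to mean "broker id not yet known" are assumptions made for this illustration.

```python
from typing import Dict, Optional

BOOTSTRAP_BROKER_ID = "bootstrap"


def broker_labels(broker_id: Optional[int]) -> Dict[str, str]:
    """Build the label set for a per-broker metric.

    Assumption: broker_id is None for a connection to a bootstrap
    server whose real broker id the client has not learned yet.
    """
    if broker_id is None:
        return {"broker_id": BOOTSTRAP_BROKER_ID}
    return {"broker_id": str(broker_id)}
```

With this sketch, all bootstrap connections share one label value, so their metrics are aggregated under “bootstrap” until the real broker ids are known.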

...

Metric name	Type	Labels	Description
client.consumer.poll.interval	Histogram		The interval at which the application calls poll(), in seconds.
client.consumer.poll.last	Gauge		The number of seconds since the last poll() invocation.
client.consumer.poll.latency	Histogram		The time it takes poll() to return a new message to the application.
client.consumer.commit.count	Sum		Number of commit requests sent.
client.consumer.group.assignment.strategy	String		Current group assignment strategy in use.
client.consumer.group.assignment.partition.count	Gauge		Number of partitions currently assigned to this consumer by the group leader.
client.consumer.assignment.partition.count	Gauge		Number of partitions currently assigned to this consumer, either through the group protocol or through assign().
client.consumer.group.rebalance.count	Sum		Number of group rebalances.
client.consumer.group.error.count	Sum	error	Consumer group error counts. The error label depicts the actual error, e.g., "MaxPollExceeded", "HeartbeatTimeout", etc.
client.consumer.record.queue.count	Gauge		Number of records in consumer pre-fetch queue.
client.consumer.record.queue.bytes	Gauge		Amount of record memory in consumer pre-fetch queue. This may also include per-record overhead.
client.consumer.record.application.count	Sum		Number of records consumed by application.
client.consumer.record.application.bytes	Sum		Memory of records consumed by application.
client.consumer.fetch.latency	Histogram		FetchRequest latency.
client.consumer.fetch.count	Sum		Total number of FetchRequests sent.
client.consumer.fetch.failures	Sum		Total number of FetchRequest failures.
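
As an illustration of how client.consumer.poll.interval and client.consumer.poll.last relate, the sketch below derives both from the timestamps of poll() calls. The `PollTracker` class and its injectable clock are hypothetical and not part of any Kafka client; a real client would feed the intervals into its Histogram implementation instead of a list.

```python
import time
from typing import Callable, List, Optional


class PollTracker:
    """Hypothetical helper deriving the two poll metrics from poll() timestamps."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._last_poll: Optional[float] = None
        # Observations for client.consumer.poll.interval, in seconds.
        self.intervals: List[float] = []

    def on_poll(self) -> None:
        """Call at the start of every poll() invocation."""
        now = self._clock()
        if self._last_poll is not None:
            self.intervals.append(now - self._last_poll)
        self._last_poll = now

    def seconds_since_last_poll(self) -> Optional[float]:
        """Value for client.consumer.poll.last; None before the first poll()."""
        if self._last_poll is None:
            return None
        return self._clock() - self._last_poll
```

A monotonic clock is used by default so that wall-clock adjustments cannot produce negative intervals.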

...

Metric name	Type	Labels	Description
client.producer.partition.queue.bytes	Gauge	topic partition acks=all|none|leader	Number of bytes queued on partition queue.
client.producer.partition.queue.count	Gauge	topic partition acks=all|none|leader	Number of records queued on partition queue.
client.producer.partition.latency	Histogram	topic partition acks=all|none|leader	Total produce record latency, from application calling send()/produce() to ack received from broker.
client.producer.partition.queue.latency	Histogram	topic partition acks=all|none|leader	Time between send()/produce() and record being sent to broker.
client.producer.partition.record.retries	Sum	topic partition acks=all|none|leader	Number of ProduceRequest retries.
client.producer.partition.record.failures	Sum	topic partition acks=all|none|leader reason	Number of records that permanently failed delivery. Reason is a short string representation of the reason, which is typically the name of a Kafka protocol error code, e.g., “RequestTimedOut”.
client.producer.partition.record.success	Sum	topic partition acks=all|none|leader	Number of records that have been successfully produced.
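
The label set and the two latency metrics in the producer table can be sketched as follows. The helper names `partition_labels` and `record_latencies` are hypothetical, and the timestamps are assumed to be in seconds from a single monotonic clock; the page itself does not fix a unit.

```python
from typing import Dict

VALID_ACKS = ("all", "none", "leader")


def partition_labels(topic: str, partition: int, acks: str) -> Dict[str, str]:
    """Build the topic/partition/acks labels shared by the producer metrics."""
    if acks not in VALID_ACKS:
        raise ValueError(f"acks must be one of {VALID_ACKS}, got {acks!r}")
    if partition < 0:
        raise ValueError("partition must be non-negative")
    return {"topic": topic, "partition": str(partition), "acks": acks}


def record_latencies(enqueued_at: float, sent_at: float, acked_at: float) -> Dict[str, float]:
    """Split one record's lifetime into the two latency observations.

    enqueued_at: when the application called send()/produce()
    sent_at:     when the record was sent to the broker
    acked_at:    when the ack was received from the broker
    """
    return {
        "client.producer.partition.queue.latency": sent_at - enqueued_at,
        "client.producer.partition.latency": acked_at - enqueued_at,
    }
```

Note that the total latency includes the queue latency, so the second value is never smaller than the first for a given record.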

...
