...

Current state: Under discussion

Discussion thread: here and now here

JIRA: here TBD

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The following examples illustrate the derivation of the telemetry metric names from Kafka metric names:

| Kafka metric name | Telemetry metric name |
|---|---|
| "connection-creation-rate", group="producer-metrics" | "messaging.kafka.producer.connection.creation.rate" |
| "rebalance-latency-max", group="consumer-coordinator-metrics" | "messaging.kafka.consumer.coordinator.rebalance.latency.max" |

Other vendor or implementation-specific metrics can be added according to the following examples, using "contrib"  followed by an implementation-specific name as the namespace:

| Implementation-specific metric name | Telemetry metric name |
|---|---|
| Client "io.confluent.librdkafka", metric name "client.produce.xmitq.latency" | "messaging.kafka.contrib.io.confluent.librdkafka.client.produce.xmitq.latency" |
| Python client "com.example.client.python", metric name "object.count" | "messaging.kafka.contrib.com.example.client.python.object.count" |

Metrics may also hold any number of attributes which provide the multi-dimensionality of metrics. These are similarly derived from the tags of the Kafka metrics, and thus the properties of the equivalent JMX MBeans, replacing '-' with '_'. For example:

| Kafka metric name | Telemetry metric name |
|---|---|
| "request-latency-avg", group="producer-node-metrics", client-id={client-id}, node-id={node-id} | "messaging.kafka.producer.node.request.latency.avg", with attribute keys "client_id" and "node_id" |
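
The naming examples above can be sketched in a few lines. This is only an illustration inferred from those examples, not a normative algorithm: the helper names (telemetry_name, contrib_name, attribute_keys) are hypothetical, and the sketch assumes that the group's "-metrics" suffix is dropped and that every '-' in the group and name becomes '.'.

```python
from typing import Dict

PREFIX = "messaging.kafka."
GROUP_SUFFIX = "-metrics"


def telemetry_name(kafka_name: str, group: str) -> str:
    """Derive a standard telemetry metric name from a Kafka metric name and group."""
    # Assumption: groups end in "-metrics", as in every example in the text.
    if group.endswith(GROUP_SUFFIX):
        group = group[: -len(GROUP_SUFFIX)]
    return PREFIX + (group + "." + kafka_name).replace("-", ".")


def contrib_name(implementation: str, metric: str) -> str:
    """Build an implementation-specific name under the "contrib" namespace."""
    return PREFIX + "contrib." + implementation + "." + metric


def attribute_keys(tags: Dict[str, str]) -> Dict[str, str]:
    """Convert Kafka metric tags to attributes by replacing '-' with '_' in the keys."""
    return {key.replace("-", "_"): value for key, value in tags.items()}


print(telemetry_name("request-latency-avg", "producer-node-metrics"))
print(contrib_name("com.example.client.python", "object.count"))
print(attribute_keys({"client-id": "app-1", "node-id": "3"}))
```

Only the attribute keys are rewritten; the tag values are passed through unchanged.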

Sparse metrics

To keep metrics volume down, it is recommended that a client only sends metrics with a recorded value.
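
As a rough sketch of this recommendation, a client could drop metrics without a recorded value before building a push. The function name and the use of None to mark "no value recorded" are assumptions made for this illustration only.

```python
from typing import Dict, Optional


def sparse(metrics: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Keep only the metrics that have a recorded value."""
    # Assumption: None means nothing was recorded; a recorded 0.0 is still sent.
    return {name: value for name, value in metrics.items() if value is not None}


collected = {
    "messaging.kafka.producer.connection.creation.total": 4.0,
    "messaging.kafka.producer.record.queue.time.max": None,
}
print(sparse(collected))
```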

...

All standard telemetry metric names begin with the prefix "messaging.kafka.". This is omitted from the table for brevity. The required metrics are bold.

| Telemetry metric name | Type | Labels | Description | Existing Kafka metric name |
|---|---|---|---|---|
| producer.connection.creation.rate | Gauge | | The rate of connections established per second. | "connection-creation-rate", group="producer-metrics" |
| producer.connection.creation.total | Sum | | The total number of connections established. | "connection-creation-total", group="producer-metrics" |
| producer.node.request.latency.avg | Gauge | node_id | The average request latency in ms for a node. | "request-latency-avg", group="producer-node-metrics" |
| producer.node.request.latency.max | Gauge | node_id | The maximum request latency in ms for a node. | "request-latency-max", group="producer-node-metrics" |
| producer.produce.throttle.time.avg | Gauge | | The average time in ms a request was throttled by the broker. | "produce-throttle-time-avg", group="producer-metrics" |
| producer.produce.throttle.time.max | Gauge | | The maximum time in ms a request was throttled by the broker. | "produce-throttle-time-max", group="producer-metrics" |
| producer.record.queue.time.avg | Gauge | | The average time in ms record batches spent in the send buffer. | "record-queue-time-avg", group="producer-metrics" |
| producer.record.queue.time.max | Gauge | | The maximum time in ms record batches spent in the send buffer. | "record-queue-time-max", group="producer-metrics" |

Standard consumer metrics

All standard telemetry metric names begin with the prefix "messaging.kafka.". This is omitted from the table for brevity. The required metrics are bold.

| Telemetry metric name | Type | Labels | Description | Existing Kafka metric name |
|---|---|---|---|---|
| consumer.connection.creation.rate | Gauge | | The rate of connections established per second. | "connection-creation-rate", group="consumer-metrics" |
| consumer.connection.creation.total | Sum | | The total number of connections established. | "connection-creation-total", group="consumer-metrics" |
| consumer.node.request.latency.avg | Gauge | node_id | The average request latency in ms for a node. | "request-latency-avg", group="consumer-node-metrics" |
| consumer.node.request.latency.max | Gauge | node_id | The maximum request latency in ms for a node. | "request-latency-max", group="consumer-node-metrics" |
| consumer.poll.idle.ratio.avg | Gauge | | The average fraction of time the consumer’s poll() is idle as opposed to waiting for the user code to process records. | "poll-idle-ratio-avg", group="consumer-metrics" |
| consumer.coordinator.commit.latency.avg | Gauge | | The average time taken for a commit request. | "commit-latency-avg", group="consumer-coordinator-metrics" |
| consumer.coordinator.commit.latency.max | Gauge | | The maximum time taken for a commit request. | "commit-latency-max", group="consumer-coordinator-metrics" |
| consumer.coordinator.assigned.partitions | Gauge | | The number of partitions currently assigned to this consumer. | "assigned-partitions", group="consumer-coordinator-metrics" |
| consumer.coordinator.rebalance.latency.avg | Gauge | | The average time taken for a group rebalance. | "rebalance-latency-avg", group="consumer-coordinator-metrics" |
| consumer.coordinator.rebalance.latency.max | Gauge | | The maximum time taken for a group rebalance. | "rebalance-latency-max", group="consumer-coordinator-metrics" |
| consumer.coordinator.rebalance.latency.total | Sum | | The total time taken for group rebalances. | "rebalance-latency-total", group="consumer-coordinator-metrics" |
| consumer.fetch.manager.fetch.latency.avg | Gauge | | The average time taken for a fetch request. | "fetch-latency-avg", group="consumer-fetch-manager-metrics" |
| consumer.fetch.manager.fetch.latency.max | Gauge | | The maximum time taken for a fetch request. | "fetch-latency-max", group="consumer-fetch-manager-metrics" |

Standard client resource labels

The following labels should be added by the client as appropriate before metrics are pushed.

| Label name | Description |
|---|---|
| application_id | application.id (Kafka Streams only) |
| client_rack | client.rack (if configured) |
| group_id | group.id (consumer) |
| group_instance_id | group.instance.id (consumer) |
| group_member_id | Group member id (if any, consumer) |
| transactional_id | transactional.id (producer) |

Broker-added labels

The following labels should be added by the broker plugin as metrics are received.

| Label name | Description |
|---|---|
| client_instance_id | The generated CLIENT_INSTANCE_ID. |
| client_id | client.id as reported in the Kafka protocol header. |
| client_software_name | The client’s implementation name as reported in ApiVersionRequest. |
| client_software_version | The client’s version as reported in ApiVersionRequest. |
| client_source_address | The client connection’s source address. |
| client_source_port | The client connection’s source port. |
| principal | Client’s security principal. Content depends on authentication method. |
| broker_id | Receiving broker’s node-id. |

Client behavior

A client that supports this metric interface and identifies a supporting broker (by detecting at least GetTelemetrySubscriptionsRequestV0 in the ApiVersionResponse) starts by sending a GetTelemetrySubscriptionsRequest with the ClientInstanceId field set to Null to one randomly selected connected broker. The response provides its client instance id, the subscribed metrics, the push interval, accepted compression types, etc. This handshake with a Null ClientInstanceId is performed only once in a client instance's lifetime. Subsequent GetTelemetrySubscriptionsRequests must include the ClientInstanceId returned in the first response, regardless of broker.
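
The handshake rule can be illustrated with a small state holder. This is a minimal sketch, not a client implementation: the class and method names are hypothetical, requests and responses are modelled as plain dictionaries, and only the ClientInstanceId field is represented.

```python
from typing import Optional


class TelemetryHandshake:
    """Tracks the client instance id across GetTelemetrySubscriptions calls."""

    def __init__(self) -> None:
        # None stands in for the Null ClientInstanceId of the first request.
        self.client_instance_id: Optional[str] = None

    def next_request(self) -> dict:
        """Build the fields of the next GetTelemetrySubscriptionsRequest."""
        return {"ClientInstanceId": self.client_instance_id}

    def on_response(self, response: dict) -> None:
        """Remember the id from the first response and reuse it afterwards."""
        if self.client_instance_id is None:
            self.client_instance_id = response["ClientInstanceId"]


handshake = TelemetryHandshake()
first = handshake.next_request()  # ClientInstanceId is None (Null)
handshake.on_response({"ClientInstanceId": "example-instance-id"})
later = handshake.next_request()  # carries the id, whichever broker is used
```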

...

The client takes the following actions if the GetTelemetrySubscriptionsResponse.ErrorCode or PushTelemetryResponse.ErrorCode is set to a non-zero value:

| Error code | Reason | Client action |
|---|---|---|
| InvalidRecord (87) | Broker failed to decode or validate the client’s encoded metrics. | Log a warning to the application and schedule the next GetTelemetrySubscriptionsRequest in 5 minutes. |
| UnknownSubscriptionId (NEW) | Client sent a PushTelemetryRequest with an invalid or outdated SubscriptionId; the configured subscriptions have changed. | Send a GetTelemetrySubscriptionsRequest to update the client's subscriptions. |
| UnsupportedCompressionType (76) | Client’s compression type is not supported by the broker. | Send a GetTelemetrySubscriptionsRequest to get an up-to-date list of the broker's supported compression types (and any subscription changes). |

The 5-minute and 30-minute retry intervals ensure that the client eventually retries, which avoids having to restart clients if the cluster metrics configuration is disabled temporarily, e.g., by operator error or during rolling upgrades.
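
One way to picture the error handling listed above is a function that maps an error name to the delay before the next GetTelemetrySubscriptionsRequest. This is a sketch under assumptions: the function name is hypothetical, errors are identified by name, a delay of 0 is used where the text only says "send" without giving a delay, and error codes not listed above return None.

```python
import logging
from typing import Optional

log = logging.getLogger("client.telemetry")

FIVE_MINUTES_S = 5 * 60


def resubscribe_delay_s(error_name: str) -> Optional[int]:
    """Seconds to wait before the next GetTelemetrySubscriptionsRequest."""
    if error_name == "InvalidRecord":
        log.warning("Broker failed to decode or validate the pushed metrics")
        return FIVE_MINUTES_S
    if error_name in ("UnknownSubscriptionId", "UnsupportedCompressionType"):
        # Assumption: refresh the subscription without additional delay.
        return 0
    # Error codes not covered by this sketch.
    return None


print(resubscribe_delay_s("InvalidRecord"))
```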

...

This applies to producers, consumers, admin clients, and embedded uses of these clients in frameworks such as Kafka Connect.

| Configuration | Description | Values |
|---|---|---|
| enable.metrics.push | Whether to enable pushing of client metrics to the cluster, if the cluster has a client metrics subscription which matches this client. | true (default): the client will push metrics if there are any matching subscriptions. false: the client will not push metrics. |

Client metrics configuration

These are the configurations for client metrics resources. A client metrics subscription is defined by the configurations for a resource of type CLIENT_METRICS.

| Configuration | Description | Values |
|---|---|---|
| metrics | A list of telemetry metric name prefixes which specify the metrics of interest. | An empty list means no metrics are subscribed. A list containing just an empty string means all metrics are subscribed. Otherwise, the list entries are prefix-matched against the metric names. |
| interval.ms | The client metrics push interval in milliseconds. | Default: 300000 (5 minutes) |
| match | The match criteria for selecting which clients the subscription matches. If a client matches all of these criteria, the client matches the subscription. | A list of key-value pairs. |

The valid keys for match are:

  • client_instance_id - CLIENT_INSTANCE_ID UUID string representation.
  • client_id  - client's reported client.id in the GetTelemetrySubscriptionsRequest.
  • client_software_name  - client software implementation name.
  • client_software_version  - client software implementation version.
  • client_source_address  - client connection's source address from the broker's point of view.
  • client_source_port  - client connection's source port from the broker's point of view.

The values are anchored regular expressions.
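
The matching rules for the metrics and match configurations might be evaluated as follows. This is a sketch rather than the broker's actual logic: the function names are hypothetical, "anchored" is interpreted here as requiring the whole value to match (re.fullmatch), and a client attribute that is missing is treated as a non-match.

```python
import re
from typing import Dict, List


def client_matches(match: Dict[str, str], client: Dict[str, str]) -> bool:
    """True if the client satisfies every key/regex pair of the match criteria."""
    return all(
        key in client and re.fullmatch(pattern, client[key]) is not None
        for key, pattern in match.items()
    )


def metric_subscribed(metric_name: str, prefixes: List[str]) -> bool:
    """Prefix-match a telemetry metric name against the metrics list.

    An empty list subscribes to nothing; a list holding just "" subscribes
    to everything, because every name starts with the empty string.
    """
    return any(metric_name.startswith(prefix) for prefix in prefixes)


client = {"client_id": "orders-app-1", "client_software_name": "example-client"}
name = "messaging.kafka.producer.record.queue.time.max"

print(client_matches({"client_id": "orders-app-.*"}, client))
print(metric_subscribed(name, ["messaging.kafka.producer."]))
```

Because the expression is anchored, a pattern such as "orders" alone does not select the client id "orders-app-1" in this sketch.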

New error codes

UnknownSubscriptionId  - Client sent a PushTelemetryRequest with an invalid or outdated SubscriptionId. The configured subscriptions have changed.

...