Versions Compared

Comment: Update reactive workflow to use client_source_address

...

  • metrics  - a comma-separated list of metric name prefixes, e.g., "client.producer.partition., client.io.wait". Whitespace is ignored.
  • interval  - metrics push interval in milliseconds. Defaults to 5 minutes if not specified.
  • match_<selector>  - client matching selector that is evaluated as an anchored regexp (i.e., "something.*" is treated as "^something.*$"). Any client that matches all of the match_<selector> selectors will be eligible for this metrics subscription. Initially supported selectors:
    • client_instance_id - CLIENT_INSTANCE_ID UUID string representation.
    • client_software_name  - client software implementation name.
    • client_software_version  - client software implementation version.
    • client_source_address  - client connection's source address from the broker's point of view.
    • client_source_port  - client connection's source port from the broker's point of view.

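The matching rules above can be sketched in a few lines. This is only an illustration, not the broker's implementation: the function names and the dictionary representation of a client are assumptions made for the example, and "whitespace is ignored" is interpreted here as stripping whitespace around each prefix.

```python
import re


def parse_metric_prefixes(metrics):
    """Split the comma-separated `metrics` value into a list of prefixes.

    Assumption: whitespace around each prefix is stripped.
    """
    return [p.strip() for p in metrics.split(",") if p.strip()]


def metric_subscribed(prefixes, metric_name):
    """A metric is subscribed if its name starts with any configured prefix."""
    return any(metric_name.startswith(p) for p in prefixes)


def selector_matches(pattern, value):
    """Evaluate a selector as an anchored regexp.

    re.fullmatch requires the whole value to match, which is roughly
    equivalent to wrapping the pattern as "^pattern$".
    """
    return re.fullmatch(pattern, value) is not None


def client_matches(selectors, client):
    """A client is eligible only if it matches ALL selectors.

    `selectors` maps selector names (without the match_ prefix) to regexps;
    `client` maps the same names to the client's actual values.
    """
    return all(
        selector_matches(pattern, client.get(name, ""))
        for name, pattern in selectors.items()
    )


# Hypothetical subscription and client, for illustration only.
prefixes = parse_metric_prefixes("client.producer.partition., client.io.wait")
selectors = {
    "client_software_name": "apache-kafka-java",
    "client_source_address": r"10\.0\.0\..*",
}
client = {
    "client_software_name": "apache-kafka-java",
    "client_source_address": "10.0.0.17",
}
eligible = client_matches(selectors, client)
```

Because the patterns are anchored, a selector such as "something" does not match "something-else"; a trailing ".*" is needed for prefix-style matching.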

Example using the standard kafka-configs.sh tool:

...

Broker-added labels

The following labels should be added by the broker plugin as metrics are received:

Label name

Description

client_instance_id

The generated CLIENT_INSTANCE_ID.

client_id

client.id as reported in the Kafka protocol header.

client_software_name

The client’s implementation name as reported in ApiVersionRequest.

client_software_version

The client’s version as reported in ApiVersionRequest.

client_source_address

The client connection’s source address.

client_source_port

The client connection’s source port.

principal

Client’s security principal. Content depends on authentication method.

broker_id

Receiving broker’s node-id.
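As a rough sketch of how a plugin might attach these labels, the example below merges them into each received metric. The `ConnectionContext` class, the dictionary shape of a metric, and the choice to let broker-added labels override client-supplied labels of the same name are all assumptions made for illustration; they are not defined by this proposal.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConnectionContext:
    """Hypothetical view of what the broker knows about the pushing client."""
    client_instance_id: str
    client_id: str
    client_software_name: str
    client_software_version: str
    client_source_address: str
    client_source_port: int
    principal: str
    broker_id: int


def broker_labels(ctx):
    """Render the broker-known fields as string labels, keyed by label name."""
    return {name: str(value) for name, value in asdict(ctx).items()}


def label_metric(metric, ctx):
    """Return a copy of `metric` with the broker-added labels merged in.

    Assumption: on a name clash the broker's value wins, since the broker
    observes these values itself rather than trusting the client.
    """
    merged = {**metric.get("labels", {}), **broker_labels(ctx)}
    return {**metric, "labels": merged}


# Hypothetical connection and metric, for illustration only.
ctx = ConnectionContext(
    client_instance_id="b69cc35a-7a54-4790-aa69-cc2bd4ee4538",
    client_id="billing-consumer",
    client_software_name="apache-kafka-java",
    client_software_version="3.7.0",
    client_source_address="10.0.0.17",
    client_source_port=50412,
    principal="User:billing",
    broker_id=3,
)
labeled = label_metric(
    {"name": "client.io.wait", "value": 12, "labels": {"client_id": "spoofed"}},
    ctx,
)
```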

...

Before sending the alert to the incident management system the monitoring system collects a set of labels that are associated with this CLIENT_INSTANCE_ID, such as:

  • client.id
  • client_source_address and client_source_port on broker id X (1 or more such mappings based on how many connections the client has used to push metrics).
  • principal
  • tenant
  • client_software_name and client_software_version
  • In case of consumer: group_id, group_instance_id (if configured) and the latest known group_member_id.
  • In case of transactional producer: transactional_id
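The label collection described above could look roughly like the following. This is a sketch under assumptions: each record is a plain dictionary of labels seen on one metrics push, records are processed in arrival order so that the latest value wins (e.g., for group_member_id), and the function name and output shape are invented for the example.

```python
from collections import defaultdict

# Labels that describe one connection rather than the client as a whole.
CONNECTION_LABELS = ("broker_id", "client_source_address", "client_source_port")


def collect_labels(records, client_instance_id):
    """Summarize the labels seen for one CLIENT_INSTANCE_ID.

    Returns the latest known value of each client-level label, plus a
    `connections` map of broker id -> sorted (address, port) pairs, since a
    client may have pushed metrics over connections to several brokers.
    """
    summary = {}
    connections = defaultdict(set)
    for labels in records:
        if labels.get("client_instance_id") != client_instance_id:
            continue
        for key, value in labels.items():
            if key not in CONNECTION_LABELS:
                summary[key] = value  # later records overwrite earlier ones
        connections[labels["broker_id"]].add(
            (labels["client_source_address"], labels["client_source_port"])
        )
    summary["connections"] = {b: sorted(c) for b, c in connections.items()}
    return summary


# Hypothetical records, for illustration only.
records = [
    {"client_instance_id": "a", "client_id": "app", "principal": "User:alice",
     "group_member_id": "m-1", "broker_id": "1",
     "client_source_address": "10.0.0.5", "client_source_port": "50000"},
    {"client_instance_id": "a", "client_id": "app", "principal": "User:alice",
     "group_member_id": "m-2", "broker_id": "2",
     "client_source_address": "10.0.0.5", "client_source_port": "50001"},
    {"client_instance_id": "b", "client_id": "other", "principal": "User:bob",
     "broker_id": "1",
     "client_source_address": "10.0.0.9", "client_source_port": "40000"},
]
summary = collect_labels(records, "a")
```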

...

The Kafka cluster configuration for metrics collection (i.e., metrics subscriptions) is irrelevant to this use-case, given that a proper metrics plugin is enabled on the brokers. The metrics plugin is configured to write metrics to a topic. A support system with an interactive interface reads from this metrics topic and has an Admin client to configure the cluster with the desired metrics subscriptions.

The application owner reports a lagging consumer that is not able to keep up with the incoming message rate and asks for the Kafka operator to help troubleshoot. The application owner, who unfortunately does not know the client instance id of the consumer, provides the client.id, userid, and source address.

The Kafka operator adds a metrics subscription for metrics matching prefix “org.apache.kafka.client.consumer.” and with the corresponding client_id and source_address as matching selectors. Since this is a live troubleshooting case, the metrics push interval is set to a low 10 seconds.

...

Upon the next PushTelemetryRequest, which now includes metrics for the subscribed metrics, the metrics are written to the output topic and the PushIntervalMs is adjusted to the configured interval of 10 seconds. This repeats until the metrics subscription configuration is changed.

Multiple consumers from the same source address may now be pushing metrics to the cluster. The support system starts receiving the metrics and soon finds a metric push from the desired client.id, which now provides a mapping from client.id to client_instance_id. At this point the metrics subscription may be altered to only match the client_instance_id of the matching client. But in either case the metrics matching the given client.id are displayed to the operator.
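The client.id to client_instance_id mapping and the optional narrowing of the subscription can be sketched as follows. The record shape, the function names, and the dictionary form of a subscription (using the metrics, interval, and match_<selector> parameters) are assumptions for illustration; how the altered subscription is actually submitted through the Admin client is not shown.

```python
def find_instance_ids(records, client_id):
    """Return every client_instance_id seen pushing metrics with `client_id`.

    A set is returned because several client instances may share a client.id.
    """
    return {
        labels["client_instance_id"]
        for labels in records
        if labels.get("client_id") == client_id
    }


def narrow_subscription(subscription, client_instance_id):
    """Replace all match_ selectors with a single client_instance_id selector.

    The UUID is used as the pattern unchanged: it contains only hex digits
    and hyphens, which have no special meaning in an anchored regexp.
    """
    narrowed = {
        key: value
        for key, value in subscription.items()
        if not key.startswith("match_")
    }
    narrowed["match_client_instance_id"] = client_instance_id
    return narrowed


# Hypothetical subscription and label records, for illustration only.
subscription = {
    "metrics": "org.apache.kafka.client.consumer.",
    "interval": 10000,  # milliseconds, i.e. the 10 second push interval
    "match_client_id": "billing-consumer",
    "match_client_source_address": r"10\.0\.0\.17",
}
records = [
    {"client_id": "reporting-consumer",
     "client_instance_id": "11111111-2222-3333-4444-555555555555"},
    {"client_id": "billing-consumer",
     "client_instance_id": "b69cc35a-7a54-4790-aa69-cc2bd4ee4538"},
]
instance_ids = find_instance_ids(records, "billing-consumer")
if len(instance_ids) == 1:
    narrowed = narrow_subscription(subscription, next(iter(instance_ids)))
```

Narrowing only when exactly one instance is found is a deliberate choice in this sketch: if several instances share the client.id, the broader selectors are kept so that none of them is dropped from view.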

The operator identifies an increasing trend in client.consumer.processing.time which indicates slow per-message processing in the application and reports this back to the application owner, ruling out the client and Kafka cluster from the problem space.

...