...

Kafka Streams exposes metrics on various levels. The number of metrics grows with the number of stream threads, the number of tasks (i.e., the number of subtopologies and the number of partitions), the number of processors, the number of state stores, and the number of buffers in a Kafka Streams application. Hence, the number of metrics might grow substantially for a Kafka Streams client if it executes many tasks and/or many processors, or if it has many state stores and/or many buffers.

Some users monitor their Kafka Streams applications using commercial monitoring services. Those services often limit the number of metrics that can be reported to them, and some providers truncate the metrics when the limit is exceeded. This means that some metrics are not sent to the monitoring service, which might lead to false alerts. For example, the Kafka Streams metric alive-stream-threads records the number of alive stream threads. Users might configure their monitoring service to alert them on this metric when a stream thread dies. If the metric alive-stream-threads is removed from the reported metrics because the monitoring service's limit on the number of reported metrics is exceeded, users will get an alert although no stream thread actually died.

In this KIP, we propose to add an API to the Kafka Streams client that adds a metric that records the aggregation of other metrics. The metric that records the aggregation can then be reported to the monitoring service instead of the multiple metrics that would be aggregated in the monitoring service anyway. In this way, users can avoid exceeding the monitoring service's limit on the number of reported metrics and the associated possible false alerts.

...

Method KafkaStreams#addMetricsAggregation() will create one or more metrics on client-level that record the aggregation of the metrics specified by the arguments groupOfMetricsToAggregate and nameOfMetricsToAggregate. Before the specified metrics are aggregated, they will be grouped by the tag labels provided in argument tagLabels. For example, if users want to aggregate the state-store-level metric size-all-mem-tables (a RocksDB-specific metric) grouped by stream threads, they will provide the name size-all-mem-tables as argument nameOfMetricsToAggregate, the type stream-state-metrics as argument groupOfMetricsToAggregate, and the list of tag labels [thread-id] as argument tagLabels. If they additionally want to group the metrics by task, they will provide [thread-id, task-id] as argument tagLabels. If users want to group by task only, they will provide [task-id] as argument tagLabels.

Assuming argument tagLabels has n elements, the metrics that record the aggregation of the specified metrics are added with the following configuration:

...

type: stream-metrics
client-id: [client-id]
[thread-id]: [thread-id of metrics size-all-mem-tables that are aggregated in this metric]
[task-id]: [task-id of metrics size-all-mem-tables that are aggregated in this metric]
name: [name provided as argument name]

For each combination of tag values of the different tag labels for which a metric to aggregate exists, one metric will be added. Consider the previous example and assume there exist stream-thread-1 and stream-thread-2. Stream-thread-1 has tasks 0_1, 1_0, 1_1, and 1_2, and stream-thread-2 has tasks 0_0 and 0_2. Furthermore, assume that only tasks 0_0, 0_1, and 0_2 contain the metric (e.g., because only they have a RocksDB state store). Then three metrics that record aggregations are added:

  • stream-thread-1 and task 0_1
  • stream-thread-2 and task 0_0
  • stream-thread-2 and task 0_2
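The grouping step in this example can be illustrated with a small stand-alone sketch. It does not use the proposed API or any Kafka classes: TaggedMetric, groupByTagLabels, and exampleMetrics are hypothetical names introduced only for illustration, and the metric values are made up. Under those assumptions, grouping by [thread-id, task-id] produces one group per combination of tag values for which a metric exists, and each group would correspond to one added aggregation metric.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TagGroupingSketch {

    // Hypothetical stand-in for a metric to aggregate: just its tags and a value.
    static final class TaggedMetric {
        final Map<String, String> tags;
        final long value;

        TaggedMetric(Map<String, String> tags, long value) {
            this.tags = tags;
            this.value = value;
        }
    }

    // Groups metrics by the values of the given tag labels, in label order.
    // Each key of the returned map is one combination of tag values.
    static Map<List<String>, List<TaggedMetric>> groupByTagLabels(
            List<TaggedMetric> metrics, List<String> tagLabels) {
        Map<List<String>, List<TaggedMetric>> groups = new LinkedHashMap<>();
        for (TaggedMetric metric : metrics) {
            List<String> tagValues = new ArrayList<>();
            for (String label : tagLabels) {
                tagValues.add(metric.tags.get(label));
            }
            groups.computeIfAbsent(tagValues, k -> new ArrayList<>()).add(metric);
        }
        return groups;
    }

    // The scenario from the text: only tasks 0_0, 0_1 and 0_2 expose the metric.
    // The values are invented for the example.
    static List<TaggedMetric> exampleMetrics() {
        return List.of(
            new TaggedMetric(Map.of("thread-id", "stream-thread-1", "task-id", "0_1"), 1024L),
            new TaggedMetric(Map.of("thread-id", "stream-thread-2", "task-id", "0_0"), 2048L),
            new TaggedMetric(Map.of("thread-id", "stream-thread-2", "task-id", "0_2"), 4096L));
    }

    public static void main(String[] args) {
        // Prints one line per group, i.e. per aggregation metric that would be added.
        groupByTagLabels(exampleMetrics(), List.of("thread-id", "task-id"))
            .forEach((tagValues, group) ->
                System.out.println(tagValues + " -> " + group.size() + " metric(s)"));
    }
}
```

Passing [thread-id] instead would collapse the two tasks of stream-thread-2 into a single group, which matches the grouping by stream threads described earlier.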

Users can specify the recording level for the aggregations. The user-specified recording level will not change the recording level of the metrics to aggregate. If the recording level for the application is set to INFO, a DEBUG-level metric that should be aggregated will not record values, even if the metric that records the aggregation is on recording level INFO.
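The interaction between the two recording levels can be sketched as follows. This is only an illustration under stated assumptions: the Level enum and the records method are hypothetical and are not the Kafka classes, and the sketch assumes that a metric records values only if its own level is not more verbose than the application's configured level.

```java
class RecordingLevelSketch {

    // Hypothetical recording levels, ordered from least to most verbose.
    enum Level { INFO, DEBUG }

    // A metric records values only if its level is enabled by the application level.
    static boolean records(Level metricLevel, Level applicationLevel) {
        return metricLevel.ordinal() <= applicationLevel.ordinal();
    }

    public static void main(String[] args) {
        Level applicationLevel = Level.INFO;
        // The aggregation metric itself is on INFO, so it is enabled ...
        System.out.println("aggregation records: " + records(Level.INFO, applicationLevel));
        // ... but the DEBUG-level metric it should aggregate records nothing.
        System.out.println("input metric records: " + records(Level.DEBUG, applicationLevel));
    }
}
```

In this situation the aggregation metric exists but has no recorded input values to aggregate, so the application's recording level would have to be raised to DEBUG for the aggregation to be meaningful.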

The function that is used for the aggregation will be specified by argument aggregationFunction, and the initial value of the aggregate will be specified by initialAggregateSupplier. The aggregation function will take the current aggregate as first argument and the value to add to the aggregate as second argument.
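The contract between the initial aggregate and the aggregation function amounts to a left fold, which the following stand-alone sketch illustrates. The parameter types are assumptions: this section does not show the signature of the proposed method, so java.util.function.Supplier and java.util.function.BiFunction are used here only as plausible stand-ins, and the aggregate helper is hypothetical.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;

class AggregationFoldSketch {

    // Folds the values into one aggregate: starts from the supplied initial
    // aggregate and applies the function as (current aggregate, next value).
    static <AGG, V> AGG aggregate(
            List<V> values,
            Supplier<AGG> initialAggregateSupplier,
            BiFunction<AGG, V, AGG> aggregationFunction) {
        AGG aggregate = initialAggregateSupplier.get();
        for (V value : values) {
            aggregate = aggregationFunction.apply(aggregate, value);
        }
        return aggregate;
    }

    public static void main(String[] args) {
        // Made-up memtable sizes reported by three state stores.
        List<Long> sizes = List.of(1024L, 2048L, 4096L);

        // Sum: initial aggregate 0, function adds the next value.
        long total = AggregationFoldSketch.<Long, Long>aggregate(sizes, () -> 0L, Long::sum);
        // Maximum: initial aggregate is the smallest long, function keeps the larger value.
        long largest = AggregationFoldSketch.<Long, Long>aggregate(sizes, () -> Long.MIN_VALUE, Long::max);

        System.out.println("total=" + total + ", largest=" + largest);
    }
}
```

Using a supplier rather than a fixed initial value means each added aggregation metric can start from its own fresh aggregate, which matters if the aggregate is a mutable object.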

A metrics aggregation can only be added when the Kafka Streams client is in state CREATED.

The following code example shows how to add a metrics aggregation for the state-store-level metric size-all-mem-tables by stream threads and tasks:

...

No methods need to be deprecated and no migration plan is required.

Rejected Alternatives

  • Add the method to the StreamsMetrics interface: Adding the method to the StreamsMetrics interface would imply that the method could be called from anywhere within a processor that has access to an implementation of the StreamsMetrics interface. That would require more concurrency control than adding the method to the KafkaStreams class. In our opinion, the value of adding the method to the StreamsMetrics interface does not outweigh the additional cost of the concurrency control mechanisms, so we rejected this approach.