...
An important part of deploying Kafka Connect is monitoring the health of the workers in a cluster and the connectors and tasks that have been deployed to the cluster. The Kafka Connect framework currently has only a few metrics capturing the number of connectors and tasks for each worker, so we propose to add metrics to monitor more information about the connectors, tasks, and workers. All metrics reported by each worker are scoped by the activities within that worker.
There are several things that are out of scope for this proposal, though they may be addressed in future KIPs. First, this proposal expressly avoids changes to the Connect API, and therefore does not address how connector implementations can define their own connector-specific metrics. Second, Kafka Connect does not have any existing mechanism to aggregate the metrics reported by each worker, and therefore any such aggregation is out of scope for this KIP.
Public Interfaces
All of the following metrics will be added via Kafka's metrics library, like most of the metrics in the Kafka brokers and other components. The scope of all metrics is limited to the worker where the metrics are being reported, and all metrics include the worker ID in the MBean attribute (similar to how Kafka producer and consumer metrics include the client ID).
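To illustrate the naming scheme, the following is a minimal sketch of how a monitoring tool might build and match MBean names in the format listed in the tables below. It uses only the standard `javax.management.ObjectName` class. The class `ConnectMetricNames`, its methods, and the sample worker and connector names are hypothetical and are not part of Kafka Connect or of this proposal.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper (not part of Kafka Connect): builds and matches MBean
// names that follow the format written in this proposal's tables.
final class ConnectMetricNames {

    private ConnectMetricNames() {}

    // Name of a single source task metric, e.g. metric = "source-record-produce-rate".
    static ObjectName sourceTaskMetric(String metric, String worker, String connector, int task) {
        return parse("kafka.connect:type=source-task-metrics"
                + ",name=" + metric
                + ",worker=" + worker
                + ",connector=" + connector
                + ",task=" + task);
    }

    // Pattern that matches every source task metric reported under one worker ID.
    static ObjectName allSourceTaskMetricsFor(String worker) {
        return parse("kafka.connect:type=source-task-metrics,worker=" + worker + ",*");
    }

    // Wraps the checked JMX exception so callers get an unchecked one.
    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid MBean name: " + name, e);
        }
    }

    public static void main(String[] args) {
        // "worker-1" and "jdbc-source" are made-up sample values.
        ObjectName name = sourceTaskMetric("source-record-produce-rate", "worker-1", "jdbc-source", 0);
        ObjectName pattern = allSourceTaskMetricsFor("worker-1");

        System.out.println(name.getKeyProperty("connector")); // prints "jdbc-source"
        System.out.println(pattern.apply(name));              // prints "true"
    }
}
```

A pattern of this kind could also be passed to `MBeanServerConnection.queryNames` to list every matching metric that a worker has registered, assuming the worker's JMX endpoint is reachable.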
Source Task Metrics
Metric Name | Description | MBean attribute |
---|---|---|
source record produce rate | The number of records per second produced (before transformation) by this task belonging to the named source connector in this worker | kafka.connect:type=source-task-metrics,name=source-record-produce-rate,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+) |
source record produce total | The total number of records produced (before transformation) by this task belonging to the named source connector in this worker | kafka.connect:type=source-task-metrics,name=source-record-produce-total,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+) |
source record write rate | The number of records per second output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformation and excludes any records filtered out by the transformations. | kafka.connect:type=source-task-metrics,name=source-record-write-rate,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+) |
source record write total | The total number of records output from the transformations and written to Kafka by this task belonging to the named source connector in this worker. This is after transformation and excludes any records filtered out by the transformations. | kafka.connect:type=source-task-metrics,name=source-record-write-total,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+) |
poll time percentage | The average percentage of time spent polling this task belonging to the named source connector in this worker | kafka.connect:type=source-task-metrics,name=poll-time-percentage,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+) |
transform time percentage | The average percentage of time spent transforming source records for this task belonging to the named source connector in this worker | kafka.connect:type=source-task-metrics,name=transform-time-percentage,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+) |
write time percentage | The average percentage of time spent converting and writing source records for this task belonging to the named source connector in this worker | kafka.connect:type=source-task-metrics,name=write-time-percentage,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+) |
pause time percentage | The average percentage of time this task belonging to the named source connector in this worker was paused | kafka.connect:type=source-task-metrics,name=pause-time-percentage,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+) |
...