...
There are several things that are out of scope for this proposal, though they may be addressed in future KIPs. First, this proposal expressly avoids changes to the Connect API, and therefore does not address how connector implementations can define their own connector-specific metrics. Second, Kafka Connect does not have any existing mechanism to aggregate the metrics reported by each workeracross multiple workers.
Public Interfaces
All of the following will be added via Kafka's metrics library like most of the metrics in the Kafka brokers and other components. The context of all metrics are limited to the worker where the metrics are being reported, and all metrics include the worker ID in the MBean attribute (similarly to how Kafka producer and consumer metrics include the client ID)are defined as attributes on the specified MBean attribute and are measured within the context of a single worker.
Connector Metrics
...
MBean name: kafka.connect:type=connector-metrics
...
,connector=([-.\w]+)
Metric Name | Description |
---|---|
connector-classtype | The name type of the connector class, one of: source, sink |
connector-class | The name of the connector classkafka.connect:type=connector-metrics,name=connector-class,connector=([-.\w]+) |
connector-version | The version of the connector class, as reported by the connector in this workerkafka.connect:type=connector-metrics,name=connector-version,connector=([-.\w]+) |
status | The current status of the connector in this worker, one of: running, paused, stopped kafka |
Common Task Metrics
MBean name: kafka.connect:type=
...
task-metrics
...
,connector=([-.\w]+)
...
Metric Name | DescriptionMBean attribute | |
---|---|---|
status | The current status of this task, one of: unassigned, running, paused, failed, destroyedkafka.connect:type=task-metrics,name=status,connector=([-.\w]+),task=([\d]+) | |
pause-ratio | The fraction of time this task has spent in the paused state.kafka.connect:type=task-metrics,name=pause-ratio ,connector=([-.\w]+),task=([\d]+) | |
offset-commit-success-percentage | The average percentage of this task's offset commit attempts that succeededkafka.connect:type=task-metrics,name=offset-commit-success-percentage,connector=([-.\w]+),task=([\d]+) | |
offset-commit-failure-percentage | The average percentage of this task's offset commit attempts that failed or had an error | kafka.connect:type=task-metrics,name=offset-commit-failure-percentage,connector=([-.\w]+),task=([\d]+) |
offset-commit-max-time | The maximum time taken by this task to commit offsets | |
kafka.connect:type=task-metrics,name=offset-commit-max-time,connector=([-.\w]+),task=([\d]+)offset-commit-99p-time | The 99th percentile time spent by this task to commit offsets | |
kafka.connect:type=task-metrics,name=offset-commit-99p-time,connector=([-.\w]+),task=([\d]+)offset-commit-95p-time | The 95th percentile time spent by this task to commit offsetskafka.connect:type=task-metrics,name=offset-commit-95p-time,connector=([-.\w]+),task=([\d]+) | |
offset-commit-90p-time | The 90th percentile time spent by this task to commit offsets | |
kafka.connect:type=task-metrics,name=offset-commit-90p-time,connector=([-.\w]+),task=([\d]+)offset-commit-75p-time | The 75th percentile time spent by this task to commit offsets | |
kafka.connect:type=task-metrics,name=offset-commit-75p-time,connector=([-.\w]+),task=([\d]+)offset-commit-50p-time | The 50th percentile (average) time spent by this task to commit offsets | kafka.connect:type=task-metrics,name=offset-commit-50p-time,connector=([-.\w]+),task=([\d]+) |
batch-size-max | The maximum size of the batches processed by the connectorkafka.connect:type=task-metrics,name=batch-size-max, connector=([-.\w]+),task=([\d]+) | |
batch-size-avg | The average size of the batches processed by the connector |
Source Task Metrics
MBean name: kafka.connect:type=source-task-metrics
...
,connector=([-.\w]+),task=([\d]+)
...
Source Task Metrics
Metric Name | DescriptionMBean attribute |
---|---|
source-record-poll-rate | The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker. This is before transformations are applied.kafka.connect:type=source-task-metrics,name=source-record-produce-rate,connector=([-.\w]+),task=([\d]+) |
source-record-write-rate | The average per-second number of records per second output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied. |
Sink Task Metrics
MBean name: kafka.connect:type=
...
sink-task-metrics
...
,connector=([-.\w]+),task=([\d]+)
...
...
Metric Name | Description | MBean attribute | ||
---|---|---|---|---|
sink-record-read-rate | The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.kafka.connect:type=sink-task-metrics,name= | |||
sink-record | -read-rate,connector=([- | .\w]+),task=([\d]+)sink-record-send-rate | The average per-second numbrer of records output from the transformations and sent to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations. | kafka.connect:type=sink-task-metrics,name=sink-record-process-rate ,connector=([-.\w]+),task=([\d]+) |
sink-record-lag-max | The maximum lag in terms of number of records behind the consumer the offset commits are for any topic partitions. | |||
partition-count | The number of topic partitions assigned to this task belonging to the named sink connector in this | windowworker. |
MBean name: kafka.connect:type=sink-task-metrics
...
,connector=([-.\w]+),task=([\d]+)
...
,
...
topic
...
=([-.\w]+),
...
partition=([\d]+)
Metric Name | Description | |||
---|---|---|---|---|
sink-record-{topic}-{partition}.records-lag-avg | The average latest lag in terms of number of records behind the consumer the offset commits are for the topic partition. | |||
kafka.connect:type=sink-task-metrics,name=sink-record-{topic}-{partition}-lag-avg,connector=([-.\w]+),task=([\d]+) | sink-record-{topic}-{partition}.records-lag-max | The average The maximum lag in terms of number of records behind the consumer the offset commits are for the topic partition. | ||
kafka.connect:type=sink-task-metrics,name=sink-record-{topic}-{partition}-lag-max,connector=([-.\w]+),task=([\d]+) | partition-count | The number of topic partitions assigned to this task belonging to the named sink connector in this worker. | The maximum lag in terms of number of records behind the consumer the offset commits are for the topic partition. |
Worker Metrics
MBean name: kafka.connect:type=
...
connect-
...
worker-metrics,
...
connector=([-.\w
...
]+)
...
Worker Metrics
Metric Name | DescriptionMBean attribute |
---|---|
assigned-tasks | The number of tasks run in this worker (mirrors existing metric) |
kafka.connect:type=connect-coordinator-metrics,name=assigned-tasksassigned-connectors | The number of connectors run in this worker (mirrors existing metric)kafka.connect:type=connect-coordinator-metrics,name=assigned-connectors |
task-count | The number of tasks run in this workerkafka.connect:type=connect- worker-metrics,name=task-count |
connector-count | The number of connectors run in this workerkafka.connect:type=connect- worker-metrics,name=connector-count |
leader-name | The name of the group leaderkafka.connect:type=connect-worker-metrics,name= leader-name |
state | The state of this worker, one of: rebalancing, runningkafka.connect:type=connect-worker-metrics,name=state |
rest-request-rate | The average per second number of requests handled by the REST endpoints in this worker |
Worker Rebalance Metrics
MBean name: kafka.connect:type=connect-worker-rebalance-metrics,
...
connector=
...
([-.\w]+)
Metric Name | DescriptionMBean attribute | ||
---|---|---|---|
rebalance-success-total | The total number of this worker's successful rebalanceskafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-success-total | ||
rebalance-success-percentage | The average percentage of this worker's rebalances that succeeded | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-success-percentage | |
rebalance-failure-total | The total number of this worker's failed rebalances | ||
kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-failure-totalrebalance-failure-percentage | The average percentage of this worker's rebalances that failed | ||
kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-failure-percentagerebalance-max-time | The maximum time spent by this worker to rebalancekafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-max-time | ||
rebalance-99p-time | The 99th percentile time spent by this worker to rebalance during the last window (defaults to an hour) | ||
kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-99p-timerebalance-95p-time | The 95th percentile time spent by this worker to rebalance during the last window (defaults to an hour)kafka.connect:type=connect-worker-rebalance-metrics,name= | ||
rebalance- | 95p-timerebalance-90p-time | The 90th percentile time spent by this worker to rebalance during the last window (defaults to an hour)kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-90p-time | |
rebalance-75p-time | The 75th percentile time spent by this worker to rebalance during the last window (defaults to an hour)kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-75p-time | ||
rebalance-50p-time | The 50th percentile (average) time spent by this worker to rebalance during the last window (defaults to an hour)kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-50p- | ||
time | time-since-last-rebalance | The time since the most recent rebalance in this worker | kafka.connect:type=connect-worker-rebalance-metrics,name=time-since-last-rebalance |
task-failure-rate | The number of tasks that failed in this workerkafka.connect:type=connect- worker-rebalance-metrics,name=task-failure-rate |
Proposed Changes
We will add the relevant metrics as specified in the Public Interfaces section, except for the two existing metrics that will be left unmodified.
...