Status
Current state: Under Discussion
Discussion thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
An important part of deploying Kafka Connect is monitoring the health of the workers in a cluster and of the connectors and tasks deployed to it. The Kafka Connect framework currently exposes only a few metrics, which capture the number of connectors and tasks for each worker, so we propose adding metrics that report more information about connectors, tasks, and workers. This proposal expressly avoids changes to the Connect API and therefore does not address how connector implementations can define their own connector-specific metrics.
Public Interfaces
All of the following metrics will be added via Kafka's metrics library, like most of the metrics in the Kafka brokers and other components.
Source Task Metrics
Metric Name | Description | MBean attribute |
---|---|---|
source record rate | The number of records produced per second by this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=source-record-rate,connector=([-.\w]+),task=([\d]+) |
source record total | The total number of records produced by this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=source-record-total,connector=([-.\w]+),task=([\d]+) |
poll time percentage | The average percentage of time spent polling this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=poll-time-percentage,connector=([-.\w]+),task=([\d]+) |
transform time percentage | The average percentage of time spent transforming source records for this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=transform-time-percentage,connector=([-.\w]+),task=([\d]+) |
write time percentage | The average percentage of time spent converting and writing source records for this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=write-time-percentage,connector=([-.\w]+),task=([\d]+) |
pause time percentage | The average percentage of time this task belonging to the named source connector was paused | kafka.connect:type=source-task-metrics,name=pause-time-percentage,connector=([-.\w]+),task=([\d]+) |
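As a minimal sketch of how a monitoring script might consume these names, the pattern in the MBean attribute column can be used to pull the connector name and task number out of a metric string. The named groups, the helper function, and the connector name `my-jdbc.source` are assumptions made for illustration and are not part of this proposal.

```python
import re

# Pattern taken from the "MBean attribute" column for source task metrics.
# The named groups are added here for readability only.
SOURCE_TASK_PATTERN = re.compile(
    r"kafka\.connect:type=source-task-metrics,"
    r"name=(?P<name>[-\w]+),"
    r"connector=(?P<connector>[-.\w]+),"
    r"task=(?P<task>\d+)"
)


def parse_source_task_mbean(mbean):
    """Return (metric name, connector, task id), or None if the string does not match."""
    match = SOURCE_TASK_PATTERN.fullmatch(mbean)
    if match is None:
        return None
    return match["name"], match["connector"], int(match["task"])


# Hypothetical connector name, used only for illustration.
example = (
    "kafka.connect:type=source-task-metrics,"
    "name=source-record-rate,connector=my-jdbc.source,task=3"
)
print(parse_source_task_mbean(example))
```

The same approach should carry over to the other tables by swapping the `type=` value and dropping the `task` group where the pattern has none.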
Source Connector Metrics
Metric Name | Description | MBean attribute |
---|---|---|
source record rate | The number of records produced per second by all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=source-record-rate,connector=([-.\w]+) |
source record total | The total number of records produced by all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=source-record-total,connector=([-.\w]+) |
poll time percentage | The average percentage of time spent polling all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=poll-time-percentage,connector=([-.\w]+) |
transform time percentage | The average percentage of time spent transforming source records for all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=transform-time-percentage,connector=([-.\w]+) |
write time percentage | The average percentage of time spent converting and writing source records for all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=write-time-percentage,connector=([-.\w]+) |
pause time percentage | The average percentage of time all tasks belonging to the named source connector were paused | kafka.connect:type=source-connector-metrics,name=pause-time-percentage,connector=([-.\w]+) |
status | The current status of the connector, one of: running, paused, stopped | kafka.connect:type=source-connector-metrics,name=status,connector=([-.\w]+) |
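The connector-level metrics describe the same quantities as the task-level ones, taken over all of a connector's tasks. The proposal does not spell out the aggregation, so the following sketch assumes that rates and totals are summed across tasks and that time percentages are averaged across tasks. The per-task readings and the `connector_rollup` helper are hypothetical.

```python
from statistics import mean

# Hypothetical per-task readings for one source connector, keyed by task id.
task_metrics = {
    0: {"source-record-rate": 120.0, "source-record-total": 3600, "poll-time-percentage": 40.0},
    1: {"source-record-rate": 80.0, "source-record-total": 2400, "poll-time-percentage": 60.0},
}


def connector_rollup(tasks):
    """Combine task-level readings into connector-level values.

    Assumption: rates and totals add up across tasks, while time
    percentages are averaged across tasks.
    """
    readings = list(tasks.values())
    return {
        "source-record-rate": sum(r["source-record-rate"] for r in readings),
        "source-record-total": sum(r["source-record-total"] for r in readings),
        "poll-time-percentage": mean(r["poll-time-percentage"] for r in readings),
    }


print(connector_rollup(task_metrics))
```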
Sink Task Metrics
Metric Name | Description | MBean attribute |
---|---|---|
sink record rate | The number of records consumed per second by this task belonging to the named sink connector | kafka.connect:type=sink-task-metrics,name=sink-record-rate,connector=([-.\w]+),task=([\d]+) |
sink record total | The total number of records consumed by this task belonging to the named sink connector | kafka.connect:type=sink-task-metrics,name=sink-record-total,connector=([-.\w]+),task=([\d]+) |
read time percentage | The average percentage of time spent reading sink records for this task belonging to the named sink connector | kafka.connect:type=sink-task-metrics,name=read-time-percentage,connector=([-.\w]+),task=([\d]+) |
transform time percentage | The average percentage of time spent transforming sink records for this task belonging to the named sink connector | kafka.connect:type=sink-task-metrics,name=transform-time-percentage,connector=([-.\w]+),task=([\d]+) |
put time percentage | The average percentage of time this task belonging to the named sink connector spent putting/processing sink records | kafka.connect:type=sink-task-metrics,name=put-time-percentage,connector=([-.\w]+),task=([\d]+) |
flush time percentage | The average percentage of time this task belonging to the named sink connector spent flushing sink records | kafka.connect:type=sink-task-metrics,name=flush-time-percentage,connector=([-.\w]+),task=([\d]+) |
pause time percentage | The average percentage of time this task belonging to the named sink connector was paused | kafka.connect:type=sink-task-metrics,name=pause-time-percentage,connector=([-.\w]+),task=([\d]+) |
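To make the time-percentage metrics concrete, the sketch below treats each one as the share of a measurement window that a task spent in a given phase. This is one plausible reading of "average percentage of time"; the window length, the phase durations, and the `time_percentages` helper are assumptions for illustration.

```python
def time_percentages(phase_millis, window_millis):
    """Return the share of the window spent in each phase, as a percentage."""
    if window_millis <= 0:
        raise ValueError("window_millis must be positive")
    return {
        phase: 100.0 * millis / window_millis
        for phase, millis in phase_millis.items()
    }


# Hypothetical time (in milliseconds) a sink task spent in each phase
# during a 60-second window.
phases = {"read": 15000, "transform": 3000, "put": 30000, "flush": 6000, "pause": 0}
print(time_percentages(phases, 60000))
```

Under this reading the percentages need not sum to 100, since a task may also spend time on work that none of the listed phases covers.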
Sink Connector Metrics
Metric Name | Description | MBean attribute |
---|---|---|
sink record rate | The number of records consumed per second by all tasks belonging to the named sink connector | kafka.connect:type=sink-connector-metrics,name=sink-record-rate,connector=([-.\w]+) |
sink record total | The total number of records consumed by all tasks belonging to the named sink connector | kafka.connect:type=sink-connector-metrics,name=sink-record-total,connector=([-.\w]+) |
read time percentage | The average percentage of time spent reading sink records for all tasks belonging to the named sink connector | kafka.connect:type=sink-connector-metrics,name=read-time-percentage,connector=([-.\w]+) |
transform time percentage | The average percentage of time spent transforming sink records for all tasks belonging to the named sink connector | kafka.connect:type=sink-connector-metrics,name=transform-time-percentage,connector=([-.\w]+) |
put time percentage | The average percentage of time all tasks belonging to the named sink connector spent putting/processing sink records | kafka.connect:type=sink-connector-metrics,name=put-time-percentage,connector=([-.\w]+) |
flush time percentage | The average percentage of time all tasks belonging to the named sink connector spent flushing sink records | kafka.connect:type=sink-connector-metrics,name=flush-time-percentage,connector=([-.\w]+) |
pause time percentage | The average percentage of time all tasks belonging to the named sink connector were paused | kafka.connect:type=sink-connector-metrics,name=pause-time-percentage,connector=([-.\w]+) |
status | The current status of the connector, one of: running, paused, stopped | kafka.connect:type=sink-connector-metrics,name=status,connector=([-.\w]+) |
partition count | The number of topic partitions assigned to all tasks running in this worker for the named sink connector | kafka.connect:type=sink-connector-metrics,name=partition-count,connector=([-.\w]+) |
Worker Metrics
Metric Name | Description | MBean attribute |
---|---|---|
assigned tasks | The number of tasks run in this worker (existing metric) | kafka.connect:type=connect-coordinator-metrics,name=assigned-tasks,worker=([-.\w]+) |
assigned connectors | The number of connectors run in this worker (existing metric) | kafka.connect:type=connect-coordinator-metrics,name=assigned-connectors,worker=([-.\w]+) |
task count | The number of tasks run in this worker | kafka.connect:type=connect-worker-metrics,name=task-count,worker=([-.\w]+) |
connector count | The number of connectors run in this worker | kafka.connect:type=connect-worker-metrics,name=connector-count,worker=([-.\w]+) |
sink record rate | The number of sink records consumed per second by all sink connectors | kafka.connect:type=connect-worker-metrics,name=sink-record-rate,worker=([-.\w]+) |
sink record total | The total number of sink records consumed by all sink connectors | kafka.connect:type=connect-worker-metrics,name=sink-record-total,worker=([-.\w]+) |
source record rate | The number of source records produced per second by all source connectors | kafka.connect:type=connect-worker-metrics,name=source-record-rate,worker=([-.\w]+) |
source record total | The total number of source records produced by all source connectors | kafka.connect:type=connect-worker-metrics,name=source-record-total,worker=([-.\w]+) |
leader name | The name of the group leader | kafka.connect:type=connect-worker-metrics,name=leader-name,worker=([-.\w]+) |
state | The state of this worker, one of: rebalancing, running | kafka.connect:type=connect-worker-metrics,name=state,worker=([-.\w]+) |
offset commit success total | The total number of successful offset commits | kafka.connect:type=connect-worker-metrics,name=offset-commit-success-total,worker=([-.\w]+) |
offset commit success percentage | The average percentage of offset commits that succeeded | kafka.connect:type=connect-worker-metrics,name=offset-commit-success-percentage,worker=([-.\w]+) |
offset commit failure total | The total number of failed offset commits | kafka.connect:type=connect-worker-metrics,name=offset-commit-failure-total,worker=([-.\w]+) |
offset commit failure percentage | The average percentage of offset commits that failed | kafka.connect:type=connect-worker-metrics,name=offset-commit-failure-percentage,worker=([-.\w]+) |
offset commit total | The total number of offset commits | kafka.connect:type=connect-worker-metrics,name=offset-commit-total,worker=([-.\w]+) |
offset commit maximum time | The maximum time spent to commit offsets | kafka.connect:type=connect-worker-metrics,name=offset-commit-max-time,worker=([-.\w]+) |
offset commit 99th percentile time | The 99th percentile time spent to commit offsets during the last window (defaults to an hour) | kafka.connect:type=connect-worker-metrics,name=offset-commit-99p-time,worker=([-.\w]+) |
offset commit 95th percentile time | The 95th percentile time spent to commit offsets during the last window (defaults to an hour) | kafka.connect:type=connect-worker-metrics,name=offset-commit-95p-time,worker=([-.\w]+) |
offset commit 90th percentile time | The 90th percentile time spent to commit offsets during the last window (defaults to an hour) | kafka.connect:type=connect-worker-metrics,name=offset-commit-90p-time,worker=([-.\w]+) |
offset commit 75th percentile time | The 75th percentile time spent to commit offsets during the last window (defaults to an hour) | kafka.connect:type=connect-worker-metrics,name=offset-commit-75p-time,worker=([-.\w]+) |
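The offset commit success and failure percentages are related to the success and failure totals. As a rough sketch, assuming the percentages are computed over the same set of commits as the totals (the proposal says "average percentage", which may instead be windowed), the arithmetic looks like this. The `commit_percentages` helper and the sample counts are hypothetical.

```python
def commit_percentages(success_total, failure_total):
    """Return (success percentage, failure percentage) for the given totals."""
    total = success_total + failure_total
    if total == 0:
        # No commits attempted yet; report zero rather than dividing by zero.
        return 0.0, 0.0
    return 100.0 * success_total / total, 100.0 * failure_total / total


# Hypothetical counts: 97 successful commits and 3 failed ones.
print(commit_percentages(97, 3))
```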
Worker Rebalance Metrics
Metric Name | Description | MBean attribute |
---|---|---|
rebalance success total | The total number of successful rebalances | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-success-total,worker=([-.\w]+) |
rebalance success percentage | The average percentage of rebalances that succeeded | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-success-percentage,worker=([-.\w]+) |
rebalance failure total | The total number of failed rebalances | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-failure-total,worker=([-.\w]+) |
rebalance failure percentage | The average percentage of rebalances that failed | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-failure-percentage,worker=([-.\w]+) |
rebalance maximum time | The maximum time spent to rebalance | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-max-time,worker=([-.\w]+) |
rebalance 99th percentile time | The 99th percentile time spent to rebalance during the last window (defaults to an hour) | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-99p-time,worker=([-.\w]+) |
rebalance 95th percentile time | The 95th percentile time spent to rebalance during the last window (defaults to an hour) | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-95p-time,worker=([-.\w]+) |
rebalance 90th percentile time | The 90th percentile time spent to rebalance during the last window (defaults to an hour) | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-90p-time,worker=([-.\w]+) |
rebalance 75th percentile time | The 75th percentile time spent to rebalance during the last window (defaults to an hour) | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-75p-time,worker=([-.\w]+) |
time since last rebalance | The time since the most recent rebalance | kafka.connect:type=connect-worker-rebalance-metrics,name=time-since-last-rebalance,worker=([-.\w]+) |
task failure rate | The number of tasks that failed in this worker | kafka.connect:type=connect-worker-rebalance-metrics,name=task-failure-rate,worker=([-.\w]+) |
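The percentile rows for rebalances (and, likewise, for offset commits) report a duration below which a given share of the samples in the last window fall. The proposal does not say how the percentiles are estimated, so the sketch below uses the exact nearest-rank method purely as an illustration. The sample durations and the `nearest_rank_percentile` helper are hypothetical.

```python
import math


def nearest_rank_percentile(samples, percentile):
    """Return the nearest-rank percentile of the samples (0 < percentile <= 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    # Rank is 1-based: the smallest rank covering `percentile` percent of samples.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical rebalance durations (milliseconds) observed in one window.
durations = [120, 80, 200, 150, 95, 300, 110, 130, 90, 1000]
for p in (75, 90, 95, 99):
    print(p, nearest_rank_percentile(durations, p))
print("max", max(durations))
```

With only a handful of samples in a window, the higher percentiles collapse onto the maximum, which is worth keeping in mind when alerting on them.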
Worker REST Metrics
Metric Name | Description | MBean attribute |
---|---|---|
REST request rate | The number of requests handled by the REST endpoints | kafka.connect:type=worker-rest-metrics,name=request-rate,worker=([-.\w]+) |
Proposed Changes
We will add the relevant metrics as specified in the Public Interfaces section.
Compatibility, Deprecation, and Migration Plan
We are introducing new metrics, so there is no compatibility impact. The two existing metrics (assigned tasks and assigned connectors) will not be changed.
Rejected Alternatives
The following metrics were considered but were rejected: