Status

Current state: Under Discussion

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

An important part of deploying Kafka Connect is monitoring the health of the workers in a cluster and the connectors and tasks that have been deployed to the cluster. The Kafka Connect framework only has a few metrics capturing the number of connectors and tasks for each worker, so we propose to add metrics to monitor more information about the connectors, tasks, and workers. This proposal expressly avoids changes to the Connect API, and therefore does not address how connector implementations can define their own connector-specific metrics.

Public Interfaces

All of the following metrics will be added using Kafka's metrics library, as are most of the metrics in the Kafka brokers and other components.

Source Task Metrics

Metric Name | Description | MBean attribute
source record rate | The number of records produced per second by this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=source-record-rate,connector=([-.\w]+),task=([\d]+)
source record total | The total number of records produced by this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=source-record-total,connector=([-.\w]+),task=([\d]+)
poll time percentage | The average percentage of time spent polling this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=poll-time-percentage,connector=([-.\w]+),task=([\d]+)
transform time percentage | The average percentage of time spent transforming source records for this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=transform-time-percentage,connector=([-.\w]+),task=([\d]+)
write time percentage | The average percentage of time spent converting and writing source records for this task belonging to the named source connector | kafka.connect:type=source-task-metrics,name=write-time-percentage,connector=([-.\w]+),task=([\d]+)
pause time percentage | The average percentage of time this task belonging to the named source connector was paused | kafka.connect:type=source-task-metrics,name=pause-time-percentage,connector=([-.\w]+),task=([\d]+)
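The MBean column gives each name as a pattern, with regular expressions for the connector and task values. The Java sketch below uses only the standard `javax.management.ObjectName` class to show how a monitoring tool might recognize a source task metric name and extract those two values. It assumes the names are registered exactly in the layout shown above; the class and method names are hypothetical.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import java.util.regex.Pattern;

// Sketch: recognizes JMX names laid out like the source-task-metrics rows above.
final class SourceTaskMetricName {
    // Character classes copied from the MBean patterns in the table.
    private static final Pattern CONNECTOR = Pattern.compile("[-.\\w]+");
    private static final Pattern TASK = Pattern.compile("\\d+");

    /** Returns "connector/task" for a matching name, or null if it does not match. */
    static String connectorAndTask(String mbeanName) {
        try {
            ObjectName name = new ObjectName(mbeanName);
            // Property-list pattern: same domain and type, any other keys allowed.
            ObjectName pattern = new ObjectName("kafka.connect:type=source-task-metrics,*");
            if (!pattern.apply(name)) {
                return null;
            }
            String connector = name.getKeyProperty("connector");
            String task = name.getKeyProperty("task");
            if (connector == null || task == null
                    || !CONNECTOR.matcher(connector).matches()
                    || !TASK.matcher(task).matches()) {
                return null;
            }
            return connector + "/" + task;
        } catch (MalformedObjectNameException e) {
            return null; // not a syntactically valid JMX name
        }
    }

    public static void main(String[] args) {
        System.out.println(connectorAndTask(
                "kafka.connect:type=source-task-metrics,name=poll-time-percentage,connector=jdbc-orders,task=2"));
    }
}
```

The same approach applies to the sink task names by changing the `type` in the pattern.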


Source Connector Metrics

Metric Name | Description | MBean attribute
source record rate | The number of records produced per second by all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=source-record-rate,connector=([-.\w]+)
source record total | The total number of records produced by all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=source-record-total,connector=([-.\w]+)
poll time percentage | The average percentage of time spent polling all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=poll-time-percentage,connector=([-.\w]+)
transform time percentage | The average percentage of time spent transforming source records for all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=transform-time-percentage,connector=([-.\w]+)
write time percentage | The average percentage of time spent converting and writing source records for all tasks belonging to the named source connector | kafka.connect:type=source-connector-metrics,name=write-time-percentage,connector=([-.\w]+)
pause time percentage | The average percentage of time all tasks belonging to the named source connector were paused | kafka.connect:type=source-connector-metrics,name=pause-time-percentage,connector=([-.\w]+)
status | The current status of the connector, one of: running, paused, stopped | kafka.connect:type=source-connector-metrics,name=status,connector=([-.\w]+)
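The connector-level metrics aggregate the task-level ones, but the proposal does not say exactly how. The following Java sketch shows one plausible reading, under stated assumptions: rates and totals are summed across tasks, while a percentage is the unweighted mean over tasks (percentages cannot simply be added). `TaskSnapshot` is a hypothetical holder, not a Connect API type.

```java
import java.util.List;

// Sketch: derives connector-level values from hypothetical per-task snapshots.
final class SourceConnectorRollup {
    /** Hypothetical snapshot of one task's metrics; not a Connect API type. */
    static final class TaskSnapshot {
        final double sourceRecordRate;   // records per second
        final long sourceRecordTotal;    // records since the task started
        final double pollTimePercentage; // 0..100

        TaskSnapshot(double rate, long total, double pollPct) {
            this.sourceRecordRate = rate;
            this.sourceRecordTotal = total;
            this.pollTimePercentage = pollPct;
        }
    }

    /** Rates add up across tasks. */
    static double recordRate(List<TaskSnapshot> tasks) {
        return tasks.stream().mapToDouble(t -> t.sourceRecordRate).sum();
    }

    /** Totals add up across tasks. */
    static long recordTotal(List<TaskSnapshot> tasks) {
        return tasks.stream().mapToLong(t -> t.sourceRecordTotal).sum();
    }

    /** Assumption: the connector value is the unweighted mean of the task percentages. */
    static double pollTimePercentage(List<TaskSnapshot> tasks) {
        return tasks.stream().mapToDouble(t -> t.pollTimePercentage).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<TaskSnapshot> tasks = List.of(
                new TaskSnapshot(10.0, 100, 20.0),
                new TaskSnapshot(5.5, 50, 40.0));
        System.out.println(recordRate(tasks) + " " + recordTotal(tasks) + " " + pollTimePercentage(tasks));
    }
}
```

A time-weighted mean would be an equally reasonable choice if tasks run for very different lengths of time.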

 

Sink Task Metrics

 

Metric Name | Description | MBean attribute
sink record rate | The number of records consumed per second by this task belonging to the named sink connector | kafka.connect:type=sink-task-metrics,name=sink-record-rate,connector=([-.\w]+),task=([\d]+)
sink record total | The total number of records consumed by this task belonging to the named sink connector | kafka.connect:type=sink-task-metrics,name=sink-record-total,connector=([-.\w]+),task=([\d]+)
read time percentage | The average percentage of time this task belonging to the named sink connector spent reading (polling) records from Kafka | kafka.connect:type=sink-task-metrics,name=read-time-percentage,connector=([-.\w]+),task=([\d]+)
transform time percentage | The average percentage of time spent transforming sink records for this task belonging to the named sink connector | kafka.connect:type=sink-task-metrics,name=transform-time-percentage,connector=([-.\w]+),task=([\d]+)
put time percentage | The average percentage of time this task belonging to the named sink connector spent putting/processing sink records | kafka.connect:type=sink-task-metrics,name=put-time-percentage,connector=([-.\w]+),task=([\d]+)
flush time percentage | The average percentage of time this task belonging to the named sink connector spent flushing sink records | kafka.connect:type=sink-task-metrics,name=flush-time-percentage,connector=([-.\w]+),task=([\d]+)
pause time percentage | The average percentage of time this task belonging to the named sink connector was paused | kafka.connect:type=sink-task-metrics,name=pause-time-percentage,connector=([-.\w]+),task=([\d]+)
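Each "time percentage" metric is the share of a task's wall-clock time spent in one phase. A minimal Java sketch of that arithmetic follows, assuming the worker accumulates nanoseconds per phase and divides by the elapsed time since a window start. The `Phase` enum and `SinkTaskPhaseTimer` class are hypothetical, and a real implementation would average over sampled windows rather than a single cumulative one.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch: turns accumulated per-phase durations into "time percentage" values.
final class SinkTaskPhaseTimer {
    /** Phases mirroring the sink task rows above; this enum is hypothetical. */
    enum Phase { READ, TRANSFORM, PUT, FLUSH, PAUSE }

    private final Map<Phase, Long> nanosByPhase = new EnumMap<>(Phase.class);
    private final long windowStartNanos;

    SinkTaskPhaseTimer(long windowStartNanos) {
        this.windowStartNanos = windowStartNanos;
    }

    /** Records that the task spent durationNanos in the given phase. */
    void record(Phase phase, long durationNanos) {
        nanosByPhase.merge(phase, durationNanos, Long::sum);
    }

    /** Share of the elapsed window spent in the phase, as a percentage (0..100). */
    double percentage(Phase phase, long nowNanos) {
        long elapsed = nowNanos - windowStartNanos;
        if (elapsed <= 0) {
            return 0.0;
        }
        return 100.0 * nanosByPhase.getOrDefault(phase, 0L) / elapsed;
    }

    public static void main(String[] args) {
        SinkTaskPhaseTimer timer = new SinkTaskPhaseTimer(0L);
        timer.record(Phase.READ, 250_000_000L); // 0.25 s reading from Kafka
        timer.record(Phase.PUT, 500_000_000L);  // 0.5 s handing records to the task
        timer.record(Phase.PUT, 250_000_000L);  // another 0.25 s
        System.out.println(timer.percentage(Phase.PUT, 1_000_000_000L)); // over a 1 s window
    }
}
```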


Sink Connector Metrics

 

Metric Name | Description | MBean attribute
sink record rate | The number of records consumed per second by all tasks belonging to the named sink connector | kafka.connect:type=sink-connector-metrics,name=sink-record-rate,connector=([-.\w]+)
sink record total | The total number of records consumed by all tasks belonging to the named sink connector | kafka.connect:type=sink-connector-metrics,name=sink-record-total,connector=([-.\w]+)
read time percentage | The average percentage of time all tasks belonging to the named sink connector spent reading (polling) records from Kafka | kafka.connect:type=sink-connector-metrics,name=read-time-percentage,connector=([-.\w]+)
transform time percentage | The average percentage of time spent transforming sink records for all tasks belonging to the named sink connector | kafka.connect:type=sink-connector-metrics,name=transform-time-percentage,connector=([-.\w]+)
put time percentage | The average percentage of time all tasks belonging to the named sink connector spent putting/processing sink records | kafka.connect:type=sink-connector-metrics,name=put-time-percentage,connector=([-.\w]+)
flush time percentage | The average percentage of time all tasks belonging to the named sink connector spent flushing sink records | kafka.connect:type=sink-connector-metrics,name=flush-time-percentage,connector=([-.\w]+)
pause time percentage | The average percentage of time all tasks belonging to the named sink connector were paused | kafka.connect:type=sink-connector-metrics,name=pause-time-percentage,connector=([-.\w]+)
status | The current status of the connector, one of: running, paused, stopped | kafka.connect:type=sink-connector-metrics,name=status,connector=([-.\w]+)
partition count | The number of topic partitions assigned to all tasks running in this worker for the named sink connector | kafka.connect:type=sink-connector-metrics,name=partition-count,connector=([-.\w]+)
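Unlike the other metrics, the source and sink connector "status" metrics carry a string drawn from a fixed set of values. The Java sketch below models that set as an enum and assumes the metric reports the lowercase names listed in the tables; the enum and its methods are hypothetical, not part of the Connect API.

```java
import java.util.Locale;

// Sketch: a string-valued status metric restricted to the three values listed above.
enum ConnectorStatus {
    RUNNING, PAUSED, STOPPED;

    /** Value as it might appear in the status metric, e.g. "running". */
    String metricValue() {
        return name().toLowerCase(Locale.ROOT);
    }

    /** Parses a reported value back into the enum; throws IllegalArgumentException otherwise. */
    static ConnectorStatus fromMetricValue(String value) {
        return valueOf(value.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (ConnectorStatus s : values()) {
            System.out.println(s.metricValue());
        }
    }
}
```

Using `Locale.ROOT` keeps the conversion independent of the JVM's default locale.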


Worker Metrics

 

Metric Name | Description | MBean attribute
assigned tasks | The number of tasks running in this worker (existing metric) | kafka.connect:type=connect-coordinator-metrics,name=assigned-tasks,worker=([-.\w]+)
assigned connectors | The number of connectors running in this worker (existing metric) | kafka.connect:type=connect-coordinator-metrics,name=assigned-connectors,worker=([-.\w]+)
task count | The number of tasks running in this worker | kafka.connect:type=connect-worker-metrics,name=assigned-tasks,worker=([-.\w]+)
connector count | The number of connectors running in this worker | kafka.connect:type=connect-worker-metrics,name=assigned-connectors,worker=([-.\w]+)
sink record rate | The number of sink records consumed per second by all sink connectors | kafka.connect:type=connect-worker-metrics,name=sink-record-rate,worker=([-.\w]+)
sink record total | The total number of sink records consumed by all sink connectors | kafka.connect:type=connect-worker-metrics,name=sink-record-total,worker=([-.\w]+)
source record rate | The number of source records produced per second by all source connectors | kafka.connect:type=connect-worker-metrics,name=source-record-rate,worker=([-.\w]+)
source record total | The total number of source records produced by all source connectors | kafka.connect:type=connect-worker-metrics,name=source-record-total,worker=([-.\w]+)
leader name | The name of the group leader | kafka.connect:type=connect-worker-metrics,name=leader-name,worker=([-.\w]+)
state | The state of this worker, one of: rebalancing, running | kafka.connect:type=connect-worker-metrics,name=state,worker=([-.\w]+)
offset commit success total | The total number of successful offset commits | kafka.connect:type=connect-worker-metrics,name=offset-commit-success-total,worker=([-.\w]+)
offset commit success percentage | The average percentage of offset commits that succeeded | kafka.connect:type=connect-worker-metrics,name=offset-commit-success-percentage,worker=([-.\w]+)
offset commit failure total | The total number of failed offset commits | kafka.connect:type=connect-worker-metrics,name=offset-commit-failure-total,worker=([-.\w]+)
offset commit failure percentage | The average percentage of offset commits that failed | kafka.connect:type=connect-worker-metrics,name=offset-commit-failure-percentage,worker=([-.\w]+)
offset commit total | The total number of offset commits | kafka.connect:type=connect-worker-metrics,name=offset-commit-total,worker=([-.\w]+)
offset commit maximum time | The maximum time spent committing offsets | kafka.connect:type=connect-worker-metrics,name=offset-commit-max-time,worker=([-.\w]+)
offset commit 99th percentile time | The 99th percentile of the time spent committing offsets during the last window (one hour by default) | kafka.connect:type=connect-worker-metrics,name=offset-commit-99p-time,worker=([-.\w]+)
offset commit 95th percentile time | The 95th percentile of the time spent committing offsets during the last window (one hour by default) | kafka.connect:type=connect-worker-metrics,name=offset-commit-95p-time,worker=([-.\w]+)
offset commit 90th percentile time | The 90th percentile of the time spent committing offsets during the last window (one hour by default) | kafka.connect:type=connect-worker-metrics,name=offset-commit-90p-time,worker=([-.\w]+)
offset commit 75th percentile time | The 75th percentile of the time spent committing offsets during the last window (one hour by default) | kafka.connect:type=connect-worker-metrics,name=offset-commit-75p-time,worker=([-.\w]+)
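To make the offset commit metrics concrete, the following Java sketch computes the success and failure percentages, the maximum, and nearest-rank percentiles over a single window of recorded commits. It is an illustration only: the class is hypothetical, it never expires old samples as a one-hour window would, and Kafka's metrics library may compute percentiles differently, for example from a bucketed histogram.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: offset commit statistics over one window of observations.
final class OffsetCommitWindow {
    private final List<Long> commitMillis = new ArrayList<>();
    private long successes;
    private long failures;

    /** Records one commit attempt and how long it took. */
    void record(boolean succeeded, long durationMillis) {
        if (succeeded) {
            successes++;
        } else {
            failures++;
        }
        commitMillis.add(durationMillis);
    }

    long total() {
        return successes + failures;
    }

    double successPercentage() {
        return total() == 0 ? 0.0 : 100.0 * successes / total();
    }

    double failurePercentage() {
        return total() == 0 ? 0.0 : 100.0 * failures / total();
    }

    long maxMillis() {
        return commitMillis.isEmpty() ? 0L : Collections.max(commitMillis);
    }

    /** Nearest-rank percentile, e.g. p = 99 for the "99p" metric. */
    long percentileMillis(double p) {
        if (commitMillis.isEmpty()) {
            return 0L;
        }
        List<Long> sorted = new ArrayList<>(commitMillis);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0); // 1-based rank
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        OffsetCommitWindow w = new OffsetCommitWindow();
        for (long ms = 1; ms <= 100; ms++) {
            w.record(ms % 10 != 0, ms); // every 10th commit fails in this toy data
        }
        System.out.printf("success=%.1f%% p95=%dms max=%dms%n",
                w.successPercentage(), w.percentileMillis(95), w.maxMillis());
    }
}
```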


Worker Rebalance Metrics

Metric Name | Description | MBean attribute
rebalance success total | The total number of successful rebalances | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-success-total,worker=([-.\w]+)
rebalance success percentage | The average percentage of rebalances that succeeded | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-success-percentage,worker=([-.\w]+)
rebalance failure total | The total number of failed rebalances | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-failure-total,worker=([-.\w]+)
rebalance failure percentage | The average percentage of rebalances that failed | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-failure-percentage,worker=([-.\w]+)
rebalance maximum time | The maximum time spent rebalancing | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-max-time,worker=([-.\w]+)
rebalance 99th percentile time | The 99th percentile of the time spent rebalancing during the last window (one hour by default) | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-99p-time,worker=([-.\w]+)
rebalance 95th percentile time | The 95th percentile of the time spent rebalancing during the last window (one hour by default) | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-95p-time,worker=([-.\w]+)
rebalance 90th percentile time | The 90th percentile of the time spent rebalancing during the last window (one hour by default) | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-90p-time,worker=([-.\w]+)
rebalance 75th percentile time | The 75th percentile of the time spent rebalancing during the last window (one hour by default) | kafka.connect:type=connect-worker-rebalance-metrics,name=rebalance-75p-time,worker=([-.\w]+)
time since last rebalance | The time since the most recent rebalance | kafka.connect:type=connect-worker-rebalance-metrics,name=time-since-last-rebalance,worker=([-.\w]+)
task failure rate | The number of task failures per second in this worker | kafka.connect:type=connect-worker-rebalance-metrics,name=task-failure-rate,worker=([-.\w]+)
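The following Java sketch shows how a worker might derive the rebalance success percentage, maximum rebalance time, and time since the last rebalance from recorded rebalance attempts. It takes timestamps as arguments instead of reading a clock so the arithmetic is easy to follow. The class is hypothetical, and it assumes that "most recent rebalance" means the most recently completed attempt, whether or not it succeeded.

```java
// Sketch: tracks rebalance outcomes and "time since last rebalance" with caller-supplied timestamps.
final class RebalanceTracker {
    private long successes;
    private long failures;
    private long maxMillis;
    private long lastCompletedAtMillis = -1; // -1 means no rebalance has completed yet

    /** Records a rebalance that started at startMillis and ended at endMillis. */
    void record(boolean succeeded, long startMillis, long endMillis) {
        if (succeeded) {
            successes++;
        } else {
            failures++;
        }
        maxMillis = Math.max(maxMillis, endMillis - startMillis);
        lastCompletedAtMillis = endMillis;
    }

    double successPercentage() {
        long total = successes + failures;
        return total == 0 ? 0.0 : 100.0 * successes / total;
    }

    long maxMillis() {
        return maxMillis;
    }

    /** Milliseconds since the most recent rebalance ended, or -1 if none has. */
    long timeSinceLastRebalanceMillis(long nowMillis) {
        return lastCompletedAtMillis < 0 ? -1 : nowMillis - lastCompletedAtMillis;
    }

    public static void main(String[] args) {
        RebalanceTracker tracker = new RebalanceTracker();
        tracker.record(true, 1_000, 4_000);    // 3 s rebalance
        tracker.record(false, 10_000, 11_500); // 1.5 s, failed
        tracker.record(true, 20_000, 22_000);  // 2 s
        System.out.println(tracker.timeSinceLastRebalanceMillis(30_000));
    }
}
```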


Worker REST Metrics

Metric Name | Description | MBean attribute
REST request rate | The number of requests handled per second by the REST endpoints | kafka.connect:type=worker-rest-metrics,name=request-rate,worker=([-.\w]+)
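A request rate is a count divided by the length of the period it covers. The Java sketch below computes requests per second over a sliding window of per-request timestamps supplied by the caller. It is only an illustration of the arithmetic; the class is hypothetical, and Kafka's metrics library works from a fixed number of sampled sub-windows rather than keeping every timestamp.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: requests per second over a sliding window, with timestamps supplied by the caller.
final class SlidingWindowRate {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowRate(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    /** Call once per handled REST request. Timestamps must be non-decreasing. */
    void record(long nowMillis) {
        timestamps.addLast(nowMillis);
        evict(nowMillis);
    }

    /** Requests per second over the last windowMillis milliseconds. */
    double ratePerSecond(long nowMillis) {
        evict(nowMillis);
        return timestamps.size() * 1000.0 / windowMillis;
    }

    // Drops requests at or before nowMillis - windowMillis, which fall outside the window.
    private void evict(long nowMillis) {
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
    }

    public static void main(String[] args) {
        SlidingWindowRate rate = new SlidingWindowRate(10_000); // 10 s window
        for (long t = 0; t < 30; t++) {
            rate.record(t * 500); // one request every 0.5 s for 15 s
        }
        System.out.println(rate.ratePerSecond(14_500));
    }
}
```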



Proposed Changes

We will add the relevant metrics as specified in the Public Interfaces section.

Compatibility, Deprecation, and Migration Plan

We are only introducing new metrics, so there is no compatibility impact. The two existing worker metrics (assigned tasks and assigned connectors) will remain unchanged.

Rejected Alternatives

The following metrics were considered but were rejected:

