
Current state: Voting

Discussion thread: here

JIRA: here

PR: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).


The Connect framework provides a good set of metrics for monitoring its behavior. Currently, the Worker exposes a metric for the number of tasks it runs.

It is useful in many applications to measure the number of tasks on a Connector. For example, an administrator may want to know the average number of tasks per connector, which types of connectors create the most tasks, or whether some connectors are outliers that create too many or too few tasks. The existing per-Worker task metric is useful in its own right, but it addresses none of these needs.

Further, it is useful to know how many of a connector's tasks are running, paused, or failed.

Public Interfaces

We propose adding the following new metrics to the existing "connector-metrics" group in `ConnectMetricsRegistry`.

| MBean | Metric/Attribute Name | Description |
|---|---|---|
| `kafka.connect:type=connector-metrics,connector=([-.\w]+)` | connector-total-task-count | The number of tasks of the connector. |
| `kafka.connect:type=connector-metrics,connector=([-.\w]+)` | connector-running-task-count | The number of running tasks of the connector. |
| `kafka.connect:type=connector-metrics,connector=([-.\w]+)` | connector-paused-task-count | The number of paused tasks of the connector. |
| `kafka.connect:type=connector-metrics,connector=([-.\w]+)` | connector-failed-task-count | The number of failed tasks of the connector. |
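As a sketch of how a monitoring client might consume these metrics, the Java class below builds the MBean name from the pattern in the table and reads the four proposed attributes over a standard JMX connection. The class `ConnectorTaskCountReader` and its methods are hypothetical; the sketch assumes the metrics are registered under exactly the MBean name shown above and that each attribute value is numeric.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ConnectorTaskCountReader {
    // Connector names accepted by the MBean pattern in the table: ([-.\w]+)
    private static final Pattern CONNECTOR_NAME = Pattern.compile("[-.\\w]+");

    // Attribute names proposed by this KIP.
    static final List<String> ATTRIBUTES = List.of(
            "connector-total-task-count",
            "connector-running-task-count",
            "connector-paused-task-count",
            "connector-failed-task-count");

    // Builds the ObjectName of one connector's metrics MBean (assumed naming).
    static ObjectName objectNameFor(String connector) throws MalformedObjectNameException {
        if (!CONNECTOR_NAME.matcher(connector).matches()) {
            throw new IllegalArgumentException("Unexpected connector name: " + connector);
        }
        return new ObjectName("kafka.connect:type=connector-metrics,connector=" + connector);
    }

    // Reads the four task-count attributes through any JMX connection, local or remote.
    static Map<String, Number> readCounts(MBeanServerConnection conn, String connector) throws Exception {
        ObjectName name = objectNameFor(connector);
        Map<String, Number> counts = new LinkedHashMap<>();
        for (String attribute : ATTRIBUTES) {
            // Assumption: each metric value is exposed as a numeric JMX attribute.
            counts.put(attribute, (Number) conn.getAttribute(name, attribute));
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        ObjectName name = objectNameFor("my-jdbc-sink.v2");
        System.out.println(name.getKeyProperty("type") + " / " + name.getKeyProperty("connector"));
    }
}
```

Passing `ManagementFactory.getPlatformMBeanServer()` to `readCounts` reads from the local JVM; a remote worker can be reached through the connection returned by `JMXConnector.getMBeanServerConnection()`.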

Proposed Changes

The metrics above will be added. They will be calculated from the result of the `Herder::connectorStatus` method. To enable this, the `WorkerConnector` will be constructed with a `Herder` so that it has access to the task statuses.
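A minimal sketch of the counting step, assuming a simplified view of the task statuses that `Herder::connectorStatus` returns. The `TaskCounts` class, the `State` enum, and the `TaskStatus` record are hypothetical stand-ins, not Connect's actual types. In this sketch, tasks in any other state (for example, unassigned) count only toward the total.

```java
import java.util.List;

public class TaskCounts {
    // Hypothetical stand-in for the task states reported in a connector's status.
    enum State { UNASSIGNED, RUNNING, PAUSED, FAILED }

    // Hypothetical per-task status; the real status object carries more fields.
    record TaskStatus(int id, State state) {}

    final int total;
    final int running;
    final int paused;
    final int failed;

    TaskCounts(List<TaskStatus> tasks) {
        int r = 0, p = 0, f = 0;
        // Single pass over the task list; other states contribute only to the total.
        for (TaskStatus t : tasks) {
            switch (t.state()) {
                case RUNNING -> r++;
                case PAUSED -> p++;
                case FAILED -> f++;
                default -> { }
            }
        }
        this.total = tasks.size();
        this.running = r;
        this.paused = p;
        this.failed = f;
    }

    public static void main(String[] args) {
        TaskCounts c = new TaskCounts(List.of(
                new TaskStatus(0, State.RUNNING),
                new TaskStatus(1, State.RUNNING),
                new TaskStatus(2, State.FAILED),
                new TaskStatus(3, State.UNASSIGNED)));
        System.out.printf("total=%d running=%d paused=%d failed=%d%n",
                c.total, c.running, c.paused, c.failed);
    }
}
```

Computing all four values in one pass keeps the per-scrape cost linear in the number of tasks, on top of whatever the status lookup itself costs.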

Metric gathering should be efficient and cheap to compute. The new metrics will call `AbstractHerder::connectorStatus`, which synchronously returns values from the `KafkaStatusBackingStore` cache. The most expensive part of that method is sorting the task states (`taskStates`).

Compatibility, Deprecation, and Migration Plan

This KIP only adds new metrics. No existing metrics are changed or removed, so no deprecation or migration is needed.

Rejected Alternatives

Excluding the "connector-" prefix

The "connector-" prefix keeps the naming consistent with the existing metrics "connector-version", "connector-type", and "connector-status". Further, while not strictly necessary, it disambiguates these metrics from the Worker's "task-count" metric.

Rely on the REST API

While the Connect REST API does report the number of tasks for a connector, it is not useful in Standalone mode. Further, exposing these counts through the common metrics interface lets downstream tools and pipelines collect all metrics from a single source.