Status
Current state: Voting
Discussion thread: here
Voting thread: here
JIRA: here
PR: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
The Connect framework provides good metrics for monitoring behavior. Currently, we have a metric on the Worker that measures the number of tasks.
It is useful in many applications to measure the number of tasks on a Connector. For example, an administrator may want to know the average number of tasks per connector, which types of connectors create the most tasks, and whether some connectors are outliers (creating too many or too few tasks). The current metric providing tasks per Worker is useful in its own right, but does not address any of these needs.
Further, it is useful to know the breakdown of how many tasks on a connector are running, paused, or failed.
Public Interfaces
We propose adding the following new metrics to the existing "connector-metrics" group in the ConnectMetricsRegistry.
MBean | Metric/Attribute Name | Description |
---|---|---|
kafka.connect:type=connector-metrics,connector=([-.\w]+) | connector-total-task-count | The number of tasks of the connector. |
kafka.connect:type=connector-metrics,connector=([-.\w]+) | connector-running-task-count | The number of running tasks of the connector. |
kafka.connect:type=connector-metrics,connector=([-.\w]+) | connector-paused-task-count | The number of paused tasks of the connector. |
kafka.connect:type=connector-metrics,connector=([-.\w]+) | connector-failed-task-count | The number of failed tasks of the connector. |
kafka.connect:type=connector-metrics,connector=([-.\w]+) | connector-unassigned-task-count | The number of unassigned tasks of the connector. |
kafka.connect:type=connector-metrics,connector=([-.\w]+) | connector-destroyed-task-count | The number of destroyed tasks of the connector. |
Since each task always has exactly one non-null status, and the metrics above cover every task status, "connector-total-task-count" will equal the sum of the per-status counts.
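As an illustration of how a monitoring client might read one of these attributes over JMX, here is a minimal sketch. The class name `ConnectorTaskCountReader`, its helper methods, and the connector name `my-connector` are hypothetical. It also assumes the connector name matches the `[-.\w]+` pattern in the table, so the name needs no JMX quoting.

```java
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ConnectorTaskCountReader {

    /**
     * Builds the MBean name from the table above for a single connector.
     * Assumes the connector name contains only characters matching [-.\w],
     * so it can be embedded without quoting.
     */
    public static ObjectName connectorMetricsName(String connector) {
        try {
            return new ObjectName(
                "kafka.connect:type=connector-metrics,connector=" + connector);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid connector name: " + connector, e);
        }
    }

    /**
     * Reads one attribute, e.g. "connector-total-task-count", from the given
     * MBean server connection (local or remote). Throws if the MBean or the
     * attribute is not registered.
     */
    public static Object readAttribute(MBeanServerConnection server,
                                       String connector,
                                       String attribute) throws Exception {
        return server.getAttribute(connectorMetricsName(connector), attribute);
    }

    public static void main(String[] args) {
        // "my-connector" is a made-up connector name used only for illustration.
        System.out.println(connectorMetricsName("my-connector"));
    }
}
```

Because `readAttribute` takes an `MBeanServerConnection`, the same helper works against the platform MBean server inside the worker JVM or against a remote connection obtained through a JMX connector.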
Proposed Changes
The above metrics will be added. These metrics will be calculated from the `Herder::connectorStatus` method. To enable this change, the `WorkerConnector` will be constructed with a `Herder` so that it has access to the task statuses.
Metrics gathering should be efficient and cheap to calculate. The newly proposed metrics will call "AbstractHerder::connectorStatus", which synchronously returns the values from the KafkaStatusBackingStore's cache. The most expensive part of that call is sorting the taskStates.
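The per-status calculation can be sketched as a single linear pass over a connector's task states. The example below is not the proposed implementation: the `TaskState` enum is a hypothetical stand-in for the statuses named by the metrics above, and the input list stands in for the task states returned by the status lookup. It also shows that the total equals the sum of the per-status counts.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class ConnectorTaskCounts {

    /** Hypothetical stand-in for the task statuses covered by the proposed metrics. */
    public enum TaskState { RUNNING, PAUSED, FAILED, UNASSIGNED, DESTROYED }

    /** Counts tasks per state in one pass; every state is present, defaulting to zero. */
    public static Map<TaskState, Integer> countByState(List<TaskState> taskStates) {
        Map<TaskState, Integer> counts = new EnumMap<>(TaskState.class);
        for (TaskState state : TaskState.values()) {
            counts.put(state, 0);
        }
        for (TaskState state : taskStates) {
            counts.merge(state, 1, Integer::sum);
        }
        return counts;
    }

    /** The total task count is the sum of the per-state counts. */
    public static int total(Map<TaskState, Integer> counts) {
        int sum = 0;
        for (int count : counts.values()) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up task states for a single connector.
        List<TaskState> states = List.of(
            TaskState.RUNNING, TaskState.RUNNING, TaskState.PAUSED, TaskState.FAILED);
        Map<TaskState, Integer> counts = countByState(states);
        System.out.println(counts + " total=" + total(counts));
    }
}
```

Counting this way needs no sorting, so the cost of each metric read in this sketch is proportional to the number of tasks on the connector.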
Compatibility, Deprecation, and Migration Plan
This KIP only adds new metrics.
Rejected Alternatives
Excluding the "connector-" prefix
The "connector-" prefix keeps naming consistent with the existing metric names "connector-version", "connector-type", and "connector-status". Further, while not strictly necessary, it disambiguates these metrics from the Worker's "task-count" metric.
Rely on the REST API
While the Connect REST API does provide the number of tasks for a connector, it is not useful in Standalone Mode. Further, exposing this number through the common metrics interface enables downstream interfaces and pipelines to gather metrics from a single source.