Comment: Clarify effect on quotas, as recommended by Arjun Satish


Status

Current state: Accepted

Discussion thread: here

Voting thread: here

JIRA: KAFKA-5061

Released: 2.3.0

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

It is currently not possible to monitor producer and consumer metrics for individual tasks in Kafka Connect due to JMX MBean naming conflicts. Standard Kafka producer and consumer clients use client.id in metric names to disambiguate JMX MBeans when multiple instances are running in the same JVM. In Kafka Connect, all producer and consumer instances created by a Worker inherit the same client ID from the Worker properties file. Since it is not possible to set a different client ID for each task, any Connector with more than one task running on a Worker node will generate JMX MBean naming conflicts. This makes it impossible to collect per-task Kafka client metrics across a Kafka Connect cluster.

Public Interfaces

Propose updating the default client.id used by Kafka clients created by Kafka Connect tasks. The new default will include the current connector and task id.

Proposed Changes

PR Available here: https://github.com/apache/kafka/pull/6097


Consumers created for sink tasks will have a default client.id of the form:


 connector-consumer-{connectorId}-{taskId}

   e.g. for connector "conn1", task "2", the default client ID would become: connector-consumer-conn1-2

Producers created for source tasks will have a default client.id of the form:

  connector-producer-{connectorId}-{taskId}

Dead-letter queue producers created for sink tasks will have a default client.id of the form:

  connector-dlq-producer-{connectorId}-{taskId}
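
The three formats above can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not taken from the Kafka Connect code base; only the ID patterns themselves come from this proposal.

```java
public class DefaultClientIds {
    // Hypothetical helpers: only the string formats below come from the proposal.

    // Default client.id for the consumer of a sink task.
    public static String sinkConsumerClientId(String connectorId, int taskId) {
        return "connector-consumer-" + connectorId + "-" + taskId;
    }

    // Default client.id for the producer of a source task.
    public static String sourceProducerClientId(String connectorId, int taskId) {
        return "connector-producer-" + connectorId + "-" + taskId;
    }

    // Default client.id for the dead-letter queue producer of a sink task.
    public static String dlqProducerClientId(String connectorId, int taskId) {
        return "connector-dlq-producer-" + connectorId + "-" + taskId;
    }

    public static void main(String[] args) {
        // Connector "conn1", task 2, as in the example above.
        System.out.println(sinkConsumerClientId("conn1", 2));   // connector-consumer-conn1-2
        System.out.println(sourceProducerClientId("conn1", 2)); // connector-producer-conn1-2
        System.out.println(dlqProducerClientId("conn1", 2));    // connector-dlq-producer-conn1-2
    }
}
```

Because each ID embeds both the connector and the task, two tasks of the same connector running in one JVM no longer register MBeans under the same name.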

Compatibility, Deprecation, and Migration Plan

The change will affect any existing cluster where client.id has not been overridden in the worker configuration. Since the current default is not useful for JMX monitoring, the change should have minimal impact.

Any client IDs specified in the worker configuration via producer.client.id or consumer.client.id properties will remain unchanged, as those will take precedence.
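
The precedence rule can be sketched as follows. This is an illustration only, not the actual Worker implementation: the class name, method name, and the way the "consumer." prefix is stripped are assumptions made for the example. The point it shows is that the generated default is set first and any explicit consumer.client.id from the worker configuration then overwrites it.

```java
import java.util.HashMap;
import java.util.Map;

public class ClientIdPrecedence {
    // Hypothetical helper: builds the consumer config for one sink task.
    public static Map<String, String> consumerConfig(Map<String, String> workerProps,
                                                     String connectorId, int taskId) {
        Map<String, String> config = new HashMap<>();
        // Start with the new per-task default.
        config.put("client.id", "connector-consumer-" + connectorId + "-" + taskId);
        // Copy worker properties carrying the "consumer." prefix, with the prefix removed.
        // An explicit consumer.client.id therefore replaces the default set above.
        String prefix = "consumer.";
        for (Map.Entry<String, String> e : workerProps.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                config.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> worker = new HashMap<>();
        worker.put("consumer.client.id", "my-fixed-id"); // hypothetical override value
        System.out.println(consumerConfig(worker, "conn1", 2).get("client.id"));
        System.out.println(consumerConfig(new HashMap<>(), "conn1", 2).get("client.id"));
    }
}
```

With the override present the first line printed is my-fixed-id; without it the per-task default connector-consumer-conn1-2 is used.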

While this change will not affect quota limits, in some cases it could have an indirect impact on resource usage by a Connector. For example, a system that was enforcing quotas using the default "consumer-[id]" client ids will need to update its configuration to enforce quotas on "connector-consumer-{connectorId}-{taskId}" instead. Note that enforcing quotas on default client ids in this way is not recommended, before or after this change. For systems that were not enforcing any quota limits on client ids, or that were using default quotas, no change is expected.

Rejected Alternatives


Add a worker configuration option (unique.client.id, type BOOLEAN, default false) to automatically append the task ID to the client ID used by producer or consumer instances instantiated by a Worker. This ensures all JMX MBean names used within a Kafka Connect cluster are distinct. The default value is false, which keeps existing behavior unchanged.

See PR here for proposed implementation:  https://github.com/apache/kafka/pull/5775

This approach was rejected because adding new configuration options is considered a larger change to public interfaces than necessary.

Several options were proposed in https://issues.apache.org/jira/browse/KAFKA-5061:

"Provide default client IDs based on the worker group ID + task ID (providing uniqueness for multiple connect clusters up to the scope of the Kafka cluster they are operating on)"

This option avoids a configuration change, but does not maintain backward compatibility and will alter metric names in existing clusters that have not explicitly overridden the client id in configuration. Also, since the default group ID and task ID both include the connector name, the connector name would appear twice in the default client id. NOTE: This would be the simplest option if backwards compatibility is not a serious concern. A possible implementation is offered in this PR:  https://github.com/apache/kafka/pull/6097

"Allow overriding client.id on a per-connector basis"

This would not allow for per-task monitoring, as all tasks created by a connector would share the same client id. A related option is to offer connectors more direct control of the client id, in configuration or code. This option does not give the required level of granularity: individual tasks would still have name conflicts with other tasks created by the same connector. This could be avoided if the connector could control the client id at the task level, but that would require a much more complex change.