Status

Current state: Adopted

Discussion thread: here

Vote Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

In order to allow Kafka Connect to keep track of the set of topics that a specific connector has used during its lifetime, the following public-facing changes are proposed. These changes do not modify any public interfaces or classes of the Kafka Connect framework, and existing connector code does not need to be rebuilt in order to use this feature.

Storing and returning names of topics that are actively used by connectors

...

Connect's internal status topic (set by the configuration property status.storage.topic) already meets the requirements for a topic that stores information related to topic observability. Specifically:

  • It's a compacted topic.
  • It's a partitioned topic (i.e. it has more than one partition). Order is retained within a single partition, but global order is not required.
  • It's a topic that is fully read by every worker during startup.
  • Every worker listens for updates and is able to read new records written to this topic.
  • Records are keyed.
  • Older workers can skip records in this topic that they don't understand.
  • JSON encoding is used for keys and values.

...

Key format: status-topic-${topic-name}:connector-${connector-name}

Key example: status-topic-foo:connector-some-source

Value format:

{
  "topic": {
    "name": string,
    "connector": string,
    "task": int32,
    "discoverTimestamp": int64
  }
}

Value example:

{
  "topic": {
    "name": "foo",
    "connector": "some-source",
    "task": 0,
    "discoverTimestamp": 1579297899
  }
}

The topic name can be safely separated from the connector name because the delimiter : (colon character) is not a valid character in Kafka topic names.
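A minimal sketch of building and parsing these keys, assuming only the key format described in this section. The class TopicStatusKey and its methods are hypothetical illustrations, not Kafka Connect's internal API. Because topic names cannot contain a colon, the parser can split at the first colon even if the connector name part happened to contain one.

```java
/** Hypothetical helper for keys of the form status-topic-${topic-name}:connector-${connector-name}. */
final class TopicStatusKey {
    private static final String TOPIC_PREFIX = "status-topic-";
    private static final String CONNECTOR_PREFIX = "connector-";

    /** Builds the status key for a (topic, connector) pair. */
    static String build(String topic, String connector) {
        if (topic.indexOf(':') >= 0) {
            throw new IllegalArgumentException("Kafka topic names cannot contain ':'");
        }
        return TOPIC_PREFIX + topic + ":" + CONNECTOR_PREFIX + connector;
    }

    /** Returns {topic, connector}. The first ':' is the delimiter because topic names never contain one. */
    static String[] parse(String key) {
        if (!key.startsWith(TOPIC_PREFIX)) {
            throw new IllegalArgumentException("Not a topic status key: " + key);
        }
        int colon = key.indexOf(':');
        if (colon < 0 || !key.startsWith(CONNECTOR_PREFIX, colon + 1)) {
            throw new IllegalArgumentException("Malformed topic status key: " + key);
        }
        String topic = key.substring(TOPIC_PREFIX.length(), colon);
        String connector = key.substring(colon + 1 + CONNECTOR_PREFIX.length());
        return new String[] {topic, connector};
    }

    public static void main(String[] args) {
        String key = build("foo", "some-source");
        System.out.println(key); // status-topic-foo:connector-some-source
        String[] parts = parse(key);
        System.out.println(parts[0] + " / " + parts[1]); // foo / some-source
    }
}
```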

...

The value of the Kafka record includes the topic name, the connector name, the ID of the task that last reported that the topic is used by the connector, and a timestamp of when the topic was detected as active. The record value is not essential for deciding whether a topic is used by a connector, and it partially duplicates the key, but it makes these entries easier to read and more useful at runtime or when troubleshooting the topics used by a connector. The task ID identifies the task that last succeeded in storing a topic status record, in case more than one task produces such a record concurrently for a short period of time. In the future, the record value can easily be extended to include additional information.
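To show the field layout of the record value, here is a hand-rolled serializer sketch. It is an illustration only, not how the Connect worker actually serializes status records; TopicStatusValue is a hypothetical name, and the minimal escaping covers quotes, backslashes and control characters.

```java
/** Hypothetical serializer that mirrors the topic status value layout. */
final class TopicStatusValue {
    /** Minimal JSON string quoting: escapes quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Produces {"topic":{"name":...,"connector":...,"task":...,"discoverTimestamp":...}}. */
    static String toJson(String topic, String connector, int task, long discoverTimestamp) {
        return "{\"topic\":{"
                + "\"name\":" + quote(topic) + ","
                + "\"connector\":" + quote(connector) + ","
                + "\"task\":" + task + ","
                + "\"discoverTimestamp\":" + discoverTimestamp
                + "}}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("foo", "some-source", 0, 1579297899L));
        // {"topic":{"name":"foo","connector":"some-source","task":0,"discoverTimestamp":1579297899}}
    }
}
```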

Compared to the existing keys in the status topic, which have the prefixes status-connector- and status-task-, the new key format extends the set of status topic record keys in a readable and intuitive way by adding the prefix status-topic- to the keys of the new Kafka records.
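Since none of the three prefixes is a prefix of another, every status record key matches at most one of them. The hypothetical sketch below classifies keys by prefix and models a worker without topic tracking that handles only connector and task records, which is one way an older worker could skip the new status-topic- records. The connector and task key suffixes in the example are illustrative.

```java
import java.util.List;

/** Hypothetical classifier for status topic record keys. */
final class StatusKeyClassifier {
    enum Kind { CONNECTOR, TASK, TOPIC, UNKNOWN }

    static Kind classify(String key) {
        if (key.startsWith("status-connector-")) {
            return Kind.CONNECTOR;
        }
        if (key.startsWith("status-task-")) {
            return Kind.TASK;
        }
        if (key.startsWith("status-topic-")) {
            return Kind.TOPIC;
        }
        return Kind.UNKNOWN;
    }

    /** Models a worker that predates topic tracking: it handles connector and task records and skips the rest. */
    static boolean handledByOlderWorker(String key) {
        Kind kind = classify(key);
        return kind == Kind.CONNECTOR || kind == Kind.TASK;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "status-connector-some-source",
                "status-task-some-source-0",
                "status-topic-foo:connector-some-source");
        for (String key : keys) {
            System.out.println(key + " -> " + classify(key)
                    + ", older worker handles: " + handledByOlderWorker(key));
        }
    }
}
```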

Recording active topics

...

Multiple Connect workers may compete to append records with the same key to the status.storage.topic. Because the topic is compacted, all the Kafka records with a specific key will eventually collapse into a single entry. As soon as a worker detects the addition of a topic to a connector's set of active topics, the worker will not post additional update records for the connector and this newly-detected active topic.
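The "post only once per detected topic" rule can be sketched as a per-worker cache of known (topic, connector) pairs. This is a hypothetical model, not Connect's implementation: ActiveTopicTracker and the statusWriter hook are assumed names. A pair is marked known either when this worker writes it or when it reads a record written by another worker, and a tombstone (for example from a reset) forgets it so the topic is reported again on its next use. Duplicates that still slip through when two workers race are harmless because compaction collapses them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical per-worker deduplication of topic status records. */
final class ActiveTopicTracker {
    // (topic, connector) pairs this worker knows are already recorded in the status topic.
    private final Set<String> recorded = ConcurrentHashMap.newKeySet();
    // Hypothetical hook that appends a topic status record for (connector, topic).
    private final BiConsumer<String, String> statusWriter;

    ActiveTopicTracker(BiConsumer<String, String> statusWriter) {
        this.statusWriter = statusWriter;
    }

    // Topic names cannot contain ':', so "topic:connector" is unambiguous.
    private static String key(String connector, String topic) {
        return topic + ":" + connector;
    }

    /** Called when a task uses a topic; writes a record only for unknown pairs. Returns true if it wrote one. */
    boolean recordActiveTopic(String connector, String topic) {
        if (recorded.add(key(connector, topic))) {
            statusWriter.accept(connector, topic);
            return true;
        }
        return false;
    }

    /** Called when this worker reads a topic status record, possibly written by another worker. */
    void onTopicStatusRead(String connector, String topic) {
        recorded.add(key(connector, topic));
    }

    /** Called when this worker reads a tombstone for the pair, e.g. after a reset. */
    void onTombstoneRead(String connector, String topic) {
        recorded.remove(key(connector, topic));
    }

    public static void main(String[] args) {
        List<String> writes = new ArrayList<>();
        ActiveTopicTracker tracker = new ActiveTopicTracker((c, t) -> writes.add(c + "/" + t));
        tracker.recordActiveTopic("some-source", "foo");
        tracker.recordActiveTopic("some-source", "foo"); // already known: no second write
        System.out.println(writes); // [some-source/foo]
    }
}
```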

Resetting a connector's set of active topics

...

When a Connect worker receives this request, it sends a tombstone record for each topic in the connector's set of active topics. This operation is not atomic with respect to the whole set of topics. Kafka provides atomicity only at the record level: each record (e.g. a tombstone record or a topic status record) is appended to the log atomically, but no ordering is guaranteed between two such records produced independently. Therefore, if a request to reset a connector's set of active topics is interleaved with the actual production or consumption of records by the connector's tasks, there is no happens-before relationship between the reset (production of a tombstone record) and the recording (production of a non-tombstone record) actions.
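To make the missing happens-before relationship concrete, the hypothetical model below replays two possible interleavings of a reset's tombstone and a task's status record for the same key. It assumes simplified compaction semantics (the latest record per key wins and a null value, i.e. a tombstone, removes the key); CompactedTopicModel is an illustration, not Kafka's log cleaner.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the end state of a compacted topic. */
final class CompactedTopicModel {
    static final class Entry {
        final String key;
        final String value; // null represents a tombstone

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Replays the log in order, keeping the latest value per key and dropping tombstoned keys. */
    static Map<String, String> compact(List<Entry> log) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Entry e : log) {
            if (e.value == null) {
                state.remove(e.key);
            } else {
                state.put(e.key, e.value);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        String key = "status-topic-foo:connector-some-source";
        String status = "{\"topic\":{\"name\":\"foo\",\"connector\":\"some-source\"}}";
        // The reset's tombstone lands first, then the task reports the topic: it stays active.
        List<Entry> resetFirst = List.of(new Entry(key, null), new Entry(key, status));
        // The task's record lands first, then the tombstone: the topic is gone until reported again.
        List<Entry> recordFirst = List.of(new Entry(key, status), new Entry(key, null));
        System.out.println(compact(resetFirst).containsKey(key));  // true
        System.out.println(compact(recordFirst).containsKey(key)); // false
    }
}
```

In the second interleaving the topic disappears only until a task uses it again and reports it, which is why a reset during normal execution does not lose actively used topics for long.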

Resetting the set of active topics of a connector while this connector is running is fine, as long as the intention is to reset any topics that are no longer used by the connector and retain the ones that are active. Topic reset composes with a connector's normal execution: soon after the reset, the connector's tasks will repopulate the status.storage.topic with new topic status records for the topics that the connector is currently using.

Restarting, reconfiguring or deleting a connector

Just restarting a connector (without altering the configuration) has no effect on the recorded set of active topics for that connector.

Reconfiguring a source connector also has no effect on the recorded set of active topics. Reconfiguring a sink connector, on the other hand, may change the consumed topics, and any topics no longer consumed will be removed from the set of active topics for this sink connector by appending tombstone records after the connector is reconfigured.
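For the sink case, the topics to tombstone are the currently active topics that the new configuration no longer consumes. The sketch below is hypothetical (SinkReconfigTombstones is an assumed name) and expresses the new subscription as a predicate, so that it covers both an explicit topics list and a topics.regex pattern; it returns status keys in the status-topic-${topic-name}:connector-${connector-name} format.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical computation of the tombstones needed after a sink connector is reconfigured. */
final class SinkReconfigTombstones {
    /** Returns status keys for active topics that the new subscription (stillConsumed) no longer matches. */
    static List<String> tombstoneKeys(String connector, Set<String> activeTopics, Predicate<String> stillConsumed) {
        List<String> keys = new ArrayList<>();
        for (String topic : activeTopics) {
            if (!stillConsumed.test(topic)) {
                keys.add("status-topic-" + topic + ":connector-" + connector);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Set<String> active = new LinkedHashSet<>(List.of("foo", "bar", "baz"));
        // New configuration with an explicit topic list: topics=foo
        Set<String> newTopics = Set.of("foo");
        System.out.println(tombstoneKeys("some-sink", active, newTopics::contains));
        // [status-topic-bar:connector-some-sink, status-topic-baz:connector-some-sink]
        // New configuration with a pattern: topics.regex=ba.*
        System.out.println(tombstoneKeys("some-sink", active, Pattern.compile("ba.*").asMatchPredicate()));
        // [status-topic-foo:connector-some-sink]
    }
}
```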

Deleting a connector resets that connector's set of active topics even when topic tracking reset has been disabled (topic.tracking.allow.reset=false). Whether the reset of the connector's topic history succeeds does depend on whether the connector was deleted gracefully. A partial reset can be followed by another attempt to reset the topic tracking for that connector.

Extensions to Connect REST API

...

Get the set of active topics from a connector called 'some-source':
$ curl -s 'http://localhost:8083/connectors/some-source/topics' | jq
{
  "some-source": {
    "topics": [
      "foo",
      "bar",
      "baz"
    ]
  }
}
$
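The same GET request can be issued from Java with the JDK's java.net.http.HttpClient (Java 11+). This is a minimal sketch: ActiveTopicsClient is a hypothetical name, the worker address localhost:8083 is taken from the curl example, and treating anything other than HTTP 200 as an error is an assumption. The multi-argument java.net.URI constructor percent-encodes characters that are not legal in a path (such as spaces), but it does not encode a '/' inside a connector name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client for GET /connectors/{name}/topics. */
final class ActiveTopicsClient {
    /** Builds http://host:port/connectors/{name}/topics, quoting characters that are illegal in a path. */
    static URI topicsUri(String host, int port, String connector) {
        try {
            return new URI("http", null, host, port, "/connectors/" + connector + "/topics", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid connector URI", e);
        }
    }

    /** Sends the GET request and returns the raw JSON body. */
    static String fetchActiveTopics(HttpClient client, String host, int port, String connector)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(topicsUri(host, port, connector))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            System.out.println(fetchActiveTopics(HttpClient.newHttpClient(), "localhost", 8083, "some-source"));
        } catch (IOException e) {
            System.err.println("Could not fetch active topics: " + e.getMessage());
        }
    }
}
```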

...

Successful reset of the set of active topics of a connector called 'some-source':
$ curl -X PUT -s 'http://localhost:8083/connectors/some-source/topics/reset' | jq
$

...

Attempt to reset the set of active topics when reset is not allowed:
$ curl -X PUT -s 'http://localhost:8083/connectors/some-source/topics/reset' | jq
{
  "error_code": 403,
  "message": "Topic tracking reset is disabled"
}
$
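A matching sketch for the reset request, again using java.net.http.HttpClient. ResetActiveTopicsClient and its Outcome enum are hypothetical. Treating every 2xx status as a successful reset is an assumption, since the examples only show an empty body on success; 403 is mapped to "disabled" to match the error response in the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client for PUT /connectors/{name}/topics/reset. */
final class ResetActiveTopicsClient {
    enum Outcome { RESET, DISABLED, FAILED }

    static URI resetUri(String host, int port, String connector) {
        try {
            return new URI("http", null, host, port, "/connectors/" + connector + "/topics/reset", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid connector URI", e);
        }
    }

    /** Maps the HTTP status code to an outcome (2xx as success is an assumption). */
    static Outcome classify(int statusCode) {
        if (statusCode >= 200 && statusCode < 300) {
            return Outcome.RESET;
        }
        if (statusCode == 403) {
            return Outcome.DISABLED; // "Topic tracking reset is disabled"
        }
        return Outcome.FAILED;
    }

    /** Sends a bodiless PUT and classifies the response. */
    static Outcome reset(HttpClient client, String host, int port, String connector)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(resetUri(host, port, connector))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return classify(response.statusCode());
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            System.out.println(reset(HttpClient.newHttpClient(), "localhost", 8083, "some-source"));
        } catch (IOException e) {
            System.err.println("Could not reach the Connect worker: " + e.getMessage());
        }
    }
}
```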

...


Name: topic.tracking.enable
Type: boolean
Default: true
Possible values: true, false
Description: Whether the Connect worker will track and persist which topics are actively used per connector. It's highly recommended to set the same value in all the workers of a Connect cluster.

Name: topic.tracking.allow.reset
Type: boolean
Default: true
Possible values: true, false
Description: Whether to allow requests to reset the set of active topics for specific connectors.
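As a sketch of how these two worker properties and their defaults might be read, the hypothetical TopicTrackingConfig below loads them from a java.util.Properties object. It parses strictly (case-insensitive "true" or "false", anything else rejected); this is an illustrative choice, not a description of how the worker validates its configuration.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical reader for the topic tracking worker properties. */
final class TopicTrackingConfig {
    final boolean trackingEnabled;
    final boolean resetAllowed;

    TopicTrackingConfig(Properties workerProps) {
        // Both properties default to true.
        this.trackingEnabled = parseBoolean(workerProps, "topic.tracking.enable", true);
        this.resetAllowed = parseBoolean(workerProps, "topic.tracking.allow.reset", true);
    }

    /** Accepts "true"/"false" (case-insensitive, trimmed); returns the default when the property is absent. */
    static boolean parseBoolean(Properties props, String name, boolean defaultValue) {
        String raw = props.getProperty(name);
        if (raw == null) {
            return defaultValue;
        }
        String value = raw.trim().toLowerCase(Locale.ROOT);
        if (value.equals("true")) {
            return true;
        }
        if (value.equals("false")) {
            return false;
        }
        throw new IllegalArgumentException(name + " must be true or false, but was: " + raw);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("topic.tracking.allow.reset", "false");
        TopicTrackingConfig config = new TopicTrackingConfig(props);
        System.out.println(config.trackingEnabled + " " + config.resetAllowed); // true false
    }
}
```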

Security

This feature enables a user or application to find out the topic names that are used by a connector. With respect to security, this feature inherits the security characteristics that similar functionality has in Kafka Connect at the moment. Specifically: 

  • A user that has the ability to query the status of, create, or reconfigure a connector via the Connect REST API will be able to get the set of topic names that a connector uses. If access to specific endpoints is restricted for certain users, Connect cluster administrators should consider restricting access to the new endpoints in a similar way.
  • The topic names that a connector uses are persisted in the status.storage.topic, which is an existing internal topic of Kafka Connect. Administrators should restrict access to the sets of active topics per connector in the same way that they currently restrict access to the configuration and the status of connectors in their Connect clusters.

Given the above, the implementation of this KIP does not require extra steps to secure access to the sets of active topic names that connectors are using.

Compatibility, Deprecation, and Migration Plan

...