...
Another use case for creating connectors in a stopped (or paused) state could be deploying connectors as a part of a larger data pipeline before the source / sink data system has been created or is ready for data transfer.
Public Interfaces
"POST /connectors" REST API endpoint
A new optional field "initial_state" will be added to the request body format for the POST /connectors endpoint (the existing format can be seen in Kafka Connect's current OpenAPI specification). The new optional field "initial_state" can have a value of "RUNNING", "PAUSED" or "STOPPED" (case-insensitive). If the value of the "initial_state" field is invalid, a 400 Bad Request response will be returned. If the field is omitted in the request body, the connector will be created in the RUNNING state by default (preserving the existing behavior). An example request body would look like:
...
Note that when a connector is created in a PAUSED or STOPPED state, no tasks will be spawned for the connector until it is resumed via the PUT /connectors/{connector}/resume endpoint. This method of creating connectors can currently be used in both distributed mode and standalone mode.
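The validation rules for "initial_state" described above can be sketched as follows. This is a minimal illustration, not the Connect implementation: the `parse_initial_state` function, the `BadRequest` exception (standing in for the 400 Bad Request response), and the connector name and configuration in the sample body are all hypothetical. Treating a non-string value as invalid is also an assumption.

```python
import json

VALID_INITIAL_STATES = ("RUNNING", "PAUSED", "STOPPED")


class BadRequest(ValueError):
    """Hypothetical stand-in for a 400 Bad Request response."""


def parse_initial_state(request_body: dict) -> str:
    """Return the normalized initial state for a create-connector request."""
    # Field omitted: keep the existing behavior and default to RUNNING.
    if "initial_state" not in request_body:
        return "RUNNING"
    raw = request_body["initial_state"]
    # Assumption: anything that is not a recognized string is rejected.
    if not isinstance(raw, str) or raw.upper() not in VALID_INITIAL_STATES:
        raise BadRequest(f"Invalid initial_state: {raw!r}")
    # Matching is case-insensitive, so normalize to upper case.
    return raw.upper()


# Hypothetical request body; the name and config values are placeholders.
request_body = json.loads("""
{
  "name": "example-connector",
  "config": {
    "connector.class": "com.example.ExampleSourceConnector",
    "tasks.max": "1"
  },
  "initial_state": "stopped"
}
""")

initial_state = parse_initial_state(request_body)
```

With the sample body above, the lower-case "stopped" is accepted and normalized, while a body without the field falls back to RUNNING.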
Standalone mode CLI
Kafka Connect in standalone mode is currently started using a command line like:
    bin/connect-standalone.sh config/connect-standalone.properties [connector1.properties connector2.properties ...]
where the connector properties are Java properties files containing String key-value connector configurations.
The CLI will be updated to also accept JSON files containing the connector configurations (i.e. the optional additional CLI arguments can be either Java properties files or JSON files). The JSON files could be -
- A simple JSON object containing only string key-value pairs representing the connector configuration (i.e. same as the existing request body format for the PUT /connectors/{connector}/config endpoint - see current OpenAPI specification).
OR
- A JSON object containing the "name" string, "config" object and (optionally) "initial_state" of the connector (i.e. same as the request body format for the POST /connectors endpoint)
This change offers two benefits. First, users will be able to copy and reuse examples across both methods of connector creation (REST API requests with JSON request bodies in standalone / distributed mode and JSON files passed to the standalone mode startup CLI). Second, any future extensions can be applied consistently across both methods.
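To illustrate the two accepted JSON shapes, here is a rough sketch of how a file's contents could be normalized into a single form. The KIP text does not say how the two shapes are told apart, so the rule used here (a nested "config" object marks the POST /connectors shape) is an assumption, as are the `load_connector_file` function name and the choice to read the connector name from the "name" key of a flat configuration.

```python
import json


def load_connector_file(text: str) -> dict:
    """Normalize either JSON shape to a dict with name, config, initial_state."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("connector file must contain a JSON object")
    if not isinstance(obj.get("name"), str):
        raise ValueError("connector file must contain a string 'name'")

    # Assumed heuristic: a nested "config" object means the file uses the
    # same shape as the POST /connectors request body.
    if isinstance(obj.get("config"), dict):
        return {
            "name": obj["name"],
            "config": obj["config"],
            "initial_state": obj.get("initial_state", "RUNNING"),
        }

    # Otherwise treat the object as a flat connector configuration,
    # which may only contain string values.
    if not all(isinstance(value, str) for value in obj.values()):
        raise ValueError("flat connector config must contain only string values")
    return {"name": obj["name"], "config": obj, "initial_state": "RUNNING"}


# Hypothetical file contents; connector names and classes are placeholders.
flat = load_connector_file(
    '{"name": "example", "connector.class": "com.example.ExampleSink"}'
)
wrapped = load_connector_file(
    '{"name": "example", "config": {"tasks.max": "1"}, "initial_state": "PAUSED"}'
)
```

A flat file can only yield a RUNNING connector in this sketch, since only the second shape has a place for "initial_state".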
Proposed Changes
Background
...
The new field being added to the request body format for the POST /connectors endpoint is optional and, if omitted, the default behavior mirrors the existing behavior (i.e. the connector will be created in a RUNNING state). Thus, there aren't any backward compatibility issues introduced in this KIP. The target state written to the config topic will use the convention established in KIP-875 ("state" and "state.v2") in order to facilitate cluster downgrades and rolling upgrades (older Connect workers may not recognize the STOPPED state).
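The "state" / "state.v2" convention can be pictured with the sketch below. It assumes, based on our reading of KIP-875 rather than anything stated in this section, that the legacy "state" field carries PAUSED whenever the real target state is STOPPED, so that older workers which only read "state" still see a value they understand. The `target_state_record` function is hypothetical and only models the record's value, not how it is written to the config topic.

```python
def target_state_record(target_state: str) -> dict:
    """Build the value of a target state record for the config topic."""
    if target_state not in ("RUNNING", "PAUSED", "STOPPED"):
        raise ValueError(f"unknown target state: {target_state!r}")
    # Assumption: older workers do not know STOPPED, so the legacy field
    # falls back to PAUSED while "state.v2" keeps the real value.
    legacy_state = "PAUSED" if target_state == "STOPPED" else target_state
    return {"state": legacy_state, "state.v2": target_state}


stopped_record = target_state_record("STOPPED")
running_record = target_state_record("RUNNING")
```

Under this assumption, a connector created in the STOPPED state would appear as PAUSED to a worker that predates the STOPPED state.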
Test Plan
Unit tests, integration tests or system tests (whichever is deemed most appropriate) will be added for these cases -
- Can create a connector in the PAUSED state in standalone mode
- Can create a connector in the STOPPED state in standalone mode
- Can create a connector in the PAUSED state in distributed mode
- Can create a connector in the STOPPED state in distributed mode
- No tasks are spawned for a connector created in the PAUSED or STOPPED state
- Creating a connector with no explicit state specified results in a RUNNING state connector
- Can resume a connector created in the PAUSED or STOPPED state
- Can delete a connector created in the PAUSED or STOPPED state
- Can modify the offsets for a connector created in the STOPPED state
- Can do an end-to-end migration of a connector from one Connect cluster to another
- Can use JSON files containing connector key-value configurations with standalone mode
- Can use JSON files containing a JSON object similar to the request body format for the POST /connectors endpoint with standalone mode
Future Work
Allow specifying offsets directly during connector creation
In order to more directly support connector migrations / creating a connector with a specific starting offset, we could introduce an optional "offsets" field to the connector creation request body. Internally, this would write the specified offset before starting the connector. This would be a more streamlined way of doing things from the user's point of view as opposed to creating a connector in the STOPPED state, altering its offsets, and then resuming the connector.
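For comparison, the three-step flow that this future work would replace could look roughly like the sketch below. It only builds the sequence of REST requests and does not send them. The PATCH /connectors/{connector}/offsets endpoint and its body format are taken from our understanding of KIP-875 and are not described in this document, so treat them as assumptions. The `migration_requests` function, the connector name, the config, and the partition / offset keys are all hypothetical.

```python
from urllib.parse import quote


def migration_requests(name: str, config: dict, offsets: list) -> list:
    """Return (method, path, body) tuples for the create/alter/resume flow."""
    encoded = quote(name, safe="")  # keep the connector name URL-safe
    return [
        # 1. Create the connector without spawning any tasks.
        ("POST", "/connectors",
         {"name": name, "config": config, "initial_state": "STOPPED"}),
        # 2. Alter its offsets while it is stopped (endpoint assumed from KIP-875).
        ("PATCH", f"/connectors/{encoded}/offsets", {"offsets": offsets}),
        # 3. Resume the connector so that tasks start from the new offsets.
        ("PUT", f"/connectors/{encoded}/resume", None),
    ]


# Hypothetical connector and offsets; real keys depend on the connector type.
steps = migration_requests(
    "example-source",
    {"connector.class": "com.example.ExampleSourceConnector", "tasks.max": "1"},
    [{"partition": {"file": "input.txt"}, "offset": {"position": 1024}}],
)
```

An "offsets" field on the creation request would collapse these three requests into the first one.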
Rejected Alternatives
Directly expose the existing target states as possible initial connector states
...