Status
Current State: Discussion
Discussion Thread: here
Jira:
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Currently, very little validation is applied to connector names when a new connector is created in Kafka Connect. The only check performed is that the name contains no /. Beyond that, it is even possible to create a connector with an empty name, which is then impossible to delete.
Additionally, a large number of special characters can be used in connector names but cause problems afterwards, when the name has to appear in the URL of the REST call that changes or deletes the connector.
A pull request that deals with a few of the problematic cases is ready for review, but it is more of a band-aid than a proper fix, as I am sure that many more characters were missed that also cause problems.
There are a number of related jiras currently open that could probably all be fixed by implementing something like the checks proposed here:
Proposed Change
Change the validation of connector names to use a whitelist of characters, so that no problematic characters are allowed, and enforce a maximum length for the connector name.
The set of allowed characters should follow from the fact that the connector name is part of the URL in the REST call to update the config or delete the connector, so we should take care to allow only characters that can be "legally" used within URLs. However, it turns out that this is not an entirely easy distinction. After looking at a few Stack Overflow threads as well as RFCs 1738, 2396 and 3986, my understanding is the following.
There are some characters that are definitely legal and allowed without any restriction:
a-z A-Z 0-9 . - _ ~
Then there are reserved characters. These are legal, but can have special meaning depending on which section of the URL they appear in:
; / ? : @ & = + $ ,
And last but not least there are a few "unwise" characters:
{ } | \ ^ [ ] `
During my work on KAFKA-4930 I found that at least ? from the list of reserved characters also causes problems and leads to connectors that are no longer accessible after creation, so I'd be hesitant to simply include the reserved characters in the list of allowed characters without further research into what causes these problems.
A good example of a reserved character causing trouble is ;. Connect uses Jetty internally to serve the REST endpoints. Jetty treats ; as a special character that delimits URL parameters and stops parsing that portion of the URL at this character. This means that you can create a connector named "test;test", since the name is specified in the body of the creation request, but when you call the /connectors/test;test/status endpoint, Jetty cuts the name at ;, looks for a connector named "test", and finds nothing.
Based on this research, and since I don't really see the benefit of supporting a large number of exotic special characters, I propose to limit the allowed characters for connector names to the unreserved characters:
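The ; behavior described above can be illustrated with a toy model. This is a minimal sketch and not Jetty's actual parsing code: the class name PathParameterDemo and both helper methods are hypothetical, and the only assumption modeled is that everything from the first ; in a path segment is dropped before the connector lookup.

```java
// Toy model of the truncation described above; NOT Jetty's implementation.
// All names here (PathParameterDemo, stripPathParameters, lookupName) are hypothetical.
final class PathParameterDemo {

    // Assumption: a path segment is cut at the first ';' and the rest is ignored.
    static String stripPathParameters(String segment) {
        int idx = segment.indexOf(';');
        return idx < 0 ? segment : segment.substring(0, idx);
    }

    // Extracts the connector name that would be looked up for a path
    // of the form /connectors/{name}/status.
    static String lookupName(String requestPath) {
        // For "/connectors/x/status": parts[0] is empty, parts[1] is "connectors",
        // parts[2] is the name segment.
        String[] parts = requestPath.split("/");
        return stripPathParameters(parts[2]);
    }

    public static void main(String[] args) {
        String created = "test;test"; // name as sent in the request body on creation
        String lookedUp = lookupName("/connectors/" + created + "/status");
        System.out.println("created:   " + created);
        System.out.println("looked up: " + lookedUp);
        System.out.println("reachable: " + created.equals(lookedUp));
    }
}
```

Under this model the connector is stored under the full name but every later request resolves to the shortened one, which is why it can be neither queried nor deleted through the REST API.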
a-z A-Z 0-9 . - _ ~
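A whitelist check along these lines could look roughly like the following. This is only a sketch under stated assumptions, not the proposed implementation: the class ConnectorNameValidator is hypothetical, and since this proposal does not name a maximum length, the value 255 is a placeholder.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a whitelist-based connector name check.
final class ConnectorNameValidator {

    // Assumption: the proposal calls for a maximum length but gives no value;
    // 255 is a placeholder chosen for illustration only.
    static final int MAX_LENGTH = 255;

    // Unreserved URL characters only: a-z A-Z 0-9 . - _ ~
    // The '+' quantifier also rejects the empty name.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9._~-]+");

    static boolean isValid(String name) {
        return name != null
                && name.length() <= MAX_LENGTH
                && ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"jdbc-source_1.0~a", "", "test;test", "a/b", "what?"};
        for (String name : candidates) {
            System.out.println("\"" + name + "\" -> " + isValid(name));
        }
    }
}
```

Because the check accepts only known-safe characters, names containing characters nobody has thought of yet are rejected by default, and the empty name is rejected without a separate rule.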
New or changed public interfaces
There is no new public interface. However, the behavior of the create connector API will change in that some connector names that previously worked will no longer be accepted when creating new connectors. This may break deployment scripts for a few people, so we should definitely announce this as a breaking change. I am not sure whether a full deprecation cycle makes sense, but since this will definitely land after 1.0, it might be a good idea.
Migration plan and compatibility
Backward compatibility should be given, subject to testing.
Connectors with illegal names that were deployed before the upgrade that incorporates this change are assumed to continue working (this will need to be confirmed once code is available). Changes to their configuration are probably possible as well, since validators are only applied when creating connectors.
The migration path is to undeploy connectors and redeploy them with a legal name. This will not work for all connectors: as mentioned above, some characters make a connector inaccessible, but such connectors are broken anyway and will need to be removed manually by deleting them from the config topic in Kafka. This is no change from the current state.
Rejected alternatives
The obvious alternative is to use a blacklist of characters and reject connector names that contain one of these. However, there is a good chance that we would spend a lot of time and jiras slowly refining this list while probably still missing some extremely exotic characters. Also, a case could be made that there is not really a point to using too many special characters in connector names, so I don't think there would be a large benefit to this approach.