...

JIRA: KAFKA-4827

 

Proposed Change

Change the validation of connector names to trim leading and trailing whitespace and reject zero-length strings after trimming. This still allows whitespace inside connector names but removes the potential confusion caused by accidentally padding the name with whitespace, which can easily happen because the create request carries the name as a JSON value rather than in the URL.

Apart from that no characters will be rejected.
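The rule above can be sketched roughly as follows. This is only an illustration, not the actual Connect implementation: the class and method names are hypothetical, and rejecting null is an assumption the proposal does not spell out.

```java
// Sketch only: ConnectorNameSketch and normalize are hypothetical names,
// not part of the Kafka Connect code base.
final class ConnectorNameSketch {

    /** Trims the raw name and rejects names that are empty after trimming. */
    static String normalize(String rawName) {
        // Assumption: a missing name is rejected the same way as an empty one.
        if (rawName == null) {
            throw new IllegalArgumentException("Connector name must not be null");
        }
        // String.trim() removes leading and trailing characters <= U+0020.
        String trimmed = rawName.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("Connector name must not be empty");
        }
        // Whitespace inside the name is deliberately left untouched.
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  my connector ") + "]"); // prints [my connector]
    }
}
```

Returning the trimmed value, rather than only validating, means the rest of the create path would see the normalized name.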

Migration plan and compatibility

Backward compatibility should be preserved, subject to testing.

Since no characters are restricted, nothing that previously worked should break because of this KIP. Connectors with an empty name were already broken and could not be deleted anyway.

The only problematic scenario I can come up with is an automated deployment that creates connectors whose names contain leading or trailing whitespace and later expects to query the configuration and status of these connectors under the untrimmed name. Adding a note to this effect to the release notes should suffice (if that; one might argue that this is a fringe case).

Rejected alternatives

Initially the proposal was to reject a number of characters as illegal in connector names, based on either a whitelist or a blacklist. However, following discussions on the mailing list it was agreed that we can be very generous in allowing characters in connector names as long as all REST requests are properly URL-encoded.
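As a rough client-side illustration of what "properly URL-encoded" could mean, the sketch below encodes a connector name before placing it into a request path. The class and method names are hypothetical. It uses java.net.URLEncoder (the Charset overload requires Java 10 or later), which targets form encoding and writes a space as "+", so the sketch maps that to "%20" for use in a path segment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: ConnectorPathSketch and its methods are hypothetical helpers.
final class ConnectorPathSketch {

    /** Percent-encodes a connector name for use as a single URL path segment. */
    static String encodePathSegment(String name) {
        // URLEncoder emits '+' for a space; a literal '+' in the name is already
        // encoded as %2B, so the replace below only affects former spaces.
        return URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds the status path for a connector with the given name. */
    static String statusPath(String name) {
        return "/connectors/" + encodePathSegment(name) + "/status";
    }

    public static void main(String[] args) {
        System.out.println(statusPath("test;test")); // prints /connectors/test%3Btest/status
    }
}
```

With this kind of encoding a name such as "test;test" reaches the server as a single path segment instead of being split at the semicolon.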

Original KIP:

Appendix A - Research

The set of allowed characters should be determined by the fact that the connector name is part of the URL in a REST call to update the config or delete the connector, so we should take care to allow only characters that can be "legally" used within URLs. However, it turns out that this is not an entirely easy distinction. After looking at a few Stack Overflow threads (here, here & here) as well as RFCs 1738, 2396 and 3986, my understanding is the following.

...

During my work on KAFKA-4930 I found that at least ? from the list of reserved characters also creates issues and leads to connectors that are no longer accessible after creation. I would therefore be hesitant to simply include the reserved characters in the list of allowed characters without further research into what causes these issues. A good example of a character that creates issues is ;. Connect uses Jetty internally to serve the REST endpoints. Jetty considers ; to be a special character that delimits two URL parameters from each other and stops parsing the URL portion at this character. This means that you can create a connector with the name "test;test", since during creation the name is specified in the body of the request, but when you call the /connectors/test;test/status endpoint, Jetty cuts the path at ;, looks for a connector named "test" and does not find anything.
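The effect described above can be modelled with a few lines of code. This is only a simplified model of the observed behavior and not Jetty's actual parsing code; the class and method names are hypothetical.

```java
// Sketch only: a simplified model of the truncation described above,
// not Jetty's implementation. All names are hypothetical.
final class SemicolonTruncationSketch {

    /** Returns the part of a path segment that precedes the first ';'. */
    static String effectiveSegment(String rawSegment) {
        int cut = rawSegment.indexOf(';');
        return cut < 0 ? rawSegment : rawSegment.substring(0, cut);
    }

    public static void main(String[] args) {
        // The connector was created as "test;test", but the lookup uses "test".
        System.out.println(effectiveSegment("test;test")); // prints test
    }
}
```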

 

Based on this research, and since I don't really see the benefit of supporting a large number of exotic special characters, I propose to limit the allowed characters for connector names to the unreserved characters:

 

Code Block
a-z   A-Z   0-9   .   -   _   ~
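For illustration, the whitelist of unreserved characters listed above could be checked with a regular expression along these lines. This is a sketch of the originally proposed rule only; the class and method names are hypothetical, and treating null and empty names as not allowed is an assumption.

```java
import java.util.regex.Pattern;

// Sketch only: UnreservedNameSketch and isAllowed are hypothetical names.
final class UnreservedNameSketch {

    // One or more of: a-z, A-Z, 0-9, '.', '_', '~', '-'
    private static final Pattern UNRESERVED = Pattern.compile("[A-Za-z0-9._~-]+");

    /** Returns true if the name consists only of unreserved characters. */
    static boolean isAllowed(String name) {
        // Assumption: null and empty names are treated as not allowed.
        return name != null && UNRESERVED.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("my-connector_1.0~")); // prints true
        System.out.println(isAllowed("test;test"));         // prints false
    }
}
```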

 

New or changed public interfaces

There is no new public interface. However, the behavior of the create connector API will change in that some connector names that previously worked will no longer be accepted when creating new connectors. This may break deployment scripts for a few people, so we should definitely announce it as a breaking change. I am not sure whether a full deprecation cycle makes sense, but since this will definitely land after 1.0 it might be a good idea.

...

Backward compatibility should be preserved, subject to testing.
Connectors with illegal names that were deployed before the upgrade that incorporates this change are assumed to continue working (this will need to be confirmed once code is available). Changes to their configuration are probably still possible as well, since validators are only applied when creating connectors.

The migration path is to undeploy connectors and redeploy them with a legal name. This will not work for all connectors: as mentioned above, some characters cause issues. Connectors with such names are broken anyway and will need to be removed manually by deleting them from the config topic in Kafka. This situation is unchanged from the current state.

...

The obvious alternative is to use a blacklist of characters and reject connector names that contain any of them. However, there is a good chance that we would spend a lot of time and many JIRAs slowly refining this list while probably still missing some extremely exotic characters. A case could also be made that there is not much point in using many special characters in connector names, so I don't think this approach would bring a large benefit.