...

A new Connect protocol, sessioned, will be implemented. It will be identical to the cooperative incremental protocol, except that a session-key field will be added to the assignment schema; follower workers will retain this key for use in request signing, and the leader will retain it for use in request verification. One downside of this approach is that cooperative incremental assignments will be required in order to enable the new security behavior; however, given the lack of any serious complaints about the new rebalancing protocol thus far, this seems preferable to trying to enable the behavior across both assignment styles. In addition, periodically forcing a rebalance in order to rotate keys would incur a heavy performance penalty on a cluster using eager assignment, so this approach isn't practical in that case.

If the internal.request.verification property is set to true, the worker will advertise the new sessioned protocol to the Kafka group coordinator as a supported (and, currently, most preferred) protocol. If that protocol is then agreed on by the cluster during group coordination, a session key will be randomly generated during each rebalance and distributed by the leader to each follower node. Followers will use this key to sign requests to the internal endpoint, and the leader will use it to verify that each request came from a current group member. It is imperative that inter-worker communication have some kind of transport layer security; otherwise, the session key will be leaked during rebalance to anyone who can eavesdrop on request traffic.
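As a rough illustration of the key generation step described above, the following sketch uses the JDK's KeyGenerator to produce a fresh random key. The class and method names are hypothetical, and the HmacSHA256 algorithm and 256-bit key size are assumptions chosen for the example rather than values taken from this proposal.

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; not the actual Connect implementation.
final class SessionKeySketch {

    // Generates a fresh random key, as the leader might do once per rebalance.
    // The algorithm name and key size are supplied by the caller (assumed values below).
    static SecretKey generateSessionKey(String algorithm, int keySizeBits)
            throws NoSuchAlgorithmException {
        KeyGenerator generator = KeyGenerator.getInstance(algorithm);
        generator.init(keySizeBits);
        return generator.generateKey();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: HmacSHA256 with a 256-bit key.
        SecretKey key = generateSessionKey("HmacSHA256", 256);
        System.out.println("Generated " + key.getAlgorithm() + " session key");
    }
}
```

Because each call produces an independent random key, rotating the key on every rebalance would only require calling such a helper again and distributing the result with the new assignments.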

...

  • X-Connect-Authorization: the signature of the request body
  • X-Connect-Key-Algorithm: the key algorithm used to sign the request
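The two headers listed above could be produced by a follower and checked by the leader roughly as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the signature is assumed to be Base64-encoded in the header, and an HMAC-style algorithm usable with javax.crypto.Mac is assumed.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual Connect implementation.
final class RequestSignatureSketch {
    static final String SIGNATURE_HEADER = "X-Connect-Authorization";
    static final String ALGORITHM_HEADER = "X-Connect-Key-Algorithm";

    // Computes a MAC over the request body with the session key.
    static byte[] sign(byte[] key, String algorithm, byte[] body)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance(algorithm);
        mac.init(new SecretKeySpec(key, algorithm));
        return mac.doFinal(body);
    }

    // Follower side: builds the two headers for an outgoing request.
    // Assumption: the signature is carried as Base64 text.
    static Map<String, String> signatureHeaders(byte[] key, String algorithm, byte[] body)
            throws GeneralSecurityException {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(SIGNATURE_HEADER, Base64.getEncoder().encodeToString(sign(key, algorithm, body)));
        headers.put(ALGORITHM_HEADER, algorithm);
        return headers;
    }

    // Leader side: recomputes the signature with the current session key and
    // compares it to the one in the request. Any malformed input is rejected.
    static boolean verify(byte[] currentKey, Map<String, String> headers, byte[] body) {
        String signature = headers.get(SIGNATURE_HEADER);
        String algorithm = headers.get(ALGORITHM_HEADER);
        if (signature == null || algorithm == null) {
            return false;
        }
        try {
            byte[] provided = Base64.getDecoder().decode(signature);
            byte[] expected = sign(currentKey, algorithm, body);
            // MessageDigest.isEqual compares the two byte arrays.
            return MessageDigest.isEqual(expected, provided);
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return false; // unknown algorithm, bad key, or invalid Base64
        }
    }
}
```

A real verifier would likely also restrict the algorithms it is willing to accept from the X-Connect-Key-Algorithm header instead of trusting whatever the request names; that check is omitted here for brevity.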

The leader will only accept requests signed with the current key. This should not cause any major problems: if a follower attempts to make a request with an expired key (which should be quite rare, and should only occur if the request is made during an in-progress rebalance), the initial request will fail, but it will be retried after a backoff period. This backoff period should leave sufficient time for the rebalance to complete. One potential downside is that, should this occur, an error-level log message of "Failed to reconfigure connector's tasks, retrying after backoff: " followed by a stack trace will be generated. This can be mitigated in two ways: by altering the log message or the generated exception to note that the failure may not indicate a problem if key rotation is enabled, and/or by logging an info-level message after task reconfiguration completes successfully, noting that any earlier error messages related to task reconfiguration may be safely disregarded.
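The retry behavior described above can be sketched as a simple loop. This is only an illustration: the class and method names are hypothetical, the fixed backoff is an assumption, and the request is modeled as a callable that returns true once the leader accepts the signature.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; not the actual Connect implementation.
final class BackoffRetrySketch {

    // Retries the request until it is accepted or maxAttempts is reached,
    // sleeping for backoffMs between attempts. Returns the number of the
    // attempt that succeeded, or -1 if every attempt was rejected.
    static int retryWithBackoff(Callable<Boolean> request, int maxAttempts, long backoffMs)
            throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (request.call()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                // Leaves time for an in-progress rebalance to finish and for the
                // follower to receive the new session key.
                Thread.sleep(backoffMs);
            }
        }
        return -1;
    }
}
```

In this model, a request signed with an expired key fails on the first attempt and succeeds on a later one, once the follower has picked up the key distributed in the completed rebalance.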

Compatibility, Deprecation, and Migration Plan

All of the proposed configurations here have default values, making them backwards compatible.

...