Status

Current state: Accepted

Discussion thread: here

Vote thread: here

JIRA: KAFKA-8804

PR (draft): https://github.com/apache/kafka/pull/7310

Released: 2.4.0

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

There will be five new configurations added for distributed workers:

  • inter.worker.key.generation.algorithm
    • Purpose: the algorithm used to generate session keys
    • Type: string
    • Default: "HmacSHA256"
    • Importance: low
  • inter.worker.key.size
    • Purpose: the size of generated session keys, in bits; if null, the default key size for the generation algorithm will be used (see the KeyGenerator Javadocs; specifically: "In case the client does not explicitly initialize the KeyGenerator (via a call to an init method), each provider must supply (and document) a default initialization.")
    • Type: int
    • Default: null
    • Importance: low
  • inter.worker.key.ttl.ms
    • Purpose: how often to force a rotation of the internal key used for request validation, or 0 if forced rotation should never occur
    • Type: long
    • Default: 3600000 (one hour)
    • Importance: low
  • inter.worker.signature.algorithm
    • Purpose: the algorithm to use to sign internal requests when sent from a follower worker to the leader
    • Type: string
    • Default: "HmacSHA256"
    • Importance: low
  • inter.worker.verification.algorithms
    • Purpose: a list of supported algorithms for verifying internal requests that are received by the leader from a follower. This list must include the value provided for the inter.worker.signature.algorithm property.
    • Type: list
    • Default: "HmacSHA256"
    • Importance: low
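To illustrate how the key generation algorithm and key size settings interact, here is a minimal sketch using the standard javax.crypto.KeyGenerator API. The class and method names are hypothetical and are not taken from the Connect code base; a null size stands in for the "null" default, in which case the provider's default initialization is used.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical helper; not part of Kafka Connect.
class SessionKeySketch {
    // keySizeBits == null mirrors the "null" default for the key size setting:
    // the generator is left uninitialized and the provider default applies.
    static SecretKey generate(String algorithm, Integer keySizeBits)
            throws NoSuchAlgorithmException {
        KeyGenerator generator = KeyGenerator.getInstance(algorithm);
        if (keySizeBits != null) {
            generator.init(keySizeBits);
        }
        return generator.generateKey();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        SecretKey key = generate("HmacSHA256", 256);
        System.out.println(key.getAlgorithm() + ", " + key.getEncoded().length * 8 + " bits");
    }
}
```

Passing an algorithm name that no installed provider supports causes KeyGenerator.getInstance to throw NoSuchAlgorithmException, which is one way a misconfigured worker could fail fast.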

...

Periodically (with frequency dictated by the inter.worker.key.ttl.ms property), the leader will compute a new session key and distribute it to the cluster.
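The rotation decision can be pictured as a simple age check against the configured TTL. This is only a sketch under assumed names (rotationDue, keyCreatedMs); how the leader actually tracks key age is not specified here. A TTL of 0 is treated as "never force a rotation", as described for the TTL setting.

```java
// Hypothetical helper; not part of Kafka Connect.
class KeyRotationSketch {
    // Returns true when the current session key is at least ttlMs old.
    // ttlMs == 0 disables forced rotation entirely.
    static boolean rotationDue(long keyCreatedMs, long nowMs, long ttlMs) {
        if (ttlMs == 0) {
            return false;
        }
        return nowMs - keyCreatedMs >= ttlMs;
    }

    public static void main(String[] args) {
        long oneHourMs = 3_600_000L; // the documented default TTL
        long createdMs = System.currentTimeMillis() - 2 * oneHourMs;
        System.out.println(rotationDue(createdMs, System.currentTimeMillis(), oneHourMs));
    }
}
```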

The default algorithm used to sign requests will be HmacSHA256; this algorithm is guaranteed to be supported on all implementations of the Java Platform (source). However, users will be able to configure their cluster to use other algorithms with the inter.worker.signature.algorithm property if, for example, the default is not suitable for compliance with an existing security standard.

Similarly, the default algorithm used to generate request keys will also be HmacSHA256; again, this algorithm is guaranteed to be supported on all implementations of the Java Platform (source). And again, users will be able to configure their cluster to use other algorithms or keys of a different size with the inter.worker.key.generation.algorithm and inter.worker.key.size properties, respectively.

...

When a request is received by the leader, the request signature algorithm described by the X-Connect-Request-Signature-Algorithm header will be used to sign the request body and the resulting signature will be checked against the contents of the X-Connect-Authorization header. If the contents do not match, or the request signature algorithm is not in the list of permitted algorithms controlled by the inter.worker.verification.algorithms property, the request will be rejected.
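The check described above might look roughly like the following sketch, built on the standard javax.crypto.Mac API. The class and method names are hypothetical, and Base64 encoding of the signature in the X-Connect-Authorization header is an assumption made for illustration; the text above does not state the encoding.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

// Hypothetical helper; not part of Kafka Connect.
class RequestSignatureSketch {
    static byte[] mac(byte[] body, SecretKey key, String algorithm)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance(algorithm);
        mac.init(key);
        return mac.doFinal(body);
    }

    // Follower side: value for the X-Connect-Authorization header.
    // Base64 is an assumed encoding.
    static String sign(byte[] body, SecretKey key, String algorithm)
            throws GeneralSecurityException {
        return Base64.getEncoder().encodeToString(mac(body, key, algorithm));
    }

    // Leader side: reject if the algorithm is not permitted or the signature differs.
    static boolean verify(byte[] body, String algorithmHeader, String authorizationHeader,
                          SecretKey key, List<String> allowedAlgorithms)
            throws GeneralSecurityException {
        if (algorithmHeader == null || authorizationHeader == null
                || !allowedAlgorithms.contains(algorithmHeader)) {
            return false;
        }
        byte[] provided;
        try {
            provided = Base64.getDecoder().decode(authorizationHeader);
        } catch (IllegalArgumentException e) {
            return false; // header is not valid Base64
        }
        // Constant-time comparison of the expected and provided signatures.
        return MessageDigest.isEqual(mac(body, key, algorithmHeader), provided);
    }
}
```

Checking the permitted-algorithm list before computing the MAC means an unlisted algorithm is rejected without ever being instantiated on the leader.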

...

  • MBean: kafka.connect:type=connect-worker-rebalance-metrics
  • Metric name: connect-protocol
  • Description: The Connect protocol used by this cluster
  • Value: The Connect subprotocol in use based on the latest join group response for this worker joining the Connect cluster.

...

The newly-proposed connect-protocol JMX metric can be used to monitor whether internal request verification is enabled for a cluster; if its value is sessioned (or, presumably, a later protocol), then request verification should be enabled.
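As a rough illustration of such monitoring, the sketch below reads the metric through the standard JMX API and compares it to sessioned. The class and method names are hypothetical, and it assumes the caller already holds an MBeanServerConnection to the worker's JVM (in-process or via a remote JMX connector). It deliberately treats only the literal value sessioned as enabled; handling later protocols would need extra logic.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper; not part of Kafka Connect.
class ConnectProtocolCheckSketch {
    static final String MBEAN = "kafka.connect:type=connect-worker-rebalance-metrics";
    static final String ATTRIBUTE = "connect-protocol";

    // Reads the raw metric value from the given JMX connection.
    static Object readProtocol(MBeanServerConnection connection) throws Exception {
        return connection.getAttribute(new ObjectName(MBEAN), ATTRIBUTE);
    }

    // Only the exact value "sessioned" is recognized in this sketch.
    static boolean verificationLikelyEnabled(Object metricValue) {
        return "sessioned".equals(String.valueOf(metricValue));
    }
}
```

Inside the worker's own JVM, the connection could be obtained with java.lang.management.ManagementFactory.getPlatformMBeanServer(), since MBeanServer extends MBeanServerConnection.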

Reverting an upgrade

Via connect.protocol config

The group coordination protocol will be used to ensure that all workers in a cluster support verification of internal requests before this behavior is enabled; therefore, a rolling upgrade of the cluster will be possible. In line with the regression plan for KIP-415: Incremental Cooperative Rebalancing in Kafka Connect, if it is desirable to disable this behavior for some reason, the connect.protocol configuration can be set to compatible or default for one (or more) workers, and it will automatically be disabled.

Via worker version downgrade

It should also be noted that the above will occur automatically if a worker is downgraded to a prior release of Kafka Connect that does not support the sessioned protocol. If this occurs, the worker will begin emitting error-level log messages when it reads session keys from the config topic. However, the worker will be otherwise unaffected and will continue to function properly (but without the security benefit of internal request verification).

Migrating to a new request signature algorithm

If a new signature algorithm should be used, a rolling upgrade will be possible with the following steps (assuming a new algorithm of HmacSHA489):

  1. Add HmacSHA489 to the inter.worker.verification.algorithms list for each worker, and restart them one-by-one
  2. Change the inter.worker.signature.algorithm property for each worker to HmacSHA489, and restart them one-by-one
  3. (Optional) Remove the old algorithm from the inter.worker.verification.algorithms list for each worker, and restart them one-by-one
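The ordering of these steps matters because, at every point in the rolling restart, the leader must accept whatever algorithm any follower is signing with, and each worker's own verification list must contain its signature algorithm. The sketch below checks both conditions on plain strings; the class and method names are hypothetical, and HmacSHA489 is only the placeholder name used in the steps above, so no cryptographic provider is consulted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Kafka Connect.
class AlgorithmMigrationSketch {
    // A worker's config is valid only if it can verify what it signs.
    static boolean isConsistent(String signatureAlgorithm, List<String> verificationAlgorithms) {
        return verificationAlgorithms.contains(signatureAlgorithm);
    }

    // A leader accepts a follower's request only if the follower's
    // signature algorithm is in the leader's verification list.
    static boolean leaderAccepts(List<String> leaderVerification, String followerSignature) {
        return leaderVerification.contains(followerSignature);
    }

    public static void main(String[] args) {
        List<String> oldOnly = Arrays.asList("HmacSHA256");
        List<String> both = Arrays.asList("HmacSHA256", "HmacSHA489");

        // Mid-way through step 1: an upgraded leader still accepts old followers.
        System.out.println(leaderAccepts(both, "HmacSHA256"));
        // Mid-way through step 2: followers on either algorithm are accepted.
        System.out.println(leaderAccepts(both, "HmacSHA489"));
        // Skipping step 1 would break: an old leader rejects the new algorithm.
        System.out.println(leaderAccepts(oldOnly, "HmacSHA489"));
    }
}
```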

Rejected Alternatives

Configurable inter-worker headers

Summary: A new worker configuration would be added that would control auth headers used by workers when making requests to the internal endpoint.

Rejected because: The additional complexity of another required configuration would be negative for users; security already isn't simple to implement with Kafka Connect, and requiring just one more thing for them to add should be avoided if possible. Also, the use of static headers isn't guaranteed to cover all potential auth mechanisms, and would require manual rotation by reconfiguring the worker.

Replace endpoint with Kafka topic

Summary: The REST endpoint could be removed entirely and replaced with a Kafka topic. Either an existing internal Connect topic (such as the configs topic) could be used, or a new topic could be added to handle all non-forwarded follower-to-leader communication.

Rejected because: Achieving consensus in a Connect cluster about whether to begin engaging in this new topic-based protocol would require either reworking the Connect group coordination protocol or installing several new configurations and a multi-stage rolling upgrade in order to enable it. Requiring new configurations and a multi-stage rolling upgrade for the default use case of a simple version bump for a cluster would be a much worse user experience, and if the group coordination protocol is going to be reworked, we might as well just use the group coordination protocol to distribute keys instead. Additionally, the added complexity of switching from a synchronous to an asynchronous means of communication for relaying task configurations to the leader would complicate the implementation enough that reworking the group coordination protocol might even be a simpler approach with smaller changes required.

Distribute session key via Connect protocol

Summary: Instead of distributing a session key via the config topic, include the session key as part of the worker assignment handed out during rebalance via the Connect protocol. Periodically force a rebalance in order to rotate session keys.

Rejected because: The implementation complexity of adding a session key to the rebalance protocol would be quite high, and the additional API would complicate the code base significantly. Additionally, there are few, if any, advantages compared to distributing the keys via the config topic.