Status

Current state: Under Discussion

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

When dynamically reconfiguring TLS truststores and keystores of Kafka listeners, Kafka performs the following validations:

  • Checks if the Subject / Distinguished Name (DN) of the new keystore is the same as that of the old keystore
  • Checks if the Subject Alternative Names (SANs) of the new keystore contain all names from the old keystore (i.e. adding new SANs is allowed, removing old SANs is not)
  • If the truststore / keystore being updated is used by the inter-broker listener, it also performs a test TLS handshake to validate that the keystore and truststore are compatible with each other and will not break inter-broker communication.
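
The first two checks in the list above can be sketched roughly as follows. This is a simplified model for illustration only, not Kafka's actual implementation: the class name `KeystoreIdentityCheck` and its methods are hypothetical, and SANs are reduced to plain strings rather than being read from a real certificate.

```java
import java.util.Set;
import javax.security.auth.x500.X500Principal;

// Simplified, hypothetical model of the DN and SAN checks described above.
public final class KeystoreIdentityCheck {

    private KeystoreIdentityCheck() {}

    /** The subject DN of the new certificate must equal the old one. */
    public static boolean sameDn(X500Principal oldDn, X500Principal newDn) {
        return oldDn.equals(newDn);
    }

    /** Every old SAN must still be present; additional new SANs are allowed. */
    public static boolean sansPreserved(Set<String> oldSans, Set<String> newSans) {
        return newSans.containsAll(oldSans);
    }

    /** A keystore update passes only if both checks pass. */
    public static boolean isUpdateAccepted(X500Principal oldDn, Set<String> oldSans,
                                           X500Principal newDn, Set<String> newSans) {
        return sameDn(oldDn, newDn) && sansPreserved(oldSans, newSans);
    }
}
```

Under this model, adding a SAN is accepted, while removing a SAN or changing the DN is rejected, which is exactly what blocks the use cases listed below.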

While these validations do not seem to be explicitly mentioned in KIP-226 / KAFKA-6241, the follow-up issues (KAFKA-10279, KAFKA-14770) suggest that the checks exist to protect users from unintentionally breaking TLS communication. This may indeed protect users in some situations, but it also prevents them from changing the certificates when desired without restarting the brokers. For example:

  • When running Kafka in a dynamic environment where DNS names (SANs) are changing often.
  • When moving to a brand new CA and server certificate.
  • When removing a SAN that is no longer used, where keeping the old certificate with the old SAN might be considered a security risk.

Today, these changes require a restart of the brokers. In some cases, such as changing the CA, they even require multiple rolling restarts:

  • Establish trust for the new CA (first rolling update)
  • Move to the new server certificate (second rolling update)

In situations like this, being able to update the keystores dynamically could be a big advantage. It would cause fewer disruptions and make the operation faster. It would also make it easier to use short-lived certificates, even when their DN or SANs change: they could be updated dynamically without a restart, which would make the Kafka cluster more secure.

Kafka brokers currently don’t terminate existing connections when the certificates are updated; the new certificates are used only for new connections. So when users need to update the CA and the server certificates for a critical security reason, such as a compromised CA, they would still need either to perform a rolling update of the Kafka cluster to terminate the existing connections or to handle it at a different layer outside of Kafka, for example by terminating the existing connections at the network layer. Termination of existing connections by Kafka itself is not part of this KIP.

Public Interfaces

This change introduces a new broker configuration option named ssl.allow.dn.and.san.changes. It is described in the next section.

Proposed Changes

This KIP proposes adding a new configuration option ssl.allow.dn.and.san.changes. When set to true, the check for DN and SAN changes during dynamic configuration updates will be skipped. The default value of ssl.allow.dn.and.san.changes will be false. That way, the default behavior does not change, and all users will still have their certificates validated unless they explicitly disable the check. The new option will not have any impact on the TLS handshake check done for inter-broker listeners. That check will still be executed to make sure the keystore and truststore match, even if the DN and SAN check is disabled.

The ssl.allow.dn.and.san.changes option itself will not be dynamically configurable.
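
As a rough illustration of the intended semantics, the sketch below shows how the proposed option could decide which validations run for a dynamic keystore update. Only the option name ssl.allow.dn.and.san.changes and its default of false come from this KIP; the class `DnSanCheckToggle`, the method `checksToRun`, and the check labels are hypothetical, and the sketch assumes the option is read from the static broker configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of how the proposed option gates the validations.
public final class DnSanCheckToggle {

    // Option name proposed by this KIP.
    public static final String ALLOW_DN_AND_SAN_CHANGES = "ssl.allow.dn.and.san.changes";

    private DnSanCheckToggle() {}

    /** Returns the (illustrative) names of the checks to run for a keystore update. */
    public static List<String> checksToRun(Properties brokerConfig, boolean usedByInterBrokerListener) {
        // Default is false, so the DN/SAN check stays on unless explicitly disabled.
        boolean allowChanges = Boolean.parseBoolean(
                brokerConfig.getProperty(ALLOW_DN_AND_SAN_CHANGES, "false"));

        List<String> checks = new ArrayList<>();
        if (!allowChanges) {
            checks.add("dn-and-san");
        }
        // The test TLS handshake is independent of the new option.
        if (usedByInterBrokerListener) {
            checks.add("tls-handshake");
        }
        return checks;
    }
}
```

With an empty configuration, an inter-broker listener gets both checks. With the option set to true, only the handshake check remains.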

Compatibility, Deprecation, and Migration Plan

The default behavior for any existing and new users will not change. Only users who explicitly set the new option to true would be affected by this proposal.

Rejected Alternatives

Another option would be to remove the existing DN and SAN check completely. While the check might protect users in some situations, it still allows them to break the TLS setup in others. However, removing it would change the behavior for any user who relies on this check. So I rejected this alternative and proposed the new configuration option instead.
