...

In some environments, one may want to differentiate between external clients, internal clients and replication traffic independently of the security protocol for cost, performance and security reasons. A few examples that illustrate this:

  1. Replication traffic is assigned to a separate network interface so that it does not interfere with client traffic.
  2. External traffic goes through a proxy/load-balancer (security, flexibility) while internal traffic hits the brokers directly (performance, cost).
  3. Different security settings for external versus internal traffic even though the security protocol is the same (e.g. different set of enabled SASL mechanisms, authentication servers, different keystores, etc.)

As such, we propose that Kafka brokers should be able to define multiple listeners for the same security protocol for binding (i.e. listeners) and sharing (i.e. advertised.listeners) so that internal, external and replication traffic can be separated if required.

Public Interfaces

Configuration

A new broker config listener.security.protocol.map will be introduced so that we can map a protocol label to a security protocol. The config value should be in the CSV Map format that is currently used by max.connections.per.ip.overrides. The config value should follow map semantics: each key should only appear once, but values may appear multiple times. For example, the config could be defined in the following way to match the existing behaviour:

...

To ensure compatibility with existing configs, we propose the above as the default value for the new config. It's worth mentioning that the config value should be the same in every broker in the Kafka cluster for it to work as expected. This is also the case for a number of existing Kafka broker configs and since Kafka doesn't support cluster configs at this point, it seems acceptable.
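The map semantics described for listener.security.protocol.map (each key appears once, values may repeat) can be sketched as a small parser. This is only an illustration in Python, not the broker's implementation: the function name `parse_protocol_map`, the `SECURITY_PROTOCOLS` set and the error messages are assumptions made for the example. The default value used below is the one proposed in this document.

```python
# Hypothetical sketch of parsing listener.security.protocol.map; not broker code.
SECURITY_PROTOCOLS = {"PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL"}


def parse_protocol_map(value):
    """Parse 'LABEL:PROTOCOL,...' into a dict, enforcing map semantics."""
    mapping = {}
    for entry in value.split(","):
        label, sep, protocol = entry.strip().partition(":")
        label, protocol = label.strip(), protocol.strip()
        if not sep or not label or not protocol:
            raise ValueError(f"malformed entry: {entry!r}")
        if label in mapping:
            # Each key (protocol label) may appear only once.
            raise ValueError(f"duplicate protocol label: {label}")
        if protocol not in SECURITY_PROTOCOLS:
            raise ValueError(f"unknown security protocol: {protocol}")
        # Several labels may map to the same security protocol.
        mapping[label] = protocol
    return mapping


DEFAULT_MAP = "PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL"
default_mapping = parse_protocol_map(DEFAULT_MAP)
```

With the default value every label resolves to the security protocol of the same name, which is what keeps existing configs working unchanged.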

The next step is to change the validation of advertised.listeners and listeners so that the protocol label has to be one of the keys in listener.security.protocol.map (only security protocols are allowed currently). For example, the following would configure a broker with two different host:port pairs mapped to the same security protocol in two cases:

...

It is an error to set both security.inter.broker.protocol and inter.broker.protocol.label at the same time. inter.broker.protocol.label will be null by default, which means that PLAINTEXT will be used by default (as is currently the case).
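The two validation rules above (listener labels must be keys of listener.security.protocol.map, and the two inter-broker settings are mutually exclusive) can be sketched as follows. This is a rough illustration under stated assumptions: the function names, the parameter names and the choice to return the resolved inter-broker security protocol are hypothetical, and the label-to-protocol values in the usage lines are made up for the example.

```python
# Hypothetical validation sketch; names and return value are illustrative only.
def listener_label(listener):
    """Return the protocol label of a 'LABEL://host:port' listener string."""
    label, sep, _ = listener.partition("://")
    if not sep or not label:
        raise ValueError(f"malformed listener: {listener!r}")
    return label


def validate_broker_config(listeners, protocol_map,
                           security_inter_broker_protocol=None,
                           inter_broker_protocol_label=None):
    """Validate listeners and return the security protocol for inter-broker traffic."""
    for listener in listeners:
        label = listener_label(listener)
        if label not in protocol_map:
            raise ValueError(f"{label} is not a key in listener.security.protocol.map")
    if security_inter_broker_protocol is not None and inter_broker_protocol_label is not None:
        raise ValueError(
            "security.inter.broker.protocol and inter.broker.protocol.label "
            "must not both be set")
    if inter_broker_protocol_label is not None:
        if inter_broker_protocol_label not in protocol_map:
            raise ValueError(f"{inter_broker_protocol_label} is not a key in listener.security.protocol.map")
        return protocol_map[inter_broker_protocol_label]
    # Neither setting given: PLAINTEXT is used, as is currently the case.
    return security_inter_broker_protocol or "PLAINTEXT"


# Illustrative values: two listeners that share one security protocol.
example_map = {"CLIENT": "SASL_PLAINTEXT", "REPLICATION": "SASL_PLAINTEXT"}
example_listeners = ["CLIENT://0.0.0.0:9092", "REPLICATION://0.0.0.0:9093"]
inter_broker = validate_broker_config(
    example_listeners, example_map, inter_broker_protocol_label="REPLICATION")
```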

ZooKeeper

Version 4 of the broker registration data stored in ZooKeeper would have protocol labels instead of security protocols in the elements of the endpoints array and would have an additional listener.security.protocol.map field. The latter is not strictly needed if we assume that all brokers have the same config, but omitting it would make config updates trickier (e.g. two rolling bounces would be required to add a new mapping from protocol label to security protocol).

Code Block
{
  "version":4,
  "jmx_port":9999,
  "timestamp":2233345666,
  "host":"localhost",
  "port":9092,
  "rack":"rack1",
  "listener.security.protocol.map":"PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL",
  "endpoints": [
    "CLIENT://cluster1.foo.com:9092",
    "REPLICATION://broker1.replication.local:9093",
    "INTERNAL_PLAINTEXT://broker1.local:9094",
    "INTERNAL_SASL://broker1.local:9095"
  ]
}
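To illustrate how a reader of the version 4 registration data might interpret the endpoints array, here is a minimal Python sketch that splits each element into its protocol label, host and port. The helper name `parse_endpoint` is hypothetical, and the payload is trimmed to the fields the sketch uses; the endpoint strings are taken from the example above.

```python
import json

# Registration payload trimmed to the fields used in this sketch.
registration = json.loads("""
{
  "version": 4,
  "endpoints": [
    "CLIENT://cluster1.foo.com:9092",
    "REPLICATION://broker1.replication.local:9093",
    "INTERNAL_PLAINTEXT://broker1.local:9094",
    "INTERNAL_SASL://broker1.local:9095"
  ]
}
""")


def parse_endpoint(endpoint):
    """Split 'LABEL://host:port' into (label, host, port)."""
    label, sep, address = endpoint.partition("://")
    # Split on the last colon so the port is always the final component.
    host, colon, port = address.rpartition(":")
    if not sep or not colon or not label:
        raise ValueError(f"malformed endpoint: {endpoint!r}")
    return label, host, int(port)


endpoints_by_label = {}
for endpoint in registration["endpoints"]:
    label, host, port = parse_endpoint(endpoint)
    endpoints_by_label[label] = (host, port)
```

Plain string splitting is used here on purpose: labels such as INTERNAL_PLAINTEXT contain an underscore, which is not a valid URI scheme character, so a generic URL parser may not treat the label as a scheme.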

Protocol

Version 2 of UpdateMetadataRequest would have a protocol_label field instead of security_protocol_type:

Code Block
UpdateMetadata Request (Version: 2) => controller_id controller_epoch [partition_states] [live_brokers] 
  controller_id => INT32
  controller_epoch => INT32
  partition_states => topic partition controller_epoch leader leader_epoch [isr] zk_version [replicas] 
    topic => STRING
    partition => INT32
    controller_epoch => INT32
    leader => INT32
    leader_epoch => INT32
    isr => INT32
    zk_version => INT32
    replicas => INT32
  live_brokers => id [end_points] 
    id => INT32
    end_points => port host protocol_label (instead of security_protocol_type)
      port => INT32
      host => STRING
      protocol_label => STRING (instead of security_protocol_type => INT16)
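As a rough illustration of what the change means on the wire, the following Python sketch encodes a single end_points element in the version 2 field order (port, host, protocol_label). It relies on the Kafka protocol's primitive encodings (big-endian INT32, and STRING as an INT16 length followed by UTF-8 bytes); the function names are hypothetical and the sketch ignores the rest of the request.

```python
import struct


def encode_string(value):
    """Kafka STRING: big-endian INT16 length followed by UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def decode_string(buf, offset):
    """Read a STRING at offset; return (value, offset after the string)."""
    (length,) = struct.unpack_from(">h", buf, offset)
    start = offset + 2
    return buf[start:start + length].decode("utf-8"), start + length


def encode_end_point(port, host, protocol_label):
    """Encode one end_points element: port, host, protocol_label."""
    return struct.pack(">i", port) + encode_string(host) + encode_string(protocol_label)


def decode_end_point(buf):
    """Inverse of encode_end_point for a buffer holding exactly one element."""
    (port,) = struct.unpack_from(">i", buf, 0)
    host, offset = decode_string(buf, 4)
    protocol_label, _ = decode_string(buf, offset)
    return port, host, protocol_label


encoded = encode_end_point(9093, "broker1.replication.local", "REPLICATION")
decoded = decode_end_point(encoded)
```

The label travels as a variable-length string where version 1 carried a fixed two-byte security protocol id, so an end point's encoded size now depends on the label length.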

Client

Note that protocol labels only exist in the brokers; clients never see them.

...

As mentioned previously, the default value of listener.security.protocol.map maps the existing security protocols to a label with the same name to maintain compatibility:

Code Block
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

...

Users who are upgrading should only use protocol labels once all the brokers have been upgraded to a version that supports protocol labels.

Future work

  1. Different security protocol settings per listener. For example, one may want to configure SSL differently for internal versus external traffic.

...

An important limitation of this proposal is that ZooKeeper-based consumers won't understand protocol labels and hence people who still rely on them won't be able to use this feature. We are in the process of deprecating the old consumers and they don't support newer features like security, so this seems acceptable.

Rejected Alternatives

  1. Implicit ...
  2. Using hard-coded listener domains for internal and replication traffic. The config format is simpler and there's less scope for hard-to-understand configs. The main disadvantage is that it's a bit too specific and may need to be extended again as more sophisticated use cases appear. The current proposal is more general and seems like a natural evolution of the existing system.

...