Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Public Interfaces

  • We plan to add a the following new metric metrics :
    1. kafka.network:name=ControlPlaneRequestQueueSize,type=RequestChannel
      kafka.network:name=ControlPlaneResponseQueueSize,type=RequestChannel
      to show the size of the new control plane, request
    queue.We will add two new metrics 
    1. and response queues.
    2. kafka.network:name=
    ControlPlaneNetworkProcessorIdlePercent
    1. ControlPlaneNetworkProcessorAvgIdlePercent,type=SocketServer
      kafka.server:name=
    ControlPlaneRequestHandlerIdlePercent
    1. ControlPlaneRequestHandlerAvgIdlePercent,type=KafkaRequestHandlerPool
      with the former monitoring the idle percentage of pinned control plane network thread, and the latter monitoring the idle percentage of pinned control plane request handler thread. (Pinned control plane threads are explained in details below.)
    2. kafka.network:name=ControlPlaneExpiredConnectionsKilledCount,type=SocketServer
      to show the number of expired connections that were disconnected on the control plane.
  • The meaning of the existing metric
    kafka.network:name=RequestQueueSize,type=RequestChannel
    will be changed, and it will be used to show the size of the data request queue only, which does not include controller requests.
  • The meaning of the two existing metrics
    kafka.network:name=NetworkProcessorAvgIdlePercent,type=SocketServer
    kafka.server:name=RequestHandlerAvgIdlePercent,type=KafkaRequestHandlerPool
    will be changed slightly in that they now only cover the threads for data plane threads, not including the pinned threads for controller requests.
  • We plan to add a new config for brokers to specify a dedicated listener for controller connections: the "control.plane.listener.name" config. When not set explicitly, the config will have a default value of "null" and indicate that the proposed change in this KIP will not take effect. A detailed explanation of the config is shown in the "proposed changes" section.
  • A new listener-to-endpoint entry dedicated to controller connections need to be added to the "listeners" config and the "advertised.listeners" config.
    Also a new entry needs to be added to the "listener.security.protocol.map" to specify the security protocol of the new endpoint.

...

With the dedicated endpoints for controller connections, upon startup a broker will use the "control.plane.listener.name" to look up the corresponding endpoint in the listeners list for binding. For instance, in the example given above, the broker will derive the dedicated endpoint to be "CONTROLLER://192.1.1.8:9091". Then it will have a new dedicated acceptor that binds to this endpoint, and listens for controller connections. When a connection is received, the socket will be given to a dedicated control plane processor thread (network thread). The dedicated processor thread reads controller requests from the socket and enqueues them to a new dedicated control plane request queue, whose capacity is 20 [2].  On the other side of the controller request queue, a dedicated control plane request handler thread will take requests out, and handles them in the same way as being done today. In summary, we are 1) adding a dedicated acceptor, 2) pinning one processor thread, 3) adding a new request queue, and 4) pinning one request handler thread for controller connections and requests. The two new threads are exclusively for requests from the controller and do not handle data plane requests.

The metricmetrics

kafka.network:name=ControlPlaneRequestQueueSize,type=RequestChannel
kafka.network:name=ControlPlaneResponseQueueSize,type=RequestChannel

will be added to monitor the size of the new control plane request queueand response queues. Another two new metrics

...

  • Impacts: Controller requests will not longer be blocked by data requests, which should mitigate the effect of stale metadata listed in the motivation section.
  • Migration plan: 2 rounds of rolling upgrades are needed to pick up the proposed changes in this KIP. The goal of the first round is to add the controller endpoint, without adding the "control.plane.listener.name" config. Specifically, an endpoint with the controller listener name should be added to the "listeners" config, e.g. "CONTROLLER://192.1.1.8:9091"; if the "advertised.listeners" config is explicitly configured and is not getting its value from "listeners", the new endpoint for controller should also be added to "advertised.listeners". After the first round is completed, controller to brokers communication should still behave in the same way that uses the inter-broker-listener-name, since the "control.plane.listener.name" is not set yet. In the 2nd round, the "control.plane.listener.name" config should be set to the corresponding listener name, e.g. "CONTROLLER". During rolling upgrade of the 2nd round, controller to brokers traffic will start using the "control.plane.listener.name", and go through the proposed changes in this KIP.
  • No special migration tools are needed.The existing
  • behavior will be removed after the PR is merged inThis KIP does not support dynamic update of config.controlPlaneListenerName, to add or remove the control-plane.

Rejected Alternatives

  1. A few previous designs do not involve adding the dedicated endpoints, and focus on controller request prioritization after controller requests are read from the socket. However without the dedicated controller endpoints, a controller request can still be blocked in cases where the request queue for data requests is full. This is because today one processor thread can handle multiple connections, say 100 connections represented by connection0, ... connection99, among which connection0-98 are from clients, and connection99 is from the controller. Further let's assume after one selector polling, there are incoming requests on all connections. When the request queue is full, the processor thread will be blocked first when trying to enqueue the data request from connection0, and then possibly blocked again for the data request from connection1, ... etc even though the controller request is ready to be enqueued.

...