...

This KIP proposes to control request handler (I/O) thread and network thread utilization using request processing time quotas that limit the amount of time within each quota window that can be used by users/clients.

Limitations

...

This KIP attempts to avoid unintended denial-of-service scenarios where a misconfigured application (e.g. zero heartbeat interval) or a client library with a bug can cause broker overload. While it can reduce the load in some DoS scenarios, it does not completely protect against DoS attacks from malicious clients. A DDoS attack with a large number of connections resulting in a large amount of expensive authentication or CPU-intensive requests can still overwhelm the broker.

Public Interfaces

Request quotas

Request quotas will be configured as the percentage of time a client can spend on request handler (I/O) threads and network threads within each quota window. A quota of n% represents n% of one thread, so the quota is out of a total capacity of ((num.io.threads + num.network.threads) * 100)%. Each request quota will be the total percentage across all request handler and network threads that a user/client may use in a quota window before being throttled. Since the number of threads allocated for I/O and network threads is typically based on the number of cores available on the broker host, request quotas represent the total percentage of CPU that may be used by the user/client. In future, if quotas are implemented for utilization of other types of threads, the same quota configuration can be used to limit the total utilization across all the threads monitored for quotas.

The limits will be applied to the same quota window configuration (quota.window.size.seconds, with a 1 second default) as existing produce/fetch quotas. This approach keeps the code consistent with the existing quota implementation, while making it easy for administrators to allocate a slice of each quota window to users/clients to control CPU utilization on the broker. If a client/user exceeds the request processing time limit, responses will be delayed by an amount that brings the rate within the limit. The maximum delay applied will be the quota window size.

...

By default, clients will not be throttled based on request processing time, but defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels. Defaults as well as overrides are stored as dynamic configuration properties in ZooKeeper alongside the other rate limits.

Requests

...

exempt from throttling

Requests that update cluster state will be throttled only if authorization for ClusterAction fails. These are infrequent requests for cluster management, typically not used by clients:

...

Fetch and produce requests will continue to be throttled based on byte rates and may also be throttled based on request handler thread utilization. Fetch requests used for replication will not be throttled based on request times since it is possible to configure replica.fetch.wait.max.ms and use the existing replication byte rate quotas to limit replication rate.

...

Two new metrics and corresponding sensors will be added to the broker for tracking request-time and throttle-time of each quota entity for the new quota type Request. These will be handled similarly to the metrics and sensors for Produce/Fetch. A delay queue sensor with queue-size for the new quota type Request will also be added, similar to the delay queue sensor for Produce/Fetch. All the metrics and sensors for request time throttling will have a format similar to the existing produce/fetch metrics and sensors for consistency, but with a new group/name indicating the new quota type Request, keeping them separate from existing metrics/sensors.

An additional metric exempt-request-time will also be added for each quota entity for the quota type Request. This will capture the total time for requests that are exempt from throttling, so that administrators can view the full utilization using the combination of the two metrics.
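
For illustration, the sketch below shows how per-entity sensors of this kind might be built with Kafka's org.apache.kafka.common.metrics library: a quota-bounded sensor for request-time plus an unbounded sensor for exempt-request-time. The class, the sensor names, and the convention of recording time in percent units are assumptions of the sketch, not the final implementation:

Code Block
languagejava
titleSketch: per-entity request time sensors
import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.metrics.MetricConfig;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Quota;
import org.apache.kafka.common.metrics.QuotaViolationException;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Rate;

public class RequestQuotaSensors {

    private final Sensor requestTimeSensor;
    private final Sensor exemptRequestTimeSensor;

    public RequestQuotaSensors(Metrics metrics, String user, double quotaPercent) {
        // Quota-bounded sensor tracking request processing time, recorded in
        // percent units so the rate is directly comparable to the quota bound.
        MetricConfig quotaConfig = new MetricConfig()
            .quota(Quota.upperBound(quotaPercent))
            .timeWindow(1, TimeUnit.SECONDS);   // quota.window.size.seconds
        requestTimeSensor = metrics.sensor("Request-" + user, quotaConfig);
        requestTimeSensor.add(
            metrics.metricName("request-time", "Request", "Request processing time for " + user),
            new Rate(TimeUnit.SECONDS));

        // Unbounded sensor for requests exempt from throttling, so administrators
        // can see full utilization from the two metrics combined.
        exemptRequestTimeSensor = metrics.sensor("ExemptRequest-" + user);
        exemptRequestTimeSensor.add(
            metrics.metricName("exempt-request-time", "Request", "Exempt request time for " + user),
            new Rate(TimeUnit.SECONDS));
    }

    /** Records time used in the window; returns true if the quota was violated. */
    public boolean record(double percentUsed) {
        try {
            requestTimeSensor.record(percentUsed);
            return false;
        } catch (QuotaViolationException e) {
            return true; // caller computes and applies the throttle delay
        }
    }
}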

Clients will expose average and maximum request throttle time as JMX metrics similar to the current produce/fetch throttle time metrics. The existing metrics will reflect the total throttle time for produce and fetch requests, including both byte-rate throttling and request processing time throttling. Two new metrics, request-throttle-time-avg and request-throttle-time-max, will be added to reflect total request processing time based throttling for all request types, including produce/fetch.

...

kafka-configs.sh will be extended to support request quotas.  A new quota property will be added, which can be applied to <client-id>, <user> or <user, client-id>:

  • request_percentage: The percentage per quota window (out of a total of ((num.io.threads + num.network.threads) * 100)%) for requests from the user or client, above which the request may be throttled.

...

bin/kafka-configs  --zookeeper localhost:2181 --alter --add-config 'request_percentage=50' --entity-name user1 --entity-type users

...

bin/kafka-configs  --zookeeper localhost:2181 --alter --add-config 'request_percentage=200' --entity-type users --entity-default

...

Request quotas will be supported for <client-id>, <user> and <user, client-id>, similar to the existing produce/fetch byte rate quotas.  In addition to produce and fetch rates, an additional quota property will be added for throttling based on request processing time. As with produce/fetch quotas, request quotas will be per-broker. Defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels.

...

Quotas for requests will be configured as the percentage of time within a quota window that a client is allowed to use across all of the I/O and network threads. The total capacity of ((num.io.threads + num.network.threads) * 100)% can be distributed between clients/users. For example, with the default configuration of a 1 second quota window, if user alice has a request quota of 1%, the total time all clients of alice can spend in the request handler and network threads in any one second window is 10 milliseconds.  When this time is exceeded, a delay is added to the response to bring alice’s usage within the configured quota. The maximum delay added to any response will be the window size.  The calculation of delay will be the same as the current calculation used for throttling produce/fetch requests:

  • If O is the observed usage and T is the target usage over a window of W, to bring O down to T, we need to add a delay of X to W such that: O * W / (W + X) = T.
  • Solving for X, we get X = (O - T)/T * W.
  • The response will be throttled by min(X, W).
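
For illustration, the delay calculation can be written out directly. This is a minimal sketch of the formula above (with usage expressed in percent and the window in milliseconds, both assumptions of the sketch), not broker code:

Code Block
languagejava
titleSketch: throttle delay calculation
public class ThrottleTime {
    /**
     * Delay (ms) that brings observed usage O down to target T over window W:
     * solving O * W / (W + X) = T gives X = (O - T) / T * W, capped at W.
     */
    public static long throttleTimeMs(double observedPercent, double quotaPercent, long windowMs) {
        if (observedPercent <= quotaPercent)
            return 0;
        double delayMs = (observedPercent - quotaPercent) / quotaPercent * windowMs;
        // Cap at one window so timing-sensitive requests (e.g. heartbeats)
        // are never delayed for extended durations.
        return Math.min((long) delayMs, windowMs);
    }

    public static void main(String[] args) {
        // alice has a 1% quota and used 20ms of thread time in a 1s window (O = 2%).
        System.out.println(throttleTimeMs(2.0, 1.0, 1000)); // prints 1000
    }
}

For the alice example above, a window in which 20ms (2%) was used against the 1% quota gives X = (2 - 1)/1 * 1000 = 1000ms, which is already at the one-window cap.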

Network thread time will be recorded for each request without performing throttling when the time is recorded. When I/O thread time is recorded, throttling will be performed, taking into account the total processing time of the user/client in network threads and I/O threads in the quota window. This simplifies the handling of network thread utilization without integrating the throttling mechanism into the network layer.

The maximum throttle time for any single request will be the quota window size (one second by default). This ensures that timing-sensitive requests like heartbeats are not delayed for extended durations. For example, if a user has a quota of 0.1% and a stop-the-world GC pause takes 100ms during the processing of the user's request, we don't want all the requests from the user to be delayed by 100 seconds. By limiting the maximum delay, we reduce the impact of GC pauses and single large requests. To exploit this limit to bypass quota limits, clients would need to generate requests that take significantly longer than the quota limit. If R is the amount of time taken to process one request and the user has C active connections, the maximum amount of time a user/client can use per quota window is max(quota, C * R). In practice, quotas are expected to be much larger than the time taken to process individual requests, so this limit is sufficient. Byte rate quotas will also help to increase throttling in cases where large produce/fetch requests result in larger per-request times. DoS attacks using large numbers of connections are not addressed in this KIP.

...

Code Block
languagejs
titleSample quota configuration
// Quotas for user1
// Zookeeper persistence path /config/users/<encoded-user1>
{
    "version":1,
    "config": {
        "producer_byte_rate":"1024",
        "consumer_byte_rate":"2048",
		"request._percentage" : "50"
    }
}

 

Co-existence of multiple quotas

...

On the client side, a new sensor named request-throttle-time will be added to track total request throttle time returned in all responses. This is in addition to the sensor used to track produce/fetch throttle times, which will continue to be supported. These existing produce/fetch throttle times will include total throttling time for both bandwidth and utilization for produce/fetch requests. New metrics request-throttle-time-avg and request-throttle-time-max will be added and these will include throttle times across all requests including produce/fetch.
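
As a sketch of how an application might consume these client-side metrics, the snippet below scans a producer's metrics map for the proposed metric names. The metric names follow this KIP; the helper class and method are illustrative:

Code Block
languagejava
titleSketch: reading request throttle metrics on the client
import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class ThrottleMetrics {
    /** Logs the proposed request-throttle-time metrics from a producer. */
    public static void logRequestThrottleTime(KafkaProducer<?, ?> producer) {
        for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if (name.name().equals("request-throttle-time-avg")
                    || name.name().equals("request-throttle-time-max")) {
                System.out.printf("%s (group %s): %s%n",
                        name.name(), name.group(), entry.getValue().metricValue());
            }
        }
    }
}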

...


Compatibility, Deprecation, and Migration Plan

What impact (if any) will there be on existing users?

  • None, since by default clients will not be throttled on request processing time.

...

The KIP proposes to use a quota that specifies the total percentage of time within each quota window allocated to a client/user. This is a per-thread value, out of a total capacity of ((num.io.threads + num.network.threads) * 100)%, with the total request processing time measured across all I/O and network threads. An alternative would be to configure a relative percentage out of a fixed total capacity of 100. An absolute quota was chosen to avoid automatic changes to client quota values when num.io.threads or num.network.threads is modified. Since threads are typically based on the number of cores on the broker host, the per-thread quota percentage reflects the percentage of cores allocated to the client/user. This is consistent with other CPU quotas like cgroups and the way CPU usage is reported by commands like top.
 

Use fractional units of threads instead of percentage for quota bound

 

The KIP proposes to use a quota that specifies the total percentage of time within each quota window allocated to a client/user, out of a total capacity of ((num.io.threads + num.network.threads) * 100)%. An alternative would be to model each thread as one unit and configure the quota as a fraction of the total number of available units. Percentage was chosen instead of fractional units of threads so that the same request_percentage representing CPU utilization can continue to be applied if other threads are added in future.

Allocate percentage of request handler pool as quota bound

...