...

Request quotas will be configured as the percentage of time a client can spend on request handler (I/O) threads within each quota window. A quota of n% represents n% of one request handler thread, so the quota is out of a total capacity of (num.io.threads * 100)%. Each request quota will be the percentage allocated to a user/client.

Since the number of threads allocated for I/O is typically based on the number of cores available on the broker host, request quotas represent the total percentage of CPU that may be used by the user/client. In future, when quotas are implemented for other thread types (e.g. network threads), the same quota configuration can be used to limit the total utilization across all the threads monitored for quotas.

The limits will be applied to the same quota window configuration (quota.window.size.seconds with 1 second default) as existing produce/fetch quotas. This approach keeps the code consistent with the existing quota implementation, while making it easy for administrators to allocate a slice of each quota window to users/clients to control request handler thread utilization on the broker. If a client/user exceeds the request processing time limit, responses will be delayed by an amount that brings the rate within the limit. The maximum delay applied will be the quota window size.
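
As an illustration of the accounting involved, the sketch below shows how time spent on a request handler thread might be converted into the percentage units used by the quota. The QuotaManager interface and method names here are hypothetical stand-ins, not the broker's actual API; only the unit conversion (10ms of one thread's time in a 1 second window equals 1%) comes from this proposal.

Code Block
languagejava
titleSketch: charging request handler time against the quota
import java.util.concurrent.TimeUnit;

public class RequestTimeCharging {

    // Hypothetical stand-in for the broker's quota manager: records usage in
    // percent units and returns the delay (ms) to apply to the response.
    interface QuotaManager {
        long recordAndGetThrottleTimeMs(String user, String clientId, double percent);
    }

    static long charge(QuotaManager quotaManager, String user, String clientId, Runnable handler) {
        long start = System.nanoTime();
        handler.run();                                  // work done on the request handler (I/O) thread
        long elapsedNanos = System.nanoTime() - start;
        // Convert elapsed thread time to a percentage of one thread per window:
        // 10ms of work within a 1 second window is 1%.
        double percent = 100.0 * elapsedNanos / TimeUnit.SECONDS.toNanos(1);
        return quotaManager.recordAndGetThrottleTimeMs(user, clientId, percent);
    }
}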

Default quotas

By default, clients will not be throttled based on I/O thread utilization, but defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels. Defaults as well as overrides are stored as dynamic configuration properties in Zookeeper alongside the other rate limits.
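
For example, a dynamic default request quota for all users could be configured with kafka-configs.sh; the command below assumes the --entity-default option used for existing default quotas, with an illustrative value of 100 (one full I/O thread):

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'request.percentage=100' --entity-type users --entity-default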

...

To ensure that these exempt requests cannot be used by clients to launch a DoS attack, these requests will be throttled on quota violation if ClusterAction authorization fails. SaslHandshake request will not be throttled when used for authentication, but will be throttled on quota violation if used at any other time.
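
The check described above might look like the following sketch. All types and names here are hypothetical (the KIP defines the actual list of exempt requests); the logic simply shows that an exempt request becomes throttleable when ClusterAction authorization fails.

Code Block
languagejava
titleSketch: quota exemption check
import java.util.Set;

// Hypothetical stand-in for the broker's authorizer.
interface Authorizer {
    boolean authorizeClusterAction(String principal);
}

class RequestQuotaExemption {
    // Illustrative subset of broker-internal requests exempt from request quotas.
    private static final Set<String> EXEMPT_APIS =
        Set.of("LeaderAndIsr", "StopReplica", "UpdateMetadata", "ControlledShutdown");

    private final Authorizer authorizer;

    RequestQuotaExemption(Authorizer authorizer) {
        this.authorizer = authorizer;
    }

    /** Returns true if the request should be throttled when the quota is violated. */
    boolean throttleOnViolation(String api, String principal) {
        if (!EXEMPT_APIS.contains(api))
            return true;  // ordinary client requests are always subject to the quota
        // An exempt request loses its exemption if ClusterAction authorization fails,
        // so it cannot be used by arbitrary clients for a DoS attack.
        return !authorizer.authorizeClusterAction(principal);
    }
}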

...

Two new metrics and corresponding sensors will be added to the broker for tracking request-time and throttle-time of each quota entity for the new quota type Request. These will be handled similarly to the metrics and sensors for Produce/Fetch. A delay queue sensor with queue-size for the new quota type Request will also be added, similar to the delay queue sensor for Produce/Fetch. All the metrics and sensors for request time throttling will be of similar format to the existing produce/fetch metrics and sensors for consistency, but with a new group/name indicating the new quota type Request, keeping these separate from existing metrics/sensors.

...

kafka-configs.sh will be extended to support request quotas.  A new quota property will be added, which can be applied to <client-id>, <user> or <user, client-id>:

  • request.percentage: The percentage of time per quota window (out of a total of (num.io.threads * 100)%) for requests from the user or client, above which the request may be throttled.

...

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'request.percentage=50' --entity-name user1 --entity-type users

...

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'request.percentage=200' --entity-type users

Protocol changes

...

Quotas for requests will be configured as the percentage of time within a quota window that a client is allowed to use across all of the I/O threads. The total I/O thread capacity of (num.io.threads * 100)% can be distributed between clients/users. For example, with the default configuration of a 1 second quota window size, if user alice has an I/O thread quota of 1%, the total time all clients of alice can spend in the request handler threads in any one second window is 10 milliseconds. When this time is exceeded, a delay is added to the response to bring alice's usage within the configured quota. The maximum delay added to any response will be the window size. The calculation of delay will be the same as the current calculation used for throttling produce/fetch requests:
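
The calculation itself is elided here; as a rough sketch under the assumption that the produce/fetch rate-over-window formula is reused unchanged (names hypothetical, delay capped at one window as described below):

Code Block
languagejava
titleSketch: delay calculation on quota violation
/**
 * Rough sketch of the throttle-time calculation reused from produce/fetch quotas.
 *
 * @param measuredPercent observed request handler usage over the window, in percent
 * @param quotaPercent    the configured request.percentage
 * @param windowMs        quota.window.size.seconds in milliseconds (1000 by default)
 */
static long throttleTimeMs(double measuredPercent, double quotaPercent, long windowMs) {
    if (measuredPercent <= quotaPercent)
        return 0;  // within quota: no delay
    long delayMs = (long) ((measuredPercent - quotaPercent) / quotaPercent * windowMs);
    // The delay applied to any single response is capped at one window.
    return Math.min(delayMs, windowMs);
}

For example, if alice's clients use 2% of I/O thread time in a window against a 1% quota, the computed delay is (2 - 1) / 1 * 1000ms, capped at the 1000ms window.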

...

The maximum throttle time for any single request will be the quota window size (one second by default). This ensures that timing-sensitive requests like heartbeats are not delayed for extended durations. For example, if a user has a quota of 0.1% and a stop-the-world GC pause takes 100ms during the processing of the user's request, we don't want all the requests from the user to be delayed by 100 seconds. By limiting the maximum delay, we reduce the impact of GC pauses and single large requests. To exploit this limit to bypass quota limits, clients would need to generate requests that take significantly longer than the quota limit. If R is the amount of time taken to process one request and the user has C active connections, the maximum amount of time a user/client can use per quota window is max(quota, C * R). In practice, quotas are expected to be much larger than the time taken to process individual requests and hence this limit is sufficient. Byte rate quotas will additionally help to increase throttling in the case where large produce/fetch requests result in larger per-request time. DoS attacks using large numbers of connections are not addressed in this KIP.

...

Code Block
languagejs
titleSample quota configuration
// Quotas for user1
// Zookeeper persistence path /config/users/<encoded-user1>
{
    "version":1,
    "config": {
        "producer_byte_rate":"1024",
        "consumer_byte_rate":"2048",
		"iorequest.thread.unitspercentage" : "0.150"
    }
}
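
The stored configuration can then be inspected with the existing describe option of kafka-configs.sh:

bin/kafka-configs --zookeeper localhost:2181 --describe --entity-type users --entity-name user1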

 

Co-existence of multiple quotas

...

Two new metrics and corresponding sensors will be added to track request-time and throttle-time of each quota entity for quota type Request. The request-time sensor will be configured with the quota for the user/client so that quota violations can be used to add delays to the response. Quota window configuration (quota.window.size.seconds) will be the same as the existing configuration for produce/fetch quotas: 1 second window with 11 samples retained in memory by default. A new delay queue sensor will also be added for quota type Request. All the new sensor names (Request-<quota-entity>, RequestThrottleTime-<quota-entity> and Request-delayQueue) are prefixed by the quota type, making these sensors consistent with existing sensors for Produce/Fetch. The new metrics will be in the metrics group Request, distinguishing these from similar metrics for Produce/Fetch byte rates.
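
As an illustration, a request-time sensor with the defaults above might be created through Kafka's metrics API roughly as follows. This is a sketch only: the entity name, group and quota value follow the naming scheme in this section, and the handling of a violation is elided.

Code Block
languagejava
titleSketch: request-time sensor with a quota
import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.metrics.MetricConfig;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Quota;
import org.apache.kafka.common.metrics.QuotaViolationException;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Rate;

public class RequestTimeSensorSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();

        // 11 samples of 1 second each, matching the produce/fetch quota window
        // defaults, with an upper bound of 50 (i.e. request.percentage=50).
        MetricConfig config = new MetricConfig()
                .timeWindow(1, TimeUnit.SECONDS)
                .samples(11)
                .quota(Quota.upperBound(50.0));

        // Sensor name prefixed by the quota type: Request-<quota-entity>.
        Sensor requestTime = metrics.sensor("Request-user1", config);
        requestTime.add(metrics.metricName("request-time", "Request",
                "Request handler time used by user1, in percent"), new Rate(TimeUnit.SECONDS));

        try {
            requestTime.record(30.0);  // percent of thread time used by a request
        } catch (QuotaViolationException e) {
            // Quota violated: compute a throttle time and delay the response.
        }
    }
}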

...

Future Work

As described in Scenarios 5 and 6, to control the broker resource utilization allocated to users/clients, both network thread utilization and I/O thread utilization should be limited by quotas. This KIP only addresses quotas for I/O thread utilization. Controlling network thread utilization is more complex and will be addressed in another KIP. The current quota implementation throttles requests by delaying the responses using Purgatory. This works for request handler thread utilization quotas, but we need to think through how this can be integrated into the network layer. Also, while request handlers have access to both user and client-id and can support quotas at <client-id>, <user> and <user, client-id> levels, the network layer does not have access to client-id.

Compatibility, Deprecation, and Migration Plan

What impact (if any) will there be on existing users?

  • None, since by default clients will not be throttled on request processing time.

If we are changing behavior how will we phase out the older behavior?

  • Quota limits for request processing time can be configured dynamically if required. Older versions of brokers will ignore request time quotas.
  • If request quotas are configured on the broker, throttle time will be returned in the response to clients only if the client supports the new version of requests being throttled.
  • If request quotas are configured, client produce/fetch throttle-time metrics will reflect total throttle time including bandwidth and utilization based throttling of these requests. The throttle time returned in produce/fetch responses will include this total throttle time.

Test Plan

One set of integration and system tests will be added for request throttling. Since most of the code can be reused from existing producer/consumer quota implementation and since quota tests take a significant amount of time to run, one test for testing the full path should be sufficient.

Rejected Alternatives

Use request rate instead of request processing time for quota bound

Produce and fetch quotas are configured as byte rates (e.g. 10 MB/sec) and enable throttling based on data volume. Requests could be throttled based on request rate (e.g. 10 requests/sec), making request quotas consistent with produce/fetch quotas. But the time taken for processing different requests can vary significantly, and since the goal of the KIP is to enable fair allocation of broker resources between users/clients, request processing time is a metric better suited to this quota.

Use request time percentage across all threads instead of per-thread percentage for quota bound

The KIP proposes to use a quota that specifies the total percentage of time within each quota window allocated to a client/user. This is out of a total capacity of (num.io.threads * 100)% since the time is measured across all I/O threads. An alternative would be to configure a relative percentage out of a fixed total capacity of 100. The absolute quota was chosen to avoid automatic changes to client quota values when num.io.threads is modified: with a relative quota, increasing num.io.threads from 8 to 16 would silently double the CPU time each client is entitled to. Since the number of I/O threads is typically based on the number of cores on the broker host, the per-thread quota percentage reflects the percentage of cores allocated to the client/user. This is consistent with other CPU quotas like cgroup and the way CPU usage is reported by commands like top.

Use fractional units of threads instead of percentage for quota bound

The KIP proposes to use a quota that specifies the total percentage of time within each quota window allocated to a client/user, out of a total capacity of (num.io.threads * 100)%. An alternative would be to express the quota as fractional units of I/O threads (out of a total of num.io.threads units). Percentage was chosen instead of fractional units of I/O threads so that the same request.percentage representing CPU utilization can continue to be applied when network thread utilization (and other threads) is added in future.

Allocate percentage of request handler pool as quota bound

...