...

  1. Clients send requests too quickly, e.g. a consumer with fetch.max.wait.ms=0 that polls continuously. Fetch byte rate quotas are not sufficient here; either request count quotas or request processing time quotas are required.
  2. A DoS attack from clients that overload brokers with continuous authorized or unauthorized requests. Either request count quotas or request processing time quotas that limit all unauthorized requests and all non-broker (client) requests are required.
  3. A client sends produce requests with compressed messages where decompression takes a long time, blocking a request handler thread. Request processing time quotas are required, since neither produce byte rate quotas nor request count quotas will be sufficient to limit the broker resources allocated to users/clients in this case.
  4. A consumer group starts with 10 instances and then increases to 20 instances. The number of requests may double, even though the load on the broker doesn't, since the number of partitions per fetch request has halved. Quotas based on request count per second may therefore not be easy to configure in this case.
  5. Some requests may use more of their quota on the network threads than on the request handler threads (e.g. disk reads for fetches happen on the network threads). While request handler time quotas limit the request rate in many of the cases above, a complete request rate quota solution also needs to take network thread utilization into account.

This KIP proposes to control request handler (I/O) thread utilization using request processing time quotas that limit the amount of time within each quota window that can be used by users/clients. Only I/O thread utilization will be taken into account in this KIP. Network thread utilization (Scenario 5) will be addressed separately in a future KIP since that is a lot more complex.

Public Interfaces

Request quotas

Request quotas will be configured as a fraction of time a client can spend on request handler (I/O) threads within each quota window. Each request handler thread is represented as one request handler unit, giving a total capacity of num.io.threads units. Each request quota will be the units allocated to a user/client.  The current produce and fetch quota limits are based on byte rate within a quota window, and quotas that specify a number of requests per second are not sufficient to control request handler thread utilization, since the time used by different requests can vary significantly. The limits will be applied to the same quota window configuration (quota.window.size.seconds, with a 1 second default) as the existing produce/fetch quotas. This approach keeps the code consistent with the existing quota implementation, while making it easy for administrators to allocate a slice of each quota window to users/clients to control request handler thread utilization on the broker. If a client/user exceeds the request processing time limit, responses will be delayed by an amount that brings the request rate within the limit. The maximum delay applied will be the quota window size. A worked example of the unit arithmetic follows.
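Below is a minimal sketch of that arithmetic, assuming an 8-thread broker and the default 1 second window; the class and variable names are illustrative, not part of the KIP.

// Minimal sketch of the io_thread_units arithmetic; names are illustrative.
public class IoThreadUnitsExample {
    public static void main(String[] args) {
        int numIoThreads = 8;        // num.io.threads: total capacity in units
        long windowMs = 1000;        // quota.window.size.seconds = 1

        double userQuotaUnits = 1.0; // io_thread_units allocated to a user

        // Time the user may spend on request handler threads per window:
        double allowedMsPerWindow = userQuotaUnits * windowMs;  // 1000 ms

        // Total request handler time the broker can hand out per window:
        double totalMsPerWindow = numIoThreads * windowMs;      // 8000 ms

        System.out.printf("user may use %.0f ms of %.0f ms per window%n",
                allowedMsPerWindow, totalMsPerWindow);
    }
}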

...

By default, clients will not be throttled based on I/O thread utilization, but defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels. Defaults as well as overrides are stored as dynamic configuration properties in Zookeeper alongside the other rate limits.

...

Fetch and produce requests will continue to be throttled based on byte rates and may also be throttled based on request handler thread utilization. Fetch requests used for replication will not be throttled based on request times since it is possible to configure replica.fetch.wait.max.ms and use the existing replication byte rate quotas to limit replication rate.

...

Two new metrics and corresponding sensors will be added to the broker for tracking request-time and throttle-time of each quota entity for the new quota type IOThread. These will be handled similarly to the metrics and sensors for Produce/Fetch.

...

kafka-configs.sh will be extended to support request quotas.  A new quota property will be added, which can be applied to <client-id>, <user> or <user, client-id>:

  • io_thread_units: The fractional units of time per quota window (out of a total of num.io.threads units) that requests from the user or client may use, above which requests may be throttled.

For example:

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'io_thread_units=1.0' --entity-name user1 --entity-type users

...

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'io_thread_units=2.0' --entity-type users
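The standard --describe mode of kafka-configs can be used to verify the configured overrides (shown here for illustration):

bin/kafka-configs --zookeeper localhost:2181 --describe --entity-type users --entity-name user1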

Proposed Changes

...

Request quotas will be supported for <client-id>, <user> and <user, client-id>, similar to the existing produce/fetch byte rate quotas.  In addition to produce and fetch rates, an additional quota property will be added for throttling based on I/O thread utilization. As with produce/fetch quotas, request quotas will be per-broker. Defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels.

Request quotas

Quotas for requests will be configured as a fraction of time within a quota window that a client is allowed to use across all of the I/O threads. The total I/O thread capacity of num.io.threads units can be distributed between clients/users. For example, with the default configuration of a 1 second quota window and 8 I/O threads handling requests, the total time a broker can spend processing requests is 8 seconds across all the threads. If user alice has an I/O thread quota of 0.1 units, the total time all clients of alice can spend in the request handler threads in any one second window is 100 milliseconds.  When this time is exceeded, a delay is added to the response to bring alice’s usage within the configured quota. The maximum delay added to any response will be the window size.  The calculation of the delay will be the same as the current calculation used for throttling produce/fetch requests, sketched below:
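As a sketch (the method name is illustrative, and the explicit one-window cap follows this KIP's rule rather than the produce/fetch code):

// Sketch of the throttle-time calculation reused from produce/fetch quotas;
// the method name is illustrative and the exact broker code may differ.
static long throttleTimeMs(double measuredUnits, double quotaUnits, long windowSizeMs) {
    // How far over the bound the client is, relative to the bound:
    double overQuotaFraction = (measuredUnits - quotaUnits) / quotaUnits;
    // Delay long enough that usage averaged over the window drops back to
    // the bound, capped at one window as this KIP specifies:
    return Math.min(Math.round(overQuotaFraction * windowSizeMs), windowSizeMs);
}

// Example: alice measured at 0.2 units against a 0.1 unit quota in a 1 second
// window: (0.2 - 0.1) / 0.1 * 1000 ms = 1000 ms, which is at the one-window cap.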

...

Sample quota configuration:

// Quotas for user1
// Zookeeper persistence path /config/users/<encoded-user1>
{
    "version":1,
    "config": {
        "producer_byte_rate":"1024",
        "consumer_byte_rate":"2048",
        "io_thread_units":"1.0"
    }
}

Co-existence of multiple quotas

Produce and fetch byte rate quotas will continue to be applied as they are today, and request processing time throttling will be applied on top where necessary. For example, if a large number of small produce requests are sent followed by a very large one, both the request time quota and the produce byte rate quota may be violated by the same request. The produce byte rate delay is applied first, and the request time quota is checked only after the produce delay has been taken into account. At that point the request time quota may no longer be violated (or may require a smaller delay because of the delay already applied). The remaining delay, if any, is applied to the response. A sketch of this composition follows.
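A minimal sketch of that composition, where the two delay inputs stand in for the broker's per-quota calculations (the method name is hypothetical):

// Hypothetical sketch: composing the produce byte-rate delay with the
// request-time delay; only the remainder of the latter is applied on top.
static long totalDelayMs(long produceDelayMs, long requestTimeDelayMs) {
    // The produce delay is applied first; the request-time quota then only
    // contributes whatever additional delay it still requires.
    long remainderMs = Math.max(0, requestTimeDelayMs - produceDelayMs);
    return produceDelayMs + remainderMs;  // equivalent to the max of the two
}

// totalDelayMs(400, 250) == 400  (request time quota already covered)
// totalDelayMs(400, 900) == 900  (an extra 500 ms applied on top)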

...

Two new metrics and corresponding sensors will be added to track request-time and throttle-time of each quota entity for quota type IOThread.  The request-time sensor will be configured with the quota for the user/client so that quota violations can be used to add delays to the response. Quota window configuration (quota.window.size.seconds) will be the same as the existing configuration for produce/fetch quotas: 1 second windows with 11 samples retained in memory by default. A sketch of the sensor setup follows.
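A sketch of how such sensors could be registered with Kafka's metrics library; the sensor names, tags and recorded values are illustrative, not the names the broker will use.

import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.MetricConfig;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Quota;
import org.apache.kafka.common.metrics.QuotaViolationException;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Rate;

// Illustrative sensor registration for quota type IOThread.
public class IoThreadSensorsExample {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();

        // 1 second windows, 11 samples, quota bound in I/O thread units.
        MetricConfig quotaConfig = new MetricConfig()
                .timeWindow(1, TimeUnit.SECONDS)
                .samples(11)
                .quota(Quota.upperBound(1.0));  // io_thread_units for this user

        // The request-time sensor carries the quota: recording handler time
        // beyond the bound throws QuotaViolationException, which triggers
        // computation of the response delay.
        Sensor requestTime = metrics.sensor("IOThread-user1:request-time", quotaConfig);
        requestTime.add(new MetricName("request-time", "IOThread",
                "I/O thread time used, in thread units", "user", "user1"),
                new Rate(TimeUnit.SECONDS));

        // The throttle-time sensor just records the delays that were applied.
        Sensor throttleTime = metrics.sensor("IOThread-user1:throttle-time");
        throttleTime.add(new MetricName("throttle-time", "IOThread",
                "Average throttle time in milliseconds", "user", "user1"),
                new Avg());

        try {
            requestTime.record(0.25);  // 250 ms of request handler time used
        } catch (QuotaViolationException e) {
            throttleTime.record(42);   // record the delay applied to the response
        }
    }
}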

...

As described in Scenario 5, to control broker resource utilization allocated to users/clients, both network thread utilization and request I/O thread utilization should be limited by quotas. This KIP only addresses quotas for request I/O thread utilization. Controlling network thread utilization is more complex and will be addressed in another KIP. The current quota implementation throttles requests by delaying the responses using Purgatory. This works for request handler thread utilization quotas, but we need to think through how this can be integrated into the network layer. Also, while request handlers have access to both user and client-id and can support quotas at <client-id>, <user> and <user, client-id> levels, the network layer does not have access to client-id.

...

  • None, since by default clients will not be throttled on request processing time.

If we are changing behavior, how will we phase out the older behavior?

  • Quota limits for request processing time can be configured dynamically if required. Older versions of brokers will ignore request time quotas.
  • If request quotas are configured on the broker, throttle time will be returned in the response to clients only if the client supports the new version of requests being throttled.

...

Produce and fetch quotas are configured as byte rates (e.g. 10 MB/sec) and enable throttling based on data volume. Requests could similarly be throttled based on request rate (e.g. 10 requests/sec), which would make request quotas consistent with produce/fetch quotas. But the time taken to process different requests can vary significantly, and since the goal of the KIP is to enable fair allocation of broker resources between users/clients, request processing time is a metric better suited to this quota.

Use request time percentage instead of absolute fractional units for quota bound

The KIP proposes to use a quota that specifies the absolute fraction of time within each quota window allocated to a client/user, out of a total capacity of num.io.threads units, since the time is measured across all I/O threads. An alternative would be to configure a relative percentage out of a fixed total capacity of 100. The absolute quota was chosen to avoid automatic changes to effective client quotas when num.io.threads is modified. The example below illustrates the difference.
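A worked comparison with illustrative numbers (the class and variable names are not from the KIP):

// Illustrative comparison: an absolute units quota stays fixed when
// num.io.threads changes, while a relative percentage quota would not.
public class AbsoluteVsPercentExample {
    public static void main(String[] args) {
        long windowMs = 1000;
        double quotaUnits = 0.8;     // absolute: 0.8 I/O thread units
        double quotaPercent = 10.0;  // alternative: 10% of total capacity

        for (int ioThreads : new int[] {8, 16}) {
            double absoluteMs = quotaUnits * windowMs;                     // 800 ms both times
            double percentMs = quotaPercent / 100 * ioThreads * windowMs;  // 800 ms, then 1600 ms
            System.out.printf("num.io.threads=%d: absolute=%.0f ms, percent=%.0f ms%n",
                    ioThreads, absoluteMs, percentMs);
        }
    }
}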

Allocate percentage of request handler pool as quota bound

...