...

Current state: "Under Discussion"

Discussion thread: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The current produce and fetch quota limits are based on byte rate within a quota window. Request rate limits that specify the maximum number of requests per second will be applied on the same quota window configuration (quota.window.size.seconds with 1 second default). This approach keeps the code consistent with the existing quota implementation, while making it simpler for administrators to configure quotas for different clients/users. Administrators can use the existing request rate metrics exposed by brokers to determine the rates to allocate to each client/user. If a client/user exceeds the request rate limit, responses will be delayed by an amount that brings the request rate within the limit. The maximum delay applied will be the quota window size.

Default quotas

By default, clients will not be throttled based on request rate, but defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels. Defaults as well as overrides are stored as dynamic configuration properties in Zookeeper alongside the other rate limits.

Requests that may be throttled

Requests that update cluster state will be throttled only if authorization for ClusterAction fails. These are infrequent requests for cluster management, typically not used by clients:

StopReplica
ControlledShutdown
LeaderAndIsr
UpdateMetadata

The following requests will not be throttled (unless authorization fails) since they are timing-sensitive or are one-off requests to control brokers:

...

Heartbeat
JoinGroup
LeaveGroup
SyncGroup

...

Fetch and produce requests will continue to be throttled based on byte rates and may also be throttled based on request rates. Fetch requests used for replication will not be throttled based on request rates since it is possible to configure replica.fetch.wait.max.ms and use the existing replication byte rate quotas to limit replication rate.

Metrics and sensors

Two new metrics and corresponding sensors will be added to the broker for tracking request-rate and throttle-time of each quota entity for the new quota type Request. These will be handled in the same way as the metrics and sensors for Produce/Fetch.

...

kafka-configs.sh will be extended to support request quotas. A new quota property will be added, which can be applied to <client-id>, <user> or <user, client-id>:

request_rate : The maximum number of requests per second from the user or client above which requests are throttled

For example:

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'request_rate=100' --entity-name user1 --entity-type users

...

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'request_rate=1000' --entity-type users

Proposed Changes

...

Request quotas will be supported for <client-id>, <user> and <user, client-id> similar to the existing produce/fetch byte rate quotas. In addition to produce and fetch rates, an additional quota property will be added for request rate throttling. As with produce/fetch quotas, request quotas will be per-broker. Defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels.

...

Quotas for requests will be configured as the number of requests per second that a client is allowed to use. When this rate is exceeded, a delay is added to the response to bring the user/client's usage within the configured quota. The maximum delay added to any response will be the window size. The calculation of delay will be the same as the current calculation used for throttling produce/fetch requests:

If O is the observed usage and T is the target usage over a window of W, to bring O down to T, we need to add a delay of X to W such that: O * W / (W + X) = T.
Solving for X, we get X = (O - T)/T * W.
The response will be throttled by min(X, W).
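
The calculation above can be sketched in a few lines of Python. This is a minimal illustration only, not the broker implementation; the function name throttle_delay and its parameter names are assumptions made for this example.

```python
def throttle_delay(observed, target, window):
    """Return the delay X = (O - T) / T * W, capped at the window size W.

    observed: observed usage O over the window (e.g. requests/second)
    target:   configured quota T, in the same unit as observed
    window:   window length W, in seconds
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if observed <= target:
        return 0.0  # quota not violated, so no delay is needed
    delay = (observed - target) / target * window
    return min(delay, window)  # never delay by more than one window
```

For instance, with a quota of 100 requests/second and a 1 second window, an observed rate of 150 requests/second gives a delay of 0.5 seconds, while an observed rate of 300 requests/second would give 2 seconds and is therefore capped at the 1 second window size.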

Sample configuration in Zookeeper

...

Sample quota configuration

// Quotas for user1
// Zookeeper persistence path /config/users/<encoded-user1>
{
    "version":1,
    "config": {
        "producer_byte_rate":"1024",
        "consumer_byte_rate":"2048",
        "request_rate":"100"
    }
}

Co-existence of multiple quotas

Produce and fetch byte rate quotas will continue to be applied as they are today. Request rate throttling will be applied on top if necessary. For example, if a large number of small produce requests are sent followed by a very large one, both the request quota and the produce byte rate quota may be violated by the same request. The produce byte rate delay is applied first. The request rate quota is checked only after the produce delay is applied. By then the request rate quota may no longer be violated (or the delay may be lower due to the first delay already applied). The remaining delay, if any, is applied to the response.
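
One possible reading of this ordering is sketched below. It assumes that the request rate delay follows the same formula as the produce/fetch delay, O * W / (W + X) = T, and that the produce delay already applied simply lengthens the interval over which the request rate is measured. Under those assumptions the request rate delay is reduced by the produce delay. The function remaining_request_delay and its parameters are hypothetical names for this illustration, not broker code.

```python
def remaining_request_delay(observed_rate, rate_quota, window, produce_delay):
    """Extra delay needed for the request rate quota after a produce delay.

    Solves observed_rate * window / (window + produce_delay + x) = rate_quota
    for x, then clamps the result to the range [0, window].
    """
    if rate_quota <= 0:
        raise ValueError("rate_quota must be positive")
    # Delay that would be needed if no produce delay had been applied.
    full_delay = (observed_rate - rate_quota) / rate_quota * window
    # The produce delay already counts towards bringing the rate down.
    remaining = full_delay - produce_delay
    return max(0.0, min(remaining, window))
```

With a quota of 100 requests/second, an observed rate of 150 requests/second and a 1 second window, a produce delay of 0.25 seconds leaves 0.25 seconds of request rate delay, and a produce delay of 0.5 seconds or more leaves none.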

...

Two new metrics and corresponding sensors will be added to track request-rate and throttle-time of each quota entity for quota type Request. The request-rate sensor will be configured with the quota for the user/client so that quota violations can be used to add delays to the response. Quota window configuration (quota.window.size.seconds) will be the same as the existing configuration for produce/fetch quotas: 1 second window with 11 samples retained in memory by default.

...

One set of integration and system tests will be added for request throttling. Since most of the code can be reused from existing producer/consumer quota implementation and since quota tests take a significant amount of time to run, one test for testing the full path should be sufficient.

Rejected Alternatives

Use request processing time instead of request rate for quota bound

Quota limits could be applied to each client as a percentage of the total request processing time available. This limits CPU usage on the broker per client/user within each quota window. But the time used for processing different requests is unpredictable and clients/applications will find it harder to control this time. Request rate is a simpler metric and keeps request rate quotas consistent with produce/fetch byte rate quotas. Administrators can configure request rates based on existing request rate metrics.

Allocate percentage of request handler pool as quota bound

An alternative to monitoring request rate would be to model the request handler pool as a shared resource and allocate a percentage of the pool capacity to each user/client. But since only one request is read into the pool from each connection, this would be a measure of the number of concurrent connections per user/client rather than the rate of usage (a single connection or a small number of connections can still overload the broker with a continuous sequence of requests). It would also be harder to compute the amount of time to delay a request when the bound is violated.

Use percentage of request rate rather than request rate for quota bound

Rather than an absolute request rate, we can use a percentage of maximum request rate and throttle clients only when the broker is running at full capacity. Request rate was chosen instead of percentage to keep the configuration and implementation consistent with existing produce/fetch byte rate quotas.
