Status
Current state: "Under Discussion"
...
The current produce and fetch quota limits are based on byte rate within a quota window. Sensible values for request rates are harder to estimate when configuring quotas. While 5 MB/second byte rates for producer/consumer are meaningful, 10 requests/second is perhaps less meaningful as a limit. For simpler configuration, quotas for requests will be configured as a percentage of time within a quota window that a client is allowed to use. This approach keeps the code consistent with the existing quota implementation, while making it simpler for administrators to configure quotas for different clients/users.
...
By default, clients will not be throttled based on request rate, but defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels. Defaults as well as overrides are stored as dynamic configuration properties in Zookeeper alongside the other rate limits.
Requests that may be throttled
The following requests will not be throttled since they are timing-sensitive or are one-off requests to control brokers:
- StopReplica
- ControlledShutdown
- Heartbeat
- JoinGroup
- LeaveGroup
- SyncGroup
All other requests may be throttled if the rate exceeds the configured quota. All requests that may be throttled will have an additional field request_throttle_time_ms to indicate to the client that the request was throttled. The versions of these requests will be incremented.
Fetch and produce requests will continue to be throttled based on byte rates and may also be throttled based on request rates.
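The exemption rule above can be sketched as a simple lookup. This is only an illustration: the class name, the method name and the use of plain strings for request names are assumptions made here, not part of the proposal or of the broker code.

```java
import java.util.Set;

// Hypothetical helper: decides whether a request type is subject to request quotas.
final class ThrottleExemptions {

    // Request names copied from the exemption list above.
    // Representing them as strings is an assumption made for this sketch.
    static final Set<String> EXEMPT = Set.of(
            "StopReplica", "ControlledShutdown", "Heartbeat",
            "JoinGroup", "LeaveGroup", "SyncGroup");

    // Any request that is not explicitly exempt may be throttled.
    static boolean mayBeThrottled(String requestName) {
        return !EXEMPT.contains(requestName);
    }

    public static void main(String[] args) {
        System.out.println(mayBeThrottled("Heartbeat")); // false
        System.out.println(mayBeThrottled("Fetch"));     // true
    }
}
```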
Metrics and sensors
Two new metrics and corresponding sensors will be added to the broker to track the request-rate and throttle-time of each quota entity for the new quota type Request. These will be handled in the same way as the metrics and sensors for Produce/Fetch.
Clients will expose average and maximum request throttle time as JMX metrics similar to the current produce/fetch throttle time metrics.
Tools
kafka-configs.sh will be extended to support request quotas. A new quota property will be added, which can be applied to <client-id>, <user> or <user, client-id>:
...
Quotas for requests will be configured as a percentage of time within a quota window that a client is allowed to use. For example, with the default configuration of a 1 second quota window size and 8 I/O threads handling requests, the total time a broker can spend processing requests is 8 seconds across all the threads. If user alice has a request quota of 1 percent, the total time all clients of alice can spend in the request handler in any one second window is 80 milliseconds. When this time is exceeded, a delay is added to the response to bring alice’s usage within the configured quota. The maximum delay added to any response will be the window size. The calculation of delay will be the same as the current rate calculation used for throttling produce/fetch requests:
- If O is the observed usage and T is the target usage over a window of W, to bring O down to T, we need to add a delay of X to W such that: O * W / (W + X) = T.
- Solving for X, we get: X = (O - T)/T * W.
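The arithmetic above can be checked with a short sketch. It reproduces the worked example (1 second window, 8 I/O threads, 1 percent quota) and the delay formula X = (O - T)/T * W, capped at the window size. The class and method names are hypothetical, and the sketch assumes a target usage greater than zero; it is not the broker implementation.

```java
// Hypothetical sketch of the quota arithmetic; not the broker implementation.
final class RequestQuotaSketch {

    // Time budget per window in ms: quotaPercent of the total handler time
    // (windowMs * ioThreads) available across all I/O threads.
    static double budgetMs(double quotaPercent, long windowMs, int ioThreads) {
        return quotaPercent * windowMs * ioThreads / 100.0;
    }

    // Delay X = (O - T) / T * W, capped at the window size W.
    // Returns 0 when the observed usage is within the target.
    // Assumes target > 0.
    static long throttleDelayMs(double observed, double target, long windowMs) {
        if (observed <= target) {
            return 0L;
        }
        double delay = (observed - target) / target * windowMs;
        return Math.min(windowMs, Math.round(delay));
    }

    public static void main(String[] args) {
        double budget = budgetMs(1.0, 1000, 8);
        System.out.println(budget);                               // 80.0
        System.out.println(throttleDelayMs(120.0, budget, 1000)); // 500
        System.out.println(throttleDelayMs(400.0, budget, 1000)); // 1000 (capped)
    }
}
```

With a budget of 80 ms, an observed usage of 120 ms gives a delay of (120 - 80)/80 * 1000 = 500 ms, while an observed usage of 400 ms would give 4000 ms and is capped at the 1000 ms window.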
Sample configuration in Zookeeper
The version number for quota configuration will be increased from 1 to 2.

    // Quotas for user1
    // Zookeeper persistence path /config/users/<encoded-user1>
    {
        "version": 2,
        "config": {
            "producer_byte_rate": "1024",
            "consumer_byte_rate": "2048",
            "request_time_percent": "1.0"
        }
    }

...
What impact (if any) will there be on existing users?
- None, since by default clients will not be throttled on request rate.
If we are changing behavior how will we phase out the older behavior?
- By default, clients are not throttled on request rate.
- Quota limits for request rates can be configured dynamically if required. Older versions of brokers will ignore request rate quotas.
- If request quotas are configured on the broker, throttle time will be returned in the response to clients only if the client supports the new version of requests being throttled.
Test Plan
One set of integration and system tests will be added for request throttling. Since most of the code can be reused from existing producer/consumer quota implementation and since quota tests take a significant amount of time to run, one test for testing the full path should be sufficient.
Rejected Alternatives
Use request rate instead of percentage for quota bound
Produce and fetch quotas are configured as byte rates (e.g. 10 MB/sec) and enable throttling based on data volume. Requests could be throttled based on request rate (e.g. 10 requests/sec), making request quotas consistent with produce/fetch quotas. But it will be difficult for administrators to decide request rates to allocate to each user/client, or even default rates. Percentage setting makes it simpler to configure request rate limits.
Allocate percentage of request handler pool as quota bound
An alternative to measuring request time will be to model the request handler pool as a shared resource and allocate a percentage of the pool capacity to each user/client. But since only one request is read into the pool from each connection, this would be a measure of the number of concurrent connections per user/client rather than the rate of usage (a single or small number of connections can still overload the broker with a continuous sequence of requests). And it will be harder to compute the amount of time to delay a request when the bound is violated.
Use percentage of request rate rather than request time for quota bound
The current proposal uses System.nanoTime() to compute the time taken per request. Start time is already available as nanoTime(), but end time is currently only available as currentTimeMillis(), so another time measurement is required per-request. It may be possible to count requests/second instead and take a percentage of total requests/second (instead of %request time), enabling quotas only when the system is running at full capacity. Request time percentage was chosen since it is easier to configure and test.
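As a rough illustration of the chosen approach, the sketch below accumulates per-request time from System.nanoTime() readings and expresses it as a percentage of the total handler time in a window. The class, its fields and the single-window bookkeeping are assumptions made for this example; the actual proposal reuses the existing quota sensors.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: tracks handler time used by one quota entity in one window.
final class RequestTimeSketch {

    private long usedNanos = 0L;

    // Record one request from its start and end System.nanoTime() readings.
    void record(long startNanos, long endNanos) {
        usedNanos += endNanos - startNanos;
    }

    // Usage as a percentage of the total handler time in the window
    // (windowMs across ioThreads threads).
    double usedPercent(long windowMs, int ioThreads) {
        double capacityNanos = (double) TimeUnit.MILLISECONDS.toNanos(windowMs) * ioThreads;
        return 100.0 * usedNanos / capacityNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        RequestTimeSketch usage = new RequestTimeSketch();
        long start = System.nanoTime();
        Thread.sleep(5); // stand-in for request handling work
        usage.record(start, System.nanoTime());
        System.out.println(usage.usedPercent(1000, 8));
    }
}
```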