Status

Current state: "Under Discussion"

Discussion thread: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka currently supports quotas by data volume. Clients that produce or fetch messages at a byte rate that exceeds their quota are throttled by delaying the response by an amount that brings the byte rate within the configured quota. However, if a client sends requests too quickly (e.g., a consumer with fetch.max.wait.ms=0), it can still overwhelm the broker even though individual request/response sizes may be small. It would be useful to additionally support throttling by request rate to ensure that broker resources are not monopolized by some users/clients.

Public Interfaces

Request rate quotas

The current produce and fetch quota limits are based on byte rate within a quota window. Request rate limits that specify the maximum number of requests per second will be applied on the same quota window configuration (quota.window.size.seconds with 1 second default). This approach keeps the code consistent with the existing quota implementation. Administrators can use the existing request rate metrics exposed by brokers to determine the rates to allocate to each client/user. If a client/user exceeds the request rate limit, responses will be delayed by an amount that brings the request rate within the limit. The maximum delay applied will be the quota window size.

Default quotas

By default, clients will not be throttled based on request rate, but defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels. Defaults as well as overrides are stored as dynamic configuration properties in Zookeeper alongside the other rate limits.

Requests that may be throttled

Requests that update cluster state will be throttled only if authorization for ClusterAction fails. These are infrequent requests for cluster management, typically not used by clients:

StopReplica
ControlledShutdown
LeaderAndIsr
UpdateMetadata

All other requests may be throttled if the rate exceeds the configured quota. The responses to all requests that may be throttled will have an additional field request_throttle_time_ms to indicate to the client that the request was throttled. The versions of these requests will be incremented.

Fetch and produce requests will continue to be throttled based on byte rates and may also be throttled based on request rates. Fetch requests used for replication will not be throttled based on request rates since it is possible to configure replica.fetch.wait.max.ms and use the existing replication byte rate quotas to limit replication rate.
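The rules in this section can be summarized as a small decision function. This is only a sketch of the rules as stated above: the function name, its parameters and the string request names are hypothetical and are not part of the broker code.

```python
# Requests that update cluster state, as listed in this section.
CLUSTER_ACTION_REQUESTS = frozenset(
    {"StopReplica", "ControlledShutdown", "LeaderAndIsr", "UpdateMetadata"}
)


def may_throttle_on_request_rate(request_name: str,
                                 cluster_action_authorized: bool,
                                 is_replica_fetch: bool = False) -> bool:
    """Return True if the request is subject to request rate throttling.

    Hypothetical helper that mirrors the rules in the text only.
    """
    if request_name in CLUSTER_ACTION_REQUESTS:
        # Throttled only if authorization for ClusterAction fails.
        return not cluster_action_authorized
    if request_name == "Fetch" and is_replica_fetch:
        # Replication fetches rely on replica.fetch.wait.max.ms and the
        # replication byte rate quotas instead.
        return False
    # All other requests may be throttled.
    return True


if __name__ == "__main__":
    print(may_throttle_on_request_rate("LeaderAndIsr", True))       # False
    print(may_throttle_on_request_rate("LeaderAndIsr", False))      # True
    print(may_throttle_on_request_rate("Fetch", False, True))       # False
    print(may_throttle_on_request_rate("Heartbeat", False))         # True
```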

Metrics and sensors

Two new metrics and corresponding sensors will be added to the broker for tracking request-rate and throttle-time of each quota entity for the new quota type Request. These will be handled similarly to the metrics and sensors for Produce/Fetch.

Clients will expose average and maximum request throttle time as JMX metrics similar to the current produce/fetch throttle time metrics.

Tools

kafka-configs.sh will be extended to support request quotas. A new quota property will be added, which can be applied to <client-id>, <user> or <user, client-id>:

request_rate : The maximum number of requests per second from the user or client above which requests are throttled

For example:

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'request_rate=100' --entity-name user1 --entity-type users

Default quotas for <client-id>, <user> or <user, client-id> can be configured by omitting entity name. For example:

bin/kafka-configs --zookeeper localhost:2181 --alter --add-config 'request_rate=1000' --entity-type users

Proposed Changes

Quota entity

Request quotas will be supported for <client-id>, <user> and <user, client-id> similar to the existing produce/fetch byte rate quotas. In addition to produce and fetch rates, an additional quota property will be added for request rate throttling. As with produce/fetch quotas, request quotas will be per-broker. Defaults can be configured using the dynamic default properties at <client-id>, <user> and <user, client-id> levels.

Request rate quotas

Quotas for requests will be configured as the number of requests per second that a client is allowed to use. When this rate is exceeded, a delay is added to the response to bring the user/client's usage within the configured quota. The maximum delay added to any response will be the window size. The calculation of delay will be the same as the current calculation used for throttling produce/fetch requests:

If O is the observed usage and T is the target usage over a window of W, then to bring O down to T we need to add a delay of X to W such that: O * W / (W + X) = T.
Solving for X, we get X = (O - T)/T * W.
The response will be throttled by min(X, W).
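As a worked example of the formula above, the delay can be computed as follows. This is a minimal sketch: the function name and the choice of milliseconds for W are assumptions made for illustration, and the zero-delay branch for O <= T is implied by the text rather than stated in it.

```python
def throttle_delay_ms(observed_rate: float, target_rate: float,
                      window_ms: float) -> float:
    """Delay per X = (O - T) / T * W, capped at the window size W."""
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    if observed_rate <= target_rate:
        return 0.0  # quota not violated, so no delay is added
    delay = (observed_rate - target_rate) / target_rate * window_ms
    return min(delay, float(window_ms))


if __name__ == "__main__":
    # 150 requests/s observed against a quota of 100 requests/s, W = 1000 ms
    print(throttle_delay_ms(150, 100, 1000))  # 500.0
    # 400 requests/s would need 3000 ms, which is capped at W
    print(throttle_delay_ms(400, 100, 1000))  # 1000.0
```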

Sample configuration in Zookeeper

Sample quota configuration

// Quotas for user1
// Zookeeper persistence path /config/users/<encoded-user1>
{
    "version":1,
    "config": {
        "producer_byte_rate":"1024",
        "consumer_byte_rate":"2048",
        "request_rate":"100"
    }
}

Co-existence of multiple quotas

Produce and fetch byte rate quotas will continue to be applied as they are today. Request rate throttling will be applied on top if necessary. For example, if a large number of small produce requests is followed by a very large one, the same request may violate both the request quota and the produce byte rate quota. The produce byte rate delay is applied first, and the request rate quota is checked only after that delay has been applied. At that point the request rate quota may no longer be violated, or the required delay may be smaller because of the delay already applied. Any remaining delay is then applied to the response.
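One simplified way to model this ordering is sketched below. It assumes that the second check behaves as if the byte rate delay had already stretched the window, so that the remaining request rate delay is the full request rate delay minus the delay already applied. That netting rule, the function names and the millisecond units are assumptions for illustration; the broker would re-evaluate its sensors rather than use this closed form.

```python
def throttle_delay_ms(observed: float, target: float, window_ms: float) -> float:
    """Delay per X = (O - T) / T * W, capped at the window size W."""
    if observed <= target:
        return 0.0
    return min((observed - target) / target * window_ms, float(window_ms))


def combined_delays_ms(byte_rate: float, byte_quota: float,
                       request_rate: float, request_quota: float,
                       window_ms: float = 1000.0) -> tuple:
    """Return (byte_rate_delay, remaining_request_rate_delay) in ms."""
    # Step 1: the produce/fetch byte rate delay is applied first.
    byte_delay = throttle_delay_ms(byte_rate, byte_quota, window_ms)
    # Step 2: the request rate quota is checked afterwards. Assumption: the
    # delay already applied counts towards the delay the request quota needs.
    request_delay = throttle_delay_ms(request_rate, request_quota, window_ms)
    remaining = max(0.0, request_delay - byte_delay)
    return byte_delay, remaining


if __name__ == "__main__":
    # Byte rate delay (250 ms) covers only part of the 500 ms needed.
    print(combined_delays_ms(1280, 1024, 150, 100))  # (250.0, 250.0)
    # Byte rate delay (1000 ms) already covers the request rate violation.
    print(combined_delays_ms(2048, 1024, 150, 100))  # (1000.0, 0.0)
```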

Metrics and sensors

Two new metrics and corresponding sensors will be added to track request-rate and throttle-time of each quota entity for quota type Request. The request-rate sensor will be configured with the quota for the user/client so that quota violations can be used to add delays to the response. Quota window configuration (quota.window.size.seconds) will be the same as the existing configuration for produce/fetch quotas: 1 second window with 11 samples retained in memory by default.
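To illustrate what a 1 second window with 11 retained samples means, the sketch below keeps per-window counts and expires the oldest ones. It is a deliberately simplified model, not the broker's metrics implementation: the class name, the expiry rule and the way elapsed time is computed are all assumptions.

```python
from collections import deque


class SampledRate:
    """Simplified windowed rate: fixed-size samples, oldest ones expire."""

    def __init__(self, window_s: float = 1.0, num_samples: int = 11):
        self.window_s = window_s
        self.num_samples = num_samples
        self.samples = deque()  # each entry is [sample_start_s, count]

    def _expire(self, now_s: float) -> None:
        # Drop samples that started more than num_samples windows ago.
        cutoff = now_s - self.window_s * self.num_samples
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def record(self, now_s: float, count: int = 1) -> None:
        self._expire(now_s)
        start = (now_s // self.window_s) * self.window_s
        if self.samples and self.samples[-1][0] == start:
            self.samples[-1][1] += count  # same window, accumulate
        else:
            self.samples.append([start, count])  # start a new sample

    def rate(self, now_s: float) -> float:
        """Requests per second over the retained samples."""
        self._expire(now_s)
        if not self.samples:
            return 0.0
        total = sum(count for _, count in self.samples)
        # Use at least one full window to avoid inflating very young samples.
        elapsed = max(now_s - self.samples[0][0], self.window_s)
        return total / elapsed


if __name__ == "__main__":
    r = SampledRate()
    for second in range(5):
        r.record(second + 0.5, 100)  # 100 requests in each of 5 windows
    print(r.rate(5.0))   # 100.0
    print(r.rate(20.0))  # 0.0, all samples have expired
```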

Metrics and sensors will be expired as they are today for Produce/Fetch quotas.

Compatibility, Deprecation, and Migration Plan

What impact (if any) will there be on existing users?

None, since by default clients will not be throttled on request rate.

If we are changing behavior how will we phase out the older behavior?

Quota limits for request rates can be configured dynamically if required. Older versions of brokers will ignore request rate quotas.
If request quotas are configured on the broker, throttle time will be returned in the response to clients only if the client supports the new version of requests being throttled.

Test Plan

One set of integration and system tests will be added for request throttling. Most of the code can be reused from the existing producer/consumer quota implementation, and quota tests take a significant amount of time to run, so a single test that exercises the full path should be sufficient.

Rejected Alternatives

Use request processing time instead of request rate for quota bound

Quota limits could be applied to each client as a percentage of the total request processing time available. This limits CPU usage on the broker per client/user within each quota window. But the time used for processing different requests is unpredictable and clients/applications will find it harder to control this time. Request rate is a simpler metric and keeps request rate quotas consistent with produce/fetch byte rate quotas. Administrators can configure request rates based on existing request rate metrics.

Allocate percentage of request handler pool as quota bound

An alternative to monitoring request rate would be to model the request handler pool as a shared resource and allocate a percentage of the pool capacity to each user/client. But since only one request is read into the pool from each connection, this would be a measure of the number of concurrent connections per user/client rather than the rate of usage (a single connection or a small number of connections can still overload the broker with a continuous sequence of requests). It would also be harder to compute the amount of time to delay a request when the bound is violated.

Use percentage of request rate rather than request rate for quota bound

Rather than an absolute request rate, we can use a percentage of maximum request rate and throttle clients only when the broker is running at full capacity. Request rate was chosen instead of percentage to keep the configuration and implementation consistent with existing produce/fetch byte rate quotas.

Exempt timing-sensitive requests such as consumer heartbeat

Consumer requests such as heartbeat are timing sensitive and it is possible that throttling these requests could result in the consumer falling out of the consumer group. However, if we don't throttle some requests, a misbehaving application or a misconfigured application with low heartbeat interval or a client library with a bug can cause broker overload. To protect the cluster from DoS attacks and avoid these overload scenarios, all client requests will be throttled. In most deployments, quotas will be set to a large enough value to provide a safety net rather than very small values that throttle well-behaved clients, so this shouldn't be an issue.

KIP-124 - Request rate quotas
