...

In this KIP, we will discuss a proposal to implement quotas in Kafka. We are proposing an approach that can be used for both producer- and consumer-side quotas.

Public Interfaces

  1. Metrics - The quota metrics, which will be captured on a per-clientId basis, will be exposed via JMX. These are new metrics and do not use Codahale; instead, they use KM (Kafka Metrics), a new metrics library written for Kafka. More details are in the Metrics section below; a sketch of how such a metric might be registered follows this list.
  2. Client Response - Clients can optionally handle a flag in the response indicating whether they were throttled. This serves to inform users about their quota status; it is not an error code.
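
To make the Metrics point concrete, here is a minimal sketch, assuming the org.apache.kafka.common.metrics package, of how a per-clientId quota sensor could be registered, exposed over JMX, and recorded against. The sensor name, metric group, and the 10 MB/s bound are illustrative, not the final broker wiring.

```java
import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.metrics.JmxReporter;
import org.apache.kafka.common.metrics.MetricConfig;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Quota;
import org.apache.kafka.common.metrics.QuotaViolationException;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Rate;

public class ClientQuotaSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        // Expose all registered metrics via JMX (prefix is illustrative).
        metrics.addReporter(new JmxReporter("kafka.server"));

        // Hypothetical per-client sensor: byte rate measured over a
        // 1-second window, bounded at 10 MB/s.
        MetricConfig config = new MetricConfig()
                .timeWindow(1, TimeUnit.SECONDS)
                .quota(Quota.upperBound(10.0 * 1024 * 1024));
        Sensor sensor = metrics.sensor("produce:clientA", config);
        sensor.add(metrics.metricName("byte-rate", "client-quotas",
                "Bytes/sec produced by clientA"), new Rate());

        try {
            sensor.record(512 * 1024); // record 512 KB of produced data
        } catch (QuotaViolationException e) {
            // Recording past the bound throws; this is the hook where the
            // broker would throttle rather than return an error.
        }
        metrics.close();
    }
}
```

Throttling at the recording hook rather than failing the request is what the flag in the Client Response item above reports back to the client.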

Proposed Changes

The changes are mainly focused on the quota policy, distribution, metrics, quota actions, and config management.

...

Topic Based Quotas - We initially considered topic-based quotas. This was rejected in favor of client-based quotas, since it seemed more natural to throttle clients than topics. Multiple producers/consumers can publish to or fetch from a single topic, which makes it hard to isolate the interference of badly behaved clients. Topic-based quotas are also harder to reason about when clients access multiple topics (e.g. a wildcard consumer): if the quota for one topic is violated, how do we handle requests for the other topics?

Static Configs - Initially, our discussions assumed we would use the current configuration scheme to manage quota overrides. However, changing an override would require bouncing the cluster, which is a significant operational penalty.

Quota Distribution 

A) Instead of throttling by clientId, we could throttle by the combination (clientId, topic). The default quota would also be defined on this tuple, i.e. 10 MB/s for each (clientId, topic). The obvious downside is that a wildcard consumer can allocate an almost unbounded quota for itself by subscribing to all topics, and producers can likewise be allocated very large quotas based on the number of topics they publish to. In practice, this approach would need to be paired with a cap on the maximum amount of data a client can publish or consume, but such a cluster-wide cap again requires global coordination, which, as mentioned earlier, is out of scope. A sketch of this per-tuple scheme follows.
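
For illustration, here is a minimal sketch of the rejected per-(clientId, topic) scheme, again assuming the Kafka Metrics API; the sensor naming and the 10 MB/s default are hypothetical. Note how each new topic a client touches mints a fresh sensor carrying its own full quota, which is the unbounded-growth problem described above.

```java
import java.util.concurrent.TimeUnit;
import org.apache.kafka.common.metrics.MetricConfig;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Quota;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Rate;

public class TupleQuotaSketch {
    private static final double DEFAULT_BYTES_PER_SEC = 10.0 * 1024 * 1024;
    private final Metrics metrics = new Metrics();

    // One sensor per (clientId, topic): a wildcard consumer touching N
    // topics accumulates N * 10 MB/s of aggregate quota, so the scheme
    // needs an additional cluster-wide cap to be safe.
    public Sensor sensorFor(String clientId, String topic) {
        String name = "fetch:" + clientId + ":" + topic;
        Sensor sensor = metrics.getSensor(name);
        if (sensor == null) {
            MetricConfig config = new MetricConfig()
                    .timeWindow(1, TimeUnit.SECONDS)
                    .quota(Quota.upperBound(DEFAULT_BYTES_PER_SEC));
            sensor = metrics.sensor(name, config);
            sensor.add(metrics.metricName("byte-rate", "client-quotas",
                    "Bytes/sec for " + name), new Rate());
        }
        return sensor;
    }
}
```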

...