Throttling Algorithm

We propose to use a variation of the Token Bucket algorithm to achieve this behavior:

Let's define:

K: The number of tokens in the bucket
B: The maximum number of tokens that can be accumulated in the bucket (burst)
R: The rate at which we accumulate tokens
T: The last time K got updated

At the time now, we update the number of tokens in the bucket with:

K = min(K + (now - T) * R), B)

A partition mutation with N partitions is admitted iff K > 0, and updates K = K - N. The partition mutation request is rejected otherwise. The variation here is that we allow the number of tokens to become negative to accommodate any number of partitions for a topic.

leverage our existing Rate based throttling mechanism to behavior explained above but we will update its implementation to better cope with bursty workloads.

Let's take an example to illustrate how this works. Let's say that we want to guarantee an average rate R=5 mutations/sec while allowing a burst B=500 mutations. This translates to the following parameters for our current rate based quota:

Quota = 5
Samples = B / R = 100
Time Window = 1s (the default)

If a client sends a request to create 7 topics with 80 partitions each at the time T. It brings the average rate to 5.6 (7 * 80 / 100) which is above the quota so any new subsequent requests is rejected until the quota gets back to R. In theory, the client must wait 12 seconds ((5.6 - 5) / 5 * 100) to get back to R.

In practice, due to the sparse samples (one sample worth 560), the rate won't decrease until that sample is dropped and that will takes 101 seconds. To overcome this, we propose the update the implementation of our Rate metric to spread evenly the burst among the previous windows. So instead of having one sample equal to 560 in the last window, we will have 100 samples equal to 5.6. With this, dropping 12 samples brings the rate below the quota.

Altogether, a mutation with N partitions is admitted iff Average Rate < Q. The partition mutation request is rejected otherwise. The client must wait (Average Rate - Quota) / Quota * # Samples secondsIf the number of tokens decreases below 0 (K <= 0), a new operation with N partition mutations has to wait for R * ( -K + 1 ).

Controller Mutations Quota Manager

...

Code Block

language	js
linenumbers	true

{
  "apiKey": 20,
  "type": "response",
  "name": "DeleteTopicsResponse",
  // Version 1 adds the throttle time.
  //
  // Starting in version 2, on quota violation, brokers send out responses before throttling.
  //
  // Starting in version 3, a TOPIC_DELETION_DISABLED error code may be returned.
  //
  // Version 4 is the first flexible version.
  //
  // Version 5 adds the ErrorMessage field.
  "validVersions": "0-5",
  "flexibleVersions": "4+",
  "fields": [
    { "name": "ThrottleTimeMs", "type": "int32", "versions": "1+", "ignorable": true,
      "about": "The duration in milliseconds for which the request was throttled due to a quota violation, or zero if the request did not violate any quota." },
    { "name": "Responses", "type": "[]DeletableTopicResult", "versions": "0+",
      "about": "The results for each topic we tried to delete.", "fields": [
      { "name": "Name", "type": "string", "versions": "0+", "mapKey": true, "entityType": "topicName",
        "about": "The topic name" },
      { "name": "ErrorCode", "type": "int16", "versions": "0+",
        "about": "The deletion error, or 0 if the deletion succeeded." },
      { "name": "ErrorMessage", "type": "string", "versions": "5+", "nullableVersions": "5+", "ignorable": true,
        "about": "The error message, or null if there was no error." }
    ]}
  ]
} or null if there was no error." }
    ]}
  ]
}

New Broker Configurations

We propose to introduce the following new configuration in the Kafka broker:

Name	Type	Default	Description
controller.quota.window.num	Int	11	The number of samples to retain in memory for alter controller mutations replication quotas
controller.quota.window.size.seconds	Int	1	The time span of each sample for controller mutations quotas

New Quota Types

We propose the introduce the following new quota types in the Kafka Broker:

The maximum burst of mutations allowed at any given second.

Name	Type	Default	Description	controller_mutations_burst*	Long	Long.MaxValue	Type	Default	Description
controller_mutations_rate*	Long	Long.MaxValue	The rate at which mutations are accepted for the create topics request, the create partitions request and the delete topics request.

...

We propose to expose the following new metric in the Kafka Broker:

The remaining burst available in %. 100% means that the burst is fully available. 0% means that the burst is exhausted and thus mutations are not allowed.

Group	Name	Tags	Description	ControllerMutationsQuotaManager	available-burst	(user, client-id) - with the same rules used by existing quota metrics		Tags	Description
ControllerMutationsQuotaManager	rate	(user, client-id) - with the same rules used by existing quota metrics	The current rate.

...

By default, the upgrade should be transparent since the Admin Client will automatically retry QuotaViolationException and return it to the caller only if the retry timeout is reached. In this case, the caller must at minimum handle the RetryableException and retry. Handling retryable Exceptions is something that we can safely expect from clients.

Rejected Alternatives

Using Token Bucket Algorithm

Our initial idea was to use the Token Bucket Algorithm in lieu of the existing Rate based quota. While it is a bit easier to understand due to its explicit handling of the burst, we have finally decided against it in order to remain consistent with the existing quota.

Usage based Quota

We have considered using a usage based quota similarly to the request quota. The usage would mainly be measured as the CPU time taken by the operations. While this would the benefits to cover all the operations in the controller, we have decided against it for the following reasons:

...

Space shortcuts

Child pages

Versions Compared

Old Version 13

New Version 14

Key

Throttling Algorithm

Controller Mutations Quota Manager

New Broker Configurations

New Quota Types

Rejected Alternatives

Using Token Bucket Algorithm

Usage based Quota

Space shortcuts

Child pages

Page History

Versions Compared

Old Version 13

New Version 14

Key

Throttling Algorithm

Controller Mutations Quota Manager

New Broker Configurations

New Quota Types

Rejected Alternatives

Using Token Bucket Algorithm

Usage based Quota