Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Throttling Algorithm

We propose to use a variation of the Token Bucket algorithm to achieve this behavior:

Let's define:

  • K: The number of tokens in the bucket
  • B: The maximum number of tokens that can be accumulated in the bucket (burst)
  • R: The rate at which we accumulate tokens
  • T: The last time K got updated

At the time now, we update the number of tokens in the bucket with:

  • K = min(K + (now - T) * R), B)

A partition mutation with N partitions is admitted iff K > 0, and updates K = K - N. The partition mutation request is rejected otherwise. The variation here is that we allow the number of tokens to become negative to accommodate any number of partitions for a topic.

leverage our existing Rate based throttling mechanism to behavior explained above but we will update its implementation to better cope with bursty workloads.

Let's take an example to illustrate how this works. Let's say that we want to guarantee an average rate R=5 mutations/sec while allowing a burst B=500 mutations. This translates to the following parameters for our current rate based quota:

  • Quota = 5
  • Samples = B / R = 100
  • Time Window = 1s (the default) 

If a client sends a request to create 7 topics with 80 partitions each at the time T. It brings the average rate to 5.6 (7 * 80 / 100) which is above the quota so any new subsequent requests is rejected until the quota gets back to R. In theory, the client must wait 12 seconds ((5.6 - 5) / 5 * 100) to get back to R.

In practice, due to the sparse samples (one sample worth 560), the rate won't decrease until that sample is dropped and that will takes 101 seconds. To overcome this, we propose the update the implementation of our Rate metric to spread evenly the burst among the previous windows. So instead of having one sample equal to 560 in the last window, we will have 100 samples equal to 5.6. With this, dropping 12 samples brings the rate below the quota.

Altogether, a mutation with N partitions is admitted iff Average Rate < Q. The partition mutation request is rejected otherwise. The client must wait (Average Rate - Quota) / Quota * # Samples secondsIf the number of tokens decreases below 0 (K <= 0), a new operation with N partition mutations has to wait for R * ( -K + 1 ).

Controller Mutations Quota Manager

...

Code Block
languagejs
linenumberstrue
{
  "apiKey": 20,
  "type": "response",
  "name": "DeleteTopicsResponse",
  // Version 1 adds the throttle time.
  //
  // Starting in version 2, on quota violation, brokers send out responses before throttling.
  //
  // Starting in version 3, a TOPIC_DELETION_DISABLED error code may be returned.
  //
  // Version 4 is the first flexible version.
  //
  // Version 5 adds the ErrorMessage field.
  "validVersions": "0-5",
  "flexibleVersions": "4+",
  "fields": [
    { "name": "ThrottleTimeMs", "type": "int32", "versions": "1+", "ignorable": true,
      "about": "The duration in milliseconds for which the request was throttled due to a quota violation, or zero if the request did not violate any quota." },
    { "name": "Responses", "type": "[]DeletableTopicResult", "versions": "0+",
      "about": "The results for each topic we tried to delete.", "fields": [
      { "name": "Name", "type": "string", "versions": "0+", "mapKey": true, "entityType": "topicName",
        "about": "The topic name" },
      { "name": "ErrorCode", "type": "int16", "versions": "0+",
        "about": "The deletion error, or 0 if the deletion succeeded." },
      { "name": "ErrorMessage", "type": "string", "versions": "5+", "nullableVersions": "5+", "ignorable": true,
        "about": "The error message, or null if there was no error." }
    ]}
  ]
} or null if there was no error." }
    ]}
  ]
}

New Broker Configurations

We propose to introduce the following new configuration in the Kafka broker:

NameTypeDefaultDescription
controller.quota.window.numInt11

The number of samples to retain in memory for alter controller mutations replication quotas

controller.quota.window.size.secondsInt1

The time span of each sample for controller mutations quotas

New Quota Types

We propose the introduce the following new quota types in the Kafka Broker:

The maximum burst of mutations allowed at any given second.
NameTypeDefaultDescriptioncontroller_mutations_burst*LongLong.MaxValueTypeDefaultDescription
controller_mutations_rate*LongLong.MaxValue

The rate at which mutations are accepted for the create topics request, the create partitions request and the delete topics request.

...

We propose to expose the following new metric in the Kafka Broker:

The remaining burst available in %. 100% means that the burst is fully available. 0% means that the burst is exhausted and thus mutations are not allowed.
GroupNameTagsDescriptionControllerMutationsQuotaManageravailable-burst(user, client-id) - with the same rules used by existing quota metricsTagsDescription
ControllerMutationsQuotaManagerrate(user, client-id) - with the same rules used by existing quota metricsThe current rate.

...

  • By default, the upgrade should be transparent since the Admin Client will automatically retry QuotaViolationException and return it to the caller only if the retry timeout is reached. In this case, the caller must at minimum handle the RetryableException and retry. Handling retryable Exceptions is something that we can safely expect from clients.

Rejected Alternatives

Using Token Bucket Algorithm

Our initial idea was to use the Token Bucket Algorithm in lieu of the existing Rate based quota. While it is a bit easier to understand due to its explicit handling of the burst, we have finally decided against it in order to remain consistent with the existing quota.

Usage based Quota

We have considered using a usage based quota similarly to the request quota. The usage would mainly be measured as the CPU time taken by the operations. While this would the benefits to cover all the operations in the controller, we have decided against it for the following reasons:

...