...

The proposal is to throttle based on client IDs. Any client using the system (producer or consumer) presents a client id. Each client will receive a default quota, which can be overridden dynamically on a per-client basis. We will use "clients" and "users" interchangeably in the rest of the document. In addition, there will be a quota reserved for clients that do not present a client id (e.g., simple consumers that do not set one). These clients default to the empty client id (""), and all of them share the quota for that empty id, which should be the default quota.

A quota will be defined in terms of read/write bytes per second. Any user that has just joined the cluster will receive a default quota per broker (e.g., 10 MBps read, 5 MBps write). We do expect some high-volume clients to require more than the default quota, so we need to provide a mechanism to override their quotas. In short, we are proposing fixed quotas for everyone except the top k outliers, which can justify custom quotas. If users violate their quota, we will throttle their fetch/produce requests.
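The lookup rules above can be illustrated with a minimal Python sketch. A client gets its per-client override if one exists and the default otherwise, and clients without an id share the empty-id entry. The Quota and QuotaResolver names, the method names, and the use of decimal megabytes are hypothetical assumptions for this example and are not part of the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass

MB = 1_000_000  # assumption: decimal megabytes


@dataclass(frozen=True)
class Quota:
    """Hypothetical per-broker limits for one client, in bytes per second."""
    read_bytes_per_sec: int
    write_bytes_per_sec: int


class QuotaResolver:
    """Returns a client's override if one is configured, else the default quota."""

    def __init__(self, default: Quota) -> None:
        self.default = default
        self.overrides: dict[str, Quota] = {}

    def set_override(self, client_id: str, quota: Quota) -> None:
        self.overrides[client_id] = quota

    def quota_for(self, client_id: str | None) -> Quota:
        # Clients that present no id are mapped to the empty id "", so they
        # all share whatever quota applies to "" (the default unless overridden).
        key = client_id or ""
        return self.overrides.get(key, self.default)


# Example: 10 MBps read / 5 MBps write by default, plus one high-volume outlier.
resolver = QuotaResolver(Quota(read_bytes_per_sec=10 * MB, write_bytes_per_sec=5 * MB))
resolver.set_override("bulk-loader", Quota(read_bytes_per_sec=50 * MB, write_bytes_per_sec=20 * MB))
```

Only the outliers need explicit entries. Every other client id, including the empty one, falls through to the single default.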

Producer-side quotas are defined in terms of bytes written per second per client id. Consumer quotas are defined in terms of bytes read per second per client id. For example, if a client deletes its consumer offsets and "bootstraps" quickly, this should cause a "read bytes per second" violation and throttle (slow down) that consumer group. It should have no effect on any other clients.

These metrics should be aggregated over a short period of time (5-10 seconds) before we declare a quota violation. This reduces the likelihood that a short burst of traffic is flagged as a violation. In addition, replication traffic will be exempt from quotas.
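The following is a rough sketch of aggregating a client's byte counts over a sliding window before declaring a violation. It assumes one-second buckets, a 10-second window, and timestamps supplied by the caller. WindowedRate and violates are hypothetical names. A real broker would presumably keep one such meter per client id and per direction (produce vs. fetch).

```python
from __future__ import annotations

from collections import deque


class WindowedRate:
    """Bytes/second averaged over a sliding window of one-second buckets.

    Hypothetical helper: the proposal only says metrics are aggregated over
    5-10 seconds; the bucket layout here is an illustrative assumption.
    Timestamps are assumed to be non-decreasing.
    """

    def __init__(self, window_secs: int = 10) -> None:
        self.window_secs = window_secs
        self.buckets: deque[tuple[int, int]] = deque()  # (second, bytes)

    def _expire(self, now_sec: int) -> None:
        # Keep only buckets inside the window (now_sec - window_secs, now_sec].
        while self.buckets and self.buckets[0][0] <= now_sec - self.window_secs:
            self.buckets.popleft()

    def record(self, num_bytes: int, now: float) -> None:
        sec = int(now)
        self._expire(sec)
        if self.buckets and self.buckets[-1][0] == sec:
            # Same second as the newest bucket: accumulate into it.
            last_sec, total = self.buckets[-1]
            self.buckets[-1] = (last_sec, total + num_bytes)
        else:
            self.buckets.append((sec, num_bytes))

    def rate(self, now: float) -> float:
        self._expire(int(now))
        # Divide by the full window so one busy second is averaged out.
        return sum(b for _, b in self.buckets) / self.window_secs


def violates(meter: WindowedRate, quota_bytes_per_sec: int, now: float) -> bool:
    """A violation is declared only when the windowed average exceeds the quota."""
    return meter.rate(now) > quota_bytes_per_sec
```

The window average divides by the full window length. As a result, a single one-second burst of 40 MB against a 5 MBps quota averages to 4 MBps and is not flagged, while sustained traffic above the quota is. A side effect of this choice is that a brand-new client's rate is underestimated during its first window.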

...

We need a mechanism to distribute a client's quota across all the brokers in a cluster. There are several options available; they are described in the rejected alternatives section at the bottom.

Our proposal is to define the bandwidth on a per-broker basis. Each client can publish a maximum of X MBps (configurable) per broker before it gets throttled. This approach assumes that client requests are evenly distributed among all the broker instances, which may not always be the case. However, it is much better than a fixed cluster-wide bandwidth per client, because that would require a mechanism for sharing current per-client quota usage among all the brokers. Such a mechanism can be very tricky to implement and is outside the scope of this proposal.
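A small worked example shows why the per-broker approach depends on an even distribution of requests. The sketch below computes the highest total rate a client can reach before any single broker throttles it. It assumes that each broker enforces the quota independently and shares no usage data with the others. max_aggregate_rate and the relative traffic weights are illustrative, not part of the proposal.

```python
from __future__ import annotations

from collections.abc import Mapping


def max_aggregate_rate(per_broker_quota: float, traffic_weights: Mapping[str, float]) -> float:
    """Highest total rate before *some* broker throttles the client.

    traffic_weights maps a broker id to the relative amount of the client's
    traffic that broker receives (hypothetical input for illustration).
    """
    total = sum(traffic_weights.values())
    if total <= 0:
        raise ValueError("client sends no traffic")
    # Broker b sees total_rate * weight[b] / total. It throttles once that
    # exceeds the quota, so the broker with the largest weight binds first.
    heaviest = max(traffic_weights.values())
    return per_broker_quota * total / heaviest


even = max_aggregate_rate(5, {f"broker-{i}": 1 for i in range(5)})
skewed = max_aggregate_rate(5, {"broker-0": 6, "broker-1": 1, "broker-2": 1,
                                "broker-3": 1, "broker-4": 1})
```

With X = 5 MBps and 5 brokers receiving equal traffic, the client can reach 25 MBps in total. If one broker receives 6 of every 10 bytes, that broker throttles the client once the total reaches 50/6 ≈ 8.3 MBps. At that point the other four brokers are each seeing well under 1 MBps.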


Configuration Management

How do we manage the quota overrides and the default quota configs? Manually configuring each broker with these is painful, so the ability to change configs dynamically without bouncing brokers is very useful here. There is already a proposal/patch for dynamic configuration management by Joe Stein, which we plan to leverage to distribute these quota configs. In the future, we also need to think about restricting access to these configs (so that customers cannot modify their own quotas), but that is a separate discussion.
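The sketch below shows one way a broker might apply quota override changes at runtime without a restart. It assumes that the dynamic configuration mechanism delivers per-client change notifications. on_change and DynamicQuotaConfig are hypothetical stand-ins, because the interface of that mechanism is not specified here. Updates use copy-on-write, so request-handling threads can read quotas without taking a lock.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    """Hypothetical per-broker limits, in bytes per second."""
    read_bytes_per_sec: int
    write_bytes_per_sec: int


class DynamicQuotaConfig:
    """Quota overrides that can change while the broker keeps running."""

    def __init__(self, default: Quota) -> None:
        self._default = default
        self._lock = threading.Lock()  # serializes writers only
        self._overrides: dict[str, Quota] = {}  # replaced wholesale, never mutated

    def on_change(self, client_id: str, quota: Quota | None) -> None:
        # Hypothetical notification hook. Build a new dict, then swap it in,
        # so readers always see either the old or the new complete snapshot.
        with self._lock:
            updated = dict(self._overrides)
            if quota is None:
                updated.pop(client_id, None)  # override deleted: default applies
            else:
                updated[client_id] = quota
            self._overrides = updated

    def quota_for(self, client_id: str) -> Quota:
        # Lock-free read of the current snapshot.
        return self._overrides.get(client_id, self._default)
```

Deleting an override (quota=None) immediately returns the client to the default quota, without restarting the broker.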

...