...

  • We can decrease the threadpool size used by the executor. Reducing the pool size would prevent too many concurrent reads. The downside is that we will have to tune the threadpool size depending on the hardware specifications of the cluster.
  • We extend the quota framework to support quota on remote reads. We could specify the maximum rate at which we want to read data from the remote storage. When a worker thread in the reader threadpool picks up a new read task, it checks if the read quota has been exceeded. If the quota has not been exceeded, it proceeds to execute the read task; otherwise, the read task must wait. This can be implemented in two ways:
    • Compute the throttle time using the quota framework and sleep for that amount of time before executing the request. This means that the threads of the threadpool will always look busy even if they are just waiting.
    • Instead of using a linked blocking queue in the ThreadPoolExecutor, we use a DelayQueue. The read task is re-queued with the computed throttle time as the delay.
      • DelayQueues are unbounded, so we will have to override it to make it bounded.
      • ThreadPoolExecutor needs a BlockingQueue<Runnable>, whereas a DelayQueue only accepts elements that implement Delayed, hence we cannot use a DelayQueue with it directly.
    • Alternatively, we compute the throttle time using the quota framework and use it to throttle the client instead. The throttle time is propagated back in the response payload and we use it to throttle further fetch requests from the client. This approach is rejected because it also throttles fetch requests for topic partitions that do not need remote data.
  • We extend the quota framework to support client-level read quotas. We could define both request and bandwidth quotas per client for remote storage read requests. If the client violates any of the configured quotas, we throttle the next request from the client by specifying the throttle time in the fetch response (Ref.) and muting the channel.
    • Allows us to throttle only the misbehaving clients and not all clients (including polite clients).
    • This approach is rejected because it also throttles fetch requests for topic partitions that do not need remote data.
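
The first implementation option for the remote read quota (compute the throttle time, then sleep in the worker thread before executing the read) could look roughly like the sketch below. It is only an illustration: RemoteReadQuota and ThrottledReadTask are hypothetical names, and the fixed-window byte counter is a stand-in assumption for the quota framework, not its actual API.

```java
/**
 * Hypothetical fixed-window byte quota. This is an assumed stand-in for the
 * quota framework, used only to produce a throttle time for the sketch.
 */
final class RemoteReadQuota {
    private final long maxBytesPerWindow;
    private final long windowMs;
    private long windowStartMs;
    private long bytesInWindow;

    RemoteReadQuota(long maxBytesPerWindow, long windowMs, long nowMs) {
        this.maxBytesPerWindow = maxBytesPerWindow;
        this.windowMs = windowMs;
        this.windowStartMs = nowMs;
    }

    /** Records bytes fetched from remote storage at time nowMs. */
    synchronized void record(long bytes, long nowMs) {
        rollWindow(nowMs);
        bytesInWindow += bytes;
    }

    /** Returns 0 if the quota is not exceeded, else the ms left in the current window. */
    synchronized long throttleTimeMs(long nowMs) {
        rollWindow(nowMs);
        if (bytesInWindow <= maxBytesPerWindow) {
            return 0;
        }
        return windowStartMs + windowMs - nowMs;
    }

    // Starts a fresh window once the current one has fully elapsed.
    private void rollWindow(long nowMs) {
        if (nowMs - windowStartMs >= windowMs) {
            windowStartMs = nowMs;
            bytesInWindow = 0;
        }
    }
}

/** Hypothetical read task that sleeps in the worker thread while the quota is exceeded. */
final class ThrottledReadTask implements Runnable {
    private final RemoteReadQuota quota;
    private final Runnable remoteRead;

    ThrottledReadTask(RemoteReadQuota quota, Runnable remoteRead) {
        this.quota = quota;
        this.remoteRead = remoteRead;
    }

    @Override
    public void run() {
        try {
            long waitMs;
            // The pool thread stays occupied while sleeping, which is the
            // drawback noted for this option: threads look busy while waiting.
            while ((waitMs = quota.throttleTimeMs(System.currentTimeMillis())) > 0) {
                Thread.sleep(waitMs);
            }
            remoteRead.run();
        } catch (InterruptedException e) {
            // Restore the interrupt flag and skip the read.
            Thread.currentThread().interrupt();
        }
    }
}
```

A task of this shape could be submitted to the existing reader threadpool unchanged, which is why this option needs no custom queue; the cost is that sleeping tasks hold pool threads.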

...