Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The Remote Log Manager a.k.a. RLM is one of the critical components in Kafka that helps support tiered storage. One of its responsibilities is to periodically upload rolled-over log segments on the local disk to the remote storage. It discovers the log segments eligible to be uploaded and then uploads them. RLM creates one such task (RLMTask) for each topic partition managed by the server broker and schedules them to run at a fixed frequency (of 30 secs) using a ScheduledThreadPoolExecutor.

When an RLMTask identifies a list of segments to upload, it tries to upload all segments in the same run. As a result, if there are a huge number of eligible segments, the RLM task processes them as quickly as possible. Since there are multiple such tasks are running concurrently within the thread-pool executor, it ends they end up consuming significant CPU, thereby affecting producer latencies. This phenomenon is often noticed when we enable tiered storage for existing topics on a cluster. 

...

Similarly, the RLM also plays a role in serving read requests for log segments residing in the remote storage. When receiving a request for remote read, the RLM submits an async read task to its internal thread pool executor. The async task reads data from the remote storage which that is then sent back in the fetch response. 

A large number of read requests requesting data residing on the remote storage could cause degradation of the remote storage. The large number of read requests may also cause over-utilization of CPU on the broker. This may happen when the majority of the consumers start reading from the earliest offset of their respective Kafka topics. To prevent such degradation, we should have a mechanism to control the rate at which log segments are read from the remote storage. 

Goals

  • The administrator should be able to configure an "upper bound" on the rate at which log segments are uploaded to the remote storage.
  • The administrator should be able to configure an "upper bound" on the rate at which log segments are read from the remote storage.

...