...

RLM maintains a bounded cache (possibly LRU) of the index files of remote log segments to avoid fetching the same indexes from remote storage multiple times. These index files are stored in a `remote-log-index-cache` directory under the log directory, and they can be used in the same way as local segment indexes. Users can configure `remote.log.index.file.cache.total.size.mb` to set the total size that these index files may occupy.
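The eviction behaviour of such a bounded cache can be sketched with a `LinkedHashMap` in access order. This is a minimal sketch under stated assumptions. `RemoteIndexCacheSketch` is a hypothetical name, not a Kafka class. Entries are keyed by a hypothetical remote segment id and held in memory, whereas the cache described above keeps files on disk under `remote-log-index-cache`. The budget here is counted in bytes rather than MB.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Kafka code): an LRU cache of remote index data bounded by total bytes.
// Assumption: entries live in memory; the real cache stores index files on local disk.
class RemoteIndexCacheSketch {
    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder = true: iteration order runs from least- to most-recently used.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    RemoteIndexCacheSketch(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Returns the cached index, or null on a miss (the caller would then fetch it remotely).
    synchronized byte[] get(String segmentId) {
        return entries.get(segmentId);
    }

    synchronized void put(String segmentId, byte[] index) {
        byte[] previous = entries.put(segmentId, index);
        if (previous != null) {
            currentBytes -= previous.length;
        }
        currentBytes += index.length;
        // Evict least-recently-used entries until the total fits within the budget.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    // containsKey does not count as an access, so it leaves the LRU order unchanged.
    synchronized boolean contains(String segmentId) {
        return entries.containsKey(segmentId);
    }

    synchronized long sizeInBytes() {
        return currentBytes;
    }
}
```

For example, with a 10-byte budget, inserting three 4-byte indexes evicts whichever of the first two was used least recently. Access-ordered `LinkedHashMap` keeps both lookup and eviction cheap without a separate recency list.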

The earlier approach pulled remote log segment metadata through the remote storage APIs, as described in the earlier RemoteStorageManager_Old section. This worked well for storage systems like HDFS. One problem with relying on the remote storage to maintain metadata is that tiered storage then needs to be strongly consistent. This affects not only the metadata itself (e.g. LIST in S3) but also the segment data (e.g. GET after a DELETE in S3). The cost (and, to a lesser extent, the performance) of maintaining metadata in remote storage also needs to be factored in: in the case of S3, frequent LIST API calls incur significant costs.

...

System-Wide

remote.log.storage.enable - Whether to enable remote log storage. Valid values are `true` and `false`, and the default is `false`, which keeps the broker backward compatible.

remote.log.storage.manager.class.name - The RemoteStorageManager implementation class. This is mandatory if remote.log.storage.enable is set to true.

remote.log.metadata.manager.class.name (optional) - The RemoteLogMetadataManager implementation class. If this is not configured, Kafka uses a built-in metadata manager backed by an internal topic.
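The rules for these system-wide properties can be expressed as a small validation step. The sketch below is hypothetical and is not the broker's actual validation code; `RemoteStorageConfigCheck` is an illustrative name. It checks that `remote.log.storage.enable` is a boolean that defaults to false. It also checks that `remote.log.storage.manager.class.name` is present when remote storage is enabled. An unset metadata manager class is treated as a request for the built-in manager.

```java
import java.util.Properties;

// Hypothetical validation helper (not part of Kafka) for the system-wide remote storage configs.
class RemoteStorageConfigCheck {
    static final String ENABLE = "remote.log.storage.enable";
    static final String RSM_CLASS = "remote.log.storage.manager.class.name";
    static final String RLMM_CLASS = "remote.log.metadata.manager.class.name";

    // Returns an error message, or null if the configuration is consistent.
    static String validate(Properties props) {
        String value = props.getProperty(ENABLE, "false").trim();  // default: disabled
        boolean enabled;
        if (value.equalsIgnoreCase("true")) {
            enabled = true;
        } else if (value.equalsIgnoreCase("false")) {
            enabled = false;
        } else {
            return ENABLE + " must be true or false, got: " + value;
        }
        if (enabled && props.getProperty(RSM_CLASS, "").isBlank()) {
            return RSM_CLASS + " is mandatory when " + ENABLE + " is true";
        }
        return null;
    }

    // An unset metadata manager class means the built-in, topic-backed manager is used.
    static boolean usesBuiltInMetadataManager(Properties props) {
        return props.getProperty(RLMM_CLASS, "").isBlank();
    }
}
```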

RemoteStorageManager

(These configs depend on the RemoteStorageManager implementation.)

remote.log.storage.*

RemoteLogMetadataManager

(These configs depend on the RemoteLogMetadataManager implementation.)

remote.log.metadata.*
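The implementation-specific configs could be gathered by prefix and handed to the corresponding plugin. The sketch rests on two assumptions: the prefix is stripped before the configs are passed on, and the `s3.bucket` / `topic.partitions` keys used in the example are hypothetical. `PluginConfigs` is not a Kafka class. Note that the system-wide keys `remote.log.storage.enable`, `remote.log.storage.manager.class.name` and `remote.log.metadata.manager.class.name` share these prefixes, so the sketch skips them explicitly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper (not Kafka code): collects plugin-specific configs under a prefix.
// Assumption: the prefix is stripped before the map is handed to the plugin.
class PluginConfigs {
    // System-wide keys that happen to start with the plugin prefixes.
    private static final Set<String> SYSTEM_WIDE_KEYS = Set.of(
            "remote.log.storage.enable",
            "remote.log.storage.manager.class.name",
            "remote.log.metadata.manager.class.name");

    static Map<String, String> withPrefix(Properties props, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix) && !SYSTEM_WIDE_KEYS.contains(key)) {
                result.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return result;
    }
}
```

Usage: `PluginConfigs.withPrefix(brokerProps, "remote.log.storage.")` for the RemoteStorageManager, and `"remote.log.metadata."` for the RemoteLogMetadataManager.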

Thread pools
Remote log manager related configuration.

remote.log.index.file.cache.total.size.mb
The total size (in MB) of local storage allocated for index files fetched from remote storage.
Default value: 1024

remote.log.manager.thread.pool.size
Size of the remote log manager thread pool, which is used to schedule tasks that copy segments to remote storage and clean up remote log segments.
Default value: 10

remote.log.manager.task.interval.ms
The interval at which the remote log manager runs its scheduled tasks, such as copying segments and cleaning up remote log segments.
Default value: 30,000
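The two settings above could be combined to drive the periodic work with a standard `ScheduledExecutorService`. In this sketch, `remote.log.manager.thread.pool.size` sizes the pool and `remote.log.manager.task.interval.ms` sets the delay between runs. `RemoteLogTaskScheduler` and the per-partition task are hypothetical names. The use of fixed-delay scheduling is an assumption, not a statement of how RLM schedules its tasks.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not Kafka code) of scheduling periodic remote log manager tasks.
class RemoteLogTaskScheduler {
    private final ScheduledExecutorService pool;
    private final long intervalMs;

    RemoteLogTaskScheduler(int threadPoolSize, long taskIntervalMs) {
        this.pool = Executors.newScheduledThreadPool(threadPoolSize); // remote.log.manager.thread.pool.size
        this.intervalMs = taskIntervalMs;                             // remote.log.manager.task.interval.ms
    }

    // Runs a per-partition task (e.g. copy eligible segments, then clean up expired remote
    // segments) repeatedly, waiting intervalMs between the end of one run and the next start.
    ScheduledFuture<?> schedule(Runnable partitionTask) {
        return pool.scheduleWithFixedDelay(partitionTask, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    // Periodic tasks are cancelled on shutdown by default.
    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A fixed delay guarantees a gap of at least the configured interval between two runs of the same task, even when a copy takes longer than expected.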

remote.log.reader.threads
Size of the remote log reader thread pool, which is used to schedule tasks that fetch data from remote storage.
Default value: 5

remote.log.reader.max.pending.tasks
Maximum size of the remote log reader thread pool's task queue. If the task queue is full, the broker will stop reading remote log segments.
Default value: 100
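The two reader settings map naturally onto a `ThreadPoolExecutor` with a bounded queue. In this sketch, `remote.log.reader.threads` sets the thread count and `remote.log.reader.max.pending.tasks` sets the queue capacity. `RemoteReaderPool` is a hypothetical name. Rejecting new tasks is this sketch's assumed interpretation of "stop reading remote log segments" once the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not Kafka code) of a bounded remote-read pool.
class RemoteReaderPool {
    static ThreadPoolExecutor create(int readerThreads, int maxPendingTasks) {
        return new ThreadPoolExecutor(
                readerThreads, readerThreads,                 // remote.log.reader.threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(maxPendingTasks),    // remote.log.reader.max.pending.tasks
                // AbortPolicy throws RejectedExecutionException once every thread is busy and
                // the queue is full, so the caller can stop issuing remote reads.
                new ThreadPoolExecutor.AbortPolicy());
    }
}
```

With the defaults (5 threads, 100 pending tasks), the 106th concurrent read request would be rejected while the first 105 are still running or queued.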

Per Topic Configuration

Users can set the remote.log.storage.enable property when creating a topic, but it cannot be updated after the topic is created. Other remote.log.* properties can be modified. Support for flipping remote.log.storage.enable will be added in a future version.
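The update rule can be expressed as a simple check on topic config changes. `TopicConfigUpdateCheck` is a hypothetical name, not Kafka's actual code path. The check assumes that an unset topic-level remote.log.storage.enable means `false`. It rejects any request that changes the value and lets every other key through.

```java
import java.util.Map;

// Hypothetical sketch (not Kafka code): remote.log.storage.enable is fixed at topic creation.
class TopicConfigUpdateCheck {
    static final String ENABLE = "remote.log.storage.enable";

    static void validateUpdate(Map<String, String> current, Map<String, String> requested) {
        if (!requested.containsKey(ENABLE)) {
            return; // other properties may be modified freely
        }
        // Assumption: an unset value on the existing topic means "false".
        boolean currentValue = Boolean.parseBoolean(current.getOrDefault(ENABLE, "false"));
        boolean requestedValue = Boolean.parseBoolean(requested.get(ENABLE));
        if (currentValue != requestedValue) {
            throw new IllegalArgumentException(ENABLE + " cannot be updated after the topic is created");
        }
    }
}
```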

The retention configs below are similar to the existing log retention configs; they determine how long log segments are retained in local storage. The existing log.retention.* configs remain the retention configs for the whole topic partition, which includes both local and remote storage.

local.log.retention.ms
The number of milliseconds to keep a local log segment before it is deleted. If not set, the value of `log.retention.minutes` is used. If set to -1, no time limit is applied.

local.log.retention.bytes
The maximum total size that the local log segments of a partition can grow to before the oldest segments are deleted. There is no default value, but the time-based retention above always applies.
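The interplay of the two local retention limits can be sketched as a scan over segments from oldest to newest. This is a simplified sketch, not Kafka's retention logic. `LocalRetentionSketch` and its `Segment` record are hypothetical, the active segment is not modeled, and -1 is assumed to disable either limit. A segment is treated as deletable if it is older than `local.log.retention.ms` or if the partition's local size exceeds `local.log.retention.bytes`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Kafka code) of local retention; -1 disables a limit.
class LocalRetentionSketch {
    record Segment(long baseOffset, long sizeBytes, long largestTimestampMs) {}

    // segments are ordered oldest first; returns base offsets eligible for local deletion.
    static List<Long> deletable(List<Segment> segments, long localRetentionMs,
                                long localRetentionBytes, long nowMs) {
        long totalBytes = 0;
        for (Segment s : segments) {
            totalBytes += s.sizeBytes();
        }
        List<Long> result = new ArrayList<>();
        for (Segment s : segments) {
            boolean expired = localRetentionMs >= 0
                    && nowMs - s.largestTimestampMs() > localRetentionMs;
            boolean oversized = localRetentionBytes >= 0 && totalBytes > localRetentionBytes;
            if (!expired && !oversized) {
                break; // stop at the first segment that must be retained
            }
            result.add(s.baseOffset());
            totalBytes -= s.sizeBytes();
        }
        return result;
    }
}
```

Scanning oldest-first and stopping at the first retained segment keeps the remaining local log contiguous.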

...