...

The recording level for all metrics will be INFO.

The following change to the {{RocksDBConfigSetter}} interface is needed to obtain a cache that is possibly instantiated in an implementation of the config setter. Unfortunately, RocksDB's Java API does not offer a way to retrieve the cache from a RocksDB instance. However, the metrics in this KIP need to know which cache a RocksDB instance uses in order to aggregate the values correctly over the segments of a segmented state store.

Code Block
languagejava
titleRocksDBConfigSetter.java
public interface RocksDBConfigSetter {

    void setConfig(final String storeName, final Options options, final Map<String, Object> configs);

    /**
     * Returns the cache possibly instantiated in the instance
     * that implements this interface.
     *
     * If no cache is instantiated, the default behavior does not
     * need to be overridden.
     *
     * This method is needed because RocksDB's Java API does not offer
     * any means to get a cache that is passed to the RocksDB instance
     * by the config setter.
     */
    default Cache cache() {        // new method
        return null;
    }

    default void close(final String storeName, final Options options) {
        LOG.warn("The default close will be removed in 3.0.0 -- you should overwrite it if you have implemented RocksDBConfigSetter");
    }
}
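
As an illustration, a config setter could plug a shared cache into every store and expose that cache through the new {{cache()}} method. The following sketch uses the RocksDB Java classes from {{org.rocksdb}}; the class name and the cache size are made up for illustration and are not part of this KIP:

```java
import java.util.Map;

import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;

// Hypothetical example implementation; not part of the proposal itself.
public class BoundedCacheConfigSetter implements RocksDBConfigSetter {

    // 16 MiB LRU cache shared by all RocksDB instances that are
    // configured through this config setter.
    private final Cache cache = new LRUCache(16 * 1024 * 1024L);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCache(cache);
        options.setTableFormatConfig(tableConfig);
    }

    // Expose the cache so that the metrics can aggregate values
    // correctly over the segments of a segmented state store.
    @Override
    public Cache cache() {
        return cache;
    }

    @Override
    public void close(final String storeName, final Options options) {
        // The cache is shared across stores, so it must not be closed
        // per store; close it when the application shuts down.
    }
}
```
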


Proposed Changes

In this section, we will explain the meaning of the metrics listed in the previous section. To better understand the metrics, some basic concepts of RocksDB need to be explained first.

  • Memtable: Memtables are in-memory write buffers. Each new key-value pair is first written to a memtable and each read looks first into the memtable before it looks on disk. Once a memtable is full it becomes immutable and it is replaced by a new memtable. A background thread flushes a memtable asynchronously to disk. Additionally, memtables can also be flushed manually. RocksDB keeps in memory the currently active memtables, full but not yet flushed memtables, and flushed memtables that are kept around to maintain write history in memory.
  • Compaction: From time to time RocksDB needs to clean up the data it stores on disk and bring its LSM tree into good shape (see https://github.com/facebook/rocksdb/wiki/Compaction). Compactions might block writes and flushes. Additionally, RocksDB offers different compaction algorithms with different properties. Thus, it is good practice to monitor compactions in RocksDB.
  • SST files: SST files are the files in which RocksDB stores the data on disk. SST stands for Sorted Sequence Table.
  • Version: A version consists of all the live SST files at a given point in time. Once a flush or compaction finishes, a new version is created because the list of live SST files has changed. An old version can still be used by ongoing read requests or compaction jobs. Old versions will eventually be garbage collected.
  • Cache: RocksDB caches data in memory for reads. By default, those caches contain only data blocks, i.e., uncompressed sequences of key-value pairs in sorted order. Therefore, this cache is often referred to as the block cache. However, users can configure RocksDB to also store index and filter blocks in the cache.
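
The cache concept matters for the metrics because several segments of a segmented state store may share one cache. A minimal plain-Java sketch of the required aggregation, where {{Cache}} and {{Segment}} are hypothetical stand-ins for the RocksDB cache and the store segments (not the real classes): each distinct cache object is counted only once, so a cache shared by all segments is not counted once per segment.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class BlockCacheAggregation {

    // Stand-in for a RocksDB cache with a fixed usage in bytes.
    static class Cache {
        final long usageBytes;
        Cache(final long usageBytes) { this.usageBytes = usageBytes; }
    }

    // Stand-in for one segment of a segmented state store.
    static class Segment {
        final Cache cache;
        Segment(final Cache cache) { this.cache = cache; }
    }

    // Sums the cache usage over all segments, counting each distinct
    // cache object exactly once (identity comparison). If all segments
    // share one cache, the result is that cache's usage rather than
    // usage * numberOfSegments.
    static long aggregatedBlockCacheUsage(final List<Segment> segments) {
        final Map<Cache, Boolean> seen = new IdentityHashMap<>();
        long total = 0;
        for (final Segment segment : segments) {
            if (seen.put(segment.cache, Boolean.TRUE) == null) {
                total += segment.cache.usageBytes;
            }
        }
        return total;
    }

    public static void main(final String[] args) {
        final Cache shared = new Cache(100);
        // Three segments sharing one cache: the usage is counted once.
        System.out.println(aggregatedBlockCacheUsage(
                Arrays.asList(new Segment(shared), new Segment(shared), new Segment(shared))));
        // Three segments with their own caches: the usages are summed.
        System.out.println(aggregatedBlockCacheUsage(
                Arrays.asList(new Segment(new Cache(100)), new Segment(new Cache(100)), new Segment(new Cache(100)))));
    }
}
```
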

...

  • affect backward-compatibility
  • deprecate public interfaces
  • need a migration plan other than adding the new metrics to its own monitoring component

Rejected Alternatives

Introduce configuration in Kafka Streams to name RocksDB properties to expose

Since all of the above metrics can be exposed as gauges, there should not be too much performance overhead because recording is only triggered when the metric is actually queried. We thought that the maintenance costs of a configuration would be higher than just exposing this set of RocksDB properties.