
Status

Current state: Accepted

Discussion thread: http://mail-archives.apache.org/mod_mbox/kafka-dev/202005.mbox/%3CCADR0NwzJBJa6WihnpmGj0R%2BYPVrojq4Kg_hOArNEytHAG-tZAQ%40mail.gmail.com%3E

JIRA: Unable to render Jira issues macro, execution error.

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Monitoring the memory used by the RocksDB instances in a Kafka Streams application would allow users to react to increased memory demand by RocksDB before the application runs out of memory. Although the memory used by RocksDB can be bounded in Kafka Streams, the bound is not a hard limit. Currently, the metrics exposed by Kafka Streams include information about the RocksDB instances run by an application (see KIP-471: Expose RocksDB Metrics in Kafka Streams for more details), but they do not provide any information about the memory usage of those instances. This KIP proposes to add metrics to Kafka Streams that record the memory used by RocksDB.


Public Interfaces

Each added metric will be on store-level and have the following tags:

  • type = stream-state-metrics
  • thread-id = [thread ID]
  • task-id = [task ID]
  • rocksdb-state-id = [store ID]    for key-value stores
  • rocksdb-session-state-id = [store ID]    for session stores
  • rocksdb-window-state-id = [store ID]    for window stores  
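When metrics are exposed over JMX, tags such as the ones above typically become key properties of the MBean name. The following is a minimal sketch of how such a name and a matching query pattern could be built with the JDK's `javax.management.ObjectName`. It assumes the JMX domain `kafka.streams`; the class name and the thread, task, and store IDs are hypothetical.

```java
import java.util.Hashtable;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class StoreMetricNames {
    // Assumption: JMX domain under which Kafka Streams metrics are registered.
    static final String DOMAIN = "kafka.streams";

    /** Builds the MBean name for one store; storeTag is e.g. "rocksdb-state-id". */
    static ObjectName storeName(String threadId, String taskId, String storeTag, String storeId) {
        Hashtable<String, String> tags = new Hashtable<>();
        tags.put("type", "stream-state-metrics");
        tags.put("thread-id", threadId);
        tags.put("task-id", taskId);
        tags.put(storeTag, storeId);
        try {
            return new ObjectName(DOMAIN, tags);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Pattern that matches the store-level MBeans of every thread, task, and store. */
    static ObjectName allStoresPattern() {
        try {
            return new ObjectName(DOMAIN + ":type=stream-state-metrics,*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical IDs, for illustration only.
        ObjectName name = storeName("my-app-StreamThread-1", "0_0", "rocksdb-state-id", "my-store");
        System.out.println(name.getCanonicalName());
        System.out.println(allStoresPattern().apply(name));
    }
}
```

The pattern could be passed to `MBeanServerConnection.queryNames` to discover all store-level MBeans without knowing thread or task IDs in advance.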

The following metrics will be exposed in Kafka Streams' metrics:

  • num-immutable-mem-table
  • mem-table-flush-pending
  • compaction-pending
  • background-errors
  • cur-size-active-mem-table
  • cur-size-all-mem-tables
  • size-all-mem-tables
  • num-entries-active-mem-table
  • num-entries-imm-mem-tables
  • num-deletes-active-mem-table
  • num-deletes-imm-mem-tables
  • estimate-num-keys
  • estimate-table-readers-mem
  • num-live-versions
  • estimate-live-data-size
  • total-sst-files-size
  • live-sst-files-size
  • estimate-pending-compaction-bytes
  • num-running-compactions
  • num-running-flushes
  • actual-delayed-write-rate
  • estimate-oldest-key-time
  • block-cache-capacity
  • block-cache-usage
  • block-cache-pinned-usage

The recording level for all metrics will be INFO.
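As a small illustration, the recording level is controlled by the `metrics.recording.level` configuration of Kafka Streams. The sketch below sets it explicitly with plain `java.util.Properties`; the class name, application ID, and bootstrap servers are hypothetical placeholders.

```java
import java.util.Properties;

public class RecordingLevelConfig {
    /** Returns Streams properties with the recording level set explicitly. */
    static Properties streamsProps(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // INFO is sufficient for the metrics proposed in this KIP.
        props.put("metrics.recording.level", "INFO");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        Properties props = streamsProps("my-app", "localhost:9092");
        System.out.println(props.getProperty("metrics.recording.level"));
    }
}
```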

Proposed Changes

In this section, I will explain the meaning of the metrics listed in the previous section. To better understand the metrics, some basic concepts of RocksDB need to be explained.

  • Memtable: Memtables are in-memory write buffers. Each new key-value pair is first written to a memtable and each read looks first into the memtable before it looks on disk. Once a memtable is full it becomes immutable and it is replaced by a new memtable. A background thread flushes a memtable asynchronously to disk. Additionally, memtables can also be flushed manually. RocksDB keeps in memory the currently active memtables, full but not yet flushed memtables, and flushed memtables that are kept around to maintain write history in memory.
  • Compaction: From time to time RocksDB needs to clean up the data it stores on disk and bring its LSM tree into a good shape (see https://github.com/facebook/rocksdb/wiki/Compaction). Compactions might block writes and flushes. Additionally, RocksDB offers different compaction algorithms with different properties. Thus, it is good practice to monitor compactions in RocksDB.
  • SST files: SST files are the files in which RocksDB stores the data on disk. SST stands for Sorted Sequence Table.
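The memtable life cycle described above can be illustrated with a toy model. This is a simplified sketch, not RocksDB's implementation: the class, the entry-count threshold, and the in-memory map standing in for SST files are all assumptions made for illustration. It shows how counters such as num-immutable-mem-table and mem-table-flush-pending could change as data is written and flushed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

/** Toy model of the memtable life cycle; not RocksDB's actual implementation. */
public class ToyMemtables {
    private final int maxEntries; // hypothetical "memtable is full" threshold
    private TreeMap<String, String> active = new TreeMap<>();
    private final Deque<TreeMap<String, String>> immutable = new ArrayDeque<>();
    private final TreeMap<String, String> disk = new TreeMap<>(); // stands in for SST files

    public ToyMemtables(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Writes go to the active memtable; a full memtable becomes immutable. */
    public void put(String key, String value) {
        active.put(key, value);
        if (active.size() >= maxEntries) {
            immutable.addFirst(active); // newest immutable memtable first
            active = new TreeMap<>();
        }
    }

    /** Reads check the active memtable, then immutable memtables, then disk. */
    public String get(String key) {
        if (active.containsKey(key)) {
            return active.get(key);
        }
        for (TreeMap<String, String> memtable : immutable) { // newest to oldest
            if (memtable.containsKey(key)) {
                return memtable.get(key);
            }
        }
        return disk.get(key);
    }

    /** What a background flush would do: persist the oldest immutable memtable. */
    public boolean flushOldest() {
        TreeMap<String, String> oldest = immutable.pollLast();
        if (oldest == null) {
            return false;
        }
        disk.putAll(oldest);
        return true;
    }

    public int numImmutableMemTable() { return immutable.size(); }

    public int memTableFlushPending() { return immutable.isEmpty() ? 0 : 1; }

    public int numEntriesActiveMemTable() { return active.size(); }
}
```

In this model, writing three distinct keys with a threshold of two leaves one immutable memtable waiting for a flush and one entry in the new active memtable.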

num-immutable-mem-table

Number of immutable memtables that have not yet been flushed. For segmented state stores, the sum of the number of immutable memtables over all segments is reported.

cur-size-active-mem-table

Approximate size of active memtable in bytes. For segmented state stores, the sum of the sizes over all segments is reported.

cur-size-all-mem-tables

Approximate size of the active and unflushed immutable memtables in bytes. For segmented state stores, the sum of sizes over all segments is reported.

size-all-mem-tables

Approximate size of active, unflushed immutable, and pinned immutable memtables in bytes. For segmented state stores, the sum of sizes over all segments is reported.

num-entries-active-mem-table

Total number of entries in the active memtable. For segmented state stores, the sum over all segments is reported.

num-entries-imm-mem-tables

Total number of entries in the unflushed immutable memtables. For segmented state stores, the sum over all segments is reported.

num-deletes-active-mem-table

Total number of delete entries in the active memtable. For segmented state stores, the sum over all segments is reported.

num-deletes-imm-mem-tables

Total number of delete entries in the unflushed immutable memtables. For segmented state stores, the sum over all segments is reported.

mem-table-flush-pending

This metric reports 1 if a memtable flush is pending; otherwise, it reports 0. For segmented state stores, the sum of ones and zeros over all segments is reported.

num-running-flushes

Number of currently running flushes. For segmented state stores, the sum over all segments is reported.

num-running-compactions

Number of currently running compactions. For segmented state stores, the sum of the number of currently running compactions over all segments is reported.

estimate-pending-compaction-bytes

Estimated total number of bytes a compaction needs to rewrite to get all levels down to under target size. This metric is not valid for compactions other than level-based. For segmented state stores, the sum of the estimated total number of bytes over all segments is reported.

compaction-pending

This metric reports 1 if at least one compaction is pending; otherwise, the metric reports 0. For segmented state stores, the sum of ones and zeros over all segments is reported.
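Because the per-segment flags are summed, the value reported for a segmented store is the number of segments with a pending compaction rather than a plain 0/1 flag. A minimal sketch of this interpretation follows; the class and method names are hypothetical and not part of Kafka Streams.

```java
import java.util.Arrays;
import java.util.List;

public class PendingFlags {
    /** Value reported for a segmented store: the sum of the per-segment 0/1 flags. */
    static int reportedValue(List<Boolean> pendingPerSegment) {
        int sum = 0;
        for (boolean pending : pendingPerSegment) {
            sum += pending ? 1 : 0;
        }
        return sum;
    }

    /** A monitoring rule should test for "greater than 0" rather than "equal to 1". */
    static boolean anySegmentPending(int reportedValue) {
        return reportedValue > 0;
    }

    public static void main(String[] args) {
        // Hypothetical store with three segments, two of which have a pending compaction.
        int value = reportedValue(Arrays.asList(true, false, true));
        System.out.println(value + " segment(s) pending, alert: " + anySegmentPending(value));
    }
}
```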

total-sst-files-size

Total size in bytes of all SST files. For segmented state stores, the sum of the sizes of SST files over all segments is reported.

live-sst-files-size

Total size in bytes of all SST files that belong to the latest LSM tree. For segmented state stores, the sum over all segments is reported.

background-errors

Accumulated number of background errors. For segmented state stores, the sum over all segments is reported.

estimate-num-keys

Estimated total number of keys in the active memtable, the unflushed immutable memtables, and storage. For segmented state stores, the sum over all segments is reported.

estimate-table-readers-mem

Estimated memory used for reading SST tables, excluding memory used in the block cache (e.g., filter and index blocks). For segmented state stores, the sum over all segments is reported.

num-live-versions

Number of live versions. `Version` is an internal data structure of RocksDB (see version_set.h for details). More live versions often mean that more SST files are held back from deletion by iterators or unfinished compactions. For segmented state stores, the sum over all segments is reported.

estimate-live-data-size

Estimate of the amount of live data in bytes. For segmented state stores, the sum over all segments is reported.

actual-delayed-write-rate

Current actual delayed write rate. A value of 0 means no delay. For segmented state stores, the sum over all segments is reported.

estimate-oldest-key-time

Estimate of the oldest key timestamp in the database. Currently, this is only available for FIFO compaction with compaction_options_fifo.allow_compaction = false. For segmented state stores, the minimum over all segments is reported.

block-cache-capacity

Capacity of the block cache. For segmented state stores, the sum over all segments is reported if each segment uses a separate cache; otherwise, the value of any one segment is reported.

block-cache-usage

Memory size of the entries residing in the block cache. For segmented state stores, the sum over all segments is reported if each segment uses a separate cache; otherwise, the value of any one segment is reported.

block-cache-pinned-usage

Memory size of the entries pinned in the block cache. For segmented state stores, the sum over all segments is reported if each segment uses a separate cache; otherwise, the value of any one segment is reported.
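The aggregation rules for segmented state stores used throughout this section (sum for most metrics, minimum for estimate-oldest-key-time, and sum or any single value for the block cache metrics) can be sketched as follows. The class and method names are hypothetical, and `BigInteger` is used here as one way to avoid overflow when summing large unsigned values; this is an illustration, not the Kafka Streams implementation.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

public class SegmentAggregation {
    /** Sum over all segments, as described for most metrics in this KIP. */
    static BigInteger sum(List<BigInteger> perSegment) {
        return perSegment.stream().reduce(BigInteger.ZERO, BigInteger::add);
    }

    /** Minimum over all segments, as described for estimate-oldest-key-time. */
    static BigInteger min(List<BigInteger> perSegment) {
        return perSegment.stream()
            .min(BigInteger::compareTo)
            .orElseThrow(() -> new IllegalArgumentException("no segments"));
    }

    /** Block cache metrics: value of any segment if the cache is shared, otherwise the sum. */
    static BigInteger blockCacheValue(List<BigInteger> perSegment, boolean sharedCache) {
        if (perSegment.isEmpty()) {
            return BigInteger.ZERO;
        }
        return sharedCache ? perSegment.get(0) : sum(perSegment);
    }

    public static void main(String[] args) {
        // Hypothetical per-segment values, for illustration only.
        List<BigInteger> values = Arrays.asList(
            BigInteger.valueOf(100), BigInteger.valueOf(250), BigInteger.valueOf(50));
        System.out.println("sum = " + sum(values));
        System.out.println("min = " + min(values));
        System.out.println("shared cache = " + blockCacheValue(values, true));
    }
}
```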




memtables-size

Memory usage of all mem-tables. Mem-tables are in-memory write buffers. Each new key-value pair is first written to a mem-table and each read looks first into the mem-table before it looks on disk. Once a mem-table is full it becomes immutable and it is replaced by a new mem-table. A background thread flushes a mem-table asynchronously to disk. Additionally, mem-tables can also be flushed manually. RocksDB keeps in memory the currently active mem-tables, full but not yet flushed mem-tables, and flushed mem-tables that are kept around to maintain write history in memory.

cache-size

Memory usage by caches. RocksDB caches data in memory for reads. By default, those caches contain only data blocks, i.e., uncompressed sequences of key-value pairs in sorted order. Therefore this cache is often referred to as block cache. However, users can configure RocksDB to also store index and filter blocks in the cache.

table-readers-memory-size

Memory usage of all the table readers. This is the memory RocksDB uses for reading SST tables, excluding the memory used in caches. These are, for example, index and filter blocks when they are not configured to be stored in the caches.
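The three memory components described above (memtables, caches, and table readers) suggest a rough estimate of the memory used by one store. The sketch below adds up the metrics size-all-mem-tables, block-cache-usage, and estimate-table-readers-mem from a map of metric values. The class name, the map-based input, and the budget check are assumptions for illustration. If several stores share one block cache, adding block-cache-usage per store would count the cache more than once.

```java
import java.util.HashMap;
import java.util.Map;

public class RocksDbMemoryEstimate {
    /** Rough per-store estimate in bytes; metrics missing from the map count as 0. */
    static long estimateBytes(Map<String, Long> metrics) {
        return metrics.getOrDefault("size-all-mem-tables", 0L)
            + metrics.getOrDefault("block-cache-usage", 0L)
            + metrics.getOrDefault("estimate-table-readers-mem", 0L);
    }

    /** Hypothetical check that a monitoring component could use to raise an alert. */
    static boolean exceedsBudget(Map<String, Long> metrics, long budgetBytes) {
        return estimateBytes(metrics) > budgetBytes;
    }

    public static void main(String[] args) {
        // Hypothetical metric values, for illustration only.
        Map<String, Long> metrics = new HashMap<>();
        metrics.put("size-all-mem-tables", 64L * 1024 * 1024);
        metrics.put("block-cache-usage", 50L * 1024 * 1024);
        metrics.put("estimate-table-readers-mem", 8L * 1024 * 1024);
        System.out.println("estimated bytes: " + estimateBytes(metrics));
        System.out.println("over 100 MiB budget: " + exceedsBudget(metrics, 100L * 1024 * 1024));
    }
}
```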

Compatibility, Deprecation, and Migration Plan

Since metrics are only added and no other metrics are modified, this KIP should not

  • affect backward-compatibility
  • deprecate public interfaces
  • need a migration plan other than adding the new metrics to users' own monitoring components

Rejected Alternatives
