Status
Current state: Accepted
Discussion thread: http://mail-archives.apache.org/mod_mbox/kafka-dev/202005.mbox/%3CCADR0NwzJBJa6WihnpmGj0R%2BYPVrojq4Kg_hOArNEytHAG-tZAQ%40mail.gmail.com%3E
JIRA:
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Monitoring the memory used by RocksDB instances run in a Kafka Streams application would allow users to react to an increased memory demand by RocksDB before the application runs out of memory. Although the memory used by RocksDB can be bounded in Kafka Streams, the bound is not a hard limit. Currently, the metrics exposed by Kafka Streams include information about the RocksDB instances run by an application (see KIP-471: Expose RocksDB Metrics in Kafka Streams for more details), but they do not provide any information about the memory usage of those instances. This KIP proposes to add metrics to Kafka Streams that record the memory used by RocksDB.
Public Interfaces
Each added metric will be recorded at store level and will have the following tags:
- type = stream-state-metrics
- thread-id = [thread ID]
- task-id = [task ID]
- rocksdb-state-id = [store ID] for key-value stores
- rocksdb-session-state-id = [store ID] for session stores
- rocksdb-window-state-id = [store ID] for window stores
The following metrics will be exposed in the Kafka Streams metrics:
- memtables-size [bytes]
- caches-size [bytes]
- table-readers-memory-size [bytes]
The recording level for all metrics will be INFO.
Note that the recorded values are approximations of the actual memory usage.
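To illustrate how the tags and metric names above fit together, the following sketch builds the JMX object name under which a store's metrics could be looked up. It is only an illustration: the class `StoreMetricNames`, its method `mbeanName`, and the enum `StoreType` are hypothetical helpers and not part of Kafka Streams, and the JMX domain `kafka.streams` is an assumption based on how Kafka Streams metrics are usually exposed.

```java
import java.util.Hashtable;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical helper for illustration; not part of Kafka Streams. */
final class StoreMetricNames {

    /** Metric names as listed in this KIP. */
    static final List<String> METRIC_NAMES =
        List.of("memtables-size", "caches-size", "table-readers-memory-size");

    /** The store type decides which tag key carries the store ID. */
    enum StoreType {
        KEY_VALUE("rocksdb-state-id"),
        SESSION("rocksdb-session-state-id"),
        WINDOW("rocksdb-window-state-id");

        final String tagKey;

        StoreType(String tagKey) {
            this.tagKey = tagKey;
        }
    }

    /**
     * Builds an object name from the tags listed in this KIP.
     * Assumption: metrics are registered under the JMX domain "kafka.streams".
     */
    static ObjectName mbeanName(String threadId, String taskId,
                                StoreType storeType, String storeId) {
        Hashtable<String, String> tags = new Hashtable<>();
        tags.put("type", "stream-state-metrics");
        tags.put("thread-id", threadId);
        tags.put("task-id", taskId);
        tags.put(storeType.tagKey, storeId);
        try {
            return new ObjectName("kafka.streams", tags);
        } catch (MalformedObjectNameException e) {
            // Thrown if a tag value contains characters that are illegal in a JMX name.
            throw new IllegalArgumentException("Tag value not usable in a JMX name", e);
        }
    }
}
```

With such a name, each entry of `METRIC_NAMES` would be read as an attribute of the corresponding MBean. The thread ID, task ID, and store ID passed in are whatever the running application reports.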
Proposed Changes
In this section, I will explain the meaning of the metrics listed in the previous section.
num-immutable-mem-table
number of immutable memtables that have not yet been flushed.
mem-table-flush-pending
returns 1 if a memtable flush is pending; otherwise, returns 0
compaction-pending
returns 1 if at least one compaction is pending; otherwise, returns 0
background-errors
returns the accumulated number of background errors
cur-size-active-mem-table
returns approximate size of active memtable in bytes
cur-size-all-mem-tables
returns approximate size of active and unflushed immutable memtables (bytes).
size-all-mem-tables
returns approximate size of active, unflushed immutable, and pinned immutable memtables (bytes).
num-entries-active-mem-table
returns total number of entries in the active memtable.
num-entries-imm-mem-tables
returns total number of entries in the unflushed immutable memtables.
num-deletes-active-mem-table
returns total number of delete entries in the active memtable.
num-deletes-imm-mem-tables
returns total number of delete entries in the unflushed immutable memtables.
estimate-num-keys
returns estimated number of total keys in the active and unflushed immutable memtables and storage.
estimate-table-readers-mem
returns estimated memory used for reading SST tables, excluding memory used in block cache (e.g., filter and index blocks).
num-live-versions
returns number of live versions. `Version` is an internal data structure. See version_set.h for details. More live versions often mean more SST files are held from being deleted, by iterators or unfinished compactions.
estimate-live-data-size
returns an estimate of the amount of live data in bytes.
min-log-number-to-keep
min-obsolete-sst-number-to-keep
total-sst-files-size
live-sst-files-size
base-level
estimate-pending-compaction-bytes
num-running-compactions
num-running-flushes
actual-delayed-write-rate
is-write-stopped
estimate-oldest-key-time
block-cache-capacity
block-cache-usage
block-cache-pinned-usage
memtables-size
Memory usage of all mem-tables. Mem-tables are in-memory write buffers. Each new key-value pair is first written to a mem-table, and each read looks into the mem-tables before it looks on disk. Once a mem-table is full, it becomes immutable and is replaced by a new mem-table. A background thread flushes immutable mem-tables to disk asynchronously. Additionally, mem-tables can be flushed manually. RocksDB keeps in memory the currently active mem-tables, full but not yet flushed mem-tables, and flushed mem-tables that are kept around to maintain write history in memory.
caches-size
Memory usage of the caches. RocksDB caches data in memory for reads. By default, those caches contain only data blocks, i.e., uncompressed sequences of key-value pairs in sorted order. Therefore, this cache is often referred to as the block cache. However, users can configure RocksDB to also store index and filter blocks in the cache.
table-readers-memory-size
Memory usage of all the table readers. This is the memory RocksDB uses for reading SST tables, excluding the memory used in caches. It includes, for example, index and filter blocks when they are not configured to be stored in the caches.
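As a rough illustration of how the three metrics could be used for the monitoring described in the Motivation, the following sketch adds them up into one estimate per store and compares the estimate with a memory budget. The class `RocksDbMemoryEstimate`, its methods, and the notion of a budget with a warning fraction are hypothetical and not part of this KIP. Since the recorded values are approximations, the sum is an approximation as well.

```java
/** Hypothetical helper for illustration; not part of Kafka Streams. */
final class RocksDbMemoryEstimate {

    /**
     * Approximate memory used by one RocksDB instance, in bytes:
     * memtables-size + caches-size + table-readers-memory-size.
     * Math.addExact throws ArithmeticException on overflow.
     */
    static long totalBytes(long memtablesSize, long cachesSize, long tableReadersMemorySize) {
        return Math.addExact(Math.addExact(memtablesSize, cachesSize), tableReadersMemorySize);
    }

    /**
     * Returns true if the estimate is above the given fraction of the budget,
     * e.g. fraction = 0.9 warns once more than 90% of the budget is in use.
     */
    static boolean exceedsBudget(long totalBytes, long budgetBytes, double fraction) {
        return totalBytes > budgetBytes * fraction;
    }
}
```

For example, with 64 MiB of mem-tables, 128 MiB of caches, and 8 MiB of table readers, `totalBytes` yields 200 MiB, which is above 90% of a 220 MiB budget but below 90% of a 256 MiB budget.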
Compatibility, Deprecation, and Migration Plan
Since metrics are only added and no other metrics are modified, this KIP should not
- affect backward-compatibility
- deprecate public interfaces
- need a migration plan other than adding the new metrics to the user's own monitoring component