...

  • bytes-written-rate [bytes/s]
  • bytes-written-total [bytes]
  • bytes-read-rate [bytes/s]
  • bytes-read-total [bytes]
  • memtable-bytes-flushed-rate [bytes/s]
  • memtable-bytes-flushed-total [bytes]
  • memtable-flush-time-(avg|min|max) [ms]
  • memtable-hit-ratio
  • block-cache-data-hit-ratio
  • block-cache-index-hit-ratio
  • block-cache-filter-hit-ratio
  • bytes-read-compaction-rate [bytes/s]
  • bytes-written-compaction-rate [bytes/s]
  • compaction-time-(avg|min|max) [ms]
  • write-stall-duration-(avg|total) [ms]
  • num-open-files
  • num-file-errors-total

...

The metrics should help to identify flushes as bottlenecks.

memtable-hit-ratio

When data is read from RocksDB, the memtable is consulted first. This metric measures the number of hits with respect to the number of all lookups into the memtable. Hence, the formula for this metric is hits/(hits + misses).

A low memtable-hit-ratio might indicate that the memtable is too small.
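The formula hits/(hits + misses) can be sketched in a few lines. This is only an illustration: the function name `hit_ratio` is hypothetical, and returning 0.0 when no lookups have happened yet is an assumption that the text above does not specify.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Return hits / (hits + misses) for one store."""
    lookups = hits + misses
    if lookups == 0:
        # Assumption: report 0.0 for a store without lookups
        # instead of dividing by zero.
        return 0.0
    return hits / lookups
```

For example, 90 hits and 10 misses give a ratio of 90/100, i.e., 90% of the lookups were served from the memtable.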

block-cache-data-hit-ratio, block-cache-index-hit-ratio, and block-cache-filter-hit-ratio

If data is not found in the memtable, the block cache is consulted. Metric block-cache-data-hit-ratio measures the number of hits for data blocks with respect to the number of all lookups for data blocks into the block cache. The formula for this metric is equivalent to the one for memtable-hit-ratio.

Metrics block-cache-index-hit-ratio and block-cache-filter-hit-ratio measure the hit ratio for index and filter blocks if they are cached in the block cache. By default, index and filter blocks are cached outside of the block cache. Users can configure RocksDB to include index and filter blocks in the block cache to better control the memory consumption of RocksDB. If users do not opt to cache index and filter blocks in the block cache, the value of these metrics should stay at zero.

A low hit ratio might indicate that the block cache is too small.

...

Part of the data in RocksDB is kept in files. These files need to be opened and closed. Metric num-open-files measures the number of currently open files and metric num-file-errors-total measures the number of file errors. Both metrics may help to find issues connected to the OS and file systems.

Metrics for States Consisting of Multiple RocksDB Instances

A state store might consist of multiple state stores. Hence, we can distinguish between logical and physical state stores. A logical state store is the state store exposed in the public interface and shown in the topology description. Each logical state store might consist of one or multiple physical state stores, i.e., the actual state store instances that hold the data of the logical state store. For example, window and session stores are implemented as segmented stores. That is, each store consists of multiple segments. For persistent segmented stores, each segment is a distinct physical store and each physical store is a distinct RocksDB instance. While the fact that some logical state stores consist of multiple physical state stores is an implementation detail, it is still important for the sake of documentation to specify how metrics for such state stores are exposed and computed.

First of all, I propose to expose RocksDB metrics for each logical state store that contains one or multiple physical RocksDB instances. That is, there will be just one set of the above-mentioned metrics for each logical state store and not one set for each physical RocksDB instance. Hence, the values of the tags rocksdb-(window|session)-state-id will only contain the common prefix of all physical RocksDB instances belonging to one logical state store. Furthermore, the metrics need to be aggregated over all physical RocksDB instances belonging to the same logical state store. The following specifies how to aggregate the above metrics over multiple RocksDB instances (I is the set of RocksDB instances per logical state store):

...

Rates

For recorded metrics values in a sample, $\textrm{metric-rate} = \frac{\sum_{i \in I} \textrm{metric}_{i}}{\textrm{time interval of sample}}$

...

Affected metrics: bytes-written-rate, bytes-read-rate, memtable-bytes-flushed-rate, bytes-read-compaction-rate, bytes-written-compaction-rate
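The rate aggregation can be illustrated with a minimal sketch: the values recorded by all physical instances during one sample are summed first and then divided by the length of the sample. The function and parameter names below are hypothetical, and measuring the interval in seconds is an assumption that matches the bytes/s unit of the affected metrics.

```python
from typing import Iterable


def aggregate_rate(values_per_instance: Iterable[float],
                   interval_seconds: float) -> float:
    """Sum the recorded values of all instances, then divide by the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return sum(values_per_instance) / interval_seconds
```

For instance, three segments that wrote 100, 200, and 300 bytes during a 30-second sample yield a rate of 600/30 = 20 bytes/s for the logical state store.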

Hit Ratios

$\textrm{metric-hit-ratio} = \frac{\sum_{i \in I}\textrm{hits}_i}{\sum_{i \in I}\textrm{hits}_i + \sum_{i \in I}\textrm{misses}_i}$

Affected metrics: memtable-hit-ratio, block-cache-data-hit-ratio, block-cache-index-hit-ratio, block-cache-filter-hit-ratio
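Note that the aggregated hit ratio sums hits and misses over all instances before dividing, which is not the same as averaging the per-instance ratios. The sketch below contrasts the two. All names are hypothetical, `mean_of_ratios` is shown only for comparison and is not part of the proposal, and returning 0.0 when there are no lookups is an assumption.

```python
from typing import Iterable, List, Tuple


def aggregate_hit_ratio(counters: Iterable[Tuple[int, int]]) -> float:
    """counters holds one (hits, misses) pair per RocksDB instance."""
    total_hits = 0
    total_misses = 0
    for hits, misses in counters:
        total_hits += hits
        total_misses += misses
    lookups = total_hits + total_misses
    # Assumption: 0.0 when no instance has seen a lookup yet.
    return total_hits / lookups if lookups else 0.0


def mean_of_ratios(counters: List[Tuple[int, int]]) -> float:
    """Unweighted average of per-instance ratios, shown only for contrast."""
    ratios = [h / (h + m) for h, m in counters if h + m > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0
```

With one busy segment (90 hits, 10 misses) and one nearly idle segment (1 hit, 9 misses), the aggregated ratio is 91/110, roughly 0.83, whereas the unweighted mean of the two ratios is 0.5. Summing first weights each instance by its number of lookups, so rarely used segments do not distort the value reported for the logical state store.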

Totals

$\textrm{metric-total} = \sum_{i \in I} \textrm{metric}_i$

...