Background

TimeIndex is a map from a prefix path to a timestamp, which serves two purposes in IoTDB: 1. it works as a borderline between seq and unseq data (it is not actually applied but it is an ideal data structure for this job); 2. it records the time range of data in a TsFile (in TsFileResource).

Currently, we have two implementations of TimeIndex: DeviceTimeIndex, which takes the deviceId (time-series path excluding the last level) of time-series as keys; FileTimeIndex, which uses the storage group name as keys (as SG names can be inferred from the context, it does not really record them). And we are planning to have SeriesTimeIndex, whose key is the full path of timeseries.

Either solution is a non-dynamic one that has its own limitations. When there are too many devices or time-series, DeviceTimeIndex and SeriesTimeIndex become infeasible as the number of keys is enormous and the index will be too memory-consuming. While when using FileTimeIndex, the files cannot be efficiently filtered when time series are distributed unevenly.

The ideal solution should be able to restrain the memory footprint to a given budget while maximizing the filtering ability of the structure.


Method

We propose AdaptiveTimeIndex (or DynamicTimeIndex), a weighted trie with path compression and auto-merging that may strict its memory usage to a budget.

Structure Definition

 The proposed data structure is a variant of trie, each of its nodes may have the following attributes: 

  1. node name. The name of the node and all its ascendant nodes make up a prefix path of time series. All nodes have this attribute.
  2. time interval. The time interval of the prefix represented by this node within the associated structure (MemTable or TsFile). Only leaves have this attribute.
  3. access count. The number of times that the node is accessed. Only leaves have this attribute.

The structure also has a memory budget b, in the number of nodes of bytes.

Construction

TimeIndex is only constructed during the ingestion process, we consider the ingestion of a time series ts with n points and how it updates an AdaptiveTimeIndex:

  1. search the path of ts along the TimeIndex, create intermediate nodes if necessary as normal trie does;
  2. if a leave node is hit during the search, update the time interval and access count of the node (the new time interval of the node is the union of the old one and the new points, and the new access count is the old count plus n);
  3. if a new leave node is created, the time interval of the new node is the time interval of the new nodes and the access count is n;
  4. if a new leave node is created, check the size of the TimeIndex, if it is over budget b, select a sub-tree and merge it into its root by summing up access counts and use the union of all time intervals in its leave as its time interval;
  • No labels