1. Requirement

1.1. Background

An automotive equipment provider supplies its own in-vehicle sensor assemblies to automotive assemblers. Each assembly consists of about 30 individual sensors, and all assemblies are of the same model, i.e., every assembly contains the same set of sensors.

About 1 million such assemblies need to be supported, and the equipment provider needs to store, query and analyze the data collected by these sensors to realize their business value.

1.2. Challenge 

Analyzing the problem background and the metadata storage overhead of IoTDB, the schema manager needs to maintain 30 × 1 million, i.e. 30 million, leaf nodes and their related metadata. From previous experience, each node occupies about 300 bytes of memory, so the overall memory footprint reaches roughly 9 GB.

Next we analyze the test and deployment environment, which has 32 GB of total machine memory. Since the database performs many I/O operations, a certain amount of off-heap and system memory must be reserved, so setting the JVM heap to about 20 GB is reasonable. Combined with the analysis above, metadata alone would account for about 45% of the heap, which puts huge memory pressure on writes and queries. With good read and write performance, IoTDB normally uses only about 10% of the heap for metadata; 45% in this scenario is bound to hurt the read and write performance of the database.

1.3. Target

Eliminate duplicate definitions of leaf nodes in the metadata tree, thus reducing metadata memory usage and storage and improving the overall read and write performance of IoTDB.

1.4. Analysis 

The key point is that these assemblies are all of the same model, which means that the sensor types in each assembly are exactly the same. However, IoTDB records the metadata of every sensor in every assembly, which is in fact redundant in memory. We should store only one copy of this identical metadata to save precious memory.
Today's industrial manufacturers rely on large-scale batch production to control costs and risks, so the case of many devices carrying identical sensor sets is very common. The solution above therefore generalizes to many industrial scenarios.


2. Solution

2.1 Example 

Provide users with a physical quantity template feature: all sensor metadata of a class of devices is saved in one template, and the template is mounted on a storage group or device group node, indicating that all devices under that storage group or device group are of the same model with the same sensor types, thus eliminating the memory redundancy.


An example of the user interface is as follows.

create schema template car_template(
 (s1 INT32 encoding=GORILLA compression=SNAPPY),
 (s2 FLOAT encoding=RLE compression=SNAPPY)
)

set schema template car_template to root.beijing


The above statements create a physical quantity template named car_template with two sensors, s1 and s2: s1 is a 32-bit integer with Gorilla encoding and Snappy compression, and s2 is a 32-bit floating point number with RLE encoding and Snappy compression. The second statement then mounts the template on the storage group root.beijing.


After executing the above operations, the user can write data as usual, and the system will determine the metadata of the written data according to the physical quantity template. Writing data that belongs to a template can still automatically create the corresponding time series, consistent with the default behavior of the system.

2.2 Effect on MTree


Two fields are added to each MNode: a pointer to a physical quantity template (8 bytes) and a boolean value indicating whether the template is used (1 byte).
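
A minimal sketch of these two fields in Java (the class and field names are hypothetical, not the actual IoTDB identifiers):

public class MNodeTemplateFieldsSketch {

  // placeholder type for the physical quantity template referenced below
  public static class Template {}

  private Template schemaTemplate; // reference to the mounted template, null if none (8-byte pointer)
  private boolean useTemplate;     // whether the template is used on this node (1 byte)

  public void setSchemaTemplate(Template template) { this.schemaTemplate = template; }

  public Template getSchemaTemplate() { return schemaTemplate; }

  public void setUseTemplate(boolean useTemplate) { this.useTemplate = useTemplate; }

  public boolean isUseTemplate() { return useTemplate; }
}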


3. Test and Evaluation

In the actual test, the device metadata calculated as above occupies only about 20 MB of memory. Considering that IoTDB metadata management also uses HashMaps and other auxiliary data structures, plus the memory used by mlog I/O, the metadata memory limit is set to 1 GB in the actual scenario (any portion that is not used up remains available to other parts of the system). Metadata then occupies about 5% of the heap, which is in line with IoTDB's normal read/write memory load. No OOM occurred in the tests, including the long-running test, which proves that the physical quantity template does reduce the memory redundancy.

4. Set and Show Template

4.1 Function Definition

4.1.1 Set Template

A created template can be mounted to a node at any level of the metadata tree.
There is at most one mounted template on any path, i.e. if a node has a template mounted, neither its ancestor nodes nor its descendant nodes can mount a template.
The subtree of this node can sense the mounted template and perform subsequent activation operations.
If the physical quantity names of existing time series on the subtree of the target path overlap with the physical quantity names in the template, mounting the template to that path is prohibited.

set schema template t1 to root.sg1.d1

4.1.2 Show Template

Queries the paths on which the specified template is mounted.

show paths set schema template t1

+-----------+
|      paths|
+-----------+
|root.sg1.d1|
+-----------+

4.2 Scenario

Template mounting operations should be few in practical scenarios, and it is generally recommended that users mount templates at the storage group level of the metadata tree or at nodes as high up as possible.
It is not recommended to abuse templates. Scenarios with massive numbers of templates or massive numbers of mounts are not design targets; it is enough to ensure that the system does not crash in such cases, and lower performance there is acceptable.

4.3 Overview

4.3.1 Template mounting information

Since the logical metadata tree is partitioned (sliced) across physical storage in the distributed deployment, the use of templates must be designed around the specific partitioning rules.
The metadata tree is partitioned as follows: first by storage group, then within each storage group device paths are hashed into slots, and each slot belongs to one schemaRegion (a sketch of the slot mapping follows the list below). These rules have the following characteristics:

  1. The nodes above the storage group level in the metadata tree are stored persistently on the ConfigNode, and each DataNode caches them as needed during operation.
  2. Device nodes and physical quantity (leaf) nodes in the metadata tree each belong to a well-defined schemaRegion, and no device or physical quantity node is duplicated across schemaRegions.
  3. Intermediate nodes on the paths between storage group nodes and device nodes in the metadata tree may be stored redundantly in different schemaRegions.
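
The following is a minimal sketch of the device-path hashing mentioned above (the slot count and all names are assumptions for illustration, not the actual IoTDB values):

import java.util.Map;

public class SchemaPartitionSketch {

  private static final int SLOT_NUM = 1000; // assumed slot count

  // slot -> schemaRegion id, maintained per storage group
  private final Map<Integer, Integer> slotToSchemaRegion;

  public SchemaPartitionSketch(Map<Integer, Integer> slotToSchemaRegion) {
    this.slotToSchemaRegion = slotToSchemaRegion;
  }

  // each device path is hashed into a slot within its storage group,
  // and each slot belongs to exactly one schemaRegion
  public int getSchemaRegionId(String devicePath) {
    int slot = Math.floorMod(devicePath.hashCode(), SLOT_NUM);
    return slotToSchemaRegion.get(slot);
  }
}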

As metadata shared by the whole cluster, a template's mount information is depended on by multiple schemaRegions, and maintaining its consistency there would be expensive because intermediate nodes are stored redundantly across schemaRegions. Therefore, the template mount information is stored on the ConfigNode as the single source of truth for the cluster, and all mount information is pushed to each DataNode as a cache. The management of template mount information therefore follows the architecture described below.

4.3.2 Template mounting process

  1. The ConfigNode pre-mounts the template information. When a DataNode joins the cluster, it caches and uses formally mounted templates as usual and caches pre-mounted templates in the pre-released state.
  2. The ConfigNode pre-releases the template mount information to all DataNodes, and each DataNode caches it in the pre-released state. If a DataNode fails during pre-release, the pre-release and pre-mount operations are rolled back; if a DataNode cannot be reached, the ConfigNode keeps retrying until that node's status changes to Unknown; if all DataNodes succeed, the subsequent steps are executed.
  3. The ConfigNode checks whether any related schemaRegion contains a time series whose name collides with a physical quantity in the template. If such a series exists, the mount operation fails and the pre-mount and pre-release operations are rolled back; otherwise the process continues.
  4. The ConfigNode switches the pre-mounted template to the formal mount state.
  5. The ConfigNode broadcasts a commit to all DataNodes, and each DataNode puts the pre-released template mount information into normal use. If a DataNode fails during commit, all template mount information is re-pushed to it (i.e., its cache is reset); if a DataNode cannot be reached, the ConfigNode keeps retrying until that node's status changes to Unknown.

Once the commit phase is entered, a DataNode that has completed the commit may start executing template activation operations triggered by data writes. At that point the template mount must be regarded as formally in use in the cluster, so every DataNode in the Running state must be brought to eventual consistency of its template cache through the retry or reset measures above.

The entire process is shown in the figure below. To highlight the main flow, the figure simplifies the communication between each DataNode and the ConfigNode and only shows the execution steps; in fact, all the judgment logic and branch selection in the figure are executed on the ConfigNode.
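
In addition to the figure, the following is a minimal code-level sketch of the same procedure on the ConfigNode side (all interface and method names are hypothetical; the retry-until-Unknown handling is only indicated in comments):

import java.util.List;

public class TemplateMountProcedureSketch {

  interface DataNodeClient {
    boolean preRelease(String templateName, String mountPath); // cache mount info in pre-released state
    boolean commit(String templateName, String mountPath);     // switch cached mount info to the formal state
    void rollback(String templateName, String mountPath);      // drop pre-released mount info
  }

  interface SchemaRegionClient {
    // does any existing series under mountPath collide with a physical quantity in the template?
    boolean hasConflictingSeries(String templateName, String mountPath);
  }

  boolean mount(String templateName, String mountPath,
                List<DataNodeClient> dataNodes, List<SchemaRegionClient> relatedRegions) {
    // step 1: the template is pre-mounted on the ConfigNode before this method is called

    // step 2: pre-release to all DataNodes; roll back pre-release and pre-mount on any failure
    for (DataNodeClient dn : dataNodes) {
      if (!dn.preRelease(templateName, mountPath)) {
        dataNodes.forEach(n -> n.rollback(templateName, mountPath));
        return false;
      }
    }

    // step 3: check all related schemaRegions for series that collide with the template
    for (SchemaRegionClient region : relatedRegions) {
      if (region.hasConflictingSeries(templateName, mountPath)) {
        dataNodes.forEach(n -> n.rollback(templateName, mountPath));
        return false;
      }
    }

    // step 4: switch the pre-mounted template to the formal mount state on the ConfigNode

    // step 5: broadcast commit; a DataNode that fails here has its whole template cache
    // re-pushed, and an unreachable one is retried until its status becomes Unknown
    for (DataNodeClient dn : dataNodes) {
      dn.commit(templateName, mountPath);
    }
    return true;
  }
}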

4.3.3 Concurrent handling of auto-registration and template mounts

When performing auto-registration, it is necessary to check whether a template is mounted on the path and then decide whether to perform template activation or register an ordinary time series. The possible concurrent scenarios are as follows.

  1. No formal or pre-released template mount information on the DataNode: either the cluster has no template mount operation in progress, or a pre-release is in progress but has not yet reached this DataNode, so auto-registration registers an ordinary time series.
  2. Pre-released template mount information on the DataNode: the cluster may be in the commit phase without having notified this DataNode yet, or the commit notification was lost due to network problems. The DataNode then sends a request to the ConfigNode. If the mount information on the ConfigNode is still in the pre-released state, the pre-released information must not be put into use; if it is in the formal mount state, the commit can be executed ahead of the broadcast (the operation is idempotent) and the mount information is used for template activation.
  3. Formally published template mount information on the DataNode: the information is used directly and the template activation operation is executed.

When a DataNode registers an ordinary time series and detects that the series overlaps with template mount information (the mount path is a prefix of the series and the measurement name collides with a physical quantity in the template), the registration is rejected regardless of whether the detected mount information is pre-released or formally released. This guarantees that later template activation cannot be blocked by an overlapping series, and that no series under a template mount path shares a name with a physical quantity in the template.
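
A minimal sketch of the decision made during auto-registration for the three cases listed above (names are hypothetical; the overlap check described in the previous paragraph is omitted):

public class AutoRegistrationDecisionSketch {

  enum LocalMountState { NONE, PRE_RELEASED, COMMITTED }

  enum Action { REGISTER_ORDINARY_SERIES, ACTIVATE_TEMPLATE }

  interface ConfigNodeClient {
    // is the mount already in the formal state on the ConfigNode?
    boolean isFormallyMounted(String mountPath);
  }

  Action decide(LocalMountState localState, String mountPath, ConfigNodeClient configNode) {
    switch (localState) {
      case NONE:
        // case 1: no mount information locally -> register an ordinary series
        return Action.REGISTER_ORDINARY_SERIES;
      case PRE_RELEASED:
        // case 2: ask the ConfigNode; if the mount is already formal, commit locally
        // ahead of the broadcast (idempotent) and activate the template
        return configNode.isFormallyMounted(mountPath)
            ? Action.ACTIVATE_TEMPLATE
            : Action.REGISTER_ORDINARY_SERIES;
      case COMMITTED:
      default:
        // case 3: formally published locally -> use it directly
        return Action.ACTIVATE_TEMPLATE;
    }
  }
}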

4.3.4 Concurrent handling of template mounts and new DataNodes

While a template mounting operation is in progress, the cluster may be expanding or shrinking, i.e., DataNodes may be added or removed. We mainly consider the case of adding a DataNode.

  1. If a DataNode is added before pre-mounting completes, it only needs to synchronize the formally mounted template information; the pre-mounted information will be handled in the later phases.
  2. If a DataNode is added during the pre-release phase, it caches the pre-mounted information in the pre-released state while synchronizing the template mount information, as a remedy in case the pre-release broadcast did not include the new DataNode in its node list.
  3. If a DataNode is added during the commit phase, the template mount information is already in the formal state on the ConfigNode, so it is cached directly in the formal state and put into use.

4.4 Detailed Design

Since templates are mounted on the tree, the existing MTreeAboveSG can be extended to record the template mount information. MTreeAboveSG is composed of MNodes, so the template mount information will be stored on the specific MNode.

To decouple the storage of the template itself from the metadata tree, each template is assigned an Id; only the mounted template Id is stored on the MNode, and when a specific template is needed it is obtained from the TemplateManager by that Id.
The template Id is a self-incrementing Id: an int variable is maintained in memory, each newly created template takes the current value of this variable as its Id and the variable is incremented by one; on restart and recovery, the variable is reset to the maximum existing template Id plus one.
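
A minimal sketch of this Id allocation (class and method names are hypothetical):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TemplateIdAllocatorSketch {

  private final AtomicInteger nextId = new AtomicInteger(0);
  private final Map<Integer, String> idToTemplateName = new ConcurrentHashMap<>();

  // each new template takes the current counter value as its Id, then the counter is incremented
  public int allocate(String templateName) {
    int id = nextId.getAndIncrement();
    idToTemplateName.put(id, templateName);
    return id;
  }

  // on restart and recovery, the counter is reset to max(existing Ids) + 1
  public void recover(Iterable<Integer> existingIds) {
    int max = -1;
    for (int id : existingIds) {
      max = Math.max(max, id);
    }
    nextId.set(max + 1);
  }
}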

MTreeAboveSG will be renamed to ConfigMTree and will provide the corresponding query interfaces for template mount information. ConfigMTree will serialize the template Id mounted on each InternalMNode when it takes a snapshot.

Note: ConfigNode.TemplateManager only maintains the template information; the template mount information and part of the metadata tree hierarchy information are maintained by ClusterSchemaManager.
The related code can be found in the setSchemaTemplate method and getPathsSetTemplate method in SchemaRegionMemoryImpl.

5. Template Cache

5.1 Scenario

Template mount information should be small in real scenarios, so both the templates themselves and the mount information can be cached in full on the DataNode.

5.2 Overview

The template cache of DataNode mainly includes the template information itself and the template mount information, and the template information is cached in full on DataNode.

To avoid any performance impact on requests that do not use templates (i.e., without adding extra RPCs), template information is cached and obtained via active push from the ConfigNode. A DataNode-side pull approach is not used because, for a request that does not involve templates, a cache miss on the DataNode cannot by itself establish that no template applies, so such a request would still incur an RPC to the ConfigNode. With the ConfigNode's proactive push, every template check can be done locally on the DataNode.

DataNode.ITemplateManager provides the ability to fetch, cache and verify template mount information, and shields the cache access and remote fetch details from other modules.
The caching of the template information itself will be implemented as a simple map structure that maintains two maps, name -> template and id -> template.
There are currently two ideas for caching template mount information.

  1. Use a simple Caffeine cache where the key is the mount path and the value is the templateId. To perform the check, iterate over the keys and test whether any mount path is a prefix of the path being activated (see the sketch after this list). This solution is less efficient but easier to develop.
  2. Implement a tree-structured cache, i.e., a structure similar to ConfigMTree that caches part of the information and supports replacement and eviction on the tree. The tree scheme is more efficient for looking up mount information but harder to develop.
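
A minimal sketch of the first option (a plain map stands in for the Caffeine cache, and all names are hypothetical):

import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class TemplateMountCacheSketch {

  // mount path -> templateId
  private final Map<String, Integer> mountPathToTemplateId = new ConcurrentHashMap<>();

  public void put(String mountPath, int templateId) {
    mountPathToTemplateId.put(mountPath, templateId);
  }

  // returns the templateId mounted on the given path or one of its ancestors, if any
  public Optional<Integer> getMountedTemplate(String path) {
    for (Map.Entry<String, Integer> entry : mountPathToTemplateId.entrySet()) {
      String mountPath = entry.getKey();
      if (path.equals(mountPath) || path.startsWith(mountPath + ".")) {
        return Optional.of(entry.getValue());
      }
    }
    return Optional.empty();
  }
}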

5.3 Process

5.3.1 Pull template information when DataNode starts

When a DataNode starts and joins the cluster, all template information is returned in the ConfigNode's response once registration completes, and the DataNode finishes caching it during its activation phase.

5.3.2 Push template information when template is mounted on ConfigNode

The precondition for a template to be activated and take effect is that the template is mounted on the corresponding prefix path, so the template and its mount information need to be pushed to each DataNode in a timely manner.
A separate write lock is set on the ConfigNode for the template mounting operation to ensure that only one template mounting task is being executed at the same time at the cluster level.
Template mounting will be divided into the following steps:

  1. Mount check: check whether there is a storage group on the mount path and whether the mount path already involves a mounted template
  2. Mount information push: synchronize the mount information to all DataNodes
    1. If all pushes succeed, continue with the mount operation
    2. If any push fails, terminate the push, notify the DataNodes that were already pushed to clean up the corresponding information, and return a mount failure
  3. Execute the mount operation: the precondition is that the push has completed on all DataNodes


6. Activate Template

6.1 Function Definition

Manual activation:

create timeseries of schema template on root.sg1.d1

Automatic activation: when automatic series creation is enabled and a template has been set on the path, the template is activated first

insert into root.sg1.d1(time, temperature, status) values(1, 15.0, true)

After the template is activated, you can view the specific activation path of the template.

show paths using schema template t1

+-----------+
|      paths|
+-----------+
|root.sg1.d1|
+-----------+

Once the template is activated, the activation node is converted into a device node, and the time series represented by the template can be queried.

show devices root.sg1.d1

+---------------+---------+
|        devices|isAligned|
+---------------+---------+
|    root.sg1.d1|    false|
+---------------+---------+


show timeseries root.sg1.d1.*

+-----------------------+-----+-------------+--------+--------+-----------+----+----------+
|             timeseries|alias|storage group|dataType|encoding|compression|tags|attributes|
+-----------------------+-----+-------------+--------+--------+-----------+----+----------+
|root.sg1.d1.temperature| null|     root.sg1|   FLOAT|     RLE|     SNAPPY|null|      null|
|     root.sg1.d1.status| null|     root.sg1| BOOLEAN|   PLAIN|     SNAPPY|null|      null|
+-----------------------+-----+-------------+--------+--------+-----------+----+----------+

6.2 Scenario

Template mounts should be few in real scenarios, but template activations occur in huge numbers.

6.3 Overview

6.3.1 Activation check

Before performing template activation, it is necessary to obtain the template mount information to check whether the template can be activated on the specified path.
When performing template activation, it is necessary to check whether the child nodes of the device node in the schemaRegion overlap with the physical quantities in the template, and activation is rejected if there is an overlap.
Similarly, when creating a time series, it is necessary to check whether the time series to be created overlaps with the physical quantities within the template.

6.3.2 Activation information storage

Current templates are single-level, i.e., every node inside a template represents a physical quantity and becomes a leaf node of the metadata tree when activated, so templates are always activated on device nodes. Under the distributed metadata tree partitioning rules each device node belongs to exactly one schemaRegion and is not stored redundantly in the system (one copy), so the template activation information is stored on the device node inside the schemaRegion. An int templateId field on the device node indicates both whether a template is activated and which template is used.
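
A minimal sketch of this representation on the device node (the sentinel value and names are assumptions for illustration):

public class DeviceMNodeActivationSketch {

  public static final int NON_EXISTENT_TEMPLATE = -1; // assumed sentinel for "no template activated"

  private int templateId = NON_EXISTENT_TEMPLATE;

  public boolean isTemplateActivated() {
    return templateId != NON_EXISTENT_TEMPLATE;
  }

  public void activateTemplate(int templateId) {
    this.templateId = templateId;
  }

  public int getTemplateId() {
    return templateId;
  }
}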

6.3.3 Query

After a template is activated, in order to avoid making the schemaRegion below the consensus layer depend on the template information, the query of series information is divided into two parts as follows.

  1. Ordinary time series query: both show timeseries and schema fetch are executed according to the existing logic.
  2. Template time series query.
    1. Check whether a template time series may satisfy the condition, i.e., whether mountPath.measurement or mountPath.**.measurement can be matched by the input pathPattern (a sketch of such a match follows below)
    2. For a possible match, the template activation information must also be queried at query time
      1. For a schema query, the result is constructed directly as a CLI result set, so the template time series transformation must be completed during operator execution, and the template information must be passed in at query time
      2. For a schema fetch, the result is a SchemaTree, so the template activation information can be saved and returned in the SchemaTree, which also facilitates the joint caching of template, mount and activation information later

The above process involves checking template information, which belongs to the metadata analysis phase of distributed tasks, so the related interfaces will be provided by SchemaFetcher, which internally relies on TemplateManager for its implementation.
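
A minimal sketch of the pattern check in step 2.1 (assumed wildcard semantics: "*" matches exactly one path node, "**" matches zero or more path nodes; names are hypothetical):

public class PathPatternMatchSketch {

  // does pathPattern match a concrete template series path such as mountPath + "." + measurement?
  public static boolean matches(String pathPattern, String fullPath) {
    return matches(pathPattern.split("\\."), 0, fullPath.split("\\."), 0);
  }

  private static boolean matches(String[] pattern, int pi, String[] path, int fi) {
    if (pi == pattern.length) {
      return fi == path.length;
    }
    if ("**".equals(pattern[pi])) {
      // "**" absorbs zero or more path nodes
      for (int skip = fi; skip <= path.length; skip++) {
        if (matches(pattern, pi + 1, path, skip)) {
          return true;
        }
      }
      return false;
    }
    if (fi == path.length) {
      return false;
    }
    if ("*".equals(pattern[pi]) || pattern[pi].equals(path[fi])) {
      return matches(pattern, pi + 1, path, fi + 1);
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(matches("root.sg1.*.s1", "root.sg1.d1.s1")); // true
    System.out.println(matches("root.**", "root.sg1.d1.s1"));       // true
  }
}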

7. Deactivate, Unset and Drop Template

7.1 Function Definition

7.1.1 Deactivate

Deactivate the template on the specified path; the time series represented by the activated template and their data will be cleared.
The input path can be a path pattern to support batch deactivation.

deactivate schema template t1 from root.sg.d1

deactivate schema template t1 from root.sg.*

delete timeseries of schema template t1 from root.sg.*

delete timeseries of schema template from root.sg1.*, root.sg2.*

7.1.2 Unset

Unsetting a template is only supported for templates that are not activated. So if you want to unset an activated template, you must first perform a deactivate operation on all of its activation paths.

unset schema template t1 from root.sg1.d1

7.1.3 Drop

Template deletion only supports deleting templates that are not mounted. So if you want to delete a mounted template, you need to unset it first.

drop schema template t1

7.2 Overview

7.2.1 Deactivate

Template deactivation is similar to time series deletion: it is an operation that involves deleting both data and metadata. The main process is as follows:

  1. The SchemaRegion internally blacklists the specified template activation information
  2. Broadcast to all DataNodes to clean up the relevant schema cache
  3. The DataRegion performs data deletion, competing with data writes for concurrency control and locks above the consensus layer
  4. The SchemaRegion internally cleans up the blacklist and performs the template deactivation

7.2.2 Unset

The main process of unsetting a template is as follows.

  1. The ConfigNode internally removes the template's mount information logically and blacklists it
  2. The ConfigNode broadcasts to the DataNodes, invalidating (blacklisting) the corresponding template mount information
  3. Check whether the template is activated (competing for the lock on each DataNode with the check-and-execute of the template activation operation)
    1. If it is activated
      1. roll back the blacklisting operations
      2. report an error and reject unsetting an activated template
    2. Otherwise continue with the subsequent steps
      1. the ConfigNode broadcasts to the DataNodes to clean up the template mount information
      2. the ConfigNode internally deletes the template mount information and cleans up the blacklist

The verification of the mount information for a template activation operation is moved from the node the client connects to directly to the node where the corresponding schemaRegion is located, and it competes for the lock there with the operation that invalidates the mount information. This ensures local consistency: either the verification fails and the activation fails, or, after a successful activation, the corresponding mount information can still be detected by subsequent checks.

7.2.3 Drop

The main process of template deletion is as follows

  1. The ConfigNode internally checks whether the template is mounted
  2. The ConfigNode logically deletes the template internally and blacklists it
  3. The ConfigNode broadcasts to the DataNodes to clear the template cache
  4. The ConfigNode deletes the template internally
  5. The ConfigNode internally cleans up the blacklist

Template deletion and template mounting are checked and performed inside configNode, so consistency can be controlled inside configNode.




