...

As cluster-level metadata, template mount information is shared by multiple schemaRegions, and maintaining its consistency is expensive because intermediate nodes would be stored redundantly across schemaRegions. Therefore, the template mount information stored on the ConfigNode serves as the cluster's single source of truth, and all mount information is pushed to each DataNode for caching. The management of template mount information thus follows the architecture below.

[Figure: template mount information management architecture]

4.3.2 Template mounting process

...

  1. No pre-released or formally published template mount information on the DataNode: either the cluster has performed no template mount operation, or a pre-release is in progress that has not yet reached this DataNode. Auto-registration therefore registers ordinary series.
  2. Pre-released template mount information on the DataNode: the cluster may be in the commit stage without having notified this DataNode yet, or the commit notification was lost due to network problems. The DataNode then queries the ConfigNode: if the mount information on the ConfigNode is still in the pre-released state, the pre-released information cannot be put into use; if it is in the formally published state, the commit can be executed ahead of time (the operation is idempotent) and the mount information is used for the template activation operation.
  3. Formally published template mount information on the DataNode: put this information into use directly and execute the template activation operation.

When a DataNode registers ordinary series, if a series to be registered is detected to overlap with template mount information (prefix overlap, or a measurement with the same name as one in the template), the registration is rejected regardless of whether the detected mount information is pre-released or formally published. This ensures that a later template activation cannot be blocked by an overlapping series, and that no series under a mount path shares a name with a measurement in the mounted template.
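The overlap check above can be sketched as follows. This is an illustrative, simplified model (class and method names are not the actual IoTDB code): a registration is rejected when the series falls under a mount path and its measurement name collides with one defined in the template.

```java
import java.util.List;

// Hypothetical sketch of the registration-time overlap check described above.
public class TemplateOverlapCheck {

    // True if seriesPath lies under mountPath, i.e. the mount path is a
    // prefix of the series path on node boundaries.
    static boolean underMountPath(String seriesPath, String mountPath) {
        return seriesPath.equals(mountPath)
            || seriesPath.startsWith(mountPath + ".");
    }

    // Reject ordinary-series registration if the series falls under the mount
    // path and its last node collides with a measurement in the template.
    static boolean rejectRegistration(
            String seriesPath, String mountPath, List<String> templateMeasurements) {
        if (!underMountPath(seriesPath, mountPath)) {
            return false;
        }
        String measurement = seriesPath.substring(seriesPath.lastIndexOf('.') + 1);
        return templateMeasurements.contains(measurement);
    }
}
```

Note the `mountPath + "."` suffix in the prefix test: it prevents `root.sg1.d1` from spuriously matching `root.sg1.d10`.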

...

Since templates are mounted on the metadata tree, the existing MTreeAboveSG can be extended to record the template mount information. MTreeAboveSG is composed of MNodes, so the template mount information is stored in the specific MNode.

To decouple storage of the template itself from storage of the metadata tree, each template is assigned an Id: only the mounted template Id is stored on the MNode, and when the specific template is needed it is fetched from the TemplateManager by Id.
The template Id is a self-incrementing integer maintained in memory: each newly created template takes the current value as its Id and the counter is incremented; on restart and recovery, the counter is reset to the maximum of all existing template Ids plus one.
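The Id scheme above can be sketched in a few lines (the class name is illustrative, not the actual IoTDB code):

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the self-incrementing template Id described above.
public class TemplateIdAllocator {
    private final AtomicInteger nextId = new AtomicInteger(0);

    // Each new template takes the current value and bumps the counter.
    public int allocate() {
        return nextId.getAndIncrement();
    }

    // On restart, resume from max(existing ids) + 1 so ids are never reused.
    public void recover(Collection<Integer> existingIds) {
        int max = existingIds.stream().mapToInt(Integer::intValue).max().orElse(-1);
        nextId.set(max + 1);
    }
}
```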

MTreeAboveSG will be renamed to ConfigMTree and will provide the corresponding query interfaces for template mount information. When taking a snapshot, ConfigMTree serializes the template Ids mounted on its InternalMNodes.

Note: ConfigNode.TemplateManager maintains only the template information itself; the template mount information and part of the metadata tree hierarchy are maintained by ClusterSchemaManager.
The related code can be found in the setSchemaTemplate method and getPathsSetTemplate method in SchemaRegionMemoryImpl.

5. Template Cache

5.1 Scenario

In real scenarios the amount of template mount data should be small, so both the templates themselves and the mount information can be cached in full on each DataNode.

5.2 Overview

The template cache on the DataNode includes both the template information itself and the template mount information; both are cached in full on the DataNode.

To avoid impacting the performance of requests that do not use templates (i.e., to avoid extra RPCs), template caching and information acquisition use active push from the ConfigNode. The DataNode active-pull approach is not used because, for a request that does not involve templates, a template cache check can only produce a cache miss without proving that no template applies, so an RPC to the ConfigNode would always be required. With the ConfigNode actively pushing, all template checks can be completed locally on the DataNode.

DataNode.ITemplateManager provides the ability to fetch, cache, and verify template mount information, shielding other modules from the details of cache access and remote fetching.
The cache of template information itself is implemented as two simple maps: name -> template and id -> template.
There are currently two candidate designs for caching template mount information:

  1. A simple Caffeine cache, with the mount path as key and the templateId as value. A check iterates over the keys and compares paths, looking for a mount path that is a prefix of the path being activated. This option is less efficient, but easier to develop.
  2. A tree-structured cache, i.e., a structure similar to ConfigMTree that caches part of the information and supports replacement and eviction on the tree. The tree option is more efficient for looking up mount information, but harder to develop.
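Option 1 can be sketched as below. The document proposes a Caffeine cache; a `ConcurrentHashMap` stands in here to keep the sketch dependency-free, and all names are illustrative.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of option 1: a flat map from mount path to templateId.
// Checking a device path scans the keys for a mount path that prefixes it.
public class FlatMountCache {
    private final Map<String, Integer> mountPathToTemplateId = new ConcurrentHashMap<>();

    public void put(String mountPath, int templateId) {
        mountPathToTemplateId.put(mountPath, templateId);
    }

    // Linear scan over all mount paths: O(number of mounts) per check,
    // which is why the document calls this option less efficient.
    public Optional<Integer> templateOnPath(String devicePath) {
        for (Map.Entry<String, Integer> e : mountPathToTemplateId.entrySet()) {
            String mount = e.getKey();
            if (devicePath.equals(mount) || devicePath.startsWith(mount + ".")) {
                return Optional.of(e.getValue());
            }
        }
        return Optional.empty();
    }
}
```

The tree option replaces the linear scan with a node-by-node descent, which is what makes it faster at the cost of implementing tree-shaped eviction.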

5.3 Process

5.3.1 Pull template information when DataNode starts

When a DataNode starts and joins the cluster, the ConfigNode returns all template information in its response to the DataNode's registration, and the DataNode completes caching during the startup phase.

[Figure: DataNode pulls template information at startup]

5.3.2 Push template information when template is mounted on ConfigNode

A template can be activated and take effect only after it is mounted to the corresponding prefix path, so the template and its mount information must be pushed to every DataNode in a timely manner.
A dedicated write lock on the ConfigNode serializes template mount operations, ensuring that at most one mount task executes in the cluster at any time.
Template mounting is divided into the following steps:

  1. Mount check: verify that a storage group exists on the mount path, and that the mount path does not involve an already-mounted template
  2. Mount information push: synchronize the mount information to all DataNodes
    1. If all pushes succeed, continue with the mount operation
    2. If any push fails, terminate the push, notify the already-pushed DataNodes to clean up the corresponding information, and return a mount failure
  3. Execute the mount operation: the precondition is that all DataNodes have completed the push
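The push step with rollback on partial failure can be sketched as follows. DataNode endpoints are modeled as plain strings and the push/rollback RPCs are injected as functions; all names are illustrative, not the actual IoTDB code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch of step 2 above: push mount info to every DataNode; on any failure,
// tell the already-pushed nodes to clean up and report the mount as failed.
public class MountPush {

    // Returns true if the mount may proceed (all pushes succeeded).
    static boolean pushToAll(
            List<String> dataNodes,
            Predicate<String> push,      // RPC: push pre-release mount info
            Consumer<String> rollback) { // RPC: clean up pushed info
        List<String> pushed = new ArrayList<>();
        for (String node : dataNodes) {
            if (push.test(node)) {
                pushed.add(node);
            } else {
                // Partial failure: undo on every node already notified.
                pushed.forEach(rollback);
                return false;
            }
        }
        return true;
    }
}
```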

[Figure: template mount push process]


6. Activate Template

6.1 Function Definition

Manual activation:

Code Block
create timeseries of schema template on root.sg1.d1

Automatic activation: when automatic series creation is enabled, if a template has been set on the path, the template is activated first

Code Block
insert into root.sg1.d1(time, temperature, status) values(1, 15.0, true)

After the template is activated, you can view the specific activation path of the template.

Code Block
show paths using schema template t1

+-----------+
|      paths|
+-----------+
|root.sg1.d1|
+-----------+

Once the template is activated, the activation node is converted to a device node, and the time series represented by the template can be queried.

Code Block
show devices root.sg1.d1

+---------------+---------+
|        devices|isAligned|
+---------------+---------+
|    root.sg1.d1|    false|
+---------------+---------+


show timeseries root.sg1.d1.*

+-----------------------+-----+-------------+--------+--------+-----------+----+----------+
|             timeseries|alias|storage group|dataType|encoding|compression|tags|attributes|
+-----------------------+-----+-------------+--------+--------+-----------+----+----------+
|root.sg1.d1.temperature| null|     root.sg1|   FLOAT|     RLE|     SNAPPY|null|      null|
|     root.sg1.d1.status| null|     root.sg1| BOOLEAN|   PLAIN|     SNAPPY|null|      null|
+-----------------------+-----+-------------+--------+--------+-----------+----+----------+

6.2 Scenario

In real scenarios template mounts involve a small amount of data, but template activations occur in very large numbers.

6.3 Overview

6.3.1 Activation check

Before activating a template, the template mount information must be obtained to check whether the template can be activated on the specified path.
When activating a template, it must be checked whether the children of the device node in the schemaRegion overlap with the measurements in the template; if there is an overlap, activation is rejected.
Similarly, when creating a time series, it must be checked whether the series to be created overlaps with the measurements in a template.

6.3.2 Activation information storage

Current templates are single-level: every node inside a template represents a measurement and becomes a leaf of the metadata tree on activation, so templates are activated at device nodes. Under the distributed metadata tree partitioning rules, each device node belongs to exactly one schemaRegion, so (with a replica count of 1) there is no redundant storage in the system, and the template activation information is stored on the device node within the schemaRegion. An int templateId field on the device node indicates both whether a template is active and which template is used.
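The single-field encoding above can be sketched as follows (the class name and sentinel are illustrative; the document only specifies an int templateId field on the device node):

```java
// Sketch of the activation storage described above: one int field on the
// device node encodes both "is a template active here" and which template.
public class DeviceNodeSketch {
    // Sentinel meaning "no template activated" (illustrative choice).
    static final int NON_TEMPLATE = -1;

    private int templateId = NON_TEMPLATE;

    public boolean isTemplateActivated() {
        return templateId != NON_TEMPLATE;
    }

    public void activate(int templateId) {
        this.templateId = templateId;
    }

    public int getTemplateId() {
        return templateId;
    }
}
```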

6.3.3 Query

After a template is activated, to avoid making the schemaRegion under the consensus layer depend on template information, series queries are split into two parts:

  1. Ordinary time series query: both show timeseries and schema fetch execute according to the existing logic.
  2. Template time series query:
    1. Check whether a template time series might satisfy the condition, i.e., whether mountPath.measurement or mountPath.**.measurement can be matched by the input pathPattern
    2. For possible matches, the template activation information must be queried at the same time:
      1. For a schema query, the result is constructed directly into a cli result set, so the template time series must be materialized during operator execution, and the template information must be passed into the query
      2. For a schema fetch, the result is a SchemaTree, so the template activation information can be stored and returned in the SchemaTree, which later facilitates joint caching of template, mount, and activation information

The process above involves checking template information, which belongs to the metadata analysis of distributed tasks, so the related interfaces are provided by SchemaFetcher, which internally relies on TemplateManager.

7. Deactivate, Unset and Drop Template

7.1 Function Definition

7.1.1 Deactivate

Deactivating a template from the specified path clears the time series represented by the activated template along with their data.
The input path may be a path pattern to support batch deactivation.

Code Block
deactivate schema template t1 from root.sg.d1

deactivate schema template t1 from root.sg.*

delete timeseries of schema template t1 from root.sg.*

delete timeseries of schema template from root.sg1.*, root.sg2.*

7.1.2 Unset

Unsetting a template is only supported for inactive templates, so to unset an active template you must first deactivate it on all of its activation paths.

Code Block
unset schema template t1 from root.sg1.d1

7.1.3 Drop

Dropping a template is only supported for unmounted templates, so to drop a mounted template you must first unset it.

Code Block
drop schema template t1

7.2 Overview

7.2.1 Deactivate

Template deactivation, like series deletion, involves deleting both data and metadata. The main process is as follows:

  1. The schemaRegion internally blacklists the specified template activation information
  2. All DataNodes are notified to clean up the relevant schema cache
  3. The DataRegion deletes the data, competing with data writes for concurrency control and locks in the upper consensus layer
  4. The schemaRegion cleans up the blacklist and performs the template deactivation

7.2.2 Unset

The main process of unsetting a template is as follows:

  1. The ConfigNode logically removes the template's mount information internally and blacklists it
  2. The ConfigNode notifies the DataNodes to invalidate the corresponding template mount information (blacklisting it)
  3. Check whether the template is active (competing for the same lock on the DataNode as the template activation operation, which is a check plus execution):
    1. If active:
      1. Roll back the blacklist operation
      2. Report an error: an active template cannot be unset
    2. Otherwise, proceed:
      1. The ConfigNode notifies the DataNodes to clean up the template mount information
      2. The ConfigNode deletes the template mount information internally and cleans up the blacklist

The verification of mount information for the template activation operation is moved from the node the client connects to, to the node hosting the corresponding schemaRegion, where it competes for a lock with the mount-information invalidation operation. This guarantees local consistency: either the verification fails and the activation fails, or, after a successful activation, the corresponding mount information is guaranteed to be seen by subsequent checks.
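The lock-based guarantee above can be sketched as follows. A single lock serializes activation (check + execute) against invalidation on the schemaRegion's node; all names are illustrative.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: activation and mount invalidation compete for one lock on the node
// hosting the schemaRegion, so whichever runs second observes the first.
public class MountInvalidation {
    private final ReentrantLock lock = new ReentrantLock();
    private boolean mountValid = true;   // false once blacklisted/invalidated
    private boolean activated = false;   // template activation state

    // Activation check + execution happen atomically w.r.t. invalidation.
    public boolean tryActivate() {
        lock.lock();
        try {
            if (!mountValid) return false; // verification fails -> activation fails
            activated = true;
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Unset path: invalidate the mount unless an activation already exists;
    // an activation that won the lock first is guaranteed to be seen here.
    public boolean tryInvalidate() {
        lock.lock();
        try {
            if (activated) {
                return false; // roll back the blacklist, report error upstream
            }
            mountValid = false;
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```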

7.2.3 Drop

The main process of template deletion is as follows:

  1. The ConfigNode checks internally whether the template is mounted
  2. The ConfigNode logically deletes the template internally and blacklists it
  3. The ConfigNode notifies the DataNodes to clear the template cache
  4. The ConfigNode deletes the template internally
  5. The ConfigNode cleans up the blacklist

Both template deletion and template mounting are checked and executed inside the ConfigNode, so their consistency can be controlled within the ConfigNode.