...

Template mount information is cluster-level metadata that multiple schemaRegions depend on together, and keeping it consistent is expensive because the intermediate nodes of the metadata tree would be stored repeatedly across different schemaRegions. Therefore, the template mount information is stored on the ConfigNode as the single source of truth for the cluster, and all of it is pushed to every DataNode for caching and local use. The management of template mount information thus follows the architecture below.

[Figure: architecture for managing template mount information]

4.3.2 Template mounting process

...

  1. The DataNode has no formal or pre-released template mount information: the cluster has performed no template mount operation, or a pre-release is in progress that has not yet reached this DataNode. Auto-registration therefore registers ordinary series.
  2. The DataNode has pre-released template mount information: the cluster may be in the commit stage without having notified this DataNode yet, or the commit notification was lost due to a network problem. The DataNode then queries the ConfigNode: if the mount information on the ConfigNode is still in the pre-released state, the pre-released information must not be used; if it is in the formally mounted state, the commit can be applied locally in advance (the operation is idempotent) and the mount information is used to perform template activation.
  3. The DataNode has formally published template mount information: the information is used directly and template activation is performed.
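The three-case decision above can be sketched as follows. The class, enum, and method names are illustrative, not the actual IoTDB identifiers; in particular, the fallback when the ConfigNode still reports a pre-released state is an assumption here (ordinary registration, subject to the separate overlap check).

```java
// Sketch of the auto-registration decision (all names are hypothetical).
public class AutoRegistrationSketch {

    enum MountState { NONE, PRE_RELEASED, COMMITTED }

    enum Action { REGISTER_ORDINARY_SERIES, ACTIVATE_TEMPLATE }

    /**
     * local: the mount-info state cached on this DataNode.
     * onConfigNode: the authoritative state, fetched only when local is PRE_RELEASED.
     */
    static Action decide(MountState local, MountState onConfigNode) {
        switch (local) {
            case NONE:
                // Case 1: no mount info at all -> plain series registration.
                return Action.REGISTER_ORDINARY_SERIES;
            case PRE_RELEASED:
                // Case 2: ask the ConfigNode for the authoritative state.
                if (onConfigNode == MountState.COMMITTED) {
                    // Commit can be applied locally in advance (idempotent).
                    return Action.ACTIVATE_TEMPLATE;
                }
                // Still pre-released cluster-wide: must not be used yet
                // (assumed fallback: ordinary registration, subject to the overlap check).
                return Action.REGISTER_ORDINARY_SERIES;
            default:
                // Case 3: committed info is used directly.
                return Action.ACTIVATE_TEMPLATE;
        }
    }
}
```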

When a DataNode registers a series, if the series to be registered is detected to overlap with template mount information (a prefix overlap, or a name clash with a measurement in the template), the ordinary series registration is rejected regardless of whether the detected mount information is pre-released or formally published. This guarantees that a later template activation can never be blocked by an overlapping series, and that no series under a template mount path shares a name with a measurement in the mounted template.
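The overlap check can be illustrated roughly as below, assuming dot-separated paths; `conflicts` and its signature are hypothetical, not the real SchemaRegion code.

```java
import java.util.List;

// Illustrative check: reject ordinary series registration that clashes with a
// template mount, whether the mount is pre-released or committed.
public class OverlapCheckSketch {

    /**
     * True if the series path lies under the mount path and its last node
     * collides with a measurement defined in the mounted template.
     */
    static boolean conflicts(String seriesPath, String mountPath, List<String> templateMeasurements) {
        if (!seriesPath.startsWith(mountPath + ".")) {
            return false; // not under the mount path at all
        }
        String measurement = seriesPath.substring(seriesPath.lastIndexOf('.') + 1);
        return templateMeasurements.contains(measurement);
    }
}
```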

...

Since templates are mounted on the metadata tree, the existing MTreeAboveSG can be extended to record the template mount information. MTreeAboveSG is composed of MNodes, so the mount information is stored on the specific MNode.

To decouple the storage of templates themselves from the storage of the metadata tree, each template is assigned an Id; only the mounted template Id is stored on the MNode, and when a specific template is needed it is fetched from the TemplateManager by that Id.
The template Id is implemented as a self-incrementing Id: an int variable is maintained in memory, each newly created template takes the current value of this variable as its Id, and the variable is then incremented by one. On restart and recovery, the variable is set to the maximum existing template Id plus one.
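A minimal sketch of this Id scheme (class and method names are illustrative, not the real IoTDB code):

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicInteger;

// Self-incrementing template Id allocator with restart recovery.
public class TemplateIdAllocator {

    private final AtomicInteger nextId = new AtomicInteger(0);

    /** Each new template takes the current value; the counter is bumped by one. */
    int allocate() {
        return nextId.getAndIncrement();
    }

    /** On restart, resume from max(existing Ids) + 1. */
    void recover(Collection<Integer> existingIds) {
        int max = existingIds.stream().mapToInt(Integer::intValue).max().orElse(-1);
        nextId.set(max + 1);
    }
}
```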

MTreeAboveSG will be renamed to ConfigMTree and will provide the corresponding query interfaces for template mount information. When taking a snapshot, ConfigMTree serializes the template Ids mounted on its InternalMNodes.

Note: ConfigNode.TemplateManager only maintains the template information; the template mount information and part of the metadata tree hierarchy information are maintained by ClusterSchemaManager.
The related code can be found in the setSchemaTemplate method and getPathsSetTemplate method in SchemaRegionMemoryImpl.

5. Template Cache

5.1 Scenario

In real scenarios, template mounts should amount to only a small volume of data, so both the templates themselves and the mount information can be cached in full on the DataNode.

5.2 Overview

The DataNode's template cache consists of the template information itself and the template mount information; both are cached in full on the DataNode.

To avoid a performance impact on requests that do not use templates (i.e., to avoid extra RPCs), template caching and information acquisition use active pushes from the ConfigNode. The alternative, having the DataNode actively pull, is rejected because a request that does not involve templates would still check the template cache, and a cache miss alone cannot establish that no template is needed, so an RPC to the ConfigNode would always be required. With the ConfigNode actively pushing, every template check can be completed locally on the DataNode.

DataNode.ITemplateManager provides the ability to fetch, cache, and verify template mount information, shielding other modules from the details of cache access and remote fetching.
The cache for the template information itself will be implemented as a simple structure that maintains two maps: name -> template and id -> template.
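A minimal sketch of the two-map structure just described (`Template` and the accessor names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Caches templates under both lookup keys: name -> template and id -> template.
public class TemplateCacheSketch {

    record Template(int id, String name) {}

    private final Map<String, Template> byName = new ConcurrentHashMap<>();
    private final Map<Integer, Template> byId = new ConcurrentHashMap<>();

    void put(Template t) {
        byName.put(t.name(), t);
        byId.put(t.id(), t);
    }

    Template getByName(String name) { return byName.get(name); }

    Template getById(int id) { return byId.get(id); }
}
```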
There are currently two candidate designs for caching template mount information:

  1. Use a simple Caffeine cache: the key is the mount path and the value is the templateId. During a check, iterate over the keys and test whether any mount path is a prefix of the path being activated. This solution is less efficient but easier to develop.
  2. Implement a tree-structured cache, i.e., a structure similar to ConfigMTree that caches part of the information and supports cache replacement and eviction on the tree. The tree scheme searches mount information more efficiently but is harder to develop.
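Idea 1 might look roughly like this. A `ConcurrentHashMap` stands in for the proposed Caffeine cache so the sketch stays dependency-free, and all names are illustrative:

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

// Idea 1: mount path -> templateId map; a check scans all keys and tests
// whether any mount path is a prefix of the path being activated.
public class MountInfoCacheSketch {

    private final Map<String, Integer> mountPathToTemplateId = new ConcurrentHashMap<>();

    void cacheMount(String mountPath, int templateId) {
        mountPathToTemplateId.put(mountPath, templateId);
    }

    /** O(#mounts) scan: returns the templateId whose mount path prefixes the given path. */
    OptionalInt findMountedTemplate(String path) {
        for (Map.Entry<String, Integer> e : mountPathToTemplateId.entrySet()) {
            String mount = e.getKey();
            if (path.equals(mount) || path.startsWith(mount + ".")) {
                return OptionalInt.of(e.getValue());
            }
        }
        return OptionalInt.empty();
    }
}
```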

5.3 Process

5.3.1 Pull template information when DataNode starts

When a DataNode starts and joins the cluster, after it completes registration with the ConfigNode, all template information is returned in the ConfigNode's response, and the DataNode caches it during the startup phase.

[Figure: DataNode pulls template information at startup]

5.3.2 Push template information when template is mounted on ConfigNode

The precondition for a template to become active and take effect is that it is mounted to the corresponding prefix path, so the template and its mount information must be pushed to every DataNode in a timely manner.
A dedicated write lock on the ConfigNode serializes template mount operations, ensuring that at most one mount task runs in the cluster at a time.
Template mounting is divided into the following steps:

  1. Mount check: check whether a storage group exists on the mount path, and whether the mount path involves an already mounted template
  2. Mount information push: synchronize the mount information to all DataNodes
    1. If every push succeeds, continue with the mount operation
    2. If any push fails, terminate the push, notify the already-pushed DataNodes to clean up the corresponding information, and return a mount failure
  3. Execute the mount operation: the precondition is that all DataNodes have finished the mount push
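The push-then-rollback flow above can be sketched as follows; the `DataNodeClient` interface is a stand-in for the real RPC layer, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Push mount info to every DataNode; on any failure, tell the nodes that
// already accepted it to clean up, then report failure to the caller.
public class MountPushSketch {

    interface DataNodeClient {
        boolean pushMountInfo(String mountPath, int templateId);
        void rollbackMountInfo(String mountPath, int templateId);
    }

    /** Returns true only if every DataNode accepted the pushed mount info. */
    static boolean pushToAll(List<DataNodeClient> nodes, String mountPath, int templateId) {
        List<DataNodeClient> pushed = new ArrayList<>();
        for (DataNodeClient node : nodes) {
            if (node.pushMountInfo(mountPath, templateId)) {
                pushed.add(node);
            } else {
                // Partial failure: roll back nodes that already cached the info.
                for (DataNodeClient p : pushed) {
                    p.rollbackMountInfo(mountPath, templateId);
                }
                return false;
            }
        }
        return true; // safe to execute the commit (mount) step
    }
}
```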

[Figure: process of pushing template information when a template is mounted on the ConfigNode]