
This document introduces the topology of the major classes in the cluster module, i.e., how they are connected and what their main functionality is. Its main purpose is to help newcomers to the cluster module grasp how each class works, so they can find where to modify the code when they want to contribute. Any other advice or questions about the organization are welcome; you may comment on this page or send mail to our mailing list to discuss.


General Topology

Fig.1 shows the class diagram of the important classes in the cluster module. For simplicity, class members and methods are omitted, as there are too many to list. Helper and utility classes are also omitted, so you will find many more classes in the cluster module itself.

Class Functionalities

Program Entry Point

ClusterMain is the entry class of the cluster module. It performs some developer-level customization and starts a MetaClusterServer, which further initializes the Raft logic and the underlying IoTDB.

Servers

Servers are implementations of Thrift RPC servers; their main concerns are network transport and protocol parsing. Each of them listens on an IP address and port, accepts connections from clients, wraps the connections with certain types of Thrift transports, reads requests from the transports, parses them with specific protocols, and sends the requests to the associated RaftMembers. Each node has exactly one instance of each server.

  • MetaClusterServer: listens to internal_meta_port, receives MetaGroup-related Raft requests, and forwards them to MetaMember. It also starts the underlying IoTDB once it is up.
  • MetaHeartbeatServer: listens to internal_meta_port + 1, receives MetaGroup heartbeats, and forwards them to MetaMember.
  • DataClusterServer: listens to internal_data_port, receives DataGroup-related Raft requests, decides which DataMember should process each request, and forwards the request to it.
  • DataHeartbeatServer: listens to internal_data_port + 1, receives only DataGroup heartbeats, decides which DataMember should process each heartbeat, and forwards the heartbeat to it.
  • ClientClusterServer: listens to external_client_port, receives database operations from clients, and forwards them to MetaMember.
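The dispatch step shared by these servers can be pictured as a mapping from listening port to the member that handles the parsed request. The following is a minimal sketch under that assumption; the class and method names (ServerDispatchSketch, Member, dispatch) are illustrative, not the actual IoTDB classes, and the real servers work over Thrift transports rather than plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each server listens on one port and forwards parsed
// requests to its associated Raft member. The Thrift transport layer is elided.
public class ServerDispatchSketch {
    interface Member {
        String process(String request);
    }

    // One handler per listening port, mirroring the one-server-per-port layout.
    private final Map<Integer, Member> portToMember = new HashMap<>();

    void register(int port, Member member) {
        portToMember.put(port, member);
    }

    String dispatch(int port, String request) {
        Member m = portToMember.get(port);
        return m == null ? "no server on port " + port : m.process(request);
    }

    public static void main(String[] args) {
        ServerDispatchSketch sketch = new ServerDispatchSketch();
        // The port number is illustrative; real values come from the configuration.
        sketch.register(9003, req -> "meta handled: " + req);
        System.out.println(sketch.dispatch(9003, "heartbeat"));
    }
}
```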

Raft Members

Raft Members are implementations and extensions of the Raft algorithm, which handles log generation, log replication, heartbeats, elections, and catch-up of nodes that have fallen behind. Each RaftMember plays one role in exactly one Raft group. One cluster node has exactly one MetaMember and multiple DataMembers, depending on the replication factor and the partitioning strategy.

  • RaftMember: an abstraction of common Raft procedures, like log generation, log replication, heartbeat, election, and catch-up of falling behind nodes.
  • MetaMember: an extension of RaftMember, representing a member in the MetaGroup, where cluster configuration, authentication info, and storage groups are managed. It holds the partition table and processes operations that may modify the partition table; it also acts as a coordinator, routing database operations to the responsible DataMembers with the help of its partition table. One node has exactly one MetaMember.
  • DataMember: an extension of RaftMember, representing a member in one DataGroup, where the timeseries schemas and timeseries data of one cluster-level partition are managed. It also handles data queries and thus manages query resources. One node may have multiple DataMembers, depending on how many data partitions it manages, which is in turn decided by the replication factor and the partitioning strategy.
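The member layout on a single node can be sketched as follows. This is an illustrative model only, assuming data groups are identified by their header node; the class names (NodeMembersSketch, MetaMember, DataMember fields) are placeholders, not the real IoTDB implementations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the members hosted by one cluster node:
// exactly one MetaMember, plus one DataMember per data group the node joins.
public class NodeMembersSketch {
    static class MetaMember {}
    static class DataMember {}

    final MetaMember metaMember = new MetaMember();              // exactly one
    final Map<String, DataMember> dataMembers = new HashMap<>(); // one per data group

    // Joining the same group twice must not create a second member.
    void joinDataGroup(String headerNodeId) {
        dataMembers.putIfAbsent(headerNodeId, new DataMember());
    }

    public static void main(String[] args) {
        NodeMembersSketch node = new NodeMembersSketch();
        node.joinDataGroup("node-1");
        node.joinDataGroup("node-2");
        System.out.println("data members: " + node.dataMembers.size());
    }
}
```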

Partition Table

Partition Table maintains the mapping from a partition key (a storage group name and a time partition id in IoTDB) to a DataMember (or a node). As timeseries schemas and data are replicated only within a subset of the cluster (i.e., a DataGroup), operations involving them can only be applied on specific nodes. When a node receives an operation but does not manage the corresponding data or schemas, it must forward the operation to the right node, and this is where the Partition Table applies.

  • SlotPartitionTable: a Partition Table based on consistent slot hashing. It divides the whole data space into a fixed number of slots (10000 by default), distributes the slots evenly among all DataGroups, and hashes each partition key to one of the slots; the holder of that slot is then the holder of the data. When the DataGroups are updated, the slots are redistributed and the changes are recorded for data migration.
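The key-to-slot step above can be sketched as below. This is a minimal illustration of hashing a (storage group, time partition) key into a fixed slot range; the hash function and class name are assumptions for the sketch, not the exact scheme SlotPartitionTable uses.

```java
import java.util.Objects;

// Hypothetical sketch of slot-based partitioning: a partition key is hashed
// into one of a fixed number of slots, and the DataGroup holding that slot
// is responsible for the corresponding data.
public class SlotHashingSketch {
    static final int TOTAL_SLOTS = 10000; // default slot count from the document

    // Map a partition key to a slot in [0, TOTAL_SLOTS).
    static int slotOf(String storageGroup, long timePartitionId) {
        int hash = Objects.hash(storageGroup, timePartitionId);
        return Math.abs(hash % TOTAL_SLOTS);
    }

    public static void main(String[] args) {
        // The same key always maps to the same slot, so routing is deterministic.
        System.out.println("slot = " + slotOf("root.sg1", 42L));
    }
}
```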

Heartbeat Thread

Heartbeat Thread is an important component of a Raft Member. As the role of a Raft Member is always one of LEADER, FOLLOWER, or ELECTOR (CANDIDATE), it uses its Heartbeat Thread in three ways: as a LEADER, the Heartbeat Thread sends heartbeats to followers; as a FOLLOWER, the Heartbeat Thread periodically checks whether the leader has timed out; as a CANDIDATE, it starts elections and sends election requests to other nodes until one of the nodes becomes a valid leader.

  • HeartbeatThread: an abstraction of heartbeat-related transactions; each RaftMember has its own HeartbeatThread. When the associated member is a LEADER, the thread sends a heartbeat to each follower periodically; if the member is a FOLLOWER, it checks the timestamp of the last received heartbeat and decides whether the leader has timed out; otherwise the member is an elector, and the thread starts an indefinite election loop, asking for votes from the other members in the Raft group and breaking out of the loop once the member itself or another node has become a leader.
  • MetaHeartbeatThread: an extension of HeartbeatThread for MetaMember. It requires a follower to generate a new identifier if the follower's identifier conflicts with that of a known node during the cluster-building phase. It also sends the partition table to followers that request it.
  • DataHeartbeatThread: an extension of HeartbeatThread for DataMember. Its heartbeats include the header node of its group, so the receiver knows which group a heartbeat is from, and the log progress of the associated MetaMember, to ensure the leader of a data group also has an up-to-date partition table.
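The three-way branching described above can be sketched as a single loop iteration that dispatches on the member's current role. This is an illustrative model only; the names (HeartbeatSketch, oneIteration) are placeholders, the return values stand in for the network calls the real thread performs, and the timeout logic is simplified.

```java
// Hypothetical sketch of one iteration of a heartbeat thread's loop,
// branching on the associated member's role as described above.
public class HeartbeatSketch {
    enum Role { LEADER, FOLLOWER, ELECTOR }

    // Returns a description of the action taken for this iteration;
    // the real thread sends RPCs instead of returning strings.
    static String oneIteration(Role role, long now, long lastHeartbeatTime, long timeoutMs) {
        switch (role) {
            case LEADER:
                return "send heartbeat to each follower";
            case FOLLOWER:
                return (now - lastHeartbeatTime > timeoutMs)
                        ? "leader timed out, become ELECTOR"
                        : "leader alive, keep waiting";
            default: // ELECTOR: loop until this or another node becomes leader
                return "start an election round, ask peers for votes";
        }
    }

    public static void main(String[] args) {
        System.out.println(oneIteration(Role.FOLLOWER, 5000, 1000, 2000));
    }
}
```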

RaftLogManager

RaftLogManager provides interfaces to append, commit, and get Raft logs. It controls how Raft logs are organized in memory and persisted on disk, when committed logs are applied to the underlying IoTDB, and how to generate a snapshot up to a certain log index. As each Raft group keeps its own log, each RaftMember has its own RaftLogManager.

  • RaftLogManager: the basic RaftLogManager, which keeps the latest logs in memory and colder logs on disk, providing interfaces such as append, commit, and get. It applies logs after they are committed, but how they are applied is decided by the given LogApplier. It provides no snapshot implementation, so sub-classes extending it should provide their own definition of snapshots.
  • MetaSingleSnapshotLogManager: an extension of RaftLogManager for MetaMembers, which provides MetaSimpleSnapshot, containing storage groups, TTLs, users, roles, and partition tables.
  • FilePartitionedSnapshotLogManager: an extension of RaftLogManager for DataMembers, which provides PartitionedSnapshot<FileSnapshot>: a map whose keys are slot numbers and whose values are FileSnapshots. A FileSnapshot contains the timeseries schemas and hardlinked TsFiles (in the form of RemoteTsFileResources) of one slot.
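The append/commit/apply cycle described above can be sketched with an in-memory log and a pluggable applier, mirroring how application is delegated to a LogApplier. This is a minimal illustration under those assumptions; the class name (LogManagerSketch) and string-typed logs are placeholders, and persistence, terms, and snapshots are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the cycle a RaftLogManager coordinates: logs are
// appended, committed up to an index, and committed logs are handed to an
// applier (the real manager delegates this to a LogApplier).
public class LogManagerSketch {
    private final List<String> logs = new ArrayList<>();
    private int commitIndex = -1;           // highest committed log index
    private final Consumer<String> applier; // how logs take effect is pluggable

    LogManagerSketch(Consumer<String> applier) {
        this.applier = applier;
    }

    int append(String log) {
        logs.add(log);
        return logs.size() - 1; // index of the appended log
    }

    // Commit all logs up to and including newCommitIndex, applying each once.
    void commitTo(int newCommitIndex) {
        for (int i = commitIndex + 1; i <= newCommitIndex && i < logs.size(); i++) {
            applier.accept(logs.get(i));
        }
        commitIndex = Math.max(commitIndex, Math.min(newCommitIndex, logs.size() - 1));
    }

    int getCommitIndex() {
        return commitIndex;
    }

    public static void main(String[] args) {
        LogManagerSketch manager = new LogManagerSketch(log -> System.out.println("apply: " + log));
        manager.append("setStorageGroup root.sg1");
        manager.append("createTimeseries root.sg1.d1.s1");
        manager.commitTo(0); // only the first log is applied
    }
}
```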


