Cluster Query Procedure

This document describes how queries are executed in the distributed version of IoTDB (IoTDB cluster), focusing on how the timeseries readers of the stand-alone version are replaced with distributed readers and what those readers do. Details that are unchanged from the stand-alone version are omitted, so please make sure you are familiar with how queries are processed in the stand-alone version.

Raw data query (including aggregation with value filter)

Overview

Below is the overall procedure of a raw data query. The main difference from the stand-alone version is how the readers of each timeseries are created. Other operations, such as SQL parsing, plan generation, and how datasets merge the data from readers, are almost unchanged.

A client sends a query to any node A in the cluster.
A parses the query and generates a physical plan (refer to the stand-alone version).
A executes the plan:
1. For each queried timeseries t in this plan
  1. Calculate the data groups gs that contain t, using t's storage group and the time filter. (Extract the time intervals from the time filter and look up their partitions in the partition table. If the time filter is null or contains open intervals, gs will be all data groups.)
  2. For each data group g in gs
    1. If A is in g, create a local reader (refer to the stand-alone version)
    2. If A is not in g, create a remote reader (refer to section Remote Reader)
  3. Merge all readers created as a MergedReader
2. Create a dataset over all MergedReaders
3. The dataset calculates results using the readers (refer to the stand-alone version)
A returns the results to the client.
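
The reader-selection part of step 1 can be sketched as follows. This is only an illustration under stated assumptions: the fixed partition width, the three group names, and the toy partition table `group_of` are all hypothetical and do not reflect how IoTDB actually lays out partitions.

```python
PARTITION_INTERVAL = 100          # hypothetical width of one time partition
GROUPS = ["g0", "g1", "g2"]       # hypothetical data groups in the cluster


def group_of(storage_group, partition):
    """Toy partition table: map (storage group, time partition) to a data group."""
    return GROUPS[(sum(map(ord, storage_group)) + partition) % len(GROUPS)]


def data_groups(storage_group, interval):
    """Return the data groups to query for one timeseries.

    interval is an inclusive (start, end) pair taken from the time filter.
    None, or a None bound, stands for a missing filter or an open interval.
    """
    if interval is None or None in interval:
        return list(GROUPS)       # fall back to all data groups
    first = interval[0] // PARTITION_INTERVAL
    last = interval[1] // PARTITION_INTERVAL
    return sorted({group_of(storage_group, p) for p in range(first, last + 1)})


def plan_readers(storage_group, interval, local_groups):
    """Pair each data group with the kind of reader node A would create."""
    return [(g, "local" if g in local_groups else "remote")
            for g in data_groups(storage_group, interval)]


print(plan_readers("sg", (0, 150), local_groups={"g0"}))
# prints [('g0', 'local'), ('g2', 'remote')]
```

Here node A is assumed to belong only to group `g0`, so it would read `g0` from local files and open a remote reader for `g2`.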

Remote Reader

A remote reader uses a data group instead of local files as the data source of a timeseries and reads data points of the timeseries from a node of the data group through RPC requests.

Creation

A remote reader is created with the path of a timeseries, optional filters, a data group, and a time offset (initialized to 0). During creation, the coordinator node sends the path, filters, time offset, and its query context id to a node in the data group (the nodes of the group are sorted by latency). The receiving node finds or creates a local query context for the given query context id. Using that local context and the other parameters, it creates a local series reader, assigns an id to the reader, binds the id to the query context, fast-forwards the reader according to the time offset, and sends the id back to the coordinator. The coordinator then stores the id in the remote reader, records the queried node in its own context, and finishes the creation.
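
The handshake can be pictured with a small in-memory model. It is a sketch, not the IoTDB implementation: `DataNode`, `RemoteReader`, and their methods are hypothetical names, the RPC is replaced by a direct method call, filters are left out, and "fast-forward" is assumed to mean skipping points earlier than the time offset.

```python
import itertools


class DataNode:
    """In-memory stand-in for a node of a data group (no real RPC)."""

    def __init__(self, name, series):
        self.name = name
        self.series = series              # path -> list of (time, value), sorted by time
        self.contexts = {}                # coordinator context id -> {reader id: state}
        self._next_id = itertools.count(1)

    def create_reader(self, context_id, path, time_offset):
        # Find or create the local context bound to the coordinator's context id.
        readers = self.contexts.setdefault(context_id, {})
        points = self.series.get(path, [])
        # Fast-forward (assumed meaning): skip points earlier than the time offset.
        pos = sum(1 for t, _ in points if t < time_offset)
        reader_id = next(self._next_id)
        readers[reader_id] = {"path": path, "pos": pos}
        return reader_id


class RemoteReader:
    """Coordinator-side handle; filters are omitted for brevity."""

    def __init__(self, context_id, path, group, queried):
        self.path = path
        self.time_offset = 0
        self.node = group[0]              # group is assumed to be sorted by latency
        self.reader_id = self.node.create_reader(context_id, path, self.time_offset)
        # Record the queried node and reader id in the coordinator's context.
        queried.setdefault(self.node.name, []).append(self.reader_id)


node_b = DataNode("B", {"root.sg.d1.s1": [(1, 1.0), (5, 5.0)]})
queried = {}
reader = RemoteReader(7, "root.sg.d1.s1", [node_b], queried)
print(reader.reader_id, queried)          # prints 1 {'B': [1]}
```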

Data Fetch

When the upper level wants to fetch a batch from the remote reader, the reader sends its reader id to the node on which it was created. The receiving node finds the corresponding local reader, reads a batch from it, and returns the batch to the coordinator. The coordinator then updates the reader's time offset to the start of the batch.

If the upper level wants to read a timestamp, the reader sends the timestamp and its id to the data owner node, then that node finds the corresponding local reader, reads the value of the timestamp, and returns it to the coordinator.
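
Both fetch styles can be sketched with a toy node that keeps one read position per reader id. All class and method names are hypothetical, the batch size is arbitrary, and the RPC is again replaced by a direct call.

```python
class DataNode:
    """In-memory stand-in for the node that holds the local reader."""

    def __init__(self, points):
        self.points = points              # list of (time, value), sorted by time
        self.readers = {}                 # reader id -> position in self.points

    def open_reader(self, reader_id):
        self.readers[reader_id] = 0

    def fetch_batch(self, reader_id, size):
        pos = self.readers[reader_id]     # find the corresponding local reader
        batch = self.points[pos:pos + size]
        self.readers[reader_id] = pos + len(batch)
        return batch

    def fetch_value(self, reader_id, timestamp):
        if reader_id not in self.readers:
            raise KeyError(reader_id)
        return dict(self.points).get(timestamp)   # None if no point at that time


class RemoteReader:
    """Coordinator-side handle that forwards reads by reader id."""

    def __init__(self, node, reader_id):
        self.node = node
        self.reader_id = reader_id
        self.time_offset = 0

    def next_batch(self, size=2):
        batch = self.node.fetch_batch(self.reader_id, size)
        if batch:
            self.time_offset = batch[0][0]   # start of the batch, as described above
        return batch

    def value_at(self, timestamp):
        return self.node.fetch_value(self.reader_id, timestamp)
```

For example, with points at times 1, 2, and 3 and a batch size of 2, the first `next_batch()` call returns the points at 1 and 2 and leaves `time_offset` at 1; the second returns the point at 3 and moves `time_offset` to 3.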

Reconnection

If a network error occurs during a data fetch, the reader chooses the next node in the data group and repeats the creation procedure to get a new reader id from that node.
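
A minimal failover sketch follows. It makes several assumptions: each toy node holds a full copy of the data and a single reader, a network error is modelled as Python's built-in `ConnectionError`, and fast-forwarding skips points earlier than the time offset. The names `Node` and `RemoteReader` are hypothetical.

```python
class Node:
    """Toy replica with one reader; set alive = False to simulate a network error."""

    def __init__(self, name, points):
        self.name = name
        self.points = points      # list of (time, value), sorted by time
        self.alive = True
        self.pos = 0              # position of this node's single toy reader

    def _check(self):
        if not self.alive:
            raise ConnectionError(self.name)

    def create_reader(self, time_offset):
        self._check()
        # Assumed fast-forward: skip points earlier than the time offset.
        self.pos = sum(1 for t, _ in self.points if t < time_offset)
        return 1                  # toy reader id

    def fetch_batch(self, reader_id, size):
        self._check()
        batch = self.points[self.pos:self.pos + size]
        self.pos += len(batch)
        return batch


class RemoteReader:
    def __init__(self, group):
        self.group = group        # nodes of the data group, in preference order
        self.index = 0
        self.time_offset = 0
        self.reader_id = self._create()

    def _create(self):
        while self.index < len(self.group):
            try:
                return self.group[self.index].create_reader(self.time_offset)
            except ConnectionError:
                self.index += 1
        raise ConnectionError("no reachable node in the data group")

    def next_batch(self, size=2):
        while True:
            try:
                batch = self.group[self.index].fetch_batch(self.reader_id, size)
                break
            except ConnectionError:
                self.index += 1                   # choose the next node
                self.reader_id = self._create()   # repeat the creation there
        if batch:
            self.time_offset = batch[0][0]
        return batch
```

Because the recorded offset is the start of the last batch, the replacement reader in this sketch serves that batch again after a failover. That is a consequence of the assumptions made here, not a statement about how the real implementation treats the overlap.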

Deconstruction

Recall that every queried node and reader id is recorded in the query context. When a query context is released, a request containing the context id and the reader ids is sent to each queried node, and each queried node then retrieves the corresponding local readers and closes them.
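
The bookkeeping behind this cleanup might look like the sketch below. `QueryContext` and `DataNode` are hypothetical names, closing a reader is modelled as removing it from a dictionary, and dropping an empty remote context is an extra assumption of this sketch.

```python
class DataNode:
    """Toy node that tracks open readers per coordinator context id."""

    def __init__(self, name):
        self.name = name
        self.contexts = {}        # context id -> set of open reader ids

    def open_reader(self, context_id, reader_id):
        self.contexts.setdefault(context_id, set()).add(reader_id)

    def release(self, context_id, reader_ids):
        readers = self.contexts.get(context_id, set())
        for reader_id in reader_ids:
            readers.discard(reader_id)            # "close" the local reader
        if not readers:
            self.contexts.pop(context_id, None)   # assumed: drop the empty context


class QueryContext:
    """Coordinator-side context that remembers what it opened and where."""

    def __init__(self, context_id):
        self.context_id = context_id
        self.queried = {}         # node -> list of reader ids created on it

    def record(self, node, reader_id):
        self.queried.setdefault(node, []).append(reader_id)

    def release(self):
        # One request per queried node, carrying the context id and reader ids.
        for node, reader_ids in self.queried.items():
            node.release(self.context_id, reader_ids)
        self.queried.clear()
```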

Aggregation

Aggregations (without a value filter) are relatively simple: each is calculated only once on each node, so there is no need to maintain readers for them. As with raw data queries, we calculate the related data groups of each aggregation using the timeseries and the time filter (if any), send the aggregation and filters to one node of each group, retrieve the aggregation result from that group, and merge the results to produce the final result.
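
As an illustration of the final merge, the sketch below computes a partial result per data group and combines the partials on the coordinator. The function names and the three aggregation names are chosen for the example only, and each group is represented by a plain list of (time, value) points rather than a remote call.

```python
def aggregate_on_node(points, name):
    """Partial result computed once by one node of a data group."""
    values = [v for _, v in points]
    if name == "count":
        return len(values)
    if name == "sum":
        return sum(values)
    if name == "max_value":
        return max(values) if values else None
    raise ValueError(f"unsupported aggregation: {name}")


def merge_partials(name, partials):
    """Combine per-group partial results on the coordinator."""
    present = [p for p in partials if p is not None]
    if name in ("count", "sum"):
        return sum(present)
    if name == "max_value":
        return max(present) if present else None
    raise ValueError(f"unsupported aggregation: {name}")


def aggregate(name, groups):
    """groups holds one list of points per related data group."""
    return merge_partials(name, [aggregate_on_node(pts, name) for pts in groups])


groups = [[(1, 2.0), (2, 3.0)], [(101, 10.0)]]
print(aggregate("count", groups), aggregate("sum", groups), aggregate("max_value", groups))
# prints 3 15.0 10.0
```

Aggregations whose partial results cannot be combined directly, such as an average, would need to carry extra state (for instance a sum and a count) between the nodes and the coordinator. That case is left out of this sketch.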
