Firstly and concretely, I'd like to demonstrate two industrial scenarios as well as the SQL and the query result format. If it makes sense, we could add SQL formats to the IoTDB as the first step of the IoTDB index mechanism.
Case 1: Erythromycin Fermentation
The figure above shows the IoTDB schema for the Erythromycin Fermentation. Some irrelevant information has been omitted. The pharmaceutical factory has many fermentors (Ferm 01, Ferm 02, etc.), each of which produces erythromycin batch by batch.
For each batch (e.g., batch-20191017), the factory will monitor some measurements in the fermentation process, like glucose feeding rate (Glu), carbon dioxide exit rate (CER), and pH value. Researchers have found that the glucose feeding rate is critical to the final erythromycin output. Therefore, analysts want to build a similarity index (e.g. RTree+PAA index) for Glu sequences for all fermenters and all batches. After building the index, the analyst inputs a Glu sequence, finds batches with similar Glu curves, and makes further analysis.
Index Creation
The SQL format of index creation statement is as follows:
CREATE INDEX ON seriesPath
WITH INDEX=indexName (, key=value)*
In addition to the index name and target series paths, users are allowed to pass in any index parameters in format of "key(string):value(string)", such as "PAA-dimension:8" or "leaf-size: 10000".
In this scenario, the SQL of index creation is:
CREATE INDEX ON root.Ery.*.Glu
WITH INDEX=RTreePAA, PAA_dim=8
Index Query
The analyst hopes to find the two Glu sequences closest to the given query Glu sequence in all fermenters and batches. The index query statement is as follows:
SELECT TOP 2 Glu
FROM root.Ery.*
WHERE Glu LIKE (0, 120, 20, 80, ..., 120, 100, 80, 0)
Recognizing the keyword LIKE, IoTDB assigns this query statement to the indexing mechanism. For consistency with other IoTDB query results, retrieved sequences are listed as separate columns and aligned according to timestamps.
As shown in the above figure, two sequences closest to the given query sequence are Ferm01-batch-20191018 and Ferm02-batch-20191024.
Note that after a fermentation batch is completed, the series has been fixed, and will not be appended or renewed. Because of the regular productivity, we can suppose that all series among batches and fermenters have similar lengths.
Index Drop
The index drop statement is as follow:
DROP INDEX indexName ON seriesPath
In this case, the drop SQL is:
DRRP INDEX RTreePAA ON root.Ery.*.Glu
Case 2: Extreme Operating Gust
The above figure shows the IoTDB schema of a wind turbine manufacturer. The manufacturer has many wind turbines (AZQ01, AZQ02, etc.), each of which continuously monitors the state of the turbine itself and the surrounding environment, such as wind speed, wind direction, generator power, etc. The analyst hopes to build an index (e.g. ELB index) for the speed series of AZQ02 (root.Wind.AZQ02.Speed
). After building the index, the analyst inputs a EOG pattern (Extreme Operating Gust), finds all results in speed series for further process control, fault diagnosis and predictive maintenance.
Index Creation
The index creation statement is as follows, similar to the above case:
CREATE INDEX ON root.Wind.AZQ02.Speed
WITH INDEX=ELBIndex, Block_Size=5
Index Query
The query condition and the result format are the main differences from the above scenario.
The analyst input a pattern of extreme operating gust
and wishes to find all similar subsequence on root.Wind.AZQ02.Speed
. The query statement is as follows:
SELECT Speed.* FROM root.Wind.AZQ02
where Speed
CONTAIN (15, 14, 12, ..., 12, 12, 11) WITH TOLERANCE 1
CONCAT (10, 20, 25, ..., 24, 14, 8) WITH TOLERANCE 2
CONCAT (8, 9, 10, ..., 14, 15, 15) WITH TOLERANCE 1
The pattern contains three sub-patterns with different searching thresholds of which the second sub-pattern is 2 and the other two are 1.
In this case, the result is the subsequence of long series, which does not exist in the original IoTDB schema. For consistency with other IoTDB query results, each retrieved subsequence is listed as individual columns, and the column name is the concatenation of the series pathname and the start time of the subsequence. As shown in the above figure, there are two retrieved subsequences, which started from 2019-10-18 12: 30: 00.000 and 2019-10-18 12: 30: 10.000 respectively.
Since the monitoring process is continuous, the new-coming data will be inserted into the index continuously.
Index Drop
Similar to the above case.