In this document, we introduce the problems that arise when applying deletions in an IoTDB cluster, together with their solutions. Since solving one obvious problem may expose a more hidden one, the problems and their solutions are presented progressively. Throughout, we assume that time partitioning within a storage group is always enabled, as it is necessary for load balancing.


Problem 1: Distribution of Deletions

The first problem is how to distribute a deletion to the appropriate nodes. Deletions must be ordered together with insertions to ensure consistency, so deletions should be executed within data groups; the question is which groups should receive a given deletion. Unlike an insertion, which falls into exactly one time partition (storage group + time range), a deletion may cover many time partitions, or even an unbounded number of them when it uses an open interval. For example, executing "DELETE FROM root.sg1.d1.s1 WHERE time < 121412516346" involves practically infinitely many time partitions. Even if the interval is closed, it may still cover a large number of time partitions when the interval is long or the length of a time partition is short. When the number of time partitions is very large, computing the owner of each partition through the partition table becomes time-consuming.
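
To make the cost concrete, here is a minimal sketch (a hypothetical helper, not IoTDB's actual API) that enumerates the time partitions covered by a closed deletion interval, assuming a time partition is identified by timestamp / partitionInterval. With an open lower bound such as "time < t", the enumeration would be effectively unbounded:

    import java.util.ArrayList;
    import java.util.List;

    public class TimePartitionEnumeration {
      // Sketch: list the ids of the time partitions covered by a closed
      // deletion interval [startTime, endTime]. partitionInterval is the
      // configured length of a time partition in milliseconds.
      static List<Long> coveredTimePartitions(
          long startTime, long endTime, long partitionInterval) {
        List<Long> partitionIds = new ArrayList<>();
        // With an open interval like "time < t", startTime has no lower
        // bound, so the number of partitions (and owner lookups) is unbounded.
        for (long id = startTime / partitionInterval;
            id <= endTime / partitionInterval;
            id++) {
          partitionIds.add(id);
        }
        return partitionIds;
      }

      public static void main(String[] args) {
        // A one-day partition interval over a one-year deletion already
        // covers ~365 partitions, each needing a partition-table lookup.
        System.out.println(
            coveredTimePartitions(0L, 365L * 86_400_000L, 86_400_000L).size());
      }
    }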


The Solution to Problem 1

As of the first implementation, IoTDB only supports open-interval deletions, so for simplicity each deletion is distributed to all data groups. Because deletions are rare compared with insertions, and because a long deletion interval over enough time partitions involves almost all data groups anyway, this introduces little extra overhead.
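
As a rough illustration (the interfaces below are self-contained stand-ins for the corresponding cluster components, not verified IoTDB signatures), distributing a deletion then reduces to forwarding the identical plan to every data group:

    import java.util.List;

    public class DeletionBroadcast {
      // Minimal stand-ins; the real classes live in IoTDB's cluster module.
      interface DeletionPlan {}

      interface DataGroup {
        // Replicates the plan through the group's own consensus (Raft) log,
        // so within one group the deletion stays ordered with insertions.
        void executePlan(DeletionPlan plan);
      }

      // Sketch of the solution to Problem 1: instead of routing the deletion
      // by time partition, forward the same plan to every data group.
      static void distributeDeletion(DeletionPlan plan, List<DataGroup> allGroups) {
        for (DataGroup group : allGroups) {
          group.executePlan(plan);
        }
      }
    }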


Problem 2: Imprecise Execution of Deletions

Although we distribute a deletion to all data groups, the intention is that each data group deletes only its own data; in other words, the execution of a deletion in a data group should only affect data in the time partitions managed by that group. However, without any additional information, the deletion executed by different data groups is identical, which means a data group may delete data belonging to other data groups. This introduces potential inconsistency, because data groups execute operations concurrently on each node, so their relative order is not unified across nodes. For example, on node A a data group may execute the deletion before another data group performs an insertion, while on node B the order is reversed; as a result, some data is deleted on node B but remains on node A, causing an inconsistency.


The Solution to Problem 2

We can attach a TimePartitionFilter to the execution of a deletion by the DataGroupMember. The filter tells the underlying PlanExecutor which time partitions should be involved in the execution of the deletion and leaves the other time partitions unchanged. Thus, when a deletion is executed by one data group, the data of other data groups is not touched and consistency is preserved.
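
As a minimal sketch: satisfy(...) mirrors the shape of the TimePartitionFilter interface in IoTDB, while ownsTimePartition(...) is a hypothetical stand-in for the data group's real slot-ownership lookup. The filter could be built like this:

    public class DeletionFiltering {
      // Given a storage group and a time partition id, decide whether the
      // deletion may touch that partition. This mirrors the shape of
      // IoTDB's TimePartitionFilter functional interface.
      interface TimePartitionFilter {
        boolean satisfy(String storageGroupName, long timePartitionId);
      }

      // Hypothetical stand-in for the data group's slot-ownership lookup.
      interface DataGroup {
        boolean ownsTimePartition(String storageGroupName, long timePartitionId);
      }

      // Build a filter that accepts only the partitions this group owns, so
      // the deletion executed here leaves other groups' data untouched.
      static TimePartitionFilter filterFor(DataGroup thisGroup) {
        return (storageGroupName, timePartitionId) ->
            thisGroup.ownsTimePartition(storageGroupName, timePartitionId);
      }
    }

The DataGroupMember would then pass such a filter down with the deletion, so the PlanExecutor simply skips every time partition outside the group.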




