Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Discussion thread

https://lists.apache.org/thread/7qjzbcfzdshqb3h7ft31v9o3x43t8k6r
Vote thread
ISSUE
ReleaseTBD

Motivation

Position delete is a solution to implement the Merge-On-Read (MOR) structure, which has been adopted by other formats such as Iceberg[1] and Delta[2]. By combining with Paimon's LSM tree, we can create a new position deletion mode unique to Paimon.

...

"data rate / max num" = 20% / 2,000,000, means randomly call "RoaringBitmap.add(x)", which "x" is in the range of 0 to 2,000,000 for a total of 20% * 2,000,000 = 400,000 times to build the bitmap, then serialize it to file, next deserialize from file, finally call "RoaringBitmap.addcontains(x)" for 400,000 times to simulate filter.

measure: the total time of calling "RoaringBitmap.add(x)", time of serialize to file, time of deserialize from file, the serialized file size, and the total time of calling "RoaringBitmap.contains(x)".

data rate / max num

add(ms)

serialization(ms)

deserialization(ms)

file size(MB)

constains(ms)

20% /2,000,000

43

5

26

0.24

7

50% /2,000,000

47

3

52

0.24

5

80% /2,000,000

57

1

24

0.24

8

20% /20,000,000

450

13

247

2.4

49

50% /20,000,000

629

6

222

2.4

76

80% /20,000,000

1040

5

222

2.4

121

20% /200,000,000

5079

44

2262

24

442

50% /200,000,000

9469

43

2773

24

1107

80% /200,000,000

13625

38

2233

24

1799

20% /2,000,000,000

93753

568

22290

239

5747

50% /2,000,000,000

166070

679

22339

239

14735

80% /2,000,000,000

218233

553

22684

239

26504


Summarize the following points:

...