Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Under this mode, extra overhead (lookup and write delete file) will be introduced during writing, but during reading, data can be directly retrieved using "data + filter with position delete", avoiding additional merge costs between different files. Furthermore, this mode can be easily integrated into native engine solutions like Spark + GlutonGluten[3] in the future, thereby significantly enhancing read performance.

...

Delete file is used to mark the deletion of original file. The following figure illustrates how data updating and deleting under the delete file mode:


Image Modified

Currently, there are two ways to represent the deletion of records:

...

"data rate / max num" = 20% / 2,000,000, means randomly call "RoaringBitmap.add(x)", which "x" is randomly in the range of 0 to 2,000,000 for a total of 20% * 2,000,000 = 400,000 times to build the bitmap, then serialize it to file, next deserialize from file, finally call "RoaringBitmap.contains(x)" for 400,000 times to simulate filter.

...