...

ID: IEP-22
Author: Vladimir Ozerov
Sponsor: Vladimir Ozerov
Created: 15 Jun 2018
Status: DRAFT



Motivation

...

We will not update the PK and secondary indexes during the data load, so they must be rebuilt at the end. The most efficient way to build an index is the bottom-up approach, in which the lowest level of the BTree is built first and the root is built last. We will need a buffer in which indexed values and their respective links are sorted in index order. If the buffer is big enough to hold all the data, the index can be created in a single pass. Otherwise, the indexed values must be sorted in several runs using an external sort. Users must be able to configure the sort parameters: the buffer size (ideally in bytes) and the file system path where temporary files are stored. The latter is critical: users typically want to keep temporary files on a separate disk so that WAL and checkpoint operations are not affected.
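The sort-then-build pipeline described above can be sketched in Java as follows. This is a minimal illustration, not Ignite code, and it rests on several assumptions:

- Entries are hypothetical `IndexEntry(key, link)` pairs of `long` values.
- The buffer size is counted in entries rather than bytes.
- A BTree page is modeled as a plain list of keys.

The sketch has two parts:

- `ExternalIndexSorter` sorts in memory when everything fits. Otherwise it spills sorted runs to a configurable temp directory and k-way merges them.
- `BottomUpBuilder` fills leaf pages first, then derives each parent level from the first key of every child page, until a single root remains.

All class names are illustrative.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.Consumer;

record IndexEntry(long key, long link) {} // hypothetical: indexed value plus link to its row

/** Sketch: sorts entries with a bounded buffer, spilling sorted runs to tempDir. */
final class ExternalIndexSorter {
    static final Comparator<IndexEntry> ORDER =
        Comparator.comparingLong(IndexEntry::key).thenComparingLong(IndexEntry::link);

    private final int bufferEntries; // assumption: counted in entries, not bytes
    private final Path tempDir;      // e.g. a directory on a disk not used by WAL/checkpoints
    private final List<IndexEntry> buffer = new ArrayList<>();
    private final List<Path> runs = new ArrayList<>();
    ExternalIndexSorter(int bufferEntries, Path tempDir) {
        this.bufferEntries = bufferEntries;
        this.tempDir = tempDir;
    }

    void add(IndexEntry e) throws IOException {
        buffer.add(e);
        if (buffer.size() >= bufferEntries) spill(); // buffer full: write one sorted run
    }

    private void spill() throws IOException {
        buffer.sort(ORDER);
        Path run = Files.createTempFile(tempDir, "idx-run-", ".bin");
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(run)))) {
            out.writeInt(buffer.size());
            for (IndexEntry e : buffer) { out.writeLong(e.key()); out.writeLong(e.link()); }
        }
        runs.add(run);
        buffer.clear();
    }

    /** Emits entries in index order: one in-memory sort if nothing spilled, else a k-way merge. */
    void forEachSorted(Consumer<IndexEntry> sink) throws IOException {
        if (runs.isEmpty()) { buffer.sort(ORDER); buffer.forEach(sink); return; }
        if (!buffer.isEmpty()) spill();
        List<Run> open = new ArrayList<>();
        PriorityQueue<Run> heads = new PriorityQueue<>((a, b) -> ORDER.compare(a.head, b.head));
        try {
            for (Path p : runs) {
                Run r = new Run(new DataInputStream(new BufferedInputStream(Files.newInputStream(p))));
                open.add(r);
                if (r.head != null) heads.add(r);
            }
            while (!heads.isEmpty()) {
                Run r = heads.poll(); // smallest head among all runs
                sink.accept(r.head);
                r.advance();
                if (r.head != null) heads.add(r);
            }
        } finally {
            for (Run r : open) r.in.close();
            for (Path p : runs) Files.deleteIfExists(p); // temp runs are no longer needed
            runs.clear();
        }
    }

    /** Cursor over one sorted run file: a count followed by (key, link) pairs. */
    private static final class Run {
        final DataInputStream in;
        int remaining;
        IndexEntry head;
        Run(DataInputStream in) throws IOException {
            this.in = in;
            this.remaining = in.readInt();
            advance();
        }
        void advance() throws IOException {
            head = remaining-- > 0 ? new IndexEntry(in.readLong(), in.readLong()) : null;
        }
    }
}

final class BottomUpBuilder {
    /** Builds levels from leaves (index 0) to root (last); each page holds at most fanout keys. */
    static List<List<List<Long>>> build(List<Long> sortedKeys, int fanout) {
        if (fanout < 2) throw new IllegalArgumentException("fanout must be >= 2");
        List<List<List<Long>>> levels = new ArrayList<>();
        List<Long> keys = sortedKeys;
        while (true) {
            List<List<Long>> level = new ArrayList<>();
            for (int i = 0; i < keys.size(); i += fanout)
                level.add(new ArrayList<>(keys.subList(i, Math.min(i + fanout, keys.size()))));
            if (level.isEmpty()) level.add(new ArrayList<>()); // empty index: one empty root page
            levels.add(level);
            if (level.size() == 1) return levels;              // single page: this is the root
            keys = new ArrayList<>();
            for (List<Long> page : level) keys.add(page.get(0)); // simplified separator per child
        }
    }
}
```

For example, feeding keys 1..10 through a sorter with a 3-entry buffer produces four runs, which are merged back into order. Building with a fanout of 3 then yields:

- four leaf pages,
- two inner pages,
- a root holding `[1, 10]`.

Inner pages here keep one key per child for simplicity. A real BTree stores one fewer separator than children and packs pages by bytes.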

Direct Data Load

We will have a small in-memory buffer for several consecutive data blocks. Data being injected is put into these blocks, bypassing the page memory. When the buffer is full, we could issue a multi-buffer asynchronous disk write and continue filling the buffer with new data. As data loading typically affects several partitions, multiple buffers and/or additional synchronization may be required.
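The buffering scheme above can be sketched with double buffering over `java.nio.channels.AsynchronousFileChannel`: one buffer of consecutive blocks keeps filling while the previous one is being written. This is a hedged illustration under these assumptions:

- There is one writer per partition file.
- A record never spans two blocks; the rest of the block is zero-padded.
- Each full buffer goes out as one contiguous write covering several blocks.

`DirectBlockWriter` is a hypothetical name. Real data pages would also carry headers and checksums, which are omitted here.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.*;
import java.util.concurrent.Future;

/** Sketch: stages records in a buffer of consecutive blocks and writes full buffers asynchronously. */
final class DirectBlockWriter implements AutoCloseable {
    private final AsynchronousFileChannel ch;
    private final int blockSize;
    private ByteBuffer filling;      // receives new records
    private ByteBuffer flushing;     // buffer whose write may still be in flight
    private Future<Integer> pending; // in-flight write, or null
    private long pendingPos;         // file offset of the in-flight buffer's remaining bytes
    private long nextPos;            // file offset where the filling buffer will be written

    DirectBlockWriter(Path file, int blockSize, int blocksPerBuffer, long startPos) throws Exception {
        this.ch = AsynchronousFileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        this.blockSize = blockSize;
        this.filling = ByteBuffer.allocate(blockSize * blocksPerBuffer);
        this.flushing = ByteBuffer.allocate(blockSize * blocksPerBuffer);
        this.nextPos = startPos;
    }

    /** Appends one record, bypassing any page cache; a record never spans two blocks here. */
    void append(byte[] record) throws Exception {
        if (record.length > blockSize) throw new IllegalArgumentException("record larger than a block");
        int inBlock = filling.position() % blockSize;
        if (inBlock + record.length > blockSize) filling.put(new byte[blockSize - inBlock]); // zero-pad
        if (filling.remaining() < record.length) flush(); // buffer full: hand it to the disk
        filling.put(record);
    }

    /** Waits for the previous write, swaps buffers and starts writing the full one. */
    void flush() throws Exception {
        awaitPending();
        ByteBuffer full = filling;
        filling = flushing;
        flushing = full;
        while (full.position() % blockSize != 0) full.put((byte) 0); // pad the last block
        full.flip();
        pendingPos = nextPos;
        nextPos += full.limit();
        pending = full.hasRemaining() ? ch.write(full, pendingPos) : null;
    }

    private void awaitPending() throws Exception {
        while (pending != null) {
            pendingPos += pending.get(); // blocks until this write completes
            // A write may be partial: resubmit the remainder at the advanced offset.
            pending = flushing.hasRemaining() ? ch.write(flushing, pendingPos) : null;
        }
        flushing.clear();
    }

    /** File offset just past the last block handed to the disk so far. */
    long endPosition() { return nextPos; }

    @Override
    public void close() throws Exception {
        flush();        // write the partially filled tail
        awaitPending();
        ch.close();
    }
}
```

With two buffers, the loader only stalls when it fills a buffer before the previous write has finished. Several partitions would each get their own writer, which is one way to meet the "multiple buffers" point above.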

Data will be inserted into new blocks only. We will have to track the start and end positions of the inserted data blocks. These positions will be used to scan the new data during index rebuilding.
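Because data is only appended to new blocks, one block range per partition is enough to find the loaded data again during the index rebuild. The following is a minimal sketch that assumes each partition's loaded blocks are contiguous and identified by block index. `LoadedRanges` and its methods are hypothetical.

```java
import java.util.*;
import java.util.function.BiConsumer;

/** Sketch: per-partition range [startBlock, endBlock) of blocks written by the loader. */
final class LoadedRanges {
    private final Map<Integer, long[]> ranges = new TreeMap<>(); // partition -> {start, end}

    /** Records that blocks [from, to) of a partition received loaded data. */
    void record(int partition, long from, long to) {
        // Assumes appends per partition are contiguous, so their union is a single range.
        ranges.merge(partition, new long[] {from, to},
            (a, b) -> new long[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])});
    }

    /** Visits only the newly loaded blocks, e.g. to feed their rows to the index rebuild. */
    void forEachNewBlock(BiConsumer<Integer, Long> visitor) {
        ranges.forEach((part, r) -> {
            for (long b = r[0]; b < r[1]; b++) visitor.accept(part, b);
        });
    }
}
```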

TBD

...

  • Interaction with WAL
  • Rebalance
  • Crash recovery

...