Proposed Changes

  1. Implement a distributed cache (table) lock - while it is held, no other operation on the cache is possible; PME infrastructure will be used for that.
  2. Implement external sort - allocate a memory buffer, collect values there in sorted order, flush to disk when needed, and merge the sorted runs in the end.
  3. Implement direct data load - write data to data blocks, then re-create indexes with external sort and other necessary data structures in the end.
  4. Add the necessary API to control direct data load mode (see the sketch after this list).
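
A rough sketch of what the control API could look like. The interface and method names below (DirectDataLoad, begin, add, end) are assumptions for illustration only, not the actual Ignite API proposed here:

```java
// Hypothetical control API sketch; names are illustrative, not existing Ignite API.
public interface DirectDataLoad extends AutoCloseable {
    /** Locks the cache via PME and switches it to direct data load mode. */
    void begin(String cacheName);

    /** Streams a key/value pair directly into new data blocks, bypassing page memory. */
    void add(Object key, Object val);

    /** Rebuilds PK and secondary indexes with external sort and unlocks the cache. */
    void end();

    /** Aborts the load, discarding blocks written since begin(). */
    @Override void close();
}
```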

Distributed Cache Lock

Index updates and integrity checks will be delayed during direct data load. In order to maintain data consistency during this time, it is essential to prevent any concurrent updates. This can be done with a new cache lock concept. We will initiate an exchange which will wait for all in-flight cache operations to finish, and then set a special flag on that cache. While the flag is set, an exception will be thrown on any attempt to access the cache.
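
A minimal sketch of how the lock flag could be enforced. All class and method names here are hypothetical (not existing Ignite API); the real flag would be set by the exchange future described above:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical cache-lock flag; in the real design the flag is set by the PME
// that initiates direct data load, after all in-flight operations have finished.
public final class CacheLockManager {
    private final AtomicBoolean directLoadLock = new AtomicBoolean(false);

    /** Called by the exchange future once all in-flight cache operations are done. */
    public void lockForDirectLoad() {
        directLoadLock.set(true);
    }

    /** Called when direct data load completes and indexes have been rebuilt. */
    public void unlock() {
        directLoadLock.set(false);
    }

    /** Invoked on every cache operation; rejects access while the lock flag is set. */
    public void checkAccess(String cacheName) {
        if (directLoadLock.get())
            throw new IllegalStateException("Cache is locked for direct data load: " + cacheName);
    }
}
```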

External Sort

We will not update PK and secondary indexes during data load, so it is necessary to rebuild them in the end. The most efficient way to build indexes is a bottom-up approach, where the lowest level of the BTree is built first and the root is built last. We will need a buffer where indexed values and the respective links will be sorted in index order. If the buffer is big enough and all data fits into it, the index will be created in one pass. Otherwise it is necessary to sort the indexed values in several runs using external sort. It is necessary to let the user configure the sort parameters - the buffer size (ideally in bytes) and the file system path where temporary files will be stored. The latter is critical - typically the user would like to keep temporary files on a separate disk, so that WAL and checkpoint operations are not affected.
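
The sketch below illustrates the sort-spill-merge cycle under two simplifying assumptions: plain long links are sorted instead of full index rows (indexed value + link), and all class names are illustrative. The configurable parameters mentioned above map to the buffer size and temp directory passed to the constructor:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.LongConsumer;

// Illustrative external sort: fill an in-memory buffer, spill sorted runs to the
// user-configured temp directory, then k-way merge the runs in ascending order.
public class ExternalSorter {
    private final int bufSize;   // max entries in memory (derived from configured buffer bytes)
    private final Path tmpDir;   // user-configured directory for spill files
    private final long[] buf;
    private int cnt;
    private final List<Path> runs = new ArrayList<>();

    public ExternalSorter(int bufSize, Path tmpDir) {
        this.bufSize = bufSize;
        this.tmpDir = tmpDir;
        this.buf = new long[bufSize];
    }

    /** Adds a value; spills a sorted run to disk when the in-memory buffer is full. */
    public void add(long v) throws IOException {
        buf[cnt++] = v;
        if (cnt == bufSize)
            spill();
    }

    private void spill() throws IOException {
        Arrays.sort(buf, 0, cnt);
        Path run = Files.createTempFile(tmpDir, "sort-run-", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(run)))) {
            for (int i = 0; i < cnt; i++)
                out.writeLong(buf[i]);
        }
        runs.add(run);
        cnt = 0;
    }

    /** K-way merge of all sorted runs; the consumer receives values in index order. */
    public void merge(LongConsumer consumer) throws IOException {
        if (cnt > 0)
            spill();
        PriorityQueue<RunCursor> pq = new PriorityQueue<>(Comparator.comparingLong((RunCursor c) -> c.cur));
        for (Path run : runs) {
            RunCursor c = new RunCursor(run);
            if (c.advance())
                pq.add(c);
        }
        while (!pq.isEmpty()) {
            RunCursor c = pq.poll();
            consumer.accept(c.cur);
            if (c.advance())
                pq.add(c);
            else
                c.close();
        }
    }

    private static class RunCursor implements Closeable {
        final DataInputStream in;
        long cur;

        RunCursor(Path p) throws IOException {
            in = new DataInputStream(new BufferedInputStream(Files.newInputStream(p)));
        }

        boolean advance() {
            try { cur = in.readLong(); return true; }
            catch (EOFException e) { return false; }
            catch (IOException e) { throw new UncheckedIOException(e); }
        }

        public void close() throws IOException { in.close(); }
    }
}
```

In the bottom-up index build, the merge consumer would append each sorted entry to the lowest BTree level, building upper levels as leaf pages fill up.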

Direct Data Load

We will have a small in-memory buffer holding several consecutive data blocks. Data being injected is put into these blocks, bypassing page memory. When the buffer is full, we can issue a multi-buffer asynchronous disk write and continue filling a fresh buffer with new data. As data loading typically affects several partitions, multiple buffers and/or some additional synchronization might be required.

Data will be inserted into new blocks only. We will have to track the start and end positions of the inserted data blocks. These positions will be used to scan the new data during index rebuild.
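
A sketch of the double-buffered block writer and position tracking, assuming a hypothetical DirectLoadWriter that appends raw blocks to a partition file; block layout, CRCs, WAL interaction, and rows larger than the buffer are not handled here:

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.*;
import java.util.concurrent.Future;

// Illustrative direct-load writer: fills a buffer of consecutive blocks, flushes it
// asynchronously, and tracks [startPos, endPos) for the later index-rebuild scan.
public class DirectLoadWriter implements AutoCloseable {
    private static final int BLOCK_SIZE = 4096;   // assumed data block size
    private static final int BLOCKS_PER_BUF = 64; // blocks buffered before an async flush

    private final AsynchronousFileChannel ch;
    private ByteBuffer buf = ByteBuffer.allocateDirect(BLOCK_SIZE * BLOCKS_PER_BUF);
    private long filePos;          // next write position in the partition file
    private final long startPos;   // first position of directly loaded blocks
    private Future<Integer> pendingWrite;

    public DirectLoadWriter(Path partitionFile, long appendAt) throws Exception {
        ch = AsynchronousFileChannel.open(partitionFile,
            StandardOpenOption.WRITE, StandardOpenOption.CREATE);
        filePos = appendAt;
        startPos = appendAt;
    }

    /** Appends one serialized row; flushes the buffer asynchronously when it is full. */
    public void append(byte[] row) throws Exception {
        if (buf.remaining() < row.length)
            flush();
        buf.put(row);
    }

    private void flush() throws Exception {
        if (pendingWrite != null)
            pendingWrite.get();    // wait for the previous async write to complete
        buf.flip();
        int len = buf.remaining();
        pendingWrite = ch.write(buf, filePos);
        filePos += len;
        buf = ByteBuffer.allocateDirect(BLOCK_SIZE * BLOCKS_PER_BUF); // keep filling a fresh buffer
    }

    /** [startPos, endPos) is the range scanned later to rebuild indexes over the new data. */
    public long startPos() { return startPos; }
    public long endPos()   { return filePos + buf.position(); }

    @Override public void close() throws Exception {
        if (buf.position() > 0)
            flush();
        if (pendingWrite != null)
            pendingWrite.get();
        ch.close();
    }
}
```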

TBD:

  • Interaction with WAL
  • Rebalance
  • Crash recovery

Tickets

ASF JIRA filter: project = Ignite AND labels IN (iep-22) ORDER BY status