ID	IEP-22
Author	Vladimir Ozerov Ozerov
Sponsor	Vladimir Ozerov Ozerov
Created	15 Jun 2018
Status	ACTIVE

Motivation

Initial data load is one of the most frequent and important use cases for Apache Ignite. At the moment the fastest way to load data is data streamer, which relies on asynchronous messaging and fast-path entry update method [1]. But all other internal mechanics of cache and page memory works as usual during data load.

General approach employed by major database vendors is to disable and/or skip as much internals as possible during data load, and to employ alternative sorting methods for indexes. With this idea in mind, we have potential to improve data loading speed in several times as follows:

Add exclusive cache access mode, when only data loading process is able to interact with cache
Load data directly to data pages, skipping page buffer and free lists
Then rebuild indexes bottom-up using external sort algorithm

Speedup is expected from less number of IO operations and and less locking page/entry overhead.

[1] https://github.com/apache/ignite/blob/ignite-2.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheMapEntry.java#L2691

Competitive Analysis

Most industrial vendors rely on the following assumptions:

Sequential IO is faster than random
Maintenance of internal database data structures is expensive, it is better to rebuild them from scratch in bulks when data load is finished
Acquiring of per-entry locks is expensive, so it is better to escalate lock to upper level (typically - table)
Crash recovery of data being loaded is typically not needed

Oracle

Oracle offers "direct path load" [1] as the fastest method to populate data after manual file copying. Direct path load can be from both command line utility SQL*Loader to load existing files (resembling Ignite's COPY command), and with conventional SQL, such as INSERT and CREATE TABLE AS SELECT (CTAS). Major highlights:

SQL parsing phase is skipped in case of SQL*Loader usage
Page cache is skipped
Old pages are not changed, data is written to new data blocks
Multi-block writes and async IO is used to improve throughput
Table lock is held during data load - writes are not allowed, reads see data before data load start
Partial indexes are built using separate memory buffers and then merged to existing indexes

[1] https://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_modes.htm#SUTIL009
[2] https://docs.oracle.com/en/database/oracle/oracle-database/18/admin/managing-tables.html#GUID-134D6EB6-219E-4820-AB54-8C60067A8F0F

PostgreSQL

PostgreSQL offers a number of best practices for efficient data load. Dedicated COPY command exists to bypass SQL parsing and minimize network overhead. Apart from that there are not dedicated features for data load. General recommendations are:

Disable indexes and FK constraints
Turn off WAL
Increase size of maintenance_work_mem parameter (buffer for index rebuild routines)
Indirectly increase time between checkpoints through max_wal_size parameter

MySQL and MariaDB

MySQL and it's forks are pretty similar to PostgreSQL in available techniques and best practices [1]. MySQL also has COPY command [2], which is the fastest data load method, reducing a lot parsing and network overhead. MySQL also advises to disable indexes, unique and FK constraints. But as MySQL is index-organized database and has key-value MyISAM engine, there are two unique points:

MyISAM first saves KV pairs in separate memory-optimized buffer base on red-black tree; once full, it is flushed to disk in a series of sequential writes [3]
Data load in PK order is much faster than in arbitrary order (Oracle has the same recommendation for secondary indexes)

[1] https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html
[2] https://dev.mysql.com/doc/refman/8.0/en/load-data.html
[3] https://dev.mysql.com/doc/internals/en/bulk-insert.html

Proposed Changes

Implement distributed table (cache) lock - when it is held no other operation on cache is possible, PME infrastructure will be used for that
Implement external sort - get some memory buffer, collect there values in some order, flush to disk if needed, merge in the end
Implement direct data load - write data to data blocks, re-created indexes with external sort and other necessary data structures in the end
Add necessary API to control direct data load mode

Tickets

key	summary	type	created	updated	due	assignee	reporter	priority	status	resolution
JQL and issue key arguments for this macro require at least one Jira application link to be configured

Page tree

IEP-22: Direct Data Load