You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 7 Next »

IDIEP-22
AuthorVladimir Ozerov Ozerov
SponsorVladimir Ozerov Ozerov
Created15 Jun 2018
StatusDRAFT


Motivation

Initial data load is one of the most frequent and important use cases for Apache Ignite. At the moment the fastest way to load data is data streamer, which relies on asynchronous messaging and fast-path entry update method [1]. But all other internal mechanics of cache and page memory works as usual during data load.

General approach employed by major database vendors is to disable and/or skip as much internals as possible during data load, and to employ alternative sorting methods for indexes. With this idea in mind, we have potential to improve data loading speed in several times as follows:

  • Add exclusive cache access mode, when only data loading process is able to interact with cache
  • Load data directly to data pages, skipping page buffer and free lists
  • Then rebuild indexes bottom-up using external sort algorithm

Speedup is expected from less number of IO operations and and less locking page/entry overhead.

[1] https://github.com/apache/ignite/blob/ignite-2.5/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheMapEntry.java#L2691

Competitive Analysis

Most industrial vendors rely on the following assumptions:

  • Sequential IO is faster than random
  • Maintenance of internal database data structures is expensive, it is better to rebuild them from scratch in bulks when data load is finished
  • Acquiring of per-entry locks is expensive, so it is better to escalate lock to upper level (typically - table)
  • Crash recovery of data being loaded is typically not needed

Oracle

Oracle offers "direct path load" [1] as the fastest method to populate data. Direct path load can be from both command line utility SQL*Loader to load existing files (resembling Ignite's COPY command), and with conventional SQL, such as {{INSERT}} and {{CREATE TABLE AS SELECT}} (CTAS).

[1] https://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_modes.htm#SUTIL009
[2] https://docs.oracle.com/en/database/oracle/oracle-database/18/admin/managing-tables.html#GUID-134D6EB6-219E-4820-AB54-8C60067A8F0F

PostgreSQL

TODO

MySQL and MariaDB

TODO

Proposed Changes

TODO

Tickets

key summary type created updated due assignee reporter priority status resolution

JQL and issue key arguments for this macro require at least one Jira application link to be configured

  • No labels