ID: IEP-74
Author:
Sponsor:
Created:
Status: DRAFT

Motivation

In Ignite 2.x, data storage is tightly coupled to the off-heap implementations known as Page Memory, which are used in both in-memory and persistent modes. These implementations work well for some workloads but are not optimal for others. For example, in persistent mode Page Memory suffers from write amplification, and both implementations use a B+Tree as the primary key index, which has limitations compared to a hash index.

In Ignite 3 we want to provide more flexibility for end users to choose the best storage implementation for their use cases. To achieve this we need to define a minimal API and hide all implementation details about particular storage implementations behind it.

Description

Data Storage API

As a starting point, I suggest defining the following interfaces (they will evolve as other components start integrating with them):

Storage API
/** Interface providing methods to read, remove and update keys in storage. */
public interface Storage {
	/** Reads a DataRow for a given Key. */
	public DataRow read(Key key);

	/** Removes DataRow associated with a given Key. */
	public void remove(Key key);

	/** Executes an update with custom logic implemented by UpdateClosure interface. */
	public void update(Key key, UpdateClosure clo);

	/** Obtains Iterator over some DataRows in storage. */
	public Iterator<DataRow> iterator(/* parameters */);
}

The proposed API is based on the IgniteCacheOffheapManager interface from Ignite 2.x, significantly simplified and stripped of Ignite 2.x concepts such as partitions, caches and data stores.
However, it keeps useful ideas such as KeyCacheObject (in the form of the Key class, whose shape and responsibilities are still subject to discussion and clarification), CacheDataRow (in the form of DataRow), and InvokeClosure, which is needed to implement the read-modify-write pattern efficiently (in the form of UpdateClosure).

As a first step, this interface looks sufficient for the needs of existing components.
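To make the read-modify-write pattern concrete, here is a minimal in-memory sketch of the interface. The shapes of Key, DataRow and UpdateClosure are not defined in this proposal, so the versions below are assumptions made only for illustration: Key wraps a byte array, DataRow is a key plus an opaque value, and UpdateClosure maps the current row (or null) to the new row (or null to remove it). The iterator takes no parameters here, and MapStorage and StorageSketch are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical key: a byte wrapper with value-based equality. */
final class Key {
    private final byte[] bytes;

    Key(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }

    @Override public int hashCode() { return Arrays.hashCode(bytes); }
}

/** Hypothetical row: the key plus an opaque value. */
final class DataRow {
    final Key key;
    final byte[] value;

    DataRow(Key key, byte[] value) { this.key = key; this.value = value; }
}

/** Assumed shape: maps the current row (null if absent) to the new row (null to remove). */
interface UpdateClosure {
    DataRow apply(DataRow current);
}

interface Storage {
    DataRow read(Key key);
    void remove(Key key);
    void update(Key key, UpdateClosure clo);
    Iterator<DataRow> iterator();
}

/** Toy implementation backed by a concurrent map. */
final class MapStorage implements Storage {
    private final ConcurrentMap<Key, DataRow> rows = new ConcurrentHashMap<>();

    @Override public DataRow read(Key key) { return rows.get(key); }

    @Override public void remove(Key key) { rows.remove(key); }

    @Override public void update(Key key, UpdateClosure clo) {
        // compute() applies the closure atomically for the key;
        // a null result removes the mapping.
        rows.compute(key, (k, current) -> clo.apply(current));
    }

    @Override public Iterator<DataRow> iterator() { return rows.values().iterator(); }
}

final class StorageSketch {
    /** Read-modify-write closure: increments a long counter, starting from 1. */
    static UpdateClosure increment(Key key) {
        return current -> {
            long next = (current == null ? 0L : counter(current)) + 1;
            return new DataRow(key, ByteBuffer.allocate(Long.BYTES).putLong(next).array());
        };
    }

    static long counter(DataRow row) { return ByteBuffer.wrap(row.value).getLong(); }

    public static void main(String[] args) {
        Storage storage = new MapStorage();
        Key key = new Key(new byte[] {1});
        storage.update(key, increment(key));
        storage.update(key, increment(key));
        System.out.println(counter(storage.read(key))); // prints 2
    }
}
```

Passing the closure into the storage lets the implementation decide how to make the read and the write atomic, so callers do not need a separate read followed by a write.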


Implementing transactions requires a LockManager that can lock and unlock particular keys in Storage. It may look as simple as the interface below, but it will be refined once the transaction protocol is defined:

LockManager API
/** Interface enabling obtaining locks on Keys in Storage. */
public interface LockManager {
	/** Locks given Key. */
	public void lock(Key key);

	/** Unlocks given Key. */
	public void unlock(Key key);
}
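As an illustration only, the interface could be backed by one ReentrantLock per key. This sketch assumes that locks are owned by threads; the actual transaction protocol is not yet defined and may well tie ownership to transactions instead. The Key class is a hypothetical stand-in, SimpleLockManager and its isLocked helper are not part of the proposal, and lock objects are never evicted from the map.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical key: a byte wrapper with value-based equality. */
final class Key {
    private final byte[] bytes;

    Key(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }

    @Override public int hashCode() { return Arrays.hashCode(bytes); }
}

interface LockManager {
    void lock(Key key);
    void unlock(Key key);
}

/** Toy implementation: one thread-owned lock per key, never cleaned up. */
final class SimpleLockManager implements LockManager {
    private final ConcurrentMap<Key, ReentrantLock> locks = new ConcurrentHashMap<>();

    @Override public void lock(Key key) {
        // Create the lock on first use, then block until it is acquired.
        locks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
    }

    @Override public void unlock(Key key) {
        ReentrantLock lock = locks.get(key);
        if (lock == null)
            throw new IllegalMonitorStateException("Key was never locked");
        // Throws IllegalMonitorStateException if the caller is not the owner.
        lock.unlock();
    }

    /** Helper for the demo: true if some thread currently holds the key. */
    boolean isLocked(Key key) {
        ReentrantLock lock = locks.get(key);
        return lock != null && lock.isLocked();
    }
}

final class LockManagerSketch {
    public static void main(String[] args) {
        SimpleLockManager manager = new SimpleLockManager();
        Key key = new Key(new byte[] {42});

        manager.lock(key);
        try {
            System.out.println("locked: " + manager.isLocked(key));
        } finally {
            manager.unlock(key); // always release in finally
        }
        System.out.println("locked: " + manager.isLocked(key));
    }
}
```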


Index Storage API

SortedInternalStore
public interface SortedInternalStore {
    /** Put row to index. */
    void put(Row r);

    /** Remove row from index. */
    void remove(Row r);

    /**
     * Returns rows between the lower and upper bounds.
     * Result rows are filled only with the fields specified in the projection set.
     *
     * @param low Lower bound of the scan.
     * @param up Upper bound of the scan.
     * @param opts Scan bound option.
     * @param proj Set of column IDs used to fill the result rows.
     */
    Cursor<Row> scan(Row low, Row up, ScanBoundOption opts, BitSet proj);

    enum ScanBoundOption {
        /** Rows equal to either bound are included in the results. */
        INCLUDE_INCLUDE,

        /** Rows equal to the lower bound are included in the results; rows equal to the upper bound are excluded. */
        INCLUDE_EXCLUDE,

        /** Rows equal to the lower bound are excluded from the results; rows equal to the upper bound are included. */
        EXCLUDE_INCLUDE,

        /** Rows equal to either bound are excluded from the results. */
        EXCLUDE_EXCLUDE
    }
}
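The bound options and the projection set can be illustrated with a small sketch built on java.util.TreeMap. Row and Cursor are not defined in this proposal, so several assumptions are made here: Row is an array of column values whose column 0 is an Integer index key, the index is unique (one row per key), and scan returns a plain Iterator instead of a Cursor. TreeSortedStore is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical row: column 0 is the indexed Integer column in this sketch. */
final class Row {
    final Object[] cols;

    Row(Object... cols) { this.cols = cols; }

    int indexKey() { return (Integer) cols[0]; }
}

/** Toy sorted index over a TreeMap; assumes a unique index key. */
final class TreeSortedStore {
    enum ScanBoundOption { INCLUDE_INCLUDE, INCLUDE_EXCLUDE, EXCLUDE_INCLUDE, EXCLUDE_EXCLUDE }

    private final NavigableMap<Integer, Row> index = new TreeMap<>();

    void put(Row r) { index.put(r.indexKey(), r); }

    void remove(Row r) { index.remove(r.indexKey()); }

    Iterator<Row> scan(Row low, Row up, ScanBoundOption opts, BitSet proj) {
        // Translate the option into the two inclusive flags of subMap().
        boolean lowIncl = opts == ScanBoundOption.INCLUDE_INCLUDE
            || opts == ScanBoundOption.INCLUDE_EXCLUDE;
        boolean upIncl = opts == ScanBoundOption.INCLUDE_INCLUDE
            || opts == ScanBoundOption.EXCLUDE_INCLUDE;

        List<Row> result = new ArrayList<>();
        for (Row r : index.subMap(low.indexKey(), lowIncl, up.indexKey(), upIncl).values())
            result.add(project(r, proj));
        return result.iterator();
    }

    /** Copies only the columns whose IDs are set in proj; the rest stay null. */
    private static Row project(Row r, BitSet proj) {
        Object[] out = new Object[r.cols.length];
        for (int i = proj.nextSetBit(0); i >= 0 && i < out.length; i = proj.nextSetBit(i + 1))
            out[i] = r.cols[i];
        return new Row(out);
    }
}

final class SortedStoreSketch {
    public static void main(String[] args) {
        TreeSortedStore store = new TreeSortedStore();
        for (int i = 1; i <= 5; i++)
            store.put(new Row(i, "value-" + i));

        BitSet keyOnly = new BitSet();
        keyOnly.set(0); // project only column 0

        Iterator<Row> it = store.scan(new Row(2), new Row(4),
            TreeSortedStore.ScanBoundOption.INCLUDE_EXCLUDE, keyOnly);
        while (it.hasNext())
            System.out.println(it.next().cols[0]);
    }
}
```

With INCLUDE_EXCLUDE and bounds 2 and 4, the scan visits keys 2 and 3, and the second column of each result row is left null because it is not in the projection set.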

Modules structure

Because we aim to offer more than one storage option, the storage API must be kept separate from its implementations: one module will contain the API, and each specific implementation will live in its own module.

Possible implementations

PageMemory code base migration (persistent)

Page Memory from Ignite 2.x is a good candidate for a persistent storage implementation in 3.0, although some refactoring and modifications will be needed during the migration.

Refactoring ideas are covered in depth in [1]; here is a short list of the proposed changes:

  1. WAL logging is eliminated, because the Storage component should integrate with the new replication component (IEP-61) and reuse its replication log as an analogue of the WAL from 2.x.

  2. The single physical partition file on disk is replaced with a set of files: a main "partition" file and several smaller files containing incremental modifications to the main file. This will simplify the checkpointing algorithm and help eliminate the binary recovery phase.
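The second item can be pictured with a toy in-memory model. It is not the design described in [1]: it only assumes that a page read consults the incremental files from newest to oldest before falling back to the main file, and that the incremental files can later be merged into the main file. Files are modelled as maps from page ID to page content, and PartitionFiles with all of its methods is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model: one main file plus a stack of incremental (delta) files. */
final class PartitionFiles {
    private final Map<Long, String> main = new HashMap<>();

    /** Delta files, newest first. */
    private final Deque<Map<Long, String>> deltas = new ArrayDeque<>();

    /** Models a checkpoint: dirty pages go to a new delta file, the main file is untouched. */
    void writeDelta(Map<Long, String> dirtyPages) {
        deltas.addFirst(new HashMap<>(dirtyPages));
    }

    /** Returns the latest version of the page, or null if it was never written. */
    String readPage(long pageId) {
        for (Map<Long, String> delta : deltas) { // iterates newest to oldest
            String page = delta.get(pageId);
            if (page != null)
                return page;
        }
        return main.get(pageId);
    }

    /** Merges deltas into the main file, oldest first, so newer versions win. */
    void compact() {
        while (!deltas.isEmpty())
            main.putAll(deltas.removeLast());
    }

    int deltaCount() { return deltas.size(); }
}

final class PartitionFilesSketch {
    public static void main(String[] args) {
        PartitionFiles files = new PartitionFiles();
        files.writeDelta(Collections.singletonMap(1L, "page-1-v1"));
        files.writeDelta(Collections.singletonMap(1L, "page-1-v2"));

        System.out.println(files.readPage(1L)); // the newest delta wins
        files.compact();
        System.out.println(files.deltaCount() + " delta files after compaction");
    }
}
```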

RocksDB

RocksDB [2] is an LSM-based key-value database that should provide better performance on write-intensive workloads and should be easy to integrate for a quick start.

In-memory storages

For in-memory storage we can consider both on-heap and off-heap solutions, including a port of Page Memory without persistence support.

Risks and Assumptions

//NA

Discussion Links

//NA

Reference Links

  1. https://github.com/apache/ignite-3/blob/ignite-14647/modules/vault/README.md
  2. https://rocksdb.org/