ID: IEP-74
Author
Sponsor
Created

 

Status: DRAFT

Motivation

Data storage in Ignite 2.x is tightly coupled to the off-heap implementations called Page Memory, which are used in both in-memory and persistent modes. These implementations work well for some workloads but are not optimal for others. For example, in persistent mode Page Memory suffers from write amplification, and both implementations use a B+Tree as the primary key index, which has limitations compared to a hash index.

In Ignite 3 we want to give end users the flexibility to choose the storage implementation that best fits their use case. To achieve this, we need to define a minimal API and hide the details of each particular storage implementation behind it.

Description

Data Storage API

As a starting point, I suggest defining the following interfaces (they will, of course, evolve as other components start integrating with them):

Storage API
import java.util.Iterator;

/** Interface providing methods to read, remove and update keys in storage. */
public interface Storage {
	/** Reads a DataRow for a given Key. */
	public DataRow read(Key key);

	/** Removes DataRow associated with a given Key. */
	public void remove(Key key);

	/** Executes an update with custom logic implemented by UpdateClosure interface. */
	public void update(Key key, UpdateClosure clo);

	/** Obtains Iterator over some DataRows in storage. */
	public Iterator<DataRow> iterator(/* parameters */);
}

The proposed API is based on the IgniteCacheOffheapManager interface from Ignite 2.x, significantly simplified and stripped of Ignite 2.x concepts such as partitions, caches and data stores.
However, it keeps useful ideas such as KeyCacheObject (in the form of the Key class, whose shape and responsibilities are still subject to discussion and clarification), CacheDataRow (in the form of DataRow) and InvokeClosure, which is needed to implement the read-modify-write pattern efficiently (in the form of UpdateClosure).

For a start, this interface looks sufficient for the needs of existing components.
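To illustrate the read-modify-write pattern mentioned above, here is a minimal on-heap sketch. The shapes of Key, DataRow and UpdateClosure are not defined in this proposal, so the versions below are hypothetical stand-ins: the closure is assumed to receive the old row (or null) and return the new row (or null to remove it), and iterator() is shown without parameters. HeapStorage is an illustrative name, not a proposed implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical key: a byte-array wrapper with value-based equality. */
final class Key {
    private final byte[] bytes;

    Key(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }

    @Override public int hashCode() { return Arrays.hashCode(bytes); }
}

/** Hypothetical row: a key plus a single numeric value. */
final class DataRow {
    final Key key;
    final long value;

    DataRow(Key key, long value) { this.key = key; this.value = value; }
}

/** Assumed closure shape: old row (or null) in, new row (or null to remove) out. */
interface UpdateClosure {
    DataRow apply(DataRow oldRow);
}

/** Trimmed copy of the proposed interface; iterator parameters are omitted here. */
interface Storage {
    DataRow read(Key key);
    void remove(Key key);
    void update(Key key, UpdateClosure clo);
    Iterator<DataRow> iterator();
}

/** On-heap sketch backed by a concurrent map. */
final class HeapStorage implements Storage {
    private final ConcurrentMap<Key, DataRow> rows = new ConcurrentHashMap<>();

    @Override public DataRow read(Key key) { return rows.get(key); }

    @Override public void remove(Key key) { rows.remove(key); }

    @Override public void update(Key key, UpdateClosure clo) {
        // compute() applies the closure atomically for the key;
        // a null result removes the mapping.
        rows.compute(key, (k, old) -> clo.apply(old));
    }

    @Override public Iterator<DataRow> iterator() { return rows.values().iterator(); }
}

final class StorageSketchDemo {
    public static void main(String[] args) {
        Storage storage = new HeapStorage();
        Key key = new Key(new byte[] {1});
        UpdateClosure increment = old -> new DataRow(key, old == null ? 1 : old.value + 1);

        storage.update(key, increment);
        storage.update(key, increment);

        System.out.println(storage.read(key).value); // prints 2
    }
}
```

The point of the closure is that the storage performs the read and the write in a single step, so callers do not need a separate read followed by a write.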


The implementation of transactions needs a LockManager that can lock and unlock particular keys in Storage. It may look as simple as the interface below, but it will be clarified once the transaction protocol is defined:

LockManager API
/** Interface enabling obtaining locks on Keys in Storage. */
public interface LockManager {
	/** Locks given Key. */
	public void lock(Key key);

	/** Unlocks given Key. */
	public void unlock(Key key);
}
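As a rough illustration only, the interface above could be backed by one reentrant lock per key. In this sketch the Key type is replaced by a type parameter K to keep the example self-contained, and HeapLockManager is a hypothetical name. The sketch ignores everything a real transaction protocol would need (lock modes, timeouts, deadlock handling), and it never evicts unused locks.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Same shape as the proposed LockManager, with Key replaced by a type parameter. */
interface LockManager<K> {
    void lock(K key);
    void unlock(K key);
}

/** Sketch: one ReentrantLock per key, created lazily and never evicted. */
final class HeapLockManager<K> implements LockManager<K> {
    private final ConcurrentMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    @Override public void lock(K key) {
        // Blocks until the current thread owns the lock for this key.
        locks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
    }

    @Override public void unlock(K key) {
        ReentrantLock lock = locks.get(key);
        if (lock == null || !lock.isHeldByCurrentThread())
            throw new IllegalMonitorStateException("Key is not locked by the current thread: " + key);
        lock.unlock();
    }
}

final class LockManagerSketchDemo {
    public static void main(String[] args) {
        LockManager<String> lockManager = new HeapLockManager<>();

        lockManager.lock("account-1");
        try {
            // Critical section: read and update the row for "account-1".
            System.out.println("holding lock");
        } finally {
            lockManager.unlock("account-1");
        }
    }
}
```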


Index Storage API

SortedInternalStore
public interface SortedInternalStore {
    /** Exclude lower bound. */
    byte GREATER = 0;

    /** Include lower bound. */
    byte GREATER_OR_EQUAL = 1;

    /** Exclude upper bound. */
    byte LESS = 0;

    /** Include upper bound. */
    byte LESS_OR_EQUAL = 1 << 1;

    /**
     * Updates the row in the index if needed:
     * <ul>
     *     <li>put operation if the {@code oldR} is {@code null},</li>
     *     <li>remove operation if the {@code newR} is {@code null},</li>
     *     <li>update operation otherwise.</li>
     * </ul>
     */
    void update(Row oldR, Row newR);

    /**
     * Returns rows between the lower and upper bounds.
     * Fills the result rows with the fields specified in the projection set.
     *
     * @param low Lower bound of the scan.
     * @param up Upper bound of the scan.
     * @param scanBoundMask Scan bound mask (specifies how to treat rows equal to the bounds: include or exclude).
     * @param proj Set of column IDs used to fill the result rows.
     */
    Cursor<Row> scan(Row low, Row up, byte scanBoundMask, BitSet proj);
}
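The scan bound mask combines one lower-bound flag with one upper-bound flag. A possible way to decode it is sketched below on top of a TreeMap. This is an assumption about how the mask is meant to be interpreted, not part of the proposal: rows are reduced to an integer key and a string payload, and the projection and unbounded (null) bounds are left out. SortedStoreSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SortedStoreSketch {
    /** Same flag values as in SortedInternalStore. */
    static final byte GREATER = 0;
    static final byte GREATER_OR_EQUAL = 1;
    static final byte LESS = 0;
    static final byte LESS_OR_EQUAL = 1 << 1;

    /** Simplified index: integer key mapped to a row payload. */
    private final NavigableMap<Integer, String> index = new TreeMap<>();

    void put(int key, String row) { index.put(key, row); }

    /** Returns rows between the bounds; the mask decides whether each bound is inclusive. */
    List<String> scan(int low, int up, byte scanBoundMask) {
        boolean lowInclusive = (scanBoundMask & GREATER_OR_EQUAL) != 0;
        boolean upInclusive = (scanBoundMask & LESS_OR_EQUAL) != 0;

        return new ArrayList<>(index.subMap(low, lowInclusive, up, upInclusive).values());
    }
}

final class SortedStoreSketchDemo {
    public static void main(String[] args) {
        SortedStoreSketch store = new SortedStoreSketch();
        for (int i = 1; i <= 5; i++)
            store.put(i, "row" + i);

        byte closed = (byte) (SortedStoreSketch.GREATER_OR_EQUAL | SortedStoreSketch.LESS_OR_EQUAL);
        byte open = (byte) (SortedStoreSketch.GREATER | SortedStoreSketch.LESS);

        System.out.println(store.scan(2, 4, closed)); // prints [row2, row3, row4]
        System.out.println(store.scan(2, 4, open));   // prints [row3]
    }
}
```

Because GREATER and LESS are both 0, an exclusive bound is expressed simply by the absence of the corresponding inclusive bit.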

Modules structure

As we aim to offer more than one storage option, we need to keep the storage API separate from its implementations. We will therefore have one module containing the API and a separate module for each specific implementation.

Possible implementations

PageMemory code base migration (persistent)

Page Memory from Ignite 2.x is a good candidate for a persistent storage implementation in 3.0, although some refactoring and modifications will be needed during the migration.

The refactoring ideas are covered in depth in [1]; here is a short list of the proposed changes:

  1. WAL logging is eliminated, as the Storage component should integrate with the new replication component (IEP-61) and reuse its replication log as an analogue of the WAL from 2.x.

  2. The single physical partition file on disk is replaced with a set of files: a main "partition" file and several smaller files with incremental modifications to the main file. This will simplify the checkpointing algorithm and also help to get rid of the binary recovery phase.
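The second item can be pictured with a small in-memory model. This is only one possible reading of the idea, sketched under several assumptions that the proposal does not state: each checkpoint writes its dirty pages into a new delta file, reads consult delta files from newest to oldest before falling back to the main file, and a background compaction folds deltas into the main file. All names (PartitionSketch, checkpoint, readPage, compact) are hypothetical, and "files" are modelled as maps from page ID to page content.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class PartitionSketch {
    /** Main "partition" file: page ID mapped to page content. */
    private final Map<Integer, String> mainFile = new HashMap<>();

    /** Delta files with incremental modifications, newest first. */
    private final Deque<Map<Integer, String>> deltaFiles = new ArrayDeque<>();

    /** Assumed checkpoint step: dirty pages go into a new delta file, the main file is untouched. */
    void checkpoint(Map<Integer, String> dirtyPages) {
        deltaFiles.addFirst(new HashMap<>(dirtyPages));
    }

    /** Reads the latest version of a page: newest delta first, then the main file. */
    String readPage(int pageId) {
        for (Map<Integer, String> delta : deltaFiles) {
            String page = delta.get(pageId);
            if (page != null)
                return page;
        }
        return mainFile.get(pageId);
    }

    /** Assumed compaction step: folds deltas into the main file, oldest first so newer pages win. */
    void compact() {
        while (!deltaFiles.isEmpty())
            mainFile.putAll(deltaFiles.pollLast());
    }

    int deltaCount() { return deltaFiles.size(); }
}

final class PartitionSketchDemo {
    public static void main(String[] args) {
        PartitionSketch partition = new PartitionSketch();

        partition.checkpoint(Map.of(1, "page1-v1", 2, "page2-v1"));
        partition.checkpoint(Map.of(1, "page1-v2"));

        System.out.println(partition.readPage(1)); // prints page1-v2
        System.out.println(partition.readPage(2)); // prints page2-v1

        partition.compact();
        System.out.println(partition.readPage(1)); // prints page1-v2
    }
}
```

In this model a checkpoint never rewrites pages of the main file in place, which is one way the main file could stay consistent without a separate binary recovery step.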

RocksDB

RocksDB [2] is an LSM-based key-value database that should provide better performance on write-intensive workloads and should be easy to integrate for a quick start.

In-memory storages

For in-memory storages we can consider both heap and off-heap solutions, including porting Page Memory without persistence support.

Risks and Assumptions

//NA

Discussion Links

//NA

Reference Links

  1. https://github.com/apache/ignite-3/blob/ignite-14647/modules/vault/README.md
  2. https://rocksdb.org/