Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Excerpt

This article covers the internal design of Durable Memory. Intended for Ignite developers

Table of Contents

...

.

The reasoning behind the durable memory is described in the main documentation: https://apacheignite.readme.io/docs/durable-memory

...

Table of Contents


Ignite Durable Memory

Let's split the memory into pages (synonyms: buffers, block, chunks). Let's consider the Page as a fundamental unit the whole memory is split into. This makes the memory addressing page-based.

...

B+Tree are used for SQL indexes: tree maps field value to reference to entry value.Segment

Region and segment structure

Page content is stored in RAM in segments. Actually memory region allocated in RAM is not continious sequence of pages. 

 Ignite uses a special component that manages information about pages currently available in memory segment (FullPageIdTable). There is one page ID table per memory segment chunk (UnsafeChunk, DirectMemoryRegion). For unsafe segment chunk sun.misc.Unsafe is used internally, which provides memory for off-heap data storage.

PageIdTable manages mapping from Page ID to relative pointer map (rowAddr) within unsafe segment chunk.

 

...

Each segment manages it's own lock. Lock protects segment content (page table) from concurrent modification. Splitting each data region (DataRegionConfiguration) to a number of segments allows to minimise contention on one lock for region (striping technique is used). By default segment count is determined by available CPUs, but may be configured by DataStorageConfiguration.setConcurrencyLevel().

Read lock is held to segment to resolve real unmanaged memory address from page ID (page pin). Write lock is held if segment pages are being modified (e.g. page is rotated from RAM to disk).

Page replacement (rotation with disk)

If durable memory operates with disk (native peristence is covered in Ignite Persistent Store - under the hood) then PageIdTable, paging is handled automatically by Ignite.

PageIdTable (FullPageIdTable) table is also used to check

  • if a page is actually available in memory
  • or has to be loaded from disk.

As a result, even SQL queries can be fully utilized from Ignite applications when only a subset of pages is in memory. In this case required pages will be taken from disk during query execution. 

If memory amount is less than whole size of B+tree for SQL index, Ignite still can operate. When Ignite runs out of RAM memory for the new pages allocation, some of the pages will be purged from RAM (as it can be found in disc). The eviction of pages is based on the algorithms described belowloaded from disk later).

This process is named page rotation (page replacement, swap to disk, historical naming - page eviction).

Let's suppose RAM memory is fully filled with pages, and it is required to allocate new. It is required to evict some page from memory

Image Removed

Ignite uses Eviction Policy to determine which page to select to be evicted.

.

Example of rotation of new page and clear page in RAM is shown in picture.

Image Added

Simplest algorithm would be selected for eviction selecting page to rotate is LRU, but it requires double linked list. It is not simple to implement such structure in off heap.

Algorithm used instead is Random-LRU (most recent access timestamp is stored for a data page). This algorihtm is used always if Persistent Data store mode enabled.  Ignite will adaptively page-in and page-out memory to disk in a fashion very similar to how modern operating systems handle virtual memory. 

Replacement (rotation) of pages Eviction is started only if memory segment is full and it is impossible to allocate page in segment chunk.

During first page eviction rotated warning message is printed to logs

...

If this message appeared in log it means segment is full and for allocation of new page it is required to purge some other page.

Page replacement Eviction may have negative influence to performance because in some cases it is possible that ignite Ignite continiously evicts page from RAM and some steps later reqiures this page data. In this case it is required to re-read data from disc.

Entry

...

Eviction

If Native Peristence is not used, then upcoming hit to memory limit requires Ignite to clean up some data (otherwise IgniteOutOfMemory may occur). In this case, users have to decide how to deal with out-of-memory situations by specifying their own eviction policy. Ignite will not magically page to disk and so users need to configure eviction. 

Ignite uses Eviction Policy to determine which page to select to be evicted. Policy is configured by user. This configuration is useful

...

Eviction is used not only in Ignite Persistent Store - mode. Same technique is required if Ignite is used as fast access cache with 3rd party DB as persistence . (such as write-behind to an RDBMS).


If eviction is configured In that case we need to remove one cache entry, but removing entry from the middle of page will cause pages fragmentation.

...

This method allows to clean up big continuous segment of memory (usually whole page)

Eviction policy only makes sense when native persistence is disabled because what it does is actually freeing up memory when a user hits the memory limit.

The only way to do this is to destroy inserted data because there is no other way to free memory.

Eviction mechanism, it is both per-page and per-entry:

  • first, we choose a page which fits most for eviction (however, we cannot simply erase the page because quite a lot of data structures are referring to that page).
  • second, we read keys that the chosen page contains and then clear individual cache entries, which in turn will clear up all necessary links.
     

If there are no concurrent updates, the page becomes empty and will be reused for other user data. 

Random-2-LRU

There is second option for eviction if Persitent Data store is not enabled. In that algorithm two most recent access timestamps are stored for every data page.

...

This policy solves case of one-time access of data, for example, one full scan query. Pages touched during running this query is not considered hot. See also documentation

Free lists

Ignite manages free lists to solve issue with fragmentation in pages (not full pages).

...

  • Consult marshaller about size in bytes of this value pair
  • Upper-round this value to be divisible by 8 bytes
  • Use value from previous step to get page list from free list
  • Select some page from appropriate list of free pages. This page will have required amount of free space

Long objects

If object is longer than page size, it will require several pages to store

...

In previous case (updated field value has same length) only one page will be marked as dirty.

Page structure

Page types and headers

There is class PageIO - it is abstract class for reading and writing pages. Several implementations also exist: BplusIODataPageIO

...

Link allows to read K,V pair as N-th item in page.

Value delete from Data Page

Deletion of lastest added item is trivial. We can just remove Item, and Key-Value pair without any additional changes (see It3, K,V3 at picture).

...

Example of re-insertion new K,V4 element after some other K,V2 deleted. Following picture shows Indirect Item to direct Item replacement

 

BPlus Tree Structure

B+Tree structure is build mostly the same as binary tree. If requied value was not found, it is compared with some value in the tree. Greater values can be found using rigth link, less - using left.

...

Hash Index is also B+ Tree (not hash table), key is hashcode and value is link.

Memory policy

Data Region Configuration

DataRegionConfiguration (historical naming is Memory Policy) is especially important when Ignite Persistent Store configuration is enabled.

Several caches in previous Ignite version may allocate uncontrolled amount of memory. In case of limited resources first cache which perfomed allocations wins. There was no way to limit this memory for particular cache.

In 2.0 Ignite version it is possible to control it using Memory Policy. One Memory Policy may using DataRegionConfiguration (before 2.3 named Memory Policy). One data region may include several caches (relation is 1.to.N).

All data may be separated now in the end-by user system to following classes: archive data and operational data:

User now can specify how much memory it is possible to allocate for cache or cache group.

Reference tables (dictionaries) are usually small, and may be assigned to be allocated to memory always.

...