...
...
The following file types are used for persisting data: cache pages (the page store), checkpoint markers, and WAL segments.
...
Ignite with persistence enabled uses the following folder structure:
2.3+ | Older versions (2.1 & 2.2) | ||
---|---|---|---|
The Pst-subfolder name is the same for all storage folders. The name is selected on start and may be based on the node consistentId. |
|
The subfolder name is generated on start. By default new style naming is used, for example node00-e819f611-3fb9-4dbe-a3aa-1f6de4af5d02
...
PST subfolder naming options explained:
...
1) A starting node binds to a port and generates an old-style compatible consistent ID (e.g. 127.0.0.1:47500) using DiscoverySpi.consistentId(). This method still returns an ip:port-based identifier.
2) The node scans the work directory and checks whether there is a folder matching the consistent ID (e.g. work\db\127_0_0_1_49999). If such a folder exists, the node starts up with this ID (compatibility mode) and acquires a file lock on this folder. See PdsConsistentIdProcessor.prepareNewSettings.
3) If there are no matching folders, but the directory is not empty, the node scans it for old-style consistent IDs. If there are old-style db folders, it prints out a warning (see warning text above), then switches to new-style folder generation (step 4).
4) If there are existing new-style folders, the node picks the one with the smallest sequence number and tries to lock the directory. This is repeated until it succeeds or until the list of new-style consistent IDs is empty (e.g. work\db\node00-uuid, node01-uuid, etc).
5) If there are no more available new-style folders, the node generates a new one with the next sequence number and a random UUID as the consistent ID (e.g. work\db\node00-uuid, uuid overrides uuid in GridDiscoveryManager).
6) This consistent ID is used for the node startup (using the value from GridKernalContext.pdsFolderResolver() and from PdsFolderSettings.consistentId()).

There is a system property to disable new-style generation and use the old-style consistent ID (IgniteSystemProperties.IGNITE_DATA_STORAGE_FOLDER_BY_CONSISTENT_ID).
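The selection order described above can be illustrated with a small, self-contained sketch. It is not the Ignite implementation: the class `PdsFolderChooser`, its methods, the `tryLock` predicate (standing in for the file lock), and the rule that every non-alphanumeric character becomes `_` are all assumptions made for illustration. The warning from step 3 is omitted, and the lock on an old-style folder is assumed to succeed.

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;
import java.util.UUID;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of the PST subfolder choice; not the actual Ignite code. */
final class PdsFolderChooser {
    /** New-style name: "node" + two-digit sequence number + "-" + UUID. */
    private static final Pattern NEW_STYLE = Pattern.compile("node(\\d{2})-[0-9a-fA-F-]{36}");

    /** Assumption: an old-style folder name replaces each non-alphanumeric character with '_'. */
    static String oldStyleFolder(String consistentId) {
        return consistentId.replaceAll("[^A-Za-z0-9]", "_");
    }

    /**
     * @param existing  folder names found under the db directory
     * @param ipPortId  old-style consistent ID, e.g. "127.0.0.1:47500"
     * @param tryLock   stand-in for the file lock attempt (hypothetical)
     * @param freshUuid UUID to use if a new folder has to be generated
     */
    static String choose(Collection<String> existing, String ipPortId,
                         Predicate<String> tryLock, UUID freshUuid) {
        // Step 2: compatibility mode if a folder matching the ip:port ID exists.
        String oldStyle = oldStyleFolder(ipPortId);
        if (existing.contains(oldStyle))
            return oldStyle;

        // Step 4: try existing new-style folders in ascending sequence order.
        TreeMap<Integer, String> bySeq = new TreeMap<>();
        for (String name : existing) {
            Matcher m = NEW_STYLE.matcher(name);
            if (m.matches())
                bySeq.put(Integer.parseInt(m.group(1)), name);
        }
        for (String name : bySeq.values())
            if (tryLock.test(name))
                return name;

        // Step 5: nothing could be locked, so generate the next sequence number.
        int next = bySeq.isEmpty() ? 0 : bySeq.lastKey() + 1;
        return String.format("node%02d-%s", next, freshUuid);
    }

    public static void main(String[] args) {
        // node00 exists but cannot be locked, so a node01-<random uuid> name is printed.
        System.out.println(choose(
            List.of("node00-e819f611-3fb9-4dbe-a3aa-1f6de4af5d02"),
            "127.0.0.1:47500", name -> false, UUID.randomUUID()));
    }
}
```

Passing the lock attempt as a predicate keeps the sketch free of file system access, so the ordering logic can be checked in isolation.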
Ignite Durable Memory is the basis for all data structures. There is no cache state saved on heap now.
...
Collection of pages (GridCacheDatabaseSharedManager.Checkpoint#cpPages) is a snapshot of dirty pages at checkpoint start. This collection allows writing pages which were changed since the last checkpoint.
Info |
---|
When the checkpoint process starts, pages marked for checkpoint are no longer marked as dirty ones in metrics. |
In parallel with the process of writing pages to disk, some thread may need to update data in a page that is being written (or is scheduled to be written).
...
...
...
Ignite node chooses one of them adaptively.
There are two ways to find out that Pages Write Throttling affects data update operations.
Take a thread dump - some threads will be waiting at LockSupport#parkNanos with "throttle" classes in a trace. Example stacktrace:
Code Block |
---|
"data-streamer-stripe-4-#14%pagemem.PagesWriteThrottleSandboxTest0%@2035" prio=5 tid=0x1e nid=NA waiting java.lang.Thread.State: WAITING at sun.misc.Unsafe.park(Unsafe.java:-1) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:232) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:220) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:463) at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.allocateDataPage(AbstractFreeList.java:463) at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:501) at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:102) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1300) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:1438) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:4338) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:4296) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3051) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$6200(BPlusTree.java:2945) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1717) ... |
If throttling is applied, the related statistics are dumped to the log from time to time:
Code Block |
---|
[2018-03-29 21:36:28,581][INFO ][data-streamer-stripe-0-#10%pagemem.PagesWriteThrottleSandboxTest0%][PageMemoryImpl] Throttling is applied to page modifications [percentOfPartTime=0,92, markDirty=9905 pages/sec, checkpointWrite=6983 pages/sec, estIdealMarkDirty=41447 pages/sec, curDirty=0,07, maxDirty=0,26, avgParkTime=741864 ns, pages: (total=169883, evicted=0, written=112286, synced=0, cpBufUsed=15936, cpBufTotal=241312)] |
The most relevant part of this message is the percentOfPartTime metric. In the example it is 0.92: writing threads are stuck in LockSupport.parkNanos() for 92% of the time, which means very heavy throttling.
The message appears in the log when percentOfPartTime reaches the 20% threshold.
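When scanning logs for throttling, it may help to extract this metric automatically. The following is a minimal sketch, not part of Ignite: the class `ThrottleLogParser` and its method are hypothetical, and the only thing taken from the log format is the `percentOfPartTime=` key. The sample message above prints the value with a decimal comma, so the sketch accepts both `,` and `.` as the separator.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that pulls percentOfPartTime out of a throttling log line. */
final class ThrottleLogParser {
    /** Integer part, a ',' or '.' separator, then the fractional part. */
    private static final Pattern PARK_TIME = Pattern.compile("percentOfPartTime=(\\d+)[.,](\\d+)");

    /** Returns the fraction of time spent parked (0.92 means 92%), or empty if the metric is absent. */
    static OptionalDouble parkTimeFraction(String logLine) {
        Matcher m = PARK_TIME.matcher(logLine);
        if (!m.find())
            return OptionalDouble.empty();
        // Rebuild the number with '.' so that parsing does not depend on the locale.
        return OptionalDouble.of(Double.parseDouble(m.group(1) + "." + m.group(2)));
    }

    public static void main(String[] args) {
        String line = "Throttling is applied to page modifications "
            + "[percentOfPartTime=0,92, markDirty=9905 pages/sec]";
        System.out.println(parkTimeFraction(line));
    }
}
```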
...
Planned future optimisation: let PageDeltaRecord refer to the modified data in the logical record instead of carrying it. This would avoid storing the updated bytes twice. There is a file WAL pointer, which points to a record from the beginning of time. This reference may be used.
The WAL consists of segments (files). Some of the segments form the work directory, and the files there are cyclically overwritten. The other part is the archive: its files are numbered sequentially, and old files are deleted.
The WAL file segments and rotation structure are shown in the picture below:
A number of segments may no longer be needed (depending on the History Size setting). The old-fashioned WAL history size setting is expressed as a number of checkpoints (see also the WAL history size section below); the new one is set in bytes. The history size setting is described here: https://apacheignite.readme.io/docs/write-ahead-log#section-wal-archive
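The difference between the cyclically reused work files and the sequentially numbered archive files can be sketched as follows. This is an illustration under assumptions, not the Ignite code: the class `WalSegmentNames`, the choice of 10 work segments, and the 16-digit zero-padded `.wal` file name are all assumed here and should be checked against the actual version in use.

```java
/** Hypothetical model of WAL segment numbering; names and formats are assumptions. */
final class WalSegmentNames {
    /** Work files are reused cyclically: the absolute segment index wraps around. */
    static long workFileIndex(long absoluteIdx, int workSegments) {
        return absoluteIdx % workSegments;
    }

    /** Assumed file name format: index padded to 16 digits plus the ".wal" extension. */
    static String fileName(long idx) {
        return String.format("%016d.wal", idx);
    }

    public static void main(String[] args) {
        int workSegments = 10; // assumed number of files in the work directory
        long absoluteIdx = 23; // 24th segment written since the beginning of time

        // The work directory file is overwritten in a cycle...
        System.out.println("work:    " + fileName(workFileIndex(absoluteIdx, workSegments)));
        // ...while the archive keeps the absolute, ever-growing index.
        System.out.println("archive: " + fileName(absoluteIdx));
    }
}
```

With these assumptions, segment 23 is written to the work file with index 3 and later archived under its absolute index 23.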
Let’s assume the node start process is running with existing files.
...
There are several levels of guarantees (WALMode):
WALMode | Implementation | Guarantees |
---|---|---|
FSYNC | fsync() on each commit | Survives any crash (OS or process crash) |
LOG_ONLY | write() on commit; synchronisation is the responsibility of the OS | Survives a process kill, but not an OS failure |
BACKGROUND | Nothing is done on commit (records are accumulated in memory); write() on timeout | kill -9 may cause the loss of several latest updates |
NONE | WAL is disabled | Data is persisted only in case of a graceful cluster shutdown (Ignite.cluster().active(false)) |
But there are several nodes containing the same data, so it is possible to restore data from other nodes.
...
The partition update counter is saved with update records in the WAL.
...
Consider a partition that was in the owning state on the joining node, with update counter = 50. Existing nodes have update counter = 150.
The node join causes a partition map exchange, and the update counter is sent with other partition data. (The joining node will have a new ID, and from the point of view of discovery this node is a new node.)
The coordinator observes the older partition state and forces the partition to the moving state. Forcing the moving state is required to set up the loading of newer data.
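The coordinator's decision in this example can be modelled with a short sketch. It is a simplification under assumptions, not the Ignite exchange logic: the class `PartitionCounterCheck`, the `State` enum, and the node names are hypothetical, and the only rule modelled is that a node whose update counter lags behind the highest known counter is forced to the moving state.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the counter comparison done for one partition. */
final class PartitionCounterCheck {
    enum State { OWNING, MOVING }

    /**
     * @param counters update counter of the partition per node (must not be empty)
     * @return state per node: nodes behind the highest counter are forced to MOVING
     */
    static Map<String, State> resolve(Map<String, Long> counters) {
        long max = Collections.max(counters.values());
        Map<String, State> res = new LinkedHashMap<>();
        counters.forEach((node, cntr) -> res.put(node, cntr < max ? State.MOVING : State.OWNING));
        return res;
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("joining", 50L);    // stale copy restored from disk
        counters.put("existingA", 150L); // up-to-date owners
        counters.put("existingB", 150L);

        // The joining node ends up MOVING, the existing nodes stay OWNING.
        System.out.println(resolve(counters));
    }
}
```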
...
Page size configuration for storage path [/work/db/node00-3a1415b8-aa54-4a63-a40a-c75ad48dd6b8]: 4096; Linux memory page size: 4096; Selected FS block size : 4096.
Direct IO is enabled for block IO operations on aligned memory structures. [block size = 4096, durable memory page size = 4096]
However, the plugin’s functionality can be disabled through a system property. To disable Direct IO, set IgniteSystemProperties#IGNITE_DIRECT_IO_ENABLED to false.
...