...

A complex index rebuild procedure requires additional crash recovery guarantees. The rebuild starts immediately after the partition file is fully received from the supplier node. If the node crashes in the middle of the index rebuild, it will have an inconsistent index state on the next startup. To avoid this, a new index-undo WAL record must be logged during the rebuild and used on node start to remove the previously added index records.
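
The undo idea can be illustrated with a minimal in-memory sketch. Everything here is hypothetical: `IndexRebuildWithUndo`, its methods, the `TreeMap` standing in for a persistent index, and the list standing in for index-undo WAL records are illustrative names only, not Ignite classes or record formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of an index rebuild protected by undo records. */
final class IndexRebuildWithUndo {
    /** Stands in for the persistent index of the received partition. */
    final TreeMap<Integer, String> index = new TreeMap<>();

    /** Stands in for index-undo WAL records (assumed to survive a crash). */
    final List<Integer> undoLog = new ArrayList<>();

    /** Logs the undo record first, then adds the entry to the index. */
    void addDuringRebuild(int key, String link) {
        undoLog.add(key);
        index.put(key, link);
    }

    /** Called when the rebuild finishes: undo records are no longer needed. */
    void completeRebuild() {
        undoLog.clear();
    }

    /** Called on node start: removes entries added by an unfinished rebuild. */
    void recover() {
        for (int key : undoLog)
            index.remove(key);

        undoLog.clear();
    }
}
```

Logging the undo record before touching the index means that a crash between the two steps leaves at worst a record for a key that was never added, which `recover()` removes harmlessly.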

...

Historical rebalance

...

After the partition file is received, the historical rebalance must be initiated to load the remaining cache updates.
...
When the supplier node receives the cache partition file demand request, it sends the file over the CommunicationSpi. The checkpoint thread can update the cache partition file concurrently during its transmission. To guarantee file consistency, the Checkpointer must use the Copy-on-Write [3] technique and save a copy of each updated chunk into a temporary file.
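
A minimal sketch of the copy-on-write idea follows, assuming fixed-size chunks. The class `CowPartitionSnapshot` and its methods are hypothetical; a byte array stands in for the live partition file and a map stands in for the temporary file, so this only models the ordering of operations, not real file I/O.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of copy-on-write during file transmission. */
final class CowPartitionSnapshot {
    private final byte[] file;      // stands in for the live partition file
    private final int chunkSize;    // assumed fixed chunk size
    private final Map<Integer, byte[]> saved = new HashMap<>(); // stands in for the temporary file

    CowPartitionSnapshot(byte[] file, int chunkSize) {
        this.file = file;
        this.chunkSize = chunkSize;
    }

    /** Checkpointer side: preserve the original chunk before its first overwrite. */
    synchronized void write(int chunkIdx, byte[] data) {
        saved.computeIfAbsent(chunkIdx, this::readLive);
        System.arraycopy(data, 0, file, chunkIdx * chunkSize, data.length);
    }

    /** Sender side: prefer the preserved copy so the transmitted image stays consistent. */
    synchronized byte[] readForTransfer(int chunkIdx) {
        byte[] copy = saved.get(chunkIdx);
        return copy != null ? copy.clone() : readLive(chunkIdx);
    }

    /** Reads the current content of a chunk from the live file. */
    private byte[] readLive(int chunkIdx) {
        int off = chunkIdx * chunkSize;
        return Arrays.copyOfRange(file, off, Math.min(off + chunkSize, file.length));
    }
}
```

Only the first overwrite of a chunk saves a copy, so repeated checkpoint writes to the same chunk do not change what the sender sees.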

Apply partition on the fly

Catch-up temporary WAL

While the demander node is in the partition file transmission state, it must sequentially save all cache entries corresponding to the MOVING partition into a new temporary storage. These entries will be applied later, one by one, to the newly received cache partition file. While the storage is being read, all asynchronous operations are appended to its end until it has been fully read. A file-based FIFO approach is assumed for this process.

The temporary storage is chosen to be WAL-based. The storage must support the following:

An unlimited number of WAL files to store temporary data records;
Iteration over stored data records while an asynchronous writer thread inserts new records;
A WAL-per-partition approach;
Higher priority for write operations over read operations;
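
The file-based FIFO behaviour described above can be sketched as a single append-only file with length-prefixed records. This is an illustration under stated assumptions: `TempPartitionLog`, its record layout, and the use of one file per partition are hypothetical, and the sketch does not implement multiple WAL files or write-over-read priority.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical per-partition FIFO log; the record format is illustrative only. */
final class TempPartitionLog implements AutoCloseable {
    private final FileChannel ch;
    private long writePos; // end of the last fully written record
    private long readPos;  // start of the next unread record

    TempPartitionLog(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE,
            StandardOpenOption.READ, StandardOpenOption.WRITE);
        writePos = ch.size();
    }

    /** Appends one record as [int length][payload] to the end of the file. */
    synchronized void append(byte[] rec) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + rec.length);
        buf.putInt(rec.length).put(rec).flip();

        long pos = writePos;
        while (buf.hasRemaining())
            pos += ch.write(buf, pos);

        writePos = pos; // publish the record only after it is fully written
    }

    /** Returns the next record in FIFO order, or null if the reader has caught up. */
    synchronized byte[] poll() throws IOException {
        if (readPos >= writePos)
            return null;

        ByteBuffer len = ByteBuffer.allocate(4);
        readFully(len, readPos);

        byte[] rec = new byte[len.getInt(0)];
        readFully(ByteBuffer.wrap(rec), readPos + 4);

        readPos += 4 + rec.length;
        return rec;
    }

    private void readFully(ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0)
                throw new IOException("Unexpected end of file");
            pos += n;
        }
    }

    @Override public synchronized void close() throws IOException {
        ch.close();
    }
}
```

Records appended while the reader is draining the log are still returned by later `poll()` calls, so the reader finishes only once it has caught up with the writer.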

Expected problems to be solved

We must stop updating indexes on the demander when the data is ready to be transferred from the supplier node. Asynchronous cache updates on the demander must not cause index updates;
The previous partition metadata page and all stored meta information must be destroyed in PageMemory and restored from the new partition file;

Rebuild indexes

The node is ready to become the partition owner when the partition data is rebalanced and the cache indexes are ready. With the message-based cluster rebalancing approach, indexes are rebuilt simultaneously with cache data loading. With the file-based rebalancing approach, the index rebuild procedure must finish before the partition state is set to OWNING.

Failover and Recovery

Ignite doesn't provide any recovery guarantees for the partitions with the MOVING state. The cache partitions will be fully loaded when the next rebalance procedure occurs.

FAIL\LEFT during rebalancing

The node being rebalanced has left the cluster. For such nodes WAL is always disabled (all partitions are in the MOVING state because the node is new to the cluster and has no cache data).
Since WAL is disabled, all operations on loaded partition files (renaming partition files, applying async updates) are safe, because the cache directory will be fully dropped on recovery.

Topology change

Each topology change event (JOIN/LEFT/FAILED) may or may not change the cache affinity assignments of the caches currently being rebalanced. If the assignments are not changed and the node still needs the partitions being rebalanced, we can continue the current rebalance process (see IGNITE-7165 for details).
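
The continuation check can be sketched as a comparison of the old and new assignments for the partitions in flight. The class `RebalanceContinuation`, the method `canContinue`, and the representation of affinity as a map from partition id to a list of node ids are all assumptions made for illustration; they do not reflect the actual Ignite affinity API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that decides whether an in-flight rebalance may continue. */
final class RebalanceContinuation {
    /**
     * Returns true if, for every partition being rebalanced, the assignment is
     * unchanged and still contains the local node.
     */
    static boolean canContinue(Set<Integer> rebalancing,
        Map<Integer, List<String>> oldAff,
        Map<Integer, List<String>> newAff,
        String locNode) {
        for (int part : rebalancing) {
            List<String> before = oldAff.get(part);
            List<String> after = newAff.get(part);

            // A missing or changed assignment, or one without the local node,
            // means the current rebalance cannot simply continue.
            if (after == null || !after.equals(before) || !after.contains(locNode))
                return false;
        }

        return true;
    }
}
```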

...

To provide basic recovery guarantees we must:

...

Wait for the first checkpoint to end and set the partition state to OWNING;

Recovery from different stages:

The Supplier crashes when sending partition;
The Demander crashes when receiving partition;
The Demander crashes when applying temp WAL;

Phase-2

SSL must be disabled to take advantage of Java NIO zero-copy file transmission using the FileChannel#transferTo method. If SSL is needed, the file must be split into chunks in the same way and sent over the socket channel with a ByteBuffer. As the SSL engine generally needs a direct ByteBuffer to do encryption, we can't avoid copying the buffer payload from the kernel level to the application level.
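
The two transmission paths can be sketched with standard Java NIO calls. `PartitionFileSender` and its method names are hypothetical and not part of the Ignite code base; the chunked variant only shows the buffer copy that an SSL path would require and performs no encryption itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper illustrating both ways of sending a partition file. */
final class PartitionFileSender {
    /** Sends the whole file with FileChannel#transferTo (zero-copy where the OS supports it). */
    static long sendZeroCopy(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;

            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size)
                pos += src.transferTo(pos, size - pos, out);

            return pos;
        }
    }

    /** Fallback path: copies the file through a direct buffer, chunk by chunk. */
    static long sendChunked(Path file, WritableByteChannel out, int chunkSize) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(chunkSize);
            long total = 0;

            while (src.read(buf) != -1) {
                buf.flip();

                while (buf.hasRemaining())
                    total += out.write(buf);

                buf.clear();
            }

            return total;
        }
    }
}
```

With a socket channel as the target, `sendZeroCopy` lets the kernel move data directly from the file to the socket where supported, while `sendChunked` always brings each chunk into application memory first.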

...
