...
- The demander node prepares the set of cache partitions to fetch (IgniteDhtDemandedPartitionsMap#full);
- The demander node checks the compatibility version (for example, 2.8) and starts recording all incoming cache updates to the new special storage – the temporary WAL;
- The demander node sends the GridDhtPartitionDemandMessage to the supplier node;
- The supplier node receives the GridDhtPartitionDemandMessage and starts a new checkpoint process;
- The supplier node creates an empty temporary cache partition file with the .tmp postfix in the same cache persistence directory;
- The supplier node splits the whole cache partition file into virtual chunks of a predefined size (a multiple of the PageMemory page size);
- If a concurrent checkpoint thread determines the appropriate cache partition file chunk and tries to flush a dirty page to the cache partition file:
  - If the rebalanced chunk has already been transferred:
    - Flush the dirty page to the file;
  - If the rebalanced chunk has not been transferred yet:
    - Write this chunk to the temporary cache partition file;
    - Flush the dirty page to the file;
- The supplier node starts sending the cache partition file chunks to the demander node one by one using FileChannel#transferTo (see the sketch after this list):
  - If the current chunk was modified by the checkpoint thread, read it from the temporary cache partition file;
  - If the current chunk has not been touched, read it from the original cache partition file;
- The demander node starts listening for new incoming pipe connections from the supplier node on TcpCommunicationSpi;
- The demander node creates the temporary cache partition file with the .tmp postfix in the same cache persistence directory;
- The demander node receives the cache partition file chunks one by one:
  - The node checks the CRC for each PageMemory page in the downloaded chunk;
  - The node flushes the downloaded chunk at the appropriate cache partition file position;
- When the demander node receives the whole cache partition file:
  - The node stops recording cache data entries to the temporary WAL;
  - The node starts applying cache data entries from the temporary WAL storage to the .tmp partition file;
  - All concurrent cache puts are applied to both the .tmp and the original partition files; operations corresponding to this cache partition file are still written to the end of the temporary WAL;
  - When the temporary WAL store is about to become empty:
    - Suspend applying async operations on the partition file;
    - Wait until the last operations from the temporary WAL store are applied to the .tmp cache partition file;
  - When everything from the temporary WAL has been applied to the .tmp cache partition file:
    - Stop applying concurrent cache updates on the partition file;
    - Cut the .tmp postfix on the partition file;
    - Move the original partition file to .tmp;
    - Resume applying concurrent cache update async operations;
    - Schedule deletion of the original partition file and of the temporary WAL storage;
- The supplier node deletes the temporary cache partition file;
- The demander node takes ownership of the new cache partition file.
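For illustration, the supplier-side chunked transfer might look like the following minimal sketch in plain Java NIO. The class and parameter names (ChunkedPartitionSender, dirtyChunks) are hypothetical and the chunk bookkeeping is simplified; the sketch only shows how FileChannel#transferTo can push each chunk and how chunks overwritten by a concurrent checkpoint are served from the temporary copy-on-write file instead of the original one.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.BitSet;

/**
 * Supplier-side chunked transfer of a cache partition file (hypothetical
 * sketch). Chunks overwritten by a concurrent checkpoint are served from the
 * temporary copy-on-write file, untouched chunks from the original file.
 */
public class ChunkedPartitionSender {
    private final long chunkSize; // a multiple of the PageMemory page size

    public ChunkedPartitionSender(long chunkSize) {
        this.chunkSize = chunkSize;
    }

    /**
     * @param partFile    Original cache partition file.
     * @param tmpCowFile  Temporary file with copy-on-write chunks (.tmp).
     * @param dirtyChunks Indexes of chunks modified by the checkpoint thread.
     * @param out         Socket channel to the demander node.
     */
    public void send(Path partFile, Path tmpCowFile, BitSet dirtyChunks,
        SocketChannel out) throws IOException {
        try (FileChannel orig = FileChannel.open(partFile, StandardOpenOption.READ);
             FileChannel cow = FileChannel.open(tmpCowFile, StandardOpenOption.READ)) {
            long size = orig.size();

            for (long pos = 0; pos < size; pos += chunkSize) {
                long len = Math.min(chunkSize, size - pos);
                int chunkIdx = (int)(pos / chunkSize);

                // A chunk touched by the checkpoint is read from the temporary
                // copy-on-write file, an untouched one from the original file.
                FileChannel src = dirtyChunks.get(chunkIdx) ? cow : orig;

                // transferTo may send fewer bytes than requested, so loop
                // until the whole chunk has been pushed to the socket.
                long written = 0;
                while (written < len)
                    written += src.transferTo(pos + written, len - written, out);
            }
        }
    }
}
```

In the actual design the connection handling is driven by TcpCommunicationSpi rather than a raw SocketChannel; the sketch only illustrates the zero-copy transferTo loop and the fallback to the copy-on-write file.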
...
When the supplier node receives the cache partition file demand request, it must prepare and provide the cache partition file for transfer over the network. The Copy-on-Write [3] technique is assumed to be used to guarantee data consistency during chunk transfer.
The checkpointing process on the supplier node is described in items 4, 5, 6 of the Process Overview.
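A minimal sketch of the copy-on-write guard in the checkpoint writer is shown below. It assumes the partition file is logically split into fixed-size chunks and that the set of already transferred chunks is tracked in a BitSet; all names (CheckpointCowWriter, transferredChunks) are hypothetical and are not part of the actual Ignite code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.BitSet;

/**
 * Copy-on-write guard used by the checkpoint writer while a partition file is
 * being transferred (hypothetical sketch). Before a dirty page is flushed, the
 * original content of its chunk is preserved in the temporary .tmp file unless
 * the chunk has already been sent to the demander.
 */
class CheckpointCowWriter {
    private final FileChannel partFile;      // original cache partition file
    private final FileChannel tmpCowFile;    // temporary copy-on-write file (.tmp)
    private final BitSet transferredChunks;  // chunks already sent to the demander
    private final BitSet copiedChunks = new BitSet(); // chunks preserved in .tmp
    private final int chunkSize;             // a multiple of the page size

    CheckpointCowWriter(FileChannel partFile, FileChannel tmpCowFile,
        BitSet transferredChunks, int chunkSize) {
        this.partFile = partFile;
        this.tmpCowFile = tmpCowFile;
        this.transferredChunks = transferredChunks;
        this.chunkSize = chunkSize;
    }

    /** Flushes one dirty page, preserving the original chunk content if needed. */
    synchronized void writePage(long pageOff, ByteBuffer page) throws IOException {
        int chunkIdx = (int)(pageOff / chunkSize);

        // The chunk has not been transferred yet and has not been preserved
        // before: copy its original content to the temporary file first.
        if (!transferredChunks.get(chunkIdx) && !copiedChunks.get(chunkIdx)) {
            long chunkOff = (long)chunkIdx * chunkSize;
            ByteBuffer buf = ByteBuffer.allocate(chunkSize);

            while (buf.hasRemaining() && partFile.read(buf, chunkOff + buf.position()) > 0) {
                // keep reading the original chunk content
            }
            buf.flip();

            while (buf.hasRemaining())
                tmpCowFile.write(buf, chunkOff + buf.position());

            copiedChunks.set(chunkIdx);
        }

        // The dirty page can now be flushed to the original partition file.
        partFile.write(page, pageOff);
    }
}
```

The order matters here: the original chunk must be preserved before the dirty page overwrites it, otherwise the demander could receive a chunk that mixes old and new page versions.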
...
Catch-up WAL
During the cache partition file transfer, the demander node must store all corresponding data entries in the new temporary WAL storage to apply them later. A file-based FIFO approach is assumed to be used; a minimal sketch is given at the end of this section.
- The new write-ahead-log manager for writing temporary records must support:
  - An unlimited number of WAL files to store temporary cache records;
  - Iterating over the stored data records while an asynchronous writer thread inserts new records;
The process on the demander node is described in items 2, 10 of the Process Overview.
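A minimal sketch of such a file-based FIFO store is shown below, assuming length-prefixed binary records and a single segment file for brevity (the design itself allows an unlimited number of WAL files). The class name TemporaryWalStore and the record format are hypothetical; the point is only that a single writer thread can keep appending while a reader iterates over everything committed so far.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * File-based FIFO store for catch-up cache records (hypothetical sketch, not
 * the actual Ignite temporary WAL). The writer appends length-prefixed records
 * and publishes the committed size, so a concurrent reader can safely iterate
 * over everything written so far while new records keep arriving.
 */
public class TemporaryWalStore implements Closeable {
    private final Path segment;
    private final DataOutputStream out;
    private final AtomicLong committed = new AtomicLong(); // bytes safe to read

    public TemporaryWalStore(Path segment) throws IOException {
        this.segment = segment;
        this.out = new DataOutputStream(new BufferedOutputStream(
            Files.newOutputStream(segment, StandardOpenOption.CREATE, StandardOpenOption.APPEND)));
    }

    /** Called by the async writer thread for every incoming cache update. */
    public synchronized void append(byte[] record) throws IOException {
        out.writeInt(record.length);
        out.write(record);
        out.flush(); // make the record visible to readers
        committed.addAndGet(4 + record.length);
    }

    /** Iterates over all records committed at the time of the call. */
    public void forEachCommitted(Consumer<byte[]> consumer) throws IOException {
        long limit = committed.get();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                Files.newInputStream(segment)))) {
            long pos = 0;
            while (pos < limit) {
                int len = in.readInt();
                byte[] rec = new byte[len];
                in.readFully(rec);
                pos += 4 + len;
                consumer.accept(rec);
            }
        }
    }

    @Override public void close() throws IOException {
        out.close();
        Files.deleteIfExists(segment); // the store is dropped after rebalancing completes
    }
}
```

The committed counter is only advanced after a record has been flushed, so the iterator never observes a partially written record; the same idea extends naturally to multiple rolling segment files.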
...