...
- The demander node prepares the set of cache partitions to fetch (IgniteDhtDemandedPartitionsMap#full);
- The demander node checks the compatibility version (for example, 2.8) and starts recording all incoming cache updates to a new special storage – the temporary WAL;
- The demander node sends the GridDhtPartitionDemandMessage to the supplier node;
- The supplier node receives the GridDhtPartitionDemandMessage and starts a new checkpoint process;
- The supplier node creates an empty temporary cache partition file with the .tmp postfix in the same cache persistence directory;
- The supplier node splits the whole cache partition file into virtual chunks of a predefined size (a multiple of the PageMemory page size);
- If a concurrent checkpoint thread resolves the cache partition file chunk that a dirty page belongs to and tries to flush that page to the cache partition file (see the copy-on-write sketch after this list):
    - If the rebalance chunk has already been transferred:
        - Flush the dirty page to the file;
    - If the rebalance chunk has not been transferred yet:
        - Write this chunk to the temporary cache partition file;
        - Flush the dirty page to the file;
- The supplier node sends each cache partition file chunk to the demander node one by one using FileChannel#transferTo (see the transferTo sketch after this list):
    - If the current chunk was modified by the checkpoint thread – read it from the temporary cache partition file;
    - If the current chunk is untouched – read it from the original cache partition file;
- The demander node starts to listen for new incoming connections from the supplier node on TcpCommunicationSpi;
- The demander node creates a temporary cache partition file with the .tmp postfix in the same cache persistence directory;
- The demander node receives each cache partition file chunk one by one (see the receiver sketch after this list):
    - The node checks the CRC of each PageMemory page in the downloaded chunk;
    - The node flushes the downloaded chunk at the appropriate cache partition file position;
- When the demander node receives the whole cache partition file (see the swap sketch after this list):
    - The node swaps the original partition file with the .tmp partition file;
    - The node begins to apply data entries from the temporary WAL storage;
    - All concurrent async operations corresponding to the cache partition file still write to the end of the temporary WAL;
    - At the moment the temporary WAL storage is ready to become empty:
        - Suspend applying async operations to the temporary WAL;
        - Wait until the last operations from the temporary WAL storage have been applied to the partition file;
        - The node owns the new cache partition;
        - Resume applying async operations, now directly to the new owned partition file;
        - Schedule the temporary WAL storage deletion;
- The supplier node deletes the temporary cache partition file;
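
The copy-on-write sketch below illustrates the supplier-side interplay between the checkpoint thread and the chunk sender. The class, field names, and chunk bitmaps are illustrative assumptions, not the actual Ignite implementation: before the first dirty page overwrites an untransferred chunk, the chunk's original content is preserved in the temporary file, and the sender later reads the preserved copy if one exists.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.BitSet;

/** Illustrative supplier-side copy-on-write bookkeeping for one partition file. */
class CopyOnWritePartition {
    private final FileChannel partFile; // original cache partition file
    private final FileChannel tmpFile;  // temporary .tmp file in the same directory
    private final int chunkSize;        // predefined size, a multiple of the page size
    private final BitSet sent;          // chunks already transferred to the demander
    private final BitSet copied;        // chunks preserved in the temporary file

    CopyOnWritePartition(FileChannel partFile, FileChannel tmpFile, int chunkSize, int chunks) {
        this.partFile = partFile;
        this.tmpFile = tmpFile;
        this.chunkSize = chunkSize;
        sent = new BitSet(chunks);
        copied = new BitSet(chunks);
    }

    /** Checkpoint thread: flush a dirty page, preserving the original chunk first if needed. */
    synchronized void flushDirtyPage(long pageOff, ByteBuffer page) throws Exception {
        int chunk = (int)(pageOff / chunkSize);

        // Chunk neither sent nor preserved yet: copy its original content
        // to the temporary file before the dirty page overwrites it.
        if (!sent.get(chunk) && !copied.get(chunk)) {
            long off = (long)chunk * chunkSize;
            ByteBuffer buf = ByteBuffer.allocate(chunkSize);
            partFile.read(buf, off);
            buf.flip();
            tmpFile.write(buf, off);
            copied.set(chunk);
        }

        partFile.write(page, pageOff); // safe: a consistent copy exists for the sender
    }

    /** Sender thread: choose the source file for the next chunk and mark it sent. */
    synchronized FileChannel sourceFor(int chunk) {
        sent.set(chunk);
        return copied.get(chunk) ? tmpFile : partFile;
    }
}
```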
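The transferTo sketch below shows the chunk-by-chunk zero-copy transmission. It reuses the illustrative CopyOnWritePartition above, and the inner loop is needed because FileChannel#transferTo may transfer fewer bytes than requested.

```java
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

/** Illustrative zero-copy sending of all chunks of one partition file. */
class ChunkSender {
    static void sendPartition(CopyOnWritePartition part, long fileSize, int chunkSize,
                              SocketChannel out) throws Exception {
        int chunks = (int)((fileSize + chunkSize - 1) / chunkSize);

        for (int i = 0; i < chunks; i++) {
            long off = (long)i * chunkSize;
            long len = Math.min(chunkSize, fileSize - off);

            // Read from the temporary copy if the checkpoint already modified
            // this chunk, otherwise from the original partition file.
            FileChannel src = part.sourceFor(i);

            long sent = 0;
            while (sent < len) // transferTo may send less than requested
                sent += src.transferTo(off + sent, len - sent, out);
        }
    }
}
```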
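The receiver sketch below reads one chunk from the socket, validates page checksums, and writes the chunk at its partition file offset. For illustration it assumes each page carries a CRC32 in its trailing 4 bytes and that chunk framing (offset, length) is known from a header; the real Ignite page layout keeps the checksum inside the page header.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.util.zip.CRC32;

/** Illustrative demander-side handling of a single downloaded chunk. */
class ChunkReceiver {
    static void receiveChunk(SocketChannel in, FileChannel tmpPartFile,
                             long chunkOff, int chunkLen, int pageSize) throws Exception {
        ByteBuffer chunk = ByteBuffer.allocate(chunkLen);

        while (chunk.hasRemaining())
            if (in.read(chunk) < 0)
                throw new EOFException("Supplier closed the connection mid-chunk");

        // Validate the CRC of every page before touching the file.
        for (int off = 0; off < chunkLen; off += pageSize) {
            CRC32 crc = new CRC32();
            crc.update(chunk.array(), off, pageSize - 4);  // page payload
            int stored = chunk.getInt(off + pageSize - 4); // assumed checksum location

            if ((int)crc.getValue() != stored)
                throw new IOException("CRC mismatch at file offset " + (chunkOff + off));
        }

        chunk.rewind();
        tmpPartFile.write(chunk, chunkOff); // flush at the matching partition file position
    }
}
```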
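Finally, the swap sketch below outlines the file swap and temporary WAL catch-up on the demander. The TempWal interface is purely hypothetical, standing in for the temporary WAL storage described above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative final step: swap files, drain the temporary WAL, own the partition. */
class PartitionSwap {
    /** Hypothetical facade over the temporary WAL storage. */
    interface TempWal {
        boolean applyNextTo(Path partFile); // apply one buffered entry; false when drained
        void suspendWrites();               // stop accepting new async operations
        void scheduleDeletion();
    }

    static void finishRebalance(Path partFile, Path tmpPartFile, TempWal wal) throws Exception {
        // Swap the stale partition file with the fully downloaded .tmp copy
        // (an atomic rename replaces the existing file on POSIX file systems).
        Files.move(tmpPartFile, partFile, StandardCopyOption.ATOMIC_MOVE);

        // Apply buffered entries; concurrent async operations still append
        // to the temporary WAL while the backlog is being drained.
        while (wal.applyNextTo(partFile)) { /* drain backlog */ }

        wal.suspendWrites();                                  // storage is ready to become empty
        while (wal.applyNextTo(partFile)) { /* apply tail */ }

        // The node now owns the partition; async operations resume directly on it.
        wal.scheduleDeletion();
    }
}
```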
...
- Zero-copy limitations – If the operating system does not support zero copy, sending a file with FileChannel#transferTo might fail or yield worse performance. For example, sending a large file doesn't work well enough on Windows;
- Disabled SSL connection – SSL must be disabled to take advantage of Java NIO zero-copy file transmission using FileChannel#transferTo. We can consider using OpenSSL's non-copying interface to avoid allocating new buffers for each read and write operation at Phase-2;
- Writing WAL I/O wait time – Under the heavy load of partition file transmission, writing to the temporary WAL storage may slow down. Since the loss of the temporary WAL storage data carries no risk, we can consider keeping the whole storage in memory.
...
{"serverDuration": 125, "requestCorrelationId": "6c43bedc25e3755f"}