...

  1. The demander node prepares the set of IgniteDhtDemandedPartitionsMap#full cache partitions to fetch;
  2. The demander node checks the compatibility version (for example, 2.8) and starts recording all incoming cache updates to a new special storage – the temporary WAL;
  3. The demander node sends the GridDhtPartitionDemandMessage to the supplier node;
  4. The supplier node receives the GridDhtPartitionDemandMessage and starts a new checkpoint process;
  5. The supplier node creates an empty temporary cache partition file with the .tmp postfix in the same cache persistence directory;
  6. The supplier node splits the whole cache partition file into virtual chunks of a predefined size (a multiple of the PageMemory page size);
    1. When the concurrent checkpoint thread determines the affected cache partition file chunk and tries to flush a dirty page to the cache partition file
      1. If the rebalance chunk has already been transferred
        1. Flush the dirty page to the file;
      2. If the rebalance chunk has not been transferred yet
        1. Write this chunk to the temporary cache partition file;
        2. Flush the dirty page to the file;
    2. The node sends each cache partition file chunk to the demander node one by one using FileChannel#transferTo
      1. If the current chunk was modified by the checkpoint thread – read it from the temporary cache partition file;
      2. If the current chunk has not been touched – read it from the original cache partition file;
  7. The demander node starts listening for new incoming pipe connections from the supplier node on TcpCommunicationSpi;
  8. The demander node creates a temporary cache partition file with the .tmp postfix in the same cache persistence directory;
  9. The demander node receives each cache partition file chunk one by one
    1. The node checks the CRC of each PageMemory page in the downloaded chunk;
    2. The node flushes the downloaded chunk at the appropriate cache partition file position;
  10. When the demander node receives the whole cache partition file
    1. The node stops recording temporary WAL cache data entries; the node starts applying cache data entries from the temporary WAL storage to the .tmp partition file;
    2. All concurrent cache puts are applied to both the .tmp and the original partition files; operations corresponding to the cache partition file still write to the end of the temporary WAL;
    3. At the moment the temporary WAL store is about to become empty
      1. Suspend applying async operations to the partition file;
      2. Wait until the last operations from the temporary WAL store are applied;
      When everything from the temporary WAL has been applied to the .tmp cache partition file
      1. Stop applying concurrent cache updates to the partition file;
      2. Remove the .tmp postfix from the partition file;
      3. Move the original partition file to .tmp;
      4. Resume applying concurrent cache updates (async operations);
      5. Schedule deletion of the original partition file and of the temporary WAL storage;
  11. The supplier node deletes the temporary cache partition file;
  12. The demander node owns the new cache partition file.
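
The chunk-by-chunk transfer in step 6.2 can be sketched as follows. This is a minimal illustration, not the Ignite implementation: the class name ChunkedFileSender, its send method, and the chunkSize parameter are hypothetical, and the target is assumed to be a blocking channel. Only FileChannel#transferTo comes from the text above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not an Ignite class): streams a file to a channel in fixed-size chunks. */
final class ChunkedFileSender {
    private ChunkedFileSender() {
    }

    /**
     * Sends the whole file to {@code out}, one chunk at a time.
     * Assumption: {@code out} is a blocking channel, so a transfer always makes progress.
     *
     * @return total number of bytes sent
     */
    static long send(Path partFile, WritableByteChannel out, long chunkSize) throws IOException {
        if (chunkSize <= 0)
            throw new IllegalArgumentException("chunkSize must be positive");

        try (FileChannel src = FileChannel.open(partFile, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;

            while (pos < size) {
                long chunkEnd = Math.min(pos + chunkSize, size);

                // transferTo may move fewer bytes than requested, so repeat until the chunk is done.
                while (pos < chunkEnd) {
                    long sent = src.transferTo(pos, chunkEnd - pos, out);

                    if (sent <= 0)
                        throw new IOException("No progress while sending chunk at position " + pos);

                    pos += sent;
                }
            }

            return pos;
        }
    }
}
```

In this sketch the chunk boundary is the natural place to decide which file to read from (the original partition file or the temporary copy), which is why the loop advances chunk by chunk instead of issuing one transferTo call for the whole file.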

...

When the supplier node receives the cache partition file demand request, it must prepare the cache partition file and provide it for transfer over the network. The Copy-on-Write [3] technique is assumed to be used to guarantee data consistency during chunk transfer.

The checkpointing process on the supplier node is described in items 4, 5 and 6 of the Process Overview.
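
The copy-on-write rule for chunks can be illustrated with a small sketch. It rests on loud simplifications: both files are modelled as in-memory byte arrays, one coarse lock serializes the checkpoint thread and the sender, and the names CopyOnWriteChunks, flushPage and readChunkForTransfer are hypothetical rather than part of Ignite.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical in-memory model of the copy-on-write chunk rule (not an Ignite class). */
final class CopyOnWriteChunks {
    private final byte[] original;   // stands in for the cache partition file, mutated in place
    private final byte[] tmp;        // stands in for the temporary (.tmp) partition file
    private final int chunkSize;
    private final BitSet transferred = new BitSet(); // chunks already sent to the demander
    private final BitSet preserved = new BitSet();   // chunks whose old content was copied to tmp

    CopyOnWriteChunks(byte[] partitionFile, int chunkSize) {
        if (chunkSize <= 0)
            throw new IllegalArgumentException("chunkSize must be positive");

        this.original = partitionFile;
        this.tmp = new byte[partitionFile.length];
        this.chunkSize = chunkSize;
    }

    int chunkCount() {
        return (original.length + chunkSize - 1) / chunkSize;
    }

    /** Checkpoint side: flushes a dirty page, preserving untransferred chunks first. */
    synchronized void flushPage(int offset, byte[] page) {
        int first = offset / chunkSize;
        int last = (offset + page.length - 1) / chunkSize;

        for (int c = first; c <= last; c++) {
            if (!transferred.get(c) && !preserved.get(c)) {
                int from = c * chunkSize;
                int to = Math.min(from + chunkSize, original.length);

                // Chunk not sent yet: keep its pre-checkpoint content in the temporary file.
                System.arraycopy(original, from, tmp, from, to - from);
                preserved.set(c);
            }
        }

        System.arraycopy(page, 0, original, offset, page.length);
    }

    /** Sender side: returns the chunk content as it was before any concurrent flush. */
    synchronized byte[] readChunkForTransfer(int chunk) {
        int from = chunk * chunkSize;
        int to = Math.min(from + chunkSize, original.length);

        // Modified chunks are read from the temporary file, untouched ones from the original.
        byte[] src = preserved.get(chunk) ? tmp : original;
        byte[] res = Arrays.copyOfRange(src, from, to);

        transferred.set(chunk);

        return res;
    }
}
```

Under these assumptions the demander always receives the content each chunk had before the concurrent checkpoint touched it, while the supplier's own partition file keeps receiving flushed pages.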

...

Catch-up WAL

While the cache partition file is being transmitted, the demander node must store all corresponding data entries in the new temporary WAL storage to apply them later. A file-based FIFO technique is assumed to be used.

  • The new write-ahead-log manager for writing temporary records must support:
    • An unlimited number of WAL files to store temporary cache records;
    • Iterating over stored data records while an asynchronous writer thread inserts new records.

The process on the demander node is described in items 2 and 10 of the Process Overview.
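
A file-based FIFO that can be read while it is still being appended to might look like the sketch below. It is an illustration under stated assumptions, not the proposed WAL manager: the class FileFifo and its append and poll methods are hypothetical, records are length-prefixed byte arrays, and a single file is used where the requirement above asks for an unlimited number of WAL files.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical single-file FIFO of length-prefixed records (not an Ignite class). */
final class FileFifo implements Closeable {
    private final FileChannel ch;
    private long writePos; // end of the last fully written record
    private long readPos;  // start of the next record to apply

    FileFifo(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.READ,
            StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    /** Writer side: appends one record to the end of the file. */
    synchronized void append(byte[] rec) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + rec.length);

        buf.putInt(rec.length);
        buf.put(rec);
        buf.flip();

        while (buf.hasRemaining())
            writePos += ch.write(buf, writePos);
    }

    /** Reader side: returns the oldest unread record, or null if the reader has caught up. */
    synchronized byte[] poll() throws IOException {
        if (readPos == writePos)
            return null;

        ByteBuffer len = ByteBuffer.allocate(Integer.BYTES);

        readFully(len, readPos);
        len.flip();

        byte[] rec = new byte[len.getInt()];

        readFully(ByteBuffer.wrap(rec), readPos + Integer.BYTES);
        readPos += Integer.BYTES + rec.length;

        return rec;
    }

    private void readFully(ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);

            if (n < 0)
                throw new EOFException("Unexpected end of file at position " + pos);

            pos += n;
        }
    }

    @Override public synchronized void close() throws IOException {
        ch.close();
    }
}
```

In this sketch a null result from poll corresponds to the moment the temporary WAL store becomes empty. Because the writer may append again right after that, a real implementation would have to suspend writers before treating the store as drained.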

...