...

  • An unlimited number of WAL files to store temporary data records;
  • Iterating over stored data records while an asynchronous writer thread inserts new records;
  • A WAL-per-partition approach needs to be used;
  • Write operations to the temporary WAL storage must have higher priority than read operations;
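The iteration and per-partition requirements above can be illustrated with a minimal in-memory sketch. The class `TempWalSketch` and its methods are hypothetical names, not part of Ignite; a real implementation would write to files, and the priority of writes over reads is not modeled here.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical in-memory stand-in for the temporary WAL storage: one append-only log per partition. */
public class TempWalSketch {
    /** Partition id -> records in arrival order. */
    private final Map<Integer, ConcurrentLinkedQueue<String>> logs = new ConcurrentHashMap<>();

    /** Appends a record to the log of the given partition (called by the asynchronous writer). */
    public void append(int partId, String rec) {
        logs.computeIfAbsent(partId, id -> new ConcurrentLinkedQueue<>()).add(rec);
    }

    /** Weakly consistent iterator: it never fails when append() adds records during iteration. */
    public Iterator<String> iterator(int partId) {
        return logs.computeIfAbsent(partId, id -> new ConcurrentLinkedQueue<>()).iterator();
    }
}
```

Keeping a separate queue per partition means a reader of one partition never sees records that belong to another one.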

The problems to be solved:

  • We must stop updating indexes when the data is ready to be transferred from the supplier node. Async updates must not trigger any index updates.

Rebuild indexes

The node is ready to become a partition owner when the partition data is rebalanced and the cache indexes are ready. For the message-based cluster rebalancing approach, indexes are rebuilt simultaneously with cache data loading. For the file-based rebalancing approach, the index rebuild procedure must be run before the partition state is set to OWNING.

Public API changes

...

CommunicationSpi.java
/**
 * @return {@code True} if the new type of direct connections is supported.
 */
public default boolean channelConnectionSupported() {
    return false;
}
 
/**
 * @param remote Destination cluster node to communicate with.
 * @param msg Configuration channel message.
 * @throws IgniteSpiException If fails.
 */
public default IgniteSocketChannel channel(ClusterNode remote, T msg) throws IgniteSpiException {
    throw new UnsupportedOperationException();
}

Failover and Recovery

In case of crash recovery, no additional actions need to be applied to keep the cache partition file consistent. Apache Ignite doesn't provide any recovery guarantees for partitions in the MOVING state, thus only that single partition file will be lost. The cache partitions will be fully loaded when the next rebalance procedure occurs.

Topology change

Each topology change event (JOIN/LEFT/FAILED) may or may not change the cache affinity assignments of the caches currently being rebalanced. If the assignments are not changed and the node still needs the partitions being rebalanced, we can continue the current rebalance process (see IGNITE-7165 for details).
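A simplified sketch of that decision follows. The class, the method and the representation of assignments as a map from partition id to owner node ids are assumptions made for illustration; the check only covers the local node and ignores, for example, whether the supplier node is still alive.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper deciding whether an in-progress rebalance survives a topology change. */
public class RebalanceContinuationSketch {
    /**
     * @param rebalancing Ids of the partitions currently being rebalanced to the local node.
     * @param newAssignment Partition id -> ids of the nodes assigned to it after the topology change.
     * @param localNodeId Id of the local (demander) node.
     * @return True if the local node is still assigned to every partition it is rebalancing.
     */
    public static boolean canContinue(Set<Integer> rebalancing,
        Map<Integer, Set<String>> newAssignment, String localNodeId) {
        for (int part : rebalancing) {
            Set<String> owners = newAssignment.get(part);

            if (owners == null || !owners.contains(localNodeId))
                return false; // The partition is no longer needed here: restart the rebalance.
        }

        return true;
    }
}
```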

...

A new connection must be established and the download of the partition file must be continued from the last successfully sent cache partition chunk.
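A minimal sketch of such a resumable download, assuming the receiver uses the size of its partially written file as the resume offset. A local file stands in for the socket channel, and `ResumableCopySketch`, `resume` and the `maxBytes` limit (used only to simulate a broken connection) are hypothetical names.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ResumableCopySketch {
    /**
     * Copies at most maxBytes from src into dst, starting at the number of bytes dst already holds.
     *
     * @return The new size of dst, which is the offset to resume from after a reconnect.
     */
    public static long resume(Path src, Path dst, long maxBytes) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long pos = out.size(); // Bytes received before the connection was lost.
            long end = pos + Math.min(maxBytes, in.size() - pos);

            out.position(pos); // Continue writing right after the last received byte.

            while (pos < end) {
                long sent = in.transferTo(pos, end - pos, out);

                if (sent <= 0)
                    break;

                pos += sent;
            }

            return pos;
        }
    }
}
```

Calling `resume` again after an interrupted transfer continues from the returned offset instead of re-sending the whole file.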

Crash recovery

To provide basic cluster recovery guarantees we must:

  • Start the checkpoint process when the temporary WAL storage becomes empty;
  • Wait for the first checkpoint to end and set the OWNING status on the cache partition;
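The ordering of the two steps above can be sketched with futures. This is an illustration only: `OwningAfterCheckpointSketch`, the `walDrained` future and the `checkpoint` hook are hypothetical and do not correspond to Ignite's internal API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class OwningAfterCheckpointSketch {
    public enum State { MOVING, OWNING }

    private volatile State state = State.MOVING;

    public State state() {
        return state;
    }

    /**
     * @param walDrained Completes when the temporary WAL storage becomes empty.
     * @param checkpoint Hypothetical hook: starts a checkpoint and returns a future completed when it ends.
     * @return Future completed after the partition has been switched to OWNING.
     */
    public CompletableFuture<Void> ownAfterCheckpoint(CompletableFuture<Void> walDrained,
        Supplier<CompletableFuture<Void>> checkpoint) {
        return walDrained
            .thenCompose(v -> checkpoint.get())   // Start the checkpoint only once the WAL is empty.
            .thenRun(() -> state = State.OWNING); // Switch the state only after the checkpoint ends.
    }
}
```

If the checkpoint future completes exceptionally, the partition stays in the MOVING state.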

Supplier crashes when sending partition

Demander crashes when receiving partition

Demander crashes when applying temp WAL

Phase-2

SSL must be disabled to take advantage of Java NIO zero-copy file transmission using the FileChannel#transferTo method. If SSL is required, the file must be split into chunks that are sent over the socket channel through a ByteBuffer, the same way as other data. As the SSL engine generally needs a direct ByteBuffer to do encryption, we can't avoid copying the buffer payload from the kernel level to the application level.
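The two transmission paths can be compared in a short sketch. The class and method names are hypothetical, the target is any `WritableByteChannel` rather than a real socket, and the encryption step is only marked with a comment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public class PartitionSendSketch {
    /** No SSL: transferTo may let the OS move bytes to the target without copying them into user space. */
    public static long sendZeroCopy(FileChannel file, WritableByteChannel target) throws IOException {
        long pos = 0;
        long size = file.size();

        while (pos < size) {
            long n = file.transferTo(pos, size - pos, target);

            if (n <= 0)
                break;

            pos += n;
        }

        return pos;
    }

    /** SSL: every chunk is first read into an application-level buffer, where it could be encrypted. */
    public static long sendChunked(FileChannel file, WritableByteChannel target, int chunkSize)
        throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(chunkSize);
        long pos = 0;

        while (true) {
            buf.clear();

            int n = file.read(buf, pos);

            if (n <= 0)
                break; // End of file (or a zero-sized chunk).

            pos += n;
            buf.flip();

            // A real SSL transport would encrypt the chunk here (for example via SSLEngine#wrap).
            while (buf.hasRemaining())
                target.write(buf);
        }

        return pos;
    }
}
```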

...