Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Two-Phase Commit Protocol

 

A single transaction in distributed systems usually spans across several server nodes which imposes additional requirements for the sake of data consistency. For instance, it is obligatorily to detect and handle situations when a transaction was not fully committed due to a partial outage or cluster nodes loss. Ignite relies on two-phase commit for handling this and many other situations in order to ensure data consistency cluster-wide. 

As the protocol name suggests, a transaction is executed in two phases. The "prepare" phase goes first: 

 


Image Added
Picture 3.
 
  1. Transaction coordinator (aka. near node or application that runs a transaction) send "prepare" message (step 1) all primary nodes participating in the transaction.
  2. The primary nodes forward the message to nodes that hold a backup copy of data if any (step 2) and acquire all the required data locks.
  3. The primary nodes acknowledge (step 4) that all the locks are required and they are ready to commit the transaction.


Right after that, the transaction coordinator executes the second phase by sending "commit" message:

 
Image Added
Picture 4.
 

Once the backup and primary copies are updated, the transaction coordinator gets acknowledged and assumes that the transaction is finished.

This is how the 2-phase commit works in a nutshell. Below we will see how the protocol tolerates failures, distinguishes pessimistic and optimistic transaction and does many other things. 

Near Node, Remote Node and DHT

The transaction coordinator is also known as a near node among Ignite community and committers. The transaction coordinator (near node) initiates a transaction, tracks its state, sends over "prepare" and "commit" message, orchestrates the overall transaction process. Usually, a near node is a client node that connects our applications to the cluster. The application issues 

 

 


Image Added

Picture 5.

In contrast to the Near involved in the transaction partitions are called Remote. They keep "their" part of the cache. Physically, every node allocates at part DHT (Distributed hash table) and through afiniti function, any transaction participant may determine its partition and nodes. It should be noted that DHT keeps the buckets to the level of partitioning, and then use B + tree.