Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Slow Path
If a replica refuses to accept t0 it responds with a higher t than any other it has witnessed. If C fails to reach fast path consensus it takes the highest t it witnessed from its responses, which constitutes a simple Lamport clock value imposing a valid total order. This value is proposed to at least a simple majority of nodes, along with the union of the dependenciesreceived in the preaccept phase. This round’s purpose is only to record durably which Lamport clock value that might be derived was selected (as multiple valid Lamport clock values might be obtained depending on which responses were received by C), so that if C crashes a recovery coordinator will pick the same timestamp. The inclusion of dependencies in the proposal is solely to facilitate Recovery of other transactions that may be incomplete - these are stored on each replica to facilitate decisions at recovery. Replicas as a result always accept the proposal (unless a newer ballot has been issued by a recovery coordinator to take over the transaction), and once a majority have accepted the proposal it is durably decided. Replicas respond with a new set of dependencies containing those transactions they have witnessed with a lower t0 than the t they received. The coordinator discards the dependencies it received previously on the fast path and uses these new dependencies for execution.

Execution
The union of all dependencies received during consensus is derived before t is disseminated via Commit and simultaneously a Read is issued by C to a member of each participating shard (preferably in the same DC), with those dependencies known to participate in that shard attached. This replica waits for all dependencies to be committed before filtering out those that are assigned a later t. The remaining dependencies are waited on until they execute and their result applied on this replica, before the read is evaluated and returned to the coordinator. C combines these responses to compute an update and client response, which is then disseminated by Apply to all replicas and returned to the client (respectively).

Recovery
Recovery of a transaction X is not terribly complicated. It assumes that it is executed by a replica R that has witnessed X. Replicas are contacted for their knowledge of the transaction. If they do not already know of X, they initially perform the same steps as they would in the fast path to record its arrival time. Then the replica consults its local state for any transactions that it has recorded that would have witnessed X if X had reached fast path consensus. If any are present that did not witness X this fact is returned to the coordinator. Any transactions with a lower t0 that have been proposed a higher t are also returned, alongside the replica’s other state for X.

R assembles a simple quorum, and picks the highest step to complete. That is to say, if it sees an Apply, Execute, Commit or Slow Path round it picks them up in that order of precedence. Otherwise it must decide if it should pick t0 or some t it found from the majority it contacted to propose in a new Slow Path round. The simple intuition here is that either it is safe to propose t0 or we know that we did not reach fast path consensus - this is determined by the replica responses indicating if they have witnessed a newer transaction that did not witness X. In this case we know X did not reach a fast path quorum else it would have been witnessed. If we witness no such transaction, then it is safe to propose t0 as every newer transaction knows of X and is waiting for it to commit. The only edge case here is a transaction with a lower t0 that is proposing a newer t but has not yet committed. If we get to this point, we must wait for these transactions to commit before retrying.