Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Accord
In order to exploit the above innovations, we must select a timestamp protocol. To ensure reads are scalable we must also derive dependencies, so Caesar is the only candidate in the literature. Unfortunately Caesar has suboptimal characteristics: ACKs to older transactions may be delayed by newer ones (in theory indefinitely), and three round-trips may be required. Fortunately this can be remedied. Caesar appears to suffer this fate due to assembling a precise set of dependencies so that only one possible set of dependencies may be committed (and perhaps also because of its recovery protocol). Accord instead assembles an inconsistent set of dependencies. By inconsistent we mean that it may differ between coordinators, and we are OK with different sets of dependencies being committed to different replicas. We only require that all of these sets of dependencies are a superset of those that may be committed with a lower timestamp.

Preliminaries
We use hybrid logical clocks that are globally unique, that is each replica has its own unique id that is appended to each logical clock value.

Fast Path
A coordinator C proposes a timestamp t0 to at least a quorum of a fast path electorate. If t0 is larger than all timestamps witnessed for all prior conflicting transactions, t0 is accepted by a replica. If a fast path quorum of responses accept, the transaction is agreed to execute at t0. Replicas respond with the set of transactions they have witnessed that may execute with a lower timestamp, i.e. those with a lower t0.

Slow Path
If a replica refuses to accept t0 it responds with a higher t than any other it has witnessed. If C fails to reach fast path consensus it takes the highest t it witnessed from its responses, which constitutes a simple Lamport clock value imposing a valid total order. This value is proposed to at least a simple majority of nodes, along with the union of the dependenciesreceived in the preaccept phase. This round’s purpose is only to record durably which Lamport clock value that might be derived was selected (as multiple valid Lamport clock values might be obtained depending on which responses were received by C), so that if C crashes a recovery coordinator will pick the same timestamp. The inclusion of dependencies in the proposal is solely to facilitate Recovery of other transactions that may be incomplete - these are stored on each replica to facilitate decisions at recovery. Replicas as a result always accept the proposal (unless a newer ballot has been issued by a recovery coordinator to take over the transaction), and once a majority have accepted the proposal it is durably decided. Replicas respond with a new set of dependencies containing those transactions they have witnessed with a lower t0 than the t they received. The coordinator discards the dependencies it received previously on the fast path and uses these new dependencies for execution.

Execution
The union of all dependencies received during consensus is derived before t is disseminated via Commit and simultaneously a Read is issued by C to a member of each participating shard (preferably in the same DC), with those dependencies known to participate in that shard attached. This replica waits for all dependencies to be committed before filtering out those that are assigned a later t. The remaining dependencies are waited on until they execute and their result applied on this replica, before the read is evaluated and returned to the coordinator. C combines these responses to compute an update and client response, which is then disseminated by Apply to all replicas and returned to the client (respectively).