Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Audience: All Cassandra Users and Developers
User Impact: Support for fast general purpose transactions
WhitepaperAccord

Motivation

Users must expend significant effort to modify their database consistently while maintaining scalability. Even simple transactions involving more than one partition may become complex and error prone, as a distributed state machine must be built atop the database. Conversely, packing all of the state into one partition is not scalable.

Performance also remains an issue, despite recent Paxos improvements: latency is still twice its theoretical minimum over the wide area network, and suffers particularly badly under contention.

This work aims to improve Cassandra to support fast general purpose transactions. That is, those that may operate over any set of keys in the database atomically, modifying their contents at-once, with any action conditional on the existing contents of any key.

...

Code Block
Recovery
Coordinator C: 
    send Recover(X) to replicas of each shard
Replica R receiving Recover(X): 
    Ensure X is PreAccepted; if only PreAccepted compute deps
    Wait = {Accepted transactions with lower t0 but higher t than X}
    Superceding = {Accepted transactions with higher t0 that did not witness X}
                ∪ {Committed transactions with higher t than t0 of X, that did not witness X}
    Reply (local state for X, Wait, Superceding)
Coordinator C (with a quorum of responses from each shard):
    If (any R in responses had X as Applied, Committed, or Accepted) then continue the state machine from there
    Otherwise 
        If (any shard has insufficient R where t = t0 to have taken the fast path) then propose the highest t of any response on the Slow Path
        If (any R.Superceding is non-empty) then propose the highest t in any response on the Slow Path
        If (any R.Wait is non-empty) then wait for them to commit and retry
        Otherwise propose t0 on the Slow Path

Test Plan

This work will be tested extensively using a combination of isolated randomised testing using verification of the standalone library using Jepsen.io tooling, dedicated linearizability and serializability verification of the standalone library, and similar facilities in situ in Cassandra using the Simulator facility coming with CEP-10.