Audience: All Cassandra Users and Developers
User Impact: Support for fast general purpose transactions
Whitepaper: Accord
Motivation
Users must expend significant effort to modify their database consistently while maintaining scalability. Even simple transactions involving more than one partition may become complex and error prone, as a distributed state machine must be built atop the database. Conversely, packing all of the state into one partition is not scalable.
Performance also remains an issue, despite recent Paxos improvements: latency is still twice its theoretical minimum over the wide area network, and suffers particularly badly under contention.
This work aims to improve Cassandra to support fast general purpose transactions. That is, those that may operate over any set of keys in the database atomically, modifying their contents at-once, with any action conditional on the existing contents of any key.
...
Code Block |
---|
Recovery Coordinator C: send Recover(X) to replicas of each shard Replica R receiving Recover(X): Ensure X is PreAccepted; if only PreAccepted compute deps Wait = {Accepted transactions with lower t0 but higher t than X} Superceding = {Accepted transactions with higher t0 that did not witness X} ∪ {Committed transactions with higher t than t0 of X, that did not witness X} Reply (local state for X, Wait, Superceding) Coordinator C (with a quorum of responses from each shard): If (any R in responses had X as Applied, Committed, or Accepted) then continue the state machine from there Otherwise If (any shard has insufficient R where t = t0 to have taken the fast path) then propose the highest t of any response on the Slow Path If (any R.Superceding is non-empty) then propose the highest t in any response on the Slow Path If (any R.Wait is non-empty) then wait for them to commit and retry Otherwise propose t0 on the Slow Path |
Test Plan
This work will be tested extensively using a combination of isolated randomised testing using verification of the standalone library using Jepsen.io tooling, dedicated linearizability and serializability verification of the standalone library, and similar facilities in situ in Cassandra using the Simulator facility coming with CEP-10.