
Audience: All Cassandra Users and Developers
User Impact: Support for fast general purpose transactions
Whitepaper: Accord

...

GitHub: https://github.com/apache/cassandra-accord

...

Status

Current state: Accepted

Discussion thread: https://lists.apache.org/thread/xgj4sym0d3vox3dzg8xc8dnx4c8jb4d5 , https://lists.apache.org/thread/j402lzzf7m699zc2vk23vgfxz8wwtlyl

JIRA

CASSANDRA-17092

Motivation

Users must expend significant effort to modify their database consistently while maintaining scalability. Even simple transactions involving more than one partition may become complex and error-prone, as a distributed state machine must be built atop the database. Conversely, packing all of the state into one partition does not scale.

Performance also remains an issue, despite recent Paxos improvements: latency is still twice its theoretical minimum over the wide area network, and suffers particularly badly under contention.
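To make the latency claim concrete, here is a back-of-envelope comparison; the 80 ms round-trip figure is an assumed example, not a measurement from the project:

```python
# Back-of-envelope WAN latency comparison; the RTT value is an assumed example.
wan_rtt_ms = 80                       # hypothetical inter-DC round-trip time

theoretical_min = 1 * wan_rtt_ms      # one WAN round trip is the floor for agreement
current_paxos = 2 * wan_rtt_ms        # per the text: still twice the theoretical minimum

print(current_paxos / theoretical_min)  # → 2.0
```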

This work aims to improve Cassandra to support fast general purpose transactions. That is, those that may operate over any set of keys in the database atomically, modifying their contents at once, with any action conditional on the existing contents of any key.

Goals

  • General purpose transactions (may operate over any keys in the database at once)
  • Strict-serializable isolation
  • Optimal latency: one wide area round-trip for all transactions under normal conditions
  • Optimal failure tolerance: latency and performance should be unaffected by any minority of replica failures
  • Scalability: there should be no bottleneck introduced
  • Should have no intrinsic limit to transaction complexity
  • Must work on commodity hardware
  • Must support live migration from Paxos
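The failure-tolerance goal follows standard quorum arithmetic, sketched below. This is the generic majority-quorum argument, not Accord's specific fast-path quorum sizing, which is more involved (see the whitepaper):

```python
# Generic majority-quorum arithmetic behind the "any minority of replica
# failures" goal; Accord's actual fast-path electorate sizing is not
# reproduced here.

def majority_quorum(n: int) -> int:
    """Smallest group size such that any two such groups must overlap."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Largest minority of replicas that can fail while a quorum survives."""
    return (n - 1) // 2

# Every replica is either in a surviving quorum or among the tolerated failures.
for n in (3, 5, 7):
    assert majority_quorum(n) + tolerated_failures(n) == n

print(tolerated_failures(5))  # → 2
```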

...

Code Block (sql)
BEGIN BATCH
UPDATE tbl1 SET value1 = newValue1 WHERE partitionKey = k1;
UPDATE tbl2 SET value2 = newValue2 WHERE partitionKey = k2 AND conditionValue = someCondition;
APPLY BATCH;

Long Term

This work is expected to replace Paxos for the project. In order to support live migration both protocols will be maintained for at least an interim period, after which it is expected that Paxos will be deprecated. The precise timing of this will be up to the community.

While not an initial goal, supporting transactional global secondary indexes and supporting non-global modes of operation (i.e. to replace LOCAL_SERIAL for latency sensitive users) are future goals we anticipate enabling with this work.

This work shall be developed in a modular manner, to allow for coexistence with other consensus protocols or transaction managers. This will allow us to evolve Accord without precluding alternative solutions, as future work expands Cassandra's transactional capabilities beyond the goals of this CEP. Initially, supporting the Paxos-based LWT and Accord side by side is also an example of such modularity and optionality.

Non-Goals

  • UX improvements to exploit this facility are considered out of scope, and will be addressed in a separate CEP
  • SASI, SAI and local secondary indexes

...

Execution
The union of all dependencies received during consensus is derived before t is disseminated via Commit; simultaneously, a Read is issued by C to a member of each participating shard (preferably in the same DC), with those dependencies known to participate in that shard attached. The replica waits for all dependencies to be committed, then filters out those that are assigned a later t. The remaining dependencies are waited on until they execute and their results are applied on this replica, before the read is evaluated and returned to the coordinator. C combines these responses to compute an update and a client response, which are then disseminated via Apply to all replicas and returned to the client, respectively.

Code Block

Execution
Replica R receiving Commit(X, deps):
    Committed[X] = true
Coordinator C: 
    send a read to one or more (preferably local) replicas of each shard 
        (containing those deps that apply on the shard)
Replica R receiving Read(X, t, deps): 
    Wait for deps to be committed
    Wait for deps with a lower t to be applied locally
    Reply with result of read
Coordinator C (with a response from each shard):
    result = execute(read responses)
    send Apply(result) to all replicas of each shard
    send result to client
Replica R receiving Apply(X, t, deps, result):
    Wait for deps to be committed
    Wait for deps with a lower t to be applied locally
    Apply result locally
    Applied[X] = true
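The replica-side waiting rules above can be sketched as runnable code. The class, the in-memory sets, and the toy key/value store are illustrative assumptions only; a real replica would defer and retry rather than return early when dependencies are not yet satisfied:

```python
# Minimal single-process sketch of the replica-side Execution logic above.
# The data model (one key per txn) and early-return behaviour are simplifying
# assumptions; this is not Accord's implementation.

class Replica:
    def __init__(self):
        self.committed = set()   # txn ids X with Committed[X] = true
        self.applied = set()     # txn ids X with Applied[X] = true
        self.data = {}           # toy key -> value store

    def on_commit(self, x):
        self.committed.add(x)

    def _ready(self, t, deps, ts):
        # all deps committed, and deps with a lower t applied locally
        return all(d in self.committed for d in deps) and \
               all(d in self.applied for d in deps if ts[d] < t)

    def on_read(self, x, t, deps, ts, key):
        if not self._ready(t, deps, ts):
            return None          # real replica: wait for deps, not bail out
        return self.data.get(key)

    def on_apply(self, x, t, deps, ts, key, value):
        if not self._ready(t, deps, ts):
            return False         # real replica: wait for deps, not bail out
        self.data[key] = value
        self.applied.add(x)
        return True

# Usage: txn A (t=1) writes k; txn B (t=2) depends on A and reads k.
r = Replica()
ts = {"A": 1, "B": 2}
r.on_commit("A")
# A is committed but not yet applied, so B's read must wait.
assert r.on_read("B", t=2, deps={"A"}, ts=ts, key="k") is None
r.on_apply("A", t=1, deps=set(), ts=ts, key="k", value=42)
# Now A is applied, so B's read observes its effect.
assert r.on_read("B", t=2, deps={"A"}, ts=ts, key="k") == 42
```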

...