...

Audience: All Cassandra Users and Developers
User Impact: Improved LWT performance, particularly in WAN or contended scenarios

Released: 4.1

Motivation

  • LWTs suffer from poor performance, particularly in a WAN setting, and particularly under contention.
  • LWTs have never guaranteed linearizability across range movements, which is a significant problem for a mechanism intended to offer strong consistency.
  • CASSANDRA-12126 has introduced significant performance regressions in order to resolve long-standing correctness issues. This may result in users being unable to use LWTs where they could previously, or else having to accept a poorly-documented correctness trade-off in order to keep the lights on.

...

We will deprecate system.paxos TTLs, and instead expunge records that are older than the most recent paxos repair for any given range/table.
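The repair-based expunge rule can be illustrated with a minimal sketch. Everything named here (`PaxosRecord`, `purge_repaired`, the token-range map) is hypothetical and chosen for illustration; it is not Cassandra's actual data model, and it tracks a single table for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PaxosRecord:
    token: int          # partition token (hypothetical simplification)
    ballot_micros: int  # timestamp of the record's latest ballot


def purge_repaired(
    records: List[PaxosRecord],
    last_repair_by_range: Dict[Tuple[int, int], int],
) -> List[PaxosRecord]:
    """Drop records older than the most recent paxos repair of their range.

    last_repair_by_range maps (start, end] token ranges to the timestamp
    of the latest completed paxos repair covering that range.
    """

    def repaired_at(token: int) -> Optional[int]:
        for (start, end), repair_micros in last_repair_by_range.items():
            if start < token <= end:
                return repair_micros
        return None  # range has never been repaired

    kept = []
    for record in records:
        repair_micros = repaired_at(record.token)
        # Keep the record if its range is unrepaired, or if the record
        # is at least as new as the repair.
        if repair_micros is None or record.ballot_micros >= repair_micros:
            kept.append(record)
    return kept
```

In this model a record's lifetime depends only on repair progress, not on a TTL, which is the property the paragraph above relies on.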

This mechanism alone will permit safely returning success prior to performing the COMMIT step of our Paxos implementation. Users will still have to opt in to this behaviour by providing a commit consistency of ANY, or ONE, or perhaps LOCAL_QUORUM, depending on their preference. However, this can be recommended as safe and preferred once this mechanism is in place, taking us from four to three round-trips (eight to six message delays).
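The latency arithmetic can be sketched as follows. The phase names are an assumption made for illustration (the text above only gives the counts); each round-trip is counted as two message delays, one for the request and one for the response.

```python
# Assumed phase breakdown of a write; names are illustrative only.
PHASES = ["prepare", "read", "propose", "commit"]


def critical_path(phases, wait_for_commit):
    """Return (round_trips, message_delays) before the client sees success."""
    on_path = [p for p in phases if wait_for_commit or p != "commit"]
    round_trips = len(on_path)
    return round_trips, 2 * round_trips


print(critical_path(PHASES, wait_for_commit=True))   # (4, 8)
print(critical_path(PHASES, wait_for_commit=False))  # (3, 6)
```

The commit is still sent in the second case; it is simply no longer on the path the client waits for.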

Paxos Optimisations

Several optimisations to our Paxos implementation will be introduced, including:

  • Combine promise+read before a proposal: if the proposal is successful, the read will have been linearized along with the write, taking us to two round-trips (four message delays).
  • Optimistic reads: if a majority of promises witnessed consistent state when promising and performing their read, the majority read can be returned to the client without waiting to issue an empty proposal. This takes us to one round-trip (two message delays) on read.
  • Preventing read/read competition: promises will be issued separately for reads and writes, with read promises invalidating write promises, and write promises invalidating read promises, but read promises will not invalidate each other, or prevent the above optimistic read optimisation.
  • Bounding re-proposals: incomplete commands that are re-proposed will not continue to be re-proposed if the original command has been committed (specifically, we track separately the ballot of the original proposal and the re-proposal, so that if the original proposal reaches the commit state as part of the original proposal, or any re-proposal, all re-proposals can instead go straight to commit).
  • Coordinators will not self-compete for operations on the same partition.
  • Coordinators will cache PaxosState to limit dependence on the performance of system.paxos.
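The read/write promise rule from the list above can be modelled with a small acceptor sketch. This is one possible interpretation under stated assumptions, not Cassandra's implementation: the class and field names are hypothetical, ballots are plain integers, and the acceptor tracks the latest write promise separately from the latest promise of any kind.

```python
class Acceptor:
    """Simplified acceptor that issues separate read and write promises."""

    def __init__(self):
        self.latest_write_promise = 0  # highest ballot promised to a write
        self.latest_promise = 0        # highest ballot promised to anything

    def prepare(self, ballot: int, is_write: bool) -> bool:
        if is_write:
            # A write must supersede every earlier promise, read or write.
            if ballot <= self.latest_promise:
                return False
            self.latest_write_promise = ballot
            self.latest_promise = ballot
            return True
        # A read only needs to supersede earlier write promises, so
        # concurrent reads do not invalidate one another.
        if ballot <= self.latest_write_promise:
            return False
        self.latest_promise = max(self.latest_promise, ballot)
        return True


acceptor = Acceptor()
print(acceptor.prepare(5, is_write=True))   # True: first write promise
print(acceptor.prepare(7, is_write=False))  # True: read supersedes write 5
print(acceptor.prepare(6, is_write=False))  # True: reads do not compete
print(acceptor.prepare(6, is_write=True))   # False: invalidated by read 7
print(acceptor.prepare(8, is_write=True))   # True: supersedes everything
print(acceptor.prepare(7, is_write=False))  # False: invalidated by write 8
```

Tracking two ballots rather than one is what lets a lower-ballot read succeed after a higher-ballot read, while still forcing reads and writes to order themselves against each other.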

...