Status

Current state: Merged

Discussion thread: here

JIRA: CASSANDRA-16921

Released: Unreleased

Audience: Cassandra Developers
User Impact: None
Target Release: 4.0.x / Q3 2021

Motivation

As with all distributed systems, proving the correctness of Cassandra is challenging. We have made great strides in testing with in-jvm dtests, Harry and other approaches. However, these tests either require fairly invasive and deliberate perturbations to the normal ordering of events in the system in order to elicit specific conditions, regressions, etc., or must be made immune to changes in the ordering of events in order to run successfully.

Ideally, boundary and unexpected conditions and event orderings could be elicited automatically, without any manual intervention, both for cluster-level tests and for those at the component or class level. Tests written only to assert internal consistency of state could then explore far more system behaviours for the same level of investment.

This work dovetails well with Harry, which automates the exploration and validation of different data patterns and workloads. We hope to eventually combine the two approaches.

...

  • Refactor internal APIs around concurrency to support mock implementations that are able to control execution, including
    • SimpleCondition, Semaphore, CountDownLatch, BlockingQueue, etc
    • Executors, futures, starting threads, etc - including important improvements to consistency of approach in the codebase
    • The use of currentTimeMillis and nanoTime
    • The replacement of java.io.File with a wrapper on java.nio.file.Path providing an ergonomic API, and some improvements to consistency of file handling
    • Support for alternative streaming implementations
    • Improvements to the dtest API to support necessary functionality
  • Introduction of a simulator package, containing
    • Mock implementations of all systems that control event ordering, including those mentioned above; and
      • Object monitors
      • Network messages
    • A framework for intercepting events on these mock systems and translating them into events to be scheduled and evaluated in arbitrary order
    • A system for orchestrating random modifications to cluster topology that should not affect the correctness of operations on the system (initially this will be quite strict as to how these events occur, given Cassandra’s present weakness in performing these reliably)
    • Bytecode-weaving class loaders for modifying execution to:
      • Intercept monitor entry/exit and control when these occur
      • Intercept the invocation of certain global methods we mock the implementation of
      • Replace certain non-deterministic constructs with deterministic ones, such as IdentityHashMap, Object.hashCode(), Enum.hashCode()
      • Pseudo-randomly pause thread execution either side of important (ordinarily non-blocking) synchronisation events, such as atomic field updating, volatile field access, etc
  • Introduction of test cases using the new facilities, including
    • A linearizability verifier for LWTs
    • Unit tests to expose concurrency bugs in an individual class
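
The idea of intercepting events and evaluating them in an arbitrary but reproducible order can be illustrated with a small sketch. This is not the proposed simulator: the class SeededScheduler and its methods submit, runAll and orderFor are hypothetical names invented for this example, and it assumes single-threaded execution in which each intercepted event is represented as a Runnable. It shows only how a seed can make an otherwise arbitrary ordering repeatable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration only; not part of Cassandra or of the simulator package. */
final class SeededScheduler {
    private final Random random;
    private final List<Runnable> pending = new ArrayList<>();

    SeededScheduler(long seed) {
        // All ordering decisions derive from this seed, so a run can be replayed.
        this.random = new Random(seed);
    }

    /** Record an intercepted event instead of running it immediately. */
    void submit(Runnable event) {
        pending.add(event);
    }

    /** Run pending events until none remain, picking the next one pseudo-randomly. */
    void runAll() {
        while (!pending.isEmpty()) {
            int next = random.nextInt(pending.size());
            // Events may themselves submit further events while running.
            pending.remove(next).run();
        }
    }

    /** Returns the order in which three labelled events execute for a given seed. */
    static List<String> orderFor(long seed) {
        SeededScheduler scheduler = new SeededScheduler(seed);
        List<String> order = new ArrayList<>();
        for (String label : new String[] { "write", "read", "repair" }) {
            scheduler.submit(() -> order.add(label));
        }
        scheduler.runAll();
        return order;
    }
}
```

Calling SeededScheduler.orderFor with the same seed twice returns the same ordering, while different seeds may explore different orderings. A failing ordering could therefore be reproduced by re-running with the seed that produced it.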

...