Status

Current state: Draft

Discussion thread:

JIRA

Released: <Cassandra Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Anti-entropy repair is essential for every Apache Cassandra cluster because it fixes data inconsistencies. Frequent data deletions and downed nodes are common causes of such inconsistencies. A few open-source orchestration solutions that trigger repair externally are available, and many organizations have built their own repair solutions. However, this proliferation of custom solutions has led to a lot of confusion. Therefore, repair should be an integral part of Cassandra itself, much like compaction, for Cassandra to be a complete solution.

Audience

This enhancement proposal benefits users who are newly adopting Apache Cassandra and who today often must make significant investments before they can use it.

Goals

  1. Align on one of the existing solutions and have it officially blessed and supported as a first-class feature by the Apache Cassandra community.
  2. The solution has to be extremely easy for an operator to manage, so that even an inexperienced user can manage it.
  3. The solution should scale to a large fleet without much additional operational overhead. In other words, operational complexity should not increase linearly with the Cassandra fleet size.

Non-Goals

  1. Automated repair inside Cassandra itself, like compaction.

Proposed Changes

TODO

  • A few ready-made solutions already exist and are used in the industry at scale in private forks. So the first step is to reach consensus on one of the available solutions.
  • Once the solution is finalized, integrate it with the latest Apache Cassandra trunk.

New or Changed Public Interfaces

TODO

  • No changes to existing interfaces. In fact, fully supported automated repair inside Cassandra should not require any changes. It should just work, like compaction!
  • An additional interface to tune the repair configuration.

Compatibility, Deprecation, and Migration Plan

  • Users who already have their own solution should be able to continue using it. They can, however, migrate to the offered solution if it meets all their criteria.

Test Plan

  • Verify that the solution is operationally easy to manage.
  • Verify the correctness of the repair solution.

Rejected Alternatives

TODO

