Status
Current state: Draft
Discussion thread:
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Anti-entropy (Apache Cassandra repair) is essential in every Apache Cassandra cluster for fixing data inconsistencies. Frequent data deletions and downed nodes are common causes of inconsistency. A few open-source orchestration solutions that trigger repair externally are available, and many companies have already built their own repair solutions. However, this proliferation of custom solutions has caused a lot of confusion. Repair should therefore be an integral part of Cassandra itself, much like compaction, for Cassandra to be considered a complete solution.
Audience
This enhancement proposal lowers the barrier for new Apache Cassandra users, who today often must make a significant investment in repair tooling before they can use it.
Goals
- Converge on one of the existing solutions and have it officially blessed and supported as a first-class feature by the Apache Cassandra community.
- The solution must be extremely easy for an operator to manage, so that even an inexperienced user can manage it.
- The solution should scale to a large fleet without much additional operational overhead. In other words, operational complexity should not grow linearly with the size of the Cassandra fleet.
Non-Goals
- Automated repair inside Cassandra itself, like compaction.
Proposed Changes
TODO
- A few ready-made solutions are already available and are being used at scale in the industry in private forks. The first and foremost step, therefore, is to reach consensus on one of the available solutions.
- Once the solution is finalized, integrate it into the latest Apache Cassandra trunk.
New or Changed Public Interfaces
TODO
- No changes to existing interfaces. A fully supported automated repair inside Cassandra should not require any such changes; it should just work, like compaction.
- An additional interface to tune the repair configuration.
Compatibility, Deprecation, and Migration Plan
- Operators who already run their own solution should be able to continue using it. They can, however, migrate to the offered solution if it meets all their requirements.
Test Plan
- Operationally easy to manage.
- Correctness of the repair solution.
Rejected Alternatives
TODO