This is a place to gather big ideas and think differently about the future of Cassandra. The list was started at ApacheCon 2022 in New Orleans, where a diverse group of users and committers held a fun jam session in a birds-of-a-feather meeting after a full day of Cassandra talks. The premise: throw out the wildest ideas and collect them. It was inspired by this talk given by Benedict Elliott Smith in 2015.
Idea | Proposer | CEPs or Jiras | |
---|---|---|---|
Endless Partitions that you can read (why people should care about bucketing). | Jeremy Hanna | ||
0-allocation compaction and validation compaction (80% of allocations are user data) | David Capwell | ||
Repair: it should just work. Repair service should be internal to Cassandra. | David Capwell | ||
Maintenance scheduling in Cassandra (?) or adjacent distributed workflow service that does scheduling. | Joey Lynch | ||
Eliminate Repair (real-time repair). | Alex Petrov | ||
Global Arbitrary Sort and Offset across partition keys like Mongo does it ™ | Patrick McFadin | ||
Multi-consistency level (multi ACK): block till LOCAL_QUORUM but give a second ACK on replication. For people who want performance, not just consistency. “Wait for replication delay” MySQL thing. | Jordan West, Joey Lynch | ||
It should not matter how many tombstones you have. | Branimir Lambov | ||
Pagination that actually works (maybe snapshot isolation?). | Alex Petrov, Jordan West, Joey Lynch | ||
COUNT that doesn’t involve ALLOW FILTERING. | Patrick McFadin | ||
Public Analytics/Batch Contract. The end goal is to make the data you already have easier to use for analytics. The format should make range queries easier. | Joey Lynch, Alex Petrov, Jeremy Hanna, Jordan West | ||
CQL operations as a means to get rid of everything. Invoke operations via CQL. Deprecate JMX. Add an ability to query the status of the operation. Have an ID of the running operation. | Paulo Motta | ||
Diagnostic events in Chronicle bin logs + some kind of visual "replayer" of events, something like the "bootchart" the Linux kernel has | Stefan Miklosovic | ||
Self-tuning. Either basic built-in features for some settings or a node/DC controller using ML to optimize an output e.g. read latency P99. | Romain Hardouin | ||
Continuous query / notifications / subscription / notifier / listener for data and virtual tables. (e.g. streaming CQL statements) | Mick Semb Wever | ||
OR operator | Jordan West, Jeremy Hanna | ||
Cost-based Query Planner | Patrick McFadin | ||
Improve performance of IN queries with multiple rows in the same partition (it is slow now) | Jordan West, Joey Lynch | ||
JOINs in CQL | Patrick McFadin | ||
Lifecycle on the CF level rather than row; retention policies. Move lifecycle into metadata. | Joey Lynch | ||
Dynamic TTL (could be related to the previous item, a way to implement it). | Paulo Motta, Joey Lynch | ||
Unified Compaction Strategy. | Joey Lynch, Branimir Lambov | ||
Out-of-process compaction. Or, taking it even further, Cloud-Aware Cassandra. | Jeremy Hanna | ||
Global 2i (basically accord + infinite partitions). | Claude (didn't get the last name) | ||
MVs that work (basically, accord). | Jordan West | ||
Modern build system (not ant). Gradle. The people have spoken. | Benjamin Lerer | ||
Formalize backups API / contract | Paulo Motta | ||