This is a place to gather big ideas and think differently about the future of Cassandra. This list was started at ApacheCon 2022 in New Orleans. After a full day of Cassandra talks, a diverse group of users and committers held a fun jam session in a birds-of-a-feather session. Just throw out the wildest ideas, and let's collect them. It was inspired by this talk given by Benedict Elliott Smith in 2015.
If you feel compelled to expound on any of the points below, please create a new sub-page and link to it from this document.
Idea | Proposer | CEPs or Jiras
---|---|---
Endless partitions that you can read (why people should care about bucketing). | Jeremy Hanna |
Zero-allocation compaction and validation compaction (80% of allocations are user data). | David Capwell |
Repair: it should just work. The repair service should be internal to Cassandra. | David Capwell |
Maintenance scheduling in Cassandra (?) or an adjacent distributed workflow service that does scheduling. | Joey Lynch |
Eliminate repair (real-time repair). | Alex Petrov |
Global arbitrary sort and offset across partition keys, like Mongo does it™. | Patrick McFadin |
Multi-consistency level (multi-ACK): block until LOCAL_QUORUM, then send a second ACK once replication completes. For people who want performance, not just consistency. Similar to MySQL's "wait for replication delay". | Jordan West, Joey Lynch |
It should not matter how many tombstones you have. | Branimir Lambov |
Pagination that actually works (maybe snapshot isolation?). | Alex Petrov, Jordan West, Joey Lynch |
COUNT that doesn't involve ALLOW FILTERING. | Patrick McFadin |
Public analytics/batch contract. The end goal is to make it easier to use the data you already have for analytics. The format should make range queries easier. | Joey Lynch, Alex Petrov, Jeremy Hanna, Jordan West |
CQL operations as a means to replace everything else: invoke operations via CQL and deprecate JMX. Add the ability to query the status of an operation, with an ID for each running operation. | Paulo Motta | CEP-38: CQL Management API
Diagnostic events in Chronicle binary logs, plus some kind of visual "replayer" of events, something like Linux's "bootchart". | Stefan Miklosovic |
Self-tuning: either basic built-in features for some settings, or a node/DC controller that uses ML to optimize a target metric, e.g. P99 read latency. | Romain Hardouin |
Continuous queries / notifications / subscriptions / notifiers / listeners for data and virtual tables (e.g. streaming CQL statements, not pagination isolation). | Mick Semb Wever |
OR operator | Jordan West, Jeremy Hanna |
Cost-based Query Planner | Patrick McFadin |
Improve the performance of IN queries that select multiple rows in the same partition (currently slow). | Jordan West, Joey Lynch |
JOINs in CQL | Patrick McFadin |
Lifecycle at the table (CF) level rather than the row level; retention policies. Move lifecycle into metadata. | Joey Lynch |
Dynamic TTL (possibly related to the previous item, as a way to implement it). | Paulo Motta, Joey Lynch |
Unified Compaction Strategy. | Joey Lynch, Branimir Lambov |
Out-of-process compaction. Or, taking it even further, cloud-aware Cassandra. | Jeremy Hanna |
Global secondary indexes (2i), basically Accord + infinite partitions. | Claude Warren |
Materialized views (MVs) that work (basically, Accord). | Jordan West |
Modern build system (not Ant): Gradle. The people have spoken. | Benjamin Lerer |
Formalize the backup API / contract. | Paulo Motta |
Efficient bulk load client interface. Goal: replace SSTable streaming in jobs (e.g. Spark) with a storage-agnostic format. Handle replication? RBAC? | Romain Hardouin |
Backups with deduplication in mind. | Jeremy Hanna |
Live migration of keyspaces. Dual writing for table migration. Could be related to the bulk import API. Same-cluster and multi-cluster migration. | Paulo Motta, Patrick McFadin |
Actual multi-tenancy | Jordan West, Alex Petrov |
Resource management / resource isolation | Mick Semb Wever |
Infinite number of tables | Jeremy Hanna |
Elasticity. Scale down. | Mick Semb Wever |
µCassandra | Patrick McFadin |
Bad partition handling | Cheng Wang |
Zone maps: statistics over a block of data (cardinality, number of nulls) that give visibility into the data and can be used by the query planner. | David Caldwell |
Make it possible to replicate seamlessly and securely across (problematic?) geographic boundaries. | Jeremy Hanna |
Evaluate the QUIC protocol for internode messaging. | Joey Lynch |
Data placement: routing keys / static columns (?). Support GDPR relationships; restrict placement of partitions to a subset of the cluster. | Joey Lynch |
Snapshot isolation. Restore data from a specific snapshot / time travel. | Cheng Wang, Jordan West |
Quality of service labeling | Mick Semb Wever |
CONTAINS / NOT CONTAINS queries for anything, including collections. | Patrick McFadin |
T-shirt sizing of numTokens | Aaron Ploetz |
Dynamic balancing of data | Mick Semb Wever |
Type-based cell resolution. General CRDTs, bitmaps, bitsets. | |
Transparent Data Encryption | Cheng Wang |
Fix live migration of collections. Look into exposing metadata for complex and deleted columns. | Benjamin Lerer, Joey Lynch, Alex Petrov |
Formal contract for JVM version support. This may already be under discussion on the mailing list. | Jeremy Hanna, Mick Semb Wever |
A codebase that is approachable for new people | Everyone |
Bring back the LHF (low-hanging fruit) tag | Erick Ramirez |
Diverse and engaged community | Erick Ramirez |
Make it easier for new contributors to start on projects | Erick Ramirez |
...