...
- Implement approximate nearest neighbor (ANN) vector search capability in Apache Cassandra using storage-attached indexes (SAI).
- Support a vector of float32 embeddings as a new CQL type.
- Make ANN search work with the normal Cassandra data flow (inserting, updating, and deleting rows). The implementation should support adding a new vector in O(log N) time and answering ANN queries in O(M log N) time, where N is the number of vectors and M is the number of SSTables.
- Compose with other SAI predicates.
- Enable Apache Cassandra to serve as the vector search component in ML platforms, and make it intuitive to use for data engineers new to Cassandra.
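The M factor in the query cost suggests that each SSTable carries its own index and that a query consults all of them before combining the results. The Python sketch below illustrates that idea only; it is not the proposed implementation. The names `search_segment` and `ann_query` are hypothetical, cosine similarity is an assumed metric, and the per-segment search is a brute-force stand-in for a graph search. It also ignores overwrites and tombstones, which a real read path would have to reconcile.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search_segment(segment, query, k):
    """Hypothetical per-SSTable search: returns up to k (score, key) pairs.

    Brute force here; a real index would walk a graph in roughly log(N) steps.
    """
    scored = ((cosine_similarity(vec, query), key) for key, vec in segment.items())
    return heapq.nlargest(k, scored)


def ann_query(segments, query, k):
    """Search each of the M segments, then keep the k best candidates overall."""
    candidates = []
    for segment in segments:
        candidates.extend(search_segment(segment, query, k))
    return [key for _, key in heapq.nlargest(k, candidates)]


if __name__ == "__main__":
    # Two hypothetical segments mapping primary key -> vector.
    segments = [{1: [1.0, 0.0]}, {2: [0.0, 1.0], 3: [1.0, 1.0]}]
    print(ann_query(segments, [1.0, 0.1], k=2))
```

Because every segment contributes at most k candidates, the final merge handles at most M * k entries, so the per-segment searches dominate the cost.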
...
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE test;
CREATE TABLE test.foo(
    i INT PRIMARY KEY,
    j VECTOR<float, 3>
);
CREATE CUSTOM INDEX ann_index ON foo(j) USING 'StorageAttachedIndex';
INSERT INTO test.foo (i, j) VALUES (1, [8, 2.3, 58]);
INSERT INTO test.foo (i, j) VALUES (2, [1.2, 3.4, 5.6]);
INSERT INTO test.foo (i, j) VALUES (5, [23, 18, 3.9]);
SELECT * FROM test.foo WHERE j ANN OF [3.4, 7.8, 9.1] LIMIT 1;
i |j
---+---------------------------------------------------------
5 |[23, 18, 3.9]
...
- Verify that ANN search works with normal Cassandra data flow (insertion, updating, and deleting rows).
- Test the integration of Lucene's HNSW with the SAI framework.
- Verify cross-partition search and validate ANN results.
- Simulate corruption of stored data versus index data on disk.
- Test the new data type (VECTOR<type, dimension>) and CQL operator (ANN) with various use cases.
- Evaluate the performance of the new features and their impact on existing Cassandra setups.
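One common way to validate ANN results is to compare them with an exact brute-force search and report recall. The Python sketch below shows that check under stated assumptions: the helper names `exact_top_k` and `recall_at_k` are hypothetical, cosine similarity is an assumed metric, and the sample rows are made up for illustration rather than taken from any test run.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def exact_top_k(rows, query, k):
    """Ground truth: keys of the k rows most similar to the query, best first."""
    ranked = sorted(rows, key=lambda key: -cosine_similarity(rows[key], query))
    return ranked[:k]


def recall_at_k(ann_keys, exact_keys):
    """Fraction of the exact top-k keys that the ANN result also returned."""
    if not exact_keys:
        raise ValueError("exact_keys must not be empty")
    return len(set(ann_keys) & set(exact_keys)) / len(exact_keys)


if __name__ == "__main__":
    # Hypothetical rows mapping primary key -> vector.
    rows = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0], 4: [-1.0, 0.0]}
    truth = exact_top_k(rows, [1.0, 0.2], k=2)
    # Pretend the index returned keys 1 and 2 for the same query.
    print(truth, recall_at_k([1, 2], truth))
```

A test could assert that recall stays above a chosen threshold across many random queries, since an approximate index is not expected to match the exact result every time.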
...