Status

Current state: Accepted

Discussion threadhere

JIRACASSANDRA-18584

Released: N/A

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

CQL supports the following comparison operators that can be used in the WHERE clause:

'=', '<', '>', '<=', '>=', '!=', IN, CONTAINS, CONTAINS KEY, LIKE.

It is possible to reverse a query restriction by negating some of the operators:

  • '='  → '!='
  • '!=' → '='
  • '<'  → '>='
  • '>'  → '<='
  • '<=' → '>'
  • '>=' → '<'

However, the operators IN, CONTAINS, CONTAINS KEY and LIKE have no corresponding negative match operator. E.g. it is not possible to match rows with a collection that does not contain any of the given items. This lack of symmetry pushes users to select more rows and resort to application-side filtering.

Audience

  • Cassandra users and application developers.
  • Developers building higher level query APIs on top of CQL (e.g document-oriented, graph oriented, etc).

Goals

  • Improve the flexibility of CQL and make Cassandra more user friendly.
  • Allow more query types that are supported by other databases.
  • Give secondary indexes of various types the ability to do negative searches.

Non-Goals

  • Implementing efficient handling of NOT by all types of indexes in Cassandra is out of scope. This CEP is more a framework that allows adding such support later on a case-by-case basis, and fallback to ALLOW FILTERING when such support is missing from the index implementation.
  • Enabling NOT in front of any logical expression is to be implemented later as a separate CEP, as it requires changes to the statement restrictions code needed also for supporting OR.
  • Enabling NOT in UPDATE and DELETE is not considered, because ALLOW FILTERING is currently not supported there.
  • Rethinking our approach to ALLOW FILTERING. This feauture should be consistent with the current use of ALLOW FILTERING – if an operator is allowed with ALLOW FILTERING, then its negated variant should be also allowed.

Proposed Changes

The changes can be introduced gradually in milestones. We may decide to not implement all of them.

Milestone 1

Introduce new relational operators in the CQL WHERE clause grammar:

  • NOT IN
  • NOT CONTAINS
  • NOT CONTAINS KEY
  • NOT LIKE


Using those new operators will require an ALLOW FILTERING clause. This limitation might be relaxed later, after the storage/indexing engine gets extended with support for those new types of restrictions. 

Example

The following queries become valid:

SELECT * FROM table WHERE  partition_key = 10 AND id NOT IN {1, 5, 8} ALLOW FILTERING;
SELECT * FROM table WHERE  partition_key = 10 AND collection NOT CONTAINS {'foo', 'bar'} ALLOW FILTERING;
SELECT * FROM table WHERE partition_key = 10 AND collection NOT CONTAINS KEY {0, 3} ALLOW FILTERING;

Required changes in Cassandra code:
  • Extending the CQL parser with new operators 
  • Adding new operator variants (org.apache.cassandra.cql3.Operator)
  • Modifying rules in the StatementRestrictions so that negated operators are added to filterRestrictions and adapting the rules for ALLOW FILTERING

Milestone 2

Don't require ALLOW FILTERING if a negative predicate (e.g NOT IN) is used on a primary key column (including clustering columns), for all the operators that currently don't require ALLOW FILTERING. In a valid query that doesn't use ALLOW FILTERING, the user should be able to negate any predicate by prepending NOT to the operator; and the query should still not require ALLOW FILTERING.

Implementation notes:
Implementing NOT IN can be at worst implemented as a series of range queries and returning a union of their results. No changes to the SSTable format are needed.

Milestone 3 

Don't require ALLOW FILTERING if a negative restriction is used on an indexed column and the index supports efficient negative searches - skipping items that match the restricted term. This is useful in particular for implementing predicates of type `collection NOT CONTAINS element`, where collection is indexed.

To be implemented in SAI, after CEP-7 is merged. This can be implemented in SAI by introduction of "negating iterator" that would return a complement of a posting list. This way there is no need to change the implementation of the index data structure and such iterator would compose well with other SAI features, e.g. would support intersections and unions.

New or Changed Public Interfaces

  • New syntax allowed in CQL.
  • New operator variants.

Compatibility, Deprecation, and Migration Plan

CQL syntax

The proposed changes to CQL should be fully backwards compatible - they do not introduce any regressions to the current CQL capabilities. All existing queries should still work the same.

Storage formats

No changes to storage are needed for Milestone 1. There may be some changes needed in secondary index formats to support efficient handling of negated operators without ALLOW FILTERING, but this is to be decided on a case-by-case basis.

Internode messaging

New operator binary codes need to be introduced, therefore old nodes will not be able to deserialize RowFilter. Queries using newly introduced operators must be rejected until all nodes in the cluster are upgraded to the new version that supports those operators. 

Test Plan

In addition to unit tests for the CQL parser or the newly added operators, use CQLTester to test new query types end-to-end and verify their results.
In milestones 2 and 3, performance of queries with NOT IN / NOT CONTAINS that skip a huge number of rows and partitions should be evaluated.

Rejected Alternatives

We considered implementing a NOT operator that could be placed in front of any predicate in a CQL WHERE clause. However this would require deeper changes to StatementRestrictons and RowFilter, for very little functionality gain. We postpone full implementation until after we have OR operator support, which would require those changes anyway.


  • No labels