ID IEP-49
Author
Sponsor Maksim Timonin
Created 22 May 2020
Status ACTIVE


Motivation

Since Ignite aims to support several SQL engines, working with indexes must not depend on the SQL engine in use. We should also consider moving all index-related machinery to the core module, so that any module that needs it can use it.

Description

Introduced abstractions:

Index - the base index interface with common methods:

  • id() - unique ID
  • name() - unique name
  • isOnline() - whether the index is online, i.e. can be read
  • onUpdate(oldRow, newRow) - applies a row change to the index; either row may be null, and both rows are of the indexed type
  • find(Args) - returns a cursor over the found rows; Args are implementation specific, for example a TextQuery for a full-text index or lowerBound + upperBound for a sorted index
  • unwrap(indexInterface extends Index) - returns the instance of an implementation-specific interface; for example, a full-text index may have additional methods that a hash or sorted index does not have
  • acquire(Cancellable owner, boolean force) - acquires the index for the given owner. When called with force = true, it calls cancel() on all previous owners and waits until they all leave the index. After the index is acquired, the destroy flag is checked. This way all index readers (like running queries) finish gracefully before the index is dropped or altered, and a dropped or altered index cannot be acquired.
  • leave() - releases the index previously acquired by the owner
  • break() - marks the index instance as stale (after a drop or alter); once it is called, acquire() throws an exception (index destroyed or modified)
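
The acquire/leave/break protocol above can be sketched in a single JVM as follows. This is a minimal illustration under stated assumptions, not Ignite code: IndexGuard and IndexDestroyedException are hypothetical names, leave() takes the owner explicitly, and break() is renamed markBroken() because break is a reserved word in Java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Callback supplied by an index owner (e.g. a running query) so it can be cancelled. */
interface Cancellable {
    void cancel();
}

/** Hypothetical exception for an index that was dropped or altered. */
class IndexDestroyedException extends RuntimeException {
    IndexDestroyedException(String msg) {
        super(msg);
    }
}

/** Hypothetical helper implementing the acquire/leave/break part of Index. */
class IndexGuard {
    private final Set<Cancellable> owners = new HashSet<>();
    private boolean broken;

    void acquire(Cancellable owner, boolean force) {
        if (force) {
            List<Cancellable> previous;
            synchronized (this) {
                previous = new ArrayList<>(owners);
            }
            // Cancel previous owners outside the lock, so cancel() may call leave() directly.
            for (Cancellable o : previous)
                o.cancel();
            synchronized (this) {
                // Wait until every previous owner has left the index.
                while (!Collections.disjoint(owners, previous)) {
                    try {
                        wait();
                    }
                    catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while acquiring index", e);
                    }
                }
            }
        }
        synchronized (this) {
            // The destroy flag is checked atomically with registration,
            // so a broken (dropped or altered) index can never be acquired.
            if (broken)
                throw new IndexDestroyedException("Index destroyed or modified");
            owners.add(owner);
        }
    }

    synchronized void leave(Cancellable owner) {
        owners.remove(owner);
        notifyAll(); // wake up a forced acquire waiting for owners to leave
    }

    /** Equivalent of break(): marks this instance as stale. */
    synchronized void markBroken() {
        broken = true;
    }

    synchronized int ownerCount() {
        return owners.size();
    }
}
```

In this sketch a reader calls acquire(owner, false) before a scan and leave(owner) when done, while a drop or alter calls acquire(owner, true) to cancel and drain the current readers.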

IndexDefinition describes an index (each index implementation should extend it with its own specific parameters):

  • indexId - long identifier
  • indexName - String
  • sourceCacheId - ID of the cache the index is created for
  • indexType - enum (hash, sorted, fullText, userDefined)
  • indexedType - type the index is built for
  • indexedColumns - list of indexed columns; it makes it possible to skip an index update when no indexed column was changed
  • indexFactory - creates the index instance
  • indexValidatorList - optional; a way to implement various constraints
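
The point of indexedColumns is that an update touching only non-indexed columns can bypass the index. Below is a stripped-down sketch of IndexDefinition with that check. It assumes rows are modeled as column-name to value maps, omits indexFactory and indexValidatorList, and the enum constant names and needsUpdate() method are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Index kinds listed in the definition (constant names are illustrative). */
enum IndexType { HASH, SORTED, FULL_TEXT, USER_DEFINED }

/** Stripped-down IndexDefinition; indexFactory and indexValidatorList are omitted. */
class IndexDefinition {
    final long indexId;
    final String indexName;
    final int sourceCacheId;
    final IndexType indexType;
    final Class<?> indexedType;
    final List<String> indexedColumns;

    IndexDefinition(long indexId, String indexName, int sourceCacheId, IndexType indexType,
        Class<?> indexedType, List<String> indexedColumns) {
        this.indexId = indexId;
        this.indexName = indexName;
        this.sourceCacheId = sourceCacheId;
        this.indexType = indexType;
        this.indexedType = indexedType;
        this.indexedColumns = Collections.unmodifiableList(indexedColumns);
    }

    /**
     * Returns true if the row change must be applied to this index.
     * Rows are modeled as column-name -> value maps (an assumption of this sketch).
     */
    boolean needsUpdate(Map<String, ?> oldRow, Map<String, ?> newRow) {
        // An insert (no old row) or a delete (no new row) always touches the index.
        if (oldRow == null || newRow == null)
            return true;
        for (String col : indexedColumns) {
            if (!Objects.equals(oldRow.get(col), newRow.get(col)))
                return true;
        }
        // No indexed column changed: the index update can be skipped.
        return false;
    }
}
```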

IndexFactory creates a specific index instance:

  • create(IndexManager, IndexDefinition) - when an index is altered, all internal structures of the previous instance may be obtained through IndexManager
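
A minimal sketch of why the factory receives the manager: on alter, the previous instance is still registered while the factory runs, so the factory can reuse its structures. All types here are stripped-down stand-ins, and the TreeMap-backed SortedIndex and SortedIndexFactory are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Minimal stand-ins for the IEP types. */
interface Index {
    long id();
}

interface IndexFactory {
    Index create(IndexManager mgr, IndexDefinition def);
}

class IndexDefinition {
    final long indexId;
    final IndexFactory indexFactory;

    IndexDefinition(long indexId, IndexFactory indexFactory) {
        this.indexId = indexId;
        this.indexFactory = indexFactory;
    }
}

class IndexManager {
    private final Map<Long, Index> indexes = new HashMap<>();

    Index getIndex(long indexId) {
        return indexes.get(indexId);
    }

    Index createIndex(IndexDefinition def) {
        return install(def);
    }

    /** On alter the previous instance is still registered while the factory runs. */
    Index alterIndex(IndexDefinition def) {
        return install(def);
    }

    private Index install(IndexDefinition def) {
        Index idx = def.indexFactory.create(this, def);
        indexes.put(def.indexId, idx);
        return idx;
    }
}

/** Hypothetical sorted index whose internal structure is a TreeMap. */
class SortedIndex implements Index {
    private final long id;
    final TreeMap<Object, Object> tree;

    SortedIndex(long id, TreeMap<Object, Object> tree) {
        this.id = id;
        this.tree = tree;
    }

    @Override
    public long id() {
        return id;
    }
}

class SortedIndexFactory implements IndexFactory {
    @Override
    public Index create(IndexManager mgr, IndexDefinition def) {
        Index prev = mgr.getIndex(def.indexId);
        TreeMap<Object, Object> tree;
        if (prev instanceof SortedIndex)
            tree = ((SortedIndex) prev).tree; // alter: reuse the existing tree
        else
            tree = new TreeMap<>();            // create: start empty
        return new SortedIndex(def.indexId, tree);
    }
}
```

This is also where the circular dependency raised in the comments comes from: the factory travels inside IndexDefinition, yet receives the manager as its creation context.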

IndexLifecycleListener - all callbacks should be executed on both client and server nodes:

  • onIndexCreated(IndexDefinition)
  • onIndexModified(IndexDefinition)
  • onIndexDeleted(indexId)
  • onIndexStateChange(indexId, newState) - switches an index online or offline
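
To illustrate how a query engine might consume these callbacks, here is a sketch of a planner-side registry that lets the planner use only online indexes. The default no-op methods, the IndexState enum and PlannerIndexRegistry are assumptions of this sketch, not part of the proposal.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

enum IndexState { OFFLINE, ONLINE }

/** Stripped-down definition: just the id and name. */
class IndexDefinition {
    final long indexId;
    final String indexName;

    IndexDefinition(long indexId, String indexName) {
        this.indexId = indexId;
        this.indexName = indexName;
    }
}

/** The listener from the IEP; default no-op bodies are a convenience of this sketch. */
interface IndexLifecycleListener {
    default void onIndexCreated(IndexDefinition def) { }
    default void onIndexModified(IndexDefinition def) { }
    default void onIndexDeleted(long indexId) { }
    default void onIndexStateChange(long indexId, IndexState newState) { }
}

/** Hypothetical query-engine side: tracks which indexes the planner may use. */
class PlannerIndexRegistry implements IndexLifecycleListener {
    final Set<Long> known = ConcurrentHashMap.newKeySet();
    final Set<Long> usable = ConcurrentHashMap.newKeySet();

    @Override
    public void onIndexCreated(IndexDefinition def) {
        known.add(def.indexId); // registered, but offline until a state change arrives
    }

    @Override
    public void onIndexModified(IndexDefinition def) {
        usable.remove(def.indexId); // an altered index is offline again
    }

    @Override
    public void onIndexDeleted(long indexId) {
        known.remove(indexId);
        usable.remove(indexId);
    }

    @Override
    public void onIndexStateChange(long indexId, IndexState newState) {
        if (newState == IndexState.ONLINE && known.contains(indexId))
            usable.add(indexId);
        else
            usable.remove(indexId);
    }
}
```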

IndexManager supports the following operations (similar to the corresponding SQL commands):

  • createIndex(IndexDefinition)
  • alterIndex(IndexDefinition)
  • dropIndex(indexId)
  • getIndex(indexId) - throws an exception on client nodes
  • listen(IndexLifecycleListener)
  • onRowUpdate(cacheId, oldRow, newRow) - callback called by IgniteCacheOffheapManager on a cache entry update
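
A minimal sketch of the manager's core operations, covering create, drop, getIndex and row-update routing (alterIndex and listen are left out for brevity). It assumes that client nodes keep only definitions and never instantiate index structures, which is one possible reading of why getIndex() throws there. The Function-based factory is a simplification.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

interface Index {
    long id();
    void onUpdate(Object oldRow, Object newRow);
}

/** Stripped-down definition; the factory is modeled as a Function. */
class IndexDefinition {
    final long indexId;
    final int sourceCacheId;
    final Function<IndexDefinition, Index> factory;

    IndexDefinition(long indexId, int sourceCacheId, Function<IndexDefinition, Index> factory) {
        this.indexId = indexId;
        this.sourceCacheId = sourceCacheId;
        this.factory = factory;
    }
}

class IndexManager {
    private final boolean clientNode;
    private final Map<Long, IndexDefinition> defs = new ConcurrentHashMap<>();
    private final Map<Long, Index> indexes = new ConcurrentHashMap<>();

    IndexManager(boolean clientNode) {
        this.clientNode = clientNode;
    }

    void createIndex(IndexDefinition def) {
        defs.put(def.indexId, def);
        // Assumption: only server nodes instantiate index structures.
        if (!clientNode)
            indexes.put(def.indexId, def.factory.apply(def));
    }

    void dropIndex(long indexId) {
        defs.remove(indexId);
        indexes.remove(indexId);
    }

    Index getIndex(long indexId) {
        if (clientNode)
            throw new UnsupportedOperationException("Indexes are not available on client nodes");
        Index idx = indexes.get(indexId);
        if (idx == null)
            throw new IllegalStateException("Cannot execute due to schema change: index " + indexId + " not found");
        return idx;
    }

    /** Called on a cache entry update; routes the change to every index of that cache. */
    void onRowUpdate(int cacheId, Object oldRow, Object newRow) {
        for (IndexDefinition def : defs.values()) {
            if (def.sourceCacheId != cacheId)
                continue;
            Index idx = indexes.get(def.indexId);
            if (idx != null)
                idx.onUpdate(oldRow, newRow);
        }
    }
}
```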

Basic postulates:

  • A newly created index is offline
  • An offline index cannot be read, only modified (during an index rebuild or by regular updates)
  • An index becomes online after the index rebuild
  • An index may be created on cache start, by a DDL command, or by an API call (a direct call to IndexManager)
  • On index creation an Index instance is created by the provided factory; this way geospatial indexes or prefix trees may be introduced in the future just by providing a specific factory.
  • A sorted index represents a database index in terms of SQL and requires the hash index to be created first (if it does not exist yet).
  • A hash index is just a proxy to cache partitions and is always online; it represents a table in terms of SQL. This way SQL queries may be executed before an index is fully built.
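
The state rules above can be sketched with two toy index types: a hash index that proxies a map standing in for cache partitions and is therefore always online, and a sorted index that accepts updates while offline but rejects reads until its rebuild completes. Both classes are hypothetical. The sorted one assumes unique indexed values and ignores races between the rebuild and concurrent deletes.

```java
import java.util.Collection;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hash index sketch: a proxy over cache data, so it is always online. */
class HashIndexSketch {
    private final Map<Integer, String> cacheData; // stands in for cache partitions

    HashIndexSketch(Map<Integer, String> cacheData) {
        this.cacheData = cacheData;
    }

    boolean isOnline() {
        return true; // nothing to build: reads go straight to the cache
    }

    String find(int key) {
        return cacheData.get(key);
    }
}

/** Sorted index sketch over String values: created offline, online after rebuild. */
class SortedIndexSketch {
    private final TreeSet<String> values = new TreeSet<>();
    private boolean online; // a newly created index is offline

    synchronized boolean isOnline() {
        return online;
    }

    /** Regular updates are applied in any state, including while offline. */
    synchronized void onUpdate(String oldVal, String newVal) {
        if (oldVal != null)
            values.remove(oldVal);
        if (newVal != null)
            values.add(newVal);
    }

    /** Fills the index with existing data and switches it online. */
    synchronized void rebuild(Collection<String> existing) {
        values.addAll(existing);
        online = true;
    }

    /** Inclusive range read; rejected while the index is offline. */
    synchronized NavigableSet<String> find(String lower, String upper) {
        if (!online)
            throw new IllegalStateException("Index is offline");
        return new TreeSet<>(values.subSet(lower, true, upper, true));
    }
}
```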

On index create:

  1. index created - all indexes and definitions are registered on all nodes, and all indexes start applying current updates
  2. onIndexCreated() callback executes - the index is registered in a query execution engine
  3. index rebuild started - the index is filled with existing data
  4. index rebuild finished - the index is ready to use
  5. onIndexStateChange() callback executes - the index becomes available to a query execution engine.
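
The ordering of the create steps can be sketched as a single method that records what happens. This is a hypothetical fragment: the listener is simplified so that callbacks carry only the index ID and a boolean online flag, and the rebuild is passed in as a Runnable.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified listener: callbacks carry only the index id. */
interface IndexLifecycleListener {
    void onIndexCreated(long indexId);
    void onIndexStateChange(long indexId, boolean online);
}

/** Hypothetical manager fragment that runs the create steps in order. */
class CreateFlowSketch {
    final List<String> log = new ArrayList<>();
    private final List<IndexLifecycleListener> listeners = new ArrayList<>();

    void listen(IndexLifecycleListener listener) {
        listeners.add(listener);
    }

    void createIndex(long indexId, Runnable rebuild) {
        // 1. Register the (offline) index; it starts applying current updates.
        log.add("registered " + indexId);
        // 2. Query engines learn about the index but must not plan with it yet.
        for (IndexLifecycleListener l : listeners)
            l.onIndexCreated(indexId);
        // 3-4. Fill the index with existing data.
        log.add("rebuild started");
        rebuild.run();
        log.add("rebuild finished");
        // 5. The index becomes available for query execution.
        for (IndexLifecycleListener l : listeners)
            l.onIndexStateChange(indexId, true);
    }
}
```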

On index read:

  1. the index is obtained by its ID from the index manager (if it doesn't exist, an exception is thrown - cannot execute due to schema change)
  2. the index is acquired for read, and the reader provides a scan cancel (query cancel) callback (if the index cannot be acquired, an exception is thrown - cannot execute due to schema change)
  3. the index is read (if a cancel occurs, the reader has to leave the index)
  4. the reader leaves the index (while the index is acquired, it cannot be dropped or modified)
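
The read steps map naturally onto a try/finally block, so that the reader leaves the index on every path, including after a cancellation. A minimal sketch follows. SimpleIndex, IndexReadFlow and SchemaChangedException are hypothetical. The read-side acquire drops the force flag on the assumption that readers always pass force = false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

class SchemaChangedException extends RuntimeException {
    SchemaChangedException(String msg) {
        super(msg);
    }
}

interface Cancellable {
    void cancel();
}

/** Tiny index: tracks owners and a broken flag, serves a fixed row list. */
class SimpleIndex {
    private final Set<Cancellable> owners = ConcurrentHashMap.newKeySet();
    private final List<String> rows;
    private volatile boolean broken;

    SimpleIndex(List<String> rows) {
        this.rows = rows;
    }

    synchronized void acquire(Cancellable owner) {
        if (broken)
            throw new SchemaChangedException("Cannot execute due to schema change");
        owners.add(owner);
    }

    void leave(Cancellable owner) {
        owners.remove(owner);
    }

    List<String> rows() {
        return rows;
    }

    void markBroken() {
        broken = true;
    }

    int ownerCount() {
        return owners.size();
    }
}

class IndexReadFlow {
    static List<String> read(Map<Long, SimpleIndex> indexes, long indexId, AtomicBoolean cancelled) {
        // 1. Get the index by its ID.
        SimpleIndex idx = indexes.get(indexId);
        if (idx == null)
            throw new SchemaChangedException("Cannot execute due to schema change");
        // 2. Acquire for read, passing a cancel callback for a forced drop/alter.
        Cancellable owner = () -> cancelled.set(true);
        idx.acquire(owner);
        try {
            // 3. Read, checking for cancellation between rows.
            List<String> out = new ArrayList<>();
            for (String row : idx.rows()) {
                if (cancelled.get())
                    throw new SchemaChangedException("Query cancelled due to schema change");
                out.add(row);
            }
            return out;
        }
        finally {
            // 4. Always leave the index, also after a cancel or failure.
            idx.leave(owner);
        }
    }
}
```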

On index delete:

  1. the index is obtained by its ID from the index manager (if it doesn't exist, an exception is thrown - cannot execute due to schema change)
  2. the index is acquired for write (current readers are cancelled or not, depending on the force flag; if the index cannot be acquired, an exception is thrown - cannot execute due to schema change)
  3. the index is marked as broken; all readers trying to acquire it from this moment on get an exception
  4. onIndexDeleted() callback executes - the index is deregistered from a query execution engine
  5. all internal structures are destroyed gracefully
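
A single-threaded sketch of the delete steps follows. It relies on two assumptions: a cancelled reader leaves the index synchronously from within cancel(), and a non-forced write acquire on a busy index fails immediately instead of waiting. DroppableIndex, DropFlowSketch and the LongConsumer-based listeners are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.LongConsumer;

class SchemaChangedException extends RuntimeException {
    SchemaChangedException(String msg) {
        super(msg);
    }
}

interface Cancellable {
    void cancel();
}

/** Single-threaded sketch: a cancelled reader is assumed to leave synchronously. */
class DroppableIndex {
    final Set<Cancellable> owners = new HashSet<>();
    boolean broken;
    boolean destroyed;

    void acquireForRead(Cancellable owner) {
        if (broken)
            throw new SchemaChangedException("Cannot execute due to schema change");
        owners.add(owner);
    }

    void acquireForWrite(Cancellable owner, boolean force) {
        if (!owners.isEmpty()) {
            // Assumption: without force, a busy index cannot be acquired for write.
            if (!force)
                throw new SchemaChangedException("Index is in use");
            for (Cancellable o : new ArrayList<>(owners))
                o.cancel();
            if (!owners.isEmpty())
                throw new SchemaChangedException("Readers did not leave the index");
        }
        if (broken)
            throw new SchemaChangedException("Cannot execute due to schema change");
        owners.add(owner);
    }

    void leave(Cancellable owner) {
        owners.remove(owner);
    }
}

class DropFlowSketch {
    final Map<Long, DroppableIndex> indexes = new HashMap<>();
    final List<LongConsumer> deleteListeners = new ArrayList<>();

    void dropIndex(long indexId, boolean force) {
        // 1. Get the index by its ID.
        DroppableIndex idx = indexes.get(indexId);
        if (idx == null)
            throw new SchemaChangedException("Cannot execute due to schema change");
        // 2. Acquire for write; with force = true current readers are cancelled.
        Cancellable dropper = () -> { };
        idx.acquireForWrite(dropper, force);
        // 3. Mark as broken and unregister: later lookups and acquires fail.
        idx.broken = true;
        indexes.remove(indexId);
        // 4. onIndexDeleted(): query engines deregister the index.
        for (LongConsumer l : deleteListeners)
            l.accept(indexId);
        // 5. Release internal structures (here just a flag).
        idx.destroyed = true;
        idx.leave(dropper);
    }
}
```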

On index alter:

  1. the index is obtained by its ID from the index manager (if it doesn't exist, an exception is thrown - cannot execute due to schema change)
  2. the index is acquired for write (current readers are cancelled or not, depending on the force flag; if the index cannot be acquired, an exception is thrown - cannot execute due to schema change)
  3. the index is marked as broken; all readers trying to acquire it from this moment on get an exception
  4. a new index is created on the basis of the previous one and registered in the manager
  5. onIndexModified() callback executes - the index is updated in a query execution engine and becomes offline, so the planner should not consider using it
  6. if the index needs a rebuild:
    1. index rebuild started - the index is filled with existing data
    2. index rebuild finished - the index is ready to use
  7. onIndexStateChange() callback executes - the index becomes available to a query execution engine.
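
The alter steps can be sketched as follows. When no rebuild is needed, the new version shares the previous version's structures. Otherwise it starts empty and is refilled from existing data. IndexVersion and AlterFlowSketch are hypothetical. For brevity the write acquire in step 2 is omitted, and callbacks are recorded as log strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SchemaChangedException extends RuntimeException {
    SchemaChangedException(String msg) {
        super(msg);
    }
}

/** One version of an index; `rows` stands in for its internal structures. */
class IndexVersion {
    final long id;
    final int version;
    final List<String> rows;
    boolean broken;
    boolean online;

    IndexVersion(long id, int version, List<String> rows) {
        this.id = id;
        this.version = version;
        this.rows = rows;
    }
}

class AlterFlowSketch {
    final Map<Long, IndexVersion> indexes = new HashMap<>();
    final List<String> cacheRows = new ArrayList<>(); // existing data used for rebuilds
    final List<String> log = new ArrayList<>();       // records callbacks in order

    IndexVersion alterIndex(long indexId, boolean needsRebuild) {
        // 1. Get the index by its ID.
        IndexVersion prev = indexes.get(indexId);
        if (prev == null)
            throw new SchemaChangedException("Cannot execute due to schema change");
        // 2. Acquiring for write (cancelling readers) is omitted in this sketch.
        // 3. Mark the old instance broken: readers can no longer acquire it.
        prev.broken = true;
        // 4. Create the new instance from the previous one and register it.
        List<String> rows;
        if (needsRebuild)
            rows = new ArrayList<>();
        else
            rows = prev.rows; // reuse the previous internal structures
        IndexVersion next = new IndexVersion(indexId, prev.version + 1, rows);
        indexes.put(indexId, next);
        // 5. onIndexModified(): the engine sees the new version, but it is offline.
        log.add("onIndexModified v" + next.version);
        // 6. Rebuild only if the change requires it.
        if (needsRebuild) {
            next.rows.addAll(cacheRows);
            log.add("rebuilt v" + next.version);
        }
        // 7. onIndexStateChange(): the index becomes usable again.
        next.online = true;
        log.add("onIndexStateChange v" + next.version + " ONLINE");
        return next;
    }
}
```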

Risks and Assumptions

New indexes should be binary compatible with the current H2 indexes.

Dev list discussion

http://apache-ignite-developers.2346864.n4.nabble.com/Basic-index-infrastructure-as-a-part-of-core-APIs-td47638.html

2 Comments

  1. Igor Seliverstov: I have the following points to discuss:

    • It looks like there is a circular dependency between IndexFactory and IndexManager: the index factory is indirectly passed to createIndex as part of IndexDefinition, while at the same time the index manager is passed to the factory's create method.
    • Do we need an index factory and user-defined indexes at all? We should have a clear separation between index types, but I doubt it will be practical for users to provide their own index implementation. Dropping them would simplify the overall architecture with little or no sacrifice in functionality.
    • Perhaps we need to specify what Args is. There seem to be common cases where internal structures must be exposed to the cursor logic, for example a partition filter or an MVCC version visibility filter.
    • We need to specify the cursor interface to support reactive-style iteration, because a filter (see above) may skip a very large number of rows before a row is passed to the upper level (this is important for further Calcite integration). Perhaps we need to explicitly define a limit for cursor.next() and allow the method to return a marker row so that we can resubmit the iteration.
  2. Alexey Goncharuk, see my comments below:

    1. We need the manager to be available to a factory to provide the creation context. It is needed to alter an index and reuse the previous index's internal structures in the new one.
    2. Possibly it isn't needed, but it is a way to create specific index structures in a query module (like the Ignite Calcite query engine) while leaving the core module unmodified.
    3. We may remove the find methods from the base interface; specific implementations could define all the necessary arguments.
    4. It makes sense, I think, so let's take it into consideration when we design the appropriate indexes (such as sorted ones).
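
The marker-row idea from the discussion can be sketched as a filtering cursor with a per-call scan budget. This only illustrates the proposal and is not an agreed design. BoundedFilterCursor and the RESUBMIT/END markers are hypothetical names, and a real cursor would likely return a typed result rather than Object markers.

```java
import java.util.Iterator;
import java.util.function.Predicate;

/**
 * Filtering cursor whose next(limit) examines at most `limit` source rows per call.
 * Returns a matching row, RESUBMIT if the budget ran out first, or END.
 */
final class BoundedFilterCursor<T> {
    /** Marker: budget exhausted before a match; the caller may reschedule and call again. */
    static final Object RESUBMIT = new Object();
    /** Marker: the source is exhausted. */
    static final Object END = new Object();

    private final Iterator<T> source;
    private final Predicate<? super T> filter;

    BoundedFilterCursor(Iterator<T> source, Predicate<? super T> filter) {
        this.source = source;
        this.filter = filter;
    }

    Object next(int limit) {
        for (int scanned = 0; scanned < limit; scanned++) {
            if (!source.hasNext())
                return END;
            T row = source.next();
            if (filter.test(row))
                return row;
        }
        return source.hasNext() ? RESUBMIT : END;
    }
}
```

Bounding each call lets an upper-level operator yield control between batches instead of blocking while a selective filter skips many rows.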