Lucene Internals

Geode supports creating Lucene indexes on data in Geode regions. See the javadocs and the user guide for details on the API.

Internally, Lucene indexes are stored in a region that is colocated with the user region. Geode provides an implementation of Lucene's Directory interface that stores Lucene's data files directly in that colocated region rather than on disk.
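To make the idea of a region-backed directory concrete, here is a minimal sketch in plain Java. It is not Geode's implementation and does not use the Lucene API: RegionFileStore and its methods are hypothetical names, and a ConcurrentHashMap stands in for the colocated region. It only shows the shape of the approach, where each index file is a value stored under its file name.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a file store whose backing storage is a key/value map
// instead of the local disk. The map is a stand-in for the colocated region.
final class RegionFileStore {
    // File name -> file contents.
    private final Map<String, byte[]> region = new ConcurrentHashMap<>();

    // Store a whole file under its name (copying so callers cannot mutate it later).
    void writeFile(String name, byte[] contents) {
        region.put(name, contents.clone());
    }

    // Read a whole file back; fails if the file was never written or was deleted.
    byte[] readFile(String name) {
        byte[] contents = region.get(name);
        if (contents == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return contents.clone();
    }

    long fileLength(String name) {
        return readFile(name).length;
    }

    void deleteFile(String name) {
        region.remove(name);
    }

    // List the stored file names in sorted order.
    String[] listAll() {
        String[] names = region.keySet().toArray(new String[0]);
        Arrays.sort(names);
        return names;
    }
}
```

Because the files live in a region in this model, they follow the region's own distribution and recovery behavior rather than that of a local file system.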

Lucene indexes are updated asynchronously by an AsyncEventQueue which is attached to the region.
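The asynchronous update path can be pictured with the following sketch. It assumes nothing about Geode's AsyncEventQueue API: AsyncIndexSketch is a hypothetical class, the two maps stand in for the user region and the index, and lower-casing the value stands in for real indexing work. The point it illustrates is that a put returns once the event is queued, and a background thread applies the event to the index later.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of an index that is updated asynchronously from a queue.
final class AsyncIndexSketch {
    // Sentinel event that tells the worker thread to stop.
    private static final String[] STOP = new String[0];

    private final Map<String, String> region = new ConcurrentHashMap<>();
    private final Map<String, String> index = new ConcurrentHashMap<>();
    private final BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::drain);

    AsyncIndexSketch() {
        worker.setDaemon(true);
        worker.start();
    }

    // Update the region and enqueue an event; the index is NOT updated here.
    void put(String key, String value) {
        region.put(key, value);
        queue.add(new String[] {key, value});
    }

    // Worker loop: take events in order and apply them to the index.
    private void drain() {
        try {
            while (true) {
                String[] event = queue.take();
                if (event == STOP) {
                    return;
                }
                index.put(event[0], event[1].toLowerCase());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Process everything queued so far, then stop the worker.
    void close() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    String indexed(String key) {
        return index.get(key);
    }
}
```

One consequence of this design is that a query issued immediately after a put may not see the new entry until the queue has been drained.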

Data flow into a Lucene index

A closer look at Partitioned region data flow

Class Diagram

Processing Queries

Handling failures, restarts, and rebalance

The index region and async event queue are restored together with the buckets of the colocated data region, so after a failover the new primary should be able to read and write the index as usual.

Aggregation

For partitioned regions, the query must be sent to all of the primaries, and the results must then be aggregated. Lucene search uses FunctionService to distribute the query to the primaries.

Input to primaries

Serialized Query
CollectorManager to be used for local aggregation
Result limit

Output from primaries

Merged collector created from results of search on local bucket indexes.
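The inputs and outputs listed above describe a scatter-gather search with a result limit. The sketch below shows one way that two-level merge could work, in plain Java with no Geode or Lucene dependency. TopHitsMerge, Hit, topHits, and merge are hypothetical names: topHits plays the role of the local aggregation on a primary, and merge plays the role of combining the results returned by the primaries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of limit-aware aggregation of search hits.
final class TopHitsMerge {
    // One search hit: the region key of the matching entry and its score.
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    // Local step: keep at most `limit` hits, highest score first.
    static List<Hit> topHits(List<Hit> hits, int limit) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Merge step: trim each partial result to the limit, combine, and trim again.
    static List<Hit> merge(List<List<Hit>> partialResults, int limit) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> partial : partialResults) {
            all.addAll(topHits(partial, limit));
        }
        return topHits(all, limit);
    }
}
```

Applying the limit locally before merging keeps the data sent back from each primary bounded by the limit, and it does not change the final result, because any hit in the global top N is also in the top N of its own bucket.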

Naba drew flowcharts for LuceneIndex:
