Geode supports creating Lucene indexes on data in Geode regions. See the javadocs and the user guide for details on the API.

Internally, Lucene indexes are stored in a region that is colocated with the user region. Geode provides an implementation of Lucene's Directory interface that stores Lucene's data files directly in that colocated region, rather than on disk.

Lucene indexes are updated asynchronously by an AsyncEventQueue which is attached to the region.
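
The asynchronous update can be pictured with a small toy model. This is only a sketch: the class AsyncIndexSketch and its methods are hypothetical and use plain Java collections in place of the region, the AsyncEventQueue, and the Lucene index. It shows that a put is visible in the region immediately, but appears in the index only after the queue's batch has been processed.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model only: none of these classes are Geode or Lucene APIs.
class AsyncIndexSketch {
    // One queued change; a null value stands for a destroy.
    static final class Event {
        final String key;
        final String value;
        Event(String key, String value) { this.key = key; this.value = value; }
    }

    private final Map<String, String> region = new HashMap<>(); // stands in for the user data region
    private final Queue<Event> queue = new ArrayDeque<>();      // stands in for the AsyncEventQueue
    private final Map<String, String> index = new HashMap<>();  // stands in for the Lucene index

    // A put updates the region right away and only enqueues the index update.
    void put(String key, String value) {
        region.put(key, value);
        queue.add(new Event(key, value));
    }

    void destroy(String key) {
        region.remove(key);
        queue.add(new Event(key, null));
    }

    // Drains up to batchSize events and applies them to the index in queue order.
    int processBatch(int batchSize) {
        int processed = 0;
        while (processed < batchSize && !queue.isEmpty()) {
            Event e = queue.poll();
            if (e.value == null) {
                index.remove(e.key);
            } else {
                index.put(e.key, e.value);
            }
            processed++;
        }
        return processed;
    }

    boolean regionContains(String key) { return region.containsKey(key); }
    boolean indexContains(String key) { return index.containsKey(key); }
    int pending() { return queue.size(); }
}
```

One consequence of this design is that a query issued right after a put may not see the new entry yet. Geode's LuceneService offers a waitUntilFlushed method for callers that need to wait for the queue to drain; check the javadocs for its exact signature.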

Data flow into a Lucene index


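[Diagram: user puts go to the User Data Region. Its Async Queue batch-writes to the Lucene Regions (LuceneIndex, RegionDirectory). The User Data Region, the Async Queue, and the Lucene Regions are colocated partitioned regions.]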


A closer look at Partitioned region data flow


[Diagram: user PUTs land in bucket 1 and bucket 2 of the User Data Region. Each bucket has its own Async Queue bucket, which batch-writes to the LuceneIndex. The AEQ listener processes events into index documents. RegionDirectory1 writes to file region bucket 1 and RegionDirectory2 writes to file region bucket 2.]


Class Diagram


[Class diagram. Classes shown: LuceneServiceImpl, LuceneIndex, LuceneIndexForPartitionedRegion, AsyncEventQueue, LuceneEventListener, RepositoryManager, PartitionedRepositoryManager, IndexRepository, IndexRepositoryImpl, LuceneSerializer, RegionDirectory, FileSystem, BucketTargetingMap, BucketRegion, and the Lucene classes IndexWriter and Directory. The diagram also labels a one-to-many "creates" relationship.]

Notes from the diagram:

  1. IndexRepository operates on the Lucene index: put, delete, query. There is one for each bucket.
  2. PartitionedRepositoryManager manages an IndexRepository for each bucket.
  3. LuceneSerializer converts a user object into a Lucene document.
  4. RegionDirectory implements Lucene's Directory interface but writes to Geode regions.



Processing Queries

[Diagram: the user supplies fields, an Analyzer, and query strings or a Query to LuceneQuery and calls search(). A function execution on the User Data Region searches the RegionDirectory for bucket 1 and the RegionDirectory for bucket 2. Each bucket returns score and key pairs (TopDocs, ScoreDocs).]

 

Handling failures, restarts, and rebalance 

The index region and the async event queue are restored together with the buckets of their colocated data region, so during failover the new primary should be able to read and write the index as usual.

 

Aggregation

In the case of partitioned regions, the query must be sent to all the primaries, and the results must then be aggregated. Lucene search uses FunctionService to distribute the query to the primaries.

Input to primaries

  1. Serialized Query
  2. CollectorManager to be used for local aggregation
  3. Result limit

Output from primaries

  1. Merged collector created from results of search on local bucket indexes.
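
The two-level aggregation described above can be sketched with plain Java. This is a toy model under stated assumptions: Hit and merge are hypothetical names, not Geode's CollectorManager API, and ranking by descending score with a simple sort is an illustrative choice. Each member merges the hits from its local bucket indexes and truncates to the result limit, then the caller merges the per-member results the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model only: Hit and merge are hypothetical, not Geode classes.
class TopHitsMergeSketch {
    // A search hit: the region key and its relevance score.
    static final class Hit {
        final String key;
        final float score;
        Hit(String key, float score) { this.key = key; this.score = score; }
    }

    // Combines several hit lists and keeps the `limit` highest-scoring entries.
    static List<Hit> merge(List<List<Hit>> partials, int limit) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> partial : partials) {
            all.addAll(partial);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    public static void main(String[] args) {
        int limit = 2;
        // Member 1 hosts two bucket indexes; member 2 hosts one.
        List<Hit> bucket1 = Arrays.asList(new Hit("a", 0.9f), new Hit("b", 0.4f));
        List<Hit> bucket2 = Arrays.asList(new Hit("c", 0.7f));
        List<Hit> bucket3 = Arrays.asList(new Hit("d", 0.8f));

        // Local aggregation on each member.
        List<Hit> member1 = merge(Arrays.asList(bucket1, bucket2), limit);
        List<Hit> member2 = merge(Arrays.asList(bucket3), limit);

        // Final aggregation on the caller keeps keys "a" and "d".
        for (Hit h : merge(Arrays.asList(member1, member2), limit)) {
            System.out.println(h.key + " " + h.score);
        }
    }
}
```

Applying the limit on each member is safe because a hit that is not in a member's local top results cannot be in the global top results, and it reduces the amount of data sent back to the caller.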

[Sequence diagram: LuceneQuery sends the Query through FunctionService, which passes a LuceneContext to the LuceneFunction on each member (M1_LuceneFunction, M2_LuceneFunction). On member M1, search(Collector_1) runs against Index_1 and search(Collector_2) runs against Index_2. M1_CollectorManager merges the loaded collectors into one merged Collector. The FunctionCollector receives Collector_M1 and Collector_M2, the CollectorManager merges them into the final Collector, and LuceneQuery returns the QueryResults.]



Naba drew flowcharts for LuceneIndex:









 