...

PlantUML
() User -down-> [User Data Region] : PUTs

[User Data Region] ..> [Bucket 1]
[Bucket 1] -down-> [Async Queue Bucket 1]
node LuceneIndex {
  [Async Queue Bucket 1] -down-> [AEQ listener processes events into index documents] : Batch Write
  [AEQ listener processes events into index documents] -down-> [RegionDirectory1]
  [RegionDirectory1] -down-> [file region bucket 1]
  [file region bucket 1] -down-> [chunk region bucket 1]
}

[User Data Region] ..> [Bucket 2]
[Bucket 2] -down-> [Async Queue Bucket 2]
node LuceneIndex {
  [Async Queue Bucket 2] -down-> [AEQ listener processes events into index documents] : Batch Write
  [AEQ listener processes events into index documents] -down-> [RegionDirectory2]
  [RegionDirectory2] -down-> [file region bucket 2]
  [file region bucket 2] -down-> [chunk region bucket 2]
}


If the user's data region is a partitioned region, there will be one LuceneIndex for the partitioned region. Every bucket in the data region will have its own RegionDirectory (which implements Lucene's Directory interface) to store the Lucene indexes. The RegionDirectory implements a file system using 2 regions:
  • FileRegion: holds the metadata about the index files
  • ChunkRegion: holds the actual data chunks for a given index file.
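The file-system-over-regions idea can be sketched in plain Java, with two maps standing in for the FileRegion and ChunkRegion. The class and key format here are illustrative assumptions, not the actual Geode implementation: each index file is stored as a metadata entry (its length) plus fixed-size chunks.

```java
import java.util.*;

// Sketch (hypothetical names): a file stored as metadata plus fixed-size
// chunks, mirroring the FileRegion/ChunkRegion split described above.
public class ChunkedFileStore {
    static final int CHUNK_SIZE = 4;  // tiny for illustration

    final Map<String, Integer> fileRegion = new HashMap<>();   // fileName -> length
    final Map<String, byte[]> chunkRegion = new HashMap<>();   // fileName#chunkNo -> bytes

    void write(String fileName, byte[] data) {
        fileRegion.put(fileName, data.length);  // metadata: total length
        for (int i = 0; i * CHUNK_SIZE < data.length; i++) {
            int from = i * CHUNK_SIZE;
            int to = Math.min(from + CHUNK_SIZE, data.length);
            chunkRegion.put(fileName + "#" + i, Arrays.copyOfRange(data, from, to));
        }
    }

    byte[] read(String fileName) {
        int length = fileRegion.get(fileName);
        byte[] out = new byte[length];
        for (int i = 0; i * CHUNK_SIZE < length; i++) {
            byte[] chunk = chunkRegion.get(fileName + "#" + i);
            System.arraycopy(chunk, 0, out, i * CHUNK_SIZE, chunk.length);
        }
        return out;
    }
}
```

Splitting files into chunks keeps individual region entries small, which matters for partitioned storage and rebalancing.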

The FileRegion and ChunkRegion will be collocated with the data region which is to be indexed. The FileRegion and ChunkRegion will have keys that contain the bucket id for the file metadata and chunks, and a partition resolver that looks at the bucket id part of the key only.
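The routing behavior of that resolver can be sketched as follows. The key type and method name are hypothetical stand-ins for Geode's PartitionResolver.getRoutingObject; the point is that the file name is ignored and only the embedded bucket id decides placement, so index entries collocate with the data bucket they belong to.

```java
// Sketch (hypothetical types): index-region keys carry the data bucket id,
// and the resolver routes on that id alone, so file metadata and chunks
// land in the same bucket as the data entries they index.
public class BucketRouting {
    record IndexKey(String fileName, int bucketId) {}

    // Stand-in for PartitionResolver.getRoutingObject: ignore the file name,
    // route purely on the embedded bucket id.
    static Object routingObject(IndexKey key) {
        return key.bucketId();
    }
}
```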
In the AsyncEventListener, when a data entry is processed:
  1. Create a document for the indexed fields.
  2. Determine the bucket id of the entry.
  3. Get the RegionDirectory for that bucket, and save the document into that RegionDirectory.
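The three listener steps can be sketched in plain Java. The names and the hash-based bucket function are illustrative assumptions, with a per-bucket list standing in for the RegionDirectory:

```java
import java.util.*;

// Sketch (hypothetical names): the three listener steps above, with a plain
// map of lists standing in for the per-bucket RegionDirectory.
public class ListenerSketch {
    static final int BUCKETS = 8;

    // one "RegionDirectory" (here just a list of documents) per bucket id
    final Map<Integer, List<String>> regionDirectories = new HashMap<>();

    void process(String key, String value) {
        String document = "doc(" + value + ")";                 // 1. build the document
        int bucketId = Math.floorMod(key.hashCode(), BUCKETS);  // 2. bucket id of the entry
        regionDirectories                                       // 3. save into that bucket's directory
            .computeIfAbsent(bucketId, b -> new ArrayList<>())
            .add(document);
    }
}
```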

Storage with different region types

...

The index and async event queue will be stored in a region with the same redundancy level as the original region. We will take care to ensure that all updates are written to the index files before removing events from the queue, so during failover the new primary will be able to read the index files from disk.
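The flush-before-dequeue ordering can be sketched as follows; the class and member names are illustrative, with a list standing in for the flushed index files:

```java
import java.util.*;

// Sketch: events are removed from the queue only after the index writer has
// flushed them, so a new primary can recover the index state from disk.
public class FlushBeforeDequeue {
    final Deque<String> queue = new ArrayDeque<>();
    final List<String> flushedIndex = new ArrayList<>();  // stands in for index files on disk

    void enqueue(String event) { queue.add(event); }

    void processBatch() {
        List<String> batch = new ArrayList<>(queue);            // read, but do not remove yet
        flushedIndex.addAll(batch);                             // write + flush to "disk" first
        for (int i = 0; i < batch.size(); i++) queue.remove();  // only then drop from the queue
    }
}
```

If the primary dies mid-batch, the events are still in the (redundant) queue and will simply be reprocessed.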

 

Walkthrough: creating an index in a Geode region

  
1) Create a LuceneIndex object to hold the data structures that will be created in the following steps. This object will be registered to the cache-owned LuceneService later.

2) LuceneIndex will keep all the reflective fields. Assume the dataregion is a PartitionedRegion (otherwise, there is no need to define a PartitionResolver). Create a FileRegion (let's call it "fr") and a ChunkRegion (let's call it "cr"), collocated with the data region (let's name it "dataregion"). The FileRegion and ChunkRegion use the same region attributes as dataregion. If the index regions are persistent, use dataregion's bucket name as the path to persist the index region. For example, if dataregion's bucket name is /root/_P_BUCKET_1, then the path will be _B_dataregion_21 (dataregion's bucket 21). In the partitioned region case, the FileRegion and ChunkRegion will be under the same parent region, i.e. /root in this example. In the replicated region case, the index regions will always be root regions.
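The path naming described in this step can be sketched as a small helper, following the _B_dataregion_21 example above (the method name is a hypothetical stand-in):

```java
// Sketch: derive the persistence path for an index bucket from the data
// region name and bucket id, per the "_B_dataregion_21" example above.
public class IndexBucketPath {
    static String bucketPath(String dataRegionName, int bucketId) {
        return "_B_" + dataRegionName + "_" + bucketId;
    }
}
```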
 
3) Create a GeodeDirectory object using the FileRegion, ChunkRegion and the path we got in previous step. 
 
4) Create PerFieldAnalyzerWrapper and save the fields in LuceneIndex. 
 
5) Create a Lucene's IndexWriterConfig object using Analyzer. 
 
6) Create a Lucene's IndexWriter object using GeodeDirectory and IndexWriterConfig object. 
 
7) Define a PartitionResolver to use dataregion's bucket id as the routing object, which guarantees that the index bucket region will have the same bucket id as the dataregion's bucket region, even when dataregion has its own custom PartitionResolver. We don't need to define a PartitionResolver on dataregion.
 
8) Define the AEQ with multiple dispatcher threads and order-policy=partition. That will group events by bucket id into different dispatcher queues. Each dispatcher thread will call our AEQ listener to process events for one or more buckets. Each event will be processed into a Document and written into the ChunkRegion via the GeodeDirectory. We don't need a lock for the GeodeDirectory, since only one thread will process a given bucket's events.
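The bucket-to-thread assignment that makes the lock unnecessary can be sketched as a pure function (the modulo mapping is an illustrative assumption, not Geode's exact algorithm): every event for a given bucket always lands on the same dispatcher thread, so a bucket's events are processed in order by a single thread.

```java
// Sketch: order-policy=partition assigns each bucket to exactly one
// dispatcher thread, so one thread owns all of a bucket's events and the
// GeodeDirectory needs no lock. The modulo mapping is illustrative.
public class DispatcherAssignment {
    static int dispatcherFor(int bucketId, int dispatcherThreads) {
        return Math.floorMod(bucketId, dispatcherThreads);
    }
}
```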
 
9) If dataregion is a replicated region, then define the AEQ with a single dispatcher thread.
 
10) Register the newly created LuceneIndex with the LuceneService. The registration step will also publish the metadata into the "lucene_meta_region", which is a persistent replicated region, so other JVMs will know that a new LuceneIndex with this metadata was created. All the members should have a LuceneService instance with the same LuceneIndex definition.
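The registration step can be sketched with a plain map standing in for the replicated meta region. The record shape and key format here are hypothetical; the point is that publishing the definition once makes it visible to every member holding a replica.

```java
import java.util.*;

// Sketch (hypothetical shapes): registering an index publishes its
// definition into a replicated meta region that every member can read.
public class MetaRegistration {
    record IndexMeta(String indexName, String regionPath, List<String> fields) {}

    // stands in for the persistent, replicated "lucene_meta_region"
    static final Map<String, IndexMeta> luceneMetaRegion = new HashMap<>();

    static void register(IndexMeta meta) {
        luceneMetaRegion.put(meta.indexName() + "#" + meta.regionPath(), meta);
    }
}
```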

...