...

Gfsh API

 

Code Block
// Create Index
gfsh> create lucene-index --name=indexName --region=/orders --fields=customer,tags

// Destroy Index
gfsh> destroy lucene-index --name=indexName --region=/orders

// Execute Lucene query
gfsh> luceneQuery --regionName=/orders --queryStrings="" --limit=100 --page-size=10

 

XML Configuration 

 

Code Block
<region name="region">
  <lucene-index indexName="luceneIndex">
    <FieldDefinition name="fieldName" analyzer="KeywordAnalyzer"/>
  </lucene-index>
</region>

 

REST API

TBD - but using Solr to provide a REST API might make a lot of sense.

Spring Data GemFire Support

TBD - but the Searchable annotation described in this blog post might be a good place to start.

Implementation Flowchart

 

 

Index Storage

The Lucene indexes will be stored in memory instead of on disk. This will be done by implementing a Lucene Directory called RegionDirectory, which uses Geode as a flat file system. This way we get all the benefits offered by Geode, and we can achieve replication and sharding of the indexes. The Lucene indexes will be co-located with the data region to support HA.
A LuceneIndex object will be created for each index to manage all of the attributes related to that index, such as the reflective fields, the AEQ listener, the RegionDirectory array (one per bucket), and the query objects.
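
To make the shape of this concrete, the sketch below shows how a per-index LuceneIndex holder might feed a standard Lucene IndexWriter, one per bucket. RegionDirectory does not exist yet, so a RAMDirectory is used as a stand-in; the class name LuceneIndex, its methods, and the field handling are illustrative assumptions for this design, not an existing API.

Code Block
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical per-index holder: one Directory (and IndexWriter) per bucket
// of the co-located data region. In the proposal the Directory would be a
// Geode-backed RegionDirectory; RAMDirectory is only a stand-in here.
public class LuceneIndex {

  private final Analyzer analyzer = new StandardAnalyzer();
  private final Map<Integer, IndexWriter> writersByBucket = new HashMap<>();

  // Look up (or lazily create) the writer for a given bucket.
  public synchronized IndexWriter writerForBucket(int bucketId) throws IOException {
    IndexWriter writer = writersByBucket.get(bucketId);
    if (writer == null) {
      Directory dir = new RAMDirectory(); // would be: new RegionDirectory(bucketId, ...)
      writer = new IndexWriter(dir, new IndexWriterConfig(analyzer));
      writersByBucket.put(bucketId, writer);
    }
    return writer;
  }

  // Index the reflected fields of one entry into its bucket's writer.
  public void index(int bucketId, Object key, Map<String, String> fields) throws IOException {
    Document doc = new Document();
    doc.add(new StringField("_entryKey", key.toString(), Field.Store.YES));
    for (Map.Entry<String, String> f : fields.entrySet()) {
      doc.add(new TextField(f.getKey(), f.getValue(), Field.Store.NO));
    }
    IndexWriter writer = writerForBucket(bucketId);
    writer.addDocument(doc);
    writer.commit(); // in practice the AEQ listener would batch commits
  }
}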

 

PlantUML
[LuceneIndex] --> [RegionDirectory]
() "User"
node "Colocated PR or Replicated Region" {
  () User --> [User Data Region] : Puts
  [User Data Region] --> [Async Queue]
  [Async Queue] --> [LuceneIndex] : Batch Writes
  [RegionDirectory] --> [Lucene Regions]
}

...

 


Partitioned region data flow

PlantUML
() User -down-> [Cache] : PUTs
node cluster {
 database {
 () "indexBucket1Primary"
 }

 database {
 () "indexBucket1Secondary"
 }

[Cache] ..> [Bucket 1]
 [Bucket 1] -down-> [Async Queue Bucket 1]
[Async Queue Bucket 1] -down-> [FSDirectoryBucket1] : Batch Write
[FSDirectoryBucket1] -> indexBucket1Primary
indexBucket1Primary -right-> indexBucket1Secondary

 database {
 () "indexBucket2Primary"
 }

 database {
 () "indexBucket2Secondary"
 }

[Cache] ..> [Bucket 2]
 [Bucket 2] -down-> [Async Queue Bucket 2]
 [Async Queue Bucket 2] -down-> [FSDirectoryBucket2] : Batch Write
 [FSDirectoryBucket2] -> indexBucket2Primary
 indexBucket2Primary -right-> indexBucket2Secondary 
}

...

PlantUML
node "LuceneIndex" {
  [Reflective fields]
  [AEQ listener]
  [RegionDirectory array (one per bucket)]
  [Query objects]
}


In a partitioned region, every bucket in the region will have its own GeodeFSDirectory to store the Lucene indexes. The GeodeFSDirectory implements a file system using two regions (a layout sketch follows this list):
  • FileRegion: holds the metadata about index files
  • ChunkRegion: holds the actual data chunks for a given index file
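
To make the two-region layout concrete, here is a rough sketch of how a single Lucene index file could be split across the two regions: one metadata entry per file, plus one entry per fixed-size chunk. The class names and keys are illustrative assumptions only; the real keys would also carry the owning bucket id, which the partition resolver described next uses for routing.

Code Block
import java.io.Serializable;
import java.util.Arrays;

import org.apache.geode.cache.Region;

// Illustrative metadata entry stored in FileRegion for one Lucene index file.
class FileMetadata implements Serializable {
  final String fileName;
  final long length;
  final int chunkCount;

  FileMetadata(String fileName, long length, int chunkCount) {
    this.fileName = fileName;
    this.length = length;
    this.chunkCount = chunkCount;
  }
}

// Splits an index file's bytes into fixed-size chunks across the two regions.
class FlatFileWriter {
  private static final int CHUNK_SIZE = 1024 * 1024; // 1 MB chunks (illustrative)

  void writeFile(String fileName, byte[] bytes,
                 Region<String, FileMetadata> fileRegion,
                 Region<String, byte[]> chunkRegion) {
    int chunkCount = (bytes.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
    for (int i = 0; i < chunkCount; i++) {
      int from = i * CHUNK_SIZE;
      int to = Math.min(from + CHUNK_SIZE, bytes.length);
      // Each chunk key combines the file name and the chunk ordinal.
      chunkRegion.put(fileName + "#" + i, Arrays.copyOfRange(bytes, from, to));
    }
    // Write the metadata last so a reader never sees a file without its chunks.
    fileRegion.put(fileName, new FileMetadata(fileName, bytes.length, chunkCount));
  }
}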

The FileRegion and ChunkRegion will be co-located with the data region that is to be indexed. The GeodeFSDirectory keys will contain the bucket id for the file metadata and chunk entries, and both regions will use a partition resolver that looks only at the bucket id part of the key, as sketched below.
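
A minimal sketch of such a resolver, assuming the current org.apache.geode package names and a hypothetical FileKey that embeds the owning bucket id (neither is part of the proposal's API):

Code Block
import java.io.Serializable;

import org.apache.geode.cache.EntryOperation;
import org.apache.geode.cache.PartitionResolver;

// Hypothetical key for FileRegion/ChunkRegion entries: file (or chunk) name
// plus the id of the data-region bucket it belongs to.
class FileKey implements Serializable {
  final int bucketId;
  final String name;

  FileKey(int bucketId, String name) {
    this.bucketId = bucketId;
    this.name = name;
  }
}

// Routes index file/chunk entries to the same bucket as the data they index.
public class BucketIdPartitionResolver implements PartitionResolver<FileKey, Object> {

  @Override
  public Object getRoutingObject(EntryOperation<FileKey, Object> opDetails) {
    // Only the bucket id part of the key matters for routing.
    return opDetails.getKey().bucketId;
  }

  @Override
  public String getName() {
    return "BucketIdPartitionResolver";
  }

  @Override
  public void close() {
    // nothing to release
  }
}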
In the AsyncEventListener, when a data entry is processed (a listener sketch follows this list):
  1. Determine the bucket id of the entry.
  2. Get the RegionDirectory for that bucket and perform the indexing operation against that instance.
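
A minimal listener sketch, reusing the hypothetical LuceneIndex holder from the earlier sketch; the bucketIdOf() helper and the single "body" field are placeholders, since the real listener would resolve the bucket and the indexed fields internally.

Code Block
import java.util.Collections;
import java.util.List;

import org.apache.geode.cache.asyncqueue.AsyncEvent;
import org.apache.geode.cache.asyncqueue.AsyncEventListener;

// Drains the AEQ in batches and indexes each entry into its bucket's directory.
public class LuceneIndexEventListener implements AsyncEventListener {

  private final LuceneIndex index; // per-index holder from the earlier sketch

  public LuceneIndexEventListener(LuceneIndex index) {
    this.index = index;
  }

  @Override
  public boolean processEvents(List<AsyncEvent> events) {
    try {
      for (AsyncEvent event : events) {
        Object key = event.getKey();
        Object value = event.getDeserializedValue();

        // 1. Determine the bucket id of the entry (placeholder helper below).
        int bucketId = bucketIdOf(key);

        // 2. Index the entry's fields into that bucket's directory.
        index.index(bucketId, key, Collections.singletonMap("body", String.valueOf(value)));
      }
      return true;  // batch fully processed; remove it from the queue
    } catch (Exception e) {
      return false; // keep the batch queued so it is retried
    }
  }

  @Override
  public void close() {
    // nothing to release in this sketch
  }

  // Placeholder: the real implementation would ask the partitioned region
  // which bucket owns this key instead of re-hashing it here.
  private int bucketIdOf(Object key) {
    return Math.abs(key.hashCode()) % 113; // 113 = default total-num-buckets
  }
}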

...