Geode supports creating Lucene indexes on data in Geode regions. See the javadocs and the user guide for details on the API.

Internally, Lucene indexes are stored in a region which is colocated with the user region. Geode provides an implementation of Lucene's Directory interface which stores Lucene's data files directly in that colocated region, rather than on disk.
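As a rough illustration of the idea of keeping index files in a key-value store instead of on disk, here is a minimal sketch. It is not Geode's RegionDirectory and does not implement Lucene's Directory interface; the class MapBackedFileStore and all of its methods are hypothetical, and a plain in-memory map stands in for the colocated region.

```java
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a region-backed file store; NOT Geode's RegionDirectory.
class MapBackedFileStore {
  // Stands in for the colocated region: file name -> file contents.
  private final Map<String, byte[]> region = new ConcurrentHashMap<>();

  // Stores a complete file under its name (defensive copy of the bytes).
  void writeFile(String name, byte[] contents) {
    region.put(name, contents.clone());
  }

  // Returns a copy of the stored bytes, or fails if the file does not exist.
  byte[] readFile(String name) {
    byte[] data = region.get(name);
    if (data == null) {
      throw new IllegalArgumentException("No such file: " + name);
    }
    return data.clone();
  }

  long fileLength(String name) {
    return readFile(name).length;
  }

  // Returns true if a file with that name existed and was removed.
  boolean deleteFile(String name) {
    return region.remove(name) != null;
  }

  // Lists file names in sorted order.
  String[] listAll() {
    return new TreeSet<>(region.keySet()).toArray(new String[0]);
  }

  public static void main(String[] args) {
    MapBackedFileStore store = new MapBackedFileStore();
    store.writeFile("segments_1", new byte[] {1, 2, 3});
    System.out.println(store.fileLength("segments_1"));
  }
}
```

The sketch only shows the shape of the operations a directory abstraction needs (write, read, length, delete, list); the actual layout of file data inside the Geode region is not described here.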

Lucene indexes are updated asynchronously by an AsyncEventQueue which is attached to the region.
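One consequence of asynchronous updates is that the index can briefly lag behind the region. The sketch below illustrates this with a toy model: puts go to a map and an event queue, and a separate batch step applies queued events to a simple term index. The class AsyncIndexSketch, its methods, and the whitespace tokenization are assumptions made for illustration; this is not Geode's AsyncEventQueue or its listener, and the batch step is invoked by hand here rather than by a background dispatcher.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of a region whose index is updated from a queue of events (hypothetical).
class AsyncIndexSketch {
  // One queued change; a null value means the entry was removed.
  static final class Event {
    final String key;
    final String value;

    Event(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  private final Map<String, String> userRegion = new HashMap<>();
  private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
  // Term -> keys of entries whose value contains that term.
  private final Map<String, Set<String>> index = new HashMap<>();

  // The put is applied to the region immediately; the index update is only queued.
  void put(String key, String value) {
    userRegion.put(key, value);
    queue.add(new Event(key, value));
  }

  void remove(String key) {
    userRegion.remove(key);
    queue.add(new Event(key, null));
  }

  // Drains up to batchSize events and applies them to the index in order.
  int processBatch(int batchSize) {
    List<Event> batch = new ArrayList<>();
    queue.drainTo(batch, batchSize);
    for (Event e : batch) {
      // Drop any postings left over from an older value of this key.
      for (Set<String> keys : index.values()) {
        keys.remove(e.key);
      }
      if (e.value != null) {
        for (String term : e.value.toLowerCase().split("\\s+")) {
          if (!term.isEmpty()) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(e.key);
          }
        }
      }
    }
    return batch.size();
  }

  // Returns the keys currently indexed under the given term.
  Set<String> search(String term) {
    return new TreeSet<>(index.getOrDefault(term.toLowerCase(), new TreeSet<>()));
  }

  public static void main(String[] args) {
    AsyncIndexSketch sketch = new AsyncIndexSketch();
    sketch.put("k1", "hello world");
    System.out.println("before batch: " + sketch.search("hello"));
    sketch.processBatch(10);
    System.out.println("after batch: " + sketch.search("hello"));
  }
}
```

In this model a search issued between put and processBatch does not see the new entry, which is the behavior to keep in mind when querying shortly after a write.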

Data flow into a Lucene index

Implementation Details

 

Implementation Flowchart

...


PlantUML
[LuceneIndex] --> [RegionDirectory]
() "User"
node "Colocated Partitioned Regions" {
  () User --> [User Data Region] : Puts
  [User Data Region] --> [Async Queue]
  [Async Queue] --> [LuceneIndex] : Batch Writes
  [RegionDirectory] --> [Lucene Regions]
}

...

 


A closer look at Partitioned region data flow


PlantUML
() User -down-> [User Data Region] : PUTs

[User Data Region] ..> [Bucket 1]
 [Bucket 1] -down-> [Async Queue Bucket 1]
node LuceneIndex {
[Async Queue Bucket 1] -down-> [AEQ listener processes events into index documents]:Batch Write
[AEQ listener processes events into index documents] -down-> [RegionDirectory1]
[RegionDirectory1] -down-> [file region bucket 1]
}
 
[User Data Region] ..> [Bucket 2]
 [Bucket 2] -down-> [Async Queue Bucket 2]
node LuceneIndex {
[Async Queue Bucket 2] -down-> [AEQ listener processes events into index documents]:Batch Write
[AEQ listener processes events into index documents] -down-> [RegionDirectory2]
[RegionDirectory2] -down-> [file region bucket 2]
}



Class Diagram


PlantUML
@startuml
hide empty members
class LuceneServiceImpl
interface LuceneIndex
class LuceneIndexForPartitionedRegion
interface RepositoryManager
class PartitionedRepositoryManager {
  Manages an IndexRepository for each bucket
}
package "One For Each Bucket" {
interface IndexRepository {
  Operates on the Lucene index: put, delete, query
}
class IndexRepositoryImpl
interface LuceneSerializer {
  Converts a user object into a Lucene document
}
class RegionDirectory {
  Implements Lucene's Directory interface but writes to Geode regions
}
class IndexWriter <<Lucene Class>>
interface Directory <<Lucene Class>>
class FileSystem
class BucketTargetingMap
class BucketRegion
}
LuceneIndexForPartitionedRegion --> AsyncEventQueue: creates
AsyncEventQueue o-- LuceneEventListener
LuceneEventListener o-- RepositoryManager
LuceneServiceImpl *-- LuceneIndex
LuceneIndex <|-- LuceneIndexForPartitionedRegion
LuceneIndexForPartitionedRegion *-- RepositoryManager
RepositoryManager <|-- PartitionedRepositoryManager
PartitionedRepositoryManager  "1" *-- "many" IndexRepository
IndexRepository <|-- IndexRepositoryImpl
IndexRepositoryImpl *-- IndexWriter
IndexRepositoryImpl o-- LuceneSerializer
IndexWriter *-- RegionDirectory
RegionDirectory --|> Directory
RegionDirectory *-- FileSystem
FileSystem *-- BucketTargetingMap
BucketTargetingMap *-- BucketRegion
@enduml


Processing Queries

 

PlantUML
() User -down-> [LuceneQuery] : fields, Analyzer, query strings, or Query
[LuceneQuery] -down-> [User Data Region]: call search()
[User Data Region] -down-> [Function Execution]
[Function Execution] -down-> [Bucket 1]
[Bucket 1] -down-> [RegionDirectory for bucket 1]
[RegionDirectory for bucket 1] ..> [Bucket 1] : TopDocs, ScoreDocs
[Bucket 1] ..> [Function Execution] : score, key

[Function Execution] -down-> [Bucket 2]
[Bucket 2] -down-> [RegionDirectory for bucket 2]
[RegionDirectory for bucket 2] ..> [Bucket 2] : TopDocs, ScoreDocs
[Bucket 2] ..> [Function Execution] : score, key



 



Handling failures, restarts, and rebalance 

...

In the case of partitioned regions, the query must be sent to all of the primaries, and the results must then be aggregated. Lucene search uses FunctionService to distribute the query to the primaries.

Input to primaries

  1. Serialized Query
  2. CollectorManager to be used for local aggregation
  3. Result limit

Output from primaries

  1. Merged collector created from results of search on local bucket indexes.
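The two levels of aggregation described above (per-bucket results merged on each member, then per-member results merged by the caller) can be sketched as a repeated top-N merge. The class TopHitsMergeSketch, the Hit type, and the merge method are hypothetical names used only for illustration; Geode's actual collector classes are not shown, and ranking purely by descending score is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of merging partial search results under a result limit.
class TopHitsMergeSketch {
  // A single search hit: the region key and its relevance score.
  static final class Hit {
    final String key;
    final float score;

    Hit(String key, float score) {
      this.key = key;
      this.score = score;
    }
  }

  // Combines several partial hit lists and keeps the `limit` highest-scoring hits.
  static List<Hit> merge(List<List<Hit>> partials, int limit) {
    List<Hit> all = new ArrayList<>();
    for (List<Hit> partial : partials) {
      all.addAll(partial);
    }
    // Sort by descending score; the sort is stable, so ties keep their input order.
    all.sort((a, b) -> Float.compare(b.score, a.score));
    return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
  }

  public static void main(String[] args) {
    int limit = 2;
    // Member 1 hosts two buckets and merges their results locally.
    List<Hit> member1 = merge(Arrays.asList(
        Arrays.asList(new Hit("a", 0.9f), new Hit("b", 0.4f)),
        Arrays.asList(new Hit("c", 0.7f))), limit);
    // Member 2 does the same for its buckets.
    List<Hit> member2 = merge(Arrays.asList(
        Arrays.asList(new Hit("d", 0.8f)),
        Arrays.asList(new Hit("e", 0.1f))), limit);
    // The caller merges the per-member results into the final list.
    for (Hit hit : merge(Arrays.asList(member1, member2), limit)) {
      System.out.println(hit.key + " " + hit.score);
    }
  }
}
```

Applying the limit at every level keeps each partial result small without losing any hit that could appear in the final top N, since a hit outside a member's local top N cannot be in the global top N.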

 

PlantUML
 participant LuceneQuery
 participant FunctionService
 participant FunctionCollector
 participant CollectorManager
 participant M1_LuceneFunction
 participant M1_CollectorManager
 participant Index_1
 participant Index_2
 LuceneQuery -> FunctionService: Query
 activate FunctionService
 FunctionService --> M1_LuceneFunction : LuceneContext
 activate M1_LuceneFunction
 FunctionService --> M2_LuceneFunction: LuceneContext
 activate M2_LuceneFunction
 M1_LuceneFunction -> Index_1 : search(Collector_1)
 Index_1 -> M1_LuceneFunction : loaded Collector_1
 M1_LuceneFunction -> Index_2 : search(Collector_2)
 Index_2 -> M1_LuceneFunction : loaded Collector_2
 M1_LuceneFunction -> M1_CollectorManager : merge Collectors
 activate M1_CollectorManager
 M1_CollectorManager -> M1_LuceneFunction : merged Collector
 deactivate M1_CollectorManager
 activate FunctionCollector
 M1_LuceneFunction -> FunctionCollector:Collector_M1
 deactivate M1_LuceneFunction
 M2_LuceneFunction -> FunctionCollector:Collector_M2
 deactivate M2_LuceneFunction
 FunctionCollector -> CollectorManager : merge Collectors
 activate CollectorManager
 CollectorManager -> FunctionCollector : Final Collector
 deactivate CollectorManager
 FunctionCollector -> FunctionService : Final Collector
 deactivate FunctionCollector
 FunctionService -> LuceneQuery : QueryResults
 deactivate FunctionService


Naba drew flowcharts for LuceneIndex:

Image Added

Image Added

Image Added