...
Data flow into a Lucene index
PlantUML |
---|
[LuceneIndex] --> [RegionDirectory]
() "User"
node "Colocated Partitioned Regions" {
() User --> [User Data Region] : Puts
[User Data Region] --> [Async Queue]
[Async Queue] --> [LuceneIndex] : Batch Writes
[RegionDirectory] --> [Lucene Regions]
} |
A closer look at Partitioned region data flow
PlantUML |
---|
() User -down-> [User Data Region] : PUTs
[User Data Region] ..> [Bucket 1]
[Bucket 1] -down-> [Async Queue Bucket 1]
node LuceneIndex {
[Async Queue Bucket 1] -down-> [AEQ listener processes events into index documents]:Batch Write
[AEQ listener processes events into index documents] -down-> [RegionDirectory1]
[RegionDirectory1] -down-> [file region bucket 1]
}
[User Data Region] ..> [Bucket 2]
[Bucket 2] -down-> [Async Queue Bucket 2]
node LuceneIndex {
[Async Queue Bucket 2] -down-> [AEQ listener processes events into index documents]:Batch Write
[AEQ listener processes events into index documents] -down-> [RegionDirectory2]
[RegionDirectory2] -down-> [file region bucket 2]
} |
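The per-bucket flow above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `BucketIndexSketch`, `bucketFor`, `put`, `flush`, and `lookup` are hypothetical names, the hash-based bucket choice is a stand-in for real partition routing, and a plain term-to-keys map stands in for the Lucene index that the AEQ listener writes through a RegionDirectory. It is not Geode code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of the write path; none of these types are Geode classes.
class BucketIndexSketch {
    static final int BUCKETS = 2;

    // One pending-event queue per bucket, standing in for the per-bucket async queue.
    final List<Queue<String[]>> queues = new ArrayList<>();
    // One tiny inverted index (term -> keys) per bucket, standing in for the per-bucket Lucene index.
    final List<Map<String, Set<String>>> indexes = new ArrayList<>();

    BucketIndexSketch() {
        for (int i = 0; i < BUCKETS; i++) {
            queues.add(new ArrayDeque<>());
            indexes.add(new HashMap<>());
        }
    }

    // Assumption: a simple hash picks the bucket for a key.
    static int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    // A put only enqueues an event {key, text}; the index is updated later.
    void put(String key, String text) {
        queues.get(bucketFor(key)).add(new String[] {key, text});
    }

    // The listener drains one bucket's queue as a batch and writes it to that bucket's index.
    // Returns the number of events processed.
    int flush(int bucket) {
        Queue<String[]> queue = queues.get(bucket);
        Map<String, Set<String>> index = indexes.get(bucket);
        int processed = 0;
        String[] event;
        while ((event = queue.poll()) != null) {
            for (String term : event[1].toLowerCase().split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<String>()).add(event[0]);
            }
            processed++;
        }
        return processed;
    }

    // Keys in the given bucket whose text contained the term.
    Set<String> lookup(int bucket, String term) {
        return indexes.get(bucket).getOrDefault(term, new TreeSet<String>());
    }
}
```

In this model an entry does not become searchable until `flush` has run for its bucket, which mirrors the asynchronous batch write shown in the diagram.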
Class Diagram
PlantUML |
---|
@startuml
hide empty members
class LuceneServiceImpl
interface LuceneIndex
class LuceneIndexForPartitionedRegion
interface RepositoryManager
class PartitionedRepositoryManager {
Manages an IndexRepository for each bucket
}
package "One For Each Bucket" {
interface IndexRepository {
Operates on the Lucene index: put, delete, query
}
class IndexRepositoryImpl
interface LuceneSerializer {
Converts a user object into a lucene document
}
class RegionDirectory {
Implements Lucene's Directory interface but writes to Geode regions
}
class IndexWriter <<Lucene Class>>
interface Directory <<Lucene Class>>
class FileSystem
class BucketTargetingMap
class BucketRegion
}
LuceneIndexForPartitionedRegion --> AsyncEventQueue: creates
AsyncEventQueue o-- LuceneEventListener
LuceneEventListener o-- RepositoryManager
LuceneServiceImpl *-- LuceneIndex
LuceneIndex <|-- LuceneIndexForPartitionedRegion
LuceneIndexForPartitionedRegion *-- RepositoryManager
RepositoryManager <|-- PartitionedRepositoryManager
PartitionedRepositoryManager "1" *-- "many" IndexRepository
IndexRepository <|-- IndexRepositoryImpl
IndexRepositoryImpl *-- IndexWriter
IndexRepositoryImpl o-- LuceneSerializer
IndexWriter *-- RegionDirectory
RegionDirectory --|> Directory
RegionDirectory *-- FileSystem
FileSystem *-- BucketTargetingMap
BucketTargetingMap *-- BucketRegion
@enduml |
Processing Queries
PlantUML |
---|
() User -down-> [LuceneQuery] : fields, Analyzer, query strings, or Query
[LuceneQuery] -down-> [User Data Region]: call search()
[User Data Region] -down-> [Function Execution]
[Function Execution] -down-> [Bucket 1]
[Bucket 1] -down-> [RegionDirectory for bucket 1]
[RegionDirectory for bucket 1] ..> [Bucket 1] : TopDocs, ScoreDocs
[Bucket 1] ..> [Function Execution] : score, key
[Function Execution] -down-> [Bucket 2]
[Bucket 2] -down-> [RegionDirectory for bucket 2]
[RegionDirectory for bucket 2] ..> [Bucket 2] : TopDocs, ScoreDocs
[Bucket 2] ..> [Function Execution] : score, key
|
Handling failures, restarts, and rebalance
...
In the case of partitioned regions, the query must be sent to all the primaries, and the results then need to be aggregated. Lucene search uses FunctionService to distribute the query to the primaries.
Input to primaries
- Serialized Query
- CollectorManager to be used for local aggregation
- Result limit
Output from primaries
- Merged collector created from results of search on local bucket indexes.
PlantUML |
---|
participant LuceneQuery
participant FunctionService
participant FunctionCollector
participant CollectorManager
participant M1_LuceneFunction
participant M1_CollectorManager
participant Index_1
participant Index_2
LuceneQuery -> FunctionService: Query
activate FunctionService
FunctionService --> M1_LuceneFunction : LuceneContext
activate M1_LuceneFunction
FunctionService --> M2_LuceneFunction: LuceneContext
activate M2_LuceneFunction
M1_LuceneFunction -> Index_1 : search(Collector_1)
Index_1 -> M1_LuceneFunction : loaded Collector_1
M1_LuceneFunction -> Index_2 : search(Collector_2)
Index_2 -> M1_LuceneFunction : loaded Collector_2
M1_LuceneFunction -> M1_CollectorManager : merge Collectors
activate M1_CollectorManager
M1_CollectorManager -> M1_LuceneFunction : merged Collector
deactivate M1_CollectorManager
activate FunctionCollector
M1_LuceneFunction -> FunctionCollector:Collector_M1
deactivate M1_LuceneFunction
M2_LuceneFunction -> FunctionCollector:Collector_M2
deactivate M2_LuceneFunction
FunctionCollector -> CollectorManager : merge Collectors
activate CollectorManager
CollectorManager -> FunctionCollector : Final Collector
deactivate CollectorManager
FunctionCollector -> FunctionService : Final Collector
deactivate FunctionCollector
FunctionService -> LuceneQuery : QueryResults
deactivate FunctionService |
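The two-level aggregation in the sequence above (merge per member, then merge again on the caller) can be sketched as follows. This is a minimal illustration, not the actual collector implementation: `TopHitsMergeSketch`, `Hit`, and `merge` are hypothetical names, and sorting by score with a result limit is an assumed merge policy standing in for whatever the CollectorManager does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of result merging; these are not Geode or Lucene classes.
class TopHitsMergeSketch {
    // A (key, score) pair, matching the "score, key" results returned per bucket.
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    // Merge several result lists, keeping only the `limit` highest-scoring hits.
    static List<Hit> merge(List<List<Hit>> parts, int limit) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> part : parts) {
            all.addAll(part);
        }
        // Highest score first.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }
}
```

The same `merge` would be applied twice: once on each member over the results of its local bucket indexes, and once on the caller over the per-member results. Because each member already trims to the result limit, the caller never receives more than limit hits per member.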
...
Naba drew flowcharts for LuceneIndex: