...
A search request will be intercepted by a custom ParserAggregator, which will distribute the search query to all PRs. Each PR will route the request to its local Lucene instance, and the results will be routed back to the ParserAggregator. The ParserAggregator will then reorder and trim the aggregated result set and return it to the user.
PlantUML
() User -> [Cache] : Search
node cluster {
database {
() indexPR1
}
[Cache] ..> [PR 1]
[PR 1] --> [ParserAggregator]
[ParserAggregator] --> [LucenePR1]
[LucenePR1] --> [FSDirectoryPR1]
[FSDirectoryPR1] -> indexPR1
database {
() indexPR2
}
[ParserAggregator] --> [LucenePR2]
[LucenePR2] --> [FSDirectoryPR2]
[FSDirectoryPR2] -> indexPR2
}
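The scatter-gather flow described for the ParserAggregator can be sketched as below. This is a minimal in-memory model, not Geode or Lucene code: `Hit`, `PartitionSearcher` and `parser_aggregator_search` are hypothetical names, and each partition is a plain function standing in for one PR's local Lucene search.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Hit:
    """One search result: a document key and its relevance score (hypothetical shape)."""
    key: str
    score: float


# Hypothetical stand-in for a PR's local Lucene search: takes the query, returns local hits.
PartitionSearcher = Callable[[str], List[Hit]]


def parser_aggregator_search(query: str,
                             partitions: Iterable[PartitionSearcher],
                             limit: int) -> List[Hit]:
    """Scatter `query` to every partition, gather the hits, then reorder
    them by descending score and trim the merged list to `limit` entries."""
    gathered: List[Hit] = []
    for search_local in partitions:
        # Keep only this partition's best `limit` hits; nothing ranked lower
        # locally can make it into the global top `limit`.
        gathered.extend(heapq.nlargest(limit, search_local(query),
                                       key=lambda hit: hit.score))
    # Reorder across partitions and trim to the requested size.
    return heapq.nlargest(limit, gathered, key=lambda hit: hit.score)


if __name__ == "__main__":
    partitions = [
        lambda q: [Hit("a", 0.9), Hit("b", 0.4)],   # results held by PR 1
        lambda q: [Hit("c", 0.7), Hit("d", 0.95)],  # results held by PR 2
    ]
    top = parser_aggregator_search("title:geode", partitions, limit=3)
    print([hit.key for hit in top])  # ['d', 'a', 'c']
```

Merging by raw score assumes that scores from different PRs are comparable. Because each PR's Lucene index keeps its own term statistics, per-PR scores can differ from what a single global index would produce, so the merged order in this sketch is an approximation.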
...
- High maintenance
- Complexity
Option - 2: Distributed FS Directory implementation
Here the search request is handled by Lucene, and Lucene's own parser and aggregator are used. DistributedFSDirectory will provide a unified view of the index to Lucene: Lucene will request index chunks from DistributedFSDirectory, which will fetch and aggregate them from the PRs that host the data. This is similar to the behavior of a Cache Client, which reaches different PRs and provides a unified view of the data to the user.
PlantUML
() User -> [Cache] : Search
node cluster {
database {
() indexPR1
}
[Cache] ..> [PR 1]
[PR 1] --> [LucenePR1]
[LucenePR1] --> [DistributedFSDirectory]
[DistributedFSDirectory] -down-> [FSDirectoryPR1]
[FSDirectoryPR1] -> indexPR1
database {
() indexPR2
}
[DistributedFSDirectory] -down-> [FSDirectoryPR2]
[FSDirectoryPR2] -> indexPR2
}
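The chunked, unified view that DistributedFSDirectory is meant to give Lucene can be modeled as below. This is a toy sketch under stated assumptions, not Lucene's `Directory` API: `DistributedDirectory`, `write_file`, `read_bytes`, the fixed chunk size and the CRC32-based placement rule are all hypothetical, and each partition is an in-memory dict standing in for a PR.

```python
import zlib
from typing import Dict, List, Tuple

ChunkKey = Tuple[str, int]  # (file name, chunk index)


class DistributedDirectory:
    """Toy model: each file is split into fixed-size chunks stored across
    several partitions (stand-ins for PRs), while callers see one flat file.
    Hypothetical API; it does not implement Lucene's Directory interface."""

    def __init__(self, num_partitions: int, chunk_size: int = 4) -> None:
        if num_partitions < 1 or chunk_size < 1:
            raise ValueError("num_partitions and chunk_size must be positive")
        self.chunk_size = chunk_size
        # One key -> bytes map per partition, standing in for that PR's local store.
        self.partitions: List[Dict[ChunkKey, bytes]] = [{} for _ in range(num_partitions)]
        self.file_lengths: Dict[str, int] = {}
        self.chunk_fetches = 0  # chunks fetched; a rough proxy for network round trips

    def _owner(self, key: ChunkKey) -> Dict[ChunkKey, bytes]:
        # Assumed placement rule: a stable hash of (file, chunk index) picks the partition.
        name, index = key
        return self.partitions[zlib.crc32(f"{name}#{index}".encode()) % len(self.partitions)]

    def write_file(self, name: str, data: bytes) -> None:
        """Split `data` into chunks and store each chunk on its owning partition."""
        self.file_lengths[name] = len(data)
        for index, start in enumerate(range(0, len(data), self.chunk_size)):
            key = (name, index)
            self._owner(key)[key] = data[start:start + self.chunk_size]

    def read_bytes(self, name: str, offset: int, length: int) -> bytes:
        """Return `length` bytes at `offset`, fetching only the chunks that overlap."""
        file_length = self.file_lengths[name]
        if offset < 0 or length < 0 or offset + length > file_length:
            raise ValueError("read outside file bounds")
        if length == 0:
            return b""
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        buffer = bytearray()
        for index in range(first, last + 1):
            key = (name, index)
            buffer += self._owner(key)[key]  # in a cluster this would be a remote fetch
            self.chunk_fetches += 1
        skip = offset - first * self.chunk_size
        return bytes(buffer[skip:skip + length])


if __name__ == "__main__":
    directory = DistributedDirectory(num_partitions=2, chunk_size=4)
    directory.write_file("_0.cfs", b"segment-bytes-0123")  # 18 bytes -> 5 chunks
    print(directory.read_bytes("_0.cfs", 8, 5))  # b'bytes'
    print(directory.chunk_fetches)  # 2
```

A single read can span several chunks held by different partitions, and each chunk fetch would be a network round trip in a real cluster, with the fetched chunks buffered in memory. That is where the memory requirement and network overhead of this option come from.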
...
- Performance:
  - Memory requirement
  - Network overhead
Option - 3: Embedded Solr
Here the search request is handled by Solr. Solr distributes the query to Solr agents, and Solr's own aggregator is used to merge the results.
PlantUML
() User -> [Cache] : Search
node cluster {
database {
() indexPR1
}
[Cache] ..> [PR 1]
[PR 1] --> [SolrServer]
[SolrServer] --> [SolrPR1]
[SolrPR1] -down-> [FSDirectoryPR1]
[FSDirectoryPR1] -> indexPR1
database {
() indexPR2
}
[SolrServer] --> [SolrPR2]
[SolrPR2] -down-> [FSDirectoryPR2]
[FSDirectoryPR2] -> indexPR2
}
Advantages
- Performance
- Full API compliance
- Accurate results
Limitations
- Solr instance management complexity
- Additional points of failure
Work In Progress
- How many active segment files are maintained per index? It appears that a single large file remains after a merge. If so, how can a segment be chunked and colocated with its region?
...