...

A search request is intercepted by a custom ParserAggregator. This component distributes the search query to all PRs, and each PR routes the request to its local Lucene instance. The results are routed back to the ParserAggregator, which reorders and trims the aggregated result set and returns it to the user.

PlantUML

() User -> [Cache] : Search
node cluster {
 database {
 () indexPR1
 }

 [Cache] ..> [PR 1]
 [PR 1] --> [ParserAggregator]
 [ParserAggregator] --> [LucenePR1]
 [LucenePR1] --> [FSDirectoryPR1]
 [FSDirectoryPR1] -> indexPR1
 
 database {
 () indexPR2
 }

 [ParserAggregator] --> [LucenePR2]
 [LucenePR2] --> [FSDirectoryPR2]
 [FSDirectoryPR2] -> indexPR2
}
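
The distribute, gather, reorder, and trim flow described for the ParserAggregator could look roughly like the sketch below. It is plain Java with no Lucene or cache dependency. The names Hit and ParserAggregatorSketch are hypothetical, each PR's local Lucene search is modelled as a simple function from query string to hits, and ordering by descending score is an assumption about what "reorder" means here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** One scored match from a partition's local index (hypothetical type). */
final class Hit {
    final String docId;
    final float score;

    Hit(String docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

/** Hypothetical stand-in for the merge step of the ParserAggregator. */
final class ParserAggregatorSketch {

    /**
     * Sends the query to every partition, gathers the per-partition hits,
     * reorders them by descending score, and trims to at most `limit` hits.
     * `limit` is assumed to be non-negative.
     */
    static List<Hit> search(String query,
                            List<Function<String, List<Hit>>> partitions,
                            int limit) {
        List<Hit> all = new ArrayList<>();
        for (Function<String, List<Hit>> partition : partitions) {
            // Sequential here for brevity; a real aggregator would likely fan out in parallel.
            all.addAll(partition.apply(query));
        }
        all.sort((a, b) -> Float.compare(b.score, a.score)); // highest score first
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }
}
```

One caveat with this design: scores computed independently by each local index are only comparable if the partitions use compatible term statistics, which is part of why the aggregation logic adds complexity.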

...

High maintenance
Complexity

Option - 2: Distributed FS Directory implementation

Here the search request is handled by Lucene, and Lucene's own parser and aggregator are used. DistributedFSDirectory provides a unified view of the index to Lucene. Lucene asks DistributedFSDirectory to fetch index chunks, and DistributedFSDirectory gathers each chunk from the PR that hosts it. This is similar in behavior to a Cache Client, which reaches different PRs and provides a unified data view to the user.

PlantUML

() User -> [Cache] : Search
node cluster {
 database {
 () indexPR1
 }

 [Cache] ..> [PR 1]
 [PR 1] --> [LucenePR1]
 [LucenePR1] --> [DistributedFSDirectory]
 [DistributedFSDirectory] -down-> [FSDirectoryPR1]
 [FSDirectoryPR1] -> indexPR1
 
 database {
 () indexPR2
 }

 [DistributedFSDirectory] -down-> [FSDirectoryPR2]
 [FSDirectoryPR2] -> indexPR2
}
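
The chunk-gathering behavior described for DistributedFSDirectory can be illustrated with a minimal plain-Java sketch. It deliberately does not extend Lucene's Directory class. PartitionChunkStore and DistributedDirectorySketch are hypothetical names, and both the fixed chunk size and the hash-based placement of chunks are assumptions made only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-partition chunk storage, standing in for FSDirectoryPR1/FSDirectoryPR2. */
final class PartitionChunkStore {
    private final Map<String, byte[]> chunks = new HashMap<>();

    void put(String key, byte[] data) { chunks.put(key, data); }

    byte[] get(String key) { return chunks.get(key); }
}

/** Hypothetical unified view over index file chunks spread across partitions. */
final class DistributedDirectorySketch {
    private final List<PartitionChunkStore> partitions;
    private final int chunkSize;

    DistributedDirectorySketch(List<PartitionChunkStore> partitions, int chunkSize) {
        this.partitions = partitions;
        this.chunkSize = chunkSize;
    }

    private static String key(String file, int chunkNo) {
        return file + "#" + chunkNo;
    }

    /** Assumed placement rule: hash of the chunk key picks the hosting partition. */
    private PartitionChunkStore ownerOf(String file, int chunkNo) {
        int index = Math.floorMod(key(file, chunkNo).hashCode(), partitions.size());
        return partitions.get(index);
    }

    /** Splits a file into fixed-size chunks and stores each on its owning partition. */
    void writeFile(String file, byte[] data) {
        for (int off = 0, n = 0; off < data.length; off += chunkSize, n++) {
            int end = Math.min(off + chunkSize, data.length);
            ownerOf(file, n).put(key(file, n), Arrays.copyOfRange(data, off, end));
        }
    }

    /** Reassembles a file by fetching consecutive chunks until one is missing. */
    byte[] readFile(String file) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int n = 0; ; n++) {
            byte[] chunk = ownerOf(file, n).get(key(file, n));
            if (chunk == null) {
                break;
            }
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

In this sketch every read of a chunk hosted on another partition would be a network hop in a real cluster, and reassembled data has to be held in memory on the reading side.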

...

Performance:
Memory requirement
Network overhead

Option - 3: Embedded Solr

Here the search request is handled by Solr. Solr distributes queries to the Solr agents, and Solr's own aggregator is used.

PlantUML

() User -> [Cache] : Search
node cluster {
 database {
 () indexPR1
 }

 [Cache] ..> [PR 1]
 [PR 1] --> [SolrServer]
 [SolrServer] --> [SolrPR1]
 [SolrPR1] -down-> [FSDirectoryPR1]
 [FSDirectoryPR1] -> indexPR1
 
 database {
 () indexPR2
 }

 [SolrServer] --> [SolrPR2]
 [SolrPR2] -down-> [FSDirectoryPR2]
 [FSDirectoryPR2] -> indexPR2
}

Advantages

Performance
Full API compliance
Accurate results

Limitations

Solr instance management complexity
Additional points of failure

Work In Progress

How many active segment files are maintained per index? It seems that one large file remains after a merge. If so, how can a segment be chunked and colocated with its region?

...
