Goals
- Accurate text search
- Reuse code
- Scalability
- Performance, compared against RAMFSDirectory
User Input
- A region and list of to-be-indexed fields (or text searchable fields)
- [ Optional ] Standard Analyzer or its implementation. The user needs to be able to provide the analyzer to be used with all the fields in an index.
- [ Optional ] Field types: the index and field type for each field. Note: a string can be indexed as Text or String in Lucene, and the two behave differently.
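The Text versus String distinction in the last item can be illustrated without Lucene itself. The sketch below is an assumption-laden stand-in, not Lucene's API: `FieldTypeSketch`, `FieldType`, and `indexedTerms` are hypothetical names, and the tokenizer (lowercase, split on non-alphanumeric characters, no stop-word removal) is a simplification of what a real analyzer does. It only shows why the choice of field type changes what a query can match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class FieldTypeSketch {
    // Hypothetical enum standing in for the two Lucene behaviours named above.
    enum FieldType { TEXT, STRING }

    // Returns the terms that would be indexed for a value under each field type.
    static List<String> indexedTerms(FieldType type, String value) {
        List<String> terms = new ArrayList<>();
        if (type == FieldType.STRING) {
            // String field: the whole value is kept as one exact, untokenized term.
            terms.add(value);
            return terms;
        }
        // Text field: simplified stand-in for an analyzer
        // (lowercase, then split on anything that is not a letter or digit).
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // One exact term: only a query for the full value matches.
        System.out.println(indexedTerms(FieldType.STRING, "Lucene in Action"));
        // Three lowercase terms: a query for "lucene" or "action" matches.
        System.out.println(indexedTerms(FieldType.TEXT, "Lucene in Action"));
    }
}
```

Under these assumptions, a String field suits identifiers and exact-match keys, while a Text field suits free-form content that users search by individual words.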
...
- How many active segment files are maintained per index? It seems one large file remains after a merge. If so, how should a segment be chunked and colocated with the region?
Faceting
Lucene and Solr support flat, JSON, and API-based interfaces for faceting.
- API
// Create Readers
DirectoryReader indexReader = DirectoryReader.open(indexDir);
IndexSearcher searcher = new IndexSearcher(indexReader);
TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
// Create counters along dimensions
FacetSearchParams fsp = new FacetSearchParams(new CountFacetRequest(new CategoryPath("Author"), 10));
// Aggregates the facet counts
FacetsCollector fc = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);
// Search
searcher.search(...);
// Retrieve results
List<FacetResult> facetResults = fc.getFacetResults();
- Solr JSON query
{
  high_popularity : {
    type : query,
    q : "popularity:[8 TO 10]",
    facet : { average_price : "avg(price)" }
  }
}
Example response
"high_popularity": {
  "count": 147,
  "average_price": 74.25
}
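To make the semantics of the query facet above concrete, here is a minimal in-memory sketch of what it computes: the number of documents whose `popularity` falls in the inclusive range 8 to 10, and the average `price` over those documents. It does not call Solr. `QueryFacetSketch`, `Product`, and `Bucket` are hypothetical names, and returning 0.0 for an empty bucket is an assumption of this sketch rather than documented Solr behaviour.

```java
import java.util.List;

final class QueryFacetSketch {
    // Hypothetical document shape: only the two fields the facet reads.
    static final class Product {
        final int popularity;
        final double price;
        Product(int popularity, double price) {
            this.popularity = popularity;
            this.price = price;
        }
    }

    // Hypothetical result holder mirroring "count" and "average_price".
    static final class Bucket {
        final long count;
        final double averagePrice;
        Bucket(long count, double averagePrice) {
            this.count = count;
            this.averagePrice = averagePrice;
        }
    }

    static Bucket highPopularity(List<Product> docs) {
        long count = 0;
        double sum = 0.0;
        for (Product p : docs) {
            // popularity:[8 TO 10] is inclusive on both ends.
            if (p.popularity >= 8 && p.popularity <= 10) {
                count++;
                sum += p.price;
            }
        }
        // Assumption of this sketch: an empty bucket reports an average of 0.0.
        double average = count == 0 ? 0.0 : sum / count;
        return new Bucket(count, average);
    }
}
```

For example, given products with (popularity, price) of (9, 50.0), (10, 100.0), and (3, 10.0), the sketch yields a count of 2 and an average price of 75.0.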
{
  prices : {
    type : range,
    field : price,
    start : 0,
    end : 40,
    gap : 20
  }
}
Example response
"prices": {
  "buckets": [
    // the bucket value represents the start of each range. This bucket covers 0-20
    { "val": 0.0, "count": 5 },
    { "val": 20.0, "count": 1 }
  ]
}
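The range facet can be sketched the same way. The code below is an in-memory illustration, not Solr's implementation: `RangeFacetSketch` and `rangeBuckets` are hypothetical names, and it assumes each bucket includes its lower bound, excludes its upper bound, and that `end` is a hard limit. Each bucket is keyed by the start of its range, matching the `val` field in the response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RangeFacetSketch {
    // Counts values into fixed-width buckets over [start, end), keyed by each bucket's lower bound.
    static Map<Double, Integer> rangeBuckets(double[] values, double start, double end, double gap) {
        if (gap <= 0 || end <= start) {
            throw new IllegalArgumentException("gap must be positive and end must exceed start");
        }
        int bucketCount = (int) Math.ceil((end - start) / gap);
        int[] counts = new int[bucketCount];
        for (double v : values) {
            // Assumption: values outside [start, end) are ignored.
            if (v < start || v >= end) {
                continue;
            }
            int index = (int) ((v - start) / gap);
            // Guard against floating-point rounding pushing the index past the last bucket.
            counts[Math.min(index, bucketCount - 1)]++;
        }
        // LinkedHashMap keeps buckets in ascending order of their lower bound.
        Map<Double, Integer> buckets = new LinkedHashMap<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.put(start + i * gap, counts[i]);
        }
        return buckets;
    }
}
```

With `start` 0, `end` 40, and `gap` 20, five prices below 20 and one price of 25 produce a bucket keyed 0.0 with count 5 and a bucket keyed 20.0 with count 1, the same shape as the response above.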