Goals

  1. Accurate text search
  2. Reuse code
  3. Scalability
  4. Performance, compared against RAMFSDirectory

User Input

  1. A region and a list of fields to index (the text-searchable fields)
  2. [ Optional ] Standard Analyzer or a custom implementation. The user needs to be able to provide the analyzer to be used with all the fields in an index
  3. [ Optional ] Field types: the index and field type for each field. Note: a string can be indexed as Text or String in Lucene, and the two behave differently. A Text field is tokenized by the analyzer, while a String field is indexed verbatim as a single term
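
The Text versus String distinction in item 3 can be illustrated without Lucene on the classpath. The sketch below is plain Java, not the Lucene API: `tokenize` is a hypothetical stand-in for an analyzer (lowercase, split on non-alphanumeric characters), and the two `matches...` helpers are illustrative names that approximate how each field type responds to a single-term lookup.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class FieldTypeSketch {
    // Hypothetical stand-in for an analyzer: lowercase, then split on non-alphanumeric characters.
    static Set<String> tokenize(String value) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Text-like behavior: the value is analyzed, so any single token matches.
    static boolean matchesAsText(String fieldValue, String term) {
        return tokenize(fieldValue).contains(term.toLowerCase(Locale.ROOT));
    }

    // String-like behavior: the whole value is one term, so only an exact match succeeds.
    static boolean matchesAsString(String fieldValue, String term) {
        return fieldValue.equals(term);
    }

    public static void main(String[] args) {
        String title = "Apache Geode Lucene Integration";
        System.out.println(matchesAsText(title, "lucene"));   // true
        System.out.println(matchesAsString(title, "lucene")); // false
        System.out.println(matchesAsString(title, title));    // true
    }
}
```

This is why the field type matters as user input: an identifier-like field usually wants String behavior, while free-form text wants Text behavior.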

...

  1. How many active segment files are maintained per index? It seems that one large file remains after a merge. If so, how do we chunk a segment and colocate it with the region?
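
One possible direction for the chunking question, sketched under stated assumptions: split each segment file into fixed-size chunks and key every chunk by file name plus chunk index, then route on the file name so that all chunks of a file land in the same bucket. `ChunkKey`, `split`, `read`, `bucketFor`, and the `Map` standing in for a region are hypothetical names for illustration only; none of them is an existing Lucene or region API, and the chunk size is arbitrary.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class SegmentChunkSketch {
    // Hypothetical key: one entry per fixed-size chunk of a segment file.
    static final class ChunkKey {
        final String fileName;
        final int chunkIndex;

        ChunkKey(String fileName, int chunkIndex) {
            this.fileName = fileName;
            this.chunkIndex = chunkIndex;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ChunkKey)) return false;
            ChunkKey other = (ChunkKey) o;
            return chunkIndex == other.chunkIndex && fileName.equals(other.fileName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fileName, chunkIndex);
        }
    }

    // Splits file content into chunks of at most chunkSize bytes.
    static Map<ChunkKey, byte[]> split(String fileName, byte[] content, int chunkSize) {
        Map<ChunkKey, byte[]> chunks = new LinkedHashMap<>();
        for (int offset = 0, i = 0; offset < content.length; offset += chunkSize, i++) {
            int end = Math.min(offset + chunkSize, content.length);
            chunks.put(new ChunkKey(fileName, i), Arrays.copyOfRange(content, offset, end));
        }
        return chunks;
    }

    // Reassembles a file of the given length from its chunks.
    static byte[] read(Map<ChunkKey, byte[]> region, String fileName, int length, int chunkSize) {
        byte[] out = new byte[length];
        int chunkCount = (length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < chunkCount; i++) {
            byte[] chunk = region.get(new ChunkKey(fileName, i));
            System.arraycopy(chunk, 0, out, i * chunkSize, chunk.length);
        }
        return out;
    }

    // Routes on the file name only, so every chunk of a file maps to the same bucket.
    static int bucketFor(ChunkKey key, int bucketCount) {
        return Math.floorMod(key.fileName.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        byte[] segment = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        Map<ChunkKey, byte[]> region = split("_0.cfs", segment, 4);
        System.out.println(region.size());                                              // 3
        System.out.println(Arrays.equals(segment, read(region, "_0.cfs", segment.length, 4))); // true
    }
}
```

Routing on the file name only keeps a file's chunks together. Colocating them with the data region would additionally require routing on whatever identifies the data bucket, which this sketch does not attempt.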

Faceting

Lucene and Solr support flat, JSON, and API-based interfaces for faceting.

  • API

// Create Readers
DirectoryReader indexReader = DirectoryReader.open(indexDir);
IndexSearcher searcher = new IndexSearcher(indexReader);
TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);

// Create counters along dimensions
FacetSearchParams fsp = new FacetSearchParams(new CountFacetRequest(new CategoryPath("Author"), 10));

// Aggregates the facet counts
FacetsCollector fc = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);

// Search, passing fc as the collector so that facet counts are gathered
searcher.search(...);

// Retrieve results
List<FacetResult> facetResults = fc.getFacetResults();

  • Solr JSON query
{
  high_popularity : {
    type : query,
    q : "popularity:[8 TO 10]",
    facet : { average_price : "avg(price)" }
  }
}
 
Example response

"high_popularity": {
  "count": 147,
  "average_price": 74.25
}

Range facet request

{
  prices : {
    type : range,
    field : price,
    start : 0,
    end : 40,
    gap : 20
  }
}

Example response

"prices":{
  "buckets":[
    {
      "val":0.0,  // the bucket value represents the start of each range.  This bucket covers 0-20
      "count":5},
    {
      "val":20.0,
      "count":1}
  ]
}
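
The bucketing behind the range facet above can be sketched in plain Java. This is an illustration of the counting logic only, not Solr or Lucene code: `countBuckets` is a hypothetical helper, buckets are assumed to be lower-inclusive and upper-exclusive, values outside [start, end) are ignored, and the sample prices are made up so that the counts match the example response.

```java
public class RangeFacetSketch {
    // Counts values into buckets [start + i*gap, start + (i+1)*gap).
    // Values outside [start, end) are ignored.
    static long[] countBuckets(double[] values, double start, double end, double gap) {
        int bucketCount = (int) Math.ceil((end - start) / gap);
        long[] counts = new long[bucketCount];
        for (double v : values) {
            if (v < start || v >= end) {
                continue;
            }
            int idx = (int) ((v - start) / gap);
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data: five prices in [0, 20), one in [20, 40), one out of range.
        double[] prices = {3.0, 7.5, 12.0, 15.0, 19.99, 25.0, 45.0};
        double start = 0, end = 40, gap = 20;
        long[] counts = countBuckets(prices, start, end, gap);
        for (int i = 0; i < counts.length; i++) {
            // "val" is the start of each bucket, as in the response above.
            System.out.println("val=" + (start + i * gap) + " count=" + counts[i]);
        }
        // prints: val=0.0 count=5, then val=20.0 count=1
    }
}
```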