  1. org.apache.lucene.index.MultiReader: An IndexReader which reads multiple indexes, appending their content.

    1. termDocs(Term term): Returns an enumeration of all the documents which contain term.

    2. termPositions(Term term): Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and the frequency of the term in that document, a list of all the ordinal positions of the term in the document is available.

  2. org.apache.lucene.index.IndexWriter

    1. updateDocument, addDocument

IndexWriter can control how the terms are distributed and persisted. In a distributed search, MultiReader can fan the query out to shard-based sub-readers, and each sub-reader streams its filtered results from its shard to the query coordinator.
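The fan-out described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ShardCoordinator and ShardReader are hypothetical names, not Lucene types, and results are collected into a list rather than streamed. The one behavior borrowed from MultiReader is that sub-reader contents are appended, so a shard's local document numbers are shifted by the total size of the shards before it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query coordinator; not part of Lucene. */
public class ShardCoordinator {

    /** Illustrative stand-in for a per-shard sub-reader (assumed interface). */
    public interface ShardReader {
        int maxDoc();                          // number of document slots in this shard
        List<Integer> termDocs(String term);   // local doc ids containing the term, ascending
    }

    private final List<ShardReader> shards;
    private final int[] docBase;               // first global doc id of each shard

    public ShardCoordinator(List<ShardReader> shards) {
        this.shards = shards;
        this.docBase = new int[shards.size()];
        int base = 0;
        for (int i = 0; i < shards.size(); i++) {
            docBase[i] = base;                 // shard i starts where shard i-1 ended
            base += shards.get(i).maxDoc();
        }
    }

    /** Sends the term to every shard and maps local doc ids to global ones. */
    public List<Integer> termDocs(String term) {
        List<Integer> merged = new ArrayList<>();
        for (int i = 0; i < shards.size(); i++) {
            for (int localDoc : shards.get(i).termDocs(term)) {
                merged.add(docBase[i] + localDoc);
            }
        }
        return merged;
    }
}
```

Because shards are visited in order and each returns ascending local ids, the merged list is already in ascending global order. A real coordinator would likely query shards in parallel and consume results lazily.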

A map of the form <term, map<docId, list<position>>> is needed to support various Lucene functions, such as termDocs and termPositions listed above.
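As a minimal in-memory sketch of that map shape (not how Lucene stores postings), the class below keeps term -> docId -> positions in nested sorted maps. PositionalIndex and its methods are hypothetical names chosen to mirror the reader calls above, and whitespace tokenization is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory positional index; not a Lucene class. */
public class PositionalIndex {

    // term -> (docId -> ordinal positions of the term in that document)
    private final Map<String, TreeMap<Integer, List<Integer>>> postings = new TreeMap<>();

    /** Splits text on whitespace (an assumption) and records every term position. */
    public void addDocument(int docId, String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Analogue of termDocs: ids of the documents containing the term, ascending. */
    public List<Integer> termDocs(String term) {
        TreeMap<Integer, List<Integer>> docs = postings.get(term);
        return docs == null ? Collections.emptyList() : new ArrayList<>(docs.keySet());
    }

    /** Analogue of termPositions for one document; the list size is the term frequency. */
    public List<Integer> termPositions(String term, int docId) {
        TreeMap<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null) {
            return Collections.emptyList();
        }
        return docs.getOrDefault(docId, Collections.emptyList());
    }
}
```

For example, after addDocument(0, "to be or not to be"), termPositions("to", 0) yields positions 0 and 4, so the frequency of "to" in document 0 is 2.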

Limitations

  1. A popular term will have a large value (a map from each document to the positions of the term in that document). Managing such a large value needs to be efficient.

Work In Progress 

  1. How many active segment files are maintained per index? It seems that one large file remains after a merge. If so, how can a segment be chunked and colocated with a region?
