...

  1. Solr instance management complexity
  2. Additional points of failure

Option - 4: IndexWriter and MultiReader implementation

A custom implementation of IndexWriter and IndexReader could be provided as an alternative to the FSDirectory implementation. FSDirectory is a file-like interface: Lucene constructs a file and hands it over to FSDirectory for writes and reads, and Lucene itself manages file merges, so the directory implementation has no visibility into the contents of a file. The IndexWriter approach sits one layer above FSDirectory. At the IndexReader/IndexWriter layer, Lucene interacts at document- and term-level granularity. The following are the important classes and methods to look at:

  1. org.apache.lucene.index.MultiReader: An IndexReader which reads multiple indexes, appending their content.

    1. termDocs(Term term): Returns an enumeration of all the documents which contain term.

    2. termPositions(Term term): Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and the frequency of the term in that document, a list of all the ordinal positions of the term in the document is available.

  2. org.apache.lucene.index.IndexWriter

    1. updateDocument, addDocument
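The "appending" behaviour described for MultiReader can be illustrated without Lucene on the classpath. The sketch below is not Lucene code: SubIndexSketch and AppendingReaderSketch are hypothetical names, and postings are assumed to be plain in-memory arrays. It only shows the idea that each sub-index keeps its own local document numbers and the combined reader offsets them by a running base, so a term lookup returns one global enumeration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one sub-index (not a Lucene class). */
final class SubIndexSketch {
    final int maxDoc;                   // number of documents in this sub-index
    final Map<String, int[]> postings;  // term -> ascending local document numbers

    SubIndexSketch(int maxDoc, Map<String, int[]> postings) {
        this.maxDoc = maxDoc;
        this.postings = postings;
    }
}

/** Appends sub-indexes by shifting each one's local document numbers by a base offset. */
final class AppendingReaderSketch {
    private final List<SubIndexSketch> subs;
    private final int[] docBase; // docBase[i] = total maxDoc of all sub-indexes before i

    AppendingReaderSketch(List<SubIndexSketch> subs) {
        this.subs = subs;
        this.docBase = new int[subs.size()];
        int base = 0;
        for (int i = 0; i < subs.size(); i++) {
            docBase[i] = base;
            base += subs.get(i).maxDoc;
        }
    }

    /** Global document numbers of all documents containing the term, in ascending order. */
    List<Integer> termDocs(String term) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < subs.size(); i++) {
            int[] local = subs.get(i).postings.getOrDefault(term, new int[0]);
            for (int doc : local) {
                out.add(docBase[i] + doc); // remap local number to global number
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SubIndexSketch a = new SubIndexSketch(3, Map.of("lucene", new int[] {0, 2}));
        SubIndexSketch b = new SubIndexSketch(2, Map.of("lucene", new int[] {1}));
        AppendingReaderSketch reader = new AppendingReaderSketch(List.of(a, b));
        System.out.println(reader.termDocs("lucene")); // [0, 2, 4]
    }
}
```

In this sketch the second sub-index starts at base 3, so its local document 1 becomes global document 4. A custom reader for this option would presumably need an equivalent mapping between shard-local and global document numbers.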

IndexWriter can control how the terms are distributed and persisted. In the case of a distributed search, MultiReader can distribute the query to shard-based sub-readers, and each sub-reader streams filtered results from its shard to the query coordinator.
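The write-side routing and read-side scatter-gather described above could look roughly like the following. This is a sketch under stated assumptions, not the proposed implementation: ShardSketch and ShardedIndexSketch are hypothetical classes, documents are routed by a hash of an assumed string key, shards are in-memory maps, and results are gathered as complete lists rather than streamed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;

/** Hypothetical shard: holds postings only for the documents routed to it. */
final class ShardSketch {
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    void index(String docKey, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docKey);
        }
    }

    /** Shard-local filtering: only matching document keys leave the shard. */
    List<String> search(String term) {
        return new ArrayList<>(postings.getOrDefault(term, new TreeSet<>()));
    }
}

/** Hypothetical writer/coordinator pair over a fixed number of shards. */
final class ShardedIndexSketch {
    private final List<ShardSketch> shards = new ArrayList<>();

    ShardedIndexSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ShardSketch());
        }
    }

    /** Writer side: the routing rule (assumed here to be a hash of the document key). */
    int shardFor(String docKey) {
        return Math.floorMod(docKey.hashCode(), shards.size());
    }

    void addDocument(String docKey, String text) {
        shards.get(shardFor(docKey)).index(docKey, text);
    }

    /** Reader side: send the query to every shard in parallel, then merge the results. */
    List<String> search(String term) {
        List<CompletableFuture<List<String>>> pending = new ArrayList<>();
        for (ShardSketch shard : shards) {
            pending.add(CompletableFuture.supplyAsync(() -> shard.search(term)));
        }
        TreeSet<String> merged = new TreeSet<>(); // coordinator merges in key order
        for (CompletableFuture<List<String>> result : pending) {
            merged.addAll(result.join());
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        ShardedIndexSketch index = new ShardedIndexSketch(3);
        index.addDocument("doc-1", "apache lucene");
        index.addDocument("doc-2", "apache solr");
        index.addDocument("doc-3", "lucene index");
        System.out.println(index.search("lucene")); // [doc-1, doc-3]
    }
}
```

Because each document lives on exactly one shard in this sketch, the coordinator only has to merge results; no de-duplication or cross-shard scoring is attempted.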

Work In Progress 

  1. How many active segment files are maintained per index? It seems that one large file remains after a merge. If so, how can a segment be chunked and colocated with a region?

...