Lucene has four underlying types that a docvalues field can have. Currently Solr uses three of these:

  1. NUMERIC: a single-valued per-document numeric type. This is like having a large long[] array for the whole index, though the data is compressed based on the values that are actually used.

    • For example, consider 3 documents with these values:
             doc[0] = 1005
             doc[1] = 1006
             doc[2] = 1005
      
      In this example the field would use around 1 bit per document, since only two distinct values (1005 and 1006) occur and one bit is enough to tell them apart.
  2. SORTED: a single-valued per-document string type. This is like having a large String[] array for the whole index, but with an additional level of indirection. Each unique value is assigned a term number that represents its ordinal value. So each document really stores a compressed integer, and separately there is a "dictionary" mapping these term numbers back to term values.

    • For example, consider 3 documents with these values:
             doc[0] = "aardvark"
             doc[1] = "beaver"
             doc[2] = "aardvark"
      
      Value "aardvark" will be assigned ordinal 0, and "beaver" 1, creating these two data structures:
             doc[0] = 0
             doc[1] = 1
             doc[2] = 0
      
             term[0] = "aardvark"
             term[1] = "beaver"
      

  3. SORTED_SET: a multi-valued per-document string type. It is similar to SORTED, except that each document has a "set" of values (in increasing sorted order). It therefore intentionally discards duplicate values (frequency) within a document and loses the original order of the values within the document.
    • For example, consider 3 documents with these values:
             doc[0] = "cat", "aardvark", "beaver", "aardvark"
             doc[1] =
             doc[2] = "cat"
      
      Value "aardvark" will be assigned ordinal 0, "beaver" 1, and "cat" 2, creating these two data structures:
             doc[0] = [0, 1, 2]
             doc[1] = []
             doc[2] = [2]
      
             term[0] = "aardvark"
             term[1] = "beaver"
             term[2] = "cat"
      

  4. BINARY: a single-valued per-document byte[] array. This can be used for encoding custom per-document data structures.
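
The ordinal and dictionary structures in the examples above can be reproduced with a small toy model. The sketch below is illustrative only: the class DocValuesSketch and its methods are hypothetical names, it does not use or mirror Lucene's actual API or on-disk format, and the NUMERIC estimate assumes values are stored as offsets from the minimum value. SORTED is modelled as the special case of SORTED_SET with exactly one value per document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Toy model of the docvalues encodings; hypothetical, not Lucene's implementation. */
final class DocValuesSketch {

    /** NUMERIC: bits per document if each value is stored as an offset from the minimum (assumption). */
    public static int bitsPerValue(long[] values) {
        long min = Arrays.stream(values).min().orElse(0L);
        long max = Arrays.stream(values).max().orElse(0L);
        // Bits needed to represent the largest offset; 0 when all values are equal.
        return 64 - Long.numberOfLeadingZeros(max - min);
    }

    /** Builds the sorted "dictionary" of unique terms; a term's list index is its ordinal. */
    public static List<String> buildDictionary(String[][] docs) {
        TreeSet<String> unique = new TreeSet<>();
        for (String[] doc : docs) {
            unique.addAll(Arrays.asList(doc));
        }
        return new ArrayList<>(unique);
    }

    /** Replaces each document's terms with sorted, de-duplicated ordinals. */
    public static int[][] toOrdinals(String[][] docs, List<String> dictionary) {
        int[][] ords = new int[docs.length][];
        for (int i = 0; i < docs.length; i++) {
            ords[i] = Arrays.stream(docs[i])
                    .mapToInt(term -> Collections.binarySearch(dictionary, term))
                    .distinct()
                    .sorted()
                    .toArray();
        }
        return ords;
    }

    public static void main(String[] args) {
        // NUMERIC example: 1005, 1006, 1005
        System.out.println(bitsPerValue(new long[] {1005, 1006, 1005}));

        // SORTED example: one value per document
        String[][] sorted = {{"aardvark"}, {"beaver"}, {"aardvark"}};
        List<String> sortedDict = buildDictionary(sorted);
        System.out.println(Arrays.deepToString(toOrdinals(sorted, sortedDict)) + " " + sortedDict);

        // SORTED_SET example: duplicates and in-document order are discarded
        String[][] sortedSet = {{"cat", "aardvark", "beaver", "aardvark"}, {}, {"cat"}};
        List<String> setDict = buildDictionary(sortedSet);
        System.out.println(Arrays.deepToString(toOrdinals(sortedSet, setDict)) + " " + setDict);
    }
}
```

Running main prints 1 for the NUMERIC example, [[0], [1], [0]] with dictionary [aardvark, beaver] for SORTED, and [[0, 1, 2], [], [2]] with dictionary [aardvark, beaver, cat] for SORTED_SET, matching the structures shown in the list.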