TODO for Payloads

Background

Payloads allow Lucene users to optionally store a byte array of information on a term by term basis. While payloads in general are here to stay, the specific implementation/API for payloads may not be.

For background on Payloads see: https://issues.apache.org/jira/browse/LUCENE-755

http://www.gossamer-threads.com/lists/lucene/java-dev/43511?search_string=payload;#43511

http://www.gossamer-threads.com/lists/lucene/java-dev/43860?search_string=payload;

TODO

NOTE: These are just suggestions of what might be useful; the names, etc. are not necessarily final.

Query

These queries should probably extend or use SpanQueries, because they rely on TermPositions anyway.

  1. Probably create a package called payloads under search, similar to Spans
  2. PayloadQuery – A Query implementation much like TermQuery that allows for matching one payload
  3. PayloadPhraseQuery – A Query implementation much like PhraseQuery that matches payloads occurring within some slop of each other
  4. Payload*Query – Ambitious contributors may find it useful to be able to do the other types of queries (prefix, wildcard, etc.)
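The matching behaviour suggested by the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a small in-memory model rather than any Lucene API, the names PayloadMatchSketch, Posting, matchesPayload, and matchesWithinSlop are hypothetical, and slop is treated as a plain position distance (PhraseQuery's real slop is an edit distance).

```java
import java.util.Arrays;
import java.util.List;

final class PayloadMatchSketch {

    /** Hypothetical model of one term occurrence in a document. */
    static final class Posting {
        final int position;    // token position within the document
        final byte[] payload;  // payload stored at that position

        Posting(int position, byte[] payload) {
            this.position = position;
            this.payload = payload;
        }
    }

    /** PayloadQuery-like check: does any occurrence carry exactly this payload? */
    static boolean matchesPayload(List<Posting> postings, byte[] wanted) {
        for (Posting p : postings) {
            if (Arrays.equals(p.payload, wanted)) {
                return true;
            }
        }
        return false;
    }

    /**
     * PayloadPhraseQuery-like check: do the payloads first and second occur
     * at positions no more than slop apart? Slop is assumed here to be an
     * absolute position distance, which is a simplification.
     */
    static boolean matchesWithinSlop(List<Posting> postings,
                                     byte[] first, byte[] second, int slop) {
        for (Posting a : postings) {
            if (!Arrays.equals(a.payload, first)) {
                continue;
            }
            for (Posting b : postings) {
                if (a != b
                        && Arrays.equals(b.payload, second)
                        && Math.abs(a.position - b.position) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A real implementation would presumably walk TermPositions (or Spans) instead of a list, but the two checks show the difference between matching a single payload and matching payloads near each other.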

Similarity and Scoring

While the Query implementations outlined above allow us to search payloads, it is also useful to use the payload information for scoring. It should be possible to override the following method in Similarity:

Lucene 2.4 and before:

  public float scorePayload(byte[] data, int offset, int length) { }

Lucene 2.9+:

  public float scorePayload(int docId, String fieldName, int start, int end, byte[] payload, int offset, int length) { }

An override of this method can use the payload to score a term. It was suggested in the threads above that a WeightedTermScorer be created, rather than altering TermScorer, so that performance issues are addressed.
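One way such an override might look is sketched below. The class mirrors the shape of the Lucene 2.4 signature but deliberately does not extend Lucene's Similarity, so it compiles on its own. The payload encoding (a 4-byte big-endian float boost), the neutral score of 1.0 for a missing or short payload, and the names PayloadScoreSketch and encodeBoost are assumptions made for illustration; the page does not prescribe any encoding.

```java
import java.nio.ByteBuffer;

final class PayloadScoreSketch {

    /** Encodes a boost as 4 bytes (big-endian float), as might be done at index time. */
    static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    /**
     * Same parameter shape as scorePayload(byte[], int, int) in Lucene 2.4.
     * Reads the assumed float boost starting at offset; falls back to 1.0
     * when there is no payload or it is too short to hold a float.
     */
    public float scorePayload(byte[] data, int offset, int length) {
        if (data == null || length < 4) {
            return 1.0f; // assumed neutral score
        }
        return ByteBuffer.wrap(data, offset, length).getFloat();
    }
}
```

In an actual Similarity subclass, the returned value would be multiplied into the term's score by whichever scorer consumes it.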

3/17/2007: GSI: See https://issues.apache.org/jira/browse/LUCENE-834 for an implementation of what I called BoostingTermQuery which does the weighting of terms based on the payloads.

TermDocs and TermPositions

Thanks to https://issues.apache.org/jira/browse/LUCENE-761, we can merge TermDocs and TermPositions and just use TermPositions. It may then be possible to make all Query implementations Span Queries, but whether this is worthwhile is still undecided.
