2 November 2019, Apache Lucene™ 8.3.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 8.3.0.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: https://lucene.apache.org/core/downloads.html

Please read CHANGES.txt for a full list of new features and changes:

https://lucene.apache.org/core/8_3_0/changes/Changes.html

Lucene 8.3.0 Release Highlights:

API Changes:

  • Deprecated IndexWriter#getFieldNames() (since this is no longer needed after LUCENE-8316)

  • SpatialPrefixTreeFactory now consumes the "version" parsed with Lucene's Version class

  • QueryRescorer now only sorts the first topN hits instead of all initial hits

  • IndexSearcher.termStatistics() no longer takes a TermStates; it now takes the docFreq and totalTermFreq directly. Do not call it if docFreq <= 0. (The previous implementation survives as deprecated and final, and is removed in 9.0.)

  • PointValues#estimateDocCount(visitor) estimates the number of documents that would be matched by the given IntersectVisitor. The method is used to compute the cost() of ScorerSuppliers instead of PointValues#estimatePointCount(visitor)
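
To illustrate the QueryRescorer item above, here is a minimal sketch of keeping and ordering only the best topN scores rather than sorting every initial hit. It is not Lucene's code: the class name TopNSortSketch and the method topN are hypothetical, and a bounded heap stands in for whatever selection strategy QueryRescorer actually uses.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Illustrative only: not Lucene's QueryRescorer implementation. */
final class TopNSortSketch {

    /** Returns the topN highest scores in descending order without sorting the full input. */
    static float[] topN(float[] scores, int topN) {
        if (topN <= 0 || scores.length == 0) {
            return new float[0];
        }
        int n = Math.min(topN, scores.length);
        // Min-heap of the best n scores seen so far; its head is the weakest of them.
        PriorityQueue<Float> heap = new PriorityQueue<>(n);
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the weakest kept score
                heap.add(s);
            }
        }
        // The heap drains in ascending order, so fill the output from the back.
        float[] out = new float[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = heap.poll();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 3.5f, 1.1f, 9.0f, 4.4f, 0.7f};
        System.out.println(Arrays.toString(topN(scores, 3))); // prints [9.0, 4.4, 3.5]
    }
}
```

With many initial hits and a small topN, this kind of approach does work proportional to the number of hits times log(topN), instead of paying for a full sort of all hits.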

New Features:

  • New SpanishMinimalStemFilter
  • New "export all terms and doc freqs" feature to Luke with delimiters
  • Composite Matches from multiple subqueries now allow access to their submatches, and a new NamedMatches API allows marking of subqueries and a simple way to find which subqueries have matched on a given document
  • Range Query For Multiple Connected Ranges
  • LatLonDocValuesPointInPolygonQuery for LatLonDocValuesField
  • New UniformSplitPostingsFormat (name "UniformSplit") whose primary benefits are simplicity and extensibility
  • New STUniformSplitPostingsFormat (name "SharedTermsUniformSplit") that shares a single internal term dictionary across fields

Optimizations & Improvements:

  • DisjunctionMaxQuery more efficiently leverages impacts to skip non-competitive hits

  • BooleanQuery with no scoring clause can now terminate early when the total hit count is not requested

  • Matches on wildcard queries will defer building their full disjunction until a MatchesIterator is pulled

  • spatial-extras quad and packed quad prefix trees now index points faster

  • Add additional leaf node level optimizations in LatLonShapeBoundingBoxQuery

  • Improve performance of WITHIN and DISJOINT queries for Shape queries by doing just one pass whenever possible

  • Introduce shared count based early termination across multiple slices

  • Blocktree's seekExact now short-circuits and returns false if the term isn't in the min-max range of the segment. This is a large performance gain for ID-like or time-like data when populated sequentially

  • Show SPI names instead of class names in Luke Analysis tab
  • GraphTokenStreamFiniteStrings preserves all Token attributes through its finite strings TokenStreams
  • Introduced SpanPositionRange into XML Query Parser
  • Use a sort key instead of true distance in NearestNeighbor
  • Tessellator labels the edges of the generated triangles according to whether they belong to the original polygon
  • Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor
  • The Korean analyzer now splits tokens on boundaries between digits and alphabetic characters
  • MoreLikeThis is biased for uncommon fields
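
The min-max short-circuit described for Blocktree's seekExact can be pictured with a small sketch. It assumes a sorted array of strings as a stand-in for a segment's term dictionary. The class MinMaxSeekSketch and its fullLookups counter are hypothetical, and String ordering is used as a simplification of Lucene's byte-wise term ordering. It is not the Blocktree implementation.

```java
import java.util.Arrays;

/** Illustrative only: a sorted term array standing in for one segment's term dictionary. */
final class MinMaxSeekSketch {
    private final String[] sortedTerms; // sorted ascending by the constructor
    int fullLookups = 0;                // counts how often the expensive path runs

    MinMaxSeekSketch(String[] terms) {
        sortedTerms = terms.clone();
        Arrays.sort(sortedTerms);
    }

    /** Returns true if the term exists, skipping the lookup when it lies outside [min, max]. */
    boolean seekExact(String term) {
        if (sortedTerms.length == 0) {
            return false;
        }
        String min = sortedTerms[0];
        String max = sortedTerms[sortedTerms.length - 1];
        // Cheap check first: a term outside the segment's range cannot be present.
        if (term.compareTo(min) < 0 || term.compareTo(max) > 0) {
            return false;
        }
        // Only in-range terms pay for the real lookup (binary search in this sketch).
        fullLookups++;
        return Arrays.binarySearch(sortedTerms, term) >= 0;
    }

    public static void main(String[] args) {
        MinMaxSeekSketch segment =
            new MinMaxSeekSketch(new String[] {"00000100", "00000102", "00000104"});
        System.out.println(segment.seekExact("00000050")); // below min, no lookup
        System.out.println(segment.seekExact("00000102")); // in range and present
        System.out.println(segment.seekExact("00000200")); // above max, no lookup
        System.out.println("full lookups: " + segment.fullLookups);
    }
}
```

When IDs or timestamps are written sequentially, each segment tends to cover a narrow, mostly disjoint range, so most segments can reject a lookup with two comparisons. That is the scenario the release note singles out.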

Further details of the changes are in the change log at: https://lucene.apache.org/core/8_3_0/changes/Changes.html

Please report any feedback to the mailing lists (https://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also applies to Maven access.
