NOTE: THIS PAGE IS IN PROGRESS.  PLEASE CHECK BACK FOR MORE DETAILS.

For now, see: https://downloadsarchive.apache.org/dist/tika/2.0.0/CHANGES-2.0.0.txt

Major breaking changes

  • OCR is now triggered automatically for PDFs if Tesseract is on the user's path. See https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr for how to disable OCR.
  • We upgraded from log4j to log4j2 in tika-app, tika-server, and anywhere else that previously used log4j.
  • Removed deprecated Metadata keys/properties (TIKA-1974).  See below for a list of changed keys.
  • Removed deprecated PDFPreflightParser (TIKA-3437).
  • Removed dangerous calls that read an InputStream or convert to bytes without specifying a charset.
  • Parsers can be configured via tika-config.xml on instantiation. We have moved away from configuration via .properties files because of confusion among users. This affects the PDFParser, TesseractOCRParser and the StringsParser. See below for links to the specific parsers.
  • Changed namespaces of translator implementations (o.a.t.language.translate.impl) to avoid a split package with tika-core.
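
The charset item above concerns calls that fall back on the platform default encoding, which can differ between machines. As a rough illustration only, the sketch below decodes an InputStream and encodes a String with an explicit charset using nothing but the JDK. The class and method names (CharsetExample, readAsString, toBytes) are hypothetical and are not part of the Tika API; calling code that relied on the removed methods may need a similar adjustment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Tika.
public class CharsetExample {

    // Reads the whole stream and decodes it with the given charset,
    // instead of relying on the platform default encoding.
    static String readAsString(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Encodes a string with an explicit charset rather than calling
    // String.getBytes() with no argument.
    static byte[] toBytes(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = toBytes("caf\u00e9", StandardCharsets.UTF_8);
        String decoded = readAsString(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(decoded.equals("caf\u00e9"));
    }
}
```

Passing the charset as a parameter keeps the choice visible at every call site, which is the property the removed calls lacked.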

Metadata

Removed duplicate/triplicate keys

...