NOTE: THIS PAGE IS IN PROGRESS. PLEASE CHECK BACK FOR MORE DETAILS.

Major breaking changes

OCR is now triggered automatically for PDFs if tesseract is on the user's path. See https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr for how to disable OCR.
We upgraded from log4j to log4j2 in tika-app, tika-server, and everywhere else we previously used log4j.
Removed deprecated Metadata keys/properties (TIKA-1974). See below for a list of changed keys.
Removed deprecated PDFPreflightParser (TIKA-3437).
Removed dangerous calls that read an InputStream or convert to bytes without specifying a charset. Such calls silently fall back to the platform default charset.
Parsers can be configured via tika-config.xml on instantiation. We have moved away from configuration via .properties files because of confusion among users. This affects the PDFParser, TesseractOCRParser and the StringsParser. See below for links to the specific parsers.
Changed the namespace of the translator implementations (o.a.t.language.translate.impl) to avoid a split package with tika-core.
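
To illustrate the charset item above, here is a minimal sketch in plain Java of reading a stream and converting a string to bytes with an explicit charset. It is not taken from the Tika codebase; the class name ExplicitCharsetExample and the helpers readAll and toBytes are hypothetical names chosen for this example, and only standard JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of Tika.
public class ExplicitCharsetExample {

    // Decode the whole stream with the charset the caller names,
    // instead of relying on the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Encode with an explicit charset; the no-argument String.getBytes()
    // would use the platform default and can differ between machines.
    static byte[] toBytes(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) throws IOException {
        String original = "caf\u00e9";
        byte[] utf8 = toBytes(original, StandardCharsets.UTF_8);
        String roundTrip = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println("round trip matches: " + roundTrip.equals(original));
        System.out.println("UTF-8 bytes: " + utf8.length);
        System.out.println("ISO-8859-1 bytes: "
                + toBytes(original, StandardCharsets.ISO_8859_1).length);
    }
}
```

The same string encodes to a different number of bytes under UTF-8 and ISO-8859-1, which is why code that omits the charset can behave differently from one environment to the next.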

...