For now, see: https://downloads.apache.org/tika/2.0.0/CHANGES-2.0.0.txt

Metadata

Metadata.RESOURCE_NAME_KEY has been renamed TikaCoreProperties.RESOURCE_NAME_KEY.
TikaCoreProperties.KEYWORDS has been removed.
The metadata key X-Parsed-By has changed to X-TIKA:Parsed-By.
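If metadata extracted with Tika 1.x has already been stored somewhere (for example in a search index), the old key names will still be present in that data. The following is a minimal sketch, not part of Tika, of how such stored keys could be remapped. The class name MetadataKeyMigration and the flat Map<String, String> representation are assumptions made for illustration only. Real Tika metadata can hold several values per key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Tika: renames metadata keys stored by a 1.x-based pipeline. */
public class MetadataKeyMigration {

    // Old key -> new key, taken from the rename listed above.
    private static final Map<String, String> RENAMED_KEYS =
            Map.of("X-Parsed-By", "X-TIKA:Parsed-By");

    /** Returns a copy of the input with renamed keys; all other entries are kept unchanged. */
    public static Map<String, String> migrate(Map<String, String> stored) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : stored.entrySet()) {
            String key = RENAMED_KEYS.getOrDefault(entry.getKey(), entry.getKey());
            result.put(key, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Example input as it might have been stored by a 1.x-based pipeline.
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("X-Parsed-By", "org.apache.tika.parser.DefaultParser");
        stored.put("Content-Type", "application/pdf");
        System.out.println(migrate(stored));
    }
}
```

Keeping the renames in a single lookup table makes it easy to add further entries if other keys turn out to have changed in your data.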

tika-parsers – specific parser changes

...

If you are checking for CVEs (recommended), note that tika-parser-scientific-module:2.0.0 comes with a transitive dependency on quartz 2.2.0. Exclude it and declare a newer version explicitly, like this:


    <dependency>
      <groupId>edu.ucar</groupId>
      <artifactId>netcdf4</artifactId>
      <version>${netcdf-java.version}</version>
      <exclusions>
        ....
        <exclusion>
          <groupId>org.quartz-scheduler</groupId>
          <artifactId>quartz</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.quartz-scheduler</groupId>
      <artifactId>quartz</artifactId>
      <version>2.3.2</version>
    </dependency>

When using language detection, you also need to change the dependencies as of 2.0.0. In your pom.xml, now use:


<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-langdetect-optimaize</artifactId>
  <version>2.0.0</version>
</dependency>

Also note that org.apache.tika.langdetect.OptimaizeLangDetector.getDefaultLanguageDetector has moved to org.apache.tika.langdetect.optimaize.OptimaizeLangDetector.getDefaultLanguageDetector.

For OCR, the TesseractOCRConfig.setTesseractPath(String) and TesseractOCRConfig.setTessdataPath(String) methods can no longer be used. They have moved to the TesseractOCRParser class.

tika-app

tika-server

General

...
