...
For now, see: https://downloads.apache.org/tika/2.0.0/CHANGES-2.0.0.txt
Metadata
- Metadata.RESOURCE_NAME_KEY has been renamed to TikaCoreProperties.RESOURCE_NAME_KEY.
- TikaCoreProperties.KEYWORDS has been removed.
- The metadata key X-Parsed-By has changed to X-TIKA:Parsed-By.
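If you have metadata that was written out by Tika 1.x (for example, stored as JSON or in an index), the key rename above can be applied with a small helper. The following is only a sketch: the class `MetadataKeyMigration` and its `migrate` method are hypothetical and not part of Tika, and it assumes the metadata has been flattened into a single-valued `Map<String, String>`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: renames metadata keys written by Tika 1.x.
public class MetadataKeyMigration {

    // Only the key rename documented in this section; extend as needed.
    private static final Map<String, String> RENAMED_KEYS =
            Map.of("X-Parsed-By", "X-TIKA:Parsed-By");

    // Returns a copy of the map with old key names replaced by their 2.0.0 names.
    // Keys without a known rename are copied unchanged; insertion order is kept.
    public static Map<String, String> migrate(Map<String, String> oldMetadata) {
        Map<String, String> migrated = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : oldMetadata.entrySet()) {
            String key = RENAMED_KEYS.getOrDefault(entry.getKey(), entry.getKey());
            migrated.put(key, entry.getValue());
        }
        return migrated;
    }

    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<>();
        old.put("X-Parsed-By", "org.apache.tika.parser.DefaultParser");
        old.put("Content-Type", "application/pdf");
        System.out.println(migrate(old));
    }
}
```

Tika metadata can hold several values per key, so a real migration would probably need a multi-valued map instead.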
tika-parsers – specific parser changes
...
If you check for CVEs (recommended), note that tika-parser-scientific-module:2.0.0 comes with a transitive dependency on quartz 2.2.0. This can be fixed by excluding quartz from the netcdf4 dependency and declaring quartz 2.3.2 explicitly:
    <dependency>
      <groupId>edu.ucar</groupId>
      <artifactId>netcdf4</artifactId>
      <version>${netcdf-java.version}</version>
      <exclusions>
        ....
        <exclusion>
          <groupId>org.quartz-scheduler</groupId>
          <artifactId>quartz</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.quartz-scheduler</groupId>
      <artifactId>quartz</artifactId>
      <version>2.3.2</version>
    </dependency>
When using language detection, you also need to change your dependencies as of 2.0.0. Now use:
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-langdetect-optimaize</artifactId>
      <version>2.0.0</version>
    </dependency>
Also note that org.apache.tika.langdetect.OptimaizeLangDetector.getDefaultLanguageDetector has moved to org.apache.tika.langdetect.optimaize.OptimaizeLangDetector.getDefaultLanguageDetector.
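For a larger code base, the package move described above can be applied mechanically to source text. The sketch below is one possible approach using plain string replacement; the class `LangDetectImportMigration` and its `migrate` method are hypothetical names, not part of Tika. Because the old fully qualified name does not occur inside the new one, running it twice leaves already-migrated code unchanged.

```java
// Hypothetical helper, not part of Tika: rewrites the old fully qualified
// class name of OptimaizeLangDetector to its 2.0.0 location.
public class LangDetectImportMigration {

    static final String OLD_NAME = "org.apache.tika.langdetect.OptimaizeLangDetector";
    static final String NEW_NAME = "org.apache.tika.langdetect.optimaize.OptimaizeLangDetector";

    // Replaces every occurrence of the old name in the given source text.
    public static String migrate(String source) {
        return source.replace(OLD_NAME, NEW_NAME);
    }

    public static void main(String[] args) {
        String before = "import org.apache.tika.langdetect.OptimaizeLangDetector;";
        System.out.println(migrate(before));
    }
}
```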
For OCR, you can no longer use the TesseractOCRConfig.setTesseractPath(String) and TesseractOCRConfig.setTessdataPath(String) methods. They have moved to the TesseractOCRParser class.
tika-app
tika-server
General
...