NOTE: THIS PAGE IS IN PROGRESS. PLEASE CHECK BACK FOR MORE DETAILS.
For now, see: https://downloads.apache.org/tika/2.0.0-BETA/CHANGES-2.0.0-BETA.txt
Metadata
tika-parsers – specific parser changes
tika-parsers module
When using tika-parsers in your project, you need to change the dependency from
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>1.27</version>
</dependency>
to
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers-standard-package</artifactId>
  <version>2.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-scientific-module</artifactId>
  <version>2.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-sqlite3-module</artifactId>
  <version>2.0.0</version>
</dependency>
Also, there is a small transitive dependency conflict on jcl-over-slf4j between tika-parsers-standard-package:2.0.0 and tika-parser-scientific-module:2.0.0. If you are using the Maven Enforcer plugin, you will need to resolve it by declaring the dependency explicitly:
<!-- Fix tika-parsers-standard-package 2.0.0 vs tika-parser-scientific-module:2.0.0 transitive dependency -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>1.7.31</version>
</dependency>
If you are checking for CVEs (recommended), note that tika-parser-scientific-module:2.0.0 comes with a transitive dependency on quartz 2.2.0, which should be fixed like this:
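One possible way to do this in Maven is to pin the transitive version through dependencyManagement. The fragment below is only a sketch: it assumes quartz is resolved under the org.quartz-scheduler:quartz coordinates, and the 2.3.2 version is an illustrative choice that you should replace with whatever release your CVE scanner accepts and your build tolerates.

```xml
<dependencyManagement>
  <dependencies>
    <!-- Assumption: override the quartz 2.2.0 pulled in transitively by
         tika-parser-scientific-module:2.0.0. The version is illustrative. -->
    <dependency>
      <groupId>org.quartz-scheduler</groupId>
      <artifactId>quartz</artifactId>
      <version>2.3.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running mvn dependency:tree afterwards should show which quartz version is actually resolved.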
When using language detection, you also need to change the dependency as of 2.0.0 (the 1.x dependency is shown in the tika-langid section). It is now:
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-langdetect-optimaize</artifactId>
  <version>2.0.0</version>
</dependency>
tika-app
tika-server
General
enableFileUrl has been removed in favor of a FileSystemFetcher; see tika-pipes#FetchersInClassicServerEndpoints.
Configuration
tika-pipes
See the tika-pipes page.
tika-eval
tika-langid
In the 1.x branch, the default (hardwired) language identifier was a wrapper around optimaize. In 1.x, you'd use:
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-langdetect</artifactId>
  <version>1.27</version>
</dependency>
In 2.x, change this to:
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-langdetect-optimaize</artifactId>
  <version>2.0.x</version>
</dependency>
The legacy homegrown language id component that used to be in tika-core is now in the tika-langdetect-tika module.
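If you still rely on the legacy detector, the dependency would presumably follow the same pattern. This is a sketch that assumes the artifactId matches the module name tika-langdetect-tika and that it is versioned together with the rest of the 2.x release:

```xml
<!-- Assumption: artifactId equals the module name given above -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-langdetect-tika</artifactId>
  <version>2.0.x</version>
</dependency>
```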