...
```xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers-standard-package</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-scientific-module</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-sqlite3-module</artifactId>
  <version>2.1.0</version>
</dependency>
```
NOTE: As in Tika 1.x, if you need detection on container formats (e.g. OLE2-based: .doc, .ppt, .xls; zip-based: .xlsx, .pptx, .docx; or Ogg-based), you need to include the underlying Tika parsers, which parse the container files and base detection on the information inside the container. In Tika 2.x, this means that you need to include tika-parsers-standard-package.
Lesser parser notes that may only affect early versions of 2.x
Also, there's a small transitive dependency issue with jcl-over-slf4j between tika-parsers-standard-package:2.0.0 and tika-parser-scientific-module:2.0.0. If you are using the Maven Enforcer Plugin, you will need to fix it by adding this:
...
```xml
<dependency>
  <groupId>edu.ucar</groupId>
  <artifactId>netcdf4</artifactId>
  <version>${netcdf-java.version}</version>
  <exclusions>
    ...
    <exclusion>
      <groupId>org.quartz-scheduler</groupId>
      <artifactId>quartz</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.quartz-scheduler</groupId>
  <artifactId>quartz</artifactId>
  <version>2.3.2</version>
</dependency>
```
Language Detection
When using language detection, you now need to use:
```xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-langdetect-optimaize</artifactId>
  <version>2.1.0</version>
</dependency>
```
...
Note! In 2.x, Tika will not warn you if a PDF page that you're trying to render contains a JPEG2000 image. PDFBox will log a warning.
tika-app
tbd
tika-server
General
- enableFileUrl has been removed in favor of two separate fetchers, one for files and one for URLs (see tika-pipes#FetchersInClassicServerEndpoints):
  - FileSystemFetcher (which is packaged with tika-core) for files
  - HttpFetcher (requires an external jar from https://mvnrepository.com/artifact/org.apache.tika/tika-fetcher-http) for URLs
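As a rough illustration of how the two fetchers replace enableFileUrl, the sketch below shows what a tika-config.xml passed to tika-server might look like. The fetcher names (fsf, http) and the base path are hypothetical placeholders. The fully qualified class names are given as they are believed to be in the 2.1.x line and should be checked against the version you actually run. See the tika-pipes page for the authoritative configuration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <fetchers>
    <!-- Reads files from the local file system; ships with tika-core.
         Class name assumed for 2.1.x; verify it for your version. -->
    <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
      <params>
        <!-- "fsf" is a hypothetical name; requests refer to the fetcher by it -->
        <name>fsf</name>
        <!-- hypothetical directory; fetch keys are resolved relative to it -->
        <basePath>/path/to/input/files</basePath>
      </params>
    </fetcher>
    <!-- Fetches URLs; needs the tika-fetcher-http jar on the classpath.
         Class name assumed for 2.1.x; verify it for your version. -->
    <fetcher class="org.apache.tika.pipes.fetcher.http.HttpFetcher">
      <params>
        <!-- "http" is a hypothetical name -->
        <name>http</name>
      </params>
    </fetcher>
  </fetchers>
</properties>
```

With a configuration along these lines, a request to a classic endpoint would select a fetcher by its configured name and supply the file path or URL as the key, as described under tika-pipes#FetchersInClassicServerEndpoints.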
...