...
```xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>1.27</version>
</dependency>
```
...
For OCR, you can no longer use the TesseractOCRConfig.setTesseractPath(String) and TesseractOCRConfig.setTessdataPath(String) methods. They have moved to the TesseractOCRParser class.
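Because the paths now belong to the parser, one way to set them is in a tika-config.xml. The fragment below is a minimal sketch, assuming that TesseractOCRParser's setTesseractPath and setTessdataPath setters are exposed as the `tesseractPath` and `tessdataPath` config params; both directory values are placeholders for your own Tesseract installation. Excluding TesseractOCRParser from DefaultParser and declaring it separately keeps a single, configured instance.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Load the default parsers, minus Tesseract (declared separately below) -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
    <!-- Assumption: param names mirror TesseractOCRParser's setters -->
    <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
      <params>
        <!-- Placeholder: directory containing the tesseract executable -->
        <param name="tesseractPath" type="string">/opt/tesseract/bin</param>
        <!-- Placeholder: directory containing the tessdata language files -->
        <param name="tessdataPath" type="string">/opt/tesseract/share/tessdata</param>
      </params>
    </parser>
  </parsers>
</properties>
```

In Java, such a file can be loaded with `new TikaConfig(path)` and passed to `new AutoDetectParser(config)`. Alternatively, call the setters directly on a TesseractOCRParser instance you construct yourself.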
tika-parsers-module optional dependencies
zstd
The zstd dependency includes native libraries, so it is not packaged with the tika-parsers-module. If you'd like to parse zstd files, include:
```xml
<dependency>
  <groupId>com.github.luben</groupId>
  <artifactId>zstd-jni</artifactId>
  <version>1.5.0-4</version>
</dependency>
```
TIFF and JPEG2000
If you plan to write TIFFs with Tika (e.g., when rendering PDF pages for OCR) or to read JPEG2000 images, and if the BSD-3 license with nuclear disclaimer is acceptable to you, include:
```xml
<dependency>
  <groupId>com.github.jai-imageio</groupId>
  <artifactId>jai-imageio-core</artifactId>
  <version>1.4.0</version>
</dependency>
```
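TIFF output typically comes into play when PDFParser renders pages as images for OCR and is asked to write them as TIFF. The fragment below is a sketch of such a setup, assuming PDFParser accepts `ocrStrategy` and `ocrImageFormatName` params in tika-config.xml with the values shown; check both names and their accepted values against the PDFParser documentation for your Tika version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Load the default parsers, minus PDFParser (declared separately below) -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <!-- Assumption: OCR the rendered pages instead of extracting the text layer -->
        <param name="ocrStrategy" type="string">ocr_only</param>
        <!-- Assumption: write rendered pages as TIFF, which requires jai-imageio-core -->
        <param name="ocrImageFormatName" type="string">tiff</param>
      </params>
    </parser>
  </parsers>
</properties>
```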
If you plan to process JPEG2000 images (the most common use case is rendering PDF pages for OCR), include:
```xml
<dependency>
  <groupId>com.github.jai-imageio</groupId>
  <artifactId>jai-imageio-jpeg2000</artifactId>
  <version>1.4.0</version>
</dependency>
```
tika-app
tika-server
General
...