...

pom.xml for 2.0.0+
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers-standard-package</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-scientific-module</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-sqlite3-module</artifactId>
  <version>2.1.0</version>
</dependency>


NOTE: As in Tika 1.x, if you need detection on container formats (e.g. OLE2-based: .doc, .ppt, .xls; zip-based: .xlsx, .pptx, .docx; or Ogg-based), you need to include the underlying Tika parsers, which parse the container files and make the detection based on the information in the container. In Tika 2.x, this means that you need to include tika-parsers-standard-package.
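
As a minimal sketch of the note above, a project that only needs detection (including container-aware detection) could declare the following. The `tika.version` property name is an assumption introduced here for illustration; 2.1.0 is simply the version used in the other examples on this page. tika-core is listed explicitly for clarity, although it normally also arrives transitively through the parsers package.

```xml
<!-- Hypothetical property name, used only to keep both versions in sync -->
<properties>
  <tika.version>2.1.0</tika.version>
</properties>

<dependencies>
  <!-- Core detection API -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>${tika.version}</version>
  </dependency>
  <!-- Parsers needed to look inside OLE2, zip-based and Ogg containers -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers-standard-package</artifactId>
    <version>${tika.version}</version>
  </dependency>
</dependencies>
```

Without the second dependency, detection would be expected to fall back to the generic container type rather than the specific format.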

Lesser parser notes that may only affect early versions of 2.x

Also, there is a small transitive dependency issue involving jcl-over-slf4j between tika-parsers-standard-package:2.0.0 and tika-parser-scientific-module:2.0.0. If you are using the Maven Enforcer Plugin, you will need to fix it by adding this:

...

quartz
  <dependency>
    <groupId>edu.ucar</groupId>
    <artifactId>netcdf4</artifactId>
    <version>${netcdf-java.version}</version>
    <exclusions>
      ...
      <exclusion>
        <groupId>org.quartz-scheduler</groupId>
        <artifactId>quartz</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.quartz-scheduler</groupId>
    <artifactId>quartz</artifactId>
    <version>2.3.2</version>
  </dependency>

Language Detection

When using language detection, you now need to use:

pom.xml 2.0.0
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-langdetect-optimaize</artifactId>
  <version>2.1.0</version>
</dependency>

...

Note! In 2.x, Tika will not warn you if a PDF page that you are trying to render contains a JPEG2000 image. PDFBox will log a warning.


tika-app

tbd

tika-server

General

...