...

  1. tika-server-standard jar: https://dlcdn.apache.org/tika/2.1.0/tika-server-standard-2.1.0.jar
  2. tika-eval-core.jar: https://repo1.maven.org/maven2/org/apache/tika/tika-eval-core/2.1.0/tika-eval-core-2.1.0.jar
  3. If you'd like to experiment with Tesseract OCR, make sure Tesseract is installed and callable as 'tesseract' from your command line.
  4. Some knowledge of SQL
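Before the class, a quick way to check most of these prerequisites is the POSIX shell sketch below. It is not part of the workshop materials. It assumes that the jars were saved in the current directory under the file names from the download URLs, plus tika-app-2.1.0.jar, which is used in the step below. The function names require_cmd, require_file and check_prereqs are hypothetical. Tesseract is reported but treated as optional.

```shell
#!/bin/sh
# Hypothetical pre-class check; not part of the workshop materials.
# ASSUMPTION: the jars sit in the current directory under the file
# names taken from the download URLs.

# require_cmd NAME: succeed if NAME is callable from the command line.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    printf 'ok:      %s (%s)\n' "$1" "$(command -v "$1")"
  else
    printf 'missing: %s is not on your PATH\n' "$1"
    return 1
  fi
}

# require_file PATH: succeed if PATH exists as a regular file.
require_file() {
  if [ -f "$1" ]; then
    printf 'ok:      %s\n' "$1"
  else
    printf 'missing: %s\n' "$1"
    return 1
  fi
}

# check_prereqs: report on every item; fail if any required item is missing.
check_prereqs() {
  missing=0
  for cmd in java tar; do
    if ! require_cmd "$cmd"; then missing=$((missing + 1)); fi
  done
  for jar in tika-app-2.1.0.jar tika-server-standard-2.1.0.jar tika-eval-core-2.1.0.jar; do
    if ! require_file "$jar"; then missing=$((missing + 1)); fi
  done
  # Tesseract is only needed for the optional OCR experiments.
  require_cmd tesseract || echo "         (optional: only needed for OCR)"
  [ "$missing" -eq 0 ]
}

check_prereqs || echo "Some required prerequisites are missing; see above."
```

Each item is reported individually, not just the first failure, so a single run shows everything that still needs to be installed or downloaded.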
Example docs, extracts and config files: tika-eval-workshop-20211109.tgz

Before the class, unzip tika-eval-workshop-20211109.tgz (tar -xzvf tika-eval-workshop-20211109.tgz), move tika-app-2.1.0.jar into the tika-eval-workshop-20211109/ folder, and run tika-app on the docs directory from inside that folder:

    java -jar tika-app-2.1.0.jar -J -t -i docs -o extracts/my_extracts
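Once the run finishes, the sketch below is one way to confirm that every input file produced an extract. It rests on an assumption this page does not confirm: that tika-app's batch mode with -J writes one JSON file per input, named after the input file with .json appended and mirroring the layout of docs/. If your output is named differently, adjust the path test in missing_extracts. The helper names count_json and missing_extracts are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for checking tika-app batch output; not part of Tika.
# ASSUMPTION: each input docs/<path> yields extracts/my_extracts/<path>.json.

# count_json DIR: print how many .json files exist under DIR.
count_json() {
  find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}

# missing_extracts DOCS_DIR EXTRACTS_DIR: print each input file (relative
# to DOCS_DIR) that has no matching .json extract. Filenames containing
# newlines are not handled.
missing_extracts() {
  docs_dir=$1
  extracts_dir=$2
  # Only the find runs inside DOCS_DIR; the loop stays in the caller's cwd.
  (cd "$docs_dir" && find . -type f) | while IFS= read -r rel; do
    rel=${rel#./}
    if [ ! -f "$extracts_dir/$rel.json" ]; then
      printf '%s\n' "$rel"
    fi
  done
}

# Run from inside tika-eval-workshop-20211109/ after tika-app has finished.
if [ -d docs ] && [ -d extracts/my_extracts ]; then
  echo "input files:   $(find docs -type f | wc -l | tr -d ' ')"
  echo "JSON extracts: $(count_json extracts/my_extracts)"
  echo "inputs without an extract:"
  missing_extracts docs extracts/my_extracts
fi
```

If some inputs are listed as lacking an extract, the tika-app console output for those files is the first place to look.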


Note: The default logging configuration for tika-app has a bug in batch mode (you may see messages such as "No configuration found for '4b85612c' at 'null' in 'null'..."). This has been fixed in tika-app, and the fix will ship in the next release, 2.1.1.