1) Hands-on tika-eval module workshop, Part 1
November 9, 2021, Tuesday 11am EST/4pm UTC
The dial-in information is available to those who register via Meetup.
This workshop is designed for hands-on tech folks who can run Tika from the command line or can curl
to a local tika-server.
Stay tuned for prerequisites, resources and an agenda!
The following is all a work in progress. Please check back right before the workshop!
Prerequisites:
- java >= 8
- tika-eval app and tika-app jars: https://dlcdn.apache.org/tika/2.1.0/tika-eval-app-2.1.0.jar and https://dlcdn.apache.org/tika/2.1.0/tika-app-2.1.0.jar
- JSON editor/viewer (jq should be sufficient; I like Sublime with the PrettyJSON plugin: https://github.com/dzhibas/SublimePrettyJson)
- XLSX viewer (Excel or Open/LibreOffice)
Optional materials:
- tika-server-standard jar: https://dlcdn.apache.org/tika/2.1.0/tika-server-standard-2.1.0.jar
- tika-eval-core.jar: https://repo1.maven.org/maven2/org/apache/tika/tika-eval-core/2.1.0/tika-eval-core-2.1.0.jar
- If you'd like to experiment with tesseract, make sure that tesseract is installed and callable as 'tesseract' from your command line.
- Some knowledge of SQL
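A quick way to check the tools listed above before the class is sketched below. This is only a convenience sketch, not part of the workshop materials: the function names java_major_version and check_prereqs are made up for illustration, and the parsing assumes the usual first line printed by `java -version` (for example `openjdk version "11.0.12" 2021-07-20` or `java version "1.8.0_292"`).

```shell
# Hypothetical helper: extract the major Java version from a `java -version` banner line.
# Assumes the banner contains: version "<version string>"
java_major_version() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # legacy scheme: 1.8.0_292 means Java 8
  esac
  # Keep everything before the first '.', '_' or '-'.
  printf '%s\n' "${ver%%[._-]*}"
}

# Hypothetical helper: report whether java >= 8, jq and tesseract are available.
check_prereqs() {
  local banner major tool
  # `java -version` writes its banner to stderr.
  banner=$(java -version 2>&1 | head -n 1)
  major=$(java_major_version "$banner")
  if [ "${major:-0}" -ge 8 ]; then
    echo "java $major: ok"
  else
    echo "java >= 8 not found"
  fi
  for tool in jq tesseract; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: not found (only needed if you plan to use it)"
    fi
  done
}
```

After pasting the functions into a bash session, run `check_prereqs` to print one status line per tool.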
Example docs, extracts and config files: tika-eval-workshop-20211109.tgz
Before the class, you should unpack tika-eval-workshop-20211109.tgz (tar -xzvf tika-eval-workshop-20211109.tgz), move tika-app-2.1.0.jar into the tika-eval-workshop-20211109/ folder, and run tika-app on the docs directory: java -jar tika-app-2.1.0.jar -J -t -i docs -o extracts/my_extracts
Note: There's a bug in the default logging configuration for tika-app in batch mode; you may see a message such as "No configuration found for '4b85612c' at 'null' in 'null'...". This is fixed in the latest tika-app, and the fix will be available in the next release, 2.1.1.
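The preparation steps above can be collected into a small bash sketch. It assumes that both tika-eval-workshop-20211109.tgz and tika-app-2.1.0.jar sit in the current directory. The function names prepare_workshop and count_extracts are hypothetical, and count_extracts further assumes that each extract written with -J is a file ending in .json.

```shell
WORKSHOP=tika-eval-workshop-20211109
TIKA_APP=tika-app-2.1.0.jar

# Hypothetical helper: unpack the archive, move the tika-app jar into the
# workshop folder, and run tika-app in batch mode over docs/.
# The subshell keeps the caller's working directory unchanged.
prepare_workshop() {
  tar -xzvf "$WORKSHOP.tgz" &&
    mv "$TIKA_APP" "$WORKSHOP/" &&
    (cd "$WORKSHOP" &&
      java -jar "$TIKA_APP" -J -t -i docs -o extracts/my_extracts)
}

# Hypothetical helper: count the .json files under a directory, as a rough
# check that the batch run wrote extracts (assumes a .json suffix).
count_extracts() {
  find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}
```

Run `prepare_workshop` once, then `count_extracts tika-eval-workshop-20211109/extracts/my_extracts` to see how many extract files were written.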
2) Hands-on tika-pipes module workshop
December 2, 2021, Thursday 12pm (NOON) EST/5pm UTC
The dial-in information is available to those who register via Meetup.
More details coming soon...
Prerequisites:
- java >= 8
- tika-server-standard jar: https://dlcdn.apache.org/tika/2.1.0/tika-server-standard-2.1.0.jar
- Installation of Apache Solr (~8.9.x) and/or OpenSearch (~1.x)