...

Also, please be polite. This feature was added as a convenience, and it fetches files with a simple TikaInputStream.get(new URL(fileUrl)). Please consider using a robust crawler instead: it will allow for better configuration of redirects, timeouts, cookies, etc., and it will respect robots.txt!

Transfer-Layer Compression

As of Tika 1.24.1, users can turn on gzip compression for either files on their way to tika-server or the output from tika-server.

If you gzip your files before sending them to tika-server, add the Content-Encoding: gzip header so that the server knows to decompress the request body:

No Format
curl -T test_my_doc.pdf -H "Content-Encoding: gzip" http://localhost:9998/rmeta


If you want tika-server to compress the output of the parse, add the Accept-Encoding: gzip header:

No Format
curl -T test_my_doc.pdf -H "Accept-Encoding: gzip" http://localhost:9998/rmeta
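
The two headers can be combined in one request. The following is a minimal sketch, not an official recipe: the function names compress_body and send_gzipped and the TIKA_URL variable are made up for illustration, and it assumes that tika-server accepts the chunked upload curl produces when it reads the body from stdin. It compresses the file on the fly with gzip -c, uploads it with curl's -T - (read from stdin), and uses curl's --compressed option, which sends an Accept-Encoding header and decodes the response for you.

```shell
#!/bin/sh
# Hypothetical default endpoint; override by exporting TIKA_URL.
TIKA_URL="${TIKA_URL:-http://localhost:9998/rmeta}"

# Write a gzip-compressed copy of the file to stdout.
# The original file is left untouched.
compress_body() {
  gzip -c "$1"
}

# PUT the compressed bytes to tika-server.
#   -T -          upload the request body from stdin
#   -H ...        tell the server the body is gzip-encoded
#   --compressed  ask for a compressed response and decode it locally
send_gzipped() {
  compress_body "$1" |
    curl -sS -T - -H "Content-Encoding: gzip" --compressed "$TIKA_URL"
}
```

With a server running, send_gzipped test_my_doc.pdf would print the decoded output of the parse. Compressing inside the pipeline avoids leaving a .gz copy of every document on disk.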


Making Tika Server Robust to OOMs, Infinite Loops and Memory Leaks

...