...

If the child process is shutting down and receives a new request, it returns 503 (Service Unavailable). If the server times out on a file, the client receives an IOException from the closed socket. Note that all other files being processed at that moment also end with an IOException from a closed socket when the child process shuts down. For example, if you send three files to tika-server concurrently and one of them causes a catastrophic problem that requires the child to shut down, you won't be able to tell which file caused the problem. In the future, we may implement a gentler shutdown than the current one.
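One way a client might narrow down the culprit after a concurrent batch fails is to resend the affected files one at a time. The following is a minimal sketch of that idea, not something tika-server provides: the names isolate_failures, send_one, and restart_wait are hypothetical, and send_one stands for whatever function your client already uses to submit a single file and raise on failure.

```python
import time


def isolate_failures(paths, send_one, restart_wait=5.0, sleep=time.sleep):
    """Resend files sequentially and return {path: exception} for failures.

    send_one(path) is a caller-supplied (hypothetical) function that submits
    one file to tika-server and raises on any error, for example a closed
    socket or a timeout.
    """
    failures = {}
    for path in paths:
        try:
            send_one(path)
        except Exception as exc:  # record every failure against its file
            failures[path] = exc
            # Assumption: the child process may be restarting after this
            # failure, so pause before sending the next file.
            sleep(restart_wait)
    return failures
```

Because only one file is in flight at a time, a failure can be attributed to the file that was being sent. The restart_wait value is a guess; if the server is still restarting when the next file arrives, that file may fail with a 503 for reasons unrelated to its content.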

NOTE 1: -spawnChild has become the default in Tika 2.x.  If you need to return to the legacy 1.x behavior, configure the tika-server element in tika-config.xml with <noFork>true</noFork>, or add --noFork as a command-line argument.

NOTE 2: To specify JVM args for the child process, prepend each argument with -J (as in -JXmx4g) and place it after the -jar tika-server-x.x.jar call, as in:

$ java -Dlog4j.configuration=file:log4j_server.xml -jar tika-server-x.x.jar -spawnChild -JXmx4g -JDlog4j.configuration=file:log4j_child.xml

NOTE 3: Before Tika 1.27, we strongly encouraged -JXX:+ExitOnOutOfMemoryError, which admittedly has limitations: https://bugs.openjdk.java.net/browse/JDK-8155004.  When a JVM is struggling with memory, the final trigger for the OOM may happen while reading bytes from the client or writing bytes to the client, NOT during the parse.  In short, OOMs can happen outside of Tika's code, and our internal watcher can't see or respond to some OOMs.  In 1.27 and later (and in 2.x), we added a shutdown hook in TesseractOCRParser to decrease the chances of orphaning tesseract.  Using -JXX:+ExitOnOutOfMemoryError prevents the shutdown hooks from running, so tesseract processes may more easily be orphaned on an out of memory error.

NOTE 4: When using the -spawnChild option, clients need to be aware that the server may be temporarily unavailable while it restarts.  Clients need to implement retry logic.
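As an illustration of client-side retry logic, here is a minimal sketch using only the Python standard library. It retries on a 503 response and on connection-level errors (the Python counterpart of the closed-socket IOException a Java client would see), with exponential backoff. The helper names (put_file, is_retryable, call_with_retry), the attempt limit, and the delays are assumptions chosen for the example, and TIKA_URL assumes a tika-server listening locally on port 9998 with the /tika endpoint.

```python
import http.client
import time
import urllib.error
import urllib.request

# Assumption: a local tika-server on port 9998.
TIKA_URL = "http://localhost:9998/tika"


def put_file(path, url=TIKA_URL, timeout=60):
    """PUT one file to tika-server and return the response body as bytes."""
    with open(path, "rb") as fh:
        req = urllib.request.Request(url, data=fh.read(), method="PUT")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def is_retryable(exc):
    """True for 503 responses and for socket/connection-level failures."""
    if isinstance(exc, urllib.error.HTTPError):
        # HTTPError is also an OSError, so check the status code first.
        return exc.code == 503
    return isinstance(exc, (OSError, http.client.HTTPException))


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, backing off 1x, 2x, 4x base_delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A call might look like call_with_retry(lambda: put_file("report.pdf")), where report.pdf is a placeholder file name. Keeping max_attempts small is deliberate: a file that itself crashes the child process would otherwise trigger a restart on every retry.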

...