...

This page documents Tika's JSR 311 network server, tika-server. The server uses the Apache CXF framework, which provides an implementation of JAX-RS for Java. The server component builds as a standalone package within Tika, tika-server.

Installation

The easiest way to get the Tika JAX-RS server is to download the latest stable release binary. It is available from the Apache Tika downloads page, via your favourite local mirror. You want the tika-server-1.x.jar file, e.g. tika-server-1.13.jar.

...

  1. Check out the source from SVN as detailed on the Apache Tika contributions page, or retrieve the latest code from GitHub.
  2. Build the source using Maven.
  3. Run the Apache Tika JAX-RS server runnable jar.
No Format
git clone https://github.com/apache/tika.git tika-trunk
cd ./tika-trunk/
mvn install
cd ./tika-server/target/
java -jar tika-server-x.x.jar

...

  • 200 OK - request completed successfully
  • 204 No Content - request completed successfully, result is empty
  • 422 Unprocessable Entity - unsupported MIME type, encrypted document, etc.
  • 500 Error - error while processing document
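A client usually wants to branch on these codes rather than treat every non-200 response as a failure. The following is a minimal sketch of that idea; the names `STATUS_MEANINGS`, `describe_status` and `has_content` are hypothetical helpers written for this page and are not part of Tika.

```python
# Meanings copied from the status list above; any other code is reported as unlisted.
STATUS_MEANINGS = {
    200: "request completed successfully",
    204: "request completed successfully, result is empty",
    422: "unsupported MIME type, encrypted document, etc.",
    500: "error while processing document",
}


def describe_status(code: int) -> str:
    """Return the documented meaning of a tika-server status code."""
    return STATUS_MEANINGS.get(code, "status not listed in this section")


def has_content(code: int) -> bool:
    """Only a 200 response carries a result body; 204 succeeded but is empty."""
    return code == 200
```

Note that 204 is a success: a caller that checks only for 200 would wrongly report an empty result as an error.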

Metadata Resource

No Format
/meta

...

No Format
/language/stream

HTTP PUT or POST a UTF-8 text file to the LanguageIdentifier to identify its language.

NOTE: This endpoint does not parse files. It runs detection on a UTF-8 string.

By default, the response is a string containing the two-character code of the identified language.
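As an illustration, a client could send text to this endpoint with Python's standard library. This is a sketch under assumptions: the base URL `http://localhost:9998` assumes a server running locally on the default port, the `Content-Type` header is a choice made here rather than a documented requirement, and the function names are hypothetical.

```python
import urllib.request

# Assumption: tika-server is running locally on its default port.
ASSUMED_BASE_URL = "http://localhost:9998"


def build_language_request(text: str, base_url: str = ASSUMED_BASE_URL) -> urllib.request.Request:
    """Build a PUT request that sends UTF-8 text to /language/stream."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/language/stream",
        data=text.encode("utf-8"),  # the endpoint expects a UTF-8 string, not a parsed file
        method="PUT",
        # Assumption: declaring the body as plain UTF-8 text; not stated in this page.
        headers={"Content-Type": "text/plain; charset=UTF-8"},
    )


def detect_language(text: str, base_url: str = ASSUMED_BASE_URL, timeout: float = 10.0) -> str:
    """Send the text and return the response body (the language code) as a string."""
    request = build_language_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `detect_language("some sample text")` requires a running server. Because the endpoint does not parse files, extract the text from a document first and pass that text here.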

...

  • -maxFiles: restart the child process after it has processed maxFiles files. If there is a slow-building memory leak, this restart of the JVM should help. The default is 100,000 files. To turn off this feature, pass -maxFiles -1. When a restart happens because of this threshold, the child and/or parent will log the cause of the restart as HIT_MAX.
  • -taskTimeoutMillis and -taskPulseMillis: taskTimeoutMillis specifies how long a parse/detect task may run before it is considered timed out. taskPulseMillis specifies how often to check whether a task has exceeded that timeout.
  • -pingTimeoutMillis and -pingPulseMillis: pingPulseMillis specifies how often the parent process pings the child process to check its status. pingTimeoutMillis specifies how long the parent process should wait to hear back from the child process before restarting it, and/or how long the child process should wait to receive a ping from the parent process before shutting itself down.
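The relationship between a timeout and its pulse interval in the options above can be sketched with a simplified model. This is not Tika's implementation: it assumes checks happen at exact multiples of the pulse interval and that a task counts as timed out once its elapsed time reaches the timeout. Both function names are hypothetical.

```python
def should_restart_for_max_files(files_processed: int, max_files: int = 100_000) -> bool:
    """Model of -maxFiles: restart once max_files have been processed; -1 disables it."""
    if max_files == -1:
        return False
    return files_processed >= max_files


def first_check_that_sees_timeout(timeout_millis: int, pulse_millis: int) -> int:
    """Return the elapsed time (ms) of the first periodic check at or after the timeout.

    Modeling assumption: checks run at pulse, 2*pulse, 3*pulse, ... after the task starts.
    """
    if timeout_millis <= 0 or pulse_millis <= 0:
        raise ValueError("timeout and pulse must be positive")
    checks_needed = -(-timeout_millis // pulse_millis)  # ceiling division on integers
    return checks_needed * pulse_millis
```

Under this model, a 1000 ms timeout with a 300 ms pulse is first noticed at 1200 ms, so a shorter pulse detects a hung task sooner at the cost of more frequent checks.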

If the child process is shutting down and receives a new request, it will return 503 Service Unavailable. If the server times out on a file, the client will receive an IOException from the closed socket. Note that all other files being processed at that moment will also end with an IOException from a closed socket when the child process shuts down. For example, if you send three files to tika-server concurrently and one of them causes a catastrophic problem that requires the child to shut down, you won't be able to tell which file caused the problem. In the future, we may implement a gentler shutdown than we currently have.
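On the client side, one way to cope with this behaviour is to retry after a 503 or a socket error, with a cap on attempts. The sketch below is a suggestion rather than documented Tika guidance; `send_with_retry` and its parameters are hypothetical, and `send` stands for any callable of yours that performs one request and returns `(status_code, body)`.

```python
import time
from typing import Callable, Optional, Tuple


def send_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a status other than 503 or attempts run out.

    A socket-level failure (OSError and its subclasses, such as a connection
    reset) is treated like a 503: wait, then try again.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send()
        except OSError as exc:
            last_error = exc  # socket closed, e.g. the child process shut down
        else:
            if status != 503:
                return status, body
            last_error = RuntimeError("503 Service Unavailable")
        if attempt < max_attempts:
            sleep(delay_seconds)  # give the child process time to restart
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Keep `max_attempts` small: a file that itself triggers the shutdown will fail on every attempt. Resending the affected files one at a time, rather than concurrently, is one way to find out which file is responsible.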

...