...
This page documents Apache Tika's JSR 311 network server, tika-server. The server uses the Apache CXF framework, which provides an implementation of JAX-RS for Java. The server component builds to a standalone package within Tika, tika-server.
Installation
The easiest way to get the Tika JAXRS server is to download the latest stable release binary, which is available from the Apache Tika downloads page via your favourite local mirror. You want the tika-server-1.x.jar file, e.g. tika-server-1.13.jar.
...
- Check out the source from SVN as detailed on the Apache Tika contributions page, or retrieve the latest code from GitHub
- Build the source using Maven
- Run the Apache Tika JAXRS server runnable jar
No Format |
---|
git clone https://github.com/apache/tika.git tika-trunk && cd ./tika-trunk/ && mvn install && cd ./tika-server/target/ && java -jar tika-server-x.x.jar |
...
- 200 OK - request completed successfully
- 204 No Content - request completed successfully, result is empty
- 422 Unprocessable Entity - unsupported mime-type, encrypted document, etc.
- 500 Error - error while processing document
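As an illustration of how a client might act on these codes, here is a minimal Python sketch. The helper names `describe_status` and `has_content` are hypothetical and not part of tika-server; the meanings are copied from the list above.

```python
# Meanings taken from the status list on this page.
STATUS_MEANINGS = {
    200: "request completed successfully",
    204: "request completed successfully, result is empty",
    422: "unsupported mime-type, encrypted document, etc.",
    500: "error while processing document",
}


def describe_status(code: int) -> str:
    """Return the documented meaning of a status code, or a fallback."""
    return STATUS_MEANINGS.get(code, "status not in the list above")


def has_content(code: int) -> bool:
    """Only a 200 response is expected to carry a result body."""
    return code == 200
```

A client would typically check `has_content` before trying to read the response body, and log `describe_status` for anything else.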
Metadata Resource
No Format |
---|
/meta |
...
No Format |
---|
/language/stream |
HTTP PUT or POST a UTF-8 text file to the LanguageIdentifier to identify its language.
NOTE: This endpoint does not parse files. It runs detection on a UTF-8 string.
By default, the response is a string containing the two-character code of the identified language.
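A minimal Python sketch of calling this endpoint with the standard library follows. It assumes the server is reachable at `http://localhost:9998`; that address is an assumption, so adjust it to your deployment. The function names are hypothetical.

```python
import urllib.request

# Assumption: tika-server is listening here; change to match your setup.
TIKA_URL = "http://localhost:9998"


def build_language_request(text: str, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request whose body is the UTF-8 text."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/language/stream",
        data=text.encode("utf-8"),
        method="PUT",
    )


def detect_language(text: str, base_url: str = TIKA_URL) -> str:
    """Send the text to /language/stream and return the response body."""
    with urllib.request.urlopen(build_language_request(text, base_url)) as resp:
        return resp.read().decode("utf-8").strip()
```

With a server running, `detect_language("Bonjour tout le monde")` would return whatever language code the server identifies. Because the endpoint does not parse files, the caller is responsible for passing plain text.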
...
- -maxFiles: restart the child process after it has processed maxFiles files. If there is a slow-building memory leak, this restart of the JVM should help. The default is 100,000 files. To turn off this feature, use -maxFiles -1. The child and/or parent will log the cause of the restart as HIT_MAX when there is a restart because of this threshold.
- -taskTimeoutMillis and -taskPulseMillis: taskPulseMillis specifies how often to check whether a parse/detect task has exceeded taskTimeoutMillis.
- -pingTimeoutMillis and -pingPulseMillis: pingPulseMillis specifies how often the parent process pings the child process to check its status. pingTimeoutMillis specifies how long the parent process should wait to hear back from the child process before restarting it and/or how long the child process should wait to receive a ping from the parent process before shutting itself down.
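To make the options above concrete, here is a small Python sketch that assembles a launch command from them. The function name is hypothetical, the jar name is the placeholder used earlier on this page, and any values passed in are illustrative rather than recommended. Any additional flag the server may require to enable the child process is not covered in this excerpt and is not shown.

```python
from typing import List, Optional


def build_server_command(
    jar: str = "tika-server-x.x.jar",
    max_files: Optional[int] = None,
    task_timeout_millis: Optional[int] = None,
    task_pulse_millis: Optional[int] = None,
    ping_timeout_millis: Optional[int] = None,
    ping_pulse_millis: Optional[int] = None,
) -> List[str]:
    """Return an argv list; options left as None are omitted."""
    cmd = ["java", "-jar", jar]
    options = [
        ("-maxFiles", max_files),
        ("-taskTimeoutMillis", task_timeout_millis),
        ("-taskPulseMillis", task_pulse_millis),
        ("-pingTimeoutMillis", ping_timeout_millis),
        ("-pingPulseMillis", ping_pulse_millis),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

For example, `build_server_command(max_files=-1)` produces a command that turns off the maxFiles restart. The resulting list can be passed to `subprocess.run`.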
If the child process is shutting down and receives a new request, it will return 503 (Service Unavailable). If the server times out on a file, the client will receive an IOException from the closed socket. Note that all other files being processed at that moment will also end with an IOException from a closed socket when the child process shuts down; e.g. if you send three files to tika-server concurrently and one of them causes a catastrophic problem requiring the child to shut down, you won't be able to tell which file caused the problem. In the future, we may implement a gentler shutdown than we currently have.
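One way a client might cope with this behaviour is to retry after a pause when it sees a 503 or a closed socket. The Python sketch below is an assumption about client-side handling, not something tika-server provides. `send` is a hypothetical caller-supplied function that performs one request and returns a `(status, body)` pair, raising `OSError` (Python's counterpart to an IOException) on a closed socket.

```python
import time
from typing import Callable, Tuple


def send_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    retries: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send(); retry on 503 or a socket error, up to `retries` times."""
    last_error: Exception = RuntimeError("no attempt made")
    for attempt in range(retries + 1):
        try:
            status, body = send()
        except OSError as exc:
            # Closed socket: the child may have shut down mid-request.
            last_error = exc
        else:
            if status != 503:
                return status, body
            # 503: the child is shutting down; wait for the restart.
            last_error = RuntimeError("503 Service Unavailable")
        if attempt < retries:
            sleep(delay_s)
    raise last_error
```

Since a file that crashes the child will likely crash it again, it may help to resend the affected files one at a time, so that a repeated failure points at a single file.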
...