Introduction to Tika server

This page documents how to access Tika as a RESTful API via the Tika server component.

Installation

The installation process for Tika server after 1.23 and before 1.24 is somewhat in flux. Some options follow:

Building from source

If you need to customize Tika server in some way, and/or need the very latest version to try out a fix, then build from source:

...

Your specific customizations to the Tika setup are stored in the /etc/init.d/tika file.


Tika Server Services

All services that take files use HTTP "PUT" requests. When "PUT" is used, the original file must be sent in the request body without any additional encoding (do not use multipart/form-data or other containers).

Additionally, TikaResource, Metadata and RecursiveMetadata Services accept POST multipart/form-data requests, where the original file is sent as a single attachment.
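The two upload styles described above can be sketched with cURL as follows. This is a minimal sketch rather than something taken from this page: it assumes a server listening on http://localhost:9998, uses the /tika and /tika/form paths for TikaResource, and the function names and the form field name "upload" are illustrative choices.

```shell
# Assumption: Tika server is reachable at this address; override TIKA_URL if not.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# PUT: the file itself is the request body, with no multipart wrapper.
# curl -T sends the named file as the raw body of a PUT request.
tika_put_text() {
  curl -sS -T "$1" -H "Accept: text/plain" "${TIKA_URL}/tika"
}

# POST multipart/form-data: the file is sent as a single attachment.
# The field name "upload" is an illustrative choice.
tika_post_text() {
  curl -sS -F "upload=@$1" -H "Accept: text/plain" "${TIKA_URL}/tika/form"
}

# Usage (requires a running server):
#   tika_put_text report.pdf
#   tika_post_text report.pdf
```

Both functions should return the same extracted text; the PUT form is the simpler choice when the client can send a raw body.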

Information services (e.g. defined mimetypes, defined parsers etc) work with HTTP "GET" requests.
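An information request might look like the sketch below. It assumes a server on http://localhost:9998 and the /mime-types and /parsers paths, none of which are stated on this page; the helper name tika_info is hypothetical.

```shell
# Assumption: Tika server is reachable at this address; override TIKA_URL if not.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# Information services take no request body, so a plain GET is enough.
# $1 is the service path, for example mime-types or parsers.
tika_info() {
  curl -sS -H "Accept: application/json" "${TIKA_URL}/$1"
}

# Usage (requires a running server):
#   tika_info mime-types
#   tika_info parsers
```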

...

HTTP PUT an embedded document type to the /unpack service and you get back a zip or tar of the raw bytes of the embedded files. Note that this does not operate recursively; it extracts only the child documents of the original file.

You can also use /unpack/all to get back both the text and metadata from the container file. If you want the text and metadata from all embedded files, consider using the /rmeta endpoint.

Default return type is ZIP (without internal compression). Use the "Accept" header for TAR return type. Please note the mapping of this resource was changed in Apache Tika 1.6 from /unpacker/id to /unpack/id, and from /all/id to /unpack/all/id (TIKA-1324).
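Switching the return type through the "Accept" header can be sketched as below. The server address http://localhost:9998, the media type application/x-tar, and the function names are assumptions made for illustration, not details given on this page.

```shell
# Assumption: Tika server is reachable at this address; override TIKA_URL if not.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# PUT a container file to /unpack and save the archive of embedded files.
# $1 = input file, $2 = output archive. With no Accept header the result is ZIP.
tika_unpack_zip() {
  curl -sS -T "$1" "${TIKA_URL}/unpack" -o "$2"
}

# Same request, but ask for TAR through the Accept header.
# Assumption: application/x-tar is the media type the server expects for TAR.
tika_unpack_tar() {
  curl -sS -T "$1" -H "Accept: application/x-tar" "${TIKA_URL}/unpack" -o "$2"
}

# Usage (requires a running server):
#   tika_unpack_zip slides.pptx embedded.zip
#   tika_unpack_tar slides.pptx embedded.tar
```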

Some example calls with cURL:

PUT zip file and get back met file zip

...