Introduction to Tika server
This page documents how to access Tika as a RESTful API via the Tika server component.
Installation
The installation process for Tika server after 1.23 and before 1.24 is somewhat in flux. Some options are described below:
Building from source
If you need to customize Tika server in some way, or need the very latest version to try out a fix, build from source:
...
Your specific customizations to the Tika setup are stored in the /etc/init.d/tika file.
Tika Server Services
All services that take files use HTTP "PUT" requests. When "PUT" is used, the original file must be sent in the request body without any additional encoding (do not use multipart/form-data or other containers).
Additionally, TikaResource, Metadata and RecursiveMetadata Services accept POST multipart/form-data requests, where the original file is sent as a single attachment.
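The PUT convention described above can be sketched in Python as follows. This is a minimal illustration, not the project's own client: the server address http://localhost:9998, the /tika path, and the helper names build_put_request and extract_text are assumptions made for the example.

```python
import urllib.request

# Assumed base URL: a Tika server running locally on port 9998.
TIKA_URL = "http://localhost:9998"


def build_put_request(path, payload, content_type="application/octet-stream"):
    """Build (but do not send) a PUT request whose body is the raw file bytes.

    The payload is passed through untouched: no multipart/form-data wrapper
    and no extra encoding. A Content-Type is always set explicitly, because
    urllib would otherwise default to application/x-www-form-urlencoded.
    """
    headers = {"Content-Type": content_type, "Accept": "text/plain"}
    return urllib.request.Request(
        TIKA_URL + path, data=payload, headers=headers, method="PUT"
    )


def extract_text(file_path):
    """Send a file to the assumed /tika path and return the response as text."""
    with open(file_path, "rb") as fh:
        request = build_put_request("/tika", fh.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the method, headers, and body before contacting a running server.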
Information services (e.g. defined mimetypes, defined parsers, etc.) work with HTTP "GET" requests.
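A GET call to an information service might look like the sketch below. The server address, the JSON Accept header, and the example paths /mime-types and /parsers are assumptions for illustration; check the paths your Tika server version actually exposes.

```python
import json
import urllib.request

# Assumed base URL: a Tika server running locally on port 9998.
TIKA_URL = "http://localhost:9998"


def build_info_request(path):
    """Build (but do not send) a body-less GET request asking for JSON."""
    return urllib.request.Request(
        TIKA_URL + path, headers={"Accept": "application/json"}, method="GET"
    )


def fetch_info(path):
    """Send the GET request and decode the JSON response body."""
    with urllib.request.urlopen(build_info_request(path)) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage against a running server:
#   mime_types = fetch_info("/mime-types")
#   parsers = fetch_info("/parsers")
```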
...
HTTP PUT an embedded document type to the /unpack service and you get back a zip or tar of the raw bytes of the embedded files. Note that this does not operate recursively; it extracts only the child documents of the original file.
You can also use /unpack/all to get back both the text and metadata from the container file. If you want the text and metadata from all embedded files, consider using the /rmeta endpoint.
Default return type is ZIP (without internal compression). Use "Accept" header for TAR return type. Please note the mapping of this resource was changed in Apache Tika 1.6 from /unpacker/id to /unpack/id and from /all/id to /unpack/all/id (TIKA-1324).
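As a rough Python sketch of the unpack behaviour described above: the request builder chooses between /unpack and /unpack/all and optionally asks for TAR, and a second helper lists the entries of the returned archive. The server address, the application/x-tar Accept value, and the helper names are assumptions for illustration.

```python
import io
import tarfile
import urllib.request
import zipfile

# Assumed base URL: a Tika server running locally on port 9998.
TIKA_URL = "http://localhost:9998"


def build_unpack_request(payload, want_tar=False, include_text_and_metadata=False):
    """Build (but do not send) a PUT request for /unpack or /unpack/all."""
    path = "/unpack/all" if include_text_and_metadata else "/unpack"
    # Generic content type; assumed to leave type detection to the server.
    headers = {"Content-Type": "application/octet-stream"}
    if want_tar:
        # Assumed media type for requesting TAR instead of the default ZIP.
        headers["Accept"] = "application/x-tar"
    return urllib.request.Request(
        TIKA_URL + path, data=payload, headers=headers, method="PUT"
    )


def list_members(archive_bytes):
    """Return the entry names of a ZIP or TAR archive held in memory."""
    buffer = io.BytesIO(archive_bytes)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as zf:
            return zf.namelist()
    buffer.seek(0)
    with tarfile.open(fileobj=buffer) as tf:
        return tf.getnames()
```

Against a running server, the bytes read from urllib.request.urlopen(build_unpack_request(data)) could be passed straight to list_members to see which embedded files came back.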
Some example calls with cURL:
PUT zip file and get back met file zip
...