This table highlights some of the differences between the handlers. I've temporarily left question marks on items we still need to confirm.
Feature | /tika (text\|body) | /tika (html) | /tika (json) | /rmeta | /meta | /unpack |
---|---|---|---|---|---|---|
Text (including text of embedded documents) | Y | Y | Y | Y | N | Y (with /unpack/all) |
Metadata of main document | N | Y | Y | Y | Y | Y (with /unpack/all) |
Metadata of embedded documents/attachments | N | N | N | Y | N | N |
Notification of parse exception | Y/N[1] | Y/N[1] | Y | Y | Y | Y? |
Specific stacktrace if server is started with the -s option | N | N | Y | Y | N | N |
Notification of parse exception in embedded document | N | N | N | Y | N | N? |
Specific stacktrace for parse exception in embedded document | N | N | N | Y | N | N |
Streaming write[2] | Y | Y | N | N | N | N |
Actual attachments (raw bytes) | N | N | N | N | N | Y |
[1] If the parse exception occurs early in the parse, before streaming starts (as with an EncryptedDocumentException), you'll get an HTTP status 422 from /tika (text) and /tika (html). If the parse exception happens after content has started streaming, the stream simply stops: with /tika (text) you'll have no indication that there was a parse exception, and with /tika (html) you'll see truncated HTML.
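To make the footnote concrete, here is a minimal client-side sketch of how a caller might treat the status code from /tika (text). It is an illustration, not part of the server: the URL assumes a tika-server instance on `localhost:9998`, and `interpret_status`, `extract_text`, and the outcome labels are hypothetical names introduced for this example.

```python
import urllib.error
import urllib.request

# Assumption: a tika-server instance is listening locally on port 9998.
TIKA_URL = "http://localhost:9998/tika"


def interpret_status(status: int) -> str:
    """Map an HTTP status from /tika (text) to a coarse outcome label."""
    if status == 422:
        # Parse exception raised before streaming began,
        # e.g. an EncryptedDocumentException.
        return "parse_exception"
    if 200 <= status < 300:
        # Not proof of a clean parse: an exception raised after streaming
        # has started still leaves a 2xx status and truncated text.
        return "ok_possibly_truncated"
    return "other_error"


def extract_text(data: bytes, url: str = TIKA_URL) -> tuple[str, str]:
    """PUT a document to /tika and return (outcome, text)."""
    request = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            text = response.read().decode("utf-8", errors="replace")
            return interpret_status(response.status), text
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses, so a 422 arrives here.
        return interpret_status(err.code), ""
```

A 2xx result is deliberately labelled `ok_possibly_truncated`: per the footnote, the text handler cannot signal an exception once streaming has begun. If you need reliable notification or stack traces, the table above points to /tika (json) or /rmeta instead.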
...