This table highlights differences among several of the handlers. I've temporarily left question marks on items we still need to confirm.

| Feature | /tika (text or body) | /tika (html) | /tika (json) | /rmeta | /meta | /unpack |
|---|---|---|---|---|---|---|
| Text (including text of embedded documents) | Y | Y | Y | Y | N | Y (with /unpack/all) |
| Metadata of main document | N | Y | Y | Y | Y | Y (with /unpack/all) |
| Metadata of embedded documents/attachments | N | N | N | Y | N | N |
| Notification of parse exception | Y/N[1] | Y/N[1] | Y | Y | Y | Y? |
| Specific stacktrace if server is started with the -s option | N | N | Y | Y | N | N |
| Notification of parse exception in embedded document | N | N | N | Y | N | N? |
| Specific stacktrace for parse exception in embedded document | N | N | N | Y | N | N |
| Streaming write[2] | Y | Y | N | N | N | N |
| WriteLimit with the writeLimit header | N | N | Y | Y | N/A | N |
| Actual attachments (raw bytes) | N | N | N | N | N | Y |
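To make the columns concrete, here is a minimal sketch of how a client might build a request for each handler. It only constructs the request objects and sends nothing. The base URL, the short handler labels, the use of PUT, and the Accept values are assumptions for illustration; check them against your tika-server version. The writeLimit check simply mirrors the table row above.

```python
from urllib.request import Request

# Assumption: tika-server is running locally on port 9998.
BASE_URL = "http://localhost:9998"

# Hypothetical short labels for the table columns -> (path, Accept header).
# The Accept values are assumptions; /unpack is left without one.
HANDLERS = {
    "tika-text": ("/tika", "text/plain"),
    "tika-html": ("/tika", "text/html"),
    "tika-json": ("/tika", "application/json"),
    "rmeta": ("/rmeta", "application/json"),
    "meta": ("/meta", "application/json"),
    "unpack": ("/unpack", None),
    "unpack-all": ("/unpack/all", None),
}

# Per the table, only /tika (json) and /rmeta honor the writeLimit header.
WRITE_LIMIT_HANDLERS = {"tika-json", "rmeta"}


def build_request(handler, document, write_limit=None, base_url=BASE_URL):
    """Build (but do not send) a PUT request for one handler."""
    path, accept = HANDLERS[handler]
    if write_limit is not None and handler not in WRITE_LIMIT_HANDLERS:
        raise ValueError(f"writeLimit is not supported by {handler}")
    req = Request(base_url + path, data=document, method="PUT")
    if accept is not None:
        req.add_header("Accept", accept)
    if write_limit is not None:
        # urllib normalizes the stored key to "Writelimit"; HTTP header
        # names are case-insensitive, so the server sees the same header.
        req.add_header("writeLimit", str(write_limit))
    return req
```

A request built this way can be passed to `urllib.request.urlopen` once a server is available, for example `urlopen(build_request("rmeta", data, write_limit=100000))`.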

[1] If the parse exception occurs early in the parse, before streaming starts (as with an EncryptedDocumentException), /tika (text) and /tika (html) return HTTP status 422. If the parse exception occurs after content has started streaming, /tika (text) simply stops the stream, and you'll have no indication that there was a parse exception; /tika (html) returns truncated HTML.
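The behavior in this footnote suggests a client-side check along the following lines. This is a rough sketch, not part of Tika: the function name and result labels are hypothetical, and treating a missing closing `</html>` tag as a sign of truncation is a heuristic assumption.

```python
def classify_response(status, accept, body):
    """Guess whether a /tika response reflects a parse exception.

    status: HTTP status code; accept: the Accept header that was sent;
    body: the decoded response text.
    """
    if status == 422:
        # Exception raised before streaming started.
        return "parse-exception"
    if accept == "text/html":
        # Heuristic (assumption): a complete page ends with </html>.
        if not body.rstrip().lower().endswith("</html>"):
            return "possibly-truncated"
        return "complete"
    if accept == "text/plain":
        # A stream that stopped early is indistinguishable from a
        # complete one, so the outcome cannot be determined.
        return "unknown"
    return "complete"
```

If you need to know for certain that a parse exception occurred, the table indicates that /tika (json) and /rmeta report it.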

[2] Tika tries to stream both when processing the input and when writing the output. For some file formats, however, the parsers currently load the full document into memory and then write the content. This row therefore covers only whether Tika streams the writing of the content, not whether it streams the reading of the file.
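On the client side, a streamed response is only useful if it is consumed incrementally. Here is a minimal sketch, assuming a file-like response object such as the one returned by `urllib.request.urlopen`; the function name and chunk size are illustrative choices.

```python
def copy_stream(response, sink, chunk_size=8192):
    """Copy a file-like response to sink in chunks; return bytes copied."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            # An empty read means the stream has ended. For /tika (text)
            # this can also mean the parse failed partway through.
            break
        sink.write(chunk)
        total += len(chunk)
    return total
```

Reading in chunks keeps client memory use bounded regardless of how much text the handler emits.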
