| Feature | /tika (text\|body) | /tika (html) | /tika (json) | /rmeta | /meta | /unpack |
|---|---|---|---|---|---|---|
| Text (including text of embedded documents) | Y | Y | Y | Y | N | Y (with /unpack/all) |
| Metadata of main document | N | Y | Y | Y | Y | Y (with /unpack/all) |
| Metadata of embedded documents/attachments | N | N | N | Y | N | N |
| Notification of parse exception | Y/N [1] | Y/N [1] | Y | Y | Y | Y? |
| Specific stacktrace if server is started with the -s (stacktrace) option | N | N | Y | Y | N | N |
| MetadataFilters are applied (see ModifyingContentWithHandlersAndMetadataFilters) | N | N | Y | Y | N | N |
| Notification of parse exception in embedded document | N | N | Y as of 2.4.1 | Y | N | N? |
| Specific stacktrace for parse exception in embedded document | N | N | Y as of 2.4.1 | Y | N | N |
| Streaming write [2] | Y | Y | N | N | N | N |
| WriteLimit with the writeLimit header | N | N | Y | Y | N/A | N |
| Actual attachments (raw bytes) | N | N | N | N | N | Y |
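
As a rough illustration of why /rmeta is the endpoint to use when exceptions in embedded documents matter, the sketch below scans an /rmeta response for exception entries. It assumes the response is a JSON list with one metadata object per document (main document first, embedded documents after) and that exception-related keys start with `X-TIKA:EXCEPTION:`. Both the key prefix and the helper name `find_parse_exceptions` are assumptions made for this example, so check them against the output of your own server.

```python
import json

# Assumption: exception-related metadata keys in /rmeta output share this prefix.
EXCEPTION_KEY_PREFIX = "X-TIKA:EXCEPTION:"


def find_parse_exceptions(rmeta_json):
    """Return (document_index, key, value) for every exception key found.

    Index 0 is assumed to be the main document; higher indices are
    assumed to be embedded documents/attachments.
    """
    documents = json.loads(rmeta_json)
    found = []
    for index, metadata in enumerate(documents):
        for key, value in metadata.items():
            if key.startswith(EXCEPTION_KEY_PREFIX):
                found.append((index, key, value))
    return found


# Hand-written sample that mimics the assumed shape of an /rmeta response.
sample_response = json.dumps([
    {"Content-Type": "application/zip"},
    {"Content-Type": "application/pdf",
     "X-TIKA:EXCEPTION:embedded_exception": "java.io.IOException: broken"},
])

exceptions = find_parse_exceptions(sample_response)
```

Here the sample's second entry stands in for an embedded document that failed to parse, so `exceptions` holds a single tuple with document index 1.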


[1] If the parse exception occurs early in the parse, before streaming starts (as with an EncryptedDocumentException), /tika (text) and /tika (html) return HTTP status 422. If the exception occurs after content has started streaming, /tika (text) simply stops the stream, and nothing tells you that there was a parse exception; /tika (html) returns truncated HTML.
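
The behavior described in this footnote suggests a simple client-side check, sketched here. The function name `classify_tika_response` and the result labels are hypothetical, and treating a missing closing `</html>` tag as a sign of truncation is a heuristic assumed for this example, not something the server guarantees.

```python
def classify_tika_response(status_code, body, is_html):
    """Classify a response from /tika (text) or /tika (html).

    Returns one of:
      "parse_exception"    - HTTP 422, the exception came before streaming started
      "http_error"         - any other non-200 status
      "possibly_truncated" - HTML body lacks a closing </html> tag (heuristic)
      "unknown"            - plain text; a mid-stream failure cannot be detected
      "ok"                 - HTML body looks complete
    """
    if status_code == 422:
        return "parse_exception"
    if status_code != 200:
        return "http_error"
    if not is_html:
        # A plain-text stream that stopped early looks like a short document.
        return "unknown"
    if body.rstrip().lower().endswith("</html>"):
        return "ok"
    return "possibly_truncated"


complete = "<html><body><p>hello</p></body></html>\n"
cut_off = "<html><body><p>hel"
```

For example, `classify_tika_response(200, cut_off, True)` returns `"possibly_truncated"`, while a 200 response from /tika (text) can only ever be reported as `"unknown"`. If you need a reliable signal, the table above points to /tika (json) or /rmeta.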

[2] Tika tries to stream both while parsing the input and while writing the output. For some file formats, however, the parsers currently load the full document into memory before writing the content. This row therefore covers only whether Tika streams the writing of the content, not whether it streams the read/parse of the file.