...

If you need different behavior, implement a WriteFilterFactory, add it to your classpath and specify it in the tika-config.xml.

4. AutoDetectParserConfig

We briefly mentioned above some of the factories that can be configured in the AutoDetectParserConfig. Other parameters in the tika-config.xml can also be used to modify the behavior of the AutoDetectParser.

AutoDetectParserConfig (XML)
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <autoDetectParserConfig>
    <params>
      <!-- if the incoming metadata object has a ContentLength entry and it is larger than this
           value, spool the file to disk; this is useful for some file formats that are more efficiently
           processed via a file instead of an InputStream -->
      <spoolToDisk>100000</spoolToDisk>
      <!-- the next four are parameters for the SecureContentHandler -->
      <!-- threshold used in zip bomb detection. This many characters must be written
           before the maximum compression ratio is calculated -->
      <outputThreshold>10000</outputThreshold>
      <!-- maximum compression ratio between output characters and input bytes -->
      <maximumCompressionRatio>100</maximumCompressionRatio>
      <!-- maximum XML element nesting level -->
      <maximumDepth>100</maximumDepth>
      <!-- maximum embedded file depth -->
      <maximumPackageEntryDepth>100</maximumPackageEntryDepth>
    </params>
  </autoDetectParserConfig>
</properties>
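To make the interaction between outputThreshold and maximumCompressionRatio concrete, here is a minimal Java sketch of the kind of check those two comments describe. It is an illustration only, not Tika's SecureContentHandler code: the class name, method name, and constants are hypothetical, and the values simply mirror the XML above.

```java
public class CompressionRatioSketch {
    // Hypothetical constants mirroring the XML values above.
    static final long OUTPUT_THRESHOLD = 10_000;
    static final long MAXIMUM_COMPRESSION_RATIO = 100;

    /**
     * Returns true when the number of output characters is suspiciously
     * large relative to the number of input bytes read so far.
     */
    static boolean looksLikeZipBomb(long inputBytes, long outputChars) {
        // The ratio is not evaluated until enough characters have been written.
        if (outputChars <= OUTPUT_THRESHOLD) {
            return false;
        }
        // Flag the document once output exceeds input by more than the allowed ratio.
        return outputChars > inputBytes * MAXIMUM_COMPRESSION_RATIO;
    }

    public static void main(String[] args) {
        // Under the threshold: the ratio is ignored.
        System.out.println(looksLikeZipBomb(10, 5_000));
        // Over the threshold, ratio 50: within the limit.
        System.out.println(looksLikeZipBomb(1_000, 50_000));
        // Over the threshold, ratio 200: exceeds the limit.
        System.out.println(looksLikeZipBomb(100, 20_000));
    }
}
```

In this sketch the threshold keeps small documents from being flagged just because their first few bytes expand heavily. Raising maximumCompressionRatio loosens the check for formats that legitimately compress well.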