...
We briefly mentioned above some of the factories that can be modified in the `AutoDetectParserConfig`. Other parameters that modify the behavior of the `AutoDetectParser` can also be set via `tika-config.xml`. The `AutoDetectParser` is built from the `<parsers/>` element in the tika-config (or loaded via SPI if no `<parsers/>` element is specified). Because of this, the `AutoDetectParser` is configured differently from the component parsers that it wraps: it uses its own `<autoDetectParserConfig/>` element at the main level inside the `<properties/>` element.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <autoDetectParserConfig>
    <params>
      <!-- if the incoming metadata object has a ContentLength entry and it is
           larger than this value, spool the file to disk; this is useful for
           some file formats that are more efficiently processed via a file
           instead of an InputStream -->
      <spoolToDisk>100000</spoolToDisk>
      <!-- the next four are parameters for the SecureContentHandler -->
      <!-- threshold used in zip bomb detection. This many characters must be
           written before the maximum compression ratio is calculated -->
      <outputThreshold>10000</outputThreshold>
      <!-- maximum compression ratio between output characters and input bytes -->
      <maximumCompressionRatio>100</maximumCompressionRatio>
      <!-- maximum XML element nesting level -->
      <maximumDepth>100</maximumDepth>
      <!-- maximum embedded file depth -->
      <maximumPackageEntryDepth>100</maximumPackageEntryDepth>
      <!-- as of Tika > 2.7.0, you can skip the check and exception for a
           zero-byte inputstream -->
      <throwOnZeroBytes>false</throwOnZeroBytes>
    </params>
    <!-- as of Tika 2.5.x, this is the preferred way to configure digests -->
    <digesterFactory class="org.apache.tika.parser.digestutils.CommonsDigesterFactory">
      <params>
        <markLimit>100000</markLimit>
        <!-- this specifies SHA256, base32 and MD5 -->
        <algorithmString>sha256:32,md5</algorithmString>
      </params>
    </digesterFactory>
  </autoDetectParserConfig>
</properties>
```
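The configuration above omits a `<parsers/>` element, so the component parsers would be loaded via SPI. As a minimal sketch of how the two elements relate, the fragment below places `<parsers/>` and `<autoDetectParserConfig/>` side by side as direct children of `<properties/>`. The choice of `DefaultParser` and the single `spoolToDisk` override are illustrative assumptions, not a recommended setup.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <!-- component parsers wrapped by the AutoDetectParser;
       DefaultParser is used here only for illustration -->
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
  </parsers>
  <!-- settings for the AutoDetectParser itself: a sibling of <parsers/>,
       not nested inside it -->
  <autoDetectParserConfig>
    <params>
      <!-- illustrative value, same as in the example above -->
      <spoolToDisk>100000</spoolToDisk>
    </params>
  </autoDetectParserConfig>
</properties>
```

Keeping the two elements as siblings reflects the distinction described above: `<parsers/>` determines which parsers the `AutoDetectParser` wraps, while `<autoDetectParserConfig/>` configures the wrapper itself.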
...