...

To configure the StandardWriteFilter, set the properties on its factory inside the <autoDetectParserConfig> element of the tika-config.xml file:

StandardWriteFilter (XML)
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <autoDetectParserConfig>
    <metadataWriteFilterFactory class="org.apache.tika.metadata.writefilter.StandardWriteFilterFactory">
      <params>
        <!-- all measurements are in UTF-16 bytes. If any values are truncated, TikaCoreProperties.TRUNCATED_METADATA is set to true in the metadata object -->

        <!-- the maximum size for a metadata key. Keys longer than this value will be truncated to this length -->
        <maxKeySize>1000</maxKeySize>

        <!-- max total size for a field in UTF-16 bytes. If a field has multiple values, their lengths are summed to calculate the field size. -->
        <maxFieldSize>10000</maxFieldSize>

        <!-- the maximum total estimated bytes, calculated as the sum of the key sizes and value sizes -->
        <maxTotalEstimatedBytes>100000</maxTotalEstimatedBytes>
  
        <!-- limit the count of values for multi-valued fields -->
        <maxValuesPerField>100</maxValuesPerField>
        <!-- include only these fields. NOTE: several fields that are important to the
             parse process are always allowed in addition (see ALWAYS_SET_FIELDS and ALWAYS_ADD_FIELDS
             in the StandardWriteFilter) -->
        <includeFields>
          <field>dc:creator</field>
          <field>dc:title</field>
        </includeFields>
      </params>
    </metadataWriteFilterFactory>
  </autoDetectParserConfig>
</properties>
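
The comments in the configuration say that sizes are measured in UTF-16 bytes and that the size of a multi-valued field is the sum of its value lengths. The following is a minimal Java sketch of that arithmetic, assuming two bytes per Java char. The class and method names (FieldSizeSketch, utf16Bytes, fieldSize) are hypothetical and exist only for illustration; this is not Tika's actual implementation.

```java
import java.util.List;

public class FieldSizeSketch {

    // Hypothetical helper: size of a string in UTF-16 bytes,
    // assuming two bytes per Java char (a surrogate pair counts as four).
    public static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    // Hypothetical helper: the size of a multi-valued field is taken to be
    // the sum of the sizes of its values, as the config comment describes.
    public static int fieldSize(List<String> values) {
        int total = 0;
        for (String value : values) {
            total += utf16Bytes(value);
        }
        return total;
    }

    public static void main(String[] args) {
        int maxFieldSize = 10000; // same value as <maxFieldSize> in the config
        List<String> creators = List.of("Alice", "Bob");

        int size = fieldSize(creators);
        System.out.println(size);                 // prints 16
        System.out.println(size <= maxFieldSize); // prints true
    }
}
```

Under these assumptions, a dc:creator field holding "Alice" and "Bob" counts as 16 bytes, far below the configured maxFieldSize of 10000.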

...