...

IncludeFieldMetadataFilter (XML):
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter">
      <params>
        <include>
          <field>X-TIKA:content</field>
          <field>extended-properties:Application</field>
          <field>Content-Type</field>
        </include>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties>
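The include filter keeps only the listed fields and drops the rest. The effect can be illustrated with a minimal plain-Java model. This is a sketch, not Tika's IncludeFieldMetadataFilter. It assumes that a document's metadata can be modeled as a map from field name to a list of values. The class name IncludeFieldSketch and the sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of an include-list metadata filter over a multi-valued
// map (field name -> values). This is NOT Tika's IncludeFieldMetadataFilter
// source; it only mirrors the behavior of keeping the listed fields.
final class IncludeFieldSketch {

    // Removes, in place, every field whose name is not in the include set.
    static void filter(Map<String, List<String>> metadata, Set<String> include) {
        metadata.keySet().retainAll(include);
    }

    public static void main(String[] args) {
        // Sample values are illustrative only.
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("X-TIKA:content", List.of("Hello world"));
        md.put("Content-Type", List.of("application/pdf"));
        md.put("dc:creator", List.of("Alice"));
        md.put("extended-properties:Application", List.of("Microsoft Word"));

        filter(md, Set.of("X-TIKA:content", "extended-properties:Application", "Content-Type"));
        // prints [X-TIKA:content, Content-Type, extended-properties:Application]
        System.out.println(md.keySet());
    }
}
```

In this model, dc:creator is dropped because it is not on the include list.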

...

To exclude those three fields but include all other fields:

ExcludeFieldMetadataFilter (XML):
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.ExcludeFieldMetadataFilter">
      <params>
        <exclude>
          <field>X-TIKA:content</field>
          <field>extended-properties:Application</field>
          <field>Content-Type</field>
        </exclude>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties>
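The exclude filter is the complement of an include list: it drops only the listed fields and keeps everything else. A minimal plain-Java model follows. It is a sketch, not Tika's ExcludeFieldMetadataFilter, and it assumes that metadata can be modeled as a map from field name to a list of values. The class name ExcludeFieldSketch and the sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of an exclude-list metadata filter over a multi-valued
// map (field name -> values). This is NOT Tika's ExcludeFieldMetadataFilter
// source; it only mirrors the behavior of dropping the listed fields.
final class ExcludeFieldSketch {

    // Removes, in place, every field whose name is in the exclude set.
    static void filter(Map<String, List<String>> metadata, Set<String> exclude) {
        metadata.keySet().removeAll(exclude);
    }

    public static void main(String[] args) {
        // Sample values are illustrative only.
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("X-TIKA:content", List.of("Hello world"));
        md.put("Content-Type", List.of("application/pdf"));
        md.put("dc:creator", List.of("Alice"));
        md.put("extended-properties:Application", List.of("Microsoft Word"));

        filter(md, Set.of("X-TIKA:content", "extended-properties:Application", "Content-Type"));
        System.out.println(md.keySet()); // prints [dc:creator]
    }
}
```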

...

A user may want to parse a file type to get at the files embedded in it, without keeping a metadata object or content for the container file itself. For example, image/emf files often contain duplicative text, but they may also contain an embedded PDF file. Turning off the EMFParser does not solve this, because the embedded PDF file would then not be parsed either. When the /rmeta endpoint is configured as follows, it deletes the entire metadata object for files of type image/emf, while the files embedded in them are still parsed.

ClearByMimeMetadataFilter (XML):
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.ClearByMimeMetadataFilter">
      <params>
        <mimes>
          <mime>image/emf</mime>
        </mimes>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties>
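The effect on the /rmeta output can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Tika's ClearByMimeMetadataFilter or the /rmeta handler. It makes three assumptions:

- Each document's metadata is a map from field name to a list of values.
- A metadata object left with no fields is dropped from the output list, which mirrors the "delete the entire metadata object" behavior described above.
- MIME parameters such as "; charset=..." are ignored when matching.

The class name ClearByMimeSketch and the helper doc are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the /rmeta behavior described above. Each list element
// is one document's metadata (container first, then embedded files).
// NOT Tika's ClearByMimeMetadataFilter source.
final class ClearByMimeSketch {

    // Clears every field if the base Content-Type is in the configured set.
    // Assumption: parameters after ';' (e.g. charset) are ignored.
    static void filter(Map<String, List<String>> metadata, Set<String> mimes) {
        List<String> types = metadata.get("Content-Type");
        if (types == null || types.isEmpty()) {
            return; // no type to match against: leave the object untouched
        }
        String base = types.get(0).split(";", 2)[0].trim();
        if (mimes.contains(base)) {
            metadata.clear();
        }
    }

    // Filters every document and drops metadata objects left empty.
    static List<Map<String, List<String>>> filterAll(
            List<Map<String, List<String>>> docs, Set<String> mimes) {
        List<Map<String, List<String>>> kept = new ArrayList<>();
        for (Map<String, List<String>> md : docs) {
            filter(md, mimes);
            if (!md.isEmpty()) {
                kept.add(md);
            }
        }
        return kept;
    }

    // Hypothetical helper that builds a small metadata object.
    static Map<String, List<String>> doc(String type, String content) {
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("Content-Type", List.of(type));
        md.put("X-TIKA:content", List.of(content));
        return md;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        docs.add(doc("image/emf", "duplicative text"));          // container
        docs.add(doc("application/pdf", "text of embedded PDF")); // embedded file

        List<Map<String, List<String>>> result = filterAll(docs, Set.of("image/emf"));
        System.out.println(result.size());                     // prints 1
        System.out.println(result.get(0).get("Content-Type")); // prints [application/pdf]
    }
}
```

With this sample input, only the embedded PDF's metadata object survives. That is the outcome the configuration is meant to achieve.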

...