...
Code Block | ||||
---|---|---|---|---|
| ||||
<properties> <metadataFilters> <metadataFilter class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter"> <params> <include> <field>X-TIKA:content</field> <field>extended-properties:Application</field> <field>Content-Type</field> </include> </params> </metadataFilter> </metadataFilters> </properties> |
...
To exclude those three fields but include all other fields:
Code Block | ||||
---|---|---|---|---|
| ||||
<properties> <metadataFilters> <metadataFilter class="org.apache.tika.metadata.filter.ExcludeFieldMetadataFilter"> <params> <exclude> <field>X-TIKA:content</field> <field>extended-properties:Application</field> <field>Content-Type</field> </exclude> </params> </metadataFilter> </metadataFilters> </properties> |
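The difference between the include and exclude filters can be sketched in a few lines of Python. This is only an illustration of the semantics described above, using plain dictionaries as stand-ins for metadata objects; it is not Tika's Java implementation, and the function names and the sample values are hypothetical.

```python
# Illustrative sketch only: plain dicts stand in for Tika metadata objects.
# The function names and sample values are hypothetical, not part of Tika.

FIELDS = ["X-TIKA:content", "extended-properties:Application", "Content-Type"]


def include_fields(metadata, fields):
    """Keep only the named fields and drop everything else."""
    wanted = set(fields)
    return {key: value for key, value in metadata.items() if key in wanted}


def exclude_fields(metadata, fields):
    """Drop the named fields and keep everything else."""
    unwanted = set(fields)
    return {key: value for key, value in metadata.items() if key not in unwanted}


sample = {
    "X-TIKA:content": "Hello, world",
    "extended-properties:Application": "Example Application",
    "Content-Type": "application/pdf",
    "dc:creator": "A. Author",
}

included = include_fields(sample, FIELDS)
excluded = exclude_fields(sample, FIELDS)
```

With this sample, `included` holds the three listed fields and `excluded` holds only `dc:creator`, so the two results together partition the original metadata.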
...
A user may want to parse a file type to reach the content embedded within it, without keeping a metadata object or content for the file type itself. For example, image/emf files often contain duplicative text, but they may also contain an embedded PDF file. If the client turned off the EMFParser, the embedded PDF file would not be parsed. When the /rmeta endpoint is configured with the following, it deletes the entire metadata object for files of type image/emf.
Code Block | ||||
---|---|---|---|---|
| ||||
<properties> <metadataFilters> <metadataFilter class="org.apache.tika.metadata.filter.ClearByMimeMetadataFilter"> <params> <mimes> <mime>image/emf</mime> </mimes> </params> </metadataFilter> </metadataFilters> </properties> |
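As a rough sketch of the effect on /rmeta output, the following Python treats the response as a list of metadata dictionaries (one per container or embedded file) and drops those whose media type is listed. This only models the behaviour described above and is not how Tika implements the filter; the function name, the sample list, and the assumption that the type is read from `Content-Type` with any parameters ignored are all illustrative.

```python
# Illustrative sketch only: models the described effect on an /rmeta-style
# list of metadata objects. The function name, the sample data, and the
# Content-Type handling are assumptions, not Tika's implementation.


def clear_by_mime(metadata_list, mimes):
    """Return the metadata objects whose base media type is not in `mimes`."""
    blocked = {mime.lower() for mime in mimes}
    kept = []
    for metadata in metadata_list:
        # Ignore parameters such as "; charset=UTF-8" when comparing types.
        base_type = metadata.get("Content-Type", "").split(";")[0].strip().lower()
        if base_type not in blocked:
            kept.append(metadata)
    return kept


# Hypothetical parse result: a container, an embedded EMF, and the PDF
# that was embedded inside that EMF.
parsed = [
    {"Content-Type": "application/msword", "X-TIKA:content": "main text"},
    {"Content-Type": "image/emf", "X-TIKA:content": "duplicative text"},
    {"Content-Type": "application/pdf", "X-TIKA:content": "embedded pdf text"},
]

filtered = clear_by_mime(parsed, ["image/emf"])
```

The EMF entry is gone from `filtered`, while the PDF that was embedded in it is still present, which is the point of parsing the EMF and then clearing its metadata rather than turning the parser off.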
...