...
Note: MetadataFilters only work with the /rmeta endpoint. Further, they do not short-circuit metadata extraction within Parsers; they only delete the unwanted fields after the parse. This can still save storage and network bandwidth.
A user can map Tika field names to names they prefer. If excludeUnmapped is set to true, only the fields included in the mapping are passed back to the client.
Code Block | ||||
---|---|---|---|---|
| ||||
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.FieldNameMappingFilter">
      <params>
        <excludeUnmapped>true</excludeUnmapped>
        <mappings>
          <mapping from="X-TIKA:content" to="content"/>
          <mapping from="a" to="b"/>
        </mappings>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties> |
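The effect of the mapping on a metadata record can be modeled with a small sketch. This is not Tika's implementation: `map_field_names` and the sample record are hypothetical, and the behavior with excludeUnmapped set to false (unmapped fields pass through unchanged) is an assumption, since the text above only describes the true case.

```python
def map_field_names(metadata, mappings, exclude_unmapped=False):
    """Rename keys per `mappings`; optionally drop keys that have no mapping.

    Hypothetical model of the filter's effect, not Tika code.
    """
    result = {}
    for name, values in metadata.items():
        if name in mappings:
            # Mapped field: emit it under the preferred name.
            result[mappings[name]] = values
        elif not exclude_unmapped:
            # Assumption: unmapped fields are kept as-is when the flag is false.
            result[name] = values
    return result


# Sample record as it might look after a parse (made-up values).
parsed = {
    "X-TIKA:content": ["Hello"],
    "a": ["1"],
    "Content-Type": ["text/plain"],
}
mappings = {"X-TIKA:content": "content", "a": "b"}

strict = map_field_names(parsed, mappings, exclude_unmapped=True)
lenient = map_field_names(parsed, mappings, exclude_unmapped=False)
```

With the configuration above (excludeUnmapped set to true), only `content` and `b` would reach the client in this model; `Content-Type` is dropped because it has no mapping.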
A user can set the following in a tika-config.xml file to have the /rmeta endpoint return only three fields:
Code Block | ||||
---|---|---|---|---|
| ||||
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter">
      <params>
        <include>
          <field>X-TIKA:content</field>
          <field>extended-properties:Application</field>
          <field>Content-Type</field>
        </include>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties> |
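As a rough model of what an include list does to a record once the parse has finished, consider the sketch below. The function `include_fields` and the sample data are hypothetical and only illustrate the post-parse filtering described above; they are not part of Tika.

```python
def include_fields(metadata, include):
    """Keep only the fields named in `include`; everything else is dropped.

    Hypothetical model of an include filter applied after the parse.
    """
    wanted = set(include)
    return {name: values for name, values in metadata.items() if name in wanted}


# Made-up record: the parser has already extracted every field.
parsed = {
    "X-TIKA:content": ["Hello"],
    "extended-properties:Application": ["Microsoft Office Word"],
    "Content-Type": ["application/msword"],
    "dc:creator": ["someone"],
}

filtered = include_fields(
    parsed,
    ["X-TIKA:content", "extended-properties:Application", "Content-Type"],
)
```

In this model the full record still has to be built first; the saving comes from the smaller record that is stored or sent back.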
...
No Format |
---|
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.ExcludeFieldMetadataFilter">
      <params>
        <exclude>
          <field>X-TIKA:content</field>
          <field>extended-properties:Application</field>
          <field>Content-Type</field>
        </exclude>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties> |
...
No Format |
---|
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.ClearByMimeMetadataFilter">
      <params>
        <mimes>
          <mime>image/emf</mime>
        </mimes>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties> |
...