...

Removed duplicate/triplicate keys

Background: In early 1.x, we had basic metadata keys that were created somewhat ad hoc.  We then added metadata keys based on standards such as Dublin Core, or we at least tried to add namespaces to the metadata keys for specific file formats.  To maintain backwards compatibility, we kept the old keys and added new keys.  This led to quite a bit of metadata bloat, where we'd have the same information two or three times.  In Tika 2.x, we slimmed down the metadata keys and relied only on the standards-based or name-spaced keys.  In the table below, we document the mappings.  If you notice any missing, please let us know or update the wiki.
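Code that still looks up the 1.x keys will need to move to the 2.x keys. One way is to normalize stored metadata onto the 2.x keys, and the sketch below shows that approach. It is hypothetical: `LegacyKeyMigration` and `migrate` are not part of Tika, and metadata is modeled as a plain `Map<String, List<String>>` rather than Tika's `Metadata` class. The three mappings (`Author` to `dc:creator`, `title` to `dc:title`, `Last-Modified` to `dcterms:modified`) are illustrative assumptions only. The mapping table is the authoritative list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LegacyKeyMigration {

    // Illustrative subset of legacy 1.x keys and assumed 2.x equivalents;
    // consult the mapping table for the real list.
    static final Map<String, String> LEGACY_TO_2X = new LinkedHashMap<>();
    static {
        LEGACY_TO_2X.put("Author", "dc:creator");
        LEGACY_TO_2X.put("title", "dc:title");
        LEGACY_TO_2X.put("Last-Modified", "dcterms:modified");
    }

    /** Returns a copy of the metadata with legacy keys renamed to their 2.x keys. */
    static Map<String, List<String>> migrate(Map<String, List<String>> metadata) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        // First copy every non-legacy key, so 2.x values win if both keys are present.
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            if (!LEGACY_TO_2X.containsKey(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        // Then rename legacy keys, skipping any whose 2.x key already exists
        // (in 1.x output the legacy key usually duplicated the same value).
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            String newKey = LEGACY_TO_2X.get(e.getKey());
            if (newKey != null) {
                out.putIfAbsent(newKey, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> legacy = new LinkedHashMap<>();
        legacy.put("Author", Arrays.asList("Jane Doe"));
        legacy.put("dc:creator", Arrays.asList("Jane Doe"));
        legacy.put("title", Arrays.asList("Quarterly Report"));
        // prints {dc:creator=[Jane Doe], dc:title=[Quarterly Report]}
        System.out.println(migrate(legacy));
    }
}
```

When the legacy key and the standards-based key both appear, the duplicate collapses into a single entry under the 2.x key. This mirrors the de-duplication described above.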

...

tika-parsers – Configuring via tika-config.xml 

In 2.x, we've moved all configuration into a tika-config.xml file. Two popular parsers, PDFParser and TesseractOCRParser, used to rely on *.properties files; see their individual pages for details.
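As an illustration, a 2.x-style tika-config.xml might exclude PDFParser from the DefaultParser and then configure it separately through `<params>`. The sketch below embeds such a file as a string. The parameter `sortByPosition` is an assumption chosen for illustration, so check the PDFParser page for the names it actually supports. The Java wrapper (`TikaConfigSketch` and `paramsFor` are hypothetical) uses only the JDK's DOM parser. It checks that the file is well-formed and lists the params set on a given parser. In an application, you would pass the file to Tika itself, for example with `new TikaConfig(Paths.get("tika-config.xml"))`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TikaConfigSketch {

    // Assumed 2.x layout: PDFParser is excluded from DefaultParser and then
    // declared on its own with a <params> block. The param name is illustrative.
    static final String CONFIG =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<properties>\n"
        + "  <parsers>\n"
        + "    <parser class=\"org.apache.tika.parser.DefaultParser\">\n"
        + "      <parser-exclude class=\"org.apache.tika.parser.pdf.PDFParser\"/>\n"
        + "    </parser>\n"
        + "    <parser class=\"org.apache.tika.parser.pdf.PDFParser\">\n"
        + "      <params>\n"
        + "        <param name=\"sortByPosition\" type=\"bool\">true</param>\n"
        + "      </params>\n"
        + "    </parser>\n"
        + "  </parsers>\n"
        + "</properties>\n";

    /** Returns name -> value for every <param> under the parser with the given class. */
    static Map<String, String> paramsFor(String xml, String parserClass) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> params = new LinkedHashMap<>();
            NodeList parsers = doc.getElementsByTagName("parser");
            for (int i = 0; i < parsers.getLength(); i++) {
                Element parser = (Element) parsers.item(i);
                if (!parserClass.equals(parser.getAttribute("class"))) {
                    continue;
                }
                NodeList ps = parser.getElementsByTagName("param");
                for (int j = 0; j < ps.getLength(); j++) {
                    Element p = (Element) ps.item(j);
                    params.put(p.getAttribute("name"), p.getTextContent().trim());
                }
            }
            return params;
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed tika-config.xml", e);
        }
    }

    public static void main(String[] args) {
        // prints {sortByPosition=true}
        System.out.println(paramsFor(CONFIG, "org.apache.tika.parser.pdf.PDFParser"));
    }
}
```

Keeping every setting in one XML file means a single place to review and version, instead of several per-parser *.properties files.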

...