...

Build the Solr dist: cd solr/ and ant package
Unzip your shiny new Solr and create a collection TODO: add example collection here
Place this config file in the collection TODO: add this
from: https://github.com/tballison/tika-addons/tree/main/solr-tika-integration/src/configs
bin\solr start
Copy the files from tika-parsers/src/test/resources/test-documents ... make sure to remove ucar files: *.nc, *.hdf, *.fb2, *.he5 – these wreak havoc with the data importer
Navigate to the Solr admin window->Dataimport.
Close your eyes, cross your fingers, pray to your appropriate deit(y|ies) or not, and press Execute
Watch the command window to see if there were any catastrophic missing class problems
Go to the logs and check for any show-stopping exceptions.
When this completes, go to Query and check how many documents are actually indexed
Compare the number of documents in Solr to the number you'd get if you ran java -jar tika-app.jar -i <input_dir> -o <output_dir>
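
The copy-and-compare steps above can be sketched in Python. This is only an illustration, not part of the release process: the helper names are made up here, the base URL and collection name are assumptions about a local setup, and counting files in the tika-app output directory assumes one output file per input document.

```python
import json
import shutil
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# Extensions this page says to remove before running the data importer.
EXCLUDED_SUFFIXES = {".nc", ".hdf", ".fb2", ".he5"}


def copy_test_documents(src_dir, dest_dir, excluded=EXCLUDED_SUFFIXES):
    """Copy the files directly inside src_dir to dest_dir, skipping excluded suffixes.

    Returns the names of the copied files in sorted order.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).iterdir()):
        # Compare suffixes case-insensitively so "FOO.NC" is skipped too.
        if path.is_file() and path.suffix.lower() not in excluded:
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return copied


def count_files(directory):
    """Count regular files anywhere under directory (e.g. the tika-app output dir)."""
    return sum(1 for p in Path(directory).rglob("*") if p.is_file())


def num_found(select_response_text):
    """Extract numFound from the JSON body of a Solr select response."""
    return json.loads(select_response_text)["response"]["numFound"]


def solr_document_count(base_url, collection):
    """Ask Solr how many documents match *:* without fetching any rows.

    base_url and collection are assumptions about the local setup.
    """
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    with urlopen(f"{base_url}/{collection}/select?{params}") as response:
        return num_found(response.read().decode("utf-8"))
```

With Solr running on its default port, `solr_document_count("http://localhost:8983/solr", "tika-test")` (the collection name `tika-test` is hypothetical) can be compared against `count_files(output_dir)` for the tika-app output directory.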

In addition to DIH, the above configs are also set up to work with the ExtractingRequestHandler.

Make sure to set the source directory appropriately for your test files. Note that these indexers do not process files recursively.
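
Because the indexers do not recurse, one option is to flatten a nested test directory into a single directory first. The following is a minimal sketch under that assumption; `flatten_tree` is a hypothetical helper, not something the configs provide, and its collision handling (prefixing the relative parent path) is just one possible choice. The destination should live outside the source tree.

```python
import shutil
from pathlib import Path


def flatten_tree(src_dir, dest_dir):
    """Copy every file under src_dir into dest_dir with no subdirectories.

    If a file name is already taken in dest_dir, the file is stored under
    its relative path with the separators replaced by underscores.
    Returns the names of the files written to dest_dir.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dest / path.name
        if target.exists():
            # Name collision: fall back to e.g. "sub_a.txt" for "sub/a.txt".
            target = dest / "_".join(path.relative_to(src).parts)
        shutil.copy2(path, target)
        copied.append(target.name)
    return copied
```

The resulting flat directory can then be used as the source directory for the indexers.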

Phase 3: Submit a Pull Request

...
