...

Build the Solr dist: cd solr/ and ant package
Unzip your shiny new Solr and create a collection (TODO: add example collection here)
Place this config file in the collection (TODO: add this) from: https://github.com/tballison/tika-addons/tree/main/solr-tika-integration/src/configs
bin\solr start
Copy the files from tika-parsers/src/test/resources/test-documents ... Make sure to remove the ucar files (*.nc, *.hdf, *.fb2, *.he5); these wreak havoc with the data importer
Navigate to the Solr admin window->Dataimport.
Close your eyes, cross your fingers, pray to your appropriate deit(y|ies) or not, and press Execute
Watch the command window to see if there were any catastrophic missing class problems
Go to Logs to check for any show-stopping exceptions.
When this completes, go to Query and check how many documents are actually indexed
Compare the number of documents in Solr to the number you'd get if you ran java -jar tika-app.jar -i <input_dir> -o <output_dir>
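
The file preparation and the final count comparison in the steps above can be scripted. The following is a minimal sketch, not part of the release tooling: it assumes Solr is reachable at a base URL such as http://localhost:8983/solr, that the collection is called tika-test (a hypothetical name), and that counting the files tika-app wrote to the output directory is a fair stand-in for "the number you'd get". It copies only top-level files and skips the four extensions listed above.

```python
import json
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# Extensions the steps above say to remove before running the data importer.
EXCLUDED_SUFFIXES = {".nc", ".hdf", ".fb2", ".he5"}


def copy_test_documents(src_dir, dest_dir):
    """Copy top-level files from src_dir to dest_dir, skipping excluded suffixes.

    Returns the number of files copied.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(Path(src_dir).iterdir()):
        if path.is_file() and path.suffix.lower() not in EXCLUDED_SUFFIXES:
            shutil.copy2(path, dest / path.name)
            copied += 1
    return copied


def count_files(directory):
    """Count regular files under directory, recursively."""
    return sum(1 for p in Path(directory).rglob("*") if p.is_file())


def select_all_url(solr_base_url, collection):
    """Build a match-all query URL that asks Solr for the hit count only."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    base = solr_base_url.rstrip("/")
    return f"{base}/{urllib.parse.quote(collection)}/select?{params}"


def parse_num_found(response_body):
    """Extract response.numFound from a Solr JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


def solr_num_found(solr_base_url, collection):
    """Query a running Solr and return the number of indexed documents."""
    with urllib.request.urlopen(select_all_url(solr_base_url, collection)) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

With Solr running, comparing solr_num_found("http://localhost:8983/solr", "tika-test") against count_files on the tika-app output directory gives a quick signal of how many documents were dropped along the way.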

In addition to DIH, the above configs are also set up to work with the ExtractingRequestHandler.

You can run either the SolrJ client (https://github.com/tballison/tika-addons/blob/main/solr-tika-integration/src/main/java/org/tallison/indexers/SolrJIndexer.java) or the curl wrapper (https://github.com/tballison/tika-addons/blob/main/solr-tika-integration/src/main/java/org/tallison/indexers/CurlIndexer.java).

Make sure to set the source directory and the Solr collection name to match your test files and your Solr collection. Note that these indexers do not process files recursively.
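
For orientation, here is a rough Python sketch of what a non-recursive indexer against Solr's ExtractingRequestHandler might look like. It is not a port of the linked Java indexers. Assumptions: the handler is registered at /update/extract (the conventional path, which the configs above may or may not use), the collection is called tika-test (hypothetical), and each document's id is simply its file name.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def list_source_files(source_dir):
    """Return regular files directly inside source_dir (no recursion)."""
    return sorted(p for p in Path(source_dir).iterdir() if p.is_file())


def extract_url(solr_base_url, collection, doc_id, handler="/update/extract"):
    """Build the extract URL; the handler path is an assumption, see above."""
    params = urllib.parse.urlencode({"literal.id": doc_id})
    base = solr_base_url.rstrip("/")
    return f"{base}/{urllib.parse.quote(collection)}{handler}?{params}"


def commit_url(solr_base_url, collection):
    """Build the URL that asks Solr to commit pending updates."""
    base = solr_base_url.rstrip("/")
    return f"{base}/{urllib.parse.quote(collection)}/update?commit=true"


def index_file(solr_base_url, collection, path):
    """POST one file's raw bytes to the extract handler; returns the HTTP status."""
    request = urllib.request.Request(
        extract_url(solr_base_url, collection, path.name),
        data=path.read_bytes(),
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return resp.status


def index_directory(solr_base_url, collection, source_dir):
    """Index every top-level file, then commit. Returns (file name, error) pairs."""
    failures = []
    for path in list_source_files(source_dir):
        try:
            index_file(solr_base_url, collection, path)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            failures.append((path.name, str(exc)))
    with urllib.request.urlopen(commit_url(solr_base_url, collection)):
        pass
    return failures
```

Collecting failures instead of stopping at the first one keeps a single bad file from hiding problems in the rest of the test documents.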

Phase 3: Submit a Pull Request

...
