...

  1. Build the Solr dist: cd solr/ and ant package
  2. Unzip your shiny new Solr and create a collection (TODO: add example collection here). Place this config file in the collection (TODO: add this), from: https://github.com/tballison/tika-addons/tree/main/solr-tika-integration/src/configs
  3. bin\solr start
  4. Copy the files from tika-parsers/src/test/resources/test-documents ... make sure to remove ucar files: *.nc, *.hdf, *.fb2, *.he5 – these wreak havoc with the data importer
  5. Navigate to the Solr admin window->Dataimport.
  6. Close your eyes, cross your fingers, pray to your appropriate deit(y|ies) or not, and press Execute
  7. Watch the command window to see if there were any catastrophic missing class problems
  8. Go to Logs to check for any show-stopping exceptions.
  9. When this completes, go to Query and check how many documents are actually indexed
  10. Compare the number of documents in Solr to the number you'd get if you ran java -jar tika-app.jar -i <input_dir> -o <output_dir>
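
Steps 4, 9 and 10 can be roughly scripted. The following is a minimal sketch, not part of Solr or Tika: the function names are hypothetical, and it assumes Solr is listening on its default port 8983, that the /select handler is available on your collection, and that tika-app writes one output file per input file it processes.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Solr or Tika.

# remove_ucar_files DIR: delete the extensions that step 4 says to remove.
# Matching is case-sensitive and recurses into subdirectories.
remove_ucar_files() {
  find "$1" -type f \
    \( -name '*.nc' -o -name '*.hdf' -o -name '*.fb2' -o -name '*.he5' \) \
    -delete
}

# count_files DIR: number of regular files under DIR (recursive).
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# parse_num_found: read a Solr JSON response on stdin and print numFound.
parse_num_found() {
  grep -Eo '"numFound":[[:space:]]*[0-9]+' | head -n 1 | grep -Eo '[0-9]+$'
}

# solr_doc_count COLLECTION: ask Solr how many documents are indexed.
# Assumes Solr on localhost:8983 (an assumption; adjust for your setup).
solr_doc_count() {
  curl -s "http://localhost:8983/solr/$1/select?q=*:*&rows=0&wt=json" | parse_num_found
}
```

After sourcing the script, run remove_ucar_files on your copy of the test documents before step 5, then compare the output of solr_doc_count <collection> with count_files <output_dir> once both the import and the tika-app run have finished.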

In addition to the DataImportHandler (DIH), the configs above are also set up to work with the ExtractingRequestHandler.


You can run either the SolrJ client (https://github.com/tballison/tika-addons/blob/main/solr-tika-integration/src/main/java/org/tallison/indexers/SolrJIndexer.java) or the curl wrapper (https://github.com/tballison/tika-addons/blob/main/solr-tika-integration/src/main/java/org/tallison/indexers/CurlIndexer.java).

Make sure to set the source directory and the Solr collection name to match your test files and your Solr collection. Note that these indexers do not process files recursively: only files directly inside the source directory are indexed.
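
To illustrate what non-recursive indexing through the extracting handler looks like, here is a rough sketch. It is not the code of the linked indexers: the function names are hypothetical, and it assumes the handler is registered at /update/extract, that the schema's unique key field is named id, and that file names are URL-safe.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; this is not the linked indexers' code.

# list_top_level_files DIR: regular files directly inside DIR, no recursion.
list_top_level_files() {
  find "$1" -maxdepth 1 -type f | sort
}

# post_file SOLR_URL COLLECTION FILE: send one file to the extracting handler.
# Assumes the handler path is /update/extract and the unique key field is "id".
# The file name is used unescaped as the id, so it must be URL-safe.
post_file() {
  local solr_url="$1" collection="$2" file="$3"
  curl -s "${solr_url}/${collection}/update/extract?literal.id=$(basename "$file")&commitWithin=10000" \
    -F "file=@${file}"
}

# index_dir SOLR_URL COLLECTION DIR: post every top-level file in DIR.
index_dir() {
  local solr_url="$1" collection="$2" dir="$3" file
  list_top_level_files "$dir" | while IFS= read -r file; do
    post_file "$solr_url" "$collection" "$file"
  done
}
```

A call would look like index_dir http://localhost:8983/solr <collection> <input_dir>. Files in subdirectories of <input_dir> are skipped, which mirrors the non-recursive behavior noted above.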

Phase 3: Submit a Pull Request

...