...

  1. Build the Solr distribution: cd solr/ and run ant package.
  2. Unzip your shiny new Solr and create a collection. TODO: add example collection here
  3. Place this config file in the collection. TODO: add this
  4. The config files are available at: https://github.com/tballison/tika-addons/tree/main/solr-tika-integration/src/configs
  5. Start Solr: bin\solr start
  6. Copy the files from tika-parsers/src/test/resources/test-documents ... Make sure to remove the ucar files (*.nc, *.hdf, *.fb2, *.he5); these wreak havoc with the data importer.
  7. In the Solr admin UI, navigate to Dataimport.
  8. Close your eyes, cross your fingers, pray to your appropriate deit(y|ies) or not, and press Execute.
  9. Watch the command window for any catastrophic missing-class problems.
  10. Check the logs for any show-stopping exceptions.
  11. When the import completes, go to Query and check how many documents were actually indexed.
  12. Compare the number of documents in Solr to the number you'd get if you ran java -jar tika-app.jar -i <input_dir> -o <output_dir>
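Steps 6 and 12 can be partly scripted. Below is a minimal sketch, assuming Java 11+, that copies the test documents into a staging directory, skips the extensions listed in step 6, and returns how many files were copied. That count gives you a baseline to compare against Solr's document count. The class CopyTestDocs, the method copyFiltered, and the staging directory are hypothetical. The listing is non-recursive, which assumes the files you care about sit directly in test-documents.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.Locale;

public class CopyTestDocs {

    // Extensions from step 6 that cause trouble for the data importer.
    static final List<String> EXCLUDED = List.of(".nc", ".hdf", ".fb2", ".he5");

    // Case-insensitive check of the file name against the excluded extensions.
    static boolean isExcluded(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        for (String ext : EXCLUDED) {
            if (name.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Copies regular files directly under src (non-recursive) into dest,
    // skipping excluded extensions; returns the number of files copied.
    static int copyFiltered(Path src, Path dest) throws IOException {
        Files.createDirectories(dest);
        int copied = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(src)) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file) || isExcluded(file)) {
                    continue;
                }
                Files.copy(file, dest.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: test-documents directory; args[1]: hypothetical staging directory
        int n = copyFiltered(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + n + " files; compare with Solr's document count");
    }
}
```

The printed count is only an upper bound on what you should expect in Solr. Files that Tika fails to parse are exactly the discrepancies this exercise is meant to surface.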

In addition to the DataImportHandler (DIH), the above configs are also set up to work with the ExtractingHandler (Solr Cell's ExtractingRequestHandler).

You can run either the SolrJ client (https://github.com/tballison/tika-addons/blob/main/solr-tika-integration/src/main/java/org/tallison/indexers/SolrJIndexer.java) or the curl wrapper (https://github.com/tballison/tika-addons/blob/main/solr-tika-integration/src/main/java/org/tallison/indexers/CurlIndexer.java).

Make sure to set the source directory to the location of your test files. Note that these indexers do not process files recursively: only files directly inside the source directory are indexed.
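For a sense of what a single ExtractingHandler submission looks like without SolrJ or curl, here is a minimal sketch using the JDK's java.net.http client (Java 11+). It is not the linked indexers' code. Several details are assumptions: a local Solr on the default port 8983, a collection name passed on the command line, "id" as the schema's uniqueKey (filled with the file name via the handler's literal.id parameter), and Tika's content detection working when the body is sent as application/octet-stream. The class ExtractPoster and the method buildRequest are hypothetical. Like the indexers above, it does not recurse into subdirectories.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExtractPoster {

    // Builds a POST of one file's raw bytes to the collection's /update/extract
    // endpoint; the URL-encoded file name becomes the document id (assumption).
    static HttpRequest buildRequest(String solrBase, String collection, Path file)
            throws IOException {
        String id = URLEncoder.encode(file.getFileName().toString(), StandardCharsets.UTF_8);
        URI uri = URI.create(solrBase + "/" + collection + "/update/extract?literal.id=" + id);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed local Solr; args[0] = source directory, args[1] = collection name
        String solrBase = "http://localhost:8983/solr";
        String collection = args[1];
        HttpClient client = HttpClient.newHttpClient();
        // Non-recursive listing, matching the behavior described above.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(Paths.get(args[0]))) {
            for (Path file : stream) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                HttpResponse<String> resp = client.send(
                        buildRequest(solrBase, collection, file),
                        HttpResponse.BodyHandlers.ofString());
                System.out.println(resp.statusCode() + " " + file.getFileName());
            }
        }
        // One commit at the end rather than one per document.
        HttpRequest commit = HttpRequest.newBuilder(
                URI.create(solrBase + "/" + collection + "/update?commit=true")).GET().build();
        client.send(commit, HttpResponse.BodyHandlers.discarding());
    }
}
```

Committing once at the end keeps per-file requests cheap. A non-200 status printed next to a file name points you to the entry to look for in the Solr logs.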

Phase 3: Submit a Pull Request

...