Pluggable Indexing

The index command (which runs org.apache.nutch.indexer.IndexingJob) takes the content from one or more segments and passes it to all enabled IndexWriter plugins, which send the documents to Solr, Elasticsearch, or other index back-ends.


Nutch 1.x


Usage: bin/nutch index <crawldb> [-linkdb <linkdb>] [-params k1=v1&k2=v2...] (<segment> ... | -dir <segments>) [-noCommit] [-deleteGone] [-filter] [-normalize] [-addBinaryContent] [-base64]
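
As an illustration of the usage line above, the following sketch assembles an index command for a hypothetical crawl directory named crawl (the crawl/crawldb, crawl/linkdb, and crawl/segments paths are assumptions, not part of the original page). It only prints the command so it can be reviewed before being run from the Nutch runtime directory.

```shell
#!/bin/sh
# Hypothetical crawl layout; adjust this path to your own crawl directory.
CRAWL_DIR="crawl"

# Assemble the command from options listed in the usage line:
#   -linkdb      adds link information from the linkdb
#   -dir         indexes every segment found below the given directory
#   -deleteGone  is passed through as listed in the usage line
INDEX_CMD="bin/nutch index $CRAWL_DIR/crawldb -linkdb $CRAWL_DIR/linkdb -dir $CRAWL_DIR/segments -deleteGone"

# Print the command for review instead of executing it.
echo "$INDEX_CMD"
```

To index a single segment rather than a whole directory, replace the -dir option with the path of that segment.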

IndexWriter plugins must be enabled through the plugin.includes property. See IndexWriter for how to configure these plugins.
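
A minimal sketch of such a setting, assuming it is placed in conf/nutch-site.xml and that the Solr and Elasticsearch writers (plugin ids indexer-solr and indexer-elastic) are the ones wanted. The remaining entries in the value are only an example plugin set and should be replaced with whatever your installation already uses.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside the <configuration> element. -->
<property>
  <name>plugin.includes</name>
  <!-- The value is a regular expression matched against plugin ids.
       indexer-(solr|elastic) enables both index writers; the other
       entries are an illustrative plugin set, not a recommendation. -->
  <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-(solr|elastic)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

With more than one indexer plugin enabled, every document is passed to each of them, as described at the top of this page.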
