Configuring and running the Index Job

The Nutch REST API can be used to index Nutch segments. Before running the index job, the user needs to configure an indexer. The configurations for two indexers, Solr and Elasticsearch, are shown below.

Configuring Solr

First, make sure your Solr instance has been set up correctly. You can follow this tutorial to set up Solr with Nutch. Once your instance is up and running, use the configuration endpoint to create a new configuration, passing the arguments shown below.

POST /config/{name-of-new-config}

{
"configId":"solr-config",
"force":"true",
"params":{"solr.server.url":"http://127.0.0.1:8983/solr/"}
}

Now use the configId above (solr-config) as the confId when running the jobs.
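The request above can also be built from a script. The following is a minimal Python sketch, not a definitive client. It assumes the Nutch REST server is reachable at http://localhost:8081 (adjust this to your setup) and that the {name-of-new-config} path placeholder takes the same value as configId. The helper name build_config_request is hypothetical. The request is only constructed here; sending it is left as a commented-out line.

```python
import json
import urllib.request

# Assumption: address of the Nutch REST server; change it to match your setup.
NUTCH_API = "http://localhost:8081"


def build_config_request(config_id, params, base_url=NUTCH_API):
    """Build (but do not send) the POST request that creates a configuration.

    Assumption: the path placeholder {name-of-new-config} is filled with the
    same value as configId.
    """
    body = {"configId": config_id, "force": "true", "params": params}
    return urllib.request.Request(
        url=f"{base_url}/config/{config_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


solr_request = build_config_request(
    "solr-config",
    {"solr.server.url": "http://127.0.0.1:8983/solr/"},
)

# To actually send the request against a running server:
# urllib.request.urlopen(solr_request)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and JSON body before anything is sent.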

Configuring Elasticsearch

First, make sure you have Elasticsearch running on your machine. Then use the configuration endpoint to create a new configuration that sets up Elasticsearch with Nutch.

POST /config/{name-of-new-config}

{
"configId":"elastic-config",
"force":"true",
"params":{"elastic.host":"localhost",
          "elastic.cluster":"elasticsearch"
         }
}

Now use the configId above (elastic-config) as the confId when running the jobs.
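To illustrate how the confId is passed when running the index job, here is a minimal Python sketch. Several details are assumptions not stated on this page: the server address http://localhost:8081, the job endpoint /job/create, the body fields type, confId, crawlId and args, and the crawl ID crawl01, which is purely hypothetical. Check them against the REST API documentation of your Nutch version. The helper name build_index_job_request is also hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# Assumption: address of the Nutch REST server; change it to match your setup.
NUTCH_API = "http://localhost:8081"


def build_index_job_request(conf_id, crawl_id, base_url=NUTCH_API):
    """Build (but do not send) a request that starts an index job.

    Assumption: jobs are created with POST /job/create and a JSON body
    containing type, confId, crawlId and args.
    """
    body = {
        "type": "INDEX",
        "confId": conf_id,    # the configId created earlier
        "crawlId": crawl_id,  # hypothetical crawl identifier
        "args": {},
    }
    return urllib.request.Request(
        url=f"{base_url}/job/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


index_request = build_index_job_request("elastic-config", "crawl01")

# To actually send the request against a running server:
# urllib.request.urlopen(index_request)
```

Swapping "elastic-config" for "solr-config" would point the same job at the Solr configuration instead.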
