...

  • Unix environment, or Windows-Cygwin environment
  • Java Runtime/Development Environment (JDK 11 / Java 11)
  • (Source build only) Apache Ant: https://ant.apache.org/
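Before building, it can help to confirm which Java version is first on the PATH. The following is a minimal sketch. It assumes that `java -version` prints its version in double quotes on the first line of stderr, as OpenJDK and Oracle builds do. `java_major_version` and `require_java` are hypothetical helper names and are not part of Nutch.

```shell
#!/usr/bin/env bash
# Parse the major Java version from the first line of `java -version` output.
# Handles the legacy "1.8.0_xxx" scheme (-> 8) and the newer "11.0.x" scheme (-> 11).
java_major_version() {
  local line="$1" ver
  # Extract the quoted version string, e.g. "11.0.19" or "1.8.0_381"
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;  # 1.8.0_381 -> 8
    *)   echo "${ver%%[.+-]*}" ;;             # 11.0.19 -> 11, 17 -> 17
  esac
}

# Hypothetical check: succeed only if the java on PATH is at least the given major version.
require_java() {
  local min="$1" major
  command -v java >/dev/null 2>&1 || { echo "java not found on PATH" >&2; return 1; }
  # `java -version` writes to stderr, so redirect it before reading the first line
  major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
  if [ "${major:-0}" -lt "$min" ]; then
    echo "Java $min or newer required, found ${major:-unknown}" >&2
    return 1
  fi
}
```

For example, `require_java 11` returns non-zero and prints a message if an older JDK is found first on the PATH.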

...

  • Download a source package (apache-nutch-1.X-src.zip)
  • Unzip
  • cd apache-nutch-1.X/
  • Run ant in this folder (cf. RunNutchInEclipse)
  • Now there is a directory runtime/local which contains a ready-to-use Nutch installation.
    When the source distribution is used, ${NUTCH_RUNTIME_HOME} refers to apache-nutch-1.X/runtime/local/. Note that:
      • config files should be modified in apache-nutch-1.X/runtime/local/conf/
      • ant clean will remove this directory (keep copies of modified config files)
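Because ant clean deletes runtime/local/ along with any edits made in runtime/local/conf/, copying the config out first is a simple safeguard. The following is a sketch under stated assumptions. The conf-backups/ location and the helper names `backup_nutch_conf` and `clean_rebuild` are hypothetical and are not a Nutch convention.

```shell
#!/usr/bin/env bash
# Copy runtime/local/conf/ into a timestamped directory outside runtime/,
# so that `ant clean` (which removes runtime/local) does not discard local edits.
# Prints the backup directory on success.
backup_nutch_conf() {
  local src_root="${1:-.}"   # the apache-nutch-1.X/ source directory
  local conf="$src_root/runtime/local/conf"
  local dest="$src_root/conf-backups/conf-$(date +%Y%m%d-%H%M%S)"
  if [ ! -d "$conf" ]; then
    echo "no $conf to back up" >&2
    return 1
  fi
  mkdir -p "$dest"
  cp -R "$conf/." "$dest/"   # copy the directory contents, including dotfiles
  echo "$dest"
}

# Hypothetical wrapper: back up the config, then clean and rebuild with ant.
clean_rebuild() {
  local src_root="${1:-.}"
  backup_nutch_conf "$src_root" >/dev/null || return 1
  (cd "$src_root" && ant clean && ant)
}
```

After the rebuild, the saved files still have to be copied back into the new runtime/local/conf/ by hand.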

Option 3: Set up Nutch from source

See UsingGit#CheckingoutacopyofNutchandmodifyingit

Verify your Nutch installation

...

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/11/Home
# note that the actual path may be different on your system
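Because the JDK path differs between systems, a JAVA_HOME candidate can be derived from the `java` binary instead of being hard-coded. The following is a minimal sketch that assumes `readlink -f` is available (GNU coreutils, or recent macOS). `java_home_from_binary` is a hypothetical helper name. Older JDK 8 layouts may resolve to a .../jre directory instead of the JDK root.

```shell
#!/usr/bin/env bash
# Resolve symlinks on a java executable, e.g.
#   /usr/bin/java -> /usr/lib/jvm/java-11-openjdk-amd64/bin/java
# and strip the trailing /bin/java to get a JAVA_HOME candidate.
java_home_from_binary() {
  local bin
  bin=$(readlink -f "$1") || return 1
  dirname "$(dirname "$bin")"
}

# On macOS, the system helper is usually simpler:
#   export JAVA_HOME="$(/usr/libexec/java_home -v 11)"
# Elsewhere, using the java found on PATH:
#   export JAVA_HOME="$(java_home_from_binary "$(command -v java)")"
```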

...

NOTE: If you previously modified the file conf/regex-urlfilter.txt as covered here, you will need to change it back.

...

This option shadows the creation of the seed list as covered here.

bin/nutch inject crawl/crawldb urls
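The inject step reads seed URLs from the urls directory passed on the command line. The following sketch writes one URL per line into urls/seed.txt. The file name seed.txt is the conventional choice and is assumed here. `write_seed_list` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Write a seed list into <dir>/seed.txt, one URL per line,
# replacing any previous contents of that file.
write_seed_list() {
  local dir="$1"; shift
  mkdir -p "$dir"
  : > "$dir/seed.txt"   # start from an empty file
  local url
  for url in "$@"; do
    printf '%s\n' "$url" >> "$dir/seed.txt"
  done
}

# Example (run from ${NUTCH_RUNTIME_HOME}):
#   write_seed_list urls https://nutch.apache.org/
#   bin/nutch inject crawl/crawldb urls
```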

...

Note: For this step you need a Solr installation. If you have not yet integrated Nutch with Solr, read here.

Now we are ready to go on and index all the resources. For more information see the command line options.

...

Every version of Nutch is built against a specific Solr version, but you may also try a "close" version.

Nutch    Solr
1.19     8.11.2
1.18     8.5.1
1.17     8.5.1
1.16     7.3.1
1.15     7.3.1
1.14     6.6.0
1.13     5.5.0
1.12     5.4.1
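In install scripts, the version table above can be encoded as a lookup. The following is a sketch that covers only the rows shown here. `solr_version_for_nutch` is a hypothetical helper name. It prints nothing and returns 1 for any Nutch version not listed.

```shell
#!/usr/bin/env bash
# Map a Nutch release to the Solr version it is built against (rows from the table above).
solr_version_for_nutch() {
  case "$1" in
    1.19)      echo 8.11.2 ;;
    1.18|1.17) echo 8.5.1 ;;
    1.16|1.15) echo 7.3.1 ;;
    1.14)      echo 6.6.0 ;;
    1.13)      echo 5.5.0 ;;
    1.12)      echo 5.4.1 ;;
    *)         return 1 ;;
  esac
}
```

For example, `solr_version_for_nutch 1.19` prints `8.11.2`.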

...