...
- Unix environment, or Windows-Cygwin environment
- Java Runtime/Development Environment (JDK 11 / Java 11)
- (Source build only) Apache Ant: https://ant.apache.org/
...
- Download a source package (`apache-nutch-1.X-src.zip`)
- Unzip
- `cd apache-nutch-1.X/`
- Run `ant` in this folder (cf. RunNutchInEclipse)
- Now there is a directory `runtime/local` which contains a ready to use Nutch installation.

When the source distribution is used, `${NUTCH_RUNTIME_HOME}` refers to `apache-nutch-1.X/runtime/local/`. Note that

- config files should be modified in `apache-nutch-1.X/runtime/local/conf/`
- `ant clean` will remove this directory (keep copies of modified config files)
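Because `ant clean` deletes `runtime/local`, it can help to copy the modified config files aside first. The following is a minimal sketch, not part of Nutch: `backup_nutch_conf` is a hypothetical helper, and the backup location is an assumption you would choose yourself.

```shell
# Hypothetical helper (not shipped with Nutch): copy runtime/local/conf
# to a backup directory before running `ant clean`.
backup_nutch_conf() {
  local src_root="$1"      # source tree, e.g. apache-nutch-1.X
  local backup_dir="$2"    # assumed backup location, chosen by the user
  local conf_dir="$src_root/runtime/local/conf"

  if [ ! -d "$conf_dir" ]; then
    echo "no conf directory at $conf_dir" >&2
    return 1
  fi

  mkdir -p "$backup_dir"
  # "conf/." copies the directory contents, including dot files.
  cp -R "$conf_dir/." "$backup_dir/"
}

# Possible usage from the directory that contains the source tree:
#   backup_nutch_conf apache-nutch-1.X nutch-conf-backup
#   (cd apache-nutch-1.X && ant clean && ant)
#   cp -R nutch-conf-backup/. apache-nutch-1.X/runtime/local/conf/
```

Copying the files back after the rebuild restores the local changes; whether you prefer this or editing the `conf/` directory of the source tree directly is a design choice.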
Option 3: Set up Nutch from source
See UsingGit, section "Checking out a copy of Nutch and modifying it".
Verify your Nutch installation
...
No Format |
---|
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/11/Home # note that the actual path may be different on your system |
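Since the actual path differs between systems, a quick sanity check of `JAVA_HOME` can save a failed run later. This is a minimal sketch under the assumption that a usable installation has an executable `bin/java` below `JAVA_HOME`; `check_java_home` is a hypothetical helper, not a Nutch command.

```shell
# Hypothetical helper: verify that JAVA_HOME is set and points at a
# directory containing an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no executable java found under $JAVA_HOME/bin" >&2
    return 1
  fi
  echo "JAVA_HOME looks usable: $JAVA_HOME"
}

# Possible usage after exporting JAVA_HOME:
#   check_java_home && "$JAVA_HOME/bin/java" -version
```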
...
NOTE: If you previously modified the file `conf/regex-urlfilter.txt` as covered here, you will need to change it back.
...
This option shadows the creation of the seed list as covered here.
No Format |
---|
bin/nutch inject crawl/crawldb urls |
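The inject command above reads its seed URLs from the `urls` directory. As a rough sketch of how such a directory might be prepared, the helper below writes one URL per line into a seed file. `make_seed_list` and the file name `seed.txt` are assumptions for illustration, not part of Nutch.

```shell
# Hypothetical helper: write the given URLs, one per line, into
# <seed_dir>/seed.txt (the file name is an assumed convention).
make_seed_list() {
  local seed_dir="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: make_seed_list <seed_dir> <url>..." >&2
    return 1
  fi
  mkdir -p "$seed_dir"
  printf '%s\n' "$@" > "$seed_dir/seed.txt"
}

# Possible usage from ${NUTCH_RUNTIME_HOME}:
#   make_seed_list urls https://nutch.apache.org/
#   bin/nutch inject crawl/crawldb urls
```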
...
Note: This step requires a Solr installation. If you have not yet integrated Nutch with Solr, read here first.
Now we are ready to go on and index all the resources. For more information see the command line options.
...
Every version of Nutch is built against a specific Solr version, but you may also try a "close" version.
Nutch | Solr |
---|---|
1.19 | 8.11.2 |
1.18 | 8.5.1 |
1.17 | 8.5.1 |
1.16 | 7.3.1 |
1.15 | 7.3.1 |
1.14 | 6.6.0 |
1.13 | 5.5.0 |
1.12 | 5.4.1 |
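
For scripted setups, the pairs listed above can be encoded as a small lookup. This is only a sketch covering the versions in the table; `solr_version_for_nutch` is a hypothetical helper, and any version not listed is reported as unknown rather than guessed.

```shell
# Hypothetical helper: print the Solr version that a given Nutch 1.x
# release was built against, using only the pairs from the table above.
solr_version_for_nutch() {
  case "$1" in
    1.19)      echo "8.11.2" ;;
    1.18|1.17) echo "8.5.1" ;;
    1.16|1.15) echo "7.3.1" ;;
    1.14)      echo "6.6.0" ;;
    1.13)      echo "5.5.0" ;;
    1.12)      echo "5.4.1" ;;
    *)
      echo "unknown Nutch version: $1" >&2
      return 1
      ;;
  esac
}

# Possible usage:
#   solr_version_for_nutch 1.17
```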
...