This page is based on GettingNutchRunningWithRedHatApplicationServer. To make it easier to get started, it uses the yum command line as an example.
Repositories we need
Packages to Install
This is a preliminary package list, taken from the Red Hat server:
yum install ant ant-apache-regexp axis jaf jakarta-commons-beanutils jakarta-commons-collections jakarta-commons-daemon jakarta-commons-dbcp jakarta-commons-digester jakarta-commons-discovery jakarta-commons-el jakarta-commons-fileupload jakarta-commons-httpclient jakarta-commons-launcher jakarta-commons-logging jakarta-commons-modeler jakarta-commons-pool jakarta-commons-validator jakarta-regexp jakarta-taglibs-standard jakarta-taglibs-standard-javadoc javamail jta jta-javadoc junit libgcj34 log4j mx4j oro regexp servletapi4 servletapi5 struts11 tomcat5 tomcat5-admin-webapps tomcat5-webapps tyrex wsdl4j xalan xerces xml-commons xml-commons-apis xml-commons-resolver
Installing for dependencies:
bcel                         i386  5.1-8jpp.1          core  983 k
eclipse-ecj                  i386  1:3.2.1-4.fc6       core  7.9 M
gcc-java                     i386  4.1.1-30            core  2.8 M
geronimo-specs               i386  1.0-0.M2.2jpp.12    core  230 k
jakarta-oro                  i386  2.0.8-3jpp.1        core  173 k
java-1.4.2-gcj-compat-devel  i386  1.4.2.0-40jpp.110   core  49 k
libgcj-devel                 i386  4.1.1-30            core  1.4 M
mx4j                         i386  1:3.0.1-6jpp.4      core  2.5 M
regexp                       i386  1.4-2jpp.2          core  91 k
wsdl4j                       i386  1.5.2-4jpp.1        core  388 k
zlib-devel                   i386  1.2.3-3             core
Yum Install Errors:
- No Match for argument: jta-javadoc
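One way to handle a "No Match" error is to drop the unmatched package from the list and rerun the install. The following is a minimal sketch, not part of the original setup: `filter_packages` is a hypothetical helper, the package list is shortened for illustration, and the script only prints the resulting yum command instead of running it.

```shell
# filter_packages is a hypothetical helper, not a yum feature.
# $1: space-separated packages requested
# $2: space-separated packages yum reported as "No Match"
filter_packages() {
    result=""
    for pkg in $1; do
        keep=1
        for bad in $2; do
            if [ "$pkg" = "$bad" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            result="$result $pkg"
        fi
    done
    # Strip the leading space left by the first append.
    echo "${result# }"
}

# Shortened list for illustration; use the full list from the install line.
requested="javamail jta jta-javadoc junit"
unavailable="jta-javadoc"

# Print the command rather than run it, so it can be reviewed first.
echo "yum install $(filter_packages "$requested" "$unavailable")"
```

Names are compared exactly, so jta stays in the list while jta-javadoc is removed.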
Install Java
Download and Testing
- DownloadingNutch: downloaded nutch-0.8.tar.gz
tar xzf nutch-0.8.tar.gz
cd nutch-0.8
export JAVA_HOME=/usr/java/jdk1.5.0_08/
bin/nutch
- Test using NutchTutorial
- make a new directory named 'urls'
- add a URL in a new file 'urls/nutch'
- add/edit the entry in 'conf/crawl-urlfilter.txt' (under "# accept hosts in MY.DOMAIN.NAME")
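The three preparation steps above can be sketched as a short script. This is only an illustration under assumptions: example.com is a placeholder domain, the scratch directory stands in for the unpacked nutch-0.8 directory, and the filter pattern line written here is an assumed example of what the stock file contains, so check your own copy of conf/crawl-urlfilter.txt before relying on it.

```shell
# Placeholders: replace with your own seed URL and domain.
seed_url="http://example.com/"
domain="example.com"

# Scratch directory standing in for the unpacked nutch-0.8 directory.
workdir=$(mktemp -d)

# Steps 1 and 2: create the seed directory and the seed file 'urls/nutch'.
mkdir -p "$workdir/urls"
echo "$seed_url" > "$workdir/urls/nutch"

# Stand-in for conf/crawl-urlfilter.txt; the pattern line is an assumed example.
mkdir -p "$workdir/conf"
printf '%s\n' '# accept hosts in MY.DOMAIN.NAME' \
    '+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/' > "$workdir/conf/crawl-urlfilter.txt"

# Step 3: replace the MY.DOMAIN.NAME placeholder with the real domain.
sed "s/MY\.DOMAIN\.NAME/$domain/" "$workdir/conf/crawl-urlfilter.txt" \
    > "$workdir/conf/crawl-urlfilter.txt.new"
mv "$workdir/conf/crawl-urlfilter.txt.new" "$workdir/conf/crawl-urlfilter.txt"

# Show the result for review.
cat "$workdir/urls/nutch" "$workdir/conf/crawl-urlfilter.txt"
```

To apply it for real, set workdir to the nutch-0.8 directory and skip the printf step, since the filter file already exists there.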
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
Check logs/hadoop.log for success.
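A quick way to scan the log is to search it for error markers. The sketch below is a heuristic, not an official Nutch check: `check_log` is a hypothetical helper, and it assumes that problems show up as lines containing ERROR, FATAL or Exception. A clean result does not prove the crawl fetched anything, so a look at the log by hand is still worthwhile.

```shell
# check_log is a hypothetical helper.
# $1: path to the log file
# Returns 0 when no suspicious lines are found, 1 when some are found
# (they are printed), and 2 when the file does not exist.
check_log() {
    if [ ! -f "$1" ]; then
        echo "log file not found: $1" >&2
        return 2
    fi
    if grep -E 'ERROR|FATAL|Exception' "$1"; then
        return 1
    fi
    return 0
}

# Run from the nutch-0.8 directory after the crawl has finished.
if check_log logs/hadoop.log; then
    echo "no problems found in logs/hadoop.log"
else
    echo "check logs/hadoop.log by hand"
fi
```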
Instead of using catalina.sh, start Tomcat through the tomcat5 service by running:
/sbin/service tomcat5 start
You can find Tomcat's log in /var/log/tomcat5/catalina.out
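To confirm that the service came up, the log can be searched for a startup message. This is a rough sketch: `tomcat_started` is a hypothetical helper, and the "Server startup in" text is an assumption about what your Tomcat version writes when it finishes starting, so adjust the pattern if your log reads differently.

```shell
# tomcat_started is a hypothetical helper.
# $1: path to catalina.out
# Succeeds when the file is readable and contains a startup line.
tomcat_started() {
    [ -r "$1" ] && grep -q 'Server startup in' "$1"
}

log=/var/log/tomcat5/catalina.out
if tomcat_started "$log"; then
    echo "Tomcat reports a completed startup"
    # Show the most recent lines for context.
    tail -n 20 "$log"
else
    echo "no startup message found in $log"
fi
```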