...
- DownloadingNutch
- Current CommandLineOptions: Command line options for 1.X and 2.X
- JavaDocs – The JavaDocs for the most recent Nutch-1.X release
- JavaDocs – The JavaDocs for Nutch-1.X nightly builds
- JavaDocs – The JavaDocs for the most recent Nutch-2.X release
...
- OverviewDeploymentConfigs
:This full page requires a complete update to reflect recent Nutch releases:
- NutchConfigurationFiles: An overview from Nutch developers.
- NutchPropertiesCompleteList: A fine-grained account of all Nutch configuration properties.
- HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.
- NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch intranet crawling configuration.
- OptimizingCrawls - How to optimise your crawling/fetching speed with Nutch.
- ErrorMessages – What they mean and suggestions for getting rid of them.
:This requires extensive updating to reflect recent Nutch releases. In addition, the legacy indexing and searching material should be archived.
- IndexStructure
:This page needs a slight update to provide more information on plugins and the data they send to Solr for indexing:
- IndexWriters: How to configure the index writers for the indexing step.
- Exchanges: How to configure the exchanges for the indexing step.
- Logging: Details of logging using slf4j and log4j2
- Metrics: A narrative on Nutch application metrics. It details which metrics are captured for which Nutch Jobs within which Tasks.
General Information
- Nutch Website
- Features
:TODO: This needs to be completely overhauled to reflect recent Nutch features.
- Current Nutch Gotchas
- PublicServers running Nutch
- Presentations on Nutch
- Press Articles
- Evaluations of Search Quality
- Commercial Support & developers for hire
- Mailing Lists
- AcademicArticles that deal with Nutch
- FAQ
- HardwareRequirements
- NutchResources
- NutchScoring - The whats and wheres of Scoring implementations in Apache Nutch
- NutchFileFormats - Provides information on the Nutch file formats
...
- Becoming a Nutch Developer - Start developing and contributing to Nutch.
- PluginCentral – How to write your own plugins and use other people's.
- InternalDocumentation – How Nutch works.
- Nutch Version Control
- UsingGit - A guide to using Git with Nutch. Nutch's source code is no longer managed in Subversion; it is managed in Git.
- HowToContribute
- Committer's_Rules – Committers should follow these guidelines when deciding which branch to use for committing patches and when to commit.
- Release_HOWTO
- How to edit the Nutch website, based on the Apache CMS (http://www.apache.org/dev/cms.html)
- Nutch website repository (see the README there for how to edit and deploy changes to the website)
- Image_Search_Design
- StrategicGoals
- Getting_Started
- NutchMeetUps - Records of previous Nutch community meetups, hackathons, barcamps, etc.
- Using Nutch as a Maven dependency
- GoogleSummerOfCode - An area dedicated to GSoC projects, with a sandbox for student/mentor development and documentation.
- AdvancedAjaxInteraction - Discussion centered on enabling Nutch to not only fetch, but also interact with JavaScript
- WhiteListRobots - User guide for the new host robots.txt whitelist capability
...