Giraph implementation of Nutch LinkRank Algorithm

Author

Renato Marroquin Mogrovejo - renatoj.marroquin at gmail dot com

Project Aim

  • Provide Apache Nutch with a new implementation of website ranking, while letting users extend the ranking algorithms by using Apache Giraph.

Project Objectives

  1. Fully integrate into Apache Nutch the LinkRank algorithm developed within the Apache Giraph community, addressing the lack of ranking algorithms in the latest version of Nutch [1].
  2. Reproduce the example in [3] using the PageRank implementation in Giraph.
  3. Study approaches for creating variations of the open source PageRank [2] algorithm as candidate future ranking algorithms for Nutch.
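Objectives 2 and 3 both rest on running PageRank in Giraph's vertex-centric (Pregel-style) model, where every vertex repeatedly sends a share of its rank along its out-links and then recomputes its own rank from the shares it receives. The following is a minimal sketch of that idea in plain Java. It does not use the Giraph API, and the class name VertexCentricPageRank, the (1 - d)/N + d * sum update rule, and the damping factor 0.85 are illustrative assumptions rather than the project's implementation. For simplicity, rank held by pages with no out-links is dropped rather than redistributed.

```java
import java.util.Arrays;

/**
 * Minimal simulation of vertex-centric (Pregel/Giraph-style) PageRank.
 * Hypothetical sketch: it does NOT use the Apache Giraph API; the loop
 * below stands in for Giraph's superstep-based runtime.
 */
class VertexCentricPageRank {

    /**
     * @param outLinks   outLinks[v] lists the vertices that vertex v links to
     * @param damping    damping factor d (0.85 is a commonly used value)
     * @param supersteps number of supersteps to run
     * @return rank of each vertex after the last superstep
     */
    static double[] run(int[][] outLinks, double damping, int supersteps) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);      // every vertex starts with a uniform rank
        double[] inbox = new double[n];  // sum of the messages delivered to each vertex

        for (int step = 0; step < supersteps; step++) {
            // Send phase: each vertex splits its rank evenly across its out-edges.
            Arrays.fill(inbox, 0.0);
            for (int v = 0; v < n; v++) {
                if (outLinks[v].length == 0) {
                    continue; // dangling vertex: its rank mass is dropped in this sketch
                }
                double share = rank[v] / outLinks[v].length;
                for (int target : outLinks[v]) {
                    inbox[target] += share;
                }
            }
            // Compute phase: each vertex recomputes its rank from the messages received.
            for (int v = 0; v < n; v++) {
                rank[v] = (1.0 - damping) / n + damping * inbox[v];
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny three-page web graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        int[][] graph = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(run(graph, 0.85, 30)));
    }
}
```

Each pass of the outer loop plays the role of one superstep: the send phase produces the messages that the compute phase then consumes. Page 2, which receives links from both other pages, ends up with the highest rank.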

Project Scope

  • Integrate Apache Giraph's PageRank implementation with Apache Nutch 2.x
  • Write a standard API on top of Apache Giraph that enables users and developers to create and use new ranking algorithms developed with Giraph
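The scope does not say what the standard API should look like. As one purely hypothetical design (the names RankingAlgorithm, PageRankAlgorithm, and RankingRunner are invented for illustration and exist in neither Nutch nor Giraph), the per-vertex update rule can be separated from the superstep driver, so that a new ranking algorithm or PageRank variation only has to supply a new rule. The driver below is a plain-Java stand-in for Giraph's runtime, not Giraph code.

```java
import java.util.Arrays;

/** Hypothetical driver: runs any RankingAlgorithm for a fixed number of supersteps. */
final class RankingRunner {
    static double[] run(RankingAlgorithm algorithm, int[][] outLinks, int supersteps) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, algorithm.initialRank(n));
        for (int step = 0; step < supersteps; step++) {
            double[] inbox = new double[n];
            for (int v = 0; v < n; v++) {
                // Dangling vertices have no out-links, so they send nothing.
                for (int target : outLinks[v]) {
                    inbox[target] += rank[v] / outLinks[v].length;
                }
            }
            for (int v = 0; v < n; v++) {
                rank[v] = algorithm.update(inbox[v], n);
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] graph = { {1, 2}, {2}, {0} };
        // A variation is just another update rule; here, a different damping factor.
        RankingAlgorithm standard = new PageRankAlgorithm(0.85);
        RankingAlgorithm variant = new PageRankAlgorithm(0.5);
        System.out.println(Arrays.toString(run(standard, graph, 30)));
        System.out.println(Arrays.toString(run(variant, graph, 30)));
    }
}

/** Hypothetical plug-in contract for a vertex-centric ranking algorithm. */
interface RankingAlgorithm {
    /** Rank every vertex holds before the first superstep. */
    double initialRank(int numVertices);

    /** New rank of a vertex, given the sum of the rank shares it received. */
    double update(double incomingSum, int numVertices);
}

/** PageRank-style update rule with a configurable damping factor. */
final class PageRankAlgorithm implements RankingAlgorithm {
    private final double damping;

    PageRankAlgorithm(double damping) {
        this.damping = damping;
    }

    @Override
    public double initialRank(int numVertices) {
        return 1.0 / numVertices;
    }

    @Override
    public double update(double incomingSum, int numVertices) {
        return (1.0 - damping) / numVertices + damping * incomingSum;
    }
}
```

In this sketch the variant differs only in its damping factor; the lower value pulls the ranks closer to uniform, while the driver stays unchanged.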

References

[1] https://wiki.apache.org/nutch/NewScoring
[2] https://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
[3] http://wiki.apache.org/nutch/NewScoringIndexingExample
