– AndyHedges - 02 Jul 2004
bin/nutch readdb
Calls the Java class net.nutch.db.WebDBReader.
command line options
bin/nutch readdb <db> [-pageurl url] | [-pagemd5 md5] | [-dumppageurl] | [-dumppagemd5] | [-toppages ] | [-linkurl url] | [-linkmd5 md5] | [-dumplinks] | [-stats]
-stats
Displays summary statistics for the database: the number of pages and the number of links it contains.
example
$ bin/nutch readdb data/db -stats
040702 100856 loading file:/C:/csp2/nutch/conf/nutch-default.xml
040702 100857 loading file:/C:/csp2/nutch/conf/nutch-site.xml
Stats for net.nutch.db.WebDBReader@2bb514
Number of pages: 1886
Number of links: 21201
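If you want to use these counts in a script, the -stats output can be scraped. The Java sketch below assumes the "Number of pages:" / "Number of links:" wording shown in the example; other Nutch versions may word it differently. The class name ReadDbStats is hypothetical and not part of Nutch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): extracts the page and link counts
// from `readdb -stats` output. Assumes the wording shown in the example.
final class ReadDbStats {
    private static final Pattern PAGES = Pattern.compile("Number of pages:\\s*(\\d+)");
    private static final Pattern LINKS = Pattern.compile("Number of links:\\s*(\\d+)");

    final long pages;
    final long links;

    ReadDbStats(long pages, long links) {
        this.pages = pages;
        this.links = links;
    }

    static ReadDbStats parse(String output) {
        return new ReadDbStats(find(PAGES, output), find(LINKS, output));
    }

    // Searches anywhere in the text, so the "loading file:..." log lines
    // that precede the counts are simply skipped.
    private static long find(Pattern p, String text) {
        Matcher m = p.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("pattern not found: " + p.pattern());
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        String sample = "Stats for net.nutch.db.WebDBReader@2bb514\n"
                + "Number of pages: 1886\n"
                + "Number of links: 21201\n";
        ReadDbStats s = parse(sample);
        System.out.println(s.pages + " pages, " + s.links + " links");
    }
}
```

To use it, capture the command's output as a string and pass it to ReadDbStats.parse.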
-pageurl url
Displays the record for a single URL in the database. N.B. the URL must be given in exactly the form stored in the database, including trailing slashes, query strings (and the order of their parameters), letter case and so on.
example
$ bin/nutch readdb data/db -pageurl http://www.example.com/
Version: 4
URL: http://www.example.com/
ID: 5ef4623d0b61f32c5677695a4bbb86d6
Next fetch: Sun Aug 01 09:40:47 BST 2004
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 42
Score: 1780542.2
NextScore: 1823969.9
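A page record is a list of "Label: value" fields. A minimal Java sketch for reading one into a map, assuming one field per line; the class PageRecord is hypothetical and not a Nutch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Nutch): turns one page record printed by
// `readdb -pageurl` or `-pagemd5` into an ordered label -> value map.
// Assumes one "Label: value" pair per line.
final class PageRecord {
    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            // Split on the first ": " so values such as URLs ("http://...")
            // and times ("09:40:47") keep their own colons.
            int sep = line.indexOf(": ");
            if (sep > 0) {
                fields.put(line.substring(0, sep).trim(), line.substring(sep + 2).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "Version: 4",
                "URL: http://www.example.com/",
                "ID: 5ef4623d0b61f32c5677695a4bbb86d6",
                "Next fetch: Sun Aug 01 09:40:47 BST 2004",
                "Retries since fetch: 0",
                "Retry interval: 30 days",
                "Num outlinks: 42",
                "Score: 1780542.2",
                "NextScore: 1823969.9");
        Map<String, String> rec = parse(sample);
        int outlinks = Integer.parseInt(rec.get("Num outlinks"));
        double score = Double.parseDouble(rec.get("Score"));
        System.out.println(rec.get("URL") + " outlinks=" + outlinks + " score=" + score);
    }
}
```

Keeping the values as strings leaves the conversion (integer, double, date) to the caller, since each field has its own format.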
-pagemd5 md5
Displays the record for the page with the given ID (the MD5 value printed in the ID field of a page record).
example
$ bin/nutch readdb data/db -pagemd5 5ef4623d0b61f32c5677695a4bbb86d6
Version: 4
URL: http://www.example.com/
ID: 5ef4623d0b61f32c5677695a4bbb86d6
Next fetch: Sun Aug 01 09:40:47 BST 2004
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 42
Score: 1780542.2
NextScore: 1823969.9
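The ID in the example is 32 lowercase hexadecimal digits, the usual text form of a 128-bit MD5 digest. A script that passes IDs to -pagemd5 might sanity-check them first. PageId below is a hypothetical helper, not part of Nutch, and it assumes IDs are always lowercase as in the example output.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): checks that a string has the
// shape of the ID readdb prints (32 lowercase hex digits) before it is
// passed to -pagemd5.
final class PageId {
    private static final Pattern MD5_HEX = Pattern.compile("[0-9a-f]{32}");

    static boolean looksLikeMd5(String id) {
        return id != null && MD5_HEX.matcher(id).matches();
    }

    public static void main(String[] args) {
        String id = args.length > 0 ? args[0] : "5ef4623d0b61f32c5677695a4bbb86d6";
        if (!looksLikeMd5(id)) {
            System.err.println("not a 32-digit hex MD5: " + id);
            System.exit(1);
        }
        // Print the command line a script would run next.
        System.out.println("bin/nutch readdb data/db -pagemd5 " + id);
    }
}
```

This only checks the shape of the argument; whether a page with that ID exists is still up to readdb.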