$ nutch generate crawldb segments5
...
2020-12-22 10:17:05,168 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-22 10:17:05,759 INFO crawl.Generator: Generator: starting at 2020-12-22 10:17:05
2020-12-22 10:17:05,759 INFO crawl.Generator: Generator: Selecting best-scoring urls due for fetch.
2020-12-22 10:17:05,759 INFO crawl.Generator: Generator: filtering: true
2020-12-22 10:17:05,759 INFO crawl.Generator: Generator: normalizing: true
2020-12-22 10:17:05,955 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-12-22 10:17:06,071 INFO client.AHSProxy: Connecting to Application History server at localhost/127.0.0.1:10200
2020-12-22 10:17:06,308 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/lmcgibbn/.staging/job_1608661005352_0001
2020-12-22 10:17:07,115 INFO input.FileInputFormat: Total input files to process : 1
2020-12-22 10:17:07,161 INFO mapreduce.JobSubmitter: number of splits:1
2020-12-22 10:17:07,387 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1608661005352_0001
2020-12-22 10:17:07,388 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-12-22 10:17:07,531 INFO client.YARNRunner: Number of stages: 2
2020-12-22 10:17:07,597 INFO conf.Configuration: resource-types.xml not found
2020-12-22 10:17:07,598 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-12-22 10:17:07,809 INFO counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
2020-12-22 10:17:07,809 INFO counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=120
2020-12-22 10:17:07,809 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.10.1-SNAPSHOT, revision=849e1d7694cdfd2432d631830940bc95c6f26ead, SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2020-12-17T01:41:13Z, buildUser=lmcgibbn, buildJavaVersion=1.8.0_221 ]
2020-12-22 10:17:07,825 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-12-22 10:17:07,825 INFO client.AHSProxy: Connecting to Application History server at localhost/127.0.0.1:10200
2020-12-22 10:17:07,826 INFO client.TezClient: Submitting DAG application with id: application_1608661005352_0001
2020-12-22 10:17:07,828 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://localhost:9000/apps/tez-0.10.1-SNAPSHOT/tez-0.10.1-SNAPSHOT.tar.gz#tez,hdfs://localhost:9000/apps/nutch/apache-nutch-1.18-SNAPSHOT-bin.tar.gz#nutch
2020-12-22 10:17:07,828 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: ./tez/tez-0.10.1-SNAPSHOT/*:./tez/tez-0.10.1-SNAPSHOT/lib/*:./nutch/apache-nutch-1.18-SNAPSHOT/*:./nutch/apache-nutch-1.18-SNAPSHOT/conf/*:./nutch/apache-nutch-1.18-SNAPSHOT/lib/*:./nutch/apache-nutch-1.18-SNAPSHOT/plugins/*/*
2020-12-22 10:17:07,842 INFO client.TezClient: Tez system stage directory hdfs://localhost:9000/tmp/hadoop-yarn/staging/lmcgibbn/.staging/job_1608661005352_0001/.tez/application_1608661005352_0001 doesn't exist and is created
2020-12-22 10:17:08,413 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1608661005352_0001, dagName=generate: select from crawldb
2020-12-22 10:17:08,787 INFO impl.YarnClientImpl: Submitted application application_1608661005352_0001
2020-12-22 10:17:08,790 INFO client.TezClient: The url to track the Tez AM: http://localhost:8088/proxy/application_1608661005352_0001/
^[[C2020-12-22 10:17:50,693 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2020-12-22 10:17:50,693 INFO client.AHSProxy: Connecting to Application History server at localhost/127.0.0.1:10200
2020-12-22 10:17:50,720 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1608661005352_0001/
2020-12-22 10:17:50,721 INFO mapreduce.Job: Running job: job_1608661005352_0001
2020-12-22 10:17:51,729 INFO mapreduce.Job: Job job_1608661005352_0001 running in uber mode : false
2020-12-22 10:17:51,731 INFO mapreduce.Job: map 0% reduce 0%
2020-12-22 10:17:56,764 INFO mapreduce.Job: map 100% reduce 0%
2020-12-22 10:17:56,766 INFO mapreduce.Job: map 100% reduce 100%
2020-12-22 10:17:56,768 INFO mapreduce.Job: Job job_1608661005352_0001 completed successfully
2020-12-22 10:17:56,775 INFO mapreduce.Job: Counters: 0
2020-12-22 10:17:56,776 INFO crawl.Generator: Generator: number of items rejected during selection:
2020-12-22 10:17:56,806 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ... |