...

Difference	Hadoop 1.X	Hadoop 2.X
Number of nodes	~4,000 nodes per cluster	~10,000 nodes per cluster
Running Time	O(#nodes in cluster)	O(cluster size)
Namespace configuration	Only one NameNode, managing a single namespace	Multiple namespaces for managing HDFS
Application support	Only able to run static MapReduce jobs	Able to run any Java application that can integrate with Hadoop
Efficiency	The JobTracker is a bottleneck because it handles both resource management and the scheduling of TaskTracker tasks	Uses YARN (Yet Another Resource Negotiator) for more effective cluster management

Table 1.1 – Key differences between Hadoop 1.X and 2.X [11]

Although this table does not highlight every difference between the two codebases, it is a good starting point for exploring what changes must be made to Apache Nutch’s tasks to port them to 2.X. In Apache Hadoop 2.X, the resource management capabilities have been moved into Apache Hadoop YARN, a general-purpose, distributed application management framework, while Apache Hadoop MapReduce (aka MRv2) remains a pure distributed computation framework.
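
As a rough illustration of this split, a Hadoop 2.X client usually tells MapReduce to run on top of YARN through the `mapreduce.framework.name` property in `mapred-site.xml`. The fragment below is only a minimal sketch and is not taken from the Nutch configuration. A real cluster needs further settings (for example, the ResourceManager address), which are omitted here.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: illustrative fragment, not a complete configuration -->
<configuration>
  <property>
    <!-- Selects the runtime for MapReduce jobs.
         "yarn" hands resource management to YARN;
         "local" runs jobs in a single local JVM. -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

With this setting, job submission goes to the YARN ResourceManager instead of a JobTracker, which matches the Efficiency row in Table 1.1.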

...
