Using Bigtop Hadoop distribution artifacts, you can have a Hadoop cluster up and running, complete with
various Hadoop ecosystem projects, in a matter of minutes. Whether you want a single-node pseudo-distributed
configuration or a fully distributed cluster, the steps are the same: install the packages, install the JDK,
format the namenode, and have fun!

Getting the packages onto your box:

CentOS 5, CentOS 6, Fedora 15, RHEL5, RHEL6

  1. Make sure to grab the repo file:
    sudo wget -O /etc/yum.repos.d/bigtop.repo[centos5|centos6|fedora]/bigtop.repo
  2. This step is optional, but recommended: enable the mirror that is closest to you (uncomment one and only one of the baseurl lines and remove the mirrorlist line):
    sudo vi /etc/yum.repos.d/bigtop.repo
  3. Browse through the artifacts
    yum search hadoop
  4. Install the full Hadoop stack (or parts of it)
    sudo yum install hadoop\* flume\* mahout\* oozie\* whirr\*
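Step 2 above (enabling exactly one mirror) can also be done without an editor. The following is a minimal sketch, not part of Bigtop itself. It assumes the repo file lists candidate mirrors as lines starting with `#baseurl=` and contains a `mirrorlist=` line; check your copy of the file before relying on that layout. The function name `enable_mirror` is hypothetical.

```shell
#!/bin/sh
# Sketch: enable exactly one mirror in a yum-style repo file.
# Assumption: candidate mirrors are lines starting with "#baseurl=" and the
# file contains a "mirrorlist=" line. Verify this against your bigtop.repo.

# enable_mirror FILE N
# Prints FILE with the Nth "#baseurl=" line uncommented and every
# "mirrorlist=" line removed. The file itself is not modified.
enable_mirror() {
    awk -v pick="$2" '
        /^mirrorlist=/ { next }
        /^#baseurl=/ {
            count++
            if (count == pick) { sub(/^#/, "") }
        }
        { print }
    ' "$1"
}
```

Because the function only prints the result, you can review it first, for example with `enable_mirror /etc/yum.repos.d/bigtop.repo 1 > /tmp/bigtop.repo`, and then copy the reviewed file back into place with `sudo cp`.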


  1. Make sure to grab the repo file:
    sudo wget -O /etc/zypp/repos.d/bigtop.repo
  2. Enable the mirror that is closest to you (uncomment one and only one of the baseurl lines)
    sudo vi /etc/zypp/repos.d/bigtop.repo
  3. Browse through the artifacts
    zypper search hadoop
  4. Install the full Hadoop stack (or parts of it)
    sudo zypper install hadoop\* flume\* mahout\* oozie\* whirr\*


  1. Install the Apache Bigtop GPG key
    wget -O- | sudo apt-key add -
  2. Make sure to grab the repo file:
    sudo wget -O /etc/apt/sources.list.d/bigtop.list
  3. Enable the mirror that is closest to you (uncomment one and only one pair of deb/deb-src lines)
    sudo vi /etc/apt/sources.list.d/bigtop.list
  4. Browse through the artifacts
    apt-cache search hadoop
  5. Install the full Hadoop stack (or parts of it)
    sudo apt-get install hadoop\* flume\* mahout\* oozie\* whirr\*
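The three procedures above differ mainly in the package manager and in where the repo file lives. As a rough sketch, those differences can be captured in a few helper functions. The function names are hypothetical, and the detection order is an arbitrary choice that may pick the wrong tool on a system with more than one package manager installed.

```shell
#!/bin/sh
# Sketch: map each package manager used on this page to its Bigtop repo file
# and its search command. Paths and commands are taken from the steps above.

# repo_path_for MANAGER: print where the Bigtop repo file goes.
repo_path_for() {
    case "$1" in
        yum)     echo /etc/yum.repos.d/bigtop.repo ;;
        zypper)  echo /etc/zypp/repos.d/bigtop.repo ;;
        apt-get) echo /etc/apt/sources.list.d/bigtop.list ;;
        *)       return 1 ;;   # unknown package manager
    esac
}

# search_command_for MANAGER: print the command that lists Hadoop artifacts.
search_command_for() {
    case "$1" in
        yum)     echo "yum search hadoop" ;;
        zypper)  echo "zypper search hadoop" ;;
        apt-get) echo "apt-cache search hadoop" ;;
        *)       return 1 ;;
    esac
}

# detect_manager: print the first supported package manager found on PATH.
detect_manager() {
    for m in yum zypper apt-get; do
        if command -v "$m" >/dev/null 2>&1; then
            echo "$m"
            return 0
        fi
    done
    return 1
}
```

For example, `repo_path_for "$(detect_manager)"` prints the path to which the repo file should be downloaded on the current machine.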

Running Hadoop

After installing the Hadoop packages onto your Linux box, complete the following steps:

  1. Make sure the latest JDK is installed on your system. You can either get it from the official Oracle website or follow the advice given by your Linux distribution (e.g. some Debian-based Linux distributions package the JDK as part of their extended set of packages).
  2. Format the namenode
    sudo -u hdfs hadoop namenode -format
  3. Start the necessary Hadoop services. For example, for a pseudo-distributed Hadoop installation you can simply run:
    for i in hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker ; do sudo service $i start ; done
  4. Once your basic cluster is up and running, it is a good idea to create a home directory on HDFS:
    sudo -u hdfs hadoop fs -mkdir /user/$USER
    sudo -u hdfs hadoop fs -chown $USER /user/$USER
  5. Enjoy your cluster
    hadoop fs -lsr /
    hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 10 1000
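The service startup and home directory steps above can be wrapped in a small script. The following is only a sketch: the `DRY_RUN` variable and the function names are hypothetical and are not part of Bigtop or Hadoop. `DRY_RUN` defaults to 1, so the script only prints the commands unless you opt in. Formatting the namenode is deliberately left as a manual step, since it reinitializes HDFS metadata.

```shell
#!/bin/sh
# Sketch: the pseudo-distributed startup steps with a dry-run switch.
# DRY_RUN is a hypothetical variable; with the default of 1 the commands
# are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# have_java: succeed if a java binary is on PATH (does not check the version).
have_java() {
    command -v java >/dev/null 2>&1
}

# start_services: start the four services from step 3, stopping at the
# first one that fails to start.
start_services() {
    for svc in hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker; do
        run sudo service "$svc" start || return 1
    done
}

# create_home_dir USER: create /user/USER on HDFS and hand it to USER (step 4).
create_home_dir() {
    run sudo -u hdfs hadoop fs -mkdir "/user/$1" || return 1
    run sudo -u hdfs hadoop fs -chown "$1" "/user/$1"
}
```

A typical invocation would be `have_java && start_services && create_home_dir "$USER"`. Run it once with the default settings to review the printed commands, then again with `DRY_RUN=0` to execute them.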

Where to go from here

It is highly recommended that you read the documentation provided by the Hadoop project itself and that you browse through the Puppet deployment code that is shipped as part of the Bigtop release.
