Introduction

Installing the Bigtop Hadoop distribution artifacts gets you a running Hadoop cluster, complete with
various Hadoop ecosystem projects, in just a few minutes. Whether you want a single-node pseudo-distributed
configuration or a fully distributed cluster, the steps are the same: install the packages, install the JDK,
format the namenode, and have fun!

Getting the packages onto your box:

CentOS 5, CentOS 6, Fedora 15, RHEL5, RHEL6

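  1. Make sure to grab the repo file (replace [centos5|centos6|fedora] in the URL with the one directory that matches your distribution):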
    sudo wget -O /etc/yum.repos.d/bigtop.repo http://www.apache.org/dist/incubator/bigtop/stable/repos/[centos5|centos6|fedora]/bigtop.repo
    
  2. This step is optional, but recommended: enable the mirror that is closest to you (uncomment one and only one of the baseurl lines and remove the mirrorlist line):
    sudo vi /etc/yum.repos.d/bigtop.repo
    
  3. Browse through the artifacts
    yum search hadoop
    
  4. Install the full Hadoop stack (or parts of it)
    sudo yum install hadoop\* flume\* mahout\* oozie\* whirr\*
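
    As a rough sketch of step 1, the repo directory could be picked automatically from the release string in /etc/redhat-release. The helper name bigtop_repo_dir is hypothetical and not part of Bigtop, and mapping RHEL 5/6 to the centos5/centos6 directories is an assumption based on how the distributions are grouped above.

    ```shell
    # Hypothetical helper (not shipped with Bigtop): map a release string,
    # such as the contents of /etc/redhat-release, to a repo directory name.
    # Assumption: RHEL 5/6 use the same directories as CentOS 5/6.
    bigtop_repo_dir() {
        case "$1" in
            Fedora*)         echo fedora ;;
            *"release 5."*)  echo centos5 ;;
            *"release 6."*)  echo centos6 ;;
            *)
                echo "unsupported release: $1" >&2
                return 1
                ;;
        esac
    }

    # Possible usage (left commented out so that sourcing this sketch has no side effects):
    #   dir=$(bigtop_repo_dir "$(cat /etc/redhat-release)") &&
    #   sudo wget -O /etc/yum.repos.d/bigtop.repo \
    #       "http://www.apache.org/dist/incubator/bigtop/stable/repos/$dir/bigtop.repo"
    ```

    Anything the helper does not recognize makes it return a non-zero status, so the download is skipped rather than pointed at a guessed URL.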
    

SLES 11, OpenSUSE

  1. Make sure to grab the repo file:
    sudo wget -O /etc/zypp/repos.d/bigtop.repo http://www.apache.org/dist/incubator/bigtop/stable/repos/suse/bigtop.repo
    
  2. Enable the mirror that is closest to you (uncomment one and only one of the baseurl lines)
    sudo vi /etc/zypp/repos.d/bigtop.repo
    
  3. Browse through the artifacts
    zypper search hadoop
    
  4. Install the full Hadoop stack (or parts of it)
    sudo zypper install hadoop\* flume\* mahout\* oozie\* whirr\*
    

Ubuntu

  1. Install the Apache Bigtop GPG key
    wget -O- http://www.apache.org/dist/incubator/bigtop/stable/repos/GPG-KEY-bigtop | sudo apt-key add -
    
  2. Make sure to grab the repo file:
    sudo wget -O /etc/apt/sources.list.d/bigtop.list http://www.apache.org/dist/incubator/bigtop/stable/repos/ubuntu/bigtop.list
    
  3. Enable the mirror that is closest to you (uncomment one and only one pair of deb/deb-src lines)
    sudo vi /etc/apt/sources.list.d/bigtop.list
    
  4. Update the apt cache
    sudo apt-get update
    
  5. Browse through the artifacts
    apt-cache search hadoop
    
  6. Install the full Hadoop stack (or parts of it)
    sudo apt-get install hadoop\* flume\* mahout\* oozie\* whirr\*
    

Running Hadoop

After installing the Hadoop packages on your Linux box, complete the following steps:

  1. Make sure you have the latest JDK installed on your system as well. You can either get it from the official Oracle website (http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u29-download-513648.html) or follow the advice given by your Linux distribution (e.g. some Debian-based Linux distributions have the JDK packaged as part of their extended set of packages).
  2. Format the namenode
    sudo -u hdfs hadoop namenode -format
    
  3. Start the necessary Hadoop services. For example, for a pseudo-distributed Hadoop installation you can simply run:
    for i in hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker ; do sudo service $i start ; done
    
  4. Once your basic cluster is up and running, it is a good idea to create a home directory for yourself on HDFS:
    sudo -u hdfs hadoop fs -mkdir /user/$USER
    sudo -u hdfs hadoop fs -chown $USER /user/$USER
    
  5. Enjoy your cluster
    hadoop fs -lsr /
    hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 10 1000
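
    The one-line loop in step 3 keeps going even if a service fails to start. A minimal sketch of a fail-fast variant follows; the function name start_hadoop_services is hypothetical, and it assumes the same four service names used above.

    ```shell
    # Hypothetical wrapper around the loop from step 3: start each service in
    # order and stop at the first one that fails, reporting which one it was.
    start_hadoop_services() {
        for svc in hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker; do
            if ! sudo service "$svc" start; then
                echo "failed to start $svc" >&2
                return 1
            fi
        done
    }

    # Possible usage (commented out so that sourcing this sketch starts nothing):
    #   start_hadoop_services && hadoop fs -lsr /
    ```

    Stopping early makes it easier to see which service needs attention before moving on to steps 4 and 5.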
    

Where to go from here

It is highly recommended that you read the documentation provided by the Hadoop project itself (http://hadoop.apache.org/common/docs/r0.20.205.0/) and that you browse through the Puppet deployment code that is shipped as part of the Bigtop release.
