Introduction
Installing the Bigtop Hadoop distribution artifacts gets you an up-and-running Hadoop cluster, complete with
various Hadoop ecosystem projects, in just a few minutes. Be it a single-node pseudo-distributed
configuration or a fully distributed cluster, just make sure you install the packages, install the JDK,
format the namenode and have fun!
Getting the packages onto your box:
CentOS 5, CentOS 6, Fedora 15, RHEL5, RHEL6
- Make sure to grab the repo file (replace [centos5|centos6|fedora] in the URL with the one that matches your distribution):
sudo wget -O /etc/yum.repos.d/bigtop.repo http://www.apache.org/dist/incubator/bigtop/stable/repos/[centos5|centos6|fedora]/bigtop.repo
- This step is optional, but recommended: enable the mirror that is closest to you (uncomment one and only one of the baseurl lines and remove the mirrorlist line):
sudo vi /etc/yum.repos.d/bigtop.repo
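If you are unsure what you are editing, a yum repo file follows the standard INI-style layout sketched below. The section name and baseurl shown here are illustrative, not the actual Bigtop values; trust the contents of the downloaded file and only comment/uncomment the baseurl and mirrorlist lines as described above:

```ini
[bigtop]
name=Apache Bigtop
# Use exactly one baseurl line (a single mirror)...
#baseurl=http://example-mirror.example.org/bigtop/stable/repos/centos6/
# ...OR a mirrorlist line, but never both at once:
mirrorlist=http://example.example.org/bigtop/mirrorlist
gpgcheck=1
gpgkey=http://www.apache.org/dist/incubator/bigtop/stable/repos/GPG-KEY-bigtop
```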
- Browse through the artifacts
yum search hadoop
- Install the full Hadoop stack (or parts of it)
sudo yum install hadoop\* flume\* mahout\* oozie\* whirr\*
SLES 11, OpenSUSE
- Make sure to grab the repo file:
sudo wget -O /etc/zypp/repos.d/bigtop.repo http://www.apache.org/dist/incubator/bigtop/stable/repos/suse/bigtop.repo
- Enable the mirror that is closest to you (uncomment one and only one of the baseurl lines)
sudo vi /etc/zypp/repos.d/bigtop.repo
- Browse through the artifacts
zypper search hadoop
- Install the full Hadoop stack (or parts of it)
sudo zypper install hadoop\* flume\* mahout\* oozie\* whirr\*
Ubuntu
- Install the Apache Bigtop GPG key
wget -O- http://www.apache.org/dist/incubator/bigtop/stable/repos/GPG-KEY-bigtop | sudo apt-key add -
- Make sure to grab the repo file:
sudo wget -O /etc/apt/sources.list.d/bigtop.list http://www.apache.org/dist/incubator/bigtop/stable/repos/ubuntu/bigtop.list
- Enable the mirror that is closest to you (uncomment one and only one pair of deb/deb-src lines)
sudo vi /etc/apt/sources.list.d/bigtop.list
- Update the apt cache
sudo apt-get update
- Browse through the artifacts
apt-cache search hadoop
- Install the full Hadoop stack (or parts of it)
sudo apt-get install hadoop\* flume\* mahout\* oozie\* whirr\*
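Whichever package manager you used, it is worth sanity-checking the installation before moving on. As a sketch, on a Debian/Ubuntu system you could do the following (exact package names may vary slightly between Bigtop releases):

```shell
# List the Hadoop-related packages that were actually installed
dpkg -l 'hadoop*' | grep '^ii'

# Confirm the hadoop launcher is on the PATH and report its version
hadoop version
```

On the RPM-based distributions, `rpm -qa | grep hadoop` serves the same purpose as the dpkg query.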
Running Hadoop
After installing Hadoop packages onto your Linux box, make sure that:
- You have the latest JDK installed on your system as well. You can either get it from the official Oracle website (http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u29-download-513648.html) or follow the advice given by your Linux distribution (e.g. some Debian-based distributions ship a JDK as part of their extended package set).
- Format the namenode
sudo -u hdfs hadoop namenode -format
- Start the necessary Hadoop services. E.g. for a pseudo-distributed Hadoop installation you can simply do:
for i in hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker ; do sudo service $i start ; done
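Once the services have been started, it is a good idea to verify that all four daemons actually came up. As a sketch, `jps` (shipped with the JDK) lists the running JVMs, and the same init scripts used above can report status (the exact status output varies by distribution):

```shell
# Each Hadoop daemon runs in its own JVM; jps should show
# NameNode, DataNode, JobTracker and TaskTracker among the output
sudo jps

# Alternatively, ask each init script for its status
for i in hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker ; do
  sudo service $i status
done
```

If a daemon is missing, its log under /var/log/hadoop is usually the first place to look.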
- Once your basic cluster is up and running it is a good idea to create a home directory on the HDFS:
sudo -u hdfs hadoop fs -mkdir /user/$USER
sudo -u hdfs hadoop fs -chown $USER /user/$USER
- Enjoy your cluster
hadoop fs -lsr /
hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 10 1000
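Beyond the pi estimator, another classic smoke test from the Hadoop quickstart is the grep example: copy the Hadoop configuration files into HDFS, run a distributed grep over them, and read the result back. The input and output paths below are arbitrary names created in your HDFS home directory:

```shell
# Stage some input files in HDFS
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop/conf/*.xml input

# Run a distributed grep for property names starting with "dfs"
hadoop jar /usr/lib/hadoop/hadoop-examples.jar grep input output 'dfs[a-z.]+'

# Inspect the result
hadoop fs -cat 'output/part-*'
```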
Where to go from here
It is highly recommended that you read the documentation provided by the Hadoop project itself (http://hadoop.apache.org/common/docs/r0.20.205.0/) and that you browse through the Puppet deployment code shipped as part of the Bigtop release.