...

  1. You also need the latest JDK installed on your system. You can either get it from the official Oracle website (http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u29-download-513648.html) or follow the advice given by your Linux distribution (e.g. some Debian-based distributions package the JDK in their extended package set). If your JDK is installed in a non-standard location, make sure to add the line below to the /etc/default/hadoop file:
    No Format
    export JAVA_HOME=XXXX
    
  2. Format the namenode:
    No Format
    sudo -u hdfs hadoop namenode -format
    
  3. Start the necessary Hadoop services. For example, for a pseudo-distributed Hadoop installation you can simply do (a fuller sanity check is sketched after this list):
    No Format
    for i in hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker ; do sudo service $i start ; done
    
  4. Once your basic cluster is up and running, it is a good idea to create a home directory for your user on HDFS:
    No Format
    sudo -u hdfs hadoop fs -mkdir /user/$USER
    sudo -u hdfs hadoop fs -chown $USER /user/$USER
    
  5. Enjoy your cluster:
    No Format
    hadoop fs -lsr /
    hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 10 1000
    
  6. If you are using Amazon AWS, it is important that the address in /etc/hostname matches the Private IP Address shown in the AWS Management Console. If the addresses do not match, MapReduce jobs will not complete (a quick way to compare them is sketched below).
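A quick way to run that comparison on an EC2 instance. The 169.254.169.254 metadata endpoint is provided by AWS, not by Bigtop, and the ip-10-x-x-x style hostname encodes the private IP:

No Format
cat /etc/hostname                                            # locally configured hostname
curl -s http://169.254.169.254/latest/meta-data/local-ipv4   # private IP as AWS sees it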

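Before moving on to individual components, it is worth confirming that the daemons from step 3 are up and HDFS is answering. A minimal sketch, not an exhaustive health check:

No Format
sudo jps                     # should list NameNode, DataNode, JobTracker and TaskTracker
hadoop dfsadmin -report      # should report one live datanode
hadoop fs -ls /user/$USER    # the home directory created in step 4
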
Running Hadoop Components

...

  1. Install HBase:
    No Format
    sudo apt-get install hbase\*
    
  2. For bigtop-0.2.0, uncomment and set JAVA_HOME in /etc/hbase/conf/hbase-env.sh. For bigtop-0.3.0 this shouldn't be necessary, because JAVA_HOME is auto-detected.
  3. Start the HBase master and open the HBase shell:
    No Format
    sudo service hbase-master start
    hbase shell
    
  4. Test the HBase shell by creating an HBase table named t2 with three column families f1, f2 and f3, then verify the table exists (a further smoke test is sketched after this list):
    No Format
    hbase(main):001:0> create 't2','f1','f2','f3'
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/lib/hbase/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    0 row(s) in 3.4390 seconds
    
    hbase(main):002:0> list
    TABLE
    t2
    2 row(s) in 0.0220 seconds
    
    hbase(main):003:0>
    
    You should see confirmation from HBase that the table exists: the table name t2 should appear in the output of list.
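
As a further smoke test, a put/get round trip on the new table confirms reads and writes work end to end. This is a minimal sketch run from within the hbase shell; the row key row1, the column f1:c1 and the value are illustrative, not part of the original instructions:

No Format
put 't2', 'row1', 'f1:c1', 'value1'   # write one cell into column family f1
get 't2', 'row1'                      # should print f1:c1 with value=value1
scan 't2'                             # full-table scan; should show the same row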

...

  1. This is for bigtop-0.2.0, where hadoop-hive, hadoop-hive-server, and hadoop-hive-metastore are installed automatically because the Hive service names start with the word hadoop. For bigtop-0.3.0, sudo apt-get install hadoop* won't pull in the Hive components, so you have to install them explicitly:
    No Format
    sudo apt-get install hive hive-server hive-metastore
    
    Create the HDFS directories Hive needs.
    The Hive post-install scripts should create the /tmp and /user/hive/warehouse directories; if they don't exist, create them in HDFS yourself. The post-install script can't do this because HDFS isn't running during the deb installation: JAVA_HOME is buried in hadoop-env.sh, so HDFS can't be started to allow the directories to be created.
    No Format
    hadoop fs -mkdir /tmp
    hadoop fs -mkdir /user/hive/warehouse
    hadoop fs -chmod g+w /tmp
    hadoop fs -chmod g+w /user/hive/warehouse
    
  2. If the post-install scripts didn't create the directories /var/run/hive and /var/lock/subsys, create them:
    No Format
    sudo mkdir /var/run/hive
    sudo mkdir /var/lock/subsys
    
  3. Start the Hive server:
    No Format
    sudo /etc/init.d/hadoop-hive-server start
    
  4. Create a table in Hive and verify it exists (a further check against the HDFS warehouse is sketched after this list):
    No Format
    ubuntu@ip-10-101-53-136:~$ hive 
    WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
    Hive history file=/tmp/ubuntu/hive_job_log_ubuntu_201203202331_281981807.txt
    hive> create table doh(id int);
    OK
    Time taken: 12.458 seconds
    hive> show tables;
    OK
    doh
    Time taken: 0.283 seconds
    hive> 
    
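Because doh is a managed table, Hive stores it under the warehouse directory created earlier, so its existence can also be cross-checked from HDFS. A minimal sketch (the drop is optional cleanup):

No Format
hadoop fs -ls /user/hive/warehouse   # should show a 'doh' subdirectory for the new table
hive -e 'show tables'                # the same check, run non-interactively
hive -e 'drop table doh'             # optional: remove the test table
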

Running Mahout

Running Whirr

...