...

  1. Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in .bashrc to the values from your AWS account. Before proceeding, verify that the value is set using echo $AWS_ACCESS_KEY_ID
  2. Run the Whirr recipe as shown below.

    ~/whirr-0.7.1:bin/whirr launch-cluster  --config recipes/hadoop-ec2.properties

  3. If you get an error message like the one below, apply the Whirr patch linked after it:

    Unable to start the cluster. Terminating all nodes.
    org.apache.whirr.net.DnsException: java.net.ConnectException: Connection refused
    at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:83)
    at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:40)
    at org.apache.whirr.Cluster$Instance.getPublicHostName(Cluster.java:112)
    at org.apache.whirr.Cluster$Instance.getPublicAddress(Cluster.java:94)
    at org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler.doBeforeConfigure(HadoopNameNodeClusterActionHandler.java:58)
    at org.apache.whirr.service.hadoop.HadoopClusterActionHandler.beforeConfigure(HadoopClusterActionHandler.java:87)
    at org.apache.whirr.service.ClusterActionHandlerSupport.beforeAction(ClusterActionHandlerSupport.java:53)
    at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:100)
    at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:109)
    at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:63)
    at org.apache.whirr.cli.Main.run(Main.java:64)
    at org.apache.whirr.cli.Main.main(Main.java:97)

    Apply Whirr patch 459: https://issues.apache.org/jira/browse/WHIRR-459
  4. When Whirr has finished launching the cluster, an entry appears under ~/.whirr. Use it to verify that the cluster is running.
  5. Use cat on hadoop-proxy.sh to find the EC2 instance address, or use cat on the instance file. Both give you the Hadoop namenode address, even though you started the mahout service using Whirr.
  6. ssh into the instance to verify that you can log in. Note: this login differs from a normal EC2 instance login. The ssh key is id_rsa and no user name is given with the instance address:
    ~/.whirr/mahout:ssh -i ~/.ssh/id_rsa ec2-50-16-85-59.compute-1.amazonaws.com
    Then verify that you can access the HDFS file system from the instance:
    dc@ip-10-70-18-203:~$ hadoop fs -ls /
    Found 3 items
    drwxr-xr-x   - hadoop supergroup          0 2012-03-30 23:44 /hadoop
    drwxrwxrwx   - hadoop supergroup          0 2012-03-30 23:44 /tmp
    drwxrwxrwx   - hadoop supergroup          0 2012-03-30 23:44 /user
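
The credential check in step 1 can be scripted as a guard that runs before the launch command. This is only a minimal sketch: check_aws_env is a hypothetical helper name, not something Whirr provides, and it only confirms that both variables are non-empty, not that AWS accepts them.

```shell
# Hypothetical helper (not part of Whirr): succeed only if both AWS
# credential variables are set and non-empty in the current environment.
check_aws_env() {
    for _name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # Indirect lookup of the variable whose name is in $_name.
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            echo "missing: $_name" >&2
            return 1
        fi
    done
    return 0
}
```

One possible use, from the Whirr directory, is check_aws_env && bin/whirr launch-cluster --config recipes/hadoop-ec2.properties so that the launch is skipped when a variable is missing.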
    

...

  1. Stop the Oozie daemons: use ps -ef | grep oozie to find them, then sudo kill pid (the pid from the ps -ef output)
  2. Stopping the Oozie daemons may not remove the oozie.pid file, which tells the system that an Oozie process is running. You may have to remove the pid file manually using sudo rm -rf /var/run/oozie/oozie.pid
  3. cd into /usr/lib/oozie and set up the Oozie environment variables using bin/oozie-env.sh
  4. Download ext-2.2.js from http://incubator.apache.org/oozie/QuickStart.html
  5. Install ext-2.2.js using
    bin/oozie-setup.sh -hadoop 1.0.1 ${HADOOP_HOME} -extjs ext-2.2.zip 
    
  6. You will get an error message. Change the version in the command above to the highest Hadoop version available:
    sudo bin/oozie-setup.sh -hadoop 0.20.200 ${HADOOP_HOME} -extjs ext-2.2.zip 
    
  7. Start Oozie: sudo bin/oozie-start.sh
  8. Run Oozie: sudo bin/oozie-run.sh. You will get a lot of error messages; this is OK.
  9. Go to port 11000 on the instance's public EC2 DNS address, under the path /oozie. My address looked like: http://ec2-67-202-18-159.compute-1.amazonaws.com:11000/oozie/
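
The pid file cleanup in step 2 can be made a little safer by removing the file only when the process it names is really gone. The following is a rough sketch under assumptions: remove_stale_pid_file is a hypothetical helper name, not part of Oozie, and it assumes the pid file holds a single process id. It uses kill -0, which sends no signal and only tests whether the process can be signalled, so it should be run with the same privileges as the other steps (for example from a root shell); otherwise a daemon owned by another user would look dead.

```shell
# Hypothetical helper (not part of Oozie): delete a pid file only when
# the process recorded in it is no longer running.
remove_stale_pid_file() {
    _pid_file=$1
    # Nothing to clean up if the file does not exist.
    [ -f "$_pid_file" ] || return 0
    _pid=$(cat "$_pid_file")
    # kill -0 delivers no signal; it succeeds only if the pid exists
    # and the caller is allowed to signal it.
    if [ -n "$_pid" ] && kill -0 "$_pid" 2>/dev/null; then
        echo "process $_pid still running; keeping $_pid_file" >&2
        return 1
    fi
    rm -f "$_pid_file"
}
```

With the path from step 2, the call would be remove_stale_pid_file /var/run/oozie/oozie.pid. A non-zero return means a daemon is still alive and should be stopped first, as in step 1.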

Running Zookeeper

Zookeeper is installed as part of HBase. Do we need to add anything here?

...