...
- Set bash environment variables HADOOP_HOME=/usr/lib/hadoop, HADOOP_CONF_DIR=$HADOOP_HOME/conf
- Go to /usr/share/doc/mahout/examples/bin and unzip cluster-reuters.sh.gz
Code Block export HADOOP_HOME=/usr/lib/hadoop export HADOOP_CONF_DIR=$HADOOP_HOME/conf
- modify the contents of cluster-reuters.sh, replace MAHOUT="../../bin/mahout" with MAHOUT="/usr/lib/mahout/bin/mahout"
- make sure the Hadoop file system is running
- ./cluster-reuters.sh will display a menu selection
Panel ubuntu@ip-10-224-109-199:/usr/share/doc/mahout/examples/bin$ ./cluster-reuters.sh
Panel Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. lda clustering
4. dirichlet clustering
5. minhash clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
creating work directory at /tmp/mahout-work-ubuntu
Downloading Reuters-21578
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7959k 100 7959k 0 0 346k 0 0:00:22 0:00:22 -::- 356k
Extracting...
AFTER WAITING 1/2 HR...
Inter-Cluster Density: 0.8080922658756075
Intra-Cluster Density: 0.6978329770855537
CDbw Inter-Cluster Density: 0.0
CDbw Intra-Cluster Density: 89.38857003754612
CDbw Separation: 303.4892272989769
12/03/29 03:42:56 INFO clustering.ClusterDumper: Wrote 19 clusters
12/03/29 03:42:56 INFO driver.MahoutDriver: Program took 261107 ms (Minutes: 4.351783333333334)
Running Whirr
- Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in .bashrc according to the values under your AWS account. Verify using echo $AWS_ACCESS_KEY_ID this is valid before proceeding.
- run the zookeeper recipe as below.
Panel ~/whirr-0.7.1:bin/whirr launch-cluster --config recipes/hadoop-ec2.properties
- if you get an error message like:
apply Whirr patch 459: https://issues.apache.org/jira/browse/WHIRR-459Panel Unable to start the cluster. Terminating all nodes.
org.apache.whirr.net.DnsException: java.net.ConnectException: Connection refused
at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:83)
at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:40)
at org.apache.whirr.Cluster$Instance.getPublicHostName(Cluster.java:112)
at org.apache.whirr.Cluster$Instance.getPublicAddress(Cluster.java:94)
at org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler.doBeforeConfigure(HadoopNameNodeClusterActionHandler.java:58)
at org.apache.whirr.service.hadoop.HadoopClusterActionHandler.beforeConfigure(HadoopClusterActionHandler.java:87)
at org.apache.whirr.service.ClusterActionHandlerSupport.beforeAction(ClusterActionHandlerSupport.java:53)
at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:100)
at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:109)
at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:63)
at org.apache.whirr.cli.Main.run(Main.java:64)
at org.apache.whirr.cli.Main.main(Main.java:97)
...