...
- Set the bash environment variables HADOOP_HOME and HADOOP_CONF_DIR:
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
- Go to /usr/share/doc/mahout/examples/bin and decompress cluster-reuters.sh.gz (e.g. with gunzip)
- Edit cluster-reuters.sh and replace MAHOUT="../../bin/mahout" with MAHOUT="/usr/lib/mahout/bin/mahout"
- Make sure the Hadoop file system (HDFS) is running
- Run ./cluster-reuters.sh; it displays a menu of clustering algorithms:
ubuntu@ip-10-224-109-199:/usr/share/doc/mahout/examples/bin$ ./cluster-reuters.sh
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. lda clustering
4. dirichlet clustering
5. minhash clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
creating work directory at /tmp/mahout-work-ubuntu
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7959k  100 7959k    0     0   346k      0  0:00:22  0:00:22 --:--:--  356k
Extracting...
... (output omitted; the run continues for about half an hour) ...
Inter-Cluster Density: 0.8080922658756075
Intra-Cluster Density: 0.6978329770855537
CDbw Inter-Cluster Density: 0.0
CDbw Intra-Cluster Density: 89.38857003754612
CDbw Separation: 303.4892272989769
12/03/29 03:42:56 INFO clustering.ClusterDumper: Wrote 19 clusters
12/03/29 03:42:56 INFO driver.MahoutDriver: Program took 261107 ms (Minutes: 4.351783333333334)
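The setup steps above can be collected into a small shell sketch. It assumes GNU sed (for in-place editing with `sed -i`) and the Bigtop package paths used on this page; the function names patch_mahout_path, prepare_example and hdfs_running are hypothetical helpers, not part of Mahout or Bigtop. Piping `1` into the script assumes the menu reads its choice from standard input and is the only prompt, as the transcript suggests.

```shell
#!/usr/bin/env bash
# Sketch of the setup steps above. Function names are hypothetical;
# paths are the Bigtop package locations used on this page.

# Environment expected by the Mahout launcher.
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/conf

# Replace the source-tree launcher path with the packaged one (GNU sed -i).
patch_mahout_path() {
  sed -i 's|MAHOUT="\.\./\.\./bin/mahout"|MAHOUT="/usr/lib/mahout/bin/mahout"|' "$1"
}

# Decompress cluster-reuters.sh.gz in the given directory (unless that was
# already done), patch the launcher path and make the script executable.
prepare_example() {
  local dir="$1"
  if [ ! -f "$dir/cluster-reuters.sh" ]; then
    gunzip "$dir/cluster-reuters.sh.gz" || return 1
  fi
  patch_mahout_path "$dir/cluster-reuters.sh"
  chmod +x "$dir/cluster-reuters.sh"
}

# Succeeds if HDFS answers a listing of its root directory.
hdfs_running() {
  hadoop fs -ls / >/dev/null 2>&1
}

# Example (may need root, since /usr/share/doc is usually not user-writable):
#   prepare_example /usr/share/doc/mahout/examples/bin
#   hdfs_running && cd /usr/share/doc/mahout/examples/bin && echo 1 | ./cluster-reuters.sh
```

Checking for cluster-reuters.sh before calling gunzip lets the preparation be rerun safely, and the sed substitution becomes a no-op once the path has been replaced.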
Running Whirr
Where to go from here
...