...

  1. Set the bash environment variables HADOOP_HOME=/usr/lib/hadoop and HADOOP_CONF_DIR=$HADOOP_HOME/conf:
    export HADOOP_HOME=/usr/lib/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
  2. Go to /usr/share/doc/mahout/examples/bin and decompress cluster-reuters.sh.gz (it is a gzip file, so use gunzip rather than unzip).
  3. Edit cluster-reuters.sh and replace MAHOUT="../../bin/mahout" with MAHOUT="/usr/lib/mahout/bin/mahout".
  4. Make sure the Hadoop file system is running.
  5. Run ./cluster-reuters.sh, which displays a menu of clustering algorithms:
    ubuntu@ip-10-224-109-199:/usr/share/doc/mahout/examples/bin$ ./cluster-reuters.sh
    Please select a number to choose the corresponding clustering algorithm
    1. kmeans clustering
    2. fuzzykmeans clustering
    3. lda clustering
    4. dirichlet clustering
    5. minhash clustering
    Enter your choice : 1
    ok. You chose 1 and we'll use kmeans Clustering
    creating work directory at /tmp/mahout-work-ubuntu
    Downloading Reuters-21578
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 7959k  100 7959k    0     0   346k      0  0:00:22  0:00:22 --:--:--  356k
    Extracting...
    [... output omitted; about half an hour later ...]
    Inter-Cluster Density: 0.8080922658756075
    Intra-Cluster Density: 0.6978329770855537
    CDbw Inter-Cluster Density: 0.0
    CDbw Intra-Cluster Density: 89.38857003754612
    CDbw Separation: 303.4892272989769
    12/03/29 03:42:56 INFO clustering.ClusterDumper: Wrote 19 clusters
    12/03/29 03:42:56 INFO driver.MahoutDriver: Program took 261107 ms (Minutes: 4.351783333333334)
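
Steps 1 to 3 above can be scripted. The following is a minimal sketch, not part of Mahout or Bigtop: prepare_reuters_script is a hypothetical helper name, and it assumes the MAHOUT assignment sits at the start of a line in cluster-reuters.sh exactly as quoted in step 3.

```shell
#!/bin/sh
# Sketch of steps 1-3. prepare_reuters_script is a hypothetical helper,
# not something shipped with Mahout or Bigtop.

# Step 1: environment variables from the instructions above.
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/conf

# Steps 2-3: decompress cluster-reuters.sh.gz in directory $1 (if it is
# still compressed) and point MAHOUT at the launcher path given in $2.
prepare_reuters_script() {
    dir="$1"
    mahout_bin="$2"
    script="$dir/cluster-reuters.sh"

    # Step 2: gunzip replaces the .gz file with the decompressed script.
    if [ -f "$script.gz" ]; then
        gunzip "$script.gz" || return 1
    fi

    # Step 3: rewrite the relative launcher path. The result is copied
    # back with cat so the script keeps its original file permissions.
    sed "s|^MAHOUT=\"../../bin/mahout\"|MAHOUT=\"$mahout_bin\"|" \
        "$script" > "$script.tmp" || return 1
    cat "$script.tmp" > "$script" && rm -f "$script.tmp"
}
```

With the paths from the steps above, the call would be `prepare_reuters_script /usr/share/doc/mahout/examples/bin /usr/lib/mahout/bin/mahout`. Writing under /usr/share/doc may require root privileges. For step 4, one way to confirm that HDFS responds is `hadoop fs -ls /` before starting ./cluster-reuters.sh.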

Running Whirr

Where to go from here

...