Multi-Node Ozone Cluster
Pre-requisites
Ensure you have password-less SSH set up between your hosts.
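One way to confirm this prerequisite is to attempt a non-interactive login to each host. The following is a minimal sketch, not part of Ozone: the function name check_passwordless_ssh and the host names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether each given host accepts a non-interactive ssh login.
# check_passwordless_ssh is a hypothetical helper name, not part of Ozone.
check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # -n keeps ssh from reading stdin; "true" is the remote command to run.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "$host: ok"
    else
      echo "$host: password-less ssh failed"
      failed=1
    fi
  done
  # Non-zero exit status if at least one host could not be reached.
  return "$failed"
}
```

For example, `check_passwordless_ssh n001.example.com n002.example.com` prints one status line per host and returns a non-zero status if any login failed.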
Configuration
ozone-site.xml
Save the following snippet to etc/hadoop/ozone-site.xml
in the compiled Ozone distribution.
<configuration>
  <property><name>ozone.scm.block.client.address</name><value>SCM-HOSTNAME</value></property>
  <property><name>ozone.scm.names</name><value>SCM-HOSTNAME</value></property>
  <property><name>ozone.scm.client.address</name><value>SCM-HOSTNAME</value></property>
  <property><name>ozone.om.address</name><value>OM-HOSTNAME</value></property>
  <property><name>ozone.handler.type</name><value>distributed</value></property>
  <property><name>ozone.scm.datanode.id.dir</name><value>/tmp/ozone/data/</value></property>
  <property><name>ozone.replication</name><value>3</value></property>
  <property><name>ozone.metadata.dirs</name><value>/tmp/ozone/data/metadata</value></property>
</configuration>
Replace SCM-HOSTNAME and OM-HOSTNAME with the names of the machines where you want to start the SCM and OM services, respectively. It is okay to start both services on the same host. If you are unsure, just use any machine from your cluster.
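The placeholder substitution can also be scripted. The following is a minimal sketch, assuming the snippet above has been saved to a template file with the SCM-HOSTNAME and OM-HOSTNAME placeholders intact; the function name fill_ozone_site and the template file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read an ozone-site.xml template on stdin and print it with the
# SCM-HOSTNAME and OM-HOSTNAME placeholders replaced.
# fill_ozone_site is a hypothetical helper name, not part of Ozone.
fill_ozone_site() {
  local scm_host="$1" om_host="$2"
  sed -e "s/SCM-HOSTNAME/${scm_host}/g" \
      -e "s/OM-HOSTNAME/${om_host}/g"
}
```

A possible invocation, using the same host for both services: `fill_ozone_site n001.example.com n001.example.com < ozone-site.xml.template > etc/hadoop/ozone-site.xml`.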
hadoop-env.sh
The only mandatory setting in hadoop-env.sh is JAVA_HOME. E.g.
# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/usr/java/latest
workers
The workers file should contain a list of hostnames in your cluster where the DataNode service will be started. E.g.
n001.example.com
n002.example.com
n003.example.com
n004.example.com
Start Services
Initialize the SCM
Run the following commands on the SCM host
bin/ozone scm --init
bin/ozone --daemon start scm
Format the OM
Run the following commands on the OM host
bin/ozone om --init
bin/ozone --daemon start om
Start DataNodes
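Run the following command on each DataNode host (every host listed in the workers file). It starts the DataNode daemon only on the host where it is run.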
su hdfs -c 'bin/ozone --config /etc/ozone/conf --daemon start datanode'
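Because the command above acts on the local host, starting DataNodes across the cluster means repeating it on every host in the workers file. The following is a minimal sketch of doing so over password-less SSH. It rests on several assumptions: the function name start_all_datanodes is hypothetical, the Ozone distribution is unpacked in the same directory on every host, and the SSH login user is the account that should run the DataNode (hdfs in the example above), so the su wrapper is left out.

```shell
#!/usr/bin/env bash
# Sketch: start a DataNode on every host named in a workers file.
# start_all_datanodes is a hypothetical helper name, not part of Ozone.
#   $1 = path to the workers file
#   $2 = directory of the Ozone distribution (assumed identical on all hosts)
start_all_datanodes() {
  local workers_file="$1" ozone_dir="$2" host
  # Drop comments and blank lines, then visit each remaining host name.
  for host in $(sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$workers_file"); do
    echo "starting datanode on $host"
    # -n keeps ssh from reading stdin.
    ssh -n "$host" "cd '$ozone_dir' && bin/ozone --config /etc/ozone/conf --daemon start datanode"
  done
}
```

For example, `start_all_datanodes etc/hadoop/workers /opt/ozone`, where /opt/ozone stands in for the actual installation directory.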
Hadoop Integration
Shut Down the Hadoop Cluster
Edit hadoop-env.sh in $HADOOP_CONF_DIR to include the Ozone filesystem jar file in the Hadoop classpath:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OZONE_HOME/share/ozone/lib/hadoop-ozone-filesystem-lib-current-$OZONE_VERSION.jar
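A mistyped $OZONE_HOME or $OZONE_VERSION would leave a nonexistent jar on the classpath, so it can help to check the path before editing hadoop-env.sh. The following is a minimal sketch; the function name ozone_classpath_line is hypothetical, and the jar location is taken from the export line above.

```shell
#!/usr/bin/env bash
# Sketch: print the HADOOP_CLASSPATH line only if the Ozone filesystem jar exists.
# ozone_classpath_line is a hypothetical helper name, not part of Ozone.
#   $1 = Ozone home directory, $2 = Ozone version string
ozone_classpath_line() {
  local jar="$1/share/ozone/lib/hadoop-ozone-filesystem-lib-current-$2.jar"
  if [ -f "$jar" ]; then
    # Single quotes keep $HADOOP_CLASSPATH literal so that hadoop-env.sh
    # expands it when it is sourced.
    printf 'export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:%s\n' "$jar"
  else
    echo "missing jar: $jar" >&2
    return 1
  fi
}
```

For example, `ozone_classpath_line "$OZONE_HOME" "$OZONE_VERSION" >> "$HADOOP_CONF_DIR/hadoop-env.sh"` appends the line only when the jar is found.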
Edit core-site.xml to include the Ozone file system configuration:
<property>
  <name>fs.o3fs.impl</name>
  <value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.o3fs.impl</name>
  <value>org.apache.hadoop.fs.ozone.OzFs</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>o3fs://bucket.volume</value>
  <final>true</final>
</property>
Copy ozone-site.xml from $OZONE_CONF_DIR to $HADOOP_CONF_DIR
cp $OZONE_CONF_DIR/ozone-site.xml $HADOOP_CONF_DIR/ozone-site.xml
Update mapred-site.xml to include the Ozone file system jar file in the MapReduce application classpath:
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$OZONE_HOME/share/ozone/lib/hadoop-ozone-filesystem-lib-current-$OZONE_VERSION.jar</value>
</property>
Create volumes and buckets
The volume and bucket named in fs.defaultFS in core-site.xml (o3fs://bucket.volume) store the data that would otherwise be written to HDFS. Use the Ozone CLI to create that volume and bucket:
ozone sh volume create volume
ozone sh bucket create /volume/bucket
These commands create a volume named volume and, inside it, a bucket named bucket.
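Note that the o3fs URI in core-site.xml lists the bucket first and the volume second, while the CLI path lists the volume first. The following is a minimal sketch that keeps the two in step; the function names o3fs_uri and create_volume_and_bucket are hypothetical, and the ozone sh commands are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: derive the o3fs URI and create the matching volume and bucket.
# Both function names are hypothetical helpers, not part of Ozone.
#   $1 = volume name, $2 = bucket name

# Print the URI in the o3fs://bucket.volume form used for fs.defaultFS.
o3fs_uri() {
  printf 'o3fs://%s.%s\n' "$2" "$1"
}

# Create the volume first, then the bucket inside it.
create_volume_and_bucket() {
  ozone sh volume create "$1" &&
    ozone sh bucket create "/$1/$2"
}
```

With the names used in this guide, `o3fs_uri volume bucket` prints `o3fs://bucket.volume`, which matches the fs.defaultFS value in core-site.xml.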
Start YARN Services
Once the volume and bucket have been created, YARN can be started and will write its data to the Ozone file system.
$HADOOP_HOME/sbin/start-yarn.sh
MapReduce and YARN workloads will then run on the Ozone file system, in the bucket /volume/bucket.
Stop Services
Run the following command on each DataNode host to stop its DataNode daemon.
su hdfs -c 'bin/ozone --config /etc/ozone/conf --daemon stop datanode'