Multi-Node Ozone Cluster

Prerequisites

Ensure you have password-less ssh set up between your hosts.
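A common way to satisfy this is key-based authentication. The sketch below writes the bootstrap commands to a small script so you can review them before running; the host names match the example worker names used later on this page and are assumptions, not defaults.

```shell
# Write the password-less ssh bootstrap commands to a script for review.
# Host names are examples; inspect the script, then run: sh setup-ssh.sh
cat > setup-ssh.sh <<'EOF'
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
EOF
for host in n001.example.com n002.example.com n003.example.com n004.example.com; do
  echo "ssh-copy-id $host" >> setup-ssh.sh
done
cat setup-ssh.sh
```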

Configuration

ozone-site.xml

Save the following snippet to etc/hadoop/ozone-site.xml in the compiled Ozone distribution.

<configuration>
<property><name>ozone.scm.block.client.address</name><value>SCM-HOSTNAME</value></property>
<property><name>ozone.scm.names</name><value>SCM-HOSTNAME</value></property>
<property><name>ozone.scm.client.address</name><value>SCM-HOSTNAME</value></property>
<property><name>ozone.om.address</name><value>OM-HOSTNAME</value></property>
<property><name>ozone.handler.type</name><value>distributed</value></property>
<property><name>ozone.scm.datanode.id.dir</name><value>/tmp/ozone/data/</value></property>
<property><name>ozone.replication</name><value>3</value></property>
<property><name>ozone.metadata.dirs</name><value>/tmp/ozone/data/metadata</value></property>
</configuration>

Replace SCM-HOSTNAME and OM-HOSTNAME with the hostnames of the machines where you want to run the SCM and OM services respectively. It is fine to run both services on the same host; if you are unsure, pick any machine in your cluster. Note that the /tmp directories above are used for simplicity; choose persistent locations for a real deployment.
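Rather than editing the file by hand, the placeholders can be filled in with sed. A minimal sketch, using hypothetical hostnames scm1.example.com and om1.example.com; it substitutes into a one-line sample file here, but in the distribution you would point CONF at etc/hadoop/ozone-site.xml.

```shell
# Fill the SCM/OM placeholders with real hostnames (example values).
SCM_HOST=scm1.example.com
OM_HOST=om1.example.com
CONF=ozone-site.xml   # use etc/hadoop/ozone-site.xml in the distribution
# One-line stand-in for the snippet above, so this sketch is self-contained:
echo '<property><name>ozone.scm.names</name><value>SCM-HOSTNAME</value></property>' > "$CONF"
sed -i -e "s/SCM-HOSTNAME/$SCM_HOST/g" -e "s/OM-HOSTNAME/$OM_HOST/g" "$CONF"
cat "$CONF"
```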

hadoop-env.sh

The only mandatory setting in hadoop-env.sh is JAVA_HOME. E.g.

# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/usr/java/latest

workers

The workers file should contain one hostname per line for each cluster node where the DataNode service will be started. E.g.

n001.example.com
n002.example.com
n003.example.com
n004.example.com
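For a numbered naming scheme like the one above, the workers file can be generated rather than typed. A sketch; adjust the pattern and range to your own hosts.

```shell
# Generate a workers file for hosts n001..n004 (example naming scheme).
# printf repeats its format string once per argument from seq.
printf 'n%03d.example.com\n' $(seq 1 4) > workers
cat workers
```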

Start Services

Initialize the SCM

Run the following commands on the SCM host

bin/ozone scm --init
bin/ozone --daemon start scm

Initialize the OM

Run the following commands on the OM host

bin/ozone om --init
bin/ozone --daemon start om

Start DataNodes

Run the following command on each host listed in the workers file, since it only starts the DataNode daemon on the local machine.

su hdfs -c 'bin/ozone --config /etc/ozone/conf --daemon start datanode'
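One way to cover the whole cluster is to loop over the workers file via ssh. A sketch; the ssh call is wrapped in echo so it only prints the commands, and a two-host sample workers file is created here for illustration. Drop the leading echo to actually run it against your real workers file.

```shell
# Start the DataNode on every host in the workers file over ssh.
# Sample workers file for illustration; use your real one instead.
printf 'n001.example.com\nn002.example.com\n' > workers
while read -r host; do
  echo ssh "$host" "su hdfs -c 'bin/ozone --config /etc/ozone/conf --daemon start datanode'"
done < workers
```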

Hadoop Integration

Shut Down the Hadoop Cluster

Stop any running Hadoop services before making the configuration changes below.

Edit hadoop-env.sh in $HADOOP_CONF_DIR to include the Ozone filesystem jar file in the Hadoop classpath

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OZONE_HOME/share/ozone/lib/hadoop-ozone-filesystem-lib-current-$OZONE_VERSION.jar

Edit core-site.xml to include the Ozone configuration

<property>
  <name>fs.o3fs.impl</name>
  <value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
</property>

<property>
  <name>fs.AbstractFileSystem.o3fs.impl</name>
  <value>org.apache.hadoop.fs.ozone.OzFs</value>
</property>

<property>
  <name>fs.defaultFS</name>
  <value>o3fs://bucket.volume</value>
  <final>true</final>
</property>

Copy ozone-site.xml from $OZONE_CONF_DIR to $HADOOP_CONF_DIR

cp $OZONE_CONF_DIR/ozone-site.xml $HADOOP_CONF_DIR/ozone-site.xml

Update mapred-site.xml to include the Ozone file system jar file in the MapReduce application classpath

<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$OZONE_HOME/share/ozone/lib/hadoop-ozone-filesystem-lib-current-$OZONE_VERSION.jar</value>
</property>

Create volumes and buckets

The volume and bucket defined in core-site.xml will be used to store HDFS data. Use the Ozone CLI to create the corresponding volume and bucket

ozone sh volume create /volume
ozone sh bucket create /volume/bucket

These commands create a volume named volume and, within it, a bucket named bucket.
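Note the ordering in the o3fs URI: the bucket name comes first, then the volume. A small sketch of how the fs.defaultFS value in core-site.xml is composed from the names created above:

```shell
# Compose the o3fs URI used as fs.defaultFS: bucket first, then volume.
VOLUME=volume
BUCKET=bucket
echo "o3fs://${BUCKET}.${VOLUME}"
```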

Start YARN Services

Once the volume and bucket have been created, YARN can be started and will write its data to the Ozone file system.

$HADOOP_HOME/sbin/start-yarn.sh

MapReduce and YARN workloads will then run against the Ozone file system, storing their data in the /volume/bucket bucket.

Stop Services

Run the following command on each DataNode host.

su hdfs -c 'bin/ozone --config /etc/ozone/conf --daemon stop datanode'
