Ozone is an object store for Hadoop. It is a redundant, distributed object store built by leveraging primitives present in HDFS. Ozone supports a REST API for accessing the store.

Ozone is a work in progress and currently in alpha state. To test it you need to build it from the source code or use a Hadoop version higher than 3.1.

To run an Ozone cluster you have multiple options:

  1. Use prebuilt docker images from Docker Hub (no build is required, but these images are not provided by the Apache Hadoop project)

  2. Build a new Ozone cluster from the source code and start a cluster with docker (also useful for development)
  3. Build a new Ozone cluster from the source code and start it with the startup scripts, without docker.

We will describe all of these scenarios in the next sections.

Starting Ozone cluster with docker (using prebuilt images)

The easiest way to start an Ozone cluster is to use the prebuilt docker images uploaded to Docker Hub.

Warning

Please note that the docker images are not provided by the Apache project (yet; see HADOOP-14898 for the official containers). This method uses third-party docker images from the flokkr project.

...

Code Block
title: docker-compose.yaml
version: "3"
services:
   namenode:
      image: flokkr/hadoop:ozone
      hostname: namenode
      ports:
         - 50070:50070
         - 9870:9870
      environment:
          ENSURE_NAMENODE_DIR: /data/namenode
      env_file:
         - ./docker-config
      command: ["/opt/hadoop/bin/hdfs","namenode"]
   datanode:
      image: flokkr/hadoop:ozone
      ports:
        - 9864
      env_file:
         - ./docker-config
      command: ["/opt/hadoop/bin/hdfs","datanode"]
   ksm:
      image: flokkr/hadoop:ozone
      ports:
         - 9874:9874
      env_file:
          - ./docker-config
      command: ["/opt/hadoop/bin/hdfs","ksm"]
   scm:
      image: flokkr/hadoop:ozone
      ports:
         - 9876:9876
      env_file:
          - ./docker-config
      command: ["/opt/hadoop/bin/hdfs","scm"]

And the configuration in the docker-config file:

Code Block
title: docker-config
CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000
OZONE-SITE.XML_ozone.ksm.address=ksm
OZONE-SITE.XML_ozone.scm.names=scm
OZONE-SITE.XML_ozone.enabled=true
OZONE-SITE.XML_ozone.scm.datanode.id=/data/datanode.id
OZONE-SITE.XML_ozone.scm.block.client.address=scm
OZONE-SITE.XML_ozone.container.metadata.dirs=/data/metadata
OZONE-SITE.XML_ozone.handler.type=distributed
OZONE-SITE.XML_ozone.scm.client.address=scm
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:9000
HDFS-SITE.XML_dfs.namenode.name.dir=/data/namenode
LOG4J.PROPERTIES_log4j.rootLogger=INFO, stdout
LOG4J.PROPERTIES_log4j.appender.stdout=org.apache.log4j.ConsoleAppender
LOG4J.PROPERTIES_log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
LOG4J.PROPERTIES_log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

 

Save both files to a new directory and run the containers with:

Code Block
docker-compose up -d

 

You can check the status of the components:

Code Block
docker-compose ps

 

You can check the output of the servers with:

Code Block
docker-compose logs
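
To follow the log output of a single component, you can also name the service and pass the standard -f (follow) option of docker-compose logs:

Code Block
docker-compose logs -f ksm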

 

As the web UI ports are forwarded to the host machine, you can check the web UIs:

* Storage Container Manager: http://localhost:9876/
* Key Space Manager: http://localhost:9874/
* Namenode: http://localhost:9870/
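
If you prefer the command line, you can probe the same services with curl. The /jmx endpoint used below is the standard metrics endpoint exposed by Hadoop web servers, so this is a reasonable smoke test, assuming these components expose it as usual:

Code Block
curl -s http://localhost:9876/jmx | head
curl -s http://localhost:9874/jmx | head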

 

You can start multiple datanodes with:

Code Block
docker-compose scale datanode=3
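
Note that newer docker-compose releases deprecate the standalone scale command; on those versions the equivalent is:

Code Block
docker-compose up -d --scale datanode=3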

 

You can test the commands from the OzoneShell page after opening a new shell in one of the containers:

Code Block
docker-compose exec datanode bash

Notes:

  1. The containers can be configured via environment variables. We just moved the env definitions out to an external file to avoid duplication.

  2. For a more detailed explanation of the configuration variables, see the OzoneConfiguration page.

  3. The flokkr base image contains a simple script which converts environment variables to files, based on a naming convention. All of the environment variables are converted to traditional Hadoop config XMLs or log4j configuration files, as shown in the example below.
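
For example, based on the naming convention described in note 3, a line like this in docker-config:

Code Block
OZONE-SITE.XML_ozone.enabled=true

should be rendered by the script into the generated ozone-site.xml roughly as (the exact output format of the script may differ):

Code Block
<property><name>ozone.enabled</name><value>true</value></property>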

Starting Ozone cluster with docker from the source build

 

Warning

Only do this if HDFS-12469 is merged (or use the patch).

...

  1. First, it uses a much smaller common base image which doesn't contain Hadoop.
  2. Second, the real Hadoop should be built from the source and the dist directory should be mapped into the container.

With this method you can start a full cluster on your local machine from your own build.

Build Ozone

To build Ozone, please check out the Hadoop sources from GitHub or the Apache git repository. Then check out the Ozone branch, HDFS-7240, and build it.

Code Block
git clone https://github.com/apache/hadoop.git
cd hadoop
git checkout HDFS-7240
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Pdist -Dtar -DskipShade

Note: -DskipShade just makes the compilation faster and is not strictly required.

This will give you a distribution directory under hadoop-dist/target which can be mapped into the docker containers.

Start the cluster

Code Block
cd dev-support/compose/ozone
docker-compose up

For more docker-compose commands, please check the previous section.

Starting Ozone cluster with shell scripts from the source build (without docker)

This is the traditional method. You need a build as defined in the previous section.

You can start it by going to the hadoop-dist/target/hadoop-3.1.0-alpha directory and starting the cluster from there.
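
For example (the exact directory name depends on the version string of your build, so treat the path below as illustrative):

Code Block
cd hadoop-dist/target/hadoop-3.1.0-alpha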

Configuration

There is detailed documentation about the configuration of an Ozone cluster. But if you would like to get started quickly, just save the following snippet to etc/hadoop/ozone-site.xml (inside your hadoop distribution):

Code Block
<configuration>
<property><name>ozone.ksm.address</name><value>localhost</value></property>
<property><name>ozone.scm.datanode.id</name><value>/tmp/datanode.id</value></property>
<property><name>ozone.scm.names</name><value>localhost</value></property>
<property><name>ozone.handler.type</name><value>distributed</value></property>
<property><name>ozone.container.metadata.dirs</name><value>/tmp/metadata</value></property>
<property><name>ozone.scm.block.client.address</name><value>localhost</value></property>
<property><name>ozone.scm.client.address</name><value>localhost</value></property>
<property><name>ozone.enabled</name><value>true</value></property>
</configuration>

Run

Ozone is designed to run concurrently with HDFS. The simplest way to start HDFS is to run start-dfs.sh from $HADOOP/sbin. Once HDFS is running, please verify it is fully functional by running some commands like:
Code Block
hdfs dfs -mkdir /usr
hdfs dfs -ls /

 

Once you are sure that HDFS is running, start Ozone. To start Ozone, you need to start SCM and KSM. Currently we assume that both KSM and SCM are running on the same node; this will change in the future.

Code Block
./hdfs --daemon start scm
./hdfs --daemon start ksm
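
To verify that both daemons are up, you can list the running Java processes with jps. The class names in the pattern below (StorageContainerManager and KeySpaceManager) are an assumption about what the daemons register as; adjust them if your build reports different names:

Code Block
jps | grep -Ei 'StorageContainerManager|KeySpaceManager'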


If you would like to start HDFS and Ozone together, you can do that by running a single command:

Code Block
$HADOOP/sbin/start-ozone.sh
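
Assuming the distribution follows the usual Hadoop sbin naming, there should also be a matching stop script:

Code Block
$HADOOP/sbin/stop-ozone.sh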

This command will start HDFS and then start the ozone components.

Once you have Ozone running, you can use the Ozone shell commands (see the OzoneShell page) to create a volume, bucket and keys.
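
As a rough sketch of what such a session can look like (the exact command names, flags, URL scheme and port are documented on the OzoneShell page; everything below, including the 9864 port and the bilbo user, is an illustrative assumption):

Code Block
# illustrative only; consult the OzoneShell page for the exact syntax
hdfs oz -createVolume http://localhost:9864/volume1 -user bilbo -root
hdfs oz -createBucket http://localhost:9864/volume1/bucket1
hdfs oz -putKey http://localhost:9864/volume1/bucket1/key1 -file /tmp/sample.txt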