A Guide to Streaming OODT

What is Streaming OODT?

Streaming OODT (S-OODT) is an effort to bring big-cluster processing and stream-processing technologies into the OODT processing system, so that OODT can handle larger data sets and new data paradigms. The core technologies cover stream management, cluster management, processing, file system efficiency, and backward compatibility (running standard OODT within a managed cluster). These technologies include:

Apache Kafka (Stream management)
Apache Mesos (Cluster management)
Apache Spark and Spark Streaming (Processing)
Tachyon and Hadoop HDFS (File system efficiency)

For detailed design documentation please refer to: Streaming OODT

What is required for Streaming OODT?

Streaming OODT is designed for use on clusters of machines. It requires at least one head node and at least one slave node. The head node runs the main servers, and the slaves run the client services, including processing slaves and HDFS slaves. In general, each slave node should have moderate local disk storage and above-average RAM. Currently S-OODT has only been tested on Linux hosts.

To install and run S-OODT, the following items must be configured on your cluster before you begin the installation.

Required Packages for Mesos (Ubuntu Package Names):

Development Python (python-dev)
Python Amazon Web Services Interface (python-boto)
Lib Curl (libcurl4-nss-dev)
Lib SASL (libsasl2-dev)

Required software:

Apache Maven 3.x
Java 7
GNU Compiler Tools (gcc, g++)
SVN
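
As a quick pre-flight check, the sketch below reports which of the required tools are not on the PATH. The function name missing_tools is hypothetical, and the command names (mvn, java, gcc, g++, svn) are assumed to be the usual launchers for the software listed above. It does not check versions.

```shell
#!/usr/bin/env bash
# Print the name of each given tool that cannot be found on PATH.
# Returns 1 if at least one tool is missing, 0 otherwise.
missing_tools() {
    local status=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Check the tools named in the list above.
missing_tools mvn java gcc g++ svn \
    || echo "Install the tools listed above before continuing." >&2
```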

Required environment variables:

M2_HOME
JAVA_HOME
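
A minimal sketch for confirming that both variables are exported before installing. The helper name unset_vars is hypothetical; it only tests that each variable is non-empty, not that it points at a valid installation.

```shell
#!/usr/bin/env bash
# Print the name of each listed environment variable that is unset or empty.
unset_vars() {
    local name
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "$name"
        fi
    done
}

missing="$(unset_vars M2_HOME JAVA_HOME)"
if [ -n "$missing" ]; then
    echo "Set these variables before installing:" $missing >&2
fi
```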

Required Linux Settings:

Allow all communications between cluster machines
Update thread/process ulimit to 4096
Set up a 5 GB RAM disk (for Tachyon)
Set up ssh-key login for the user running the cluster
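
The sketch below shows one way these settings might be checked or applied. It makes several assumptions: that "thread/process ulimit" refers to the bash `ulimit -u` limit, that 4096 can be treated as a minimum, that a tmpfs mount is an acceptable RAM disk, and that /mnt/ramdisk, <user>, and <slave-host> stand in for your own values. The helper limit_ok is hypothetical. The state-changing commands are left as comments because mounting normally needs administrator rights.

```shell
#!/usr/bin/env bash
# Return success if a reported limit meets the required minimum.
# "unlimited" always passes; an empty or non-numeric value fails.
limit_ok() {
    local current="$1" required="$2"
    case "$current" in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$current" -ge "$required" ]
}

# Read the per-user process limit of the current shell.
current="$(ulimit -u 2>/dev/null || echo unknown)"
if limit_ok "$current" 4096; then
    echo "process limit OK ($current)"
else
    echo "process limit is $current; raise it to at least 4096" >&2
fi

# Remaining steps, shown as comments because they change system state:
#   sudo mkdir -p /mnt/ramdisk
#   sudo mount -t tmpfs -o size=5g tmpfs /mnt/ramdisk   # 5 GB RAM disk
#   ssh-keygen -t rsa                                   # create a key pair
#   ssh-copy-id <user>@<slave-host>                     # repeat for each node
```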

Using Streaming OODT

The following sections describe how to install, set up, and use S-OODT.

Cluster Installation

To install S-OODT, first make sure you have the required software. Then log in to your head node and perform the following steps.

Note: Root user access should not be required.

Setup Bootstrap Script

# Make a temp directory
mkdir -p tmp
cd tmp
# Get install script (minimal bootstrap script); this creates a "setup" directory
svn export https://svn.apache.org/repos/asf/oodt/trunk/cluster-tools/setup/
# Create env-vars.sh from the template
cd setup
mv env-vars.sh.tmpl env-vars.sh

Setup Environment Variables

Now edit "env-vars.sh" and set the following environment variables (found in the top section):

RUN_DIR: This is a directory to store log files, scratch files, and any other storage the system needs.
INSTALL_DIR: This is the directory in which to install the software.
TMP_DIR: This is a temporary directory used for downloads and other scratch space during installation.

Note 1: Create these directories before installing, and make sure the installing user has read, write, and execute permissions on them.

Note 2: It is inadvisable to change the versions of the installed software at this time. The software is tested as a set.
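
The directory preparation described in Note 1 could be scripted along these lines. The function name prepare_dirs and the example paths are hypothetical, and granting permissions to the owning user only (u+rwx) is an assumption about what the installer needs.

```shell
#!/usr/bin/env bash
# Create each directory (and any missing parents) and give the owning user
# read, write, and execute permission on it. Stops at the first failure.
prepare_dirs() {
    local dir
    for dir in "$@"; do
        mkdir -p "$dir" && chmod u+rwx "$dir" || return 1
    done
}

# Example call with placeholder paths; use the values from your env-vars.sh:
#   prepare_dirs /data/soodt/run /data/soodt/install /data/soodt/tmp
```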

Setup Hosts File

Now edit the "hosts" file, which lists the nodes in your cluster. The first host is the head node, and the rest are slaves. Add every host in your cluster to this file.
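
To illustrate the ordering rule, the sketch below splits a hosts file into its head node and slaves. It assumes one hostname per line, which the source does not state explicitly. The helper names and the example.org hostnames are hypothetical.

```shell
#!/usr/bin/env bash
# Print the head node: the first non-blank line of the hosts file.
head_node() {
    grep -v '^[[:space:]]*$' "$1" | head -n 1
}

# Print the slave nodes: every non-blank line after the first.
slave_nodes() {
    grep -v '^[[:space:]]*$' "$1" | tail -n +2
}

# Example with placeholder hostnames:
#   printf '%s\n' head.example.org slave1.example.org slave2.example.org > hosts
#   head_node hosts     # prints head.example.org
#   slave_nodes hosts   # prints slave1.example.org and slave2.example.org
```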

Run Cluster Install

# Add execute permission to scripts
chmod +x install.sh env-vars.sh deploy.sh
# Run installation (will take some time)
./install.sh
# NOTE: Bootstrap finished; from here on, use the software installed at ${INSTALL_DIR}.
# Load the environment variables into the current shell
. env-vars.sh
# Navigate to installed setup scripts
cd "${INSTALL_DIR}/cluster-tools/setup"
