
Info from the ApacheCon NA 2011 talk by Paul Ramirez and Cameron Goodale is here: http://lanyrd.com/2011/apachecon-north-america/smfcf/

Overview

...

The high-level goal of this effort is to build a distribution of OODT that sets up, installs, and runs within a few commands. Keeping the number of commands small pushes down the setup and configuration required to get going. This distribution of OODT will include both a deployment structure and a source structure for managing the evolution of your OODT installation.

...

This wiki will be used to capture thoughts, ideas, and plans for the first archetypes we develop for OODT. To keep things simple, we are initially going to focus on a small number of modules that are typically deployed and configured together. Finally, our goal is to build an 80% solution that works in most cases and gets people out of the gate and running with a full OODT solution. We believe this effort will help increase adoption and conformity among installations of an already great system.

Lineage

...

Long, long ago there was a thought that making some archetypes for OODT would be a great way to get people going. Of course we already had example projects along with each of our components, but let's be honest, how many of us started from those? Okay, so a few did, but that's not the point. Sometime later a wise man pondered that it might be cool to borrow the idea of a Cloudera-like distribution for OODT. That of course fizzled, until recently a young padawan came along and said OODT should just be easier. The archetype guy said, you are right, and that's what I've been saying. Of course, since the idea was always in the aether and never a reality, no one knew. That is, until now. RADiX is the realization of what has been brewing and an itch that has been waiting to be scratched. It builds on work we have done all along, but brings that work to the forefront and helps it shine. This new way of working with OODT will not be for everyone, but we hope that it will provide the 80% solution and that, "Then you'll see, that it is not the spoon that bends, it is only yourself."

Assumptions

...

  • The initial archetype will export RELEASED versions of OODT
  • The initial archetype will export Crawler, FileManager and Workflow Manager ONLY (they will be bundled together and configured to work together)
    • other modules will be added in the future
  • FileManager policy will be read recursively from the components/filemanager/policy directory. This removes the requirement to update properties when additional policy files are added in sub-directories (see the sketch after this list).
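For context, a hedged sketch of the properties this assumption makes unnecessary. The property names below are the ones the stock file manager uses to locate its XML policy; the "core" and "mymission" sub-directory names and the paths are purely hypothetical.

Code Block
# Without recursive loading, every new policy sub-directory has to be appended by hand:
org.apache.oodt.cas.filemgr.repositorymgr.dirs=file:///path/to/components/filemanager/policy/core,file:///path/to/components/filemanager/policy/mymission
org.apache.oodt.cas.filemgr.validation.dirs=file:///path/to/components/filemanager/policy/core,file:///path/to/components/filemanager/policy/mymission
# With recursive loading, dropping mymission/ under components/filemanager/policy would be enough.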

Constraints

...

  • Archetype will only support a single version for all components; no mixing of versions of individual components
  • Maven Archetype process will be completed using as few commands as possible

Prerequisites

...

  • Submit INFRA ticket to create a place to put all Maven Central artifacts
  • Load/Install artifacts to Maven Central

The Commands

Requires Maven 2.x and Java 1.5+

...

NOTE: The "oodt start" command should return quite quickly; this is normal. If, however, the OpsUI link above is not functional on your system, a good place to start looking for the probable cause is the "oodt.out" log file, which can be found (if you have been following the example above) at "oodt-deploy/logs/oodt.out".
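Since the exact command listing is elided above, here is a hedged sketch of the typical flow. The archetype coordinates, version numbers, and tarball name are assumptions; substitute the values for the released archetype you are using.

Code Block
# Generate a project from the RADiX archetype (coordinates and versions are assumptions)
prompt> mvn archetype:generate -DarchetypeGroupId=org.apache.oodt \
          -DarchetypeArtifactId=radix-archetype -DarchetypeVersion=0.3 \
          -DgroupId=com.mycompany -DartifactId=oodt -Dversion=0.1

# Build the distribution and unpack it into a deployment directory
prompt> cd oodt && mvn package
prompt> mkdir ../oodt-deploy && tar zxf distribution/target/*-bin.tar.gz -C ../oodt-deploy

# Start everything; if the OpsUI does not come up, check the log noted above
prompt> cd ../oodt-deploy && ./bin/oodt start
prompt> tail -f logs/oodt.out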

Version Control

If you want to manage your OODT RADiX distribution with Subversion, import it into your repository:

Code Block
prompt> svn import <dataSystemName> http://your_repo_path/<dataSystemName>/trunk -m "Initial OODT Import"
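After the import, day-to-day work would normally happen in a fresh working copy; a small follow-up sketch using the same placeholder repository path:

Code Block
prompt> svn checkout http://your_repo_path/<dataSystemName>/trunk <dataSystemName>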

Default Deployment Structure

In order for the easy installation to work properly, we need to settle on a default deployment layout. Below is our plan for how the deployment is laid out when the project is built. First we list an overview; then we detail each path and what files are saved into it.

Code Block
/$DEPLOYMENT_BASE_DIR
  /bin
  /crawler
    /bin
    /etc
    /policy
    /lib
  /filemgr
    /bin
    /etc
    /policy
    /lib
  /workflow
    /bin
    /etc
    /policy
    /lib
  /extensions
    /bin
    /etc
    /lib
  /tomcat
  /data
     /archive
     /staging
     /work
     /met
     /failure
     /catalog

Deployment Path Descriptions

  • /data/archive: The root directory where the filemgr stores its archived products.
  • /data/staging: Monitored by the crawler; products to be ingested should be placed here (see the sketch after this list).
  • /data/catalog: In a configuration that uses Lucene as a back end, this directory holds the contents of that index.
  • /data/work: ...
  • /data/failure: Any products that fail ingestion are placed here along with their metadata files.
  • /bin: System-level scripts to start, stop, and restart the OODT infrastructure.
  • /crawler: Crawler deployment for your data management system (i.e. policy, scripts, and configuration). This component is responsible for monitoring the staging area.
  • /filemgr: Filemgr deployment for your data management system (i.e. policy, scripts, and configuration). This component catalogs and archives products into the archive area.
  • /workflow: Workflow deployment for your data management system (i.e. policy, scripts, and configuration). This component orchestrates any processing that may need to be done on your products.
  • /extensions: Sandbox area to test out metadata extractors, versioners, actions, etc. that you have developed to extend the functionality of the existing OODT framework.
  • /etc: System-wide configuration.
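As a quick illustration of the staging-to-archive flow described above (the file names are hypothetical, and whether a companion .met file is needed depends on the metadata extractor your crawler is configured with):

Code Block
# Drop a product into the monitored staging area ...
prompt> cp sample-product.dat sample-product.dat.met $DEPLOYMENT_BASE_DIR/data/staging/
# ... and, once the crawler has run, the filemgr should have archived it
prompt> ls -R $DEPLOYMENT_BASE_DIR/data/archive/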

/bin Details

This directory contains scripts to manipulate the underlying components. You can start, stop, and restart each individual component, or all three at once.

...

Code Block
./oodt [start OR stop OR restart] [crawler OR filemanager OR workflowmanager]
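For example (assuming, as the usage line above suggests, that omitting the component name acts on all three components):

Code Block
prompt> ./oodt start                   # start crawler, filemanager, and workflowmanager
prompt> ./oodt restart filemanager     # restart only the filemanager
prompt> ./oodt stop                    # stop all three components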

/conf Details

These are the default parameters set within oodt.properties.

Code Block
crawler_port=9020
filemanager_port=9000
workflowmanager_port=9001
resmgr_port=9002
batchstub_port=2001
JAVA_HOME
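A hedged sketch of overriding one of these defaults; it assumes oodt.properties is deployed under etc/ in the layout above and that the wrapper scripts read it when a component is (re)started:

Code Block
# Edit the default, e.g. move the filemanager to another port ...
prompt> vi etc/oodt.properties          # set filemanager_port=9005
# ... then bounce just that component so it picks up the change
prompt> ./bin/oodt restart filemanager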

Component settings we plan to default:

  • crawler: port
  • filemanager: see the code block below

Code Block (filemgr)
FILEMGR_PORT=9000
export FILEMGR_PORT

...

Code Block (wmgr)
WFMGR_PORT=9001
export WFMGR_PORT
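The crawler row above only names the port. By analogy with the filemgr and workflow manager blocks (an assumption, since the crawler defaults themselves are elided), the corresponding setting would presumably be:

Code Block (crawler)
CRAWLER_PORT=9020
export CRAWLER_PORT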

Default Source Structure

Code Block
/$DEPLOYMENT_BASE_DIR
  /crawler
    /policy
    /bin
    /etc
  /filemgr
    /policy
      /oodt
    /bin
    /etc
  /workflow
    /policy
    /bin
    /etc
  /webapps
    /fmprod
    /fmbrowser
    /wmonitor
    /curator
  /extensions
    /src/main
      /java
        /<package>
          /extractor
          /versioner
          /task
          /action
      /python
  /distribution
    /bin
      /oodt
    /etc
      /oodt.properties

Source Path Descriptions

  • /crawler: Project-specific crawler configuration, policy, and scripts.
  • /filemgr: Project-specific filemgr configuration, policy, and scripts.
  • /workflow: Project-specific workflow configuration, policy, and scripts.
  • /webapps: Web applications from Apache OODT.
  • /extensions: Extensions to the OODT framework for metadata extraction, archive layout (a.k.a. versioners), workflow tasks, and crawler actions (see the sketch after this list).
  • /distribution: Distribution package project for system-level build, configuration, and scripts.
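As an example of how the /extensions sandbox might feed the deployed layout (the module and jar names are hypothetical; the target directory comes from the deployment structure earlier on this page):

Code Block
# Build the extensions module and make its jar visible to the deployed components
prompt> cd extensions && mvn clean package
prompt> cp target/<dataSystemName>-extensions-0.1.jar $DEPLOYMENT_BASE_DIR/extensions/lib/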

Future Work

Once the above is complete, we think the next items to incorporate are as follows:

...