Formerly named OODT Easy Install: the effort was renamed to RADiX to help distinguish it from the core distribution of OODT. RADiX will be both easy and awesome.

Overview

The high-level goal of this effort is to build a distribution of OODT that sets up, installs, and runs within five commands. While five commands may seem arbitrary, the number serves to push this effort toward the minimum setup and configuration required to get going. This distribution of OODT will include both a deployment structure and a source structure for managing the evolution of your installation of OODT.

Maven offers an improved way to export, configure, and build OODT called Archetypes. Archetypes, simply put, are a way to define templates for projects. Within these project templates we will include packaging instructions that conform to the guidelines below, to increase the similarity among deployments of OODT. Moreover, we will build higher-level scripts and configuration to tie the pieces together at the system level. Finally, we will leverage the CAS Install Maven Plugin to take us from our source structure to our deployment structure.

This wiki will be used to capture thoughts, ideas, and plans for the first archetypes we develop for OODT. To keep things simple, we will initially focus on a small number of modules that are typically deployed and configured together. Our goal is to build an 80% solution that works in most cases and gets people out of the gate and running with a full OODT solution. We believe this effort will help increase adoption and conformity among installations of an already great system.

Assumptions:

  • The initial archetype will export RELEASED versions of OODT
  • The initial archetype will export Crawler, FileManager and Workflow Manager ONLY (they will be bundled together and configured to work together)
    • other modules will be added in the future
  • FileManager policy will be read recursively from the components/filemgr/policy directory. This removes the need to update properties whenever additional policy files are added in sub-directories.
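
The recursive policy lookup in the last assumption could be sketched in shell roughly as follows. This is only an illustration, not RADiX code: the function name `list_policy_files` is hypothetical, and the sketch assumes that policy files use the `.xml` extension.

```shell
#!/bin/sh
# Hypothetical helper (not part of RADiX): list every policy file under a
# policy root, including files in nested sub-directories.
# Assumption: policy files use the .xml extension.
list_policy_files() {
    policy_root="$1"
    # find descends into all sub-directories, so newly added sub-directories
    # are picked up without any properties change.
    find "$policy_root" -type f -name '*.xml' | sort
}
```

A call such as `list_policy_files "$DEPLOYMENT_BASE_DIR/components/filemgr/policy"` would then print one policy file path per line.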

Constraints:

  • The archetype will support only a single version across all components; versions of individual components cannot be mixed
  • The Maven Archetype process will be completed using five commands or fewer

Prerequisites:

  • Submit INFRA ticket to create a place to put all Maven Central artifacts
  • Load/Install artifacts to Maven Central

The 5 Commands

Requires Maven 2.x and Java 1.5+

prompt> wget http://www.apache.org/dist/oodt/radix-0.4.tgz
prompt> tar -xzvf radix-0.4.tgz
prompt> export PATH=${PATH}:<downloadDirectory>/radix
prompt> oodt-radix <dataSystemName> <packageName>
prompt> ./<dataSystemName>/deploy/bin/oodt start

Get OODT RADiX Distribution

Unpackage OODT RADiX

Add OODT RADiX Commands

OODT Create

OODT Start

Version Control

If you want to manage your OODT RADiX distribution with Subversion, import the source tree into your repository:

prompt> svn import <dataSystemName>/source http://your_repo_path/my-pipeline/trunk -m "Initial OODT Import"

Default Deployment Structure

For the easy installation to work properly, we need to settle on a default deployment layout. Below is our plan for how to lay out the deployment when the project is built. First we give an overview, then we detail each path and which files will be saved into it.

/$DEPLOYMENT_BASE_DIR
  /bin
  /components
    /crawler
      /bin
      /etc
      /policy
      /lib
    /filemgr
      /bin
      /etc
      /policy
      /lib
    /workflow
      /bin
      /etc
      /policy
      /lib
    /extensions
      /bin
      /etc
      /lib
    /webapps
      /fmprod
      /fmbrowser
      /wmonitor
    /tomcat5
  /etc
  /data
     /archive
     /staging
     /work
     /met
     /failure
     /catalog
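
As a rough sketch, the empty skeleton for the layout above could be created with a few `mkdir -p` calls. The function name `create_deployment_skeleton` is hypothetical; in the plan described on this page the real structure is produced by the Maven build, not by a script like this.

```shell
#!/bin/sh
# Hypothetical helper (not part of RADiX): create the empty deployment
# skeleton from the layout above under the given base directory.
create_deployment_skeleton() {
    base="$1"
    mkdir -p "$base/bin" "$base/etc"
    # Components that carry bin, etc, policy and lib directories.
    for component in crawler filemgr workflow; do
        for sub in bin etc policy lib; do
            mkdir -p "$base/components/$component/$sub"
        done
    done
    # Extensions have no policy directory in the layout.
    for sub in bin etc lib; do
        mkdir -p "$base/components/extensions/$sub"
    done
    for webapp in fmprod fmbrowser wmonitor; do
        mkdir -p "$base/components/webapps/$webapp"
    done
    mkdir -p "$base/components/tomcat5"
    for data_dir in archive staging work met failure catalog; do
        mkdir -p "$base/data/$data_dir"
    done
}
```

For example, `create_deployment_skeleton /tmp/my-oodt` would create `/tmp/my-oodt/components/filemgr/policy`, `/tmp/my-oodt/data/staging`, and the other directories listed.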

Path Descriptions

Path - Description

/data/archive - This is the root of where the filemgr will store its archived products
/data/staging - This directory will be monitored by the crawler. Products to be ingested should be placed here
/data/catalog - In a configuration that uses Lucene as a back end this directory holds the contents of that index
/data/work - ...
/data/failure - Any products that have failed ingestion will be placed here along with any metadata files.
/bin - Contains system level scripts to start, stop, restart the OODT infrastructure
/components - The guts of what make the data management system work
/components/crawler - The crawler deployment for your data management system (i.e. policy, scripts, and configuration). This component is responsible for monitoring the staging area
/components/filemgr - The filemgr deployment for your data management system (i.e. policy, scripts, and configuration). This component catalogs and archives products into the archive area.
/components/workflow - The workflow deployment for your data management system (i.e. policy, scripts, and configuration). This component orchestrates any processing that may need to be done on your products
/components/extensions - This is a sandbox area to test out metadata extractors, versioners, actions, etc. that you have developed to extend the functionality of the existing OODT framework.
/etc - System wide configuration

Deployment Path Details

/$DEPLOYMENT_BASE_DIR/bin - This will contain scripts that manipulate the underlying components. For example, all three components can be started, stopped, and restarted from this directory. You can also manipulate a single component at a time from here.

Manipulate all components (DEFAULT BEHAVIOR)

./oodt [start, stop, restart]

Manipulate a single component

./oodt [start, stop, restart] [crawler OR filemanager OR workflowmanager]
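
The argument handling for such a script might look like the sketch below. It is hypothetical: the function name `oodt_dispatch` and the usage message are illustrative, and the function only prints which action it would apply to which component rather than starting or stopping anything.

```shell
#!/bin/sh
# Hypothetical dispatcher sketch (not the actual RADiX oodt script).
# It resolves the action and the target components, then prints one
# "<action> <component>" line per target instead of running anything.
ALL_COMPONENTS="crawler filemanager workflowmanager"

oodt_dispatch() {
    action="${1:-}"
    component="${2:-}"
    case "$action" in
        start|stop|restart) ;;
        *) echo "usage: oodt [start|stop|restart] [crawler|filemanager|workflowmanager]" >&2
           return 1 ;;
    esac
    case "$component" in
        "") targets="$ALL_COMPONENTS" ;;   # default behavior: all components
        crawler|filemanager|workflowmanager) targets="$component" ;;
        *) echo "unknown component: $component" >&2
           return 1 ;;
    esac
    for target in $targets; do
        echo "$action $target"
    done
}
```

With no component argument, `oodt_dispatch restart` covers all three components; `oodt_dispatch stop crawler` is limited to the crawler.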

/$DEPLOYMENT_BASE_DIR/components - This will contain a single folder for each component. Initially it will hold only the three components we have selected to start this process; as more components are added, they will be placed here.

/$DEPLOYMENT_BASE_DIR/etc - This will contain configuration and properties files that apply to several components. Like the bin directory, it should give users a single directory where they can configure the associated components.

Parameters that can be managed within the etc directory

oodt.properties

crawler_port=9020

filemanager_port=9000

workflowmanager_port=9001

resmgr_port=9002

batchstub_port=2001

JAVA_HOME
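
A system-level script could read these values from oodt.properties along the lines of the sketch below. The helper name `get_property` is hypothetical, and the sketch assumes one `key=value` pair per line with no spaces around the `=` sign.

```shell
#!/bin/sh
# Hypothetical helper (not part of RADiX): print the value of one key
# from a key=value properties file.
# Assumption: one key=value pair per line, no spaces around "=".
get_property() {
    file="$1"
    key="$2"
    # Print what follows "key=" on matching lines; if the key appears
    # more than once, keep the last occurrence.
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}
```

For example, `FILEMGR_PORT=$(get_property "$DEPLOYMENT_BASE_DIR/etc/oodt.properties" filemanager_port)` would pick up the port configured above; a missing key yields an empty string.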

Component settings we plan to default

crawler

port

filemanager

filemgr
FILEMGR_PORT=9000
export FILEMGR_PORT

workflow

wmgr
WFMGR_PORT=9001
export WFMGR_PORT

Default Source Structure

/$DEPLOYMENT_BASE_DIR
  /crawler
    /src/main/resources
      /policy
      /bin
      /etc
  /filemgr
    /src/main/resources
      /policy
        /oodt
      /bin
      /etc
  /workflow
    /src/main/resources
      /policy
      /bin
      /etc
  /webapps
    /fmprod
    /fmbrowser
    /wmonitor
    /curator
  /extensions
    /src/main
      /java
        /<package>
          /extractor
          /versioner
          /task
          /action
      /python
  /distribution
    /src/main/resources
      /bin
        /oodt
      /etc
        /oodt.properties

Source Path Details

Path - Description

/crawler - Project specific crawler configuration, policy, and scripts
/filemgr - Project specific filemgr configuration, policy, and scripts
/workflow - Project specific workflow configuration, policy, and scripts
/webapps - Web Applications from Apache OODT
/extensions - Extensions to the OODT framework to do metadata extraction, archive layout (aka versioner), workflow tasks, crawler actions
/distribution - Distribution package project for system level build, configuration, and scripts

Future Work

Once the above is complete, we expect the next items to be incorporated to be:

  • Tomcat Distribution
  • OODT Services (Health Monitor, ?)
  • OODT Web Apps (Curator, ?)
  • CAS PGE
  • Expand OODT Easy Commands
    • upgrade - to allow for upgrades in OODT components
    • status - to print out the version of OODT running and component status
    • add_product_type - to configure all components with a new product type

Maven Archetype Information

Requirements for getting artifacts synced with Maven Central:

https://docs.sonatype.org/display/Repository/Central+Sync+Requirements
