Info | ||
---|---|---|
|
...
The effort was renamed to RADiX to help delineate it from the core distribution of OODT. RADiX will be both easy and awesome. By default it is already RAD. |
ApacheCon NA 2011 Fast Feather Talk
Info from the ApacheCon NA 2011 talk by Paul Ramirez and Cameron Goodale is here: http://lanyrd.com/2011/apachecon-north-america/smfcf/
Overview
...
The high-level goal of this effort is to build a distribution of OODT that sets up, installs, and runs within a few commands. Reducing the number of commands needed to run OODT is meant to make setup and configuration as easy as possible. This distribution of OODT will include both a deployment structure and a source structure for managing the evolution of your OODT installation.
Maven offers an improved way to export, configure, and build OODT called Archetypes. Archetypes, simply put, are a way to define templates for projects. Within these project templates we will include packaging instructions that conform to the guidelines below, to increase the similarity among deployments of OODT. Moreover, we will build higher-level scripts and configuration to tie the pieces together at the system level. Finally, we will leverage the CAS Install Maven Plugin to take us from our source structure to our deployment structure.
This wiki will be used to capture thoughts, ideas and plans for the first archetypes we develop for OODT. To keep things simple we are going to initially focus on a small number of modules that are typically deployed and configured together. Finally, our goal is to build an 80% solution that works in most cases to get people out of the gates and running with a full OODT solution. We believe this effort will help increase adoption and conformity amongst installations of an already great system.
Lineage
...
Long, long ago there was a thought that making some archetypes for OODT would be a great way to get people going. Of course we already had example projects along with each of our components but, let's be honest, how many of us started from those? Okay, so a few did, but that's not the point. Sometime later a wise man pondered that it might be cool to borrow the idea of a Cloudera-like distribution for OODT. That of course fizzled, until sometime recently a young padawan came along and said OODT should just be easier. The archetype guy said, "You are right, and that's what I've been saying." Of course, since the idea was always in the aether and not a reality, no one knew. That is, until now. RADiX is the realization of what has been brewing, an itch that has been waiting to be scratched. It builds on stuff we have done all along but brings our work to the forefront and helps it shine. This new way of working with OODT will not be for everyone, but we hope that it will give the 80% solution and that, "Then you'll see, that it is not the spoon that bends, it is only yourself."
Assumptions
...
- The initial archetype will export RELEASED versions of OODT
- The initial archetype will export Crawler, FileManager and Workflow Manager ONLY (they will be bundled together and configured to work together)
- other modules will be added in the future
- FileManager Policy will be read recursively from the components/filemanager/policy directory. This will remove the requirement to make properties updates when additional policy files are added in sub-directories.
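As a rough illustration of what reading policy recursively means, the shell sketch below lists every policy file under a directory, including sub-directories. The `list_policy_files` helper and the assumption that policy files end in `.xml` are illustrative only; they are not part of RADiX.

```shell
#!/bin/sh
# Hypothetical helper: list all policy files below a directory, recursively.
# Assumes policy files use the .xml extension (an assumption for illustration).
list_policy_files() {
    policy_dir="$1"
    # find descends into every sub-directory, so files added in new
    # sub-directories are picked up without any properties update.
    find "$policy_dir" -type f -name '*.xml' | sort
}

# Example (path taken from the assumption above):
#   list_policy_files components/filemanager/policy
```

Because the walk starts at the policy root, adding a sub-directory of policy files requires no change to the caller.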
Constraints
...
- Archetype will only support a single version for all components. No mixing versions of individual components
- Maven Archetype process will be completed using as few commands as possible
Prerequisites
...
- Submit INFRA ticket to create a place to put all Maven Central artifacts
- Load/Install artifacts to Maven Central
The Commands
Requires Maven 2.x and Java 1.5+
Until the archetype is made available at Maven Central, you will need to run the following commands, which install the radix archetype locally.
Code Block |
---|
prompt> svn co https://svn.apache.org/repos/asf/oodt/trunk/mvn/archetypes/radix
prompt> cd radix
prompt> mvn install |
Once you have the archetype installed, or once it is available at Maven Central, here is the set of commands you would typically run.
Code Block |
---|
$ curl -o radix https://raw.githubusercontent.com/apache/oodt/trunk/mvn/archetypes/radix/src/main/resources/bin/radix |
You should then edit the radix file, replacing the parameters below with whatever you want.
The first command runs a Maven archetype to make an OODT project. Inside the radix script downloaded by the curl command is an mvn archetype generation command. That command has a number of parameters that the one-line radix script encapsulates (parameters marked in italics below).
The groupId is a place to specify your company's namespace.
The artifactId is a place to specify a short name of your project.
The version indicates the initial version label for your project.
The oodt flag indicates the version of OODT that you want your project to be built on. N.B., this should most likely match the most recent version of OODT.
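To make the four parameters concrete, here is a minimal sketch that prints (but does not run) an archetype generation command built from them. The `print_archetype_cmd` helper is hypothetical, and it assumes the archetype version equals the OODT version, as in the 0.6 example on this page; the real radix script may differ.

```shell
#!/bin/sh
# Hypothetical helper that prints the archetype command without running it.
# Arguments: groupId artifactId version oodtVersion
# Assumption: the archetype version is the same as the OODT version.
print_archetype_cmd() {
    group_id="$1"
    artifact_id="$2"
    version="$3"
    oodt_version="$4"
    printf 'mvn archetype:generate'
    printf ' -DarchetypeGroupId=org.apache.oodt'
    printf ' -DarchetypeArtifactId=radix-archetype'
    printf ' -DarchetypeVersion=%s' "$oodt_version"
    printf ' -Doodt=%s' "$oodt_version"
    printf ' -DgroupId=%s' "$group_id"
    printf ' -DartifactId=%s' "$artifact_id"
    printf ' -Dversion=%s\n' "$version"
}

# Example:
#   print_archetype_cmd com.mycompany dms 0.1-SNAPSHOT 0.6
```

Printing the command first makes it easy to review the substituted values before handing the line to Maven.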
Code Block |
---|
prompt> mv oodt oodt-src; cd oodt-src; mvn install |
Code Block |
---|
prompt> mkdir ${project.name.directory}
prompt> cd ${project.name.directory}
prompt> mvn archetype:generate -DarchetypeGroupId=org.apache.oodt -DarchetypeArtifactId=radix-archetype -DarchetypeVersion=0.6 -Doodt=0.6 -DgroupId=com.mycompany -DartifactId=dms -Dversion=0.1-SNAPSHOT
prompt> cd dms; mvn package
prompt> mkdir ../oodt-deploy; tar -xvf distribution/target/dms-distribution-0.1-SNAPSHOT-bin.tar.gz -C ../oodt-deploy
prompt> cd ../oodt-deploy; ./bin/oodt start
prompt> ./resmgr/bin/batch_stub 2001 |
- The first and second commands simply create a new project directory and cd into it. This is necessary as we need a clean directory containing no pom for the archetype to work its magic.
- The third command is the running of a maven archetype to make an oodt project. The groupId is a place to specify your company's namespace. The artifactId is a place to specify a short name of your project. The version indicates the initial version label for your project. The oodt flag indicates the version of OODT that you want your project to be built on.
- The fourth command moves into the created OODT project directory, where your source and configuration can be maintained and later placed into version control, and then creates the distribution of your OODT project using "mvn package".
- The fifth command untars the distribution into the newly created deployment directory.
- The sixth command moves into the deployment directory and starts the OODT system.
- The last command launches the batch stub on port 2001.
NOTE1: If OODT doesn't start for some reason, make sure JAVA_HOME is set in your ~/.bashrc (Example: export JAVA_HOME="/usr/lib/jvm/java-6-openjdk-i386") and then start OODT in a new terminal. The actual problem can be seen in the Tomcat logs, which are located in $OODT_HOME/tomcat/logs.
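A quick pre-flight check along these lines can catch the JAVA_HOME problem before starting OODT. This is only a sketch: `check_java_home` is a hypothetical helper, not something shipped with RADiX.

```shell
#!/bin/sh
# Hypothetical pre-flight check: prints a diagnosis and returns
# non-zero when JAVA_HOME is unset or has no java executable.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "No java executable under $JAVA_HOME/bin"
        return 1
    fi
    echo "JAVA_HOME looks usable: $JAVA_HOME"
}

# Example:
#   check_java_home && ./bin/oodt start
```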
NOTE2: After you launch OODT (./bin/oodt start), you may observe the following output:
Code Block |
---|
Using CLASSPATH:
-e Starting OODT File Manager [ Failed ]
-e Starting OODT Resource Manager [ Failed ]
-e Starting OODT Workflow Manager [ Failed ] |
Don't be confused. To see whether OODT is running, open a browser to http://localhost:8080/opsui, or run the command-line programs for the individual components. Click the PCS Status link to get detailed information about the running processes. A green arrow indicates that the corresponding process is running correctly.
Alternatively, you can run the following command to get a list of the relevant processes and their assigned ports.
Code Block |
---|
ps -ax | grep "oodt" |
NOTE: The "oodt start" command should return quite quickly. This is normal. If, however, the OpsUI link above (http://localhost:8080/opsui) is not functional on your system, a good place to start looking for the probable cause is the "oodt.out" log file, which can be found (if you've been following the example above) in "oodt-deploy/logs/oodt.out".
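For convenience, a small helper like the one below can show the end of that log, or report that it is missing. The `show_oodt_log` name and its default line count are assumptions made for this sketch; the default path follows the example deployment directory.

```shell
#!/bin/sh
# Hypothetical helper: show the tail of oodt.out, or say that it is missing.
# Arguments: [log file] [number of lines]
show_oodt_log() {
    log_file="${1:-oodt-deploy/logs/oodt.out}"
    lines="${2:-20}"
    if [ -f "$log_file" ]; then
        tail -n "$lines" "$log_file"
    else
        echo "Log file not found: $log_file"
        return 1
    fi
}

# Example:
#   show_oodt_log oodt-deploy/logs/oodt.out 50
```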
Note | ||
---|---|---|
| ||
While starting/stopping oodt if you get errors like "Is File/Workflow/Resource Manager
You should get this message for all these three links: "Method GET not implemented (try POST)" which means all pcs_stat |
Version Control
If you want to manage your OODT RADiX distribution with Subversion:
Code Block |
---|
prompt> svn import <dataSystemName> http://your_repo_path/<dataSystemName>/trunk -m "Initial OODT Import" |
If you want to manage your OODT RADiX distribution with Git:
Info | ||
---|---|---|
| ||
prompt> git init
prompt> git add .
prompt> git commit -a -m "Initial OODT import" |
Default Deployment Structure
For the easy installation to work properly, we need to settle on a default deployment layout. Below is our plan for how to lay out the deployment when the project is built. First we give an overview, then we detail each path and the files that will be saved into it.
Code Block | ||||
---|---|---|---|---|
| ||||
/$DEPLOYMENT_BASE_DIR
    /bin
    /crawler
        /bin
        /etc
        /policy
        /lib
    /data
        /archive
        /catalog
        /failure
        /met
        /staging
        /work
        /workflow
    /extensions
        /bin
        /etc
        /lib
    /filemgr
        /bin
        /etc
        /policy
        /lib
        /logs
    /pcs
        /bin
        /etc
        /lib
        /logs
        /policy
        /run
    /pge
        /bin
        /lib
        /policy
    /resmgr
        /bin
        /etc
        /lib
        /logs
        /policy
        /run
    /tomcat
        /LICENSE
        /NOTICE
        /RELEASE-NOTES
        /RUNNING.txt
        /bin
        /common
        /conf
        /logs
        /server
        /shared
        /temp
        /webapps
        /work
    /workflow
        /bin
        /etc
        /lib
        /logs
        /policy
        /run |
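After untarring a distribution, a short check like the following can confirm that the main top-level directories are present. This is a sketch only: `missing_dirs` is a hypothetical helper, and the directory list it checks is a subset chosen for illustration.

```shell
#!/bin/sh
# Hypothetical check: print each expected top-level directory that is missing.
# The list below is an illustrative subset of the layout; adjust as needed.
missing_dirs() {
    base="$1"
    for d in bin crawler data filemgr workflow; do
        [ -d "$base/$d" ] || echo "$d"
    done
}

# Example:
#   missing_dirs ../oodt-deploy
```

No output means every directory in the list was found under the given base directory.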
Deployment Path Descriptions
Path | Description |
---|---|
/data/archive | This is the root of where the filemgr will store its archived products |
/data/staging | This directory will be monitored by the crawler. Products to be ingested should be placed here |
/data/catalog | In a configuration that uses Lucene as a back end this directory holds the contents of that index |
/data/work | ... |
/data/failure | Any products that have failed ingestion will be placed here along with any metadata files. |
/bin | System-level scripts to start, stop, and restart the OODT infrastructure. |
/crawler | Crawler deployment for your data management system (i.e. policy, scripts, and configuration). This component is responsible for monitoring the staging area. |
/filemgr | Filemgr deployment for your data management system (i.e. policy, scripts, and configuration). This component catalogs and archives products into the archive area. |
/workflow | Workflow deployment for your data management system (i.e. policy, scripts, and configuration). This component orchestrates any processing that may need to be done on your products. |
/extensions | Sandbox area to test out metadata extractors, versioners, actions, etc. that you have developed to extend the functionality of the existing OODT framework. |
/etc | System wide configuration |
...
/bin Details
/$DEPLOYMENT_BASE_DIR/bin - This directory contains scripts that manipulate the underlying components. You can start, stop, and restart each individual component or all 3 at the same time.
Manipulate all components (DEFAULT BEHAVIOR):
Code Block |
---|
./oodt [start OR stop OR restart] |
...
Manipulate a single component:
Code Block |
---|
./oodt [start OR stop OR restart] [crawler OR filemanager OR workflowmanager] |
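The two usage forms suggest a simple two-argument dispatch. The sketch below is not the actual oodt script; `oodt_dispatch` is a hypothetical function that only prints what it would do, assuming that omitting the component means all components.

```shell
#!/bin/sh
# Hypothetical dispatcher: validates the arguments and prints the
# action and target instead of starting or stopping anything.
oodt_dispatch() {
    action="$1"
    component="${2:-all}"   # no component given: act on all components
    case "$action" in
        start|stop|restart) ;;
        *)
            echo "usage: oodt [start|stop|restart] [crawler|filemanager|workflowmanager]"
            return 1
            ;;
    esac
    case "$component" in
        all|crawler|filemanager|workflowmanager)
            echo "$action $component"
            ;;
        *)
            echo "unknown component: $component"
            return 1
            ;;
    esac
}

# Examples:
#   oodt_dispatch start            # prints "start all"
#   oodt_dispatch stop crawler     # prints "stop crawler"
```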
...
/conf Details
Parameters that can be managed within the conf directory
These are the default parameters set within oodt.properties.
Code Block |
---|
crawler_port=9020
filemanager_port=9000
workflowmanager_port=9001
resmgr_port=9002
batchstub_port=2001
|
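Scripts that need one of these ports can read it straight from the properties file. A minimal sketch, assuming the simple key=value format shown above; `get_property` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a key=value properties file.
# Arguments: key file
get_property() {
    key="$1"
    file="$2"
    # Print what follows "key=" on the first line that starts with it.
    sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Example:
#   get_property filemanager_port oodt.properties
```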
JAVA_HOME
Component settings we plan to default
crawler
port
filemanager
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
FILEMGR_PORT=9000
export FILEMGR_PORT
|
workflow
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
WFMGR_PORT=9001
export WFMGR_PORT
|
Default Source Structure
Code Block | ||||
---|---|---|---|---|
| ||||
/$DEPLOYMENT_BASE_DIR
    /crawler
        /policy
        /bin
        /etc
    /filemgr
        /policy
            /oodt
        /bin
        /etc
    /workflow
        /policy
        /bin
        /etc
    /webapps
        /fmprod
        /fmbrowser
        /wmonitor
        /curator
    /extensions
        /src/main
            /java
                /<package>
                    /extractor
                    /versioner
                    /task
                    /action
            /python
    /distribution
        /bin
            /oodt
        /etc
            /oodt.properties
|
Source Path Descriptions
Path | Description |
---|---|
/crawler | Project specific crawler configuration, policy, and scripts |
/filemgr | Project specific filemgr configuration, policy, and scripts |
/workflow | Project specific workflow configuration, policy, and scripts |
/webapps | Web Applications from Apache OODT |
/extensions | Extensions to the OODT framework to do metadata extraction, archive layout (aka versioner), workflow tasks, crawler actions |
/distribution | Distribution package project for system level build, configuration, and scripts |
Future Work
Once the above is complete, we expect to incorporate the following items next:
- Tomcat Distribution
- OODT Services (Health Monitor, ?)
- OODT Web Apps (Curator, ?)
- CAS PGE
- Expand OODT Easy Commands
- upgrade - to allow for upgrades in OODT components
- status - to print out the version of OODT running and component status
- add_product_type - to configure all components with a new product type
Maven Archetype Information
Requirements for getting artifacts synced with Maven Central:
...