Overview

We would like to support what is normally called Workflow by extracting the processing pipeline capabilities from the existing CAS and decoupling them from file management and product ingestion. Supporting generic Workflow requires a separate notion of "Workflow Management". This page discusses the capabilities a Workflow Manager should have.

Desired Capabilities

  • Workflow should be represented as a graph. A graph makes the dependencies between tasks explicit, so tasks that do not depend on each other can run in true parallel.
  • We should support identified workflow patterns, especially control-flow patterns. At present, control-flow has to a large extent been relegated to the tasks themselves: a collection of tasks is associated with a product ingestion, and only a priority determines the order of execution.
  • Data-flow should be captured. Since we are in the business of science processing, our tasks are focused on inputs and outputs. At a minimum, the workflow should be able to hook together input and output streams between tasks. Taking this one step further, we could create record-based streams to allow for horizontal parallelism. Furthermore, if we are able to capture the streams, they can be preserved for traceability.
  • Workflow need not have any interaction with a database. What if one wants to persist a workflow as XML, as a flat file, or in some other lightweight format?
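
The capabilities above can be illustrated with a small sketch. It is not the CAS API or a proposed design. It only assumes a workflow can be modeled as a mapping from each task to the tasks whose outputs it consumes. The task names (ingest, calibrate, geolocate, merge), the function names, and the XML element names (workflow, task, input) are all hypothetical. The sketch groups tasks into stages that could run in parallel and round-trips the graph through XML without touching a database.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
import xml.etree.ElementTree as ET

# Hypothetical workflow: each task maps to the set of tasks whose
# outputs it consumes. These edges double as the data-flow links.
WORKFLOW = {
    "ingest": set(),
    "calibrate": {"ingest"},
    "geolocate": {"ingest"},
    "merge": {"calibrate", "geolocate"},
}


def parallel_stages(graph):
    """Group tasks into stages; tasks within one stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are all done
        stages.append(ready)
        sorter.done(*ready)
    return stages


def to_xml(graph):
    """Serialize the graph to an XML string (element names are illustrative)."""
    root = ET.Element("workflow")
    for task in sorted(graph):
        node = ET.SubElement(root, "task", attrib={"id": task})
        for dep in sorted(graph[task]):
            ET.SubElement(node, "input", attrib={"from": dep})
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the task-to-inputs mapping from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return {
        task.get("id"): {inp.get("from") for inp in task.findall("input")}
        for task in root.findall("task")
    }


stages = parallel_stages(WORKFLOW)
restored = from_xml(to_xml(WORKFLOW))
print(stages)
print(restored == WORKFLOW)
```

In this sketch, calibrate and geolocate land in the same stage because neither consumes the other's output, which is the kind of parallelism a priority-ordered task list cannot express. A real Workflow Manager would also need control-flow patterns and actual stream handling, which are left out here.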

Resources
