Phase I

Introduction

This document defines the scope of the Phase I workflow implementation in Airavata 0.17.

The implementation is mainly motivated by the CIPRES requirement for workflow capabilities in Airavata. The current target is to have SEAGrid consume these workflow capabilities.

 

Requirements

...

  • A simple language to represent the basic needs of workflow execution.

...

  • Execute multiple applications in sequence.

  • Does not require control constructs in the workflow; no loop or condition nodes.

  • Use the previous working directory if the user specifies it; otherwise create a separate working directory for each job.

  • Instead of staging input data, reuse the same data (moving it locally) when an application consumes data that was produced or used by a previous application executed in the same workflow sequence on the same machine (see the sketch after this list).

  • Stage input data if a remote resource is involved, making sure to associate the Airavata Experiment with the local job.

  • The Experiment reaches its end state only after all associated jobs have reached one of their end states.

  • A workflow can have different sets of applications that run on a set of compute resources.

  • The login user name and credential token will be the same for all applications.
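To illustrate the data-reuse requirement above, the following is a minimal Java sketch of how the choice between moving data locally and staging it from a remote resource might be made. The class and method names (DataMovementPlanner, resolveInputTransfer) are hypothetical illustrations only and are not part of the Airavata code base.

/**
 * Hypothetical sketch: decide how an input should reach the next job
 * in a sequential workflow, per the data-reuse requirement above.
 */
public class DataMovementPlanner {

    public enum TransferType { LOCAL_MOVE, REMOTE_STAGE }

    /**
     * If the input was produced (or already used) by a previous job of the
     * same workflow on the same compute resource, move it locally instead
     * of staging it again; otherwise stage it from the storage resource.
     */
    public TransferType resolveInputTransfer(String inputUri,
                                             String targetComputeResourceId,
                                             java.util.Map<String, String> previousOutputsByResource) {
        String producerResourceId = previousOutputsByResource.get(inputUri);
        if (producerResourceId != null && producerResourceId.equals(targetComputeResourceId)) {
            return TransferType.LOCAL_MOVE;   // same machine, same workflow: reuse the data in place
        }
        return TransferType.REMOTE_STAGE;     // remote resource involved: stage the data in
    }
}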

Implementation Scope

Initial Scope

Support multiple applications running sequentially inside an experiment.

  • No separate model called workflow.

...

  • Change the existing experiment model to support multiple applications.

  • Multiple applications can be defined as a DAG.

  • Applications will be executed sequentially, one after the other. Output will be available at the completion of the entire workflow.

  • The workflow node representation should contain the application id, inputs, working directories, application deployment id (host), and other information necessary to create processes and tasks internally (a rough sketch follows this list).

  • A new workflow parser for the Thrift-based workflow language.
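As a rough illustration of the node representation and sequential execution order described above, the sketch below models a workflow node and its DAG ordering in plain Java. The actual representation is expected to be defined in the Thrift-based workflow language; the names used here (WorkflowNode, SequentialWorkflow, dependsOn) are assumptions for illustration only.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a workflow node: the information a node would need
 * so that processes and tasks can be created internally.
 */
class WorkflowNode {
    String nodeId;
    String applicationInterfaceId;    // application id
    String applicationDeploymentId;   // host / deployment id
    String workingDirectory;
    Map<String, String> inputs = new LinkedHashMap<>();   // input name -> value or data URI
    List<String> dependsOn = new ArrayList<>();            // predecessor node ids (DAG edges)
}

/** Sequential execution: run the DAG one node after the other. */
class SequentialWorkflow {
    List<WorkflowNode> nodes = new ArrayList<>();

    /** A node is ready once all of its predecessors have completed. */
    List<WorkflowNode> executionOrder() {
        List<WorkflowNode> ordered = new ArrayList<>();
        Set<String> completed = new HashSet<>();
        while (ordered.size() < nodes.size()) {
            int before = ordered.size();
            for (WorkflowNode node : nodes) {
                if (!ordered.contains(node) && completed.containsAll(node.dependsOn)) {
                    ordered.add(node);
                    completed.add(node.nodeId);
                }
            }
            if (ordered.size() == before) {
                throw new IllegalStateException("Workflow graph contains a cycle");
            }
        }
        return ordered;
    }
}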


Cases to support 

  • All applications running on the same host.

...

  • Applications running on multiple hosts.

 

...

Concerns to handle

...

 

...

  • If the output of one application becomes the input to the next application, do we stage the output? Intermediate step outputs should be staged out and also kept for the subsequent steps.

...

  • When applications run on multiple hosts, how should output staging be handled?

  • Output staging (to the storage resource) should be executed automatically.

  • When applications run on the same host, do we use the same working directory, and do such applications need to be handled differently?

  • The working directory for a new job in the workflow should be different; in some cases this is important to ensure that data from a previous run remains reusable by a later run, since a job may overwrite some data (see the sketch after this list).

  • The Orchestrator needs to know whether the experiment is a single application or contains multiple applications.

  • When defining inputs to applications, can the output of a previous application be an input to another application? (This would change the input data handling models.)

  • Implement registry support for the Thrift-based workflow language. This will introduce many new database tables and a large amount of registry code.
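A minimal sketch of two of the concerns above: giving each job its own working directory, and letting the Orchestrator detect whether an experiment contains one application or several. The names used here (OrchestratorHelpers, workingDirectoryFor, isMultiApplication) are hypothetical and do not correspond to existing Airavata classes.

import java.util.List;

public class OrchestratorHelpers {

    /**
     * Each job in the workflow gets its own working directory so that a later
     * job cannot overwrite data that an earlier job produced and that may
     * still need to be reused.
     */
    public static String workingDirectoryFor(String scratchLocation,
                                             String experimentId,
                                             String jobId) {
        return scratchLocation + "/" + experimentId + "/" + jobId;
    }

    /**
     * The Orchestrator needs to know whether the experiment is a single
     * application or contains multiple applications (a workflow).
     */
    public static boolean isMultiApplication(List<String> applicationInterfaceIds) {
        return applicationInterfaceIds != null && applicationInterfaceIds.size() > 1;
    }
}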

...