Apache Airavata

General Assumptions

  1. A single Application Catalog will exist in Airavata and will be used by all the gateways.
  2. Each gateway will create, update, delete, and otherwise manage its own application deployments on the supercomputers.
  3. Each gateway will have ADMIN users who perform the above tasks in the Application Catalog.
  4. Airavata ADMIN users will be able to view the details entered by every gateway that uses the Airavata instance. They can also search and view application deployments.
    E.g.:
    • Search for all applications that are turned ON across all supercomputers
    • Search for the application deployments of a specific gateway
  5. If necessary, Airavata ADMIN users will also be able to maintain application deployments across gateways.
  6. Applications will always be deployed, and workflows created (using XBAYA), before they are entered in the Application Catalog.

Existing Descriptors

  1. In Airavata there are two ways of executing experiments on supercomputers and other computational resources (Trestles, Stampede, Amazon EC2, Big Red II, etc.).
    • Execute applications - single submission experiments
    • Execute workflows - multiple node or single node experiments
  2. To execute experiment jobs of either type, we rely on ‘Descriptors’ in Airavata to provide the required information about the application, the supercomputers hosting it, and its input and output parameters.
  3. There are three types of descriptors:
    • Host Descriptors
      Specify information about the resource hosting the application: GRAM, GridFTP, Amazon EC2, etc.
    • Service Descriptors
      Specify the input and output parameters of an application.
    • Application Descriptors
      Specify the executable location, the deployment location (where the application is deployed on the resource), scratch locations for output files, metadata specific to the application, metadata specific to the deployment, etc.
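
The three descriptor types can be pictured as plain records. The sketch below is illustrative only: the class and field names are assumptions chosen to mirror the descriptions above, not Airavata's actual descriptor schema. The sample values are taken from the echo example in Use Case I further down.

```python
from dataclasses import dataclass, field


@dataclass
class HostDescriptor:
    """Resource that hosts the application (hypothetical fields)."""
    host_name: str
    host_address: str
    access_protocol: str


@dataclass
class ServiceDescriptor:
    """Input and output parameters of an application (hypothetical fields)."""
    service_name: str
    inputs: dict = field(default_factory=dict)   # parameter name -> type
    outputs: dict = field(default_factory=dict)  # parameter name -> type


@dataclass
class ApplicationDescriptor:
    """Executable and deployment details on one host (hypothetical fields)."""
    application_name: str
    host_name: str
    executable_location: str
    scratch_location: str


host = HostDescriptor("Big Red II", "bigred2.uits.iu.edu", "GSI-SSH")
service = ServiceDescriptor("echo", {"echo_input": "String"}, {"echo_output": "String"})
app = ApplicationDescriptor("echo", host.host_name, "/bin/echo", "/home/ogce/scratch")
```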

Introduction - Application Catalog

  1. What does the Application Catalog add? With the Application Catalog, users will be able to:
    • Store the same set of information as with descriptors
    • Use a number of new API methods to access and manage applications as required
      • Create Application
      • Publish Application
      • Search/List Application
      • Update Application
      • Clone Application
      • Import Application
      • Delete Application
      • Turn On/Turn Off Application
    • Read from Application Catalog - other components will read from the Catalog to get the information required for experiments.
  2. Descriptors capture the same set of parameters for clouds and supercomputers, irrespective of whether they are needed. With the Catalog, parameters that are not required will be ignored when creating or updating application information.
  3. The Airavata components Orchestrator, Workflow Interpreter, and GFac will connect to the Application Catalog to obtain the information required for executing jobs on resources.
  4. The Application Catalog will have back-end services. These will be automated services that run periodically, as specified by the Airavata ADMIN users:
    • Verifying the application location on the resource(s) and that the application actually exists there
    • Verifying the application version on the resource
    • Verifying the availability of space in scratch location folders

Status of Applications & Workflows

Status and notes:

CREATED

  • When initially created, cloned, or imported, the new application or workflow instance will have the status CREATED.

  • A created application or workflow can be updated and saved while remaining in CREATED status.

PUBLISHED

  • Applications and workflows which are CREATED can be PUBLISHED. A published application or workflow can be updated and saved while remaining in PUBLISHED status.

  • Only applications and workflows with status PUBLISHED can be turned OFF.

  • Applications and workflows which are turned OFF can only be turned ON.
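
The status rules above amount to a small state machine. The following is a minimal sketch under stated assumptions: `CatalogEntry` and its methods are hypothetical names, and the ON/OFF switch is modelled as a flag separate from the status, which is one possible reading of the notes above rather than a confirmed design.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """An application or workflow record (hypothetical model)."""
    name: str
    status: str = "CREATED"  # new, cloned, and imported entries start here
    enabled: bool = True     # ON/OFF flag; can only change once PUBLISHED

    def publish(self) -> None:
        if self.status != "CREATED":
            raise ValueError("only CREATED entries can be published")
        self.status = "PUBLISHED"

    def turn_off(self) -> None:
        if self.status != "PUBLISHED":
            raise ValueError("only PUBLISHED entries can be turned OFF")
        self.enabled = False

    def turn_on(self) -> None:
        if self.enabled:
            raise ValueError("only entries that are OFF can be turned ON")
        self.enabled = True


entry = CatalogEntry("WRF")
entry.publish()
entry.turn_off()
entry.turn_on()
```

Calling `turn_off()` on an entry that is still CREATED raises `ValueError` in this sketch, which reflects the rule that only PUBLISHED entries can be turned OFF.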

API Methods

Create Application/Workflow

  • Creating an application involves many sub tasks in the Application Catalog. Creating an application instance extends to:

 

    • Entering application specific information such as;

      • Application version details

      • Deployment visibility to gateway users

      • Defining application specific parameters

      • Defining resource specific parameters required for the application execution

      • Defining input and output data file sizes, file locations, scratch folder locations and their folder sizes, etc. for the application

      • Error file locations, log file locations, etc. for the application on the resource where it resides

      • Application user group (the application is available only to the selected user group(s))

      • Maximum wall-time, number of processors, processors per node, etc.

    • Entering resource specific information such as;

      • Deployed resource/supercomputer information

      • Defining resource specific parameters, such as the location where the application resides

      • Queue information

      • Deployment information, such as whether the deployment uses MPI, GPU, Serial, etc.

  1. Creating a workflow consists of:

    • Entering workflow information and the resources where the workflow will be executed

    • Defining the workflow nodes and the applications to be used at each node in the workflow

    • Defining subsequent input and output files

  2. Application & workflow creation in the catalog will be a gateway ADMIN task.

  3. Applications and workflows can also be created by cloning or importing existing applications and workflows.

  4. Creating applications and workflows does not by itself make them available for users to execute their experiments; they must first be published.

Publish Application/Workflow

  1. Application deployments and workflows which are CREATED must be published before gateway users can use them.

  2. Publishing applications and workflows is a gateway ADMIN user level task.

  3. At the time of publishing applications the system will validate:

    • Existence of at least one deployment of the application

    • Existence of at least one application and resource level parameter

    • ...
  4. At the time of publishing workflows the system will validate:

    • Existence of at least one application in the workflow
    • Existence of at least one node in the workflow

    • ...
  5. Users can search within the created applications and workflows to find the ones to publish, by providing:

    • Application ID/Workflow ID (the ID is generated within the Application Catalog)

    • Application Name

    • Resource

    • Application User Group

    • ...
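
The publish-time validation for applications could be sketched as below. This is an assumption-laden illustration: `publish_errors` and the dictionary keys are hypothetical, and the parameter rule is read here as "at least one application level parameter and at least one resource level parameter". The list above is open-ended ("..."), so a real implementation would likely check more than this.

```python
def publish_errors(application: dict) -> list:
    """Return the reasons an application cannot be published (empty list = OK)."""
    errors = []
    if application.get("status") != "CREATED":
        errors.append("application is not in CREATED status")
    if not application.get("deployments"):
        errors.append("at least one deployment is required")
    if not application.get("application_parameters"):
        errors.append("at least one application level parameter is required")
    if not application.get("resource_parameters"):
        errors.append("at least one resource level parameter is required")
    return errors


# Hypothetical sample records.
ready = {
    "status": "CREATED",
    "deployments": ["Big Red II"],
    "application_parameters": ["echo_input"],
    "resource_parameters": ["queue_type"],
}
incomplete = {"status": "CREATED", "deployments": []}

ready_errors = publish_errors(ready)
incomplete_errors = publish_errors(incomplete)
```

Returning the full list of problems, rather than failing on the first one, lets a gateway ADMIN fix everything in a single pass before retrying.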

Search/List Application/Workflow

  1. Gateway users (ADMIN and normal) and Airavata ADMIN users will search and list applications and workflows for different purposes. Different keys can be used when searching or listing.

  2. Some examples:

    • Application ID/Workflow ID

    • Application name/Workflow Name

    • Resource

    • By ON/OFF flag

    • Status (CREATED & PUBLISHED)

    • By the special application user groups

    • ...

  3. Gateway users or Airavata users may want to use the above listings for auditing purposes.
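
Searching by any combination of the keys above can be sketched as a simple filter. The `search` helper, the field names, and the sample records (IDs and gateway names) are all hypothetical and exist only to illustrate the idea.

```python
def search(entries: list, **criteria) -> list:
    """Return the entries whose fields equal every given criterion."""
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical sample catalog.
catalog = [
    {"id": "app-1", "name": "WRF", "resource": "Big Red II",
     "status": "PUBLISHED", "enabled": True, "gateway": "gateway-a"},
    {"id": "app-2", "name": "Amber", "resource": "Stampede",
     "status": "CREATED", "enabled": True, "gateway": "gateway-a"},
    {"id": "app-3", "name": "Amber", "resource": "Big Red II",
     "status": "PUBLISHED", "enabled": False, "gateway": "gateway-b"},
]

# Published applications that are currently turned ON, on any resource.
turned_on = search(catalog, status="PUBLISHED", enabled=True)

# All application deployments of one gateway.
gateway_b = search(catalog, gateway="gateway-b")
```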

Update Application/Workflow

  1. Once applications and workflows are created in the catalog, they can be updated as requirements change.

  2. Applications can be modified irrespective of their status.

  3. Modifications will mostly be handled by the gateway ADMIN users. Gateway ADMIN users can modify only the applications of their own gateway.

  4. Airavata ADMIN users can modify applications across gateways and resources.
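
The update permissions described above reduce to a small check. This is only a sketch: the role names and record fields are assumptions, and denying every other role is a guess, since the text says only that modifications are "mostly" handled by gateway ADMIN users.

```python
def can_modify(user: dict, application: dict) -> bool:
    """Decide whether a user may update an application record."""
    if user["role"] == "AIRAVATA_ADMIN":
        # Airavata ADMINs can modify applications across gateways and resources.
        return True
    if user["role"] == "GATEWAY_ADMIN":
        # Gateway ADMINs can modify only their own gateway's applications.
        return user["gateway"] == application["gateway"]
    return False  # assumption: no other role may modify


# Hypothetical sample data.
application = {"id": "app-1", "gateway": "gateway-a", "status": "PUBLISHED"}
own_admin = {"role": "GATEWAY_ADMIN", "gateway": "gateway-a"}
other_admin = {"role": "GATEWAY_ADMIN", "gateway": "gateway-b"}
airavata_admin = {"role": "AIRAVATA_ADMIN", "gateway": None}
```

Note that the application's status plays no part in the check, matching the rule that applications can be modified irrespective of their status.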

Clone Application/Workflow

  1. Application or workflow cloning can be done only within the gateway. If needed gateways will be able to restrict applications and workflows from getting cloned.

  2. Cloning will be carried out by the gateway ADMIN users.

  3. Any existing application or workflow can be cloned irrespective of its status, any flags set on it, etc.

  4. The newly cloned application/workflow will always be in CREATED status and ready to be published.

  5. After cloning, the user can add or remove parameters to suit the requirements of the new application.
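
Cloning can be sketched as a deep copy whose status is reset. The function name, the `cloning_allowed` flag, and the record layout are hypothetical; the text above does not say how a gateway would express a cloning restriction.

```python
import copy


def clone_application(source: dict, new_id: str) -> dict:
    """Copy an application within its gateway; the clone starts as CREATED."""
    if not source.get("cloning_allowed", True):
        raise PermissionError("the gateway has restricted cloning of this application")
    clone = copy.deepcopy(source)  # deep copy so parameter lists are not shared
    clone["id"] = new_id
    clone["status"] = "CREATED"    # regardless of the source's status
    return clone


# Hypothetical sample record.
original = {
    "id": "app-1",
    "gateway": "gateway-a",
    "status": "PUBLISHED",
    "parameters": ["echo_input"],
}

clone = clone_application(original, "app-2")
clone["parameters"].append("extra_option")  # adjust the clone after cloning
```

Because the copy is deep, adding or removing parameters on the clone leaves the original application untouched.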

Import Application/Workflow

  1. In an environment where Application Catalog will be shared among gateways, users can import applications/workflows from other gateways to create new applications.

  2. Gateways will be able to restrict their applications and workflows from being imported by other gateways.

  3. Application/workflow status and other flags will not be considered when importing; any application/workflow can be imported unless it is restricted by the owning gateway.
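
Which applications a gateway may import can be sketched as a filter over the shared catalog. The `import_allowed` flag and the sample records are assumptions made for illustration; only the rule itself (status and other flags are ignored, the owner's restriction is honoured) comes from the text above.

```python
def importable(catalog: list, importing_gateway: str) -> list:
    """Applications owned by other gateways that their owners have not restricted."""
    return [
        app for app in catalog
        if app["gateway"] != importing_gateway
        and app.get("import_allowed", True)  # status and ON/OFF are not checked
    ]


# Hypothetical sample catalog shared by three gateways.
catalog = [
    {"id": "app-1", "gateway": "gateway-a", "status": "PUBLISHED"},
    {"id": "app-2", "gateway": "gateway-b", "status": "CREATED"},
    {"id": "app-3", "gateway": "gateway-b", "status": "PUBLISHED", "import_allowed": False},
    {"id": "app-4", "gateway": "gateway-c", "status": "PUBLISHED", "enabled": False},
]

candidates = importable(catalog, "gateway-a")
```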

Delete Application/Workflow

  1. Applications can be deleted from the catalog by gateway ADMIN users, Airavata ADMIN users, or both. Once deleted, an application cannot be used by the gateway users.

  2. Any existing experiments for the application can proceed without interruption as long as the application actually exists on the supercomputer.

Turn ON/OFF Application/Workflow

  1. Application/Workflow can be turned ON and OFF whenever required by the gateway or Airavata ADMIN users.

  2. This functionality will be used when the application/workflow needs to be temporarily unavailable to users.

  3. Applications/workflows which are published can be turned ON and OFF.

  4. Turning ON and OFF can be done at different levels, such as:

    • Resource level

    • Application user group level

    • Application level

  5. When an application or a workflow is turned OFF from a certain level it needs to be turned ON from the same level as well.

    • E.g.: When applications are turned OFF at resource level they cannot be turned ON individually; turning ON also will happen at resource level.
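
One way to picture the level rule is to record each OFF switch together with the level at which it was set. The class below is a hypothetical model, not a confirmed design: a turn ON at a different level is simply a no-op here, whereas a real implementation might reject it with an error. The "default" user group is made up for the example.

```python
class AvailabilitySwitches:
    """Tracks OFF switches per level (hypothetical model)."""

    LEVELS = ("resource", "user_group", "application")

    def __init__(self) -> None:
        self._off = set()  # (level, key) pairs that are currently OFF

    def turn_off(self, level: str, key: str) -> None:
        if level not in self.LEVELS:
            raise ValueError(f"unknown level: {level}")
        self._off.add((level, key))

    def turn_on(self, level: str, key: str) -> None:
        # Clears only a switch set at this same level; an OFF set at another
        # level stays in force.
        self._off.discard((level, key))

    def is_available(self, application: str, resource: str, user_group: str) -> bool:
        relevant = {
            ("application", application),
            ("resource", resource),
            ("user_group", user_group),
        }
        return not (relevant & self._off)


switches = AvailabilitySwitches()
switches.turn_off("resource", "Big Red II")
switches.turn_on("application", "WRF")  # no effect: OFF was set at resource level
still_off = not switches.is_available("WRF", "Big Red II", "default")
switches.turn_on("resource", "Big Red II")
back_on = switches.is_available("WRF", "Big Red II", "default")
```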

Read from Application Catalog

  1. Airavata's internal components can also query information from the Application Catalog. Orchestrator, Workflow Interpreter, and GFac can query the information required to formulate the tasks to be executed on the supercomputers.

  2. Orchestrator will query the Application Catalog to retrieve the information required to submit single submission jobs to GFac. It will retrieve:

    • Information on the resource where the application resides

    • Information on application specific parameters

    • A resource selected from the Application Catalog, if the experiment did not specify one

  3. Workflow Interpreter will retrieve the information required to submit the jobs of workflows:

    • Retrieve information on applications to execute for each node

    • Information on resources where applications are running

    • Application version information

    • ...

  4. GFac will also use information stored in the Application Catalog:

    • Information on the locations of input data files, output data files, error files, and log files for each application

    • Information on application versions and other information required for application execution

    • Information to select the correct providers and handlers

    • ...
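
The Orchestrator lookup, including the fallback when an experiment names no resource, might look like the sketch below. Everything here is an assumption: the function name, the record fields, and in particular the selection policy (first published, turned-ON deployment), since the text above does not say how Orchestrator chooses a resource.

```python
from typing import Optional


def select_deployment(catalog: list, application: str,
                      resource: Optional[str] = None) -> dict:
    """Pick a usable deployment of an application, optionally on a given resource."""
    candidates = [
        d for d in catalog
        if d["application"] == application
        and d["status"] == "PUBLISHED"
        and d["enabled"]
        and (resource is None or d["resource"] == resource)
    ]
    if not candidates:
        raise LookupError(f"no usable deployment found for {application}")
    return candidates[0]  # assumed policy: first match wins


# Hypothetical sample catalog.
catalog = [
    {"application": "Amber", "resource": "Big Red II", "status": "PUBLISHED", "enabled": False},
    {"application": "Amber", "resource": "Stampede", "status": "PUBLISHED", "enabled": True},
    {"application": "WRF", "resource": "Big Red II", "status": "CREATED", "enabled": True},
]

chosen = select_deployment(catalog, "Amber")                # experiment named no resource
pinned = select_deployment(catalog, "Amber", "Stampede")    # experiment named a resource
```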

Use Cases - Setting-up & maintenance of Application Catalog

Create/Enter Application/Workflow

Use Case I

Enter the details of an application deployment (WRF) on a supercomputer (Big Red II) into the Application Catalog.

  • Create a new application deployment instance in the catalog

  • Add the details of the supercomputer on which it is deployed

  • Define the application specific and application deployment specific information and parameters required to use the application

  • Application specific information such as;

    • Application version

    • Input & output file sizes (maximum file size)

    • Input and output file locations (scratch locations)

    • Duration for which the scratch location holds the output files

    • Error file and log file locations

  • Application deployment specific information;

    • Deployed application version

    • Deployed Application description

    • Deployed location

    • Providers through which the application is accessible (GSI-SSH)

  • E.g.: Define echo application in Big Red II (or some other application)
    • Step 1 : Define Big Red II details
      • Host address : bigred2.uits.iu.edu
      • Remote Access Protocol : GSI-SSH
      • Port : 22
      • Installed Path : /opt/torque/bin/
      • Authentication : ???
    • Step 2 : Define application details
      • Input : [name: echo_input, type:String]
      • Output : [name: echo_output, type:String]
      • Executable Location : /bin/echo
      • App version : 1.0
    • Step 3 : Define Job details
      • Scratch Location : /home/ogce/scratch
      • Project Account : sds128
      • Queue Type : normal
      • CPU Count : 1
      • Job Type : Serial
      • Node Count : 1
      • Processors Per Node : 1
      • Max Wall Time : 10
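
The three steps of the echo example can be captured as one nested record. The key names and the `missing_fields` helper are hypothetical; the values are those listed above, with the authentication setting left unset because the example leaves it open ("???").

```python
echo_deployment = {
    "host": {  # Step 1: Big Red II details
        "host_address": "bigred2.uits.iu.edu",
        "remote_access_protocol": "GSI-SSH",
        "port": 22,
        "installed_path": "/opt/torque/bin/",
        "authentication": None,  # left open ("???") in the example
    },
    "application": {  # Step 2: application details
        "inputs": [{"name": "echo_input", "type": "String"}],
        "outputs": [{"name": "echo_output", "type": "String"}],
        "executable_location": "/bin/echo",
        "app_version": "1.0",
    },
    "job": {  # Step 3: job details
        "scratch_location": "/home/ogce/scratch",
        "project_account": "sds128",
        "queue_type": "normal",
        "cpu_count": 1,
        "job_type": "Serial",
        "node_count": 1,
        "processors_per_node": 1,
        "max_wall_time": 10,
    },
}


def missing_fields(record: dict) -> list:
    """List 'section.field' names whose value has not been filled in yet."""
    return sorted(
        f"{section}.{key}"
        for section, values in record.items()
        for key, value in values.items()
        if value is None
    )


still_missing = missing_fields(echo_deployment)
```

A check like `missing_fields` could run before the record is saved, so that open items such as the authentication setting are flagged rather than silently stored.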

Use Case II

Enter details of an application (Amber) which has the same version (V 12) deployed on multiple supercomputers (Big Red II & Stampede)

  • Create two application deployment instances in the catalog, one for each supercomputer deployment, with application version information

  • Define a general set of application parameters for the deployment on the first supercomputer

  • Define a resource specific set of parameters for the deployment on the second supercomputer

Use Case III

Enter details of an application (Amber) which has the same application version (V 12) deployed on Big Red II in different deployment variations: MPI and GPU

  • Create two instances of the same application version in the catalog for Big Red II

  • Define one instance as the MPI deployment and the other as the GPU deployment

  • Define the mandatory primary parameters and additional advanced parameters for both instances

Use Case IV

Enter details of the WRF application, which has two different versions (cray/3.5 & 3.4.1) deployed on the Big Red II & Quarry supercomputers

  • Create two application deployment instances in the catalog, one for each supercomputer deployment and version

Use Case V

Create two instances of the application Scorep with two different versions (gnu/1.2beta & gnu/1.2.3), both deployed on Big Red II

  • Create two instances of the application deployment on Big Red II with different versions

  • One version of the deployment will have a general set of application parameters

  • The second version will have a supercomputer specific set of parameters

Use Case VI

Create an application deployment instance which will not be visible to gateway users for selection when they create an experiment.

  • Create the application deployment with a flag that says ‘NOT visible’

Use Case VII

Create an application deployment instance which can only be used in workflows and is not available for single submissions.

  • Create the application deployment with a flag that says ‘ONLY for workflows’

Use Case VIII

Create an application deployment whose deployments on supercomputers are hidden from the gateway users

  • Create the application deployment with a flag that says ‘Hidden deployments’

NOTE: Gateway users can select only the application, not the supercomputer; the user will not know where the application is running
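
The three flags from Use Cases VI to VIII can be combined into a single view function that decides what a gateway user is offered. The `Deployment` class, its field names, and `user_view` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    application: str
    resource: str
    visible: bool = True              # Use Case VI: 'NOT visible' sets this to False
    workflows_only: bool = False      # Use Case VII: 'ONLY for workflows'
    hidden_deployments: bool = False  # Use Case VIII: 'Hidden deployments'


def user_view(deployment: Deployment, for_workflow: bool = False) -> Optional[dict]:
    """What a gateway user sees when choosing an application, or None if not offered."""
    if not deployment.visible:
        return None
    if deployment.workflows_only and not for_workflow:
        return None
    # With hidden deployments the user picks the application but not the resource.
    resource = None if deployment.hidden_deployments else deployment.resource
    return {"application": deployment.application, "resource": resource}


invisible = user_view(Deployment("WRF", "Big Red II", visible=False))
workflow_only = user_view(Deployment("Amber", "Stampede", workflows_only=True))
in_workflow = user_view(Deployment("Amber", "Stampede", workflows_only=True), for_workflow=True)
hidden = user_view(Deployment("Amber", "Big Red II", hidden_deployments=True))
```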

Use Case IX

Enter details of a workflow and its nodes, which execute using multiple applications deployed on Big Red II

  • Create the workflow in the catalog and define the supercomputer as supercomputer-I

  • Define the applications which will be used in the workflow

Use Case X

Enter details of a workflow which executes using multiple applications from two different supercomputers (Stampede & Lonestar)

  • Create the workflow and its nodes in the catalog and define the multiple supercomputers which run the applications

  • Define the applications which will be used in the workflow from each supercomputer

Publish Application/Workflow

Use Case I

Publish an application whose deployments, primary parameters, and advanced parameters are properly defined

  • Search for the application using application ID/Resource/Status/etc.

  • Publish the application (validations will run to verify the existence of at least a single deployment)

Use Case II

Publish an application which has no deployments, primary parameters, or advanced parameters properly defined

  • Search for the application using application ID/Resource/Status/etc…

  • Publishing validations will fail and publishing will be restricted

Use Case III

Publish a workflow whose nodes, and the applications to be executed at each node, are properly defined with input parameters and all relevant file locations

  • Search for the workflow using workflow ID/Status/etc…

  • Publish the workflow

Use Case IV

Publish a workflow which has no nodes or their applications defined

  • Search for the workflow using workflow ID/Status/etc…

  • Publishing validations will fail and publishing will be restricted

Update Application/Workflow

Use Case I

Update an already created application by adding a new set of resource specific parameters for a new resource

  • Search for the existing application by providing the application ID

  • The search results will list the existing deployments

  • Add the new application deployment on the new resource and define the specific parameters for that resource.

Use Case II

Update an already published application by adding a new set of resource specific parameters for a new resource

  • Search for the existing application by providing the application ID

  • The search results will list the existing deployments

  • Add the new application deployment on the new resource and define the specific parameters for that resource.

Use Case III

Update an already created application by making all of its deployments visible to the gateway's users.

  • Search for the existing application by providing the application ID

  • The search results will list the existing deployments

  • Change the ‘Deployments Visible’ state to TRUE for all deployments of the application

Assumption: Each gateway will enter the details of its application deployments into the Application Catalog.

E.g.: if two gateways use the same application version deployed on the same supercomputer, there will be two records in the catalog, one for each gateway.

Use Case IV

Update an already created application by removing a deployment in a resource and its specific parameters

  • Search for application deployments by giving the resource ID/name

  • All applications deployed will be listed with the application version & their specific parameters

  • Select the application to be removed from the list and delete it. Delete specific parameters of the deployment as well

Use Case V

Update an application so that it can be deployed on multiple resources. Currently the deployment status of the application is stated as single deployment

  • Search for the application by giving the application ID

  • Set the ‘Multiple Deployments’ flag to TRUE

  • The gateway ADMIN should now be able to enter the details of multiple deployments into the catalog

Use Case VI

Update an already existing application version to have resource specific parameters. The particular application version is deployed in multiple resources but the modification is only for one of the instances

  • Search for the application by giving the application ID

  • Select the relevant deployment version & the resource

  • Add the new resource specific parameters

  • Gateway users now have to provide the additional resource specific parameters for that particular resource when using the application

Use Case VII

Update an already existing application version to have resource specific parameters. The particular resource has two versions of the same application deployed. Only one of those application deployments will receive the new resource specific parameters.

  • Search for the application by giving the resource

  • Select the relevant deployment version which needs the parameters

  • Add the new resource specific parameters

  • Gateway users now have to provide the additional resource specific parameters for that particular application version only, when using the application

Use Case VIII

Update details of a created workflow which executes using multiple applications deployed on the supercomputer

  • Search for the workflow in the supercomputer

  • Add another application to the newly defined workflow node

Use Case IX

Update details of a published workflow which executes using multiple applications deployed on the supercomputer

  • Search for the workflow in the supercomputer

  • Add another application to the newly defined workflow node

Use Case X

Update details of a workflow which executes using multiple applications from two different supercomputers (supercomputer-I & supercomputer-II)

  • Search for the workflow in supercomputer-I & supercomputer-II

  • Add a new application to the new workflow node which is on supercomputer-I

  • Remove an existing application from an existing node which is on supercomputer-II


