Apache Airavata

Title: Developing Science Gateways with Apache Airavata

Tutorial Length: 3 hours


Abstract: The authors will provide a hands-on tour of the entire Apache Airavata framework for deploying science gateways. Hands-on demonstrations will illustrate the major components of the framework: the core task and workflow execution management system (including metadata capture); security services including authentication and account management, authorization management, permission management, and resource credential management; Python Django-based Web development and content management for end user environments; and managed file transfer and distributed data management services.  We will show Apache Airavata in action through in-operation science gateways created with collaborators in multiple scientific fields that are supporting research and education.

Tutorial Goals and Objectives: The tutorial’s goal is to demonstrate the use of different components of the Apache Airavata framework and how these can be used to build different types of gateways. These include gateways that focus on task and workflow execution on remote high-performance computing resources; gateways that focus on remote data management; and gateways that concentrate on digital object management, including groups, permissions, and restricted content sharing.

The specific objectives for the attendees are as follows:

  • Understand Apache Airavata’s capabilities through basic demonstrations and real-world examples.
  • Learn how to configure, through Web interfaces, Airavata’s Django-based reference implementation gateway to add applications, computing, and storage resources.
  • Learn how to integrate Custos security services into science gateways to manage authentication, user accounts, groups, permissions, and resource credentials.
  • Learn how to use Airavata Managed File Transfers to integrate user-provisioned cloud storage and local storage with university mass storage.
  • Learn how to use Data Lake services to manage metadata and distributed data, and to execute automated data pipelines. 
  • Learn how to integrate Apache Airavata services with Jupyter Notebooks, either locally or in JupyterHubs.


Target audience, expected audience, and relevance to PEARC22 attendees: Any interested PEARC22 attendees are welcome. The target audience members are a) scientists and researchers interested in deploying a science gateway or similar cyberinfrastructure to expand access and use of their software, data, or infrastructure; b) science gateway developers; and c) scientific resource system managers and operators who would like to learn more about science gateways. The target audiences represent core constituencies of the PEARC conference series. Based on previous in-person workshops at PEARC, we expect the number of attendees to be approximately 20-25. 

  1. TUTORIAL LENGTH, STRUCTURE, AND FORMAT

Tutorial Length

180 minutes

Skill Level

Intermediate

Prerequisites

Familiarity with the basics of web development, science gateway operations (including cybersecurity), and running scientific applications on clusters is required for the hands-on exercises. All interested participants are welcome.

Technology Requirements

Laptop or device with a commonly used Web browser


Apache Airavata’s metadata and workflow scheduling infrastructure builds on Apache Helix and Airavata’s own metadata management system to manage the full lifecycle of job executions and to capture the metadata needed to audit and reproduce execution outcomes.
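
As a rough illustration of what lifecycle management with metadata capture can look like, the following minimal Python sketch models a job as a small state machine that records every transition in an audit history. The state names, the JobRecord class, and the sample job values are assumptions made for this example; they are not Airavata's or Apache Helix's actual state model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical lifecycle states and allowed transitions (illustrative only,
# not the state model used by Airavata or Apache Helix).
TRANSITIONS = {
    "CREATED": {"QUEUED", "CANCELED"},
    "QUEUED": {"RUNNING", "CANCELED"},
    "RUNNING": {"COMPLETED", "FAILED", "CANCELED"},
    "COMPLETED": set(),
    "FAILED": set(),
    "CANCELED": set(),
}


@dataclass
class JobRecord:
    """A job plus the metadata needed to audit how it reached its state."""
    job_id: str
    application: str
    inputs: dict
    state: str = "CREATED"
    history: list = field(default_factory=list)

    def advance(self, new_state: str, note: str = "") -> None:
        # Reject transitions the lifecycle does not allow.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        # Capture who/what/when style metadata for every state change.
        self.history.append({
            "from": self.state,
            "to": new_state,
            "at": datetime.now(timezone.utc).isoformat(),
            "note": note,
        })
        self.state = new_state


# Hypothetical job used only to exercise the sketch.
job = JobRecord("job-001", "example-app", {"input_file": "molecule.xyz"})
job.advance("QUEUED", "submitted to scheduler")
job.advance("RUNNING")
job.advance("COMPLETED", "exit code 0")
print(job.state, len(job.history))
```

Keeping the inputs and the full transition history on the same record is one simple way to make an execution both auditable and repeatable.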

The Airavata Django Portal provides an out-of-the-box end-user environment for all of the Apache Airavata middleware subsystems. The Airavata Django Portal can be extensively customized to create unique user interfaces that meet the usability requirements of diverse research communities.

Airavata Custos services manage user accounts; provide federated authentication; support role, group, and attribute-based authorization; manage sharing and permissions; and manage resource credentials (secrets). Custos services can be used independently of other Airavata services and can be integrated into other science gateway platforms such as Galaxy through the Custos API.
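
To make the idea of group-based sharing concrete, here is a minimal Python sketch of a permission check in which a user can hold access to a digital object either directly or through group membership. The SharingRegistry class, the permission levels, and the sample users, groups, and object identifiers are all hypothetical; this does not use or represent the Custos API.

```python
from dataclasses import dataclass, field

# Permission levels ordered from weakest to strongest (illustrative only).
LEVELS = {"READ": 1, "WRITE": 2, "OWNER": 3}


@dataclass
class SharingRegistry:
    """Hypothetical in-memory store of group memberships and sharing grants."""
    groups: dict = field(default_factory=dict)  # group name -> set of user ids
    grants: dict = field(default_factory=dict)  # (object id, principal) -> level

    def add_member(self, group: str, user: str) -> None:
        self.groups.setdefault(group, set()).add(user)

    def share(self, object_id: str, principal: str, level: str) -> None:
        if level not in LEVELS:
            raise ValueError(f"unknown permission level: {level}")
        self.grants[(object_id, principal)] = level

    def can(self, user: str, object_id: str, needed: str) -> bool:
        # A user acts as themselves plus every group they belong to.
        principals = {user} | {g for g, members in self.groups.items() if user in members}
        held = [
            LEVELS[self.grants[(object_id, p)]]
            for p in principals
            if (object_id, p) in self.grants
        ]
        # Access is granted if the strongest held level covers the needed one.
        return bool(held) and max(held) >= LEVELS[needed]


registry = SharingRegistry()
registry.add_member("chem-lab", "alice")
registry.share("dataset-42", "chem-lab", "READ")
registry.share("dataset-42", "bob", "OWNER")

print(registry.can("alice", "dataset-42", "READ"))   # True, via the chem-lab group
print(registry.can("alice", "dataset-42", "WRITE"))  # False, READ does not imply WRITE
print(registry.can("carol", "dataset-42", "READ"))   # False, no grant applies
```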

The Airavata Managed File Transfer (MFT) subsystem supports data transfer and storage endpoint management for users’ local storage systems, parallel file and mass storage systems operated by research computing systems, and cloud storage systems such as Amazon S3, Google Drive, and Box. Central MFT services and locally deployed agents can support emerging high performance transfer protocols and provide optimized transfers that are decoupled from the gateway middleware.
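
The following Python sketch shows, under simplifying assumptions, how a transfer can be written against an abstract storage endpoint so that the copy logic does not depend on where the data lives. The StorageEndpoint protocol, the LocalEndpoint class, and the transfer function are hypothetical names invented for this example, and both endpoints are plain local directories; this is not the MFT API, and real endpoints would wrap cloud or mass storage systems.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import BinaryIO, Protocol


class StorageEndpoint(Protocol):
    """Hypothetical interface that any storage backend would implement."""
    def open_read(self, path: str) -> BinaryIO: ...
    def open_write(self, path: str) -> BinaryIO: ...


class LocalEndpoint:
    """Stand-in endpoint backed by a local directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def open_read(self, path: str) -> BinaryIO:
        return open(self.root / path, "rb")

    def open_write(self, path: str) -> BinaryIO:
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        return open(target, "wb")


def transfer(src: StorageEndpoint, src_path: str,
             dst: StorageEndpoint, dst_path: str,
             chunk_size: int = 1 << 20) -> str:
    """Stream a file between endpoints in chunks; return its SHA-256 digest."""
    digest = hashlib.sha256()
    with src.open_read(src_path) as reader, dst.open_write(dst_path) as writer:
        while chunk := reader.read(chunk_size):
            digest.update(chunk)
            writer.write(chunk)
    return digest.hexdigest()


# Exercise the sketch with two temporary directories acting as endpoints.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    payload = b"simulation output\n" * 1000
    (Path(src_dir) / "result.dat").write_bytes(payload)
    checksum = transfer(LocalEndpoint(Path(src_dir)), "result.dat",
                        LocalEndpoint(Path(dst_dir)), "archive/result.dat",
                        chunk_size=4096)
    copied = (Path(dst_dir) / "archive" / "result.dat").read_bytes()
    print(copied == payload, checksum == hashlib.sha256(payload).hexdigest())
```

Streaming in fixed-size chunks keeps memory use constant regardless of file size, and returning a checksum gives the caller a way to verify the copy.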

Airavata Data Lake provides secured, controlled access to data from a wide range of sources including scientific instruments, results of computations, and user and machine annotated metadata. The Airavata Data Lake system can orchestrate data movements managed by Airavata MFT and execute data pipelines to extract searchable metadata. Collectively these components enable the processing of data, the movement of data from the data sources to central storage points, and the distribution of data to respective authorized users.
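
As a small illustration of a pipeline that extracts searchable metadata, the Python sketch below runs each incoming record through a list of extraction stages and then adds the result to a simple in-memory index. The stage functions, the MetadataIndex class, the sample paths, and the convention that the first path component names the instrument are all assumptions made for this example; they do not reflect the Data Lake's actual services or schema.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def extract_file_metadata(record: dict) -> dict:
    """Derive the file name and lower-cased extension from the path."""
    path = PurePosixPath(record["path"])
    return {**record, "name": path.name, "extension": path.suffix.lstrip(".").lower()}


def tag_instrument(record: dict) -> dict:
    """Hypothetical convention: the first path component names the instrument."""
    parts = PurePosixPath(record["path"]).parts
    return {**record, "instrument": parts[0] if len(parts) > 1 else "unknown"}


# Stages run in order; each returns an enriched copy of the record.
PIPELINE = [extract_file_metadata, tag_instrument]


def run_pipeline(record: dict) -> dict:
    for stage in PIPELINE:
        record = stage(record)
    return record


class MetadataIndex:
    """Tiny inverted index mapping (field, value) pairs to file paths."""

    def __init__(self) -> None:
        self._index = defaultdict(set)

    def add(self, record: dict) -> None:
        for key, value in record.items():
            self._index[(key, str(value))].add(record["path"])

    def search(self, **criteria) -> set:
        # A path matches only if it satisfies every criterion.
        matches = [self._index.get((k, str(v)), set()) for k, v in criteria.items()]
        return set.intersection(*matches) if matches else set()


index = MetadataIndex()
for path in ["microscope-a/run1/image.TIFF",
             "microscope-a/run2/notes.txt",
             "sequencer-b/run1/reads.fastq"]:
    index.add(run_pipeline({"path": path}))

print(sorted(index.search(instrument="microscope-a", extension="tiff")))
```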


Prerequisites: Anyone interested in the topic is welcome to attend. Participants will benefit from prior general knowledge of how to execute scientific applications in HPC and cloud environments. A laptop or other device with a web browser will be required to participate in the hands-on sessions. Familiarity with Python, Linux, and Web programming will help with the hands-on exercises but is not required. Attendees will be provided with a sample hosted Airavata Django Portal tenant connected to hosted SciGaP services and will have access to a virtual cluster on XSEDE’s Jetstream2 and Expanse systems with pre-installed applications.


Tutorial Outline and Schedule: The tutorial will be organized following the outline of Table 2, which is based on successful prior tutorials.  All tutorial information will be available from Apache Airavata’s online documentation (https://airavata.readthedocs.io/en/latest/, https://apache-airavata-django-portal.readthedocs.io/en/latest/). 

Time

Agenda Item

9:00 am - 9:15 am

Presenter: Suresh Marru

Presentation: Introduction and Overview

9:15 am - 10:15 am

Presenter: Marcus Christie

Airavata Django Portal: Run scientific applications on multiple remote supercomputers

  1. Hands On: Log in to the tutorial portal and run a computation
  2. Hands On: Create an application with customized user interface 
  3. Advanced topic: extending with Django Apps following documentation

10:15 am - 10:30 am

Presenter: Suresh Marru

Advanced Examples: a Tour of Airavata-Based Gateways: SEAGrid (Computational Chemistry), SimVascular (Vascular System Modeling), InterACTWEL (Water Management), EMCenter (Scientific Instrument Data Management)

10:30 am - 11:00 am 

Break

11:00 am - 11:45 am

Presenter: Isuru Ranawaka

Airavata Custos:  Control permissions on digital objects 

  1. Hands On: Use the sample Learning Management Systems gateway to create, share, and update digital objects
  2. Hands On: Using the Custos API in Jupyter Notebooks

11:45 am - 12:10 pm 

Presenter: Dimuthu Wannipurage

Data Lake and Managed File Transfer (MFT)

  1. Hands On: Use MFT to seamlessly download data from remote storage systems and to import and export data to and from cloud storage.
  2. Hands On: Scalable domain specific metadata and data discovery

12:10 pm - 12:25 pm

Presenter: Dimuthu Wannipurage

JupyterLab Kernel Extensions for Apache Airavata

  1. Hands On: Use Apache Airavata programmatically through Jupyter interfaces: integrate local and remote data, local and remote computations

12:25 pm - 12:30 pm

Presentation: Tutorial wrap up with Q & A


Presenter Information: The presentation will be given by Suresh Marru, Marcus Christie, Isuru Ranawaka, and Dimuthu Wannipurage of the Cyberinfrastructure Integration Research Center (CIRC), part of the Indiana University Pervasive Technology Institute. CIRC members are active participants in the XSEDE and Science Gateways Community Institute projects, and operate SciGaP services.   


Examples of Prior Tutorials: The tutorial team has presented similar tutorials at PEARC19, PEARC20, PEARC21, e-Science 2021, Gateways 2019, and Gateways 2020. Recordings are available for tutorials from 2020; see for example https://www.youtube.com/watch?v=CuBvFj194Kg, https://www.youtube.com/watch?v=dKV50B7vPZg, and https://www.youtube.com/watch?v=ES4LNC_j8a0

Prior tutorial agendas are available from https://cwiki.apache.org/confluence/display/AIRAVATA/Home and https://cwiki.apache.org/confluence/display/CUSTOS/Home. The tutorial proposed here combines and streamlines this previous content and adds new content on MFT, Data Lake, and Jupyter topics.
