Title: Developing Science Gateways

...

with Apache Airavata

Tutorial Length: 3 hours


Abstract: The authors will provide a hands-on tour of the entire Apache Airavata framework for deploying science gateways. Hands-on demonstrations will illustrate the major components of the framework: the core task and workflow execution management system (including metadata capture); security services including authentication and account management, authorization management, permission management, and resource credential management; Python Django-based Web development and content management for end user environments; and managed file transfer and distributed data management services.  We will show Apache Airavata in action through in-operation science gateways created with collaborators in multiple scientific fields that are supporting research and education.

Tutorial Goals and Objectives: In this tutorial, we present the Apache Airavata middleware and the new Django Portal for Airavata, which provides an extensible frontend for building Apache Airavata-based science gateways. The tutorial’s goal is to demonstrate the use of the different components of the Apache Airavata framework and how these can be used to manage the execution of scientific software on diverse computing resources; to enable users to create, organize, replicate, and share computational experiments; to enable gateway providers to leverage Apache Airavata’s platform services to quickly deploy a science gateway that provides access to XSEDE, university, departmental, and commercial cloud computing resources; and to give gateway providers an environment for creating highly customizable and configurable gateways that can target diverse research groups.

This tutorial will showcase new capabilities not present in prior PEARC tutorials, including extensive customizability of the new Django Portal for Airavata user interface environment, advanced file handling across distributed resources, and standalone security components that can be integrated into a wide variety of gateway platforms.

Relevance to PEARC20: Science gateways provide science-specific user interfaces to scientific applications for end users who are unfamiliar with command-line interfaces or who need more capabilities than those interfaces provide. Science gateway middleware such as Apache Airavata provides the general-purpose capabilities behind gateway user interfaces. Gateways are particularly valuable for making the latest software, such as machine learning applications, available to users and for helping users access the newest XSEDE and other resources. In this way, they help researchers and students with a wide variety of skills “catch the wave” and access new scientific software and advanced systems.

Target audiences for this tutorial include a) scientific software developers, who want simplified ways to deliver their software and support larger user communities; b) educators who want to integrate scientific software usage into their classroom without having students get bogged down in the submission mechanisms of specific resources; and c) campus computing center staff, who want to use gateways to broaden their reach beyond their traditional users and to help users make more efficient use of resources.

Format: The format will be a mixture of presentations, demonstrations, and hands-on exercises, as described in the detailed agenda. The agenda indicates the roles of each of the organizers and the nature of each section (presentation, demonstration, exercise, discussion). In summary, the tutorial will be approximately 75% demonstrations and hands-on exercises, and 25% presentations and discussion.   

The framework’s components can be combined to build different types of gateways. These include gateways that focus on task and workflow execution on remote high-performance computing resources; gateways that focus on remote data management; and gateways that concentrate on digital object management, including groups, permissions, and restricted content sharing.

The specific objectives for the attendees are as follows:

  • Understand Apache Airavata’s capabilities through basic demonstrations and real-world examples.
  • Learn how to use Web interfaces to configure Airavata’s Django-based reference gateway implementation, adding applications and computing and storage resources.
  • Learn how to integrate Custos security services into science gateways to manage authentication, user accounts, groups, permissions, and resource credentials.
  • Learn how to use Airavata Managed File Transfers to integrate user-provisioned cloud storage and local storage with university mass storage.
  • Learn how to use Data Lake services to manage metadata and distributed data, and to execute automated data pipelines. 
  • Learn how to integrate Apache Airavata services into Jupyter Notebooks, either locally or in JupyterHub deployments.


Target audience, expected audience, and relevance to PEARC22 attendees: All interested PEARC22 attendees are welcome. The target audience members are a) scientists and researchers interested in deploying a science gateway or similar cyberinfrastructure to expand access to and use of their software, data, or infrastructure; b) science gateway developers; and c) scientific resource system managers and operators who would like to learn more about science gateways. These target audiences represent core constituencies of the PEARC conference series. Based on previous in-person workshops at PEARC, we expect approximately 20-25 attendees.

  1. TUTORIAL LENGTH, STRUCTURE, AND FORMAT

Tutorial Length

180 minutes

Skill Level

Intermediate

Prerequisites

Familiarity with the basics of web development, science gateway operations (including cybersecurity), and running scientific applications on clusters is required for the hands-on exercises. All interested participants are welcome.

Technology Requirements

Laptop or device with a commonly used Web browser


Apache Airavata’s metadata and workflow scheduling infrastructure: This infrastructure builds on Apache Helix and Airavata’s own metadata management system to manage the full lifecycle of job executions and to capture the metadata needed to audit and reproduce execution outcomes.
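To make the idea of lifecycle management with metadata capture concrete, here is a minimal sketch. It assumes a hypothetical set of job states and a hypothetical `JobRecord` class. It does not reproduce Airavata’s actual state model, its metadata schema, or any Apache Helix API. It only shows the general pattern: allowed state transitions are enforced, each change is timestamped, and the application, inputs, and resource are kept together so that a run can later be audited or reproduced.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical lifecycle states and allowed transitions (illustrative only,
# not Airavata's actual state model or Apache Helix's API).
TRANSITIONS = {
    "CREATED": {"SCHEDULED", "CANCELED"},
    "SCHEDULED": {"EXECUTING", "CANCELED"},
    "EXECUTING": {"COMPLETED", "FAILED", "CANCELED"},
    "COMPLETED": set(),
    "FAILED": set(),
    "CANCELED": set(),
}


@dataclass
class JobRecord:
    """Tracks one job execution plus the metadata needed to audit or rerun it."""

    application: str
    inputs: dict
    resource: str
    state: str = "CREATED"
    history: list = field(default_factory=list)

    def transition(self, new_state: str, note: str = "") -> None:
        # Reject transitions the lifecycle does not allow.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        # Record every state change with a UTC timestamp for auditing.
        self.history.append({
            "from": self.state,
            "to": new_state,
            "at": datetime.now(timezone.utc).isoformat(),
            "note": note,
        })
        self.state = new_state

    def provenance(self) -> dict:
        # What ran, where, with which inputs, and which states it passed through.
        return {
            "application": self.application,
            "inputs": dict(self.inputs),
            "resource": self.resource,
            "states": [entry["to"] for entry in self.history],
        }


# Example with hypothetical application, input file, and resource names.
job = JobRecord("gaussian", {"input_file": "water.com"}, "expanse")
job.transition("SCHEDULED")
job.transition("EXECUTING", note="submitted to batch queue")
job.transition("COMPLETED")
print(job.provenance()["states"])  # ['SCHEDULED', 'EXECUTING', 'COMPLETED']
```

The transition table is kept separate from the record so that the lifecycle rules can be changed without touching the audit trail.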

Airavata Django Portal: The portal provides an out-of-the-box end-user environment for all of the Apache Airavata middleware subsystems. It can be extensively customized to create unique user interfaces that meet the usability requirements of diverse research communities.

Airavata Custos: Custos services manage user accounts; provide federated authentication; support role-, group-, and attribute-based authorization; manage sharing and permissions; and manage resource credentials (secrets). Custos services can be used independently of other Airavata services and can be integrated into other science gateway platforms, such as Galaxy, through the Custos API.
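Role-based authorization combined with group sharing can be illustrated with a small in-memory model. This is a hypothetical sketch and not the Custos API. The `User` and `Resource` classes, the `has_permission` function, the `READ`/`WRITE`/`OWNER` levels, and the `admin` role are all assumptions made for illustration, and attribute-based rules are omitted. In the sketch, an owner or administrator always has access. Any other user needs a sufficient grant, given either to that user directly or to one of the user’s groups.

```python
from dataclasses import dataclass, field

# Hypothetical permission levels, ordered from weakest to strongest
# (illustrative only; not the Custos API).
LEVELS = {"READ": 1, "WRITE": 2, "OWNER": 3}


@dataclass
class User:
    username: str
    roles: set = field(default_factory=set)
    groups: set = field(default_factory=set)


@dataclass
class Resource:
    resource_id: str
    owner: str
    # Maps "user:<name>" or "group:<name>" to a permission level name.
    grants: dict = field(default_factory=dict)

    def share(self, principal: str, level: str) -> None:
        if level not in LEVELS:
            raise ValueError(f"unknown permission level: {level}")
        self.grants[principal] = level


def has_permission(user: User, resource: Resource, level: str) -> bool:
    # Role-based shortcut: administrators and owners can do anything.
    if "admin" in user.roles or user.username == resource.owner:
        return True
    needed = LEVELS[level]
    # Group-based sharing: check grants to the user and to each of their groups.
    principals = {f"user:{user.username}"} | {f"group:{g}" for g in user.groups}
    return any(
        LEVELS[resource.grants[p]] >= needed
        for p in principals
        if p in resource.grants
    )


# Example with hypothetical users, group, and experiment identifier.
bob = User("bob", groups={"chem-class"})
carol = User("carol")
experiment = Resource("experiment-42", owner="alice")
experiment.share("group:chem-class", "READ")

print(has_permission(bob, experiment, "READ"))    # True
print(has_permission(bob, experiment, "WRITE"))   # False
print(has_permission(carol, experiment, "READ"))  # False
```

Sharing with a group rather than with individual users means that adding a student to `chem-class` is enough to grant access, which is the kind of classroom scenario the sample Learning Management System gateway exercises.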

Airavata Managed File Transfer (MFT): The MFT subsystem supports data transfer and storage endpoint management for users’ local storage systems, parallel file and mass storage systems operated by research computing centers, and cloud storage systems such as Amazon S3, Google Drive, and Box. Central MFT services and locally deployed agents can support emerging high-performance transfer protocols and provide optimized transfers that are decoupled from the gateway middleware.

Airavata Data Lake: The Data Lake provides secure, controlled access to data from a wide range of sources, including scientific instruments, results of computations, and user- and machine-annotated metadata. The Airavata Data Lake system can orchestrate data movements managed by Airavata MFT and execute data pipelines that extract searchable metadata. Collectively, these components enable the processing of data, the movement of data from data sources to central storage points, and the distribution of data to the respective authorized users.
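A pipeline that extracts searchable metadata can be sketched as a chain of extractor functions. Each extractor adds metadata fields to a catalog entry, and the results are then inverted into an index for search. The sketch below is hypothetical. The extractors, the record fields (`path`, `size_bytes`, `instrument`), the 1 GB size threshold, and the example paths are assumptions for illustration and do not reflect the Airavata Data Lake’s actual pipeline or API.

```python
from collections import defaultdict
from pathlib import PurePosixPath


# Hypothetical extractors: each takes a file record and returns metadata fields.
def extract_basic(record):
    path = PurePosixPath(record["path"])
    return {
        "extension": path.suffix.lstrip(".").lower(),
        "instrument": record.get("instrument", "unknown"),
    }


def extract_size_class(record):
    # Assumed threshold: files of 1 GB or more are classed as "large".
    return {"size_class": "large" if record["size_bytes"] >= 1_000_000_000 else "small"}


PIPELINE = [extract_basic, extract_size_class]


def run_pipeline(records, pipeline=PIPELINE):
    """Apply every extractor to every record; build a catalog and an inverted index."""
    catalog = {}
    index = defaultdict(set)  # (field, value) -> set of paths
    for record in records:
        metadata = {}
        for step in pipeline:
            metadata.update(step(record))
        catalog[record["path"]] = metadata
        for key, value in metadata.items():
            index[(key, value)].add(record["path"])
    return catalog, index


def search(index, **criteria):
    """Return paths matching all field=value criteria."""
    matches = [index.get((key, value), set()) for key, value in criteria.items()]
    return set.intersection(*matches) if matches else set()


# Example with hypothetical instrument files.
records = [
    {"path": "/em/run1/img001.MRC", "size_bytes": 2_500_000_000, "instrument": "cryo-em"},
    {"path": "/em/run1/notes.txt", "size_bytes": 2_048, "instrument": "cryo-em"},
]
catalog, index = run_pipeline(records)
print(search(index, extension="mrc", size_class="large"))  # {'/em/run1/img001.MRC'}
```

Keeping each extractor as an independent function allows a domain-specific step to be added to `PIPELINE` without changing the indexing or search code.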


Prerequisites: Anyone interested in the topic is welcome to attend. Participants will benefit from a general knowledge of how to execute scientific applications on HPC and cloud environments. Only a laptop or other device with a web browser will be required to participate in the demonstrations and hands-on sessions, but optional programming exercises will require a Mac OS X laptop or a laptop running a common Linux distribution, such as Ubuntu. Computations on XSEDE will use the tutorial team’s XSEDE community account allocation.

Special Requirements: The full tutorial assumes network access for the instructors and attendees. If network problems occur, the presenters can give the tutorial in demonstration-only mode.

Instructors and Roles: The tutorial will be offered by leaders of the Apache Airavata project. Airavata is a top-level Apache Software Foundation project and follows Apache’s open source governance model. All of the instructors are committers (developers), contributors, and project management committee members of Airavata. Marlon Pierce will lead the overall tutorial and will introduce science gateways and Airavata. Eroma Abeyasinghe and Marcus Christie will present the hands-on exercises. Sudhakar Pamidighantam will guide hands-on examples of Gaussian model execution from a gateway. Suresh Marru will present the Airavata architecture and under-the-hood details. All presenters are actively involved in XSEDE ECSS projects and Science Gateways Community Institute consultations.

Familiarity with Python, Linux, and Web programming will help with the hands-on exercises but is not required. Attendees will be provided with a sample Airavata Django Portal tenant hosted on SciGaP services and will have access to a virtual cluster on XSEDE’s Jetstream2 and Expanse systems with pre-installed applications.


Tutorial Outline and Schedule: The tutorial will be organized following the outline of Table 2, which is based on successful prior tutorials.  All tutorial information will be available from Apache Airavata’s online documentation (https://airavata.readthedocs.io/en/latest/, https://apache-airavata-django-portal.readthedocs.io/en/latest/). 

Time

Agenda Item

9:00 am - 9:15 am

Presenter: Suresh Marru

Presentation: Introduction and Overview

9:15 am - 10:15 am

Presenter: Marcus Christie

Airavata Django Portal: Run scientific applications on multiple remote supercomputers

  1. Hands On: Log in to the tutorial portal and run a computation
  2. Hands On: Create an application with a customized user interface
  3. Advanced Topic: Extend the portal with Django apps, following the documentation

10:15 am - 10:30 am

Presenter: Suresh Marru

Advanced Examples: a Tour of Airavata-Based Gateways: SEAGrid (Computational Chemistry), SimVascular (Vascular System Modeling), InterACTWEL (Water Management), EMCenter (Scientific Instrument Data Management)

10:30 am - 11:00 am 

Break

11:00 am - 11:45 am

Presenter: Isuru Ranawaka

Airavata Custos: Control permissions on digital objects

  1. Hands On: Use the sample Learning Management System gateway to create, share, and update digital objects
  2. Hands On: Use the Custos API in Jupyter Notebooks

11:45 am - 12:10 pm 

Presenter: Dimuthu Wannipurage

Data Lake and Managed File Transfer (MFT)

  1. Hands On: Use MFT to download data from remote storage systems and to import data from and export data to cloud storage
  2. Hands On: Scalable domain specific metadata and data discovery

12:10 pm - 12:25 pm

Presenter: Dimuthu Wannipurage

JupyterLab Kernel Extensions to Apache Airavata

  1. Hands On: Use Apache Airavata programmatically through Jupyter interfaces: integrate local and remote data, local and remote computations

12:25 pm - 12:30 pm

Presentation: Tutorial wrap up with Q & A


Presenter Information: The presentation will be given by Suresh Marru, Marcus Christie, Isuru Ranawaka, and Dimuthu Wannipurage of the Cyberinfrastructure Integration Research Center (CIRC), part of the Indiana University Pervasive Technology Institute. CIRC members are active participants in the XSEDE and Science Gateways Community Institute projects, and operate SciGaP services.   


Examples of Prior Tutorials: The workshop team has presented similar workshops at PEARC19, PEARC20, PEARC21, e-Science 2021, Gateways 2019, and Gateways 2020. Recordings are available for tutorials from 2020; see, for example, https://www.youtube.com/watch?v=CuBvFj194Kg, https://www.youtube.com/watch?v=dKV50B7vPZg, and https://www.youtube.com/watch?v=ES4LNC_j8a0.

Recent Tutorial Offerings: This half-day tutorial builds on the XSEDE14, XSEDE15, XSEDE16, PEARC17, PEARC19, and Gateways 2019 tutorials, each of which had 20-25 attendees. Prior tutorial agendas and extensive tutorial material are available from https://cwiki.apache.org/confluence/display/AIRAVATA/PEARC+2019+Tutorial+Agenda, https://cwiki.apache.org/confluence/display/AIRAVATA/Gateways19+Tutorial+Agenda, https://cwiki.apache.org/confluence/display/CUSTOS/Home, and https://airavata.readthedocs.io; see also https://courses.airavata.org. The tutorial proposed here combines and streamlines this previous content and adds new content on MFT, Data Lake, and Jupyter topics.