Apache Airavata
Abstract: The authors will provide a hands-on tour of the entire Apache Airavata framework for deploying science gateways. Hands-on demonstrations will illustrate the major components of the framework: the core task and workflow execution management system (including metadata capture); security services including authentication and account management, authorization management, permission management, and resource credential management; Python Django-based Web development and content management for end user environments; and managed file transfer and distributed data management services. We will show Apache Airavata in action through in-operation science gateways created with collaborators in multiple scientific fields that are supporting research and education.
Tutorial Goals and Objectives: The tutorial’s goal is to demonstrate the use of different components of the Apache Airavata framework and how these can be used to build different types of gateways. These include gateways that focus on task and workflow execution on remote high-performance computing resources; gateways that focus on remote data management; and gateways that concentrate on digital object management, including groups, permissions, and restricted content sharing.
The specific objectives for the attendees are as follows:
Target audience, expected audience, and relevance to PEARC22 attendees: Any interested PEARC22 attendees are welcome. The target audience members are a) scientists and researchers interested in deploying a science gateway or similar cyberinfrastructure to expand access and use of their software, data, or infrastructure; b) science gateway developers; and c) scientific resource system managers and operators who would like to learn more about science gateways. The target audiences represent core constituencies of the PEARC conference series. Based on previous in-person workshops at PEARC, we expect the number of attendees to be approximately 20-25.
Tutorial Length | 180 minutes |
Skill Level | Intermediate |
Prerequisites | Familiarity with the basics of web development, science gateway operations (including cybersecurity), and running scientific applications on clusters is required for the hands-on exercises. All interested participants are welcome. |
Technology Requirements | Laptop or device with a commonly used Web browser |
Apache Airavata’s metadata and workflow scheduling infrastructure builds on Apache Helix and Airavata’s own metadata management system to manage the full lifecycle of job executions and to capture the metadata needed to audit and reproduce execution outcomes.
The Airavata Django Portal provides an out-of-the-box end-user environment for all of the Apache Airavata middleware subsystems. The Airavata Django Portal can be extensively customized to create unique user interfaces that meet the usability requirements of diverse research communities.
Airavata Custos services manage user accounts; provide federated authentication; support role, group, and attribute-based authorization; manage sharing and permissions; and manage resource credentials (secrets). Custos services can be used independently of other Airavata services and can be integrated into other science gateway platforms such as Galaxy through the Custos API.
The Airavata Managed File Transfer (MFT) subsystem supports data transfer and storage endpoint management for users’ local storage systems, parallel file and mass storage systems operated by research computing systems, and cloud storage systems such as Amazon S3, Google Drive, and Box. Central MFT services and locally deployed agents can support emerging high-performance transfer protocols and provide optimized transfers that are decoupled from the gateway middleware.
Airavata Data Lake provides secured, controlled access to data from a wide range of sources, including scientific instruments, results of computations, and user- and machine-annotated metadata. The Airavata Data Lake system can orchestrate data movements managed by Airavata MFT and execute data pipelines to extract searchable metadata. Collectively, these components enable the processing of data, the movement of data from the data sources to central storage points, and the distribution of data to the respective authorized users.
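As a rough illustration of the Data Lake flow described above (movement of data to central storage, extraction of searchable metadata, and distribution to authorized users), the following minimal Python sketch models those steps entirely in memory. It does not use the Airavata Data Lake, MFT, or Custos APIs. The names `DataObject`, `transfer_to_central`, `extract_metadata`, `ingest`, and `search`, the `key=value` header format, and the group-based access check are all assumptions introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical model of a file produced by an instrument or a computation."""
    name: str
    source: str          # where the data came from, e.g. "microscope-1"
    owner_group: str     # the single group allowed to read this object
    content: str
    metadata: dict = field(default_factory=dict)


def transfer_to_central(obj, central_store):
    """Stand-in for a managed transfer: place the object in central storage."""
    path = f"central/{obj.source}/{obj.name}"
    central_store[path] = obj
    return path


def extract_metadata(obj):
    """Toy pipeline step: treat 'key=value' lines as searchable metadata."""
    metadata = {}
    for line in obj.content.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            metadata[key.strip()] = value.strip()
    return metadata


def ingest(obj, central_store, index):
    """Move the object to central storage, then index its extracted metadata."""
    path = transfer_to_central(obj, central_store)
    obj.metadata = extract_metadata(obj)
    index[path] = obj.metadata
    return path


def search(index, central_store, user_groups, key, value):
    """Return matching paths, restricted to objects the user's groups may read."""
    return sorted(
        path
        for path, metadata in index.items()
        if metadata.get(key) == value
        and central_store[path].owner_group in user_groups
    )


if __name__ == "__main__":
    store, index = {}, {}
    scan = DataObject("scan1.txt", "microscope-1", "lab-a",
                      "sample=graphene\nvoltage=200kV\nraw data")
    ingest(scan, store, index)
    # A member of lab-a finds the scan; a user in no group sees nothing.
    print(search(index, store, {"lab-a"}, "sample", "graphene"))
    print(search(index, store, set(), "sample", "graphene"))
```

The sketch keeps transfer, metadata extraction, and access checks in separate functions to mirror the separation of concerns among the components described above; a real deployment would delegate each step to the corresponding service rather than to in-memory dictionaries.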
Prerequisites: Anyone interested in the topic is welcome to attend. Participants will benefit from prior general knowledge of how to execute scientific applications in HPC and cloud environments. A laptop or other device with a web browser will be required to participate in the hands-on sessions. Familiarity with Python, Linux, and Web programming will help with the hands-on exercises but is not required. Attendees will be provided with a sample hosted Airavata Django Portal tenant connected to hosted SciGaP services and will have access to a virtual cluster on XSEDE’s Jetstream2 and Expanse systems with pre-installed applications.
Tutorial Outline and Schedule: The tutorial will be organized following the outline of Table 2, which is based on successful prior tutorials. All tutorial information will be available from Apache Airavata’s online documentation (https://airavata.readthedocs.io/en/latest/, https://apache-airavata-django-portal.readthedocs.io/en/latest/).
Time | Agenda Item |
9:00 am - 9:15 am Presenter: Suresh Marru | Presentation: Introduction and Overview |
9:15 am - 10:15 am Presenter: Marcus Christie | Airavata Django Portal: Run scientific applications on multiple remote supercomputers |
10:15 am - 10:30 am Presenter: Suresh Marru | Advanced Examples: a Tour of Airavata-Based Gateways: SEAGrid (Computational Chemistry), SimVascular (Vascular System Modeling), InterACTWEL (Water Management), EMCenter (Scientific Instrument Data Management) |
10:30 am - 11:00 am | Break |
11:00 am - 11:45 am Presenter: Isuru Ranawaka | Airavata Custos: Control permissions on digital objects |
11:45 am - 12:10 pm Presenter: Dimuthu Wannipurage | Data Lake and Managed File Transfer (MFT) |
12:10 pm - 12:25 pm Presenter: Dimuthu Wannipurage | Jupyter Labs Kernel Extensions to Apache Airavata |
12:25 pm - 12:30 pm | Presentation: Tutorial wrap up with Q & A |
Presenter Information: The presentation will be given by Suresh Marru, Marcus Christie, Isuru Ranawaka, and Dimuthu Wannipurage of the Cyberinfrastructure Integration Research Center (CIRC), part of the Indiana University Pervasive Technology Institute. CIRC members are active participants in the XSEDE and Science Gateways Community Institute projects, and operate SciGaP services.
Examples of Prior Tutorials: The tutorial team has presented similar workshops at PEARC19, PEARC20, PEARC21, e-Science 2021, Gateways 2019, and Gateways 2020. Recordings are available for tutorials from 2020; see for example https://www.youtube.com/watch?v=CuBvFj194Kg, https://www.youtube.com/watch?v=dKV50B7vPZg, and https://www.youtube.com/watch?v=ES4LNC_j8a0.
Prior tutorial agendas are available from https://cwiki.apache.org/confluence/display/AIRAVATA/Home and https://cwiki.apache.org/confluence/display/CUSTOS/Home. The tutorial proposed here combines and streamlines this previous content and adds new content on MFT, Data Lake, and Jupyter topics.