...
Abstract: The authors will provide a hands-on tour of the entire Apache Airavata framework for deploying science gateways. Hands-on demonstrations will illustrate the major components of the framework: the core task and workflow execution management system (including metadata capture); security services, including authentication and account management, authorization management, permission management, and resource credential management; Python Django-based Web development and content management for end user environments; and managed file transfer and distributed data management services. We will show Apache Airavata in action through operational science gateways, created with collaborators in multiple scientific fields, that support research and education.
Tutorial Description and Goals: In this tutorial, we present the Apache Airavata middleware and the new Django Portal for Airavata, which provides an extensible frontend for building Apache Airavata-based science gateways. The goal of the tutorial is to demonstrate how the different components of the Apache Airavata framework can be used to manage the execution of scientific software on diverse computing resources; to enable users to create, organize, replicate, and share computational experiments; to enable gateway providers to leverage Apache Airavata’s platform services to quickly deploy a science gateway that provides access to XSEDE, university, departmental, and commercial cloud computing resources; and to give gateway providers an environment for creating highly customizable and configurable gateways that can target diverse research groups.
This tutorial will showcase new capabilities not present in prior PEARC tutorials, including extensive customizability of the new Django Portal for Airavata user interface environment, advanced file handling across distributed resources, and standalone security components that can be integrated into a wide variety of gateway platforms.
Relevance to PEARC20: Science gateways provide science-specific user interfaces to scientific applications for end users who are unfamiliar with command-line interfaces or who need more capabilities than those interfaces provide. Science gateway middleware like Apache Airavata provides the general-purpose capabilities behind gateway user interfaces. Gateways are particularly valuable for making the latest software, such as machine learning applications, available to users and for helping users access the newest XSEDE and other resources, assisting researchers and students with a wide variety of skills to “catch the wave” for accessing new scientific software and advanced systems.
Target audiences for this tutorial include a) scientific software developers, who want simplified ways to deliver their software and support larger user communities; b) educators who want to integrate scientific software usage into their classroom without having students get bogged down in the submission mechanisms of specific resources; and c) campus computing center staff, who want to use gateways to broaden their reach beyond their traditional users and to help users make more efficient use of resources.
Format: The format will be a mixture of presentations, demonstrations, and hands-on exercises, as described in the detailed agenda. The agenda indicates the roles of each of the organizers and the nature of each section (presentation, demonstration, exercise, discussion). In summary, the tutorial will be approximately 75% demonstrations and hands-on exercises, and 25% presentations and discussion.
The tutorial will also show how Airavata’s components can be used to build different types of gateways. These include gateways that focus on task and workflow execution on remote high-performance computing resources; gateways that focus on remote data management; and gateways that concentrate on digital object management, including groups, permissions, and restricted content sharing.
The specific objectives for the attendees are as follows:
Target audience, expected audience, and relevance to PEARC22 attendees: Any interested PEARC22 attendees are welcome. The target audience members are a) scientists and researchers interested in deploying a science gateway or similar cyberinfrastructure to expand access to, and use of, their software, data, or infrastructure; b) science gateway developers; and c) scientific resource system managers and operators who would like to learn more about science gateways. The target audiences represent core constituencies of the PEARC conference series. Based on previous in-person workshops at PEARC, we expect approximately 20-25 attendees.
Tutorial Length | 180 minutes |
Skill Level | Intermediate |
Prerequisites | Familiarity with the basics of web development, science gateway operations (including cybersecurity), and running scientific applications on clusters is required for the hands-on exercises. All interested participants are welcome. |
Technology Requirements | Laptop or device with a commonly used Web browser |
Apache Airavata’s metadata and workflow scheduling infrastructure builds on Apache Helix and Airavata’s own metadata management system to manage the full lifecycle of job executions and to capture the metadata needed to audit and reproduce execution outcomes.
The Airavata Django Portal provides an out-of-the-box end-user environment for all of the Apache Airavata middleware subsystems. The Airavata Django Portal can be extensively customized to create unique user interfaces that meet the usability requirements of diverse research communities.
Airavata Custos services manage user accounts; provide federated authentication; support role, group, and attribute-based authorization; manage sharing and permissions; and manage resource credentials (secrets). Custos services can be used independently of other Airavata services and can be integrated into other science gateway platforms, such as Galaxy, through the Custos API.
The Airavata Managed File Transfer (MFT) subsystem supports data transfer and storage endpoint management for users’ local storage systems, parallel file and mass storage systems operated by research computing systems, and cloud storage systems such as Amazon S3, Google Drive, and Box. Central MFT services and locally deployed agents can support emerging high-performance transfer protocols and provide optimized transfers that are decoupled from the gateway middleware.
Airavata Data Lake provides secured, controlled access to data from a wide range of sources, including scientific instruments, results of computations, and user- and machine-annotated metadata. The Airavata Data Lake system can orchestrate data movements managed by Airavata MFT and execute data pipelines to extract searchable metadata. Collectively, these components enable the processing of data, the movement of data from the data sources to central storage points, and the distribution of data to the respective authorized users.
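The group-based sharing and permission management attributed to Custos above can be pictured with a small model. The Python sketch below does not use the Custos API; it is a hypothetical, in-memory illustration under stated assumptions: the permission names READ, WRITE, MANAGE_SHARING, and OWNER, their ordering, and the `SharingRegistry` class and its methods are all invented for this example. It shows the general idea that a user's effective access to a digital object is the highest level granted either to the user directly or to any group the user belongs to.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Permission(IntEnum):
    # Hypothetical ordered permission levels: a higher level implies the lower ones.
    READ = 1
    WRITE = 2
    MANAGE_SHARING = 3
    OWNER = 4


@dataclass
class SharingRegistry:
    """Hypothetical in-memory sharing model for illustration only (not the Custos API)."""

    # group name -> set of member user names
    groups: dict[str, set[str]] = field(default_factory=dict)
    # entity id -> {principal (user or group name) -> granted level}
    grants: dict[str, dict[str, Permission]] = field(default_factory=dict)

    def add_member(self, group: str, user: str) -> None:
        self.groups.setdefault(group, set()).add(user)

    def create_entity(self, entity: str, owner: str) -> None:
        # The creator of a digital object (e.g. an experiment) becomes its owner.
        self.grants[entity] = {owner: Permission.OWNER}

    def effective_level(self, user: str, entity: str) -> int:
        # Effective level = highest grant to the user or to any group containing the user.
        # Simplification: users and groups share one name space.
        principals = {user} | {g for g, members in self.groups.items() if user in members}
        entity_grants = self.grants.get(entity, {})
        return max((entity_grants[p] for p in principals if p in entity_grants), default=0)

    def has_permission(self, user: str, entity: str, required: Permission) -> bool:
        return self.effective_level(user, entity) >= required

    def share(self, entity: str, by: str, principal: str, level: Permission) -> None:
        # Only principals holding MANAGE_SHARING (or OWNER) may grant access to others.
        if not self.has_permission(by, entity, Permission.MANAGE_SHARING):
            raise PermissionError(f"{by} cannot share {entity}")
        self.grants[entity][principal] = level


registry = SharingRegistry()
registry.add_member("lab-students", "bob")
registry.create_entity("experiment-42", owner="alice")
registry.share("experiment-42", by="alice", principal="lab-students", level=Permission.READ)

print(registry.has_permission("bob", "experiment-42", Permission.READ))    # True
print(registry.has_permission("bob", "experiment-42", Permission.WRITE))   # False
print(registry.has_permission("carol", "experiment-42", Permission.READ))  # False
```

In this sketch, sharing with a group rather than with individual users means that adding a student to `lab-students` immediately grants read access to every object already shared with that group, which is the kind of restricted content sharing the tutorial exercises explore through the real Custos services.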
Prerequisites: Anyone interested in the topic is welcome to attend. Participants should have a general knowledge of how to execute scientific applications on HPC and cloud environments. A laptop or other device and a web browser will be required to participate in the demo and hands-on sessions, but optional programming exercises will require a Mac OS X laptop or a laptop with a common Linux distribution, such as Ubuntu. Computations on XSEDE will use the tutorial team’s XSEDE community account allocation.
Special Requirements: The full tutorial assumes network access by the instructors and attendees. If network problems occur, the presenters can give the tutorial in demonstration-only mode.
Instructors and Roles: The tutorial will be offered by leaders of the Apache Airavata project. As a top-level Apache Software Foundation project, Airavata follows Apache’s open source governance model, and the instructors are all committers (developers), contributors, and project management committee members of Airavata. Marlon Pierce will lead the overall tutorial and will introduce science gateways and Airavata. Eroma Abeyasinghe and Marcus Christie will present the hands-on exercises. Sudhakar Pamidighantam will guide hands-on examples of Gaussian model execution from a gateway. Suresh Marru will present the Airavata architecture and under-the-hood details. All presenters are actively involved in XSEDE ECSS projects and Science Gateway Community Institute consultations.
Familiarity with Python, Linux, and Web programming will help with the hands-on exercises but is not required. Attendees will be provided with a sample Airavata Django Portal tenant hosted on SciGaP services and will have access to a virtual cluster on XSEDE’s Jetstream2 and Expanse systems with pre-installed applications.
Tutorial Outline and Schedule: The tutorial will be organized following the outline of Table 2, which is based on successful prior tutorials. All tutorial information will be available from Apache Airavata’s online documentation (https://airavata.readthedocs.io/en/latest/, https://apache-airavata-django-portal.readthedocs.io/en/latest/).
Time | Presenter | Agenda Item |
9:00 am - 9:15 am | Suresh Marru | Presentation: Introduction and Overview |
9:15 am - 10:15 am | Marcus Christie | Airavata Django Portal: Run scientific applications on multiple remote supercomputers |
10:15 am - 10:30 am | Suresh Marru | Advanced Examples: A Tour of Airavata-Based Gateways: SEAGrid (Computational Chemistry), SimVascular (Vascular System Modeling), InterACTWEL (Water Management), EMCenter (Scientific Instrument Data Management) |
10:30 am - 11:00 am | | Break |
11:00 am - 11:45 am | Isuru Ranawaka | Airavata Custos: Control permissions on digital objects |
11:45 am - 12:10 pm | Dimuthu Wannipurage | Data Lake and Managed File Transfer (MFT) |
12:10 pm - 12:25 pm | Dimuthu Wannipurage | JupyterLab Kernel Extensions to Apache Airavata |
12:25 pm - 12:30 pm | | Presentation: Tutorial wrap-up with Q & A |
Presenter Information: The presentation will be given by Suresh Marru, Marcus Christie, Isuru Ranawaka, and Dimuthu Wannipurage of the Cyberinfrastructure Integration Research Center (CIRC), part of the Indiana University Pervasive Technology Institute. CIRC members are active participants in the XSEDE and Science Gateways Community Institute projects, and operate SciGaP services.
Examples of Prior Tutorials: The workshop team has presented similar workshops at PEARC19, PEARC20, PEARC21, e-Science 2021, Gateways 2019, and Gateways 2020. Recordings are available for tutorials from 2020; see, for example, https://www.youtube.com/watch?v=CuBvFj194Kg, https://www.youtube.com/watch?v=dKV50B7vPZg, and https://www.youtube.com/watch?v=ES4LNC_j8a0.
Prior tutorial agendas and extensive tutorial material are available from https://cwiki.apache.org/confluence/display/AIRAVATA/PEARC+2019+Tutorial+Agenda, https://cwiki.apache.org/confluence/display/AIRAVATA/Gateways19+Tutorial+Agenda, https://cwiki.apache.org/confluence/display/CUSTOS/Home, and https://airavata.readthedocs.io; see also https://courses.airavata.org. This half-day tutorial builds on earlier tutorials at XSEDE14, XSEDE15, XSEDE16, PEARC17, PEARC19, and Gateways 2019, which had 20-25 attendees each. The tutorial proposed here combines and streamlines this previous content and adds new content on MFT, Data Lake, and Jupyter topics.