Apache Airavata

Background :-

There is a need for a UI application, integrated with Apache Custos as the authentication service, that hosts a Jupyter notebook platform on Jetstream. The platform should handle multiple concurrent users with proper resource isolation, persist user state, allow users to upload files from their local machines, and add to the existing services of Apache Airavata as a scientific gateway.

Design/Description of work :-

Platform architecture - Airavata-based infrastructure with functionality similar to Colab.

  1. UI :- Develop a UI and integrate the Jupyter notebook with the cloud service (Jetstream), backed by a network file system (a Dynamic Partitioning file system was explored) and with authentication through Apache Custos. The UI will present a login form and authenticate users through the Custos services. On successful authentication, a new user session will be created and the user will be redirected to the Jupyter notebook hosted on Kubernetes on Jetstream.

  2. Jupyter Hub :- JupyterHub is designed to serve Jupyter notebooks to multiple users. It is a multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. JupyterHub allows users to interact with a computing environment through a webpage. As most devices have access to a web browser, JupyterHub makes it easy to provide and standardize the computing environment for a group of people.

  3. Kubernetes Cluster :- For now, we will use a simple master-worker node setup for the Kubernetes cluster, which will host the JupyterHub containers/pods. Each user will be signed in to a container that hosts a Jupyter notebook on one of its ports, and traffic will be balanced by the Kubernetes load balancer service. The UI will reach the K8s cluster through this load balancer, behind which a number of pods will run. Each pod will have a container, and namespaces will be used to scale across users.

  4. Dynamic Volume Provisioning :- To persist user data, the user work directories will be hosted on the network file system. On a successful connection, the user's session pod will be attached to an appropriate volume through Dynamic Volume Provisioning.
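
As a rough illustration of the Dynamic Volume Provisioning step, the sketch below builds a per-user PersistentVolumeClaim manifest. In Kubernetes, a claim that names a StorageClass is what triggers dynamic provisioning of a matching volume. The function names, the "nfs-dynamic" StorageClass, the "jupyter" namespace, the "claim-" prefix, and the 5Gi size are all assumptions made for this example and are not part of the proposal; the actual setup may look different.

```python
import json
import re


def safe_name(username: str) -> str:
    """Lowercase the username and replace characters that are not
    lowercase letters, digits, or '-' so it can be used in an object name."""
    cleaned = re.sub(r"[^a-z0-9-]+", "-", username.lower()).strip("-")
    if not cleaned:
        raise ValueError("username has no characters usable in a Kubernetes name")
    return cleaned


def build_user_pvc(username: str,
                   storage_class: str = "nfs-dynamic",  # hypothetical StorageClass
                   size: str = "5Gi",                   # hypothetical quota
                   namespace: str = "jupyter") -> dict:  # hypothetical namespace
    """Return a PersistentVolumeClaim manifest for one user's work directory."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"claim-{safe_name(username)}",
            "namespace": namespace,
        },
        "spec": {
            # One notebook pod mounts the volume read-write.
            "accessModes": ["ReadWriteOnce"],
            # Naming a StorageClass asks the cluster to provision the volume on demand.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl also accepts.
    print(json.dumps(build_user_pvc("Alice@example.org"), indent=2))
```

Because the claim name is derived from the username, the same user would be attached to the same claim on every login, which is one way the work directory could persist across sessions.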

Results for the Apache community :-

A scalable, Colab-like application that provides Jupyter notebook services on the Jetstream cloud, with per-user isolation based on Custos authentication.

Deliverables and Timelines :-

Task: Airavata Architecture (Custos, Integration with Jetstream)
Timeline: May 27, 2022
Deliverables:
  • Understand the Apache Airavata end-to-end architecture.
  • Set up Apache Airavata locally.
  • Understand the requirements for integration with authentication using Custos on the Jetstream cloud.

Task: Creation of UI and Integration with Custos
Timeline: June 10, 2022
Deliverables:
  • Develop the user login portal.
  • Integrate with Apache Custos authentication.

Task: Launch and configure K8s cluster
Timeline: June 24, 2022
Deliverables:
  • Launch and configure multiple nodes in Jetstream.
  • Configure the K8s cluster across the multiple nodes in a master-worker setup.

Task: Jupyter Hub and Notebook Pod setup
Timeline: July 1, 2022
Deliverables:
  • Install JupyterHub and Jupyter notebook pods in the K8s cluster.

Task: Dynamic Volume Provisioning
Timeline: July 18, 2022
Deliverables:
  • User-specific volume provisioning for the Jupyter pods.

Task: End to End Testing
Timeline: August 8, 2022
Deliverables:
  • Develop a test plan.
  • Complete unit and integration testing.

Task: Prepare project report and documentation
Timeline: August 30, 2022
Deliverables:
  • Consolidated test results and user guideline documentation.
  • Consolidated project report.
  • Consolidated GSoC journey.