Apache Airavata

GSoC Proposal

  • Title

    Disaster management for Custos Kubernetes Resources and Persistent Volumes

  • Background

    Although Custos replicates its data volumes internally, the k8s cluster and its data volumes also need to be backed up to an external location. A good backup and restore strategy will make Custos resilient against outages. It will also help migrate k8s cluster resources and persistent volumes when required.

  • Design/description of work

    • 1. Study Custos deployment architecture

      The following diagram shows the Custos cluster deployment architecture. Our focus is on the databases shown in its bottom half.

      [Figure: Custos cluster deployment architecture]

      Custos uses three types of databases: MySQL (for the microservices), PostgreSQL (for Keycloak services), and a HashiCorp DB (for Consul services). Each database is mounted to volumes and runs with one master and two secondary worker nodes. A thorough study of the DB deployments is needed to plan how to copy data at the filesystem level and store it externally.

      There are two main ways to store backups in the k8s environment:

      1. Use an object store. In this case, the backup tool prepares a backup and pushes it to an object store (such as AWS S3 or Google Cloud Storage).
      2. Use cloud-level block storage volume snapshots. Snapshots of volumes such as AWS EBS or DigitalOcean Volumes Block Storage only work within the AWS or DigitalOcean environment, so they have to be made specifically for each cloud provider.

      For our use case, we need Object Storage.

    • 2. Study existing k8s backup solutions (Velero seems to be the best)

      Several k8s backup solutions exist. I will study all of them, but from an initial analysis Velero seems to be the most promising, and I’m tentatively planning the project around it.

    • 3. Build solution to back up databases in the k8s cluster

      The block diagram below shows a possible solution for backing up databases in Custos using Velero. Velero consists of the following components:

      1. A Velero server in the cluster
      2. A CLI client
      3. A Restic daemonset in the cluster


      MinIO can be used as the storage provider in the k8s cluster. It is an object store that exposes an S3-compatible API and can hold our backed-up resources.
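
      As a rough sketch of how Velero could be pointed at MinIO, the Python helper below assembles a `velero install` command line without running it. It is not part of Custos. The bucket name, credentials path, MinIO URL, and plugin image are placeholder assumptions, and the `--use-restic` flag assumes a Velero release earlier than 1.10 (later releases renamed it `--use-node-agent`). Velero reaches S3-compatible stores such as MinIO through its AWS object store plugin, which is why the provider is `aws`.

```python
from typing import List


def velero_install_cmd(
    bucket: str,
    secret_file: str,
    s3_url: str,
    plugin_image: str,
) -> List[str]:
    """Build (but do not run) a `velero install` command for an S3-compatible store."""
    # region=minio and path-style addressing follow Velero's MinIO example setup.
    location_config = f"region=minio,s3ForcePathStyle=true,s3Url={s3_url}"
    return [
        "velero", "install",
        "--provider", "aws",
        "--plugins", plugin_image,
        "--bucket", bucket,
        "--secret-file", secret_file,
        # Volume data goes through Restic, not cloud volume snapshots.
        "--use-volume-snapshots=false",
        "--use-restic",
        "--backup-location-config", location_config,
    ]


if __name__ == "__main__":
    # All values below are hypothetical and must be adapted to the real cluster.
    cmd = velero_install_cmd(
        bucket="custos-backups",
        secret_file="./credentials-velero",
        s3_url="http://minio.velero.svc:9000",
        plugin_image="velero/velero-plugin-for-aws:v1.2.0",
    )
    print(" ".join(cmd))
```

      The plugin image tag has to match the installed Velero version, so it is passed in as a parameter instead of being fixed in the helper.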

      Velero uses Restic to back up file-system-based Persistent Volumes. Restic scans the volume directory for its files and then splits those files into blobs that can be sent to MinIO.
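
      To make the scan-and-split idea concrete, here is a minimal Python sketch of a file-level backup that cuts files into blobs and stores each blob under its SHA-256 hash. It is a simplification and not Restic's actual implementation: it uses fixed-size chunks and a plain dictionary standing in for the object store, whereas Restic uses content-defined chunking and encrypts its data. The function names and the chunk size are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

CHUNK_SIZE = 1024 * 1024  # assumption: fixed 1 MiB chunks, chosen only for illustration


def backup_volume(
    volume_dir: Path,
    store: Dict[str, bytes],
    chunk_size: int = CHUNK_SIZE,
) -> Dict[str, List[str]]:
    """Scan volume_dir, store every blob by hash, and return a file -> blob-id manifest."""
    manifest: Dict[str, List[str]] = {}
    for path in sorted(volume_dir.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        blob_ids: List[str] = []
        for start in range(0, len(data), chunk_size):
            blob = data[start:start + chunk_size]
            blob_id = hashlib.sha256(blob).hexdigest()
            store.setdefault(blob_id, blob)  # identical blobs are stored only once
            blob_ids.append(blob_id)
        manifest[path.relative_to(volume_dir).as_posix()] = blob_ids
    return manifest


def restore_volume(
    manifest: Dict[str, List[str]],
    store: Dict[str, bytes],
    target_dir: Path,
) -> None:
    """Rebuild every file in the manifest under target_dir from the stored blobs."""
    for rel_path, blob_ids in manifest.items():
        out = target_dir / rel_path
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(b"".join(store[blob_id] for blob_id in blob_ids))
```

      Because blobs are keyed by their hash, data that repeats across files or across successive backups is uploaded once, which keeps repeated backups of a mostly unchanged database volume small.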

    • 4. Integrate solution with existing databases in Custos deployment

      This requires setting up a Velero server in a k8s cluster and then testing backup and recovery for the three types of databases (MySQL, PostgreSQL, and HashiCorp). Once the tests pass, i.e., once we can back up all three database types to external storage, we can work on integrating the Velero components into the Custos deployment architecture.
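
      A test run for one database could look roughly like the Python sketch below, which lists the commands for opting a pod's volume in to Restic backup, creating a backup, and restoring from it. The namespace, pod, volume, and backup names are hypothetical placeholders, not the real Custos resource names, and the script only prints the commands unless `dry_run` is turned off.

```python
import subprocess
from typing import List


def annotate_cmd(namespace: str, pod: str, volumes: List[str]) -> List[str]:
    """Mark the pod's volumes for Restic backup via Velero's pod annotation."""
    annotation = "backup.velero.io/backup-volumes=" + ",".join(volumes)
    return ["kubectl", "-n", namespace, "annotate", f"pod/{pod}", annotation]


def backup_cmd(name: str, namespace: str) -> List[str]:
    """Back up all resources in one namespace and wait for completion."""
    return ["velero", "backup", "create", name,
            "--include-namespaces", namespace, "--wait"]


def restore_cmd(backup_name: str) -> List[str]:
    """Restore everything contained in an existing backup."""
    return ["velero", "restore", "create", "--from-backup", backup_name, "--wait"]


def drill(namespace: str, pod: str, volumes: List[str],
          backup_name: str, dry_run: bool = True) -> List[List[str]]:
    """Print (or run, if dry_run is False) the annotate, backup, and restore steps."""
    steps = [
        annotate_cmd(namespace, pod, volumes),
        backup_cmd(backup_name, namespace),
        restore_cmd(backup_name),
    ]
    for step in steps:
        if dry_run:
            print(" ".join(step))
        else:
            subprocess.run(step, check=True)
    return steps


if __name__ == "__main__":
    # Hypothetical names; replace with the actual Custos namespace, pod, and volume.
    drill("custos", "mysql-0", ["data"], "custos-mysql-test")
```

      In a real test, the outage would be simulated on a throwaway cluster between the backup and restore steps, and the restored database contents would then be compared against the original.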

  • Results for the Apache Community

    Although the project aims to develop a database backup solution for Custos, the analysis of existing k8s backup solutions will provide insight into how to back up and restore k8s resources and persistent volumes across Apache Airavata.

    The intent is to formulate a robust backup and restore process that could be used as a reference across the Apache Community.

  • Project timelines


  • Other commitments

    No other commitments

  • Community engagement

    https://issues.apache.org/jira/browse/AIRAVATA-3608
