The following diagram depicts the high-level components and message flow of the Custos deployment architecture. The Custos services are deployed on a multi-node K8 cluster and exposed to external traffic via a K8 Ingress Controller. The Ingress Controller is fronted by an Nginx reverse proxy, and the whole cluster is protected by a firewall.
The following sections describe a fresh deployment of the Custos services on a K8 Cluster.
Set up K8 Cluster
Prerequisites
- Three Ubuntu VMs
- Set up SSH keys on the local machine and the remote Ubuntu VMs [1]
- Set up Ansible on the local machine [2]
- Basic understanding of Docker and its concepts.
Step 1
We need an Ansible playbook to execute the K8 deployment setup. Create a local working directory and add a hosts file that contains the IPs of the Master and Worker nodes of the K8 Cluster.
...
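For illustration, a minimal hosts file might look like the following (the IP addresses and group names are hypothetical placeholders, not the production values):
[masters]
master ansible_host=192.168.10.11 ansible_user=root
[workers]
worker1 ansible_host=192.168.10.12 ansible_user=root
worker2 ansible_host=192.168.10.13 ansible_user=root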
Next, configure access privileges for non-root users. To this end, create a config.yml file with the following content.
...
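A minimal sketch of such a config.yml, assuming (as in the later steps) that the non-root user is "ubuntu" and that your public key is at ~/.ssh/id_rsa.pub:
- hosts: all
  become: yes
  tasks:
    - name: Create the non-root user "ubuntu" with sudo access
      user:
        name: ubuntu
        groups: sudo
        append: yes
        shell: /bin/bash
        create_home: yes
    - name: Authorize the local SSH public key for the ubuntu user
      authorized_key:
        user: ubuntu
        key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"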
K8 Cluster Deployment Architecture
The Custos K8 cluster is a Rancher-bootstrapped cluster with the following configuration:
- Two master nodes
- Four worker nodes (including the two master nodes, which also act as workers)
Data Replicas
Custos uses three types of databases, plus an external messaging service:
- MySQL database for the Custos microservices
- PostgreSQL database for the Keycloak services
- HashiCorp Consul database for the Vault services
- External Kafka service for event-based message delivery
Each database is mounted on volumes as shown in the above figure and has primary and secondary deployments to make it highly available. In addition, the underlying Docker data directory is moved off the root filesystem and mounted on a separate volume.
External Data Backups (Pending)
Although data volumes are replicated internally, we also back up the K8 cluster and data volumes to an external location so that a new cluster can be bootstrapped from a backup if the underlying infrastructure becomes unavailable. We use Velero, together with the restic plugin, to implement automatic backups.
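As an illustration, the backups could be configured along the following lines (the object-store provider, plugin version, bucket name, and schedule are assumptions, not the production values):
velero install --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.2.0 \
  --bucket custos-cluster-backups \
  --secret-file ./credentials-velero \
  --use-restic
velero schedule create custos-daily --schedule "0 2 * * *" \
  --include-namespaces custos,keycloak,vault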
The following sections describe a fresh, Rancher-based deployment of the Custos services on a K8 Cluster.
Set up K8 Cluster
Prerequisites
- Seven Ubuntu VMs
- Set up SSH keys on the local machine and the remote Ubuntu VMs [1]
- Basic understanding of Docker, Kubernetes, and their concepts.
Step 1 : Installing Rancher
- Install Rancher 2.5.5 or above on one of the VMs
- Spin Up a VM on JS2
- use https://github.com/CloudVE/cloudman-boot to bootstrap a K8 cluster for Rancher
- Login to that node
- helm repo add rancher https://releases.rancher.com/server-charts/stable
- kubectl create namespace cattle-system
- helm repo update
- helm install -n cattle-system rancher rancher/rancher --set hostname=HOSTNAME --set ingress.tls.source=letsEncrypt --set letsEncrypt.email="admin@cloudve.org" --set letsEncrypt.environment="production" --set letsEncrypt.ingress.class=nginx
Step 2 : Bootstrap K8 Cluster on Bare Metal Servers With Rancher
Step 3
- Login to the Custos Master Node
- Create namespaces: custos, keycloak, vault
- Install helm3
- Deploy Cert-Manager (an example install command is sketched after this list)
- https://cert-manager.io/docs/installation/kubernetes/
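For example, Cert-Manager can be installed from its static manifests (the version tag below is an assumption; check the link above for the current release):
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.4/cert-manager.yaml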
Create ClusterIssuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: email@iu.edu
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: acme-prod-private-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          class: nginx
- Login to Keycloak
- Deploy Postgresql
- https://github.com/bitnami/charts/tree/master/bitnami/postgresql
- Create three PVs
PV config
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume1
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 28Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/bitnami/postgresql"
- helm install keycloak-db-postgresql bitnami/postgresql -f values.yaml -n keycloak
- values.yaml
- https://github.com/bitnami/charts/tree/master/bitnami/postgresql
# values.yaml (excerpt) -- keys not listed here keep the defaults of the
# upstream Bitnami PostgreSQL chart (see the link above).
image:
  registry: docker.io
  repository: bitnami/postgresql
  tag: 11.11.0-debian-10-r50
volumePermissions:
  enabled: true
replication:
  enabled: true
  user: CHANGEME
  password: CHANGEME
  readReplicas: 2
  synchronousCommit: 'on'
  numSynchronousReplicas: 2
  applicationName: postgres_application
postgresqlUsername: CHANGEME
postgresqlPassword: CHANGEME
postgresqlDatabase: postgresDB
postgresqlDataDir: /bitnami/postgresql/data
persistence:
  enabled: true
  mountPath: /bitnami/postgresql
  storageClass: manual
  accessModes:
    - ReadWriteOnce
  size: 8Gi
primary:
  # schedule primary pods on nodes labelled for the primary role
  nodeAffinityPreset:
    type: soft
    values:
      - primary
readReplicas:
  # schedule read replicas on nodes labelled for the replica role
  nodeAffinityPreset:
    type: soft
    values:
      - slave
  persistence:
    enabled: true
resources:
  requests:
    memory: 512Mi
    cpu: 250m
- Deploy Keycloak
- Deploy OLM
- https://operatorhub.io/how-to-install-an-operator
- https://github.com/operator-framework/operator-lifecycle-manager/issues/854
- Deploy keycloak operator
- make cluster/prepare
- kubectl apply -f deploy/operator.yaml -n keycloak
- You might need to edit the roles to add the apiGroup networking.k8s.io and grant permissions on ingresses
- kubectl edit roles keycloak-operator -n keycloak
- Deploy keycloak (example manifests are sketched after this list)
- kubectl apply -f keycloak-db-secret.yaml -n keycloak
- kubectl apply -f custos-keycloak.yaml -n keycloak
- Deploy ingress controller in ingress-nginx namespace
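For reference, hedged sketches of the two manifests applied above are shown below; all field values (addresses, credentials, instance counts) are placeholders, and the Keycloak resource follows the schema of the legacy keycloak-operator:
# keycloak-db-secret.yaml (sketch)
apiVersion: v1
kind: Secret
metadata:
  name: keycloak-db-secret
  namespace: keycloak
  labels:
    app: keycloak
type: Opaque
stringData:
  POSTGRES_DATABASE: postgresDB
  POSTGRES_EXTERNAL_ADDRESS: keycloak-db-postgresql.keycloak.svc.cluster.local
  POSTGRES_EXTERNAL_PORT: "5432"
  POSTGRES_USERNAME: CHANGEME
  POSTGRES_PASSWORD: CHANGEME
# custos-keycloak.yaml (sketch)
apiVersion: keycloak.org/v1alpha1
kind: Keycloak
metadata:
  name: custos-keycloak
  namespace: keycloak
  labels:
    app: keycloak
spec:
  instances: 2
  externalDatabase:
    enabled: true
  externalAccess:
    enabled: true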
Deploy HashiCorp Vault in the vault namespace
Deploy Consul
Attach volumes
helm install consul hashicorp/consul --version 0.31.1 -n vault
Deploy Vault (a sample values.yaml is sketched below)
- helm install vault hashicorp/vault --namespace vault -f values.yaml --version 0.10.0
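A hedged sketch of such a values.yaml, assuming an HA Vault backed by the Consul deployment above (the replica count and listener addresses are assumptions):
server:
  ha:
    enabled: true
    replicas: 3
    config: |
      ui = true
      listener "tcp" {
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "consul" {
        path = "vault"
        address = "HOST_IP:8500"
      }
ui:
  enabled: true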
Create custos namespace
Deploy Bitnami MySQL (a sample values_updated.yaml is sketched below)
helm install mysql bitnami/mysql -f values_updated.yaml -n custos --version 8.8.8
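A hedged sketch of a values_updated.yaml for a replicated MySQL deployment (credentials, database name, replica count, and storage settings are placeholders, not the production values):
architecture: replication
auth:
  rootPassword: CHANGEME
  database: custos
  username: CHANGEME
  password: CHANGEME
  replicationUser: replicator
  replicationPassword: CHANGEME
primary:
  persistence:
    enabled: true
    storageClass: manual
    size: 8Gi
secondary:
  replicaCount: 1
  persistence:
    enabled: true
    storageClass: manual
    size: 8Gi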
Deploy Custos services
------- OLD SETUP -------
This will create the non-root user "ubuntu" on the remote servers and configure access to them using SSH public keys. Next, execute the above script using
ansible-playbook -i hosts config.yml
Step 3
Next, install the K8 dependencies using the following Ansible scripts. This installs Docker, apt-transport-https, kubelet, kubeadm, and kubectl on the Ubuntu VMs.
...
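A minimal sketch of what such a playbook typically does (the package names and apt repository are assumptions based on the standard kubeadm install guide):
- hosts: all
  become: yes
  tasks:
    - name: Install Docker and transport packages
      apt:
        name: [docker.io, apt-transport-https, curl]
        state: present
        update_cache: yes
    - name: Add the Kubernetes apt signing key
      apt_key:
        url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
        state: present
    - name: Add the Kubernetes apt repository
      apt_repository:
        repo: deb https://apt.kubernetes.io/ kubernetes-xenial main
        state: present
    - name: Install kubelet, kubeadm and kubectl
      apt:
        name: [kubelet, kubeadm, kubectl]
        state: present
        update_cache: yes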
Step 4
Next, we can set up the master node using the following script.
...
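In essence, the master-node script performs the standard kubeadm bootstrap; a hedged sketch (the pod network CIDR and the choice of Flannel as the CNI are assumptions) looks like this:
# initialize the control plane
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
# make kubectl usable for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# install a pod network add-on (Flannel used here as an example)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml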
The Cert-Manager component is used for automatic renewal of Let's Encrypt certificates [6]. This installs Cert-Manager on the K8 Cluster. Next, a Certificate Issuer needs to be configured. For that, we have used the following configuration.
...
According to the above configuration, Let's Encrypt will use the HTTP-01 challenge, so port 80 must be open and traffic must be directed to the K8 Cluster. If a NodePort service is used, ensure that external traffic reaches the cluster through the standard ports by configuring a reverse proxy (a sketch follows).
...
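A hedged sketch of such an Nginx reverse-proxy block, assuming the ingress controller is exposed on a NodePort (the hostname, node IP, and port are placeholders):
server {
    listen 80;
    server_name custos.example.org;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # forward to the NodePort of the ingress controller on any cluster node
        proxy_pass http://NODE_IP:30080;
    }
}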
We have used Fluentd, Elasticsearch, and Kibana to collect logs from all nodes. Fluentd pulls the logs from the nodes and stores them in Elasticsearch, and Kibana is pointed at Elasticsearch as the dashboard for the collected logs. This logging stack is installed in the kube-logging namespace.
Install Linkerd Service Mesh
We are using the Linkerd Service Mesh [8] to enable TLS for internal inter-service communication and to provide a dashboard for services. The following configuration exposes the Linkerd dashboard to the outside environment.
apiVersion: v1
...
Keycloak [9] is used as our main IDP implementation. The following diagram depicts the Keycloak architecture; the Keycloak pods are connected to a PostgreSQL database for persistence.
Install Vault
HashiCorp Vault is used as our secret storage and ensures the security of secrets. The following diagram depicts the architecture of the Vault deployment.
...
We have used [11,12,13,14] for deployment instructions.
Install Custos Services
The Custos source code is openly hosted on GitHub [16]. Active development happens on the development branch, while the master branch contains the latest released source code.
Prerequisites
- Install Java 11 and Maven 3.6.x
Checkout source code
git clone https://github.com/apache/airavata-custos.git
Build Source code
mvn clean install
This builds the source code and creates Docker images, Helm charts to be deployed on the K8 cluster, and Java artifacts. The Helm charts are created under the "target/helm" path. To publish the Docker images to the Docker repository, use the following command.
mvn dockerfile:push
Use the commands below to install a Custos service using its Helm chart.
helm install service_name chart_name --namespace custos
To upgrade an existing service
helm upgrade service_name chart_name --namespace custos
The following diagram depicts the current snapshot of Custos services on the K8 Cluster.
Troubleshooting
The main areas that you might need to troubleshoot are the Custos services and the databases. Troubleshooting the Custos services is straightforward: check the logs from the Kibana server for the custos namespace, or log directly into the particular Pod and check its console logs.
Troubleshooting Services
- Check that all services are up and running by logging in to the Linkerd service mesh.
- Log in to the Custos master VM.
- Check that all nodes are in the Ready state.
- kubectl get nodes
- Check that all services are up and running.
- kubectl get all -n custos
- If not, try to redeploy the services.
- helm uninstall <<service_name>> -n custos
- helm install <<service_name>> artifacts/<<artifacts_name>> -n custos
Troubleshooting Databases
First, check the logs from the Kibana dashboard, or log into the Master and Slave pods and check their console logs; the errors will most likely appear there.
Steps to replace a database or migrate a database
- First, log in to the Slave node and stop the Slave replication thread using
...