Introduction

Services such as Kubernetes, Cloud Foundry, and DBaaS require integration support from the underlying CloudStack deployment. This support includes grouping VMs, scaling, and monitoring. Rather than changing ACS each time a new service has to be supported, a generic framework should be developed.

Container technologies are gaining momentum and changing the way applications are traditionally deployed in public and private clouds. Growing interest in microservices-based architecture is also fostering adoption of container technologies. Much like how cloud orchestration platforms enable the provisioning of VMs and adjacent services, container orchestration platforms like Kubernetes [3], Docker Swarm [1], and Mesos [2] are emerging to enable orchestration of containers. Container orchestration platforms can typically run anywhere and be used to provision containers. A popular choice has been to run containers on IaaS-provisioned VMs. AWS and GCE provide native functionality to launch containers, abstracting away the underlying consumption of VMs. A container orchestration platform can be provisioned on top of CloudStack using development tools (see [6]), but this is not an out-of-the-box solution. Given the momentum of container technologies, microservices, etc., it makes sense to provide native functionality in CloudStack that is available out of the box for users.

Purpose

The purpose of this document is to present the functional requirements for supporting a native, generic VM cluster service in CloudStack to provision containers, and to detail the design aspects of how the functionality will be achieved.

Scope

Scope of this proposal is limited to using Kubernetes as a container cluster manager.

Glossary

Node - VM in CloudStack

Machine cluster - a managed group of VMs in CloudStack

DBaaS - Database as a Service

IaaS - Infrastructure as a Service

PaaS - Platform as a Service

Functional specification

Container Cluster

The CloudStack VM cluster service shall introduce the notion of a machine cluster. A 'machine cluster' shall be a first-class CloudStack entity that is a composite of existing CloudStack entities like virtual machines, networks, network rules, etc.

The machine cluster service shall stitch together cluster resources and deploy the chosen cluster manager (Kubernetes, Mesos, Docker Swarm, etc.) to provide the manager's service type to CloudStack users, similar to AWS ECS or Google Container Service.

Cluster life-cycle management

The machine cluster service shall provide the following cluster life-cycle operations.

  • create machine cluster: provisions the cluster resources and brings the cluster to an operationally ready state for launching containers. The resources provisioned shall depend on the cluster manager used. All the cluster VMs shall be launched into a dedicated network for the cluster. The API endpoint of the cluster manager shall be exposed by creating a port forwarding rule on the source NAT IP of the network dedicated to the cluster.
  • delete machine cluster: destroys all the resources provisioned for the machine cluster. After deletion, no further operations can be performed on the machine cluster.
  • start machine cluster: starts the cluster VMs and, if needed, the network.
  • stop machine cluster: shuts down all the resources consumed by the machine cluster. The user can start the cluster at a later point with the start operation.
  • recover machine cluster: due to faults (such as VMs stopped by failures or a malfunctioning cluster manager), a machine cluster can end up in the Alert state. Recover is used to bring the machine cluster back to a sane running state.
  • resize machine cluster (scale in/out): increases or decreases the size of the cluster.
  • list machine clusters: lists all the machine clusters.

provisioning

...

service orchestrator

As part of cluster creation, the machine cluster service shall be responsible for setting up the control plane of the service type that was chosen. How a service is set up depends on the chosen service type.

Design

API changes

The following APIs shall be introduced with the machine cluster service:

  • createMachineCluster
    • name: name of the machine cluster
    • description: description of the machine cluster
    • type: service type - Kubernetes, CloudFoundry, Mesos etc
    • zoneid: uuid of the zone in which the machine cluster will be provisioned
    • serviceofferingid: service offering with which the cluster VMs shall be provisioned
    • cluster: size of the cluster, i.e. the number of VMs to be provisioned
    • accountname: account for which the machine cluster shall be created
    • domainid: domain of the account for which the machine cluster shall be created
    • networkid: uuid of the network into which the machine cluster VMs will be provisioned. If not specified, the cluster service shall provision a new isolated network using the default isolated network offering with source NAT service.
  • deleteMachineCluster
    • id: uuid of machine cluster
  • startMachineCluster
    • id: uuid of machine cluster
  • stopMachineCluster
    • id: uuid of machine cluster
  • addNodeToCluster (not planned yet)
    • id: uuid of machine cluster
  • removeNodeFromCluster (not planned yet)
    • id: uuid of the node
    • clusterid: uuid of machine cluster
  • listMachineClusters
    • id: uuid of machine cluster
  • listClusterNodes
    • id: uuid of machine cluster
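
A minimal sketch of how a client might call the proposed createMachineCluster command. It assumes the new command is signed like existing CloudStack API requests (parameters sorted by name, values URL-encoded, the string lowercased, then HMAC-SHA1 and Base64). The server URL, keys, and UUIDs are placeholders, and the helper names are hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign(params, secret_key):
    """Compute a request signature (assumed to match existing CloudStack commands)."""
    # Sort by parameter name, URL-encode each value, then lowercase the whole string.
    query = "&".join(
        f"{key}={quote(str(value), safe='')}"
        for key, value in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def create_machine_cluster_url(base_url, api_key, secret_key, **args):
    """Build a signed URL for the proposed createMachineCluster command."""
    params = {"command": "createMachineCluster", "response": "json", "apikey": api_key}
    params.update(args)
    params["signature"] = sign(params, secret_key)
    return base_url + "?" + "&".join(
        f"{key}={quote(str(value), safe='')}" for key, value in params.items()
    )


url = create_machine_cluster_url(
    "http://localhost:8080/client/api",  # hypothetical management server
    "API_KEY",                           # placeholder credentials
    "SECRET_KEY",
    name="demo",
    type="Kubernetes",
    zoneid="ZONE-UUID",                  # placeholder identifiers
    serviceofferingid="OFFERING-UUID",
    cluster=3,
)
```

networkid is left out here, so under this proposal the service would provision a new isolated network for the cluster.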

 

A new response 'machineclusterresponse' shall be added with the details below:

  • name
  • description
  • zoneid
  • serviceofferingid
  • networkid
  • clustersize
  • endpoint: URL of the machine cluster manager API server endpoint

Life cycle operations

Each life cycle operation is a workflow that provisions or deletes multiple CloudStack resources. There is no guarantee that such a workflow will succeed, because CloudStack lacks a 2PC-like model in which resources are reserved first and then provisioned. There is also no guarantee that a rollback will succeed. For example, while provisioning a cluster of 10 VMs, the deployment may run out of capacity after provisioning 5 VMs. In that case the provisioned VMs can be destroyed as a rollback. But there can be cases where deleting a provisioned VM is temporarily not possible, for example when its host is disconnected. So it is not possible to achieve strong consistency.

...

  • Do a best effort rollback for a life cycle operation in case of failure
  • In case rollback fails, have reconciliation mechanisms that will ensure eventual consistency
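
The best-effort rollback described above could look roughly like the following sketch. The function name and the deploy_vm/destroy_vm callables are hypothetical stand-ins for CloudStack's VM deployment and destroy operations, not existing APIs.

```python
def provision_cluster(size, deploy_vm, destroy_vm):
    """Deploy `size` VMs; on failure, roll back on a best-effort basis.

    Returns (vms, leftover): the VMs of a successfully created cluster, and
    the VMs that a failed rollback could not destroy (to be reconciled later).
    """
    provisioned = []
    try:
        for index in range(size):
            provisioned.append(deploy_vm(index))
    except Exception:
        leftover = []
        for vm in provisioned:
            try:
                destroy_vm(vm)
            except Exception:
                leftover.append(vm)  # e.g. host disconnected; retry later
        return [], leftover
    return provisioned, []


# Usage with stand-in callables: capacity runs out after 5 VMs, and one VM
# sits on a disconnected host, so it cannot be destroyed right now.
def deploy_vm(index):
    if index >= 5:
        raise RuntimeError("insufficient capacity")
    return f"vm-{index}"


def destroy_vm(vm):
    if vm == "vm-2":
        raise RuntimeError("host disconnected")


vms, leftover = provision_cluster(10, deploy_vm, destroy_vm)
```

Here the rollback destroys four of the five provisioned VMs and reports the fifth as leftover, which is the situation the reconciliation mechanisms are meant to handle.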

The state machine below reflects how a machine cluster's state transitions for each of the life cycle operations.

[Image: machine cluster state machine diagram]
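
Since the diagram is not reproduced here, the following is a partial sketch of the state machine as a transition table, using only state names that appear in this document. The event names are hypothetical, and the transitions marked "assumed" are not stated in the text; the original diagram remains the authority.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("Starting", "operation_succeeded"): "Running",   # assumed
    ("Starting", "operation_failed"): "Expunging",
    ("Running", "stop_succeeded"): "Stopped",         # assumed
    ("Running", "stop_failed"): "Alert",
    ("Running", "scale_failed"): "Alert",
    ("Running", "fault_detected"): "Alert",
    ("Stopped", "start_requested"): "Starting",       # assumed
    ("Stopped", "delete_requested"): "Expunging",
    ("Alert", "recover_succeeded"): "Running",
    ("Alert", "delete_requested"): "Expunging",
    ("Expunging", "cleanup_succeeded"): "Destroyed",
    ("Expunging", "cleanup_failed"): "Expunge",
    ("Expunge", "cleanup_succeeded"): "Destroyed",    # assumed
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None
```

Keeping the transitions in a single table makes illegal operations, such as starting a destroyed cluster, fail explicitly instead of silently changing state.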

 

Garbage collection

Garbage collection shall be implemented as a background task that cleans up the resources of a machine cluster. The following are the cases where cluster resources are freed up.

  • Starting a machine cluster fails, resulting in cleanup of the provisioned resources (Starting → Expunging → Destroyed)
  • Deleting a machine cluster (Stopped → Expunging → Destroyed and Alert → Expunging → Destroyed)

If there are failures in cleaning up resources and cleanup cannot proceed, the machine cluster is marked as being in the 'Expunge' state instead of the 'Expunging' state. The garbage collector will periodically loop through the list of machine clusters in the 'Expunge' state and try to free the resources they hold.
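
One pass of such a garbage collector might be sketched as follows. Clusters are modelled as plain dictionaries, release_resources is a hypothetical callable that frees a cluster's VMs and network, and moving a cleaned-up cluster to 'Destroyed' is an assumption based on the transitions listed above. Periodic scheduling is left out.

```python
def gc_pass(clusters, release_resources):
    """Try to free every cluster in 'Expunge'; return the names of those freed."""
    freed = []
    for cluster in clusters:
        if cluster["state"] != "Expunge":
            continue
        try:
            release_resources(cluster)
        except Exception:
            continue  # stays in 'Expunge' and is retried on the next pass
        cluster["state"] = "Destroyed"  # assumed end state after cleanup
        freed.append(cluster["name"])
    return freed


# Usage with a stand-in cleanup that fails for one cluster.
def release_resources(cluster):
    if cluster["name"] == "stuck":
        raise RuntimeError("host disconnected")


clusters = [
    {"name": "old", "state": "Expunge"},
    {"name": "stuck", "state": "Expunge"},
    {"name": "live", "state": "Running"},
]
freed = gc_pass(clusters, release_resources)
```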

Cluster state synchronization

The state of a machine cluster is the 'desired state' of the cluster as intended by the user, that is, the system's logical view of the machine cluster. However, there are various scenarios where the desired state is out of sync with the state that can be inferred from the actual physical infrastructure. For example, consider a machine cluster in the 'Running' state with a cluster size of 10 VMs, all running. Due to host failures, some of the VMs may get stopped at a later point. The desired state is still a cluster with 10 VMs running and operationally ready (with respect to container provisioning), but the state at the resource layer is different. So we need a mechanism to ensure that:

  • The cluster is in the desired state at the resource/infrastructure layer, which could mean provisioning new VMs or deleting VMs in the cluster.
  • Conversely, when reconciliation cannot happen, the state of the cluster reflects this so that it can be recovered at a later point.

The following mechanisms will be implemented.

  • A state 'Alert' will be maintained to indicate that a machine cluster is not in its desired state.
  • A state synchronization background task will run periodically to infer whether the cluster is in its desired state. If not, the cluster will be marked as being in the 'Alert' state.
  • A recovery action will try to recover the cluster.

State transitions in the FSM where a machine cluster ends up in the 'Alert' state:

  • Failure in the middle of a scale in/out, resulting in a cluster size (number of VMs) not equal to the expected size.
  • Failure in stopping a cluster, leaving some VMs in the running state.
  • Difference of states as detected by the state synchronization thread.
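
The check performed by the state synchronization task can be sketched as below. This is a simplified illustration, not the planned implementation: it compares only the number of running VMs, the function names are hypothetical, and node_count mirrors the column of the same name in the schema.

```python
def expected_running_vms(cluster):
    """Number of VMs that should be running for the cluster's desired state."""
    if cluster["state"] == "Running":
        return cluster["node_count"]
    if cluster["state"] == "Stopped":
        return 0
    return None  # other states are transitional or already flagged


def sync_cluster_state(cluster, running_vm_count):
    """Mark the cluster as 'Alert' if the infrastructure disagrees with its state."""
    expected = expected_running_vms(cluster)
    if expected is not None and running_vm_count != expected:
        cluster["state"] = "Alert"
    return cluster["state"]


# Usage: a 10-node cluster where host failures stopped two VMs.
cluster = {"state": "Running", "node_count": 10}
state = sync_cluster_state(cluster, running_vm_count=8)
```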

example provisioning kubernetes container cluster manager

A CoreOS template shall be used to provision the cluster VMs. Setting up a cluster VM as a Kubernetes master or node is done through a cloud-config script [7] in CoreOS. CloudStack shall pass the necessary cloud-config script as base64-encoded user data. Once the CoreOS instances are launched by CloudStack, they configure themselves as Kubernetes master and node VMs by virtue of the cloud-config data passed as user data.
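
As an illustration of the encoding step only, the sketch below base64-encodes a cloud-config document for use as user data. The cloud-config content is a placeholder, not the script that would actually configure a Kubernetes master or node, and the function name is hypothetical.

```python
import base64


def encode_user_data(cloud_config):
    """Base64-encode a cloud-config document for use as VM user data."""
    return base64.b64encode(cloud_config.encode("utf-8")).decode("ascii")


# Placeholder cloud-config; the real script depends on the node's role.
CLOUD_CONFIG = "#cloud-config\nhostname: k8s-master\n"

user_data = encode_user_data(CLOUD_CONFIG)
```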

schema changes

 

CREATE TABLE IF NOT EXISTS `cloud`.`machine_cluster` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `uuid` varchar(40),
    `name` varchar(255) NOT NULL,
    `description` varchar(4096) COMMENT 'display text for this machine cluster',
    `zone_id` bigint unsigned NOT NULL COMMENT 'zone id',
    `service_offering_id` bigint unsigned COMMENT 'service offering id for the cluster VM',
    `template_id` bigint unsigned COMMENT 'vm_template.id',
    `network_id` bigint unsigned COMMENT 'network this machine cluster uses',
    `node_count` bigint NOT NULL default '0',
    `account_id` bigint unsigned NOT NULL COMMENT 'owner of this cluster',
    `domain_id` bigint unsigned NOT NULL COMMENT 'owner of this cluster',
    `state` char(32) NOT NULL COMMENT 'current state of this cluster',
    `key_pair` varchar(40),
    `cores` bigint unsigned NOT NULL COMMENT 'number of cores',
    `memory` bigint unsigned NOT NULL COMMENT 'total memory',
    `endpoint` varchar(255) COMMENT 'url endpoint of the machine cluster manager api access',
    `console_endpoint` varchar(255) COMMENT 'url for the machine cluster manager dashboard',
    `created` datetime NOT NULL COMMENT 'date created',
    `removed` datetime COMMENT 'date removed if not null',
    `gc` tinyint unsigned NOT NULL DEFAULT 1 COMMENT 'gc this machine cluster or not',
    CONSTRAINT `fk_cluster__zone_id` FOREIGN KEY `fk_cluster__zone_id` (`zone_id`) REFERENCES `data_center` (`id`) ON DELETE CASCADE,
    CONSTRAINT `fk_cluster__service_offering_id` FOREIGN KEY `fk_cluster__service_offering_id` (`service_offering_id`) REFERENCES `service_offering`(`id`) ON DELETE CASCADE,
    CONSTRAINT `fk_cluster__template_id` FOREIGN KEY `fk_cluster__template_id`(`template_id`) REFERENCES `vm_template`(`id`) ON DELETE CASCADE,
    CONSTRAINT `fk_cluster__network_id` FOREIGN KEY `fk_cluster__network_id`(`network_id`) REFERENCES `networks`(`id`) ON DELETE CASCADE,
    PRIMARY KEY(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE IF NOT EXISTS `cloud`.`machine_cluster_vm_map` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `cluster_id` bigint unsigned NOT NULL COMMENT 'cluster id',
    `vm_id` bigint unsigned NOT NULL COMMENT 'vm id',
    PRIMARY KEY(`id`),
    CONSTRAINT `machine_cluster_vm_map_cluster__id` FOREIGN KEY `machine_cluster_vm_map_cluster__id`(`cluster_id`) REFERENCES `machine_cluster`(`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE IF NOT EXISTS `cloud`.`machine_cluster_details` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `cluster_id` bigint unsigned NOT NULL COMMENT 'cluster id',
    `username` varchar(255) NOT NULL,
    `password` varchar(255) NOT NULL,
    `registry_username` varchar(255),
    `registry_password` varchar(255),
    `registry_url` varchar(255),
    `registry_email` varchar(255),
    `network_cleanup` tinyint unsigned NOT NULL DEFAULT 1 COMMENT 'true if network needs to be cleaned up on deletion of machine cluster. Should be false if user specified network for the cluster',
    PRIMARY KEY(`id`),
    CONSTRAINT `machine_cluster_details_cluster__id` FOREIGN KEY `machine_cluster_details_cluster__id`(`cluster_id`) REFERENCES `machine_cluster`(`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 

References 

[1] https://www.docker.com/products/docker-swarm

...