...
The following API shall be introduced with the container service:
A new response, 'containerclusterreponse', shall be added with the details below.
Each life-cycle operation is a workflow that provisions or deletes multiple CloudStack resources, so atomicity cannot be achieved. There is no guarantee that the workflow of a life-cycle operation will succeed, because there is no 2PC-like model in which resources are reserved first and provisioned afterwards. There is also no guarantee that a rollback will succeed. For example, while provisioning a cluster of 10 VMs, the deployment may run out of capacity after provisioning 5 VMs. In that case the VMs already provisioned can be destroyed as a rollback. However, there can be cases where deleting a provisioned VM is temporarily not possible, for example when its host is disconnected.
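The partial-failure scenario above can be sketched as follows. This is only an illustration under stated assumptions: `FakeCloud`, `CapacityError`, and `create_cluster` are hypothetical names invented for this sketch and are not CloudStack or CCS APIs. It shows a best-effort rollback that reports the VMs it could not destroy instead of assuming the rollback succeeds.

```python
from dataclasses import dataclass, field


class CapacityError(Exception):
    """Raised by the hypothetical resource layer when no more VMs fit."""


@dataclass
class FakeCloud:
    """Stand-in for the resource layer (an assumption, not a CloudStack API)."""
    capacity: int
    undeletable: set = field(default_factory=set)  # e.g. VMs on disconnected hosts
    vms: set = field(default_factory=set)
    next_id: int = 1

    def deploy_vm(self) -> int:
        if len(self.vms) >= self.capacity:
            raise CapacityError("out of capacity")
        vm_id = self.next_id
        self.next_id += 1
        self.vms.add(vm_id)
        return vm_id

    def destroy_vm(self, vm_id: int) -> bool:
        if vm_id in self.undeletable:
            return False  # cannot be destroyed right now; must be retried later
        self.vms.discard(vm_id)
        return True


def create_cluster(cloud: FakeCloud, size: int) -> dict:
    """Provision `size` VMs; on failure, roll back on a best-effort basis."""
    provisioned = []
    try:
        for _ in range(size):
            provisioned.append(cloud.deploy_vm())
    except CapacityError:
        # Rollback is not guaranteed: keep track of whatever could not be destroyed.
        leftover = [vm for vm in provisioned if not cloud.destroy_vm(vm)]
        return {"ok": False, "leftover": leftover}
    return {"ok": True, "vms": provisioned}
```

Any VM reported in `leftover` still holds resources and has to be cleaned up by a later retry, which is why neither the workflow nor its rollback can be treated as atomic.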
...
The state machine below captures how the state of a container cluster transitions as it goes through the various life-cycle operations. Not all states are necessarily visible to the end user.
Garbage collection shall be implemented as a background task that cleans up the resources of a container cluster. The following are the cases where cluster resources are freed up.
If there are failures while cleaning up resources and the cleanup cannot proceed, the container cluster is moved from the 'Expunging' state to the 'Expunge' state. The garbage collector periodically loops through the list of container clusters in the 'Expunge' state and tries to free the resources held by each container cluster.
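A minimal sketch of this retry behaviour is given below, assuming clusters are plain dictionaries and that a caller-supplied `destroy_resource` callback returns `True` on success. The 'Destroyed' terminal state, the function names, and the data layout are assumptions made for illustration; only 'Running', 'Expunging', and 'Expunge' come from the text above.

```python
from enum import Enum


class State(Enum):
    RUNNING = "Running"
    EXPUNGING = "Expunging"
    EXPUNGE = "Expunge"
    DESTROYED = "Destroyed"  # hypothetical terminal state, not named in this document


def expunge(cluster: dict, destroy_resource) -> None:
    """Try to free every resource; fall back to 'Expunge' if any of them remain."""
    cluster["state"] = State.EXPUNGING
    remaining = [r for r in cluster["resources"] if not destroy_resource(r)]
    cluster["resources"] = remaining
    cluster["state"] = State.EXPUNGE if remaining else State.DESTROYED


def garbage_collect(clusters: list, destroy_resource) -> None:
    """One garbage collector pass: retry cleanup for clusters left in 'Expunge'."""
    for cluster in clusters:
        if cluster["state"] is State.EXPUNGE:
            expunge(cluster, destroy_resource)
```

A background task would call `garbage_collect` periodically. A cluster whose resources still cannot be freed simply stays in 'Expunge' until a later pass succeeds.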
Should we implement rollback on failure of container cluster creation, or do a lazy cleanup, that is, mark the container cluster as being in the 'Expunging' state and let the garbage collector do the cleanup? It is just a matter of when the cleanup happens. Both flows may use the same cleanup module.
The state of the container cluster is the 'desired state' of the cluster as intended by the user, that is, the system's logical view of the container cluster. However, there are various scenarios where the desired state of the container cluster is not in sync with the state that can be inferred from the actual physical infrastructure. For example, consider a container cluster in the 'Running' state with a cluster size of 10 VMs, all of them running. Due to host failures, some of the VMs may get stopped at a later point. The desired state of the container cluster is still a cluster with 10 VMs running and operationally ready (with respect to container provisioning), but the state at the resource layer is different. So we need a mechanism to ensure:
The following mechanism will be implemented.
State transitions in the FSM where a container cluster ends up in the 'Alert' state:
From a layering perspective, CCS is layered on top of CloudStack functionality. CCS has no way to control the life cycle of the individual resources that are part of a container cluster. For example, a user can delete the VMs that are part of a container cluster.
OPEN QUESTION: Are there no hooks to restrict these actions?
The only design option is to use cluster state synchronization to detect missing entities (in the case of destroyed VMs) or conflicting states (a user can stop a VM that CCS expects to be running) and put the cluster in the 'Alert' state.
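The synchronization check described above could look roughly like the sketch below. The function name, the dictionary layout, and the VM state strings are assumptions made for illustration; the sketch only shows the comparison of the desired state against the resource layer and the move to 'Alert'.

```python
def sync_cluster_state(cluster: dict, actual_vm_states: dict) -> list:
    """Compare the desired state with the resource layer and return discrepancies.

    `cluster["vm_ids"]` lists the VMs that CCS expects to be running;
    `actual_vm_states` maps VM id to the state reported by the resource layer.
    Both structures are hypothetical.
    """
    problems = []
    for vm_id in cluster["vm_ids"]:
        actual = actual_vm_states.get(vm_id)
        if actual is None:
            problems.append((vm_id, "missing"))  # VM was destroyed out of band
        elif actual != "Running":
            problems.append((vm_id, f"conflicting state: {actual}"))
    if cluster["state"] == "Running" and problems:
        cluster["state"] = "Alert"
    return problems
```

The returned list of discrepancies is what a recovery policy would act on, for example by restarting a stopped VM or replacing a missing one.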
Policies can be defined on how to recover the cluster.
re-use cloud DB vs keep separate DB
Handling out-of-band
CCS will keep the bookkeeping tables below to store the CloudStack resources provisioned and used for a container cluster.
Note that there are no foreign keys or delete cascades. CCS should not lose bookkeeping data on the resources even if a resource is deleted from the CloudStack DB.
CCS code needs to be written defensively and verify that an entity exists in the CloudStack tables before using it.
CREATE TABLE IF NOT EXISTS `cloud`.`container_cluster` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `uuid` varchar(40),
    `name` varchar(255) NOT NULL,
    `description` varchar(4096) NULL COMMENT 'display text for this container cluster',
    `zone_id` bigint unsigned NOT NULL COMMENT 'zone id',
    `service_offering_id` bigint unsigned COMMENT 'service offering id for the cluster VM',
    `template_id` bigint unsigned COMMENT 'vm_template.id',
    `network_id` bigint unsigned COMMENT 'network this container cluster uses',
    `node_count` bigint NOT NULL default '0',
    `account_id` bigint unsigned NOT NULL COMMENT 'owner of this cluster',
    `domain_id` bigint unsigned NOT NULL COMMENT 'owner of this cluster',
    `state` char(32) NOT NULL COMMENT 'current state of this cluster',
    `key_pair` varchar(40),
    `cores` bigint unsigned NOT NULL COMMENT 'number of cores',
    `memory` bigint unsigned NOT NULL COMMENT 'total memory',
    `endpoint` varchar(255) COMMENT 'url endpoint of the container cluster manager api access',
    `console_endpoint` varchar(255) COMMENT 'url for the container cluster manager dashboard',
    `created` datetime NOT NULL COMMENT 'date created',
    CONSTRAINT `fk_cluster__zone_id` FOREIGN KEY `fk_cluster__zone_id` (`zone_id`) REFERENCES `data_center` (`id`) ON DELETE CASCADE,
    CONSTRAINT `fk_cluster__service_offering_id` FOREIGN KEY `fk_cluster__service_offering_id` (`service_offering_id`) REFERENCES `service_offering` (`id`) ON DELETE CASCADE,
    CONSTRAINT `fk_cluster__template_id` FOREIGN KEY `fk_cluster__template_id` (`template_id`) REFERENCES `vm_template` (`id`) ON DELETE CASCADE,
    CONSTRAINT `fk_cluster__network_id` FOREIGN KEY `fk_cluster__network_id` (`network_id`) REFERENCES `networks` (`id`) ON DELETE CASCADE,
    PRIMARY KEY(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `cloud`.`container_cluster_vm_map` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `cluster_id` bigint unsigned NOT NULL COMMENT 'cluster id',
    `vm_id` bigint unsigned NOT NULL COMMENT 'vm id',
    PRIMARY KEY(`id`),
    CONSTRAINT `container_cluster_vm_map_cluster__id` FOREIGN KEY `container_cluster_vm_map_cluster__id` (`cluster_id`) REFERENCES `sb_ccs_container_cluster` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `cloud`.`container_cluster_details` (
    ...
    PRIMARY KEY(`id`),
    ... cluster`(`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
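The defensive check mentioned above, verifying that an entity still exists in the CloudStack tables before using it, might be sketched as follows. The example runs against an in-memory SQLite database purely so that it is self-contained. The simplified `vm_instance` table with a `removed` column, the sample rows, and the function names are assumptions for illustration rather than the actual CCS code.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the relevant tables (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vm_instance (id INTEGER PRIMARY KEY, removed TEXT);
        CREATE TABLE container_cluster_vm_map (
            id INTEGER PRIMARY KEY, cluster_id INTEGER, vm_id INTEGER);
        -- VM 11 is marked removed; VM 13 no longer has a row at all.
        INSERT INTO vm_instance (id, removed) VALUES
            (10, NULL), (11, '2016-01-01 00:00:00'), (12, NULL);
        INSERT INTO container_cluster_vm_map (cluster_id, vm_id) VALUES
            (1, 10), (1, 11), (1, 12), (1, 13);
        """
    )
    return conn


def existing_vm_ids(conn: sqlite3.Connection, cluster_id: int) -> list:
    """Return only the mapped VM ids that still exist and are not marked removed."""
    rows = conn.execute(
        """
        SELECT m.vm_id
        FROM container_cluster_vm_map AS m
        JOIN vm_instance AS v ON v.id = m.vm_id
        WHERE m.cluster_id = ? AND v.removed IS NULL
        ORDER BY m.vm_id
        """,
        (cluster_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

With the sample data, `existing_vm_ids(demo_connection(), 1)` yields only VMs 10 and 12. The bookkeeping rows for VMs 11 and 13 stay in the map table, so CCS keeps its record of them even though they can no longer be used.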