Info: under reconsideration

This design, though valid, ignores the rising use of tools like OASIS CAMP and TOSCA, as well as the more proprietary Terraform format. Embedding or co-installing Apache Brooklyn with CloudStack to create application landscapes seems more appropriate.

Table of Contents

Introduction

ApplicationClusters (or AppC, pronounced "appz") are an attempt to make orchestrating bigger application landscapes easier in a vanilla Apache CloudStack install.

...

Services like Kubernetes, Cloud Foundry, and DBaaS require integration support from the underlying CloudStack installation. This support includes grouping VMs, scaling, and monitoring. Rather than changing ACS every time a new service needs support, a generic framework has to be developed.

...

  • create application cluster: provisions cluster resources and brings the cluster into an operationally ready state. Resource provisioning is the responsibility of the caller, which can act according to the cluster manager used. All the cluster VMs are launched into a dedicated network for the cluster. The API endpoint of the cluster manager can be exposed by the caller by creating a port forwarding rule on the source NAT IP of the network dedicated to the cluster.
  • delete application cluster: destroys all the resources provisioned for the application cluster. After deletion, no operations can be performed on the application cluster.
  • start application cluster: starts the VMs and possibly starts the network.
  • stop application cluster: shuts down all the resources consumed by the application cluster. The user can start the cluster at a later point with the start operation.
  • recover application cluster: due to possible faults (such as VMs stopped by failures, or a malfunctioning cluster manager), an application cluster can end up in the Alert state. Recover is used to revive the application cluster to a sane running state. In the initial version this just tries to restore the correct number of VMs per role; in later versions, callbacks for (re-)provisioning may be added.
  • cluster resizing (scale-in/out): increases or decreases the size of the cluster on a per-role basis. This functionality adheres to the same limitations stated above under recovery.
  • list application clusters: lists all the application clusters.
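As an illustrative reading of the operations above, the sketch below maps each life-cycle operation to the states it may be invoked from. The names and the exact admission rules are assumptions drawn from this document, not the actual CloudStack API.

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: which states each life-cycle operation is valid
// from, inferred from the operation descriptions in this document.
final class ClusterOps {
    static final Map<String, Set<String>> ALLOWED_FROM = Map.of(
        "start",   Set.of("Stopped"),
        "stop",    Set.of("Running"),
        "delete",  Set.of("Stopped", "Alert"),
        "recover", Set.of("Alert"),
        "resize",  Set.of("Running")
    );

    // True if 'op' may be invoked on a cluster currently in 'state'.
    static boolean allowed(String op, String state) {
        return ALLOWED_FROM.getOrDefault(op, Set.of()).contains(state);
    }
}
```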

...

The state machine below reflects how an application cluster's state transitions for each life cycle operation.

Gliffy Diagram: application cluster life cycle (image removed)
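Since the diagram itself is unavailable, a few of the transitions it described can be sketched in code. The transition functions below are an illustrative reading of the garbage-collection paths named later in this document (failed start and delete both lead through Expunging), not the authoritative CloudStack FSM.

```java
// States named in this document's life-cycle and garbage-collection sections.
enum ClusterState {
    CREATED, STARTING, RUNNING, STOPPING, STOPPED,
    ALERT, EXPUNGE, EXPUNGING, DESTROYED
}

// Illustrative transition helpers; each returns the state unchanged when the
// event does not apply to the current state.
final class ClusterFsm {
    // A failed start sends the cluster into resource clean-up.
    static ClusterState onStartFailure(ClusterState s) {
        return s == ClusterState.STARTING ? ClusterState.EXPUNGING : s;
    }
    // Delete is valid from Stopped or Alert (Stopped/Alert -> Expunging).
    static ClusterState onDelete(ClusterState s) {
        return (s == ClusterState.STOPPED || s == ClusterState.ALERT)
                ? ClusterState.EXPUNGING : s;
    }
    // If clean-up cannot proceed, park the cluster in 'Expunge' for the GC.
    static ClusterState onCleanupFailure(ClusterState s) {
        return s == ClusterState.EXPUNGING ? ClusterState.EXPUNGE : s;
    }
    // Successful clean-up finishes the life cycle.
    static ClusterState onCleanupDone(ClusterState s) {
        return s == ClusterState.EXPUNGING ? ClusterState.DESTROYED : s;
    }
}
```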

Garbage collection

Garbage collection shall be implemented as a background task that cleans up the resources of an application cluster. Cluster resources are freed up in the following cases:

  • Starting an application cluster fails, resulting in clean-up of the provisioned resources (Starting → Expunging → Destroyed)
  • Deleting an application cluster (Stopped → Expunging → Destroyed and Alert → Expunging → Destroyed)

If there are failures in cleaning up resources and clean-up cannot proceed, the state of the application cluster is marked as 'Expunge' instead of 'Expunging'. The garbage collector periodically loops through the list of application clusters in the 'Expunge' state and tries to free the resources held by each cluster.
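The garbage-collection loop described above can be sketched as follows. The store of 'Expunge' clusters and the resource-teardown hook are illustrative assumptions; here a predicate stands in for the real clean-up call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Minimal sketch of the GC background task: retry clean-up for every cluster
// parked in 'Expunge'. On success the cluster proceeds to Destroyed; on
// failure it stays in 'Expunge' and is retried on the next GC run.
final class ClusterGarbageCollector {
    static List<String> collect(List<String> expungeClusters,
                                Predicate<String> tryCleanup) {
        List<String> destroyed = new ArrayList<>();
        for (String cluster : expungeClusters) {
            if (tryCleanup.test(cluster)) {
                destroyed.add(cluster); // Expunging -> Destroyed
            }                           // else: remains in 'Expunge' for next run
        }
        return destroyed;
    }
}
```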

Cluster state synchronization

The state of the application cluster is the 'desired state' of the cluster as intended by the user, or the system's logical view of the application cluster. However, there are various scenarios where the desired state of the application cluster is not in sync with the state that can be inferred from the actual physical infrastructure. For example, consider an application cluster in the 'Running' state with a cluster size of 10 VMs, all running. Due to host failures, some of the VMs may get stopped at a later point. The desired state of the application cluster is still 10 running, operationally ready VMs, but the state at the resource layer is different. So we need a mechanism to ensure that:

  • the cluster is in its desired state at the resource/infrastructure layer, which could mean provisioning new VMs or deleting VMs in the cluster to reach the desired state
  • conversely, when reconciliation cannot happen, the state of the cluster is reflected accordingly so it can be recovered at a later point

The following mechanism will be implemented:

  • An 'Alert' state will be maintained to indicate that an application cluster is not in its desired state.
  • A state synchronization background task will run periodically to infer whether the cluster is in its desired state. If not, the cluster will be marked as being in the Alert state.
  • A recovery action tries to recover the cluster.

State transitions in the FSM where an application cluster ends up in the 'Alert' state:

  • a failure in the middle of scale in/out, resulting in a cluster size (number of VMs) not equal to the expected size
  • a failure in stopping a cluster, leaving some VMs in the running state
  • a difference of states as detected by the state synchronization thread
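The check performed by the state synchronization task can be sketched as a comparison of desired versus observed node counts per role. The map-based representation is an illustrative assumption; in practice the counts would come from the `application_cluster_role` rows and the VM inventory.

```java
import java.util.Map;

// Sketch of the periodic state-synchronization check: compare the desired
// node count per role with the number of VMs actually running; any drift
// means the caller should mark the cluster 'Alert'.
final class ClusterStateSync {
    // desired: role -> expected node count; actual: role -> running VM count
    static boolean inDesiredState(Map<String, Integer> desired,
                                  Map<String, Integer> actual) {
        for (Map.Entry<String, Integer> e : desired.entrySet()) {
            if (!e.getValue().equals(actual.getOrDefault(e.getKey(), 0))) {
                return false; // drift detected for this role
            }
        }
        return true;
    }
}
```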

Example: provisioning the Kubernetes container cluster manager

A CoreOS template shall be used to provision the container cluster VMs. Setting up a cluster VM as a Kubernetes master/node is done through a cloud-config script [7] in CoreOS. CloudStack shall pass the necessary cloud-config script as base64-encoded user data. Once the CoreOS instances are launched by CloudStack, by virtue of the cloud-config data passed as user data, the CoreOS instances self-configure as Kubernetes master and node VMs.
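The base64 encoding step described above can be sketched as follows. The cloud-config content here is a trivial placeholder, not a working Kubernetes master/node configuration, and the helper name is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of passing a CoreOS cloud-config script as base64-encoded user data,
// as the section above describes.
final class UserData {
    static String encodeCloudConfig(String cloudConfig) {
        // CloudStack user data is passed base64-encoded
        return Base64.getEncoder()
                     .encodeToString(cloudConfig.getBytes(StandardCharsets.UTF_8));
    }
}
```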

Schema changes

 

Code Block
languagesql
CREATE TABLE IF NOT EXISTS `cloud`.`application_cluster` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `uuid` varchar(40),
    `name` varchar(255) NOT NULL,
    `description` varchar(4096) COMMENT 'display text for this application cluster',
    `zone_id` bigint unsigned NOT NULL COMMENT 'zone id',
    `network_id` bigint unsigned COMMENT 'network this application cluster uses',
    `account_id` bigint unsigned NOT NULL COMMENT 'owner of this cluster',
    `domain_id` bigint unsigned NOT NULL COMMENT 'domain of this cluster',
    `state` char(32) NOT NULL COMMENT 'current state of this cluster',
    `key_pair` varchar(40),
    `created` datetime NOT NULL COMMENT 'date created',
    `removed` datetime COMMENT 'date removed if not null',
    `gc` tinyint unsigned NOT NULL DEFAULT 1 COMMENT 'gc this application cluster or not',
    `network_cleanup` tinyint unsigned NOT NULL DEFAULT 1 COMMENT 'true if the network needs to be cleaned up on deletion of the application cluster. Should be false if the user specified a network for the cluster',
    CONSTRAINT `fk_cluster__zone_id` FOREIGN KEY `fk_cluster__zone_id` (`zone_id`) REFERENCES `data_center` (`id`) ON DELETE CASCADE,
    CONSTRAINT `fk_cluster__network_id` FOREIGN KEY `fk_cluster__network_id`(`network_id`) REFERENCES `networks`(`id`) ON DELETE CASCADE,
    PRIMARY KEY(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 
CREATE TABLE IF NOT EXISTS `cloud`.`application_cluster_role` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `cluster_id` bigint unsigned NOT NULL COMMENT 'cluster id',
    `name` varchar(255) NOT NULL COMMENT 'role name',
    `service_offering_id` bigint unsigned COMMENT 'service offering id for the cluster VM',
    `template_id` bigint unsigned COMMENT 'vm_template.id',
    `node_count` bigint NOT NULL default '0',
    PRIMARY KEY(`id`),
    CONSTRAINT `fk_cluster__service_offering_id` FOREIGN KEY `fk_cluster__service_offering_id` (`service_offering_id`) REFERENCES `service_offering`(`id`) ON DELETE CASCADE,
    CONSTRAINT `fk_cluster__template_id` FOREIGN KEY `fk_cluster__template_id`(`template_id`) REFERENCES `vm_template`(`id`) ON DELETE CASCADE,
    CONSTRAINT `application_cluster_role_cluster__id` FOREIGN KEY `application_cluster_role_cluster__id`(`cluster_id`) REFERENCES `application_cluster`(`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 
CREATE TABLE IF NOT EXISTS `cloud`.`application_cluster_role_vm_map` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `role_id` bigint unsigned NOT NULL COMMENT 'role id',
    `vm_id` bigint unsigned NOT NULL COMMENT 'vm id',
    PRIMARY KEY(`id`),
    CONSTRAINT `application_cluster_role_vm_map_cluster_role__id` FOREIGN KEY `application_cluster_role_vm_map_cluster_role__id`(`role_id`) REFERENCES `application_cluster_role`(`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 
CREATE TABLE IF NOT EXISTS `cloud`.`application_cluster_details` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `cluster_id` bigint unsigned NOT NULL COMMENT 'cluster id',
    `key` varchar(255) NOT NULL,
    `value` text,
    PRIMARY KEY(`id`),
    CONSTRAINT `application_cluster_details_cluster__id` FOREIGN KEY `application_cluster_details_cluster__id`(`cluster_id`) REFERENCES `application_cluster`(`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
CREATE TABLE IF NOT EXISTS `cloud`.`application_cluster_role_details` (
    `id` bigint unsigned NOT NULL auto_increment COMMENT 'id',
    `role_id` bigint unsigned NOT NULL COMMENT 'role id',
    `key` varchar(255) NOT NULL,
    `value` text,
    PRIMARY KEY(`id`),
    CONSTRAINT `application_cluster_role_details_role__id` FOREIGN KEY `application_cluster_role_details_role__id`(`role_id`) REFERENCES `application_cluster_role`(`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Code Block
languagejava
// Example detail keys for a cluster used as a k8s container cluster; values
// are stored as text in the *_details tables above.
enum ApplicationClusterDetailKey {
    USERNAME,
    PASSWORD,
    REGISTRY_USERNAME,
    REGISTRY_PASSWORD,
    REGISTRY_URL,
    REGISTRY_EMAIL,
    ENDPOINT,          // URL endpoint for application cluster manager API access
    CONSOLE_ENDPOINT,  // URL for the application cluster manager dashboard
    CORES,             // number of cores
    MEMORY             // total memory
}

 

...