Feature: VMware Enhancements: Support for DRS, VM HA

Background: A VMware DRS cluster is a collection of ESXi hosts and associated VMs with shared resources and a shared management interface. DRS can be used for:

  1. Load Balancing: DRS continuously monitors the distribution and usage of CPU and memory resources for all hosts and VMs in the cluster and compares them to the ideal resource utilization given the attributes of the cluster’s resource pools and VMs, the current demand, and the imbalance target. It then performs (or recommends) virtual machine migrations accordingly. Also, when a VM is powered on in the cluster, DRS attempts to maintain proper load balancing by either placing the VM on an appropriate host or making a recommendation.
  2. Power Management: When the vSphere Distributed Power Management (DPM) feature is enabled, DRS compares cluster- and host-level capacity to the demands of the cluster’s VMs, including recent historical demand. It places (or recommends placing) hosts in standby power mode if sufficient excess capacity is found, or powers on (or recommends powering on) hosts if capacity is needed. Depending on the resulting host power state recommendations, VMs might need to be migrated to and from the hosts as well.
  3. Affinity Rules: DRS controls the placement of virtual machines on hosts within a cluster according to the affinity rules assigned to them.

Today, CloudStack assigns a VM down to the host level. During the VM startup phase, once CloudStack has assigned the VM to a particular host, it assumes that state-sync information comes from this original host. With DRS this may no longer be true, since DRS may move the VM even at this stage, and if it does, the CloudStack VM startup process may be affected. (CloudStack is able to handle VM re-placement after the VM has been started.) Additionally, CloudStack needs to synchronize with VMware better: for example, when an out-of-band vMotion happens, CloudStack needs to learn of it faster, or asynchronously, so that it has an up-to-date view of resources. Moreover, today all VMware clusters are assumed to be either HA or not HA, i.e. all or nothing. This needs to be more granular, preferably learned automatically from vCenter.
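The host-assignment problem described above can be illustrated with a minimal sketch. It is not CloudStack code: `VmRecord`, `reconcile_hosts` and the `observed` map are hypothetical names, and how the observed placement is obtained from vCenter is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class VmRecord:
    """Simplified, hypothetical management-side view of a VM."""
    name: str
    host: str  # host the management server believes the VM runs on


def reconcile_hosts(records, observed):
    """Correct records whose host differs from the observed placement.

    `observed` maps VM name -> host name as reported by the hypervisor
    manager (assumed input). VMs missing from `observed` are left untouched.
    Returns a list of (vm_name, old_host, new_host) for every VM that moved.
    """
    moved = []
    for rec in records:
        actual = observed.get(rec.name)
        if actual is not None and actual != rec.host:
            moved.append((rec.name, rec.host, actual))
            rec.host = actual  # adopt the host the VM was actually found on
    return moved
```

Running such a reconciliation during startup, and not only after the VM is up, is one way to avoid relying on state reported by the originally assigned host.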

DRS has three automation levels: Manual, Partially Automated and Fully Automated.

|                   | Manual              | Partially Automated | Fully Automated |
|-------------------|---------------------|---------------------|-----------------|
| Initial placement | Recommendation only | Automatic           | Automatic       |
| Migration         | Recommendation only | Recommendation only | Automatic       |
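The automation levels above can be encoded as a small lookup, which is handy when deciding whether a VM may be placed or moved without CloudStack's involvement. This is an illustrative sketch; the enum and function names are hypothetical and do not come from CloudStack or the vSphere API.

```python
from enum import Enum


class DrsAutomationLevel(Enum):
    MANUAL = "manual"
    PARTIALLY_AUTOMATED = "partially automated"
    FULLY_AUTOMATED = "fully automated"


# level -> (initial placement is automatic, migration is automatic)
_BEHAVIOR = {
    DrsAutomationLevel.MANUAL: (False, False),
    DrsAutomationLevel.PARTIALLY_AUTOMATED: (True, False),
    DrsAutomationLevel.FULLY_AUTOMATED: (True, True),
}


def drs_places_automatically(level):
    """True if DRS picks the host at power-on without asking."""
    return _BEHAVIOR[level][0]


def drs_migrates_automatically(level):
    """True if DRS may move a running VM without asking."""
    return _BEHAVIOR[level][1]
```

Only the Fully Automated level allows both behaviours, so that is the level at which a running VM can change host without any operator action.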


Requirements: The general, high-level requirement is better synchronization between VMware DRS/HA/vMotion (out of band) and CloudStack, as described above.

-          CloudStack must be able to onboard ESX hosts that belong to a DRS cluster

-          CloudStack must be able to support HA, load balancing and power management in a DRS cluster that is set up in a fully automated fashion

-          Support vMotion (migration of VMs, storage or both) initiated out of band, i.e. not through CloudStack but using VMware tools (vCenter, APIs, etc.)

-          In essence, CloudStack needs to synchronize with VMware better: for example, when an out-of-band vMotion happens, CloudStack needs to learn of it faster, or asynchronously, so that it has an up-to-date view of resources

-          Today, all VMware clusters are assumed to be either HA or not HA, i.e. all or nothing. This needs to be more granular, preferably learned automatically from vCenter
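The last requirement, learning HA and DRS settings per cluster, could look roughly like the sketch below. It assumes a cluster configuration object with `dasConfig.enabled`, `drsConfig.enabled` and `drsConfig.defaultVmBehavior` attributes; those names are modeled on the vSphere API's cluster configuration and should be checked against the API reference. The function name `cluster_features` is hypothetical.

```python
def cluster_features(config):
    """Summarize per-cluster HA/DRS settings from a config-like object.

    `config` is assumed to expose `dasConfig` (HA) and `drsConfig` (DRS)
    attributes; missing attributes are treated as "feature disabled".
    """
    das = getattr(config, "dasConfig", None)
    drs = getattr(config, "drsConfig", None)
    ha_enabled = bool(getattr(das, "enabled", False))
    drs_enabled = bool(getattr(drs, "enabled", False))
    # The automation level is only meaningful when DRS is switched on.
    level = getattr(drs, "defaultVmBehavior", None) if drs_enabled else None
    return {"ha": ha_enabled, "drs": drs_enabled, "drs_level": level}
```

Storing this result per cluster, and refreshing it periodically, would replace the current all-or-nothing HA assumption with per-cluster values.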

Non-requirements:

-          Affinity rules are not required to be supported as part of this requirement – they are specified as a separate requirement

Open Questions:

-       Do we support the Manual and Partially Automated levels? If so, what is CloudStack's responsibility (nothing?)

-       Today, all VMware clusters are assumed to be either HA or not HA, i.e. all or nothing. Does this need to be more granular, preferably learned automatically from vCenter?

2 Comments

  1. I reviewed the functional spec; please find the review comments below:

    1. Are we going to implement CloudStack HA support for VMware, or will we still depend on native HA?
    2. Is CloudStack HA support for VMware only for DRS-enabled clusters?
    3. Are we going to address the open VMware vmsync issue as part of this feature enhancement?
    4. Please update the FS with flow/implementation details, such as:

     How are we going to handle power management?
     What happens when CS and DRS try to control the same VM at the same time?
     How will CS query/get information from DRS?
     Earlier, when DRS was enabled, CS had no idea where a VM had been moved; it would query all the hosts until it found the VM moved by DRS and then update that host as the parent host. How will this behaviour work with your implementation?

     How will CloudStack handle VM re-placement during and after VM startup?
    5. Fully automated DRS will impact host/cluster performance and the network. I am not sure why it was added as a requirement in the FS ("CloudStack must be able to support HA, load balancing and power management in a DRS cluster that is set up in a fully automated fashion"). Is there any specific reason?

  2. Better VM sync was initially planned for VMware, and the scope was later extended to other hypervisors. I would like to understand whether this extension covers only bug fixes or a complete change of the implementation for the other hypervisors.

      --  CloudStack VMware HA depends on native HA. Are we changing the same for the Xen and KVM hypervisors as part of vmsync?

     --  Are we doing any additional implementation for the VMware DRS/HA enhancements, or will it be covered as part of the vmsync feature?

     --   The functional spec should be updated with the latest details.