https://issues.apache.org/jira/browse/CLOUDSTACK-658
master, 4.1.0
It is not always possible to accurately determine the CPU and RAM requirements when deploying a VM, and for various reasons these resources may need to be scaled up later. At present the only way to do so is to restart the VM with increased resources. The dynamic scaling feature for CPU and RAM would allow these resources to be changed on a running VM, avoiding any downtime.
CloudStack (CS) currently allows updating CPU/RAM for stopped VMs by changing to a different compute offering. This feature will enable the same for running VMs.
This document describes the specifications and design of the feature.
The current plan is to implement this for VMware. Support for other hypervisors (HVs) can be added later, based on HV capabilities.
A new command class needs to be introduced to actually change CPU/RAM at the agent layer. This needs to be handled for each supported HV.
ReconfigureVMCommand: This will have the updated CPU and RAM values as members
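As an illustration only, the command described above might look like the following minimal sketch. The field names, units, and validation are assumptions made for this example and are not part of the spec; an actual implementation would presumably extend the agent layer's command base class, which is omitted here so that the sketch stays self-contained.

```java
// Hypothetical sketch of the proposed command. Field names, units, and the
// validation rules are assumptions for illustration, not the final design.
final class ReconfigureVMCommand {
    private final String vmName;    // hypervisor-level name of the running VM (assumed identifier)
    private final int cpuCount;     // updated number of virtual CPUs
    private final int cpuSpeedMhz;  // updated speed per virtual CPU, in MHz (assumed unit)
    private final long ramBytes;    // updated memory size, in bytes (assumed unit)

    ReconfigureVMCommand(String vmName, int cpuCount, int cpuSpeedMhz, long ramBytes) {
        if (vmName == null || vmName.isEmpty()) {
            throw new IllegalArgumentException("vmName must not be empty");
        }
        if (cpuCount <= 0 || cpuSpeedMhz <= 0 || ramBytes <= 0) {
            throw new IllegalArgumentException("CPU and RAM values must be positive");
        }
        this.vmName = vmName;
        this.cpuCount = cpuCount;
        this.cpuSpeedMhz = cpuSpeedMhz;
        this.ramBytes = ramBytes;
    }

    String getVmName() { return vmName; }
    int getCpuCount() { return cpuCount; }
    int getCpuSpeedMhz() { return cpuSpeedMhz; }
    long getRamBytes() { return ramBytes; }
}
```

Keeping the command immutable means the values validated by the management server are exactly the values the HV-specific handler receives.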
Allocation logic: Refer to flow chart below
The following APIs need to be changed:
upgradeVirtualMachine - This is an existing API that takes vm_id and compute_offering_id as inputs. It is currently a sync call and will be modified to be async (a breaking change, but I feel it should be fine). The same API can be used for system VMs, with proper access checks. In case a migration is needed, this will internally use the migrateVirtualMachine API logic.
createServiceOffering - boolean flag indicating if dynamic scale up of CPU/RAM is allowed (see open issue#1)
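The API changes above imply a check on whether an offering change may proceed for a given VM. The following is a rough sketch of that check under stated assumptions: the class, enum, and method names are hypothetical, VMware is the only supported HV (per the current plan), and the dynamic scaling flag is passed in as a plain boolean because where it lives is still an open issue.

```java
// Hypothetical sketch: all names here are illustrative assumptions.
final class DynamicScalingPolicy {
    enum VmState { RUNNING, STOPPED }
    enum Hypervisor { VMWARE, XENSERVER, KVM }

    // Returns true if the compute offering of a VM may be changed in its current state.
    static boolean canChangeOffering(VmState state, Hypervisor hv, boolean dynamicScalingAllowed) {
        if (state == VmState.STOPPED) {
            // Existing behavior: stopped VMs can always move to a different offering.
            return true;
        }
        // Running VMs: only on a supported HV (assumed VMware only) and only if the flag permits it.
        return hv == Hypervisor.VMWARE && dynamicScalingAllowed;
    }
}
```

For example, `DynamicScalingPolicy.canChangeOffering(VmState.RUNNING, Hypervisor.KVM, true)` evaluates to `false` under these assumptions, because KVM is not in the supported set.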
TBD
1. Should VMs always be marked for dynamic scaling? If so, should this be a VM property or part of the compute offering? Should dynamic scaling be treated as a premium service? In that case, having it in the compute offering makes more sense from a usage/billing perspective.
2. Should scale down be allowed? It can be explicitly prevented, since none of the HVs or guest OSes support it.
3. There is also the option of a custom compute offering, where the user can specify values for CPU and RAM during deployment or scaling up. But I am not sure whether this option could be misused, since this is a user-level API. Another complexity is capturing usage.
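If issue 2 is resolved by explicitly preventing scale down, the check could be as simple as the sketch below. The class and method names are hypothetical, and the rule that at least one of the two values must increase is an assumption of this example rather than something the spec states.

```java
// Hypothetical sketch of a scale-up-only check; names and the exact rule are assumptions.
final class ScaleUpCheck {
    // True only if neither CPU count nor RAM decreases and at least one of them increases.
    static boolean isScaleUp(int curCpu, long curRamMb, int newCpu, long newRamMb) {
        boolean noDecrease = newCpu >= curCpu && newRamMb >= curRamMb;
        boolean someIncrease = newCpu > curCpu || newRamMb > curRamMb;
        return noDecrease && someIncrease;
    }
}
```

Under this rule, a request that raises RAM but lowers the CPU count is rejected as a whole rather than partially applied.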
TBD
Appendix A:
Appendix B: