...

If no value is provided, the factor defaults to 1, meaning resources are not overcommitted by default.

Capacity calculations.

The capacity calculation model will change to align with the hypervisor's calculation. When a VM is deployed with an over-provisioning factor of "x", we want to guarantee (service offering of vm / x) throughout its lifecycle, even if the over-provisioning factor changes later.

When the cluster over-provisioning factor = x

Total Capacity = (actualHardwareCapacity * x)

Used Capacity = sum (service offering of each running vm) + sum (service offering of each stopped vm in the skipped.counting.hours)  

When the cluster over-provisioning factor is changed to y

Total Capacity = (actualHardwareCapacity * y)

Used Capacity = [sum (service offering of each running vm deployed when factor was x) + sum (service offering of each stopped vm deployed when factor was x in the skipped.counting.hours)] * y/x
+ [sum (service offering of each running vm deployed when factor was y) + sum (service offering of each stopped vm deployed when factor was y in the skipped.counting.hours)]
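
The formulas above can be sketched in a few lines of Python. This is only an illustration, not the CloudStack implementation: the function names and the `(service_offering, factor_at_deploy)` tuple layout are assumptions made for this example, and the caller is assumed to pass only the VMs that count (running, or stopped within skipped.counting.hours).

```python
def total_capacity(actual_hardware_capacity, current_factor):
    """Total Capacity = actualHardwareCapacity * current factor."""
    return actual_hardware_capacity * current_factor


def used_capacity(vms, current_factor):
    """Sum of (service offering / factor at deploy) * current factor.

    vms is an iterable of (service_offering, factor_at_deploy) pairs
    (a hypothetical layout chosen for this sketch).
    """
    # Multiply before dividing: same value as (offering / deployed) * current,
    # but it avoids needless floating-point rounding.
    return sum(offering * current_factor / deployed
               for offering, deployed in vms)


# VMs deployed at factor x = 1, evaluated after the factor changed to y = 2:
# each VM's usage is scaled by y/x.
print(used_capacity([(512, 1), (512, 1)], current_factor=2))
```

A VM deployed at the current factor contributes exactly its service offering, which is why the second half of the formula carries no y/x multiplier.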

Ideally, you should not change the over-provisioning factor in a cluster that has VMs running, because the existing VMs were deployed with the previous factor.
If you still change it, both used and total capacity are recomputed with the new factor (the used capacity of existing VMs is scaled by new factor / old factor) so that the available capacity is tracked correctly.

Let's walk through the capacity calculation with an example:

Cluster – c,
CPU over-provisioning = 1,
Total CPU = 2 GHz

When we deploy 2 VMs with a 512 MHz service offering each (1 GHz in total), then
totalCapacity = 2 GHz
AvailableCapacity = 1 GHz
UsedCapacity = 1 GHz

Now change the CPU over-provisioning ratio of cluster c to 2:
totalCapacity = 4 GHz
AvailableCapacity = 2 GHz
UsedCapacity = 2 GHz

Notice what is multiplied here: both used and total capacity are scaled by the factor. Used capacity in the new model after changing the factor = (service offering of vm / overcommit it got deployed with) * new overcommit => (1 GHz / 1) * 2 = 2 GHz.
The reason is that we want to guarantee a minimum amount of CPU in case of contention. When a VM is deployed with an over-provisioning factor of "x", we want to guarantee (service offering of vm / x) throughout its lifecycle, even if the over-provisioning factor changes.
We therefore scale the used CPU so that it keeps track of the actual amount of CPU left on the host.

Now if we launch 2 more VMs with a 1 GHz CPU service offering each:
totalCapacity = 4 GHz
AvailableCapacity = 0 GHz
UsedCapacity = 4 GHz
Used capacity for the 4 VMs, computed as (service offering of vm / overcommit it got deployed with) * new overcommit:
(512 MHz / 1) * 2 + (512 MHz / 1) * 2 + (1 GHz / 2) * 2 + (1 GHz / 2) * 2 = 4 GHz

Now suppose we change the over-provisioning factor to 3:
totalCapacity = 6 GHz
AvailableCapacity = 0 GHz
UsedCapacity = 6 GHz
Used capacity for the 4 VMs, computed as (service offering of vm / overcommit it got deployed with) * new overcommit:
(512 MHz / 1) * 3 + (512 MHz / 1) * 3 + (1 GHz / 2) * 3 + (1 GHz / 2) * 3 = 6 GHz

All of this assumes that you have not stopped and started the VMs in the meantime. Say you now stop and start one 512 MHz VM and one 1 GHz VM. The over-provisioning factor recorded for each of these VMs changes to 3. Note the denominator in the calculation:
totalCapacity = 6 GHz
AvailableCapacity = 1.5 GHz
UsedCapacity = 4.5 GHz
Used capacity for the 4 VMs, computed as (service offering of vm / overcommit it got deployed with) * new overcommit:
(512 MHz / 3) * 3 + (512 MHz / 1) * 3 + (1 GHz / 3) * 3 + (1 GHz / 2) * 3 = 4.5 GHz
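
The whole walk-through can be replayed with a small simulation. This is a minimal sketch under stated assumptions, not CloudStack code: the `Cluster` class and its methods are hypothetical, capacities are kept in MHz, and 1 GHz is taken as 1024 MHz so that two 512 MHz VMs add up to exactly 1 GHz, as the example does.

```python
class Cluster:
    """Hypothetical model of one cluster's CPU capacity (values in MHz)."""

    def __init__(self, hardware_mhz, factor=1):
        self.hardware_mhz = hardware_mhz
        self.factor = factor          # current over-provisioning factor
        self.vms = {}                 # name -> [offering_mhz, factor at last start]

    def deploy(self, name, offering_mhz):
        # A new VM records the factor in effect when it is deployed.
        self.vms[name] = [offering_mhz, self.factor]

    def restart(self, name):
        # Stop/start: the VM picks up the cluster's current factor.
        self.vms[name][1] = self.factor

    def total(self):
        return self.hardware_mhz * self.factor

    def used(self):
        # (offering / factor at deploy) * current factor, summed over all VMs.
        return sum(o * self.factor / f for o, f in self.vms.values())

    def available(self):
        return self.total() - self.used()


c = Cluster(hardware_mhz=2048, factor=1)   # 2 GHz of hardware
c.deploy("a", 512)
c.deploy("b", 512)
c.factor = 2                               # change the factor from 1 to 2
c.deploy("c", 1024)
c.deploy("d", 1024)
c.factor = 3                               # change the factor from 2 to 3
c.restart("a")                             # 512 MHz VM now counted at factor 3
c.restart("c")                             # 1 GHz VM now counted at factor 3
print(c.total(), c.available(), c.used())  # 6144 1536.0 4608.0 (6, 1.5, 4.5 GHz)
```

Storing the factor per VM is what makes the denominator differ between VMs that were restarted and VMs that were not.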

The upside of the new model, compared with the old one, is that it guarantees QoS of (service offering of vm / x) throughout the VM's lifecycle.

The overcommit ratios are plugged into the capacity calculations dynamically. All capacity calculations are based on the overcommitted capacity values, so if an overcommit ratio is decreased, the used capacity may go beyond 100%.
Example:
Overcommit = 2
capacity = 2 GB
capacity after overcommit = 4 GB
Now if we deploy 3 VMs of 1 GB each:
used = 3 GB
free = 1 GB
used % = 3/4 * 100 = 75%
If the overcommit ratio is decreased to 1:
used = 3 GB
free = -1 GB
used % = 3/2 * 100 = 150% (alerts will be generated based on this)
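
The arithmetic in this example can be checked with a short sketch. The function name `usage` and its parameters are assumptions made for illustration only.

```python
def usage(hardware_capacity, overcommit, used):
    """Return (capacity after overcommit, free, used %) for one resource."""
    total = hardware_capacity * overcommit
    free = total - used
    used_pct = used / total * 100
    return total, free, used_pct


print(usage(2, 2, 3))  # (4, 1, 75.0)
print(usage(2, 1, 3))  # (2, -1, 150.0): ratio lowered with 3 GB already in use
```

Because the ratio is applied at calculation time rather than stored with the usage, lowering it immediately shows up as negative free capacity and a used percentage above 100.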

DB changes:

There will be no schema changes to the DB. The CPU and RAM overcommit ratios will be stored in the existing cluster_details table.
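
As a rough illustration of how a ratio might be read back with the default of 1, here is a minimal sketch. It assumes the cluster's details have already been loaded into a name/value dictionary; the key names `cpuOvercommitRatio` and `memoryOvercommitRatio` are placeholders for this example and are not taken from this document.

```python
DEFAULT_RATIO = 1.0  # no overcommit when nothing is configured


def overcommit_ratio(cluster_details, key):
    """Look up a ratio in a cluster's name/value details, defaulting to 1.

    cluster_details is a plain dict standing in for the rows of the
    cluster_details table that belong to one cluster (an assumption).
    """
    value = cluster_details.get(key)
    return float(value) if value is not None else DEFAULT_RATIO


details = {"cpuOvercommitRatio": "2"}                    # hypothetical key name
print(overcommit_ratio(details, "cpuOvercommitRatio"))     # 2.0
print(overcommit_ratio(details, "memoryOvercommitRatio"))  # 1.0 (not set)
```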

...

2b. or accept the change but stop adding more VMs (preferred method)

If we decrease the factor (currently allowed, for example from 2X to 1X) and the allocation is already beyond the new factor (say 1.5X), two things follow. First, no future allocation will be allowed. Second, the dashboard will start showing more than 100% allocated (150% in our example), which might confuse the admin, who will also start receiving alerts that capacity is already exhausted. In other words, we should accept the new value and allocate only if the system has enough capacity to deploy more VMs under the new overcommit ratios.
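
The preferred behaviour can be sketched as a simple admission check: the new ratio is always accepted, and a deployment is allowed only if it still fits under the capacity computed with that ratio. The function name and parameters are hypothetical, and real allocation involves more than this single comparison.

```python
def can_allocate(hardware_capacity, overcommit, used, requested):
    """Allow a new VM only if it fits under the overcommitted capacity."""
    return used + requested <= hardware_capacity * overcommit


# 2 GB of hardware with 3 GB already allocated:
print(can_allocate(2, 2, used=3, requested=1))  # True: 4 GB fits in 2 GB * 2
print(can_allocate(2, 1, used=3, requested=1))  # False: already at 150% after lowering the ratio
```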

...

Almost all hosts have the capability to overcommit, and it is up to the admin to make sure of it. Even if a host is not configured properly, CloudStack will try to set the parameters, assuming the host has the capability.

Alert generation.

All alerts are generated based on the global threshold values. This behavior will change once per-cluster thresholds are available.
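
A minimal sketch of what a single global threshold means in practice: the same cut-off is applied to every cluster. The threshold value, the function name, and the strict "greater than" comparison are all assumptions made for this illustration.

```python
GLOBAL_THRESHOLD = 0.75  # hypothetical value, shared by every cluster


def clusters_to_alert(clusters, threshold=GLOBAL_THRESHOLD):
    """Return the names of clusters whose used/total ratio exceeds the threshold.

    clusters maps a cluster name to (used, total), where total is the
    capacity after applying the overcommit ratio.
    """
    return [name for name, (used, total) in clusters.items()
            if total > 0 and used / total > threshold]


print(clusters_to_alert({"c1": (3.5, 4), "c2": (3, 2), "c3": (1, 4)}))
# ['c1', 'c2']
```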

...