Simulator enhancements

The simulator changes make it easy to write tests for scenarios that were not possible earlier, especially negative/fault scenarios. It is now possible to write end-to-end tests for VM deployment retry logic, HA, migration, etc. This document uses some of these scenarios to explain the simulator changes.

User VM HA (source code test/integration/smoke/misc/test_vm_ha.py)

Scenario

  1. Deploy an HA-enabled user VM
  2. Simulate host failure in CS using the new mock framework
  3. CS triggers HA on the user VM created in step 1

Setting up the test

First ensure that there is already a cluster with at least 2 hosts in it.

self.hosts = []
suitablecluster = None
clusters = Cluster.list(self.apiclient)
self.assertTrue(isinstance(clusters, list) and len(clusters) > 0, msg = "No clusters found")
for cluster in clusters:
    self.hosts = Host.list(self.apiclient, clusterid=cluster.id, type='Routing')
    if isinstance(self.hosts, list) and len(self.hosts) >= 2:
        suitablecluster = cluster
        break
self.assertTrue(isinstance(self.hosts, list) and len(self.hosts) >= 2, msg = "Atleast 2 hosts required in cluster for VM HA test")
 

Tag the hosts in the cluster so that the HA-enabled VM can be deployed in this cluster.

# update host tags
for host in self.hosts:
    Host.update(self.apiclient, id=host.id, hosttags=self.testdata["service_offering"]["hasmall"]["hosttags"])

 

Deploy HA VM

# deploy ha vm
self.virtual_machine = VirtualMachine.create(
    self.apiclient,
    self.testdata["virtual_machine"],
    accountid=self.account.name,
    zoneid=self.zone.id,
    domainid=self.account.domainid,
    serviceofferingid=self.service_offering.id,
    templateid=self.template.id)

To simulate failure of the host on which the HA VM is running, the following mocks need to be created. The call below creates a mock for the agent command 'PingCommand' that returns failure (result:fail) for the agent/resource identified by zoneid, podid, clusterid and hostid. Possible values for 'result' are fail and fault. To create a mock with a more generic scope, leave out hostid, clusterid, podid and zoneid, in that order. For example, to create a mock for all hosts in a cluster, specify only clusterid, podid and zoneid. All these mocks are persisted in the mock configuration table in the simulator DB.

self.mock_ping = SimulatorMock.create(
    apiclient=self.apiclient,
    command="PingCommand",
    zoneid=suitablecluster.zoneid,
    podid=suitablecluster.podid,
    clusterid=suitablecluster.id,
    hostid=self.virtual_machine.hostid,
    value="result:fail")
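
The scoping rule described above can be sketched in plain Python. This is only an illustration, not simulator code: the helper name mock_scope, the label "global" for a mock with no ids, and the choice to raise an error when a narrower id is given without its broader ids are all assumptions made for the example.

```python
from typing import Optional


def mock_scope(zoneid: Optional[str] = None,
               podid: Optional[str] = None,
               clusterid: Optional[str] = None,
               hostid: Optional[str] = None) -> str:
    """Return the scope implied by the ids that were passed (hypothetical helper)."""
    # Ordered from the broadest level to the narrowest one.
    levels = [("zone", zoneid), ("pod", podid),
              ("cluster", clusterid), ("host", hostid)]
    scope = "global"  # assumption: no ids at all means the most generic scope
    gap_seen = False
    for name, value in levels:
        if value is None:
            gap_seen = True
        elif gap_seen:
            # Assumption: a narrower id without all broader ids is rejected.
            raise ValueError(f"{name} id given without all broader ids")
        else:
            scope = name
    return scope


print(mock_scope(zoneid="z1", podid="p1", clusterid="c1"))  # cluster
print(mock_scope(zoneid="z1", podid="p1"))                  # pod
```

Under these assumptions, the 'PingCommand' mock above has host scope, because all four ids are passed.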

After 3 ping failures, an investigation starts. First, a 'CheckHealthCommand' is issued to check the health of the host for which 'PingCommand' failed. After that, the various investigators are invoked to check whether the host is alive. The investigation stops as soon as an investigator is able to conclusively determine the state of the host. There is a simulator investigator which does this by issuing 'CheckOnHostCommand' from other hosts (in 'Up' state) in the cluster. If the investigator returns the host status as 'Down', HA is triggered for HA-enabled VMs.

self.mock_checkhealth = SimulatorMock.create(
    apiclient=self.apiclient,
    command="CheckHealthCommand",
    zoneid=suitablecluster.zoneid,
    podid=suitablecluster.podid,
    clusterid=suitablecluster.id,
    hostid=self.virtual_machine.hostid,
    value="result:fail")
self.mock_checkonhost_list = []
for host in self.hosts:
    if host.id != self.virtual_machine.hostid:
        self.mock_checkonhost_list.append(SimulatorMock.create(
            apiclient=self.apiclient,
            command="CheckOnHostCommand",
            zoneid=suitablecluster.zoneid,
            podid=suitablecluster.podid,
            clusterid=suitablecluster.id,
            hostid=host.id,
            value="result:fail"))
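
The "stop at the first conclusive investigator" behaviour described above can be modelled as follows. This is a simplified sketch and not the CloudStack implementation: the function investigate and the three stand-in investigators are hypothetical, and an investigator is assumed to return "Up", "Down", or None when it cannot decide.

```python
from typing import Callable, Iterable, Optional

# Assumption: an investigator returns "Up", "Down", or None if inconclusive.
Investigator = Callable[[str], Optional[str]]


def investigate(host_id: str, investigators: Iterable[Investigator]) -> Optional[str]:
    """Ask each investigator in turn and stop at the first conclusive answer."""
    for check in investigators:
        status = check(host_id)
        if status is not None:
            return status
    return None  # nobody could determine the host state


def inconclusive(host_id: str) -> Optional[str]:
    return None


def neighbours_say_down(host_id: str) -> Optional[str]:
    return "Down"


def never_reached(host_id: str) -> Optional[str]:
    raise AssertionError("should not be called after a conclusive answer")


status = investigate("host-1", [inconclusive, neighbours_say_down, never_reached])
trigger_ha = status == "Down"
print(status, trigger_ha)  # Down True
```

In the test, the mocks play the role of making the relevant investigator report the host as 'Down', so that HA is triggered.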

The HA process is then triggered. As part of restarting the VM on another host, there is first a check to see whether the VM is alive, using 'CheckVirtualMachineCommand' and, again, the various investigators.

self.mock_checkvirtualmachine = SimulatorMock.create(
    apiclient=self.apiclient,
    command="CheckVirtualMachineCommand",
    zoneid=suitablecluster.zoneid,
    podid=suitablecluster.podid,
    clusterid=suitablecluster.id,
    hostid=self.virtual_machine.hostid,
    value="result:fail")

The following mock prevents the UserVmDomRInvestigator from determining the host state. Note that clusterid and hostid are not passed for this mock, so its scope is the entire pod.

self.mock_pingtest = SimulatorMock.create(
    apiclient=self.apiclient,
    command="PingTestCommand",
    zoneid=suitablecluster.zoneid,
    podid=suitablecluster.podid,
    value="result:fail")

Actual test

The actual test waits for HA to happen and then validates that the HA VM has moved to another host in the cluster.
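
One way to implement such a wait is a small polling loop with a timeout. The helper below is a generic sketch, not the code used in test_vm_ha.py: the name wait_until and its parameters are made up for this example, and the predicate in the demo is a stand-in for "the VM is now reported on a different host".

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 1.0) -> bool:
    """Poll predicate until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demo with a stand-in predicate that becomes true on the third poll.
polls = {"n": 0}


def vm_moved() -> bool:
    polls["n"] += 1
    return polls["n"] >= 3


print(wait_until(vm_moved, timeout=5, interval=0.01))  # True
```

In a real test the predicate would look up the VM through the API and compare its current host id with the id of the host on which it was originally deployed.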

In this case the 'count' and 'jsonResponse' parameters are not used when creating the mocks. They are used in other test cases.

The count parameter makes the mock active for only 'count' executions. Every time the mock executes successfully, count is decremented by 1. This parameter can be used to verify that the mock was actually executed as expected, and that a test failure is caused by the mock and not by some other issue. For example, take a look at test/integration/smoke/misc/test_deploy_vm.py, which contains tests for VM deployment retry logic.
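
The count semantics can be illustrated with a toy model. The class CountedMock below is hypothetical and only mirrors the behaviour described above (active while count is positive, decremented on each successful execution); it is not the simulator's own data structure.

```python
class CountedMock:
    """Toy model of a mock that stays active for `count` executions."""

    def __init__(self, command: str, count: int):
        self.command = command
        self.count = count

    def is_active(self) -> bool:
        return self.count > 0

    def execute(self) -> bool:
        """Apply the mock once; return False if it is no longer active."""
        if not self.is_active():
            return False
        self.count -= 1
        return True


mock = CountedMock("PingCommand", count=2)
results = [mock.execute() for _ in range(3)]
print(results, mock.count)  # [True, True, False] 0
```

A test can check the remaining count after the scenario has run: a value of zero indicates that the mock was hit as many times as expected.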

The jsonResponse parameter is used to pass a JSON string that is expected as the response to an agent command. This data is simply deserialized into a JSON object and returned from the agent layer. For an example, take a look at test/integration/smoke/misc/test_vm_sync.py.
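
As a rough illustration of that round trip, the snippet below builds a JSON string and deserializes it again. The payload is hypothetical: the real field names depend on the agent command being mocked and are not specified in this document.

```python
import json

# Hypothetical payload; real field names depend on the mocked agent command.
answer = {"result": True, "details": "simulated answer"}

# This string is what would be passed as the jsonResponse value.
json_response = json.dumps(answer)

# The agent layer is described as deserializing the string and returning it.
returned = json.loads(json_response)
print(returned == answer)  # True
```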

Cleanup

As part of cleanup, clear all the mocks created during setup so that subsequent tests are not impacted by them. The mocks are cleaned up by simply setting the 'removed' field.
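
A cleanup routine along these lines could look like the sketch below. It assumes that each mock object exposes a delete(apiclient) method; that method name, the helper cleanup_mocks and the FakeMock stand-in are assumptions made for this example. The helper keeps going when a deletion fails, so that one error does not leave the remaining mocks active.

```python
def cleanup_mocks(apiclient, mocks):
    """Delete every mock, even if some deletions fail; return the failures."""
    failures = []
    for mock in mocks:
        try:
            # Assumption: mock objects expose delete(apiclient).
            mock.delete(apiclient)
        except Exception as exc:
            failures.append((mock, exc))
    return failures


class FakeMock:
    """Stand-in for a simulator mock, used only to exercise the helper."""

    def __init__(self, fail=False):
        self.removed = False
        self.fail = fail

    def delete(self, apiclient):
        if self.fail:
            raise RuntimeError("delete failed")
        self.removed = True


mocks = [FakeMock(), FakeMock(fail=True), FakeMock()]
failures = cleanup_mocks(None, mocks)
print(len(failures), [m.removed for m in mocks])  # 1 [True, False, True]
```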
