The problem:

In 4.2, we added VM snapshot support for VMware and XenServer. The current workflow is as follows:
createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> send CreateVMSnapshotCommand to the hypervisor to create the VM snapshot.

Anyone who wants to change this workflow must either modify VMSnapshotManagerImpl directly or subclass it. Neither choice is ideal: VMSnapshotManagerImpl should be able to accommodate different ways of taking a VM snapshot instead of hard-coding one.

The requirements for a pluggable VM snapshot come from:
Storage vendors, such as NetApp, may have their own optimizations.
A VM snapshot can be implemented in a totally different way. For example, I could send a command to the guest VM telling my application to flush the disk and hold disk writes, then go to the hypervisor to take a volume snapshot.

The possible options:

1. Coarse-grained interface. Add a VMSnapshotStrategy interface, which has the following methods:
VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
Boolean deleteVMSnapshot(VMSnapshot vmSnapshot);

The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
VMSnapshotManagerImpl will manage the VM state and do the sanity checks, then hand over to VMSnapshotStrategy.
A VMSnapshotStrategy implementation may simply send a create/revert/delete VMSnapshotCommand to the hypervisor host, or perform any special operations it needs.
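The hand-over described above can be sketched roughly as follows. This is a minimal illustration, not the actual CloudStack code: the VMSnapshot stand-in, its state strings, the HypervisorVMSnapshotStrategy and VMSnapshotManager classes, and the recorded command strings are all hypothetical, and the methods return primitive boolean for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the VMSnapshot entity; only what the sketch needs.
final class VMSnapshot {
    final String name;
    String state = "Allocated"; // illustrative state names, not taken from the source
    VMSnapshot(String name) { this.name = name; }
}

// The coarse-grained interface from option 1.
interface VMSnapshotStrategy {
    VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
    boolean revertVMSnapshot(VMSnapshot vmSnapshot);
    boolean deleteVMSnapshot(VMSnapshot vmSnapshot);
}

// Default behaviour: "send" a command to the hypervisor host.
// The commands are only recorded so the sketch runs without a hypervisor.
final class HypervisorVMSnapshotStrategy implements VMSnapshotStrategy {
    final List<String> sentCommands = new ArrayList<>();

    @Override
    public VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot) {
        sentCommands.add("create " + vmSnapshot.name);
        vmSnapshot.state = "Ready";
        return vmSnapshot;
    }

    @Override
    public boolean revertVMSnapshot(VMSnapshot vmSnapshot) {
        sentCommands.add("revert " + vmSnapshot.name);
        return true;
    }

    @Override
    public boolean deleteVMSnapshot(VMSnapshot vmSnapshot) {
        sentCommands.add("delete " + vmSnapshot.name);
        return true;
    }
}

// The manager keeps the sanity checks and delegates the real work to the strategy.
final class VMSnapshotManager {
    private final VMSnapshotStrategy strategy;

    VMSnapshotManager(VMSnapshotStrategy strategy) { this.strategy = strategy; }

    VMSnapshot creatVMSnapshot(VMSnapshot vmSnapshot) {
        if (vmSnapshot == null || vmSnapshot.name.isEmpty()) {
            throw new IllegalArgumentException("a named VM snapshot is required");
        }
        return strategy.takeVMSnapshot(vmSnapshot);
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        HypervisorVMSnapshotStrategy strategy = new HypervisorVMSnapshotStrategy();
        VMSnapshot result = new VMSnapshotManager(strategy).creatVMSnapshot(new VMSnapshot("snap-1"));
        System.out.println(result.state + " " + strategy.sentCommands);
    }
}
```

A storage vendor would plug in by supplying a different VMSnapshotStrategy implementation; the manager code stays unchanged.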

2. Fine-grained interface. In addition to the VMSnapshotStrategy interface, add certain methods to the storage driver.
The VMSnapshotStrategy interface will be the same as in option 1.
The following methods will be added to the storage driver:
/* volumesBelongToVM is the list of the VM's volumes that were created on this storage. The storage vendor can either take one snapshot of all these volumes in one shot, or take a snapshot of each volume separately.
The pre-condition: the VM is quiesced.
It will return a Boolean indicating whether the VM still needs to be unquiesced.
In the default storage driver, it will return false.
*/
boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot vmSnapshot);
Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot vmSnapshot);
Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot vmSnapshot);

The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage driver: takeVMSnapshot
In the implementation of VMSnapshotStrategy's takeVMSnapshot, the pseudo code looks like:
HypervisorHelper.quiesceVM(vm)
// group the VM's volumes by the storage driver that owns them
val volumesByDriver = vm.getVolumes().groupBy(volume => volume.getDriver())
var needUnquiesce = true
// call the driver first, so that && cannot short-circuit and skip a driver
volumesByDriver.foreach { case (driver, volumes) => needUnquiesce = driver.takeVMSnapshot(volumes, vmSnapshot) && needUnquiesce }
// unquiesce only if the drivers report that it is still needed
if (needUnquiesce) HypervisorHelper.unquiesceVM(vm)

By default, quiesceVM in HypervisorHelper will actually take a VM snapshot through the hypervisor.
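The grouping and result-combining part of that pseudo code could look roughly like this in Java. It is only a sketch under assumptions: VolumeInfo and StorageDriver are reduced to hypothetical stand-ins, the snapshot is identified by a plain name, RecordingDriver is a test double, and the quiesce/unquiesce calls on HypervisorHelper are left out so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced form of the storage driver method from option 2.
interface StorageDriver {
    // Returns true if the VM still needs to be unquiesced afterwards.
    boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, String vmSnapshotName);
}

// Hypothetical stand-in: a volume only knows its name and its driver.
final class VolumeInfo {
    final String name;
    final StorageDriver driver;
    VolumeInfo(String name, StorageDriver driver) { this.name = name; this.driver = driver; }
}

// Test double that records each call and returns a fixed answer.
final class RecordingDriver implements StorageDriver {
    final boolean needsUnquiesce;
    final List<List<VolumeInfo>> calls = new ArrayList<>();
    RecordingDriver(boolean needsUnquiesce) { this.needsUnquiesce = needsUnquiesce; }

    @Override
    public boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, String vmSnapshotName) {
        calls.add(volumesBelongToVM);
        return needsUnquiesce;
    }
}

final class FineGrainedFlow {
    // Group the VM's volumes by the driver that owns them, keeping first-seen order.
    static Map<StorageDriver, List<VolumeInfo>> groupByDriver(List<VolumeInfo> volumes) {
        Map<StorageDriver, List<VolumeInfo>> byDriver = new LinkedHashMap<>();
        for (VolumeInfo volume : volumes) {
            byDriver.computeIfAbsent(volume.driver, d -> new ArrayList<>()).add(volume);
        }
        return byDriver;
    }

    // Call every driver exactly once and AND the answers together.
    static boolean takeVMSnapshot(List<VolumeInfo> volumes, String vmSnapshotName) {
        boolean needUnquiesce = true;
        for (Map.Entry<StorageDriver, List<VolumeInfo>> entry : groupByDriver(volumes).entrySet()) {
            // Evaluate the driver call before combining, so no driver is skipped.
            boolean driverAnswer = entry.getKey().takeVMSnapshot(entry.getValue(), vmSnapshotName);
            needUnquiesce = needUnquiesce && driverAnswer;
        }
        return needUnquiesce;
    }
}

class FineGrainedDemo {
    public static void main(String[] args) {
        RecordingDriver a = new RecordingDriver(false);
        RecordingDriver b = new RecordingDriver(true);
        List<VolumeInfo> volumes = Arrays.asList(
                new VolumeInfo("root", a), new VolumeInfo("data1", b), new VolumeInfo("data2", a));
        System.out.println("need unquiesce: " + FineGrainedFlow.takeVMSnapshot(volumes, "snap-1"));
    }
}
```

Calling the driver before combining the flags matters: writing `needUnquiesce && driver.takeVMSnapshot(...)` directly would stop calling drivers as soon as one of them returned false.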

3. The pros and cons of each option:

The pro of option 1 is that it is simple: there is no need to change the storage driver interfaces. The con is that each storage vendor needs to implement its own strategy, and many of them may end up doing the same thing.
The pro of option 2 is that the storage driver does not need to worry about how to quiesce/unquiesce the VM. The con is that it adds these methods to every storage driver, so it assumes that this work flow will work for everybody.

The final design 

Per the discussion on the mailing list, people prefer the former option: provide only a coarse-grained interface for now.

The following changes are made:

1. UI changes

a checkbox: "quiesce", added on taking vm snapshot

a checkbox: "quiesce" added on taking volume snapshot

2. API changes:

a new parameter: quiescevm, added in createVMSnapshot api

a new parameter: quiescevm, added in createSnapshot api

a new parameter: caps, added in ListStoragePools api, which indicates the capabilities of the storage pool. For example, if the storage pool supports "quiesce volume snapshot", it should return "VOLUME_SNAPSHOT_QUIESCEVM" = "true"; the UI can then show the "quiesce vm" option when taking a volume snapshot, provided the volume was created on this storage pool.
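A client-side check of that capability might look like the sketch below. It assumes the caps value has already been parsed into a string-to-string map; the StoragePoolCaps class and its method name are hypothetical, and only the "VOLUME_SNAPSHOT_QUIESCEVM" key comes from the text above.

```java
import java.util.Collections;
import java.util.Map;

final class StoragePoolCaps {
    static final String VOLUME_SNAPSHOT_QUIESCEVM = "VOLUME_SNAPSHOT_QUIESCEVM";

    // True only when the pool explicitly reports the capability as "true".
    // A missing map or a missing key is treated as "not supported".
    static boolean supportsQuiescedVolumeSnapshot(Map<String, String> caps) {
        return caps != null && Boolean.parseBoolean(caps.get(VOLUME_SNAPSHOT_QUIESCEVM));
    }
}

class StoragePoolCapsDemo {
    public static void main(String[] args) {
        Map<String, String> caps = Collections.singletonMap("VOLUME_SNAPSHOT_QUIESCEVM", "true");
        // The UI would show the "quiesce vm" option only when this is true.
        System.out.println(StoragePoolCaps.supportsQuiescedVolumeSnapshot(caps));
    }
}
```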

3. mgt server code changes:

refactor VMSnapshotManagerImpl, move the code into StorageStrategyFactory and VMSnapshotHelper

add a new helper class, HypervisorHelper, which has quiesceVm and unquiesceVm methods

The result 

1. For VM snapshot:

  If "quiesce" is chosen by the user:

     For VMware, the VM snapshot will quiesce the VM; if the option is not chosen, it won't.

     For XenServer, this option has no effect, as the current code doesn't use XenServer's snapshot-with-quiesce api at all.

2. For volume snapshot:

    The "quiesce" option has no effect for either VMware or XenServer. Implementing it, if possible at all, would require substantial changes to our current code.

3. If the primary storage is not one of the default storage types supported by CloudStack (NetApp, for example), the statements above do not hold. Different storage providers can choose to implement "quiesce" differently.
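For default storage, the behaviour listed in items 1 and 2 reduces to a single condition, sketched here. The QuiesceResult class, its enums, and the method name are hypothetical and exist only to restate the list; storage providers covered by item 3 may behave differently.

```java
final class QuiesceResult {
    enum Hypervisor { VMWARE, XENSERVER }
    enum SnapshotKind { VM_SNAPSHOT, VOLUME_SNAPSHOT }

    // On default storage, "quiesce" only takes effect for VM snapshots on VMware.
    static boolean quiesceTakesEffect(Hypervisor hypervisor, SnapshotKind kind, boolean requested) {
        return requested
                && kind == SnapshotKind.VM_SNAPSHOT
                && hypervisor == Hypervisor.VMWARE;
    }
}

class QuiesceResultDemo {
    public static void main(String[] args) {
        for (QuiesceResult.Hypervisor h : QuiesceResult.Hypervisor.values()) {
            for (QuiesceResult.SnapshotKind k : QuiesceResult.SnapshotKind.values()) {
                System.out.println(h + " " + k + " -> " + QuiesceResult.quiesceTakesEffect(h, k, true));
            }
        }
    }
}
```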
