Problem Statement
Currently CloudStack does not offer a flexible, pluggable framework that lets users easily integrate and configure third-party object stores for backup services such as registering templates and taking snapshots. Along with Edison's recently refactored storage subsystem 2.0, we are proposing a storage backup object store plugin framework that allows CloudStack to systematically manage and configure various types of backup data stores from different vendors.
With this new plugin framework, we would like to achieve the following functionalities:
Support pluggable object stores as secondary storage
This lets us easily add an S3-based object storage as secondary storage in CloudStack for storing all artifacts currently kept in secondary storage: snapshots, ISOs, and templates. Although CloudStack already has business logic to handle S3, Swift, and NFS secondary storage, the code that interacts with these vendor-specific providers is closely mingled with CloudStack orchestration code (TemplateManagerImpl, SnapshotManagerImpl, etc.) through an unwieldy if-else control flow. With such tight coupling, it becomes very hard to extend the framework to support new object store providers. By extracting a pluggable ImageDataStore service interface, we can wrap the vendor-specific logic inside each provider plugin; CloudStack orchestration code is then greatly simplified and gains a uniform, extensible way to manage different object store vendors.
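As a rough illustration, a provider plugin might implement a contract along these lines. This is a minimal sketch only; the interface and method names below are assumptions, not the actual CloudStack contracts:

```java
// Minimal sketch of a pluggable image store driver. All names here are
// illustrative assumptions, not CloudStack's actual interfaces.
public interface ImageDataStoreDriver {
    /** Provider name surfaced to admins, e.g. "S3", "Swift", "NFS". */
    String getProviderName();

    /** Copy a template/ISO from the given source URI into this store. */
    void copyToStore(String sourceUri, String destPath);

    /** Physically delete an object (template, ISO, snapshot) from this store. */
    void deleteObject(String path);

    /** Produce a URL from which a stored object can be downloaded. */
    String createDownloadUrl(String path);
}
```

With such a contract in place, orchestration code like TemplateManagerImpl can dispatch to whichever driver is configured instead of branching on the store type.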
Enable region-wide object backup using an S3-like object store
Currently CloudStack only allows registering a template to a specific zone using NFS secondary storage. To make that template available in another zone, the user has to invoke copyTemplate to copy it from one zone to another. With this new plugin framework, customers can plug in their own object store providers, such as S3 or Swift, and choose them as the backup store. CloudStack can then provide an option in the *registerTemplate* API to create either a "perZone" or a "regionWide" template, as sketched below.
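A hedged sketch of how such a scope option might be honored (TemplateScope, ImageStore, and TemplateRegistry are hypothetical names used only for illustration):

```java
// Hedged sketch: a scope option on registerTemplate. All names here
// (TemplateScope, ImageStore, TemplateRegistry) are hypothetical.
import java.util.Map;

public class TemplateRegistry {

    public enum TemplateScope { PER_ZONE, REGION_WIDE }

    /** Minimal stand-in for an image store that can ingest a template. */
    public interface ImageStore {
        void register(String templateUrl);
    }

    private final ImageStore regionWideStore;
    private final Map<Long, ImageStore> zoneStores;

    public TemplateRegistry(ImageStore regionWideStore, Map<Long, ImageStore> zoneStores) {
        this.regionWideStore = regionWideStore;
        this.zoneStores = zoneStores;
    }

    public void registerTemplate(String url, TemplateScope scope, long zoneId) {
        if (scope == TemplateScope.REGION_WIDE) {
            regionWideStore.register(url);        // stored once, visible to every zone
        } else {
            zoneStores.get(zoneId).register(url); // legacy per-zone behavior
        }
    }
}
```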
Provide cache storage framework
For certain hypervisors, a cache storage is required when moving data (templates, snapshots, etc.) between primary storage and the backup object store. The cache storage acts as intermediate storage, temporarily holding templates, volumes, and snapshots. CloudStack currently uses an admin-provided NFS server as the only cache storage and takes it for granted that cache storage is zone-wide.
This approach has drawbacks:
- Admins can't add other types of storage as cache storage.
- The operations related to template and snapshot upload are deeply coupled with cache storage, which is unnecessary and introduces a scalability issue on the cache storage.
With the cache storage framework, developers will be able to add other types of cache storage into CloudStack, and admins will decide how cache storage is used. A possible provider contract is sketched below.
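A minimal sketch of what such a cache storage provider contract could look like (the names are illustrative assumptions, not the framework's final interfaces):

```java
// Hedged sketch of a cache storage provider contract; the names are
// illustrative assumptions, not the framework's final interfaces.
public interface CacheStorageProvider {

    enum CacheScope { ZONE, POD, CLUSTER }

    /** Provider name, e.g. "NFS", "Ceph", "TransportVM". */
    String getName();

    /** Scope at which this cache operates. */
    CacheScope getScope();

    /** Reserve staging space for an object of the given size; returns its path. */
    String allocate(long sizeInBytes);

    /** Release a previously allocated staging path. */
    void release(String path);
}
```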
Provide scalable cache storage solution
The current NFS-based cache storage solution is not scalable:
- NFS cache storage is zone-wide and limits how much concurrent I/O the cache storage can sustain.
- CloudStack does support multiple NFS cache storages, but lacks any auto-scaling capability based on I/O load. Whenever cache storage is needed, CloudStack simply selects one of the NFS cache storages at random; there is no policy for choosing a cache storage from the pool based on meaningful criteria.
To scale cache storage, possible solutions include (see the selection-strategy sketch after this list):
- Cache storage doesn't need to be zone-wide; it can be pod- or cluster-wide.
- CloudStack can create its own cache storage VMs (VMs created by CloudStack that present an NFS export back to CloudStack), so the admin doesn't need to set up many NFS servers manually; all cache storages are handled by CloudStack itself.
- Add another type of storage, such as Ceph, as cache storage.
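As one illustration, the random pick could be replaced by a pluggable selection policy along these lines (CacheStorage and its load metric are hypothetical names; a least-loaded policy is shown purely as an example):

```java
// Hedged sketch of a pluggable cache-selection policy replacing today's
// random pick; CacheStorage and its load metric are hypothetical names.
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public interface CacheSelectionStrategy {

    /** Minimal view of a cache storage for selection purposes. */
    interface CacheStorage {
        long getFreeBytes();
        int getActiveCopies(); // rough proxy for current I/O load
    }

    /** Pick a cache able to stage an object of the given size, if any. */
    Optional<CacheStorage> select(List<CacheStorage> candidates, long requiredBytes);

    /** Example policy: the least-loaded cache that still has enough free space. */
    static CacheSelectionStrategy leastLoaded() {
        return (candidates, requiredBytes) -> candidates.stream()
                .filter(c -> c.getFreeBytes() >= requiredBytes)
                .min(Comparator.comparingInt(CacheStorage::getActiveCopies));
    }
}
```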
Provide pluggable data motion strategies to handle data transfer
Current CloudStack code offers only one way to move data between primary and secondary storage: data moves between primary storage and NFS secondary storage, and then between NFS secondary storage and object storage. This code is highly coupled to the current CloudStack storage model and hard to extend.
The new pluggable data motion strategies will make it easier for developers to write new strategies (see the interface sketch after this list). Possible strategies include:
- Move data between primary storage and object storage directly, bypassing NFS cache storage entirely. This is possible for VMware and KVM.
- Move data between primary storage and object storage through a cache storage. This is needed for certain operations on XenServer.
- Move data between primary storages through hypervisor-specific storage motion functionality, such as XenMotion, vMotion, etc.
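A hedged sketch of what the strategy contract could look like (these names are illustrative and may not match the final CloudStack interfaces):

```java
// Hedged sketch of a pluggable data motion strategy; these names are
// illustrative and may not match the final CloudStack interfaces.
public interface DataMotionStrategy {

    /** Opaque handle to data (template, volume, snapshot) on some store. */
    interface DataObject {
        String getUri();
    }

    /** Can this strategy move data between the two given locations? */
    boolean canHandle(DataObject source, DataObject destination);

    /** Perform the copy, e.g. directly, via a cache, or via hypervisor motion. */
    void copy(DataObject source, DataObject destination);
}
```

The orchestrator can then iterate over the registered strategies and use the first one whose canHandle returns true, letting direct, cache-mediated, and hypervisor-native strategies coexist.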
Storage DataStore Plugin Framework
In this refactored backup datastore plugin framework, we clearly define the pluggable service interfaces, such as PrimaryDataStore, ImageDataStore, DataMotionStrategy, AutoScaleStrategy, etc., so that different storage providers can develop vendor-specific plugins against well-defined contracts that CloudStack orchestration can manage seamlessly. A minimal provider sketch follows.
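For illustration, the entry point through which a vendor plugin surfaces itself to the orchestration layer might look roughly like this (a sketch under assumed names, not the definitive interface):

```java
// Hedged sketch of how a vendor plugin might surface itself to the
// orchestration layer; names here are assumptions, not the final contract.
import java.util.Map;

public interface DataStoreProvider {

    enum StoreRole { PRIMARY, IMAGE, CACHE }

    /** Unique provider name, e.g. "CloudStack NFS", "Amazon S3". */
    String getName();

    /** Which store role this provider can fill. */
    StoreRole getRole();

    /** Called once at startup with provider-specific configuration. */
    boolean configure(Map<String, Object> params);
}
```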
DB Schema
This is the new DB model related to data stores.
The following are deprecated tables:
- template_host_ref (replaced with template_store_ref)
- template_s3_ref (replaced with template_store_ref)
- template_swift_ref (replaced with template_store_ref)
- volume_host_ref (replaced with volume_store_ref)
- s3 (replaced with image_data_store)
- swift (replaced with image_data_store)
API Changes
The following changes in API behavior will be introduced with the object store plugin framework:
image store commands
- addImageStoreCmd - Handles adding an image store from different providers, including NFS, S3, Swift, etc. (a hypothetical request shape is sketched after this list)
- listImageStoreCmd
- deleteImageStoreCmd
- enableImageStoreCmd
- listStorageProvidersCmd - List all enabled image store providers (with type=IMAGE)
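To make the addImageStoreCmd parameters concrete, the request might carry fields along these lines (the field names are assumptions based on the provider model above, not the final API):

```java
// Hedged sketch of the request shape addImageStoreCmd might accept;
// the field names are assumptions based on the provider model above.
import java.util.Map;

public class AddImageStoreRequest {
    public String name;                  // display name of the store
    public String provider;              // e.g. "NFS", "S3", "Swift"
    public String url;                   // provider-specific endpoint or export path
    public Long zoneId;                  // null for a region-wide store
    public Map<String, String> details;  // provider-specific keys, e.g. S3 credentials
}
```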
The following previous APIs are to be deprecated:
- addSecondaryStorageCmd
- addS3Cmd
- addSwiftCmd
- listS3Cmd
- listSwiftCmd
snapshot commands
template/iso commands
- No changes: createTemplate, updateTemplate, listTemplates, updateTemplatePermissions, listTemplatePermissions, prepareTemplate
- To be changed:
  - registerTemplate - would register the URI; the download work would now be done by the S3 provider.
  - deleteTemplate - would just unregister the template; physically deleting it from the object store should happen through the S3 API.
  - extractTemplate - would just return the S3 URL (see the sketch after this list).
  - copyTemplate - would simply return success, since the template would be available region-wide.
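As an illustration of the extractTemplate change, the command could simply return a store-generated download URL. The ObjectStoreClient type below is hypothetical, and an S3 pre-signed URL is assumed as one possible mechanism:

```java
// Hedged sketch: extractTemplate just returns a store-generated download
// URL (e.g. an S3 pre-signed URL); ObjectStoreClient is a hypothetical type.
public class TemplateExtractor {

    /** Minimal stand-in for an object store that can mint download URLs. */
    public interface ObjectStoreClient {
        String presignedUrl(String objectPath, int expirySeconds);
    }

    private final ObjectStoreClient store;

    public TemplateExtractor(ObjectStoreClient store) {
        this.store = store;
    }

    /** No copy out of the store is needed; callers download directly. */
    public String extractTemplate(String templatePath) {
        return store.presignedUrl(templatePath, 3600); // 1-hour validity assumed
    }
}
```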
volume commands
- No changes: attachVolume, detachVolume, deleteVolume, listVolumes
- To be changed:
  - extractVolume - open question: do we now put the volume into S3 internally?
  - uploadVolume - similar to registerTemplate; the user just needs to provide the S3 URL here.
  - createVolume - when creating a volume from a snapshot, the zone id needs to be specified.
  - migrateVolume - should work as-is, using scratch storage instead of NFS secondary storage.
Tasks:
- Backup storage framework
  - refactoring the snapshot service
  - refactoring the template service
  - refactoring the data motion strategy plugin
- Autoscaling cache storage framework
  - cache storage plugin interface
  - autoscaling strategy plugin
  - NFS-based cache storage implementation
  - NFS transport VM based cache storage implementation
  - NFS cache storage vs. NFS transport VM based cache storage
- Image store plugins on the management server side
  - classical NFS plugin
  - S3 plugin
  - Swift plugin
- Image store plugins on the hypervisor side
  - VMware
  - KVM
  - XenServer
- API changes
  - end user APIs
  - admin APIs