Bug Reference

https://issues.apache.org/jira/browse/CLOUDSTACK-714

Branch

master, 4.2.0

Purpose

CloudStack artifacts such as templates, ISOs and snapshots are stored on NFS-based secondary storage. This presents scalability and performance issues: when a number of hosts access secondary storage concurrently, typically via the SSVM, performance degrades. The idea is therefore to use S3-based object storage as secondary storage. The major benefits of doing this are:

  • The object store provides built-in high availability, so we get much better scalability and performance.
  • With an object store, secondary storage data is available across all zones in a region. This is a huge benefit, as it removes the need to copy templates, snapshots etc. across zones the way an NFS-only environment requires. It also helps disaster recovery when a zone goes down.

PRD

https://cwiki.apache.org/confluence/display/CLOUDSTACK/S3-based+secondary+Storage

Acronyms

  • OS - Object Store
  • PS - Primary storage
  • SS - Scratch/cache storage.

Deployment Model

  • When the admin upgrades to 4.2, he will continue to use zone-wide NFS secondary storage. The Image Transfer Service VM (a.k.a. the old SSVM), which transfers images in and out of zones, will continue to work as-is with the same functionality. However, this VM should be launched by the admin instead of being auto-launched as it is today. This would be part of the framework so people can reuse/relaunch this VM.
  • At any point in 4.2 the admin can choose to "replace" the NFS secondary storage with an S3-based object store, which will act as the new secondary storage. The object store is visible to the entire region, unlike NFS secondary storage, which was zone-wide. Both might coexist for some time depending on the migration strategy. The workflow will be as follows:
    • Migration needs to be thought through in more depth, but the idea is to have a scalable copy mechanism that reliably copies data from NFS to the OS.
    • The system admin launches an NFS Export Service VM per zone. This VM allocates a large disk from primary storage to use as its cache; it exposes an NFS export and downloads objects from the object store.
    • From the hypervisor side, it looks as if it were connecting to the old secondary storage (NFS), but it is actually connecting to the NFS Export Service VM.
    • For system VMs to come up, the admin seeds the system VM template on the object store. For XS and KVM we inject scripts so the hypervisor can pull the template directly onto its PS. VMware doesn't need any work since it can work with a URI.
    • CS core no longer handles template downloads; all it needs is the URI of the template. The user uploads his template to the object store, which gives him a URL, and registers that URL with CS core.

Major Changes with the deployment model

  • NFS Export Service VM -
    • An admin-launched VM with an NFS export that serves as the per-zone cache for templates and snapshots. The disk would be carved out of primary storage. A Java agent would run on this VM, and we can leverage some of the existing SSVM functionality.
    • Moving objects between SS and OS - new functionality added to the NFS VM to move objects back and forth between SS and OS.
    • Moving objects between HV and SS - leverage existing SSVM functionality and reuse the same code.
  • Carving out Scratch Storage (SS) -
    • When the admin creates an NFS VM, create a data disk from PS and export it over NFS to the VM.
  • Seeding system VM templates - The admin seeds the system VM template on the object store. For XS and KVM we inject scripts so the hypervisor can pull the template directly onto its PS. VMware doesn't need any work since it can work with a URI.
  • API changes - We need to keep the APIs backward compatible since the object store can be introduced at any time in 4.2. The following behavior changes take effect once the object store is introduced:
    • snapshot commands - createSnapshot, listSnapshots, deleteSnapshot, createSnapshotPolicy, deleteSnapshotPolicies, listSnapshotPolicies - no changes.
    • template/iso commands - createTemplate, updateTemplate, listTemplates, updateTemplatePermissions, listTemplatePermissions, prepareTemplate - no changes.
      • registerTemplate - would register the URI; the download work is now handled by S3.
      • deleteTemplate - would just unregister the template; physically deleting it from the OS should happen through the S3 API.
      • extractTemplate - would just return the S3 URL.
      • copyTemplate - would return success, since the template is available region-wide.
    • volume commands - attachVolume, detachVolume, deleteVolume, listVolumes - no changes.
      • extractVolume - do we now put the volume into S3 internally?
      • uploadVolume - similar to registerTemplate; the user only needs to supply the S3 URL.
      • createVolume - when creating a volume from a snapshot, the zone id needs to be specified.
      • migrateVolume - should work as-is, using scratch storage instead of NFS secondary storage.
  • CRUD APIs for Object Store -
    • add/list/update/delete ObjectStore APIs need to be introduced (see the sketch below).
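    A rough sketch of what the add call could look like, written in CloudStack's @APICommand/@Parameter style. The command name, parameters, and response class are illustrative assumptions, not the final API:

      import org.apache.cloudstack.api.APICommand;
      import org.apache.cloudstack.api.BaseCmd;
      import org.apache.cloudstack.api.Parameter;

      // Hypothetical sketch; ObjectStoreResponse and all names below are placeholders.
      @APICommand(name = "addObjectStore", responseObject = ObjectStoreResponse.class,
                  description = "Adds an S3-based object store as region-wide secondary storage")
      public class AddObjectStoreCmd extends BaseCmd {
          @Parameter(name = "url", type = CommandType.STRING, required = true, description = "S3 endpoint URL")
          private String url;

          @Parameter(name = "accesskey", type = CommandType.STRING, required = true, description = "S3 access key")
          private String accessKey;

          @Parameter(name = "secretkey", type = CommandType.STRING, required = true, description = "S3 secret key")
          private String secretKey;

          @Parameter(name = "bucket", type = CommandType.STRING, required = true, description = "bucket holding templates/ISOs/snapshots")
          private String bucket;

          @Override
          public String getCommandName() { return "addobjectstoreresponse"; }

          @Override
          public long getEntityOwnerId() { return 1; } // system account, illustrative

          @Override
          public void execute() {
              // would delegate to a (hypothetical) object store manager to persist the entry
          }
      }
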
  • CRUD operations on the NFS export VM
    • Introduce a new API, DeploySystemVM - an admin-controlled API to launch the NFS VM. This API will also create the SS and export it over NFS to the NFS VM.
    • Start, Reboot, Destroy, Stop, List, migrate, changeServiceForSystemVm - these APIs will be enhanced to work for the NFS VM as well.
  • Snapshot changes -
    • Storage - snapshots will be stored in the object store, available to the entire region. The per-zone SS will be used for operations on them, such as creating a template or a volume from a snapshot. From the HV's point of view, this stays the same as it was with NFS secondary storage.
    • New functionality - with the OS we will provide the ability to restore a snapshot in any zone of the region. This helps disaster recovery when a zone goes down.
    • Some of the use cases and their flows with the object store are shown here; please find the comprehensive list below.
    • CreateSnapshot - the flow is as follows (a backup sketch follows the diagram below):
      • createSnapshot - the user invokes the createSnapshot API on a volume residing on PS.
      • ManageSnapshotCommand - CS core sends ManageSnapshotCommand to the HV agent, which actually creates the snapshot on PS.
      • Upload - once the snapshot is created, it is backed up to the object store.
        • XS and KVM - we inject scripts into the HV to push the snapshot into the OS.
        • VMware - BackUpSnapshotCommand puts the snapshot on SS, and from SS it is finally pushed to the OS.
[Diagram: diag1 - CreateSnapshot flow]
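    For the XS/KVM upload step, a minimal sketch of what the injected script/agent side could do, assuming the AWS SDK for Java and an S3-compatible endpoint; the bucket name and key layout are illustrative:

      import java.io.File;

      import com.amazonaws.auth.BasicAWSCredentials;
      import com.amazonaws.services.s3.AmazonS3Client;

      public class SnapshotBackup {
          // Pushes a snapshot file from local (PS/SS) storage into the object store.
          // The key layout ("snapshots/<account>/<volume>/<snapshot>") is an
          // illustrative assumption, not the final format.
          public static void backup(String endpoint, String accessKey, String secretKey,
                                    String bucket, String key, File snapshotFile) {
              AmazonS3Client s3 = new AmazonS3Client(new BasicAWSCredentials(accessKey, secretKey));
              s3.setEndpoint(endpoint);               // S3-compatible object store endpoint
              s3.putObject(bucket, key, snapshotFile);
          }
      }
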
  • Template changes -
    • Storage - templates will be stored in the object store, available to the entire region. The per-zone SS will be used to cache templates for faster VM deployment.
    • Some of the use cases and their flows with the object store are shown here; please find the comprehensive list below.
    • PrepareTemplate on PS while deploying a VM - [Diagram: TemplateDiag]
    • CreateTemplate from snapshot for VMware - [Diagram: createTemplateFromSnapshot(Vmware)]
    • CreateTemplate from snapshot for XenServer - [Diagram: CreateTemplateFromSnapshot(Xenserver)]
  • Fitting it into the new storage framework - the following classes will be added; the exact methods to be implemented are TBD (a hedged skeleton follows this item).
    • S3ImageDataStoreDriver - would contain all the specific logic to communicate with the NFS VM.
    • S3DataStoreProvider
    • S3ImageDataStoreImpl
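    An illustrative skeleton of S3ImageDataStoreDriver only; the actual storage-framework interfaces and method signatures are still being worked out and may well differ:

      // Illustrative skeleton; not the final framework contract.
      public class S3ImageDataStoreDriver {
          // Ask the NFS Export Service VM to stage an object from the object
          // store onto the zone-local scratch storage.
          public void copyToScratch(String s3Key, String scratchPath) { /* TODO */ }

          // Ask the NFS Export Service VM to push an object from scratch
          // storage into the object store.
          public void copyToObjectStore(String scratchPath, String s3Key) { /* TODO */ }

          // Drop only the CS-side registration; physical deletion happens
          // through the S3 API.
          public void unregister(String s3Key) { /* TODO */ }
      }
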
  • Sharing credentials between S3 and CS core -
    • CS core would need to know the S3 credentials so that it can store users' objects in the appropriate location in the object store (see the sketch below).
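    A minimal sketch of how CS core could build an S3 client from the registered credentials, assuming the AWS SDK for Java; the factory and parameter names are illustrative:

      import com.amazonaws.auth.BasicAWSCredentials;
      import com.amazonaws.services.s3.AmazonS3Client;

      public class S3ClientFactory {
          // CS core builds an S3 client from the credentials registered via the
          // (proposed) addObjectStore API.
          public static AmazonS3Client fromObjectStoreConfig(String endpoint,
                                                             String accessKey,
                                                             String secretKey) {
              AmazonS3Client s3 = new AmazonS3Client(new BasicAWSCredentials(accessKey, secretKey));
              s3.setEndpoint(endpoint);
              return s3;
          }
      }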

Use cases and their flow

  • Templates
    • Adding Templates -
      • The user uses the S3 APIs to create a bucket in a specific region and put his template into the bucket. The user sets the ACL to make it public/private/domainPublic.
      • The user registers the S3 URL with CS core. (A sketch of these steps follows below.)
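      A minimal sketch of the user-side steps with the AWS SDK for Java; the bucket name, key, and file path are illustrative, and note that domainPublic is a CS-level notion that maps onto S3 ACLs only approximately:

        import java.io.File;

        import com.amazonaws.auth.BasicAWSCredentials;
        import com.amazonaws.services.s3.AmazonS3Client;
        import com.amazonaws.services.s3.model.CannedAccessControlList;

        public class AddTemplate {
            public static void main(String[] args) {
                AmazonS3Client s3 = new AmazonS3Client(new BasicAWSCredentials("ACCESS", "SECRET"));

                s3.createBucket("my-templates");                       // bucket in the user's region
                s3.putObject("my-templates", "centos-6.3.vhd",
                             new File("/tmp/centos-6.3.vhd"));         // upload the template
                s3.setObjectAcl("my-templates", "centos-6.3.vhd",
                                CannedAccessControlList.PublicRead);   // or Private, per the ACL choice

                // This URL is what gets registered with CS core via registerTemplate.
                System.out.println(s3.getUrl("my-templates", "centos-6.3.vhd"));
            }
        }
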
    • Copying templates from Object Store to Primary -
      • The template is first looked for on primary storage (PS).
      • If the PS doesn't have it, the scratch storage (SS), which acts as a zone-level cache for templates, is searched next.
      • If the template is not present on the SS either, the NFS Export Service VM copies it from S3 to SS. Copying the template from SS to PS depends on the hypervisor and leverages existing functionality: on XenServer, CS creates an SR out of the SS and copies the template from the SS SR to the PS SR; on VMware, the NFS Export Service VM copies the template from SS to PS. (A sketch of the lookup order follows below.)
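      A pseudocode-level sketch of the lookup order; the Store interface is an illustrative stand-in, not a framework class:

        // Pseudocode-level sketch of the lookup order.
        public class TemplateLocator {
            interface Store { boolean contains(String templateId); }

            // Returns where the template should be read from: PS first, then the
            // zone-local SS cache, otherwise stage it from S3 into SS.
            public static String locate(Store primary, Store scratch, String templateId) {
                if (primary.contains(templateId)) return "PRIMARY";
                if (scratch.contains(templateId)) return "SCRATCH";
                // NFS Export Service VM downloads S3 -> SS, then the HV-specific
                // SS -> PS copy path (e.g. SR copy on XenServer) takes over.
                return "DOWNLOAD_FROM_S3";
            }
        }
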
    • Creating templates from volume -
      • We will use the SS as an intermediate disk before the template finally lands on the object store.
      • Depending on the hypervisor, the NFS Export Service VM will copy it from PS to SS.
      • Finally, the NFS Export Service VM pushes the template from SS to the object store.
    • Creating templates from snapshots (can be used for DR if one zone goes down) -
      • An SS (ideally the one with the least load) will be used as staging storage to which all the snapshots are brought from S3 for template creation (the NFS Export Service VM does this).
      • Depending on the hypervisor, the NFS Export Service VM might be involved in coalescing the snapshots into a template.
      • Once the template is created, the NFS Export Service VM pushes it back to the object store, and the snapshots and template are removed from the SS.
    • Extracting templates -
      • This should be very straightforward now: we would provide the S3 link to the end user to download from (see the presigned-URL sketch below).
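      A minimal sketch using the AWS SDK's presigned URLs, assuming extract links should expire; the four-hour window is an illustrative choice:

        import java.net.URL;
        import java.util.Date;

        import com.amazonaws.services.s3.AmazonS3Client;

        public class ExtractTemplate {
            // Generates a time-limited download link for the template object.
            public static URL downloadLink(AmazonS3Client s3, String bucket, String key) {
                Date expiry = new Date(System.currentTimeMillis() + 4 * 3600 * 1000L); // 4h, illustrative
                return s3.generatePresignedUrl(bucket, key, expiry);
            }
        }
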
    • Deleting templates -
      • The template first needs to be unregistered from CS. Once it is unregistered, we remove its copy from all the cache storages. If no VM is using the template, we delete its copy from PS as well.
      • Once the template is unregistered, the user can use an S3 call to delete it (how would S3 know that it has been unregistered from CS core?). A sketch follows below.
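      The physical delete the user issues after unregistering is then a plain S3 call; the names here are illustrative:

        import com.amazonaws.services.s3.AmazonS3;

        public class DeleteTemplate {
            // Removes the bits from the object store; CS-side unregistration
            // must already have happened via deleteTemplate.
            public static void delete(AmazonS3 s3, String bucket, String key) {
                s3.deleteObject(bucket, key);
            }
        }
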
    • Template Sync -
      • When the agent on the NFS Export Service VM connects to the MS, we should sync the templates with the DB on the MS.
      • Alternatively, we could delete all the templates, cleaning up the cache. We need to figure out which of the two is less problematic.
    • Seeding System vm templates -
      • The admin seeds the system VM template on the object store. For XS and KVM we inject scripts so the hypervisor can pull the template directly onto its PS. VMware doesn't need any work since it can work with a URI.
  • Snapshots
    • Creating Snapshots -
      • SS will be used as an intermediate disk to which the snapshots are copied.
      • The NFS Export Service VM finally copies them from SS to the object store.
    • Deleting snapshots -
      • The snapshot can be removed directly from S3.
      • If coalescing is required, we leverage an SS to perform the operation.
  • Volumes
    • Upload Volumes -
      • The first step of uploading should be similar to adding templates.
      • Once the volume is uploaded, attaching it to a VM uses the NFS Export Service VM and SS to put the volume on the right PS.
    • Download Volumes -
      • Copy the volume from PS to the object store and finally delete it after the expiration interval.
    • Copy Volumes - this would work as-is.
    • Deleting Volumes (for the ones that are uploaded) -
      • Similar to deleting templates (see "Deleting templates" above).
  • Carving Secondary storage from Primary Storage -
    • Once an admin requests an NFS Export Service VM, and if the configuration is complete (meaning >=1 host and >=1 primary storage), we carve secondary storage out of one of the PSs and attach it to the VM, where it is formatted and exported over NFS.
    • The size of the secondary storage would be configurable.
  • Garbage collection on SS -
    • We will have a periodic thread that removes least-recently-used (LRU) templates. The thread will also clean up volumes and snapshots that leaked into the storage for whatever reason.
    • If there is no space left, we can use LRU to remove older templates and make room for new ones.
    • Whenever a snapshot/volume operation finds no space left, we can use LRU to evict the oldest templates (a sketch follows below).
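    A sketch of the LRU eviction, assuming file mtime as the recency signal; the real bookkeeping could live in the DB instead:

      import java.io.File;
      import java.util.Arrays;
      import java.util.Comparator;

      public class ScratchStorageGC {
          // Evicts least-recently-used files from the SS cache until at least
          // `needed` bytes are free; mtime as the recency signal is an assumption.
          public static void evictLru(File cacheDir, long needed) {
              File[] files = cacheDir.listFiles();
              if (files == null) return;
              Arrays.sort(files, Comparator.comparingLong(File::lastModified)); // oldest first
              for (File f : files) {
                  if (cacheDir.getUsableSpace() >= needed) break;
                  f.delete();
              }
          }
      }
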
  • Scaling algorithm for NFS Export Service VM -
    • We can reuse the current SSVM scaling algorithm, where new SSVMs are spawned based on the number of tasks the current ones are handling. We would also monitor capacity, and if it is nearing exhaustion we would spawn new NFS Export Service VMs.
    • We need to scale down if the number of operations falls below a certain threshold. (A sketch of the decision logic follows below.)
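    A sketch of the scaling decision; the thresholds are illustrative assumptions and would really be global settings, like the existing SSVM knobs:

      public class NfsVmScaler {
          static final int MAX_TASKS_PER_VM = 5;      // spawn when average load exceeds this
          static final double CAPACITY_HIGH = 0.90;   // spawn when SS usage exceeds this
          static final int MIN_TASKS_PER_VM = 1;      // scale down below this

          enum Action { SCALE_UP, SCALE_DOWN, NONE }

          public static Action decide(int vms, int pendingTasks, double ssUsedFraction) {
              double tasksPerVm = vms == 0 ? Double.MAX_VALUE : (double) pendingTasks / vms;
              if (tasksPerVm > MAX_TASKS_PER_VM || ssUsedFraction > CAPACITY_HIGH) return Action.SCALE_UP;
              if (vms > 1 && tasksPerVm < MIN_TASKS_PER_VM) return Action.SCALE_DOWN;
              return Action.NONE;
          }
      }
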
  • Monitoring -
    • We would need to monitor the NFS Export Service VMs to see how much data is flowing through them and give the admin a clear picture.
  • SS health check-
    • The SS will be mounted on the NFS Export Service VM, and a remote agent will run on the VM. We will leverage the agent framework to check the health of this VM (it will be an HA VM as well).
  • Copy Templates across Regions - NOT SUPPORTED for 4.2.

Migration Strategy for moving templates/snapshots into OS

  • This still needs to be discussed thoroughly. The following needs to be considered before choosing one of the strategies:
    • The idea is to provide a robust way to migrate/copy potentially terabytes of templates and snapshots securely into the object store and to update the DB with their new locations.
    • We would also need to kill the Image Transfer Service VM in each zone, and the admin would need to start an NFS Export Service VM.
    • We want to achieve migration with minimum MS downtime, minimum performance impact on the template/snapshot functionality, and a good way to revert in case of failure.
    • Once the migration is done, shut down the NFS secondary storage.
  • Strategy 1 - Provide a tool to copy data into the OS. (approved so far)
    • Provide a tool to the system admin which does the migration. This is done with the MS running.
      • Day 1 - the sys admin decides to migrate to the object store. (Note: this can happen any time after the 4.2 upgrade, or coincide with the 4.2 CS upgrade.)
      • Day M - the sys admin starts the operation to reliably copy his data from NFS secondary storage to the OS by running the tool. The work can be done in one go or over several runs (say, during off-peak hours); basically, the tool needs to know the mapping from the source NFS secondary storage to the object store. All this while, the MS keeps running on a version <= 4.2 with all operations available.
      • Day N - he is done with the data migration.
      • Day X - time to formally introduce the OS, but some delta objects created since the last run have not been migrated. Shut down the MS, run the tool again to copy the delta objects into the OS, and update the DB with the new location of all the templates and snapshots. Shut down the NFS secondary storage and the SSVM. The admin then starts the MS with the object store. (A hedged sketch of such a copy tool follows below.)
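      A hedged sketch of the copy tool's core loop, assuming the AWS SDK for Java and that a size match is good enough to skip already-copied files; the bucket naming and the mapping record are illustrative:

        import java.io.File;

        import com.amazonaws.AmazonServiceException;
        import com.amazonaws.services.s3.AmazonS3Client;

        public class NfsToS3Migrator {
            // Walks a mounted NFS secondary storage tree and copies every file into
            // the object store, preserving the relative path as the S3 key.
            // Re-running it skips files already copied, so the Day X delta pass
            // reuses the same code.
            public static void migrate(AmazonS3Client s3, String bucket, File dir, String prefix) {
                File[] entries = dir.listFiles();
                if (entries == null) return;
                for (File f : entries) {
                    String key = prefix.isEmpty() ? f.getName() : prefix + "/" + f.getName();
                    if (f.isDirectory()) {
                        migrate(s3, bucket, f, key);
                    } else if (!alreadyCopied(s3, bucket, key, f)) {
                        s3.putObject(bucket, key, f);
                        // TODO: record the nfs-path -> s3-key mapping for the DB update on Day X
                    }
                }
            }

            static boolean alreadyCopied(AmazonS3Client s3, String bucket, String key, File f) {
                try {
                    return s3.getObjectMetadata(bucket, key).getContentLength() == f.length();
                } catch (AmazonServiceException e) {
                    return false; // typically 404: not copied yet
                }
            }
        }
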
  • Strategy 2 - JIT push into the object store.
    • Both NFS secondary storage and the object store coexist.
    • If a user in Zone B asks for a template in Zone A, we push it into the OS from the NFS secondary storage in Zone A.
    • When do we shut down the NFS secondary storage? Currently we have a sync mechanism for Swift, where we sync the templates/snapshots on Swift from NFS. We can create a similar mechanism with better performance (e.g., copying in batches, or making it admin-controlled so it can be invoked during off-peak hours).
  • Strategy 3 - Delay the migration to the next release.
    • Don't handle migration at all; NFS secondary storage and the object store will have to coexist.
    • Templates and snapshots created before 4.2 won't be available across zones. Will both VMs (Image Transfer and NFS Export Service) exist per zone, or can we stuff all the functionality into the latter? Will the NFS secondary storage become the scratch storage, or do we create another scratch storage? And where will subsequent snapshots of pre-4.2 snapshots reside - on NFS secondary storage?
    • Ruled out, as this has been one of the biggest pain points for CS deployers with Swift as well.

Architecture and Design description

Web Services APIs

UI flow

Open Issues

Test cases

TBD

Appendix

Appendix A:

Appendix B:
