This is the functional specification for the Dell EMC PowerFlex/ScaleIO storage plugin.
Github issue#
master
CloudStack currently supports Ceph/RBD storage which is a distributed block storage. PowerFlex (formerly ScaleIO/VxFlexOS) also provides distributed shared block storage. This proposed feature will add support for PowerFlex (v3.5 & above) storage as a primary storage (new Storage Plugin) in CloudStack.
Author | Description | Date |
---|---|---|
 | Added feature specification and design | 27 Aug 2020 |
This feature should be able to:
The basic ScaleIO architecture consists of the SDC, SDS and MDM components, detailed below.
The ScaleIO Data Client (SDC) is a lightweight block device driver that exposes ScaleIO shared block volumes to applications. The SDC runs on the same server as the application. This enables the application to issue an IO request and the SDC fulfills it regardless of where the particular blocks physically reside. The SDC communicates with other nodes (beyond its own local server) over a TCP/IP-based protocol, so it is fully routable. For this feature to work, all KVM hosts need the SDC installed and connected to the MDM.
The ScaleIO Data Server (SDS) owns local storage that contributes to the ScaleIO Storage Pools. An instance of the SDS runs on every server that contributes some or all of its local storage space (HDDs, SSDs, PCIe, NVMe and flash cards) to the aggregated pool of storage within the ScaleIO virtual SAN. Local storage may be disks, disk partitions, or even files. The role of the SDS is to actually perform the back-end IO operations as requested by an SDC.
The Meta Data Manager (MDM) manages the ScaleIO system. The MDM contains all the metadata required for system operation, such as configuration changes. The MDM also provides monitoring capabilities to assist users with most system management tasks. The MDM manages the metadata, SDCs, SDSs, device mappings, volumes, snapshots, system capacity (including device allocation and/or release of capacity), RAID protection, errors and failures, and system rebuild tasks including rebalancing. In addition, all user interaction with the system is handled by the MDM. This is similar to the Ceph monitor or manager.
The ScaleIO Gateway connects to a single MDM and services RESTful API requests by querying the MDM and reformatting the answers it receives into a RESTful response back to the REST client. Every ScaleIO scli command is also available through the ScaleIO REST API. Responses returned by the Gateway are formatted as JSON. The API is available as part of the ScaleIO Gateway package. For the integration to work with CloudStack, the gateway must be installed and accessible to the CloudStack control plane. There is also a GUI client for Windows, Mac and Linux for administrators to monitor and manage a cluster.
A Protection Domain is a set of SDSs. Each SDS belongs to one (and only one) Protection Domain. Thus, by definition, each Protection Domain is a unique set of SDSs. The ScaleIO Data Client (SDC) is not part of the Protection Domain.
Storage Pools allow the generation of different performance tiers in the ScaleIO system. A Storage Pool is a set of physical storage devices in a Protection Domain. Each storage device belongs to one (and only one) Storage Pool. When a Protection Domain is generated, it has one Storage Pool by default.
Datacenters are designed such that a unit of failure may consist of more than a single node. Fault sets prevent mirrored chunks from being placed in the same fault set. A minimum of 3 fault sets is required per protection domain, and therefore the basic ScaleIO setup requires a 3-node (SDS) cluster.
Single accessible/logical storage drive that can be accessed by hosts as block-based storage. A volume can be mapped and un-mapped on one or more SDCs, or for this feature can be mounted/unmounted on one or more KVM hosts appearing as a block-storage disk device.
The ScaleIO storage system enables users to take snapshots of existing volumes, up to 127 per volume. The snapshots are thinly provisioned and are extremely quick. Once a snapshot is generated, it becomes a new un-mapped “volume” in the system. Users manipulate snapshots in the same manner as any other volume exposed to the ScaleIO storage system.
All the snapshots resulting from one volume are referred to as a V-Tree (or Volume Tree). It is a tree spanning from the source volume as the root, whose nodes are either snapshots of the volume itself or snapshots of its descendants. Each volume therefore has a V-Tree that holds the volume and all snapshots associated with it. The limit on a V-Tree is 128 volumes and snapshots – one slot is taken by the original volume and the remaining 127 are available for snapshots [1].
A consistency group is created when a snapshot is taken of two or more volumes.
The ScaleIO counterparts above can be mapped and used with CloudStack and KVM as follows:
A ScaleIO storage pool can be mapped 1:1 with a CloudStack storage pool, storing the gateway host/IP, port, username, password and the ID/name of the ScaleIO storage pool in the CloudStack DB.
Templates can be of QCOW2 or RAW type, no changes in secondary storage or template/iso lifecycle are necessary.
At the time of root-disk/VM provisioning, the KVM host agent can convert a template from secondary storage or direct-download into a RAW disk and write it to a mounted block-storage device (i.e. the mapped ScaleIO volume), which is the spooled template on the primary pool.
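As an illustration of this flow, the sketch below converts a template to RAW format and writes it directly onto the mapped block device using qemu-img. It is a minimal sketch only; the paths, device name and helper class are hypothetical and not the actual agent implementation.

```java
import java.io.IOException;

public class TemplateToVolumeCopy {

    /**
     * Converts a QCOW2 or RAW template file to RAW format and writes it
     * directly onto the mapped ScaleIO block device (illustrative paths).
     */
    public static void copyTemplateToMappedVolume(String templatePath, String mappedDevicePath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "qemu-img", "convert",
                "-O", "raw",            // write the output in RAW format straight to the block device
                templatePath,
                mappedDevicePath);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("qemu-img convert failed with exit code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical example: template spooled in the agent cache, written to the mapped volume.
        copyTemplateToMappedVolume(
                "/var/cache/cloudstack/agent/template.qcow2",
                "/dev/disk/by-id/emc-vol-1234567890abcdef-0000000100000001");
    }
}
```

The same block-based copy approach applies when backing up a snapshot/volume to mounted secondary storage, with the source and destination paths swapped accordingly.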
Root disk resize will cause resize of the related ScaleIO volume, similarly deletion of the root disk will cause deletion of the ScaleIO volume after unmapping it across all KVM hosts.
On provisioning, data disks are simply volumes created in ScaleIO that can be mapped to a KVM host and attached as “raw” disk(s) to a VM. The detach operation detaches the raw block-storage device from the VM and un-maps the volume from the KVM host. Data disk resize will cause a resize of the ScaleIO volume; similarly, deletion of the data disk will cause deletion of the disk on ScaleIO after unmapping it on all KVM hosts.
CloudStack volume snapshots can be mapped 1:1 with volume snapshots on the ScaleIO side. By default this does not require any backup operation to secondary storage, similar to Ceph. A backup to secondary storage is possible by mounting the secondary storage, mapping the ScaleIO snapshot/volume on the KVM host, and performing a block-based transfer (using dd or qemu-img).
Creating snapshots of more than one ScaleIO volume creates a consistency group on the ScaleIO side. For a running VM, VM snapshots with memory are not possible for root disks on ScaleIO storage. Only VM snapshots without memory are possible (a consistent snapshot of the root and data disks of a VM).
Any storage IOPS settings can be taken and applied to a ScaleIO volume based on the compute offering for root-disk and the disk offering for a data-disk.
Note: due to a ScaleIO limitation, disk sizes must be multiples of 8 GB; otherwise ScaleIO will round up and create the disk on the next 8 GB boundary (for example, a requested 10 GB disk results in a 16 GB ScaleIO volume).
This feature can be refactored in CloudStack so that a local scratch/cache space can be defined on the KVM hosts for hosting the config drive ISOs, with a global setting that changes the behaviour of where the config drive ISOs are hosted (secondary storage, primary storage, or a local/scratch path on the host).
This feature would require a caching/scratch space to download a template and then perform a block-based copy to a mapped/mounted ScaleIO volume before it can be used as a root disk.
ScaleIO allows migration of an entire VTree from one storage pool to another storage pool of the same system. Therefore, storage migration will be limited to storage pools managed by the same ScaleIO cluster gateway/manager.
Implement a new CloudStack storage plugin for ScaleIO storage. This will follow the design principles abstracted by CloudStack API for implementing a pluggable storage plugin.
Introduce a new storage pool type “PowerFlex” that associates with a PowerFlex/ScaleIO storage pool and allows for shared storage and over-provisioning. This type is used across various operations for storage-pool-specific handling, especially on the hypervisor (KVM agent) side. Implement a new storage volume/datastore plugin with the following [2]:
i. ScaleIO Datastore Driver: a primary datastore driver class that is responsible for lifecycle operations of a volume and snapshot resource such as to grant/revoke access, create/copy/delete data object, create/revert snapshot and return usage data.
ii. ScaleIO Datastore Lifecycle: a class that is responsible for managing lifecycle of a storage pool for example to create/initialise/update/delete a datastore, attach to a zone/cluster and handle maintenance of the storage pool.
iii. ScaleIO Datastore Provider: a class that is responsible for exporting the implementation as a datastore provider plugin for CloudStack storage sub-system to pick it up and use for the storage pools of type “PowerFlex”.
iv. ScaleIO gateway client and utilities: a thin ScaleIO Java SDK that provides helper classes for the driver and lifecycle classes to communicate with the ScaleIO gateway server using RESTful APIs. The new thin ScaleIO API client (Java client) will have the following functionality (a minimal sketch of such a client follows this list):
→ Secure authentication with provided URL and credentials
→ List all storage pools, find storage pool by ID/name
→ List all SDCs, find SDC by IP address
→ Map/unmap volume to SDC (a KVM host)
→ ScaleIO volume lifecycle operations
→ Other volume lifecycle operations supported in ScaleIO
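A minimal sketch of such a thin gateway client is given below. It assumes the PowerFlex gateway REST endpoints /api/login (which returns a session token that is then used as the password for subsequent Basic-auth requests) and /api/types/StoragePool/instances; JSON parsing, error handling and TLS trust configuration for self-signed gateway certificates are omitted, and the class and method names are illustrative rather than the final plugin API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Minimal sketch of a thin ScaleIO/PowerFlex gateway REST client. */
public class ScaleIOGatewayClient {

    private final String gatewayUrl;   // e.g. https://gateway-host:443
    private final String username;
    private String sessionToken;       // token returned by /api/login
    private final HttpClient http = HttpClient.newHttpClient();

    public ScaleIOGatewayClient(String gatewayUrl, String username) {
        this.gatewayUrl = gatewayUrl;
        this.username = username;
    }

    private static String basicAuth(String user, String secret) {
        String credentials = user + ":" + secret;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Secure authentication: the returned token is used as the password for later calls. */
    public void login(String password) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(gatewayUrl + "/api/login"))
                .header("Authorization", basicAuth(username, password))
                .GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Gateway login failed: HTTP " + response.statusCode());
        }
        // The gateway returns the session token as a quoted JSON string.
        sessionToken = response.body().replace("\"", "");
    }

    /** List all storage pools known to the gateway (raw JSON is returned; parsing omitted). */
    public String listStoragePools() throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(gatewayUrl + "/api/types/StoragePool/instances"))
                .header("Authorization", basicAuth(username, sessionToken))
                .GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

The other listed operations (finding SDCs, mapping/unmapping volumes, volume lifecycle calls) would follow the same pattern of authenticated requests against the corresponding gateway endpoints.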
2. Hypervisor layer (KVM): The hypervisor layer would have the following design aspects:
ScaleIO StorageAdaptor and StoragePool: For handling of ScaleIO volumes and snapshots, a ScaleIO storage specific adaptor and pool management classes may need to be added. These classes will be responsible for managing storage operations and pool related tasks and metadata.
All storage related operations need to be handled by various Command handlers and hypervisor/storage processors (KVMStorageProcessor) as orchestrated by the KVM server resource class (LibvirtComputingResource) such as CopyCommand, AttachCommand, DetachCommand, CreateObjectCommand, DeleteCommand, SnapshotAndCopyCommand, DirectDownloadCommand, etc.
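For example, before a mapped volume can be attached to a VM, the ScaleIO StorageAdaptor needs to wait for the SDC to expose it as a local block device, honouring the storage.pool.disk.wait setting described later in this document. The sketch below assumes mapped ScaleIO volumes appear under /dev/disk/by-id/ with an emc-vol-<systemId>-<volumeId> name; the path pattern, class and method names are illustrative.

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

/** Sketch of how a ScaleIO StorageAdaptor might wait for a freshly mapped volume
 *  to become visible as a local block device on the KVM host. */
public class ScaleIODeviceWait {

    // Assumed path pattern under which the SDC exposes mapped volumes.
    private static final String BY_ID_DIR = "/dev/disk/by-id";

    /**
     * Polls until the block device for the mapped volume appears, honouring the
     * storage.pool.disk.wait timeout (default 60 seconds).
     */
    public static String waitForMappedDevice(String systemId, String volumeId, int waitSeconds)
            throws InterruptedException {
        String expectedName = "emc-vol-" + systemId + "-" + volumeId;
        long deadline = System.currentTimeMillis() + TimeUnit.SECONDS.toMillis(waitSeconds);
        while (System.currentTimeMillis() < deadline) {
            File device = new File(BY_ID_DIR, expectedName);
            if (device.exists()) {
                return device.getAbsolutePath();   // ready to be handed to libvirt as a raw disk
            }
            Thread.sleep(1000);                    // re-check every second until the SDC exposes the device
        }
        throw new IllegalStateException("Mapped volume " + volumeId + " not visible on host within "
                + waitSeconds + " seconds");
    }
}
```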
Scratch/cache storage directory path on KVM host:
→ Define a new scratch/cache path in agent.properties with a default path, for example /var/cache/cloudstack/agent/
→ The cache directory will be used to host config drive ISOs for VMs and temporary cache for direct download templates
The configuration setting changes below are incorporated.
PowerFlex/ScaleIO Storage Pool:
Configuration | Description / Changes | Default Value |
storage.pool.disk.wait | New primary storage level configuration to set the custom wait time for ScaleIO disk availability in the host (currently supports ScaleIO only). | 60 secs |
storage.pool.client.timeout | New primary storage level configuration to set the ScaleIO REST API client connection timeout (currently supports ScaleIO only). | 60 secs |
custom.cs.identifier | New global configuration, which initially holds a randomly generated 4-character string. This parameter can be updated to provide a unique CloudStack installation identifier, which helps track the volumes of a specific CloudStack installation when the ScaleIO storage pool is shared. | random 4-character string |
Other settings added/updated:
Configuration | Description / Changes | Default Value |
vm.configdrive.primarypool.enabled | Scope changed from Global to Zone level | false |
vm.configdrive.use.host.cache.on.unsupported.pool | New zone level configuration to use host cache for config drives when storage pool doesn't support config drive. | true |
vm.configdrive.force.host.cache.use | New zone level configuration to force host cache for config drives. | false |
router.health.checks.failures.to.recreate.vr | New test "filesystem.writable.test" added, which checks whether the router filesystem is writable. If set to "filesystem.writable.test", the router is recreated when the disk is read-only. | <empty> |
The parameters below are introduced in the agent.properties file of the KVM host.
Parameter | Description | Default Value |
host.cache.location | New parameter to specify the host cache path. Config drives will be created in the "/config" directory under the host cache path. | /var/cache/cloud |
powerflex.sdc.home.dir | New parameter to specify the SDC home path if installed in a custom directory; required to rescan and query volumes (query_vols) in the SDC. | /opt/emc/scaleio/sdc |
The following naming conventions are used for CloudStack resources in the ScaleIO storage pool, which avoids naming conflicts when the same ScaleIO pool is shared across multiple CloudStack zones / installations.
where,
[pool-key] = 4 characters picked from the pool UUID. Example UUID: fd5227cb-5538-4fef-8427-4aa97786ccbc => fd52(27cb)-5538-4fef-8427-4aa97786ccbc. The 4 characters shown in parentheses are picked. The pool can be tracked with the UUID containing [pool-key].
[custom.cs.identifier] = value of the global configuration “custom.cs.identifier”, which holds 4 characters randomly generated initially. This parameter can be updated to suit the requirement of unique CloudStack installation identifier, which helps in tracking the volumes of a specific CloudStack installation.
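A small sketch of the [pool-key] derivation, assuming it is simply the 5th to 8th characters of the pool UUID as the example above suggests (the class name is illustrative):

```java
import java.util.UUID;

public class PoolKeyExample {

    /** Picks the 4-character [pool-key] from a storage pool UUID. */
    public static String poolKey(String poolUuid) {
        // "fd5227cb-5538-4fef-8427-4aa97786ccbc" -> "27cb"
        return poolUuid.substring(4, 8);
    }

    public static void main(String[] args) {
        System.out.println(poolKey("fd5227cb-5538-4fef-8427-4aa97786ccbc")); // prints 27cb
        System.out.println(poolKey(UUID.randomUUID().toString()));
    }
}
```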
N/A
KVM
N/A
N/A
[2] CloudStack Storage-subsystem design
[3] Getting to Know PowerFlex/ScaleIO