Introduction

Purpose

The objectives of this feature are to:

  • Give the admin control over the order in which system VMs are upgraded, by:
    1. Infrastructure hierarchy: by Cluster, Pod, Zone, etc.
    2. Administrative hierarchy: by Tenant or Domain
  • Minimize service interruption to users
  • Reduce upgrade time by running as many upgrade operations in parallel as possible

Scope

The scope of this document is to provide a functional specification for the SystemVm Upgrades implementation planned for the 4.3 release of CloudStack.

Non Goals

  • Multiple version support
    • Support of multiple systemVm versions is intended only for the transition period. All systemVms should be upgraded as soon as possible.
    • If a VR is not at the latest version, commands are not sent to it. Services of the old VR will continue to be available, but no further commands can be sent to the VR until it is upgraded to the latest version.

Motivation

To address the following pain points during the upgrade of system VMs:

  • Upgrade takes a long time, and the time grows with the size of the cloud
  • There is no way to sequence the upgrade of different parts of the cloud, e.g., specific clusters, pods, or even zones
  • Similarly, there is no way to determine when a particular customer’s services (e.g. VR) will be upgraded within the upgrade interval
  • For the entire duration of the upgrade, users cannot launch new services or make changes to existing services
  • For a certain amount of time, users do not have access to existing services (e.g. during VR reboot)
  • Upgrade can fail prior to completion, but it takes a long time to detect the failure

References

Proposal: http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201310.mbox/%3CC7522783E0E6D3488D82B9A7F7F004B912993730@SINPEX01CL01.citrite.net%3E
Jira Ticket: CLOUDSTACK-4793

Feature Specifications

Sequence the upgrade of the system Vms

SystemVms are currently upgraded using the cloudstack-sysvmadm script. This script upgrades all systemVms by rebooting them in sequence. To provide the flexibility to upgrade VRs selectively, an upgradeRouterTemplate API will be added. (The upgradeRouter API already exists and is used for changing the router service offering, hence the name upgradeRouterTemplate.)

The upgradeRouterTemplate API will have the following parameters:

  • id : Upgrade the specified VR
  • zone_id : Upgrade VRs in the specified zone
  • pod_id : Upgrade VRs in the specified pod
  • cluster_id : Upgrade VRs in the specified cluster
  • domain_id : Upgrade VRs belonging to the specified domain
  • account_id : Upgrade VRs belonging to the specified account

Using the above parameters, the upgrade can be controlled by infrastructure or administrative hierarchy.
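
As a rough illustration of how these parameters select a scope, the sketch below builds the query string for a single call. The parameter names are taken from the list above; the helper name build_upgrade_request, the response=json parameter, and the rule that exactly one scope is passed per call are assumptions made for this example and are not part of the specification. Request signing and the HTTP call itself are omitted.

```python
from urllib.parse import urlencode

# Scope parameters as named in this specification.
SCOPE_PARAMS = ("id", "zone_id", "pod_id", "cluster_id", "domain_id", "account_id")


def build_upgrade_request(**scope):
    """Return the query string for one upgradeRouterTemplate call.

    Hypothetical helper: it accepts exactly one scope parameter, which is an
    assumption of this sketch chosen to keep each call unambiguous.
    """
    unknown = set(scope) - set(SCOPE_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    if len(scope) != 1:
        raise ValueError("pass exactly one scope parameter")
    params = {"command": "upgradeRouterTemplate", "response": "json"}
    params.update(scope)
    return urlencode(params)


# Upgrade every VR in one cluster (the UUID is a placeholder).
query = build_upgrade_request(cluster_id="CLUSTER-UUID")
```

Calling the helper once per cluster, pod, or domain is one way an admin script could sequence the upgrade across the cloud.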

Support multiple SystemVm versions

Before sending any command to a VR, the MS checks whether the VR is at the latest version. If the VR is not at the latest version, the command is not sent.
Services of an older-version VR will continue to be available, but no further commands can be sent to the VR until it is upgraded to the latest version. This is a transient state that lasts until the VR is upgraded.
This ensures that the availability of VR services and the VR state are not impacted by the MS upgrade.
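
The check described above might look roughly like the following sketch. It assumes dotted numeric version strings such as 4.3.0; the function names and the send callback are hypothetical and do not correspond to actual CloudStack code.

```python
def parse_version(version):
    """Turn a dotted version string such as '4.3.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_at_latest_version(vr_version, latest_version):
    """True when the VR version is not older than the latest version."""
    return parse_version(vr_version) >= parse_version(latest_version)


def send_if_current(vr_version, latest_version, command, send):
    """Send the command only to an up-to-date VR; otherwise skip it.

    `send` is a caller-supplied callable (hypothetical transport hook).
    Returns True if the command was sent.
    """
    if not is_at_latest_version(vr_version, latest_version):
        return False  # the VR keeps serving traffic but receives no commands
    send(command)
    return True
```

Comparing integer tuples rather than raw strings avoids ordering 4.10.0 before 4.9.0.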

SystemVm Version

The VR version is read from the domain_router table.

Improve the speed of the upgrade

Prepare systemVm Template

SystemVms are upgraded during restart. The new systemVm template is downloaded to primary storage during the restart (after the stop and before the start) of the first VR in the cluster. This may take considerable time.
Downtime can be minimized by downloading the systemVm template to primary storage before restarting the systemVm. The prepareTemplate API should be used to download the systemVm template to all primary storages in a zone. SystemVm templates are never garbage collected, so it is safe to download them to primary storage before upgrading VRs.

Parallel Execution

The upgradeRouterTemplate API will execute jobs in parallel rather than one operation at a time. This will reduce the time taken to complete the upgrade.
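
To illustrate the idea only, the sketch below fans upgrade jobs out over a thread pool. It is not how the MS schedules its async jobs; the upgrade_one callback and the max_workers limit are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upgrade_routers_in_parallel(router_ids, upgrade_one, max_workers=8):
    """Run upgrade_one(router_id) for every router, several at a time.

    `upgrade_one` is a caller-supplied callable (hypothetical); `max_workers`
    is an illustrative limit, not a value from this specification.
    Returns {router_id: result or exception}.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upgrade_one, rid): rid for rid in router_ids}
        for future in as_completed(futures):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as exc:  # one failure must not stop the rest
                results[rid] = exc
    return results
```

Recording the exception per router keeps a single failed upgrade from hiding the outcome of the others.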

Database

None

API Changes

New APIs

upgradeRouterTemplate

 - Parameters

  • zone_id : Upgrade systemVms in the specified zone
  • pod_id : Upgrade systemVms in the specified pod
  • cluster_id : Upgrade systemVms in the specified cluster
  • domain_id : Upgrade systemVms belonging to the specified domain
  • account_id : Upgrade systemVms belonging to the specified account

 - Response parameters

  • Async job IDs of the reboot VR API calls (each job ID can be queried using the queryAsyncJobResult API to get the status of the VR reboot)
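
A caller could track the returned job IDs with a polling loop along the lines of the sketch below. The query_job callback, which is expected to wrap queryAsyncJobResult, is hypothetical, and the numeric status codes (0 pending, 1 succeeded, 2 failed) as well as the interval and timeout values are assumptions of this example.

```python
import time

# Job status values assumed for this sketch: 0 = pending, 1 = succeeded, 2 = failed.
PENDING, SUCCEEDED, FAILED = 0, 1, 2


def wait_for_jobs(job_ids, query_job, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll each job until it leaves the pending state or the timeout expires.

    `query_job(job_id)` is a caller-supplied callable (hypothetical) that
    wraps queryAsyncJobResult and returns the numeric job status.
    Returns {job_id: last seen status}; unfinished jobs map to PENDING.
    """
    status = {job_id: PENDING for job_id in job_ids}
    deadline = time.monotonic() + timeout
    while True:
        for job_id, state in status.items():
            if state == PENDING:
                status[job_id] = query_job(job_id)
        still_pending = any(state == PENDING for state in status.values())
        if not still_pending or time.monotonic() >= deadline:
            return status
        sleep(poll_interval)
```

Because failed jobs are reported as soon as they leave the pending state, a failure can be detected without waiting for the whole upgrade to finish.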

Changes to existing APIs

listRouters

  - new parameters

  • version: list routers by specified version
  • zone_id : list routers in specified zone
  • pod_id : list routers in specified pod
  • cluster_id : list routers in specified cluster
  • account: list routers owned by specified account (requires domainId also to be specified)
  • domain_id : list routers owned by specified domain   

  - new response parameters

  • version : (String) Router version : e.g. 4.3.0
  • requiresupgrade: (Boolean) Flag to indicate if the router template requires upgrade
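
As an example of how the new response fields could be used, the sketch below picks out the routers that still need an upgrade. The dictionaries imitate listRouters entries; the id key and the sample values are assumptions made for illustration.

```python
def routers_requiring_upgrade(routers):
    """Return the routers whose `requiresupgrade` flag is set.

    `routers` is a list of dicts shaped like listRouters entries; the `id`
    key is an assumption of this sketch.
    """
    return [r for r in routers if r.get("requiresupgrade")]


# Sample data, invented for this example.
sample = [
    {"id": "r-1", "version": "4.2.0", "requiresupgrade": True},
    {"id": "r-2", "version": "4.3.0", "requiresupgrade": False},
]
pending = [r["id"] for r in routers_requiring_upgrade(sample)]
```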

Recommended Upgrade Procedure

  1. While on the older version, download the new systemVm template
  2. Download the new systemVm template to all primary storage pools using the prepareTemplate API
  3. Upgrade the MS to the latest version
  4. Upgrade the CPVM and SSVM as described in the "CPVM and SSVM" section below
  5. All VRs are still on the older version, and existing services continue to be available. The MS cannot send any commands to a VR until it is upgraded
  6. Selectively upgrade VRs using the upgradeRouterTemplate API (by zone, pod, cluster, account, or domain)
  7. The MS sends commands to upgraded VRs and normal functioning resumes
  8. For VRs that are not upgraded, services continue to be available (the VR state is the same as before the upgrade). A VR cannot receive new commands until it is upgraded.

CPVM and SSVM

CPVM and SSVM can be upgraded by rebooting from the UI or by using the script:

$ cloudstack-sysvmadm -d <IP address> -u cloud -p -s

The -s option upgrades only SSVMs and CPVMs.

List of Commands sent to VR

  • SetSourceNatCommand
  • SetFirewallRulesCommand
  • IpAssocCommand
  • SetStaticNatRulesCommand
  • Site2SiteVpnCfgCommand
  • RemoteAccessVpnCfgCommand
  • SavePasswordCommand
  • DeleteIpAliasCommand
  • VmDataCommand
  • CheckRouterCommand
  • VpnUsersCfgCommand
  • DhcpEntryCommand
  • BumpUpPriorityCommand
  • SetNetworkACLCommand
  • DnsMasqConfigCommand
  • GetDomRVersionCmd
  • SetupGuestNetworkCommand
  • CreateIpAliasCommand
  • SetStaticRouteCommand
  • CheckS2SVpnConnectionsCommand
  • SetPortForwardingRulesCommand
  • UserDataCommand
  • HealthCheckLBConfigCommand
  • LoadBalancerConfigCommand
  • IpAssocVpcCommand

The above commands are sent to the VR using the sendCommandsToRouter method in com.cloud.network.router.VirtualNetworkApplianceManagerImpl. A version check will be added to this method to control which commands are sent to the VR.

Exception: NetworkUsageCommand will continue to be sent to VRs that are not upgraded, so there will not be any loss of network usage data.
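
The filtering rule, including the exception, can be sketched as follows. This is written in Python purely for illustration and is not the Java implementation; the function name and the router_is_current flag are hypothetical.

```python
# Commands that are still delivered to a VR that has not been upgraded.
EXEMPT_COMMANDS = {"NetworkUsageCommand"}


def commands_to_send(command_names, router_is_current):
    """Return the subset of commands that may be sent to the VR.

    An up-to-date VR receives everything; an outdated VR receives only the
    exempt commands.
    """
    if router_is_current:
        return list(command_names)
    return [name for name in command_names if name in EXEMPT_COMMANDS]
```

For an outdated VR, a batch containing SetFirewallRulesCommand and NetworkUsageCommand would be reduced to NetworkUsageCommand alone.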

List of VR Services

  • SecurityGroup
  • UserData
  • DHCP
  • DNS
  • LB
  • PortForwarding
  • VPN
  • StaticNat
  • SourceNat
  • Firewall
  • Gateway
  • NetworkACL

The above services will remain available even if the VR is not upgraded, but no changes to a service can be sent to the VR until it is upgraded.

Supported VRs

- VR

- VPC VR

- Redundant VR

UI

The systemVm version will be displayed in the UI. "Requires Upgrade" will be displayed along with the version for systemVms that are not upgraded.

Limitations

  •  

Open Issues

  1. Add an operations page to show upgrade status?
    1. Could be considered in a future release
  2. The impact on Project Accounts has to be investigated