Introduction

This feature lets CloudStack administrators automate the upgrade or patching process on KVM hosts with little manual intervention. The administrator defines scripts and selects multiple zones, pods, clusters or hosts; CloudStack then ensures that each host runs the scripts while it is in maintenance. After the tasks defined in the maintenance script have been executed, each host returns to its previous state and continues operating.

References

Pull Request

Targeted for ACS master (WIP):
https://github.com/apache/cloudstack/pull/3610

Document History

Author      Description      Date
nvazquez    Initial draft    19 November 2019



Feature requirements, Architecture and Design description

  • A new API is created: startRollingMaintenance. It starts rolling maintenance of the hosts in the selected scope, which can be zones, pods, clusters or multiple hosts.
    • Mutually exclusive parameters: hostids, podids, clusterids, zoneids
    • Force parameter: if enabled, the process starts even if the pre-flight checks fail
  • The rolling maintenance process is asynchronous and iterative, managed by the management server.
    • It is achieved by executing multiple stages on the KVM hosts in 'Up' state within the selected scope:
      • PreFlight: Pre-flight checks are performed on every host in the scope before any action is taken, to ensure that every host is capable of entering maintenance
      • PreMaintenance: Pre-maintenance checks. These checks are executed on a host just before it enters maintenance.
      • Maintenance: Tasks to execute while the host is in maintenance
      • PostMaintenance: Post-maintenance checks and verification.
    • Before taking any action, the management server performs two types of checks:
      • Capacity checks: For every host within the selected scope, CloudStack checks that the host's cluster has enough capacity for the host to enter maintenance mode (host tags, affinity rules, CPU and memory capacity checks)
      • Pre-flight checks: For every host within the selected scope, the PreFlight script is executed. If any of these checks fail and the 'force' flag is not enabled, the process does not start. If the 'force' flag is enabled, the process starts anyway.
    • The administrator can set one script per stage, either a Python or a Bash script, in a directory that must be specified in the agent.properties file under the key 'rolling.maintenance.hooks.dir'. If a stage has no script file, it is skipped.
    • The scripts can be executed directly by the KVM agent or through a systemd service on each KVM host. This is selected in the agent.properties file with the property 'rolling.maintenance.service.executor.disabled'.
      • Its value is false by default, meaning that the default behaviour is to execute the scripts through the service, outside the KVM agent
      • If 'rolling.maintenance.service.executor.disabled'=true, then the KVM agent is in charge of invoking each script.
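
The flow described above can be sketched as follows. This is a minimal Python illustration, not the management server's implementation: the function names and the four callables (has_capacity, run_stage, enter_maintenance, exit_maintenance) are hypothetical. The sketch also assumes that a capacity failure always blocks the process, that hosts are processed one at a time, that a failed per-host stage stops the run, and that PostMaintenance runs after the host has left maintenance.

```python
from typing import Callable, List, Sequence


def run_rolling_maintenance(
    hosts: Sequence[str],
    has_capacity: Callable[[str], bool],
    run_stage: Callable[[str, str], bool],
    enter_maintenance: Callable[[str], None],
    exit_maintenance: Callable[[str], None],
    force: bool = False,
) -> List[str]:
    """Hypothetical sketch of the flow; returns the hosts that completed."""
    # Capacity checks on every host before any action is taken.
    # Assumption: a capacity failure blocks the process regardless of 'force'.
    lacking = [h for h in hosts if not has_capacity(h)]
    if lacking:
        raise RuntimeError(f"Not enough capacity to maintain: {lacking}")

    # PreFlight runs on every host; failures block the start unless forced.
    failed = [h for h in hosts if not run_stage(h, "PreFlight")]
    if failed and not force:
        raise RuntimeError(f"PreFlight failed on: {failed}")

    completed: List[str] = []
    for host in hosts:  # assumption: one host at a time
        _require(run_stage(host, "PreMaintenance"), host, "PreMaintenance")
        enter_maintenance(host)
        _require(run_stage(host, "Maintenance"), host, "Maintenance")
        exit_maintenance(host)
        _require(run_stage(host, "PostMaintenance"), host, "PostMaintenance")
        completed.append(host)
    return completed


def _require(ok: bool, host: str, stage: str) -> None:
    # Assumption: any failed per-host stage stops the whole run.
    if not ok:
        raise RuntimeError(f"{stage} failed on {host}")
```

Passing the checks and host operations in as callables keeps the sketch independent of how scripts are actually invoked (through the KVM agent or through the service).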

Target users

  • CloudStack Administrators.

API Changes

New API: startRollingMaintenance. Parameters:

  • zoneids, podids, clusterids, hostids: Mutually exclusive parameters to select multiple hosts within a scope
  • force: Optional boolean parameter, default = false. If force = true, then the process will start despite a failure on pre-flight checks.
  • timeout: Optional long parameter, to specify the time in seconds that the management server waits for a response from each host within the scope
  • payload: Optional string parameter, to specify extra parameters for each script to be executed
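
As an illustration of how these parameters combine, the following Python sketch builds the parameter set for a startRollingMaintenance call. The helper name is hypothetical, and the sketch assumes that exactly one scope must be given and that ID lists are sent as comma-separated strings. Authentication and the HTTP request itself are left out.

```python
from typing import Dict, Optional, Sequence

# The four mutually exclusive scope parameters of the API.
SCOPE_PARAMS = ("zoneids", "podids", "clusterids", "hostids")


def build_start_rolling_maintenance_params(
    scope: str,
    ids: Sequence[str],
    force: bool = False,
    timeout: Optional[int] = None,
    payload: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return the query parameters for the API call."""
    # Taking a single (scope, ids) pair makes the scopes mutually
    # exclusive by construction.
    if scope not in SCOPE_PARAMS:
        raise ValueError(f"scope must be one of {SCOPE_PARAMS}")
    if not ids:
        raise ValueError("at least one ID is required")  # assumption

    # Assumption: list parameters are sent as comma-separated values.
    params = {"command": "startRollingMaintenance", scope: ",".join(ids)}
    if force:
        params["force"] = "true"
    if timeout is not None:
        params["timeout"] = str(timeout)  # seconds
    if payload is not None:
        params["payload"] = payload
    return params
```

For example, build_start_rolling_maintenance_params("clusterids", ["c1", "c2"], timeout=600) selects two clusters and leaves force at its default of false.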

DB Changes

N/A

Hypervisors supported

This feature is supported only for KVM.

UI Flow

The Zones, Pods, Clusters and Hosts pages are extended. The table displaying each entity allows multiple selection. Once multiple entities are selected on those pages, a new button is displayed that starts the rolling maintenance process for the hosts in 'Up' state within the selected scope.

The 'Start Rolling Maintenance' button opens a new dialog, in which the administrator can set the parameters described above in the API Changes section.

The 'Start Rolling Maintenance' button is also displayed in the detail view of a zone, pod, cluster or host.

  • For a host, this button is only displayed when the host hypervisor is KVM and the host is in 'Enabled' state.

Open Items/Questions

  • NA