Introduction

Auto Scaling allows you to scale up or scale down back-end services or application virtual machines (guest VMs) based on conditions you define, thereby ensuring optimum use of virtual resources. Conditions for triggering a scale-up or scale-down action can vary from a simple use case, such as monitoring the CPU usage of a server, to a complex use case, such as monitoring a combination of a server's responsiveness and its CPU usage. Load balancers like NetScaler are well placed to monitor all aspects of a server’s health and work in unison with cloud orchestration products like CloudStack to initiate scale-up or scale-down actions. This feature is an extension to ELB support and is currently supported only in NetScaler 10.0.

Purpose

This is the functional specification for the LB rule based AutoScale feature of CloudStack.

Document History

S.No. | Version | Date         | Author       | Remarks
1     | 1       | 13 June 2012 | Deepak       | First Draft of FS
2     | 1.1     | 02 July 2012 | Ram Ganesh   | second draft version. Revised various sections
3     | 1.2     | 10 July 2012 | Vijay Venkat | Updated with list and delete commands and added account info in create commands
4     | 1.3     | 18 July 2012 | Vijay Venkat | Introduced with defaults and update APIs
5     | 1.4     | 8 Aug 2012   | Vijay Venkat | Added recommended steps and more text to update commands.

Glossary

Counter

Performance counters used for monitoring the health of the back-end services (guest VMs) that are load balanced by load balancers such as NetScaler. These counters can be either SNMP-based counters or counters supported by the load balancer. CloudStack could ship with a set of popular built-in counters.

Condition

A condition expresses the criteria for triggering an autoscale action. It uses the counters defined above.

AutoScale Policy

An AutoScale Policy defines a policy for taking an autoscale action by combining the conditions defined above with an autoscale action such as scaleup or scaledown.

AutoScale Vm Profile

An AutoScale Vm Profile is a container for the settings to be used while taking a scale-up or scale-down action, such as which template id and which service offering to use.

AutoScale Vm Group

An AutoScale Vm Group associates scaleup and scaledown policies with a load balancing rule. It also defines the size of the autoscaled VM group.

Feature Specifications

Auto Scaling allows you to scale up or scale down back-end services or application virtual machines (guest VMs) based on various conditions you define and thereby ensure optimum use of virtual resources.

Use cases

  • Add a NetScaler load balancer device to a zone configured in either Basic or Advanced mode.
  • Configure an Elastic Load Balancer rule.
  • Configure the AutoScale feature on the load balancer rule using either the AutoScale APIs or the CloudStack UI.

Architecture and Design description

Collector/Monitor: This module is responsible for gathering metrics from the guest VMs. A timer wakes up the collector/monitor periodically to collect metrics; the timer period is configurable. The monitoring framework in NetScaler has rich protocol support, ranging from L3 to L7, with deep packet inspection capabilities.

Aggregator: This module aggregates the collected metric values using algorithms such as average, minimum, and maximum. In NetScaler this is part of the monitoring infrastructure.

Trigger/Alarm Generator: Timer-based triggers monitor the output of the aggregators, evaluate the autoscale conditions, and generate a trigger/alarm if a condition evaluates to true. In NetScaler this is part of the policy-based infrastructure responsible for expression evaluation and actions.

Trigger/Alarm Handler: The trigger/alarm handler acts on the trigger/alarm and initiates a scale-up or scale-down action, working in unison with cloud orchestration products such as CloudStack. This is a web service residing in the NetScaler management plane that acts as the glue between the CloudStack management server and the NetScaler data plane.

In the current implementation, all four modules above reside inside the NetScaler system. The interaction between the Trigger Generator and the Handler is based on a standard JSON payload over HTTP.
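
The flow through the four modules can be pictured with a minimal Python sketch. It is illustrative only: the function names, the simple "greater than" check, and the single-field payload are assumptions made for this example, and the actual JSON schema exchanged inside NetScaler is not defined in this document.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

# Aggregation algorithms named in the spec (average, minimum, maximum).
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "average": mean,
    "minimum": min,
    "maximum": max,
}


def collect(samples_by_vm: Dict[str, float]) -> List[float]:
    """Collector/Monitor: gather one metric sample per guest VM."""
    return list(samples_by_vm.values())


def aggregate(values: List[float], algorithm: str = "average") -> float:
    """Aggregator: reduce the collected samples to a single value."""
    return AGGREGATORS[algorithm](values)


def should_trigger(aggregated: float, threshold: float) -> bool:
    """Trigger/Alarm Generator: evaluate a simple 'greater than' condition."""
    return aggregated > threshold


def handle(action: str) -> dict:
    """Trigger/Alarm Handler: build a payload (hypothetical field name)."""
    return {"action": action}


def run_cycle(samples_by_vm: Dict[str, float], threshold: float,
              algorithm: str = "average") -> Optional[dict]:
    """One timer tick: collect, aggregate, evaluate, and hand off."""
    aggregated = aggregate(collect(samples_by_vm), algorithm)
    if should_trigger(aggregated, threshold):
        return handle("scaleup")
    return None
```

For example, run_cycle({"vm1": 90.0, "vm2": 80.0}, 70.0) averages the two samples to 85.0, which exceeds the threshold, so a scaleup payload is produced.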

The following are the proposed APIs for this feature. For all the APIs defined, corresponding delete* and list* commands will also be provided.

createCounter

Available to ROOT admin only

Request parameters

  • source: defines the source of the counter - SNMP | NetScaler
  • name: CloudStack-defined name for the counter
  • value: the OID if the source is SNMP, or the counter name exposed by the load balancer if the source is NetScaler

New tables

  • counter(id,uuid,source,name,value)
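
As a rough illustration of how the request parameters map onto a call, the sketch below builds the query string for createCounter. It assumes the usual CloudStack convention of passing the API name in a command parameter; the helper name, the counter name, and the OID are example values only, and authentication and signing are left out.

```python
from urllib.parse import quote, urlencode

# Counter sources named in this spec.
VALID_SOURCES = ("SNMP", "NetScaler")


def build_create_counter_query(source: str, name: str, value: str) -> str:
    """Return the (unsigned) query string for a createCounter call."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unsupported counter source: {source!r}")
    params = {
        "command": "createCounter",
        "source": source,
        "name": name,
        "value": value,  # OID for SNMP, load balancer counter name otherwise
    }
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return urlencode(params, quote_via=quote)


print(build_create_counter_query("SNMP", "Linux User CPU",
                                 "1.3.6.1.4.1.2021.11.9.0"))
```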

deleteCounter

Available to ROOT admin only

Request parameters

  • id: counter id

listCounters

Available to all users

Request parameters

  • id: counter id
  • name: query based on name
  • source: query based on source
  • keyword: applied to search in name

createCondition

Request parameters

  • counterid: counter id
  • relationaloperator: relational operator - GT | GE | LT | LE | EQ
  • threshold: threshold value for the monitored counter
  • account
  • domainid

New tables

  • conditions(id,uuid,counter_id,threshold,relational_operator,account_id,domain_id)
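
A condition is essentially a (counter, operator, threshold) triple. The following sketch shows one way such a triple could be evaluated against an observed counter value; the Condition class and its holds method are hypothetical names used for illustration, not part of the CloudStack code base.

```python
import operator
from dataclasses import dataclass

# Map the relationaloperator values from the spec to Python comparisons.
OPERATORS = {
    "GT": operator.gt,
    "GE": operator.ge,
    "LT": operator.lt,
    "LE": operator.le,
    "EQ": operator.eq,
}


@dataclass(frozen=True)
class Condition:
    counter_id: int
    relational_operator: str
    threshold: float

    def holds(self, observed: float) -> bool:
        """Return True if the observed counter value satisfies the condition."""
        compare = OPERATORS[self.relational_operator]
        return compare(observed, self.threshold)


# Example: "CPU usage greater than 80" evaluated against a sample of 85.
cpu_high = Condition(counter_id=1, relational_operator="GT", threshold=80)
print(cpu_high.holds(85))
```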

deleteCondition

Request parameters

  • id: condition id

listConditions

Request parameters

  • id: condition id
  • policyid: lists the conditions of a policy
  • counterid: lists the conditions that use the given counter

createAutoScaleVmProfile

Request parameters

  • zoneid: zone id where VMs will be deployed
  • templateid: CS template id
  • serviceofferingid: service offering to be used while deploying a VM as part of a scale-up autoscale action (that is, during deployVirtualMachine)
  • otherdeployparams: any other VM deploy parameters to be used while deploying a VM as part of a scale-up autoscale action. Multiple parameters can be specified using the URL parameter convention
  • snmpcommunity: SNMP community to be used while gathering counter values using SNMP. Default: "public"
  • snmpport: SNMP agent port. Default: 161
  • destroyvmgraceperiod: Time, in seconds, to wait before a VM is actually destroyed as part of a scale-down autoscale action. Default: 120 seconds
  • autoscaleuserid: CS user to be used while issuing scale-up and scale-down actions. This parameter also determines the account/domain for which the profile is created. An ApiKey and SharedKey must already be generated for this user; if not, the call errors out. Default: caller

New tables

  • autoscale_vmprofiles(id,uuid,zone_id,domain_id,account_id,service_offering_id,template_id,other_deploy_params,snmp_community,snmp_port,destroy_vm_grace_period,autoscale_user_id)
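
The sketch below illustrates two points from the parameter list: the documented defaults, and otherdeployparams being carried as a URL-style key=value string. The function name and the dictionary layout are assumptions made for this example, and networkids and keypair are merely sample deploy parameters.

```python
from urllib.parse import parse_qsl

# Defaults as documented for createAutoScaleVmProfile.
PROFILE_DEFAULTS = {
    "snmpcommunity": "public",
    "snmpport": 161,
    "destroyvmgraceperiod": 120,
}


def build_vm_profile(zoneid, templateid, serviceofferingid,
                     otherdeployparams="", **overrides):
    """Assemble profile settings, applying defaults for omitted values."""
    unknown = set(overrides) - set(PROFILE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown profile settings: {sorted(unknown)}")
    profile = {
        "zoneid": zoneid,
        "templateid": templateid,
        "serviceofferingid": serviceofferingid,
        # "a=1&b=2" becomes {"a": "1", "b": "2"}
        "otherdeployparams": dict(parse_qsl(otherdeployparams)),
        **PROFILE_DEFAULTS,
    }
    profile.update(overrides)
    return profile


profile = build_vm_profile("zone-1", "tmpl-1", "so-1",
                           "networkids=net-1&keypair=mykey", snmpport=1161)
print(profile["otherdeployparams"], profile["snmpport"])
```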

updateAutoScaleVmProfile

Request parameters

  • id: autoscale profile id
  • templateid: CS template id
  • snmpcommunity: SNMP community to be used while gathering counter values using SNMP. Default: "public"
  • snmpport: SNMP agent port. Default: 161
  • destroyvmgraceperiod: Time, in seconds, to wait before a VM is actually destroyed as part of a scale-down autoscale action. Default: 120 seconds
  • autoscaleuserid: CS user to be used while issuing scale-up and scale-down actions. This parameter also determines the account/domain for which the profile is created. Default: caller

NOTE: An autoscale VM profile can be updated only if all the VM groups it is associated with are in the disabled state. disableAutoScaleVmGroup has to be called on all VM groups associated with this profile before issuing this command.
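
A caller could check this precondition up front, as in the sketch below. The dictionary keys and the "enabled"/"disabled" state strings are assumptions chosen for illustration; the spec does not define the exact listAutoScaleVmGroups response fields.

```python
from typing import Iterable, List, Mapping


def blocking_groups(profile_id: str,
                    groups: Iterable[Mapping[str, str]]) -> List[str]:
    """Return ids of VM groups that use the profile and are not disabled.

    An empty result means the profile can be updated.
    """
    return [
        g["id"]
        for g in groups
        if g["vmprofileid"] == profile_id and g["state"] != "disabled"
    ]


groups = [
    {"id": "g1", "vmprofileid": "p1", "state": "disabled"},
    {"id": "g2", "vmprofileid": "p1", "state": "enabled"},
    {"id": "g3", "vmprofileid": "p2", "state": "enabled"},
]
print(blocking_groups("p1", groups))
```

Here only g2 blocks the update of profile p1: g1 is already disabled and g3 uses a different profile.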

deleteAutoScaleVmProfile

Request parameters

  • id: autoscale profile id

listAutoScaleVmProfiles

Request parameters

  • id: autoscale profile id
  • templateid: template id
  • otherdeployparams

createAutoScalePolicy

Request parameters

  • action: actions - scaleup | scaledown
  • conditionids: array of conditions which need to be evaluated before an autoscale action is triggered
  • duration: duration, in seconds, for which the conditions must hold true before an autoscale action is invoked
  • quiettime: Time, in seconds. This is the cool-down period after an action is initiated. It allows the fleet to come to a stable state before any further action can take place. Default: 300 seconds.

New tables

  • autoscale_policies(id,uuid,zone_id,domain_id,account_id,duration,quiet_time,action)
  • autoscale_policy_condition_map(id,policy_id,condition_id)
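
How duration and quiettime interact can be sketched as a small state tracker. This is an interpretation rather than the NetScaler implementation: it assumes that all conditions of a policy must hold continuously for duration seconds, and that no new action fires until quiettime seconds have passed since the previous one. The class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyState:
    duration: int                      # seconds the conditions must hold
    quiettime: int = 300               # cool-down after an action, in seconds
    true_since: Optional[float] = None
    last_action_at: Optional[float] = None

    def observe(self, now: float, conditions_hold: bool) -> bool:
        """Record one evaluation; return True if an action should fire."""
        if not conditions_hold:
            self.true_since = None     # the streak is broken
            return False
        if self.true_since is None:
            self.true_since = now      # a new streak starts
        in_quiet_time = (self.last_action_at is not None
                         and now - self.last_action_at < self.quiettime)
        if in_quiet_time or now - self.true_since < self.duration:
            return False
        self.last_action_at = now
        self.true_since = None         # start over after acting
        return True


state = PolicyState(duration=60)
for t in (0, 30, 60, 90):
    print(t, state.observe(t, True))
```

With an evaluation every 30 seconds, the action fires at t=60, and the evaluation at t=90 is suppressed by the quiet time.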

updateAutoScalePolicy

Request parameters

  • id: autoscale policy id
  • *conditionids: one or more condition ids which need to be evaluated before an autoscale action is triggered
  • duration: duration, in seconds, for which the conditions must hold true before an autoscale action is invoked
  • quiettime: Time, in seconds. This is the cool-down period after an action is initiated. It allows the fleet to come to a stable state before any further action can take place.

NOTE: An autoscale policy can be updated only if all the VM groups it is associated with are in the disabled state. disableAutoScaleVmGroup has to be called on all VM groups associated with this policy before issuing this command.
* The conditionids are always reset; incremental addition or deletion of conditions is not possible. To add a condition, pass the existing ids together with the new one. To delete a condition, remove its id from the existing list and pass the remaining ids.
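
Because conditionids replaces the whole list, a client has to compute the full list itself before calling updateAutoScalePolicy. The helpers below sketch that bookkeeping; their names are hypothetical, and the comma-separated encoding of the list is an assumption based on the common CloudStack convention for list parameters.

```python
from typing import List


def with_added(existing: List[str], new_ids: List[str]) -> List[str]:
    """Full list to send when adding conditions (order kept, no duplicates)."""
    return existing + [c for c in new_ids if c not in existing]


def with_removed(existing: List[str], removed_ids: List[str]) -> List[str]:
    """Full list to send when removing conditions."""
    return [c for c in existing if c not in removed_ids]


def as_param(condition_ids: List[str]) -> str:
    """Encode the list for the conditionids request parameter."""
    return ",".join(condition_ids)


current = ["c1", "c2"]
print(as_param(with_added(current, ["c3"])))
print(as_param(with_removed(current, ["c1"])))
```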

deleteAutoScalePolicy

Request parameters

  • id: autoscale policy id

listAutoScalePolicies

Request parameters

  • id: autoscale policy id
  • action: scale up or scale down.
  • vmgroupid: the autoscale policies of a vmgroup.
  • conditionid: any of the conditions in a policy. (all policies using a condition will be returned)

createAutoScaleVmGroup

Request parameters

  • lbruleid: load balancer rule which is to be enabled for autoscale
  • minmembers: minimum number of VM instances (on creation of the VM group, the fleet grows until it reaches this minimum number of VM instances)
  • maxmembers: maximum number of VM instances (this is the maximum number of VM instances that the fleet can grow to; once it reaches this limit, it won't scale up even if the policies are hit)
  • vmprofileid: AutoScale Vm Profile id
  • interval: frequency, in seconds, at which the conditions need to be evaluated. Default: 30 seconds
  • scaleuppolicyids: scale up policy ids
  • scaledownpolicyids: scale down policy ids

New tables

  • autoscale_vmgroups(id,uuid,domain_id,account_id,lbrule_id,min_members,max_members,member_port, interval, profile_id,state)
  • autoscale_vmgroup_policy_map(id,policy_id,vmgroup_id)
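
The effect of minmembers and maxmembers on the fleet size can be sketched as follows. The function is illustrative and assumes that each action adds or removes exactly one VM, which this document does not state.

```python
from typing import Optional


def next_member_count(current: int, minmembers: int, maxmembers: int,
                      action: Optional[str]) -> int:
    """Return the fleet size after applying one autoscale decision."""
    if minmembers > maxmembers:
        raise ValueError("minmembers must not exceed maxmembers")
    if current < minmembers:
        return current + 1                   # grow until the minimum is met
    if action == "scaleup":
        return min(current + 1, maxmembers)  # never exceed the maximum
    if action == "scaledown":
        return max(current - 1, minmembers)  # never drop below the minimum
    return current


print(next_member_count(5, 2, 5, "scaleup"))    # stays at the maximum
print(next_member_count(2, 2, 5, "scaledown"))  # stays at the minimum
```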

updateAutoScaleVmGroup

Request parameters

  • id: autoscale vm group id
  • minMembers: minimum number of VM instances
  • maxMembers: maximum number of VM instances
  • interval: frequency, in seconds, at which the conditions need to be evaluated. Default: 30 seconds
  • scaleuppolicyids: scale up policy ids
  • scaledownpolicyids: scale down policy ids

NOTE: An autoscale VM group can be updated only if it is in the disabled state. disableAutoScaleVmGroup has to be called first, before issuing this command.
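
The disable, update, enable sequence can be wrapped in a small helper, sketched below. The call_api argument is a hypothetical callable standing in for whatever client issues the CloudStack commands; only the three command names come from this document.

```python
from typing import Callable, Dict

ApiCall = Callable[[str, Dict[str, str]], None]


def update_vm_group(call_api: ApiCall, group_id: str,
                    changes: Dict[str, str]) -> None:
    """Disable the group, apply the update, then re-enable autoscaling."""
    call_api("disableAutoScaleVmGroup", {"id": group_id})
    try:
        call_api("updateAutoScaleVmGroup", {"id": group_id, **changes})
    finally:
        # Re-enable even if the update fails, so the group is not left
        # disabled by accident. This is a design choice of this sketch.
        call_api("enableAutoScaleVmGroup", {"id": group_id})


# Usage with a stub client that just records the calls.
calls = []
update_vm_group(lambda cmd, params: calls.append(cmd), "g1",
                {"maxMembers": "10"})
print(calls)
```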

deleteAutoScaleVmGroup

Request parameters

  • id: autoscale vm group id

listAutoScaleVmGroups

Request parameters

  • id: autoscale vm group id
  • lbruleid: to query vmgroups based on a load balancer rule
  • vmprofileid: to query vmgroups based on a vm profile
  • policyid: to query vmgroups based on a policy
  • zoneid: to query vmgroups based on a zone

disableAutoScaleVmGroup

Request parameters

  • id: autoscale vm group id

This API is used to disable autoscaling on an AutoScale Vm Group. By default, autoscaling is enabled on an AutoScale Vm Group on creation. A user could disable autoscaling on a Vm Group for two reasons:
1. To update the autoscale config. Updating the autoscale config here means updating policy parameters, adding or deleting conditions in a scaleup or scaledown policy, updating autoscale vm profile parameters, or updating the vm group parameters themselves.
2. To perform a maintenance operation on the fleet. This lets the user bring any member of the fleet down or up; while autoscaling is disabled, no scale-up or scale-down operation happens.

enableAutoScaleVmGroup

Request parameters

  • id: autoscale vm group id

This command is used to enable autoscaling on an AutoScale Vm Group. By default, autoscaling is enabled on an AutoScale Vm Group during creation. A user could have disabled autoscaling on a Vm Group for several reasons; this API is called to re-enable autoscaling on the group. Once this is done, the fleet will scale up and down depending on the policies configured.

Event Types

COUNTER.CREATE
COUNTER.DELETE
CONDITION.CREATE
CONDITION.DELETE
AUTOSCALEPOLICY.CREATE
AUTOSCALEPOLICY.DELETE
AUTOSCALEPOLICY.UPDATE
AUTOSCALEVMPROFILE.CREATE
AUTOSCALEVMPROFILE.UPDATE
AUTOSCALEVMPROFILE.DELETE
AUTOSCALEVMGROUP.CREATE
AUTOSCALEVMGROUP.DELETE
AUTOSCALEVMGROUP.UPDATE
AUTOSCALEVMGROUP.ENABLE
AUTOSCALEVMGROUP.DISABLE

Operations not supported

  • No implicit delete is allowed. For example, if a counter is used in a condition, the user cannot delete the counter. This restriction applies to all entities.
  • Only autoscaled VMs (VMs provisioned as a result of a load balancer device such as NetScaler initiating a provision call) can be assigned to an AutoScale-enabled LB rule. This means that autoscaled VMs and explicitly bound VMs cannot co-exist on the same load balancer rule.
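
The "no implicit delete" rule amounts to a reference check before deletion. The sketch below shows the idea for counters and conditions using in-memory structures; the names and the exception type are hypothetical, and the real check would be made against the database tables listed above.

```python
from typing import Dict, Set


class EntityInUseError(Exception):
    """Raised when an entity is still referenced by another entity."""


def delete_counter(counter_id: int, counters: Set[int],
                   condition_counters: Dict[int, int]) -> None:
    """Delete a counter unless some condition still refers to it.

    condition_counters maps condition id -> counter id.
    """
    users = sorted(cond for cond, ctr in condition_counters.items()
                   if ctr == counter_id)
    if users:
        raise EntityInUseError(
            f"counter {counter_id} is used by conditions {users}")
    counters.discard(counter_id)


counters = {1, 2}
condition_counters = {10: 1}   # condition 10 uses counter 1
delete_counter(2, counters, condition_counters)   # succeeds: unused
print(counters)
```

Calling delete_counter(1, counters, condition_counters) in this state raises EntityInUseError, because condition 10 has to be deleted first.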

Quality Risks

  • Overload scenarios: Specifying a large value for the maxMembers parameter will result in a large number of VM instances being provisioned, all the more easily if the autoscale conditions are set at a low watermark.
  • Corner cases and boundary conditions: If NetScaler reboots or shuts down for some reason after issuing a provision call to CloudStack and before the provision call is complete, the provisioned VM will not be part of an LB rule even though the intent was to assign it to one. Workaround: AutoScale-provisioned VMs will be named based on the LB rule name/id, so at any point in time the VMs can be reconciled with their LB rule.
  • Negative usage scenarios: An external* destroyVM CloudStack API call on an autoscaled VM will leave the NetScaler LB configuration in an inconsistent state. Specific configuration changes need to be made in NetScaler to recover from this inconsistent state.

external*: CloudStack API calls made outside the context of AutoScale

Logging and Debugging

  • All logs will go into the CloudStack management server logs.
  • The standard NetScaler log, ns.log, will also be updated with all autoscale action details, error conditions, etc.

Run-time environment requirements

  • For monitoring SNMP-based counters, ensure that SNMP agents are installed in the guest VMs and that SNMP operations such as GET work with the configured SNMP community and port, using a standard SNMP manager.
  • If the ApiKey and SharedSecretKey are regenerated for autoscaleuserid, the user has to call disable and then enable for the VmGroups it is participating in.
  • Before setting up the management server (MS) for AutoScale, the global setting "EndpointeUrl" has to be set to the MS's API URL. In a multiple-server deployment, the IP of the MS's LB has to be specified here. Also, the load balancer should have access to this IP to provide AutoScale support.
  • If the EndpointeUrl is updated, all the AutoScale Vm Groups in the system should be disabled and enabled again to reflect the change.
  • The following sequence of steps is recommended before you create the AutoScale config.
    1. Prepare a template.
      • Very important step!
      • When a VM is deployed using this template and it comes up, the App should be in the running state.
      • If this is not the case, AutoScale will consider this VM as useless and will keep provisioning VMs unconditionally until the number of working App VMs reaches VmGroup.minMembers.
    2. Deploy the template. Make sure the app comes up on the first boot and is ready to take traffic, and make a note of the time it takes to deploy the template; it can be used to specify the quiet time in the autoscale policy.
    3. Repeat steps 1 and 2 until you have the right template. Even if you are confident in your template, it is always recommended to deploy it once, because the first deploy usually takes time and you don't want that happening during autoscale.
    4. Create a load balancer and assign to it the VM that was deployed in Step 2.
    5. Delete the VM and load balancer created in Step 2 and Step 4.
    6. Proceed with the AutoScale config.
    7. Check the events page to observe how autoscaling is happening.

Performance considerations

  • The minMembers and maxMembers parameters, which are part of AutoScaleVMGroup, should be configured properly to ensure resources are used optimally. Any misconfiguration could lead to wasteful provisioning of resources.
  • Autoscale conditions should be configured after due consideration to ensure there are no frequent provision/de-provision calls.

Evaluation of possible security attacks against the feature 

  • A DDoS attack on an LB rule, which in turn loads the guest VMs being load balanced. This could create artificial load and thereby trigger scale-up autoscale actions. Load balancer systems such as NetScaler have built-in capability to filter DDoS attacks.
  • Sophisticated L7 attacks could be launched to induce artificial load conditions. Filters in NetScaler can be configured to weed out these misbehaving clients.

Interoperability and compatibility requirements

  • Load balancer: NetScaler 10.0 release supports this feature. 

UI flow:

A basic idea of the UI.

On clicking the configure button, a configuration window will pop up, allowing an admin to configure the AutoScale feature.
