Resource scaling
One of the most important uses of orchestration is to allow consistent resource scaling. Such scaling can be done manually by updating a stack, through webhooks, or automatically when an alarm is triggered.
The term scaling in this part refers to cluster scaling, that is, changing the number of resources in a cloud configuration, such as servers and volumes. The example templates take as parameters the names of the desired flavor and image, as well as an existing key pair, local network and security group.
OpenStack Heat resource groups implement container types for resources that provide a convenient way to create multiple identically configured resources, either on command or triggered by a signal. The two types of resource groups are OS::Heat::ResourceGroup, whose group size can be changed by modifying the count property in the parent stack, and OS::Heat::AutoScalingGroup, which together with one or more OS::Heat::ScalingPolicy resources allows scaling operations triggered by signals (webhooks or workload alarms).
Update stack
Scaling can be enabled by creating an OS::Heat::ResourceGroup object, which typically encapsulates a resource template. This allows the number of resource instances to be controlled with a node count parameter. The example template below has a resource group containing the declaration of a single server. Due to its simplicity, the server declaration is included directly in the main template.
The stack is created with
openstack stack create --template <template-name> <stack-name>
where the default value for count (set through the cluster_size parameter in the template) is used.
When Heat processes a template, it creates a parent stack for the main template and a number of child stacks for the resource templates encapsulated in the main template. This results in a hierarchical structure of stacks, which can be inspected with
openstack stack list --nested
In the template, the output section contains references to the resource group only. This section is printed with
openstack stack output show <stack-name> --all
The stack content can also be listed with
openstack stack resource list <stack-name> -n 1
The output after stack creation is shown in Figure 1.
Scaling the stack created from this template is done by changing the count value through the declared parameter cluster_size. The stack update command is
openstack stack update <stack-name> --template <template-name> --parameter <parameter-name=value>
where the parameter argument is a key-value pair in the format "cluster_size=2". An increase of the node count to two is reflected in the stack output (Figure 2) and the resource list (Figure 3). Scaling down is done in the same way, by decreasing the count parameter.
Template for scaling by stack update
heat_template_version: 2018-08-31

description: Scaling by stack update

parameters:
  server_image:
    type: string
    description: Image used for instance
    default: <image>
  server_flavor:
    type: string
    description: Flavor used for instance
    default: <flavor>
  server_key:
    type: string
    description: Key pair used on instance
    default: <key-pair>
  server_secgroup:
    type: string
    description: Security group used on instance
    default: <security-group>
  local_network:
    type: string
    description: Local network
    default: <local-network>
  cluster_size:
    type: number
    description: Number of instances in the cluster
    default: 1

resources:
  rg:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: cluster_size }
      resource_def:
        type: OS::Nova::Server
        properties:
          image: { get_param: server_image }
          flavor: { get_param: server_flavor }
          key_name: { get_param: server_key }
          networks:
            - network: { get_param: local_network }
          security_groups: [{ get_param: server_secgroup }]

outputs:
  ips:
    description: "Private IP addresses"
    value: { get_attr: [rg, "attributes", first_address] }
  refs:
    description: "Resource ID"
    value: { get_attr: [rg, refs] }
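Assuming the template is saved as scale-update.yaml (a hypothetical file name), a typical session could look like:

# Create the stack with the default cluster_size of 1
openstack stack create --template scale-update.yaml cluster-stack
# Inspect the outputs (IP addresses and resource IDs)
openstack stack output show cluster-stack --all
# Grow the resource group from one server to two
openstack stack update cluster-stack --template scale-update.yaml --parameter "cluster_size=2"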
Webhooks
The Heat type OS::Heat::AutoScalingGroup takes one or more rules of type OS::Heat::ScalingPolicy for scaling operations, which may be triggered by an event such as a webhook or an alarm.
A parent template with an AutoScalingGroup is shown below. It embeds a template called instance.yaml containing the actual resource declarations, such as the templates described in https://pannet.atlassian.net/l/c/01TerhN5
The minimum and maximum group sizes are set by min_size and max_size.
By defining scaling policies for the AutoScalingGroup, scaling operations can be initiated by sending a POST request to the URLs created, which are shown in the template output section. The two scaling policies contain the parameters:

- adjustment_type - specifies how the cluster is scaled: by a fixed increment (change_in_capacity), a percentage of the current size (percent_change_in_capacity), or an exact target size (exact_capacity).
- auto_scaling_group_id - reference to the AutoScalingGroup the policy applies to.
- cooldown - time interval in seconds during which the cluster is on stand-by, allowing monitoring of the workload. A large value leads to fewer scaling operations and a stable cluster, but also a slower response to urgent load situations. It is not used for manually triggered scaling operations.
- scaling_adjustment - a number or percentage, depending on adjustment_type. Note the negative sign of the capacity change in the scale-down policy.
When the stack has been created, its resources are listed with
openstack stack resource list <stack-name> -n 2
The switch -n, short for --nested-depth, must in this case be set to 2 to display the actual servers. To scale up the cluster by the amount scaling_adjustment, fetch the URL with
openstack stack output show <stack-name> scale_up_url
and make a POST request with
curl -X POST '<webhook>'
where the URL <webhook> is copied from the stack output and must be surrounded by quotation marks (single or double) to conform with the required format. The curl utility may fail due to a missing certificate; the path to the certificate can be added to the command with the parameter --cacert <file-path>.
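For example, the two steps can be combined as follows (a sketch; the -f value -c output_value options print the bare URL, and the certificate path is hypothetical):

# Print only the webhook URL from the stack output
openstack stack output show <stack-name> scale_up_url -f value -c output_value
# Trigger the scale-up operation
curl -X POST --cacert /etc/ssl/certs/ca-certificates.crt '<webhook>'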
Figure 4 shows the number of servers before and after scaling up the stack, triggered by the webhook for this operation.
Scaling by capacity step
When the adjustment type is change_in_capacity, the scaling adjustment takes as argument an integer specifying a fixed capacity step. To achieve down-scaling, the number has to be negative.
heat_template_version: 2018-08-31

description: Scaling through webhooks

resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      resource:
        type: instance.yaml
      min_size: 1
      desired_capacity: 2
      max_size: 5
  scale_up_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: asg }
      scaling_adjustment: '1'
  scale_dn_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: asg }
      scaling_adjustment: '-1'

outputs:
  scale_up_url:
    description: >
      The URL is the webhook to scale up the group.
      The scale-up operation is invoked by an HTTP POST request
      to this URL. No body nor extra headers are needed.
    value: { get_attr: [scale_up_policy, alarm_url] }
  scale_dn_url:
    description: >
      This URL is the webhook to scale down the group.
      The scale-down operation is invoked by an HTTP POST request
      to this URL. No body nor extra headers are needed.
    value: { get_attr: [scale_dn_policy, alarm_url] }
Scaling by percentage
When the adjustment type is percent_change_in_capacity, the scaling adjustment takes as argument an integer denoting a percentage of the current capacity. The realized adjustment will be the nearest attainable capacity configuration. To achieve down-scaling, the number has to be negative. This policy is used for scaling relative to the current capacity rather than by fixed increments. In the following example, the cluster is scaled to double and to half of the current capacity, respectively.
scale_up_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: percent_change_in_capacity
    auto_scaling_group_id: { get_resource: asg }
    scaling_adjustment: '100'
scale_dn_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: percent_change_in_capacity
    auto_scaling_group_id: { get_resource: asg }
    scaling_adjustment: '-50'
Exact capacity
When the adjustment type is exact_capacity, the scaling adjustment takes as argument an integer specifying the fixed target capacity, which must always be greater than zero. This policy is used to switch between a small number of pre-determined capacity levels.
scale_up_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: exact_capacity
    auto_scaling_group_id: { get_resource: asg }
    scaling_adjustment: '3'
scale_dn_policy:
  type: OS::Heat::ScalingPolicy
  properties:
    adjustment_type: exact_capacity
    auto_scaling_group_id: { get_resource: asg }
    scaling_adjustment: '1'
Environments
The environment is a separate file used together with a template to create a stack. There are also a default OpenStack environment and possibly operator-defined global environments used in the cloud. An entry in the user-defined environment takes precedence over the global environment.
Structure
An environment file is used with a template and is included in the openstack stack create command by supplying the -e option. The two prominent features are parameter handling and resource mapping, with examples given in the two subsequent subsections.
The environment file can contain some of the following sections:
- parameters - parameter values to be used with the template resources
- parameter_defaults - default parameters passed to all template resources
- parameter_merge_strategies - merge strategies for parameters from multiple input sources
- resource_registry - resource mapping and registry for custom resource types
Other optional sections not described in this tutorial are:
- encrypted_parameters - list of encrypted parameters
- event_sinks - list of endpoints that would receive stack events
Parameter values
The environment extends the parameter handling possibilities, and there are certain rules that need to be adhered to. In essence, it separates the parameter values from the resource declarations.
In the template, a resource property can be assigned a value directly or through a declared parameter, whose value can be overwritten either from the CLI using the --parameter option, or by the environment. The template therefore needs to include a parameter declaration, but no value needs to be specified if it is to be passed from the environment. A minimal parameter declaration in the template looks like
parameters:
  server_flavor:
    type: string
  server_image:
    type: string
  server_secgroup:
    type: comma_delimited_list
The type has to be included in the parameter declaration. The parameters declared in the template are then given values in the environment, without any other properties:
parameters:
  server_flavor: g1.standard-1-1
  server_image: ubuntu-20.04-x86_64
  server_secgroup: ssh_only
Values defined under parameters override default values under parameter_defaults. When parameter values are entered from different sources, a rule is needed for how to handle the presence of multiple declarations. Such rules, which can be specified down to the individual parameter, are collected under the section parameter_merge_strategies.
The options are overwrite (the default strategy, unless otherwise specified), merge (for Python list-type arguments) and deep_merge (for JSON arguments). If no merge strategy is provided in an environment file, overwrite is the default strategy for all parameters in that file. This can be changed with the entry default: merge.
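As an illustration, a strategies section that makes merge the file-wide default while keeping overwrite for a single parameter could look like this (a sketch with a hypothetical parameter name):

parameter_merge_strategies:
  # Applies to all parameters without an explicit strategy
  default: merge
  # This parameter is still overwritten rather than merged
  server_flavor: overwrite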
The strategy must conform with the parameter type declared in the template. If the parameter is declared as string, the merge strategy concatenates the strings. To merge values into a list argument, the type must be comma_delimited_list.
To illustrate the handling of parameters from different sources, consider a situation where we wish to deploy a server with a given image, where the key pair is user-specific, and where the other parameters are determined by the project. The image is considered part of the resource type and specified directly in the template, the flavor and security group have default values, and the server name, network, key pair and one security group are given values in the parameters section of the environment file, while a second security group is supplied on the command line.
Say we create the stack with
openstack stack create --environment environment.yaml --template template.yaml --parameter "server_secgroup=http_only" test-stack
This would launch an instance with the property values
name: web_server
image: ubuntu-20.04-x86_64
flavor: g1.standard-1-1
key_name: common
network: internal_network
security_groups: [ssh_only, http_only]
Template (template.yaml):
heat_template_version: 2018-08-31

description: Test instance

parameters:
  server_name:
    type: string
  server_key:
    type: string
  server_flavor:
    type: string
  server_secgroup:
    type: comma_delimited_list
  local_network:
    type: string

resources:
  server:
    type: OS::Nova::Server
    properties:
      name: { get_param: server_name }
      image: ubuntu-20.04-x86_64
      flavor: { get_param: server_flavor }
      key_name: { get_param: server_key }
      networks:
        - network: { get_param: local_network }
      security_groups: { get_param: server_secgroup }
Environment file (environment.yaml):
parameters:
  server_name: web_server
  server_key: common
  server_secgroup: ssh_only
  local_network: internal_network

parameter_defaults:
  server_flavor: g1.standard-1-1
  server_secgroup: default

parameter_merge_strategies:
  server_secgroup: merge
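Note how server_secgroup illustrates the merge strategy: the value ssh_only from the environment and http_only from the command line are concatenated into the list [ssh_only, http_only], as shown in the resulting property values above.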
Autoscaling
The OpenStack component Ceilometer is a telemetry service that gathers data from the cloud, including events (such as the creation of an instance) and samples (such as load figures for VCPUs). Sample data are stored in a database managed by the Gnocchi service, which allows fetching data for project health monitoring and alarms.
The Aodh service collects measures from Gnocchi, performs logical operations on the data and triggers actions according to specified rules. This is used to trigger autoscaling in Heat, as shown in Figure 5.
To design an alarm, a number of factors need to be considered, such as:

- Source object - is data from the correct resource (for example a server or a server group) used for the alarm?
- Does the relevant Gnocchi metric and measure exist, and is it presented in useful units and scale?
- Does the alarm trigger the desired action?
- How should the alarm be tuned with threshold settings and repeated actions?
Note that the Gnocchi client is a plugin installed as part of the OpenStack client suite. The standalone Gnocchi client can be installed with pip install python-gnocchiclient.
Available metrics
When the Gnocchi service is active, the command
openstack metric resource list
prints a list of Gnocchi resources. Each OpenStack resource can have several metric resources, such as measurement time series, associated with it. A specific Gnocchi time series associated with an OpenStack resource with ID <resource-id> can be found in two steps:
openstack metric resource show <resource-id> -c metrics
openstack metric measures show <gnocchi-id>
which displays measurements and timestamps for the given metric (for example cpu), as shown in Figure 6.
The two operations can be combined in
openstack metric measures show --resource-id <resource-id> cpu
The commands of the Gnocchi client are invoked with the prefix gnocchi or, equivalently, with openstack metric. The definitions of the Gnocchi metrics listed in Figure 6 are:
- compute.instance.booting.time - booting time of the instance (s)
- cpu - accumulated CPU time (ns)
- disk.ephemeral.size - size of the ephemeral disk, by default 0 (GB)
- disk.root.size - size of the root disk in the server configuration (GB)
- memory.usage - instantaneous RAM usage (MB)
- memory - size of RAM in the server configuration (MB)
- vcpus - number of VCPUs in the server configuration
Archives and aggregation
Gnocchi archive policies define how data is aggregated and stored. An archive policy definition specifies a fixed number of data points in a given time interval.
The default archive policy contains two definitions and one rule: metrics are stored for seven days with a granularity of one minute, and for 365 days with a granularity of one hour.
Archive policy names are listed with
openstack metric archive-policy list
Details of an archive policy, including aggregation methods, are displayed with
openstack metric archive-policy show <policy-name>
as shown in Figure 7.
The cpu metric cannot be directly compared with a static threshold since its value is accumulated. To compute a representative load figure for each sample, the data has to be aggregated with one of the built-in policies. The details of the available policies (Figure 8) are produced with
openstack metric archive-policy list
The average CPU load is given by the policy rate:mean, which is the value of the aggregation method argument. The aggregated values of the CPU data are shown with
openstack metric measures show --aggregation rate:mean --resource-id <server-id> cpu
The periodic load figures are shown in Figure 9.
Note that the aggregated values are given in nanoseconds, so with a granularity of 300 seconds, 100% load corresponds to 3.0×10^11 ns per core. This implies that the alarm threshold has to be multiplied by the number of cores and expressed in nanoseconds.
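For example, an alarm intended to fire at 80% load on a two-core flavor with 300-second granularity would use the threshold 0.8 × 2 × 3.0×10^11 = 4.8×10^11 ns.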
Stress testing
In stress testing, a script or process is run on a server to induce high resource utilization. A simple stress test can be performed by running the stress utility through SSH, which generates load that can be controlled both in intensity and duration. It is installed on the server under test with
sudo apt install stress
To generate a certain CPU load for a specified duration, the command is passed values for the parameters --cpu and --timeout.
The command uptime returns the current time, the total system time in running state, the number of users currently logged in, and the average load for the past 1, 5 and 15 minutes, respectively. This is useful for inspecting the load figures on the server before and after the stress test:
uptime
sudo stress --cpu 100 --timeout 300
uptime
Figure 10 shows a stress test of duration 300 seconds on a single-core server.
Create alarm
Alarms provide configurable monitoring services for resources running on OpenStack. They are used for resource scaling through Heat orchestration, but can in principle also be used for health monitoring of cloud resources. This section describes alarms based on performance data from the Gnocchi database; alarms can also be based on events, such as resource creation or power state changes.
Alarms are managed with subcommands of openstack alarm, for example
openstack alarm list
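For a compact overview, the output can be restricted to selected columns (a sketch using the standard column-selection option of the OpenStack client):

openstack alarm list -c alarm_id -c name -c state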
Gnocchi alarms follow a tri-state model:
- ok - the rule governing the alarm has been evaluated as False
- alarm - the rule governing the alarm has been evaluated as True
- insufficient data - there are not enough data points available in the evaluation periods to determine the alarm state meaningfully
Roughly speaking, an alarm is defined to perform an action when its state changes to its target state, based on some input. Unfortunately, not all diagnostic tools for alarms, such as notifications, are available to non-administrator OpenStack users, so an alarm should be designed carefully to avoid lengthy troubleshooting. The alarm parameters should be set so that:

- the desired change in Gnocchi data changes the state of the alarm
- alarm actions result in some observable event
Alarms are created with openstack alarm create, which requires a number of parameters for its full specification.
An alarm may be triggered by comparing a single metric on a resource with a threshold value, or by aggregating data across metrics or resources. The case determines the alarm type, where the single-metric case is gnocchi_resources_threshold.
The alarm type determines the arguments needed to create it. For a simple CPU load alarm, the metric is cpu, the aggregation method is rate:mean, the comparison operator is gt (greater than), and the threshold is expressed in nanoseconds. The alarm granularity must match the granularity of the metric configured in Gnocchi (in this case 300).
Alarms are evaluated on a periodic basis, defaulting to once every minute. The evaluation-periods parameter specifies the number of periods to evaluate over before an action is invoked. If repeat-actions is set to True, the actions are repeatedly notified while the alarm remains in its target state.
Finally, the resource type is set to instance and the parameter resource_id specifies the instance the alarm collects data from.
An alarm typically invokes an action through a webhook. For this purpose, openstack alarm create accepts the arguments
--alarm-action <webhook-url>
--ok-action <webhook-url>
--insufficient-data-action <webhook-url>
These parameters can be combined with repeat-actions and time-constraint <time-constraint> to control the number of alarm actions.
Figure 11 shows the command for creating an alarm with a threshold of 50% (for testing purposes) and the alarm action set to the webhook of the scale-up policy from https://pannet.atlassian.net/wiki/spaces/DocEng/pages/2116943877/Resource+scaling#Scaling-by-capacity-step
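A command of this form might look as follows (a sketch only: the threshold corresponds to 50% of 3.0×10^11 ns on a single-core flavor, and the webhook URL is the scale_up_url copied from the stack output):

openstack alarm create --name cpu-high \
  --type gnocchi_resources_threshold \
  --metric cpu --aggregation-method rate:mean \
  --comparison-operator gt --threshold 150000000000.0 \
  --granularity 300 --evaluation-periods 1 \
  --resource-type instance --resource-id <server-id> \
  --alarm-action '<webhook>'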
The details of an alarm can be printed with
openstack alarm show <alarm-id>
To trigger the alarm, a high CPU load can be induced on the target server by running the stress utility through SSH, as described in https://pannet.atlassian.net/wiki/spaces/DocEng/pages/2116943877/Resource+scaling#Stress-testing. The alarm then enters the alarm state and activates the webhook, which scales up the stack by adding a new server.
An alarm action can also be tested directly by manually setting the alarm to its target state, through
openstack alarm state set --state alarm <alarm-id>
Its current state is read with
openstack alarm state get <alarm-id>
Create alarm with Heat
In autoscaling, the alarm should typically aggregate data from a cluster of servers, or server group. The command below computes the periodic CPU load on each instance and aggregates these values by taking their mean.
openstack metric aggregates --resource-type=instance \
  --granularity 300 \
  "(aggregate mean (metric cpu rate:mean))" "project_id!='dummy'"
The type of alarm used in such situations is OS::Aodh::GnocchiAggregationByResourcesAlarm. To define it, the following four arguments are required:
resource-type
metric
threshold
query
The query is the last line of the command and defines the server group and resource aggregation. The expression "project_id!='dummy'" filters instances with respect to project_id, where any ID (apart from 'dummy') matches, so data is returned for all instances in the project.
In a Heat template, a better choice of server group might be the stack ID, which is created at the same time as the autoscaling group and filters down to exactly the instances belonging to it. This can be implemented with metadata. In the properties of the server, add the line
metadata: {"metering.server_group": {get_param: "OS::stack_id"}}
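In context, the server declaration in the embedded template might look like this (a sketch; only the metadata line is new compared to the earlier server declarations):

server:
  type: OS::Nova::Server
  properties:
    image: { get_param: server_image }
    flavor: { get_param: server_flavor }
    # Tags the instance so that Gnocchi can group its metrics by stack
    metadata: {"metering.server_group": {get_param: "OS::stack_id"}}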
No other identifiers are required; Gnocchi has the keyword server_group defined for instances by default. In a Heat template, a query filtering data on server group would be
query:
  list_join:
    - ''
    - - {'=': {server_group: {get_param: "OS::stack_id"}}}
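A minimal sketch of how such an alarm could be declared alongside the autoscaling group, assuming the scale-up policy from the webhook section and illustrative values for threshold and granularity:

cpu_alarm_high:
  type: OS::Aodh::GnocchiAggregationByResourcesAlarm
  properties:
    description: Scale up when the mean CPU load exceeds the threshold
    metric: cpu
    aggregation_method: rate:mean
    granularity: 300
    evaluation_periods: 1
    # 50% of 3.0x10^11 ns on a single-core flavor
    threshold: 150000000000.0
    comparison_operator: gt
    resource_type: instance
    # Invoke the scale-up webhook when the alarm fires
    alarm_actions:
      - { get_attr: [scale_up_policy, alarm_url] }
    # Restrict the aggregation to instances in this stack
    query:
      list_join:
        - ''
        - - {'=': {server_group: {get_param: "OS::stack_id"}}}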
The query can also be used to format the CPU load as a percentage figure, for example for a server group with two single-core instances:
openstack metric aggregates --resource-type instance \
  "(* ( / (aggregate rate:mean (metric cpu mean)) 60000000000.0) 100)" \
  "server_group=<stack-id>"
The result of the data transformation is shown in Figure 12.
The details of the query can be seen in the printout from openstack alarm show <alarm-id>.
Additional resources
https://docs.openstack.org/aodh/pike/admin/telemetry-alarms.html