Availability

The Pan-Net infrastructure cloud (IC) is based on OpenStack and has availability zones implemented as defined by the OpenStack standard.

Regions and availability zones

An OpenStack availability zone is a logical partitioning of the infrastructure visible to the user. The partitioning does not need to correspond to the physical distribution of infrastructure, but is rather a representation based on the level of service resilience that can be ensured.

Availability zones are based on aggregates, which are specific sets of hosts with labels (metadata) attached, and give the user the freedom to allocate instances to chosen zones or sets of hosts. A host can only be in one availability zone, and every host is part of a default availability zone even if it does not belong to an aggregate.

Pan-Net uses a modification of the OpenStack availability concepts. The OpenStack concepts are illustrated in Figure 1 and comprise:

  • Region (or Deployment)

  • Sites (or Data Centers)

  • Availability Zone

Each Region has its own OpenStack deployment, including API endpoints, networks and compute resources. Different regions share Keystone and Horizon services, providing access control and the web interface, respectively.

Within each region, compute nodes (hosts) are logically grouped into Availability Zones (AZ). When an instance is created, the user can specify the AZ where it should be instantiated. It may even be possible to specify a particular node inside an AZ where the instance should run.
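
As a minimal sketch of how a zone is selected at instance creation, the following builds a request body for the OpenStack Compute API "create server" call, where the `availability_zone` field carries the zone name. The zone name `az1`, the host name `node7` and the image, flavor and network IDs are placeholders, not actual Pan-Net values. The `ZONE:HOST` form for pinning an instance to a node is normally restricted by policy, so it may not be available to ordinary users.

```python
import json


def build_server_request(name, image_id, flavor_id, network_id, zone, host=None):
    """Build the JSON body for a Compute API 'create server' call.

    All arguments are placeholders supplied by the caller; nothing here
    contacts a real cloud.
    """
    # The availability_zone field accepts "ZONE" or "ZONE:HOST".
    availability_zone = f"{zone}:{host}" if host else zone
    return {
        "server": {
            "name": name,
            "imageRef": image_id,
            "flavorRef": flavor_id,
            "networks": [{"uuid": network_id}],
            "availability_zone": availability_zone,
        }
    }


# Hypothetical values for illustration only.
body = build_server_request("web-1", "IMAGE_ID", "FLAVOR_ID", "NETWORK_ID", zone="az1")
print(json.dumps(body, indent=2))
```

Passing `host="node7"` in addition would produce `az1:node7` in the same field.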

In the Pan-Net cloud, an enhanced availability concept is used, as illustrated in Figure 2. The terminology is defined as follows:

  • Regions and Regional Availability Zones

  • Sites (or Data Centers)

  • Site Availability Zones

In the Pan-Net IC, each region consists of at least two geographically redundant data centers, which constitute the regional availability zones. Each site (that is, data center) has three site availability zones with physically separate compute resources but a shared control plane. Services requiring high availability should be deployed using both regional and site availability zones.

A region corresponds to a country or part of a country. It represents a cluster of geographically close sites constituting Regional Availability Zones (RAZ). A site's membership in a RAZ is driven by

  • Country geography and site distances (to ensure low intra-region latency)

  • Population density and service demand (for load distribution)

  • Pan-Net IC control plane scalability limits

A Site is an OpenStack deployment. At present, each Pan-Net data center has its own OpenStack deployment, so a site can be considered synonymous with a data center.

A Site Availability Zone (SAZ) is the same as an OpenStack availability zone of a single environment or deployment.

Anti-affinity rules

The anti-affinity (AA) rule is a setting that ensures that VMs supporting the same application are deployed on different hardware platforms (that is, compute nodes). This provides spatial diversity in the implementation, which in turn increases the reliability of the application. An AA rule can be set on its own or together with a deployment across different availability zones.

The procedure for applying anti-affinity rules by creating server groups is described in the how-to "Create (anti-)affinity group".
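
To illustrate the server-group mechanism, here is a hedged sketch of the two Compute API request bodies involved: one creating a server group with the anti-affinity policy, and one "create server" body that refers to the group through a scheduler hint. The group name `app-aa` and the ID `GROUP_ID` are placeholders. The sketch assumes the older `policies` list form of the server-group body; from Compute API microversion 2.64 a single `policy` string is used instead, so check which form your deployment expects.

```python
def build_server_group_request(name):
    """Body for creating a server group with the anti-affinity policy.

    Assumption: the pre-2.64 'policies' list form is accepted.
    """
    return {"server_group": {"name": name, "policies": ["anti-affinity"]}}


def add_group_hint(server_request, group_id):
    """Return a copy of a 'create server' body that references the group."""
    hinted = dict(server_request)
    # Scheduler hints sit next to the "server" object, not inside it.
    hinted["os:scheduler_hints"] = {"group": group_id}
    return hinted


# Hypothetical values for illustration only.
group_body = build_server_group_request("app-aa")
server_body = add_group_hint({"server": {"name": "app-1"}}, "GROUP_ID")
print(group_body)
print(server_body)
```

Every instance created with the same group hint is then scheduled onto a different compute node than the other members of the group.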

Pan-Net cloud availability

The Pan-Net cloud topology allows high-availability applications to be designed by distributing them across availability zones. The purpose of availability zones is to avoid or mitigate service impact by identifying single points of failure, that is, points where a single event causes the service to fail (leading to server downtime and non-availability).

Critical faults and operational events can be classified as:

  • Random failures - component or link failures due to technical malfunction or accidents. The effect of this type of failure is mitigated by implementing the service with geographical redundancy, for example in different availability zones.

  • Planned maintenance - scheduled maintenance can be mitigated by proper operational countermeasures. Each availability zone has a defined maintenance plan, so distributing a service across availability zones increases server uptime.

By using regional availability zones, the Pan-Net cloud targets a service availability of 99.99% for any region. Based on this target figure, zone availability figures can be computed by assuming independent failures of each logical component.

With two (or more) sites per region, the availability requirement A per RAZ follows from

1 - (1 - A) * (1 - A) = 0.9999

which gives A = 99% for any single RAZ.

Repeating the same argument for two (or more) VMs per site, we see that the required VM availability is at least 90%. The nominal availability figures for Pan-Net cloud components are listed in Table 1.
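
The arithmetic above can be reproduced with a short sketch. It assumes, as the text does, that redundant components fail independently and that the service is up as long as at least one of them is up. The function names are illustrative only.

```python
import math


def required_component_availability(target, n):
    """Availability each of n independent, redundant components needs
    so that at least one of them is up with probability `target`."""
    return 1.0 - (1.0 - target) ** (1.0 / n)


def parallel_availability(availabilities):
    """Probability that at least one of the independent components is up."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


raz = required_component_availability(0.9999, 2)  # two RAZs per region
vm = required_component_availability(raz, 2)      # two VMs per RAZ
print(f"required RAZ availability: {raz:.4f}")
print(f"required VM availability:  {vm:.4f}")
print(f"two RAZs combined:         {parallel_availability([raz, raz]):.4f}")
```

With more than two redundant components per level, the same functions give a lower requirement per component.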

Zone      Target availability
Region    99.99 %
RAZ       99.01 %
SAZ       > Single instance VM

Table 1. Availability of zones.

With a VM availability of at least 90% and a proper distribution of VMs, highly resilient applications can be built, with an availability similar to that of a region, that is, 99.99%. For still higher availability, a multi-regional design must be used. These principles are illustrated in Figure 3.

Availability SLA

The Region SLA for Compute applies to deployments with a minimum of 4 VM instances distributed over a minimum of 2 RAZs, where the VM instances in each RAZ are placed in separate SAZs and/or an anti-affinity rule is applied.

The Region Availability Zone SLA for Compute applies to deployments with a minimum of 2 VM instances in that RAZ, where the VM instances are placed in separate SAZs and/or an anti-affinity rule is applied.
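
The two conditions can be expressed as a small check over a planned deployment. This is only one possible reading of the SLA wording, not an official rule engine: it treats "separate SAZs" as every VM in a RAZ being in a different SAZ, and it accepts an anti-affinity rule as an alternative. The `Placement` type and the site and zone names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    raz: str  # regional availability zone (site) hosting the VM
    saz: str  # site availability zone hosting the VM


def _separated(placements, anti_affinity):
    """True if the VMs are in distinct SAZs or an anti-affinity rule is set."""
    sazs = [p.saz for p in placements]
    return anti_affinity or len(set(sazs)) == len(sazs)


def raz_sla_applies(placements, raz, anti_affinity=False):
    """At least 2 VMs in the given RAZ, separated from each other."""
    in_raz = [p for p in placements if p.raz == raz]
    return len(in_raz) >= 2 and _separated(in_raz, anti_affinity)


def region_sla_applies(placements, anti_affinity=False):
    """At least 4 VMs over at least 2 RAZs, separated within each RAZ."""
    by_raz = defaultdict(list)
    for p in placements:
        by_raz[p.raz].append(p)
    return (
        len(placements) >= 4
        and len(by_raz) >= 2
        and all(_separated(group, anti_affinity) for group in by_raz.values())
    )


deployment = [
    Placement("site-a", "saz1"), Placement("site-a", "saz2"),
    Placement("site-b", "saz1"), Placement("site-b", "saz2"),
]
print(region_sla_applies(deployment))
print(raz_sla_applies(deployment, "site-a"))
```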

Downtime means:

  • For VM Instances: Loss of external connectivity or persistent disk access for all running VM instances.

  • Downtime does not include loss of external connectivity as a result of:

    • Failure of NatCo-specific private peering; such downtime is addressed exclusively in the MPLS NNI SLA.

    • Failures outside Pan-Net network responsibility.

The availability period is one month.
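
To make the targets more tangible, an availability figure can be converted into a downtime budget for one availability period. The sketch below assumes a 30-day month, which the text does not specify, and uses the target values from Table 1.

```python
def monthly_downtime_budget_minutes(availability, days_in_month=30):
    """Maximum downtime in minutes that still meets the availability target.

    Assumption: the availability period is a month of `days_in_month` days.
    """
    minutes_in_month = days_in_month * 24 * 60
    return (1.0 - availability) * minutes_in_month


for zone, target in [("Region", 0.9999), ("RAZ", 0.9901)]:
    budget = monthly_downtime_budget_minutes(target)
    print(f"{zone}: {budget:.2f} minutes per month")
```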