Accelerated networks

The DT Cloud Services cloud offers standard and accelerated network performance. Accelerated (high-performance) networking is able to achieve Telco-grade KPIs.


Accelerated network performance is achieved through dedicated hardware and software technologies; these are realized differently in the Beryllium and Boron releases.

KPIs achieved in the DT Cloud Services cloud using network acceleration (a conversion sketch follows the list):

  • Frame rate: 11 Mpps, bidirectional, at 64-byte packet size

  • Bit rate: interface or line saturation of 40 Gbps at 750-byte packet size

  • Average latency: 50 microseconds at 64-byte packet size
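For orientation, the sketch below converts between frame rate and bit rate for a given Ethernet frame size. The 20 bytes of per-frame overhead (preamble, start-of-frame delimiter, inter-frame gap) is a standard Ethernet assumption, not a figure taken from this document.

```python
# Relate frame rate (packets per second) and wire bit rate for a given
# Ethernet frame size. Assumption: 20 bytes of per-frame overhead
# (7 B preamble + 1 B SFD + 12 B inter-frame gap) in addition to the frame.

ETH_OVERHEAD_BYTES = 20

def line_rate_pps(bitrate_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames/s at a given wire bit rate and frame size."""
    return bitrate_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)

def wire_bitrate_bps(pps: float, frame_bytes: int) -> float:
    """Wire bit rate consumed by a given frame rate and frame size."""
    return pps * (frame_bytes + ETH_OVERHEAD_BYTES) * 8

# 40 Gbps line saturation at 750-byte frames is roughly 6.5 Mpps.
print(f"{line_rate_pps(40e9, 750) / 1e6:.2f} Mpps")    # ~6.49

# 11 Mpps at 64-byte frames consumes roughly 7.4 Gbps on the wire.
print(f"{wire_bitrate_bps(11e6, 64) / 1e9:.2f} Gbps")  # ~7.39
```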

Network acceleration in Beryllium

Compute nodes in the Beryllium release are equipped with Netronome Agilio SmartNICs and Express Virtio Forwarder (XVIO) technology.

A SmartNIC is a network adapter that can accelerate networking functionality and offload it from the server CPU. The virtio driver is a generic network driver supported by the majority of Linux and Windows operating systems. It enables network-layer abstraction: generic virtio driver requests are translated (via the virtio adapter) into NIC-specific requests. This fully detaches the VM from the cloud infrastructure and makes the VM operating system hardware (NIC) independent.
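As a simple illustration of this abstraction, the driver backing a guest interface can be read from sysfs inside the VM; for a standard virtio port it is typically virtio_net. This is a generic Linux sketch, not a DT Cloud Services specific tool, and the interface name is a placeholder.

```python
# Print the kernel driver backing a guest network interface (Linux only).
# For a virtio port this is typically "virtio_net"; for an SR-IOV VF it is
# the vendor's VF driver. "eth0" is a placeholder interface name.
import os

iface = "eth0"
driver_link = f"/sys/class/net/{iface}/device/driver"
if os.path.islink(driver_link):
    print(os.path.basename(os.readlink(driver_link)))
else:
    print(f"No device driver symlink found for {iface}")
```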

These technologies have a number of benefits:

  • Applications receive the networking performance of SR-IOV while abstraction from the underlying infrastructure is maintained

  • Compute nodes utilize SDN, with the Contrail vRouter running in the SmartNIC hardware

  • DPDK-accelerated and non-accelerated VMs can share the same compute node without performance impact

  • Because the application is abstracted from the underlying infrastructure, the lifecycles of the application and the (NIC) hardware are decoupled

With the SmartNIC, DT Cloud Services supports any type of application or service with moderate to very high networking requirements, such as Telco-grade VNFs. The architecture of the SmartNIC is shown in Figure 1.

Figure 1. The SmartNIC architecture.

SmartNICs with XVIO provide almost the same networking performance across all NUMA nodes of the compute node. It is therefore not strictly required to place the application in the same NUMA node as the SmartNIC (NUMA 0). Note, however, that placement in NUMA 1 can cost a few percent of networking performance (pps), as illustrated in Figure 2.

Figure 2. VM performance depending on NUMA placement.
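On a Linux host, the NUMA node a PCI network adapter is attached to can be read from sysfs, which is one way to verify the placement discussed above. This is a generic Linux sketch; the interface name is a placeholder, and a value of -1 means the platform did not report a NUMA node.

```python
# Read the NUMA node a PCI NIC is attached to (Linux sysfs).
from pathlib import Path

iface = "ens1f0"  # placeholder interface name
node = Path(f"/sys/class/net/{iface}/device/numa_node").read_text().strip()
print(f"{iface} is attached to NUMA node {node}")
```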

Network acceleration in Boron

In the Boron release, network acceleration is based on:

  • Legacy SR-IOV

  • DPDK acceleration for user workloads (VMs)

Legacy SR-IOV

There are two types of compute nodes: Intel CPU-based dual-socket and AMD EPYC2 CPU-based single-socket. The following NIC models and NUMA placement are available for the legacy SR-IOV mode (a sketch for inspecting SR-IOV capacity follows the list):

  • On the Intel CPU dual-socket compute nodes, the Intel X710 NIC is used for user workload traffic (VMs); this NIC is attached to NUMA 0.

  • On the AMD EPYC2 CPU single-socket compute nodes, the Mellanox ConnectX-5 NIC is used for user workload traffic.
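As a sketch of how SR-IOV capacity can be inspected on a Linux host, an SR-IOV capable physical function exposes its total and currently configured virtual functions (VFs) in sysfs. The interface name is a placeholder; these files exist only for SR-IOV capable NICs.

```python
# Report how many SR-IOV virtual functions (VFs) a physical NIC supports
# and how many are currently configured (Linux sysfs).
from pathlib import Path

iface = "ens1f0"  # placeholder interface name
dev = Path(f"/sys/class/net/{iface}/device")
total = (dev / "sriov_totalvfs").read_text().strip()
current = (dev / "sriov_numvfs").read_text().strip()
print(f"{iface}: {current} of {total} VFs configured")
```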

DPDK acceleration

The Data Plane Development Kit (DPDK) is a set of data-plane libraries and network interface controller drivers for fast packet processing. DPDK implements a low-overhead, low-latency model for fast data-plane performance and accesses devices via polling, eliminating the performance overhead of interrupt processing. DPDK leverages existing Intel processor technologies such as SIMD instructions (Single Instruction, Multiple Data), huge-page memory, multiple memory channels, and caching to provide acceleration with its own libraries.

A network workload runs its application and network stack in user space and calls DPDK libraries for data-plane functions; incoming network frames are processed at user level rather than in the operating system kernel. DPDK libraries claim dedicated CPU cores and memory at initialization time, including buffer pools, descriptor rings, and lock-less queues.

Huge pages

Huge pages is a technology for efficient memory management. It lets processes operate on larger blocks of data per memory mapping, reducing the number of address translations and thereby the memory access overhead. Huge pages must be configured in the Linux kernel and are available for flavors of the performance type (which have dedicated resources).

The standard memory block (page) size in Linux virtual memory management is 4 KB, whereas huge pages allow blocks of up to 1 GB.
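Inside a Linux guest, /proc/meminfo shows whether huge pages are available and at what size; the sketch below simply prints the relevant standard kernel fields (the values depend on the flavor).

```python
# Print the huge page counters reported by the Linux kernel.
wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize")
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(wanted):
            print(line.rstrip())
```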

Network interfaces for VMs

By default, any onboarded application and its VMs can get:

  1. A standard virtio interface on both «Shared compute nodes» and «Dedicated compute nodes».

  2. A legacy SR-IOV connection on special request, and only on «Dedicated compute nodes».

After network acceleration has been enabled, the customer simply creates VMs of the performance type with the SR-IOV port type. Standard and accelerated compute node configurations are shown in Table 1.

Compute node type    Standard                 Accelerated

Shared               Port type: virtio        Not available
                     vCPU: shared
                     Huge pages disabled

Dedicated            Port type: virtio        Port type: SR-IOV
                     vCPU: pinned             vCPU: pinned
                     Huge pages enabled       Huge pages enabled

Table 1. Standard and accelerated compute node configurations.
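As an illustration of the workflow described in this section, the sketch below uses the openstacksdk Python library to create a port with the SR-IOV VNIC type (`direct`) and boot a performance-flavor VM attached to it. The cloud profile, network, flavor, and image names are placeholders; the actual flavor and network names are environment-specific and must be obtained from DT Cloud Services.

```python
# Sketch: request an accelerated (SR-IOV) port and boot a VM on it using
# the openstacksdk library. All names below are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")  # placeholder clouds.yaml profile

network = conn.network.find_network("tenant-net")        # placeholder network
flavor = conn.compute.find_flavor("performance.4c.8g")   # placeholder performance flavor
image = conn.compute.find_image("ubuntu-22.04")          # placeholder image

# SR-IOV is requested per port via the binding:vnic_type attribute.
port = conn.network.create_port(
    network_id=network.id,
    binding_vnic_type="direct",
)

server = conn.compute.create_server(
    name="accelerated-vm",
    flavor_id=flavor.id,
    image_id=image.id,
    networks=[{"port": port.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)
```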