How the Xen Hypervisor Supports CPU Virtualization on ARM
Introduction
Early computer architectures, like the first Acorn RISC Machine pictured in Figure 0, had no support for CPU virtualization. In the 30 years since, processor designers have added new hardware components to fully enable virtualization.
This article explores how the Xen hypervisor supports CPU virtualization on modern ARM processors. We start with a short background on virtualization and Xen, then discuss how the ARM virtualization extensions support CPU virtualization, and finish with how Xen uses these features.
We focus specifically on the ARMv8-A architecture and its 64-bit execution state, AArch64, on systems that support the ARM virtualization extensions. We assume that a device tree is used for hardware discovery, as is the case for nearly all embedded ARM devices.
Background
What is virtualization?
Virtualization provides a way to run multiple operating systems (OSes) on the same hardware. It allows resources to be partitioned between different virtual machines (VMs), preventing one VM from accessing another's resources. Modern virtualization takes advantage of underlying hardware features to build a performant system capable of running many different isolated workloads.
What is Xen? And what is Xen on ARM?
Xen is an open-source, Type 1 hypervisor. A Type 1 hypervisor runs directly on the hardware, with the operating systems running on top of it. This contrasts with a Type 2 hypervisor, such as KVM, in which an OS boots first and mediates access to the hardware for the hypervisor, which in turn manages the VMs.
Xen on ARM is a port of Xen to ARM devices that uses the virtualization extensions provided by modern ARM CPUs. Compared with x86 Xen, it has a significantly smaller codebase and a simpler architecture while offering most of the same features [1].
How does ARM support virtualization?
What are exception levels?
The ARM virtualization model is built on the concept of exception levels (ELs), with each successive level having greater privileges. Software at a less privileged exception level can trap into a more privileged one. For example, software running at EL1 can trap to EL2 using the Hypervisor Call (HVC) instruction, and software running at EL2 can trap to EL3 using the Secure Monitor Call (SMC) instruction. There are four exception levels:
EL0 – the userspace or application level
EL1 – the kernel or OS level
EL2 – the hypervisor
EL3 – the secure monitor and firmware
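To make the numbering concrete, the following C sketch decodes the AArch64 `CurrentEL` system register, which reports the current exception level in bits [3:2], and records which level handles each call instruction. SVC, not mentioned above, is the analogous call from EL0 to EL1. This is a simplified model that runs anywhere: it does not read the real register, and it ignores configurations such as HCR_EL2.TSC, which lets a hypervisor trap SMC instructions issued at EL1 to EL2.

```c
#include <assert.h>
#include <stdint.h>

/* The four AArch64 exception levels; a higher number means more privilege. */
enum exception_level { EL0 = 0, EL1 = 1, EL2 = 2, EL3 = 3 };

/* CurrentEL reports the current exception level in bits [3:2];
 * the remaining bits read as zero. */
static enum exception_level current_el_from_reg(uint64_t currentel)
{
    return (enum exception_level)((currentel >> 2) & 0x3);
}

/* Instructions that deliberately call into a more privileged level. */
enum call_insn { INSN_SVC, INSN_HVC, INSN_SMC };

/* Default handler level for each call (simplified: no trap redirection):
 * SVC -> EL1 (OS), HVC -> EL2 (hypervisor), SMC -> EL3 (secure monitor). */
static enum exception_level call_target_el(enum call_insn insn)
{
    switch (insn) {
    case INSN_SVC: return EL1;
    case INSN_HVC: return EL2;
    case INSN_SMC: return EL3;
    }
    assert(0 && "unknown call instruction");
    return EL0;
}
```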
In addition to exception levels, ARM supports the idea of both a non-secure and a secure world. EL0, EL1, and EL2 as described above belong to the non-secure world, while EL3 hosts the secure monitor that manages both worlds. The right-hand side of Figure 1 shows the secure world, which provides EL1 and EL0 for running specific trusted software but does not currently support anything at EL2 [2]. Switching between the secure world and the non-secure world is done via an SMC instruction, which enters the secure monitor at EL3.
How are CPUs identified?
Each CPU can be identified using two read-only (RO) registers that are accessible from EL1, EL2, and EL3. The Main ID Register (MIDR) identifies the CPU type, and the Multiprocessor Affinity Register (MPIDR) identifies the CPU's position in the system topology. Together, these two registers give the hypervisor or OS the information it needs to make scheduling decisions.
In a virtualized system, two additional registers are accessible only from EL2 and EL3, with both read and write (RW) access. The Virtualization Processor ID Register (VPIDR) mirrors the MIDR, and the Virtualization Multiprocessor ID Register (VMPIDR) mirrors the MPIDR. When software running at non-secure EL1 reads MIDR_EL1, it actually receives the value held in VPIDR_EL2; likewise, its reads of MPIDR_EL1 return the value held in VMPIDR_EL2. This gives the hypervisor control over what each guest VM learns about the underlying hardware. For example, the hypervisor can choose to report each CPU's type accurately while hiding information about the CPU topology.
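The redirection can be modelled in a few lines of C. The structure and function names below are hypothetical; the sketch only captures which register a read returns at each exception level and security state, assuming EL2 is implemented and enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one physical CPU's ID registers. */
struct cpu_id_regs {
    uint32_t midr;   /* MIDR_EL1: real CPU identity (read-only) */
    uint64_t mpidr;  /* MPIDR_EL1: real topology position (read-only) */
    uint32_t vpidr;  /* VPIDR_EL2: identity shown to non-secure EL1 */
    uint64_t vmpidr; /* VMPIDR_EL2: topology shown to non-secure EL1 */
};

/* Value returned by a read of MIDR_EL1 at exception level `el`.
 * Only non-secure EL1 sees the virtual copy. */
static uint32_t read_midr(const struct cpu_id_regs *r, int el, bool secure)
{
    assert(el >= 1 && el <= 3); /* the registers are not exposed to EL0 */
    return (el == 1 && !secure) ? r->vpidr : r->midr;
}

/* Same rule for MPIDR_EL1 and VMPIDR_EL2. */
static uint64_t read_mpidr(const struct cpu_id_regs *r, int el, bool secure)
{
    assert(el >= 1 && el <= 3);
    return (el == 1 && !secure) ? r->vmpidr : r->mpidr;
}
```

A hypervisor that wants to report the real CPU type but a flattened topology would copy `midr` into `vpidr` and write a synthetic value into `vmpidr`.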
Main ID Register [MIDR_EL1]
The MIDR is a 32-bit RO register that identifies the CPU and its architecture. Its bits have fixed values, shown in Figure 2, that encode the CPU type (part number), revision numbers, CPU architecture, and implementer (company). As an example, the values for a specific Cortex-A53 are included [3].
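The MIDR fields can be decoded in C. The field positions follow the ARMv8-A layout; the example value `0x410FD034` used in the usage note is an assumption chosen for illustration (a Cortex-A53 at revision r0p4) and is not necessarily the value shown in Figure 2.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of MIDR_EL1 (ARMv8-A layout). */
struct midr_fields {
    uint8_t  implementer;  /* [31:24] company, e.g. 0x41 = Arm */
    uint8_t  variant;      /* [23:20] major revision ("rN") */
    uint8_t  architecture; /* [19:16] 0xF = features identified by ID registers */
    uint16_t partnum;      /* [15:4]  CPU type, e.g. 0xD03 = Cortex-A53 */
    uint8_t  revision;     /* [3:0]   minor revision ("pN") */
};

static struct midr_fields decode_midr(uint32_t midr)
{
    struct midr_fields f = {
        .implementer  = (uint8_t)((midr >> 24) & 0xFF),
        .variant      = (uint8_t)((midr >> 20) & 0xF),
        .architecture = (uint8_t)((midr >> 16) & 0xF),
        .partnum      = (uint16_t)((midr >> 4) & 0xFFF),
        .revision     = (uint8_t)(midr & 0xF),
    };
    return f;
}
```

For `0x410FD034`, this yields implementer 0x41, variant 0, architecture 0xF, part number 0xD03, and revision 4, that is, a Cortex-A53 r0p4.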
Virtualization Processor ID Register [VPIDR_EL2]
The VPIDR is a 32-bit RW register that holds the value returned by non-secure EL1 reads of MIDR_EL1. It is readable and writable from both EL2 and EL3, but is not accessible from EL1 or EL0. Its value is set during boot by either the hypervisor running at EL2 or the firmware running at EL3. For the OS to interpret it correctly, it must hold a legal value for the MIDR_EL1 register.
Multiprocessor Affinity Register [MPIDR_EL1]
The MPIDR_EL1 is a 64-bit RO register that holds information about the affinity of the CPU, describing how it fits into the system topology. Although it is a 64-bit register, most systems use only the lower 32 bits. The lowest 8 bits (Aff0) indicate the core number of the CPU. On a four-core MPSoC, this field would hold the value 0x0, 0x1, 0x2, or 0x3.
The higher bits hold the higher affinity levels, which describe CPU clusters. One use of these levels is to identify Heterogeneous Multiprocessor (HMP) systems, which contain multiple clusters of different CPU core types. ARM markets this design as big.LITTLE, and it is common in smartphones, which combine “little” energy-efficient cores with “big” high-performance cores.
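The affinity fields can be extracted the same way. The sketch below follows the AArch64 layout (Aff0 to Aff2 in bits [23:0], Aff3 in bits [39:32]) and adds a hypothetical helper that treats two CPUs as members of the same cluster when every affinity level above Aff0 matches. Note that bit 31 of MPIDR_EL1 is RES1, so a real value for the first core typically reads as 0x80000000 rather than 0.

```c
#include <assert.h>
#include <stdint.h>

/* Affinity fields of MPIDR_EL1 (AArch64). */
struct mpidr_affinity {
    uint8_t aff0; /* [7:0]   typically the core within a cluster */
    uint8_t aff1; /* [15:8]  typically the cluster */
    uint8_t aff2; /* [23:16] higher-level grouping */
    uint8_t aff3; /* [39:32] highest level, outside the low 32 bits */
};

static struct mpidr_affinity decode_mpidr(uint64_t mpidr)
{
    struct mpidr_affinity a = {
        .aff0 = (uint8_t)(mpidr & 0xFF),
        .aff1 = (uint8_t)((mpidr >> 8) & 0xFF),
        .aff2 = (uint8_t)((mpidr >> 16) & 0xFF),
        .aff3 = (uint8_t)((mpidr >> 32) & 0xFF),
    };
    return a;
}

/* Hypothetical helper: same cluster if Aff1, Aff2, and Aff3 all match.
 * The mask skips Aff0 and the non-affinity bits [31:24]. */
static int same_cluster(uint64_t a, uint64_t b)
{
    const uint64_t above_aff0 = 0xFF00FFFF00ULL;
    return (a & above_aff0) == (b & above_aff0);
}
```

On a big.LITTLE system, for instance, `0x80000101` decodes to Aff1 = 1 and Aff0 = 1: the second core of the second cluster.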
Virtualization Multiprocessor ID Register [VMPIDR_EL2]
The VMPIDR_EL2 is a 64-bit RW register that holds the value returned by non-secure EL1 reads of MPIDR_EL1. Software running at EL3 and EL2 can access this register, but software at EL1 and EL0 cannot. VMPIDR holds a value set during boot by either the hypervisor running at EL2 or the firmware running at EL3. For the OS to interpret it correctly, it must hold a legal value for the MPIDR_EL1 register.
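As an illustration of choosing a legal VMPIDR value that hides the host topology, the hypothetical helper below numbers vCPUs sequentially and packs 16 of them per Aff1 group. The grouping of 16 is an assumption motivated by the GICv3 interrupt controller, whose software-generated-interrupt target lists can only address Aff0 values 0 to 15. This is a sketch, not Xen's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MPIDR_RES1 (UINT64_C(1) << 31) /* bit 31 of MPIDR_EL1 reads as one */

/* Hypothetical helper: VMPIDR_EL2 value for vCPU `vcpu_id` that ignores
 * where the vCPU actually runs. Bit 30 (U) and bit 24 (MT) stay zero,
 * describing a multiprocessor system without hardware threads. */
static uint64_t virtual_mpidr(unsigned int vcpu_id)
{
    assert(vcpu_id < 16u * 256u); /* must fit in Aff0 (0-15) and Aff1 */
    uint64_t aff0 = vcpu_id % 16;
    uint64_t aff1 = vcpu_id / 16;
    return MPIDR_RES1 | (aff1 << 8) | aff0;
}
```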
How does Xen virtualize CPUs?
At Xen initialization
Xen begins its boot process running on a single CPU. It does the initial setup of the page tables, memory, and interrupts before moving on to the other CPUs. Xen scans the device tree to find the `cpus` node, which contains the information describing the CPUs [4]. Each CPU has its own subnode describing the features of that particular CPU core. Xen reads the MPIDR_EL1 register to determine the MPIDR of the boot CPU.
Xen assigns each CPU a logical ID number, with the boot CPU taking 0 and the other CPUs numbered sequentially in the order that they are discovered in the device tree. The MPIDR for each CPU is stored alongside its ID. Each CPU is checked to see if it has a MIDR that is the same as the boot CPU, and if it does, Xen initializes and starts the CPU. By default, Xen disables all CPUs that have a different MIDR from the boot CPU. This is due to architectural decisions in Xen that assume all CPUs will be of the same type [5].
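The boot-time bookkeeping described above can be sketched in C. This is a simplified, hypothetical model rather than Xen's `dt_smp_init_cpus()`: the structures are invented for illustration, the MIDR check is folded into the same loop (in practice a CPU's MIDR is known only once that CPU is running), and the `hmp_unsafe` flag stands in for Xen's `hmp-unsafe` command-line option [5].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one CPU: `mpidr` is the `reg` value of its device
 * tree node; `midr` is what the CPU reports in MIDR_EL1 once running. */
struct dt_cpu {
    uint64_t mpidr;
    uint32_t midr;
};

struct logical_cpu {
    uint64_t mpidr;
    bool online;
};

/* Boot CPU gets logical ID 0; the rest follow in device tree order.
 * `boot_mpidr` must have non-affinity bits (such as bit 31) masked off so
 * it compares with DT `reg` values. `out` must hold `n` entries, and the
 * boot CPU is assumed to appear in `dt`. Returns the number of IDs used. */
static size_t assign_logical_ids(const struct dt_cpu *dt, size_t n,
                                 uint64_t boot_mpidr, uint32_t boot_midr,
                                 bool hmp_unsafe, struct logical_cpu *out)
{
    size_t next = 1; /* ID 0 is reserved for the boot CPU */
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        size_t id = (dt[i].mpidr == boot_mpidr) ? 0 : next++;
        assert(id < n);
        out[id].mpidr = dt[i].mpidr;
        /* By default, only CPUs matching the boot CPU's MIDR are started. */
        out[id].online = hmp_unsafe || dt[i].midr == boot_midr;
    }
    return next;
}
```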
At domain initialization
Once Xen is fully booted, it prepares to start the VMs, which Xen generally calls domains. The first domain started is the hardware domain, often called Dom0. This special, privileged domain initializes hardware drivers for any devices that may need to be shared by multiple domains instead of assigned exclusively to one domain. It is also used to start conventional guest domains.
For the hardware domain to boot, it needs a device tree. Xen builds this device tree from the host device tree passed to Xen at boot, with some nodes modified or removed. One of the modified nodes is the `cpus` node. The hardware domain only gets as many CPUs as are specified at boot with the `dom0_max_vcpus` parameter on the Xen command line.
Xen creates a new `cpus` node containing as many child CPU nodes as the command-line parameter specifies. Because Xen only booted CPUs whose MIDR matches that of the boot CPU, it can take shortcuts when assembling the CPU nodes; for example, every CPU node can carry the same `compatible` string. The CPU nodes are numbered starting at 0 and contain the appropriate MPIDR pulled from the table Xen created at boot. Xen adds other critical properties, including `device_type` and `enable-method`, so that the guest OS can use the CPU, but hides any further information from the domain.
```dts
cpus {
    #address-cells = < 0x01 >;
    #size-cells = < 0x00 >;

    cpu@0 {
        compatible = "arm,cortex-a53", "arm,armv8";
        device_type = "cpu";
        enable-method = "psci";
        reg = < 0x00 >;
    };
};
```
Figure 4: Example `cpus` device tree node from a Xen domain
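A node like the one in Figure 4 can be generated with a short C sketch. The function name and signature are hypothetical and this is not Xen's code; it assumes, as described above, that every CPU node shares one `compatible` string, and it takes each node's `reg` value from the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: write a guest `cpus` node with `nr_vcpus` children
 * into `buf`. Every child gets the same `compatible` string; `regs[i]`
 * becomes child i's unit address and `reg` value. Returns the number of
 * characters written, or -1 if `buf` is too small. */
static int write_cpus_node(char *buf, size_t len, unsigned int nr_vcpus,
                           const unsigned int *regs, const char *compatible)
{
    size_t used;
    int n;

    assert(buf != NULL && len > 0 && compatible != NULL);
    assert(nr_vcpus == 0 || regs != NULL);

    n = snprintf(buf, len,
                 "cpus {\n"
                 "    #address-cells = <0x01>;\n"
                 "    #size-cells = <0x00>;\n");
    if (n < 0 || (size_t)n >= len)
        return -1;
    used = (size_t)n;

    for (unsigned int i = 0; i < nr_vcpus; i++) {
        n = snprintf(buf + used, len - used,
                     "\n"
                     "    cpu@%x {\n"
                     "        compatible = %s;\n"
                     "        device_type = \"cpu\";\n"
                     "        enable-method = \"psci\";\n"
                     "        reg = <0x%02x>;\n"
                     "    };\n",
                     regs[i], compatible, regs[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1; /* output truncated */
        used += (size_t)n;
    }

    n = snprintf(buf + used, len - used, "};\n");
    if (n < 0 || (size_t)n >= len - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

Calling it with one vCPU, `regs = {0}`, and the compatible string `"arm,cortex-a53", "arm,armv8"` produces text equivalent to Figure 4.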
Xen goes through the same process when creating guest domains. Xen guests are referred to as DomUs because they are unprivileged domains that cannot control the hypervisor. Conventional DomUs are started by Dom0 after both Xen and Dom0 have booted. Each guest is described by a configuration file that specifies the number of CPUs, the amount of memory, the kernel, and the initramfs. More recently, Xen has added support for “dom0less” guests, which are booted directly by Xen. These domains are described in Xen’s host device tree and booted at the same time as Dom0.
Conclusion
The ARM virtualization extensions give a hypervisor a hardware mechanism for virtualizing the CPU, allowing multiple operating systems to run on the same system. Each physical CPU's ID and affinity registers have EL2-controlled virtual counterparts (VPIDR_EL2 and VMPIDR_EL2), which let a hypervisor control the values a VM sees. Xen builds on this by creating specific CPU device tree nodes to pass into each guest.
Looking forward, work remains to allow Xen to support the HMP systems that many mobile platforms use to meet intermittent performance demands within a fixed energy budget. Adding this support will require reworking assumptions that underlie how Xen virtualizes CPUs.
Footnotes
[1] A whitepaper produced by the Xen Project gives more detail about Xen on ARM.
[2] This changes with ARMv8.4-A, which introduces Secure EL2 and with it support for virtualization in the secure world.
[3] The ARM Architecture Reference Manual gives more information about the MIDR fields.
[4] For more information about this process, see `dt_smp_init_cpus()` in `smpboot.c`.
[5] The Xen command-line parameter `hmp-unsafe` can be used to allow Xen to boot multiple types of CPUs. Compared with a non-virtualized system, this can cause significant performance problems, because Xen uses the same MIDR for all CPUs. Xen also has no mechanism for specifying which specific CPUs a guest will boot with. Further, when Xen builds the guest device tree, it assumes it can assign any of the available CPUs because, to Xen, they are all the same.