Benchmarking Xen Virtualization
Introduction to Xen Virtualization Types (PV, PVHVM, PVH)
Xen is an open-source bare-metal hypervisor that is widely used by commercial and non-commercial platforms to provide virtualization support. However, unlike most other hypervisors, Xen supports multiple ways of virtualizing guests. Below is a brief history of the development of these modes and their relationships with one another:
Naturally, there are performance implications when selecting a virtualization mode. For example, newer modes like PVH are better able to take advantage of newer hardware virtualization features. The following graphic shows a breakdown of the different system components that are emulated by Xen, along with whether this occurs in hardware or software:
So we can see that PV mode does not make use of many hardware-accelerated virtualization features, while PVH can use features like VT-d Posted-Interrupts and VT-x (on Intel platforms) to improve performance. Other modes like PVHVM sit somewhere in the middle, using some hardware acceleration but less-than-ideal software emulation for other components.
Note: PV can use some hardware features like EPT to improve paging performance (called HAP in Xen).
The goal of this research was to investigate how the different virtualization types affected performance under a number of benchmarks. Naturally, the expectation was that PVH would be the most performant, however that mode requires a relatively recent Linux kernel version and does not support some Xen features (like PCI-passthrough) that may be required for some users. It seemed useful, therefore, to investigate the actual difference in performance.
Historic Benchmarks
Of course, Xen is a widely used technology, and there have been previous efforts to run benchmarks like these. However, many of them are quite dated and do not include more recent modes like PVH. We ultimately decided to style most of these tests after the work that Rackspace did in 2013:
https://developer.rackspace.com/blog/welcome-to-performance-cloud-servers-have-some-benchmarks/
Their tests largely compared PV vs. HVM across different CPU configurations (i.e., under- and oversubscribed). The results were in line with what we would expect: (PV)HVM makes more use of hardware acceleration than PV, and so it had generally higher performance across the various benchmarks Rackspace ran.
Test Setup
Our benchmarks are similar to Rackspace’s, but use more recent versions of Xen and the Linux kernel. This gives us the ability to test newer virtualization modes like PVH. The test environment is as follows:
Test platform: Supermicro 5018D-FN4T
Intel Xeon D-1541
16GB RAM
Linux Distribution: Debian Buster (kernel version 4.19) for both dom0 and the guests
Xen version: 4.11 installed from the standard Debian packages
Other Xen configuration:
Cores are pinned and the NULL scheduler is used. This is done for consistency of test results (see the example boot options after this list).
Cores 0 and 1 are reserved for dom0
1GB of RAM is reserved for dom0, no ballooning
Disk for the guest is an SSD passed through as a raw device
All tests are performed 5 times to get an average and standard deviation
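The exact boot parameters are not reproduced here, but on Debian a configuration along the following lines (illustrative values, not necessarily the exact settings used for these tests) gives a pinned two-vCPU dom0, a fixed 1GB dom0 allocation, and the null scheduler:
# /etc/default/grub: illustrative Xen boot options (run update-grub afterwards)
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=1024M,max:1024M dom0_max_vcpus=2 dom0_vcpus_pin sched=null"
# /etc/xen/xl.conf: keep dom0 at its fixed allocation (no ballooning)
autoballoon="off"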
Creating the guests
Guests were created using a standard xen-create-image command like:
xen-create-image --dist=buster --dir=/etc/xen \
--hostname=testvm --dhcp --password=password --noswap \
--memory=14G
This same guest configuration was used for the PV, PVHVM, and PVH tests, but the ‘type’ was manually changed for each test to use the appropriate virtualization mode, as sketched below.
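As a rough sketch (not the exact file generated by xen-create-image; the disk and network details are illustrative), the relevant parts of the guest configuration looked something like this, with only the ‘type’ line changing between runs:
# Illustrative guest config; only 'type' changed between the PV, PVHVM, and PVH runs
name   = "testvm"
type   = "pvh"                      # "pv" or "hvm" for the other modes
memory = 14336
vcpus  = 14
cpus   = "2-15"                     # keep guest vCPUs off the dom0 cores
disk   = [ 'phy:/dev/sdb,xvda,w' ]  # illustrative: SSD passed through as a raw device
vif    = [ 'bridge=xenbr0' ]
# (plus the usual kernel/bootloader and root settings generated by xen-create-image)
With type set to "hvm", the guest’s paravirtualized drivers are what make it effectively PVHVM.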
Benchmarks
We used essentially the same benchmarks as the Rackspace test. Each benchmark was run 5 times. The average of the 5 iterations is reported in the results section.
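As a rough illustration of this procedure (not the exact harness used here; run_benchmark.sh is a hypothetical stand-in for a workload whose metric is elapsed time):
# Run the workload 5 times, appending the elapsed seconds of each run to a file
for i in 1 2 3 4 5; do
    /usr/bin/time -f "%e" -o runtimes.txt -a ./run_benchmark.sh
done
# Report the mean and (population) standard deviation of the recorded times
awk '{ s += $1; ss += $1 * $1 } END { m = s / NR; print "mean:", m, "stddev:", sqrt(ss / NR - m * m) }' runtimes.txt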
To test disk I/O, fio was used as follows:
fio --name fio_test_file --direct=1 --rw=randwrite \
--bs=16k --size=1G --numjobs=16 --time_based \
--runtime=180 --group_reporting
Most general system performance was measured using unixbench, which has several benchmarks that test the performance of syscalls, floating-point operations, pipe throughput, and more. It can be set up by cloning the unixbench repository and running
make && ./Run
from the UnixBench directory.
Linux kernel (4.19) compilation time was also benchmarked to test combined CPU and I/O performance. This compilation was executed with:
make -j$(cat /proc/cpuinfo | grep processor | wc -l)
Test Results
Disk I/O
Unfortunately, these first results don’t provide much insight. It seems very unlikely that native performance is actually lower than the virtualized cases; more likely, we are completely saturating the disk with every virtualization method.
Unixbench
Here we see some other strange results. The first few benchmarks are as expected, with the virtualization methods all having almost identical scores. This is reasonable because ‘Whetstone’ and ‘Dhrystone’ measure numerical performance, which should be unaffected by the virtualization method used. In the later benchmarks, though, PVHVM gets slightly better scores than PVH almost across the board. This is the opposite of what we would expect.
There are a few possible explanations for these results. One candidate is where the emulation happens in PVH versus PVHVM. In the latter mode, the device model (QEMU in this case) executes in Dom0; the former handles emulation in the hypervisor itself. Because we are using the ‘null’ scheduler, Dom0 always has cores available to execute on, so in PVHVM mode the hypervisor has relatively little work to do. PVH mode, on the other hand, requires more time spent in the hypervisor (and therefore less in the guests), which could account for the comparatively worse performance.
Kernel Compilation
The kernel compilation results align with the previous ones: PV performs significantly worse than PVHVM and PVH. This is a more realistic workload, however, and it shows the significant benefit of moving from PV to PVHVM (or PVH).
Conclusions
This has been a quick look at the performance for the various Xen virtualization modes. Of course, these results may not apply to your specific workload. Xen makes it very easy to switch between modes, so it is typically best to test in each mode and use the one that is best for you.
Headline image courtesy of the Xen Project.