docs: add kbase entry showing KVM real time guest config
There are many different settings that are required to configure a KVM guest
for real time, low latency workloads. The documentation included here is based
on guidance developed & tested by the Red Hat KVM real time team.

Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
commit ead908014d
parent e0cf04ffd6
@@ -36,6 +36,9 @@
      <dt><a href="kbase/virtiofs.html">Virtio-FS</a></dt>
      <dd>Share a filesystem between the guest and the host</dd>
      <dt><a href="kbase/kvm-realtime.html">KVM real time</a></dt>
      <dd>Run real time workloads in guests on a KVM hypervisor</dd>
    </dl>
  </div>
214 docs/kbase/kvm-realtime.rst Normal file
@@ -0,0 +1,214 @@
==========================
KVM Real Time Guest Config
==========================

.. contents::

The KVM hypervisor is capable of running real time guest workloads. This page
describes the key pieces of configuration required in the domain XML to achieve
the low latency needs of real time workloads.

For the most part, configuration of the host OS is out of scope of this
documentation. Refer to the operating system vendor's guidance on configuring
the host OS and hardware for real time. Note in particular that the default
kernel used by most Linux distros is not suitable for low latency real time and
must be replaced by a special kernel build.


Host partitioning plan
======================

Running real time workloads requires carefully partitioning up the host OS
resources, such that the KVM / QEMU processes are strictly separated from any
other workload running on the host, both userspace processes and kernel threads.

As such, some subset of host CPUs needs to be reserved exclusively for running
KVM guests. This requires that the host kernel be booted using the ``isolcpus``
kernel command line parameter. This parameter removes a set of CPUs from the
scheduler, such that no kernel threads or userspace processes will ever get
placed on those CPUs automatically. KVM guests are then manually placed onto
these CPUs.

Deciding which host CPUs to reserve for real time requires understanding of the
guest workload needs and balancing with the host OS needs. The trade-off will
also vary based on the physical hardware available.

For the sake of illustration, this guide will assume a physical machine with two
NUMA nodes, each with 2 sockets and 4 cores per socket, giving a total of 16
CPUs on the host. Furthermore, it is assumed that hyperthreading is either not
supported or has been disabled in the BIOS, since it is incompatible with real
time. Each NUMA node is assumed to have 32 GB of RAM, giving 64 GB total for
the host.

It is assumed that 2 CPUs in each NUMA node are reserved for the host OS, with
the remaining 6 CPUs available for KVM real time. With this in mind, the host
kernel should have been booted with ``isolcpus=2-7,10-15`` to reserve those CPUs.
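
As an illustration only, on a host booting via GRUB the parameter might be added
to the kernel command line and verified after a reboot roughly as follows. The
config file location and any pre-existing options are assumptions that vary
between distros:

::

  # Append the isolation to the kernel command line (file location varies)
  # /etc/default/grub:
  #   GRUB_CMDLINE_LINUX="... isolcpus=2-7,10-15"

  # After rebooting, the kernel reports which CPUs were isolated
  $ cat /sys/devices/system/cpu/isolated
  2-7,10-15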

To maximise efficiency of page table lookups for the guest, the host needs to be
configured with most RAM exposed as huge pages, ideally 1 GB sized. 6 GB of RAM
in each NUMA node will be reserved for general host OS usage as normal sized
pages, leaving 26 GB for KVM usage as huge pages.
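
As a sketch of how that reservation might be made on the hypothetical host, the
1 GB pages can be allocated per NUMA node through sysfs. Allocating them early
at boot instead, e.g. with ``default_hugepagesz=1G hugepagesz=1G hugepages=52``
on the kernel command line, is generally more reliable since free memory
fragments over time:

::

  # Reserve 26 x 1 GB huge pages on each of the two NUMA nodes
  # echo 26 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
  # echo 26 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages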

Once huge pages are reserved on the hypothetical machine, the ``virsh
capabilities`` command output is expected to look approximately like:

::

  <topology>
    <cells num='2'>
      <cell id='0'>
        <memory unit='KiB'>33554432</memory>
        <pages unit='KiB' size='4'>1572864</pages>
        <pages unit='KiB' size='2048'>0</pages>
        <pages unit='KiB' size='1048576'>26</pages>
        <distances>
          <sibling id='0' value='10'/>
          <sibling id='1' value='21'/>
        </distances>
        <cpus num='8'>
          <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
          <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
          <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
          <cpu id='3' socket_id='0' core_id='3' siblings='3'/>
          <cpu id='4' socket_id='1' core_id='0' siblings='4'/>
          <cpu id='5' socket_id='1' core_id='1' siblings='5'/>
          <cpu id='6' socket_id='1' core_id='2' siblings='6'/>
          <cpu id='7' socket_id='1' core_id='3' siblings='7'/>
        </cpus>
      </cell>
      <cell id='1'>
        <memory unit='KiB'>33554432</memory>
        <pages unit='KiB' size='4'>1572864</pages>
        <pages unit='KiB' size='2048'>0</pages>
        <pages unit='KiB' size='1048576'>26</pages>
        <distances>
          <sibling id='0' value='21'/>
          <sibling id='1' value='10'/>
        </distances>
        <cpus num='8'>
          <cpu id='8' socket_id='0' core_id='0' siblings='8'/>
          <cpu id='9' socket_id='0' core_id='1' siblings='9'/>
          <cpu id='10' socket_id='0' core_id='2' siblings='10'/>
          <cpu id='11' socket_id='0' core_id='3' siblings='11'/>
          <cpu id='12' socket_id='1' core_id='0' siblings='12'/>
          <cpu id='13' socket_id='1' core_id='1' siblings='13'/>
          <cpu id='14' socket_id='1' core_id='2' siblings='14'/>
          <cpu id='15' socket_id='1' core_id='3' siblings='15'/>
        </cpus>
      </cell>
    </cells>
  </topology>

Be aware that CPU ID numbers are not always allocated sequentially as shown
here. It is not unusual to see IDs interleaved between sockets on the two NUMA
nodes, such that ``0-3,8-11`` are on the first node and ``4-7,12-15`` are on
the second node. Carefully check the ``virsh capabilities`` output to determine
the CPU ID numbers when configuring both ``isolcpus`` and the guest ``cpuset``
values.
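
For example, the physical numbering can be cross-checked with ``lscpu`` from
util-linux, or by filtering the capabilities XML directly. These are purely
illustrative commands, not something libvirt requires:

::

  # Show how CPU IDs map to NUMA nodes, sockets and cores
  $ lscpu --extended=CPU,NODE,SOCKET,CORE

  # Or extract just the CPU placement info from libvirt
  $ virsh capabilities | grep '<cpu id'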

Guest configuration
===================

What follows is an overview of the key parts of the domain XML that need to be
configured to achieve low latency for real time workloads. The following example
will assume a 4 CPU guest, requiring 16 GB of RAM. It is intended to be placed
on the second host NUMA node.

CPU configuration
-----------------

Real time KVM guests intended to run Linux should have a minimum of 2 CPUs.
One vCPU is for running non-real time processes and performing I/O. The other
vCPUs will run real time applications. Some non-Linux OS may not require a
special non-real time CPU to be available, in which case the 2 CPU minimum would
not apply.

Each guest CPU, even the non-real time one, needs to be pinned to a dedicated
host core that is in the ``isolcpus`` reserved set. The QEMU emulator threads
need to be pinned to host CPUs that are not in the ``isolcpus`` reserved set.
The vCPUs need to be given a real time CPU scheduler policy.

When configuring the `guest CPU count <../formatdomain.html#elementsCPUAllocation>`_,
do not include any CPU affinity at this stage:

::

  <vcpu placement='static'>4</vcpu>

The guest CPUs now need to be placed individually. In this case, they will all
be put within the same host socket, such that they can be exposed as core
siblings. This is achieved using the `CPU tuning config <../formatdomain.html#elementsCPUTuning>`_:

::

  <cputune>
    <emulatorpin cpuset="8-9"/>
    <vcpupin vcpu="0" cpuset="12"/>
    <vcpupin vcpu="1" cpuset="13"/>
    <vcpupin vcpu="2" cpuset="14"/>
    <vcpupin vcpu="3" cpuset="15"/>
    <vcpusched vcpus='0-3' scheduler='fifo' priority='1'/>
  </cputune>
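
Once the guest is defined and running, the pinning that actually took effect can
be checked from the host. In this sketch ``rt-guest`` is a hypothetical domain
name:

::

  # Report the host CPUs each vCPU and the emulator threads are bound to
  $ virsh vcpupin rt-guest
  $ virsh emulatorpin rt-guest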

The `guest CPU model <../formatdomain.html#elementsCPU>`_ now needs to be
configured to pass through the host model unchanged, with topology matching the
placement:

::

  <cpu mode='host-passthrough'>
    <topology sockets='1' dies='1' cores='4' threads='1'/>
    <feature policy='require' name='tsc-deadline'/>
  </cpu>
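
Whether the host CPU actually offers the TSC deadline timer can be confirmed
before requiring it in the guest. This is only an illustrative host-side check,
based on the flag name Linux reports on x86:

::

  # A non-zero count means the host CPUs advertise the TSC deadline timer
  $ grep -c tsc_deadline_timer /proc/cpuinfo
  16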

The performance monitoring unit virtualization needs to be disabled
via the `hypervisor features <../formatdomain.html#elementsFeatures>`_:

::

  <features>
    ...
    <pmu state='off'/>
  </features>


Memory configuration
--------------------

The host memory used for guest RAM needs to be allocated from huge pages on the
second NUMA node, and all other memory allocation needs to be locked into RAM
with memory page sharing disabled.
This is achieved by using the `memory backing config <../formatdomain.html#elementsMemoryBacking>`_:

::

  <memoryBacking>
    <hugepages>
      <page size="1" unit="G" nodeset="1"/>
    </hugepages>
    <locked/>
    <nosharepages/>
  </memoryBacking>
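
In addition, the memory allocation can be explicitly bound to the second host
NUMA node with a ``numatune`` element. The following is a sketch for the example
16 GB guest; the ``numatune`` element is an assumption and not part of the
configuration shown above:

::

  <memory unit='GiB'>16</memory>
  <currentMemory unit='GiB'>16</currentMemory>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>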


Device configuration
--------------------

Libvirt adds a few devices by default to maintain historical QEMU configuration
behaviour. It is unlikely these devices are required by real time guests, so it
is wise to disable them. Remove all USB controllers that may exist in the XML
config and replace them with:

::

  <controller type="usb" model="none"/>

Similarly the memory balloon config should be changed to

::

  <memballoon model="none"/>

If the guest had a graphical console enabled at installation time, this can also
be disabled, with remote access instead provided over SSH and a minimal serial
console kept for emergencies.
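
A minimal serial console for such emergencies might be configured along these
lines, with any ``<graphics>`` and ``<video>`` elements removed from the XML.
This is a sketch for illustration only:

::

  <serial type='pty'>
    <target port='0'/>
  </serial>
  <console type='pty'>
    <target type='serial' port='0'/>
  </console>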