mirror of
https://gitlab.com/libvirt/libvirt.git
synced 2025-01-23 04:55:18 +00:00
158 lines
6.2 KiB
ReStructuredText
158 lines
6.2 KiB
ReStructuredText
|
=============================
|
||
|
QEMU command-line passthrough
|
||
|
=============================
|
||
|
|
||
|
.. contents::
|
||
|
|
||
|
Libvirt aims to provide explicit modelling of virtualization features in
|
||
|
the domain XML document schema. QEMU has a very broad range of features
|
||
|
and not all of these can be mapped to elements in the domain XML. Libvirt
|
||
|
would like to reduce the gap to QEMU, however, with finite resources there
|
||
|
will always be cases which aren't covered by the domain XML schema.
|
||
|
|
||
|
|
||
|
XML document additions
|
||
|
======================
|
||
|
|
||
|
To deal with the problem, libvirt introduced support for command-line
|
||
|
passthrough of QEMU arguments. This is achieved by supporting a custom
|
||
|
XML namespace, under which some QEMU driver specific elements are defined.
|
||
|
|
||
|
The canonical place to declare the namespace is on the top level ``<domain>``
|
||
|
element. At the very end of the document, arbitrary command-line arguments
|
||
|
can now be added, using the namespace prefix ``qemu:``
|
||
|
|
||
|
::
|
||
|
|
||
|
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
|
||
|
<name>QEMUGuest1</name>
|
||
|
<uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
|
||
|
...
|
||
|
<qemu:commandline>
|
||
|
<qemu:arg value='-newarg'/>
|
||
|
<qemu:arg value='parameter'/>
|
||
|
<qemu:env name='ID' value='wibble'/>
|
||
|
<qemu:env name='BAR'/>
|
||
|
</qemu:commandline>
|
||
|
</domain>
|
||
|
|
||
|
Note that when an argument takes a value eg ``-newarg parameter``, the argument
|
||
|
and the value must be passed as separate ``<qemu:arg>`` entries.
|
||
|
|
||
|
Instead of declaring the XML namespace on the top level ``<domain>`` it is also
|
||
|
possible to declare it at time of use, which is more convenient for humans
|
||
|
writing the XML documents manually. So the following example is functionally
|
||
|
identical:
|
||
|
|
||
|
::
|
||
|
|
||
|
<domain type='kvm'>
|
||
|
<name>QEMUGuest1</name>
|
||
|
<uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
|
||
|
...
|
||
|
<commandline xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
|
||
|
<arg value='-newarg'/>
|
||
|
<arg value='parameter'/>
|
||
|
<env name='ID' value='wibble'/>
|
||
|
<env name='BAR'/>
|
||
|
</commandline>
|
||
|
</domain>
|
||
|
|
||
|
Note that when querying the XML from libvirt, it will have been translated into
|
||
|
the canonical syntax once more with the namespace on the top level element.
|
||
|
|
||
|
Security confinement / sandboxing
|
||
|
=================================
|
||
|
|
||
|
When libvirt launches a QEMU process it makes use of a number of security
|
||
|
technologies to confine QEMU and thus protect the host from malicious VM
|
||
|
breakouts.
|
||
|
|
||
|
When configuring security protection, however, libvirt generally needs to know
|
||
|
exactly which host resources the VM is permitted to access. It gets this
|
||
|
information from the domain XML document. This only works for elements in the
|
||
|
regular schema, the arguments used with command-line passthrough are completely
|
||
|
opaque to libvirt.
|
||
|
|
||
|
As a result, if command-line passthrough is used to expose a file on the host
|
||
|
to QEMU, the security protections will activate and either kill QEMU or deny it
|
||
|
access.
|
||
|
|
||
|
There are two strategies for dealing with this problem, either figure out what
|
||
|
steps are needed to grant QEMU access to the device, or disable the security
|
||
|
protections. The former is harder, but more secure, while the latter is simple.
|
||
|
|
||
|
Granting access per VM
|
||
|
----------------------
|
||
|
|
||
|
* SELinux - the file on the host needs an SELinux label that will grant access
|
||
|
to QEMU's ``svirt_t`` policy.
|
||
|
|
||
|
- Read-only access - use the ``virt_content_t`` label
|
||
|
- Shared, write access - use the ``svirt_image_t:s0`` label (ie no Multi-
|
||
|
Category Security (MCS) value appended)
|
||
|
- Exclusive, write access - use the ``svirt_image_t:s0:MCS`` label for the VM.
|
||
|
The MCS is auto-generatd at boot time, so this may require re-configuring
|
||
|
the VM to have a fixed MCS label
|
||
|
|
||
|
* Discretionary Access Control (DAC) - the file on the host needs to be
|
||
|
readable/writable to the ``qemu`` user or ``qemu`` group. This can be done
|
||
|
by changing the file ownership to ``qemu``, or relaxing the permissions to
|
||
|
allow world read, or adding file ACLs to allow access to ``qemu``.
|
||
|
|
||
|
* Namespaces - a private ``mount`` namespace is used for QEMU by default
|
||
|
which populates a new ``/dev`` with only the device nodes needed by QEMU.
|
||
|
There is no way to augment the set of device nodes ahead of time.
|
||
|
|
||
|
* Seccomp - libvirt launches QEMU with its built-in seccomp policy enabled with
|
||
|
``obsolete=deny``, ``elevateprivileges=deny``, ``spawn=deny`` and
|
||
|
``resourcecontrol=deny`` settings active. There is no way to change this
|
||
|
policy on a per VM basis.
|
||
|
|
||
|
* Cgroups - a custom cgroup is created per VM and this will either use the
|
||
|
``devices`` controller or an ``BPF`` rule to whitelist a set of device nodes.
|
||
|
There is no way to change this policy on a per VM basis.
|
||
|
|
||
|
Disabling security protection per VM
|
||
|
------------------------------------
|
||
|
|
||
|
Some of the security protections can be disabled per-VM:
|
||
|
|
||
|
* SELinux - in the domain XML the ``<seclabel>`` model can be changed to
|
||
|
``none`` instead of ``selinux``, which will make the VM run unconfined.
|
||
|
|
||
|
* DAC - in the domain XML an ``<seclabel>`` element with the ``dac`` model can
|
||
|
be added, configured with a user / group account of ``root`` to make QEMU run
|
||
|
with full privileges.
|
||
|
|
||
|
* Namespaces - there is no way to disable this per VM.
|
||
|
|
||
|
* Seccomp - there is no way to disable this per VM.
|
||
|
|
||
|
* Cgroups - there is no way to disable this per VM.
|
||
|
|
||
|
Disabling security protection host-wide
|
||
|
---------------------------------------
|
||
|
|
||
|
As a last resort it is possible to disable security protection host wide which
|
||
|
will affect all virtual machines. These settings are all made in
|
||
|
``/etc/libvirt/qemu.conf``
|
||
|
|
||
|
* SELinux - set ``security_default_confied = 0`` to make QEMU run unconfined by
|
||
|
default, while still allowing explicit opt-in to SELinux for VMs.
|
||
|
|
||
|
* DAC - set ``user = root`` and ``group = root`` to make QEMU run as the root
|
||
|
account.
|
||
|
|
||
|
* SELinux, DAC - set ``security_driver = []`` to entirely disable both the
|
||
|
SELinux and DAC security drivers.
|
||
|
|
||
|
* Namespaces - set ``namespaces = []`` to disable use of the ``mount``
|
||
|
namespaces, causing QEMU to see the normal fully popualated ``dev``.
|
||
|
|
||
|
* Seccomp - set ``seccomp_sandbox = 0`` to disable use of the Seccomp sandboxing
|
||
|
in QEMU.
|
||
|
|
||
|
* Cgroups - set ``cgroup_device_acl`` to include the desired device node, or
|
||
|
``cgroup_controllers = [...]`` to exclude the ``devices`` controller.
|