mirror of
https://gitlab.com/libvirt/libvirt.git
synced 2025-03-20 07:59:00 +00:00
Add documentation about the QEMU driver security features
* docs/drvqemu.html.in: Document DAC usage, SELinux integration, Linux capabilities, and Cgroups device ACLs
parent
3ec80d0112
commit
690b4ad329
@ -142,6 +142,21 @@
|
|||||||
<a href="#prereq">Deployment pre-requisites</a>
|
<a href="#prereq">Deployment pre-requisites</a>
|
||||||
</li><li>
|
</li><li>
|
||||||
<a href="#uris">Connections to QEMU driver</a>
|
<a href="#uris">Connections to QEMU driver</a>
|
||||||
|
</li><li>
|
||||||
|
<a href="#security">Driver security architecture</a>
|
||||||
|
<ul><li>
|
||||||
|
<a href="#securitydriver">Driver instances</a>
|
||||||
|
</li><li>
|
||||||
|
<a href="#securitydac">POSIX DAC users/groups</a>
|
||||||
|
</li><li>
|
||||||
|
<a href="#securitycap">Linux DAC capabilities</a>
|
||||||
|
</li><li>
|
||||||
|
<a href="#securityselinux">SELinux MAC basic confinement</a>
|
||||||
|
</li><li>
|
||||||
|
<a href="#securitysvirt">SELinux MAC sVirt confinement</a>
|
||||||
|
</li><li>
|
||||||
|
<a href="#securityacl">Cgroups device ACLs</a>
|
||||||
|
</li></ul>
|
||||||
</li><li>
|
</li><li>
|
||||||
<a href="#imex">Import and export of libvirt domain XML configs</a>
|
<a href="#imex">Import and export of libvirt domain XML configs</a>
|
||||||
<ul><li>
|
<ul><li>
|
||||||
@ -196,6 +211,271 @@
|
|||||||
qemu+tcp://example.com/system (remote access, SASL/Kerberos)
|
qemu+tcp://example.com/system (remote access, SASL/Kerberos)
|
||||||
qemu+ssh://root@example.com/system (remote access, SSH tunnelled)
|
qemu+ssh://root@example.com/system (remote access, SSH tunnelled)
|
||||||
</pre>
|
</pre>
|
||||||
|
<h2>
|
||||||
|
<a name="security" id="security">Driver security architecture</a>
|
||||||
|
</h2>
|
||||||
|
<p>
|
||||||
|
There are multiple layers to security in the QEMU driver, allowing for
|
||||||
|
flexibility in the use of QEMU based virtual machines.
|
||||||
|
</p>
|
||||||
|
<h3>
|
||||||
|
<a name="securitydriver" id="securitydriver">Driver instances</a>
|
||||||
|
</h3>
|
||||||
|
<p>
|
||||||
|
As explained above there are two ways to access the QEMU driver
|
||||||
|
in libvirt. The "qemu:///session" family of URIs connects to a
|
||||||
|
libvirtd instance running as the same user/group ID as the client
|
||||||
|
application. Thus the QEMU instances spawned from this driver will
|
||||||
|
share the same privileges as the client application. The intended
|
||||||
|
use case for this driver is desktop virtualization, with virtual
|
||||||
|
machines storing their disk images in the user's home directory and
|
||||||
|
being managed from the local desktop login session.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The "qemu:///system" family of URIs connects to a
|
||||||
|
libvirtd instance running as the privileged system account 'root'.
|
||||||
|
Thus the QEMU instances spawned from this driver may have much
|
||||||
|
higher privileges than the client application managing them.
|
||||||
|
The intended use case for this driver is server virtualization,
|
||||||
|
where the virtual machines may need to be connected to host
|
||||||
|
resources (block, PCI, USB, network devices) whose access requires
|
||||||
|
elevated privileges.
|
||||||
|
</p>
|
||||||
|
<h3>
|
||||||
|
<a name="securitydac" id="securitydac">POSIX DAC users/groups</a>
|
||||||
|
</h3>
|
||||||
|
<p>
|
||||||
|
In the "session" instance, the POSIX DAC model restricts QEMU virtual
|
||||||
|
machines (and libvirtd in general) to only have access to resources
|
||||||
|
with the same user/group ID as the client application. There is no
|
||||||
|
finer level of configuration possible for the "session" instances.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
In the "system" instance, libvirt releases from 0.7.0 onwards allow
|
||||||
|
control over the user/group that the QEMU virtual machines are run
|
||||||
|
as. A build of libvirt with no configuration parameters set will
|
||||||
|
still run QEMU processes as root:root. It is possible to change
|
||||||
|
this default by using the --with-qemu-user=$USERNAME and
|
||||||
|
--with-qemu-group=$GROUPNAME arguments to 'configure' during
|
||||||
|
build. It is strongly recommended that vendors build with both
|
||||||
|
of these arguments set to 'qemu'. Regardless of this build time
|
||||||
|
default, administrators can set a per-host default setting in
|
||||||
|
the <code>/etc/libvirt/qemu.conf</code> configuration file via
|
||||||
|
the <code>user=$USERNAME</code> and <code>group=$GROUPNAME</code>
|
||||||
|
parameters. When a non-root user or group is configured, the
|
||||||
|
libvirt QEMU driver will change uid/gid to match immediately
|
||||||
|
before executing the QEMU binary for a virtual machine.
|
||||||
|
</p>
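<p>
The identity switch described above can be sketched in a few lines of
Python. This is only an illustration of the general fork / setgid / setuid /
exec pattern, not the code libvirt uses; the helper name <code>run_as</code>
is hypothetical and supplementary groups are deliberately ignored.
</p>

```python
import os
import subprocess
import sys


def run_as(uid, gid, argv):
    """Run argv in a child process after switching to the given gid/uid.

    The group is changed before the user: once the uid is no longer 0,
    the process would not be allowed to change its gid any more.
    """
    def switch_identity():
        # Executed in the child, after fork() and before exec().
        os.setgid(gid)
        os.setuid(uid)

    return subprocess.run(argv, preexec_fn=switch_identity,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Using the caller's own IDs keeps the demo runnable without root.
    probe = [sys.executable, "-c",
             "import os; print(os.getuid(), os.getgid())"]
    print(run_as(os.getuid(), os.getgid(), probe).stdout.strip())
```

<p>
Run as root, the same helper could be given the numeric IDs of a
dedicated account such as 'qemu' (looked up with <code>pwd.getpwnam</code>
and <code>grp.getgrnam</code>) to observe the reduced privileges of the child.
</p>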
|
||||||
|
<p>
|
||||||
|
If QEMU virtual machines from the "system" instance are being
|
||||||
|
run as non-root, there will be greater restrictions on what
|
||||||
|
host resources the QEMU process will be able to access. The
|
||||||
|
libvirtd daemon will attempt to manage permissions on resources
|
||||||
|
to minimise the likelihood of unintentional security denials,
|
||||||
|
but the administrator / application developer must be aware of
|
||||||
|
some of the consequences / restrictions.
|
||||||
|
</p>
|
||||||
|
<ul><li>
|
||||||
|
<p>
|
||||||
|
The directories <code>/var/run/libvirt/qemu/</code>,
|
||||||
|
<code>/var/lib/libvirt/qemu/</code> and
|
||||||
|
<code>/var/cache/libvirt/qemu/</code> must all have their
|
||||||
|
ownership set to match the user / group ID that QEMU
|
||||||
|
guests will be run as. If the vendor has set a non-root
|
||||||
|
user/group for the QEMU driver at build time, the
|
||||||
|
permissions should be set automatically at install time.
|
||||||
|
If a host administrator customizes user/group in
|
||||||
|
<code>/etc/libvirt/qemu.conf</code>, they will need to
|
||||||
|
manually set the ownership on these directories.
|
||||||
|
</p>
|
||||||
|
</li><li>
|
||||||
|
<p>
|
||||||
|
When attaching PCI and USB devices to a QEMU guest,
|
||||||
|
QEMU will need to access files in <code>/dev/bus/usb</code>
|
||||||
|
and <code>/sys/bus/pci/devices</code>. The libvirtd daemon
|
||||||
|
will automatically set the ownership on specific devices
|
||||||
|
that are assigned to a guest at start time. There should
|
||||||
|
not be any need for administrator changes in this respect.
|
||||||
|
</p>
|
||||||
|
</li><li>
|
||||||
|
<p>
|
||||||
|
Any files/devices used as guest disk images must be
|
||||||
|
accessible to the user/group ID that QEMU guests are
|
||||||
|
configured to run as. The libvirtd daemon will automatically
|
||||||
|
set the ownership of the file/device path to the correct
|
||||||
|
user/group ID. Applications / administrators must be aware
|
||||||
|
though that the parent directory permissions may still
|
||||||
|
deny access. The directories containing disk images
|
||||||
|
must either have their ownership set to match the user/group
|
||||||
|
configured for QEMU, or their UNIX file permissions must
|
||||||
|
have the 'execute/search' bit enabled for 'others'.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The simplest option is the latter one, of just enabling
|
||||||
|
the 'execute/search' bit. For any directory to be used
|
||||||
|
for storing disk images, this can be achieved by running
|
||||||
|
the following command on the directory itself, and any
|
||||||
|
parent directories:
|
||||||
|
</p>
|
||||||
|
<pre>
|
||||||
|
chmod o+x /path/to/directory
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
In particular note that if using the "system" instance
|
||||||
|
and attempting to store disk images in a user home
|
||||||
|
directory, the default permissions on $HOME are typically
|
||||||
|
too restrictive to allow access.
|
||||||
|
</p>
|
||||||
|
</li></ul>
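<p>
To find out which directories would need the <code>chmod o+x</code>
treatment, something like the following Python sketch may help. It only
checks the 'others' search bit on a directory and all of its parents; it
does not consider the alternative of matching ownership, and the function
name <code>dirs_missing_other_search</code> is purely illustrative.
</p>

```python
import os
import stat
from pathlib import Path


def dirs_missing_other_search(directory):
    """List the directory and any ancestors lacking the o+x bit."""
    missing = []
    current = Path(directory).resolve()
    # Path.parents yields every ancestor up to the filesystem root.
    for candidate in [current, *current.parents]:
        mode = os.stat(candidate).st_mode
        if not mode & stat.S_IXOTH:
            missing.append(str(candidate))
    return missing


if __name__ == "__main__":
    # The home directory is used here only as a convenient example path.
    for path in dirs_missing_other_search(Path.home()):
        print("needs o+x:", path)
```

<p>
An empty result means that directory permissions alone will not stop a
process running as an unrelated user from reaching files stored there.
</p>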
|
||||||
|
<h3>
|
||||||
|
<a name="securitycap" id="securitycap">Linux DAC capabilities</a>
|
||||||
|
</h3>
|
||||||
|
<p>
|
||||||
|
The libvirt QEMU driver has a build time option allowing it to use
|
||||||
|
the <a href="http://people.redhat.com/sgrubb/libcap-ng/index.html">libcap-ng</a>
|
||||||
|
library to manage process capabilities. If this build option is
|
||||||
|
enabled, then the QEMU driver will use this to ensure that all
|
||||||
|
process capabilities are dropped before executing a QEMU virtual
|
||||||
|
machine. Process capabilities are what give the 'root' account
|
||||||
|
its high power, in particular the CAP_DAC_OVERRIDE capability
|
||||||
|
is what allows a process running as 'root' to access files owned
|
||||||
|
by any user.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
If the QEMU driver is configured to run virtual machines as non-root,
|
||||||
|
then they will already lose all their process capabilities at time
|
||||||
|
of startup. The Linux capability feature is thus aimed primarily at
|
||||||
|
the scenario where the QEMU processes are running as root. In this
|
||||||
|
case, before launching a QEMU virtual machine, libvirtd will use
|
||||||
|
libcap-ng APIs to drop all process capabilities. It is important
|
||||||
|
for administrators to note that this implies the QEMU process will
|
||||||
|
<strong>only</strong> be able to access files owned by root, and
|
||||||
|
not files owned by any other user.
|
||||||
|
</p>
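<p>
One way to check whether a running process still holds
<code>CAP_DAC_OVERRIDE</code> is to inspect the <code>CapEff</code> line of
<code>/proc/$PID/status</code>. The following Python sketch shows the idea;
the function names are illustrative, and the bit number is the one assigned
to <code>CAP_DAC_OVERRIDE</code> in the Linux kernel headers.
</p>

```python
CAP_DAC_OVERRIDE = 1  # bit number from <linux/capability.h>


def effective_capabilities(status_text):
    """Return the CapEff bitmask found in the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask without a 0x prefix.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, capability):
    """Tell whether the given capability bit is set in the mask."""
    return bool(mask & (1 << capability))


if __name__ == "__main__":
    with open("/proc/self/status") as handle:
        mask = effective_capabilities(handle.read())
    print("CAP_DAC_OVERRIDE held:", has_capability(mask, CAP_DAC_OVERRIDE))
```

<p>
For a QEMU process whose capabilities were all dropped, the mask would be
expected to be zero even though the process runs as root.
</p>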
|
||||||
|
<p>
|
||||||
|
Thus, if a vendor / distributor has configured their libvirt package
|
||||||
|
to run as 'qemu' by default, a number of changes will be required
|
||||||
|
before an administrator can change a host to run guests as root.
|
||||||
|
In particular it will be necessary to change ownership on the
|
||||||
|
directories <code>/var/run/libvirt/qemu/</code>,
|
||||||
|
<code>/var/lib/libvirt/qemu/</code> and
|
||||||
|
<code>/var/cache/libvirt/qemu/</code> back to root, in addition
|
||||||
|
to changing the <code>/etc/libvirt/qemu.conf</code> settings.
|
||||||
|
</p>
|
||||||
|
<h3>
|
||||||
|
<a name="securityselinux" id="securityselinux">SELinux MAC basic confinement</a>
|
||||||
|
</h3>
|
||||||
|
<p>
|
||||||
|
The basic SELinux protection for QEMU virtual machines is intended to
|
||||||
|
protect the host OS from a compromised virtual machine process. There
|
||||||
|
is no protection between guests.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
In the basic model, all QEMU virtual machines run under the confined
|
||||||
|
domain <code>root:system_r:qemu_t</code>. It is required that any
|
||||||
|
disk image assigned to a QEMU virtual machine is labelled with
|
||||||
|
<code>system_u:object_r:virt_image_t</code>. In a default deployment,
|
||||||
|
package vendors/distributors will typically ensure that the directory
|
||||||
|
<code>/var/lib/libvirt/images</code> has this label, such that any
|
||||||
|
disk images created in this directory will automatically inherit the
|
||||||
|
correct labelling. If attempting to use disk images in another
|
||||||
|
location, the user/administrator must ensure the directory has been
|
||||||
|
given this requisite label. Likewise physical block devices must
|
||||||
|
be labelled <code>system_u:object_r:virt_image_t</code>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Not all filesystems allow for labelling of individual files. In
|
||||||
|
particular NFS, VFat and NTFS have no support for labelling. In
|
||||||
|
these cases administrators must use the 'context' option when
|
||||||
|
mounting the filesystem to set the default label to
|
||||||
|
<code>system_u:object_r:virt_image_t</code>. In the case of
|
||||||
|
NFS, there is an alternative option, of enabling the <code>virt_use_nfs</code>
|
||||||
|
SELinux boolean.
|
||||||
|
</p>
|
||||||
|
<h3>
|
||||||
|
<a name="securitysvirt" id="securitysvirt">SELinux MAC sVirt confinement</a>
|
||||||
|
</h3>
|
||||||
|
<p>
|
||||||
|
The SELinux sVirt protection for QEMU virtual machines builds on the
|
||||||
|
basic level of protection, to also allow individual guests to be
|
||||||
|
protected from each other.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
In the sVirt model, each QEMU virtual machine runs under its own
|
||||||
|
confined domain, which is based on <code>system_u:system_r:svirt_t:s0</code>
|
||||||
|
with a unique category appended, eg, <code>system_u:system_r:svirt_t:s0:c34,c44</code>.
|
||||||
|
The rules are set up such that a domain can only access files which are
|
||||||
|
labelled with the matching category level, eg
|
||||||
|
<code>system_u:object_r:svirt_image_t:s0:c34,c44</code>. This prevents one
|
||||||
|
QEMU process accessing any file resources that belong to another QEMU
|
||||||
|
process.
|
||||||
|
</p>
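<p>
The category matching can be illustrated with a small Python sketch. It is
a simplified model that looks only at the category part of the labels and
assumes a plain comma separated list; the real access decision is made by
the kernel from the loaded SELinux policy, which also applies type
enforcement rules. The function names are hypothetical.
</p>

```python
def parse_categories(label):
    """Extract the category set from a label such as
    'system_u:system_r:svirt_t:s0:c34,c44' (empty set if none)."""
    fields = label.split(":")
    if len(fields) < 4:
        raise ValueError(f"not a full SELinux label: {label!r}")
    if len(fields) == 4:
        return frozenset()
    # Each category is written as 'c' followed by its number.
    return frozenset(int(cat[1:]) for cat in fields[4].split(","))


def categories_allow(process_label, file_label):
    """True if the process holds every category carried by the file."""
    return parse_categories(file_label) <= parse_categories(process_label)


if __name__ == "__main__":
    guest_a = "system_u:system_r:svirt_t:s0:c34,c44"
    guest_b = "system_u:system_r:svirt_t:s0:c1,c2"
    disk_a = "system_u:object_r:svirt_image_t:s0:c34,c44"
    print(categories_allow(guest_a, disk_a))
    print(categories_allow(guest_b, disk_a))
```

<p>
In this model the first guest may use the disk image because its
categories match, while the second guest, holding a different pair of
categories, may not.
</p>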
|
||||||
|
<p>
|
||||||
|
There are two ways of assigning labels to virtual machines under sVirt.
|
||||||
|
In the default setup, if sVirt is enabled, guests will get an automatically
|
||||||
|
assigned unique label each time they are booted. The libvirtd daemon will
|
||||||
|
also automatically relabel exclusive access disk images to match this
|
||||||
|
label. Disks that are marked as &lt;shared&gt; will get a generic
|
||||||
|
label <code>system_u:object_r:svirt_image_t:s0</code> allowing all guests
|
||||||
|
read/write access to them, while disks marked as &lt;readonly&gt; will
|
||||||
|
get a generic label <code>system_u:object_r:svirt_content_t:s0</code>
|
||||||
|
which allows all guests read-only access.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
With statically assigned labels, the application should include the
|
||||||
|
desired guest and file labels in the XML at time of creating the
|
||||||
|
guest with libvirt. In this scenario the application is responsible
|
||||||
|
for ensuring the disk images and similar resources are suitably
|
||||||
|
labelled to match; libvirtd will not attempt any relabelling.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
If the sVirt security model is active, then the node capabilities
|
||||||
|
XML will include its details. If a virtual machine is currently
|
||||||
|
protected by the security model, then the guest XML will include
|
||||||
|
its assigned labels. If enabled at compile time, the sVirt security
|
||||||
|
model will always be activated if SELinux is available on the host
|
||||||
|
OS. To disable sVirt, and revert to the basic level of SELinux
|
||||||
|
protection (host protection only), the <code>/etc/libvirt/qemu.conf</code>
|
||||||
|
file can be used to change the setting to <code>security_driver="none"</code>.
|
||||||
|
</p>
|
||||||
|
<h3>
|
||||||
|
<a name="securityacl" id="securityacl">Cgroups device ACLs</a>
|
||||||
|
</h3>
|
||||||
|
<p>
|
||||||
|
Recent Linux kernels have a capability known as "cgroups" which is used
|
||||||
|
for resource management. It is implemented via a number of "controllers",
|
||||||
|
each controller covering a specific task/functional area. One of the
|
||||||
|
available controllers is the "devices" controller, which is able to
|
||||||
|
set up whitelists of block/character devices that a cgroup should be
|
||||||
|
allowed to access. If the "devices" controller is mounted on a host,
|
||||||
|
then libvirt will automatically create a dedicated cgroup for each
|
||||||
|
QEMU virtual machine and set up the device whitelist so that the QEMU
|
||||||
|
process can only access shared devices, and explicitly assigned disk images
|
||||||
|
backed by block devices.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The list of shared devices a guest is allowed access to is:
|
||||||
|
</p>
|
||||||
|
<pre>
|
||||||
|
/dev/null, /dev/full, /dev/zero,
|
||||||
|
/dev/random, /dev/urandom,
|
||||||
|
/dev/ptmx, /dev/kvm, /dev/kqemu,
|
||||||
|
/dev/rtc, /dev/hpet, /dev/net/tun
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
In the event of unanticipated needs arising, this can be customized
|
||||||
|
via the <code>/etc/libvirt/qemu.conf</code> file.
|
||||||
|
To mount the cgroups device controller, the following commands
|
||||||
|
should be run as root, prior to starting libvirtd:
|
||||||
|
</p>
|
||||||
|
<pre>
|
||||||
|
mkdir /dev/cgroup
|
||||||
|
mount -t cgroup none /dev/cgroup -o devices
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
libvirt will then place each virtual machine in a cgroup at
|
||||||
|
<code>/dev/cgroup/libvirt/qemu/$VMNAME/</code>
|
||||||
|
</p>
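<p>
The "devices" controller of a cgroup mounted this way takes whitelist
entries of the form <code>type major:minor access</code>, written to the
<code>devices.allow</code> file of the cgroup. The Python sketch below
derives such an entry from a device node; the helper name
<code>device_acl_entry</code> and the default access string are
assumptions made for illustration, and nothing is written to any cgroup.
</p>

```python
import os
import stat


def device_acl_entry(path, access="rwm"):
    """Build a 'type major:minor access' entry for a device node."""
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode):
        kind = "c"  # character device
    elif stat.S_ISBLK(info.st_mode):
        kind = "b"  # block device
    else:
        raise ValueError(f"{path} is not a device node")
    major = os.major(info.st_rdev)
    minor = os.minor(info.st_rdev)
    return f"{kind} {major}:{minor} {access}"


if __name__ == "__main__":
    # A few of the shared devices listed above; missing ones are skipped.
    for node in ("/dev/null", "/dev/zero", "/dev/urandom", "/dev/kvm"):
        if os.path.exists(node):
            print(node, "->", device_acl_entry(node))
```

<p>
Comparing this output with the contents of <code>devices.list</code> in a
guest's cgroup directory is one way to see which device nodes that guest
may open.
</p>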
|
||||||
<h2>
|
<h2>
|
||||||
<a name="imex" id="imex">Import and export of libvirt domain XML configs</a>
|
<a name="imex" id="imex">Import and export of libvirt domain XML configs</a>
|
||||||
</h2>
|
</h2>
|
||||||
|
@ -54,6 +54,292 @@
|
|||||||
qemu+ssh://root@example.com/system (remote access, SSH tunnelled)
|
qemu+ssh://root@example.com/system (remote access, SSH tunnelled)
|
||||||
</pre>
|
</pre>
|
||||||
|
|
||||||
|
<h2><a name="security">Driver security architecture</a></h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There are multiple layers to security in the QEMU driver, allowing for
|
||||||
|
flexibility in the use of QEMU based virtual machines.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3><a name="securitydriver">Driver instances</a></h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
As explained above there are two ways to access the QEMU driver
|
||||||
|
in libvirt. The "qemu:///session" family of URIs connects to a
|
||||||
|
libvirtd instance running as the same user/group ID as the client
|
||||||
|
application. Thus the QEMU instances spawned from this driver will
|
||||||
|
share the same privileges as the client application. The intended
|
||||||
|
use case for this driver is desktop virtualization, with virtual
|
||||||
|
machines storing their disk images in the user's home directory and
|
||||||
|
being managed from the local desktop login session.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The "qemu:///system" family of URIs connects to a
|
||||||
|
libvirtd instance running as the privileged system account 'root'.
|
||||||
|
Thus the QEMU instances spawned from this driver may have much
|
||||||
|
higher privileges than the client application managing them.
|
||||||
|
The intended use case for this driver is server virtualization,
|
||||||
|
where the virtual machines may need to be connected to host
|
||||||
|
resources (block, PCI, USB, network devices) whose access requires
|
||||||
|
elevated privileges.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3><a name="securitydac">POSIX DAC users/groups</a></h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
In the "session" instance, the POSIX DAC model restricts QEMU virtual
|
||||||
|
machines (and libvirtd in general) to only have access to resources
|
||||||
|
with the same user/group ID as the client application. There is no
|
||||||
|
finer level of configuration possible for the "session" instances.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
In the "system" instance, libvirt releases from 0.7.0 onwards allow
|
||||||
|
control over the user/group that the QEMU virtual machines are run
|
||||||
|
as. A build of libvirt with no configuration parameters set will
|
||||||
|
still run QEMU processes as root:root. It is possible to change
|
||||||
|
this default by using the --with-qemu-user=$USERNAME and
|
||||||
|
--with-qemu-group=$GROUPNAME arguments to 'configure' during
|
||||||
|
build. It is strongly recommended that vendors build with both
|
||||||
|
of these arguments set to 'qemu'. Regardless of this build time
|
||||||
|
default, administrators can set a per-host default setting in
|
||||||
|
the <code>/etc/libvirt/qemu.conf</code> configuration file via
|
||||||
|
the <code>user=$USERNAME</code> and <code>group=$GROUPNAME</code>
|
||||||
|
parameters. When a non-root user or group is configured, the
|
||||||
|
libvirt QEMU driver will change uid/gid to match immediately
|
||||||
|
before executing the QEMU binary for a virtual machine.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
If QEMU virtual machines from the "system" instance are being
|
||||||
|
run as non-root, there will be greater restrictions on what
|
||||||
|
host resources the QEMU process will be able to access. The
|
||||||
|
libvirtd daemon will attempt to manage permissions on resources
|
||||||
|
to minimise the likelihood of unintentional security denials,
|
||||||
|
but the administrator / application developer must be aware of
|
||||||
|
some of the consequences / restrictions.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>
|
||||||
|
<p>
|
||||||
|
The directories <code>/var/run/libvirt/qemu/</code>,
|
||||||
|
<code>/var/lib/libvirt/qemu/</code> and
|
||||||
|
<code>/var/cache/libvirt/qemu/</code> must all have their
|
||||||
|
ownership set to match the user / group ID that QEMU
|
||||||
|
guests will be run as. If the vendor has set a non-root
|
||||||
|
user/group for the QEMU driver at build time, the
|
||||||
|
permissions should be set automatically at install time.
|
||||||
|
If a host administrator customizes user/group in
|
||||||
|
<code>/etc/libvirt/qemu.conf</code>, they will need to
|
||||||
|
manually set the ownership on these directories.
|
||||||
|
</p>
|
||||||
|
</li>
|
||||||
|
<li>
|
||||||
|
<p>
|
||||||
|
When attaching PCI and USB devices to a QEMU guest,
|
||||||
|
QEMU will need to access files in <code>/dev/bus/usb</code>
|
||||||
|
and <code>/sys/bus/pci/devices</code>. The libvirtd daemon
|
||||||
|
will automatically set the ownership on specific devices
|
||||||
|
that are assigned to a guest at start time. There should
|
||||||
|
not be any need for administrator changes in this respect.
|
||||||
|
</p>
|
||||||
|
</li>
|
||||||
|
<li>
|
||||||
|
<p>
|
||||||
|
Any files/devices used as guest disk images must be
|
||||||
|
accessible to the user/group ID that QEMU guests are
|
||||||
|
configured to run as. The libvirtd daemon will automatically
|
||||||
|
set the ownership of the file/device path to the correct
|
||||||
|
user/group ID. Applications / administrators must be aware
|
||||||
|
though that the parent directory permissions may still
|
||||||
|
deny access. The directories containing disk images
|
||||||
|
must either have their ownership set to match the user/group
|
||||||
|
configured for QEMU, or their UNIX file permissions must
|
||||||
|
have the 'execute/search' bit enabled for 'others'.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
The simplest option is the latter one, of just enabling
|
||||||
|
the 'execute/search' bit. For any directory to be used
|
||||||
|
for storing disk images, this can be achieved by running
|
||||||
|
the following command on the directory itself, and any
|
||||||
|
parent directories:
|
||||||
|
</p>
|
||||||
|
<pre>
|
||||||
|
chmod o+x /path/to/directory
|
||||||
|
</pre>
|
||||||
|
<p>
|
||||||
|
In particular note that if using the "system" instance
|
||||||
|
and attempting to store disk images in a user home
|
||||||
|
directory, the default permissions on $HOME are typically
|
||||||
|
too restrictive to allow access.
|
||||||
|
</p>
|
||||||
|
</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<h3><a name="securitycap">Linux DAC capabilities</a></h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The libvirt QEMU driver has a build time option allowing it to use
|
||||||
|
the <a href="http://people.redhat.com/sgrubb/libcap-ng/index.html">libcap-ng</a>
|
||||||
|
library to manage process capabilities. If this build option is
|
||||||
|
enabled, then the QEMU driver will use this to ensure that all
|
||||||
|
process capabilities are dropped before executing a QEMU virtual
|
||||||
|
machine. Process capabilities are what give the 'root' account
|
||||||
|
its high power, in particular the CAP_DAC_OVERRIDE capability
|
||||||
|
is what allows a process running as 'root' to access files owned
|
||||||
|
by any user.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
If the QEMU driver is configured to run virtual machines as non-root,
|
||||||
|
then they will already lose all their process capabilities at time
|
||||||
|
of startup. The Linux capability feature is thus aimed primarily at
|
||||||
|
the scenario where the QEMU processes are running as root. In this
|
||||||
|
case, before launching a QEMU virtual machine, libvirtd will use
|
||||||
|
libcap-ng APIs to drop all process capabilities. It is important
|
||||||
|
for administrators to note that this implies the QEMU process will
|
||||||
|
<strong>only</strong> be able to access files owned by root, and
|
||||||
|
not files owned by any other user.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Thus, if a vendor / distributor has configured their libvirt package
|
||||||
|
to run as 'qemu' by default, a number of changes will be required
|
||||||
|
before an administrator can change a host to run guests as root.
|
||||||
|
In particular it will be necessary to change ownership on the
|
||||||
|
directories <code>/var/run/libvirt/qemu/</code>,
|
||||||
|
<code>/var/lib/libvirt/qemu/</code> and
|
||||||
|
<code>/var/cache/libvirt/qemu/</code> back to root, in addition
|
||||||
|
to changing the <code>/etc/libvirt/qemu.conf</code> settings.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3><a name="securityselinux">SELinux MAC basic confinement</a></h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The basic SELinux protection for QEMU virtual machines is intended to
|
||||||
|
protect the host OS from a compromised virtual machine process. There
|
||||||
|
is no protection between guests.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
In the basic model, all QEMU virtual machines run under the confined
|
||||||
|
domain <code>root:system_r:qemu_t</code>. It is required that any
|
||||||
|
disk image assigned to a QEMU virtual machine is labelled with
|
||||||
|
<code>system_u:object_r:virt_image_t</code>. In a default deployment,
|
||||||
|
package vendors/distributors will typically ensure that the directory
|
||||||
|
<code>/var/lib/libvirt/images</code> has this label, such that any
|
||||||
|
disk images created in this directory will automatically inherit the
|
||||||
|
correct labelling. If attempting to use disk images in another
|
||||||
|
location, the user/administrator must ensure the directory has been
|
||||||
|
given this requisite label. Likewise physical block devices must
|
||||||
|
be labelled <code>system_u:object_r:virt_image_t</code>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Not all filesystems allow for labelling of individual files. In
|
||||||
|
particular NFS, VFat and NTFS have no support for labelling. In
|
||||||
|
these cases administrators must use the 'context' option when
|
||||||
|
mounting the filesystem to set the default label to
|
||||||
|
<code>system_u:object_r:virt_image_t</code>. In the case of
|
||||||
|
NFS, there is an alternative option, of enabling the <code>virt_use_nfs</code>
|
||||||
|
SELinux boolean.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3><a name="securitysvirt">SELinux MAC sVirt confinement</a></h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The SELinux sVirt protection for QEMU virtual machines builds on the
|
||||||
|
basic level of protection, to also allow individual guests to be
|
||||||
|
protected from each other.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
In the sVirt model, each QEMU virtual machine runs under its own
|
||||||
|
confined domain, which is based on <code>system_u:system_r:svirt_t:s0</code>
|
||||||
|
with a unique category appended, eg, <code>system_u:system_r:svirt_t:s0:c34,c44</code>.
|
||||||
|
The rules are set up such that a domain can only access files which are
|
||||||
|
labelled with the matching category level, eg
|
||||||
|
<code>system_u:object_r:svirt_image_t:s0:c34,c44</code>. This prevents one
|
||||||
|
QEMU process accessing any file resources that belong to another QEMU
|
||||||
|
process.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There are two ways of assigning labels to virtual machines under sVirt.
|
||||||
|
In the default setup, if sVirt is enabled, guests will get an automatically
|
||||||
|
assigned unique label each time they are booted. The libvirtd daemon will
|
||||||
|
also automatically relabel exclusive access disk images to match this
|
||||||
|
label. Disks that are marked as &lt;shared&gt; will get a generic
|
||||||
|
label <code>system_u:object_r:svirt_image_t:s0</code> allowing all guests
|
||||||
|
read/write access to them, while disks marked as &lt;readonly&gt; will
|
||||||
|
get a generic label <code>system_u:object_r:svirt_content_t:s0</code>
|
||||||
|
which allows all guests read-only access.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
With statically assigned labels, the application should include the
|
||||||
|
desired guest and file labels in the XML at time of creating the
|
||||||
|
guest with libvirt. In this scenario the application is responsible
|
||||||
|
for ensuring the disk images and similar resources are suitably
|
||||||
|
labelled to match; libvirtd will not attempt any relabelling.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
If the sVirt security model is active, then the node capabilities
|
||||||
|
XML will include its details. If a virtual machine is currently
|
||||||
|
protected by the security model, then the guest XML will include
|
||||||
|
its assigned labels. If enabled at compile time, the sVirt security
|
||||||
|
model will always be activated if SELinux is available on the host
|
||||||
|
OS. To disable sVirt, and revert to the basic level of SELinux
|
||||||
|
protection (host protection only), the <code>/etc/libvirt/qemu.conf</code>
|
||||||
|
file can be used to change the setting to <code>security_driver="none"</code>.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
|
||||||
|
<h3><a name="securityacl">Cgroups device ACLs</a></h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Recent Linux kernels have a capability known as "cgroups" which is used
|
||||||
|
for resource management. It is implemented via a number of "controllers",
|
||||||
|
each controller covering a specific task/functional area. One of the
|
||||||
|
available controllers is the "devices" controller, which is able to
|
||||||
|
set up whitelists of block/character devices that a cgroup should be
|
||||||
|
allowed to access. If the "devices" controller is mounted on a host,
|
||||||
|
then libvirt will automatically create a dedicated cgroup for each
|
||||||
|
QEMU virtual machine and set up the device whitelist so that the QEMU
|
||||||
|
process can only access shared devices, and explicitly assigned disk images
|
||||||
|
backed by block devices.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The list of shared devices a guest is allowed access to is:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre>
|
||||||
|
/dev/null, /dev/full, /dev/zero,
|
||||||
|
/dev/random, /dev/urandom,
|
||||||
|
/dev/ptmx, /dev/kvm, /dev/kqemu,
|
||||||
|
/dev/rtc, /dev/hpet, /dev/net/tun
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
In the event of unanticipated needs arising, this can be customized
|
||||||
|
via the <code>/etc/libvirt/qemu.conf</code> file.
|
||||||
|
To mount the cgroups device controller, the following commands
|
||||||
|
should be run as root, prior to starting libvirtd:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<pre>
|
||||||
|
mkdir /dev/cgroup
|
||||||
|
mount -t cgroup none /dev/cgroup -o devices
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
libvirt will then place each virtual machine in a cgroup at
|
||||||
|
<code>/dev/cgroup/libvirt/qemu/$VMNAME/</code>
|
||||||
|
</p>
|
||||||
|
|
||||||
<h2><a name="imex">Import and export of libvirt domain XML configs</a></h2>
|
<h2><a name="imex">Import and export of libvirt domain XML configs</a></h2>
|
||||||
|
|
||||||
<p>The QEMU driver currently supports a single native
|
<p>The QEMU driver currently supports a single native
|
||||||
|