https://bugzilla.redhat.com/show_bug.cgi?id=1147057
The code for relabelling the TAP FD is there due to a race. When
libvirt creates a /dev/tapN device it's labeled as
'system_u:object_r:device_t:s0' by default. Later, when
udev/systemd reacts to this device, it's relabelled to the
expected label 'system_u:object_r:tun_tap_device_t:s0'. Hence, we
have a code that relabels the device, to cut the race down. For
more info see ae368ebfcc4.
But the problem is, the relabel function is called on all TUN/TAP
devices. Yes, on /dev/net/tun too. This is however a special kind
of device - other processes uses it too. We shouldn't touch it's
label then.
Ideally, there would an API in SELinux that would label just the
passed FD and not the underlying path. That way, we wouldn't need
to care as we would be not labeling /dev/net/tun but the FD
passed to the domain. Unfortunately, there's no such API so we
have to workaround until then.
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The check for ISCSI devices was missing a check of subsys type, which
meant we could skip labelling of other host devices as well. This fixes
USB hotplug on F21
https://bugzilla.redhat.com/show_bug.cgi?id=1145968
Add a new parameter to virStorageFileGetMetadata that will break the
backing chain detection process and report useful error message rather
than having to use virStorageFileChainGetBroken.
This patch just introduces the option, usage will be provided
separately.
https://bugzilla.redhat.com/show_bug.cgi?id=1141879
A long time ago I've implemented support for so called multiqueue
net. The idea was to let guest network traffic be processed by
multiple host CPUs and thus increasing performance. However, this
behavior is enabled by QEMU via special ioctl() iterated over the
all tap FDs passed in by libvirt. Unfortunately, SELinux comes in
and disallows the ioctl() call because the /dev/net/tun has label
system_u:object_r:tun_tap_device_t:s0 and 'attach_queue' ioctl()
is not allowed on tun_tap_device_t type. So after discussion with
a SELinux developer we've decided that the FDs passed to the QEMU
should be labelled with svirt_t type and SELinux policy will
allow the ioctl(). Therefore I've made a patch
(cf976d9dcf4e592261b14f03572) that does exactly this. The patch
was fixed then by a4431931393aeb1ac5893f121151fa3df4fde612 and
b635b7a1af0e64754016d758376f382470bc11e7. However, things are not
that easy - even though the API to label FD is called
(fsetfilecon_raw) the underlying file is labelled too! So
effectively we are mangling /dev/net/tun label. Yes, that broke
dozen of other application from openvpn, or boxes, to qemu
running other domains.
The best solution would be if SELinux provides a way to label an
FD only, which could be then labeled when passed to the qemu.
However that's a long path to go and we should fix this
regression AQAP. So I went to talk to the SELinux developer again
and we agreed on temporary solution that:
1) All the three patches are reverted
2) SELinux temporarily allows 'attach_queue' on the
tun_tap_device_t
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I've noticed two problem with the automatically created NVRAM varstore
file. The first, even though I run qemu as root:root for some reason I
get Permission denied when trying to open the _VARS.fd file. The
problem is, the upper directory misses execute permissions, which in
combination with us dropping some capabilities result in EPERM.
The next thing is, that if I switch SELinux to enforcing mode, I get
another EPERM because the vars file is not labeled correctly. It is
passed to qemu as disk and hence should be labelled as disk. QEMU may
write to it eventually, so this is different to kernel or initrd.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
After a4431931 the TAP FDs ale labeled with image label instead
of the process label. On the other hand, the commit was
incomplete as a few lines above, there's still old check for the
process label presence while it should be check for the image
label instead.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
QEMU now supports UEFI with the following command line:
-drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
where the first line reflects <loader> and the second one <nvram>.
Moreover, these two lines obsolete the -bios argument.
Note that UEFI is unusable without ACPI. This is handled properly now.
Among with this extension, the variable file is expected to be
writable and hence we need security drivers to label it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Up to now, users can configure BIOS via the <loader/> element. With
the upcoming implementation of UEFI this is not enough as BIOS and
UEFI are conceptually different. For instance, while BIOS is ROM, UEFI
is programmable flash (although all writes to code section are
denied). Therefore we need new attribute @type which will
differentiate the two. Then, new attribute @readonly is introduced to
reflect the fact that some images are RO.
Moreover, the OVMF (which is going to be used mostly), works in two
modes:
1) Code and UEFI variable store is mixed in one file.
2) Code and UEFI variable store is separated in two files
The latter has advantage of updating the UEFI code without losing the
configuration. However, in order to represent the latter case we need
yet another XML element: <nvram/>. Currently, it has no additional
attributes, it's just a bare element containing path to the variable
store file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For security type='none' libvirt according to the docs should not
generate seclabel be it for selinux or any model. So, skip the
reservation of labels when type is none.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
The cleanup in commit cf976d9d used secdef->label to label the tap
FDs, but that is not possible since it's process-only label (svirt_t)
and not a object label (e.g. svirt_image_t). Starting a domain failed
with EPERM, but simply using secdef->imagelabel instead of
secdef->label fixes it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1095636
When starting up the domain the domain's NICs are allocated. As of
1f24f682 (v1.0.6) we are able to use multiqueue feature on virtio
NICs. It breaks network processing into multiple queues which can be
processed in parallel by different host CPUs. The queues are, however,
created by opening /dev/net/tun several times. Unfortunately, only the
first FD in the row is labelled so when turning the multiqueue feature
on in the guest, qemu will get AVC denial. Make sure we label all the
FDs needed.
Moreover, the default label of /dev/net/tun doesn't allow
attaching a queue:
type=AVC msg=audit(1399622478.790:893): avc: denied { attach_queue }
for pid=7585 comm="qemu-kvm"
scontext=system_u:system_r:svirt_t:s0:c638,c877
tcontext=system_u:system_r:virtd_t:s0-s0:c0.c1023
tclass=tun_socket
And as suggested by SELinux maintainers, the tun FD should be labeled
as svirt_t. Therefore, we don't need to adjust any range (as done
previously by Guannan in ae368ebf) rather set the seclabel of the
domain directly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Create the structures and API's to hold and manage the iSCSI host device.
This extends the 'scsi_host' definitions added in commit id '5c811dce'.
A future patch will add the XML parsing, but that code requires some
infrastructure to be in place first in order to handle the differences
between a 'scsi_host' and an 'iSCSI host' device.
Split virDomainHostdevSubsysSCSI further. In preparation for having
either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI
to contain just a virDomainHostdevSubsysSCSIHost to describe the
'scsi_host' host device
To integrate the security driver with the storage driver we need to
pass a callback for a function that will chown storage volumes.
Introduce and document the callback prototype.
When restoring security labels in the dac driver the code would resolve
the file path and use the resolved one to be chown-ed. The setting code
doesn't do that. Remove the unnecessary code.
Rework the apparmor lxc profile abstraction to mimic ubuntu's container-default.
This profile allows quite a lot, but strives to restrict access to
dangerous resources.
Removing the explicit authorizations to bash, systemd and cron files,
forces them to keep the lxc profile for all applications inside the
container. PUx permissions where leading to running systemd (and others
tasks) unconfined.
Put the generic files, network and capabilities restrictions directly
in the TEMPLATE.lxc: this way, users can restrict them on a per
container basis.
Don't fail when there is nothing to do, as a tweak to the previous
patch regarding output of libvirt-UUID.files for LXC apparmor profiles
Signed-off-by: Eric Blake <eblake@redhat.com>
This negation in names of boolean variables is driving me insane. The
code is much more readable if we drop the 'no-' prefix. Well, at least
for me.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add security driver functions to label separate storage images using the
virStorageSource definition. This will help to avoid the need to do ugly
changes to the disk struct and use the source directly.
In the future we might need to track state of individual images. Move
the readonly and shared flags to the virStorageSource struct so that we
can keep them in a per-image basis.
The function headers contain type on the same line as the name. When
combined with usage of ATTRIBUTE_UNUSED, the function headers were very
long. Shorten them by breaking the line after the type.
virSecurityManagerSetDiskLabel and virSecurityManagerRestoreDiskLabel
don't have complementary semantics. Document the semantics to avoid
possible problems.
I'm going to add functions that will deal with individual image files
rather than whole disks. Rename the security function to make room for
the new one.
I'm going to add functions that will deal with individual image files
rather than whole disks. Rename the security function to make room for
the new one.
The image labels are stored in the virStorageSource struct. Convert the
virDomainDiskDefGetSecurityLabelDef helper not to use the full disk def
and move it appropriately.
A network disk might actually be backed by local storage. Also the path
iterator actually handles networked disks well now so remove the code
that skips the labelling in dac and selinux security driver.
As part of the work on backing chains, I'm finding that it would
be easier to directly manipulate chains of pointers (adding a
snapshot merely adjusts pointers to form the correct list) rather
than copy data from one struct to another. This patch converts
domain disk source to be a pointer.
In this patch, the pointer is ALWAYS allocated (thanks in part to
the previous patch forwarding all disk def allocation through a
common point), and all other changse are just mechanical fallout of
the new type; there should be no functional change. It is possible
that we may want to leave the pointer NULL for a cdrom with no
medium in a later patch, but as that requires a closer audit of the
source to ensure we don't fault on a null dereference, I didn't do
it here.
* src/conf/domain_conf.h (_virDomainDiskDef): Change type of src.
* src/conf/domain_conf.c: Adjust all clients.
* src/security/security_selinux.c: Likewise.
* src/qemu/qemu_domain.c: Likewise.
* src/qemu/qemu_command.c: Likewise.
* src/qemu/qemu_conf.c: Likewise.
* src/qemu/qemu_process.c: Likewise.
* src/qemu/qemu_migration.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/lxc/lxc_driver.c: Likewise.
* src/lxc/lxc_controller.c: Likewise.
* tests/securityselinuxlabeltest.c: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>