Commit 379c0ce4bf introduced a call to umount(/dev) performed
inside the namespace that we run QEMU in.
As a result of this, on machines using AppArmor, VM startup now
fails with
internal error: Process exited prior to exec: libvirt:
QEMU Driver error: failed to umount devfs on /dev: Permission denied
The corresponding denial is
AVC apparmor="DENIED" operation="umount" profile="libvirtd"
name="/dev/" pid=70036 comm="rpc-libvirtd"
Extend the AppArmor configuration for virtqemud and libvirtd so
that this operation is allowed.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
For SGX type of memory, QEMU needs to open and talk to
/dev/sgx_vepc and /dev/sgx_provision files. But we do not set nor
restore SELinux labels on these files when starting a guest.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This fits better with the element containing the value (<driver>), and
allows us to use virDomainNetBackend* for things in the <backend>
element.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Unfortunately unlike with DAC we can't simply ignore labelling for the
FD and it also influences the on-disk state.
Thus we need to relabel the FD and we also store the existing label in
cases when the user will request best-effort label replacement.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
DAC security label is irrelevant once you have the FD. Disable all
labelling for such images.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The 'fdgroup' will allow users to specify a passed FD (via the
'virDomainFDAssociate()' API) to be used instead of opening a path.
This is useful in cases when e.g. the file is not accessible from inside
a container.
Since this uses the same disk type as when we open files via names this
patch also introduces a hypervisor feature which the hypervisor asserts
that code paths are ready for this possibility.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Introduce a new backend type 'external' for connecting to a swtpm daemon
not managed by libvirtd.
Mostly in one commit, thanks to -Wswitch and the way we generate
capabilities.
https://bugzilla.redhat.com/show_bug.cgi?id=2063723
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The virSecurityDomainSetTPMLabels() and
virSecurityDomainRestoreTPMLabels() APIs set/restore label on two
files/directories:
1) the TPM state (tpm->data.emulator.storagepath), and
2) the TPM log file (tpm->data.emulator.logfile).
Soon there will be a need to set the label on the log file but
not on the state. Therefore, extend these APIs for a boolean flag
that when set does both, but when unset does only 2).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
As of [1]. libselinux changed the type of context_str() - it now
returns a const string. Follow this change in our code base.
1: dd98fa3227
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use same style in the 'struct option' as:
struct option opt[] = {
{ a, b },
{ a, b },
...
{ a, b },
};
Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
For the handling of usb we already allow plenty of read access,
but so far /sys/bus/usb/devices only needed read access to the directory
to enumerate the symlinks in there that point to the actual entries via
relative links to ../../../devices/.
But in more recent systemd with updated libraries a program might do
getattr calls on those symlinks. And while symlinks in apparmor usually
do not matter, as it is the effective target of an access that has to be
allowed, here the getattr calls are on the links themselves.
On USB hostdev usage that causes a set of denials like:
apparmor="DENIED" operation="getattr" class="file"
name="/sys/bus/usb/devices/usb1" comm="qemu-system-x86"
requested_mask="r" denied_mask="r" ...
It is safe to read the links, therefore add a rule to allow it to
the block of rules that covers the usb related access.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Michal Privoznik <mprivozn at redhat.com>
As advertised in previous commits, QEMU needs to access
/dev/sgx_vepc and /dev/sgx_provision files when SGX memory
backend is configured. And if it weren't for QEMU's namespaces,
we wouldn't dare to relabel them, because they are system wide
files. But if namespaces are used, then we can set label on
domain's private copies, just like we do for /dev/sev.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Haibin Huang <haibin.huang@intel.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use the helper with more features to validate the root XML element name
instead of open-coding it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Riscv64 usually uses u-boot as external -kernel and a loader from
the open implementation of RISC-V SBI. The paths for those binaries
as packaged in Debian and Ubuntu are in paths which are usually
forbidden to be added by the user under /usr/lib...
People used to start riscv64 guests only manually via qemu cmdline,
but trying to encapsulate that via libvirt now causes failures when
starting the guest due to the apparmor isolation not allowing that:
virt-aa-helper: error: skipped restricted file
virt-aa-helper: error: invalid VM definition
Explicitly allow the sub-paths used by u-boot-qemu and opensbi
under /usr/lib/ as readonly rules.
Fixes: https://launchpad.net/bugs/1990499
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
For NVMe disks we skip setting SELinux label on corresponding
VFIO group (/dev/vfio/X). This bug is only visible with
namespaces and goes as follows:
1) libvirt assigns NVMe disk to vfio-pci driver,
2) kernel creates /dev/vfio/X node with generic device_t SELinux
label,
3) our namespace code creates the exact copy of the node in
domain's private /dev,
4) SELinux policy kicks in an changes the label on the node to
vfio_device_t (in the top most namespace),
5) libvirt tells QEMU to attach the NVMe disk, which is denied by
SELinux policy.
While one can argue that kernel should have created the
/dev/vfio/X node with the correct SELinux label from the
beginning (step 2), libvirt can't rely on that and needs to set
label on its own.
Surprisingly, I already wrote the code that aims on this specific
case (v6.0.0-rc1~241), but because of a shortcut we take earlier
it is never ran. The reason is that
virStorageSourceIsLocalStorage() considers NVMe disks as
non-local because their source is not accessible via src->path
(or even if it is, it's not a local path).
Therefore, do not exit early for NVMe disks and let the function
continue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2121441
Fixes: 284a12bae0
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This patch adds the generalized job object into the domain object
so that it can be used by all drivers without the need to extract
it from the private data.
Because of this, the job object needs to be created and set
during the creation of the domain object. This patch also extends
xmlopt with possible job config containing virDomainJobObj
callbacks, its private data callbacks and one variable
(maxQueuedJobs).
This patch includes:
* addition of virDomainJobObj into virDomainObj (used in the
following patches)
* extending xmlopt with job config structure
* new function for freeing the virDomainJobObj
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The _virDomainTPMDef structure has 'version' member, which is a
bit misplaced. It's only emulator type of TPM that can have a
version, even our documentation says so:
``version``
The ``version`` attribute indicates the version of the TPM. This attribute
only works with the ``emulator`` backend. The following versions are
supported:
Therefore, move the member into that part of union that's
covering emulated TPM devices.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This supports sockets created by libvirt and passed by FD using the
same method as in security_dac.c.
Signed-off-by: David Michael <david@bigbadwolfsecurity.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use the designated helpers for virStorageSource instead using the
file-based ones with a check.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Rohit Kumar <rohit.kumar3@nutanix.com>
Currently, libvirt allows only local filepaths to specify the location
of the 'nvram' image. Changing it to virStorageSource type will allow
supporting remote storage for nvram.
Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
Signed-off-by: Florian Schmidt <flosch@nutanix.com>
Signed-off-by: Rohit Kumar <rohit.kumar3@nutanix.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Rohit Kumar <rohit.kumar3@nutanix.com>
We already allow this for OVMF.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/312
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
This fixes a blank screen when viewing a VM with virtio graphics and
gl-accelerated Spice display on Ubuntu 22.04 / libvirt 8.0.0 / qemu 6.2.
Without these AppArmor permissions, the libvirt error log contains
repetitions of:
qemu_spice_gl_scanout_texture: failed to get fd for texture
This appears to be similar to this GNOME Boxes issue:
https://gitlab.gnome.org/GNOME/gnome-boxes/-/issues/586
Fixes: https://launchpad.net/bugs/1972075
Signed-off-by: Max Goodhart <c@chromakode.com>
Reviewed-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Like a Spice port, a dbus serial must specify an associated channel name.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add the ability to configure a qemu-vdagent in guest domains. This
device is similar to the spice vdagent channel except that qemu handles
the spice-vdagent protocol messages itself rather than routing them over
a spice protocol channel.
The qemu-vdagent device has two notable configuration options which
determine whether qemu will handle particular vdagent features:
'clipboard' and 'mouse'.
The 'clipboard' option allows qemu to synchronize its internal clipboard
manager with the guest clipboard, which enables client<->guest clipboard
synchronization for non-spice guests such as vnc.
The 'mouse' option allows absolute mouse positioning to be sent over the
vdagent channel rather than using a usb or virtio tablet device.
Sample configuration:
<channel type='qemu-vdagent'>
<target type='virtio' name='com.redhat.spice.0'/>
<source>
<clipboard copypaste='yes'/>
<mouse mode='client'/>
</source>
</channel>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Instead of creating an empty object and then setting keys one
at a time, it is possible to pass a dict object to
configuration_data(). This is nicer because it doesn't require
repeating the name of the cfg_data object over and over.
There is one exception: the 'conf' object, where we store values
that are used directly by C code. In that case, using a dict
object is not feasible for two reasons: first of all, replacing
the set_quoted() calls would result in awkward code with a lot
of calls to format(); moreover, since code that modifies it is
sprinkled all over the place, refactoring it would probably
make things more complicated rather than simpler.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
QEMU needs to read, write, and lock the NVRAM *.fd files with UEFI
firmware.
Fixes: https://bugs.debian.org/1006324
Fixes: https://launchpad.net/bugs/1962035
Signed-off-by: Martin Pitt <mpitt@debian.org>
Reviewed-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
That is the proper POSIX way.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This change was generated using the following spatch:
@ rule1 @
expression a;
identifier f;
@@
<...
- f(*a);
... when != a;
- *a = NULL;
+ g_clear_pointer(a, f);
...>
@ rule2 @
expression a;
identifier f;
@@
<...
- f(a);
... when != a;
- a = NULL;
+ g_clear_pointer(&a, f);
...>
Then, I left some of the changes out, like tools/nss/ (which
doesn't link with glib) and put back a comment in
qemuBlockJobProcessEventCompletedActiveCommit() which coccinelle
decided to remove (I have no idea why).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
After previous cleanups, the virDomainHostdevDefParseXMLSubsys()
function uses a mixture of virXMLProp*() and the old
virXMLPropString() + virXXXTypeFromString() patterns. Rework it
so that virXMLProp*() is used.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
And make callers check the return value as well. This helps error out early for
invalid environment variables.
That is desirable because it could lead to deadlocks. This can happen when
resetting logging after fork() reports translated errors because gettext
functions are not reentrant. Well, it is not limited to resetting logging after
fork(), it can be any translation at that phase, but parsing environment
variables is easy to make fail on purpose to show the result, it can also happen
just due to a typo.
Before this commit it is possible to deadlock the daemon on startup
with something like:
LIBVIRT_LOG_FILTERS='1:*' LIBVIRT_LOG_OUTPUTS=1:stdout libvirtd
where filters are used to enable more logging and hence make the race less rare
and outputs are set to invalid
Combined with the previous patches this changes
the following from:
...
<deadlock>
to:
...
libvirtd: initialisation failed
The error message is improved in future commits and is also possible thanks to
this patch.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Use g_auto for virCommand and char * and drop the cleanup label.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use 'g_clear_pointer(&ptr, g_hash_table_unref)' instead.
In few instances it allows us to also remove explicit clearing of
pointers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Use the appropriate enum type instead of an int and fix the XML parser
and one missing fully populated switch.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'virDomainChrSourceDef' contains private data so 'virDomainChrSourceDefNew'
must be used to allocate it. 'virDomainTPMDef' was using it directly
which won't work with the chardev helper functions.
Convert it to a pointer to properly allocate private data.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The main reason is to ensure that the private data are properly
allocated for every instance.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This commit aims to address the bug reported in [1] and [2].
If the profile is corrupted (0-size) the VM cannot be launched.
To overcome this, check if the profile exists and if it has 0 size
remove it.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890084
[2] https://bugs.launchpad.net/bugs/1927519
Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Reviewed-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The virCapabilitiesAddGuestDomain() function can't fail. It
aborts on OOM. Therefore, there's no need to check for its
return value.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The virCapabilitiesAddGuest() function can't fail. It aborts on
OOM. Therefore, there's no need to check for its return value.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In a few places we declare a variable (which is optionally
followed by a code not touching it) then set the variable to a
value and return the variable immediately. It's obvious that the
variable is needless and the value can be returned directly
instead.
This patch was generated using this semantic patch:
@@
type T;
identifier ret;
expression E;
@@
- T ret;
... when != ret
when strict
- ret = E;
- return ret;
+ return E;
After that I fixed couple of formatting issues because coccinelle
formatted some lines differently than our coding style.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use 'virXMLPropEnum' to parse it and fix all switch statements which
didn't include the VIR_DOMAIN_SMARTCARD_TYPE_LAST case.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The virtio-mem is paravirtualized mechanism of adding/removing
memory to/from a VM. A virtio-mem-pci device is split into blocks
of equal size which are then exposed (all or only a requested
portion of them) to the guest kernel to use as regular memory.
Therefore, the device has two important attributes:
1) block-size, which defines the size of a block
2) requested-size, which defines how much memory (in bytes)
is the device requested to expose to the guest.
The 'block-size' is configured on command line and immutable
throughout device's lifetime. The 'requested-size' can be set on
the command line too, but also is adjustable via monitor. In
fact, that is how management software places its requests to
change the memory allocation. If it wants to give more memory to
the guest it changes 'requested-size' to a bigger value, and if it
wants to shrink guest memory it changes the 'requested-size' to a
smaller value. Note, value of zero means that guest should
release all memory offered by the device. Of course, guest has to
cooperate. Therefore, there is a third attribute 'size' which is
read only and reflects how much memory the guest still has. This
can be different to 'requested-size', obviously. Because of name
clash, I've named it 'current' and it is dealt with in future
commits (it is a runtime information anyway).
In the backend, memory for virtio-mem is backed by usual objects:
memory-backend-{ram,file,memfd} and their size puts the cap on
the amount of memory that a virtio-mem device can offer to a
guest. But we are already able to express this info using <size/>
under <target/>.
Therefore, we need only two more elements to cover 'block-size'
and 'requested-size' attributes. This is the XML I've came up
with:
<memory model='virtio-mem'>
<source>
<nodemask>1-3</nodemask>
<pagesize unit='KiB'>2048</pagesize>
</source>
<target>
<size unit='KiB'>2097152</size>
<node>0</node>
<block unit='KiB'>2048</block>
<requested unit='KiB'>1048576</requested>
</target>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memory>
I hope by now it is obvious that:
1) 'requested-size' must be an integer multiple of
'block-size', and
2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
address.
Then there is a limitation that the minimal 'block-size' is
transparent huge page size (I'll leave this without explanation).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Attaching a newly created vhostuser port to a VM fails due to an
apparmor denial
internal error: unable to execute QEMU command 'chardev-add': Failed
to bind socket to /run/openvswitch/vhu838c4d29-c9: Permission denied
In the case of a net device type VIR_DOMAIN_NET_TYPE_VHOSTUSER, the
underlying chardev is not labeled in qemuDomainAttachNetDevice prior
to calling qemuMonitorAttachCharDev.
A simple fix would be to call qemuSecuritySetChardevLabel using the
embedded virDomainChrSourceDef in the virDomainNetDef vhostuser data,
but this incurs the risk of incorrectly restoring the label. E.g.
consider the DAC driver behavior with a vhostuser net device, which
uses a socket for the chardev backend. The DAC driver uses XATTRS to
store original labelling information, but XATTRS are not compatible
with sockets. Without the original labelling information, the socket
labels will be restored with root ownership, preventing other
less-privileged processes from connecting to the socket.
This patch avoids overloading chardev labelling with vhostuser net
devices by introducing virSecurityManager{Set,Restore}NetdevLabel,
which is currently only implemented for the apparmor driver. The
new APIs are then used to set and restore labels for the vhostuser
net devices.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>