After some debugging and discussion with systemd team it turns out we
are misusing the ordering in libvirt-guests.service. That happened
because we want to support both monolithic and modular daemon setups and
on top of that we also want to support socket activation and services
without socket activation. Unfortunately this is impossible to express
in the unit file because of how transactions are handled in systemd when
dependencies are resolved and multiple actions (jobs) are queued. For
explanation from Michal Sekletar see comment #7 in the BZ this patch is
fixing:
https://bugzilla.redhat.com/show_bug.cgi?id=1964855#c7
In order to support all the scenarios this patch also amends the
manpages so that users that are changing the default can also read how
to correct the dependency ordering in libvirt-guests unit file.
Ideally we would also keep the existing configuration during upgrade,
but due to our huge support matrix this seems hardly feasible as it
could introduce even more problems.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When reconnecting to a running QEMU process, we construct the
per-domain path in all hugetlbfs mounts. This is a relict from
the past (v3.4.0-100-g5b24d25062) where we switched to a
per-domain path and we want to create those paths when libvirtd
restarts on upgrade.
And with namespaces enabled there is one corner case where the
path is not created. In fact an error is reported and the
reconnect fails. Ideally, all mount events are propagated into
the QEMU's namespace. And they probably are, except when the
target path does not exist inside the namespace. Now, it's pretty
common for users to mount hugetlbfs under /dev (e.g.
/dev/hugepages), but if domain is started without hugepages (or
more specifically - private hugetlbfs path wasn't created on
domain startup), then the reconnect code tries to create it.
But it fails to do so, well, it fails to set seclabels on the
path because, because the path does not exist in the private
namespace. And it doesn't exist because we specifically create
only a subset of all possible /dev nodes. Therefore, the mount
event, whilst propagated, is not successful and hence the
filesystem is not mounted. We have to do it ourselves.
If hugetlbfs is mount anywhere else there's no problem and this
is effectively a dead code.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2123196
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Sometimes it may come handy to just bind mount a directory/file
into domain's namespace. Implement a thin wrapper over
qemuNamespaceMknodPaths() which has all the logic we need.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When setting up namespace for QEMU we look at mount points under
/dev (like /dev/pts, /dev/mqueue/, etc.) because we want to
preserve those (which is done by moving them to a temp location,
unshare(), and then moving them back). We have a convenience
helper - qemuDomainGetPreservedMounts() - that processes the
mount table and (optionally) moves the other filesystems too.
This helper is also used when attempting to create a path in NS,
because the path, while starting with "/dev/" prefix, may
actually lead to one of those filesystems that we preserved.
And here comes the corner case: while we require the parent mount
table to be in shared mode (equivalent of `mount --make-rshared /'),
these mount events propagate iff the target path exist inside the
slave mount table (= QEMU's private namespace). And since we
create only a subset of /dev nodes, well, that assumption is not
always the case.
For instance, assume that a domain is already running, no
hugepages were configured for it nor any hugetlbfs is mounted.
Now, when a hugetlbfs is mounted into '/dev/hugepages', this is
propagated into the QEMU's namespace, but since the target dir
does not exist in the private /dev, the FS is not mounted in the
namespace.
Fortunately, this difference between namespaces is visible when
comparing /proc/mounts and /proc/$PID/mounts (where PID is the
QEMU's PID). Therefore, if possible we should look at the latter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When creating a path in a domain's mount namespace we try to set
ACLs on it, so that it's a verbatim copy of the path in parent's
namespace. The ACLs are queried upfront (by
qemuNamespaceMknodItemInit()) but this is fault tolerant so the
pointer to ACLs might be NULL (meaning no ACLs were queried, for
instance because the underlying filesystem does not support
them). But then we take this NULL and pass it to virFileSetACLs()
which immediately returns an error because NULL is invalid value.
Mimic what we do with SELinux label - only set ACLs if they are
non-NULL which includes symlinks.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When retiring QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI capability the
commit removed a bit too much. Previously, all other devices than
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI were ignored in
qemuDomainDeviceHostdevDefPostParseRestoreBackendAlias(). But the
commit in question removed not only the capability check but also
this return early statement. Restore it back.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2129239
Fixes: dc8dbb27d4
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
For NVMe disks we skip setting SELinux label on corresponding
VFIO group (/dev/vfio/X). This bug is only visible with
namespaces and goes as follows:
1) libvirt assigns NVMe disk to vfio-pci driver,
2) kernel creates /dev/vfio/X node with generic device_t SELinux
label,
3) our namespace code creates the exact copy of the node in
domain's private /dev,
4) SELinux policy kicks in an changes the label on the node to
vfio_device_t (in the top most namespace),
5) libvirt tells QEMU to attach the NVMe disk, which is denied by
SELinux policy.
While one can argue that kernel should have created the
/dev/vfio/X node with the correct SELinux label from the
beginning (step 2), libvirt can't rely on that and needs to set
label on its own.
Surprisingly, I already wrote the code that aims on this specific
case (v6.0.0-rc1~241), but because of a shortcut we take earlier
it is never ran. The reason is that
virStorageSourceIsLocalStorage() considers NVMe disks as
non-local because their source is not accessible via src->path
(or even if it is, it's not a local path).
Therefore, do not exit early for NVMe disks and let the function
continue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2121441
Fixes: 284a12bae0
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Updated example covers:
* UUID
* CPU model, vendor, microcode, signature, counters,
topology, maxphysaddr, features,
* Power management
* NUMA page size info, multiple nodes, CPU topology IDs, distances
* CPU cache bank info
* Multiple secmodels
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The 'cb' and 'jobDataPrivateCb' pointers are stored in the job object
but made point to the memory owned by the virDomainXMLOption struct in
the callers.
Since the 'virdomainjob' module isn't in control the lifetime of the
virDomainXMLOption, which in some cases is freed before the domain job
data, freed memory would be dereferenced in some cases.
Copy the structs from virDomainXMLOption to ensure the lifetime. This is
possible since the callback functions are immutable.
Fixes: 84e9fd068c
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
According to repology.org:
RHEL-8: 0.9.4
RHEL-9: 0.9.6
Debian 11: 0.9.5
openSUSE Leap 15.3: 0.8.7
Ubuntu 20.04: 0.9.3
And the rest of distros has something newer anyways. Requiring
0.8.1 or newer allows us to drop the terrible hack where we
rename functions at meson level using #define. Note, 0.8.0 is
the version of libssh where the rename happened. It also allows
us to stick with SHA-256 hash algorithm for public keys.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This updates the FreeBSD 13 image to 13.1 which should fix the
symbol lookup errors seen in CI recently.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In the basic configuration with monolithic libvirtd users are required
to also start virtlogd. Add a general note with a specific example
hinting that this is needed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
To prevent surprises when a build doesn't in fact contain the required
functionality suggest that users force-enable required modules.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Only the preparation of sources differs between a build from a git
checkout vs a build from tarball. Restructure the docs to outline the
difference and combine information on how to configure libvirt.
Most notably the suggestion to use '-Dsystem=true' was present only for
the steps to build a git checkout.
Suggest also running the testsuite as part of the build step.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Create a section for instructions on how to install the built binaries
rather than mentioning it multiple times.
Add a note that installing over your distro-provided packages will most
likely break your instalation.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Running from build directory isn't strictly tied to the git-checkout
build so make a new section for it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Users should be encouraged to install libvirt from the distro's repos in
the first place.
Also encourage distro-specific ways to get newer versions, rather than
building from source manually.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When a hypervisor driver is not compiled in and a user enables the
monolithic libvirtd, they get the following misleading error:
$ virsh -c qemu:///system
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/virtqemud-sock': No such file or directory
The issue is that the daemon side of the remote driver can't find the
appropriate driver, but the remote driver always accepts everything and
thus attempts to delegate further, which in case of libvirtd makes no
sense.
Refuse opening a connection for local URIS even when the requested
driver is not registered in case when we are inside 'libvirtd' as
libvirtd doesn't have anything to delegate to.
$ virsh -c qemu:///system
error: failed to connect to the hypervisor
error: no connection driver available for qemu:///system
Discovered when investigating https://gitlab.com/libvirt/libvirt/-/issues/370
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Upcoming patch which is fixing the opening of drivers in monolithic mode
needs to know whether we are inside 'libvirtd' but the code where the
decision needs to happen is not re-compiled per daemon. Thus we need to
pass this information to the stateful driver init function so that it
can be remebered.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use automatic memory freeing for 'driver' and return error right away to
avoid the 'cleanup' label.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
virConnectOpenAuth provides an unified interface with using 'flags' to
select the proper mode.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Configuring an URI alias such as
uri_aliases = [
"blah=qemu://invaliduri@@@",
]
Results in a double free when the alias is used:
$ virsh -c blah
free(): double free detected in tcache 2
Aborted (core dumped)
This happens as the 'alias' variable is first assigned to 'uristr' which
is cleared in the 'failed' label and then is explicitly freed again.
Fix this by stealing the alias into 'uristr' and removing the
unnecessary freeing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There are two points I've taken for granted:
1) the mount points are set before starting a guest,
2) the / and its submounts are marked as shared, so that mount
events propagate into child namespaces when assumption 1) is
not held.
But what's obvious to me might not be obvious to our users.
Document these known limitations.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2123196
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The aim of qemuProcessNeedHugepagesPath() is to determine whether
a hugetlbfs mount point is required for given domain (as in
whether qemuBuildMemoryBackendProps() picks up
memory-backend-file pointing to a hugetlbfs mount point). Well,
when domain is configured to use memfd backend then that
condition can never be true. Therefore, skip creating domain's
private path under hugetlbfs mount points.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The bhyve driver still has some frames larger than 2048 bytes, so we
need to keep the limit as is.
The CI failure was masked by the Freebsd-13 failing for unrelated
reasons.
This reverts commit 46302172d4
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
After recent cleanups we can now restrict the maximum stack frame size
to 2k.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
At time of this patch struct 'virDomainDef' has 1736 bytes. Allocate it
dynamically to keep the stack frame size in reasonable values.
This patch also fixes remoteRelayDomainQemuMonitorEventCheckACL, where
we didn't clear the stack'd variable prior to use. Fortunately for now
the code didn't look at anything else than what the code overwrote.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
At time of writing DEVLINK_ATTR_MAX equals to 176, thus the stack'd size
of the pointer array is almost 1.4kiB. Allocate it dynamically.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Introduce 'virLXCProcessReportStartupLogError' which simplifies the
error handling on startup of the LXC process when reading of the error
log is needed.
This function has unusual return value semantics but it helps to make
the callers simpler.
This patch also removes 2 1k stack'd buffers from virLXCProcessStart.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that all preceding flags were deleted we can fix the enum value.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'qemuMonitorJSONMigrate' is called from:
- qemuMonitorMigrateToHost
- qemuMonitorMigrateToSocket
Both of the above function are called only from
qemuMigrationSrcStart.
- qemuMonitorMigrateToFd
- called from:
- qemuMigrationSrcToFile
Both instances here pass QEMU_MONITOR_MIGRATE_BACKGROUND
directly.
- qemuMigrationSrcStart
qemuMigrationSrcStart is then called from qemuMigrationSrcRun and
qemuMigrationSrcResume, both of which always add QEMU_MONITOR_MIGRATE_BACKGROUND
to the flags.
Thus any caller always passes the flag so that we can remove the flag
altogether.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Remove the support for enabling the 'blk' and 'inc' parameters of the
'migrate' command as there are no users any more.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU supported the NBD server required for the new-style migration for a
long time already and when coupled with -blockdev the old style
migration doesn't even work, thus remove support for it.
This patch modifies the code to check that the destination returned data
for the NBD migration and returns an error if it did not and deletes the
fallback code paths which would not work.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The NBD server (detected via 'nbd-server-start' qmp command) was added
to qemu in v1.3 and can't be compiled out.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In commit 6111b23522 removing pre-blockdev code paths I've
improperly refactored the setup of non-shared storage migration.
Specifically the code checking that there are disks and setting up the
NBD data in the migration cookie was originally outside of the loop
checking the user provided list of specific disks to migrate, but became
part of the block as it was not un-indented when a higher level block
was being removed.
The above caused that if non-shared storage migration is requested, but
the user doesn't provide the list of disks to migrate (thus implying to
migrate every appropriate disk) the code doesn't actually setup the
migration and then later on falls back to the old-style migration which
no longer works with blockdev.
Move the check that there's anything to migrate out of the
'nmigrate_disks' block.
Fixes: 6111b23522
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2125111
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/373
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Adding supposedly secure cleanup for secrets in anything related to the
XML parser is pointless because there are multiple other un-sanitized
copies of the full XML and the XML parser state at the very least.
Similarly in case RPC was used to transport the XML the RPC buffers are
not sanitized.
Additionally this patch was incomplete as it didn't sanitize the
password in the cleanup function for virDomainGraphicsAuthDef.
This reverts commit 51f8130d78
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Although these and functions in the following two patches are for
now just being used by the qemu driver, it makes sense to have all
begin job functions in the same file.
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>