Let me take you on a short trip to history. A long time ago,
libvirt would configure all QEMUs to use $hugetlbfs/libvirt/qemu
for their hugepages setup. This was problematic, because it did
not allow enough separation between guests. Therefore in
v3.0.0-rc1~367 the path changed to a per-domain basis:
$hugetlbfs/libvirt/qemu/$domainShortName
And to help with migration on daemon restart a call to
qemuProcessBuildDestroyMemoryPaths() was added to
qemuProcessReconnect() (well, it was named
qemuProcessBuildDestroyHugepagesPath() back then, see
v3.10.0-rc1~174). This was desirable then, because the memory
hotplug code did not call the function, it simply assumes
per-domain paths to exist. But this changed in v3.5.0-rc1~92
after which the per-domain paths are created on memory hotplug
too.
Therefore, it's no longer necessary to create these paths in
qemuProcessReconnect(). They are created exactly when needed
(domain startup and memory hotplug).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When creating a node in QEMU's namespace the whole link chain is
created with it. Here, we use g_file_read_link() from the child
(running inside the namespace) to learn whether a link exists and
points to expected target. Now, when building the namespace there
can't be any symlinks and this g_file_read_link() returns NULL
always. And because we pass a local GError variable to it, glib
tries to set it to a localized error message. This comes with
creating a (static) hash table inside of g_strerror() and is
guarded with a mutex. The hash table is also allocated using
GSlice allocator instead of g_malloc, and since the latter is
safe to use after fork (because it's documented to use plain
malloc), glib went with the former, naturally. Now, GSlice
allocator has plenty of internal mutexes and thus hitting a
locked mutex is not that hard.
Fortunately, we don't care about any error from
g_file_read_link() and thus we can pass NULL which avoids calling
g_strerror().
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2120965
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The qemuNamespaceMknodPaths() function is responsible for
creating files/directories in QEMU's mount namespace. When
called, it is given list of paths that have to be created in the
namespace. It processes this list and removes items that are not
directly under /dev, but on a 'shared' filesystem (note that all
other mount points are preserved). And it may so happen that
after this pre-process no files/directories need to be created in
the namespace. If that's the case, exit early and avoid
fork()-ing only to find out the same.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Besides the -cpu host, The host-phys-bits=on applies to custom or max
cpu model, So the host-passthrough validation check is unnecessary for
maxphysaddr with mode='passthrough'.
Signed-off-by: Lin Ma <lma@suse.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
In case of variable 'oldjob' (job structure) in
qemuProcessReconnect() the cb pointer was just copied from the
existing job structure in virDomainObjPreserveJob(). This caused
the job and oldjob sharing the same pointer, which was later
freed at the end of the qemuProcessReconnect() function by
automatic call to virDomainObjClearJob(). This caused an invalid
read in and subsequent daemon crash as the job structure was
trying to read cb which had been already freed.
This patch changes the copying to g_memdup that allocates
different pointer, which can be later safely freed.
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Updated by "Update PO files to match POT (msgmerge)" hook in Weblate.
Translation: libvirt/libvirt
Translate-URL: https://translate.fedoraproject.org/projects/libvirt/libvirt/
Co-authored-by: Weblate <noreply@weblate.org>
Signed-off-by: Fedora Weblate Translation <i18n@lists.fedoraproject.org>
rpmbuild is complaining it's not recommended to have unversioned
Obsoletes. On the other hand using dynamic version/release is a bit too
much as we know in which release a particular subpackage was removed.
Let's just use the corresponding version in both cases to be consistent
with all other Obsoletes in our spec file.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
For prevent memory leak and easier to use, So change
virDomainEventTunableNew to get virTypedParameterPtr *params
and set it = NULL.
Signed-off-by: lu zhipeng <luzhipeng@cestc.cn>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
After some debugging and discussion with systemd team it turns out we
are misusing the ordering in libvirt-guests.service. That happened
because we want to support both monolithic and modular daemon setups and
on top of that we also want to support socket activation and services
without socket activation. Unfortunately this is impossible to express
in the unit file because of how transactions are handled in systemd when
dependencies are resolved and multiple actions (jobs) are queued. For
explanation from Michal Sekletar see comment #7 in the BZ this patch is
fixing:
https://bugzilla.redhat.com/show_bug.cgi?id=1964855#c7
In order to support all the scenarios this patch also amends the
manpages so that users that are changing the default can also read how
to correct the dependency ordering in libvirt-guests unit file.
Ideally we would also keep the existing configuration during upgrade,
but due to our huge support matrix this seems hardly feasible as it
could introduce even more problems.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When reconnecting to a running QEMU process, we construct the
per-domain path in all hugetlbfs mounts. This is a relict from
the past (v3.4.0-100-g5b24d25062) where we switched to a
per-domain path and we want to create those paths when libvirtd
restarts on upgrade.
And with namespaces enabled there is one corner case where the
path is not created. In fact an error is reported and the
reconnect fails. Ideally, all mount events are propagated into
the QEMU's namespace. And they probably are, except when the
target path does not exist inside the namespace. Now, it's pretty
common for users to mount hugetlbfs under /dev (e.g.
/dev/hugepages), but if domain is started without hugepages (or
more specifically - private hugetlbfs path wasn't created on
domain startup), then the reconnect code tries to create it.
But it fails to do so, well, it fails to set seclabels on the
path because, because the path does not exist in the private
namespace. And it doesn't exist because we specifically create
only a subset of all possible /dev nodes. Therefore, the mount
event, whilst propagated, is not successful and hence the
filesystem is not mounted. We have to do it ourselves.
If hugetlbfs is mount anywhere else there's no problem and this
is effectively a dead code.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2123196
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Sometimes it may come handy to just bind mount a directory/file
into domain's namespace. Implement a thin wrapper over
qemuNamespaceMknodPaths() which has all the logic we need.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When setting up namespace for QEMU we look at mount points under
/dev (like /dev/pts, /dev/mqueue/, etc.) because we want to
preserve those (which is done by moving them to a temp location,
unshare(), and then moving them back). We have a convenience
helper - qemuDomainGetPreservedMounts() - that processes the
mount table and (optionally) moves the other filesystems too.
This helper is also used when attempting to create a path in NS,
because the path, while starting with "/dev/" prefix, may
actually lead to one of those filesystems that we preserved.
And here comes the corner case: while we require the parent mount
table to be in shared mode (equivalent of `mount --make-rshared /'),
these mount events propagate iff the target path exist inside the
slave mount table (= QEMU's private namespace). And since we
create only a subset of /dev nodes, well, that assumption is not
always the case.
For instance, assume that a domain is already running, no
hugepages were configured for it nor any hugetlbfs is mounted.
Now, when a hugetlbfs is mounted into '/dev/hugepages', this is
propagated into the QEMU's namespace, but since the target dir
does not exist in the private /dev, the FS is not mounted in the
namespace.
Fortunately, this difference between namespaces is visible when
comparing /proc/mounts and /proc/$PID/mounts (where PID is the
QEMU's PID). Therefore, if possible we should look at the latter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When creating a path in a domain's mount namespace we try to set
ACLs on it, so that it's a verbatim copy of the path in parent's
namespace. The ACLs are queried upfront (by
qemuNamespaceMknodItemInit()) but this is fault tolerant so the
pointer to ACLs might be NULL (meaning no ACLs were queried, for
instance because the underlying filesystem does not support
them). But then we take this NULL and pass it to virFileSetACLs()
which immediately returns an error because NULL is invalid value.
Mimic what we do with SELinux label - only set ACLs if they are
non-NULL which includes symlinks.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When retiring QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI capability the
commit removed a bit too much. Previously, all other devices than
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI were ignored in
qemuDomainDeviceHostdevDefPostParseRestoreBackendAlias(). But the
commit in question removed not only the capability check but also
this return early statement. Restore it back.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2129239
Fixes: dc8dbb27d40968c9d9bfad2c6181bccc20c0e44e
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
For NVMe disks we skip setting SELinux label on corresponding
VFIO group (/dev/vfio/X). This bug is only visible with
namespaces and goes as follows:
1) libvirt assigns NVMe disk to vfio-pci driver,
2) kernel creates /dev/vfio/X node with generic device_t SELinux
label,
3) our namespace code creates the exact copy of the node in
domain's private /dev,
4) SELinux policy kicks in an changes the label on the node to
vfio_device_t (in the top most namespace),
5) libvirt tells QEMU to attach the NVMe disk, which is denied by
SELinux policy.
While one can argue that kernel should have created the
/dev/vfio/X node with the correct SELinux label from the
beginning (step 2), libvirt can't rely on that and needs to set
label on its own.
Surprisingly, I already wrote the code that aims on this specific
case (v6.0.0-rc1~241), but because of a shortcut we take earlier
it is never ran. The reason is that
virStorageSourceIsLocalStorage() considers NVMe disks as
non-local because their source is not accessible via src->path
(or even if it is, it's not a local path).
Therefore, do not exit early for NVMe disks and let the function
continue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2121441
Fixes: 284a12bae0e4cf93ea72797965d6c12e3a103f40
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Updated example covers:
* UUID
* CPU model, vendor, microcode, signature, counters,
topology, maxphysaddr, features,
* Power management
* NUMA page size info, multiple nodes, CPU topology IDs, distances
* CPU cache bank info
* Multiple secmodels
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The 'cb' and 'jobDataPrivateCb' pointers are stored in the job object
but made point to the memory owned by the virDomainXMLOption struct in
the callers.
Since the 'virdomainjob' module isn't in control the lifetime of the
virDomainXMLOption, which in some cases is freed before the domain job
data, freed memory would be dereferenced in some cases.
Copy the structs from virDomainXMLOption to ensure the lifetime. This is
possible since the callback functions are immutable.
Fixes: 84e9fd068ccad6e19e037cd6680df437617e2de5
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
According to repology.org:
RHEL-8: 0.9.4
RHEL-9: 0.9.6
Debian 11: 0.9.5
openSUSE Leap 15.3: 0.8.7
Ubuntu 20.04: 0.9.3
And the rest of distros has something newer anyways. Requiring
0.8.1 or newer allows us to drop the terrible hack where we
rename functions at meson level using #define. Note, 0.8.0 is
the version of libssh where the rename happened. It also allows
us to stick with SHA-256 hash algorithm for public keys.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This updates the FreeBSD 13 image to 13.1 which should fix the
symbol lookup errors seen in CI recently.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In the basic configuration with monolithic libvirtd users are required
to also start virtlogd. Add a general note with a specific example
hinting that this is needed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
To prevent surprises when a build doesn't in fact contain the required
functionality suggest that users force-enable required modules.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Only the preparation of sources differs between a build from a git
checkout vs a build from tarball. Restructure the docs to outline the
difference and combine information on how to configure libvirt.
Most notably the suggestion to use '-Dsystem=true' was present only for
the steps to build a git checkout.
Suggest also running the testsuite as part of the build step.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Create a section for instructions on how to install the built binaries
rather than mentioning it multiple times.
Add a note that installing over your distro-provided packages will most
likely break your instalation.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Running from build directory isn't strictly tied to the git-checkout
build so make a new section for it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Users should be encouraged to install libvirt from the distro's repos in
the first place.
Also encourage distro-specific ways to get newer versions, rather than
building from source manually.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When a hypervisor driver is not compiled in and a user enables the
monolithic libvirtd, they get the following misleading error:
$ virsh -c qemu:///system
error: failed to connect to the hypervisor
error: Failed to connect socket to '/var/run/libvirt/virtqemud-sock': No such file or directory
The issue is that the daemon side of the remote driver can't find the
appropriate driver, but the remote driver always accepts everything and
thus attempts to delegate further, which in case of libvirtd makes no
sense.
Refuse opening a connection for local URIS even when the requested
driver is not registered in case when we are inside 'libvirtd' as
libvirtd doesn't have anything to delegate to.
$ virsh -c qemu:///system
error: failed to connect to the hypervisor
error: no connection driver available for qemu:///system
Discovered when investigating https://gitlab.com/libvirt/libvirt/-/issues/370
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Upcoming patch which is fixing the opening of drivers in monolithic mode
needs to know whether we are inside 'libvirtd' but the code where the
decision needs to happen is not re-compiled per daemon. Thus we need to
pass this information to the stateful driver init function so that it
can be remebered.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use automatic memory freeing for 'driver' and return error right away to
avoid the 'cleanup' label.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
virConnectOpenAuth provides an unified interface with using 'flags' to
select the proper mode.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Configuring an URI alias such as
uri_aliases = [
"blah=qemu://invaliduri@@@",
]
Results in a double free when the alias is used:
$ virsh -c blah
free(): double free detected in tcache 2
Aborted (core dumped)
This happens as the 'alias' variable is first assigned to 'uristr' which
is cleared in the 'failed' label and then is explicitly freed again.
Fix this by stealing the alias into 'uristr' and removing the
unnecessary freeing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There are two points I've taken for granted:
1) the mount points are set before starting a guest,
2) the / and its submounts are marked as shared, so that mount
events propagate into child namespaces when assumption 1) is
not held.
But what's obvious to me might not be obvious to our users.
Document these known limitations.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2123196
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The aim of qemuProcessNeedHugepagesPath() is to determine whether
a hugetlbfs mount point is required for given domain (as in
whether qemuBuildMemoryBackendProps() picks up
memory-backend-file pointing to a hugetlbfs mount point). Well,
when domain is configured to use memfd backend then that
condition can never be true. Therefore, skip creating domain's
private path under hugetlbfs mount points.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The bhyve driver still has some frames larger than 2048 bytes, so we
need to keep the limit as is.
The CI failure was masked by the Freebsd-13 failing for unrelated
reasons.
This reverts commit 46302172d47709b169c4b9b4cd6a4847fc2f0b4c
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
After recent cleanups we can now restrict the maximum stack frame size
to 2k.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
At time of this patch struct 'virDomainDef' has 1736 bytes. Allocate it
dynamically to keep the stack frame size in reasonable values.
This patch also fixes remoteRelayDomainQemuMonitorEventCheckACL, where
we didn't clear the stack'd variable prior to use. Fortunately for now
the code didn't look at anything else than what the code overwrote.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
At time of writing DEVLINK_ATTR_MAX equals to 176, thus the stack'd size
of the pointer array is almost 1.4kiB. Allocate it dynamically.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Introduce 'virLXCProcessReportStartupLogError' which simplifies the
error handling on startup of the LXC process when reading of the error
log is needed.
This function has unusual return value semantics but it helps to make
the callers simpler.
This patch also removes 2 1k stack'd buffers from virLXCProcessStart.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that all preceding flags were deleted we can fix the enum value.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'qemuMonitorJSONMigrate' is called from:
- qemuMonitorMigrateToHost
- qemuMonitorMigrateToSocket
Both of the above function are called only from
qemuMigrationSrcStart.
- qemuMonitorMigrateToFd
- called from:
- qemuMigrationSrcToFile
Both instances here pass QEMU_MONITOR_MIGRATE_BACKGROUND
directly.
- qemuMigrationSrcStart
qemuMigrationSrcStart is then called from qemuMigrationSrcRun and
qemuMigrationSrcResume, both of which always add QEMU_MONITOR_MIGRATE_BACKGROUND
to the flags.
Thus any caller always passes the flag so that we can remove the flag
altogether.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Remove the support for enabling the 'blk' and 'inc' parameters of the
'migrate' command as there are no users any more.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>