virHostdevReAttachPCIDevices() is called when we want to re-attach
a list of hostdevs back to the host, either on the shutdown path or
via a 'virsh detach-device' call. This function always count on the
existence of the device in the host to work, but this can lead to
problems. For example, a SR-IOV device can be removed via an admin
"echo 0 > /sys/bus/pci/devices/<addr>/sriov_numvfs", making the kernel
fire up and eventfd_signal() to the process, asking for the process to
release the device. The result might vary depending on the device driver
and OS/arch, but two possible outcomes are:
1) the hypervisor driver will detach the device from the VM, issuing a
delete event to Libvirt. This can be observed in QEMU;
2) the 'echo 0 > ...' will hang waiting for the device to be unplugged.
This means that the VM process failed/refused to release the hostdev back
to the host, and the hostdev will be detached during VM shutdown.
Today we don't behave well for both cases. We'll fail to remove the PCI device
reference from mgr->activePCIHostdevs and mgr->inactivePCIHostdevs because
we rely on the existence of the PCI device conf file in the sysfs. Attempting
to re-utilize the same device (assuming it is now present back in the host)
can result in an error like this:
$ ./run tools/virsh start vm1-sriov --console
error: Failed to start domain vm1-sriov
error: Requested operation is not valid: PCI device 0000:01:00.2 is in use by driver QEMU, domain vm1-sriov
For (1), a VM destroy/start cycle is needed to re-use the VF in the guest.
For (2), the effect is more nefarious, requiring a Libvirtd daemon restart
to use the VF again in any guest.
We can make it a bit better by checking, during virHostdevReAttachPCIDevices(),
if there is any missing PCI device that will be left behind in activePCIHostdevs
and inactivePCIHostdevs lists. Remove any missing device found from both lists,
unconditionally, matching the current state of the host. This change affects
the code path in (1) (processDeviceDeletedEvent into qemuDomainRemoveDevice, all
the way back to qemuHostdevReAttachPCIDevices) and also in (b) (qemuProcessStop
into qemuHostdevReAttachDomainDevices).
NB: Although this patch enables the possibility of 'outside Libvirt' SR-IOV
hotunplug of PCI devices, if the hypervisor and the PCI driver copes with it,
our goal is to mitigate what it is still considered a user oopsie. For all
supported purposes, the admin must remove the SR-IOV VFs from all running domains
before removing the VFs from the host.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/72
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This change will allow us to remove PCI devices from a list
without the need of a PCI Device object, which will be need
in the next patch.
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
We're going to need a way to remove a PCI Device from a list without having
a valid virPCIDevicePtr, because the device is missing from the host. This
means that virPCIDevicesListDel() must operate with a PCI Device address
instead.
Turns out that virPCIDevicesListDel() and its related functions only use
the virPCIDeviceAddressPtr of the virPCIDevicePtr, so this change is
simple to do and will not cause hassle in all other callers. Let's
start adapting virPCIDeviceListFindIndex() and crawl our way up to
virPCIDevicesListDel().
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
There is no need to bother with cgroup tearing down for absent
PCI devices, given that their entries in the sysfs are already
gone.
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Add a helper to quickly determine if a hostdev is a PCI device,
instead of doing a tedious 'if' check with hostdev mode and
subsys type.
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Use g_auto* pointers to avoid the need of a cleanup label. The
type of the pointer 'virDomainPtr dom' was changed to its alias
'virshDomainPtr' to allow the use of g_autoptr().
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
If the underlying PCI device of a hostdev does not exist in the
host (e.g. a SR-IOV VF that was removed while the domain was
running), skip security label handling for it.
This will avoid errors that happens during qemuProcessStop() time,
where a VF that was being used by the domain is not present anymore.
The restore label functions of both DAC and SELinux drivers will
trigger errors in virPCIDeviceNew().
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Gitlab issue #72 [1] reports that removing SR-IOVs VFs before
removing the devices from the running domains can have strange
consequences. QEMU might be able to hotunplug the device inside the
guest, but Libvirt will not be aware of that, and then the guest is
now inconsistent with the domain definition.
There's also the possibility of the VFs removal not succeeding
while the domain is running but then, as soon as the domain
is shutdown, all the VFs are removed. Libvirt can't handle
the removal of the PCI devices while trying to reattach the
hostdevs, and the Libvirt daemon can be left in an inconsistent
state (see [2]).
This patch starts to address the issue related in Gitlab #72, most
notably the issue described in [2]. When shutting down a domain
with SR-IOV hostdevs that got missing, virHostdevReAttachPCIDevices()
is failing the whole process and failing to reattach all the
PCI devices, including the ones that aren't related to the VFs that
went missing. Let's make it more resilient with host changes by
changing virHostdevGetPCIHostDevice() to return an exclusive error
code '-2' for this case. virHostdevGetPCIHostDeviceList() can then
tell when virHostdevGetPCIHostDevice() failed to find the PCI
device of a hostdev and continue to make the list of PCI devices.
virHostdevReAttachPCIDevices() will now be able to proceed reattaching
all other valid PCI devices, at least. The 'ghost hostdevs' will be
handled later on.
[1] https://gitlab.com/libvirt/libvirt/-/issues/72
[2] https://gitlab.com/libvirt/libvirt/-/issues/72#note_459032148
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
We're going to add logic to handle the case where a previously
existing PCI device does not longer exist in the host.
The logic was copied from virPCIDeviceNew(), which verifies if a
PCI device exists in the host, returning NULL and throwing an
error if it doesn't. The NULL is used for other errors as well
(product/vendor id read errors, dev id overflow), meaning that we
can't re-use virPCIDeviceNew() for the purpose of detecting
if the device exists.
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Via coccinelle (not the handbag!)
spatches used:
@ rule1 @
identifier a, b;
symbol NULL;
@@
- b = a;
... when != a
- a = NULL;
+ b = g_steal_pointer(&a);
@@
- *b = a;
... when != a
- a = NULL;
+ *b = g_steal_pointer(&a);
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
If the VM isn't active calculating the job stats doesn't make sense.
Additionally this prevents a crash of libvirtd if qemu terminates while
libvirt wasn't running:
Thread 28 "init-backup-tes" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb9310640 (LWP 3201116)]
qemuDomainJobInfoUpdateTime (jobInfo=0x0) at ../../../libvirt/src/qemu/qemu_domainjob.c:275
275 if (!jobInfo->started)
(gdb) bt
#0 qemuDomainJobInfoUpdateTime (jobInfo=0x0) at ../../../libvirt/src/qemu/qemu_domainjob.c:275
#1 0x00007fffcba1a12d in qemuBackupJobTerminate (vm=0x7fff9c1bc840, jobstatus=QEMU_DOMAIN_JOB_STATUS_CANCELED) at ../../../libvirt/src/qemu/qemu_backup.c:563
#2 0x00007fffcbaefcae in qemuProcessStop
(driver=0x7fff9c144ff0, vm=0x7fff9c1bc840, reason=VIR_DOMAIN_SHUTOFF_DAEMON, asyncJob=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
at ../../../libvirt/src/qemu/qemu_process.c:7812
#3 0x00007fffcbaf2a10 in qemuProcessReconnect (opaque=<optimized out>) at ../../../libvirt/src/qemu/qemu_process.c:8578
#4 0x00007ffff7c46bb5 in virThreadHelper (data=<optimized out>) at ../../../libvirt/src/util/virthread.c:233
#5 0x00007ffff6e453f9 in start_thread () at /lib64/libpthread.so.0
#6 0x00007ffff766fb53 in clone () at /lib64/libc.so.6
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Upcoming patch will remove unnecessary actions if the VM crashed. The
cleanup needs to be performed always, thus needs to be moved earlier.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If there are no source extents the volume XML has an empty <source>
element. Remove it if there's nothing in it by using
virXMLFormatElement.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Move the extent formatting code into
virStorageVolDefFormatSourceExtents.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When the backing store of the image can't be parsed
virStorageSourceNewFromBacking returns -1. storageBackendProbeTarget
then also fails which makes the pool refresh fail or even the storage
pool becomes inactive after (re)start of libvirtd.
In situations when we can't access the backing store via network we
just report the backing store string, thus we can do the same thing for
unparsable backing store to prevent the pool from going offline.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commit bc3a78f61a2aaac3 errorneously removed the return value check from
virStorageSourceNewFromBacking. In cases when we e.g. can't parse the
backing store string this leads to a crash:
#0 virStorageSourceGetActualType (def=0x0) at ../../../libvirt/src/conf/storage_source_conf.c:1014
#1 0x00007ffff7cee4f9 in virStorageSourceIsLocalStorage (src=<optimized out>) at ../../../libvirt/src/conf/storage_source_conf.c:1026
#2 0x00007ffff455c97c in storageBackendProbeTarget (encryption=0x7fff9c122ce8, target=0x7fff9c122c68) at ../../../libvirt/src/storage/storage_util.c:3443
#3 virStorageBackendRefreshVolTargetUpdate (vol=0x7fff9c122c30) at ../../../libvirt/src/storage/storage_util.c:3519
#4 0x00007ffff455cdc0 in virStorageBackendRefreshLocal (pool=0x7fff9c010ea0) at ../../../libvirt/src/storage/storage_util.c:3593
#5 0x00007ffff454f0a1 in storagePoolRefreshImpl
(backend=backend@entry=0x7ffff4711180 <virStorageBackendDirectory>, obj=obj@entry=0x7fff9c010ea0, stateFile=stateFile@entry=0x7fff9c111a90 "/var/run/libvirt/storage/tmp.xml") at ../../../libvirt/src/storage/storage_driver.c:103
#6 0x00007ffff4550ea5 in storagePoolUpdateStateCallback (obj=0x7fff9c010ea0, opaque=<optimized out>) at ../../../libvirt/src/storage/storage_driver.c:165
#7 0x00007ffff7cefef4 in virStoragePoolObjListForEachCb (payload=<optimized out>, name=<optimized out>, opaque=0x7fffc8a489c0)
at ../../../libvirt/src/conf/virstorageobj.c:435
#8 0x00007ffff7c03195 in virHashForEachSafe
(table=<optimized out>, iter=iter@entry=0x7ffff7cefec0 <virStoragePoolObjListForEachCb>, opaque=opaque@entry=0x7fffc8a489c0)
at ../../../libvirt/src/util/virhash.c:414
#9 0x00007ffff7cf0520 in virStoragePoolObjListForEach
(pools=<optimized out>, iter=iter@entry=0x7ffff4550e10 <storagePoolUpdateStateCallback>, opaque=opaque@entry=0x0)
at ../../../libvirt/src/conf/virstorageobj.c:468
#10 0x00007ffff454f43a in storagePoolUpdateAllState () at ../../../libvirt/src/storage/storage_driver.c:184
#11 storageStateInitialize (privileged=<optimized out>, root=<optimized out>, callback=<optimized out>, opaque=<optimized out>)
at ../../../libvirt/src/storage/storage_driver.c:315
#12 0x00007ffff7e10c04 in virStateInitialize
(opaque=0x555555621820, callback=0x55555557b1d0 <daemonInhibitCallback>, root=0x0, mandatory=<optimized out>, privileged=true)
at ../../../libvirt/src/libvirt.c:656
#13 virStateInitialize
(privileged=<optimized out>, mandatory=mandatory@entry=false, root=root@entry=0x0, callback=callback@entry=0x55555557b1d0 <daemonInhibitCallback>, opaque=opaque@entry=0x555555621820) at ../../../libvirt/src/libvirt.c:638
#14 0x000055555557b230 in daemonRunStateInit (opaque=0x555555621820) at ../../../libvirt/src/remote/remote_daemon.c:605
#15 0x00007ffff7c46bb5 in virThreadHelper (data=<optimized out>) at ../../../libvirt/src/util/virthread.c:233
#16 0x00007ffff6e453f9 in start_thread () at /lib64/libpthread.so.0
#17 0x00007ffff766fb53 in clone () at /lib64/libc.so
An invalid image can be easily created by:
$ qemu-img create -f qcow2 -F qcow2 -b 'json:{' -u img.qcow2 10M
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The most important bit is that the caller is expected to pass
locked monitor.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Using the job owner API name directly works fine as long as it is a
static string or the owner's thread is still running. However, this is
not always the case. For example, when the owner API name is filled in a
job when we're reconnecting to existing domains after daemon restart,
the dynamically allocated owner name will disappear with the
reconnecting thread. Any follow up usage of the pointer will read random
memory.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When the parser and docs were enhanced to support a <teaming> element
in a generic <hostdev>, the example XML for formatdomain.rst was
cut/pasted from the example for <interface type='hostdev'>. In my
haste I neglected to remove the <mac address='blah'/> element (which
is unused/ignored for generic <hostdev> and change the closing tag
from </interface> to </hostdev>
https://bugzilla.redhat.com/1927984
Fixes: db64acfbda59ad22b671580fda13968c60bb8c1a
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Commit 010ed0856b and commit db64acfbda introduced the ability to use
the <teaming> element in a generic <hostdev> (previously it could only
be used with <interface type='hostdev'>). However, the patch omitted
one crucial detail - along with parsing the <teaming> element in
<hostdev>, and adding the necessary info to the qemu commandline, we
also need to modify qemuMigrationSrcIsAllowedHostdev() to allow
migration when the generic <hostdev> has a <teaming> element.
https://bugzilla.redhat.com/1927984
Fixes: 010ed0856bb06f439e6fdf44e4f529f53441c398
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
qemuMonitorUnregister will be called in multiple threads (e.g. threads
in rpc worker pool and the vm event thread). In some cases, it isn't
protected by the monitor lock, which may lead to call g_source_unref
more than one time and a use-after-free problem eventually.
Add the missing lock in qemuProcessHandleMonitorEOF (which is the only
position missing lock of monitor I found).
Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Updated by "Update PO files to match POT (msgmerge)" hook in Weblate.
Translation: libvirt/libvirt
Translate-URL: https://translate.fedoraproject.org/projects/libvirt/libvirt/
Co-authored-by: Weblate <noreply@weblate.org>
Signed-off-by: Fedora Weblate Translation <i18n@lists.fedoraproject.org>
The libvirt libxl driver has no access to FDs associated with VM disks.
The disks are opened by libxl.so and any related FDs are not exposed to
applications. The prevents using virtlockd's auto-release feature to
release locks when the FD is closed. Acquiring and releasing locks is
explicitly handled by the libxl driver.
The current logic is structured such that locks are acquired in
libxlDomainStart and released in libxlDomainCleanup. This works well
except for migration, where the locks must be released on the source
host before the domain can be started on the destination host, but the
domain cannot be cleaned up until the migration confirmation stage.
When libxlDomainCleanup if finally called in the confirm stage, locks
are again released resulting in confusing errors from virtlockd and
libvirtd
virtlockd[8095]: resource busy: Lockspace resource 'xxxxxx' is not locked
libvirtd[8050]: resource busy: Lockspace resource 'xxxxxx' is not locked
libvirtd[8050]: Unable to release lease on testvm
The error is also encountered in some error cases, e.g. when
libxlDomainStart fails before acquiring locks and libxlDomainCleanup
is still used for cleanup.
In lieu of a mechanism to check if a lock has been acquired, this patch
takes an easy approach to fixing the unnecessary lock releases by adding
an indicator to the libxlDomainPrivate object that can be set when the
lock is acquired and cleared when the lock is released. libxlDomainCleanup
can then skip releasing the lock in cases where it was previously released
or never acquired in the first place.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commit fa30ee04a2 caused a regression in normal domain shutown.
Initiating a shutdown from within the domain or via 'virsh shutdown'
does cause the guest OS running in the domain to shutdown, but libvirt
never reaps the domain so it is always shown in a running state until
calling 'virsh destroy'.
The shutdown thread is also an internal user of the driver shutdown
machinery and eventually calls libxlDomainDestroyInternal where
the ignoreDeathEvent inhibitor is set, but running in a thread
introduces the possibility of racing with the death event from
libxl. This can be prevented by setting ignoreDeathEvent before
running the shutdown thread.
An additional improvement is to handle the destroy event synchronously
instead of spawning a thread. The time consuming aspects of destroying
a domain have been completed when the destroy event is delivered.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commit <d505b8af58912ae1e1a211fabc9995b19bd40828> changed the cpu quota
value that reflects what kernel allows but did not update our
documentation.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commints <bc760f4d7c4f964fadcb2a73e126b0053e7a9b06> and
<98a09ca48ed4fc011abf2aa290e02ce1b8f1bb5f> fixed the code to use
defines instead of magic numbers but missed this place.
Following commit <ed1ba69f5a8132f8c1e73d2a1f142d70de0b564a> changed
the cpu quota limit to reflect what kernel actually allows so using
the defines fixes XML validations as well.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Invoking the XML parser every time is quite expensive. Since we have a
deep copy function for 'virQEMUCapsPtr' object, we can cache the parsed
results lazily.
This brings significant speedup to qemuxml2argvtest:
real 0m2.234s
user 0m2.140s
sys 0m0.089s
vs.
real 0m1.161s
user 0m1.087s
sys 0m0.072s
qemuxml2xmltest benefits too:
real 0m0.879s
user 0m0.801s
sys 0m0.071s
vs.
real 0m0.466s
user 0m0.424s
sys 0m0.040s
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
The schema validator has a comment which allows checking all xml2argv
input files for schema validity by forcing the latest schema onto files
which don't have any schema. Fix it so that it works properly with the
caching introduced in previous commit.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
It's quite wasteful to reparse the QAPI schema for each _CAPS_ test.
Add a simple cache filled lazily by encountered schemas.
The time saving on my box is quite significant:
real 0m3.318s
user 0m3.203s
sys 0m0.107s
vs
real 0m2.223s
user 0m2.134s
sys 0m0.084s
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
The function calls virJSONValueObjectAppend/virJSONValueArrayAppend, so
by taking a double pointer we can drop the pointer clearing from
callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Avoid pointless copies of temporary strings when constructing number
JSON objects.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
virJSONValueNew* won't return error nowadays so NULL checks are not
necessary. The memory can be cleared via g_autoptr.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The parent array takes ownership of the inserted value once all checks
pass. Don't make the callers second-guess when that happens and modify
the function to take a double pointer so that it can be cleared once the
ownership is taken.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use automatic memory freeing and don't check return value of
virJSONValueNewString as it can't fail.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>