The aim of qemuDomainGetPreservedMounts() is to get a list of
filesystems mounted under /dev and optionally generate a path for
each one where they are moved temporarily when building the
namespace. And if given domain is also running it looks into its
mount table rather than at the host one. But if it did look at
the domain's private mount table, it find /dev mounted twice: the
first time by udev, the second time the tmpfs mounted by us.
Now, later in the function there's a "sorting" algorithm that
tries to reduce number of mount points needing preservation, by
identifying nested mount points. And if we keep the second
occurrence of /dev on the list, well, after the "sorting" we are
left with nothing but "/dev" because all other mount points are
nested.
Fixes: 46b03819ae
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The aim of qemuDomainGetPreservedMounts() is to get a list of
filesystems mounted under /dev and optionally generate a path for
each one where they are moved temporarily when building the
namespace. And the function tries to be a bit clever about it.
For instance, if /dev/shm mount point exists, there's no need to
consider /dev/shm/a nor /dev/shm/b as preserving just 'top level'
/dev/shm gives the same result. To achieve this, the function
iterates over the list of filesystem as returned by
virFileGetMountSubtree() and removes the nested ones. However, it
does so in a bit clumsy way: plain VIR_DELETE_ELEMENT() is used
without freeing the string itself. Therefore, if all three
aforementioned example paths appeared on the list, /dev/shm/a and
/dev/shm/b strings would be leaked.
And when I think about it more, there's no real need to shrink
the array down (realloc()). It's going to be free()-d when
returning from the function. Switch to
VIR_DELETE_ELEMENT_INPLACE() then.
Fixes: cdd9205dff
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virTristateSwitchFromBool to fill in the default if user didn't
request it explicitly.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
libvirt-guests has After= dependency for all the sockets and that is enough.
With the extra Before= in the service file systemd postpones the start of the
socket activated service (when libvirt-guests is trying to connect to the
socket) until after libvirt-guests is stopped effectively making `systemctl stop
libvirt-guests` deadlock. The reason for that is that all stop jobs are
scheduled before any start job. Removing the redundant Before= specification
fixes this behaviour.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
On normal vm startup, we open a file descriptor
for the vsock device in qemuProcessPrepareHost.
However, when doing domxml-to-native, no file descriptors are open.
Only pass the fd if it's not -1, to make domxml-to-native work.
https://bugzilla.redhat.com/show_bug.cgi?id=1777212
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When libvirtd is restarted during an active outgoing migration (or
snapshot, save, or dump which are internally implemented as migration)
it wants to cancel the migration. But by a mistake in commit
v8.7.0-57-g2d7b22b561 the qemuMigrationSrcCancel function is called with
wait == true, which leads to an instant crash by dereferencing NULL
pointer stored in priv->job.current.
When canceling migration to file (snapshot, save, dump), we don't need
to wait until it is really canceled as no migration capabilities or
parameters need to be restored.
On the other hand we need to wait when canceling outgoing migration and
since we don't have virDomainJobData at this point, we have to
temporarily restore the migration job to make sure we can process
MIGRATION events from QEMU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In my commit v8.7.0-57-g2d7b22b561 I attempted to make
qemuMigrationSrcCancel synchronous, but failed. When we are canceling
migration after some kind of error which is detected in
in qemuMigrationSrcWaitForCompletion, jobData->status will be set to
VIR_DOMAIN_JOB_STATUS_FAILED regardless on QEMU state. So instead of
relying on the translated jobData->status in qemuMigrationSrcIsCanceled
we need to check the migration status we get from QEMU MIGRATION event.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
As advertised in the previous commit, QEMU_SCHED_CORE_VCPUS case
is implemented for hotplug case. The implementation is very
similar to the cold boot case, except here we fork off for every
vCPU (because the implementation is done in
qemuProcessSetupVcpu() which is also the function that's called
from hotplug code). But that's okay because our hotplug APIs
allow hotplugging one device at the time.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2074559
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
For QEMU_SCHED_CORE_VCPUS case, the vCPU threads should be placed
all into one scheduling group, but not the emulator or any of its
threads. Therefore, as soon as vCPU TIDs are detected, fork off a
child which then creates a separate scheduling group and adds all
vCPU threads into it.
Please note, this commit only handles the cold boot case. Hotplug
is going to be implemented in the next commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
For QEMU_SCHED_CORE_FULL case, all helper processes should be
placed into the same scheduling group as the QEMU process they
serve. It may happen though, that a helper process is started
before QEMU (cold start of a domain). But we have the dummy
process running from which the QEMU process will inherit the
scheduling group, so we can use the dummy process PID as an
argument to virCommandSetRunAmong().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
For QEMU_SCHED_CORE_EMULATOR or QEMU_SCHED_CORE_FULL the QEMU
process (and its vCPU threads) should be placed into its own
scheduling group. Since we have the dummy process running for
exactly this purpose use its PID as an argument to
virCommandSetRunAmong().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The aim of this helper function is to spawn a child process in
which new scheduling group is created. This dummy process will
then used to distribute scheduling group from (e.g. when starting
helper processes or QEMU itself). The process is not needed for
QEMU_SCHED_CORE_NONE case (obviously) nor for
QEMU_SCHED_CORE_VCPUS case (because in that case a slightly
different child will be forked off).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Ideally, we would just pick the best default and users wouldn't
have to intervene at all. But in some cases it may be handy to
not bother with SCHED_CORE at all or place helper processes into
the same group as QEMU. Introduce a knob in qemu.conf to allow
users control this behaviour.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
There are couple of scenarios where we need to reflect MAC change
done in the guest:
1) domain restore from a file (here, we don't store updated MAC
in the save file and thus on restore create the macvtap with
the original MAC),
2) reconnecting to a running domain (here, the guest might have
changed the MAC while we were not running),
3) migration (here, guest might change the MAC address but we
fail to respond to it,
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When restoring a domain from a save image, we need to query QEMU
for some runtime information that is not stored in status XML, or
even if it is, it's not parsed (e.g. virtio-mem actual size, or
soon rx-filters for macvtaps).
During migration, this is done in qemuMigrationDstFinishFresh(),
or in case of newly started domain in qemuProcessStart(). Except,
the way that the code is written, when restoring from a save
image (which is effectively a migration), the state is never
refreshed, because qemuProcessStart() sees incoming migration so
it does not refresh the state thinking it'll be done in the
finish phase. But restoring from a save image has no finish
phase. Therefore, refresh the state explicitly after the domain
was restored but before vCPUs are resumed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We are not updating domain XML to new MAC address, just merely
setting host side of macvtap. But we don't need a MODIFY job for
that, QUERY is just fine.
This allows us to process the event should it occur during
migration.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Parts of the code that responds to the NIC_RX_FILTER_CHANGED
event are going to be re-used. Separate them into a function
(qemuDomainSyncRxFilter()) and move the code into qemu_domain.c
so that it can be re-used from other places of the driver.
There's one slight change though: instead of passing device alias
from the just received event to qemuMonitorQueryRxFilter(), I've
switched to using the alias stored in our domain definition. But
these two are guaranteed to be equal. virDomainDefFindDevice()
made sure about that, if nothing else.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There's no need to call virNetDevRxFilterFree() explicitly, when
corresponding variables can be declared as
g_autoptr(virNetDevRxFilter).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This patch adds a new worker qemuDomainGetStatsVm which reports the
stats returned by "query-stats" via qemuMonitorQueryStats for the VM
target.
Signed-off-by: Amneesh Singh <natto@weirdnatto.in>
This patch adds the stats queried by qemuMonitorQueryStats for vCPU and
add them according to their QOM device path
Signed-off-by: Amneesh Singh <natto@weirdnatto.in>
This patch adds a hashtable for storing the stats schema and a function
to refresh it by querying "query-stats-schemas" using
qemuMonitorQueryStatsSchema
Signed-off-by: Amneesh Singh <natto@weirdnatto.in>
As qemu becomes more modularized, it is important for libvirt to advertise
availability of the modularized functionality through capabilities. This
change adds channel devices to domain capabilities, allowing clients such
as virt-install to avoid using spicevmc channel devices when not supported
by the target qemu.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The error message doesn't really convey the information that 3d
acceleration works only for the 'virtio' model and similarly the same
error would be reported if qemu doesn't support acceleration, which is
hard to debug.
Split and clarify the errors.
Noticed in https://gitlab.com/libvirt/libvirt/-/issues/388
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Users can play all sorts of games with mount points. For
instance, they can unmount and mount back a hugetlbfs and only
after that attempt to hotplug memory.
This has an unfortunate consequence though. During memory
hotplug, when qemuProcessBuildDestroyMemoryPaths() is called the
path is created with very restrictive mode (0700) because under
the hood g_mkdir_with_parents(path, 0700) is called.
Therefore, create the driver generic portion of the path
separately.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2134009
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
During its initialization, the QEMU driver iterates over
hugetlbfs mount points, creating the driver specific path in each
of them ($prefix/libvirt/qemu). This path is created with very
wide mode (0777) because per-domain directories are then created
under it.
Separate this code into a function so that it can be re-used.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
domcapabilities reports spice graphics support even against a minimal
qemu installation without spice modules. Checking for 'query-spice'
in the list of qmp commands supported by qemu is not sufficient to
determine spice support. Checking the command line produces acurrate
results.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
As qemu becomes more modularized, it is important for libvirt to advertise
availability of the modularized functionality through capabilities. This
change adds USB redirect devices to domain capabilities, allowing clients
such as virt-install to avoid using redirdev devices when not supported
by the target qemu.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When libvirt is restarted, the qemuProcessShutdownReboot command is
executed to restore the VM that is being restarted. In this case, a
coredump may occur when we hotplug a pci device since the PCI address
hasn't be inited yet. Moving the initialization of address to the front
of qemuProcessShutdownOrReboot to ensure that we have the address inited.
Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If QEMU replies to device_del command with "DeviceNotFound"
error, then libvirt doesn't clean the device from the live
configuration.
This is because qemuMonitorDelDevice() returns -2 to
qemuDomainDeleteDevice() and instead of calling
qemuDomainRemoveDevice() the qemuDomainDetachDeviceLive() jumps
right onto cleanup label.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/359
Signed-off-by: Pierre LIBEAU <pierre.libeau@corp.ovh.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The @vendor variable inside virQEMUCapsCPUDefsToModels() is
allocated, but never freed. But there is actually no need for it
to be allocated, because it merely passes a retval of
virCPUGetVendorForModel() (which returns a const string) to
virDomainCapsCPUModelsAdd() (which ten accepts the argument as
const string). Therefore, drop the g_strdup() call and fix the
type of the variable.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Since commit "cpu_x86: Disable blockers from unusable CPU models"
(v3.8.0-99-g9c9620af1d) we explicitly disable CPU features reported by
QEMU as usability blockers for a particular CPU model when creating
baseline or host-model CPU definition. When QEMU changed canonical names
for some features (mostly those with '_' in their names), we forgot to
translate the blocker lists to names used by libvirt and the renamed
features would no longer be explicitly disabled in the created CPU model
even if they were reported as blockers by QEMU.
For example, on a host where EPYC CPU model has the following blockers
<blocker name='sha-ni'/>
<blocker name='mmxext'/>
<blocker name='fxsr-opt'/>
<blocker name='cr8legacy'/>
<blocker name='sse4a'/>
<blocker name='misalignsse'/>
<blocker name='osvw'/>
we would fail to disable 'fxsr-opt':
<cpu mode='custom' match='exact'>
<model fallback='forbid'>EPYC</model>
<feature policy='disable' name='sha-ni'/>
<feature policy='disable' name='mmxext'/>
<feature policy='disable' name='cr8legacy'/>
<feature policy='disable' name='sse4a'/>
<feature policy='disable' name='misalignsse'/>
<feature policy='disable' name='osvw'/>
<feature policy='disable' name='monitor'/>
</cpu>
The 'monitor' feature is disabled even though it is not reported as a
blocker by QEMU because libvirt's definition of EPYC includes the
feature while it is missing in EPYC definition in QEMU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
So far QEMU driver does not get CPU model vendor from QEMU directly and
it has to ask the CPU driver for the info stored in CPU map.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Even though several CPU models from various vendors are reported as
usable on a given host, user may still want to use only those that match
the host vendor. Currently the only place where users can check the
vendor of each CPU model is our CPU map, which is considered internal
and users should not really be using it directly. So to allow for such
filtering we now advertise the vendor of each CPU model in domain
capabilities.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The only part of qemuCaps both functions are interested in is the CPU
architecture. Changing them to expect just virArch makes the functions
more reusable.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Since the function always returns 0, we can just return void and make
callers simpler.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virXMLParse so that the code doesn't have to explicitly allocate
an XPath context and validate the root element.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Replace virNetworkDefParseString/File by direct calls to
virNetworkDefParse.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rename virDomainBackupDefParse to virDomainBackupDefParseXML and use
it in place of virDomainBackupDefParseNode. This is possible as
virXMLParse can be used to replace XPath context allocation and root
node checking.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This patch 'fixes' the behavior of the persistent_state TPM domain XML
attribute that intends to preserve the state of the TPM but should not
keep the state around on all the hosts a VM has been migrated to. It
removes the TPM state directory structure from the source host upon
successful migration when non-shared storage is used. Similarly, it
removes it from the destination host upon migration failure when
non-shared storage is used.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add UNDEFINE_TPM and UNDEFINE_KEEP_TPM flags to qemuDomainUndefineFlags()
API and --tpm and --keep-tpm to 'virsh undefine'. Pass the
virDomainUndefineFlagsValues via qemuDomainRemoveInactive()
from qemuDomainUndefineFlags() all the way down to
qemuTPMEmulatorCleanupHost() and delete TPM storage there considering that
the UNDEFINE_TPM flag has priority over the persistent_state attribute
from the domain XML. Pass 0 in all other API call sites to
qemuDomainRemoveInactive() for now.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Now that we no longer use the capability, stop probing for existence
of 'virtual-css-bridge' and its properties.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Introduced in libvirt by:
commit f245a9791c
qemu: introduce capability for virtual-css-bridge
Which mentions that its support was in QEMU 2.7.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This capability was introduced by libvirt commit:
commit 263e65fd20
qemu: introduce vfio-ccw capability
It probes for the cssid-unrestricted property of
virtual-css-bridge, which was introduced in QEMU v2.12 by:
commit 99577c492fb2916165ed9bc215f058877f0a4106
s390x/css: unrestrict cssids
Since we bumped the minimum QEMU version to 4.2.0, assume
this property is always present.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>