Allow callers to request XML validation against the schema. All callers
for now pass 'false'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
The custom namespace parameters for 'rbd' and 'netfs' pool types were
not included in the interleave statement.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The 'type' element was outside of the 'interleave' definition.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
While for now the 'mirror' element is output only, the idea was to allow
it to be used for input too to restore the mirror job if that becomes
the necessity. Allowing interleaving of the subelements can be done
regardless.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The 'model' and 'target' element can be freely moved around.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Allow interleaving in the 'qemucdevSrcDef' definition which is shared
by all places using character device as backend.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The original patches adding the functionality neglected to add any form
of documentation for the stats fields returned for this group.
The stats are directly converted from qemu's 'query-stats(-schema)' QMP
command without any further interpretation. The 'query-stats-schema' has
the following disclaimer:
Note: runtime-collected statistics and their names fall outside QEMU's usual
deprecation policies. QEMU will try to keep the set of available data
stable, together with their names, but will not guarantee stability
at all costs; the same is true of providers that source statistics
externally, e.g. from Linux. For example, if the same value is being
tracked with different names on different architectures or by different
providers, one of them might be renamed. A statistic might go away if
an algorithm is changed or some code is removed; changing a default
might cause previously useful statistics to always report 0. Such
changes, however, are expected to be rare.
Since libvirt is not doing any form of conversion of the stats we can't
meaningfully document any of the returned fields. At the same time we
can't even meaningfully provide any form of API stability for the field
names.
Modify the documentation for the 'VIR_DOMAIN_STATS_VM' group both in the
API docs and in the virsh man page to reflect that and disclaim any form
of stability guarantees we provide normally.
Fixes: 8c9e3dae14
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
libvirt-guests has After= dependency for all the sockets and that is enough.
With the extra Before= in the service file systemd postpones the start of the
socket activated service (when libvirt-guests is trying to connect to the
socket) until after libvirt-guests is stopped effectively making `systemctl stop
libvirt-guests` deadlock. The reason for that is that all stop jobs are
scheduled before any start job. Removing the redundant Before= specification
fixes this behaviour.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The node_device_driver.h declares nodeDeviceLock() and
nodeDeviceUnlock() functions which used to exist, but after
rework to automatic mutex management they exist no more. Their
last use was removed in v8.1.0-rc1~122.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Currently, udevNodeRegister() is forward declared in
node_device_driver.h even though the function is implemented in
node_device_udev.c which warrants node_device_udev.h header file.
Move the declaration into the correct file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Nothing in the header file requires the include of libudev.h, as
the former header file is now empty.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
The DMI_DEVPATH macro is used exclusively within
node_device_udev.c. There's no need to expose it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
The SYSFS_DATA_SIZE macro is Unused since its introduction in
v0.7.3~48. Sorry Dave.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
This reverts commit e49313b54e.
This reverts commit a0f37232b9.
Revert them together to not break build.
This fix of the issue is incorrect and breaks usage of other controllers
in hybrid mode that systemd creates, specifically usage of devices and
cpuacct controllers as they are now assumed to be part of the cgroup v2
topology which is not true.
We need to find different solution to the issue.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
On normal vm startup, we open a file descriptor
for the vsock device in qemuProcessPrepareHost.
However, when doing domxml-to-native, no file descriptors are open.
Only pass the fd if it's not -1, to make domxml-to-native work.
https://bugzilla.redhat.com/show_bug.cgi?id=1777212
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When libvirtd is restarted during an active outgoing migration (or
snapshot, save, or dump which are internally implemented as migration)
it wants to cancel the migration. But by a mistake in commit
v8.7.0-57-g2d7b22b561 the qemuMigrationSrcCancel function is called with
wait == true, which leads to an instant crash by dereferencing NULL
pointer stored in priv->job.current.
When canceling migration to file (snapshot, save, dump), we don't need
to wait until it is really canceled as no migration capabilities or
parameters need to be restored.
On the other hand we need to wait when canceling outgoing migration and
since we don't have virDomainJobData at this point, we have to
temporarily restore the migration job to make sure we can process
MIGRATION events from QEMU.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In my commit v8.7.0-57-g2d7b22b561 I attempted to make
qemuMigrationSrcCancel synchronous, but failed. When we are canceling
migration after some kind of error which is detected in
in qemuMigrationSrcWaitForCompletion, jobData->status will be set to
VIR_DOMAIN_JOB_STATUS_FAILED regardless on QEMU state. So instead of
relying on the translated jobData->status in qemuMigrationSrcIsCanceled
we need to check the migration status we get from QEMU MIGRATION event.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
systemd in hybrid mode uses v1 hierarchies for controllers and v2 for
process tracking.
The LXC code uses virCgroupAddMachineProcess() to move processes into
appropriate cgroup by manipulating cgroupfs directly. (Note, despite
libvirt also supports talking to systemd directly via
org.freedesktop.machine1 API.)
If this path is taken, libvirt/lxc must convince systemd that processes
really belong to new cgroup, i.e. also the tracking v2 hierarchy must
undergo migration too.
The current check would evaluate v2 backend as unavailable with hybrid
mode (because there are no available controllers). Simplify the
condition and consider the mounted cgroup2 as sufficient to touch v2
hierarchy.
This consequently creates an issue with binding the V2 mount. In hybrid
mode the V2 filesystem may be mounted upon the V1 filesystem. By reversing
the order in which backends are mounted in virCgroupBindMount this problem
is circumvented.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/182
Signed-off-by: Eric van Blokland <mail@ericvanblokland.nl>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
As advertised in the previous commit, QEMU_SCHED_CORE_VCPUS case
is implemented for hotplug case. The implementation is very
similar to the cold boot case, except here we fork off for every
vCPU (because the implementation is done in
qemuProcessSetupVcpu() which is also the function that's called
from hotplug code). But that's okay because our hotplug APIs
allow hotplugging one device at the time.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2074559
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
For QEMU_SCHED_CORE_VCPUS case, the vCPU threads should be placed
all into one scheduling group, but not the emulator or any of its
threads. Therefore, as soon as vCPU TIDs are detected, fork off a
child which then creates a separate scheduling group and adds all
vCPU threads into it.
Please note, this commit only handles the cold boot case. Hotplug
is going to be implemented in the next commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
For QEMU_SCHED_CORE_FULL case, all helper processes should be
placed into the same scheduling group as the QEMU process they
serve. It may happen though, that a helper process is started
before QEMU (cold start of a domain). But we have the dummy
process running from which the QEMU process will inherit the
scheduling group, so we can use the dummy process PID as an
argument to virCommandSetRunAmong().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
For QEMU_SCHED_CORE_EMULATOR or QEMU_SCHED_CORE_FULL the QEMU
process (and its vCPU threads) should be placed into its own
scheduling group. Since we have the dummy process running for
exactly this purpose use its PID as an argument to
virCommandSetRunAmong().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The aim of this helper function is to spawn a child process in
which new scheduling group is created. This dummy process will
then used to distribute scheduling group from (e.g. when starting
helper processes or QEMU itself). The process is not needed for
QEMU_SCHED_CORE_NONE case (obviously) nor for
QEMU_SCHED_CORE_VCPUS case (because in that case a slightly
different child will be forked off).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Ideally, we would just pick the best default and users wouldn't
have to intervene at all. But in some cases it may be handy to
not bother with SCHED_CORE at all or place helper processes into
the same group as QEMU. Introduce a knob in qemu.conf to allow
users control this behaviour.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
There are two modes of core scheduling that are handy wrt
virCommand:
1) create new trusted group when executing a virCommand
2) place freshly executed virCommand into the trusted group of
another process.
Therefore, implement these two new operations as new APIs:
virCommandSetRunAlone() and virCommandSetRunAmong(),
respectively.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Since its 5.14 release the Linux kernel allows userspace to
define trusted groups of processes/threads that can run on
sibling Hyper Threads (HT) at the same time. This is to mitigate
side channel attacks like L1TF or MDS. If there are no tasks to
fully utilize all HTs, then a HT will idle instead of running a
task from another (un-)trusted group.
On low level, this is implemented by cookies (effectively an UL
value): processes in the same trusted group share the same cookie
and cookie is unique to the group. There are four basic
operations:
1) PR_SCHED_CORE_GET -- get cookie of given PID,
2) PR_SCHED_CORE_CREATE -- create a new unique cookie for PID,
3) PR_SCHED_CORE_SHARE_TO -- push cookie of the caller onto
another PID,
4) PR_SCHED_CORE_SHARE_FROM -- pull cookie of another PID into
the caller.
Since a system where the code is built can be different to the
one where the code is ran let's provide declaration of some
values. It's not unusual for distros to ship older linux-headers
than the actual kernel.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
There are couple of scenarios where we need to reflect MAC change
done in the guest:
1) domain restore from a file (here, we don't store updated MAC
in the save file and thus on restore create the macvtap with
the original MAC),
2) reconnecting to a running domain (here, the guest might have
changed the MAC while we were not running),
3) migration (here, guest might change the MAC address but we
fail to respond to it,
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When restoring a domain from a save image, we need to query QEMU
for some runtime information that is not stored in status XML, or
even if it is, it's not parsed (e.g. virtio-mem actual size, or
soon rx-filters for macvtaps).
During migration, this is done in qemuMigrationDstFinishFresh(),
or in case of newly started domain in qemuProcessStart(). Except,
the way that the code is written, when restoring from a save
image (which is effectively a migration), the state is never
refreshed, because qemuProcessStart() sees incoming migration so
it does not refresh the state thinking it'll be done in the
finish phase. But restoring from a save image has no finish
phase. Therefore, refresh the state explicitly after the domain
was restored but before vCPUs are resumed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We are not updating domain XML to new MAC address, just merely
setting host side of macvtap. But we don't need a MODIFY job for
that, QUERY is just fine.
This allows us to process the event should it occur during
migration.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Parts of the code that responds to the NIC_RX_FILTER_CHANGED
event are going to be re-used. Separate them into a function
(qemuDomainSyncRxFilter()) and move the code into qemu_domain.c
so that it can be re-used from other places of the driver.
There's one slight change though: instead of passing device alias
from the just received event to qemuMonitorQueryRxFilter(), I've
switched to using the alias stored in our domain definition. But
these two are guaranteed to be equal. virDomainDefFindDevice()
made sure about that, if nothing else.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There's no need to call virNetDevRxFilterFree() explicitly, when
corresponding variables can be declared as
g_autoptr(virNetDevRxFilter).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
With the introduction of `libvirt` sub-directory to the cgroup topology
some of the cgroup configuration was moved into that sub-directory
together with the VM processes.
LXC uses virCgroupNewSelf() in the container process to detect cgroups
in order to report various data from cgroups inside the container.
We need to properly detect the new `libvirt` sub-directory here
otherwise LXC will report incorrect data.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch adds a new worker qemuDomainGetStatsVm which reports the
stats returned by "query-stats" via qemuMonitorQueryStats for the VM
target.
Signed-off-by: Amneesh Singh <natto@weirdnatto.in>
This patch adds the stats queried by qemuMonitorQueryStats for vCPU and
add them according to their QOM device path
Signed-off-by: Amneesh Singh <natto@weirdnatto.in>
This patch adds a hashtable for storing the stats schema and a function
to refresh it by querying "query-stats-schemas" using
qemuMonitorQueryStatsSchema
Signed-off-by: Amneesh Singh <natto@weirdnatto.in>