Back when the original version of this chunk of code was added (commit
41b087198 in libvirt-0.8.1 in April 2010), we used virExecDaemonize()
to start the qemu process, and would continue on in the function
(which at that time was called qemudStartVMDaemon()) even if a -1 was
returned. So it was possible to get to this code with rv == -1 (it was
called "ret" in that version of the code).
In modern libvirt code, qemu is started with virCommandRun(); then we
call virPidFileReadPath(); those are the only two ways of setting "rv"
prior to this code being removed, and in either case if the new value
of rv < 0, then we immediately skip over the rest of the code to the
cleanup: label.
This means that the code being removed by this patch is
unreachable.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
As mentioned in previous commit, populating domain's namespace
from pre-exec() hook is dangerous. This commit moves population
of the namespace with basic /dev nodes (e.g. /dev/null, /dev/kvm,
etc.) into daemon's namespace.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Okay, here is the deal. Currently, the way we build namespace is
very fragile. It is done from pre-exec hook when starting a
domain, after we mass closed all FDs and before we drop
privileges and exec() QEMU. This fact poses some limitations onto
the namespace build code, e.g. it has to make sure not to keep
any FD opened (not even through a library call), because it would
be leaked to QEMU. Also, it has to call only async signal safe
functions. These requirements are hard to meet - in fact as of my
commit v6.2.0-rc1~235 we are leaking a FD into QEMU by calling
libdevmapper functions.
To solve this issue and avoid similar problems in the future, we
should change our paradigm. We already have functions which can
populate domain's namespace with nodes from the daemon context.
If we use them to populate the namespace and keep only the bare
minimum in the pre-exec hook, we've mitigated the risk.
Therefore, the old qemuDomainBuildNamespace() is renamed to
qemuDomainUnshareNamespace() and new qemuDomainBuildNamespace()
function is introduced. So far, the new function is basically a
NOP and domain's namespace is still populated from the pre-exec
hook - next patches will fix it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The name of this function is not very helpful, because it doesn't
create anything, it just flips a bit in a bitmask when domain is
starting up. Move the function internals into qemu_process.c and
forget the function ever existed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The qemu_domain.c file is big as is and we should split it into
separate semantic blocks. Start with code that handles domain
namespaces.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Both accept a NULL value gracefully and virStringFreeList
does not zero the pointer afterwards, so a straight replace
is safe.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
There are races condiction to make '/run/libvirt/qemu/dbus' directory in
virDirCreateNoFork() while concurrent start VMs, and get "failed to create
directory '/run/libvirt/qemu/dbus': File exists" error message. pre-create the
dbus directory in qemuStateInitialize.
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Since active domains which do not have the attribute already set were
not started by libvirt that probed for CPU migratable property, we need
to check this property on reconnect and update the domain definition
accordingly.
https://bugzilla.redhat.com/show_bug.cgi?id=1857967
Reported-by: Mark Mielke <mark.mielke@gmail.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Use g_autoptr() and remove the 'cleanup' label.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200717211556.1024748-3-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The index returned by qemuDomainDiskLookupByNodename is the position in
the backing chain rather than the index we report in the XML.
Since with -blockdev they differ now and additionally the disk source
also has an index we need to fix the 'threshold' events we report:
1) If it's the top level image we must always trigger the event without
any suffix as we did until now
2) We must report the correct index
3) We must report the correct index also for the top level image, when
blockdev is used.
This means that we need to potentially emit 2 events, one for the device
without the index and then when blockdev is used and the top level image
has an index we must do it also with the index.
This will fix it for blockdev cases, while also not removing previous
semantics.
https://bugzilla.redhat.com/show_bug.cgi?id=1857204
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
To remove dependecy of `qemuDomainJob` on job specific
paramters, a `privateData` pointer is introduced.
To handle it, structure of callback functions is
also introduced.
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Currently, when restoring from a domain the path that the domain
restores from is labelled under qemuSecuritySetAllLabel() (and after
v6.3.0-rc1~108 even outside transactions). While this grants QEMU
the access, it has a flaw, because once the domain is restored, up
and running then qemuSecurityDomainRestorePathLabel() is called,
which is not real counterpart. In case of DAC driver the
SetAllLabel() does nothing with the restore path but
RestorePathLabel() does - it chown()-s the file back and since there
is no original label remembered, the file is chown()-ed to
root:root. While the apparent solution is to have DAC driver set the
label (and thus remember the original one) in SetAllLabel(), we can
do better.
Turns out, we are opening the file ourselves (because it may live on
a root squashed NFS) and then are just passing the FD to QEMU. But
this means, that we don't have to chown() the file at all, we need
to set SELinux labels and/or add the path to AppArmor profile.
And since we want to restore labels right after QEMU is done loading
the migration stream (we don't want to wait until
qemuSecurityRestoreAllLabel()), the best way to approach this is to
have separate APIs for labelling and restoring label on the restore
file.
I will investigate whether AppArmor can use the SavedStateLabel()
API instead of passing the restore path to SetAllLabel().
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1851016
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Some, but not all, of the monitor event handlers check
the virObjectUnref return value to see if the domain
was disposed.
It should not be possible for this to happen, since
the function already holds a lock on the domain and
has only just acquired an extra reference on the
domain a few lines earlier.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Commit v3.10.0-182-g237f045d9a ("qemu: Ignore fallback CPU attribute
on reconnect") forced CPU 'fallback' to ALLOW, regardless of user
choice. This fixed a situation in which guests created with older
Libvirt versions, which used CPU mode 'host-model' in runtime, would
fail to launch in a newer Libvirt if the fallback was set to FORBID.
This would lead to a scenario where the CPU was translated to 'host-model'
to 'custom', but then the FORBID setting would make the translation
process fail.
PSeries can operate with 'host-model' in runtime due to specific PPC64
mechanics regarding compatibility mode. The update() implementation of
the cpuDriverPPC64 driver is a NO-OP if CPU mode is 'host-model', and
the driver does not implement translate(). The commit mentioned above
is causing PSeries guests to get their 'fallback' setting to ALLOW,
overwriting user choice, exposing a design problem in
qemuProcessRefreshCPU() - for PSeries guests, handling 'host-model'
as it is being done does not apply.
All other cpuArchDrivers implements update() and changes guest mode
to VIR_CPU_MODE_CUSTOM, meaning that PSeries is currently the only
exception to this logic. Let's make it official.
https://bugzilla.redhat.com/show_bug.cgi?id=1660711
Suggested-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200525123945.4049591-2-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Use automatic cleanup on qemuProcessUpdateCPU and the functions called
by it.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200522195620.3843442-5-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
qemuxml2argv test suite is way more comprehensive than the hotplug
suite. Since we share the code paths for monitor and command line
hotplug we can easily test the properties of devices against the QAPI
schema.
To achieve this we'll need to skip the JSON->commandline conversion for
the test run so that we can analyze the pure properties. This patch adds
flags for the comand line generator and hook them into the
JSON->commandline convertor for -netdev. An upcoming patch will make use
of this new infrastructure.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
If the mirror destination is not a file but a NVMe disk, then
call qemuHostdevReAttachOneNVMeDisk() to reattach the NVMe back
to the host.
This would be done by blockjob code when the job finishes, but in
this case the job won't finish - QEMU is killed meanwhile.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1825785
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
In v5.10.0-rc1~42 (which was later fixed in v6.0.0-rc1~487) I am
removing XATTRs for a file that QEMU is mirroring a disk into but
it is killed meanwhile. Well, we can call
qemuSecurityRestoreImageLabel() which will not only remove XATTRs
but also use them to restore the original owner of the file.
This would be done by blockjob code when the job finishes, but in
this case the job won't finish - QEMU is killed meanwhile
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
A failure in qemuProcessLaunch would lead to qemuExtDevicesStop
being called twice - once in the cleanup section and then again
in qemuProcessStop.
However, the first one is called while the QEMU process is
still running, which is too soon for the swtpm process, because
the swtmp_ioctl command can lock up:
https://bugzilla.redhat.com/show_bug.cgi?id=1822523
Remove the first call and only leave the one in qemuProcessStop,
which is called after the QEMU process is killed.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Help QEMU in deprecation of -drive if=none without the need to refactor
all old boards. Stop masking out -blockdev support when -drive if=sd
needs to be used. We achieve this by forbidding blockjobs and
special-casing all other code paths. Blockjobs are sacrificed in this
case as SD cards are a corner case for some ARM boards and are thus not
used commonly.
https://bugzilla.redhat.com/show_bug.cgi?id=1821692
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
SD cards need to be instantiated via -drive if=sd. This means that all
cases where we use the blockdev path need to be special-cased for SD
cards.
Note that at this point QEMU_CAPS_BLOCKDEV is still cleared if the VM
config has a SD card.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use the drive alias for all cases when we can't generate qomName. This
is meant to handle disks on 'sd' bus which are instantiated via -drive
if=sd as there isn't any specific QOM name for them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The function effectively boils down to whether the disk is 'SD'. Since
we'll need to make more decisions based on the fact whether the disk is
on the SD bus, rename the function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
So far, libvirt generates the following path for memory:
$memoryBackingDir/$id-$shortName/ram-nodeN
where $memoryBackingDir is the path where QEMU mmaps() memory for
the guest (e.g. /var/lib/libvirt/qemu/ram), $id is domain ID
and $shortName is shortened version of domain name. So for
instance, the generated path may look something like this:
/var/lib/libvirt/qemu/ram/1-QEMUGuest/ram-node0
While in case of embed driver the following path would be
generated by default:
$root/lib/qemu/ram/1-QEMUGuest/ram-node0
which is not clashing with other embed drivers, we allow users to
override the default and have all embed drivers use the same
prefix. This can create clashing paths. Fortunately, we can reuse
the approach for machined name generation
(v6.1.0-178-gc9bd08ee35) and include part of hash of the root in
the generated path.
Note, the important change is in qemuGetMemoryBackingBasePath().
The rest is needed to pass driver around.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
So far, libvirt generates the following path for hugepages:
$mnt/libvirt/qemu/$id-$shortName
where $mnt is the mount point of hugetlbfs corresponding to
hugepages of desired size (e.g. /dev/hugepages), $id is domain ID
and $shortName is shortened version of domain name. So for
instance, the generated path may look something like this:
/dev/hugepages/libvirt/qemu/1-QEMUGuest
But this won't work with embed driver really, because if there
are two instances of embed driver, and they both want to start a
domain with the same name and with hugepages, both drivers will
generate the same path which is not desired. Fortunately, we can
reuse the approach for machined name generation
(v6.1.0-178-gc9bd08ee35) and include part of hash of the root in
the generated path.
Note, the important change is in qemuGetBaseHugepagePath(). The
rest is needed to pass driver around.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In previous commit:
commit e6afacb0fe
Author: Daniel P. Berrangé <berrange@redhat.com>
Date: Wed Feb 12 12:26:11 2020 +0000
qemu: start/stop an event loop thread for domains
A bogus comment was added claiming we didn't need to shutdown the
event thread in the qemuProcessStop method, because this would be
done in the monitor EOF callback. This was wrong because the EOF
callback only runs in the case of a QEMU crash or a guest initiated
clean shutdown & poweroff. In the case where the libvirt admin
calls virDomainDestroy, the EOF callback never fires because we
have already unregistered the event callbacks. We must thus always
attempt to stop the event thread in qemuProcessStop.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reported-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Now, that our virCommandSetPidFile() is more intelligent we don't
need to rely on the daemon to create and lock the pidfile and use
virCommandSetPidFile() at the same time.
NOTE that as advertised in the previous commit, this was
temporarily broken, because both virCommand and
qemuProcessStartManagedPRDaemon() would try to lock the pidfile.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Sync was introduced in [1] to check for ga presence. This
check is racy but in the era before serial events are available
there was not better solution I guess.
In case we have the events the sync function is different. It allows us
to flush stateless ga channel from remnants of previous communications.
But we need to do it only once. Until we get timeout on issued command
channel state is ok.
[1] qemu_agent: Issue guest-sync prior to every command
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This converts the QEMU agent APIs to use the per-VM
event loop, which involves switching from virEvent APIs
to GMainContext / GSource APIs.
A GSocket is used as a convenient way to create a GSource
for a socket, but is not yet used for actual I/O.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This converts the QEMU monitor APIs to use the per-VM
event loop, which involves switching from virEvent APIs
to GMainContext / GSource APIs.
A GSocket is used as a convenient way to create a GSource
for a socket, but is not yet used for actual I/O.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In common with regular QEMU guests, the QMP probing
will need an event loop for handling monitor I/O
operations.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The event loop thread will be responsible for handling
any per-domain I/O operations, most notably the QEMU
monitor and agent sockets.
We start this event loop when launching QEMU, but stopping
the event loop is a little more complicated. The obvious
idea is to stop it in qemuProcessStop(), but if we do that
we risk loosing the final events from the QEMU monitor, as
they might not have been read by the event thread at the
time we tell the thread to stop.
The solution is to delay shutdown of the event thread until
we have seen EOF from the QEMU monitor, and thus we know
there are no further events to process.
Note that this assumes that we don't have events to process
from the QEMU agent.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When preparing images for block jobs we modify their seclabels so
that QEMU can open them. However, as mentioned in the previous
commit, secdrivers base some it their decisions whether the image
they are working on is top of of the backing chain. Fortunately,
in places where we call secdrivers we know this and the
information can be passed to secdrivers.
The problem is the following: after the first blockcommit from
the base to one of the parents the XATTRs on the base image are
not cleared and therefore the second attempt to do another
blockcommit fails. This is caused by blockcommit code calling
qemuSecuritySetImageLabel() over the base image, possibly
multiple times (to ensure RW/RO access). A naive fix would be to
call the restore function. But this is not possible, because that
would deny QEMU the access to the base image. Fortunately, we
can use the fact that seclabels are remembered only for the top
of the backing chain and not for the rest of the backing chain.
And thanks to the previous commit we can tell secdrivers which
images are top of the backing chain.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1803551
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Historically threads are given a name based on the C function,
and this name is just used inside libvirt. With OS level thread
naming this name is now visible to debuggers, but also has to
fit in 15 characters on Linux, so function names are too long
in some cases.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Pass logManager to qemuExtDevicesStart for future usage.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Include virutil.h in all files that use it,
instead of relying on it being pulled in somehow.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
In two places where virPidFileForceCleanupPath() is called, we
try to unlink() the pidfile again. This is needless because
virPidFileForceCleanupPath() has done just that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Pvpanic device supports bit 1 as crashloaded event, it means that
guest actually panicked and run kexec to handle error by guest side.
Handle crashloaded as a lifecyle event in libvirt.
Test case:
Guest side:
before testing, we need make sure kdump is enabled,
1, build new pvpanic driver (with commit from upstream
e0b9a42735f2672ca2764cfbea6e55a81098d5ba
191941692a3d1b6a9614502b279be062926b70f5)
2, insmod new kmod
3, enable crash_kexec_post_notifiers,
# echo 1 > /sys/module/kernel/parameters/crash_kexec_post_notifiers
4, trigger kernel panic
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
Host side:
1, build new qemu with pvpanic patches (with commit from upstream
600d7b47e8f5085919fd1d1157f25950ea8dbc11
7dc58deea79a343ac3adc5cadb97215086054c86)
2, build libvirt with this patch
3, handle lifecycle event and trigger guest side panic
# virsh event stretch --event lifecycle
event 'lifecycle' for domain stretch: Crashed Crashloaded
events received: 1
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
The usability of a specific CPU mode may depend on machine type, let's
prepare for this by passing it to virQEMUCapsIsCPUModeSupported.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now, that every use of virAtomic was replaced with its g_atomic
equivalent, let's remove the module.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This enables support for running QEMU embedded to the calling
application process using a URI:
qemu:///embed?root=/some/path
Note that it is important to keep the path reasonably short to
avoid risk of hitting the limit on UNIX socket path names
which is 108 characters.
When using the embedded mode with a root=/var/tmp/embed, the
driver will use the following paths:
logDir: /var/tmp/embed/log/qemu
swtpmLogDir: /var/tmp/embed/log/swtpm
configBaseDir: /var/tmp/embed/etc/qemu
stateDir: /var/tmp/embed/run/qemu
swtpmStateDir: /var/tmp/embed/run/swtpm
cacheDir: /var/tmp/embed/cache/qemu
libDir: /var/tmp/embed/lib/qemu
swtpmStorageDir: /var/tmp/embed/lib/swtpm
defaultTLSx509certdir: /var/tmp/embed/etc/pki/qemu
These are identical whether the embedded driver is privileged
or unprivileged.
This compares with the system instance which uses
logDir: /var/log/libvirt/qemu
swtpmLogDir: /var/log/swtpm/libvirt/qemu
configBaseDir: /etc/libvirt/qemu
stateDir: /run/libvirt/qemu
swtpmStateDir: /run/libvirt/qemu/swtpm
cacheDir: /var/cache/libvirt/qemu
libDir: /var/lib/libvirt/qemu
swtpmStorageDir: /var/lib/libvirt/swtpm
defaultTLSx509certdir: /etc/pki/qemu
At this time all features present in the QEMU driver are available when
running in embedded mode, availability matching whether the embedded
driver is privileged or unprivileged.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The function virSecretGetSecretString calls into secret driver and is
used from other hypervisors drivers and as such makes more sense in
util.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Libvirt's original atomic ops impls were largely copied
from GLib's code at the time. The only API difference
was that libvirt's virAtomicIntInc() would return a
value, but g_atomic_int_inc was void. We thus use
g_atomic_int_add(v, 1) instead, though this means
virAtomicIntInc() now returns the original value,
instead of the new value.
This rewrites libvirt's impl in terms of g_atomic_int*
as a short term conversion. The key motivation was to
quickly eliminate use of GNULIB's verify_expr() macro
which is not a direct match for G_STATIC_ASSERT_EXPR.
Long term all the callers should be updated to use
g_atomic_int* directly.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When pause-before-switchover QEMU capability is enabled, we get STOP
event before MIGRATION event with postcopy-active state. To properly
handle post-copy migration and emit correct events commit
v4.10.0-rc1-4-geca9d21e6c added a hack to
qemuProcessHandleMigrationStatus which translates the paused state
reason to VIR_DOMAIN_PAUSED_POSTCOPY and emits
VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event when migration state changes
to post-copy.
However, the code was effective on both sides of migration resulting in
a confusing VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event on the destination
host, where entering post-copy mode is already properly advertised by
VIR_DOMAIN_EVENT_RESUMED_POSTCOPY event.
https://bugzilla.redhat.com/show_bug.cgi?id=1791458
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This function needs domain definition really, we don't need to
pass the whole domain object. This saves couple of dereferences
and characters esp. in more checks to come.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Commit d75f865fb9 caused a job-deadlock if
a VM is running the backup job and being destroyed as it removed the
cleanup of the async job type and there was nothing to clean up the
backup job.
Add an explicit cleanup of the backup job when destroying a VM.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
qemuDomainObjPrivateDataClear clears state which become invalid after VM
stopped running and the node name allocator belongs there.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
If we use fake reboot then domain goes thru running->shutdown->running
state changes with shutdown state only for short period of time. At
least this is implementation details leaking into API. And also there is
one real case when this is not convinient. I'm doing a backup with the
help of temporary block snapshot (with the help of qemu's API which is
used in the newly created libvirt's backup API). If guest is shutdowned
I want to continue to backup so I don't kill the process and domain is
in shutdown state. Later when backup is finished I want to destroy qemu
process. So I check if it is in shutdowned state and destroy it if it
is. Now if instead of shutdown domain got fake reboot then I can destroy
process in the middle of fake reboot process.
After shutdown event we also get stop event and now as domain state is
running it will be transitioned to paused state and back to running
later. Though this is not critical for the described case I guess it is
better not to leak these details to user too. So let's leave domain in
running state on stop event if fake reboot is in process.
Reconnection code handles this patch without modification. It detects
that qemu is not running due to shutdown and then calls qemuProcessShutdownOrReboot
which reboots as fake reboot flag is set.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
The 'cleanup' flag is doing no cleaup in this function. We can
remove it and return NULL on error or qemuBuildCommandLine().
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The g_auto*() changes made by the previous patches made a lot
of 'cleanup' labels obsolete. Let's remove them.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Change all feasible pointers to use g_autoptr().
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Change all feasible strings and scalar pointers to use g_autofree.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Now, that we have everything prepared, we can generate command
line for NVMe disks.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This reverts commit 7be5fe66cd.
This commit broke resctrl, because it missed the fact that the
virResctrlInfoGetCache() has side-effects causing it to actually
change the virResctrlInfo parameter, not merely get data from
it.
This code will need some refactoring before we can try separating
it from virCapabilities again.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When QEMU uid/gid is set to non-root this is pointless as if we just
used a regular setuid/setgid call, the process will have all its
capabilities cleared anyway by the kernel.
When QEMU uid/gid is set to root, this is almost (always?) never
what people actually want. People make QEMU run as root in order
to access some privileged resource that libvirt doesn't support
yet and this often requires capabilities. As a result they have
to go find the qemu.conf param to turn this off. This is not
viable for libguestfs - they want to control everything via the
XML security label to request running as root regardless of the
qemu.conf settings for user/group.
Clearing capabilities was implemented originally because there
was a proposal in Fedora to change permissions such that root,
with no capabilities would not be able to compromise the system.
ie a locked down root account. This never went anywhere though,
and as a result clearing capabilities when running as root does
not really get us any security benefit AFAICT. The root user
can easily do something like create a cronjob, which will then
faithfully be run with full capabilities, trivially bypassing
the restriction we place.
IOW, our clearing of capabilities is both useless from a security
POV, and breaks valid use cases when people need to run as root.
This removes the clear_emulator_capabilities configuration
option from qemu.conf, and always runs QEMU with capabilities
when root. The behaviour when non-root is unchanged.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
We will want to use the async job infrastructure along with all the APIs
and event for the backup job so add the backup job as a new async job
type.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We always refresh the capabilities object when using virResctrlInfo
during process startup. This is undesirable overhead, because we can
just directly create a virResctrlInfo instead.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Avoid grabbing the whole virCapsPtr object when we only need the
host CPU information.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Avoid grabbing the whole virCapsPtr object when we only need the
NUMA information.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The NUMA cells are stored directly in the virCapsHostPtr
struct. This moves them into their own struct allowing
them to be stored independantly of the rest of the host
capabilities. The change is used as an excuse to switch
the representation to use a GPtrArray too.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Now that the domain XML APIs don't use virCapsPtr we can stop passing it
around many QEMU driver methods.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Our normal practice is for the object type to be the name prefix, and
the object instance be the first parameter passed in.
Rename these to virDomainObjSave and virDomainDefSave moving their
primary parameter to be the first one. Ensure that the xml options
are passed into both functions in prep for future work.
Finally enforce checking of the return type and mark all parameters
as non-NULL.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
As part of a goal to eliminate the need to use virCapsPtr for anything
other than the virConnectGetCapabilies() API impl, cache the host arch
against the QEMU driver struct and use that field directly.
In the tests we move virArchFromHost() globally in testutils.c so that
every test runs with a fixed default architecture reported.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In v5.9.0-370-g8fa0374c5b I've tried to fix a bug by removing
some stale XATTRs in qemuProcessStop(). However, I forgot to
do nothing when the VIR_QEMU_PROCESS_STOP_NO_RELABEL flag was
specified.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
We clear some capabilities here so the lockouts need to be
re-evaluated.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Do all post-processing of capabilities in qemuProcessPrepareQEMUCaps.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Move the post-processing of the QEMU_CAPS_CHARDEV_FD_PASS flag to the
new function.
The clearing of the capability is based on the presence of
VIR_QEMU_PROCESS_START_STANDALONE so we must also pass in the process
start flags.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Start aggregating all capability post-processing code in one place.
The comment was modified while moving it as it was mentioning floppies
which are no longer clearing the blockdev capability.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function is now used only in qemu_process.c so move it there and
name it 'qemuProcessPrepareQEMUCaps' which is more appropriate to what
it's doing.
The reworded comment now mentions that it will also post-process the
caps for VM startup.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The redetection was originally added in 43c01d3838 as a way to recover
from libvirtd upgrade from the time when we didn't persist the qemu
capabilities in the status XML. Also this the oldest supported qemu by
more than two years.
Even if somebody would have a running VM running at least qemu 1.5 with
such an old libvirt we certainly wouldn't do the right thing by
redetecting the capabilities and then trying to communicate with qemu.
For now it will be the best to just stop considering this scenario any
more and error out for such VM.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Commit c90fb5a828 added explicit use of the private copy of the qemu
capabilities to various places. The change to qemuProcessInit was bogus
though as at the point where we re-initiate the post parse callbacks
priv->qemuCaps is still NULL as we clear it after shutdown of the VM and
don't initiate it until a later point.
Using the value from priv->qemuCaps might mislead readers of the code
into thinking that something useful is being passed at that point so go
with an explicit NULL instead.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Block jobs may be members of async jobs so it makes more sense to
refresh block job state after we do steps for async job recovery.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This also isn't required (due to the vportprofile being stored in the
NetDef as a pointer rather than being directly contained), but it
seemed dishonest to not mark it as const (and thus permit users to
modify its contents)
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
If user starts a blockcommit or a blockcopy then we modify access
for qemu on both images and leave it like that until the job
terminates. So far so good. Problem is, if user instead of
terminating the job (where we would modify the access again so
that the state before the job is restored) calls destroy on the
domain or if qemu dies whilst executing the block job. In this
case we don't ever clear the access we granted at the beginning.
To fix this, maybe a bit harsh approach is used, but it works:
after all labels were restored (that is after
qemuSecurityRestoreAllLabel() was called), we iterate over each
disk in the domain and remove XATTRs from the whole backing chain
and also from any file the disk is being mirrored to.
This would have been done at the time of pivot, but it isn't
because user decided to kill the domain instead. If we don't do
this and leave some XATTRs behind the domain might be unable to
start.
Also, secdriver can't do this because it doesn't know if there is
any job running. It's outside of its scope - the hypervisor
driver is responsible for calling secdriver's APIs.
Moreover, this is safe to call because we don't remember labels
for any member of a backing chain except of the top layer. But
that one was restored in qemuSecurityRestoreAllLabel() call done
earlier. Therefore, not only we don't remember labels (and thus
this is basically a NOP for other images in the backing chain) it
is also safe to call this when no blockjob was started in the
first place, or if some parts of the backing chain are shared
with some other domains - this is NOP, unless a block job is
active at the time of domain destroy.
https://bugzilla.redhat.com/show_bug.cgi?id=1741456#c19
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Install the convertor function which enables the internals that will use
-blockdev to make qemu open the firmware image and stop using -drive.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The functions return virDomainCapsCPUModelsPtr and thus they should be
called *CPUModels for consistency. Functions called *CPUDefinitions will
work on qemuMonitorCPUDefsPtr.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function would return a valid virDomainCapsCPUModelsPtr with empty
CPU models list if query-cpu-definitions exists in QEMU, but returns
GenericError meaning it's not in fact implemented. This behaviour is a
bit strange especially after such virDomainCapsCPUModels structure is
stored in capabilities XML and parsed back, which will result in NULL
virDomainCapsCPUModelsPtr rather than a structure containing nothing.
Let's just keep virDomainCapsCPUModelsPtr NULL if the QMP command is not
implemented and change the return value to int so that callers can
easily check for failure or success.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Some callers of virQEMUCapsGetCPUDefinitions will need to filter the
returned list of CPU models. Let's add the filtering parameters directly
to virQEMUCapsGetCPUDefinitions to avoid copying the CPU models list
twice.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rather than returning a direct pointer the list stored in qemuCaps the
function now creates a new copy of the CPU models list.
The main purpose of this seemingly useless change is to update callers
to free the result returned by virQEMUCapsGetCPUDefinitions because the
internals of this function will change significantly in the following
patches.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The @def variable holds pointer to the domain defintion, but is
set only somewhere in the middle of the function. This is
suboptimal.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Commit <f136b83139c63f20de0df3285d9e82df2fb97bfc> reworked process
affinity setting but did not take cgroups into account which introduced
an issue when starting VM with custom cpuset.cpus for the whole machine
group.
If the machine group is limited to some pCPUs libvirt should not try to
set a VM to run on all pCPUs as it will result in permission denied when
writing to cpuset.cpus.
To fix this the affinity has to be set separately from cgroups cpuset.
Resolves: <https://bugzilla.redhat.com/show_bug.cgi?id=1746517>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
As suggested by Cole, this patch uses the domain capabilities to
validate the supported video model types. This allows us to remove the
model type validation from qemu_process.c and qemu_domain.c and
consolidates it all in a single place that will automatically adjust
when new domain capabilities are added.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Continue consolidation of video device validation started in previous
patch.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
The goal is to move all of the video device validation to a single place
and use domain caps to validate the supported video device models. Since
qemuDomainDeviceDefValidateVideo() is called from
qemuProcessStartValidate(), these changes should not change anny
behavior.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
When a CPU definition wants to explicitly disable some features that are
unknown to QEMU, we can safely drop them from the definition before
starting QEMU. Naturally QEMU won't enable such features implicitly.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>