The code that validates whether an internal snapshot is possible would
reject an empty but not-readonly drive. Since floppies can have this
property, add a check for emptiness.
==20406== 8 bytes in 1 blocks are definitely lost in loss record 24 of 1,059
==20406== at 0x4C2CF55: calloc (vg_replace_malloc.c:711)
==20406== by 0x54BF530: virAllocN (viralloc.c:191)
==20406== by 0x54D37C4: virConfGetValueStringList (virconf.c:1001)
==20406== by 0x144E4E8E: virQEMUDriverConfigLoadFile (qemu_conf.c:835)
==20406== by 0x1452A744: qemuStateInitialize (qemu_driver.c:664)
==20406== by 0x55DB585: virStateInitialize (libvirt.c:770)
==20406== by 0x124570: daemonRunStateInit (libvirtd.c:881)
==20406== by 0x5532990: virThreadHelper (virthread.c:206)
==20406== by 0x8C82493: start_thread (in /lib64/libpthread-2.24.so)
==20406== by 0x8F7FA1E: clone (in /lib64/libc-2.24.so)
==20406== 4 bytes in 1 blocks are definitely lost in loss record 6 of 1,059
==20406== at 0x4C2AF3F: malloc (vg_replace_malloc.c:299)
==20406== by 0x8F17D39: strdup (in /lib64/libc-2.24.so)
==20406== by 0x552C0E0: virStrdup (virstring.c:784)
==20406== by 0x54D3622: virConfGetValueString (virconf.c:945)
==20406== by 0x144E4692: virQEMUDriverConfigLoadFile (qemu_conf.c:687)
==20406== by 0x1452A744: qemuStateInitialize (qemu_driver.c:664)
==20406== by 0x55DB585: virStateInitialize (libvirt.c:770)
==20406== by 0x124570: daemonRunStateInit (libvirtd.c:881)
==20406== by 0x5532990: virThreadHelper (virthread.c:206)
==20406== by 0x8C82493: start_thread (in /lib64/libpthread-2.24.so)
==20406== by 0x8F7FA1E: clone (in /lib64/libc-2.24.so)
Commit a4a39d90 added a check that checks for VFIO support with mediated
devices. The problem is that the hostdev preparing functions behave like
a fallthrough if device of that specific type doesn't exist. However,
the check for VFIO support was independent of the existence of a mdev
device which caused the guest to fail to start with any device to be
directly assigned if VFIO was disabled/unavailable in the kernel.
The proposed change first ensures that it makes sense to check for VFIO
support in the first place, and only then performs the VFIO support check
itself.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1441291
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This removes the hacky extern global variable and modifies the
test code to properly create QEMU capabilities cache for QEMU
binaries used in our tests.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This attribute is not needed here, since @mon is in use.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Implement qemuMonitorRegister() as there is already a
qemuMonitorUnregister() function. This way it may be easier to
understand the code paths.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
This way qemuDomainLogContextRef() and qemuDomainLogContextFree() is
no longer needed. The naming qemuDomainLogContextFree() was also
somewhat misleading. Additionally, it's easier to turn
qemuDomainLogContext in a self-locking object.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
There were multiple race conditions that could lead to segmentation
faults. The first precondition for this is qemuProcessLaunch must fail
sometime shortly after starting the new QEMU process. The second
precondition for the segmentation faults is that the new QEMU process
dies - or to be more precise the QEMU monitor has to be closed
irregularly. If both happens during qemuProcessStart (starting a
domain) there are race windows between the thread with the event
loop (T1) and the thread that is starting the domain (T2).
First segmentation fault scenario:
If qemuProcessLaunch fails during qemuProcessStart the code branches
to the 'stop' path where 'qemuMonitorSetDomainLog(priv->mon, NULL,
NULL, NULL)' will set the log function of the monitor to NULL (done in
T2). In the meantime the event loop of T1 will wake up with an EOF
event for the QEMU monitor because the QEMU process has died. The
crash occurs if T1 has checked 'mon->logFunc != NULL' in qemuMonitorIO
just before the logFunc was set to NULL by T2. If this situation
occurs T1 will try to call mon->logFunc which leads to the
segmentation fault.
Solution:
Require the monitor lock for setting the log function.
Backtrace:
0 0x0000000000000000 in ?? ()
1 0x000003ffe9e45316 in qemuMonitorIO (watch=<optimized out>,
fd=<optimized out>, events=<optimized out>, opaque=0x3ffe08aa860) at
../../src/qemu/qemu_monitor.c:727
2 0x000003fffda2e1a4 in virEventPollDispatchHandles (nfds=<optimized
out>, fds=0x2aa000fd980) at ../../src/util/vireventpoll.c:508
3 0x000003fffda2e398 in virEventPollRunOnce () at
../../src/util/vireventpoll.c:657
4 0x000003fffda2ca10 in virEventRunDefaultImpl () at
../../src/util/virevent.c:314
5 0x000003fffdba9366 in virNetDaemonRun (dmn=0x2aa000cc550) at
../../src/rpc/virnetdaemon.c:818
6 0x000002aa00024668 in main (argc=<optimized out>, argv=<optimized
out>) at ../../daemon/libvirtd.c:1541
Second segmentation fault scenario:
If qemuProcessLaunch fails it will unref the log context and with
invoking qemuMonitorSetDomainLog(priv->mon, NULL, NULL, NULL)
qemuDomainLogContextFree() will be invoked. qemuDomainLogContextFree()
invokes virNetClientClose() to close the client and cleans everything
up (including unref of _virLogManager.client) when virNetClientClose()
returns. When T1 is now trying to report 'qemu unexpectedly closed the
monitor' libvirtd will crash because the client has already been
freed.
Solution:
As the critical section in qemuMonitorIO is protected with the monitor
lock we can use the same solution as proposed for the first
segmentation fault.
Backtrace:
0 virClassIsDerivedFrom (klass=0x3100979797979797,
parent=0x2aa000d92f0) at ../../src/util/virobject.c:169
1 0x000003fffda659e6 in virObjectIsClass (anyobj=<optimized out>,
klass=<optimized out>) at ../../src/util/virobject.c:365
2 0x000003fffda65a24 in virObjectLock (anyobj=0x3ffe08c1db0) at
../../src/util/virobject.c:317
3 0x000003fffdba4688 in
virNetClientIOEventLoop (client=client@entry=0x3ffe08c1db0,
thiscall=thiscall@entry=0x2aa000fbfa0) at
../../src/rpc/virnetclient.c:1668
4 0x000003fffdba4b4c in
virNetClientIO (client=client@entry=0x3ffe08c1db0,
thiscall=0x2aa000fbfa0) at ../../src/rpc/virnetclient.c:1944
5 0x000003fffdba4d42 in
virNetClientSendInternal (client=client@entry=0x3ffe08c1db0,
msg=msg@entry=0x2aa000cc710, expectReply=expectReply@entry=true,
nonBlock=nonBlock@entry=false) at ../../src/rpc/virnetclient.c:2116
6 0x000003fffdba6268 in
virNetClientSendWithReply (client=0x3ffe08c1db0, msg=0x2aa000cc710) at
../../src/rpc/virnetclient.c:2144
7 0x000003fffdba6e8e in virNetClientProgramCall (prog=0x3ffe08c1120,
client=<optimized out>, serial=<optimized out>, proc=<optimized out>,
noutfds=<optimized out>, outfds=0x0, ninfds=0x0, infds=0x0,
args_filter=0x3fffdb64440
<xdr_virLogManagerProtocolDomainReadLogFileArgs>, args=0x3ffffffe010,
ret_filter=0x3fffdb644c0
<xdr_virLogManagerProtocolDomainReadLogFileRet>, ret=0x3ffffffe008) at
../../src/rpc/virnetclientprogram.c:329
8 0x000003fffdb64042 in
virLogManagerDomainReadLogFile (mgr=<optimized out>, path=<optimized
out>, inode=<optimized out>, offset=<optimized out>, maxlen=<optimized
out>, flags=0) at ../../src/logging/log_manager.c:272
9 0x000003ffe9e0315c in qemuDomainLogContextRead (ctxt=0x3ffe08c2980,
msg=0x3ffffffe1c0) at ../../src/qemu/qemu_domain.c:4422
10 0x000003ffe9e280a8 in qemuProcessReadLog (logCtxt=<optimized out>,
msg=msg@entry=0x3ffffffe288) at ../../src/qemu/qemu_process.c:1800
11 0x000003ffe9e28206 in qemuProcessReportLogError (logCtxt=<optimized
out>, msgprefix=0x3ffe9ec276a "qemu unexpectedly closed the monitor")
at ../../src/qemu/qemu_process.c:1836
12 0x000003ffe9e28306 in
qemuProcessMonitorReportLogError (mon=mon@entry=0x3ffe085cf10,
msg=<optimized out>, opaque=<optimized out>) at
../../src/qemu/qemu_process.c:1856
13 0x000003ffe9e452b6 in qemuMonitorIO (watch=<optimized out>,
fd=<optimized out>, events=<optimized out>, opaque=0x3ffe085cf10) at
../../src/qemu/qemu_monitor.c:726
14 0x000003fffda2e1a4 in virEventPollDispatchHandles (nfds=<optimized
out>, fds=0x2aa000fd980) at ../../src/util/vireventpoll.c:508
15 0x000003fffda2e398 in virEventPollRunOnce () at
../../src/util/vireventpoll.c:657
16 0x000003fffda2ca10 in virEventRunDefaultImpl () at
../../src/util/virevent.c:314
17 0x000003fffdba9366 in virNetDaemonRun (dmn=0x2aa000cc550) at
../../src/rpc/virnetdaemon.c:818
18 0x000002aa00024668 in main (argc=<optimized out>, argv=<optimized
out>) at ../../daemon/libvirtd.c:1541
Other code parts where the same problem was possible to occur are
fixed as well (qemuMigrationFinish, qemuProcessStart, and
qemuDomainSaveImageStartVM).
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
So far only QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY was reset, but only in
a single code path leaving post-copy enabled in quite a few cases.
https://bugzilla.redhat.com/show_bug.cgi?id=1425003
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
It's only called from qemuMigrationReset now so it doesn't need to be
exported and {tls,sec}Alias are always NULL.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This new API is supposed to reset all migration parameters to make sure
future migrations won't accidentally use them. This patch makes the
first step and moves qemuMigrationResetTLS call inside
qemuMigrationReset.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Migration parameters are either reset by the main migration code path or
from qemuProcessRecoverMigration* in case libvirtd is restarted during
migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Finished qemuMigrationRun does not mean the migration itself finished
(it might have just switched to post-copy mode). While resetting TLS
parameters is probably OK at this point even if migration is still
running, we want to consolidate the code which resets various migration
parameters. Thus qemuMigrationResetTLS will be called from the Confirm
phase (or at the end of the Perform phase in case of v2 protocol), when
migration is either canceled or finished.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
qemuProcessRecoverMigrationOut doesn't explicitly call
qemuMigrationResetTLS relying on two things:
- qemuMigrationCancel resets TLS parameters
- our migration code resets TLS before entering
QEMU_MIGRATION_PHASE_PERFORM3_DONE phase
But this is not obvious and the assumptions will be broken soon. Let's
explicitly reset TLS parameters on all paths which do not kill the
domain.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
There is no async job running when a freshly started libvirtd is trying
to recover from an interrupted incoming migration. While at it, let's
call qemuMigrationResetTLS every time we don't kill the domain. This is
not strictly necessary since TLS is not supported when v2 migration
protocol is used, but doing so makes more sense.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We will need to store two more host CPU models and nested structs look
better than separate items with long complicated names.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
qemuProcessVerifyHypervFeatures is supposed to check whether all
requested hyperv features were actually honored by QEMU/KVM. This is
done by checking the corresponding CPUID bits reported by the virtual
CPU. In other words, it doesn't work for string properties, such as
VIR_DOMAIN_HYPERV_VENDOR_ID (there is no CPUID bit we could check). We
could theoretically check all 96 bits corresponding to the vendor
string, but luckily we don't have to check the feature at all. If QEMU
is too old to support hyperv features, the domain won't even start.
Otherwise, it is always supported.
Without this patch, libvirt refuses to start a domain which contains
<features>
<hyperv>
<vendor_id state='on' value='...'/>
</hyperv>
</features>
reporting internal error: "unknown CPU feature __kvm_hv_vendor_id.
This regression was introduced by commit v3.1.0-186-ge9dbe7011, which
(by fixing the virCPUDataCheckFeature condition in
qemuProcessVerifyHypervFeatures) revealed an old bug in the feature
verification code. It's been there ever since the verification was
implemented by commit v1.3.3-rc1-5-g95bbe4bf5, which effectively did not
check VIR_DOMAIN_HYPERV_VENDOR_ID at all.
https://bugzilla.redhat.com/show_bug.cgi?id=1439424
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
This header file has been created so that we can expose
internal functions to the test suite without making them
public: those in qemu_capabilities.h bearing the comment
/* Only for use by test suite */
are obvious candidates for being moved over.
Buggy condition meant that vcpu0 would not be iterated in the checks.
Since it's not hotpluggable anyways we would not be able to break the
configuration of a live VM.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1437013
Like all devices, add the 'id' option for mdevs as well. Patch also
adjusts the test accordingly.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1438431
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Depending on the architecture, requirements for ACPI and UEFI can
be different; more specifically, while on x86 UEFI requires ACPI,
on aarch64 it's the other way around.
Enforce these requirements when validating the domain, and make
the error message more accurate by mentioning that they're not
necessarily applicable to all architectures.
Several aarch64 test cases had to be tweaked because they would
have failed the validation step otherwise.
The capabilities used in test cases should match those used
during normal operation for the tests to make any sense.
This results in the generated command line for a few test
cases (most notably non-x86 test cases that were wrongly
assuming they could use -no-acpi) changing.
Instead of having a single function that probes the
architecture from the monitor and then sets a bunch of
basic capabilities based on it, have a separate function
for each part: virQEMUCapsInitQMPArch() only sets the
architecture, and virQEMUCapsInitQMPBasicArch() only sets
the capabilities.
This split will be useful later on, when we will want to
set basic capabilities from the test suite without having
to go through the pain of mocking the monitor.
Currently, if we want to zero out disk source (e,g, due to
startupPolicy when starting up a domain) we use
virDomainDiskSetSource(disk, NULL). This works well for file
based storage (storage type file, dir, or block). But it doesn't
work at all for other types like volume and network.
So imagine that you have a domain that has a CDROM configured
which source is a volume from an inactive pool. Because it is
startupPolicy='optional', the CDROM is empty when the domain
starts. However, the source element is not cleared out in the
status XML and thus when the daemon restarts and tries to
reconnect to the domain it refreshes the disks (which fails - the
storage pool is still not running) and thus the domain is killed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far our code is full of the following pattern:
dom = virGetDomain(conn, name, uuid)
if (dom)
dom->id = 42;
There is no reasong why it couldn't be just:
dom = virGetDomain(conn, name, uuid, id);
After all, client domain representation consists of tuple (name,
uuid, id).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In 9e2465834 a check that denies internal snapshots when pflash
based loader is configured for the domain. However, if there's
none and an user tries to do an internal snapshot they will
witness daemon crash as in that case vm->def->os.loader is NULL
and we dereference it unconditionally.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>