As discussed before, this simple script should help with debugging
deadlocks, although there are still some caveats. RWLocks are not
handled by this and if your deadlock if very racy, it may not lock
up when running with this script due to the slowdown.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When one domain is being undefined and at the same time started, for
example, there is a possibility of a rare problem occuring.
- Thread 1 does virDomainUndefine(), has the lock, checks that the
domain is active and because it's not, calls
virDomainObjListRemove().
- Thread 2 does virDomainCreate() and tries to lock the domain.
- Thread 1 needs to lock domain list in order to remove the domain from
it, but must unlock domain first (proper order is to lock domain list
first and the domain itself second).
- Thread 2 grabs the lock, starts the domain and releases the lock.
- Thread 1 grabs the lock and removes the domain from list.
With this patch:
- qemuDomainRemoveInactive() creates a QEMU_JOB_MODIFY if that's
possible, but since it must remove the domain from list either way,
it continues even when starting the job failed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1150505
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When the list of domains is fetched and being printed, but in the
meantime one domain was undefined before its status was fetched, the
output then includes domain with "no state". With this patch, such
domain is skipped over as consecutive 'virsh list --all' (or the same
one ran a second later) wouldn't list it anyway.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When daemon is killed right in the middle of probing a qemu binary for
its capabilities, the qemu process is left running. Next time the
daemon is starting, it cannot start the probing qemu process because the
one that's already running does have the pidfile flock()'d.
Reported-by: Wang Yufei <james.wangyufei@huawei.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This function is used to cleanup a pidfile doing whatever it takes, even
killing the owning process.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
We were missing check for the fact that the storage driver was found and
in case there is no vbox storage driver available, daemon raised the
following error each start:
error : virRegisterStorageDriver:592 : driver in
virRegisterStorageDriver must not be NULL
Fixing this makes the condition unified with networkDriver registration
in vbox as well.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Particularly in qemuBuildNumaArgStr(), there was a need for the advice
due to memory backing, which needs to know the nodeset it will be pinned
to. With newer qemu this caused the following error when starting
domain:
error: internal error: Advice from numad is needed in case of
automatic numa placement
even when starting perfectly valid domain, e.g.:
...
<vcpu placement='auto'>4</vcpu>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
<cpu>
<numa>
<cell id='0' cpus='0' memory='524288'/>
<cell id='1' cpus='1' memory='524288'/>
</numa>
</cpu>
...
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1138545
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Hotplugging and hotunplugging char devices is only supported through
'-device' and the check for device capability should be independently.
Coverity also complains about 'tmpChr->info.alias' could be NULL and we
are dereferencing it but it somehow only in this case don't recognize
that the value is set by 'qemuAssignDeviceChrAlias' so it's clearly
false positive. Add sa_assert to make coverity happy.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Coverity is complaining about overwriting value in 'rc' variable
without using the old value because it somehow doesn't recognize that
the value is used by MACRO. The 'rc' variable is there only for checking
return code so it's save to remove it and make coverity happy.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Since commit 3f99d64 no new scsi_host pools can be defined
if one of the already defined scsi_host pools does not refer
to an accessible scsi_host adapter.
Relax the check by skipping over these inaccessible pools
when checking for duplicates.
If both source adapters are specified by a parent address,
just comparing the address is faster and catches even addresses
that do not refer to valid adapters.
This macro seems to be defined only on linux/unix and it fails during
mingw build. Its value is '16' (taken from net/if.h) so define it if
it's not defined.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Commit 6c9a8a4 (Oct 2014) exposed a long-standing issue on 32-bit
machines: code related to virDomainSetMemoryParameters has always
been documented as using a 64-bit limit, but it was implemented by
calling virDomainParseMemory which enforced an 'unsigned long'
limit. Since VIR_DOMAIN_MEMORY_PARAM_UNLIMITED capped to a
long is -1, but virDomainParseScaledValue no longer accepts
negative values, an attempt to use 2^53-1 as a hard memory limit
started failing the testsuite. However, the problem with capping
things artificially low has existed for much longer - ever since
commits 4888f0fb and 2e22f23 (Mar 2012) switched internal tracking
from 'unsigned long' to 'unsigned long long' (prior to that time,
the cap was a side-effect of the choice of types). We _have_ to
cap the balloon memory values, (no thanks to baked in 'unsigned long'
of API such as virDomainSetMaxMemory or virDomainGetInfo with no
counterpart API that guarantees 64-bit access to those numbers)
but memory parameters have never needed the artificial limit.
At any rate, the solution is to make the parser function gain a
parameter, and only do the reduced 32-bit cap for the values that
are constrained due to API.
* src/conf/domain_conf.h (_virDomainMemtune): Add comments.
* src/conf/domain_conf.c (virDomainParseMemory): Add parameter.
(virDomainDefParseXML): Adjust callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
commit 3e1e16aa8d (Use a port from the
migration range for NBD as well) changed ndb port allocation from
remotePorts to migrationPorts, but did not change the port releasing
process, which makes an error when migrating several times (above 64):
error: internal error: Unable to find an unused port in range
'migration' (49152-49215)
https://bugzilla.redhat.com/show_bug.cgi?id=1159245
Signed-off-by: Weiwei Li <nuonuoli@tencent.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
On error, libxlMakeDomBuildInfo() frees the caller-provided
libxl_domain_build_info struct embedded in libxl_domain_config,
causing a segfault
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9c13020700 (LWP 40988)]
(gdb) bt
0 0x00007f9c162f95b4 in free () from /lib64/libc.so.6
1 0x00007f9c0d0965ad in libxl_bitmap_dispose () from
/usr/lib64/libxenlight.so.4.4
2 0x00007f9c0d0a73bf in libxl_domain_build_info_dispose ()
from /usr/lib64/libxenlight.so.4.4
3 0x00007f9c0d0a7974 in libxl_domain_config_dispose () from
/usr/lib64/libxenlight.so.4.4
4 0x00007f9c0d2e00c5 in libxlDomainStart (driver=0x7f9c0400e4e0,
vm=0x7f9c0412b0d0, start_paused=false, restore_fd=-1) at
libxl/libxl_domain.c:1323
5 0x00007f9c0d2e1d4b in libxlDomainCreateXML (conn=0x7f9c000009a0,...)
at libxl/libxl_driver.c:660
Remove the call to libxl_domain_build_info_dispose() from
libxlMakeDomBuildInfo(). On error, callers will dispose the
libxl_domain_config object, which in turn disposes the build info.
With the introduction of the libxlDomainGetEmulatorType function,
it is trivial to support a user-specfied <emulator> in the libxl
driver. This patch is based loosely on David Scott's old patch
to do the same
https://www.redhat.com/archives/libvir-list/2013-April/msg02119.html
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
It makes sense for none of the callers to have negative value as an
output and, fortunately, if anyone tried defining domain with negative
memory or any other value parsed by virDomainParseScaledValue(), the
resulting value was 0. That means we can error out during parsing as
it won't break anything.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1155843
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
After cidr_format is allocated by virAsprintf and used by vshPrintExtra
it needs to be freed.
Fix the following memory leak from valgrind:
18 bytes in 1 blocks are definitely lost in loss record 41 of 192
at 0x4C29BBD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x85CE36F: __vasprintf_chk (vasprintf_chk.c:80)
by 0x4EE52D5: UnknownInlinedFun (stdio2.h:210)
by 0x4EE52D5: virVasprintfInternal (virstring.c:459)
by 0x4EE53CA: virAsprintfInternal (virstring.c:480)
by 0x14FE96: cmdNetworkDHCPLeases (virsh-network.c:1378)
by 0x13006B: vshCommandRun (virsh.c:1915)
by 0x12A9E1: main (virsh.c:3699)
Signed-off-by: Luyao Huang <lhuang@redhat.com>
The virGetSCSIHostNumber function return type is int, however
its stubbed version returns NULL. That results in a build fail
on systems that use the stubbed version. Fix by using a proper
return type.
Currently, build fails on FreeBSD because its struct ifreq does not
have ifr_hwaddr member. In order to fix that, check if this member
is present, otherwise fall back to the stub version of the
virNetDev{Add,Del}Multi functions.
The complaint is that if cleanup is called when virFileReadAll fails,
then mcast->entries is NULL and could be dereferenced in the clear
function. After following the code some - I saw that the caller to
the function (virNetDevGetMulticastTable) will also call
virNetDevMcastListClear if this function returns -1, so this
isn't necessary, so I removed the call.
Coverity complains that because the for loop is from 0 to 5 (max tokens)
and the impending switch/case statements used each of the #define values
that the 'default' wouldn't reachable. This patch will convert the #define's
into enum's and add the obligatory dead_error_begin marker for these type
situations.
Signed-off-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1140981 reports that
the qemu-kvm shipped as part of RHEL 7.0 intentionally[1] cripples
block jobs by removing the 'block-stream' QMP command, while still
leaving 'block-job-cancel' as an unusable no-op. Meanwhile, we
already had existing code that checked whether block jobs were
completely missing (such as qemu 0.15), old style (cancel is
synchronous, and all commands spelled with '_'), or new style
(cancel is asynchronous, and all commands spelled with '-'), and
used that three-way probe to give decent error messages. At the
time that code was added, all existing qemu versions fell in one
of three buckets, and the code was using the presence of
'block-job-cancel' as the witness of which of the three buckets.
But now that RHEL qemu has shipped with intentionally crippled
'block-stream', we have a fourth bucket, which results in ugly
error messages when trying 'virsh blockpull':
error: Requested operation is not valid: Command 'block-stream' is not found
In reality, the fourth bucket should be treated the same as the
first bucket (no block job support); we can do that by realizing
that no existing build of qemu has working block-stream while
lacking block-job-cancel, so it is easiest to change our witness
to the command that starts a job rather than ends one. We still
act correctly regarding command spelling and whether cancel is
asynchronous. And on crippled RHEL builds, we now get the desired:
error: unsupported configuration: block jobs not supported with this qemu binary
[1] The intentional cripple is limited to qemu-kvm of RHEL; when using
qemu-kvm-rhev of RHEV, block job functionality is supported. Don't ask
me to explain the "why" behind it all - I'm just dealing with fallout
from someone else's decision.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCKJOB_SYNC): Tweak comment.
* src/qemu/qemu_capabilities.c (virQEMUCapsCommands): Look for stream
rather than cancel when determining the flavor of block jobs supported.
Signed-off-by: Eric Blake <eblake@redhat.com>
The code that parses the schema from the URI touches the "hosts[0]"
member of the storage file source structure in case the URI contains a
schema. The hosts array was not yet allocated at the point in the code
where the transport protocol was parsed and set. This lead to a crash of
libvirtd.
Fix the code by allocating the "hosts" array upfront and add a test case
to verify this scenario. (Unfortunately this requires shuffling the test
case numbers too).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1156288
Now that all offenders have been cleaned, turn on a syntax-check
rule to prevent future offenders.
* cfg.mk (sc_prohibit_static_zero_init): New rule.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Avoid false
positive.
Signed-off-by: Eric Blake <eblake@redhat.com>
C guarantees that static variables are zero-initialized. Some older
compilers (and also gcc -fno-zero-initialized-in-bss) create larger
binaries if you explicitly zero-initialize a static variable.
* tests/eventtest.c: Fix initialization.
* tests/testutils.c: Likewise.
* tests/virhostdevtest.c: Likewise.
* tests/virportallocatortest.c: Likewise.
* tests/virscsitest.c: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
C guarantees that static variables are zero-initialized. Some older
compilers (and also gcc -fno-zero-initialized-in-bss) create larger
binaries if you explicitly zero-initialize a static variable.
* tools/virsh-console.c (got_signal): Drop unused variable.
* tools/virsh-domain.c: Fix initialization.
* tools/virsh.c: Likewise.
* tools/virt-host-validate-common.c (virHostMsgWantEscape):
Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
We weren't ever using the value for anything other than being non-zero.
* src/util/viraudit.h (virAuditLog): Change signature.
* src/util/viraudit.c (virAuditLog): Update user.
* daemon/libvirtd.c (main): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
As I've pushed 5892944f I haven't noticed one small nitpick.
There was this backslash missing on the line 1231 in the
enumeration of libraries to be added to vbox storage driver. This
resulted in nondeterministic build which sometimes succeeded and
sometimes failed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1146837
Resolve a crash in libvirtd resulting from commit id 'a4bd62ad' (1.0.6)
which added parentaddr and unique_id to allow unique identification of
a scsi_host, but assumed that all the pool entries and the incoming
definition would be similarly defined. If the existing pool uses the
'name' attribute and an incoming pool is using the parentaddr/unique_id,
then the code will attempt to compare the existing name string against
the incoming name string which doesn't exist (is NULL) and results in
a core (STREQ).
Conversely, if the existing pool used the parentaddr/unique_id and the
to be defined pool used the name, then the comparison would be against
the parentaddr, but since the incoming pool doesn't have one - that would
leave the comparison against a parentaddr of all 0's and a unique_id of 0,
which will always comparison to fail. This means someone could define the
same source adapter for two pools
In order to resolve this, adjust the code to get the 'host#' to be used
by the storage scsi backend in order to check/start the pool and make sure
the incoming definition doesn't match any of the existing pool defs.
https://bugzilla.redhat.com/show_bug.cgi?id=1141621
As part of attach processing, assign the device aliases by calling
qemuAssignDeviceAliases during qemuDomainQemuAttach once all the devices
are found after the qemuParseCommandLinePid processing.
This will alleviate a symptom that caused a libvirtd crash during an
attempted device detach.