The new safe console handling introduced a possibility to deadlock the
qemu driver when a new console connection forcibly disconnects a
previous console stream that belongs to an already closed connection.
The virStreamFree function calls subsequently a the virReleaseConnect
function that tries to lock the driver while discarding the connection,
but the driver was already locked in qemuDomainOpenConsole.
Backtrace of the deadlocked thread:
0 0x00007f66e5aa7f14 in __lll_lock_wait () from /lib64/libpthread.so.0
1 0x00007f66e5aa3411 in _L_lock_500 () from /lib64/libpthread.so.0
2 0x00007f66e5aa322a in pthread_mutex_lock () from/lib64/libpthread.so.0
3 0x0000000000462bbd in qemudClose ()
4 0x00007f66e6e178eb in virReleaseConnect () from/usr/lib64/libvirt.so.0
5 0x00007f66e6e19c8c in virUnrefStream () from /usr/lib64/libvirt.so.0
6 0x00007f66e6e3d1de in virStreamFree () from /usr/lib64/libvirt.so.0
7 0x00007f66e6e09a5d in virConsoleHashEntryFree () from/usr/lib64/libvirt.so.0
8 0x00007f66e6db7282 in virHashRemoveEntry () from/usr/lib64/libvirt.so.0
9 0x00007f66e6e09c4e in virConsoleOpen () from /usr/lib64/libvirt.so.0
10 0x00000000004526e9 in qemuDomainOpenConsole ()
11 0x00007f66e6e421f1 in virDomainOpenConsole () from/usr/lib64/libvirt.so.0
12 0x00000000004361e4 in remoteDispatchDomainOpenConsoleHelper ()
13 0x00007f66e6e80375 in virNetServerProgramDispatch () from/usr/lib64/libvirt.so.0
14 0x00007f66e6e7ae11 in virNetServerHandleJob () from/usr/lib64/libvirt.so.0
15 0x00007f66e6da897d in virThreadPoolWorker () from/usr/lib64/libvirt.so.0
16 0x00007f66e6da7ff6 in virThreadHelper () from/usr/lib64/libvirt.so.0
17 0x00007f66e5aa0c5c in start_thread () from /lib64/libpthread.so.0
18 0x00007f66e57e7fcd in clone () from /lib64/libc.so.6
* src/qemu/qemu_driver.c: qemuDomainOpenConsole()
-- unlock the qemu driver right after acquiring the domain
object
In case an API fails with "cannot acquire state change lock", searching
for the API that possibly forgot to end its job is not always easy.
Let's keep track of the job owner and print it out for easier
identification.
As reported by Daniel Berrangé, we have a huge performance regression
for virDomainGetInfo() due to the change which makes virDomainEndJob()
save the XML status file every time it is called. Previous to that
change, 2000 calls to virDomainGetInfo() took ~2.5 seconds. After that
change, 2000 calls to virDomainGetInfo() take 2 *minutes* 45 secs.
We made the change to be able to recover from libvirtd restart in the
middle of a job. However, only destroy and async jobs are taken care of.
Thus it makes more sense to only save domain state XML when these jobs
are started/stopped.
I noticed these compiler warnings when building for the s390 architecture.
* src/node_device/node_device_udev.c (udevDeviceMonitorStartup):
Mark unused variable.
* src/nodeinfo.c (linuxNodeInfoCPUPopulate): Avoid unused variable.
* src/qemu/qemu_command.c: Wire up -bios with <loader>
* tests/qemuxml2argvdata/qemuxml2argv-bios.args,
tests/qemuxml2argvdata/qemuxml2argv-bios.xml: Expand
existing BIOS test case to cover <loader>
This patch cleans up variables used to store boolean command flags that
are inquired by vshCommandOptBool to use the bool data type instead of
an integer.
Additionally this patch cleans up flag variables that are inferred from
existing flags.
The documentation for the flag doesn't clearly state that the flag only
enhances the output and the user needs to specify other flags to list
inactive domains, that are enhanced by this flag.
Below code failed to compile on a 32 bit machine with error
typewrappers.c: In function 'libvirt_intUnwrap':
typewrappers.c:135:5: error: logical 'and' of mutually exclusive tests is always false [-Werror=logical-op]
cc1: all warnings being treated as errors
The patch fixes this error.
The daemon-conf test script continues to be very fragile to
changes in libvirt. It currently fails 1 time in 3/4 due
to race conditions in startup/shutdown of the test script.
Replace it with a proper test case tailored to the code
being tested
* tests/Makefile.am: Remove daemon-conf, add libvirtdconftest
* tests/daemon-conf: Delete obsolete test
* tests/libvirtdconftest.c: Test config file handling
Rename existing daemonConfigLoad API to daemonConfigLoadFile and
add an alternative daemonConfigLoadData
* daemon/libvirtd-config.c, daemon/libvirtd-config.h: Add
daemonConfigLoadData and rename daemonConfigLoad to
daemonConfigLoadFile
* daemon/libvirtd.c: Update for renamed API
To enable creation of unit tests, split the libvirtd config file
loading code out into separate files.
* daemon/libvirtd.c: Delete config loading code / structs
* daemon/libvirtd-config.c, daemon/libvirtd-config.h: Config
file loading APIs
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Leak introduced in commit 0436d32. If we allocate an actions array,
but fail early enough to never consume it with the qemu monitor
transaction call, we leaked memory.
But our semantics of making the transaction command free the caller's
memory is awkward; avoiding the memory leak requires making every
intermediate function in the call chain check for error. It is much
easier to fix things so that the function that allocates also frees,
while the call chain leaves the caller's data intact. To do that,
I had to hack our JSON data structure to make it easy to protect a
portion of an arbitrary JSON tree from being freed.
* src/util/json.h (virJSONType): Name the enum.
(_virJSONValue): New field.
* src/util/json.c (virJSONValueFree): Use it to protect a portion
of an array.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONTransaction): Avoid
freeing caller's data.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive):
Free actions array on failure.
We can tell qemuDomainSnapshotFSThaw if we want it to report errors or
not. However, if we don't want to and an error has been already set by
previous qemuReportError() we must keep copy of that error not just a
pointer to it. Otherwise, it get overwritten if FSThaw reports an error.
When building on Fedora 17 (which uses gcc 4.7.0) with -O0 in CFLAGS,
three of the tests failed to compile.
cputest.c and qemuxml2argvtest.c had non-static structs defined
inside the macro that was being repeatedly invoked. Due to some so-far
unidentified change in gcc, the stack space used by variables defined
inside { } is not recovered/re-used when the block ends, so all these
structs have become additive (this is the same problem worked around
in commit cf57d345b). Fortunately, these two files could be fixed with
a single line addition of "static" to the struct definition in the
macro.
virnettlscontexttest.c was a bit different, though. The problem structs
in the do/while loop of macros had non-constant initializers, so it
took a bit more work and piecemeal initialization instead of member
initialization to get things to be happy.
In an ideal world, none of these changes should be necessary, but not
knowing how long it will be until the gcc regressions are fixed, and
since the code is just as correct after this patch as before, it makes
sense to fix libvirt's build for -O0 while also reporting the gcc
problem.
This bug resolves https://bugzilla.redhat.com/show_bug.cgi?id=810100
rpm builds for i686 were failing with a segfault in
networkxml2argvtest. Running under valgrind showed that a region of
memory was being referenced after it had been freed (as the result of
realloc - see the valgrind report in the BZ).
The problem (in replaceTokens() - added in commit 22ec60, meaning this
bug was in 0.9.10 and 0.9.11) was that the pointers token_start and
token_end were being computed based on the value of *buf, then *buf
was being realloc'ed (potentially moving it), then token_start and
token_end were used without recomputing them to account for movement
of *buf.
The solution is to change the code so that token_start and token_end
are offsets into *buf rather than pointers. This way there is only a
single pointer to the buffer, and nothing needs readjusting after a
realloc. (You may note that some uses of token_start/token_end didn't
need to be changed to add in "*buf +" - that's because there ended up
being a +*buf and -*buf which canceled each other out).
DV gets the credit for finding this bug and pointing out the valgrind
report.
Detected by valgrind. Leaks are introduced in commit b22eaa7.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): fix memory leaks.
How to reproduce?
% make && make -C tests check TESTS=qemuxml2argvtest
% cd tests && valgrind -v --leak-check=full ./qemuxml2argvtest
actual result:
==2143== 12 bytes in 2 blocks are definitely lost in loss record 74 of 179
==2143== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==2143== by 0x39D90A67DD: xmlStrndup (xmlstring.c:45)
==2143== by 0x4F5EC0: virDomainDiskDefParseXML (domain_conf.c:3438)
==2143== by 0x502F00: virDomainDefParseXML (domain_conf.c:8304)
==2143== by 0x505FE3: virDomainDefParseNode (domain_conf.c:9080)
==2143== by 0x5069AE: virDomainDefParse (domain_conf.c:9030)
==2143== by 0x41CBF4: testCompareXMLToArgvHelper (qemuxml2argvtest.c:105)
==2143== by 0x41E5DD: virtTestRun (testutils.c:145)
==2143== by 0x416FA3: mymain (qemuxml2argvtest.c:399)
==2143== by 0x41DCB7: virtTestMain (testutils.c:700)
==2143== by 0x39CF01ECDC: (below main) (libc-start.c:226)
Signed-off-by: Alex Jia <ajia@redhat.com>
* configure.ac: Set WITH_SYSCTL only on Linux hosts
* daemon/Makefile.am: Conditionalize install-sysctl using WITH_SYSCTL
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Cc: Jason Helfman <jhelfman@e-e.com>
Every now & then, with parallel builds, we get a failure to
validate hvsupport.html.in. I eventually noticed that this
is because we get 2 instances of the generator running at
once.
We already list hvsupport.html.in in BUILT_SOURCES but this
was not working. It turns out the flaw is that we were
adding deps to the 'all:' target instead of the 'all-am:'
target. BUILT_SOURCES is a dep of 'all', so any custom
targets written in Makefile.am must use 'all-am:' so that
they don't get run until BUILT_SOURCES are completely
generated
* docs/Makefile.am: s/all/all-am/
Some of the test suites use fprintf with format specifiers
that are not supported on Win32 and are not fixed by gnulib.
The mingw32 compiler also has trouble detecting ssize_t
correctly, complaining that 'ssize_t' does not match
'signed size_t' (which it expects for %zd). Force the
cast to size_t to avoid this problem
* tests/testutils.c, tests/testutils.h: Fix printf
annotation on virTestResult. Use virVasprintf
instead of vfprintf
* tests/virhashtest.c: Use VIR_WARN instead of fprintf(stderr).
Cast to size_t to avoid mingw32 compiler bug
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the daemon is restarted it will lose list of active
USB devices assigned to active domains. Therefore we need
to rebuild this list on qemuProcessReconnect().
To prevent assigning one USB device to two domains,
we keep a list of assigned USB devices. On domain
startup - qemuProcessStart() - we insert devices
used by domain into the list but remove them only
on detach-device. Devices are, however, released
on qemuProcessStop() as well.
Introduce a set sub-RPMs, one per hypervisor, which can be used
as dependency targets by applications wishing to pull in the
full stack of packages required for a specific hypervisor. This
avoids the application needing to know what the hypervisor specific
package set is.
ie, applications should not need to know that using the libvirt
Xen hypervisor requires the 'xen' RPM - libvirt should take care
of that knowledge. All the application wants is 'libvirt-daemon-xen'
There are 5 sub-RPMs:
libvirt-daemon-qemu - non-native TCG based emulators
libvirt-daemon-kvm - native KVM hypervisor
libvirt-daemon-uml - User Mode linux
libvirt-daemon-xen - Xen, either via XenD or libxl
libvirt-daemon-lxc - Linux native containers
When driver modules get turned on, these sub-RPMs will also
gain dependencies on the appropriate driver module .so files
Take the libvirt RPM and split it into three pieces
- libvirt-daemon - libvirtd & other mandatory bits for its operation
- libvirt-daemon-config-network - the virbr0 config definition
- libvirt-daemon-config-nwfilter - the firewall config rules
For backwards compatibility with existing installs / application RPM
deps, the 'libvirt' RPM is retained, but will have a dependency on
the 3 new RPMs.
Currently documentation is split between the libvirt RPM and the
libvirt-devel RPM. In the client-only build there is no libvirt
RPM, so the docs need to live elsewhere. The obvious answer is a
dedicated libvirt-docs RPM. For back-compatibility make the
libvirt-devel RPM require the libvirt-docs RPM
* libvirt.spec.in: Create separate libvirt-docs RPM
Currently, we put no strains on escape sequence possibly leaving users
with console that cannot be terminated. However, not all ASCII
characters can be used as escape sequence. Only those falling in
@ - _ can be; implement and document this constraint.
* configure.ac docs/news.html.in libvirt.spec.in: update for the release
* po/*.po*: updated a number of languages translation including new
indian languages and regenerated
Originally, qemuDomainCheckEjectableMedia was entering monitor with qemu
driver lock. Commit 2067e31bf9, which I
made to fix that, revealed another issue we had (but didn't notice it
since the driver was locked): we didn't set nested job when
qemuDomainCheckEjectableMedia is called during migration. Thus the
original fix I made was wrong.
XenD-3.1 introduced managed domains. HV-domains have rtc_timeoffset
(hgd24f37b31030 from 2007-04-03), which tracks the offset between the
hypervisors clock and the domains RTC, and is persisted by XenD.
In combination with localtime=1 this had a bug until XenD-3.4
(hg5d701be7c37b from 2009-04-01) (I'm not 100% sure how that bug
manifests, but at least for me in TZ=Europe/Berlin I see the previous
offset relative to utc being applied to localtime again, which manifests
in an extra hour being added)
XenD implements the following variants for clock/@offset:
- PV domains don't have a RTC → 'localtime' | 'utc'
- <3.1: no managed domains → 'localtime' | 'utc'
- ≥3.1: the offset is tracked for HV → 'variable'
due to the localtime=1 bug → 'localtime' | 'utc'
- ≥3.4: the offset is tracked for HV → 'variable'
Current libvirtd still thinks XenD only implements <clock offset='utc'/>
and <clock offset='localtime'/>, which is wrong, since the semantic of
'utc' and 'localtime' specifies, that the offset will be reset on
domain-restart, while with 'variable' the offset is kept. (keeping the
offset over "virsh edit" is important, since otherwise the clock might
jump, which confuses certain guest OSs)
xendConfigVersion was last incremented to 4 by the xen-folks for
xen-3.1.0. I know of no way to reliably detect the version of XenD
(user space tools), which may be different from the version of the
hypervisor (kernel) version! Because of this only the change from
'utc'/'localtime' to 'variable' in XenD-3.1 is handled, not the buggy
behaviour of XenD-3.1 until XenD-3.4.
For backward compatibility with previous versions of libvirt Xen-HV
still accepts 'utc' and 'localtime', but they are returned as 'variable'
on the next read-back from Xend to libvirt, since this is what XenD
implements: The RTC is NOT reset back to the specified time on next
restart, but the previous offset is kept.
This behaviour can be turned off by adding the additional attribute
adjustment='reset', in which case libvirt will report an error instead
of doing the conversion. The attribute can also be used as a shortcut to
offset='variable' with basis='...'.
With these changes, it is also necessary to adjust the xen tests:
"localtime = 0" is always inserted, because otherwise on updates the
value is not changed within XenD.
adjustment='reset' is inserted for all cases, since they're all <
XEND_CONFIG_VERSION_3_1_0, only 3.1 introduced persistent
rtc_timeoffset.
Some statements change their order because code was moved around.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Since Xen 3.1 the clock=variable semantic is supported. In addition to
qemu/kvm Xen also knows about a variant where the offset is relative to
'localtime' instead of 'utc'.
Extends the libvirt structure with a flag 'basis' to specify, if the
offset is relative to 'localtime' or 'utc'.
Extends the libvirt structure with a flag 'reset' to force the reset
behaviour of 'localtime' and 'utc'; this is needed for backward
compatibility with previous versions of libvirt, since they report
incorrect XML.
Adapt the only user 'qemu' to the new name.
Extend the RelaxNG schema accordingly.
Document the new 'basis' attribute in the HTML documentation.
Adapt test for the new attribute.
Signed-off-by: Philipp Hahn <hahn@univention.de>
https://bugzilla.redhat.com/show_bug.cgi?id=808979
The leak is really in virProcessInfoGetAffinity, as shown in the
valgrind output given in the above bug report - it calls CPU_ALLOC(),
but then fails to call CPU_FREE().
This leak has existed in every version of libvirt since 0.7.5.