If managed save fails at the right point in time, then the save
image can end up with 0 bytes in length (no valid header), and
our attempts in commit 55d88def to detect and skip invalid save
files missed this case.
* src/qemu/qemu_driver.c (qemuDomainSaveImageOpen): Also unlink
empty file as corrupt. Reported by Dennis Householder.
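A minimal sketch of the idea, not the actual qemuDomainSaveImageOpen code: treat a 0-byte regular file as corrupt and unlink it. The helper name unlink_if_empty and the message text are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not the libvirt implementation: returns true if
 * the file at `path` was an empty regular file and has been unlinked. */
static bool
unlink_if_empty(const char *path)
{
    struct stat sb;

    assert(path != NULL);

    if (stat(path, &sb) < 0)
        return false;               /* missing or unreadable: nothing to do */
    if (!S_ISREG(sb.st_mode) || sb.st_size != 0)
        return false;               /* not an empty regular file: keep it */

    /* A 0-byte image cannot contain a valid header, so discard it. */
    fprintf(stderr, "removing empty save image %s\n", path);
    return unlink(path) == 0;
}
```

A caller would run such a check before trying to parse the header, so that an empty image is skipped like any other invalid save file.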
Currently, on device detach, we parse the given XML, find the device
in the domain object, free it, and try to restore security labels.
However, in some cases (e.g. USB hostdev) the parsed XML contains
less information than the freed device. In the USB case, the missing
pieces are the bus and device IDs. These are needed when restoring
labels, as a symlink into /dev/bus is generated from them. Therefore,
don't drop the device configuration until security labels are restored.
Add an option to the virsh undefine command to remove associated storage
volumes while undefining a domain. This patch allows the user to remove
associated (libvirt-managed) storage volumes while undefining a domain.
The new option --storage for the undefine command takes a string
argument that consists of a comma-separated list of target or source
paths of volumes to be removed. Volumes are removed after the domain has
been successfully undefined.
If a volume is not part of a storage pool, the user is warned to remove
the volume in question manually.
Option --wipe-storage may be specified along with this; it ensures
the image is wiped before removal.
Option --remove-all-storage enables the user to remove all storage. The
name is deliberately long, as users should be aware of what they're
about to do.
Some gcc warnings about no % in a printf format string only
appear under --disable-nls. And configure.ac should automatically
be excluding modules on mingw without us having to be explicit.
Improving autobuild.sh to stress more combinations can only help.
* autobuild.sh: Add --disable-nls on first build. Update mingw
build to rely more on configure.ac detection.
In commit 6f84e110 I mistakenly set the default migration speed to
33554432 Mb! The unit of migMaxBandwidth is Mb, with conversion
handled in qemuMonitor{JSON,Text}SetMigrationSpeed().
Also, remove definition of QEMU_DOMAIN_FILE_MIG_BANDWIDTH_MAX since
it is no longer used after reverting commit ef1065cf.
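For illustration, 33554432 equals 32 * 1024 * 1024, which suggests a value already expressed in bytes was stored in a field whose unit is Mb. The sketch below assumes "Mb" means MiB and that the intended default was 32; the macro and helper names are hypothetical and are not the libvirt ones.

```c
#include <assert.h>
#include <limits.h>

/* Assumed default for illustration only: 32, in the unit of the field. */
#define DEFAULT_MIG_BANDWIDTH_MIB 32ULL

/* Hypothetical helper: convert a bandwidth in MiB/s to bytes/s.
 * The conversion is done once, at the point where the value is sent on. */
static unsigned long long
mib_to_bytes(unsigned long long mib)
{
    const unsigned long long factor = 1024ULL * 1024ULL;

    assert(mib <= ULLONG_MAX / factor);     /* guard against overflow */
    return mib * factor;
}
```

Storing the already-converted number in the Mb field would apply the factor twice, which is the kind of mistake the commit describes.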
If an async job running on a domain stops the domain at the end of the
job, a concurrently run query job can hang in the qemu monitor, and
nothing can be done with that domain from this point on. An attempt to
start such a domain results in a "Timed out during operation: cannot
acquire state change lock" error.
However, quite a few things have to happen at the right time... There
must be an async job running which stops a domain at the end. This race
was reported with dump --crash but other similar jobs, such as
(managed)save and migration, should be able to trigger this bug as well.
While this async job is processing its last monitor command, that is a
query-migrate to which qemu replies with status "completed", a new
libvirt API that results in a query job must arrive and stay waiting
until the query-migrate command finishes. Once query-migrate is done but
before the async job closes qemu monitor while stopping the domain, the
other thread needs to wake up and call qemuMonitorSend to send its
command to qemu. Before qemu gets a chance to respond to this command,
the async job needs to close the monitor. At this point, the query job
thread is waiting for a condition that no-one will ever signal so it
never finishes the job.
* src/qemu/qemu_hostdev.c (qemuDomainReAttachHostdevDevices):
pciDeviceListFree(pcidevs) in the end free()s the device even if
it's in use by another domain, which can cause a race.
How to reproduce:
<script>
virsh nodedev-dettach pci_0000_00_19_0
virsh start test
virsh attach-device test hostdev.xml
virsh start test2
for i in {1..5}; do
echo "[ -- ${i}th time --]"
virsh nodedev-reattach pci_0000_00_19_0
done
echo "clean up"
virsh destroy test
virsh nodedev-reattach pci_0000_00_19_0
</script>
Device pci_0000_00_19_0 dettached
Domain test started
Device attached successfully
error: Failed to start domain test2
error: Requested operation is not valid: PCI device 0000:00:19.0 is in use by domain test
[ -- 1th time --]
Device pci_0000_00_19_0 re-attached
[ -- 2th time --]
Device pci_0000_00_19_0 re-attached
[ -- 3th time --]
Device pci_0000_00_19_0 re-attached
[ -- 4th time --]
Device pci_0000_00_19_0 re-attached
[ -- 5th time --]
Device pci_0000_00_19_0 re-attached
clean up
Domain test destroyed
Device pci_0000_00_19_0 re-attached
The patch also fixes another problem: there will no longer be an error
like "qemuDomainReAttachHostdevDevices: Not reattaching active
device 0000:00:19.0" in the daemon log if some device is active,
as pciResetDevice and pciReattachDevice won't be called for
the device anymore. This is sensible, as we already reported an
error when preparing the device if it's active. Blindly trying
to pciResetDevice & pciReattachDevice on the device and getting
an error is just redundant.
This patch fixes two problems:
1) The device will be reattached to the host even if it's not
managed, as there is a "pciDeviceSetManaged".
2) The device won't be properly reattached to the host with its
original driver, as the code doesn't honor the device's original
properties, which are maintained by driver->activePciHostdevs.
Commit d336dbdb tried to refactor sanlock to avoid building it
on RHEL for architectures where it is not available, but used
the wrong conditional.
* libvirt.spec.in (with_sanlock): Use %ifarch, not %ifnarch.
I was wondering why 'virsh edit' didn't support the same
'--inactive' option as 'virsh dumpxml'; reading the source
code showed that --inactive was already implied, and that
the only way to alter a running guest rather than affecting
next boot is by hot-plugging individual devices, or by
something complex like saving the guest and modifying the
save image.
* tools/virsh.pod (define, edit): Mention behavior when guest is
already running.
Commit f2013c9dd1 added implementation of
virDomainSnapshotListChildrenNames override export, but registration of
the newly exported function was not added.
* python/libvirt-override.c: Register export of function.
The same chunk of code is repeated in several functions. Factor it into
a helper method virDomainLiveConfigHelperMethod to eliminate duplicated
code, based on Eric and Adam's suggestion. I have tested it for all the
relevant APIs changed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
If parsing of arguments failed, virsh silently exited with an
error state, without specifying the possible problem.
* tools/virsh: cmdNodesuspend: - error handling added
Detected by valgrind. Leak introduced in commit 82ff25e.
* tests/nodeinfotest.c: avoid memory leak on nodeinfo test case.
* how to reproduce?
% cd tests && valgrind -v --leak-check=full ./nodeinfotest
* actual valgrind result:
==22147== 65 bytes in 1 blocks are definitely lost in loss record 14 of 29
==22147== at 0x4A0610F: realloc (vg_replace_malloc.c:525)
==22147== by 0x330D6FED94: __vasprintf_chk (in /lib64/libc-2.12.so)
==22147== by 0x426697: virVasprintf (stdio2.h:199)
==22147== by 0x426757: virAsprintf (util.c:1695)
==22147== by 0x41585F: linuxTestNodeInfo (nodeinfotest.c:108)
==22147== by 0x416B21: virtTestRun (testutils.c:141)
==22147== by 0x4157EA: mymain (nodeinfotest.c:140)
==22147== by 0x416217: virtTestMain (testutils.c:696)
==22147== by 0x330D61ECDC: (below main) (in /lib64/libc-2.12.so)
==22147==
==22147== LEAK SUMMARY:
==22147== definitely lost: 65 bytes in 1 blocks
==22147== indirectly lost: 0 bytes in 0 blocks
==22147== possibly lost: 0 bytes in 0 blocks
==22147== still reachable: 126,126 bytes in 1,341 blocks
Signed-off-by: Alex Jia <ajia@redhat.com>
If the vol object is newly created, it increases the volumes count,
but the count is not decreased again during cleanup. This can
cause libvirtd to crash when one later tries to free the volume
objects, like:
for (i = 0; i < pool->volumes.count; i++)
virStorageVolDefFree(pool->volumes.objs[i]);
It's more reliable if we add the newly created vol object to the
pool only at the end.
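A minimal sketch of the "add at the end" ordering, using hypothetical types and names (struct pool, struct vol, build_vol) rather than the real storage driver structures. The count is only bumped once nothing else can fail, so a cleanup path never leaves the count covering a freed object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct vol { char *name; };
struct pool { struct vol **objs; size_t count; };

/* Hypothetical stand-in for backend work that may fail;
 * here it simply rejects empty names. */
static int
build_vol(const struct vol *v)
{
    return strlen(v->name) > 0 ? 0 : -1;
}

static int
pool_create_vol(struct pool *p, const char *name)
{
    struct vol *v = NULL;
    struct vol **tmp;
    size_t len;

    assert(p != NULL && name != NULL);

    if (!(v = calloc(1, sizeof(*v))))
        goto error;
    len = strlen(name);
    if (!(v->name = malloc(len + 1)))
        goto error;
    memcpy(v->name, name, len + 1);

    if (build_vol(v) < 0)
        goto error;

    /* Grow the array and publish the volume only after all the
     * fallible steps above have succeeded. */
    tmp = realloc(p->objs, (p->count + 1) * sizeof(*tmp));
    if (!tmp)
        goto error;
    p->objs = tmp;
    p->objs[p->count++] = v;
    return 0;

 error:
    if (v) {
        free(v->name);
        free(v);
    }
    return -1;
}
```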
Improve the documentation of what forms a valid <address> element,
since these elements appear in numerous devices.
* docs/formatdomain.html.in (elementsAddress): New section.
(elementsControllers, elementsUSB, elementsNICS, elementsInput)
(elementsHub, elementsCharChannel, elementsSound): Refer to it.
Commit 4d9e51f6 fixed a 'make uninstall' failure, but failed
to follow other conventions already present in src/Makefile.am.
In particular, we prefer MKDIR_P over mkdir -p, and should
have a matching rmdir during uninstall for every directory
created during install (the idea being that uninstall in a
DESTDIR should be clean, while installation in the final
system should not fail with non-empty directories left behind).
* tools/Makefile.am (install-sysconfig, install-initscript)
(install-systemd): Use MKDIR_P.
(uninstall-sysconfig, uninstall-initscript, uninstall-systemd):
Also remove directories.
* daemon/Makefile.am (install-data-local, install-data-polkit)
(install-logrotate, install-sysconfig, install-sysctl)
(install-init-redhat, install-init-upstart, install-init-systemd)
(install-data-sasl): Use MKDIR_P.
(uninstall-data-polkit, uninstall-sysconfig, uninstall-sysctl)
(uninstall-init-redhat, uninstall-init-upstart)
(uninstall-init-systemd): Also remove directory.
(uninstall-logrotate): New rule.
(uninstall-local): Add uninstall-logrotate.
When destroying a domain qemuDomainDestroy kills its qemu process and
starts a new job, which means it unlocks the domain object and locks it
again after some time. Although the object is usually unlocked for a
pretty short time, chances are another thread processing an EOF event on
qemu monitor is able to lock the object first and does all the cleanup
by itself. This leads to a wrong shutoff reason and lifecycle event
detail, and to the virDomainDestroy API incorrectly reporting failure
to destroy an inactive domain.
Reported by Charlie Smurthwaite.
The current "-ay | -an" has problems on pool starting/refreshing if
the volumes are clustered. Rommer posted a patch to the list 2
months ago.
https://www.redhat.com/archives/libvir-list/2011-October/msg01116.html
But IMO we shouldn't skip the inactive vols. So this is a squashed
patch by Rommer.
Signed-off-by: Rommer <rommer@active.by>
Network disks don't have paths to be resolved or files to be checked
for ownership. Commit ee3efc41e6 checked this for some image label
functions, but was partially reverted in a refactor. This finishes
adding the check to each security driver's set and restore label
methods for images.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
'make uninstall' currently fails with the following message:
rmdir /etc/sasl2/
rmdir: failed to remove `/etc/sasl2/': Directory not empty
That's fine (correct in fact) so force the command to return success
with || :
One of the xml tests in the test suite was created using a
now-deprecated qemu machine type ("fedora-13", which was only ever
valid for Fedora builds of qemu). Although strictly speaking it's not
necessary to replace it with an actual supported qemu machine type
(since the xml in question is never actually sent to qemu), this patch
changes it to the actually-supported "pc-0.13" just for general
tidiness. (Also, on some Fedora builds which contain a special patch
to rid the world of "fedora-13", having it mentioned in the test suite
will cause make check to fail.)
This patch addresses https://bugzilla.redhat.com/show_bug.cgi?id=760442
When a network has any forward type other than route, nat or none, the
network configuration should be done completely external to libvirt -
libvirt only uses these types to allow configuring guests in a manner
that isn't tied to a specific host (all the host-specific information,
in particular interface names, port profile data, and bandwidth
configuration is in the network definition, and the guest
configuration only references it).
Due to a bug in the bridge network driver, libvirt was adding iptables
rules for networks with forward type='bridge' etc. any time libvirtd
was restarted while one of these networks was active.
This patch eliminates that error by only "reloading" iptables rules if
forward type is route, nat, or none.
Currently qemuDomainAssignPCIAddresses() is called to assign addresses
to PCI devices.
We need to do something similar for devices with spapr-vio addresses.
So create one place where address assignment will be done, that is
qemuDomainAssignAddresses().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
For the PPC64 pseries machine type we need to add address information
for the spapr-vty device.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
In QEMU PPC64 we have a network device called "spapr-vlan". We can specify
this using the existing syntax for network devices, however libvirt
currently rejects "spapr-vlan" in virDomainNetDefParseXML() because of
the "-". Fix the code to accept "-".
* src/conf/domain_conf.c (virDomainNetDefParseXML): Allow '-' in
model name, and be more efficient.
* docs/schemas/domaincommon.rng: Limit valid model names to match code.
Based on a patch by Michael Ellerman.
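One way to validate a model name in a single pass is strspn() against a whitelist. This is a sketch under the assumption that letters, digits, '_' and '-' are the accepted characters; the macro and function names are hypothetical, and the exact set used by virDomainNetDefParseXML may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed set of characters allowed in a model name. */
#define MODEL_CHARS \
    "abcdefghijklmnopqrstuvwxyz" \
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
    "0123456789_-"

static bool
model_name_is_valid(const char *model)
{
    assert(model != NULL);

    /* strspn() returns the length of the leading run of allowed
     * characters; the name is valid if that run is non-empty and
     * reaches the terminating NUL. */
    return model[0] != '\0' &&
           model[strspn(model, MODEL_CHARS)] == '\0';
}
```

With this set, "spapr-vlan" is accepted while names containing spaces or punctuation such as ';' are rejected.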
Detected by valgrind. Leak introduced in commit 88a993b:
* tools/virsh.c: fix memory leak on cmdDomblklist.
* how to reproduce?
% valgrind -v --leak-check=full virsh domblklist <domain name>
* actual valgrind result:
==6573== 1,836 bytes in 1 blocks are definitely lost in loss record 110 of 124
==6573== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==6573== by 0x330D71497D: xdr_string (in /lib64/libc-2.12.so)
==6573== by 0x4D26CED: xdr_remote_nonnull_string (remote_protocol.c:30)
==6573== by 0x4D28138: xdr_remote_domain_get_xml_desc_ret (remote_protocol.c:1418)
==6573== by 0x4D3C0C2: virNetMessageDecodePayload (virnetmessage.c:382)
==6573== by 0x4D3279F: virNetClientProgramCall (virnetclientprogram.c:382)
==6573== by 0x4D0D50B: callWithFD (remote_driver.c:4339)
==6573== by 0x4D0D5AB: call (remote_driver.c:4360)
==6573== by 0x4D16EAF: remoteDomainGetXMLDesc (remote_client_bodies.h:861)
==6573== by 0x4CF9F4F: virDomainGetXMLDesc (libvirt.c:4098)
==6573== by 0x4154D9: cmdDomblklist (virsh.c:1722)
==6573== by 0x4149E2: vshCommandRun (virsh.c:16365)
==6573==
==6573== 46,009 (352 direct, 45,657 indirect) bytes in 1 blocks are definitely lost in loss record 123 of 124
==6573== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==6573== by 0x3318286DC6: xmlXPathNewContext (in /usr/lib64/libxml2.so.2.7.6)
==6573== by 0x4C79AE2: virXMLParseHelper (xml.c:779)
==6573== by 0x415512: cmdDomblklist (virsh.c:1726)
==6573== by 0x4149E2: vshCommandRun (virsh.c:16365)
==6573== by 0x427743: main (virsh.c:17867)
==6573==
==6573== LEAK SUMMARY:
==6573== definitely lost: 2,188 bytes in 2 blocks
==6573== indirectly lost: 45,657 bytes in 332 blocks
==6573== possibly lost: 0 bytes in 0 blocks
==6573== still reachable: 128,034 bytes in 1,364 blocks
==6573== suppressed: 0 bytes in 0 blocks
Signed-off-by: Alex Jia <ajia@redhat.com>
When parsing ppc64 models on an x86 host, an out-of-memory error
message is displayed because the code checks whether retcpus is NULL.
Fix this by removing that check, since we will realloc into this
variable.
Also, in the X86 model parser, display the OOM error at the location
where it happens.
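As background for why a NULL check before growing the array is unnecessary: in standard C, realloc() with a NULL pointer behaves like malloc(). The sketch below uses a hypothetical append_int helper, not the cpu driver code, and reports failure at the point where the allocation actually fails.

```c
#include <assert.h>
#include <stdlib.h>

/* Append `value` to a growable array. *arr may be NULL while *n is 0,
 * because realloc(NULL, size) is equivalent to malloc(size). */
static int
append_int(int **arr, size_t *n, int value)
{
    int *tmp;

    assert(arr != NULL && n != NULL);

    tmp = realloc(*arr, (*n + 1) * sizeof(**arr));
    if (!tmp)
        return -1;      /* the only real OOM condition is here */

    tmp[*n] = value;
    *arr = tmp;
    (*n)++;
    return 0;
}
```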
Fix memory leak:
==27534== 24 bytes in 1 blocks are definitely lost in loss record 207 of 530
==27534== at 0x4A05E46: malloc (vg_replace_malloc.c:195)
==27534== by 0x38EC26EC37: vasprintf (in /lib64/libc-2.13.so)
==27534== by 0x4E998E6: virVasprintf (util.c:1677)
==27534== by 0x4E999F1: virAsprintf (util.c:1695)
==27534== by 0x4F1EAAC: nodeGetInfo (nodeinfo.c:593)
==27534== by 0x47948F: qemuCapsInitCPU (qemu_capabilities.c:855)
==27534== by 0x4796B1: qemuCapsInit (qemu_capabilities.c:915)
==27534== by 0x456550: qemuCreateCapabilities (qemu_driver.c:245)
==27534== by 0x4578C4: qemudStartup (qemu_driver.c:580)
==27534== by 0x4F20886: virStateInitialize (libvirt.c:852)
==27534== by 0x420E55: daemonRunStateInit (libvirtd.c:1156)
==27534== by 0x4E94C56: virThreadHelper (threads-pthread.c:157)
Mark this leaked variable as const char * when it is passed into another
function.
Pool creates new workers dynamically. However, it is possible
for a pool to have no workers. If we want to free that pool,
we don't want to wait on quit condition as it will never be
signaled.
Add support for newly supported Intel cpu features. Newly supported
flags are: pclmuldq, dtes64, smx, fma, pdcm, movbe, xsave, osxsave and
avx. This adds support for Intel's Sandy Bridge platform.
Reported by Alex Jia <ajia@redhat.com>. Function cmdDomIfGetLink did not
set a success return value on the success path.
Signed-off-by: Alex Jia <ajia@redhat.com>
A preparatory patch for DHCP snooping, where we want to be able to
differentiate between VM interfaces using the tuple of
<VM UUID, Interface MAC address>. We assume that MAC addresses could
possibly be re-used between different networks (VLANs), and thus do not
want to rely on the MAC address alone to identify an interface.
At the current 'final destination' in virNWFilterInstantiate I am leaving
the vmuuid parameter as ATTRIBUTE_UNUSED until the DHCP snooping patches arrive.
(we may not post the DHCP snooping patches for 0.9.9, though)
Mostly this is a pretty trivial patch. On the lowest layers, in lxc_driver
and uml_conf, I am passing the virDomainDefPtr around until I am passing
only the VM's uuid into the NWFilter calls.
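The <VM UUID, Interface MAC address> tuple mentioned above can be pictured as a composite key. The struct and function below are a hypothetical illustration, not the nwfilter data structures; the buffer sizes assume a 16-byte raw UUID and a 6-byte MAC address.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define UUID_LEN 16     /* raw UUID bytes (assumed) */
#define MAC_LEN   6     /* Ethernet MAC address bytes (assumed) */

/* Hypothetical composite key identifying one interface of one VM. */
struct iface_key {
    unsigned char vmuuid[UUID_LEN];
    unsigned char mac[MAC_LEN];
};

/* Two interfaces match only if both the VM UUID and the MAC match,
 * so a MAC re-used by another VM does not collide. */
static bool
iface_key_equal(const struct iface_key *a, const struct iface_key *b)
{
    assert(a != NULL && b != NULL);

    return memcmp(a->vmuuid, b->vmuuid, UUID_LEN) == 0 &&
           memcmp(a->mac, b->mac, MAC_LEN) == 0;
}
```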
This patch cleans up return codes in the nwfilter subsystem.
Some functions in nwfilter_conf.c (validators and formatters) are
keeping their bool return for now and I am converting their return
code to true/false.
All other functions now have failure return codes of -1 and success
of 0.
[I searched for all occurrences of ' 1;' and checked all 'if ' and
adapted where needed. After that I did a grep for 'NWFilter' in the source
tree.]
Detected by valgrind. Leak introduced in commit dc675f3:
* tools/virsh.c: fix memory leak on cmdDomIfGetLink.
* how to reproduce?
% valgrind -v --leak-check=full virsh domif-getlink <domain name> 0
* actual valgrind result:
==13102== 18 bytes in 1 blocks are definitely lost in loss record 9 of 47
==13102== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==13102== by 0x322A6A67DD: xmlStrndup (in /usr/lib64/libxml2.so.2.7.6)
==13102== by 0x414892: cmdDomIfGetLink (virsh.c:1538)
==13102== by 0x4136A2: vshCommandRun (virsh.c:16363)
==13102== by 0x4253FB: main (virsh.c:17865)
==13102==
==13102== LEAK SUMMARY:
==13102== definitely lost: 18 bytes in 1 blocks
==13102== indirectly lost: 0 bytes in 0 blocks
==13102== possibly lost: 0 bytes in 0 blocks
==13102== still reachable: 127,888 bytes in 1,361 blocks
==13102== suppressed: 0 bytes in 0 blocks
Signed-off-by: Alex Jia <ajia@redhat.com>
Detected by valgrind. Leak introduced in commit e9bd9a0:
* tools/virsh.c: fix memory leak on cmdBlkdeviotune.
* how to reproduce?
% valgrind -v --leak-check=full virsh blkdeviotune <domain name> <block device>
* actual valgrind result:
==12759== 576 bytes in 1 blocks are definitely lost in loss record 18 of 29
==12759== at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==12759== by 0x42134E: _vshCalloc.clone.2 (virsh.c:422)
==12759== by 0x4217CB: cmdBlkdeviotune (virsh.c:6364)
==12759== by 0x4136A2: vshCommandRun (virsh.c:16363)
==12759== by 0x4253FB: main (virsh.c:17865)
==12759==
==12759== LEAK SUMMARY:
==12759== definitely lost: 576 bytes in 1 blocks
==12759== indirectly lost: 0 bytes in 0 blocks
==12759== possibly lost: 0 bytes in 0 blocks
==12759== still reachable: 126,964 bytes in 1,342 blocks
==12759== suppressed: 0 bytes in 0 blocks
Signed-off-by: Alex Jia <ajia@redhat.com>
Jiri Denemark reported an instance of bootstrapping libvirt
failing when run inside a sandbox, traced to rpm trying to
access /var/ which was not permitted by the sandbox.
Alex Jia reported that 0.9.8-rc1 failed to bootstrap if patch(1)
is not installed.
* bootstrap.conf (buildreq): Avoid rpm call if python-config
exists. Also, require patch, in case we have gnulib-local diffs.
In some error situations, the function testDomainRestoreFlags() could
unlock the test driver mutex without first locking it. This patch
moves the lock operation earlier, so that it occurs before any
potential jump down to the unlock call.
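The ordering problem can be shown with a toy lock that only counts its depth. All names here are hypothetical and the real test driver uses a mutex; the sketch only demonstrates taking the lock before the first jump to the shared cleanup label.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: tracks depth so an unbalanced unlock is detectable. */
struct driver { int lock_depth; };

static void
driver_lock(struct driver *d)
{
    d->lock_depth++;
}

static void
driver_unlock(struct driver *d)
{
    assert(d->lock_depth > 0);  /* unlocking an unlocked driver is a bug */
    d->lock_depth--;
}

static int
restore_image(struct driver *d, const char *path)
{
    int ret = -1;

    /* Take the lock before any check that may jump to cleanup,
     * since cleanup unconditionally unlocks. */
    driver_lock(d);

    if (path == NULL || path[0] == '\0')
        goto cleanup;           /* error path: lock is already held */

    ret = 0;

 cleanup:
    driver_unlock(d);
    return ret;
}
```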
I found this problem while auditing the test driver lock usage to
determine the cause of a hang while running the following test:
cd tests; while true; do printf x; ./undefine; done
This patch *does not* solve that problem, but we now understand its
actual source, and danpb is working on a patch.
https://bugzilla.redhat.com/show_bug.cgi?id=738725
Commit ecd8725 tried to silence a spurious warning on the initial
libvirt install, and commit ba6cbb1 tried to fix up the logic to the
correct Fedora version, but the warning was still present due to a
logic bug: since %{fedora} and %{rhel} are never simultaneously
set, 0%{rhel} <= 6 made the %if always true. Checking for
minimum versions (via >=) is okay, but checking for maximum versions
(via <=) requires a prerequisite test that the platform being tested
is non-zero.
Also fix a bogus setting of with_libxl (although we previously
hard-code with_libxl to 0 for rhel earlier in the file, so this
was not as severe a bug).
* libvirt.spec.in (with_cgconfig): Don't enable cgconfig on F16.
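The logic bug is easiest to see with plain integers. The sketch below models an unset platform macro as 0, as the 0%{rhel} idiom does, and compares a naive maximum-version test with a guarded one. The function names are hypothetical, and this is C standing in for the spec file conditionals, not spec syntax.

```c
#include <assert.h>
#include <stdbool.h>

/* `version` is 0 when the platform macro is unset. */

/* Naive check: also true for an unset platform, because 0 <= max. */
static bool
max_version_naive(int version, int max)
{
    assert(version >= 0);
    return version <= max;
}

/* Guarded check: first require that the platform is set at all. */
static bool
max_version_guarded(int version, int max)
{
    assert(version >= 0);
    return version != 0 && version <= max;
}
```

A minimum-version test (version >= min, with min > 0) needs no such guard, since 0 never satisfies it.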
Over time, Fedora and RHEL RPMs have often backported upstream
patches that touched configure.ac and/or Makefile.am; this
necessitates rerunning the autotools for the patch to be effective.
Making this a one-liner spec tweak will make it easier for future
backports to pull patches without having to find all the places
to touch to properly use the autotools. Meanwhile, there have been
historical instances where an update in the autotools caused FTBFS
situations, so this is not on by default.
* libvirt.spec.in (enable_autotools): New variable, default off.
(BuildRequires): Conditionally add autotools.
(%build): Conditionally use them before configure.
* mingw32-libvirt.spec.in: Likewise.
The installation rules for the libvirt-guests.service were
totally broken:
- Installing in the wrong location
- The location was not overridable
- The install-systemd rule was not invoked anywhere
- The install-systemd rule was not invoking install-initscript
which it depends on
- The installed service file lacked a .service extension
* tools/Makefile.am: Fix install of libvirt-guests.service