Currently, monitoring QEMU virtual machines with standard Unix
sysadmin tools is harder than it has to be. The QEMU command line is
often miles long and mostly redundant, it's hard to tell which process
is which.
This patch reorders the QEMU -name argument to be the first, so it's
immediately visible in "ps x", htop and "atop -c" output.
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=827519
The problem is that an interface with type='hostdev' will have an
alias of the form "hostdev%d", while the function that looks through
existing netdevs to determine the name to use for a new addition will
fail if there's an existing entry that does not match the form
"net%d".
This is another of the handful of places that need an exception due to
the hybrid nature of <interface type='hostdev'> (which is not exactly
an <interface> or a <hostdev>, but is both at the same time).
If we migrate to fd, spec->fwdType is not MIGRATION_FWD_DIRECT,
we will close spec->dest.fd.local in qemuMigrationRun(). So we
should set spec->dest.fd.local to -1 in qemuMigrationRun().
Bug present since 0.9.5 (commit 326176179).
==3240== 23 bytes in 1 blocks are definitely lost in loss record 242 of 744
==3240== at 0x4C2A4CD: malloc (vg_replace_malloc.c:236)
==3240== by 0x8077537: __vasprintf_chk (vasprintf_chk.c:82)
==3240== by 0x509C677: virVasprintf (stdio2.h:199)
==3240== by 0x509C733: virAsprintf (util.c:1912)
==3240== by 0x1906583A: qemudStartup (qemu_driver.c:679)
==3240== by 0x511991D: virStateInitialize (libvirt.c:809)
==3240== by 0x40CD84: daemonRunStateInit (libvirtd.c:751)
==3240== by 0x5098745: virThreadHelper (threads-pthread.c:161)
==3240== by 0x7953D8F: start_thread (pthread_create.c:309)
==3240== by 0x805FF5C: clone (clone.S:115)
When adding new config file parameters, the corresponding
additions to the augeas lens' are constantly forgotten.
Also there are augeas test cases, these don't catch the
error, since they too are never updated.
To address this, the augeas test cases need to be auto-generated
from the example config files.
* build-aux/augeas-gentest.pl: Helper to generate an
augeas test file, substituting in elements from the
example config files
* src/Makefile.am, daemon/Makefile.am: Switch to
auto-generated augeas test cases
* daemon/test_libvirtd.aug, daemon/test_libvirtd.aug.in,
src/locking/test_libvirt_sanlock.aug,
src/locking/test_libvirt_sanlock.aug.in,
src/lxc/test_libvirtd_lxc.aug,
src/lxc/test_libvirtd_lxc.aug.in,
src/qemu/test_libvirtd_qemu.aug,
src/qemu/test_libvirtd_qemu.aug.in: Remove example
config file data, replacing with a ::CONFIG:: placeholder
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently all the config options are listed under a 'vnc_entry'
group. Create a bunch of new groups & move options to the
right place
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add nmissing 'host_uuid' entry to libvirtd.conf lens and
rename spice_passwd to spice_password in qemu.conf lens
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of doing
# example_config
use
#example_config
so it is possible to programatically uncomment example config
options, as distinct from their comment/descriptions
Also delete rogue trailing comma not allowed by lens
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Remove the uid param from virGetUserConfigDirectory,
virGetUserCacheDirectory, virGetUserRuntimeDirectory,
and virGetUserDirectory
These functions were universally called with the
results of getuid() or geteuid(). To make it practical
to port to Win32, remove the uid parameter and hardcode
geteuid()
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If vdsm is installed and configured in Fedora 17, we add the following
items into qemu.conf:
spice_tls=1
spice_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-spice"
However, after this changes, augtool cannot identify qemu.conf anymore.
When building as driver modules, it is not possible for the QEMU
driver module to reference the DTrace/SystemTAP probes linked into
the main libvirt.so. Thus we need to move the QEMU probes into a
separate file 'libvirt_qemu_probes.d'. Also rename the existing
file from 'probes.d' to 'libvirt_probes.d' while we're at it
* daemon/Makefile.am, src/internal.h: Include libvirt_probes.h
instead of probes.h
* src/Makefile.am: Add rules for libvirt_qemu_probes.d
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor_json.c,
src/qemu/qemu_monitor_text.c: Include libvirt_qemu_probes.h
* src/libvirt_probes.d: Rename from probes.d
* src/libvirt_qemu_probes.d: QEMU specific probes formerly
in probes.d
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The pciDevice structure corresponding to the device being hot-unplugged
was freed after it was "stolen" from activeList. The pointer was still
used for eg-inactive list. This patch removes the free of the structure
and frees it only if reset fails on the device.
When the last reference to a virConnectPtr is released by
libvirtd, it was possible for a deadlock to occur in the
virDomainEventState functions. The virDomainEventStatePtr
holds a reference on virConnectPtr for each registered
callback. When removing a callback, the virUnrefConnect
function is run. If this causes the last reference on the
virConnectPtr to be released, then virReleaseConnect can
be run, which in turns calls qemudClose. This function has
a call to virDomainEventStateDeregisterConn which is intended
to remove all callbacks associated with the virConnectPtr
instance. This will try to grab a lock on virDomainEventState
but this lock is already held. Deadlock ensues
Thread 1 (Thread 0x7fcbb526a840 (LWP 23185)):
Since each callback associated with a virConnectPtr holds a
reference on virConnectPtr, it is impossible for the qemudClose
method to be invoked while any callbacks are still registered.
Thus the call to virDomainEventStateDeregisterConn must in fact
be a no-op. Thus it is possible to just remove all trace of
virDomainEventStateDeregisterConn and avoid the deadlock.
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Delete virDomainEventStateDeregisterConn
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
src/qemu/qemu_driver.c, src/uml/uml_driver.c: Remove
calls to virDomainEventStateDeregisterConn
This involves setting the cpuacct cgroup to a per-vcpu granularity,
as well as summing the each vcpu accounting into a common array.
Now that we are reading more than one cgroup file, we double-check
that cpus weren't hot-plugged between reads to invalidate our
summing.
Signed-off-by: Eric Blake <eblake@redhat.com>
If qemuPrepareHostdevUSBDevices fail it will roll back devices added
to the driver list of used devices. However, if it may fail because
the device is being used already. But then again - with roll back.
Therefore don't try to remove a usb device manually if the function
fail. Although, we want to remove the device if any operation
performed afterwards fail.
One of our latest USB device handling patches
05abd1507d introduced a regression.
That is, we first create a temporary list of all USB devices that
are to be used by domain just starting up. Then we iterate over and
check if a device from the list is in the global list of currently
assigned devices (activeUsbHostdevs). If not, we add it there and
continue with next iteration then. But if a device from temporary
list is either taken already or adding to the activeUsbHostdevs fails,
we remove all devices in temp list from the activeUsbHostdevs list.
Therefore, if a device is already taken we remove it from
activeUsbHostdevs even if we should not. Thus, next time we allow
the device to be assigned to another domain.
To allow the security drivers to apply different configuration
information per hypervisor, pass the virtualization driver name
into the security manager constructor.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Thanks to this new option we are now able to use modern CPU models (such
as Westmere) defined in external configuration file.
The qemu-1.1{,-device} data files for qemuhelptest are filled in with
qemu-1.1-rc2 output for now. I will update those files with real
qemu-1.1 output once it is released.
Currently each USB2 companion controller gets put on a separate
PCI slot. Not only is this wasteful of PCI slots, but it is not
in compliance with the spec for USB2 controllers. The master
echi1 and all companion controllers should be in the same slot,
with echi1 in function 7, and uhci1-3 in functions 0-2 respectively.
* src/qemu/qemu_command.c: Special case handling of USB2 controllers
to apply correct pci slot assignment
* tests/qemuxml2argvdata/qemuxml2argv-usb-ich9-ehci-addr.args,
tests/qemuxml2argvdata/qemuxml2argv-usb-ich9-ehci-addr.xml: Expand
test to cover automatic slot assignment
Like for 'static' placement, when the memory policy mode is
'strict', set the memory policy by writing the advisory nodeset
returned from numad to cgroup file cpuset.mems,
On some of the NUMA platforms, the CPU index in each NUMA node
grows non-consecutive. While on other platforms, it can be inconsecutive,
E.g.
% numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 131058 MB
node 0 free: 86531 MB
node 1 cpus: 1 5 9 13 17 21 25 29
node 1 size: 131072 MB
node 1 free: 127070 MB
node 2 cpus: 2 6 10 14 18 22 26 30
node 2 size: 131072 MB
node 2 free: 127758 MB
node 3 cpus: 3 7 11 15 19 23 27 31
node 3 size: 131072 MB
node 3 free: 127226 MB
node distances:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 20
2: 20 20 10 20
3: 20 20 20 10
This patch is to fix the problem by using the CPU index in
caps->host.numaCell[i]->cpus[i] to set the bitmask instead of
assuming the CPU index of the NUMA nodes are always sequential.
For pseries guest, the default controller model is
ibmvscsi controller, this controller only can work
on spapr-vio address.
This patch is to assign spapr-vio address type to
ibmvscsi controller and correct vscsi test case.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Commit cdce2f42d tried to silence a compiler warning on 32-bit builds,
but the gcc shipped with RHEL 5 is old enough that the type conversion
via multiplication by 1 was insufficient for the task.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Previous attempt
didn't get past all gcc versions.
As defined in:
http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
This offers a number of advantages:
* Allows sharing a home directory between different machines, or
sessions (eg. using NFS)
* Cleanly separates cache, runtime (eg. sockets), or app data from
user settings
* Supports performing smart or selective migration of settings
between different OS versions
* Supports reseting settings without breaking things
* Makes it possible to clear cache data to make room when the disk
is filling up
* Allows us to write a robust and efficient backup solution
* Allows an admin flexibility to change where data and settings are stored
* Dramatically reduces the complexity and incoherence of the
system for administrators
This patch lifts the limit of calling thread detection code only on KVM
guests. With upstream qemu the thread mappings are reported also on
non-KVM machines.
QEMU adopted the thread_id information from the kvm branch.
To remain compatible with older upstream versions of qemu the check is
attempted but the failure to detect threads (or even run the monitor
command - on older versions without SMP support) is treated non-fatal
and the code reports one vCPU with pid of the hypervisor (in same
fashion this was done on non-KVM guests).
After a cpu hotplug the qemu driver did not refresh information about
virtual processors used by qemu and their corresponding threads. This
patch forces a re-detection as is done on start of QEMU.
This ensures that correct information is reported by the
virDomainGetVcpus API and "virsh vcpuinfo".
A failure to obtain the thread<->vcpu mapping is treated non-fatal and
the mapping is not updated in a case of failure as not all versions of
QEMU report this in the info cpus command.
This patch changes a switch statement into ifs when handling live vs.
configuration modifications getting rid of redundant code in case when
both live and persistent configuration gets changed.
when failing to attach another usb device to a domain for some reason
which has one use device attached before, the libvirtd crashed.
The crash is caused by null-pointer dereference error in invoking
usbDeviceListSteal passed in NULL value usb variable.
commit 05abd1507d introduces the bug.
Though numad will manage the memory allocation of task dynamically,
it wants management application (libvirt) to pre-set the memory
policy according to the advisory nodeset returned from querying numad,
(just like pre-bind CPU nodeset for domain process), and thus the
performance could benefit much more from it.
This patch introduces new XML tag 'placement', value 'auto' indicates
whether to set the memory policy with the advisory nodeset from numad,
and its value defaults to the value of <vcpu> placement, or 'static'
if 'nodeset' is specified. Example of the new XML tag's usage:
<numatune>
<memory placement='auto' mode='interleave'/>
</numatune>
Just like what current "numatune" does, the 'auto' numa memory policy
setting uses libnuma's API too.
If <vcpu> "placement" is "auto", and <numatune> is not specified
explicitly, a default <numatume> will be added with "placement"
set as "auto", and "mode" set as "strict".
The following XML can now fully drive numad:
1) <vcpu> placement is 'auto', no <numatune> is specified.
<vcpu placement='auto'>10</vcpu>
2) <vcpu> placement is 'auto', no 'placement' is specified for
<numatune>.
<vcpu placement='auto'>10</vcpu>
<numatune>
<memory mode='interleave'/>
</numatune>
And it's also able to control the CPU placement and memory policy
independently. e.g.
1) <vcpu> placement is 'auto', and <numatune> placement is 'static'
<vcpu placement='auto'>10</vcpu>
<numatune>
<memory mode='strict' nodeset='0-10,^7'/>
</numatune>
2) <vcpu> placement is 'static', and <numatune> placement is 'auto'
<vcpu placement='static' cpuset='0-24,^12'>10</vcpu>
<numatune>
<memory mode='interleave' placement='auto'/>
</numatume>
A follow up patch will change the XML formatting codes to always output
'placement' for <vcpu>, even it's 'static'.
It turns out that when cgroups are enabled, the use of a block device
for a snapshot target was failing with EPERM due to libvirt failing
to add the block device to the cgroup whitelist. See also
https://bugzilla.redhat.com/show_bug.cgi?id=810200
* src/qemu/qemu_driver.c
(qemuDomainSnapshotCreateSingleDiskActive)
(qemuDomainSnapshotUndoSingleDiskActive): Account for cgroup.
(qemuDomainSnapshotCreateDiskActive): Update caller.
qemu's behavior in this case is to change the spice server behavior to
require secure connection to any channel not otherwise specified as
being in plaintext mode. libvirt doesn't currently allow requesting this
(via plaintext-channel=<channel name>).
RHBZ: 819499
Signed-off-by: Alon Levy <alevy@redhat.com>
src/qemu/qemu_hostdev.c:
refactor qemuPrepareHostdevUSBDevices function, make it focus on
adding usb device to activeUsbHostdevs after check. After that,
the usb hotplug function qemuDomainAttachHostDevice also could use
it.
expand qemuPrepareHostUSBDevices to perform the usb search,
rollback on failure.
src/qemu/qemu_hotplug.c:
If there are multiple usb devices available with same vendorID and productID,
but with different value of "bus, device", we give an error to let user
use <address> to specify the desired one.
usbFindDevice():get usb device according to
idVendor, idProduct, bus, device
it is the exact match of the four parameters
usbFindDeviceByBus():get usb device according to bus, device
it returns only one usb device same as usbFindDevice
usbFindDeviceByVendor():get usb device according to idVendor,idProduct
it probably returns multiple usb devices.
usbDeviceSearch(): a helper function to do the actual search
When we added the default USB controller into domain XML, we efficiently
broke migration to older versions of libvirt that didn't support USB
controllers at all (0.9.4 and earlier) even for domains that don't use
anything that the older libvirt can't provide. We still want to present
the default USB controller in any XML seen by a user/app but we can
safely remove it from the domain XML used during migration. If we are
migrating to a new enough libvirt, it will add the controller XML back,
while older libvirt won't be confused with it although it will still
tell qemu to create the controller.
Similar approach can be used in the future whenever we find out we
always enabled some kind of device without properly advertising it in
domain XML.
Commit 4c82f09e added a capability check for qemu per-device io
throttling, but only applied it to domain startup. As mentioned
in the previous commit (98cec05), the user can still get an 'internal
error' message during a hotplug attempt, when the monitor command
doesn't exist. It is confusing to allow tuning on inactive domains
only to then be rejected when starting the domain.
* src/qemu/qemu_driver.c (qemuDomainSetBlockIoTune): Reject
offline tuning if online can't match it.
If you have a qemu build that lacks the blockio tune monitor command,
then this command:
$ virsh blkdeviotune rhel6u2 hda --total_bytes_sec 1000
error: Unable to change block I/O throttle
error: internal error Unexpected error
fails as expected (well, the error message is lousy), but the next
dumpxml shows that the domain was modified anyway. Worse, that means
if you save the domain then restore it, the restore will likely fail
due to throttling being unsupported, even though no throttling should
even be active because the monitor command failed in the first place.
* src/qemu/qemu_driver.c (qemuDomainSetBlockIoTune): Check for
error before making modification permanent.
Error: RESOURCE_LEAK:
/libvirt/src/qemu/qemu_driver.c:6968:
alloc_fn: Calling allocation function "calloc".
/libvirt/src/qemu/qemu_driver.c:6968:
var_assign: Assigning: "nodeset" = storage returned from "calloc(1UL, 1UL)".
/libvirt/src/qemu/qemu_driver.c:6977:
noescape: Variable "nodeset" is not freed or pointed-to in function "virTypedParameterAssign".
/libvirt/src/qemu/qemu_driver.c:6997:
leaked_storage: Variable "nodeset" going out of scope leaks the storage it points to.
On 32-bit platforms, gcc warns that the comparison between a long
and (ULLONG_MAX/1024/1024) is always false; throwing in a type
conversion shuts up the warning.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Shut gcc up.
This works with newer qemu that doesn't allow escaping spaces.
It's backwards compatible as well.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
When libvirtd is started, we create "libvirt/qemu" directories under
hugetlbfs mount point. Only the "qemu" subdirectory is chowned to qemu
user and "libvirt" remains owned by root. If umask was too restrictive
when libvirtd started, qemu user may lose access to "qemu"
subdirectory. Let's explicitly grant search permissions to "libvirt"
directory for all users.
Currently, qemu GA is not providing 'desc' field for errors like
we are used to from qemu monitor. Therefore, we fall back to this
general 'unknown error' string. However, GA is reporting 'class' which
is not perfect, but much more helpful than generic error string.
Thus we should fall back to class firstly and if even no class
is presented, then we can fall back to that generic string.
Before this patch:
virsh # dompmsuspend --target mem f16
error: Domain f16 could not be suspended
error: internal error unable to execute QEMU command
'guest-suspend-ram': unknown QEMU command error
After this patch:
virsh # dompmsuspend --target mem f16
error: Domain f16 could not be suspended
error: internal error unable to execute QEMU command
'guest-suspend-ram': The command has not been found
More bug extermination in the category of:
Error: CHECKED_RETURN:
/libvirt/src/conf/network_conf.c:595:
check_return: Calling function "virAsprintf" without checking return value (as is done elsewhere 515 out of 543 times).
/libvirt/src/qemu/qemu_process.c:2780:
unchecked_value: No check of the return value of "virAsprintf(&msg, "was paused (%s)", virDomainPausedReasonTypeToString(reason))".
/libvirt/tests/commandtest.c:809:
check_return: Calling function "setsid" without checking return value (as is done elsewhere 4 out of 5 times).
/libvirt/tests/commandtest.c:830:
unchecked_value: No check of the return value of "virTestGetDebug()".
/libvirt/tests/commandtest.c:831:
check_return: Calling function "virTestGetVerbose" without checking return value (as is done elsewhere 41 out of 42 times).
/libvirt/tests/commandtest.c:833:
check_return: Calling function "virInitialize" without checking return value (as is done elsewhere 18 out of 21 times).
One note about the error in commandtest line 809: setsid() seems to fail when running the test -- could be removed ?
With RHEL 6.2, virDomainBlockPull(dom, dev, bandwidth, 0) has a race
with non-zero bandwidth: there is a window between the block_stream
and block_job_set_speed monitor commands where an unlimited amount
of data was let through, defeating the point of a throttle.
This race was first identified in commit a9d3495e, and libvirt was
able to reduce the size of the window for that race. In the meantime,
the qemu developers decided to fix things properly; per this message:
https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg03793.html
the fix will be in qemu 1.1, and changes block-job-set-speed to use
a different parameter name, as well as adding a new optional parameter
to block-stream, which eliminates the race altogether.
Since our documentation already mentioned that we can refuse a non-zero
bandwidth for some hypervisors, I think the best solution is to do
just that for RHEL 6.2 qemu, so that the race is obvious to the user
(anyone using stock RHEL 6.2 binaries won't have this patch, and anyone
building their own libvirt with this patch for RHEL can also rebuild
qemu to get the modern semantics, so it is no real loss in behavior).
Meanwhile the code must be fixed to honor actual qemu 1.1 naming.
Rename the parameter to 'modern', since the naming difference now
covers more than just 'async' block-job-cancel. And while at it,
fix an unchecked integer overflow.
* src/qemu/qemu_monitor.h (enum BLOCK_JOB_CMD): Drop unused value,
rename enum to match conventions.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Reflect enum rename.
* src/qemu_qemu_monitor_json.h (qemuMonitorJSONBlockJob): Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Likewise,
and support difference between RHEL 6.2 and qemu 1.1 block pull.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Reject
bandwidth during pull with too-old qemu.
* src/libvirt.c (virDomainBlockPull, virDomainBlockRebase):
Document this.
QEMU binary is called several times when we probe different kinds of
capabilities the binary supports. This patch introduces new common
helper so that all probes use a consistent way of invoking qemu.
https://bugzilla.redhat.com/show_bug.cgi?id=816662 pointed out
that attempting 'virsh blockpull' on an offline domain gave a
misleading error message about qemu lacking support for the
operation, even when qemu was specifically updated to support it.
The real problem is that we have several capabilities that are
only determined when starting a domain, and therefore are still
clear when first working with an inactive domain (namely, any
capability set by qemuMonitorJSONCheckCommands).
While this patch was able to hoist an existing check in one of the
three culprits, it had to add redundant checks in the other two
places (because you always have to check for an active domain after
obtaining a VM job lock, but the capability bits were being checked
prior to obtaining the job lock).
Someday it would be nice to patch libvirt to cache the set of
capabilities per qemu binary (as determined by inode and timestamp),
rather than re-probing the binary every time a domain is started,
and to teach the cache how to query the monitor during the one
time the probe is made rather than having to wait until a guest
is started; then, a capability probe would succeed even for offline
guests because it just refers to the cache, and the single check for
an active domain after grabbing the job lock would be sufficient.
But since that will involve a lot more coding, I'm happy to go
with this simpler solution for an immediate solution.
* src/qemu/qemu_driver.c (qemuDomainPMSuspendForDuration)
(qemuDomainSnapshotCreateXML, qemuDomainBlockJobImpl): Check for
offline state before checking an online-only cap.
Once qemu monitor reports migration has completed, we just closed our
end of the pipe and let migration tunnel die. This generated bogus error
in case we did so before the thread saw EOF on the pipe and migration
was aborted even though it was in fact successful.
With this patch we first wake up the tunnel thread and once it has read
all data from the pipe and finished the stream we close the
filedescriptor.
A small additional bonus of this patch is that real errors reported
inside qemuMigrationIOFunc are not overwritten by virStreamAbort any
more.
When QEMU reported failed or canceled migration, we correctly detected
it but didn't really consider it as an error condition and migration
protocol just went on. Luckily, some of the subsequent steps eventually
failed end we reported an (unrelated and mostly random) error back to
the caller.
In some cases (spotted with broken connection during tunneled migration)
we were overwriting the original error with worse or even misleading
errors generated when we were cleaning up after failed migration.
This patch modifies the CPU comparrison function to report the
incompatibilities in more detail to ease identification of problems.
* src/cpu/cpu.h:
cpuGuestData(): Add argument to return detailed error message.
* src/cpu/cpu.c:
cpuGuestData(): Add passthrough for error argument.
* src/cpu/cpu_x86.c
x86FeatureNames(): Add function to convert a CPU definition to flag
names.
x86Compute(): - Add error message parameter
- Add macro for reporting detailed error messages.
- Improve error reporting.
- Simplify calculation of forbidden flags.
x86DataIteratorInit():
x86cpuidMatchAny(): Remove functions that are no longer needed.
* src/qemu/qemu_command.c:
qemuBuildCpuArgStr(): - Modify for new function prototype
- Add detailed error reports
- Change error code on incompatible processors
to VIR_ERR_CONFIG_UNSUPPORTED instead of
internal error
* tests/cputest.c:
cpuTestGuestData(): Modify for new function prototype
Most of our errors complaining about an inability to support a
particular action due to qemu limitations used CONFIG_UNSUPPORTED,
but we had a few outliers. Reported by Jiri Denemark.
* src/qemu/qemu_command.c (qemuBuildDriveDevStr): Prefer
CONFIG_UNSUPPORTED.
* src/qemu/qemu_driver.c (qemuDomainReboot)
(qemuDomainBlockJobImpl): Likewise.
* src/qemu/qemu_hotplug.c (qemuDomainAttachPciControllerDevice):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorTransaction)
(qemuMonitorBlockJob, qemuMonitorSystemWakeup): Likewise.
A "ide-drive" device can be either a hard disk or a CD-ROM,
if there is ",media=cdrom" specified for the backend, it's
a CD-ROM, otherwise it's a hard disk.
Upstream qemu splitted "ide-drive" into "ide-hd" and "ide-cd"
since commit 1f56e32, and ",media=cdrom" is not required for
ide-cd anymore. "ide-drive" is still supported for backwards
compatibility, but no doubt we should go foward.
A "scsi-disk" device can be either a hard disk or a CD-ROM,
if there is ",media=cdrom" specified for the backend, it's
a CD-ROM, otherwise it's a hard disk.
But upstream qemu splitted "scsi-disk" into "scsi-hd" and
"scsi-cd" since commit b443ae, and ",media=cdrom" is not
required for scsi-cd anymore. "scsi-disk" is still supported
for backwards compatibility, but no doubt we should go
foward.
If console[0] is an alias for serial[0], do not enforce the former to
have a PTY source type. This breaks serial consoles on stdio and makes
no sense.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Currently, we have 3 boolean arguments we have to pass
to qemuProcessStart(). As libvirt grows it is harder and harder
to remember them and their position. Therefore we should
switch to flags instead.
Instead of returning a CPUs list, numad returns NUMA node
list instead, this patch is to convert the node list to
cpumap before affinity setting. Otherwise, the domain
processes will be pinned only to CPU[$numa_cell_num],
which will cause significiant performance losses.
Also because numad will balance the affinity dynamically,
reflecting the cpuset from numad back doesn't make much
sense then, and it may just could produce confusion for
the users. Thus the better way is not to reflect it back
to XML. And in this case, it's better to ignore the cpuset
when parsing XML.
The codes to update the cpuset is removed in this patch
incidentally, and there will be a follow up patch to ignore
the manually specified "cpuset" if "placement" is "auto",
and document will be updated too.
This patch adds a netlink callback when migrating a VEPA enabled
virtual machine. It fixes a Bug where a VM would not request a port
association when it was cleared by lldpad.
This patch requires the latest git version of lldpad to work.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
If dynamic_ownership is off and we are creating a file on NFS
we force chown. This will fail as chown/chmod are not supported
on NFS. However, with no dynamic_ownership we are not required
to do any chown.
In my testing, I was able to provoke an odd block pull failure:
$ virsh blockpull dom vda --bandwidth 10000
error: Requested operation is not valid: No active operation on device: drive-virtio-disk0
merely by using gdb to artifically wait to do the block job set speed
until after the pull had already finished. But in reality, that should
be a success, since the pull finished before we had a chance to set
speed. Furthermore, using a double job lock is not only annoying, but
a bug in itself - if you do parallel virDomainBlockRebase, and hit
the race window just right, the first call grabs the VM job to start
a fast block job, then the second call grabs the VM job to start
a long-running job with unspecified speed, then the first call finally
regrabs the VM job and sets the speed, which ends up running the
second job under the speed from the first call. By consolidating
things into a single job, we avoid opening that race, as well as reduce
the time between starting the job and changing the speed, for less
likelihood of the speed change happening after block job completion
in the first place.
* src/qemu/qemu_monitor.h (BLOCK_JOB_CMD): Add new mode.
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Move secondary
job call...
(qemuDomainBlockJobImpl): ...here, for fewer locks.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Change
return value on new internal mode.
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally
poll using qemu's "query-block-jobs" API and will not return until the
operation has been completed. API users are advised that this operation
is unbounded and further interaction with the domain during this period
may block. Future patches may refactor things to allow other queries in
parallel with this polling. For older qemu, we synthesize the cancellation
event, since qemu won't generate it.
The choice of polling duration copies from the code in qemu_migration.c.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Probably in the noise, but this will let us scale more efficiently
as we learn to recognize even more qemu events.
* src/qemu/qemu_monitor_json.c (eventHandlers): Sort.
(qemuMonitorEventCompare): New helper function.
(qemuMonitorJSONIOProcessEvent): Optimize event lookup.
RHEL 6.2 was released with an early version of block jobs, which only
worked on the qed file format, where the commands were spelled with
underscore (contrary to QMP style), and where 'block_job_cancel' was
synchronous and did not trigger an event.
The upcoming qemu 1.1 release has fixed these short-comings [1][2]:
the commands now work on multiple file types, are spelled with dash,
and 'block-job-cancel' is asynchronous and emits an event upon conclusion.
[1]qemu commit 370521a1d6f5537ea7271c119f3fbb7b0fa57063
[2]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01248.html
This patch recognizes the new spellings, and fixes virDomainBlockRebase
to give a graceful error when talking to a too-old qemu on a partial
rebase attempt. Fixes for the new semantics will come later. This
patch also removes a bogus ATTRIBUTE_NONNULL mistakenly added in
commit 10ec36e2.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCKJOB_SYNC)
(QEMU_CAPS_BLOCKJOB_ASYNC): New bits.
* src/qemu/qemu_capabilities.c (qemuCaps): Name them.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
them.
(qemuMonitorJSONBlockJob): Manage both command names.
(qemuMonitorJSONDiskSnapshot): Minor formatting fix.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJob): Alter signature.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockJob): Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Pass through
capability bit.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Update callers.
The new safe console handling introduced a possibility to deadlock the
qemu driver when a new console connection forcibly disconnects a
previous console stream that belongs to an already closed connection.
The virStreamFree function calls subsequently a the virReleaseConnect
function that tries to lock the driver while discarding the connection,
but the driver was already locked in qemuDomainOpenConsole.
Backtrace of the deadlocked thread:
0 0x00007f66e5aa7f14 in __lll_lock_wait () from /lib64/libpthread.so.0
1 0x00007f66e5aa3411 in _L_lock_500 () from /lib64/libpthread.so.0
2 0x00007f66e5aa322a in pthread_mutex_lock () from/lib64/libpthread.so.0
3 0x0000000000462bbd in qemudClose ()
4 0x00007f66e6e178eb in virReleaseConnect () from/usr/lib64/libvirt.so.0
5 0x00007f66e6e19c8c in virUnrefStream () from /usr/lib64/libvirt.so.0
6 0x00007f66e6e3d1de in virStreamFree () from /usr/lib64/libvirt.so.0
7 0x00007f66e6e09a5d in virConsoleHashEntryFree () from/usr/lib64/libvirt.so.0
8 0x00007f66e6db7282 in virHashRemoveEntry () from/usr/lib64/libvirt.so.0
9 0x00007f66e6e09c4e in virConsoleOpen () from /usr/lib64/libvirt.so.0
10 0x00000000004526e9 in qemuDomainOpenConsole ()
11 0x00007f66e6e421f1 in virDomainOpenConsole () from/usr/lib64/libvirt.so.0
12 0x00000000004361e4 in remoteDispatchDomainOpenConsoleHelper ()
13 0x00007f66e6e80375 in virNetServerProgramDispatch () from/usr/lib64/libvirt.so.0
14 0x00007f66e6e7ae11 in virNetServerHandleJob () from/usr/lib64/libvirt.so.0
15 0x00007f66e6da897d in virThreadPoolWorker () from/usr/lib64/libvirt.so.0
16 0x00007f66e6da7ff6 in virThreadHelper () from/usr/lib64/libvirt.so.0
17 0x00007f66e5aa0c5c in start_thread () from /lib64/libpthread.so.0
18 0x00007f66e57e7fcd in clone () from /lib64/libc.so.6
* src/qemu/qemu_driver.c: qemuDomainOpenConsole()
-- unlock the qemu driver right after acquiring the domain
object
In case an API fails with "cannot acquire state change lock", searching
for the API that possibly forgot to end its job is not always easy.
Let's keep track of the job owner and print it out for easier
identification.
As reported by Daniel Berrangé, we have a huge performance regression
for virDomainGetInfo() due to the change which makes virDomainEndJob()
save the XML status file every time it is called. Previous to that
change, 2000 calls to virDomainGetInfo() took ~2.5 seconds. After that
change, 2000 calls to virDomainGetInfo() take 2 *minutes* 45 secs.
We made the change to be able to recover from libvirtd restart in the
middle of a job. However, only destroy and async jobs are taken care of.
Thus it makes more sense to only save domain state XML when these jobs
are started/stopped.
* src/qemu/qemu_command.c: Wire up -bios with <loader>
* tests/qemuxml2argvdata/qemuxml2argv-bios.args,
tests/qemuxml2argvdata/qemuxml2argv-bios.xml: Expand
existing BIOS test case to cover <loader>
Leak introduced in commit 0436d32. If we allocate an actions array,
but fail early enough to never consume it with the qemu monitor
transaction call, we leaked memory.
But our semantics of making the transaction command free the caller's
memory is awkward; avoiding the memory leak requires making every
intermediate function in the call chain check for error. It is much
easier to fix things so that the function that allocates also frees,
while the call chain leaves the caller's data intact. To do that,
I had to hack our JSON data structure to make it easy to protect a
portion of an arbitrary JSON tree from being freed.
* src/util/json.h (virJSONType): Name the enum.
(_virJSONValue): New field.
* src/util/json.c (virJSONValueFree): Use it to protect a portion
of an array.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONTransaction): Avoid
freeing caller's data.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive):
Free actions array on failure.
We can tell qemuDomainSnapshotFSThaw if we want it to report errors or
not. However, if we don't want to and an error has been already set by
previous qemuReportError() we must keep copy of that error not just a
pointer to it. Otherwise, it get overwritten if FSThaw reports an error.
If the daemon is restarted it will lose list of active
USB devices assigned to active domains. Therefore we need
to rebuild this list on qemuProcessReconnect().
To prevent assigning one USB device to two domains,
we keep a list of assigned USB devices. On domain
startup - qemuProcessStart() - we insert devices
used by domain into the list but remove them only
on detach-device. Devices are, however, released
on qemuProcessStop() as well.
Originally, qemuDomainCheckEjectableMedia was entering monitor with qemu
driver lock. Commit 2067e31bf9, which I
made to fix that, revealed another issue we had (but didn't notice it
since the driver was locked): we didn't set nested job when
qemuDomainCheckEjectableMedia is called during migration. Thus the
original fix I made was wrong.
Since Xen 3.1 the clock=variable semantic is supported. In addition to
qemu/kvm Xen also knows about a variant where the offset is relative to
'localtime' instead of 'utc'.
Extends the libvirt structure with a flag 'basis' to specify, if the
offset is relative to 'localtime' or 'utc'.
Extends the libvirt structure with a flag 'reset' to force the reset
behaviour of 'localtime' and 'utc'; this is needed for backward
compatibility with previous versions of libvirt, since they report
incorrect XML.
Adapt the only user 'qemu' to the new name.
Extend the RelaxNG schema accordingly.
Document the new 'basis' attribute in the HTML documentation.
Adapt test for the new attribute.
Signed-off-by: Philipp Hahn <hahn@univention.de>
If we round up a user's memory request, we should update the XML
to reflect the actual value in use by the VM, rather than giving
an artificially small value back to the user.
* src/qemu/qemu_command.c (qemuBuildNumaArgStr)
(qemuBuildCommandLine): Reflect rounding back to XML.
This patch was created to resolve this upstream bug:
https://bugzilla.redhat.com/show_bug.cgi?id=784767
and is at least a partial solution to this RHEL RFE:
https://bugzilla.redhat.com/show_bug.cgi?id=805071
Previously the only attribute of a network device that could be
modified by virUpdateDeviceFlags() ("virsh update-device") was the
link state; attempts to change any other attribute would log an error
and fail.
This patch adds recognition of a change in bridge device name, and
supports reconnecting the guest's interface to the new device.
Standard audit logs for detaching and attaching a network device are
also generated. Although the current auditing function doesn't log the
bridge being attached to, this will later be changed in a separate
patch.
qemuBuildHostNetStr had a switch-within-a-switch where both were
looking at the same variable. This was apparently to take advantage of
code common to three different cases (while also taking care of some
code that was different). However, there were only 2 lines common to
all, one of those can be eliminated by merging it into the
virAsprintfs that are in each case. On top of that, all the extra
empty cases cause Coverity complaints (because they are unreachable),
but absence of the empty cases causes a compile error due to
"enumeration value not handled in switch".
The solution is to just make each toplevel case independent, folding
in the common code to each.
commit b0e2bb33 set a default value for the SPICE agent channel by
inserting it during parsing of the channel XML. That method of setting
a default is problematic because it makes a format/parse roundtrip
unclean, and experience with setting other values as a side effect of
parsing has led to headaches (e.g. automatically setting a MAC address
in the parser when one isn't specified in the input XML).
This patch does not revert commit b0e2bb33 (it will be reverted in a
separate patch) but adds the alternate implementation of simply
inserting the default value in the appropriate place on the qemu
commandline when no value is provided.
If we issue guest command and GA is not running, the issuing thread
will block endlessly. We can check for GA presence by issuing
guest-sync with unique ID (timestamp). We don't want to issue real
command as even if GA is not running, once it is started, it process
all commands written to GA socket.
The code is splattered with a mix of
sizeof foo
sizeof (foo)
sizeof(foo)
Standardize on sizeof(foo) and add a syntax check rule to
enforce it
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When qemu cannot start, we may call qemuProcessStop() twice.
We have check whether the vm is running at the beginning of
qemuProcessStop() to avoid libvirt deadlock. We call
qemuProcessStop() with driver and vm locked. It seems that
we can avoid libvirt deadlock. But unfortunately we may
unlock driver and vm in the function qemuProcessKill() while
vm->def->id is not -1. So qemuProcessStop() will be run twice,
and monitor will be freed unexpectedly. So we should set
vm->def->id to -1 at the beginning of qemuProcessStop().
In the current V3 migration protocol, Libvirt does not
check the result of the function
qemuMigrationVPAssociatePortProfiles
This means that it is possible for a migration to complete
successfully even when the VM loses network connectivity on
the destination host.
With this change libvirt aborts the migration
(during the "finish" step) when the above function fails, that
is to say when at least one of the port profile associations fails.
Signed-off by: Christian Benvenuti <benve@cisco.com>
Commit d42a2ff caused a regression in creating a disk-only snapshot
of a qcow2 disk; by passing the wrong variable to the monitor call,
libvirt ended up creating JSON that looked like "format":null instead
of the intended "format":"qcow2".
To make it easier to diagnose this in the future, make JSON creation
error out if "s:arg" is paired with NULL (it is still possible to
use "n:arg" in the rare cases where qemu will accept a null).
* src/qemu/qemu_driver.c
(qemuDomainSnapshotCreateSingleDiskActive): Pass correct value.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMakeCommandRaw):
Improve error message.
When libvirtd is restarted, also restart the netlink event
message callbacks for existing VEPA connections and send
a message to lldpad for these existing links, so it learns
the new libvirtd pid.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
This avoids possible deadlock of the qemu driver in case a domain is
begin migrated (in Begin phase) and unrelated connection to qemu driver
is closed at the right time.
I checked all callers of qemuDomainCheckEjectableMedia() and they are
calling this function with qemu driver locked.
Found when attempting to build on Fedora 17 alpha with:
./autogen.sh --system --enable-compile-warnings=error
(this same build command works without problem on Fedora 16). Since
the consumer of the qemuProcessReconnectData doesn't assume that the
other fields of the struct are initialized (although it uses them
internally), the simpler solution is to just switch to C99-style
struct initialization (which doesn't require specification of all
fields).
libvirt always adds -Werror-frame-larger-than=4096 to the flags when
it builds. When building on Fedora 17, two functions with multiple
1024 buffers declared inside if {} blocks would generate frame size
errors; apparently the version of gcc on Fedora 16 will merge these
multiple buffers into a single buffer even when optimization is off,
but Fedora 17 won't.
The fix is to declare a single 1024 buffer at the top of the two
offending functions, and reuse the single buffer throughout the
functions.
Return statements with parameter enclosed in parentheses were modified
and parentheses were removed. The whole change was scripted, here is how:
List of files was obtained using this command:
git grep -l -e '\<return\s*([^()]*\(([^()]*)[^()]*\)*)\s*;' | \
grep -e '\.[ch]$' -e '\.py$'
Found files were modified with this command:
sed -i -e \
's_^\(.*\<return\)\s*(\(\([^()]*([^()]*)[^()]*\)*\))\s*\(;.*$\)_\1 \2\4_' \
-e 's_^\(.*\<return\)\s*(\([^()]*\))\s*\(;.*$\)_\1 \2\3_'
Then checked for nonsense.
The whole command looks like this:
git grep -l -e '\<return\s*([^()]*\(([^()]*)[^()]*\)*)\s*;' | \
grep -e '\.[ch]$' -e '\.py$' | xargs sed -i -e \
's_^\(.*\<return\)\s*(\(\([^()]*([^()]*)[^()]*\)*\))\s*\(;.*$\)_\1 \2\4_' \
-e 's_^\(.*\<return\)\s*(\([^()]*\))\s*\(;.*$\)_\1 \2\3_'
The oVirt developers have stated that the real reasons they want
to have qemu reuse existing volumes when creating a snapshot are:
1. the management framework is set up so that creation has to be
done from a central node for proper resource tracking, and having
libvirt and/or qemu create things violates the framework, and
2. qemu defaults to creating snapshots with an absolute path to
the backing file, but oVirt wants to manage a backing chain that
uses just relative names, to allow for easier migration of a chain
across storage locations.
When 0.9.10 added VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT (commit
4e9953a4), it only addressed point 1, but libvirt was still using
O_TRUNC which violates point 2. Meanwhile, the new qemu
'transaction' monitor command includes a new optional mode argument
that will force qemu to reuse the metadata of the file it just
opened (with the burden on the caller to have valid metadata there
in the first place). So, this tweaks the meaning of the flag to
cover both points as intended for use by oVirt. It is not strictly
backward-compatible to 0.9.10 behavior, but it can be argued that
the O_TRUNC of 0.9.10 was a bug.
Note that this flag is all-or-nothing, and only selects between
'existing' and the default 'absolute-paths'. A more flexible
approach that would allow per-disk selections, as well as adding
support for the 'no-backing-file' mode, would be possible by
extending the <domainsnapshot> xml to have a per-disk mode, but
until we have a management application expressing a need for that
additional complexity, it is not worth doing.
* src/libvirt.c (virDomainSnapshotCreateXML): Tweak documentation.
* src/qemu/qemu_monitor.h (qemuMonitorDiskSnapshot): Add
parameters.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskSnapshot):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorDiskSnapshot): Pass them
through.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskSnapshot): Use
new monitor command arguments.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive)
(qemuDomainSnapshotCreateSingleDiskActive): Adjust callers.
(qemuDomainSnapshotDiskPrepare): Allow qed, modify rules on reuse.
The hardest part about adding transactions is not using the new
monitor command, but undoing the partial changes we made prior
to a failed transaction.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive): Use
transaction when available.
(qemuDomainSnapshotUndoSingleDiskActive): New function.
(qemuDomainSnapshotCreateSingleDiskActive): Pass through actions.
(qemuDomainSnapshotCreateXML): Adjust caller.
QEmu 1.1 is adding a 'transaction' command to the JSON monitor.
Each element of a transaction corresponds to a top-level command,
with the additional guarantee that the transaction flushes all
pending I/O, then guarantees that all actions will be successful
as a group or that failure will roll back the state to what it
was before the monitor command. The difference between a
top-level command:
{ "execute": "blockdev-snapshot-sync", "arguments":
{ "device": "virtio0", ... } }
and a transaction:
{ "execute": "transaction", "arguments":
{ "actions": [
{ "type": "blockdev-snapshot-sync", "data":
{ "device": "virtio0", ... } } ] } }
is just a couple of changed key names and nesting the shorter
command inside a JSON array to the longer command. This patch
just adds the framework; the next patch will actually use a
transaction.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMakeCommand): Move
guts...
(qemuMonitorJSONMakeCommandRaw): ...into new helper. Add support
for array element.
(qemuMonitorJSONTransaction): New command.
(qemuMonitorJSONDiskSnapshot): Support use in a transaction.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskSnapshot): Add
argument.
(qemuMonitorJSONTransaction): New declaration.
* src/qemu/qemu_monitor.h (qemuMonitorTransaction): Likewise.
(qemuMonitorDiskSnapshot): Add argument.
* src/qemu/qemu_monitor.c (qemuMonitorTransaction): New wrapper.
(qemuMonitorDiskSnapshot): Pass argument on.
* src/qemu/qemu_driver.c
(qemuDomainSnapshotCreateSingleDiskActive): Update caller.
Taking an external snapshot of just one disk is atomic, without having
to pause and resume the VM. This also paves the way for later patches
to interact with the new qemu 'transaction' monitor command.
The various scenarios when requesting atomic are:
online, 1 disk, old qemu - safe, allowed by this patch
online, more than 1 disk, old qemu - failure, this patch
offline snapshot - safe, once a future patch implements offline disk snapshot
online, 1 or more disks, new qemu - safe, once future patch uses transaction
Taking an online system checkpoint snapshot is atomic, since it is
done via a single 'savevm' monitor command. Taking an offline system
checkpoint snapshot is atomic, thanks to the previous patch.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Support
new flag for single-disk setups.
(qemuDomainSnapshotDiskPrepare): Check for atomic here.
(qemuDomainSnapshotCreateDiskActive): Skip pausing the VM when
atomic supported.
(qemuDomainSnapshotIsAllowed): Use bool instead of int.
Offline internal snapshots can be rolled back with just a little
bit of refactoring, meaning that we are now automatically atomic.
* src/qemu/qemu_domain.c (qemuDomainSnapshotForEachQcow2): Move
guts...
(qemuDomainSnapshotForEachQcow2Raw): ...to new helper, to allow
rollbacks.
We need a capability bit to gracefully error out if some of the
additions in future patches can't be implemented by the running qemu.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_TRANSACTION): New cap.
* src/qemu/qemu_capabilities.c (qemuCaps): Name it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
it.
This introduces a new running reason VIR_DOMAIN_RUNNING_WAKEUP,
and new suspend event type VIR_DOMAIN_EVENT_STARTED_WAKEUP.
While a wakeup event is emitted, the domain which entered into
VIR_DOMAIN_PMSUSPENDED will be transferred to "running"
with reason VIR_DOMAIN_RUNNING_WAKEUP, and a new domain lifecycle
event emitted with type VIR_DOMAIN_EVENT_STARTED_WAKEUP.
This patch introduces a new event type for the QMP event
SUSPEND:
VIR_DOMAIN_EVENT_ID_PMSUSPEND
The event doesn't take any data, but considering there might
be reason for wakeup in future, the callback definition is:
typedef void
(*virConnectDomainEventSuspendCallback)(virConnectPtr conn,
virDomainPtr dom,
int reason,
void *opaque);
"reason" is unused currently, always passes "0".
This patch introduces a new event type for the QMP event
WAKEUP:
VIR_DOMAIN_EVENT_ID_PMWAKEUP
The event doesn't take any data, but considering there might
be reason for wakeup in future, the callback definition is:
typedef void
(*virConnectDomainEventWakeupCallback)(virConnectPtr conn,
virDomainPtr dom,
int reason,
void *opaque);
"reason" is unused currently, always passes "0".
This is similiar with physical world, one will be surprised if the
box starts with medium exists while the tray is open.
New tests are added, tests disk-{cdrom,floppy}-tray are for the qemu
supports "-device" flag, and disk-{cdrom,floppy}-no-device-cap are
for old qemu, i.e. which doesn't support "-device" flag.
This patch introduces a new event type for the QMP event
DEVICE_TRAY_MOVED, which occurs when the tray of a removable
disk is moved (i.e opened or closed):
VIR_DOMAIN_EVENT_ID_TRAY_CHANGE
The event's data includes the device alias and the reason
for tray status' changing, which indicates why the tray
status was changed. Thus the callback definition for the event
is:
enum {
VIR_DOMAIN_EVENT_TRAY_CHANGE_OPEN = 0,
VIR_DOMAIN_EVENT_TRAY_CHANGE_CLOSE,
\#ifdef VIR_ENUM_SENTINELS
VIR_DOMAIN_EVENT_TRAY_CHANGE_LAST
\#endif
} virDomainEventTrayChangeReason;
typedef void
(*virConnectDomainEventTrayChangeCallback)(virConnectPtr conn,
virDomainPtr dom,
const char *devAlias,
int reason,
void *opaque);
Since we defined a custom virURIPtr type, we should use a
virURIFree method instead of assuming it will always be
a typedef for xmlURIPtr
* src/util/viruri.c, src/util/viruri.h, src/libvirt_private.syms:
Add a virURIFree method
* src/datatypes.c, src/esx/esx_driver.c, src/libvirt.c,
src/qemu/qemu_migration.c, src/vmx/vmx.c, src/xen/xend_internal.c,
tests/viruritest.c: s/xmlFreeURI/virURIFree/
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When a client which started non-p2p migration dies in a bad time, the
source libvirtd never clears the migration job and almost nothing can be
done with the domain without restarting the daemon. This patch makes use
of connection close callbacks and ensures that migration job is properly
discarded when the client disconnects.
Destination daemon should not rely on the client or source daemon
(depending on the type of migration) to call Finish when migration
fails, because the client may crash before it can do so. The domain
prepared for incoming migration is set to be destroyed (and migration
job cleaned up) when connection with the client closes but this is not
enough. If the associated qemu process crashes after Prepare step and
the domain is cleaned up before the connection gets closed, autodestroy
is not called for the domain and migration jobs remains set. In case the
domain is defined on destination host (i.e., it is not completely
removed once destroyed) we keep the job set for ever. To fix this, we
register a cleanup callback which is responsible to clean migration-in
job when a domain dies anywhere between Prepare and Finish steps. Note
that we can't blindly clean any job when spotting EOF on monitor since
normally an API is running at that time.
This reverts commit 61f2b6ba5f and most of
commit d8916dc8e2, which effectively
brings back commit ef1065cf5a written by
Jim Fehlig:
The qemu migration speed default is 32MiB/s as defined in migration.c
/* Migration speed throttling */
static int64_t max_throttle = (32 << 20);
There's no need to throttle migration when targeting a file, so set
migration speed to unlimited prior to migration, and restore to libvirt
default value after migration.
Default units is MB for migrate_set_speed monitor command, so
(INT64_MAX / (1024 * 1024)) is used for unlimited migration speed.
This was reverted because migration to file could not be canceled and
even monitored since qemu was not processing any monitor commands until
the migration finished. This is now different as we make sure the
file descriptor we pass to qemu is able to properly report EAGAIN.
Recent qemu changes might have helped as well.
I tested managedsave with this patch in and indeed, it is 10x faster
while I can still monitor its progress.
If a guest is paused, we were silently ignoring the quiesce flag,
which results in unclean snapshots, contrary to the intent of the
flag. Since we can't quiesce without guest agent support, we should
instead fail if the guest is not running.
Meanwhile, if we attempt a quiesce command, but the guest agent
doesn't respond, and we time out, we may have left the command
pending on the guest's queue, and when the guest resumes parsing
commands, it will freeze even though our command is no longer
around to issue a thaw. To be safe, we must _always_ pair every
quiesce call with a counterpart thaw, even if the quiesce call
failed due to a timeout, so that if a guest wakes up and starts
processing a command backlog, it will not get stuck in a frozen
state.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive):
Always issue thaw after a quiesce, even if quiesce failed.
(qemuDomainSnapshotFSThaw): Add a parameter.
A common coding pattern for changing blkio parameters is
1. virDomainGetBlkioParameters
2. change one or more params
3. virDomainSetBlkioParameters
For this to work, it must be possible to roundtrip through
the methods without error. Unfortunately virDomainGetBlkioParameters
will return "" for the deviceWeight parameter for guests by default,
which virDomainSetBlkioParameters will then reject as invalid.
This fixes the handling of "" to be a no-op, and also improves the
error message to tell you what was invalid
Wire up the domain graphics event notifications for SPICE. Adapted
from a RHEL-only patch written by Dan Berrange that used custom
__com.redhat_SPICE events - equivalent events are now available in
upstream QEMU (including a SPICE_CONNECTED event, which was missing in
the __COM.redhat_SPICE version).
* src/qemu/qemu_monitor_json.c: Wire up SPICE graphics events
numad is an user-level daemon that monitors NUMA topology and
processes resource consumption to facilitate good NUMA resource
alignment of applications/virtual machines to improve performance
and minimize cost of remote memory latencies. It provides a
pre-placement advisory interface, so significant processes can
be pre-bound to nodes with sufficient available resources.
More details: http://fedoraproject.org/wiki/Features/numad
"numad -w ncpus:memory_amount" is the advisory interface numad
provides currently.
This patch add the support by introducing a new XML attribute
for <vcpu>. e.g.
<vcpu placement="auto">4</vcpu>
<vcpu placement="static" cpuset="1-10^6">4</vcpu>
The returned advisory nodeset from numad will be printed
in domain's dumped XML. e.g.
<vcpu placement="auto" cpuset="1-10^6">4</vcpu>
If placement is "auto", the number of vcpus and the current
memory amount specified in domain XML will be used for numad
command line (numad uses MB for memory amount):
numad -w $num_of_vcpus:$current_memory_amount / 1024
The advisory nodeset returned from numad will be used to set
domain process CPU affinity then. (e.g. qemuProcessInitCpuAffinity).
If the user specifies both CPU affinity policy (e.g.
(<vcpu cpuset="1-10,^7,^8">4</vcpu>) and placement == "auto"
the specified CPU affinity will be overridden.
Only QEMU/KVM drivers support it now.
See docs update in patch for more details.
With current code, we pass true iff domain is cold booting. However,
if disk is inaccessible and startupPolicy for that disk is set to
'requisite' we have to fail iff cold booting.
Even though we say in documentation setting (tls-)port to -1 is legacy
compat style for enabling autoport, we're roughly doing this for VNC.
However, in case of SPICE auto enable autoport iff both port & tlsPort
are equal -1 as documentation says autoport plays with both.
In qemuDomainDetachNetDevice, detach was being used before it had been
validated. If no matching device was found, this resulted in a
dereference of a NULL pointer.
This behavior was a regression introduced in commit
cf90342be0, so it has not been a part of
any official libvirt release.
When host-model and host-passthrouh CPU modes were introduced, qemu
driver was properly modify to update guest CPU definition during
migration so that we use the right CPU at the destination. However,
similar treatment is needed for (managed)save and snapshots since they
need to save the exact CPU so that a domain can be properly restored.
To avoid repetition of such situation, all places that need live XML
share the code which generates it.
As a side effect, this patch fixes error reporting from
qemuDomainSnapshotWriteMetadata().
Thanks to cgroups, providing user vs. system time of the overall
guest is easy to add to our existing API.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_CPU_STATS_USERTIME)
(VIR_DOMAIN_CPU_STATS_SYSTEMTIME): New constants.
* src/util/virtypedparam.h (virTypedParameterArrayValidate)
(virTypedParameterAssign): Enforce checking the result.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Fix offender.
(qemuDomainGetTotalcpuStats): Implement new parameters.
* tools/virsh.c (cmdCPUStats): Tweak output accordingly.
If there is a disk file with a comma in the name, QEmu expects a double
comma instead of a single one (e.g., the file "virtual,disk.img" needs
to be specified as "virtual,,disk.img" in QEmu's command line). This
patch fixes libvirt to work with that feature. Fix RHBZ #801036.
Based on an initial patch by Crístian Viana.
* src/util/buf.h (virBufferEscape): Alter signature.
* src/util/buf.c (virBufferEscape): Add parameter.
(virBufferEscapeSexpr): Fix caller.
* src/qemu/qemu_command.c (qemuBuildRBDString): Likewise. Also
escape commas in file names.
(qemuBuildDriveStr): Escape commas in file names.
* docs/schemas/basictypes.rng (absFilePath): Relax RNG to allow
commas in input file names.
* tests/qemuxml2argvdata/*-disk-drive-network-sheepdog.*: Update
test.
Signed-off-by: Eric Blake <eblake@redhat.com>
If user hasn't supplied any tlsPort we default to setting it
to zero in our internal structure. However, when building command
line we test it against -1 which is obviously wrong.
This function was freeing a virDomainNetDef with
VIR_FREE(). virDomainNetDef is a complex structure with many pointers
to other dynamically allocated data; to properly free it
virDomainNetDefFree() must be called instead, otherwise several
strings (and potentially other things) will be leaked.
For some reason, although live hotplug of <hostdev> devices is
supported, persistent hotplug is not. This patch adds the proper
VIR_DOMAIN_DEVICE_HOSTDEV cases to the switches in
qemuDomainAttachDeviceConfig and qemuDomainDetachDeviceConfig.
There are several functions in domain_conf.c that remove a device
object from the domain's list of that object type, but don't free the
object or return it to the caller to free. In many cases this isn't a
problem because the caller already had a pointer to the object and
frees it afterward, but in several cases the removed object was just
left floating around with no references to it.
In particular, the function qemuDomainDetachDeviceConfig() calls
functions to locate and remove net (virDomainNetRemoveByMac), disk
(virDomainDiskRemoveByName()), and lease (virDomainLeaseRemove())
devices, but neither it nor its caller qemuDomainModifyDeviceConfig()
ever obtain a pointer to the device being removed, much less free it.
This patch modifies the following "remove" functions to return a
pointer to the device object being removed from the domain device
arrays, to give the caller the option of freeing the device object
using that pointer if needed. In places where the object was
previously leaked, it is now freed:
virDomainDiskRemove
virDomainDiskRemoveByName
virDomainNetRemove
virDomainNetRemoveByMac
virDomainHostdevRemove
virDomainLeaseRemove
virDomainLeaseRemoveAt
The functions that had been leaking:
libxlDomainDetachConfig - leaked a virDomainDiskDef
qemuDomainDetachDeviceConfig - could leak a virDomainDiskDef,
a virDomainNetDef, or a
virDomainLeaseDef
qemuDomainDetachLease - leaked a virDomainLeaseDef
There were certain paths through the hostdev detach code that could
lead to the lower level function failing (and not removing the object
from the domain's hostdevs list), but the higher level function
free'ing the hostdev object anyway. This would leave a stale
hostdevdef pointer in the list, which would surely cause a problem
eventually.
This patch relocates virDomainHostdevRemove from the lower level
functions qemuDomainDetachThisHostDevice and
qemuDomainDetachHostPciDevice, to their caller
qemuDomainDetachThisHostDevice, placing it just before the call to
virDomainHostdevDefFree. This makes it easy to verify that either both
operations are done, or neither.
NB: The "dangling pointer" part of this problem was introduced in
commit 13d5a6, so it is not present in libvirt versions prior to
0.9.9. Earlier versions would return failure in certain cases even
though the the device object was removed/deleted, but the removal and
deletion operations would always both happen or neither.
This patch will allow OpenFlow controllers to identify which interface
belongs to a particular VM by using the Domain UUID.
ovs-vsctl get Interface vnet0 external_ids
{attached-mac="52:54:00:8C:55:2C", iface-id="83ce45d6-3639-096e-ab3c-21f66a05f7fa", iface-status=active, vm-id="142a90a7-0acc-ab92-511c-586f12da8851"}
V2 changes:
Replaced vm-uuid with vm-id. There was a discussion in Open vSwitch
mailinglist that we should stick with the same DB key postfixes for the
sake of consistency (e.g iface-id, vm-id ...).
Some members are generated during XML parse (e.g. MAC address of
an interface); However, with current implementation, if we
are plugging a device both to persistent and live config,
we parse given XML twice: first time for live, second for config.
This is wrong then as the second time we are not guaranteed
to generate same values as we did for the first time.
To prevent that we need to create a copy of DeviceDefPtr;
This is done through format/parse process instead of writing
functions for deep copy as it is easier to maintain:
adding new field to any virDomain*DefPtr doesn't require change
of copying function.
Currently, startupPolicy='requisite' was determining cold boot
by migrateFrom != NULL. That means, if domain was started up
with migrateFrom set we didn't require disk source path and allowed
it to be dropped. However, on snapshot-revert domain wasn't migrated
but according to documentation, requisite should drop disk source
as well.
Using 'unsigned long' for memory values is risky on 32-bit platforms,
as a PAE guest can have more than 4GiB memory. Our API is
(unfortunately) locked at 'unsigned long' and a scale of 1024, but
the rest of our system should consistently use 64-bit values,
especially since the previous patch centralized overflow checking.
* src/conf/domain_conf.h (_virDomainDef): Always use 64-bit values
for memory. Change hugepage_backed to a bool.
* src/conf/domain_conf.c (virDomainDefParseXML)
(virDomainDefCheckABIStability, virDomainDefFormatInternal): Fix
clients.
* src/vmx/vmx.c (virVMXFormatConfig): Likewise.
* src/xenxs/xen_sxpr.c (xenParseSxpr, xenFormatSxpr): Likewise.
* src/xenxs/xen_xm.c (xenXMConfigGetULongLong): New function.
(xenXMConfigGetULong, xenXMConfigSetInt): Avoid truncation.
(xenParseXM, xenFormatXM): Fix clients.
* src/phyp/phyp_driver.c (phypBuildLpar): Likewise.
* src/openvz/openvz_driver.c (openvzDomainSetMemoryInternal):
Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainDefineXML): Likewise.
* src/qemu/qemu_command.c (qemuBuildCommandLine): Likewise.
* src/qemu/qemu_process.c (qemuProcessStart): Likewise.
* src/qemu/qemu_monitor.h (qemuMonitorGetBalloonInfo): Likewise.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONGetBalloonInfo):
Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONGetBalloonInfo):
Likewise.
* src/qemu/qemu_driver.c (qemudDomainGetInfo)
(qemuDomainGetXMLDesc): Likewise.
* src/uml/uml_conf.c (umlBuildCommandLine): Likewise.
On 64-bit platforms, unsigned long and unsigned long long are
identical, so we don't have to worry about overflow checks.
On 32-bit platforms, anywhere we narrow unsigned long long back
to unsigned long, we have to worry about overflow; it's easier
to do this in one place by having most of the code use the same
or wider types, and only doing the narrowing at the last minute.
Therefore, the memory set commands remain unsigned long, and
the memory get command now centralizes the overflow check into
libvirt.c, so that drivers don't have to repeat the work.
This also fixes a bug where xen returned the wrong value on
failure (most APIs return -1 on failure, but getMaxMemory
must return 0 on failure).
* src/driver.h (virDrvDomainGetMaxMemory): Use long long.
* src/libvirt.c (virDomainGetMaxMemory): Raise overflow.
* src/test/test_driver.c (testGetMaxMemory): Fix driver.
* src/rpc/gendispatch.pl (name_to_ProcName): Likewise.
* src/xen/xen_hypervisor.c (xenHypervisorGetMaxMemory): Likewise.
* src/xen/xen_driver.c (xenUnifiedDomainGetMaxMemory): Likewise.
* src/xen/xend_internal.c (xenDaemonDomainGetMaxMemory):
Likewise.
* src/xen/xend_internal.h (xenDaemonDomainGetMaxMemory):
Likewise.
* src/xen/xm_internal.c (xenXMDomainGetMaxMemory): Likewise.
* src/xen/xm_internal.h (xenXMDomainGetMaxMemory): Likewise.
* src/xen/xs_internal.c (xenStoreDomainGetMaxMemory): Likewise.
* src/xen/xs_internal.h (xenStoreDomainGetMaxMemory): Likewise.
* src/xenapi/xenapi_driver.c (xenapiDomainGetMaxMemory):
Likewise.
* src/esx/esx_driver.c (esxDomainGetMaxMemory): Likewise.
* src/libxl/libxl_driver.c (libxlDomainGetMaxMemory): Likewise.
* src/qemu/qemu_driver.c (qemudDomainGetMaxMemory): Likewise.
* src/lxc/lxc_driver.c (lxcDomainGetMaxMemory): Likewise.
* src/uml/uml_driver.c (umlDomainGetMaxMemory): Likewise.
Overflow can be user-induced, so it deserves more than being called
an internal error. Note that in general, 32-bit platforms have
far more places to trigger this error (anywhere the public API
used 'unsigned long' but the other side of the connection is a
64-bit server); but some are possible on 64-bit platforms (where
the public API computes the product of two numbers).
* include/libvirt/virterror.h (VIR_ERR_OVERFLOW): New error.
* src/util/virterror.c (virErrorMsg): Translate it.
* src/libvirt.c (virDomainSetVcpusFlags, virDomainGetVcpuPinInfo)
(virDomainGetVcpus, virDomainGetCPUStats): Use it.
* daemon/remote.c (HYPER_TO_TYPE): Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockResize): Likewise.
The RPC code assumed that the array returned by the driver would be
fully populated; that is, ncpus on entry resulted in ncpus * return
value on exit. However, while we don't support holes in the middle
of ncpus, we do want to permit the case of ncpus on entry being
longer than the array returned by the driver (that is, it should be
safe for the caller to pass ncpus=128 on entry, and the driver will
stop populating the array when it hits max_id).
Additionally, a successful return implies that the caller will then
use virTypedParamArrayClear on the entire array; for this to not
free uninitialized memory, the driver must ensure that all skipped
entries are explicitly zeroed (the RPC driver did this, but not
the qemu driver).
There are now three cases:
server 0.9.10 and client 0.9.10 or newer: No impact - there were no
hypervisor drivers that supported cpu stats
server 0.9.11 or newer and client 0.9.10: if the client calls with
ncpus beyond the max, then the rpc call will fail on the client side
and disconnect the client, but the server is no worse for the wear
server 0.9.11 or newer and client 0.9.11: the server can return a
truncated array and the client will do just fine
I reproduced the problem by using a host with 2 CPUs, and doing:
virsh cpu-stats $dom --start 1 --count 2
* daemon/remote.c (remoteDispatchDomainGetCPUStats): Allow driver
to omit tail of array.
* src/remote/remote_driver.c (remoteDomainGetCPUStats):
Accommodate driver that omits tail of array.
* src/libvirt.c (virDomainGetCPUStats): Document this.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Clear all
unpopulated entries.
* For now, only "cpu_time" is supported.
* cpuacct cgroup is used for providing percpu cputime information.
* src/qemu/qemu.conf - take care of cpuacct cgroup.
* src/qemu/qemu_conf.c - take care of cpuacct cgroup.
* src/qemu/qemu_driver.c - added an interface
* src/util/cgroup.c/h - added interface for getting percpu cputime
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
These changes are applied only if the hostdev has a parent net device
(i.e. if it was defined as "<interface type='hostdev'>" rather than
just "<hostdev>"). If the parent netdevice has virtual port
information, the original virtualport associate functions are called
(these set and restore both mac and port profile on an
interface). Otherwise, only mac address is set on the device.
Note that This is only supported for SR-IOV Virtual Functions (not for
standard PCI or USB netdevs), and virtualport association is only
supported for 802.1Qbh. For all other types of cards and types of
virtualport, a "Config Unsupported" error is returned and the
operation fails.
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
This patch includes the following changes to virnetdevmacvlan.c and
virnetdevvportprofile.c:
- removes some netlink functions which are now available in
virnetdev.c
- Adds a vf argument to all port profile functions.
For 802.1Qbh devices, the port profile calls can use a vf argument if
passed by the caller. If the vf argument is -1 it will try to derive the vf
if the device passed is a virtual function.
For 802.1Qbg devices, This patch introduces a null check for the device
argument because during port profile assignment on a hostdev, this argument
can be null.
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
qemuDomainAttachNetDevice
- re-ordered some things at start of function because
networkAllocateActualDevice should always be run and a slot
in def->nets always allocated, but host_net_add isn't needed
if the actual type is hostdev.
- if actual type is hostdev, defer to
qemuDomainAttachHostDevice (which will reach up to the NetDef
for things like MAC address when necessary). After return
from qemuDomainAttachHostDevice, slip directly to cleanup,
since the rest of the function is specific to emulated net
devices.
- put assignment of new NetDef into expanded def->nets down
below cleanup: (but only on success) since it is also needed
for emulated and hostdev net devices.
qemuDomainDetachHostDevice
- after locating the exact device to detach, check if it's a
network device and, if so, use toplevel
qemuDomainDetachNetDevice instead so that the def->nets list
is properly updated, and 'actual device' properly returned to
network pool if appropriate. Otherwise, for normal hostdevs,
call the lower level qemuDomainDetachThisDevice.
qemuDomainDetachNetDevice
- This is where it gets a bit tricky. After locating the device
on the def->nets list, if the network device type == hostdev,
call the *lower level* qemuDomainDetachThisDevice (which will
reach back up to the parent net device for MAC address /
virtualport when appropriate, then clear the device out of
def->hostdevs) before skipping past all the emulated
net-device-specific code to cleanup:, where the network
device is removed from def->nets, and the network device
object is freed.
In short, any time a hostdev-type network device is detached, we must
go through the toplevel virDomaineDetachNetDevice function first and
last, to make sure 1) the def->nnets list is properly managed, and 2)
any device allocated with networkAllocateActualDevice is properly
freed. At the same time, in the middle we need to go through the
lower-level vidDomainDetach*This*HostDevice to be sure that 1) the
def->hostdevs list is properly managed, 2) the PCI device is properly
detached from the guest and reattached to the host (if appropriate),
and 3) any higher level teardown is called at the appropriate time, by
reaching back up to the NetDef config (part (3) will be covered in a
separate patch).
This patch makes sure that each network device ("interface") of
type='hostdev' appears on both the hostdevs list and the nets list of
the virDomainDef, and it modifies the qemu driver startup code so that
these devices will be presented to qemu on the commandline as hostdevs
rather than as network devices.
It does not add support for hotplug of these type of devices, or code
to honor the <mac address> or <virtualport> given in the config (both
of those will be done in separate patches).
Once each device is placed on both lists, much of what this patch does
is modify places in the code that traverse all the device lists so
that these hybrid devices are only acted on once - either along with
the other hostdevs, or along with the other network interfaces. (In
many cases, only one of the lists is traversed / a specific operation
is performed on only one type of device. In those instances, the code
can remain unchanged.)
There is one special case - when building the commandline, interfaces
are allowed to proceed all the way through
networkAllocateActualDevice() before deciding to skip the rest of
netdev-specific processing - this is so that (once we have support for
networks with pools of hostdev devices) we can get the actual device
allocated, then rely on the loop processing all hostdevs to generate
the correct commandline.
(NB: <interface type='hostdev'> is only supported for PCI network
devices that are SR-IOV Virtual Functions (VF). Standard PCI[e] and
USB devices, and even the Physical Functions (PF) of SR-IOV devices
can only be assigned to a guest using the more basic <hostdev> device
entry. This limitation is mostly due to the fact that non-SR-IOV
ethernet devices tend to lose mac address configuration whenever the
card is reset, which happens when a card is assigned to a guest;
SR-IOV VFs fortunately don't suffer the same problem.)
This is the new interface type that sets up an SR-IOV PCI network
device to be assigned to the guest with PCI passthrough after
initializing some network device-specific things from the config
(e.g. MAC address, virtualport profile parameters). Here is an example
of the syntax:
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0' bus='0' slot='4' function='3'/>
</source>
<mac address='00:11:22:33:44:55'/>
<address type='pci' domain='0' bus='0' slot='7' function='0'/>
</interface>
This would assign the PCI card from bus 0 slot 4 function 3 on the
host, to bus 0 slot 7 function 0 on the guest, but would first set the
MAC address of the card to 00:11:22:33:44:55.
NB: The parser and formatter don't care if the PCI card being
specified is a standard single function network adapter, or a virtual
function (VF) of an SR-IOV capable network adapter, but the upcoming
code that implements the back end of this config will work *only* with
SR-IOV VFs. This is because modifying the mac address of a standard
network adapter prior to assigning it to a guest is pointless - part
of the device reset that occurs during that process will reset the MAC
address to the value programmed into the card's firmware.
Although it's not supported by any of libvirt's hypervisor drivers,
usb network hostdevs are also supported in the parser and formatter
for completeness and consistency. <source> syntax is identical to that
for plain <hostdev> devices, except that the <address> element should
have "type='usb'" added if bus/device are specified:
<interface type='hostdev'>
<source>
<address type='usb' bus='0' device='4'/>
</source>
<mac address='00:11:22:33:44:55'/>
</interface>
If the vendor/product form of usb specification is used, type='usb'
is implied:
<interface type='hostdev'>
<source>
<vendor id='0x0012'/>
<product id='0x24dd'/>
</source>
<mac address='00:11:22:33:44:55'/>
</interface>
Again, the upcoming patch to fill in the backend of this functionality
will log an error and fail with "Unsupported Config" if you actually
try to assign a USB network adapter to a guest using <interface
type='hostdev'> - just use a standard <hostdev> entry in that case
(and also for single-port PCI adapters).
This refactoring is necessary to support hotplug detach of
type=hostdev network devices, but needs to be in a separate patch to
make potential debugging of regressions more practical.
Rather than the lowest level functions searching for a matching
device, the search is now done in the toplevel function, and an
intermediate-level function (qemuDomainDetachThisHostDevice()), which
expects that the device's entry is already found, is called (this
intermediate function will be called by qemuDomainDetachNetDevice() in
order to support detach of type=hostdev net devices)
This patch should result in 0 differences in functionality.
In order to allow for a virDomainHostdevDef that uses the
virDomainDeviceInfo of a "higher level" device (such as a
virDomainNetDef), this patch changes the virDomainDeviceInfo in the
HostdevDef into a virDomainDeviceInfoPtr. Rather than adding checks
all over the code to check for a null info, we just guarantee that it
is always valid. The new function virDomainHostdevDefAlloc() allocates
a virDomainDeviceInfo and plugs it in, and virDomainHostdevDefFree()
makes sure it is freed.
There were 4 places allocating virDomainHostdevDefs, all of them
parsers of one sort or another, and those have all had their
VIR_ALLOC(hostdev) changed to virDomainHostdevDefAlloc(). Other than
that, and the new functions, all the rest of the changes are just
mechanical removals of "&" or changing "." to "->".
There will be cases where the iterator callback will need to know the
type of the device whose info is being operated on, and possibly even
need to use some of the device's config. This patch adds a
virDomainDeviceDefPtr to the args of every callback, and fills it in
appropriately as the devices are iterated through.
The virDomainDeviceInfoPtrs in qemuCollectPCIAddress and
qemuComparePCIDevice are named "dev" and "dev1", but those functions
will be changed (in order to match a change in the args sent to
virDomainDeviceInfoIterate() callback args) to contain a
virDomainDeviceDefPtr device.
This patch renames "dev" to "info" (and "dev[n]" to "info[n]") to
avoid later confusion.
Qemu supports sizing by bytes; we shouldn't force the user to
round up if they really wanted an unaligned total size.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_RESIZE_BYTES):
New flag.
* src/libvirt.c (virDomainBlockResize): Document it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockResize): Take
size in bytes.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextBlockResize):
Likewise. Pass bytes, not megabytes, to monitor.
* src/qemu/qemu_driver.c (qemuDomainBlockResize): Implement new
flag.
No matter what cache mode is used, readonly disks are always safe wrt
migration. Shared disks are required to be readonly or to disable
host-side cache, which makes them safe as well.
With an additional new bool added to determine whether or not to
discourage the use of the supplied MAC address by the bridge itself,
virNetDevTapCreateInBridgePort had three booleans (well, 2 bools and
an int used as a bool) in the arg list, which made it increasingly
difficult to follow what was going on. This patch combines those three
into a single flags arg, which not only shortens the arg list, but
makes it more self-documenting.
When a tap device for a domain is created and attached to a bridge,
the first byte of the tap device MAC address is set to 0xFE, while the
rest is set to match the MAC address that will be presented to the
guest as its network device MAC address. Setting this high value in
the tap's MAC address discourages the bridge from using the tap
device's MAC address as the bridge's own MAC address (Linux bridges
always take on the lowest numbered MAC address of all attached devices
as their own).
In one case within libvirt, a tap device is created and attached to
the bridge with the intent that its MAC address be taken on by the
bridge as its own (this is used to assure that the bridge has a fixed
MAC address to prevent network outages created by the bridge MAC
address "flapping" as guests are started and stopped). In this case,
the first byte of the mac address is *not* altered to 0xFE.
In the current code, callers to virNetDevTapCreateInBridgePort each
make the MAC address modification themselves before calling, which
leads to code duplication, and also prevents lower level functions
from knowing the real MAC address being used by the guest. The problem
here is that openvswitch bridges must be informed about this MAC
address, or they will be unable to pass traffic to/from the guest.
This patch centralizes the location of the MAC address "0xFE fixup"
into virNetDevTapCreateInBridgePort(), meaning 1) callers of this
function no longer need the extra strange bit of code, and 2)
bitNetDevTapCreateBridgeInPort itself now is called with the guest's
unaltered MAC address, and can pass it on, unmodified, to
virNetDevOpenvswitchAddPort.
There is no other behavioral change created by this patch.
No thanks to 64-bit windows, with 64-bit pid_t, we have to avoid
constructs like 'int pid'. Our API in libvirt-qemu cannot be
changed without breaking ABI; but then again, libvirt-qemu can
only be used on systems that support UNIX sockets, which rules
out Windows (even if qemu could be compiled there) - so for all
points on the call chain that interact with this API decision,
we require a different variable name to make it clear that we
audited the use for safety.
Adding a syntax-check rule only solves half the battle; anywhere
that uses printf on a pid_t still needs to be converted, but that
will be a separate patch.
* cfg.mk (sc_correct_id_types): New syntax check.
* src/libvirt-qemu.c (virDomainQemuAttach): Document why we didn't
use pid_t for pid, and validate for overflow.
* include/libvirt/libvirt-qemu.h (virDomainQemuAttach): Tweak name
for syntax check.
* src/vmware/vmware_conf.c (vmwareExtractPid): Likewise.
* src/driver.h (virDrvDomainQemuAttach): Likewise.
* tools/virsh.c (cmdQemuAttach): Likewise.
* src/remote/qemu_protocol.x (qemu_domain_attach_args): Likewise.
* src/qemu_protocol-structs (qemu_domain_attach_args): Likewise.
* src/util/cgroup.c (virCgroupPidCode, virCgroupKillInternal):
Likewise.
* src/qemu/qemu_command.c(qemuParseProcFileStrings): Likewise.
(qemuParseCommandLinePid): Use pid_t for pid.
* daemon/libvirtd.c (daemonForkIntoBackground): Likewise.
* src/conf/domain_conf.h (_virDomainObj): Likewise.
* src/probes.d (rpc_socket_new): Likewise.
* src/qemu/qemu_command.h (qemuParseCommandLinePid): Likewise.
* src/qemu/qemu_driver.c (qemudGetProcessInfo, qemuDomainAttach):
Likewise.
* src/qemu/qemu_process.c (qemuProcessAttach): Likewise.
* src/qemu/qemu_process.h (qemuProcessAttach): Likewise.
* src/uml/uml_driver.c (umlGetProcessInfo): Likewise.
* src/util/virnetdev.h (virNetDevSetNamespace): Likewise.
* src/util/virnetdev.c (virNetDevSetNamespace): Likewise.
* tests/testutils.c (virtTestCaptureProgramOutput): Likewise.
* src/conf/storage_conf.h (_virStoragePerms): Use mode_t, uid_t,
and gid_t rather than int.
* src/security/security_dac.c (virSecurityDACSetOwnership): Likewise.
* src/conf/storage_conf.c (virStorageDefParsePerms): Avoid
compiler warning.
This actually wires up the new optional parameter to block_stream:
http://wiki.qemu.org/Features/LiveBlockMigration/ImageStreamingAPI
The error checking is still sparse, since libvirt must not use
qemu-img or header probing on a qcow2 file in use by qemu to
check if the backing file name is valid; so for now, libvirt is
relying on qemu to diagnose an incorrect backing name. Fixing this
will require libvirt to track the entire backing file chain at the
time qemu is started and keeps it updated with snapshot and pull
operations.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Add
parameter, and update callers.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockJob): Update
signature.
* src/qemu/qemu_monitor.h (qemuMonitorBlockJob): Likewise.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Update caller.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Likewise.
Block job commands are not part of upstream qemu until 1.1; and
proper support of job completion and cancellation depends on being
able to receive QMP events, which implies the JSON monitor.
Additionally, some early versions of block job commands were
backported to RHEL qemu, but these versions lacked asynchronous
job cancellation and partial block pull, so there are several
patches that will still be needed in this area of libvirt code
to support both flavors of block job commands.
Due to earlier patches in libvirt, we are guaranteed that all versions
of qemu that support block job commands already require libvirt to
use the JSON monitor. That means that the text version of block jobs
will not be used, and having to refactor two copies of the block job
handlers makes no sense. So instead, we delete the text handlers.
* src/qemu/qemu_monitor.c (qemuMonitorBlockJob): Drop text monitor
support.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextBlockJob): Delete.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextParseBlockJobOne)
(qemuMonitorTextParseBlockJob, qemuMonitorTextBlockJob):
Likewise.
Add de-association handling for 802.1qbg (vepa) via lldpad
netlink messages. Also adds the possibility to perform an
association request without waiting for a confirmation.
Signed-off-by: D. Herrendoerfer <d.herrendoerfer@herrendoerfer.name>
For any disk controller model which is not "lsilogic", the command
line will be like:
-drive file=/dev/sda,if=none,id=drive-scsi0-0-3-0,format=raw \
-device scsi-disk,bus=scsi0.0,channel=0,scsi-id=3,lun=0,i\
drive=drive-scsi0-0-3-0,id=scsi0-0-3-0
The relationship between the libvirt address attrs and the qdev
properties are (controller model is not "lsilogic"; strings
inside <> represent libvirt adress attrs):
bus=scsi<controller>.0
channel=<bus>
scsi-id=<target>
lun=<unit>
* src/qemu/qemu_command.h: (New param "virDomainDefPtr def"
for function qemuBuildDriveDevStr; new param "virDomainDefPtr
vmdef" for function qemuAssignDeviceDiskAlias. Both for
virDomainDiskFindControllerModel's use).
* src/qemu/qemu_command.c:
- New param "virDomainDefPtr def" for qemuAssignDeviceDiskAliasCustom.
For virDomainDiskFindControllerModel's use, if the disk bus is "scsi"
and the controller model is not "lsilogic", "target" is one part of
the alias name.
- According change on qemuAssignDeviceDiskAlias and qemuBuildDriveDevStr
* src/qemu/qemu_hotplug.c:
- Changes to be consistent with declarations of qemuAssignDeviceDiskAlias
qemuBuildDriveDevStr, and qemuBuildControllerDevStr.
* tests/qemuxml2argvdata/qemuxml2argv-pseries-vio-user-assigned.args,
tests/qemuxml2argvdata/qemuxml2argv-pseries-vio.args: Update the
generated command line.
KVM will be able to use a PCI SCSI controller even on POWER. Let
the user specify the vSCSI controller by other means than a default.
After this patch, the QEMU driver will actually look at the model
and reject anything but auto, lsilogic and ibmvscsi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
In qemuDomainAttachNetDevice, the guest's tap interface has only been
attached to the bridge if iface_connected is true. It's possible for
an error to occur prior to that happening, and previously we would
attempt to remove the tap interface from the bridge even if it hadn't
been attached.
QMP commands don't need to be escaped since converting them to json
also escapes special characters. When a QMP command fails, however,
libvirt falls back to HMP commands. These fallback functions
(qemuMonitorText*) do their own escaping, and pass the result directly
to qemuMonitorHMPCommandWithFd. If the monitor is in json mode, these
pre-escaped commands will be escaped again when converted to json,
which can result in the wrong arguments being sent.
For example, a filename test\file would be sent in json as
test\\file.
This prevented attaching an image file with a " or \ in its name in
qemu 1.0.50, and also broke rbd attachment (which uses backslashes to
escape some internal arguments.)
Reported-by: Masuko Tomoya <tomoya.masuko@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch fixes console corruption, that happens if two concurrent
sessions are opened for a single console on a domain. Result of this
corruption was that each of the console streams recieved just a part
of the data written to the pipe so every console rendered unusable.
New helper function for safe console handling is used to establish the
console stream connection. This function ensures that no other libvirt
client is using the console (with the ability to disconnect consoles of
libvirt clients) and that no UUCP style lockfile is placed on the PTY
device.
* src/qemu/qemu_domain.h
- add data structure to domain's private data dealing with
console connections
* src/qemu/qemu_domain.c:
- allocate/free domain's console data structure
* src/qemu/qemu_driver.c
- use the new helper function for console handling
using 'system-wakeup' monitor command. It is supported only in JSON,
as we are enabling it if possible. Moreover, this command is available
in qemu-1.1+ which definitely has JSON.
Function xmlParseURI does not remove square brackets around IPv6
address when parsing. One of the solutions is making wrappers around
functions working with xmlURI*. This assures that uri->server will be
always properly assigned and it doesn't have to be changed when used
on some new place in the code.
For this purpose, functions virParseURI and virSaveURI were
added. These function are wrappers around xmlParseURI and xmlSaveUri
respectively.
Also there is one new syntax check function to prohibit these functions
anywhere else.
File changes:
- src/util/viruri.h -- declaration
- src/util/viruri.c -- definition
- src/libvirt_private.syms -- symbol export
- src/Makefile.am -- added source and header files
- cfg.mk -- added sc_prohibit_xmlURI
- all others -- ID name and include fixes
It's possible to disable SPICE TLS in qemu.conf. When this happens,
libvirt ignores any SPICE TLS port or x509 directory that may have
been set when it builds the qemu command line to use. However, it's
not ignoring the secure channels that may have been set and adds
tls-channel arguments to qemu command line.
Current qemu versions don't report an error when this happens, and try to use
TLS for the specified channels.
Before this patch
<domain type='kvm'>
<name>auto-tls-port</name>
<memory>65536</memory>
<os>
<type arch='x86_64' machine='pc'>hvm</type>
</os>
<devices>
<graphics type='spice' port='5900' tlsPort='-1' autoport='yes' listen='0' ke
<listen type='address' address='0'/>
<channel name='main' mode='secure'/>
<channel name='inputs' mode='secure'/>
</graphics>
</devices>
</domain>
generates
-spice port=5900,addr=0,disable-ticketing,tls-channel=main,tls-channel=inputs
and starts QEMU.
After this patch, an error is reported if a TLS port is set in the XML
or if secure channels are specified but TLS is disabled in qemu.conf.
This is the behaviour the oVirt people (where I spotted this issue) said
they would expect.
This fixes bug #790436
https://bugzilla.redhat.com/show_bug.cgi?id=795656 mentions
that a graceful destroy request can time out, meaning that the
error message is user-visible and should be more appropriate
than just internal error.
* src/qemu/qemu_driver.c (qemuDomainDestroyFlags): Swap error type.
Migrating domains with disks using cache != none is unsafe unless the
disk images are stored on coherent clustered filesystem. Thus we forbid
migrating such domains unless VIR_MIGRATE_UNSAFE flags is used.
* src/qemu/qemu_process.c (qemuFindAgentConfig): avoid crash libvirtd due to
deref a NULL pointer.
* How to reproduce?
1. virsh edit the following xml into guest configuration:
<channel type='pty'>
<target type='virtio'/>
</channel>
2. virsh start <domain>
or
% virt-install -n foo -r 1024 --disk path=/var/lib/libvirt/images/foo.img,size=1 \
--channel pty,target_type=virtio -l <installation tree>
Signed-off-by: Alex Jia <ajia@redhat.com>
When migrating a qemu domain, we enter the monitor, send some commands,
try to connect to destination qemu, send other commands, end exit the
monitor. However, if we couldn't connect to destination qemu we forgot
to exit the monitor.
Bug introduced by commit d9d518b1c8.
In case libvirtd cannot detect host CPU model (which may happen if it
runs inside a virtual machine), the daemon is likely to segfault when
starting a new qemu domain. It segfaults when domain XML asks for host
(either model or passthrough) CPU or does not ask for any specific CPU
model at all.
This patch allows libvirt to add interfaces to already
existing Open vSwitch bridges. The following syntax in
domain XML file can be used:
<interface type='bridge'>
<mac address='52:54:00:d0:3f:f2'/>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
<parameters interfaceid='921a80cd-e6de-5a2e-db9c-ab27f15a6e1d'/>
</virtualport>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
or if libvirt should auto-generate the interfaceid use
following syntax:
<interface type='bridge'>
<mac address='52:54:00:d0:3f:f2'/>
<source bridge='ovsbr'/>
<virtualport type='openvswitch'>
</virtualport>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
</interface>
It is also possible to pass an optional profileid. To do that
use following syntax:
<interface type='bridge'>
<source bridge='ovsbr'/>
<mac address='00:55:1a:65:a2:8d'/>
<virtualport type='openvswitch'>
<parameters interfaceid='921a80cd-e6de-5a2e-db9c-ab27f15a6e1d'
profileid='test-profile'/>
</virtualport>
</interface>
To create Open vSwitch bridge install Open vSwitch and
run the following command:
ovs-vsctl add-br ovsbr
The current default method of terminating the qemu process is to send
a SIGTERM, wait for up to 1.6 seconds for it to cleanly shutdown, then
send a SIGKILL and wait for up to 1.4 seconds more for the process to
terminate. This is problematic because occasionally 1.6 seconds is not
long enough for the qemu process to flush its disk buffers, so the
guest's disk ends up in an inconsistent state.
Since this only occasionally happens when the timeout prior to SIGKILL
is 1.6 seconds, this patch increases that timeout to 10 seconds. At
the very least, this should reduce the occurrence from "occasionally"
to "extremely rarely". (Once SIGKILL is sent, it waits another 5
seconds for the process to die before returning).
Note that in the cases where it takes less than this for qemu to
shutdown cleanly, libvirt will *not* wait for any longer than it would
without this patch - qemuProcessKill polls the process and returns as
soon as it is gone.
This patch is based on an earlier patch by Eric Blake which was never
committed:
https://www.redhat.com/archives/libvir-list/2011-November/msg00243.html
Aside from rebasing, this patch only drops the driver lock once (prior
to the first time the function sleeps), then leaves it dropped until
it returns (Eric's patch would drop and re-acquire the lock around
each call to sleep).
At the time Eric sent his patch, the response (from Dan Berrange) was
that, while it wasn't a good thing to be holding the driver lock while
sleeping, we really need to rethink locking wrt the driver object,
switching to a finer-grained approach that locks individual items
within the driver object separately to allow for greater concurrency.
This is a good plan, and at the time it made sense to not apply the
patch because there was no known bug related to the driver lock being
held in this function.
However, we now know that the length of the wait in qemuProcessKill is
sometimes too short to allow the qemu process to fully flush its disk
cache before SIGKILL is sent, so we need to lengthen the timeout (in
order to improve the situation with management applications until they
can be updated to use the new VIR_DOMAIN_DESTROY_GRACEFUL flag added
in commit 72f8a7f197). But, if we
lengthen the timeout, we also lengthen the amount of time that all
other threads in libvirtd are essentially blocked from doing anything
(since just about everything needs to acquire the driver lock, if only
for long enough to get a pointer to a domain).
The solution is to modify qemuProcessKill to drop the driver lock
while sleeping, as proposed in Eric's patch. Then we can increase the
timeout with a clear conscience, and thus at least lower the chances
that someone running with existing management software will suffer the
consequence's of qemu's disk cache not being flushed.
In the meantime, we still should work on Dan's proposal to make
locking within the driver object more fine grained.
(NB: although I couldn't find any instance where qemuProcessKill() was
called with no jobs active for the domain (or some other guarantee
that the current thread had at least one refcount on the domain
object), this patch still follows Eric's method of temporarily adding
a ref prior to unlocking the domain object, because I couldn't
convince myself 100% that this was the case.)
In the future (my next patch in fact) we may want to make
decisions depending on qemu having a monitor command or not.
Therefore, we want to set qemuCaps flag instead of querying
on the monitor each time we are about to make that decision.
When blkdeviotune was first committed in 0.9.8, we had the limitation
that setting one value reset all others. But bytes and iops should
be relatively independent. Furthermore, setting tuning values on
a live domain followed by dumpxml did not output the new settings.
* src/qemu/qemu_driver.c (qemuDiskPathToAlias): Add parameter, and
update callers.
(qemuDomainSetBlockIoTune): Don't lose previous unrelated
settings. Make live changes reflect to dumpxml output.
* tools/virsh.pod (blkdeviotune): Update documentation.
The auto-generated WWN comply with the new addressing schema of WWN:
<quote>
the first nibble is either hex 5 or 6 followed by a 3-byte vendor
identifier and 36 bits for a vendor-specified serial number.
</quote>
We choose hex 5 for the first nibble. And for the 3-bytes vendor ID,
we uses the OUI according to underlying hypervisor type, (invoking
virConnectGetType to get the virt type). e.g. If virConnectGetType
returns "QEMU", we use Qumranet's OUI (00:1A:4A), if returns
ESX|VMWARE, we use VMWARE's OUI (00:05:69). Currently it only
supports qemu|xen|libxl|xenapi|hyperv|esx|vmware drivers. The last
36 bits are auto-generated.
Some tools, such as virt-manager, prefers having the default USB
controller explicit in the XML document. This patch makes sure there
is one. With this patch, it is now possible to switch from USB1 to
USB2 from the release 0.9.1 of virt-manager.
Fix tests to pass with this change.
virsh blkiotune dom --device-weights /dev/sda,400 --config
wasn't working correctly.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Use
correct definition.
Reported by Alex Jia:
==21503== 112 (32 direct, 80 indirect) bytes in 1 blocks are
definitely lost in loss record 37 of 40
==21503== at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==21503== by 0x4A8991: virAlloc (memory.c:101)
==21503== by 0x505A6C: x86DataCopy (cpu_x86.c:247)
==21503== by 0x507B34: x86Compute (cpu_x86.c:1225)
==21503== by 0x43103C: qemuBuildCommandLine (qemu_command.c:3561)
==21503== by 0x41C9F7: testCompareXMLToArgvHelper
(qemuxml2argvtest.c:183)
==21503== by 0x41E10D: virtTestRun (testutils.c:141)
==21503== by 0x41B942: mymain (qemuxml2argvtest.c:705)
==21503== by 0x41D7E7: virtTestMain (testutils.c:696)
Qemu uses non-blocking I/O which doesn't play nice with regular file
descriptors. We need to pass a pipe to qemu instead, which can easily be
done using iohelper.
virFileDirectFd was used for accessing files opened with O_DIRECT using
libvirt_iohelper. We will want to use the helper for accessing files
regardless on O_DIRECT and thus virFileDirectFd was generalized and
renamed to virFileWrapperFd.
Calling qemuDomainMigrateGraphicsRelocate notifies spice clients to
connect to destination qemu so that they can seamlessly switch streams
once migration is done. Unfortunately, current qemu is not able to
accept any connections while incoming migration connection is open.
Thus, we need to delay opening the migration connection to the point
spice client is already connected to the destination qemu.
This eliminates the warning message reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=624447
It was caused by a failure to open an image file that is not
accessible by root (the uid libvirtd is running as) because it's on a
root-squash NFS share, owned by a different user, with permissions of
660 (or maybe 600).
The solution is to use virFileOpenAs() rather than open(). The
codepath that generates the error is during qemuSetupDiskCGroup(), but
the actual open() is in a lower-level generic function called from
many places (virDomainDiskDefForeachPath), so some other pieces of the
code were touched just to add dummy (or possibly useful) uid and gid
arguments.
Eliminating this warning message has the nice side effect that the
requested operation may even succeed (which in this case isn't
necessary, but shouldn't hurt anything either).
virFileOpenAs previously would only try opening a file as the current
user, or as a different user, but wouldn't try both methods in a
single call. This made it cumbersome to use as a replacement for
open(2). Additionally, it had a lot of historical baggage that led to
it being difficult to understand.
This patch refactors virFileOpenAs in the following ways:
* reorganize the code so that everything dealing with both the parent
and child sides of the "fork+setuid+setgid+open" method are in a
separate function. This makes the public function easier to understand.
* Allow a single call to virFileOpenAs() to first attempt the open as
the current user, and if that fails to automatically re-try after
doing fork+setuid (if deemed appropriate, i.e. errno indicates it
would now be successful, and the file is on a networkFS). This makes
it possible (in many, but possibly not all, cases) to drop-in
virFileOpenAs() as a replacement for open(2).
(NB: currently qemuOpenFile() calls virFileOpenAs() twice, once
without forking, then again with forking. That unfortunately can't
be changed without at least some discussion of the ramifications,
because the requested file permissions are different in each case,
which is something that a single call to virFileOpenAs() can't deal
with.)
* Add a flag so that any fchown() of the file to a different uid:gid
is explicitly requested when the function is called, rather than it
being implied by the presence of the O_CREAT flag. This just makes
for less subtle surprises to consumers. (Commit
b1643dc15c added the check for O_CREAT
before forcing ownership. This patch just makes that restriction
more explicit.)
* If either the uid or gid is specified as "-1", virFileOpenAs will
interpret this to mean "the current [gu]id".
All current consumers of virFileOpenAs should retain their present
behavior (after a few minor changes to their setup code and
arguments).
When libvirt's virDomainDestroy API is shutting down the qemu process,
it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the
process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss
because the guest running in qemu doesn't have time to flush its disk
cache buffers before it's unceremoniously whacked.
This patch maintains that default behavior, but provides a new flag
VIR_DOMAIN_DESTROY_GRACEFUL to alter the behavior. If this flag is set
in the call to virDomainDestroyFlags, SIGKILL will never be sent to
the qemu process; instead, if the timeout is reached and the qemu
process still exists, virDomainDestroy will return an error.
Once this patch is in, the recommended method for applications to call
virDomainDestroyFlags will be with VIR_DOMAIN_DESTROY_GRACEFUL
included. If that fails, then the application can decide if and when
to call virDomainDestroyFlags again without
VIR_DOMAIN_DESTROY_GRACEFUL (to force the issue with SIGKILL).
(Note that this does not address the issue of existing applications
that have not yet been modified to use VIR_DOMAIN_DESTROY_GRACEFUL.
That is a separate patch.)
Curently security labels can be of type 'dynamic' or 'static'.
If no security label is given, then 'dynamic' is assumed. The
current code takes advantage of this default, and avoids even
saving <seclabel> elements with type='dynamic' to disk. This
means if you temporarily change security driver, the guests
can all still start.
With the introduction of sVirt to LXC though, there needs to be
a new default of 'none' to allow unconfined LXC containers.
This patch introduces two new security label types
- default: the host configuration decides whether to run the
guest with type 'none' or 'dynamic' at guest start
- none: the guest will run unconfined by security policy
The 'none' label type will obviously be undesirable for some
deployments, so a new qemu.conf option allows a host admin to
mandate confined guests. It is also possible to turn off default
confinement
security_default_confined = 1|0 (default == 1)
security_require_confined = 1|0 (default == 0)
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add new
seclabel types
* src/security/security_manager.c, src/security/security_manager.h:
Set default sec label types
* src/security/security_selinux.c: Handle 'none' seclabel type
* src/qemu/qemu.conf, src/qemu/qemu_conf.c, src/qemu/qemu_conf.h,
src/qemu/libvirtd_qemu.aug: New security config options
* src/qemu/qemu_driver.c: Tell security driver about default
config
This is a trivial implementation, which works with the current
released qemu 1.0 with backports of preliminary block pull but
no partial rebase. Future patches will update the monitor handling
to support an optional parameter for partial rebase; but as qemu
1.1 is unreleased, it can be in later patches, designed to be
backported on top of the supported API.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Add parameter,
and adjust callers. Drop redundant check.
(qemuDomainBlockPull): Move guts...
(qemuDomainBlockRebase): ...to new function.
This patch adds support for the new api into the qemu driver to support
modification and retrieval of domain description and title. This patch
does not add support for modifying the <metadata> element.
In qemuDomainShutdownFlags if we try to use guest agent,
which has error or is not configured, we jump go endjob
label even if we haven't started any job yet. This may
lead to the daemon crash:
1) virsh shutdown --mode agent on a domain without agent configured
2) wait until domain quits
3) virsh edit
Fix my typo at
commit 74e034964c
"disk->rawio == -1" indicates that this value is not
specified. So in case of this, domain must not
be tainted.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
This patch revises qemuProcessStart() function for qemu
processes to retain CAP_SYS_RAWIO if needed.
And in case of that, add taint flag to domain.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Shota Hirae <m11g1401@hibikino.ne.jp>
This patch addresses: https://bugzilla.redhat.com/show_bug.cgi?id=781562
Along with the "rombar" option that controls whether or not a boot rom
is made visible to the guest, qemu also has a "romfile" option that
allows specifying a binary file to present as the ROM BIOS of any
emulated or passthrough PCI device. This patch adds support for
specifying romfile to both passthrough PCI devices, and emulated
network devices that attach to the guest's PCI bus (just about
everything other than ne2k_isa).
One example of the usefulness of this option is described in the
bugzilla report: 82576 sriov network adapters don't provide a ROM BIOS
for the cards virtual functions (VF), but an image of such a ROM is
available, and with this ROM visible to the guest, it can PXE boot.
In libvirt's xml, the new option is configured like this:
<hostdev>
...
<rom file='/etc/fake/boot.bin'/>
...
</hostdev
(similarly for <interface>).
When support for the rombar option was added, it was only added for
PCI passthrough devices, configured with <hostdev>. The same option is
available for any network device that is attached to the guest's PCI
bus. This patch allows setting rombar for any PCI network device type.
After adding cases to test this to qemuxml2argv-hostdev-pci-rombar.*,
I decided to rename those files (to qemuxml2argv-pci-rom.*) to more
accurately reflect the additional tests, and also noticed that up to
now we've only been performing a domainschematest for that case, so I
added the "pci-rom" test to both qemuxml2argv and qemuxml2xml (and in
the process found some bugs whose fixes I squashed into previous
commits of this series).
To help consolidate the commonality between virDomainHostdevDef and
virDomainNetDef into as few members as possible (and because I
think it makes sense), this patch moves the rombar and bootIndex
members into the "info" member that is common to both (and to all the
other structs that use them).
It's a bit problematic that this gives rombar and bootIndex to many
device types that don't use them, but this is already the case for the
master and mastertype members of virDomainDeviceInfo, and is properly
commented as such in the definition.
Note that this opens the door to supporting rombar for other devices
that are attached to the guest PCI bus - virtio-blk-pci,
virtio-net-pci, various other network adapters - which which have that
capability in qemu, but previously had no support in libvirt.
If yajl was not compiled in, we end up freeing an incoming
parameter, which leads to a bogus free later on. Regression
introduced in commit 6e769eb.
* src/qemu/qemu_capabilities.c (qemuCapsParseHelpStr): Avoid alloc
on failure path, which in turn fixes bogus free.
Reported by Cole Robinson.
QEMU supports a bunch of CPUID features that are tied to the kvm CPUID
nodes rather than the processor's. They are "kvmclock",
"kvm_nopiodelay", "kvm_mmu", "kvm_asyncpf". These are not known to
libvirt and their CPUID leaf might move if (for example) the Hyper-V
extensions are enabled. Hence their handling would anyway require some
special-casing.
However, among these the most useful is kvmclock; an additional
"property" of this feature is that a <timer> element is a better model
than a CPUID feature. Although, creating part of the -cpu command-line
from something other than the <cpu> XML element introduces some
ugliness.
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid creating an empty <cpu> element when the QEMU command-line simply
specifies the default "-cpu qemu32" or "-cpu qemu64".
This requires the previous patch, which lets us represent "-cpu qemu32"
as <os arch='i686'> in the generated XML.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The qemu32 CPU model is chosen based on the <os arch=...> name when
creating the QEMU command line for a 64-bit host. For the opposite
transformation we can test the guest CPU model for the "lm" feature.
If it is absent, def->os.arch needs to be corrected.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When running under KVM, the arch is usually set to i686 because
the name of the emulator is not qemu-system-x86_64. Use the host
arch instead.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The qemu developers have made it clear that modern qemu will no
longer guarantee human monitor command stability; furthermore,
some features, such as async events, are only supported via qmp.
If we are compiled without support for handling JSON, we cannot
expect to sanely interact with modern qemu.
However, things must continue to build on RHEL 5, where qemu
is stuck at 0.10, and where yajl is not available.
Another benefit of this patch: future additions of new monitor
commands need only focus on qemu_monitor_json.c, instead of
also wasting time with qemu_monitor_text.c.
* src/qemu/qemu_capabilities.c (qemuCapsComputeCmdFlags): Report
error if yajl is missing but qemu requires qmp.
(qemuCapsParseHelpStr): Propagate error.
(qemuCapsExtractVersionInfo): Update caller.
* tests/qemuhelptest.c (testHelpStrParsing): Likewise.
I'm getting tired of remembering to backport RHEL-specific
patches when building upstream libvirt on RHEL 6.x or CentOS.
All the affected versions of RHEL qemu-kvm have backported
enough patches to a) make JSON useful, and b) modify the
-help text to mention libvirt as the preferred interface;
which means this string in the help output is a reliable
indicator that we can outsmart a strict version check,
even when upstream qemu 0.12 lacked the needed features.
* src/qemu/qemu_capabilities.c (qemuCapsComputeCmdFlags):
Recognize particular help string present when enough features were
backported to be worth using JSON.
* tests/qemuhelptest.c (mymain): Update tests accordingly.
QEMU always sends details about all available block devices as an answer
for "info block"/"query-block" command. On the other hand, our
qemuMonitorGetBlockInfo was made for a single block devices queries
only. Thus, when asking for multiple devices, we asked qemu multiple
times to always get the same answer from which different parts were
filtered. This patch makes qemuMonitorGetBlockInfo return a hash table
of all block devices, which may later be used for getting details about
specific devices.
In preparation for the patch to include Murmurhash3, which
introduces a virhashcode.h and virhashcode.c files, rename
the existing hash.h and hash.c to virhash.h and virhash.c
respectively.
Direct boot (using kernel, initrd, and command line) is used by
virt-install/virt-manager for network install. While any bootindex has
no direct effect since -kernel is always first, we need it as a hint for
SeaBIOS to present disks in the same order as they will be presented
during normal boot.
This makes use of the QEMU guest agent to implement the
virDomainShutdownFlags and virDomainReboot APIs. With
no flags specified, it will prefer to use the agent, but
fallback to ACPI. Explicit choice can be made by using
a suitable flag
* src/qemu/qemu_driver.c: Wire up use of agent
There is now a standard QEMU guest agent that can be installed
and given a virtio serial channel
<channel type='unix'>
<source mode='bind' path='/var/lib/libvirt/qemu/f16x86_64.agent'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
The protocol that runs over the guest agent is JSON based and
very similar to the JSON monitor. We can't use exactly the same
code because there are some odd differences in the way messages
and errors are structured. The qemu_agent.c file is based on
a combination and simplification of qemu_monitor.c and
qemu_monitor_json.c
* src/qemu/qemu_agent.c, src/qemu/qemu_agent.h: Support for
talking to the agent for shutdown
* src/qemu/qemu_domain.c, src/qemu/qemu_domain.h: Add thread
helpers for talking to the agent
* src/qemu/qemu_process.c: Connect to agent whenever starting
a guest
* src/qemu/qemu_monitor_json.c: Make variable static
When converting a linear enum to a string, we have checks in
place in the VIR_ENUM_IMPL macro to ensure that there is one
string for every value, which lets us quickly flag if a user
added a value but forgot to add a counterpart string. However,
this only works if we use the _LAST marker.
* cfg.mk (sc_require_enum_last_marker): New syntax check.
* src/conf/domain_conf.h (virDomainSnapshotState): Add new marker.
* src/conf/domain_conf.c (virDomainSnapshotState): Fix offender.
* src/qemu/qemu_monitor_json.c (qemuMonitorWatchdogAction)
(qemuMonitorIOErrorAction, qemuMonitorGraphicsAddressFamily):
Likewise.
* src/util/virtypedparam.c (virTypedParameter): Likewise.
Reusing common code makes things smaller; it also buys us some
additional safety, such as now rejecting duplicate parameters
during a set operation.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters)
(qemuDomainSetMemoryParameters, qemuDomainSetNumaParameters)
(qemuSetSchedulerParametersFlags)
(qemuDomainSetInterfaceParameters, qemuDomainSetBlockIoTune)
(qemuDomainGetBlkioParameters, qemuDomainGetMemoryParameters)
(qemuDomainGetNumaParameters, qemuGetSchedulerParametersFlags)
(qemuDomainBlockStatsFlags, qemuDomainGetInterfaceParameters)
(qemuDomainGetBlockIoTune): Use new helpers.
* src/esx/esx_driver.c (esxDomainSetSchedulerParametersFlags)
(esxDomainSetMemoryParameters)
(esxDomainGetSchedulerParametersFlags)
(esxDomainGetMemoryParameters): Likewise.
* src/libxl/libxl_driver.c
(libxlDomainSetSchedulerParametersFlags)
(libxlDomainGetSchedulerParametersFlags): Likewise.
* src/lxc/lxc_driver.c (lxcDomainSetMemoryParameters)
(lxcSetSchedulerParametersFlags, lxcDomainSetBlkioParameters)
(lxcDomainGetMemoryParameters, lxcGetSchedulerParametersFlags)
(lxcDomainGetBlkioParameters): Likewise.
* src/test/test_driver.c (testDomainSetSchedulerParamsFlags)
(testDomainGetSchedulerParamsFlags): Likewise.
* src/xen/xen_hypervisor.c (xenHypervisorSetSchedulerParameters)
(xenHypervisorGetSchedulerParameters): Likewise.
There was missing capability for blkiotune and thus specifying these
settings caused libvirt to run qemu with invalid parameters and then
reporting qemu error instead of the standard libvirt one. The support
for blkiotune setting was added in upstream qemu repo under commit
0563e191516289c9d2f282a8c50f2eecef2fa773.
pciTrySecondaryBusReset checks if there is active device on the
same bus, however, qemu driver doesn't maintain an effective
list for the inactive devices, and it passes meaningless argument
for parameter "inactiveDevs". e.g. (qemuPrepareHostdevPCIDevices)
if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs)))
return -1;
..skipped...
if (pciResetDevice(dev, driver->activePciHostdevs, pcidevs) < 0)
goto reattachdevs;
NB, the "pcidevs" used above are extracted from domain def, and
thus one won't be able to attach a device of which bus has other
device even detached from host (nodedev-detach). To see more
details of the problem:
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=773667
This patch is to resolve the problem by introducing an inactive
PCI device list (just like qemu_driver->activePciHostdevs), and
the whole logic is:
* Add the device to inactive list during nodedev-dettach
* Remove the device from inactive list during nodedev-reattach
* Remove the device from inactive list during attach-device
(for non-managed device)
* Add the device to inactive list after detach-device, only
if the device is not managed
With the above, we have a sufficient inactive PCI device list, and thus
we can use it for pciResetDevice. e.g.(qemuPrepareHostdevPCIDevices)
if (pciResetDevice(dev, driver->activePciHostdevs,
driver->inactivePciHostdevs) < 0)
goto reattachdevs;
This introduces new attribute wrpolicy with only supported
value as immediate. This will be an optional
attribute with no defaults. This helps specify whether
to skip the host page cache.
When wrpolicy is specified, meaning when wrpolicy=immediate
a writeback is explicitly initiated for the dirty pages in
the host page cache as part of the guest file write operation.
Usage:
<filesystem type='mount' accessmode='passthrough'>
<driver type='path' wrpolicy='immediate'/>
<source dir='/export/to/guest'/>
<target dir='mount_tag'/>
</filesystem>
Currently this only works with type='mount' for the QEMU/KVM driver.
Signed-off-by: Deepak C Shetty <deepakcs@linux.vnet.ibm.com>
In the past we didn't reserve 0:0:2.0 PCI address if there was no video
device assigned to a domain, which made it impossible to add a video
device later on. So we fixed it (commit v0.9.0-37-g7b2cac1) by always
reserving that address. However, that breaks existing domains without
video devices that already have another device assigned to the
problematic address.
This patch reserves address 0:0:2.0 only in case it was not explicitly
assigned to another device, which means libvirt will try to keep this
address free and will not automatically assign it new devices. But
existing domains for which older libvirt already assigned the address to
a non-video device will keep working as they used to work before 0.9.1.
Moreover, users who want to create a domain without a video device and
use its address for another device may do so by explicitly configuring
the PCI address in domain XML.
There are several reasons for doing this:
- the CPU specification is out of libvirt's control so we cannot
guarantee stable guest ABI
- not every feature of a CPU may actually work as expected when
advertised directly to a guest
- migration between two machines with exactly the same CPU may work but
no guarantees can be made
- this mode is not supported and its use is at one's own risk
VIR_DOMAIN_XML_UPDATE_CPU flag for virDomainGetXMLDesc may be used to
get updated custom mode guest CPU definition in case it depends on host
CPU. This patch implements the same behavior for host-model and
host-passthrough CPU modes.
In case a hypervisor doesn't support the exact CPU model requested by a
domain XML, we automatically fallback to a closest CPU model the
hypervisor supports (and make sure we add/remove any additional features
if needed). This patch adds 'fallback' attribute to model element, which
can be used to disable this automatic fallback.
Commit 5d784bd6d7 was a nice attempt to
clarify the semantics by requiring domain name from dxml to either match
original name or dname. However, setting dxml domain name to dname
doesn't really work since destination host needs to know the original
domain name to be able to use it in migration cookies. This patch
requires domain name in dxml to match the original domain name. The
change should be safe and backward compatible since migration would fail
just a bit later in the process.
We can't call qemuCapsExtractVersionInfo() from test code, because it
expects to be able to call the emulator, and for testing we have fake
emulators that can't be executed. For that reason qemuxml2argvtest.c
doesn't call qemuDomainAssignPCIAddresses(), instead it open codes its
own version.
That means we can't call qemuDomainAssignAddresses() from the test code,
instead we need to manually call qemuDomainAssignSpaprVioAddresses().
Also add logic to cope with qemuDomainAssignSpaprVioAddresses() failing,
so that we can write a test that checks for a known failure in there.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
KVM will be able to use a PCI SCSI controller even on POWER. Let
the user specify the vSCSI controller by other means than a default.
After this patch, the QEMU driver will actually look at the model
and reject anything but auto, lsilogic and ibmvscsi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new introduced optional attribute "copy_on_read</code> controls
whether to copy read backing file into the image file. The value can
be either "on" or "off". Copy-on-read avoids accessing the same backing
file sectors repeatedly and is useful when the backing file is over a
slow network. By default copy-on-read is off.
QEMU does not support security_model for anything but 'path' fs driver type.
Currently in libvirt, when security_model ( accessmode attribute) is not
specified it auto-generates it irrespective of the fs driver type, which
can result in a qemu error for drivers other than path. This patch ensures
that the qemu cmdline is correctly generated by taking into account the
fs driver type.
Signed-off-by: Deepak C Shetty <deepakcs@linux.vnet.ibm.com>
When sVirt is integrated with the LXC driver, it will be neccessary
to invoke the security driver APIs using only a virDomainDefPtr
since the lxc_container.c code has no virDomainObjPtr available.
Aside from two functions which want obj->pid, every bit of the
security driver code only touches obj->def. So we don't need to
pass a virDomainObjPtr into the security drivers, a virDomainDefPtr
is sufficient. Two functions also gain a 'pid_t pid' argument.
* src/qemu/qemu_driver.c, src/qemu/qemu_hotplug.c,
src/qemu/qemu_migration.c, src/qemu/qemu_process.c,
src/security/security_apparmor.c,
src/security/security_dac.c,
src/security/security_driver.h,
src/security/security_manager.c,
src/security/security_manager.h,
src/security/security_nop.c,
src/security/security_selinux.c,
src/security/security_stack.c: Change all security APIs to use a
virDomainDefPtr instead of virDomainObjPtr
When disk snapshots were first implemented, libvirt blindly refused
to allow an external snapshot destination that already exists, since
qemu will blindly overwrite the contents of that file during the
snapshot_blkdev monitor command, and we don't like a default of
data loss by default. But VDSM has a scenario where NFS permissions
are intentionally set so that the destination file can only be
created by the management machine, and not the machine where the
guest is running, so that libvirt will necessarily see the destination
file already existing; adding a flag will allow VDSM to force the file
reuse without libvirt complaining of possible data loss.
https://bugzilla.redhat.com/show_bug.cgi?id=767104
* include/libvirt/libvirt.h.in (virDomainSnapshotCreateFlags): Add
VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT.
* src/libvirt.c (virDomainSnapshotCreateXML): Document it. Add
note about partial failure.
* tools/virsh.c (cmdSnapshotCreate, cmdSnapshotCreateAs): Add new
flag.
* tools/virsh.pod (snapshot-create, snapshot-create-as): Document
it.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateXML): Implement the new flag.
This *kind of* addresses:
https://bugzilla.redhat.com/show_bug.cgi?id=772395
(it doesn't eliminate the failure to start, but causes libvirt to give
a better idea about the cause of the failure).
If a guest uses a kvm emulator (e.g. /usr/bin/qemu-kvm) and the guest
is started when kvm isn't available (either because virtualization is
unavailable / has been disabled in the BIOS, or the kvm modules
haven't been loaded for some reason), a semi-cryptic error message is
logged:
libvirtError: internal error Child process (LC_ALL=C
PATH=/sbin:/usr/sbin:/bin:/usr/bin /usr/bin/qemu-kvm -device ? -device
pci-assign,? -device virtio-blk-pci,? -device virtio-net-pci,?) status
unexpected: exit status 1
This patch notices at process start that a guest needs kvm, and checks
for the presence of /dev/kvm (a reasonable indicator that kvm is
available) before trying to execute the qemu binary. If kvm isn't
available, a more useful (too verbose??) error is logged.
It should be a copy-paste error, the result is programming will result in an
infinite loop again due to without iterating 'j' variable.
* src/qemu/qemu_driver.c: fix a typo on qemuDomainSetBlkioParameters.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=770520
Signed-off-by: Alex Jia <ajia@redhat.com>
In the past, generic SCSI commands issued from a guest to a virtio
disk were always passed through to the underlying disk by qemu, and
the kernel would also pass them on.
As a result of CVE-2011-4127 (see:
http://seclists.org/oss-sec/2011/q4/536), qemu now honors its
scsi=on|off device option for virtio-blk-pci (which enables/disables
passthrough of generic SCSI commands), and the kernel will only allow
the commands for physical devices (not for partitions or logical
volumes). The default behavior of qemu is still to allow sending
generic SCSI commands to physical disks that are presented to a guest
as virtio-blk-pci devices, but libvirt prefers to disable those
commands in the standard virtio block devices, enabling it only when
specifically requested (hopefully indicating that the requester
understands what they're asking for). For this purpose, a new libvirt
disk device type (device='lun') has been created.
device='lun' is identical to the default device='disk', except that:
1) It is only allowed if bus='virtio', type='block', and the qemu
version is "new enough" to support it ("new enough" == qemu 0.11 or
better), otherwise the domain will fail to start and a
CONFIG_UNSUPPORTED error will be logged).
2) The option "scsi=on" will be added to the -device arg to allow
SG_IO commands (if device !='lun', "scsi=off" will be added to the
-device arg so that SG_IO commands are specifically forbidden).
Guests which continue to use disk device='disk' (the default) will no
longer be able to use SG_IO commands on the disk; those that have
their disk device changed to device='lun' will still be able to use SG_IO
commands.
*docs/formatdomain.html.in - document the new device attribute value.
*docs/schemas/domaincommon.rng - allow it in the RNG
*tests/* - update the args of several existing tests to add scsi=off, and
add one new test that will test scsi=on.
*src/conf/domain_conf.c - update domain XML parser and formatter
*src/qemu/qemu_(command|driver|hotplug).c - treat
VIR_DOMAIN_DISK_DEVICE_LUN *almost* identically to
VIR_DOMAIN_DISK_DEVICE_DISK, except as indicated above.
Note that no support for this new device value was added to any
hypervisor drivers other than qemu, because it's unclear what it might
mean (if anything) to those drivers.
This patch adds two capabilities flags to deal with various aspects
of supporting SG_IO commands on virtio-blk-pci devices:
QEMU_CAPS_VIRTIO_BLK_SCSI
set if -device virtio-blk-pci accepts the scsi="on|off" option
When present, this is on by default, but can be set to off to disable
SG_IO functions.
QEMU_CAPS_VIRTIO_BLK_SG_IO
set if SG_IO commands are supported in the virtio-blk-pci driver
(present since qemu 0.11 according to a qemu developer, if I
understood correctly)
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=638633
Although scripts are not used by interfaces of type other than
"ethernet" in qemu, due to the fact that the parser stores the script
name in a union that is only valid when type is ethernet or bridge,
there is no way for anyone except the parser itself to catch the
problem of specifying an interface script for an inappropriate
interface type (by the time the parsed data gets back to the code that
called the parser, all evidence that a script was specified is
forgotten).
Since the parser itself should be agnostic to which type of interface
allows scripts (an example of why: a script specified for an interface
of type bridge is valid for xen domains, but not for qemu domains),
the solution here is to move the script out of the union(s) in the
DomainNetDef, always populate it when specified (regardless of
interface type), and let the driver decide whether or not it is
appropriate.
Currently the qemu, xen, libxml, and uml drivers recognize the script
parameter and do something with it (the uml driver only to report that
it isn't supported). Those drivers have been updated to log a
CONFIG_UNSUPPORTED error when a script is specified for an interface
type that's inappropriate for that particular hypervisor.
(NB: There was earlier discussion of solving this problem by adding a
VALIDATE flag to all libvirt APIs that accept XML, which would cause
the XML to be validated against the RNG files. One statement during
that discussion was that the RNG shouldn't contain hypervisor-specific
things, though, and a proper solution to this problem would require
that (again, because a script for an interface of type "bridge" is
accepted by xen, but not by qemu).
Commit ae523427 missed one pair of functions that could use
the helper routine.
* src/qemu/qemu_driver.c (qemuSetSchedulerParametersFlags)
(qemuGetSchedulerParametersFlags): Simplify.
Detected by valgrind. Leak introduced in commit 5745dc1.
* src/qemu/qemu_command.c: fix memory leak on failure and successful path.
* How to reproduce?
% valgrind -v --leak-check=full ./qemuargv2xmltest
* Actual result:
==2196== 80 bytes in 1 blocks are definitely lost in loss record 3 of 4
==2196== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==2196== by 0x39CF07F6E1: strdup (in /lib64/libc-2.12.so)
==2196== by 0x419823: qemuParseRBDString (qemu_command.c:1657)
==2196== by 0x4221ED: qemuParseCommandLine (qemu_command.c:5934)
==2196== by 0x422AFB: qemuParseCommandLineString (qemu_command.c:7561)
==2196== by 0x416864: testCompareXMLToArgvHelper (qemuargv2xmltest.c:48)
==2196== by 0x417DB1: virtTestRun (testutils.c:141)
==2196== by 0x415CAF: mymain (qemuargv2xmltest.c:175)
==2196== by 0x4174A7: virtTestMain (testutils.c:696)
==2196== by 0x39CF01ECDC: (below main) (in /lib64/libc-2.12.so)
==2196==
==2196== LEAK SUMMARY:
==2196== definitely lost: 80 bytes in 1 blocks
Signed-off-by: Alex Jia <ajia@redhat.com>
When setting numa nodeset for a domain which has no nodeset set
before, libvirtd crashes by dereferencing the pointer to the old
nodemask which is null in that case.
Commit baade4d fixed a memory leak on failure, but in the process,
introduced a use-after-free on success, which can be triggered with:
1. set bandwidth with --live
2. query bandwidth
3. set bandwidth with --live
* src/qemu/qemu_driver.c (qemuDomainSetInterfaceParameters): Don't
free newBandwidth on success.
Reported by Hu Tao.
Most severe here is a latent (but currently untriggered) memory leak
if any hypervisor ever adds a string interface property; the
remainder are mainly cosmetic.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BANDWIDTH_*): Move
macros closer to interface that uses them, and document type.
* src/libvirt.c (virDomainSetInterfaceParameters)
(virDomainGetInterfaceParameters): Formatting tweaks.
* daemon/remote.c (remoteDispatchDomainGetInterfaceParameters):
Avoid memory leak.
* src/libvirt_public.syms (LIBVIRT_0.9.9): Sort lines.
* src/libvirt_private.syms (domain_conf.h): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSetInterfaceParameters): Fix
comments, break long lines.
Leak detected by Coverity, and introduced in commit 93ab585.
Reported by Alex Jia.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Free
devices array on error.
https://bugzilla.redhat.com/show_bug.cgi?id=770520
We had two nested loops both trying to use 'i' as the iteration
variable, which can result in an infinite loop when the inner
loop interferes with the outer loop. Introduced in commit 93ab585.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Don't
reuse iteration variable across two loops.
This patch adds max_files option to qemu.conf which can be used to
override system default limit on number of opened files that are
allowed for qemu user.
Add logic to assign addresses for devices with spapr-vio addresses.
We also do validation of addresses specified by the user, ie. ensuring
that there are not duplicate addresses on the bus.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
For QEMU PPC64 we have a machine type ("pseries") which has a virtual
bus called "spapr-vio". We need to be able to create devices on this
bus, and as such need a way to specify the address for those devices.
This patch adds a new address type "spapr-vio", which achieves this.
The addressing is specified with a "reg" property in the address
definition. The reg is optional, if it is not specified QEMU will
auto-assign an address for the device.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Currently non-x86 guests must have <acpi/> defined in <features> to
prevent libvirt from running qemu with -no-acpi. Although it works, it
is a hack.
Instead add a capability flag which indicates whether qemu understands
the -no-acpi option. Use it to control whether libvirt emits -no-acpi.
Current versions of qemu always display -no-acpi in their help output,
so this patch has no effect. However the development version of qemu
has been modified such that -no-acpi is only displayed when it is
actually supported.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
The lifetime of the virDomainEventState object is tied to
the lifetime of the driver, which in stateless drivers is
tied to the lifetime of the virConnectPtr.
If we add & remove a timer when allocating/freeing the
virDomainEventState object, we can get a situation where
the timer still triggers once after virDomainEventState
has been freed. The timeout callback can't keep a ref
on the event state though, since that would be a circular
reference.
The trick is to only register the timer when a callback
is registered with the event state & remove the timer
when the callback is unregistered.
The demo for the bug is to run
while true ; do date ; ../tools/virsh -q -c test:///default 'shutdown test; undefine test; dominfo test' ; done
prior to this fix, it will frequently hang and / or
crash, or corrupt memory
Currently all drivers using domain events need to provide a callback
for handling a timer to dispatch events in a clean stack. There is
no technical reason for dispatch to go via driver specific code. It
could trivially be dispatched directly from the domain event code,
thus removing tedious boilerplate code from all drivers
Also fix the libxl & xen drivers to pass 'true' when creating the
virDomainEventState, since they run inside the daemon & thus always
expect events to be present.
* src/conf/domain_event.c, src/conf/domain_event.h: Internalize
dispatch of events from timer callback
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
src/qemu/qemu_domain.c, src/qemu/qemu_driver.c,
src/remote/remote_driver.c, src/test/test_driver.c,
src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c: Remove all timer dispatch functions
When registering a callback for a particular event some callers
need to know how many callbacks already exist for that event.
While it is possible to ask for a count, this is not free from
race conditions when threaded. Thus the API for registering
callbacks should return the count of callbacks. Also rename
virDomainEventStateDeregisterAny to virDomainEventStateDeregisterID
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Return count of callbacks when
registering callbacks
* src/libxl/libxl_driver.c, src/libxl/libxl_driver.c,
src/qemu/qemu_driver.c, src/remote/remote_driver.c,
src/remote/remote_driver.c, src/uml/uml_driver.c,
src/vbox/vbox_tmpl.c, src/xen/xen_driver.c: Update
for change in APIs
A generic error code was returned, if the user aborted a migration job.
This made it hard to distinguish between a user requested abort and an
error that might have occured. This patch introduces a new error code,
which is returned in the specific case of a user abort, while leaving
all other failures with their existing code. This makes it easier to
distinguish between failure while mirgrating and an user requested
abort.
* include/libvirt/virterror.h: - add new error code
* src/util/virterror.c: - add message for the new error code
* src/qemu/qemu_migration.h: - Emit operation aborted error instead of
operation failed, on migration abort
If managed save fails at the right point in time, then the save
image can end up with 0 bytes in length (no valid header), and
our attempts in commit 55d88def to detect and skip invalid save
files missed this case.
* src/qemu/qemu_driver.c (qemuDomainSaveImageOpen): Also unlink
empty file as corrupt. Reported by Dennis Householder.
Currently, on device detach, we parse given XML, find the device
in domain object, free it and try to restore security labels.
However, in some cases (e.g. usb hostdev) parsed XML contains
less information than freed device. In usb case it is bus & device
IDs. These are needed during label restoring as a symlink into
/dev/bus is generated from them. Therefore don't drop device
configuration until security labels are restored.
In commit 6f84e110 I mistakenly set default migration speed to
33554432 Mb! The units of migMaxBandwidth is Mb, with conversion
handled in qemuMonitor{JSON,Text}SetMigrationSpeed().
Also, remove definition of QEMU_DOMAIN_FILE_MIG_BANDWIDTH_MAX since
it is no longer used after reverting commit ef1065cf.
If an async job run on a domain will stop the domain at the end of the
job, a concurrently run query job can hang in qemu monitor and nothing
can be done with that domain from this point on. An attempt to start
such domain results in "Timed out during operation: cannot acquire state
change lock" error.
However, quite a few things have to happen at the right time... There
must be an async job running which stops a domain at the end. This race
was reported with dump --crash but other similar jobs, such as
(managed)save and migration, should be able to trigger this bug as well.
While this async job is processing its last monitor command, that is a
query-migrate to which qemu replies with status "completed", a new
libvirt API that results in a query job must arrive and stay waiting
until the query-migrate command finishes. Once query-migrate is done but
before the async job closes qemu monitor while stopping the domain, the
other thread needs to wake up and call qemuMonitorSend to send its
command to qemu. Before qemu gets a chance to respond to this command,
the async job needs to close the monitor. At this point, the query job
thread is waiting for a condition that no-one will ever signal so it
never finishes the job.
* src/qemu/qemu_hostdev.c (qemuDomainReAttachHostdevDevices):
pciDeviceListFree(pcidevs) in the end free()s the device even if
it's in use by other domain, which can cause a race.
How to reproduce:
<script>
virsh nodedev-dettach pci_0000_00_19_0
virsh start test
virsh attach-device test hostdev.xml
virsh start test2
for i in {1..5}; do
echo "[ -- ${i}th time --]"
virsh nodedev-reattach pci_0000_00_19_0
done
echo "clean up"
virsh destroy test
virsh nodedev-reattach pci_0000_00_19_0
</script>
Device pci_0000_00_19_0 dettached
Domain test started
Device attached successfully
error: Failed to start domain test2
error: Requested operation is not valid: PCI device 0000:00:19.0 is in use by domain test
[ -- 1th time --]
Device pci_0000_00_19_0 re-attached
[ -- 2th time --]
Device pci_0000_00_19_0 re-attached
[ -- 3th time --]
Device pci_0000_00_19_0 re-attached
[ -- 4th time --]
Device pci_0000_00_19_0 re-attached
[ -- 5th time --]
Device pci_0000_00_19_0 re-attached
clean up
Domain test destroyed
Device pci_0000_00_19_0 re-attached
The patch also fixes another problem, there won't be error like
"qemuDomainReAttachHostdevDevices: Not reattaching active
device 0000:00:19.0" in daemon log if some device is in active.
As pciResetDevice and pciReattachDevice won't be called for
the device anymore. This is sensible as we already reported
error when preparing the device if it's active. Blindly trying
to pciResetDevice & pciReattachDevice on the device and getting
an error is just redundant.
This patch fixes two problems:
1) The device will be reattached to host even if it's not
managed, as there is a "pciDeviceSetManaged".
2) The device won't be reattached to host with original
driver properly. As it doesn't honor the device original
properties which are maintained by driver->activePciHostdevs.
This chunk of code below repeated in several functions, factor it into
a helper method virDomainLiveConfigHelperMethod to eliminate duplicated code
based on Eric and Adam's suggestion. I have tested it for all the
relevant APIs changed.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
When destroying a domain qemuDomainDestroy kills its qemu process and
starts a new job, which means it unlocks the domain object and locks it
again after some time. Although the object is usually unlocked for a
pretty short time, chances are another thread processing an EOF event on
qemu monitor is able to lock the object first and does all the cleanup
by itself. This leads to wrong shutoff reason and lifecycle event detail
and virDomainDestroy API incorrectly reporting failure to destroy an
inactive domain.
Reported by Charlie Smurthwaite.
Currently qemuDomainAssignPCIAddresses() is called to assign addresses
to PCI devices.
We need to do something similar for devices with spapr-vio addresses.
So create one place where address assignment will be done, that is
qemuDomainAssignAddresses().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
For the PPC64 pseries machine type we need to add address information
for the spapr-vty device.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
When parsing ppc64 models on an x86 host an out-of-memory error message is displayed due
to it checking for retcpus being NULL. Fix this by removing the check whether retcpus is NULL
since we will realloc into this variable.
Also in the X86 model parser display the OOM error at the location where it happens.
A preparatory patch for DHCP snooping where we want to be able to
differentiate between a VM's interface using the tuple of
<VM UUID, Interface MAC address>. We assume that MAC addresses could
possibly be re-used between different networks (VLANs) thus do not only
want to rely on the MAC address to identify an interface.
At the current 'final destination' in virNWFilterInstantiate I am leaving
the vmuuid parameter as ATTRIBUTE_UNUSED until the DHCP snooping patches arrive.
(we may not post the DHCP snooping patches for 0.9.9, though)
Mostly this is a pretty trivial patch. On the lowest layers, in lxc_driver
and uml_conf, I am passing the virDomainDefPtr around until I am passing
only the VM's uuid into the NWFilter calls.
This patch cleans up return codes in the nwfilter subsystem.
Some functions in nwfilter_conf.c (validators and formatters) are
keeping their bool return for now and I am converting their return
code to true/false.
All other functions now have failure return codes of -1 and success
of 0.
[I searched for all occurences of ' 1;' and checked all 'if ' and
adapted where needed. After that I did a grep for 'NWFilter' in the source
tree.]
assumptions from generic code.
This implements the minimal set of changes needed in libvirt to launch a
PowerPC-KVM based guest.
It removes x86-specific assumptions about choice of serial driver backend
from generic qemu guest commandline generation code.
It also restricts the ACPI capability to be available for an x86 or
x86_64 domain.
This is not a complete solution -- it still does not guarantee libvirt
the capability to flag non-supported options in guest XML. (Eg, an ACPI
specification in a PowerPC guest XML will still get processed, even
though qemu-system-ppc64 does not support it while qemu-system-x86_64 does.)
This drawback exists because libvirt falls back on qemu to query supported
features, and qemu '-h' blindly lists all capabilities -- irrespective
of whether they are available while emulating a given architecture or not.
The long-term solution would be for qemu to list out capabilities based
on architecture and platform -- so that libvirt can cleanly make out what
devices are supported on an arch (say 'ppc64') and platform (say, 'mac99').
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
This enables libvirt to select the correct qemu binary (qemu-system-ppc64)
for a guest vm based on arch 'ppc64'.
Also, libvirt is enabled to correctly parse the list of supported PowerPC
CPUs, generated by running 'qemu-system-ppc64 -cpu ?'
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Acked-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
With security_driver set to "none" in /etc/libvirt/qemu.conf,
libvirtd would crash when attempted to attach to an existing
qemu process. Only copy the security model if it actually exists.
During virDomainDestroy, QEMU may emit SHUTDOWN event as a response to
SIGTERM and since domain object is still locked, the event is processed
after the domain is destroyed. We need to ignore this event in such case
to avoid changing domain state from shutoff to shutdown.
When QEMU guest finishes its shutdown sequence, qemu stops virtual CPUs
and when started with -no-shutdown waits for us to kill it using
SGITERM. Since QEMU is flushing its internal buffers, some time may pass
before QEMU actually dies. We mistakenly used "paused" state (and
events) for this which is quite confusing since users may see a domain
going to pause while they expect it to shutdown. Since we already have
"shutdown" state with "the domain is being shut down" semantics, we
should use it for this state.
However, the state didn't have a corresponding event so I created one
and called its detail as VIR_DOMAIN_EVENT_SHUTDOWN_FINISHED (guest OS
finished its shutdown sequence) with the intent to add
VIR_DOMAIN_EVENT_SHUTDOWN_STARTED in the future if we have a
sufficiently capable guest agent that can notify us when guest OS starts
to shutdown.
Fix a logic error, the initial value of ret = -1, if just set --config,
it will goto endjob directly without doing its really job here.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
filter 0-device-weight when:
- getting blkio parameters with --config
- starting up a domain
When testing with blkio, I found these issues:
(dom is down)
virsh blkiotune dom --device-weights /dev/sda,300,/dev/sdb,500
virsh blkiotune dom --device-weights /dev/sda,300,/dev/sdb,0
virsh blkiotune dom
weight : 800
device_weight : /dev/sda,200,/dev/sdb,0
# issue 1: shows 0 device weight of /dev/sdb that may confuse user
(continued)
virsh start dom
# issue 2: If /dev/sdb doesn't exist, libvirt refuses to bring the
# dom up because it wants to set the device weight to 0 of a
# non-existing device. Since 0 means no weight-limit, we really don't
# have to set it.
Prior to this patch, for a running dom, the commands:
$ virsh blkiotune dom --device-weights /dev/sda,502,/dev/sdb,498
$ virsh blkiotune dom --device-weights /dev/sda,503
$ virsh blkiotune dom
weight : 500
device_weight : /dev/sda,503
claim that /dev/sdb no longer has a non-default weight, but
directly querying cgroups says otherwise:
$ cat /cgroup/blkio/libvirt/qemu/dom/blkio.weight_device
8:0 503
8:16 498
After this patch, an explicit 0 is required to remove a device path
from the XML, and omitting a device path that was previously
specified leaves that device path untouched in the XML, to match
cgroups behavior.
* src/qemu/qemu_driver.c (parseBlkioWeightDeviceStr): Rename...
(qemuDomainParseDeviceWeightStr): ...and use correct type.
(qemuDomainSetBlkioParameters): After parsing string, modify
rather than replacing existing table.
* tools/virsh.pod (blkiotune): Tweak wording.
Implement the block I/O throttle setting and getting support to qemu
driver.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The virTimestamp and virTimeMs functions in src/util/util.h
duplicate functionality from virtime.h, in a non-async signal
safe manner. Remove them, and convert all code over to the new
APIs.
* src/util/util.c, src/util/util.h: Delete virTimeMs and virTimestamp
* src/lxc/lxc_driver.c, src/qemu/qemu_domain.c,
src/qemu/qemu_driver.c, src/qemu/qemu_migration.c,
src/qemu/qemu_process.c, src/util/event_poll.c: Convert to use
virtime APIs
If we ensure that virNodeSuspendGetTargetMask always resets
*bitmask to zero upon failure, there is no need for the
powerMgmt_valid field.
* src/util/virnodesuspend.c: Ensure *bitmask is zero upon
failure
* src/conf/capabilities.c, src/conf/capabilities.h: Remove
powerMgmt_valid field
* src/qemu/qemu_capabilities.c: Remove powerMgmt_valid
The node suspend capabilities APIs should not have been put into
util.[ch]. Instead move them into virnodesuspend.[ch]
* src/util/util.c, src/util/util.h: Remove suspend capabilities APIs
* src/util/virnodesuspend.c, src/util/virnodesuspend.h: Add
suspend capabilities APIs
* src/qemu/qemu_capabilities.c: Include virnodesuspend.h
Rename virGetPMCapabilities to virNodeSuspendGetTargetMask and
virDiscoverHostPMFeature to virNodeSuspendSupportsTarget.
* src/util/util.c, src/util/util.h: Rename APIs
* src/qemu/qemu_capabilities.c, src/util/virnodesuspend.c: Adjust
for new names
Without this, 'virsh blkiotune --live --config --weight=n'
only affected live.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Allow
setting both configurations at once.
After the previous patch, there are now some redundant checks.
* src/qemu/qemu_driver.c (qemudDomainGetVcpuPinInfo)
(qemuGetSchedulerParametersFlags): Drop checks now guaranteed by
libvirt.c.
* src/lxc/lxc_driver.c (lxcGetSchedulerParametersFlags):
Likewise.
It requires the domain is running, otherwise fails. Resize to a lower
size is supported, but should be used with extreme caution.
In order to prohibit the "size" overflowing after multiplied by
1024. We do checking in the codes. For QMP mode, the default units
is Bytes, the passed size needs to be multiplied by 1024, however,
for HMP mode, the default units is "Megabytes", the passed "size"
needs to be divided by 1024 then.
Implements functions for both HMP and QMP mode.
For HMP mode, qemu uses "M" as the units by default, so the passed "sized"
is divided by 1024.
For QMP mode, qemu uses "Bytes" as the units by default, the passed "sized"
is multiplied by 1024.
All of the monitor functions return -1 on failure, 0 on success, or -2 if
not supported.
Add the core functions that implement the functionality of the API.
Suspend is done by using an asynchronous mechanism so that we can return
the status to the caller before the host gets suspended. This asynchronous
operation is achieved by suspending the host in a separate thread of
execution. However, returning the status to the caller is only best-effort,
but not guaranteed.
To resume the host, an RTC alarm is set up (based on how long we want to
suspend) before suspending the host. When this alarm fires, the host
gets woken up.
Suspend-to-RAM operation on a host running Linux can take upto more than 20
seconds, depending on the load of the system. (Freezing of tasks, an operation
preceding any suspend operation, is given up after a 20 second timeout).
And Suspend-to-Disk can take even more time, considering the time required
for compaction, creating the memory image and writing it to disk etc.
So, we do not allow the user to specify a suspend duration of less than 60
seconds, to be on the safer side, since we don't want to prematurely declare
failure when we only had to wait for some more time.
If a connection to destination host is lost during peer-to-peer
migration (because keepalive protocol timed out), we won't be able to
finish the migration and it doesn't make sense to wait for qemu to
transmit all data. This patch automatically cancels such migration
without waiting for virDomainAbortJob to be called.
If something fails while initializing qemu job object in
qemuDomainObjPrivateAlloc(), memory to the private pointer is freed, but
after that, the pointer is still dereferenced, which may result in a
segfault.
* qemuDomainObjPrivateAlloc() - Don't dereference NULL pointer.
Generally, functions which return malloc'd strings should be typed
as 'char *', not 'const char *', to make it obvious that the caller
is responsible to free things. free(const char *) fails to compile,
and although we have a cast embedded in VIR_FREE to work around poor
code that frees const char *, it's better to not rely on that hack.
* src/qemu/qemu_driver.c (qemuDiskPathToAlias): Change return type.
(qemuDomainBlockJobImpl): Update caller.
Commit 89b6284f made it possible to pass either a source name or
the target device to most API demanding a disk designation, but
forgot to update the documentation. It also failed to update
virDomainBlockStats to take both forms. This patch fixes both the
documentation and the remaining function.
Xen continues to use just device shorthand (that is, I did not
implement path lookup there, since xen does not track a domain_conf
to quickly tie a path back to the device shorthand).
* src/libvirt.c (virDomainBlockStats, virDomainBlockStatsFlags)
(virDomainGetBlockInfo, virDomainBlockPeek)
(virDomainBlockJobAbort, virDomainGetBlockJobInfo)
(virDomainBlockJobSetSpeed, virDomainBlockPull): Document
acceptable disk naming conventions.
* src/qemu/qemu_driver.c (qemuDomainBlockStats)
(qemuDomainBlockStatsFlags): Allow lookup by source name.
* src/test/test_driver.c (testDomainBlockStats): Likewise.
This patch exports KVM Host Power Management capabilities as XML so that
higher-level systems management software can make use of these features
available in the host.
The script "pm-is-supported" (from pm-utils package) is run to discover if
Suspend-to-RAM (S3) or Suspend-to-Disk (S4) is supported by the host.
If either of them are supported, then a new tag "<power_management>" is
introduced in the XML under the <host> tag.
However in case the query to check for power management features succeeded,
but the host does not support any such feature, then the XML will contain
an empty <power_management/> tag. In the event that the PM query itself
failed, the XML will not contain any "power_management" tag.
To use this, new APIs could be implemented in libvirt to exploit power
management features such as S3/S4.
For direct attach devices, in qemuBuildCommandLine, we seem to be freeing
actual device on error path (with networkReleaseActualDevice). But the actual
device is not deleted.
qemuProcessStop eventually deletes the direct attach device and releases
actual device. But by the time qemuProcessStop is called qemuBuildCommandLine
has already freed actual device, leaving stray macvtap devices behind on error.
So the simplest fix is to remove the networkReleaseActualDevice in
qemuBuildCommandLine. This patch does just that.
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Now, when we support multiple consoles per domain,
the vm->def->console[0] can still remain an alias
for vm->def->serial[0]; However, we need to copy
it's source definition as well otherwise we'll regress
on virDomainOpenConsole.
This prepares for subsequent patches which introduce dependence
on cgroup cpuset. Enable cgroup cpuset by default so users don't
have to modify configuration file before encountering a cpuset
error.
Update virNetDevMacVLanCreateWithVPortProfile to allow creation
of plain macvlan devices, as well as macvtap devices. The former
is useful for LXC containers
* src/qemu/qemu_command.c: Explicitly request a macvtap device
* src/util/virnetdevmacvlan.c, src/util/virnetdevmacvlan.h: Add
new flag to allow switching between macvlan and macvtap
creation
Rename virNetDevMacVLanCreate to virNetDevMacVLanCreateWithVPortProfile
and virNetDevMacVLanDelete to virNetDevMacVLanDeleteWithVPortProfile
To make way for renaming the other macvlan creation APIs in
interface.c
* util/virnetdevmacvlan.c, util/virnetdevmacvlan.h,
qemu/qemu_command.c, qemu/qemu_hotplug.c, qemu/qemu_process.c:
Rename APIs
Rename the macvtap.c file to virnetdevmacvlan.c to reflect its
functionality. Move the port profile association code out into
virnetdevvportprofile.c. Make the APIs available unconditionally
to callers
* src/util/macvtap.h: rename to src/util/virnetdevmacvlan.h,
* src/util/macvtap.c: rename to src/util/virnetdevmacvlan.c
* src/util/virnetdevvportprofile.c, src/util/virnetdevvportprofile.h:
Pull in vport association code
* src/Makefile.am, src/conf/domain_conf.h, src/qemu/qemu_conf.c,
src/qemu/qemu_conf.h, src/qemu/qemu_driver.c: Update include
paths & remove conditional compilation
In preparation for code re-organization, rename the Macvtap
management APIs to have the following patterns
virNetDevMacVLanXXXXX - macvlan/macvtap interface management
virNetDevVPortProfileXXXX - virtual port profile management
* src/util/macvtap.c, src/util/macvtap.h: Rename APIs
* src/conf/domain_conf.c, src/network/bridge_driver.c,
src/qemu/qemu_command.c, src/qemu/qemu_command.h,
src/qemu/qemu_driver.c, src/qemu/qemu_hotplug.c,
src/qemu/qemu_migration.c, src/qemu/qemu_process.c,
src/qemu/qemu_process.h: Update for renamed APIs
Add routines to generate -numa QEMU command line option based on
<numa> ... </numa> XML specifications.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
This improves the support for qemu rbd devices by adding support for a few
key features (e.g., authentication) and cleaning up the way in which
rbd configuration options are passed to qemu.
An <auth> member of the disk source xml specifies how librbd should
authenticate. The username attribute is the Ceph/RBD user to authenticate as.
The usage or uuid attributes specify which secret to use. Usage is an
arbitrary identifier local to libvirt.
The old RBD support relied on setting an environment variable to
communicate information to qemu/librbd. Instead, pass those options
explicitly to qemu. Update the qemu argument parsing and tests
accordingly.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
The src/util/network.c file is a dumping ground for many different
APIs. Split it up into 5 pieces, along functional lines
- src/util/virnetdevbandwidth.c: virNetDevBandwidth type & helper APIs
- src/util/virnetdevvportprofile.c: virNetDevVPortProfile type & helper APIs
- src/util/virsocketaddr.c: virSocketAddr and APIs
- src/conf/netdev_bandwidth_conf.c: XML parsing / formatting
for virNetDevBandwidth
- src/conf/netdev_vport_profile_conf.c: XML parsing / formatting
for virNetDevVPortProfile
* src/util/network.c, src/util/network.h: Split into 5 pieces
* src/conf/netdev_bandwidth_conf.c, src/conf/netdev_bandwidth_conf.h,
src/conf/netdev_vport_profile_conf.c, src/conf/netdev_vport_profile_conf.h,
src/util/virnetdevbandwidth.c, src/util/virnetdevbandwidth.h,
src/util/virnetdevvportprofile.c, src/util/virnetdevvportprofile.h,
src/util/virsocketaddr.c, src/util/virsocketaddr.h: New pieces
* daemon/libvirtd.h, daemon/remote.c, src/conf/domain_conf.c,
src/conf/domain_conf.h, src/conf/network_conf.c,
src/conf/network_conf.h, src/conf/nwfilter_conf.h,
src/esx/esx_util.h, src/network/bridge_driver.c,
src/qemu/qemu_conf.c, src/rpc/virnetsocket.c,
src/rpc/virnetsocket.h, src/util/dnsmasq.h, src/util/interface.h,
src/util/iptables.h, src/util/macvtap.c, src/util/macvtap.h,
src/util/virnetdev.h, src/util/virnetdevtap.c,
tools/virsh.c: Update include files
Rename the virVirtualPortProfileParams struct to be
virNetDevVPortProfile, and rename the APIs to match
this prefix.
* src/util/network.c, src/util/network.h: Rename port profile
APIs
* src/conf/domain_conf.c, src/conf/domain_conf.h,
src/conf/network_conf.c, src/conf/network_conf.h,
src/network/bridge_driver.c, src/qemu/qemu_hotplug.c,
src/util/macvtap.c, src/util/macvtap.h: Update for
renamed APIs/structs
Hi
Commit c31d23a787 removed the "conn"
parameter from qemuPhysIfaceConnect(), but it's still used if
WITH_MACVTAP is false. Also, it's still mentioned in the comment
above the function:
/**
* qemuPhysIfaceConnect:
* @def: the definition of the VM (needed by 802.1Qbh and audit)
* @conn: pointer to virConnect object
* @driver: pointer to the qemud_driver
* @net: pointer to he VM's interface description with direct device type
* @qemuCaps: flags for qemu
*
* Returns a filedescriptor on success or -1 in case of error.
*/
int
qemuPhysIfaceConnect(virDomainDefPtr def,
struct qemud_driver *driver,
virDomainNetDefPtr net,
virBitmapPtr qemuCaps,
enum virVMOperationType vmop)
{
int rc;
#if WITH_MACVTAP
[...]
#else
(void)def;
(void)conn;
(void)net;
(void)qemuCaps;
(void)driver;
(void)vmop;
qemuReportError(VIR_ERR_INTERNAL_ERROR,
"%s", _("No support for macvtap device"));
rc = -1;
#endif
return rc;
}
--
Michael Wood <esiotrot@gmail.com>
From f4fc43b4111a4c099395c55902e497b8965e2b53 Mon Sep 17 00:00:00 2001
From: Michael Wood <esiotrot@gmail.com>
Date: Sat, 12 Nov 2011 13:37:53 +0200
Subject: [PATCH] Fix build without MACVTAP.
Qemu will be the first driver to make use of a typed string in the
next round of additions. Separate out the trivial addition.
* src/qemu/qemu_driver.c (qemudSupportsFeature): Advertise feature.
(qemuDomainGetBlkioParameters, qemuDomainGetMemoryParameters)
(qemuGetSchedulerParametersFlags, qemudDomainBlockStatsFlags):
Allow typed strings flag where trivially supported.
This reverts commit ef1065cf5ac; see also this bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=751900
In qemu 0.15.1 and earlier, during migration to file, the
qemu_savevm_state_begin and qemu_savevm_state_iterate methods
will both process as much migration data as possible until either
1. The file descriptor returns EAGAIN
2. The bandwidth rate limit is reached
If we set the rate limit to ULONG_MAX, test 2 never becomes true. We're
passing a plain file descriptor to QEMU and POSIX does not support EAGAIN on
regular files / block devices, so test 1 never becomes true either.
In the 'virsh save --bypass-cache' case, we pass a pipe instead of a
regular fd, but using a pipe adds I/O overhead, so always passing a
pipe just so qemu can see EAGAIN doesn't seem nice.
The ultimate fix needs to come from qemu - background migration must
respect asynchronous abort requests, or else periodically return
control to the main handling loop without an EAGAIN and without
waiting to hit an insanely large amount of data. But until a
version of qemu is fixed to support "unlimited" data rates while
still allowing cancellation, the best we can do is avoid the
automatic use of unlimited rates from within libvirt (users can
still explicitly change the migration rates, if they are aware that
they are giving up the ability to cancel a job).
Reverting the lone use of QEMU_DOMAIN_FILE_MIG_BANDWIDTH_MAX is
the simplest patch; this slows migration back down to a default
32M/sec cap, but also ensures that the main qemu processing loop
will still be responsive to cancellation requests. Hopefully
upstream qemu will provide us a means of safely using unlimited
speed, including a runtime probe of that capability.
* src/qemu/qemu_migration.c (qemuMigrationToFile): Revert attempt
to use unlimited migration bandwidth when migrating to file.
Signed-off-by: Daniel Veillard <veillard@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The socket address APIs in src/util/network.h either take the
form virSocketAddrXXX, virSocketXXX or virSocketXXXAddr.
Sanitize this so everything is virSocketAddrXXXX, and ensure
that the virSocketAddr parameter is always the first one.
* src/util/network.c, src/util/network.h: Santize socket
address API naming
* src/conf/domain_conf.c, src/conf/network_conf.c,
src/conf/nwfilter_conf.c, src/network/bridge_driver.c,
src/nwfilter/nwfilter_ebiptables_driver.c,
src/nwfilter/nwfilter_learnipaddr.c,
src/qemu/qemu_command.c, src/rpc/virnetsocket.c,
src/util/dnsmasq.c, src/util/iptables.c,
src/util/virnetdev.c, src/vbox/vbox_tmpl.c: Update for
API renaming
Following the renaming of the bridge management APIs, we can now
split the source file into 3 corresponding pieces
* src/util/virnetdev.c: APIs for any type of network interface
* src/util/virnetdevbridge.c: APIs for bridge interfaces
* src/util/virnetdevtap.c: APIs for TAP interfaces
* src/util/virnetdev.c, src/util/virnetdev.h,
src/util/virnetdevbridge.c, src/util/virnetdevbridge.h,
src/util/virnetdevtap.c, src/util/virnetdevtap.h: Copied
from bridge.{c,h}
* src/util/bridge.c, src/util/bridge.h: Split into 3 pieces
* src/lxc/lxc_driver.c, src/network/bridge_driver.c,
src/openvz/openvz_driver.c, src/qemu/qemu_command.c,
src/qemu/qemu_conf.h, src/uml/uml_conf.c, src/uml/uml_conf.h,
src/uml/uml_driver.c: Update #include directives
The existing brXXX APIs in src/util/bridge.h are renamed to
follow one of three different conventions
- virNetDevXXX - operations for any type of interface
- virNetDevBridgeXXX - operations for bridge interfaces
- virNetDevTapXXX - operations for tap interfaces
* src/util/bridge.h, src/util/bridge.c: Rename all APIs
* src/lxc/lxc_driver.c, src/network/bridge_driver.c,
src/qemu/qemu_command.c, src/uml/uml_conf.c,
src/uml/uml_driver.c: Update for API renaming
Currently every caller of the brXXX APIs has to store the returned
errno value and then raise an error message. This results in
inconsistent error messages across drivers, additional burden on
the callers and makes the error reporting inaccurate since it is
hard to distinguish different scenarios from 1 errno value.
* src/util/bridge.c: Raise errors instead of returning errnos
* src/lxc/lxc_driver.c, src/network/bridge_driver.c,
src/qemu/qemu_command.c, src/uml/uml_conf.c,
src/uml/uml_driver.c: Remove error reporting code
The bridge management APIs in src/util/bridge.c require a brControl
object to be passed around. This holds the file descriptor for the
control socket. This extra object complicates use of the API for
only a minor efficiency gain, which is in turn entirely offset by
the need to fork/exec the brctl command for STP configuration.
This patch removes the 'brControl' object entirely, instead opening
the control socket & closing it again within the scope of each method.
The parameter names for the APIs are also made to consistently use
'brname' for bridge device name, and 'ifname' for an interface
device name. Finally annotations are added for non-NULL parameters
and return check validation
* src/util/bridge.c, src/util/bridge.h: Remove brControl object
and update API parameter names & annotations.
* src/lxc/lxc_driver.c, src/network/bridge_driver.c,
src/uml/uml_conf.h, src/uml/uml_conf.c, src/uml/uml_driver.c,
src/qemu/qemu_command.c, src/qemu/qemu_conf.h,
src/qemu/qemu_driver.c: Remove reference to 'brControl' object
All constants related to events should have a prefix of
VIR_DOMAIN_EVENT_
* include/libvirt/libvirt.h.in, src/qemu/qemu_domain.c:
Rename VIR_DOMAIN_DISK_CHANGE_MISSING_ON_START to
VIR_DOMAIN_EVENT_DISK_CHANGE_MISSING_ON_START
The default console type may vary based on the OS type. ie a Xen
paravirt guests wants a 'xen' console, while a fullvirt guests
wants a 'serial' console.
A plain integer default console type in the capabilities does
not suffice. Instead introduce a callback that is passed the
OS type.
* src/conf/capabilities.h: Use a callback for default console
type
* src/conf/domain_conf.c, src/conf/domain_conf.h: Use callback
for default console type. Add missing LXC/OpenVZ console types.
* src/esx/esx_driver.c, src/libxl/libxl_conf.c,
src/lxc/lxc_conf.c, src/openvz/openvz_conf.c,
src/phyp/phyp_driver.c, src/qemu/qemu_capabilities.c,
src/uml/uml_conf.c, src/vbox/vbox_tmpl.c,
src/vmware/vmware_conf.c, src/xen/xen_hypervisor.c,
src/xenapi/xenapi_driver.c: Set default console type callback
qemuBuildVirtioSerialPortDevStr was mistakenly accessing the
target.name field in the virDomainChrDef object for chardevs
belonging to a console. Those chardevs only have port set,
and if there's > 1 console, the > 1port number results in
trying to access a target.name with address 0x1
* src/qemu/qemu_command.c: Fix target.name handling and
make code more robust wrt error reporting
* src/qemu/qemu_command.c: Conditionally access target.name
While Xen only has a single paravirt console, UML, and
QEMU both support multiple paravirt consoles. The LXC
driver can also be trivially made to support multiple
consoles. This patch extends the XML to allow multiple
<console> elements in the XML. It also makes the UML
and QEMU drivers support this config.
* src/conf/domain_conf.c, src/conf/domain_conf.h: Allow
multiple <console> devices
* src/lxc/lxc_driver.c, src/xen/xen_driver.c,
src/xenxs/xen_sxpr.c, src/xenxs/xen_xm.c: Update for
internal API changes
* src/security/security_selinux.c, src/security/virt-aa-helper.c:
Only label consoles that aren't a copy of the serial device
* src/qemu/qemu_command.c, src/qemu/qemu_driver.c,
src/qemu/qemu_process.c, src/uml/uml_conf.c,
src/uml/uml_driver.c: Support multiple console devices
* tests/qemuxml2xmltest.c, tests/qemuxml2argvtest.c: Extra
tests for multiple virtio consoles. Set QEMU_CAPS_CHARDEV
for all console /channel tests
* tests/qemuxml2argvdata/qemuxml2argv-channel-virtio-auto.args,
tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.args
tests/qemuxml2argvdata/qemuxml2argv-console-virtio.args: Update
for correct chardev syntax
* tests/qemuxml2argvdata/qemuxml2argv-console-virtio-many.args,
tests/qemuxml2argvdata/qemuxml2argv-console-virtio-many.xml: New
test file
Document the parameter names that will be used by
virDomain{Get,Set}SchedulerParameters{,Flags}, rather than
hard-coding those names in each driver, to match what is
done with memory, blkio, and blockstats parameters.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_SCHEDULER_CPU_SHARES)
(VIR_DOMAIN_SCHEDULER_VCPU_PERIOD)
(VIR_DOMAIN_SCHEDULER_VCPU_QUOTA, VIR_DOMAIN_SCHEDULER_WEIGHT)
(VIR_DOMAIN_SCHEDULER_CAP, VIR_DOMAIN_SCHEDULER_RESERVATION)
(VIR_DOMAIN_SCHEDULER_LIMIT, VIR_DOMAIN_SCHEDULER_SHARES): New
field name macros.
* src/qemu/qemu_driver.c (qemuSetSchedulerParametersFlags)
(qemuGetSchedulerParametersFlags): Use new defines.
* src/test/test_driver.c (testDomainGetSchedulerParamsFlags)
(testDomainSetSchedulerParamsFlags): Likewise.
* src/xen/xen_hypervisor.c (xenHypervisorGetSchedulerParameters)
(xenHypervisorSetSchedulerParameters): Likewise.
* src/xen/xend_internal.c (xenDaemonGetSchedulerParameters)
(xenDaemonSetSchedulerParameters): Likewise.
* src/lxc/lxc_driver.c (lxcSetSchedulerParametersFlags)
(lxcGetSchedulerParametersFlags): Likewise.
* src/esx/esx_driver.c (esxDomainGetSchedulerParametersFlags)
(esxDomainSetSchedulerParametersFlags): Likewise.
* src/libxl/libxl_driver.c (libxlDomainGetSchedulerParametersFlags)
(libxlDomainSetSchedulerParametersFlags): Likewise.
Since all virTypedParameter APIs allow us to return the number
of slots we actually populated, we should allow the user to
call with nparams too small (without overrunning their array)
or too large (ignoring the tail of the array that we can't fill),
rather than requiring that they get things exactly right.
Making this change will make it easier for a future patch to
introduce VIR_TYPED_PARAM_STRING, with filtering in libvirt.c
rather than in every single driver, since users already have
to be prepared for *nparams to be smaller on exit than on entry.
* src/qemu/qemu_driver.c (qemuDomainGetBlkioParameters)
(qemuDomainGetMemoryParameters): Allow variable nparams on entry.
(qemuGetSchedulerParametersFlags): Drop redundant check.
(qemudDomainBlockStats, qemudDomainBlockStatsFlags): Rename...
(qemuDomainBlockStats, qemuDomainBlockStatsFlags): ...to this.
Don't return unavailable stats.
The qemu RBD driver needs access to the conn in order to get the secret
needed for connecting to the ceph cluster.
Signed-off-by: Sage Weil <sage@newdream.net>
To support "managed" mode of host PCI device, we record the original
states (unbind_from_stub, remove_slot, and reprobe) so that could
reattach the device to host with original driver. But there is no XML
for theses attrs, and thus after daemon is restarted, we lose the
original states. It's easy to reproduce:
1) virsh start domain
2) virsh attach-device dom hostpci.xml (in 'managed' mode)
3) service libvirtd restart
4) virsh destroy domain
You will see the device won't be bound to the original driver
if there was one.
This patch is to solve the problem by introducing internal XML
(won't be dumped to user, only dumped to status XML). The XML is:
<origstates>
<unbind/>
<remove_slot/>
<reprobe/>
</origstates>
Which will be child node of <hostdev><source>...</souce></hostdev>.
(only for PCI device).
A new struct "virDomainHostdevOrigStates" is introduced for the XML,
and the according members are updated when preparing the PCI device.
And function "qemuUpdateActivePciHostdevs" is modified to honor
the original states. Use of qemuGetPciHostDeviceList is removed
in function "qemuUpdateActivePciHostdevs", and the "managed" value of
the device config is honored by the change. This fixes another problem
alongside:
qemuGetPciHostDeviceList set the device as "managed" force
regardless of whether the device is configured as "managed='yes'"
or not in XML, which is not right.
- changed some return 1's to return -1
- changed if (rc) error checks to if (rc < 0)
- fixed some other minor convention violations
I might have missed some. Can fix in another patch or can respin
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Reported-by: Eric Blake <eblake@redhat.com>
Reported-by: Laine Stump <laine@laine.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
When using the xml as below:
------------------------------------------------------
<devices>
<emulator>/home/soulxu/data/work-code/qemu-kvm/x86_64-softmmu/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/home/soulxu/data/VM/images/linux.img'/>
<target dev='vda' bus='virtio'/>
<address type='drive' controller='0' bus='0' unit='0'/>
</disk>
<input type='mouse' bus='ps2'/>
<graphics type='vnc' port='-1' autoport='yes'/>
<video>
<model type='cirrus' vram='9216' heads='1'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</memballoon>
</devices>
------------------------------------------------------
Then can't startup qemu, the error message as below:
virsh # start test-vm
error: Failed to start domain test-vm
error: internal error process exited while connecting to monitor: qemu-system-x86_64: -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3: PCI: slot 3 function 0 not available for virtio-balloon-pci, in use by virtio-blk-pci
qemu-system-x86_64: -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3: Device 'virtio-balloon-pci' could not be initialized
So adding check for bus type and address type. Only the address of pci type support by virtio bus.
Signed-off-by: Xu He Jie <xuhj@linux.vnet.ibm.com>
Leak introduced in commit c1bc3d89.
Detected by valgrind:
==18462== 1,100 bytes in 1 blocks are definitely lost in loss record 183 of 184
==18462== at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==18462== by 0x4A06167: realloc (vg_replace_malloc.c:525)
==18462== by 0x4AADBB: virReallocN (memory.c:161)
==18462== by 0x4A975E: virBufferGrow (buf.c:117)
==18462== by 0x4A9D92: virBufferVasprintf (buf.c:290)
==18462== by 0x4A9EF7: virBufferAsprintf (buf.c:263)
==18462== by 0x429488: qemuBuildControllerDevStr (qemu_command.c:1993)
==18462== by 0x42C4B6: qemuBuildCommandLine (qemu_command.c:3803)
==18462== by 0x41A604: testCompareXMLToArgvHelper (qemuxml2argvtest.c:124)
==18462== by 0x41BB81: virtTestRun (testutils.c:141)
==18462== by 0x416DFF: mymain (qemuxml2argvtest.c:369)
==18462== by 0x41B277: virtTestMain (testutils.c:696)
==18462==
==18462== LEAK SUMMARY:
==18462== definitely lost: 1,100 bytes in 1 blocks
==18462== indirectly lost: 0 bytes in 0 blocks
* src/qemu/qemu_command.c (qemuBuildCommandLine): Clean up on success.
Signed-off-by: Alex Jia <ajia@redhat.com>
Detected by Coverity. The fix in 2c27dfa didn't catch all bad
instances of memcpy(). Thankfully, on further analysis, all of
the problematic uses are only triggered by old qemu that lacks
-device.
* src/qemu/qemu_hotplug.c (qemuDomainAttachPciDiskDevice)
(qemuDomainAttachNetDevice, qemuDomainAttachHostPciDevice): Init
all fields since monitor only populates some of them.
The QEMU monitor command 'add_client' can be used to connect to
a VNC or SPICE graphics display. This allows for implementation
of the virDomainOpenGraphics API
* src/qemu/qemu_driver.c: Implement virDomainOpenGraphics
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h:
Add binding for 'add_client' command
Not all VNC/SPICE servers use a TCP socket for their connections.
It is possible to configure a UNIX socket server. The graphics
event must thus include a UNIX socket address type.
* include/libvirt/libvirt.h.in: Add UNIX socket address type
for graphics event
* src/qemu/qemu_monitor_json.c: Add 'unix' string to address
type enum
This change adds some systemtap/dtrace probes to the QEMU monitor
client code. In particular it allows watching of all operations
for a VM
* examples/systemtap/qemu-monitor.stp: Watch all monitor commands
* src/Makefile.am: Passing libdir/bindir/sbindir to dtrace2systemtap.pl
* src/dtrace2systemtap.pl: Accept libdir/bindir/sbindir as args
and look for '# binary:' comment to mark probes against libvirtd
vs libvirt.so
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor_json.c,
src/qemu/qemu_monitor_text.c: Add probes for key functions
Rather than making all clients of monitor commands that are JSON-only
check whether yajl support was compiled in, it is simpler to just
avoid setting the capability bit up front if we can't use the capability.
* src/qemu/qemu_capabilities.c (qemuCapsComputeCmdFlags): Only set
capability bit if we also have yajl library to use it.
* src/qemu/qemu_driver.c (qemuDomainReboot): Drop #ifdefs.
* src/qemu/qemu_process.c (qemuProcessStart): Likewise.
* tests/qemuhelptest.c (testHelpStrParsing): Pass test even
without yajl.
* tests/qemuxml2argvtest.c (mymain): Simplify use of json flag.
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-error-*.args:
Update expected results to match.
If a disk source gets dropped because it is not accessible,
mgmt application might want to be informed about this. Therefore
we need to emit an event. The event presented in this patch
is however a bit superset of what written above. The reason is simple:
an intention to be easily expanded, e.g. on 'user ejected disk
in guest' events. Therefore, callback gets source string and disk alias
(which should be unique among a domain) and reason (an integer);
This patch implements on_missing feature in qemu driver.
Upon qemu startup process an accessibility of CDROMs
and floppy disks is checked. The source might get dropped
if unavailable and on_missing is set accordingly.
No event is emit thought. Look for follow up patch.
This patch is rather cosmetic as it only moves device alias
assignation from command line construction just before that.
However, it is needed in connotation of previous and next patch.
Detected by Coverity. Both text and JSON monitors set only the
bus and unit fields, which means driveAddr.controller spends
life as garbage on the stack, and is then memcpy()'d into the
in-memory representation which the user can see via dumpxml.
* src/qemu/qemu_hotplug.c (qemuDomainAttachSCSIDisk): Only copy
defined fields.
The improvements to virBuffer, along with a paradigm shift to pass
the original buffer through rather than creating a second buffer,
allow us to shave off quite a few lines of code.
* src/util/sysinfo.h (virSysinfoFormat): Alter signature.
* src/util/sysinfo.c (virSysinfoFormat, virSysinfoBIOSFormat)
(virSysinfoSystemFormat, virSysinfoProcessorFormat)
(virSysinfoMemoryFormat): Change indentation parameter.
* src/conf/domain_conf.c (virDomainSysinfoDefFormat): Adjust
caller.
* src/qemu/qemu_driver.c (qemuGetSysinfo): Likewise.
<domainsnapshot> is the first public instance of <domain> being
used as a sub-element, although we have two other private uses
(runtime state, and migration cookie). Although indentation has
no effect on XML parsing, using it makes the output more consistent.
This uses virBuffer auto-indentation to obtain the effect, for all
but the portions of <domain> that are not generated a line at a
time into the same virBuffer. Further patches will clean up the
remaining problems.
* src/conf/domain_conf.h (virDomainDefFormatInternal): New prototype.
* src/conf/domain_conf.c (virDomainDefFormatInternal): Export.
(virDomainObjFormat, virDomainSnapshotDefFormat): Update callers.
* src/libvirt_private.syms (domain_conf.h): Add new export.
* src/qemu/qemu_migration.c (qemuMigrationCookieXMLFormat): Use
new function.
(qemuMigrationCookieXMLFormatStr): Update caller.
There is a little difference between the output of domxml-to-native and the actual commandline.
No matter qemu is in control or readline mode, domxml-to-native always converts it to readline mode.
That is because the parameter "monitor_json" for qemuBuildCommandLine() is always set to false
in qemuDomainXMLToNative().
Signed-off-by: tangchen <tangchen@cn.fujitsu.com>
The XML parser for the qemu specific extensions expects the qemu name-space
to be bound to the 'qemu' prefix. This is too strict, since the name of the
name-space-prefix is only meant as an internal lookup key. Only the associated
URI is relevant.
<domain>...
<qemu:commandline xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0">
...</qemu:commandline>
</domain>
<domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0">...
<ns0:commandline>
...</ns0:commandline>
</domain>
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0">
<qemu:commandline xmlns:qemu="urn:foo">
...</qemu:commandline>
</domain>
Remove the test for checking the name-space binding on the top-level <domain>
element. Registering the name-space with XPath is enough.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Noticed when testing new libvirt against old qemu that lacked the
snapshot_blkdev HMP command. Libvirt was mistakenly treating the
command as successful, and re-writing the domain XML to use the
just-created 0-byte file, rendering the domain broken on restart.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextDiskSnapshot):
Notice another possible error message.
* src/qemu/qemu_driver.c
(qemuDomainSnapshotCreateSingleDiskActive): Don't keep 0-byte file
on failure.
BZ# https://bugzilla.redhat.com/show_bug.cgi?id=736214
The problem is caused by the original info of domain's PCI dev is
maintained by qemu_driver->activePciHostdevs list, (E.g. dev->reprobe,
which stands for whether need to reprobe driver for the dev when do
reattachment). The fields (dev->reprobe, dev->unbind_from_stub, and
dev->remove_slot) are initialized properly when preparing the PCI
device for managed attachment. However, when do reattachment, it
construct a complete new "pciDevice" without honoring the original
dev info, and thus the dev won't get the original driver or can get
other problem.
This patch is to fix the problem by get the devs from list
driver->activePciHostdevs.
Tested with following 3 scenarios:
* the PCI was bound to some driver not pci-stub before attaching
result: the device will be bound to the original driver
* the PCI was bound to pci-stub before attaching
result: no driver reprobing, and still bound to pci-stub
* The PCI was not bound to any driver
result: no driver reprobing, and still not bound to any driver.
When failing on starting a domain, it tries to reattach all the PCI
devices defined in the domain conf, regardless of whether the devices
are still used by other domain. This will cause the devices to be deleted
from the list qemu_driver->activePciHostdevs, thus the devices will be
thought as usable even if it's not true. And following commands
nodedev-{reattach,reset} will be successful.
How to reproduce:
1) Define two domains with same PCI device defined in the confs.
2) # virsh start domain1
3) # virsh start domain2
4) # virsh nodedev-reattach $pci_device
You will see the device will be reattached to host successfully.
As pciDeviceReattach just check if the device is still used by
other domain via checking if the device is in list driver->activePciHostdevs,
however, the device is deleted from the list by step 2).
This patch is to prohibit the bug by:
1) Prohibit a domain starting or device attachment right at
preparation period (qemuPrepareHostdevPCIDevices) if the
device is in list driver->activePciHostdevs, which means
it's used by other domain.
2) Introduces a new field for struct _pciDevice, (const char *used_by),
it will be set as the domain name at preparation period,
(qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting
the device from driver->activePciHostdevs if it's still used by
other domain when stopping the domain process.
* src/pci.h (define two internal functions, pciDeviceSetUsedBy and
pciDevceGetUsedBy)
* src/pci.c (new field "const char *used_by" for struct _pciDevice,
implementations for the two new functions)
* src/libvirt_private.syms (Add the two new internal functions)
* src/qemu_hostdev.h (Modify the definition of functions
qemuPrepareHostdevPCIDevices, and qemuDomainReAttachHostdevDevices)
* src/qemu_hostdev.c (Prohibit preparation and don't delete the
device from activePciHostdevs list if it's still used by other domain)
* src/qemu_hotplug.c (Update function usage, as the definitions are
changed)
Signed-off-by: Eric Blake <eblake@redhat.com>
Detected by Coverity. p (the pointer to the string) is always true;
when in reality, we wanted to know whether the integer value of the
just-parsed string is '0' or '1'. Logic bug since commit b1b5b51.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextGetBlockInfo): Set
results to proper value.
Detected by Coverity. If, for some reason, our text monitor input
does not match our assumptions, we end up incrementing p while it
is NULL, then dereferencing the pointer 0x1, which will fault.
* src/qemu/qemu_monitor_text.c
(qemuMonitorTextGetBlockStatsParamsNumber): Rewrite to avoid
deref of strchr failure. Fix indentation.
As this is needed. Although some functions check for domain
being active before obtaining job, we need to check it after,
because obtaining job unlocks domain object, during which
a state of domain can be changed.
This patch extends qemudDomainCoreDump so it supports new VIR_DUMP_RESET
flag. If this flag is set, domain is reset on successful dump. However,
this is needed to be done after we start CPUs.
With the recent refactoring of qemu snapshot relationships, it
is now trivial to filter on leaves.
* src/conf/domain_conf.c (virDomainSnapshotObjListCount)
(virDomainSnapshotObjListCopyNames): Handle new flag.
* src/qemu/qemu_driver.c (qemuDomainSnapshotListNames)
(qemuDomainSnapshotNum, qemuDomainSnapshotListChildrenNames)
(qemuDomainSnapshotNumChildren): Pass new flag through.
VirtFS allows the user to choose between path/handle based fs driver.
As of now, libvirt hardcoded path based driver only. This patch provides
a solution to allow user to choose between path/handle based fs driver.
Sample:
<filesystem type='mount'>
<driver type='handle'/>
<source dir='/folder/to/share1'/>
<target dir='mount_tag1'/>
</filesystem>
<filesystem type='mount'>
<driver type='path'/>
<source dir='/folder/to/share2'/>
<target dir='mount_tag2'/>
</filesystem>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The previous optimizations lead to some follow-on cleanups.
* src/conf/domain_conf.c (virDomainSnapshotForEachChild)
(virDomainSnapshotForEachDescendant): Drop dead parameter.
(virDomainSnapshotActOnDescendant)
(virDomainSnapshotObjListNumFrom)
(virDomainSnapshotObjListGetNamesFrom): Update callers.
* src/qemu/qemu_driver.c (qemuDomainSnapshotNumChildren)
(qemuDomainSnapshotListChildrenNames, qemuDomainSnapshotDelete):
Likewise.
* src/conf/domain_conf.h: Update prototypes.
Maintain the parent/child relationships of all qemu snapshots.
* src/qemu/qemu_driver.c (qemuDomainSnapshotLoad): Populate
relationships after loading.
(qemuDomainSnapshotCreateXML): Set relations on creation; tweak
redefinition to reuse existing object.
(qemuDomainSnapshotReparentChildren, qemuDomainSnapshotDelete):
Clear relations on delete.
To date, JSON disk snapshots worked by accident, as they were always
using hmp fallback due to a typo in commit e702b5b not picking up
on the (intentional) difference in command names between the two
monitor protocols.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskSnapshot):
Spell QMP command correctly.
Reported by Luiz Capitulino.
Not too hard to wire up. The trickiest part is realizing that
listing children of a snapshot cannot use SNAPSHOT_LIST_ROOTS,
and that we overloaded that bit to also mean SNAPSHOT_LIST_DESCENDANTS;
we use that bit to decide which iteration to use, but don't want
the existing counting/listing functions to see that bit.
* src/conf/domain_conf.h (virDomainSnapshotObjListNumFrom)
(virDomainSnapshotObjListGetNamesFrom): New prototypes.
* src/conf/domain_conf.c (virDomainSnapshotObjListNumFrom)
(virDomainSnapshotObjListGetNamesFrom): New functions.
* src/libvirt_private.syms (domain_conf.h): Export them.
* src/qemu/qemu_driver.c (qemuDomainSnapshotNumChildren)
(qemuDomainSnapshotListChildrenNames): New functions.
If parsing qemu command line fails (e.g. because of non-existing
process number supplied), we jump to cleanup label where we free
pidfile. Therefore it needs to be initialized. Otherwise we free
random pointer.
Coverity complained that 4 out of 5 callers to virJSONValueObjectGetBoolean
checked for errors. But we documented that we don't care in this case.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONGetBlockInfo): Use
ignore_value.
Previously libvirt's disk device XML only had a single attribute,
error_policy, to control both read and write error policy, but qemu
has separate options for controlling read and write. In one case
(enospc) a policy is allowed for write errors but not read errors.
This patch adds a separate attribute that sets only the read error
policy. If just error_policy is set, it will apply to both read and
write error policy (previous behavior), but if the new rerror_policy
attribute is set, it will override error_policy for read errors only.
Possible values for rerror_policy are "stop", "report", and "ignore"
("report" is the qemu-controlled default for rerror_policy when
error_policy isn't specified).
For consistency, the value "report" has been added to the possible
values for error_policy as well.
commit 12062ab set rerror=ignore when error_policy="enospace" was
selected (since the rerror option in qemu doesn't accept "enospc", as
the werror option does).
After that patch was already pushed, Paolo Bonzini noticed it and
commented that leaving rerror at the default ("report") would be a
better choice. This patch corrects the problem - if error_policy =
"enospace" is given, rerror is left off the qemu commandline,
effectively setting it to "report". For other values, rerror is still
set to match werror.
Additionally, the parsing of error_policy was changed to no longer
erroneously allow "default" as a choice - as with most other
attributes, if you want the default setting, just don't specify an
error_policy.
Finally, two ommissions in the first patch were corrected - a
long-dormant qemuxml2argv test for enospace was enabled, and fixed to
pass, and the argv2xml parser in qemu_command.c was updated to
recognize the different spelling on the qemu commandline.
Now that RHEL 6.2 Beta is out, it would be nice to test multifunction
devices on that platform. This changes things so that the multifunction
cap bit can be set in two different ways: by version comparison (needed
for qemu 0.13 which lacked a -device query), and by -device query
(provided by qemu.git and backported to the RHEL beta build of
qemu-kvm which still claims to be a modified 0.12, and therefore needed
for RHEL).
* src/qemu/qemu_capabilities.c (qemuCapsParseDeviceStr): Allow
second method of setting multifunction cap bit.
* tests/qemuhelptest.c (mymain): Test it.
* tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel62-beta: New file.
* tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel62-beta-device: Likewise.
Implements the documentation for snapshot revert vs. force.
Part of the patch tightens existing behavior (previously, reverting
to an old snapshot without <domain> was blindly attempted, now it
requires force), while part of it relaxes behavior (previously, it
was not possible to revert an active domain to an ABI-incompatible
active snapshot, now force allows this transition).
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Check for
risky situations, and allow force to get past them.
Once we know which set of disks belong to a snapshot, reverting or
deleting that snapshot should visit just those disks, rather than
also visiting disks that were hot-plugged in the meantime or
skipping disks that were hot-unplugged in the meantime.
* src/qemu/qemu_domain.c (qemuDomainSnapshotForEachQcow2): Use
snapshot domain details when available. Avoid NULL deref.
Qemu driver tries to update balloon data in virDomainGetInfo and if it
can't do so because there is another monitor job running, it just
reports what's known in domain def. However, if there was no job running
but getting the data from qemu fails, we would fail the whole API. This
doesn't make sense. Let's make the failure nonfatal.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=730909
When support for setting the qemu disk error policy to "enospc" was
added, it was inadvertently spelled "enospace". This patch corrects
that on the qemu commandline (while retaining the "enospace" spelling
for libvirt's XML).
Also, while examining the qemu source, I found that "enospc" is not
allowed for the read error policy, only for write error policy (makes
sense). Since libvirt currently only has a single error policy
setting, when "enospace" is selected, the read error policy is set to
"ignore".
Destination libvirtd remembers the original name in the prepare phase
and clears it in the finish phase. The original name is used when
comparing domain name in migration cookie.
When support for was added for PCI multifunction cards (in commit
9f8baf, first included in libvirt 0.9.3), it was done by always
turning on the multifunction bit for all PCI devices. Since that time
it has been realized that this is not an ideal solution, and that the
multifunction bit must be selectively turned on. For example, see
https://bugzilla.redhat.com/show_bug.cgi?id=728174
and the discussion before and after
https://www.redhat.com/archives/libvir-list/2011-September/msg01036.html
This patch modifies multifunction support so that the multifunction=on
option is only added to the qemu commandline for a device if its PCI
<address> definition has the attribute "multifunction='on'", e.g.:
<address type='pci' domain='0x0000' bus='0x00'
slot='0x04' function='0x0' multifunction='on'/>
In practice, the multifunction bit should only be turned on if
function='0' AND other functions will be used in the same slot - it
usually isn't needed for functions 1-7 (although there are apparently
some exceptions, e.g. the Intel X53 according to the QEMU source
code), and should never be set if only function 0 will be used in the
slot. The test cases have been changed accordingly to illustrate.
With this patch in place, if a user attempts to assign multiple
functions in a slot without setting the multifunction bit for function
0, libvirt will issue an error when the domain is defined, and the
define operation will fail. In the future, we may decide to detect
this situation and automatically add multifunction=on to avoid the
error; even then it will still be useful to have a manual method of
turning on multifunction since, as stated above, there are some
devices that excpect it to be turned on for all functions in a slot.
A side effect of this patch is that attempts to use the same PCI
address for two different devices will now log an error (previously
this would cause the domain define operation to fail, but there would
be no log message generated). Because the function doing this log was
almost completely rewritten, I didn't think it worthwhile to make a
separate patch for that fix (the entire patch would immediately be
obsoleted).
Currently, qemuDomainGetXMLDesc and qemudDomainGetInfo check for
outstanding synchronous job before (eventual) monitor entering.
However, there can be already async job set, e.g. migration.
If the daemon is restarted so we reconnect to monitor, cdrom media
can be ejected. In that case we don't want to show it in domain xml,
or require it on migration destination.
To check for disk status use 'info block' monitor command.
* src/qemu/qemu_migration.c: if 'vmdef' is NULL, the function
virDomainSaveConfig still dereferences it, it doesn't make
sense, so should add return value check to make sure 'vmdef'
is non-NULL before calling virDomainSaveConfig, in addition,
in order to debug later, also should record error information
into log.
Signed-off-by: Alex Jia <ajia@redhat.com>
First hypervisor implementation of the new API.
Allows 'virsh snapshot-list --tree' to be more efficient.
* src/qemu/qemu_driver.c (qemuDomainSnapshotGetParent): New
function.
If a domain started with -no-shutdown shuts down while libvirtd is not
running, it will be seen as paused when libvirtd reconnects to it. Use
the paused reason to detect if a domain was stopped because of shutdown
and finish the process just as if a SHUTDOWN event is delivered from
qemu.
This patch was made in response to:
https://bugzilla.redhat.com/show_bug.cgi?id=738095
In short, qemu's default for the rombar setting (which makes the
firmware ROM of a PCI device visible/not on the guest) was previously
0 (not visible), but they recently changed the default to 1
(visible). Unfortunately, there are some PCI devices that fail in the
guest when rombar is 1, so the setting must be exposed in libvirt to
prevent a regression in behavior (it will still require explicitly
setting <rom bar='off'/> in the guest XML).
rombar is forced on/off by adding:
<rom bar='on|off'/>
inside a <hostdev> element that defines a PCI device. It is currently
ignored for all other types of devices.
At the moment there is no clean method to determine whether or not the
rombar option is supported by QEMU - this patch uses the advice of a
QEMU developer to assume support for qemu-0.12+. There is currently a
patch in the works to put this information in the output of "qemu-kvm
-device pci-assign,?", but of course if we switch to keying off that,
we would lose support for setting rombar on all the versions of qemu
between 0.12 and whatever version gets that patch.
SIGTERM handling for -no-shutdown is already fixed in qemu git and
libvirt can safely use it. The downside is that 0.15.50 version of qemu
can be any qemu compiled from git, even that without the fix for
SIGTERM. However, I think this patch is worth it since excluding 0.15.50
from the check makes testing current qemu with libvirt much easier and
someone running qemu from git should be able to rebuild fixed qemu from
git if they hit the problem with a hang on shutdown.
QEMU 0.13 introduced cache=unsafe for -drive, this patch exposes
it in the libvirt layer.
* Introduced a new QEMU capability flag ($prefix_CACHE_UNSAFE),
as even if $prefix_CACHE_V2 is set, we can't know if unsafe
is supported.
* Improved the reliability of qemu cache type detection.
If a domain has inactive XML we want to transfer it to destination
when migrating with VIR_MIGRATE_PERSIST_DEST. In order to harm
the migration protocol as least as possible, a optional cookie was
chosen.
The previous patch removed all snapshots, but not the directory
where the snapshots lived, which is still a form of stale data.
* src/qemu/qemu_domain.c (qemuDomainRemoveInactive): Wipe any
snapshot directory.
Commit 282fe1f0 documented that transient domains will auto-delete
any snapshot metadata when the last reference to the domain is
removed, and that management apps are in charge of grabbing any
snapshot metadata prior to that point. However, this was not
actually implemented for qemu until now.
* src/qemu/qemu_driver.c (qemudDomainCreate)
(qemuDomainDestroyFlags, qemuDomainSaveInternal)
(qemudDomainCoreDump, qemuDomainRestoreFlags, qemudDomainDefine)
(qemuDomainUndefineFlags, qemuDomainMigrateConfirm3)
(qemuDomainRevertToSnapshot): Clean up snapshot metadata.
* src/qemu/qemu_migration.c (qemuMigrationPrepareAny)
(qemuMigrationPerformJob, qemuMigrationPerformPhase)
(qemuMigrationFinish): Likewise.
* src/qemu/qemu_process.c (qemuProcessHandleMonitorEOF)
(qemuProcessReconnect, qemuProcessReconnectHelper)
(qemuProcessAutoDestroyDom): Likewise.
This patch is mostly code motion - moving some functions out
of qemu_driver and into qemu_domain so they can be reused by
multiple qemu_* files (since qemu_driver.h must not grow).
It also adds a new helper function, qemuDomainRemoveInactive,
which will be used in the next patch.
* src/qemu/qemu_domain.h (qemuFindQemuImgBinary)
(qemuDomainSnapshotWriteMetadata, qemuDomainSnapshotForEachQcow2)
(qemuDomainSnapshotDiscard, qemuDomainSnapshotDiscardAll)
(qemuDomainRemoveInactive): New prototypes.
(struct qemu_snap_remove): New struct.
* src/qemu/qemu_domain.c (qemuDomainRemoveInactive)
(qemuDomainSnapshotDiscardAllMetadata): New functions.
(qemuFindQemuImgBinary, qemuDomainSnapshotWriteMetadata)
(qemuDomainSnapshotForEachQcow2, qemuDomainSnapshotDiscard)
(qemuDomainSnapshotDiscardAll): Move here...
* src/qemu/qemu_driver.c (qemuFindQemuImgBinary)
(qemuDomainSnapshotWriteMetadata, qemuDomainSnapshotForEachQcow2)
(qemuDomainSnapshotDiscard, qemuDomainSnapshotDiscardAll): ...from
here.
(qemuDomainUndefineFlags): Update caller.
* src/conf/domain_conf.c (virDomainRemoveInactive): Doc fixes.
Commit 19f8c98 introduced VIR_DOMAIN_UNDEFINE_SNAPSHOTS_METADATA,
with the intent that omitting the flag makes undefine fail, and
including the flag deletes metadata. But it used the wrong logic.
Also, hoist the transient domain sooner, so that we don't
accidentally remove metadata of a transient domain.
* src/qemu/qemu_driver.c (qemuDomainUndefineFlags): Check correct
flag value.
* src/qemu/qemu_process.c: Taking if (qemuDomainObjEndJob(driver, obj) == 0)
true branch then 'obj' is NULL, virDomainObjIsActive(obj) and
virDomainObjUnref(obj) will dereference NULL pointer.
Signed-off-by: Alex Jia <ajia@redhat.com>
Once virDomainReboot is called for a domain, guest OS initiated shutdown
would always result in reboot instead of shutdown. Only
virDomainShutdown would actually shutd such domain down. That's because
we forgot to reset fakeReboot flag once we asked the domain to reboot.
The commit that prevents disk corruption on domain shutdown
(96fc478417) causes regression with QEMU
0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
only recently in QEMU git. The affected versions of QEMU do not quit on
SIGTERM if started with -no-shutdown, which we use to implement fake
reboot. Since -no-shutdown tells QEMU not to quit automatically on guest
shutdown, domains started using the affected QEMU cannot be shutdown
properly and stay in a paused state.
This patch disables fake reboot feature on such QEMU by not using
-no-shutdown, which makes shutdown work as expected. However,
virDomainReboot will not work in this case and it will report "Requested
operation is not valid: Reboot is not supported with this QEMU binary".
gcc warns when building libvirt 0.9.5 on a 32-bit machine:
qemu/qemu_migration.c: In function 'qemuMigrationToFile':
qemu/qemu_migration.c:2727:38: error: large integer implicitly truncated to unsigned type [-Woverflow]
* src/qemu/qemu_domain.h (QEMU_DOMAIN_FILE_MIG_BANDWIDTH_MAX): Cap
to long when building for 32-bit platform.
Virsh man page lists driver types to be used with attach-device
command, but does not specify that those are usable only with the XEN
Hypervisor.
This patch adds statement, that those options specified are applicable
only on the Xen hypervisor and adds option usable with qemu emulator.
This patch also changes type of error returned by QEMU driver if the
user specifies incompatible driver type from VIR_ERR_INTERNAL_ERROR to
VIR_ERR_CONFIG_UNSUPPORTED.
For all types of disks other than qcow2, we were requesting that
SELinux labeling visit the new file as if it were qcow2, which
means labeling would try to find the backing files of an empty file.
And for a pre-existing qcow2 disk, we were passing NULL, which meant
that labelling tried to probe the file type (and if probing is
disabled, per the default qemu.conf, this made snapshots fail).
What we really want is to make SELinux labeling visit the new
file as raw; it will later be converted to qcow2 if qemu successfully
made the snapshot.
* src/qemu/qemu_driver.c
(qemuDomainSnapshotCreateSingleDiskActive): Force SELinux labeling
to avoid probe of new file.
For external snapshots to be useful on persistent domains, we must
alter the persistent definition alongside the running definition.
Thanks to the possibility of disk hotplug as well as of edits that
only affect the persistent xml, we can't assume that vm->def and
vm->newDef have the same disk at the same index, so we can only
update the persistent copy if the device destination matches up.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive)
(qemuDomainSnapshotCreateSingleDiskActive): Also affect newDef, if
present.
Qemu sends STOP event as part of the shutdown process. Detect such STOP
event and consider shutdown to be reason of emitting such event. That's
the best we can do until qemu provides us the reason directly in STOP
event. This allows us to report shutdown reason for paused state so that
apps can detect domains that failed to finish the shutdown process
(e.g., because qemu is buggy and doesn't exit on SIGTERM or it is
blocked in flushing disk buffers).
Ever since we introduced fake reboot, we call qemuProcessKill as a
reaction to SHUTDOWN event. Unfortunately, qemu doesn't guarantee it
flushed all internal buffers before sending SHUTDOWN, in which case
killing the process forcibly may result in (virtual) disk corruption.
By sending just SIGTERM without SIGKILL we give qemu time to to flush
all buffers and exit. Once qemu exits, we will see an EOF on monitor
connection and tear down the domain. In case qemu ignores SIGTERM or
just hangs there, the process stays running but that's not any different
from a possible hang anytime during the shutdown process so I think it's
just fine.
Also qemu (since 0.14 until it's fixed) has a bug in SIGTERM processing
which causes it not to exit but instead send new SHUTDOWN event and keep
waiting. I think the best we can do is to ignore duplicate SHUTDOWN
events to avoid a SHUTDOWN-SIGTERM loop and leave the domain in paused
state.
When a domain is rebooted using libvirt API, we use fake reboot
consisting of shutting down and resetting the domain. Thus we see a
SHUTDOWN event and set gotShutdown flag. But we never reset it back and
if the domain crashes after it was rebooted this way, we consider it was
a normal shutdown and not a crash.
Commit 4454a9efc7 changed shutoff reason
from VIR_DOMAIN_SHUTOFF_CRASHED to VIR_DOMAIN_SHUTOFF_FAILED in case we
see an unexpected EOF on monitor connection. But FAILED reason is
dedicated for domains that fail to start. CRASHED reason is the right
one to use in this situation.
This patch fixes the bug shown in bugzilla 738778. It's not an nwfilter problem but a connection sharing / closure issue.
https://bugzilla.redhat.com/show_bug.cgi?id=738778
Depending on the speed / #CPUs of the machine you are using you may not see this bug all the time.
Adjust qemuMigrationRun() to use migMaxBandwidth in qemuDomainObjPrivate
structure when setting qemu migration speed. Caller-specified 'resource'
parameter overrides migMaxBandwidth.
The qemu migration speed default is 32MiB/s as defined in migration.c
/* Migration speed throttling */
static int64_t max_throttle = (32 << 20);
There's no need to throttle migration when targeting a file, so set migration
speed to unlimited prior to migration, and restore to libvirt default value
after migration.
Default units is MB for migrate_set_speed monitor command, so
(INT64_MAX / (1024 * 1024)) is used for unlimited migration speed.
Tested with both json and text monitors.
Now that migration speed is stored in qemuDomainObjPrivate structure,
save the new value when invoking qemuDomainMigrateSetMaxSpeed().
Allow setting migration speed on inactive domain too.
The maximum bandwidth that can be consumed when migrating a domain
is better classified as an operational vs configuration parameter of
the dommain. As such, store this parameter in qemuDomainObjPrivate
structure.
Commit 498d783 cleans up some of virtual file names for parsing strings
in memory. This patch cleans up (hopefuly) the rest forgotten by the
first patch.
This patch also changes all of the previously modified "filenames" to
valid URI's replacing spaces for underscores.
Changes to v1:
- Replace all spaces for underscores, so that the strings form valid
URI's
- Replace spaces in places changed by commit 498d783
Regression introduced in commit 3881a470, due to an improper rebase
of a cleanup written beforehand but only applied after a rebased of
a refactoring that created a new function in commit 25fb3ef.
Also avoids passing NULL to printf %s.
* src/qemu/qemu_driver.c: In qemuDomainSnapshotForEachQcow2()
it free up the memory of qemu_driver->qemuImgBinary in the
cleanup tag which leads to the garbage value of qemuImgBinary
in qemu_driver struct and libvirtd crash when running
"virsh snapshot-create" command a second time.
Signed-off-by: Eric Blake <eblake@redhat.com>
Regression introduced in commit 89b6284fd, due to an incorrect
conversion to the new means of converting disk names back to
the correct object.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Avoid NULL deref.
This patch enables modifying network device configuration using the
virUpdateDeviceFlags API method. Matching of devices is accomplished
using MAC addresses.
While updating live configuration of a running domain, the user is
allowed only to change link state of the interface. Additional
modifications may be added later. For now the code checks for
unsupported changes and thereafter changes the link state, if
applicable.
When updating persistent configuration of guest's network interface the
whole configuration (except for the MAC address) may be modified and
is stored for the next startup.
* src/qemu/qemu_driver.c - Add dispatching of virUpdateDevice for
network devices update (live/config)
* src/qemu/qemu_hotplug.c - add setting of initial link state on live
device addition
- add function to change network device
configuration. By now it supports only
changing of link state
* src/qemu/qemu_hotplug.h - Headers to above functions
* src/qemu/qemu_process.c - set link states before virtual machine
start. Qemu does not support setting of
this on the command line.
This patch adds handlers for modification of guest's interface
link state. Both HMP and QMP commands are supported, but as the
link state functionality is from the beginning supported in QMP
the HMP code will probably never be used.
The mainly changes are:
1) Update qemuMonitorGetBlockStatsInfo and it's children (Text/JSON)
functions to return the value of new latency fields.
2) Add new function qemuMonitorGetBlockStatsParamsNumber, which is
to count how many parameters the underlying QEMU supports.
3) Update virDomainBlockStats in src/qemu/qemu_driver.c to be
compatible with the changes by 1).
If libvirt daemon gets restarted and there is (at least) one
unresponsive qemu, the startup procedure hangs up. This patch creates
one thread per vm in which we try to reconnect to monitor. Therefore,
blocking in one thread will not affect other APIs.
This patch creates an optional BeginJob queue size limit. When
active, all other attempts above level will fail. To set this
feature assign desired value to max_queued variable in qemu.conf.
Setting it to 0 turns it off.
This patch annotates APIs with low or high priority.
In low set MUST be all APIs which might eventually access monitor
(and thus block indefinitely). Other APIs may be marked as high
priority. However, some must be (e.g. domainDestroy).
For high priority calls (HPC), there are some high priority workers
(HPW) created in the pool. HPW can execute only HPC, although normal
worker can process any call regardless priority. Therefore, only those
APIs which are guaranteed to end in reasonable small amount of time
can be marked as HPC.
The size of this HPC pool is static, because HPC are expected to end
quickly, therefore jobs assigned to this pool will be served quickly.
It can be configured in libvirtd.conf via prio_workers variable.
Default is set to 5.
To mark API with low or high priority, append priority:{low|high} to
it's comment in src/remote/remote_protocol.x. This is similar to
autogen|skipgen. If not marked, the generator assumes low as default.
With this, it is now possible to create external snapshots even
when SELinux is enforcing, and to protect the new file with a
lock manager.
* src/qemu/qemu_driver.c
(qemuDomainSnapshotCreateSingleDiskActive): Create and register
new file with proper permissions and locks.
(qemuDomainSnapshotCreateDiskActive): Update caller.
Lots of earlier patches led up to this point - the qemu snapshot_blkdev
monitor command can now be controlled by libvirt! Well, insofar as
SELinux doesn't prevent qemu from open(O_CREAT) on the files. There's
still some followup work before things work with SELinux enforcing,
but this patch is big enough to post now.
There's still room for other improvements, too (for example, taking a
disk snapshot of an inactive domain, by using qemu-img for both internal
and external snapshots; wiring up delete and revert control, including
additional flags from my RFC; supporting active QED disk snapshots;
supporting per-storage-volume snapshots such as LVM or btrfs snapshots;
etc.). But this patch is the one that proves the new XML works!
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Wire in
active disk snapshots.
(qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateDiskActive)
(qemuDomainSnapshotCreateSingleDiskActive): New functions.
No one uses this yet, but it will be important once
virDomainSnapshotCreateXML learns a VIR_DOMAIN_SNAPSHOT_DISK_ONLY
flag, and the xml allows passing in the new file names.
* src/qemu/qemu_monitor.h (qemuMonitorDiskSnapshot): New prototype.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextDiskSnapshot):
Likewise.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskSnapshot):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorDiskSnapshot): New
function.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskSnapshot):
Likewise.
Snapshots alter the set of disk image files opened by qemu, so
they must be audited. But they don't involve a full disk definition
structure, just the new filename. Make the next patch easier by
refactoring the audit routines to just operate on file name.
* src/conf/domain_audit.h (virDomainAuditDisk): Update prototype.
* src/conf/domain_audit.c (virDomainAuditDisk): Act on strings,
not definition structures.
(virDomainAuditStart): Update caller.
* src/qemu/qemu_hotplug.c (qemuDomainChangeEjectableMedia)
(qemuDomainAttachPciDiskDevice, qemuDomainAttachSCSIDisk)
(qemuDomainAttachUsbMassstorageDevice)
(qemuDomainDetachPciDiskDevice, qemuDomainDetachDiskDevice):
Likewise.
My RFC for snapshot support [1] proposes several rules for when it is
safe to delete or revert to an external snapshot, predicated on
the existence of new API flags. These will be incrementally added
in future patches, but until then, blindly mishandling a disk
snapshot risks corrupting internal state, so it is better to
outright reject the attempts until the other pieces are in place,
thus incrementally relaxing the restrictions added in this patch.
[1] https://www.redhat.com/archives/libvir-list/2011-August/msg00361.html
* src/qemu/qemu_driver.c (qemuDomainSnapshotCountExternal): New
function.
(qemuDomainUndefineFlags, qemuDomainSnapshotDelete): Use it to add
safety valve.
(qemuDomainRevertToSnapshot, qemuDomainSnapshotCreateXML): Add safety
valve.
Prior to this patch, <domainsnapshot>/<disks> was ignored. This
changes it to be an error unless an explicit disk snapshot is
requested (a future patch may relax things if it turns out to
be useful to have a <disks> specification alongside a system
checkpoint).
* include/libvirt/libvirt.h.in
(VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY): New flag.
* src/libvirt.c (virDomainSnapshotCreateXML): Document it.
* src/esx/esx_driver.c (esxDomainSnapshotCreateXML): Disk
snapshots not supported yet.
* src/vbox/vbox_tmpl.c (vboxDomainSnapshotCreateXML): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Likewise.
I got confused when 'virsh domblkinfo dom disk' required the
path to a disk (which can be ambiguous, since a single file
can back multiple disks), rather than the unambiguous target
device name that I was using in disk snapshots. So, in true
developer fashion, I went for the best of both worlds - all
interfaces that operate on a disk (aka block) now accept
either the target name or the unambiguous path to the backing
file used by the disk.
* src/conf/domain_conf.h (virDomainDiskIndexByName): Add
parameter.
(virDomainDiskPathByName): New prototype.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/conf/domain_conf.c (virDomainDiskIndexByName): Also allow
searching by path, and decide whether ambiguity is okay.
(virDomainDiskPathByName): New function.
(virDomainDiskRemoveByName, virDomainSnapshotAlignDisks): Update
callers.
* src/qemu/qemu_driver.c (qemudDomainBlockPeek)
(qemuDomainAttachDeviceConfig, qemuDomainUpdateDeviceConfig)
(qemuDomainGetBlockInfo, qemuDiskPathToAlias): Likewise.
* src/qemu/qemu_process.c (qemuProcessFindDomainDiskByPath):
Likewise.
* src/libxl/libxl_driver.c (libxlDomainAttachDeviceDiskLive)
(libxlDomainDetachDeviceDiskLive, libxlDomainAttachDeviceConfig)
(libxlDomainUpdateDeviceConfig): Likewise.
* src/uml/uml_driver.c (umlDomainBlockPeek): Likewise.
* src/xen/xend_internal.c (xenDaemonDomainBlockPeek): Likewise.
* docs/formatsnapshot.html.in: Update documentation.
* tools/virsh.pod (domblkstat, domblkinfo): Likewise.
* docs/schemas/domaincommon.rng (diskTarget): Tighten pattern on
disk targets.
* docs/schemas/domainsnapshot.rng (disksnapshot): Update to match.
* tests/domainsnapshotxml2xmlin/disk_snapshot.xml: Update test.
Since a snapshot is fully recoverable, it is useful to have a
snapshot as a means of hibernating a guest, then reverting to
the snapshot to wake the guest up. This mode of usage is
similar to 'virsh save/virsh restore', except that virsh
save uses an external file while virsh snapshot keeps the
vm state internal to a qcow2 file. However, it only works on
persistent domains.
In the usage pattern of snapshot/revert for hibernating a guest,
there is no need to keep the guest running between the two points
in time, especially since that would generate runtime state that
would just be discarded. Add a flag to make it possible to
stop the domain after the snapshot has completed.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_SNAPSHOT_CREATE_HALT):
New flag.
* src/libvirt.c (virDomainSnapshotCreateXML): Document it.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML)
(qemuDomainSnapshotCreateActive): Implement it.
Reverting to a state prior to an external snapshot risks
corrupting any other branches in the snapshot hierarchy that
were using the snapshot as a read-only backing file. So
disk snapshot code will default to preventing reverting to
a snapshot that has any children, meaning that deleting just
the children of a snapshot becomes a useful operation in
preparing that snapshot for being a future reversion target.
The code for the new flag is simple - it's one less deletion,
plus a tweak to keep the current snapshot correct.
* include/libvirt/libvirt.h.in
(VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY): New flag.
* src/libvirt.c (virDomainSnapshotDelete): Document it, and
enforce mutual exclusion.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDelete): Implement
it.
The previous patch introduced new config, but if a hypervisor does
not support that new config, someone can write XML that does not
behave as documented. This prevents some of those cases by
explicitly rejecting transient disks for several hypervisors.
Disk snapshots will require a new flag to actually affect a snapshot
creation, so there's not much to reject there.
* src/qemu/qemu_command.c (qemuBuildDriveStr): Reject transient
disks for now.
* src/libxl/libxl_conf.c (libxlMakeDisk): Likewise.
* src/xenxs/xen_sxpr.c (xenFormatSxprDisk): Likewise.
* src/xenxs/xen_xm.c (xenFormatXMDisk): Likewise.
When reverting to a snapshot, the inactive domain configuration
has to be rolled back to what it was at the time of the snapshot.
Additionally, if the VM is active and the snapshot was active,
this now adds a failure if the two configurations are ABI
incompatible, rather than risking qemu confusion.
A future patch will add a VIR_DOMAIN_SNAPSHOT_FORCE flag, which
will be required for two risky code paths - reverting to an
older snapshot that lacked full domain information, and reverting
from running to a live snapshot that requires starting a new qemu
process. Any reverting that stops a running vm is also a form
of data loss (discarding the current running state to go back in
time), but as that is what reversion usually implies, it is
probably not worth requiring a force flag.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Copy out
domain.
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot): Perform
ABI compatibility checks.
Just like VM saved state images (virsh save), snapshots MUST
track the inactive domain xml to detect any ABI incompatibilities.
The indentation is not perfect, but functionality comes before form.
Later patches will actually supply a full domain; for now, this
wires up the storage to support one, but doesn't ever generate one
in dumpxml output.
Happily, libvirt.c was already rejecting use of VIR_DOMAIN_XML_SECURE
from read-only connections, even though before this patch, there was
no information to be secured by the use of that flag.
And while we're at it, mark the libvirt snapshot metadata files
as internal-use only.
* src/libvirt.c (virDomainSnapshotGetXMLDesc): Document flag.
* src/conf/domain_conf.h (_virDomainSnapshotDef): Add member.
(virDomainSnapshotDefParseString, virDomainSnapshotDefFormat):
Update signature.
* src/conf/domain_conf.c (virDomainSnapshotDefFree): Clean up.
(virDomainSnapshotDefParseString): Optionally parse domain.
(virDomainSnapshotDefFormat): Output full domain.
* src/esx/esx_driver.c (esxDomainSnapshotCreateXML)
(esxDomainSnapshotGetXMLDesc): Update callers.
* src/vbox/vbox_tmpl.c (vboxDomainSnapshotCreateXML)
(vboxDomainSnapshotGetXMLDesc): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML)
(qemuDomainSnapshotLoad, qemuDomainSnapshotGetXMLDesc)
(qemuDomainSnapshotWriteMetadata): Likewise.
* docs/formatsnapshot.html.in: Rework doc example.
Based on a patch by Philipp Hahn.
Migration is another case of stranding metadata. And since
snapshot metadata is arbitrarily large, there's no way to
shoehorn it into the migration cookie of migration v3.
This patch consolidates two existing locations for migration
validation into one helper function, then enhances that function
to also do the new checks. If we could always trust the source
to validate migration, then the destination would not have to
do anything; but since older servers that did not do checking
can migrate to newer destinations, we have to repeat some of
the same checks on the destination; meanwhile, we want to
detect failures as soon as possible. With migration v2, this
means that validation will reject things at Prepare on the
destination if the XML exposes the problem, otherwise at Perform
on the source; with migration v3, this means that validation
will reject things at Begin on the source, or if the source
is old and the XML exposes the problem, then at Prepare on the
destination.
This patch is necessarily over-strict. Once a later patch
properly handles auto-cleanup of snapshot metadata on the
death of a transient domain, then the only time we actually
need snapshots to prevent migration is when using the
--undefinesource flag on a persistent source domain.
It is possible to recreate snapshot metadata on the destination
with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE and
VIR_DOMAIN_SNAPSHOT_CREATE_CURRENT. But for now, that is limited,
since if we delete the snapshot metadata prior to migration,
then we won't know the name of the current snapshot to pass
along; and if we delete the snapshot metadata after migration
and use the v3 migration cookie to pass along the name of the
current snapshot, then we need a way to bypass the fact that
this patch refuses migration with snapshot metadata present.
So eventually, we may have to introduce migration protocol v4
that allows feature negotiation and an arbitrary number of
handshake exchanges, so as to pass as many rpc calls as needed
to transfer all the snapshot xml hierarchy.
But all of that is thoughts for the future; for now, the best
course of action is to quit early, rather than get into a
funky state of stale metadata; then relax restrictions later.
* src/qemu/qemu_migration.h (qemuMigrationIsAllowed): Make static.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Alter
signature, and allow checks for both outgoing and incoming.
(qemuMigrationBegin, qemuMigrationPrepareAny)
(qemuMigrationPerformJob): Update callers.
A nice benefit of deleting all snapshots at undefine time is that
you don't have to do any reparenting or subtree identification - since
everything goes, this is an O(n) process, whereas using multiple
virDomainSnapshotDelete calls would be O(n^2) or worse. But it is
only doable for snapshot metadata, where we are in control of the
data being deleted; for the actual snapshots, there's too much
likelihood of something going wrong, and requiring even more API
calls to figure out what failed in the meantime, so callers are
better off deleting the snapshot data themselves one snapshot at
a time where they can deal with failures as they happen.
* src/qemu/qemu_driver.c (qemuDomainUndefineFlags): Honor new flags.
As more clients start to want to know this information, doing
a PATH stat walk and malloc for every client adds up.
We are only caching the location, not the capabilities, so even
if qemu-img is updated in the meantime, it will still probably
live in the same location. So there is no need to worry about
clearing this particular cache.
* src/qemu/qemu_conf.h (qemud_driver): Add member.
* src/qemu/qemu_driver.c (qemudShutdown): Cleanup.
(qemuFindQemuImgBinary): Add an argument, and cache result.
(qemuDomainSnapshotForEachQcow2, qemuDomainSnapshotDiscard)
(qemuDomainSnapshotCreateInactive, qemuDomainSnapshotRevertInactive)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot): Update
callers.
Just as leaving managed save metadata behind can cause problems
when creating a new domain that happens to collide with the name
of the just-deleted domain, the same is true of leaving any
snapshot metadata behind. For safety sake, extend the semantic
change of commit b26a9fa9 to also cover snapshot metadata as a
reason to reject undefining an inactive domain. A future patch
will make sure that shutdown of a transient domain automatically
deletes snapshot metadata (whether by destroy, shutdown, or
guest-initiated action). Management apps of transient domains
should take care to capture xml of snapshots, if it is necessary
to recreate the snapshot metadata on a later transient domain
with the same name and uuid.
This also documents a new flag that hypervisors can choose to
support as a shortcut for taking care of the metadata as part of
the undefine process; however, nontrivial driver support for these
flags will be deferred to future patches.
Note that ESX and VBox can never be transient; therefore, they
do not have to worry about automatic cleanup after shutdown
(the persistent domain still remains); likewise they never
store snapshot metadata, so the undefine flag is trivial.
The nontrivial work remaining is thus in the qemu driver.
* include/libvirt/libvirt.h.in
(VIR_DOMAIN_UNDEFINE_SNAPSHOTS_METADATA): New flag.
* src/libvirt.c (virDomainUndefine, virDomainUndefineFlags):
Document new limitations and flag.
* src/esx/esx_driver.c (esxDomainUndefineFlags): Trivial
implementation.
* src/vbox/vbox_tmpl.c (vboxDomainUndefineFlags): Likewise.
* src/qemu/qemu_driver.c (qemuDomainUndefineFlags): Enforce
the limitations.
Redefining a qemu snapshot requires a bit of a tweak to the common
snapshot parsing code, but the end result is quite nice.
Be careful that redefinitions do not introduce circular parent
chains. Also, we don't want to allow conversion between online
and offline existing snapshots. We could probably do some more
validation for snapshots that don't already exist to make sure
they are even feasible, by parsing qemu-img output, but that
can come later.
* src/conf/domain_conf.h (virDomainSnapshotParseFlags): New
internal flags.
* src/conf/domain_conf.c (virDomainSnapshotDefParseString): Alter
signature to take internal flags.
* src/esx/esx_driver.c (esxDomainSnapshotCreateXML): Update caller.
* src/vbox/vbox_tmpl.c (vboxDomainSnapshotCreateXML): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Support
new public flags.
Supporting NO_METADATA on snapshot creation is interesting - we must
still return a valid opaque snapshot object, but the user can't get
anything out of it (unless we add a virDomainSnapshotGetName()),
since it is no longer registered with the domain.
Also, virsh now tries to query for secure xml, in anticipation of
when we store <domain> xml inside <domainsnapshot>; for now, we
can trivially support it, since we have nothing secure.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Support
new flag.
(qemuDomainSnapshotGetXMLDesc): Trivially support VIR_DOMAIN_XML_SECURE.
To make it easier to know when undefine will fail because of existing
snapshot metadata, we need to know how many snapshots have metadata.
Also, it is handy to filter the list of snapshots to just those that
have no parents; document that flag now, but implement it in later patches.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_SNAPSHOT_LIST_ROOTS)
(VIR_DOMAIN_SNAPSHOT_LIST_METADATA): New flags.
* src/libvirt.c (virDomainSnapshotNum)
(virDomainSnapshotListNames): Document them.
* src/esx/esx_driver.c (esxDomainSnapshotNum)
(esxDomainSnapshotListNames): Implement trivial flag.
* src/vbox/vbox_tmpl.c (vboxDomainSnapshotNum)
(vboxDomainSnapshotListNames): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSnapshotNum)
(qemuDomainSnapshotListNames): Likewise.
Adding this was trivial compared to the previous patch for fixing
qemu snapshot deletion in the first place.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiscard): Add
parameter.
(qemuDomainSnapshotDiscardDescendant, qemuDomainSnapshotDelete):
Update callers.
Similar to the last patch in isolating the filtering from the
client actions, so that clients don't have to reinvent the
filtering.
* src/conf/domain_conf.h (virDomainSnapshotForEachChild): New
prototype.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/conf/domain_conf.c (virDomainSnapshotActOnChild)
(virDomainSnapshotForEachChild): New functions.
(virDomainSnapshotCountChildren): Delete.
(virDomainSnapshotHasChildren): Simplify.
* src/qemu/qemu_driver.c (qemuDomainSnapshotReparentChildren)
(qemuDomainSnapshotDelete): Likewise.
Deleting a snapshot and all its descendants had problems with
tracking the current snapshot. The deletion does not necessarily
proceed in depth-first order, so a parent could be deleted
before a child, wreaking havoc on passing the notion of the
current snapshot to the parent. Furthermore, even if traversal
were depth-first, doing multiple file writes to pass current up
the chain one snapshot at a time is wasteful, comparing to a
single update to the current snapshot at the end of the algorithm.
* src/qemu/qemu_driver.c (snap_remove): Add field.
(qemuDomainSnapshotDiscard): Add parameter.
(qemuDomainSnapshotDiscardDescendant): Adjust accordingly.
(qemuDomainSnapshotDelete): Properly reset current.
This one's nasty. Ever since we fixed virHashForEach to prevent
nested hash iterations for safety reasons (commit fba550f6),
virDomainSnapshotDelete with VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN
has been broken for qemu: it deletes children, while leaving
grandchildren intact but pointing to a no-longer-present parent.
But even before then, the code would often appear to succeed to
clean up grandchildren, but risked memory corruption if you have
a large and deep hierarchy of snapshots.
For acting on just children, a single virHashForEach is sufficient.
But for acting on an entire subtree, it requires iteration; and
since we declared recursion as invalid, we have to switch to a
while loop. Doing this correctly requires quite a bit of overhaul,
so I added a new helper function to isolate the algorithm from the
actions, so that callers do not have to reinvent the iteration.
Note that this _still_ does not handle CHILDREN correctly if one
of the children is the current snapshot; that will be next.
* src/conf/domain_conf.h (_virDomainSnapshotDef): Add mark.
(virDomainSnapshotForEachDescendant): New prototype.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/conf/domain_conf.c (virDomainSnapshotMarkDescendant)
(virDomainSnapshotActOnDescendant)
(virDomainSnapshotForEachDescendant): New functions.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiscardChildren):
Replace...
(qemuDomainSnapshotDiscardDescenent): ...with callback that
doesn't nest hash traversal.
(qemuDomainSnapshotDelete): Use new function.
For a system checkpoint of a running or paused domain, it's fairly
easy to honor new flags for altering which state to use after the
revert. For an inactive snapshot, the revert has to be done while
there is no qemu process, so do back-to-back transitions; this also
lets us revert to inactive snapshots even for transient domains.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Support new
flags.
Commit 5e47785 broke reverts to offline system checkpoint snapshots
with older qemu, since there is no longer any code path to use
qemu -loadvm on next boot. Meanwhile, reverts to offline system
checkpoints have been broken for newer qemu, both before and
after that commit, since -loadvm no longer works to revert to
disk state without accompanying vm state. Fix both of these by
using qemu-img to revert disk state.
Meanwhile, consolidate the (now 3) clients of a qemu-img iteration
over all disks of a VM into one function, so that any future
algorithmic fixes to the FIXMEs in that function after partial
loop iterations are dealt with at once. That does mean that this
patch doesn't handle partial reverts very well, but we're not
making the situation any worse in this patch.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Use
qemu-img rather than 'qemu -loadvm' to revert to offline snapshot.
(qemuDomainSnapshotRevertInactive): New helper.
(qemuDomainSnapshotCreateInactive): Factor guts...
(qemuDomainSnapshotForEachQcow2): ...into new helper.
(qemuDomainSnapshotDiscard): Use it.
If you take a checkpoint snapshot of a running domain, then pause
qemu, then restore the snapshot, the result should be a running
domain, but the code was leaving things paused. Furthermore, if
you take a checkpoint of a paused domain, then run, then restore,
there was a brief but non-deterministic window of time where the
domain was running rather than paused. Fix both of these
discrepancies by always pausing before restoring.
Also, check that the VM is active every time lock is dropped
between two monitor calls.
Finally, straighten out the events that get emitted on each
transition.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Always
pause before reversion, and improve events.
Implement the new running/paused overrides for saved state management.
Unfortunately, for virDomainSaveImageDefineXML, the saved state
updates are write-only - I don't know of any way to expose a way
to query the current run/pause setting of an existing save image
file to the user without adding a new API or modifying the domain
xml of virDomainSaveImageGetXMLDesc to include a new element to
reflect the state bit encoded into the save image. However, I
don't think this is a show-stopper, since the API is designed to
leave the state bit alone unless an explicit flag is used to
change it.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal)
(qemuDomainSaveImageOpen): Adjust signature.
(qemuDomainSaveFlags, qemuDomainManagedSave)
(qemuDomainRestoreFlags, qemuDomainSaveImageGetXMLDesc)
(qemuDomainSaveImageDefineXML, qemuDomainObjRestore): Adjust
callers.
There are two classes of management apps that track events - one
that only cares about on/off (and only needs to track EVENT_STARTED
and EVENT_STOPPED), and one that cares about paused/running (also
tracks EVENT_SUSPENDED/EVENT_RESUMED). To keep both classes happy,
any transition that can go from inactive to paused must emit two
back-to-back events - one for started and one for suspended (since
later resuming of the domain will only send RESUMED, but the first
class isn't tracking that).
This also fixes a bug where virDomainCreateWithFlags with the
VIR_DOMAIN_START_PAUSED flag failed to start paused when restoring
from a managed save image.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_EVENT_SUSPENDED_RESTORED)
(VIR_DOMAIN_EVENT_SUSPENDED_FROM_SNAPSHOT)
(VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT): New sub-events.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Use them.
(qemuDomainSaveImageStartVM): Likewise, and add parameter.
(qemudDomainCreate, qemuDomainObjStart): Send suspended event when
starting paused.
(qemuDomainObjRestore): Add parameter.
(qemuDomainObjStart, qemuDomainRestoreFlags): Update callers.
* examples/domain-events/events-c/event-test.c
(eventDetailToString): Map new detail strings.
QEMU uses USB bus name "usb.0" when using the legacy -usb argument.
If we want to allow USB devices to specify their addresses with legacy
-usb, we should either in case of legacy bus name drop the 0 from the
address bus, or just drop the 0 from device id. This patch does the
later.
Another solution would be to permit addressing on non-legacy USB
controllers only.
So that devices can be attached to hubs. Example, to attach to first
port of a usb-hub on port 1.
<hub type='usb'>
<address type='usb' bus='0' port='1'/>
</hub>
<input type='mouse' type='usb'>
<address type='usb' bus='0' port='1.1'/>
</hub>
also add a test entry
Commit 6766ff10 introduced a corner case bug with snapshot creation:
if a snapshot is created, but then we hit OOM while trying to
create the return value of the function, then we have polluted the
internal directory with the snapshot metadata with no way to clean
it up from the running libvirtd.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Don't
write metadata file on OOM condition.
Newer QEMU introduced cache=directsync for -drive, this patchset
is to expose it in libvirt layer.
* Introduced a new QEMU capability flag ($prefix_CACHE_DIRECTSYNC),
As even $prefix_CACHE_V2 is set, we can't known if directsync
is supported.
Several users have reported problems with 'virsh start' failing because
it was encountering a managed save situation where the managed save file
was incomplete. Be more robust to this by using two different magic
numbers, so that newer libvirt can gracefully handle an incomplete file
differently than a complete one, while older libvirt will at least fail
up front rather than trying to load only to have qemu fail at the end.
Managed save is a convenience - it exists to preserve as much state
as possible; if the state was not preserved, it is reasonable to just
log that fact, then proceed with a fresh boot. On the other hand,
user saves are under user control, so we must fail, but by making
the failure message distinct, the user can better decide how to handle
the situation of an incomplete save file.
* src/qemu/qemu_driver.c (QEMUD_SAVE_PARTIAL): New define.
(qemuDomainSaveInternal): Use it to mark incomplete images.
(qemuDomainSaveImageOpen, qemuDomainObjRestore): Add parameter
that controls what to do with partial images.
(qemuDomainRestoreFlags, qemuDomainSaveImageGetXMLDesc)
(qemuDomainSaveImageDefineXML, qemuDomainObjStart): Update callers.
Based on an initial idea by Osier Yang.
In a SELinux or root-squashing NFS environment, libvirt has to go
through some hoops to create a new file that qemu can then open()
by name. Snapshots are a case where we want to guarantee an empty
file that qemu can open; also, reopening a save file to convert it
from being marked partial to complete requires a reopen to avoid
O_DIRECT headaches. Refactor some existing code to make it easier
to reuse in later patches.
* src/qemu/qemu_migration.h (qemuMigrationToFile): Drop parameter.
* src/qemu/qemu_migration.c (qemuMigrationToFile): Let cgroup do
the stat, rather than asking caller to do it and pass info down.
* src/qemu/qemu_driver.c (qemuOpenFile): New function, pulled from...
(qemuDomainSaveInternal): ...here.
(doCoreDump, qemuDomainSaveImageOpen): Use it here as well.
After supporting multi function pci device, we only reserve function 1 on slot 1.
The user can use the other function on slot 1 in the xml config file. We should
detect this wrong usage.
The libvirt BlockPull API supports the use of an initial bandwidth limit but the
qemu block_stream API does not. To get the desired behavior we use the two APIs
strung together: first BlockPull, then BlockJobSetSpeed. We can do this at the
driver level to avoid duplicated code in each monitor path.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Due to an unfortunate precedent in qemu, the units for the bandwidth parameter
to block_job_set_speed are different between the text monitor and the qmp
monitor. While the qmp monitor uses bytes/s, the text monitor expects MB/s.
Correct the units for the text interface.
Signed-off-by: Adam Litke <agl@us.ibm.com>
It is not possible to change the label of a TCP socket once it
has been opened. When creating a TCP socket care must be taken
to ensure the socket creation label is set & then cleared.
Remove the bogus call to virSecurityManagerSetProcessFDLabel
from the lock driver guest setup code and instead make use of
virSecurityManagerSetSocketLabel
There is no reason to forbid pausing an autodestroy domain
(not to mention that 'virsh start --paused --autodestroy'
succeeds in creating a paused autodestroy domain).
Meanwhile, qemu was failing to enforce the API documentation that
autodestroy domains cannot be saved. And while the original
documentation only mentioned save/restore, snapshots are another
form of saving that are close enough in semantics as to make no
sense on one-shot domains.
* src/qemu/qemu_driver.c (qemudDomainSuspend): Drop bogus check.
(qemuDomainSaveInternal, qemuDomainSnapshotCreateXML): Forbid
saves of autodestroy domains.
* src/libvirt.c (virDomainCreateWithFlags, virDomainCreateXML):
Document snapshot interaction.
According to qemu-kvm/qerror.c all messages start with a capital
"Device ", but the current code only scans for the lower case "device ".
This results in "virDomainUpdateDeviceFlags()" to not detect locked
CD-ROMs and reporting success even in the case of a failure:
# virsh qemu-monitor-command "$VM" change\ drive-ide0-0-0\ \"/var/lib/libvirt/images/ucs_2.4-0-sec4-20110714145916-dvd-amd64.iso\"
Device 'drive-ide0-0-0' is locked
# virsh update-device "$VM" /dev/stdin <<<"<disk type='file' device='cdrom'><driver name='qemu' type='raw'/><source file='/var/lib/libvirt/images/ucs_2.4-0-sec4-20110714145916-dvd-amd64.iso'/><target dev='hda' bus='ide'/><readonly/><alias name='ide0-0-0'/><address type='drive' controller='0' bus='0' unit='0'/></disk>"
Device updated successfully
Signed-off-by: Philipp Hahn <hahn@univention.de>
There have been several instances of people having problems with
a broken managed save file, and not aware that they could use
'virsh managedsave-remove dom' to fix things. Making it possible
to do this as part of starting a domain makes the same functionality
easier to find, and one less API call.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_START_FORCE_BOOT): New
flag.
* src/libvirt.c (virDomainCreateWithFlags): Document it.
* src/qemu/qemu_driver.c (qemuDomainObjStart): Alter signature.
(qemuAutostartDomain, qemuDomainStartWithFlags): Update callers.
* tools/virsh.c (cmdStart): Expose it in virsh.
* tools/virsh.pod (start): Document it.
Commit 3261761 made it possible to use pipes instead of sockets
for outgoing tunneled migration; however, it caused a regression
because the pipe was never given a SELinux label.
* src/qemu/qemu_migration.c (doTunnelMigrate): Label outgoing pipe.
When a user migrates a domain by command as
libvirt saves vm's domain XML config in destination host after migration.
But it saves vm->def. Then, the saved XML contains some garbage.
<domain type='kvm' id='50'>
^^^^^^^^
...
<console type='pty' tty='/dev/pts/5'>
^^^^^^^^^^^^^^^^^
Avoid saving unnecessary things by saving persistent vm definition.
On success, the 'sendkey' command does not return any data, so
any data in the reply should be considered to be an error
message
* src/qemu/qemu_monitor_text.c: Treat non-"" reply data as an
error message for 'sendkey' command
The QEMU 'sendkey' command expects keys to be encoded in the same
way as the RFB extended keycode set. Specifically it wants extended
keys to have the high bit of the first byte set, while the Linux
XT KBD driver codeset uses the low bit of the second byte. To deal
with this we introduce a new keymap 'RFB' and use that in the QEMU
driver
* include/libvirt/libvirt.h.in: Add VIR_KEYCODE_SET_RFB
* src/qemu/qemu_driver.c: Use RFB keycode set instead of XT KBD
* src/util/virkeycode-mapgen.py: Auto-generate the RFB keycode
set from the XT KBD set
* src/util/virkeycode.c: Add RFB keycode entry to table. Add a
verify check on cardinality of the codeOffset table
The APIs are designed to label a socket in a way that the libvirt daemon
itself is able to access it (i.e., in SELinux the label is virtd_t based
as opposed to svirt_* we use for labeling resources that need to be
accessed by a vm). The new name reflects this.
Audit all changes to the qemu vm->current_snapshot, and make them
update the saved xml file for both the previous and the new
snapshot, so that there is always at most one snapshot with
<active>1</active> in the xml, and that snapshot is used as the
current snapshot even across libvirtd restarts.
This patch does not fix the case of virDomainSnapshotDelete(,CHILDREN)
where one of the children is the current snapshot; that will be later.
* src/conf/domain_conf.h (_virDomainSnapshotDef): Alter member
type and name.
* src/conf/domain_conf.c (virDomainSnapshotDefParseString)
(virDomainSnapshotDefFormat): Update clients.
* docs/schemas/domainsnapshot.rng: Tighten rng.
* src/qemu/qemu_driver.c (qemuDomainSnapshotLoad): Reload current
snapshot.
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainSnapshotDiscard): Track current snapshot.
Changing the current vm, and writing that change to the file
system, all before a new qemu starts, is risky; it's hard to
roll back if starting the new qemu fails for some reason.
Instead of abusing vm->current_snapshot and making the command
line generator decide whether the current snapshot warrants
using -loadvm, it is better to just directly pass a snapshot all
the way through the call chain if it is to be loaded.
This frees up the last use of snapshot->def->active for qemu's
use, so the next patch can repurpose that field for tracking
which snapshot is current.
* src/qemu/qemu_command.c (qemuBuildCommandLine): Don't use active
field of snapshot.
* src/qemu/qemu_process.c (qemuProcessStart): Add a parameter.
* src/qemu/qemu_process.h (qemuProcessStart): Update prototype.
* src/qemu/qemu_migration.c (qemuMigrationPrepareAny): Update
callers.
* src/qemu/qemu_driver.c (qemudDomainCreate)
(qemuDomainSaveImageStartVM, qemuDomainObjStart)
(qemuDomainRevertToSnapshot): Likewise.
(qemuDomainSnapshotSetCurrentActive)
(qemuDomainSnapshotSetCurrentInactive): Delete unused functions.
https://bugzilla.redhat.com/show_bug.cgi?id=727709
mentions that if qemu fails to create the snapshot (such as what
happens on Fedora 15 qemu, which has qmp but where savevm is only
in hmp, and where libvirt is old enough to not try the hmp fallback),
then 'virsh snapshot-list dom' will show a garbage snapshot entry,
and the libvirt internal directory for storing snapshot metadata
will have a bogus file.
This fixes the fallout bug of polluting the snapshot-list with
garbage on failure (the root cause of the F15 bug of not having
fallback to hmp has already been fixed in newer libvirt releases).
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Allocate
memory before making snapshot, and cleanup on failure. Don't
dereference NULL if transient domain exited during snapshot creation.
* src/qemu/qemu_migration.c: avoid dead 'ret' assignment and silence
clang warning.
Detected by ccc-analyzer:
CC libvirt_driver_qemu_la-qemu_migration.lo
qemu/qemu_migration.c:2046:5: warning: Value stored to 'ret' is never read
ret = qemuMigrationConfirm(driver, sconn, vm,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pciDeviceListSteal(pcidevs, dev) removes dev from pcidevs reducing
the length of pcidevs, so moving onto what was the next dev is wrong.
Instead callers should pop entry 0 repeatedly until pcidevs is empty.
Signed-off-by: Steve Hodgson <shodgson@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
* src/qemu/qemu_monitor_json.c: Handle error "CommandNotFound" and
report the error.
* src/qemu/qemu_monitor_text.c: If a sub info command is not found,
it prints the output of "help info", for other commands,
"unknown command" is printed.
Without this patch, libvirt always report:
An error occurred, but the cause is unknown
This patch was adapted from a patch by Osier Yang <jyang@redhat.com> to
break out detection of unrecognized text monitor commands into a separate
function.
Signed-off-by: Adam Litke <agl@us.ibm.com>
* src/qemu/qemu_monitor_text.c: BALLOON_PREFIX was defined as
"balloon: actual=", which cause "actual=" is stripped early before
the real parsing. This patch changes BALLOON_PREFIX into "balloon: ",
and modifies related functions, also renames
"qemuMonitorParseExtraBalloonInfo" to "qemuMonitorParseBalloonInfo",
as after the changing, it parses all the info returned by "info balloon".
A virsh command like:
migrate --live --copy-storage-all Guest qemu+ssh://user@host/system
--persistent --verbose
shows
Migration: [ 0 %]
during the storage copy and does not start counting
until the ram transfer starts
Fix this by scraping optional disk transfer status, and adding it
into the progress meter.
Otherwise the device will still be bound to pci-stub driver even
it's set as "managed=yes" when do detaching. Of course, it won't
triger any driver reprobing too.
In some versions of qemu, both virtio-blk-pci and virtio-net-pci
devices can have an event_idx setting that determines some details of
event processing. When it is enabled, it "reduces the number of
interrupts and exits for the guest". qemu will automatically enable
this feature when it is available, but there may be cases where this
new feature could actually make performance worse (NB: no such case
has been found so far).
As a safety switch in case such a situation is encountered in the
field, this patch adds a new attribute "event_idx" to the <driver>
element of both disk and interface devices. event_idx can be set to
"on" (to force event_idx on in case qemu has it disabled by default)
or "off" (for force event_idx off). In the case that event_idx support
isn't present in qemu, the attribute is ignored (this on the advice of
the qemu developer).
docs/formatdomain.html.in: document the new flag (marking it as
"don't mess with this!"
docs/schemas/domain.rng: add event_idx in appropriate places
src/conf/domain_conf.[ch]: add event_idx to parser and formatter
src/libvirt_private.syms: export
virDomainVirtioEventIdx(From|To)String
src/qemu/qemu_capabilities.[ch]: detect and report event_idx in
disk/net
src/qemu/qemu_command.c: add event_idx parameter to qemu commandline
when appropriate.
tests/qemuxml2argvdata/qemuxml2argv-event_idx.args,
tests/qemuxml2argvdata/qemuxml2argv-event_idx.xml,
tests/qemuxml2argvtest.c,
tests/qemuxml2xmltest.c: test cases for event_idx.
By opening a connection to remote qemu process ourselves and passing the
socket to qemu we get much better errors than just "migration failed"
when the connection is opened by qemu.
The core of these two functions is very similar and most of it is even
exactly the same. Factor out the core functionality into a separate
function to remove code duplication and make further changes easier.
The functions for manipulating pidfiles are in util/util.{c,h}.
We will shortly be adding some further pidfile related functions.
To avoid further growing util.c, this moves the pidfile related
functions into a dedicated virpidfile.{c,h}. The functions are
also all renamed to have 'virPidFile' as their name prefix
* util/util.h, util/util.c: Remove all pidfile code
* util/virpidfile.c, util/virpidfile.h: Add new APIs for pidfile
handling.
* lxc/lxc_controller.c, lxc/lxc_driver.c, network/bridge_driver.c,
qemu/qemu_process.c: Add virpidfile.h include and adapt for API
renames
Our logic throws off analyzer tools:
ptr var = NULL;
if (flags == 0) flags = live ? _LIVE : _CONFIG;
if (flags & _LIVE) do stuff
if (flags & _CONFIG) var = non-null;
if (flags & _LIVE) do more stuff
else if (flags & _CONFIG) use var
the tools keep thinking that var can still be NULL in the last
if clause, adding the hint shuts them up.
* src/qemu/qemu_driver.c (qemuDomainSetBlkioParameters): Add a
static analysis hint.
The following XML:
<serial type='udp'>
<source mode='connect' service='9999'/>
</serial>
is accepted by domain_conf.c but maps to the qemu command line:
-chardev udp,host=127.0.0.1,port=2222,localaddr=(null),localport=(null)
qemu can cope with everything omitting except the connection port, which
seems to also be the intent of domain_conf validation, so let's not
generate bogus command lines for that case.
The defaults are empty strings for addresses and 0 for the localport
Additionally, tweak the qemu cli parsing to handle omitted host
parameters
for -serial udp
Transient domains reject attempts to set autostart, and using
virDomainCreate to restart a domain only works on persistent
domains. Therefore, managed save makes no sense on transient
domains, and should be rejected up front rather than creating
an otherwise unrecoverable managed save file.
Besides, transient domains imply that a lot more management is
being done by the upper layer; this includes the assumption
that the upper layer is okay managing the saved state file
created by virDomainSave, and does not need to use managed save.
* src/libvirt.c: Document that transient domains are incompatible
with managed save.
* src/qemu/qemu_driver.c (qemuDomainManagedSave): Enforce it.
* src/libxl/libxl_driver.c (libxlDomainManagedSave): Likewise.
I noticed some inconsistent use of 'else'.
* src/qemu/qemu_driver.c (qemuCPUCompare)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainSnapshotDiscard): Match coding conventions.
If a snapshot with the name already exists, virDomainSnapshotAssignDef()
just returns NULL, in which case the snapshot definition is leaked.
Currently this leak is not a big problem, since qemuDomainSnapshotLoad()
is only called once during initial startup of libvirtd.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Detected by ccc-analyzer, reported by Alex Jia.
qemuProcessStart always calls qemuProcessWaitForMonitor with a
non-negative position, but qemuProcessAttach always calls with -1.
In the latter case, there is no log file we can scrape, so we
also should not be trying to scrape the logs if the qemu process
died at the very end.
* src/qemu/qemu_process.c (qemuProcessWaitForMonitor): Don't try
to read from log in qemuProcessAttach case.
Value stored to 'ret' is never read, so remove this dead assignment.
* src/qemu/qemu_monitor_text.c: kill dead assignment.
Signed-off-by: Alex Jia <ajia@redhat.com>
Value stored to 'ret' is never read, in fact, 'cleanup' section will
directly return -1 when function is fail, so remove this dead assignment.
* src/qemu/qemu_process.c: kill dead assignment.
Signed-off-by: Alex Jia <ajia@redhat.com>
Quite a few leaks detected by coverity. For chr, the leaks were
close enough to the allocations to plug in place; for disk, the
leaks were separated from the allocation by enough other lines with
intermediate failure cases that I refactored the cleanup instead.
* src/qemu/qemu_command.c (qemuParseCommandLine): Plug leaks.
Warning detected by Coverity. No need for the NULL check, and
removing it silences the warning without any semantic change.
* src/qemu/qemu_migration.c (qemuMigrationFinish): All entries to
endjob had non-NULL vm.
Coverity detected that 5 of 6 callers of virJSONValueArrayGet checked
for a NULL return; and that by not checking we risk a null deref
during an error. The error is unlikely since the prior call to
virJSONValueArraySize would probably have already caught any botched
JSON array parse, but better safe than sorry.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONGetBlockJobInfo):
Check for NULL.
(qemuMonitorJSONExtractPtyPaths): Fix typo.
Revert 6a1f5f568f. Now that libvirt_iohelper takes fds by
inheritance rather than by open() (commit 1eb66479), there is
no longer a race where the parent can unlink() a file prior to
the iohelper open()ing the same file. From there, it makes
more sense to have the callers both create and unlink, rather
than the caller create and the stream unlink, since the latter
was only needed when iohelper had to do the unlink.
* src/fdstream.h (virFDStreamOpenFile, virFDStreamCreateFile):
Callers are responsible for deletion.
* src/fdstream.c (virFDStreamOpenFileInternal): Don't leak created
file on failure.
(virFDStreamOpenFile, virFDStreamCreateFile): Drop parameter.
* src/lxc/lxc_driver.c (lxcDomainOpenConsole): Update callers.
* src/qemu/qemu_driver.c (qemuDomainScreenshot)
(qemuDomainOpenConsole): Likewise.
* src/storage/storage_driver.c (storageVolumeDownload)
(storageVolumeUpload): Likewise.
* src/uml/uml_driver.c (umlDomainOpenConsole): Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainScreenshot): Likewise.
* src/xen/xen_driver.c (xenUnifiedDomainOpenConsole): Likewise.
The previous qemu patch could end up calling unlink(tmp) before
tmp was the name of a valid file (unlinking a fileXXXXXX template
instead), or calling unlink(tmp) twice on success (once here,
and once at the end of the stream). Meanwhile, vbox also suffered
from the same leaked tmp file bug.
* src/qemu/qemu_driver.c (qemuDomainScreenshot): Don't unlink on
success, or on invalid name.
* src/vbox/vbox_tmpl.c (vboxDomainScreenshot): Don't leak temp file.
Currently, we attempt to run sync job and async job at the same time. It
means that the monitor commands for two jobs can be run in any order.
In the function qemuDomainObjEnterMonitorInternal():
if (priv->job.active == QEMU_JOB_NONE && priv->job.asyncJob) {
if (qemuDomainObjBeginNestedJob(driver, obj) < 0)
We check whether the caller is an async job by priv->job.active and
priv->job.asynJob. But when an async job is running, and a sync job is
also running at the time of the check, then priv->job.active is not
QEMU_JOB_NONE. So we cannot check whether the caller is an async job
in the function qemuDomainObjEnterMonitorInternal(), and must instead
put the burden on the caller to tell us when an async command wants
to do a nested job.
Once the burden is on the caller, then only async monitor enters need
to worry about whether the VM is still running; for sync monitor enter,
the internal return is always 0, so lots of ignore_value can be dropped.
* src/qemu/THREADS.txt: Reflect new rules.
* src/qemu/qemu_domain.h (qemuDomainObjEnterMonitorAsync): New
prototype.
* src/qemu/qemu_process.h (qemuProcessStartCPUs)
(qemuProcessStopCPUs): Add parameter.
* src/qemu/qemu_migration.h (qemuMigrationToFile): Likewise.
(qemuMigrationWaitForCompletion): Make static.
* src/qemu/qemu_domain.c (qemuDomainObjEnterMonitorInternal): Add
parameter.
(qemuDomainObjEnterMonitorAsync): New function.
(qemuDomainObjEnterMonitor, qemuDomainObjEnterMonitorWithDriver):
Update callers.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal)
(qemudDomainCoreDump, doCoreDump, processWatchdogEvent)
(qemudDomainSuspend, qemudDomainResume, qemuDomainSaveImageStartVM)
(qemuDomainSnapshotCreateActive, qemuDomainRevertToSnapshot):
Likewise.
* src/qemu/qemu_process.c (qemuProcessStopCPUs)
(qemuProcessFakeReboot, qemuProcessRecoverMigration)
(qemuProcessRecoverJob, qemuProcessStart): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationToFile)
(qemuMigrationWaitForCompletion, qemuMigrationUpdateJobStatus)
(qemuMigrationJobStart, qemuDomainMigrateGraphicsRelocate)
(doNativeMigrate, doTunnelMigrate, qemuMigrationPerformJob)
(qemuMigrationPerformPhase, qemuMigrationFinish)
(qemuMigrationConfirm): Likewise.
* src/qemu/qemu_hotplug.c: Drop unneeded ignore_value.
whether or not previous return value is -1, the following codes will be
executed for a inactive guest in src/qemu/qemu_driver.c:
ret = virDomainSaveConfig(driver->configDir, persistentDef);
and if everything is okay, 'ret' is assigned to 0, the previous 'ret'
will be overwritten, this patch will fix this issue.
* src/qemu/qemu_driver.c: avoid return value is overwritten when give a argument
in out of blkio weight range for a inactive guest.
* how to reproduce?
% virsh blkiotune ${guestname} --weight 10
% echo $?
Note: guest must be inactive, argument 10 in out of blkio weight range,
and can get a error information by checking libvirtd.log, however,
virsh hasn't raised any error information, and return value is 0.
https://bugzilla.redhat.com/show_bug.cgi?id=726304
Signed-off-by: Alex Jia <ajia@redhat.com>
whether or not previous return value is -1, the following codes will be
executed for a inactive guest in qemuDomainSetMemoryParameters:
ret = virDomainSaveConfig(driver->configDir, persistentDef);
and if everything is okay, 'ret' is assigned to 0, the previous 'ret'
will be overwritten, this patch will fix this issue.
* src/qemu/qemu_driver.c: avoid return value is overwritten when set
min_guarante value to a inactive guest.
* how to reproduce?
% virsh memtune ${guestname} --min_guarante 1024
% echo $?
Note: guest must be inactive, in fact, 'min_guarante' hasn't been implemented
in memory tunable, and I can get the error when check actual libvirtd.log,
however, virsh hasn't raised any error information, and return value is 0.
Signed-off-by: Alex Jia <ajia@redhat.com>
Introduced by f9a837da73, the condition is not changed after
the else clause is removed. So now it quit with "domain is not
running" when the domain is running. However, when the domain is
not running, it reports "no job is active".
How to reproduce:
1)
% virsh start $domain
% virsh domjobabort $domain
error: Requested operation is not valid: domain is not running
2)
% virsh destroy $domain
% virsh domjobabort $domain
error: Requested operation is not valid: no job is active on the domain
3)
% virsh save $domain /tmp/$domain.save
Before above commands finished, try to abort job in another terminal
% virsh domabortjob $domain
error: Requested operation is not valid: domain is not running
Using a macro ensures that all the code is looking for the same
prefix.
* src/conf/domain_conf.h (VIR_NET_GENERATED_PREFIX): New macro.
* src/conf/domain_conf.c (virDomainNetDefParseXML): Use it.
* src/uml/uml_conf.c (umlConnectTapDevice): Likewise.
* src/qemu/qemu_command.c (qemuNetworkIfaceConnect): Likewise.
Suggested by Laine Stump.
The goal here is that save-image-dumpxml fed back to
save-image-define should not change the save file; anywhere that
this is not the case is probably a bug in domain_conf.c.
* src/qemu/qemu_driver.c (qemuDomainSaveImageGetXMLDesc)
(qemuDomainSaveImageDefineXML): New functions.
(qemuDomainSaveImageOpen): Add parameter.
(qemuDomainRestoreFlags, qemuDomainObjRestore): Adjust clients.
With this, it is possible to update the path to a disk backing
image on either the save or restore action, without having to
binary edit the XML embedded in the state file.
This also modifies virDomainSave to output a smaller xml (only
the inactive xml, which is all the more virDomainRestore parses),
while still guaranteeing padding for most typical abi-compatible
xml replacements, necessary so that the next patch for
virDomainSaveImageDefineXML will not cause unnecessary
modifications to the save image file.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal): Add parameter,
only use inactive state, and guarantee padding.
(qemuDomainSaveImageOpen): Add parameter.
(qemuDomainSaveFlags, qemuDomainManagedSave)
(qemuDomainRestoreFlags, qemuDomainObjRestore): Update callers.
The domain XML now understands the <listen> subelement of its
<graphics> element (including when listen type='network'), and the
network driver has an internal API that will turn a network name into
an IP address, so the final logical step is to put the glue into the
qemu driver so that when it is starting up a domain, if it finds
<listen type='network' network='xyz'/> in the XML, it will call the
network driver to get an IPv4 address associated with network xyz, and
tell qemu to listen for vnc (or spice) on that address rather than the
default address (localhost).
The motivation for this is that a large installation may want the
guests' VNC servers listening on physical interfaces rather than
localhost, so that users can connect directly from the outside; this
requires sending qemu the appropriate IP address to listen on. But
this address will of course be different for each host, and if a guest
might be migrated around from one host to another, it's important that
the guest's config not have any information embedded in it that is
specific to one particular host. <listen type='network.../> can solve
this problem in the following manner:
1) on each host, define a libvirt network of the same name,
associated with the interface on that host that should be used
for listening (for example, a simple macvtap network: <forward
mode='bridge' dev='eth0'/>, or host bridge network: <forward
mode='bridge'/> <bridge name='br0'/>
2) in the <graphics> element of each guest's domain xml, tell vnc to
listen on the network name used in step 1:
<graphics type='vnc' port='5922'>
<listen type='network'network='example-net'/>
</graphics>
(all the above also applies for graphics type='spice').
Once it's plugged in, the <listen> element will be an optional
replacement for the "listen" attribute that graphics elements already
have. If the <listen> element is type='address', it will have an
attribute called 'address' which will contain an IP address or dns
name that the guest's display server should listen on. If, however,
type='network', the <listen> element should have an attribute called
'network' that will be set to the name of a network configuration to
get the IP address from.
* docs/schemas/domain.rng: updated to allow the <listen> element
* docs/formatdomain.html.in: document the <listen> element and its
attributes.
* src/conf/domain_conf.[hc]:
1) The domain parser, formatter, and data structure are modified to
support 0 or more <listen> subelements to each <graphics>
element. The old style "legacy" listen attribute is also still
accepted, and will be stored internally just as if it were a
separate <listen> element. On output (i.e. format), the address
attribute of the first <listen> element of type 'address' will be
duplicated in the legacy "listen" attribute of the <graphic>
element.
2) The "listenAddr" attribute has been removed from the unions in
virDomainGRaphicsDef for graphics types vnc, rdp, and spice.
This attribute is now in the <listen> subelement (aka
virDomainGraphicsListenDef)
3) Helper functions were written to provide simple access
(both Get and Set) to the listen elements and their attributes.
* src/libvirt_private.syms: export the listen helper functions
* src/qemu/qemu_command.c, src/qemu/qemu_hotplug.c,
src/qemu/qemu_migration.c, src/vbox/vbox_tmpl.c,
src/vmx/vmx.c, src/xenxs/xen_sxpr.c, src/xenxs/xen_xm.c
Modify all these files to use the listen helper functions rather
than directly referencing the (now missing) listenAddr
attribute. There can be multiple <listen> elements to a single
<graphics>, but the drivers all currently only support one, so all
replacements of direct access with a helper function indicate index
"0".
* tests/* - only 3 of these are new files added explicitly to test the
new <listen> element. All the others have been modified to reflect
the fact that any legacy "listen" attributes passed in to the domain
parse will be saved in a <listen> element (i.e. one of the
virDomainGraphicsListenDefs), and during the domain format function,
both the <listen> element as well as the legacy attributes will be
output.
qemuMigrationUpdateJobStatus (called in a loop by migration
and save tasks) uses qemuDomainObjEnterMonitorWithDriver;
however, that function ended up starting a nested job without
releasing the driver.
Since no one else is making nested calls, we can inline the
internal functions to properly track driver_locked.
* src/qemu/qemu_domain.h (qemuDomainObjBeginNestedJob)
(qemuDomainObjBeginNestedJobWithDriver)
(qemuDomainObjEndNestedJob): Drop unused prototypes.
* src/qemu/qemu_domain.c (qemuDomainObjEnterMonitorInternal):
Reflect driver lock to nested job.
(qemuDomainObjBeginNestedJob)
(qemuDomainObjBeginNestedJobWithDriver)
(qemuDomainObjEndNestedJob): Drop unused functions.
As written in virStorageFileGetMetadataFromFD decription, caller
must free metadata after use. Qemu driver miss this and therefore
leak metadata which can grow to huge mem leak if somebody query
for blockInfo a lot.
The error in getCompressionType will never be reported, change
the errors codes into warning (VIR_WARN("%s", _(foo)); doesn't break
syntax-check rule), and also improve the docs in qemu.conf to tell
user the truth.
Make MIGRATION_OUT use the new helper methods.
This also introduces new protection to migration v3 process: the
migration job is held from Begin to Confirm to avoid changes to a domain
during migration (esp. between Begin and Perform phases). This change is
automatically applied to p2p and tunneled migrations. For normal
migration, this requires support from a client. In other words, if an
old (pre 0.9.4) client starts normal migration of a domain, the domain
will not be protected against changes between Begin and Perform steps.
Without this, a configure built by autoconf 2.59 was broken when
trying to detect which compiler warning flags were supported.
* .gnulib: Update to latest, for warnings.m4 fix.
* bootstrap.conf: Add fclose explicitly, to match recent gnulib
implicit dependency changes.
* src/qemu/qemu_conf.c (includes): Drop unused include.
* src/uml/uml_conf.c (include): Likewise.
Reported by Daniel P. Berrange.
Every DomainNetDef has a bandwidth, as does every portgroup.
Whenever a DomainNetDef of type NETWORK is about to be used, a call is
made to networkAllocateActualDevice(). This function chooses the "best"
bandwidth object and places it in the DomainActualNetDef.
From that point on, whenever some code needs to use the bandwidth data
for the interface, it's retrieved with virDomainNetGetActualBandwidth(),
which will always return the "best" info as determined in the
previous step.
The cpu bandwidth is applied at the vcpu group level. We should apply it
at the vm group level too, because the vm may do heavy I/O, and it will affect
the other vm.
We apply cpu bandwidth at the vcpu and the vm group level, so we must ensure
that max(child_quota) <= parent_quota when we modify cpu bandwidth.
Now that virDomainSetVcpusFlags knows about VIR_DOMAIN_AFFECT_CURRENT,
so should virDomainGetVcpusFlags.
Unfortunately, the virsh counterpart 'virsh vcpucount' has already
commandeered --current for a different meaning, so teaching virsh
to expose this in the next patch will require a bit of care.
* src/libvirt.c (virDomainGetVcpusFlags): Allow
VIR_DOMAIN_AFFECT_CURRENT.
* src/libxl/libxl_driver.c (libxlDomainGetVcpusFlags): Likewise.
* src/qemu/qemu_driver.c (qemudDomainGetVcpusFlags): Likewise.
* src/test/test_driver.c (testDomainGetVcpusFlags): Likewise.
* src/xen/xen_driver.c (xenUnifiedDomainGetVcpusFlags): Likewise.
Although most functions in libvirt return 0 on success and < 0 on
failure, there are a few functions lingering around that return errno
(a positive value) on failure, and sometimes code calling those
functions incorrectly assumes the <0 standard. I noticed one of these
the other day when auditing networkStartDhcpDaemon after Guido Gunther
found a place where success was improperly returned on failure (that
patch has been acked and is pending a push). The problem was that it
expected the return value from virFileReadPid to be < 0 on failure,
but it was actually positive (it was also neglected to set the return
code in this case, similar to the bug found by Guido).
This all led to the fact that *all* of the virFile*Pid functions in
util.c are returning errno on failure. This patch remedies that
problem by changing them all to return -errno on failure, and makes
any necessary changes to callers of the functions. (In the meantime, I
also properly set the return code on failure of virFileReadPid in
networkStartDhcpDaemon).
In the XML file we now have
<cputune>
<shares>1024</shares>
<period>90000</period>
<quota>0</quota>
</cputune>
But the schedinfo parameter are being named
cpu_shares: 1024
cfs_period: 90000
cfs_quota: 0
The period/quota is per-vcpu value, so these new tunables should be named
'vcpu_period' and 'vcpu_quota'.
When an operation started by virDomainBlockPull completes (either with
success or with failure), raise an event to indicate the final status.
This API allow users to avoid polling on virDomainGetBlockJobInfo if
they would prefer to use an event mechanism.
* daemon/remote.c: Dispatch events to client
* include/libvirt/libvirt.h.in: Define event ID and callback signature
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle the new event
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for block_stream completion and emit a libvirt block pull event
* src/remote/remote_driver.c: Receive and dispatch events to application
* src/remote/remote_protocol.x: Wire protocol definition for the event
* src/remote_protocol-structs: structure definitions for protocol verification
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c: Watch for BLOCK_STREAM_COMPLETED event
from QEMU monitor
The virDomainBlockPull* family of commands are enabled by the
following HMP/QMP commands: 'block_stream', 'block_job_cancel',
'info block-jobs' / 'query-block-jobs', and 'block_job_set_speed'.
* src/qemu/qemu_driver.c src/qemu/qemu_monitor_text.[ch]: implement disk
streaming by using the proper qemu monitor commands.
* src/qemu/qemu_monitor_json.[ch]: implement commands using the qmp monitor
When auto-dumping a domain on crash events, or autostarting a domain
with managed save state, let the user configure whether to imply
the bypass cache flag.
* src/qemu/qemu.conf (auto_dump_bypass_cache, auto_start_bypass_cache):
Document new variables.
* src/qemu/libvirtd_qemu.aug (vnc_entry): Let augeas parse them.
* src/qemu/qemu_conf.h (qemud_driver): Store new preferences.
* src/qemu/qemu_conf.c (qemudLoadDriverConfig): Parse them.
* src/qemu/qemu_driver.c (processWatchdogEvent, qemuAutostartDomain):
Honor them.
Wire together the previous patches to support file system cache
bypass during API save/restore requests in qemu.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal, doCoreDump)
(qemudDomainObjStart, qemuDomainSaveImageOpen, qemuDomainObjRestore)
(qemuDomainObjStart): Add parameter.
(qemuDomainSaveFlags, qemuDomainManagedSave, qemudDomainCoreDump)
(processWatchdogEvent, qemudDomainStartWithFlags, qemuAutostartDomain)
(qemuDomainRestoreFlags): Update callers.
For all hypervisors that support save and restore, the new API
now performs the same functions as the old.
VBox is excluded from this list, because its existing domainsave
is broken (there is no corresponding domainrestore, and there
is no control over the filename used in the save). A later
patch should change vbox to use its implementation for
managedsave, and teach start to use managedsave results.
* src/libxl/libxl_driver.c (libxlDomainSave): Move guts...
(libxlDomainSaveFlags): ...to new function.
(libxlDomainRestore): Move guts...
(libxlDomainRestoreFlags): ...to new function.
* src/test/test_driver.c (testDomainSave, testDomainSaveFlags)
(testDomainRestore, testDomainRestoreFlags): Likewise.
* src/xen/xen_driver.c (xenUnifiedDomainSave)
(xenUnifiedDomainSaveFlags, xenUnifiedDomainRestore)
(xenUnifiedDomainRestoreFlags): Likewise.
* src/qemu/qemu_driver.c (qemudDomainSave, qemudDomainRestore):
Rename and move guts.
(qemuDomainSave, qemuDomainSaveFlags, qemuDomainRestore)
(qemuDomainRestoreFlags): ...here.
(qemudDomainSaveFlag): Rename...
(qemuDomainSaveInternal): ...to this, and update callers.
The network driver needs to assign physical devices for use by modes
that use macvtap, keeping track of which physical devices are in use
(and how many instances, when the devices can be shared). Three calls
are added:
networkAllocateActualDevice - finds a physical device for use by the
domain, and sets up the virDomainActualNetDef accordingly.
networkNotifyActualDevice - assumes that the domain was already
running, but libvirtd was restarted, and needs to be notified by each
already-running domain about what interfaces they are using.
networkReleaseActualDevice - decrements the usage count of the
allocated physical device, and frees the virDomainActualNetDef to
avoid later accidentally using the device.
bridge_driver.[hc] - the new APIs. When WITH_NETWORK is false, these
functions are all #defined to be "0" in the .h file (effectively
becoming a NOP) to prevent link errors.
qemu_(command|driver|hotplug|process).c - add calls to the above APIs
in the appropriate places.
tests/Makefile.am - we need to include libvirt_driver_network.la
whenever libvirt_driver_qemu.la is linked, to avoid unreferenced
symbols (in functions that are never called by the test
programs...)
This is the one function outside of domain_conf.c that plays around
with (even modifying) the internals of the virDomainNetDef, and thus
can't be fixed up simply by replacing direct accesses to the fields of
the struct with the GetActual*() access functions.
In this case, we need to check if the defined type is "network", and
if it is *then* check the actual type; if the actual type is "bridge",
then we can at least put the bridgename in a place where it can be
used; otherwise (if type isn't "bridge"), we behave exactly as we used
to - just null out *everything*.
The qemu driver accesses fields in the virDomainNetDef directly, but
with the advent of the virDomainActualNetDef, some pieces of
information may be found in a different place (the ActualNetDef) if
the network connection is of type='network' and that network is of
forward type='bridge|private|vepa|passthrough'. The previous patch
added functions to mask this difference from callers - they hide the
decision making process and just pick the value from the proper place.
This patch uses those functions in the qemu driver as a first step in
making qemu work with the new network types. At this point, the
virDomainActualNetDef is guaranteed always NULL, so the GetActualX()
function will return exactly what the def->X that's being replaced
would have returned (ie bisecting is not compromised).
There is one place (in qemu_driver.c) where the internal details of
the NetDef are directly manipulated by the code, so the GetActual
functions cannot be used there without extra additional code; that
file will be treated in a separate patch.
The virtPortProfile in the domain interface struct is now a separately
allocated object *pointed to by* (rather than contained in) the main
virDomainNetDef object. This is done to make it easier to figure out
when a virtualPortProfile has/hasn't been specified in a particular
config.
Otherwise, an ABI mismatch gives error messages attributing the target
xml string as current, and the current domain state as the new xml.
* src/qemu/qemu_migration.c (qemuMigrationBegin): Use correct
argument order.
Since libvirt is multi-threaded, we should use FD_CLOEXEC as much
as possible in the parent, and only relax fds to inherited after
forking, to avoid leaking an fd created in one thread to a fork
run in another thread. This gets us closer to that ideal, by
making virCommand automatically clear FD_CLOEXEC on fds intended
for the child, as well as avoiding a window of time with non-cloexec
pipes created for capturing output.
* src/util/command.c (virExecWithHook): Use CLOEXEC in parent. In
child, guarantee that all fds to pass to child are inheritable.
(getDevNull): Use CLOEXEC.
(prepareStdFd): New helper function.
(virCommandRun, virCommandRequireHandshake): Use pipe2.
* src/qemu/qemu_command.c (qemuBuildCommandLine): Simplify caller.
This patch implements cfs_period and cfs_quota's modification.
We can use the command 'virsh schedinfo' to query or modify cfs_period and
cfs_quota.
If you query period or quota from config file, the value 0 means it does not set
in the config file.
If you set period or quota to config file, the value 0 means that delete current
setting from config file.
If you modify period or quota while vm is running, the value 0 means that use
current value.
Starting/ending jobs when closing the connection may reset any
error which was reported earlier in p2p migration. We must
save the original error before doing so. This means we can also
just call virConnectClose as normal, instead of virUnrefConnect
* src/qemu/qemu_migration.c: Preserve errors in p2p migration
There were two API in driver.c that were silently masking flags
bits prior to calling out to the drivers, and several others
that were explicitly masking flags bits. This is not
forward-compatible - if we ever have that many flags in the
future, then talking to an old server that masks out the
flags would be indistinguishable from talking to a new server
that can honor the flag. In general, libvirt.c should forward
_all_ flags on to drivers, and only the drivers should reject
unknown flags.
In the case of virDrvSecretGetValue, the solution is to separate
the internal driver callback function to have two parameters
instead of one, with only one parameter affected by the public
API. In the case of virDomainGetXMLDesc, it turns out that
no one was ever mixing VIR_DOMAIN_XML_INTERNAL_STATUS with
the dumpxml path in the first place; that internal flag was
only used in saving and restoring state files, which happened
to be in functions internal to a single file, so there is no
mixing of the internal flag with a public flags argument.
Additionally, virDomainMemoryStats passed a flags argument
over RPC, but not to the driver.
* src/driver.h (VIR_DOMAIN_XML_FLAGS_MASK)
(VIR_SECRET_GET_VALUE_FLAGS_MASK): Delete.
(virDrvSecretGetValue): Separate out internal flags.
(virDrvDomainMemoryStats): Provide missing flags argument.
* src/driver.c (verify): Drop unused check.
* src/conf/domain_conf.h (virDomainObjParseFile): Delete
declaration.
(virDomainXMLInternalFlags): Move...
* src/conf/domain_conf.c: ...here. Delete redundant include.
(virDomainObjParseFile): Make static.
* src/libvirt.c (virDomainGetXMLDesc, virSecretGetValue): Update
clients.
(virDomainMemoryPeek, virInterfaceGetXMLDesc)
(virDomainMemoryStats, virDomainBlockPeek, virNetworkGetXMLDesc)
(virStoragePoolGetXMLDesc, virStorageVolGetXMLDesc)
(virNodeNumOfDevices, virNodeListDevices, virNWFilterGetXMLDesc):
Don't mask unknown flags.
* src/interface/netcf_driver.c (interfaceGetXMLDesc): Reject
unknown flags.
* src/secret/secret_driver.c (secretGetValue): Update clients.
* src/remote/remote_driver.c (remoteSecretGetValue)
(remoteDomainMemoryStats): Likewise.
* src/qemu/qemu_process.c (qemuProcessGetVolumeQcowPassphrase):
Likewise.
* src/qemu/qemu_driver.c (qemudDomainMemoryStats): Likewise.
* daemon/remote.c (remoteDispatchDomainMemoryStats): Likewise.
The regression is introduced by Commit da1eba6b, the new
codes with this commit doesn't reset "ret" to "-1" when
it fails on parsing the device XML (live device attachment)
This patch changes the codes to reset the "ret" and "-1",
and also changes the codes so that it don't modify "ret"
for condition checking.
How to reproduce:
% cat test.xml
<disk type='oops' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/var/lib/libvirt/images/test.img'/>
<target dev='vda' bus='virtio'/>
</disk>
% virsh attach-device $domain test.xml
Device attached successfully
The device attachment failed actually with error "unknown disk type 'oops'",
however, it reports success.
Commit f548480b broke migration v3 on qemu, because the driver
passed flags on through to qemu_migration even though
qemu_migration wasn't using those flags.
* src/qemu/qemu_migration.h (QEMU_MIGRATION_FLAGS): New define.
* src/qemu/qemu_driver.c: Simplify all migration callbacks.
* src/qemu/qemu_migration.c (qemuMigrationConfirm): Fix regression.
The previous patches only cleaned up ATTRIBUTE_UNUSED flags cases;
auditing the drivers found other places where flags was being used
but not validated. In particular, domainGetXMLDesc had issues with
clients accepting a different set of flags than the common
virDomainDefFormat helper function.
* src/conf/domain_conf.c (virDomainDefFormat): Add common flag check.
* src/uml/uml_driver.c (umlDomainAttachDeviceFlags)
(umlDomainDetachDeviceFlags): Reject unknown
flags.
* src/vbox/vbox_tmpl.c (vboxDomainGetXMLDesc)
(vboxDomainAttachDeviceFlags)
(vboxDomainDetachDeviceFlags): Likewise.
* src/qemu/qemu_driver.c (qemudDomainMemoryPeek): Likewise.
(qemuDomainGetXMLDesc): Document common flag handling.
* src/libxl/libxl_driver.c (libxlDomainGetXMLDesc): Likewise.
* src/lxc/lxc_driver.c (lxcDomainGetXMLDesc): Likewise.
* src/openvz/openvz_driver.c (openvzDomainGetXMLDesc): Likewise.
* src/phyp/phyp_driver.c (phypDomainGetXMLDesc): Likewise.
* src/test/test_driver.c (testDomainGetXMLDesc): Likewise.
* src/vmware/vmware_driver.c (vmwareDomainGetXMLDesc): Likewise.
* src/xenapi/xenapi_driver.c (xenapiDomainGetXMLDesc): Likewise.
This patch extends qemudDomainSetVcpusFlags() function to support
VIR_DOMAIN_AFFECT_CURRENT flag.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
When qemuMonitorCloseFileHandle is called in error path, we need to
preserve the original error since a possible further error when running
closefd monitor command is not very useful to users.
When creating new qemu process we saved domain status XML only after the
process was fully setup and running. In case libvirtd was killed before
the whole process finished, once libvirtd started again it didn't know
anything about the new process and we end up with an orphaned qemu
process. Let's save the domain status XML as soon as we know the PID so
that libvirtd can kill the process on restart.
The compiler might optimize based on our declaration that something
is unused. Putting that declaration in the header risks getting
out of sync with the actual implementation, so it belongs better
only in the .c files. We were mostly compliant, and a new syntax
check will help us in the future.
* cfg.mk (sc_avoid_attribute_unused_in_header): New syntax check.
* src/nodeinfo.h (nodeGetCPUStats, nodeGetMemoryStats): Delete
attribute already present in .c file.
* src/qemu/qemu_domain.h (qemuDomainEventFlush): Likewise.
* src/util/virterror_internal.h (virReportErrorHelper): Parameters
are actually used by .c file.
* src/xenxs/xen_sxpr.h (xenFormatSxprDisk): Adjust prototype.
* src/xenxs/xen_sxpr.c (xenFormatSxprDisk): Delete unused argument.
(xenFormatSxpr): Adjust caller.
* src/xen/xend_internal.c (xenDaemonAttachDeviceFlags)
(xenDaemonUpdateDeviceFlags): Likewise.
Suggested by Daniel Veillard.
Continuation of commit 313ac7fd, and enforce things with a syntax
check.
Technically, virNetServerClientCalculateHandleMode is not printing
a mode_t, but rather a collection of VIR_EVENT_HANDLE_* bits;
however, these bits are < 8, so there is no different in the
output, and that was the easiest way to silence the new syntax check.
* cfg.mk (sc_flags_debug): New syntax check.
(exclude_file_name_regexp--sc_flags_debug): Add exemptions.
* src/fdstream.c (virFDStreamOpenFileInternal): Print flags in
hex, mode_t in octal.
* src/libvirt-qemu.c (virDomainQemuMonitorCommand)
(virDomainQemuAttach): Likewise.
* src/locking/lock_driver_nop.c (virLockManagerNopInit): Likewise.
* src/locking/lock_driver_sanlock.c (virLockManagerSanlockInit):
Likewise.
* src/locking/lock_manager.c: Likewise.
* src/qemu/qemu_migration.c: Likewise.
* src/qemu/qemu_monitor.c: Likewise.
* src/rpc/virnetserverclient.c
(virNetServerClientCalculateHandleMode): Print mode with %o.
When monitor is entered with qemuDomainObjEnterMonitorWithDriver, the
correct method for leaving and unlocking the monitor is
qemuDomainObjExitMonitorWithDriver.
Most of the code in these two functions is supposed to be identical but
currently it isn't (which is natural since the code is duplicated).
Let's move common parts of these functions into qemuMigrationPrepareAny.
This also fixes qemuMigrationPrepareTunnel which didn't store received
lockState in the domain object.
Asynchronous jobs may take long time to finish and may consist of
several phases which we need to now about to help with recovery/rollback
after libvirtd restarts.
Query commands are safe to be called during long running jobs (such as
migration). This patch makes them all work without the need to
special-case every single one of them.
The patch introduces new job.asyncCond condition and associated
job.asyncJob which are dedicated to asynchronous (from qemu monitor
point of view) jobs that can take arbitrarily long time to finish while
qemu monitor is still usable for other commands.
The existing job.active (and job.cond condition) is used all other
synchronous jobs (including the commands run during async job).
Locking schema is changed to use these two conditions. While asyncJob is
active, only allowed set of synchronous jobs is allowed (the set can be
different according to a particular asyncJob) so any method that
communicates to qemu monitor needs to check if it is allowed to be
executed during current asyncJob (if any). Once the check passes, the
method needs to normally acquire job.cond to ensure no other command is
running. Since domain object lock is released during that time, asyncJob
could have been started in the meantime so the method needs to recheck
the first condition. Then, normal jobs set job.active and asynchronous
jobs set job.asyncJob and optionally change the list of allowed job
groups.
Since asynchronous jobs only set job.asyncJob, other allowed commands
can still be run when domain object is unlocked (when communicating to
remote libvirtd or sleeping). To protect its own internal synchronous
commands, the asynchronous job needs to start a special nested job
before entering qemu monitor. The nested job doesn't check asyncJob, it
only acquires job.cond and sets job.active to block other jobs.
EnterMonitor and ExitMonitor methods are very similar to their
*WithDriver variants; consolidate them into EnterMonitorInternal and
ExitMonitorInternal to avoid (mainly future) code duplication.
The LXC and UML drivers can both make use of auditing. Move
the qemu_audit.{c,h} files to src/conf/domain_audit.{c,h}
* src/conf/domain_audit.c: Rename from src/qemu/qemu_audit.c
* src/conf/domain_audit.h: Rename from src/qemu/qemu_audit.h
* src/Makefile.am: Remove qemu_audit.{c,h}, add domain_audit.{c,h}
* src/qemu/qemu_audit.h, src/qemu/qemu_cgroup.c,
src/qemu/qemu_command.c, src/qemu/qemu_driver.c,
src/qemu/qemu_hotplug.c, src/qemu/qemu_migration.c,
src/qemu/qemu_process.c: Update for changed audit API names
Given a PID, the QEMU driver reads /proc/$PID/cmdline and
/proc/$PID/environ to get the configuration. This is fed
into the ARGV->XML convertor to build an XML configuration
for the process.
/proc/$PID/exe is resolved to identify the full command
binary path
After checking for name/uuid uniqueness, an attempt is
made to connect to the monitor socket. If successful
then 'info status' and 'info kvm' are issued to determine
whether the CPUs are running and if KVM is enabled.
* src/qemu/qemu_driver.c: Implement virDomainQemuAttach
* src/qemu/qemu_process.h, src/qemu/qemu_process.c: Add
qemuProcessAttach to connect to the monitor of an
existing QEMU process
When attaching to an external QEMU process, it is neccessary
to check if the process is using KVM or not. This can be done
using a monitor command
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
API for checking if KVM is enabled
To enable attaching to externally launched QEMU, we need
to be able to reverse engineer a guest XML config based
on the argv for a PID in /proc
* src/qemu/qemu_command.c, src/qemu/qemu_command.h: Add
qemuParseCommandLinePid which extracts QEMU config from
argv in /proc, given a PID number
When converting QEMU argv into a virDomainDefPtr, also extract
the pidfile, monitor character device config and the monitor
mode.
* src/qemu/qemu_command.c, src/qemu/qemu_command.h: Extract
pidfile & monitor config from QEMU argv
* src/qemu/qemu_driver.c, tests/qemuargv2xmltest.c: Add extra
params when calling qemuParseCommandLineString
Avoid re-formatting the pidfile path everytime we need it. Create
it once when starting the guest, and preserve it until the guest
is shutdown.
* src/libvirt_private.syms, src/util/util.c,
src/util/util.h: Add virFileReadPidPath
* src/qemu/qemu_domain.h: Add pidfile field
* src/qemu/qemu_process.c: Store pidfile path in qemuDomainObjPrivate
The drivers were accepting domain configs without checking if those
were actually meant for them. For example the LXC driver happily
accepts configs with type QEMU.
Add a check for the expected domain types to the virDomainDefParse*
functions.
If virDomainSaveConfig() failed, we will return NULL to source,
and the vm is still available to restart during confirm() step in
v3 protocol. So we should kill it off in qemuMigrationFinish().
In v2 protocol, we should not set vm to NULL, because we hold
a reference of vm and should unrefernce it.
This patch creates new <bios> element which, at this time has only the
attribute useserial='yes|no'. This attribute allow users to use
Serial Graphics Adapter and see BIOS messages from the very first moment
domain boots up. Therefore, users can choose boot medium, set PXE, etc.
This option accepts 3 values:
-keep, to keep current client connected (Spice+VNC)
-disconnect, to disconnect client (Spice)
-fail, to fail setting password if there is a client connected (Spice)
Add libvirt support for MicroBlaze architecture as a QEMU target. Based on mips/mipsel pattern.
Signed-off-by: John Williams <john.williams@petalogix.com>