This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1035188
Commit f094aaac48a6 changed the PCI device assignment in qemu domains
to default to using VFIO rather than legacy KVM device assignment
(when VFIO is available). It didn't change which driver was used by
default for virNodeDeviceDetachFlags(), though, so that API (and the
virsh nodedev-detach command) was still binding to the pci-stub
driver, used by legacy KVM assignment, by default.
This patch publicizes (only within the qemu module, though, so no
additions to the symbol exports are needed) the functions that check
for presence of KVM and VFIO device assignment, then uses those
functions to decide what to do when no driver is specified for
virNodeDeviceDetachFlags(); if the vfio driver is loaded, the device
will be bound to vfio-pci, or if legacy KVM assignment is supported on
this system, the device will be bound to pci-stub; if neither method
is available, the detach will fail.
Currently the snapshot code did not check if it actually supports
snapshots on various disk backends for domains. To avoid future problems
add checkers that whitelist the supported configurations.
This patch adds function qemuGetDriveSourceString to produce
qemu-compatible disk source strings that will enable to reuse the code
and refactors building of the qemu commandline of disks to use this new
helper.
Automatically assign secret type from the disk source definition and
pull in adding of the comma. Then update callers to keep generated
output the same.
Before this patch, the translation function still needs a second ugly
helper function to actually format the command line for qemu. But if we
do the right stuff in the translation function, we don't have to bother
with the second function any more.
This patch removes the messy qemuBuildVolumeString function and changes
qemuTranslateDiskSourcePool to set stuff up correctly so that the
regular code paths meant for volumes can be used to format the command
line correctly.
For this purpose a new helper "qemuDiskGetActualType()" is introduced to
return the type of the volume in a pool.
As a part of the refactor the qemuTranslateDiskSourcePool function is
fixed to do decisions based on the pool type instead of the volume type.
This allows to separate pool-type-specific stuff more clearly and will
ease addition of other pool types that will require certain other
operations to get the correct pool source.
The previously fixed tests should make sure that we don't break stuff
that was working before.
When doing an internal snapshot on a VM with sheepdog or RBD disks we
would not set a flag to mark the domain is using internal snapshots and
might end up creating a mixed snapshot. Move the setting of the variable
to avoid this problem.
The virsh command 'domxml-to-native' (virConnectDomainXMLToNative())
converts all network devices to "type='ethernet'" in order to make it
more likely that the generated command could be run directly from a
shell (other libvirt network device types end up referencing file
descriptors for tap devices assumed to have been created by libvirt,
which can't be done in this case).
During this conversion, all of the netdev parameters are cleared out,
then specific items are filled in after changing the type. The MAC
address was not one of these preserved items, and the result was that
mac addresses in the generated commandlines were always
00:00:00:00:00:00.
This patch saves the mac address before the conversion, then
repopulates it afterwards, so the proper mac addresses show up in the
commandline.
Signed-off-by: Bing Bu Cao <mars@linux.vnet.ibm.com>
Signed-off-by: Laine Stump <laine@laine.org>
In the 'directory' and 'netfs' storage pools, a user can see
both 'file' and 'dir' storage volume types, to know when they
can descend into a subdirectory. But in a network-based storage
pool, such as the upcoming 'gluster' pool, we use 'network'
instead of 'file', and did not have any counterpart for a
directory until this patch. Adding a new volume type
'network-dir' is better than reusing 'dir', because it makes
it clear that the only way to access 'network' volumes within
that container is through the network mounting (leaving 'dir'
for something accessible in the local file system).
* include/libvirt/libvirt.h.in (virStorageVolType): Expand enum.
* docs/formatstorage.html.in: Document it.
* docs/schemasa/storagevol.rng (vol): Allow new value.
* src/conf/storage_conf.c (virStorageVol): Use new value.
* src/qemu/qemu_command.c (qemuBuildVolumeString): Fix client.
* src/qemu/qemu_conf.c (qemuTranslateDiskSourcePool): Likewise.
* tools/virsh-volume.c (vshVolumeTypeToString): Likewise.
* src/storage/storage_backend_fs.c
(virStorageBackendFileSystemVolDelete): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
The bus type IDE being enum Zero, the bus type on pseries system appears as IDE for all the -hda/-cdrom and for disk drives with if="none" type. Pseries platform needs this to appear as SCSI instead of IDE. The ide being not supported, the explicit requests for ide devices will return an error.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
This nested job is canceled by the first ExitMonitor call (even though
it was not created by the corresponding EnterMonitor call), and
again in qemuMigrationPrepareAny if qemuProcessStart failed.
This can lead to a crash if the vm object was disposed of before calling
qemuDomainRemoveInactive:
0 ..62bc in virClassIsDerivedFrom (klass=0xdeadbeef,
parent=0x7ffce4cdd270) at util/virobject.c:166
1 ..6666 in virObjectIsClass at util/virobject.c:362
2 ..66b4 in virObjectLock at util/virobject.c:314
3 ..477e in virDomainObjListRemove at conf/domain_conf.c:2359
4 ..7a64 in qemuDomainRemoveInactive at qemu/qemu_domain.c:2087
5 ..956c in qemuMigrationPrepareAny at qemu/qemu_migration.c:2469
This was added by commit e4e2822, exposed by 5a4c237 and c7ac251.
https://bugzilla.redhat.com/show_bug.cgi?id=1018267
If a SCSI hostdev is included in an initial domain XML, without a
corresponding controller statement, one is created silently when the
guest is booted.
When hotplugging a SCSI hostdev, a presumption is that the controller
is already present in the domain either from the original XML, or via
an earlier hotplug.
[root@xxxxxxxx ~]# cat disk.xml
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='3' unit='1088438288'/>
</source>
</hostdev>
[root@xxxxxxxx ~]# virsh attach-device guest01 disk.xml
error: Failed to attach device from disk.xml
error: internal error: unable to execute QEMU command 'device_add': Bus 'scsi0.0' not found
Since the infrastructure is in place, we can also create a controller
silently for use by the hotplugged hostdev device.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
For systems without a PCI bus, attaching a SCSI controller fails:
[root@xxxxxxxx ~]# cat controller.xml
<controller type='scsi' model='virtio-scsi' index='0' />
[root@xxxxxxxx ~]# virsh attach-device guest01 controller.xml
error: Failed to attach device from controller.xml
error: XML error: No PCI buses available
A similar problem occurs with the detach of a controller:
[root@xxxxxxxx ~]# virsh detach-device guest01 controller.xml
error: Failed to detach device from controller.xml
error: operation failed: controller scsi:0 not found
The qemuDomainXXtachPciControllerDevice routines made assumptions
that any caller had a PCI bus. These routines now selectively calls
PCI functions where necessary, and assigns the device information
type to one appropriate for the bus in use.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
For attach/detach of controller devices, we rename the functions to
remove 'PCI' from their title. The actual separation of PCI-specific
operations will be handled in the next patch.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
These changes allow the correct virtio-blk-device and virtio-net-device
devices to be used for the 'virt' machine type for armv7 rather than the
PCI virtio devices.
A test case was added to qemuxml2argvtest for this change.
Signed-off-by: Clark Laughlin <clark.laughlin@linaro.org>
Most of our code base uses space after comma but not before;
fix the remaining uses before adding a syntax check.
* src/qemu/qemu_cgroup.c: Consistently use commas.
* src/qemu/qemu_command.c: Likewise.
* src/qemu/qemu_conf.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/qemu/qemu_monitor.c: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
On the domain startup, this function is called to dump some info about
the CPUs. At the beginning of the function we check if we aren't running
older qemu which is not exposing the CPUs via 'qom-list'. However, we
are not checking for even older qemus, which throw 'CommandNotFound'
error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Report the error in virPortAllocatorAcquire instead
of doing it in every caller.
The error contains the port range name instead of the intended
use for the port, e.g.:
Unable to find an unused port in range 'display' (65534-65535)
instead of:
Unable to find an unused port for SPICE
This also adds error reporting when the QEMU driver could not
find an unused port for VNC, VNC WebSockets or NBD migration.
QEMU 1.6.0 introduced new migration status: setup
Libvirt does not expect such string in QMP and refuses to migrate with error
"unexpected migration status in setup"
This patch fixes it.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1025108
So far qemuSetupHostdevCGroup was called very early during hotplug, even
before we knew the device we were about to hotplug was actually
available. By calling the function later, we make sure QEMU won't be
allowed to access devices used by other domains.
Another important effect of this change is that hopluging USB devices
specified by vendor and product (but not by their USB address) works
again. This was broken since v1.0.5-171-g7d763ac, when the call to
qemuFindHostdevUSBDevice was moved after the call to
qemuSetupHostdevCGroup, which then used an uninitialized USB address.
https://bugzilla.redhat.com/show_bug.cgi?id=1018267
The aim of virObject refing and urefing is to tell where the object is
to be used and when is no longer needed. Hence any object shouldn't be
used after it has been unrefed, as we might be the last to hold the
reference. The better way is to call virObjectUnref() *after* the last
object usage. In this specific case, the monitor EOF handler was called
after the qemuMonitorIO called virObjectUnref. Not only that @mon was
disposed (which is not used in the handler anyway) but the @mon->vm
which is causing a SIGSEGV:
2013-11-15 10:17:54.425+0000: 20110: error : qemuMonitorIO:688 : internal error: early end of file from monitor: possible problem:
qemu-kvm: -incoming tcp:01.01.01.0:49152: Failed to bind socket: Cannot assign requested address
Program received signal SIGSEGV, Segmentation fault.
qemuProcessHandleMonitorEOF (mon=<optimized out>, vm=0x7fb728004170) at qemu/qemu_process.c:299
299 if (priv->beingDestroyed) {
(gdb) p *priv
Cannot access memory at address 0x0
(gdb) p vm
$1 = (virDomainObj *) 0x7fb728004170
(gdb) p *vm
$2 = {parent = {parent = {magic = 3735928559, refs = 0, klass = 0xdeadbeef}, lock = {lock = {__data = {__lock = 2, __count = 0, __owner = 20110, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\216N\000\000\001", '\000' <repeats 26 times>, __align = 2}}}, pid = 0, state = {state = 0, reason = 0}, autostart = 0, persistent = 0,
updated = 0, def = 0x0, newDef = 0x0, snapshots = 0x0, current_snapshot = 0x0, hasManagedSave = false, privateData = 0x0, privateDataFreeFunc = 0x0, taint = 304}
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In the qemuProcessReconnectHelper() a new thread that does all the
interesting work is spawned. The rationale is to not block the daemon
startup process in case of unresponsive qemu. However, the thread
handler is a local variable which gets lost once the control goes out of
scope. Hence the thread gets leaked. We can avoid this if the thread
isn't made joinable.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The QOM path in qemu that contains the CPUID registers of a running VM
may not be present (introduced in QEMU 1.5).
Since commit d94b7817719 we have a regression with QEMU that don't
support reporting of the CPUID register state via the monitor as the
process startup code expects the path to exist.
This patch adds code that checks with the monitor if the requested path
already exists and uses it only in this case.
The 'none' machine type is something only intended for use
by libvirt probing capabilities. It isn't something that
is useful for running real VM instances. As such it should
not be exposed to users in the capabilities.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virQEMUCapsProbeQMPMachineTypes method iterates over machine
types copying them into the qemuCapsPtr object. It only updates
the qemuCaps->nmachinetypes value at the end though. So if OOM
occurs in the middle, the destructor of qemuCapsPtr will not
free the partially initialized machine types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the managedsave image is corrupted, e.g. the XML part is, we fail to
parse it and throw an error, e.g.:
error: Failed to start domain jms8
error: XML error: missing security model when using multiple labels
This is okay, as we can't really start the machine and avoid undefined
qemu behaviour. On the other hand, the error message doesn't give a
clue to users what should they do. The consensus here would be to thrown
a warning to logs saying "Hey, you've got a corrupted file".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This patch moves some code in the qemuDomainAttachSCSIDisk
function. The check for the existence of a PCI address assigned
to the SCSI controller was moved in order to be executed only
when needed. The PCI address of a controller is not necessary
if QEMU_CAPS_DEVICE is supported.
This fixes issues with the hotplug of SCSI disks on pseries guests.
When adding support for Q35 guests, the code to assign a PCI address
to the primary video card was moved into Q35 and i440fx(PIIX3)
specific functions, but no fallback was kept for other machine types
that might have a video card.
This patch remedies that by assigning a PCI address to the primary
video card if it does not have any kind of address. In particular,
this fixes issues with pseries guests.
Signed-off-by: Vitor de Lima <vitor.lima@eldorado.org.br>
Signed-off-by: Laine Stump <laine@laine.org>
When starting a VM the qemu process may filter out some requested
features of a domain as it's not supported either by the host or by
qemu. Libvirt didn't check if this happened which might end up in
changing of the guest ABI when migrating.
The proof of concept implementation adds the check for the recently
introduced kvm_pv_unhalt cpuid feature bit. This feature depends on both
qemu and host kernel support and thus increase the possibility of guest
ABI breakage.
The linux kernel recently added support for paravirtual spinlock
handling to avoid performance regressions on overcomitted hosts. This
feature needs to be turned in the hypervisor so that the guest OS is
notified about the possible support.
This patch adds a new feature "paravirt-spinlock" to the XML and
supporting code to enable the "kvm_pv_unhalt" pseudo CPU feature in
qemu.
https://bugzilla.redhat.com/show_bug.cgi?id=1008989
Currently we were storing domain feature flags in a bit field as the
they were either enabled or disabled. New features such as paravirtual
spinlocks however can be tri-state as the default option may depend on
hypervisor version.
To allow storing tri-state feature state in the same place instead of
having to declare dedicated variables for each feature this patch
refactors the bit field to an array.
The qemu monitor supports retrieval of actual CPUID bits presented to
the guest using QMP monitor. Add APIs to extract these information and
tests for them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Since 86d90b3a (yes, my patch; again) we are supporting NBD storage
migration. However, on error recovery path we got the steps reversed.
The correct order is: return NBD port to the virPortAllocator and then
either unlock the vm or remove it from the driver. Not vice versa.
==11192== Invalid write of size 4
==11192== at 0x11488559: qemuMigrationPrepareAny (qemu_migration.c:2459)
==11192== by 0x11488EA6: qemuMigrationPrepareDirect (qemu_migration.c:2652)
==11192== by 0x114D1509: qemuDomainMigratePrepare3Params (qemu_driver.c:10332)
==11192== by 0x519075D: virDomainMigratePrepare3Params (libvirt.c:7290)
==11192== by 0x1502DA: remoteDispatchDomainMigratePrepare3Params (remote.c:4798)
==11192== by 0x12DECA: remoteDispatchDomainMigratePrepare3ParamsHelper (remote_dispatch.h:5741)
==11192== by 0x5212127: virNetServerProgramDispatchCall (virnetserverprogram.c:435)
==11192== by 0x5211C86: virNetServerProgramDispatch (virnetserverprogram.c:305)
==11192== by 0x520A8FD: virNetServerProcessMsg (virnetserver.c:165)
==11192== by 0x520A9E1: virNetServerHandleJob (virnetserver.c:186)
==11192== by 0x50DA78F: virThreadPoolWorker (virthreadpool.c:144)
==11192== by 0x50DA11C: virThreadHelper (virthreadpthread.c:161)
==11192== Address 0x1368baa0 is 576 bytes inside a block of size 688 free'd
==11192== at 0x4A07F5C: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==11192== by 0x5079A2F: virFree (viralloc.c:580)
==11192== by 0x11456C34: qemuDomainObjPrivateFree (qemu_domain.c:267)
==11192== by 0x50F41B4: virDomainObjDispose (domain_conf.c:2034)
==11192== by 0x50C2991: virObjectUnref (virobject.c:262)
==11192== by 0x50F4CFC: virDomainObjListRemove (domain_conf.c:2361)
==11192== by 0x1145C125: qemuDomainRemoveInactive (qemu_domain.c:2087)
==11192== by 0x11488520: qemuMigrationPrepareAny (qemu_migration.c:2456)
==11192== by 0x11488EA6: qemuMigrationPrepareDirect (qemu_migration.c:2652)
==11192== by 0x114D1509: qemuDomainMigratePrepare3Params (qemu_driver.c:10332)
==11192== by 0x519075D: virDomainMigratePrepare3Params (libvirt.c:7290)
==11192== by 0x1502DA: remoteDispatchDomainMigratePrepare3Params (remote.c:4798)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
One of my previous patches (c7ac2519b7f) did try to fix the issue when
domain dies too soon during migration. However, this clumsy approach was
missing removal of qemuProcessHandleMonitorDestroy resulting in double
unrefing of mon->vm and hence producing the daemon crash:
==11843== Invalid read of size 4
==11843== at 0x50C28C5: virObjectUnref (virobject.c:255)
==11843== by 0x1148F7DB: qemuMonitorDispose (qemu_monitor.c:258)
==11843== by 0x50C2991: virObjectUnref (virobject.c:262)
==11843== by 0x50C2D13: virObjectFreeCallback (virobject.c:388)
==11843== by 0x509C37B: virEventPollCleanupHandles (vireventpoll.c:583)
==11843== by 0x509C711: virEventPollRunOnce (vireventpoll.c:652)
==11843== by 0x509A620: virEventRunDefaultImpl (virevent.c:274)
==11843== by 0x520D21C: virNetServerRun (virnetserver.c:1112)
==11843== by 0x11F368: main (libvirtd.c:1513)
==11843== Address 0x13b88864 is 4 bytes inside a block of size 136 free'd
==11843== at 0x4A07F5C: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==11843== by 0x5079A2F: virFree (viralloc.c:580)
==11843== by 0x50C29E3: virObjectUnref (virobject.c:270)
==11843== by 0x114770E4: qemuProcessHandleMonitorDestroy (qemu_process.c:1103)
==11843== by 0x1148F7CB: qemuMonitorDispose (qemu_monitor.c:257)
==11843== by 0x50C2991: virObjectUnref (virobject.c:262)
==11843== by 0x50C2D13: virObjectFreeCallback (virobject.c:388)
==11843== by 0x509C37B: virEventPollCleanupHandles (vireventpoll.c:583)
==11843== by 0x509C711: virEventPollRunOnce (vireventpoll.c:652)
==11843== by 0x509A620: virEventRunDefaultImpl (virevent.c:274)
==11843== by 0x520D21C: virNetServerRun (virnetserver.c:1112)
==11843== by 0x11F368: main (libvirtd.c:1513)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So far we are checking if qemu supports 'nbd-server-start'. This,
however, makes no sense on the source as nbd-server-* is used on the
destination. On the source the 'drive-mirror' is used instead.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since the 90139a62 commit the error is copied into mon->lastError but
it's never freed from there.
==31989== 395 bytes in 1 blocks are definitely lost in loss record 877 of 978
==31989== at 0x4A06C2B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31989== by 0x7EAF129: strdup (in /lib64/libc-2.15.so)
==31989== by 0x50D586C: virStrdup (virstring.c:554)
==31989== by 0x50976C1: virCopyError (virerror.c:191)
==31989== by 0x5097A35: virCopyLastError (virerror.c:312)
==31989== by 0x114909A9: qemuMonitorIO (qemu_monitor.c:690)
==31989== by 0x509BEDE: virEventPollDispatchHandles (vireventpoll.c:501)
==31989== by 0x509C701: virEventPollRunOnce (vireventpoll.c:648)
==31989== by 0x509A620: virEventRunDefaultImpl (virevent.c:274)
==31989== by 0x520D21C: virNetServerRun (virnetserver.c:1112)
==31989== by 0x11F368: main (libvirtd.c:1513)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If there's a migration cancelled, the bitmap of migration port should be
cleaned up too.
Signed-off-by: Zeng Junliang <zengjunliang@huawei.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>