The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guests
At the last step, the previously running persistent guest
will be missing. This is because of a race condition in the
QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get
marked as "persistent". There is therefore a window where
the thread reconnecting to the VM will remove the persistent
VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
In addition to this though, we must only attempt to reconnect
to a VM which had a non-zero PID loaded from its state file.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The 'error' cleanup block in qemuProcessReconnect() had a
'return' statement in the middle of it. This caused a leak
of virConnectPtr & virQEMUDriverConfigPtr instances. This
was identified because netcf recently started checking its
refcount in libvirtd shutdown:
netcfStateCleanup:109 : internal error: Attempt to close netcf state driver with open connections
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When adding an automatically allocated port to a well-formed migration
URI, keep it well-formed:
tcp://1.2.3.4/ -> tcp://1.2.3.4/:12345 # wrong
tcp://1.2.3.4/ -> tcp://1.2.3.4:12345/ # fixed
tcp://1.2.3.4 -> tcp://1.2.3.4:12345 # still works
tcp:1.2.3.4 -> tcp:1.2.3.4:12345 # still works (old syntax)
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Expand the "secmodel" XML fragment of "host" with a sequence of
baselabel's which describe the default security context used by
libvirt with a specific security model and virtualization type:
<secmodel>
<model>selinux</model>
<doi>0</doi>
<baselabel type='kvm'>system_u:system_r:svirt_t:s0</baselabel>
<baselabel type='qemu'>system_u:system_r:svirt_tcg_t:s0</baselabel>
</secmodel>
<secmodel>
<model>dac</model>
<doi>0</doi>
<baselabel type='kvm'>107:107</baselabel>
<baselabel type='qemu'>107:107</baselabel>
</secmodel>
"baselabel" is driver-specific information, e.g. in the DAC security
model, it indicates USER_ID:GROUP_ID.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch (and the two patches that precede it) resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1005682
When libvirt was changed to delay the final cleanup of device removal
until the qemu process had signaled it with a DEVICE_DELETED event for
that device, the hostdev removal function
(qemuDomainRemoveHostDevice()) was written to properly handle the
removal of a hostdev that was actually an SRIOV virtual function
(defined with <interface type='hostdev'>). However, the function used
to search for a device matching the alias name provided in the
DEVICE_DELETED message (virDomainDefFindDevice()) would search through
the list of netdevs before hostdevs, so qemuDomainRemoveHostDevice()
was never called; instead the netdev function,
qemuDomainRemoveNetDevice() (which *doesn't* properly cleanup after
removal of <interface type='hostdev'>), was called.
(As a reminder - each <interface type='hostdev'> results in a
virDomainNetDef which contains a virDomainHostdevDef having a parent
type of VIR_DOMAIN_DEVICE_NET, and parent.data.net pointing back to
the virDomainNetDef; both Defs point to the same device info object
(and the info contains the device's "alias", which is used by qemu to
identify the device). The virDomainHostdevDef is added to the domain's
hostdevs list *and* the virDomainNetDef is added to the domain's nets
list, so searching either list for a particular alias will yield a
positive result.)
This function modifies the qemuDomainRemoveNetDevice() to short
circuit itself and call qemu DomainRemoveHostDevice() instead when the
actual device is a VIR_DOMAIN_NET_TYPE_HOSTDEV (similar logic to what
is done in the higher level qemuDomainDetachNetDevice())
Note that even if virDomainDefFindDevice() changes in the future so
that it finds the hostdev entry first, the current code will continue
to work properly.
This function was called in three places, and in each the call was
qualified by a slightly different conditional. In reality, this
function should only be called for a hostdev if all of the following
are true:
1) mode='subsystem'
2) type='pci'
3) there is a parent device definition which is an <interface>
(VIR_DOMAIN_DEVICE_NET)
We can simplify the callers and make them more consistent by checking
these conditions at the top ov qemuDomainHostdevNetConfigRestore and
returning 0 if one of them isn't satisfied.
The location of the call to qemuDomainHostdevNetConfigRestore() has
also been changed in the hot-plug case - it is moved into the caller
of its previous location (i.e. from qemuDomainRemovePCIHostDevice() to
qemuDomainRemoveHostDevice()). This was done to be more consistent
about which functions pay attention to whether or not this is one of
the special <interface> hostdevs or just a normal hostdev -
qemuDomainRemoveHostDevice() already contained a call to
networkReleaseActualDevice() and virDomainNetDefFree(), so it makes
sense for it to also handle the resetting of the device's MAC address
and vlan tag (which is what's done by
qemuDomainHostdevNetConfigRestore()).
Most of the usage of getuid()/getgid() is in cases where we are
considering what privileges we have. As such the code should be
using the effective IDs, not real IDs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When running setuid, we must be careful about what env vars
we allow commands to inherit from us. Replace the
virCommandAddEnvPass function with two new ones which do
filtering
virCommandAddEnvPassAllowSUID
virCommandAddEnvPassBlockSUID
And make virCommandAddEnvPassCommon use the appropriate
ones
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit e3ef20d7 allows user to configure migration ports range via
qemu.conf. However, it forgot to update augeas definition file and
even the test data was malicious.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1019053
When we migrate vms concurrently, there's a chance that libvirtd on
destination assigns the same port for different migrations, which will
lead to migration failure during prepare phase on destination. So we use
virPortAllocator here to solve the problem.
Signed-off-by: Wang Yufei <james.wangyufei@huawei.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The header definition didn't match the function declaration, so adjusted
header to reflect the definition.
Found during a Coverity build where STATIC_ANALYSIS is enabled resulting
in the internal.h adding __nonnull__ handling to arguments.
Commit '6d264c91' added support for the qemuMonitorJSONDrivePivot() and
commit 'fbc3adc9' added a corresponding test which ended up triggering
the build failure which I didn't notice until today!
QEMU has support for SASL auth for SPICE guests, but libvirt
has no way to enable it. Following the example from VNC where
it is globally enabled via qemu.conf
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The last argument of memmove is the amount of bytes to be moved. The
amount is in Bytes. We are moving some void pointers around. However,
since sizeof(void *) is not Byte on any architecture, we've got the
arithmetic wrong.
'const fooPtr' is the same as 'foo * const' (the pointer won't
change, but it's contents can). But in general, if an interface
is trying to be const-correct, it should be using 'const foo *'
(the pointer is to data that can't be changed).
Fix up offenders in src/qemu.
* src/qemu/qemu_bridge_filter.h (networkAllowMacOnPort)
(networkDisallowMacOnPort): Use intended type.
* src/qemu/qemu_bridge_filter.c (networkAllowMacOnPort)
(networkDisallowMacOnPort): Likewise.
* src/qemu/qemu_command.c (qemuBuildTPMBackendStr)
(qemuBuildTPMDevStr, qemuBuildCpuArgStr)
(qemuBuildObsoleteAccelArg, qemuBuildMachineArgStr)
(qemuBuildSmpArgStr, qemuBuildNumaArgStr): Likewise.
* src/qemu/qemu_conf.c (qemuSharedDeviceEntryCopy): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSaveImageStartVM): Likewise.
* src/qemu/qemu_hostdev.c
(qemuDomainHostdevNetConfigVirtPortProfile): Likewise.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONAttachCharDevCommand): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
'const fooPtr' is the same as 'foo * const' (the pointer won't
change, but it's contents can). But in general, if an interface
is trying to be const-correct, it should be using 'const foo *'
(the pointer is to data that can't be changed).
Fix up offenders in src/conf/domain_conf, and their fallout.
Several things to note: virObjectLock() requires a non-const
argument; if this were C++, we could treat the locking field
as 'mutable' and allow locking an otherwise 'const' object, but
that is a more invasive change, so I instead dropped attempts
to be const-correct on domain lookup. virXMLPropString and
friends require a non-const xmlNodePtr - this is because libxml2
is not a const-correct library. We could make the src/util/virxml
wrappers cast away const, but I figured it was easier to not
try to mark xmlNodePtr as const. Finally, virDomainDeviceDefCopy
was a rather hard conversion - it calls virDomainDeviceDefPostParse,
which in turn in the xen driver was actually modifying the domain
outside of the current device being visited. We should not be
adding a device on the first per-device callback, but waiting until
after all per-device callbacks are complete.
* src/conf/domain_conf.h (virDomainObjListFindByID)
(virDomainObjListFindByUUID, virDomainObjListFindByName)
(virDomainObjAssignDef, virDomainObjListAdd): Drop attempt at
const.
(virDomainDeviceDefCopy): Use intended type.
(virDomainDeviceDefParse, virDomainDeviceDefPostParseCallback)
(virDomainVideoDefaultType, virDomainVideoDefaultRAM)
(virDomainChrGetDomainPtrs): Make const-correct.
* src/conf/domain_conf.c (virDomainObjListFindByID)
(virDomainObjListFindByUUID, virDomainObjListFindByName)
(virDomainDeviceDefCopy, virDomainObjListAdd)
(virDomainObjAssignDef, virDomainHostdevSubsysUsbDefParseXML)
(virDomainHostdevSubsysPciOrigStatesDefParseXML)
(virDomainHostdevSubsysPciDefParseXML)
(virDomainHostdevSubsysScsiDefParseXML)
(virDomainControllerModelTypeFromString)
(virDomainTPMDefParseXML, virDomainTimerDefParseXML)
(virDomainSoundCodecDefParseXML, virDomainSoundDefParseXML)
(virDomainWatchdogDefParseXML, virDomainRNGDefParseXML)
(virDomainMemballoonDefParseXML, virDomainNVRAMDefParseXML)
(virSysinfoParseXML, virDomainVideoAccelDefParseXML)
(virDomainVideoDefParseXML, virDomainHostdevDefParseXML)
(virDomainRedirdevDefParseXML)
(virDomainRedirFilterUsbDevDefParseXML)
(virDomainRedirFilterDefParseXML, virDomainIdMapEntrySort)
(virDomainIdmapDefParseXML, virDomainVcpuPinDefParseXML)
(virDiskNameToBusDeviceIndex, virDomainDeviceDefCopy)
(virDomainVideoDefaultType, virDomainHostdevAssignAddress)
(virDomainDeviceDefPostParseInternal, virDomainDeviceDefPostParse)
(virDomainChrGetDomainPtrs, virDomainControllerSCSINextUnit)
(virDomainSCSIDriveAddressIsUsed)
(virDomainDriveAddressIsUsedByDisk)
(virDomainDriveAddressIsUsedByHostdev): Fix fallout.
* src/openvz/openvz_driver.c (openvzDomainDeviceDefPostParse):
Likewise.
* src/libxl/libxl_domain.c (libxlDomainDeviceDefPostParse):
Likewise.
* src/qemu/qemu_domain.c (qemuDomainDeviceDefPostParse)
(qemuDomainDefaultNetModel): Likewise.
* src/lxc/lxc_domain.c (virLXCDomainDeviceDefPostParse):
Likewise.
* src/uml/uml_driver.c (umlDomainDeviceDefPostParse): Likewise.
* src/xen/xen_driver.c (xenDomainDeviceDefPostParse): Split...
(xenDomainDefPostParse): ...since per-device callback is not the
time to be adding a device.
Signed-off-by: Eric Blake <eblake@redhat.com>
virDomainChrGetDomainPtrs() required 4 levels of pointers (taking
a parameter that will be used as an output variable to return the
address of another variable that contains an array of pointers).
This is rather complex to reason about, especially when outside
of the domain_conf file, no other caller should be modifying
the resulting array of pointers directly. Changing the public
signature gives something is easier to reason with, and actually
make const-correct; which is important as it was the only function
that was blocking virDomainDeviceDefCopy from treating its source
as const.
* src/conf/domain_conf.h (virDomainChrGetDomainPtrs): Use simpler
types, and make const-correct for external users.
* src/conf/domain_conf.c (virDomainChrGetDomainPtrs): Split...
(virDomainChrGetDomainPtrsInternal): ...into an internal version
that lets us modify terms, vs. external form that is read-only.
(virDomainDeviceDefPostParseInternal, virDomainChrFind)
(virDomainChrInsert): Adjust callers.
* src/qemu/qemu_command.c (qemuGetNextChrDevIndex): Adjust caller.
(qemuDomainDeviceAliasIndex): Make const-correct.
Signed-off-by: Eric Blake <eblake@redhat.com>
The regular save image code has the support to compress images using a
specified algorithm. This was not implemented for external checkpoints
although it shares most of the backend code.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1017227
The regular save image code has the support to compress images using a
specified algorithm. This was not implemented for managed save although
it shares most of the backend code.
After my patches, some functions gained one more argument
(@listenAddress) which wasn't included in debug printing of
arguments they were called with. Functions in question are:
qemuMigrationPrepareDirect and qemuMigrationPerform.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
I've noticed a SIGSEGV-ing libvirtd on the destination when the qemu
died too quickly = in Prepare phase. What is happening here is:
1) [Thread 3493] We are in qemuMigrationPrepareAny() and calling
qemuProcessStart() which subsequently calls qemuProcessWaitForMonitor()
and qemuConnectMonitor(). So far so good. The qemuMonitorOpen()
succeeds, however switching monitor to QMP mode fails as qemu died
meanwhile. That is qemuMonitorSetCapabilities() returns -1.
2013-10-08 15:54:10.629+0000: 3493: debug : qemuMonitorSetCapabilities:1356 : mon=0x14a53da0
2013-10-08 15:54:10.630+0000: 3493: debug : qemuMonitorJSONCommandWithFd:262 : Send command '{"execute":"qmp_capabilities","id":"libvirt-1"}' for write with FD -1
2013-10-08 15:54:10.630+0000: 3493: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=17 events=13
...
2013-10-08 15:54:10.631+0000: 3493: debug : qemuMonitorSend:956 : QEMU_MONITOR_SEND_MSG: mon=0x14a53da0 msg={"execute":"qmp_capabilities","id":"libvirt-1"}
fd=-1
2013-10-08 15:54:10.631+0000: 3262: debug : virEventPollRunOnce:641 : Poll got 1 event(s)
2) [Thread 3262] The event loop is trying to do the talking to monitor.
However, qemu is dead already, remember?
2013-10-08 15:54:13.436+0000: 3262: error : qemuMonitorIORead:551 : Unable to read from monitor: Connection reset by peer
2013-10-08 15:54:13.516+0000: 3262: debug : virFileClose:90 : Closed fd 25
...
2013-10-08 15:54:13.533+0000: 3493: debug : qemuMonitorSend:968 : Send command resulted in error internal error: early end of file from monitor: possible problem:
3) [Thread 3493] qemuProcessStart() failed. No big deal. Go to the
'endjob' label and subsequently to the 'cleanup'. Since the domain is
not persistent and ret is -1, the qemuDomainRemoveInactive() is called.
This has an (unpleasant) effect of virObjectUnref()-in the @vm object.
Unpleasant because the event loop which is about to trigger EOF callback
still holds a pointer to the @vm (not the reference). See the valgrind
output below.
4) [Thread 3262] So the event loop starts triggering EOF:
2013-10-08 15:54:13.542+0000: 3262: debug : qemuMonitorIO:729 : Triggering EOF callback
2013-10-08 15:54:13.543+0000: 3262: debug : qemuProcessHandleMonitorEOF:294 : Received EOF on 0x14549110 'migt10'
And the monitor is cleaned up. This results in calling
qemuProcessHandleMonitorEOF with the @vm pointer passed. The pointer is
kept in qemuMonitor struct.
==3262== Thread 1:
==3262== Invalid read of size 4
==3262== at 0x77ECCAA: pthread_mutex_lock (in /lib64/libpthread-2.15.so)
==3262== by 0x52FAA06: virMutexLock (virthreadpthread.c:85)
==3262== by 0x52E3891: virObjectLock (virobject.c:320)
==3262== by 0x11626743: qemuProcessHandleMonitorEOF (qemu_process.c:296)
==3262== by 0x11642593: qemuMonitorIO (qemu_monitor.c:730)
==3262== by 0x52BD526: virEventPollDispatchHandles (vireventpoll.c:501)
==3262== by 0x52BDD49: virEventPollRunOnce (vireventpoll.c:648)
==3262== by 0x52BBC68: virEventRunDefaultImpl (virevent.c:274)
==3262== by 0x542D3D9: virNetServerRun (virnetserver.c:1112)
==3262== by 0x11F368: main (libvirtd.c:1513)
==3262== Address 0x14549128 is 24 bytes inside a block of size 136 free'd
==3262== at 0x4C2AF5C: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3262== by 0x529B1FF: virFree (viralloc.c:580)
==3262== by 0x52E3703: virObjectUnref (virobject.c:270)
==3262== by 0x531557E: virDomainObjListRemove (domain_conf.c:2355)
==3262== by 0x1160E899: qemuDomainRemoveInactive (qemu_domain.c:2061)
==3262== by 0x1163A0C6: qemuMigrationPrepareAny (qemu_migration.c:2450)
==3262== by 0x1163A923: qemuMigrationPrepareDirect (qemu_migration.c:2626)
==3262== by 0x11682D71: qemuDomainMigratePrepare3Params (qemu_driver.c:10309)
==3262== by 0x53B0976: virDomainMigratePrepare3Params (libvirt.c:7266)
==3262== by 0x1502D3: remoteDispatchDomainMigratePrepare3Params (remote.c:4797)
==3262== by 0x12DECA: remoteDispatchDomainMigratePrepare3ParamsHelper (remote_dispatch.h:5741)
==3262== by 0x54322EB: virNetServerProgramDispatchCall (virnetserverprogram.c:435)
The mon->vm is set in qemuMonitorOpenInternal() which is the correct
place to increase @vm ref counter. The correct place to decrease the ref
counter is then qemuMonitorDispose().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This configuration knob is there to override default listen address for
-incoming for all qemu domains.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=994364
Whenever we check for ABI stability, we have new xml (e.g. provided by
user, or obtained from snapshot, whatever) which we compare to old xml
and see if ABI won't break. However, if the new xml was produced via
virDomainGetXMLDesc(..., VIR_DOMAIN_XML_MIGRATABLE) it lacks some
devices, e.g. 'pci-root' controller. Hence, the ABI stability check
fails even though it is stable. Moreover, we can't simply fix
virDomainDefCheckABIStability because removing the correct devices is
task for the driver. For instance, qemu driver wants to remove the usb
controller too, while LXC driver doesn't. That's why we need special
qemu wrapper over virDomainDefCheckABIStability which removes the
correct devices from domain XML, produces MIGRATABLE xml and calls the
check ABI stability function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
At the beginning of the function qemuPrepareHostdevPCICheckSupport() is
called. After that @pcidevs is initialized. However, if the very first
command fails, we go to 'cleanup' label where virObjectUnref(pcidevs) is
called. Obviously, it is called before @pcidevs was able to get
initialized. Compiler warns about it:
CC qemu/libvirt_driver_qemu_impl_la-qemu_hostdev.lo
qemu/qemu_hostdev.c: In function 'qemuPrepareHostdevPCIDevices':
qemu/qemu_hostdev.c:824:19: error: 'pcidevs' may be used uninitialized in this function [-Werror=maybe-uninitialized]
virObjectUnref(pcidevs);
^
cc1: all warnings being treated as errors
Prefer using VFIO (if available) to the legacy KVM device passthrough.
With this patch a PCI passthrough device without the driver configured
will be started with VFIO if it's available on the host. If not legacy
KVM passthrough is checked and error is reported if it's not available.
Add code to check availability of PCI passhthrough using VFIO and the
legacy KVM passthrough and use it when starting VMs and hotplugging
devices to live machine.
The virConnectPtr is passed around loads of nwfilter code in
order to provide it as a parameter to the callback registered
by the virt drivers. None of the virt drivers use this param
though, so it serves no purpose.
Avoiding the need to pass a virConnectPtr means that the
nwfilterStateReload method no longer needs to open a bogus
QEMU driver connection. This addresses a race condition that
can lead to a crash on startup.
The nwfilter driver starts before the QEMU driver and registers
some callbacks with DBus to detect firewalld reload. If the
firewalld reload happens while the QEMU driver is still starting
up though, the nwfilterStateReload method will open a connection
to the partially initialized QEMU driver and cause a crash.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1012824https://bugzilla.redhat.com/show_bug.cgi?id=1012834
Note that a similar problem was reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=827519
but the fix only worked for <interface type='hostdev'>, *not* for
<interface type='network'> where the network itself was a pool of
hostdevs.
The symptom in both cases was this error message:
internal error: Unable to determine device index for network device
In both cases the cause was lack of proper handling for netdevs
(<interface>) of type='hostdev' when scanning the netdev list looking
for alias names in qemuAssignDeviceNetAlias() - those that aren't
type='hostdev' have an alias of the form "net%d", while those that are
hostdev use "hostdev%d". This special handling was completely lacking
prior to the fix for Bug 827519 which was:
When searching for the highest alias index, libvirt looks at the alias
for each netdev and if it is type='hostdev' it ignores the entry. If
the type is not hostdev, then it expects the "net%d" form; if it
doesn't find that, it fails and logs the above error message.
That fix works except in the case of <interface type='network'> where
the network uses hostdev (i.e. the network is a pool of VFs to be
assigned to the guests via PCI passthrough). In this case, the check
for type='hostdev' would fail because it was done as:
def->net[i]->type == VIR_DOMAIN_NET_TYPE_HOSTDEV
(which compares what was written in the config) when it actually
should have been:
virDomainNetGetActualType(def->net[i]) == VIR_DOMAIN_NET_TYPE_HOSTDEV
(which compares the type of netdev that was actually allocated from
the network at runtime).
Of course the latter wouldn't be of any use if the netdevs of
type='network' hadn't already acquired their actual network connection
yet, but manual examination of the code showed that this is never the
case.
While looking through qemu_command.c, two other places were found to
directly compare the net[i]->type field rather than getting actualType:
* qemuAssignDeviceAliases() - in this case, the incorrect comparison
would cause us to create a "net%d" alias for a netdev with
type='network' but actualType='hostdev'. This alias would be
subsequently overwritten by the proper "hostdev%d" form, so
everything would operate properly, but a string would be
leaked. This patch also fixes this problem.
* qemuAssignDevicePCISlots() - would defer assigning a PCI address to
a netdev if it was type='hostdev', but not for type='network +
actualType='hostdev'. In this case, the actual device usually hasn't
been acquired yet anyway, and even in the case that it has, there is
no practical difference between assigning a PCI address while
traversing the netdev list or while traversing the hostdev
list. Because changing it would be an effective NOP (but potentially
cause some unexpected regression), this usage was left unchanged.
When querying for kvm, we try to find 'enabled' field. Hence the error
message should report we haven't found 'enabled' and not 'running'
(which is not even in the reply). Probably a typo or copy-paste error.
The qemuDomainChangeNet() is called when 'virsh update-device' is
invoked on a NIC. Currently, we fail to update the QoS even though
we have routines for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This basically covers the talking-to-monitor part of
virQEMUCapsInitQMP. The patch itself has no real value,
but it creates an entity to be tested in the next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The change in ef29de14c3 that introduced
better error logging from qemu introduced a warning from coverity about
unused return value from lseek. Silence this warning and fix typo in the
corresponding error message.
Reported by: John Ferlan
https://bugzilla.redhat.com/show_bug.cgi?id=1011330 (case A)
While activeScsiHostdevs and webSocketPorts were allocated in
qemuStateInitialize, they were not freed in qemuStateCleanup.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1011330 (case D)
qemuProcessStart created two references to virQEMUDriverConfigPtr before
calling fork():
cfg = virQEMUDriverGetConfig(driver);
...
hookData.cfg = virObjectRef(cfg);
However, the child only unreferenced hookData.cfg and the parent only
removed the cfg reference. That said, we don't need to increment the
reference counter when assigning cfg to hookData. Both the child and the
parent will correctly remove the reference on cfg (the child will do
that through hookData).
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The return value of virDomainControllerFind >=0 means that
the specific controller was found.
But some functions invoke it and treat 0 as not found.
This patch fix these incorrect invocation.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
If qemuParseCommandLine finds an arg it does not understand
it adds it to the QEMU passthrough custom arg list. If the
qemuParseCommandLine method hits an error for any reason
though, it just does 'VIR_FREE(cmd)' on the custom arg list.
This means all actual args / env vars are leaked. Introduce
a qemuDomainCmdlineDefFree method to be used for cleanup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the call to virDomainControllerInsert fails in
qemuParseCommandLine, the controller struct is leaked.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The 'qemuStringToArgvEnv' method splits up a string of command
line env/args to an 'arglist' array. It then copies env vars
to a 'progenv' array and args to a 'progargv' array. When
copyin the env vars, it NULL-ifies the element in 'arglist'
that is copied.
Upon OOM the 'virStringListFree' is called on progenv and
arglist. Unfortunately, because the elements in 'arglist'
related to env vars have been set to NULL, the call to
virStringListFree(arglist) doesn't free anything, even
though some non-NULL args vars still exist later in the
array.
To fix this leak, stop NULL-ifying the 'arglist' elements,
and change the cleanup code to only free elements in the
'arglist' array, not 'progenv'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In a number of places in qemuParseCommandLineDisk, an error
is reported, but no 'goto error' jump is used. This causes
failure to report OOM conditions to the caller.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If OOM occurs in qemuParseCommandLineDisk some intermediate
variables will be leaked when parsing Sheepdog or RBD disks.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuBuildCommandLine code for parsing sound cards will leak
an intermediate variable if an OOM occurs. Move the free'ing of
the variable earlier to avoid the leak.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In qemuParseNBDString, if the virURIParse fails, the
error is not reported to the caller. Instead execution
falls through to the non-URI codepath causing memory
leaks later on.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If qemuAddRBDHost fails due to parsing problems or OOM, then
qemuParseRBDString cleanup is skipped causing a memory leak.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
qemuDomainPCIAddressGetNextSlot has a loop for finding
compatible PCI buses. In the loop body it creates a
PCI address string, but never frees this. This causes
a leak if the loop executes more than one iteration,
or if a call in the loop body fails.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This resolves one of the issues listed in:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
00:1E.0 is the location of this controller on at least some actual Q35
hardware, so we try to replicate the placement. The bridge should work
just as well in any other location though, so if 00:1E.0 isn't
available, just allow it to be auto-assigned anywhere appropriate.
This resolves one of the issues in:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
This device is identical to qemu's "intel-hda" device (known as "ich6"
in libvirt), but has a different PCI device ID (which matches the ID
of the hda audio built into the ich9 chipset, of course). It's not
supported in earlier versions of qemu, so it requires a capability
bit.
I'm not sure why this code was written to compare the strings that it
had just retrieved from an enum->string conversion, rather than just
look at the original enum values, but this yields the same results,
and is much more efficient (especially as you add more devices).
This is a prerequisite for patches to resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
Part of the resolution to:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
Although most devices available in qemu area defined as PCI devices,
and strictly speaking should only be attached via a PCI slot, in
practice qemu allows them to be attached to a PCIe slot and sometimes
this makes sense.
For example, The UHCI and EHCI USB controllers are usually attached
directly to the PCIe "root complex" (i.e. PCIe slots) on real
hardware, so that should be possible for a Q35-based qemu virtual
machine as well.
We still want to prefer a standard PCI slot when auto-assigning
addresses, though, and in general to disallow attaching PCI devices
via PCIe slots.
This patch makes that possible by adding a new
QEMU_PCI_CONNECT_TYPE_EITHER_IF_CONFIG flag. Three things are done
with this flag:
1) It is set for the "pcie-root" controller
2) qemuCollectPCIAddress() now has a set of nested switches that set
this "EITHER" flag for devices that we want to allow connecting to
pcie-root when specifically requested in the config.
3) qemuDomainPCIAddressFlagsCompatible() adds this new flag to the
"flagsMatchMask" if the address being checked came from config rather
than being newly auto-allocated by libvirt (this knowledge is
conveniently already available in the "fromConfig" arg).
Now any device having the EITHER flag set can be connected to
pcie-root if explicitly requested, but auto-allocated addresses for
those devices will still be standard PCI slots instead.
This patch only loosens the restrictions on devices that have been
specifically requested, but the setup is such that it should be fairly
easy to add new devices.
Replace them with switch cases. This will make it more efficient when
we add exceptions for more controller types, and other device types.
This is a prerequisite for patches to resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
The previous patches added infrastructure to report better errors from
monitor in some cases. This patch finalizes this "feature" by enabling
this enhanced error reporting on early phases of VM startup. In these
phases the possibility of qemu producing a useful error message is
really high compared to running it during the whole life cycle. After
the start up is complete, the feature is disabled to provide the usual
error messages so that users are not confused by possibly irrelevant
messages that may be in the domain log.
The original motivation to do this enhancement is to capture errors when
using VFIO device passthrough, where qemu reports errors after the
monitor is initialized and the existing error catching code couldn't
catch this producing a unhelpful message:
# virsh start test
error: Failed to start domain test
error: Unable to read from monitor: Connection reset by peer
With this change, the message is changed to:
# virsh start test
error: Failed to start domain test
error: internal error: early end of file from monitor: possible problem:
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: error, group 8 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: vfio: failed to get group 8
qemu-system-x86_64: -device vfio-pci,host=00:1a.0,id=hostdev0,bus=pci.0,addr=0x5: Device 'vfio-pci' could not be initialized
Change the monitor error code to add the ability to access the qemu log
file using a file descriptor so that we can dig in it for a more useful
error message. The error is now logged on monitor hangups and overwrites
a possible lesser error. A hangup on the monitor usualy means that qemu
has crashed and there's a significant chance it produced a useful error
message.
The functionality will be latent until the next patch.
Early VM startup errors usually produce a better error message in the
machine log file. Currently we were accessing it only when the process
exited during certain phases of startup. This will help adding a more
comprehensive error extraction for early qemu startup phases.
This patch adds infrastructure to keep a file descriptor for the machine
log file that will be used in case an error happens.
Teach the function to skip character device definitions printed by qemu
at startup in addition to libvirt log messages and make it usable from
outside of qemu_process.c. Also add documentation about the func.
The parsing of '-usb' did not check for failure of the
virDomainControllerInsert method. As a result on OOM, the
parser mistakenly attached USB disks to the IDE controller.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code formatting NUMA args was ignoring the return value
of virBitmapFormat, so on OOM, it would silently drop the
NUMA cpumask arg.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When building boot menu args, if OOM occurred the CLI args
would end up containing 'order=(null)' due to a missing
call to 'virBufferError'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuParseCommandLine method did not check the return value of
virStringSplit to see if OOM had occurred. This lead to dereference
of a NULL pointer on OOM.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Most callers of qemuParseKeywords were assigning its return
value to a 'size_t' variable. Then then also checked '< 0'
for error condition, but this will never be true with the
unsigned size_t variable. Rather than using 'ssize_t', change
qemuParseKeywords so that the element count is returned via
an output parameter, leaving the return value solely as an
error indicator.
This avoids a crash accessing beyond the end of an error
upon OOM.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In
commit 41b5505679
Author: Eric Blake <eblake@redhat.com>
Date: Wed Aug 28 15:01:23 2013 -0600
qemu: simplify list cleanup
The qemuStringToArgvEnv method was changed to use virStringFreeList
to free the 'arglist' array. This method assumes the string list
array is NULL terminated, however, qemuStringToArgvEnv was not
ensuring this when populating 'arglist'. This caused an out of
bounds access by virStringFreeList when OOM occured in the initial
loop of qemuStringToArgvEnv
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When parsing the RBD hosts, it increments the 'nhosts' counter
before increasing the 'hosts' array allocation. If an OOM then
occurs when increasing the array allocation, the cleanup block
will attempt to access beyond the end of the array. Switch
to using VIR_EXPAND_N instead of VIR_REALLOC_N to protect against
this mistake
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If OOM occurs in qemuDomainCCWAddressSetCreate, it jumps to
a cleanup block and frees the partially initialized object.
It then mistakenly returns the address of the just free'd
pointer instead of NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since the wait is done during migration (still inside
QEMU_ASYNC_JOB_MIGRATION_OUT), the code should enter the monitor as such
in order to prohibit all other jobs from interfering in the meantime.
This patch fixes bug #1009886 in which qemuDomainGetBlockInfo was
waiting on the monitor condition and after GetSpiceMigrationStatus
mangled its internal data, the daemon crashed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1009886
This resolves https://bugzilla.redhat.com/show_bug.cgi?id=1008903
The Q35 machinetype has an implicit SATA controller at 00:1F.2 which
isn't given the "expected" id of ahci0 by qemu when it's created. The
original suggested solution to this problem was to not specify any
controller for the disks that use the default controller and just
specify "unit=n" instead; qemu should then use the first IDE or SATA
controller for the disk.
Unfortunately, this "solution" is ignorant of the fact that in the
case of SATA disks, the "unit" attribute in the disk XML is actually
*not* being used for the unit, but is instead used to specify the
"bus" number; each SATA controller has 6 buses, and each bus only
allows a single unit. This makes it nonsensical to specify unit='n'
where n is anything other than 0. It also means that the only way to
connect more than a single device to the implicit SATA controller is
to explicitly give the bus names, which happen to be "ide.$n", where
$n can be replaced by the disk's "unit" number.
virDomainSetBlockIoTuneEnsureACL was incorrectly called after we already
started a job. As a result of this, the job was not cleaned up when an
access driver had forbidden the action.
qemu/KVM also supports a tftp URL while specifying the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='tftp' name='/url/path'>
<host name='host.name' port='69'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The ftps protocol is another protocol supported by qemu/KVM while specifying
the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftps' name='/url/path'>
<host name='host.name' port='990'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The https protocol is also accepted by qemu/KVM when specifying the cdrom ISO
image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='https' name='/url/path'>
<host name='host.name' port='443'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
If the ABI compatibility check with the "migratable" user XML is
successful, we would leak the originally parsed XML from the user that
would not be used in this case.
Reported by Ján Tomko.
The function implemented common behavior that can be reused for other
hypervisor drivers that use the virDomainObj data structures. Factor out
the core into a separate helper func.
The function implemented common behavior that can be reused for other
hypervisor drivers that use the virDomainObj data structures. Factor out
the core into a separate helper func.
In the original implementation of external checkpoints I've mistakenly
used the live definition to be stored in the save image. The normal
approach is to use the "migratable" definition. This was discovered when
commit 07966f6a8b changed the behavior to
use a converted XML from the user to do the compatibility check to fix
problem when using the regular machine saving.
As the previous patch added a compatibility layer, we can now change the
type of the XML in the image.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1008340
External checkpoints have a bug in the implementation where they use the
normal definition instead of the "migratable" one. This causes errors
when the snapshot is being reverted using the workaround method via
qemuDomainRestoreFlags() with a custom XML. This issue was introduced
when commit 07966f6a8b changed the code to
compare "migratable" XMLs from the user as we should have used
migratable in the image too.
This patch adds a compatibility layer, so that fixing the snapshot code
won't make existing snapshots fail to load.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1008340
qemuMigrationEatCookie has flags to control if these should
be parsed, but it does not fill mig->flags. These cookies might
get leaked if these flags are not set by qemuMigrationBakeCookie.
42 (32 direct, 10 indirect) bytes in 1 blocks are definitely lost in
loss record 361 of 662
==123== by 0x1BA33FCA: qemuMigrationEatCookie (qemu_migration.c:678)
==123== by 0x1BA34A1E: qemuMigrationRun (qemu_migration.c:3108)
==123== by 0x1BA3622B: doNativeMigrate (qemu_migration.c:3343)
==123== by 0x1BA3B408: qemuMigrationPerform (qemu_migration.c:4138)
When reverting a live internal snapshot with a live guest the ABI
compatiblity check was comparing a "migratable" definition with a normal
one. This resulted in the check failing with:
revert requires force: Target device address type none does not match source pci
This patch generates a "migratable" definition from the actual one to
check against the definition from the snapshot to avoid this problem.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1006886
Osier Yang pointed out that ever since commit 31cb030, the
signature of qemuDomainObjEndJob was changed to return a bool.
While comparison against 0 or > 0 still gives the right results,
it looks fishy; we also had one place that was comparing < 0
which is effectively dead code.
* src/qemu/qemu_migration.c (qemuMigrationPrepareAny): Fix dead
code bug.
(qemuMigrationBegin): Use more canonical form of bool check.
* src/qemu/qemu_driver.c (qemuAutostartDomain)
(qemuDomainCreateXML, qemuDomainSuspend, qemuDomainResume)
(qemuDomainShutdownFlags, qemuDomainReboot, qemuDomainReset)
(qemuDomainDestroyFlags, qemuDomainSetMemoryFlags)
(qemuDomainSetMemoryStatsPeriod, qemuDomainInjectNMI)
(qemuDomainSendKey, qemuDomainGetInfo, qemuDomainScreenshot)
(qemuDomainSetVcpusFlags, qemuDomainGetVcpusFlags)
(qemuDomainRestoreFlags, qemuDomainGetXMLDesc)
(qemuDomainCreateWithFlags, qemuDomainAttachDeviceFlags)
(qemuDomainUpdateDeviceFlags, qemuDomainDetachDeviceFlags)
(qemuDomainBlockResize, qemuDomainBlockStats)
(qemuDomainBlockStatsFlags, qemuDomainMemoryStats)
(qemuDomainMemoryPeek, qemuDomainGetBlockInfo)
(qemuDomainAbortJob, qemuDomainMigrateSetMaxDowntime)
(qemuDomainMigrateGetCompressionCache)
(qemuDomainMigrateSetCompressionCache)
(qemuDomainMigrateSetMaxSpeed)
(qemuDomainSnapshotCreateActiveInternal)
(qemuDomainRevertToSnapshot, qemuDomainSnapshotDelete)
(qemuDomainQemuMonitorCommand, qemuDomainQemuAttach)
(qemuDomainBlockJobImpl, qemuDomainBlockCopy)
(qemuDomainBlockCommit, qemuDomainOpenGraphics)
(qemuDomainGetBlockIoTune, qemuDomainGetDiskErrors)
(qemuDomainPMSuspendForDuration, qemuDomainPMWakeup)
(qemuDomainQemuAgentCommand, qemuDomainFSTrim): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Failure to attach to a domain during 'virsh qemu-attach' left
the list of domains in an odd state:
$ virsh qemu-attach 4176
error: An error occurred, but the cause is unknown
$ virsh list --all
Id Name State
----------------------------------------------------
2 foo shut off
$ virsh qemu-attach 4176
error: Requested operation is not valid: domain is already active as 'foo'
$ virsh undefine foo
error: Failed to undefine domain foo
error: Requested operation is not valid: cannot undefine transient domain
$ virsh shutdown foo
error: Failed to shutdown domain foo
error: invalid argument: monitor must not be NULL
It all stems from leaving the list of domains unmodified on
the initial failure; we should follow the lead of createXML
which removes vm on failure (the actual initial failure still
needs to be fixed in a later patch, but at least this patch
gets us to the point where we aren't getting stuck with an
unremovable "shut off" transient domain).
While investigating, I also found a leak in qemuDomainCreateXML;
the two functions should behave similarly. Note that there are
still two unusual paths: if dom is not allocated, the user will
see an OOM error even though the vm remains registered (but oom
errors already indicate tricky cleanup); and if the vm starts
and then quits again all before the job ends, it is possible
to return a non-NULL dom even though the dom will no longer be
useful for anything (but this at least lets the user know their
short-lived vm ran).
* src/qemu/qemu_driver.c (qemuDomainCreateXML): Don't leak vm on
failure to obtain job.
(qemuDomainQemuAttach): Match cleanup of qemuDomainCreateXML.
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently, only X86 provides users CPU features with CPUID instruction.
If users specify the features for non-x86, it should tell users to
remove them.
This patch is to report one error if features are specified by
users for non-x86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
While debugging a failure of 'virsh qemu-attach', I noticed that
we were leaking the count of active domains on failure. This
means that a libvirtd session that is supposed to quit after
active domains disappear will hang around forever.
* src/qemu/qemu_process.c (qemuProcessAttach): Undo count of
active domains on failure.
Signed-off-by: Eric Blake <eblake@redhat.com>
In Fedora 19, 'qemu-kvm' is a simple wrapper that calls
'qemu-system-x86_64 -machine accel=kvm'. Attempting
to use 'virsh qemu-attach $pid' to a machine started as:
qemu-kvm -cdrom /var/lib/libvirt/images/foo.img \
-monitor unix:/tmp/demo,server,nowait -name foo \
--uuid cece4f9f-dff0-575d-0e8e-01fe380f12ea
was failing with:
error: XML error: No PCI buses available
because we did not see 'kvm' in the executable name read from
/proc/$pid/cmdline, and tried to assign os.machine as
"accel=kvm" instead of "pc"; this in turn led to refusal to
recognize the pci bus.
Noticed while investigating https://bugzilla.redhat.com/995312
although there are still other issues to fix before that bug
will be completely solved.
I've concluded that the existing parser code for native-to-xml
is a horrendous hodge-podge of ad-hoc approaches; I basically
rewrote the -machine section to be a bit saner.
* src/qemu/qemu_command.c (qemuParseCommandLine): Don't assume
-machine argument is always appropriate for os.machine; set
virtType if accel is present.
Signed-off-by: Eric Blake <eblake@redhat.com>
'virsh domxml-from-native' and 'virsh qemu-attach' could misbehave
for an emulator installed in (a somewhat unlikely) location
such as /usr/local/qemu-1.6/qemu-system-x86_64 or (an even less
likely) /opt/notxen/qemu-system-x86_64. Limit the strstr seach
to just the basename of the file where we are assuming details
about the binary based on its name.
While testing, I accidentally triggered a core dump during strcmp
when I forgot to set os.type on one of my code paths; this patch
changes such a coding error to raise a nicer internal error instead.
* src/qemu/qemu_command.c (qemuParseCommandLine): Compute basename
earlier.
* src/conf/domain_conf.c (virDomainDefPostParseInternal): Avoid
NULL deref.
Signed-off-by: Eric Blake <eblake@redhat.com>
CPU features are not supported on non-x86 and hasFeatures will be NULL.
This patch is to remove CPU features functions calling to avoid errors.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
The VIR_FREE() macro will cast away any const-ness. This masked a
number of places where we passed a 'const char *' string to
VIR_FREE. Fortunately in all of these cases, the variable was not
in fact const data, but a heap allocated string. Fix all the
variable declarations to reflect this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
No need to open code now that we have a nice function.
Interestingly, our virStringFreeList function is typed correctly
(a malloc'd list of malloc'd strings is NOT const, whether at the
point where it is created, or at the point where it is cleand up),
so using it with a 'const char **' argument would require a cast
to keep the compiler. I chose instead to remove const from code
even where we don't modify the argument, just to avoid the need
to cast.
* src/qemu/qemu_command.h (qemuParseCommandLine): Drop declaration.
* src/qemu/qemu_command.c (qemuParseProcFileStrings)
(qemuStringToArgvEnv): Don't force malloc'd result to be const.
(qemuParseCommandLinePid, qemuParseCommandLineString): Simplify
cleanup.
(qemuParseCommandLine, qemuFindEnv): Drop const-correctness to
avoid the need to cast in callers.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=999352
Since commit v1.0.5-56-g449e6b1 (Pull parsing of migration xml up into
QEMU driver APIs) any attempt to rename a domain during migration fails
with the following error message:
internal error Incoming cookie data had unexpected name DOM vs DOM2
This is because migration cookies always use the original domain name
and the mentioned commit failed to propagate the name back to
qemuMigrationPrepareAny.
Currently, kernel supports up to 8 queues for a multiqueue tap device.
However, if user tries to enter a huge number (e.g. one million) the tap
allocation fails, as expected. But what is not expected is the log full
of warnings:
warning : virFileClose:83 : Tried to close invalid fd 0
The problem is, upon error we iterate over an array of FDs (handlers to
queues) and VIR_FORCE_CLOSE() over each item. However, the array is
pre-filled with zeros. Hence, we repeatedly close stdin. Ouch.
But there's more. The queues allocation is done in virNetDevTapCreate()
which cleans up the FDs in case of error. Then, its caller, the
virNetDevTapCreateInBridgePort() iterates over the FD array and tries to
close them too. And so does qemuNetworkIfaceConnect() and
qemuBuildInterfaceCommandLine().
Starting with qemu 1.6, the qemu-system-arm vexpress-a9 model has a
hardcoded virtio-mmio transport which enables attaching all virtio
devices.
On the command line, we have to use virtio-XXX-device rather than
virtio-XXX-pci, thankfully s390 already set the precedent here so
it's fairly straight forward.
At the XML level, this adds a new device address type virtio-mmio.
The controller and addressing don't have any subelements at the
moment because we they aren't needed for this usecase, but could
be added later if needed.
Add a test case for an ARM guest with one of every virtio device
enabled.
Similar to the chardev bit, ARM boards depend on the old style '-net nic'
for actually instantiating net devices. But we can't block out
-netdev altogether since it's needed for upcoming virtio support.
And add tests for working ARM XML with console, disk, and networking.
This corresponds to '-sd' and '-drive if=sd' on the qemu command line.
Needed for many ARM boards which don't provide any other way to
pass in storage.
QEMU ARM boards don't give us any way to explicitly wire in
a -chardev, so use the old style -serial options.
Unfortunately this isn't as simple as just turning off the CHARDEV flag
for qemu-system-arm, as upcoming virtio support _will_ use device/chardev.
On my machine, a guest fails to boot if it has a sound card, but not
graphical device/display is configured, because pulseaudio fails to
initialize since it can't access $HOME.
A workaround is removing the audio device, however on ARM boards there
isn't any option to do that, so -nographic always fails.
Set QEMU_AUDIO_DRV=none if no <graphics> are configured. Unfortunately
this has massive test suite fallout.
Add a qemu.conf parameter nographics_allow_host_audio, that if enabled
will pass through QEMU_AUDIO_DRV from sysconfig (similar to
vnc_allow_host_audio)
Add an attribute named 'removable' to the 'target' element of disks,
which controls the removable flag. For instance, on a Linux guest it
controls the value of /sys/block/$dev/removable. This option is only
valid for USB disks (i.e. bus='usb'), and its default value is 'off',
which is the same behaviour as before.
To achieve this, 'removable=on' (or 'off') is appended to the '-device
usb-storage' parameter sent to qemu when adding a USB disk via
'-disk'. A capability flag QEMU_CAPS_USB_STORAGE_REMOVABLE was added
to keep track if this option is supported by the qemu version used.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=922495
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Allow use of the usb-storage device only if the new capability flag
QEMU_CAPS_DEVICE_USB_STORAGE is set, which it is for qemu(-kvm)
versions >= 0.12.1.2-rhel62-beta.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
vhost only works in KVM mode at the moment, and is infact compiled
out if the emulator is built for non-native architecture. While it
may work at some point in the future for plain qemu, for now it's
just noise on the command line (and which contributes to arm cli
breakage).
When using a <interface type="network"> that points to a network with
hostdev forwarding mode a hostdev alias is created for the network. This
allias is inserted into the hostdev list, but is backed with a part of
the network object that it is connected to.
When a VM is being stopped qemuProcessStop() calls
networkReleaseActualDevice() which eventually frees the memory for the
hostdev object. Afterwards when the domain definition is being freed by
virDomainDefFree() an invalid pointer is accessed by
virDomainHostdevDefFree() and may cause a crash of the daemon.
This patch removes the entry in the hostdev list before freeing the
depending memory to avoid this issue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1000973
QEMU commit 3984890 introduced the "pci-hole64-size" property,
to i440FX-pcihost and q35-pcihost with a default setting of 2 GB.
Translate <pcihole64>x<pcihole64/> to:
-global q35-pcihost.pci-hole64-size=x for q35 machines and
-global i440FX-pcihost.pci-hole64-size=x for i440FX-based machines.
Error out on other machine types or if the size was specified
but the pcihost device lacks 'pci-hole64-size' property.
https://bugzilla.redhat.com/show_bug.cgi?id=990418
The ftp protocol is already recognized by qemu/KVM so add this support to
libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftp' name='/url/path'>
<host name='host.name' port='21'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
QEMU/KVM already allows a HTTP URL for the cdrom ISO image so add this support
to libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='http' name='/url/path'>
<host name='host.name' port='80'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
If there's no hard_limit set and domain uses VFIO we still must lock
the guest memory (prerequisite from qemu). Hence, we should compute
the amount to be locked from max_balloon.
When cpu hotplug fails without reporting an error, we would fail the
command but update the count of vCPUs anyways.
Commit 761fc48136 fixed the case when CPU
hot-unplug failed silently, but forgot to fix up the value in this case.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1000357
The virDomainOpenGraphics method accepts a UNIX socket FD from
the client app. It must set the label on this FD otherwise QEMU
will be prevented from receiving it with recvmsg.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If user requested multiqueue networking, beside multiple /dev/tap and
/dev/vhost-net openings, we forgot to pass mq=on onto the -device
virtio-net-pci command line. This is advised at:
http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
Re-arrange the code so that the returned bitmap is always initialized to
NULL even on early failures and return an error message as some callers
are already expecting it. Fix up the rest not to shadow the error.
https://bugzilla.redhat.com/show_bug.cgi?id=822052
When doing a live migration, if the destination fails for any
reason after the point in which files should be labeled, then
the cleanup of the destination would restore the labels to their
defaults, even though the source is still trying to continue
running with the image open. Bug 822052 mentioned one source
of live migration failure - a mismatch in SELinux virt_use_nfs
settings (on for source, off for destination); but I found other
situations that would also trigger it (for example, having a
graphics device tied to port 5999 on the source, and a different
domain on the destination already using that port, so that the
destination cannot reuse the port).
In short, just as cleanup of the source on a successful migration
must not relabel files (because the destination would be crippled
by the relabel), cleanup of the destination on a failed migration
must not relabel files (because the source would be crippled).
* src/qemu/qemu_process.c (qemuProcessStart): Set flag to avoid
label restoration when cleaning up on failed migration.
Signed-off-by: Eric Blake <eblake@redhat.com>
Each of the modules handled reporting error messages from the secret fetching
slightly differently with respect to the error. Provide a similar message
for each error case and provide as much data as possible.
Following XML would fail :
<disk type='network' device='lun'>
<driver name='qemu' type='raw'/>
<source protocol='iscsi' name='iqn.2013-07.com.example:iscsi/1'>
<host name='example.com' port='3260'/>
</source>
<target dev='sda' bus='scsi'/>
</disk>
With the message:
error: Failed to start domain iscsilun
error: Unable to get device ID 'iqn.2013-07.com.example:iscsi/1': No such fi
Cause was commit id '1f49b05a' which added 'virDomainDiskSourceIsBlockType'
If there's no hard_limit set and domain uses VFIO we still must lock the
guest memory (prerequisite from qemu). Hence, we should compute the
amount to be locked from max_balloon.
Since 16bcb3 we have a regression. The hard_limit is set
unconditionally. By default the limit is zero. Hence, if user hasn't
configured any, we set the zero in cgroup subsystem making the kernel
kill the corresponding qemu process immediately. The proper fix is to
set hard_limit iff user has configured any.
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
Currently the virConnectBaselineCPU API does not expose the CPU features
that are part of the CPU's model. This patch adds a new flag,
VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES, that causes the API to explicitly
list all features that are part of that model.
Signed-off-by: Don Dugger <donald.d.dugger@intel.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Hotplugging a single SCSI device works, but adding additional ones
result in an error from QEMU:
[root@gpok197 ~]# virsh attach-device guest01 blah.xml
Device attached successfully
[root@gpok197 ~]# virsh attach-device guest01 blah2.xml
error: Failed to attach device from blah2.xml
error: internal error unable to execute QEMU command 'device_add': Duplicate ID 'hostdev0' for device
The hostdev ID that is created is always set to zero, regardless
of the contents of the XML. Changing the index in the hotplug case
to a negative one so the next available index is used.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Go through disks of guest, if one disk doesn't exist or its backing
chain is broken, with 'optional' startupPolicy, for CDROM and Floppy
we only discard its source path definition in xml, for disks we drop
it from disk list and free it.
This patch addresses two concerns with the error reporting when an
incompatible PCI address is specified for a device:
1) It wasn't always apparent which device had the problem. With this
patch applied, any error about an incompatible address will always
contain the full address as given in the config, so it will be easier
to determine which device's config aused the problem.
2) In some cases when the problem came from bad config, the error
message was erroneously classified as VIR_ERR_INTERNAL_ERROR. With
this patch applied, the same error message will be changed to indicate
either "internal" or "xml" error depending on whether the address came
from the config, or was automatically generated by libvirt.
Note that in the case of "internal" (due to bad auto-generation)
errors, the PCI address won't be of much use in finding the location
in config to change (because it was automatically generated). Of
course that makes perfect sense, but still the address could provide a
clue about a bug in libvirt attempting to use a type of pci bus that
doesn't have its flags set correctly (or something similar). In other
words, it's not perfect, but it is definitely better.
q35 machines have an implicit ahci (sata) controller at 00:1F.2 which
has no "id" associated with it. For this reason, we can't refer to it
as "ahci0". Instead, we don't give an id on the commandline, which
qemu interprets as "use the first ahci controller". We then need to
specify the unit with "unit=%d" rather than adding it onto the bus
arg.
https://bugzilla.redhat.com/show_bug.cgi?id=979477
Since 1.0.3 we are using the new way to copy non shared storage during
migration (the NBD way). However, whether the new or old way is used is
not controllable by user but unconditionally turned on if both sides of
migration support it. Moreover, the implementation is not complete: the
combination for VIR_MIGRATE_TUNNELLED flag is missing (as we need to
open new port on the destination) in which case we just error out. This
is a deadly combination: not letting users choose their destiny and
erroring out. We should not do that but VIR_WARN and turn the NBD off
instead.
We had been setting the device alias in the devinceinfo for pci
controllers to "pci%u", but then hardcoding "pci.%u" when creating the
device address for other devices using that pci bus. This all worked
just fine until we encountered the built-in "pcie.0" bus (the PCIe
root complex) in Q35 machines.
In order to create the correct commandline for this one case, this
patch:
1) sets the alias for PCI controllers correctly, to "pci.%u" (or
"pcie.%u" for the pcie-root controller)
2) eliminates the hardcoded "pci.%u" for pci controllers when
generatuing device address strings, and instead uses the controller's
alias.
3) plumbs a pointer to the virDomainDef all the way down to
qemuBuildDeviceAddressStr. This was necessary in order to make the
aliase of the controller *used by a device* available (previously
qemuBuildDeviceAddressStr only had the deviceinfo of the device
itself, *not* of the controller it was connecting to). This made for a
larger than desired diff, but at least in the future we won't have to
do it again, since all the information we could possibly ever need for
future enhancements is in the virDomainDef. (right?)
This should be done for *all* controllers, but for now we just do it
in the case of PCI controllers, to reduce the likelyhood of
regression.
This patch adds in special handling for a few devices that need to be
treated differently for q35 domains:
usb - there is no implicit/default usb controller for the q35
machinetype. This is done because normally the default usb controller
is added to a domain by just adding "-usb" to the qemu commandline,
and it's assumed that this will add a single piix3 usb1 controller at
slot 1 function 2. That's not what happens when the machinetype is
q35, though. Instead, adding -usb to the commandline adds 3 usb
(version 2) controllers to the domain at slot 0x1D.{1,2,7}. Rather
than having
<controller type='usb' index='0'/>
translate into 3 separate devices on the PCI bus, it's cleaner to not
automatically add a default usb device; one can always be added
explicitly if desired. Or we may decide that on q35 machines, 3 usb
controllers will be automatically added when none is given. But for
this initial commit, at least we aren't locking ourselves into
something we later won't want.
video - qemu always initializes the primary video device immediately
after any integrated devices for the machinetype. Unless instructed
otherwise (by using "-device vga..." instead of "-vga" which libvirt
uses in many cases to work around deficiencies and bugs in various
qemu versions) qemu will always pick the first unused slot. In the
case of the "pc" machinetype and its derivatives, this is always slot
2, but on q35 machinetypes, the first free slot is slot 1 (since the
q35's integrated peripheral devices are placed in other slots,
e.g. slot 0x1f). In order to make the PCI address of the video device
predictable, that slot (1 or 2, depending on machinetype) is reserved
even when no video device has been specified.
sata - a q35 machine always has a sata controller implicitly added at
slot 0x1F, function 2. There is no way to avoid this controller, so we
always add it. Note that the xml2xml tests for the pcie-root and q35
cases were changed to use DO_TEST_DIFFERENT() so that we can check for
the sata controller being automatically added. This is especially
important because we can't check for it in the xml2argv output (it has
no effect on that output since it's an implicit device).
ide - q35 has no ide controllers.
isa and smbus controllers - these two are always present in a q35 (at
slot 0x1F functions 0 and 3) but we have no way of modelling them in
our config. We do need to reserve those functions so that the user
doesn't attempt to put anything else there though. (note that the "pc"
machine type also has an ISA controller, which we also ignore).
This PCI controller, named "dmi-to-pci-bridge" in the libvirt config,
and implemented with qemu's "i82801b11-bridge" device, connects to a
PCI Express slot (e.g. one of the slots provided by the pcie-root
controller, aka "pcie.0" on the qemu commandline), and provides 31
*non-hot-pluggable* PCI (*not* PCIe) slots, numbered 1-31.
Any time a machine is defined which has a pcie-root controller
(i.e. any q35-based machinetype), libvirt will automatically add a
dmi-to-pci-bridge controller if one doesn't exist, and also add a
pci-bridge controller. The reasoning here is that any useful domain
will have either an immediate (startup time) or eventual (subsequent
hot-plug) need for a standard PCI slot; since the pcie-root controller
only provides PCIe slots, we need to connect a dmi-to-pci-bridge
controller to it in order to get a non-hot-plug PCI slot that we can
then use to connect a pci-bridge - the slots provided by the
pci-bridge will be both standard PCI and hot-pluggable.
Since pci-bridge devices themselves can not be hot-plugged into a
running system (although you can hot-plug other devices into a
pci-bridge's slots), any new pci-bridge controller that is added can
(and will) be plugged into the dmi-to-pci-bridge as long as it has
empty slots available.
This patch is also changing the qemuxml2xml-pcie test from a "DO_TEST"
to a "DO_DIFFERENT_TEST". This is so that the "before" xml can omit
the automatically added dmi-to-pci-bridge and pci-bridge devices, and
the "after" xml can include it - this way we are testing if libvirt is
properly adding these devices.
This controller is implicit on q35 machinetypes. It provides 31 PCIe
(*not* PCI) slots as controller 0.
Currently there are no devices that can connect to pcie-root, and no
implicit pci controller on a q35 machine, so q35 is still
unusable. For a usable q35 system, we need to add a
"dmi-to-pci-bridge" pci controller, which can connect to pcie-root,
and provides standard pci slots that can be used to connect other
devices.
Previous refactoring of the guest PCI address reservation/allocation
code allowed for slot types other than basic PCI (e.g. PCI express,
non-hotpluggable slots, etc) but would not auto-allocate a slot for a
device that required any type other than a basic hot-pluggable
PCI slot.
This patch refactors the code to be aware of different slot types
during auto-allocation of addresses as well - as long as there is an
empty slot of the required type, it will be found and used.
The piece that *wasn't* added is that we don't auto-create a new PCI
bus when needed for anything except basic PCI devices. This is because
there are multiple different types of controllers that can provide,
for example, a PCI express slot (in addition to the pcie-root
controller, these can also be found on a "root-port" or on a
"downstream-switch-port"). Since we currently don't support any PCIe
devices (except pending support for dmi-to-pci-bridge), we can defer
any decision on what to do about this.