This eliminates the misleading error message that was being logged
when a vfio hostdev hotplug failed:
error: unable to set user and group to '107:107' on '/dev/vfio/22':
No such file or directory
as documented in:
https://bugzilla.redhat.com/show_bug.cgi?id=1035490
Commit ee414b5d (pushed as a fix for Bug 1016511 and part of Bug
1025108) replaced the single call to
virSecurityManagerSetHostdevLabel() in qemuDomainAttachHostDevice()
with individual calls to that same function in each
device-type-specific attach function (for PCI, USB, and SCSI). It also
added a corresponding call to virSecurityManagerRestoreHostdevLabel()
in the error handling of the device-type-specific functions, but
forgot to remove the common call to that from
qemuDomainAttachHostDevice() - this resulted in a duplicate call to
virSecurityManagerRestoreHostdevLabel(), with the second occurrence
being after (e.g.) a PCI device has already been re-attached to the
host driver, thus destroying some of the device nodes / links that we
then attempted to re-label (e.f. /dev/vfio/22) and generating an error
log that obscured the original error.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1035490
virProcessSetMaxMemLock() (which is a wrapper over prlimit(3)) expects
the memory size in bytes, but libvirt's domain definition (which was
being used by qemuDomainAttachHostPciDevice()) stores all memory
tuning parameters in KiB. This was being accounted for when setting
MaxMemLock at domain startup time (so cold-plugged devices would
work), but not for hotplug.
This patch simplifies the few lines that call
virProcessSetMemMaxLock(), and multiply the amount * 1024 so that
we're locking the correct amount of memory.
What remains a mystery to me is why hot-plug of a managed='no' device
would succeed (at least on my system) while managed='yes' would
fail. I guess in one case the memory was coincidentally already
resident and in the other it wasn't.
We were unconditionally removing the device from the host list, when it
should only be done on error.
This fixes USB collision detection when hotplugging the same device to
two guests.
If we hit a collision, we free the USB device while it is still part
of our temporary USBDeviceList. When the list is unref'd, the device
is free'd again.
Make the initial device freeing dependent on whether it is present
in the temporary list or not.
Similar to what Jiri did for cgroup setup/teardown in 05e149f94, push
it all into the device handler functions so we can do the necessary prep
work before claiming the device.
This also fixes hotplugging USB devices by product/vendor (virt-manager's
default behavior):
https://bugzilla.redhat.com/show_bug.cgi?id=1016511
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1029732
The BZ asked for the capability to change the number of queues used by
a virtio-net device while the device is in use. Because the number of
queues can only be set at the time the device is created, that isn't
possible. However, libvirt also shouldn't be silently reporting
success when someone tries to change the number of queues. So this
patch flags that as an error (just as attempts to change any of the
other virtio-specific parameters already do).
If a SCSI hostdev is included in an initial domain XML, without a
corresponding controller statement, one is created silently when the
guest is booted.
When hotplugging a SCSI hostdev, a presumption is that the controller
is already present in the domain either from the original XML, or via
an earlier hotplug.
[root@xxxxxxxx ~]# cat disk.xml
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='3' unit='1088438288'/>
</source>
</hostdev>
[root@xxxxxxxx ~]# virsh attach-device guest01 disk.xml
error: Failed to attach device from disk.xml
error: internal error: unable to execute QEMU command 'device_add': Bus 'scsi0.0' not found
Since the infrastructure is in place, we can also create a controller
silently for use by the hotplugged hostdev device.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
For systems without a PCI bus, attaching a SCSI controller fails:
[root@xxxxxxxx ~]# cat controller.xml
<controller type='scsi' model='virtio-scsi' index='0' />
[root@xxxxxxxx ~]# virsh attach-device guest01 controller.xml
error: Failed to attach device from controller.xml
error: XML error: No PCI buses available
A similar problem occurs with the detach of a controller:
[root@xxxxxxxx ~]# virsh detach-device guest01 controller.xml
error: Failed to detach device from controller.xml
error: operation failed: controller scsi:0 not found
The qemuDomainXXtachPciControllerDevice routines made assumptions
that any caller had a PCI bus. These routines now selectively calls
PCI functions where necessary, and assigns the device information
type to one appropriate for the bus in use.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
For attach/detach of controller devices, we rename the functions to
remove 'PCI' from their title. The actual separation of PCI-specific
operations will be handled in the next patch.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1025108
So far qemuSetupHostdevCGroup was called very early during hotplug, even
before we knew the device we were about to hotplug was actually
available. By calling the function later, we make sure QEMU won't be
allowed to access devices used by other domains.
Another important effect of this change is that hopluging USB devices
specified by vendor and product (but not by their USB address) works
again. This was broken since v1.0.5-171-g7d763ac, when the call to
qemuFindHostdevUSBDevice was moved after the call to
qemuSetupHostdevCGroup, which then used an uninitialized USB address.
This patch moves some code in the qemuDomainAttachSCSIDisk
function. The check for the existence of a PCI address assigned
to the SCSI controller was moved in order to be executed only
when needed. The PCI address of a controller is not necessary
if QEMU_CAPS_DEVICE is supported.
This fixes issues with the hotplug of SCSI disks on pseries guests.
This patch (and the two patches that precede it) resolve:
https://bugzilla.redhat.com/show_bug.cgi?id=1005682
When libvirt was changed to delay the final cleanup of device removal
until the qemu process had signaled it with a DEVICE_DELETED event for
that device, the hostdev removal function
(qemuDomainRemoveHostDevice()) was written to properly handle the
removal of a hostdev that was actually an SRIOV virtual function
(defined with <interface type='hostdev'>). However, the function used
to search for a device matching the alias name provided in the
DEVICE_DELETED message (virDomainDefFindDevice()) would search through
the list of netdevs before hostdevs, so qemuDomainRemoveHostDevice()
was never called; instead the netdev function,
qemuDomainRemoveNetDevice() (which *doesn't* properly cleanup after
removal of <interface type='hostdev'>), was called.
(As a reminder - each <interface type='hostdev'> results in a
virDomainNetDef which contains a virDomainHostdevDef having a parent
type of VIR_DOMAIN_DEVICE_NET, and parent.data.net pointing back to
the virDomainNetDef; both Defs point to the same device info object
(and the info contains the device's "alias", which is used by qemu to
identify the device). The virDomainHostdevDef is added to the domain's
hostdevs list *and* the virDomainNetDef is added to the domain's nets
list, so searching either list for a particular alias will yield a
positive result.)
This function modifies the qemuDomainRemoveNetDevice() to short
circuit itself and call qemu DomainRemoveHostDevice() instead when the
actual device is a VIR_DOMAIN_NET_TYPE_HOSTDEV (similar logic to what
is done in the higher level qemuDomainDetachNetDevice())
Note that even if virDomainDefFindDevice() changes in the future so
that it finds the hostdev entry first, the current code will continue
to work properly.
This function was called in three places, and in each the call was
qualified by a slightly different conditional. In reality, this
function should only be called for a hostdev if all of the following
are true:
1) mode='subsystem'
2) type='pci'
3) there is a parent device definition which is an <interface>
(VIR_DOMAIN_DEVICE_NET)
We can simplify the callers and make them more consistent by checking
these conditions at the top ov qemuDomainHostdevNetConfigRestore and
returning 0 if one of them isn't satisfied.
The location of the call to qemuDomainHostdevNetConfigRestore() has
also been changed in the hot-plug case - it is moved into the caller
of its previous location (i.e. from qemuDomainRemovePCIHostDevice() to
qemuDomainRemoveHostDevice()). This was done to be more consistent
about which functions pay attention to whether or not this is one of
the special <interface> hostdevs or just a normal hostdev -
qemuDomainRemoveHostDevice() already contained a call to
networkReleaseActualDevice() and virDomainNetDefFree(), so it makes
sense for it to also handle the resetting of the device's MAC address
and vlan tag (which is what's done by
qemuDomainHostdevNetConfigRestore()).
Prefer using VFIO (if available) to the legacy KVM device passthrough.
With this patch a PCI passthrough device without the driver configured
will be started with VFIO if it's available on the host. If not legacy
KVM passthrough is checked and error is reported if it's not available.
The qemuDomainChangeNet() is called when 'virsh update-device' is
invoked on a NIC. Currently, we fail to update the QoS even though
we have routines for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The return value of virDomainControllerFind >=0 means that
the specific controller was found.
But some functions invoke it and treat 0 as not found.
This patch fix these incorrect invocation.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
When using a <interface type="network"> that points to a network with
hostdev forwarding mode a hostdev alias is created for the network. This
allias is inserted into the hostdev list, but is backed with a part of
the network object that it is connected to.
When a VM is being stopped qemuProcessStop() calls
networkReleaseActualDevice() which eventually frees the memory for the
hostdev object. Afterwards when the domain definition is being freed by
virDomainDefFree() an invalid pointer is accessed by
virDomainHostdevDefFree() and may cause a crash of the daemon.
This patch removes the entry in the hostdev list before freeing the
depending memory to avoid this issue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1000973
If there's no hard_limit set and domain uses VFIO we still must lock
the guest memory (prerequisite from qemu). Hence, we should compute
the amount to be locked from max_balloon.
If user requested multiqueue networking, beside multiple /dev/tap and
/dev/vhost-net openings, we forgot to pass mq=on onto the -device
virtio-net-pci command line. This is advised at:
http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
Hotplugging a single SCSI device works, but adding additional ones
result in an error from QEMU:
[root@gpok197 ~]# virsh attach-device guest01 blah.xml
Device attached successfully
[root@gpok197 ~]# virsh attach-device guest01 blah2.xml
error: Failed to attach device from blah2.xml
error: internal error unable to execute QEMU command 'device_add': Duplicate ID 'hostdev0' for device
The hostdev ID that is created is always set to zero, regardless
of the contents of the XML. Changing the index in the hotplug case
to a negative one so the next available index is used.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
We had been setting the device alias in the devinceinfo for pci
controllers to "pci%u", but then hardcoding "pci.%u" when creating the
device address for other devices using that pci bus. This all worked
just fine until we encountered the built-in "pcie.0" bus (the PCIe
root complex) in Q35 machines.
In order to create the correct commandline for this one case, this
patch:
1) sets the alias for PCI controllers correctly, to "pci.%u" (or
"pcie.%u" for the pcie-root controller)
2) eliminates the hardcoded "pci.%u" for pci controllers when
generatuing device address strings, and instead uses the controller's
alias.
3) plumbs a pointer to the virDomainDef all the way down to
qemuBuildDeviceAddressStr. This was necessary in order to make the
aliase of the controller *used by a device* available (previously
qemuBuildDeviceAddressStr only had the deviceinfo of the device
itself, *not* of the controller it was connecting to). This made for a
larger than desired diff, but at least in the future we won't have to
do it again, since all the information we could possibly ever need for
future enhancements is in the virDomainDef. (right?)
This should be done for *all* controllers, but for now we just do it
in the case of PCI controllers, to reduce the likelyhood of
regression.
Introduced in commit 24b08219; compilation on RHEL 6.4 complained:
qemu/qemu_hotplug.c: In function 'qemuDomainAttachChrDevice':
qemu/qemu_hotplug.c:1257: error: declaration of 'remove' shadows a global declaration [-Wshadow]
/usr/include/stdio.h:177: error: shadowed declaration is here [-Wshadow]
* src/qemu/qemu_hotplug.c (qemuDomainAttachChrDevice): Avoid the
name 'remove'.
Signed-off-by: Eric Blake <eblake@redhat.com>
There are two levels on which a device may be hotplugged: config
and live. The config level requires just an insert or remove from
internal domain definition structure, which is exactly what this
patch does. There is currently no implementation for a chardev
update action, as there's not much to be updated. But more
importantly, the only thing that can be updated is path or socket
address by which chardevs are distinguished. So the update action
is currently not supported.
If an error occurs during qemuDomainAttachNetDevice after the macvtap
was created in qemuPhysIfaceConnect, the macvtap device gets left behind.
This patch adds code to the cleanup routine to delete the macvtap.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
I recently patches the callers to virPCIDeviceReset() to not call it
if the current driver for a device was vfio-pci (since that driver
will always reset the device itself when appropriate. At the time, Dan
Berrange suggested that I could instead modify virPCIDeviceReset
to check the currently bound driver for the device, and decide
for itself whether or not to go ahead with the reset.
This patch removes the previously added checks, and replaces them with
a check down in virPCIDeviceReset(), as suggested.
The functional difference here is that previously we were deciding
based on either the hostdev configuration or the value of
stubDriverName in the virPCIDevice object, but now we are actually
comparing to the "driver" link in the device's sysfs entry
directly. In practice, both should be the same.
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
I just learned that VFIO resets PCI devices when they are assigned to
guests / returned to the host, so it is redundant for libvirt to reset
the devices. This patch inhibits calling virPCIDeviceReset to devices
that will be/were assigned using VFIO.