If user requested multiqueue networking, beside multiple /dev/tap and
/dev/vhost-net openings, we forgot to pass mq=on onto the -device
virtio-net-pci command line. This is advised at:
http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
This function is to guess the correct limit for maximal memory
usage by qemu for given domain. This can never be guessed
correctly, not to mention all the pains and sleepless nights this
code has caused. Once somebody discovers algorithm to solve the
Halting Problem, we can compute the limit algorithmically. But
till then, this code should never see the light of the release
again.
Hotplugging a single SCSI device works, but adding additional ones
result in an error from QEMU:
[root@gpok197 ~]# virsh attach-device guest01 blah.xml
Device attached successfully
[root@gpok197 ~]# virsh attach-device guest01 blah2.xml
error: Failed to attach device from blah2.xml
error: internal error unable to execute QEMU command 'device_add': Duplicate ID 'hostdev0' for device
The hostdev ID that is created is always set to zero, regardless
of the contents of the XML. Changing the index in the hotplug case
to a negative one so the next available index is used.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
We had been setting the device alias in the devinceinfo for pci
controllers to "pci%u", but then hardcoding "pci.%u" when creating the
device address for other devices using that pci bus. This all worked
just fine until we encountered the built-in "pcie.0" bus (the PCIe
root complex) in Q35 machines.
In order to create the correct commandline for this one case, this
patch:
1) sets the alias for PCI controllers correctly, to "pci.%u" (or
"pcie.%u" for the pcie-root controller)
2) eliminates the hardcoded "pci.%u" for pci controllers when
generatuing device address strings, and instead uses the controller's
alias.
3) plumbs a pointer to the virDomainDef all the way down to
qemuBuildDeviceAddressStr. This was necessary in order to make the
aliase of the controller *used by a device* available (previously
qemuBuildDeviceAddressStr only had the deviceinfo of the device
itself, *not* of the controller it was connecting to). This made for a
larger than desired diff, but at least in the future we won't have to
do it again, since all the information we could possibly ever need for
future enhancements is in the virDomainDef. (right?)
This should be done for *all* controllers, but for now we just do it
in the case of PCI controllers, to reduce the likelyhood of
regression.
Introduced in commit 24b08219; compilation on RHEL 6.4 complained:
qemu/qemu_hotplug.c: In function 'qemuDomainAttachChrDevice':
qemu/qemu_hotplug.c:1257: error: declaration of 'remove' shadows a global declaration [-Wshadow]
/usr/include/stdio.h:177: error: shadowed declaration is here [-Wshadow]
* src/qemu/qemu_hotplug.c (qemuDomainAttachChrDevice): Avoid the
name 'remove'.
Signed-off-by: Eric Blake <eblake@redhat.com>
There are two levels on which a device may be hotplugged: config
and live. The config level requires just an insert or remove from
internal domain definition structure, which is exactly what this
patch does. There is currently no implementation for a chardev
update action, as there's not much to be updated. But more
importantly, the only thing that can be updated is path or socket
address by which chardevs are distinguished. So the update action
is currently not supported.
If an error occurs during qemuDomainAttachNetDevice after the macvtap
was created in qemuPhysIfaceConnect, the macvtap device gets left behind.
This patch adds code to the cleanup routine to delete the macvtap.
Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
I recently patches the callers to virPCIDeviceReset() to not call it
if the current driver for a device was vfio-pci (since that driver
will always reset the device itself when appropriate. At the time, Dan
Berrange suggested that I could instead modify virPCIDeviceReset
to check the currently bound driver for the device, and decide
for itself whether or not to go ahead with the reset.
This patch removes the previously added checks, and replaces them with
a check down in virPCIDeviceReset(), as suggested.
The functional difference here is that previously we were deciding
based on either the hostdev configuration or the value of
stubDriverName in the virPCIDevice object, but now we are actually
comparing to the "driver" link in the device's sysfs entry
directly. In practice, both should be the same.
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
I just learned that VFIO resets PCI devices when they are assigned to
guests / returned to the host, so it is redundant for libvirt to reset
the devices. This patch inhibits calling virPCIDeviceReset to devices
that will be/were assigned using VFIO.
Commit 752596b5 broke the build with -Werror
qemu/qemu_hotplug.c: In function 'qemuDomainChangeGraphics':
qemu/qemu_hotplug.c:1980:39: error: declaration of 'listen' shadows a
global declaration [-Werror=shadow]
Fix with s/listen/newlisten/
Currently, we have a bug when updating a graphics device. A graphics device can
have a listen address set. This address is either defined by user (in which case
it's type is VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_ADDRESS) or it can be inherited
from a network (in which case it's type is
VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_NETWORK). However, in both cases we have a
listen address to process (e.g. during migration, as I've tried to fix in
7f15ebc7).
Later, when a user tries to update the graphics device (e.g. set a password),
we check if listen addresses match the original as qemu doesn't know how to
change listen address yet. Hence, users are required to not change the listen
address. The implementation then just dumps listen addresses and compare them.
Previously, while dumping the listen addresses, NULL was returned for NETWORK.
After my patch, this is no longer true, and we get a listen address for olddev
even if it is a type of NETWORK. So we have a real string on one side, the NULL
from user's XML on the other side and hence we think user wants to change the
listen address and we refuse it.
Therefore, we must take the type of listen address into account as well.
If we are just ejecting media, ret == -1 even after the retry loop
determines that the tray is open, as requested. This means media
disconnect always report's error.
Fix it, and fix some other mini issues:
- Don't overwrite the 'eject' error message if the retry loop fails
- Move the retries decrement inside the loop, otherwise the final loop
might succeed, yet retries == 0 and we will raise error
- Setting ret = -1 in the disk->src check is unneeded
- Fix comment typos
cc: mprivozn@redhat.com
In order to learn libvirt multiqueue several things must be done:
1) The '/dev/net/tun' device needs to be opened multiple times with
IFF_MULTI_QUEUE flag passed to ioctl(fd, TUNSETIFF, &ifr);
2) Similarly, '/dev/vhost-net' must be opened as many times as in 1)
in order to keep 1:1 ratio recommended by qemu and kernel folks.
3) The command line construction code needs to switch from 'fd=X' to
'fds=X:Y:...:Z' and from 'vhostfd=X' to 'vhostfds=X:Y:...:Z'.
4) The monitor handling code needs to learn to pass multiple FDs.
In 84c59ffa I've tried to fix changing ejectable media process. The
process should go like this:
1) we need to call 'eject' on the monitor
2) we should wait for 'DEVICE_TRAY_MOVED' event
3) now we can issue 'change' command
However, while waiting in step 2) the domain monitor was locked. So
even if qemu reported the desired event, the proper callback was not
called immediately. The monitor handling code needs to lock the
monitor in order to read the event. So that's the first lock we must
not hold while waiting. The second one is the domain lock. When
monitor handling code reads an event, the appropriate callback is
called then. The first thing that each callback does is locking the
corresponding domain as a domain or its device is about to change
state. So we need to unlock both monitor and VM lock. Well, holding
any lock while sleep()-ing is not the best thing to do anyway.
Since 0d70656afded, it starts to access the sysfs files to build
the qemu command line (by virSCSIDeviceGetSgName, which is to find
out the scsi generic device name by adpater🚌target:unit), there
is no way to work around, qemu wants to see the scsi generic device
like "/dev/sg6" anyway.
And there might be other places which need to access sysfs files
when building qemu command line in future.
Instead of increasing the arguments of qemuBuildCommandLine, this
introduces a new callback for qemuBuildCommandLine, and thus tests
can register their own callbacks for sysfs test input files accessing.
* src/qemu/qemu_command.h: (New callback struct
qemuBuildCommandLineCallbacks;
extern buildCommandLineCallbacks)
* src/qemu/qemu_command.c: (wire up the callback struct)
* src/qemu/qemu_driver.c: (Use the new syntax of qemuBuildCommandLine)
* src/qemu/qemu_hotplug.c: Likewise
* src/qemu/qemu_process.c: Likewise
* tests/testutilsqemu.[ch]: (Helper testSCSIDeviceGetSgName;
callback struct testCallbacks;)
* tests/qemuxml2argvtest.c: (Use testCallbacks)
* src/tests/qemuxmlnstest.c: (Like above)
This adds both attachment and detachment support for scsi host
device.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat>
It's better to put the usb related codes into qemuDomainAttachHostUsbDevice
instead of qemuDomainAttachHostDevice.
And in the old qemuDomainAttachHostDevice, just stealing the "usb" from
driver->activeUsbHostdevs leaks the memory.
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
The USB-specific cgroup setup had been inserted inline in
qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a
common cgroup setup function called for all hostdevs, so it makes sens
to put the usb-specific setup there and just rely on that function
being called.
The one thing I'm uncertain of here (and a reason for not pushing
until after release) is that previously hostdev->missing was checked
only when starting a domain (and cgroup setup for the device skipped
if missing was true), but with this consolidation, it is now checked
in the case of hotplug as well. I don't know if this will have any
practical effect (does it make sense to hotplug a "missing" usb
device?)
PCIO device assignment using VFIO requires read/write access by the
qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the
VFIO group number that the assigned device belongs to (and can be
found with the function virPCIDeviceGetVFIOGroupDev)
/dev/vfio/vfio can be accessible to any guest without danger
(according to vfio developers), so it is added to the static ACL.
The group device must be dynamically added to the cgroup ACL for each
vfio hostdev in two places:
1) for any devices in the persistent config when the domain is started
(done during qemuSetupCgroup())
2) at device attach time for any hotplug devices (done in
qemuDomainAttachHostDevice)
The group device must be removed from the ACL when a device it
"hot-unplugged" (in qemuDomainDetachHostDevice())
Note that USB devices are already doing their own cgroup setup and
teardown in the hostdev-usb specific function. I chose to make the new
functions generic and call them in a common location though. We can
then move the USB-specific code (which is duplicated in two locations)
to this single location. I'll be posting a followup patch to do that.
This isn't strictly speaking a bugfix, but I realized I'd gotten a bit
too verbose when I chose the names for
VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_*. This shortens them all a bit.
<source type='bridge'> uses a helper application to do the necessary
TUN/TAP setup to use an existing network bridge, thus letting
unprivileged users use TUN/TAP interfaces.
However, libvirt should be preventing QEMU from running any setuid
programs at all, which would include this helper program. From
a security POV, any setuid helper needs to be run by libvirtd itself,
not QEMU.
This is what this patch does. libvirt now invokes the setuid helper,
gets the TAP fd and then passes it to QEMU in the normal manner.
The path to the helper is specified in qemu.conf.
As a small advantage, this adds a <target dev='tap0'/> element to the
XML of an active domain using <interface type='bridge'>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VFIO requires all of the guest's memory and IO space to be lockable in
RAM. The domain's max_balloon is the maximum amount of memory the
domain can have (in KiB). We add a generous 1GiB to that for IO space
(still much better than KVM device assignment, where the KVM module
actually *ignores* the process limits and locks everything anyway),
and convert from KiB to bytes.
In the case of hotplug, we are changing the limit for the already
existing qemu process (prlimit() is used under the hood), and for
regular commandline additions of vfio devices, we schedule a call to
setrlimit() that will happen after the qemu process is forked.
The device option for vfio-pci is nearly identical to that for
pci-assign - only the configfd parameter isn't supported (or needed).
Checking for presence of the bootindex parameter is done separately
from constructing the commandline, similar to how it is done for
pci-assign.
This patch contains tests to check for proper commandline
construction. It also includes tests for parser-formatter-parser
roundtrips (xml2xml), because those tests use the same data files, and
would have failed had they been included before now.
qemu: xml/args tests for VFIO hostdev and <interface type='hostdev'/>
These should be squashed in with the patch that adds commandline
handling of vfio (they would fail at any earlier time).
There will soon be other items related to pci hostdevs that need to be
in the same part of the hostdevsubsys union as the pci address (which
is currently a single member called "pci". This patch replaces the
single member named pci with a struct named pci that contains a single
member named "addr".
Instead of calling virCgroupForDomain every time we need
the virCgrouPtr instance, just do it once at Vm startup
and cache a reference to the object in qemuDomainObjPrivatePtr
until shutdown of the VM. Removing the virCgroupPtr from
the QEMU driver state also means we don't have stale mount
info, if someone mounts the cgroups filesystem after libvirtd
has been started
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
f946462e14ac036357b7c11ce5c23f94a3ee4e49 changed behavior by settings
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI upfront. If we do so before invoking
qemuDomainPCIAddressEnsureAddr we merely try to set the PCI slot via
qemuDomainPCIAddressReserveSlot instead reserving a new address via
qemuDomainPCIAddressSetNextAddr which fails with
$ ~/run-tck-test domain/200-disk-hotplug.t
./scripts/domain/200-disk-hotplug.t .. # Creating a new transient domain
./scripts/domain/200-disk-hotplug.t .. 1/5 # Attaching the new disk /var/lib/jenkins/jobs/libvirt-tck-build/workspace/scratchdir/200-disk-hotplug/extra.img
# Failed test 'disk has been attached'
# at ./scripts/domain/200-disk-hotplug.t line 67.
# died: Sys::Virt::Error (libvirt error code: 1, message: internal error unable to reserve PCI address 0:0:0.0
# )
The VIR_ERR_NO_SUPPORT error code is reserved for cases where an
API is not implemented in a driver. It definitely should not be
used when an API execution fails due to unsupported operation.
We didn't yet expose the virtio device attach and detach functionality
for s390 domains as the device hotplug was very limited with the old
virtio-s390 bus. With the CCW bus there's full hotplug support for
virtio devices in QEMU, so we are adding this to libvirt too.
Since the virtio hotplug isn't limited to PCI anymore, we change the
function names from xxxPCIyyy to xxxVirtioyyy, where we handle all
three virtio bus types.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>