This patch adds new functions for reservation, assignment and release
to handle the uid/fid. If the uid/fid is defined in the domain XML,
they will be reserved directly in the collecting phase. If any of them
is not defined, we will find out an available value for them from the
zPCI address hashtable, and reserve them. For the hotplug case there
might not be a zPCI definition. So allocate and reserve uid/fid the
case. Assign if needed and reserve uid/fid for the defined case.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This patch provides a caching mechanism for the device address
extensions uid and fid on S390. For efficient sparse address allocation,
we introduce two hash tables for uid/fid which hold the address set
information per domain. Also in order to improve performance of
searching available value, we introduce our own callbacks for the two
hashtables. In this way, uid/fid is saved in hash key and hash value
could be any non-NULL pointer due to no operation on hash value. That is
also the reason why we don't introduce hash value free callback.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This patch introduces PCI address extension flag for virDomainDeviceInfo
and virPCIDeviceAddress. The extension flag in virDomainDeviceInfo is
used internally during calculating PCI extension flag. The one in
virPCIDeviceAddress is the duplicate to indicate extension address is
being used. Currently only zPCI extension address is introduced to deal
with 'uid' and 'fid' on the S390 platform.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
On aarch64, lauch vm with the follow configuration:
<interface type="hostdev" managed="yes">
<mac address="fa:16:3e:14:41:00"/>
<source>
<address type="pci" domain="0x0000" bus="0x01" slot="0x0b" function="0x2"/>
</source>
</interface>
libvirtd will crash when accessing net->model.
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Adjusting domain format documentation, adding device address
support and adding command line generation for vfio-ap.
Since only one mediated hostdev with model vfio-ap is supported a check
disallows to define domains with more than one such hostdev device.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Chris Venteicher <cventeic@redhat.com>
The struct is called virPCIDeviceAddress and the
functions operating on it should be named accordingly.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The affected functions are
virDeviceInfoPCIAddressWanted()
virDeviceInfoPCIAddressPresent()
which get renamed to
virDeviceInfoPCIAddressIsWanted()
virDeviceInfoPCIAddressIsPresent()
to comply with the naming convention used for other
predicates.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We're going to need to assign virtio-mmio addresses to non-ARM
guests soon, so let's create a generic wrapper that calls to
the architecture-specific implementation.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Commit d48813e8 made sure we wouldn't get one for i440fx, but not for Q35
machine type. If the primary video didn't get the assumed 0:0:1.0 PCI
address, the evaluation then failed with: "Cannot automatically add a
new PCI bus for a device with connect flags 00"
https://bugzilla.redhat.com/show_bug.cgi?id=1609087
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Historically, we've always enabled an emulated video device every time we
see that graphics should be supported with a guest. With the appearance
of mediated devices which can support QEMU's vfio-display capability,
users might want to use such a device as the only video device.
Therefore introduce a new, effectively a 'disable', type for video
device.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Since 133fb140 moved the validation of a video device into a separate
function, the code handling PCI slot assignment for video devices has
been the same for both the primary device and the secondary devices.
Let's merge these and thus handle all the devices within the existing
'for' loop.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
from src/qemu/qemu_domain_address.c to src/conf/domain_addr.c
and rename to virDomainCCWAddressSetCreateFromDomain
(rename to have Address in full instead of Addr to follow
the naming convention of other virDomainCCWAddress functions)
Signed-off-by: Anya Harter <aharter@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add a new 'vsock' element for the vsock device.
The 'model' attribute is optional.
A <source cid> subelement should be used to specify the guest cid,
or <source auto='yes'/> should be used.
https://bugzilla.redhat.com/show_bug.cgi?id=1291851
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Introduces the vfio-ccw model for mediated devices and prime vfio-ccw
devices such that CCW address will be generated.
Alters the qemuxml2xmltest for testing a basic mdev device using vfio-ccw.
Signed-off-by: Shalini Chellathurai Saroja <shalini@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add the function virHostdevIsMdevDevice() which detects whether a
hostdev is a mediated device or not. Also, replace all existing
conditionals.
Signed-off-by: Shalini Chellathurai Saroja <shalini@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Let us introduce the capability QEMU_CAPS_CCW for virtual-css-bridge
and replace QEMU_CAPS_VIRTIO_CCW with QEMU_CAPS_CCW in code segments
which identify support for ccw devices.
The virtual-css-bridge is part of the ccw support introduced in QEMU 2.7.
The QEMU_CAPS_CCW capability is based on the existence of the QEMU type.
Let us also add the capability QEMU_CAPS_CCW to the tests which
require support for ccw devices.
Signed-off-by: Shalini Chellathurai Saroja <shalini@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Just like the existing areMultipleRootsSupported, this will
allow us to change the results of the driver-agnostic PCI
address allocation logic based on whether the QEMU binary
supports certain features.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The new controller will not yet be used automatically by
libvirt, but at this point it's already possible to configure
a guest to use it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We're going to add a similarly-named attribute later, and we'd
like to be consistent.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
QEMU on S390 (since v2.11) can support the virtio-gpu-ccw device,
which can be used as a video device.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
commit 7210cef452 'qemu: build command line for virtio input devices'
introduced an error, by checking if input bus type is
VIR_DOMAIN_DISK_BUS_VIRTIO.
Fix it by using the correct bus type for input devices.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Commit id '177db487' renamed 'qemuValidateDevicePCISlotsChipsets' to
'qemuDomainValidateDevicePCISlotsChipsets', but didn't adjust comment.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Since data.count is not a pointer, but an integer,
compare it against an integer value instead of using
the implicit "boolean" conversion that is customarily
used for pointers.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
If the user tries to define a domain that has
<controller type='usb' model='none'/>
and also some USB devices, we report an error:
error: internal error: No free USB ports
Which is technically still correct for a domain with no USB ports.
Change it to:
USB is disabled for this domain, but USB devices are present in the domain XML
https://bugzilla.redhat.com/show_bug.cgi?id=1347550
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Rather than having the caller check, if the input @addrs is NULL
(e.g. priv->usbaddrs), then just return 0. This also removes the
need for ATTRIBUTE_NONNULL which only really helped if someone
passed a NULL as a parameter not if the passed parameter is NULL.
Signed-off-by: John Ferlan <jferlan@redhat.com>
GCC 8 became more fussy about detecting switch
fallthroughs. First it doesn't like it if you have
a fallthrough attribute that is not before a case
statement. e.g.
FOO:
BAR:
WIZZ:
ATTRIBUTE_FALLTHROUGH;
Is unacceptable as there's no final case statement,
so while FOO & BAR are falling through, WIZZ is
not falling through. IOW, GCC wants us to write
FOO:
BAR:
ATTRIBUTE_FALLTHROUGH;
WIZZ:
Second, it will report risk of fallthrough even if you
have a case statement for every single enum value, but
only if the switch is nested inside another switch and
the outer case statement has no final break. This is
is arguably valid because despite the fact that we have
cast from "int" to the enum typedef, nothing guarantees
that the variable we're switching on only contains values
that have corresponding switch labels. e.g.
int domstate = 87539319;
switch ((virDomainState)domstate) {
...
}
will not match enum value, but also not raise any kind
of compiler warning. So it is right to complain about
risk of fallthrough if no default: is present.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The controller model is slightly unusual in that the default value is
-1, not 0. As a result the default value is not covered by any of the
existing enum cases. This in turn means that any switch() statements
that think they have covered all cases, will in fact not match the
default value at all. In the qemuDomainDeviceCalculatePCIConnectFlags()
method this has caused a serious mistake where we fallthrough from the
SCSI controller case, to the VirtioSerial controller case, and from
the USB controller case to the IDE controller case.
By adding explicit enum constant starting at -1, we can ensure switches
remember to handle the default case.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Currently the QEMU driver will call directly into the network driver
impl to modify resolve the atual type of NICs with type=network. It
has todo this before it has allocated the actual NIC. This introduces
a callback system to allow us to decouple the QEMU driver from the
network driver.
This is a short term step, as it ought to be possible to achieve the
same end goal by simply querying XML via the public network API. The
QEMU code in question though, has no virConnectPtr conveniently
available at this time.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Now that the controller model is updated during post parse callback,
this code no longer needs to fetch the model based on the capabilities
and can just return the model directly if the controller is found.
Removal of @qemuCaps cascades through various callers which are now
updated to not pass the capabilities.
Now that post parse processing handles setting the SCSI controller
model, there's no need to call qemuDomainGetSCSIControllerModel to
get the "default controller" when building the command line controller
string or when assigning the spaprvio address since the controller
model value will already be filled in.
During post parse processing, let's force setting the controller
model to default value if not already set for defined controllers
(e.g. the non implicit ones).
Rename and rework qemuDomainSetSCSIControllerModel since we're
really not setting the SCSI controller model. Instead the code
is either returning the existing SCSI controller model value, the
default value based on the capabilities, or -1 with the error set.
Rather than repeat multiple steps in order to find the SCSI
controller model, let's combine them into one helper that will
return either the model from the definition or the default
model based on the capabilities.
This patch adds an extra check/error that the controller
that's being found actually exists. This just clarifies that
the error was because the controller doesn't exist rather
than the more generic error that we were unable to determine
the model from qemuDomainSetSCSIControllerModel when a -1
was passed in and the capabilities were unable to find one.
Rather than one function serving two purposes, let's split out the
else condition which is checking whether the model can be used
during command line building based on the capabilities.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Commit 10c73bf1 fixed a bug that I had introduced back in commit
70249927 - if a vhost-scsi device had no manually assigned PCI
address, one wouldn't be assigned automatically. There was a slight
problem with the logic of the fix though - in the case of domains with
pcie-root (e.g. those with a q35 machinetype),
qemuDomainDeviceCalculatePCIConnectFlags() will attempt to determine
if the host-side PCI device is Express or legacy by examining sysfs
based on the host-side PCI address stored in
hostdev->source.subsys.u.pci.addr, but that part of the union is only
valid for PCI hostdevs, *not* for SCSI hostdevs. So we end up trying
to read sysfs for some probably-non-existent device, which fails, and
the function virPCIDeviceIsPCIExpress() returns failure (-1).
By coincidence, the return value is being examined as a boolean, and
since -1 is true, we still end up assigning the vhost-scsi device to
an Express slot, but that is just by chance (and could fail in the
case that the gibberish in the "hostside PCI address" was the address
of a real device that happened to be legacy PCI).
Since (according to Paolo Bonzini) vhost-scsi devices appear just like
virtio-scsi devices in the guest, they should follow the same rules as
virtio devices when deciding whether they should be placed in an
Express or a legacy slot. That's accomplished in this patch by
returning early with virtioFlags, rather than erroneously using
hostdev->source.subsys.u.pci.addr. It also adds a test case for PCIe
to assure it doesn't get broken in the future.
Commit id '70249927b' neglected to cover this case because the test
had taken the "shortcut" to already add the <address>; however, when
the PCI address assignment code was adjusted by commit id '70249927'
the vhost-scsi (VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI_HOST) wasn't
covered thus returning a 0 for pciFlags. So I altered the tests too
to make sure it doesn't happen again.
Previously the qemuxml2xmloutdata was a softlink to the source
qemuxml2argvdata, so I unlinked and recreated the output file to
force generation of the adddress. Without the test changes, an
address generation returns:
libvirt: Domain Config error : internal error: Cannot automatically
add a new PCI bus for a device with connect flags 00
if an address was supplied in the test, a restart of libvirtd or
edit of a guest would display the following opaque message:
warning : qemuDomainCollectPCIAddress:1237 :
qemuDomainDeviceCalculatePCIConnectFlags() thinks that the device
with PCI address 0000:00:09.0 should not have a PCI address
where the address is related to the guest PCI address provided.
Introduce specific a target types with two models for the console
devices (sclp and sclplm) used in s390 and s390x guests, so isa-serial
is no more used for them.
This makes <serial> usable on s390 and s390x guests, with at most only
a single sclpconsole and one sclplmconsole devices usable in a single
guest (due to limitations in QEMU, which will enforce already at
runtime).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1449265
Signed-off-by: Pino Toscano <ptoscano@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We can finally introduce a specific target model for the pl011 device
used by mach-virt guests, which means isa-serial will no longer show
up to confuse users.
We make sure migration works in both directions by interpreting the
isa-serial target type, or the lack of target type, appropriately
when parsing the guest XML, and skipping the newly-introduced type
when formatting if for migration. We also verify that pl011 is not
used for non-mach-virt guests and add a bunch of test cases.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=151292
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The existing implementation set the address type for all serial
devices to spapr-vio, which made it impossible to use other devices
such as usb-serial and pci-serial; moreover, some decisions were
made based on the address type rather than the device type.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1512934
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We can finally introduce a specific target model for the spapr-vty
device used by pSeries guests, which means isa-serial will no longer
show up to confuse users.
We make sure migration works in both directions by interpreting the
isa-serial target type, or the lack of target type, appropriately
when parsing the guest XML, and skipping the newly-introduced type
when formatting if for migration. We also verify that spapr-vty is
not used for non-pSeries guests and add a bunch of test cases.
This commit is best viewed with 'git show -w'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1511421
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
This is the first step in getting rid of the assumption that
isa-serial is the default target type for serial devices.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Split out the common code responsible for reserving/assigning
PCI/CCW addresses for virtio disks into a helper function
for reuse by other virtio devices.
Somewhere around commit 9ff9d9f reserving entire PCI slots was
eliminated, as demonstrated by commit 6cc2014.
Reserve the functions required by the implicit devices:
00:01.0 ISA Bridge
00:01.1 IDE Controller
00:01.2 USB Controller (unless USB is disabled)
00:01.3 Bridge
https://bugzilla.redhat.com/show_bug.cgi?id=1460143
Will be needed for future patches to pull the default video type
setting out of XML parsing routines.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
We can't retrieve the isolation group of a device that's not present
in the system. However, it's very common for VFs to be created late
in the boot, so they might not be present yet when libvirtd starts,
which would cause the guests using them to disappear.
Moreover, for other architectures and even ppc64 before isolation
groups were introduced, it's considered perfectly fine to configure a
guest to use a device that's not yet (or no longer) available to the
host, with the obvious caveat that such a guest won't be able to
start before the device is available.
In order to be consistent, when a device's isolation group can't be
determined fall back to not isolating it rather than erroring out or,
worse, making the guest disappear.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1484254
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
This is more user-friendly because the error will be
displayed directly instead of being buried in the log.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The original name didn't hint at the fact that PHBs are
a pSeries-specific concept.
Suggested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
All the pieces are now in place, so we can finally start
using isolation groups to achieve our initial goal, which is
separating hostdevs from emulated PCI devices while keeping
hostdevs that belong to the same host IOMMU group together.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1280542
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Isolation groups will eventually allow us to make sure certain
devices, eg. PCI hostdevs, are assigned to guest PCI buses in
a way that guarantees improved isolation, error detection and
recovery for machine types and hypervisors that support it,
eg. pSeries guest on QEMU.
This patch merely defines storage for the new information
we're going to need later on and makes sure it is passed from
the hypervisor driver (QEMU / bhyve) down to the generic PCI
address allocation code.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
When looking for slots suitable for a PCI device, libvirt
might need to add an extra PCI controller: for pSeries guests,
we want that extra controller to be a PHB (pci-root) rather
than a PCI bridge.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
PCI bus has to be numbered sequentially, and no index can be
missing, so libvirt will fill in the blanks automatically for
the user.
Up until now, it has done so using either pci-bridge, for machine
types based on legacy PCI, or pcie-root-port, for machine types
based on PCI Express. Neither choice is good for pSeries guests,
where PHBs (pci-root) should be used instead.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
pSeries guests will soon need the new information; luckily,
we can figure it out automatically most of the time, so
users won't have to worry about it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
This function was private to the QEMU driver and was,
accordingly, called qemuDomainPCIBusFullyReserved().
However the function is really not QEMU-specific at
all, so it makes sense to move it closer to the
virDomainPCIAddressBus struct it operates on.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Introduce new wrapper functions without *Machine* in the function
name that take the whole virDomainDef structure as argument and
call the existing functions with *Machine* in the function name.
Change the arguments of existing functions to *machine* and *arch*
because they don't need the whole virDomainDef structure and they
could be used in places where we don't have virDomainDef.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
So far, the official support is for x86_64 arch guests so unless a
different device API than vfio-pci is available let's only turn on
support for PCI address assignment. Once a different device API is
introduced, we can enable another address type easily.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
ioh3420 is emulated Intel hardware, so it always looked
quite out of place in aarch64/virt guests. Even for x86/q35
guests, the recently-introduced pcie-root-port is a better
choice because, unlike ioh3420, it doesn't require IO space
(a fairly constrained resource) to work.
If pcie-root-port is available in QEMU, use it; ioh3420 is
still used as fallback for when pcie-root-port is not
available.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1408808
bhyve supports 'gop' video device that allows clients to connect
to VMs using VNC clients. This commit adds support for that to
the bhyve driver:
- Introducr 'gop' video device type
- Add capabilities probing for the 'fbuf' device that's
responsible for graphics
- Update command builder routines to let users configure
domain's VNC via gop graphics.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
One of the conditions in qemuDomainDeviceCalculatePCIConnectFlags
was missing a break that could result it in falling through to
an incorrect codepath.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
qemuDomainAssignPCIAddresses() hardcoded the assumption
that the only way to support devices on a non-zero bus is
to add one or more pci-bridges; however, since we now
support a large selection of PCI controllers that can be
used instead, the assumption is no longer true.
Moreover, this check was always redundant, because the
only sensible time to check for the availability of
pci-bridge is when building the QEMU command line, and
such a check is of course already in place.
In fact, there were *two* such checks, but since one of
the two was relying on the incorrect assumption explained
above, and it was redundant anyway, it has been dropped.
Due to a logic error, the autofilling of USB port when a bus is
specified:
<address type='usb' bus='0'/>
does not work for non-hub devices on domain startup.
Fix the logic in qemuDomainAssignUSBPortsIterator to also
assign ports for USB addresses that do not yet have one.
https://bugzilla.redhat.com/show_bug.cgi?id=1374128
Commit 815d98a started auto-adding one hub if there are more USB devices
than available USB ports.
This was a strange choice, since there might be even more devices.
Before USB address allocation was implemented in libvirt, QEMU
automatically added a new USB hub if the old one was full.
Adjust the logic to try adding as many hubs as will be needed
to plug in all the specified devices.
https://bugzilla.redhat.com/show_bug.cgi?id=1410188
Surprisingly there was a virDomainPCIAddressReleaseAddr() function
already, but it was completely unused. Since we don't reserve entire
slots at once any more, there is no need to release entire slots
either, so we just replace the single call to
virDomainPCIAddressReleaseSlot() with a call to
virDomainPCIAddressReleaseAddr() and remove the now unused function.
The keen observer may be concerned that ...Addr() doesn't call
virDomainPCIAddressValidate(), as ...Slot() did. But really the
validation was pointless anyway - if the device hadn't been suitable
to be connected at that address, it would have failed validation
before every being reserved in the first place, so by definition it
will pass validation when it is being unplugged. (And anyway, even if
something "bad" happened and we managed to have a device incorrectly
at the given address, we would still want to be able to free it up for
use by a device that *did* validate properly).
This function is only called in two places, and the function itself is
just adding a single argument and calling
virDomainPCIAddressReserveNextAddr(), so we can remove it and instead
call virDomainPCIAddressReserveNextAddr() directly. (The main
motivation for doing this is to free up the name so that
qemuDomainPCIAddressReserveNextSlot() can be renamed in the next
patch, as its current name is now inaccurate and misleading).
All occurences of the former use fromConfig=true, and that's exactly
how virDomainPCIAddressReserveSlot() calls
virDomainPCIaddressReserveAddr(), so just use *Slot() so that *Addr()
can be made static to conf/domain_addr.c (both functions will be
renamed in upcoming patches).
fromConfig should be true if the caller wants
virDomainPCIAddressValidate() to loosen restrictions on its
interpretation of the pciConnectFlags. In particular, either
PCI_DEVICE or PCIE_DEVICE will be counted as equivalent to both, and
HOTPLUG will be ignored. In a few cases where libvirt was manually
overriding automatic address assignment, it was setting fromConfig to
false when validating the hardcoded manual override. This patch
changes those to fromConfig=true as a preemptive strike against any
future bugs that might otherwise surface.
Although setting virDomainPCIAddressReserveAddr()'s fromConfig=true is
correct when a PCI addres is coming from a domain's config, the *true*
purpose of the fromConfig argument is to lower restrictions on what
kind of device can plug into what kind of controller - if fromConfig
is true, then a PCIE_DEVICE can plug into a slot that is marked as
only compatible with PCI_DEVICE (and vice versa), and the HOTPLUG flag
is ignored.
For a long time there have been several calls to
virDomainPCIAddressReserveAddr() that have fromConfig incorrectly set
to false - it's correct that the addresses aren't coming from user
config, but they are coming from hardcoded exceptions in libvirt that
should, if anything, pay *even less* attention to following the
pciConnectFlags (under the assumption that the libvirt programmer knew
what they were doing).
See commit b87703cf7 for an example of an actual bug caused by the
incorrect setting of the "fromConfig" argument to
virDomainPCIAddressReserveAddr(). Although they haven't resulted in
any reported bugs, this patch corrects all the other incorrect
settings of fromConfig in calls to virDomainPCIAddressReserveAddr().
If a PCI device has VIR_PCI_CONNECT_AGGREGATE_SLOT set in its
pciConnectFlags, then during address assignment we allow multiple
instances of this type of device to be auto-assigned to multiple
functions on the same device. A slot is used for aggregating multiple
devices only if the first device assigned to that slot had
VIR_PCI_CONNECT_AGGREGATE_SLOT set. but any device types that have
AGGREGATE_SLOT set might be mix/matched on the same slot.
(NB: libvirt should never set the AGGREGATE_SLOT flag for a device
type that might need to be hotplugged. Currently it is only planned
for pcie-root-port and possibly other PCI controller types, and none
of those are hotpluggable anyway)
There aren't yet any devices that use this flag. That will be in a
later patch.
If there are multiple devices assigned to the different functions of a
single PCI slot, they will not work properly if the device at function
0 doesn't have its "multi" attribute turned on, so it makes sense for
libvirt to turn it on during PCI address assignment. Setting multi
then assures that the new setting is stored in the config (so it will
be used next time the domain is started), preventing any potential
problems in the case that a future change in the configuration
eliminates the devices on all non-0 functions (multi will still be set
for function 0 even though it is the only function in use on the slot,
which has no useful purpose, but also doesn't cause any problems).
(NB: If we were to instead just decide on the setting for
multifunction at runtime, a later removal of the non-0 functions of a
slot would result in a silent change in the guest ABI for the
remaining device on function 0 (although it may seem like an
inconsequential guest ABI change, it *is* a guest ABI change to turn
off the multi bit).)
setting reserveEntireSlot really accomplishes nothing - instead of
going to the trouble of computing the value for reserveEntireSlot and
then possibly setting *all* functions of the slot as in-use, we can
just set the in-use bit only for the specific function being used by a
device. Later we will know from the context (the PCI connect flags,
and whether we are reserving a specific address or asking for "the
next available") whether or not it is okay to allocate other functions
on the same slot.
Although it's not used yet, we allow specifying "-1" for the function
number when looking for the "next available slot" - this is going to
end up meaning "return the lowest available function in the slot, but
since we currently only provide a function from an otherwise unused
slot, "-1" ends up meaning "0".
When keeping track of which functions of which slots are allocated, we
will need to have more information than just the current bitmap with a
bit for each function that is currently stored for each slot in a
virDomainPCIAddressBus. To prepare for adding more per-slot info, this
patch changes "uint8_t slots" into "virDomainPCIAddressSlot slot", which
currently has a single member named "functions" that serves the same
purpose previously served directly by "slots".
virtio-pci is the way forward for aarch64 guests: it's faster
and less alien to people coming from other architectures.
Now that guest support is finally getting there (Fedora 24,
CentOS 7.3, Ubuntu 16.04 and Debian testing all support
virtio-pci out of the box), we'd like to start using it by
default instead of virtio-mmio.
Users and applications can already opt-in by explicitly using
<address type='pci'/>
inside the relevant elements, but that's kind of cumbersome and
requires all users and management applications to adapt, which
we'd really like to avoid.
What we can do instead is use virtio-mmio only if the guest
already has at least one virtio-mmio device, and use virtio-pci
in all other situations.
That means existing virtio-mmio guests will keep using the old
addressing scheme, and new guests will automatically be created
using virtio-pci instead. Users can still override the default
in either direction.
Existing tests such as aarch64-aavmf-virtio-mmio and
aarch64-virtio-pci-default already cover all possible
scenarios, so no additions to the test suites are necessary.
Although nearly all host devices that are assigned to guests using
VFIO ("<hostdev>" devices in libvirt) are physically PCI Express
devices, until now libvirt's PCI address assignment has always
assigned them addresses on legacy PCI controllers in the guest, even
if the guest's machinetype has a PCIe root bus (e.g. q35 and
aarch64/virt).
This patch tries to assign them to an address on a PCIe controller
instead, when appropriate. First we do some preliminary checks that
might allow setting the flags without doing any extra work, and if
those conditions aren't met (and if libvirt is running privileged so
that it has proper permissions), we perform the (relatively) time
consuming task of reading the device's PCI config to see if it is an
Express device. If this is successful, the connect flags are set based
on the result, but if we aren't able to read the PCI config (most
likely due to the device not being present on the system at the time
of the check) we assume it is (or will be) an Express device, since
that is almost always the case anyway.
If libvirtd is running unprivileged, it can open a device's PCI config
data in sysfs, but can only read the first 64 bytes. But as part of
determining whether a device is Express or legacy PCI,
qemuDomainDeviceCalculatePCIConnectFlags() will be updated in a future
patch to call virPCIDeviceIsPCIExpress(), which tries to read beyond
the first 64 bytes of the PCI config data and fails with an error log
if the read is unsuccessful.
In order to avoid creating a parallel "quiet" version of
virPCIDeviceIsPCIExpress(), this patch passes a virQEMUDriverPtr down
through all the call chains that initialize the
qemuDomainFillDevicePCIConnectFlagsIterData, and saves the driver
pointer with the rest of the iterdata so that it can be used by
qemuDomainDeviceCalculatePCIConnectFlags(). This pointer isn't used
yet, but will be used in an upcoming patch (that detects Express vs
legacy PCI for VFIO assigned devices) to examine driver->privileged.
Open /dev/vhost-scsi, and record the resulting file descriptor, so that
the guest has access to the host device outside of the libvirt daemon.
Pass this information, along with data parsed from the XML file, to build
a device string for the qemu command line. That device string will be
for either a vhost-scsi-ccw device in the case of an s390 machine, or
vhost-scsi-pci for any others.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
For machinetypes with a pci-root bus (all legacy PCI), libvirt will
make a "fake" reservation for one extra slot prior to assigning
addresses to unaddressed PCI endpoint devices in the domain. This will
trigger auto-adding of a pci-bridge for the final device to be
assigned an address *if that device would have otherwise instead been
the last device on the last available pci-bridge*; thus it assures
that there will always be at least one slot left open in the domain's
bus topology for expansion (which is important both for hotplug (since
a new pci-bridge can't be added while the guest is running) as well as
for offline additions to the config (since adding a new device might
otherwise in some cases require re-addressing existing devices, which
we want to avoid)).
It's important to note that for the above case (legacy PCI), we must
check for the special case of all slots on all buses being occupied
*prior to assigning any addresses*, and avoid attempting to reserve
the extra address in that case, because there is no free address in
the existing topology, so no place to auto-add a pci-bridge for
expansion (i.e. it would always fail anyway). Since that condition can
only be reached by manual intervention, this is acceptable.
For machinetypes with pcie-root (Q35, aarch64 virt), libvirt's
methodology for automatically expanding the bus topology is different
- pcie-root-ports are plugged into slots (soon to be functions) of
pcie-root as needed, and the new endpoint devices are assigned to the
single slot in each pcie-root-port. This is done so that the devices
are, by default, hotpluggable (the slots of pcie-root don't support
hotplug, but the single slot of the pcie-root-port does). Since
pcie-root-ports can only be plugged into pcie-root, and we don't
auto-assign endpoint devices to the pcie-root slots, this means
topology expansion doesn't compete with endpoint devices for slots, so
we don't need to worry about checking for all "useful" slots being
free *prior* to assigning addresses to new endpoint devices - as a
matter of fact, if we attempt to reserve the open slots before the
used slots, it can lead to errors.
Instead this patch just reserves one slot for a "future potential"
PCIe device after doing the assignment for actual devices, but only
if the only PCI controller defined prior to starting address
assignment was pcie-root, and only if we auto-added at least one PCI
controller during address assignment. This assures two things:
1) that reserving the open slots will only be done when the domain is
initially defined, never at any time after, and
2) that if the user understands enough about PCI controllers that they
are adding them manually, that we don't mess up their plan by
adding extras - if they know enough to add one pcie-root-port, or
to manually assign addresses such that no pcie-root-ports are
needed, they know enough to add extra pcie-root-ports if they want
them (this could be called the "libguestfs clause", since
libguestfs needs to be able to create domains with as few
devices/controllers as possible).
This is set to reserve a single free port for now, but could be
increased in the future if public sentiment goes in that direction
(it's easy to increase later, but essentially impossible to decrease)
Real Q35 hardware has an ICH9 chip that includes several integrated
devices at particular addresses (see the file docs/q35-chipset.cfg in
the qemu source). libvirt already attempts to put the first two sets
of ich9 USB2 controllers it finds at 00:1D.* and 00:1A.* to match the
real hardware. This patch does the same for the ich9 "HD audio"
device.
The main inspiration for this patch is that currently the *only*
device in a reasonable "workstation" type virtual machine config that
requires a legacy PCI slot is the audio device, Without this patch,
the standard Q35 machine created by virt-manager will have a
dmi-to-pci-bridge and a pci-bridge just for the sound device; with the
patch (and if you change the sound device model from the default
"ich6" to "ich9"), the machine definition constructed by virt-manager
has absolutely no legacy PCI controllers - any legacy PCI devices
(e.g. video and sound) are on pcie-root as integrated devices.
Previously libvirt would only add pci-bridge devices automatically
when an address was requested for a device that required a legacy PCI
slot and none was available. This patch expands that support to
dmi-to-pci-bridge (which is needed in order to add a pci-bridge on a
machine with a pcie-root), and pcie-root-port (which is needed to add
a hotpluggable PCIe device). It does *not* automatically add
pcie-switch-upstream-ports or pcie-switch-downstream-ports (and
currently there are no plans for that).
Given the existing code to auto-add pci-bridge devices, automatically
adding pcie-root-ports is fairly straightforward. The
dmi-to-pci-bridge support is a bit tricky though, for a few reasons:
1) Although the only reason to add a dmi-to-pci-bridge is so that
there is a reasonable place to plug in a pci-bridge controller,
most of the time it's not the presence of a pci-bridge *in the
config* that triggers the requirement to add a dmi-to-pci-bridge.
Rather, it is the presence of a legacy-PCI device in the config,
which triggers auto-add of a pci-bridge, which triggers auto-add of
a dmi-to-pci-bridge (this is handled in
virDomainPCIAddressSetGrow() - if there's a request to add a
pci-bridge we'll check if there is a suitable bus to plug it into;
if not, we first add a dmi-to-pci-bridge).
2) Once there is already a single dmi-to-pci-bridge on the system,
there won't be a need for any more, even if it's full, as long as
there is a pci-bridge with an open slot - you can also plug
pci-bridges into existing pci-bridges. So we have to make sure we
don't add a dmi-to-pci-bridge unless there aren't any
dmi-to-pci-bridges *or* any pci-bridges.
3) Although it is strongly discouraged, it is legal for a pci-bridge
to be directly plugged into pcie-root, and we don't want to
auto-add a dmi-to-pci-bridge if there is already a pci-bridge
that's been forced directly into pcie-root.
Although libvirt will now automatically create a dmi-to-pci-bridge
when it's needed, the code still remains for now that forces a
dmi-to-pci-bridge on all domains with pcie-root (in
qemuDomainDefAddDefaultDevices()). That will be removed in a future
patch.
For now, the pcie-root-ports are added one to a slot, which is a bit
wasteful and means it will fail after 31 total PCIe devices (30 if
there are also some PCI devices), but helps keep the changeset down
for this patch. A future patch will have 8 pcie-root-ports sharing the
functions on a single slot.
Andrea had the right idea when he disabled the "reserve an extra
unused slot" bit for aarch64/virt. For *any* PCI Express-based
machine, it is pointless since 1) an extra legacy-PCI slot can't be
used for hotplug, since hotplug into legacy PCI slots doesn't work on
PCI Express machinetypes, and 2) even for "coldplug" expansion,
everybody will want to expand using Express controllers, not legacy
PCI.
This patch eliminates the extra slot reserve unless the system has a
pci-root (i.e. legacy PCI)
The nec-usb-xhci device (which is a USB3 controller) has always
presented itself as a PCI device when plugged into a legacy PCI slot,
and a PCIe device when plugged into a PCIe slot, but libvirt has
always auto-assigned it to a legacy PCI slot.
This patch changes that behavior to auto-assign to a PCIe slot on
systems that have pcie-root (e.g. Q35 and aarch64/virt).
Since we don't yet auto-create pcie-*-port controllers on demand, this
means a config with an nec-xhci USB controller that has no PCI address
assigned will also need to have an otherwise-unused pcie-*-port
controller specified:
<controller type='pci' model='pcie-root-port'/>
<controller type='usb' model='nec-xhci'/>
(this assumes there is an otherwise-unused slot on pcie-root to accept
the pcie-root-port)
The e1000e is an emulated network device based on the Intel 82574,
present in qemu 2.7.0 and later. Among other differences from the
e1000, it presents itself as a PCIe device rather than legacy PCI. In
order to get it assigned to a PCIe controller, this patch updates the
flags setting for network devices when the model name is "e1000e".
(Note that for some reason libvirt has never validated the network
device model names other than to check that there are no dangerous
characters in them. That should probably change, but is the subject of
another patch.)
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1343094
libvirt previously assigned nearly all devices to a "hotpluggable"
legacy PCI slot even on machines with a PCIe root bus (and even though
most such machines don't even support hotplug on legacy PCI slots!)
Forcing all devices onto legacy PCI slots means that the domain will
need a dmi-to-pci-bridge (to convert from PCIe to legacy PCI) and a
pci-bridge (to provide hotpluggable legacy PCI slots which, again,
usually aren't hotpluggable anyway).
To help reduce the need for these legacy controllers, this patch tries
to assign virtio-1.0-capable devices to PCIe slots whenever possible,
by setting appropriate connectFlags in
virDomainCalculateDevicePCIConnectFlags(). Happily, when that function
was written (just a few commits ago) it was created with a
"virtioFlags" argument, set by both of its callers, which is the
proper connectFlags to set for any virtio-*-pci device - depending on
the arch/machinetype of the domain, and whether or not the qemu binary
supports virtio-1.0, that flag will have either been set to PCI or
PCIe. This patch merely enables the functionality by setting the flags
for the device to whatever is in virtioFlags if the device is a
virtio-*-pci device.
NB: the first virtio video device will be placed directly on bus 0
slot 1 rather than on a pcie-root-port due to the override for primary
video devices in qemuDomainValidateDevicePCISlotsQ35(). Whether or not
to change that is a topic of discussion, but this patch doesn't change
that particular behavior.
NB2: since the slot must be hotpluggable, and pcie-root (the PCIe root
complex) does *not* support hotplug, this means that suitable
controllers must also be in the config (i.e. either pcie-root-port, or
pcie-downstream-port). For now, libvirt doesn't add those
automatically, so if you put virtio devices in a config for a qemu
that has PCIe-capable virtio devices, you'll need to add extra
pcie-root-ports yourself. That requirement will be eliminated in a
future patch, but for now, it's simple to do this:
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
...
Partially Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1330024
This patch cleans up the connect flags for certain types/models of
devices that aren't PCI to return 0. In the future that may be used as
an indicator to the caller about whether or not a device needs a PCI
address. For now it's just ignored, except for in
virDomainPCIAddressEnsureAddr() - called during device hotplug - (and
in some cases actually needs to be re-set to PCI|HOTPLUGGABLE just in
case someone (in some old config) has manually set a PCI address for a
device that isn't PCI.
Before now, all the qemu hotplug functions assumed that all devices to
be hotplugged were legacy PCI endpoint devices
(VIR_PCI_CONNECT_TYPE_PCI_DEVICE). This worked out "okay", because all
devices *are* legacy PCI endpoint devices on x86/440fx machinetypes,
and hotplug didn't work properly on machinetypes using PCIe anyway
(hotplugging onto a legacy PCI slot doesn't work, and until commit
b87703cf any attempt to manually specify a PCIe address for a
hotplugged device would be erroneously rejected).
This patch makes all qemu hotplug operations honor the pciConnectFlags
set by the single all-knowing function
qemuDomainDeviceCalculatePCIConnectFlags(). This is done in 3 steps,
but in a single commit since we would have to touch the other points
at each step anyway:
1) add a flags argument to the hypervisor-agnostic
virDomainPCIAddressEnsureAddr() (previously it hardcoded
..._PCI_DEVICE)
2) add a new qemu-specific function qemuDomainEnsurePCIAddress() which
gets the correct pciConnectFlags for the device from
qemuDomainDeviceConnectFlags(), then calls
virDomainPCIAddressEnsureAddr().
3) in qemu_hotplug.c replace all calls to
virDomainPCIAddressEnsureAddr() with calls to
qemuDomainEnsurePCIAddress()
So in effect, we're putting a "shim" on top of all calls to
virDomainPCIAddressEnsureAddr() that sets the right pciConnectFlags.
Set pciConnectFlags in each device's DeviceInfo and then use those
flags later when validating existing addresses in
qemuDomainCollectPCIAddress() and when assigning new addresses with
qemuDomainPCIAddressReserveNextAddr() (rather than scattering the
logic about which devices need which type of slot all over the place).
Note that the exact flags set by
qemuDomainDeviceCalculatePCIConnectFlags() are different from the
flags previously set manually in qemuDomainCollectPCIAddress(), but
this doesn't matter because all validation of addresses in that case
ignores the setting of the HOTPLUGGABLE flag, and treats PCIE_DEVICE
and PCI_DEVICE the same (this lax checking was done on purpose,
because there are some things that we want to allow the user to
specify manually, e.g. assigning a PCIe device to a PCI slot, that we
*don't* ever want libvirt to do automatically. The flag settings that
we *really* want to match are 1) the old flag settings in
qemuDomainAssignDevicePCISlots() (which is HOTPLUGGABLE | PCI_DEVICE
for everything except PCI controllers) and 2) the new flag settings
done by qemuDomainDeviceCalculatePCIConnectFlags() (which are
currently exactly that - HOTPLUGGABLE | PCI_DEVICE for everything
except PCI controllers).
The lowest level function of this trio
(qemuDomainDeviceCalculatePCIConnectFlags()) aims to be the single
authority for the virDomainPCIConnectFlags to use for any given device
using a particular arch/machinetype/qemu-binary.
qemuDomainFillDevicePCIConnectFlags() sets info->pciConnectFlags in a
single device (unless it has no virDomainDeviceInfo, in which case
it's a NOP).
qemuDomainFillAllPCIConnectFlags() sets info->pciConnectFlags in all
devices that have a virDomainDeviceInfo
The latter two functions aren't called anywhere yet. This commit is
just making them available. Later patches will replace all the current
hodge-podge of flag settings with calls to this single authority.