When looking for slots suitable for a PCI device, libvirt
might need to add an extra PCI controller: for pSeries guests,
we want that extra controller to be a PHB (pci-root) rather
than a PCI bridge.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
PCI bus has to be numbered sequentially, and no index can be
missing, so libvirt will fill in the blanks automatically for
the user.
Up until now, it has done so using either pci-bridge, for machine
types based on legacy PCI, or pcie-root-port, for machine types
based on PCI Express. Neither choice is good for pSeries guests,
where PHBs (pci-root) should be used instead.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
These tests demonstrate that, while it's now possible for the
user to create PHB explicitly and manually assign devices to
them, libvirt still defaults to extending the guest PCI
topology using PCI bridges and making suboptimal device
placement choices.
The next few commits will improve on these behaviors and the
tests outputs will automatically be updated to reflect this.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Some qemu arch/machine types have built in platform devices that
are always implicitly available. For platform serial devices, the
current code assumes that only old style -serial config can be
used for these devices.
Apparently though since -chardev was introduced, we can use -chardev
in these cases, like this:
-chardev pty,id=foo
-serial chardev:foo
Since -chardev enables all sorts of modern features, use this method
for platform devices.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Several tests are intending to test some serial/console related
bits but aren't setting QEMU_CAPS_CHARDEV. This will soon be enabled
unconditionally so let's add it ahead of time.
* q35-virt-manager-basic: Intended to test a virt-manager q35 config,
which will include a serial/console device
* console-compat*: console/serial XML compat handling
* bios: Needs a serial device for sgabios CLI
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
These tests are exercising old style -serial command lines. That
code will soon be removed, so drop these tests.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Several cases have incidental <serial> or <console> XML which aren't
the features being tested for. Upcoming changes will cause some
churn here, so instead drop these bits now.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
AFAIK there aren't any cases where we will/should hit the old code
path for our supported qemu versions, so drop the old code.
Massive test suite churn follows
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
AFAIK there aren't any qemu arch/machine types with platform parallel
devices that would require old style -parallel config, so we shouldn't
ever need this nowadays.
Remove a now redundant test
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
This demonstrates that the previous qemu caps changes will use
-chardev for pci-serial on aarch64 machvirt
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Check for the LOADPARM capabilility and potentially add a loadparm=x to
the "-machine" string for the QEMU command line.
Also add xml2argv test cases for loadparm.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Update the per device boot schema to add an optional loadparm parameter.
eg: <boot order='1' loadparm='2'/>
Extend the virDomainDeviceInfo to support loadparm option.
Modify the appropriate functions to parse loadparm from boot device xml.
Add the xml2xml test to validate the field.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
When added in multiple previous commits, it was used only with -device
qxl(-vga), but for some QEMUs (< 1.6) we need to add this
functionality when using -vga qxl as well.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1283207
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1214369
My fix 671d18594f was incomplete. If domain doesn't have
hugepages enabled, because of missing condition we would still be
putting hugepages path onto qemu cmd line. Clean up the
conditions so that it's more visible next time.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1214369
Consider the following XML:
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB' nodeset='1'/>
</hugepages>
<source type='file'/>
<access mode='shared'/>
</memoryBacking>
<numa>
<cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
<cell id='1' cpus='4-7' memory='512000' unit='KiB'/>
</numa>
The following cmd line is generated:
-object
memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,
share=yes,size=524288000 -numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object
memory-backend-file,id=ram-node1,mem-path=/var/lib/libvirt/qemu/ram,
share=yes,size=524288000 -numa node,nodeid=1,cpus=4-7,memdev=ram-node1
This is obviously wrong as for node 1 hugepages should have been
used. The hugepages configuration is more specific than <source
type='file'/>.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We have couple of hugepage enabled domains for qemuxml2argvtest.
Unfortunately, often when adding a test case there I forget to
add it to xml2xml test too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit 824272cb28 attempted to fix escaping of characters in unix
socket path but it was wrong. We need to escape only ',', there is
no escape character for '='.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
There are currently some limitations in the emulated GICv3
that make it unsuitable as a default. Use GICv2 instead.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1450433
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Currently we consider all UNIX paths with specific prefix as generated
by libvirt, but that's a wrong assumption. Let's make the detection
better by actually checking whether the whole path matches one of the
paths that we generate or generated in the past.
The UNIX path isn't stored in config XML since libvirt-1.3.1.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1446980
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Add kernel_irqchip=split/on to the QEMU command line
and a capability that looks for it in query-command-line-options
output. For the 'split' option, use a version check
since it cannot be reasonably probed.
https://bugzilla.redhat.com/show_bug.cgi?id=1427005
Add a new <ioapic> element with a driver attribute.
Possible values are qemu and kvm. With 'qemu', the I/O
APIC can be put in the userspace even for KVM domains.
https://bugzilla.redhat.com/show_bug.cgi?id=1427005
This is a USB3 controller and it's a better choice than piix3-uhci.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Andrea Bolognani <abologna@redhat.com>
This patch maps /domain/cpu/cache element into -cpu parameters:
- <cache mode='passthrough'/> is translated to host-cache-info=on
- <cache level='3' mode='emulate'/> is transformed into l3-cache=on
- <cache mode='disable'/> is turned in host-cache-info=off,l3-cache=off
Any other <cache> element is forbidden.
The tricky part is detecting whether QEMU supports the CPU properties.
The 'host-cache-info' property is introduced in v2.4.0-1389-ge265e3e480,
earlier QEMU releases enabled host-cache-info by default and had no way
to disable it. If the property is present, it defaults to 'off' for any
QEMU until at least 2.9.0.
The 'l3-cache' property was introduced later by v2.7.0-200-g14c985cffa.
Earlier versions worked as if l3-cache=off was passed. For any QEMU
until at least 2.9.0 l3-cache is 'off' by default.
QEMU 2.9.0 was the first release which supports probing both properties
by running device-list-properties with typename=host-x86_64-cpu. Older
QEMU releases did not support device-list-properties command for CPU
devices. Thus we can't really rely on probing them and we can just use
query-cpu-model-expansion QMP command as a witness.
Because the cache property probing is only reliable for QEMU >= 2.9.0
when both are already supported for quite a few releases, we let QEMU
report an error if a specific cache mode is explicitly requested. The
other mode (or both if a user requested CPU cache to be disabled) is
explicitly turned off for QEMU >= 2.9.0 to avoid any surprises in case
the QEMU defaults change. Any older QEMU already turns them off so not
doing so explicitly does not make any harm.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We are currently parsing only rx/frames/max because that's the only
value that makes sense for us. The tun device just added support for
this one and the others are only supported by hardware devices which
we don't need to worry about as the only way we'd pass those to the
domain is using <hostdev/> or <interface type='hostdev'/>. And in
those cases the guest can modify the settings itself.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Our test data used a lot of different qemu binary paths and some
of them were based on downstream systems.
Note that there is one file where I had to add "accel=kvm" because
the qemuargv2xml code parses "/usr/bin/kvm" as virt type="kvm".
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The virt type for QEMU can be modified by -machine attribute "accel"
so there is no need to have different QEMU binary paths.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Like all devices, add the 'id' option for mdevs as well. Patch also
adjusts the test accordingly.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1438431
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Depending on the architecture, requirements for ACPI and UEFI can
be different; more specifically, while on x86 UEFI requires ACPI,
on aarch64 it's the other way around.
Enforce these requirements when validating the domain, and make
the error message more accurate by mentioning that they're not
necessarily applicable to all architectures.
Several aarch64 test cases had to be tweaked because they would
have failed the validation step otherwise.
The capabilities used in test cases should match those used
during normal operation for the tests to make any sense.
This results in the generated command line for a few test
cases (most notably non-x86 test cases that were wrongly
assuming they could use -no-acpi) changing.
This reverts commit c2e60ad0e5.
Turns out this check is excessively strict: there are ways
other than <memtune><hard_limit> to raise the memory locking
limit for QEMU processes, one prominent example being
tweaking /etc/security/limits.conf.
Partially-resolves: https://bugzilla.redhat.com/1431793
QEMU allows for TSC frequency to be explicitly set to enable migration
with invtsc (migration fails if the destination QEMU cannot set the
exact same frequency used when starting the domain on the source host).
Libvirt already supports setting the TSC frequency in the XML using
<clock>
<timer name='tsc' frequency='1234567890'/>
</clock>
which will be transformed into
-cpu Model,tsc-frequency=1234567890
QEMU command line.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We want pcie-root-ports to be used when available in QEMU,
but at the same time we need to ensure that hosts running
older QEMU releases keep working and that the user can
override the default at any time.
Add a comment for the original pcie-root-port test cases
to make it clear how these new test cases are different.
For NVDIMM devices it is optionally possible to specify the size
of internal storage for namespaces. Namespaces are a feature that
allows users to partition the NVDIMM for different uses.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Now that NVDIMM has found its way into libvirt, users might want
to fine tune some settings for each module separately. One such
setting is 'share=on|off' for the memory-backend-file object.
This setting - just like its name suggest already - enables
sharing the nvdimm module with other applications. Under the hood
it controls whether qemu mmaps() the file as MAP_PRIVATE or
MAP_SHARED.
Yet again, we have such config knob in domain XML, but it's just
an attribute to numa <cell/>. This does not give fine enough
tuning on per-memdevice basis so we need to have the attribute
for each device too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
So, majority of the code is just ready as-is. Well, with one
slight change: differentiate between dimm and nvdimm in places
like device alias generation, generating the command line and so
on.
Speaking of the command line, we also need to append 'nvdimm=on'
to the '-machine' argument so that the nvdimm feature is
advertised in the ACPI tables properly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
NVDIMM is new type of memory introduced into QEMU 2.6. The idea
is that we have a Non-Volatile memory module that keeps the data
persistent across domain reboots.
At the domain XML level, we already have some representation of
'dimm' modules. Long story short, NVDIMM will utilize the
existing <memory/> element that lives under <devices/> by adding
a new attribute 'nvdimm' to the existing @model and introduce a
new <path/> element for <source/> while reusing other fields. The
resulting XML would appear as:
<memory model='nvdimm'>
<source>
<path>/tmp/nvdimm</path>
</source>
<target>
<size unit='KiB'>523264</size>
<node>0</node>
</target>
<address type='dimm' slot='0'/>
</memory>
So far, this is just a XML parser/formatter extension. QEMU
driver implementation is in the next commit.
For more info on NVDIMM visit the following web page:
http://pmem.io/
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
While reviewing a patch from Andrea that modified this test case, I
realized that although it was "properly failing" (it's a negative
test), that it was failing for the wrong reason (the MULTIFUNCTION cap
wasn't set in the test case, so it was saying that multifunction=on
wasn't supported by the QEMU binary; instead it should have been
complaining that it had run out of PCI slots of the appropriate type
and couldn't automatically add any more).
This improper failure had started when I added the patch to
automatically aggregate pcie-root-ports onto multiple functions of
each pcie-root slot, but I hadn't noticed it because the test still
failed.
This patch corrects the test case to 1) set the MULTIFUNCTION flag in
the caps, and 2) attempt to add 241 pcie-root-ports to a domain. Since
there are 30 slots available on a pcie-root (slot 0 is reserved, and
slot 31 is used by the integrated SATA controller), and a
pcie-root-port can only be placed on a function of a slot on
pcie-root, the maximum number of pcie-root-ports in any domain is 240.
virQEMUCapsHasPCIMultiBus() performs a version check on
the QEMU binary to figure out whether multiple buses are
supported, so to get the correct aliases assigned when
dealing with pSeries guests we need to spoof the version
accordingly in the test suite.
Up until a while ago, libvirt would automatically add a legacy
PCI controllers combo (dmi-to-pci-bridge + pci-bridge) to any
PCIe machine type (x86_64/q35 and aarch64/virt).
As a result, a number of input and output files in the test
suite ended up containing the legacy PCI controllers, even
though they are not needed or in any way relevant to the
feature being tested.
Get rid of most of the occurrences. Most of the time, this
just means removing the controllers from the input file and
regenerating the output files; in a few instances, some
minor tweaking is performed on the input file, most notably
removing the memory balloon: as memory balloon support was
not the scope of the test being changed, there is no loss
of test coverage from doing so.
Several occurrences of the legacy PCI controllers remain in
the test suite, both because removing their usage would have
required even more tweaking, and because we still want to
have coverage of this perfectly valid combination.
Add a new attribute 'rendernode' to <gl> spice element.
Give it to QEMU if qemu supports it (queued for 2.9).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Due to a logic error, the autofilling of USB port when a bus is
specified:
<address type='usb' bus='0'/>
does not work for non-hub devices on domain startup.
Fix the logic in qemuDomainAssignUSBPortsIterator to also
assign ports for USB addresses that do not yet have one.
https://bugzilla.redhat.com/show_bug.cgi?id=1374128
This patch add support for file memory backing on numa topology.
The specified access mode in memoryBacking can be overriden
by specifying token memAccess in numa cell.
This part introduces new xml elements for file based
memorybacking support and their parsing.
(It allows vhost-user to be used without hugepages.)
New xml elements:
<memoryBacking>
<source type="file|anonymous"/>
<access mode="shared|private"/>
<allocation mode="immediate|ondemand"/>
</memoryBacking>
In order for memory locking to work, the hard limit on memory
locking (and usage) has to be set appropriately by the user.
The documentation mentions the requirement already: with this
patch, it's going to be enforced by runtime checks as well,
by forbidding a non-compliant guest from being defined as well
as edited and started.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1316774
Commit 815d98a started auto-adding one hub if there are more USB devices
than available USB ports.
This was a strange choice, since there might be even more devices.
Before USB address allocation was implemented in libvirt, QEMU
automatically added a new USB hub if the old one was full.
Adjust the logic to try adding as many hubs as will be needed
to plug in all the specified devices.
https://bugzilla.redhat.com/show_bug.cgi?id=1410188
So far we allow to set MTU for libvirt networks. However, not all
domain interfaces have to be plugged into a libvirt network and
even if they are, they might want to have a different MTU (e.g.
for testing purposes).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Set the VIR_PCI_CONNECT_AGGREGATE_SLOT flag for pcie-root-ports so
that they will be assigned to all the functions on a slot.
Some qemu test case outputs had to be adjusted due to the
pcie-root-ports now being put on multiple functions.
If there are multiple devices assigned to the different functions of a
single PCI slot, they will not work properly if the device at function
0 doesn't have its "multi" attribute turned on, so it makes sense for
libvirt to turn it on during PCI address assignment. Setting multi
then assures that the new setting is stored in the config (so it will
be used next time the domain is started), preventing any potential
problems in the case that a future change in the configuration
eliminates the devices on all non-0 functions (multi will still be set
for function 0 even though it is the only function in use on the slot,
which has no useful purpose, but also doesn't cause any problems).
(NB: If we were to instead just decide on the setting for
multifunction at runtime, a later removal of the non-0 functions of a
slot would result in a silent change in the guest ABI for the
remaining device on function 0 (although it may seem like an
inconsequential guest ABI change, it *is* a guest ABI change to turn
off the multi bit).)
virtio-pci is the way forward for aarch64 guests: it's faster
and less alien to people coming from other architectures.
Now that guest support is finally getting there (Fedora 24,
CentOS 7.3, Ubuntu 16.04 and Debian testing all support
virtio-pci out of the box), we'd like to start using it by
default instead of virtio-mmio.
Users and applications can already opt-in by explicitly using
<address type='pci'/>
inside the relevant elements, but that's kind of cumbersome and
requires all users and management applications to adapt, which
we'd really like to avoid.
What we can do instead is use virtio-mmio only if the guest
already has at least one virtio-mmio device, and use virtio-pci
in all other situations.
That means existing virtio-mmio guests will keep using the old
addressing scheme, and new guests will automatically be created
using virtio-pci instead. Users can still override the default
in either direction.
Existing tests such as aarch64-aavmf-virtio-mmio and
aarch64-virtio-pci-default already cover all possible
scenarios, so no additions to the test suites are necessary.
Add a test case for when the QEMU_CAPS_NO_KVM_PIT capability is set.
This capability is mutually exclusive to QEMU_CAPS_KVM_PIT_TICK_POLICY
and results in the same output regardless of whether "discard" or
"delay" was specified in the guest XML for 'tickpolicy'.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
Separate out the "policy=discard" into it's own specific
qemu command line.
We'll rename "kvm-pit-device" test case to be "kvm-pit-discard"
since it has the syntax we'd be using.
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
By a mistake, for the VIR_DOMAIN_TIMER_TICKPOLICY_DELAY qemu
command line creation, 'discard' was used instead of 'delay'
in commit id '1569fa14'.
Test "kvm-pit-delay" is fixed accordingly to show the correct
option being generated.
Remove the (now) redundant kvm-pit-device tests. As it turns
out there is no need to specify both QEMU_CAPS_NO_KVM_PIT and
QEMU_CAPS_KVM_PIT_TICK_POLICY since they are mutually exclusive
and "kvm-pit-device" becomes just the same as "kvm-pit-delay".
Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
Qemu has abandoned the +/-feature syntax in favor of key=value. Some
architectures (s390) do not support +/-feature. So we update libvirt to handle
both formats.
If we detect a sufficiently new Qemu (indicated by support for qmp
query-cpu-model-expansion) we use key=value else we fall back to +/-feature.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Add tests for controller based disks to check disk address compatibility
with disk bus types.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
If you've ever tried running a huge page backed guest under
different user than in qemu.conf, you probably failed. Problem is
even though we have corresponding APIs in the security drivers,
there's no implementation and thus we don't relabel the huge page
path. But even if we did, so far all of the domains share the
same path:
/hugepageMount/libvirt/qemu
Our only option there would be to set 0777 mode on the qemu dir
which is totally unsafe. Therefore, we can create dir on
per-domain basis, i.e.:
/hugepageMount/libvirt/qemu/domainName
and chown domainName dir to the user that domain is configured to
run under.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add in the block I/O throttling group parameter to the command line
if supported. If not supported, fail command creation.
Add the xml2argvtest for testing.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Modify _virDomainBlockIoTuneInfo and rng schema to support the group_name
option for iotune throttling. Document the new value.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Add test cases for address conflicts between disks and hostdevs that are
using drive addresses.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Don't use duplicate disk addresses in test cases unless it's useful. At
least the test case will break once we have a check for uniqueness of
addresses at time of domain definition.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
With the QEMU components in place, provide the XML parsing to
invoke that code when given the following XML snippet:
<hostdev mode='subsystem' type='scsi_host'>
<source protocol='vhost' wwpn='naa.501234567890abcd'/>
</hostdev>
An optional address element can be specified within the hostdev
(pick CCW or PCI as necessary):
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0625'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
Add basic vhost-scsi tests which were cloned from hostdev-scsi-virtio-scsi
in both xml2argv and xml2xml. Added ones for both vhost-scsi-ccw and
vhost-scsi-pci since the syntaxes are slightly different between them.
Also adjusted the docs to describe the changes.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
For machinetypes with a pci-root bus (all legacy PCI), libvirt will
make a "fake" reservation for one extra slot prior to assigning
addresses to unaddressed PCI endpoint devices in the domain. This will
trigger auto-adding of a pci-bridge for the final device to be
assigned an address *if that device would have otherwise instead been
the last device on the last available pci-bridge*; thus it assures
that there will always be at least one slot left open in the domain's
bus topology for expansion (which is important both for hotplug (since
a new pci-bridge can't be added while the guest is running) as well as
for offline additions to the config (since adding a new device might
otherwise in some cases require re-addressing existing devices, which
we want to avoid)).
It's important to note that for the above case (legacy PCI), we must
check for the special case of all slots on all buses being occupied
*prior to assigning any addresses*, and avoid attempting to reserve
the extra address in that case, because there is no free address in
the existing topology, so no place to auto-add a pci-bridge for
expansion (i.e. it would always fail anyway). Since that condition can
only be reached by manual intervention, this is acceptable.
For machinetypes with pcie-root (Q35, aarch64 virt), libvirt's
methodology for automatically expanding the bus topology is different
- pcie-root-ports are plugged into slots (soon to be functions) of
pcie-root as needed, and the new endpoint devices are assigned to the
single slot in each pcie-root-port. This is done so that the devices
are, by default, hotpluggable (the slots of pcie-root don't support
hotplug, but the single slot of the pcie-root-port does). Since
pcie-root-ports can only be plugged into pcie-root, and we don't
auto-assign endpoint devices to the pcie-root slots, this means
topology expansion doesn't compete with endpoint devices for slots, so
we don't need to worry about checking for all "useful" slots being
free *prior* to assigning addresses to new endpoint devices - as a
matter of fact, if we attempt to reserve the open slots before the
used slots, it can lead to errors.
Instead this patch just reserves one slot for a "future potential"
PCIe device after doing the assignment for actual devices, but only
if the only PCI controller defined prior to starting address
assignment was pcie-root, and only if we auto-added at least one PCI
controller during address assignment. This assures two things:
1) that reserving the open slots will only be done when the domain is
initially defined, never at any time after, and
2) that if the user understands enough about PCI controllers that they
are adding them manually, that we don't mess up their plan by
adding extras - if they know enough to add one pcie-root-port, or
to manually assign addresses such that no pcie-root-ports are
needed, they know enough to add extra pcie-root-ports if they want
them (this could be called the "libguestfs clause", since
libguestfs needs to be able to create domains with as few
devices/controllers as possible).
This is set to reserve a single free port for now, but could be
increased in the future if public sentiment goes in that direction
(it's easy to increase later, but essentially impossible to decrease)
Real Q35 hardware has an ICH9 chip that includes several integrated
devices at particular addresses (see the file docs/q35-chipset.cfg in
the qemu source). libvirt already attempts to put the first two sets
of ich9 USB2 controllers it finds at 00:1D.* and 00:1A.* to match the
real hardware. This patch does the same for the ich9 "HD audio"
device.
The main inspiration for this patch is that currently the *only*
device in a reasonable "workstation" type virtual machine config that
requires a legacy PCI slot is the audio device, Without this patch,
the standard Q35 machine created by virt-manager will have a
dmi-to-pci-bridge and a pci-bridge just for the sound device; with the
patch (and if you change the sound device model from the default
"ich6" to "ich9"), the machine definition constructed by virt-manager
has absolutely no legacy PCI controllers - any legacy PCI devices
(e.g. video and sound) are on pcie-root as integrated devices.