cloud-hypervisor/vmm
Sebastien Boeuf b3bef3adda vmm: acpi: Don't declare MMIO config space through PCI buses
The PCI buses should not declare the address space related to the MMIO
config space given it's already declared in the MCFG table and through
the motherboard device PNP0C02 in the DSDT table.

The PCI MMIO config region for the segment was being wrongly exposed as
part of the _CRS for the ACPI bus device (using Memory32Fixed). Exposing
it via this object was ineffectual as the equivalent entry in the
PNP0C02 (_SB_.MBRD) marked those ranges as not usable via the kernel.
Either way, with both devices used by the kernel, the kernel will not
try and use those memory ranges for the device BARs. However under
td-shim on TDX the PNP0C02 device is not on the permitted list of
devices so the the memory ranges were not marked as unusable resulting
in the kernel attempting to allocate BARs that collided with the PCI
MMIO configuration space.

This is based on the kernel documentation PCI/acpi-info.rst which relies
on ACPI and PCI Firmware specifications. And here are the interesting
quotes from this document:

"""
Prior to the addition of Extended Address Space descriptors, the failure
of Consumer/Producer meant there was no way to describe bridge registers
in the PNP0A03/PNP0A08 device itself. The workaround was to describe the
bridge registers (including ECAM space) in PNP0C02 catch-all devices.
With the exception of ECAM, the bridge register space is device-specific
anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need
to know about it.

PNP0C02 “motherboard” devices are basically a catch-all. There’s no
programming model for them other than “don’t use these resources for
anything else.” So a PNP0C02 _CRS should claim any address space that is
(1) not claimed by _CRS under any other device object in the ACPI
namespace and (2) should not be assigned by the OS to something else.

The address range reported in the MCFG table or by _CBA method (see
Section 4.1.3) must be reserved by declaring a motherboard resource. For
most systems, the motherboard resource would appear at the root of the
ACPI namespace (under _SB) in a node with a _HID of EISAID (PNP0C02),
and the resources in this case should not be claimed in the root PCI
bus’s _CRS. The resources can optionally be returned in Int15 E820 or
EFIGetMemoryMap as reserved memory but must always be reported through
ACPI as a motherboard resource.
"""

This change has been manually tested by running a VM with multiple
segments (4 segments), and by hotplugging an additional disk to the
segment number 2 (third segment).

From one shell:
"""
cloud-hypervisor \
    --cpus boot=1 \
    --memory size=1G \
    --kernel vmlinux \
    --cmdline "root=/dev/vda1 rw console=hvc0" \
    --disk path=jammy-server-cloudimg.raw \
    --api-socket /tmp/ch.sock \
    --platform num_pci_segments=4
"""

From another shell (after the VM is booted):
"""
ch-remote \
    --api-socket=/tmp/ch.sock \
    add-disk \
    path=test-disk.raw,id=disk2,pci_segment=2
"""

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-02 14:14:23 +02:00
..
src vmm: acpi: Don't declare MMIO config space through PCI buses 2022-09-02 14:14:23 +02:00
Cargo.toml build: bump clap from 3.2.19 to 3.2.20 2022-09-02 02:07:12 +00:00