214 Commits

Author SHA1 Message Date
Rob Bradford
5e52729453 misc: Automatically fix cargo clippy issues added in 1.65 (stable)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-12-14 14:27:19 +00:00
Sebastien Boeuf
748018ace3 vm-migration: Don't store the id as part of Snapshot structure
The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
5b3bcfa233 vm-migration: Snapshot should have a unique SnapshotDataSection
There's no reason to carry a HashMap of SnapshotDataSection per
Snapshot. And given we now provide at most one SnapshotDataSection per
Snapshot, there's no need to keep the id part of the SnapshotDataSection
structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
30e421d2e5 pci: Remove unused restore() implementations
Now that VirtioPciDevice, VfioPciDevice and VfioUserPciDevice have all
been moved to the new restore design, there's no need to keep the old
way around, therefore the restore() implementations for MsiConfig,
MsixConfig and PciConfiguration can be removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
cc3706afe1 pci, vmm: Move VfioPciDevice and VfioUserPciDevice to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
1eac37bd5f pci: msi: Move MsiConfig to the new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
d6bf1f5eb0 pci: Move VfioCommon creation to a dedicated function
This is some preliminatory work for moving both VfioUser and Vfio to the
new restore design.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
eae8043890 pci, virtio-devices: Move VirtioPciDevice to the new restore design
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.

It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
Wei Liu
c5bd8cabc4 pci: add safety comments
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-11-18 12:50:01 +00:00
Bo Chen
a9ec0f33c0 misc: Fix clippy issues
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-02 09:41:43 +01:00
Sebastien Boeuf
2b150ac2ea pci, virtio-devices: Restore proper BAR type
When restoring a VM, the BAR type can be found directly from the
snapshot resources. It is more reliable than the previous method which
was using self.use_64bit_bar from VirtioPciDevice because at the time
the BARs are allocated, the VirtioDevice hasn't been restored yet,
meaning the way to determine the value of use_64bit_bar is wrong for a
device like vDPA. At this time, the device type is not known and relying
on the stored resources is the only reliable way.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-13 10:03:23 +02:00
Rob Bradford
89a2562c5a pci: Optimise MSI-X table programming
With an updated Linux kernel the kernel reprograms the same table entry
in the MSI-X table multiple times. Fortunately it does so with entry
masked (the default.)

The costly part of MSI-X table reprogramming is going out to the host
kernel to update the GSI routing entries. This change makes three
optimisations.

1. If the table entry is unchanged: skip handking updates
2. Only update the GSI routing table if the table entry is unmasked
   (this skips extra calls to the ioctl() for reprogramming the
    entries.)
3. Only generate a message on entry unmasking if the global MSI-X enable
   bit is set.

Fixes: #4273

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-07 09:57:27 -07:00
Sebastien Boeuf
e45e3df64d pci: vfio: Filter out some PCI extended capabilities
There are PCI extended capabilities that can't be passed through the VM
as they would be unusable from a guest perspective. That's why we
introduce a way to patch what is returned to the guest when the PCI
configuration space is accessed. The list of patches is created from the
parsing of the extended capabilities in that case, and particularly
based on the presence of the SRIOV and Resizable BAR capabilities.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Steven Dake <sdake@lambdal.com>
2022-08-08 18:29:16 +02:00
Steven Dake
868d1f6902 pci: vfio: Preserve prefetchable BAR flag
If the BAR for the VFIO device is marked as prefetchable on the
underlying device ensure that the BAR exposed through PciConfiguration
is also marked as prefetchable.

Fixes problem where NVIDIA devices are not usable with PCI VFIO
passthrough. See related NVIDIA kernel driver bug:
 https://github.com/NVIDIA/open-gpu-kernel-modules/issues/344.

Fixes: #4451

Signed-off-by: Steven Dake <sdake@lambdal.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-08 14:12:09 +01:00
Rob Bradford
b57d7b258d build: Fix beta clippy issue (needless_return)
warning: unneeded `return` statement
   --> pci/src/vfio_user.rs:627:13
    |
627 | /             return Err(std::io::Error::new(
628 | |                 std::io::ErrorKind::Other,
629 | |                 format!("Region not found for 0x{:x}", gpa),
630 | |             ));
    | |_______________^
    |
    = note: `#[warn(clippy::needless_return)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_return
help: remove `return`
    |
627 ~             Err(std::io::Error::new(
628 +                 std::io::ErrorKind::Other,
629 +                 format!("Region not found for 0x{:x}", gpa),
630 +             ))
    |

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
2716bc3311 build: Fix beta clippy issue (derive_partial_eq_without_eq)
warning: you are deriving `PartialEq` and can implement `Eq`
  --> vmm/src/serial_manager.rs:59:30
   |
59 | #[derive(Debug, Clone, Copy, PartialEq)]
   |                              ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Sebastien Boeuf
81ba70a497 pci, vmm: Defer mapping VFIO MMIO regions on restore
When restoring a VM, the restore codepath will take care of mapping the
MMIO regions based on the information from the snapshot, rather than
having the mapping being performed during device creation.

When the device is created, information such as which BARs contain the
MSI-X tables are missing, preventing to perform the mapping of the MMIO
regions.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
7df7061610 pci, vmm: Add migratable support to vfio-user devices
Based on recent changes to VfioUserPciDevice, the vfio-user devices can
now be migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
c021dda267 pci, vmm: Add migratable support to VFIO devices
Based on recent changes to VfioPciDevice, the VFIO devices can now be
migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
f48b05eee6 pci: vfio_user: Implement Migratable for VfioUserPciDevice
Based on the VfioCommon implementation, the VfioUserPciDevice now
implements the Migratable trait.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
3b59e57001 pci: vfio: Implement Migratable for VfioPciDevice
Based on the VfioCommon implementation, the VfioPciDevice now implements
the Migratable trait.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
49069d8474 pci: Implement Migratable for VfioCommon
Introduces the common code to handle one aspect of the migration
support. Particularly, the ability to store VMM internal states related
to such device. The internal state of the device will happen later in a
dedicated patchset that will implement the VFIO migration API.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
8eaefa6e8e pci: msix: Derive Versionize for MsixCap
So that we can save and restore the whole structure through snapshot and
restore operations.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
a1b996ac37 pci: msi: Make MsiCap field public
So that it can be accessed during a VM snapshot to store its state.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Bo Chen
06f57abdf9 pci: vfio: Fix potential mmap leaks from the mmio regions
We can return prematurely from 'map_mmio_regions()' (e.g. when a mmap call
failed for vfio or 'create_user_memory_region()' failed for vfio-user)
without updating the 'MmioRegion::user_memory_regions' with the
information of previous successful mmaps, which in turn would cause mmap
leaks particularly for the case of hotplug where the 'vmm' thread will
keep running. To fix the issue, let's keep 'MmioRegion::user_memory_regions'
updated right after successful mmap calls.

Fixes: #4068

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-06 08:57:16 +02:00
Sebastien Boeuf
1108bd1967 pci: vfio: Implement support for sparse mmap
Reorganizing the code to leverage the same mechanics implemented for
vfio-user and aimed at supporting sparse memory mappings for a single
region.

Relying on the capabilities returned by the vfio-ioctls crate, we create
a list of sparse areas depending if we get SPARSE_MMAP or MSIX_MAPPABLE
capability, or a single sparse area in case we couldn't find any
capability.

The list of sparse areas is then used to create both the memory mappings
in the Cloud Hypervisor address space and the hypervisor user memory
regions.

This allowed for the simplification of the MmioRegion structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 09:28:46 +02:00
Bo Chen
5bd7a1f03c pci: vfio_user: Simplify 'unmap_mmio_regions()'
Instead of always creating a single large mmap for the MMIO region of a
BAR, we create multiple mmaps for the BARs that need multiple kvm user
memory regions. In this way, we can simplify 'unmap_mmio_regions()' (by
reusing information kept from 'MmioRegion::user_memory_region').

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-04 13:53:47 +02:00
Bo Chen
25a38b25c4 pci: vfio_user: Create user memory regions based on sparse areas
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-04 13:53:47 +02:00
Bo Chen
bf39146caa pci: vfio_user: Allow a BAR to have multiple user memory regions
Similar to what's being supported for vfio devices, vfio-user devices
may also have BARs that need multiple kvm user memory regions,
e.g. device regions with `VFIO_REGION_INFO_CAP_SPARSE_MMAP`.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-04 13:53:47 +02:00
Sebastien Boeuf
4a99d3dbaf pci: Move VfioWrapper to VfioCommon
Extend VfioCommon to simplify the overall code, and also in preparation
for supporting the restore code path.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-22 16:16:48 +02:00
Sebastien Boeuf
e6aa792c01 pci: Store legacy interrupt manager in VfioCommon
Extend VfioCommon structure to own the legacy interrupt manager. This
will be useful for implementing the restore code path.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-22 16:16:48 +02:00
Sebastien Boeuf
eb6daa2fc3 pci: Store MSI interrupt manager in VfioCommon
Extend VfioCommon structure to own the MSI interrupt manager. This will
be useful for implementing the restore code path.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-22 16:16:48 +02:00
Sebastien Boeuf
f767e97fa5 pci: vfio: Split PCI capability parsing functions
We need to split the parsing functions into one function dedicated to
the actual parsing and a second function for initializing the interrupt
type. This will be useful on the restore path as the parsing won't be
needed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-22 16:16:48 +02:00
Sebastien Boeuf
f076819d81 pci: msi: Implement Snapshot for MsiConfig
Adding support for snapshot and restore to the MsiConfig structure, as
it will be needed part of VFIO migration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-22 16:16:48 +02:00
Sebastien Boeuf
37521ddff7 pci: vfio: Restore BARs in a more straightforward way
In case a list of resources is provided to allocate_bars(), it directly
means we're restoring some existing BARs. That's why we shouldn't share
the codepath that creates BARs from scratch as we don't need to interact
with the device to retrieve the information.

Whenever resources are provided, we simply iterate over the list of
possible BAR indexes and create the BARs if the resource could be found.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
11e9f43305 vmm: Use new Resource type PciBar
Instead of defining some very generic resources as PioAddressRange or
MmioAddressRange for each PCI BAR, let's move to the new Resource type
PciBar in order to make things clearer. This allows the code for being
more readable, but also removes the need for hard assumptions about the
MMIO and PIO ranges. PioAddressRange and MmioAddressRange types can be
used to describe everything except PCI BARs. BARs are very special as
they can be relocated and have special information we want to carry
along with them.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
89218b6d1e pci: Replace BAR tuple with PciBarConfiguration
In order to make the code more consistent and easier to read, we remove
the former tuple that was used to describe a BAR, replacing it with the
existing structure PciBarConfiguration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
da95c0d784 pci: Clarify register index and BAR index
The code was quite unclear regarding the type of index that was being
used regarding a BAR. This is improved by differenciating register
indexes and BAR indexes more clearly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
5264d545dd pci, vmm: Extend PciDevice trait to support BAR relocation
By adding a new method id() to the PciDevice trait, we allow the caller
to retrieve a unique identifier. This is used in the context of BAR
relocation to identify the device being relocated, so that we can update
the DeviceTree resources for all PCI devices (and not only
VirtioPciDevice).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
6175cc0977 pci: vfio: Restore BARs at expected addresses
Relying on the list of resources, VfioCommon is now able to allocate the
BARs at specific addresses. This will be useful for restoring VFIO and
vfio-user devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
6e084572d4 pci, virtio: Make virtio-pci BAR restoration more generic
Updating the way of restoring BAR addresses for virtio-pci by providing
a more generic approach that will be reused for other PciDevice
implementations (i.e VfioPcidevice and VfioUserPciDevice).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Rob Bradford
ed87e42e6f vm-device, pci, devices: Remove InterruptSourceGroup::{un}mask
The calls to these functions are always preceded by a call to
InterruptSourceGroup::update(). By adding a masked boolean to that
function call it possible to remove 50% of the calls to the
KVM_SET_GSI_ROUTING ioctl as the the update will correctly handle the
masked or unmasked case.

This causes the ioctl to disappear from the perf report for a boot of
the VM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-11 22:56:48 +01:00
Sebastien Boeuf
028961ac9f vfio_user: Shutdown the communication with the backend
Whenever a vfio-user device is dropped, the communication between the
VMM and the backend should be shutdown.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:45:26 +01:00
Sebastien Boeuf
23fb4fa26d pci: Allow only writable bits for MSI message control register
The same way we mask the writes coming from the guest to the message
control register related to MSI-X capability, let's do the same for MSI.

The point is to prevent the guest from writing to read-only bits.

The correct writable bits for MSI are only bits 0, 4, 5 and 6 of 2nd
16-bit word.

Those are:

* MSI Enable: 0
* Multiple Message Enable: 6-4

See "Table 7-39 Message Control Register for MSI" from
"NCB-PCI_Express_Base_5.0r1.0-2019-05-22.pdf".

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-23 14:01:55 +01:00
Rob Bradford
a116add991 pci: configuration: Correctly mask MSI-X control register
I incorrectly used the MSI message control register values for the mask
not the the MSI-X control registers.

The correct writable fields for MSI-X are only bits 14 and 15 of 2nd
16-bit word.

Those are:

* Function Mask: 14
* MSI-X Enable: 15

See "Table 7-47 Message Control Register for MSI-X" from
"NCB-PCI_Express_Base_5.0r1.0-2019-05-22.pdf"

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-22 16:33:49 +00:00
Wei Liu
6f571d5c07 pci: add debug output for enabling and disabling MSI-X
This helps debugging MSI-X issues.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-02-17 17:33:40 +01:00
Rob Bradford
9c6e7c4a4b pci: Support DWORD/4-byte writes to the MSI-X control register
The PCI spec does not specify that the access has to be of a specific
size.

Fixes: #3714

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-17 14:13:45 +00:00
Rob Bradford
d9eff12ba3 pci: Only allow writes to RW bits in MSI-X register
The PCI spec specifies that only the following bits are writable:

16: MSI Enable
20,21,22: Multiple Message Enable
26: Extended Message Data Enable

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-16 22:48:54 +00:00
Sebastien Boeuf
42b5d4a2f7 pci, vmm: Update DeviceNode to store PciBdf instead of u32
By having the DeviceNode storing a PciBdf, we simplify the internal code
as well as allow for custom Serialize/Deserialize implementation for the
PciBdf structure. These custom implementations let us display the PCI
s/b/d/f in a human readable format.

Fixes #3711

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-16 11:57:23 +00:00
Michael Zhao
1db7718589 pci, vmm: Pass PCI BDF to vfio and vfio_user
On AArch64, PCI BDF is used for devId in MSI-X routing entry.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-01-18 18:00:00 -08:00