The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no reason to carry a HashMap of SnapshotDataSection per
Snapshot. And given we now provide at most one SnapshotDataSection per
Snapshot, there's no need to keep the id part of the SnapshotDataSection
structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that VirtioPciDevice, VfioPciDevice and VfioUserPciDevice have all
been moved to the new restore design, there's no need to keep the old
way around, therefore the restore() implementations for MsiConfig,
MsixConfig and PciConfiguration can be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is some preliminatory work for moving both VfioUser and Vfio to the
new restore design.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.
It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When restoring a VM, the BAR type can be found directly from the
snapshot resources. It is more reliable than the previous method which
was using self.use_64bit_bar from VirtioPciDevice because at the time
the BARs are allocated, the VirtioDevice hasn't been restored yet,
meaning the way to determine the value of use_64bit_bar is wrong for a
device like vDPA. At this time, the device type is not known and relying
on the stored resources is the only reliable way.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With an updated Linux kernel the kernel reprograms the same table entry
in the MSI-X table multiple times. Fortunately it does so with entry
masked (the default.)
The costly part of MSI-X table reprogramming is going out to the host
kernel to update the GSI routing entries. This change makes three
optimisations.
1. If the table entry is unchanged: skip handking updates
2. Only update the GSI routing table if the table entry is unmasked
(this skips extra calls to the ioctl() for reprogramming the
entries.)
3. Only generate a message on entry unmasking if the global MSI-X enable
bit is set.
Fixes: #4273
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There are PCI extended capabilities that can't be passed through the VM
as they would be unusable from a guest perspective. That's why we
introduce a way to patch what is returned to the guest when the PCI
configuration space is accessed. The list of patches is created from the
parsing of the extended capabilities in that case, and particularly
based on the presence of the SRIOV and Resizable BAR capabilities.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Steven Dake <sdake@lambdal.com>
If the BAR for the VFIO device is marked as prefetchable on the
underlying device ensure that the BAR exposed through PciConfiguration
is also marked as prefetchable.
Fixes problem where NVIDIA devices are not usable with PCI VFIO
passthrough. See related NVIDIA kernel driver bug:
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/344.
Fixes: #4451
Signed-off-by: Steven Dake <sdake@lambdal.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: you are deriving `PartialEq` and can implement `Eq`
--> vmm/src/serial_manager.rs:59:30
|
59 | #[derive(Debug, Clone, Copy, PartialEq)]
| ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When restoring a VM, the restore codepath will take care of mapping the
MMIO regions based on the information from the snapshot, rather than
having the mapping being performed during device creation.
When the device is created, information such as which BARs contain the
MSI-X tables are missing, preventing to perform the mapping of the MMIO
regions.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the VfioCommon implementation, the VfioUserPciDevice now
implements the Migratable trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the VfioCommon implementation, the VfioPciDevice now implements
the Migratable trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduces the common code to handle one aspect of the migration
support. Particularly, the ability to store VMM internal states related
to such device. The internal state of the device will happen later in a
dedicated patchset that will implement the VFIO migration API.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
So that we can save and restore the whole structure through snapshot and
restore operations.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We can return prematurely from 'map_mmio_regions()' (e.g. when a mmap call
failed for vfio or 'create_user_memory_region()' failed for vfio-user)
without updating the 'MmioRegion::user_memory_regions' with the
information of previous successful mmaps, which in turn would cause mmap
leaks particularly for the case of hotplug where the 'vmm' thread will
keep running. To fix the issue, let's keep 'MmioRegion::user_memory_regions'
updated right after successful mmap calls.
Fixes: #4068
Signed-off-by: Bo Chen <chen.bo@intel.com>
Reorganizing the code to leverage the same mechanics implemented for
vfio-user and aimed at supporting sparse memory mappings for a single
region.
Relying on the capabilities returned by the vfio-ioctls crate, we create
a list of sparse areas depending if we get SPARSE_MMAP or MSIX_MAPPABLE
capability, or a single sparse area in case we couldn't find any
capability.
The list of sparse areas is then used to create both the memory mappings
in the Cloud Hypervisor address space and the hypervisor user memory
regions.
This allowed for the simplification of the MmioRegion structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of always creating a single large mmap for the MMIO region of a
BAR, we create multiple mmaps for the BARs that need multiple kvm user
memory regions. In this way, we can simplify 'unmap_mmio_regions()' (by
reusing information kept from 'MmioRegion::user_memory_region').
Signed-off-by: Bo Chen <chen.bo@intel.com>
Similar to what's being supported for vfio devices, vfio-user devices
may also have BARs that need multiple kvm user memory regions,
e.g. device regions with `VFIO_REGION_INFO_CAP_SPARSE_MMAP`.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Extend VfioCommon to simplify the overall code, and also in preparation
for supporting the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend VfioCommon structure to own the legacy interrupt manager. This
will be useful for implementing the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend VfioCommon structure to own the MSI interrupt manager. This will
be useful for implementing the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We need to split the parsing functions into one function dedicated to
the actual parsing and a second function for initializing the interrupt
type. This will be useful on the restore path as the parsing won't be
needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding support for snapshot and restore to the MsiConfig structure, as
it will be needed part of VFIO migration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case a list of resources is provided to allocate_bars(), it directly
means we're restoring some existing BARs. That's why we shouldn't share
the codepath that creates BARs from scratch as we don't need to interact
with the device to retrieve the information.
Whenever resources are provided, we simply iterate over the list of
possible BAR indexes and create the BARs if the resource could be found.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of defining some very generic resources as PioAddressRange or
MmioAddressRange for each PCI BAR, let's move to the new Resource type
PciBar in order to make things clearer. This allows the code for being
more readable, but also removes the need for hard assumptions about the
MMIO and PIO ranges. PioAddressRange and MmioAddressRange types can be
used to describe everything except PCI BARs. BARs are very special as
they can be relocated and have special information we want to carry
along with them.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to make the code more consistent and easier to read, we remove
the former tuple that was used to describe a BAR, replacing it with the
existing structure PciBarConfiguration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code was quite unclear regarding the type of index that was being
used regarding a BAR. This is improved by differenciating register
indexes and BAR indexes more clearly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding a new method id() to the PciDevice trait, we allow the caller
to retrieve a unique identifier. This is used in the context of BAR
relocation to identify the device being relocated, so that we can update
the DeviceTree resources for all PCI devices (and not only
VirtioPciDevice).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the list of resources, VfioCommon is now able to allocate the
BARs at specific addresses. This will be useful for restoring VFIO and
vfio-user devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Updating the way of restoring BAR addresses for virtio-pci by providing
a more generic approach that will be reused for other PciDevice
implementations (i.e VfioPcidevice and VfioUserPciDevice).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The calls to these functions are always preceded by a call to
InterruptSourceGroup::update(). By adding a masked boolean to that
function call it possible to remove 50% of the calls to the
KVM_SET_GSI_ROUTING ioctl as the the update will correctly handle the
masked or unmasked case.
This causes the ioctl to disappear from the perf report for a boot of
the VM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Whenever a vfio-user device is dropped, the communication between the
VMM and the backend should be shutdown.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The same way we mask the writes coming from the guest to the message
control register related to MSI-X capability, let's do the same for MSI.
The point is to prevent the guest from writing to read-only bits.
The correct writable bits for MSI are only bits 0, 4, 5 and 6 of 2nd
16-bit word.
Those are:
* MSI Enable: 0
* Multiple Message Enable: 6-4
See "Table 7-39 Message Control Register for MSI" from
"NCB-PCI_Express_Base_5.0r1.0-2019-05-22.pdf".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
I incorrectly used the MSI message control register values for the mask
not the the MSI-X control registers.
The correct writable fields for MSI-X are only bits 14 and 15 of 2nd
16-bit word.
Those are:
* Function Mask: 14
* MSI-X Enable: 15
See "Table 7-47 Message Control Register for MSI-X" from
"NCB-PCI_Express_Base_5.0r1.0-2019-05-22.pdf"
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The PCI spec specifies that only the following bits are writable:
16: MSI Enable
20,21,22: Multiple Message Enable
26: Extended Message Data Enable
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By having the DeviceNode storing a PciBdf, we simplify the internal code
as well as allow for custom Serialize/Deserialize implementation for the
PciBdf structure. These custom implementations let us display the PCI
s/b/d/f in a human readable format.
Fixes#3711
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>