We can return prematurely from 'map_mmio_regions()' (e.g. when an mmap
call fails for vfio or 'create_user_memory_region()' fails for vfio-user)
without updating 'MmioRegion::user_memory_regions' with the information
about the previous successful mmaps. This leaks those mmaps, particularly
in the hotplug case where the 'vmm' thread keeps running. To fix the
issue, let's update 'MmioRegion::user_memory_regions' right after each
successful mmap call.
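A minimal sketch of the fix, assuming trimmed-down, hypothetical versions of 'UserMemoryRegion' and 'MmioRegion' and a caller-supplied closure standing in for the real mmap call. Each record is pushed right after its mmap succeeds, so an early return through `?` still leaves every earlier mapping visible to the cleanup path.

```rust
/// Hypothetical, trimmed-down record of one hypervisor user memory region.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UserMemoryRegion {
    slot: u32,
    host_addr: u64,
    size: u64,
}

/// Hypothetical, trimmed-down MmioRegion holding the mappings made so far.
#[derive(Debug, Default)]
struct MmioRegion {
    user_memory_regions: Vec<UserMemoryRegion>,
}

/// `mmap` stands in for the real mapping call (assumption of this sketch):
/// it takes a size and returns a host address or an error.
fn map_mmio_regions<F>(region: &mut MmioRegion, sizes: &[u64], mut mmap: F) -> Result<(), String>
where
    F: FnMut(u64) -> Result<u64, String>,
{
    for (slot, &size) in sizes.iter().enumerate() {
        // On failure, `?` returns early...
        let host_addr = mmap(size)?;
        // ...but every mapping that already succeeded has been recorded
        // here, so the unmap/cleanup path can still find and release it.
        region.user_memory_regions.push(UserMemoryRegion {
            slot: slot as u32,
            host_addr,
            size,
        });
    }
    Ok(())
}
```

Collecting the records in a local vector and assigning it only at the end would drop them on the error path, which is the leak described above.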
Fixes: #4068
Signed-off-by: Bo Chen <chen.bo@intel.com>
Reorganize the code to leverage the same mechanism implemented for
vfio-user, which is aimed at supporting sparse memory mappings for a
single region.
Relying on the capabilities returned by the vfio-ioctls crate, we create
a list of sparse areas depending on whether we get the SPARSE_MMAP or
MSIX_MAPPABLE capability, or a single sparse area if we can't find any
relevant capability.
The list of sparse areas is then used to create both the memory mappings
in the Cloud Hypervisor address space and the hypervisor user memory
regions.
This allowed the MmioRegion structure to be simplified.
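A sketch of how the list of sparse areas could be built. The `RegionCap` enum is a hypothetical stand-in for the capabilities reported by vfio-ioctls, and the MSIX_MAPPABLE branch assumes that the MSI-X table/PBA range (passed in as `msix_range`) must stay unmapped so guest accesses to it still trap; any alignment checks a real implementation would need are left out.

```rust
/// One mmap-able window inside a VFIO region, relative to the region start.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SparseArea {
    offset: u64,
    size: u64,
}

/// Hypothetical stand-in for the region capabilities reported by vfio-ioctls.
enum RegionCap {
    SparseMmap(Vec<SparseArea>),
    MsixMappable,
    Other,
}

/// `msix_range` is the [start, end) byte range of the MSI-X structures in
/// the region, if any (an assumption of this sketch).
fn generate_sparse_areas(
    caps: &[RegionCap],
    region_size: u64,
    msix_range: Option<(u64, u64)>,
) -> Vec<SparseArea> {
    for cap in caps {
        match cap {
            // The kernel already told us which parts can be mapped.
            RegionCap::SparseMmap(areas) => return areas.clone(),
            // Map everything except the MSI-X range, which must keep trapping.
            RegionCap::MsixMappable => {
                let (start, end) = match msix_range {
                    Some(range) => range,
                    None => return vec![SparseArea { offset: 0, size: region_size }],
                };
                let mut areas = Vec::new();
                if start > 0 {
                    areas.push(SparseArea { offset: 0, size: start });
                }
                if end < region_size {
                    areas.push(SparseArea { offset: end, size: region_size - end });
                }
                return areas;
            }
            RegionCap::Other => {}
        }
    }
    // No relevant capability: a single area covering the whole region.
    vec![SparseArea { offset: 0, size: region_size }]
}
```

Each returned area then yields one host mapping and one hypervisor user memory region.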
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of always creating a single large mmap for the MMIO region of a
BAR, we create multiple mmaps for the BARs that need multiple kvm user
memory regions. This way, we can simplify 'unmap_mmio_regions()' by
reusing the information kept in 'MmioRegion::user_memory_regions'.
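A sketch of the one-mmap-per-area approach, assuming hypothetical `SparseArea`/`UserMemoryRegion` types and closures standing in for the real mmap and munmap calls. Since every region records its own host address and size, unmapping only needs to walk the stored list.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SparseArea {
    offset: u64,
    size: u64,
}

/// Hypothetical record kept per mapping (one per sparse area).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UserMemoryRegion {
    guest_phys_addr: u64,
    host_addr: u64,
    size: u64,
}

/// Create one mapping per area; `mmap(offset, size)` returns a host address.
fn map_bar<M>(bar_addr: u64, areas: &[SparseArea], mut mmap: M) -> Vec<UserMemoryRegion>
where
    M: FnMut(u64, u64) -> u64,
{
    areas
        .iter()
        .map(|a| UserMemoryRegion {
            guest_phys_addr: bar_addr + a.offset,
            host_addr: mmap(a.offset, a.size),
            size: a.size,
        })
        .collect()
}

/// Unmapping reuses the stored records: no need to recompute the layout.
fn unmap_mmio_regions<U: FnMut(u64, u64)>(regions: &mut Vec<UserMemoryRegion>, mut munmap: U) {
    for r in regions.drain(..) {
        munmap(r.host_addr, r.size);
    }
}
```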
Signed-off-by: Bo Chen <chen.bo@intel.com>
Similar to what's being supported for vfio devices, vfio-user devices
may also have BARs that need multiple kvm user memory regions,
e.g. device regions with `VFIO_REGION_INFO_CAP_SPARSE_MMAP`.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Extend VfioCommon to simplify the overall code, and also in preparation
for supporting the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend the VfioCommon structure to own the legacy interrupt manager. This
will be useful for implementing the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend the VfioCommon structure to own the MSI interrupt manager. This
will be useful for implementing the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We need to split the parsing functions into one function dedicated to
the actual parsing and a second function for initializing the interrupt
type. This will be useful on the restore path as the parsing won't be
needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add support for snapshot and restore to the MsiConfig structure, as it
will be needed as part of VFIO migration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If a list of resources is provided to allocate_bars(), it means we're
restoring some existing BARs. That's why we shouldn't share the codepath
that creates BARs from scratch, as we don't need to interact with the
device to retrieve the information.
Whenever resources are provided, we simply iterate over the list of
possible BAR indexes and create each BAR for which a resource can be
found.
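A sketch of the restore path described above. The `Resource` enum and `RestoredBar` struct are hypothetical, trimmed-down stand-ins (only a `PciBar` variant and one non-BAR variant), and the sketch ignores that a 64-bit BAR occupies two BAR slots.

```rust
/// Hypothetical subset of a resource list saved with the device.
#[derive(Debug, Clone, PartialEq)]
enum Resource {
    MmioAddressRange { base: u64, size: u64 },
    PciBar { index: usize, base: u64, size: u64, prefetchable: bool },
}

/// Hypothetical result of restoring one BAR.
#[derive(Debug, Clone, PartialEq)]
struct RestoredBar {
    index: usize,
    addr: u64,
    size: u64,
    prefetchable: bool,
}

const NUM_BARS: usize = 6;

/// Walk every possible BAR index and rebuild only the BARs that have a
/// matching PciBar resource; the device itself is never queried.
fn restore_bars(resources: &[Resource]) -> Vec<RestoredBar> {
    (0..NUM_BARS)
        .filter_map(|bar_id| {
            resources.iter().find_map(|r| match r {
                Resource::PciBar { index, base, size, prefetchable } if *index == bar_id => {
                    Some(RestoredBar {
                        index: bar_id,
                        addr: *base,
                        size: *size,
                        prefetchable: *prefetchable,
                    })
                }
                _ => None,
            })
        })
        .collect()
}
```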
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of defining some very generic resources as PioAddressRange or
MmioAddressRange for each PCI BAR, let's move to the new Resource type
PciBar in order to make things clearer. This makes the code more
readable, and also removes the need for hard assumptions about the
MMIO and PIO ranges. The PioAddressRange and MmioAddressRange types can be
used to describe everything except PCI BARs. BARs are very special as
they can be relocated and have special information we want to carry
along with them.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to make the code more consistent and easier to read, we remove
the former tuple that was used to describe a BAR, replacing it with the
existing structure PciBarConfiguration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code was quite unclear about the type of index being used for a BAR.
This is improved by differentiating register indexes and BAR indexes
more clearly.
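A sketch of the distinction. In a type-0 configuration header, BAR0 lives at byte offset 0x10, i.e. 32-bit register index 4, and the six BARs occupy registers 4 through 9. The newtypes and the `BAR0_REG`/`NUM_BAR_REGS` constants here are illustrative; they turn mixing the two kinds of index into a compile error.

```rust
/// 32-bit register index of BAR0 (byte offset 0x10) in a type-0 header.
const BAR0_REG: usize = 4;
/// Number of BAR registers in a type-0 header.
const NUM_BAR_REGS: usize = 6;

/// Index of a 32-bit register in configuration space.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RegisterIndex(usize);

/// Index of a BAR (0..=5).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BarIndex(usize);

impl RegisterIndex {
    /// Byte offset of this register in configuration space.
    fn byte_offset(self) -> usize {
        self.0 * 4
    }
}

fn bar_to_reg(bar: BarIndex) -> RegisterIndex {
    RegisterIndex(BAR0_REG + bar.0)
}

/// Returns None for registers that are not BAR registers.
fn reg_to_bar(reg: RegisterIndex) -> Option<BarIndex> {
    if (BAR0_REG..BAR0_REG + NUM_BAR_REGS).contains(&reg.0) {
        Some(BarIndex(reg.0 - BAR0_REG))
    } else {
        None
    }
}
```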
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding a new method id() to the PciDevice trait, we allow the caller
to retrieve a unique identifier. This is used in the context of BAR
relocation to identify the device being relocated, so that we can update
the DeviceTree resources for all PCI devices (and not only
VirtioPciDevice).
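A sketch of how an identifier lets relocation find the right entry for any device type. The `Option<String>` return type of `id()`, the `FakeVfioDevice` type, and the `BarTree` map (a stand-in for the DeviceTree) are all assumptions of this illustration.

```rust
use std::collections::HashMap;

/// Minimal stand-in for the PciDevice trait (return type is an assumption).
trait PciDevice {
    fn id(&self) -> Option<String>;
}

/// Hypothetical device type used only for illustration.
struct FakeVfioDevice {
    name: String,
}

impl PciDevice for FakeVfioDevice {
    fn id(&self) -> Option<String> {
        Some(self.name.clone())
    }
}

/// Hypothetical stand-in for the DeviceTree: device id -> BAR base addresses.
type BarTree = HashMap<String, Vec<u64>>;

/// Record a BAR move for the device being relocated, whatever its concrete
/// type. Returns true if a matching entry was found and updated.
fn update_relocated_bar(tree: &mut BarTree, dev: &dyn PciDevice, old_base: u64, new_base: u64) -> bool {
    let id = match dev.id() {
        Some(id) => id,
        None => return false,
    };
    match tree.get_mut(&id) {
        Some(bars) => {
            let mut updated = false;
            for base in bars.iter_mut() {
                if *base == old_base {
                    *base = new_base;
                    updated = true;
                }
            }
            updated
        }
        None => false,
    }
}
```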
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the list of resources, VfioCommon is now able to allocate the
BARs at specific addresses. This will be useful for restoring VFIO and
vfio-user devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Update the way BAR addresses are restored for virtio-pci by providing a
more generic approach that will be reused for other PciDevice
implementations (i.e. VfioPciDevice and VfioUserPciDevice).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The calls to these functions are always preceded by a call to
InterruptSourceGroup::update(). By adding a masked boolean to that
function call, it is possible to remove 50% of the calls to the
KVM_SET_GSI_ROUTING ioctl, as the update will correctly handle the
masked or unmasked case.
This causes the ioctl to disappear from the perf report for a boot of
the VM.
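A toy model of where the 50% comes from. `MockMsiGroup` is hypothetical and only counts how many times a routing table would be committed (the stand-in for KVM_SET_GSI_ROUTING); it assumes, as described above, that the old path issued one commit for the update and another for the separate unmask.

```rust
/// Hypothetical interrupt group that only counts routing commits.
struct MockMsiGroup {
    /// Number of simulated KVM_SET_GSI_ROUTING ioctls.
    routing_ioctls: u32,
    masked: Vec<bool>,
}

impl MockMsiGroup {
    fn new(vectors: usize) -> Self {
        Self { routing_ioctls: 0, masked: vec![true; vectors] }
    }

    fn commit_routing(&mut self) {
        self.routing_ioctls += 1;
    }

    /// Old style: update the route...
    fn update_legacy(&mut self, _index: usize) {
        self.commit_routing();
    }

    /// ...then unmask it in a separate call, committing the routing again.
    fn unmask_legacy(&mut self, index: usize) {
        self.masked[index] = false;
        self.commit_routing();
    }

    /// New style: the mask state travels with the update, one commit total.
    fn update(&mut self, index: usize, masked: bool) {
        self.masked[index] = masked;
        self.commit_routing();
    }

    fn is_masked(&self, index: usize) -> bool {
        self.masked[index]
    }
}
```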
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The Rust 2021 edition has a few improvements over the 2018 edition.
Migrate the project to the 2021 edition by following the recommended
migration steps.
Luckily, the code itself doesn't require fixing.
Bump MSRV to 1.56 as it is required by the 2021 edition. Also fix the
clap build dependency to make Cloud Hypervisor build again.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Whenever a vfio-user device is dropped, the communication between the
VMM and the backend should be shut down.
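A minimal sketch of shutting the channel down from `Drop`, assuming (hypothetically) that the device owns a `UnixStream` to the backend. `shutdown` acts on the socket itself rather than on one file descriptor, so the backend sees end-of-file even if a clone of the stream is still held elsewhere (for instance by another thread).

```rust
use std::io::Read;
use std::net::Shutdown;
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Hypothetical device holding the socket connected to the vfio-user backend.
struct VfioUserDevice {
    stream: UnixStream,
}

impl Drop for VfioUserDevice {
    fn drop(&mut self) {
        // Errors are ignored: the backend may already have gone away.
        let _ = self.stream.shutdown(Shutdown::Both);
    }
}
```

Closing only the device's own descriptor would not be enough if another handle kept the socket open; the explicit shutdown covers that case.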
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In the same way we mask the writes coming from the guest to the message
control register of the MSI-X capability, let's do the same for MSI.
The point is to prevent the guest from writing to read-only bits.
The only writable bits for MSI are bits 0, 4, 5 and 6 of the 2nd
16-bit word.
Those are:
* MSI Enable: 0
* Multiple Message Enable: 6-4
See "Table 7-39 Message Control Register for MSI" from
"NCB-PCI_Express_Base_5.0r1.0-2019-05-22.pdf".
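The masking itself is a read-modify-write: keep the current value of every read-only bit and take only the writable bits from the guest. With bit 0 (0x0001) and bits 6:4 (0x0070) writable, the mask is 0x0071. The constant and function names below are illustrative.

```rust
/// Writable bits of the MSI Message Control register (2nd 16-bit word):
/// bit 0 = MSI Enable, bits 6:4 = Multiple Message Enable.
const MSI_CTL_WRITABLE: u16 = 0x0071;

/// Merge a guest write into the current register value, preserving all
/// read-only bits (e.g. bit 7, 64-bit Address Capable).
fn apply_msi_ctl_write(current: u16, guest_value: u16) -> u16 {
    (current & !MSI_CTL_WRITABLE) | (guest_value & MSI_CTL_WRITABLE)
}
```

For example, with bit 7 set in the current value, a guest write of 0xffff yields 0x00f1 and a write of 0x0000 leaves bit 7 in place.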
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
I incorrectly used the MSI message control register values for the mask
instead of the MSI-X ones.
The only writable fields for MSI-X are bits 14 and 15 of the 2nd
16-bit word.
Those are:
* Function Mask: 14
* MSI-X Enable: 15
See "Table 7-47 Message Control Register for MSI-X" from
"NCB-PCI_Express_Base_5.0r1.0-2019-05-22.pdf".
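The same read-modify-write with the MSI-X mask: bits 14 and 15 give 0xc000, so the read-only Table Size field in bits 10:0 always survives a guest write. The names below are illustrative.

```rust
/// MSI-X Message Control: bit 14 = Function Mask, bit 15 = MSI-X Enable.
const MSIX_CTL_FUNCTION_MASK: u16 = 1 << 14;
const MSIX_CTL_ENABLE: u16 = 1 << 15;
const MSIX_CTL_WRITABLE: u16 = MSIX_CTL_FUNCTION_MASK | MSIX_CTL_ENABLE;

/// Merge a guest write, keeping read-only bits such as Table Size (10:0).
fn apply_msix_ctl_write(current: u16, guest_value: u16) -> u16 {
    (current & !MSIX_CTL_WRITABLE) | (guest_value & MSIX_CTL_WRITABLE)
}
```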
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The PCI spec specifies that only the following bits are writable:
16: MSI Enable
20,21,22: Multiple Message Enable
26: Extended Message Data Enable
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By having the DeviceNode store a PciBdf, we simplify the internal code
and allow for custom Serialize/Deserialize implementations for the
PciBdf structure. These custom implementations let us display the PCI
s/b/d/f in a human-readable format.
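A sketch of the human-readable segment:bus:device.function format. It uses `Display`/`FromStr` instead of serde so it needs only std; a custom Serialize/Deserialize could delegate to the same string form. The packed `u32` layout (segment 31:16, bus 15:8, device 7:3, function 2:0) is an assumption of this illustration.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical packed BDF: segment 31:16, bus 15:8, device 7:3, function 2:0.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PciBdf(u32);

impl PciBdf {
    fn new(segment: u16, bus: u8, device: u8, function: u8) -> Self {
        PciBdf(
            ((segment as u32) << 16)
                | ((bus as u32) << 8)
                | (((device as u32) & 0x1f) << 3)
                | ((function as u32) & 0x7),
        )
    }
    fn segment(&self) -> u16 { (self.0 >> 16) as u16 }
    fn bus(&self) -> u8 { (self.0 >> 8) as u8 }
    fn device(&self) -> u8 { ((self.0 >> 3) & 0x1f) as u8 }
    fn function(&self) -> u8 { (self.0 & 0x7) as u8 }
}

impl fmt::Display for PciBdf {
    /// Formats as "SSSS:BB:DD.F" in hexadecimal.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:04x}:{:02x}:{:02x}.{:x}", self.segment(), self.bus(), self.device(), self.function())
    }
}

impl FromStr for PciBdf {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (seg, rest) = s.split_once(':').ok_or("missing segment separator")?;
        let (bus, rest) = rest.split_once(':').ok_or("missing bus separator")?;
        let (dev, func) = rest.split_once('.').ok_or("missing function separator")?;
        let seg = u16::from_str_radix(seg, 16).map_err(|e| e.to_string())?;
        let bus = u8::from_str_radix(bus, 16).map_err(|e| e.to_string())?;
        let dev = u8::from_str_radix(dev, 16).map_err(|e| e.to_string())?;
        let func = u8::from_str_radix(func, 16).map_err(|e| e.to_string())?;
        // Reject values that would otherwise be silently truncated by new().
        if dev > 0x1f || func > 0x7 {
            return Err(format!("device/function out of range: {}", s));
        }
        Ok(PciBdf::new(seg, bus, dev, func))
    }
}
```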
Fixes: #3711
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Make sure Cloud Hypervisor relies on the upstream, actively maintained
vfio-ioctls crate from the rust-vmm/vfio repository instead of the
deprecated version from the rust-vmm/vfio-ioctls repository.
Fixes: #3673
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>