We can return prematurely from 'map_mmio_regions()' (e.g. when a mmap call
failed for vfio or 'create_user_memory_region()' failed for vfio-user)
without updating the 'MmioRegion::user_memory_regions' with the
information of previous successful mmaps, which in turn would cause mmap
leaks particularly for the case of hotplug where the 'vmm' thread will
keep running. To fix the issue, let's keep 'MmioRegion::user_memory_regions'
updated right after successful mmap calls.
Fixes: #4068
Signed-off-by: Bo Chen <chen.bo@intel.com>
Reorganizing the code to leverage the same mechanics implemented for
vfio-user and aimed at supporting sparse memory mappings for a single
region.
Relying on the capabilities returned by the vfio-ioctls crate, we create
a list of sparse areas depending if we get SPARSE_MMAP or MSIX_MAPPABLE
capability, or a single sparse area in case we couldn't find any
capability.
The list of sparse areas is then used to create both the memory mappings
in the Cloud Hypervisor address space and the hypervisor user memory
regions.
This allowed for the simplification of the MmioRegion structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Similar to what's being supported for vfio devices, vfio-user devices
may also have BARs that need multiple kvm user memory regions,
e.g. device regions with `VFIO_REGION_INFO_CAP_SPARSE_MMAP`.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Extend VfioCommon to simplify the overall code, and also in preparation
for supporting the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend VfioCommon structure to own the legacy interrupt manager. This
will be useful for implementing the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend VfioCommon structure to own the MSI interrupt manager. This will
be useful for implementing the restore code path.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We need to split the parsing functions into one function dedicated to
the actual parsing and a second function for initializing the interrupt
type. This will be useful on the restore path as the parsing won't be
needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case a list of resources is provided to allocate_bars(), it directly
means we're restoring some existing BARs. That's why we shouldn't share
the codepath that creates BARs from scratch as we don't need to interact
with the device to retrieve the information.
Whenever resources are provided, we simply iterate over the list of
possible BAR indexes and create the BARs if the resource could be found.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of defining some very generic resources as PioAddressRange or
MmioAddressRange for each PCI BAR, let's move to the new Resource type
PciBar in order to make things clearer. This allows the code for being
more readable, but also removes the need for hard assumptions about the
MMIO and PIO ranges. PioAddressRange and MmioAddressRange types can be
used to describe everything except PCI BARs. BARs are very special as
they can be relocated and have special information we want to carry
along with them.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to make the code more consistent and easier to read, we remove
the former tuple that was used to describe a BAR, replacing it with the
existing structure PciBarConfiguration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code was quite unclear regarding the type of index that was being
used regarding a BAR. This is improved by differenciating register
indexes and BAR indexes more clearly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding a new method id() to the PciDevice trait, we allow the caller
to retrieve a unique identifier. This is used in the context of BAR
relocation to identify the device being relocated, so that we can update
the DeviceTree resources for all PCI devices (and not only
VirtioPciDevice).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the list of resources, VfioCommon is now able to allocate the
BARs at specific addresses. This will be useful for restoring VFIO and
vfio-user devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Updating the way of restoring BAR addresses for virtio-pci by providing
a more generic approach that will be reused for other PciDevice
implementations (i.e VfioPcidevice and VfioUserPciDevice).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
AArch64 does not use IOBAR, and current code of panics the whole VMM if
we need to allocate the IOBAR.
This commit checks if IOBAR is enabled before the arch conditional code
of IOBAR allocation and if the IOBAR is not enabled, we can just skip
the IOBAR allocation and do nothing.
Fixes: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/3479
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Now that vfio-ioctls correctly exposes the list of capabilities related
to each region, Cloud Hypervisor can decide to mmap a region based on
the presence or absence of MSIX_MAPPABLE. Instead of blindly mmap'ing
the region, we check if the MSI-X table or PBA is present on the BAR,
and if that's the case, we look for MSIX_MAPPABLE.
If MSIX_MAPPABLE is present, we can go ahead and mmap the entire region.
If MSIX_MAPPABLE is not present, we simply ignore the mmap'ing of this
region as it wouldn't be supported.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since each segment must have a non-overlapping memory range associated
with it the device memory must be equally divided amongst all segments.
A new allocator is used for each segment to ensure that BARs are
allocated from the correct address ranges. This requires changes to
PciDevice::allocate/free_bars to take that allocator and when
reallocating BARs the correct allocator must be identified from the
ranges.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to get proper performance out of hardware that might be passed
through the VM with VFIO, we decide to try mapping any region marked as
mappable, ignoring the failure and moving on to the next region. When
the mapping succeeds, we establish a list of user memory regions so that
we enable only subparts of the global mapping through the hypervisor.
This allows MSI-X table and PBA to keep trapping accesses from the guest
so that the VMM can properly emulate MSI-X.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Taking advantage of the refactored VFIO code implement a new
VfioUserPciDevice that wraps the client for vfio-user and exposes the
BusDevice and PciDevice into the VMM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
After the refactoring to split the common VFIO code out for vfio-user
there were some inconsistencies in the error handling. Correct this so
that the error is independent of the transport (hardware vs user) VFIO
and migrate to anyhow/thiserror in the process. Some unused errors from
earlier refactoring have also been removed.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Returning a reference is not possible for the vfio-user code as it is
constructed for the function.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rename the wrapper trait and structs since this will be used for more
than reading the PCI config.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows the code to be used from a different module in the same
crate for vfio-user support.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split data that will need to be common between VfioPciDevice and
VfioUserPciDevice into a common struct. Currently this has no methods
but they will be added soon.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By splitting this into a trait with common code extracted then this
will allow extensive reuse of logic in the vfio-user version.
This commit also changed the order of parameters on
::write_config_dword() to place offset first to match the other
functions.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With the new beta version, clippy complains about redundant allocation
when using Arc<Box<dyn T>>, and suggests replacing it simply with
Arc<dyn T>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The BAR calculation code was incorrect for calculating I/O BARs but also
has misleading comments (mixing bits and bytes, first and least
significant, etc).
This change adjusts the algorithm to more closely match the version
described in the PCI specification and takes advantage of Rust's binary
literals for ease of reading. Although this is slightly longer by
calculating the 64-bit and 32-bit paths separately I think this is
easier to read.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.
MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.
Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.
This fixes user memory region removal on MSHV.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: name `IORegion` contains a capitalized acronym
--> pci/src/configuration.rs:320:5
|
320 | IORegion = 0x01,
| ^^^^^^^^ help: consider making the acronym lowercase, except the initial letter (notice the capitalization): `IoRegion`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that virtio-mem devices can update VFIO mappings through dedicated
handlers, let's provide them from the DeviceManager.
Important to note these handlers should either be provided to virtio-mem
devices or to the unique virtio-iommu device. This must be mutually
exclusive.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of letting the VfioPciDevice take the decision on how/when to
perform the DMA mapping/unmapping, we move this to the DeviceManager
instead.
The point is to let the DeviceManager choose which guest memory regions
should be mapped or not. In particular, we don't want the virtio-mem
region to be mapped/unmapped as it will be virtio-mem device
responsibility to do so.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit moves both pci and vmm code from the internal vfio-ioctls
crate to the upstream one from the rust-vmm project.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the VFIO device does not support MSI or MSI-X, the capabilities
should not be parsed, avoiding the exposure of unsupported capabilities.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Make sure to propagate the error coming from VfioDevice when trying to
enable INTx, MSI or MSI-X interrutps.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With all the preliminary work done in the previous commits, we can
update the VFIO implementation to support INTx along with MSI and MSI-X.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for supporting the notifier function for the legacy
interrupt source group, we need this function to return an EventFd
instead of a reference to this same EventFd.
The reason is we can't return a reference when there's an Arc<Mutex<>>
involved in the call chain.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>