Looking up devices on the port I/O bus is time consuming during the
boot at there is an O(lg n) tree lookup and the overhead from taking a
lock on the bus contents.
Avoid this by adding a fast path uses the hardcoded port address and
size and directs PCI config requests directly to the device.
Command line:
target/release/cloud-hypervisor --kernel ~/src/linux/vmlinux --cmdline "root=/dev/vda1 console=ttyS0" --serial tty --console off --disk path=~/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw --api-socket /tmp/api
PIO exit: 17913
PCI fast path: 17871
Percentage on fast path: 99.8%
perf before:
marvin:~/src/cloud-hypervisor (main *)$ perf report -g | grep resolve
6.20% 6.20% vcpu0 cloud-hypervisor [.] vm_device:🚌:Bus::resolve
perf after:
marvin:~/src/cloud-hypervisor (2021-09-17-ioapic-fast-path *)$ perf report -g | grep resolve
0.08% 0.08% vcpu0 cloud-hypervisor [.] vm_device:🚌:Bus::resolve
The compromise required to implement this fast path is bringing the
creation of the PciConfigIo device into the DeviceManager::new() so that
it can be used in the VmmOps struct which is created before
DeviceManager::create_devices() is called.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This message only occurs sporadically and so it should be included at
info!() level. Enhance the output to also include the BAR number.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The rust-vmm crates we're pulling from git have renamed their main
branches. We need to update the branch names we're giving to Cargo,
or people who don't have these dependencies cached will get errors
like this when trying to build:
error: failed to get `vm-fdt` as a dependency of package `arch v0.1.0 (/home/src/cloud-hypervisor/arch)`
Caused by:
failed to load source for dependency `vm-fdt`
Caused by:
Unable to update https://github.com/rust-vmm/vm-fdt?branch=master#031572a6
Caused by:
object not found - no match for id (031572a6edc2f566a7278f1e17088fc5308d27ab); class=Odb (9); code=NotFound (-3)
Signed-off-by: Alyssa Ross <hi@alyssa.is>
This resolves an issue with hotplug -> removal -> hotplug of a vfio-user
device as the allocator was not updated with the now unused entries.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When mapping the region into the guest ensure that all the fields are
updated correctly as the unmap code path checks that they are set.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Taking advantage of the refactored VFIO code implement a new
VfioUserPciDevice that wraps the client for vfio-user and exposes the
BusDevice and PciDevice into the VMM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This doesn't really affect the build as we ship a Cargo.lock with fixed
versions in. However for clarity it makes sense to use fixed versions
throughout and let dependabot update them.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
After the refactoring to split the common VFIO code out for vfio-user
there were some inconsistencies in the error handling. Correct this so
that the error is independent of the transport (hardware vs user) VFIO
and migrate to anyhow/thiserror in the process. Some unused errors from
earlier refactoring have also been removed.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Returning a reference is not possible for the vfio-user code as it is
constructed for the function.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rename the wrapper trait and structs since this will be used for more
than reading the PCI config.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows the code to be used from a different module in the same
crate for vfio-user support.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split data that will need to be common between VfioPciDevice and
VfioUserPciDevice into a common struct. Currently this has no methods
but they will be added soon.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By splitting this into a trait with common code extracted then this
will allow extensive reuse of logic in the vfio-user version.
This commit also changed the order of parameters on
::write_config_dword() to place offset first to match the other
functions.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With the new beta version, clippy complains about redundant allocation
when using Arc<Box<dyn T>>, and suggests replacing it simply with
Arc<dyn T>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The BAR calculation code was incorrect for calculating I/O BARs but also
has misleading comments (mixing bits and bytes, first and least
significant, etc).
This change adjusts the algorithm to more closely match the version
described in the PCI specification and takes advantage of Rust's binary
literals for ease of reading. Although this is slightly longer by
calculating the 64-bit and 32-bit paths separately I think this is
easier to read.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.
MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.
Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.
This fixes user memory region removal on MSHV.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The logic wasn't quite right, as it wasn't detecting BAR reprogramming
when the upper part of the address was identical. For instance, a BAR
moved from 0x7fc0000000 to 0x7fd0000000 wasn't detected properly.
The logic has been updated and cleaned up to fix this issue, which was
observed when running Windows guests. This fixes the network hotplug
support as well.
Fixes#1797Fixes#1798
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Simplify snapshot & restore code by using generics to specify helper
functions that take / make a Serialize / Deserialize struct
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: name `IORegion` contains a capitalized acronym
--> pci/src/configuration.rs:320:5
|
320 | IORegion = 0x01,
| ^^^^^^^^ help: consider making the acronym lowercase, except the initial letter (notice the capitalization): `IoRegion`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that virtio-mem devices can update VFIO mappings through dedicated
handlers, let's provide them from the DeviceManager.
Important to note these handlers should either be provided to virtio-mem
devices or to the unique virtio-iommu device. This must be mutually
exclusive.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of letting the VfioPciDevice take the decision on how/when to
perform the DMA mapping/unmapping, we move this to the DeviceManager
instead.
The point is to let the DeviceManager choose which guest memory regions
should be mapped or not. In particular, we don't want the virtio-mem
region to be mapped/unmapped as it will be virtio-mem device
responsibility to do so.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit moves both pci and vmm code from the internal vfio-ioctls
crate to the upstream one from the rust-vmm project.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the VFIO device does not support MSI or MSI-X, the capabilities
should not be parsed, avoiding the exposure of unsupported capabilities.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Make sure to propagate the error coming from VfioDevice when trying to
enable INTx, MSI or MSI-X interrutps.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With all the preliminary work done in the previous commits, we can
update the VFIO implementation to support INTx along with MSI and MSI-X.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for supporting the notifier function for the legacy
interrupt source group, we need this function to return an EventFd
instead of a reference to this same EventFd.
The reason is we can't return a reference when there's an Arc<Mutex<>>
involved in the call chain.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We need to be able to return the barrier from the code that prepares to
activate the virtio device. This triggered by a write to the
configuration fields stored in the PCI BAR. Since bars can be accessed
by both memory mapping and through PCI config I/O several prototypes
must be changed.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This can be uses to indicate to the caller that it should wait on the
barrier before returning as there is some asynchronous activity
triggered by the write which requires the KVM exit to block until it's
completed.
This is useful for having vCPU thread wait for the VMM thread to proceed
to activate the virtio devices.
See #1863
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This is a new clippy check introduced in 1.47 which requires the use of
the matches!() macro for simple match blocks that return a boolean.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
A new version of vm-memory was released upstream which resulted in some
components pulling in that new version. Update the version number used
to point to the latest version but continue to use our patched version
due to the fix for #1258
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By looking at Linux kernel boot time, we identified that a lot of time
was spent registering and unregistering IRQ fds to KVM. This is not
efficient and certainly not a wrong behavior from the Linux kernel,
but rather a problem with the Cloud-Hypervisor's implementation of
MSI-X.
The way to fix this issue is by ensuring the initial conditions are
correct, which means the entire MSI-X vector table must be disabled
and masked. Additionally, each vector must be individually masked.
With these correct conditions, Linux won't start masking interrupt
vectors, and later unmask them since they will be seen as masked from
the beginning. This means the OS will simply have to unmask them when
needed, avoiding the extra operation.
Another aspect of this patch is to prevent Cloud-Hypervisor from
enabling (by registering IRQ fd) all vectors when either the global
'mask' or 'enable' bits are set. Instead, we can simply let the mask()
and unmask() operations take care of it if needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Misspellings were identified by https://github.com/marketplace/actions/check-spelling
* Initial corrections suggested by Google Sheets
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
This removes the dependency of the pci crate on the devices crate which
now only contains the device implementations themselves.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This fixes `cargo vendor` throwing an error
```
$ cargo vendor
error: failed to sync
Caused by:
found duplicate version of package `vfio-bindings v0.2.0` vendored from two sources:
source 1: https://github.com/rust-vmm/vfio-bindings#f08cbcbf
source 2: registry `https://github.com/rust-lang/crates.io-index`
```
Both sources are indeed same, the conflict is only cause by the
different URLs.
Signed-off-by: Anatol Belski <ab@php.net>