Commit Graph

202 Commits

Author SHA1 Message Date
Rob Bradford
43365ade2e vmm, pci: Implement virtio-mem support for vfio-user
Implement the infrastructure that lets a virtio-mem device map the guest
memory into the device. This is necessary since with virtio-mem zones
memory can be added or removed and the vfio-user device must be
informed.

Fixes: #3025

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-21 15:42:49 +01:00
Rob Bradford
e9d67dc405 vmm: pci: Move creation of vfio_user::Client to DeviceManager
By moving this from the VfioUserPciDevice to DeviceManager the client
can be reused for handling DMA mapping behind an IOMMU.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-21 15:42:49 +01:00
Rob Bradford
0faa7afac2 vmm: Add fast path for PCI config IO port
Looking up devices on the port I/O bus is time consuming during the
boot at there is an O(lg n) tree lookup and the overhead from taking a
lock on the bus contents.

Avoid this by adding a fast path uses the hardcoded port address and
size and directs PCI config requests directly to the device.

Command line:
target/release/cloud-hypervisor --kernel ~/src/linux/vmlinux --cmdline "root=/dev/vda1 console=ttyS0" --serial tty --console off --disk path=~/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw --api-socket /tmp/api

PIO exit: 17913
PCI fast path: 17871
Percentage on fast path: 99.8%

perf before:

marvin:~/src/cloud-hypervisor (main *)$ perf report -g | grep resolve
     6.20%     6.20%  vcpu0            cloud-hypervisor    [.] vm_device:🚌:Bus::resolve

perf after:

marvin:~/src/cloud-hypervisor (2021-09-17-ioapic-fast-path *)$ perf report -g | grep resolve
     0.08%     0.08%  vcpu0            cloud-hypervisor    [.] vm_device:🚌:Bus::resolve

The compromise required to implement this fast path is bringing the
creation of the PciConfigIo device into the DeviceManager::new() so that
it can be used in the VmmOps struct which is created before
DeviceManager::create_devices() is called.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-17 17:09:45 +01:00
Rob Bradford
64e217cf39 pci: configuration: Upgrade log level of PCI BAR reprogramming message
This message only occurs sporadically and so it should be included at
info!() level. Enhance the output to also include the BAR number.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-03 09:30:55 -07:00
Rob Bradford
e3487c0146 pci: vfio_user: Free BARs associated with vfio-user device
This resolves an issue with hotplug -> removal -> hotplug of a vfio-user
device as the allocator was not updated with the now unused entries.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-25 09:58:13 -07:00
Rob Bradford
1240ef3261 pci: vfio_user: Update all fields in MmioRegion on map
When mapping the region into the guest ensure that all the fields are
updated correctly as the unmap code path checks that they are set.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-12 13:19:04 +01:00
Rob Bradford
ed53c74ca9 pci: vfio_user: Fix region start calculation in unmap_mmio_regions()
The offset on the fd should not be used with the GPA.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-12 13:19:04 +01:00
Rob Bradford
0b5c680d15 pci: vfio_user: Implement PciDevice::move_bar()
When the BAR is moved location then update the BAR address and the guest
mapping.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-12 13:19:04 +01:00
Rob Bradford
9254b74c6d pci: Add support for vfio-user PCI devices
Taking advantage of the refactored VFIO code implement a new
VfioUserPciDevice that wraps the client for vfio-user and exposes the
BusDevice and PciDevice into the VMM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-10 16:01:00 +01:00
Rob Bradford
cdfc177347 pci: vfio: Cleanup error handling
After the refactoring to split the common VFIO code out for vfio-user
there were some inconsistencies in the error handling. Correct this so
that the error is independent of the transport (hardware vs user) VFIO
and migrate to anyhow/thiserror in the process. Some unused errors from
earlier refactoring have also been removed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
51ceae9131 pci: vfio: Make get_irq_info() return a non-reference
Returning a reference is not possible for the vfio-user code as it is
constructed for the function.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
1997152ee1 pci: vfio: Move {read,write}_config_register() to VfioCommon
These functions are used for the implementation of PciDevice.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
a5f4d79547 pci: vfio: Move read_bar()/write_bar() to VfioCommon
This also required the function they use (unmasq_irq()) to be added to
the wrapper.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
2ff193456d pci: vfio: Move find_region() to VfioCommon
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
ecc8382ff0 pci: vfio: Move interrupt handling to VfioCommon
The interrupt handling code can be reused with the vfio-user
implementation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
521a11a110 pci: vfio: Move all capability handling to VfioCommon
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
60d054519e pci: vfio: Extend Vfio trait to handle region read/write
This allows the config code to be implemented in terms of that
primitive.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
dc35dac306 pci: vfio: Generalise VfioPciConfig trait wrapper
Rename the wrapper trait and structs since this will be used for more
than reading the PCI config.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
ec1f7189da pci: vfio: Increase visibility of VfioCommon API
This allows the code to be used from a different module in the same
crate for vfio-user support.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
f5353c5b08 pci: configuration: Derive Debug for PciBarRegionType
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
2a76a589c3 pci: vfio: Move parse_msi(x)_capabilities to VfioCommon
This capability parsing logic will be useful in the vfio-user
implementation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
22275c3462 pci: vfio: Move allocate_bar & free_bars to VfioCommon
This logic can then be shared with the vfio-user implementation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
d27ea34a2d pci: vfio: Split common data into VfioCommon struct
Split data that will need to be common between VfioPciDevice and
VfioUserPciDevice into a common struct. Currently this has no methods
but they will be added soon.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
a0e48a87b8 pci: vfio: Refactor code that reads PCI config from VFIO device
By splitting this into a trait with common code extracted then this
will allow extensive reuse of logic in the vfio-user version.

This commit also changed the order of parameters on
::write_config_dword() to place offset first to match the other
functions.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Rob Bradford
349dbb9aac pci: vfio: Add trait for accessing VFIO PCI device config
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-04 14:30:14 +02:00
Sebastien Boeuf
dcc646f5b1 clippy: Fix redundant allocations
With the new beta version, clippy complains about redundant allocation
when using Arc<Box<dyn T>>, and suggests replacing it simply with
Arc<dyn T>.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-29 13:28:57 +02:00
Rob Bradford
6e63df98ba pci: vfio: Fix and clarify BAR calculation code
The BAR calculation code was incorrect for calculating I/O BARs but also
has misleading comments (mixing bits and bytes, first and least
significant, etc).

This change adjusts the algorithm to more closely match the version
described in the PCI specification and takes advantage of Rust's binary
literals for ease of reading. Although this is slightly longer by
calculating the 64-bit and 32-bit paths separately I think this is
easier to read.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-07-05 17:38:23 +02:00
Wei Liu
1f2915bff0 vmm: hypervisor: split set_user_memory_region to two functions
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.

MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.

Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.

This fixes user memory region removal on MSHV.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:45:45 +02:00
Rob Bradford
3ffd2cb9be pci: Versionize PCI state
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-26 22:29:35 +02:00
Rob Bradford
496ceed1d0 misc: Remove unnecessary "extern crate"
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-12 17:26:11 +02:00
Rob Bradford
c82226fdae pci: vfio: Naturally align the PCI BAR allocation
The PCI bar should be naturally aligned i.e. aligned to the size of the
bar itself.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-21 16:11:54 +01:00
Sebastien Boeuf
7c457378e5 pci: Fix BAR reprogramming detection logic
The logic wasn't quite right, as it wasn't detecting BAR reprogramming
when the upper part of the address was identical. For instance, a BAR
moved from 0x7fc0000000 to 0x7fd0000000 wasn't detected properly.

The logic has been updated and cleaned up to fix this issue, which was
observed when running Windows guests. This fixes the network hotplug
support as well.

Fixes #1797
Fixes #1798

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-04-21 16:11:54 +01:00
Rob Bradford
6f5d4702d4 misc: Simplify snapshot/restore by using helper functions
Simplify snapshot & restore code by using generics to specify helper
functions that take / make a Serialize / Deserialize struct

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-08 16:17:14 +01:00
Rob Bradford
827229d8e4 pci: Address Rust 1.51.0 clippy issue (upper_case_acroynms)
warning: name `IORegion` contains a capitalized acronym
   --> pci/src/configuration.rs:320:5
    |
320 |     IORegion = 0x01,
    |     ^^^^^^^^ help: consider making the acronym lowercase, except the initial letter (notice the capitalization): `IoRegion`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Sebastien Boeuf
933d41cf2f vmm: Provide DMA mapping handlers to virtio-mem devices
Now that virtio-mem devices can update VFIO mappings through dedicated
handlers, let's provide them from the DeviceManager.

Important to note these handlers should either be provided to virtio-mem
devices or to the unique virtio-iommu device. This must be mutually
exclusive.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-03-05 10:38:42 +01:00
Sebastien Boeuf
080ea31813 pci, vmm: Manage VFIO DMA mapping from DeviceManager
Instead of letting the VfioPciDevice take the decision on how/when to
perform the DMA mapping/unmapping, we move this to the DeviceManager
instead.

The point is to let the DeviceManager choose which guest memory regions
should be mapped or not. In particular, we don't want the virtio-mem
region to be mapped/unmapped as it will be virtio-mem device
responsibility to do so.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-03-05 10:38:42 +01:00
Sebastien Boeuf
a0a89b1346 pci, vmm: Move to upstream vfio-ioctls crate
This commit moves both pci and vmm code from the internal vfio-ioctls
crate to the upstream one from the rust-vmm project.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-24 08:02:37 +01:00
Sebastien Boeuf
9af477e964 pci: vfio: Check VFIO device interrupt's support
In case the VFIO device does not support MSI or MSI-X, the capabilities
should not be parsed, avoiding the exposure of unsupported capabilities.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-10 17:34:56 +00:00
Sebastien Boeuf
27515a6ec4 pci: vfio: Propagate errors when enabling interrupts
Make sure to propagate the error coming from VfioDevice when trying to
enable INTx, MSI or MSI-X interrutps.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-10 17:34:56 +00:00
Sebastien Boeuf
19167e7647 pci: vfio: Implement INTx support
With all the preliminary work done in the previous commits, we can
update the VFIO implementation to support INTx along with MSI and MSI-X.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-10 17:34:56 +00:00
Sebastien Boeuf
acfbee5b7a interrupt: Make notifier function return Option<EventFd>
In anticipation for supporting the notifier function for the legacy
interrupt source group, we need this function to return an EventFd
instead of a reference to this same EventFd.

The reason is we can't return a reference when there's an Arc<Mutex<>>
involved in the call chain.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-10 17:34:56 +00:00
Rob Bradford
6ccd32c904 pci: Remove manual range checks
error: manual `Range::contains` implementation
   --> pci/src/vfio.rs:948:12
    |
948 |         if (reg_idx >= PCI_CONFIG_BAR0_INDEX && reg_idx < PCI_CONFIG_BAR0_INDEX + BAR_NUMS)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use: `(PCI_CONFIG_BAR0_INDEX..PCI_CONFIG_BAR0_INDEX + BAR_NUMS).contains(&reg_idx)`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_range_contains

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-04 13:46:37 +01:00
Rob Bradford
7cc729c7d9 pci, virtio-devices: Extend barrier returning through PCI code
We need to be able to return the barrier from the code that prepares to
activate the virtio device. This triggered by a write to the
configuration fields stored in the PCI BAR. Since bars can be accessed
by both memory mapping and through PCI config I/O several prototypes
must be changed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-17 11:23:53 +00:00
Rob Bradford
1fc6d50f3e misc: Make Bus::write() return an Option<Arc<Barrier>>
This can be uses to indicate to the caller that it should wait on the
barrier before returning as there is some asynchronous activity
triggered by the write which requires the KVM exit to block until it's
completed.

This is useful for having vCPU thread wait for the VMM thread to proceed
to activate the virtio devices.

See #1863

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-17 11:23:53 +00:00
Rob Bradford
593a958fe5 pci, vmm: Include VFIO devices in device tree
Fixes: #1687

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-01 10:49:04 +01:00
Samuel Ortiz
72bb255ff6 pci, virtio-devices: Fix rust 1.48 clippy warnings
Unnecessary closure used to substitute value for `Option::None`
See https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-11-20 17:09:28 +01:00
Rob Bradford
8baa244ec1 hypervisor: Add control for dirty page logging
When creating a userspace mapping provide a control for enabling the
logging of dirty pages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-17 16:57:11 +00:00
Rob Bradford
bb5b9584d2 pci, ch-remote, vmm: Replace simple match blocks with matches!
This is a new clippy check introduced in 1.47 which requires the use of
the matches!() macro for simple match blocks that return a boolean.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-09 10:49:54 +02:00
Sebastien Boeuf
de88bef429 pci: msix: Fix masking/enabling semantics
By looking at Linux kernel boot time, we identified that a lot of time
was spent registering and unregistering IRQ fds to KVM. This is not
efficient and certainly not a wrong behavior from the Linux kernel,
but rather a problem with the Cloud-Hypervisor's implementation of
MSI-X.

The way to fix this issue is by ensuring the initial conditions are
correct, which means the entire MSI-X vector table must be disabled
and masked. Additionally, each vector must be individually masked.

With these correct conditions, Linux won't start masking interrupt
vectors, and later unmask them since they will be seen as masked from
the beginning. This means the OS will simply have to unmask them when
needed, avoiding the extra operation.

Another aspect of this patch is to prevent Cloud-Hypervisor from
enabling (by registering IRQ fd) all vectors when either the global
'mask' or 'enable' bits are set. Instead, we can simply let the mask()
and unmask() operations take care of it if needed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-24 22:29:16 +02:00
Josh Soref
5c3f4dbe6f ch: Fix various misspelled words
Misspellings were identified by https://github.com/marketplace/actions/check-spelling
* Initial corrections suggested by Google Sheets
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-09-23 08:59:31 +01:00
Rob Bradford
15025d71b1 devices, vm-device: Move BusDevice and Bus into vm-device
This removes the dependency of the pci crate on the devices crate which
now only contains the device implementations themselves.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-09-10 09:35:38 +01:00
Sebastien Boeuf
871138d5cc vm-migration: Make snapshot() mutable
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Wei Liu
47e8f5475e pci/msix: remove reference to KVM from a comment
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-08-21 16:23:41 +02:00
Wei Liu
e6849699d2 vfio: remove KVM-ism from comments and error messages
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-08-21 16:23:41 +02:00
Wei Liu
571c368528 vfio: fix comment for map_mmio_regions
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-08-21 16:23:41 +02:00
Muminul Islam
053ea5dcd3 vfio: Make vfio to use MemoryRegion instead of kvm_userspace_memory_region
Signed-off-by: Muminul Islam <muislam@microsoft.com>
2020-07-16 07:34:27 +02:00
Rob Bradford
dc55e45977 pci: Introduce and use PciBar struct
This simplies some of the handling for PCI BARs particularly with
respect to snapshot and restore. No attempt has been made to handle the
64-bit bar handling in a different manner to that which was used before.

Fixes: #1153

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-07-15 07:59:33 +02:00
Michael Zhao
cce6237536 pci: Enable GSI routing (MSI type) for AArch64
In this commit we saved the BDF of a PCI device and set it to "devid"
in GSI routing entry, because this field is mandatory for GICv3-ITS.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-07-14 14:34:54 +01:00
Michael Zhao
17057a0dd9 vmm: Fix build errors with "pci" feature on AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-07-14 14:34:54 +01:00
Wei Liu
e5552a53d8 arch, pci: rename vm_fd to vm
The type is now hypervisor::Vm. Switch from KVM specific name vm_fd to a
generic name just like 8186a8eee6 ("vmm: interrupt: Rename vm_fd").

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-13 16:08:00 +01:00
Muminul Islam
e4dee57e81 arch, pci, vmm: Initial switch to the hypervisor crate
Start moving the vmm, arch and pci crates to being hypervisor agnostic
by using the hypervisor trait and abstractions. This is not a complete
switch and there are still some remaining KVM dependencies.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-06-22 15:03:15 +02:00
Henry Wang
99e72be169 unit tests: Fix unit tests and docs for AArch64
Currently, not every feature of the cloud-hypervisor is enabled
on AArch64, which means that on AArch64 machines, the
`run_unit_tests.sh` needs to be tailored and some unit test cases
should be run on x86_64 only.

Also this commit fixes the typo and unifies `Arm64` and `AArch64`
in the AArch64 document.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-06-15 17:28:05 +01:00
LiYa'nan
acc234088f vfio: fix for bug as below:
cloud-hypervisor: 763.978581807s: ERROR:pci/src/vfio.rs:651 -- failed to remove all guest memory regions from iommu table

when poweroff a vm with vfio device, clh will finally remove all guest memory region from iommu table
with the method unset_dma_map, not method setup_dma_map.

Signed-off-by: LiYa'nan <oliverliyn@gmail.com>
2020-06-15 08:51:13 +02:00
Anatol Belski
abd6204d27 source: Fix file permissions
Rust sources and some data files should not be executable. The perms are
set to 644.

Signed-off-by: Anatol Belski <ab@php.net>
2020-06-10 18:47:27 +01:00
Samuel Ortiz
3336e80192 vfio: Switch to the vfio-ioctls crate ch branch
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-06-04 08:48:55 +02:00
Samuel Ortiz
d24aa72d3e vfio: Rename to vfio-ioctls
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-06-04 08:48:55 +02:00
Samuel Ortiz
53ce529875 vfio: Move the PCI implementation to the PCI crate
There is a much stronger PCI dependency from vfio_pci.rs than a VFIO one
from pci/src/vfio.rs. It seems more natural to have the PCI specific
VFIO implementation in the PCI crate rather than the other way around.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-06-04 08:48:55 +02:00
Rob Bradford
c31ad72ee9 build: Address issues found by 1.43.0 clippy
These are mostly due to use of "bare use" statements and unnecessary vector
creation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-27 19:32:12 +02:00
Sebastien Boeuf
1e0ebb760f pci: Allow specific PCI b/d/f to be reserved
In order to let the PciBus user choose where a device should be placed
on the bus, a new function get_device_id() is introduced. This will be
helpful in the context of snapshot/restore as the caller will be able to
place the PCI devices on the same slot they were placed before the
snapshot was taken.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-05-12 17:37:31 +01:00
Sebastien Boeuf
e1701f11b1 pci: Implement Snapshottable trait for PciConfiguration
The PCI configuration from each PCI device is modified at runtime as we
can expect the guest OS to write to some PCI capability structure, or
move the BAR to a different location in the guest address space.

For all the reasons why such configuration might differ from the initial
configuration, we must store the registers values to be able to restore
them with the right values whenever a PCI device is being restored.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-05-11 11:38:16 +01:00
Sebastien Boeuf
376db31107 pci: Implement Snapshottable trait for MsixConfig
In order to restore devices relying on MSI-X, the MsixConfig structure
must be restored with the correct values. Additionally, the KVM routes
must be restored so that interrupts can be delivered through KVM the way
they were configured before the snapshot was taken.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-05-11 11:38:16 +01:00
Rob Bradford
b8cfdab8b6 pci: configuration: Use correct algorithm for BAR size reporting
When reporting the BAR size it is necessary to return a value that is
encoded such that all the bits are set that represent the mask of the
natural alignment of the field.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-17 15:20:50 +02:00
Rob Bradford
9bd5ec8967 pci, vfio, vm-virtio: Specify a PCI revision ID of 1 for virtio-pci
Add support for specifying the PCI revision in the PCI configuration and
populate this with the value of 1 for virtio-pci devices.

The virtio-pci specification is slightly ambiguous only saying that
transitional (i.e. devices that support legacy and virtio 1.0) should
set this to 0. In practice it seems that software expects the revision
to be set to 1 for modern only devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-17 13:46:48 +02:00
Rob Bradford
56207a0328 pci: Print out details of the BAR moving upon error
In particular include the old and new bases as well as the length.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-15 12:02:19 +02:00
Sebastien Boeuf
8d785bbd5f pci: Fix the PciBus using HashMap instead of Vec
By using a Vec to hold the list of devices on the PciBus, there's a
problem when we use unplug. Indeed, the vector of devices gets reduced
and if the unplugged device was not the last one from the list, every
other device after this one is shifted on the bus.

To solve this problem, a HashMap is used. This allows to keep track of
the exact place where each device stands on the bus.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-13 10:54:34 +01:00
Sebastien Boeuf
f3dc245c4f pci: Extend PciDevice trait with new free_bars() method
The point of this new method is to let the caller decide when the
implementation of the PciDevice should free the BARs previously
allocated through the other method allocate_bars().

This provides a way to perform proper cleanup for any PCI device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-11 13:10:30 +00:00
Sebastien Boeuf
b50cbe5064 pci: Give PCI device ID back when removing a device
Upon removal of a PCI device, make sure we don't hold onto the device ID
as it could be reused for another device later.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
df71aaee3f pci: Make the device ID allocation smarter
In order to handle the case where devices are very often plugged and
unplugged from a VM, we need to handle the PCI device ID allocation
better.

Any PCI device could be removed, which means we cannot simply rely on
the vector size to give the next available PCI device ID.

That's why this patch stores in memory the information about the 32
slots availability. Based on this information, whenever a new slot is
needed, the code can correctly provide an available ID, or simply return
an error because all slots are taken.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
f8e2008e0e pci: Add a function to remove a PciDevice from the bus
Simple function relying on the retain() method from std::Vec, allowing
to remove every occurence of the same device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
49268bff3b pci: Remove all Weak references from PciBus
Now that the BusDevice devices are stored as Weak references by the IO
and MMIO buses, there's no need to use Weak references from the PciBus
anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
b77fdeba2d msi/msi-x: Prevent from losing masked interrupts
We want to prevent from losing interrupts while they are masked. The
way they can be lost is due to the internals of how they are connected
through KVM. An eventfd is registered to a specific GSI, and then a
route is associated with this same GSI.

The current code adds/removes a route whenever a mask/unmask action
happens. Problem with this approach, KVM will consume the eventfd but
it won't be able to find an associated route and eventually it won't
be able to deliver the interrupt.

That's why this patch introduces a different way of masking/unmasking
the interrupts, simply by registering/unregistering the eventfd with the
GSI. This way, when the vector is masked, the eventfd is going to be
written but nothing will happen because KVM won't consume the event.
Whenever the unmask happens, the eventfd will be registered with a
specific GSI, and if there's some pending events, KVM will trigger them,
based on the route associated with the GSI.

Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-25 08:31:14 +00:00
Qiu Wenbo
4cf89d373d pci: handle extended configuration space properly
This is critical to support extended capabilitiy list.

Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
2020-02-24 17:05:09 +01:00
Qiu Wenbo
f6b9445be7 pci: fix pci MMCONFIG address parsing
We should not assume the offset produced by ECAM is identical to the
CONFIG_ADDRESS register of legacy PCI port io enumeration.

Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
2020-02-24 17:05:09 +01:00
Sebastien Boeuf
db9f9b7820 pci: Make self mutable when reading from PCI config space
In order to anticipate the need to support more features related to the
access of a device's PCI config space, this commits changes the self
reference in the function read_config_register() to be mutable.

This also brings some more flexibility for any implementation of the
PciDevice trait.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-30 09:25:52 +01:00
Sebastien Boeuf
e91638e6c5 pci: Cleanup the crate from unneeded types
Both InterruptDelivery and InterruptParameters can be removed from the
pci crate as they are not used anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
99f39291fd pci: Simplify PciDevice trait
There's no need for assign_irq() or assign_msix() functions from the
PciDevice trait, as we can see it's never used anywhere in the codebase.
That's why it's better to remove these methods from the trait, and
slightly adapt the existing code.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
50a4c16d34 pci: Cleanup the crate from kvm_iotcls and kvm_bindings dependencies
Now that KVM specific interrupts are handled through InterruptManager
trait implementation, the pci crate does not need to rely on kvm_ioctls
and kvm_bindings crates.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
4bb12a2d8d interrupt: Reorganize all interrupt management with InterruptManager
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.

By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.

Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupt, no matter the type of
interrupt and no matter which hypervisor might be in use.

Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
c396baca46 vm-virtio: Modify VirtioInterrupt callback into a trait
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.

Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.

For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
ef7d889a79 vfio: Remove unused GSI routing functions
At this point, both MSI and MSI-X handle the KVM GSI routing update,
which means the vfio crate does not have to deal with it anymore.
Therefore, several functions can be removed from the vfio-pci code, as
they are not needed anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
1a4b5ecc75 msi: Set KVM routes from MsiConfig instead of VFIO
Now that MsiConfig has access to both KVM VmFd and the list of GSI
routes, the update of the KVM GSI routes can be directly done from
MsiConfig instead of specifically from the vfio-pci implementation.

By moving the KVM GSI routes update at the MsiConfig level, any PCI
device such as vfio-pci, virtio-pci, or any other emulated PCI device
can benefit from it, without having to implement it on their own.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
f3c3870159 msi: Create MsiConfig to embed MsiCap
The same way we have MsixConfig in charge of managing whatever relates
to MSI-X vectors, we need a MsiConfig structure to manage MSI vectors.
The MsiCap structure is still needed as a low level API, but it is now
part of the MsiConfig which oversees anything related to MSI.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
1e5e02801f msix: Perform interrupt enabling/disabling
In order to factorize one step further, we let MsixConfig perform the
interrupt enabling/disabling. This is done by registering/unregistering
the KVM irq_fds of all GSI routes related to this device.

And now that MsixConfig is in charge of the irq_fds, vfio-pci must rely
on it to retrieve them and provide them to the vfio driver.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
19aeac40c9 msix: Remove the need for interrupt callback
Now that MsixConfig has access to the irq_fd descriptors associated with
each vector, it can directly write to it anytime it needs to trigger an
interrupt.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
3fe362e3bd msix: Set KVM routes from MsixConfig instead of VFIO
Now that MsixConfig has access to both KVM VmFd and the list of GSI
routes, the update of the KVM GSI routes can be directly done from
MsixConfig instead of specifically from the vfio-pci implementation.

By moving the KVM GSI routes update at the MsixConfig level, both
vfio-pci and virtio-pci (or any other emulated PCI device) can benefit
from it, without having to implement it on their own.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
2381f32ae0 msix: Add gsi_msi_routes to MsixConfig
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
9b60fcdc39 msix: Add VmFd to MsixConfig
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
86c760a0d9 msix: Add SystemAllocator to MsixConfig
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.

Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
f77d2c2d16 pci: Add some KVM and interrupt utilities to the crate
In order to anticipate the need for both msi.rs and msix.rs to rely on
some KVM utils and InterruptRoute structure to handle the update of the
KVM GSI routes, this commit adds these utilities directly to the pci
crate. So far, these were exclusively used by the vfio crate, which is
why there were located there.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
c2ae380503 pci: Refine detection of BAR reprogramming
The current code was always considering 0xffffffff being written to the
register as a sign it was expecting to get the size, hence the BAR
reprogramming detection was stating this case was not a reprogramming
case.

Problem is, if the value 0xffffffff is directed at a 64bits BAR, this
might be the high or low part of a 64bits address which is meant to be
the new address of the BAR, which means we would miss the detection of
the BAR being reprogrammed here.

This commit improves the code using finer granularity checks in order to
detect this corner case correctly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-09 08:05:35 +01:00