46 Commits

Author SHA1 Message Date
Sebastien Boeuf
9b60fcdc39 msix: Add VmFd to MsixConfig
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
86c760a0d9 msix: Add SystemAllocator to MsixConfig
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.

Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
24cf15d2b2 vfio: Don't throw an error if a region cannot be found
Everytime we use VFIO with cloud-hypervisor, we get the following error:

ERROR:vfio/src/vfio_device.rs:440 -- Could not get region #8 info

But this is not an error per se, and should be considered as a simple
warning.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-09 08:05:35 +01:00
Rob Bradford
b2589d4f3f vm-virtio, vmm, vfio: Store GuestMemoryMmap in an Arc<ArcSwap<T>>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.

Not covered by this change is replacing the GuestMemoryMmap itself.

This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-02 13:20:11 +00:00
Samuel Ortiz
664431ff14 vsock: vhost_user: vfio: Fix potential host memory overflow
The vsock packets that we're building are resolving guest addresses to
host ones and use the latter as raw pointers.
If the corresponding guest mapped buffer spans across several regions in
the guest, they will do so in the host as well. Since we have no
guarantees that host regions are contiguous, it may lead the VMM into
trying to access memory outside of its memory space.

For now we fix that by ensuring that the guest buffers do not span
across several regions. If they do, we error out.
Ideally, we should enhance the rust-vmm memory model to support safe
acces across host regions.

Fixes CVE-2019-18960

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 22:15:50 +01:00
Rob Bradford
c61104df47 vmm: Port to latest vmm-sys-util
The signal handling for vCPU signals has changed in the latest release
so switch to the new API.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-11 14:11:11 +00:00
Sebastien Boeuf
23929f41a7 vfio: Don't override MSI Enable bit through VFIO ioctl
This commit ensures device's PCI config space is being written after
MSI/MSI-X interrupts have been enabled/disabled. In case of MSI, when
the interrupts are enabled through VFIO (using VFIO_DEVICE_SET_IRQS),
the MSI Enable bit in the MSI capability structure found in the PCI
config space is disabled by default. That's why when the guest is
enabling this bit, we first need to enable the MSI interrupts with
VFIO through VFIO_DEVICE_SET_IRQS ioctl, and only after we can write
to the device region to update the MSI Enable bit.

Fixes #460

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-04 14:47:06 +00:00
Wu Zongyong
1dfd60b609 vfio: use correct flags to disable interrupts
The comments of vfio kernel module said that individual subindex
interrupts can be disabled using the -1 value for DATA_EVENTFD or
the index can be disabled as a whole with:
    flags = (DATA_NONE|ACTION_TRIGGER), count = 0.

Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
CC: Liu Jiang <gerry@linux.alibaba.com>
2019-12-04 14:47:06 +00:00
Sebastien Boeuf
08258d5dad vfio: pci: Allow multiple devices to be passed through
The KVM_SET_GSI_ROUTING ioctl is very simple, it overrides the previous
routes configuration with the new ones being applied. This means the
caller, in this case cloud-hypervisor, needs to maintain the list of all
interrupts which needs to be active at all times. This allows to
correctly support multiple devices to be passed through the VM and being
functional at the same time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-04 08:48:17 +01:00
Sebastien Boeuf
4115fa8a87 vfio: pci: Update irqfd registration
In order to improve the existing VFIO code, this patch registers the
eventfds used to trigger KVM interrupts only when the interrupts are
enabled, and unregisters them when interrupts are disabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-04 08:48:17 +01:00
Samuel Ortiz
0f21781fbe cargo: Bump the kvm and vmm-sys-util crates
Since the kvm crates now depend on vmm-sys-util, the bump must be
atomic.
The kvm-bindings and ioctls 0.2.0 and 0.4.0 crates come with a few API
changes, one of them being the use of a kvm_ioctls specific error type.
Porting our code to that type makes for a fairly large diff stat.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-11-29 17:48:02 +00:00
Wu Zongyong
4de04e84b5 vfio-pci: unmap regions when dropping VfioGroup
Introduce method unmap_mmio_regions to unmap all regions mapped to host.
This patch eliminate the error message "Could not unset container".

Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
2019-11-27 07:44:02 -08:00
Sebastien Boeuf
360f0639f4 Revert "vfio: use correct flags to disable interrupts"
This reverts commit 66fde245b39b71dfc2e46803bcaf640f8175bcb5.

The commit broke the VFIO support for MSI. Issue needs to be
investigated but in the meantime, it is safer to fix the codebase.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-11-24 12:43:06 +01:00
Wu Zongyong
d7dc1a9226 pci: don't cleanup msi/msix interrupts repeatedly
We disabled msi/msix twice inside Drop trait for VfioPciDevice,
which resulted in error message "Could not disable MSI-X". Eliminating
this error by check whether the msi/msix capability is enabled.

Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
CC: Liu Jiang <gerry@linux.alibaba.com>
2019-11-21 06:38:36 -08:00
Wu Zongyong
66fde245b3 vfio: use correct flags to disable interrupts
The comments of vfio kernel module said that individual subindex
interrupts can be disabled using the -1 value for DATA_EVENTFD or
the index can be disabled as a whole with:
    flags = (DATA_NONE|ACTION_TRIGGER), count = 0.

Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
CC: Liu Jiang <gerry@linux.alibaba.com>
2019-11-21 06:38:36 -08:00
Sebastien Boeuf
587a420429 cargo: Update to the latest kvm-ioctls version
We need to rely on the latest kvm-ioctls version to benefit from the
recent addition of unregister_ioevent(), allowing us to detach a
previously registered eventfd to a PIO or MMIO guest address.

Because of this update, we had to modify the current constraint we had
on the vmm-sys-util crate, using ">= 0.1.1" instead of being strictly
tied to "0.2.0".

Once the dependency conflict resolved, this commit took care of fixing
build issues caused by recent modification of kvm-ioctls relying on
EventFd reference instead of RawFd.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-10-31 09:30:59 +01:00
Sebastien Boeuf
de21c9ba4f pci: Remove ioeventfds() from PciDevice trait
The PciDevice trait is supposed to describe only functions related to
PCI. The specific method ioeventfds() has nothing to do with PCI, but
instead would be more specific to virtio transport devices.

This commit removes the ioeventfds() method from the PciDevice trait,
adding some convenient helper as_any() to retrieve the Any trait from
the structure behing the PciDevice trait. This is the only way to keep
calling into ioeventfds() function from VirtioPciDevice, so that we can
still properly reprogram the PCI BAR.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-10-31 09:30:59 +01:00
Sebastien Boeuf
d6c68e4738 pci: Add error propagation to PCI BAR reprogramming
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-10-29 16:48:02 +01:00
Sebastien Boeuf
149b61b213 pci: Detect BAR reprogramming
Based on the value being written to the BAR, the implementation can
now detect if the BAR is being moved to another address. If that is the
case, it invokes move_bar() function from the DeviceRelocation trait.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-10-29 16:48:02 +01:00
Sebastien Boeuf
e536f88012 vfio: Implement move_bar() from PciDevice trait
Everytime a PCI BAR is moved, the KVM regions need to be updated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-10-29 16:48:02 +01:00
Samuel Ortiz
de9eb3e0fa Bump vmm-sys-utils to 0.2.0
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-10-23 11:35:11 +03:00
Sebastien Boeuf
3acf9dfcf3 vfio: Don't map guest memory for VFIO devices attached to vIOMMU
In case a VFIO devices is being attached behind a virtual IOMMU, we
should not automatically map the entire guest memory for the specific
device.

A VFIO device attached to the virtual IOMMU will be driven with IOVAs,
hence we should simply wait for the requests coming from the virtual
IOMMU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-10-16 07:27:06 +02:00
Sebastien Boeuf
837bcbc6ba vfio: Create VFIO implementation of ExternalDmaMapping
With this implementation of the trait ExternalDmaMapping, we now have
the tool to provide to the virtual IOMMU to trigger the map/unmap on
behalf of the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-10-16 07:27:06 +02:00
Sebastien Boeuf
3598e603d5 vfio: Add a public function to retrive VFIO container
The VFIO container is the object needed to update the VFIO mapping
associated with a VFIO device. This patch allows the device manager
to have access to the VFIO container.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-10-16 07:27:06 +02:00
Samuel Ortiz
8e018d6feb vfio: Move vfio-bindings to crates.io
And remove our local vfio-bindings code.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-10-10 10:32:40 +02:00
Samuel Ortiz
14eb071b29 Cargo: Move to crates.io vmm-sys-util
Use the newly published 0.1.1 version.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-10-08 07:28:53 -07:00
Samuel Ortiz
add0471120 vfio: Use the log crate macros
Instead of using the syslog vmm-sys-util ones.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-09-02 15:07:42 +02:00
Sebastien Boeuf
0b8856d148 vmm: Add RwLock to the GuestMemoryMmap
Following the refactoring of the code allowing multiple threads to
access the same instance of the guest memory, this patch goes one step
further by adding RwLock to it. This anticipates the future need for
being able to modify the content of the guest memory at runtime.

The reasons for adding regions to an existing guest memory could be:
- Add virtio-pmem and virtio-fs regions after the guest memory was
  created.
- Support future hotplug of devices, memory, or anything that would
  require more memory at runtime.

Because most of the time, the lock will be taken as read only, using
RwLock instead of Mutex is the right approach.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-08-22 08:24:15 +01:00
Sebastien Boeuf
ec0b5567c8 vmm: Share the guest memory instead of cloning it
The VMM guest memory was cloned (copied) everywhere the code needed to
have ownership of it. In order to clean the code, and in anticipation
for future support of modifying this guest memory instance at runtime,
it is important that every part of the code share the same instance.

Because VirtioDevice implementations need to have access to it from
different threads, that's why Arc must be used in this case.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-08-22 08:24:15 +01:00
Sebastien Boeuf
658c076eb2 linters: Fix clippy issues
Latest clippy version complains about our existing code for the
following reasons:

- trait objects without an explicit `dyn` are deprecated
- `...` range patterns are deprecated
- lint `clippy::const_static_lifetime` has been renamed to
  `clippy::redundant_static_lifetimes`
- unnecessary `unsafe` block
- unneeded return statement

All these issues have been fixed through this patch, and rustfmt has
been run to cleanup potential formatting errors due to those changes.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-08-15 09:10:04 -07:00
Rob Bradford
9caad7394d build, misc: Bump vmm-sys-util dependency
The structure of the vmm-sys-util crate has changed with lots of code
moving to submodules.

This change adjusts the use of the imported structs to reference the
submodules.

Fixes: #145

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-08-02 07:42:20 -07:00
Rob Bradford
ac950d9a97 build: Bulk update dependencies
Update all dependencies with "cargo upgrade" with the exception of
vmm-sys-utils which needs some extra porting work.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-08-02 15:22:37 +02:00
Sebastien Boeuf
49ef201cd1 vfio: pci: Provide the right MSI-X table offset
When reading from or writing to the MSI-X table, the function provided
by the PCI crate expects the offset to start from the beginning of the
table. That's why it is VFIO specific code to be responsible for
providing the right offset, which means it needs to be the offset
substracted by the beginning of the MSI-X table offset.

This bug was not discovered until we tested VFIO with some device where
the MSI-X table was placed on a BAR at an offset different from 0x0.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-08-02 09:45:20 +02:00
Sebastien Boeuf
d18c8d4c8c vfio: pci: Add support for expansion ROM BAR
Relying on the newly added code in the pci crate, the vfio crate can now
properly expose an expansion ROM BAR if the device has one.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-31 09:28:29 +02:00
Sebastien Boeuf
347f8a036b vfio: pci: Mask multi function device bit
In order to support VFIO for devices supporting multiple functions,
we need to mask the multi-function bit (bit 7 from Header Type byte).
Otherwise, the guest kernel ends up tryng to enumerate those devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-31 09:28:29 +02:00
Samuel Ortiz
792cc27435 vfio: Propagate the KVM routes setting error
This will trigger a logged error once we have an actual logger.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-25 11:45:38 +01:00
Sebastien Boeuf
421b896ab7 vfio: Don't expose an Interrupt Pin
Since our VFIO code does not support pin based interrupt, but only MSI
and MSI-X, it is cleaner to not expose any Interrupt Pin to the guest by
setting its value to 0.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-25 11:45:38 +01:00
Sebastien Boeuf
2f802880c0 vfio: Disable the ROM expansion BAR
Until the codebase can properly expose the ROM BAR into the guest, it is
better to disable it for now, returning always 0 when the register is
being read.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-25 11:45:38 +01:00
Sebastien Boeuf
e18052120a vfio: Fix Memory BAR alignment
The IO BAR alignment was already set to 4 bytes, this patch simply added
a comment for it.

The Memory BAR alignment was also set to the right value, but it was not
explained why 0x1000 was needed, and also why 0x10 could sometimes be
used as correct alignment.
A Memory BAR must be aligned at least on 16 bytes since the first 4 bits
are dedicated to some specific information about the BAR itself. But in
case a BAR is identified as mappable from VFIO, this means our VMM might
memory map it into the VMM address space, and set KVM accordingly using
the ioctl KVM_SET_USER_MEMORY_REGION. In case of KVM, we have to take
into account that it expects addresses to be page aligned, which means
4K in this case.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-25 11:45:38 +01:00
Sebastien Boeuf
d92d797896 vfio: Update memory slot index to support multiple VFIO devices
In order to correctly support multiple VFIO devices, we need to
increment the memory slot index every time it is being used to set some
user memory region through KVM. That's why the mem_slot parameter is
made mutable.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-25 11:45:38 +01:00
Sebastien Boeuf
b5eab43aa5 vfio: Create a global KVM VFIO device for all VFIO devices
KVM does not support multiple KVM VFIO devices to be created when
trying to support multiple VFIO devices. This commit creates one
global KVM VFIO device being shared with every VFIO device, which
makes possible the support for passing several devices through the
VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-25 11:45:38 +01:00
Chao Peng
b746dd7116 vfio: Map MMIO regions into the guest
VFIO explictly tells us if a MMIO region can be mapped into the guest
address space or not. Except for MSI-X table BARs, we try to map them
into the guest whenever VFIO allows us to do so. This avoids unnecessary
VM exits when the guest tries to access those regions.

Signed-off-by: Zhang, Xiong Y <xiong.y.zhang@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-24 11:55:08 +02:00
Sebastien Boeuf
c93d5361b8 vfio: pci: Build the KVM routes
We track all MSI and MSI-X capabilities changes, which allows us to also
track all MSI and MSI-X table changes.

With both pieces of information we can build kvm irq routing tables and
map the physical device MSI/X vectors to the guest ones. Once that
mapping is in place we can toggle the VFIO IRQ API accordingly and
enable disable MSI or MSI-X interrupts, from the physical device up to
the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-24 11:55:08 +02:00
Sebastien Boeuf
20f0116111 vfio: pci: Track MSI and MSI-X capabilities
In order to properly manage the VFIO device interrupt settings, we need
to keep track of both MSI and MSI-X PCI config capabilities changes.

When the guest programs the device for interrupt delivery, it writes to
the MSI and MSI-X capabilities. This information must be trapped and
cached in order to map the physical device interrupt delivery path to
the guest one. In other words, tracking MSI and MSI-X capabilites will
allow us to accurately build the KVM interrupt routes.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-24 11:55:08 +02:00
Samuel Ortiz
db5b4763c2 vfio: Initial PCI support
This brings the initial PCI support to the VFIO crate.
The VfioPciDevice is the main structure and holds an inner VfioDevice.

VfioPciDevice implements the PCI trait, leaving the IRQ assignments
empty as this will be driven by both the guest and the VFIO PCI device,
not by the VMM.

As we must trap BAR programming from the guest (We don't want to program
the actual device with guest addresses), we use our local PCI
configuration cache to read and write BARs.

Signed-off-by: Zhang, Xiong Y <xiong.y.zhang@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-24 11:55:08 +02:00
Samuel Ortiz
2cec3aad7f vfio: VFIO API wrappers and helpers
The Virtual Function I/O (VFIO) kernel subsystem exposes a vast and
relatively complex userspace API. This commit abstracts and simplifies
this API into both an internal and external API.

The external API is to be consumed by VFIO device implementation through
the VfioDevice structure. A VfioDevice instance can:

- Enable and disable all interrupts (INTX, MSI and MSI-X) on the
  underlying VFIO device.
- Read and write all of the VFIO device memory regions.
- Set the system's IOMMU tables for the underlying device.

Signed-off-by: Zhang, Xiong Y <xiong.y.zhang@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-24 11:55:08 +02:00