2019-07-24 12:54:37 +00:00

# Cloud Hypervisor VFIO HOWTO

VFIO (Virtual Function I/O) is a kernel framework that exposes direct device
access to userspace. `cloud-hypervisor`, like many VMMs, uses the VFIO
framework to directly assign host physical devices to guest workloads.

## Direct Device Assignment with Cloud Hypervisor

To assign a device to a `cloud-hypervisor` guest, the device needs to be managed
by the VFIO kernel drivers. However, by default, a host device is bound to
its native driver, which is not the VFIO one.

As a consequence, a device must be unbound from its native driver and bound to
the VFIO driver before it is passed to `cloud-hypervisor` for assignment to a
guest.

### Example

In this example we're going to assign a PCI memory card (SD, MMC, etc.) reader
from the host to a `cloud-hypervisor` guest.

`cloud-hypervisor` only supports assigning PCI devices to its guests. `lspci`
helps with identifying PCI devices on the host:

```
$ lspci
[...]
01:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
[...]
```

Here we see that our device is on bus 1, slot 0 and function 0 (`01:00.0`).
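
The sysfs paths used in the rest of this example need the full device address, with the PCI domain in front of the `bus:slot.function` triplet that `lspci` prints. The following is a minimal sketch of that expansion. The function names are hypothetical helpers, and the sketch assumes that an address printed without a domain belongs to domain `0000`, as it does in this example.

```shell
#!/usr/bin/env bash
# Sketch: expand a short lspci address into the full form used under sysfs.
# Assumption: an address without an explicit domain is in PCI domain 0000.

pci_full_address() {
    local addr="$1"
    case "$addr" in
        ????:??:??.?) printf '%s\n' "$addr" ;;        # domain already present
        ??:??.?)      printf '0000:%s\n' "$addr" ;;   # prepend the default domain
        *)            echo "unrecognized PCI address: $addr" >&2; return 1 ;;
    esac
}

pci_sysfs_path() {
    local full
    full="$(pci_full_address "$1")" || return 1
    printf '/sys/bus/pci/devices/%s\n' "$full"
}

# For the card reader above this prints /sys/bus/pci/devices/0000:01:00.0
pci_sysfs_path 01:00.0
```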

Now that we have identified the device, we must unbind it from its native driver
(`rtsx_pci`) and bind it to the VFIO driver instead (`vfio_pci`).

First we load the VFIO kernel modules on the host, with unsafe interrupts
allowed:

```
# modprobe -r vfio_pci
# modprobe -r vfio_iommu_type1
# modprobe vfio_iommu_type1 allow_unsafe_interrupts
# modprobe vfio_pci
```

If the VFIO drivers are built into the kernel, enable unsafe interrupts with:

```
# echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
```

Then we unbind the device from its native driver:

```
# echo 0000:01:00.0 > /sys/bus/pci/devices/0000\:01\:00.0/driver/unbind
```

Finally, we bind it to the VFIO driver. To do that we first need the device's
vendor ID and device ID:

```
$ lspci -n -s 01:00.0
01:00.0 ff00: 10ec:525a (rev 01)

# echo 10ec 525a > /sys/bus/pci/drivers/vfio-pci/new_id
```
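
The same two IDs can also be read from the device's `vendor` and `device` files in sysfs instead of being copied from the `lspci -n` output. The sketch below prints them in the `<vendor> <device>` form written to `new_id` above. `pci_ids` is a hypothetical helper, and `SYSFS_ROOT` is an assumption of this sketch (not a kernel or `cloud-hypervisor` setting) that only exists so the function can be tried against a fake directory tree.

```shell
#!/usr/bin/env bash
# Sketch: print "<vendor> <device>" for a PCI device given its full address.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

pci_ids() {
    local dev_dir="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1"
    local vendor device
    vendor="$(cat "$dev_dir/vendor")" || return 1
    device="$(cat "$dev_dir/device")" || return 1
    # sysfs reports the IDs with a 0x prefix (0x10ec, 0x525a); strip it to
    # match the form used in the example above.
    printf '%s %s\n' "${vendor#0x}" "${device#0x}"
}

# Possible use, as root:
#   pci_ids 0000:01:00.0 > /sys/bus/pci/drivers/vfio-pci/new_id
```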

If you have more than one device with the same vendor ID and device ID, starting
with the second device, the binding is performed as follows:

```
# echo 0000:02:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
```

Now the device is managed by the VFIO framework.
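
One way to confirm the result is to look at the device's `driver` symlink in sysfs, which points at the driver the device is currently bound to. The following sketch prints that driver's name, or `none` for an unbound device. `pci_current_driver` is a hypothetical helper and `SYSFS_ROOT` is an assumption of this sketch used only for testing against a fake tree.

```shell
#!/usr/bin/env bash
# Sketch: report which driver a PCI device is currently bound to.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

pci_current_driver() {
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends with the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        # No driver symlink means the device is not bound to any driver.
        echo "none"
    fi
}

# Possible use:
#   pci_current_driver 0000:01:00.0
```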

The final step is to give that device to `cloud-hypervisor` so that it can be
assigned to the guest. This is done with the `--device` command line option,
which takes the device's sysfs path as an argument. In our example it is
`/sys/bus/pci/devices/0000:01:00.0/`:

```
./target/debug/cloud-hypervisor \
    --kernel ~/vmlinux \
    --disk path=~/focal-server-cloudimg-amd64.raw \
    --console off \
    --serial tty \
    --cmdline "console=ttyS0 root=/dev/vda1 rw" \
    --cpus 4 \
    --memory size=512M \
    --device path=/sys/bus/pci/devices/0000:01:00.0/
```

The guest kernel will then detect the card reader on its PCI bus and, provided
that support for this device is enabled, will probe and enable it for the
guest to use.

To pass multiple devices, give one `path=` entry per device:

```
--device path=/sys/bus/pci/devices/0000:01:00.0/ path=/sys/bus/pci/devices/0000:02:00.0/
```
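
When the list of devices comes from a script, the option can be assembled from the device addresses. This is a minimal sketch that only builds the string shown above; `device_args` is a hypothetical helper, not part of `cloud-hypervisor`.

```shell
#!/usr/bin/env bash
# Sketch: build a single --device option with one path= entry per address.

device_args() {
    # Print nothing when no address is given, so no bare --device is emitted.
    [ "$#" -gt 0 ] || return 0
    local args="--device"
    local addr
    for addr in "$@"; do
        args="$args path=/sys/bus/pci/devices/$addr/"
    done
    printf '%s\n' "$args"
}

# Prints the same option as in the example above.
device_args 0000:01:00.0 0000:02:00.0
```

The output is meant to be word-split by the shell, for example by using `$(device_args 0000:01:00.0 0000:02:00.0)` unquoted at the end of the `cloud-hypervisor` command line.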

### Multiple devices in the same IOMMU group

Multiple devices can end up in the same IOMMU group. This often happens with
graphics cards that embed an audio controller.

```
$ lspci
[...]
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
[...]
```

This is usually exposed as follows through `sysfs`:

```
$ ls /sys/kernel/iommu_groups/22/devices/
0000:01:00.0 0000:01:00.1
```

This means these two devices are in the same IOMMU group (22). In such a case,
it is important to bind both devices to VFIO and pass them both through to the
VM, otherwise this could cause functional and security issues.
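
To find out which devices need to be handled together, the group members can be listed starting from any one device, through its `iommu_group` symlink in sysfs. The following is a sketch only. `iommu_group_devices` is a hypothetical helper and `SYSFS_ROOT` is an assumption of this sketch used for testing against a fake tree.

```shell
#!/usr/bin/env bash
# Sketch: list every device that shares an IOMMU group with the given device.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

iommu_group_devices() {
    local group_dir="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/iommu_group/devices"
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $1" >&2
        return 1
    fi
    # Each entry is the full address of one device in the group,
    # including the device that was passed in.
    ls "$group_dir"
}

# Possible use, binding and passing through every address it prints:
#   iommu_group_devices 0000:01:00.0
```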