docs: add documentation for GPUDirect P2P

Signed-off-by: Thomas Barrett <tbarrett1200@icloud.com>

### Advanced Configuration Options

When using NVIDIA GPUs in a VFIO passthrough configuration, advanced
configuration options are supported to enable GPUDirect P2P DMA over
PCIe. When enabled, loads and stores between GPUs use native PCIe
peer-to-peer transactions instead of a shared memory buffer. This drastically
decreases P2P latency between GPUs. This functionality is supported by
cloud-hypervisor on NVIDIA Turing, Ampere, Hopper, and Lovelace GPUs.

The NVIDIA driver does not enable GPUDirect P2P over PCIe within guests
by default because hardware support for routing P2P TLPs between PCIe root
ports is optional. PCIe P2P should always be supported between devices
on the same PCIe switch. The `x_nv_gpudirect_clique` config argument may
be used to signal support for PCIe P2P traffic between NVIDIA VFIO endpoints.
The guest driver assumes that P2P traffic is supported between all endpoints
that are part of the same clique.
```
--device path=/sys/bus/pci/devices/0000:01:00.0/,x_nv_gpudirect_clique=0
```
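
For example, two passed-through GPUs can be given the same clique ID so that
the guest driver treats them as P2P-capable peers. The PCI addresses below are
placeholders for the GPUs' host addresses and will differ on your system.

```
--device path=/sys/bus/pci/devices/0000:01:00.0/,x_nv_gpudirect_clique=0 \
         path=/sys/bus/pci/devices/0000:02:00.0/,x_nv_gpudirect_clique=0
```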

The following command can be run on the guest to verify that GPUDirect P2P is
correctly enabled.

```
nvidia-smi topo -p2p r
       GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7
GPU0   X     OK    OK    OK    OK    OK    OK    OK
GPU1   OK    X     OK    OK    OK    OK    OK    OK
GPU2   OK    OK    X     OK    OK    OK    OK    OK
GPU3   OK    OK    OK    X     OK    OK    OK    OK
GPU4   OK    OK    OK    OK    X     OK    OK    OK
GPU5   OK    OK    OK    OK    OK    X     OK    OK
GPU6   OK    OK    OK    OK    OK    OK    X     OK
GPU7   OK    OK    OK    OK    OK    OK    OK    X
```
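
Write capability can be checked the same way; the command below assumes a
recent driver (flag support may vary by driver version) and its output follows
the same matrix format as the read check above.

```
nvidia-smi topo -p2p w
```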

Some VFIO devices have a 32-bit mmio BAR. When using many such devices, it is
possible to exhaust the 32-bit mmio space available on a PCI segment. The
following example demonstrates such a device with a 16 MiB 32-bit mmio BAR.