mirror of
https://gitlab.com/libvirt/libvirt.git
synced 2024-12-22 13:45:38 +00:00
docs: Convert 'drvnodedev' page to rST
Fix one cross link anchor along with the conversion. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>
parent
05a514b0b3
commit
19b1fef54a
@ -1,383 +0,0 @@
|
|||||||
<?xml version="1.0" encoding="UTF-8"?>
|
|
||||||
<!DOCTYPE html>
|
|
||||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
||||||
<body>
|
|
||||||
<h1>Host device management</h1>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
Libvirt provides management of both physical and virtual host devices
|
|
||||||
(historically also referred to as node devices) like USB, PCI, SCSI, and
|
|
||||||
network devices. This also includes various virtualization capabilities
|
|
||||||
which the aforementioned devices provide for utilization, for example
|
|
||||||
SR-IOV, NPIV, MDEV, DRM, etc.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
The node device driver provides means to list and show details about host
|
|
||||||
devices (<code>virsh nodedev-list</code>, <code>virsh nodedev-info</code>,
|
|
||||||
and <code>virsh nodedev-dumpxml</code>), which are generic and can be used
|
|
||||||
with all devices. It also provides the means to manage virtual devices.
|
|
||||||
Persistently-defined virtual devices are only supported for mediated
|
|
||||||
devices, while transient devices are supported by both mediated devices
|
|
||||||
and NPIV (<a href="https://wiki.libvirt.org/page/NPIV_in_libvirt">more
|
|
||||||
info about NPIV</a>).
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
Persistent virtual devices are managed with
|
|
||||||
<code>virsh nodedev-define</code> and <code>virsh nodedev-undefine</code>.
|
|
||||||
Persistent devices can be configured to start manually or automatically
|
|
||||||
using <code>virsh nodedev-autostart</code>. Inactive devices can be made
|
|
||||||
active with <code>virsh nodedev-start</code>.
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
Transient virtual devices are started and stopped with the commands
|
|
||||||
<code>virsh nodedev-create</code> and <code>virsh nodedev-destroy</code>.
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
Devices on the host system are arranged in a tree-like hierarchy, with
|
|
||||||
the root node being called <code>computer</code>. The node device driver
|
|
||||||
supports the udev backend (the HAL backend was removed in libvirt <code>6.8.0</code>).
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
Details of the XML format of a host device can be found <a
|
|
||||||
href="formatnode.html">here</a>. Of particular interest is the
|
|
||||||
<code>capability</code> element, which describes features supported by
|
|
||||||
the device. Some specific device types are addressed in more detail
|
|
||||||
below.
|
|
||||||
</p>
|
|
||||||
<h2>Basic structure of a node device</h2>
|
|
||||||
<pre>
|
|
||||||
<device>
|
|
||||||
<name>pci_0000_00_17_0</name>
|
|
||||||
<path>/sys/devices/pci0000:00/0000:00:17.0</path>
|
|
||||||
<parent>computer</parent>
|
|
||||||
<driver>
|
|
||||||
<name>ahci</name>
|
|
||||||
</driver>
|
|
||||||
<capability type='pci'>
|
|
||||||
...
|
|
||||||
</capability>
|
|
||||||
</device></pre>
|
|
||||||
|
|
||||||
<ul id="toc"/>
|
|
||||||
|
|
||||||
<h2><a id="PCI">PCI host devices</a></h2>
|
|
||||||
<dl>
|
|
||||||
<dt><code>capability</code></dt>
|
|
||||||
<dd>
|
|
||||||
When used as a top-level element, the supported values for the
|
|
||||||
<code>type</code> attribute are <code>pci</code> and
|
|
||||||
<code>phys_function</code> (see <a href="#SRIOVCap">SR-IOV below</a>).
|
|
||||||
</dd>
|
|
||||||
</dl>
|
|
||||||
<pre>
|
|
||||||
<device>
|
|
||||||
<name>pci_0000_04_00_1</name>
|
|
||||||
<path>/sys/devices/pci0000:00/0000:00:06.0/0000:04:00.1</path>
|
|
||||||
<parent>pci_0000_00_06_0</parent>
|
|
||||||
<driver>
|
|
||||||
<name>igb</name>
|
|
||||||
</driver>
|
|
||||||
<capability type='pci'>
|
|
||||||
<domain>0</domain>
|
|
||||||
<bus>4</bus>
|
|
||||||
<slot>0</slot>
|
|
||||||
<function>1</function>
|
|
||||||
<product id='0x10c9'>82576 Gigabit Network Connection</product>
|
|
||||||
<vendor id='0x8086'>Intel Corporation</vendor>
|
|
||||||
<iommuGroup number='15'>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
|
|
||||||
</iommuGroup>
|
|
||||||
<numa node='0'/>
|
|
||||||
<pci-express>
|
|
||||||
<link validity='cap' port='1' speed='2.5' width='2'/>
|
|
||||||
<link validity='sta' speed='2.5' width='2'/>
|
|
||||||
</pci-express>
|
|
||||||
</capability>
|
|
||||||
</device></pre>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
The XML format for a PCI device stays the same for any further
|
|
||||||
capabilities it supports; a single nested <code><capability></code>
|
|
||||||
element will be included for each capability the device supports.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<h3><a id="SRIOVCap">SR-IOV capability</a></h3>
|
|
||||||
<p>
|
|
||||||
Single root input/output virtualization (SR-IOV) allows sharing of the
|
|
||||||
PCIe resources by multiple virtual environments. That is achieved by
|
|
||||||
slicing up a single full-featured physical resource called physical
|
|
||||||
function (PF) into multiple devices called virtual functions (VFs) sharing
|
|
||||||
their configuration with the underlying PF. Despite the SR-IOV
|
|
||||||
specification, the number of VFs that can be created on a PF varies among
|
|
||||||
manufacturers.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
If the NIC <a href="#PCI">above</a> were also SR-IOV capable, it would
|
|
||||||
also include a nested
|
|
||||||
<code><capability></code> element enumerating all virtual
|
|
||||||
functions available on the physical device (physical port) like in the
|
|
||||||
example below.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>
|
|
||||||
<capability type='pci'>
|
|
||||||
...
|
|
||||||
<capability type='virt_functions' maxCount='7'>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x10' function='0x3'/>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x10' function='0x5'/>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x10' function='0x7'/>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x11' function='0x1'/>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x11' function='0x3'/>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x11' function='0x5'/>
|
|
||||||
</capability>
|
|
||||||
...
|
|
||||||
</capability></pre>
|
|
||||||
<p>
|
|
||||||
An SR-IOV child device, on the other hand, would then report its top-level
|
|
||||||
capability type as a <code>phys_function</code> instead:
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>
|
|
||||||
<device>
|
|
||||||
...
|
|
||||||
<capability type='phys_function'>
|
|
||||||
<address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
|
|
||||||
</capability>
|
|
||||||
...
|
|
||||||
</device></pre>
|
|
||||||
|
|
||||||
<h3><a id="MDEVCap">MDEV capability</a></h3>
|
|
||||||
<p>
|
|
||||||
A device capable of creating mediated devices will include a nested
|
|
||||||
capability <code>mdev_types</code> which enumerates all supported mdev
|
|
||||||
types on the physical device, along with the type attributes available
|
|
||||||
through sysfs. A detailed description of the XML format for the
|
|
||||||
<code>mdev_types</code> capability can be found
|
|
||||||
<a href="formatnode.html#MDEVTypesCap">here</a>.
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
The following example shows how we might represent an NVIDIA GPU device
|
|
||||||
that supports mediated devices. See below for <a href="#MDEV">more
|
|
||||||
information about mediated devices</a>.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>
|
|
||||||
<device>
|
|
||||||
...
|
|
||||||
<driver>
|
|
||||||
<name>nvidia</name>
|
|
||||||
</driver>
|
|
||||||
<capability type='pci'>
|
|
||||||
...
|
|
||||||
<capability type='mdev_types'>
|
|
||||||
<type id='nvidia-11'>
|
|
||||||
<name>GRID M60-0B</name>
|
|
||||||
<deviceAPI>vfio-pci</deviceAPI>
|
|
||||||
<availableInstances>16</availableInstances>
|
|
||||||
</type>
|
|
||||||
<!-- Here would come the rest of the available mdev types -->
|
|
||||||
</capability>
|
|
||||||
...
|
|
||||||
</capability>
|
|
||||||
</device></pre>
|
|
||||||
|
|
||||||
<h3><a id="VPDCap">VPD capability</a></h3>
|
|
||||||
<p>
|
|
||||||
A device that exposes a PCI/PCIe VPD capability will include a nested
|
|
||||||
capability <code>vpd</code> which presents data stored in the Vital Product
|
|
||||||
Data (VPD). VPD provides a device name and a number of other standard-defined
|
|
||||||
read-only fields (change level, manufacture id, part number, serial number) and
|
|
||||||
vendor-specific read-only fields. Additionally, if a device supports it,
|
|
||||||
read-write fields (asset tag, vendor-specific fields or system fields) may
|
|
||||||
also be present. The VPD capability is optional for PCI/PCIe devices and the
|
|
||||||
set of exposed fields may vary depending on a device. The XML format follows
|
|
||||||
the binary format described in "I.3. VPD Definitions" in PCI Local Bus (2.2+)
|
|
||||||
and the identical format in PCIe 4.0+. At the time of writing, the support for
|
|
||||||
exposing this capability is only present on Linux-based systems (kernel version
|
|
||||||
v2.6.26 is the first one to expose VPD via sysfs which Libvirt relies on).
|
|
||||||
Reading the VPD contents requires root privileges, therefore,
|
|
||||||
<code>virsh nodedev-dumpxml</code> must be executed accordingly.
|
|
||||||
A description of the XML format for the <code>vpd</code> capability can
|
|
||||||
be found <a href="formatnode.html#VPDCap">here</a>.
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
The following example shows a VPD representation for a device that exposes the
|
|
||||||
VPD capability with read-only and read-write fields. Among other things,
|
|
||||||
the VPD of this particular device includes a unique board serial number.
|
|
||||||
</p>
|
|
||||||
<pre>
|
|
||||||
<device>
|
|
||||||
<name>pci_0000_42_00_0</name>
|
|
||||||
<capability type='pci'>
|
|
||||||
<class>0x020000</class>
|
|
||||||
<domain>0</domain>
|
|
||||||
<bus>66</bus>
|
|
||||||
<slot>0</slot>
|
|
||||||
<function>0</function>
|
|
||||||
<product id='0xa2d6'>MT42822 BlueField-2 integrated ConnectX-6 Dx network controller</product>
|
|
||||||
<vendor id='0x15b3'>Mellanox Technologies</vendor>
|
|
||||||
<capability type='virt_functions' maxCount='16'/>
|
|
||||||
<capability type='vpd'>
|
|
||||||
<name>BlueField-2 DPU 25GbE Dual-Port SFP56, Crypto Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket</name>
|
|
||||||
<fields access='readonly'>
|
|
||||||
<change_level>B1</change_level>
|
|
||||||
<manufacture_id>foobar</manufacture_id>
|
|
||||||
<part_number>MBF2H332A-AEEOT</part_number>
|
|
||||||
<serial_number>MT2113X00000</serial_number>
|
|
||||||
<vendor_field index='0'>PCIeGen4 x8</vendor_field>
|
|
||||||
<vendor_field index='2'>MBF2H332A-AEEOT</vendor_field>
|
|
||||||
<vendor_field index='3'>3c53d07eec484d8aab34dabd24fe575aa</vendor_field>
|
|
||||||
<vendor_field index='A'>MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=BF2H332A</vendor_field>
|
|
||||||
</fields>
|
|
||||||
<fields access='readwrite'>
|
|
||||||
<asset_tag>fooasset</asset_tag>
|
|
||||||
<vendor_field index='0'>vendorfield0</vendor_field>
|
|
||||||
<vendor_field index='2'>vendorfield2</vendor_field>
|
|
||||||
<vendor_field index='A'>vendorfieldA</vendor_field>
|
|
||||||
<system_field index='B'>systemfieldB</system_field>
|
|
||||||
<system_field index='0'>systemfield0</system_field>
|
|
||||||
</fields>
|
|
||||||
</capability>
|
|
||||||
<iommuGroup number='65'>
|
|
||||||
<address domain='0x0000' bus='0x42' slot='0x00' function='0x0'/>
|
|
||||||
</iommuGroup>
|
|
||||||
<numa node='0'/>
|
|
||||||
<pci-express>
|
|
||||||
<link validity='cap' port='0' speed='16' width='8'/>
|
|
||||||
<link validity='sta' speed='8' width='8'/>
|
|
||||||
</pci-express>
|
|
||||||
</capability>
|
|
||||||
</device>
|
|
||||||
</pre>
|
|
||||||
|
|
||||||
<h2><a id="MDEV">Mediated devices (MDEVs)</a></h2>
|
|
||||||
<p>
|
|
||||||
Mediated devices (<span class="since">Since 3.2.0</span>) are software
|
|
||||||
devices defining resource allocation on the backing physical device which
|
|
||||||
in turn allows the parent physical device's resources to be divided into
|
|
||||||
several mediated devices, thus sharing the physical device's performance
|
|
||||||
among multiple guests. Unlike SR-IOV however, where a PCIe device appears
|
|
||||||
as multiple separate PCIe devices on the host's PCI bus, mediated devices
|
|
||||||
only appear on the mdev virtual bus. Therefore, no detach/reattach
|
|
||||||
procedure from/to the host driver is involved even though
|
|
||||||
mediated devices are used in a direct device assignment manner. A
|
|
||||||
detailed description of the XML format for the <code>mdev</code>
|
|
||||||
capability can be found <a href="formatnode.html#mdev">here</a>.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<h3>Example of a mediated device</h3>
|
|
||||||
<pre>
|
|
||||||
<device>
|
|
||||||
<name>mdev_4b20d080_1b54_4048_85b3_a6a62d165c01</name>
|
|
||||||
<path>/sys/devices/pci0000:00/0000:00:02.0/4b20d080-1b54-4048-85b3-a6a62d165c01</path>
|
|
||||||
<parent>pci_0000_06_00_0</parent>
|
|
||||||
<driver>
|
|
||||||
<name>vfio_mdev</name>
|
|
||||||
</driver>
|
|
||||||
<capability type='mdev'>
|
|
||||||
<type id='nvidia-11'/>
|
|
||||||
<uuid>4b20d080-1b54-4048-85b3-a6a62d165c01</uuid>
|
|
||||||
<iommuGroup number='12'/>
|
|
||||||
</capability>
|
|
||||||
</device></pre>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
Support for the mediated device framework in libvirt's node device driver
|
|
||||||
covers the following features:
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<ul>
|
|
||||||
<li>
|
|
||||||
list available mediated devices on the host
|
|
||||||
(<span class="since">Since 3.4.0</span>)
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
display device details
|
|
||||||
(<span class="since">Since 3.4.0</span>)
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
create transient mediated devices
|
|
||||||
(<span class="since">Since 6.5.0</span>)
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
define persistent mediated devices
|
|
||||||
(<span class="since">Since 7.3.0</span>)
|
|
||||||
</li>
|
|
||||||
</ul>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
Because mediated devices are instantiated from vendor specific templates,
|
|
||||||
simply called 'types', information describing these types is contained
|
|
||||||
within the parent device's capabilities (see the example in <a
|
|
||||||
href="#PCI">PCI host devices</a>). To list all devices capable of
|
|
||||||
creating mediated devices, the following command can be used.
|
|
||||||
</p>
|
|
||||||
<pre>$ virsh nodedev-list --cap mdev_types</pre>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
To see the supported mediated device types on a specific physical device
|
|
||||||
use the following:
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>$ virsh nodedev-dumpxml <device></pre>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
Before creating a mediated device, unbind the device from the respective
|
|
||||||
device driver, e.g. the subchannel I/O driver for a CCW device. Then bind the
|
|
||||||
device to the respective VFIO driver. For a CCW device, also unbind the
|
|
||||||
corresponding subchannel of the CCW device from the subchannel I/O driver
|
|
||||||
and then bind the subchannel (instead of the CCW device) to the vfio_ccw
|
|
||||||
driver. The below example shows the unbinding and binding steps for a CCW
|
|
||||||
device.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>
|
|
||||||
device="0.0.1234"
|
|
||||||
subchannel="0.0.0123"
|
|
||||||
echo $device > /sys/bus/ccw/devices/$device/driver/unbind
|
|
||||||
echo $subchannel > /sys/bus/css/devices/$subchannel/driver/unbind
|
|
||||||
echo $subchannel > /sys/bus/css/drivers/vfio_ccw/bind
|
|
||||||
</pre>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
To instantiate a transient mediated device, create an XML file representing the
|
|
||||||
device. See above for information about the mediated device XML format.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>$ virsh nodedev-create <xml-file>
|
|
||||||
Node device '<device-name>' created from '<xml-file>'</pre>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
If you would like to persistently define the device so that it will be
|
|
||||||
maintained across host reboots, use <code>virsh nodedev-define</code>
|
|
||||||
instead of <code>nodedev-create</code>:
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>$ virsh nodedev-define <xml-file>
|
|
||||||
Node device '<device-name>' defined from '<xml-file>'</pre>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
To start an instance of this device definition, use the following command:
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>$ virsh nodedev-start <device-name></pre>
|
|
||||||
<p>
|
|
||||||
Active mediated device instances can be stopped using <code>virsh
|
|
||||||
nodedev-destroy</code>, and persistent device definitions can be removed
|
|
||||||
using <code>virsh nodedev-undefine</code>.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
If a mediated device is defined persistently, it can also be set to be
|
|
||||||
automatically started whenever the host reboots or when the parent device
|
|
||||||
becomes available. In order to autostart a mediated device, use the
|
|
||||||
following command:
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<pre>$ virsh nodedev-autostart <device-name></pre>
|
|
||||||
</body>
|
|
||||||
</html>
|
|
348
docs/drvnodedev.rst
Normal file
@ -0,0 +1,348 @@
|
|||||||
|
.. role:: since
|
||||||
|
|
||||||
|
======================
|
||||||
|
Host device management
|
||||||
|
======================
|
||||||
|
|
||||||
|
.. contents::
|
||||||
|
|
||||||
|
Libvirt provides management of both physical and virtual host devices
|
||||||
|
(historically also referred to as node devices) like USB, PCI, SCSI, and network
|
||||||
|
devices. This also includes various virtualization capabilities which the
|
||||||
|
aforementioned devices provide for utilization, for example SR-IOV, NPIV, MDEV,
|
||||||
|
DRM, etc.
|
||||||
|
|
||||||
|
The node device driver provides means to list and show details about host
|
||||||
|
devices (``virsh nodedev-list``, ``virsh nodedev-info``, and
|
||||||
|
``virsh nodedev-dumpxml``), which are generic and can be used with all devices.
|
||||||
|
It also provides the means to manage virtual devices. Persistently-defined
|
||||||
|
virtual devices are only supported for mediated devices, while transient devices
|
||||||
|
are supported by both mediated devices and NPIV (`more info about
|
||||||
|
NPIV <https://wiki.libvirt.org/page/NPIV_in_libvirt>`__).
|
||||||
|
|
||||||
|
Persistent virtual devices are managed with ``virsh nodedev-define`` and
|
||||||
|
``virsh nodedev-undefine``. Persistent devices can be configured to start
|
||||||
|
manually or automatically using ``virsh nodedev-autostart``. Inactive devices
|
||||||
|
can be made active with ``virsh nodedev-start``.
|
||||||
|
|
||||||
|
Transient virtual devices are started and stopped with the commands
|
||||||
|
``virsh nodedev-create`` and ``virsh nodedev-destroy``.
|
||||||
|
|
||||||
|
Devices on the host system are arranged in a tree-like hierarchy, with the root
|
||||||
|
node being called ``computer``. The node device driver supports the udev backend
|
||||||
|
(the HAL backend was removed in libvirt ``6.8.0``).
|
||||||
|
|
||||||
|
Details of the XML format of a host device can be found
|
||||||
|
`here <formatnode.html>`__. Of particular interest is the ``capability``
|
||||||
|
element, which describes features supported by the device. Some specific device
|
||||||
|
types are addressed in more detail below.
|
||||||
|
|
||||||
|
Basic structure of a node device
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
<device>
|
||||||
|
<name>pci_0000_00_17_0</name>
|
||||||
|
<path>/sys/devices/pci0000:00/0000:00:17.0</path>
|
||||||
|
<parent>computer</parent>
|
||||||
|
<driver>
|
||||||
|
<name>ahci</name>
|
||||||
|
</driver>
|
||||||
|
<capability type='pci'>
|
||||||
|
...
|
||||||
|
</capability>
|
||||||
|
</device>
|
||||||
|
|
||||||
|
PCI host devices
|
||||||
|
----------------
|
||||||
|
|
||||||
|
``capability``
|
||||||
|
When used as a top-level element, the supported values for the ``type``
|
||||||
|
attribute are ``pci`` and ``phys_function`` (see `SR-IOV
|
||||||
|
below <#SRIOVCap>`__).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
<device>
|
||||||
|
<name>pci_0000_04_00_1</name>
|
||||||
|
<path>/sys/devices/pci0000:00/0000:00:06.0/0000:04:00.1</path>
|
||||||
|
<parent>pci_0000_00_06_0</parent>
|
||||||
|
<driver>
|
||||||
|
<name>igb</name>
|
||||||
|
</driver>
|
||||||
|
<capability type='pci'>
|
||||||
|
<domain>0</domain>
|
||||||
|
<bus>4</bus>
|
||||||
|
<slot>0</slot>
|
||||||
|
<function>1</function>
|
||||||
|
<product id='0x10c9'>82576 Gigabit Network Connection</product>
|
||||||
|
<vendor id='0x8086'>Intel Corporation</vendor>
|
||||||
|
<iommuGroup number='15'>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
|
||||||
|
</iommuGroup>
|
||||||
|
<numa node='0'/>
|
||||||
|
<pci-express>
|
||||||
|
<link validity='cap' port='1' speed='2.5' width='2'/>
|
||||||
|
<link validity='sta' speed='2.5' width='2'/>
|
||||||
|
</pci-express>
|
||||||
|
</capability>
|
||||||
|
</device>
|
||||||
|
|
||||||
|
The XML format for a PCI device stays the same for any further capabilities it
|
||||||
|
supports; a single nested ``<capability>`` element will be included for each
|
||||||
|
capability the device supports.
|
||||||
|
|
||||||
|
SR-IOV capability
|
||||||
|
~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Single root input/output virtualization (SR-IOV) allows sharing of the PCIe
|
||||||
|
resources by multiple virtual environments. That is achieved by slicing up a
|
||||||
|
single full-featured physical resource called physical function (PF) into
|
||||||
|
multiple devices called virtual functions (VFs) sharing their configuration with
|
||||||
|
the underlying PF. Despite the SR-IOV specification, the number of VFs that can
|
||||||
|
be created on a PF varies among manufacturers.
|
||||||
|
|
||||||
|
If the NIC `above <#PCI>`__ were also SR-IOV capable, it would also include
|
||||||
|
a nested ``<capability>`` element enumerating all virtual functions available on
|
||||||
|
the physical device (physical port) like in the example below.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
<capability type='pci'>
|
||||||
|
...
|
||||||
|
<capability type='virt_functions' maxCount='7'>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x10' function='0x3'/>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x10' function='0x5'/>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x10' function='0x7'/>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x11' function='0x1'/>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x11' function='0x3'/>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x11' function='0x5'/>
|
||||||
|
</capability>
|
||||||
|
...
|
||||||
|
</capability>
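
The nested ``virt_functions`` capability is easy to consume programmatically. The following is a minimal sketch, not part of libvirt, that parses a trimmed copy of the example above with Python's standard ``xml.etree.ElementTree`` module and renders each VF as a ``domain:bus:slot.function`` string. The function name ``virtual_function_addresses`` and the constant ``NODEDEV_XML`` are hypothetical; real input would come from ``virsh nodedev-dumpxml``.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above; the elided parts ("...") are left out
# and only two of the seven VFs are kept for brevity.
NODEDEV_XML = """
<capability type='pci'>
  <capability type='virt_functions' maxCount='7'>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x3'/>
  </capability>
</capability>
"""


def virtual_function_addresses(xml_text):
    """Return the VF addresses as 'dddd:bb:ss.f' strings (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # The VF list is a direct child of the top-level 'pci' capability.
    vfs = root.find("capability[@type='virt_functions']")
    if vfs is None:
        return []  # device is not SR-IOV capable, or has no VFs listed
    result = []
    for addr in vfs.findall("address"):
        # Attribute values are hexadecimal strings such as '0x04'.
        domain = int(addr.get("domain"), 16)
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        function = int(addr.get("function"), 16)
        result.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return result


print(virtual_function_addresses(NODEDEV_XML))
```

The resulting strings use the same notation as the entries under ``/sys/bus/pci/devices``, which makes them convenient for cross-checking against sysfs.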
|
||||||
|
|
||||||
|
An SR-IOV child device, on the other hand, would then report its top-level
|
||||||
|
capability type as a ``phys_function`` instead:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
<device>
|
||||||
|
...
|
||||||
|
<capability type='phys_function'>
|
||||||
|
<address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
|
||||||
|
</capability>
|
||||||
|
...
|
||||||
|
</device>
|
||||||
|
|
||||||
|
MDEV capability
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
A device capable of creating mediated devices will include a nested capability
|
||||||
|
``mdev_types`` which enumerates all supported mdev types on the physical device,
|
||||||
|
along with the type attributes available through sysfs. A detailed description
|
||||||
|
of the XML format for the ``mdev_types`` capability can be found
|
||||||
|
`here <formatnode.html#MDEVTypesCap>`__.
|
||||||
|
|
||||||
|
The following example shows how we might represent an NVIDIA GPU device that
|
||||||
|
supports mediated devices. See below for `more information about mediated
|
||||||
|
devices <#MDEV>`__.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
<device>
|
||||||
|
...
|
||||||
|
<driver>
|
||||||
|
<name>nvidia</name>
|
||||||
|
</driver>
|
||||||
|
<capability type='pci'>
|
||||||
|
...
|
||||||
|
<capability type='mdev_types'>
|
||||||
|
<type id='nvidia-11'>
|
||||||
|
<name>GRID M60-0B</name>
|
||||||
|
<deviceAPI>vfio-pci</deviceAPI>
|
||||||
|
<availableInstances>16</availableInstances>
|
||||||
|
</type>
|
||||||
|
<!-- Here would come the rest of the available mdev types -->
|
||||||
|
</capability>
|
||||||
|
...
|
||||||
|
</capability>
|
||||||
|
</device>
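
As an illustration of how the ``mdev_types`` capability might be read by a script, the sketch below collects every ``<type>`` element into a dictionary keyed by type id. It assumes each type carries the ``name``, ``deviceAPI`` and ``availableInstances`` children shown in the example above; the helper name ``mdev_types`` and the constant ``GPU_XML`` are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the NVIDIA example above, with the elided parts removed.
GPU_XML = """
<device>
  <driver>
    <name>nvidia</name>
  </driver>
  <capability type='pci'>
    <capability type='mdev_types'>
      <type id='nvidia-11'>
        <name>GRID M60-0B</name>
        <deviceAPI>vfio-pci</deviceAPI>
        <availableInstances>16</availableInstances>
      </type>
    </capability>
  </capability>
</device>
"""


def mdev_types(xml_text):
    """Map each mdev type id to its attributes (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    types = {}
    for t in root.iterfind(".//capability[@type='mdev_types']/type"):
        types[t.get("id")] = {
            "name": t.findtext("name"),  # None if the element is absent
            "device_api": t.findtext("deviceAPI"),
            "available": int(t.findtext("availableInstances")),
        }
    return types


for type_id, info in mdev_types(GPU_XML).items():
    print(type_id, info)
```

A caller would typically check ``available`` before trying to create another mediated device of that type.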
|
||||||
|
|
||||||
|
VPD capability
|
||||||
|
~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
A device that exposes a PCI/PCIe VPD capability will include a nested capability
|
||||||
|
``vpd`` which presents data stored in the Vital Product Data (VPD). VPD provides
|
||||||
|
a device name and a number of other standard-defined read-only fields (change
|
||||||
|
level, manufacture id, part number, serial number) and vendor-specific read-only
|
||||||
|
fields. Additionally, if a device supports it, read-write fields (asset tag,
|
||||||
|
vendor-specific fields or system fields) may also be present. The VPD capability
|
||||||
|
is optional for PCI/PCIe devices and the set of exposed fields may vary
|
||||||
|
depending on a device. The XML format follows the binary format described in
|
||||||
|
"I.3. VPD Definitions" in PCI Local Bus (2.2+) and the identical format in PCIe
|
||||||
|
4.0+. At the time of writing, the support for exposing this capability is only
|
||||||
|
present on Linux-based systems (kernel version v2.6.26 is the first one to
|
||||||
|
expose VPD via sysfs which Libvirt relies on). Reading the VPD contents requires
|
||||||
|
root privileges, therefore, ``virsh nodedev-dumpxml`` must be executed
|
||||||
|
accordingly. A description of the XML format for the ``vpd`` capability can be
|
||||||
|
found `here <formatnode.html#VPDCap>`__.
|
||||||
|
|
||||||
|
The following example shows a VPD representation for a device that exposes the
|
||||||
|
VPD capability with read-only and read-write fields. Among other things, the VPD
|
||||||
|
of this particular device includes a unique board serial number.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
<device>
|
||||||
|
<name>pci_0000_42_00_0</name>
|
||||||
|
<capability type='pci'>
|
||||||
|
<class>0x020000</class>
|
||||||
|
<domain>0</domain>
|
||||||
|
<bus>66</bus>
|
||||||
|
<slot>0</slot>
|
||||||
|
<function>0</function>
|
||||||
|
<product id='0xa2d6'>MT42822 BlueField-2 integrated ConnectX-6 Dx network controller</product>
|
||||||
|
<vendor id='0x15b3'>Mellanox Technologies</vendor>
|
||||||
|
<capability type='virt_functions' maxCount='16'/>
|
||||||
|
<capability type='vpd'>
|
||||||
|
<name>BlueField-2 DPU 25GbE Dual-Port SFP56, Crypto Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket</name>
|
||||||
|
<fields access='readonly'>
|
||||||
|
<change_level>B1</change_level>
|
||||||
|
<manufacture_id>foobar</manufacture_id>
|
||||||
|
<part_number>MBF2H332A-AEEOT</part_number>
|
||||||
|
<serial_number>MT2113X00000</serial_number>
|
||||||
|
<vendor_field index='0'>PCIeGen4 x8</vendor_field>
|
||||||
|
<vendor_field index='2'>MBF2H332A-AEEOT</vendor_field>
|
||||||
|
<vendor_field index='3'>3c53d07eec484d8aab34dabd24fe575aa</vendor_field>
|
||||||
|
<vendor_field index='A'>MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=BF2H332A</vendor_field>
|
||||||
|
</fields>
|
||||||
|
<fields access='readwrite'>
|
||||||
|
<asset_tag>fooasset</asset_tag>
|
||||||
|
<vendor_field index='0'>vendorfield0</vendor_field>
|
||||||
|
<vendor_field index='2'>vendorfield2</vendor_field>
|
||||||
|
<vendor_field index='A'>vendorfieldA</vendor_field>
|
||||||
|
<system_field index='B'>systemfieldB</system_field>
|
||||||
|
<system_field index='0'>systemfield0</system_field>
|
||||||
|
</fields>
|
||||||
|
</capability>
|
||||||
|
<iommuGroup number='65'>
|
||||||
|
<address domain='0x0000' bus='0x42' slot='0x00' function='0x0'/>
|
||||||
|
</iommuGroup>
|
||||||
|
<numa node='0'/>
|
||||||
|
<pci-express>
|
||||||
|
<link validity='cap' port='0' speed='16' width='8'/>
|
||||||
|
<link validity='sta' speed='8' width='8'/>
|
||||||
|
</pci-express>
|
||||||
|
</capability>
|
||||||
|
</device>
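
To show how the two ``<fields>`` groups can be told apart, here is a small sketch that flattens one group into a dictionary. Indexed elements such as ``vendor_field`` are keyed as ``vendor_field[0]``; that key format, the helper name ``vpd_fields`` and the constant ``VPD_XML`` are choices made for this example only, not anything libvirt defines.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the VPD example above (only a few fields are kept).
VPD_XML = """
<device>
  <name>pci_0000_42_00_0</name>
  <capability type='pci'>
    <capability type='vpd'>
      <fields access='readonly'>
        <change_level>B1</change_level>
        <serial_number>MT2113X00000</serial_number>
        <vendor_field index='0'>PCIeGen4 x8</vendor_field>
      </fields>
      <fields access='readwrite'>
        <asset_tag>fooasset</asset_tag>
        <system_field index='B'>systemfieldB</system_field>
      </fields>
    </capability>
  </capability>
</device>
"""


def vpd_fields(xml_text, access):
    """Return the VPD fields of one access group ('readonly' or 'readwrite')."""
    root = ET.fromstring(xml_text)
    group = root.find(
        f".//capability[@type='vpd']/fields[@access='{access}']"
    )
    if group is None:
        return {}  # no VPD capability, or no fields with this access mode
    fields = {}
    for child in group:
        index = child.get("index")
        # Vendor and system fields repeat, so the index is part of the key.
        key = child.tag if index is None else f"{child.tag}[{index}]"
        fields[key] = child.text
    return fields


print(vpd_fields(VPD_XML, "readonly"))
print(vpd_fields(VPD_XML, "readwrite"))
```

Because reading VPD requires root privileges, XML obtained by an unprivileged user may simply lack the ``vpd`` capability; the helper returns an empty dictionary in that case.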
|
||||||
|
|
||||||
|
Mediated devices (MDEVs)
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
Mediated devices ( :since:`Since 3.2.0` ) are software devices defining resource
|
||||||
|
allocation on the backing physical device which in turn allows the parent
|
||||||
|
physical device's resources to be divided into several mediated devices, thus
|
||||||
|
sharing the physical device's performance among multiple guests. Unlike SR-IOV
|
||||||
|
however, where a PCIe device appears as multiple separate PCIe devices on the
|
||||||
|
host's PCI bus, mediated devices only appear on the mdev virtual bus. Therefore,
|
||||||
|
no detach/reattach procedure from/to the host driver is involved even
|
||||||
|
though mediated devices are used in a direct device assignment manner. A
|
||||||
|
detailed description of the XML format for the ``mdev`` capability can be found
|
||||||
|
`here <formatnode.html#mdev>`__.
|
||||||
|
|
||||||
|
Example of a mediated device
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
<device>
|
||||||
|
<name>mdev_4b20d080_1b54_4048_85b3_a6a62d165c01</name>
|
||||||
|
<path>/sys/devices/pci0000:00/0000:00:02.0/4b20d080-1b54-4048-85b3-a6a62d165c01</path>
|
||||||
|
<parent>pci_0000_06_00_0</parent>
|
||||||
|
<driver>
|
||||||
|
<name>vfio_mdev</name>
|
||||||
|
</driver>
|
||||||
|
<capability type='mdev'>
|
||||||
|
<type id='nvidia-11'/>
|
||||||
|
<uuid>4b20d080-1b54-4048-85b3-a6a62d165c01</uuid>
|
||||||
|
<iommuGroup number='12'/>
|
||||||
|
</capability>
|
||||||
|
</device>
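
In the example above, the node device name is the UUID with an ``mdev_`` prefix and hyphens replaced by underscores. The sketch below reproduces that mapping. It is derived only from this example; other libvirt versions may compose the name differently (for instance by adding further components), so treat ``virsh nodedev-list`` as the authoritative source. The function name ``mdev_nodedev_name`` is hypothetical.

```python
import uuid


def mdev_nodedev_name(mdev_uuid):
    """Derive the node device name used in the example above from a UUID.

    Assumption: the name is 'mdev_' followed by the canonical UUID with
    '-' replaced by '_'. This mirrors the example only.
    """
    # uuid.UUID validates the input and normalizes it to lowercase.
    canonical = str(uuid.UUID(mdev_uuid))
    return "mdev_" + canonical.replace("-", "_")


print(mdev_nodedev_name("4b20d080-1b54-4048-85b3-a6a62d165c01"))
```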
|
||||||
|
|
||||||
|
Support for the mediated device framework in libvirt's node device driver
|
||||||
|
covers the following features:
|
||||||
|
|
||||||
|
- list available mediated devices on the host ( :since:`Since 3.4.0` )
|
||||||
|
- display device details ( :since:`Since 3.4.0` )
|
||||||
|
- create transient mediated devices ( :since:`Since 6.5.0` )
|
||||||
|
- define persistent mediated devices ( :since:`Since 7.3.0` )
|
||||||
|
|
||||||
|
Because mediated devices are instantiated from vendor specific templates, simply
|
||||||
|
called 'types', information describing these types is contained within the
|
||||||
|
parent device's capabilities (see the example in `PCI host devices <#PCI>`__).
|
||||||
|
To list all devices capable of creating mediated devices, the following command
|
||||||
|
can be used.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
$ virsh nodedev-list --cap mdev_types
|
||||||
|
|
||||||
|
To see the supported mediated device types on a specific physical device use the
|
||||||
|
following:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
$ virsh nodedev-dumpxml <device>
|
||||||
|
|
||||||
|
Before creating a mediated device, unbind the device from the respective device
|
||||||
|
driver, e.g. the subchannel I/O driver for a CCW device. Then bind the device to the
|
||||||
|
respective VFIO driver. For a CCW device, also unbind the corresponding
|
||||||
|
subchannel of the CCW device from the subchannel I/O driver and then bind the
|
||||||
|
subchannel (instead of the CCW device) to the vfio_ccw driver. The below example
|
||||||
|
shows the unbinding and binding steps for a CCW device.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
device="0.0.1234"
|
||||||
|
subchannel="0.0.0123"
|
||||||
|
echo $device > /sys/bus/ccw/devices/$device/driver/unbind
|
||||||
|
echo $subchannel > /sys/bus/css/devices/$subchannel/driver/unbind
|
||||||
|
echo $subchannel > /sys/bus/css/drivers/vfio_ccw/bind
|
||||||
|
|
||||||
|
To instantiate a transient mediated device, create an XML file representing the
|
||||||
|
device. See above for information about the mediated device XML format.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
$ virsh nodedev-create <xml-file>
|
||||||
|
Node device '<device-name>' created from '<xml-file>'
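
The XML file passed to ``virsh nodedev-create`` could be generated rather than written by hand. The sketch below builds a minimal description containing the parent device, the mdev type and a UUID, modelled on the mediated device example earlier on this page. Which elements are required, and whether a caller-supplied ``<uuid>`` is honoured, are assumptions here and may depend on the libvirt version; the helper name ``build_mdev_xml`` is hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET


def build_mdev_xml(parent, type_id, mdev_uuid=None):
    """Build a minimal mdev description for 'virsh nodedev-create'.

    parent    -- node device name of the parent, e.g. 'pci_0000_06_00_0'
    type_id   -- one of the ids listed in the parent's 'mdev_types'
    mdev_uuid -- optional UUID string; a random one is generated if omitted
    """
    # Validate a supplied UUID, otherwise generate a random (version 4) one.
    chosen = uuid.UUID(mdev_uuid) if mdev_uuid else uuid.uuid4()

    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    capability = ET.SubElement(device, "capability", {"type": "mdev"})
    ET.SubElement(capability, "type", {"id": type_id})
    ET.SubElement(capability, "uuid").text = str(chosen)
    return ET.tostring(device, encoding="unicode")


# Write the description to a file that can be handed to virsh.
xml_text = build_mdev_xml("pci_0000_06_00_0", "nvidia-11")
with open("mdev.xml", "w", encoding="utf-8") as handle:
    handle.write(xml_text + "\n")
print(xml_text)
```

The same file can be passed to ``virsh nodedev-define`` when a persistent definition is wanted instead of a transient device.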
|
||||||
|
|
||||||
|
If you would like to persistently define the device so that it will be
|
||||||
|
maintained across host reboots, use ``virsh nodedev-define`` instead of
|
||||||
|
``nodedev-create``:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
$ virsh nodedev-define <xml-file>
|
||||||
|
Node device '<device-name>' defined from '<xml-file>'
|
||||||
|
|
||||||
|
To start an instance of this device definition, use the following command:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
$ virsh nodedev-start <device-name>
|
||||||
|
|
||||||
|
Active mediated device instances can be stopped using
|
||||||
|
``virsh nodedev-destroy``, and persistent device definitions can be
|
||||||
|
removed using ``virsh nodedev-undefine``.
|
||||||
|
|
||||||
|
If a mediated device is defined persistently, it can also be set to be
|
||||||
|
automatically started whenever the host reboots or when the parent device
|
||||||
|
becomes available. In order to autostart a mediated device, use the following
|
||||||
|
command:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
$ virsh nodedev-autostart <device-name>
|
@ -4166,7 +4166,8 @@ or:
|
|||||||
specifies the device API which determines how the host's vfio driver will
|
specifies the device API which determines how the host's vfio driver will
|
||||||
expose the device to the guest. Currently, ``model='vfio-pci'``,
|
expose the device to the guest. Currently, ``model='vfio-pci'``,
|
||||||
``model='vfio-ccw'`` ( :since:`Since 4.4.0` ) and ``model='vfio-ap'`` (
|
``model='vfio-ccw'`` ( :since:`Since 4.4.0` ) and ``model='vfio-ap'`` (
|
||||||
:since:`Since 4.9.0` ) is supported. `MDEV <drvnodedev.html#MDEV>`__
|
:since:`Since 4.9.0` ) is supported.
|
||||||
|
`MDEV <drvnodedev.html#mediated-devices-mdevs>`__
|
||||||
section provides more information about mediated devices as well as how to
|
section provides more information about mediated devices as well as how to
|
||||||
create mediated devices on the host. :since:`Since 4.6.0 (QEMU 2.12)` an
|
create mediated devices on the host. :since:`Since 4.6.0 (QEMU 2.12)` an
|
||||||
optional ``display`` attribute may be used to enable or disable support
|
optional ``display`` attribute may be used to enable or disable support
|
||||||
|
@ -22,7 +22,6 @@ docs_html_in_files = [
|
|||||||
'csharp',
|
'csharp',
|
||||||
'dbus',
|
'dbus',
|
||||||
'docs',
|
'docs',
|
||||||
'drvnodedev',
|
|
||||||
'drvopenvz',
|
'drvopenvz',
|
||||||
'drvsecret',
|
'drvsecret',
|
||||||
'drvtest',
|
'drvtest',
|
||||||
@ -80,6 +79,7 @@ docs_rst_files = [
|
|||||||
'drvesx',
|
'drvesx',
|
||||||
'drvhyperv',
|
'drvhyperv',
|
||||||
'drvlxc',
|
'drvlxc',
|
||||||
|
'drvnodedev',
|
||||||
'drvqemu',
|
'drvqemu',
|
||||||
'errors',
|
'errors',
|
||||||
'formatbackup',
|
'formatbackup',
|
||||||
|