<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>Host device management</h1>

<p>
Libvirt provides management of both physical and virtual host devices
(historically also referred to as node devices) such as USB, PCI, SCSI, and
network devices. This also includes the various virtualization capabilities
these devices provide, for example SR-IOV, NPIV, MDEV, and DRM.
</p>

<p>
The node device driver provides the means to list and show details about host
devices (<code>virsh nodedev-list</code>, <code>virsh nodedev-info</code>,
and <code>virsh nodedev-dumpxml</code>), which are generic and can be used
with all devices. It also provides the means to manage virtual devices.
Persistently defined virtual devices are only supported for mediated
devices, while transient devices are supported by both mediated devices
and NPIV (<a href="https://wiki.libvirt.org/page/NPIV_in_libvirt">more
info about NPIV</a>).
</p>

<p>
Persistent virtual devices are managed with
<code>virsh nodedev-define</code> and <code>virsh nodedev-undefine</code>.
Persistent devices can be configured to start manually or automatically
using <code>virsh nodedev-autostart</code>. Inactive devices can be made
active with <code>virsh nodedev-start</code>.
</p>

<p>
Transient virtual devices are started and stopped with the commands
<code>virsh nodedev-create</code> and <code>virsh nodedev-destroy</code>.
</p>

<p>
Devices on the host system are arranged in a tree-like hierarchy, with
the root node being called <code>computer</code>. The node device driver
uses the udev backend (the HAL backend was removed in <code>6.8.0</code>).
</p>

<p>
Details of the XML format of a host device can be found <a
href="formatnode.html">here</a>. Of particular interest is the
<code>capability</code> element, which describes features supported by
the device. Some specific device types are addressed in more detail
below.
</p>

<h2>Basic structure of a node device</h2>
<pre>
&lt;device&gt;
  &lt;name&gt;pci_0000_00_17_0&lt;/name&gt;
  &lt;path&gt;/sys/devices/pci0000:00/0000:00:17.0&lt;/path&gt;
  &lt;parent&gt;computer&lt;/parent&gt;
  &lt;driver&gt;
    &lt;name&gt;ahci&lt;/name&gt;
  &lt;/driver&gt;
  &lt;capability type='pci'&gt;
    ...
  &lt;/capability&gt;
&lt;/device&gt;</pre>
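<p>
The following Python sketch shows how the common structure above could be consumed. It reads the generic fields of a node device document, for example the output of <code>virsh nodedev-dumpxml</code>, using only the standard library's <code>xml.etree.ElementTree</code>. The function name <code>summarize_nodedev</code> is hypothetical and is not part of any libvirt API.
</p>

```python
import xml.etree.ElementTree as ET


def summarize_nodedev(xml_text):
    """Return the generic fields of a node device XML document.

    Reads the name, path and parent elements, the optional driver name and
    the type attribute of each top-level capability element.
    """
    root = ET.fromstring(xml_text)
    driver = root.find("driver/name")
    return {
        "name": root.findtext("name"),
        "path": root.findtext("path"),
        "parent": root.findtext("parent"),
        # A device that is not bound to a driver may have no driver element.
        "driver": driver.text if driver is not None else None,
        # Direct children only; capabilities nested deeper are not listed.
        "capabilities": [cap.get("type") for cap in root.findall("capability")],
    }
```

<p>
Passing the XML shown above would report <code>pci_0000_00_17_0</code>, parent <code>computer</code>, driver <code>ahci</code> and a single <code>pci</code> capability.
</p>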

<ul id="toc"/>

<h2><a id="PCI">PCI host devices</a></h2>
<dl>
<dt><code>capability</code></dt>
<dd>
When used as the top-level element, the supported values for the
<code>type</code> attribute are <code>pci</code> and
<code>phys_function</code> (see <a href="#SRIOVCap">SR-IOV below</a>).
</dd>
</dl>

<pre>
&lt;device&gt;
  &lt;name&gt;pci_0000_04_00_1&lt;/name&gt;
  &lt;path&gt;/sys/devices/pci0000:00/0000:00:06.0/0000:04:00.1&lt;/path&gt;
  &lt;parent&gt;pci_0000_00_06_0&lt;/parent&gt;
  &lt;driver&gt;
    &lt;name&gt;igb&lt;/name&gt;
  &lt;/driver&gt;
  &lt;capability type='pci'&gt;
    &lt;domain&gt;0&lt;/domain&gt;
    &lt;bus&gt;4&lt;/bus&gt;
    &lt;slot&gt;0&lt;/slot&gt;
    &lt;function&gt;1&lt;/function&gt;
    &lt;product id='0x10c9'&gt;82576 Gigabit Network Connection&lt;/product&gt;
    &lt;vendor id='0x8086'&gt;Intel Corporation&lt;/vendor&gt;
    &lt;iommuGroup number='15'&gt;
      &lt;address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/&gt;
    &lt;/iommuGroup&gt;
    &lt;numa node='0'/&gt;
    &lt;pci-express&gt;
      &lt;link validity='cap' port='1' speed='2.5' width='2'/&gt;
      &lt;link validity='sta' speed='2.5' width='2'/&gt;
    &lt;/pci-express&gt;
  &lt;/capability&gt;
&lt;/device&gt;</pre>

<p>
The XML format for a PCI device stays the same for any further
capabilities it supports; a single nested <code>&lt;capability&gt;</code>
element is included for each capability the device supports.
</p>
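<p>
The address fields and the nested capabilities can be read the same way. This sketch assumes that the <code>domain</code>, <code>bus</code>, <code>slot</code> and <code>function</code> values inside the <code>pci</code> capability are decimal, as the examples on this page suggest. For instance, the VPD example further down has bus 66 and address bus 0x42.
</p>
<p>
The helper <code>guess_nodedev_name</code> only mirrors the naming visible in these examples, such as <code>pci_0000_04_00_1</code>. Treat it as an assumption and prefer the name libvirt reports. All function names here are hypothetical.
</p>

```python
import xml.etree.ElementTree as ET


def pci_capability(xml_text):
    """Return the pci capability element of a PCI node device."""
    return ET.fromstring(xml_text).find("capability[@type='pci']")


def pci_address(xml_text):
    """Return (domain, bus, slot, function) as integers (decimal input assumed)."""
    cap = pci_capability(xml_text)
    return tuple(int(cap.findtext(tag)) for tag in ("domain", "bus", "slot", "function"))


def sysfs_address(addr):
    """Format an address as it appears in the sysfs path, e.g. 0000:04:00.1."""
    domain, bus, slot, function = addr
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


def guess_nodedev_name(addr):
    """Hypothetical helper rebuilding names such as pci_0000_04_00_1.

    Based only on the naming seen in this page's examples, not on a
    documented libvirt rule.
    """
    return "pci_" + sysfs_address(addr).replace(":", "_").replace(".", "_")


def nested_capabilities(xml_text):
    """List the types of the capability elements nested in the pci capability."""
    return [c.get("type") for c in pci_capability(xml_text).findall("capability")]
```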

<h3><a id="SRIOVCap">SR-IOV capability</a></h3>
<p>
Single root input/output virtualization (SR-IOV) allows the resources of a
PCIe device to be shared by multiple virtual environments. This is achieved by
slicing up a single full-featured physical resource, called a physical
function (PF), into multiple devices called virtual functions (VFs), which
share their configuration with the underlying PF. Although SR-IOV is a
standard specification, the number of VFs that can be created on a PF varies
among manufacturers.
</p>

<p>
If the NIC <a href="#PCI">above</a> were also SR-IOV capable, it would
also include a nested
<code>&lt;capability&gt;</code> element enumerating all virtual
functions available on the physical device (physical port), as in the
example below.
</p>

<pre>
&lt;capability type='pci'&gt;
  ...
  &lt;capability type='virt_functions' maxCount='7'&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x10' function='0x1'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x10' function='0x3'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x10' function='0x5'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x10' function='0x7'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x11' function='0x1'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x11' function='0x3'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x11' function='0x5'/&gt;
  &lt;/capability&gt;
  ...
&lt;/capability&gt;</pre>
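<p>
A minimal sketch for listing the VFs of a PF from such a capability is shown below. <code>parse_virt_functions</code> is a hypothetical name. <code>maxCount</code> is read as the maximum number of VFs the PF supports, and each <code>address</code> element as one listed VF. The hexadecimal attributes are converted to the <code>DDDD:BB:SS.F</code> form used in sysfs paths.
</p>

```python
import xml.etree.ElementTree as ET


def parse_virt_functions(pci_cap_xml):
    """Return (max_count, [VF addresses]) from a pci capability element.

    Each address is converted from the hexadecimal attributes into a
    DDDD:BB:SS.F string. Returns (0, []) if no virt_functions capability
    is present, i.e. the device does not report SR-IOV support.
    """
    cap = ET.fromstring(pci_cap_xml)
    vfs = cap.find("capability[@type='virt_functions']")
    if vfs is None:
        return 0, []
    addresses = []
    for addr in vfs.findall("address"):
        # int(value, 16) accepts the "0x" prefix used in the attributes.
        dom, bus, slot, fn = (int(addr.get(k), 16) for k in ("domain", "bus", "slot", "function"))
        addresses.append(f"{dom:04x}:{bus:02x}:{slot:02x}.{fn:x}")
    return int(vfs.get("maxCount", "0")), addresses
```

<p>
For the example above this would return 7 together with seven addresses, starting with <code>0000:04:10.1</code>.
</p>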

<p>
An SR-IOV child device (a VF), on the other hand, would report its top-level
capability type as <code>phys_function</code> instead:
</p>

<pre>
&lt;device&gt;
  ...
  &lt;capability type='phys_function'&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/&gt;
  &lt;/capability&gt;
  ...
&lt;/device&gt;</pre>
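<p>
Going the other way, from a VF to its PF, could be sketched as follows. The function name <code>physical_function</code> is hypothetical.
</p>
<p>
The search covers the whole document. It therefore finds the <code>phys_function</code> capability whether it sits directly under <code>device</code>, as in the example above, or deeper inside another capability.
</p>

```python
import xml.etree.ElementTree as ET


def physical_function(device_xml):
    """Return the PF address of an SR-IOV VF as DDDD:BB:SS.F, or None.

    Looks for the first phys_function capability anywhere below the
    device element and formats its (hexadecimal) address attributes.
    """
    root = ET.fromstring(device_xml)
    addr = root.find(".//capability[@type='phys_function']/address")
    if addr is None:
        return None  # not a VF, or no PF information reported
    dom, bus, slot, fn = (int(addr.get(k), 16) for k in ("domain", "bus", "slot", "function"))
    return f"{dom:04x}:{bus:02x}:{slot:02x}.{fn:x}"
```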

<h3><a id="MDEVCap">MDEV capability</a></h3>
<p>
A device capable of creating mediated devices will include a nested
capability <code>mdev_types</code>, which enumerates all mdev types supported
by the physical device, along with the type attributes available
through sysfs. A detailed description of the XML format for the
<code>mdev_types</code> capability can be found
<a href="formatnode.html#MDEVTypesCap">here</a>.
</p>

<p>
The following example shows how we might represent an NVIDIA GPU device
that supports mediated devices. See below for <a href="#MDEV">more
information about mediated devices</a>.
</p>

<pre>
&lt;device&gt;
  ...
  &lt;driver&gt;
    &lt;name&gt;nvidia&lt;/name&gt;
  &lt;/driver&gt;
  &lt;capability type='pci'&gt;
    ...
    &lt;capability type='mdev_types'&gt;
      &lt;type id='nvidia-11'&gt;
        &lt;name&gt;GRID M60-0B&lt;/name&gt;
        &lt;deviceAPI&gt;vfio-pci&lt;/deviceAPI&gt;
        &lt;availableInstances&gt;16&lt;/availableInstances&gt;
      &lt;/type&gt;
      &lt;!-- Here would come the rest of the available mdev types --&gt;
    &lt;/capability&gt;
    ...
  &lt;/capability&gt;
&lt;/device&gt;</pre>
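<p>
When choosing a type for a new mediated device, the <code>mdev_types</code> capability can be summarised as in this sketch. Both function names are hypothetical. A missing <code>availableInstances</code> element is treated as zero, which is an assumption made here for simplicity.
</p>

```python
import xml.etree.ElementTree as ET


def mdev_types(device_xml):
    """Map each mdev type id to its name, device API and available instances."""
    root = ET.fromstring(device_xml)
    result = {}
    for cap in root.iter("capability"):
        if cap.get("type") != "mdev_types":
            continue
        for t in cap.findall("type"):
            result[t.get("id")] = {
                "name": t.findtext("name"),
                "device_api": t.findtext("deviceAPI"),
                # Assumption: a missing count means no free instances.
                "available": int(t.findtext("availableInstances", default="0")),
            }
    return result


def types_with_free_instances(types):
    """Type ids that still report at least one available instance."""
    return [tid for tid, info in types.items() if info["available"]]
```

<p>
For the NVIDIA example above, <code>nvidia-11</code> would be reported with 16 available instances.
</p>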

<h3><a id="VPDCap">VPD capability</a></h3>
<p>
A device that exposes a PCI/PCIe VPD capability will include a nested
capability <code>vpd</code>, which presents data stored in the Vital Product
Data (VPD). VPD provides a device name, a number of other standard-defined
read-only fields (change level, manufacture ID, part number, serial number) and
vendor-specific read-only fields. Additionally, if a device supports it,
read-write fields (asset tag, vendor-specific fields or system fields) may
also be present. The VPD capability is optional for PCI/PCIe devices, and the
set of exposed fields may vary from device to device. The XML format follows
the binary format described in "I.3. VPD Definitions" in the PCI Local Bus
specification (2.2+) and the identical format in PCIe 4.0+. At the time of
writing, support for exposing this capability is only present on Linux-based
systems (Linux 2.6.26 was the first kernel version to expose VPD via sysfs,
which libvirt relies on). Reading the VPD contents requires root privileges,
so <code>virsh nodedev-dumpxml</code> must be run as root.
A description of the XML format for the <code>vpd</code> capability can
be found <a href="formatnode.html#VPDCap">here</a>.
</p>

<p>
The following example shows a VPD representation for a device that exposes the
VPD capability with read-only and read-write fields. Among other things,
the VPD of this particular device includes a unique board serial number.
</p>

<pre>
&lt;device&gt;
  &lt;name&gt;pci_0000_42_00_0&lt;/name&gt;
  &lt;capability type='pci'&gt;
    &lt;class&gt;0x020000&lt;/class&gt;
    &lt;domain&gt;0&lt;/domain&gt;
    &lt;bus&gt;66&lt;/bus&gt;
    &lt;slot&gt;0&lt;/slot&gt;
    &lt;function&gt;0&lt;/function&gt;
    &lt;product id='0xa2d6'&gt;MT42822 BlueField-2 integrated ConnectX-6 Dx network controller&lt;/product&gt;
    &lt;vendor id='0x15b3'&gt;Mellanox Technologies&lt;/vendor&gt;
    &lt;capability type='virt_functions' maxCount='16'/&gt;
    &lt;capability type='vpd'&gt;
      &lt;name&gt;BlueField-2 DPU 25GbE Dual-Port SFP56, Crypto Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket&lt;/name&gt;
      &lt;fields access='readonly'&gt;
        &lt;change_level&gt;B1&lt;/change_level&gt;
        &lt;manufacture_id&gt;foobar&lt;/manufacture_id&gt;
        &lt;part_number&gt;MBF2H332A-AEEOT&lt;/part_number&gt;
        &lt;serial_number&gt;MT2113X00000&lt;/serial_number&gt;
        &lt;vendor_field index='0'&gt;PCIeGen4 x8&lt;/vendor_field&gt;
        &lt;vendor_field index='2'&gt;MBF2H332A-AEEOT&lt;/vendor_field&gt;
        &lt;vendor_field index='3'&gt;3c53d07eec484d8aab34dabd24fe575aa&lt;/vendor_field&gt;
        &lt;vendor_field index='A'&gt;MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=BF2H332A&lt;/vendor_field&gt;
      &lt;/fields&gt;
      &lt;fields access='readwrite'&gt;
        &lt;asset_tag&gt;fooasset&lt;/asset_tag&gt;
        &lt;vendor_field index='0'&gt;vendorfield0&lt;/vendor_field&gt;
        &lt;vendor_field index='2'&gt;vendorfield2&lt;/vendor_field&gt;
        &lt;vendor_field index='A'&gt;vendorfieldA&lt;/vendor_field&gt;
        &lt;system_field index='B'&gt;systemfieldB&lt;/system_field&gt;
        &lt;system_field index='0'&gt;systemfield0&lt;/system_field&gt;
      &lt;/fields&gt;
    &lt;/capability&gt;
    &lt;iommuGroup number='65'&gt;
      &lt;address domain='0x0000' bus='0x42' slot='0x00' function='0x0'/&gt;
    &lt;/iommuGroup&gt;
    &lt;numa node='0'/&gt;
    &lt;pci-express&gt;
      &lt;link validity='cap' port='0' speed='16' width='8'/&gt;
      &lt;link validity='sta' speed='8' width='8'/&gt;
    &lt;/pci-express&gt;
  &lt;/capability&gt;
&lt;/device&gt;
</pre>
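<p>
A sketch of flattening the <code>vpd</code> capability into Python dictionaries, for example to look up a board serial number, is shown below. The function name <code>read_vpd</code> is hypothetical.
</p>
<p>
The key scheme is an illustrative choice. Standard fields are keyed by element name. Indexed vendor and system fields become keys such as <code>vendor_field[0]</code>, so that fields with the same tag but different indexes do not overwrite each other.
</p>

```python
import xml.etree.ElementTree as ET


def read_vpd(device_xml):
    """Return the VPD name plus one dict of fields per access mode.

    Standard fields are keyed by tag (e.g. serial_number); indexed fields
    are keyed as "vendor_field[0]", "system_field[B]" and so on.
    Returns None when the device exposes no vpd capability.
    """
    vpd = ET.fromstring(device_xml).find(".//capability[@type='vpd']")
    if vpd is None:
        return None
    result = {"name": vpd.findtext("name"), "readonly": {}, "readwrite": {}}
    for fields in vpd.findall("fields"):
        # Group by the access attribute ('readonly' or 'readwrite').
        bucket = result.setdefault(fields.get("access"), {})
        for field in fields:
            index = field.get("index")
            key = f"{field.tag}[{index}]" if index is not None else field.tag
            bucket[key] = field.text
    return result
```

<p>
Applied to the example above, <code>read_vpd(xml)["readonly"]["serial_number"]</code> would give <code>MT2113X00000</code>. As noted, producing that XML with <code>virsh nodedev-dumpxml</code> requires root.
</p>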

<h2><a id="MDEV">Mediated devices (MDEVs)</a></h2>
<p>
Mediated devices (<span class="since">Since 3.2.0</span>) are software
devices that define a resource allocation on the backing physical device,
which in turn allows the parent physical device's resources to be divided into
several mediated devices, thus sharing the physical device's performance
among multiple guests. Unlike SR-IOV, however, where a PCIe device appears
as multiple separate PCIe devices on the host's PCI bus, mediated devices
only appear on the mdev virtual bus. Therefore, no procedure for detaching
the device from (and reattaching it to) the host driver is involved, even though
mediated devices are used in a direct device assignment manner. A
detailed description of the XML format for the <code>mdev</code>
capability can be found <a href="formatnode.html#mdev">here</a>.
</p>

<h3>Example of a mediated device</h3>
<pre>
&lt;device&gt;
  &lt;name&gt;mdev_4b20d080_1b54_4048_85b3_a6a62d165c01&lt;/name&gt;
  &lt;path&gt;/sys/devices/pci0000:00/0000:00:02.0/4b20d080-1b54-4048-85b3-a6a62d165c01&lt;/path&gt;
  &lt;parent&gt;pci_0000_06_00_0&lt;/parent&gt;
  &lt;driver&gt;
    &lt;name&gt;vfio_mdev&lt;/name&gt;
  &lt;/driver&gt;
  &lt;capability type='mdev'&gt;
    &lt;type id='nvidia-11'/&gt;
    &lt;uuid&gt;4b20d080-1b54-4048-85b3-a6a62d165c01&lt;/uuid&gt;
    &lt;iommuGroup number='12'/&gt;
  &lt;/capability&gt;
&lt;/device&gt;</pre>
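<p>
The sketch below extracts the key facts from a mediated device document like the one above. Both function names are hypothetical.
</p>
<p>
<code>name_from_uuid</code> only reproduces the name pattern visible in this example (<code>mdev_</code> followed by the UUID with dashes replaced by underscores). That pattern is an assumption: other libvirt versions may name mediated devices differently, so prefer the name libvirt itself reports.
</p>

```python
import uuid
import xml.etree.ElementTree as ET


def mdev_summary(device_xml):
    """Return parent, mdev type id, UUID and IOMMU group of a mediated device."""
    root = ET.fromstring(device_xml)
    cap = root.find("capability[@type='mdev']")
    group = cap.find("iommuGroup")
    return {
        "parent": root.findtext("parent"),
        "type": cap.find("type").get("id"),
        # uuid.UUID() raises ValueError on malformed values and normalises case.
        "uuid": str(uuid.UUID(cap.findtext("uuid"))),
        "iommu_group": int(group.get("number")) if group is not None else None,
    }


def name_from_uuid(mdev_uuid):
    """Hypothetical helper mirroring the example name above; not a libvirt rule."""
    return "mdev_" + str(uuid.UUID(mdev_uuid)).replace("-", "_")
```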

<p>
Support for the mediated device framework in libvirt's node device driver
covers the following features:
</p>
<ul>
<li>
list available mediated devices on the host
(<span class="since">Since 3.4.0</span>)
</li>
<li>
display device details
(<span class="since">Since 3.4.0</span>)
</li>
<li>
create transient mediated devices
(<span class="since">Since 6.5.0</span>)
</li>
<li>
define persistent mediated devices
(<span class="since">Since 7.3.0</span>)
</li>
</ul>

<p>
Because mediated devices are instantiated from vendor-specific templates,
simply called 'types', information describing these types is contained
within the parent device's capabilities (see the example in <a
href="#PCI">PCI host devices</a>). To list all devices capable of
creating mediated devices, use the following command:
</p>
<pre>$ virsh nodedev-list --cap mdev_types</pre>

<p>
To see the supported mediated device types on a specific physical device,
use the following:
</p>
<pre>$ virsh nodedev-dumpxml &lt;device&gt;</pre>

<p>
Before creating a mediated device, unbind the device from the respective
device driver, e.g. the subchannel I/O driver for a CCW device. Then bind the
device to the respective VFIO driver. For a CCW device, also unbind the
corresponding subchannel of the CCW device from the subchannel I/O driver
and then bind the subchannel (instead of the CCW device) to the vfio_ccw
driver. The example below shows the unbinding and binding steps for a CCW
device.
</p>
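
<pre>
# Bus IDs of the CCW device and of its subchannel
device="0.0.1234"
subchannel="0.0.0123"
# Unbind the CCW device from its current driver
echo "$device" > "/sys/bus/ccw/devices/$device/driver/unbind"
# Unbind the subchannel from the subchannel I/O driver
echo "$subchannel" > "/sys/bus/css/devices/$subchannel/driver/unbind"
# Bind the subchannel (not the CCW device) to vfio_ccw
echo "$subchannel" > /sys/bus/css/drivers/vfio_ccw/bind
</pre>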
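<p>
For tooling written in Python, the same three sysfs writes can be expressed as a list of (file, value) pairs, which makes the order easy to review before anything is written. This is a minimal sketch, and the names <code>ccw_vfio_plan</code> and <code>apply_plan</code> are hypothetical.
</p>
<p>
The <code>sysfs</code> parameter exists only so the plan can be inspected or tested against a directory other than the real <code>/sys</code>. Actually applying the plan needs root and a real CCW device.
</p>

```python
from pathlib import Path


def ccw_vfio_plan(device, subchannel, sysfs=Path("/sys")):
    """Return the (file, value) writes of the shell example above, in order.

    device and subchannel are bus IDs such as "0.0.1234" and "0.0.0123".
    """
    return [
        # 1. unbind the CCW device from its current driver
        (sysfs / "bus/ccw/devices" / device / "driver/unbind", device),
        # 2. unbind the subchannel from the subchannel I/O driver
        (sysfs / "bus/css/devices" / subchannel / "driver/unbind", subchannel),
        # 3. bind the subchannel (not the CCW device) to vfio_ccw
        (sysfs / "bus/css/drivers/vfio_ccw/bind", subchannel),
    ]


def apply_plan(plan):
    """Perform the writes in order; requires root on a real system."""
    for path, value in plan:
        path.write_text(value)
```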

<p>
To instantiate a transient mediated device, create an XML file representing the
device. See above for information about the mediated device XML format.
</p>

<pre>$ virsh nodedev-create &lt;xml-file&gt;
Node device '&lt;device-name&gt;' created from '&lt;xml-file&gt;'</pre>
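<p>
The XML file passed to <code>virsh nodedev-create</code> (or <code>nodedev-define</code>) could be generated as in the sketch below. It contains the parent device, a <code>capability</code> of type <code>mdev</code>, the chosen type id, and a UUID.
</p>
<p>
The function name <code>mdev_xml</code> is hypothetical, and the element set is kept minimal on the assumption that it matches the mediated device format described above. A random UUID is generated when none is given. Check <a href="formatnode.html#mdev">the mdev format</a> for further optional elements.
</p>

```python
import uuid
import xml.etree.ElementTree as ET


def mdev_xml(parent, type_id, mdev_uuid=None):
    """Build a minimal mediated device definition as a string.

    parent is the node device name of the parent (e.g. pci_0000_06_00_0)
    and type_id one of the types listed in its mdev_types capability.
    """
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    cap = ET.SubElement(device, "capability", type="mdev")
    ET.SubElement(cap, "type", id=type_id)
    # Use the given UUID, or generate a random (version 4) one.
    ET.SubElement(cap, "uuid").text = str(mdev_uuid or uuid.uuid4())
    return ET.tostring(device, encoding="unicode")
```

<p>
Writing the returned string to a file and passing that file to <code>virsh nodedev-create</code> follows the steps described above.
</p>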

<p>
If you would like to persistently define the device so that it will be
maintained across host reboots, use <code>virsh nodedev-define</code>
instead of <code>nodedev-create</code>:
</p>
<pre>$ virsh nodedev-define &lt;xml-file&gt;
Node device '&lt;device-name&gt;' defined from '&lt;xml-file&gt;'</pre>

<p>
To start an instance of this device definition, use the following command:
</p>
<pre>$ virsh nodedev-start &lt;device-name&gt;</pre>

<p>
Active mediated device instances can be stopped using <code>virsh
nodedev-destroy</code>, and persistent device definitions can be removed
using <code>virsh nodedev-undefine</code>.
</p>

<p>
If a mediated device is defined persistently, it can also be set to be
automatically started whenever the host reboots or when the parent device
becomes available. In order to autostart a mediated device, use the
following command:
</p>
<pre>$ virsh nodedev-autostart &lt;device-name&gt;</pre>
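<p>
The persistent workflow described above (define, start, autostart) can be scripted around <code>virsh</code>, as in this minimal sketch. It uses only the commands shown on this page.
</p>
<p>
The device name is taken from the confirmation line that <code>nodedev-define</code> prints, in the form shown above. Matching that exact wording is an assumption, so the script raises an error if the line is not found. The function names are hypothetical. Running the script requires virsh, a libvirt daemon and sufficient privileges.
</p>

```python
import re
import subprocess

# Matches confirmation lines such as: Node device 'NAME' defined from 'FILE'
_CONFIRMATION = re.compile(r"Node device '([^']+)' (?:created|defined) from")


def device_name_from_output(output):
    """Extract the new device name from nodedev-create/define output, or None."""
    match = _CONFIRMATION.search(output)
    return match.group(1) if match else None


def define_start_autostart(xml_file):
    """Define a persistent mdev, start it and mark it for autostart.

    Raises subprocess.CalledProcessError if any virsh step fails.
    """
    out = subprocess.run(
        ["virsh", "nodedev-define", xml_file],
        check=True, capture_output=True, text=True,
    ).stdout
    name = device_name_from_output(out)
    if name is None:
        raise RuntimeError(f"could not find device name in: {out!r}")
    for action in ("nodedev-start", "nodedev-autostart"):
        subprocess.run(["virsh", action, name], check=True)
    return name
```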
</body>
</html>