<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>Host device management</h1>
<p>
Libvirt provides management of both physical and virtual host devices
(historically also referred to as node devices) such as USB, PCI, SCSI, and
network devices. This also covers the various virtualization capabilities
these devices provide, for example SR-IOV, NPIV, MDEV, and DRM.
</p>
<p>
The node device driver provides the means to list and show details about host
devices (<code>virsh nodedev-list</code>,
<code>virsh nodedev-dumpxml</code>), which are generic and can be used
with all devices. It also provides the means to create and destroy devices
(<code>virsh nodedev-create</code>, <code>virsh nodedev-destroy</code>),
which are meant to be used to create virtual devices, currently only
supported by NPIV
(<a href="https://wiki.libvirt.org/page/NPIV_in_libvirt">more info about NPIV</a>).
Devices on the host system are arranged in a tree-like hierarchy, with
the root node being called <code>computer</code>. The node device driver
supports the udev backend (the HAL backend was removed in <code>6.8.0</code>).
</p>
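<p>
As an illustration, the generic commands above can be wrapped in small shell
helpers. This is only a sketch: the function names are hypothetical, while the
<code>virsh</code> subcommands and their <code>--tree</code> and
<code>--cap</code> options are the ones provided by virsh.
</p>
<pre>
# Hypothetical helper names; only the virsh subcommands are real.

# Print the host device hierarchy, rooted at "computer".
nodedev_tree() {
    virsh nodedev-list --tree
}

# List node devices having the given capability, e.g. "pci" or "net".
nodedev_list_cap() {
    virsh nodedev-list --cap "$1"
}

# Dump the XML description of one node device, e.g. "pci_0000_00_17_0".
nodedev_xml() {
    virsh nodedev-dumpxml "$1"
}

# Create a virtual device (currently NPIV only) from an XML file,
# and destroy a device again by name.
nodedev_create_from() {
    virsh nodedev-create "$1"
}
nodedev_destroy() {
    virsh nodedev-destroy "$1"
}
</pre>
<p>
For example, <code>nodedev_list_cap pci</code> lists all PCI host devices, and
passing one of the printed names to <code>nodedev_xml</code> shows its XML
description.
</p>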
<p>
Details of the XML format of a host device can be found <a
href="formatnode.html">here</a>. Of particular interest is the
<code>capability</code> element, which describes features supported by
the device. Some specific device types are addressed in more detail
below.
</p>
<h2>Basic structure of a node device</h2>
<pre>
&lt;device&gt;
  &lt;name&gt;pci_0000_00_17_0&lt;/name&gt;
  &lt;path&gt;/sys/devices/pci0000:00/0000:00:17.0&lt;/path&gt;
  &lt;parent&gt;computer&lt;/parent&gt;
  &lt;driver&gt;
    &lt;name&gt;ahci&lt;/name&gt;
  &lt;/driver&gt;
  &lt;capability type='pci'&gt;
    ...
  &lt;/capability&gt;
&lt;/device&gt;</pre>
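<p>
The example suggests that the name of a PCI node device is derived from its PCI
address (the last component of <code>path</code>) by adding a
<code>pci_</code> prefix and replacing <code>:</code> and <code>.</code> with
underscores. Assuming that convention, a minimal shell sketch converting in
both directions could look as follows; the function names are hypothetical.
</p>
<pre>
# Hypothetical helpers; the naming convention is inferred from the
# examples on this page (0000:00:17.0 maps to pci_0000_00_17_0).

# PCI address (DDDD:BB:SS.F) to node device name.
pci_addr_to_nodedev() {
    printf 'pci_%s\n' "$1" | tr ':.' '__'
}

# Node device name back to a PCI address.
nodedev_to_pci_addr() {
    printf '%s\n' "${1#pci_}" |
        sed 's/^\([0-9a-fA-F]*\)_\([0-9a-fA-F]*\)_\([0-9a-fA-F]*\)_\([0-7]\)$/\1:\2:\3.\4/'
}
</pre>
<p>
On a host with the device above,
<code>virsh nodedev-dumpxml "$(pci_addr_to_nodedev 0000:00:17.0)"</code>
would then print its XML description.
</p>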
<ul id="toc"/>
<h2><a id="PCI">PCI host devices</a></h2>
<dl>
<dt><code>capability</code></dt>
<dd>
When used as the top-level element, the supported values for the
<code>type</code> attribute are <code>pci</code> and
<code>phys_function</code> (see <a href="#SRIOVCap">SR-IOV below</a>).
</dd>
</dl>
<pre>
&lt;device&gt;
  &lt;name&gt;pci_0000_04_00_1&lt;/name&gt;
  &lt;path&gt;/sys/devices/pci0000:00/0000:00:06.0/0000:04:00.1&lt;/path&gt;
  &lt;parent&gt;pci_0000_00_06_0&lt;/parent&gt;
  &lt;driver&gt;
    &lt;name&gt;igb&lt;/name&gt;
  &lt;/driver&gt;
  &lt;capability type='pci'&gt;
    &lt;domain&gt;0&lt;/domain&gt;
    &lt;bus&gt;4&lt;/bus&gt;
    &lt;slot&gt;0&lt;/slot&gt;
    &lt;function&gt;1&lt;/function&gt;
    &lt;product id='0x10c9'&gt;82576 Gigabit Network Connection&lt;/product&gt;
    &lt;vendor id='0x8086'&gt;Intel Corporation&lt;/vendor&gt;
    &lt;iommuGroup number='15'&gt;
      &lt;address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/&gt;
    &lt;/iommuGroup&gt;
    &lt;numa node='0'/&gt;
    &lt;pci-express&gt;
      &lt;link validity='cap' port='1' speed='2.5' width='2'/&gt;
      &lt;link validity='sta' speed='2.5' width='2'/&gt;
    &lt;/pci-express&gt;
  &lt;/capability&gt;
&lt;/device&gt;</pre>
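<p>
Some of these elements appear to mirror attributes the Linux kernel exposes in
sysfs for every PCI device, for instance the <code>iommu_group</code> symlink
and the <code>numa_node</code> file. A minimal sketch reading them directly,
which can help when cross-checking the XML; the function names are hypothetical,
and so is <code>SYSFS_ROOT</code>, an override (defaulting to <code>/sys</code>)
that lets the functions run against a fake tree.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override so the helpers can be tried
# against a copy of the sysfs tree; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the IOMMU group number of a PCI device, e.g. 0000:04:00.1.
# The iommu_group symlink points at .../kernel/iommu_groups/N.
pci_iommu_group() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/iommu_group"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

# Print the NUMA node of a PCI device (-1 means no NUMA affinity).
pci_numa_node() {
    cat "$SYSFS_ROOT/bus/pci/devices/$1/numa_node"
}
</pre>
<p>
For the NIC above, <code>pci_iommu_group 0000:04:00.1</code> would be expected
to print the same group number as the <code>iommuGroup</code> element.
</p>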
<p>
The XML format for a PCI device stays the same for any further
capabilities it supports: a single nested <code>&lt;capability&gt;</code>
element is included for each capability the device supports.
</p>
<h3><a id="SRIOVCap">SR-IOV capability</a></h3>
<p>
Single root input/output virtualization (SR-IOV) allows PCIe resources to be
shared by multiple virtual environments. This is achieved by slicing up a
single full-featured physical resource, called the physical function (PF),
into multiple devices called virtual functions (VFs), which share their
configuration with the underlying PF. Despite the common SR-IOV
specification, the number of VFs that can be created on a PF varies among
manufacturers.
</p>
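<p>
Since the VF limit is device-specific, it can be worth checking on the host. As
a minimal sketch: the Linux kernel reports it in the <code>sriov_totalvfs</code>
attribute of the PF, next to <code>sriov_numvfs</code>, which holds the number
of currently enabled VFs. The function name and the <code>SYSFS_ROOT</code>
override are hypothetical.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print "TOTAL ENABLED" VF counts for a PF, e.g. 0000:04:00.0.
# Returns 1 if the device has no SR-IOV attributes (not a PF).
sriov_vf_counts() {
    dev="$SYSFS_ROOT/bus/pci/devices/$1"
    [ -r "$dev/sriov_totalvfs" ] || return 1
    printf '%s %s\n' "$(cat "$dev/sriov_totalvfs")" "$(cat "$dev/sriov_numvfs")"
}
</pre>
<p>
Writing a number to <code>sriov_numvfs</code> as root enables that many VFs; if
VFs are already enabled, the kernel expects <code>0</code> to be written first.
</p>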
<p>
If the NIC <a href="#PCI">above</a> were also SR-IOV capable, it would
also include a nested
<code>&lt;capability&gt;</code> element enumerating all virtual
functions available on the physical device (physical port), as in the
example below.
</p>
<pre>
&lt;capability type='pci'&gt;
  ...
  &lt;capability type='virt_functions' maxCount='7'&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x10' function='0x1'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x10' function='0x3'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x10' function='0x5'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x10' function='0x7'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x11' function='0x1'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x11' function='0x3'/&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x11' function='0x5'/&gt;
  &lt;/capability&gt;
  ...
&lt;/capability&gt;</pre>
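<p>
On Linux, each VF listed in <code>virt_functions</code> is also reachable from
the PF's sysfs directory through <code>virtfn0</code>, <code>virtfn1</code>,
and so on, each a symlink to the VF. A minimal sketch listing them in index
order (a plain glob would sort <code>virtfn10</code> before
<code>virtfn2</code>); the function name and <code>SYSFS_ROOT</code> are
hypothetical.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the PCI address of every VF of a PF, one per line, following
# virtfn0, virtfn1, ... until the first missing index.
sriov_list_vfs() {
    dev="$SYSFS_ROOT/bus/pci/devices/$1"
    i=0
    while [ -L "$dev/virtfn$i" ]; do
        basename "$(readlink "$dev/virtfn$i")"
        i=$((i + 1))
    done
}
</pre>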
<p>
An SR-IOV child device, on the other hand, would then report its top-level
capability type as <code>phys_function</code> instead:
</p>
<pre>
&lt;device&gt;
  ...
  &lt;capability type='phys_function'&gt;
    &lt;address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/&gt;
  &lt;/capability&gt;
  ...
&lt;/device&gt;</pre>
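<p>
The reverse link exists in sysfs too: on Linux, each VF directory contains a
<code>physfn</code> symlink to its PF. A minimal sketch that prints the PF
address, or fails for a device that is not a VF; the function name and the
<code>SYSFS_ROOT</code> override are hypothetical.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the PF address of a VF such as 0000:04:10.1.
# Returns 1 when there is no physfn link (the device is not a VF).
sriov_pf_of() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/physfn"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
</pre>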
<h3><a id="MDEVCap">MDEV capability</a></h3>
<p>
A PCI device capable of creating mediated devices includes a nested
capability <code>mdev_types</code>, which enumerates all mdev types supported
by the physical device, along with the type attributes available
through sysfs. A detailed description of the XML format for the
<code>mdev_types</code> capability can be found
<a href="formatnode.html#MDEVCap">here</a>.
</p>
<p>
The following example shows how we might represent an NVIDIA GPU device
that supports mediated devices. See below for <a href="#MDEV">more
information about mediated devices</a>.
</p>
<pre>
&lt;device&gt;
  ...
  &lt;driver&gt;
    &lt;name&gt;nvidia&lt;/name&gt;
  &lt;/driver&gt;
  &lt;capability type='pci'&gt;
    ...
    &lt;capability type='mdev_types'&gt;
      &lt;type id='nvidia-11'&gt;
        &lt;name&gt;GRID M60-0B&lt;/name&gt;
        &lt;deviceAPI&gt;vfio-pci&lt;/deviceAPI&gt;
        &lt;availableInstances&gt;16&lt;/availableInstances&gt;
      &lt;/type&gt;
      &lt;!-- Here would come the rest of the available mdev types --&gt;
    &lt;/capability&gt;
    ...
  &lt;/capability&gt;
&lt;/device&gt;</pre>
<h2><a id="MDEV">Mediated devices (MDEVs)</a></h2>
<p>
Mediated devices (<span class="since">Since 3.2.0</span>) are software
devices defining resource allocation on the backing physical device, which
in turn allows the parent physical device's resources to be divided into
several mediated devices, thus sharing the physical device's performance
among multiple guests. Unlike SR-IOV, however, where a PCIe device appears
as multiple separate PCIe devices on the host's PCI bus, mediated devices
only appear on the mdev virtual bus. Therefore, no detach/reattach
procedure from/to the host driver is involved, even though
mediated devices are used in a direct device assignment manner. A
detailed description of the XML format for the <code>mdev</code>
capability can be found <a href="formatnode.html#mdev">here</a>.
</p>
<h3>Example of a mediated device</h3>
<pre>
&lt;device&gt;
  &lt;name&gt;mdev_4b20d080_1b54_4048_85b3_a6a62d165c01&lt;/name&gt;
  &lt;path&gt;/sys/devices/pci0000:00/0000:00:02.0/4b20d080-1b54-4048-85b3-a6a62d165c01&lt;/path&gt;
  &lt;parent&gt;pci_0000_06_00_0&lt;/parent&gt;
  &lt;driver&gt;
    &lt;name&gt;vfio_mdev&lt;/name&gt;
  &lt;/driver&gt;
  &lt;capability type='mdev'&gt;
    &lt;type id='nvidia-11'/&gt;
    &lt;iommuGroup number='12'/&gt;
  &lt;/capability&gt;
&lt;/device&gt;</pre>
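<p>
Judging from this example, the node device name is <code>mdev_</code> followed
by the device UUID with dashes replaced by underscores. Assuming that
convention (it is inferred from the example only, and other libvirt versions
may name mediated devices differently), a sketch converting between the two
could be as follows; the function names are hypothetical.
</p>
<pre>
# Hypothetical helpers; the naming convention is inferred from the
# example above and may not hold for every libvirt version.

# mdev UUID to node device name.
mdev_uuid_to_nodedev() {
    printf 'mdev_%s\n' "$1" | sed 's/-/_/g'
}

# Node device name back to the mdev UUID.
mdev_nodedev_to_uuid() {
    printf '%s\n' "${1#mdev_}" | sed 's/_/-/g'
}
</pre>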
<p>
Support for the mediated device framework in libvirt's node device driver
covers the following features:
</p>
<ul>
<li>
list available mediated devices on the host
(<span class="since">Since 3.4.0</span>)
</li>
<li>
display device details
(<span class="since">Since 3.4.0</span>)
</li>
</ul>
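<p>
Besides <code>virsh nodedev-list --cap mdev</code>, the same information can be
read from sysfs. As a rough sketch: every entry in
<code>/sys/bus/mdev/devices</code> is named after an mdev UUID, and its
<code>mdev_type</code> symlink points at the type the device was created from.
The function name and the <code>SYSFS_ROOT</code> override are hypothetical.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print "UUID TYPE" for every mediated device on the host.
mdev_list() {
    for dev in "$SYSFS_ROOT"/bus/mdev/devices/*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        # mdev_type links to .../mdev_supported_types/TYPE
        printf '%s %s\n' "$(basename "$dev")" \
            "$(basename "$(readlink "$dev/mdev_type")")"
    done
}
</pre>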
<p>
Because mediated devices are instantiated from vendor-specific templates,
simply called 'types', information describing these types is contained
within the parent device's capabilities
(see the example in <a href="#PCI">PCI host devices</a>).
</p>
<p>
To see the supported mediated device types on a specific physical device,
use the following:
</p>
<pre>
$ ls /sys/class/mdev_bus/&lt;device&gt;/mdev_supported_types</pre>
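<p>
Each type directory listed this way holds attributes such as
<code>device_api</code>, <code>available_instances</code> and, optionally,
<code>name</code>, which appear to correspond to the <code>deviceAPI</code>,
<code>availableInstances</code> and <code>name</code> elements of the
<code>mdev_types</code> capability. A minimal sketch summarizing them per type;
the function name and the <code>SYSFS_ROOT</code> override are hypothetical.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print one line per supported mdev type of a parent device:
#   TYPE_ID DEVICE_API AVAILABLE_INSTANCES NAME
# "name" is optional in sysfs; "-" is printed when it is missing.
mdev_types() {
    for t in "$SYSFS_ROOT"/class/mdev_bus/"$1"/mdev_supported_types/*; do
        [ -d "$t" ] || continue
        name="-"
        if [ -r "$t/name" ]; then name=$(cat "$t/name"); fi
        printf '%s %s %s %s\n' "$(basename "$t")" "$(cat "$t/device_api")" \
            "$(cat "$t/available_instances")" "$name"
    done
}
</pre>
<p>
For example, <code>mdev_types 0000:06:00.0</code> would summarize the types of
the parent device with that PCI address.
</p>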
<p>
Before creating a mediated device, unbind the device from the respective
device driver, e.g. the subchannel I/O driver for a CCW device. Then bind the
device to the respective VFIO driver. For a CCW device, also unbind the
corresponding subchannel of the CCW device from the subchannel I/O driver
and then bind the subchannel (instead of the CCW device) to the vfio_ccw
driver. The example below shows the unbinding and binding steps for a CCW
device.
</p>
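<pre>
# Example IDs: the CCW device and the subchannel it sits on.
device="0.0.1234"
subchannel="0.0.0123"
# Unbind the CCW device and its subchannel from their current drivers.
echo "$device" > "/sys/bus/ccw/devices/$device/driver/unbind"
echo "$subchannel" > "/sys/bus/css/devices/$subchannel/driver/unbind"
# Bind the subchannel (not the CCW device) to the vfio_ccw driver.
echo "$subchannel" > /sys/bus/css/drivers/vfio_ccw/bind
</pre>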
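<p>
To confirm that the binding took effect, the driver of the subchannel can be
read back from its <code>driver</code> symlink in sysfs. A minimal sketch; the
function name and the <code>SYSFS_ROOT</code> override are hypothetical.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the name of the driver a CSS subchannel is bound to.
# Returns 1 when the subchannel is not bound to any driver.
css_subchannel_driver() {
    link="$SYSFS_ROOT/bus/css/devices/$1/driver"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
</pre>
<p>
After the steps above, <code>css_subchannel_driver 0.0.0123</code> is expected
to print <code>vfio_ccw</code>.
</p>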
<p>
To manually instantiate a mediated device, use one of the following as a
reference. For a CCW device, use the subchannel ID instead of the device
ID.
</p>
<pre>
$ uuidgen > /sys/class/mdev_bus/&lt;device&gt;/mdev_supported_types/&lt;type&gt;/create
...
$ echo &lt;UUID&gt; > /sys/class/mdev_bus/&lt;device&gt;/mdev_supported_types/&lt;type&gt;/create</pre>
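<p>
The two variants above can be combined into a small helper that first checks
<code>available_instances</code> for the chosen type and then prints the UUID
it wrote. This is a sketch assuming the sysfs layout shown above; the function
name, its arguments and the <code>SYSFS_ROOT</code> override are hypothetical.
For a CCW device, the parent argument is the subchannel ID.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Usage: mdev_create PARENT TYPE [UUID]
# Creates a mediated device and prints its UUID. Without a UUID
# argument, one is generated with uuidgen. Returns 1 if the type has
# no free instances or the write to "create" fails.
mdev_create() {
    type_dir="$SYSFS_ROOT/class/mdev_bus/$1/mdev_supported_types/$2"
    avail=$(cat "$type_dir/available_instances") || return 1
    if [ "$avail" -le 0 ]; then
        return 1
    fi
    uuid=${3:-$(uuidgen)}
    echo "$uuid" > "$type_dir/create" || return 1
    echo "$uuid"
}
</pre>
<p>
Run as root, e.g. <code>mdev_create 0000:06:00.0 nvidia-11</code>; the printed
UUID identifies the new device under <code>/sys/bus/mdev/devices</code>.
</p>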
<p>
Manual removal of a mediated device is then performed as follows:
</p>
<pre>
$ echo 1 > /sys/bus/mdev/devices/&lt;uuid&gt;/remove</pre>
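<p>
As a small sketch, the removal can be wrapped so that it fails cleanly when no
mediated device with the given UUID exists; the function name and the
<code>SYSFS_ROOT</code> override are hypothetical.
</p>
<pre>
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Usage: mdev_remove UUID
# Writes 1 to the device's remove attribute; returns 1 if the
# mediated device does not exist.
mdev_remove() {
    dev="$SYSFS_ROOT/bus/mdev/devices/$1"
    [ -e "$dev/remove" ] || return 1
    echo 1 > "$dev/remove"
}
</pre>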
</body>
</html>