mirror of
https://gitlab.com/libvirt/libvirt.git
synced 2024-12-22 13:45:38 +00:00
9c0981ea2e
Signed-off-by: Shalini Chellathurai Saroja <shalini@linux.vnet.ibm.com> Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: John Ferlan <jferlan@redhat.com>
375 lines
14 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>Host device management</h1>

<p>
Libvirt provides management of both physical and virtual host devices
(historically also referred to as node devices) like USB, PCI, SCSI, and
network devices. This also includes various virtualization capabilities
which the aforementioned devices provide for utilization, for example
SR-IOV, NPIV, MDEV, DRM, etc.
</p>

<p>
The node device driver provides means to list and show details about host
devices (<code>virsh nodedev-list</code>,
<code>virsh nodedev-dumpxml</code>), which are generic and can be used
with all devices. It also provides means to create and destroy devices
(<code>virsh nodedev-create</code>, <code>virsh nodedev-destroy</code>),
which are meant for virtual devices and are currently only
supported for NPIV
(<a href="http://wiki.libvirt.org/page/NPIV_in_libvirt">more info about NPIV</a>).
Devices on the host system are arranged in a tree-like hierarchy, with
the root node being called <code>computer</code>. The node device driver
supports two backends to manage the devices, HAL and udev, with the former
being deprecated in favour of the latter.
</p>

<p>
The generic format of a host device XML can be seen below.
To identify a device both within the host and the device tree hierarchy,
the following elements are used:
</p>
<dl>
<dt><code>name</code></dt>
<dd>
The device's name will be generated by libvirt using the subsystem
(for example, pci) and the device's sysfs basename.
</dd>
<dt><code>path</code></dt>
<dd>
Fully qualified sysfs path to the device.
</dd>
<dt><code>parent</code></dt>
<dd>
This element identifies the parent node in the device hierarchy. The
value of the element will correspond with the device parent's
<code>name</code> element or <code>computer</code> if the device does
not have any parent.
</dd>
<dt><code>driver</code></dt>
<dd>
This element reports the driver in use for this device. The presence
of this element in the output XML depends on whether the underlying
device manager (most likely udev) exposes information about the
driver.
</dd>
<dt><code>capability</code></dt>
<dd>
Describes the device in terms of feature support. The element has one
mandatory attribute <code>type</code>, the value of which determines
the type of the device. Currently recognized values for the attribute
are:
<code>system</code>,
<code>pci</code>,
<code>usb</code>,
<code>usb_device</code>,
<code>net</code>,
<code>scsi</code>,
<code>scsi_host</code> (<span class="since">Since 0.4.7</span>),
<code>fc_host</code>,
<code>vports</code>,
<code>scsi_target</code> (<span class="since">Since 0.7.3</span>),
<code>storage</code> (<span class="since">Since 1.0.4</span>),
<code>scsi_generic</code> (<span class="since">Since 1.0.7</span>),
<code>drm</code> (<span class="since">Since 3.1.0</span>), and
<code>mdev</code> (<span class="since">Since 3.4.0</span>).
This element can be nested, in which case it further specifies a
device's capability. Refer to the specific device types for further
values of the <code>type</code> attribute that are exclusive to them.
</dd>
</dl>
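<p>
As a rough illustration of how a <code>name</code> relates to the subsystem
and the sysfs basename, the sketch below joins the two and replaces every
character that is not an ASCII letter or digit with an underscore. The
replacement rule is an assumption inferred from the example names in this
document (such as <code>pci_0000_04_00_1</code>), and
<code>nodedev_name</code> is a hypothetical helper, not a libvirt API.
</p>

```python
import os
import re


def nodedev_name(subsystem, sysfs_path):
    """Build a libvirt-style node device name from a subsystem and sysfs path.

    Assumption: characters other than ASCII letters and digits become '_'.
    """
    basename = os.path.basename(sysfs_path.rstrip("/"))
    return re.sub(r"[^0-9A-Za-z]", "_", "{}_{}".format(subsystem, basename))


if __name__ == "__main__":
    print(nodedev_name("pci", "/sys/devices/pci0000:00/0000:00:06.0/0000:04:00.1"))
```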
<h2>Basic structure of a node device</h2>
<pre>
<device>
  <name>pci_0000_00_17_0</name>
  <path>/sys/devices/pci0000:00/0000:00:17.0</path>
  <parent>computer</parent>
  <driver>
    <name>ahci</name>
  </driver>
  <capability type='pci'>
    ...
  </capability>
</device></pre>
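<p>
A consumer of this XML usually only needs the identifying elements. The
following is a minimal sketch using Python's standard
<code>xml.etree.ElementTree</code> module; <code>summarize</code> and
<code>NODEDEV_XML</code> are illustrative names, and the elided capability
body of the example is simply left empty here.
</p>

```python
import xml.etree.ElementTree as ET

# Same shape as the basic structure example; the elided capability body is empty.
NODEDEV_XML = """
<device>
  <name>pci_0000_00_17_0</name>
  <path>/sys/devices/pci0000:00/0000:00:17.0</path>
  <parent>computer</parent>
  <driver>
    <name>ahci</name>
  </driver>
  <capability type='pci'/>
</device>
"""


def summarize(xml_text):
    """Collect the identifying elements of a node device into a dict."""
    dev = ET.fromstring(xml_text)
    cap = dev.find("capability")
    return {
        "name": dev.findtext("name"),
        "path": dev.findtext("path"),
        "parent": dev.findtext("parent"),
        # <driver> may be absent, in which case findtext() returns None.
        "driver": dev.findtext("driver/name"),
        "type": cap.get("type") if cap is not None else None,
    }


if __name__ == "__main__":
    print(summarize(NODEDEV_XML))
```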
<ul id="toc"/>

<h2><a id="PCI">PCI host devices</a></h2>
<dl>
<dt><code>capability</code></dt>
<dd>
When used as a top-level element, the supported values for the
<code>type</code> attribute are <code>pci</code> and
<code>phys_function</code> (see <a href="#SRIOVCap">SR-IOV below</a>).
</dd>
</dl>
<pre>
<device>
  <name>pci_0000_04_00_1</name>
  <path>/sys/devices/pci0000:00/0000:00:06.0/0000:04:00.1</path>
  <parent>pci_0000_00_06_0</parent>
  <driver>
    <name>igb</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>4</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x10c9'>82576 Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <iommuGroup number='15'>
      <address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='1' speed='2.5' width='2'/>
      <link validity='sta' speed='2.5' width='2'/>
    </pci-express>
  </capability>
</device></pre>
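<p>
The <code>domain</code>, <code>bus</code>, <code>slot</code> and
<code>function</code> elements together form the PCI address that also
appears as the last component of <code>path</code>. The sketch below shows
one way to rebuild that address from a trimmed copy of the example;
<code>pci_address</code> is a hypothetical helper, and the use of
<code>int(text, 0)</code> assumes the values are either plain decimal or
<code>0x</code>-prefixed hexadecimal.
</p>

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the PCI example; only the elements used below are kept.
PCI_XML = """
<device>
  <name>pci_0000_04_00_1</name>
  <path>/sys/devices/pci0000:00/0000:00:06.0/0000:04:00.1</path>
  <capability type='pci'>
    <domain>0</domain>
    <bus>4</bus>
    <slot>0</slot>
    <function>1</function>
  </capability>
</device>
"""


def pci_address(xml_text):
    """Format domain/bus/slot/function as DDDD:BB:SS.F (hexadecimal)."""
    cap = ET.fromstring(xml_text).find("capability[@type='pci']")
    # base 0 accepts both plain decimal ("4") and 0x-prefixed hex ("0x04").
    domain, bus, slot, function = (
        int(cap.findtext(tag), 0) for tag in ("domain", "bus", "slot", "function")
    )
    return "{:04x}:{:02x}:{:02x}.{:x}".format(domain, bus, slot, function)


if __name__ == "__main__":
    print(pci_address(PCI_XML))
```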
<p>
The XML format for a PCI device stays the same for any further
capabilities it supports; a single nested <code><capability></code>
element will be included for each capability the device supports.
</p>

<h3><a id="SRIOVCap">SR-IOV capability</a></h3>
<p>
Single root input/output virtualization (SR-IOV) allows sharing of the
PCIe resources by multiple virtual environments. That is achieved by
slicing up a single full-featured physical resource called physical
function (PF) into multiple devices called virtual functions (VFs) sharing
their configuration with the underlying PF. Although SR-IOV is a
standardized specification, the number of VFs that can be created on a PF
varies among manufacturers.
</p>

<p>
If the NIC <a href="#PCI">above</a> were also SR-IOV capable, it would
also include a nested
<code><capability></code> element enumerating all virtual
functions available on the physical device (physical port), as in the
example below.
</p>

<pre>
<capability type='pci'>
  ...
  <capability type='virt_functions' maxCount='7'>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x3'/>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x5'/>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x7'/>
    <address domain='0x0000' bus='0x04' slot='0x11' function='0x1'/>
    <address domain='0x0000' bus='0x04' slot='0x11' function='0x3'/>
    <address domain='0x0000' bus='0x04' slot='0x11' function='0x5'/>
  </capability>
  ...
</capability></pre>
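<p>
To work with the enumerated virtual functions programmatically, the nested
capability can be read as sketched below. <code>virtual_functions</code> is
an illustrative helper; treating <code>maxCount</code> as the upper bound on
the number of VFs is an assumption based on the attribute name.
</p>

```python
import xml.etree.ElementTree as ET

# The SR-IOV example with the elided parts removed.
SRIOV_XML = """
<capability type='pci'>
  <capability type='virt_functions' maxCount='7'>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x3'/>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x5'/>
    <address domain='0x0000' bus='0x04' slot='0x10' function='0x7'/>
    <address domain='0x0000' bus='0x04' slot='0x11' function='0x1'/>
    <address domain='0x0000' bus='0x04' slot='0x11' function='0x3'/>
    <address domain='0x0000' bus='0x04' slot='0x11' function='0x5'/>
  </capability>
</capability>
"""


def virtual_functions(xml_text):
    """Return (list of VF addresses as DDDD:BB:SS.F, maxCount as int)."""
    root = ET.fromstring(xml_text)
    cap = root.find(".//capability[@type='virt_functions']")
    if cap is None:
        return [], 0
    addresses = []
    for addr in cap.findall("address"):
        # The address attributes in the example are 0x-prefixed hexadecimal.
        parts = [int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")]
        addresses.append("{:04x}:{:02x}:{:02x}.{:x}".format(*parts))
    return addresses, int(cap.get("maxCount", "0"))


if __name__ == "__main__":
    vfs, max_count = virtual_functions(SRIOV_XML)
    print(len(vfs), "of at most", max_count, "VFs:", ", ".join(vfs))
```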
<p>
An SR-IOV child device (a VF), on the other hand, would report its
top-level capability type as <code>phys_function</code> instead:
</p>

<pre>
<device>
  ...
  <capability type='phys_function'>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </capability>
  ...
</device></pre>
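<p>
The address inside <code>phys_function</code> points back at the PF. As a
sketch, the node device name of that PF could be derived as follows; the
<code>pci_DDDD_BB_SS_F</code> naming pattern is an assumption taken from the
example names in this document, and <code>physical_function_name</code> is a
hypothetical helper. The lookup searches at any depth, so it does not depend
on where the capability is placed within the device XML.
</p>

```python
import xml.etree.ElementTree as ET

# The phys_function example with the elided parts removed.
VF_XML = """
<device>
  <capability type='phys_function'>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </capability>
</device>
"""


def physical_function_name(xml_text):
    """Return the assumed node device name of the PF, or None if not a VF."""
    root = ET.fromstring(xml_text)
    addr = root.find(".//capability[@type='phys_function']/address")
    if addr is None:
        return None
    parts = [int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")]
    # Assumed naming pattern: pci_<domain>_<bus>_<slot>_<function>.
    return "pci_{:04x}_{:02x}_{:02x}_{:x}".format(*parts)


if __name__ == "__main__":
    print(physical_function_name(VF_XML))
```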
<h3><a id="MDEVCap">MDEV capability</a></h3>
<p>
A PCI device capable of creating mediated devices will include a nested
capability <code>mdev_types</code> which enumerates all supported mdev
types on the physical device, along with the type attributes available
through sysfs:
</p>

<dl>
<dt><code>type</code></dt>
<dd>
This element describes a mediated device type which acts as an
abstract template defining a resource allocation for instances of this
device type. The element has one attribute <code>id</code> which holds
an official vendor-supplied identifier for the type.
<span class="since">Since 3.4.0</span>
</dd>

<dt><code>name</code></dt>
<dd>
The <code>name</code> element holds a vendor-supplied code name for
the given mediated device type. This is an optional element.
<span class="since">Since 3.4.0</span>
</dd>

<dt><code>deviceAPI</code></dt>
<dd>
The value of this element describes how an instance of the given type
will be presented to the guest by the VFIO framework.
<span class="since">Since 3.4.0</span>
</dd>

<dt><code>availableInstances</code></dt>
<dd>
This element reports the current state of resource allocation. In other
words, how many instances of the given type can still be successfully
created on the physical device.
<span class="since">Since 3.4.0</span>
</dd>
</dl>

<p>
For more information about mediated devices, refer to the
<a href="#MDEV">section below</a>.
</p>

<pre>
<device>
...
  <driver>
    <name>nvidia</name>
  </driver>
  <capability type='pci'>
    ...
    <capability type='mdev_types'>
      <type id='nvidia-11'>
        <name>GRID M60-0B</name>
        <deviceAPI>vfio-pci</deviceAPI>
        <availableInstances>16</availableInstances>
      </type>
      <!-- Here would come the rest of the available mdev types -->
    </capability>
    ...
  </capability>
</device></pre>
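<p>
Before creating a mediated device it can be useful to check which types
still have free instances. The following sketch filters the
<code>mdev_types</code> capability on <code>availableInstances</code>;
<code>creatable_types</code> is an illustrative name, and a missing
<code>availableInstances</code> element is treated as zero, which is an
assumption of this example.
</p>

```python
import xml.etree.ElementTree as ET

# The mdev_types example with the elided parts removed.
MDEV_TYPES_XML = """
<device>
  <driver>
    <name>nvidia</name>
  </driver>
  <capability type='pci'>
    <capability type='mdev_types'>
      <type id='nvidia-11'>
        <name>GRID M60-0B</name>
        <deviceAPI>vfio-pci</deviceAPI>
        <availableInstances>16</availableInstances>
      </type>
    </capability>
  </capability>
</device>
"""


def creatable_types(xml_text):
    """Map type id -> details for every mdev type with free instances."""
    root = ET.fromstring(xml_text)
    result = {}
    for mdev_type in root.iterfind(".//capability[@type='mdev_types']/type"):
        available = int(mdev_type.findtext("availableInstances", default="0"))
        if available > 0:
            result[mdev_type.get("id")] = {
                "name": mdev_type.findtext("name"),  # optional element
                "deviceAPI": mdev_type.findtext("deviceAPI"),
                "availableInstances": available,
            }
    return result


if __name__ == "__main__":
    for type_id, details in creatable_types(MDEV_TYPES_XML).items():
        print(type_id, details)
```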
<h2><a id="MDEV">Mediated devices (MDEVs)</a></h2>
<p>
Mediated devices (<span class="since">Since 3.2.0</span>) are software
devices defining resource allocation on the backing physical device which
in turn allows the parent physical device's resources to be divided into
several mediated devices, thus sharing the physical device's performance
among multiple guests. Unlike SR-IOV, however, where a PCIe device appears
as multiple separate PCIe devices on the host's PCI bus, mediated devices
only appear on the mdev virtual bus. Therefore, no detach/reattach
procedure from/to the host driver is involved, even though
mediated devices are used in a direct device assignment manner.
</p>

<p>
The following sub-elements and attributes are exposed within the
<code>capability</code> element:
</p>

<dl>
<dt><code>type</code></dt>
<dd>
This element describes a mediated device type which acts as an
abstract template defining a resource allocation for instances of this
device type. The element has one attribute <code>id</code> which holds
an official vendor-supplied identifier for the type.
<span class="since">Since 3.4.0</span>
</dd>

<dt><code>iommuGroup</code></dt>
<dd>
This element supports a single attribute <code>number</code> which holds
the IOMMU group number the mediated device belongs to.
<span class="since">Since 3.4.0</span>
</dd>
</dl>

<h3>Example of a mediated device</h3>
<pre>
<device>
  <name>mdev_4b20d080_1b54_4048_85b3_a6a62d165c01</name>
  <path>/sys/devices/pci0000:00/0000:00:02.0/4b20d080-1b54-4048-85b3-a6a62d165c01</path>
  <parent>pci_0000_06_00_0</parent>
  <driver>
    <name>vfio_mdev</name>
  </driver>
  <capability type='mdev'>
    <type id='nvidia-11'/>
    <iommuGroup number='12'/>
  </capability>
</device></pre>
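<p>
The mediated device XML can be read in the same way as any other node
device. The sketch below extracts the type, the IOMMU group and the UUID;
recovering the UUID from <code>name</code> assumes the name is
<code>mdev_</code> followed by the UUID with hyphens written as underscores,
as in the example. <code>describe_mdev</code> is an illustrative helper.
</p>

```python
import uuid
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the mediated device example.
MDEV_XML = """
<device>
  <name>mdev_4b20d080_1b54_4048_85b3_a6a62d165c01</name>
  <parent>pci_0000_06_00_0</parent>
  <capability type='mdev'>
    <type id='nvidia-11'/>
    <iommuGroup number='12'/>
  </capability>
</device>
"""


def describe_mdev(xml_text):
    """Return UUID, parent, type id and IOMMU group of a mediated device."""
    dev = ET.fromstring(xml_text)
    cap = dev.find("capability[@type='mdev']")
    name = dev.findtext("name")
    # Assumption: name is "mdev_" + UUID with '-' replaced by '_'.
    mdev_uuid = uuid.UUID(name[len("mdev_"):].replace("_", "-"))
    return {
        "uuid": str(mdev_uuid),
        "parent": dev.findtext("parent"),
        "type": cap.find("type").get("id"),
        "iommu_group": int(cap.find("iommuGroup").get("number")),
    }


if __name__ == "__main__":
    print(describe_mdev(MDEV_XML))
```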
<p>
Support for the mediated device framework in libvirt's node device driver
covers the following features:
</p>

<ul>
<li>
list available mediated devices on the host
(<span class="since">Since 3.4.0</span>)
</li>
<li>
display device details
(<span class="since">Since 3.4.0</span>)
</li>
</ul>

<p>
Because mediated devices are instantiated from vendor-specific templates,
simply called 'types', information describing these types is contained
within the parent device's capabilities
(see the example in <a href="#PCI">PCI host devices</a>).
</p>

<p>
To see the supported mediated device types on a specific physical device
use the following:
</p>

<pre>
$ ls /sys/class/mdev_bus/<device>/mdev_supported_types</pre>
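<p>
The same listing can be done from a script. The following sketch walks the
<code>mdev_supported_types</code> directory and reads each type's
<code>available_instances</code> attribute where present.
<code>supported_mdev_types</code> and its <code>sysfs_root</code> parameter
are illustrative names; the parameter exists only so the function can be
pointed at a test directory instead of <code>/sys</code>.
</p>

```python
from pathlib import Path


def supported_mdev_types(device, sysfs_root="/sys"):
    """Map each supported mdev type of `device` to its free instance count.

    The count is None when the available_instances attribute is missing.
    """
    types_dir = Path(sysfs_root) / "class" / "mdev_bus" / device / "mdev_supported_types"
    found = {}
    for type_dir in sorted(types_dir.iterdir()):
        if not type_dir.is_dir():
            continue
        attr = type_dir / "available_instances"
        found[type_dir.name] = int(attr.read_text().strip()) if attr.is_file() else None
    return found
```

<p>
A call such as <code>supported_mdev_types("0000:06:00.0")</code> would then
return a dictionary keyed by type identifier; the device address here is a
placeholder.
</p>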
<p>
Before creating a mediated device, unbind the device from the respective
device driver, e.g. the subchannel I/O driver for a CCW device. Then bind the
device to the respective VFIO driver. For a CCW device, also unbind the
corresponding subchannel of the CCW device from the subchannel I/O driver
and then bind the subchannel (instead of the CCW device) to the vfio_ccw
driver. The example below shows the unbinding and binding steps for a CCW
device.
</p>

<pre>
device="0.0.1234"
subchannel="0.0.0123"
echo $device > /sys/bus/ccw/devices/$device/driver/unbind
echo $subchannel > /sys/bus/css/devices/$subchannel/driver/unbind
echo $subchannel > /sys/bus/css/drivers/vfio_ccw/bind
</pre>

<p>
To manually instantiate a mediated device, use one of the following as a
reference. For a CCW device, use the subchannel ID instead of the device
ID.
</p>

<pre>
$ uuidgen > /sys/class/mdev_bus/<device>/mdev_supported_types/<type>/create
...
$ echo <UUID> > /sys/class/mdev_bus/<device>/mdev_supported_types/<type>/create</pre>
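<p>
A scripted equivalent of these commands might look like the sketch below,
which writes a UUID into the type's <code>create</code> attribute and returns
it so the device can be referenced later. <code>create_mdev</code> and the
<code>sysfs_root</code> parameter are illustrative; writing to the real
<code>/sys</code> tree normally requires root privileges.
</p>

```python
import uuid
from pathlib import Path


def create_mdev(device, mdev_type, mdev_uuid=None, sysfs_root="/sys"):
    """Instantiate a mediated device of `mdev_type` on `device`.

    A random UUID is generated when none is given; a supplied UUID is
    validated first (uuid.UUID raises ValueError on malformed input).
    """
    chosen = str(uuid.UUID(mdev_uuid)) if mdev_uuid else str(uuid.uuid4())
    create = (
        Path(sysfs_root) / "class" / "mdev_bus" / device
        / "mdev_supported_types" / mdev_type / "create"
    )
    create.write_text(chosen + "\n")
    return chosen
```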
<p>
Manual removal of a mediated device is then performed as follows:
</p>

<pre>
$ echo 1 > /sys/bus/mdev/devices/<uuid>/remove</pre>
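<p>
For completeness, removal can be scripted in the same style. The sketch
below validates the UUID and writes <code>1</code> to the device's
<code>remove</code> attribute; <code>remove_mdev</code> and
<code>sysfs_root</code> are illustrative names, and the default path again
requires sufficient privileges.
</p>

```python
import uuid
from pathlib import Path


def remove_mdev(mdev_uuid, sysfs_root="/sys"):
    """Remove the mediated device identified by `mdev_uuid`.

    Returns the path that was written to. uuid.UUID raises ValueError
    if the identifier is malformed, before anything is written.
    """
    canonical = str(uuid.UUID(mdev_uuid))
    remove = Path(sysfs_root) / "bus" / "mdev" / "devices" / canonical / "remove"
    remove.write_text("1\n")
    return remove
```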
</body>
</html>