conf: parse/format <teaming> element in plain <hostdev>

The <teaming> element in <interface> allows pairing two interfaces
together as a simple "failover bond" network device in a guest. One of
the devices is the "transient" interface - it will be preferred for
all network traffic when it is present, but may be removed when
necessary, in particular during migration, when traffic will instead
go through the other interface of the pair - the "persistent"
interface. As it happens, in the QEMU implementation of this teaming
pair (called "virtio failover" in QEMU) the transient interface is
always a host network device assigned to the guest using VFIO (aka
"hostdev"); the persistent interface is always an emulated virtio NIC.

When support was initially added for <teaming>, it was written to
require that the transient/hostdev device be defined using <interface
type='hostdev'>; this was done because the virtio failover
implementation in QEMU and the virtio guest driver demands that the
two interfaces in the pair have matching MAC addresses, and the only
way libvirt can guarantee the MAC address of a hostdev network device
is to use <interface type='hostdev'>, whose main purpose is to
configure the device's MAC address before handing the device to
QEMU. (Note that <interface type='hostdev'> in turn requires that the
network device be an SRIOV VF (Virtual Function), as that is the only
type of network device whose MAC address we can set in a way that will
survive the device's driver init in the guest.)
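To make the matching-MAC requirement concrete, here is a minimal
standalone check that the MAC configured on the transient VF and the
MAC of the persistent virtio NIC are the same. It is only a sketch:
mac_parse() and macs_match() are hypothetical helpers, not libvirt
functions (libvirt has its own MAC address utilities), and the parser
is lenient about octet width and letter case.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a libvirt API): parse "aa:bb:cc:dd:ee:ff"
 * into six bytes. Fails unless there are six colon-separated hex
 * octets with nothing after the last one. */
static bool mac_parse(const char *str, unsigned char mac[6])
{
    unsigned int b[6];
    int consumed = 0;

    if (sscanf(str, "%2x:%2x:%2x:%2x:%2x:%2x%n",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &consumed) != 6)
        return false;
    if (str[consumed] != '\0')      /* reject trailing data such as ":66" */
        return false;
    for (int i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return true;
}

/* The failover pair only works if the transient (VFIO) device and the
 * persistent virtio NIC carry the same MAC address. */
static bool macs_match(const char *a, const char *b)
{
    unsigned char ma[6], mb[6];

    if (!mac_parse(a, ma) || !mac_parse(b, mb))
        return false;
    return memcmp(ma, mb, sizeof(ma)) == 0;
}
```

With this sketch, macs_match("52:54:00:AA:bb:cc", "52:54:00:aa:BB:CC")
is true, while a seven-octet string such as "00:11:22:33:44:55:66" is
rejected outright.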

It has recently come up that some users are unable to use <teaming>
because they are running in a container environment where libvirt
doesn't have the necessary privileges or resources to set the VF's MAC
address (because setting the VF MAC is done via the same device's PF
(Physical Function), and the PF is not exposed to libvirt's container).

At the same time, these users *are* able to set the VF's MAC address
themselves before starting up libvirt in the container. So they
could theoretically use the <teaming> feature if libvirt just skipped
the "setting the MAC address" part.

Fortunately, that is *exactly* the difference between <interface
type='hostdev'> (which must be a "hostdev VF") and <hostdev> (a "plain
hostdev" - it could be *any* PCI device; libvirt doesn't know what type
of PCI device it is, and doesn't care).

But what is still needed is for libvirt to provide a small bit of
information on the QEMU commandline argument for the hostdev, telling
QEMU that this device will be part of a team ("failover pair"), and
the id of the other device in the pair.
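On the QEMU side, the pairing is expressed with two device properties:
failover=on on the virtio-net-pci device, and failover_pair_id=<id> on
the VFIO device, where <id> is the id of the virtio NIC (libvirt uses
the device alias as the QEMU id). The sketch below formats those two
-device values. The helper names are hypothetical, and the real
arguments (generated by the next patch) also carry options such as
netdev, bus and addr that are omitted here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not libvirt's command-line builder: formats the
 * -device value for the transient VFIO half of a failover pair.
 * failover_pair_id names the id of the virtio-net device that stays in
 * the guest. Returns -1 on a missing pair id or on truncation. */
static int format_vfio_failover(char *buf, size_t len, const char *host,
                                const char *id, const char *pair_id)
{
    if (!pair_id || strlen(pair_id) == 0)
        return -1;                      /* the pair id is mandatory */
    int n = snprintf(buf, len, "vfio-pci,host=%s,id=%s,failover_pair_id=%s",
                     host, id, pair_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* The persistent half: a virtio-net device with failover enabled. */
static int format_virtio_failover(char *buf, size_t len,
                                  const char *id, const char *mac)
{
    int n = snprintf(buf, len, "virtio-net-pci,id=%s,mac=%s,failover=on",
                     id, mac);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```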

To make both of those goals simultaneously possible, this patch adds
support for the <teaming> element to plain <hostdev> - libvirt doesn't
try to set any MAC addresses, and QEMU gets the extra commandline
argument it needs.

(Actually, this patch adds only the parsing/formatting of the
<teaming> element in <hostdev>. The next patch will actually wire that
into the qemu driver.)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Laine Stump 2021-02-11 00:58:29 -05:00
parent 5cea59b2b3
commit db64acfbda
8 changed files with 149 additions and 0 deletions


@@ -4837,6 +4837,22 @@ support in the hypervisor and the guest network driver).
</devices>
...
The second interface in this example is referencing a network that is
a pool of SRIOV VFs (i.e. a "hostdev network"). You could instead
directly reference an SRIOV VF device:
::
   ...
   <interface type='hostdev'>
     <source>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
     </source>
     <mac address='00:11:22:33:44:55'/>
     <teaming type='transient' persistent='ua-backup0'/>
   </interface>
   ...
The ``<teaming>`` element's required attribute ``type`` will be set to either
``"persistent"`` to indicate a device that should always be present in the
domain, or ``"transient"`` to indicate a device that may periodically be
@@ -4858,6 +4874,41 @@ once migration is completed; while migration is taking place, network traffic
will use the virtio NIC. (Of course the emulated virtio NIC and the hostdev NIC
must be connected to the same subnet for bonding to work properly).
:since:`Since 7.1.0` The ``<teaming>`` element can also be added to a
plain ``<hostdev>`` device.
::
   ...
   <hostdev mode='subsystem' type='pci' managed='no'>
     <source>
       <address domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
     </source>
     <teaming type='transient' persistent='ua-backup0'/>
   </hostdev>
   ...
This device must be a network device, but not necessarily an SRIOV
VF. Using plain ``<hostdev>`` rather than ``<interface
type='hostdev'>`` or ``<interface type='network'>`` is useful if the
device that will be assigned with VFIO is a standard NIC (not a VF) or
if libvirt doesn't have the necessary resources and privileges to set
the VF's MAC address (e.g. if libvirt is running unprivileged, or in a
container). This of course means that the user (or another
application) is responsible for setting the MAC address of the device
in such a way that it will survive guest driver initialization. For
standard NICs (i.e. not an SRIOV VF) this probably means that the
NIC's factory-programmed MAC address will need to be used for the
teaming pair (since any driver init in the guest will reset the MAC
back to factory). If it is an SRIOV VF, then its MAC address will need
to be set via the VF's PF, e.g. if you are going to use VF 2 of the PF
enp2s0f1, you would use something like this command:
::
   ip link set enp2s0f1 vf 2 mac 52:54:00:11:22:33
NB1: Since you must know the alias name of the virtio NIC when configuring the
hostdev NIC, it will need to be manually set in the virtio NIC's configuration
(as with all other manually set alias names, this means it must start with


@@ -5156,6 +5156,9 @@
          <empty/>
        </element>
      </optional>
      <optional>
        <ref name="teaming"/>
      </optional>
      <element name="source">
        <optional>
          <ref name="startupPolicy"/>


@@ -3024,6 +3024,8 @@ void virDomainHostdevDefClear(virDomainHostdevDefPtr def)
    if (!def->parentnet)
        virDomainDeviceInfoFree(def->info);
    virDomainNetTeamingInfoFree(def->teaming);
    switch (def->mode) {
    case VIR_DOMAIN_HOSTDEV_MODE_CAPABILITIES:
        switch ((virDomainHostdevCapsType) def->source.caps.type) {
@@ -15015,6 +15017,9 @@ virDomainHostdevDefParseXML(virDomainXMLOptionPtr xmlopt,
        }
    }
    if (virDomainNetTeamingInfoParseXML(ctxt, &def->teaming) < 0)
        goto error;
    return def;
 error:
@@ -27433,6 +27438,8 @@ virDomainHostdevDefFormat(virBufferPtr buf,
        break;
    }
    virDomainNetTeamingInfoFormat(def->teaming, buf);
    if (def->readonly)
        virBufferAddLit(buf, "<readonly/>\n");
    if (def->shareable)


@@ -354,6 +354,7 @@ struct _virDomainHostdevDef {
        virDomainHostdevCaps caps;
    } source;
    virDomainHostdevOrigStates origstates;
    virDomainNetTeamingInfoPtr teaming;
    virDomainDeviceInfoPtr info; /* Guest address */
};


@@ -1585,6 +1585,25 @@ virDomainHostdevDefValidate(const virDomainHostdevDef *hostdev)
            break;
        }
    }
    if (hostdev->teaming) {
        if (hostdev->teaming->type != VIR_DOMAIN_NET_TEAMING_TYPE_TRANSIENT) {
            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                           _("teaming hostdev devices must have type='transient'"));
            return -1;
        }
        if (!hostdev->teaming->persistent) {
            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                           _("missing required persistent attribute in hostdev teaming element"));
            return -1;
        }
        if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
            hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                           _("teaming is only supported for pci hostdev devices"));
            return -1;
        }
    }
    return 0;
}
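The three checks above can be exercised in isolation with simplified
types. This is a sketch that assumes only the fields the validation
touches matter: team_info, hostdev and validate_hostdev_teaming() are
hypothetical stand-ins for virDomainNetTeamingInfo, virDomainHostdevDef
and the new block in virDomainHostdevDefValidate(), and error reporting
is reduced to a -1 return.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the libvirt types used above. */
typedef enum { TEAM_PERSISTENT, TEAM_TRANSIENT } team_type;
typedef enum { MODE_SUBSYS, MODE_CAPABILITIES } hostdev_mode;
typedef enum { SUBSYS_PCI, SUBSYS_USB, SUBSYS_SCSI } subsys_type;

typedef struct {
    team_type type;
    const char *persistent;      /* alias of the virtio NIC, may be NULL */
} team_info;

typedef struct {
    hostdev_mode mode;
    subsys_type subsys;
    const team_info *teaming;    /* NULL when there is no <teaming> element */
} hostdev;

/* Returns 0 if the hostdev's teaming configuration is acceptable and -1
 * otherwise, applying the checks in the same order as the patch. */
static int validate_hostdev_teaming(const hostdev *dev)
{
    if (!dev->teaming)
        return 0;                              /* nothing to check */
    if (dev->teaming->type != TEAM_TRANSIENT)
        return -1;                             /* must be type='transient' */
    if (!dev->teaming->persistent)
        return -1;                             /* persistent='...' is required */
    if (dev->mode != MODE_SUBSYS || dev->subsys != SUBSYS_PCI)
        return -1;                             /* only PCI subsystem hostdevs */
    return 0;
}
```

A <hostdev> with no <teaming> element still passes, so existing
configurations are unaffected by the new checks.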


@@ -0,0 +1,64 @@
<domain type='qemu'>
  <name>QEMUGuest1</name>
  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
  <memory unit='KiB'>219100</memory>
  <currentMemory unit='KiB'>219100</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='i686' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-i386</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/HostVG/QEMUGuest1'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='user'>
      <mac address='00:11:22:33:44:55'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='user'>
      <mac address='66:44:33:22:11:00'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x07' function='0x1'/>
      </source>
      <teaming type='transient' persistent='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x07' function='0x2'/>
      </source>
      <teaming type='transient' persistent='ua-backup1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
</domain>


@@ -0,0 +1 @@
../qemuxml2argvdata/net-virtio-teaming-hostdev.xml


@@ -438,6 +438,9 @@ mymain(void)
    DO_TEST("net-virtio-teaming-network",
            QEMU_CAPS_VIRTIO_NET_FAILOVER,
            QEMU_CAPS_DEVICE_VFIO_PCI);
    DO_TEST("net-virtio-teaming-hostdev",
            QEMU_CAPS_VIRTIO_NET_FAILOVER,
            QEMU_CAPS_DEVICE_VFIO_PCI);
    DO_TEST_CAPS_LATEST("net-isolated-port");
    DO_TEST("net-hostdev", NONE);
    DO_TEST("net-hostdev-bootorder", NONE);