The <teaming> element in <interface> allows pairing two interfaces
together as a simple "failover bond" network device in a guest. One of
the devices is the "transient" interface - it will be preferred for
all network traffic when it is present, but may be removed when
necessary, in particular during migration, when traffic will instead
go through the other interface of the pair - the "persistent"
interface. As it happens, in the QEMU implementation of this teaming
pair (called "virtio failover" in QEMU) the transient interface is
always a host network device assigned to the guest using VFIO (aka
"hostdev"); the persistent interface is always an emulated virtio NIC.
When support was initially added for <teaming>, it was written to
require that the transient/hostdev device be defined using <interface
type='hostdev'>; this was done because the virtio failover
implementation in QEMU and the virtio guest driver demands that the
two interfaces in the pair have matching MAC addresses, and the only
way libvirt can guarantee the MAC address of a hostdev network device
is to use <interface type='hostdev'>, whose main purpose is to
configure the device's MAC address before handing the device to
QEMU. (note that <interface type='hostdev'> in turn requires that the
network device be an SRIOV VF (Virtual Function), as that is the only
type of network device whose MAC address we can set in a way that will
survive the device's driver init in the guest).
It has recently come up that some users are unable to use <teaming>
because they are running in a container environment where libvirt
doesn't have the necessary privileges or resources to set the VF's MAC
address (because setting the VF MAC is done via the same device's PF
(Physical Function), and the PF is not exposed to libvirt's container).
At the same time, these users *are* able to set the VF's MAC address
themselves in advance of staring up libvirt in the container. So they
could theoretically use the <teaming> feature if libvirt just skipped
the "setting the MAC address" part.
Fortunately, that is *exactly* the difference between <interface
type='hostdev'> (which must be a "hostdev VF") and <hostdev> (a "plain
hostdev" - it could be *any* PCI device; libvirt doesn't know what type
of PCI device it is, and doesn't care).
But what is still needed is for libvirt to provide a small bit of
information on the QEMU commandline argument for the hostdev, telling
QEMU that this device will be part of a team ("failover pair"), and
the id of the other device in the pair.
To make both of those goals simultaneously possible, this patch adds
support for the <teaming> element to plain <hostdev> - libvirt doesn't
try to set any MAC addresses, and QEMU gets the extra commandline
argument it needs)
(actually, this patch adds only the parsing/formatting of the
<teaming> element in <hostdev>. The next patch will actually wire that
into the qemu driver.)
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
For hardware virtualization this is functionally identical to the
existing host-passthrough mode so the same caveats apply.
For emulated guest this exposes the maximum featureset supported by
the emulator. Note that despite being emulated this is not guaranteed
to be migration safe, especially if different emulator software versions
are used on each host.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Many of Xen's text documents have been converted to man pages over
the years, the channel doc being one of them. Replace the broken
channel.txt link with the name of the man page providing the same
information.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Pass the parameter clock rt to qemu to ensure that the
virtual machine is not synchronized with the host time
Signed-off-by: gongwei <gongwei@smartx.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add virtio related options iommu, ats and packed as driver element attributes
to vsock devices. Ex:
<vsock model='virtio'>
<cid auto='no' address='3'/>
<driver iommu='on'/>
</vsock>
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The current formulation can lead people to believe SCSI
controllers only allow the virtio-scsi model, but really the
only difference is that you have to use model='virtio-scsi'
where you would use model='virtio' for another device.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The virtio-pmem is a virtio variant of NVDIMM and just like
NVDIMM virtio-pmem also allows accessing host pages bypassing
guest page cache. The difference is that if a regular file is
used to back guest's NVDIMM (model='nvdimm') the persistence of
guest writes might not be guaranteed while with virtio-pmem it
is.
To express this new model at domain XML level, I've chosen the
following:
<memory model='virtio-pmem' access='shared'>
<source>
<path>/tmp/virtio_pmem</path>
</source>
<target>
<size unit='KiB'>524288</size>
</target>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</memory>
Another difference between NVDIMM and virtio-pmem is that while
the former supports NUMA node locality the latter doesn't. And
also, the latter goes onto PCI bus and not into a DIMM module.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
In certain specific cases it might be beneficial to be able to control
the metadata caching of storage image format drivers of a hypervisor.
Introduce XML machinery to set the maximum size of the metadata cache
which will be used by qemu's qcow2 driver.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add documentation and schema for the new disk transport protocol.
Signed-off-by: Ryan Gahagan <rgahagan@cs.utexas.edu>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Currently, swtpm TPM state file is removed when a transient domain is
powered off or undefined. When we store TPM state on a shared storage
such as NFS and use transient domain, TPM states should be kept as it is.
Add per-TPM emulator option `persistent_sate` for keeping TPM state.
This option only works for the emulator type backend and looks as follows:
<tpm model='tpm-tis'>
<backend type='emulator' persistent_state='yes'/>
</tpm>
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
On PPC platform it is required that a NVDIMM has an UUID. If none
is provided then libvirt generates one during parsing (see
v6.2.0-rc1~96 and friends). However, the example provided in our
documentation is not valid XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Han Han <hhan@redhat.com>
The NCR53C90 is the built-in SCSI controller on all sparc machine types,
and some mips and m68k machine types.
The DC390 and AM53C974 are PCI SCSI controllers that can be added to any
PCI machine.
These are only interesting for emulating obsolete hardware platforms.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
QEMU version 4.2 introduced a performance feature under commit
d645e13287 ("kvm: i386: halt poll control MSR support").
This patch adds a new KVM feature 'poll-control' to set this performance
hint for KVM guests. The feature is off by default.
To enable this hint and have libvirt add "-cpu host,kvm-poll-control=on"
to the QEMU command line, the following XML code needs to be added to the
guest's domain description:
<features>
<kvm>
<poll-control state='on'/>
</kvm>
</features>
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Adds documentation for QEMU 9pfs 'fmode' and 'dmode' options.
Signed-off-by: Brian Turek <brian.turek@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This patch adds new schema and adds support for parsing and formatting
domain configurations that include vdpa devices.
vDPA network devices allow high-performance networking in a virtual
machine by providing a wire-speed data path. These devices require a
vendor-specific host driver but the data path follows the virtio
specification.
When a device on the host is bound to an appropriate vendor-specific
driver, it will create a chardev on the host at e.g. /dev/vhost-vdpa-0.
That chardev path can then be used to define a new interface with
type='vdpa'.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
The 'reporting' suffix of the attribute makes it sound like we
could be reporting something to user. While in fact, this is
purely virtio membaloon <-> QEMU business. Clarify the docs to
make it clear.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In fee8a61d29 a new attribute to <memballoon/> was introduced:
free-page-reporting. We don't really like hyphens in attribute
names. Use camelCase instead.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This will add the proper documentation and parser support for the free page
reporting feature that is introduced in QEMU 5.1.
Signed-off-by: Nico Pache <npache@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Actual change is "s/``elements``/``feature`` elements/", rest is
reflow.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
"cpu_map.xml" was moved to a directory "cpu_map" and split up into
several files.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
spicevmc is the most common <redirdev> usage. This adds an XML example
for it.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Currently it is visually at the same indent as <seclabel>. This
fixes it to be grouped it with <devices>
Fixes: d4abb7b45d
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
libvirt doesn't reject this but only one <driver> element takes
effect.
Drop the instance that is already referenced in the previous example
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Extract the validation of transient disk option. We support transient
disks in qemu under the following conditions:
- -blockdev is used
- the disk source is a local file
- the disk type is 'disk'
- the disk is not readonly
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
The resolution of the VNC framebuffer can now be set via the resolution
definition introduced in 5.9.0.
Also, add "gop" to the list of model types the <resolution/>
sub-element is valid for.
Signed-off-by: Fabian Freyer <fabian.freyer@physik.tu-berlin.de>
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In [1], changes were made to remove the existing auto-alignment
for pSeries NVDIMM devices. That design promotes strange situations
where the NVDIMM size reported in the domain XML is different
from what QEMU is actually using. We removed the auto-alignment
and relied on standard size validation.
However, this goes against Libvirt design philosophy of not
tampering with existing guest behavior, as pointed out by Daniel
in [2]. Since we can't know for sure whether there are guests that
are relying on the auto-alignment feature to work, the changes
made in [1] are a direct violation of this rule.
This patch reverts [1] entirely, re-enabling auto-alignment for
pSeries NVDIMM as it was before. Changes will be made to ease
the limitations of this design without hurting existing
guests.
This reverts the following commits:
- commit 2d93cbdea9
Revert "formatdomain.html.in: mention pSeries NVDIMM 'align down' mechanic"
- commit 0ee56369c8
qemu_domain.c: change qemuDomainMemoryDeviceAlignSize() return type
- commit 07de813924
qemu_domain.c: do not auto-align ppc64 NVDIMMs
- commit 0ccceaa57c
qemu_validate.c: add pSeries NVDIMM size alignment validation
- commit 4fa2202d88
qemu_domain.c: make qemuDomainGetMemorySizeAlignment() public
[1] https://www.redhat.com/archives/libvir-list/2020-July/msg02010.html
[2] https://www.redhat.com/archives/libvir-list/2020-September/msg00572.html
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Even though this was brought up in upstream discussion [1] it
missed my patches: users should prefer <oemStrings/> over fwcfg.
The reason is that fwcfg is considered somewhat internal to QEMU
and it has limited number of slots and neither of these applies
to <oemStrings/>.
While I'm at it, I'm fixing the example too (because it contains
incorrect element name) and clarifying sysfs/ exposure.
1: https://www.redhat.com/archives/libvir-list/2020-May/msg00957.html
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
By default Xen only allows guests to write "known safe" values into PCI
configuration space, yet many devices require writes to other areas of
the configuration space in order to operate properly. To allow writing
any values Xen supports the 'permissive' setting, see xl.cfg(5) man page.
This change models Xen's permissive setting by adding a writeFiltering
attribute on the <source> element of a PCI hostdev. When writeFiltering
is set to 'no', the Xen permissive setting will be enabled and guests
will be able to write any values into the device's configuration space.
The permissive setting remains disabled in the absense of the
writeFiltering attribute, of if it is explicitly set to 'yes'.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Use https: links for websites that support them.
The URIs which are used as namespace identifiers
are left alone.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Document the new <audio> element which allows to specify
host audio backend for a guest <sound> device, and update
the <sound> element description with the new <audio>
sub-element which specifies the other end of the mapping.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
We do not auto-align down pSeries NVDIMMs anymore.
This reverts commit 8f474ceea0.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We already allow controlling the initiator IQN for iSCSI based disks.
Add the same for host devices.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For now just plain conversion to rst. Anchors which existed until now
are preserved, but the table of contents now uses the docutils-generated
ones.
Additionally <code> which was nested in a link (<a>) was removed as rst
doesn't support nesting of inline markup.
The only anchor which wasn't restored is
'elementsDiskBackingStoreIndex' and its only reference was removed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>