QEMU 4.1.0 introduced a new device type called TPM Proxy, currently
implemented for PPC64 guests via a new virtual device called
'spapr-tpm-proxy' (see QEMU commit 0fb6bd073230 for more info).
The TPM Proxy device interacts with a TPM Resource Manager, a host
device capable of multiplexing the host TPM between multiple processes.
This allows multiple guests to access some TPM features at the
same time. Note that this mode of operation does not make the full
set of TPM features available to the guest - for that, the guest
still needs to be assigned a vTPM device (tpm-spapr for PPC64
guests). Although redundant, there is currently no technical
limitation preventing a guest from being assigned both a vTPM and a
TPM Proxy at the same time.
This patch adds documentation and schema for a new TPM model
type called 'spapr-tpm-proxy' that creates this new TPM Proxy
device. This model is valid only for the 'passthrough' backend.
An example of a TPM Proxy device connected to the TPM Resource Manager
'/dev/tpmrm0' looks like this:
<tpm model='spapr-tpm-proxy'>
  <backend type='passthrough'>
    <device path='/dev/tpmrm0'/>
  </backend>
</tpm>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add a new aw_bits attribute to the iommu device to control
the address width of the intel-iommu device.
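For illustration, a sketch of the resulting domain XML, assuming the
attribute lives on the <driver> subelement of <iommu> (the value 48 is
only an example):
<iommu model='intel'>
  <driver aw_bits='48'/>
</iommu>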
Signed-off-by: Menno Lageman <menno.lageman@oracle.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We're deliberately not mentioning that we're replicating QEMU behavior.
First, because QEMU may one day change that behavior and start
refusing incomplete NUMA setups, at which point our documentation
would be outdated. Second, auto-filling the CPUs in the first
cell will keep working regardless of future QEMU changes.
The idea is to encourage the user to provide a complete NUMA CPU topology
rather than relying on the CPU auto-fill mechanic.
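For illustration, a complete topology in which every vCPU is explicitly
assigned to a cell might look like this (cell sizes and CPU ranges are
only examples):
<numa>
  <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
  <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>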
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
QEMU has -fw_cfg which allows users to tweak how firmware
configures itself and/or provide new configuration blobs.
Introduce new <sysinfo/> type "fwcfg" that will hold these
new blobs.
It's possible to either specify the new value as a string or
provide a filename whose contents then serve as the value.
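A sketch of both variants (entry names and the file path are only
examples):
<sysinfo type='fwcfg'>
  <entry name='opt/com.example/name'>example value</entry>
  <entry name='opt/com.example/config' file='/etc/myconfig.cfg'/>
</sysinfo>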
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The attribute is only allowed for host-passthrough CPUs and can be
used to request that either only migratable features or all supported
features be enabled in the virtual CPU.
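Assuming this refers to a 'migratable' attribute on the <cpu> element
(the commit message doesn't name it explicitly), usage would presumably
look like:
<cpu mode='host-passthrough' migratable='on'/>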
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In formatdomain, using 'libxl' and 'xen' is redundant since they now
both refer to the same driver. 'xen' predates 'libxl' and unambiguously
identifies the Xen hypervisor, so drop the use of 'libxl'.
In aclpolkit, the connection URI was erroneously identified as 'libxl'
and the driver name as 'xenlight'. Change the URI to 'xen' and the
driver name to 'Xen'.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This patch adds the implementation of the IBS pSeries feature,
using the QEMU_CAPS_MACHINE_PSERIES_CAP_IBS capability added
in the previous patch.
IBS can have the following values: "broken", "workaround",
"fixed-ibs", "fixed-ccd" and "fixed-na".
This is the XML format for the cap:
<features>
  <ibs value='fixed-ibs'/>
</features>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch adds the implementation of the SBBC pSeries feature,
using the QEMU_CAPS_MACHINE_PSERIES_CAP_SBBC capability added
in the previous patch.
Like the previously added CFPC feature, SBBC can have the values
"broken", "workaround" or "fixed". Extra code is required to handle
it since it's not a regular tristate capability.
This is the XML format for the cap:
<features>
  <sbbc value='workaround'/>
</features>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch adds the implementation of the CFPC pSeries feature,
using the QEMU_CAPS_MACHINE_PSERIES_CAP_CFPC capability added
in the previous patch.
CFPC can have the values "broken", "workaround" or "fixed". Extra
code is required to handle it since it's not a regular tristate
capability.
This is the XML format for the cap:
<features>
  <cfpc value='workaround'/>
</features>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'passthrough' is a Xen-specific guest configuration option, new to Xen
4.13, that enables IOMMU mappings for a guest and hence determines
whether it supports PCI passthrough. The default is disabled. See the
xl.cfg(5) man page and xen.git commit babde47a3fe for more details.
The default state of disabled prevents hotplugging PCI devices. However,
if the guest configuration contains a PCI passthrough device at the time
of creation, libxl will automatically enable 'passthrough' and subsequent
hotplugging of PCI devices will also be possible. It is not possible to
unconditionally enable 'passthrough' since it would introduce a migration
incompatibility due to the guest ABI change. Instead, introduce another
Xen hypervisor feature that can be used to enable guest PCI passthrough:
<features>
  <xen>
    <passthrough state='on'/>
  </xen>
</features>
To allow finer control over how the IOMMU maps to the guest P2M table,
the passthrough element also supports a 'mode' attribute with values
restricted to sync_pt and share_pt, similar to the xl.cfg(5)
'passthrough' setting.
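For example, combining both attributes (values are only examples):
<features>
  <xen>
    <passthrough state='on' mode='share_pt'/>
  </xen>
</features>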
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
e820_host is a Xen-specific option, only available for PV domains, that
provides the domain a virtual e820 memory map based on the host one. It
is enabled with a new Xen hypervisor feature, e.g.
<features>
  <xen>
    <e820_host state='on'/>
  </xen>
</features>
e820_host is required when using PCI passthrough and is generally
considered safe for any PV kernel. e820_host is silently ignored if set
in HVM domain configuration. See xl.cfg(5) man page in the Xen
documentation for more details.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
This document describes the relationship between PCI addresses as
seen in the domain XML and by the guest OS, a topic that confuses
people time and time again.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
A <controller type='pci'...> element can now have a "hotplug"
attribute in its <target> subelement. This is intended to control
whether or not the slot(s) of the controller support
hotplugging/unplugging a device:
<controller type='pci' model='pcie-root-port'>
  <target hotplug='off'/>
</controller>
The default value of hotplug is "on".
Since support for configuring such an option is hypervisor-dependent
(and will vary among different types of PCI controllers even on a
single hypervisor), no validation is done in this patch - that
validation will be done in the patch that wires support for the
setting into the hypervisor.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In a guest with only one vcpu, when pinning the emulator to, say,
CPU 184 and vcpu0 to CPU 0 of the host, the user might expect that
only CPU 0 and CPU 184 of the host will be used by the guest.
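A sketch of such a pinning setup (host CPU numbers are only examples):
<vcpu>1</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <emulatorpin cpuset='184'/>
</cputune>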
The reality is that libvirt takes some time to honor the emulator
and vcpu pinning, taking care of NUMA constraints first. This results
in other host CPUs potentially being used by the QEMU thread until
the emulator/vcpu pinning is done. The user might then be confused
by the output of 'virsh cpu-stats' in this scenario, which shows
around 200 microseconds of cycles being spent on other CPUs.
Let's document this behavior, which is explained in detail in
Libvirt commit v5.0.0-199-gf136b83139, in the cputune section
of formatdomain.html.in.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Event channels are like PV interrupts and, in conjunction with grant
frames, form a data transfer mechanism for PV drivers. They are also
used for inter-processor interrupts. Guests with a large number of
vcpus and/or many PV devices may need to increase the default maximum
value of 1023. For this reason the native Xen config format supports
the 'max_event_channels' setting. See the xl.cfg(5) man page for more
details.
Similar to the existing maxGrantFrames option, add a new xenbus
controller option 'maxEventChannels', allowing the maximum value to be
adjusted via libvirt.
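A sketch of the resulting controller XML (values are only examples):
<controller type='xenbus' maxGrantFrames='64' maxEventChannels='2048'/>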
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
In the 'topology' element it is mentioned, regarding the sockets
value, "They refer to the total number of CPU sockets".
This is not accurate. What we're doing is calculating the number
of sockets per NUMA node, which can be checked in the current
implementation of virHostCPUGetInfoPopulateLinux(). Calculating
the total number of sockets would break the topology sanity
check nodes*sockets*cores*threads=online_cpus.
This documentation fix is important to avoid user confusion when
seeing the output of 'virsh capabilities' and expecting it to be
equal to the output of 'lscpu'. E.g. on a Power9 host, this 'lscpu'
output:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 160
On-line CPU(s) list: 0-159
Thread(s) per core: 4
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported
And this XML output from virsh capabilities:
<cpu>
  <arch>ppc64le</arch>
  <model>POWER9</model>
  <vendor>IBM</vendor>
  <topology sockets='1' dies='1' cores='20' threads='4'/>
  (...)
</cpu>
Both are correct, as long as we mention in the Libvirt documentation
that 'sockets' in the topology element represents the number of sockets
per NUMA node.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Introduce new 'multidevs' option for filesystem.
<filesystem type='mount' accessmode='mapped' multidevs='remap'>
  <source dir='/path'/>
  <target dir='mount_tag'/>
</filesystem>
This option prevents misbehaviour in the guest if a QEMU 9pfs export
contains multiple devices, due to the potential file ID collisions
this may otherwise cause.
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add a new attribute for holding the query part for http(s) disks.
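A sketch of how this might look (host, image name and query string are
only examples):
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='https' name='image.raw' query='access_token=abc'>
    <host name='example.org' port='443'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>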
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
http, https, ftp, ftps, and tftp were not mentioned in the
documentation. Note that 'ssh' is still omitted as it's used only
internally.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This commit is related to the RTC timer device too. HPET is shared
from the host device through the `localtime` clock. The timer becomes
available by creating a new timer with the `hpet` name.
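A minimal sketch, assuming the usual clock/timer syntax:
<clock offset='localtime'>
  <timer name='hpet' present='yes'/>
</clock>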
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This commit shares the host Real Time Clock device (rtc) with LXC
containers to support the hardware clock. It is enabled by setting up
an `rtc` timer under the clock section. Since this device is not
emulated, it is available only with the `localtime` clock, and it is
exposed read-only for security reasons.
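A minimal sketch, assuming the usual clock/timer syntax:
<clock offset='localtime'>
  <timer name='rtc'/>
</clock>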
Before:
root# hwclock --verbose
hwclock from util-linux 2.32.1
System Time: 1581877557.598365
Trying to open: /dev/rtc0
Trying to open: /dev/rtc
Trying to open: /dev/misc/rtc
No usable clock interface found.
hwclock: Cannot access the Hardware Clock via any known method.
Now:
root# hwclock
2020-02-16 18:23:55.374134+00:00
root# hwclock -w
hwclock: ioctl(RTC_SET_TIME) to /dev/rtc to set the time failed:
Permission denied
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Some disk backends support configuring the readahead buffer or timeout
for requests. Add the knobs to the XML.
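A sketch of the new knobs inside the disk <source> (sizes and timeouts
are only examples):
<source protocol='https' name='image.raw'>
  <host name='example.org' port='443'/>
  <readahead size='65536'/>
  <timeout seconds='30'/>
</source>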
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add possibility to specify one or more cookies for http based disks.
This patch adds the config parser, storage and validation of the
cookies.
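A sketch, assuming a <cookies> wrapper holding individual <cookie>
entries (names and values are only examples):
<source protocol='http' name='image.raw'>
  <host name='example.org' port='80'/>
  <cookies>
    <cookie name='session'>0123456789</cookie>
  </cookies>
</source>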
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
To allow turning off verification of SSL certificates, add a new
element <ssl> to the disk source XML which will allow configuring the
validation process using the 'verify' attribute.
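For example, to turn verification off (a sketch):
<source protocol='https' name='image.raw'>
  <host name='example.org' port='443'/>
  <ssl verify='no'/>
</source>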
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add more elements for tuning the virtiofsd daemon
and the vhost-user-fs device:
<driver type='virtiofs' queue='1024' xattr='on'>
  <binary path='/usr/libexec/virtiofsd'>
    <cache mode='always'/>
    <lock posix='off' flock='off'/>
  </binary>
</driver>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Introduce a new 'virtiofs' driver type for filesystem.
<filesystem type='mount' accessmode='passthrough'>
  <driver type='virtiofs'/>
  <source dir='/path'/>
  <target dir='mount_tag'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</filesystem>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
The current documentation is fairly terse and not easy to decode
for someone who's not intimately familiar with the inner workings
of timer devices. Expand on it by providing a somewhat verbose
description of what behavior each policy will result in, as seen
from both the guest OS and host point of view.
This is lifted directly from the following QEMU commit:
  commit 2a7d957596786404c4ed16b089273de95a9580ad
  Author: Andrea Bolognani <abologna@redhat.com>
  Date: Tue Feb 11 19:37:44 2020 +0100
    qapi: Expand documentation for LostTickPolicy
  (v4.2.0-1442-g2a7d957596)
The original text also matched word for word the documentation
found in QEMU.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We are going to add support for specifying offset and size attributes
which will allow controlling where the image format data and the guest
data itself start in the source of the disk. This will be represented
by a <slices> element filled with a <slice type='storage'> subelement
for the offset of the image format data.
Add the XML documentation and RNG schema.
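A sketch of the resulting disk source (offset and size values are only
examples):
<source file='/path/to/image.raw'>
  <slices>
    <slice type='storage' offset='12345' size='524288'/>
  </slices>
</source>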
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Users expect to be able to configure the <console> element and see
that configuration reflected in the <serial> element, or at least
have it stick; however, due to our crazy back-compat code, that
doesn't always happen.
There's really not much we can do to make this kind of corner case
work as the user would expect, especially not without introducing
additional complexity in a part of libvirt that already has more
than its fair share of it; we can, however, improve the documentation
so that it will nudge said users in the right direction.
https://bugzilla.redhat.com/show_bug.cgi?id=1770725
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This patch adds support for the tpm-spapr device model for ppc64. The XML for
this type of TPM looks as follows:
<tpm model='tpm-spapr'>
  <backend type='emulator'/>
</tpm>
Extend the documentation.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The 'builtin' rng backend model can be used as follows:
<rng model='virtio'>
  <backend model='builtin'/>
</rng>
Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Recently CPU hardware vendors have started to support a new structure
inside the CPU package topology known as a "die". Thus the hierarchy
is now:
sockets > dies > cores > threads
This adds support for "dies" in the XML parser, with the value
defaulting to 1 if not specified for backwards compatibility.
For example a system with 64 logical CPUs might report
<topology sockets="4" dies="2" cores="4" threads="2"/>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Currently, when a security driver is not available, users are informed
that it wasn't found, which can be confusing.
1. Update the error message
2. Add a comment to the domain doc
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Sebastian Mitterle <smitterl@redhat.com>
There is a class of PCI devices that act like disks: NVMe. They are,
therefore, both PCI devices and disks. While we already have
<hostdev/> (and can assign an NVMe device to a domain successfully)
we don't have a disk representation. There are three problems with
PCI assignment in the case of an NVMe device:
1) domains with <hostdev/> can't be migrated
2) the NVMe device is assigned whole; there's no way to assign only a
namespace
3) because hypervisors see <hostdev/> they don't put a block layer
on top of it - users don't get all the fancy features like
snapshots
NVMe namespaces are a way of splitting one continuous NVMe storage
area into smaller units, effectively creating smaller NVMe devices
(which can then be partitioned, LVMed, etc.)
Because of all of this, the following XML was chosen to model an
NVMe device:
<disk type='nvme' device='disk'>
  <driver name='qemu' type='raw'/>
  <source type='pci' managed='yes' namespace='1'>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>