Prefer using VFIO (if available) to the legacy KVM device passthrough.
With this patch a PCI passthrough device without the driver configured
will be started with VFIO if it's available on the host. If not legacy
KVM passthrough is checked and error is reported if it's not available.
The following XML is the recommended default clock configuration for
qemu:
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
However we weren't testing any of those timer elements.
The test case average timing code has not been used by any test
case ever. Delete it to remove complexity.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since commit 297c99a5 an invalid source definition XML of a character
device that is used as backend for RNG devices, smartcards and redirdevs
causes crash of the daemon when parsing such a definition.
The device types mentioned above are not a part of a regular character
device but are backends for other types. Thus when parsing such device
NULL is passed as the argument @chr_def. Later when checking the
validity of the definition @chr_def was dereferenced when parsing a UNIX
socket backend with missing path of the socket and crashed the daemon.
Sample offending configuration:
<devices>
...
<rng model='virtio'>
<backend model='egd' type='unix'>
<source mode='bind' service='1024'/>
</backend>
</rng>
</devices>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1012196
This resolves one of the issues in:
https://bugzilla.redhat.com/show_bug.cgi?id=1003983
This device is identical to qemu's "intel-hda" device (known as "ich6"
in libvirt), but has a different PCI device ID (which matches the ID
of the hda audio built into the ich9 chipset, of course). It's not
supported in earlier versions of qemu, so it requires a capability
bit.
qemu/KVM also supports a tftp URL while specifying the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='tftp' name='/url/path'>
<host name='host.name' port='69'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The ftps protocol is another protocol supported by qemu/KVM while specifying
the cdrom ISO image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftps' name='/url/path'>
<host name='host.name' port='990'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
The https protocol is also accepted by qemu/KVM when specifying the cdrom ISO
image.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='https' name='/url/path'>
<host name='host.name' port='443'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
Commit ef5d51d fixed a crash for numatune with auto placement and
nodeset specified:
<numatune>
<memory mode='preferred' placement='auto' nodeset='0'/>
</numatune>
Starting with qemu 1.6, the qemu-system-arm vexpress-a9 model has a
hardcoded virtio-mmio transport which enables attaching all virtio
devices.
On the command line, we have to use virtio-XXX-device rather than
virtio-XXX-pci, thankfully s390 already set the precedent here so
it's fairly straight forward.
At the XML level, this adds a new device address type virtio-mmio.
The controller and addressing don't have any subelements at the
moment because we they aren't needed for this usecase, but could
be added later if needed.
Add a test case for an ARM guest with one of every virtio device
enabled.
Similar to the chardev bit, ARM boards depend on the old style '-net nic'
for actually instantiating net devices. But we can't block out
-netdev altogether since it's needed for upcoming virtio support.
And add tests for working ARM XML with console, disk, and networking.
Add an attribute named 'removable' to the 'target' element of disks,
which controls the removable flag. For instance, on a Linux guest it
controls the value of /sys/block/$dev/removable. This option is only
valid for USB disks (i.e. bus='usb'), and its default value is 'off',
which is the same behaviour as before.
To achieve this, 'removable=on' (or 'off') is appended to the '-device
usb-storage' parameter sent to qemu when adding a USB disk via
'-disk'. A capability flag QEMU_CAPS_USB_STORAGE_REMOVABLE was added
to keep track if this option is supported by the qemu version used.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=922495
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Allow use of the usb-storage device only if the new capability flag
QEMU_CAPS_DEVICE_USB_STORAGE is set, which it is for qemu(-kvm)
versions >= 0.12.1.2-rhel62-beta.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
QEMU commit 3984890 introduced the "pci-hole64-size" property,
to i440FX-pcihost and q35-pcihost with a default setting of 2 GB.
Translate <pcihole64>x<pcihole64/> to:
-global q35-pcihost.pci-hole64-size=x for q35 machines and
-global i440FX-pcihost.pci-hole64-size=x for i440FX-based machines.
Error out on other machine types or if the size was specified
but the pcihost device lacks 'pci-hole64-size' property.
https://bugzilla.redhat.com/show_bug.cgi?id=990418
The ftp protocol is already recognized by qemu/KVM so add this support to
libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='ftp' name='/url/path'>
<host name='host.name' port='21'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
QEMU/KVM already allows a HTTP URL for the cdrom ISO image so add this support
to libvirt as well.
The xml should be as following:
<disk type='network' device='cdrom'>
<source protocol='http' name='/url/path'>
<host name='host.name' port='80'/>
</source>
</disk>
Signed-off-by: Aline Manera <alinefm@br.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=924153
Commit 904e05a2 (v0.9.9) added a per-<disk> seclabel element with
an attribute relabel='no' in order to try and minimize the
impact of shutdown delays when an NFS server disappears. The idea
was that if a disk is on NFS and can't be labeled in the first
place, there is no need to attempt the (no-op) relabel on domain
shutdown. Unfortunately, the way this was implemented was by
modifying the domain XML so that the optimization would survive
libvirtd restart, but in a way that is indistinguishable from an
explicit user setting. Furthermore, once the setting is turned
on, libvirt avoids attempts at labeling, even for operations like
snapshot or blockcopy where the chain is being extended or pivoted
onto non-NFS, where SELinux labeling is once again possible. As
a result, it was impossible to do a blockcopy to pivot from an
NFS image file onto a local file.
The solution is to separate the semantics of a chain that must
not be labeled (which the user can set even on persistent domains)
vs. the optimization of not attempting a relabel on cleanup (a
live-only annotation), and using only the user's explicit notation
rather than the optimization as the decision on whether to skip
a label attempt in the first place. When upgrading an older
libvirtd to a newer, an NFS volume will still attempt the relabel;
but as the avoidance of a relabel was only an optimization, this
shouldn't cause any problems.
In the ideal future, libvirt will eventually have XML describing
EVERY file in the backing chain, with each file having a separate
<seclabel> element. At that point, libvirt will be able to track
more closely which files need a relabel attempt at shutdown. But
until we reach that point, the single <seclabel> for the entire
<disk> chain is treated as a hint - when a chain has only one
file, then we know it is accurate; but if the chain has more than
one file, we have to attempt relabel in spite of the attribute,
in case part of the chain is local and SELinux mattered for that
portion of the chain.
* src/conf/domain_conf.h (_virSecurityDeviceLabelDef): Add new
member.
* src/conf/domain_conf.c (virSecurityDeviceLabelDefParseXML):
Parse it, for live images only.
(virSecurityDeviceLabelDefFormat): Output it.
(virDomainDiskDefParseXML, virDomainChrSourceDefParseXML)
(virDomainDiskSourceDefFormat, virDomainChrDefFormat)
(virDomainDiskDefFormat): Pass flags on through.
* src/security/security_selinux.c
(virSecuritySELinuxRestoreSecurityImageLabelInt): Honor labelskip
when possible.
(virSecuritySELinuxSetSecurityFileLabel): Set labelskip, not
norelabel, if labeling fails.
(virSecuritySELinuxSetFileconHelper): Fix indentation.
* docs/formatdomain.html.in (seclabel): Document new xml.
* docs/schemas/domaincommon.rng (devSeclabel): Allow it in RNG.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*-labelskip.xml:
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*-labelskip.args:
* tests/qemuxml2xmloutdata/qemuxml2xmlout-seclabel-*-labelskip.xml:
New test files.
* tests/qemuxml2argvtest.c (mymain): Run the new tests.
* tests/qemuxml2xmltest.c (mymain): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
q35 machines have an implicit ahci (sata) controller at 00:1F.2 which
has no "id" associated with it. For this reason, we can't refer to it
as "ahci0". Instead, we don't give an id on the commandline, which
qemu interprets as "use the first ahci controller". We then need to
specify the unit with "unit=%d" rather than adding it onto the bus
arg.
This patch adds in special handling for a few devices that need to be
treated differently for q35 domains:
usb - there is no implicit/default usb controller for the q35
machinetype. This is done because normally the default usb controller
is added to a domain by just adding "-usb" to the qemu commandline,
and it's assumed that this will add a single piix3 usb1 controller at
slot 1 function 2. That's not what happens when the machinetype is
q35, though. Instead, adding -usb to the commandline adds 3 usb
(version 2) controllers to the domain at slot 0x1D.{1,2,7}. Rather
than having
<controller type='usb' index='0'/>
translate into 3 separate devices on the PCI bus, it's cleaner to not
automatically add a default usb device; one can always be added
explicitly if desired. Or we may decide that on q35 machines, 3 usb
controllers will be automatically added when none is given. But for
this initial commit, at least we aren't locking ourselves into
something we later won't want.
video - qemu always initializes the primary video device immediately
after any integrated devices for the machinetype. Unless instructed
otherwise (by using "-device vga..." instead of "-vga" which libvirt
uses in many cases to work around deficiencies and bugs in various
qemu versions) qemu will always pick the first unused slot. In the
case of the "pc" machinetype and its derivatives, this is always slot
2, but on q35 machinetypes, the first free slot is slot 1 (since the
q35's integrated peripheral devices are placed in other slots,
e.g. slot 0x1f). In order to make the PCI address of the video device
predictable, that slot (1 or 2, depending on machinetype) is reserved
even when no video device has been specified.
sata - a q35 machine always has a sata controller implicitly added at
slot 0x1F, function 2. There is no way to avoid this controller, so we
always add it. Note that the xml2xml tests for the pcie-root and q35
cases were changed to use DO_TEST_DIFFERENT() so that we can check for
the sata controller being automatically added. This is especially
important because we can't check for it in the xml2argv output (it has
no effect on that output since it's an implicit device).
ide - q35 has no ide controllers.
isa and smbus controllers - these two are always present in a q35 (at
slot 0x1F functions 0 and 3) but we have no way of modelling them in
our config. We do need to reserve those functions so that the user
doesn't attempt to put anything else there though. (note that the "pc"
machine type also has an ISA controller, which we also ignore).
This PCI controller, named "dmi-to-pci-bridge" in the libvirt config,
and implemented with qemu's "i82801b11-bridge" device, connects to a
PCI Express slot (e.g. one of the slots provided by the pcie-root
controller, aka "pcie.0" on the qemu commandline), and provides 31
*non-hot-pluggable* PCI (*not* PCIe) slots, numbered 1-31.
Any time a machine is defined which has a pcie-root controller
(i.e. any q35-based machinetype), libvirt will automatically add a
dmi-to-pci-bridge controller if one doesn't exist, and also add a
pci-bridge controller. The reasoning here is that any useful domain
will have either an immediate (startup time) or eventual (subsequent
hot-plug) need for a standard PCI slot; since the pcie-root controller
only provides PCIe slots, we need to connect a dmi-to-pci-bridge
controller to it in order to get a non-hot-plug PCI slot that we can
then use to connect a pci-bridge - the slots provided by the
pci-bridge will be both standard PCI and hot-pluggable.
Since pci-bridge devices themselves can not be hot-plugged into a
running system (although you can hot-plug other devices into a
pci-bridge's slots), any new pci-bridge controller that is added can
(and will) be plugged into the dmi-to-pci-bridge as long as it has
empty slots available.
This patch is also changing the qemuxml2xml-pcie test from a "DO_TEST"
to a "DO_DIFFERENT_TEST". This is so that the "before" xml can omit
the automatically added dmi-to-pci-bridge and pci-bridge devices, and
the "after" xml can include it - this way we are testing if libvirt is
properly adding these devices.
This controller is implicit on q35 machinetypes. It provides 31 PCIe
(*not* PCI) slots as controller 0.
Currently there are no devices that can connect to pcie-root, and no
implicit pci controller on a q35 machine, so q35 is still
unusable. For a usable q35 system, we need to add a
"dmi-to-pci-bridge" pci controller, which can connect to pcie-root,
and provides standard pci slots that can be used to connect other
devices.
Since PCI bridges, PCIe bridges, PCIe switches, and PCIe root ports
all share the same namespace, they are all defined as controllers of
type='pci' in libvirt (but with a differing model attribute). Each of
these controllers has a certain connection type upstream, allows
certain connection types downstream, and each can either allow a
single downstream connection at slot 0, or connections from slot 1 -
31.
Right now, we only support the pci-root and pci-bridge devices, both
of which only allow PCI devices to connect, and both which have usable
slots 1 - 31. In preparation for adding other types of controllers
that have different capabilities, this patch 1) adds info to the
qemuDomainPCIAddressBus object to indicate the capabilities, 2) sets
those capabilities appropriately for pci-root and pci-bridge devices,
and 3) validates that the controller being connected to is the proper
type when allocating slots or validating that a user-selected slot is
appropriate for a device..
Having this infrastructure in place will make it much easier to add
support for the other PCI controller types.
While it would be possible to do all the necessary checking by just
storing the controller model in the qemyuDomainPCIAddressBus, it
greatly simplifies all the validation code to also keep a "flags",
"minSlot" and "maxSlot" for each - that way we can just check those
attributes rather than requiring a nearly identical switch statement
everywhere we need to validate compatibility.
You may notice many places where the flags are seemingly hard-coded to
QEMU_PCI_CONNECT_HOTPLUGGABLE | QEMU_PCI_CONNECT_TYPE_PCI
This is currently the correct value for all PCI devices, and in the
future will be the default, with small bits of code added to change to
the flags for the few devices which are the exceptions to this rule.
Finally, there are a few places with "FIXME" comments. Note that these
aren't indicating places that are broken according to the currently
supported devices, they are places that will need fixing when support
for new PCI controller models is added.
To assure that there was no regression in the auto-allocation of PCI
addresses or auto-creation of integrated pci-root, ide, and usb
controllers, a new test case (pci-bridge-many-disks) has been added to
both the qemuxml2argv and qemuxml2xml tests. This new test defines a
domain with several dozen virtio disks but no pci-root or
pci-bridges. The .args file of the new test case was created using
libvirt sources from before this patch, and the test still passes
after this patch has been applied.
<hyperv>
<spinlocks state='off'/>
</hyperv>
results in:
error: XML error: missing HyperV spinlock retry count
Don't require retries when state is off and use virXPathUInt
instead of virXPathString to simplify parsing.
https://bugzilla.redhat.com/show_bug.cgi?id=784836#c19
Since 0d70656afd, it starts to access the sysfs files to build
the qemu command line (by virSCSIDeviceGetSgName, which is to find
out the scsi generic device name by adpater🚌target:unit), there
is no way to work around, qemu wants to see the scsi generic device
like "/dev/sg6" anyway.
And there might be other places which need to access sysfs files
when building qemu command line in future.
Instead of increasing the arguments of qemuBuildCommandLine, this
introduces a new callback for qemuBuildCommandLine, and thus tests
can register their own callbacks for sysfs test input files accessing.
* src/qemu/qemu_command.h: (New callback struct
qemuBuildCommandLineCallbacks;
extern buildCommandLineCallbacks)
* src/qemu/qemu_command.c: (wire up the callback struct)
* src/qemu/qemu_driver.c: (Use the new syntax of qemuBuildCommandLine)
* src/qemu/qemu_hotplug.c: Likewise
* src/qemu/qemu_process.c: Likewise
* tests/testutilsqemu.[ch]: (Helper testSCSIDeviceGetSgName;
callback struct testCallbacks;)
* tests/qemuxml2argvtest.c: (Use testCallbacks)
* src/tests/qemuxmlnstest.c: (Like above)
If the <sysinfo> system table 'uuid' field is improperly formatted,
then qemu will fail to start the guest with the error:
virsh start dom
error: Failed to start domain dom
error: internal error process exited while connecting to monitor: Invalid SMBIOS UUID string
This was because the parsing rules were lax with respect to allowing extraneous
spaces and dashes in the provided UUID. As long as there were 32 hexavalues
that matched the UUID for the domain the string was accepted. However startup
failed because the string format wasn't correct. This patch will adjust the
string format so that when it's presented to the driver it's in the expected
format.
Added a test for uuid comparison within sysinfo.
QEMU introduced "discard" option for drive since commit a9384aff53,
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
This patch exposes the support in libvirt.
QEMU supported "discard" for "-drive" since v1.5.0-rc0:
% git tag --contains a9384aff53
contains
v1.5.0-rc0
v1.5.0-rc1
So this only detects the capability bit using virQEMUCapsProbeQMPCommandLine.
Adding a VNC WebSocket support for QEMU driver. This functionality is
in upstream qemu from commit described as v1.3.0-982-g7536ee4, so the
capability is being recognized based on QEMU version for now.
QEMU introduced command line "-mem-merge=on|off" (defaults to on) to
enable/disable the memory merge (KSM) at guest startup. This exposes
it by new XML:
<memoryBacking>
<nosharepages/>
</memoryBacking>
The XML tag is same with what we used internally for old RHEL.
Files ending in -invalid.xml are expected to violate the
XML schema check. The RBD file does not so must have a
different filename.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The QEMU command line syntax for RBD disks is
file=rbd:pool/image:opt1=val1:opt2=val2...
There is no way to escape the ':' if it appears in the
pool or image name. Thus it must be explicitly forbidden
if it occurs in the libvirt XML. People are known to
be abusing the lack of escaping in current libvirt to
pass arbitrary args to QEMU.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Except the scsi host device's controller is "lsilogic", mapping
between the libvirt attributes and scsi-generic properties is:
libvirt qemu
-----------------------------------------
controller bus ($libvirt_controller.0)
bus channel
target scsi-id
unit lun
For scsi host device with "lsilogic" controller, the mapping is:
('target (libvirt)' must be 0, as it's not used; 'unit (libvirt)
must <= 7).
libvirt qemu
----------------------------------------------------------
controller && bus bus ($libvirt_controller.$libvirt_bus)
unit scsi-id
It's not good to hardcode/hard-check limits of these attributes,
and even worse, these limits are not documented, one has to find
out by either testing or reading the qemu code, I'm looking forward
to qemu expose limits like these one day). For example, exposing
"max_target", "max_lun" for megasas:
static const struct SCSIBusInfo megasas_scsi_info = {
.tcq = true,
.max_target = MFI_MAX_LD,
.max_lun = 255,
.transfer_data = megasas_xfer_complete,
.get_sg_list = megasas_get_sg_list,
.complete = megasas_command_complete,
.cancel = megasas_command_cancel,
};
Example of the qemu command line (lsilogic controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,scsi-id=8,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Example of the qemu command line (virtio-scsi controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,channel=0,scsi-id=128,lun=128,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
Print an error instead of crashing when a TPM device without
a backend is specified.
Add a test for tpm device with no backend, which should fail
with a parse error.
https://bugzilla.redhat.com/show_bug.cgi?id=961252
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
In the past we automatically added a USB controller and assigned
it a PCI address (0:0:1.2) even on machines without a PCI bus.
This didn't break machines with no PCI bus because the command
line for it is just '-usb', with no mention of the PCI bus.
The implicit IDE controller (reserved address 0:0:1.1) has
no command line at all.
Commit b33eb0dc removed the ability to reserve PCI addresses
on machines without a PCI bus. This made them stop working,
since there would always be the implicit USB controller.
Skip the reservation of addresses for these controllers when
there is no PCI bus, instead of failing.
The device option for vfio-pci is nearly identical to that for
pci-assign - only the configfd parameter isn't supported (or needed).
Checking for presence of the bootindex parameter is done separately
from constructing the commandline, similar to how it is done for
pci-assign.
This patch contains tests to check for proper commandline
construction. It also includes tests for parser-formatter-parser
roundtrips (xml2xml), because those tests use the same data files, and
would have failed had they been included before now.
qemu: xml/args tests for VFIO hostdev and <interface type='hostdev'/>
These should be squashed in with the patch that adds commandline
handling of vfio (they would fail at any earlier time).
When all usb controllers connected to the same bus have <master
startport='x'/> specified, none of them have 'id=usb' assigned and
thus qemu fails due to invalid masterport specification (we use 'usb'
for that purpose). Adding a check that at least one of the
controllers is specified without <master startport='x'/> and in case
this happens, error out due to invalid configuration.