If there are multiple devices assigned to the different functions of a
single PCI slot, they will not work properly if the device at function
0 doesn't have its "multi" attribute turned on, so it makes sense for
libvirt to turn it on during PCI address assignment. Setting multi
then assures that the new setting is stored in the config (so it will
be used next time the domain is started), preventing any potential
problems in the case that a future change in the configuration
eliminates the devices on all non-0 functions (multi will still be set
for function 0 even though it is the only function in use on the slot,
which has no useful purpose, but also doesn't cause any problems).
(NB: If we were to instead just decide on the setting for
multifunction at runtime, a later removal of the non-0 functions of a
slot would result in a silent change in the guest ABI for the
remaining device on function 0 (although it may seem like an
inconsequential guest ABI change, it *is* a guest ABI change to turn
off the multi bit).)
virtio-pci is the way forward for aarch64 guests: it's faster
and less alien to people coming from other architectures.
Now that guest support is finally getting there (Fedora 24,
CentOS 7.3, Ubuntu 16.04 and Debian testing all support
virtio-pci out of the box), we'd like to start using it by
default instead of virtio-mmio.
Users and applications can already opt-in by explicitly using
<address type='pci'/>
inside the relevant elements, but that's kind of cumbersome and
requires all users and management applications to adapt, which
we'd really like to avoid.
What we can do instead is use virtio-mmio only if the guest
already has at least one virtio-mmio device, and use virtio-pci
in all other situations.
That means existing virtio-mmio guests will keep using the old
addressing scheme, and new guests will automatically be created
using virtio-pci instead. Users can still override the default
in either direction.
Existing tests such as aarch64-aavmf-virtio-mmio and
aarch64-virtio-pci-default already cover all possible
scenarios, so no additions to the test suites are necessary.
Modify _virDomainBlockIoTuneInfo and rng schema to support the group_name
option for iotune throttling. Document the new value.
Signed-off-by: John Ferlan <jferlan@redhat.com>
With the QEMU components in place, provide the XML parsing to
invoke that code when given the following XML snippet:
<hostdev mode='subsystem' type='scsi_host'>
<source protocol='vhost' wwpn='naa.501234567890abcd'/>
</hostdev>
An optional address element can be specified within the hostdev
(pick CCW or PCI as necessary):
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0625'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
Add basic vhost-scsi tests which were cloned from hostdev-scsi-virtio-scsi
in both xml2argv and xml2xml. Added ones for both vhost-scsi-ccw and
vhost-scsi-pci since the syntaxes are slightly different between them.
Also adjusted the docs to describe the changes.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
For machinetypes with a pci-root bus (all legacy PCI), libvirt will
make a "fake" reservation for one extra slot prior to assigning
addresses to unaddressed PCI endpoint devices in the domain. This will
trigger auto-adding of a pci-bridge for the final device to be
assigned an address *if that device would have otherwise instead been
the last device on the last available pci-bridge*; thus it assures
that there will always be at least one slot left open in the domain's
bus topology for expansion (which is important both for hotplug (since
a new pci-bridge can't be added while the guest is running) as well as
for offline additions to the config (since adding a new device might
otherwise in some cases require re-addressing existing devices, which
we want to avoid)).
It's important to note that for the above case (legacy PCI), we must
check for the special case of all slots on all buses being occupied
*prior to assigning any addresses*, and avoid attempting to reserve
the extra address in that case, because there is no free address in
the existing topology, so no place to auto-add a pci-bridge for
expansion (i.e. it would always fail anyway). Since that condition can
only be reached by manual intervention, this is acceptable.
For machinetypes with pcie-root (Q35, aarch64 virt), libvirt's
methodology for automatically expanding the bus topology is different
- pcie-root-ports are plugged into slots (soon to be functions) of
pcie-root as needed, and the new endpoint devices are assigned to the
single slot in each pcie-root-port. This is done so that the devices
are, by default, hotpluggable (the slots of pcie-root don't support
hotplug, but the single slot of the pcie-root-port does). Since
pcie-root-ports can only be plugged into pcie-root, and we don't
auto-assign endpoint devices to the pcie-root slots, this means
topology expansion doesn't compete with endpoint devices for slots, so
we don't need to worry about checking for all "useful" slots being
free *prior* to assigning addresses to new endpoint devices - as a
matter of fact, if we attempt to reserve the open slots before the
used slots, it can lead to errors.
Instead this patch just reserves one slot for a "future potential"
PCIe device after doing the assignment for actual devices, but only
if the only PCI controller defined prior to starting address
assignment was pcie-root, and only if we auto-added at least one PCI
controller during address assignment. This assures two things:
1) that reserving the open slots will only be done when the domain is
initially defined, never at any time after, and
2) that if the user understands enough about PCI controllers that they
are adding them manually, that we don't mess up their plan by
adding extras - if they know enough to add one pcie-root-port, or
to manually assign addresses such that no pcie-root-ports are
needed, they know enough to add extra pcie-root-ports if they want
them (this could be called the "libguestfs clause", since
libguestfs needs to be able to create domains with as few
devices/controllers as possible).
This is set to reserve a single free port for now, but could be
increased in the future if public sentiment goes in that direction
(it's easy to increase later, but essentially impossible to decrease)
Real Q35 hardware has an ICH9 chip that includes several integrated
devices at particular addresses (see the file docs/q35-chipset.cfg in
the qemu source). libvirt already attempts to put the first two sets
of ich9 USB2 controllers it finds at 00:1D.* and 00:1A.* to match the
real hardware. This patch does the same for the ich9 "HD audio"
device.
The main inspiration for this patch is that currently the *only*
device in a reasonable "workstation" type virtual machine config that
requires a legacy PCI slot is the audio device, Without this patch,
the standard Q35 machine created by virt-manager will have a
dmi-to-pci-bridge and a pci-bridge just for the sound device; with the
patch (and if you change the sound device model from the default
"ich6" to "ich9"), the machine definition constructed by virt-manager
has absolutely no legacy PCI controllers - any legacy PCI devices
(e.g. video and sound) are on pcie-root as integrated devices.
Previously we added a set of EHCI+UHCI controllers to Q35 machines to
mimic real hardware as closely as possible, but recent discussions
have pointed out that the nec-usb-xhci (USB3) controller is much more
virtualization-friendly (uses less CPU), so this patch switches the
default for Q35 machinetypes to add an XHCI instead (if it's
supported, which it of course *will* be).
Since none of the existing test cases left out USB controllers in the
input XML, a new Q35 test case was added which has *no* devices, so
ends up with only the defaults always put in by qemu, plus those added
by libvirt.
Previously libvirt would only add pci-bridge devices automatically
when an address was requested for a device that required a legacy PCI
slot and none was available. This patch expands that support to
dmi-to-pci-bridge (which is needed in order to add a pci-bridge on a
machine with a pcie-root), and pcie-root-port (which is needed to add
a hotpluggable PCIe device). It does *not* automatically add
pcie-switch-upstream-ports or pcie-switch-downstream-ports (and
currently there are no plans for that).
Given the existing code to auto-add pci-bridge devices, automatically
adding pcie-root-ports is fairly straightforward. The
dmi-to-pci-bridge support is a bit tricky though, for a few reasons:
1) Although the only reason to add a dmi-to-pci-bridge is so that
there is a reasonable place to plug in a pci-bridge controller,
most of the time it's not the presence of a pci-bridge *in the
config* that triggers the requirement to add a dmi-to-pci-bridge.
Rather, it is the presence of a legacy-PCI device in the config,
which triggers auto-add of a pci-bridge, which triggers auto-add of
a dmi-to-pci-bridge (this is handled in
virDomainPCIAddressSetGrow() - if there's a request to add a
pci-bridge we'll check if there is a suitable bus to plug it into;
if not, we first add a dmi-to-pci-bridge).
2) Once there is already a single dmi-to-pci-bridge on the system,
there won't be a need for any more, even if it's full, as long as
there is a pci-bridge with an open slot - you can also plug
pci-bridges into existing pci-bridges. So we have to make sure we
don't add a dmi-to-pci-bridge unless there aren't any
dmi-to-pci-bridges *or* any pci-bridges.
3) Although it is strongly discouraged, it is legal for a pci-bridge
to be directly plugged into pcie-root, and we don't want to
auto-add a dmi-to-pci-bridge if there is already a pci-bridge
that's been forced directly into pcie-root.
Although libvirt will now automatically create a dmi-to-pci-bridge
when it's needed, the code still remains for now that forces a
dmi-to-pci-bridge on all domains with pcie-root (in
qemuDomainDefAddDefaultDevices()). That will be removed in a future
patch.
For now, the pcie-root-ports are added one to a slot, which is a bit
wasteful and means it will fail after 31 total PCIe devices (30 if
there are also some PCI devices), but helps keep the changeset down
for this patch. A future patch will have 8 pcie-root-ports sharing the
functions on a single slot.
The nec-usb-xhci device (which is a USB3 controller) has always
presented itself as a PCI device when plugged into a legacy PCI slot,
and a PCIe device when plugged into a PCIe slot, but libvirt has
always auto-assigned it to a legacy PCI slot.
This patch changes that behavior to auto-assign to a PCIe slot on
systems that have pcie-root (e.g. Q35 and aarch64/virt).
Since we don't yet auto-create pcie-*-port controllers on demand, this
means a config with an nec-xhci USB controller that has no PCI address
assigned will also need to have an otherwise-unused pcie-*-port
controller specified:
<controller type='pci' model='pcie-root-port'/>
<controller type='usb' model='nec-xhci'/>
(this assumes there is an otherwise-unused slot on pcie-root to accept
the pcie-root-port)
libvirt previously assigned nearly all devices to a "hotpluggable"
legacy PCI slot even on machines with a PCIe root bus (and even though
most such machines don't even support hotplug on legacy PCI slots!)
Forcing all devices onto legacy PCI slots means that the domain will
need a dmi-to-pci-bridge (to convert from PCIe to legacy PCI) and a
pci-bridge (to provide hotpluggable legacy PCI slots which, again,
usually aren't hotpluggable anyway).
To help reduce the need for these legacy controllers, this patch tries
to assign virtio-1.0-capable devices to PCIe slots whenever possible,
by setting appropriate connectFlags in
virDomainCalculateDevicePCIConnectFlags(). Happily, when that function
was written (just a few commits ago) it was created with a
"virtioFlags" argument, set by both of its callers, which is the
proper connectFlags to set for any virtio-*-pci device - depending on
the arch/machinetype of the domain, and whether or not the qemu binary
supports virtio-1.0, that flag will have either been set to PCI or
PCIe. This patch merely enables the functionality by setting the flags
for the device to whatever is in virtioFlags if the device is a
virtio-*-pci device.
NB: the first virtio video device will be placed directly on bus 0
slot 1 rather than on a pcie-root-port due to the override for primary
video devices in qemuDomainValidateDevicePCISlotsQ35(). Whether or not
to change that is a topic of discussion, but this patch doesn't change
that particular behavior.
NB2: since the slot must be hotpluggable, and pcie-root (the PCIe root
complex) does *not* support hotplug, this means that suitable
controllers must also be in the config (i.e. either pcie-root-port, or
pcie-downstream-port). For now, libvirt doesn't add those
automatically, so if you put virtio devices in a config for a qemu
that has PCIe-capable virtio devices, you'll need to add extra
pcie-root-ports yourself. That requirement will be eliminated in a
future patch, but for now, it's simple to do this:
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-root-port'/>
...
Partially Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1330024
The old ivshmem is deprecated in QEMU, so let's use the better
ivshmem-{plain,doorbell} variants instead.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add an optional "tls='yes|no'" attribute for a TCP chardev.
For QEMU, this will allow for disabling the host config setting of the
'chardev_tls' for a domain chardev channel by setting the value to "no" or
to attempt to use a host TLS environment when setting the value to "yes"
when the host config 'chardev_tls' setting is disabled, but a TLS environment
is configured via either the host config 'chardev_tls_x509_cert_dir' or
'default_tls_x509_cert_dir'
Signed-off-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The code is entirely correct, but it still managed to trip me
up when I first ran into it because I did not realize right away
that VIR_PCI_CONNECT_TYPES_ENDPOINT was not a single flag, but
rather a mask including both VIR_PCI_CONNECT_TYPE_PCI_DEVICE and
VIR_PCI_CONNECT_TYPE_PCIE_DEVICE.
In order to save the next distracted traveler in PCI Address Land
some time, document this fact with a comment. Add a test case for
the behavior as well.
If QEMU in question supports QMP, this capability is set if
QEMU_CAPS_DEVICE_QXL was set based on existence of "-device qxl". If
libvirt needs to parse *help*, because there is no QMP support, it
checks for existence of "-vga qxl", but it also parses output of
"-device ?" and sets QEMU_CAPS_DEVICE_QXL too.
Now that libvirt supports only QEMU that has "-device" implemented it's
safe to drop this capability and stop using it.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This patch simplifies QEMU capabilities for QXL video device. QEMU
exposes this device as *qxl-vga* and *qxl* and they are both the same
device with the same set of parameters, the only difference is that
*qxl-vga* includes VGA compatibility.
Based on QEMU code they are tied together so it's safe to check only for
presence of only one of them.
This patch also removes an invalid test case "video-qxl-sec-nodevice"
where there is only *qxl-vga* device and *qxl* device is not present.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If one of QEMU_CAPS_DEVICE_QXL_VGA or QEMU_CAPS_DEVICE_QXL is set the
other one will always be set as well because both devices are tied
together in QEMU.
The change of args files is caused by the presence of capability
QEMU_CAPS_DEVICE_VIDEO_PRIMARY which means it's safe to use
"-device qxl-vga" instead of "-vga qxl", see commit (e3f2686b) and
by the fact that if QEMU_CAPS_VGA_QXL is set QEMU_CAPS_DEVICE_QXL_VGA
and QEMU_CAPS_DEVICE_QXL would be set too (since we support only qemu
with "-device" option).
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
It was missing... Also since I'm using the soft link from qemuxml2xmloutdata
to the qemuxml2argvdata file, modify the output file to have the necessary
<address> elements plus the mouse and keyboard.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Most of QEMU's PCI display device models, such as:
libvirt video/model/@type QEMU -device
------------------------- ------------
cirrus cirrus-vga
vga VGA
qxl qxl-vga
virtio virtio-vga
come with a linear framebuffer (sometimes called "VGA compatibility
framebuffer"). This linear framebuffer lives in one of the PCI device's
MMIO BARs, and allows guest code (primarily: firmware drivers, and
non-accelerated OS drivers) to display graphics with direct memory access.
Due to architectural reasons on aarch64/KVM hosts, this kind of
framebuffer doesn't / can't work in
qemu-system-(arm|aarch64) -M virt
machines. Cache coherency issues guarantee a corrupted / unusable display.
The problem has been researched by several people, including kvm-arm
maintainers, and it's been decided that the best way (practically the only
way) to have boot time graphics for such guests is to consolidate on
QEMU's "virtio-gpu-pci" device.
>From <https://bugzilla.redhat.com/show_bug.cgi?id=1195176>, libvirt
supports
<devices>
<video>
<model type='virtio'/>
</video>
</devices>
but libvirt unconditionally maps @type='virtio' to QEMU's "virtio-vga"
device model. (See the qemuBuildDeviceVideoStr() function and the
"qemuDeviceVideo" enum impl.)
According to the above, this is not right for the "virt" machine type; the
qemu-system-(arm|aarch64) binaries don't even recognize the "virtio-vga"
device model (justifiedly). Whereas "virtio-gpu-pci", which is a pure
virtio device without a compatibility framebuffer, is available, and works
fine.
(The ArmVirtQemu ("AAVMF") platform of edk2 -- that is, the UEFI firmware
for "virt" -- supports "virtio-gpu-pci", as of upstream commit
3ef3209d3028. See
<https://tianocore.acgmultimedia.com/show_bug.cgi?id=66>.)
Override the default mapping of "virtio", from "virtio-vga" to
"virtio-gpu-pci", if qemuDomainMachineIsVirt() evaluates to true.
Cc: Andrea Bolognani <abologna@redhat.com>
Cc: Drew Jones <drjones@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Martin Kletzander <mkletzan@redhat.com>
Suggested-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1372901
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Martin Kletzander <mkletzan@redhat.com>
Add a new TLS X.509 certificate type - "chardev". This will handle the
creation of a TLS certificate capability (and possibly repository) for
properly configured character device TCP backends.
Unlike the vnc and spice there is no "listen" or "passwd" associated. The
credentials eventually will be handled via a libvirt secret provided to
a specific backend.
Make use of the default verify option as well.
Signed-off-by: John Ferlan <jferlan@redhat.com>
A bunch of cases were only being tested for WHEN_ACTIVE or
WHEN_INACTIVE. Use WHEN_BOTH for all except the very few that
actually require the existing setup.
Failure to parse a XML that was not supposed to fail would result into a
crash in the test suite as the vcpu bitmap would not be filled prior to
the active XML->XML test.
Skip formatting of the vcpu snippet in the fake status XML formatter in
such case to avoid the crash. The test would fail anyways.
We were requiring a USB port path in the schema, but not enforcing it.
Omitting the USB port would lead to libvirt formatting it as (null).
Such domain cannot be started and will disappear after libvirtd restart
(since it cannot parse back the XML).
Only format the port if it has been specified and mark it as optional
in the XML schema.
Commit id's '9bbf0d7e6' and '2552fec24' added some XML parsing tests
for a LUKS volume to use a 'passphrase' secret format. After commit,
this was deemed to be incorrect, so covert the various tests to use
the volume usage format where the 'usage' is the path to the volume
rather than a user defined name string.
Also, removed the qemuxml2argv-luks-disk-cipher.xml since it was
just a duplicate of qemuxml2argv-luks-disks.xml.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Status XML tests were done by prepending a constant string to an
existing XML. With the planned changes the header will depend on data
present in the definition rather than just on the data that was parsed.
The first dynamic element in the header will be the vcpu thread list.
Reuse and rename qemuXML2XMLPreFormatCallback for gathering the relevant
data when checking the active XML parsing and formating and pass the
bitmap to a newly crated header generator.
This is preferrable to -nographic which (in addition to disabling
graphics output) redirects the serial port to stdio and on OpenBIOS
enables the firmware's serial console.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For type='ethernet' interfaces only.
(This patch had been pushed earlier in
commit 0b4645a7e0, but was reverted in
commit 84d47a3cce because it had been
accidentally pushed during the freeze for release 2.0.0)
In order to use more common code and set up for a future type, modify the
encryption secret to allow the "usage" attribute or the "uuid" attribute
to define the secret. The "usage" in the case of a volume secret would be
the path to the volume as dictated by the backwards compatibility brought
on by virStorageGenerateQcowEncryption where it set up the usage field as
the vol->target.path and didn't allow someone to provide it. This carries
into virSecretObjListFindByUsageLocked which takes the secret usage attribute
value from from the domain disk definition and compares it against the
usage type from the secret definition. Since none of the code dealing
with qcow/qcow2 encryption secrets uses usage for lookup, it's a mostly
cosmetic change. The real usage comes in a future path where the encryption
is expanded to be a luks volume and the secret will allow definition of
the usage field.
This code will make use of the virSecretLookup{Parse|Format}Secret common code.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This option allows or disallows detection of zero-writes if it is set to
"on" or "off", respectively. It can be also set to "unmap" in which
case it will try discarding that part of image based on the value of the
"discard" option.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
VNC graphics already supports sockets but only via 'socket' attribute.
This patch coverts that attribute into listen type 'socket'.
For backward compatibility we need to handle listen type 'socket' and 'socket'
attribute properly to support old XMLs and new XMLs. If both are provided they
have to match, if only one of them is provided we need to be able to parse that
configuration too.
To not break migration back to old libvirt if the socket is provided by user we
need to generate migratable XML without the listen element and use only 'socket'
attribute.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Historically, we added heads=1 to videos, but for example for qxl, we
did not reflect that on the command line.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1283207
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Hand-entering indexes for 20 PCI controllers is not as tedious as
manually determining and entering their PCI addresses, but it's still
annoying, and the algorithm for determining the proper index is
incredibly simple (in all cases except one) - just pick the lowest
unused index.
The one exception is USB2 controllers because multiple controllers in
the same group have the same index. For these we look to see if 1) the
most recently added USB controller is also a USB2 controller, and 2)
the group *that* controller belongs to doesn't yet have a controller
of the exact model we're just now adding - if both are true, the new
controller gets the same index, but in all other cases we just assign
the lowest unused index.
With this patch in place and combined with the automatic PCI address
assignment, we can define a PCIe switch with several ports like this:
<controller type='pci' model='pcie-root-port'/>
<controller type='pci' model='pcie-switch-upstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
<controller type='pci' model='pcie-switch-downstream-port'/>
...
These will each get a unique index, and PCI addresses that connect
them together appropriately with no pesky numbers required.
Add a new element to <domain> XML:
<os>
<acpi>
<table type="slic">/path/to/acpi/table/file</table>
</acpi>
</os>
To supply a path to a SLIC (Software Licensing) ACPI
table blob.
https://bugzilla.redhat.com/show_bug.cgi?id=1327537
This test requests a read-only virtual FAT drive on the IDE bus.
Read-only IDE drives are unsupported, but libvirt only displays
the error if it has the QEMU_CAPS_DRIVE_READONLY capability.
Read-write FAT drives are also unsupported.
Rather than only assigning a PCI address when no address is given at
all, also do it when the config says that the address type is 'pci',
but it gives no address (virDeviceInfoPCIAddressWanted()).
There are also several places after parsing but prior to address
assignment where code previously expected that any info with address
type='pci' would have a *valid* PCI address, which isn't always the
case - now we check not only for type='pci', but also for a valid
address (virDeviceInfoPCIAddressPresent()).
The test case added in this patch was directly copied from Cole's patch titled:
qemu: Wire up address type=pci auto_allocate
Commit 55320c23 introduced a new test for VNC to test if
vnc_auto_unix_socket is set in qemu.conf, but forget to enable it in
qemuxml2argvtest.c.
This patch also moves the code in qemuxml2xmltest.c next to other VNC
tests and refactor the test so we also check the case for parsing active
XML.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This wires up qemuDomainAssignAddresses into the new
virDomainDefAssignAddressesCallback, so it's always triggered
via virDomainDefPostParse. We are essentially doing this already
with open coded calls sprinkled about.
qemu argv parse output changes slightly since previously it wasn't
hitting qemuDomainAssignAddresses.
The only case where the hardware capabilities influence the result
is when no <gic/> element was provided.
The test programs now ensure both that the correct GIC version is
picked in that case, and that hardware capabilities are not taken
into account when the user has already picked a GIC version.
Now that we choose the GIC version based on hardware features when
no <gic/> element has been provided, we need a way to fake the GIC
capabilities of the host.
Update the qemuxml2argv and qemuxml2xml tests to allow this.
Add the ability to add an 'iothread' to the controller which will be how
virtio-scsi-pci and virtio-scsi-ccw iothreads have been implemented in qemu.
Describe the new functionality and add tests to parse/validate that the
new attribute can be added.
Similarly to what commit 7140807917 did with some internal paths,
clear vnc socket paths that were generated by us. Having such path in
the definition can cause trouble when restoring the domain. The path is
generated to the per-domain directory that contains the domain ID.
However, that ID will be different upon restoration, so qemu won't be
able to create that socket because the directory will not be prepared.
To be able to migrate to older libvirt, skip formatting the socket path
in migratable XML if it was autogenerated. And mark it as autogenerated
if it already exists and we're parsing live XML.
Best viewed with '-C'.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1326270
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This controller provides a single PCIe port on a new root. It is
similar to pci-expander-bus, intended to provide a bus that can be
associated with a guest-identifiable NUMA node, but is for
machinetypes with PCIe rather than PCI (e.g. q35-based machinetypes).
Aside from PCIe vs. PCI, the other main difference is that a
pci-expander-bus has a companion pci-bridge that is automatically
attached along with it, but pcie-expander-bus has only a single port,
and that port will only connect to a pcie-root-port, or to a
pcie-switch-upstream-port. In order for the bus to be of any use in
the guest, it must have either a pcie-root-port or a
pcie-switch-upstream-port attached (and one or more
pcie-switch-downstream-ports attached to the
pcie-switch-upstream-port).
This is a standard PCI root bus (not a bridge) that can be added to a
440fx-based domain. Although it uses a PCI slot, this is *not* how it
is connected into the PCI bus hierarchy, but is only used for
control. Each pci-expander-bus provides 32 slots (0-31) that can
accept hotplug of standard PCI devices.
The usefulness of pci-expander-bus relative to a pci-bridge is that
the NUMA node of the bus can be specified with the <node> subelement
of <target>. This gives guest-side visibility to the NUMA node of
attached devices (presuming that management apps only assign a device
to a bus that has a NUMA node number matching the node number of the
device on the host).
Each pci-expander-bus also has a "busNr" attribute. The expander-bus
itself will take the busNr specified, and all buses that are connected
to this bus (including the pci-bridge that is automatically added to
any expander bus of model "pxb" (see the next commit)) will use
busNr+1, busNr+2, etc, and the pci-root (or the expander-bus with next
lower busNr) will use bus numbers lower than busNr.
After removing qemuBuildCommandLineCallbacks, testutilsqemu.h does not
need to include qemu_command.h.
Include just qemu_conf.h here and qemu_domain_address.h in files that
need it.
Commit dc98a5bc refactored the code a lot and forget about checking if
listen attribute is specified. This ensures that listen attribute and
first listen element are compared only if both exist.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Add Spice graphics gl attribute. qemu 2.6 should have -spice gl=on argument to
enable opengl rendering context (patches on the ML). This is necessary to
actually enable virgl rendering.
Add a qemuxml2argv test for virtio-gpu + spice with virgl.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Per-domain directories were introduced in order to be able to
completely separate security labels for each domain (commit
f1f68ca334). However when the domain
name is long (let's say a ridiculous 110 characters), we cannot
connect to the monitor socket because on length of UNIX socket address
is limited. In order to get around this, let's shorten it in similar
fashion and in order to avoid conflicts, throw in an ID there as well.
Also save that into the status XML and load the old status XMLs
properly (to clean up after older domains). That way we can change it
in the future.
The shortening can be seen in qemuxml2argv tests, for example in the
hugepages-pages2 case.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This does nothing more than adding the new device and capability.
The device is present since QEMU 2.6.0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Test all kinds of scenarios, including guests asking for GIC but
failing to specify a version, guests specifying an invalid version
and guests trying to use GIC with non-virt or even non-ARM machines.
Unify the naming to prepare for new test cases that will be added
later on.
Convert a couple of output XML files for the qemuxml2xml test to
symlinks while at it, since they were identical to the corresponding
input XML files anyways.
Moreover, since we're only interested in testing GIC support here,
simplify XML files by getting rid of the unrelevant bits.
We use the PreFormat callback for this. Many test cases need to be extended
to pass in proper qemuCaps flags so AssignAddresses doesn't throw errors.
One test case (pcie-root-port-too-many) is dropped, since it was meant
only for checking an error condition in qemuxml2argv, and one we add in
AssignAddresses it errors here too.
Long term I think AssignAddresses should be handled in qemu's PostParse
callback, but that's not entirely straightforward. Handling it here
means we can get the test suite churn over with.
When we unconditionally enable QEMU_CAPS_DEVICE, these tests need
some massaging, so do it ahead of time to not mix it in with the
big test refresh.
- minimal-s390 is not a real world working config, so drop it
- disk-usb was testing for an old code path that will be removed.
instead use it to test lack of USB disk support, and rename it
to disk-usb-nosupport. Switch xml2xml to use disk-usb-device for
input.
- cputune-numatune was needlessly using q35, switch it to an older
machine type
Most qemuxml2xml tests expect that the input XML is unchanged after
parsing. This is unlike 99% of new qemu configs in the wild, which after
initial parsing end up with stable PCI device addresses. The xml2xml bit
doesn't currently hit that code path though, so most XML testing indeed
does not change.
Future patches will add that PCI address bits, which means most test cases
will have different output. So let's do away with the hardcoded same vs
different test split, and always track a separate output file. Tests can
still have same input and output, it just necessitates 2 separate XML files.
The virDomainObjFormat and virDomainSaveStatus methods
both call into virDomainDefFormat, so should be providing
a non-NULL virCapsPtr instance.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Those tests are in qemuargv2xmltest and it makes sense to include them
also in qemuxml2xmltest and qemuxml2argvtest.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The real Q35 machine puts the first USB controller set (EHCI+(UHCIx4))
on bus 0 slot 0x1D, and the 2nd USB controller set on bus 0 slot 0x1A,
so let's attempt to make the virtual machine match that for
controllers with auto-assigned addresses when possible.
Three test cases were added to assure that the proper addresses are
assigned - one with a single set of unaddressed USB controllers, one
with 3 (to grab both preferred slots plus one more), and one with the
order of the controller definitions reordered, to assure that the
auto-assignment isn't mixed up by order.
Future changes will make some of these tests dependent on specific
QEMUCaps flags, so wire up the basic handling. Flags will be added
in future patches.
For the standard active/inactive XML testing, if we leave the file loading
up to the generic XML2XML infrastructure, we get the benefit of
VIR_TEST_REGENERATE_OUTPUT, at the price of a few more disk reads. Seems
worth it.
This patch enable regeneration of expected output file for
virTestDifferenceFull. It also introduces new
virTestDifferenceFullNoRegenerate function for special cases, where we
don't want to regenerate output.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
To be used by the family of virtio input devices:
<input type='mouse' bus='virtio'/>
<input type='tablet' bus='virtio'/>
<input type='keyboard' bus='virtio'/>
https://bugzilla.redhat.com/show_bug.cgi?id=1231114
Check if virtio-gpu provides virgl option, and add qemu command line
formatter.
It is enabled with the existing accel3d attribute:
<model type='virtio' heads='1'>
<acceleration accel3d='yes'/>
</model>
Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
qemu 2.5 provides virtio video device. It can be used with -device
virtio-vga for primary devices, or -device virtio-gpu for non-vga
devices. However, only the primary device (VGA) is supported with this
patch.
Reference:
https://bugzilla.redhat.com/show_bug.cgi?id=1195176
Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
'model' attribute was added to a panic device but only one panic
device is allowed. This patch changes panic device presence
from 'optional' to 'zeroOrMore'.
Panic device type used depends on 'model' attribute.
If no model is specified then device type depends on hypervisor
and guest arch. 'pseries' model is used for pSeries guest and
'isa' model is used in other cases.
XML:
<devices>
<panic model='hyperv'/>
</devices>
QEMU command line:
qemu -cpu <cpu_model>,hv_crash
USB controllers can share the same 'index' which indicates, that there
is some sort of master-companion relationship. Reorder the controllers
in XML in to place the master controller before its companions. This is
required by QEMU to not fail with error message:
error: internal error: process exited while connecting to monitor:
2015-10-26T16:25:17.630265Z qemu-system-x86_64:
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6:
USB bus 'usb.0' not found
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1166452
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
As of QEMU 0.10.0, the -drive cache option stopped using
the on/off value names, so the QEMU driver can assume
use of the new value names.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As of QEMU 0.9.1 the -drive argument can be used to configure
all disks, so the QEMU driver can assume it is always available
and drop support for -hda/-cdrom/etc.
Many of the tests need updating because a great many were
running without CAPS_DRIVE set, so using the -hda legacy
syntax.
Fixing the tests uncovered a bug in the argv -> xml
convertor which failed to handle disk with if=floppy.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The -no-reboot arg was added in QEMU 0.9.0, so the QEMU driver
can now assume it is always present.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As of QEMU 0.9.0 the -vnc option accepts a ':' to separate port
from listen address, so the QEMU driver can assume that support
for listen addresses is always available.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The flag was used only for formatting the XML and once the parser and
formatter flags were split in 0ecd685109
it doesn't make sense any more to have it.
Two utility functions are introduced for proper initialization and
cleanup of the driver.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1260846
Introduced by 8fedbbdb, if we parse an unordered NUMA cell, will
get a segfault. This is because of a check for overlapping @cpus
sets we have there. However, since the array to hold guest NUMA
cells is allocated upfront and therefore it contains all zeros,
an out of order cell will break our assumption that cell IDs have
increasing character. At this point we try to access yet NULL
bitmap and therefore segfault.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Adds a new interface type using UDP sockets, this seems only applicable
to QEMU but have edited tree-wide to support the new interface type.
The interface type required the addition of a "localaddr" (local
address), this then maps into the following xml and qemu call.
<interface type='udp'>
<mac address='52:54:00:5c:67:56'/>
<source address='127.0.0.1' port='11112'>
<local address='127.0.0.1' port='22222'/>
</source>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</interface>
QEMU call:
-net socket,udp=127.0.0.1:11112,localaddr=127.0.0.1:22222
Notice the xml "local" entry becomes the "localaddr" for the qemu call.
reference:
http://lists.gnu.org/archive/html/qemu-devel/2011-11/msg00629.html
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The numad hint stored in priv->autoNodeset is information that gets lost
during daemon restart. And because we would like to use that
information in the future, we also need to save it in the status XML.
For the sake of tests, we need to initialize nnumaCell_max to some
value, so that the restoration doesn't fail in our test suite. There is
no need to fill in the actual numa cell data since the recalculating
function virCapabilitiesGetCpusForNodemask() will not fail, it will just
skip filling the data in the bitmap which we don't use in tests anyway.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This controller can be connected only to a port on a
pcie-switch-upstream-port. It provides a single hotpluggable port that
will accept any PCI or PCIe device, as well as any device requiring a
pcie-*-port (the only current example of such a device is the
pcie-switch-upstream-port).
This controller can be connected only to a pcie-root-port or a
pcie-switch-downstream-port (which will be added in a later patch),
which is the reason for the new connect type
VIR_PCI_CONNECT_TYPE_PCIE_PORT. A pcie-switch-upstream-port provides
32 ports (slot=0 to slot=31) on the downstream side, which can only
have pci controllers of model "pcie-switch-downstream-port" plugged
into them, which is the reason for the other new connect type
VIR_PCI_CONNECT_TYPE_PCIE_SWITCH.
This is backed by the qemu device ioh3420.
chassis and port from the <target> subelement are used to store/set the
respective qemu device options for the ioh3420. Currently, chassis is
set to be the index of the controller, and port is set to
"(slot << 3) + function" (per suggestion from Alex Williamson).
This controller can be connected (at domain startup time only - not
hotpluggable) only to a port on the pcie root complex ("pcie-root" in
libvirt config), hence the new connect type
VIR_PCI_CONNECT_TYPE_PCIE_ROOT. It provides a hotpluggable port that
will accept any PCI or PCIe device.
New attributes must be added to the controller <target> subelement for
this - chassis and port are guest-visible option values that will be
set by libvirt with values derived from the controller's index and pci
address information.
https://bugzilla.redhat.com/show_bug.cgi?id=1235116
According to our XML definition, zero is as valid as any other value.
Mainly because it should be kernel-agnostic.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Defining a domain with a SCSI disk attached via a hostdev
tag and a source address unit value longer than two digits
causes an error when editing the domain with virsh edit,
even if no changes are made to the domain definition.
The error suggests invalid XML, somewhere:
# virsh edit lmb_guest
error: XML document failed to validate against schema:
Unable to validate doc against /usr/local/share/libvirt/schemas/domain.rng
Extra element devices in interleave
Element domain failed to validate content
The virt-xml-validate tool fails with a similar error:
# virt-xml-validate lmb_guest.xml
Relax-NG validity error : Extra element devices in interleave
lmb_guest.xml:17: element devices: Relax-NG validity error :
Element domain failed to validate content
lmb_guest.xml fails to validate
The hostdev tag requires a source address to be specified,
which includes bus, target, and unit address attributes.
According to the SCSI Architecture Model spec (section
4.9 of SAM-2), a LUN address is 64 bits and thus could be
up to 20 decimal digits long. Unfortunately, the XML
schema limits this string to just two digits. Similarly,
the target field can be up to 32 bits in length, which
would be 10 decimal digits.
# lsscsi -xx
[0:0:19:0x4022401100000000] disk IBM 2107900 3.44 /dev/sda
# lsscsi
[0:0:19:1074872354]disk IBM 2107900 3.44 /dev/sda
# cat lmb_guest.xml
<domain type='kvm'>
<name>lmb_guest</name>
<memory unit='MiB'>1024</memory>
...trimmed...
<devices>
<controller type='scsi' model='virtio-scsi' index='0'/>
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='19' unit='1074872354'/>
</source>
</hostdev>
...trimmed...
Since the reference unit and target fields are used in
several places in the XML schema, create a separate one
specific for SCSI Logical Units that will permit the
greater length. This permits both the validation utility
and the virsh edit command to succeed when a hostdev
tag is included.
Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1220527
This type of information defines attributes of a system
baseboard. With one exception: board type is yet not implemented
in qemu so it's not introduced here either.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We have been formatting the first serial device also
as a console device, but only if there were no other consoles.
If there is a <serial> device present in the XML, but no serial
<console>, or if there isn't any <console> at all but the domain
definition hasn't gone through a parse->format->parse round-trip,
the <console> device would not be formatted.
Change the code to always add the stub device for the first
serial device.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1089914
The guest firmware provides the same functionality as the pvpanic
device, and the relevant element should always be present in the
domain XML to reflect this fact, so add it after parsing the
definition if it wasn't there already.
The guest firmware provides the same functionality as the pvpanic
device, which is not available in QEMU on pSeries, so the domain
XML should be allowed to contain the <panic> element.
On the other hand, unlike the pvpanic device, the guest firmware
can't be configured, so report an error if an address has been
provided in the XML.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1182388
https://bugzilla.redhat.com/show_bug.cgi?id=998813
Like usb-serial, the pci-serial device allows a serial device to be
attached to PCI bus. An example XML looks like this:
<serial type='dev'>
<source path='/dev/ttyS2'/>
<target type='pci-serial' port='0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</serial>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
My commit 747761a79 (v1.2.15 only) dropped this bit of logic when filling
in a default arch in the XML:
- /* First try to find one matching host arch */
- for (i = 0; i < caps->nguests; i++) {
- if (caps->guests[i]->ostype == ostype) {
- for (j = 0; j < caps->guests[i]->arch.ndomains; j++) {
- if (caps->guests[i]->arch.domains[j]->type == domain &&
- caps->guests[i]->arch.id == caps->host.arch)
- return caps->guests[i]->arch.id;
- }
- }
- }
That attempt to match host.arch is important, otherwise we end up
defaulting to i686 on x86_64 host for KVM, which is not intended.
Duplicate it in the centralized CapsLookup function.
Additionally add some testcases that would have caught this.
https://bugzilla.redhat.com/show_bug.cgi?id=1219191
My commit 7b9de914 added some aarch64 CPU test cases. I wanted to test
two different code paths but inadvertently added two of the same test
cases.
The second code path (using <cpu><model>host</model</cpu>) isn't easily
exercised via the qemu tests anyways, I'll need to look elsewhere.
Regardless, remove the redundant tests for now
Add 'thread_id' to the virDomainIOThreadIDDef as a means to store the
'thread_id' as returned from the live qemu monitor data.
Remove the iothreadpids list from _qemuDomainObjPrivate and replace with
the new iothreadids 'thread_id' element.
Rather than use the default numbering scheme of 1..number of iothreads
defined for the domain, use the iothreadid's list for the iothread_id
Since iothreadids list keeps track of the iothread_id's, these are
now used in place of the many places where a for loop would "know"
that the ID was "+ 1" from the array element.
The new tests ensure usage of the <iothreadid> values for an exact number
of iothreads and the usage of a smaller number of <iothreadid> values than
iothreads that exist (and usage of the default numbering scheme).
- Make sure aarch64 host-passthrough works correctly
- Make sure libvirt doesn't choke on cpu model=host, which is what
virt-install/virt-manager were incorrectly specifying up until recently.
This needs to specified in way too many places for a simple validation
check. The ostype/arch/virttype validation checks later in
DomainDefParseXML should catch most of the cases that this was covering.
The <inbound/> element to <bandwidth/> has several attributes from
which two are mandatory. Well, from two at least one has to be
present: @average or @floor or both. Instead of inventing crazy RNG
schema, let's make all the attributes optional there and rely on our
parsing code to correctly handle the situation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Recently we've fixed a bug where the status XML could not be parsed as
the parser used absolute path XPath queries. This test enhancement tests
all XML files used in the qemu-xml-2-xml test as a part of a status XML
snippet to see whether they are parsed correctly. The status XML-2-XML is
currently tested in 223 cases with this patch.
To allow adding more tests, refactor the XML-2-XML test so that the
files are not reloaded always and clarify the control flow.
Result of this changes is that the active and inactive portions of the
XML are tested in separate steps rather than one test step.
Add support to start qemu instance with 'pc-dimm' device. Thanks to the
refactors we are able to reuse the existing function to determine the
parameters.
To enable memory hotplug the maximum memory size and slot count need to
be specified. As qemu supports now other units than mebibytes when
specifying memory, use the new interface in this case.
Midonet is an opensource virtual networking that over lays the IP
network between hypervisors. Currently, such networks can be made
with the openvswitch virtualport type.
This patch, defines the schema and documentation that will serve
as basis for the follow up patches that will add support to libvirt
for using Midonet virtual ports for its interfaces. The schema
definition requires that the port profile expresses its interfaceid
as part of the port profile. For that reason, this is part of the
patch too.
Signed-off-by: Antoni Segura Puimedon <toni+libvirt@midokura.com>
All the devices we have format their address as its last sub-element, so
let's change memballoon to follow suit. Also adjust RNG to allow any
order of them so 'virsh edit' doesn't shout at us.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Now that the size of guest's memory can be inferred from the NUMA
configuration (if present) make it optional to specify <memory>
explicitly.
To make sure that memory is specified add a check that some form of
memory size was specified. One side effect of this change is that it is
no longer possible to specify 0KiB as memory size for the VM, but I
don't think it would be any useful to do so. (I can imagine embedded
systems without memory, just registers, but that's far from what libvirt
is usually doing).
Forbidding 0 memory for guests also fixes a few corner cases where 0 was
not interpreted correctly and caused failures. (Arguments for numad when
using automatic placement, size of the balloon). This fixes problems
described in https://bugzilla.redhat.com/show_bug.cgi?id=1161461
Test case changes are added to verify that the schema change and code
behave correctly.
We don't usually do tests purely for one change, but one change was
special because when users will migrate to OVMF/AAVMF, commit 18f9f69b
makes their lives easier by allowing them to interleave <type/> inside
<os/>. It would be nice of us to keep the possibility of them pasting
the loader and nvram elements wherever it is valid, hence this test.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit 3e4b783e fixed an issue with RNG schema where this address type
was missing, this commit adds a test for it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
We have something like pvpanic device. However, in some cases it does
not have any address assigned, in which case we produce this ugly XML
(still valid though):
<devices>
<emulator>/usr/bin/qemu</emulator>
...
<panic>
</panic>
</devices>
Lets format "<panic/>" instead.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since the APIs support just one element per namespace and while
modifying an element all duplicates would be removed, let's do this
right away in the post parse callback.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1190590
In order for QEMU vCPU (and other) threads to run with RT scheduler,
libvirt needs to take care of that so QEMU doesn't have to run privileged.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1178986
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In our RNG schema we do allow multiple (different) seclabels per-domain,
but don't allow this for devices, yet we neither have a check in our XML parser,
nor in a post-parse callback. In that case we should allow multiple
(different) seclabels for devices as well.
It is only supported for virtio adapters.
Silently drop it if it was specified for other models,
as is done for other virtio attributes.
Also mention this in the documentation.
https://bugzilla.redhat.com/show_bug.cgi?id=1147195
https://bugzilla.redhat.com/show_bug.cgi?id=1170492
In one of our previous commits (dc8b7ce7) we've done a functional
change even though it was intended as pure refactor. The problem is,
that the following XML:
<vcpu placement='static' current='2'>6</vcpu>
<cputune>
<emulatorpin cpuset='1-3'/>
</cputune>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
gets translated into this one:
<vcpu placement='auto' current='2'>6</vcpu>
<cputune>
<emulatorpin cpuset='1-3'/>
</cputune>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
We should not change the vcpu placement mode. Moreover, we're doing
something similar in case of emulatorpin and iothreadpin. If they were
set, but vcpu placement was auto, we've mistakenly removed them from
the domain XML even though we are able to set them independently on
vcpus.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There are some interface types (notably 'server' and 'client')
which instead of allowing the default set of elements and
attributes (like the rest do), try to enumerate only the elements
they know of. This way it's, however, easy to miss something. For
instance, the <address/> element was not mentioned at all. This
resulted in a strange behavior: when such interface was added
into XML, the address was automatically generated by parsing
code. Later, the formatted XML hasn't passed the RNG schema. This
became more visible once we've turned on the XML validation on
domain XML changes: appending an empty line at the end of
formatted XML (to trick virsh think the XML had changed) made
libvirt to refuse the very same XML it formatted.
Instead of trying to find each element and attribute we are
missing in the schema, lets just allow all the elements and
attributes like we're doing that for the rest of types. It's no
harm if the schema is wider than our parser allows.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The virDomainDefParse* and virDomainDefFormat* methods both
accept the VIR_DOMAIN_XML_* flags defined in the public API,
along with a set of other VIR_DOMAIN_XML_INTERNAL_* flags
defined in domain_conf.c.
This is seriously confusing & error prone for a number of
reasons:
- VIR_DOMAIN_XML_SECURE, VIR_DOMAIN_XML_MIGRATABLE and
VIR_DOMAIN_XML_UPDATE_CPU are only relevant for the
formatting operation
- Some of the VIR_DOMAIN_XML_INTERNAL_* flags only apply
to parse or to format, but not both.
This patch cleanly separates out the flags. There are two
distint VIR_DOMAIN_DEF_PARSE_* and VIR_DOMAIN_DEF_FORMAT_*
flags that are used by the corresponding methods. The
VIR_DOMAIN_XML_* flags received via public API calls must
be converted to the VIR_DOMAIN_DEF_FORMAT_* flags where
needed.
The various calls to virDomainDefParse which hardcoded the
use of the VIR_DOMAIN_XML_INACTIVE flag change to use the
VIR_DOMAIN_DEF_PARSE_INACTIVE flag.
QEMU supports feature specification with -cpu host and we just skip
using that. Since QEMU developers themselves would like to use this
feature, this patch modifies the code to work.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1178850
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
To track state of virtio channels this patch adds a new output-only
attribute called 'state' to the <target> element of virtio channels.
This will be later populated with the guest state of the channel.
To simplify looking for a problem instrument the XML comparator function
with possibility to print the filename of the failed/expected XML
output.
This is necessary as the VIR_TEST_DIFFERENT macro possibly tests two XML
files for the inactive/active state and the resulting error may not be
obvious.
Extending the iothread disk support from pci to pci and ccw.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch adds parsing/formatting code as well as documentation for
shared memory devices. This will currently be only accessible in QEMU
using it's ivshmem device, but is designed as generic as possible to
allow future expansion for other hypervisors.
In the devices section in the domain XML users may specify:
- For shmem device using a server:
<shmem name='shmem0'>
<server path='/tmp/socket-ivshmem0'/>
<size unit='M'>32</size>
<msi vectors='32' ioeventfd='on'/>
</shmem>
- For ivshmem device not using an ivshmem server:
<shmem name='shmem1'>
<size unit='M'>32</size>
</shmem>
Most of the configuration is made optional so it also allows
specifications like:
<shmem name='shmem1/>
<shmem name='shmem2'>
<server/>
</shmem>
Signed-off-by: Maxime Leroy <maxime.leroy@6wind.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add options for tuning segment offloading:
<driver>
<host csum='off' gso='off' tso4='off' tso6='off'
ecn='off' ufo='off'/>
<guest csum='off' tso4='off' tso6='off' ecn='off' ufo='off'/>
</driver>
which control the respective host_ and guest_ properties
of the virtio-net device.
For tuning the network, alternative devices
for creating tap and vhost devices can be specified via:
<backend tap='/dev/net/tun' vhost='/dev/net-vhost'/>
I noticed this with the recent iothread pinning code, but the
problem existed longer than that. The XML validation required
users to supply <cputune> children in a strict order, even though
there was no conceptual reason why they can't occur in any order.
docs/ changes best viewed with -w
* docs/schemas/domaincommon.rng (cputune): Add interleave.
* tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml: Swap
up order, copying canonical form...
* tests/qemuxml2xmloutdata/qemuxml2xmlout-cputune-iothreads.xml:
...here.
* tests/qemuxml2xmltest.c (mymain): Mark the difference.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1101574
Add an option 'iothreadpin' to the <cpuset> to allow for setting the
CPU affinity for each IOThread.
The iothreadspin will mimic the vcpupin with respect to being able to
assign each iothread to a specific CPU, although iothreads ids start
at 1 while vcpu ids start at 0. This matches the iothread naming scheme.
Up to now, users can configure BIOS via the <loader/> element. With
the upcoming implementation of UEFI this is not enough as BIOS and
UEFI are conceptually different. For instance, while BIOS is ROM, UEFI
is programmable flash (although all writes to code section are
denied). Therefore we need new attribute @type which will
differentiate the two. Then, new attribute @readonly is introduced to
reflect the fact that some images are RO.
Moreover, the OVMF (which is going to be used mostly), works in two
modes:
1) Code and UEFI variable store is mixed in one file.
2) Code and UEFI variable store is separated in two files
The latter has advantage of updating the UEFI code without losing the
configuration. However, in order to represent the latter case we need
yet another XML element: <nvram/>. Currently, it has no additional
attributes, it's just a bare element containing path to the variable
store file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This commit is rather big. Firstly, the in memory config
representation is adjusted like if security_driver was set to "none".
The rest is then just adaptation to the new code that will generate
different seclabels.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
For virtio-blk-pci disks with the disk iothread attribute that are
running the correct emulator, add the "iothread=iothread#" to the
-device command line in order to enable iothreads for the disk as
long as the command is available, the disk iothread value provided is
valid, and is supported for the disk device being added
Add a new capability to ensure the iothreads feature exists for the qemu
emulator being run - requires the "query-iothreads" QMP command. Using the
domain XML add correspoding command argument in order to generate the
threads. The iothreads will use a name space "iothread#" where, the
future patch to add support for using an iothread to a disk definition to
merely define which of the available threads to use.
Add tests to ensure the xml/argv processing is correct. Note that no
change was made to qemuargv2xmltest.c as processing the -object element
would require knowing more than just iothreads.
QEMU 2.1 added support for the kvm=off option to the -cpu command,
allowing the KVM hypervisor signature to be hidden from the guest.
This enables disabling of some paravirualization features in the
guest as well as allowing certain drivers which test for the
hypervisor to load. Domain XML syntax is as follows:
<domain type='kvm>
...
<features>
...
<kvm>
<hidden state='on'/>
</kvm>
</features>
...
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1128751
There's this <driver/> element under <interface/> which can have
several attributes. However, the driver element is currently formated
only if the driver's name or txmode has been specified. This makes
only a little sense as we parse even partial <driver/>, for instance:
<interface type='user'>
<mac address='52:54:00:e5:48:58'/>
<model type='virtio'/>
<driver ioeventfd='on' event_idx='on' queues='5'/>
</interface>
But such XML would never get formatted back.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Introduce a new structure to handle an iSCSI host device based on the
existing virDomainHostdevSubsysSCSI by adding a "protocol='iscsi'" to
the <source/> element. The existing scsi_host subsystem RNG was modified
to read an optional "protocol='adapter'", although it won't be written
out nor is it documented as an option (by choice).
The new hostdev structure mimics the existing <disk/> element for an
iSCSI device (network) device. New XML is:
<hostdev mode='subsystem' type='scsi' managed='yes'>
<source protocol='iscsi' name='iqn.1992-01.com.example'>
<host name='example.org' port='3260'/>
<auth username='myname'>
<secret type='iscsi' usage='mycluster_myname'/>
</auth>
</source>
<address type='drive' controller='0' bus='0' target='2' unit='5'/>
</hostdev>
The controller element will mimic the existing scsi_host code insomuch
as when 'lsi' and 'virtio-scsi' are used.
A future patch is going to wire up qemu active block commit jobs;
but as they have similar events and are canceled/pivoted in the
same way as block copy jobs, it is easiest to track all bookkeeping
for the commit job by reusing the <mirror> element. This patch
adds domain XML to track which job was responsible for creating a
mirroring situation, and adds a job='copy' attribute to all
existing uses of <mirror>. Along the way, it also massages the
qemu monitor backend to read the new field in order to generate
the correct type of libvirt job (even though it requires a
future patch to actually cause a qemu event that can be reported
as an active commit). It also prepares to update persistent XML
to match changes made to live XML when a copy completes.
* docs/schemas/domaincommon.rng: Enhance schema.
* docs/formatdomain.html.in: Document it.
* src/conf/domain_conf.h (_virDomainDiskDef): Add a field.
* src/conf/domain_conf.c (virDomainBlockJobType): String conversion.
(virDomainDiskDefParseXML): Parse job type.
(virDomainDiskDefFormat): Output job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Distinguish
active from regular commit.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set job type.
(qemuDomainBlockPivot, qemuDomainBlockJobImpl): Clean up job type
on completion.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml:
Update tests.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Likewise.
* tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml: New
file.
* tests/qemuxml2xmltest.c (mymain): Drive new test.
Signed-off-by: Eric Blake <eblake@redhat.com>
There were numerous places where numatune configuration (and thus
domain config as well) was changed in different ways. On some
places this even resulted in persistent domain definition not to be
stable (it would change with daemon's restart).
In order to uniformly change how numatune config is dealt with, all
the internals are now accessible directly only in numatune_conf.c and
outside this file accessors must be used.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
In XML format, by definition, order of fields should not matter, so
order of parsing the elements doesn't affect the end result. When
specifying guest NUMA cells, we depend only on the order of the 'cell'
elements. With this patch all older domain XMLs are parsed as before,
but with the 'id' attribute they are parsed and formatted according to
that field. This will be useful when we have tuning settings for
particular guest NUMA node.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This patch adds support for the QEMU vhost-user feature to libvirt.
vhost-user enables the communication between a QEMU virtual machine
and other userspace process using the Virtio transport protocol.
It uses a char dev (e.g. Unix socket) for the control plane,
while the data plane based on shared memory.
The XML looks like:
<interface type='vhostuser'>
<mac address='52:54:00:3b:83:1a'/>
<source type='unix' path='/tmp/vhost.sock' mode='server'/>
<model type='virtio'/>
</interface>
Signed-off-by: Michele Paolino <m.paolino@virtualopensystems.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1113860
We've always done that. Well, until 990e46c45. Point is, if we don't
format model, we may lose a domain on libvirtd restart. If the
seclabel is implicit however, we should skip it's formatting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This introduces two new attributes "cmd_per_lun" and "max_sectors" same
with the names QEMU uses for virtio-scsi. An example of the XML:
<controller type='scsi' index='0' model='virtio-scsi' cmd_per_lun='50'
max_sectors='512'/>
The corresponding QEMU command line:
-device virtio-scsi-pci,id=scsi0,cmd_per_lun=50,max_sectors=512,
bus=pci.0,addr=0x3
Signed-off-by: Mike Perez <thingee@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Now that we track a disk mirror as a virStorageSource, we might
as well update the XML to theoretically allow any type of
mirroring destination (not just a local file). A later patch
will also be reusing <mirror> to track the block commit of the
top layer of a chain, which is another case where libvirt needs
to update the backing chain after the job is finally pivoted,
and since backing chains can have network backing files as the
destination to commit into, it makes more sense to display that
in the XML.
This patch changes output-only XML; it was already documented
that <mirror> does not affect a domain definition at this point
(because qemu doesn't provide persistent bitmaps yet). Any
application that was starting a block copy job with older libvirt
and then relying on the domain XML to determine if it was
complete will no longer be able to access the file= and format=
attributes of mirror that were previously used. However, this is
not going to be a problem in practice: the only time a block copy
job works is on a transient domain, and any app that is managing
a transient domain probably already does enough of its own
bookkeeping to know which file it is mirroring into without
having to re-read it from the libvirt XML. The one thing that
was likely to be used in a mirroring job was the ready=
attribute, which is unchanged. Meanwhile, I made sure the schema
and parser still accept the old format, even if we no longer
output it, so that upgrading from an older version of libvirt is
seamless.
* docs/schemas/domaincommon.rng (diskMirror): Alter definition.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): Parse two
styles of mirror elements.
(virDomainDiskDefFormat): Output new style.
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror-old.xml: New
file, copied from...
* tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: ...here
before modernizing.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old*: New
files.
* tests/qemuxml2xmltest.c (mymain): Test both styles.
Signed-off-by: Eric Blake <eblake@redhat.com>
We allow a seclabel to be specified in the <source> element
of a chardev:
<serial type='file'>
<source path='/tmp/serial.file'>
<seclabel model='dac' relabel='no'/>
</source>
</serial>
But we format it outside the source:
<serial type='file'>
<source path='/tmp/serial.file'/>
<target port='0'/>
<seclabel model='dac' relabel='no'/>
</serial>
Move the formatting inside the source to fix this to make the
seclabel persistent across XML format->parse.
Introduced by commit f8b08d0 'Add <seclabel> to character devices.'
So far, qemuxml2xml test was only able to check if the result matches
the original or the appropriate XML in qemuxml2xmloutdata regardless on
flags used to format the XML. Since the result can be different
depending on VIR_DOMAIN_XML_INACTIVE flag being used or not, this patch
adds support for qemuxml2xmlout-%s-active.xml and
qemuxml2xmlout-%s-inactive.xml output files. If the file specific to the
flag used exists, it is used in preference to the generic
qemuxml2xmlout-%s.xml file when reading the expected output.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
I noticed that depending on the <driver> attributes the user passed
in, the output may omit the <driver> element altogether. For example,
the rerror_policy has had this problem since commit 4bb4109 in Oct
2011. But in adding testsuite coverage to expose it, I found another
problem: the C code is just fine without a driver name, but the
XML validator required either a name or a cache mode.
* src/conf/domain_conf.c (virDomainDiskDefFormat): Update
conditional.
* docs/schemas/domaincommon.rng (diskDriver): Simplify.
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-copy-on-read.xml:
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-copy-on-read.args:
New files.
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-discard.xml:
Enhance test.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-drive-discard.xml:
Likewise.
* tests/qemuxml2argvtest.c (mymain): New test.
* tests/qemuxml2xmltest.c (mymain): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
In general, we try to make virt-xml-validate tolerant of input
elements in any order when possible. However, as written, the
RNG grammar did not permit <source> unless there was an explicit
type= attribute (even though the C code manages just fine by
defaulting to type='file'). After making the attribute optional
on the 'file' branch, I noticed that the use of diskspec was now
redundant with the branch when no <source> was supplied.
View this patch with 'git diff -b' for a better picture of the
schema change.
* docs/schemas/domaincommon.rng (disk): Hoist 'diskspec' out of
choice, make type='file' default, and still preserve interleave.
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-source-pool.xml:
* tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-drive-discard.xml:
New files.
* tests/qemuxml2argvdata/qemuxml2argv-disk-source-pool.xml:
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-discard.xml:
Reorder XML.
* tests/qemuxml2xmltest.c (mymain): Cover new files.
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently, <cputune><shares>0</shares></cputune> is treated
as if it were not specified.
Treat is as a valid value if it was explicitly specified
and write it to the cgroups.
Commit a1cbe4b5 added a check for spaces around assignments and this
patch extends it to checks for spaces around '=='. One exception is
virAssertCmpInt where comma after '==' is acceptable (since it is a
macro and '==' is its argument).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add a new backend for any character device. This backend uses channel
in spice connection. This channel is similar to spicevmc, but
all-purpose in contrast to spicevmc.
Apart from spicevmc, spiceport-backed chardev will not be formatted
into the command-line if there is no spice to use (with test for that
as well). For this I moved the def->graphics counting to the start
of the function so its results can be used in rest of the code even in
the future.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add a new <timer> for the HyperV reference time counter enlightenment
and the iTSC reference page for Windows guests.
This feature provides a paravirtual approach to track timer events for
the guest (similar to kvmclock) with the option to use real hardware
clock on systems with a iTSC with compensation across various hosts.
According to the documentation describing various tunables for domain
timers not all the fields are supported by all the driver types. Express
these in the RNG:
- rtc, platform: Only these support the "track" attribute.
- tsc: only one to support "frequency" and "mode" attributes
- hpet, pit: tickpolicy/catchup attribute/element
- kvmclock: no extra attributes are supported
Additionally the attributes of the <catchup> element for
tickpolicy='catchup' are optional according to the parsing code. Express
this in the XML and fix a spurious space added while formatting the
<catchup> element and add tests for it.
Any test suite which involves a virDomainDefPtr should
call virDomainDefCheckABIStability with itself just as
a basic sanity check that the identity-comparison always
succeeds. This would have caught the recent NULL pointer
access crash.
Make sure we cope with def->name being NULL since the
VMWare config parser produces NULL names.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Map the new <panic> device in XML to the '-device pvpanic' command
line of qemu. Clients can then couple the <panic> device and the
<on_crash> directive to control behavior when the guest reports
a panic to qemu.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When changing memtune limits to unlimited with AFFECT_CONFIG, the
values in virDomainDef are set to PARAM_UNLIMITED, which causes the
whole <memtune> to be formatted. This can be changed in all drivers,
but it also makes sense to use the default (0) as another value for
"unlimited", since zero memory limit makes no sense.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1027096
If there's the following snippet in the domain XML, the domain will be
lost upon the daemon restart (if the domain is started prior restart):
<seclabel type='dynamic' relabel='yes'/>
The problem is, the 'label', 'imagelabel' and 'baselabel' are parsed
whenever the VIR_DOMAIN_XML_INACTIVE is *not* present or the label is
static. The latter is not our case, obviously. So, when libvirtd starts
up, it finds domain state xml and parse it. During parsing, many XML
flags are enabled but VIR_DOMAIN_XML_INACTIVE. Hence, our parser tries
to extract 'label', 'imagelabel' and 'baselabel' from the XML which
fails for model='none'. Err, this model - even though not specified in
XML - can be taken from qemu wide config file: /etc/libvirtd/qemu.conf.
However, in order to know we are dealing with model='none' the code in
question must be moved forward a bit. Then a new check must be
introduced. This is what the first two chunks are doing.
But this alone is not sufficient. The domain state XML won't contain the
model attribute without slight modification. The model should be
inserted into the XML even if equal to 'none' and the state XML is being
generated - what if the origin (the @security_driver variable in
qemu.conf) changes during libvirtd restarts?
At the end, a test to catch this scenario is introduced.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The linux kernel recently added support for paravirtual spinlock
handling to avoid performance regressions on overcomitted hosts. This
feature needs to be turned in the hypervisor so that the guest OS is
notified about the possible support.
This patch adds a new feature "paravirt-spinlock" to the XML and
supporting code to enable the "kvm_pv_unhalt" pseudo CPU feature in
qemu.
https://bugzilla.redhat.com/show_bug.cgi?id=1008989
The test case average timing code has not been used by any test
case ever. Delete it to remove complexity.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuxml2xmltest.c function testCompareXMLToXMLHelper would
clobber the 'ret' variable causing it to mis-diagnose OOM
errors.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
<controller type='pci' index='0' model='pci-root'>
<pcihole64 unit='KiB'>1048576</pcihole64>
</controller>
It can be used to adjust (or disable) the size of the 64-bit
PCI hole. The size attribute is in kilobytes (different unit
can be specified on input), but it gets rounded up to
the nearest GB by QEMU.
Disabling it will be needed for guests that crash with the
64-bit PCI hole (like Windows XP), see:
https://bugzilla.redhat.com/show_bug.cgi?id=990418
https://bugzilla.redhat.com/show_bug.cgi?id=924153
Commit 904e05a2 (v0.9.9) added a per-<disk> seclabel element with
an attribute relabel='no' in order to try and minimize the
impact of shutdown delays when an NFS server disappears. The idea
was that if a disk is on NFS and can't be labeled in the first
place, there is no need to attempt the (no-op) relabel on domain
shutdown. Unfortunately, the way this was implemented was by
modifying the domain XML so that the optimization would survive
libvirtd restart, but in a way that is indistinguishable from an
explicit user setting. Furthermore, once the setting is turned
on, libvirt avoids attempts at labeling, even for operations like
snapshot or blockcopy where the chain is being extended or pivoted
onto non-NFS, where SELinux labeling is once again possible. As
a result, it was impossible to do a blockcopy to pivot from an
NFS image file onto a local file.
The solution is to separate the semantics of a chain that must
not be labeled (which the user can set even on persistent domains)
vs. the optimization of not attempting a relabel on cleanup (a
live-only annotation), and using only the user's explicit notation
rather than the optimization as the decision on whether to skip
a label attempt in the first place. When upgrading an older
libvirtd to a newer, an NFS volume will still attempt the relabel;
but as the avoidance of a relabel was only an optimization, this
shouldn't cause any problems.
In the ideal future, libvirt will eventually have XML describing
EVERY file in the backing chain, with each file having a separate
<seclabel> element. At that point, libvirt will be able to track
more closely which files need a relabel attempt at shutdown. But
until we reach that point, the single <seclabel> for the entire
<disk> chain is treated as a hint - when a chain has only one
file, then we know it is accurate; but if the chain has more than
one file, we have to attempt relabel in spite of the attribute,
in case part of the chain is local and SELinux mattered for that
portion of the chain.
* src/conf/domain_conf.h (_virSecurityDeviceLabelDef): Add new
member.
* src/conf/domain_conf.c (virSecurityDeviceLabelDefParseXML):
Parse it, for live images only.
(virSecurityDeviceLabelDefFormat): Output it.
(virDomainDiskDefParseXML, virDomainChrSourceDefParseXML)
(virDomainDiskSourceDefFormat, virDomainChrDefFormat)
(virDomainDiskDefFormat): Pass flags on through.
* src/security/security_selinux.c
(virSecuritySELinuxRestoreSecurityImageLabelInt): Honor labelskip
when possible.
(virSecuritySELinuxSetSecurityFileLabel): Set labelskip, not
norelabel, if labeling fails.
(virSecuritySELinuxSetFileconHelper): Fix indentation.
* docs/formatdomain.html.in (seclabel): Document new xml.
* docs/schemas/domaincommon.rng (devSeclabel): Allow it in RNG.
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*-labelskip.xml:
* tests/qemuxml2argvdata/qemuxml2argv-seclabel-*-labelskip.args:
* tests/qemuxml2xmloutdata/qemuxml2xmlout-seclabel-*-labelskip.xml:
New test files.
* tests/qemuxml2argvtest.c (mymain): Run the new tests.
* tests/qemuxml2xmltest.c (mymain): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
This patch adds in special handling for a few devices that need to be
treated differently for q35 domains:
usb - there is no implicit/default usb controller for the q35
machinetype. This is done because normally the default usb controller
is added to a domain by just adding "-usb" to the qemu commandline,
and it's assumed that this will add a single piix3 usb1 controller at
slot 1 function 2. That's not what happens when the machinetype is
q35, though. Instead, adding -usb to the commandline adds 3 usb
(version 2) controllers to the domain at slot 0x1D.{1,2,7}. Rather
than having
<controller type='usb' index='0'/>
translate into 3 separate devices on the PCI bus, it's cleaner to not
automatically add a default usb device; one can always be added
explicitly if desired. Or we may decide that on q35 machines, 3 usb
controllers will be automatically added when none is given. But for
this initial commit, at least we aren't locking ourselves into
something we later won't want.
video - qemu always initializes the primary video device immediately
after any integrated devices for the machinetype. Unless instructed
otherwise (by using "-device vga..." instead of "-vga" which libvirt
uses in many cases to work around deficiencies and bugs in various
qemu versions) qemu will always pick the first unused slot. In the
case of the "pc" machinetype and its derivatives, this is always slot
2, but on q35 machinetypes, the first free slot is slot 1 (since the
q35's integrated peripheral devices are placed in other slots,
e.g. slot 0x1f). In order to make the PCI address of the video device
predictable, that slot (1 or 2, depending on machinetype) is reserved
even when no video device has been specified.
sata - a q35 machine always has a sata controller implicitly added at
slot 0x1F, function 2. There is no way to avoid this controller, so we
always add it. Note that the xml2xml tests for the pcie-root and q35
cases were changed to use DO_TEST_DIFFERENT() so that we can check for
the sata controller being automatically added. This is especially
important because we can't check for it in the xml2argv output (it has
no effect on that output since it's an implicit device).
ide - q35 has no ide controllers.
isa and smbus controllers - these two are always present in a q35 (at
slot 0x1F functions 0 and 3) but we have no way of modelling them in
our config. We do need to reserve those functions so that the user
doesn't attempt to put anything else there though. (note that the "pc"
machine type also has an ISA controller, which we also ignore).
This PCI controller, named "dmi-to-pci-bridge" in the libvirt config,
and implemented with qemu's "i82801b11-bridge" device, connects to a
PCI Express slot (e.g. one of the slots provided by the pcie-root
controller, aka "pcie.0" on the qemu commandline), and provides 31
*non-hot-pluggable* PCI (*not* PCIe) slots, numbered 1-31.
Any time a machine is defined which has a pcie-root controller
(i.e. any q35-based machinetype), libvirt will automatically add a
dmi-to-pci-bridge controller if one doesn't exist, and also add a
pci-bridge controller. The reasoning here is that any useful domain
will have either an immediate (startup time) or eventual (subsequent
hot-plug) need for a standard PCI slot; since the pcie-root controller
only provides PCIe slots, we need to connect a dmi-to-pci-bridge
controller to it in order to get a non-hot-plug PCI slot that we can
then use to connect a pci-bridge - the slots provided by the
pci-bridge will be both standard PCI and hot-pluggable.
Since pci-bridge devices themselves can not be hot-plugged into a
running system (although you can hot-plug other devices into a
pci-bridge's slots), any new pci-bridge controller that is added can
(and will) be plugged into the dmi-to-pci-bridge as long as it has
empty slots available.
This patch is also changing the qemuxml2xml-pcie test from a "DO_TEST"
to a "DO_DIFFERENT_TEST". This is so that the "before" xml can omit
the automatically added dmi-to-pci-bridge and pci-bridge devices, and
the "after" xml can include it - this way we are testing if libvirt is
properly adding these devices.
This controller is implicit on q35 machinetypes. It provides 31 PCIe
(*not* PCI) slots as controller 0.
Currently there are no devices that can connect to pcie-root, and no
implicit pci controller on a q35 machine, so q35 is still
unusable. For a usable q35 system, we need to add a
"dmi-to-pci-bridge" pci controller, which can connect to pcie-root,
and provides standard pci slots that can be used to connect other
devices.
Since PCI bridges, PCIe bridges, PCIe switches, and PCIe root ports
all share the same namespace, they are all defined as controllers of
type='pci' in libvirt (but with a differing model attribute). Each of
these controllers has a certain connection type upstream, allows
certain connection types downstream, and each can either allow a
single downstream connection at slot 0, or connections from slot 1 -
31.
Right now, we only support the pci-root and pci-bridge devices, both
of which only allow PCI devices to connect, and both which have usable
slots 1 - 31. In preparation for adding other types of controllers
that have different capabilities, this patch 1) adds info to the
qemuDomainPCIAddressBus object to indicate the capabilities, 2) sets
those capabilities appropriately for pci-root and pci-bridge devices,
and 3) validates that the controller being connected to is the proper
type when allocating slots or validating that a user-selected slot is
appropriate for a device..
Having this infrastructure in place will make it much easier to add
support for the other PCI controller types.
While it would be possible to do all the necessary checking by just
storing the controller model in the qemyuDomainPCIAddressBus, it
greatly simplifies all the validation code to also keep a "flags",
"minSlot" and "maxSlot" for each - that way we can just check those
attributes rather than requiring a nearly identical switch statement
everywhere we need to validate compatibility.
You may notice many places where the flags are seemingly hard-coded to
QEMU_PCI_CONNECT_HOTPLUGGABLE | QEMU_PCI_CONNECT_TYPE_PCI
This is currently the correct value for all PCI devices, and in the
future will be the default, with small bits of code added to change to
the flags for the few devices which are the exceptions to this rule.
Finally, there are a few places with "FIXME" comments. Note that these
aren't indicating places that are broken according to the currently
supported devices, they are places that will need fixing when support
for new PCI controller models is added.
To assure that there was no regression in the auto-allocation of PCI
addresses or auto-creation of integrated pci-root, ide, and usb
controllers, a new test case (pci-bridge-many-disks) has been added to
both the qemuxml2argv and qemuxml2xml tests. This new test defines a
domain with several dozen virtio disks but no pci-root or
pci-bridges. The .args file of the new test case was created using
libvirt sources from before this patch, and the test still passes
after this patch has been applied.
There are two ways to use a iSCSI LUN as disk source for qemu.
* The LUN's path as it shows up on host, e.g.
/dev/disk/by-path/ip-$ip:3260-iscsi-$iqn-fc18:iscsi.iscsi0-lun-1
* The libiscsi URI from the storage pool source element host attribute, e.g.
iscsi://demo.org:6000/iqn.1992-01.com.example/1
For a "volume" type disk, if the specified "pool" is of iscsi
type, we should support to use the LUN in either of above 2 ways.
That's why to introduce a new XML tag "mode" for the disk source
(libvirt should support iscsi pool with libiscsi, but it's another
new feature, which should be done later).
The "mode" can be either of "host" or "direct". Use "host" to indicate
use of the LUN with the path as it shows up on host. Use "direct" to
indicate to use it with the source pool host URI (future patches may support
to use network type libvirt storage too, e.g. Ceph)
Actually, I'm turning this function into a macro as filename,
function name and line number needs to be passed. The new
function virAsprintfInternal is introduced with the extended set
of arguments.
<hyperv>
<spinlocks state='off'/>
</hyperv>
results in:
error: XML error: missing HyperV spinlock retry count
Don't require retries when state is off and use virXPathUInt
instead of virXPathString to simplify parsing.
https://bugzilla.redhat.com/show_bug.cgi?id=784836#c19
For s390 the default console target type is virtio. This also requires
that an implicit virtio-serial controller is instantiated.
This testcase verifies that the target type of virtio is correctly set
in the generated XML if no target element was given and that the
corresponding virtio-serial element is generated too.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
With unknown good reasons, the attribute "bus" of scsi device
address is always set to 0, same for attribute "target". (See
virDomainDiskDefAssignAddress).
Though we might need to change the algorithm to honor "bus"
and "target" too, that's a different issue. The address generator
for scsi host device in this patch just follows the unknown
good reasons, only considering the "controller" and "unit".
It walks through all scsi controllers and their units, to see
if the address $controller:0:0:$unit can be used (if not used
by any disk or scsi host device yet), if found one, it sits on
it, otherwise, it creates a new controller (actually the controller
is implicitly created by someone else), and sits on
$new_controller:0:0:0 instead.
This attribute is going to represent number of queues for
multique vhost network interface. This commit implements XML
extension part of the feature and add one test as well. For now,
we can only do xml2xml test as qemu command line generation code
is not adapted yet.
The reason for it's not exposed for such long time is that the
enums for VirtioEventIdx and CopyOnReadType have same enum values
and Correspondingstrings. This fixes the bug and adds test.
QEMU introduced "discard" option for drive since commit a9384aff53,
<...>
@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and
controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
requests are ignored or passed to the filesystem. Some machine types
may not support discard requests.
</...>
This patch exposes the support in libvirt.
QEMU supported "discard" for "-drive" since v1.5.0-rc0:
% git tag --contains a9384aff53
contains
v1.5.0-rc0
v1.5.0-rc1
So this only detects the capability bit using virQEMUCapsProbeQMPCommandLine.
Adding a VNC WebSocket support for QEMU driver. This functionality is
in upstream qemu from commit described as v1.3.0-982-g7536ee4, so the
capability is being recognized based on QEMU version for now.
QEMU introduced command line "-mem-merge=on|off" (defaults to on) to
enable/disable the memory merge (KSM) at guest startup. This exposes
it by new XML:
<memoryBacking>
<nosharepages/>
</memoryBacking>
The XML tag is same with what we used internally for old RHEL.
Except the scsi host device's controller is "lsilogic", mapping
between the libvirt attributes and scsi-generic properties is:
libvirt qemu
-----------------------------------------
controller bus ($libvirt_controller.0)
bus channel
target scsi-id
unit lun
For scsi host device with "lsilogic" controller, the mapping is:
('target (libvirt)' must be 0, as it's not used; 'unit (libvirt)
must <= 7).
libvirt qemu
----------------------------------------------------------
controller && bus bus ($libvirt_controller.$libvirt_bus)
unit scsi-id
It's not good to hardcode/hard-check limits of these attributes,
and even worse, these limits are not documented, one has to find
out by either testing or reading the qemu code, I'm looking forward
to qemu expose limits like these one day). For example, exposing
"max_target", "max_lun" for megasas:
static const struct SCSIBusInfo megasas_scsi_info = {
.tcq = true,
.max_target = MFI_MAX_LD,
.max_lun = 255,
.transfer_data = megasas_xfer_complete,
.get_sg_list = megasas_get_sg_list,
.complete = megasas_command_complete,
.cancel = megasas_command_cancel,
};
Example of the qemu command line (lsilogic controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,scsi-id=8,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Example of the qemu command line (virtio-scsi controller):
-drive file=/dev/sg2,if=none,id=drive-hostdev-scsi_host7-0-0-0 \
-device scsi-generic,bus=scsi0.0,channel=0,scsi-id=128,lun=128,\
drive=drive-hostdev-scsi_host7-0-0-0,id=hostdev-scsi_host7-0-0-0
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
An example of the scsi hostdev XML:
<hostdev mode='subsystem' type='scsi'>
<source>
<adapter name='scsi_host0'/>
<address bus='0' target='0' unit='0'/>
</source>
<address type='drive' controller='0' bus='0' target='4' unit='8'/>
</hostdev>
Controller is implicitly added for scsi hostdev, though the scsi
controller's model defaults to "lsilogic", which might be not what
the user wants (same problem exists for virtio-scsi disk). It's
the existing problem, will be addressed later.
The device address must be specified manually. Later patch will let
libvirt generate it automatically.
This only introduces the generic XMLs for scsi hostdev, later patches
will add other elements, e.g. <readonly>, <shareable>.
Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
Signed-off-by: Osier Yang <jyang@redhat.com>
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
The device option for vfio-pci is nearly identical to that for
pci-assign - only the configfd parameter isn't supported (or needed).
Checking for presence of the bootindex parameter is done separately
from constructing the commandline, similar to how it is done for
pci-assign.
This patch contains tests to check for proper commandline
construction. It also includes tests for parser-formatter-parser
roundtrips (xml2xml), because those tests use the same data files, and
would have failed had they been included before now.
qemu: xml/args tests for VFIO hostdev and <interface type='hostdev'/>
These should be squashed in with the patch that adds commandline
handling of vfio (they would fail at any earlier time).
Add a "dry run" address allocation to figure out how many bridges
will be needed for all the devices without explicit addresses.
Auto-add just enough bridges to put all the devices on, or up to the
bridge with the largest specified index.
With this patch, one can specify the disk source using libvirt
storage like:
<disk type='volume' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source pool='default' volume='fc18.img'/>
<target dev='vdb' bus='virtio'/>
</disk>
"seclabels" and "startupPolicy" are not supported for this new
disk type ("volume"). They will be supported in later patches.
docs/formatdomain.html.in:
* Add documents for new XMLs
docs/schemas/domaincommon.rng:
* Add rng for new XMLs;
src/conf/domain_conf.h:
* New struct for 'volume' type disk source (virDomainDiskSourcePoolDef)
* Add VIR_DOMAIN_DISK_TYPE_VOLUME for enum virDomainDiskType
src/conf/domain_conf.c:
* New helper virDomainDiskSourcePoolDefParse to parse the 'volume'
type disk source.
* New helper virDomainDiskSourcePoolDefFree to free the source def
if 'volume' type disk.
tests/qemuxml2argvdata/qemuxml2argv-disk-source-pool.xml:
tests/qemuxml2xmltest.c:
* New test
This introduce a new attribute "num_queues" (same with the good name
QEMU uses) for virtio-scsi controller. An example of the XML:
<controller type='scsi' index='0' model='virtio-scsi' num_queues='8'/>
The corresponding QEMU command line:
-device virtio-scsi-pci,id=scsi0,num_queues=8,bus=pci.0,addr=0x3 \
This patch removes the defaultDiskDriverName from the virCaps
structure. This particular default value is used only in the qemu driver
so this patch uses the recently added callback to fill the driver name
if it's needed instead of propagating it through virCaps.
This patch is the result of running:
for i in $(git ls-files | grep -v html | grep -v \.po$ ); do
sed -i -e "s/virDomainXMLConf/virDomainXMLOption/g" -e "s/xmlconf/xmlopt/g" $i
done
and a few manual tweaks.
This does nothing more than adding the new device and capability.
The device is present since QEMU 1.2.0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This plumbs in the XML description of iSCSI shares. The next patches
will add support for the libiscsi userspace initiator.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU 1.3 and newer support an alternative URI-based syntax to specify
the location of an NBD server. Libvirt can keep on using the old
syntax in general, but only the URI syntax supports IPv6 addresses.
The URI syntax also supports relative paths to Unix sockets. These
should never be used but aren't explicitly blocked either by the parser,
so support it just in case.
The URI syntax is intentionally compatible with Gluster's, and the
code can be reused.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This reuses the XML format that was introduced for Gluster.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
These are supported by nbd-server and by the NBD server that QEMU
embeds for live image access.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Enable more testing of NBD parsing, to ensure rewrites work.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The virCaps structure gathered a ton of irrelevant data over time that.
The original reason is that it was propagated to the XML parser
functions.
This patch aims to create a new data structure virDomainXMLConf that
will contain immutable data that are used by the XML parser. This will
allow two things we need:
1) Get rid of the stuff from virCaps
2) Allow us to add callbacks to check and add driver specific stuff
after domain XML is parsed.
This first attempt removes pointers to private data allocation functions
to this new structure and update all callers and function that require
them.
To enable virCapabilities instances to be reference counted,
turn it into a virObject. All cases of virCapabilitiesFree
turn into virObjectUnref
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Like "rawio", "sgio" is only allowed for block disk of device
type "lun".
It doesn't default disk->sgio to "filtered" when parsing, as
it won't be able to distinguish explicitly requested "filtered"
and a default "filtered" in driver then. We have to error out for
explicit request when the kernel doesn't support the new sysfs
knob "unpriv_sgio", however, for defaulted "filtered", we can
just ignore it if the kernel doesn't support "unpriv_sgio".
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
Remove the obsolete 'qemud' naming prefix and underscore
based type name. Introduce virQEMUDriverPtr as the replacement,
in common with LXC driver naming style
This patch adds full support for EOI setting for domains. Because this
is CPU feature (flag), the model needs to be added even when it's not
specified. Fortunately this problem was already solved with kvmclock,
so this patch simply abuses that.
And due to the size of the patch (17 lines) I dared to include the tests.
The following config elements now support a <vlan> subelements:
within a domain: <interface>, and the <actual> subelement of <interface>
within a network: the toplevel, as well as any <portgroup>
Each vlan element must have one or more <tag id='n'/> subelements. If
there is more than one tag, it is assumed that vlan trunking is being
requested. If trunking is required with only a single tag, the
attribute "trunk='yes'" should be added to the toplevel <vlan>
element.
Some examples:
<interface type='hostdev'/>
<vlan>
<tag id='42'/>
</vlan>
<mac address='52:54:00:12:34:56'/>
...
</interface>
<network>
<name>vlan-net</name>
<vlan trunk='yes'>
<tag id='30'/>
</vlan>
<virtualport type='openvswitch'/>
</network>
<interface type='network'/>
<source network='vlan-net'/>
...
</interface>
<network>
<name>trunk-vlan</name>
<vlan>
<tag id='42'/>
<tag id='43'/>
</vlan>
...
</network>
<network>
<name>multi</name>
...
<portgroup name='production'/>
<vlan>
<tag id='42'/>
</vlan>
</portgroup>
<portgroup name='test'/>
<vlan>
<tag id='666'/>
</vlan>
</portgroup>
</network>
<interface type='network'/>
<source network='multi' portgroup='test'/>
...
</interface>
IMPORTANT NOTE: As of this patch there is no backend support for the
vlan element for *any* network device type. When support is added in
later patches, it will only be for those select network types that
support setting up a vlan on the host side, without the guest's
involvement. (For example, it will be possible to configure a vlan for
a guest connected to an openvswitch bridge, but it won't be possible
to do that for one that is connected to a standard Linux host bridge.)