Commit Graph

10324 Commits

Author SHA1 Message Date
Michal Privoznik
a658a4bdf7 qemuBuildMemoryBackendProps: Prealloc mem for memfd backend
If a domain was using hugepages through memory-backend-file or
via -mem-path, we would turn prealloc on. But we are not doing
that for memory-backend-memfd. Fix this, because we need QEMU to
fully allocate hugepages.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-01 12:03:06 +02:00
Michal Privoznik
0217c5a6b4 qemuBuildMemoryBackendProps: Respect //memoryBacking/allocation/@mode=immediate
If user specifies immediate memory allocation in the domain XML,
they want QEMU to fully allocate its memory. And if the memory
was allocated using plain '-m' then we would honour it. But, if a
memory backend is used, then we don't set the prealloc attribute
of the backend.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-01 12:02:19 +02:00
Michal Privoznik
eda5cc7a62 qemuBuildMemoryBackendProps: Move @prealloc setting to backend agnostic part
All three memory backends (-file, -ram and -memfd) have .prealloc
attribute. Since we are setting it only for -file, the
corresponding code lives only under if() that handles that
specific backend. But in near future we will want to set the
attribute for other backends too. Therefore, move the
corresponding code outside of the if().

This causes some .argv files to be changed, but the only change
happening there is move of the attribute (best viewed with:
'git show --color-words=.').

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-01 12:01:31 +02:00
Michal Privoznik
bfb1ab1df1 qemu: Use .hostdevice attribute for usb-host
This originally started as bug 1595525 in which namespaces and
libusb used in QEMU were not playing nicely with each other. The
problem was that libusb built a cache of USB devices it saw
(which was a very limited set because of the namespace) and then
expected to receive udev events to keep the cache in sync. But
those udev events didn't come because on hotplug when we mknod()
devices in the namespace no udev event is generated. And what is
worse - libusb failed to open a device that wasn't in the cache.

Without going further into what the problem was, libusb added a
new API for opening USB devices that avoids using cache which
QEMU incorporated and exposes under "hostdevice" attribute.

What is even nicer is that QEMU uses qemu_open() for path
provided in the attribute and thus FD passing could be used.
Except qemu_open() expects so called FD sets instead of `getfd'
and these are not implemented in libvirt, yet.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1877218
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 10:59:35 +02:00
Michal Privoznik
66c5674e79 qemu_capabilities: Add QEMU_CAPS_USB_HOST_HOSTDEVICE
This capability tracks whether "usb-host" device has "hostdevice"
attribute. This attribute allows us to specify full path to the
USB device ("/dev/bus/usb/$bus/$dev") but more importantly, since
QEMU uses qemu_open() for this attribute it allows us to pass
pre-opened FD and have QEMU not bother with opening the file at
all.

The attribute was added in v5.1.0-rc0~71^2~1 QEMU commit.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 10:50:43 +02:00
Peter Krempa
43f0944f66 qemu: migration: Rename qemuMigrationEatCookie to qemuMigrationCookieParse
Use a more descriptive name and move the verb to the end so that the
functions conform with the naming policy.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 10:01:05 +02:00
Peter Krempa
5b32815d1a qemuMigrationCookieXMLFormatStr: Remove
There is just one caller, inline the code. This also optimizes the code
as we no longer have to calculate length of the output XML as it's
actually stored in the buffer struct.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 10:01:05 +02:00
Peter Krempa
2d155e2348 qemuMigrationSrcBeginPhase: Use qemuMigrationCookieNew
We need an empty cookie, so use qemuMigrationCookieNew instead of
qemuMigrationEatCookie with NULL/0 arguments.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 10:01:05 +02:00
Peter Krempa
775296cbd6 qemuMigrationCookieNew: Export
Allow direct use rather than going through qemuMigrationEatCookie with
NULL/0 arguments.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 10:01:05 +02:00
Peter Krempa
4aef0fe324 qemuMigrationCookieNew: Refactor allocation and cleanup
Move around some code so that we can get rid of the 'cleanup:' label.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 10:01:05 +02:00
Peter Krempa
6c8b68b312 qemu: migration: Rename qemuMigrationBakeCookie to qemuMigrationCookieFormat
Use a more descriptive name and move the verb to the end so that the
functions conform with the naming policy.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 10:01:05 +02:00
Masayoshi Mizuma
596c659b4e qemu: validate: Allow <transient/> disks
Extract the validation of transient disk option. We support transient
disks in qemu under the following conditions:

 - -blockdev is used
 - the disk source is a local file
 - the disk type is 'disk'
 - the disk is not readonly

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Masayoshi Mizuma
1c9227de5d qemu: process: Handle transient disks on VM startup
Add overlays after the VM starts before we start executing guest code.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Peter Krempa
e86b16ced7 qemu: hotplug: Remove overlay of <transient> disk on disk unplug
Remove the overlay if the disk was <transient/>. Note that even if we'd
forbid unplug of such a disk through the API, the disk can still be
ejected from the guest.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Masayoshi Mizuma
cb62c23ff7 qemu: Block migration when transient disk option is enabled
Block migration when transient disk option is enabled to simplify the
handling of the overlay files.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Masayoshi Mizuma
83182f0838 qemu: Block disk hotplug when transient disk option is enabled
For now we disable disk hotplug of transient disk as it requires
creating an overlay prior to adding the frontend.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Masayoshi Mizuma
b3c582623c qemu: Block blockjobs when transient disk option is enabled
For now we disallow blockjobs with transient disks to avoid dealing with
obsoleted overlays.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Peter Krempa
117ff49db7 qemu: snapshot: Introduce helpers for creating overlays on <transient/> disks
To implement <transient/> disks we'll need to install an overlay on top
of the original disk image which will be discarded after the VM is
turned off. This was initially implemented by qemu but libvirt never
picked up this option as the overlays were created by qemu without
libvirt involvment which didn't work with SELinux.

With blockdev the qemu feature became unsupported so we need to do this
via the snapshot code anyways.

The helpers introduced in this patch prepare a fake snapshot disk
definition for a disk which is configured as <transient/> and use it to
create a snapshot (without actually modifying metadata or persistent
def).

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Peter Krempa
afc25e8553 qemu: prepare cleanup for <transient/> disk overlays
Later patches will implement support for <transient/> disks in libvirt
by installing an overlay on top of the configured image. This will
require cleanup after the VM will be stopped so that the state is
correctly discarded.

Since the overlay will be installed only during the startup phase of the
VM we need to ensure that qemuProcessStop doesn't delete the original
file on some previous failure. This is solved by adding
'inhibitDiskTransientDelete' VM private data member which is set prior
to any startup step and will be cleared once transient disk overlays are
established.

Based on that we can then delete the overlays for any <transient/> disk.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Ján Tomko
a63b48c5ec qemu: agent: set ifname to NULL after freeing
CVE-2020-25637

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Fixes: 0977b8aa07
Reviewed-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-30 11:42:28 +02:00
Ján Tomko
e4116eaa44 rpc: require write acl for guest agent in virDomainInterfaceAddresses
CVE-2020-25637

Add a requirement for domain:write if source is set to
VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_AGENT.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-30 11:42:28 +02:00
Peter Krempa
850f991897 qemuSnapshotDiskContextNew: Don't set 'ndd'
'ndd' tracks the actual number of snapshot disks filled into the
structure and is incremented by the functions filling the context, thus
it must not be set when initializing the context.

Fixes: 8c2ecdf131
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-24 13:20:45 +02:00
Peter Krempa
6e514ea27c qemuSnapshotDiskContextCleanup: Don't leak snapctxt
The container itself needs to be freed too.

Fixes: 8c2ecdf131
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-24 13:20:45 +02:00
Peter Krempa
4a927468fb qemuSnapshotDiskPrepare: rename to qemuSnapshotDiskPrepareActiveExternal
Make it obvious that the snapshot is prepared for the active external
snapshot case.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-24 11:49:13 +02:00
Peter Krempa
ebdbd05aab qemuSnapshotCreateActiveExternalDisks: Extract actual snapshot creation to 'qemuSnapshotDiskCreate'
Extract the code which invokes the monitor and finalizes the snapshot
into a separate function for easier reuse.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-24 11:49:13 +02:00
Peter Krempa
8c2ecdf131 qemu: snapshot: Introduce qemuSnapshotDiskContext
Add a container struct which holds all data needed to create and clean
up after a (for now external) snapshot. This will aggregate all the
'qemuSnapshotDiskDataPtr' the 'actions' of a transaction QMP command and
everything needed for cleanup at any given point.

This aggregation allows to simplify the arguments of the functions which
prepare the snapshot data and additionally will simplify the code
necessary for creating overlays on top of <transient/> disks.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-24 11:49:13 +02:00
Peter Krempa
a09c82cbd5 qemuSnapshotDiskPrepare/Cleanup: simplify passing of 'driver' and 'blockdev'
Both can be fetched from 'vm'.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-24 11:49:13 +02:00
Peter Krempa
eb4aa7b109 qemuSnapshotDiskUpdateSource: Extract 'driver' and 'blockdev' from 'vm'
Reduce the number of arguments by taking them from 'vm'.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-24 11:49:13 +02:00
Peter Krempa
8eacbeac74 qemu: snapshot: Rename 'qemuSnapshotCreateDiskActive' to 'qemuSnapshotCreateActiveExternalDisks'
Be more specific about the role of the function. It's creating the disk
portion of an external active snapshot.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-24 11:49:13 +02:00
Peter Krempa
1bb0faa51a qemuSnapshotCreateInactiveExternal: Don't access 'idx' of snapshot
After virDomainSnapshotAlignDisks is called the definitions of disks in
the snapshot definition and in the domain definition are in the same
order so they can be addressed using the same index.

This frees up 'idx' to be removed later.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-23 22:37:56 +02:00
Peter Krempa
2b150c4d5f qemuDomainBlockRebase: Replace ternary operator with if/else
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-23 22:37:56 +02:00
Peter Krempa
bc3a78f61a virStorageSourceNew: Abort on failure
Add an abort() on the class/object allocation failures so that
virStorageSourceNew() always returns a virStorageSource and remove
checks from all callers.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-23 22:37:56 +02:00
Collin Walling
9c6996124f qemu: substitute missing model name for host-passthrough
Before:
  $ uname -m
  s390x
  $ cat passthrough-cpu.xml
  <cpu check="none" mode="host-passthrough" />
  $ virsh hypervisor-cpu-compare passthrough-cpu.xml
  error: Failed to compare hypervisor CPU with passthrough-cpu.xml
  error: internal error: unable to execute QEMU command 'query-cpu-model-comp
  arison': Invalid parameter type for 'modelb.name', expected: string

After:
  $ virsh hypervisor-cpu-compare passthrough-cpu.xml
  CPU described in passthrough-cpu.xml is identical to the CPU provided by hy
  pervisor on the host

Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-23 21:20:06 +02:00
Daniel Henrique Barboza
ace5931553 conf, qemu: move qemuDomainNVDimmAlignSizePseries to domain_conf.c
We'll use the auto-alignment function during parse time, in
domain_conf.c. Let's move the function to that file, renaming
it to virDomainNVDimmAlignSizePseries(). This will also make it
clearer that, although QEMU is the only driver that currently
supports it, pSeries NVDIMM restrictions aren't tied to QEMU.

Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-23 13:19:47 -03:00
Ján Tomko
8e12a0b8fa qemu: firmware: check virJSONValueObjectGet return value
If the mapping is not present, we should not try to
access its elements.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: 8b5b80f4c5
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-09-23 13:26:34 +02:00
Daniel Henrique Barboza
63af8fdeb2 qemu: revert latest pSeries NVDIMM design changes
In [1], changes were made to remove the existing auto-alignment
for pSeries NVDIMM devices. That design promotes strange situations
where the NVDIMM size reported in the domain XML is different
from what QEMU is actually using. We removed the auto-alignment
and relied on standard size validation.

However, this goes against Libvirt design philosophy of not
tampering with existing guest behavior, as pointed out by Daniel
in [2]. Since we can't know for sure whether there are guests that
are relying on the auto-alignment feature to work, the changes
made in [1] are a direct violation of this rule.

This patch reverts [1] entirely, re-enabling auto-alignment for
pSeries NVDIMM as it was before. Changes will be made to ease
the limitations of this design without hurting existing
guests.

This reverts the following commits:

- commit 2d93cbdea9
  Revert "formatdomain.html.in: mention pSeries NVDIMM 'align down' mechanic"

- commit 0ee56369c8
  qemu_domain.c: change qemuDomainMemoryDeviceAlignSize() return type

- commit 07de813924
  qemu_domain.c: do not auto-align ppc64 NVDIMMs

- commit 0ccceaa57c
  qemu_validate.c: add pSeries NVDIMM size alignment validation

- commit 4fa2202d88
  qemu_domain.c: make qemuDomainGetMemorySizeAlignment() public

[1] https://www.redhat.com/archives/libvir-list/2020-July/msg02010.html
[2] https://www.redhat.com/archives/libvir-list/2020-September/msg00572.html

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2020-09-22 12:25:34 +02:00
Roman Bogorodskiy
f787df9947 conf: add 'isa' controller type
Introduce 'isa' controller type. In domain XML it looks this way:

    ...
    <controller type='isa' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01'
               function='0x0'/>
    </controller>
    ...

Currently, this is needed for the bhyve driver to allow choosing a
specific PCI address for that. In bhyve, this controller is used to
attach serial ports and a boot ROM.

Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-21 20:01:12 +04:00
Nikolay Shirokovskiy
5756a7bf2a qemu: fix concurrency crash bug in force snapshot revert
This patch is just revert of [1]. Actually we should NOT pass
QEMU_ASYNC_JOB_NONE as that patch suggests while we are in async job in order
to acquire nested jobs correctly. The patch tries to fix issues introduced by
another patch [2] where jobs are mistakenly cleared out in qemuProcessStop.
Later patch [3] fixed the issue introduced by patch [2]. Now we need to revert
[1] as well as we now still have same concurrency crash issues as [3] described
but for the force revert.

[1] 0c4408c83: qemu: Don't use asyncJob after stop during snapshot revert
[2] 888aa4b6b: qemuDomainObjPrivateDataClear: Don't leak @migParams
[3] d75f865fb: qemu: fix concurrency crash bug in snapshot revert

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-16 11:45:41 +03:00
Peter Krempa
f2d90b558f qemuBuildHostdevSCSIAttachPrepare: Propagate 'readonly' flag also for iSCSI
The 'readonly' hostdev property is stored separately from the
virStorageSource as some hostdevs are not described by a virStorage
source. We need to propagate the flag to the virStorage source also for
iSCSI backends as it's used to generate the backend properties.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1868856

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-16 09:08:56 +02:00
Peter Krempa
1a5f35dbd2 qemu: backup: Write TLS cert and secret object aliases into status XML
We've put the aliases into the backup job definition after the status
XML was already written so they didn't appear in the on-disk state.

Move the code putting them into the private definition earlier, so that
the status XML update done by saving blockjobs already writes them out.

Also add a note notifying that the block job status update writes the
status XML.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1870488
Fixes: 423576679a
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:25:22 +02:00
Peter Krempa
5058062b5d qemu: backup: Remove note that TLS should be implemented
Commit 423576679a implementing TLS forgot to remove the comment.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:25:22 +02:00
Peter Krempa
e5dc1427d7 qemuDomainPrepareHostdev: Don't base backend nodename on device alias
QEMU's blockdev nodenames which are used to back SCSI/iSCSI hostdevs are
limited to 32 characters. If a user passes a very long user alias as
name of the host device it's easy to end up with a too-long nodename.

To prevent this from happening don't base the nodename on the possibly
user-specified alias but on the normal sequential node name generator.

We then store the name in the status XML for further use.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Peter Krempa
a669d68336 qemuDomainPrepareHostdev: base hostdev secret object names on backend alias
The secret object is used to pass data to the backend so it's better
fitting to base the secret object name on the SCSI host device backend
name.

Since we store the object alias in the status XML this modification is
safe in regards to existing guests.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Peter Krempa
ca495825a3 qemuDomainPrepareHostdev: Allocate backend nodenames in the prepare function
Allocate the nodename in the setup function rather than in the command
line generator.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Peter Krempa
c17e4907fe qemuDomainSecretHostdevPrepare: remove
The function is no longer used once we setup per-hostdev data.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Peter Krempa
3673bdbe13 qemu: domain: Extract preparation of hostdev specific data to a separate function
Historically we've prepared secrets for all objects in one place. This
doesn't make much sense and it's semantically more appealing to prepare
everything for a single device type in one place.

Move the setup of the (iSCSI|SCSI) hostdev secrets into a new function
which will be used to setup other things as well in the future.

This is a similar approach we do for disks.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Peter Krempa
82b60ec8ce qemuBlockStorageSourceAttachData: remove 'storageNodeNameCopy'
This was a hack when we were locally regenerating the nodename so that
it's not leaked. Now that we use proper virStorageSource with
persistence it's no longer required.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Peter Krempa
cca2dd4890 qemuBuildHostdevSCSI(A|De)tachPrepare: Use virStorageSource in def for SCSI hostdevs
Modify the attach/detach data generators to actually use the
virStorageSourceStructure embedded in the SCSI config data rather than
creating an ad-hoc internal one.

The modification will allow us to properly store the nodename used for
the backend in the status XML rather than re-generating it all the
time.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Peter Krempa
482c52b177 qemu: domain: Fill in (i)SCSI backend nodename if it is not present in status XML
For upgrade reasons so that we can modify the used nodename we must
generate the old version for all status XMLs which don't have it stored
explicitly.

The change will be required as using the user-provided alias may result
in too-long nodenames which will be rejected by qemu.

Add code which fills in the appropriate old value and add test cases to
validate that it's added and also that existing nodenames are not
overwritten.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Ján Tomko
af16e754cd qemuProcessReconnect: clear 'oldjob'
After we started copying the privateData pointer in
qemuDomainObjRestoreJob, we should also free them
once we're done with them.

Register the clear function and use g_auto.
Also add a check for job->cb to qemuDomainObjClearJob,
to prevent freeing an uninitialized job.

https://bugzilla.redhat.com/show_bug.cgi?id=1878450

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: aca37c3fb2
2020-09-14 18:10:56 +02:00
Ján Tomko
a3c340e05f qemu: qemuDomainObjClearJob: use g_clear_pointer
The function used g_clear_pointer for all but one pointer.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-14 18:10:56 +02:00
Ján Tomko
ce66f9724c qemu: rename qemuDomainObjFreeJob -> qemuDomainObjClearJob
This function does not free the job.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-14 18:10:56 +02:00
Lin Ma
10841b6cb6 qemu: Return perf status that affect next boot for shutoff domains
While we set up perf events for a shutoff domain and check the settings,
All of perf events are reported as 'disabled', unless we add --config,
This is redundant for a shutoff domain.

 # virsh domstate $GUEST
shut off

 # virsh perf --domain $GUEST
cmt            : disabled
mbmt           : disabled
mbml           : disabled
......

 # virsh perf --domain $GUEST --enable mbmt
mbmt           : enabled

 # virsh perf --domain $GUEST
cmt            : disabled
mbmt           : disabled
mbml           : disabled
......

Use virDomainObjGetOneDefState instead of virDomainObjGetOneDef to fix
the issue. After patch, The perf event status of a shutoff domain is
reported correctly:

 # virsh domstate $GUEST
shut off

 # virsh perf --domain $GUEST
cmt            : disabled
mbmt           : disabled
mbml           : disabled
......

 # virsh perf --domain $GUEST --enable mbmt
mbmt           : enabled

 # virsh perf --domain $GUEST
cmt            : disabled
mbmt           : enabled
mbml           : disabled
......

Signed-off-by: Lin Ma <lma@suse.de>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-12 12:49:31 +02:00
Lin Ma
308ec831bb qemu: qemuDomainPMSuspendForDuration: Check availability of agent
It requires a guest agent configured and running in the domain's guest
OS, So check qemu agent during qemuDomainPMSuspendForDuration().

Signed-off-by: Lin Ma <lma@suse.de>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-12 12:49:31 +02:00
Tim Wiederhake
caf5a88e59 qemu: Use glib memory functions in qemuProcessReadLog
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-11 18:19:58 +02:00
Tim Wiederhake
3e60deeed3 qemu: Use glib memory functions in qemuDomainLogContextRead
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-11 18:19:58 +02:00
Tim Wiederhake
42459f0b01 qemu: Use glib memory functions in qemuDomainMasterKeyReadFile
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-11 18:19:58 +02:00
Erik Skultety
9824a82198 qemu: qemuDomainPMSuspendAgent: Don't assign to 'ret' in a conditional
When the guest agent isn't running, we still report success on a PM
suspend action even though we logged an error correctly, this is because
we poisoned the 'ret' value a few lines above.

Fixes: a663a86081

Signed-off-by: Erik Skultety <eskultet@redhat.com>
2020-09-11 14:48:40 +02:00
Michal Privoznik
c43622f06e qemuFirmwareFillDomain: Fill NVRAM template on migration too
In 8e1804f9f6 I've tried to fix the following use case: domain
is started with path to UEFI only and relies on libvirt to figure
out corresponding NVRAM template to create a per-domain copy
from. The fix consisted of having a check tailored exactly for
this use case and if it's hit then using FW autoselection to
figure it out. Unfortunately, the NVRAM template is not saved in
the inactive XML (well, the domain might be transient anyway).
Then, as a part of that check we see whether the per-domain copy
doesn't exist already and if it does then no template is looked
up hence no template will appear in the live XML.

This works, until the domain is migrated. At the destination, the
per-domain copy will not exist so we need to know the template to
create the per-domain copy from. But we don't even get to the
check because we are not starting a fresh new domain and thus the
qemuFirmwareFillDomain() function quits early.

The solution is to switch order of these two checks. That is
evaluate the check for the old style before checking flags.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852910
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2020-09-09 14:47:51 +02:00
Michal Privoznik
ec46e6d44b qemu_process: Separate VIR_PERF_EVENT_* setting into a function
When starting a domain, qemuProcessLaunch() iterates over all
VIR_PERF_EVENT_* values and (possibly) enables them. While there
is nothing wrong with the code, the for loop where it's done makes
it harder to jump onto next block of code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 10:57:24 +02:00
Peter Krempa
de79fad40f qemuBlockStorageSourceCreateDetectSize: Propagate cluster size for 'qcow2'
Propagate the cluster size from the original image as the user might
have configured a custom cluster size for performance reasons. Propagate
the cluster size of a qcow2 image to the new overlay or copy.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-09-08 08:48:53 +02:00
Peter Krempa
e60620e28b qemu: block: Allow specifying cluster size when using 'blockdev-create'
'blockdev-create' allows us to create the image with a custom cluster
size if we wish to. Wire it up for 'qcow2'.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-09-08 08:48:53 +02:00
Peter Krempa
fd49364d8b qemu: monitor: Detect image cluster size from 'query-named-block-nodes'
Configuring the cluster size of an image may have performance
implications. This patch allows us to detect cluster size for existing
images so that we will be able to propagate it to new images which are
based on existing images e.g. during snapshots/block-copy/etc.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-09-08 08:48:53 +02:00
Michal Privoznik
4a72b76b8a qemu_namespace: Don't leak mknod items that are being skipped over
When building and populating domain NS a couple of functions are
called that append paths to a string list. This string list is
then inspected, one item at the time by
qemuNamespacePrepareOneItem() which gathers all the info for
given path (stat buffer, possible link target, ACLs, SELinux
label) using qemuNamespaceMknodItemInit(). If the path needs to
be created in the domain's private /dev then it's added onto this
qemuNamespaceMknodData list which is freed later in the process.
But, if the path does not need to be created in the domain's
private /dev, then the memory allocated by
qemuNamespaceMknodItemInit() is not freed anywhere leading to a
leak.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 16:27:25 +02:00
Martin Kletzander
f5b486daea qemu: Allow setting affinity to fail and don't report error
This is just a clean-up of commit 3791f29b08 using the new parameter of
virProcessSetAffinity() introduced in commit 9514e24984 so that there is
no error reported in the logs.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 14:48:57 +02:00
Martin Kletzander
9514e24984 Do not report error when setting affinity is allowed to fail
Suggested-by: Ján Tomko <jtomko@redhat.com>

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 11:35:36 +02:00
Ján Tomko
7afc99ae2d qemu: migration: remove unused variable
../src/qemu/qemu_migration.c:4091:36: error: unused variable 'cfg' [-Werror,-Wunused-variable]
    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: d92c2bbc65
2020-09-07 11:03:54 +02:00
Michal Privoznik
d92c2bbc65 lib: Prefer g_autoptr() declaration of virQEMUDriverConfigPtr
In the past we had to declare @cfg and then explicitly unref it.
But now, with glib we can use g_autoptr() which will do the unref
automatically and thus is more bulletproof.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-09-07 10:47:54 +02:00
Michal Privoznik
5befe4ee18 qemu_interface: Fix @cfg refcounting in qemuInterfacePrepareSlirp()
In the qemuInterfacePrepareSlirp() function, the qemu driver
config is obtained (via virQEMUDriverGetConfig()), but it is
never unrefed leading to mangled refcounter.

Fixes: 9145b3f1cc
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-09-07 10:46:21 +02:00
Nikolay Shirokovskiy
399039a6b1 qemu: implement driver's shutdown/shutdown wait methods
On shutdown we just stop accepting new jobs for worker thread so that on
shutdown wait we can exit worker thread faster. Yes we basically stop
processing of events for VMs but we are going to do so anyway in case of daemon
shutdown.

At the same time synchronous event processing that some API calls may require
are still possible as per VM event loop is still running and we don't need
worker thread for synchronous event processing.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
860a999802 qemu: avoid deadlock in qemuDomainObjStopWorker
We are dropping the only reference here so that the event loop thread
is going to be exited synchronously. In order to avoid deadlocks we
need to unlock the VM so that any handler being called can finish
execution and thus even loop thread be finished too.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
5c0cd375d1 qemu: don't shutdown event thread in monitor EOF callback
This hunk was introduced in [1] in order to avoid loosing
events from monitor on stopping qemu process. But as explained
in [2] on destroy we won't get neither EOF nor any other
events as monitor is just closed. In case of crash/shutdown
we won't get any more events as well and qemuDomainObjStopWorker
will be called by qemuProcessStop eventually. Thus let's
remove qemuDomainObjStopWorker from qemuProcessHandleMonitorEOF
as it is not useful anymore.

[1] e6afacb0f: qemu: start/stop an event loop thread for domains
[2] d2954c072: qemu: ensure domain event thread is always stopped

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-07 09:33:59 +03:00
Martin Kletzander
fc7d53edf4 qemu: Fix comment in qemuProcessSetupPid
This was supposed to be done in commit 3791f29b08, but I missed a spot.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2020-09-06 13:44:27 +02:00
Martin Kletzander
f51cbe92c0 qemu: Allow migration over UNIX socket
This allows:

 a) migration without access to network

 b) complete control of the migration stream

 c) easy migration between containerised libvirt daemons on the same host

Resolves: https://bugzilla.redhat.com/1638889

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2020-09-05 07:55:45 +02:00
Martin Kletzander
3791f29b08 qemu: Do not error out when setting affinity failed
Consider a host with 8 CPUs. There are the following possible scenarios

1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs

2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs

3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
   QEMU should get 8 CPUs

4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 8 CPUs

5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
   QEMU should get 4 CPUs

6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 4 CPUs

Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.

Scenario 3 works because libvirt checks current affinity first and
skips the sched_setaffinity call, avoiding the SYS_NICE issue

Scenario 4 works only if CAP_SYS_NICE is availalbe

Scenarios 5 & 6 works only if CAP_SYS_NICE is present *AND* the cgroups
cpuset is not set on the container.

If libvirt blindly ignores the sched_setaffinity failure, then scenarios
4, 5 and 6 should all work, but with caveat in case 4 and 6, that
QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
This is still better than failing.

Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
ignore it when there was no affinity specified in the XML config.
If user specified affinity explicitly, libvirt must report an error if
it can't be honoured.

Resolves: https://bugzilla.redhat.com/1819801

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-04 14:44:21 +02:00
Martin Kletzander
49186372db qemu: Allow NBD migration over UNIX socket
Adds new typed param for migration and uses this as a UNIX socket path that
should be used for the NBD part of migration.  And also adds virsh support.

Partially resolves: https://bugzilla.redhat.com/1638889

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Martin Kletzander
e74d627bb3 qemu: Rework starting NBD server for migration
Clean up the semantics by using one extra self-describing variable.
This also fixes the port allocation when the port is specified.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Martin Kletzander
d17ece4dd4 qemu: Rework qemuMigrationSrcConnect
Instead of saving some data from a union up front and changing an overlayed
struct before using said data, let's just set the new values after they are
decided.  This will increase the readability of future commit(s).

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Martin Kletzander
ae200449fe qemu: Use g_autofree in qemuMigrationSrcConnect
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Michal Privoznik
8abd1ffed1 qemu_namespace: Be tolerant to non-existent files when populating /dev
In 6.7.0 release I've changed how domain namespace is built and
populated. Previously it used to be done from a pre-exec hook
(ran in the forked off child, just before dropping all privileges
and exec()-ing QEMU), which not only meant we had to have two
different code paths for creating a node in domain's namespace
(one for this pre-exec hook, the other for hotplug ran from the
daemon), it also proved problematic because it was leaking FDs
into QEMU process.

To mitigate this problem, we've not only ditched libdevmapper
from the NS population process, I've also dropped the pre-exec
code and let the NS be populated from the daemon (using the
hotplug code). But, I was not careful when doing so, because the
pre-exec code was tolerant to files that doesn't exist, while
this new code isn't. For instance, the very first thing that is
done when the new NS is created is it's populated with
@defaultDeviceACL which contain files like /dev/null, /dev/zero,
/dev/random and /dev/kvm (and others).  While the rest will
probably exist every time, /dev/kvm might not and thus the new
code I wrote has to be tolerant to that.

Of course, users can override the @defaultDeviceACL (by setting
cgroup_device_acl in qemu.conf) and remove /dev/kvm (which is
acceptable workaround), but we definitely want libvirt to work
out of the box even on hosts without KVM.

Fixes: 9048dc4e62
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-04 08:18:21 +02:00
Han Han
be28a7fbd6 qemu_validate: Only allow none address for watchdog ib700
Since QEMU 1.5.3, the ib700 watchdog device has no options for address,
and not address in device tree:

$ /usr/libexec/qemu-kvm -version
QEMU emulator version 1.5.3 (qemu-kvm-1.5.3-175.el7), Copyright (c) 2003-2008 Fabrice Bellard
$ /usr/libexec/qemu-kvm -device ib700,\?
$ virsh qemu-monitor-command seabios --hmp info qtree|grep ib700 -A 2
        dev: ib700, id "watchdog0"
        dev: isa-serial, id "serial0"
          index = 0

So only allow it to use none address.

Fixes: 8a54cc1d08
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1509908

Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-02 18:50:38 +02:00
Thomas Huth
f8333b3b0a qemu: Fix domfsinfo for non-PCI device information from guest agent
qemuAgentFSInfoToPublic() currently only sets the devAlias for PCI devices.
However, the QEMU guest agent could also provide the device name in the
"dev" field of the response for other devices instead (well, at least after
fixing another problem in the current QEMU guest agent...). So if creating
the devAlias from the PCI information failed, let's fall back to the name
provided by the guest agent. This helps to fix the empty "Target" fields
that occur when running "virsh domfsinfo" on s390x where CCW devices are
used for the guest instead of PCI devices.

Also add a proper debug message here in case we completely failed to set the
device alias, since this problem here was very hard to debug: The only two
error messages that I've seen were "Unable to get filesystem information"
and "Unable to encode message payload" - which only indicates that something
went wrong in the RPC call. No debug message indicated the real problem, so
I had to learn the hard way why the RPC call failed (it apparently does not
like devAlias left to be NULL) and where the real problem comes from.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1755075
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2020-09-02 17:49:09 +01:00
Thomas Huth
2f5d8ffebe qemu: Do not silently allow non-available timers on non-x86 systems
libvirt currently silently allows <timer name="kvmclock"/> and some
other timer tags in the guest XML definition for timers that do not
exist on non-x86 systems. We should not silently ignore these tags
since the users might not get what they expected otherwise.
Note: The error is only generated if the timer is marked with
present="yes" - otherwise we would suddenly refuse XML definitions
that worked without problems before.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1754887
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-02 18:48:14 +02:00
Michal Privoznik
95b9db4ee2 lib: Prefer WITH_* prefix for #if conditionals
Currently, we are mixing: #if HAVE_BLAH with #if WITH_BLAH.
Things got way better with Pavel's work on meson, but apparently,
mixing these two lead to confusing and easy to miss bugs (see
31fb929eca for instance). While we were forced to use HAVE_
prefix with autotools, we are free to chose our own prefix with
meson and since WITH_ prefix appears to be more popular let's use
it everywhere.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-02 10:28:10 +02:00
Patrick Magauran
69e3381626 qemu: Add e1000e/vmxnet3 IFF_VNET_HDR support
Setting IFF_VNET_HDR for a tap device passes the whole packet to the
host, reducing emulation overhead and improving performance.

Libvirt bases its decision about applying IFF_VNET_HDR to the tap
interface on whether or not the model of the emulated network device
is virtio.  Originally, virtio was the only model to support
IFF_VNET_HDR in QEMU; however, the e1000e & vmxnet3 adapters have also
supported it since their introductions - QEMU commit
786fd2b0f87 for vmxnet3, and QEMU commit 6f3fbe4ed0 for e1000e, so it
should be set for those models too.

Signed-off-by: Patrick Magauran <patmagauran.j@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-09-01 18:48:21 -04:00
Jim Fehlig
9d15647dcb Xen: Add writeFiltering option for PCI devices
By default Xen only allows guests to write "known safe" values into PCI
configuration space, yet many devices require writes to other areas of
the configuration space in order to operate properly. To allow writing
any values Xen supports the 'permissive' setting, see xl.cfg(5) man page.

This change models Xen's permissive setting by adding a writeFiltering
attribute on the <source> element of a PCI hostdev. When writeFiltering
is set to 'no', the Xen permissive setting will be enabled and guests
will be able to write any values into the device's configuration space.
The permissive setting remains disabled in the absense of the
writeFiltering attribute, of if it is explicitly set to 'yes'.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-01 14:29:17 -06:00
Jim Fehlig
2ad009eadd qemu: Check for changes in qemu modules directory
Add a configuration option for specifying location of the qemu modules
directory, defaulting to /usr/lib64/qemu. Then use this location to
check for changes in the directory, indicating that a qemu module has
changed and capabilities need to be reprobed.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-01 14:22:24 -06:00
Ján Tomko
daec478600 Prefer https: for Red Hat websites
The list archives, people.redhat.com and bugzilla all support
https.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
2020-09-01 21:58:46 +02:00
Laine Stump
95089f481e util: assign tap device names using a monotonically increasing integer
When creating a standard tap device, if provided with an ifname that
contains "%d", rather than taking that literally as the name to use
for the new device, the kernel will instead use that string as a
template, and search for the lowest number that could be put in place
of %d and produce an otherwise unused and unique name for the new
device. For example, if there is no tap device name given in the XML,
libvirt will always send "vnet%d" as the device name, and the kernel
will create new devices named "vnet0", "vnet1", etc. If one of those
devices is deleted, creating a "hole" in the name list, the kernel
will always attempt to reuse the name in the hole first before using a
name with a higher number (i.e. it finds the lowest possible unused
number).

The problem with this, as described in the previous patch dealing with
macvtap device naming, is that it makes "immediate reuse" of a newly
freed tap device name *much* more common, and in the aftermath of
deleting a tap device, there is some other necessary cleanup of things
which are named based on the device name (nwfilter rules, bandwidth
rules, OVS switch ports, to name a few) that could end up stomping
over the top of the setup of a new device of the same name for a
different guest.

Since the kernel "create a name based on a template" functionality for
tap devices doesn't exist for macvtap, this patch for standard tap
devices is a bit different from the previous patch for macvtap - in
particular there was no previous "bitmap ID reservation system" or
overly-complex retry loop that needed to be removed. We simply find
and unused name, and pass that name on to the kernel instead of
"vnet%d".

This counter is also wrapped when either it gets to INT_MAX or if the
full name would overflow IFNAMSIZ-1 characters. In the case of
"vnet%d" and a 32 bit int, we would reach INT_MAX first, but possibly
someday someone will change the name from vnet to something else.

(NB: It is still possible for a user to provide their own
parameterized template name (e.g. "mytap%d") in the XML, and libvirt
will just pass that through to the kernel as it always has.)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:44 -04:00
Laine Stump
d7f38beb2e util: replace macvtap name reservation bitmap with a simple counter
There have been some reports that, due to libvirt always trying to
assign the lowest numbered macvtap / tap device name possible, a new
guest would sometimes be started using the same tap device name as
previously used by another guest that is in the process of being
destroyed *as the new guest is starting.

In some cases this has led to, for example, the old guest's
qemuProcessStop() code deleting a port from an OVS switch that had
just been re-added by the new guest (because the port name is based on
only the device name using the port). Similar problems can happen (and
I believe have) with nwfilter rules and bandwidth rules (which are
both instantiated based on the name of the tap device).

A couple patches have been previously proposed to change the ordering
of startup and shutdown processing, or to put a mutex around
everything related to the tap/macvtap device name usage, but in the
end no matter what you do there will still be possible holes, because
the device could be deleted outside libvirt's control (for example,
regular tap devices are automatically deleted when the qemu process
terminates, and that isn't always initiated by libvirt but could
instead happen completely asynchronously - libvirt then has no control
over the ordering of shutdown operations, and no opportunity to
protect it with a mutex.)

But this only happens if a new device is created at the same time as
one is being deleted. We can effectively eliminate the chance of this
happening if we end the practice of always looking for the lowest
numbered available device name, and instead just keep an integer that
is incremented each time we need a new device name. At some point it
will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15
character limit if nothing else), and we can't guarantee that the new
name really will be the *least* recently used name, but "math"
suggests that it will be *much* less common that we'll try to re-use
the *most* recently used name.

This patch implements such a counter for macvtap/macvlan, replacing
the existing, and much more complicated, "ID reservation" system. The
counter is set according to whatever macvtap/macvlan devices are
already in use by guests when libvirtd is started, incremented each
time a new device name is needed, and wraps back to 0 when either
INT_MAX is reached, or when the resulting device name would be longer
than IFNAMSIZ-1 characters (which actually is what happens when the
template for the device name is "maccvtap%d"). The result is that no
macvtap name will be re-used until the host has created (and possibly
destroyed) 99,999,999 devices.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:36 -04:00
Michal Privoznik
fc19155819 qemu: Validate memory hotplug in domainValidateCallback instead of cmd line generator
When editing a domain with hotplug enabled, I removed the only
NUMA node it had and got no error. I got the error later though,
when starting the domain. This is not as user friendly as it can
be. Move the validation call out from command line generator and
into domain validator (which is called prior to starting cmd line
generation anyway).

When doing this, I had to remove memory-hotplug-nonuma xml2xml
test case because there is no way the test case can succeed,
obviously.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-01 09:30:27 +02:00
Daniel Henrique Barboza
2ba0b7497c virhostcpu.c: skip non x86 hosts in virHostCPUGetMicrocodeVersion()
Non-x86 archs does not have a 'microcode' version like x86. This is
covered already inside the function - just return 0 if no microcode
is found. Regardless of that, a read of /proc/cpuinfo is always made.
Each read will invoke the kernel to fill in the CPU details every time.

Now let's consider a non-x86 host, like a Power 9 server with 128 CPUs.
Each /proc/cpuinfo read will need to fetch data for each CPU and it
won't even matter because we know beforehand that PowerPC chips don't
have microcode information.

We can do better for non-x86 hosts by skipping this process entirely.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-25 19:44:39 +02:00
Ján Tomko
52cd849e62 VIR_XPATH_NODE_AUTORESTORE: remove semicolon from users
Since the macro no longer includes the 'ignore_value'
statement, stop putting another empty statement after it.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:12 +02:00
Ján Tomko
96b4f38603 Move debug statements after declarations
Many of our functions start with a DEBUG statement.
Move the statements after declarations to appease
our coding style.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00
Ján Tomko
0a37e0695b Split declarations from initializations
Split those initializations that depend on a statement
above them.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00
Ján Tomko
a5152f23e7 Move declarations before statements
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00
Peter Krempa
14b895ad3a qemuMigrationCapsToJSON: Refactor capability object formatting
Use virJSONValueObjectCreate rather than creating the object
piece-by-piece and use new accessors for bitmap to simplify the code.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-08-25 08:24:34 +02:00
Roman Bogorodskiy
9375bc7373 conf: allow to map sound device to host device
Introduce a new device element "<audio>" which allows
to map guest sound device specified using the "<sound>"
element to specific audio backend.

Example:

  <sound model='ich7'>
     <audio id='1'/>
  </sound>
  <audio id='1' type='oss'>
     <input dev='/dev/dsp0'/>
     <output dev='/dev/dsp0'/>
  </audio>

This block maps to OSS audio backend on the host using
/dev/dsp0 device for both input (recording)
and output (playback).

OSS is the only backend supported so far.

Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-08-25 08:42:16 +04:00
Roman Bogorodskiy
9499521718 conf: add 'ich7' sound model
Add 'ich7' sound model. This is a preparation for sound support in
bhyve, as 'ich7' is the only model it supports.

Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-08-25 08:42:16 +04:00
Laine Stump
5cad64ec03 qemu: remove unreachable code in qemuProcessStart()
Back when the original version of this chunk of code was added (commit
41b087198 in libvirt-0.8.1 in April 2010), we used virExecDaemonize()
to start the qemu process, and would continue on in the function
(which at that time was called qemudStartVMDaemon()) even if a -1 was
returned. So it was possible to get to this code with rv == -1 (it was
called "ret" in that version of the code).

In modern libvirt code, qemu is started with virCommandRun(); then we
call virPidFileReadPath(); those are the only two ways of setting "rv"
prior to this code being removed, and in either case if the new value
of rv < 0, then we immediately skip over the rest of the code to the
cleanup: label.

This means that the code being removed by this patch is
unreachable.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-08-24 23:46:51 -04:00