Commit Graph

13705 Commits

Author SHA1 Message Date
Jim Fehlig
b0dc8a923d qemu: conf: Improve the foo_image_format setting descriptions
The current description of the various foo_image_format settings can
be construded to imply the setting is only used to control compression
of the image. Improve the documentation to clarify that format describes
the representation of guest memory blocks on disk, which includes
compression among other possible layouts.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2024-10-09 13:48:38 -06:00
Peter Krempa
aa08a30048 qemu: snapshot: Allow internal snapshots with PFLASH nvram
With the new snapshot QMP command we can select which block device
backend receives the VM state and thus the main issue with internal
snapshots with pflash was addressed.

Thus we can relax the check and allow snapshots if the pflash nvram is
on qcow2.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 16:00:43 +02:00
Peter Krempa
8be8b7de78 qemuSnapshotActiveInternalDeleteGetDevices: Add warning when deleting inconsistent snapshot
As explained in the commit which added the new internal snapshot
deletion code we don't want to do any form of strict checking whether
the libvirt metadata is consistent with the on-disk state as we didn't
historically do that.

In order to be able to spot the cases add a warning into the logs if
such state is encountered. While warnings are easy to miss it's the only
reasonable way to do that. Users will be encouraged to file an issue
with the information, without requiring them to enable debug logs as
the reproduction of that issue may include very old historical state.

The checker is deliberately added separately so that it can be easily
reverted once it's no longer needed.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 16:00:43 +02:00
Peter Krempa
eac1a86f72 qemu snapshot: use QMP snapshot-delete for internal snapshots deletion
Switch to using the modern QMP command.

As the user visible logic when deleting internal snapshots using the old
'delvm' command was very lax in terms of catching inconsistencies
between the snapshot metadata and on-disk state we re-implement this
behaviour even using the new command. We could improve the validation
but that'd go at the cost of possible failures which users might not
expect.

As 'delvm' was simply ignoring any kind of failure the selection of
devices to delete the snapshot from is based on querying qemu first
which top level images do have the internal snapshot and then continuing
only on those.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 16:00:43 +02:00
Nikolai Barybin via Devel
b93af62c40 qemu snapshot: use QMP snapshot-save for internal snapshots creation
The usage of HMP commands are highly discouraged by qemu. Moreover,
current snapshot creation routine does not provide flexibility in
choosing target device for VM state snapshot.

This patch makes use of QMP commands snapshot-save and by
default chooses first writable non-shared qcow2 disk (if present)
as target for VM state.

Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 16:00:43 +02:00
Peter Krempa
6d8ae98fa0 qemu: monitor: Store internal snapshot names from 'query-named-block-nodes'
Store the names of internal snapshots present in supported images in the
data we dump from 'query-named-block-nodes' so that the upcoming changes
to the internal snapshot code can access it.

To test this we use the bitmap detection test cases which can be easily
extended to dump this data.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 16:00:43 +02:00
Nikolai Barybin via Devel
9df1453db8 qemu: capabilities: Introduce QEMU_CAPS_SNAPSHOT_INTERNAL_QMP capability
The 'snapshot-save/delete' QMP commands were introduced in QEMU 6.0.0,
so we add a compatible capability to check if target QEMU binary supports it.

Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 15:22:00 +02:00
Nikolai Barybin via Devel
ce4ed8deef qemu: blockjob: Add job types for 'snapshot-save/delete'
The snapshot creation/deletion QMP commands use the qemu 'job' API
to signal completion thus we need to add corresponding job types.

As the job handles everything internally we don't store anything about
the job.

Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 15:22:00 +02:00
Nikolai Barybin via Devel
5d0773633a qemu: monitor: Add plumbing for 'snaphot-save'/'snapshot-delete' QMP commands
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 15:22:00 +02:00
Peter Krempa
2e325804cc qemuDomainObjWait: Annotate with G_GNUC_WARN_UNUSED_RESULT
Callers must handle the return value of this function as the VM might
have died. Add compiler annotation to force it.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-09 15:22:00 +02:00
Jiri Denemark
93d97d8fa2 cpu_map: Drop vmx-invvpid-single-context from CPU models
QEMU calls the same feature differently, but translating the names in
libvirt does not make sense because the name in QEMU conflicts with
another feature. QEMU will not change the name for compatibility reasons
so we can just drop our invented name as it is not supported by QEMU.
Apart from this slightly different reason behind the feature being
unsupported by QEMU the situation is similar to vmx-ept-{uc,wb} dropped
in the previous patch and so is the implications.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-10-09 14:46:51 +02:00
Jiri Denemark
b1d4196580 cpu_map: Drop vmx-ept-{uc,wb} features from CPU models
Although QEMU knows and enables the corresponding MSR bits, it does not
allow users to configure them (there are no names attached to them).
They should have never been added to the CPU map and definitely not to
CPU models as the features will always be considered disabled regardless
on their actual state as QEMU will not report them.

While we cannot drop them completely for backward compatibility, we can
at least remove them from all CPU models.

This is effectively no change for CPU models where the features were
marked with added='yes' because migration source would always remove the
features from domain XML so not adding them to the live XML does not
hurt. On the other side the destination could not ever be surprised by
the features being suddenly enabled as QEMU never reports them, which
means libvirt considers them disabled all the time.

GraniteRapids CPU model is the only one which contains the feature ever
since it was introduced in libvirt, but it was never possible to migrate
a domain with such CPU. The source would always mark vmx-ept-wb as
disabled and the destination without the fixes in this series would drop
the feature from the XML completely as it is unsupported by QEMU and
disabled, but when probing for the actual CPU created by QEMU libvirt
would expect the feature to be enabled (as it is included in the CPU
model and not explicitly mentioned in the domain definition) and fail
the migration. There's nothing the source could do to workaround the
behavior on the destination and migration to older libvirt will still be
broken. But it's possible to migrate a domain with GraniteRapids to a
destination with this series applied from both old and new source.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-10-09 14:46:51 +02:00
Jiri Denemark
29aa9b02aa qemu: Replace big condition in virQEMUCapsCPUFilterFeatures with array
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-10-09 14:46:51 +02:00
Jiri Denemark
98700d354b qemu: Translate vmx-invvpid-single-context-noglobals CPU feature
This feature is called "vmx-invept-single-context-noglobals" in QEMU and
our CPU map even contains the appropriate alias. But we failed to
actually translate the name when talking to QEMU.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-10-09 14:46:51 +02:00
Jiri Denemark
00e55059e6 qemu: Do not drop unknown CPU features from domain XML
CPU features with policy='disable' which are unknown to QEMU may be
safely skipped when generating the -cpu command line, but we should
still keep them in the domain definition so that we can properly check
they are disabled after migrating the domain to a newer QEMU.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-10-09 14:46:51 +02:00
Jiri Denemark
aae8a5774b qemu: Drop vmx-* from migratable CPU model only when origCPU is set
When qemuDomainMakeCPUMigratable is called with origCPU == NULL the code
just removed all vmx-* features marked as added in the specified CPU
model just like when origCPU is not NULL, but does not list any of the
vmx-* features. But this is wrong, we should not touch these features at
all when no origCPU is supplied, which happens when parsing XML passed
by a user (e.g., migration XML). Such XML is supposed to be generated by
libvirt as migration XML and contains only vmx-* features explicitly
requested by a user.

https://issues.redhat.com/browse/RHEL-52314

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-10-09 14:46:50 +02:00
Andrea Bolognani
7d6759135e qemu: Handle locking of TPM state directory for incoming migration
By not attempting to lock the lock file, which would fail.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-10-03 14:50:06 +02:00
Andrea Bolognani
454219ad6c security: Allow skipping locking when labeling lock files
This is needed when migrating a guest that has persistent TPM
state: relabeling (which implies locking) needs to happen
before the swtpm process is started on the destination host,
but the lock file won't be released by the swtpm process
running on the source host before a handshake with the target
process has happened, creating a catch-22 scenario.

In order to make migration possible, make it so that locking
for lock files can be explicitly skipped. All other state
files are handled as usual.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-10-03 13:29:59 +02:00
Peter Krempa
3bfcb35dd5 qemu: migration: Don't remember seclabel for images shared from current host
In case when the user exports images from current host and there is an
incoming migration from a remote host, security label remembering would
be possible but would attempt to remember the label allowing access to
the image as the image is already used by a VM on remote host.

To prevent remembering the wrong label, we'll skip the remembering of
the label for any shared resource, so that the code behaves identically
regardless of how the image is accessed.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2024-10-03 13:29:26 +02:00
Andrea Bolognani
da0c363835 qemu: Always set labels for TPM state
Up until this point, we have avoided setting labels for
incoming migration when the TPM state is stored on a shared
filesystem. This seems to make sense, because since the
underlying storage is shared surely the labels will be as
well.

There's one problem, though: when a guest is migrated, the
SELinux context for the destination process is different from
the one of the source process.

We haven't hit any issues with the current approach so far
because NFS doesn't support SELinux, so effectively it doesn't
matter whether relabeling happens or not: even if the SELinux
contexts of the source and target processes are different,
both will be able to access the storage.

Now that it's possible for the local admin to manually mark
exported directories as shared filesystems, however, things
can get problematic.

Consider the case in which one host (mig-one) exports its
local filesystem /srv/nfs/libvirt/swtpm via NFS, and at the
same time bind-mounts it to /var/lib/libvirt/swtpm; another
host (mig-two) mounts the same filesystem to the same
location, this time via NFS. Additionally, in order to
allow migration in both directions, on mig-one the
/var/lib/libvirt/swtpm directory is listed in the
shared_filesystems qemu.conf option.

When migrating from mig-one to mig-two, things work just fine;
going in the opposite direction, however, results in an error:

  # virsh migrate cirros qemu+ssh://mig-one/system
  error: internal error: QEMU unexpectedly closed the monitor (vm='cirros'):
  qemu-system-x86_64: tpm-emulator: Setting the stateblob (type 1) failed with a TPM error 0x1f
  qemu-system-x86_64: error while loading state for instance 0x0 of device 'tpm-emulator'
  qemu-system-x86_64: load of migration failed: Input/output error

This is because the directory on mig-one is considered a
shared filesystem and thus labeling is skipped, resulting in
a SELinux denial.

The solution is quite simple: remove the check and always
relabel. We know that it's okay to do so not just because it
makes the error seen above go away, but also because no such
check currently exists for disks and other types of persistent
storage such as NVRAM files, which always get relabeled.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-10-03 13:29:26 +02:00
Andrea Bolognani
6952af8b43 qemu: Propagate shared_filesystems
virFileIsSharedFS() is the function that ultimately decides
whether a filesystem should be considered shared, but the list
of manually configured shared filesystems is part of the QEMU
driver's configuration, so we need to pass the information
through several layers in order to make use of it.

Note that with this change the list is propagated all the way
through, but its contents are still ignored, so the behavior
remains the same for now.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-10-03 13:29:26 +02:00
Andrea Bolognani
df3597ee70 qemu: Introduce shared_filesystems configuration option
As explained in the comment, this can help in scenarios where
a shared filesystem can't be detected as such by libvirt, by
giving the admin the opportunity to provide this information
manually.

https://issues.redhat.com/browse/RHEL-35752

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-10-03 13:29:25 +02:00
Peter Krempa
621f879adf qemu: Introduce and wire in 'VIR_MIGRATE_PARAM_MIGRATE_DISKS_DETECT_ZEROES'
The new 'VIR_MIGRATE_PARAM_MIGRATE_DISKS_DETECT_ZEROES' migration
parameter allows users of migration to pass in a list of disks where
zero-detection (which avoids transferring the zeroed-blocks) should be
enabled for the migration connection. This comes at the cost of extra
CPU cycles needed to check each block if it's all-zero.

This is useful for storage backends where information about the
allocation state of a block is not available and thus without this the
image would become fully allocated on the destination.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:57:02 +02:00
Peter Krempa
448b14f74d qemu: migration: Remove 'nmigration_disks' variable from all places
Now that none of the functions need it we can drop it.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:57:02 +02:00
Peter Krempa
aaefaabf5a qemu: migration: Extract validation of disk target list
The migration code is checking the disk list provided via
VIR_MIGRATE_PARAM_MIGRATE_DISKS against existing disks. Extract it to a
helper function as we'll be passing another list of disk targets soon.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:57:02 +02:00
Peter Krempa
4ebf1acb83 qemu: migration: Avoid use of 'nmigration_disks'
'migration_disks' is a NULL-terminated string list, so the code can be
converted to either iterate the string-list, use existing accessors or
check the presence of the pointers instead of checking the count.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:57:02 +02:00
Peter Krempa
d98beef107 qemu: migration: Don't log 'nmigrate_disks'
The actual number of disks to migrate is not important. The presence of
disks to migrate can be inferred from presence of the 'migrate_disks'
pointer which is logged.

Since 'nmigrate_disks' will eventually be removed remove the logging
right now.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:57:02 +02:00
Peter Krempa
ab52a069ee qemuMigrationSrcBeginPhaseBlockDirtyBitmaps: Use qemuMigrationAnyCopyDisk()
The function open-coded the checking whether a disk is being migrated
with non-shared storage and did so badly (not taking into account if
user doesn't explicitly provide list of disks to migrate).

Use the existing helper instead.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:57:02 +02:00
Peter Krempa
165b30e06a qemu: migration: Pre-create QCOW2 images for non-shared storage with 0 allocation
Specify that the <allocation> parameter for the newly-created qcow2
image is 0 so that only metadata gets preallocated. Otherwise the
storage driver code instructs qemu to use 'fallocate' preallocation mode
and considers the image fully allocated.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:57:02 +02:00
Peter Krempa
54109db826 qemu: blockjob: Clean out disk mirror data after concluding the job
The 'disk->mirrorJob' and 'disk->mirrorState' fields need to be cleared
after a blockjob, but should be kept around while 'disk->mirror' is
still in place. As 'disk->mirror' is cleared only after conclusion of
the job in 'qemuBlockJobEventProcessConcluded()' we should be resetting
them only afterwards.

Move the code later, but since the job is unregistered from the disk we
need to store the pointer to the disk before concluding the job.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:54:40 +02:00
Peter Krempa
b85b60d140 qemu: blockjob: Update 'mirror' of a copy job before removing images
When concluding a job with a 'mirror' we first unplugged the appropriate
no-longer used images from qemu and then updated the definition.

Normally this wouldn't be a problem because for any other thread this is
done under the VM lock thus atomic. Unfortunately though, the AppArmor
security backend is using a VM XML to pass data to the helper process
and the state of the definition at that point was unsuitable to format a
valid XML thus making 'virt-aa-helper' report parsing failure.

Since we're removing the images the proper state of the VM definition
indeed should not include the mirror element any more at the point when
the images are removed.

Closes: https://gitlab.com/libvirt/libvirt/-/issues/601
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-10-01 12:54:40 +02:00
Laine Stump
bcd5ae4e73 qemu: fix regression in update-device for interfaces
Commit a37bd2a15b eliminated a failure
to update *any* change in an interface that was connected via a
network that consisted of a pool of VFs using macvtap passthrough
mode. Unfortunately it caused a regression that results in failure to
update changes to bandwidth/vlan/trustGuestRxFilters in any interface
connected via a network that uses a bridge to connect tap devices.

This fixes that problem by narrowing the usage of the fix in the
earlier patch to only be done in the case that the the interface is
connected via a macvtap+passthrough network.

Signed-off-by: Laine Stump <laine@redhat.com>
Fixes: a37bd2a15b
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2024-10-01 10:25:12 +02:00
Michal Privoznik
6126f743b1 qemu: Provide sane default for dump_guest_core
QEMU uses Linux extensions to madvise() to include/exclude guest
memory from core dump. These are obviously not available
everywhere. Currently, users have two options:

  1) configure <memory dumpCore=''/> in domain XML, or
  2) configure dump_guest_core in qemu.conf

While these work, they may harm user experience as "things just
don't work" out of the box. Provide sane default in
virQEMUDriverConfigNew() so neither of two options is required.

To have predictable results in tests, explicitly set
cfg->dumpGuestCore to false in qemuTestDriverInit() (which
creates cfg object for tests).

Resolves: https://gitlab.com/libvirt/libvirt/-/issues/679

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2024-09-25 08:38:09 +02:00
Michal Privoznik
18b61cb4f9 qemu.conf.in: Fix dumpCore capitalization
In qemu.conf.in we give examples of enabling/disabling core
dumps in domain XML. But the attribute is spelled wrong.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2024-09-25 08:38:09 +02:00
Martin Kletzander
6f0974ca32 qemu: Generate domain memory backing path directly
This makes qemuDomainGenerateMemoryBackingPath() nicer to call.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-24 10:12:08 +02:00
Martin Kletzander
f035f24777 qemu: Rename memory path functions
This way they make sense not only based on where they are located but
the name also relates to what they are actually doing.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-24 10:12:08 +02:00
Martin Kletzander
d599fc3d57 qemu: Make qemuGetMemoryBackingDomainPath static
After previous patches it is not used (and should not be used) outside
of qemu_domain.c.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-24 10:12:08 +02:00
Martin Kletzander
ff49d2a8c2 qemu: Use per-domain private memoryBackingDir for new memory backends
The function qemuGetMemoryBackingPath() does not need the @def any more
and priv->memoryBackingDir can be used instead of constructing the path
by calling qemuGetMemoryBackingDomainPath().

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-24 10:12:08 +02:00
Martin Kletzander
f58a4dc9d5 qemu: Set memoryBackingDir in private data upon start
This way we keep the path for each running VM.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-24 10:12:08 +02:00
Martin Kletzander
da8a1d7943 qemu: Add memoryBackingDir to qemuDomainObjPrivate
This way we _can_ (but do not, yet) remember the memory backing path for
running domains even after configuration change and daemon restart.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-24 10:12:08 +02:00
Martin Kletzander
c9a35eb255 qemu: Change parameters of qemuGetMemoryBackingDomainPath()
This way it does not use driver, since it will be later reworked and the
following patches cleaner, hopefully.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-24 10:12:08 +02:00
Martin Kletzander
edcf14be9c qemu: Move domain-related functions to qemu_domain
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-24 10:12:08 +02:00
Laine Stump
c7ea694f7d qemu: rework needBridgeChange/needReconnect decisions in qemuDomainChangeNet()
This patch simplifies (?) the of qemuDomainChangeNet() code while
fixing some incorrect decisions about exactly when it's necessary to
re-attach an interface's bridge device, or to fail the device update
(needReconnect[*]) because the type of connection has changed (or
within bridge and direct (macvtap) type because some attribute of the
connection has changed that can't actually be modified after the
tap/macvtap device of the interface is created).

Example 1: it's pointless to require the bridge device to be
reattached just because the interface has been switched to a different
network (i.e. the name of the network is different), since the new
network could be using the same bridge as the old network (very
uncommon, but technically possible). Instead we should only care if
the name of the *bridge device* changes (or if something in
<virtualport> changes - see Example 3).

Example 2: wrt changing the "type" of the interface, a change should
be allowed if old and new type both used a bridge device (whether or
not the name of the bridge changes), or if old and new type are both
"direct" *and* the device being linked and macvtap mode remain the
same. Any other change in interface type cannot be accommodated and
should be a failure (i.e. needReconnect).

Example 3: there is no valid reason to fail just because the interface
has a <virtualport> element - the <virtualport> could just say
"type='openvswitch'" in both the before and after cases (in which case
it isn't a change by itself, and so is completely acceptable), and
even if the interfaceid changes, or the <virtualport> disappears
completely, that can still be reconciled by simply re-attaching the
bridge device. (If, on the other hand, the modified <virtualport> is
for a type='direct' interface, we can't domodify that, and so must
fail (needReconnect).)

(I tried splitting this into multiple patches, but they were so
intertwined that the intermediate patches made no sense.)

[*] "needReconnect" was a flag added to this function way back in
2012, when I still believed that QEMU might someday support connecting
a new & different device backend (the way the virtual device connects
to the host) to an already existing guest netdev (the virtual device
as it appears to the guest). Sadly that has never happened, so for the
purposes of qemuDOmainChangeNet() "needReconnect" is equivalent to
"fail".

Resolves: https://issues.redhat.com/browse/RHEL-7036
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-19 13:56:39 -04:00
Laine Stump
601f4160b9 qemu: replace open-coded remove/attach bridge with virNetDevTapReattachBridge()
The new function does what the old qemuDomainChangeNetbridge() did
manually, except that:

1) the new function supports changing from a bridge of one type to
   another, e.g. from a Linux host bridge to an OVS
   bridge. (previously that wasn't handled)

2) the new function doesn't emit audit log messages. This is actually
   a good thing, because the old code would just log a "detach"
   followed immediately by "attach" for the same MAC address, so it's
   essentially a NOP. (the audit logs don't have any more detailed
   info about the connection - just the VM name and MAC address, so it
   makes no sense to log the detach/attach pair as it's not providing
   any information).

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-19 13:56:31 -04:00
Laine Stump
a37bd2a15b qemu: prevent unnecessarily failing live interface update
Attempts to use update-device to modify just the link state of a guest
interface were failing due to a supposed attempt to modify something
in the interface that can't be modified live (even though the only
thing that was changing was the link state, which *can* be modified
live).

It turned out that this failure happened because the guest interface
in question was type='network', and the network in question was a
'direct' network that provides each guest interface with one device
from a pool of network devices. As a part of qemuDomainChangeNet() we
would always allocate a new port from the network driver for the
updated interface definition (by way of calling
virDomainNetAllocateActualDevice(newdev)), and this new port (ie the
ActualNetDef in newdev) would of course be allocated a new host device
from the pool (which would of course be different from the one
currently in use by the guest interface (in olddev)). Because direct
interfaces don't support changing the host device in a live update,
this would cause the update to fail.

The solution to this is to realize that as long as the interface
doesn't get switched to a different network as a part of the update,
the network port information (ie the ActualNetDef) will not change as
a part of updating the guest interface itself. So for sake of
comparison we can just point the newdev at the ActualNetDef of olddev,
and then clear out one or the other when we're done (to avoid a double
free or, more likely, attempt to reference freed memory).

(If, on the other hand, the name of the network has changed, or if the
interface type has changed to type='network' from something else, then
we *do* need to allocate a new port (actual device) from the network
driver (as we used to do in all cases when the new type was
'network'), and also indicate that we'll need to replace olddev in the
domain with newdev (because either of these changes is major enough
that we shouldn't just try to fix up olddev)

Partially-Resolves: https://issues.redhat.com/browse/RHEL-7036
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-19 13:56:06 -04:00
Peter Krempa
852380cef5 qemuBuildChardevCommand: Remove unused variable
'charstr' is unused since 36d06a5637, breaking the build on some
platforms. Remove it.

Fixes: 36d06a5637
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2024-09-19 13:12:02 +02:00
Peter Krempa
24d468993c qemu: Reject unsupported chardev backend protocols
QEMU supports only 'raw' and 'telnet' in the

 <protocol type='telnets'/>

element. Reject 'telnets' and 'tls'. TLS transport for qemu chardevs is
configured via "tls='yes'" attribute added to the "<source>" element
instead, so this prevents potential misconfig as the value would be
silently accepted.

Closes: https://gitlab.com/libvirt/libvirt/-/issues/412
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-09-19 10:30:15 +02:00
Peter Krempa
2256466f70 qemu: monitor: Remove the old chardev backend generator
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-09-19 10:30:15 +02:00
Peter Krempa
e352a692a7 qemu: Use the new chardev backend JSON props generator also in the monitor
Now that we have a unified generator of chardev backend which is also
validated against the QMP schema we can replace the old generator with
it.

This patch modifies the monitor code to take virJSONValue 'props'
instead of the chardev definition and adds the conversion from the
chardev definition to JSON on higher levels.

The monitor code now also attempts to extract the returned 'pty' if
returned from qemu, so higher level code needs to report the error if
the path is needed and missing.

The current monitor generator is for now abandoned in place and will be
removed later.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-09-19 10:30:15 +02:00
Peter Krempa
d897ad2b89 qemu: Move check for chardev backends which can't be hotplugged out of the monitor
The upcoming refactor of the monitor code will make the hotplug code
paths use the same generator we have for commandline -chardev backends
which doesn't refuse to format certain backends which can't be
hotplugged.

To prepare for this we add a check to qemuHotplugChardevAttach()
refusing such hotplug and remove 'qemumonitorjsontest' test cases which
will not make sense any more.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-09-19 10:30:14 +02:00
Peter Krempa
36d06a5637 qemu: Introduce unified chardev backend config generator
Similarly to how we approach the generators for
-device/-object/-blockdev/-netdev rewrite the generator of -chardev to
be unified with the generator for the monitor.

Unfortunately with -chardev it will be a bit more quirky when compared
to the others as the generator itself will need to know whether it
generates command line output or not as a few field names change and data
is nested differently.

This first step adds the generator and uses it only for command line
generation. This was possible to achieve without changing any of the
output in tests.

In further patches the same generator will then be used also in the
monitor code replacing both.

As basis for the generator I took the monitor code but modified it to
have the same field order as the commandline code and extended it
further to support all backend types, even those which are not
hotpluggable.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-09-19 10:30:14 +02:00
Peter Krempa
9c88a566d8 qemu: capabilities: Explain that QEMU_CAPS_CHARDEV_JSON will be used in tests only
I've added that capability a long time ago when I was converting various
stuff to use JSON but the support in '-chardev' didn't yet materialize.

Fix the comment to make that clear and also that it'll be used in tests
for the upcoming refactor of the chardev code (so that we can validate
generator against the schema even if that doesn't yet work).

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-09-19 10:30:14 +02:00
Peter Krempa
ecffc91d02 qemuBackupDiskDataCleanupOne: Don't skip rest of cleanup if we can't enter monitor
Recent fix to use the proper 'async' monitor function would cause
libvirt to leak some of the objects it's supposed to clean up in other
places besides qemu.

Don't skip the whole function on failure to enter the job but just the
monitor section.

Fixes: 9b22c25548
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-06 18:14:34 +02:00
Peter Krempa
9b22c25548 qemu: backup: Use 'async' monitor in 'qemuBackupDiskDataCleanupOne'
'qemuBackupDiskDataCleanupOne()' is entering the monitor while we're in
the async backup job inside 'qemuBackupBegin()' which is semantically
wrong and per upstream report causes crashes if some monitoring commands
are run in parallel.

Use qemuDomainObjEnterMonitorAsync() instead.

Closes: https://gitlab.com/libvirt/libvirt/-/issues/668
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-09-05 15:52:26 +02:00
Peter Krempa
61c8a7180e qemuProcessSetupRawIO: Refactor return value and remove useless #ifdef
The function can return directly rather than setting 'ret' as there's no
cleanup.

It also doesn't make sense to conditionally compile out the 'break'
statement when checking whether a disk has rawio enabled if
'CAP_SYS_RAWIO' is _not_ defined as the function will still behave the
same.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-09-05 15:24:55 +02:00
Andrea Bolognani
ad92468924 qemu: Use pvpanic by default on aarch64
pvpanic-pci is the only reasonable implementation of a panic
device for aarch64/virt guests. Right now we're asking users to
provide the model name manually, but we can be more helpful and
fill it in automatically instead.

With this change, the aarch64-panic-no-model test no longer
fails and so it's no longer useful to us. Instead, we can amend
the aarch64-virt-default-models test case to include panic
coverage, something that until now wasn't possible.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-03 14:08:34 +02:00
Andrea Bolognani
6d92185a49 qemu: Sometimes the default panic model doesn't exist
Right now the fallback behavior is to use MODEL_ISA if we
haven't been able to find a better match, but that's not very
useful as we're still going to hit an error later, when
QEMU_CAPS_DEVICE_PANIC is not found at Validate time.

Instead of doing that, allow MODEL_DEFAULT to get all the
way to Validate and report an error upon encountering it.

The reported error changes slightly, but other than that the
set of configurations that are allowed and blocked remains
the same.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-03 14:07:47 +02:00
Andrea Bolognani
9e1970efa5 qemu: Refactor default panic model
Perform decisions based on the architecture and machine type
in a single place instead of duplicating them.

This technically adds new behavior for MODEL_ISA in
qemuDomainDefAddDefaultDevices(), but it doesn't make any
difference functionally since we don't set addPanicDevice
outside of ppc64(le) and s390(x). If we did, the lack of
handling for that value would be a latent bug.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-09-03 14:06:11 +02:00
Martin Kletzander
ac05dc8d4f qemu_driver: Fix indentation
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2024-09-03 13:13:58 +02:00
Kamil Szczęk
76f6caee3c qemu: Fix a few comments
Fixes: d292c5ba17
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2024-08-29 13:52:12 +02:00
Andrea Bolognani
725afb4e7b qemu: Expose availability of PS/2 feature in domcaps
This advertises the feature only for the architectures and
machine types where it can actually be used.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-29 09:44:53 +02:00
Andrea Bolognani
e0e496d90c qemu: Change signature for virQEMUCapsSupportsI8042Toggle()
We will soon need to use it in a context where we don't have
a virDomainDef handy.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-29 09:44:51 +02:00
Andrea Bolognani
d292c5ba17 qemu: Export a few functions
We're going to need them in a minute.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-29 09:44:46 +02:00
Praveen K Paladugu
af87ee7927 hypervisor: Move domain interface mgmt methods
From: Praveen K Paladugu <prapal@linux.microsoft.com>

Move methods to connect domain interfaces to host bridges to hypervisor.
This is to allow reuse between qemu and ch drivers.

Signed-off-by: Praveen K Paladugu <praveenkpaladugu@gmail.com>
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-26 16:10:04 +02:00
Tim Wiederhake
7b6702d516 hyperv: Support hv-xmm-input enlightenment
qemu supports this enlightenment since version 7.10.

From the qemu commit:
    Hyper-V specification allows to pass parameters for certain hypercalls
    using XMM registers ("XMM Fast Hypercall Input"). When the feature is
    in use, it allows for faster hypercalls processing as KVM can avoid
    reading guest's memory.

Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2024-08-26 11:48:15 +02:00
Tim Wiederhake
0313a500a9 hyperv: Support hv-emsr-bitmap enlightenment
qemu supports this enlightenment since version 7.10.

From the qemu commit:
    The newly introduced enlightenment allow L0 (KVM) and L1 (Hyper-V)
    hypervisors to collaborate to avoid unnecessary updates to L2
    MSR-Bitmap upon vmexits.

Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2024-08-26 11:48:15 +02:00
Michal Privoznik
0888784f38 qemu: Use virEventThreadStop() in qemuProcessStop()
Currently, qemuProcessStop() unlocks given domain object right in
the middle of cleanup process. This is dangerous because there
might be another thread which is executing virDomainObjListAdd().
And since the domain object is on the list of domain objects AND
by the time qemuProcessStop() unlocks it the object is also
marked as inactive, the other thread acquires the lock and
switches vm->def pointer.

The unlocking of domain object is needed though, to allow even
processing thread finish its queue. Well, the processing can be
done before any cleanup is attempted.

Therefore, use freshly introduced virEventThreadStop() to join
the event thread and drop lock/unlock from the middle of
qemuProcessStop().

Now, there's a comment being removed that mentions
qemuDomainObjStopWorker() and why it has to be called only after
the domain is marked as dead. This comment is no longed
applicable because call to qemuDomainObjStopWorker() is removed
also. Moreover, priv->beingDestroyed is set to true before
unlocking the domain object, thus any event processing callback
is going to see the domain being destroyed and can chose to
either exit early or finish processing event.

Fixes: 3865410e7f
Resolves: https://issues.redhat.com/browse/RHEL-49607
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-08-22 13:33:09 +02:00
Kamil Szczęk
a9a5f8ef39 qemu: Introduce the 'ps2' feature
This introduces a new 'ps2' feature which, when disabled, results in
no implicit PS/2 bus input devices being automatically added to the
domain and addition of the 'i8042=off' machine option to the QEMU
command-line.

A notable side effect of disabling the i8042 controller in QEMU is that
the vmport device won't be created. For this reason we will not allow
setting the vmport feature if the ps2 feature is explicitly disabled.

Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-21 17:10:51 +02:00
Kamil Szczęk
9eb3c28323 qemu_capabilities: Introduce QEMU_CAPS_MACHINE_I8042_OPT
This capability tells us whether given QEMU binary supports the
'-machine xxx,i8042=on/off' toggle used to enable/disable PS/2
controller emulation.

A few facts:
- This option was introduced in QEMU 7.0 and defaults to 'on'
- QEMU versions before 7.0 enabled i8042 controller emulation implicitly
- This option (and i8042 controller emulation itself) is only supported
by descendants of the generic PC machine type (e.g. i440fx, q35, etc.)

Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-21 17:10:51 +02:00
Kamil Szczęk
51521d13a8 qemu: Improve PS/2 controller detection
Up until now, we've assumed that all x86 machines have a PS/2
controller built-in. This assumption was correct until QEMU v4.2
introduced a new x86-based machine type - microvm.

Due to this assumption, a pair of unnecessary PS/2 inputs are implicitly
added to all microvm domains. This patch fixes that by whitelisting
machine types which are known to include the i8042 PS/2 controller.

Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-21 17:10:51 +02:00
Peter Krempa
62d6e8dcb2 qemu: validate: Reject empty USB disks
Attempting to start qemu with or hotplug an empty 'usb-storage' based
disk results in the following error:

 qemu-system-x86_64: -device {"driver":"usb-storage","bus":"usb.0","port":"2","id":"usb-disk1","removable":true}: drive property not set

Reject such config at validation step and adjust tests.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2024-08-21 15:49:36 +02:00
Peter Krempa
204013d4aa qemu: block: Allow NULL 'data' in 'qemuBlockStorageSourceChainDetach'
Some code paths, such as if hotplug of an empty cdrom fails can cause
that 'qemuBlockStorageSourceChainDetach' will be called with 'NULL'
@data as there is no backend for the disk.

The above case became possible once we allowed hotplug of cdroms and
subsequently fixed the case when users would hotplug an empty cdrom
which ultimately caused the possibility of having no backend in the
hotplug code path which was not possible before (see 'Fixes:' below and
also the commit linked from there).

Make 'qemuBlockStorageSourceChainDetach' tolerate NULL @data by simply
returning early.

Fixes: 894c6c5c16
Resolves: https://issues.redhat.com/browse/RHEL-54550
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2024-08-21 15:49:36 +02:00
Michal Privoznik
ab7f877f27 lib: Use NULLSTR family of macros more
There is a family of convenient macros: NULLSTR, NULLSTR_EMPTY,
NULLSTR_STAR, NULLSTR_MINUS which hides ternary operator.

Generated using the following spatch (and its obvious variants):

  @@
  expression s;
  @@
    <+...
  - s ? s : "<null>"
  + NULLSTR(s)
    ...+>

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-08-19 13:44:12 +02:00
Peter Krempa
b3edf03c31 qemu: hotplug: Rollback FD passthrough for 'slirpfd' and 'vdpafd' on hotplug failure
On failure to plug the device the cleanup path didn't roll back the FD
passing to qemu thus qemu would hold the FDs indefinitely.

Resolves: https://issues.redhat.com/browse/RHEL-53964
Fixes: b79abf9c3c (vdpafd)
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-08-13 16:34:47 +02:00
Sandesh Patel
6d7dd09e8a qemu: format dma-translation on intel-iommu command line
Add dma-translation attribute to qemu command line if specified in
domain conf.

Signed-off-by: Sandesh Patel <sandesh.patel@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-09 10:05:56 +02:00
Sandesh Patel
6866f958c1 conf: add dma_translation attribute to iommu
Add dma_translation attribute to iommu to enable/disable dma traslation
for intel-iommu

Signed-off-by: Sandesh Patel <sandesh.patel@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-09 10:05:56 +02:00
Sandesh Patel
b2cc19e5fd qemu_capabilities: Introduce QEMU_CAPS_INTEL_IOMMU_DMA_TRANSLATION
This capability tracks whether intel-iommu device has
dma-translation attribute.

Signed-off-by: Sandesh Patel <sandesh.patel@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-09 10:05:56 +02:00
Jiri Denemark
11f6773f19 qemu: Avoid false failure when resuming post-copy migration
Depending on timing between QEMU and libvirt an attempt to resume failed
post-copy migration could immediately report a failure in post-copy
phase again even though the migration actually resumed and is
progressing just fine.

This is caused by QEMU reporting the original migration state (i.e.,
postcopy-paused) until migration is successfully resumed and QEMU
switches to postcopy-active. QEMU 9.1 introduced a new
postcopy-recover-setup migration state which is entered immediately
after requesting migration to be resumed and we can reliably wait for
the migration to either continue or fail without being confused by the
old state.

https://issues.redhat.com/browse/RHEL-22166

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-08 16:27:13 +02:00
Jiri Denemark
79e0b50bb6 qemu: Add support for postcopy-recover-setup migration state
This patch adds support for recognizing the new migration state reported
by QEMU when post-copy recovery is requested. It is not actually used
for anything yet.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-08-08 16:27:13 +02:00
Peter Krempa
4ba4f659e4 qemu_domain: Strip <acpi/> from s390(x) definitions
The s390(x) machines never supported ACPI. That didn't stop users
enabling ACPI in their config. As of libvirt-9.2 (98c4e3d073) with new
enough qemu we reject configs which require ACPI, but qemu can't satisfy
it.

This breaks migration of existing VMs with the old wrong configs to new
libvirt installations.

To address this introduce a post-parse fixup removing the ACPI flag
specifically for s390 machines which do enable it in the definition.

The advantage of doing it in post-parse, rather than simply relaxing the
ABI stability check to allow users providing an fixed XML when migrating
(allowing change of the ACPI flag for s390 in ABI stability check, as it
 doesn't impact ABI), is that only the destination installation needs to
be patched in order to preserve migration.

To mitigate the disadvantage of simply stripping it from all s390(x)
configs the hack is not applied when defining or starting a new domain
from the XML, to preserve the error about unsupported configuration.

Resolves: https://issues.redhat.com/browse/RHEL-49516
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
2024-08-06 15:12:14 +02:00
Michal Privoznik
d913f204e0 qemu: Pre-create pstore device file
So far we are relying on QEMU or sysadmin to create the file for
pstore. This is suboptimal as in the case of the former we can
not set proper seclabels (there's nothing to set seclabels on
until QEMU is started).

Therefore, make sure the file is created before launching QEMU
and that it has the correct size.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2024-07-31 14:15:48 +02:00
Michal Privoznik
de355b7873 qemu: Autofill pstore path if missing
Introduced only a couple of commits ago (in
v10.5.0-84-g90e50e67c6) the pstore device acts as a nonvolatile
storage, where guest kernel can store information about crashes.
This device, however, expects a file in the host from which the
crash data is read. So far, we expected users to provide a path,
but we can autogenerate one if missing. Just put it next to
per-domain's NVRAM stores.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2024-07-30 17:22:00 +02:00
Michal Privoznik
3cfe4caa0a qemu: Build cmd line for pstore device
Nothing special going on here.

Resolves: https://issues.redhat.com/browse/RHEL-24746
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
2024-07-25 16:04:50 +02:00
Michal Privoznik
90e50e67c6 conf: Introduce pstore device
The aim of pstore device is to provide a bit of NVRAM storage for
guest kernel to record oops/panic logs just before the it
crashes. Typical usage includes usage in combination with a
watchdog so that the logs can be inspected after the watchdog
rebooted the machine. While Linux kernel (and possibly Windows
too) support many backends, in QEMU there's just 'acpi-erst'
device so stick with that for now. The device must be attached to
a PCI bus and needs two additional values (well, corresponding
memory-backend-file needs them): size and path. Despite using
memory-backend-file this does NOT add any additional RAM to the
guest and thus I've decided to expose it as another device type
instead of memory model.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
2024-07-25 16:04:50 +02:00
Michal Privoznik
4a9c2d9bbe qemu_capabilities: Introduce QEMU_CAPS_DEVICE_ACPI_ERST
This capability tracks whether QEMU has acpi-erst device.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
2024-07-25 16:04:50 +02:00
Ján Tomko
8d3b239737 qemu: virtiofs: cache: use 'never' instead of 'none'
The new option style renamed one of the cache modes.

https://issues.redhat.com/browse/RHEL-50329

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-25 13:41:46 +02:00
Boris Fiuczynski
e62c26a20d qemu: add a monitor to /proc/$pid when killing times out
In cases when a QEMU process takes longer than the time sigterm and
sigkill are issued to kill the process do not simply fail and leave the
VM in state VIR_DOMAIN_SHUTDOWN until the daemon stops. Instead set up
an fd on /proc/$pid and get notified when the QEMU process finally has
terminated to cleanup the VM state.

Resolves: https://issues.redhat.com/browse/RHEL-28819
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-24 13:16:02 +02:00
Kristina Hanicova
e5eb64e9fd qemu_hotplug: Do not allow absent values in rom settings
If there are absent values in an already existing element
specifying rom settings, we simply use the old ones. This
behaviour is not desired, as users might think that deleting the
element from XML would delete the setting (because the hotplug
succeeds) - which does not happen. Because of that, we should not
accept an interface without elements that cannot be changed.

Therefore, we should not allow absent values for already existing
rom setting during hotplug.

Resolves: https://issues.redhat.com/browse/RHEL-7109
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-24 13:07:20 +02:00
Adam Julis
b53e9f834b virtiofs: rename member to 'openfiles' for clarity
New element 'openfiles' had confusing name. Since the patch with
this new element wasn't propagate yet, old name ('rlimit_nofile')
was changed.

...
<binary>
  <openfiles max='122333'/>
</binary>
...

Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-24 12:48:16 +02:00
Andrea Bolognani
47d34ffb26 qemu: ROM firmware images are always readonly
By definition. Accordingly, filter them out when looking for
a read/write image.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-07-19 15:18:39 +02:00
Andrea Bolognani
f13b3f8098 qemu: Filter firmware images by type
If the configuration explicitly requests a specific type of
firmware image, be it pflash or ROM, we should ignore all images
that are not of that type.

If no specific type has been requested, of course, any type is
considered a match and the selection will be based upon the
other attributes.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-07-19 15:18:38 +02:00
Adam Julis
ea6c3ea2d5 qemu: virtiofs: format --rlimit-nofile
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/485
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-17 13:18:11 +02:00
Jiri Denemark
bec903cae8 qemu: Don't leave beingDestroyed=true on inactive domain
Recent commit v10.4.0-87-gd9935a5c4f made a reasonable change to only
reset beingDestroyed back to false when vm->def->id is reset to make
sure other code can detect a domain is (about to become) inactive. It
even added a comment saying any caller of qemuProcessBeginStopJob is
supposed to call qemuProcessStop to clear beingDestroyed. But not every
caller really does so because they first call qemuProcessBeginStopJob
and then check whether a domain is still running. If not the
qemuProcessStop call is skipped leaving beingDestroyed=true. In case of
a persistent domain this may block incoming migrations of such domain as
the migration code would think the domain died unexpectedly (even though
it's still running).

The qemuProcessBeginStopJob function is a wrapper around
virDomainObjBeginJob, but virDomainObjEndJob was used directly for
cleanup. This patch introduces a new qemuProcessEndStopJob wrapper
around virDomainObjEndJob to properly undo everything
qemuProcessBeginStopJob did.

https://issues.redhat.com/browse/RHEL-43309

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-12 11:27:03 +02:00
Ján Tomko
d94b31a68a qemu: migration: allow migration for virtiofs
Allow migration if the "migrate-precopy" capability is present or
libvirt is not the one running the virtiofs daemon.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-10 12:32:23 +02:00
Ján Tomko
8dc04cafec qemu: do not use deprecated options for new virtiofsd
Use the to-be-introduced virtiofsd capability to mark whether
new options are safe to use.

Depends on:
https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/231

https://issues.redhat.com/browse/RHEL-7108

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-10 12:32:23 +02:00
Ján Tomko
730eaafaac qemu: fill capabilities for virtiofsd
Run the daemon with --print-capabilities first, to see what it supports.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-10 12:32:23 +02:00
Kshitij Jha
6d3955acf1 Include support for Vfio stats during Migration
As of now, libvirt supports few essential stats as
part of virDomainGetJobStats for Live Migration such
as memory transferred, dirty rate, number of iteration
etc. Currently it does not have support for the vfio
stats returned via QEMU. This patch adds support for that.

Signed-off-by: Kshitij Jha <kshitij.jha@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-10 12:28:55 +02:00
hongmianquan
0d3e962d47 security_manager: Remove redundant qemuSecurityGetNested() call
This commit removes the redundant call to qemuSecurityGetNested() in
qemuStateInitialize(). In qemuSecurityGetModel(), the first security manager
in the stack is already used by default, so this change helps to
simplify the code.

Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-09 13:24:57 +02:00
hongmianquan
790b4d8067 security_manager: Ensure top lock is acquired before nested locks
Fix libvirtd hang since fork() was called while another thread had
security manager locked.

We have the stack security driver, which internally manages other security drivers,
just call them "top" and "nested".

We call virSecurityStackPreFork() to lock the top one, and it also locks
and then unlocks the nested drivers prior to fork. Then in qemuSecurityPostFork(),
it unlocks the top one, but not the nested ones. Thus, if one of the nested
drivers ("dac" or "selinux") is still locked, it will cause a deadlock. If we always
surround nested locks with top lock, it is always secure. Because we have got top lock
before fork child libvirtd.

However, it is not always the case in the current code, We discovered this case:
the nested list obtained through the qemuSecurityGetNested() will be locked directly
for subsequent use, such as in virQEMUDriverCreateCapabilities(), where the nested list
is locked using qemuSecurityGetDOI, but the top one is not locked beforehand.

The problem stack is as follows:

libvirtd thread1          libvirtd thread2          child libvirtd
        |                           |                       |
        |                           |                       |
virsh capabilities      qemuProcessLanuch                   |
        |                           |                       |
        |                       lock top                    |
        |                           |                       |
    lock nested                     |                       |
        |                           |                       |
        |                           fork------------------->|(nested lock held by thread1)
        |                           |                       |
        |                           |                       |
    unlock nested               unlock top              unlock top
                                                            |
                                                            |
                                                qemuSecuritySetSocketLabel
                                                            |
                                                            |
                                                    lock nested (deadlock)

In this commit, we ensure that the top lock is acquired before the nested lock,
so during fork, it's not possible for another task to acquire the nested lock.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1303031

Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-09 13:22:26 +02:00
Miroslav Los via Devel
8515a178f8 qemuDomainChangeNet: check virtio options for non-virtio models
In a domain created with an interface with a <driver> subelement,
the device contains a non-NULL virDomainVirtioOptions struct, even
for non-virtio NIC models. The subelement need not be present again
after libvirt restarts, or when the interface is passed to clients.

When clients such as virsh domif-setlink put back the modified
interface XML, the new device's virtio attribute is NULL. This may
fail the equality checks for virtio options in qemuDomainChangeNet,
depending on whether libvird was restarted since define or not.

This patch modifies the check for non-virtio models, to ignore olddev
value of virtio (assumed valid), and to allow either NULL or a struct
with all values ABSENT in the new virtio options.

Signed-off-by: Miroslav Los <mirlos@cisco.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-09 13:20:05 +02:00
Daniel P. Berrangé
e40a533118 qemu: set swtpm log level parameter
This wires up the emulator 'debug' parameter to control the
/usr/bin/swtpm 'level' parameter for logging.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-07-05 14:43:15 +01:00
Adam Julis
c3302ceb1d qemuDomainChangeNet: forbid changing portgroup
Changing the postgroup attribute caused unexpected behavior.
Although it can be implemented, it has a non-trivial solution.
No requirement or use has yet been found for implementing this
feature, so it has been disabled for hot-plug.

Resolves: https://issues.redhat.com/browse/RHEL-7299
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-03 09:59:10 +02:00
Michal Privoznik
cf7d495324 qemu: Drop _virQEMUDriver::hostFips
The 'hostFips' member of _virQEMUDriver struct is not used
really, due to previous cleanups. Drop it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-02 09:14:24 +02:00
Michal Privoznik
ce48d584cc qemu_capabilities: Retire QEMU_CAPS_VXHS
The support for VXHS device was removed in QEMU commit
v5.1.0-rc1~16^2~10. Since we require QEMU-5.2.0 at least there's
no QEMU that has the device and thus the corresponding capability
can be retired.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-02 09:14:23 +02:00
Michal Privoznik
295eb1b3d8 qemu_capabilities: Retire QEMU_CAPS_ENABLE_FIPS
The capability is no longer used. Retire it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-02 09:14:22 +02:00
Michal Privoznik
8cf81de8bf qemu_capabilities: Drop version check for QEMU_CAPS_ENABLE_FIPS and QEMU_CAPS_NETDEV_USER
Now that the minimal required version of QEMU is 5.2.0 the
conditional setting of QEMU_CAPS_ENABLE_FIPS and
QEMU_CAPS_NETDEV_USER is effectively a dead code. Drop it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-02 09:14:20 +02:00
Michal Privoznik
073bf16784 qemu_capabilities: Require QEMU-5.2.0 or newer
According to repology.org and/or distro repos these are the version of QEMU:

     CentOS Stream 9: qemu-kvm-9.0.0
           Debian 11: qemu-5.2.0
           Fedora 39: qemu-8.3.1
  openSUSE Leap 15.3: qemu-5.2.0
              RHEL-8: qemu-6.2.0
        Ubuntu 22.04: qemu-6.2.0

Since the minimal version is 5.2.0 we can bump from 4.2.0 to
5.2.0.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-07-02 09:14:18 +02:00
Michal Privoznik
8f34fd0c4c qemu_domain: Set 'passt' net backend if 'default' is unsupported
It may happen that QEMU is compiled without SLIRP but with
support for passt. In such case it is acceptable to alter user
provided configuration and switch backend to passt as it offers
all the features as SLIRP.

Resolves: https://issues.redhat.com/browse/RHEL-45518
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-01 12:40:06 +02:00
Michal Privoznik
bd6060d1c3 qemu_validate: Use domaincaps to validate supported net backend type
Now that the logic for detecting supported net backend types has
been moved to domain capabilities generation, we can just use it
when validating net backend type. Just like we do for device
models and so on.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-01 12:39:10 +02:00
Michal Privoznik
6a0f45a9e0 qemu_capabilities: Fill supported net backend types
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-01 12:37:27 +02:00
Michal Privoznik
73fc20e262 qemu_validate: Validate net backends against QEMU caps
Now that we have a capability for each domain net backend we can
start validating user's selection against QEMU capabilities.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-01 12:33:14 +02:00
Michal Privoznik
e28bc15f09 qemu_capabilities: Introduce QEMU_CAPS_NETDEV_USER
Since -netdev user can be disabled during QEMU compilation, we
can't blindly expect it to just be there. We need a capability
that tracks its presence.

For qemu-4.2.0 we are not able to detect the capability so do the
next best thing - assume the capability is there. This is
consistent with our current behaviour where we blindly assume the
capability, anyway.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-07-01 12:32:16 +02:00
Jon Kohler
76e2dae01a qemu: fix switchover-ack regression for old qemu
When enabling switchover-ack on qemu from libvirt, the .party value
was set to both source and target; however, qemuMigrationParamsCheck()
only takes that into account to validate that the remote side of the
migration supports the flag if it is marked optional or auto/always on.

In the case of switchover-ack, when enabled on only the dst and not
the src, the migration will fail if the src qemu does not support
switchover-ack, as the dst qemu will issue a switchover-ack msg:
qemu/migration/savevm.c ->
  loadvm_process_command ->
    migrate_send_rp_switchover_ack(mis) ->
      migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL)

Since the src qemu doesn't understand messages with header_type ==
MIG_RP_MSG_SWITCHOVER_ACK, qemu will kill the migration with error:
  qemu-kvm: RP: Received invalid message 0x0007 length 0x0000
  qemu-kvm: Unable to write to socket: Bad file descriptor

Looking at the original commit [1] for optional migration capabilities,
it seems that the spirit of optional handling was to enhance a given
existing capability where possible. Given that switchover-ack
exclusively depends on return-path, adding it as optional to that cap
feels right.

[1] 61e34b0856 ("qemu: Add support for optional migration capabilities")

Fixes: 1cc7737f69 ("qemu: add support for qemu switchover-ack")

Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: YangHang Liu <yanghliu@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-06-28 08:50:12 +02:00
Michal Privoznik
fbe97ee17d qemu_validate: Use domaincaps to validate supported launchSecurity type
Now that the logic for detecting supported launchSecurity types
has been moved to domain capabilities generation, we can just use
it when validating launchSecurity type. Just like we do for
device models and so on.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-25 14:46:08 +02:00
Michal Privoznik
66df7992d8 qemu: Fill launchSecurity in domaincaps
The inspiration for these rules comes from
qemuValidateDomainDef().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-25 14:46:05 +02:00
Michal Privoznik
d00816209e qemu_capabilities: Probe SEV capabilities even for QEMU_CAPS_SEV_SNP_GUEST
While it's very unlikely to have QEMU that supports SEV-SNP but
doesn't support plain SEV, for completeness sake we ought to
query SEV capabilities if QEMU supports either. And similarly to
QEMU_CAPS_SEV_GUEST we need to clear the capability if talking to
QEMU proves SEV is not really supported.

This in turn removes the 'sev-snp-guest' capability from one of
our test cases as Peter's machine he uses to refresh capabilities
is not SEV capable. But that's okay. It's consistent with
'sev-guest' capability.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-25 14:46:00 +02:00
Rayhan Faizel
9b0606ef8e qemu_block: Validate number of hosts for iSCSI disk device
An iSCSI device with zero hosts will result in a segmentation fault. This patch
adds a check for the number of hosts, which must be one in the case of iSCSI.

Minimal reproducing XML:

<domain type='qemu'>
    <name>MyGuest</name>
    <uuid>4dea22b3-1d52-d8f3-2516-782e98ab3fa0</uuid>
    <os>
        <type arch='x86_64'>hvm</type>
    </os>
    <memory>4096</memory>
    <devices>
        <disk type='network'>
            <source name='dummy' protocol='iscsi'/>
            <target dev='vda'/>
        </disk>
    </devices>
</domain>

Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-06-25 10:05:49 +02:00
Jon Kohler
1cc7737f69 qemu: add support for qemu switchover-ack
Add plumbing for QEMU's switchover-ack migration capability, which
helps lower the downtime during VFIO migrations. This capability is
enabled by default as long as both the source and destination support
it.

Note: switchover-ack depends on the return path capability, so this may
not be used when VIR_MIGRATE_TUNNELLED flag is set.

Extensive details about the qemu switchover-ack implementation are
available in the qemu series v6 cover letter [1] where the highlight is
the extreme reduction in guest visible downtime. In addition to the
original test results below, I saw a roughly ~20% reduction in downtime
for VFIO VGPU devices at minimum.

  === Test results ===

  The below table shows the downtime of two identical migrations. In the
  first migration swithcover ack is disabled and in the second it is
  enabled. The migrated VM is assigned with a mlx5 VFIO device which has
  300MB of device data to be migrated.

  +----------------------+-----------------------+----------+
  |    Switchover ack    | VFIO device data size | Downtime |
  +----------------------+-----------------------+----------+
  |       Disabled       |         300MB         |  1900ms  |
  |       Enabled        |         300MB         |  420ms   |
  +----------------------+-----------------------+----------+

  Switchover ack gives a roughly 4.5 times improvement in downtime.
  The 1480ms difference is time that is used for resource allocation for
  the VFIO device in the destination. Without switchover ack, this time is
  spent when the source VM is stopped and thus the downtime is much
  higher. With switchover ack, the time is spent when the source VM is
  still running.

[1] https://patchwork.kernel.org/project/qemu-devel/cover/20230621111201.29729-1-avihaih@nvidia.com/

Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: YangHang Liu <yanghliu@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-06-25 09:51:00 +02:00
Jonathon Jongsma
af437d2d64 qemu: Don't specify vfio-pci.ramfb when ramfb is false
Commit 7c8e606b64 attempted to fix
the specification of the ramfb property for vfio-pci devices, but it
failed when ramfb is explicitly set to 'off'. This is because only the
'vfio-pci-nohotplug' device supports the 'ramfb' property. Since we use
the base 'vfio-pci' device unless ramfb is enabled, attempting to set
the 'ramfb' parameter to 'off' this will result in an error like the
following:

  error: internal error: QEMU unexpectedly closed the monitor
  (vm='rhel'): 2024-06-06T04:43:22.896795Z qemu-kvm: -device
  {"driver":"vfio-pci","host":"0000:b1:00.4","id":"hostdev0","display":"on
  ","ramfb":false,"bus":"pci.7","addr":"0x0"}: Property 'vfio-pci.ramfb'
  not found.

This also more closely matches what is done for mdev devices.

Resolves: https://issues.redhat.com/browse/RHEL-28808

Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-24 08:55:50 -05:00
Adam Julis
3a9095976e qemuDomainDiskChangeSupported: Fill in missing check
The attribute 'discard_no_unref' of <disk/> is not allowed to be
changed while the virtual machine is running.

Resolves: https://issues.redhat.com/browse/RHEL-37542
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-24 11:14:56 +02:00
Michal Privoznik
58b5219961 qemu_firmware: Pick the right firmware for SEV-SNP guests
The firmware descriptors have 'amd-sev-snp` feature which
describes whether firmware is suitable for SEV-SNP guests.
Provide necessary implementation to detect the feature and pick
the right firmware if guest is SEV-SNP enabled.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:59:04 +02:00
Michal Privoznik
a1d850b300 qemu: Build cmd line for SEV-SNP
Pretty straightforward as qemu has 'sev-snp-guest' object which
attributes maps pretty much 1:1 to our XML model. Except for
@vcek where QEMU has 'vcek-disabled`, an inverted boolean, while
we model it as virTristateBool. But that's easy to map too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:58:10 +02:00
Michal Privoznik
c65eba1f57 conf: Introduce SEV-SNP support
SEV-SNP is an enhancement of SEV/SEV-ES and thus it shares some
fields with it. Nevertheless, on XML level, it's yet another type
of <launchSecurity/>.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:56:57 +02:00
Michal Privoznik
1abcba9d4d qemu_capabilities: Introduce QEMU_CAPS_SEV_SNP_GUEST
This capability tracks sev-snp-guest object availability.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:56:18 +02:00
Michal Privoznik
be26d0ebbe qemu: Report snp-policy in virDomainGetLaunchSecurityInfo()
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:36:04 +02:00
Michal Privoznik
914b986275 qemu_monitor: Allow querying SEV-SNP state in 'query-sev'
In QEMU commit v9.0.0-1155-g59d3740cb4 the return type of
'query-sev' monitor command changed to accommodate SEV-SNP. Even
though we currently support launching plain SNP guests, this will
soon change.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:35:32 +02:00
Michal Privoznik
7d16c296e3 src: Convert some _virDomainSecDef::sectype checks to switch()
In a few instances there is a plain if() check for
_virDomainSecDef::sectype. While this works perfectly for now,
soon there'll be another type and we can utilize compiler to
identify all the places that need adaptation. Switch those if()
statements to switch().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:32:09 +02:00
Michal Privoznik
a44a43361f Drop needless typecast to virDomainLaunchSecurity
The sectype member of _virDomainSecDef struct is already declared
as of virDomainLaunchSecurity type. There's no need to typecast
it to the very same type when passing it to switch().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:31:33 +02:00
Michal Privoznik
d2cad18ca3 conf: Move some members of virDomainSEVDef into virDomainSEVCommonDef
Some parts of SEV are to be shared with SEV SNP. In order to
reuse XML parsing / formatting code cleanly, let's move those
common bits into a new struct (virDomainSEVCommonDef) and adjust
rest of the code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:28:54 +02:00
Michal Privoznik
66efdfabd9 qemu_monitor_json: Report error in error paths in SEV related code
While working on qemuMonitorJSONGetSEVMeasurement() and
qemuMonitorJSONGetSEVInfo() I've noticed that if these functions
fail, they do so without appropriate error set. Fill in error
reporting.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-21 09:25:32 +02:00
Peter Krempa
e6b94cba7e qemu: migration: Preserve error across qemuDomainSetMaxMemLock() on error paths
When a VM terminates itself while it's being migrated in running state
libvirt would report wrong error:

 error: cannot get locked memory limit of process 2502057: No such file or directory

rather than the proper error:

 error: operation failed: domain is not running

Remember the error on error paths in qemuMigrationSrcConfirmPhase and
qemuMigrationSrcPerformPhase.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:58:52 +02:00
Peter Krempa
e00a58c10a qemuMigrationSrcRun: Re-check whether VM is active before accessing job data
'qemuProcessStop()' clears the 'current' job data. While the code under
the 'error' label in 'qemuMigrationSrcRun()' does check that the VM is
active before accessing the job, it also invokes multiple helper
functions to clean up the migration including
'qemuMigrationSrcNBDCopyCancel()' which calls 'qemuDomainObjWait()'
invalidating the result of the liveness check as it unlocks the VM.

Duplicate the liveness check and explain why. The rest of the code e.g.
accessing the monitor is safe as 'qemuDomainEnterMonitorAsync()'
performs a liveness check. The cleanup path just ignores the return
values of those functions.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:58:52 +02:00
Peter Krempa
9243e87820 qemu: migration: Inline 'qemuMigrationDstFinishResume()'
The function is a pointless wrapper on top of
qemuMigrationDstWaitForCompletion.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:58:52 +02:00
Peter Krempa
a52e125d56 qemu: migration: Properly check for live VM after qemuDomainObjWait()
Similarly to the one change in commit 4d1a1fdffd
we should be checking that the VM is not being yet destroyed if we've
invoked qemuDomainObjWait().

Use the new helper qemuDomainObjIsActive().

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:58:52 +02:00
Peter Krempa
9eb33b7f03 qemu: domain: Introduce qemuDomainObjIsActive helper
The helper checks whether VM is active including the internal qemu
state. This helper will become useful in situations when an async job
is in use as VIR_JOB_DESTROY can run along async jobs thus both checks
are necessary.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:58:52 +02:00
Peter Krempa
d9935a5c4f qemu: process: Ensure that 'beingDestroyed' gets cleared only after VM id is reset
Prevent the possibility that a VM could be considered as alive while
inside qemuProcessStop.

A recently fixed bug which unlocked the domain object while inside
qemuProcessStop showed that there's possibility to confuse the state of
the VM to be considered active while 'qemuProcessStop' is processing
shutdown of the VM. Ensure that this doesn't happen by clearing the
'beingDestroyed' flag only after the VM id is cleared.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:58:52 +02:00
Peter Krempa
3865410e7f qemuProcessStop: Move code not depending on 'vm->def->id' after reset of the ID
There are few function calls done while cleaning up a stopped VM which
do require the old VM id, to e.g. clean up paths containing the 'short'
domain name in the path.

Anything else, which doesn't strictly require it can be moved after
clearing the 'id' in order to decrease likelyhood of potential bugs.

This patch moves all the code which does not require the 'id' (except
for the log entry and closing the monitor socket) after the statement
clearing the id and adds a comment explaining that anything in the
section must not unlock the VM object.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:58:52 +02:00
Peter Krempa
d29e0f3d4a qemuProcessStop: Prevent crash when qemuDomainObjStopWorker() unlocks the VM
'qemuDomainObjStopWorker()' which is meant to dispose of the event loop
thread for the monitor unlocks the VM object while disposing the thread
to prevent possible deadlocks with events waiting on the monitor thread.

Unfortunately 'qemuDomainObjStopWorker()' is called *before* the VM is
marked as inactive by clearing 'vm->def->id', but at the same time it's
no longer marked as 'beingDestroyed' when we're inside
'qemuProcessStop()'.

If 'vm' would be kept locked this wouldn't be a problem. Same way it's
not a problem for anything that uses non-ASYNC VM jobs, or when the
monitor is accessed in an async job, as the 'destroy' job interlocks
with those.

It is a problem for code inside an async job which uses
'qemuDomainObjWait()' though. The API contract of qemuDomainObjWait()
ensures the caller that the VM on successful return from it, but in this
specific reason it's not the case, as both 'beingDestroyed' is already
false, and 'vm->def->id' is not yet cleared.

To fix the issue move the 'qemuDomainObjStopWorker()' call *after*
clearing 'vm->def->id' and also add a note stating what the function is
doing.

Fixes: 860a999802
Closes: https://gitlab.com/libvirt/libvirt/-/issues/640
Reported-by: luzhipeng <luzhipeng@cestc.cn>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:58:21 +02:00
Peter Krempa
da8d97e4e2 qemuDomainObjWait: Add documentation
Document why this function exists and meaning of return values.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:52:55 +02:00
Peter Krempa
f9ad21996d qemuDomainDeviceBackendChardevForeach: Fix typo in comment
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:52:54 +02:00
Peter Krempa
b4423a753b qemuDomainDiskPrivateDispose: Prevent dangling 'disk' pointer in blockjob data
Clear the 'disk' member of 'blockjob' as we're freeing the disk object
at this point. While this should not normally happen it was observed
when other bug allowed the VM to be cleared while other threads didn't
yet finish.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:52:54 +02:00
Peter Krempa
737f897c29 qemuBlockJobProcessEventConcludedBackup: Handle potentially NULL 'job->disk'
Similarly to other blockjob handlers, if there's no disk associated with
the blockjob the handler needs to behave correctly. This is needed as
the disk might have been de-associated on unplug or other operations.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-20 09:52:54 +02:00
Swapnil Ingle
c772f1982d Pass shutoff reason to release hook
Sometimes in release hook it is useful to know if the VM shutdown was graceful
or not. This is especially useful to do cleanup based on the VM shutdown failure
reason in release hook. This patch proposes to use the last argument 'extra'
to pass VM shutoff reason in the call to release hook.
Making this change for Qemu and LXC.

Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-19 12:15:26 +02:00
Adam Julis
e145d182a6 qemu: implement iommu coldplug/unplug
Resolves: https://issues.redhat.com/browse/RHEL-23833
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-18 12:17:50 +02:00
Adam Julis
59f6e226bb qemu_driver: add validation of potential dependencies on cold plug
Although virDomainDeviceDefValidate() is called as a part of
parsing device XML routine, it validates only that single device.
The virDomainDefValidate() function performs a more comprehensive
check. It should detect errors resulting from dependencies
between devices, or a device and some other part of XML config.
Therefore, a call to virDomainDefValidate() is added at the end
of qemuDomainAttachDeviceConfig().

Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-18 08:46:28 +02:00
Michal Privoznik
095f22db21 qemu_process: Issue an info message when subtracting isolcpus
In one of my previous commits I've made us substract isolcpus
from all online CPUs when setting affinity on QEMU threads. See
commit below for more info on that. Nevertheless, this is
something that surely deserves an entry in log. I've chosen INFO
priority for now. We can promote that to a regular WARN if users
complain.

Fixes: da95bcb6b2
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-06-17 12:30:39 +02:00
Daniel P. Berrangé
f2828880b6 meson: allow systemd sysusersdir to be changed
We currently hardcode the systemd sysusersdir, but it is desirable to be
able to choose a different location in some cases. For example, Fedora
flatpak builds change the RPM %_sysusersdir macro, but we can't currently
honour that.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reported-by: Yaakov Selkowitz <yselkowi@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-13 10:23:11 +01:00
Peter Krempa
39bfd6c888 qemu_validate: Validate support for SCSI emulation support in 'virtio-blk' devices
The support will be dropped soon by qemu, and libvirt is not rejecting
such configurations. Add validation of this explicitly requested config.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-06-12 08:21:12 +02:00
Peter Krempa
126f95c1fe qemuValidateDomainDeviceDefDiskFrontend: Refactor validation of <disk type='lun'>
Use a switch statement for checks based on the disk bus.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-06-12 08:21:11 +02:00
Andrea Bolognani
971e767805 qemu: Reject TPM 1.2 in most scenarios
Everywhere we use TPM 2.0 as our default, the chances of TPM
1.2 being supported by the guest OS are very slim. Just reject
such configurations outright.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-06-07 11:13:19 +02:00