The aim is to move parts of vir_event_thread_finalize() that MAY
block into a separate function, so that unrefing the a
virEventThread no longer blocks (or require releasing and
subsequent re-acquiring of a mutex).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This introduces a new 'ps2' feature which, when disabled, results in
no implicit PS/2 bus input devices being automatically added to the
domain and addition of the 'i8042=off' machine option to the QEMU
command-line.
A notable side effect of disabling the i8042 controller in QEMU is that
the vmport device won't be created. For this reason we will not allow
setting the vmport feature if the ps2 feature is explicitly disabled.
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This capability tells us whether given QEMU binary supports the
'-machine xxx,i8042=on/off' toggle used to enable/disable PS/2
controller emulation.
A few facts:
- This option was introduced in QEMU 7.0 and defaults to 'on'
- QEMU versions before 7.0 enabled i8042 controller emulation implicitly
- This option (and i8042 controller emulation itself) is only supported
by descendants of the generic PC machine type (e.g. i440fx, q35, etc.)
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Up until now, we've assumed that all x86 machines have a PS/2
controller built-in. This assumption was correct until QEMU v4.2
introduced a new x86-based machine type - microvm.
Due to this assumption, a pair of unnecessary PS/2 inputs are implicitly
added to all microvm domains. This patch fixes that by whitelisting
machine types which are known to include the i8042 PS/2 controller.
Signed-off-by: Kamil Szczęk <kamil@szczek.dev>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Attempting to start qemu with or hotplug an empty 'usb-storage' based
disk results in the following error:
qemu-system-x86_64: -device {"driver":"usb-storage","bus":"usb.0","port":"2","id":"usb-disk1","removable":true}: drive property not set
Reject such config at validation step and adjust tests.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Some code paths, such as if hotplug of an empty cdrom fails can cause
that 'qemuBlockStorageSourceChainDetach' will be called with 'NULL'
@data as there is no backend for the disk.
The above case became possible once we allowed hotplug of cdroms and
subsequently fixed the case when users would hotplug an empty cdrom
which ultimately caused the possibility of having no backend in the
hotplug code path which was not possible before (see 'Fixes:' below and
also the commit linked from there).
Make 'qemuBlockStorageSourceChainDetach' tolerate NULL @data by simply
returning early.
Fixes: 894c6c5c16
Resolves: https://issues.redhat.com/browse/RHEL-54550
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
In its commit v9.0.0-rc0~1^2 QEMU started to read
/proc/sys/vm/max_map_count file to set up coroutine limits better
(something about VMAs, mmap(), see the commit for more info).
Allow the file in apparmor profile.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/660
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Since we use 'tc' to set QoS, or we instruct OVS which then uses
'tc', we have to make sure values are within range acceptable to
'tc'.
Resolves: https://issues.redhat.com/browse/RHEL-45200
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This function validates whether parsed limits are within range as
defined by 'tc' sources (since we use tc to set QoS; or OVS which
then uses tc too). The 'tc' program stores speeds in 64bit
integers (unit is bytes per second) and sizes in uints (unit is
bytes). We use different units: kilobytes per second and
kibibytes and therefore we can parse values larger than 'tc' can
handle and thus need a function to check if values still fit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
There is a family of convenient macros: NULLSTR, NULLSTR_EMPTY,
NULLSTR_STAR, NULLSTR_MINUS which hides ternary operator.
Generated using the following spatch (and its obvious variants):
@@
expression s;
@@
<+...
- s ? s : "<null>"
+ NULLSTR(s)
...+>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Check for the last multipart message right as the first thing. The
presumption probably was that the last message might still contain a
payload we want to parse. However that cannot be true since that would
have to be a type RTM_NEWNEIGH. This was not caught because older
kernels were note sending NLMSG_DONE and probably relied on the fact
that the parsing just stops after all the messages are walked through,
which the NLMSG_OK macro successfully did.
Resolves: https://issues.redhat.com/browse/RHEL-52449
Resolves: https://bugzilla.redhat.com/2302245
Fixes: a176d67cdf
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
The previous check was all wrong since it calculated the how long would
the netlink message be if the netlink header was the payload and then
subtracted that from the whole message length, a variable that was not
used later in the code. This check can fail if there are no additional
payloads, struct rtattr in particular, which we are parsing later,
however the RTA_OK macro would've caught that anyway.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Use convenience macro which does almost the same thing we were doing,
but also pads out the payload length to a multiple of NLMSG_ALIGNTO (4)
bytes.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
This brings the tool's list of features in sync with qemu
commit 37fbfda8f4145ba1700f63f0cb7be4c108d545de.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This adds an option to use libcpuinfo [1] as data source for
libvirt's list of x86 cpu features. This is purely optional and
does not change the script's behavior if libcpuinfo is not
installed.
libcpuinfo is a cross-vendor, cross-architecture source for CPU
related information that has the capability to replace libvirt's
dependence on qemu's cpu feature list.
[1] https://gitlab.com/twiederh/libcpuinfo
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
On failure to plug the device the cleanup path didn't roll back the FD
passing to qemu thus qemu would hold the FDs indefinitely.
Resolves: https://issues.redhat.com/browse/RHEL-53964
Fixes: b79abf9c3c (vdpafd)
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
As of libssh commit of libssh-0.11.0~70 [1] the
ssh_channel_get_exit_status() function is deprecated and a new
one is introduced instead: ssh_channel_get_exit_state().
It's not a drop-in replacement, but it's simple enough.
Adapt our libssh handling code to this change.
1: https://git.libssh.org/projects/libssh.git/commit/?id=04d86aeeae73c78af8b3dcdabb2e588cd31a8923
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This mostly reverts commit 65491a2dfe.
There was a bug introduced in glib 2.67.0 which impacted libvirt with
clang causing -Wincompatible-pointer-types-discards-qualifiers warnings.
This was actually fixed quite quickly in 2.67.1 with
https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1719
Our workaround was then broken with glib 2.81.1 due to commit
14b3d5da9019150d821f6178a075d85044b4c255 changing the signature of the
(private) macro we were overriding.
Since odd-number glib releases are development snapshots, and the
original problem was only present in 2.67.0 and no other releases,
just drop the workaround entirely.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Backport the implementation of 'g_string_replace' until we require at
least glib-2.68
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Invoke virCHProcessStop to kill CH process incase of any failures during
restore operation.
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Cloud-hypervisor now supports restoring with new net fds.
Ref: https://github.com/cloud-hypervisor/cloud-hypervisor/pull/6402
So, pass new tap fds via SCM_RIGHTS to CH's restore api.
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Remove the unwanted utility function and make api calls directly from
virCHMonitorSaveVM fn
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Instead of curl, use low-level socket connections to make restore api
request to CH. This will enable passing new net FDs to CH while
restoring domains with network configuration.
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
chSocketRecv fn can be used by operations such as restore, which cannot
have a specific poll timeout. The runtime of these operations at server
side (vmm) cannot be determined or capped as it depends on the guest
configuration. Hence, add a new parameter 'use_timeout' which when set
will pass -1 as timeout to poll, otherwise the default PKT_TIMEOUT_MS is
used.
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Move monitor socket connection, response handling and closing FDs code into
new functions in preparation for adding restore support for net devices.
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Pass "net_<index>" as net id to CH. This is to have better control over
the network configs. This id can be further used in performing
operations like restore etc.
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The response message from CH for vm.add-net api will be more helpful in
debugging. Hence, log the message instead of just response code.
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add dma-translation attribute to qemu command line if specified in
domain conf.
Signed-off-by: Sandesh Patel <sandesh.patel@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add dma_translation attribute to iommu to enable/disable dma traslation
for intel-iommu
Signed-off-by: Sandesh Patel <sandesh.patel@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Depending on timing between QEMU and libvirt an attempt to resume failed
post-copy migration could immediately report a failure in post-copy
phase again even though the migration actually resumed and is
progressing just fine.
This is caused by QEMU reporting the original migration state (i.e.,
postcopy-paused) until migration is successfully resumed and QEMU
switches to postcopy-active. QEMU 9.1 introduced a new
postcopy-recover-setup migration state which is entered immediately
after requesting migration to be resumed and we can reliably wait for
the migration to either continue or fail without being confused by the
old state.
https://issues.redhat.com/browse/RHEL-22166
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch adds support for recognizing the new migration state reported
by QEMU when post-copy recovery is requested. It is not actually used
for anything yet.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The original condition caused (after adding modify option)
possibly access to not allocated memory. For consistency added
new check for multiple same records.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/654
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The "modify" command allowed to replace an existing record, now
checks for the NULL string in the new value and throw error if
found.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/655
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The QEMU package in Debian has recently moved the
qemu-bridge-helper binary under /usr/libexec/qemu. Update the
AppArmor profile accordingly.
https://bugs.debian.org/1077915
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
The s390(x) machines never supported ACPI. That didn't stop users
enabling ACPI in their config. As of libvirt-9.2 (98c4e3d073) with new
enough qemu we reject configs which require ACPI, but qemu can't satisfy
it.
This breaks migration of existing VMs with the old wrong configs to new
libvirt installations.
To address this introduce a post-parse fixup removing the ACPI flag
specifically for s390 machines which do enable it in the definition.
The advantage of doing it in post-parse, rather than simply relaxing the
ABI stability check to allow users providing an fixed XML when migrating
(allowing change of the ACPI flag for s390 in ABI stability check, as it
doesn't impact ABI), is that only the destination installation needs to
be patched in order to preserve migration.
To mitigate the disadvantage of simply stripping it from all s390(x)
configs the hack is not applied when defining or starting a new domain
from the XML, to preserve the error about unsupported configuration.
Resolves: https://issues.redhat.com/browse/RHEL-49516
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
This reverts commit cf934c87cc.
The matching logic is flawed and it would complicate support of
this command.
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The whole point of pstore device is that the guest writes crash
dumps into it. But the way SELinux label is set on the
corresponding file warrants RO access only. This is due to a
copy-paste from code around: kernel/initrd/DTB/SLIC - these are
RO indeed, but pstore MUST be writable too. In a sense it's
closer to NVRAM/disks - hence set imagelabel on it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
So far we are relying on QEMU or sysadmin to create the file for
pstore. This is suboptimal as in the case of the former we can
not set proper seclabels (there's nothing to set seclabels on
until QEMU is started).
Therefore, make sure the file is created before launching QEMU
and that it has the correct size.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Introduced only a couple of commits ago (in
v10.5.0-84-g90e50e67c6) the pstore device acts as a nonvolatile
storage, where guest kernel can store information about crashes.
This device, however, expects a file in the host from which the
crash data is read. So far, we expected users to provide a path,
but we can autogenerate one if missing. Just put it next to
per-domain's NVRAM stores.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
As can be seen in earlier commits, there can be two OEM strings
with the same index. But since our parser
(virSysinfoParseOEMStrings()) doesn't expect that, it increments
index in each run and thus skips over these strings.
Fortunately, we have the right index at hand - we're just
skipping over it in a loop. Just reconstruct the index back
inside the loop.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
On some systems, there are two or even more 'OEM Strings'
sections in DMI table. Here's an example of dmidecode output on
such system:
# dmidecode -q -t 11
OEM Strings
String 1: Default string
OEM Strings
String 1: ThunderX2 System
String 2: cavium.com
String 3: Comanche
Now, this poses a problem, because when one tries to obtain
individual strings, they get:
# dmidecode -q --oem-string 1
Default string
ThunderX2 System
# dmidecode -q --oem-string 2
No OEM string number 2
cavium.com
NB, the "No OEM string number 2" is printed onto stderr and
everything else onto stdout. Oh, and trying to get OEM strings
from just one section doesn't fly:
# dmidecode -q -H 0x1d --oem-string 2
Options --string, --type, --handle and --dump-bin are mutually exclusive
This means two things:
1) we have no way of distinguishing OEM strings at the same index
but in different sections,
2) because of how virSysinfoDMIDecodeOEMString() is written, we
fail in querying OEM string that exists in one section but not
in the others (for instance string #2 from example above).
While there's not much we can do about 1), there is something
that can be done about 2) - refine the error condition and make
the function return an error iff there's nothing on stdout and
there's something on stderr.
Resolves: https://issues.redhat.com/browse/RHEL-45952
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
If dry run of a command was requested (virCommandSetDryRun())
then a specified callback is called instead of running actual
command. This is meant to be used in tests. To mimic running the
command as closely as possible the callback can also set exit
status of the command it's implementing. To save some lines
though, the exit status is initialized to 0 so that callback has
to set it only on failures. Now, 0 is not exactly portable value
- that's why stdlib.h has EXIT_SUCCESS (and EXIT_FAILURE) values.
Initialize the exit status (held in dryRunStatus) to EXIT_SUCCESS
then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The acpi-erst backend for pstore device exposes a path in the
host accessible to the guest and as such we must set seclabels on
it to grant QEMU RW access.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Nothing special going on here.
Resolves: https://issues.redhat.com/browse/RHEL-24746
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
The aim of pstore device is to provide a bit of NVRAM storage for
guest kernel to record oops/panic logs just before the it
crashes. Typical usage includes usage in combination with a
watchdog so that the logs can be inspected after the watchdog
rebooted the machine. While Linux kernel (and possibly Windows
too) support many backends, in QEMU there's just 'acpi-erst'
device so stick with that for now. The device must be attached to
a PCI bus and needs two additional values (well, corresponding
memory-backend-file needs them): size and path. Despite using
memory-backend-file this does NOT add any additional RAM to the
guest and thus I've decided to expose it as another device type
instead of memory model.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
The new option style renamed one of the cache modes.
https://issues.redhat.com/browse/RHEL-50329
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In cases when a QEMU process takes longer than the time sigterm and
sigkill are issued to kill the process do not simply fail and leave the
VM in state VIR_DOMAIN_SHUTDOWN until the daemon stops. Instead set up
an fd on /proc/$pid and get notified when the QEMU process finally has
terminated to cleanup the VM state.
Resolves: https://issues.redhat.com/browse/RHEL-28819
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If there are absent values in an already existing element
specifying rom settings, we simply use the old ones. This
behaviour is not desired, as users might think that deleting the
element from XML would delete the setting (because the hotplug
succeeds) - which does not happen. Because of that, we should not
accept an interface without elements that cannot be changed.
Therefore, we should not allow absent values for already existing
rom setting during hotplug.
Resolves: https://issues.redhat.com/browse/RHEL-7109
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
New element 'openfiles' had confusing name. Since the patch with
this new element wasn't propagate yet, old name ('rlimit_nofile')
was changed.
...
<binary>
<openfiles max='122333'/>
</binary>
...
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since libvirt commit 3ef9b51b10,
the pflash storage for the os loader file follows its read-only flag,
and qemu tries to open the file for writing if set so.
This patches virt-aa-helper to generate the VM's AppArmor rules
that allow this, using the same domain definition flag and default.
Signed-off-by: Miroslav Los <mirlos@cisco.com>
Tested-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
By definition. Accordingly, filter them out when looking for
a read/write image.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If the configuration explicitly requests a specific type of
firmware image, be it pflash or ROM, we should ignore all images
that are not of that type.
If no specific type has been requested, of course, any type is
considered a match and the selection will be based upon the
other attributes.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add an element to configure the rlimit nofile size:
...
<binary>
<rlimit_nofile size='122333'/>
</binary>
...
Non-positive values are forbidden in 'domaincommon.rng'. Added separate
test file, created by modifying the 'vhost-user-fs-fd-memory.xml'.
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
So much can happen in the fileName field of the VMX that the easiest
thing is to silently report a serial type="null".
This effectively reverts commits de81bdb8d4cd and 62c53db0421a, but
keeps the test files to show the fix is still in place.
There is one instance where an error gets reset, but since that is a
rare case on its own and on top of that does not happen in any of our
long-running daemons with a logfile that might get monitored it should
be fine to leave it there.
Resolves: https://issues.redhat.com/browse/RHEL-32182
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This CPU feature can be used to explicitly enable or disable
support for pointer authentication. By default, it will be
enabled if the host supports it.
https://issues.redhat.com/browse/RHEL-7044
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Recent commit v10.4.0-87-gd9935a5c4f made a reasonable change to only
reset beingDestroyed back to false when vm->def->id is reset to make
sure other code can detect a domain is (about to become) inactive. It
even added a comment saying any caller of qemuProcessBeginStopJob is
supposed to call qemuProcessStop to clear beingDestroyed. But not every
caller really does so because they first call qemuProcessBeginStopJob
and then check whether a domain is still running. If not the
qemuProcessStop call is skipped leaving beingDestroyed=true. In case of
a persistent domain this may block incoming migrations of such domain as
the migration code would think the domain died unexpectedly (even though
it's still running).
The qemuProcessBeginStopJob function is a wrapper around
virDomainObjBeginJob, but virDomainObjEndJob was used directly for
cleanup. This patch introduces a new qemuProcessEndStopJob wrapper
around virDomainObjEndJob to properly undo everything
qemuProcessBeginStopJob did.
https://issues.redhat.com/browse/RHEL-43309
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Allow migration if the "migrate-precopy" capability is present or
libvirt is not the one running the virtiofs daemon.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Run the daemon with --print-capabilities first, to see what it supports.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
As of now, libvirt supports few essential stats as
part of virDomainGetJobStats for Live Migration such
as memory transferred, dirty rate, number of iteration
etc. Currently it does not have support for the vfio
stats returned via QEMU. This patch adds support for that.
Signed-off-by: Kshitij Jha <kshitij.jha@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The "modify" command allows to replace an existing record (its
text value). The primary key is the name of the record. If
duplicity or missing record detected, throw error.
Tests in networkxml2xmlupdatetest.c contain replacements of an
existing DNS-text record and failure due to non-existing record.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/639
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The "modify" command allows to replace an existing Srv record
(some of its elements respectively: port, priority and weight).
The primary key used to choose the modify record is the remaining
parameters, only one of them is required. Not using some of these
parameters may cause duplicate records and error message. This
logic is there because of the previous implementation (Add and
Delete options) in the function.
Tests in networkxml2xmlupdatetest.c contain replacements of an
existing DNS-Srv record and failure due to non-existing record.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/639
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The "modify" command allows you to replace an existing record
(its hostname, sub-elements). IP address acts as the primary key.
If it is not found, the attempt ends with an error message. If
the XML contains a duplicate address, it will select the last
one.
Tests in networkxml2xmlupdatetest.c contain replacements of an
existing DNS-Host record and failure due to non-existing record.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/639
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The outdated comment refers to a non-existent member in the
virDomainObj structure.
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When generating paths for a domain specific AppArmor profile each
path undergoes a validation where it's matched against an array
of well known prefixes (among other things). Now, for
OVMF/AAVMF/... images we have a list and some entries have
comments to which type of image the entry belongs to. For
instance:
"/usr/share/OVMF/", /* for OVMF images */
"/usr/share/AAVMF/", /* for AAVMF images */
But these comments are pretty useless. The path itself already
gives away the image type. Drop them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
This commit removes the redundant call to qemuSecurityGetNested() in
qemuStateInitialize(). In qemuSecurityGetModel(), the first security manager
in the stack is already used by default, so this change helps to
simplify the code.
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Fix libvirtd hang since fork() was called while another thread had
security manager locked.
We have the stack security driver, which internally manages other security drivers,
just call them "top" and "nested".
We call virSecurityStackPreFork() to lock the top one, and it also locks
and then unlocks the nested drivers prior to fork. Then in qemuSecurityPostFork(),
it unlocks the top one, but not the nested ones. Thus, if one of the nested
drivers ("dac" or "selinux") is still locked, it will cause a deadlock. If we always
surround nested locks with top lock, it is always secure. Because we have got top lock
before fork child libvirtd.
However, it is not always the case in the current code, We discovered this case:
the nested list obtained through the qemuSecurityGetNested() will be locked directly
for subsequent use, such as in virQEMUDriverCreateCapabilities(), where the nested list
is locked using qemuSecurityGetDOI, but the top one is not locked beforehand.
The problem stack is as follows:
libvirtd thread1 libvirtd thread2 child libvirtd
| | |
| | |
virsh capabilities qemuProcessLanuch |
| | |
| lock top |
| | |
lock nested | |
| | |
| fork------------------->|(nested lock held by thread1)
| | |
| | |
unlock nested unlock top unlock top
|
|
qemuSecuritySetSocketLabel
|
|
lock nested (deadlock)
In this commit, we ensure that the top lock is acquired before the nested lock,
so during fork, it's not possible for another task to acquire the nested lock.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1303031
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In a domain created with an interface with a <driver> subelement,
the device contains a non-NULL virDomainVirtioOptions struct, even
for non-virtio NIC models. The subelement need not be present again
after libvirt restarts, or when the interface is passed to clients.
When clients such as virsh domif-setlink put back the modified
interface XML, the new device's virtio attribute is NULL. This may
fail the equality checks for virtio options in qemuDomainChangeNet,
depending on whether libvird was restarted since define or not.
This patch modifies the check for non-virtio models, to ignore olddev
value of virtio (assumed valid), and to allow either NULL or a struct
with all values ABSENT in the new virtio options.
Signed-off-by: Miroslav Los <mirlos@cisco.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Similarly to commit 2482801608 we can safely ignore connectionId,
portId and portgroupId in both XML and VMX as they are only a blind
pass-through between XML and VMX and an ethernet without such parameters
was spotted in the wild. On top of that even our documentation says the
whole VMWare Distrubuted Switch configuration is a best-effort.
Resolves: https://issues.redhat.com/browse/RHEL-46099
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
This wires up the emulator 'debug' parameter to control the
/usr/bin/swtpm 'level' parameter for logging.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Pick up some more of the qemu_driver.c code so this function supports
both CONFIG and LIVE updates.
Note that qemuDomainUpdateDeviceFlags() passed vm->def to
virDomainDeviceDefParse() for the VIR_DOMAIN_AFFECT_CONFIG case, which
is technically incorrect; in the test driver code we'll fix this.
Signed-off-by: John Levon <john.levon@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
mem_nodes[i].ndistances is written outside the loop causing an out-of-bounds
write leading to heap corruption.
While we are at it, the entire cleanup portion can be removed as it can be
handled in virDomainNumaFree. One instance of VIR_FREE is also removed and
replaced with g_autofree.
This patch also adds a testcase which would be picked up by ASAN, if this
portion regresses.
Fixes: 742494eed8
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Changing the postgroup attribute caused unexpected behavior.
Although it can be implemented, it has a non-trivial solution.
No requirement or use has yet been found for implementing this
feature, so it has been disabled for hot-plug.
Resolves: https://issues.redhat.com/browse/RHEL-7299
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The current hostdev parsing logic sets rawio or sgio even if the hostdev type
is not 'scsi'. The rawio field in virDomainHostdevSubsysSCSI overlaps with
wwpn field in virDomainHostdevSubsysSCSIVHost, consequently setting a bogus
pointer value such as 0x1 or 0x2 from virDomainHostdevSubsysSCSIVHost's
point of view. This leads to a segmentation fault when it attempts to free
wwpn.
While setting sgio does not appear to crash, it shares the same flawed logic
as setting rawio.
Instead, we ensure these are set only after the hostdev type check succeeds.
This patch also adds two test cases to exercise both scenarios.
Fixes: bdb95b520c
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add basic coverage of device update; for now, only support disk updates
until other types are needed or tested.
Signed-off-by: John Levon <john.levon@nutanix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The 'hostFips' member of _virQEMUDriver struct is not used
really, due to previous cleanups. Drop it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The support for VXHS device was removed in QEMU commit
v5.1.0-rc1~16^2~10. Since we require QEMU-5.2.0 at least there's
no QEMU that has the device and thus the corresponding capability
can be retired.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that the minimal required version of QEMU is 5.2.0 the
conditional setting of QEMU_CAPS_ENABLE_FIPS and
QEMU_CAPS_NETDEV_USER is effectively a dead code. Drop it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
According to repology.org and/or distro repos these are the version of QEMU:
CentOS Stream 9: qemu-kvm-9.0.0
Debian 11: qemu-5.2.0
Fedora 39: qemu-8.3.1
openSUSE Leap 15.3: qemu-5.2.0
RHEL-8: qemu-6.2.0
Ubuntu 22.04: qemu-6.2.0
Since the minimal version is 5.2.0 we can bump from 4.2.0 to
5.2.0.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It may happen that QEMU is compiled without SLIRP but with
support for passt. In such case it is acceptable to alter user
provided configuration and switch backend to passt as it offers
all the features as SLIRP.
Resolves: https://issues.redhat.com/browse/RHEL-45518
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that the logic for detecting supported net backend types has
been moved to domain capabilities generation, we can just use it
when validating net backend type. Just like we do for device
models and so on.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
After previous commits, domain capabilities XML reports basically
two possible values for backend type: 'default' and 'passt'.
Despite its misleading name, 'default' really means 'use
hypervisor's builtin SLIRP'. Since it's reported in domain
capabilities as a value accepted, make our parser and XML schema
accept it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
If mgmt apps on top of libvirt want to make a decision on the
backend type for <interface type='user'/> (e.g. whether past is
supported) we currently offer them no way to learn this fact.
Domain capabilities were invented exactly for this reason. Report
supported net backend types there.
Now, because of backwards compatibility, specifying no backend
type (which translates to VIR_DOMAIN_NET_BACKEND_DEFAULT) means
"use hyperviosr's builtin SLIRP". That behaviour can not be
changed. But it may happen that the hypervisor has no support for
SLIRP. So we have to report it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that we have a capability for each domain net backend we can
start validating user's selection against QEMU capabilities.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Since -netdev user can be disabled during QEMU compilation, we
can't blindly expect it to just be there. We need a capability
that tracks its presence.
For qemu-4.2.0 we are not able to detect the capability so do the
next best thing - assume the capability is there. This is
consistent with our current behaviour where we blindly assume the
capability, anyway.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The original code was incorrect and never tested because at the time of
implementing it the cgroup file `io.weight` was not available.
Resolves: https://issues.redhat.com/browse/RHEL-45185
Introduced-by: 9c1693eff4
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
When enabling switchover-ack on qemu from libvirt, the .party value
was set to both source and target; however, qemuMigrationParamsCheck()
only takes that into account to validate that the remote side of the
migration supports the flag if it is marked optional or auto/always on.
In the case of switchover-ack, when enabled on only the dst and not
the src, the migration will fail if the src qemu does not support
switchover-ack, as the dst qemu will issue a switchover-ack msg:
qemu/migration/savevm.c ->
loadvm_process_command ->
migrate_send_rp_switchover_ack(mis) ->
migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL)
Since the src qemu doesn't understand messages with header_type ==
MIG_RP_MSG_SWITCHOVER_ACK, qemu will kill the migration with error:
qemu-kvm: RP: Received invalid message 0x0007 length 0x0000
qemu-kvm: Unable to write to socket: Bad file descriptor
Looking at the original commit [1] for optional migration capabilities,
it seems that the spirit of optional handling was to enhance a given
existing capability where possible. Given that switchover-ack
exclusively depends on return-path, adding it as optional to that cap
feels right.
[1] 61e34b0856 ("qemu: Add support for optional migration capabilities")
Fixes: 1cc7737f69 ("qemu: add support for qemu switchover-ack")
Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: YangHang Liu <yanghliu@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
In ideal world, where clients close connection gracefully their
SASL session is freed in virNetServerClientDispose() as it's
stored in client->sasl. Unfortunately, if client connection is
closed prematurely (e.g. the moment virsh asks for credentials),
the _virNetServerClient member is never set and corresponding
SASL session is never freed. The handler is still stored in
client private data, so free it in remoteClientCloseFunc().
20,862 (288 direct, 20,574 indirect) bytes in 3 blocks are definitely lost in loss record 1,763 of 1,772
at 0x50390C4: g_type_create_instance (in /usr/lib64/libgobject-2.0.so.0.7800.6)
by 0x501BDAF: g_object_new_internal.part.0 (in /usr/lib64/libgobject-2.0.so.0.7800.6)
by 0x501D43D: g_object_new_with_properties (in /usr/lib64/libgobject-2.0.so.0.7800.6)
by 0x501E318: g_object_new (in /usr/lib64/libgobject-2.0.so.0.7800.6)
by 0x49BAA63: virObjectNew (virobject.c:252)
by 0x49BABC6: virObjectLockableNew (virobject.c:274)
by 0x4B0526C: virNetSASLSessionNewServer (virnetsaslcontext.c:230)
by 0x18EEFC: remoteDispatchAuthSaslInit (remote_daemon_dispatch.c:3696)
by 0x15E128: remoteDispatchAuthSaslInitHelper (remote_daemon_dispatch_stubs.h:74)
by 0x4B0FA5E: virNetServerProgramDispatchCall (virnetserverprogram.c:423)
by 0x4B0F591: virNetServerProgramDispatch (virnetserverprogram.c:299)
by 0x4B18AE3: virNetServerProcessMsg (virnetserver.c:135)
Resolves: https://issues.redhat.com/browse/RHEL-22574
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Now that the logic for detecting supported launchSecurity types
has been moved to domain capabilities generation, we can just use
it when validating launchSecurity type. Just like we do for
device models and so on.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The inspiration for these rules comes from
qemuValidateDomainDef().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In order to learn what types of <launchSecurity/> are supported
users can turn to domain capabilities and find <sev/> and
<s390-pv/> elements. While these may expose some additional info
on individual launchSecurity types, we are lacking clean
enumeration (like we do for say device models). And given that
SEV and SEV SNP share the same basis (info found under <sev/> is
applicable to SEV SNP too) we have no other way to report SEV SNP
support.
Therefore, report supported launchSecurity types in domain
capabilities.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While it's very unlikely to have QEMU that supports SEV-SNP but
doesn't support plain SEV, for completeness sake we ought to
query SEV capabilities if QEMU supports either. And similarly to
QEMU_CAPS_SEV_GUEST we need to clear the capability if talking to
QEMU proves SEV is not really supported.
This in turn removes the 'sev-snp-guest' capability from one of
our test cases as Peter's machine he uses to refresh capabilities
is not SEV capable. But that's okay. It's consistent with
'sev-guest' capability.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
An iSCSI device with zero hosts will result in a segmentation fault. This patch
adds a check for the number of hosts, which must be one in the case of iSCSI.
Minimal reproducing XML:
<domain type='qemu'>
<name>MyGuest</name>
<uuid>4dea22b3-1d52-d8f3-2516-782e98ab3fa0</uuid>
<os>
<type arch='x86_64'>hvm</type>
</os>
<memory>4096</memory>
<devices>
<disk type='network'>
<source name='dummy' protocol='iscsi'/>
<target dev='vda'/>
</disk>
</devices>
</domain>
Signed-off-by: Rayhan Faizel <rayhan.faizel@gmail.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Add plumbing for QEMU's switchover-ack migration capability, which
helps lower the downtime during VFIO migrations. This capability is
enabled by default as long as both the source and destination support
it.
Note: switchover-ack depends on the return path capability, so this may
not be used when VIR_MIGRATE_TUNNELLED flag is set.
Extensive details about the qemu switchover-ack implementation are
available in the qemu series v6 cover letter [1] where the highlight is
the extreme reduction in guest visible downtime. In addition to the
original test results below, I saw a roughly ~20% reduction in downtime
for VFIO VGPU devices at minimum.
=== Test results ===
The below table shows the downtime of two identical migrations. In the
first migration swithcover ack is disabled and in the second it is
enabled. The migrated VM is assigned with a mlx5 VFIO device which has
300MB of device data to be migrated.
+----------------------+-----------------------+----------+
| Switchover ack | VFIO device data size | Downtime |
+----------------------+-----------------------+----------+
| Disabled | 300MB | 1900ms |
| Enabled | 300MB | 420ms |
+----------------------+-----------------------+----------+
Switchover ack gives a roughly 4.5 times improvement in downtime.
The 1480ms difference is time that is used for resource allocation for
the VFIO device in the destination. Without switchover ack, this time is
spent when the source VM is stopped and thus the downtime is much
higher. With switchover ack, the time is spent when the source VM is
still running.
[1] https://patchwork.kernel.org/project/qemu-devel/cover/20230621111201.29729-1-avihaih@nvidia.com/
Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: YangHang Liu <yanghliu@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
When starting a domain on a host which lacks a vmx-* CPU feature which
is expected to be enabled by the CPU model specified in the domain XML,
libvirt properly marks such feature as disabled in the active domain
XML. But migrating the domain to a similar host which lacks the same
vmx-* feature will fail with libvirt reporting the feature as missing.
This is because of a bug in the hack ensuring backward compatibility
libvirt running on the destination thinks the missing feature is
expected to be enabled.
https://issues.redhat.com/browse/RHEL-40899
Fixes: v10.1.0-85-g5fbfa5ab8a
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Commit 7c8e606b64 attempted to fix
the specification of the ramfb property for vfio-pci devices, but it
failed when ramfb is explicitly set to 'off'. This is because only the
'vfio-pci-nohotplug' device supports the 'ramfb' property. Since we use
the base 'vfio-pci' device unless ramfb is enabled, attempting to set
the 'ramfb' parameter to 'off' this will result in an error like the
following:
error: internal error: QEMU unexpectedly closed the monitor
(vm='rhel'): 2024-06-06T04:43:22.896795Z qemu-kvm: -device
{"driver":"vfio-pci","host":"0000:b1:00.4","id":"hostdev0","display":"on
","ramfb":false,"bus":"pci.7","addr":"0x0"}: Property 'vfio-pci.ramfb'
not found.
This also more closely matches what is done for mdev devices.
Resolves: https://issues.redhat.com/browse/RHEL-28808
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The attribute 'discard_no_unref' of <disk/> is not allowed to be
changed while the virtual machine is running.
Resolves: https://issues.redhat.com/browse/RHEL-37542
Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
A user reported that if they set <forward mode='nat|route' dev='blah'>
starting the network would fail if the device 'blah' didn't already
exist.
This is caused by using "iif" and "oif" in nftables rules to check for
the forwarding device - these two commands work by saving the named
interface's ifindex (an unsigned integer) when the rule is added, and
comparing it to the ifindex associated with the packet's path at
runtime. This works great if the interface both 1) exists when the
rule is added, and 2) is never deleted and re-created after the rule
is added (since it would end up with a different ifindex).
When checking for the network's bridge device, it is okay for us to
use "iif" and "oif", because the bridge device is created before the
firewall rules are added, and will continue to exist until just after
the firewall rules are deleted when the network is shutdown.
But since the forward device might be deleted/re-added during the
lifetime of the network's firewall rules, we must instead us "oifname"
and "iifname" - these are much less efficient than "Xif" because they
do a string compare of the interface's name rather than just comparing
two integers (ifindex), but they don't require the interface to exist
when the rule is added, and they can properly cope with the named
interface being deleted and re-added later.
Fixes: a4f38f6ffe
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
A few commits ago (v10.4.0-101-gc65eba1f57) I've introduced
virDomainDefLaunchSecurityValidate() and a switch() statement in
it. Some cases are empty but are lacking 'break' statement which
is not valid. Provide missing 'break' statement.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The firmware descriptors have 'amd-sev-snp` feature which
describes whether firmware is suitable for SEV-SNP guests.
Provide necessary implementation to detect the feature and pick
the right firmware if guest is SEV-SNP enabled.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Pretty straightforward as qemu has 'sev-snp-guest' object which
attributes maps pretty much 1:1 to our XML model. Except for
@vcek where QEMU has 'vcek-disabled`, an inverted boolean, while
we model it as virTristateBool. But that's easy to map too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
SEV-SNP is an enhancement of SEV/SEV-ES and thus it shares some
fields with it. Nevertheless, on XML level, it's yet another type
of <launchSecurity/>.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This capability tracks sev-snp-guest object availability.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In QEMU commit v9.0.0-1155-g59d3740cb4 the return type of
'query-sev' monitor command changed to accommodate SEV-SNP. Even
though we currently support launching plain SNP guests, this will
soon change.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In a few instances there is a plain if() check for
_virDomainSecDef::sectype. While this works perfectly for now,
soon there'll be another type and we can utilize compiler to
identify all the places that need adaptation. Switch those if()
statements to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The sectype member of _virDomainSecDef struct is already declared
as of virDomainLaunchSecurity type. There's no need to typecast
it to the very same type when passing it to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
To avoid convolution of switch() inside of virDomainSecDefFormat() even
more (as new sectypes are added), move formatting into a separate
function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Some parts of SEV are to be shared with SEV SNP. In order to
reuse XML parsing / formatting code cleanly, let's move those
common bits into a new struct (virDomainSEVCommonDef) and adjust
rest of the code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While working on qemuMonitorJSONGetSEVMeasurement() and
qemuMonitorJSONGetSEVInfo() I've noticed that if these functions
fail, they do so without appropriate error set. Fill in error
reporting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
When a VM terminates itself while it's being migrated in running state
libvirt would report wrong error:
error: cannot get locked memory limit of process 2502057: No such file or directory
rather than the proper error:
error: operation failed: domain is not running
Remember the error on error paths in qemuMigrationSrcConfirmPhase and
qemuMigrationSrcPerformPhase.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'qemuProcessStop()' clears the 'current' job data. While the code under
the 'error' label in 'qemuMigrationSrcRun()' does check that the VM is
active before accessing the job, it also invokes multiple helper
functions to clean up the migration including
'qemuMigrationSrcNBDCopyCancel()' which calls 'qemuDomainObjWait()'
invalidating the result of the liveness check as it unlocks the VM.
Duplicate the liveness check and explain why. The rest of the code e.g.
accessing the monitor is safe as 'qemuDomainEnterMonitorAsync()'
performs a liveness check. The cleanup path just ignores the return
values of those functions.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The function is a pointless wrapper on top of
qemuMigrationDstWaitForCompletion.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Similarly to the one change in commit 4d1a1fdffd
we should be checking that the VM is not being yet destroyed if we've
invoked qemuDomainObjWait().
Use the new helper qemuDomainObjIsActive().
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The helper checks whether VM is active including the internal qemu
state. This helper will become useful in situations when an async job
is in use as VIR_JOB_DESTROY can run along async jobs thus both checks
are necessary.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Prevent the possibility that a VM could be considered as alive while
inside qemuProcessStop.
A recently fixed bug which unlocked the domain object while inside
qemuProcessStop showed that there's possibility to confuse the state of
the VM to be considered active while 'qemuProcessStop' is processing
shutdown of the VM. Ensure that this doesn't happen by clearing the
'beingDestroyed' flag only after the VM id is cleared.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
There are few function calls done while cleaning up a stopped VM which
do require the old VM id, to e.g. clean up paths containing the 'short'
domain name in the path.
Anything else, which doesn't strictly require it can be moved after
clearing the 'id' in order to decrease likelyhood of potential bugs.
This patch moves all the code which does not require the 'id' (except
for the log entry and closing the monitor socket) after the statement
clearing the id and adds a comment explaining that anything in the
section must not unlock the VM object.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'qemuDomainObjStopWorker()' which is meant to dispose of the event loop
thread for the monitor unlocks the VM object while disposing the thread
to prevent possible deadlocks with events waiting on the monitor thread.
Unfortunately 'qemuDomainObjStopWorker()' is called *before* the VM is
marked as inactive by clearing 'vm->def->id', but at the same time it's
no longer marked as 'beingDestroyed' when we're inside
'qemuProcessStop()'.
If 'vm' would be kept locked this wouldn't be a problem. Same way it's
not a problem for anything that uses non-ASYNC VM jobs, or when the
monitor is accessed in an async job, as the 'destroy' job interlocks
with those.
It is a problem for code inside an async job which uses
'qemuDomainObjWait()' though. The API contract of qemuDomainObjWait()
ensures the caller that the VM on successful return from it, but in this
specific reason it's not the case, as both 'beingDestroyed' is already
false, and 'vm->def->id' is not yet cleared.
To fix the issue move the 'qemuDomainObjStopWorker()' call *after*
clearing 'vm->def->id' and also add a note stating what the function is
doing.
Fixes: 860a999802
Closes: https://gitlab.com/libvirt/libvirt/-/issues/640
Reported-by: luzhipeng <luzhipeng@cestc.cn>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Document why this function exists and meaning of return values.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Clear the 'disk' member of 'blockjob' as we're freeing the disk object
at this point. While this should not normally happen it was observed
when other bug allowed the VM to be cleared while other threads didn't
yet finish.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Similarly to other blockjob handlers, if there's no disk associated with
the blockjob the handler needs to behave correctly. This is needed as
the disk might have been de-associated on unplug or other operations.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Instead of fencing offline ccw devices add the state to the ccw
capability.
Resolves: https://issues.redhat.com/browse/RHEL-39497
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Prevent the creation of a new DASD node object when the device does not
exist.
Resolves: https://issues.redhat.com/browse/RHEL-39497
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In newer DASD driver versions the ID_TYPE tag is supported. This tag is
missing after a system reboot but when the ccw device is set offline and
online the tag is included. To fix this version independently we need to
check if devices detected as type disk is actually a DASD to maintain
the node object consistency and not end up with multiple node objects
for DASDs.
Resolves: https://issues.redhat.com/browse/RHEL-39497
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Refactor the storage type fixup into a reusable method.
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The virNetworkObjSetFwRemoval() function is called at least two
times when there's a network running and network driver
initializes:
1) when loading state XMLs:
#0 virNetworkObjSetFwRemoval (obj=0x7fffd4028250, fwRemoval=0x7fffd4020ad0) at ../src/conf/virnetworkobj.c:258
#1 0x00007ffff7a69c68 in virNetworkLoadState (...) at ../src/conf/virnetworkobj.c:952
#2 0x00007ffff7a6a35d in virNetworkObjLoadAllState (...) at ../src/conf/virnetworkobj.c:1072
#3 0x00007ffff7f9625f in networkStateInitialize (...) at ../src/network/bridge_driver.c:624
2) when firewall rules are being reloaded:
#0 virNetworkObjSetFwRemoval (obj=0x7fffd4028250, fwRemoval=0x7fffd402e5b0) at ../src/conf/virnetworkobj.c:258
#1 0x00007ffff7f997b4 in networkReloadFirewallRulesHelper (obj=0x7fffd4028250, opaque=0x0) at ../src/network/bridge_driver.c:1703
#2 0x00007ffff7a6b09b in virNetworkObjListForEachHelper (payload=0x7fffd4028250, ...) at ../src/conf/virnetworkobj.c:1414
#3 0x00007ffff79287b6 in virHashForEachSafe (...) at ../src/util/virhash.c:387
#4 0x00007ffff7a6b119 in virNetworkObjListForEach (...) at ../src/conf/virnetworkobj.c:1441
#5 0x00007ffff7f99978 in networkReloadFirewallRules (...) at ../src/network/bridge_driver.c:1742
#6 0x00007ffff7f962f2 in networkStateInitialize (...) at ../src/network/bridge_driver.c:645
Since virNetworkObjSetFwRemoval() does not free the object stored
in the first call, the second call just overwrites the stored
pointer leading to a memory leak:
5,530 (48 direct, 5,482 indirect) bytes in 1 blocks are definitely lost in loss record 1,863 of 1,880
at 0x4848C43: calloc (vg_replace_malloc.c:1595)
by 0x4F1E979: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.7800.6)
by 0x4976E32: virFirewallNew (virfirewall.c:118)
by 0x4979BA9: virFirewallParseXML (virfirewall.c:1071)
by 0x4ABEB1E: virNetworkLoadState (virnetworkobj.c:938)
by 0x4ABF35C: virNetworkObjLoadAllState (virnetworkobj.c:1072)
by 0x4E9A25E: networkStateInitialize (bridge_driver.c:624)
by 0x4CB1FA6: virStateInitialize (libvirt.c:665)
by 0x15A6C6: daemonRunStateInit (remote_daemon.c:611)
by 0x49E69F0: virThreadHelper (virthread.c:256)
by 0x532B428: start_thread (in /lib64/libc.so.6)
by 0x5397373: clone (in /lib64/libc.so.6)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
As a part of parsing XML, virFirewallParseXML() calls
virXMLNodeContentString() and then passes the return value
further. But virXMLNodeContentString() is documented so that it's
the caller's responsibility to free the returned string, which
virFirewallParseXML() never does. This leads to a memory leak:
14,300 bytes in 220 blocks are definitely lost in loss record 1,879 of 1,891
at 0x4841858: malloc (vg_replace_malloc.c:442)
by 0x5491E3C: xmlBufCreateSize (in /usr/lib64/libxml2.so.2.12.6)
by 0x54C2401: xmlNodeGetContent (in /usr/lib64/libxml2.so.2.12.6)
by 0x49F7791: virXMLNodeContentString (virxml.c:354)
by 0x4979F25: virFirewallParseXML (virfirewall.c:1134)
by 0x4ABEB1E: virNetworkLoadState (virnetworkobj.c:938)
by 0x4ABF35C: virNetworkObjLoadAllState (virnetworkobj.c:1072)
by 0x4E9A25E: networkStateInitialize (bridge_driver.c:624)
by 0x4CB1FA6: virStateInitialize (libvirt.c:665)
by 0x15A6C6: daemonRunStateInit (remote_daemon.c:611)
by 0x49E69F0: virThreadHelper (virthread.c:256)
by 0x532B428: start_thread (in /lib64/libc.so.6)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>