When automatically adding a NUMA node (qemuDomainDefNumaAutoAdd()) the
memory size of the node is computed as:
total_memory - sum(memory devices)
And we have a nice helper for that: virDomainDefGetMemoryInitial() so
it looks logical to just call it. Except, this code runs in post parse
callback, i.e. memory sizes were not validated and it may happen that
the sum is greater than the total memory. This would be caught by
virDomainDefPostParseMemory() but that runs only after driver specific
callbacks (i.e. after qemuDomainDefNumaAutoAdd()) and because the
domain config was changed and memory was increased to this huge
number no error is caught.
So let's do what virDomainDefGetMemoryInitial() would do, but
with error checking.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2216236
Fixes: f5d4f5c8ee
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
This brings the tool's list of features in sync with qemu
commit 6f05a92ddc73ac8aa16cfd6188f907b30b0501e3.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Inside daemonStreamHandleWrite on stream completion (status=OK) we
reuse msg object to send confirmation.
Only after that, msg is poped from the queue and checked for continue.
By that time, msg might've already been processed for the confirmation
and freed.
Signed-off-by: Oleg Vasilev <oleg.vasilev@virtuozzo.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Helped to debug next patch use-after-free.
Signed-off-by: Oleg Vasilev <oleg.vasilev@virtuozzo.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If a per-domain SWTPM state directory exists but is empty our
code still considers it a valid state and skips running
'swtpm_setup' (handled in qemuTPMEmulatorRunSetup()).
While we should not try to inspect individual files created by
swtpm, we can still consider empty folder as non-existent state.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/320
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There might be cases where we want to know whether given
directory is empty or not. Introduce a helper for that:
virDirIsEmpty().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Queues is supported by virtio bus, including virtio-blk and
vhost-user-blk.
Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Now that we don't use it for probing at all we can remove all the
corresponding monitor code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The capability code now probes the presence of commands from the QMP
schema instead of using 'query-commands'. Don't call the command and
adjust the '.replies' files.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Move the probing code to extract the data from the QMP schema rather
than invoking 'query-commands'. This patch doesn't yet remove the actual
invocation of 'query-commands', just moves the actual probing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
nodeDeviceUpdateMediatedDevices invokes virMdevctlListDefined and
virMdevctlListActive both of which were passed the same 'errmsg' buffer.
Since virCommandSetErrorBuffer() always allocates the error buffer one
of them was leaked.
Fix it by populating the 'errmsg' buffer only on failure of
virMdevctlListActive|Defined which invoke the command.
Add a comment to nodeDeviceGetMdevctlListCommand reminding how
virCommandSetErrorBuffer() works.
Fixes: 44a0f2f0c8
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
The support for configuring the 'wwn' of a IDE disk was added in qemu
commit 95ebda85e09 (v1.0-1869-g95ebda85e0) and can't be compiled
out.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The support for configuring the 'wwn' of a SCSI disk was added in qemu
commit 27395add759ff4caeb0 (v1.0-3326-g27395add75) and can't be compiled
out.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
CVE-2023-3750
'virStoragePoolObjListSearch' explicitly documents that it's returning
a pointer to a locked and ref'd pool that maches the lookup function.
This was not the case as in commit 0c4b391e2a (released in
libvirt-8.3.0) the code was accidentally converted to use 'VIR_LOCK_GUARD'
which auto-unlocked it when leaving the scope, even when the code was
originally "leaking" the lock.
Revert the corresponding conversion and add a comment that this function
is intentionally leaking a locked object.
Fixes: 0c4b391e2a
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2221851
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
All backing chain members which were auto-added by image detection,
including the terminating element, should have the 'detected' property
set to true. This is needed to properly strip the detected elements in
some cases, e.g. for the status XML where we could treat some images as
manually terminated even when it was auto-detected.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rewrap argument definition of qemuDomainSaveInternal and align argument
in the invocation of the aforementioned function in
qemuDomainManagedSaveHelper.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
After the previous commit we no longer require that logind is actually
running, it merely has to be activatable.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Historically we wanted to check if logind was actually running, not
merely activatable, because on systems where systemd is installed,
but the OS is booted into non-systemd init, we want to fallback to
pm-utils.
Requiring logind to be running, however, forces us to serialize libvirtd
startup on startup of logind which is undesirable. We can relax this
dependancy if we check whether systemd itself is running, which implies
that logind will activated when we need it.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Since systemd 240, all services get an open file hard limit of
500k, and a soft limit of 1024. This limit means apps are safe
to use select() by default which is limited to 1024 FDs. Apps
which don't use select() are expected to simply set their soft
limit to match the hard limit during startup.
With our current unit file settings we've been effectively
reducing the max open files we have on most modern systems.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
None of our daemons use select(), so it is safe to raise the max file
limit to its maximum on startup.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Historically the max files limit for processes has always been 1024,
because going beyond this is incompatible with the select() function.
None the less most apps these days will use poll() so should not be
limited in this way.
Since systemd >= 240, the hard limit will be 500k, while the soft
limit remains at 1k. Applications which don't use select() should
raise their soft limit to match the hard limit during their startup.
This function provides a convenient helper to do this limit raising.
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
These wrappers added no semantic difference over calling the system
function directly.
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The unit files both have After=network.target, and this in turn implies
After=network-pre.target. Both iptables.service & ip6tables.service have
Before=network-pre.target since Fedora >= 35 and RHEL >= 8.4.
When we first added the deps on ip[6]tables.service in
commit 0756415f14
Author: Laine Stump <laine@redhat.com>
Date: Fri May 1 00:05:50 2020 -0400
systemd: start libvirtd after firewalld/iptables services
the Before=network-pre.target didn't exist, but we can rely on it now
given our supported platforms matrix.
The firewalld.service has similarly has a Before=network-pre.target,
even when we took that commit above, so this dep was in face never
actually needed. This answers the question posed in that above commit
message about firewalld ordering.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
All services are ordered after local-fs.target unless they have set
DefaultDependencies=no, which we do not do.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
A test case can be part of a test suite (just like we already
have 'syntax-check'). This then allows developers to run only a
subset of tests. For instance - when using valgrind test setup
(`meson test -C _build/ --setup valgrind`) it makes zero sense to
run syntax-check tests or other script based tests (e.g.
check-augeas-*, check-remote_protocol, etc.). What does makes
sense is to run compiled binaries.
Strictly speaking, reaching that goal is as trivial as annotating
only those compiled tests (declared in tests/meson.build) and
running them selectively:
meson test -C _build/ --setup valgrind --suite $TAG
But it may be also desirable to run test scripts separately.
Therefore, introduce two new tags: 'bin' for compiled tests, and
'script' for script based tests and annotate each test()
accordingly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The current virtStorageBackendZFSCheckPool checks for the existence of a
path under /dev/zvol/ to determine if the pool is active. ZFS does not
create a path under /dev/zvol/ if no ZFS volumes have been created under
a particular dataset, thus, empty ZFS storage pools are deactivated
whenever checkPool is called on them (as noted in referenced issue).
This commit changes virStorageBackendZFSCheckPool so that the 'zfs list'
command is used to explicitly check for the existence a dataset
specified by the pool's def->source.name.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/221
Signed-off-by: Matt Low <matt@mlow.ca>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Since commit 44a0f2f0, we now query mdevctl for transient (active) mdevs
in order to gather attributes for the mdev. Unfortunately, this commit
introduced a regression because nodeDeviceUpdateMediatedDevice() assumed
that all mdevs returned from mdevctl were actually persistent mdevs but
we were using it to update transient mdevs. Refactor the function so
that we can use it to update both persistent and transient mdevs.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
The virtio-gpu 'blob' support was insufficiently validated. Qemu
requires a memfd memory backing in order to use udmabuf and enable blob
support. Example error:
$ virsh start rhel9
error: Failed to start domain 'rhel9'
error: internal error: qemu unexpectedly closed the monitor: 2023-07-18T02:33:57.083178Z qemu-kvm: -device {"driver":"virtio-vga","id":"video0","max_outputs":1,"blob":true,"bus":"pcie.0","addr":"0x1"}: cannot enable blob resources without udmabuf
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Historically, the way to set PC speaker for a guest was to pass:
-soundhw pcspk
but as of QEMU commit v5.1.0-rc0~28^2~3 this is deprecated and we
should use:
-machine pcspk-audiodev=$id
instead. The old way was then removed in commit v7.1.0-rc0~99^2~3.
Now, ideally we would have a capability selecting whether we talk
to a QEMU that understands the new way or not. But it's not that
simple - the machine attribute is just an alias to the .audiodev=
attribute of 'isa-pcspk' object and both are created in
pc_machine_initfn() function, i.e. not then the PC_MACHINE() class
is initialized, but when it's instantiated. IOW, it's not possible
for us to query whether we're dealing with older or newer QEMU.
But given that the newer version is supported since v5.1.0 and the
minimal version we require is v4.2.0 (i.e. there are two releases
which don't understand the newer cmd line) and how frequently this
feature is (un-)used (the issue was reported after ~1 year since it
stopped working), I believe we can live without any capability and
just use the newer cmd line unconditionally.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/490
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Now that the QEMU_CAPS_USB_STORAGE_REMOVABLE capability is no
longer used we can stop querying it and retire it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Introduced in QEMU commit of v0.14.0-rc0~83^2~1 and not being
able to compile the .removable attribute of the "usb-storage"
object out, renders our corresponding capability
QEMU_CAPS_USB_STORAGE_REMOVABLE always set. Stop using it in
command generation / domain validation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
After previous commit, there's no functional difference between
real virRandomGenerateWWN() and the mocked version. Drop the mock
then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This brings the code closer to real implementation:
nodeDeviceCreateXML(). For the unique OUI, let's take the value
from tests/virrandommock.c: 100000.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Firstly, drop needless concatenation of two static strings.
Secondly, use proper (portable) formatter for uint64_t so that
typecast to ULL can be dropped.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Commit be1b7d5b18 introduced parsing /proc/cpuinfo for "address size"
which is not including on S390 and therefore reports an internal error.
Lets remove the parsing on S390.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add async-teardown to the features list in domain capabilities allowing
high level management to introspect the availability of the asynchronous
teardown feature.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Up until v2.11.0-rc2~19^2~3 QEMU used to require at least one
NUMA node to be configured when memory hotplug was enabled. After
that commit, QEMU automatically adds a NUMA node if none was
specified on the cmd line. Reflect this in domain XML, i.e.
explicitly add a NUMA node into our domain definition if needed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2216236
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Rather than directly executing mdevctl from the udev event thread when
we determine that we need to re-query, schedule the mdevctl thread to
run. This also helps to coalesce multiple back-to-back updates into a
single one when there are multiple updates in a row or at startup when a
host has a very large number of mdevs.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Factor out a new scheduleMdevctlUpdate() function so that we can re-use
it from other places. Now that other events can make it necessary to
re-query mdevctl for mdev updates, this function will be useful for
coalescing multiple updates in quick succession into a single mdevctl
query.
Also rename a couple functions. The names weren't very descriptive of
their behavior. For example, the old scheduleMdevctlHandler() function
didn't actually schedule anything, it just started a thread. So rename
it to free up the 'schedule' name for the above refactored function.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Update the optional mdev attributes by running an mdevctl update on a
new created nodedev object representing an mdev.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143158
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
If a domain has NUMA configured, then all <memory/> devices
(except for 'virtio-pmem') need to have targetNode set. There are
two checks inside of qemuDomainDefValidateMemoryHotplugDevice()
for this: one inside of big switch() statement, which only checks
'dimm' and 'nvdimm' cases, and the other at the end of the
function that checks all models (except for 'virtio-pmem'). Let's
keep the latter and remove the former as the latter covers the
former too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
The libxl driver has basic support for VIR_MIGRATE_CHANGE_PROTECTION
by starting and stopping modify jobs in the begin/confirm and prepare/finish
phases of migration, but it doesn't advertise that support. This can result
in unterminated jobs because the migration logic skips phases of migration
when the VIR_MIGRATE_CHANGE_PROTECTION feature is absent. Ensure jobs are
terminated properly by advertising support for VIR_MIGRATE_CHANGE_PROTECTION.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
For unknown reasons, the libxl driver attempts to resume a domain in the
confirm phase when a migration operation has been canceled. This has shown
to be problematic when simulating scenarios that result in a canceled
migration. In all scenarios, the domain was in a running state when entering
libxlDomainMigrationSrcConfirm, causing the call to libxl_domain_resume to
fail. Making matters worse, the domain state is changed to paused when in
fact it's running. And finally, libxlDomainMigrationSrcConfirm incorrectly
returns an error.
Remove this incorrect logic from libxlDomainMigrationSrcConfirm. On a
canceled migration it's sufficient to resume the lock process that was
paused in the perform phase.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Our CI started to enable udev backend on FreeBSD. And while there
is udev on FreeBSD some parts of our code are highly Linux
specific, e.g. translating SCSI device type to string (from an
integer obtained from the sysfs). Obviously, this doesn't work
anywhere else. This is the reason why we need to include
scsi/scsi.h header file (which actually comes from the Linux
kernel source tree but for some reason glibc started to
distribute it, followed by musl).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Asynchronous teardown can be specified if the QEMU binary supports it by
adding in the domain XML
<features>
...
<async-teardown enabled='yes|no'/>
...
</features>
By default this new feature is disabled.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
QEMU capability is looking in query-command-line-options response for
...
{
"parameters": [
{
"name": "async-teardown",
"type": "boolean"
}
],
"option": "run-with"
}
...
allow to use the QEMU option -run-with async-teardown=on|off
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Allow //disk/target@removable for scsi disk devices, since QEMU has support
the removable attribute for scsi-hd device from v0.14.0[1].
[1]: 419e691f8e: scsi-disk: Allow overriding SCSI INQUIRY removable bit
Signed-off-by: Han Han <hhan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Do for all other profiles what we already do for the
virt-aa-helper one. In this case we limit the feature to AppArmor
3.x, as it was never implemented for 2.x.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
For AppArmor 3.x we can use 'include if exists', which frees us
from having to create a dummy override. For AppArmor 2.x we keep
things as they are to avoid introducing regressions.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Implement the standard AppArmor 3.x abstraction extension
approach.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
The subprofile can only work by including the abstraction shipped
in the passt package, which we can't assume is present, and
'include if exists' doesn't work well on 2.x.
No distro that's stuck on AppArmor 2.x is likely to be shipping
passt anyway.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Compared to profiles, we only need a single preprocessing step
here, as there is no variable substitution happening.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Perform an additional preprocessing step before the existing
variable substitution. This is the same approach that we already
use to customize systemd unit files based on whether the service
supports TCP connections.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
After v8.1.0-61-g030faee28d it is no longer necessary to make the
/proc/meminfo file nonseekable as our code that fills the file
with spoofed values can handle seeking just fine.
Previously, `free(1)` was okay with failed lseek(), but this was
ages ago and meanwhile the procps project moved to creating a
library and moved the file parsing code under an exported
function. In attempt to make the function callable multiple
times, it can lseek() multiple times and failure to do so is
fatal.
This reverts commit 7664955086
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/492
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Fix the syntax-check failures (which can be seen after
python3-flake8-import-order package is installed) with the help
of isort[1]:
289/316 libvirt:syntax-check / flake8 FAIL 5.24s exit status 2
[1]: https://pycqa.github.io/isort/
Signed-off-by: Han Han <hhan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
As it turns out, apparmor 2.x and 3.x behave differently or have differing
levels of support for local customizations of profiles and profile
abstractions. Additionally the apparmor 2.x tools do not cope well with
'include if exists'. Revert this commit until a more complete solution is
developed that works with old and new apparmor.
Reverts: 9b743ee190
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
If VIR_ASYNC_JOB_NONE flag is present, job.current is equal
to NULL, which leads to SIGSEGV. Thus, this check should be
moved up.
Fixes: v8.0.0-427-gf304de0df6
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
In one of my previous commits I've introduced @logfd variable
that was supposed to hold FD of passt logfile. But I've forgot to
assign the qemuDomainOpenFile() retval to it.
Fixes: 8511b96a31
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There are a few situations where passt itself is unable to create
a file because it runs under QEMU user (e.g. just like our
example from formatdomain.rst suggests: /var/log/passt.log). If
libvirtd runs with sufficient permissions (e.g. as root) it can
create the file and set seclabels on it so that passt can then
open it.
Ideally, we would just pass pre-opened FD, but this wasn't viewed
as secure enough [1]. So lets just create the file and set
seclabels.
For the case when both libvirtd and passt have the same
permissions, well then we fail before even needing to fork() and
exec().
1: https://archives.passt.top/passt-dev/20230606225836.63aecebe@elisabeth/
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2209191
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
New storage types are not implemented in generators for -drive and the
xen config. Explicitly reject them in case of a programming error.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If there are no parameters, there is nothing to validate.
If params == NULL, memcpy below results in memcpy(sorted, NULL, 0),
which is UB.
Found by UBSAN. Example of this codepath: virDomainBlockCopy()
(where nparams == 0 is valid) -> qemuDomainBlockCopy()
Signed-off-by: Oleg Vasilev <oleg.vasilev@virtuozzo.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
When detaching a device, the following race condition may happen:
Once qemuDomainSignalDeviceRemoval() marks the device for
removal, it returns true, which means it is the caller
that marked the device for removal is going to remove the
device from domain definition.
But qemuDomainWaitForDeviceRemoval() may still receive
timeout from virDomainObjWaitUntil() which is implemented
by pthread_cond_timedwait() due to an unavoidable race
between the expiration of the timeout and the predicate
state(priv->unplug.alias) change.
And then qemuDomainWaitForDeviceRemoval() will return 0,
thus the caller will not remove the device from domain
definition.
In this situation, the device is still present in the domain
definition but doesn't exist in qemu anymore. Worse, there is
no way to remove it from the domain definition.
Solution is to recheck the value of priv->unplug.alias to
determine who is going to remove the device from domain
definition.
Signed-off-by: zuo boqun <zuoboqun@baidu.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Qemu 8.1.0 will add discard_no_unref option for qcow2 images.
When this option is enabled (default=false), then it will no longer
unreference clusters when guest does a discard, but it will just free
the blocks (useful for incremental backups for example) and pass the
discard to the lower layer.
This was implemented to avoid fragmentation within the qcow2 image.
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The qcow2 driver allows passing discards to the storage while keeping
the reference of the block, and just marking it as zeroed. This can
decrease the levels of fragmentation of the qcow2 metadata when
discards are enabled.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Memory slots are required only for DIMM-like devices, but the maximum
memory address space is relevant also for other non-DIMM memory devices
such as virtio-mem. Allow configurations where no slots are added.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Memory slots are required only for DIMM-like devices, while other
devices defined via <memory> such as virtio-mem may use the PCI bus and
thus do not require/consume a memory slot.
Fix the validation code to calculate the required count of memory
devices only for DIMMs and NVDIMMs.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Specify the memory size by using '-m size=2048k' instead of just '-m 2'.
The new syntax is used when memory hotplug is enabled. To preserve
memory sizing, if memory hotplug is disabled the size is rounded down to
the nearest mebibyte.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This should not be needed, but here's what's happening:
virStrToLong_*() family of functions was switched from strtol*()
to g_ascii_strtol*() in order to handle corner cases on Windows
(most notably parsing hex numbers with base=0) - see
v9.4.0-61-g2ed41d7cd9. But what we did not realize back then, is
the fact that g_ascii_strtol*() family has their own global lock
rendering virStrToLong_*() function unsafe between fork() +
exec(). Worse, if one of the threads has to wait for the lock (or
on its corresponding condition), then errno is mangled and
g_ascii_strtol*() signals an error, even though there's no error.
Read more here:
https://gitlab.gnome.org/GNOME/glib/-/issues/3034
Nevertheless, if we make glib init the g_ascii_strtol*() global
state (by calling one function from g_ascii_strtol*() family),
then there shouldn't be any congestion on the lock and thus no
errno mangling.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The current implementation of virConnectBaselineHypervisorCPU in QEMU
driver can provide a CPU definition that will not work on all hosts in
case they have different maximum physical address size. So when we get
the info from domain capabilities, we need to choose the smallest
physical address size for the computed baseline CPU definition.
https://bugzilla.redhat.com/show_bug.cgi?id=2171860
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We already report the hosts physical address size in host capabilities,
but computing a baseline CPU definition is done from domain
capabilities.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Newer GCC (13.1.1 in my case) wrongly reports "maybe uninitialized"
warning for this variable inside the next condition. Even though this
accusation is wrong (the condition is guarded by the same condition as
the for cycle initializing it), initialize it during the declaration so
compilation errors don't stop others and maybe also future proof the
code for changes.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This has two main advantages:
- it parses the number with C locale explicitly
- it behaves the same on Windows as on Linux and BSD
both of which are wanted behaviours.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
With the last user gone this function can be abolished. It is
preferable to use _ll instead since that is not a subject to 32/64 bit
scaling.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It is used to fill an unsigned long long anyway and if it is negative
than there is really an issue somewhere.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The @unionMems argument of qemuProcessSetupPid() function is not
necessary really as all callers pass 'true'. Drop it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The unit that cpuset CGroups controller works with is a
thread/process, not individual memory allocations. Therefore,
after we've set cpuset.mems for emulator (after previous commit
it's set to union of all host NUMA nodes allowed for given
domain), and as we try to set up cpuset.mems for vCPUs/IOThreads,
memory is migrated to selected NUMA node(s). We are effectively
saying: "this thread (vCPU thread) can have memory only from
these NUMA node(s)".
That's not really what we want though. The cpuset controller
doesn't differentiate memory "belonging" to the emulator thread
and vCPU thread or IOThread even.
Therefore, set union of all allowed host NUMA nodes, just like
we're doing for the emulator thread.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
In ideal world, my plan was perfect. We allow union of all host
nodes in cpuset.mems and once QEMU has allocated its memory, we
'fix up' restriction of its emulator thread by writing the
original value we wanted to set all along. But in fact, we can't
do it because that triggers memory movement. For instance,
consider the following <numatune/>:
<numatune>
<memory mode="strict" nodeset="0"/>
<memnode cellid="1" mode="strict" nodeset="1"/>
</numatune>
<numa>
<cell id="0" cpus="0-1" memory="1024000" unit="KiB" />
<cell id="1" cpus="2-3" memory="1048576" unit="KiB"/>
</numa>
This is meant to create 1:1 mapping between guest and host NUMA
nodes. So we start QEMU with cpuset.mems set to "0-1" (so that it
can allocate memory even for guest node #1 and have the memory
come fro host node #1) and then, set cpuset.mems to "0" (because
that's where we wanted emulator thread to live).
But this in turn triggers movement of all memory (even the
allocated one) to host NUMA node #0. Therefore, we have to just
keep cpuset.mems untouched and rely on .host-nodes passed on the
QEMU cmd line.
The placement still suffers because of cpuset.mems set for vcpus
or iothreads, but that's fixed in next commit.
Fixes: 3ec6d586bc
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Apparmor profiles in /etc/apparmor.d/ are config files that can and should
be replaced on package upgrade, which introduces the potential to overwrite
any local changes. Apparmor supports local profile customizations via
/etc/apparmor.d/local/<service> [1].
This change makes the support explicit by adding libvirtd, virtqemud, and
virtxend profile customization stubs to /etc/apparmor.d/local/. The stubs
are conditionally included by the corresponding main profiles.
[1] https://ubuntu.com/server/docs/security-apparmor
See "Profile customization" section
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In one of its commits [1] libssh2 changed the 'text' member of
LIBSSH2_USERAUTH_KBDINT_PROMPT struct from 'char' to 'unsigned
char'. But we g_strdup() the member in order to fill 'prompt'
member of virConnectCredential struct. Typecast the value to
avoid warnings. Also, drop @prompt variable, as it's needless.
1: 83853f8aea
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Use automatic memory freeing and modern XML parsers to simplify the
function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Format the rule attributes in two passes, first for positive 'match' and
second pass for negative. This removes the crazy logic for switching
between match modes inside the formatter.
The refactor makes it also more clear in which cases we actually do
format something.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virXMLNodeGetSubelementList to get the elements to process.
The new approach documents the complexity of the parser, which is
designed to ignore unknown attributes and parse only a single kind of
them after finding the first valid one.
Note that the XML schema doesn't actually allow having multiple
sub-elements, but I'm not sure how that translates to actual configs
present.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use modern parsing. Invalid numbers are now rejected. Semantis for
numbers out of range is preserved.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Convert the fields to the proper types and use virXMLPropEnum for
parsing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virXMLFormatElement to simplify the formatter. Drop return value of
virNWFilterRuleDefFormat as there are no errors to report.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virXMLNodeGetSubelement(List) instead of the looped parser and
simplify the code.
Note that handling of the 'bootp' element now conforms to the schema
where we allow just one and the 'file' attribute is mandatory.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The new helper is similar to virXPathNodeSet list but for cases where we
want to get subelements directly rather than using XPath.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There's nothing to clean up in the 'host' local variable on error as
the function which fills it makes sure to fill it only on success. In
such case it's also directly assigned to the array thus the 'host'
variable is cleared.
Remove the 'cleanup' label and 'ret' variable as we can now directly
return -1 on error.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Extract the 'inbound'/'outbound' subelements using
virXMLNodeGetSubelement to simplify the code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Remove the unnecessary check for valid arguments and use
virXMLPropULongLong instead of hand-written property parsers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Every caller will pass 'qdevid' as it's populated in the data
mandatorily with qemu-4.2 and onwards due to mandatory -blockdev use.
Thus we can drop compatibility with the old way of matching the disk via
alias.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Every caller will pass 'qdevid' as it's populated in the data
mandatorily with qemu-4.2 and onwards due to mandatory -blockdev use.
Thus we can drop compatibility with the old way of matching the disk via
alias.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Historically this didn't work with any supported qemu version as we
don't set the alias of the device, and thus qemu uses a different alias
resulting in a failure to startup the VM:
internal error: unable to execute QEMU command 'block_set_io_throttle': Device 'drive-sd-disk0' not found
Refuse setting throttling as this is unlikely to be needed and proper
fix requires using -device instead of -drive if=sd.
Note that this was broken when I moved the setup of throttling as a
command at startup for blockdev integration quite a while ago. Until
then throttling was passed as arguments for -drive.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function doesn't modify it. Fix the argument declaration so that the
function can be used in a context where we have a 'const' disk
definition.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When a user requests debug logging by setting the environment variable:
LIBVIRT_DEBUG=1
we should log any errors regardless of the setting of e.g.
'LIBVIRT_LOG_OUTPUTS' as the code will log every 'debug' and 'info'
level message to stderr but will skip 'error' level messages.
This obviously makes debugging things very complicated as you can get to
a situation when the error itself is missing.
This can happen e.g. in tests.
Fix the issue by probing the default log level and calling the logger if
it's set for VIR_LOG_DEBUG.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When starting a domain, it's done so in two steps (actually more,
but lets focus on just the following two):
1) qemuProcessPrepareDomain(), followed by
2) qemuProcessPrepareHost().
Now, in the first step (PrepareDomain()), PCI backends for all
hostdevs is set (qemuProcessPrepareDomain() ->
qemuProcessPrepareDomainHostdevs() -> qemuDomainPrepareHostdev()
-> qemuDomainPrepareHostdevPCI()). Perfect.
But then, additional hostdevs may appear, because in the host
prepare phase we may insert some hostdevs into domain definition
(qemuProcessPrepareHost() -> qemuProcessNetworkPrepareDevices()).
Now, these additional hostdevs don't undergo the same prepare as
hostdevs that were already present in the domain definition (i.e.
in qemuProcessPrepareDomain() phase). Therefore, we have to call
corresponding prepare function explicitly.
NB, the interface hotplug code (qemuDomainAttachNetDevice()) does
not suffer from this problem, because it calls top level
qemuDomainAttachHostDevice() which is used to hotplug regular
hostdevs too and as such calls qemuDomainPrepareHostdev().
Fixes: 3b87709c76
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2209853
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
It's almost like we've anticipated this. Our XML parser and
formatter handles @address and @dev attributes of <portForward/>
element completely independent of each other. And as of commit
2023_03_29.b10b983~3 passt allows handling these two separately
too. All that's left is generate the cmd line according to this
new fact.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2210287
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We allow (some) domain devices to have a different <seclabel/>
than the top level domain one (this is mostly to allow access to
a resource for multiple domains). Now, we do couple of sanity
checks for such <seclabel/>, e.g. when the <label/> is specified,
but '@relabel' is set to no. But what we are missing is the
opposite: when '@relabel' is set, but no <label/> was provided.
Our schema already denies such combination. Make our parser
behave the same.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2160356
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In virNodeDeviceGetSCSIHostCaps, there is a pattern of reusing
a tmp value and stealing the pointer.
But in two case it is not stolen. Use separate variables for them
to avoid mixing autofree with manual free() calls.
Fixes: 8a0cb5f73a
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This is fairly trivial. Just set .memaddr attribute if a value
was set in the XML.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2180679
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After a QEMU domain is started, among other thing we query memory
device information. And while memory address is returned by QEMU
for all models, we store it only for DIMMs and NVDIMMs. Do store
it for VIRTIO_MEM and VIRTIO_PMEM too.
This effectively reports the address the virtio-mem/virtio-pmem
is mapped to in live XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Both virtio-mem and virtio-pmem devices have '.memaddr' attribute
which controls the address where they are mapped in the guest
memory. Ideally, users do not need to specify this as QEMU does
the right thing and computes addresses automatically on startup.
But soon, we will need to record this address as it is part of
guest ABI. And also, there might be some users that want to
control this value. Now, we are in a bit of a pickle, because
both these device types already have a PCI address, therefore we
can't just use <address/> blindly. But what we can do, is
introduce <address/> under the <target/> element. This is also
more conceptual, as knobs under <target/> control guest visible
config of memory device (and .memaddr surely falls into that
category).
NB, SgxEPCDeviceInfo struct in QMP definition also has .memaddr
attribute, but because of the way we build cmd line there's no
(easy) way to set the attribute. So ignore that for now.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Due to missed break; statement the virDomainInputDefPostParse()
is called not only for VIR_DOMAIN_DEVICE_INPUT but also
VIR_DOMAIN_DEVICE_LEASE and VIR_DOMAIN_DEVICE_NET, which can lead
to all sort of unpredictable results.
Fixes: c4bc4d3b82
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This brings the tool's list of features in sync with qemu
commit 886c0453cbf10eebd42a9ccf89c3e46eb389c357.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Introduced in QEMU by commit v8.0.0-7eb061b06e.
Signed-off-by: Lin Yang <lin.a.yang@intel.com>
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Tim Wiederhake <twiederh@redhat.com>
Instead of updating defined mdevs only add another update for active
devices as well to cover transient mdev devices as well.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143158
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Again, this fixes the same problem as one of previous commits,
but this time for memory hotplug. Long story short, if there's a
domain running and the emulator thread is restricted to a subset
of host NUMA nodes, but the memory that's about to be hotplugged
requires memory from a host NUMA node that's not in the set we
need to allow emulator thread to access the node, temporarily.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Consider a domain with two guest NUMA nodes and the following
<numatune/> setting :
<numatune>
<memory mode="strict" nodeset="0"/>
<memnode cellid="0" mode="strict" nodeset="1"/>
</numatune>
What this means is the emulator thread is pinned onto host NUMA
node #0 (by setting corresponding cpuset.mems to "0"), and two
memory-backend-* objects are created:
-object '{"qom-type":"memory-backend-ram","id":"ram-node0", .., "host-nodes":[1],"policy":"bind"}' \
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
-object '{"qom-type":"memory-backend-ram","id":"ram-node1", .., "host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \
Note, the emulator thread is pinned well before QEMU is even
exec()-ed.
Now, the way memory allocation works in QEMU is: the emulator
thread calls mmap() followed by mbind() (which is sane, that's
how everybody should do it). BUT, because the thread is already
restricted by CGroups to just NUMA node #0, calling:
mbind(host-nodes:[1]); /* made up syntax (TM) */
fails. This is expected though. Kernel was instructed to place
the memory at NUMA node "0" and yet, process is trying to place
it elsewhere.
We used to solve this by not restricting emulator thread at all
initially, and only after it's done initializing (i.e. we got the
QMP greeting) we placed it onto desired nodes. But this had its
own problems (e.g. QEMU might have locked pieces of its memory
which were then unable to migrate onto different NUMA nodes).
Therefore, in v5.1.0-rc1~282 we've changed this and set cgroups
upfront (even before exec()-ing QEMU). And this used to work, but
something has changed (I can't really put my finger on it).
Therefore, for the initialization start the thread with union of
all configured host NUMA nodes ("0-1" in our example) and fix the
placement only after QEMU is started.
NB, the memory hotplug suffers the same problem, but that will
be fixed in the next commit.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Inside of qemuProcessSetupPid() there's @numatune variable which
is set to vm->def->numa, but it lives only in one block. In the
rest of places the expanded form (vm->def->numa) is used instead.
Move the variable declaration at the beginning of the function
and use it instead of the expanded form.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
We cannot use host-nodes attribute for it, but there is no reason for us
to skip the preallocation optimisation using thread-context in such
case. Thankfully returning the proper nodemask from
qemuBuildMemoryBackendProps is enough to trigger this.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The QEMU interface is still in a state of flux, and KVM support
has been pulled shortly after having been merged. Let's not
commit to a stable interface in libvirt just yet.
Reverts: 720e8f13ff
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The QEMU interface is still in a state of flux, and KVM support
has been pulled shortly after having been merged. Let's not
commit to a stable interface in libvirt just yet.
Reverts: 1347a19f75
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The QEMU interface is still in a state of flux, and KVM support
has been pulled shortly after having been merged. Let's not
commit to a stable interface in libvirt just yet.
Reverts: c6c9b5d251
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The QEMU interface is still in a state of flux, and KVM support
has been pulled shortly after having been merged. Let's not
commit to a stable interface in libvirt just yet.
Reverts: b10bc8f7ab
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
We already do check that if there's <memory mode='restrictive'/>
then all <memnode/> have to be of 'restrictive' mode too. But
what we are missing the reverse: if there is <memnode/> with
'restrictive' mode, then the <memory/> has to be of the same mode
too.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2208946
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When parsing a <memnode/> we also check whether the @mode
argument fulfills some requirements wrt 'restrictive' mode. This
is not the right place though. There's virDomainNumaDefValidate()
which contains other checks.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The virDomainNumatuneNodeSpecified() function does not write into
passed @numatune pointer, it just reads from it. Therefore, the
argument should be const, which allows this function to be called
from places where virDomainNuma is already const (e.g. domain
validation code).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Add new compress methods zlib and zstd for parallel migration,
these method should be used with migration option --comp-methods
and will be processed in 'qemuMigrationParamsSetCompression'.
Note that only one compress method could be chosen for parallel
migration and they cann't be used in compress migration.
Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The parser makes the values mandatory and also the qemu code implements
actions for those values. The formatter skips them though. Since
format+parse is used to copy the XML at startup a definition with those
values can't be started.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2203709
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This is pretty trivial, just append "mte=on/off" to -machine
arguments.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The MTE feature is not supported by all QEMUs, only those with
QEMU_CAPS_MACHINE_VIRT_MTE capability.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The MTE feature (introduced in QEMU commit of v5.1.0-rc1~8^2~11)
is detectable via 'qom-list-properties' for 'virt' machine type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The Memory Tagging Extensions are hardware acceleration present
in some ARM processors that allow memory error detection [1].
Introduce a domain XML knob that turns them on or off.
1: https://www.arm.com/blogs/blueprint/memory-safety-arm-memory-tagging-extension
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanup, there's not a single caller that would
call qemuDomainGetMemLockLimitBytes() with @forceVFIO set. All
callers pass false.
Drop the unneeded argument from the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanup, there's not a single caller that would
call qemuDomainAdjustMaxMemLock() with @forceVFIO set. All callers
pass false.
Drop the unneeded argument from the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
During hotplug of a NVMe disk we need to adjust the memlock
limit. The computation of the limit is handled by
qemuDomainGetMemLockLimitBytes() which looks at given domain
definition and accounts for various device types (as different
types require different amounts). But during disk hotplug the
disk is not added to domain definition until the very last
moment. Therefore, qemuDomainGetMemLockLimitBytes() has this
@forceVFIO argument which tells it to assume VFIO even if there
are no signs of VFIO in domain definition. And this kind of
works, until the amount needed for NVMe disks changed (in
v9.3.0-rc1~52). What's missing in the commit is making @forceVFIO
behave the same as if there was an NVMe disk present in the
domain definition.
But, we can do even better - just mimic whatever we're doing for
hostdevs. IOW - introduce qemuDomainAdjustMaxMemLockNVMe() that
behaves the same as qemuDomainAdjustMaxMemLockHostdev().
There are subtle differences though:
1) qemuDomainAdjustMaxMemLockHostdev() can afford placing hostdev
right at the end of vm->def->hostdevs, because the array was
already reallocated (at the beginning of
qemuDomainAttachHostPCIDevice()). But
qemuDomainAdjustMaxMemLockNVMe() doesn't have that luxury.
2) qemuDomainAdjustMaxMemLockHostdev() places a
virDomainHostdevDef pointer into domain definition, while
qemuDomainStorageSourceAccessModifyNVMe() (which calls
qemuDomainAdjustMaxMemLock()) sees a virStorageSource pointer
but domain definition contains virDomainDiskDef. But that's
okay, we can create a dummy disk definition and append it into
the domain definition.
After this, qemuDomainAdjustMaxMemLock() can be called with
@forceVFIO = false, as the disk is now part of domain definition
(when computing the new limit).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2014030#c28
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
One of our examples in the 'formatbackup.rst' page shows following
config:
<disk name='vda' backup='yes'/>
The schema didn't allow it though. Fix the schema as the internals were
supposed to support it (except for the bug fixed in previous patches).
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If the 'disk->store' property is already allocated which happens e.g.
when the disk is described by the backup XML but the optional filename
is not filled in 'virDomainBackupDefAssignStore' would not fill in the
default location.
Fix the logic to do it also if a 'virStorageSource' categorizes as
empty.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reflect the new default value, and explain that a runtime
lookup will be performed if the value is not an absolute path.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Now that we're performing the lookup at runtime, doing it at
build time is no longer necessary.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Don't bother looking at /usr/libexec, since every distro
ships dbus-daemon in $PATH.
Note that it's still possible for the administrator to prevent
this lookup and use an arbitrary binary by setting the
appropriate key in qemu.conf.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reflect the new default value, and explain that a runtime
lookup will be performed if the value is not an absolute path.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Now that we're performing the lookup at runtime, doing it at
build time is no longer necessary.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Use the recently introduced virFindFileInPathFull() function to
discover the path for qemu-bridge-helper and qemu-pr-helper at
runtime.
Note that it's still possible for the administrator to prevent
this lookup and use arbitrary binaries by setting the
appropriate keys in qemu.conf: this simply removes the need to
perform the lookup at build time, and thus to have the helpers
installed in the build environment.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The virfirewalld.h file provides a declaration for
virFirewallDApplyRule() which accepts an argument of type
virFirewallLayer. But the typedef lives in virfirewall.h and thus
including just virfirewalld.h is not sufficient.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Allow users controlling the multi-channel mode by adding a
'multichannel' property parsed for USB audio devices and wire up the
support in the qemu driver.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/472
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When QEMU closes the monitor suddenly, the following error
message is reported:
internal error: qemu unexpectedly closed the monitor: ...
And this works. But other error messages produced in the same
function include domain name too. Do that for the unexpectedly
closed monitor message too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
For all our daemons, we provide VIRXXXD_ARGS env var in the unit
file. The variable can then be overridden in corresponding file:
EnvironmentFile=-@initconfdir@/virtxxxd
The daemon is then executed as:
ExecStart=@sbindir@/virtxxxd $VIRTXXXD_ARGS
But virtlogd is exception, for no good reason. And while there
are probably no arguments we want to pass to virtlogd by default,
just mimic what we do for say virtlockd, where we also don't pass
any default argument.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The regular VM startup code first calls the setup of the disk backing
chain as defined in the XML and then calls the function to load the
rest of the backing chain from the image metadata. The hotplug code
did it the other way around, thus causing a failure when attempting
to attach a QCOW2 image via FD passing.
Reorder the hotplug code to have the same order.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2193315
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that a version of GLib that contains the fix has been
released, it's more useful to record that information. Adding
a TODO annotation makes the whole thing easily greppable.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The spelling is slightly different from another otherwise
identical error message in the same file.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
At this moment it is not possible to launch a 'riscv64' domain if a CPU
definition is presented in the domain. For example, adding this CPU
definition:
<cpu mode='custom' match='exact' check='none'>
<model fallback='forbid'>rv64</model>
</cpu>
Will trigger the following error:
$ sudo ./run tools/virsh start riscv-virt1
error: Failed to start domain 'riscv-virt1'
error: this function is not supported by the connection driver:
cannot update guest CPU for riscv64 architecture
The error comes from virCPUUpdate(), via qemuProcessUpdateGuestCPU(),
and it's caused by the absence of the 'update' API in the existing
RISC-V driver.
Add an 'update' API impl to the RISC-V driver to allow for CPU
definitions to be declared in RISC-V domains. This API was copied from
the ARM driver (virCPUarmUpdate()) since it's a good enough
implementation to get us going.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Implement the support for the persisted poll parameters and remove
restrictions on saving config when modifying them during runtime.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Currently we allow configuring the 'poll-max-ns', 'poll-grow', and
'poll-shrink' parameters of qemu iothreads only during runtime and they
are not persisted. Add XML machinery to persist them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Convert the internal types to unsigned long long. Luckily we can also
covert the external types too:
- 'qemuDomainSetIOThreadParams' can accept both _UINT and _ULLONG by
converting to 'virTypedParamsGetUnsigned'
- querying is handled via the bulk stats API which is flexible:
- we use virTypedParamListAddUnsigned to use the bigger type only if
necessary
- most users don't even notice because the bindings abstract the
data types
Apart from the code modifications we also improve the documentation
which was missing for the setters.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU accepts even values bigger than INT_MAX. The reasoning for these
checks was that the QAPI definition declares them as 'int', but in QAPI
terms that's any number as it's JSON.
Remove the validation as well as the comment misinterpreting the QAPI
definiton.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For certain typed parameters we want to extend the supproted range by
switching to VIR_TYPED_PARAM_ULLONG. To preserve compatibility we've
added APIs such as 'virTypedParamsGetUnsigned' and
'virTypedParamListAddUnsigned' which automatically select the bigger
type if necessary.
This patch adds a new internal macro VIR_TYPED_PARAM_UNSIGNED which
is used with virTypedParamsValidate to allow both types and adjusts the
code to handle it properly.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use automatic memory cleanup for the 'keys' and 'sorted' helpers and
remove the 'cleanup' label. Since this patch is modifying variable
declarations ensure that all declarations conform with our coding style.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add an internal helper for fetching a typed parameter which can be
either of the '_UINT' or '_ULONG' type and store it in a unsigned long
long variable.
Since this is an internal helper it offers less protections against
invalid use compared to those we expose as public API.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The new helper adds a unsigned value, stored as _UINT if it fits into
the type and stored as _ULLONG otherwise.
This is useful for the statistics code which is quite tolerant to
changes in type in cases when we'll need more range for the value.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function now return always 0. Refactor the code and remove return
values.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The only non-abort()-ing error which can happen is if the field name is
too long. Store the overly long name in the virTypedParamList container
so that in upcoming patches the helpers adding to the list can be
refactored to not have a return value.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Return the number of parameters via pointer passed as argument to free
up possibility to report errors. Strangely all callers actually use
'int' as type for storing the count of elements, thus this function will
use the same.
The function is also renamed to virTypedParamListSteal.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Introduce a helper that fetches the typed parameters from the list while
still preserving ownership of the pointer by the list.
In the future this will be also able to report errors stored in the
list.
Signed-off-by: Peter Krempa <pkrempa@redhat.com
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The struct will be made private in upcoming patches. Construct the list
of block entries into a separate list and append them rather than
remember the index of the count element.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Introduce a helper function to concatenate two virTypedParamLists. This
will allow us to refactor qemuDomainGetStatsBlock to not access the list
directly.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Add an allocator function and refactor all allocations to use it. In
upcoming patches 'struct _virTypedParamList' will be made private.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The header uses both styles randomly, switch it to the contemporary
style.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Don't check the return value of 'virTypedParamListExtend' which will
always be a valid pointer and 'virTypedParameterAssignValue' always
returns 0.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
There are two callers of virTypedParameterAssignValueVArgs.
- 'virTypedParameterAssignValue' always uses the correct type, thus
doesn't need to be modified. Just use the proper type in the function
declaration
- 'virTypedParameterAssign' can get improper type, but we can move the
validation into it decreasing the scope in which failures need to be
propagated.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
All changed lines even fit into 80 columns.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Ensure that all switch statements in this module use the proper type in
switch() statements to ensure complier protections.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
All callers pass 'true'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Repeatedly querying an SR-IOV PCI device's capabilities exposes a
memory leak caused by a failure to free the virPCIVirtualFunction
array within the parent struct's g_autoptr cleanup.
Valgrind output after getting a single interface's XML description
1000 times:
==325982== 256,000 bytes in 1,000 blocks are definitely lost in loss record 2,634 of 2,635
==325982== at 0x4C3C096: realloc (vg_replace_malloc.c:1437)
==325982== by 0x59D952D: g_realloc (in /usr/lib64/libglib-2.0.so.0.5600.4)
==325982== by 0x4EE1F52: virReallocN (viralloc.c:52)
==325982== by 0x4EE1FB7: virExpandN (viralloc.c:78)
==325982== by 0x4EE219A: virInsertElementInternal (viralloc.c:183)
==325982== by 0x4EE23B2: virAppendElement (viralloc.c:288)
==325982== by 0x4F65D85: virPCIGetVirtualFunctionsFull (virpci.c:2389)
==325982== by 0x4F65753: virPCIGetVirtualFunctions (virpci.c:2256)
==325982== by 0x505CB75: virNodeDeviceGetPCISRIOVCaps (node_device_conf.c:2969)
==325982== by 0x505D181: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3099)
==325982== by 0x505BC4E: virNodeDeviceUpdateCaps (node_device_conf.c:2677)
==325982== by 0x260FCBB2: nodeDeviceGetXMLDesc (node_device_driver.c:355)
Signed-off-by: Tim Shearer <tshearer@adva.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
After previous cleanups, qemuHostdevPreparePCIDevices() no longer
needs virQEMUCaps. Drop its passing from callers.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanup, there are some functions that do nothing:
qemuConnectDomainXMLToNativePrepareHostHostdev()
qemuConnectDomainXMLToNativePrepareHost()
qemuProcessPrepareHostHostdev()
qemuProcessPrepareHostHostdevs()
Remove them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When preparing a SCSI <hostdev/> with passthrough of a host SCSI
adapter (i.e. no protocol), a virStorageSource structure is
initialized and stored inside virDomainHostdevDef. But the source
structure is filled in many places, with almost the same code.
Firstly, qemuProcessPrepareHostHostdev() and
qemuConnectDomainXMLToNativePrepareHostHostdev() are the same.
Secondly, qemuDomainPrepareHostdev() allocates the src structure,
only to let qemuProcessPrepareHostHostdev() fill src->path later.
Well, src->path can be filled at the same place where the src
structure is allocated (qemuDomainPrepareHostdev()) which renders
the other two functions needless.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
There is no way the qemuDomainAttachHostPCIDevice() function can
be called over a hostdev with PCI backend other than VFIO. And
even if it were, then the check is written so poorly that it lets
some types through (e.g. KVM) only to let
qemuBuildPCIHostdevDevProps() called afterwards fail properly.
Drop this check and rely on qemuDomainPrepareHostdevPCI() (and
worst case scenario even qemuBuildPCIHostdevDevProps()) to report
the proper error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
We used to support KVM and VFIO style of PCI assignment. The
former was dropped in v5.7.0-rc1~103 and thus we only support
VFIO. All other backends lead to an error (see
qemuBuildPCIHostdevDevProps(), or qemuBuildPCIHostdevDevStr() as
it used to be called in the era of aforementioned commit).
Might as well report the error in prepare phase and save hassle
of proceeding with device preparation (e.g. in case of hotplug
overriding the device's driver, setting seclabels, etc.).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
virsh command domxml-to-native failed with below error but start
command succeed for same domain xml.
"internal error: invalid PCI passthrough type 'default'"
If a <hostdev> PCI backend is not set in the XML, the supported
one is then chosen in qemuHostdevPreparePCIDevicesCheckSupport().
But this function is not called anywhere from
qemuConnectDomainXMLToNative(). But qemuDomainPrepareHostdev()
is. And it is also called from domain startup/hotplug code.
Therefore, move the backend setting to the common path and drop
qemuHostdevPreparePCIDevicesCheckSupport().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
So far, qemuDomainPrepareHostdev() is a NOP for anything but a
SCSI hostdev. This will change soon. Therefore, move the SCSI
hostdev preparation into a separate function
(qemuDomainPrepareHostdevSCSI()) and make
qemuDomainPrepareHostdev() call function corresponding to the
hostdev type (or nothing if the type doesn't need any
preparation).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When attaching a hostdev of a SCSI subsys,
qemuDomainPrepareHostdev() is called. This makes sense because
the function prepares just SCSI hostdevs ignoring others. But
this will soon change. Thefore, move the function call out of
qemuDomainAttachHostSCSIDevice() and into
qemuDomainAttachHostDevice().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Treat:
<maxphysaddr mode="emulate"/>
as a request not to take the maximum address size from the host.
This is useful if QEMU changes the default.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
In a few places, where a capabilities <hostdev/> is processed, a
wrong union member is access: def->source.subsys.type instead of
def->source.caps.type. Fortunately, both union members have .type
as the very first member so no real harm is done. Nevertheless,
we should access the correct union member.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Usually, we want a function to be as reusable as possible. But in
this specific case, when it's used just once we don't need that.
The lxcCreateHostdevDef() function is meant to create a hostdev.
The first argument selects the hostdev mode (caps/subsys) and the
second argument selects the type of hostdev (NET/STORAGE/MISC).
But because of how the function is written, it's impossible to
create a subsys hostdev as the function sets
hostdev->source.caps.type, regardless of mode. So the @mode
argument can be dropped.
Then, the function is called from one place and one place only.
And in there, VIR_DOMAIN_HOSTDEV_CAPS_TYPE_NET is passed for
@type so we can drop that argument too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
After previous cleanups a lot of functions from qemu_hotplug.c
are called only within the file. Make them static and drop their
declarations from the header file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There is no good reason for qemuDomainUpdateDeviceLive() to live
in (ever growing) qemu_driver.c while we have qemu_hotplug.c
which already contains the rest of hotplug code. Move the
function to its new home.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
There is no good reason for qemuDomainAttachDeviceLive() to live
in (ever growing) qemu_driver.c while we have qemu_hotplug.c
which already contains the rest of hotplug code. Move the
function to its new home.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
The qemuDomainUpdateDeviceLive() accepts virDomainPtr as one of
its arguments, but use it only to get QEMU driver out of it.
Well, the only caller already does that and thus can pass it
instead of virDomainPtr.
This also makes it look like the rest of device hot(un-)plug
functions: qemuDomainAttachDeviceLive() and
qemuDomainUpdateDeviceLive().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
A new enum type "Default" has been added for Input bus.
The logic that handled default input bus types in
virDomainInputParseXML() has been moved to a new function
virDomainInputDefPostParse() in domain_postparse.c
Link to Issue: https://gitlab.com/libvirt/libvirt/-/issues/8
Signed-off-by: K Shiva <shiva_kr@riseup.net>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The disk private data contain information about the tray and
removability of the disk. Until recently we didn't support hotplug of
removable disks thus it wasn't a problem but now when you can hotplug a
CDROM you would not be able to open its tray.
Fix it by updating the hotplugged disk the same way we do at startup.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2160435
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Extract the logic to update one single disk (without emitting any
events) so that it can be reused when updating the state after a disk
hotplug.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The code compares the 'tray_open' boolean from 'struct
qemuDomainDiskInfo' directly against 'disk->tray_status' which is
declared as virDomainDiskTray (enum). Now the logic works correctly
because the _OPEN enum has value '1'.
Separate the event emission code from the update code and remember the
old tray state in a separate variable rather than having the sneaky
logic we have today.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We exit early if poolOptions->formatToString is false.
Fixes: 9dadc73029
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
In our meson scripts, we use configure_file(copy:true) to copy
files from srcdir into builddir. However, as of meson-0.64.0,
this is deprecated [1] in favor of using:
fs = import('fs')
fs.copyfile(in, out)
Except, the submodule's new method wasn't introduced until
0.64.0. And since we can't bump the minimal meson version we
require, we have to work with both: new and old versions.
Now, the fun part: fs.copyfile() is not a drop in replacement as
it returns different type (a custom_target object). This is
incompatible with places where we store the configure_file()
retval in a variable to process it further.
While we could just replace 'copy:true' with a dummy
'configuration:...' (say 'configuration: configmake_conf') we
can't do that for binary files (like src/fonts/ or src/images/).
Therefore, places where we are not interested in the retval can
be switched to fs.copyfile() and places where we are interested
in the retval will just use a dummy 'configuration:'.
Except, src/network/meson.build. In here we not just copy the
file but also specify alternative install dir and that's not
something that fs.copyfile() can handle. Yet, using 'copy: true'
is viewed wrong [2].
1: https://mesonbuild.com/Release-notes-for-0-64-0.html#fscopyfile-to-replace-configure_filecopy-true
2: https://github.com/mesonbuild/meson/pull/10042
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
With cgroupv2 this has better effect on the resource allocation. An
excerpt from Documentation/admin-guide/cgroup-v2.rst explains is this
way:
Migrating a process across cgroups is a relatively expensive operation
and stateful resources such as memory are not moved together with the
process. This is an explicit design decision as there often exist
inherent trade-offs between migration and various hot paths in terms
of synchronization cost.
[...]
Setting a non-empty value to "cpuset.mems" causes memory of
tasks within the cgroup to be migrated to the designated nodes if
they are currently using memory outside of the designated nodes.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Most of them are platform devices and only i6300esb can be plugged
multiple times into different PCI slots.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This makes it also work during attach. Also add a test for attaching a
watchdog with incompatible action.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2187278
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The loop initially skipped the first one because it was mainly checking
the incompatible actions, but was then modified to also check the
duplicity of iTCO watchdogs.
While at it change the type of the iteration variable to the usual size_t.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2187133
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We can launch qemu with it, but it will not work since it's not even
probed by the kernel at the mapped address with different machine types
since they are expected to be connected to ISA and not even its newer
LPC counterpart found on q35. And it does not exist on non-x86
architectures.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When starting QEMU, or when hotplugging a PCI device QEMU might
lock some memory. How much? Well, that's an undecidable problem.
But despite that, we try to guess. And it more or less works,
until there's a counter example. This time, it's a guest with
both <hostdev/> and an NVMe <disk/>. I've started a simple guest
with 4GiB of memory:
# virsh dominfo fedora
Max memory: 4194304 KiB
Used memory: 4194304 KiB
And here are the amounts of memory that QEMU tried to lock,
obtained via:
grep VmLck /proc/$(pgrep qemu-kvm)/status
1) with just one <hostdev/>
VmLck: 4194308 kB
2) with just one NVMe <disk/>
VmLck: 4328544 kB
3) with one <hostdev/> and one NVMe <disk/>
VmLck: 8522852 kB
Now, what's surprising is case 2) where the locked memory exceeds
the VM memory. It almost resembles VDPA. Therefore, treat is as
such.
Unfortunately, I don't have a box with two or more spare NVMe-s
so I can't tell for sure. But setting limit too tight means QEMU
refuses to start.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2014030
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
This is a relic of commit v3.7.0-rc1~132 when getter/setter APIs
for dnsmasq's PID were introduced. Previously, obj->dnsmasqPid
was accessed directly. But the aforementioned commit introduced
two calls to virNetworkObjGetDnsmasqPid() even though the result
of the first call is stored in a variable.
Remove the second call as it's unnecessary.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Throughout all of our network driver code we assume that
dnsmasqPid of value -1 means the network has no dnsmasq process
running. There are plenty of calls to:
virNetworkObjSetDnsmasqPid(obj, -1);
or:
pid_t dnsmasqPid = virNetworkObjGetDnsmasqPid(obj);
if (dnsmasqPid > 0) ...;
Now, a virNetworkObj is created via virNetworkObjNew() which
might as well set this de-facto default value.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Assume there's a dnsmasq running (because there's an active
virtual network that spawned it). Now, shut down the daemon,
remove the dnsmasq binary and start the daemon again. At this
point, networkUpdateState() is called, but dnsmasq_caps is NULL
(because networkStateInitialize() called earlier failed to set
them, rightfully though).
Now, the networkUpdateState() tries to read the dnsmasq's PID
file using virPidFileReadIfAlive() which takes a path to the
corresponding binary as one of its arguments. To provide that
path, dnsmasqCapsGetBinaryPath() is called, but since
dnsmasq_caps is NULL, it dereferences it and thus causes a crash.
It's true that virPidFileReadIfAlive() can deal with a removed
binary (well virPidFileReadPathIfAlive() which it calls can), but
iff the binary path is provided in its absolute form. Otherwise,
virFileResolveAllLinks() fails to canonicalize the path
(expected, the path doesn't exist anyway).
Therefore, reading dnsmasq's PID file didn't work before
v8.1.0-rc1~401 which introduced this crash. It was always set to
-1. But passing NULL as binary path instead, makes
virPidFileReadIfAlive() return early, right after the PID file is
read and it's confirmed the PID exists.
Yes, this may yield wrong results, as the PID might be of a
completely different binary. But this problem is preexistent and
until we start locking PID files, there's nothing we can do about
it. IOW, it would require rework of dnsmasq PID file handling.
Fixes: 4b68c982e2
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/456
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
It's quite difficult, if not impossible, to create a working RISC-V VMs
using the current default machine type of 'spike_v1.10'. Change the
default to the more appropriate and virtualization friendly 'virt'
machine type.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
It's quite difficult, if not impossible, to create a usable ARM VMs
using the current default machine type of 'integratorcp'. Change the
default to the more appropriate and virtualization friendly 'virt'
machine type.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
I've tried, then I've tried even harder, but still wasn't able to
make sense of our console backcompat code in all its fine
details. Since I value my sanity, let's just forbid hotunplug of
<console/>, especially since detaching of corresponding <serial/>
works.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When cleaning up after removed device, qemuDomainChrRemove() is
called. But this may fail, in which case we successfully ignore
the failure and virDomainChrDefFree() the device anyway. While it
decreases our memory consumption, it's a bit too far, especially
if the next step is 'virsh dumpxml'. Then our memory consumption
decreases all the way down to zero as we crash.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For a running guest, a <serial/> device can be hotunplugged. This
will then remove also aliased <console/>. Trying to hotplug a
<console/> device then, libvirtd crashed because it dereferences
def->consoles while there's none.
Fixes: 42d53ac799
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When removing the compat console from domain defintion, removing
it from the vmdef->consoles array is good, but not sufficient.
The console definition might have been fully allocated (after
daemon restarted and reloaded the status XML). Use
virDomainChrDefFree() to free also the definition.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When hotpluging a <serial/> device, we might need to add a
<console/> device with it (because of some crazy backcompat).
Now, hotplugging is done in several phases. In one of them,
qemuDomainChrPreInsert() allocates space for both devices, and
then qemuDomainChrInsertPreAlloced() actually inserts the device
into domain definition and sets up the <console/> device with it.
Except, the condition that checks whether to create the aliased
<console/> is wrong as it compares nconsoles against 0.
Surprisingly, qemuDomainChrInsertPreAllocCleanup() doesn't suffer
from the same error.
Fixes: daf51be5f1
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
That's already the case in practice, but it's a better
experience for the user if we reject this configuration
outright instead of silently ignoring part of it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Follow better meson build system conventions. This allows to find
keymap-gen or CSV without explicitly setting the paths.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The function does not exist on win32.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
meson wraps python scripts already on win32, so we end up with these
failing commands:
[1/359] "C:/msys64/ucrt64/bin/meson" "--internal" "exe" "--capture" "src/util/virkeycodetable_atset1.h" "--" "sh" "C:/msys64/home/marca/src/libvirt/scripts/meson-python.sh" "C:/msys64/ucrt64/bin/python3.EXE" "python" "C:/msys64/home/marca/src/libvirt/src/keycodemapdb/tools/keymap-gen" "code-table" "--lang" "stdc" "--varname" "virKeyCodeTable_atset1" "C:/msys64/home/marca/src/libvirt/src/keycodemapdb/data/keymaps.csv" "atset1"
FAILED: src/util/virkeycodetable_atset1.h
"C:/msys64/ucrt64/bin/meson" "--internal" "exe" "--capture" "src/util/virkeycodetable_atset1.h" "--" "sh" "C:/msys64/home/marca/src/libvirt/scripts/meson-python.sh" "C:/msys64/ucrt64/bin/python3.EXE" "python" "C:/msys64/home/marca/src/libvirt/src/keycodemapdb/tools/keymap-gen" "code-table" "--lang" "stdc" "--varname" "virKeyCodeTable_atset1" "C:/msys64/home/marca/src/libvirt/src/keycodemapdb/data/keymaps.csv" "atset1"
If LC_ALL, LANG and LC_CTYPE need to be set, it would probably be better
to use a meson environment() instead.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>