There are some cases left after previous commit which were not
picked up by coccinelle. Mostly, becuase the spatch was not
generic enough. We are left with cases like: two variables
declared on one line, a variable declared in #ifdef-s (there are
notoriously difficult for coccinelle), arrays, macro definitions,
etc.
Finish what coccinelle started, by hand.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
This is a more concise approach and guarantees there is
no time window where the struct is uninitialized.
Generated using the following semantic patch:
@@
type T;
identifier X;
@@
- T X;
+ T X = { 0 };
... when exists
(
- memset(&X, 0, sizeof(X));
|
- memset(&X, 0, sizeof(T));
)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
When a VSERPORT_CHANGE event is processed, we firstly do a little
detour and try to detect whether the event is coming from guest
agent. If so, we notify threads that are currently talking to the
agent about this fact. Then we proceed with usual event
processing (BeginJob(), update domain def, emit event, and so
on).
In both cases we use the same @dev variable to refer to domain
device. While this works, it will make writing semantic patch
unnecessary harder (see next commit(s)). Therefore, introduce a
separate variable for the detour code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
There are couple of variables that are declared at function
beginning but then used solely within a block (either for() loop
or if() statement). And just before their use they are zeroed
explicitly using memset(). Decrease their scope, use struct zero
initializer and drop explicit memset().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
When I implemented passt support in libvirt, I saw the --mac-addr
option on the passt commandline, immediately assumed that this was
used for setting the guest interface's mac address somewhere within
passt, and read no further. As a result, "--mac-addr" is always added
to the passt commandline, specifying the setting from <mac
addr='blah'/> in the guest's interface config.
But as pointed out in this bugzilla comment:
https://bugzilla.redhat.com/2184967#c8
That is *not at all* what passt's --mac-addr option does. Instead, it
is used to force the *remote* mac address for incoming traffic to a
specific value. So setting --mac-addr results in all traffic on the
interface having the same (the guest's) mac address for both source
and destination in all traffic. Surprisingly, this still works, so
nobody noticed it during testing.
The proper thing is to not specify any mac address to passt - the
remote MAC addresses can and should remain untouched, and the local
MAC address will end up being known to passt just by the guest sending
out packets with that MAC address.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
This reverts commit 8511b96a31.
Turns out, we need to do a bit more than just plain
qemuSecurityDomainSetPathLabel() which sets svirt_image_t. Passt
has its own SELinux policy and as a part of that they invent
passt_log_t for log files. Right now, I don't know how libvirt
could query that and even if I did, passt SELinux policy would
need to permit relabelling from svirt_t to passt_log_t, which it
doesn't [1].
Until these problems are addressed we shouldn't be pre-creating
the file as it puts users into way worse position - even
scenarios that used to work don't work. But then again - using
log file for passt is usually valuable for developers only and
not regular users.
1: https://bugzilla.redhat.com/show_bug.cgi?id=2209191#c10
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
This reverts commit 83686f1eea.
This is needed only so that the next revert is clean.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When automatically adding a NUMA node (qemuDomainDefNumaAutoAdd()) the
memory size of the node is computed as:
total_memory - sum(memory devices)
And we have a nice helper for that: virDomainDefGetMemoryInitial() so
it looks logical to just call it. Except, this code runs in post parse
callback, i.e. memory sizes were not validated and it may happen that
the sum is greater than the total memory. This would be caught by
virDomainDefPostParseMemory() but that runs only after driver specific
callbacks (i.e. after qemuDomainDefNumaAutoAdd()) and because the
domain config was changed and memory was increased to this huge
number no error is caught.
So let's do what virDomainDefGetMemoryInitial() would do, but
with error checking.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2216236
Fixes: f5d4f5c8ee
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
If a per-domain SWTPM state directory exists but is empty our
code still considers it a valid state and skips running
'swtpm_setup' (handled in qemuTPMEmulatorRunSetup()).
While we should not try to inspect individual files created by
swtpm, we can still consider empty folder as non-existent state.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/320
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that we don't use it for probing at all we can remove all the
corresponding monitor code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The capability code now probes the presence of commands from the QMP
schema instead of using 'query-commands'. Don't call the command and
adjust the '.replies' files.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Move the probing code to extract the data from the QMP schema rather
than invoking 'query-commands'. This patch doesn't yet remove the actual
invocation of 'query-commands', just moves the actual probing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The support for configuring the 'wwn' of a IDE disk was added in qemu
commit 95ebda85e09 (v1.0-1869-g95ebda85e0) and can't be compiled
out.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The support for configuring the 'wwn' of a SCSI disk was added in qemu
commit 27395add759ff4caeb0 (v1.0-3326-g27395add75) and can't be compiled
out.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
All backing chain members which were auto-added by image detection,
including the terminating element, should have the 'detected' property
set to true. This is needed to properly strip the detected elements in
some cases, e.g. for the status XML where we could treat some images as
manually terminated even when it was auto-detected.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rewrap argument definition of qemuDomainSaveInternal and align argument
in the invocation of the aforementioned function in
qemuDomainManagedSaveHelper.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
After the previous commit we no longer require that logind is actually
running, it merely has to be activatable.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Since systemd 240, all services get an open file hard limit of
500k, and a soft limit of 1024. This limit means apps are safe
to use select() by default which is limited to 1024 FDs. Apps
which don't use select() are expected to simply set their soft
limit to match the hard limit during startup.
With our current unit file settings we've been effectively
reducing the max open files we have on most modern systems.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
All services are ordered after local-fs.target unless they have set
DefaultDependencies=no, which we do not do.
https://gitlab.com/libvirt/libvirt/-/issues/489
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The virtio-gpu 'blob' support was insufficiently validated. Qemu
requires a memfd memory backing in order to use udmabuf and enable blob
support. Example error:
$ virsh start rhel9
error: Failed to start domain 'rhel9'
error: internal error: qemu unexpectedly closed the monitor: 2023-07-18T02:33:57.083178Z qemu-kvm: -device {"driver":"virtio-vga","id":"video0","max_outputs":1,"blob":true,"bus":"pcie.0","addr":"0x1"}: cannot enable blob resources without udmabuf
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Historically, the way to set PC speaker for a guest was to pass:
-soundhw pcspk
but as of QEMU commit v5.1.0-rc0~28^2~3 this is deprecated and we
should use:
-machine pcspk-audiodev=$id
instead. The old way was then removed in commit v7.1.0-rc0~99^2~3.
Now, ideally we would have a capability selecting whether we talk
to a QEMU that understands the new way or not. But it's not that
simple - the machine attribute is just an alias to the .audiodev=
attribute of 'isa-pcspk' object and both are created in
pc_machine_initfn() function, i.e. not then the PC_MACHINE() class
is initialized, but when it's instantiated. IOW, it's not possible
for us to query whether we're dealing with older or newer QEMU.
But given that the newer version is supported since v5.1.0 and the
minimal version we require is v4.2.0 (i.e. there are two releases
which don't understand the newer cmd line) and how frequently this
feature is (un-)used (the issue was reported after ~1 year since it
stopped working), I believe we can live without any capability and
just use the newer cmd line unconditionally.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/490
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Now that the QEMU_CAPS_USB_STORAGE_REMOVABLE capability is no
longer used we can stop querying it and retire it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Introduced in QEMU commit of v0.14.0-rc0~83^2~1 and not being
able to compile the .removable attribute of the "usb-storage"
object out, renders our corresponding capability
QEMU_CAPS_USB_STORAGE_REMOVABLE always set. Stop using it in
command generation / domain validation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit be1b7d5b18 introduced parsing /proc/cpuinfo for "address size"
which is not including on S390 and therefore reports an internal error.
Lets remove the parsing on S390.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add async-teardown to the features list in domain capabilities allowing
high level management to introspect the availability of the asynchronous
teardown feature.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Up until v2.11.0-rc2~19^2~3 QEMU used to require at least one
NUMA node to be configured when memory hotplug was enabled. After
that commit, QEMU automatically adds a NUMA node if none was
specified on the cmd line. Reflect this in domain XML, i.e.
explicitly add a NUMA node into our domain definition if needed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2216236
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
If a domain has NUMA configured, then all <memory/> devices
(except for 'virtio-pmem') need to have targetNode set. There are
two checks inside of qemuDomainDefValidateMemoryHotplugDevice()
for this: one inside of big switch() statement, which only checks
'dimm' and 'nvdimm' cases, and the other at the end of the
function that checks all models (except for 'virtio-pmem'). Let's
keep the latter and remove the former as the latter covers the
former too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Asynchronous teardown can be specified if the QEMU binary supports it by
adding in the domain XML
<features>
...
<async-teardown enabled='yes|no'/>
...
</features>
By default this new feature is disabled.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
QEMU capability is looking in query-command-line-options response for
...
{
"parameters": [
{
"name": "async-teardown",
"type": "boolean"
}
],
"option": "run-with"
}
...
allow to use the QEMU option -run-with async-teardown=on|off
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Allow //disk/target@removable for scsi disk devices, since QEMU has support
the removable attribute for scsi-hd device from v0.14.0[1].
[1]: 419e691f8e: scsi-disk: Allow overriding SCSI INQUIRY removable bit
Signed-off-by: Han Han <hhan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If VIR_ASYNC_JOB_NONE flag is present, job.current is equal
to NULL, which leads to SIGSEGV. Thus, this check should be
moved up.
Fixes: v8.0.0-427-gf304de0df6
Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
In one of my previous commits I've introduced @logfd variable
that was supposed to hold FD of passt logfile. But I've forgot to
assign the qemuDomainOpenFile() retval to it.
Fixes: 8511b96a31
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There are a few situations where passt itself is unable to create
a file because it runs under QEMU user (e.g. just like our
example from formatdomain.rst suggests: /var/log/passt.log). If
libvirtd runs with sufficient permissions (e.g. as root) it can
create the file and set seclabels on it so that passt can then
open it.
Ideally, we would just pass pre-opened FD, but this wasn't viewed
as secure enough [1]. So lets just create the file and set
seclabels.
For the case when both libvirtd and passt have the same
permissions, well then we fail before even needing to fork() and
exec().
1: https://archives.passt.top/passt-dev/20230606225836.63aecebe@elisabeth/
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2209191
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
New storage types are not implemented in generators for -drive and the
xen config. Explicitly reject them in case of a programming error.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When detaching a device, the following race condition may happen:
Once qemuDomainSignalDeviceRemoval() marks the device for
removal, it returns true, which means it is the caller
that marked the device for removal is going to remove the
device from domain definition.
But qemuDomainWaitForDeviceRemoval() may still receive
timeout from virDomainObjWaitUntil() which is implemented
by pthread_cond_timedwait() due to an unavoidable race
between the expiration of the timeout and the predicate
state(priv->unplug.alias) change.
And then qemuDomainWaitForDeviceRemoval() will return 0,
thus the caller will not remove the device from domain
definition.
In this situation, the device is still present in the domain
definition but doesn't exist in qemu anymore. Worse, there is
no way to remove it from the domain definition.
Solution is to recheck the value of priv->unplug.alias to
determine who is going to remove the device from domain
definition.
Signed-off-by: zuo boqun <zuoboqun@baidu.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Qemu 8.1.0 will add discard_no_unref option for qcow2 images.
When this option is enabled (default=false), then it will no longer
unreference clusters when guest does a discard, but it will just free
the blocks (useful for incremental backups for example) and pass the
discard to the lower layer.
This was implemented to avoid fragmentation within the qcow2 image.
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The qcow2 driver allows passing discards to the storage while keeping
the reference of the block, and just marking it as zeroed. This can
decrease the levels of fragmentation of the qcow2 metadata when
discards are enabled.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Memory slots are required only for DIMM-like devices, while other
devices defined via <memory> such as virtio-mem may use the PCI bus and
thus do not require/consume a memory slot.
Fix the validation code to calculate the required count of memory
devices only for DIMMs and NVDIMMs.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Specify the memory size by using '-m size=2048k' instead of just '-m 2'.
The new syntax is used when memory hotplug is enabled. To preserve
memory sizing, if memory hotplug is disabled the size is rounded down to
the nearest mebibyte.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The current implementation of virConnectBaselineHypervisorCPU in QEMU
driver can provide a CPU definition that will not work on all hosts in
case they have different maximum physical address size. So when we get
the info from domain capabilities, we need to choose the smallest
physical address size for the computed baseline CPU definition.
https://bugzilla.redhat.com/show_bug.cgi?id=2171860
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We already report the hosts physical address size in host capabilities,
but computing a baseline CPU definition is done from domain
capabilities.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It is used to fill an unsigned long long anyway and if it is negative
than there is really an issue somewhere.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The @unionMems argument of qemuProcessSetupPid() function is not
necessary really as all callers pass 'true'. Drop it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The unit that cpuset CGroups controller works with is a
thread/process, not individual memory allocations. Therefore,
after we've set cpuset.mems for emulator (after previous commit
it's set to union of all host NUMA nodes allowed for given
domain), and as we try to set up cpuset.mems for vCPUs/IOThreads,
memory is migrated to selected NUMA node(s). We are effectively
saying: "this thread (vCPU thread) can have memory only from
these NUMA node(s)".
That's not really what we want though. The cpuset controller
doesn't differentiate memory "belonging" to the emulator thread
and vCPU thread or IOThread even.
Therefore, set union of all allowed host NUMA nodes, just like
we're doing for the emulator thread.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
In ideal world, my plan was perfect. We allow union of all host
nodes in cpuset.mems and once QEMU has allocated its memory, we
'fix up' restriction of its emulator thread by writing the
original value we wanted to set all along. But in fact, we can't
do it because that triggers memory movement. For instance,
consider the following <numatune/>:
<numatune>
<memory mode="strict" nodeset="0"/>
<memnode cellid="1" mode="strict" nodeset="1"/>
</numatune>
<numa>
<cell id="0" cpus="0-1" memory="1024000" unit="KiB" />
<cell id="1" cpus="2-3" memory="1048576" unit="KiB"/>
</numa>
This is meant to create 1:1 mapping between guest and host NUMA
nodes. So we start QEMU with cpuset.mems set to "0-1" (so that it
can allocate memory even for guest node #1 and have the memory
come fro host node #1) and then, set cpuset.mems to "0" (because
that's where we wanted emulator thread to live).
But this in turn triggers movement of all memory (even the
allocated one) to host NUMA node #0. Therefore, we have to just
keep cpuset.mems untouched and rely on .host-nodes passed on the
QEMU cmd line.
The placement still suffers because of cpuset.mems set for vcpus
or iothreads, but that's fixed in next commit.
Fixes: 3ec6d586bc
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Every caller will pass 'qdevid' as it's populated in the data
mandatorily with qemu-4.2 and onwards due to mandatory -blockdev use.
Thus we can drop compatibility with the old way of matching the disk via
alias.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Every caller will pass 'qdevid' as it's populated in the data
mandatorily with qemu-4.2 and onwards due to mandatory -blockdev use.
Thus we can drop compatibility with the old way of matching the disk via
alias.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Historically this didn't work with any supported qemu version as we
don't set the alias of the device, and thus qemu uses a different alias
resulting in a failure to startup the VM:
internal error: unable to execute QEMU command 'block_set_io_throttle': Device 'drive-sd-disk0' not found
Refuse setting throttling as this is unlikely to be needed and proper
fix requires using -device instead of -drive if=sd.
Note that this was broken when I moved the setup of throttling as a
command at startup for blockdev integration quite a while ago. Until
then throttling was passed as arguments for -drive.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function doesn't modify it. Fix the argument declaration so that the
function can be used in a context where we have a 'const' disk
definition.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When starting a domain, it's done so in two steps (actually more,
but lets focus on just the following two):
1) qemuProcessPrepareDomain(), followed by
2) qemuProcessPrepareHost().
Now, in the first step (PrepareDomain()), PCI backends for all
hostdevs is set (qemuProcessPrepareDomain() ->
qemuProcessPrepareDomainHostdevs() -> qemuDomainPrepareHostdev()
-> qemuDomainPrepareHostdevPCI()). Perfect.
But then, additional hostdevs may appear, because in the host
prepare phase we may insert some hostdevs into domain definition
(qemuProcessPrepareHost() -> qemuProcessNetworkPrepareDevices()).
Now, these additional hostdevs don't undergo the same prepare as
hostdevs that were already present in the domain definition (i.e.
in qemuProcessPrepareDomain() phase). Therefore, we have to call
corresponding prepare function explicitly.
NB, the interface hotplug code (qemuDomainAttachNetDevice()) does
not suffer from this problem, because it calls top level
qemuDomainAttachHostDevice() which is used to hotplug regular
hostdevs too and as such calls qemuDomainPrepareHostdev().
Fixes: 3b87709c76
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2209853
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
It's almost like we've anticipated this. Our XML parser and
formatter handles @address and @dev attributes of <portForward/>
element completely independent of each other. And as of commit
2023_03_29.b10b983~3 passt allows handling these two separately
too. All that's left is generate the cmd line according to this
new fact.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2210287
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This is fairly trivial. Just set .memaddr attribute if a value
was set in the XML.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2180679
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After a QEMU domain is started, among other thing we query memory
device information. And while memory address is returned by QEMU
for all models, we store it only for DIMMs and NVDIMMs. Do store
it for VIRTIO_MEM and VIRTIO_PMEM too.
This effectively reports the address the virtio-mem/virtio-pmem
is mapped to in live XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Again, this fixes the same problem as one of previous commits,
but this time for memory hotplug. Long story short, if there's a
domain running and the emulator thread is restricted to a subset
of host NUMA nodes, but the memory that's about to be hotplugged
requires memory from a host NUMA node that's not in the set we
need to allow emulator thread to access the node, temporarily.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Consider a domain with two guest NUMA nodes and the following
<numatune/> setting :
<numatune>
<memory mode="strict" nodeset="0"/>
<memnode cellid="0" mode="strict" nodeset="1"/>
</numatune>
What this means is the emulator thread is pinned onto host NUMA
node #0 (by setting corresponding cpuset.mems to "0"), and two
memory-backend-* objects are created:
-object '{"qom-type":"memory-backend-ram","id":"ram-node0", .., "host-nodes":[1],"policy":"bind"}' \
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
-object '{"qom-type":"memory-backend-ram","id":"ram-node1", .., "host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \
Note, the emulator thread is pinned well before QEMU is even
exec()-ed.
Now, the way memory allocation works in QEMU is: the emulator
thread calls mmap() followed by mbind() (which is sane, that's
how everybody should do it). BUT, because the thread is already
restricted by CGroups to just NUMA node #0, calling:
mbind(host-nodes:[1]); /* made up syntax (TM) */
fails. This is expected though. Kernel was instructed to place
the memory at NUMA node "0" and yet, process is trying to place
it elsewhere.
We used to solve this by not restricting emulator thread at all
initially, and only after it's done initializing (i.e. we got the
QMP greeting) we placed it onto desired nodes. But this had its
own problems (e.g. QEMU might have locked pieces of its memory
which were then unable to migrate onto different NUMA nodes).
Therefore, in v5.1.0-rc1~282 we've changed this and set cgroups
upfront (even before exec()-ing QEMU). And this used to work, but
something has changed (I can't really put my finger on it).
Therefore, for the initialization start the thread with union of
all configured host NUMA nodes ("0-1" in our example) and fix the
placement only after QEMU is started.
NB, the memory hotplug suffers the same problem, but that will
be fixed in the next commit.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Inside of qemuProcessSetupPid() there's @numatune variable which
is set to vm->def->numa, but it lives only in one block. In the
rest of places the expanded form (vm->def->numa) is used instead.
Move the variable declaration at the beginning of the function
and use it instead of the expanded form.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
We cannot use host-nodes attribute for it, but there is no reason for us
to skip the preallocation optimisation using thread-context in such
case. Thankfully returning the proper nodemask from
qemuBuildMemoryBackendProps is enough to trigger this.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The QEMU interface is still in a state of flux, and KVM support
has been pulled shortly after having been merged. Let's not
commit to a stable interface in libvirt just yet.
Reverts: 720e8f13ff
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The QEMU interface is still in a state of flux, and KVM support
has been pulled shortly after having been merged. Let's not
commit to a stable interface in libvirt just yet.
Reverts: 1347a19f75
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The QEMU interface is still in a state of flux, and KVM support
has been pulled shortly after having been merged. Let's not
commit to a stable interface in libvirt just yet.
Reverts: c6c9b5d251
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The QEMU interface is still in a state of flux, and KVM support
has been pulled shortly after having been merged. Let's not
commit to a stable interface in libvirt just yet.
Reverts: b10bc8f7ab
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Add new compress methods zlib and zstd for parallel migration,
these method should be used with migration option --comp-methods
and will be processed in 'qemuMigrationParamsSetCompression'.
Note that only one compress method could be chosen for parallel
migration and they cann't be used in compress migration.
Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
This is pretty trivial, just append "mte=on/off" to -machine
arguments.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The MTE feature is not supported by all QEMUs, only those with
QEMU_CAPS_MACHINE_VIRT_MTE capability.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The MTE feature (introduced in QEMU commit of v5.1.0-rc1~8^2~11)
is detectable via 'qom-list-properties' for 'virt' machine type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The Memory Tagging Extensions are hardware acceleration present
in some ARM processors that allow memory error detection [1].
Introduce a domain XML knob that turns them on or off.
1: https://www.arm.com/blogs/blueprint/memory-safety-arm-memory-tagging-extension
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanup, there's not a single caller that would
call qemuDomainGetMemLockLimitBytes() with @forceVFIO set. All
callers pass false.
Drop the unneeded argument from the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanup, there's not a single caller that would
call qemuDomainAdjustMaxMemLock() with @forceVFIO set. All callers
pass false.
Drop the unneeded argument from the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
During hotplug of a NVMe disk we need to adjust the memlock
limit. The computation of the limit is handled by
qemuDomainGetMemLockLimitBytes() which looks at given domain
definition and accounts for various device types (as different
types require different amounts). But during disk hotplug the
disk is not added to domain definition until the very last
moment. Therefore, qemuDomainGetMemLockLimitBytes() has this
@forceVFIO argument which tells it to assume VFIO even if there
are no signs of VFIO in domain definition. And this kind of
works, until the amount needed for NVMe disks changed (in
v9.3.0-rc1~52). What's missing in the commit is making @forceVFIO
behave the same as if there was an NVMe disk present in the
domain definition.
But, we can do even better - just mimic whatever we're doing for
hostdevs. IOW - introduce qemuDomainAdjustMaxMemLockNVMe() that
behaves the same as qemuDomainAdjustMaxMemLockHostdev().
There are subtle differences though:
1) qemuDomainAdjustMaxMemLockHostdev() can afford placing hostdev
right at the end of vm->def->hostdevs, because the array was
already reallocated (at the beginning of
qemuDomainAttachHostPCIDevice()). But
qemuDomainAdjustMaxMemLockNVMe() doesn't have that luxury.
2) qemuDomainAdjustMaxMemLockHostdev() places a
virDomainHostdevDef pointer into domain definition, while
qemuDomainStorageSourceAccessModifyNVMe() (which calls
qemuDomainAdjustMaxMemLock()) sees a virStorageSource pointer
but domain definition contains virDomainDiskDef. But that's
okay, we can create a dummy disk definition and append it into
the domain definition.
After this, qemuDomainAdjustMaxMemLock() can be called with
@forceVFIO = false, as the disk is now part of domain definition
(when computing the new limit).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2014030#c28
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Reflect the new default value, and explain that a runtime
lookup will be performed if the value is not an absolute path.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Now that we're performing the lookup at runtime, doing it at
build time is no longer necessary.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Don't bother looking at /usr/libexec, since every distro
ships dbus-daemon in $PATH.
Note that it's still possible for the administrator to prevent
this lookup and use an arbitrary binary by setting the
appropriate key in qemu.conf.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reflect the new default value, and explain that a runtime
lookup will be performed if the value is not an absolute path.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Now that we're performing the lookup at runtime, doing it at
build time is no longer necessary.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Use the recently introduced virFindFileInPathFull() function to
discover the path for qemu-bridge-helper and qemu-pr-helper at
runtime.
Note that it's still possible for the administrator to prevent
this lookup and use arbitrary binaries by setting the
appropriate keys in qemu.conf: this simply removes the need to
perform the lookup at build time, and thus to have the helpers
installed in the build environment.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Allow users controlling the multi-channel mode by adding a
'multichannel' property parsed for USB audio devices and wire up the
support in the qemu driver.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/472
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When QEMU closes the monitor suddenly, the following error
message is reported:
internal error: qemu unexpectedly closed the monitor: ...
And this works. But other error messages produced in the same
function include domain name too. Do that for the unexpectedly
closed monitor message too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The regular VM startup code first calls the setup of the disk backing
chain as defined in the XML and then calls the function to load the
rest of the backing chain from the image metadata. The hotplug code
did it the other way around, thus causing a failure when attempting
to attach a QCOW2 image via FD passing.
Reorder the hotplug code to have the same order.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2193315
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The spelling is slightly different from another otherwise
identical error message in the same file.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Implement the support for the persisted poll parameters and remove
restrictions on saving config when modifying them during runtime.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Convert the internal types to unsigned long long. Luckily we can also
covert the external types too:
- 'qemuDomainSetIOThreadParams' can accept both _UINT and _ULLONG by
converting to 'virTypedParamsGetUnsigned'
- querying is handled via the bulk stats API which is flexible:
- we use virTypedParamListAddUnsigned to use the bigger type only if
necessary
- most users don't even notice because the bindings abstract the
data types
Apart from the code modifications we also improve the documentation
which was missing for the setters.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU accepts even values bigger than INT_MAX. The reasoning for these
checks was that the QAPI definition declares them as 'int', but in QAPI
terms that's any number as it's JSON.
Remove the validation as well as the comment misinterpreting the QAPI
definiton.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function now return always 0. Refactor the code and remove return
values.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Return the number of parameters via pointer passed as argument to free
up possibility to report errors. Strangely all callers actually use
'int' as type for storing the count of elements, thus this function will
use the same.
The function is also renamed to virTypedParamListSteal.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The struct will be made private in upcoming patches. Construct the list
of block entries into a separate list and append them rather than
remember the index of the count element.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Add an allocator function and refactor all allocations to use it. In
upcoming patches 'struct _virTypedParamList' will be made private.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanups, qemuHostdevPreparePCIDevices() no longer
needs virQEMUCaps. Drop its passing from callers.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanup, there are some functions that do nothing:
qemuConnectDomainXMLToNativePrepareHostHostdev()
qemuConnectDomainXMLToNativePrepareHost()
qemuProcessPrepareHostHostdev()
qemuProcessPrepareHostHostdevs()
Remove them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When preparing a SCSI <hostdev/> with passthrough of a host SCSI
adapter (i.e. no protocol), a virStorageSource structure is
initialized and stored inside virDomainHostdevDef. But the source
structure is filled in many places, with almost the same code.
Firstly, qemuProcessPrepareHostHostdev() and
qemuConnectDomainXMLToNativePrepareHostHostdev() are the same.
Secondly, qemuDomainPrepareHostdev() allocates the src structure,
only to let qemuProcessPrepareHostHostdev() fill src->path later.
Well, src->path can be filled at the same place where the src
structure is allocated (qemuDomainPrepareHostdev()) which renders
the other two functions needless.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
There is no way the qemuDomainAttachHostPCIDevice() function can
be called over a hostdev with PCI backend other than VFIO. And
even if it were, then the check is written so poorly that it lets
some types through (e.g. KVM) only to let
qemuBuildPCIHostdevDevProps() called afterwards fail properly.
Drop this check and rely on qemuDomainPrepareHostdevPCI() (and
worst case scenario even qemuBuildPCIHostdevDevProps()) to report
the proper error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
We used to support KVM and VFIO style of PCI assignment. The
former was dropped in v5.7.0-rc1~103 and thus we only support
VFIO. All other backends lead to an error (see
qemuBuildPCIHostdevDevProps(), or qemuBuildPCIHostdevDevStr() as
it used to be called in the era of aforementioned commit).
Might as well report the error in prepare phase and save hassle
of proceeding with device preparation (e.g. in case of hotplug
overriding the device's driver, setting seclabels, etc.).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
virsh command domxml-to-native failed with below error but start
command succeed for same domain xml.
"internal error: invalid PCI passthrough type 'default'"
If a <hostdev> PCI backend is not set in the XML, the supported
one is then chosen in qemuHostdevPreparePCIDevicesCheckSupport().
But this function is not called anywhere from
qemuConnectDomainXMLToNative(). But qemuDomainPrepareHostdev()
is. And it is also called from domain startup/hotplug code.
Therefore, move the backend setting to the common path and drop
qemuHostdevPreparePCIDevicesCheckSupport().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
So far, qemuDomainPrepareHostdev() is a NOP for anything but a
SCSI hostdev. This will change soon. Therefore, move the SCSI
hostdev preparation into a separate function
(qemuDomainPrepareHostdevSCSI()) and make
qemuDomainPrepareHostdev() call function corresponding to the
hostdev type (or nothing if the type doesn't need any
preparation).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
When attaching a hostdev of a SCSI subsys,
qemuDomainPrepareHostdev() is called. This makes sense because
the function prepares just SCSI hostdevs ignoring others. But
this will soon change. Thefore, move the function call out of
qemuDomainAttachHostSCSIDevice() and into
qemuDomainAttachHostDevice().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Treat:
<maxphysaddr mode="emulate"/>
as a request not to take the maximum address size from the host.
This is useful if QEMU changes the default.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanups a lot of functions from qemu_hotplug.c
are called only within the file. Make them static and drop their
declarations from the header file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
There is no good reason for qemuDomainUpdateDeviceLive() to live
in (ever growing) qemu_driver.c while we have qemu_hotplug.c
which already contains the rest of hotplug code. Move the
function to its new home.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
There is no good reason for qemuDomainAttachDeviceLive() to live
in (ever growing) qemu_driver.c while we have qemu_hotplug.c
which already contains the rest of hotplug code. Move the
function to its new home.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
The qemuDomainUpdateDeviceLive() accepts virDomainPtr as one of
its arguments, but use it only to get QEMU driver out of it.
Well, the only caller already does that and thus can pass it
instead of virDomainPtr.
This also makes it look like the rest of device hot(un-)plug
functions: qemuDomainAttachDeviceLive() and
qemuDomainUpdateDeviceLive().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
A new enum type "Default" has been added for Input bus.
The logic that handled default input bus types in
virDomainInputParseXML() has been moved to a new function
virDomainInputDefPostParse() in domain_postparse.c
Link to Issue: https://gitlab.com/libvirt/libvirt/-/issues/8
Signed-off-by: K Shiva <shiva_kr@riseup.net>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The disk private data contain information about the tray and
removability of the disk. Until recently we didn't support hotplug of
removable disks thus it wasn't a problem but now when you can hotplug a
CDROM you would not be able to open its tray.
Fix it by updating the hotplugged disk the same way we do at startup.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2160435
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Extract the logic to update one single disk (without emitting any
events) so that it can be reused when updating the state after a disk
hotplug.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The code compares the 'tray_open' boolean from 'struct
qemuDomainDiskInfo' directly against 'disk->tray_status' which is
declared as virDomainDiskTray (enum). Now the logic works correctly
because the _OPEN enum has value '1'.
Separate the event emission code from the update code and remember the
old tray state in a separate variable rather than having the sneaky
logic we have today.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
With cgroupv2 this has better effect on the resource allocation. An
excerpt from Documentation/admin-guide/cgroup-v2.rst explains is this
way:
Migrating a process across cgroups is a relatively expensive operation
and stateful resources such as memory are not moved together with the
process. This is an explicit design decision as there often exist
inherent trade-offs between migration and various hot paths in terms
of synchronization cost.
[...]
Setting a non-empty value to "cpuset.mems" causes memory of
tasks within the cgroup to be migrated to the designated nodes if
they are currently using memory outside of the designated nodes.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Most of them are platform devices and only i6300esb can be plugged
multiple times into different PCI slots.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This makes it also work during attach. Also add a test for attaching a
watchdog with incompatible action.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2187278
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The loop initially skipped the first one because it was mainly checking
the incompatible actions, but was then modified to also check the
duplicity of iTCO watchdogs.
While at it change the type of the iteration variable to the usual size_t.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2187133
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We can launch qemu with it, but it will not work since it's not even
probed by the kernel at the mapped address with different machine types
since they are expected to be connected to ISA and not even its newer
LPC counterpart found on q35. And it does not exist on non-x86
architectures.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When starting QEMU, or when hotplugging a PCI device QEMU might
lock some memory. How much? Well, that's an undecidable problem.
But despite that, we try to guess. And it more or less works,
until there's a counter example. This time, it's a guest with
both <hostdev/> and an NVMe <disk/>. I've started a simple guest
with 4GiB of memory:
# virsh dominfo fedora
Max memory: 4194304 KiB
Used memory: 4194304 KiB
And here are the amounts of memory that QEMU tried to lock,
obtained via:
grep VmLck /proc/$(pgrep qemu-kvm)/status
1) with just one <hostdev/>
VmLck: 4194308 kB
2) with just one NVMe <disk/>
VmLck: 4328544 kB
3) with one <hostdev/> and one NVMe <disk/>
VmLck: 8522852 kB
Now, what's surprising is case 2) where the locked memory exceeds
the VM memory. It almost resembles VDPA. Therefore, treat is as
such.
Unfortunately, I don't have a box with two or more spare NVMe-s
so I can't tell for sure. But setting limit too tight means QEMU
refuses to start.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2014030
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
It's quite difficult, if not impossible, to create a working RISC-V VMs
using the current default machine type of 'spike_v1.10'. Change the
default to the more appropriate and virtualization friendly 'virt'
machine type.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
It's quite difficult, if not impossible, to create a usable ARM VMs
using the current default machine type of 'integratorcp'. Change the
default to the more appropriate and virtualization friendly 'virt'
machine type.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
I've tried, then I've tried even harder, but still wasn't able to
make sense of our console backcompat code in all its fine
details. Since I value my sanity, let's just forbid hotunplug of
<console/>, especially since detaching of corresponding <serial/>
works.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When cleaning up after removed device, qemuDomainChrRemove() is
called. But this may fail, in which case we successfully ignore
the failure and virDomainChrDefFree() the device anyway. While it
decreases our memory consumption, it's a bit too far, especially
if the next step is 'virsh dumpxml'. Then our memory consumption
decreases all the way down to zero as we crash.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For a running guest, a <serial/> device can be hotunplugged. This
will then remove also aliased <console/>. Trying to hotplug a
<console/> device then, libvirtd crashed because it dereferences
def->consoles while there's none.
Fixes: 42d53ac799
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When removing the compat console from domain defintion, removing
it from the vmdef->consoles array is good, but not sufficient.
The console definition might have been fully allocated (after
daemon restarted and reloaded the status XML). Use
virDomainChrDefFree() to free also the definition.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When hotpluging a <serial/> device, we might need to add a
<console/> device with it (because of some crazy backcompat).
Now, hotplugging is done in several phases. In one of them,
qemuDomainChrPreInsert() allocates space for both devices, and
then qemuDomainChrInsertPreAlloced() actually inserts the device
into domain definition and sets up the <console/> device with it.
Except, the condition that checks whether to create the aliased
<console/> is wrong as it compares nconsoles against 0.
Surprisingly, qemuDomainChrInsertPreAllocCleanup() doesn't suffer
from the same error.
Fixes: daf51be5f1
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
igb is a new network device which will be introduced with QEMU 8.0.0.
It is a successor of e1000e so it has PCIe interface and is understands
virtio-net headers as e1000e does.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
During qemu driver shutdown, objects are freed in qemuStateCleanup that
could still be used by active worker threads, resulting in crashes. E.g.
a worker thread could be processing a monitor EOF event after the
security manager is already disposed
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fd9a9a1e1fe in virSecurityManagerMoveImageMetadata (mgr=0x7fd948012160, pid=-1, src=src@entry=0x7fd98c072c90, dst=dst@entry=0x0)
at ../../src/security/security_manager.c:468
#1 0x00007fd9646ff0f0 in qemuSecurityMoveImageMetadata (driver=driver@entry=0x7fd948043830, vm=vm@entry=0x7fd98c066db0, src=src@entry=0x7fd98c072c90,
dst=dst@entry=0x0) at ../../src/qemu/qemu_security.c:182
#2 0x00007fd96462c7b0 in qemuBlockRemoveImageMetadata (driver=driver@entry=0x7fd948043830, vm=vm@entry=0x7fd98c066db0, diskTarget=0x7fd98c072530 "vda",
src=<optimized out>) at ../../src/qemu/qemu_block.c:2628
#3 0x00007fd9646929d6 in qemuProcessStop (driver=driver@entry=0x7fd948043830, vm=vm@entry=0x7fd98c066db0, reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN,
asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=<optimized out>) at ../../src/qemu/qemu_process.c:7585
#4 0x00007fd9646fc842 in processMonitorEOFEvent (vm=0x7fd98c066db0, driver=0x7fd948043830) at ../../src/qemu/qemu_driver.c:4794
#5 qemuProcessEventHandler (data=0x561a93febb60, opaque=0x7fd948043830) at ../../src/qemu/qemu_driver.c:4900
#6 0x00007fd9a9971a31 in virThreadPoolWorker (opaque=opaque@entry=0x561a93fb58e0) at ../../src/util/virthreadpool.c:163
(gdb) p mgr->drv
$2 = (virSecurityDriverPtr) 0x0
Prior to commit 7cf76d4e3a, the worker thread pool was freed before
disposing any driver objects. Let's return to that pattern, but leave
the other changes made by 7cf76d4e3a.
Signed-off-by: Tamara Schmitz <tamara.schmitz@suse.com>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Historically the snapshot code attempted to forbid internal snapshots
with UEFI both in active and inactive case. Unfortunately due to the
intricacies of UEFI probing this didn't really work for inactive VMs
which made users rely on the feature.
Now with the changes to store detected UEFI environment also in the
inactive definition this broke the feature for those users.
Since the varstore doesn't really change that much in the lifecycle of a
VM it usually is okay to simply leave it as is.
Restore the functionality for inactive snapshots by disabling the check.
In the future when uefi snapshotting will be added the rest of the
condition will also be removed.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/460
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Attaching disk into running VM the offline definition may not be
updated and we will end up with that disk existing only in live
definition. Creating snapshot with this state saves both live and
offline definition into snapshot metadata.
When we are deleting an external snapshot we are updating these
definitions in the snapshot metadata so we should just skip over
non-existing disks instead of reporting error.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2174700
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Unify validation of VIR_DOMAIN_FEATURE_HTM, VIR_DOMAIN_FEATURE_NESTED_HV,
VIR_DOMAIN_FEATURE_CCF_ASSIST and remove temporary string.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The features:
QEMU_CAPS_MACHINE_PSERIES_CAP_HPT_MAX_PAGE_SIZE
QEMU_CAPS_MACHINE_PSERIES_CAP_HTM
QEMU_CAPS_MACHINE_PSERIES_CAP_NESTED_HV
QEMU_CAPS_MACHINE_PSERIES_CAP_CCF_ASSIST
QEMU_CAPS_MACHINE_PSERIES_CAP_CFPC
QEMU_CAPS_MACHINE_PSERIES_CAP_SBBC
QEMU_CAPS_MACHINE_PSERIES_CAP_IBS
are supported by all qemu versions that libvirt supports. Drop the
obsolete checks.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use automatic pointer freeing, remove 'ret' variable and also remove
return value completely.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Remove useless call to virCapabilitiesFreeMachines as the pointers were
cleared and the unneeded 'ret' variable. Since we don't need to clear
the 'machines' pointer now, remove that as well.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
It's never set to any real value. Remove it along with the caching code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Separate the architecture specific code to probe the support for HVF
from the actual setting of the capability.
In upcoming patches 'virQEMUCapsProbeHVF' will be mocked in the
testsuite to provide testing for the HVF hypervisor.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The logic in 'virQEMUCapsInitQMP' invokes a second probe of qemu in case
when acceleration is used and TCG is supported to specifically probe the
CPU and features of non-accelerated guests.
The same logic must then be used in 'qemucapabilitiestest' when
replaying the data for testing otherwise the test would fail.
Export 'virQEMUCapsHaveAccel' for test usage and use the same logic
in 'testQemuCaps'.
Fix the comment in 'virQEMUCapsInitQMP' to outline what's happening.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The changes to the output files are the exact opposite of
those from commit 22207713cf: this is proof that the fix is
working as intended, and that existing domains will keep using
raw firmware images regardless of whether or not qcow2 images
are available on the system and have higher priority.
New domains will keep picking whatever firmware is considered
the preferred one according to the order of descriptors, as
evidenced by the fact that the recently introduced
firmware-auto-efi-abi-update-aarch64 test case is unaffected.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The virConnectOpen(), well virConnectOpenInternal() reports an
error if embed root is not an absolute path. This is a fair
requirement, but our qemu_shim doesn't check this requirement and
passes the path to mkdir(), only to fail later on, leaving the
empty directory behind:
$ ls -d asd
ls: cannot access 'asd': No such file or directory
$ virt-qemu-run -r asd whatever.xml
virt-qemu-run: cannot open qemu:///embed?root=asd: unsupported configuration: root path must be absolute
$ ls -d asd
asd
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
After cleanup done in v8.2.0-rc1~47 the
qemuDomainObjExitMonitor() and after v8.7.0-rc1~176 the
qemuDomainObjEnterMonitor() lost the @driver argument. But
corresponding ATTRIBUTE_NONNULL() annotation was not removed and
both functions are still annotated as ATTRIBUTE_NONNULL(2) even
though they accept just one argument (@obj).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Otherwise looking up a secret fails when we try to elevate the identity
in qemuDomainSecretInfoSetupFromSecret.
https://bugzilla.redhat.com/show_bug.cgi?id=2000410
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Even when the user is not taking advantage of firmware
autoselection and instead manually providing all the necessary
information, in most cases they're still going to use firmware
builds that are provided by the OS vendor, are installed in
standard paths and come with a corresponding firmware
descriptor.
Similarly, even when the user is not guiding the autoselection
process by specifying the desired status of certain features
and instead is relying on the system-level descriptor priority
being set up correctly, libvirt will still ultimately decide to
use a specific descriptor, which includes information about the
firmware's features.
In both these cases, take the additional information that were
obtained from the firmware descriptor and reflect them back into
the domain XML, where they can be conveniently inspected by the
user and management applications alike.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Now that we no longer reject configurations that include both
this information and explicit firmware details, as long of
course as everything is internally consistent, and that we've
ensured that we produce maximally compatible XML on migration,
we can stop stripping this information at the end of the
firmware selection process.
There are several advantages to keeping this information around:
* if the user wants to change the firmware configuration for
an existing VM, they can simply drop the <loader> and
<nvram> elements, tweak the firmware autoselection parameters
and let libvirt pick a firmware that matches on the new
requirements;
* management applications can inspect the XML and easily
figure out firmware-related information without having to
reverse-engineer them based on some opaque paths.
Overall, this change makes things more transparent and easier to
understand. The improvement is so significant that, in a
follow-up commit, we're going to ensure that this information is
available in even more cases.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Right now there are a few scenarios in which we skip ahead, and
removing these exceptions will make for more consistent and
predictable behavior.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The requires-smm feature being present in a firmware descriptor
causes loader.secure=yes to be automatically chosen for the
domain, so we have to avoid this situation or the user's choice
will be silently subverted.
Note that we can't actually encounter loader.secure=no in this
function at the moment because of earlier checks, but that's
going to change soon.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Right now we have checks in place that ensure that explicit
paths are not provided when firmware autoselection has been
enabled, but that's going to change soon.
To prepare for that, take into account user-provided paths
during firmware autoselection if present, and discard all
firmware descriptors that don't contain matching information.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Otherwise the build on armv7l breaks:
error: format ‘%lu’ expects argument of type
‘long unsigned int’, but argument 4 has type
‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
Fixes: 1992ae40fa
Fixes: e239f7d0a8
Signed-off-by: Ján Tomko <jtomko@redhat.com>
The newly added luks-any rbd encryption format in qemu
allows for opening both LUKS and LUKS2 encryption formats.
This commit enables libvirt uses to use this wildcard format.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This capability represents that qemu supports the "luks-any" encryption
format for RBD images.
Both LUKS and LUKS2 formats can be parsed using this wildcard format.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This commit enables libvirt users to use layered encryption
of RBD images, using the librbd encryption engine.
This allows opening of an encrypted cloned image
whose parent is encrypted with a possibly different encryption key.
To open such images, multiple encryption secrets are expected
to be defined under the encryption XML tag.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This commit changes the _qemuDomainStorageSourcePrivate struct
to support multiple secrets (instead of a single one before this commit).
This will useful for storage encryption requiring more than a single secret.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This commit changes the qemuBlockStorageSourceAttachData struct
to support multiple secrets (instead of a single one before this commit).
This will useful for storage encryption requiring more than a single secret.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Change secret aliases from %s-%s-secret0 to %s-%s-secret%lu,
which will later be used for storage encryption requiring more
than a single secret.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This capability represents that qemu supports the layered encryption
of RBD images, where a cloned image is encrypted with a possible
different encryption than its parent image.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When a thread-context object is specified on the cmd line, then
QEMU spawns a thread and sets its affinity to the list of NUMA
nodes specified in .node-affinity attribute. And this works just
fine, until the main QEMU thread itself is not restricted.
Because of v5.3.0-rc1~18 we restrict the main emulator thread
even before QEMU is executed and thus then it tries to set
affinity of a thread-context thread, it inevitably fails with:
Setting CPU affinity failed: Invalid argument
Now, we could lift the pinning temporarily, let QEMU spawn all
thread-context threads, and enforce pinning again, but that would
require some form of communication with QEMU (maybe -preconfig?).
But that would still be wrong, because it would circumvent
<emulatorpin/>.
Technically speaking, thread-context is an internal
implementation detail of QEMU, and if it weren't for it, the main
emulator thread would be doing the allocation. Therefore, we
should honor the pinning and prune the list of node so that
inaccessible ones are dropped.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2154750
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
When building a thread-context object (inside of
qemuBuildThreadContextProps()) we look at given memory-backend-*
object and look for .host-nodes attribute. This works, as long as
we need to just copy the attribute value into another
thread-context attribute. But soon we will need to adjust it.
That's the point where having the value in virBitmap comes handy.
Utilize the previous commit, which made
qemuBuildMemoryBackendProps() set the argument and pass it into
qemuBuildThreadContextProps().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
While it's true that anybody who's interested in getting
.host-nodes attribute value can just use
virJSONValueObjectGetArray() (and that's exactly what
qemuBuildThreadContextProps() is doing, btw), if somebody is
interested in getting the actual virBitmap, they would have to
parse the JSON array.
Instead, introduce an argument to qemuBuildMemoryBackendProps()
which is set to corresponding value used when formatting the
attribute.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
There are two compound conditions in
qemuBuildMemoryBackendProps() and each one checks for nodemask
for NULL first. Join them into one bigger block.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
The order of pinning priority (at least for emulator thread) was
set by v1.2.15-rc1~58 (for cgroup code). But later, when
automatic placement was implemented into
qemuDomainGetEmulatorPinInfo(), the priority was not honored.
Now that we have this priority code in a separate function, we
can just call that and avoid this type of error.
Fixes: 776924e376
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
The set of if()-s that determines the preference in cpumask used
for setting things like emulatorpin, vcpupin, etc. is going to be
re-used. Separate it out into a function.
You may think that this changes behaviour, but
qemuProcessPrepareDomainNUMAPlacement() ensures that
priv->autoCpuset is set for VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
We have this crazy backwards compatibility when it comes to
serial and console devices. Basically, in same cases the very
first <console/> is just an alias to the very first <serial/>
device. This is to be seen at various places:
1) virDomainDefFormatInternalSetRootName() - when generating
domain XML, the <console/> configuration is basically ignored
and corresponding <serial/> config is formatted,
2) virDomainDefAddConsoleCompat() - which adds a copy of
<serial/> or <console/> into virDomainDef in post parse.
And when talking to QEMU we need a special handling too, because
while <serial/> is generated on the cmd line, the <console/> is
not. And in a lot of place we get it right. Except for generating
device aliases. On domain startup the 'expected' happens and
devices get "serial0" and "console0" aliases, correspondingly.
This ends up in the status XML too. But due to aforementioned
trick when formatting domain XML, "serial0" ends up in both
'virsh dumpxml' and the status XML. But internally, both devices
have different alias. Therefore, detaching the device using
<console/> fails as qemuDomainDetachDeviceChr() tries to detach
"console0".
After the daemon is restarted and status XML is parsed, then
everything works suddenly. This is because in the status XML both
devices have the same alias.
Let's generate correct alias from the beginning.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2156300
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Other APIs that internally use QEMU migration and need to temporarily
suspend a domain already report failure to resume vCPUs by setting
VIR_DOMAIN_PAUSED_API_ERROR state reason and emitting
VIR_DOMAIN_EVENT_SUSPENDED event with
VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR.
Let's do the same in qemuMigrationSrcRestoreDomainState for consistent
behavior.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Some APIs (migration, save/restore, snapshot, ...) require a domain to
be suspended temporarily. In case resuming the domain fails, the domain
will be unexpectedly left paused when the API finishes. This situation
is reported via VIR_DOMAIN_EVENT_SUSPENDED event with
VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR detail. But we do not have a
corresponding reason for VIR_DOMAIN_PAUSED state and the reason would
remain set to the value used when the domain was paused. So the state
reason would suggest the operation is still running.
This patch changes the state reason to a new VIR_DOMAIN_PAUSED_API_ERROR
to make it clear the API that paused the domain already finished, but
failed to resume the domain.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For some vhostuser daemons, we validate that the guest memory is shared
with the host.
With earlier versions of QEMU, it was only possible to mark memory
as shared by defining an explicit NUMA topology. Later, QEMU exposed
the name of the default memory backend (defaultRAMid) so we can mark
that memory as shared.
Since libvirt commit:
commit bff2ad5d6b
qemu: Relax validation for mem->access if guest has no NUMA
we already check for the case when user requests shared memory,
but QEMU did not expose defaultRAMid.
Drop the duplicit check from vhostuser device validation, to make
it pass on hotplug even after libvirtd restart.
This avoids the need to store the defaultRAMid, since we don't really
need it for anything after the VM has been already started.
https://bugzilla.redhat.com/show_bug.cgi?id=2078693https://bugzilla.redhat.com/show_bug.cgi?id=2177701
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
set useBinarySpecificLabel = true when calling qemuSecurityCommandRun
for the passt process, so that the new process context will include
the binary-specific label that should be used for passt (passt_t)
rather than svirt_t (as would happen if useBinarySpecificLabel was
false). (The MCS part of the label, which is common to all child
processes related to a particular qemu domain instance, is also set).
Resolves: https://bugzilla.redhat.com/2172267
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Normally when a child process is started by libvirt, the SELinux label
of that process is set to virtd_t (plus an MCS range). In at least one
case (passt) we need for the SELinux label of a child process label to
match the label that the binary would have transitioned to
automatically if it had been run standalone (in the case of passt,
that label is passt_t).
This patch modifies virSecuritySELinuxSetChildProcessLabel() (and all
the functions above it in the call chain) so that the toplevel
function can set a new argument "useBinarySpecificLabel" to true. If
it is true, then virSecuritySELinuxSetChildProcessLabel() will call
the new function virSecuritySELinuxContextSetFromFile(), which uses
the selinux library function security_compute_create() to determine
what would be the label of the new process if it had been run
standalone (rather than being run by libvirt) - the MCS range from the
normally-used label is added to this newly derived label, and that is
what is used for the new process rather than whatever is in the
domain's security label (which will usually be virtd_t).
In order to easily verify that nothing was broken by these changes to
the call chain, all callers currently set useBinarySpecificPath =
false, so all behavior should be completely unchanged. (The next
patch will set it to true only for the case of running passt.)
https://bugzilla.redhat.com/2172267
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Currently it's only possible to set this parameter during domain
creation via QEMU commandline passthrough feature.
With the new delay attribute it's also possible to set this
parameter if you want to attach a new NBD disk
using "virsh attach-device domain device.xml" e.g.:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='nbd' name='foo'>
<host name='example.org' port='6000'/>
<reconnect delay='10'/>
</source>
<target dev='vdb' bus='virtio'/>
</disk>
Signed-off-by: Christian Nautze <christian.nautze@exoscale.ch>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commit 54fa1b44af ("conf: Add loadparm boot option for a boot device")
added the ability to specify a loadparm parameter on a <boot/> tag, while
commit 29ba41c2d4 ("qemu: Add loadparm to qemu command line string")
added that value to the QEMU "-machine" command line parameters.
Unfortunately, the latter commit only looked at disks and network
devices for boot information, even though anything with
VIR_DOMAIN_DEF_FORMAT_ALLOW_BOOT could potentially have this tag.
In practice, a <hostdev> tag pointing to a passthrough (SCSI or DASD)
disk device can be used in this way, which means the loadparm is
accepted, but not given to QEMU.
Correct this, and add some XML/argv tests.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Good to have for debugging in case something wrong happens during
incoming migration.
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
For shutoff VMs we don't have the storage source backing chain
populated so it will fail this check and error out. Move it to
part that is done only when VM is running.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
This can improve performance for some guests since it reduces copying of
display data between host and guest. Requires udmabuf on the host.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Capability to determine whether this qemu supports the 'blob' option for
virtio-gpu.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rather than storing the video type as an integer, use the proper enum
type within the struct.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The parameter was added for consistency with virPidFileAcquirePath.
However, all callers of virPidFileAcquire pass false.
Remove the argument.
Partially-reverts: 2250a2b5d2
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The function doesn't set any capability and we don't want to add
arch-dependent always-peresent capabilities in the future.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Check the architecture of the guest rather than relying on
QEMU_CAPS_LOADPARM which is set based on architecture.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use the guest architecture to decide whether to format
'aes-key-wrap'/'dea-key-wrap' rather than
QEMU_CAPS_AES_KEY_WRAP/QEMU_CAPS_DEA_KEY_WRAP which were set based on
architecture.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU_CAPS_MACH_VIRT_GIC_VERSION is always asserted for VIR_ARCH_AARCH64.
Note that this patch is a direct conversion of the logic originally
residing in the capabilities code. A better coversion would be (based on
whether it is available for just AARCH64 or also ARM) to base it on the
guest architecture.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rather than asserting a capability based on architecture, format the
fallback parameter based on the presence of the newer capability and an
explicit architecture check.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The capability is based on a platform check rather than what given qemu
supports.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU_CAPS_NO_ACPI is asserted based on architecture, so it can be
replaced by a non-capability check.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Commit 24cc9cda82 switched over to use -machine hpet, but one of the
steps it did was to clear the QEMU_CAPS_NO_HPET capability.
The validation check still uses the old capability though which means
that for configs which would explicitly enable HPET we'd report an error.
Since HPET is an x86(_64) platform specific device, convert the
validation check to an architecture check as all supported qemu versions
actually support it.
Modify a test case to request HPET to catch posible future problems.
Fixes: 24cc9cda82
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We always assert the flag for aarch64 qemus and in qemu the 'aarch64'
cpu property doesn't seem to be optional.
Remove checks and remove impossible test case.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function can't fail at this point. Remove the last outstanding
pointless error check and turn the return type into 'void'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Make all callers always pass a valid pointer which in turn allows us to
remove return value check from the callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Make all callers always pass a valid pointer which in turn allows us to
remove return value check from the callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Do the two fixups of CPU as one block and split up the return value
checks to separate conditions. This will make the upcoming refactors
simpler.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The allocation of the object itself can't fail. What can fail is the
creation of the class on a programming error. Rather than punting the
error up the stack abort() directly on the first occurence as the error
can't be fixed during runtime.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Format aliases into temporary strings and append them using
virJSONValueObjectAdd.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The 'ipv6-prefix' and 'ipv6-prefixlen' fields can be directly added
using virJSONValueObjectAdd rather than by two separate calls.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virJSONValueObjectAdd and format the string directly via
g_strdup_printf. In the end virJSONValueObjectAppendStringPrintf will be
removed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Prefer virJSONValueObjectAdd which we already use internally combined
with local formatting of the string.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
VIR_DOMAIN_NET_TYPE_NULL and VIR_DOMAIN_NET_TYPE_VDS are not implemented
for the qemu driver but the formatter code in 'qemuBuildHostNetProps'
didn't report an error for them and didn't even return from the function
when they were encountered.
This caused a crash in 'virJSONValueObjectAppendStringPrintf' which
does not tolerate NULL JSON object to append to when the unsupported
devices were used.
Properly report error when unhandled devices are encountered. This also
includes the case for VIR_DOMAIN_NET_TYPE_HOSTDEV, but that code path
should never be reached.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2175582
Fixes: bac6b266fb / 6457619d18
Fixes: 0225483adc
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU deprecated the '-no-acpi' option, thus we should switch to the
modern way to use '-machine'.
Certain ARM machine types don't support ACPI. Given our historically
broken design of using '<acpi/>' without attribute to enable ACPI and
qemu's default of enabling it without '-no-acpi' such configurations
would not work.
Now when qemu reports whether given machine type supports ACPI we can do
a better decision and un-break those configs. Unfortunately not
retroactively.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/297
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The helper returns the 'acpi' flag for a given machine type.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The return data from 'query-machines' now contains an 'acpi' field. If
the field is present we can use it to decide how to handle user's
setting of '<acpi/>' domain feature.
Add logic to extract the 'acpi' field and store it in machine type list
along with other properties.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We now always assume support for polling mode of iothreads.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
iothread polling mode and the corresponding properties were added in
qemu-2.9 ( 0d9d86fb4df4882b ). We can always assume that qemu supports
them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
iothreads were introduced in qemu-2.0 and can't be compiled out thus we
can always assume qemu supports them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Commit 068efae5b1 created a copy of this code instead of
simply moving it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Take the information from the descriptor and store it in the
domain definition. Various things, such as the arguments passed
to -blockdev and the path generated for the NVRAM file, will
then be based on it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If the user has requested a specific firmware format, then
all firmware builds that are not in that format should be
ignored while looking for matches.
The legacy hardcoded firmware list predates firmware
descriptors and their "format" field, so we can safely
assume that all builds listed in there are in raw format.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This ensures that, as we add support for more formats at the
domain XML level, we don't accidentally cause drivers to
misbehave or users to get confused.
All existing drivers support the raw format, and supporting
additional formats will require explicit opt-in on the
driver's part.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Right now, this results in loader->nvram being NULL, which is
reasonable: loader->nvramTemplate is stored separately, so if
the <nvram> element doesn't contain a path there is really no
useful information inside it.
However, this is about to change, so we will find ourselves
needing to hold on to loader->nvram even when no path is
present. Change the firmware handling code so that such a
scenario is dealt with appropriately.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This helper replaces qemuDomainNVRAMPathFormat() and also
incorporates some common operations that all callers of that
helper needed.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Currently, firmware selection is performed as part of the
domain startup process. This mostly works fine, but there's a
significant downside to this approach: since the process is
affected by factors outside of libvirt's control, specifically
the contents of the various JSON firmware descriptors and
their names, it's pretty much impossible to guarantee that the
outcome is always going to be the same. It would only take an
edk2 update, or a change made by the local admin, to render a
domain unbootable or downgrade its boot security.
To avoid this, move firmware selection to the postparse phase.
This way it will only be performed once, when the domain is
first defined; subsequent boots will not need to go through
the process again, as all the paths that were picked during
firmware selection are recorded in the domain XML.
Care is taken to ensure that existing domains are handled
correctly, even if their firmware configuration can't be
successfully resolved. Failure to complete the firmware
selection process is only considered fatal when defining a
new domain; in all other cases the error will be reported
during startup, as is already the case today.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Right now, if the descriptor with the highest priority happens
to describe a firmware in a format other than raw, no domain
that uses autoselection will be able to start.
A better approach is to filter out descriptors that advertise
unsupported formats during autoselection.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
At the moment, if SMM is explicitly disabled in the domain XML
but a firmware descriptor that requires SMM to be enabled has
the highest priority and otherwise matches the requirements,
we pick that firmware only to error out later, when the domain
is started.
A better approach is to take into account the fact that SMM is
disabled while performing autoselection, and ignore all
descriptors that advertise the requires-smm feature.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We already clear os.firmware, so it doesn't make sense to keep
the list of features around.
Moreover, our validation routines will reject an XML that
contains a list of firmware features but disables firmware
autoselection, so not clearing these means that the live XML
for a domain that uses feature-based autoselection can't be
fed back into libvirt.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It doesn't make sense for non-local sources, since we can't
create or reset the corresponding NVRAM file.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This makes the code more compact and less awkward.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
For now we just allocate the object, so the only advantage is
that invocations are shorter and look a bit nicer.
Later on, its introduction will pay off by letting us change
things in a single spot instead of all over the library.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Move all the boot related parts of qemuDomainDefPostParse()
to a separate helper.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Move all the machine type related parts of
qemuDomainDefPostParse() to a separate helper.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When starting (some) external helpers, callers of
qemuSecurityCommandRun() pass &exitstatus variable, to learn the
exit code of helper process (with qemuTPMEmulatorStart() being
the only exception). Then, if the status wasn't zero they produce
a generic error message, like:
"Starting of helper process failed. exitstatus=%d"
or, in case of qemuPasstStart():
"Could not start 'passt': %s"
This is needless as virCommandRun() (that's called under the
hood), can do both for us, if NULL was passed instead of
@exitstatus. Not only it appends exit status, it also reads
stderr of failed command producing comprehensive error message:
Child process (${args}) unexpected exit status ${exitstatus}: ${stderr}
Therefore, pass NULL everywhere. But in contrast with one of
previous commits which removed @cmdret argument, there could be a
sensible caller which might want to process exit code. So keep
the argument for now and just pass NULL.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Every single caller of qemuSecurityCommandRun() calls the
function as:
if (qemuSecurityCommandRun(..., &cmdret) < 0)
goto cleanup;
if (cmdret < 0)
goto cleanup;
(modulo @exitstatus shenanigans)
Well, there's no need for such complication. There isn't a single
caller (and probably will never be (TM)), that would need to
distinguish the reason for the failure. Therefore,
qemuSecurityCommandRun() can be made to pass the retval of
virCommandRun() called under the hood.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The usual pattern when starting a helper daemon is:
if (qemuSecurityCommandRun(..., &exitstatus, &cmdret) < 0)
goto cleanup;
if (cmdret < 0 || exitstatus != 0) {
virReportError();
goto cleanup;
}
The only problem with this pattern is that if virCommandRun()
fails (i.e. cmdret < 0), then proper error was already reported.
But in this pattern we overwrite it (usually with less specific)
error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Way back, in v6.2.0-rc1~67 we removed the code that reads slirp's
stderr on failed startup. However, we forgot to remove
corresponding virCommandSetErrorFD() call and variable
declaration. Do that now.
While this may seem like a step in wrong direction (we should be
reading stderr as it may contain reason for failed start), this
is going to be handled in more general way in next commits.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function is used only inside qemu_domain.c, unexport it and move it
above its user.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>