Nothing inside the qemuPrepareNVRAM function relies on @srcFD
being closed early and nothing closes it early. It's okay then to
close it automatically when leaving the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
After previous commits there is no need for qemuPrepareNVRAM() to
open code virFileRewrite(). Deduplicate the code by calling the
function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In one of my previous commits I've fixed the value of
VIR_QEMU_PROCESS_START_RESET_NVRAM flag (which was masking
another value). But what I forgot to do is update virCheckFlags()
calls in two places where the flag is passed: qemuProcessLaunch()
and qemuProcessStart().
Fixes: 1b636593c7
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The qemuPrepareNVRAM() function accepts three arguments and the
last one being a boolean type. However, when the function is
called from qemuProcessPrepareHost() the argument passed is a
result of logical and of @flags (unsigned int) and
VIR_QEMU_PROCESS_START_RESET_NVRAM value. In theory this is
unsafe to do because if the value of the flag is ever changed
then this expression might overflow. Do what we do elsewhere:
double negation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We can now replace the existing NVRAM file on startup when
the API requests this.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
If we crash part way through writing the NVRAM file we end up with an
unusable NVRAM on file. To avoid this we need to write to a temporary
file and fsync(2) at the end, then rename to the real NVRAM file path.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This change was generated using the following spatch:
@ rule1 @
expression a;
identifier f;
@@
<...
- f(*a);
... when != a;
- *a = NULL;
+ g_clear_pointer(a, f);
...>
@ rule2 @
expression a;
identifier f;
@@
<...
- f(a);
... when != a;
- a = NULL;
+ g_clear_pointer(&a, f);
...>
Then, I left some of the changes out, like tools/nss/ (which
doesn't link with glib) and put back a comment in
qemuBlockJobProcessEventCompletedActiveCommit() which coccinelle
decided to remove (I have no idea why).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Modify 'qemuProcessGetVCPUQOMPath' to take the detected QOM path of the
first vCPU which is always present as the QOM path used our code probing
CPU flags via 'qom-get'.
This is needed as upcoming qemu will change it.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/272
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2051451
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Similarly to previous commit we need to probe the vcpus first.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Upcoming changes will require that we have a proper QOM path for cpus
when querying the flags as qemu is going to change it.
By moving the flag probing code later we'll already probe the QOM paths
so no re-query will be needed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Convert all code using the 'QOM_CPU_PATH' macro to accept the QOM path
as an argument.
For now the new helper for fetching the path 'qemuProcessGetVCPUQOMPath'
will always return the same hard-coded value.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use automatic memory clearing and remove the 'ret' variable.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When we are about to spawn QEMU, we validate the domain
definition against qemuCaps. Except when domain is/was already
running before (i.e. on incoming migration, snapshots, resume
from a file). However, especially on incoming migration it may
happen that the destination QEMU is different to the source
QEMU, e.g. the destination QEMU may have some devices disabled.
And we have a function that validates devices/features requested
in domain XML against the desired QEMU capabilities (aka
qemuCaps) - it's virDomainDefValidate() which calls
qemuValidateDomainDef() and qemuValidateDomainDeviceDef()
subsequently.
But the problem here is that the validation function is
explicitly skipped over in specific scenarios (like incoming
migration, restore from a snapshot or previously saved file).
This in turn means that we may spawn QEMU and request
device/features it doesn't support. When that happens QEMU fails
to load migration stream:
qemu-kvm: ... 'virtio-mem-pci' is not a valid device model name
(NB, while the example shows one particular device, the problem
is paramount)
This problem is easier to run into since we are slowly moving
validation from qemu_command.c into said validation functions.
The solution is simple: do the validation in all cases. And while
it may happen that users would be unable to migrate/restore a
guest due to a bug in our validator, spawning QEMU without
validation is worse (especially when you consider that users can
supply their own XMLs for migrate/restore operations - these were
never validated).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2048435
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Refactor some cgroup management methods from qemu into hypervisor.
These methods will be shared with ch driver for cgroup management.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This function never returns an error, make it void then. And
while at it, make the @src argument const to make it obvious it's
never changed inside the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We already know it's not going to be available on other
platforms.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Tested-by: Brad Laue <brad@brad-x.com>
Tested-by: Christophe Fergeau <cfergeau@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
After previous cleanups, the virDomainDefParseBootXML() function
uses a mixture of virXMLProp*() and the old virXMLPropString() +
virXXXTypeFromString() patterns. Rework it so that virXMLProp*()
is used.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Remove the now unused 'driver' parameter, as well as the pointless
if (ret == 0) comparison which is always true after removing the
cleanup label.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Its only use was to check conflicts of the sgio attributes between
devices shared with other domains.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Now that the 'unfiltered' attribute is rejected by the validator,
remove all the code that deals with the feature.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
All the fd-passing setup of chardevs which this hack meant to disable
was moved to the host-preparation phase which is skipped for formatting
of non-real commandlines.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
And its callers. The parameter is no longer used since virDomainObjSave
was replaced with qemuDomainSaveStatus wrapper.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It is a nice wrapper around virDomainObjSave which logs a warning, but
otherwise ignores the error. Let's use it where appropriate.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
As of ff024b60cc we are opening chardevs before starting QEMU.
However, we are also doing that before domain private directories
are created. This leaves us unable to create guest agent socket
which lives under priv->channelTargetDir.
While creating the dirs can be moved just before
qemuProcessPrepareHostBackendChardev() it's better to do it as
the very first step so that this kind of error is prevented in
future.
Fixes: ff024b60cc
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Add handling to qemuDomainDeviceBackendChardevForeachOne and callbacks
so that we can later use 'qemuBuildChardevCommand' for TPM devices.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add handling to qemuDomainDeviceBackendChardevForeachOne and callbacks
so that we can later use 'qemuBuildChardevCommand' for vhost-user disks
instead of a custom formatter.
Since we don't pass the FD for the vhost-user connection to qemu all of
the setup can be skipped.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use the qemuDomainDeviceBackendChardevForeach helper to iterate all
eligible structs and convert the setup of the TLS defaults from the
config.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The opening of files for FD passing for a chardev backend was
historically done in the function which is formatting the commandline.
This has multiple problems. Firstly the function takes a lot of
parameters which need to be passed through the commandline formatters.
This made the 'qemuBuildChrChardevStr' extremely unappealing to the
extent that we have multiple other custom formatters in places which
didn't really want to use the function.
Additionally the function is also creating files in the host in certain
configurations which is wrong for a commandline formatter to do. This
meant that e.g. not all chardev test cases can be converted to use
DO_TEST_CAPS_LATEST as we attempt to use such code path and attempt to
create files outside of the test directory.
This patch moves the opening of the filedescriptors from
'qemuBuildChrChardevFileStr' into a new helper
'qemuProcessPrepareHostBackendChardevOne' which is called using
'qemuDomainDeviceBackendChardevForeach'.
To preserve test behaviour we also have another instance
'testPrepareHostBackendChardevOne' which is populating mock
filedescriptors.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use automatic memory freeing for the temporary bitmap and remove the
pointless 'cleanup' section.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
I think it makes more sense for the variable about jobs to be in
the job object. I also renamed it to be consistent with the rest
of the struct.
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Remove the check from conditions where it's coupled with some other
checks.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The 'xmlopt' parameter can be auto-unref by using g_autoptr().
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The upcoming QEMU 6.2.0 implements a new event called
DEVICE_UNPLUG_GUEST_ERROR, a new event that reports generic device
unplug errors that were detected by the guest and reported back to QEMU.
This new event is going to be specially useful for pseries guests that
uses newer kernels (must have kernel commit 29c9a2699e71), which is the
case for Fedora 34 at this moment. These guests have the capability of
reporting CPU removal errors back to QEMU which, starting in 6.2.0, will
emit the DEVICE_UNPLUG_GUEST_ERROR event. Libvirt can use this event to
abort the device removal immediately instead of waiting for 'setvcpus'
timeout.
QEMU 6.2.0 is also going to emit DEVICE_UNPLUG_GUEST_ERROR for memory
hotunplug errors, both in pseries and ACPI guests. QEMU 6.1.0 reports
memory removal errors using the MEM_UNPLUG_ERROR event, which is going to
be deprecated by DEVICE_UNPLUG_GUEST_ERROR in 6.2.0. Given that
Libvirt wasn't handling the MEM_UNPLUG_ERROR event we don't need to
worry about it - adding support to DEVICE_UNPLUG_GUEST_ERROR will be
enough to cover all future cases.
This patch adds support to DEVICE_UNPLUG_GUEST_ERROR by adding the
minimal wiring required for Libvirt to be aware of it. The monitor
callback for this event will abort the pending removal operation of the
device reported by the "device" property of the event. Most of the heavy
lifting is already done by existing code that handles
QEMU_DOMAIN_UNPLUGGING_DEVICE_STATUS_GUEST_REJECTED, making our life
easier to abort the pending removal operation.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
After previous cleanups this callback is unused. Remove it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Currently, when opening an agent socket the qemuConnectAgent()
increments domain object refcounter and calls qemuAgentOpen()
where the domain object pointer is simply stored inside
_qemuAgent struct. If qemuAgentOpen() fails, then it clears @cb
member only to avoid qemuProcessHandleAgentDestroy() being called
(which decrements the domain object refcounter) and the domain
object refcounter is then decreased explicitly in
qemuConnectAgent().
The same result can be achieved with much cleaner code: increment
the refcounter inside qemuAgentOpen() and drop the dance around
@cb.
Also, the comment in qemuConnectAgent() about holding an extra
reference is not correct. The thread that called
qemuConnectAgent() already holds a reference to the domain
object. No matter how many time the object is locked and unlocked
the reference counter can't be decreased.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Just like qemuMonitorOpen(), hold the domain object locked
throughout the whole time of qemuConnectAgent() and unlock it
only for a brief time of actual connect() (because this is the
only part that has a potential of blocking).
The reason is that qemuAgentOpen() does access domain object
(well, its privateData) AND also at least one argument (@context)
depends on domain object. Accessing these without the lock is
potentially dangerous.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1845468#c12
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
During the vm rebooting, the vm could be paused if the libvirtd is
restarted for some reason, which is not expected. We need continue
fakereboot process if fakereboot flags is true and the vm is in
paused-user status.
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
During the vm rebooting, the vm could be shut down if the libvirtd is
restarted for some reason, which is not expected. We move set
fakereboot flags false after processing fakereboot over, so we can
ensure that fakereboot process have been executed.
Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This is a typical example of what can go wrong when sending out
an old patch. Back in January, when I was writing
qemuProcessHandleMemoryDeviceSizeChange() events were sent to the
worker pool thread using virThreadPoolSendJob(). Then, in July a
helper was introduced (qemuProcessEventSubmit()) but since my
code was not committed and I did not pay attention my code wasn't
updated. Later, when I merged my code it uses the old approach.
BTW: this also fixes a possible double free which I completely
missed when writing the code ~10 months ago.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Nobody's interested in the return value of any of
struct _qemuMonitorCallbacks callbacks. They are all void, but
domainMemoryDeviceSizeChange. Change it to void.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Libvirt will put the pid file of pr-helper to per-domain directory.
However, the ownership of the per-domain directory is the user to run
the QEMU process and the user has the write permission of the directory.
If VM escape occurs, the attacker can
1. write arbitrary content to the pid file (if running QEMU using root),
then the attacker can kill any process by writing appropriate pid to
the pid file;
2. spoof the pid file (if running QEMU using a regular user), then the
pr-helper process will never be cleared even if the VM is destroyed.
So, move the pid file of pr-helper from per-domain directory to
stateDir.
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Fill in the effective boot index for network devices (or hostdev-backed
network devices via 'qemuProcessPrepareDeviceBootorder'. This patch
doesn't clean up the cruft to make it more obvious what's happening.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rename it to 'qemuProcessPrepareDeviceBootorder' and call it from
'qemuProcessPrepareDomain' rather than
'qemuProcessPrepareDomainStorage'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'effectiveBootIndex' is a copy of 'bootIndex' if '<boot order=' was
present and left unassigned if not. This allows hypervisor drivers to
reinterpret <os><boot> without being visible in the XML.
QEMU driver had a internal implementation for disks, which is now
replaced. Additionally this will simplify a refactor of network boot
assignment.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We commonly use 'props' for the JSON object describing something. Rename
the monitor device addition code.
Additionally the common approach is to clear the pointer if it was
consumed so the arguments are adjusted to do so.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The -netdev formatter code switched to a real virQEMUCaps flag so we can
remove the old flags which used to enable JSON for -netdev for
validation purposes.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reporting how much memory is exposed to the guest happens under
<currentMemory/> which is taken from def->mem.cur_balloon. The
reported amount should account for both balloon size and the sum
of @currentsize of all virtio-mems. For instance, if domain has
4GiB via balloon and additional 2GiB via virtio-mem, then the
domain XML should report 6GiB. The same applies for domain
statistics.
The way to achieve this is to account for either balloon or
virtio-mem when the size of the other is changed, e.g. on balloon
change we have to add all @currentsize (for non virtio-mem these
will be zero, so the check for memory model is needless, but
makes it more obvious what's happening), and vice versa.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If the QEMU driver restarts it loses the track of the current size
of virtio-mem (because it's runtime type of information and thus
not stored in XML) and therefore, we have to refresh it when
reconnecting to the domain monitor.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
As advertised in previous commit, this event is delivered to us
when virtio-mem module changes the allocation inside the guest.
It comes with one attribute - size - which holds the new size of
the virtio-mem (well, allocated size), in bytes.
Mind you, this is not necessarily the same number as 'requested
size'. It almost certainly will be when sizing the memory up, but
it might not be when sizing the memory down - the guest kernel
might be unable to free some blocks.
This current size is reported in the domain XML as an output
element only.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The virtio-mem is paravirtualized mechanism of adding/removing
memory to/from a VM. A virtio-mem-pci device is split into blocks
of equal size which are then exposed (all or only a requested
portion of them) to the guest kernel to use as regular memory.
Therefore, the device has two important attributes:
1) block-size, which defines the size of a block
2) requested-size, which defines how much memory (in bytes)
is the device requested to expose to the guest.
The 'block-size' is configured on command line and immutable
throughout device's lifetime. The 'requested-size' can be set on
the command line too, but also is adjustable via monitor. In
fact, that is how management software places its requests to
change the memory allocation. If it wants to give more memory to
the guest it changes 'requested-size' to a bigger value, and if it
wants to shrink guest memory it changes the 'requested-size' to a
smaller value. Note, value of zero means that guest should
release all memory offered by the device. Of course, guest has to
cooperate. Therefore, there is a third attribute 'size' which is
read only and reflects how much memory the guest still has. This
can be different to 'requested-size', obviously. Because of name
clash, I've named it 'current' and it is dealt with in future
commits (it is a runtime information anyway).
In the backend, memory for virtio-mem is backed by usual objects:
memory-backend-{ram,file,memfd} and their size puts the cap on
the amount of memory that a virtio-mem device can offer to a
guest. But we are already able to express this info using <size/>
under <target/>.
Therefore, we need only two more elements to cover 'block-size'
and 'requested-size' attributes. This is the XML I've came up
with:
<memory model='virtio-mem'>
<source>
<nodemask>1-3</nodemask>
<pagesize unit='KiB'>2048</pagesize>
</source>
<target>
<size unit='KiB'>2097152</size>
<node>0</node>
<block unit='KiB'>2048</block>
<requested unit='KiB'>1048576</requested>
</target>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memory>
I hope by now it is obvious that:
1) 'requested-size' must be an integer multiple of
'block-size', and
2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
address.
Then there is a limitation that the minimal 'block-size' is
transparent huge page size (I'll leave this without explanation).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When action for 'on_poweroff' is set to 'restart', 'fake reboot'
is triggered and qemu shutdown state is transient. Domain state
need not to be changed and events not sent in this case.
Fixes: 4ffc807214
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We don't need to propagate all public flags, only the information
about the presence of the validation one, which can differ from
function to function. This patch makes it easier and more
readable in case of a future additions of validation flags.
This change was suggested by Daniel.
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'-qmp' in this case behaves the same as '-chardev' so it should have
been converted the same way as others were in 43c9c0859f since
short options are deprecated.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Split out the logic which was used to determine whether qemu should
allow the guest OS to reboot for QEMU versions which don't support the
'set-action' QMP command.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
In cases when we are adding a <transient/> disk with sharing backend
(and thus hotplugging it) we need to re-initialize ACPI tables so that
the VM boots from the correct device.
This has a side-effect of emitting the RESET event and forwarding it to
the clients which is not correct.
Fix this by ignoring RESET events during startup of the VM.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We don't use the value of the flag when the new handling is in place so
we don't have to initialize it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The '-no-shutdown' flag prevents qemu from terminating if a shutdown was
requested. Libvirt will handle the termination of the qemu process
anyways and using this consistently will allow greater flexibility for
the virDomainSetLifecycleAction API as well as will allow using
the 'system-reset' QMP command during startup to reinitiate devices
exported to the firmware.
This efectively partially reverts 0e034efaf9
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Rather than using '-no-reboot' use the QMP command to update the
lifecycle action of 'on_reboot'.
This will be identical to how we set the behaviour during lifetime and
also avoids problems with use of the 'system-reset' QMP command during
bringup of the VM (used to update the firmware table of disks when disks
were hotplugged as part of startup).
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The RESET event is delivered by qemu only when the guest OS is actually
allowed to reboot ('-no-reboot' or equivalent is not used) and due to
the nature of async handling of the events VM is actually already
executing guest code after the reboot, until our code gets to killing
it.
In general it should have been impossible to reach a state where the
reboot action is 'destroy' but we didn't use '-no-reboot' but due to
various bugs it was.
Due to the fact that this was not a desired operation and additionally
guest code already is executing I think the best option is not to kill
the VM any more (possible data loss?) and rely for the proper fix where
we use the new 'set-action' QMP command to enable an equivalent
behaviour to '-no-reboot' during runtime.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Directly use 'priv->allowReboot' as we now document what the behaiour is
to avoid another lookup.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We simply terminate qemu instead of issuing a reset as the semantics of
the setting dictate.
Fix it by handling it identically to 'fake reboot'.
We need to forbid the combination of 'onReboot' -> 'destroy' and
'onPoweroff' -> reboot though as the handling would be hairy and it
honetly makes no sense.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch also includes propagation of flags into the
virNetworkDefParse().
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
All supported QEMU versions have all the fields so we can remove the
booleans controlling which fields are used on the monitor.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Make it more obvious that we care about passing FDs on the commandline
before startup of qemu, which is used to avoid startup monitor polling.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Libvirt assumes that a SCSI bus can fit up to 8 devices
(including controller itself), except for so called wide bus
which can accommodate up to 16 devices (again, including
controller). This plays important role when computing 'drive'
address in virDomainDiskDefAssignAddress(). So far, the only
driver that enables wide SCSI bus is VMX. But with newer
releases, ESX is capable of "super wide" bus (64 devices).
We can blindly bump the limit in our code because then we would
compute address that's invalid for older ESX versions that we
still want to support.
Unfortunately, I haven't found a better place where to store this
than virDomainDef.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We always process the full list so there's no value in storing the count
separately.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'bootHotplug' can be auto-freed when terminating the function and moving
the declaration of 'vcpuprops' to the loop which uses it along with
automatic freeing allows us to simplify cleanup in certain cases.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use virAppendElement instead of virInsertElementsN to implement
VIR_APPEND_ELEMENT which allows us to remove error handling as the
only relevant errors were removed when switching to aborting memory
allocation functions.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add a version of virPidFileForceCleanupPath that takes
a 'group' bool argument and propagate it all the way
down to virProcessKillPainfullyDelay.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The submission of the event to the helper thread has a verbose cleanup
path which was duplicated in all the event handlers. Simplify it by
extracting the code into a helper named 'qemuProcessEventSubmit' and
reuse it where appropriate.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
The removed error messages are impossible as the enum values are
converted via VIR_ENUM helpers and guarded by compiler checks.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
It is also impossible for @info to be non-NULL in the cleanup section so
the cleanup can be completely removed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Change the callback prototype and fix the callback registered in the
process code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add launch security type 's390-pv' as well as some tests.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Adding virDomainSecDef for general launch security data
and moving virDomainSEVDef as an element for SEV data.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signaling the condition before vm->def->id is reset to -1 is dangerous:
in case a waiting thread wakes up, it does not see anything interesting
(the domain is still marked as running) and just enters virDomainObjWait
where it waits forever because the condition will never be signalled
again.
Originally it was impossible to get into such situation because the vm
object was locked all the time between signaling the condition and
resetting vm->def->id, but after commit 860a999802 released in 6.8.0,
qemuDomainObjStopWorker called in qemuProcessStop between
virDomainObjBroadcast and setting vm->def->id to -1 unlocks the vm
object giving other threads a chance to wake up and possibly hang.
In real world, this can be easily reproduced by killing, destroying, or
just shutting down (from the guest OS) a domain while it is being
migrated somewhere else. The migration job would never finish.
So let's make sure we delay signaling the domain condition to the point
when a woken up thread can detect the domain is not active anymore.
https://bugzilla.redhat.com/show_bug.cgi?id=1949869
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If the attempt to attach a device failed, we erased the
unattached device from the namespace. This resulted in erasing an
already attached device in case of a duplicate. We need to check
for existing file in the namespace in order to determine erasing
it in case of a failure.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1780508
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Remember whether the user passed an explicit index when registering the
event so that we can avoid the top level event when it isn't needed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When qos is set or delete, we have to check if the port is an ovs managed
port. If true, call the virNetDevOpenvswitchInterfaceSetQos function when qos
is set, and call the virNetDevOpenvswitchInterfaceClearQos function when
the interface is to be destroyed.
Signed-off-by: Jinsheng Zhang <zhangjl02@inspur.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The NVRAM label is set in qemuSecuritySetAllLabel(). There's no
need to set its label upfront. In fact, setting it twice creates
an imbalance because it's unset only once which mangles seclabel
remembering. However, plain removal of the
qemuSecurityDomainSetPathLabel() undoes the fix for the original
bug (when dynamic ownership is off then the NVRAM is not created
with cfg->user and cfg->group but as root:root). Therefore, we
have to switch to virFileOpenAs() and pass cfg->user and
cfg->group and VIR_FILE_OPEN_FORCE_OWNER flag. There's no need to
pass VIR_FILE_OPEN_FORCE_MODE because the file will be created
with the proper mode.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1969347
Fixes: bcdaa91a27
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
QEMU_DOMAIN_DISK_PRIVATE(disk)->transientOverlayCreated flag
gets true unexpectedly on qemuProcessSetupDisksTransientSnapshot() when
the disk has <transient shareBacking='yes'> option.
The flag should be enabled on qemuDomainAttachDiskGeneric() after the
overlay setup is completed.
Skip enabling transientOverlayCreated for the disk here.
Fixes: 75871da0ec
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Implement this behaviour by skipping the disks on traditional
commandline and hotplug them before resuming CPUs. That allows to use
the support for hotplugging of transient disks which inherently allows
sharing of the backing image as we open it read-only.
This commit implements the validation code to allow it only with buses
supporting hotplug and the hotplug code while starting up the VM.
When we have such disk we need to issue a system-reset so that firmware
tables are regenerated to allow booting from such device.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
In preparation for hotplug of <transient> disks we'll need to track
whether the overlay file was created individually per-disk.
Add 'transientOverlayCreated' to 'struct _qemuDomainDiskPrivate' and
remove 'inhibitDiskTransientDelete' from 'qemuDomainObjPrivate' and
adjust the code for the change.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The logic assigning the bootindices from the legacy boot order
configuration was spread through the command line formatters for the
disk device and for the floppy controller.
This patch adds 'effectiveBootindex' property to the disk private data
which holds the calculated boot index and moves the logic of determining
the boot index into 'qemuProcessPrepareDomainDiskBootorder' called from
'qemuProcessPrepareDomainStorage'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Later patches will implement sharing of the backing file, so we'll need
to be able to discriminate the overlays per VM.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The code deals with the startup of the VM and just uses the snapshot
code to achieve the desired outcome.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Creating the overlay for the disk is needed when starting a new VM only.
Additionally for now migration with transient disks is forbidden
anyways.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
None of them are currently needed to pass our upstream CI, most were
either for ancient clang versions or coverity for silencing false
positives.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
They were added mostly randomly and we don't really want to keep working
around of false positives.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Previously, nvram file was created with user/group owner as
'root', rather than specifications defined in libvirtd.conf. The
solution is to call qemuDomainOpenFile(), which creates file with
defined permissions and qemuSecurityDomainSetPathLabel() to set
security label for created nvram file.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1783255
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When building a commandline for a DIMM memory device with
non-default access mode, the qemuBuildMemoryBackendProps() will
tell QEMU to allocate memory from per-domain memory backing dir.
But later, when preparing the host, the
qemuProcessNeedMemoryBackingPath() does not check for memory
devices at all resulting in per-domain memory backing dir not
being created which upsets QEMU.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1961114
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The aim of this function is to return whether domain definition
and/or memory device that user intents to hotplug needs a private
path inside cfg->memoryBackingDir. The rule for the memory device
that's being hotplug includes checking whether corresponding
guest NUMA node needs memoryBackingDir. Well, while the rationale
behind makes sense it is not necessary to check for that really -
just a few lines above every guest NUMA node was checked exactly
for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The aim of qemuProcessNeedHugepagesPath() is to return whether
guest needs private path inside HugeTLBFS mounts (deducted from
domain definition @def) or whether the memory device that user is
hotplugging in needs the private path (deducted from the @mem
argument). The actual creation of the path is done in the only
caller qemuProcessBuildDestroyMemoryPaths().
The rule for the first case (@def) and the second case (@mem) is
the same (domain has a DIMM device that has HP requested) and is
written twice. Move the logic into a function to deduplicate the
code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When placing vCPUs into CGroups the qemuProcessSetupPid() is
called which then enters a for() loop (around its middle) where
it calls virDomainNumaGetNodeCpumask() for each guest NUMA node.
But the latter returns only a pointer not new reference/copy and
thus the caller must not free it. But the variable is decorated
with g_autoptr() which leads to a double free.
Fixes: 2d37d8dbc9
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Historically, we declared pointer type to our types:
typedef struct _virXXX virXXX;
typedef virXXX *virXXXPtr;
But usefulness of such declaration is questionable, at best.
Unfortunately, we can't drop every such declaration - we have to
carry some over, because they are part of public API (e.g.
virDomainPtr). But for internal types - we can do drop them and
use what every other C project uses 'virXXX *'.
This change was generated by a very ugly shell script that
generated sed script which was then called over each file in the
repository. For the shell script refer to the cover letter:
https://listman.redhat.com/archives/libvir-list/2021-March/msg00537.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
virQEMUCapsGet checks for qemuCaps itself, no need to do it explicitly.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
When the backup job is terminated normally the security label is
restored by the blockjob finishing handler.
If the VM dies or is destroyed that wouldn't happen as the blockjob
handler wouldn't be called.
Restore the security label on disk store where we remember that the job
was running at the point when 'qemuBackupJobTerminate' was called.
Not resetting the security label means that we also leak the xattr
attributes remembering the label which prevents any further use of the
file, which is a problem for block devices.
This also requires that the call to 'qemuBackupJobTerminate' from
'qemuProcessStop' happens only after 'vm->pid' was reset as otherwise
the security subdrivers attempt to enter the process namespace which
fails if the process isn't running any more.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1939082
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When connecting to the monitor, a timeout is calculated that is
bigger the more memory guest has (because QEMU has to allocate
and possibly zero out the memory and what not, empirically
deducted). However, when computing the timeout the @total_memory
mmember is accessed directly even though
virDomainDefGetMemoryTotal() should have been used.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'qemuBackupJobTerminate' needs the API flags to see whether
VIR_DOMAIN_BACKUP_BEGIN_REUSE_EXTERNAL. Unfortunately when called via
qemuProcessReconnect()->qemuProcessStop() early (e.g. if the qemu
process died while we were reconnecting) the job is cleared temporarily
so that other APIs can be called. This would mean that we couldn't clean
up the files in some cases.
Save the 'apiFlags' inside the backup object and set it from the
'qemuDomainJobObj' 'apiFlags' member when reconnecting to a VM.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The current code is written under the assumption that, for all
limits except the core size, asking for the limit to be set to
zero is a no-op, and so the operation is performed
unconditionally.
While this is the behavior we want for the QEMU driver, the
virCommand and virProcess facilities are generic, and should not
implement this kind of policy: asking for a limit to be set to
zero should result in that limit being set to zero every single
time.
Add some checks in the QEMU driver, effectively moving the
policy where it belongs.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
qemuProcessLaunch() is the correct place to set process limits,
and in fact is where we were dealing with almost all of them,
but the memory locking limit was handled in
qemuBuildCommandLine() instead for some reason.
The code is rewritten so that the desired limit is calculated
and applied in separated steps, which will help with further
changes, but this doesn't alter the behavior.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Doing this now will make the next changes nicer.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The function is constructing an error message from a prefix and the
contents of the qemu log file. Marking just two string modifiers as
translatable is pointless and will certainly confuse translators.
Remove the marking and add a comment which bypasses the
sc_libvirt_unmarked_diagnostics check.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that error message formatting doesn't use fixed size buffers we can
drop the math for calculating the maximum chunk of log to report in the
error message and use a round number. This also makes it obvious that
the chosen number is arbitrary.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Generated using the following spatch:
@@
expression path;
@@
- virFileMakePath(path)
+ g_mkdir_with_parents(path, 0777)
However, 14 occurrences were not replaced, e.g. in
virHostdevManagerNew(). I don't really understand why.
Fixed by hand afterwards.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
These functions are identical. Made using this spatch:
@@
expression path, mode;
@@
- virFileMakePathWithMode(path, mode)
+ g_mkdir_with_parents(path, mode)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Using the job owner API name directly works fine as long as it is a
static string or the owner's thread is still running. However, this is
not always the case. For example, when the owner API name is filled in a
job when we're reconnecting to existing domains after daemon restart,
the dynamically allocated owner name will disappear with the
reconnecting thread. Any follow up usage of the pointer will read random
memory.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
qemuMonitorUnregister will be called in multiple threads (e.g. threads
in rpc worker pool and the vm event thread). In some cases, it isn't
protected by the monitor lock, which may lead to call g_source_unref
more than one time and a use-after-free problem eventually.
Add the missing lock in qemuProcessHandleMonitorEOF (which is the only
position missing lock of monitor I found).
Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch partially reverts commit 5cde9dee where the qemuExtDevicesStop()
was moved to a location before the QEMU process is stopped. It may be
alright to tear down some devices before QEMU is stopped, but it doesn't work
for the external TPM (swtpm) which assumes that QEMU sends it a signal to stop
it before libvirt may try to clean it up. So this patch moves the
virFileDeleteTree() calls after the call to qemuExtDevicesStop() so that the
pid file of virtiofsd is not deleted before that call.
Afftected libvirt versions are 6.10 and 7.0.
Fixes: 5cde9dee8c
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
These messages are only valid while the domain is running.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Our implementation was inspired by glib anyways. The difference is only
the order of arguments.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The glib variant doesn't accept NULL list, but there's just one caller
where it wasn't checked explicitly, thus there's no need for our own
wrapper.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use automatic memory freeing and remove the 'cleanup' label. Also make
it a bit more obvious that nothing happens if the 'old' list wasn't
present.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The "max" model can be treated the same way as "host" model in general.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Up until now we had a runtime code and XML related code in the same
source file inside util directory.
This patch takes the runtime part and extracts it into the new
storage_file directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that this function can be called regardless of interface type (and
whether or not we have a conn for the network driver), let's actually
call it for all interface types. This will assure that we re-connect
any disconnected bridge devices for <interface type='bridge'> as
mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1730084#c26
(until now we've only been reconnecting bridge devices for <interface
type='network'>)
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Some secdrivers (typically SELinux driver) generate unique
dynamic seclabel for each domain (unless a static one is
requested in domain XML). This is achieved by calling
qemuSecurityGenLabel() from qemuProcessPrepareDomain() which
allocates unique seclabel and stores it in domain def->seclabels.
The counterpart is qemuSecurityReleaseLabel() which releases the
label and removes it from def->seclabels. Problem is, that with
current code the qemuProcessStop() may still want to use the
seclabel after it was released, e.g. when it wants to restore the
label of a disk mirror.
What is happening now, is that in qemuProcessStop() the
qemuSecurityReleaseLabel() is called, which removes the SELinux
seclabel from def->seclabels, yada yada yada and eventually
qemuSecurityRestoreImageLabel() is called. This bubbles down to
virSecuritySELinuxRestoreImageLabelSingle() which find no SELinux
seclabel (using virDomainDefGetSecurityLabelDef()) and this
returns early doing nothing.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1751664
Fixes: 8fa0374c5b
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The kernel refuses to set guest TSC frequency less than a minimum
frequency or greater than maximum frequency (both computed based on the
host TSC frequency). When writing the libvirt code with a reversed logic
(return success when the requested frequency falls within the tolerance
interval) I forgot to include the boundaries.
Fixes: d8e5b45600https://bugzilla.redhat.com/show_bug.cgi?id=1839095
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When starting a VM with an empty cdrom which has <iotune> configured the
startup fails as qemu is not happy about setting tuning for an empty
drive:
error: internal error: unable to execute 'block_set_io_throttle', unexpected error: 'Device has no medium'
Resolve this by skipping the setting of throttling for empty drives and
updating the throttling when new medium is inserted into the drive.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/111
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Simplify ReserveName/GenerateName for macvlan and macvtap by using
common functions.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Simplify GenerateName/ReserveName for netdevtap by using common
functions.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Move virDomainDeviceDefValidate() and all its helper functions to
domain_validate.c.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
virDomainDefPostParse infrastructure has apart from the global opaque
data also per-run data, but this was not duplicated into the validation
callbacks.
This is important when drivers want to use correct run-state for the
validation.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Nodename may be asociated to a disk backup job, add support to looking
up in that chain too. This is specifically useful for the
BLOCK_WRITE_THRESHOLD event which can be registered for any nodename.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In v6.0.0-rc1~439 (and friends) we tried to cache NUMA
capabilities because we assumed they are immutable. And to some
extent they are (NUMA hotplug is not a thing, is it). However,
our capabilities contain also some runtime info that can change,
e.g. hugepages pool allocation sizes or total amount of memory
per node (host side memory hotplug might change the value).
Because of the caching we might not be reporting the correct
runtime info in 'virsh capabilities'.
The NUMA caps are used in three places:
1) 'virsh capabilities'
2) domain startup, when parsing numad reply
3) parsing domain private data XML
In cases 2) and 3) we need NUMA caps to construct list of
physical CPUs that belong to NUMA nodes from numad reply. And
while this may seem static, it's not really because of possible
CPU hotplug on physical host.
There are two possible approaches:
1) build a validation mechanism that would invalidate the
cached NUMA caps, or
2) drop the caching and construct NUMA caps from scratch on
each use.
In this commit, the latter approach is implemented, because it's
easier.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1819058
Fixes: 1a1d848694
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
qemuBuildCommandLine() is calling qemuDomainAlignMemorySizes(),
which is an operation that changes live XML and domain and has
little to do with the command line build process.
Move it to qemuProcessPrepareDomain() where we're supposed to
make live XML and domain changes before launch. qemuProcessStart()
is setting VIR_QEMU_PROCESS_START_NEW if !migrate && !snapshot,
same conditions used in qemuBuildCommandLine() to call
qemuDomainAlignMemorySizes(), making this change seamless.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
qemuProcessCreatePretendCmdPrepare() is setting the
VIR_QEMU_PROCESS_START_NEW regardless of whether this is
a migration case or not. This behavior differs from what we're
doing in qemuProcessStart(), where the flag is set only
if !migrate && !snapshot.
Fix it by making the flag setting consistent with what we're
doing in qemuProcessStart().
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Let's pass along / fill @niothreads rather than trying to make dual
use as a return value and thread count.
This resolves a Coverity issue detected in qemuDomainGetIOThreadsMon
where if qemuDomainObjExitMonitor failed, then a -1 was returned and
overwrite @niothreads causing a memory leak.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Glib provides g_auto(GStrv) which is in-place replacement of our
VIR_AUTOSTRINGLIST.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Some CPUs provide a way to read exact TSC frequency, while measuring it
is required on other CPUs. However, measuring is never exact and the
result may slightly differ across reboots. For this reason both Linux
kernel and QEMU recently started allowing for guests TSC frequency to
fall into +/- 250 ppm tolerance interval around the host TSC frequency.
Let's do the same to avoid unnecessary failures (esp. during migration)
in case the host frequency does not exactly match the frequency
configured in a domain XML.
https://bugzilla.redhat.com/show_bug.cgi?id=1839095
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
A qemu guest which has virtiofs config fails to start if the previous
starting failed because of invalid option or something.
That's because the virtiofsd isn't killed by virPidFileForceCleanupPath()
on the former failure because the pidfile was already removed by
virFileDeleteTree(priv->libDir) in qemuProcessStop(), so
virPidFileForceCleanupPath() just returned.
Move qemuExtDevicesStop() before virFileDeleteTree(priv->libDir) so that
virPidFileForceCleanupPath() can kill virtiofsd correctly.
For example of the reproduction:
# virsh start guest
error: Failed to start domain guest
error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -foo: invalid option
... fix the option ...
# virsh start guest
error: Failed to start domain guest
error: Cannot open log file: '/var/log/libvirt/qemu/guest-fs0-virtiofsd.log': Device or resource busy
#
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Don't hide our use of GHashTable behind our typedef. This will also
promote the use of glibs hash function directly.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Matt Coleman <matt@datto.com>
Currently all errors from qemuInterfacePrepareSlirp() are completely
ignored by the callers. The intention is that missing qemu-slirp binary
should cause the caller to fallback to the built-in slirp impl.
Many of the possible errors though should indeed be considered fatal.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Since QEMU 5.2 (commit-77b285f7f6), QEMU supports 'memory failure'
event, posts event to monitor if hitting a hardware memory error.
Fully support this feature for QEMU.
Test with commit 'libvirt: support memory failure event', build a
little complex environment(nested KVM):
1, install newly built libvirt in L1, and start a L2 vm. run command
in L1:
~# virsh event l2 --event memory-failure
2, run command in L0 to inject MCE to L1:
~# virsh qemu-monitor-command l1 --hmp mce 0 9 0xbd000000000000c0 0xd 0x62000000 0x8c
Test result in l1(recipient hypervisor case):
event 'memory-failure' for domain l2:
recipient: hypervisor
action: ignore
flags:
action required: 0
recursive: 0
Test result in l1(recipient guest case):
event 'memory-failure' for domain l2:
recipient: guest
action: inject
flags:
action required: 0
recursive: 0
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
All users of virHashTable pass strings as the name/key of the entry.
Make this an official requirement by turning the variables to 'const
char *'.
For any other case it's better to use glib's GHashTable.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Use of the -enable-fips option is being deprecated in QEMU >= 5.2.0. If
FIPS compliance is required, QEMU must be built with libcrypt which will
unconditionally enforce it.
Thus there is no need for libvirt to pass -enable-fips to modern QEMU.
Unfortunately there was never any way to probe for -enable-fips in the
first instance, it was enabled by libvirt based on version number
originally, and then later unconditionally enabled when libvirt dropped
support for older QEMU. Similarly we now use a version number check to
decide when to stop passing -enable-fips.
Note that the qemu-5.2 capabilities are currently from the pre-release
version and will be updated once qemu-5.2 is released.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This patch adds new schema and adds support for parsing and formatting
domain configurations that include vdpa devices.
vDPA network devices allow high-performance networking in a virtual
machine by providing a wire-speed data path. These devices require a
vendor-specific host driver but the data path follows the virtio
specification.
When a device on the host is bound to an appropriate vendor-specific
driver, it will create a chardev on the host at e.g. /dev/vhost-vdpa-0.
That chardev path can then be used to define a new interface with
type='vdpa'.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
SCSI hostdev setup requires querying the host os for the actual path of
the configured hostdev. This was historically done in the command line
formatter. Our new approach is to split out this part into
'qemuProcessPrepareHost' which is designed to be skipped in tests.
Refactor the hostdev code to use this new semantics, and add appropriate
handlers filling in the data for tests and the qemuConnectDomainXMLToNative
users.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Host preparation steps which are deliberately skipped when
pretend-creating a commandline are normally executed after VM object
preparation. In the test code we are faking some of the host
preparation steps, but we were doing that prior to the call to
qemuProcessPrepareDomain embedded in qemuProcessCreatePretendCmd.
By splitting up qemuProcessCreatePretendCmd into two functions we can
ensure that the ordering of the prepare steps stays consistent.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
These XML attributes have been mandatory since the introduction of SEV
support to libvirt. This design decision was based on QEMU's
requirement for these to be mandatory for migration purposes, as
differences in these values across platforms must result in the
pre-migration checks failing (not that migration with SEV works at the
time of this patch).
This patch enables autofill of these attributes right before launching
QEMU and thus updating the live XML.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Checks such as this one should be done at domain def validation time,
not before starting the QEMU process.
As for this change, existing domains will see some QEMU error when
starting as opposed to a libvirt error that this QEMU binary doesn't
support SEV, but that's okay, we never guaranteed error messages to
remain the same.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
As preparation for g_autoptr() we need to change the function to take
only virCgroupPtr.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Move them to separate conditions to reduce churn
in following patches.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Add overlays after the VM starts before we start executing guest code.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
Later patches will implement support for <transient/> disks in libvirt
by installing an overlay on top of the configured image. This will
require cleanup after the VM will be stopped so that the state is
correctly discarded.
Since the overlay will be installed only during the startup phase of the
VM we need to ensure that qemuProcessStop doesn't delete the original
file on some previous failure. This is solved by adding
'inhibitDiskTransientDelete' VM private data member which is set prior
to any startup step and will be cleared once transient disk overlays are
established.
Based on that we can then delete the overlays for any <transient/> disk.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
Historically we've prepared secrets for all objects in one place. This
doesn't make much sense and it's semantically more appealing to prepare
everything for a single device type in one place.
Move the setup of the (iSCSI|SCSI) hostdev secrets into a new function
which will be used to setup other things as well in the future.
This is a similar approach we do for disks.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
After we started copying the privateData pointer in
qemuDomainObjRestoreJob, we should also free them
once we're done with them.
Register the clear function and use g_auto.
Also add a check for job->cb to qemuDomainObjClearJob,
to prevent freeing an uninitialized job.
https://bugzilla.redhat.com/show_bug.cgi?id=1878450
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: aca37c3fb2
When starting a domain, qemuProcessLaunch() iterates over all
VIR_PERF_EVENT_* values and (possibly) enables them. While there
is nothing wrong with the code, the for loop where it's done makes
it harder to jump onto next block of code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This is just a clean-up of commit 3791f29b08 using the new parameter of
virProcessSetAffinity() introduced in commit 9514e24984 so that there is
no error reported in the logs.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>