It has nothing to do with assigning addresses, so it makes more
sense to have it in qemu_domain.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
After v9.1.0-rc1~116 we track whether it was us who created a
macvtap or not. But when updating a vNIC its definition might be
replaced with a new one (though ifname is not allowed to
change), e.g. to reflect new QoS, link state, etc.
Now, whether we created the macvtap for a given vNIC is stored
in net->privateData->created. And replacing the definition is done by
simply freeing the old definition and making the pointer point to
the new one. But this does not preserve the 'created' flag, which
in turn means that when a domain is shut off, the macvtap is not
removed (see the loop inside of qemuProcessStop()).
Copy this flag into the new definition and leave a note in the
_qemuDomainNetworkPrivate struct.
Fixes: 61d1b9e6592660121aeda66bf7adbcd39de06aa8
Resolves: https://issues.redhat.com/browse/RHEL-22714
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
It belongs next to qemuDomainSupportsPCI().
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The way the function is currently written sort of obscures this
fact, but ultimately we already unconditionally assume PCI
support on most architectures.
Arm and RISC-V need some additional checks to maintain
compatibility with existing configurations, but for all future
architectures, such as the upcoming LoongArch64, we expect PCI
support to come out of the box.
Last but not least, the function is made const-correct.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For all versions of QEMU that we support, the virt machine type
has a hard dependency on this device, so we can stop checking
whether the capability is present and just use it unconditionally.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
QEMU's blockdev-mirror job doesn't allow copying into a destination
which isn't exactly the same size as the source. This is a problem for
non-shared-storage migration when migrating into a raw block device,
as in that case it's very hard to ensure that the destination size will
match the source size.
Rather than failing the migration, we can add a storage slice in such
a case automatically and thus make the migration pass.
To do this we need to probe the size of the block device on the
destination and, if it differs from the size detected on the source,
we install the 'slice'.
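For illustration, a minimal standalone sketch of how the destination
block device's size can be probed (the device path is a placeholder;
libvirt's own storage helpers do the equivalent internally):

  #include <fcntl.h>
  #include <linux/fs.h>   /* BLKGETSIZE64 */
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      const char *path = argc > 1 ? argv[1] : "/dev/sdb"; /* placeholder */
      uint64_t size = 0;
      int fd = open(path, O_RDONLY);

      if (fd < 0) {
          perror("open");
          return 1;
      }

      /* lseek(fd, 0, SEEK_END) would work too; BLKGETSIZE64 asks the
       * kernel for the block device size directly */
      if (ioctl(fd, BLKGETSIZE64, &size) < 0) {
          perror("BLKGETSIZE64");
          close(fd);
          return 1;
      }

      printf("%s: %llu bytes\n", path, (unsigned long long)size);
      close(fd);
      return 0;
  }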
Additional handling is required when persisting the VM, as we want to
propagate the slice there as well to ensure that the device sizes won't
change.
Resolves: https://issues.redhat.com/browse/RHEL-4607
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Move qemuDomainStorageUpdatePhysical, qemuDomainStorageOpenStat,
qemuDomainStorageCloseStat to qemu_domain.c and export them. They'll be
reused in the migration code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Return whether a relevant cachemode was present rather than returning
an error, so that callers can be simplified. Use the proper enum type
as the argument rather than typecasting in the switch statement.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Adds the ability to monitor the nbdkit process so that we can take
action in case the child exits unexpectedly.
When the nbdkit process exits, we pause the vm, restart nbdkit, and then
resume the vm. This allows the vm to continue working in the event of
an nbdkit failure.
Eventually we may want to generalize this functionality since we may
need something similar for e.g. qemu-storage-daemon, etc.
The process is monitored with the pidfd_open() syscall if it is
available (since Linux 5.3). Otherwise we fall back to checking once a
second whether the process is still alive. The one-second period was
chosen somewhat arbitrarily.
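A minimal standalone sketch of this monitoring approach (not the
actual code added here):

  #include <poll.h>
  #include <signal.h>
  #include <sys/syscall.h>
  #include <sys/types.h>
  #include <unistd.h>

  #ifndef SYS_pidfd_open
  # define SYS_pidfd_open 434   /* since Linux 5.3 */
  #endif

  /* Block until @pid exits: prefer a pidfd, which becomes readable when
   * the process terminates; otherwise probe liveness once a second. */
  static void wait_for_exit(pid_t pid)
  {
      int pidfd = syscall(SYS_pidfd_open, pid, 0);

      if (pidfd >= 0) {
          struct pollfd pfd = { .fd = pidfd, .events = POLLIN };

          poll(&pfd, 1, -1);
          close(pidfd);
          return;
      }

      while (kill(pid, 0) == 0)
          sleep(1);
  }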
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This will allow us to use it for nbdkit logging in upcoming commits.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Allow specifying a basename for the log file so that
qemuDomainLogContextNew() can be used to create log contexts for
secondary loggers.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
An object for storing information about an nbdkit process that is serving
a specific virStorageSource. At the moment, this information is just
stored in the private data of virStorageSource and not used at all.
Future commits will use this data to actually start a nbdkit process.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Support for legacy cpu hotplug was removed a long time ago. At this
point this function only checks whether the current machine type
supports cpu hotplug.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since its introduction in 4d1b771fbb610537b7425e649a490143588b8ed3
it has only been used to differentiate between START and non-START.
Last use of QEMU_DOMAIN_LOG_CONTEXT_MODE_ATTACH was removed by:
commit f709377301b919a9fcbfc366e33057f7848bee28
qemu: Fix qemuDomainObjTaint with virtlogd
QEMU_DOMAIN_LOG_CONTEXT_MODE_STOP is unused since:
commit cf3ea0769c54a328733bcb0cd27f546e70090c89
qemu: process: Append the "shutting down" message using the new APIs
Now, the only caller passes QEMU_DOMAIN_LOG_CONTEXT_MODE_START.
Assume that's always the case and remove the 'mode' argument.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We only need the domain definition from the domain object. This will allow
us to use it from snapshot code where we need to pass different domain
definition.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Consider a domain with two guest NUMA nodes and the following
<numatune/> setting:
<numatune>
<memory mode="strict" nodeset="0"/>
<memnode cellid="0" mode="strict" nodeset="1"/>
</numatune>
What this means is the emulator thread is pinned onto host NUMA
node #0 (by setting corresponding cpuset.mems to "0"), and two
memory-backend-* objects are created:
-object '{"qom-type":"memory-backend-ram","id":"ram-node0", .., "host-nodes":[1],"policy":"bind"}' \
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
-object '{"qom-type":"memory-backend-ram","id":"ram-node1", .., "host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \
Note, the emulator thread is pinned well before QEMU is even
exec()-ed.
Now, the way memory allocation works in QEMU is: the emulator
thread calls mmap() followed by mbind() (which is sane, that's
how everybody should do it). BUT, because the thread is already
restricted by CGroups to just NUMA node #0, calling:
mbind(host-nodes:[1]); /* made up syntax (TM) */
fails. This is expected though: the kernel was instructed to place
the memory on NUMA node "0" and yet the process is trying to place
it elsewhere.
We used to solve this by not restricting emulator thread at all
initially, and only after it's done initializing (i.e. we got the
QMP greeting) we placed it onto desired nodes. But this had its
own problems (e.g. QEMU might have locked pieces of its memory
which were then unable to migrate onto different NUMA nodes).
Therefore, in v5.1.0-rc1~282 we changed this and set the cgroups
upfront (even before exec()-ing QEMU). And this used to work, but
something has changed (I can't really put my finger on it).
Therefore, for the initialization start the thread with the union of
all configured host NUMA nodes ("0-1" in our example) and fix up the
placement only after QEMU is started.
NB, the memory hotplug suffers the same problem, but that will
be fixed in the next commit.
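To illustrate the failing mmap() + mbind() sequence outside of QEMU
(a standalone sketch; needs libnuma for the mbind() wrapper), run this
under a cgroup with cpuset.mems=0 and the bind to node 1 is rejected:

  #include <numaif.h>     /* mbind(); link with -lnuma */
  #include <stdio.h>
  #include <sys/mman.h>

  int main(void)
  {
      size_t len = 64 * 1024 * 1024;
      unsigned long nodemask = 1UL << 1;   /* host NUMA node #1 */
      void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

      if (mem == MAP_FAILED) {
          perror("mmap");
          return 1;
      }

      /* with cpuset.mems restricted to node 0 this fails */
      if (mbind(mem, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0) < 0)
          perror("mbind");

      return 0;
  }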
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanup, there's not a single caller that would
call qemuDomainGetMemLockLimitBytes() with @forceVFIO set. All
callers pass false.
Drop the unneeded argument from the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous cleanup, there's not a single caller that would
call qemuDomainAdjustMaxMemLock() with @forceVFIO set. All callers
pass false.
Drop the unneeded argument from the function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
During hotplug of an NVMe disk we need to adjust the memlock
limit. The computation of the limit is handled by
qemuDomainGetMemLockLimitBytes(), which looks at the given domain
definition and accounts for various device types (as different
types require different amounts). But during disk hotplug the
disk is not added to the domain definition until the very last
moment. Therefore, qemuDomainGetMemLockLimitBytes() has this
@forceVFIO argument which tells it to assume VFIO even if there
are no signs of VFIO in the domain definition. And this kind of
worked, until the amount needed for NVMe disks changed (in
v9.3.0-rc1~52). What's missing in that commit is making @forceVFIO
behave the same as if an NVMe disk were present in the
domain definition.
But, we can do even better - just mimic whatever we're doing for
hostdevs. IOW - introduce qemuDomainAdjustMaxMemLockNVMe() that
behaves the same as qemuDomainAdjustMaxMemLockHostdev().
There are subtle differences though:
1) qemuDomainAdjustMaxMemLockHostdev() can afford placing the hostdev
right at the end of vm->def->hostdevs, because the array was
already reallocated (at the beginning of
qemuDomainAttachHostPCIDevice()). But
qemuDomainAdjustMaxMemLockNVMe() doesn't have that luxury.
2) qemuDomainAdjustMaxMemLockHostdev() places a
virDomainHostdevDef pointer into the domain definition, while
qemuDomainStorageSourceAccessModifyNVMe() (which calls
qemuDomainAdjustMaxMemLock()) sees a virStorageSource pointer,
yet the domain definition contains virDomainDiskDefs. But that's
okay, we can create a dummy disk definition and append it to
the domain definition.
After this, qemuDomainAdjustMaxMemLock() can be called with
@forceVFIO = false, as the disk is now part of the domain definition
(when computing the new limit).
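For reference, a rough sketch of what such a helper could look like
(the helper names and macros are taken from the surrounding code; the
actual patch may differ in details):

  int
  qemuDomainAdjustMaxMemLockNVMe(virDomainObj *vm,
                                 virStorageSource *src)
  {
      g_autofree virDomainDiskDef *disk = NULL;
      int ret = 0;

      /* dummy disk wrapping @src so that the limit computation sees an
       * NVMe disk in the domain definition; @src is not owned by @disk */
      disk = g_new0(virDomainDiskDef, 1);
      disk->src = src;

      VIR_APPEND_ELEMENT_COPY(vm->def->disks, vm->def->ndisks, disk);

      if (qemuDomainAdjustMaxMemLock(vm, false) < 0)
          ret = -1;

      /* drop the dummy entry again */
      VIR_DELETE_ELEMENT_INPLACE(vm->def->disks, vm->def->ndisks - 1, vm->def->ndisks);

      return ret;
  }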
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2014030#c28
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After the cleanup done in v8.2.0-rc1~47 qemuDomainObjExitMonitor()
and after v8.7.0-rc1~176 qemuDomainObjEnterMonitor() lost the
@driver argument. But the corresponding ATTRIBUTE_NONNULL()
annotation was not removed and both functions are still annotated
with ATTRIBUTE_NONNULL(2) even though they accept just one
argument (@obj).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
This commit changes the _qemuDomainStorageSourcePrivate struct
to support multiple secrets (instead of the single one it held before).
This will be useful for storage encryption requiring more than a single
secret.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The set of if()-s that determines which cpumask is preferred when
setting things like emulatorpin, vcpupin, etc. is going to be
re-used. Separate it out into a function.
You may think that this changes behaviour, but
qemuProcessPrepareDomainNUMAPlacement() ensures that
priv->autoCpuset is set for VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
This can improve performance for some guests since it reduces copying of
display data between host and guest. Requires udmabuf on the host.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The function is used only inside qemu_domain.c; unexport it and move it
above its user.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
When a QEMU netdev is of type "stream", if the socket it uses for
connectivity to the host network gets closed, then QEMU will send a
NETDEV_STREAM_DISCONNECTED event. We know that any stream netdev we've
created is backed by a passt process, and if the socket was closed,
that means the passt process has disappeared.
When we receive this event, we can respond by starting a new passt
process with the same options (including socket path) we originally
used. If we have previously created the stream netdev device with a
"reconnect" option, then QEMU will automatically reconnect to this new
passt process. (If we hadn't used "reconnect", then QEMU will never
try to reconnect to the new passt process, so there's no point in
starting it.)
Note that NETDEV_STREAM_DISCONNECTED is an event sent for the netdev
(ie "host side") of the network device, and so it sends the
"netdev-id" to specify which device was disconnected. But libvirt's
virDomainNetDef (the object used to keep track of network devices) is
the internal representation of both the host-side "netdev" and the
guest-side device, and virDomainNetDef doesn't directly keep track of
the netdev-id, only of the device's "alias" (which is the "id"
parameter of the *guest* side of the device). Fortunately, by convention
libvirt always names the host-side of devices as "host" + alias, so in
order to search for the affected NetDef, all we need to do is trim the
first 4 characters from the netdev-id and look for the NetDef having
that resulting trimmed string as its alias. (Contrast this to
NIC_RX_FILTER_CHANGED, which is an event received for the guest side
of the device, and so directly contains the device alias.)
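A trivial standalone illustration of the naming convention (not the
driver code itself):

  #include <stdio.h>
  #include <string.h>

  /* map QEMU's netdev-id back to the guest-side alias by stripping the
   * "host" prefix that libvirt prepends by convention */
  static const char *netdev_id_to_alias(const char *netdev_id)
  {
      if (strncmp(netdev_id, "host", 4) != 0)
          return NULL;
      return netdev_id + 4;
  }

  int main(void)
  {
      /* NETDEV_STREAM_DISCONNECTED reports e.g. "hostnet7" ... */
      printf("%s\n", netdev_id_to_alias("hostnet7"));   /* ... alias is "net7" */
      return 0;
  }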
Resolves: https://bugzilla.redhat.com/2172098
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If a domain is configured to create a macvtap/macvlan but the
target link already exists, startup fails (as expected) with:
error: error creating macvtap interface test@eth0 (52:54:00:d9:0b:db): File exists
Okay, we could make that error message better, but that's not the
point. Since this error originated while generating cmd line
(the caller is qemuProcessStart(), transitively), the cleanup
after failed start is performed (qemuProcessStop()). Here,
virNetDevMacVLanDeleteWithVPortProfile() is called which removes
the macvtap interface we did not create (as it made us fail in
the first place).
Therefore, we need to track which macvtap/macvlan interfaces were
created successfully and remove only those.
You'll notice that only qemuProcessStop() has the new check. For
the (failed) hotplug case (qemuDomainAttachNetDevice()) such a
check is already in place (the @iface_connected variable), or it's
not needed (qemuDomainRemoveNetDevice() - we're removing an
interface that was already attached to QEMU).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2166235
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The field is no longer used so we can remove it and the code filling it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
This consists of (1) adding the necessary args to the qemu commandline
netdev option, and (2) starting a passt process prior to starting
qemu, and making sure that it is terminated when it's no longer
needed. Under normal circumstances, passt will terminate itself as
soon as qemu closes its socket, but in case of some error where qemu
is never started or fails to start up completely, we need to terminate
passt manually.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When starting up a VM with FD-passed images we need to look up the
corresponding named FD set and associate it with the virStorageSource
based on the name.
The association is brought into virStorageSource as the security
labelling code will need to access the FDs to perform SELinux labelling.
Similarly, when startup is complete, in certain cases we no longer need
to keep our copy of the FDs and thus can close them.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The new helper qemuDomainStartupCleanup is used to perform cleanup after
startup of a VM (successful or not). The initial implementation just
calls qemuDomainSecretDestroy, which can be un-exported.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Implement passing and storage of FDs for the qemu driver. The FD tuples
are g_object instances stored in a per-domain hash table and are
automatically removed once the connection is closed.
In the future we can also consider not tying the lifetime of the
passed FDs to the connection.
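The general pattern is roughly the following (a minimal GLib-only
sketch; the driver's actual types and table layout are not shown):
the value destroy function unrefs the stored object, so removing an
entry, or destroying the per-connection table when the connection
closes, releases the FDs it holds.

  #include <glib-object.h>

  int main(void)
  {
      /* name -> GObject holding the passed FDs (plain GObject as placeholder) */
      GHashTable *fds = g_hash_table_new_full(g_str_hash, g_str_equal,
                                              g_free, g_object_unref);

      g_hash_table_insert(fds, g_strdup("fdset-example"),
                          g_object_new(G_TYPE_OBJECT, NULL));

      /* removing an entry, or destroying the table, unrefs the object */
      g_hash_table_destroy(fds);
      return 0;
  }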
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
When the daemon is restarted and libvirt tries to recover domain jobs we
need to know if the snapshot job was a snapshot delete in order to
safely abort running QEMU block jobs.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Internal domain state needs to be refreshed after reset from the guest
side because it may be inconsistent with the internal qemu state.
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
While technically thread-context objects can be reused, we only
use them (well, will use them) to pin memory allocation threads.
Therefore, once we connect to the QEMU monitor, all memory (with
prealloc=yes) has already been allocated and thus these objects are no
longer needed and can be removed. For on-demand allocation the TC
object is left behind.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
SGX memory backend needs to access /dev/sgx_vepc (which allows
userspace to allocate "raw" EPC without an associated enclave)
and /dev/sgx_provision (which allows creating provisioning
enclaves). Allow these two devices in CGroups if the domain is
configured accordingly.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Haibin Huang <haibin.huang@intel.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Never remove the TPM state on outgoing migration if the storage setup
has shared storage for the TPM state files. Also, do not do the security
cleanup on outgoing migration if shared storage is detected.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add support for storing private TPM-related data. The first private
data will be related to a capability of the started swtpm, indicating
whether it is capable of migration with a shared storage setup, since
that requires support for certain command line flags that only became
available in v0.8.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The aim of this helper function is to spawn a child process in
which a new scheduling group is created. This dummy process will
then be used to distribute the scheduling group from (e.g. when
starting helper processes or QEMU itself). The process is not
needed for the QEMU_SCHED_CORE_NONE case (obviously) nor for the
QEMU_SCHED_CORE_VCPUS case (because in that case a slightly
different child will be forked off).
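The scheduling group itself is created via the kernel's core-scheduling
prctl(); a minimal standalone example (Linux >= 5.14; fallback constant
definitions in case older headers lack them):

  #include <stdio.h>
  #include <sys/prctl.h>

  #ifndef PR_SCHED_CORE
  # define PR_SCHED_CORE        62
  # define PR_SCHED_CORE_CREATE  1
  #endif

  int main(void)
  {
      /* create a new core scheduling cookie for the calling task
       * (pid 0 = self, scope 0 = this task only); other processes can
       * later be pulled in via the PR_SCHED_CORE_SHARE_* operations */
      if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0, 0, 0) < 0)
          perror("PR_SCHED_CORE_CREATE");
      return 0;
  }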
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
There are a couple of scenarios where we need to reflect a MAC change
done in the guest:
1) domain restore from a file (here, we don't store the updated MAC
in the save file and thus on restore create the macvtap with
the original MAC),
2) reconnecting to a running domain (here, the guest might have
changed the MAC while we were not running),
3) migration (here, the guest might change the MAC address but we
fail to respond to it).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Parts of the code that responds to the NIC_RX_FILTER_CHANGED
event are going to be re-used. Separate them into a function
(qemuDomainSyncRxFilter()) and move the code into qemu_domain.c
so that it can be re-used from other places of the driver.
There's one slight change though: instead of passing the device alias
from the just received event to qemuMonitorQueryRxFilter(), I've
switched to using the alias stored in our domain definition. But
these two are guaranteed to be equal; virDomainDefFindDevice()
made sure of that, if nothing else.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This patch adds a hashtable for storing the stats schema and a function
to refresh it by querying "query-stats-schemas" using
qemuMonitorQueryStatsSchema().
Signed-off-by: Amneesh Singh <natto@weirdnatto.in>
Add UNDEFINE_TPM and UNDEFINE_KEEP_TPM flags to the qemuDomainUndefineFlags()
API and --tpm and --keep-tpm to 'virsh undefine'. Pass the
virDomainUndefineFlagsValues via qemuDomainRemoveInactive()
from qemuDomainUndefineFlags() all the way down to
qemuTPMEmulatorCleanupHost() and delete the TPM storage there, considering
that the UNDEFINE_TPM flag has priority over the persistent_state attribute
from the domain XML. Pass 0 at all other call sites of
qemuDomainRemoveInactive() for now.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This patch uses the job object directly in the domain object and
removes the job object from the private data of all drivers that use
it, as well as other relevant code (initializing and freeing the
structure).
Signed-off-by: Kristina Hanicova <khanicov@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
All callers pass 'true'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Previous patches removed the job submission for the handler so now even
the handler itself can be removed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>