In our passt example XML we use /var/log/passt.log as path to the
log file. This is not optimal, because in case of unprivileged
daemon, neither libvirt nor passt has enough permissions to
create the file. Let's move the file under /tmp.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There are a few situations where passt itself is unable to create
a file because it runs under QEMU user (e.g. just like our
example from formatdomain.rst suggests: /var/log/passt.log). If
libvirtd runs with sufficient permissions (e.g. as root) it can
create the file and set seclabels on it so that passt can then
open it.
Ideally, we would just pass pre-opened FD, but this wasn't viewed
as secure enough [1]. So lets just create the file and set
seclabels.
For the case when both libvirtd and passt have the same
permissions, well then we fail before even needing to fork() and
exec().
1: https://archives.passt.top/passt-dev/20230606225836.63aecebe@elisabeth/
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2209191
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
New storage types are not implemented in generators for -drive and the
xen config. Explicitly reject them in case of a programming error.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Update to v8.0.0-1739-g5f9dd6a8ce and build on a newer kernel and with
newer libblkio.
Notable changes:
- 'fdset' feature is supported for the vdpa block backend provided by
libblkio
- 'xsaves' feature is optional for EPYC-Rome
- 'cryptodev-backend-lkcf' and 'PIIX3-xen' devices removed
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If there are no parameters, there is nothing to validate.
If params == NULL, memcpy below results in memcpy(sorted, NULL, 0),
which is UB.
Found by UBSAN. Example of this codepath: virDomainBlockCopy()
(where nparams == 0 is valid) -> qemuDomainBlockCopy()
Signed-off-by: Oleg Vasilev <oleg.vasilev@virtuozzo.com>
Reviewed-by: Kristina Hanicova <khanicov@redhat.com>
When detaching a device, the following race condition may happen:
Once qemuDomainSignalDeviceRemoval() marks the device for
removal, it returns true, which means it is the caller
that marked the device for removal is going to remove the
device from domain definition.
But qemuDomainWaitForDeviceRemoval() may still receive
timeout from virDomainObjWaitUntil() which is implemented
by pthread_cond_timedwait() due to an unavoidable race
between the expiration of the timeout and the predicate
state(priv->unplug.alias) change.
And then qemuDomainWaitForDeviceRemoval() will return 0,
thus the caller will not remove the device from domain
definition.
In this situation, the device is still present in the domain
definition but doesn't exist in qemu anymore. Worse, there is
no way to remove it from the domain definition.
Solution is to recheck the value of priv->unplug.alias to
determine who is going to remove the device from domain
definition.
Signed-off-by: zuo boqun <zuoboqun@baidu.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Qemu 8.1.0 will add discard_no_unref option for qcow2 images.
When this option is enabled (default=false), then it will no longer
unreference clusters when guest does a discard, but it will just free
the blocks (useful for incremental backups for example) and pass the
discard to the lower layer.
This was implemented to avoid fragmentation within the qcow2 image.
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The qcow2 driver allows passing discards to the storage while keeping
the reference of the block, and just marking it as zeroed. This can
decrease the levels of fragmentation of the qcow2 metadata when
discards are enabled.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Add a config where both DIMM and non-DIMM <memory> devices are used so
that it validates that only DIMMs require memory slots.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Memory slots are required only for DIMM-like devices, but the maximum
memory address space is relevant also for other non-DIMM memory devices
such as virtio-mem. Allow configurations where no slots are added.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Memory slots are required only for DIMM-like devices, while other
devices defined via <memory> such as virtio-mem may use the PCI bus and
thus do not require/consume a memory slot.
Fix the validation code to calculate the required count of memory
devices only for DIMMs and NVDIMMs.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Specify the memory size by using '-m size=2048k' instead of just '-m 2'.
The new syntax is used when memory hotplug is enabled. To preserve
memory sizing, if memory hotplug is disabled the size is rounded down to
the nearest mebibyte.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This should not be needed, but here's what's happening:
virStrToLong_*() family of functions was switched from strtol*()
to g_ascii_strtol*() in order to handle corner cases on Windows
(most notably parsing hex numbers with base=0) - see
v9.4.0-61-g2ed41d7cd9. But what we did not realize back then, is
the fact that g_ascii_strtol*() family has their own global lock
rendering virStrToLong_*() function unsafe between fork() +
exec(). Worse, if one of the threads has to wait for the lock (or
on its corresponding condition), then errno is mangled and
g_ascii_strtol*() signals an error, even though there's no error.
Read more here:
https://gitlab.gnome.org/GNOME/glib/-/issues/3034
Nevertheless, if we make glib init the g_ascii_strtol*() global
state (by calling one function from g_ascii_strtol*() family),
then there shouldn't be any congestion on the lock and thus no
errno mangling.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The current implementation of virConnectBaselineHypervisorCPU in QEMU
driver can provide a CPU definition that will not work on all hosts in
case they have different maximum physical address size. So when we get
the info from domain capabilities, we need to choose the smallest
physical address size for the computed baseline CPU definition.
https://bugzilla.redhat.com/show_bug.cgi?id=2171860
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We already report the hosts physical address size in host capabilities,
but computing a baseline CPU definition is done from domain
capabilities.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Newer GCC (13.1.1 in my case) wrongly reports "maybe uninitialized"
warning for this variable inside the next condition. Even though this
accusation is wrong (the condition is guarded by the same condition as
the for cycle initializing it), initialize it during the declaration so
compilation errors don't stop others and maybe also future proof the
code for changes.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Using os.system("cp {0} {1}".format(...)) has two issues, it does not
work on Windows, but more importantly it can cause issues in case one of
the directories has a space in it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This has two main advantages:
- it parses the number with C locale explicitly
- it behaves the same on Windows as on Linux and BSD
both of which are wanted behaviours.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
With the last user gone this function can be abolished. It is
preferable to use _ll instead since that is not a subject to 32/64 bit
scaling.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It is used to fill an unsigned long long anyway and if it is negative
than there is really an issue somewhere.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
So far this change alone doesn't make much sense, but prepares
code for upcoming change. Unfortunately, some conf.has()
statements have to stay, because there's no corresponding
dependency(). But that's okay.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The pkg-config file to libnuma was introduced in 2.0.12 release
(though the comment mistakenly claims 2.0.14 version). Every
supported distro ships at least this version, and thus we can
switch meson detection to dependency().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The pkg-config file to libattr was introduced in 2.4.48 release.
Now that every supported distro ships at least this version, we
can switch meson detection to dependency().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The pkg-config file to libacl was introduced in 2.2.53 release.
Now that every supported distro ships at least this version, we
can switch meson detection to dependency().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The @unionMems argument of qemuProcessSetupPid() function is not
necessary really as all callers pass 'true'. Drop it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The unit that cpuset CGroups controller works with is a
thread/process, not individual memory allocations. Therefore,
after we've set cpuset.mems for emulator (after previous commit
it's set to union of all host NUMA nodes allowed for given
domain), and as we try to set up cpuset.mems for vCPUs/IOThreads,
memory is migrated to selected NUMA node(s). We are effectively
saying: "this thread (vCPU thread) can have memory only from
these NUMA node(s)".
That's not really what we want though. The cpuset controller
doesn't differentiate memory "belonging" to the emulator thread
and vCPU thread or IOThread even.
Therefore, set union of all allowed host NUMA nodes, just like
we're doing for the emulator thread.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
In ideal world, my plan was perfect. We allow union of all host
nodes in cpuset.mems and once QEMU has allocated its memory, we
'fix up' restriction of its emulator thread by writing the
original value we wanted to set all along. But in fact, we can't
do it because that triggers memory movement. For instance,
consider the following <numatune/>:
<numatune>
<memory mode="strict" nodeset="0"/>
<memnode cellid="1" mode="strict" nodeset="1"/>
</numatune>
<numa>
<cell id="0" cpus="0-1" memory="1024000" unit="KiB" />
<cell id="1" cpus="2-3" memory="1048576" unit="KiB"/>
</numa>
This is meant to create 1:1 mapping between guest and host NUMA
nodes. So we start QEMU with cpuset.mems set to "0-1" (so that it
can allocate memory even for guest node #1 and have the memory
come fro host node #1) and then, set cpuset.mems to "0" (because
that's where we wanted emulator thread to live).
But this in turn triggers movement of all memory (even the
allocated one) to host NUMA node #0. Therefore, we have to just
keep cpuset.mems untouched and rely on .host-nodes passed on the
QEMU cmd line.
The placement still suffers because of cpuset.mems set for vcpus
or iothreads, but that's fixed in next commit.
Fixes: 3ec6d586bc
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Apparmor profiles in /etc/apparmor.d/ are config files that can and should
be replaced on package upgrade, which introduces the potential to overwrite
any local changes. Apparmor supports local profile customizations via
/etc/apparmor.d/local/<service> [1].
This change makes the support explicit by adding libvirtd, virtqemud, and
virtxend profile customization stubs to /etc/apparmor.d/local/. The stubs
are conditionally included by the corresponding main profiles.
[1] https://ubuntu.com/server/docs/security-apparmor
See "Profile customization" section
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Add data as of v8.0.0-1619-g369081c455:
Notable changes:
- 'SapphireRapids' cpu model added
- 'EPYC-Genoa(-v1)' cpu model added
- 'EPYC-Milan-v2' cpu model added
- 'EPYC-Rome-(v3|v4)' cpu models added
- new cpu features:
'fb-clear', 'cmpccxadd', 'vnmi', 'flush-l1d', 'avx-vnni-int8', 'avx-ifma',
'no-nested-data-bp', 'null-sel-clr-base', 'amd-psfd', 'auto-ibrs', 'amx-fp16',
'prefetchiti', 'lfence-always-serializing', 'avx-ne-convert'
- 8.1 machine types added
- QMP schema:
- 'block-latency-histogram-set' gained 'boundaries-zap' property
- 'qcow2' block driver gained 'discard-no-unref' flag
- 'input-send-event' now supports the 'mtt' type and corresponding properties
- 'memory-backend-file' object now has a 'offset' property
- 'query-blockstats' reports 'failed_zone_append_operations', 'avg_zone_append_latency_ns'
'avg_zone_append_queue_depth', 'zone_append_bytes', 'zone_append_latency_histogram',
'zone_append_operations', 'zone_append_merged', 'zone_append_total_time_ns'
- 'single-step' property of 'query-status' is deprecated
- 'vcpu' argument of 'trace-events-(set|get'-state' is deprecated
'cpu-host-model' qemuxml2argv test output changed as EPYC-Rome gained
few new cpu flags.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
'trace-event-get-state' was used for testing schema validation as it had
simple arguments. Now 'vcpu' is optional and deprecated. Fix the test so
that it won't break with upcoming qemu-8.1.
Drop the 'all-attrs' case, as it's not not really testing anything
special and for the 'missing mandatory attr' case use an empty object.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In one of its commits [1] libssh2 changed the 'text' member of
LIBSSH2_USERAUTH_KBDINT_PROMPT struct from 'char' to 'unsigned
char'. But we g_strdup() the member in order to fill 'prompt'
member of virConnectCredential struct. Typecast the value to
avoid warnings. Also, drop @prompt variable, as it's needless.
1: 83853f8aea
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Use automatic memory freeing and modern XML parsers to simplify the
function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Format the rule attributes in two passes, first for positive 'match' and
second pass for negative. This removes the crazy logic for switching
between match modes inside the formatter.
The refactor makes it also more clear in which cases we actually do
format something.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virXMLNodeGetSubelementList to get the elements to process.
The new approach documents the complexity of the parser, which is
designed to ignore unknown attributes and parse only a single kind of
them after finding the first valid one.
Note that the XML schema doesn't actually allow having multiple
sub-elements, but I'm not sure how that translates to actual configs
present.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use modern parsing. Invalid numbers are now rejected. Semantis for
numbers out of range is preserved.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Convert the fields to the proper types and use virXMLPropEnum for
parsing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virXMLFormatElement to simplify the formatter. Drop return value of
virNWFilterRuleDefFormat as there are no errors to report.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virXMLNodeGetSubelement(List) instead of the looped parser and
simplify the code.
Note that handling of the 'bootp' element now conforms to the schema
where we allow just one and the 'file' attribute is mandatory.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The parser and formatter for nwfilter rules is very strange and has
weird quirks. Add a test case trying to capture some of the quirks to
visualize how it will change when the code is refactored.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The new helper is similar to virXPathNodeSet list but for cases where we
want to get subelements directly rather than using XPath.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There's nothing to clean up in the 'host' local variable on error as
the function which fills it makes sure to fill it only on success. In
such case it's also directly assigned to the array thus the 'host'
variable is cleared.
Remove the 'cleanup' label and 'ret' variable as we can now directly
return -1 on error.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Extract the 'inbound'/'outbound' subelements using
virXMLNodeGetSubelement to simplify the code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Remove the unnecessary check for valid arguments and use
virXMLPropULongLong instead of hand-written property parsers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>