Sync was introduced in [1] to check for ga presence. This
check is racy but in the era before serial events are available
there was not better solution I guess.
In case we have the events the sync function is different. It allows us
to flush stateless ga channel from remnants of previous communications.
But we need to do it only once. Until we get timeout on issued command
channel state is ok.
[1] qemu_agent: Issue guest-sync prior to every command
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If a disk has persistent reservations enabled, qemu-pr-helper
might open not only /dev/mapper/control but also individual
targets of the multipath device. We are already querying for them
in CGroups, but now we have to create them in the namespace too.
This was brought up in [1].
1: https://bugzilla.redhat.com/show_bug.cgi?id=1711045#c61
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Tested-by: Lin Ma <LMa@suse.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
This converts the QEMU agent APIs to use the per-VM
event loop, which involves switching from virEvent APIs
to GMainContext / GSource APIs.
A GSocket is used as a convenient way to create a GSource
for a socket, but is not yet used for actual I/O.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
We are dealing with the QEMU agent, not the monitor.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This converts the QEMU monitor APIs to use the per-VM
event loop, which involves switching from virEvent APIs
to GMainContext / GSource APIs.
A GSocket is used as a convenient way to create a GSource
for a socket, but is not yet used for actual I/O.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In common with regular QEMU guests, the QMP probing
will need an event loop for handling monitor I/O
operations.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The event loop thread will be responsible for handling
any per-domain I/O operations, most notably the QEMU
monitor and agent sockets.
We start this event loop when launching QEMU, but stopping
the event loop is a little more complicated. The obvious
idea is to stop it in qemuProcessStop(), but if we do that
we risk loosing the final events from the QEMU monitor, as
they might not have been read by the event thread at the
time we tell the thread to stop.
The solution is to delay shutdown of the event thread until
we have seen EOF from the QEMU monitor, and thus we know
there are no further events to process.
Note that this assumes that we don't have events to process
from the QEMU agent.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When preparing images for block jobs we modify their seclabels so
that QEMU can open them. However, as mentioned in the previous
commit, secdrivers base some it their decisions whether the image
they are working on is top of of the backing chain. Fortunately,
in places where we call secdrivers we know this and the
information can be passed to secdrivers.
The problem is the following: after the first blockcommit from
the base to one of the parents the XATTRs on the base image are
not cleared and therefore the second attempt to do another
blockcommit fails. This is caused by blockcommit code calling
qemuSecuritySetImageLabel() over the base image, possibly
multiple times (to ensure RW/RO access). A naive fix would be to
call the restore function. But this is not possible, because that
would deny QEMU the access to the base image. Fortunately, we
can use the fact that seclabels are remembered only for the top
of the backing chain and not for the rest of the backing chain.
And thanks to the previous commit we can tell secdrivers which
images are top of the backing chain.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1803551
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Historically threads are given a name based on the C function,
and this name is just used inside libvirt. With OS level thread
naming this name is now visible to debuggers, but also has to
fit in 15 characters on Linux, so function names are too long
in some cases.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The qemuMonitorOpenFD method has not been used since it
was first introduced.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Libvirt has never configured the QEMU agent to support
running on a PTY implicitly. In theory an end user may
have written such an XML config, but this is reasonably
unlikely since when a bare <channel> is provided, libvirt
will auto-expand it to a UNIX socket backend.
With this change a user who has use the PTY backend will
have to switch to the UNIX backend if they wish to use
libvirt APIs for interacting with the agent. This will
not have guest ABI impact.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
qemuMonitorJSONMakeCommandInternal does the full command construction if
you pass in what would become the value of the 'arguments' key. Refactor
the open-coded implementation to use the helper and use modern cleanup
helpers at the same time.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Make it obvious that the function always returns a valid pointer and fix
all callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
I've found that if my virtlogd is socket activated but the daemon
doesn't run yet, then the virt-qemu-run is killed right after it
tries to start the domain. The problem is that because the default
setting is to use virtlogd, the domain create code tries to
connect to virtlogd socket, which in turn tries to detect who is
connecting (virNetSocketGetUNIXIdentity()) and as a part of it,
it will try to open /proc/${PID_OF_SHIM}/stat which is denied by
SELinux:
type=AVC msg=audit(1582903501.927:323): avc: denied { search } for \
pid=1210 comm="virtlogd" name="1843" dev="proc" ino=37224 \
scontext=system_u:system_r:virtlogd_t:s0-s0:c0.c1023 \
tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=dir \
permissive=0
Virtlogd reacts by closing the connection which the shim sees as
SIGPIPE. Since the default response to the signal is Term, we
don't even get to reporting any error nor to removing the
temporary directory.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
When virt-qemu-run is ran without any root directory specified on
the command line, a temporary directory is made and used instead.
But since we are using g_dir_make_tmp() to create the directory
it is going to have 0700 mode. So even though we create the whole
directory structure under it and label everything, QEMU is very
likely to not have the access. This is because in this case there
is no qemu.conf and thus distro default UID:GID is used to run
QEMU (e.g. qemu:kvm on Fedora). Change the mode of the temporary
directory so that everybody has eXecute permission.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Libvirt tries to forbid migration onto the same host and it does
that by checking if local and remote hostnames are the same and
whether local and remote UUIDs are the same. Well, the latter
makes sense but the former doesn't really because libvirtd can be
running inside an UTS namespace and hostnames can appear the same
on both sides of migration. On the other hand, host UUIDs are
unique, so rely on them when trying to prevent migration onto the
same host.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1639596
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Use the 'flat' flag for 'query-named-block-nodes' if qemu supports
QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT in qemuBlockGetNamedNodeData.
We don't need the data so plumb in whether qemu supports the
'flat' output.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Modern qemu allows to skip the nested redundant data in the output of
query-named-block-nodes. Plumb in the support for the argument that
enables it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Replace qemuMonitorBlockGetNamedNodeData by qemuBlockGetNamedNodeData.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Detect the presence of the flag and make it available internally as
QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The monitor password callback was removed long time ago but the callback
type and variable were left around. Finish the cleanup.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Format the 'vhost-user-fs' device on the QEMU command line.
This device provides shared file system access using the FUSE protocol
carried over virtio.
The actual file server is implemented in an external vhost-user-fs device
backend process.
https://bugzilla.redhat.com/show_bug.cgi?id=1694166
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Look into /usr/share/qemu/vhost-user to see whether we can find
a suitable virtiofsd binary, in case the user did not provide one
in the domain XML.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Wire up the code to put virtiofsd in the emulator cgroup on domain
startup.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Start virtiofsd for each <filesystem> device using it.
Pre-create the socket for communication with QEMU and pass it
to virtiofsd.
Note that virtiofsd needs to run as root.
https://bugzilla.redhat.com/show_bug.cgi?id=1694166
Introduced by QEMU commit a43efa34c7d7b628cbf1ec0fe60043e5c91043ea
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
This is not yet supported.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Add a 'virtiofsd_debug' option for tuning whether to run virtiofsd
in debug mode.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Introduce a new 'virtiofs' driver type for filesystem.
<filesystem type='mount' accessmode='passthrough'>
<driver type='virtiofs'/>
<source dir='/path'/>
<target dir='mount_tag'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</filesystem>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Introduced by QEMU commit 98fc1ada4cf70af0f1df1a2d7183cf786fc7da05
virtio: add vhost-user-fs base device
Released in QEMU v4.2.0.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Pass logManager to qemuExtDevicesStart for future usage.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
The default memlock limit is 64k which is not enough to start a single
VM. The requirements for one VM are 12k, 8k for eBPF map and 4k for eBPF
program, however, it fails to create eBPF map and program with 64k limit.
By testing I figured out that the minimal limit is 80k to start a single
VM with functional eBPF and if I add 12k I can start another one.
This leads into following calculation:
80k as memlock limit worked to start a VM with eBPF which means there
is 68k of lock memory that I was not able to figure out what was using
it. So to get a number for 4096 VMs:
68 + 12 * 4096 = 49220
If we round it up we will get 64M of memory lock limit to support 4096
VMs with default map size which can hold 64 entries for devices.
This should be good enough as a sane default and users can change it if
the need to.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1807090
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Whenever there is a guest CPU configured in domain XML, we will call
some CPU driver APIs to validate the CPU definition and check its
compatibility with the hypervisor. Thus domains with guest CPU
specification can only be started if the guest architecture is supported
by the CPU driver. But we would add a default CPU to any domain as long
as QEMU reports it causing failures to start any domain on affected
architectures.
https://bugzilla.redhat.com/show_bug.cgi?id=1805755
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While our code can detect ISO as a separate format, qemu does not use it
as such and just passes it through as raw. Add conversion for detected
parts of the backing chain so that the validation code does not reject
it right away.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Include virutil.h in all files that use it,
instead of relying on it being pulled in somehow.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Historically, this file was a dump for most of our helper
functions and needed almost everywhere.
With the introduction of virfile.h and virstring.h,
and more importantly, virenum.h and the introduction
of GLib, that is no longer true.
Remove its include from C files that don't even use it.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Both callers pass false. Since we frown upon format probing, remove the
unused possibility to do the probing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The backend name is memory-backend-memfd but we've been checking
for memory-backend-memory.
Reported by GCC on rawhide:
../../../src/internal.h:75:22: error: 'strcmp' of a string of length 21 and
an array of size 21 evaluates to nonzero [-Werror=string-compare]
../../../src/qemu/qemu_command.c:3525:20: note: in expansion of macro 'STREQ'
3525 | } else if (STREQ(backendType, "memory-backend-memory") &&
| ^~~~~
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: 24b74d187c
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Another vircgroup helper to avoid code repetition between
the LXC and QEMU driver.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
lxcDomainSetMemoryParameters() and qemuDomainSetMemoryParameters()
has duplicated chunks of code that can be put in a new
helper.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This new helper avoids more code repetition inside
lxcDomainSetBlkioParameters() and qemuDomainSetBlkioParameters().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
After the introduction of virDomainDriverMergeBlkioDevice() in a
previous patch, it is now clear that lxcDomainSetBlkioParameters() and
qemuDomainSetBlkioParameters() uses the same loop to set cgroup
blkio parameter of a domain.
Avoid the repetition by adding a new helper called
virDomainCgroupSetupDomainBlkioParameters().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
lxcDomainParseBlkioDeviceStr() and qemuDomainParseBlkioDeviceStr()
are the same function. Avoid code repetition by putting the code
in a new helper.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
lxcDomainMergeBlkioDevice() and qemuDomainMergeBlkioDevice()
are the same functions. This duplicated code can't be put in
the existing domain_cgroup.c since it's not cgroup related.
This patch introduces a new src/hypervisor/domain_driver.c to
host this more generic code that can be shared between virt
drivers. This new file is then used to create a new helper
called virDomainDeivceMergeBlkioDevice() to eliminate the code
repetition mentioned above. Callers in LXC and QEMU files
were updated.
This change is a preliminary step for more code reduction of
cgroup related code inside lxcDomainSetBlkioParameters() and
qemuDomainSetBlkioParameters().
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
qemuSetupCgroupVcpuBW() and lxcSetVcpuBWLive() shares the
same code to set CPU CFS period and quota. This code can be
moved to a new virCgroupSetupCpuPeriodQuota() helper to
avoid code repetition.
A similar code is also executed in virLXCCgroupSetupCpuTune(),
but without the rollback on error. Use the new helper in this
function as well since the 'period' rollback, if not a
straight improvement for virLXCCgroupSetupCpuTune(), is
benign. And we end up cutting more code repetition.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The code that calls virCgroupSetCpuShares() and virCgroupGetCpuShares()
is repeated in 4 different places. Let's put it in a new
virCgroupSetupCpuShares() to avoid code repetition.
There's a reason of why we execute a Get in the same value we
just executed Set, explained in detail by commit 97814d8ab3.
Let's add a gist of the reasoning behind it as a comment in
this new function as well.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>