1413 Commits

Author SHA1 Message Date
Martin Kletzander
3791f29b08 qemu: Do not error out when setting affinity failed
Consider a host with 8 CPUs. There are the following possible scenarios

1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs

2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs

3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
   QEMU should get 8 CPUs

4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 8 CPUs

5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
   QEMU should get 4 CPUs

6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 4 CPUs

Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.

Scenario 3 works because libvirt checks current affinity first and
skips the sched_setaffinity call, avoiding the SYS_NICE issue

Scenario 4 works only if CAP_SYS_NICE is availalbe

Scenarios 5 & 6 works only if CAP_SYS_NICE is present *AND* the cgroups
cpuset is not set on the container.

If libvirt blindly ignores the sched_setaffinity failure, then scenarios
4, 5 and 6 should all work, but with caveat in case 4 and 6, that
QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
This is still better than failing.

Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
ignore it when there was no affinity specified in the XML config.
If user specified affinity explicitly, libvirt must report an error if
it can't be honoured.

Resolves: https://bugzilla.redhat.com/1819801

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-04 14:44:21 +02:00
Michal Privoznik
95b9db4ee2 lib: Prefer WITH_* prefix for #if conditionals
Currently, we are mixing: #if HAVE_BLAH with #if WITH_BLAH.
Things got way better with Pavel's work on meson, but apparently,
mixing these two lead to confusing and easy to miss bugs (see
31fb929eca for instance). While we were forced to use HAVE_
prefix with autotools, we are free to chose our own prefix with
meson and since WITH_ prefix appears to be more popular let's use
it everywhere.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-02 10:28:10 +02:00
Laine Stump
95089f481e util: assign tap device names using a monotonically increasing integer
When creating a standard tap device, if provided with an ifname that
contains "%d", rather than taking that literally as the name to use
for the new device, the kernel will instead use that string as a
template, and search for the lowest number that could be put in place
of %d and produce an otherwise unused and unique name for the new
device. For example, if there is no tap device name given in the XML,
libvirt will always send "vnet%d" as the device name, and the kernel
will create new devices named "vnet0", "vnet1", etc. If one of those
devices is deleted, creating a "hole" in the name list, the kernel
will always attempt to reuse the name in the hole first before using a
name with a higher number (i.e. it finds the lowest possible unused
number).

The problem with this, as described in the previous patch dealing with
macvtap device naming, is that it makes "immediate reuse" of a newly
freed tap device name *much* more common, and in the aftermath of
deleting a tap device, there is some other necessary cleanup of things
which are named based on the device name (nwfilter rules, bandwidth
rules, OVS switch ports, to name a few) that could end up stomping
over the top of the setup of a new device of the same name for a
different guest.

Since the kernel "create a name based on a template" functionality for
tap devices doesn't exist for macvtap, this patch for standard tap
devices is a bit different from the previous patch for macvtap - in
particular there was no previous "bitmap ID reservation system" or
overly-complex retry loop that needed to be removed. We simply find
and unused name, and pass that name on to the kernel instead of
"vnet%d".

This counter is also wrapped when either it gets to INT_MAX or if the
full name would overflow IFNAMSIZ-1 characters. In the case of
"vnet%d" and a 32 bit int, we would reach INT_MAX first, but possibly
someday someone will change the name from vnet to something else.

(NB: It is still possible for a user to provide their own
parameterized template name (e.g. "mytap%d") in the XML, and libvirt
will just pass that through to the kernel as it always has.)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:44 -04:00
Laine Stump
d7f38beb2e util: replace macvtap name reservation bitmap with a simple counter
There have been some reports that, due to libvirt always trying to
assign the lowest numbered macvtap / tap device name possible, a new
guest would sometimes be started using the same tap device name as
previously used by another guest that is in the process of being
destroyed *as the new guest is starting.

In some cases this has led to, for example, the old guest's
qemuProcessStop() code deleting a port from an OVS switch that had
just been re-added by the new guest (because the port name is based on
only the device name using the port). Similar problems can happen (and
I believe have) with nwfilter rules and bandwidth rules (which are
both instantiated based on the name of the tap device).

A couple patches have been previously proposed to change the ordering
of startup and shutdown processing, or to put a mutex around
everything related to the tap/macvtap device name usage, but in the
end no matter what you do there will still be possible holes, because
the device could be deleted outside libvirt's control (for example,
regular tap devices are automatically deleted when the qemu process
terminates, and that isn't always initiated by libvirt but could
instead happen completely asynchronously - libvirt then has no control
over the ordering of shutdown operations, and no opportunity to
protect it with a mutex.)

But this only happens if a new device is created at the same time as
one is being deleted. We can effectively eliminate the chance of this
happening if we end the practice of always looking for the lowest
numbered available device name, and instead just keep an integer that
is incremented each time we need a new device name. At some point it
will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15
character limit if nothing else), and we can't guarantee that the new
name really will be the *least* recently used name, but "math"
suggests that it will be *much* less common that we'll try to re-use
the *most* recently used name.

This patch implements such a counter for macvtap/macvlan, replacing
the existing, and much more complicated, "ID reservation" system. The
counter is set according to whatever macvtap/macvlan devices are
already in use by guests when libvirtd is started, incremented each
time a new device name is needed, and wraps back to 0 when either
INT_MAX is reached, or when the resulting device name would be longer
than IFNAMSIZ-1 characters (which actually is what happens when the
template for the device name is "maccvtap%d"). The result is that no
macvtap name will be re-used until the host has created (and possibly
destroyed) 99,999,999 devices.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:36 -04:00
Ján Tomko
0a37e0695b Split declarations from initializations
Split those initializations that depend on a statement
above them.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00
Ján Tomko
a5152f23e7 Move declarations before statements
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00
Laine Stump
5cad64ec03 qemu: remove unreachable code in qemuProcessStart()
Back when the original version of this chunk of code was added (commit
41b087198 in libvirt-0.8.1 in April 2010), we used virExecDaemonize()
to start the qemu process, and would continue on in the function
(which at that time was called qemudStartVMDaemon()) even if a -1 was
returned. So it was possible to get to this code with rv == -1 (it was
called "ret" in that version of the code).

In modern libvirt code, qemu is started with virCommandRun(); then we
call virPidFileReadPath(); those are the only two ways of setting "rv"
prior to this code being removed, and in either case if the new value
of rv < 0, then we immediately skip over the rest of the code to the
cleanup: label.

This means that the code being removed by this patch is
unreachable.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-08-24 23:46:51 -04:00
Michal Privoznik
9048dc4e62 qemuDomainBuildNamespace: Populate basic /dev from daemon's namespace
As mentioned in previous commit, populating domain's namespace
from pre-exec() hook is dangerous. This commit moves population
of the namespace with basic /dev nodes (e.g. /dev/null, /dev/kvm,
etc.) into daemon's namespace.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-03 19:40:36 +02:00
Michal Privoznik
8da362fe62 qemu_domain_namespace: Repurpose qemuDomainBuildNamespace()
Okay, here is the deal. Currently, the way we build namespace is
very fragile. It is done from pre-exec hook when starting a
domain, after we mass closed all FDs and before we drop
privileges and exec() QEMU. This fact poses some limitations onto
the namespace build code, e.g. it has to make sure not to keep
any FD opened (not even through a library call), because it would
be leaked to QEMU. Also, it has to call only async signal safe
functions. These requirements are hard to meet - in fact as of my
commit v6.2.0-rc1~235 we are leaking a FD into QEMU by calling
libdevmapper functions.

To solve this issue and avoid similar problems in the future, we
should change our paradigm. We already have functions which can
populate domain's namespace with nodes from the daemon context.
If we use them to populate the namespace and keep only the bare
minimum in the pre-exec hook, we've mitigated the risk.

Therefore, the old qemuDomainBuildNamespace() is renamed to
qemuDomainUnshareNamespace() and new qemuDomainBuildNamespace()
function is introduced. So far, the new function is basically a
NOP and domain's namespace is still populated from the pre-exec
hook - next patches will fix it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-03 19:40:36 +02:00
Michal Privoznik
764eaf1aa4 qemu_domain_namespace: Rename qemuDomainCreateNamespace()
The name of this function is not very helpful, because it doesn't
create anything, it just flips a bit in a bitmask when domain is
starting up. Move the function internals into qemu_process.c and
forget the function ever existed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-03 19:40:33 +02:00
Michal Privoznik
90eee87569 qemu: Separate out namespace handling code
The qemu_domain.c file is big as is and we should split it into
separate semantic blocks. Start with code that handles domain
namespaces.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-03 19:32:27 +02:00
Ján Tomko
ee247e1d3f Use g_strfeev instead of virStringFreeList
Both accept a NULL value gracefully and virStringFreeList
does not zero the pointer afterwards, so a straight replace
is safe.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2020-08-03 15:37:36 +02:00
Ján Tomko
1edf164848 Remove redundant conditions
All of these have been checked earlier.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2020-08-03 15:19:28 +02:00
Ján Tomko
6c7ba7b496 qemu: Fix affinity typo
Fixes: 4c0398b5284d14c55eca51095673b6fadbbd85fb
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-07-22 15:51:26 +02:00
Bihong Yu
3ee423c363 qemu: pre-create the dbus directory in qemuStateInitialize
There are races condiction to make '/run/libvirt/qemu/dbus' directory in
virDirCreateNoFork() while concurrent start VMs, and get "failed to create
directory '/run/libvirt/qemu/dbus': File exists" error message. pre-create the
dbus directory in qemuStateInitialize.

Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-07-22 09:40:15 +02:00
Jiri Denemark
1031db3600 qemu: Properly set //cpu/@migratable default value for running domains
Since active domains which do not have the attribute already set were
not started by libvirt that probed for CPU migratable property, we need
to check this property on reconnect and update the domain definition
accordingly.

https://bugzilla.redhat.com/show_bug.cgi?id=1857967

Reported-by: Mark Mielke <mark.mielke@gmail.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-07-21 15:40:01 +02:00
Daniel Henrique Barboza
f187b2fb98 qemu_process.c: modernize qemuProcessQMPNew()
Use g_autoptr() and remove the 'cleanup' label.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200717211556.1024748-3-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-07-21 15:34:36 +02:00
Peter Krempa
db712b0673 qemuDomainDiskLookupByNodename: Remove unused 'idx'
All callers pass NULL as the value. Remove the argument.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-07-21 09:52:46 +02:00
Peter Krempa
c414ab00e2 qemuProcessHandleBlockThreshold: Report correct indexes
The index returned by qemuDomainDiskLookupByNodename is the position in
the backing chain rather than the index we report in the XML.

Since with -blockdev they differ now and additionally the disk source
also has an index we need to fix the 'threshold' events we report:

1) If it's the top level image we must always trigger the event without
   any suffix as we did until now

2) We must report the correct index

3) We must report the correct index also for the top level image, when
   blockdev is used.

This means that we need to potentially emit 2 events, one for the device
without the index and then when blockdev is used and the top level image
has an index we must do it also with the index.

This will fix it for blockdev cases, while also not removing previous
semantics.

https://bugzilla.redhat.com/show_bug.cgi?id=1857204

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-07-21 09:52:46 +02:00
Peter Krempa
4a19b7b832 qemuDomainDiskBackingStoreGetName: Remove unused argument
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-07-21 09:52:46 +02:00
Prathamesh Chavan
aca37c3fb2 qemu_domainjob: introduce privateData for qemuDomainJob
To remove dependecy of `qemuDomainJob` on job specific
paramters, a `privateData` pointer is introduced.
To handle it, structure of callback functions is
also introduced.

Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-07-20 15:34:58 +02:00
Michal Privoznik
824e349397 qemu: Use qemuSecuritySetSavedStateLabel() to label restore path
Currently, when restoring from a domain the path that the domain
restores from is labelled under qemuSecuritySetAllLabel() (and after
v6.3.0-rc1~108 even outside transactions). While this grants QEMU
the access, it has a flaw, because once the domain is restored, up
and running then qemuSecurityDomainRestorePathLabel() is called,
which is not real counterpart. In case of DAC driver the
SetAllLabel() does nothing with the restore path but
RestorePathLabel() does - it chown()-s the file back and since there
is no original label remembered, the file is chown()-ed to
root:root. While the apparent solution is to have DAC driver set the
label (and thus remember the original one) in SetAllLabel(), we can
do better.

Turns out, we are opening the file ourselves (because it may live on
a root squashed NFS) and then are just passing the FD to QEMU. But
this means, that we don't have to chown() the file at all, we need
to set SELinux labels and/or add the path to AppArmor profile.

And since we want to restore labels right after QEMU is done loading
the migration stream (we don't want to wait until
qemuSecurityRestoreAllLabel()), the best way to approach this is to
have separate APIs for labelling and restoring label on the restore
file.

I will investigate whether AppArmor can use the SavedStateLabel()
API instead of passing the restore path to SetAllLabel().

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1851016

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-07-10 14:18:07 +02:00
Daniel P. Berrangé
fd460ef561 qemu: stop checking virObjectUnref return value
Some, but not all, of the monitor event handlers check
the virObjectUnref return value to see if the domain
was disposed.

It should not be possible for this to happen, since
the function already holds a lock on the domain and
has only just acquired an extra reference on the
domain a few lines earlier.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-06-03 10:20:17 +01:00
Daniel Henrique Barboza
9665b27dba qemuProcessRefreshCPU: skip 'host-model' logic for pSeries guests
Commit v3.10.0-182-g237f045d9a ("qemu: Ignore fallback CPU attribute
on reconnect") forced CPU 'fallback' to ALLOW, regardless of user
choice. This fixed a situation in which guests created with older
Libvirt versions, which used CPU mode 'host-model' in runtime, would
fail to launch in a newer Libvirt if the fallback was set to FORBID.
This would lead to a scenario where the CPU was translated to 'host-model'
to 'custom', but then the FORBID setting would make the translation
process fail.

PSeries can operate with 'host-model' in runtime due to specific PPC64
mechanics regarding compatibility mode. The update() implementation of
the cpuDriverPPC64 driver is a NO-OP if CPU mode is 'host-model', and
the driver does not implement translate(). The commit mentioned above
is causing PSeries guests to get their 'fallback' setting to ALLOW,
overwriting user choice, exposing a design problem in
qemuProcessRefreshCPU() - for PSeries guests, handling 'host-model'
as it is being done does not apply.

All other cpuArchDrivers implements update() and changes guest mode
to VIR_CPU_MODE_CUSTOM, meaning that PSeries is currently the only
exception to this logic. Let's make it official.

https://bugzilla.redhat.com/show_bug.cgi?id=1660711

Suggested-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200525123945.4049591-2-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-05-25 16:20:25 +02:00
Daniel Henrique Barboza
f600c42627 qemu_process.c: modernize qemuProcessUpdateCPU code path
Use automatic cleanup on qemuProcessUpdateCPU and the functions called
by it.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200522195620.3843442-5-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-05-25 12:31:14 +02:00
Peter Krempa
78d30aa0bf qemu: Prepare for testing of 'netdev_add' props via qemuxml2argvtest
qemuxml2argv test suite is way more comprehensive than the hotplug
suite. Since we share the code paths for monitor and command line
hotplug we can easily test the properties of devices against the QAPI
schema.

To achieve this we'll need to skip the JSON->commandline conversion for
the test run so that we can analyze the pure properties. This patch adds
flags for the comand line generator and hook them into the
JSON->commandline convertor for -netdev. An upcoming patch will make use
of this new infrastructure.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-05-20 09:41:58 +02:00
Michal Privoznik
8fd2749b2d qemuProcessStop: Reattach NVMe disks a domain is mirroring into
If the mirror destination is not a file but a NVMe disk, then
call qemuHostdevReAttachOneNVMeDisk() to reattach the NVMe back
to the host.

This would be done by blockjob code when the job finishes, but in
this case the job won't finish - QEMU is killed meanwhile.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1825785

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-05-18 15:14:27 +02:00
Michal Privoznik
0230e38384 qemuProcessStop: Use XATTRs to restore seclabels on disks a domain is mirroring into
In v5.10.0-rc1~42 (which was later fixed in v6.0.0-rc1~487) I am
removing XATTRs for a file that QEMU is mirroring a disk into but
it is killed meanwhile. Well, we can call
qemuSecurityRestoreImageLabel() which will not only remove XATTRs
but also use them to restore the original owner of the file.

This would be done by blockjob code when the job finishes, but in
this case the job won't finish - QEMU is killed meanwhile

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-05-18 15:13:14 +02:00
Ján Tomko
006782a8bc qemu: only stop external devices after the domain
A failure in qemuProcessLaunch would lead to qemuExtDevicesStop
being called twice - once in the cleanup section and then again
in qemuProcessStop.

However, the first one is called while the QEMU process is
still running, which is too soon for the swtpm process, because
the swtmp_ioctl command can lock up:

https://bugzilla.redhat.com/show_bug.cgi?id=1822523

Remove the first call and only leave the one in qemuProcessStop,
which is called after the QEMU process is killed.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-05-13 15:29:37 +02:00
Peter Krempa
3bcbdc51da qemu: process: Don't clear QEMU_CAPS_BLOCKDEV when SD card is present
Help QEMU in deprecation of -drive if=none without the need to refactor
all old boards. Stop masking out -blockdev support when -drive if=sd
needs to be used. We achieve this by forbidding blockjobs and
special-casing all other code paths. Blockjobs are sacrificed in this
case as SD cards are a corner case for some ARM boards and are thus not
used commonly.

https://bugzilla.redhat.com/show_bug.cgi?id=1821692

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-05-12 06:55:00 +02:00
Peter Krempa
e664abb62e qemu: Prepare for 'sd' card use together with blockdev
SD cards need to be instantiated via -drive if=sd. This means that all
cases where we use the blockdev path need to be special-cased for SD
cards.

Note that at this point QEMU_CAPS_BLOCKDEV is still cleared if the VM
config has a SD card.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-05-12 06:55:00 +02:00
Peter Krempa
d876a93f05 qemu: Handle cases when 'qomName' isn't present
Use the drive alias for all cases when we can't generate qomName. This
is meant to handle disks on 'sd' bus which are instantiated via -drive
if=sd as there isn't any specific QOM name for them.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-05-12 06:55:00 +02:00
Peter Krempa
cc4a277db2 qemu: Rename qemuDiskBusNeedsDriveArg to qemuDiskBusIsSD
The function effectively boils down to whether the disk is 'SD'. Since
we'll need to make more decisions based on the fact whether the disk is
on the SD bus, rename the function.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-05-12 06:54:59 +02:00
Michal Privoznik
1d3a9ee9da qemu: Make memory path generation embed driver aware
So far, libvirt generates the following path for memory:

  $memoryBackingDir/$id-$shortName/ram-nodeN

where $memoryBackingDir is the path where QEMU mmaps() memory for
the guest (e.g. /var/lib/libvirt/qemu/ram), $id is domain ID
and $shortName is shortened version of domain name. So for
instance, the generated path may look something like this:

  /var/lib/libvirt/qemu/ram/1-QEMUGuest/ram-node0

While in case of embed driver the following path would be
generated by default:

  $root/lib/qemu/ram/1-QEMUGuest/ram-node0

which is not clashing with other embed drivers, we allow users to
override the default and have all embed drivers use the same
prefix. This can create clashing paths. Fortunately, we can reuse
the approach for machined name generation
(v6.1.0-178-gc9bd08ee35) and include part of hash of the root in
the generated path.

Note, the important change is in qemuGetMemoryBackingBasePath().
The rest is needed to pass driver around.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-04-07 15:26:32 +02:00
Michal Privoznik
bf54784cb1 qemu: Make hugepages path generation embed driver aware
So far, libvirt generates the following path for hugepages:

  $mnt/libvirt/qemu/$id-$shortName

where $mnt is the mount point of hugetlbfs corresponding to
hugepages of desired size (e.g. /dev/hugepages), $id is domain ID
and $shortName is shortened version of domain name. So for
instance, the generated path may look something like this:

  /dev/hugepages/libvirt/qemu/1-QEMUGuest

But this won't work with embed driver really, because if there
are two instances of embed driver, and they both want to start a
domain with the same name and with hugepages, both drivers will
generate the same path which is not desired. Fortunately, we can
reuse the approach for machined name generation
(v6.1.0-178-gc9bd08ee35) and include part of hash of the root in
the generated path.

Note, the important change is in qemuGetBaseHugepagePath(). The
rest is needed to pass driver around.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-04-07 15:26:26 +02:00
Daniel P. Berrangé
d2954c0729 qemu: ensure domain event thread is always stopped
In previous commit:

  commit e6afacb0feabd9bf58331aa0e5259a35378be74e
  Author: Daniel P. Berrangé <berrange@redhat.com>
  Date:   Wed Feb 12 12:26:11 2020 +0000

    qemu: start/stop an event loop thread for domains

A bogus comment was added claiming we didn't need to shutdown the
event thread in the qemuProcessStop method, because this would be
done in the monitor EOF callback. This was wrong because the EOF
callback only runs in the case of a QEMU crash or a guest initiated
clean shutdown & poweroff.  In the case where the libvirt admin
calls virDomainDestroy, the EOF callback never fires because we
have already unregistered the event callbacks. We must thus always
attempt to stop the event thread in qemuProcessStop.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reported-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-30 16:48:15 +01:00
Marc-André Lureau
db670b8d67 qemu: prepare and stop the dbus daemon
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-03-24 15:57:33 +01:00
Michal Privoznik
a02c589886 qemuProcessStartManagedPRDaemon: Don't pass -f pidfile to the daemon
Now, that our virCommandSetPidFile() is more intelligent we don't
need to rely on the daemon to create and lock the pidfile and use
virCommandSetPidFile() at the same time.

NOTE that as advertised in the previous commit, this was
temporarily broken, because both virCommand and
qemuProcessStartManagedPRDaemon() would try to lock the pidfile.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-03-24 15:53:03 +01:00
Gaurav Agrawal
d2c43a5b51 qemu: convert DomainLogContext class to use GObject
Signed-off-by: Gaurav Agrawal <agrawalgaurav@gnome.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-03-16 17:28:39 +01:00
Ján Tomko
b0eea635b3 Use g_strerror instead of virStrerror
Remove lots of stack-allocated buffers.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-03-13 17:26:55 +01:00
Nikolay Shirokovskiy
b47e3b9b5c qemu: agent: sync once if qemu has serial port event
Sync was introduced in [1] to check for ga presence. This
check is racy but in the era before serial events are available
there was not better solution I guess.

In case we have the events the sync function is different. It allows us
to flush stateless ga channel from remnants of previous communications.
But we need to do it only once. Until we get timeout on issued command
channel state is ok.

[1] qemu_agent: Issue guest-sync prior to every command

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-03-12 18:07:50 +01:00
Daniel P. Berrangé
a18f2c52ac qemu: convert agent to use the per-VM event loop
This converts the QEMU agent APIs to use the per-VM
event loop, which involves switching from virEvent APIs
to GMainContext / GSource APIs.

A GSocket is used as a convenient way to create a GSource
for a socket, but is not yet used for actual I/O.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-11 14:45:01 +00:00
Daniel P. Berrangé
436a56e37d qemu: convert monitor to use the per-VM event loop
This converts the QEMU monitor APIs to use the per-VM
event loop, which involves switching from virEvent APIs
to GMainContext / GSource APIs.

A GSocket is used as a convenient way to create a GSource
for a socket, but is not yet used for actual I/O.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-11 14:44:55 +00:00
Daniel P. Berrangé
92890fbfa1 qemu: start/stop an event thread for QMP probing
In common with regular QEMU guests, the QMP probing
will need an event loop for handling monitor I/O
operations.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-11 14:44:47 +00:00
Daniel P. Berrangé
e6afacb0fe qemu: start/stop an event loop thread for domains
The event loop thread will be responsible for handling
any per-domain I/O operations, most notably the QEMU
monitor and agent sockets.

We start this event loop when launching QEMU, but stopping
the event loop is a little more complicated. The obvious
idea is to stop it in qemuProcessStop(), but if we do that
we risk loosing the final events from the QEMU monitor, as
they might not have been read by the event thread at the
time we tell the thread to stop.

The solution is to delay shutdown of the event thread until
we have seen EOF from the QEMU monitor, and thus we know
there are no further events to process.

Note that this assumes that we don't have events to process
from the QEMU agent.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-11 14:44:44 +00:00
Michal Privoznik
13eb6c1468 qemu: Tell secdrivers which images are top parent
When preparing images for block jobs we modify their seclabels so
that QEMU can open them. However, as mentioned in the previous
commit, secdrivers base some it their decisions whether the image
they are working on is top of of the backing chain. Fortunately,
in places where we call secdrivers we know this and the
information can be passed to secdrivers.

The problem is the following: after the first blockcommit from
the base to one of the parents the XATTRs on the base image are
not cleared and therefore the second attempt to do another
blockcommit fails. This is caused by blockcommit code calling
qemuSecuritySetImageLabel() over the base image, possibly
multiple times (to ensure RW/RO access). A naive fix would be to
call the restore function. But this is not possible, because that
would deny QEMU the access to the base image.  Fortunately, we
can use the fact that seclabels are remembered only for the top
of the backing chain and not for the rest of the backing chain.
And thanks to the previous commit we can tell secdrivers which
images are top of the backing chain.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1803551

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-03-09 14:14:55 +01:00
Daniel P. Berrangé
5bff668dfb src: improve thread naming with human targetted names
Historically threads are given a name based on the C function,
and this name is just used inside libvirt. With OS level thread
naming this name is now visible to debuggers, but also has to
fit in 15 characters on Linux, so function names are too long
in some cases.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-05 12:23:04 +00:00
Ján Tomko
b164eac5e1 qemuExtDevicesStart: pass logManager
Pass logManager to qemuExtDevicesStart for future usage.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2020-03-04 12:08:50 +01:00
Ján Tomko
feb69a19ac conf: do not pass vm object to virDomainClearNetBandwidth
This function only uses the domain definition.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-02-25 17:50:47 +01:00
Ján Tomko
7e0d11be5b virsh: include virutil.h where used
Include virutil.h in all files that use it,
instead of relying on it being pulled in somehow.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-02-24 23:15:50 +01:00