1357 Commits

Author SHA1 Message Date
Michal Privoznik
5ac2439a83 qemu_process: Release domain seclabel later in qemuProcessStop()
Some secdrivers (typically SELinux driver) generate unique
dynamic seclabel for each domain (unless a static one is
requested in domain XML). This is achieved by calling
qemuSecurityGenLabel() from qemuProcessPrepareDomain() which
allocates unique seclabel and stores it in domain def->seclabels.
The counterpart is qemuSecurityReleaseLabel() which releases the
label and removes it from def->seclabels. The problem is that with
the current code qemuProcessStop() may still want to use the
seclabel after it was released, e.g. when it wants to restore the
label of a disk mirror.

What happens now is that qemuProcessStop() calls
qemuSecurityReleaseLabel(), which removes the SELinux
seclabel from def->seclabels, and only later calls
qemuSecurityRestoreImageLabel(). This bubbles down to
virSecuritySELinuxRestoreImageLabelSingle(), which finds no SELinux
seclabel (using virDomainDefGetSecurityLabelDef()) and therefore
returns early, doing nothing.
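
The ordering problem described above can be modelled in a few lines of Python. This is only an illustrative sketch, not libvirt code: the Domain class, the "selinux" key, the label string, and all function names are hypothetical stand-ins for def->seclabels and the qemuSecurity*() calls.

```python
class Domain:
    """Hypothetical stand-in for a domain definition with def->seclabels."""
    def __init__(self):
        self.seclabels = {}


def gen_label(dom):
    # Stand-in for generating a dynamic seclabel at domain preparation time.
    dom.seclabels["selinux"] = "dynamic-label-1"


def release_label(dom):
    # Stand-in for releasing the label and removing it from the definition.
    dom.seclabels.pop("selinux", None)


def restore_image_label(dom, image, restored):
    # Mirrors the early return: without a seclabel nothing is restored.
    if "selinux" not in dom.seclabels:
        return
    restored.append(image)


def process_stop(dom, images, release_first):
    """Return the list of images whose label was actually restored."""
    restored = []
    if release_first:
        release_label(dom)
    for image in images:
        restore_image_label(dom, image, restored)
    if not release_first:
        release_label(dom)
    return restored
```

With release_first=True the restore step silently does nothing, which matches the behaviour the message describes; releasing the label last lets the restore step see it.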

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1751664
Fixes: 8fa0374c5b
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2021-01-06 13:29:09 +01:00
Jiri Denemark
f7c40b5c71 qemu: The TSC tolerance interval should be closed
The kernel refuses to set guest TSC frequency less than a minimum
frequency or greater than maximum frequency (both computed based on the
host TSC frequency). When writing the libvirt code with reversed logic
(return success when the requested frequency falls within the tolerance
interval), I forgot to include the boundaries.
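
A closed-interval check of this kind might look like the sketch below. It is written in Python for illustration only; the function name is hypothetical, the 250 ppm default is taken from the message of d8e5b45600 in this log, and the integer rounding is an assumption that may differ from what the kernel and libvirt actually compute.

```python
def tsc_within_tolerance(requested_hz, host_hz, tolerance_ppm=250):
    """Return True if requested_hz lies in the closed tolerance interval."""
    # Assumed rounding: integer division, tolerance relative to the host value.
    tolerance = host_hz * tolerance_ppm // 1_000_000
    min_freq = host_hz - tolerance
    max_freq = host_hz + tolerance
    # Both comparisons are inclusive, so the boundaries themselves are accepted.
    return min_freq <= requested_hz <= max_freq
```

Using strict comparisons (< and >) here would reject a frequency that sits exactly on a boundary, which is the class of bug this commit addresses.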

Fixes: d8e5b45600
https://bugzilla.redhat.com/show_bug.cgi?id=1839095

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-01-06 11:24:37 +01:00
Peter Krempa
d0819b9f02 qemu: Properly handle setting of <iotune> for empty cdrom
When starting a VM with an empty cdrom which has <iotune> configured the
startup fails as qemu is not happy about setting tuning for an empty
drive:

 error: internal error: unable to execute 'block_set_io_throttle', unexpected error: 'Device has no medium'

Resolve this by skipping the setting of throttling for empty drives and
updating the throttling when a new medium is inserted into the drive.
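
The skip-then-update behaviour can be sketched as follows. This is a simplified Python model, not the QEMU driver: the Drive class, its attributes, and both function names are hypothetical.

```python
class Drive:
    """Hypothetical model of a cdrom drive with configured <iotune> values."""
    def __init__(self, iotune, medium=None):
        self.iotune = iotune      # configured throttling values
        self.medium = medium      # None models an empty drive
        self.applied = None       # throttling currently in effect


def apply_throttle(drive):
    # Skip empty drives; in this model that avoids the
    # 'Device has no medium' failure quoted above.
    if drive.medium is None:
        return False
    drive.applied = dict(drive.iotune)
    return True


def insert_medium(drive, medium):
    # Inserting a medium is the point where throttling gets (re)applied.
    drive.medium = medium
    apply_throttle(drive)
```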

Resolves: https://gitlab.com/libvirt/libvirt/-/issues/111
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2021-01-06 09:24:48 +01:00
Shi Lei
9b5d741a9d netdevmacvlan: Use helper function to create unique macvlan/macvtap name
Simplify ReserveName/GenerateName for macvlan and macvtap by using
common functions.

Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-12-15 13:35:33 -05:00
Shi Lei
c36cad1a31 netdevtap: Use common helper function to create unique tap name
Simplify GenerateName/ReserveName for netdevtap by using common
functions.

Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-12-15 13:35:27 -05:00
Daniel Henrique Barboza
9432693e2b domain_conf.c: move virDomainDeviceDefValidate() to domain_validate.c
Move virDomainDeviceDefValidate() and all its helper functions to
domain_validate.c.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-12-14 09:29:09 -03:00
Peter Krempa
18de9dfd77 virDomainDefValidate: Add per-run 'opaque' data
Apart from the global opaque data, the virDomainDefPostParse
infrastructure also has per-run data, but this was not duplicated into
the validation callbacks.

This is important when drivers want to use correct run-state for the
validation.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-12-09 09:33:47 +01:00
Peter Krempa
c1720b9ac7 qemuDomainDiskLookupByNodename: Lookup also backup 'store' nodenames
A nodename may be associated with a disk backup job; add support for
looking it up in that chain too. This is specifically useful for the
BLOCK_WRITE_THRESHOLD event which can be registered for any nodename.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-12-08 15:12:34 +01:00
Michal Privoznik
40a162f83e qemu: Don't cache NUMA caps
In v6.0.0-rc1~439 (and friends) we tried to cache NUMA
capabilities because we assumed they are immutable. And to some
extent they are (NUMA hotplug is not a thing, is it). However,
our capabilities contain also some runtime info that can change,
e.g. hugepages pool allocation sizes or total amount of memory
per node (host side memory hotplug might change the value).

Because of the caching we might not be reporting the correct
runtime info in 'virsh capabilities'.

The NUMA caps are used in three places:

  1) 'virsh capabilities'
  2) domain startup, when parsing numad reply
  3) parsing domain private data XML

In cases 2) and 3) we need NUMA caps to construct list of
physical CPUs that belong to NUMA nodes from numad reply. And
while this may seem static, it's not really because of possible
CPU hotplug on physical host.

There are two possible approaches:

  1) build a validation mechanism that would invalidate the
     cached NUMA caps, or
  2) drop the caching and construct NUMA caps from scratch on
     each use.

In this commit, the latter approach is implemented, because it's
easier.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1819058
Fixes: 1a1d848694
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-12-07 11:32:40 +01:00
Peter Krempa
0f7b80691b qemuMonitorBlockJobInfo: Store 'ready' and 'ready_present' separately
Don't make the logic confusing by representing the 3 options using an
integer with negative values.
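
The difference between the two representations can be sketched like this. The Python names are hypothetical; only the 'ready' and 'ready_present' field names come from the commit subject.

```python
from dataclasses import dataclass


@dataclass
class BlockJobInfo:
    # Two explicit booleans instead of one integer where a negative
    # value would have meant "not reported".
    ready_present: bool = False
    ready: bool = False


def parse_job(reply):
    """Fill BlockJobInfo from a dict standing in for a monitor reply."""
    info = BlockJobInfo()
    if "ready" in reply:
        info.ready_present = True
        info.ready = bool(reply["ready"])
    return info
```

A caller can then test info.ready_present before trusting info.ready, rather than comparing an integer against a negative sentinel.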

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2020-12-07 10:15:00 +01:00
Daniel Henrique Barboza
5a34d0667d qemu: move memory size align to qemuProcessPrepareDomain()
qemuBuildCommandLine() is calling qemuDomainAlignMemorySizes(),
which is an operation that changes live XML and domain and has
little to do with the command line build process.

Move it to qemuProcessPrepareDomain() where we're supposed to
make live XML and domain changes before launch. qemuProcessStart()
is setting VIR_QEMU_PROCESS_START_NEW if !migrate && !snapshot,
same conditions used in qemuBuildCommandLine() to call
qemuDomainAlignMemorySizes(), making this change seamless.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-12-03 17:19:35 -03:00
Daniel Henrique Barboza
3bb9ed8bc2 qemu_process.c: check migrateURI when setting VIR_QEMU_PROCESS_START_NEW
qemuProcessCreatePretendCmdPrepare() is setting the
VIR_QEMU_PROCESS_START_NEW regardless of whether this is
a migration case or not. This behavior differs from what we're
doing in qemuProcessStart(), where the flag is set only
if !migrate && !snapshot.

Fix it by making the flag setting consistent with what we're
doing in qemuProcessStart().
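
The condition both code paths are meant to share can be written down as a small sketch. The Python names and the bit value are hypothetical; only the !migrate && !snapshot rule comes from the text.

```python
START_NEW = 1 << 0  # hypothetical bit, stands in for VIR_QEMU_PROCESS_START_NEW


def start_flags(migrate_uri=None, snapshot=None):
    """Set START_NEW only for a fresh start: no migration, no snapshot."""
    flags = 0
    if migrate_uri is None and snapshot is None:
        flags |= START_NEW
    return flags
```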

Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-12-03 17:16:33 -03:00
John Ferlan
148cfcf051 qemu: Pass / fill niothreads for qemuMonitorGetIOThreads
Let's pass along / fill @niothreads rather than trying to make dual
use as a return value and thread count.

This resolves a Coverity issue detected in qemuDomainGetIOThreadsMon
where, if qemuDomainObjExitMonitor failed, a -1 was returned and
overwrote @niothreads, causing a memory leak.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-12-03 17:06:07 +01:00
Michal Privoznik
b7d4e6b67e lib: Replace VIR_AUTOSTRINGLIST with GStrv
GLib provides g_auto(GStrv), which is a drop-in replacement for our
VIR_AUTOSTRINGLIST.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-12-02 15:43:07 +01:00
Pavel Hrdina
82bda55e2f qemuProcessHandleGraphics: no need to check for NULL
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-11-16 17:25:41 +01:00
Jiri Denemark
d8e5b45600 qemu: Do not require TSC frequency to strictly match host
Some CPUs provide a way to read exact TSC frequency, while measuring it
is required on other CPUs. However, measuring is never exact and the
result may slightly differ across reboots. For this reason both Linux
kernel and QEMU recently started allowing for guests TSC frequency to
fall into +/- 250 ppm tolerance interval around the host TSC frequency.

Let's do the same to avoid unnecessary failures (esp. during migration)
in case the host frequency does not exactly match the frequency
configured in a domain XML.

https://bugzilla.redhat.com/show_bug.cgi?id=1839095

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-11-12 17:29:16 +01:00
Masayoshi Mizuma
5cde9dee8c qemu: Move qemuExtDevicesStop() before removing the pidfiles
A qemu guest with a virtiofs config fails to start if the previous
start attempt failed, e.g. because of an invalid option.

That's because the virtiofsd isn't killed by virPidFileForceCleanupPath()
on the former failure because the pidfile was already removed by
virFileDeleteTree(priv->libDir) in qemuProcessStop(), so
virPidFileForceCleanupPath() just returned.

Move qemuExtDevicesStop() before virFileDeleteTree(priv->libDir) so that
virPidFileForceCleanupPath() can kill virtiofsd correctly.
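
Why the order matters can be shown with a toy model. This is a Python sketch under stated assumptions: pidfiles is a dict standing in for the files under priv->libDir, running is a set of live helper PIDs, and all function names are hypothetical.

```python
def force_cleanup(pidfiles, running, name):
    # Models a pidfile-based cleanup: without a pidfile there is
    # nothing to look up, so the helper process is left alone.
    pid = pidfiles.pop(name, None)
    if pid is None:
        return
    running.discard(pid)


def process_stop(pidfiles, running, ext_stop_first):
    """Return the set of helper PIDs still running after the stop."""
    if ext_stop_first:
        force_cleanup(pidfiles, running, "virtiofsd")
        pidfiles.clear()          # stands in for deleting the lib directory
    else:
        pidfiles.clear()
        force_cleanup(pidfiles, running, "virtiofsd")
    return running
```

In the second ordering the helper survives, which is consistent with the 'Device or resource busy' error in the reproduction that follows.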

To reproduce:

  # virsh start guest
  error: Failed to start domain guest
  error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -foo: invalid option

  ... fix the option ...

  # virsh start guest
  error: Failed to start domain guest
  error: Cannot open log file: '/var/log/libvirt/qemu/guest-fs0-virtiofsd.log': Device or resource busy
  #

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-11-11 15:20:12 +01:00
Peter Krempa
62a01d84a3 util: hash: Retire 'virHashTable' in favor of 'GHashTable'
Don't hide our use of GHashTable behind our typedef. This will also
promote the use of glib's hash functions directly.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Matt Coleman <matt@datto.com>
2020-11-06 10:40:51 +01:00
Daniel P. Berrangé
99a1cfc438 qemu: honour fatal errors dealing with qemu slirp helper
Currently all errors from qemuInterfacePrepareSlirp() are completely
ignored by the callers. The intention is that missing qemu-slirp binary
should cause the caller to fallback to the built-in slirp impl.

Many of the possible errors though should indeed be considered fatal.
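
The distinction between "helper absent, fall back" and "helper present but broken, fail" can be sketched as below. This is a hypothetical Python model; the function names, the probe callback, and the return convention are assumptions and do not describe the actual signature of qemuInterfacePrepareSlirp().

```python
def prepare_slirp(helper_path, probe):
    """Return True if the helper is usable, False if there is none.

    Any other failure is raised to the caller instead of being ignored.
    """
    if helper_path is None:
        return False          # missing helper: not an error
    probe(helper_path)        # may raise; such errors are treated as fatal
    return True


def choose_backend(helper_path, probe):
    # Fall back to the built-in implementation only when no helper exists.
    return "helper" if prepare_slirp(helper_path, probe) else "builtin"
```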

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-10-27 12:03:19 +00:00
zhenwei pi
7555a55470 qemu: implement memory failure event
Since QEMU 5.2 (commit 77b285f7f6), QEMU supports a 'memory failure'
event, which it posts to the monitor when it hits a hardware memory error.
Fully support this feature for QEMU.

Tested with commit 'libvirt: support memory failure event' in a
slightly complex environment (nested KVM):
1. Install the newly built libvirt in L1 and start an L2 VM. Run this
command in L1:
 ~# virsh event l2 --event memory-failure

2. Run this command in L0 to inject an MCE into L1:
 ~# virsh qemu-monitor-command l1 --hmp mce 0 9 0xbd000000000000c0 0xd 0x62000000 0x8c

Test result in L1 (recipient hypervisor case):
event 'memory-failure' for domain l2:
recipient: hypervisor
action: ignore
flags:
        action required: 0
        recursive: 0

Test result in L1 (recipient guest case):
event 'memory-failure' for domain l2:
recipient: guest
action: inject
flags:
        action required: 0
        recursive: 0

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-10-23 09:42:00 +02:00
Peter Krempa
d6d4c08daf util: hash: Change type of hash table name/key to 'char'
All users of virHashTable pass strings as the name/key of the entry.
Make this an official requirement by turning the variables to 'const
char *'.

For any other case it's better to use glib's GHashTable.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2020-10-22 15:02:46 +02:00
Daniel P. Berrangé
7b1ed1cd73 qemu: stop passing -enable-fips to QEMU >= 5.2.0
Use of the -enable-fips option is being deprecated in QEMU >= 5.2.0. If
FIPS compliance is required, QEMU must be built with libgcrypt which will
unconditionally enforce it.

Thus there is no need for libvirt to pass -enable-fips to modern QEMU.
Unfortunately there was never any way to probe for -enable-fips in the
first instance, it was enabled by libvirt based on version number
originally, and then later unconditionally enabled when libvirt dropped
support for older QEMU. Similarly we now use a version number check to
decide when to stop passing -enable-fips.
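
A version-number gate of the kind described might look like this sketch. It is Python for illustration; the function names are hypothetical, the major * 1000000 + minor * 1000 + micro encoding follows libvirt's usual version numbering, and the host_fips_enabled parameter is an assumption about when the option was passed at all.

```python
def qemu_version(major, minor, micro):
    # Single-integer version encoding: major * 1000000 + minor * 1000 + micro.
    return major * 1_000_000 + minor * 1_000 + micro


def needs_enable_fips(version, host_fips_enabled):
    """Pass -enable-fips only to QEMU older than 5.2.0 (assumed condition)."""
    return host_fips_enabled and version < qemu_version(5, 2, 0)
```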

Note that the qemu-5.2 capabilities are currently from the pre-release
version and will be updated once qemu-5.2 is released.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-10-22 09:03:33 +02:00
Jonathon Jongsma
08f8fd8413 conf: Add support for vDPA network devices
This patch adds new schema and adds support for parsing and formatting
domain configurations that include vdpa devices.

vDPA network devices allow high-performance networking in a virtual
machine by providing a wire-speed data path. These devices require a
vendor-specific host driver but the data path follows the virtio
specification.

When a device on the host is bound to an appropriate vendor-specific
driver, it will create a chardev on the host at e.g.  /dev/vhost-vdpa-0.
That chardev path can then be used to define a new interface with
type='vdpa'.
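
As a rough illustration, the interface element described above could be built like this. It is a Python sketch using the standard library; the helper name is hypothetical, and the <source dev='...'/> child is an assumption about the schema this patch adds, so the libvirt domain XML documentation remains the authority.

```python
import xml.etree.ElementTree as ET


def vdpa_interface(chardev_path):
    """Build an <interface type='vdpa'> element pointing at a host chardev."""
    iface = ET.Element("interface", {"type": "vdpa"})
    # Assumed child element carrying the chardev path.
    ET.SubElement(iface, "source", {"dev": chardev_path})
    return iface


print(ET.tostring(vdpa_interface("/dev/vhost-vdpa-0"), encoding="unicode"))
```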

Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-10-20 14:46:52 -04:00
Peter Krempa
7b0ced89e7 qemu: Prepare hostdev data which depends on the host state separately
SCSI hostdev setup requires querying the host os for the actual path of
the configured hostdev. This was historically done in the command line
formatter. Our new approach is to split out this part into
'qemuProcessPrepareHost' which is designed to be skipped in tests.

Refactor the hostdev code to use this new semantics, and add appropriate
handlers filling in the data for tests and the qemuConnectDomainXMLToNative
users.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-10-20 15:08:22 +02:00
Peter Krempa
9ff3ad9058 qemuProcessCreatePretendCmd: Split up preparation and command building
Host preparation steps which are deliberately skipped when
pretend-creating a commandline are normally executed after VM object
preparation. In the test code we are faking some of the host
preparation steps, but we were doing that prior to the call to
qemuProcessPrepareDomain embedded in qemuProcessCreatePretendCmd.

By splitting up qemuProcessCreatePretendCmd into two functions we can
ensure that the ordering of the prepare steps stays consistent.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-10-20 15:08:22 +02:00
Erik Skultety
ccb40cf288 qemu: process: sev: Fill missing 'cbitpos' & 'reducedPhysBits' from caps
These XML attributes have been mandatory since the introduction of SEV
support to libvirt. This design decision was based on QEMU's
requirement for these to be mandatory for migration purposes, as
differences in these values across platforms must result in the
pre-migration checks failing (not that migration with SEV works at the
time of this patch).

This patch enables autofill of these attributes right before launching
QEMU and thus updating the live XML.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-19 11:03:27 +02:00
Erik Skultety
1fdc907325 qemu: process: Move SEV capability check to qemuValidateDomainDef
Checks such as this one should be done at domain def validation time,
not before starting the QEMU process.
As for this change, existing domains will see some QEMU error when
starting as opposed to a libvirt error that this QEMU binary doesn't
support SEV, but that's okay, we never guaranteed error messages to
remain the same.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-19 11:03:16 +02:00
Erik Skultety
649f720a9a qemu_process: sev: Drop an unused variable
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-19 11:01:56 +02:00
Pavel Hrdina
5ad8272888 util: vircgroup: change virCgroupFree to take only virCgroupPtr
As preparation for g_autoptr() we need to change the function to take
only virCgroupPtr.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
2020-10-09 16:24:35 +02:00
Ján Tomko
cc3190cc4c qemu: process: use g_new0
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-10-05 16:44:06 +02:00
Ján Tomko
868c350752 qemu: separate out VIR_ALLOC calls
Move them to separate conditions to reduce churn
in following patches.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-10-05 16:44:06 +02:00
Cole Robinson
0fa5c23865 qemu: Taint cpu host-passthrough only after migration
From a discussion last year[1], Dan recommended libvirt drop the taint
flag for cpu host-passthrough, unless the VM has been migrated.

This repurposes the existing host-cpu taint flag to do just that.

[1]: https://www.redhat.com/archives/virt-tools-list/2019-February/msg00041.html

https://bugzilla.redhat.com/show_bug.cgi?id=1673098

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
2020-10-05 10:08:26 -04:00
Peter Krempa
faa88866f5 Don't check return value of virBitmapNewCopy
The function will not fail any more.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-05 15:50:45 +02:00
Peter Krempa
cb6fdb0125 virBitmapNew: Don't check return value
Remove return value check from all callers.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-05 15:38:47 +02:00
Masayoshi Mizuma
1c9227de5d qemu: process: Handle transient disks on VM startup
Add overlays after the VM starts before we start executing guest code.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Peter Krempa
afc25e8553 qemu: prepare cleanup for <transient/> disk overlays
Later patches will implement support for <transient/> disks in libvirt
by installing an overlay on top of the configured image. This will
require cleanup after the VM will be stopped so that the state is
correctly discarded.

Since the overlay will be installed only during the startup phase of the
VM we need to ensure that qemuProcessStop doesn't delete the original
file on some previous failure. This is solved by adding
'inhibitDiskTransientDelete' VM private data member which is set prior
to any startup step and will be cleared once transient disk overlays are
established.

Based on that we can then delete the overlays for any <transient/> disk.
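
The role of the flag can be sketched with a small model. This is Python for illustration only: files is a set standing in for the filesystem, the ".TRANSIENT" suffix and all function names are hypothetical, and only the inhibitDiskTransientDelete idea comes from the text.

```python
class VMPrivate:
    """Hypothetical stand-in for the VM private data."""
    def __init__(self):
        self.inhibit_disk_transient_delete = False


def overlay_path(disk):
    return disk + ".TRANSIENT"   # hypothetical overlay naming


def start(priv, files, disks, fail_early=False):
    priv.inhibit_disk_transient_delete = True    # set before any startup step
    if fail_early:
        return False                             # overlays never created
    for disk in disks:
        files.add(overlay_path(disk))
    priv.inhibit_disk_transient_delete = False   # overlays are established
    return True


def stop(priv, files, disks):
    # Delete overlays only if this run is known to have created them.
    if priv.inhibit_disk_transient_delete:
        return
    for disk in disks:
        files.discard(overlay_path(disk))
```

If startup fails before the overlays exist, stop() leaves any file that happens to sit at the overlay path untouched.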

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Peter Krempa
3673bdbe13 qemu: domain: Extract preparation of hostdev specific data to a separate function
Historically we've prepared secrets for all objects in one place. This
doesn't make much sense and it's semantically more appealing to prepare
everything for a single device type in one place.

Move the setup of the (iSCSI|SCSI) hostdev secrets into a new function
which will be used to setup other things as well in the future.

This is similar to the approach we use for disks.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Ján Tomko
af16e754cd qemuProcessReconnect: clear 'oldjob'
After we started copying the privateData pointer in
qemuDomainObjRestoreJob, we should also free them
once we're done with them.

Register the clear function and use g_auto.
Also add a check for job->cb to qemuDomainObjClearJob,
to prevent freeing an uninitialized job.

https://bugzilla.redhat.com/show_bug.cgi?id=1878450

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: aca37c3fb2
2020-09-14 18:10:56 +02:00
Tim Wiederhake
caf5a88e59 qemu: Use glib memory functions in qemuProcessReadLog
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-11 18:19:58 +02:00
Michal Privoznik
ec46e6d44b qemu_process: Separate VIR_PERF_EVENT_* setting into a function
When starting a domain, qemuProcessLaunch() iterates over all
VIR_PERF_EVENT_* values and (possibly) enables them. While there
is nothing wrong with the code, the for loop where it's done makes
it harder to jump onto next block of code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 10:57:24 +02:00
Martin Kletzander
f5b486daea qemu: Allow setting affinity to fail and don't report error
This is just a clean-up of commit 3791f29b08 using the new parameter of
virProcessSetAffinity() introduced in commit 9514e24984 so that there is
no error reported in the logs.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 14:48:57 +02:00
Martin Kletzander
9514e24984 Do not report error when setting affinity is allowed to fail
Suggested-by: Ján Tomko <jtomko@redhat.com>

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 11:35:36 +02:00
Nikolay Shirokovskiy
5c0cd375d1 qemu: don't shutdown event thread in monitor EOF callback
This hunk was introduced in [1] in order to avoid losing
events from monitor on stopping qemu process. But as explained
in [2] on destroy we will get neither EOF nor any other
events as monitor is just closed. In case of crash/shutdown
we won't get any more events as well and qemuDomainObjStopWorker
will be called by qemuProcessStop eventually. Thus let's
remove qemuDomainObjStopWorker from qemuProcessHandleMonitorEOF
as it is not useful anymore.

[1] e6afacb0f: qemu: start/stop an event loop thread for domains
[2] d2954c072: qemu: ensure domain event thread is always stopped

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-07 09:33:59 +03:00
Martin Kletzander
fc7d53edf4 qemu: Fix comment in qemuProcessSetupPid
This was supposed to be done in commit 3791f29b08, but I missed a spot.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2020-09-06 13:44:27 +02:00
Martin Kletzander
3791f29b08 qemu: Do not error out when setting affinity failed
Consider a host with 8 CPUs. There are the following possible scenarios:

1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs

2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs

3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPUs;
   QEMU should get 8 CPUs

4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPUs;
   QEMU should get 8 CPUs

5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPUs;
   QEMU should get 4 CPUs

6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPUs;
   QEMU should get 4 CPUs

Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.

Scenario 3 works because libvirt checks current affinity first and
skips the sched_setaffinity call, avoiding the SYS_NICE issue.

Scenario 4 works only if CAP_SYS_NICE is available.

Scenarios 5 & 6 work only if CAP_SYS_NICE is present *AND* the cgroups
cpuset is not set on the container.

If libvirt blindly ignores the sched_setaffinity failure, then scenarios
4, 5 and 6 should all work, but with the caveat in cases 4 and 6 that
QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
This is still better than failing.

Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
ignore it when there was no affinity specified in the XML config.
If user specified affinity explicitly, libvirt must report an error if
it can't be honoured.
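
The resulting policy can be summarised in a short sketch. It is Python for illustration; the function and parameter names are hypothetical, and set_affinity stands in for whatever call ends up invoking sched_setaffinity.

```python
def setup_affinity(set_affinity, cpumask, explicit_in_xml):
    """Apply cpumask; tolerate failure only for an implicit (default) mask.

    Returns True if the affinity was set, False if the failure was ignored.
    """
    try:
        set_affinity(cpumask)
    except OSError:
        if explicit_in_xml:
            raise             # the user asked for this mask: report the error
        return False          # default mask: carry on with inherited affinity
    return True
```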

Resolves: https://bugzilla.redhat.com/1819801

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-04 14:44:21 +02:00
Michal Privoznik
95b9db4ee2 lib: Prefer WITH_* prefix for #if conditionals
Currently, we are mixing: #if HAVE_BLAH with #if WITH_BLAH.
Things got way better with Pavel's work on meson, but apparently,
mixing these two leads to confusing and easy to miss bugs (see
31fb929eca for instance). While we were forced to use HAVE_
prefix with autotools, we are free to choose our own prefix with
meson and since WITH_ prefix appears to be more popular let's use
it everywhere.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-02 10:28:10 +02:00
Laine Stump
95089f481e util: assign tap device names using a monotonically increasing integer
When creating a standard tap device, if provided with an ifname that
contains "%d", rather than taking that literally as the name to use
for the new device, the kernel will instead use that string as a
template, and search for the lowest number that could be put in place
of %d and produce an otherwise unused and unique name for the new
device. For example, if there is no tap device name given in the XML,
libvirt will always send "vnet%d" as the device name, and the kernel
will create new devices named "vnet0", "vnet1", etc. If one of those
devices is deleted, creating a "hole" in the name list, the kernel
will always attempt to reuse the name in the hole first before using a
name with a higher number (i.e. it finds the lowest possible unused
number).
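
The "lowest unused number" behaviour described above can be mimicked with a short sketch. This is Python for illustration, not kernel code, and the function name is hypothetical.

```python
def lowest_free_name(template, in_use):
    """Return the first name produced by template that is not in in_use."""
    n = 0
    while template % n in in_use:
        n += 1
    return template % n
```

With "vnet0" and "vnet2" in use, the hole at "vnet1" is filled first, so a name freed a moment ago is the most likely one to be handed out next.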

The problem with this, as described in the previous patch dealing with
macvtap device naming, is that it makes "immediate reuse" of a newly
freed tap device name *much* more common, and in the aftermath of
deleting a tap device, there is some other necessary cleanup of things
which are named based on the device name (nwfilter rules, bandwidth
rules, OVS switch ports, to name a few) that could end up stomping
over the top of the setup of a new device of the same name for a
different guest.

Since the kernel "create a name based on a template" functionality for
tap devices doesn't exist for macvtap, this patch for standard tap
devices is a bit different from the previous patch for macvtap - in
particular there was no previous "bitmap ID reservation system" or
overly-complex retry loop that needed to be removed. We simply find
an unused name, and pass that name on to the kernel instead of
"vnet%d".

This counter is also wrapped when either it gets to INT_MAX or if the
full name would overflow IFNAMSIZ-1 characters. In the case of
"vnet%d" and a 32 bit int, we would reach INT_MAX first, but possibly
someday someone will change the name from vnet to something else.

(NB: It is still possible for a user to provide their own
parameterized template name (e.g. "mytap%d") in the XML, and libvirt
will just pass that through to the kernel as it always has.)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:44 -04:00
Laine Stump
d7f38beb2e util: replace macvtap name reservation bitmap with a simple counter
There have been some reports that, due to libvirt always trying to
assign the lowest numbered macvtap / tap device name possible, a new
guest would sometimes be started using the same tap device name as
previously used by another guest that is in the process of being
destroyed *as the new guest is starting*.

In some cases this has led to, for example, the old guest's
qemuProcessStop() code deleting a port from an OVS switch that had
just been re-added by the new guest (because the port name is based on
only the device name using the port). Similar problems can happen (and
I believe have) with nwfilter rules and bandwidth rules (which are
both instantiated based on the name of the tap device).

A couple patches have been previously proposed to change the ordering
of startup and shutdown processing, or to put a mutex around
everything related to the tap/macvtap device name usage, but in the
end no matter what you do there will still be possible holes, because
the device could be deleted outside libvirt's control (for example,
regular tap devices are automatically deleted when the qemu process
terminates, and that isn't always initiated by libvirt but could
instead happen completely asynchronously - libvirt then has no control
over the ordering of shutdown operations, and no opportunity to
protect it with a mutex.)

But this only happens if a new device is created at the same time as
one is being deleted. We can effectively eliminate the chance of this
happening if we end the practice of always looking for the lowest
numbered available device name, and instead just keep an integer that
is incremented each time we need a new device name. At some point it
will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15
character limit if nothing else), and we can't guarantee that the new
name really will be the *least* recently used name, but "math"
suggests that it will be *much* less common that we'll try to re-use
the *most* recently used name.

This patch implements such a counter for macvtap/macvlan, replacing
the existing, and much more complicated, "ID reservation" system. The
counter is set according to whatever macvtap/macvlan devices are
already in use by guests when libvirtd is started, incremented each
time a new device name is needed, and wraps back to 0 when either
INT_MAX is reached, or when the resulting device name would be longer
than IFNAMSIZ-1 characters (which actually is what happens when the
template for the device name is "macvtap%d"). The result is that no
macvtap name will be re-used until the host has created (and possibly
destroyed) 99,999,999 devices.
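
The wrap-around rule can be sketched as follows. This is a simplified Python model with hypothetical names; it assumes IFNAMSIZ is 16 as on Linux and a 32-bit INT_MAX, and it omits the check for names that are already in use.

```python
IFNAMSIZ = 16            # Linux interface name buffer size, including the NUL
INT_MAX = 2**31 - 1      # assumes a 32-bit int


class NameCounter:
    """Monotonically increasing ID that wraps back to 0 when it must."""
    def __init__(self, prefix, last_id=-1):
        self.prefix = prefix
        self.last_id = last_id

    def next_name(self):
        candidate = self.last_id + 1
        too_long = len(f"{self.prefix}{candidate}") > IFNAMSIZ - 1
        if candidate > INT_MAX or too_long:
            candidate = 0
        self.last_id = candidate
        return f"{self.prefix}{candidate}"
```

For the "macvtap" prefix (7 characters) only 8 digits fit into the 15 usable characters, so the length limit is reached before INT_MAX; for "vnet" it is the other way round.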

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:36 -04:00
Ján Tomko
0a37e0695b Split declarations from initializations
Split those initializations that depend on a statement
above them.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00
Ján Tomko
a5152f23e7 Move declarations before statements
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00