Commit Graph

28260 Commits

Author SHA1 Message Date
Daniel P. Berrangé
f8ec7c842d rpc: use new virt-ssh-helper binary for remote tunnelling
This wires up support for using the new virt-ssh-helper binary with the ssh,
libssh and libssh2 protocols.

The new binary will be used preferentially if it is available in $PATH,
otherwise we fall back to traditional netcat.

The "proxy" URI parameter can be used to force use of netcat e.g.

  qemu+ssh://host/system?proxy=netcat

or the disable fallback e.g.

  qemu+ssh://host/system?proxy=native

With use of virt-ssh-helper, we can now support remote session URIs

  qemu+ssh://host/session

and this will only use virt-ssh-helper, with no fallback. This also lets
the libvirtd process be auto-started, and connect directly to the
modular daemons, avoiding use of virtproxyd back-compat tunnelling.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
6e4143c851 rpc: switch order of args in virNetClientNewSSH
Switch keyfile and netcat parameters, since the netcat path and
socket path are a logical pair that belong together. This patches
the other constructors.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
dfad1b551d remote: introduce virt-ssh-helper binary
When accessing libvirtd over a SSH tunnel, the remote driver needs a way
to proxy the SSH input/output stream to a suitable libvirt daemon. This
is currently done by spawning netcat, pointing it to the libvirtd socket
path. This is problematic for a number of reasons:

 - The socket path varies according to the --prefix chosen at build
   time. The remote client is seeing the local prefix, but what we
   need is the remote prefix

 - The socket path varies according to remote env variables, such as
   the XDG_RUNTIME_DIR location. Again we see the local XDG_RUNTIME_DIR
   value, but what we need is the remote value (if any)

 - The remote driver doesn't know whether it must connect to the legacy
   libvirtd or the modular daemons, so must always assume legacy
   libvirtd for back-compat. This means we'll always end up using the
   virtproxyd daemon adding an extra hop in the RPC layer.

 - We can not able to autospawn the libvirtd daemon for session mode
   access

To address these problems this patch introduces the 'virtd-ssh-helper'
program which takes the URI for the remote driver as a CLI parameter.
It then figures out which daemon to connect to and its socket path,
using the same code that the remote driver client would on the remote
host's build of libvirt.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
1ba30bc1e5 remote: extract logic for determining daemon to connect to
We'll shortly want to reuse code for determining whether to connect to
the system or session daemon from places outside the remote driver
client. Pulling it out into a self contained function facilitates reuse.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
7ce887989a remote: split out function for constructing socket path
The remoteGetUNIXSocketHelper method will be needed by source files
beyond the remote driver client.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
3dfd0cbf25 remote: parse the remote transport string earlier
We delay converting the remote transport string to enum form until
fairly late. As a result we're doing string comparisons when we
could be just doing enum comparisons.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
849de2b844 remote: split out function for parsing URI scheme
The remoteSplitURISCheme method will be needed by source files beyond
the remote driver client.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
9949d9b28f remote: split off enums into separate source file
The remoteDriverTransport and remoteDriverMode enums are going to be
needed by source files beyond the remote driver client.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
5cd80b04d9 remote: push logic for default netcat binary into common helper
We don't want to repeat the choice of default netcat binary setting in
three different places. This will also make it possible to do better
error reporting in the helper.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Daniel P. Berrangé
019b13dd20 rpc: merge logic for generating remote SSH shell script
Three parts of the code all build up the same SSH shell script
snippet for remote tunneling the RPC protocol, but in slightly
different ways. Combine them all into one helper method in the
virNetClient code, since this logic doesn't really belong in
the virNetSocket code.

Note that the this change means the shell snippet is passed to
the SSH binary as a single arg, instead of three separate args,
but this is functionally identical, as the three separate args
were combined into one already when passed to the remote system.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-09 16:46:22 +01:00
Sebastian Mitterle
653fdf48e3 disk storage: fix allocation size for pool format dos
The changed condition was always false because the function was always
called with boundary values 0.

Use the free extent's start value to get its start offset from the
cylinder boundary and determine if the needed size for allocation
needs to be expanded too in case the offset doesn't fit within extra
bytes for alignment.

This fixes an issue where vol-create-from will call qemu-img convert
to create a destination volume of same capacity as the source volume
and qemu-img will error 'Cannot grow device files' due to the partition
being too small for the source although both destination partition and
source volume have the same capacity.

Signed-off-by: Sebastian Mitterle <smitterl@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-09 16:53:40 +02:00
Hao Wang
0011ec3191 client: fix memory leak in client msg
When closing client->waitDispatch in virNetClientIOEventLoopRemoveAll
or virNetClientIOEventLoopRemoveDone, VIR_FREE() is called to free
call->msg directly, resulting in leak of the memory call->msg->buffer
points to.
Use virNetMessageFree(call->msg) instead of VIR_FREE(call->msg).

Signed-off-by: Hao Wang <wanghao232@huawei.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-09 15:03:45 +02:00
Michal Privoznik
c43622f06e qemuFirmwareFillDomain: Fill NVRAM template on migration too
In 8e1804f9f6 I've tried to fix the following use case: domain
is started with path to UEFI only and relies on libvirt to figure
out corresponding NVRAM template to create a per-domain copy
from. The fix consisted of having a check tailored exactly for
this use case and if it's hit then using FW autoselection to
figure it out. Unfortunately, the NVRAM template is not saved in
the inactive XML (well, the domain might be transient anyway).
Then, as a part of that check we see whether the per-domain copy
doesn't exist already and if it does then no template is looked
up hence no template will appear in the live XML.

This works, until the domain is migrated. At the destination, the
per-domain copy will not exist so we need to know the template to
create the per-domain copy from. But we don't even get to the
check because we are not starting a fresh new domain and thus the
qemuFirmwareFillDomain() function quits early.

The solution is to switch order of these two checks. That is
evaluate the check for the old style before checking flags.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852910
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
2020-09-09 14:47:51 +02:00
Tim Wiederhake
0b261edd15 cpu_ppc64: Remove error path in virCPUppc64DriverGetModels
The call to `g_strfreev` is not required, as in both cases no memory has been allocated.

Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-09 08:44:22 +02:00
Tim Wiederhake
239218c567 cpu_map: Use g_auto* in loadData
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-09 08:44:22 +02:00
Daniel Henrique Barboza
ddbff7e3ee cpu_ppc64.c: use g_autoptr() whenever possible
Using g_autoptr() in virCPUDef pointers allows for more
cleanups in ppc64Compute() and virCPUppc64Baseline()

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
c9491341af cpu_ppc64.c: use g_autofree() whenever possible
This allows for a label removal in ppc64ModelParse().

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
ec79c3338b cpu_ppc64.c: use g_autoptr() in virCPUppc64GetHost()
We don't need to call virCPUppc64DataFree() in a cleanup label.
This function is already assigned to the 'dataFree' interface
of cpuDriverPPC64, and it will be called by virCPUDataFree(), the
autocleanup function of virCPUDataPtr, via driver->dataFree.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
76d2c048c9 cpu_ppc64.c: use g_autoptr() with virCPUppc64MapPtr
Use autocleanup with virCPUppc64MapPtr to simplify existing
code. Remove labels when possible.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
f5e5ba3a55 cpu_ppc64.c: register AUTOPTR_CLEANUP_FUNC for virCPUppc64MapPtr
Next patch will use g_autoptr() in virCPUppc64MapPtr pointers
for some cleanups.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
c79825d705 cpu_ppc64.c: use typedefs for 'struct ppc64_map'
Introduce virCPUppc64Map and virCPUppc64MapPtr types to
improve code readability.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
93c1e64a16 cpu_ppc64.c: use g_autoptr() with virCPUppc64ModelPtr
Use autocleanup with virCPUppc64ModelPtr to simplify existing
code. Remove the 'error' label in ppc64ModelCopy() since it is
now obsolete.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
5cf09b579f cpu_ppc64.c: register AUTOPTR_CLEANUP_FUNC for virCPUppc64ModelPtr
Next patch will use g_autoptr() in virCPUppc64ModelPtr pointers
for some cleanups.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
b45b3dbe7a cpu_ppc64.c: use typedefs for 'struct ppc64_model'
Introduce virCPUppc64Model and virCPUppc64ModelPtr types to
improve code readability.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
a79ba15546 cpu_ppc64.c: modernize ppc64VendorParse()
Use g_autoptr() in virCPUppc64VendorPtr and remove the now
uneeded 'cleanup' label.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
339525a68d cpu_ppc64.c: register AUTOPTR_CLEANUP_FUNC for virCPUppc64VendorPtr
Next patch will use g_autoptr() in virCPUppc64VendorPtr pointers
for some cleanups.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:15 +02:00
Daniel Henrique Barboza
0d4ee33633 cpu_ppc64.c: use typedefs for 'struct ppc64_vendor'
Introduce virCPUppc64Vendor and virCPUppc64VendorPtr types to
improve code readability.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 18:13:14 +02:00
Tim Wiederhake
d369264a99 cpu: Use g_auto* in virCPUGetHost
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 17:41:22 +02:00
Tim Wiederhake
ca1c9716cc cpu: Use g_auto* in virCPUCompareXML
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 17:41:22 +02:00
Tim Wiederhake
e8d40cc3be cpu_map: Use g_auto* in loadIncludes
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 17:41:22 +02:00
Tim Wiederhake
6d1c458346 cpu_map: Use g_auto* in cpuMapLoadInclude
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 17:41:22 +02:00
Pavel Hrdina
a490693367 src/storage/meson: fix vir_storage_file_gluster module dependencies
The correct key for dependencies for virt_modules hash is `deps`.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-09-08 16:14:49 +02:00
Ján Tomko
92b252456e check for NULL before calling g_regex_unref
g_regex_unref reports an error if called with a NULL argument.

We have two cases in the code where we (possibly) call it on a NULL
argument. The interesting one is in virDomainQemuMonitorEventCleanup.

Based on VIR_CONNECT_DOMAIN_QEMU_MONITOR_EVENT_REGISTER_REGEX, we unref
data->regex, which has two problems:

* On the client side, flags is -1 so the comparison is true even if no
  regex was used, reproducible by:
  $ virsh qemu-monitor-event --timeout 1
  which results in an ugly error:
(process:1289846): GLib-CRITICAL **: 14:58:42.631: g_regex_unref: assertion 'regex != NULL' failed
* On the server side, we only create the regex if both the flag and the
  string are present, so it's possible to trigger this message by:
  $ virsh qemu-monitor-event --regex --timeout 1

Use a non-NULL comparison instead of the flag to decide whether we need
to unref the regex. And add a non-NULL check to the unref in the
VirtualBox test too.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: 71efb59a4d
https://bugzilla.redhat.com/show_bug.cgi?id=1876907
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2020-09-08 16:07:47 +02:00
Michal Privoznik
9e0d4b9240 virnuma: Report error when NUMA -> CPUs translation fails
When starting a domain with <numatune/> set libvirt translates
given NUMA nodes into a set of host CPUs which is then used to
QEMU process affinity. But, if the numatune contains a
non-existent NUMA node then the translation fails with no error
reported. This is because virNumaNodesetToCPUset() calls
virNumaGetNodeCPUs() and expects it to report an error on
failure. Well, it does except for non-existent NUMA nodes. While
this behaviour might look strange it is actually desired because
of how we construct host capabilities. The virNumaGetNodeCPUs()
is called from virCapabilitiesHostNUMAInitReal() where we do not
want any error reported for non-existent NUMA nodes.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1724866
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2020-09-08 11:00:54 +02:00
Michal Privoznik
ec46e6d44b qemu_process: Separate VIR_PERF_EVENT_* setting into a function
When starting a domain, qemuProcessLaunch() iterates over all
VIR_PERF_EVENT_* values and (possibly) enables them. While there
is nothing wrong with the code, the for loop where it's done makes
it harder to jump onto next block of code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 10:57:24 +02:00
Peter Krempa
de79fad40f qemuBlockStorageSourceCreateDetectSize: Propagate cluster size for 'qcow2'
Propagate the cluster size from the original image as the user might
have configured a custom cluster size for performance reasons. Propagate
the cluster size of a qcow2 image to the new overlay or copy.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-09-08 08:48:53 +02:00
Peter Krempa
e60620e28b qemu: block: Allow specifying cluster size when using 'blockdev-create'
'blockdev-create' allows us to create the image with a custom cluster
size if we wish to. Wire it up for 'qcow2'.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-09-08 08:48:53 +02:00
Peter Krempa
fd49364d8b qemu: monitor: Detect image cluster size from 'query-named-block-nodes'
Configuring the cluster size of an image may have performance
implications. This patch allows us to detect cluster size for existing
images so that we will be able to propagate it to new images which are
based on existing images e.g. during snapshots/block-copy/etc.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-09-08 08:48:53 +02:00
Michal Privoznik
4a72b76b8a qemu_namespace: Don't leak mknod items that are being skipped over
When building and populating domain NS a couple of functions are
called that append paths to a string list. This string list is
then inspected, one item at the time by
qemuNamespacePrepareOneItem() which gathers all the info for
given path (stat buffer, possible link target, ACLs, SELinux
label) using qemuNamespaceMknodItemInit(). If the path needs to
be created in the domain's private /dev then it's added onto this
qemuNamespaceMknodData list which is freed later in the process.
But, if the path does not need to be created in the domain's
private /dev, then the memory allocated by
qemuNamespaceMknodItemInit() is not freed anywhere leading to a
leak.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 16:27:25 +02:00
Martin Kletzander
f5b486daea qemu: Allow setting affinity to fail and don't report error
This is just a clean-up of commit 3791f29b08 using the new parameter of
virProcessSetAffinity() introduced in commit 9514e24984 so that there is
no error reported in the logs.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 14:48:57 +02:00
Martin Kletzander
9514e24984 Do not report error when setting affinity is allowed to fail
Suggested-by: Ján Tomko <jtomko@redhat.com>

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 11:35:36 +02:00
Ján Tomko
7afc99ae2d qemu: migration: remove unused variable
../src/qemu/qemu_migration.c:4091:36: error: unused variable 'cfg' [-Werror,-Wunused-variable]
    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: d92c2bbc65
2020-09-07 11:03:54 +02:00
Michal Privoznik
d92c2bbc65 lib: Prefer g_autoptr() declaration of virQEMUDriverConfigPtr
In the past we had to declare @cfg and then explicitly unref it.
But now, with glib we can use g_autoptr() which will do the unref
automatically and thus is more bulletproof.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-09-07 10:47:54 +02:00
Michal Privoznik
5befe4ee18 qemu_interface: Fix @cfg refcounting in qemuInterfacePrepareSlirp()
In the qemuInterfacePrepareSlirp() function, the qemu driver
config is obtained (via virQEMUDriverGetConfig()), but it is
never unrefed leading to mangled refcounter.

Fixes: 9145b3f1cc
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-09-07 10:46:21 +02:00
Nikolay Shirokovskiy
9b648cb83e util: remove unused virThreadPoolNew macro
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:34:00 +03:00
Nikolay Shirokovskiy
61845fbf42 rpc: cleanup virNetDaemonClose method
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
399039a6b1 qemu: implement driver's shutdown/shutdown wait methods
On shutdown we just stop accepting new jobs for worker thread so that on
shutdown wait we can exit worker thread faster. Yes we basically stop
processing of events for VMs but we are going to do so anyway in case of daemon
shutdown.

At the same time synchronous event processing that some API calls may require
are still possible as per VM event loop is still running and we don't need
worker thread for synchronous event processing.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
860a999802 qemu: avoid deadlock in qemuDomainObjStopWorker
We are dropping the only reference here so that the event loop thread
is going to be exited synchronously. In order to avoid deadlocks we
need to unlock the VM so that any handler being called can finish
execution and thus even loop thread be finished too.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
f4fc3db920 vireventthread: exit thread synchronously on finalize
It it useful to be sure no thread is running after we drop all references to
virEventThread. Otherwise in order to avoid crashes we need to synchronize some
other way or we make extra references in event handler callbacks to all the
object in use. And some of them are not prepared to be refcounted.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
5c0cd375d1 qemu: don't shutdown event thread in monitor EOF callback
This hunk was introduced in [1] in order to avoid loosing
events from monitor on stopping qemu process. But as explained
in [2] on destroy we won't get neither EOF nor any other
events as monitor is just closed. In case of crash/shutdown
we won't get any more events as well and qemuDomainObjStopWorker
will be called by qemuProcessStop eventually. Thus let's
remove qemuDomainObjStopWorker from qemuProcessHandleMonitorEOF
as it is not useful anymore.

[1] e6afacb0f: qemu: start/stop an event loop thread for domains
[2] d2954c072: qemu: ensure domain event thread is always stopped

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
94e45d1042 rpc: finish all threads before exiting main loop
Currently we have issues like [1] on libvirtd shutdown as we cleanup while RPC
and other threads are still running. Let's finish all threads other then main
before cleanup.

The approach to finish threads is suggested in [2]. In order to finish RPC
threads serving API calls we let the event loop run but stop accepting new API
calls and block processing any pending API calls. We also inform all drivers of
shutdown so they can prepare for shutdown too. Then we wait for all RPC threads
and driver's background thread to finish. If finishing takes more then 15s we
just exit as we can't safely cleanup in time.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1828207
[2] https://www.redhat.com/archives/libvir-list/2020-April/msg01328.html

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
b776dfa8e8 rpc: add shutdown facilities to netserver
virNetServerClose and virNetServerShutdownWait are used to start net server
threads shutdown and wait net server threads to actually finish respectively
during net daemon shutdown procedure.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
0f38dedd89 rpc: add virNetDaemonSetShutdownCallbacks
The function is used to set shutdown prepare and wait callbacks. Prepare
callback is used to inform other threads of the daemon that the daemon will be
closed soon so that they can start to shutdown. Wait callback is used to wait
for other threads to actually finish.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:59 +03:00
Nikolay Shirokovskiy
1eae52b9f1 rpc: don't unref service ref on socket behalf twice
Second unref was added in [1]. We don't need it actually as
we pass free callback to virNetSocketAddIOCallback thus
when we call virNetSocketRemoveIOCallback the extra ref for
callback will be dropped without extra efforts.

[1] 355d8f470f: virNetServerServiceClose: Don't leak sockets

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:58 +03:00
Nikolay Shirokovskiy
255437eeb7 util: add stop/drain functions to thread pool
Stop just send signal for threads to exit when they finish with
current task. Drain waits when all threads will finish.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:58 +03:00
Nikolay Shirokovskiy
018e213f5d util: always initialize priority condition
Even if we have no priority threads on pool creation we can add them thru
virThreadPoolSetParameters later.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:58 +03:00
Nikolay Shirokovskiy
c5bf40bfa6 libvirt: add stateShutdownPrepare/stateShutdownWait to drivers
stateShutdownPrepare is supposed to inform driver that it will be closed soon
so that the driver can prepare and finish all background threads quickly on
stateShutdownWait call.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-07 09:33:58 +03:00
Martin Kletzander
fc7d53edf4 qemu: Fix comment in qemuProcessSetupPid
This was supposed to be done in commit 3791f29b08, but I missed a spot.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2020-09-06 13:44:27 +02:00
Martin Kletzander
f51cbe92c0 qemu: Allow migration over UNIX socket
This allows:

 a) migration without access to network

 b) complete control of the migration stream

 c) easy migration between containerised libvirt daemons on the same host

Resolves: https://bugzilla.redhat.com/1638889

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2020-09-05 07:55:45 +02:00
Daniel P. Berrangé
ee6c936fbb network: drop use of dummy tap device in bridges
A long time ago we introduced a dummy tap device (e.g. virbr0-nic) that
we attached to the bridge device created for virtual networks:

  commit 5754dbd56d
  Author: Laine Stump <laine@redhat.com>
  Date:   Wed Feb 9 03:28:12 2011 -0500

    Give each virtual network bridge its own fixed MAC address

This was a hack to workaround a Linux kernel bug where it would not
honour any attempt to set a MAC address on a bridge. Instead the
bridge would adopt the numerically lowest MAC address of all NICs
attached to the bridge. This lead to the MAC addrss of the bridge
changing over time as NICs were attached/detached.

The Linux bug was actually fixed 3 years before the libvirt
workaround was added in:

  commit 92c0574f11598c8036f81e27d2e8bdd6eed7d76d
  Author: Stephen Hemminger <shemminger@vyatta.com>
  Date:   Tue Jun 17 16:10:06 2008 -0700

    bridge: make bridge address settings sticky

    Normally, the bridge just chooses the smallest mac address as the
    bridge id and mac address of bridge device. But if the administrator
    has explictly set the interface address then don't change it.

    Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

but libvirt needed to support RHEL-5 kernels at that time, so
none the less added the workaround.

We have long since dropped support for RHEL-5 vintage distros,
so there's no reason to keep the dummy tap device for the purpose
of setting the bridge MAC address.

Later the dummy TAP device was used for a second purpose related
to IPv6 DAD (Duplicate Address Detection) in:

  commit db488c7917
  Author: Benjamin Cama <benoar@dolka.fr>
  Date:   Wed Sep 26 21:02:20 2012 +0200

    network: fix dnsmasq/radvd binding to IPv6 on recent kernels

This was again dealing with a regression in the Linux kernel, where
if there were no devices attached to the bridge in the UP state,
IPv6 DAD would not be performed. The virbr0-nic was attached but
in the DOWN state, so the above libvirt fix tenporarily brought
the NIC online. The Linux commit causing the problem was in v2.6.38

  commit 1faa4356a3bd89ea11fb92752d897cff3a20ec0e
  Author: stephen hemminger <shemminger@vyatta.com>
  Date:   Mon Mar 7 08:34:06 2011 +0000

    bridge: control carrier based on ports online

A short while later Linux was tweaked so that DAD would still occur
if the bridge had no attached devices at all in 3.1:

  commit b64b73d7d0c480f75684519c6134e79d50c1b341
  Author: stephen hemminger <shemminger@vyatta.com>
  Date:   Mon Oct 3 18:14:45 2011 +0000

    bridge: leave carrier on for empty bridge

IOW, the only reason we need the DAD hack of bringing virbr0-nic
online is because virbr0-nic exists. Once it doesn't exist, then
we hit the "empty bridge" case which works in Linux.

We can rely on distros having Linux kernel >= 3.1, so both things
that the virbr0-nic are doing are redundant.

Fixes https://gitlab.com/libvirt/libvirt/-/issues/53
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-04 17:17:30 +01:00
Tim Wiederhake
36f922ef8e cpu_map: Use g_auto* in cpuMapLoad
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-04 16:04:24 +02:00
Daniel P. Berrangé
090fd6a413 util: add device name in errors from ethtool ioctls
Reviewed-by: Laine Stump <laine@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-04 14:02:34 +01:00
Martin Kletzander
3791f29b08 qemu: Do not error out when setting affinity failed
Consider a host with 8 CPUs. There are the following possible scenarios

1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs

2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs

3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
   QEMU should get 8 CPUs

4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 8 CPUs

5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
   QEMU should get 4 CPUs

6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 4 CPUs

Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.

Scenario 3 works because libvirt checks current affinity first and
skips the sched_setaffinity call, avoiding the SYS_NICE issue

Scenario 4 works only if CAP_SYS_NICE is availalbe

Scenarios 5 & 6 works only if CAP_SYS_NICE is present *AND* the cgroups
cpuset is not set on the container.

If libvirt blindly ignores the sched_setaffinity failure, then scenarios
4, 5 and 6 should all work, but with caveat in case 4 and 6, that
QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
This is still better than failing.

Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
ignore it when there was no affinity specified in the XML config.
If user specified affinity explicitly, libvirt must report an error if
it can't be honoured.

Resolves: https://bugzilla.redhat.com/1819801

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-04 14:44:21 +02:00
Daniel P. Berrangé
59c5bf3faa util: re-add conditional for ifi_iqdrops field for macOS
The conditional was removed in

  commit ebbf8ebe4f
  Author: Ján Tomko <jtomko@redhat.com>
  Date:   Tue Sep 1 22:56:37 2020 +0200

    util: virnetdevtap: stats: fix txdrop on FreeBSD

That commit was correct about this no longer being required for FreeBSD,
but missed that the code is also built on macOS.

Rather than testing for this field in meson though, we can simply use
a platform conditional test in the code.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-04 11:19:08 +01:00
Martin Kletzander
c69915ccaf peer2peer migration: allow connecting to local sockets
Local socket connections were outright disabled because there was no "server"
part in the URI.  However, given how requirements and usage scenarios are
evolving, some management apps might need the source libvirt daemon to connect
to the destination daemon over a UNIX socket for peer2peer migration.  Since we
cannot know where the socket leads (whether the same daemon or not) let's decide
that based on whether the socket path is non-standard, or rather explicitly
specified in the URI.  Checking non-standard path would require to ask the
daemon for configuration and the only misuse that it would prevent would be a
pretty weird one.  And that's not worth it.  The assumption is that whenever
someone uses explicit UNIX socket paths in the URI for migration they better
know what they are doing.

Partially resolves: https://bugzilla.redhat.com/1638889

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Martin Kletzander
49186372db qemu: Allow NBD migration over UNIX socket
Adds new typed param for migration and uses this as a UNIX socket path that
should be used for the NBD part of migration.  And also adds virsh support.

Partially resolves: https://bugzilla.redhat.com/1638889

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Martin Kletzander
c5872b9a1b tests: Add simple test for virDomainMigrateCheckNotLocal
For this we need to make the function accessible (at least privately).  The
behaviour will change in following patches and the test helps explaining the
change.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Martin Kletzander
e74d627bb3 qemu: Rework starting NBD server for migration
Clean up the semantics by using one extra self-describing variable.
This also fixes the port allocation when the port is specified.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Martin Kletzander
d17ece4dd4 qemu: Rework qemuMigrationSrcConnect
Instead of saving some data from a union up front and changing an overlayed
struct before using said data, let's just set the new values after they are
decided.  This will increase the readability of future commit(s).

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Martin Kletzander
ae200449fe qemu: Use g_autofree in qemuMigrationSrcConnect
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-09-04 10:20:49 +02:00
Michal Privoznik
8abd1ffed1 qemu_namespace: Be tolerant to non-existent files when populating /dev
In 6.7.0 release I've changed how domain namespace is built and
populated. Previously it used to be done from a pre-exec hook
(ran in the forked off child, just before dropping all privileges
and exec()-ing QEMU), which not only meant we had to have two
different code paths for creating a node in domain's namespace
(one for this pre-exec hook, the other for hotplug ran from the
daemon), it also proved problematic because it was leaking FDs
into QEMU process.

To mitigate this problem, we've not only ditched libdevmapper
from the NS population process, I've also dropped the pre-exec
code and let the NS be populated from the daemon (using the
hotplug code). But, I was not careful when doing so, because the
pre-exec code was tolerant to files that doesn't exist, while
this new code isn't. For instance, the very first thing that is
done when the new NS is created is it's populated with
@defaultDeviceACL which contain files like /dev/null, /dev/zero,
/dev/random and /dev/kvm (and others).  While the rest will
probably exist every time, /dev/kvm might not and thus the new
code I wrote has to be tolerant to that.

Of course, users can override the @defaultDeviceACL (by setting
cgroup_device_acl in qemu.conf) and remove /dev/kvm (which is
acceptable workaround), but we definitely want libvirt to work
out of the box even on hosts without KVM.

Fixes: 9048dc4e62
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-04 08:18:21 +02:00
Ján Tomko
ebbf8ebe4f util: virnetdevtap: stats: fix txdrop on FreeBSD
For older FreeBSD, we needed an ifdef guard to use
if_data.ifi_oqdrops, which was introduced by:

commit 61bbdbb94c
    Implement interface stats for BSD

But when we dropped the check because we deprecated
building on FreeBSD-10 in:

commit 83131d9714
    configure: drop check for unsupported FreeBSD

We started building the wrong side of the ifdef.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: 83131d9714
Reviewed-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
2020-09-03 20:25:07 +02:00
Daniel P. Berrangé
16317c2b59 remote: adapt augeas test case for dynamic polkit config change
We need to use @default_auth@ in the augeas test case to match
its use in the main libvirtd.conf.in file.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-03 16:37:17 +01:00
Daniel P. Berrangé
b196f8fcdd remote: use SocketMode=0600 when polkit is not compiled
The systemd .socket unit files we ship for libvirt daemons use
SocketMode=0666 on the assumption that libvirt is built with
polkit which provides access control.

Some people, however, may have explicitly turned off polkit at
build time and not realize that leaves them insecure unless
they also change the SocketMode.  This addresses that problem
by making the SocketMode default to 0600 when polkit is
disabled at compile time.

Note we cannot automatically fix the case where the user
compiles polkit, but then overrides the libvirtd.conf defaults
to disable polkit. This is what lead to CVE-2020-15708 in
Ubuntu 20.10.  We can at least improve the inline comments
in the config file to give a clearer warning though, which
may have helped avoid the mistaken config.

Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-03 11:13:25 +01:00
Han Han
be28a7fbd6 qemu_validate: Only allow none address for watchdog ib700
Since QEMU 1.5.3, the ib700 watchdog device has no options for address,
and not address in device tree:

$ /usr/libexec/qemu-kvm -version
QEMU emulator version 1.5.3 (qemu-kvm-1.5.3-175.el7), Copyright (c) 2003-2008 Fabrice Bellard
$ /usr/libexec/qemu-kvm -device ib700,\?
$ virsh qemu-monitor-command seabios --hmp info qtree|grep ib700 -A 2
        dev: ib700, id "watchdog0"
        dev: isa-serial, id "serial0"
          index = 0

So only allow it to use none address.

Fixes: 8a54cc1d08
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1509908

Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-02 18:50:38 +02:00
Thomas Huth
f8333b3b0a qemu: Fix domfsinfo for non-PCI device information from guest agent
qemuAgentFSInfoToPublic() currently only sets the devAlias for PCI devices.
However, the QEMU guest agent could also provide the device name in the
"dev" field of the response for other devices instead (well, at least after
fixing another problem in the current QEMU guest agent...). So if creating
the devAlias from the PCI information failed, let's fall back to the name
provided by the guest agent. This helps to fix the empty "Target" fields
that occur when running "virsh domfsinfo" on s390x where CCW devices are
used for the guest instead of PCI devices.

Also add a proper debug message here in case we completely failed to set the
device alias, since this problem here was very hard to debug: The only two
error messages that I've seen were "Unable to get filesystem information"
and "Unable to encode message payload" - which only indicates that something
went wrong in the RPC call. No debug message indicated the real problem, so
I had to learn the hard way why the RPC call failed (it apparently does not
like devAlias left to be NULL) and where the real problem comes from.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1755075
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2020-09-02 17:49:09 +01:00
Thomas Huth
2f5d8ffebe qemu: Do not silently allow non-available timers on non-x86 systems
libvirt currently silently allows <timer name="kvmclock"/> and some
other timer tags in the guest XML definition for timers that do not
exist on non-x86 systems. We should not silently ignore these tags
since the users might not get what they expected otherwise.
Note: The error is only generated if the timer is marked with
present="yes" - otherwise we would suddenly refuse XML definitions
that worked without problems before.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1754887
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-02 18:48:14 +02:00
Michal Privoznik
95b9db4ee2 lib: Prefer WITH_* prefix for #if conditionals
Currently, we are mixing: #if HAVE_BLAH with #if WITH_BLAH.
Things got way better with Pavel's work on meson, but apparently,
mixing these two lead to confusing and easy to miss bugs (see
31fb929eca for instance). While we were forced to use HAVE_
prefix with autotools, we are free to chose our own prefix with
meson and since WITH_ prefix appears to be more popular let's use
it everywhere.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-02 10:28:10 +02:00
Michal Privoznik
63b41d3f93 virfile.c: Remove some #endif comments
There are couple of conditional #includes at the beginning of
virfile.c and they try to be nice and document #endifs. But they
are mostly wrong because either they have the condition in the
comment inverted or the comment refers to a different condition
than they belong to. Just remove the comments as these #includes
are single line mostly.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-02 10:28:10 +02:00
Michal Privoznik
e1178d55c6 util: Check for HAVE_NET_IF_H correctly
There are two places where we try to check whether the host
system has net/if.h before including it. But the check is missing
'_H' suffix.

Fixes: 7f3eb533f4
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-02 10:28:10 +02:00
Patrick Magauran
69e3381626 qemu: Add e1000e/vmxnet3 IFF_VNET_HDR support
Setting IFF_VNET_HDR for a tap device passes the whole packet to the
host, reducing emulation overhead and improving performance.

Libvirt bases its decision about applying IFF_VNET_HDR to the tap
interface on whether or not the model of the emulated network device
is virtio.  Originally, virtio was the only model to support
IFF_VNET_HDR in QEMU; however, the e1000e & vmxnet3 adapters have also
supported it since their introductions - QEMU commit
786fd2b0f87 for vmxnet3, and QEMU commit 6f3fbe4ed0 for e1000e, so it
should be set for those models too.

Signed-off-by: Patrick Magauran <patmagauran.j@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-09-01 18:48:21 -04:00
Ján Tomko
e603fcf537 conf: fix enum conversion
../src/conf/domain_conf.c:8144:78: error: result of comparison of unsigned enum expression < 0 is always false [-Werror,-Wtautological-unsigned-enum-zero-compare]
        if ((def->writeFiltering = virTristateBoolTypeFromString(filtering)) < 0) {
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: 9d15647dcb
2020-09-01 23:34:00 +02:00
Jim Fehlig
01ad5de41d Xen: Add support for writeFiltering in config converter
Add support for the writeFiltering attribute in the domXML to native
config converter. Also include a test.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-01 14:29:46 -06:00
Jim Fehlig
9d15647dcb Xen: Add writeFiltering option for PCI devices
By default Xen only allows guests to write "known safe" values into PCI
configuration space, yet many devices require writes to other areas of
the configuration space in order to operate properly. To allow writing
any values Xen supports the 'permissive' setting, see xl.cfg(5) man page.

This change models Xen's permissive setting by adding a writeFiltering
attribute on the <source> element of a PCI hostdev. When writeFiltering
is set to 'no', the Xen permissive setting will be enabled and guests
will be able to write any values into the device's configuration space.
The permissive setting remains disabled in the absense of the
writeFiltering attribute, of if it is explicitly set to 'yes'.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-01 14:29:17 -06:00
Jim Fehlig
2ad009eadd qemu: Check for changes in qemu modules directory
Add a configuration option for specifying location of the qemu modules
directory, defaulting to /usr/lib64/qemu. Then use this location to
check for changes in the directory, indicating that a qemu module has
changed and capabilities need to be reprobed.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-01 14:22:24 -06:00
Ján Tomko
6fab37da59 Prefer https: everywhere where possible
Use https: links for websites that support them.

The URIs which are used as namespace identifiers
are left alone.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
2020-09-01 21:58:46 +02:00
Ján Tomko
daec478600 Prefer https: for Red Hat websites
The list archives, people.redhat.com and bugzilla all support
https.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
2020-09-01 21:58:46 +02:00
Ján Tomko
4e7a27b610 Prefer https: for Wikipedia links
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
2020-09-01 21:58:45 +02:00
Ján Tomko
4216192929 vbox: do not repeat the innotek namespace url
Also, remove the url from the translatable string,
reducing it to the generic message already used
by virXMLNamespaceRegister.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
2020-09-01 21:58:45 +02:00
Ján Tomko
4b45a7102e libxl: do not include math.h
The include was introduced by:
  commit 3d6fe99c5c
    Add vcpu functions to libxl driver
which used ceil() and floor(), but these were later
removed by:
  commit 3eb869a04b
    libxl: avoid compiler warning
which did not remove the include.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
2020-09-01 21:52:47 +02:00
Laine Stump
95089f481e util: assign tap device names using a monotonically increasing integer
When creating a standard tap device, if provided with an ifname that
contains "%d", rather than taking that literally as the name to use
for the new device, the kernel will instead use that string as a
template, and search for the lowest number that could be put in place
of %d and produce an otherwise unused and unique name for the new
device. For example, if there is no tap device name given in the XML,
libvirt will always send "vnet%d" as the device name, and the kernel
will create new devices named "vnet0", "vnet1", etc. If one of those
devices is deleted, creating a "hole" in the name list, the kernel
will always attempt to reuse the name in the hole first before using a
name with a higher number (i.e. it finds the lowest possible unused
number).

The problem with this, as described in the previous patch dealing with
macvtap device naming, is that it makes "immediate reuse" of a newly
freed tap device name *much* more common, and in the aftermath of
deleting a tap device, there is some other necessary cleanup of things
which are named based on the device name (nwfilter rules, bandwidth
rules, OVS switch ports, to name a few) that could end up stomping
over the top of the setup of a new device of the same name for a
different guest.

Since the kernel "create a name based on a template" functionality for
tap devices doesn't exist for macvtap, this patch for standard tap
devices is a bit different from the previous patch for macvtap - in
particular there was no previous "bitmap ID reservation system" or
overly-complex retry loop that needed to be removed. We simply find
and unused name, and pass that name on to the kernel instead of
"vnet%d".

This counter is also wrapped when either it gets to INT_MAX or if the
full name would overflow IFNAMSIZ-1 characters. In the case of
"vnet%d" and a 32 bit int, we would reach INT_MAX first, but possibly
someday someone will change the name from vnet to something else.

(NB: It is still possible for a user to provide their own
parameterized template name (e.g. "mytap%d") in the XML, and libvirt
will just pass that through to the kernel as it always has.)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:44 -04:00
Laine Stump
d7f38beb2e util: replace macvtap name reservation bitmap with a simple counter
There have been some reports that, due to libvirt always trying to
assign the lowest numbered macvtap / tap device name possible, a new
guest would sometimes be started using the same tap device name as
previously used by another guest that is in the process of being
destroyed *as the new guest is starting.

In some cases this has led to, for example, the old guest's
qemuProcessStop() code deleting a port from an OVS switch that had
just been re-added by the new guest (because the port name is based on
only the device name using the port). Similar problems can happen (and
I believe have) with nwfilter rules and bandwidth rules (which are
both instantiated based on the name of the tap device).

A couple patches have been previously proposed to change the ordering
of startup and shutdown processing, or to put a mutex around
everything related to the tap/macvtap device name usage, but in the
end no matter what you do there will still be possible holes, because
the device could be deleted outside libvirt's control (for example,
regular tap devices are automatically deleted when the qemu process
terminates, and that isn't always initiated by libvirt but could
instead happen completely asynchronously - libvirt then has no control
over the ordering of shutdown operations, and no opportunity to
protect it with a mutex.)

But this only happens if a new device is created at the same time as
one is being deleted. We can effectively eliminate the chance of this
happening if we end the practice of always looking for the lowest
numbered available device name, and instead just keep an integer that
is incremented each time we need a new device name. At some point it
will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15
character limit if nothing else), and we can't guarantee that the new
name really will be the *least* recently used name, but "math"
suggests that it will be *much* less common that we'll try to re-use
the *most* recently used name.

This patch implements such a counter for macvtap/macvlan, replacing
the existing, and much more complicated, "ID reservation" system. The
counter is set according to whatever macvtap/macvlan devices are
already in use by guests when libvirtd is started, incremented each
time a new device name is needed, and wraps back to 0 when either
INT_MAX is reached, or when the resulting device name would be longer
than IFNAMSIZ-1 characters (which actually is what happens when the
template for the device name is "maccvtap%d"). The result is that no
macvtap name will be re-used until the host has created (and possibly
destroyed) 99,999,999 devices.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:36 -04:00
Laine Stump
b546b48344 meson: link libm
On some platforms libm (needed for the pow() function) isn't being
linked in somehow. This patch adds the necessary bits to assure that
it's linked in when necessary.

Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit 20a62b42ec001310a6329d7ee2021f0737d534ef)
2020-09-01 14:16:19 -04:00
Andrea Bolognani
88c3490aa1 meson: Use @BASENAME@ more
This patch takes care of just the obvious cases: there are
many more situations where the data we pass to configure_file()
could likely be obtained in a more effective way, but we can
address the low-hanging fruits as a first approximation.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-01 13:22:24 +02:00
Michal Privoznik
fc19155819 qemu: Validate memory hotplug in domainValidateCallback instead of cmd line generator
When editing a domain with hotplug enabled, I removed the only
NUMA node it had and got no error. I got the error later though,
when starting the domain. This is not as user friendly as it can
be. Move the validation call out from command line generator and
into domain validator (which is called prior to starting cmd line
generation anyway).

When doing this, I had to remove memory-hotplug-nonuma xml2xml
test case because there is no way the test case can succeed,
obviously.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-09-01 09:30:27 +02:00
Roman Bogorodskiy
4955a459f4 meson: don't install sysconf files unconditionally
There's no need to install sysconf files when init script installation
was not requested, i.e. when configured with init_script=none.

Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-08-27 16:14:15 +04:00
Kevin Locke
44cbd3afaf
apparmor: allow libvirtd to call virtiofsd
When using [virtiofs], libvirtd must launch [virtiofsd] to provide
filesystem access on the host.  When a guest is configured with
virtiofs, such as:

    <filesystem type='mount' accessmode='passthrough'>
      <driver type='virtiofs'/>
      <source dir='/path'/>
      <target dir='mount_tag'/>
    </filesystem>

Attempting to start the guest fails with:

    internal error: virtiofsd died unexpectedly

/var/log/libvirt/qemu/$name-fs0-virtiofsd.log contains (as a single
line, wrapped below):

    libvirt:  error : cannot execute binary /usr/lib/qemu/virtiofsd:
    Permission denied

dmesg contains (as a single line, wrapped below):

    audit: type=1400 audit(1598229295.959:73): apparmor="DENIED"
    operation="exec" profile="libvirtd" name="/usr/lib/qemu/virtiofsd"
    pid=46007 comm="rpc-worker" requested_mask="x" denied_mask="x"
    fsuid=0 ouid=0

To avoid this, allow execution of virtiofsd from the libvirtd AppArmor
profile.

[virtiofs]: https://libvirt.org/kbase/virtiofs.html
[virtiofsd]: https://www.qemu.org/docs/master/interop/virtiofsd.html

Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
Reviewed-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
2020-08-26 10:34:53 +02:00
Scott Shambarger
89f5b90a5f util: use host module suffix when loading drivers
Driver module loaders current hardcode ".so" as the file
extension.  On MacOS, meson uses ".dylib" as a module file extension.
This patch adds VIR_FILE_MODULE_EXT to virfile.h defined as the
hosts module extension, and updates driver module loaders to make
use of it.

Signed-off-by: Scott Shambarger <scott-libvirt@shambarger.net>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2020-08-26 10:30:18 +02:00
Daniel Henrique Barboza
46d88d8dba domaincapsmock: mock virHostCPUGetMicrocodeVersion()
Previous patch handled the runtime case where a non-x86 host is
fetching /proc/cpuinfo data for a microcode info that we know
it doesn't exist. This change alone speeded everything by a
bit for non-x86, but there is at least one major culprit left.

qemuxml2argvtest does several arch-specific tests, and a good
chunk of them are x86 exclusive. This means that 'hostArch'
will be seen as x86 for these tests, even when running in
non-x86 hosts. In a Power 9 server with 128 CPUs, qemuxml2argvtest
takes 298 seconds to complete in average, and 'perf record'
indicates that 95% of the time is spent in
virHostCPUGetMicrocodeVersion().

This patch mocks virHostCPUGetMicrocodeVersion() to always return
0 in the tests, avoiding /proc/cpuinfo reads. This will make all
tests behave arch-agnostic, and the microcode value being 0 has no
impact on any existing test.

This is a CI speed across the board for all archs, including x86,
given that we're not reading /proc/cpuinfo in the tests. For
a Thinkpad T480 laptop with 8 Intel i7 CPUs, qemuxml2argvtest
went from 15.50 sec to 12.50 seconds. The performance gain is even
more noticeable for huge servers with lots of CPUs. For the
Power 9 server mentioned above, this patch speeds qemuxml2argvtest
to 9 seconds, down from 298 sec.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-08-25 19:44:43 +02:00
Daniel Henrique Barboza
2ba0b7497c virhostcpu.c: skip non x86 hosts in virHostCPUGetMicrocodeVersion()
Non-x86 archs does not have a 'microcode' version like x86. This is
covered already inside the function - just return 0 if no microcode
is found. Regardless of that, a read of /proc/cpuinfo is always made.
Each read will invoke the kernel to fill in the CPU details every time.

Now let's consider a non-x86 host, like a Power 9 server with 128 CPUs.
Each /proc/cpuinfo read will need to fetch data for each CPU and it
won't even matter because we know beforehand that PowerPC chips don't
have microcode information.

We can do better for non-x86 hosts by skipping this process entirely.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-25 19:44:39 +02:00