Commit Graph

1487 Commits

Author SHA1 Message Date
Michal Privoznik
655f67c68a qemu_process: Drop needless check in qemuProcessNeedMemoryBackingPath()
The aim of this function is to return whether domain definition
and/or memory device that user intents to hotplug needs a private
path inside cfg->memoryBackingDir. The rule for the memory device
that's being hotplug includes checking whether corresponding
guest NUMA node needs memoryBackingDir. Well, while the rationale
behind makes sense it is not necessary to check for that really -
just a few lines above every guest NUMA node was checked exactly
for that.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-05-18 17:47:58 +02:00
Michal Privoznik
4d779874ef qemu_process: Deduplicate code in qemuProcessNeedHugepagesPath()
The aim of qemuProcessNeedHugepagesPath() is to return whether
guest needs private path inside HugeTLBFS mounts (deducted from
domain definition @def) or whether the memory device that user is
hotplugging in needs the private path (deducted from the @mem
argument). The actual creation of the path is done in the only
caller qemuProcessBuildDestroyMemoryPaths().

The rule for the first case (@def) and the second case (@mem) is
the same (domain has a DIMM device that has HP requested) and is
written twice. Move the logic into a function to deduplicate the
code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-05-18 17:47:58 +02:00
Michal Privoznik
310b37e486 qemu: Don't double free @node_cpus in qemuProcessSetupPid()
When placing vCPUs into CGroups the qemuProcessSetupPid() is
called which then enters a for() loop (around its middle) where
it calls virDomainNumaGetNodeCpumask() for each guest NUMA node.
But the latter returns only a pointer not new reference/copy and
thus the caller must not free it. But the variable is decorated
with g_autoptr() which leads to a double free.

Fixes: 2d37d8dbc9
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2021-04-23 11:02:21 +02:00
Luyao Zhong
2d37d8dbc9 qemu: Add support for 'restrictive' mode in numatune
Signed-off-by: Luyao Zhong <luyao.zhong@intel.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-04-19 11:39:21 +02:00
Michal Privoznik
c8238579fb lib: Drop internal virXXXPtr typedefs
Historically, we declared pointer type to our types:

  typedef struct _virXXX virXXX;
  typedef virXXX *virXXXPtr;

But usefulness of such declaration is questionable, at best.
Unfortunately, we can't drop every such declaration - we have to
carry some over, because they are part of public API (e.g.
virDomainPtr). But for internal types - we can do drop them and
use what every other C project uses 'virXXX *'.

This change was generated by a very ugly shell script that
generated sed script which was then called over each file in the
repository. For the shell script refer to the cover letter:

https://listman.redhat.com/archives/libvir-list/2021-March/msg00537.html

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-04-13 17:00:38 +02:00
Tim Wiederhake
c5d4d0198f qemuProcessUpdateGuestCPU: Check host cpu for forbidden features
See https://bugzilla.redhat.com/show_bug.cgi?id=1840770

Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com
2021-03-26 11:40:55 +01:00
Jiri Denemark
1107c0b9c3 Do not check return value of VIR_REALLOC_N
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2021-03-22 12:44:18 +01:00
Jiri Denemark
b8c919b5b4 qemu: Drop redundant checks for qemuCaps before virQEMUCapsGet
virQEMUCapsGet checks for qemuCaps itself, no need to do it explicitly.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2021-03-22 12:44:18 +01:00
Peter Krempa
8967ad7be6 qemu: backup: Restore security label on backup disk store image on VM termination
When the backup job is terminated normally the security label is
restored by the blockjob finishing handler.

If the VM dies or is destroyed that wouldn't happen as the blockjob
handler wouldn't be called.

Restore the security label on disk store where we remember that the job
was running at the point when 'qemuBackupJobTerminate' was called.

Not resetting the security label means that we also leak the xattr
attributes remembering the label which prevents any further use of the
file, which is a problem for block devices.

This also requires that the call to 'qemuBackupJobTerminate' from
'qemuProcessStop' happens only after 'vm->pid' was reset as otherwise
the security subdrivers attempt to enter the process namespace which
fails if the process isn't running any more.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1939082
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-03-19 16:41:39 +01:00
Michal Privoznik
6e9c4811be qemu_process: Use accessor for def->mem.total_memory
When connecting to the monitor, a timeout is calculated that is
bigger the more memory guest has (because QEMU has to allocate
and possibly zero out the memory and what not, empirically
deducted). However, when computing the timeout the @total_memory
mmember is accessed directly even though
virDomainDefGetMemoryTotal() should have been used.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2021-03-16 09:16:13 +01:00
Peter Krempa
aa372e5a01 backup: Store 'apiFlags' in private section of virDomainBackupDef
'qemuBackupJobTerminate' needs the API flags to see whether
VIR_DOMAIN_BACKUP_BEGIN_REUSE_EXTERNAL. Unfortunately when called via
qemuProcessReconnect()->qemuProcessStop() early (e.g. if the qemu
process died while we were reconnecting) the job is cleared temporarily
so that other APIs can be called. This would mean that we couldn't clean
up the files in some cases.

Save the 'apiFlags' inside the backup object and set it from the
'qemuDomainJobObj' 'apiFlags' member when reconnecting to a VM.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2021-03-12 10:59:05 +01:00
Andrea Bolognani
c2180c2fd6 qemu: Set limits only when explicitly asked to do so
The current code is written under the assumption that, for all
limits except the core size, asking for the limit to be set to
zero is a no-op, and so the operation is performed
unconditionally.

While this is the behavior we want for the QEMU driver, the
virCommand and virProcess facilities are generic, and should not
implement this kind of policy: asking for a limit to be set to
zero should result in that limit being set to zero every single
time.

Add some checks in the QEMU driver, effectively moving the
policy where it belongs.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-03-08 22:41:40 +01:00
Andrea Bolognani
bd33680f02 qemu: Set all limits at the same time
qemuProcessLaunch() is the correct place to set process limits,
and in fact is where we were dealing with almost all of them,
but the memory locking limit was handled in
qemuBuildCommandLine() instead for some reason.

The code is rewritten so that the desired limit is calculated
and applied in separated steps, which will help with further
changes, but this doesn't alter the behavior.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-03-08 22:41:40 +01:00
Andrea Bolognani
9bf5c00f9b qemu: Make some minor tweaks
Doing this now will make the next changes nicer.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-03-08 22:41:40 +01:00
Peter Krempa
3c546f7eb4 qemuProcessReportLogError: Don't mark "%s: %s" as translatable
The function is constructing an error message from a prefix and the
contents of the qemu log file. Marking just two string modifiers as
translatable is pointless and will certainly confuse translators.

Remove the marking and add a comment which bypasses the
sc_libvirt_unmarked_diagnostics check.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2021-03-05 15:01:29 +01:00
Peter Krempa
c8ff56c7ad qemuProcessReportLogError: Remove unnecessary math for max error message
Now that error message formatting doesn't use fixed size buffers we can
drop the math for calculating the maximum chunk of log to report in the
error message and use a round number. This also makes it obvious that
the chosen number is arbitrary.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2021-03-05 15:01:29 +01:00
Michal Privoznik
7f482a67e4 lib: Replace virFileMakePath() with g_mkdir_with_parents()
Generated using the following spatch:

  @@
  expression path;
  @@
  - virFileMakePath(path)
  + g_mkdir_with_parents(path, 0777)

However, 14 occurrences were not replaced, e.g. in
virHostdevManagerNew(). I don't really understand why.
Fixed by hand afterwards.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2021-03-04 20:52:23 +01:00
Michal Privoznik
b1e3728dec lib: Replace virFileMakePathWithMode() with g_mkdir_with_parents()
These functions are identical. Made using this spatch:

  @@
  expression path, mode;
  @@
  - virFileMakePathWithMode(path, mode)
  + g_mkdir_with_parents(path, mode)

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2021-03-04 20:52:23 +01:00
Jiri Denemark
c8f3b83c72 qemu_domainjob: Make copy of owner API
Using the job owner API name directly works fine as long as it is a
static string or the owner's thread is still running. However, this is
not always the case. For example, when the owner API name is filled in a
job when we're reconnecting to existing domains after daemon restart,
the dynamically allocated owner name will disappear with the
reconnecting thread. Any follow up usage of the pointer will read random
memory.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-02-25 09:55:31 +01:00
Peng Liang
1ac703a7d0 qemu: Add missing lock in qemuProcessHandleMonitorEOF
qemuMonitorUnregister will be called in multiple threads (e.g. threads
in rpc worker pool and the vm event thread).  In some cases, it isn't
protected by the monitor lock, which may lead to call g_source_unref
more than one time and a use-after-free problem eventually.

Add the missing lock in qemuProcessHandleMonitorEOF (which is the only
position missing lock of monitor I found).

Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-02-24 15:00:51 +01:00
Stefan Berger
f30aa2ec74 qemu: Fix libvirt hang due to early TPM device stop
This patch partially reverts commit 5cde9dee where the qemuExtDevicesStop()
was moved to a location before the QEMU process is stopped. It may be
alright to tear down some devices before QEMU is stopped, but it doesn't work
for the external TPM (swtpm) which assumes that QEMU sends it a signal to stop
it before libvirt may try to clean it up. So this patch moves the
virFileDeleteTree() calls after the call to qemuExtDevicesStop() so that the
pid file of virtiofsd is not deleted before that call.

Afftected libvirt versions are 6.10 and 7.0.

Fixes: 5cde9dee8c
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-02-19 17:31:37 +01:00
Daniel P. Berrangé
17f001c451 qemu: record deprecation messages against the domain
These messages are only valid while the domain is running.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-02-12 09:19:12 +00:00
Peter Krempa
480fecaa21 Replace virStringListJoin by g_strjoinv
Our implementation was inspired by glib anyways. The difference is only
the order of arguments.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-02-11 17:05:34 +01:00
Peter Krempa
56cedfcf38 Replace virStringListHasString by g_strv_contains
The glib variant doesn't accept NULL list, but there's just one caller
where it wasn't checked explicitly, thus there's no need for our own
wrapper.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-02-11 17:05:33 +01:00
Peter Krempa
d9f7e87673 qemuProcessUpdateDevices: Refactor cleanup and memory handling
Use automatic memory freeing and remove the 'cleanup' label. Also make
it a bit more obvious that nothing happens if the 'old' list wasn't
present.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-02-11 17:05:33 +01:00
Daniel P. Berrangé
c32f172d12 qemu: wire up support for maximum CPU model
The "max" model can be treated the same way as "host" model in general.

Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-02-10 11:44:48 +00:00
Laine Stump
674719afe6 qemu: replace VIR_FREE with g_free in all vir*Free() functions
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2021-02-05 00:20:43 -05:00
Pavel Hrdina
836e0a960b storage_source: use virStorageSource prefix for all functions
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-01-22 11:10:27 +01:00
Pavel Hrdina
01f7ade912 util: extract virStorageFile code into storage_source
Up until now we had a runtime code and XML related code in the same
source file inside util directory.

This patch takes the runtime part and extracts it into the new
storage_file directory.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-01-22 11:10:27 +01:00
Laine Stump
c2b2cdf746 call virDomainNetNotifyActualDevice() for all interface types
Now that this function can be called regardless of interface type (and
whether or not we have a conn for the network driver), let's actually
call it for all interface types. This will assure that we re-connect
any disconnected bridge devices for <interface type='bridge'> as
mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1730084#c26
(until now we've only been reconnecting bridge devices for <interface
type='network'>)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-01-08 11:34:49 -05:00
Michal Privoznik
5ac2439a83 qemu_process: Release domain seclabel later in qemuProcessStop()
Some secdrivers (typically SELinux driver) generate unique
dynamic seclabel for each domain (unless a static one is
requested in domain XML). This is achieved by calling
qemuSecurityGenLabel() from qemuProcessPrepareDomain() which
allocates unique seclabel and stores it in domain def->seclabels.
The counterpart is qemuSecurityReleaseLabel() which releases the
label and removes it from def->seclabels. Problem is, that with
current code the qemuProcessStop() may still want to use the
seclabel after it was released, e.g. when it wants to restore the
label of a disk mirror.

What is happening now, is that in qemuProcessStop() the
qemuSecurityReleaseLabel() is called, which removes the SELinux
seclabel from def->seclabels, yada yada yada and eventually
qemuSecurityRestoreImageLabel() is called. This bubbles down to
virSecuritySELinuxRestoreImageLabelSingle() which find no SELinux
seclabel (using virDomainDefGetSecurityLabelDef()) and this
returns early doing nothing.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1751664
Fixes: 8fa0374c5b
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2021-01-06 13:29:09 +01:00
Jiri Denemark
f7c40b5c71 qemu: The TSC tolerance interval should be closed
The kernel refuses to set guest TSC frequency less than a minimum
frequency or greater than maximum frequency (both computed based on the
host TSC frequency). When writing the libvirt code with a reversed logic
(return success when the requested frequency falls within the tolerance
interval) I forgot to include the boundaries.

Fixes: d8e5b45600
https://bugzilla.redhat.com/show_bug.cgi?id=1839095

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-01-06 11:24:37 +01:00
Peter Krempa
d0819b9f02 qemu: Properly handle setting of <iotune> for empty cdrom
When starting a VM with an empty cdrom which has <iotune> configured the
startup fails as qemu is not happy about setting tuning for an empty
drive:

 error: internal error: unable to execute 'block_set_io_throttle', unexpected error: 'Device has no medium'

Resolve this by skipping the setting of throttling for empty drives and
updating the throttling when new medium is inserted into the drive.

Resolves: https://gitlab.com/libvirt/libvirt/-/issues/111
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2021-01-06 09:24:48 +01:00
Shi Lei
9b5d741a9d netdevmacvlan: Use helper function to create unique macvlan/macvtap name
Simplify ReserveName/GenerateName for macvlan and macvtap by using
common functions.

Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-12-15 13:35:33 -05:00
Shi Lei
c36cad1a31 netdevtap: Use common helper function to create unique tap name
Simplify GenerateName/ReserveName for netdevtap by using common
functions.

Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-12-15 13:35:27 -05:00
Daniel Henrique Barboza
9432693e2b domain_conf.c: move virDomainDeviceDefValidate() to domain_validate.c
Move virDomainDeviceDefValidate() and all its helper functions to
domain_validate.c.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-12-14 09:29:09 -03:00
Peter Krempa
18de9dfd77 virDomainDefValidate: Add per-run 'opaque' data
virDomainDefPostParse infrastructure has apart from the global opaque
data also per-run data, but this was not duplicated into the validation
callbacks.

This is important when drivers want to use correct run-state for the
validation.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-12-09 09:33:47 +01:00
Peter Krempa
c1720b9ac7 qemuDomainDiskLookupByNodename: Lookup also backup 'store' nodenames
Nodename may be asociated to a disk backup job, add support to looking
up in that chain too. This is specifically useful for the
BLOCK_WRITE_THRESHOLD event which can be registered for any nodename.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-12-08 15:12:34 +01:00
Michal Privoznik
40a162f83e qemu: Don't cache NUMA caps
In v6.0.0-rc1~439 (and friends) we tried to cache NUMA
capabilities because we assumed they are immutable. And to some
extent they are (NUMA hotplug is not a thing, is it). However,
our capabilities contain also some runtime info that can change,
e.g. hugepages pool allocation sizes or total amount of memory
per node (host side memory hotplug might change the value).

Because of the caching we might not be reporting the correct
runtime info in 'virsh capabilities'.

The NUMA caps are used in three places:

  1) 'virsh capabilities'
  2) domain startup, when parsing numad reply
  3) parsing domain private data XML

In cases 2) and 3) we need NUMA caps to construct list of
physical CPUs that belong to NUMA nodes from numad reply. And
while this may seem static, it's not really because of possible
CPU hotplug on physical host.

There are two possible approaches:

  1) build a validation mechanism that would invalidate the
     cached NUMA caps, or
  2) drop the caching and construct NUMA caps from scratch on
     each use.

In this commit, the latter approach is implemented, because it's
easier.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1819058
Fixes: 1a1d848694
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-12-07 11:32:40 +01:00
Peter Krempa
0f7b80691b qemuMonitorBlockJobInfo: Store 'ready' and 'ready_present' separately
Don't make the logic confusing by representing the 3 options using an
integer with negative values.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2020-12-07 10:15:00 +01:00
Daniel Henrique Barboza
5a34d0667d qemu: move memory size align to qemuProcessPrepareDomain()
qemuBuildCommandLine() is calling qemuDomainAlignMemorySizes(),
which is an operation that changes live XML and domain and has
little to do with the command line build process.

Move it to qemuProcessPrepareDomain() where we're supposed to
make live XML and domain changes before launch. qemuProcessStart()
is setting VIR_QEMU_PROCESS_START_NEW if !migrate && !snapshot,
same conditions used in qemuBuildCommandLine() to call
qemuDomainAlignMemorySizes(), making this change seamless.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-12-03 17:19:35 -03:00
Daniel Henrique Barboza
3bb9ed8bc2 qemu_process.c: check migrateURI when setting VIR_QEMU_PROCESS_START_NEW
qemuProcessCreatePretendCmdPrepare() is setting the
VIR_QEMU_PROCESS_START_NEW regardless of whether this is
a migration case or not. This behavior differs from what we're
doing in qemuProcessStart(), where the flag is set only
if !migrate && !snapshot.

Fix it by making the flag setting consistent with what we're
doing in qemuProcessStart().

Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-12-03 17:16:33 -03:00
John Ferlan
148cfcf051 qemu: Pass / fill niothreads for qemuMonitorGetIOThreads
Let's pass along / fill @niothreads rather than trying to make dual
use as a return value and thread count.

This resolves a Coverity issue detected in qemuDomainGetIOThreadsMon
where if qemuDomainObjExitMonitor failed, then a -1 was returned and
overwrite @niothreads causing a memory leak.

Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-12-03 17:06:07 +01:00
Michal Privoznik
b7d4e6b67e lib: Replace VIR_AUTOSTRINGLIST with GStrv
Glib provides g_auto(GStrv) which is in-place replacement of our
VIR_AUTOSTRINGLIST.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-12-02 15:43:07 +01:00
Pavel Hrdina
82bda55e2f qemuProcessHandleGraphics: no need to check for NULL
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-11-16 17:25:41 +01:00
Jiri Denemark
d8e5b45600 qemu: Do not require TSC frequency to strictly match host
Some CPUs provide a way to read exact TSC frequency, while measuring it
is required on other CPUs. However, measuring is never exact and the
result may slightly differ across reboots. For this reason both Linux
kernel and QEMU recently started allowing for guests TSC frequency to
fall into +/- 250 ppm tolerance interval around the host TSC frequency.

Let's do the same to avoid unnecessary failures (esp. during migration)
in case the host frequency does not exactly match the frequency
configured in a domain XML.

https://bugzilla.redhat.com/show_bug.cgi?id=1839095

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-11-12 17:29:16 +01:00
Masayoshi Mizuma
5cde9dee8c qemu: Move qemuExtDevicesStop() before removing the pidfiles
A qemu guest which has virtiofs config fails to start if the previous
starting failed because of invalid option or something.

That's because the virtiofsd isn't killed by virPidFileForceCleanupPath()
on the former failure because the pidfile was already removed by
virFileDeleteTree(priv->libDir) in qemuProcessStop(), so
virPidFileForceCleanupPath() just returned.

Move qemuExtDevicesStop() before virFileDeleteTree(priv->libDir) so that
virPidFileForceCleanupPath() can kill virtiofsd correctly.

For example of the reproduction:

  # virsh start guest
  error: Failed to start domain guest
  error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -foo: invalid option

  ... fix the option ...

  # virsh start guest
  error: Failed to start domain guest
  error: Cannot open log file: '/var/log/libvirt/qemu/guest-fs0-virtiofsd.log': Device or resource busy
  #

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-11-11 15:20:12 +01:00
Peter Krempa
62a01d84a3 util: hash: Retire 'virHashTable' in favor of 'GHashTable'
Don't hide our use of GHashTable behind our typedef. This will also
promote the use of glibs hash function directly.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Matt Coleman <matt@datto.com>
2020-11-06 10:40:51 +01:00
Daniel P. Berrangé
99a1cfc438 qemu: honour fatal errors dealing with qemu slirp helper
Currently all errors from qemuInterfacePrepareSlirp() are completely
ignored by the callers. The intention is that missing qemu-slirp binary
should cause the caller to fallback to the built-in slirp impl.

Many of the possible errors though should indeed be considered fatal.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-10-27 12:03:19 +00:00
zhenwei pi
7555a55470 qemu: implement memory failure event
Since QEMU 5.2 (commit-77b285f7f6), QEMU supports 'memory failure'
event, posts event to monitor if hitting a hardware memory error.
Fully support this feature for QEMU.

Test with commit 'libvirt: support memory failure event', build a
little complex environment(nested KVM):
1, install newly built libvirt in L1, and start a L2 vm. run command
in L1:
 ~# virsh event l2 --event memory-failure

2, run command in L0 to inject MCE to L1:
 ~# virsh qemu-monitor-command l1 --hmp mce 0 9 0xbd000000000000c0 0xd 0x62000000 0x8c

Test result in l1(recipient hypervisor case):
event 'memory-failure' for domain l2:
recipient: hypervisor
action: ignore
flags:
        action required: 0
        recursive: 0

Test result in l1(recipient guest case):
event 'memory-failure' for domain l2:
recipient: guest
action: inject
flags:
        action required: 0
        recursive: 0

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-10-23 09:42:00 +02:00
Peter Krempa
d6d4c08daf util: hash: Change type of hash table name/key to 'char'
All users of virHashTable pass strings as the name/key of the entry.
Make this an official requirement by turning the variables to 'const
char *'.

For any other case it's better to use glib's GHashTable.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2020-10-22 15:02:46 +02:00
Daniel P. Berrangé
7b1ed1cd73 qemu: stop passing -enable-fips to QEMU >= 5.2.0
Use of the -enable-fips option is being deprecated in QEMU >= 5.2.0. If
FIPS compliance is required, QEMU must be built with libcrypt which will
unconditionally enforce it.

Thus there is no need for libvirt to pass -enable-fips to modern QEMU.
Unfortunately there was never any way to probe for -enable-fips in the
first instance, it was enabled by libvirt based on version number
originally, and then later unconditionally enabled when libvirt dropped
support for older QEMU. Similarly we now use a version number check to
decide when to stop passing -enable-fips.

Note that the qemu-5.2 capabilities are currently from the pre-release
version and will be updated once qemu-5.2 is released.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-10-22 09:03:33 +02:00
Jonathon Jongsma
08f8fd8413 conf: Add support for vDPA network devices
This patch adds new schema and adds support for parsing and formatting
domain configurations that include vdpa devices.

vDPA network devices allow high-performance networking in a virtual
machine by providing a wire-speed data path. These devices require a
vendor-specific host driver but the data path follows the virtio
specification.

When a device on the host is bound to an appropriate vendor-specific
driver, it will create a chardev on the host at e.g.  /dev/vhost-vdpa-0.
That chardev path can then be used to define a new interface with
type='vdpa'.

Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2020-10-20 14:46:52 -04:00
Peter Krempa
7b0ced89e7 qemu: Prepare hostdev data which depends on the host state separately
SCSI hostdev setup requires querying the host os for the actual path of
the configured hostdev. This was historically done in the command line
formatter. Our new approach is to split out this part into
'qemuProcessPrepareHost' which is designed to be skipped in tests.

Refactor the hostdev code to use this new semantics, and add appropriate
handlers filling in the data for tests and the qemuConnectDomainXMLToNative
users.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-10-20 15:08:22 +02:00
Peter Krempa
9ff3ad9058 qemuProcessCreatePretendCmd: Split up preparation and command building
Host preparation steps which are deliberately skipped when
pretend-creating a commandline are normally executed after VM object
preparation. In the test code we are faking some of the host
preparation steps, but we were doing that prior to the call to
qemuProcessPrepareDomain embedded in qemuProcessCreatePretendCmd.

By splitting up qemuProcessCreatePretendCmd into two functions we can
ensure that the ordering of the prepare steps stays consistent.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-10-20 15:08:22 +02:00
Erik Skultety
ccb40cf288 qemu: process: sev: Fill missing 'cbitpos' & 'reducedPhysBits' from caps
These XML attributes have been mandatory since the introduction of SEV
support to libvirt. This design decision was based on QEMU's
requirement for these to be mandatory for migration purposes, as
differences in these values across platforms must result in the
pre-migration checks failing (not that migration with SEV works at the
time of this patch).

This patch enables autofill of these attributes right before launching
QEMU and thus updating the live XML.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-19 11:03:27 +02:00
Erik Skultety
1fdc907325 qemu: process: Move SEV capability check to qemuValidateDomainDef
Checks such as this one should be done at domain def validation time,
not before starting the QEMU process.
As for this change, existing domains will see some QEMU error when
starting as opposed to a libvirt error that this QEMU binary doesn't
support SEV, but that's okay, we never guaranteed error messages to
remain the same.

Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-19 11:03:16 +02:00
Erik Skultety
649f720a9a qemu_process: sev: Drop an unused variable
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-10-19 11:01:56 +02:00
Pavel Hrdina
5ad8272888 util: vircgroup: change virCgroupFree to take only virCgroupPtr
As preparation for g_autoptr() we need to change the function to take
only virCgroupPtr.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
2020-10-09 16:24:35 +02:00
Ján Tomko
cc3190cc4c qemu: process: use g_new0
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-10-05 16:44:06 +02:00
Ján Tomko
868c350752 qemu: separate out VIR_ALLOC calls
Move them to separate conditions to reduce churn
in following patches.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-10-05 16:44:06 +02:00
Cole Robinson
0fa5c23865 qemu: Taint cpu host-passthrough only after migration
From a discussion last year[1], Dan recommended libvirt drop the tain
flag for cpu host-passthrough, unless the VM has been migrated.

This repurposes the existing host-cpu taint flag to do just that.

[1]: https://www.redhat.com/archives/virt-tools-list/2019-February/msg00041.html

https://bugzilla.redhat.com/show_bug.cgi?id=1673098

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
2020-10-05 10:08:26 -04:00
Peter Krempa
faa88866f5 Don't check return value of virBitmapNewCopy
The function will not fail any more.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-05 15:50:45 +02:00
Peter Krempa
cb6fdb0125 virBitmapNew: Don't check return value
Remove return value check from all callers.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-10-05 15:38:47 +02:00
Masayoshi Mizuma
1c9227de5d qemu: process: Handle transient disks on VM startup
Add overlays after the VM starts before we start executing guest code.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Peter Krempa
afc25e8553 qemu: prepare cleanup for <transient/> disk overlays
Later patches will implement support for <transient/> disks in libvirt
by installing an overlay on top of the configured image. This will
require cleanup after the VM will be stopped so that the state is
correctly discarded.

Since the overlay will be installed only during the startup phase of the
VM we need to ensure that qemuProcessStop doesn't delete the original
file on some previous failure. This is solved by adding
'inhibitDiskTransientDelete' VM private data member which is set prior
to any startup step and will be cleared once transient disk overlays are
established.

Based on that we can then delete the overlays for any <transient/> disk.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
2020-10-01 09:55:02 +02:00
Peter Krempa
3673bdbe13 qemu: domain: Extract preparation of hostdev specific data to a separate function
Historically we've prepared secrets for all objects in one place. This
doesn't make much sense and it's semantically more appealing to prepare
everything for a single device type in one place.

Move the setup of the (iSCSI|SCSI) hostdev secrets into a new function
which will be used to setup other things as well in the future.

This is a similar approach we do for disks.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-15 15:20:23 +02:00
Ján Tomko
af16e754cd qemuProcessReconnect: clear 'oldjob'
After we started copying the privateData pointer in
qemuDomainObjRestoreJob, we should also free them
once we're done with them.

Register the clear function and use g_auto.
Also add a check for job->cb to qemuDomainObjClearJob,
to prevent freeing an uninitialized job.

https://bugzilla.redhat.com/show_bug.cgi?id=1878450

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Fixes: aca37c3fb2
2020-09-14 18:10:56 +02:00
Tim Wiederhake
caf5a88e59 qemu: Use glib memory functions in qemuProcessReadLog
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-09-11 18:19:58 +02:00
Michal Privoznik
ec46e6d44b qemu_process: Separate VIR_PERF_EVENT_* setting into a function
When starting a domain, qemuProcessLaunch() iterates over all
VIR_PERF_EVENT_* values and (possibly) enables them. While there
is nothing wrong with the code, the for loop where it's done makes
it harder to jump onto next block of code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-08 10:57:24 +02:00
Martin Kletzander
f5b486daea qemu: Allow setting affinity to fail and don't report error
This is just a clean-up of commit 3791f29b08 using the new parameter of
virProcessSetAffinity() introduced in commit 9514e24984 so that there is
no error reported in the logs.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 14:48:57 +02:00
Martin Kletzander
9514e24984 Do not report error when setting affinity is allowed to fail
Suggested-by: Ján Tomko <jtomko@redhat.com>

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-07 11:35:36 +02:00
Nikolay Shirokovskiy
5c0cd375d1 qemu: don't shutdown event thread in monitor EOF callback
This hunk was introduced in [1] in order to avoid loosing
events from monitor on stopping qemu process. But as explained
in [2] on destroy we won't get neither EOF nor any other
events as monitor is just closed. In case of crash/shutdown
we won't get any more events as well and qemuDomainObjStopWorker
will be called by qemuProcessStop eventually. Thus let's
remove qemuDomainObjStopWorker from qemuProcessHandleMonitorEOF
as it is not useful anymore.

[1] e6afacb0f: qemu: start/stop an event loop thread for domains
[2] d2954c072: qemu: ensure domain event thread is always stopped

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-07 09:33:59 +03:00
Martin Kletzander
fc7d53edf4 qemu: Fix comment in qemuProcessSetupPid
This was supposed to be done in commit 3791f29b08, but I missed a spot.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2020-09-06 13:44:27 +02:00
Martin Kletzander
3791f29b08 qemu: Do not error out when setting affinity failed
Consider a host with 8 CPUs. There are the following possible scenarios

1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs

2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs

3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
   QEMU should get 8 CPUs

4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 8 CPUs

5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
   QEMU should get 4 CPUs

6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
   QEMU should get 4 CPUs

Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.

Scenario 3 works because libvirt checks current affinity first and
skips the sched_setaffinity call, avoiding the SYS_NICE issue

Scenario 4 works only if CAP_SYS_NICE is availalbe

Scenarios 5 & 6 works only if CAP_SYS_NICE is present *AND* the cgroups
cpuset is not set on the container.

If libvirt blindly ignores the sched_setaffinity failure, then scenarios
4, 5 and 6 should all work, but with caveat in case 4 and 6, that
QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
This is still better than failing.

Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
ignore it when there was no affinity specified in the XML config.
If user specified affinity explicitly, libvirt must report an error if
it can't be honoured.

Resolves: https://bugzilla.redhat.com/1819801

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-04 14:44:21 +02:00
Michal Privoznik
95b9db4ee2 lib: Prefer WITH_* prefix for #if conditionals
Currently, we are mixing: #if HAVE_BLAH with #if WITH_BLAH.
Things got way better with Pavel's work on meson, but apparently,
mixing these two lead to confusing and easy to miss bugs (see
31fb929eca for instance). While we were forced to use HAVE_
prefix with autotools, we are free to chose our own prefix with
meson and since WITH_ prefix appears to be more popular let's use
it everywhere.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-02 10:28:10 +02:00
Laine Stump
95089f481e util: assign tap device names using a monotonically increasing integer
When creating a standard tap device, if provided with an ifname that
contains "%d", rather than taking that literally as the name to use
for the new device, the kernel will instead use that string as a
template, and search for the lowest number that could be put in place
of %d and produce an otherwise unused and unique name for the new
device. For example, if there is no tap device name given in the XML,
libvirt will always send "vnet%d" as the device name, and the kernel
will create new devices named "vnet0", "vnet1", etc. If one of those
devices is deleted, creating a "hole" in the name list, the kernel
will always attempt to reuse the name in the hole first before using a
name with a higher number (i.e. it finds the lowest possible unused
number).

The problem with this, as described in the previous patch dealing with
macvtap device naming, is that it makes "immediate reuse" of a newly
freed tap device name *much* more common, and in the aftermath of
deleting a tap device, there is some other necessary cleanup of things
which are named based on the device name (nwfilter rules, bandwidth
rules, OVS switch ports, to name a few) that could end up stomping
over the top of the setup of a new device of the same name for a
different guest.

Since the kernel "create a name based on a template" functionality for
tap devices doesn't exist for macvtap, this patch for standard tap
devices is a bit different from the previous patch for macvtap - in
particular there was no previous "bitmap ID reservation system" or
overly-complex retry loop that needed to be removed. We simply find
and unused name, and pass that name on to the kernel instead of
"vnet%d".

This counter is also wrapped when either it gets to INT_MAX or if the
full name would overflow IFNAMSIZ-1 characters. In the case of
"vnet%d" and a 32 bit int, we would reach INT_MAX first, but possibly
someday someone will change the name from vnet to something else.

(NB: It is still possible for a user to provide their own
parameterized template name (e.g. "mytap%d") in the XML, and libvirt
will just pass that through to the kernel as it always has.)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:44 -04:00
Laine Stump
d7f38beb2e util: replace macvtap name reservation bitmap with a simple counter
There have been some reports that, due to libvirt always trying to
assign the lowest numbered macvtap / tap device name possible, a new
guest would sometimes be started using the same tap device name as
previously used by another guest that is in the process of being
destroyed *as the new guest is starting.

In some cases this has led to, for example, the old guest's
qemuProcessStop() code deleting a port from an OVS switch that had
just been re-added by the new guest (because the port name is based on
only the device name using the port). Similar problems can happen (and
I believe have) with nwfilter rules and bandwidth rules (which are
both instantiated based on the name of the tap device).

A couple patches have been previously proposed to change the ordering
of startup and shutdown processing, or to put a mutex around
everything related to the tap/macvtap device name usage, but in the
end no matter what you do there will still be possible holes, because
the device could be deleted outside libvirt's control (for example,
regular tap devices are automatically deleted when the qemu process
terminates, and that isn't always initiated by libvirt but could
instead happen completely asynchronously - libvirt then has no control
over the ordering of shutdown operations, and no opportunity to
protect it with a mutex.)

But this only happens if a new device is created at the same time as
one is being deleted. We can effectively eliminate the chance of this
happening if we end the practice of always looking for the lowest
numbered available device name, and instead just keep an integer that
is incremented each time we need a new device name. At some point it
will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15
character limit if nothing else), and we can't guarantee that the new
name really will be the *least* recently used name, but "math"
suggests that it will be *much* less common that we'll try to re-use
the *most* recently used name.

This patch implements such a counter for macvtap/macvlan, replacing
the existing, and much more complicated, "ID reservation" system. The
counter is set according to whatever macvtap/macvlan devices are
already in use by guests when libvirtd is started, incremented each
time a new device name is needed, and wraps back to 0 when either
INT_MAX is reached, or when the resulting device name would be longer
than IFNAMSIZ-1 characters (which actually is what happens when the
template for the device name is "maccvtap%d"). The result is that no
macvtap name will be re-used until the host has created (and possibly
destroyed) 99,999,999 devices.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-09-01 14:16:36 -04:00
Ján Tomko
0a37e0695b Split declarations from initializations
Split those initializations that depend on a statement
above them.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00
Ján Tomko
a5152f23e7 Move declarations before statements
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-08-25 19:03:11 +02:00
Laine Stump
5cad64ec03 qemu: remove unreachable code in qemuProcessStart()
Back when the original version of this chunk of code was added (commit
41b087198 in libvirt-0.8.1 in April 2010), we used virExecDaemonize()
to start the qemu process, and would continue on in the function
(which at that time was called qemudStartVMDaemon()) even if a -1 was
returned. So it was possible to get to this code with rv == -1 (it was
called "ret" in that version of the code).

In modern libvirt code, qemu is started with virCommandRun(); then we
call virPidFileReadPath(); those are the only two ways of setting "rv"
prior to this code being removed, and in either case if the new value
of rv < 0, then we immediately skip over the rest of the code to the
cleanup: label.

This means that the code being removed by this patch is
unreachable.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-08-24 23:46:51 -04:00
Michal Privoznik
9048dc4e62 qemuDomainBuildNamespace: Populate basic /dev from daemon's namespace
As mentioned in previous commit, populating domain's namespace
from pre-exec() hook is dangerous. This commit moves population
of the namespace with basic /dev nodes (e.g. /dev/null, /dev/kvm,
etc.) into daemon's namespace.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-03 19:40:36 +02:00
Michal Privoznik
8da362fe62 qemu_domain_namespace: Repurpose qemuDomainBuildNamespace()
Okay, here is the deal. Currently, the way we build namespace is
very fragile. It is done from pre-exec hook when starting a
domain, after we mass closed all FDs and before we drop
privileges and exec() QEMU. This fact poses some limitations onto
the namespace build code, e.g. it has to make sure not to keep
any FD opened (not even through a library call), because it would
be leaked to QEMU. Also, it has to call only async signal safe
functions. These requirements are hard to meet - in fact as of my
commit v6.2.0-rc1~235 we are leaking a FD into QEMU by calling
libdevmapper functions.

To solve this issue and avoid similar problems in the future, we
should change our paradigm. We already have functions which can
populate domain's namespace with nodes from the daemon context.
If we use them to populate the namespace and keep only the bare
minimum in the pre-exec hook, we've mitigated the risk.

Therefore, the old qemuDomainBuildNamespace() is renamed to
qemuDomainUnshareNamespace() and new qemuDomainBuildNamespace()
function is introduced. So far, the new function is basically a
NOP and domain's namespace is still populated from the pre-exec
hook - next patches will fix it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-03 19:40:36 +02:00
Michal Privoznik
764eaf1aa4 qemu_domain_namespace: Rename qemuDomainCreateNamespace()
The name of this function is not very helpful, because it doesn't
create anything, it just flips a bit in a bitmask when domain is
starting up. Move the function internals into qemu_process.c and
forget the function ever existed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-03 19:40:33 +02:00
Michal Privoznik
90eee87569 qemu: Separate out namespace handling code
The qemu_domain.c file is big as is and we should split it into
separate semantic blocks. Start with code that handles domain
namespaces.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-08-03 19:32:27 +02:00
Ján Tomko
ee247e1d3f Use g_strfeev instead of virStringFreeList
Both accept a NULL value gracefully and virStringFreeList
does not zero the pointer afterwards, so a straight replace
is safe.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2020-08-03 15:37:36 +02:00
Ján Tomko
1edf164848 Remove redundant conditions
All of these have been checked earlier.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
2020-08-03 15:19:28 +02:00
Ján Tomko
6c7ba7b496 qemu: Fix affinity typo
Fixes: 4c0398b528
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-07-22 15:51:26 +02:00
Bihong Yu
3ee423c363 qemu: pre-create the dbus directory in qemuStateInitialize
There are races condiction to make '/run/libvirt/qemu/dbus' directory in
virDirCreateNoFork() while concurrent start VMs, and get "failed to create
directory '/run/libvirt/qemu/dbus': File exists" error message. pre-create the
dbus directory in qemuStateInitialize.

Signed-off-by: Bihong Yu <yubihong@huawei.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-07-22 09:40:15 +02:00
Jiri Denemark
1031db3600 qemu: Properly set //cpu/@migratable default value for running domains
Since active domains which do not have the attribute already set were
not started by libvirt that probed for CPU migratable property, we need
to check this property on reconnect and update the domain definition
accordingly.

https://bugzilla.redhat.com/show_bug.cgi?id=1857967

Reported-by: Mark Mielke <mark.mielke@gmail.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-07-21 15:40:01 +02:00
Daniel Henrique Barboza
f187b2fb98 qemu_process.c: modernize qemuProcessQMPNew()
Use g_autoptr() and remove the 'cleanup' label.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200717211556.1024748-3-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-07-21 15:34:36 +02:00
Peter Krempa
db712b0673 qemuDomainDiskLookupByNodename: Remove unused 'idx'
All callers pass NULL as the value. Remove the argument.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-07-21 09:52:46 +02:00
Peter Krempa
c414ab00e2 qemuProcessHandleBlockThreshold: Report correct indexes
The index returned by qemuDomainDiskLookupByNodename is the position in
the backing chain rather than the index we report in the XML.

Since with -blockdev they differ now and additionally the disk source
also has an index we need to fix the 'threshold' events we report:

1) If it's the top level image we must always trigger the event without
   any suffix as we did until now

2) We must report the correct index

3) We must report the correct index also for the top level image, when
   blockdev is used.

This means that we need to potentially emit 2 events, one for the device
without the index and then when blockdev is used and the top level image
has an index we must do it also with the index.

This will fix it for blockdev cases, while also not removing previous
semantics.

https://bugzilla.redhat.com/show_bug.cgi?id=1857204

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-07-21 09:52:46 +02:00
Peter Krempa
4a19b7b832 qemuDomainDiskBackingStoreGetName: Remove unused argument
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-07-21 09:52:46 +02:00
Prathamesh Chavan
aca37c3fb2 qemu_domainjob: introduce privateData for qemuDomainJob
To remove dependecy of `qemuDomainJob` on job specific
paramters, a `privateData` pointer is introduced.
To handle it, structure of callback functions is
also introduced.

Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-07-20 15:34:58 +02:00
Michal Privoznik
824e349397 qemu: Use qemuSecuritySetSavedStateLabel() to label restore path
Currently, when restoring from a domain the path that the domain
restores from is labelled under qemuSecuritySetAllLabel() (and after
v6.3.0-rc1~108 even outside transactions). While this grants QEMU
the access, it has a flaw, because once the domain is restored, up
and running then qemuSecurityDomainRestorePathLabel() is called,
which is not real counterpart. In case of DAC driver the
SetAllLabel() does nothing with the restore path but
RestorePathLabel() does - it chown()-s the file back and since there
is no original label remembered, the file is chown()-ed to
root:root. While the apparent solution is to have DAC driver set the
label (and thus remember the original one) in SetAllLabel(), we can
do better.

Turns out, we are opening the file ourselves (because it may live on
a root squashed NFS) and then are just passing the FD to QEMU. But
this means, that we don't have to chown() the file at all, we need
to set SELinux labels and/or add the path to AppArmor profile.

And since we want to restore labels right after QEMU is done loading
the migration stream (we don't want to wait until
qemuSecurityRestoreAllLabel()), the best way to approach this is to
have separate APIs for labelling and restoring label on the restore
file.

I will investigate whether AppArmor can use the SavedStateLabel()
API instead of passing the restore path to SetAllLabel().

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1851016

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-07-10 14:18:07 +02:00
Daniel P. Berrangé
fd460ef561 qemu: stop checking virObjectUnref return value
Some, but not all, of the monitor event handlers check
the virObjectUnref return value to see if the domain
was disposed.

It should not be possible for this to happen, since
the function already holds a lock on the domain and
has only just acquired an extra reference on the
domain a few lines earlier.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-06-03 10:20:17 +01:00
Daniel Henrique Barboza
9665b27dba qemuProcessRefreshCPU: skip 'host-model' logic for pSeries guests
Commit v3.10.0-182-g237f045d9a ("qemu: Ignore fallback CPU attribute
on reconnect") forced CPU 'fallback' to ALLOW, regardless of user
choice. This fixed a situation in which guests created with older
Libvirt versions, which used CPU mode 'host-model' in runtime, would
fail to launch in a newer Libvirt if the fallback was set to FORBID.
This would lead to a scenario where the CPU was translated to 'host-model'
to 'custom', but then the FORBID setting would make the translation
process fail.

PSeries can operate with 'host-model' in runtime due to specific PPC64
mechanics regarding compatibility mode. The update() implementation of
the cpuDriverPPC64 driver is a NO-OP if CPU mode is 'host-model', and
the driver does not implement translate(). The commit mentioned above
is causing PSeries guests to get their 'fallback' setting to ALLOW,
overwriting user choice, exposing a design problem in
qemuProcessRefreshCPU() - for PSeries guests, handling 'host-model'
as it is being done does not apply.

All other cpuArchDrivers implements update() and changes guest mode
to VIR_CPU_MODE_CUSTOM, meaning that PSeries is currently the only
exception to this logic. Let's make it official.

https://bugzilla.redhat.com/show_bug.cgi?id=1660711

Suggested-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200525123945.4049591-2-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-05-25 16:20:25 +02:00
Daniel Henrique Barboza
f600c42627 qemu_process.c: modernize qemuProcessUpdateCPU code path
Use automatic cleanup on qemuProcessUpdateCPU and the functions called
by it.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200522195620.3843442-5-danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-05-25 12:31:14 +02:00
Peter Krempa
78d30aa0bf qemu: Prepare for testing of 'netdev_add' props via qemuxml2argvtest
qemuxml2argv test suite is way more comprehensive than the hotplug
suite. Since we share the code paths for monitor and command line
hotplug we can easily test the properties of devices against the QAPI
schema.

To achieve this we'll need to skip the JSON->commandline conversion for
the test run so that we can analyze the pure properties. This patch adds
flags for the comand line generator and hook them into the
JSON->commandline convertor for -netdev. An upcoming patch will make use
of this new infrastructure.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-05-20 09:41:58 +02:00
Michal Privoznik
8fd2749b2d qemuProcessStop: Reattach NVMe disks a domain is mirroring into
If the mirror destination is not a file but a NVMe disk, then
call qemuHostdevReAttachOneNVMeDisk() to reattach the NVMe back
to the host.

This would be done by blockjob code when the job finishes, but in
this case the job won't finish - QEMU is killed meanwhile.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1825785

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-05-18 15:14:27 +02:00
Michal Privoznik
0230e38384 qemuProcessStop: Use XATTRs to restore seclabels on disks a domain is mirroring into
In v5.10.0-rc1~42 (which was later fixed in v6.0.0-rc1~487) I am
removing XATTRs for a file that QEMU is mirroring a disk into but
it is killed meanwhile. Well, we can call
qemuSecurityRestoreImageLabel() which will not only remove XATTRs
but also use them to restore the original owner of the file.

This would be done by blockjob code when the job finishes, but in
this case the job won't finish - QEMU is killed meanwhile

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-05-18 15:13:14 +02:00
Ján Tomko
006782a8bc qemu: only stop external devices after the domain
A failure in qemuProcessLaunch would lead to qemuExtDevicesStop
being called twice - once in the cleanup section and then again
in qemuProcessStop.

However, the first one is called while the QEMU process is
still running, which is too soon for the swtpm process, because
the swtmp_ioctl command can lock up:

https://bugzilla.redhat.com/show_bug.cgi?id=1822523

Remove the first call and only leave the one in qemuProcessStop,
which is called after the QEMU process is killed.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-05-13 15:29:37 +02:00
Peter Krempa
3bcbdc51da qemu: process: Don't clear QEMU_CAPS_BLOCKDEV when SD card is present
Help QEMU in deprecation of -drive if=none without the need to refactor
all old boards. Stop masking out -blockdev support when -drive if=sd
needs to be used. We achieve this by forbidding blockjobs and
special-casing all other code paths. Blockjobs are sacrificed in this
case as SD cards are a corner case for some ARM boards and are thus not
used commonly.

https://bugzilla.redhat.com/show_bug.cgi?id=1821692

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-05-12 06:55:00 +02:00
Peter Krempa
e664abb62e qemu: Prepare for 'sd' card use together with blockdev
SD cards need to be instantiated via -drive if=sd. This means that all
cases where we use the blockdev path need to be special-cased for SD
cards.

Note that at this point QEMU_CAPS_BLOCKDEV is still cleared if the VM
config has a SD card.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-05-12 06:55:00 +02:00
Peter Krempa
d876a93f05 qemu: Handle cases when 'qomName' isn't present
Use the drive alias for all cases when we can't generate qomName. This
is meant to handle disks on 'sd' bus which are instantiated via -drive
if=sd as there isn't any specific QOM name for them.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-05-12 06:55:00 +02:00
Peter Krempa
cc4a277db2 qemu: Rename qemuDiskBusNeedsDriveArg to qemuDiskBusIsSD
The function effectively boils down to whether the disk is 'SD'. Since
we'll need to make more decisions based on the fact whether the disk is
on the SD bus, rename the function.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-05-12 06:54:59 +02:00
Michal Privoznik
1d3a9ee9da qemu: Make memory path generation embed driver aware
So far, libvirt generates the following path for memory:

  $memoryBackingDir/$id-$shortName/ram-nodeN

where $memoryBackingDir is the path where QEMU mmaps() memory for
the guest (e.g. /var/lib/libvirt/qemu/ram), $id is domain ID
and $shortName is shortened version of domain name. So for
instance, the generated path may look something like this:

  /var/lib/libvirt/qemu/ram/1-QEMUGuest/ram-node0

While in case of embed driver the following path would be
generated by default:

  $root/lib/qemu/ram/1-QEMUGuest/ram-node0

which is not clashing with other embed drivers, we allow users to
override the default and have all embed drivers use the same
prefix. This can create clashing paths. Fortunately, we can reuse
the approach for machined name generation
(v6.1.0-178-gc9bd08ee35) and include part of hash of the root in
the generated path.

Note, the important change is in qemuGetMemoryBackingBasePath().
The rest is needed to pass driver around.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-04-07 15:26:32 +02:00
Michal Privoznik
bf54784cb1 qemu: Make hugepages path generation embed driver aware
So far, libvirt generates the following path for hugepages:

  $mnt/libvirt/qemu/$id-$shortName

where $mnt is the mount point of hugetlbfs corresponding to
hugepages of desired size (e.g. /dev/hugepages), $id is domain ID
and $shortName is shortened version of domain name. So for
instance, the generated path may look something like this:

  /dev/hugepages/libvirt/qemu/1-QEMUGuest

But this won't work with embed driver really, because if there
are two instances of embed driver, and they both want to start a
domain with the same name and with hugepages, both drivers will
generate the same path which is not desired. Fortunately, we can
reuse the approach for machined name generation
(v6.1.0-178-gc9bd08ee35) and include part of hash of the root in
the generated path.

Note, the important change is in qemuGetBaseHugepagePath(). The
rest is needed to pass driver around.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-04-07 15:26:26 +02:00
Daniel P. Berrangé
d2954c0729 qemu: ensure domain event thread is always stopped
In previous commit:

  commit e6afacb0fe
  Author: Daniel P. Berrangé <berrange@redhat.com>
  Date:   Wed Feb 12 12:26:11 2020 +0000

    qemu: start/stop an event loop thread for domains

A bogus comment was added claiming we didn't need to shutdown the
event thread in the qemuProcessStop method, because this would be
done in the monitor EOF callback. This was wrong because the EOF
callback only runs in the case of a QEMU crash or a guest initiated
clean shutdown & poweroff.  In the case where the libvirt admin
calls virDomainDestroy, the EOF callback never fires because we
have already unregistered the event callbacks. We must thus always
attempt to stop the event thread in qemuProcessStop.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reported-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-30 16:48:15 +01:00
Marc-André Lureau
db670b8d67 qemu: prepare and stop the dbus daemon
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-03-24 15:57:33 +01:00
Michal Privoznik
a02c589886 qemuProcessStartManagedPRDaemon: Don't pass -f pidfile to the daemon
Now, that our virCommandSetPidFile() is more intelligent we don't
need to rely on the daemon to create and lock the pidfile and use
virCommandSetPidFile() at the same time.

NOTE that as advertised in the previous commit, this was
temporarily broken, because both virCommand and
qemuProcessStartManagedPRDaemon() would try to lock the pidfile.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2020-03-24 15:53:03 +01:00
Gaurav Agrawal
d2c43a5b51 qemu: convert DomainLogContext class to use GObject
Signed-off-by: Gaurav Agrawal <agrawalgaurav@gnome.org>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2020-03-16 17:28:39 +01:00
Ján Tomko
b0eea635b3 Use g_strerror instead of virStrerror
Remove lots of stack-allocated buffers.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-03-13 17:26:55 +01:00
Nikolay Shirokovskiy
b47e3b9b5c qemu: agent: sync once if qemu has serial port event
Sync was introduced in [1] to check for ga presence. This
check is racy but in the era before serial events are available
there was not better solution I guess.

In case we have the events the sync function is different. It allows us
to flush stateless ga channel from remnants of previous communications.
But we need to do it only once. Until we get timeout on issued command
channel state is ok.

[1] qemu_agent: Issue guest-sync prior to every command

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-03-12 18:07:50 +01:00
Daniel P. Berrangé
a18f2c52ac qemu: convert agent to use the per-VM event loop
This converts the QEMU agent APIs to use the per-VM
event loop, which involves switching from virEvent APIs
to GMainContext / GSource APIs.

A GSocket is used as a convenient way to create a GSource
for a socket, but is not yet used for actual I/O.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-11 14:45:01 +00:00
Daniel P. Berrangé
436a56e37d qemu: convert monitor to use the per-VM event loop
This converts the QEMU monitor APIs to use the per-VM
event loop, which involves switching from virEvent APIs
to GMainContext / GSource APIs.

A GSocket is used as a convenient way to create a GSource
for a socket, but is not yet used for actual I/O.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-11 14:44:55 +00:00
Daniel P. Berrangé
92890fbfa1 qemu: start/stop an event thread for QMP probing
In common with regular QEMU guests, the QMP probing
will need an event loop for handling monitor I/O
operations.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-11 14:44:47 +00:00
Daniel P. Berrangé
e6afacb0fe qemu: start/stop an event loop thread for domains
The event loop thread will be responsible for handling
any per-domain I/O operations, most notably the QEMU
monitor and agent sockets.

We start this event loop when launching QEMU, but stopping
the event loop is a little more complicated. The obvious
idea is to stop it in qemuProcessStop(), but if we do that
we risk loosing the final events from the QEMU monitor, as
they might not have been read by the event thread at the
time we tell the thread to stop.

The solution is to delay shutdown of the event thread until
we have seen EOF from the QEMU monitor, and thus we know
there are no further events to process.

Note that this assumes that we don't have events to process
from the QEMU agent.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-11 14:44:44 +00:00
Michal Privoznik
13eb6c1468 qemu: Tell secdrivers which images are top parent
When preparing images for block jobs we modify their seclabels so
that QEMU can open them. However, as mentioned in the previous
commit, secdrivers base some it their decisions whether the image
they are working on is top of of the backing chain. Fortunately,
in places where we call secdrivers we know this and the
information can be passed to secdrivers.

The problem is the following: after the first blockcommit from
the base to one of the parents the XATTRs on the base image are
not cleared and therefore the second attempt to do another
blockcommit fails. This is caused by blockcommit code calling
qemuSecuritySetImageLabel() over the base image, possibly
multiple times (to ensure RW/RO access). A naive fix would be to
call the restore function. But this is not possible, because that
would deny QEMU the access to the base image.  Fortunately, we
can use the fact that seclabels are remembered only for the top
of the backing chain and not for the rest of the backing chain.
And thanks to the previous commit we can tell secdrivers which
images are top of the backing chain.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1803551

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-03-09 14:14:55 +01:00
Daniel P. Berrangé
5bff668dfb src: improve thread naming with human targetted names
Historically threads are given a name based on the C function,
and this name is just used inside libvirt. With OS level thread
naming this name is now visible to debuggers, but also has to
fit in 15 characters on Linux, so function names are too long
in some cases.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-03-05 12:23:04 +00:00
Ján Tomko
b164eac5e1 qemuExtDevicesStart: pass logManager
Pass logManager to qemuExtDevicesStart for future usage.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
2020-03-04 12:08:50 +01:00
Ján Tomko
feb69a19ac conf: do not pass vm object to virDomainClearNetBandwidth
This function only uses the domain definition.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-02-25 17:50:47 +01:00
Ján Tomko
7e0d11be5b virsh: include virutil.h where used
Include virutil.h in all files that use it,
instead of relying on it being pulled in somehow.

Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-02-24 23:15:50 +01:00
Michal Privoznik
74ec3f4d7d qemu: Don't explicitly remove pidfile after virPidFileForceCleanupPath()
In two places where virPidFileForceCleanupPath() is called, we
try to unlink() the pidfile again. This is needless because
virPidFileForceCleanupPath() has done just that.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-02-20 12:57:19 +01:00
zhenwei pi
26badd13e8 qemu: support Panic Crashloaded event handling
Pvpanic device supports bit 1 as crashloaded event, it means that
guest actually panicked and run kexec to handle error by guest side.

Handle crashloaded as a lifecyle event in libvirt.

Test case:
Guest side:
before testing, we need make sure kdump is enabled,
1, build new pvpanic driver (with commit from upstream
   e0b9a42735f2672ca2764cfbea6e55a81098d5ba
   191941692a3d1b6a9614502b279be062926b70f5)
2, insmod new kmod
3, enable crash_kexec_post_notifiers,
  # echo 1 > /sys/module/kernel/parameters/crash_kexec_post_notifiers
4, trigger kernel panic
  # echo 1 > /proc/sys/kernel/sysrq
  # echo c > /proc/sysrq-trigger

Host side:
1, build new qemu with pvpanic patches (with commit from upstream
   600d7b47e8f5085919fd1d1157f25950ea8dbc11
   7dc58deea79a343ac3adc5cadb97215086054c86)
2, build libvirt with this patch
3, handle lifecycle event and trigger guest side panic
  # virsh event stretch --event lifecycle
  event 'lifecycle' for domain stretch: Crashed Crashloaded
  events received: 1

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2020-02-07 14:05:25 +00:00
Jiri Denemark
80791859ac qemu: Pass machine type to virQEMUCapsIsCPUModeSupported
The usability of a specific CPU mode may depend on machine type, let's
prepare for this by passing it to virQEMUCapsIsCPUModeSupported.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-02-07 09:19:02 +01:00
Michal Privoznik
a37a8c569d Drop virAtomic module
Now, that every use of virAtomic was replaced with its g_atomic
equivalent, let's remove the module.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-02-02 16:36:58 +01:00
Michal Privoznik
7390ff3caa src: Drop virAtomicIntDecAndTest() with g_atomic_int_dec_and_test()
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-02-02 16:36:56 +01:00
Michal Privoznik
574678a27f src: Replace virAtomicIntInc() with g_atomic_int_add()
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-02-02 16:36:54 +01:00
Daniel P. Berrangé
068efae5b1 qemu: add support for running QEMU driver in embedded mode
This enables support for running QEMU embedded to the calling
application process using a URI:

   qemu:///embed?root=/some/path

Note that it is important to keep the path reasonably short to
avoid risk of hitting the limit on UNIX socket path names
which is 108 characters.

When using the embedded mode with a root=/var/tmp/embed, the
driver will use the following paths:

                logDir: /var/tmp/embed/log/qemu
           swtpmLogDir: /var/tmp/embed/log/swtpm
         configBaseDir: /var/tmp/embed/etc/qemu
              stateDir: /var/tmp/embed/run/qemu
         swtpmStateDir: /var/tmp/embed/run/swtpm
              cacheDir: /var/tmp/embed/cache/qemu
                libDir: /var/tmp/embed/lib/qemu
       swtpmStorageDir: /var/tmp/embed/lib/swtpm
 defaultTLSx509certdir: /var/tmp/embed/etc/pki/qemu

These are identical whether the embedded driver is privileged
or unprivileged.

This compares with the system instance which uses

                logDir: /var/log/libvirt/qemu
           swtpmLogDir: /var/log/swtpm/libvirt/qemu
         configBaseDir: /etc/libvirt/qemu
              stateDir: /run/libvirt/qemu
         swtpmStateDir: /run/libvirt/qemu/swtpm
              cacheDir: /var/cache/libvirt/qemu
                libDir: /var/lib/libvirt/qemu
       swtpmStorageDir: /var/lib/libvirt/swtpm
 defaultTLSx509certdir: /etc/pki/qemu

At this time all features present in the QEMU driver are available when
running in embedded mode, availability matching whether the embedded
driver is privileged or unprivileged.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-27 11:04:03 +00:00
Pavel Hrdina
894556ca81 secret: move virSecretGetSecretString into virsecret
The function virSecretGetSecretString calls into secret driver and is
used from other hypervisors drivers and as such makes more sense in
util.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-17 15:52:37 +01:00
Daniel P. Berrangé
7b9645a7d1 util: replace atomic ops impls with g_atomic_int*
Libvirt's original atomic ops impls were largely copied
from GLib's code at the time. The only API difference
was that libvirt's virAtomicIntInc() would return a
value, but g_atomic_int_inc was void. We thus use
g_atomic_int_add(v, 1) instead, though this means
virAtomicIntInc() now returns the original value,
instead of the new value.

This rewrites libvirt's impl in terms of g_atomic_int*
as a short term conversion. The key motivation was to
quickly eliminate use of GNULIB's verify_expr() macro
which is not a direct match for G_STATIC_ASSERT_EXPR.
Long term all the callers should be updated to use
g_atomic_int* directly.

Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-17 10:02:00 +00:00
Jiri Denemark
bd04d63ad9 qemu: Don't emit SUSPENDED_POSTCOPY event on destination
When pause-before-switchover QEMU capability is enabled, we get STOP
event before MIGRATION event with postcopy-active state. To properly
handle post-copy migration and emit correct events commit
v4.10.0-rc1-4-geca9d21e6c added a hack to
qemuProcessHandleMigrationStatus which translates the paused state
reason to VIR_DOMAIN_PAUSED_POSTCOPY and emits
VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event when migration state changes
to post-copy.

However, the code was effective on both sides of migration resulting in
a confusing VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY event on the destination
host, where entering post-copy mode is already properly advertised by
VIR_DOMAIN_EVENT_RESUMED_POSTCOPY event.

https://bugzilla.redhat.com/show_bug.cgi?id=1791458

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-01-16 15:12:19 +01:00
Michal Privoznik
50d7465f3d qemu_firmware: Pass virDomainDef into qemuFirmwareFillDomain()
This function needs domain definition really, we don't need to
pass the whole domain object. This saves couple of dereferences
and characters esp. in more checks to come.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-07 16:26:47 +01:00
Peter Krempa
5632ed8bad qemu: process: Terminate backup job on VM destroy
Commit d75f865fb9 caused a job-deadlock if
a VM is running the backup job and being destroyed as it removed the
cleanup of the async job type and there was nothing to clean up the
backup job.

Add an explicit cleanup of the backup job when destroying a VM.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-06 10:15:36 +01:00
Peter Krempa
728b993c8a qemu: Reset the node-name allocator in qemuDomainObjPrivateDataClear
qemuDomainObjPrivateDataClear clears state which become invalid after VM
stopped running and the node name allocator belongs there.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2020-01-06 10:15:35 +01:00
Nikolay Shirokovskiy
6c6d93bc62 qemu: hide details of fake reboot
If we use fake reboot then domain goes thru running->shutdown->running
state changes with shutdown state only for short period of time.  At
least this is implementation details leaking into API. And also there is
one real case when this is not convinient. I'm doing a backup with the
help of temporary block snapshot (with the help of qemu's API which is
used in the newly created libvirt's backup API). If guest is shutdowned
I want to continue to backup so I don't kill the process and domain is
in shutdown state. Later when backup is finished I want to destroy qemu
process. So I check if it is in shutdowned state and destroy it if it
is. Now if instead of shutdown domain got fake reboot then I can destroy
process in the middle of fake reboot process.

After shutdown event we also get stop event and now as domain state is
running it will be transitioned to paused state and back to running
later. Though this is not critical for the described case I guess it is
better not to leak these details to user too. So let's leave domain in
running state on stop event if fake reboot is in process.

Reconnection code handles this patch without modification. It detects
that qemu is not running due to shutdown and then calls qemuProcessShutdownOrReboot
which reboots as fake reboot flag is set.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-24 09:22:40 +03:00
Daniel Henrique Barboza
7a7d36055c qemu_process.c: remove 'cleanup' label from qemuProcessCreatePretendCmd()
The 'cleanup' flag is doing no cleaup in this function. We can
remove it and return NULL on error or qemuBuildCommandLine().

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Daniel Henrique Barboza
d8eb3ab9e1 qemu_process.c: remove cleanup labels after g_auto*() changes
The g_auto*() changes made by the previous patches made a lot
of 'cleanup' labels obsolete. Let's remove them.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Daniel Henrique Barboza
d234efc59a qemu_process.c: use g_autoptr()
Change all feasible pointers to use g_autoptr().

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Daniel Henrique Barboza
982ea95142 qemu_process.c: use g_autofree
Change all feasible strings and scalar pointers to use g_autofree.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2019-12-20 18:31:51 -05:00
Michal Privoznik
8e2026cc18 qemu: Generate command line of NVMe disks
Now, that we have everything prepared, we can generate command
line for NVMe disks.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
2019-12-17 10:04:44 +01:00
Daniel P. Berrangé
766c8ae963 Revert "qemu: directly create virResctrlInfo ignoring capabilities"
This reverts commit 7be5fe66cd.

This commit broke resctrl, because it missed the fact that the
virResctrlInfoGetCache() has side-effects causing it to actually
change the virResctrlInfo parameter, not merely get data from
it.

This code will need some refactoring before we can try separating
it from virCapabilities again.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-12 11:16:44 +00:00
Daniel P. Berrangé
1902356231 qemu: keep capabilities when running QEMU as root
When QEMU uid/gid is set to non-root this is pointless as if we just
used a regular setuid/setgid call, the process will have all its
capabilities cleared anyway by the kernel.

When QEMU uid/gid is set to root, this is almost (always?) never
what people actually want. People make QEMU run as root in order
to access some privileged resource that libvirt doesn't support
yet and this often requires capabilities. As a result they have
to go find the qemu.conf param to turn this off. This is not
viable for libguestfs - they want to control everything via the
XML security label to request running as root regardless of the
qemu.conf settings for user/group.

Clearing capabilities was implemented originally because there
was a proposal in Fedora to change permissions such that root,
with no capabilities would not be able to compromise the system.
ie a locked down root account. This never went anywhere though,
and as a result clearing capabilities when running as root does
not really get us any security benefit AFAICT. The root user
can easily do something like create a cronjob, which will then
faithfully be run with full capabilities, trivially bypassing
the restriction we place.

IOW, our clearing of capabilities is both useless from a security
POV, and breaks valid use cases when people need to run as root.

This removes the clear_emulator_capabilities configuration
option from qemu.conf, and always runs QEMU with capabilities
when root.  The behaviour when non-root is unchanged.

Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-11 16:01:20 +00:00
Peter Krempa
3656bb0a13 qemu: domain: Introduce QEMU_ASYNC_JOB_BACKUP async job type
We will want to use the async job infrastructure along with all the APIs
and event for the backup job so add the backup job as a new async job
type.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-12-10 12:41:57 +01:00
Daniel P. Berrangé
7be5fe66cd qemu: directly create virResctrlInfo ignoring capabilities
We always refresh the capabilities object when using virResctrlInfo
during process startup. This is undesirable overhead, because we can
just directly create a virResctrlInfo instead.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-09 10:17:27 +00:00
Daniel P. Berrangé
adf009b48f qemu: use host CPU object directly
Avoid grabbing the whole virCapsPtr object when we only need the
host CPU information.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-09 10:17:27 +00:00
Daniel P. Berrangé
1a1d848694 qemu: use NUMA capabilities object directly
Avoid grabbing the whole virCapsPtr object when we only need the
NUMA information.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-09 10:17:27 +00:00
Daniel P. Berrangé
6cc992bd1a conf: move NUMA capabilities into self contained object
The NUMA cells are stored directly in the virCapsHostPtr
struct. This moves them into their own struct allowing
them to be stored independantly of the rest of the host
capabilities. The change is used as an excuse to switch
the representation to use a GPtrArray too.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-12-09 10:17:27 +00:00