libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2025-01-02 19:15:20 +00:00

Author	SHA1	Message	Date
Michal Privoznik	5ac2439a83	qemu_process: Release domain seclabel later in qemuProcessStop() Some secdrivers (typically SELinux driver) generate unique dynamic seclabel for each domain (unless a static one is requested in domain XML). This is achieved by calling qemuSecurityGenLabel() from qemuProcessPrepareDomain() which allocates unique seclabel and stores it in domain def->seclabels. The counterpart is qemuSecurityReleaseLabel() which releases the label and removes it from def->seclabels. Problem is, that with current code the qemuProcessStop() may still want to use the seclabel after it was released, e.g. when it wants to restore the label of a disk mirror. What is happening now, is that in qemuProcessStop() the qemuSecurityReleaseLabel() is called, which removes the SELinux seclabel from def->seclabels, yada yada yada and eventually qemuSecurityRestoreImageLabel() is called. This bubbles down to virSecuritySELinuxRestoreImageLabelSingle() which finds no SELinux seclabel (using virDomainDefGetSecurityLabelDef()) and thus returns early doing nothing. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1751664 Fixes: `8fa0374c5b` Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2021-01-06 13:29:09 +01:00
Jiri Denemark	f7c40b5c71	qemu: The TSC tolerance interval should be closed The kernel refuses to set guest TSC frequency less than a minimum frequency or greater than maximum frequency (both computed based on the host TSC frequency). When writing the libvirt code with a reversed logic (return success when the requested frequency falls within the tolerance interval) I forgot to include the boundaries. Fixes: `d8e5b45600` https://bugzilla.redhat.com/show_bug.cgi?id=1839095 Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2021-01-06 11:24:37 +01:00
Peter Krempa	d0819b9f02	qemu: Properly handle setting of <iotune> for empty cdrom When starting a VM with an empty cdrom which has <iotune> configured the startup fails as qemu is not happy about setting tuning for an empty drive: error: internal error: unable to execute 'block_set_io_throttle', unexpected error: 'Device has no medium' Resolve this by skipping the setting of throttling for empty drives and updating the throttling when new medium is inserted into the drive. Resolves: https://gitlab.com/libvirt/libvirt/-/issues/111 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2021-01-06 09:24:48 +01:00
Shi Lei	9b5d741a9d	netdevmacvlan: Use helper function to create unique macvlan/macvtap name Simplify ReserveName/GenerateName for macvlan and macvtap by using common functions. Signed-off-by: Shi Lei <shi_lei@massclouds.com> Reviewed-by: Laine Stump <laine@redhat.com>	2020-12-15 13:35:33 -05:00
Shi Lei	c36cad1a31	netdevtap: Use common helper function to create unique tap name Simplify GenerateName/ReserveName for netdevtap by using common functions. Signed-off-by: Shi Lei <shi_lei@massclouds.com> Reviewed-by: Laine Stump <laine@redhat.com>	2020-12-15 13:35:27 -05:00
Daniel Henrique Barboza	9432693e2b	domain_conf.c: move virDomainDeviceDefValidate() to domain_validate.c Move virDomainDeviceDefValidate() and all its helper functions to domain_validate.c. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2020-12-14 09:29:09 -03:00
Peter Krempa	18de9dfd77	virDomainDefValidate: Add per-run 'opaque' data virDomainDefPostParse infrastructure has apart from the global opaque data also per-run data, but this was not duplicated into the validation callbacks. This is important when drivers want to use correct run-state for the validation. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-12-09 09:33:47 +01:00
Peter Krempa	c1720b9ac7	qemuDomainDiskLookupByNodename: Lookup also backup 'store' nodenames Nodename may be associated with a disk backup job, add support for looking up in that chain too. This is specifically useful for the BLOCK_WRITE_THRESHOLD event which can be registered for any nodename. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-12-08 15:12:34 +01:00
Michal Privoznik	40a162f83e	qemu: Don't cache NUMA caps In v6.0.0-rc1~439 (and friends) we tried to cache NUMA capabilities because we assumed they are immutable. And to some extent they are (NUMA hotplug is not a thing, is it). However, our capabilities contain also some runtime info that can change, e.g. hugepages pool allocation sizes or total amount of memory per node (host side memory hotplug might change the value). Because of the caching we might not be reporting the correct runtime info in 'virsh capabilities'. The NUMA caps are used in three places: 1) 'virsh capabilities' 2) domain startup, when parsing numad reply 3) parsing domain private data XML In cases 2) and 3) we need NUMA caps to construct list of physical CPUs that belong to NUMA nodes from numad reply. And while this may seem static, it's not really because of possible CPU hotplug on physical host. There are two possible approaches: 1) build a validation mechanism that would invalidate the cached NUMA caps, or 2) drop the caching and construct NUMA caps from scratch on each use. In this commit, the latter approach is implemented, because it's easier. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1819058 Fixes: `1a1d848694` Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2020-12-07 11:32:40 +01:00
Peter Krempa	0f7b80691b	qemuMonitorBlockJobInfo: Store 'ready' and 'ready_present' separately Don't make the logic confusing by representing the 3 options using an integer with negative values. Signed-off-by: Peter Krempa <pkrempa@redhat.com>	2020-12-07 10:15:00 +01:00
Daniel Henrique Barboza	5a34d0667d	qemu: move memory size align to qemuProcessPrepareDomain() qemuBuildCommandLine() is calling qemuDomainAlignMemorySizes(), which is an operation that changes live XML and domain and has little to do with the command line build process. Move it to qemuProcessPrepareDomain() where we're supposed to make live XML and domain changes before launch. qemuProcessStart() is setting VIR_QEMU_PROCESS_START_NEW if !migrate && !snapshot, same conditions used in qemuBuildCommandLine() to call qemuDomainAlignMemorySizes(), making this change seamless. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2020-12-03 17:19:35 -03:00
Daniel Henrique Barboza	3bb9ed8bc2	qemu_process.c: check migrateURI when setting VIR_QEMU_PROCESS_START_NEW qemuProcessCreatePretendCmdPrepare() is setting the VIR_QEMU_PROCESS_START_NEW regardless of whether this is a migration case or not. This behavior differs from what we're doing in qemuProcessStart(), where the flag is set only if !migrate && !snapshot. Fix it by making the flag setting consistent with what we're doing in qemuProcessStart(). Reviewed-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2020-12-03 17:16:33 -03:00
John Ferlan	148cfcf051	qemu: Pass / fill niothreads for qemuMonitorGetIOThreads Let's pass along / fill @niothreads rather than trying to make dual use as a return value and thread count. This resolves a Coverity issue detected in qemuDomainGetIOThreadsMon where if qemuDomainObjExitMonitor failed, then a -1 was returned and overwrote @niothreads, causing a memory leak. Signed-off-by: John Ferlan <jferlan@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-12-03 17:06:07 +01:00
Michal Privoznik	b7d4e6b67e	lib: Replace VIR_AUTOSTRINGLIST with GStrv Glib provides g_auto(GStrv) which is in-place replacement of our VIR_AUTOSTRINGLIST. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-12-02 15:43:07 +01:00
Pavel Hrdina	82bda55e2f	qemuProcessHandleGraphics: no need to check for NULL Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2020-11-16 17:25:41 +01:00
Jiri Denemark	d8e5b45600	qemu: Do not require TSC frequency to strictly match host Some CPUs provide a way to read exact TSC frequency, while measuring it is required on other CPUs. However, measuring is never exact and the result may slightly differ across reboots. For this reason both Linux kernel and QEMU recently started allowing for guests TSC frequency to fall into +/- 250 ppm tolerance interval around the host TSC frequency. Let's do the same to avoid unnecessary failures (esp. during migration) in case the host frequency does not exactly match the frequency configured in a domain XML. https://bugzilla.redhat.com/show_bug.cgi?id=1839095 Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2020-11-12 17:29:16 +01:00
Masayoshi Mizuma	5cde9dee8c	qemu: Move qemuExtDevicesStop() before removing the pidfiles A qemu guest which has virtiofs config fails to start if the previous starting failed because of invalid option or something. That's because the virtiofsd isn't killed by virPidFileForceCleanupPath() on the former failure because the pidfile was already removed by virFileDeleteTree(priv->libDir) in qemuProcessStop(), so virPidFileForceCleanupPath() just returned. Move qemuExtDevicesStop() before virFileDeleteTree(priv->libDir) so that virPidFileForceCleanupPath() can kill virtiofsd correctly. For example of the reproduction: # virsh start guest error: Failed to start domain guest error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -foo: invalid option ... fix the option ... # virsh start guest error: Failed to start domain guest error: Cannot open log file: '/var/log/libvirt/qemu/guest-fs0-virtiofsd.log': Device or resource busy # Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-11-11 15:20:12 +01:00
Peter Krempa	62a01d84a3	util: hash: Retire 'virHashTable' in favor of 'GHashTable' Don't hide our use of GHashTable behind our typedef. This will also promote the use of glib's hash functions directly. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Matt Coleman <matt@datto.com>	2020-11-06 10:40:51 +01:00
Daniel P. Berrangé	99a1cfc438	qemu: honour fatal errors dealing with qemu slirp helper Currently all errors from qemuInterfacePrepareSlirp() are completely ignored by the callers. The intention is that missing qemu-slirp binary should cause the caller to fallback to the built-in slirp impl. Many of the possible errors though should indeed be considered fatal. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-10-27 12:03:19 +00:00
zhenwei pi	7555a55470	qemu: implement memory failure event Since QEMU 5.2 (commit-77b285f7f6), QEMU supports 'memory failure' event, posts event to monitor if hitting a hardware memory error. Fully support this feature for QEMU. Test with commit 'libvirt: support memory failure event', build a little complex environment(nested KVM): 1, install newly built libvirt in L1, and start a L2 vm. run command in L1: ~# virsh event l2 --event memory-failure 2, run command in L0 to inject MCE to L1: ~# virsh qemu-monitor-command l1 --hmp mce 0 9 0xbd000000000000c0 0xd 0x62000000 0x8c Test result in l1(recipient hypervisor case): event 'memory-failure' for domain l2: recipient: hypervisor action: ignore flags: action required: 0 recursive: 0 Test result in l1(recipient guest case): event 'memory-failure' for domain l2: recipient: guest action: inject flags: action required: 0 recursive: 0 Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-10-23 09:42:00 +02:00
Peter Krempa	d6d4c08daf	util: hash: Change type of hash table name/key to 'char' All users of virHashTable pass strings as the name/key of the entry. Make this an official requirement by turning the variables to 'const char *'. For any other case it's better to use glib's GHashTable. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>	2020-10-22 15:02:46 +02:00
Daniel P. Berrangé	7b1ed1cd73	qemu: stop passing -enable-fips to QEMU >= 5.2.0 Use of the -enable-fips option is being deprecated in QEMU >= 5.2.0. If FIPS compliance is required, QEMU must be built with libcrypt which will unconditionally enforce it. Thus there is no need for libvirt to pass -enable-fips to modern QEMU. Unfortunately there was never any way to probe for -enable-fips in the first instance, it was enabled by libvirt based on version number originally, and then later unconditionally enabled when libvirt dropped support for older QEMU. Similarly we now use a version number check to decide when to stop passing -enable-fips. Note that the qemu-5.2 capabilities are currently from the pre-release version and will be updated once qemu-5.2 is released. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2020-10-22 09:03:33 +02:00
Jonathon Jongsma	08f8fd8413	conf: Add support for vDPA network devices This patch adds new schema and adds support for parsing and formatting domain configurations that include vdpa devices. vDPA network devices allow high-performance networking in a virtual machine by providing a wire-speed data path. These devices require a vendor-specific host driver but the data path follows the virtio specification. When a device on the host is bound to an appropriate vendor-specific driver, it will create a chardev on the host at e.g. /dev/vhost-vdpa-0. That chardev path can then be used to define a new interface with type='vdpa'. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Laine Stump <laine@redhat.com>	2020-10-20 14:46:52 -04:00
Peter Krempa	7b0ced89e7	qemu: Prepare hostdev data which depends on the host state separately SCSI hostdev setup requires querying the host os for the actual path of the configured hostdev. This was historically done in the command line formatter. Our new approach is to split out this part into 'qemuProcessPrepareHost' which is designed to be skipped in tests. Refactor the hostdev code to use this new semantics, and add appropriate handlers filling in the data for tests and the qemuConnectDomainXMLToNative users. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-10-20 15:08:22 +02:00
Peter Krempa	9ff3ad9058	qemuProcessCreatePretendCmd: Split up preparation and command building Host preparation steps which are deliberately skipped when pretend-creating a commandline are normally executed after VM object preparation. In the test code we are faking some of the host preparation steps, but we were doing that prior to the call to qemuProcessPrepareDomain embedded in qemuProcessCreatePretendCmd. By splitting up qemuProcessCreatePretendCmd into two functions we can ensure that the ordering of the prepare steps stays consistent. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-10-20 15:08:22 +02:00
Erik Skultety	ccb40cf288	qemu: process: sev: Fill missing 'cbitpos' & 'reducedPhysBits' from caps These XML attributes have been mandatory since the introduction of SEV support to libvirt. This design decision was based on QEMU's requirement for these to be mandatory for migration purposes, as differences in these values across platforms must result in the pre-migration checks failing (not that migration with SEV works at the time of this patch). This patch enables autofill of these attributes right before launching QEMU and thus updating the live XML. Signed-off-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2020-10-19 11:03:27 +02:00
Erik Skultety	1fdc907325	qemu: process: Move SEV capability check to qemuValidateDomainDef Checks such as this one should be done at domain def validation time, not before starting the QEMU process. As for this change, existing domains will see some QEMU error when starting as opposed to a libvirt error that this QEMU binary doesn't support SEV, but that's okay, we never guaranteed error messages to remain the same. Signed-off-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2020-10-19 11:03:16 +02:00
Erik Skultety	649f720a9a	qemu_process: sev: Drop an unused variable Signed-off-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2020-10-19 11:01:56 +02:00
Pavel Hrdina	5ad8272888	util: vircgroup: change virCgroupFree to take only virCgroupPtr As preparation for g_autoptr() we need to change the function to take only virCgroupPtr. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>	2020-10-09 16:24:35 +02:00
Ján Tomko	cc3190cc4c	qemu: process: use g_new0 Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>	2020-10-05 16:44:06 +02:00
Ján Tomko	868c350752	qemu: separate out VIR_ALLOC calls Move them to separate conditions to reduce churn in following patches. Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>	2020-10-05 16:44:06 +02:00
Cole Robinson	0fa5c23865	qemu: Taint cpu host-passthrough only after migration From a discussion last year[1], Dan recommended libvirt drop the taint flag for cpu host-passthrough, unless the VM has been migrated. This repurposes the existing host-cpu taint flag to do just that. [1]: https://www.redhat.com/archives/virt-tools-list/2019-February/msg00041.html https://bugzilla.redhat.com/show_bug.cgi?id=1673098 Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Cole Robinson <crobinso@redhat.com>	2020-10-05 10:08:26 -04:00
Peter Krempa	faa88866f5	Don't check return value of virBitmapNewCopy The function will not fail any more. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-10-05 15:50:45 +02:00
Peter Krempa	cb6fdb0125	virBitmapNew: Don't check return value Remove return value check from all callers. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-10-05 15:38:47 +02:00
Masayoshi Mizuma	1c9227de5d	qemu: process: Handle transient disks on VM startup Add overlays after the VM starts before we start executing guest code. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com>	2020-10-01 09:55:02 +02:00
Peter Krempa	afc25e8553	qemu: prepare cleanup for <transient/> disk overlays Later patches will implement support for <transient/> disks in libvirt by installing an overlay on top of the configured image. This will require cleanup after the VM will be stopped so that the state is correctly discarded. Since the overlay will be installed only during the startup phase of the VM we need to ensure that qemuProcessStop doesn't delete the original file on some previous failure. This is solved by adding 'inhibitDiskTransientDelete' VM private data member which is set prior to any startup step and will be cleared once transient disk overlays are established. Based on that we can then delete the overlays for any <transient/> disk. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com>	2020-10-01 09:55:02 +02:00
Peter Krempa	3673bdbe13	qemu: domain: Extract preparation of hostdev specific data to a separate function Historically we've prepared secrets for all objects in one place. This doesn't make much sense and it's semantically more appealing to prepare everything for a single device type in one place. Move the setup of the (iSCSI\|SCSI) hostdev secrets into a new function which will be used to setup other things as well in the future. This is a similar approach we do for disks. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-09-15 15:20:23 +02:00
Ján Tomko	af16e754cd	qemuProcessReconnect: clear 'oldjob' After we started copying the privateData pointer in qemuDomainObjRestoreJob, we should also free them once we're done with them. Register the clear function and use g_auto. Also add a check for job->cb to qemuDomainObjClearJob, to prevent freeing an uninitialized job. https://bugzilla.redhat.com/show_bug.cgi?id=1878450 Signed-off-by: Ján Tomko <jtomko@redhat.com> Fixes: `aca37c3fb2`	2020-09-14 18:10:56 +02:00
Tim Wiederhake	caf5a88e59	qemu: Use glib memory functions in qemuProcessReadLog Signed-off-by: Tim Wiederhake <twiederh@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com>	2020-09-11 18:19:58 +02:00
Michal Privoznik	ec46e6d44b	qemu_process: Separate VIR_PERF_EVENT_* setting into a function When starting a domain, qemuProcessLaunch() iterates over all VIR_PERF_EVENT_* values and (possibly) enables them. While there is nothing wrong with the code, the for loop where it's done makes it harder to jump onto next block of code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-09-08 10:57:24 +02:00
Martin Kletzander	f5b486daea	qemu: Allow setting affinity to fail and don't report error This is just a clean-up of commit `3791f29b08` using the new parameter of virProcessSetAffinity() introduced in commit `9514e24984` so that there is no error reported in the logs. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-09-07 14:48:57 +02:00
Martin Kletzander	9514e24984	Do not report error when setting affinity is allowed to fail Suggested-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-09-07 11:35:36 +02:00
Nikolay Shirokovskiy	5c0cd375d1	qemu: don't shutdown event thread in monitor EOF callback This hunk was introduced in [1] in order to avoid losing events from monitor on stopping qemu process. But as explained in [2] on destroy we will get neither EOF nor any other events as monitor is just closed. In case of crash/shutdown we won't get any more events as well and qemuDomainObjStopWorker will be called by qemuProcessStop eventually. Thus let's remove qemuDomainObjStopWorker from qemuProcessHandleMonitorEOF as it is not useful anymore. [1] `e6afacb0f`: qemu: start/stop an event loop thread for domains [2] `d2954c072`: qemu: ensure domain event thread is always stopped Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-07 09:33:59 +03:00
Martin Kletzander	fc7d53edf4	qemu: Fix comment in qemuProcessSetupPid This was supposed to be done in commit `3791f29b08`, but I missed a spot. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2020-09-06 13:44:27 +02:00
Martin Kletzander	3791f29b08	qemu: Do not error out when setting affinity failed Consider a host with 8 CPUs. There are the following possible scenarios: 1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs 2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs 3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs 4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs 5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPUs; QEMU should get 4 CPUs 6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPUs; QEMU should get 4 CPUs. Scenarios 1 & 2 always work unless systemd restricted libvirtd privs. Scenario 3 works because libvirt checks current affinity first and skips the sched_setaffinity call, avoiding the SYS_NICE issue. Scenario 4 works only if CAP_SYS_NICE is available. Scenarios 5 & 6 work only if CAP_SYS_NICE is present AND the cgroups cpuset is not set on the container. If libvirt blindly ignores the sched_setaffinity failure, then scenarios 4, 5 and 6 should all work, but with caveat in case 4 and 6, that QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively. This is still better than failing. Therefore libvirt can blindly ignore the setaffinity failure, but ONLY ignore it when there was no affinity specified in the XML config. If user specified affinity explicitly, libvirt must report an error if it can't be honoured. Resolves: https://bugzilla.redhat.com/1819801 Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-04 14:44:21 +02:00
Michal Privoznik	95b9db4ee2	lib: Prefer WITH_* prefix for #if conditionals Currently, we are mixing: #if HAVE_BLAH with #if WITH_BLAH. Things got way better with Pavel's work on meson, but apparently, mixing these two leads to confusing and easy to miss bugs (see `31fb929eca` for instance). While we were forced to use HAVE_ prefix with autotools, we are free to choose our own prefix with meson and since WITH_ prefix appears to be more popular let's use it everywhere. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-09-02 10:28:10 +02:00
Laine Stump	95089f481e	util: assign tap device names using a monotonically increasing integer When creating a standard tap device, if provided with an ifname that contains "%d", rather than taking that literally as the name to use for the new device, the kernel will instead use that string as a template, and search for the lowest number that could be put in place of %d and produce an otherwise unused and unique name for the new device. For example, if there is no tap device name given in the XML, libvirt will always send "vnet%d" as the device name, and the kernel will create new devices named "vnet0", "vnet1", etc. If one of those devices is deleted, creating a "hole" in the name list, the kernel will always attempt to reuse the name in the hole first before using a name with a higher number (i.e. it finds the lowest possible unused number). The problem with this, as described in the previous patch dealing with macvtap device naming, is that it makes "immediate reuse" of a newly freed tap device name much more common, and in the aftermath of deleting a tap device, there is some other necessary cleanup of things which are named based on the device name (nwfilter rules, bandwidth rules, OVS switch ports, to name a few) that could end up stomping over the top of the setup of a new device of the same name for a different guest. Since the kernel "create a name based on a template" functionality for tap devices doesn't exist for macvtap, this patch for standard tap devices is a bit different from the previous patch for macvtap - in particular there was no previous "bitmap ID reservation system" or overly-complex retry loop that needed to be removed. We simply find an unused name, and pass that name on to the kernel instead of "vnet%d". This counter is also wrapped when either it gets to INT_MAX or if the full name would overflow IFNAMSIZ-1 characters. 
In the case of "vnet%d" and a 32 bit int, we would reach INT_MAX first, but possibly someday someone will change the name from vnet to something else. (NB: It is still possible for a user to provide their own parameterized template name (e.g. "mytap%d") in the XML, and libvirt will just pass that through to the kernel as it always has.) Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-09-01 14:16:44 -04:00
Laine Stump	d7f38beb2e	util: replace macvtap name reservation bitmap with a simple counter There have been some reports that, due to libvirt always trying to assign the lowest numbered macvtap / tap device name possible, a new guest would sometimes be started using the same tap device name as previously used by another guest that is in the process of being destroyed as the new guest is starting. In some cases this has led to, for example, the old guest's qemuProcessStop() code deleting a port from an OVS switch that had just been re-added by the new guest (because the port name is based on only the device name using the port). Similar problems can happen (and I believe have) with nwfilter rules and bandwidth rules (which are both instantiated based on the name of the tap device). A couple patches have been previously proposed to change the ordering of startup and shutdown processing, or to put a mutex around everything related to the tap/macvtap device name usage, but in the end no matter what you do there will still be possible holes, because the device could be deleted outside libvirt's control (for example, regular tap devices are automatically deleted when the qemu process terminates, and that isn't always initiated by libvirt but could instead happen completely asynchronously - libvirt then has no control over the ordering of shutdown operations, and no opportunity to protect it with a mutex.) But this only happens if a new device is created at the same time as one is being deleted. We can effectively eliminate the chance of this happening if we end the practice of always looking for the lowest numbered available device name, and instead just keep an integer that is incremented each time we need a new device name. 
At some point it will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15 character limit if nothing else), and we can't guarantee that the new name really will be the least* recently used name, but "math" suggests that it will be much less common that we'll try to re-use the most recently used name. This patch implements such a counter for macvtap/macvlan, replacing the existing, and much more complicated, "ID reservation" system. The counter is set according to whatever macvtap/macvlan devices are already in use by guests when libvirtd is started, incremented each time a new device name is needed, and wraps back to 0 when either INT_MAX is reached, or when the resulting device name would be longer than IFNAMSIZ-1 characters (which actually is what happens when the template for the device name is "macvtap%d"). The result is that no macvtap name will be re-used until the host has created (and possibly destroyed) 99,999,999 devices. Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-09-01 14:16:36 -04:00
Ján Tomko	0a37e0695b	Split declarations from initializations Split those initializations that depend on a statement above them. Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-08-25 19:03:11 +02:00
Ján Tomko	a5152f23e7	Move declarations before statements Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2020-08-25 19:03:11 +02:00
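
Note on `95089f481e` and `d7f38beb2e`: the wrap-around counter those two commit messages describe can be sketched roughly as follows. This is an illustration only, not the code from either commit. The names `name_counter`, `name_in_use_fn`, `name_counter_next` and `SKETCH_IFNAMSIZ` are hypothetical, and the "is this name already taken" check is reduced to an optional callback.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Linux defines IFNAMSIZ as 16 (15 characters plus the trailing NUL).
 * It is hard-coded here under a hypothetical name to keep the sketch portable. */
#define SKETCH_IFNAMSIZ 16

/* Hypothetical per-prefix state: the last ID handed out, or -1 if none yet. */
typedef struct {
    const char *prefix;   /* e.g. "vnet" or "macvtap" */
    int last_id;
} name_counter;

/* Hypothetical callback: returns non-zero if the name is already in use. */
typedef int (*name_in_use_fn)(const char *name);

/* Write the next candidate name into buf. The counter only moves forward and
 * wraps to 0 when it reaches INT_MAX or when the name would be longer than
 * IFNAMSIZ-1 characters. Returns 0 on success, -1 if no free name was found
 * within max_tries attempts or the buffer is too small. */
static int
name_counter_next(name_counter *c, name_in_use_fn in_use,
                  char *buf, size_t buflen, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int id = (c->last_id == INT_MAX) ? 0 : c->last_id + 1;
        int len = snprintf(buf, buflen, "%s%d", c->prefix, id);

        if (len > SKETCH_IFNAMSIZ - 1) {
            /* Name too long for the kernel: wrap the counter back to 0. */
            id = 0;
            len = snprintf(buf, buflen, "%s%d", c->prefix, id);
        }
        if (len < 0 || len > SKETCH_IFNAMSIZ - 1 || (size_t)len >= buflen)
            return -1;

        c->last_id = id;
        if (in_use == NULL || !in_use(buf))
            return 0;   /* a NULL callback treats every name as free */
    }
    return -1;
}
```

With the prefix "macvtap" the length limit is hit first (seven letters leave room for eight digits), while with "vnet" the counter reaches INT_MAX first, which matches the behaviour the two messages describe.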


1357 Commits
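
Note on `d8e5b45600` and `f7c40b5c71`: the closed +/- 250 ppm TSC tolerance interval discussed in those two messages can be illustrated with the sketch below. It is not libvirt's implementation. The function name `tsc_within_tolerance` and the macro `TSC_TOLERANCE_PPM` are hypothetical, and the way the tolerance is rounded (integer division) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tolerance around the host TSC frequency, in parts per million.
 * The 250 ppm figure comes from the commit message; the macro name is made up. */
#define TSC_TOLERANCE_PPM 250

/* Return true when requested_hz lies in the closed interval
 * [host_hz - tolerance, host_hz + tolerance]. Both boundaries count as
 * acceptable, which is the point of the follow-up fix. */
static bool
tsc_within_tolerance(uint64_t host_hz, uint64_t requested_hz)
{
    /* Integer arithmetic; does not overflow for realistic frequencies
     * (host_hz would have to exceed roughly 7e16 Hz). */
    uint64_t tolerance = host_hz * TSC_TOLERANCE_PPM / 1000000;
    uint64_t min_hz = host_hz - tolerance;
    uint64_t max_hz = host_hz + tolerance;

    return requested_hz >= min_hz && requested_hz <= max_hz;
}
```

For a 3 GHz host the tolerance works out to 750,000 Hz, so a requested frequency of exactly 3,000,750,000 Hz is accepted and 3,000,750,001 Hz is not.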