libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2024-10-13 17:49:16 +00:00

Author	SHA1	Message	Date
Marc-André Lureau	638f066b73	qemu: prepare domain for vhost-user GPU Call qemuExtVhostUserGPUPrepareDomain() to fill the domain with the location of the vhost-user binary to start. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Cole Robinson <crobinso@redhat.com>	2019-09-24 13:19:09 -04:00
Marc-André Lureau	c3d0831745	qemu: validate virtio-gpu with vhost-user Check qemu capability, and accept 3d acceleration. 3d acceleration support is checked when looking for a suitable vhost-user helper. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Cole Robinson <crobinso@redhat.com>	2019-09-24 12:30:02 -04:00
Michal Privoznik	ccf41a4b57	qemu: Enable slirp-helper iff dbus-vmstate present The fact that qemu is capable -netdev socket is not enough to start a migratable domain. It also needs dbus-vmstate capability. Since there are already some qemu releases which have net-socket-dgram capability and don't have dbus-vmstate we need to check for dbus-vmstate. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-09-19 11:36:44 +02:00
Laine Stump	7cd0911e1a	qemu: support unmanaged target tap dev for <interface type='ethernet'> If managed='no', then the tap device must already exist, and setting of MAC address and online status (IFF_UP) is skipped. NB: we still set IFF_VNET_HDR and IFF_MULTI_QUEUE as appropriate, because those bits must be properly set in the TUNSETIFF we use to set the tap device name of the handle we've opened - if IFF_VNET_HDR has not been set and we set it the request will be honored even when running libvirtd unprivileged; if IFF_MULTI_QUEUE is requested to be different than how it was created, that will result in an error from the kernel. This means that you don't need to pay attention to IFF_VNET_HDR when creating the tap devices, but you do need to set IFF_MULTI_QUEUE if you're going to use multiple queues for your tap device. NB2: /dev/vhost-net normally has permissions 600, so it can't be opened by an unprivileged process. This would normally cause a warning message when using a virtio net device from an unprivileged libvirtd. I've found that setting the permissions for /dev/vhost-net permits unprivileged libvirtd to use vhost-net for virtio devices, but have no idea what sort of security implications that has. I haven't changed libvrit's code to avoid attempting to open /dev/vhost-net - if you are concerned about the security of opening up permissions of /dev/vhost-net (probably a good idea at least until we ask someone who knows about the code) then add <driver name='qemu'/> to the interface definition and you'll avoid the warning message. Note that virNetDevTapCreate() is the correct function to call in the case of an existing device, because the same ioctl() that creates a new tap device will also open an existing tap device. Resolves: https://bugzilla.redhat.com/1723367 (partially) Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2019-09-09 14:38:01 -04:00
Michal Privoznik	d301bc8d08	lib: Grab write lock when modifying list of domains In some places where virDomainObjListForEach() is called the passed callback calls virDomainObjListRemoveLocked(). Well, this is unsafe, because the former only grabs a read lock but the latter modifies the list. I've identified the following unsafe calls: - qemuProcessReconnectAll() - libxlReconnectDomains() The rest seem to be safe. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-09-07 08:22:30 +02:00
Marc-André Lureau	9145b3f1cc	qemu-process: prepare slirp-helper When the network interface is of "user" type, and QEMU has the "-net socket,fd=" datagram support, call qemuInterfacePrepareSlirp() to probe and associate a slirp-helper with the interface. The usage of automated slirp-helper can be prevented with disableSlirp (in particular when resuming a VM that didn't start with slirp-helper before). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-09-06 12:47:47 +02:00
Marc-André Lureau	eef413e728	qemu-extdevice: prepare, start and stop slirp-helper If a slirp-helper is associated with a network interface, prepare/start/stop the process via qemu-extdevice. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-09-06 12:47:47 +02:00
Marc-André Lureau	13e6083efa	qemu: reset VM id after external devices stop pid filenames (from swtpm and other helpers from this series) are based on VM shortname, which is derived from VM id. If the id is reset to early, the state filenames will not be found. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-09-06 12:47:47 +02:00
Marc-André Lureau	861882d314	qemu: replace logCtxt with qemuDomainLogAppendMessage() Once QEMU is started, the qemuDomainLogContext is owned by it, and can no longer be used from libvirt. Instead, use qemuDomainLogAppendMessage() which will redirect the log. This is not strictly necessary for swtpm, but the following patches are going to reuse qemuExtDeviceLogCommand(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-09-06 12:47:46 +02:00
Peter Krempa	3fbaf0587c	qemu: hotplug: Setup disk throttling with blockdev With blockdev we must issue the block_set_io_throttle QMP command to setup disk throttling as we currently can't do it with the 'throttle' layer. Unfortunately there's nothing we can do if it fails. https://bugzilla.redhat.com/show_bug.cgi?id=1733163 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-09-06 08:12:21 +02:00
Vitaly Kuznetsov	9f3b5f89d4	qemu: add support for Direct Mode for Hyper-V Synthetic timers QEMU-4.1 supports 'Direct Mode' for Hyper-V synthetic timers (hv-stimer-direct CPU flag): Windows guests can request that timer expiration notifications are delivered as normal interrupts (and not VMBus messages). This is used by Hyper-V on KVM. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Ján Tomko <jtomko@redhat.com>	2019-08-19 11:38:28 +02:00
Jiri Denemark	c90fb5a828	qemu: Pass correct qemuCaps to virDomainDefPostParse Since qemuDomainDefPostParse callback requires qemuCaps, we need to make sure it gets the capabilities stored in the domain's private data if the domain is running. Passing NULL may cause QEMU capabilities probing to be triggered in case QEMU binary changed in the meantime. When this happens while a running domain object is locked, QMP event delivered to the domain before QEMU capabilities probing finishes will deadlock the event loop. This patch fixes all paths leading to virDomainDefPostParse. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-08-09 13:55:54 +02:00
Jiri Denemark	bbcfa07bea	qemu: Pass correct qemuCaps to virDomainDefCopy Since qemuDomainDefPostParse callback requires qemuCaps, we need to make sure it gets the capabilities stored in the domain's private data if the domain is running. Passing NULL may cause QEMU capabilities probing to be triggered in case QEMU binary changed in the meantime. When this happens while a running domain object is locked, QMP event delivered to the domain before QEMU capabilities probing finishes will deadlock the event loop. Several general functions from domain_conf.c were lazily passing NULL as the parseOpaque pointer instead of letting their callers pass the right data. This patch fixes all paths leading to virDomainDefCopy to do the right thing. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-08-09 13:55:54 +02:00
Jiri Denemark	900c595249	qemu: Pass qemuCaps to qemuDomainDefFormatBufInternal Since qemuDomainDefPostParse callback requires qemuCaps, we need to make sure it gets the capabilities stored in the domain's private data if the domain is running. Passing NULL may cause QEMU capabilities probing to be triggered in case QEMU binary changed in the meantime. When this happens while a running domain object is locked, QMP event delivered to the domain before QEMU capabilities probing finishes will deadlock the event loop. This patch fixes all paths leading to qemuDomainDefFormatBufInternal. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-08-09 13:55:54 +02:00
Wang Huaqiang	816cef0783	util, conf: Handle default monitor group of an allocation properly 'default monitor of an allocation' is defined as the resctrl monitor group that created along with an resctrl allocation, which is created by resctrl file system. If the monitor group specified in domain configuration file is happened to be a default monitor group of an allocation, then it is not necessary to create monitor group since it is already created. But if an monitor group is not an allocation default group, you should create the group under folder '/sys/fs/resctrl/mon_groups' and fill the vcpu PIDs to 'tasks' file. Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-08-05 19:41:11 +02:00
Jiri Denemark	ad9d5d3a6a	cpu: Drop CPUID definition for hv-spinlocks hv-spinlocks is not a CPUID feature and should not be checked as such. While starting a domain with hv-spinlocks enabled, we would report a warning about unsupported hyperv spinlocks feature even though it was set properly. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-30 17:09:53 +02:00
Jiri Denemark	0ccdd476bb	qemu: Fix hyperv features with QEMU 4.1 Originally the names of the hyperv CPU features were only used internally for looking up their CPUID bits. So we used "__kvm_hv_" prefix for them to make sure the names do not collide with normal CPU features stored in our CPU map. But with QEMU 4.1 we check which features were enabled or disabled by a freshly started QEMU process using their names rather than their CPUID bits (mostly because of MSR features). Thus we need to change our made up internal names into the actual names used by QEMU. Most of the names are only used with QEMU 4.1 and newer and the reset was introduced with QEMU recently enough to already support spelling with "-". Thus we don't need to define them as "hv_" with a translation to "hv-" for new QEMU. Without this patch libvirt would mistakenly report all hyperv features as unavailable and refuse to start any domain using them with QEMU 4.1. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-29 15:41:50 +02:00
Stefan Berger	72299db636	tpm: Run swtpm_setup with less parameters on incoming migration In case of an incoming migration we do not need to run swtpm_setup with all the parameters but only want to get the benefit of it creating a TPM state file for us that we can then label with an SELinux label. The actual state will be overwritten by the in- coming state. So we have to pass an indicator for incomingMigration all the way to the command line parameter generation for swtpm_setup. Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2019-07-27 07:56:00 -04:00
Eric Blake	c82abfdea9	backup: qemu: Detect node names at domain startup If we are using -blockdev, then node names are always available (because we set them). But when not using it, we have to scrape node names from QMP, and want to do so as infrequently as possible. We were scraping node names after reconnecting a new libvirtd to an existing guest (see qemuProcessReconnect), and after any block job that may have changed the set of node names we care about (legacy block jobs), but forgot to scrape the names when first starting a guest. Do so now in order to allow the checkpoint code to always have access to a node name without having to repeat a node name scrape itself. Future patches may need to clean up qemuDomainSetBlockThreshold (if node names are always available, then it doesn't need to repeat a scrape) and/or hotplug and media changes (if the addition of new nodes can result in a null node name, then scraping at that point in time would be appropriate). But for now, this patch addresses only the most common instance of a missing node name. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2019-07-26 16:48:58 -05:00
Peter Krempa	00c4c971fd	qemu: process: Don't use qemuBlockJobStartupFinalize in qemuProcessHandleBlockJob The block job event handler qemuProcessHandleBlockJob looks at the block job data to see whether the job requires synchronous handling. Since the block job event may arrive before we continue the job handling (if the job has no data to copy) we could hit the state when the job is still set as QEMU_BLOCKJOB_STATE_NEW (as we move it to the QEMU_BLOCKJOB_STATE_RUNNING state only after returning from monitor). If the event handler uses qemuBlockJobStartupFinalize it would unregister and free the job. Thankfully this is not a big problem for legacy blockjobs as we don't need much data for them but since we'd re-instantiate the job data structure we'd report wrong job type for active commit as qemu reports it as a regular commit job. Fix it by not using qemuBlockJobStartupFinalize function in qemuProcessHandleBlockJob as it is not starting the job anyways. https://bugzilla.redhat.com/show_bug.cgi?id=1721375 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-19 15:49:40 +02:00
Peter Krempa	0a9fd83240	qemu: Detect managed persistent reservations in block job orphan chains The PR manager is a property of the format layer in qemu so we need to be able to track it also in the chains of orphaned block jobs. Add a helper for qemu to look also into the blockjob state. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-18 17:59:34 +02:00
Peter Krempa	59a0306f07	qemu: process: Refresh -blockdev based blockjobs on reconnect to qemu Refresh the state of the jobs and process any events that might have happened while libvirt was not running. The job state processing requires some care to figure out if a job needs to be bumped. For any invalid job try doing our best to cancel it. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-18 17:59:34 +02:00
Peter Krempa	cbf4e3af70	qemu: Add handler for job state change event Add support for handling the event either synchronously or asynchronously using the event thread. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-18 17:59:34 +02:00
Peter Krempa	8e2a5c3a4c	qemu: process: Don't trigger BLOCK_JOB* events with -blockdev With blockdev we'll need to use the JOB_STATUS_CHANGE so gate the old events by the blockdev capability. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-18 17:59:34 +02:00
Peter Krempa	4cc4357f3e	qemu: blockjob: Save status XML when modifying job state Now that block job data is stored in the status XML portion we need to make sure that everything which changes the state also saves the status XML. The job registering function is used while parsing the status XML so in that case we need to skip the XML saving. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-18 17:59:34 +02:00
Peter Krempa	5ff46aaa7f	qemu: blockjob: Register new and running blockjobs in the global table Add the job structure to the table when instantiating a new job and remove it when it terminates/fails. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-07-18 17:59:34 +02:00
Cole Robinson	8911d843f3	conf: Add network xmlopt argument Pass an xmlopt argument through all the needed network conf functions, like is done for domain XML handling. No functional change for now Reviewed-by: Laine Stump <laine@laine.org> Signed-off-by: Cole Robinson <crobinso@redhat.com>	2019-07-17 17:18:56 -04:00
Peter Krempa	2cb86fc260	qemu: Implement support for 'capability_filters' config option Filter out the given capabilities and set domain taint if we've done so. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-21 15:24:06 +02:00
Peter Krempa	3616ec3927	qemu: domain: Add support for modifying qemu capability list via qemu namespace For testing purposes it's sometimes desired to be able to control the presence of capabilities of qemu. This adds the possibility to do this via the qemu namespace. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-21 15:24:06 +02:00
Peter Krempa	d9536f5cff	qemu: process: Report better error when virtlogd connection fails When connecting to virtlogd fails e.g. due to wrong libvirtd selinux process label we'd report an utterly useless error message: $ virsh start upstream error: Failed to start domain upstream error: Cannot recv data: Connection reset by peer Use virLastErrorPrefixMessage in the correct place to give a better sense of what's going on: $ virsh start upstream error: Failed to start domain upstream error: can't connect to virtlogd: Cannot recv data: Connection reset by peer Signed-off-by: Peter Krempa <pkrempa@redhat.com> ACKed-by: Michal Privoznik <mprivozn@redhat.com>	2019-06-20 17:10:24 +02:00
Jiri Denemark	8eb4a89f5f	qemu: Forbid MSR features with old QEMU Without "unavailable-features" CPU property we cannot properly detect whether a specific MSR feature we asked for (either explicitly or implicitly via a CPU model) was disabled by QEMU for some reason. Because this could break migration, snapshots, and save/restore operaions, it's better to just forbid any use of MSR features with QEMU which lacks "unavailable-features" CPU property. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-20 14:02:36 +02:00
Ján Tomko	7bf679aec6	qemu: remove json argument from qemuMonitorOpen Always assume JSON monitor was requested, since all the callers pass true anyway. Signed-off-by: Ján Tomko <jtomko@redhat.com> Acked-by: Peter Krempa <pkrempa@redhat.com>	2019-06-20 13:47:41 +02:00
Ján Tomko	466764346d	qemu: domain: remove monJSON field If we have a monitor, it is a JSON monitor. Signed-off-by: Ján Tomko <jtomko@redhat.com> Acked-by: Peter Krempa <pkrempa@redhat.com>	2019-06-20 13:47:41 +02:00
Ján Tomko	011f4eb124	qemu: assume monJSON is always true Now that we no longer support the HMP monitor, remove some dead code. Signed-off-by: Ján Tomko <jtomko@redhat.com> Acked-by: Peter Krempa <pkrempa@redhat.com>	2019-06-20 13:47:41 +02:00
Ján Tomko	4d497566e6	qemu: also delete qemuProcessAttach Now that the virDomainQemuAttach API returns an error, we can remove the unused qemuProcessAttach function as well, deleting the only user that possibly could have requested to open a non-JSON monitor. Signed-off-by: Ján Tomko <jtomko@redhat.com> Acked-by: Peter Krempa <pkrempa@redhat.com>	2019-06-20 12:47:10 +02:00
Michal Privoznik	7979066b69	qemuProcessLaunch: Return earlier if spawning qemu failed If spawning qemu fails then we report an error and proceed to writing status XML onto the disk. This is unnecessary as we are sure that the domain is not running. At the same time, if virPidFileReadPath() fails it returns -errno. Use it in the error message. It may explain what went wrong. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-20 10:29:54 +02:00
Jiri Denemark	63acb7bfd5	qemu_process: Prefer generic qemuMonitorGetGuestCPU When updating guest CPU definition according to the vCPU actually created by QEMU, we want to use the generic qemuMonitorGetGuestCPU to get both CPUID and MSR features. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-20 00:22:39 +02:00
Jiri Denemark	055f8f6bb9	qemu: Make qemuMonitorGetGuestCPU usable on x86 only It was never implemented or used for anything else anyway. Mainly because it uses CPUID features bits. The function is renamed as qemuMonitorGetGuestCPUx86 to make this explicit. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-20 00:22:39 +02:00
Jiri Denemark	0b763774a5	qemu: Filter CPU features in active XML Properly filter features which should not be passed to QEMU because they were never supported by QEMU or they did nothing and QEMU dropped them. Currently they are just silently ignored by the command line generator. Let's make this process more visible and clean by dropping the features from the domain's active definition in qemuProcessUpdateGuestCPU. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-20 00:22:37 +02:00
Jiri Denemark	955fd6e7a2	qemu_process: Drop cleanup label from qemuProcessUpdateGuestCPU Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-20 00:22:37 +02:00
Jie Wang	7a232286b9	qemu: Try harder to remove pr-helper object and kill pr-helper process If libvirt receives DISCONNECTED event and prDaemonRunning is set to false, and qemuDomainRemoveDiskDevice() is performing in the meantime, then qemuDomainRemoveDiskDevice() will fail to remove pr-helper object because prDaemonRunning is false. But removing that check from qemuHotplugRemoveManagedPR() is not enough, because after removing the object through monitor the qemuProcessKillManagedPRDaemon() is called which contains the same check. Thus the pr-helper process might be left behind. Signed-off-by: Jie Wang <wangjie88@huawei.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2019-06-14 09:51:10 +02:00
Peter Krempa	56c6893ff5	qemu: Use proper block job name when reconnecting to VM The hash table returned by qemuMonitorGetAllBlockJobInfo is organized by the frontend name (which skipps the 'drive-' prefix). While our code properly matches the jobs to the disk, qemu needs the full job name including the 'drive-' prefix to be able to identify jobs. Fix this by adding an argument to qemuMonitorGetAllBlockJobInfo which does not modify the job name before filling the hash. This fixes a regression where users would not be able to cancel/pivot block jobs after restarting libvirtd while a blockjob is running. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-12 09:40:02 +02:00
Andrea Bolognani	a84922c09e	qemu: Fix NULL pointer access in qemuProcessInitCpuAffinity() Commit `2f2254c7f4` attempted to fix a memory leak by ensuring cpumapToSet is always a freshly allocated bitmap, but regrettably introduced a NULL pointer access while doing so, because it called virBitmapCopy() without allocating the destination bitmap first. Solve the issue by using virBitmapNewCopy() instead. Reported-by: John Ferlan <jferlan@redhat.com> Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com>	2019-06-06 16:50:11 +02:00
Andrea Bolognani	de563ebcf9	qemu: Drop cleanup label from qemuProcessInitCpuAffinity() We're using VIR_AUTOPTR() for everything now, plus the cleanup section was not doing anything useful anyway. Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-04 15:54:04 +02:00
Andrea Bolognani	2f2254c7f4	qemu: Fix leak in qemuProcessInitCpuAffinity() In two out of three scenarios we are cleaning up properly after ourselves, but commit `5f2212c062` has changed the remaining one in a way that caused it to start leaking cpumapToSet. Refactor the logic so that cpumapToSet is always a freshly allocated bitmap that gets cleaned up automatically thanks to VIR_AUTOPTR(); this also allows us to remove the hostcpumap variable. Reported-by: John Ferlan <jferlan@redhat.com> Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-04 15:53:51 +02:00
Andrea Bolognani	5f2212c062	qemu: Fix qemuProcessInitCpuAffinity() Ever since the feature was introduced with commit `0f8e7ae33a`, it has contained a logic error in that it attempted to use a NUMA node map where a CPU map was expected. Because of that, guests using <numatune> might fail to start: # virsh start guest error: Failed to start domain guest error: cannot set CPU affinity on process 40055: Invalid argument This was particularly easy to trigger on POWER 8 machines, where secondary threads always show up as offline in the host: having <numatune> <memory mode='strict' placement='static' nodeset='1'/> </numatune> in the guest configuration, for example, would result in libvirt trying to set the process affinity so that it would prefer running on CPU 1, but since that's a secondary thread and thus shows up as offline, the operation would fail, and so would starting the guest. Use the newly introduced virNumaNodesetToCPUset() to convert the NUMA node map to a CPU map, which in the example above would be 48,56,64,72,80,88 - a valid input for virProcessSetAffinity(). https://bugzilla.redhat.com/show_bug.cgi?id=1703661 Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2019-06-04 09:29:35 +02:00
Jiri Denemark	7da62c91f0	qemu: Check TSC frequency before starting QEMU When migrating a domain with invtsc CPU feature enabled, the TSC frequency of the destination host must match the frequency used when the domain was started on the source host or the destination host has to support TSC scaling. If the frequencies do not match and the destination host does not support TSC scaling, QEMU will fail to set the right TSC frequency when starting vCPUs on the destination and thus migration will fail. However, this is quite late since both host might have spent significant time transferring memory and perhaps even storage data. By adding the check to libvirt we can let migration fail before any data starts to be sent over. If for some reason libvirt is unable to detect the host's TSC frequency or scaling support, we'll just let QEMU try and the migration will either succeed or fail later. Luckily, we mandate TSC frequency to be explicitly set in the domain XML to even allow migration of domains with invtsc. We can just check whether the requested frequency is compatible with the current host before starting QEMU. https://bugzilla.redhat.com/show_bug.cgi?id=1641702 Signed-off-by: Jiri Denemark <jdenemar@redhat.com>	2019-06-03 18:07:16 +02:00
Martin Kletzander	c67a3c0fc3	qemu: Set emulator thread scheduler only after QEMU starts If the scheduler is set before vCPU0 cannot be moved into its cpu,cpuacct cgroup. While it is not yet known whether this is a bug or not, it makes sense for us to do that later as otherwise the scheduler would be inherited by vCPU and I/O Threads even when they do not have any such setting specified. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2019-05-27 16:05:23 +02:00
Daniel P. Berrangé	e007e8ba3a	Revert "virt drivers: don't handle type=network after resolving actual network type" This reverts commit `2f5e6502e3`. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2019-04-30 14:42:22 +01:00
Michal Privoznik	0eaa4716e1	qemu: Set up EMULATOR thread and cpuset.mems before exec()-ing qemu It's funny how this went unnoticed for such a long time. Long story short, if a domain is configured with VIR_DOMAIN_NUMATUNE_MEM_STRICT libvirt doesn't really honour that. This is because of `7e72ac7878` after which libvirt allowed qemu to allocate memory just anywhere and only after that it used some magic involving cpuset.memory_migrate and cpuset.mems to move the memory to desired NUMA nodes. This was done in order to work around some KVM bug where KVM would fail if there wasn't a DMA zone available on the NUMA node. Well, while the work around might stopped libvirt tickling the KVM bug it also caused a bug on libvirt side: if there is not enough memory on configured NUMA node(s) then any attempt to start a domain must fail. Because of the way we play with guest memory domains can start just happily. The solution is to move the child we've just forked into emulator cgroup, set up cpuset.mems and exec() qemu only after that. This basically reverts `7e72ac7878` which was a workaround for kernel bug. This bug was apparently fixed because I've tested this successfully with recent kernel. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2019-04-18 17:53:42 +02:00

1 2 3 4 5 ...

1186 Commits