libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2024-12-22 21:55:25 +00:00

Author	SHA1	Message	Date
Jiri Denemark	66643931e7	qemu: Add support for /dev/userfaultfd /dev/userfaultfd device is preferred over userfaultfd syscall for post-copy migrations. Unless qemu driver is configured to disable mount namespace or to forbid access to /dev/userfaultfd in cgroup_device_acl, we will copy it to the limited /dev filesystem QEMU will have access to and label it appropriately. So in the default configuration post-copy migration will be allowed even without enabling vm.unprivileged_userfaultfd sysctl. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2024-02-13 17:44:26 +01:00
Praveen K Paladugu	4bfd513d92	hypervisor: Move domain interface mgmt methods Move domain interface management methods from qemu to hypervisor. This refactoring allows the domain management methods to be shared between CH and qemu drivers. This commit does not introduce any functional changes. Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2024-02-02 10:58:26 +01:00
Praveen K Paladugu	a22d7fde17	conf: Drop unused parameter Drop unused parameter from virDomainNetReleaseActualDevice method. Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2024-02-02 10:58:21 +01:00
Michal Privoznik	bee5301afa	qemu_process: Skip over non-virtio non-TAP NIC models when refreshing rx-filter After guest is started, or we are reconnecting to already running one (after daemon restart), qemuProcessRefreshRxFilters() is called to refresh rx-filters (basically MAC addresses of guest NICs) as they might have changed while we were not running (for the case when reconnecting to an already running guest), or we need to enable them by running a command (for freshly started guest - see processNicRxFilterChangedEvent()). Now, our XML parser allowed trustGuestRxFilters attribute for all types and models of <interface/> while in reality, only virtio model AND TUN/TAP based types can see MAC address changes. For other combinations, QEMU reports an error. This all means that when the daemon is restarted and it reconnects to a guest with, well, invalid configuration, or when such guest is restored from a saved image, or migrated, then we issue the monitor command, to which QEMU replies with an error which is then propagated to users: error: internal error: unable to execute QEMU command 'query-rx-filter': invalid net client name: hostdev0 While on one hand users should fix their configuration (and after v10.0.0-rc1~123 they can do that even on live domains), libvirt can also have some logic built in that prevents issuing the command in the first place (for obviously wrong cases). Fixes: `060d4c83ef` Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2024-01-25 15:55:33 +01:00
Peter Krempa	2da71d8e43	qemu: process: Separate setup of network device objects Separate the SLIRP bits from 'qemuProcessNetworkPrepareDevices' and do the setup of the internal data when setting up domain data. This will allow tests to use the same code path to lookup data for a network. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2024-01-04 22:26:10 +01:00
Artem Chernyshev	d05cdd1879	virprocess: virProcessGetNamespaces() to void virProcessGetNamespaces() return value is invariant, so change it type and remove all dependent checks. Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2024-01-04 17:06:14 +01:00
Guoyi Tu	dd2f36d66e	qemu_driver: Don't handle the EOF event if vm get restarted Currently, libvirt creates a thread pool with only one thread to handle all qemu monitor events for virtual machines. If the thread gets stuck while handling a monitor EOF event, such as being unable to kill the virtual machine process or release resources, the events of other virtual machines will also be blocked, which will lead to abnormal behavior of other virtual machines. For instance, when another virtual machine completes a shutdown operation and the monitor EOF event has been queued but remains unprocessed, we immediately destroy and start the virtual machine again; at a later time, when the EOF event gets processed, processMonitorEOFEvent() will kill the virtual machine that just started. To address this issue, in processMonitorEOFEvent() we check whether the current virtual machine's id is equal to the one at the time the event was generated. If they do not match, we immediately return. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn> Signed-off-by: dengpengcheng <dengpc12@chinatelecom.cn>	2024-01-03 17:13:23 +00:00
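The stale-event check described in this commit can be sketched roughly as follows. This is a minimal illustration, not libvirt's code: `demo_vm`, `demo_eof_event`, and `demo_process_eof()` are hypothetical names, and the real domain object and event structures carry far more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a domain object. */
struct demo_vm {
    int id;        /* assumed to change every time the domain is started */
    bool running;
};

/* Hypothetical queued monitor EOF event. */
struct demo_eof_event {
    int vm_id;     /* domain id captured when the event was generated */
};

/*
 * Handle a queued EOF event. Returns true if the event was acted upon,
 * false if it was stale (the domain was restarted after the event was
 * queued) and therefore ignored.
 */
static bool demo_process_eof(struct demo_vm *vm, const struct demo_eof_event *ev)
{
    if (vm->id != ev->vm_id)
        return false;      /* event belongs to a previous incarnation */

    vm->running = false;   /* stand-in for tearing the domain down */
    return true;
}
```

Capturing the id when the event is queued keeps the check cheap and avoids having to cancel already-queued events when a domain is destroyed and started again.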
Peter Krempa	69880584e6	qemuProcessStartWithMemoryState: Don't start qemu with '-loadvm SNAP' and '-incoming defer' together A bug in qemuProcessStartWithMemoryState caused that we would start qemu with '-loadvm SNAP' and '-incoming defer' together. qemu doesn't expect that and crashes on an assertion failure [1]. [1]: https://issues.redhat.com/browse/RHEL-16782 Fixes: `8a88d3e586` Resolves: https://issues.redhat.com/browse/RHEL-17841 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>	2023-12-01 11:35:14 +01:00
Michal Privoznik	cfcbba4c2b	lib: Replace qsort() with g_qsort_with_data() While glibc provides qsort(), which usually is just a mergesort, until sorting arrays so huge that temporary array used by mergesort would not fit into physical memory (which in our case is never), we are not guaranteed it'll use mergesort. The advantage of mergesort is clear - it's stable. IOW, if we have an array of values parsed from XML, qsort() it and produce some output based on those values, we can then compare the output with some expected output, line by line. But with newer glibc this is all history. After [1], qsort() is no longer mergesort but introsort instead, which is not stable. This is suboptimal, because in some cases we want to preserve order of equal items. For instance, in ebiptablesApplyNewRules(), nwfilter rules are sorted by their priority. But if two rules have the same priority, we want to keep them in the order they appear in the XML. Since it's hard/needless work to identify places where stable or unstable sorting is needed, let's just play it safe and use stable sorting everywhere. Fortunately, glib provides g_qsort_with_data() which indeed implements mergesort and it's a drop in replacement for qsort(), almost. It accepts fifth argument (pointer to opaque data), that is passed to comparator function, which then accepts three arguments. We have to keep one occurrence of qsort() though - in NSS module which deliberately does not link with glib. 1: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=03bf8357e8291857a435afcc3048e0b697b6cc04 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-11-24 09:53:14 +01:00
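To illustrate why stability matters here, the sketch below sorts rules by priority with a three-argument comparator, the same comparator shape that g_qsort_with_data() expects. It deliberately does not call glib: `demo_stable_sort()` is a hypothetical stand-in (a plain insertion sort, chosen only because it is short and stable), and `demo_rule` is an invented structure, not the nwfilter one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Comparator taking an opaque pointer as its third argument. */
typedef int (*demo_cmp_fn)(const void *a, const void *b, void *opaque);

/* Hypothetical rule: priority plus its position in the XML. */
struct demo_rule {
    int priority;
    int xml_order;
};

/* Compare by priority only; equal priorities compare as equal. */
static int demo_rule_cmp(const void *a, const void *b, void *opaque)
{
    const struct demo_rule *ra = a;
    const struct demo_rule *rb = b;

    (void)opaque; /* unused in this example */
    return (ra->priority > rb->priority) - (ra->priority < rb->priority);
}

/*
 * Stable insertion sort over elements of at most 64 bytes. An element
 * moves left only past strictly greater elements, so equal elements
 * keep their original relative order.
 */
static void demo_stable_sort(void *base, size_t n, size_t size,
                             demo_cmp_fn cmp, void *opaque)
{
    unsigned char *arr = base;
    unsigned char tmp[64];

    assert(size <= sizeof(tmp));
    for (size_t i = 1; i < n; i++) {
        size_t j = i;

        memcpy(tmp, arr + i * size, size);
        while (j > 0 && cmp(arr + (j - 1) * size, tmp, opaque) > 0) {
            memcpy(arr + j * size, arr + (j - 1) * size, size);
            j--;
        }
        memcpy(arr + j * size, tmp, size);
    }
}
```

With an unstable sort, two rules of equal priority could swap places between runs or glibc versions, which would change generated output even though the input XML did not change.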
Pavel Hrdina	4f4a8dce94	qemu_process: fix crash in qemuSaveImageDecompressionStart Commit changing the code to allow passing NULL as @data into qemuSaveImageDecompressionStart() was not correct as it left the original call into the function as well. Introduced-by: `2f3e582a1a` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2247754 Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2023-11-03 14:17:06 +01:00
Peter Krempa	61baeb1152	qemu: process: Extract host setup of disk device into helpers Currently the code sets up only VDPA backends but will be used later in hotplug code too. This patch also uses normal forward iteration in the loop in qemuProcessPrepareHostStorage as we don't need to remove disks from the disk list at that point. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2023-10-27 15:04:20 +02:00
Peter Krempa	3781988107	qemu: Refactor storage backend 'storage' layer helper object setup Use the new nodename accessors for any storage layer helper object. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2023-10-17 14:16:16 +02:00
Pavel Hrdina	2f3e582a1a	qemuProcessStartWithMemoryState: make it possible to use without data When used with internal snapshots there is no memory state file so we have no data to load and decompression is not needed. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-10-09 13:56:50 +02:00
Pavel Hrdina	8a88d3e586	qemuProcessStartWithMemoryState: add snapshot argument When called from snapshot code we will need to pass snapshot object in order to make internal snapshots work correctly. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-10-09 13:56:49 +02:00
Pavel Hrdina	6a88060d32	qemuProcessStartWithMemoryState: allow setting reason for audit log When called by snapshot code we will need to use different reason. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-10-09 13:56:49 +02:00
Pavel Hrdina	6c0f30b37e	qemu_saveimage: move qemuSaveImageStartProcess to qemu_process The function will no longer be used only when restoring VM as it will be used when reverting snapshot as well so move it to qemu_process and rename it accordingly. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-10-09 13:56:49 +02:00
Jonathon Jongsma	447e09dfdb	qemu: Monitor nbdkit process for exit Adds the ability to monitor the nbdkit process so that we can take action in case the child exits unexpectedly. When the nbdkit process exits, we pause the vm, restart nbdkit, and then resume the vm. This allows the vm to continue working in the event of a nbdkit failure. Eventually we may want to generalize this functionality since we may need something similar for e.g. qemu-storage-daemon, etc. The process is monitored with the pidfd_open() syscall if it exists (since linux 5.3). Otherwise it resorts to checking whether the process is alive once a second. The one-second time period was chosen somewhat arbitrarily. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-09-19 14:28:50 -05:00
Jonathon Jongsma	dfa657aa27	qemu: include nbdkit state in private xml Add xml to the private data for a disk source to represent the nbdkit process so that the state can be re-created if the libvirt daemon is restarted. Format: <nbdkit> <pidfile>/path/to/nbdkit.pid</pidfile> <socketfile>/path/to/nbdkit.socket</socketfile> </nbdkit> Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-09-19 14:28:50 -05:00
Jonathon Jongsma	e498941476	qemu: move qemuProcessReadLog() to qemuLogContext This code can be used by the nbdkit implementation for reading back filtered log data for error reporting. Move it to qemuLogContext so that it can be shared. Renamed to qemuLogContextReadFiltered(). Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-09-19 14:28:50 -05:00
Jonathon Jongsma	b658b1a27e	qemu: Extract qemuDomainLogContext into a new file This will allow us to use it for nbdkit logging in upcoming commits. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-09-19 14:28:50 -05:00
Jonathon Jongsma	abdc4f2092	Generalize qemuDomainLogContextNew() Allow to specify a basename for the log file so that qemuDomainLogContextNew() can be used to create log contexts for secondary loggers. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-09-19 14:28:50 -05:00
Jonathon Jongsma	4ef2bcfd3f	qemu: Implement support for vDPA block devices Requires recent qemu with support for the virtio-blk-vhost-vdpa device and the ability to pass a /dev/fdset/N path for the vdpa path (8.1.0) Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1900770 Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>	2023-09-12 11:06:41 -05:00
Peter Krempa	24b769a25b	qemu: capabilities: Remove unused 'virQEMUCapsFilterByMachineType' The filtering of qemu capabilities by machine type doesn't seem to be ever used, remove it and adjust callers. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2023-09-04 10:31:52 +02:00
Peter Krempa	a9e71cb737	qemu: process: Probe machine type data on reconnect to qemu When reconnecting we populate only the capability flags from the XML as we need to know the exact flags that were present when starting the VM. On the other hand the machine type data is not stored as it wasn't really used after startup. While storing all of the data into the status XML would be theoretically possible, with machine-type specific data it makes no sense to do so, and thus the data can be re-probed from the current instance. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2023-09-04 10:31:52 +02:00
Michal Privoznik	895525db81	qemu: Move error messages onto a single line Error messages are exempt from the 80 columns rule. Move them onto one line. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pavel Hrdina <phrdina@redhat.com>	2023-09-04 09:35:36 +02:00
Michal Privoznik	7d01b67323	src: Move _virDomainMemoryDef target nodes into an union The _virDomainMemoryDef struct is getting a bit messy. It has various members and only some of them are valid for given model. Worse, some are re-used for different models. We tried to make this more bearable by putting a comment next to each member describing what models the member is valid for, but that gets messy too. Therefore, do what we do elsewhere: introduce an union of structs and move individual members into their respective groups. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2023-08-24 12:39:26 +02:00
Michal Privoznik	f23a991bea	src: Move _virDomainMemoryDef source nodes into an union The _virDomainMemoryDef struct is getting a bit messy. It has various members and only some of them are valid for given model. Worse, some are re-used for different models. We tried to make this more bearable by putting a comment next to each member describing what models the member is valid for, but that gets messy too. Therefore, do what we do elsewhere: introduce an union of structs and move individual members into their respective groups. This allows us to shorten some names (e.g. nvdimmPath or sourceNodes) as their purpose is obvious due to their placement. But to make this commit as small as possible, that'll be addressed later. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2023-08-24 12:39:23 +02:00
Ján Tomko	2bad705ebb	qemu: remove pointless qemuDomainLogContextMode Since its introduction in `4d1b771fbb` it has only been used to differentiate between START and non-START. Last use of QEMU_DOMAIN_LOG_CONTEXT_MODE_ATTACH was removed by: commit `f709377301` qemu: Fix qemuDomainObjTaint with virtlogd QEMU_DOMAIN_LOG_CONTEXT_MODE_STOP is unused since: commit `cf3ea0769c` qemu: process: Append the "shutting down" message using the new APIs Now, the only caller passes QEMU_DOMAIN_LOG_CONTEXT_MODE_START. Assume that's always the case and remove the 'mode' argument. Signed-off-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2023-08-23 15:25:29 +02:00
Andrea Bolognani	b845e376a4	qemu: Match NVRAM template extension for new domains Keep things consistent by using the same file extension for the generated NVRAM path as the NVRAM template. Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2023-08-21 13:51:32 +02:00
Michal Privoznik	1ca3c339a1	lib: Prefer sizeof(variable) instead of sizeof(type) in memset If one of previous commits taught us something, it's that: sizeof(variable) and sizeof(type) are not the same. Especially because for live enough code the type might change (e.g. as we use autoptr more). And since we don't get any warnings when an incorrect length is passed to memset() it is easy to mess up. But with sizeof(variable) instead, it's not as easy. Therefore, switch to using memset(variable, 0, sizeof(*variable)), or its alternatives, depending on level of pointers. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Claudio Fontana <cfontana@suse.de>	2023-08-03 16:41:19 +02:00
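A minimal sketch of the pattern this commit prefers, using a hypothetical `demo_stats` structure: the length passed to memset() is derived from the variable being cleared, so it stays correct if the variable's type later changes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct demo_stats {
    long rx_bytes;
    long tx_bytes;
    char name[32];
};

/*
 * Clear the structure through a pointer. sizeof(*stats) follows the
 * pointee type automatically, whereas sizeof(stats) would be the size
 * of the pointer and sizeof(struct some_other_type) would silently go
 * stale if the parameter type were changed.
 */
static void demo_stats_reset(struct demo_stats *stats)
{
    memset(stats, 0, sizeof(*stats));
}
```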
Michal Privoznik	b20a5e9a4d	lib: use struct zero initializer instead of memset This is a more concise approach and guarantees there is no time window where the struct is uninitialized. Generated using the following semantic patch: @@ type T; identifier X; @@ - T X; + T X = { 0 }; ... when exists ( - memset(&X, 0, sizeof(X)); \| - memset(&X, 0, sizeof(T)); ) Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Claudio Fontana <cfontana@suse.de>	2023-08-03 16:41:19 +02:00
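The effect of the semantic patch quoted in this commit can be illustrated with a before/after pair. The `demo_addr` structure and both function names are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct demo_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
};

/* Before: declaration and zeroing are two separate statements. */
static struct demo_addr demo_addr_memset(void)
{
    struct demo_addr addr;

    memset(&addr, 0, sizeof(addr));
    return addr;
}

/* After: the zero initializer clears every member at declaration. */
static struct demo_addr demo_addr_initializer(void)
{
    struct demo_addr addr = { 0 };

    return addr;
}
```

Both variants leave every member zero; the second simply removes the window between declaration and memset() in which the variable holds indeterminate values.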
Nikolai Barybin	2d6659e778	qemu: prevent SIGSEGV in qemuProcessHandleDumpCompleted If VIR_ASYNC_JOB_NONE flag is present, job.current is equal to NULL, which leads to SIGSEGV. Thus, this check should be moved up. Fixes: v8.0.0-427-gf304de0df6 Signed-off-by: Nikolai Barybin <nikolai.barybin@virtuozzo.com> Reviewed-by: Jiri Denemark <jdenemar@redhat.com>	2023-06-27 12:39:50 +02:00
Michal Privoznik	d09b73b560	qemu: Drop @unionMems argument from qemuProcessSetupPid() The @unionMems argument of qemuProcessSetupPid() function is not necessary really as all callers pass 'true'. Drop it. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-06-08 09:39:20 +02:00
Michal Privoznik	83adba541a	qemu: Allow more generous cpuset.mems for vCPUs and IOThreads The unit that cpuset CGroups controller works with is a thread/process, not individual memory allocations. Therefore, after we've set cpuset.mems for emulator (after previous commit it's set to union of all host NUMA nodes allowed for given domain), and as we try to set up cpuset.mems for vCPUs/IOThreads, memory is migrated to selected NUMA node(s). We are effectively saying: "this thread (vCPU thread) can have memory only from these NUMA node(s)". That's not really what we want though. The cpuset controller doesn't differentiate memory "belonging" to the emulator thread and vCPU thread or IOThread even. Therefore, set union of all allowed host NUMA nodes, just like we're doing for the emulator thread. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-06-08 09:39:20 +02:00
Michal Privoznik	fddbb2f12f	qemu: Don't try to 'fix up' cpuset.mems after QEMU's memory allocation In ideal world, my plan was perfect. We allow union of all host nodes in cpuset.mems and once QEMU has allocated its memory, we 'fix up' restriction of its emulator thread by writing the original value we wanted to set all along. But in fact, we can't do it because that triggers memory movement. For instance, consider the following <numatune/>: <numatune> <memory mode="strict" nodeset="0"/> <memnode cellid="1" mode="strict" nodeset="1"/> </numatune> <numa> <cell id="0" cpus="0-1" memory="1024000" unit="KiB" /> <cell id="1" cpus="2-3" memory="1048576" unit="KiB"/> </numa> This is meant to create 1:1 mapping between guest and host NUMA nodes. So we start QEMU with cpuset.mems set to "0-1" (so that it can allocate memory even for guest node #1 and have the memory come from host node #1) and then, set cpuset.mems to "0" (because that's where we wanted emulator thread to live). But this in turn triggers movement of all memory (even the allocated one) to host NUMA node #0. Therefore, we have to just keep cpuset.mems untouched and rely on .host-nodes passed on the QEMU cmd line. The placement still suffers because of cpuset.mems set for vcpus or iothreads, but that's fixed in next commit. Fixes: `3ec6d586bc` Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-06-08 09:39:20 +02:00
Peter Krempa	9d6867198d	qemuMonitorSetBlockIoThrottle: Drop 'diskalias' argument Every caller will pass 'qdevid' as it's populated in the data mandatorily with qemu-4.2 and onwards due to mandatory -blockdev use. Thus we can drop compatibility with the old way of matching the disk via alias. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2023-06-05 13:20:13 +02:00
Michal Privoznik	8b9d2bda8a	qemu: Set proper PCI backend for <interface/>-s that are actually hostdevs When starting a domain, it's done so in two steps (actually more, but lets focus on just the following two): 1) qemuProcessPrepareDomain(), followed by 2) qemuProcessPrepareHost(). Now, in the first step (PrepareDomain()), PCI backends for all hostdevs is set (qemuProcessPrepareDomain() -> qemuProcessPrepareDomainHostdevs() -> qemuDomainPrepareHostdev() -> qemuDomainPrepareHostdevPCI()). Perfect. But then, additional hostdevs may appear, because in the host prepare phase we may insert some hostdevs into domain definition (qemuProcessPrepareHost() -> qemuProcessNetworkPrepareDevices()). Now, these additional hostdevs don't undergo the same prepare as hostdevs that were already present in the domain definition (i.e. in qemuProcessPrepareDomain() phase). Therefore, we have to call corresponding prepare function explicitly. NB, the interface hotplug code (qemuDomainAttachNetDevice()) does not suffer from this problem, because it calls top level qemuDomainAttachHostDevice() which is used to hotplug regular hostdevs too and as such calls qemuDomainPrepareHostdev(). Fixes: `3b87709c76` Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2209853 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-06-05 12:18:53 +02:00
Michal Privoznik	e53291514c	qemu_hotplug: Temporarily allow emulator thread to access other NUMA nodes during mem hotplug Again, this fixes the same problem as one of previous commits, but this time for memory hotplug. Long story short, if there's a domain running and the emulator thread is restricted to a subset of host NUMA nodes, but the memory that's about to be hotplugged requires memory from a host NUMA node that's not in the set we need to allow emulator thread to access the node, temporarily. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-05-23 17:21:16 +02:00
Michal Privoznik	3ec6d586bc	qemu: Start emulator thread with more generous cpuset.mems Consider a domain with two guest NUMA nodes and the following <numatune/> setting : <numatune> <memory mode="strict" nodeset="0"/> <memnode cellid="0" mode="strict" nodeset="1"/> </numatune> What this means is the emulator thread is pinned onto host NUMA node #0 (by setting corresponding cpuset.mems to "0"), and two memory-backend-* objects are created: -object '{"qom-type":"memory-backend-ram","id":"ram-node0", .., "host-nodes":[1],"policy":"bind"}' \ -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ -object '{"qom-type":"memory-backend-ram","id":"ram-node1", .., "host-nodes":[0],"policy":"bind"}' \ -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \ Note, the emulator thread is pinned well before QEMU is even exec()-ed. Now, the way memory allocation works in QEMU is: the emulator thread calls mmap() followed by mbind() (which is sane, that's how everybody should do it). BUT, because the thread is already restricted by CGroups to just NUMA node #0, calling: mbind(host-nodes:[1]); /* made up syntax (TM) */ fails. This is expected though. Kernel was instructed to place the memory at NUMA node "0" and yet, process is trying to place it elsewhere. We used to solve this by not restricting emulator thread at all initially, and only after it's done initializing (i.e. we got the QMP greeting) we placed it onto desired nodes. But this had its own problems (e.g. QEMU might have locked pieces of its memory which were then unable to migrate onto different NUMA nodes). Therefore, in v5.1.0-rc1~282 we've changed this and set cgroups upfront (even before exec()-ing QEMU). And this used to work, but something has changed (I can't really put my finger on it). Therefore, for the initialization start the thread with union of all configured host NUMA nodes ("0-1" in our example) and fix the placement only after QEMU is started. 
NB, the memory hotplug suffers the same problem, but that will be fixed in the next commit. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-05-23 17:21:16 +02:00
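The "union of all configured host NUMA nodes" idea can be sketched with plain bitmasks. This is only an illustration under the assumption that each nodeset fits in a 64-bit mask (bit N set means host node N is allowed); libvirt itself uses its own bitmap type, and `demo_nodeset_union()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the union of several host NUMA nodesets, each encoded as a
 * 64-bit mask where bit N represents host node N.
 */
static uint64_t demo_nodeset_union(const uint64_t *nodesets, size_t n)
{
    uint64_t all = 0;

    for (size_t i = 0; i < n; i++)
        all |= nodesets[i];

    return all;
}
```

For the example in the commit message, the emulator nodeset "0" and the memnode nodeset "1" combine into the mask for nodes "0-1", which is what cpuset.mems is initially set to.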
Michal Privoznik	c4a7f8007c	qemuProcessSetupPid: Use @numatune variable more Inside of qemuProcessSetupPid() there's @numatune variable which is set to vm->def->numa, but it lives only in one block. In the rest of places the expanded form (vm->def->numa) is used instead. Move the variable declaration at the beginning of the function and use it instead of the expanded form. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-05-23 17:21:16 +02:00
Michal Privoznik	37e41b7f16	qemu: Drop @forceVFIO argument of qemuDomainGetMemLockLimitBytes() After previous cleanup, there's not a single caller that would call qemuDomainGetMemLockLimitBytes() with @forceVFIO set. All callers pass false. Drop the unneeded argument from the function. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-05-16 14:43:43 +02:00
Andrea Bolognani	934113d376	qemu: Find helpers at runtime Use the recently introduced virFindFileInPathFull() function to discover the path for qemu-bridge-helper and qemu-pr-helper at runtime. Note that it's still possible for the administrator to prevent this lookup and use arbitrary binaries by setting the appropriate keys in qemu.conf: this simply removes the need to perform the lookup at build time, and thus to have the helpers installed in the build environment. Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-05-10 18:54:09 +02:00
Michal Privoznik	4644aba0b0	qemu: Stop virQEMUCaps propagation into qemuHostdevPreparePCIDevices() After previous cleanups, qemuHostdevPreparePCIDevices() no longer needs virQEMUCaps. Drop its passing from callers. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-04-25 12:36:31 +02:00
Michal Privoznik	430fc2ec26	qemu: Remove empty functions After previous cleanup, there are some functions that do nothing: qemuConnectDomainXMLToNativePrepareHostHostdev() qemuConnectDomainXMLToNativePrepareHost() qemuProcessPrepareHostHostdev() qemuProcessPrepareHostHostdevs() Remove them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-04-25 12:36:31 +02:00
Michal Privoznik	fea0d8c40d	qemu: Move <hostdev> SCSI path generation into qemuDomainPrepareHostdev() When preparing a SCSI <hostdev/> with passthrough of a host SCSI adapter (i.e. no protocol), a virStorageSource structure is initialized and stored inside virDomainHostdevDef. But the source structure is filled in many places, with almost the same code. Firstly, qemuProcessPrepareHostHostdev() and qemuConnectDomainXMLToNativePrepareHostHostdev() are the same. Secondly, qemuDomainPrepareHostdev() allocates the src structure, only to let qemuProcessPrepareHostHostdev() fill src->path later. Well, src->path can be filled at the same place where the src structure is allocated (qemuDomainPrepareHostdev()) which renders the other two functions needless. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Martin Kletzander <mkletzan@redhat.com>	2023-04-25 12:36:30 +02:00
Peter Krempa	b60efa9a39	qemuProcessRefreshDisks: Extract update of a single disk Extract the logic to update one single disk (without emitting any events) so that it can be reused when updating the state after a disk hotplug. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2023-04-24 12:57:56 +02:00
Peter Krempa	c8e7ed7f7b	qemuProcessRefreshDisks: Properly compare tray status The code compares the 'tray_open' boolean from 'struct qemuDomainDiskInfo' directly against 'disk->tray_status' which is declared as virDomainDiskTray (enum). Now the logic works correctly because the _OPEN enum has value '1'. Separate the event emission code from the update code and remember the old tray state in a separate variable rather than having the sneaky logic we have today. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2023-04-24 12:57:56 +02:00
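A rough sketch of the cleaner comparison this commit describes: the boolean reported by the monitor is first converted into the enum, the previous state is remembered in a separate variable, and the caller decides whether to emit an event. All names here (`demo_tray`, `demo_disk`, `demo_disk_update_tray()`) are hypothetical simplifications, not the libvirt definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tray state enum; OPEN happening to equal 1 is what made
 * the old direct bool-vs-enum comparison work by accident. */
enum demo_tray {
    DEMO_TRAY_CLOSED = 0,
    DEMO_TRAY_OPEN,
};

struct demo_disk {
    enum demo_tray tray_status;
};

/*
 * Update the stored tray state from the boolean reported by the
 * hypervisor. Returns true when the state changed, so that event
 * emission can live in the caller, separate from the update.
 */
static bool demo_disk_update_tray(struct demo_disk *disk, bool tray_open)
{
    enum demo_tray old_status = disk->tray_status;

    disk->tray_status = tray_open ? DEMO_TRAY_OPEN : DEMO_TRAY_CLOSED;
    return old_status != disk->tray_status;
}
```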
Martin Kletzander	383caddea1	qemu, ch: Move threads to cgroup dir before changing parameters With cgroupv2 this has better effect on the resource allocation. An excerpt from Documentation/admin-guide/cgroup-v2.rst explains is this way: Migrating a process across cgroups is a relatively expensive operation and stateful resources such as memory are not moved together with the process. This is an explicit design decision as there often exist inherent trade-offs between migration and various hot paths in terms of synchronization cost. [...] Setting a non-empty value to "cpuset.mems" causes memory of tasks within the cgroup to be migrated to the designated nodes if they are currently using memory outside of the designated nodes. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>	2023-04-20 12:39:49 +02:00
Jiri Denemark	49f2835ee3	qemu/qemu_process: Update format strings in translated messages Signed-off-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2023-04-01 11:40:34 +02:00
Michal Privoznik	b4ccb0dc41	qemu: Move cpuset preference evaluation into a separate function The set of if()-s that determines the preference in cpumask used for setting things like emulatorpin, vcpupin, etc. is going to be re-used. Separate it out into a function. You may think that this changes behaviour, but qemuProcessPrepareDomainNUMAPlacement() ensures that priv->autoCpuset is set for VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Kristina Hanicova <khanicov@redhat.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com>	2023-03-15 12:46:40 +01:00


1723 Commits