libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2025-01-09 14:35:25 +00:00

Author	SHA1	Message	Date
Peter Krempa	8715120e4d	qemu: cgroup: Don't use priv->ncpupids to iterate domain vCPUs Use the proper data structures for the iteration since ncpupids will be made private later.	2015-12-09 14:57:12 +01:00
Peter Krempa	e6b36736a8	qemu: Add helper to retrieve vCPU pid Instead of directly accessing the array add a helper to do this.	2015-12-09 14:57:12 +01:00
Peter Krempa	220a2d51de	qemu: Replace checking for vcpu<->pid mapping availability with a helper Add qemuDomainHasVCpuPids to do the checking and replace in place checks with it. We no longer need checking whether the thread contains fake data (vcpupids[0] == vm->pid) as in `b07f3d821d` and `65686e5a81` this was removed.	2015-12-09 14:57:12 +01:00
Peter Krempa	6ba02c21ac	qemu: cgroup: Remove now unreachable check Since commit `0c04906fa` the check for priv->cgroup doesn't make sense as the calls to virCgroupHasController return the same information. Remove it and move its comment partially to the new check. The already spurious check was also later copied to the iothreads code.	2015-12-09 14:57:12 +01:00
Ján Tomko	1c00dcd665	qemu: add passed-through input devs to cgroup ACL https://bugzilla.redhat.com/show_bug.cgi?id=1231114	2015-11-30 12:59:10 +01:00
Ján Tomko	eebe58adeb	qemuSetupChrSourceCgroup: rename dev to source We do not have a pointer to the device here, just its source.	2015-11-23 13:52:18 +01:00
Ján Tomko	b8286f0666	Simplify qemuSetupChrSourceCgroup and its callers The domain definition is not needed in any of these functions. Only pass it to qemuSetupChardevCgroup, which is used as a callback for virDomainChrDefForeach. Use the right type for passing virDomainObjPtr instead of void* where possible.	2015-11-23 13:52:18 +01:00
Ján Tomko	b57ce788a7	rename qemuSetupHostdevCGroup to qemuSetupHostdevCgroup Change CGroup to Cgroup to match other functions in the file.	2015-11-23 13:52:18 +01:00
John Ferlan	10604cb8c5	qemu: Check for niothreads == 0 in qemuSetupCgroupForIOThreads If there are no IOThreads defined, no sense making other checks	2015-10-16 06:49:19 -04:00
Jiri Denemark	cda2afac79	qemuDomainEventQueue: Check if event is non-NULL Every single call to qemuDomainEventQueue() uses the following pattern: if (event) qemuDomainEventQueue(driver, event); Let's move the check for valid event to qemuDomainEventQueue and simplify all callers. Signed-off-by: Jiri Denemark <jdenemar@redhat.com>	2015-09-18 13:50:03 +02:00
Martin Kletzander	7b5acf9461	qemu: Sync BlkioDevice values when setting them in cgroups The problem here is that there are some values that kernel accepts, but does not set them, for example 18446744073709551615 which acts the same way as zero. Let's do the same thing we do with other tuning options and re-read them right after they are set in order to keep our internal structures up-to-date. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1165580 Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2015-08-18 16:27:43 -07:00
Luyao Huang	1439eb32af	qemu: fix some api cannot work when disable cpuset in conf If cpuset is disabled or not available, libvirt must not use it. This matters mainly for actions that do not need it and can use sched_setaffinity() or numa_membind() instead, because otherwise they fail without good reason. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1244664 Signed-off-by: Luyao Huang <lhuang@redhat.com>	2015-08-03 13:08:00 +02:00
Peter Krempa	88f6c007c3	cgroup: Drop resource partition from virSystemdMakeScopeName The scope name, even according to our docs is "machine-$DRIVER\x2d$VMNAME.scope" virSystemdMakeScopeName would use the resource partition name instead of "machine-" if it was specified thus creating invalid scope paths. This makes libvirt drop cgroups for a VM that uses custom resource partition upon reconnecting since the detected scope name would not match the expected name generated by virSystemdMakeScopeName. The error is exposed by the following log entry: debug : virCgroupValidateMachineGroup:302 : Name 'machine-qemu\x2dtestvm.scope' for controller 'cpu' does not match 'testvm', 'testvm.libvirt-qemu' or 'machine-test-qemu\x2dtestvm.scope' for a "/machine/test" resource and "testvm" vm. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1238570	2015-07-22 07:12:56 +02:00
Peter Krempa	0b416434f8	qemu: 'privileged' flag is not really configuration The privileged flag will not change while the configuration might change. Make the 'privileged' flag member of the driver again and mark it immutable. Should that ever change add an accessor that will group reads of the state.	2015-06-18 15:13:45 +02:00
Peter Krempa	ee3da892f2	conf: Refactor emulatorpin handling Store the emulator pinning cpu mask as a pure virBitmap rather than the virDomainPinDef since it stores only the bitmap and refactor qemuDomainPinEmulator to do the same operations in a much saner way. As a side effect virDomainEmulatorPinAdd and virDomainEmulatorPinDel can be removed since they don't add any value.	2015-06-03 09:42:07 +02:00
Michal Privoznik	bcd9a564b6	virDomainNumatuneGetMode: Report if numatune was defined So far, we are not reporting if numatune was even defined. The value of zero is blindly returned (which maps onto VIR_DOMAIN_NUMATUNE_MEM_STRICT). Unfortunately, we are making decisions based on this value. Instead, we should not only return the correct value, but report to the caller if the value is valid at all. For better viewing of this patch use '-w'. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2015-05-20 14:02:25 +02:00
John Ferlan	b266486fb9	Move iothreadspin information into iothreadids Remove the iothreadspin array from cputune and replace with a cpumask to be stored in the iothreadids list. Adjust the test output because our printing goes in order of the iothreadids list now.	2015-04-27 12:36:35 -04:00
John Ferlan	8d4614a512	qemu: Use domain iothreadids to IOThread's 'thread_id' Add 'thread_id' to the virDomainIOThreadIDDef as a means to store the 'thread_id' as returned from the live qemu monitor data. Remove the iothreadpids list from _qemuDomainObjPrivate and replace with the new iothreadids 'thread_id' element. Rather than use the default numbering scheme of 1..number of iothreads defined for the domain, use the iothreadid's list for the iothread_id. Since iothreadids list keeps track of the iothread_id's, these are now used in place of the many places where a for loop would "know" that the ID was "+ 1" from the array element. The new tests ensure usage of the <iothreadid> values for an exact number of iothreads and the usage of a smaller number of <iothreadid> values than iothreads that exist (and usage of the default numbering scheme).	2015-04-27 12:36:35 -04:00
Peter Krempa	5a35b2e599	qemu: cgroup: Fix priorities when setting emulatorpin Use the custom emulator pin setting with the highest priority same as with vcpupin.	2015-04-24 09:59:38 +02:00
John Ferlan	0456eda317	cgroup: Use virCgroupNewThread Replace the virCgroupNew{Vcpu\|Emulator\|IOThread} calls with the common virCgroupNewThread API Signed-off-by: John Ferlan <jferlan@redhat.com>	2015-04-09 19:27:08 -04:00
Luyao Huang	7cd0cf05f7	fix memleak in qemuRestoreCgroupState 131,088 bytes in 16 blocks are definitely lost in loss record 2,174 of 2,176 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x52A026F: virReallocN (viralloc.c:245) by 0x52BFCB5: saferead_lim (virfile.c:1268) by 0x52C00EF: virFileReadLimFD (virfile.c:1328) by 0x52C019A: virFileReadAll (virfile.c:1351) by 0x52A5D4F: virCgroupGetValueStr (vircgroup.c:763) by 0x1DDA0DA3: qemuRestoreCgroupState (qemu_cgroup.c:805) by 0x1DDA0DA3: qemuConnectCgroup (qemu_cgroup.c:857) by 0x1DDB7BA1: qemuProcessReconnect (qemu_process.c:3694) by 0x52FD171: virThreadHelper (virthread.c:206) by 0x82B8DF4: start_thread (pthread_create.c:308) by 0x85C31AC: clone (clone.S:113) Signed-off-by: Luyao Huang <lhuang@redhat.com>	2015-04-08 11:56:30 +02:00
Michal Privoznik	225aa80246	virQEMUDriverGetConfig: Fix memleak ==19015== 968 (416 direct, 552 indirect) bytes in 1 blocks are definitely lost in loss record 999 of 1,049 ==19015== at 0x4C2C070: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==19015== by 0x52ADF14: virAllocVar (viralloc.c:560) ==19015== by 0x5302FD1: virObjectNew (virobject.c:193) ==19015== by 0x1DD9401E: virQEMUDriverConfigNew (qemu_conf.c:164) ==19015== by 0x1DDDF65D: qemuStateInitialize (qemu_driver.c:666) ==19015== by 0x53E0823: virStateInitialize (libvirt.c:777) ==19015== by 0x11E067: daemonRunStateInit (libvirtd.c:905) ==19015== by 0x53201AD: virThreadHelper (virthread.c:206) ==19015== by 0xA1EE1F2: start_thread (in /lib64/libpthread-2.19.so) ==19015== by 0xA4EFC8C: clone (in /lib64/libc-2.19.so) Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2015-04-07 18:52:27 +02:00
Michal Privoznik	9dbe6f3151	qemuSetupCgroupForVcpu: Fix memleak ==19015== 1,064 (656 direct, 408 indirect) bytes in 2 blocks are definitely lost in loss record 1,002 of 1,049 ==19015== at 0x4C2C070: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==19015== by 0x52AD74B: virAlloc (viralloc.c:144) ==19015== by 0x52B47CA: virCgroupNew (vircgroup.c:1057) ==19015== by 0x52B53E5: virCgroupNewVcpu (vircgroup.c:1451) ==19015== by 0x1DD85A40: qemuSetupCgroupForVcpu (qemu_cgroup.c:1013) ==19015== by 0x1DDA66EA: qemuProcessStart (qemu_process.c:4844) ==19015== by 0x1DDF1807: qemuDomainObjStart (qemu_driver.c:7265) ==19015== by 0x1DDF1A66: qemuDomainCreateWithFlags (qemu_driver.c:7320) ==19015== by 0x1DDF1ACD: qemuDomainCreate (qemu_driver.c:7337) ==19015== by 0x53F87EA: virDomainCreate (libvirt-domain.c:6820) ==19015== by 0x12690A: remoteDispatchDomainCreate (remote_dispatch.h:3481) ==19015== by 0x126827: remoteDispatchDomainCreateHelper (remote_dispatch.h:3457) Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2015-04-07 18:52:26 +02:00
Peter Krempa	6afb0d04fe	qemu: cgroup: Kill qemuSetupCgroupVcpuPin() The function doesn't make sense. There's a simpler way to achieve the same.	2015-04-02 10:12:08 +02:00
Peter Krempa	8a81264b18	qemu: cgroup: Kill qemuSetupCgroupIOThreadsPin() The function doesn't make sense. There's a simpler way to achieve the same.	2015-04-02 10:12:08 +02:00
Peter Krempa	55072593d8	qemu: cgroup: Rename qemuSetupCgroupEmulatorPin to qemuSetupCgroupCpusetCpus The function is used to set cpuset.cpus in various other helpers.	2015-04-02 10:12:08 +02:00
Peter Krempa	98f08aba8e	qemu: cgroup: Use priv->autoCpuset instead of using qemuPrepareCpumap() Two places would call to qemuPrepareCpumap() with priv->autoNodeset to convert it to a cpuset. Remove the function and use the prepared cpuset automatically.	2015-04-02 10:12:08 +02:00
Peter Krempa	f0fa9080d4	qemu: cgroup: Properly set up vcpu pinning When the default cpuset or automatic numa placement is used libvirt would place the whole parent cgroup in the specified cpuset. This then disallowed to re-pin the vcpus to a different cpu. This patch pins only the vcpu threads to the default cpuset and thus allows to re-pin them later. The following config would fail to start: <domain type='kvm'> ... <vcpu placement='static' cpuset='0-1' current='2'>4</vcpu> <cputune> <vcpupin vcpu='0' cpuset='2-3'/> ... This is a regression since `a39f69d2b`.	2015-04-02 10:12:08 +02:00
Peter Krempa	7095006921	qemu: cgroup: Refactor setup for IOThread cgroups Use the default or auto cpuset if they are provided for IOThreads.	2015-04-02 10:12:08 +02:00
Peter Krempa	c9f9fa25d3	qemu: cgroup: Store auto cpuset instead of re-creating it on demand The automatic cpuset can be stored along with automatic nodeset and it does not have to be recreated when used.	2015-04-02 10:12:08 +02:00
Martin Kletzander	3a0e5b0c20	qemu: Migrate memory on numatune change We've never set the cpuset.memory_migrate value to anything, keeping it on default. However, we allow changing cpuset.mems on live domain. That setting, however, doesn't have any consequence on a domain unless it's going to allocate new memory. I managed to make 'virsh numatune' move all the memory to any node I wanted even without disabling libnuma's numa_set_membind(), so this should be safe to use with it as well. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1198497 Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2015-03-20 13:40:02 +01:00
John Ferlan	a9f528ab29	Convert virDomainPinDefPtr->vcpuid to virDomainPinDefPtr->id Since we're not specifically a vcpu related structure anymore...	2015-03-16 11:54:57 -04:00
John Ferlan	59ba70237a	Convert virDomainVcpuPinDefPtr to virDomainPinDefPtr As pointed out by jtomko in his review of the IOThreads pinning code: http://www.redhat.com/archives/libvir-list/2015-March/msg00495.html there are some comments sprinkled in indicating IOThreads were using the same structure as the VcpuPin code... This is the first patch of a few that will change the virDomainVcpuPin* structures and code to just virDomainPin* - starting with the data structure naming...	2015-03-16 11:54:56 -04:00
Pavel Hrdina	cf521fc8ba	memtune: change the way how we store unlimited value There was a mess in how we stored the unlimited value for memory limits and how we handled values provided by the user. Internally there were two possible ways to store the unlimited value: as 0 or as VIR_DOMAIN_MEMORY_PARAM_UNLIMITED. Because we chose to store memory limits as unsigned long long, we cannot use -1 to represent unlimited. It's much easier for us to say that everything greater than VIR_DOMAIN_MEMORY_PARAM_UNLIMITED means unlimited and leave 0 as a valid value, even though it makes no sense to set a limit to 0. Remove the unnecessary function virCompareLimitUlong. The update of the test is to prevent 0 from being misused as unlimited in the future. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146539 Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2015-03-06 11:52:24 +01:00
Peter Krempa	6bc80fa86d	conf: numa: Rename virDomainNumatune to virDomainNuma The structure will gradually become the only place for NUMA related config, thus rename it appropriately.	2015-02-20 17:43:04 +01:00
Pavel Hrdina	77a9dc0b8d	qemu_cgroup: initialize mem_mask to NULL If 'virNumaGetHostNodeset()' fails then the error path will try to free uninitialized pointer mem_mask. Introduced by commit `af2a1f058`. Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2015-02-17 14:22:50 +01:00
Daniel P. Berrange	f7afeddce9	qemu: report TAP device indexes to systemd Record the index of each TAP device created and report them to systemd, so they show up in machinectl status for the VM.	2015-01-27 13:57:02 +00:00
Daniel P. Berrange	318df5a05f	Add support for systemd-machined CreateMachineWithNetwork systemd-machined introduced a new method CreateMachineWithNetwork that obsoletes CreateMachine. It expects to be given a list of VETH/TAP device indexes for the host side device(s) associated with a container/machine. This falls back to the old CreateMachine method when the new one is not supported.	2015-01-15 11:07:07 +00:00
Martin Kletzander	86759ec61a	qemu: Add missing goto error in qemuRestoreCgroupState Commit `af2a1f05` tried clearly separating each condition in qemuRestoreCgroupState() for the sake of readability, however somehow one condition body was missing. That means that the body of the next condition got executed only if both of them were true, which is impossible, thus resulting in dead code and a logic error. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-12-16 20:44:33 +01:00
Martin Kletzander	af2a1f0587	qemu: Leave cpuset.mems in parent cgroup alone Instead of setting the value of cpuset.mems once when the domain starts and then re-calculating the value every time we need to change the child cgroup values, leave the cgroup alone and rather set the child data every time there is new cgroup created. We don't leave any task in the parent group anyway. This will ease both current and future code. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-12-16 11:15:27 +01:00
Martin Kletzander	c74d58ad47	qemu: Save numad advice into qemuDomainObjPrivate Thanks to that we don't need to drag the pointer everywhere and future code will get cleaner. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-12-16 11:15:27 +01:00
Martin Kletzander	f801a81208	qemu: Remove unnecessary qemuSetupCgroupPostInit function Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-12-16 11:15:27 +01:00
Martin Kletzander	5cca4cd16f	Remove unnecessary curly brackets in src/qemu/ Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-11-14 17:13:01 +01:00
Wang Rui	c6e9024867	qemu: fix domain startup failing with 'strict' mode in numatune If the memory mode is specified as 'strict' and with one node, we get the following error when starting domain. error: Unable to write to '$cgroup_path/cpuset.mems': Device or resource busy XML is configured with numatune as follows: <numatune> <memory mode='strict' nodeset='0'/> </numatune> It's broken by Commit `411cea638f` which moved qemuSetupCgroupForEmulator() before setting cpuset.mems in qemuSetupCgroupPostInit. Directory '$cgroup_path/emulator/' is created in qemuSetupCgroupForEmulator. But '$cgroup_path/emulator/cpuset.mems' is not set and has a default value (all nodes, such as 0-1). Then we setup '$cgroup_path/cpuset.mems' to the nodemask (in this case it's '0') in qemuSetupCgroupPostInit. It must fail. This patch makes sure '$cgroup_path/emulator/cpuset.mems' is set before '$cgroup_path/cpuset.mems'. The action is similar to that in qemuDomainSetNumaParamsLive. Signed-off-by: Wang Rui <moon.wangrui@huawei.com>	2014-11-11 12:14:09 +01:00
Wang Rui	38a0f6df64	qemu: don't setup cpuset.mems if memory mode in numatune is not 'strict' If the memory mode in numatune is specified as 'preferred' with one node (such as nodeset='0'), the domain's memory is not guaranteed to be entirely on node 0. Assuming that node 0 doesn't have enough memory, memory can be allocated on node 1 when the qemu process starts up. Then if we set cpuset.mems to '0', it may invoke OOM. Commit `1a7be8c600` changed the former logic of checking memory mode in virDomainNumatuneGetNodeset. This patch adds the check as before. Signed-off-by: Wang Rui <moon.wangrui@huawei.com>	2014-11-11 12:14:09 +01:00
Martin Kletzander	9661ac2f46	qemu: unref cfg after TerminateMachine has been called Commit `4882618ed1` added the code that requests driver cfg, but forgot to unref it. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-10-21 13:54:09 +02:00
Guido Günther	4882618ed1	qemu: use systemd's TerminateMachine to kill all processes If we don't properly clean up all processes in the machine-<vmname>.scope systemd won't remove the cgroup and subsequent vm starts fail with 'CreateMachine: File exists' Additional processes can e.g. be added via echo $PID > /sys/fs/cgroup/systemd/machine.slice/machine-${VMNAME}.scope/tasks but there are other cases like http://bugs.debian.org/761521 Invoke TerminateMachine to be on the safe side since systemd tracks the cgroup anyway. This is a noop if all processes have terminated already.	2014-10-01 20:17:46 +02:00
Ján Tomko	e26bbf49cc	Fix cpu_shares change event crash on domain startup Introduced by commit `0dce260`. qemuDomainEventQueue was called with qemuDomainObjPrivatePtr instead of virQEMUDriverPtr. https://bugzilla.redhat.com/show_bug.cgi?id=1147494	2014-09-29 13:58:43 +02:00
Daniel P. Berrange	0778c0be8d	Rename tunable event constants For the new VIR_DOMAIN_EVENT_ID_TUNABLE event we have a bunch of constants added VIR_DOMAIN_EVENT_CPUTUNE_<blah> VIR_DOMAIN_EVENT_BLKDEVIOTUNE_<blah> This naming convention is bad for two reasons: - There is no common prefix unique for the events to both relate them, and distinguish them from other event constants - The values associated with the constants were chosen to match the names used with virConnectGetAllDomainStats so having EVENT in the constant name is not applicable in that respect This patch proposes renaming the constants to VIR_DOMAIN_TUNABLE_CPU_<blah> VIR_DOMAIN_TUNABLE_BLKDEV_<blah> i.e., giving them a common VIR_DOMAIN_TUNABLE prefix. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-09-26 10:58:15 +01:00
Pavel Hrdina	0dce260cc8	cputune_event: queue the event for cputune updates Now we have universal tunable event so we can use it for reporting changes to user. The cputune values will be prefixed with "cputune" to distinguish it from other tunable events. Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2014-09-23 21:58:09 +02:00
Ján Tomko	c1480871bb	Fixes for domains with no iothreads Plug a memory leak and silence a warning.	2014-09-18 14:49:01 +02:00
John Ferlan	500c91c57d	qemu_cgroup: Adjust spacing around incrementor Change "i+1" to "i + 1"	2014-09-15 21:05:46 -04:00
John Ferlan	5f6ad32c73	qemu_cgroup: Introduce cgroup functions for IOThreads In order to support cpuset setting, introduce qemuSetupCgroupIOThreadsPin and qemuSetupCgroupForIOThreads to mimic the existing Vcpu API's. These will support having an 'iothreadpin' element in the 'cpuset' in order to pin named IOThreads to specific CPU's. The IOThread pin names will follow the IOThread naming scheme starting at 1 (eg "iothread1") up through and including the def->iothreads value.	2014-09-15 13:18:56 -04:00
Peter Krempa	1c6999d340	conf: RNG: Always fill in default random source path for default backend Libvirt documents that the default entropy source for the 'random' backend of a RNG device is /dev/random. Instead of storing and propagating NULL across our code and checking it in multiple places fill the default in the post parse callback and use that in the other places.	2014-07-28 10:07:09 +02:00
Peter Krempa	bbddbefa2f	virtio-rng: allow multiple RNG devices qemu supports adding multiple RNG devices. This patch allows libvirt to support this.	2014-07-25 09:34:53 +02:00
Peter Krempa	99ff49eed1	qemu: cgroup: Don't use NULL path on default backed RNGs The "random" backend for virtio-rng can be started with no path specified, which is equivalent to /dev/random. The cgroup code didn't consider this and called a few of the functions with NULL, resulting in: $ virsh start rng-vm error: Failed to start domain rng-vm error: Path '(null)' is not accessible: Bad address Problem introduced by commit `c6320d3463`	2014-07-25 09:34:53 +02:00
John Ferlan	17bddc46f4	hostdev: Introduce virDomainHostdevSubsysSCSIiSCSI Create the structures and API's to hold and manage the iSCSI host device. This extends the 'scsi_host' definitions added in commit id '5c811dce'. A future patch will add the XML parsing, but that code requires some infrastructure to be in place first in order to handle the differences between a 'scsi_host' and an 'iSCSI host' device.	2014-07-24 07:04:44 -04:00
John Ferlan	42957661dc	hostdev: Introduce virDomainHostdevSubsysSCSIHost Split virDomainHostdevSubsysSCSI further. In preparation for having either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI to contain just a virDomainHostdevSubsysSCSIHost to describe the 'scsi_host' host device	2014-07-24 06:39:28 -04:00
John Ferlan	5805621cd9	hostdev: Introduce virDomainHostdevSubsysSCSI Create a separate typedef for the hostdev union data describing SCSI. Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
John Ferlan	1c8da0d44e	hostdev: Introduce virDomainHostdevSubsysPCI Create a separate typedef for the hostdev union data describing PCI. Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
John Ferlan	7540d07f09	hostdev: Introduce virDomainHostdevSubsysUSB Create a separate typedef for the hostdev union data describing USB. Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
Martin Kletzander	7e72ac7878	qemu: leave restricting cpuset.mems after initialization When domain is started with numatune memory mode strict and the nodeset does not include host NUMA node with DMA and DMA32 zones, KVM initialization fails. This is because cgroups restrict even kernel allocations. We are already doing numa_set_membind() which does the same thing, only it does not restrict kernel allocations. This patch leaves the userspace numa_set_membind() in place and moves the cpuset.mems setting after the point where monitor comes up, but before vcpu and emulator sub-groups are created. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:46 +02:00
Martin Kletzander	aa668fccf0	qemu: split out cpuset.mems setting Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:46 +02:00
Martin Kletzander	1a7be8c600	numatune: add support for per-node memory bindings in private APIs Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	93e82727ec	numatune: Encapsulate numatune configuration in order to unify results There were numerous places where numatune configuration (and thus domain config as well) was changed in different ways. On some places this even resulted in persistent domain definition not to be stable (it would change with daemon's restart). In order to uniformly change how numatune config is dealt with, all the internals are now accessible directly only in numatune_conf.c and outside this file accessors must be used. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	e764ec7ae3	numatune: unify numatune struct and enum names Since there was already public virDomainNumatune*, I changed the private virNumaTune to match the same, so all the uses are unified and public API is kept: s/vir$Domain$\?Numa[tT]une/virDomainNumatune/g then shrunk long lines, and mainly functions, that were created after that: sed -i 's/virDomainNumatuneMemPlacementMode/virDomainNumatunePlacement/g' And to cope with the enum name, I had to change the constants as well: s/VIR_NUMA_TUNE_MEM_PLACEMENT_MODE/VIR_DOMAIN_NUMATUNE_PLACEMENT/g Last thing I did was at least a little shortening of already long name: s/virDomainNumatuneDef/virDomainNumatune/g Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	0c04906fa8	qemu: don't error out when cgroups don't exist When creating cgroups for vcpu and emulator threads whilst starting a domain, we explicitly skip creating those cgroups in case priv->cgroup is NULL (cgroups not supported) because SetAffinity() serves the same purpose. If the host supports only some cgroups (the ones we need are either unmounted or disabled in qemu.conf), we error out with weird message even though we could continue starting the domain. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1097028 Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-09 15:09:54 +02:00
Peter Krempa	1ba14d6df2	qemu: cgroup: Setup only the top level disk image for read-write access Only the top level gets writes, so the rest of the backing chain requires only read-only access.	2014-07-09 10:38:55 +02:00
Peter Krempa	aa53c77e1d	qemu: cgroup: Add functions to set cgroup image stuff on individual imgs Add functions that will allow to set all the required cgroup stuff on individual images taking a virStorageSourcePtr. Also convert functions designed to setup whole backing chain to take advantage of the change.	2014-07-09 10:38:55 +02:00
Peter Krempa	63834faadb	storage: Move readonly and shared flags to disk source from disk def In the future we might need to track state of individual images. Move the readonly and shared flags to the virStorageSource struct so that we can keep them in a per-image basis.	2014-07-08 14:27:19 +02:00
Ján Tomko	d4edce5f1e	Always report an error if virBitmapFormat fails It already reports an error if STRDUP fails.	2014-06-06 14:35:19 +02:00
Michal Privoznik	4dae1eddde	qemuSetupCgroupForVcpu: s/virProcessInfoSetAffinity/virProcessSetAffinity/ In the `f56c773bf` we've made the substitution but forgot to fix one comment which is still referring to the old name. This may be potentially misleading. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2014-05-22 12:30:20 +02:00
Nehal J Wani	3d5c29a17c	Fix typos in src/* Fix minor typos in source comments Signed-off-by: Eric Blake <eblake@redhat.com>	2014-04-21 16:49:08 -06:00
Daniel P. Berrange	edfe82c7f9	Replace Usb with USB throughout Since it is an abbreviation, USB should always be fully capitalized or full lower case, never Usb. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-04-08 11:10:59 +01:00
Daniel P. Berrange	21a2446d92	Replace Scsi with SCSI throughout Since it is an abbreviation, SCSI should always be fully capitalized or full lower case, never Scsi. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-04-08 11:10:31 +01:00
Ján Tomko	97814d8ab3	Show the real cpu shares value in live XML Currently, the Linux kernel treats values of '0' and '1' as the minimum of 2. Values larger than the maximum are changed to the maximum. Re-reading the shares value after setting it reflects this in the live domain XML.	2014-03-26 10:10:13 +01:00
Ján Tomko	bdffab0d5c	Treat zero cpu shares as a valid value Currently, <cputune><shares>0</shares></cputune> is treated as if it were not specified. Treat it as a valid value if it was explicitly specified and write it to the cgroups.	2014-03-26 10:10:02 +01:00
Ján Tomko	5922d05aec	Indent top-level labels by one space in src/qemu/	2014-03-25 14:58:39 +01:00
Daniel P. Berrange	2835c1e730	Add virLogSource variables to all source files Any source file which calls the logging APIs now needs to have a VIR_LOG_INIT("source.name") declaration at the start of the file. This provides a static variable of the virLogSource type. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-03-18 14:29:22 +00:00
Osier Yang	10c9ceff6d	util: Add one argument for several scsi utils To support passing the path of the test data to the utils, one more argument is added to virSCSIDeviceGetSgName, virSCSIDeviceGetDevName, and virSCSIDeviceNew, and the related code is changed accordingly. Later tests for the scsi utils will be based on this patch. Signed-off-by: Osier Yang <jyang@redhat.com>	2014-01-30 15:48:28 +08:00
Pradipta Kr. Banerjee	c6320d3463	Add hw random number generator (/dev/hwrng) to cgroup ACL Creating a qemu VM with /dev/hwrng as backend RNG device throws the following error - "Could not open '/dev/hwrng': Permission denied" This patch fixes the issue Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2014-01-27 09:48:39 -07:00
Osier Yang	2b66504ded	util: Add "shareable" field for virSCSIDevice struct Unlike the host devices of other types, SCSI host device XML supports "shareable" tag. This patch introduces it for the virSCSIDevice struct for a later patch use (to detect if the SCSI device is shareable when preparing the SCSI host device in QEMU driver).	2014-01-23 17:52:33 +08:00
Gao feng	3b431929a2	blkio: Setting throttle blkio cgroup for domain This patch introduces virCgroupSetBlkioDeviceReadIops, virCgroupSetBlkioDeviceWriteIops, virCgroupSetBlkioDeviceReadBps and virCgroupSetBlkioDeviceWriteBps, we can use these interfaces to set up throttle blkio cgroup for domain. This patch also adds the new throttle blkio cgroup elements to the test xml. Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2014-01-20 10:52:44 +08:00
Gao feng	b9ce5d388f	rename virBlkioDeviceWeightPtr to virBlkioDevicePtr The throttle blkio cgroup will reuse this struct. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2013-12-12 12:29:59 +00:00
Eric Blake	5d509e9ee2	maint: fix comma style issues: qemu Most of our code base uses space after comma but not before; fix the remaining uses before adding a syntax check. * src/qemu/qemu_cgroup.c: Consistently use commas. * src/qemu/qemu_command.c: Likewise. * src/qemu/qemu_conf.c: Likewise. * src/qemu/qemu_driver.c: Likewise. * src/qemu/qemu_monitor.c: Likewise. Signed-off-by: Eric Blake <eblake@redhat.com>	2013-11-20 09:14:55 -07:00
Cole Robinson	a924d9d083	qemu: cgroup: Fix crash if starting nographics guest We can dereference graphics[0] even if guest has no graphics device configured. I screwed this up in `a216e64872` https://bugzilla.redhat.com/show_bug.cgi?id=1014088	2013-10-01 11:22:18 -04:00
Peter Krempa	4baa8d7637	cleanup: Kill usage of access(PATH, F_OK) in favor of virFileExists() Semantics of the libvirt helper are more clear. This change also allows to clean up some pieces of code.	2013-09-16 10:37:39 +02:00
Cole Robinson	a216e64872	qemu: Set QEMU_AUDIO_DRV=none with -nographic On my machine, a guest fails to boot if it has a sound card, but no graphical device/display is configured, because pulseaudio fails to initialize since it can't access $HOME. A workaround is removing the audio device, however on ARM boards there isn't any option to do that, so -nographic always fails. Set QEMU_AUDIO_DRV=none if no <graphics> are configured. Unfortunately this has massive test suite fallout. Add a qemu.conf parameter nographics_allow_host_audio, that if enabled will pass through QEMU_AUDIO_DRV from sysconfig (similar to vnc_allow_host_audio)	2013-09-02 16:53:39 -04:00
Michal Privoznik	94a24dd3a9	qemuSetupMemoryCgroup: Handle hard_limit properly Since 16bcb3 we have a regression. The hard_limit is set unconditionally. By default the limit is zero. Hence, if user hasn't configured any, we set the zero in cgroup subsystem making the kernel kill the corresponding qemu process immediately. The proper fix is to set hard_limit iff user has configured any.	2013-08-20 15:03:17 +02:00
Michal Privoznik	16bcb3b616	qemu: Drop qemuDomainMemoryLimit This function is to guess the correct limit for maximal memory usage by qemu for given domain. This can never be guessed correctly, not to mention all the pains and sleepless nights this code has caused. Once somebody discovers algorithm to solve the Halting Problem, we can compute the limit algorithmically. But till then, this code should never see the light of the release again.	2013-08-19 11:16:58 +02:00
Daniel P. Berrange	1166eeba61	Fix crashing upgrading from older libvirts with running guests If upgrading from a libvirt that is older than 1.0.5, we can not assume that vm->def->resource is non-NULL. This bogus assumption caused libvirtd to crash Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-08-02 15:32:26 +01:00
Daniel P. Berrange	2fe2470181	Enable support for systemd-machined in cgroups creation Make the virCgroupNewMachine method try to use systemd-machined first. If that fails, then fallback to using the traditional cgroup setup code path. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-31 19:29:19 +01:00
Daniel P. Berrange	5ec5a22493	Add 'controllers' arg to virCgroupNewDetect When detecting cgroups we must honour any controllers whitelist the driver may have. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 19:55:47 +01:00
Daniel P. Berrange	a45b99ead9	Introduce a more convenient virCgroupNewDetectMachine Instead of requiring drivers to use a combination of calls to virCgroupNewDetect and virCgroupIsValidMachine, combine the two into virCgroupNewDetectMachine Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 19:47:30 +01:00
Daniel P. Berrange	02098ac260	Convert QEMU driver to use virCgroupNewMachine Convert the QEMU driver code to use the new atomic API for setup of cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 11:42:47 +01:00
Daniel P. Berrange	2049ef9942	Create + setup cgroups atomically for QEMU process Currently the QEMU driver creates the VM's cgroup prior to forking, and then uses a virCommand hook to move the child into the cgroup. This won't work with systemd whose APIs do the creation of cgroups + attachment of processes atomically. Fortunately we have a handshake taking place between the QEMU driver and the child process prior to QEMU being exec()d, which was introduced to allow setup of disk locking. By good fortune this synchronization point can be used to enable the QEMU driver to do atomic setup of cgroups removing the use of the hook script. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-23 22:46:31 +01:00
Daniel P. Berrange	87b2e6fa84	Auto-detect existing cgroup placement Use the new virCgroupNewDetect function to determine cgroup placement of existing running VMs. This will allow the legacy cgroups creation APIs to be removed entirely Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-23 22:46:31 +01:00
Daniel P. Berrange	0d7f45aea7	Convert remainder of cgroups code to report errors Convert the remaining methods in vircgroup.c to report errors instead of returning errno values. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-22 13:09:58 +01:00
Daniel P. Berrange	b64dabff27	Report full errors from virCgroupNew* Instead of returning raw errno values, report full libvirt errors in virCgroupNew* functions. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-22 13:09:58 +01:00
Peter Krempa	bac2182041	qemu: Cleanup coding style nits in qemu_cgroup.c	2013-07-18 14:58:12 +02:00
Osier Yang	a39f69d2bb	qemu: Set cpuset.cpus for domain process When either "cpuset" of <vcpu> is specified, or the "placement" of <vcpu> is "auto", only setting the cpuset.mems might cause the guest starting to fail. E.g. ("placement" of both <vcpu> and <numatune> is "auto"): 1) Related XMLs <vcpu placement='auto'>4</vcpu> <numatune> <memory mode='strict' placement='auto'/> </numatune> 2) Host NUMA topology % numactl --hardware available: 8 nodes (0-7) node 0 cpus: 0 4 8 12 16 20 24 28 node 0 size: 16374 MB node 0 free: 11899 MB node 1 cpus: 32 36 40 44 48 52 56 60 node 1 size: 16384 MB node 1 free: 15318 MB node 2 cpus: 2 6 10 14 18 22 26 30 node 2 size: 16384 MB node 2 free: 15766 MB node 3 cpus: 34 38 42 46 50 54 58 62 node 3 size: 16384 MB node 3 free: 15347 MB node 4 cpus: 3 7 11 15 19 23 27 31 node 4 size: 16384 MB node 4 free: 15041 MB node 5 cpus: 35 39 43 47 51 55 59 63 node 5 size: 16384 MB node 5 free: 15202 MB node 6 cpus: 1 5 9 13 17 21 25 29 node 6 size: 16384 MB node 6 free: 15197 MB node 7 cpus: 33 37 41 45 49 53 57 61 node 7 size: 16368 MB node 7 free: 15669 MB 4) cpuset.cpus will be set as: (from debug log) 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus' to '0-63' 5) The advisory nodeset got from querying numad (from debug log) 2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 : Nodeset returned from numad: 1 6) cpuset.mems will be set as: (from debug log) 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems' to '0-7' I.E, the domain process's memory is restricted on the first NUMA node, however, it can use all of the CPUs, which will likely cause the domain process to fail to start because of the kernel fails to allocate memory with the the memory policy as "strict". % tail -n 20 /var/log/libvirt/qemu/toy.log ... 
2013-05-09 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 : Handshake with parent is done char device redirected to /dev/pts/2 (label charserial0) kvm_init_vcpu failed: Cannot allocate memory ... Signed-off-by: Peter Krempa <pkrempa@redhat.com>	2013-07-18 14:57:57 +02:00
Daniel P. Berrange	50760e2a8a	Convert 'int i' to 'size_t i' in src/qemu files Convert the type of loop iterators named 'i', 'j', 'k', 'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or 'unsigned int', also sanitizing 'ii', 'jj', 'kk' to use the normal 'i', 'j', 'k' naming Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-10 17:55:15 +01:00
Michal Privoznik	e987a30dfa	Adapt to VIR_ALLOC and virAsprintf in src/qemu/*	2013-07-10 11:07:32 +02:00
Jiri Denemark	e0e438af00	qemu: Move memory limit computation to a reusable function	2013-07-08 12:35:27 +02:00
Laine Stump	1d829e1306	pci: rename virPCIDeviceGetVFIOGroupDev to virPCIDeviceGetIOMMUGroupDev I realized after the fact that it's probably better in the long run to give this function a name that matches the name of the link used in sysfs to hold the group (iommu_group). I'm changing it now because I'm about to add several more functions that deal with iommu groups.	2013-06-25 18:07:38 -04:00
Osier Yang	8da9516a84	qemu: Abstract code for the cpu controller setting into a helper	2013-06-05 19:25:48 +08:00
Michal Privoznik	a88fb3009f	Adapt to VIR_STRDUP and VIR_STRNDUP in src/qemu/*	2013-05-23 09:56:38 +02:00
Osier Yang	66194f71df	src/qemu: Remove the whitespace before ';'	2013-05-21 23:41:44 +08:00
Osier Yang	58f8e0cd58	qemu: Don't remove the "return 0" Commit `f60a50c795` intended to remove the warning only, but not with the "return 0" together.	2013-05-21 23:08:57 +08:00
Osier Yang	479d5991cd	qemu: Abstract code for cpuset controller setting into a helper	2013-05-20 19:57:00 +08:00
Osier Yang	9f2455d359	qemu: Abstract code for devices controller setting into a helper	2013-05-20 19:52:35 +08:00
Osier Yang	f60a50c795	qemu: Abstract code for memory controller setting into a helper	2013-05-20 19:39:54 +08:00
Osier Yang	2fd16df7b5	qemu: Abstract the code for blkio controller setting into a helper	2013-05-20 19:24:45 +08:00
Daniel P. Berrange	c2cf5f1c2a	Fix failure to detect missing cgroup partitions Change `bbe97ae968` caused the QEMU driver to ignore ENOENT errors from cgroups, in order to cope with missing /proc/cgroups. This is not good though because many other things can cause ENOENT and should not be ignored. The callers expect to see ENXIO when cgroups are not present, so adjust the code to report that errno when /proc/cgroups is missing Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-05-17 10:25:15 +01:00
Jim Fehlig	bbe97ae968	Fix starting domains when kernel has no cgroups support Found that I was unable to start existing domains after updating to a kernel with no cgroups support # zgrep CGROUP /proc/config.gz # CONFIG_CGROUPS is not set # virsh start test error: Failed to start domain test error: Unable to initialize /machine cgroup: Cannot allocate memory virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when attempting to open /proc/cgroups on such a system, but it was being dropped in virCgroupSetPartitionSuffix(). Change virCgroupSetPartitionSuffix() to propagate errors returned by its callees. Also check for ENOENT in qemuInitCgroup() when determining if cgroups support is available.	2013-05-13 09:27:46 -06:00
Han Cheng	6eb42e38e8	qemu: Allow the scsi-generic device in cgroup This adds the scsi-generic device into the device controller's whitelist, so that it's allowed to used by the qemu process. Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com> Signed-off-by: Osier Yang <jyang@redhat.com>	2013-05-13 19:08:34 +08:00
Laine Stump	52ba0f6e1c	qemu: fix stupid typos in VFIO cgroup setup/teardown I must have looked at this a couple dozen times before I noticed it had "!=" instead of "==". Not doing this setup prevented qemu from doing anything with the vfio group device.	2013-05-03 14:32:54 -04:00
Michal Privoznik	7c9a2d88cd	virutil: Move string related functions to virstring.c The source code base needs to be adapted as well. Some files include virutil.h just for the string related functions (here, the include is substituted to match the new file), some include virutil.h without any need (here, the include is removed), and some require both.	2013-05-02 16:56:55 +02:00
Laine Stump	811143c0b6	qemu: put usb cgroup setup in common function The USB-specific cgroup setup had been inserted inline in qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a common cgroup setup function called for all hostdevs, so it makes sense to put the usb-specific setup there and just rely on that function being called. The one thing I'm uncertain of here (and a reason for not pushing until after release) is that previously hostdev->missing was checked only when starting a domain (and cgroup setup for the device skipped if missing was true), but with this consolidation, it is now checked in the case of hotplug as well. I don't know if this will have any practical effect (does it make sense to hotplug a "missing" usb device?)	2013-04-29 21:52:28 -04:00
Laine Stump	6e13860cb4	qemu: add vfio devices to cgroup ACL when appropriate PCI device assignment using VFIO requires read/write access by the qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the VFIO group number that the assigned device belongs to (and can be found with the function virPCIDeviceGetVFIOGroupDev). /dev/vfio/vfio can be accessible to any guest without danger (according to vfio developers), so it is added to the static ACL. The group device must be dynamically added to the cgroup ACL for each vfio hostdev in two places: 1) for any devices in the persistent config when the domain is started (done during qemuSetupCgroup()) 2) at device attach time for any hotplug devices (done in qemuDomainAttachHostDevice) The group device must be removed from the ACL when a device is "hot-unplugged" (in qemuDomainDetachHostDevice()) Note that USB devices are already doing their own cgroup setup and teardown in the hostdev-usb specific function. I chose to make the new functions generic and call them in a common location though. We can then move the USB-specific code (which is duplicated in two locations) to this single location. I'll be posting a followup patch to do that.	2013-04-29 21:52:28 -04:00
Daniel P. Berrange	1e05073fbb	Replace more cases of /system with /machine The change in commit `aed4986322` was incomplete, missing a couple of cases of /system. This caused failure to start VMs. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-22 17:11:36 +01:00
Daniel P. Berrange	aed4986322	Change default resource partition to /machine After discussions with systemd developers it was decided that a better default policy for resource partitions is to have 3 default partitions at the top level /system - system services /machine - virtual machines / containers /user - user login session This ensures that the default policy isolates guest from user login sessions & system services, so a mis-behaving guest can't consume 100% of CPU usage if other things are contending for it. Thus we change the default partition from /system to /machine Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-22 12:10:12 +01:00
Daniel P. Berrange	767596bdb4	Remove non-functional code for setting up non-root cgroups The virCgroupNewDriver method had a 'bool privileged' param. If a false value was ever passed in, it would simply not work, since non-root users don't have any privileges to create new cgroups. Just delete this broken code entirely and make the QEMU driver skip cgroup setup in non-privileged mode Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	db44eb1b5f	Change default cgroup layout for QEMU/LXC and honour XML config Historically QEMU/LXC guests have been placed in a cgroup layout that is $LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME This is bad for a number of reasons - The cgroup hierarchy gets very deep which seriously impacts kernel performance due to cgroups scalability limitations. - It is hard to setup cgroup policies which apply across services and virtual machines, since all VMs are underneath the libvirtd service. To address this the default cgroup location is changed to be /system/$VMNAME.{lxc,qemu}.libvirt This puts virtual machines at the same level in the hierarchy as system services, allowing consistent policy to be setup across all of them. This also honours the new resource partition location from the XML configuration, for example <resource> <partition>/virtualmachines/production</partition> </resource> will result in the VM being placed at /virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt NB, with the exception of the default, /system, path which is intended to always exist, libvirt will not attempt to auto-create the partitions in the XML. It is the responsibility of the admin/app to configure the partitions. Later libvirt APIs will provide a way to do this. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	aa8604dd45	Add a new virCgroupNewPartition for setting up resource partitions A resource partition is an absolute cgroup path, ignoring the current process placement. Expose a virCgroupNewPartition API for constructing such cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	04c18d25f1	Rename virCgroupForXXX to virCgroupNewXXX Rename all the virCgroupForXXX methods to use the form virCgroupNewXXX since they are all constructors. Also make sure the output parameter is the last one in the list, and annotate all pointers as non-null. Fix up all callers, and make sure they use true/false not 0/1 for the boolean parameters Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	632f78caaf	Store a virCgroupPtr instance in qemuDomainObjPrivatePtr Instead of calling virCgroupForDomain every time we need the virCgroupPtr instance, just do it once at VM startup and cache a reference to the object in qemuDomainObjPrivatePtr until shutdown of the VM. Removing the virCgroupPtr from the QEMU driver state also means we don't have stale mount info, if someone mounts the cgroups filesystem after libvirtd has been started Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Stefan Berger	22feb0d3e7	QEMU Cgroup support for TPM passthrough Some refactoring for virDomainChrSourceDef type of devices so we can use common code. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Tested-by: Corey Bryant <coreyb@linux.vnet.ibm.com>	2013-04-12 16:55:46 -04:00
Daniel P. Berrange	dca927c82f	Rename virCgroupMounted to virCgroupHasController & make it more robust The virCgroupMounted method is badly named, since a controller can be mounted, but disabled in the current object. Rename the method to be virCgroupHasController. Also make it tolerant to a NULL virCgroupPtr and out-of-range controller index, to avoid duplication of these checks in all callers Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-08 14:49:12 +01:00
Daniel P. Berrange	56f27b3bbc	Don't create dirs in cgroup controllers we don't want to use Currently when getting an instance of virCgroupPtr we will create the path in all cgroup controllers. Only at the virt driver layer are we attempting to filter controllers. This is bad because the mere act of creating the dirs in the controllers can have a functional impact on the kernel, particularly for performance. Update the virCgroupForDriver() method to accept a bitmask of controllers to use. Only create dirs in the controllers that are requested. When creating cgroups for domains, respect the active controller list from the parent cgroup Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-05 10:41:54 +01:00
Gao feng	45e9d27ad8	NUMA: cleanup for numa related codes Intend to reduce the redundant code,use virNumaSetupMemoryPolicy to replace virLXCControllerSetupNUMAPolicy and qemuProcessInitNumaMemoryPolicy. This patch also moves the numa related codes to the file virnuma.c and virnuma.h Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2013-03-20 19:37:00 +08:00
Daniel P. Berrange	7f544a4c8f	Don't try to add non-existent devices to ACL The QEMU driver has a list of device nodes that are whitelisted for all guests. The kernel has recently started returning an error if you try to whitelist a device which does not exist. This causes a warning in libvirt logs and an audit error for any missing devices. eg 2013-02-27 16:08:26.515+0000: 29625: warning : virDomainAuditCgroup:451 : success=no virt=kvm resrc=cgroup reason=allow vm="vm031714" uuid=9d8f1de0-44f4-a0b1-7d50-e41ee6cd897b cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm031714/" class=path path=/dev/kqemu rdev=? acl=rw Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-27 22:51:24 +00:00
Daniel P. Berrange	279336c5d8	Avoid spamming logs with cgroups warnings The code for putting the emulator threads in a separate cgroup would spam the logs with warnings 2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 3 2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 4 2013-02-27 16:08:26.732+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 6 This is because it has only created child cgroups for 3 of the controllers, but was trying to move the processes from all the controllers. The fix is to only try to move threads in the controllers we actually created. Also remove the warning and make it return a hard error to avoid such lazy callers in the future. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-27 22:51:24 +00:00
Eric Blake	82d5fe5437	qemu: check backing chains even when cgroup is omitted https://bugzilla.redhat.com/show_bug.cgi?id=896685 points out a regression caused by commit `38c4a9c` - libvirt only labels the backing chain if the backing chain cache is populated, but the code to populate the cache was only conditionally performed if cgroup labeling was necessary. * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Hoist cache setup... * src/qemu/qemu_process.c (qemuProcessStart): ...earlier into caller, where it is now unconditional.	2013-02-21 12:32:56 -07:00
Daniel P. Berrange	77c3015f9c	Rename all USB device functions to have a standard name prefix Rename all the usbDeviceXXX and usbXXXDevice APIs to have a fixed virUSBDevice name prefix	2013-02-05 19:22:25 +00:00
Daniel P. Berrange	3e86e8f327	Fix leak of usbDevice struct when initializing cgroups When iterating over USB host devices to setup cgroups, the usbDevice object was leaked in both LXC and QEMU drivers Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-05 19:22:25 +00:00
Daniel P. Berrange	b090aa7d55	Introduce a virQEMUDriverConfigPtr object Currently the virQEMUDriverPtr struct contains a wide variety of data with varying access needs. Move all the static config data into a dedicated virQEMUDriverConfigPtr object. The only locking requirement is to hold the driver lock, while obtaining an instance of virQEMUDriverConfigPtr. Once a reference is held on the config object, it can be used completely lockless since it is immutable. NB, not all APIs correctly hold the driver lock while getting a reference to the config object in this patch. This is safe for now since the config is never updated on the fly. Later patches will address this fully. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-05 15:49:25 +00:00
Eric Blake	7034531814	maint: fix comment typo While OOM can have knock-on effects that trash a system, generally the first symptom is one of memory thrashing. * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.	2013-01-09 16:45:59 -07:00
Michal Privoznik	3c83df679e	qemu: Relax hard RSS limit Currently, if there's no hard memory limit defined for a domain, libvirt tries to calculate one, based on domain definition and magic equation and set it upon the domain startup. The rationale behind was, if there's a memory leak or exploit in qemu, we should prevent the host system thrashing. However, the equation was too tight, as it didn't reflect what the kernel counts into the memory used by a process. Since many hosts do have a swap, nobody has noticed anything, because if hard memory limit is reached, process can continue allocating memory on a swap. However, if there is no swap on the host, the process gets killed by OOM killer. In our case, that is the qemu process. To prevent this, we need to relax the hard RSS limit. Moreover, we should reflect more precisely the kernel way of accounting the memory for process. That is, even the kernel caches are counted within the memory used by a process (within cgroups at least). Hence the magic equation has to be changed: limit = 1.5 * (domain memory + total video memory) + (32MB for cache per each disk) + 200MB	2013-01-08 16:32:11 +01:00
Daniel P. Berrange	f24404a324	Rename virterror.c virterror_internal.h to virerror.{c,h}	2012-12-21 11:19:50 +00:00
Daniel P. Berrange	44f6ae27fe	Rename util.{c,h} to virutil.{c,h}	2012-12-21 11:19:49 +00:00
Daniel P. Berrange	ab9b7ec2f6	Rename memory.{c,h} to viralloc.{c,h}	2012-12-21 11:17:14 +00:00
Daniel P. Berrange	936d95d347	Rename logging.{c,h} to virlog.{c,h}	2012-12-21 11:17:14 +00:00
Daniel P. Berrange	f9c7020c1f	Rename cgroup.{h,c} to vircgroup.{h,c} To bring in line with new naming practice, rename the src/util/cgroup.{h,c} files to vircgroup.{h,c} Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-12-21 11:17:12 +00:00
Daniel P. Berrange	df5928ea56	Allow passing a vroot into security manager hostdev labelling When LXC labels USB devices during hotplug, it is running in host context, so it needs to pass in a vroot path to the container root. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-12-17 17:50:51 +00:00
Daniel P. Berrange	4738c2a7e7	Replace 'struct qemud_driver *' with virQEMUDriverPtr Remove the obsolete 'qemud' naming prefix and underscore based type name. Introduce virQEMUDriverPtr as the replacement, in common with LXC driver naming style	2012-11-28 18:17:25 +00:00
Daniel P. Berrange	1c04f99970	Remove spurious whitespace between function name & open brackets The libvirt coding standard is to use 'function(...args...)' instead of 'function (...args...)'. A non-trivial number of places did not follow this rule and are fixed in this patch. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-11-02 13:36:49 +00:00
Osier Yang	bb81021bfe	qemu: Keep the affinity when creating cgroup for emulator thread When the cpu placement model is "auto", it sets the affinity for domain process with the advisory nodeset from numad, however, creating cgroup for the domain process (called emulator thread in some contexts) later overrides that with pinning it to all available pCPUs. How to reproduce: * Configure the domain with "auto" placement for <vcpu>, e.g. <vcpu placement='auto'>4</vcpu> * % virsh start dom * % cat /proc/$dompid/status Though the emulator cgroup causes conflicts, we can't simply prohibit creating it, as other tunables are still useful, such as "emulator_period", which is used by API virDomainSetSchedulerParameter. So this patch doesn't prohibit creating the emulator cgroup, but inherits the nodeset from numad, and resets the affinity for domain process. * src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator to accept the passed nodeset * src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset	2012-10-24 21:46:24 +08:00
Eric Blake	67aea3fb78	blockjob: remove unused parameters after previous patch Minor cleanup made possible by previous simplifications. * src/qemu/qemu_cgroup.h (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup): Alter signature. * src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup, qemuSetupCgroup): Update all uses. * src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice) (qemuDomainDetachDiskDevice): Likewise. * src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive) (qemuDomainChangeDiskMediaLive) (qemuDomainSnapshotCreateSingleDiskActive) (qemuDomainSnapshotUndoSingleDiskActive): Likewise.	2012-10-19 17:35:11 -06:00
Eric Blake	38c4a9cc40	storage: use cache to walk backing chain We used to walk the backing file chain at least twice per disk, once to set up cgroup device whitelisting, and once to set up security labeling. Rather than walk the chain every iteration, which possibly includes calls to fork() in order to open root-squashed NFS files, we can exploit the cache of the previous patch. * src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter signature. * src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller to supply backing chain via disk, if recursion is desired. * src/security/security_dac.c (virSecurityDACSetSecurityImageLabel): Adjust caller. * src/security/security_selinux.c (virSecuritySELinuxSetSecurityImageLabel): Likewise. * src/security/virt-aa-helper.c (get_files): Likewise. * src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup): Likewise. (qemuSetupCgroup): Pre-populate chain.	2012-10-19 17:35:11 -06:00


299 Commits