libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2025-01-03 19:45:21 +00:00

Author	SHA1	Message	Date
Peter Krempa	7095006921	qemu: cgroup: Refactor setup for IOThread cgroups Use the default or auto cpuset if they are provided for IOThreads.	2015-04-02 10:12:08 +02:00
Peter Krempa	c9f9fa25d3	qemu: cgroup: Store auto cpuset instead of re-creating it on demand The automatic cpuset can be stored along with automatic nodeset and it does not have to be recreated when used.	2015-04-02 10:12:08 +02:00
Martin Kletzander	3a0e5b0c20	qemu: Migrate memory on numatune change We've never set the cpuset.memory_migrate value to anything, keeping it on default. However, we allow changing cpuset.mems on live domain. That setting, however, doesn't have any consequence on a domain unless it's going to allocate new memory. I managed to make 'virsh numatune' move all the memory to any node I wanted even without disabling libnuma's numa_set_membind(), so this should be safe to use with it as well. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1198497 Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2015-03-20 13:40:02 +01:00
John Ferlan	a9f528ab29	Convert virDomainPinDefPtr->vcpuid to virDomainPinDefPtr->id Since we're not specifically a vcpu related structure anymore...	2015-03-16 11:54:57 -04:00
John Ferlan	59ba70237a	Convert virDomainVcpuPinDefPtr to virDomainPinDefPtr As pointed out by jtomko in his review of the IOThreads pinning code: http://www.redhat.com/archives/libvir-list/2015-March/msg00495.html there are some comments sprinkled in indicating IOThreads were using the same structure as the VcpuPin code... This is the first patch of a few that will change the virDomainVcpuPin* structures and code to just virDomainPin* - starting with the data structure naming...	2015-03-16 11:54:56 -04:00
Pavel Hrdina	cf521fc8ba	memtune: change the way we store unlimited value There was a mess in the way we store unlimited value for memory limits and how we handled values provided by user. Internally there were two possible ways to store unlimited value: as 0 value or as VIR_DOMAIN_MEMORY_PARAM_UNLIMITED. Because we chose to store memory limits as unsigned long long, we cannot use -1 to represent unlimited. It's much easier for us to say that everything greater than VIR_DOMAIN_MEMORY_PARAM_UNLIMITED means unlimited and leave 0 as valid value despite that it makes no sense to set limit to 0. Remove unnecessary function virCompareLimitUlong. The update of test is to prevent the 0 from being misused as unlimited in future. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146539 Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2015-03-06 11:52:24 +01:00
Peter Krempa	6bc80fa86d	conf: numa: Rename virDomainNumatune to virDomainNuma The structure will gradually become the only place for NUMA related config, thus rename it appropriately.	2015-02-20 17:43:04 +01:00
Pavel Hrdina	77a9dc0b8d	qemu_cgroup: initialize mem_mask to NULL If 'virNumaGetHostNodeset()' fails then the error path will try to free uninitialized pointer mem_mask. Introduced by commit `af2a1f058`. Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2015-02-17 14:22:50 +01:00
Daniel P. Berrange	f7afeddce9	qemu: report TAP device indexes to systemd Record the index of each TAP device created and report them to systemd, so they show up in machinectl status for the VM.	2015-01-27 13:57:02 +00:00
Daniel P. Berrange	318df5a05f	Add support for systemd-machined CreateMachineWithNetwork systemd-machined introduced a new method CreateMachineWithNetwork that obsoletes CreateMachine. It expects to be given a list of VETH/TAP device indexes for the host side device(s) associated with a container/machine. This falls back to the old CreateMachine method when the new one is not supported.	2015-01-15 11:07:07 +00:00
Martin Kletzander	86759ec61a	qemu: Add missing goto error in qemuRestoreCgroupState Commit `af2a1f05` tried clearly separating each condition in qemuRestoreCgroupState() for the sake of readability, however somehow one condition body was missing. That means that the body of the next condition got executed only if both of them were true, which is impossible, thus resulting in dead code and a logic error. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-12-16 20:44:33 +01:00
Martin Kletzander	af2a1f0587	qemu: Leave cpuset.mems in parent cgroup alone Instead of setting the value of cpuset.mems once when the domain starts and then re-calculating the value every time we need to change the child cgroup values, leave the cgroup alone and rather set the child data every time there is new cgroup created. We don't leave any task in the parent group anyway. This will ease both current and future code. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-12-16 11:15:27 +01:00
Martin Kletzander	c74d58ad47	qemu: Save numad advice into qemuDomainObjPrivate Thanks to that we don't need to drag the pointer everywhere and future code will get cleaner. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-12-16 11:15:27 +01:00
Martin Kletzander	f801a81208	qemu: Remove unnecessary qemuSetupCgroupPostInit function Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-12-16 11:15:27 +01:00
Martin Kletzander	5cca4cd16f	Remove unnecessary curly brackets in src/qemu/ Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-11-14 17:13:01 +01:00
Wang Rui	c6e9024867	qemu: fix domain startup failing with 'strict' mode in numatune If the memory mode is specified as 'strict' and with one node, we get the following error when starting domain. error: Unable to write to '$cgroup_path/cpuset.mems': Device or resource busy XML is configured with numatune as follows: <numatune> <memory mode='strict' nodeset='0'/> </numatune> It's broken by Commit `411cea638f` which moved qemuSetupCgroupForEmulator() before setting cpuset.mems in qemuSetupCgroupPostInit. Directory '$cgroup_path/emulator/' is created in qemuSetupCgroupForEmulator. But '$cgroup_path/emulator/cpuset.mems' is not set and has a default value (all nodes, such as 0-1). Then we setup '$cgroup_path/cpuset.mems' to the nodemask (in this case it's '0') in qemuSetupCgroupPostInit. It must fail. This patch makes sure '$cgroup_path/emulator/cpuset.mems' is set before '$cgroup_path/cpuset.mems'. The action is similar to that in qemuDomainSetNumaParamsLive. Signed-off-by: Wang Rui <moon.wangrui@huawei.com>	2014-11-11 12:14:09 +01:00
Wang Rui	38a0f6df64	qemu: don't setup cpuset.mems if memory mode in numatune is not 'strict' If the memory mode in numatune is specified as 'preferred' with one node (such as nodeset='0'), domain's memory is not necessarily all in node 0. Assuming that node 0 doesn't have enough memory, memory can be allocated on node 1 when the qemu process starts up. Then if we set cpuset.mems to '0', it may invoke OOM. Commit `1a7be8c600` changed the former logic of checking memory mode in virDomainNumatuneGetNodeset. This patch adds the check as before. Signed-off-by: Wang Rui <moon.wangrui@huawei.com>	2014-11-11 12:14:09 +01:00
Martin Kletzander	9661ac2f46	qemu: unref cfg after TerminateMachine has been called Commit `4882618ed1` added the code that requests driver cfg, but forgot to unref it. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-10-21 13:54:09 +02:00
Guido Günther	4882618ed1	qemu: use systemd's TerminateMachine to kill all processes If we don't properly clean up all processes in the machine-<vmname>.scope systemd won't remove the cgroup and subsequent vm starts fail with 'CreateMachine: File exists' Additional processes can e.g. be added via echo $PID > /sys/fs/cgroup/systemd/machine.slice/machine-${VMNAME}.scope/tasks but there are other cases like http://bugs.debian.org/761521 Invoke TerminateMachine to be on the safe side since systemd tracks the cgroup anyway. This is a noop if all processes have terminated already.	2014-10-01 20:17:46 +02:00
Ján Tomko	e26bbf49cc	Fix cpu_shares change event crash on domain startup Introduced by commit `0dce260`. qemuDomainEventQueue was called with qemuDomainObjPrivatePtr instead of virQEMUDriverPtr. https://bugzilla.redhat.com/show_bug.cgi?id=1147494	2014-09-29 13:58:43 +02:00
Daniel P. Berrange	0778c0be8d	Rename tunable event constants For the new VIR_DOMAIN_EVENT_ID_TUNABLE event we have a bunch of constants added VIR_DOMAIN_EVENT_CPUTUNE_<blah> VIR_DOMAIN_EVENT_BLKDEVIOTUNE_<blah> This naming convention is bad for two reasons - There is no common prefix unique for the events to both relate them, and distinguish them from other event constants - The values associated with the constants were chosen to match the names used with virConnectGetAllDomainStats so having EVENT in the constant name is not applicable in that respect This patch proposes renaming the constants to VIR_DOMAIN_TUNABLE_CPU_<blah> VIR_DOMAIN_TUNABLE_BLKDEV_<blah> ie, given them a common VIR_DOMAIN_TUNABLE prefix. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-09-26 10:58:15 +01:00
Pavel Hrdina	0dce260cc8	cputune_event: queue the event for cputune updates Now we have universal tunable event so we can use it for reporting changes to user. The cputune values will be prefixed with "cputune" to distinguish it from other tunable events. Signed-off-by: Pavel Hrdina <phrdina@redhat.com>	2014-09-23 21:58:09 +02:00
Ján Tomko	c1480871bb	Fixes for domains with no iothreads Plug a memory leak and silence a warning.	2014-09-18 14:49:01 +02:00
John Ferlan	500c91c57d	qemu_cgroup: Adjust spacing around incrementor Change "i+1" to "i + 1"	2014-09-15 21:05:46 -04:00
John Ferlan	5f6ad32c73	qemu_cgroup: Introduce cgroup functions for IOThreads In order to support cpuset setting, introduce qemuSetupCgroupIOThreadsPin and qemuSetupCgroupForIOThreads to mimic the existing Vcpu API's. These will support having an 'iothreadpin' element in the 'cpuset' in order to pin named IOThreads to specific CPU's. The IOThread pin names will follow the IOThread naming scheme starting at 1 (eg "iothread1") up through and including the def->iothreads value.	2014-09-15 13:18:56 -04:00
Peter Krempa	1c6999d340	conf: RNG: Always fill in default random source path for default backend Libvirt documents that the default entropy source for the 'random' backend of a RNG device is /dev/random. Instead of storing and propagating NULL across our code and checking it in multiple places fill the default in the post parse callback and use that in the other places.	2014-07-28 10:07:09 +02:00
Peter Krempa	bbddbefa2f	virtio-rng: allow multiple RNG devices qemu supports adding multiple RNG devices. This patch allows libvirt to support this.	2014-07-25 09:34:53 +02:00
Peter Krempa	99ff49eed1	qemu: cgroup: Don't use NULL path on default backed RNGs The "random" backend for virtio-rng can be started with no path specified which equals to /dev/random. The cgroup code didn't consider this and called few of the functions with NULL resulting into: $ virsh start rng-vm error: Failed to start domain rng-vm error: Path '(null)' is not accessible: Bad address Problem introduced by commit `c6320d3463`	2014-07-25 09:34:53 +02:00
John Ferlan	17bddc46f4	hostdev: Introduce virDomainHostdevSubsysSCSIiSCSI Create the structures and API's to hold and manage the iSCSI host device. This extends the 'scsi_host' definitions added in commit id '5c811dce'. A future patch will add the XML parsing, but that code requires some infrastructure to be in place first in order to handle the differences between a 'scsi_host' and an 'iSCSI host' device.	2014-07-24 07:04:44 -04:00
John Ferlan	42957661dc	hostdev: Introduce virDomainHostdevSubsysSCSIHost Split virDomainHostdevSubsysSCSI further. In preparation for having either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI to contain just a virDomainHostdevSubsysSCSIHost to describe the 'scsi_host' host device	2014-07-24 06:39:28 -04:00
John Ferlan	5805621cd9	hostdev: Introduce virDomainHostdevSubsysSCSI Create a separate typedef for the hostdev union data describing SCSI Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
John Ferlan	1c8da0d44e	hostdev: Introduce virDomainHostdevSubsysPCI Create a separate typedef for the hostdev union data describing PCI. Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
John Ferlan	7540d07f09	hostdev: Introduce virDomainHostdevSubsysUSB Create a separate typedef for the hostdev union data describing USB. Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
Martin Kletzander	7e72ac7878	qemu: leave restricting cpuset.mems after initialization When domain is started with numatune memory mode strict and the nodeset does not include host NUMA node with DMA and DMA32 zones, KVM initialization fails. This is because cgroup restrict even kernel allocations. We are already doing numa_set_membind() which does the same thing, only it does not restrict kernel allocations. This patch leaves the userspace numa_set_membind() in place and moves the cpuset.mems setting after the point where monitor comes up, but before vcpu and emulator sub-groups are created. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:46 +02:00
Martin Kletzander	aa668fccf0	qemu: split out cpuset.mems setting Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:46 +02:00
Martin Kletzander	1a7be8c600	numatune: add support for per-node memory bindings in private APIs Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	93e82727ec	numatune: Encapsulate numatune configuration in order to unify results There were numerous places where numatune configuration (and thus domain config as well) was changed in different ways. On some places this even resulted in persistent domain definition not to be stable (it would change with daemon's restart). In order to uniformly change how numatune config is dealt with, all the internals are now accessible directly only in numatune_conf.c and outside this file accessors must be used. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	e764ec7ae3	numatune: unify numatune struct and enum names Since there was already public virDomainNumatune*, I changed the private virNumaTune to match the same, so all the uses are unified and public API is kept: s/vir$Domain$\?Numa[tT]une/virDomainNumatune/g then shrunk long lines, and mainly functions, that were created after that: sed -i 's/virDomainNumatuneMemPlacementMode/virDomainNumatunePlacement/g' And to cope with the enum name, I had to change the constants as well: s/VIR_NUMA_TUNE_MEM_PLACEMENT_MODE/VIR_DOMAIN_NUMATUNE_PLACEMENT/g Last thing I did was at least a little shortening of already long name: s/virDomainNumatuneDef/virDomainNumatune/g Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	0c04906fa8	qemu: don't error out when cgroups don't exist When creating cgroups for vcpu and emulator threads whilst starting a domain, we explicitly skip creating those cgroups in case priv->cgroup is NULL (cgroups not supported) because SetAffinity() serves the same purpose. If the host supports only some cgroups (the ones we need are either unmounted or disabled in qemu.conf), we error out with weird message even though we could continue starting the domain. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1097028 Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-09 15:09:54 +02:00
Peter Krempa	1ba14d6df2	qemu: cgroup: Setup only the top level disk image for read-write access Only the top level gets writes, so the rest of the backing chain requires only read-only access.	2014-07-09 10:38:55 +02:00
Peter Krempa	aa53c77e1d	qemu: cgroup: Add functions to set cgroup image stuff on individual imgs Add functions that will allow to set all the required cgroup stuff on individual images taking a virStorageSourcePtr. Also convert functions designed to setup whole backing chain to take advantage of the change.	2014-07-09 10:38:55 +02:00
Peter Krempa	63834faadb	storage: Move readonly and shared flags to disk source from disk def In the future we might need to track state of individual images. Move the readonly and shared flags to the virStorageSource struct so that we can keep them in a per-image basis.	2014-07-08 14:27:19 +02:00
Ján Tomko	d4edce5f1e	Always report an error if virBitmapFormat fails It already reports an error if STRDUP fails.	2014-06-06 14:35:19 +02:00
Michal Privoznik	4dae1eddde	qemuSetupCgroupForVcpu: s/virProcessInfoSetAffinity/virProcessSetAffinity/ In the `f56c773bf` we've made the substitution but forgot to fix one comment which is still referring to the old name. This may be potentially misleading. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2014-05-22 12:30:20 +02:00
Nehal J Wani	3d5c29a17c	Fix typos in src/* Fix minor typos in source comments Signed-off-by: Eric Blake <eblake@redhat.com>	2014-04-21 16:49:08 -06:00
Daniel P. Berrange	edfe82c7f9	Replace Usb with USB throughout Since it is an abbreviation, USB should always be fully capitalized or full lower case, never Usb. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-04-08 11:10:59 +01:00
Daniel P. Berrange	21a2446d92	Replace Scsi with SCSI throughout Since it is an abbreviation, SCSI should always be fully capitalized or full lower case, never Scsi. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-04-08 11:10:31 +01:00
Ján Tomko	97814d8ab3	Show the real cpu shares value in live XML Currently, the Linux kernel treats values of '0' and '1' as the minimum of 2. Values larger than the maximum are changed to the maximum. Re-reading the shares value after setting it reflects this in the live domain XML.	2014-03-26 10:10:13 +01:00
Ján Tomko	bdffab0d5c	Treat zero cpu shares as a valid value Currently, <cputune><shares>0</shares></cputune> is treated as if it were not specified. Treat it as a valid value if it was explicitly specified and write it to the cgroups.	2014-03-26 10:10:02 +01:00
Ján Tomko	5922d05aec	Indent top-level labels by one space in src/qemu/	2014-03-25 14:58:39 +01:00
Daniel P. Berrange	2835c1e730	Add virLogSource variables to all source files Any source file which calls the logging APIs now needs to have a VIR_LOG_INIT("source.name") declaration at the start of the file. This provides a static variable of the virLogSource type. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-03-18 14:29:22 +00:00
Osier Yang	10c9ceff6d	util: Add one argument for several scsi utils To support passing the path of the test data to the utils, one more argument is added to virSCSIDeviceGetSgName, virSCSIDeviceGetDevName, and virSCSIDeviceNew, and the related code is changed accordingly. Later tests for the scsi utils will be based on this patch. Signed-off-by: Osier Yang <jyang@redhat.com>	2014-01-30 15:48:28 +08:00
Pradipta Kr. Banerjee	c6320d3463	Add hw random number generator (/dev/hwrng) to cgroup ACL Creating a qemu VM with /dev/hwrng as backend RNG device throws the following error - "Could not open '/dev/hwrng': Permission denied" This patch fixes the issue Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2014-01-27 09:48:39 -07:00
Osier Yang	2b66504ded	util: Add "shareable" field for virSCSIDevice struct Unlike the host devices of other types, SCSI host device XML supports "shareable" tag. This patch introduces it for the virSCSIDevice struct for a later patch use (to detect if the SCSI device is shareable when preparing the SCSI host device in QEMU driver).	2014-01-23 17:52:33 +08:00
Gao feng	3b431929a2	blkio: Setting throttle blkio cgroup for domain This patch introduces virCgroupSetBlkioDeviceReadIops, virCgroupSetBlkioDeviceWriteIops, virCgroupSetBlkioDeviceReadBps and virCgroupSetBlkioDeviceWriteBps, we can use these interfaces to set up throttle blkio cgroup for domain. This patch also adds the new throttle blkio cgroup elements to the test xml. Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2014-01-20 10:52:44 +08:00
Gao feng	b9ce5d388f	rename virBlkioDeviceWeightPtr to virBlkioDevicePtr The throttle blkio cgroup will reuse this struct. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2013-12-12 12:29:59 +00:00
Eric Blake	5d509e9ee2	maint: fix comma style issues: qemu Most of our code base uses space after comma but not before; fix the remaining uses before adding a syntax check. * src/qemu/qemu_cgroup.c: Consistently use commas. * src/qemu/qemu_command.c: Likewise. * src/qemu/qemu_conf.c: Likewise. * src/qemu/qemu_driver.c: Likewise. * src/qemu/qemu_monitor.c: Likewise. Signed-off-by: Eric Blake <eblake@redhat.com>	2013-11-20 09:14:55 -07:00
Cole Robinson	a924d9d083	qemu: cgroup: Fix crash if starting nographics guest We can dereference graphics[0] even if guest has no graphics device configured. I screwed this up in `a216e64872` https://bugzilla.redhat.com/show_bug.cgi?id=1014088	2013-10-01 11:22:18 -04:00
Peter Krempa	4baa8d7637	cleanup: Kill usage of access(PATH, F_OK) in favor of virFileExists() Semantics of the libvirt helper are more clear. This change also allows to clean up some pieces of code.	2013-09-16 10:37:39 +02:00
Cole Robinson	a216e64872	qemu: Set QEMU_AUDIO_DRV=none with -nographic On my machine, a guest fails to boot if it has a sound card, but no graphical device/display is configured, because pulseaudio fails to initialize since it can't access $HOME. A workaround is removing the audio device, however on ARM boards there isn't any option to do that, so -nographic always fails. Set QEMU_AUDIO_DRV=none if no <graphics> are configured. Unfortunately this has massive test suite fallout. Add a qemu.conf parameter nographics_allow_host_audio, that if enabled will pass through QEMU_AUDIO_DRV from sysconfig (similar to vnc_allow_host_audio)	2013-09-02 16:53:39 -04:00
Michal Privoznik	94a24dd3a9	qemuSetupMemoryCgroup: Handle hard_limit properly Since 16bcb3 we have a regression. The hard_limit is set unconditionally. By default the limit is zero. Hence, if user hasn't configured any, we set the zero in cgroup subsystem making the kernel kill the corresponding qemu process immediately. The proper fix is to set hard_limit iff user has configured any.	2013-08-20 15:03:17 +02:00
Michal Privoznik	16bcb3b616	qemu: Drop qemuDomainMemoryLimit This function is to guess the correct limit for maximal memory usage by qemu for given domain. This can never be guessed correctly, not to mention all the pains and sleepless nights this code has caused. Once somebody discovers algorithm to solve the Halting Problem, we can compute the limit algorithmically. But till then, this code should never see the light of the release again.	2013-08-19 11:16:58 +02:00
Daniel P. Berrange	1166eeba61	Fix crashing upgrading from older libvirts with running guests If upgrading from a libvirt that is older than 1.0.5, we can not assume that vm->def->resource is non-NULL. This bogus assumption caused libvirtd to crash Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-08-02 15:32:26 +01:00
Daniel P. Berrange	2fe2470181	Enable support for systemd-machined in cgroups creation Make the virCgroupNewMachine method try to use systemd-machined first. If that fails, then fallback to using the traditional cgroup setup code path. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-31 19:29:19 +01:00
Daniel P. Berrange	5ec5a22493	Add 'controllers' arg to virCgroupNewDetect When detecting cgroups we must honour any controllers whitelist the driver may have. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 19:55:47 +01:00
Daniel P. Berrange	a45b99ead9	Introduce a more convenient virCgroupNewDetectMachine Instead of requiring drivers to use a combination of calls to virCgroupNewDetect and virCgroupIsValidMachine, combine the two into virCgroupNewDetectMachine Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 19:47:30 +01:00
Daniel P. Berrange	02098ac260	Convert QEMU driver to use virCgroupNewMachine Convert the QEMU driver code to use the new atomic API for setup of cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 11:42:47 +01:00
Daniel P. Berrange	2049ef9942	Create + setup cgroups atomically for QEMU process Currently the QEMU driver creates the VM's cgroup prior to forking, and then uses a virCommand hook to move the child into the cgroup. This won't work with systemd whose APIs do the creation of cgroups + attachment of processes atomically. Fortunately we have a handshake taking place between the QEMU driver and the child process prior to QEMU being exec()d, which was introduced to allow setup of disk locking. By good fortune this synchronization point can be used to enable the QEMU driver to do atomic setup of cgroups removing the use of the hook script. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-23 22:46:31 +01:00
Daniel P. Berrange	87b2e6fa84	Auto-detect existing cgroup placement Use the new virCgroupNewDetect function to determine cgroup placement of existing running VMs. This will allow the legacy cgroups creation APIs to be removed entirely Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-23 22:46:31 +01:00
Daniel P. Berrange	0d7f45aea7	Convert remainder of cgroups code to report errors Convert the remaining methods in vircgroup.c to report errors instead of returning errno values. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-22 13:09:58 +01:00
Daniel P. Berrange	b64dabff27	Report full errors from virCgroupNew* Instead of returning raw errno values, report full libvirt errors in virCgroupNew* functions. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-22 13:09:58 +01:00
Peter Krempa	bac2182041	qemu: Cleanup coding style nits in qemu_cgroup.c	2013-07-18 14:58:12 +02:00
Osier Yang	a39f69d2bb	qemu: Set cpuset.cpus for domain process When either "cpuset" of <vcpu> is specified, or the "placement" of <vcpu> is "auto", only setting the cpuset.mems might cause the guest to fail to start. E.g. ("placement" of both <vcpu> and <numatune> is "auto"): 1) Related XMLs <vcpu placement='auto'>4</vcpu> <numatune> <memory mode='strict' placement='auto'/> </numatune> 2) Host NUMA topology % numactl --hardware available: 8 nodes (0-7) node 0 cpus: 0 4 8 12 16 20 24 28 node 0 size: 16374 MB node 0 free: 11899 MB node 1 cpus: 32 36 40 44 48 52 56 60 node 1 size: 16384 MB node 1 free: 15318 MB node 2 cpus: 2 6 10 14 18 22 26 30 node 2 size: 16384 MB node 2 free: 15766 MB node 3 cpus: 34 38 42 46 50 54 58 62 node 3 size: 16384 MB node 3 free: 15347 MB node 4 cpus: 3 7 11 15 19 23 27 31 node 4 size: 16384 MB node 4 free: 15041 MB node 5 cpus: 35 39 43 47 51 55 59 63 node 5 size: 16384 MB node 5 free: 15202 MB node 6 cpus: 1 5 9 13 17 21 25 29 node 6 size: 16384 MB node 6 free: 15197 MB node 7 cpus: 33 37 41 45 49 53 57 61 node 7 size: 16368 MB node 7 free: 15669 MB 4) cpuset.cpus will be set as: (from debug log) 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus' to '0-63' 5) The advisory nodeset got from querying numad (from debug log) 2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 : Nodeset returned from numad: 1 6) cpuset.mems will be set as: (from debug log) 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems' to '0-7' I.E, the domain process's memory is restricted on the first NUMA node, however, it can use all of the CPUs, which will likely cause the domain process to fail to start because the kernel fails to allocate memory with the memory policy as "strict". % tail -n 20 /var/log/libvirt/qemu/toy.log ... 
2013-05-09 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 : Handshake with parent is done char device redirected to /dev/pts/2 (label charserial0) kvm_init_vcpu failed: Cannot allocate memory ... Signed-off-by: Peter Krempa <pkrempa@redhat.com>	2013-07-18 14:57:57 +02:00
Daniel P. Berrange	50760e2a8a	Convert 'int i' to 'size_t i' in src/qemu files Convert the type of loop iterators named 'i', 'j', 'k', 'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or 'unsigned int', also sanitizing 'ii', 'jj', 'kk' to use the normal 'i', 'j', 'k' naming Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-10 17:55:15 +01:00
Michal Privoznik	e987a30dfa	Adapt to VIR_ALLOC and virAsprintf in src/qemu/*	2013-07-10 11:07:32 +02:00
Jiri Denemark	e0e438af00	qemu: Move memory limit computation to a reusable function	2013-07-08 12:35:27 +02:00
Laine Stump	1d829e1306	pci: rename virPCIDeviceGetVFIOGroupDev to virPCIDeviceGetIOMMUGroupDev I realized after the fact that it's probably better in the long run to give this function a name that matches the name of the link used in sysfs to hold the group (iommu_group). I'm changing it now because I'm about to add several more functions that deal with iommu groups.	2013-06-25 18:07:38 -04:00
Osier Yang	8da9516a84	qemu: Abstract code for the cpu controller setting into a helper	2013-06-05 19:25:48 +08:00
Michal Privoznik	a88fb3009f	Adapt to VIR_STRDUP and VIR_STRNDUP in src/qemu/*	2013-05-23 09:56:38 +02:00
Osier Yang	66194f71df	src/qemu: Remove the whitespace before ';'	2013-05-21 23:41:44 +08:00
Osier Yang	58f8e0cd58	qemu: Don't remove the "return 0" Commit `f60a50c795` intended to remove the warning only, but not with the "return 0" together.	2013-05-21 23:08:57 +08:00
Osier Yang	479d5991cd	qemu: Abstract code for cpuset controller setting into a helper	2013-05-20 19:57:00 +08:00
Osier Yang	9f2455d359	qemu: Abstract code for devices controller setting into a helper	2013-05-20 19:52:35 +08:00
Osier Yang	f60a50c795	qemu: Abstract code for memory controller setting into a helper	2013-05-20 19:39:54 +08:00
Osier Yang	2fd16df7b5	qemu: Abstract the code for blkio controller setting into a helper	2013-05-20 19:24:45 +08:00
Daniel P. Berrange	c2cf5f1c2a	Fix failure to detect missing cgroup partitions Change `bbe97ae968` caused the QEMU driver to ignore ENOENT errors from cgroups, in order to cope with missing /proc/cgroups. This is not good though because many other things can cause ENOENT and should not be ignored. The callers expect to see ENXIO when cgroups are not present, so adjust the code to report that errno when /proc/cgroups is missing Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-05-17 10:25:15 +01:00
Jim Fehlig	bbe97ae968	Fix starting domains when kernel has no cgroups support Found that I was unable to start existing domains after updating to a kernel with no cgroups support # zgrep CGROUP /proc/config.gz # CONFIG_CGROUPS is not set # virsh start test error: Failed to start domain test error: Unable to initialize /machine cgroup: Cannot allocate memory virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when attempting to open /proc/cgroups on such a system, but it was being dropped in virCgroupSetPartitionSuffix(). Change virCgroupSetPartitionSuffix() to propagate errors returned by its callees. Also check for ENOENT in qemuInitCgroup() when determining if cgroups support is available.	2013-05-13 09:27:46 -06:00
Han Cheng	6eb42e38e8	qemu: Allow the scsi-generic device in cgroup This adds the scsi-generic device into the device controller's whitelist, so that it's allowed to be used by the qemu process. Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com> Signed-off-by: Osier Yang <jyang@redhat.com>	2013-05-13 19:08:34 +08:00
Laine Stump	52ba0f6e1c	qemu: fix stupid typos in VFIO cgroup setup/teardown I must have looked at this a couple dozen times before I noticed it had "!=" instead of "==". Not doing this setup prevented qemu from doing anything with the vfio group device.	2013-05-03 14:32:54 -04:00
Michal Privoznik	7c9a2d88cd	virutil: Move string related functions to virstring.c The source code base needs to be adapted as well. Some files include virutil.h just for the string related functions (here, the include is substituted to match the new file), some include virutil.h without any need (here, the include is removed), and some require both.	2013-05-02 16:56:55 +02:00
Laine Stump	811143c0b6	qemu: put usb cgroup setup in common function The USB-specific cgroup setup had been inserted inline in qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a common cgroup setup function called for all hostdevs, so it makes sense to put the usb-specific setup there and just rely on that function being called. The one thing I'm uncertain of here (and a reason for not pushing until after release) is that previously hostdev->missing was checked only when starting a domain (and cgroup setup for the device skipped if missing was true), but with this consolidation, it is now checked in the case of hotplug as well. I don't know if this will have any practical effect (does it make sense to hotplug a "missing" usb device?)	2013-04-29 21:52:28 -04:00
Laine Stump	6e13860cb4	qemu: add vfio devices to cgroup ACL when appropriate PCI device assignment using VFIO requires read/write access by the qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the VFIO group number that the assigned device belongs to (and can be found with the function virPCIDeviceGetVFIOGroupDev). /dev/vfio/vfio can be accessible to any guest without danger (according to vfio developers), so it is added to the static ACL. The group device must be dynamically added to the cgroup ACL for each vfio hostdev in two places: 1) for any devices in the persistent config when the domain is started (done during qemuSetupCgroup()) 2) at device attach time for any hotplug devices (done in qemuDomainAttachHostDevice) The group device must be removed from the ACL when a device is "hot-unplugged" (in qemuDomainDetachHostDevice()) Note that USB devices are already doing their own cgroup setup and teardown in the hostdev-usb specific function. I chose to make the new functions generic and call them in a common location though. We can then move the USB-specific code (which is duplicated in two locations) to this single location. I'll be posting a followup patch to do that.	2013-04-29 21:52:28 -04:00
Daniel P. Berrange	1e05073fbb	Replace more cases of /system with /machine The change in commit `aed4986322` was incomplete, missing a couple of cases of /system. This caused failure to start VMs. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-22 17:11:36 +01:00
Daniel P. Berrange	aed4986322	Change default resource partition to /machine After discussions with systemd developers it was decided that a better default policy for resource partitions is to have 3 default partitions at the top level /system - system services /machine - virtual machines / containers /user - user login session This ensures that the default policy isolates guest from user login sessions & system services, so a mis-behaving guest can't consume 100% of CPU usage if other things are contending for it. Thus we change the default partition from /system to /machine Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-22 12:10:12 +01:00
Daniel P. Berrange	767596bdb4	Remove non-functional code for setting up non-root cgroups The virCgroupNewDriver method had a 'bool privileged' param. If a false value was ever passed in, it would simply not work, since non-root users don't have any privileges to create new cgroups. Just delete this broken code entirely and make the QEMU driver skip cgroup setup in non-privileged mode Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	db44eb1b5f	Change default cgroup layout for QEMU/LXC and honour XML config Historically QEMU/LXC guests have been placed in a cgroup layout that is $LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME This is bad for a number of reasons - The cgroup hierarchy gets very deep which seriously impacts kernel performance due to cgroups scalability limitations. - It is hard to setup cgroup policies which apply across services and virtual machines, since all VMs are underneath the libvirtd service. To address this the default cgroup location is changed to be /system/$VMNAME.{lxc,qemu}.libvirt This puts virtual machines at the same level in the hierarchy as system services, allowing consistent policy to be setup across all of them. This also honours the new resource partition location from the XML configuration, for example <resource> <partition>/virtualmachines/production</partition> </resource> will result in the VM being placed at /virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt NB, with the exception of the default, /system, path which is intended to always exist, libvirt will not attempt to auto-create the partitions in the XML. It is the responsibility of the admin/app to configure the partitions. Later libvirt APIs will provide a way to do this. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	aa8604dd45	Add a new virCgroupNewPartition for setting up resource partitions A resource partition is an absolute cgroup path, ignoring the current process placement. Expose a virCgroupNewPartition API for constructing such cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	04c18d25f1	Rename virCgroupForXXX to virCgroupNewXXX Rename all the virCgroupForXXX methods to use the form virCgroupNewXXX since they are all constructors. Also make sure the output parameter is the last one in the list, and annotate all pointers as non-null. Fix up all callers, and make sure they use true/false not 0/1 for the boolean parameters Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	632f78caaf	Store a virCgroupPtr instance in qemuDomainObjPrivatePtr Instead of calling virCgroupForDomain every time we need the virCgroupPtr instance, just do it once at VM startup and cache a reference to the object in qemuDomainObjPrivatePtr until shutdown of the VM. Removing the virCgroupPtr from the QEMU driver state also means we don't have stale mount info, if someone mounts the cgroups filesystem after libvirtd has been started Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Stefan Berger	22feb0d3e7	QEMU Cgroup support for TPM passthrough Some refactoring for virDomainChrSourceDef type of devices so we can use common code. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Tested-by: Corey Bryant <coreyb@linux.vnet.ibm.com>	2013-04-12 16:55:46 -04:00

221 Commits