libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2025-01-06 04:55:22 +00:00

Author	SHA1	Message	Date
Ján Tomko	c1480871bb	Fixes for domains with no iothreads Plug a memory leak and silence a warning.	2014-09-18 14:49:01 +02:00
John Ferlan	500c91c57d	qemu_cgroup: Adjust spacing around incrementor Change "i+1" to "i + 1"	2014-09-15 21:05:46 -04:00
John Ferlan	5f6ad32c73	qemu_cgroup: Introduce cgroup functions for IOThreads In order to support cpuset setting, introduce qemuSetupCgroupIOThreadsPin and qemuSetupCgroupForIOThreads to mimic the existing Vcpu API's. These will support having an 'iotrhreadpin' element in the 'cpuset' in order to pin named IOThreads to specific CPU's. The IOThread pin names will follow the IOThread naming scheme starting at 1 (eg "iothread1") up through an including the def->iothreads value.	2014-09-15 13:18:56 -04:00
Peter Krempa	1c6999d340	conf: RNG: Always fill in default random source path for default backend Libvirt documents that the default entropy source for the 'random' backend of a RNG device is /dev/random. Instead of storing and propagating NULL across our code and checking it in multiple places fill the default in the post parse callback and use that in the other places.	2014-07-28 10:07:09 +02:00
Peter Krempa	bbddbefa2f	virtio-rng: allow multiple RNG devices qemu supports adding multiple RNG devices. This patch allows libvirt to support this.	2014-07-25 09:34:53 +02:00
Peter Krempa	99ff49eed1	qemu: cgroup: Don't use NULL path on default backed RNGs The "random" backend for virtio-rng can be started with no path specified which equals to /dev/random. The cgroup code didn't consider this and called few of the functions with NULL resulting into: $ virsh start rng-vm error: Failed to start domain rng-vm error: Path '(null)' is not accessible: Bad address Problem introduced by commit `c6320d3463`	2014-07-25 09:34:53 +02:00
John Ferlan	17bddc46f4	hostdev: Introduce virDomainHostdevSubsysSCSIiSCSI Create the structures and API's to hold and manage the iSCSI host device. This extends the 'scsi_host' definitions added in commit id '5c811dce'. A future patch will add the XML parsing, but that code requires some infrastructure to be in place first in order to handle the differences between a 'scsi_host' and an 'iSCSI host' device.	2014-07-24 07:04:44 -04:00
John Ferlan	42957661dc	hostdev: Introduce virDomainHostdevSubsysSCSIHost Split virDomainHostdevSubsysSCSI further. In preparation for having either SCSI or iSCSI data, create a union in virDomainHostdevSubsysSCSI to contain just a virDomainHostdevSubsysSCSIHost to describe the 'scsi_host' host device	2014-07-24 06:39:28 -04:00
John Ferlan	5805621cd9	hostdev: Introduce virDomainHostdevSubsysSCSI Create a separate typedef for the hostdev union data describing SCSI Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
John Ferlan	1c8da0d44e	hostdev: Introduce virDomainHostdevSubsysPCI Create a separate typedef for the hostdev union data describing PCI. Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
John Ferlan	7540d07f09	hostdev: Introduce virDomainHostdevSubsysUSB Create a separate typedef for the hostdev union data describing USB. Then adjust the code to use the new pointer	2014-07-24 06:39:27 -04:00
Martin Kletzander	7e72ac7878	qemu: leave restricting cpuset.mems after initialization When domain is started with numatune memory mode strict and the nodeset does not include host NUMA node with DMA and DMA32 zones, KVM initialization fails. This is because cgroup restrict even kernel allocations. We are already doing numa_set_membind() which does the same thing, only it does not restrict kernel allocations. This patch leaves the userspace numa_set_membind() in place and moves the cpuset.mems setting after the point where monitor comes up, but before vcpu and emulator sub-groups are created. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:46 +02:00
Martin Kletzander	aa668fccf0	qemu: split out cpuset.mems setting Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:46 +02:00
Martin Kletzander	1a7be8c600	numatune: add support for per-node memory bindings in private APIs Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	93e82727ec	numatune: Encapsulate numatune configuration in order to unify results There were numerous places where numatune configuration (and thus domain config as well) was changed in different ways. On some places this even resulted in persistent domain definition not to be stable (it would change with daemon's restart). In order to uniformly change how numatune config is dealt with, all the internals are now accessible directly only in numatune_conf.c and outside this file accessors must be used. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	e764ec7ae3	numatune: unify numatune struct and enum names Since there was already public virDomainNumatune*, I changed the private virNumaTune to match the same, so all the uses are unified and public API is kept: s/vir$Domain$\?Numa[tT]une/virDomainNumatune/g then shrunk long lines, and mainly functions, that were created after that: sed -i 's/virDomainNumatuneMemPlacementMode/virDomainNumatunePlacement/g' And to cope with the enum name, I haad to change the constants as well: s/VIR_NUMA_TUNE_MEM_PLACEMENT_MODE/VIR_DOMAIN_NUMATUNE_PLACEMENT/g Last thing I did was at least a little shortening of already long name: s/virDomainNumatuneDef/virDomainNumatune/g Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-16 20:15:45 +02:00
Martin Kletzander	0c04906fa8	qemu: don't error out when cgroups don't exist When creating cgroups for vcpu and emulator threads whilst starting a domain, we explicitly skip creating those cgroups in case priv->cgroup is NULL (cgroups not supported) because SetAffinity() serves the same purpose. If the host supports only some cgroups (the ones we need are either unmounted or disabled in qemu.conf), we error out with weird message even though we could continue starting the domain. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1097028 Signed-off-by: Martin Kletzander <mkletzan@redhat.com>	2014-07-09 15:09:54 +02:00
Peter Krempa	1ba14d6df2	qemu: cgroup: Setup only the top level disk image for read-write access Only the top level gets writes, so the rest of the backing chain requires only read-only access.	2014-07-09 10:38:55 +02:00
Peter Krempa	aa53c77e1d	qemu: cgroup: Add functions to set cgroup image stuff on individual imgs Add functions that will allow to set all the required cgroup stuff on individual images taking a virStorageSourcePtr. Also convert functions designed to setup whole backing chain to take advantage of the change.	2014-07-09 10:38:55 +02:00
Peter Krempa	63834faadb	storage: Move readonly and shared flags to disk source from disk def In the future we might need to track state of individual images. Move the readonly and shared flags to the virStorageSource struct so that we can keep them in a per-image basis.	2014-07-08 14:27:19 +02:00
Ján Tomko	d4edce5f1e	Always report an error if virBitmapFormat fails It already reports an error if STRDUP fails.	2014-06-06 14:35:19 +02:00
Michal Privoznik	4dae1eddde	qemuSetupCgroupForVcpu: s/virProcessInfoSetAffinity/virProcessSetAffinity/ In the `f56c773bf` we've made the substitution but forgot to fix one comment which is still referring to the old name. This may be potentially misleading. Signed-off-by: Michal Privoznik <mprivozn@redhat.com>	2014-05-22 12:30:20 +02:00
Nehal J Wani	3d5c29a17c	Fix typos in src/* Fix minor typos in source comments Signed-off-by: Eric Blake <eblake@redhat.com>	2014-04-21 16:49:08 -06:00
Daniel P. Berrange	edfe82c7f9	Replace Usb with USB throughout Since it is an abbreviation, USB should always be fully capitalized or full lower case, never Usb. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-04-08 11:10:59 +01:00
Daniel P. Berrange	21a2446d92	Replace Scsi with SCSI throughout Since it is an abbreviation, SCSI should always be fully capitalized or full lower case, never Scsi. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-04-08 11:10:31 +01:00
Ján Tomko	97814d8ab3	Show the real cpu shares value in live XML Currently, the Linux kernel treats values of '0' and '1' as the minimum of 2. Values larger than the maximum are changed to the maximum. Re-reading the shares value after setting it reflects this in the live domain XML.	2014-03-26 10:10:13 +01:00
Ján Tomko	bdffab0d5c	Treat zero cpu shares as a valid value Currently, <cputune><shares>0</shares></cputune> is treated as if it were not specified. Treat is as a valid value if it was explicitly specified and write it to the cgroups.	2014-03-26 10:10:02 +01:00
Ján Tomko	5922d05aec	Indent top-level labels by one space in src/qemu/	2014-03-25 14:58:39 +01:00
Daniel P. Berrange	2835c1e730	Add virLogSource variables to all source files Any source file which calls the logging APIs now needs to have a VIR_LOG_INIT("source.name") declaration at the start of the file. This provides a static variable of the virLogSource type. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2014-03-18 14:29:22 +00:00
Osier Yang	10c9ceff6d	util: Add one argument for several scsi utils To support passing the path of the test data to the utils, one more argument is added to virSCSIDeviceGetSgName, virSCSIDeviceGetDevName, and virSCSIDeviceNew, and the related code is changed accordingly. Later tests for the scsi utils will be based on this patch. Signed-off-by: Osier Yang <jyang@redhat.com>	2014-01-30 15:48:28 +08:00
Pradipta Kr. Banerjee	c6320d3463	Add hw random number generator (/dev/hwrng) to cgroup ACL Creating a qemu VM with /dev/hwrng as backend RNG device throws the following error - "Could not open '/dev/hwrng': Permission denied" This patch fixes the issue Signed-off-by: Pradipta Kr. Banerjee <bpradip@in.ibm.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2014-01-27 09:48:39 -07:00
Osier Yang	2b66504ded	util: Add "shareable" field for virSCSIDevice struct Unlike the host devices of other types, SCSI host device XML supports "shareable" tag. This patch introduces it for the virSCSIDevice struct for a later patch use (to detect if the SCSI device is shareable when preparing the SCSI host device in QEMU driver).	2014-01-23 17:52:33 +08:00
Gao feng	3b431929a2	blkio: Setting throttle blkio cgroup for domain This patch introduces virCgroupSetBlkioDeviceReadIops, virCgroupSetBlkioDeviceWriteIops, virCgroupSetBlkioDeviceReadBps and virCgroupSetBlkioDeviceWriteBps, we can use these interfaces to set up throttle blkio cgroup for domain. This patch also adds the new throttle blkio cgroup elements to the test xml. Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2014-01-20 10:52:44 +08:00
Gao feng	b9ce5d388f	rename virBlkioDeviceWeightPtr to virBlkioDevicePtr The throttle blkio cgroup will reuse this struct. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2013-12-12 12:29:59 +00:00
Eric Blake	5d509e9ee2	maint: fix comma style issues: qemu Most of our code base uses space after comma but not before; fix the remaining uses before adding a syntax check. * src/qemu/qemu_cgroup.c: Consistently use commas. * src/qemu/qemu_command.c: Likewise. * src/qemu/qemu_conf.c: Likewise. * src/qemu/qemu_driver.c: Likewise. * src/qemu/qemu_monitor.c: Likewise. Signed-off-by: Eric Blake <eblake@redhat.com>	2013-11-20 09:14:55 -07:00
Cole Robinson	a924d9d083	qemu: cgroup: Fix crash if starting nographics guest We can dereference graphics[0] even if guest has no graphics device configured. I screwed this up in `a216e64872` https://bugzilla.redhat.com/show_bug.cgi?id=1014088	2013-10-01 11:22:18 -04:00
Peter Krempa	4baa8d7637	cleanup: Kill usage of access(PATH, F_OK) in favor of virFileExists() Semantics of the libvirt helper are more clear. This change also allows to clean up some pieces of code.	2013-09-16 10:37:39 +02:00
Cole Robinson	a216e64872	qemu: Set QEMU_AUDIO_DRV=none with -nographic On my machine, a guest fails to boot if it has a sound card, but not graphical device/display is configured, because pulseaudio fails to initialize since it can't access $HOME. A workaround is removing the audio device, however on ARM boards there isn't any option to do that, so -nographic always fails. Set QEMU_AUDIO_DRV=none if no <graphics> are configured. Unfortunately this has massive test suite fallout. Add a qemu.conf parameter nographics_allow_host_audio, that if enabled will pass through QEMU_AUDIO_DRV from sysconfig (similar to vnc_allow_host_audio)	2013-09-02 16:53:39 -04:00
Michal Privoznik	94a24dd3a9	qemuSetupMemoryCgroup: Handle hard_limit properly Since 16bcb3 we have a regression. The hard_limit is set unconditionally. By default the limit is zero. Hence, if user hasn't configured any, we set the zero in cgroup subsystem making the kernel kill the corresponding qemu process immediately. The proper fix is to set hard_limit iff user has configured any.	2013-08-20 15:03:17 +02:00
Michal Privoznik	16bcb3b616	qemu: Drop qemuDomainMemoryLimit This function is to guess the correct limit for maximal memory usage by qemu for given domain. This can never be guessed correctly, not to mention all the pains and sleepless nights this code has caused. Once somebody discovers algorithm to solve the Halting Problem, we can compute the limit algorithmically. But till then, this code should never see the light of the release again.	2013-08-19 11:16:58 +02:00
Daniel P. Berrange	1166eeba61	Fix crashing upgrading from older libvirts with running guests If upgrading from a libvirt that is older than 1.0.5, we can not assume that vm->def->resource is non-NULL. This bogus assumption caused libvirtd to crash Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-08-02 15:32:26 +01:00
Daniel P. Berrange	2fe2470181	Enable support for systemd-machined in cgroups creation Make the virCgroupNewMachine method try to use systemd-machined first. If that fails, then fallback to using the traditional cgroup setup code path. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-31 19:29:19 +01:00
Daniel P. Berrange	5ec5a22493	Add 'controllers' arg to virCgroupNewDetect When detecting cgroups we must honour any controllers whitelist the driver may have. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 19:55:47 +01:00
Daniel P. Berrange	a45b99ead9	Introduce a more convenient virCgroupNewDetectMachine Instead of requiring drivers to use a combination of calls to virCgroupNewDetect and virCgroupIsValidMachine, combine the two into virCgroupNewDetectMachine Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 19:47:30 +01:00
Daniel P. Berrange	02098ac260	Convert QEMU driver to use virCgroupNewMachine Convert the QEMU driver code to use the new atomic API for setup of cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-25 11:42:47 +01:00
Daniel P. Berrange	2049ef9942	Create + setup cgroups atomically for QEMU process Currently the QEMU driver creates the VM's cgroup prior to forking, and then uses a virCommand hook to move the child into the cgroup. This won't work with systemd whose APIs do the creation of cgroups + attachment of processes atomically. Fortunately we have a handshake taking place between the QEMU driver and the child process prior to QEMU being exec()d, which was introduced to allow setup of disk locking. By good fortune this synchronization point can be used to enable the QEMU driver to do atomic setup of cgroups removing the use of the hook script. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-23 22:46:31 +01:00
Daniel P. Berrange	87b2e6fa84	Auto-detect existing cgroup placement Use the new virCgroupNewDetect function to determine cgroup placement of existing running VMs. This will allow the legacy cgroups creation APIs to be removed entirely Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-23 22:46:31 +01:00
Daniel P. Berrange	0d7f45aea7	Convert remainder of cgroups code to report errors Convert the remaining methods in vircgroup.c to report errors instead of returning errno values. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-22 13:09:58 +01:00
Daniel P. Berrange	b64dabff27	Report full errors from virCgroupNew* Instead of returning raw errno values, report full libvirt errors in virCgroupNew* functions. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-22 13:09:58 +01:00
Peter Krempa	bac2182041	qemu: Cleanup coding style nits in qemu_cgroup.c	2013-07-18 14:58:12 +02:00
Osier Yang	a39f69d2bb	qemu: Set cpuset.cpus for domain process When either "cpuset" of <vcpu> is specified, or the "placement" of <vcpu> is "auto", only setting the cpuset.mems might cause the guest starting to fail. E.g. ("placement" of both <vcpu> and <numatune> is "auto"): 1) Related XMLs <vcpu placement='auto'>4</vcpu> <numatune> <memory mode='strict' placement='auto'/> </numatune> 2) Host NUMA topology % numactl --hardware available: 8 nodes (0-7) node 0 cpus: 0 4 8 12 16 20 24 28 node 0 size: 16374 MB node 0 free: 11899 MB node 1 cpus: 32 36 40 44 48 52 56 60 node 1 size: 16384 MB node 1 free: 15318 MB node 2 cpus: 2 6 10 14 18 22 26 30 node 2 size: 16384 MB node 2 free: 15766 MB node 3 cpus: 34 38 42 46 50 54 58 62 node 3 size: 16384 MB node 3 free: 15347 MB node 4 cpus: 3 7 11 15 19 23 27 31 node 4 size: 16384 MB node 4 free: 15041 MB node 5 cpus: 35 39 43 47 51 55 59 63 node 5 size: 16384 MB node 5 free: 15202 MB node 6 cpus: 1 5 9 13 17 21 25 29 node 6 size: 16384 MB node 6 free: 15197 MB node 7 cpus: 33 37 41 45 49 53 57 61 node 7 size: 16368 MB node 7 free: 15669 MB 4) cpuset.cpus will be set as: (from debug log) 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus' to '0-63' 5) The advisory nodeset got from querying numad (from debug log) 2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 : Nodeset returned from numad: 1 6) cpuset.mems will be set as: (from debug log) 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 : Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems' to '0-7' I.E, the domain process's memory is restricted on the first NUMA node, however, it can use all of the CPUs, which will likely cause the domain process to fail to start because of the kernel fails to allocate memory with the the memory policy as "strict". % tail -n 20 /var/log/libvirt/qemu/toy.log ... 2013-05-09 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 : Handshake with parent is done char device redirected to /dev/pts/2 (label charserial0) kvm_init_vcpu failed: Cannot allocate memory ... Signed-off-by: Peter Krempa <pkrempa@redhat.com>	2013-07-18 14:57:57 +02:00
Daniel P. Berrange	50760e2a8a	Convert 'int i' to 'size_t i' in src/qemu files Convert the type of loop iterators named 'i', 'j', k', 'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or 'unsigned int', also santizing 'ii', 'jj', 'kk' to use the normal 'i', 'j', 'k' naming Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-07-10 17:55:15 +01:00
Michal Privoznik	e987a30dfa	Adapt to VIR_ALLOC and virAsprintf in src/qemu/*	2013-07-10 11:07:32 +02:00
Jiri Denemark	e0e438af00	qemu: Move memory limit computation to a reusable function	2013-07-08 12:35:27 +02:00
Laine Stump	1d829e1306	pci: rename virPCIDeviceGetVFIOGroupDev to virPCIDeviceGetIOMMUGroupDev I realized after the fact that it's probably better in the long run to give this function a name that matches the name of the link used in sysfs to hold the group (iommu_group). I'm changing it now because I'm about to add several more functions that deal with iommu groups.	2013-06-25 18:07:38 -04:00
Osier Yang	8da9516a84	qemu: Abstract code for the cpu controller setting into a helper	2013-06-05 19:25:48 +08:00
Michal Privoznik	a88fb3009f	Adapt to VIR_STRDUP and VIR_STRNDUP in src/qemu/*	2013-05-23 09:56:38 +02:00
Osier Yang	66194f71df	src/qemu: Remove the whitespace before ';'	2013-05-21 23:41:44 +08:00
Osier Yang	58f8e0cd58	qemu: Don't remove the "return 0" Commit `f60a50c795` intended to remove the warning only, but not with the "return 0" together.	2013-05-21 23:08:57 +08:00
Osier Yang	479d5991cd	qemu: Abstract code for cpuset controller setting into a helper	2013-05-20 19:57:00 +08:00
Osier Yang	9f2455d359	qemu: Abstract code for devices controller setting into a helper	2013-05-20 19:52:35 +08:00
Osier Yang	f60a50c795	qemu: Abstract code for memory controller setting into a helper	2013-05-20 19:39:54 +08:00
Osier Yang	2fd16df7b5	qemu: Abstract the code for blkio controller setting into a helper	2013-05-20 19:24:45 +08:00
Daniel P. Berrange	c2cf5f1c2a	Fix failure to detect missing cgroup partitions Change `bbe97ae968` caused the QEMU driver to ignore ENOENT errors from cgroups, in order to cope with missing /proc/cgroups. This is not good though because many other things can cause ENOENT and should not be ignored. The callers expect to see ENXIO when cgroups are not present, so adjust the code to report that errno when /proc/cgroups is missing Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-05-17 10:25:15 +01:00
Jim Fehlig	bbe97ae968	Fix starting domains when kernel has no cgroups support Found that I was unable to start existing domains after updating to a kernel with no cgroups support # zgrep CGROUP /proc/config.gz # CONFIG_CGROUPS is not set # virsh start test error: Failed to start domain test error: Unable to initialize /machine cgroup: Cannot allocate memory virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when attempting to open /proc/cgroups on such a system, but it was being dropped in virCgroupSetPartitionSuffix(). Change virCgroupSetPartitionSuffix() to propagate errors returned by its callees. Also check for ENOENT in qemuInitCgroup() when determining if cgroups support is available.	2013-05-13 09:27:46 -06:00
Han Cheng	6eb42e38e8	qemu: Allow the scsi-generic device in cgroup This adds the scsi-generic device into the device controller's whitelist, so that it's allowed to used by the qemu process. Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com> Signed-off-by: Osier Yang <jyang@redhat.com>	2013-05-13 19:08:34 +08:00
Laine Stump	52ba0f6e1c	qemu: fix stupid typos in VFIO cgroup setup/teardown I must have looked at this a couple dozen times before I noticed it had "!=" instead of "==". Not doing this setup prevented qemu from doing anything with the vfio group device.	2013-05-03 14:32:54 -04:00
Michal Privoznik	7c9a2d88cd	virutil: Move string related functions to virstring.c The source code base needs to be adapted as well. Some files include virutil.h just for the string related functions (here, the include is substituted to match the new file), some include virutil.h without any need (here, the include is removed), and some require both.	2013-05-02 16:56:55 +02:00
Laine Stump	811143c0b6	qemu: put usb cgroup setup in common function The USB-specific cgroup setup had been inserted inline in qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a common cgroup setup function called for all hostdevs, so it makes sens to put the usb-specific setup there and just rely on that function being called. The one thing I'm uncertain of here (and a reason for not pushing until after release) is that previously hostdev->missing was checked only when starting a domain (and cgroup setup for the device skipped if missing was true), but with this consolidation, it is now checked in the case of hotplug as well. I don't know if this will have any practical effect (does it make sense to hotplug a "missing" usb device?)	2013-04-29 21:52:28 -04:00
Laine Stump	6e13860cb4	qemu: add vfio devices to cgroup ACL when appropriate PCIO device assignment using VFIO requires read/write access by the qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the VFIO group number that the assigned device belongs to (and can be found with the function virPCIDeviceGetVFIOGroupDev) /dev/vfio/vfio can be accessible to any guest without danger (according to vfio developers), so it is added to the static ACL. The group device must be dynamically added to the cgroup ACL for each vfio hostdev in two places: 1) for any devices in the persistent config when the domain is started (done during qemuSetupCgroup()) 2) at device attach time for any hotplug devices (done in qemuDomainAttachHostDevice) The group device must be removed from the ACL when a device it "hot-unplugged" (in qemuDomainDetachHostDevice()) Note that USB devices are already doing their own cgroup setup and teardown in the hostdev-usb specific function. I chose to make the new functions generic and call them in a common location though. We can then move the USB-specific code (which is duplicated in two locations) to this single location. I'll be posting a followup patch to do that.	2013-04-29 21:52:28 -04:00
Daniel P. Berrange	1e05073fbb	Replace more cases of /system with /machine The change in commit `aed4986322` was incomplete, missing a couple of cases of /system. This caused failure to start VMs. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-22 17:11:36 +01:00
Daniel P. Berrange	aed4986322	Change default resource partition to /machine After discussions with systemd developers it was decided that a better default policy for resource partitions is to have 3 default partitions at the top level /system - system services /machine - virtual machines / containers /user - user login session This ensures that the default policy isolates guest from user login sessions & system services, so a mis-behaving guest can't consume 100% of CPU usage if other things are contending for it. Thus we change the default partition from /system to /machine Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-22 12:10:12 +01:00
Daniel P. Berrange	767596bdb4	Remove non-functional code for setting up non-root cgroups The virCgroupNewDriver method had a 'bool privileged' param. If a false value was ever passed in, it would simply not work, since non-root users don't have any privileges to create new cgroups. Just delete this broken code entirely and make the QEMU driver skip cgroup setup in non-privileged mode Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	db44eb1b5f	Change default cgroup layout for QEMU/LXC and honour XML config Historically QEMU/LXC guests have been placed in a cgroup layout that is $LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME This is bad for a number of reasons - The cgroup hierarchy gets very deep which seriously impacts kernel performance due to cgroups scalability limitations. - It is hard to setup cgroup policies which apply across services and virtual machines, since all VMs are underneath the libvirtd service. To address this the default cgroup location is changed to be /system/$VMNAME.{lxc,qemu}.libvirt This puts virtual machines at the same level in the hierarchy as system services, allowing consistent policy to be setup across all of them. This also honours the new resource partition location from the XML configuration, for example <resource> <partition>/virtualmachines/production</partitions> </resource> will result in the VM being placed at /virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt NB, with the exception of the default, /system, path which is intended to always exist, libvirt will not attempt to auto-create the partitions in the XML. It is the responsibility of the admin/app to configure the partitions. Later libvirt APIs will provide a way todo this. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	aa8604dd45	Add a new virCgroupNewPartition for setting up resource partitions A resource partition is an absolute cgroup path, ignoring the current process placement. Expose a virCgroupNewPartition API for constructing such cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	04c18d25f1	Rename virCgroupForXXX to virCgroupNewXXX Rename all the virCgroupForXXX methods to use the form virCgroupNewXXX since they are all constructors. Also make sure the output parameter is the last one in the list, and annotate all pointers as non-null. Fix up all callers, and make sure they use true/false not 0/1 for the boolean parameters Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	632f78caaf	Store a virCgroupPtr instance in qemuDomainObjPrivatePtr Instead of calling virCgroupForDomain every time we need the virCgrouPtr instance, just do it once at Vm startup and cache a reference to the object in qemuDomainObjPrivatePtr until shutdown of the VM. Removing the virCgroupPtr from the QEMU driver state also means we don't have stale mount info, if someone mounts the cgroups filesystem after libvirtd has been started Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Stefan Berger	22feb0d3e7	QEMU Cgroup support for TPM passthrough Some refactoring for virDomainChrSourceDef type of devices so we can use common code. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Tested-by: Corey Bryant <coreyb@linux.vnet.ibm.com>	2013-04-12 16:55:46 -04:00
Daniel P. Berrange	dca927c82f	Rename virCgroupMounted to virCgroupHasController & make it more robust The virCgroupMounted method is badly named, since a controller can be mounted, but disabled in the current object. Rename the method to be virCgroupHasController. Also make it tolerant to a NULL virCgroupPtr and out-of-range controller index, to avoid duplication of these checks in all callers Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-08 14:49:12 +01:00
Daniel P. Berrange	56f27b3bbc	Don't create dirs in cgroup controllers we don't want to use Currently when getting an instance of virCgroupPtr we will create the path in all cgroup controllers. Only at the virt driver layer are we attempting to filter controllers. This is bad because the mere act of creating the dirs in the controllers can have a functional impact on the kernel, particularly for performance. Update the virCgroupForDriver() method to accept a bitmask of controllers to use. Only create dirs in the controllers that are requested. When creating cgroups for domains, respect the active controller list from the parent cgroup Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-05 10:41:54 +01:00
Gao feng	45e9d27ad8	NUMA: cleanup for numa related codes Intend to reduce the redundant code,use virNumaSetupMemoryPolicy to replace virLXCControllerSetupNUMAPolicy and qemuProcessInitNumaMemoryPolicy. This patch also moves the numa related codes to the file virnuma.c and virnuma.h Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2013-03-20 19:37:00 +08:00
Daniel P. Berrange	7f544a4c8f	Don't try to add non-existant devices to ACL The QEMU driver has a list of devices nodes that are whitelisted for all guests. The kernel has recently started returning an error if you try to whitelist a device which does not exist. This causes a warning in libvirt logs and an audit error for any missing devices. eg 2013-02-27 16:08:26.515+0000: 29625: warning : virDomainAuditCgroup:451 : success=no virt=kvm resrc=cgroup reason=allow vm="vm031714" uuid=9d8f1de0-44f4-a0b1-7d50-e41ee6cd897b cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm031714/" class=path path=/dev/kqemu rdev=? acl=rw Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-27 22:51:24 +00:00
Daniel P. Berrange	279336c5d8	Avoid spamming logs with cgroups warnings The code for putting the emulator threads in a separate cgroup would spam the logs with warnings 2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 3 2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 4 2013-02-27 16:08:26.732+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 6 This is because it has only created child cgroups for 3 of the controllers, but was trying to move the processes from all the controllers. The fix is to only try to move threads in the controllers we actually created. Also remove the warning and make it return a hard error to avoid such lazy callers in the future. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-27 22:51:24 +00:00
Eric Blake	82d5fe5437	qemu: check backing chains even when cgroup is omitted https://bugzilla.redhat.com/show_bug.cgi?id=896685 points out a regression caused by commit `38c4a9c` - libvirt only labels the backing chain if the backing chain cache is populated, but the code to populate the cache was only conditionally performed if cgroup labeling was necessary. * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Hoist cache setup... * src/qemu/qemu_process.c (qemuProcessStart): ...earlier into caller, where it is now unconditional.	2013-02-21 12:32:56 -07:00
Daniel P. Berrange	77c3015f9c	Rename all USB device functions to have a standard name prefix Rename all the usbDeviceXXX and usbXXXDevice APIs to have a fixed virUSBDevice name prefix	2013-02-05 19:22:25 +00:00
Daniel P. Berrange	3e86e8f327	Fix leak of usbDevice struct when initializing cgroups When iterating over USB host devices to setup cgroups, the usbDevice object was leaked in both LXC and QEMU driers Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-05 19:22:25 +00:00
Daniel P. Berrange	b090aa7d55	Introduce a virQEMUDriverConfigPtr object Currently the virQEMUDriverPtr struct contains an wide variety of data with varying access needs. Move all the static config data into a dedicated virQEMUDriverConfigPtr object. The only locking requirement is to hold the driver lock, while obtaining an instance of virQEMUDriverConfigPtr. Once a reference is held on the config object, it can be used completely lockless since it is immutable. NB, not all APIs correctly hold the driver lock while getting a reference to the config object in this patch. This is safe for now since the config is never updated on the fly. Later patches will address this fully. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-05 15:49:25 +00:00
Eric Blake	7034531814	maint: fix comment typo While OOM can have knock-on effects that trash a system, generally the first symptom is one of memory thrashing. * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.	2013-01-09 16:45:59 -07:00
Michal Privoznik	3c83df679e	qemu: Relax hard RSS limit Currently, if there's no hard memory limit defined for a domain, libvirt tries to calculate one, based on domain definition and magic equation and set it upon the domain startup. The rationale behind was, if there's a memory leak or exploit in qemu, we should prevent the host system trashing. However, the equation was too tightening, as it didn't reflect what the kernel counts into the memory used by a process. Since many hosts do have a swap, nobody hasn't noticed anything, because if hard memory limit is reached, process can continue allocating memory on a swap. However, if there is no swap on the host, the process gets killed by OOM killer. In our case, the qemu process it is. To prevent this, we need to relax the hard RSS limit. Moreover, we should reflect more precisely the kernel way of accounting the memory for process. That is, even the kernel caches are counted within the memory used by a process (within cgroups at least). Hence the magic equation has to be changed: limit = 1.5 * (domain memory + total video memory) + (32MB for cache per each disk) + 200MB	2013-01-08 16:32:11 +01:00
Daniel P. Berrange	f24404a324	Rename virterror.c virterror_internal.h to virerror.{c,h}	2012-12-21 11:19:50 +00:00
Daniel P. Berrange	44f6ae27fe	Rename util.{c,h} to virutil.{c,h}	2012-12-21 11:19:49 +00:00
Daniel P. Berrange	ab9b7ec2f6	Rename memory.{c,h} to viralloc.{c,h}	2012-12-21 11:17:14 +00:00
Daniel P. Berrange	936d95d347	Rename logging.{c,h} to virlog.{c,h}	2012-12-21 11:17:14 +00:00
Daniel P. Berrange	f9c7020c1f	Rename cgroup.{h,c} to vircgroup.{h,c} To bring in line with new naming practice, rename the= src/util/cgroup.{h,c} files to vircgroup.{h,c} Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-12-21 11:17:12 +00:00
Daniel P. Berrange	df5928ea56	Allow passing a vroot into security manager hostdev labelling When LXC labels USB devices during hotplug, it is running in host context, so it needs to pass in a vroot path to the container root. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-12-17 17:50:51 +00:00
Daniel P. Berrange	4738c2a7e7	Replace 'struct qemud_driver *' with virQEMUDriverPtr Remove the obsolete 'qemud' naming prefix and underscore based type name. Introduce virQEMUDriverPtr as the replacement, in common with LXC driver naming style	2012-11-28 18:17:25 +00:00
Daniel P. Berrange	1c04f99970	Remove spurious whitespace between function name & open brackets The libvirt coding standard is to use 'function(...args...)' instead of 'function (...args...)'. A non-trivial number of places did not follow this rule and are fixed in this patch. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-11-02 13:36:49 +00:00
Osier Yang	bb81021bfe	qemu: Keep the affinity when creating cgroup for emulator thread When the cpu placement model is "auto", it sets the affinity for domain process with the advisory nodeset from numad, however, creating cgroup for the domain process (called emulator thread in some contexts) later overrides that with pinning it to all available pCPUs. How to reproduce: * Configure the domain with "auto" placement for <vcpu>, e.g. <vcpu placement='auto'>4</vcpu> * % virsh start dom * % cat /proc/$dompid/status Though the emulator cgroup cause conflicts, but we can't simply prohibit creating it, as other tunables are still useful, such as "emulator_period", which is used by API virDomainSetSchedulerParameter. So this patch doesn't prohibit creating the emulator cgroup, but inherit the nodeset from numad, and reset the affinity for domain process. * src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator to accept the passed nodenet * src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset	2012-10-24 21:46:24 +08:00
Eric Blake	67aea3fb78	blockjob: remove unused parameters after previous patch Minor cleanup made possible by previous simplifications. * src/qemu/qemu_cgroup.h (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup): Alter signature. * src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup, qemuSetupCgroup): Update all uses. * src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice) (qemuDomainDetachDiskDevice): Likewise. * src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive) (qemuDomainChangeDiskMediaLive) (qemuDomainSnapshotCreateSingleDiskActive) (qemuDomainSnapshotUndoSingleDiskActive): Likewise.	2012-10-19 17:35:11 -06:00
Eric Blake	38c4a9cc40	storage: use cache to walk backing chain We used to walk the backing file chain at least twice per disk, once to set up cgroup device whitelisting, and once to set up security labeling. Rather than walk the chain every iteration, which possibly includes calls to fork() in order to open root-squashed NFS files, we can exploit the cache of the previous patch. * src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter signature. * src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller to supply backing chain via disk, if recursion is desired. * src/security/security_dac.c (virSecurityDACSetSecurityImageLabel): Adjust caller. * src/security/security_selinux.c (virSecuritySELinuxSetSecurityImageLabel): Likewise. * src/security/virt-aa-helper.c (get_files): Likewise. * src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup): Likewise. (qemuSetupCgroup): Pre-populate chain.	2012-10-19 17:35:11 -06:00

1 2 3 4

199 Commits