libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2024-09-24 16:35:44 +00:00

Author	SHA1	Message	Date
Han Cheng	6eb42e38e8	qemu: Allow the scsi-generic device in cgroup This adds the scsi-generic device into the device controller's whitelist, so that it's allowed to be used by the qemu process. Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com> Signed-off-by: Osier Yang <jyang@redhat.com>	2013-05-13 19:08:34 +08:00
Laine Stump	52ba0f6e1c	qemu: fix stupid typos in VFIO cgroup setup/teardown I must have looked at this a couple dozen times before I noticed it had "!=" instead of "==". Not doing this setup prevented qemu from doing anything with the vfio group device.	2013-05-03 14:32:54 -04:00
Michal Privoznik	7c9a2d88cd	virutil: Move string related functions to virstring.c The source code base needs to be adapted as well. Some files include virutil.h just for the string related functions (here, the include is substituted to match the new file), some include virutil.h without any need (here, the include is removed), and some require both.	2013-05-02 16:56:55 +02:00
Laine Stump	811143c0b6	qemu: put usb cgroup setup in common function The USB-specific cgroup setup had been inserted inline in qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is a common cgroup setup function called for all hostdevs, so it makes sense to put the usb-specific setup there and just rely on that function being called. The one thing I'm uncertain of here (and a reason for not pushing until after release) is that previously hostdev->missing was checked only when starting a domain (and cgroup setup for the device skipped if missing was true), but with this consolidation, it is now checked in the case of hotplug as well. I don't know if this will have any practical effect (does it make sense to hotplug a "missing" usb device?)	2013-04-29 21:52:28 -04:00
Laine Stump	6e13860cb4	qemu: add vfio devices to cgroup ACL when appropriate PCI device assignment using VFIO requires read/write access by the qemu process to /dev/vfio/vfio, and /dev/vfio/nn, where "nn" is the VFIO group number that the assigned device belongs to (and can be found with the function virPCIDeviceGetVFIOGroupDev). /dev/vfio/vfio can be accessible to any guest without danger (according to vfio developers), so it is added to the static ACL. The group device must be dynamically added to the cgroup ACL for each vfio hostdev in two places: 1) for any devices in the persistent config when the domain is started (done during qemuSetupCgroup()) 2) at device attach time for any hotplug devices (done in qemuDomainAttachHostDevice). The group device must be removed from the ACL when a device is "hot-unplugged" (in qemuDomainDetachHostDevice()). Note that USB devices are already doing their own cgroup setup and teardown in the hostdev-usb specific function. I chose to make the new functions generic and call them in a common location though. We can then move the USB-specific code (which is duplicated in two locations) to this single location. I'll be posting a followup patch to do that.	2013-04-29 21:52:28 -04:00
Daniel P. Berrange	1e05073fbb	Replace more cases of /system with /machine The change in commit `aed4986322` was incomplete, missing a couple of cases of /system. This caused failure to start VMs. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-22 17:11:36 +01:00
Daniel P. Berrange	aed4986322	Change default resource partition to /machine After discussions with systemd developers it was decided that a better default policy for resource partitions is to have 3 default partitions at the top level /system - system services /machine - virtual machines / containers /user - user login session This ensures that the default policy isolates guest from user login sessions & system services, so a mis-behaving guest can't consume 100% of CPU usage if other things are contending for it. Thus we change the default partition from /system to /machine Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-22 12:10:12 +01:00
Daniel P. Berrange	767596bdb4	Remove non-functional code for setting up non-root cgroups The virCgroupNewDriver method had a 'bool privileged' param. If a false value was ever passed in, it would simply not work, since non-root users don't have any privileges to create new cgroups. Just delete this broken code entirely and make the QEMU driver skip cgroup setup in non-privileged mode Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	db44eb1b5f	Change default cgroup layout for QEMU/LXC and honour XML config Historically QEMU/LXC guests have been placed in a cgroup layout that is $LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME This is bad for a number of reasons - The cgroup hierarchy gets very deep which seriously impacts kernel performance due to cgroups scalability limitations. - It is hard to set up cgroup policies which apply across services and virtual machines, since all VMs are underneath the libvirtd service. To address this the default cgroup location is changed to be /system/$VMNAME.{lxc,qemu}.libvirt This puts virtual machines at the same level in the hierarchy as system services, allowing consistent policy to be set up across all of them. This also honours the new resource partition location from the XML configuration, for example <resource> <partition>/virtualmachines/production</partition> </resource> will result in the VM being placed at /virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt NB, with the exception of the default, /system, path which is intended to always exist, libvirt will not attempt to auto-create the partitions in the XML. It is the responsibility of the admin/app to configure the partitions. Later libvirt APIs will provide a way to do this. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	aa8604dd45	Add a new virCgroupNewPartition for setting up resource partitions A resource partition is an absolute cgroup path, ignoring the current process placement. Expose a virCgroupNewPartition API for constructing such cgroups Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	04c18d25f1	Rename virCgroupForXXX to virCgroupNewXXX Rename all the virCgroupForXXX methods to use the form virCgroupNewXXX since they are all constructors. Also make sure the output parameter is the last one in the list, and annotate all pointers as non-null. Fix up all callers, and make sure they use true/false not 0/1 for the boolean parameters Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Daniel P. Berrange	632f78caaf	Store a virCgroupPtr instance in qemuDomainObjPrivatePtr Instead of calling virCgroupForDomain every time we need the virCgroupPtr instance, just do it once at VM startup and cache a reference to the object in qemuDomainObjPrivatePtr until shutdown of the VM. Removing the virCgroupPtr from the QEMU driver state also means we don't have stale mount info, if someone mounts the cgroups filesystem after libvirtd has been started Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-15 17:35:31 +01:00
Stefan Berger	22feb0d3e7	QEMU Cgroup support for TPM passthrough Some refactoring for virDomainChrSourceDef type of devices so we can use common code. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Tested-by: Corey Bryant <coreyb@linux.vnet.ibm.com>	2013-04-12 16:55:46 -04:00
Daniel P. Berrange	dca927c82f	Rename virCgroupMounted to virCgroupHasController & make it more robust The virCgroupMounted method is badly named, since a controller can be mounted, but disabled in the current object. Rename the method to be virCgroupHasController. Also make it tolerant to a NULL virCgroupPtr and out-of-range controller index, to avoid duplication of these checks in all callers Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-08 14:49:12 +01:00
Daniel P. Berrange	56f27b3bbc	Don't create dirs in cgroup controllers we don't want to use Currently when getting an instance of virCgroupPtr we will create the path in all cgroup controllers. Only at the virt driver layer are we attempting to filter controllers. This is bad because the mere act of creating the dirs in the controllers can have a functional impact on the kernel, particularly for performance. Update the virCgroupForDriver() method to accept a bitmask of controllers to use. Only create dirs in the controllers that are requested. When creating cgroups for domains, respect the active controller list from the parent cgroup Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-04-05 10:41:54 +01:00
Gao feng	45e9d27ad8	NUMA: cleanup for numa related codes To reduce redundant code, use virNumaSetupMemoryPolicy to replace virLXCControllerSetupNUMAPolicy and qemuProcessInitNumaMemoryPolicy. This patch also moves the NUMA-related code to the files virnuma.c and virnuma.h Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>	2013-03-20 19:37:00 +08:00
Daniel P. Berrange	7f544a4c8f	Don't try to add non-existant devices to ACL The QEMU driver has a list of device nodes that are whitelisted for all guests. The kernel has recently started returning an error if you try to whitelist a device which does not exist. This causes a warning in libvirt logs and an audit error for any missing devices, e.g. 2013-02-27 16:08:26.515+0000: 29625: warning : virDomainAuditCgroup:451 : success=no virt=kvm resrc=cgroup reason=allow vm="vm031714" uuid=9d8f1de0-44f4-a0b1-7d50-e41ee6cd897b cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm031714/" class=path path=/dev/kqemu rdev=? acl=rw Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-27 22:51:24 +00:00
Daniel P. Berrange	279336c5d8	Avoid spamming logs with cgroups warnings The code for putting the emulator threads in a separate cgroup would spam the logs with warnings 2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 3 2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 4 2013-02-27 16:08:26.732+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 6 This is because it has only created child cgroups for 3 of the controllers, but was trying to move the processes from all the controllers. The fix is to only try to move threads in the controllers we actually created. Also remove the warning and make it return a hard error to avoid such lazy callers in the future. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-27 22:51:24 +00:00
Eric Blake	82d5fe5437	qemu: check backing chains even when cgroup is omitted https://bugzilla.redhat.com/show_bug.cgi?id=896685 points out a regression caused by commit `38c4a9c` - libvirt only labels the backing chain if the backing chain cache is populated, but the code to populate the cache was only conditionally performed if cgroup labeling was necessary. * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Hoist cache setup... * src/qemu/qemu_process.c (qemuProcessStart): ...earlier into caller, where it is now unconditional.	2013-02-21 12:32:56 -07:00
Daniel P. Berrange	77c3015f9c	Rename all USB device functions to have a standard name prefix Rename all the usbDeviceXXX and usbXXXDevice APIs to have a fixed virUSBDevice name prefix	2013-02-05 19:22:25 +00:00
Daniel P. Berrange	3e86e8f327	Fix leak of usbDevice struct when initializing cgroups When iterating over USB host devices to setup cgroups, the usbDevice object was leaked in both LXC and QEMU drivers Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-05 19:22:25 +00:00
Daniel P. Berrange	b090aa7d55	Introduce a virQEMUDriverConfigPtr object Currently the virQEMUDriverPtr struct contains a wide variety of data with varying access needs. Move all the static config data into a dedicated virQEMUDriverConfigPtr object. The only locking requirement is to hold the driver lock, while obtaining an instance of virQEMUDriverConfigPtr. Once a reference is held on the config object, it can be used completely lockless since it is immutable. NB, not all APIs correctly hold the driver lock while getting a reference to the config object in this patch. This is safe for now since the config is never updated on the fly. Later patches will address this fully. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2013-02-05 15:49:25 +00:00
Eric Blake	7034531814	maint: fix comment typo While OOM can have knock-on effects that trash a system, generally the first symptom is one of memory thrashing. * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.	2013-01-09 16:45:59 -07:00
Michal Privoznik	3c83df679e	qemu: Relax hard RSS limit Currently, if there's no hard memory limit defined for a domain, libvirt tries to calculate one, based on the domain definition and a magic equation, and sets it upon domain startup. The rationale was that, if there's a memory leak or exploit in qemu, we should prevent the host system from thrashing. However, the equation was too tight, as it didn't reflect what the kernel counts into the memory used by a process. Since many hosts do have swap, nobody has noticed anything, because if the hard memory limit is reached, the process can continue allocating memory in swap. However, if there is no swap on the host, the process gets killed by the OOM killer. In our case, that is the qemu process. To prevent this, we need to relax the hard RSS limit. Moreover, we should reflect more precisely the way the kernel accounts memory for a process. That is, even the kernel caches are counted within the memory used by a process (within cgroups at least). Hence the magic equation has to be changed: limit = 1.5 * (domain memory + total video memory) + (32MB for cache per each disk) + 200MB	2013-01-08 16:32:11 +01:00
Daniel P. Berrange	f24404a324	Rename virterror.c virterror_internal.h to virerror.{c,h}	2012-12-21 11:19:50 +00:00
Daniel P. Berrange	44f6ae27fe	Rename util.{c,h} to virutil.{c,h}	2012-12-21 11:19:49 +00:00
Daniel P. Berrange	ab9b7ec2f6	Rename memory.{c,h} to viralloc.{c,h}	2012-12-21 11:17:14 +00:00
Daniel P. Berrange	936d95d347	Rename logging.{c,h} to virlog.{c,h}	2012-12-21 11:17:14 +00:00
Daniel P. Berrange	f9c7020c1f	Rename cgroup.{h,c} to vircgroup.{h,c} To bring in line with new naming practice, rename the src/util/cgroup.{h,c} files to vircgroup.{h,c} Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-12-21 11:17:12 +00:00
Daniel P. Berrange	df5928ea56	Allow passing a vroot into security manager hostdev labelling When LXC labels USB devices during hotplug, it is running in host context, so it needs to pass in a vroot path to the container root. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-12-17 17:50:51 +00:00
Daniel P. Berrange	4738c2a7e7	Replace 'struct qemud_driver *' with virQEMUDriverPtr Remove the obsolete 'qemud' naming prefix and underscore based type name. Introduce virQEMUDriverPtr as the replacement, in common with LXC driver naming style	2012-11-28 18:17:25 +00:00
Daniel P. Berrange	1c04f99970	Remove spurious whitespace between function name & open brackets The libvirt coding standard is to use 'function(...args...)' instead of 'function (...args...)'. A non-trivial number of places did not follow this rule and are fixed in this patch. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-11-02 13:36:49 +00:00
Osier Yang	bb81021bfe	qemu: Keep the affinity when creating cgroup for emulator thread When the cpu placement model is "auto", it sets the affinity for the domain process with the advisory nodeset from numad; however, creating the cgroup for the domain process (called emulator thread in some contexts) later overrides that by pinning it to all available pCPUs. How to reproduce: * Configure the domain with "auto" placement for <vcpu>, e.g. <vcpu placement='auto'>4</vcpu> * % virsh start dom * % cat /proc/$dompid/status Although the emulator cgroup causes conflicts, we can't simply prohibit creating it, as other tunables are still useful, such as "emulator_period", which is used by API virDomainSetSchedulerParameter. So this patch doesn't prohibit creating the emulator cgroup, but inherits the nodeset from numad, and resets the affinity for the domain process. * src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator to accept the passed nodeset * src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset	2012-10-24 21:46:24 +08:00
Eric Blake	67aea3fb78	blockjob: remove unused parameters after previous patch Minor cleanup made possible by previous simplifications. * src/qemu/qemu_cgroup.h (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup): Alter signature. * src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup, qemuSetupCgroup): Update all uses. * src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice) (qemuDomainDetachDiskDevice): Likewise. * src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive) (qemuDomainChangeDiskMediaLive) (qemuDomainSnapshotCreateSingleDiskActive) (qemuDomainSnapshotUndoSingleDiskActive): Likewise.	2012-10-19 17:35:11 -06:00
Eric Blake	38c4a9cc40	storage: use cache to walk backing chain We used to walk the backing file chain at least twice per disk, once to set up cgroup device whitelisting, and once to set up security labeling. Rather than walk the chain every iteration, which possibly includes calls to fork() in order to open root-squashed NFS files, we can exploit the cache of the previous patch. * src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter signature. * src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller to supply backing chain via disk, if recursion is desired. * src/security/security_dac.c (virSecurityDACSetSecurityImageLabel): Adjust caller. * src/security/security_selinux.c (virSecuritySELinuxSetSecurityImageLabel): Likewise. * src/security/virt-aa-helper.c (get_files): Likewise. * src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup) (qemuTeardownDiskCgroup): Likewise. (qemuSetupCgroup): Pre-populate chain.	2012-10-19 17:35:11 -06:00
Martin Kletzander	ba63d8f7d8	qemu: Pin the emulator when only cpuset is specified According to our recent changes (clarifications), we should be pinning qemu's emulator processes using the <vcpu> 'cpuset' attribute in case there is no <emulatorpin> specified. This however doesn't work entirely as expected and this patch should resolve all the remaining issues.	2012-10-17 17:37:10 +02:00
Jiri Denemark	edc9269a2a	qemu: Implement startupPolicy for USB passed through devices	2012-10-11 15:11:42 +02:00
Eric Blake	4ecb723b9e	maint: fix up copyright notice inconsistencies https://www.gnu.org/licenses/gpl-howto.html recommends that the 'If not, see <url>.' phrase be a separate sentence. * tests/securityselinuxhelper.c: Remove doubled line. * tests/securityselinuxtest.c: Likewise. * globally: s/; If/. If/	2012-09-20 16:30:55 -06:00
Hu Tao	75b198b3e7	use virBitmap to store numa nodemask info.	2012-09-17 14:59:37 -04:00
Hu Tao	f970d8481e	use virBitmap to store cpupin info	2012-09-17 14:59:36 -04:00
Hu Tao	f7e1a546f2	fix bug in qemuSetupCgroupForEmulator Should not return 0 when failed to setup cgroup.	2012-09-11 16:08:41 -06:00
Martin Kletzander	9f86fb9326	qemu: don't pin all the cpus This is another fix for the emulator-pin series. When going through the cputune pinning settings, the current code is trying to pin all the CPUs, even when not all of them are specified. This causes error in the subsequent function which, of course, cannot find the cpu to pin. Since it's enough to pass the correct VCPU ID to the function, the fix is trivial.	2012-09-05 19:25:10 +02:00
Jiri Denemark	774eb45be6	qemu: Don't ignore CPU tuning config if required cgroups are missing When domain XML contains any of the elements for setting up CPU scheduling parameters (period, quota, emulator_period, or emulator_quota) we need cpu cgroup to enforce the configuration. However, the existing code would just silently ignore such settings if either cgroups were not available at all or cpu cgroup was not available. Moreover, APIs for manipulating CPU scheduler parameters were already failing if cpu cgroup was not available. This patch makes cpu cgroup mandatory for all domains that use CPU scheduling elements in their XML.	2012-08-31 13:24:02 +02:00
Jiri Denemark	0c7cca36e7	qemu: Fix starting domains with no cpu cgroup If cgroups are enabled in general but cpu cgroup is disabled in qemu.conf or not mounted at all, libvirt would refuse to start any domain even though scheduler parameters are not set in domain XML. This patch makes cpu cgroup mandatory only for domains that actually want to use it.	2012-08-29 16:13:38 +02:00
Martin Kletzander	16ebec2b7c	qemu: fix regression with pinning Commit `4b03d59167` changed the pinning behavior in a way that makes some machines non-startable. The comment mentioning that we cannot control each vcpu when there is no VCPU <-> PID mapping available is true; however, this isn't necessarily an error, because this can be caused by old QEMU without support for the "query-cpus" command as well as software-emulated machines that don't create more than one process.	2012-08-27 10:20:42 +02:00
Hu Tao	b65dafa812	qemu: introduce period/quota tuning for emulator This patch introduces support of setting emulator's period and quota to limit cpu bandwidth when the vm starts. Also updates XML Schema for new entries and docs.	2012-08-22 16:52:22 +08:00
Hu Tao	1d4395eb47	limit cpu bandwidth only for vcpus This patch changes the behaviour of xml element cputune.period and cputune.quota to limit cpu bandwidth only for vcpus, and no longer limit cpu bandwidth for the whole guest. The reasons to do this are: - This matches docs of cputune.period and cputune.quota. - The other parts excepting vcpus are treated as "emulator", and there are separate period/quota settings for emulator in the subsequent patches	2012-08-22 16:50:41 +08:00
Tang Chen	a1249489ce	qemu: synchronize emulatorpin info to cgroup Introduce qemuSetupCgroupEmulatorPin() function to add emulator threads pin info to cpuset cgroup, the same as vcpupin. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>	2012-08-22 16:09:26 +08:00
Hu Tao	fe1d32596c	Enable cpuset cgroup and synchronous vcpupin info to cgroup. vcpu threads pin are implemented using sched_setaffinity(), but not controlled by cgroup. This patch does the following things: 1) enable cpuset cgroup 2) reflect all the vcpu threads pin info to cgroup Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>	2012-08-22 15:12:22 +08:00
Wen Congyang	4b03d59167	create a new cgroup and move all emulator threads to the new cgroup Create a new cgroup and move all emulator threads to the new cgroup. And then we can do the other things: 1. limit only vcpu usage rather than the whole qemu 2. limit for emulator threads(include vhost-net threads) Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>	2012-08-22 14:33:59 +08:00
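
The relaxed hard-limit equation quoted in commit 3c83df679e can be worked through numerically. The following is only a sketch of the arithmetic, not libvirt's implementation: the function name, the KiB units, and the parameter names are assumptions made for illustration, and "MB" in the commit message is read here as MiB.

```python
KIB_PER_MIB = 1024  # assumption: all sizes are tracked in KiB


def relaxed_hard_limit_kib(domain_mem_kib, video_mem_kib, n_disks):
    """Hypothetical helper mirroring the equation in the commit message:

    limit = 1.5 * (domain memory + total video memory)
            + 32MB of cache per disk
            + 200MB
    """
    # Integer arithmetic keeps the result exact: x * 3 // 2 == 1.5 * x
    # (rounded down when x is odd).
    scaled = (domain_mem_kib + video_mem_kib) * 3 // 2
    disk_cache = n_disks * 32 * KIB_PER_MIB
    fixed_overhead = 200 * KIB_PER_MIB
    return scaled + disk_cache + fixed_overhead


# Example: a 4 GiB guest with 16 MiB of video memory and two disks.
limit = relaxed_hard_limit_kib(4096 * KIB_PER_MIB, 16 * KIB_PER_MIB, 2)
print(limit // KIB_PER_MIB, "MiB")
```

For that example the terms are 1.5 * 4112 MiB = 6168 MiB, plus 64 MiB for the two disks, plus 200 MiB, so the computed limit is 6432 MiB.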

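The cgroup layout described in commits db44eb1b5f and aed4986322 places each guest at $PARTITION/$VMNAME.{lxc,qemu}.libvirt, with /machine as the default partition after the second commit. A minimal sketch of that naming rule follows; the function name and its validation are hypothetical, and any name escaping libvirt may apply is not modelled here.

```python
def vm_cgroup_path(vm_name, driver, partition="/machine"):
    """Hypothetical helper: build the cgroup path for a guest.

    `partition` stands in for the <partition> element of the domain XML;
    the "/machine" default follows commit aed4986322 (it was "/system"
    in db44eb1b5f).
    """
    if driver not in ("qemu", "lxc"):
        raise ValueError("driver must be 'qemu' or 'lxc'")
    if not partition.startswith("/"):
        # A resource partition is an absolute cgroup path.
        raise ValueError("partition must be an absolute path")
    return f"{partition.rstrip('/')}/{vm_name}.{driver}.libvirt"


print(vm_cgroup_path("web01", "qemu"))
print(vm_cgroup_path("web01", "lxc", "/virtualmachines/production"))
```

The second call corresponds to the XML example in the commit message, where a partition of /virtualmachines/production puts the guest under that path instead of the default.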

84 Commits