libvirt

mirror of https://gitlab.com/libvirt/libvirt.git synced 2025-01-09 06:25:19 +00:00

Author	SHA1	Message	Date
Peter Krempa	88cac66d92	conf: Make tri-state feature options more universal The apic-eoi feature enum and implementation can be made more universal to allow re-use of the enum for other features.	2012-10-18 12:22:49 +02:00
Michal Privoznik	998dc17da3	qemu: Correctly wait for spice to migrate Currently we query-spice after the main migration has completed before moving to next state. Qemu reports this as boolean (not enclosed within quotes). Therefore it is not correct to use virJSONValueObjectGetString but virJSONValueObjectGetBoolean instead.	2012-10-18 10:31:56 +02:00
Viktor Mihajlovski	1916679506	qemu: Fixed default machine detection in qemuCapsParseMachineTypesStr The machine in the last output line of <qemu-binary> -M ? was always reported as default machine even if this wasn't the actual default. Trivial fix. Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>	2012-10-17 17:24:41 -06:00
Martin Kletzander	ba63d8f7d8	qemu: Pin the emulator when only cpuset is specified According to our recent changes (clarifications), we should be pinning qemu's emulator processes using the <vcpu> 'cpuset' attribute in case there is no <emulatorpin> specified. This however doesn't work entirely as expected and this patch should resolve all the remaining issues.	2012-10-17 17:37:10 +02:00
Jiri Denemark	837993d845	qemu: Clear async job when p2p migration fails early When p2p migration fails early because qemuMigrationIsAllowed or qemuMigrationIsSafe say migration should be cancelled, we fail to clear the migration-out async job. As a result of that, further APIs called for the same domain may fail with Timed out during operation: cannot acquire state change lock. Reported by Guido Winkelmann.	2012-10-17 15:43:38 +02:00
Doug Goldstein	1e7ec88d9a	interface: add virInterfaceGetXMLDesc() in udev Added support for retrieving the XML defining a specific interface via the udev based backend to virInterface. Implement the following APIs for the udev based backend: * virInterfaceGetXMLDesc() Note: Does not support bond devices.	2012-10-17 13:59:16 +02:00
Michal Privoznik	740225a1cb	AUTHORS: Remove double entry I've accidentally added Li Zhang <zhlcindy@linux.vnet.ibm.com> to AUTHORS, even if he already was there.	2012-10-17 11:53:13 +02:00
Li Zhang	40f58ca75d	Doc-fix for PowerPC CPU model driver Some descriptions in the PowerPC CPU model driver are incorrect. This patch fixes them. Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com> Acked-by: Michal Privoznik <mprivozn@redhat.com>	2012-10-17 10:03:34 +02:00
Li Zhang	9943a7341c	Implement CPU model driver for PowerPC Currently, the CPU model driver is not implemented for PowerPC. The host's CPU information sometimes needs to be exposed in the guest's XML. This patch implements the callback functions of the CPU model driver. Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com> Acked-by: Michal Privoznik <mprivozn@redhat.com>	2012-10-17 10:03:34 +02:00
Li Zhang	309f03db40	Add one file cpu_ppc_data.h to define CPU data for PPC On PowerPC, the CPU version can be obtained from the PVR, so the PVR is defined in the CPU data in the cpuData structure. Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com> Acked-by: Michal Privoznik <mprivozn@redhat.com>	2012-10-17 10:03:34 +02:00
Guannan Ren	d37a3a1d6c	selinux: remove unused variables in socket labelling	2012-10-17 13:13:17 +08:00
Guannan Ren	89b63f0ad4	selinux: fix wrong tapfd relabelling It should relabel tapfd of virtual network of type VIR_DOMAIN_NET_TYPE_DIRECT rather than VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE (commit `ae368ebfcc` introduced this bug) Caution: The context of the two hunks is identical other than indentation. Please be extremely cautious of where the patch gets applied.	2012-10-17 13:13:14 +08:00
Cole Robinson	9f0e9cba27	storage: lvm: lvcreate fails with allocation=0, don't do that On F17 at least, this command fails: $ sudo /usr/sbin/lvcreate --name sparsetest -L 0K --virtualsize 16384K vgvirt Unable to create new logical volume with no extents Which is unfortunate since allocation=0 is what virt-manager tries to use by default. Rather than telling the user 'don't do that', let's just give them the smallest allocation possible if alloc=0 is requested. https://bugzilla.redhat.com/show_bug.cgi?id=866481	2012-10-16 21:16:44 -04:00
Cole Robinson	01df6f2bff	storage: lvm: Don't overwrite lvcreate errors Before: $ sudo virsh vol-create-as --pool vgvirt sparsetest --capacity 16M --allocation 0 error: Failed to create vol sparsetest error: internal error Child process (/usr/sbin/lvchange -aln vgvirt/sparsetest) unexpected exit status 5: One or more specified logical volume(s) not found. After: $ sudo virsh vol-create-as --pool vgvirt sparsetest --capacity 16M --allocation 0 error: Failed to create vol sparsetest error: internal error Child process (/usr/sbin/lvcreate --name sparsetest -L 0K --virtualsize 16384K vgvirt) unexpected exit status 5: Unable to create new logical volume with no extents	2012-10-16 21:16:44 -04:00
Jiri Denemark	3143c81ca1	spec: Require newer sanlock on recent distros 2 The previous commit was incomplete. We need to also add explicit Requires for the newer version since RPM's automatic dependencies won't work with sanlock.	2012-10-17 00:00:47 +02:00
Peter Krempa	cb4f41b8d0	spec: Add runtime requirement for libssh2 libssh2 unfortunately doesn't support symbol versioning so RPM can't figure out what version is needed for the currently installed libvirt package. This patch adds a runtime requirement, so that the correct version of libssh2 can be installed along with libvirt.	2012-10-16 22:47:04 +02:00
Jiri Denemark	48bf62fde1	spec: Require newer sanlock on recent distros Make sure libvirt is built with sanlock >= 2.4 on distros that are new enough to provide it.	2012-10-16 21:32:07 +02:00
Jiri Denemark	5ce6d95eed	locking: Fix build with sanlock < 2.4 libvirt started using sanlock_killpath to implement on_lockfailure action. Since sanlock_killpath was introduced in sanlock 2.4, libvirt fails to build with older sanlock.	2012-10-16 21:32:05 +02:00
Daniel P. Berrange	7bd744c401	Fix typo in previous commit s/lik/like/ Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 16:37:50 +01:00
Daniel P. Berrange	d507f8f9b9	Make virInitialize thread safe Currently there is a restriction that multi-threaded applications must manually call virInitialize, before threads start using libvirt, because it is not thread-safe. By switching it to use a virOnceControl initializer we gain thread safety, and thus applications no longer need to manually call it. They can rely on virConnectOpen invoking it for them. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 16:33:38 +01:00
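
The once-initializer idea in the commit above can be sketched with plain POSIX threads. This is only an illustration of the pattern, assuming `pthread_once` as the underlying primitive; the names `lib_initialize`, `do_global_init` and `lib_init_call_count` are hypothetical and are not libvirt functions.

```c
#include <pthread.h>

/* Hypothetical names throughout: this shows the pattern, not libvirt's code. */
static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls;   /* how many times the real initialization ran */
static int init_status;  /* result recorded by the one-time initializer */

/* Runs at most once per process, even with concurrent callers. */
static void do_global_init(void)
{
    init_calls++;
    init_status = 0;     /* pretend the global setup succeeded */
}

/* Safe to call from any thread, any number of times. */
int lib_initialize(void)
{
    if (pthread_once(&init_once, do_global_init) != 0)
        return -1;
    return init_status;
}

/* Exposed only so the behaviour can be checked. */
int lib_init_call_count(void)
{
    return init_calls;
}
```

With this shape, an entry point such as a connection-open call can invoke `lib_initialize()` unconditionally, so callers no longer need to do it by hand before starting threads.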
Daniel P. Berrange	84912e9c91	Fix virProcessKillPainfully on Win32 Win32 platforms don't have SIGKILL defined, but they do have SIGABRT. Since our virProcess wrapper treats anything which isn't SIGTERM/SIGINT as equivalent to SIGKILL, just use SIGABRT on Win32. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 15:47:14 +01:00
Daniel P. Berrange	381a339e98	Add JSON serialization of virNetServerPtr objects for process re-exec() Add two new APIs virNetServerNewPostExecRestart and virNetServerPreExecRestart which allow a virNetServerPtr object to be created from a JSON object and saved to a JSON object, for the purpose of re-exec'ing a process. This includes serialization of all registered services and clients Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 15:45:55 +01:00
Daniel P. Berrange	3cfc3d7d2c	Add JSON serialization of virNetServerClientPtr objects for process re-exec() Add two new APIs virNetServerClientNewPostExecRestart and virNetServerClientPreExecRestart which allow a virNetServerClientPtr object to be created from a JSON object and saved to a JSON object, for the purpose of re-exec'ing a process. This includes serialization of the connected socket associated with the client Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 15:45:55 +01:00
Daniel P. Berrange	0cc7925520	Add JSON serialization of virNetServerServicePtr objects for process re-exec() Add two new APIs virNetServerServiceNewPostExecRestart and virNetServerServicePreExecRestart which allow a virNetServerServicePtr object to be created from a JSON object and saved to a JSON object, for the purpose of re-exec'ing a process. This includes serialization of the listening sockets associated with the service Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 15:45:55 +01:00
Daniel P. Berrange	c298145344	Add JSON serialization of virNetSocketPtr objects for process re-exec() Add two new APIs virNetSocketNewPostExecRestart and virNetSocketPreExecRestart which allow a virNetSocketPtr object to be created from a JSON object and saved to a JSON object, for the purpose of re-exec'ing a process. As well as saving the state in JSON format, the second method will disable the O_CLOEXEC flag so that the open file descriptors are preserved across the process re-exec() Since it is not possible to serialize SASL or TLS encryption state, an error will be raised if attempting to perform serialization on non-raw sockets Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 15:45:55 +01:00
Daniel P. Berrange	8057c04e8d	Add JSON serialization of virLockSpacePtr objects for process re-exec() Add two new APIs virLockSpaceNewPostExecRestart and virLockSpacePreExecRestart which allow a virLockSpacePtr object to be created from a JSON object and saved to a JSON object, for the purposes of re-exec'ing a process. As well as saving the state in JSON format, the second method will disable the O_CLOEXEC flag so that the open file descriptors are preserved across the process re-exec() Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 15:45:55 +01:00
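
The commits above mention disabling close-on-exec so that open file descriptors survive the re-exec. A minimal sketch of that single step using only POSIX `fcntl()` follows; the helper names `fd_keep_across_exec` and `fd_is_cloexec` are made up for illustration and do not come from libvirt.

```c
#include <fcntl.h>

/* Clear the close-on-exec flag so fd stays open across exec().
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int fd_keep_across_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Report whether close-on-exec is set: 1 yes, 0 no, -1 on error. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

The descriptor number would then be recorded in the saved state (JSON in the commits above) so that the re-exec'd process knows which inherited descriptor to adopt.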
Daniel P. Berrange	eca72d4759	Introduce an internal API for handling file based lockspaces The previously introduced virFile{Lock,Unlock} APIs provide a way to acquire/release fcntl() locks on individual files. For some unknown reason, though, the POSIX spec says that fcntl() locks are released when any file handle referring to the same path is closed. In the following sequence threadA: fd1 = open("foo") threadB: fd2 = open("foo") threadA: virFileLock(fd1) threadB: virFileLock(fd2) threadB: close(fd2) you'd expect threadA to come out holding a lock on 'foo', and indeed it does hold a lock for a very short time. Unfortunately when threadB does close(fd2) this releases the lock associated with fd1. For the current libvirt use case for virFileLock - pidfiles - this doesn't matter since the lock is acquired at startup while single threaded and never released until exit. To provide a more generally useful API though, it is necessary to introduce a slightly higher level abstraction, which is to be referred to as a "lockspace". This is to be provided by a virLockSpacePtr object in src/util/virlockspace.{c,h}. The core idea is that the lockspace keeps track of what files are already open+locked. This means that when a 2nd thread comes along and tries to acquire a lock, it doesn't end up opening and closing a new FD. The lockspace just checks the current list of held locks and immediately returns VIR_ERR_RESOURCE_BUSY. NB, the API as it stands is designed on the basis that the files being locked are not being otherwise opened and used by the application code. One approach to using this API is to acquire locks based on a hash of the filepath.
eg to lock /var/lib/libvirt/images/foo.img the application might do virLockSpacePtr lockspace = virLockSpaceNew("/var/lib/libvirt/imagelocks"); lockname = md5sum("/var/lib/libvirt/images/foo.img"); virLockSpaceAcquireLock(lockspace, lockname); NB, in this example, the caller should ensure that the path is canonicalized before calculating the checksum. It is also possible to do locks directly on resources by using a NULL lockspace directory and then using the file path as the lock name eg virLockSpacePtr lockspace = virLockSpaceNew(NULL); virLockSpaceAcquireLock(lockspace, "/var/lib/libvirt/images/foo.img"); This is only safe to do though if no other part of the process will be opening the files. This will be the case when this code is used inside the soon-to-be-reposted virlockd daemon Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2012-10-16 15:45:55 +01:00
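
The core bookkeeping idea of the lockspace commit above (remember which names are held, and refuse a second acquirer without opening or closing any file) can be sketched in a few lines. This is a toy in-memory model under stated assumptions: the struct and function names are hypothetical, FNV-1a stands in for the md5sum of the commit message, and the actual fcntl() locking of a file on disk is omitted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOCKSPACE_MAX 16
#define LOCKNAME_LEN 17   /* 16 hex digits plus the terminating NUL */

/* Hypothetical in-memory lockspace: a table of currently held lock names. */
struct lockspace {
    char held[LOCKSPACE_MAX][LOCKNAME_LEN];
    int used[LOCKSPACE_MAX];
};

/* Derive a fixed-size lock name from an already canonicalized path.
 * FNV-1a is an assumption made for brevity, not what libvirt uses. */
void lockname_from_path(const char *path, char out[LOCKNAME_LEN])
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
    for (; *path; path++) {
        h ^= (unsigned char)*path;
        h *= 1099511628211ULL;               /* FNV-1a 64-bit prime */
    }
    snprintf(out, LOCKNAME_LEN, "%016llx", (unsigned long long)h);
}

/* Returns 0 on success, -1 if the name is already held (busy),
 * -2 if the table is full, -3 if the name is too long. */
int lockspace_acquire(struct lockspace *ls, const char *name)
{
    int free_slot = -1;

    if (strlen(name) >= LOCKNAME_LEN)
        return -3;
    for (int i = 0; i < LOCKSPACE_MAX; i++) {
        if (ls->used[i] && strcmp(ls->held[i], name) == 0)
            return -1;                       /* no FD is opened or closed */
        if (!ls->used[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -2;
    snprintf(ls->held[free_slot], LOCKNAME_LEN, "%s", name);
    ls->used[free_slot] = 1;
    return 0;
}

/* Returns 0 if the name was held and is now released, -1 otherwise. */
int lockspace_release(struct lockspace *ls, const char *name)
{
    for (int i = 0; i < LOCKSPACE_MAX; i++) {
        if (ls->used[i] && strcmp(ls->held[i], name) == 0) {
            ls->used[i] = 0;
            return 0;
        }
    }
    return -1;
}
```

Because the busy check happens against the table, the close()-releases-everything behaviour of fcntl() locks described in the commit never gets a chance to trigger inside the process. A multi-threaded version would also need a mutex around the table.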
Martin Kletzander	60f96bfc88	tests: Fix domain-events python test There was a missing method in python implementation of domain-events test and this patch adds that.	2012-10-16 16:37:29 +02:00
Eric Blake	819c8ce043	maint: prepare for next release number Given Daniel's announcement[1], code targeting the next release will be in 1.0.0, not 0.10.3. Changed mechanically with: for f in $(git grep -l '0\.10\.3\b') ; do sed -i -e 's/0\.10\.3/1.0.0/g' $f done [1]https://www.redhat.com/archives/libvir-list/2012-October/msg00403.html * docs/formatdomain.html.in: Use 1.0.0 for next release. * src/interface/interface_backend_udev.c: Likewise.	2012-10-16 08:09:01 -06:00
Eric Blake	1c3fee6abc	maint: fix license on polkit script As approved here: https://www.redhat.com/archives/libvir-list/2012-October/msg00701.html * daemon/libvirtd.policy.in: Use LGPLv2+ license.	2012-10-16 08:09:01 -06:00
Martin Kletzander	59952932f5	conf: add test for boot dev and order Add test for `280b8c9e7c`.	2012-10-16 12:25:32 +02:00
Martin Kletzander	280b8c9e7c	conf: Fix crash with cleanup There was a crash possible when both <boot dev... and <boot order... were specified due to virDomainDefParseBootXML() erroring out before setting *tmp (which was free'd in cleanup). As a fix, I created this cleanup that uses one pointer for all the temporary stored XPath strings and values, plus this pointer is correctly initialized to NULL.	2012-10-16 11:15:04 +02:00
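
The fix described above (one scratch pointer, initialized to NULL, freed in a shared cleanup path) is a common C idiom. The following is a rough sketch of that idiom only; `parse_boot` and `dup_string` are hypothetical stand-ins and do not reproduce virDomainDefParseBootXML().

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate a string with malloc; returns NULL on allocation failure. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical parser: accepts either attribute but rejects both together.
 * 'tmp' is the single scratch pointer. It starts as NULL and is reset after
 * every free, so the cleanup path is safe wherever the function bails out. */
int parse_boot(const char *dev, const char *order, size_t *total_len)
{
    char *tmp = NULL;
    int ret = -1;

    *total_len = 0;

    if (dev && order)
        goto cleanup;           /* early error: tmp is still NULL */

    if (dev) {
        if (!(tmp = dup_string(dev)))
            goto cleanup;
        *total_len += strlen(tmp);
        free(tmp);
        tmp = NULL;             /* avoid a double free in cleanup */
    }

    if (order) {
        if (!(tmp = dup_string(order)))
            goto cleanup;
        *total_len += strlen(tmp);
        /* tmp is released by the shared cleanup below */
    }

    ret = 0;

 cleanup:
    free(tmp);                  /* free(NULL) is a no-op */
    return ret;
}
```

Had `tmp` been left uninitialized, the early `goto cleanup` would have passed a garbage pointer to `free()`, which is the kind of crash the commit describes.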
Martin Kletzander	6676c1fc8f	selinux: Use raw contexts 2 In commit `9674f2c637`, I forgot to change selabel_lookup with the other functions, so this one-liner does exactly that.	2012-10-16 10:30:18 +02:00
Eric Blake	2cfa14bc8a	maint: drop spurious semicolons Detected with: git grep ';;$' -- '*/.[ch]' * src/network/bridge_driver.c (networkRadvdConfContents): Fix harmless typo. * src/phyp/phyp_driver.c (phypUUIDTable_Pull): Likewise. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveDel): Likewise.	2012-10-15 09:08:19 -06:00
Guannan Ren	ae368ebfcc	selinux: add security selinux function to label tapfd BZ:https://bugzilla.redhat.com/show_bug.cgi?id=851981 When using macvtap, the kernel first creates a character device named /dev/tapN with the selinux context: system_u:object_r:device_t:s0 Shortly afterwards, when udev is notified that a new file was created in /dev, it relabels the file to the expected default context: system_u:object_r:tun_tap_device_t:s0 There is a time gap between the two steps. During that gap, migration sometimes fails with an AVC error message: type=AVC msg=audit(1349858424.233:42507): avc: denied { read write } for pid=19926 comm="qemu-kvm" path="/dev/tap33" dev=devtmpfs ino=131524 scontext=unconfined_u:system_r:svirt_t:s0:c598,c908 tcontext=system_u:object_r:device_t:s0 tclass=chr_file This patch labels the tapfd device before the qemu process starts: system_u:object_r:tun_tap_device_t:MCS(MCS from seclabel->label)	2012-10-15 21:01:07 +08:00
Martin Kletzander	7ba5defb5a	Add support for SUSPEND_DISK event This patch adds support for SUSPEND_DISK event; both lifecycle and separated. The support is added for QEMU, machines are changed to PMSUSPENDED, but as QEMU sends SHUTDOWN afterwards, the state changes to shut-off. This and much more needs to be done in order for libvirt to work with transient devices, wake-ups etc. This patch is not aiming for that functionality.	2012-10-15 12:09:10 +02:00
Ján Tomko	a9e3b4f78e	util: switch virLogEatParams to virLogSource Commit `e8fd8757c8` changed 'const char *' category to virLogSource enum. This changes it in virLogEatParams as well, thus fixing the build with --disable-debug. -- Hopefully moving the enum declarations is less ugly than using int.	2012-10-15 11:13:43 +02:00
Osier Yang	f81f0f2f1d	node_memory: Add new parameter field to tune the new sysfs knob Upstream kernel introduced new sysfs knob "merge_across_nodes" to specify if pages from different numa nodes can be merged. When set to 0, only pages which physically reside in the memory area of same NUMA node can be merged. When set to 1, pages from all nodes can be merged. This patch supports the tuning by adding new param field "shm_merge_across_nodes".	2012-10-15 17:35:54 +08:00
Laine Stump	6bde0a1a37	qemu: reorganize qemuDomainChangeNet and qemuDomainChangeNetBridge This patch resolves: https://bugzilla.redhat.com/show_bug.cgi?id=805071 to the extent that it can be resolved with current qemu functionality. It attempts to detect as many situations as possible when the simple operation of disconnecting an existing tap device from one bridge and attaching it to another will satisfy the change requested in virDomainUpdateDeviceFlags() for a network device. Before this patch, that situation could only be detected if the pre-change interface and the post-change interface definition were both "type='bridge'". After this patch, it can also be detected if the before or after interfaces are any combination of type='bridge' and type='network' (the networks can be <forward mode='nat\|route\|bridge'>, as long as they use a Linux host bridge and not macvtap connections). This extra effort is especially useful since the recent discovery that a netdev_del+netdev_add combo (to reconnect the network device with completely different hostside configuration) doesn't work properly with current qemu (1.2) unless it is accompanied by the matching device_del+device_add - see this mailing list message for details: http://lists.nongnu.org/archive/html/qemu-devel/2012-10/msg02355.html (A slight modification of the patch referenced there has been prepared to apply on top of this patch, but won't be pushed until qemu can be made to work with it.) * qemuDomainChangeNet needs access to the virDomainDeviceDef that holds the new netdef (so that it can clear out the virDomainDeviceDef if it ends up using the NetDef to replace the original), so the virDomainNetDefPtr arg is replaced with a virDomainDeviceDefPtr. * qemuDomainChangeNet previously checked for some changes to the interface config, but this check was by no means complete. It was also a bit disorganized. 
This refactoring of the code is (I believe) complete in its check of all NetDef attributes that might be changed, and either returns a failure (for changes that are simply impossible), or sets one of three flags: needLinkStateChange - if the device link state needs to go up/down needBridgeChange - if everything else is the same, but it needs to be connected to a different Linux host bridge needReconnect - if the entire host side of the device needs to be torn down and reconstructed (currently non-working, as mentioned above) Note that this function will refuse to make any change that requires the guest side of the device to be detached (e.g. changing the PCI address or mac address). Those would be disruptive enough to the guest that it's reasonable to require an explicit detach/attach sequence from the management application. * As mentioned above, qemuDomainChangeNet also does its best to understand when a simple change in attached bridge for the existing tap device will work vs. the need to completely tear down/reconstruct the host side of the device (including tap device). This patch does not implement the "reconnect" code anyway - there is a placeholder that turns that into an error. Rather, the purpose of this patch is to replicate existing behavior with code that is ready to have that functionality plugged in in a later patch. * The expanded uses for qemuDomainChangeNetBridge meant that it needed to be enhanced as well - it no longer replaces the original brname string in olddev with the new brname; instead, it relies on the caller to replace the entire olddev with newdev (since we've gone to great lengths to assure they are functionally identical other than the name of the bridge, this is now not only safe, but more correct). Additionally, qemuDomainChangeNetBridge can now set the bridge for type='network' interfaces as well as plain type='bridge' interfaces.
(Note that I had to make this change simultaneous to the reorganization of qemuDomainChangeNet because the two are too closely intertwined to separate).	2012-10-15 04:36:39 -04:00
Guido Günther	dc9d7a171c	Avoid straying </cpuset> by using the same condition as for the <cpuset>. Fixes "make check" found by http://honk.sigxcpu.org:8001/job/libvirt-check/160/	2012-10-15 17:14:25 +08:00
Laine Stump	11c47d979c	conf: virDomainDeviceInfoCopy utility function This does a shallow copy of all the bits, then strdups the two items that are actually allocated separately.	2012-10-15 04:03:06 -04:00
Laine Stump	310945597c	conf: fix virDevicePCIAddressEqual args This function really should have been taking virDevicePCIAddress* instead of the inefficient virDevicePCIAddress (results in copying two entire structs onto the stack rather than just two pointers), and returning a bool true/false (not matching is not necessarily a "failure", as a -1 return would imply, and also using "if (!virDevicePCIAddressEqual(x, y))" to mean "if x == y" is just a bit counterintuitive).	2012-10-15 04:03:06 -04:00
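
The signature change argued for above (pass pointers instead of whole structs, return a bool instead of 0/-1) might look roughly like this. The struct layout and the name `pci_addr_equal` are illustrative assumptions, not the real virDevicePCIAddress definition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PCI address; field names are illustrative. */
struct pci_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
};

/* Takes pointers, so only two addresses are passed rather than two struct
 * copies, and returns true when the addresses match. */
bool pci_addr_equal(const struct pci_addr *a, const struct pci_addr *b)
{
    return a->domain == b->domain &&
           a->bus == b->bus &&
           a->slot == b->slot &&
           a->function == b->function;
}
```

Call sites then read naturally: `if (pci_addr_equal(&x, &y))` means the two addresses are equal, with no inverted 0-means-match convention to remember.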
Guido Günther	a2b80edbc6	Fix tab vs space that broke "make syntax-check" found by http://honk.sigxcpu.org:8001/job/libvirt-syntax-check/157/ Pushed under the build breaker rule.	2012-10-15 09:18:18 +02:00
Osier Yang	3635b41e15	qemu: Ignore def->cpumask if emulatorpin is specified If the vcpu placement is "static", it's just fine to ignore the def->cpumask if emulatorpin is specified.	2012-10-15 12:20:37 +08:00
Osier Yang	5378effd57	conf: Ignore emulatorpin if vcpu placement is auto When vcpu placement is "auto", the domain process is pinned to the advisory nodeset obtained by querying numad, while emulatorpin overrides that pinning. Both therefore set the pinning policy for the domain process, but they conflict with each other. This patch ignores emulatorpin if vcpu placement is "auto", because <vcpu> placement can't simply be ignored: <numatune> placement could default to it.	2012-10-15 12:19:54 +08:00
Osier Yang	0df1a79089	qemu: Initialize cpuset for hotplugged vcpu as def->cpuset The onlined vcpu pinning policy should inherit def->cpuset if it's not specified explicitly, and the affinity should be set in this case. Oppositely, the offlined vcpu pinning policy should be free()'ed.	2012-10-15 12:16:02 +08:00
Osier Yang	a9bfe887f9	qemu: Create or remove cgroup when doing vcpu hotplugging Various APIs use cgroup to either set or get the statistics of host or guest. Hotplugging or hot unplugging vcpus without creating or removing the cgroup for the vcpus could cause problems for those APIs. E.g. % virsh vcpucount dom maximum config 10 maximum live 10 current config 1 current live 1 % virsh setvcpu dom 2 % virsh schedinfo dom --set vcpu_quota=1000 Scheduler : posix error: Unable to find vcpu cgroup for rhel6.2(vcpu: 1): No such file or directory This patch fixes the problem by creating cgroups for each of the onlined vcpus, and destroying cgroups for each of the offlined vcpus.	2012-10-15 12:15:32 +08:00
Osier Yang	10f8a45deb	conf: Initialize the pinning policy for vcpus Document for <vcpu>'s "cpuset" says: Since 0.4.4, this element can contain an optional cpuset attribute, which is a comma-separated list of physical CPU numbers that virtual CPUs can be pinned to. However, this is not accurate: libvirt actually pins the domain process to the pCPUs specified by "cpuset" of <vcpu>, and the vcpu threads are pinned to all available pCPUs if no <vcpupin> is specified for them. This patch implements the code to inherit <vcpu>'s "cpuset" for any vcpu that doesn't have <vcpupin> specified, and <vcpupin> for these vcpus will be ignored when formatting. The underlying driver implementation will make sure the vcpu threads are pinned to the correct pCPUs.	2012-10-15 12:14:22 +08:00
Osier Yang	60b176c3d0	conf: Ignore vcpupin for not onlined vcpus when parsing Setting pinning policy for vcpu which exceeds current vcpus number just makes no sense, however, it could cause various problems, E.g. <vcpu current='1'>4</vcpu> <cputune> <vcpupin vcpuid='3' cpuset='4'/> </cputune> % virsh start linux error: Failed to start domain linux error: cannot set CPU affinity on process 32534: No such process We must have some odd codes underlying which produces the "on process 32534", but the point is why we not to prevent earlier when parsing? Note that this is only one of the problem it could cause. This patch is to ignore the <vcpupin> for not onlined vcpus.	2012-10-15 12:13:57 +08:00
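
The parsing rule from the commit above (drop any <vcpupin> whose vcpuid is not below the current vcpu count) can be illustrated with a small filter. This is a sketch under assumptions: `struct vcpu_pin` and `drop_offline_pins` are hypothetical names, and the real parser works on libvirt's own definition structures.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one <vcpupin> element. */
struct vcpu_pin {
    unsigned int vcpuid;
    const char *cpuset;
};

/* Compact 'pins' in place, keeping only entries that refer to an online
 * vcpu (vcpuid < current). Returns the number of entries kept. */
size_t drop_offline_pins(struct vcpu_pin *pins, size_t npins,
                         unsigned int current)
{
    size_t kept = 0;

    for (size_t i = 0; i < npins; i++) {
        if (pins[i].vcpuid < current)
            pins[kept++] = pins[i];
    }
    return kept;
}
```

Applied to the example in the commit message (`<vcpu current='1'>4</vcpu>` with a pin for vcpuid 3), the pin is discarded at parse time, so nothing later tries to set affinity for a vcpu thread that does not exist.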
Osier Yang	f108944ae0	doc: Sort out the relationship between <vcpu>, <vcpupin>, and <emulatorpin> These 3 elements conflict with each other in either the doc or the underlying code. Current problems: Problem 1: The doc shouldn't simply say "These settings are superseded by CPU tuning. " for element <vcpu>, because apart from the tuning, <vcpu> allows specifying the current and maximum vcpu number. <vcpu> also allows specifying the placement as "auto", which binds the domain process to the advisory nodeset from numad. Problem 2: Doc for <vcpu> says its "cpuset" specifies the physical CPUs that the vcpus can be pinned to. But that is not accurate, as it actually only pins the domain process to the specified physical CPUs. So it's either a document bug or a code bug. Problem 3: Doc for <vcpupin> says it supersedes "cpuset" of <vcpu>. That's not quite correct, as each <vcpupin> specifies the pinning policy for only one vcpu. How about the ones which don't have <vcpupin> specified? It says the vcpu will be pinned to all available physical CPUs, but what's the meaning of attribute "cpuset" of <vcpu> then? Problem 4: Doc for <emulatorpin> says it pins the emulator threads (domain process in other context, perhaps another follow up patch to cleanup the inconsistency is needed) to the physical CPUs specified by its attribute "cpuset", which conflicts with <vcpu>'s "cpuset". And actually in the underlying code, it sets the affinity for the domain process twice if both "cpuset" for <vcpu> and <emulatorpin> are specified, and <emulatorpin>'s pinning will override <vcpu>'s. Problem 5: When "placement" of <vcpu> is "auto" (i.e. uses numad to get the advisory nodeset to which the domain process is pinned), it will also be overridden by <emulatorpin>. This patch tries to sort out the conflicts or bugs by: 1) Don't say <vcpu> is superseded by <cputune> 2) Keep the semantics for "cpuset" of <vcpu> (i.e. still say it specifies the physical CPUs the virtual CPUs can be pinned to).
But modify it to mention that it also sets the pinning policy for the domain process, that the CPU placement of the domain process specified by "cpuset" of <vcpu> will be ignored if <emulatorpin> is specified, and that similarly the CPU placement of a vcpu thread will be ignored if it has <vcpupin> specified; a vcpu which doesn't have <vcpupin> specified inherits "cpuset" of <vcpu>. 3) Don't say <vcpu> is superseded by <vcpupin>. If neither <vcpupin> nor "cpuset" of <vcpu> is specified, the vcpu will be pinned to all available pCPUs. 4) If neither <emulatorpin> nor "cpuset" of <vcpu> is specified, the domain process (emulator threads in the context) will be pinned to all available pCPUs. 5) If "placement" of <vcpu> is "auto", <emulatorpin> is not allowed. 6) Hotplugged vcpus will also inherit "cpuset" of <vcpu>. Code changes according to the above document changes: 1) Inherit def->cpumask for each vcpu which doesn't have <vcpupin> specified, during parsing. 2) Pin each vcpu which doesn't have <vcpupin> specified to def->cpumask, either by cgroup or sched_setaffinity(2); this is actually done by 1). 3) Error out if "placement" == "auto" and <emulatorpin> is specified. Otherwise, <emulatorpin> is honored, and "cpuset" of <vcpu> is ignored. 4) Set up cgroup for each hotplugged vcpu, and set up the pinning policy by either cgroup or sched_setaffinity(2). 5) Remove cgroup and <vcpupin> for each hot unplugged vcpu. Patches are following (6 in total except this patch)	2012-10-15 12:13:34 +08:00


11222 Commits