Commit id '90b721e43' moved the virCgroupAddTask call to after the vcpupin
checks. However, in doing so it missed a case: if the cpumap didn't exist,
the code would continue back to the top of the vcpu loop, and as a result
virCgroupAddTask was never called.
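A minimal sketch of the fixed loop shape, with hypothetical helper names
rather than libvirt's: the task is attached to its cgroup before the cpumap
check, so a missing cpumap no longer skips it.

#include <stddef.h>
#include <sys/types.h>

typedef struct {
    pid_t pid;               /* vcpu thread id */
    unsigned char *cpumap;   /* NULL when no <vcpupin> was given */
} demo_vcpu;

int demo_add_to_cgroup(demo_vcpu *vcpu);   /* hypothetical virCgroupAddTask analogue */
int demo_pin(demo_vcpu *vcpu);             /* hypothetical vcpupin analogue */

static int
demo_setup_vcpus(demo_vcpu *vcpus, size_t nvcpus)
{
    size_t i;

    for (i = 0; i < nvcpus; i++) {
        /* always attach the thread to its cgroup first ... */
        if (demo_add_to_cgroup(&vcpus[i]) < 0)
            return -1;

        /* ... so that a missing cpumap only skips the pinning step */
        if (!vcpus[i].cpumap)
            continue;

        if (demo_pin(&vcpus[i]) < 0)
            return -1;
    }
    return 0;
}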
Signed-off-by: John Ferlan <jferlan@redhat.com>
This reverts commit a41c00b472.
After much testing and upstream discussion this has been deemed to be
the incorrect operation since it means we no longer have any guarantee
about which resource controllers the QEMU processes in general are in.
Moving tasks to cgroups implied sched_setaffinity. Changing the cpus in
a set implies the same for all tasks in the group.
The old code put the thread into the cpuset inherited from the machine
cgroup, which allowed it to run outside of vcpupin for a short while.
Signed-off-by: Henning Schild <henning.schild@siemens.com>
The machine cgroup is a superset, a parent of the emulator and vcpuX cgroups.
The parent cgroup should never have any tasks directly in it. In fact, the
parent cpuset might contain far more cpus than the sum of emulatorpin and
vcpupins, so putting tasks in the superset will allow them to run outside
of <cputune>.
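For illustration only, with an example path rather than the one libvirt
computes: a thread is attached to a leaf (per-vcpu) cgroup, never to the
machine parent, so it stays inside the <cputune> cpus.

#include <stdio.h>
#include <sys/types.h>

static int
demo_attach_to_vcpu0(pid_t tid)
{
    const char *tasks = "/sys/fs/cgroup/cpuset/machine/vm1/vcpu0/tasks";
    FILE *fp = fopen(tasks, "w");

    if (!fp)
        return -1;
    fprintf(fp, "%d\n", (int)tid);   /* the kernel moves the thread on write */
    return fclose(fp) == 0 ? 0 : -1;
}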
Signed-off-by: Henning Schild <henning.schild@siemens.com>
virCgroupNewMachine used to add the pidleader to the newly created
machine cgroup. Do not do this implicitly anymore.
Signed-off-by: Henning Schild <henning.schild@siemens.com>
Add qemuDomainHasVCpuPids to do the checking and replace the in-place checks
with it.
We no longer need to check whether the thread data is fake
(vcpupids[0] == vm->pid), since that was removed in b07f3d821d
and 65686e5a81.
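A sketch of the shape such a helper can take, using stand-in types instead of
libvirt's private structs:

#include <stdbool.h>

/* Stand-in for the relevant part of the qemu domain private data. */
struct demo_vcpu_priv {
    int nvcpupids;    /* number of vcpu thread ids reported by the monitor */
    int *vcpupids;
};

/* True only when real per-vcpu thread ids are available. */
static bool
demo_has_vcpu_pids(const struct demo_vcpu_priv *priv)
{
    return priv->nvcpupids > 0;
}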
Since commit 0c04906fa the check for priv->cgroup doesn't make sense, as the
calls to virCgroupHasController return the same information. Remove it and
move its comment partially to the new check.
The already spurious check was also later copied to the iothreads code.
The domain definition is not needed in any of these functions.
Only pass it to qemuSetupChardevCgroup, which is used as a callback
for virDomainChrDefForeach.
Use the right type for passing virDomainObjPtr instead of
void* where possible.
Every single call to qemuDomainEventQueue() uses the following pattern:
if (event)
qemuDomainEventQueue(driver, event);
Let's move the check for a valid event into qemuDomainEventQueue and
simplify all callers.
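A minimal sketch of the resulting shape, with hypothetical stand-in types:

typedef struct demo_event demo_event;     /* stand-ins, not libvirt types */
typedef struct demo_driver demo_driver;

void demo_event_state_queue(demo_driver *driver, demo_event *event);  /* assumed */

/* The NULL check now lives in one place ... */
void
demo_event_queue(demo_driver *driver, demo_event *event)
{
    if (!event)
        return;
    demo_event_state_queue(driver, event);
}

/* ... so every caller shrinks to a plain:
 *     demo_event_queue(driver, event);
 * with no surrounding "if (event)". */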
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The problem here is that there are some values the kernel accepts but does
not actually set, for example 18446744073709551615, which acts the same way
as zero. Let's do the same thing we do with other tuning options
and re-read them right after they are set in order to keep our internal
structures up-to-date.
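A sketch of the set-then-re-read idea using plain cgroupfs file I/O (the path
and field are whatever tuning file is being written):

#include <stdio.h>

/* Write a tuning value, then read back what the kernel actually stored, so
 * the internal record matches reality (e.g. a huge value that behaves like 0). */
static int
demo_set_and_reread(const char *path, unsigned long long request,
                    unsigned long long *actual)
{
    FILE *fp = fopen(path, "w");

    if (!fp)
        return -1;
    fprintf(fp, "%llu\n", request);
    fclose(fp);

    if (!(fp = fopen(path, "r")))
        return -1;
    if (fscanf(fp, "%llu", actual) != 1) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}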
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1165580
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
If cpuset is disabled or not available, libvirt must not use it. This mainly
concerns actions that do not need it and can use sched_setaffinity() or
numa_set_membind() instead, because otherwise they would fail without good
reason.
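A self-contained example of the affinity fallback, pinning the current thread
with plain sched_setaffinity() and no cgroup involvement:

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to one cpu without touching any cgroup. */
static int
demo_pin_self(int cpu)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0 /* current thread */, sizeof(mask), &mask);
}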
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1244664
Signed-off-by: Luyao Huang <lhuang@redhat.com>
The scope name, even according to our docs, is
"machine-$DRIVER\x2d$VMNAME.scope". virSystemdMakeScopeName would use the
resource partition name instead of "machine-" if one was specified, thus
creating invalid scope paths.
This makes libvirt drop cgroups for a VM that uses a custom resource
partition upon reconnecting, since the detected scope name would not
match the expected name generated by virSystemdMakeScopeName.
The error is exposed by the following log entry:
debug : virCgroupValidateMachineGroup:302 : Name 'machine-qemu\x2dtestvm.scope' for controller 'cpu' does not match 'testvm', 'testvm.libvirt-qemu' or 'machine-test-qemu\x2dtestvm.scope'
for a "/machine/test" resource and "testvm" vm.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1238570
The privileged flag will not change, while the configuration might. Make the
'privileged' flag a member of the driver again and mark it immutable. Should
that ever change, add an accessor that will group reads of the state.
Store the emulator pinning cpu mask as a pure virBitmap rather than a
virDomainPinDef, since it stores only the bitmap, and refactor
qemuDomainPinEmulator to do the same operations in a much saner way.
As a side effect, virDomainEmulatorPinAdd and virDomainEmulatorPinDel can
be removed since they don't add any value.
So far, we have not been reporting whether numatune was even defined. The
value of zero is blindly returned (which maps onto
VIR_DOMAIN_NUMATUNE_MEM_STRICT). Unfortunately, we are making
decisions based on this value. Instead, we should not only return
the correct value, but also report to the caller whether the value is valid
at all.
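A sketch of the kind of getter this implies, with demo types rather than
libvirt's API: the caller can distinguish "not defined" from "strict".

/* Demo-only enum and struct mirroring the idea. */
enum demo_numatune_mode {
    DEMO_NUMATUNE_MEM_STRICT = 0,   /* 0 is a real mode, not "unset" */
    DEMO_NUMATUNE_MEM_PREFERRED,
    DEMO_NUMATUNE_MEM_INTERLEAVE,
};

struct demo_numatune {
    int defined;                    /* was <numatune> present at all? */
    enum demo_numatune_mode mode;
};

/* Return 0 and fill *mode only when numatune was actually configured,
 * -1 otherwise, instead of silently returning the "strict" value 0. */
static int
demo_numatune_get_mode(const struct demo_numatune *nt,
                       enum demo_numatune_mode *mode)
{
    if (!nt || !nt->defined)
        return -1;
    *mode = nt->mode;
    return 0;
}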
For better viewing of this patch use '-w'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Remove the iothreadspin array from cputune and replace it with a cpumask
stored in the iothreadids list.
Adjust the test output because our printing now follows the order of the
iothreadids list.
Add 'thread_id' to the virDomainIOThreadIDDef as a means to store the
'thread_id' as returned from the live qemu monitor data.
Remove the iothreadpids list from _qemuDomainObjPrivate and replace it with
the new iothreadids 'thread_id' element.
Rather than use the default numbering scheme of 1..number of iothreads
defined for the domain, use the iothreadids list for the iothread_id.
Since the iothreadids list keeps track of the iothread_ids, these are
now used in the many places where a for loop would "know"
that the ID was "+ 1" from the array element.
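Illustration with stand-in types: the loop reads the id from the list entry
rather than deriving it from the array index.

#include <stddef.h>

/* Stand-ins for the iothreadids entries described above. */
struct demo_iothread_id {
    unsigned int iothread_id;   /* user-visible id from the iothreadids list */
    int thread_id;              /* live thread id reported by the monitor */
};

void demo_use_iothread(unsigned int iothread_id, int thread_id);  /* hypothetical consumer */

static void
demo_walk_iothreads(const struct demo_iothread_id *ids, size_t nids)
{
    size_t i;

    for (i = 0; i < nids; i++)
        /* read the id from the entry, not the old "i + 1" assumption */
        demo_use_iothread(ids[i].iothread_id, ids[i].thread_id);
}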
The new tests ensure usage of the <iothreadid> values for an exact number of
iothreads, and the usage of fewer <iothreadid> values than there are
iothreads (with the default numbering scheme used for the rest).
131,088 bytes in 16 blocks are definitely lost in loss record 2,174 of 2,176
at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x52A026F: virReallocN (viralloc.c:245)
by 0x52BFCB5: saferead_lim (virfile.c:1268)
by 0x52C00EF: virFileReadLimFD (virfile.c:1328)
by 0x52C019A: virFileReadAll (virfile.c:1351)
by 0x52A5D4F: virCgroupGetValueStr (vircgroup.c:763)
by 0x1DDA0DA3: qemuRestoreCgroupState (qemu_cgroup.c:805)
by 0x1DDA0DA3: qemuConnectCgroup (qemu_cgroup.c:857)
by 0x1DDB7BA1: qemuProcessReconnect (qemu_process.c:3694)
by 0x52FD171: virThreadHelper (virthread.c:206)
by 0x82B8DF4: start_thread (pthread_create.c:308)
by 0x85C31AC: clone (clone.S:113)
Signed-off-by: Luyao Huang <lhuang@redhat.com>
==19015== 968 (416 direct, 552 indirect) bytes in 1 blocks are definitely lost in loss record 999 of 1,049
==19015== at 0x4C2C070: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==19015== by 0x52ADF14: virAllocVar (viralloc.c:560)
==19015== by 0x5302FD1: virObjectNew (virobject.c:193)
==19015== by 0x1DD9401E: virQEMUDriverConfigNew (qemu_conf.c:164)
==19015== by 0x1DDDF65D: qemuStateInitialize (qemu_driver.c:666)
==19015== by 0x53E0823: virStateInitialize (libvirt.c:777)
==19015== by 0x11E067: daemonRunStateInit (libvirtd.c:905)
==19015== by 0x53201AD: virThreadHelper (virthread.c:206)
==19015== by 0xA1EE1F2: start_thread (in /lib64/libpthread-2.19.so)
==19015== by 0xA4EFC8C: clone (in /lib64/libc-2.19.so)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
==19015== 1,064 (656 direct, 408 indirect) bytes in 2 blocks are definitely lost in loss record 1,002 of 1,049
==19015== at 0x4C2C070: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==19015== by 0x52AD74B: virAlloc (viralloc.c:144)
==19015== by 0x52B47CA: virCgroupNew (vircgroup.c:1057)
==19015== by 0x52B53E5: virCgroupNewVcpu (vircgroup.c:1451)
==19015== by 0x1DD85A40: qemuSetupCgroupForVcpu (qemu_cgroup.c:1013)
==19015== by 0x1DDA66EA: qemuProcessStart (qemu_process.c:4844)
==19015== by 0x1DDF1807: qemuDomainObjStart (qemu_driver.c:7265)
==19015== by 0x1DDF1A66: qemuDomainCreateWithFlags (qemu_driver.c:7320)
==19015== by 0x1DDF1ACD: qemuDomainCreate (qemu_driver.c:7337)
==19015== by 0x53F87EA: virDomainCreate (libvirt-domain.c:6820)
==19015== by 0x12690A: remoteDispatchDomainCreate (remote_dispatch.h:3481)
==19015== by 0x126827: remoteDispatchDomainCreateHelper (remote_dispatch.h:3457)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Two places would call qemuPrepareCpumap() with priv->autoNodeset to
convert it to a cpuset. Remove the function and use the prepared cpuset
automatically.
When the default cpuset or automatic numa placement is used, libvirt would
place the whole parent cgroup in the specified cpuset. This made it
impossible to re-pin the vcpus to a different cpu later.
This patch pins only the vcpu threads to the default cpuset and thus
allows re-pinning them later.
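A sketch of the changed behaviour: each vcpu thread is pinned individually
instead of constraining the whole parent cgroup. The thread ids are
illustrative, and the real code may use the per-vcpu cgroup cpuset rather
than thread affinity.

#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>

static int
demo_pin_vcpu_threads(const pid_t *vcpu_tids, size_t nvcpus,
                      const cpu_set_t *mask)
{
    size_t i;

    for (i = 0; i < nvcpus; i++) {
        if (sched_setaffinity(vcpu_tids[i], sizeof(*mask), mask) < 0)
            return -1;   /* each vcpu can still be re-pinned on its own later */
    }
    return 0;
}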
The following config would fail to start:
<domain type='kvm'>
...
<vcpu placement='static' cpuset='0-1' current='2'>4</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='2-3'/>
...
This is a regression since a39f69d2b.
We've never set the cpuset.memory_migrate value to anything, keeping it at
its default. However, we allow changing cpuset.mems on a live domain.
That setting, however, doesn't have any effect on a domain unless
it's going to allocate new memory.
I managed to make 'virsh numatune' move all the memory to any node I
wanted even without disabling libnuma's numa_set_membind(), so this
should be safe to use with it as well.
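For illustration, enabling the flag via plain cgroupfs I/O (the path is an
example): with memory_migrate set to 1, a later change of cpuset.mems
migrates already-allocated pages as well, not just future allocations.

#include <stdio.h>

static int
demo_enable_memory_migrate(const char *cgroup_dir)
{
    char path[512];
    FILE *fp;

    snprintf(path, sizeof(path), "%s/cpuset.memory_migrate", cgroup_dir);
    if (!(fp = fopen(path, "w")))
        return -1;
    fputs("1\n", fp);
    return fclose(fp) == 0 ? 0 : -1;
}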
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1198497
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
As pointed out by jtomko in his review of the IOThreads pinning code:
http://www.redhat.com/archives/libvir-list/2015-March/msg00495.html
there are some comments sprinkled in indicating IOThreads were using
the same structure as the VcpuPin code...
This is the first patch of a few that will change the virDomainVcpuPin*
structures and code to just virDomainPin* - starting with the data
structure naming...
There was a mess in how we store the unlimited value for memory limits and
how we handled values provided by the user. Internally there were two
possible ways to store the unlimited value: as 0 or as
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED. Because we chose to store memory
limits as unsigned long long, we cannot use -1 to represent unlimited.
It's much easier for us to say that everything equal to or greater than
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED means unlimited and leave 0 as a valid
value, despite that it makes no sense to set a limit to 0.
Remove the unnecessary function virCompareLimitUlong. The update of the test
is to prevent 0 from being misused as unlimited in the future.
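A sketch of the resulting convention, using a stand-in threshold constant
(the real one is VIR_DOMAIN_MEMORY_PARAM_UNLIMITED from the libvirt headers):

#include <stdbool.h>

/* Stand-in threshold for illustration only. */
#define DEMO_MEM_PARAM_UNLIMITED ((unsigned long long)1 << 53)

/* A limit is "set" only when it is below the unlimited threshold;
 * 0 stays a (nonsensical but valid) limit rather than meaning unlimited. */
static bool
demo_mem_limit_is_set(unsigned long long value)
{
    return value < DEMO_MEM_PARAM_UNLIMITED;
}

/* Normalize any user-provided "unlimited-ish" value to the canonical one. */
static unsigned long long
demo_mem_limit_normalize(unsigned long long value)
{
    return value < DEMO_MEM_PARAM_UNLIMITED ? value : DEMO_MEM_PARAM_UNLIMITED;
}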
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146539
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If 'virNumaGetHostNodeset()' fails, the error path will try to free the
uninitialized pointer mem_mask. Introduced by commit af2a1f058.
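The usual shape of the fix, shown with a generic stand-in for the failing
call:

#include <stdlib.h>

int demo_get_host_nodeset(char **mask);   /* stand-in for the failing call */

/* Start from NULL so the shared error path can free() the pointer safely
 * even when the first call fails before it was ever assigned. */
static int
demo_build_mem_mask(void)
{
    char *mem_mask = NULL;

    if (demo_get_host_nodeset(&mem_mask) < 0)
        goto error;

    /* ... use mem_mask ... */
    free(mem_mask);
    return 0;

 error:
    free(mem_mask);   /* free(NULL) is a no-op */
    return -1;
}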
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
systemd-machined introduced a new method CreateMachineWithNetwork
that obsoletes CreateMachine. It expects to be given a list of
VETH/TAP device indexes for the host side device(s) associated
with a container/machine.
This falls back to the old CreateMachine method when the new
one is not supported.
Commit af2a1f05 tried to clearly separate each condition in
qemuRestoreCgroupState() for the sake of readability; however, somehow
one condition body went missing. That means the body of the next
condition was executed only if both of them were true, which is
impossible, resulting in dead code and a logic error.
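The bug shape, reduced to a generic illustration with demo conditions:

#include <stdbool.h>

void handle_a(void);   /* demo stand-ins for the two condition bodies */
void handle_b(void);

/* The committed (broken) shape: the first body is missing, so the second
 * "if" becomes the body of the first; with mutually exclusive conditions,
 * handle_b() can never run - dead code plus a logic error. */
static void
demo_broken(bool cond_a, bool cond_b)
{
    if (cond_a)
        if (cond_b)
            handle_b();
}

/* The intended shape: two independent conditions, each with its own body. */
static void
demo_fixed(bool cond_a, bool cond_b)
{
    if (cond_a)
        handle_a();
    if (cond_b)
        handle_b();
}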
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Instead of setting the value of cpuset.mems once when the domain starts
and then re-calculating the value every time we need to change the child
cgroup values, leave the parent cgroup alone and set the child data every
time a new child cgroup is created. We don't leave any tasks in the
parent group anyway. This will ease both current and future code.
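A sketch of the new flow with example paths and permissions: the value is
seeded into each newly created child cgroup, and the parent is left alone.

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a child cgroup and immediately seed its cpuset.mems from the
 * already-computed nodeset, instead of re-deriving it from the parent later. */
static int
demo_new_child_cgroup(const char *child_dir, const char *nodeset)
{
    char path[512];
    FILE *fp;

    if (mkdir(child_dir, 0755) < 0)
        return -1;

    snprintf(path, sizeof(path), "%s/cpuset.mems", child_dir);
    if (!(fp = fopen(path, "w")))
        return -1;
    fprintf(fp, "%s\n", nodeset);   /* e.g. "0" or "0-1" */
    return fclose(fp) == 0 ? 0 : -1;
}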
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
If the memory mode is specified as 'strict' with just one node, we get the
following error when starting the domain:
error: Unable to write to '$cgroup_path/cpuset.mems': Device or resource busy
The XML is configured with numatune as follows:
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
It was broken by commit 411cea638f, which moved qemuSetupCgroupForEmulator()
before the setting of cpuset.mems in qemuSetupCgroupPostInit.
The directory '$cgroup_path/emulator/' is created in
qemuSetupCgroupForEmulator, but '$cgroup_path/emulator/cpuset.mems' is not
set and keeps its default value (all nodes, such as 0-1). Then we set
'$cgroup_path/cpuset.mems' to the nodemask (in this case '0') in
qemuSetupCgroupPostInit, which must fail.
This patch makes sure '$cgroup_path/emulator/cpuset.mems' is set before
'$cgroup_path/cpuset.mems'. The approach is similar to that in
qemuDomainSetNumaParamsLive.
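A sketch of the required write order using example paths standing in for
"$cgroup_path": the child (emulator) nodeset is shrunk first, then the
parent; the reverse order is what produced the EBUSY above.

#include <stdio.h>

/* Demo helper: write a nodeset string into one cpuset.mems file. */
static int
demo_write_mems(const char *path, const char *nodeset)
{
    FILE *fp = fopen(path, "w");

    if (!fp)
        return -1;
    fprintf(fp, "%s\n", nodeset);
    return fclose(fp) == 0 ? 0 : -1;
}

static int
demo_apply_nodeset(const char *nodeset)
{
    if (demo_write_mems("/sys/fs/cgroup/cpuset/machine/vm1/emulator/cpuset.mems",
                        nodeset) < 0)
        return -1;
    return demo_write_mems("/sys/fs/cgroup/cpuset/machine/vm1/cpuset.mems",
                           nodeset);
}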
Signed-off-by: Wang Rui <moon.wangrui@huawei.com>