Commit Graph

78 Commits

Author SHA1 Message Date
Martin Kletzander
231656bbeb cgroups: Redefine what "unlimited" means wrt memory limits
Since kernel 3.12 (commit 34ff8dc08956098563989d8599840b130be81252 in
linux-stable.git in particular) the value for 'unlimited' in cgroup
memory limits changed from LLONG_MAX to ULLONG_MAX.  Due to rather
unfortunate choice of our VIR_DOMAIN_MEMORY_PARAM_UNLIMITED constant
(which we transfer as an unsigned long long in Kibibytes), we ended up
with the situation described below (applies to x86_64):

 - 2^64-1 (ULLONG_MAX) -- "unlimited" in kernel = 3.12

 - 2^63-1 (LLONG_MAX) -- "unlimited" in kernel < 3.12
 - 2^63-1024 -- our PARAM_UNLIMITED scaled to Bytes

 - 2^53-1 -- our PARAM_UNLIMITED unscaled (in Kibibytes)

This means that when any number within (2^63-1, 2^64-1] is read from
memory cgroup, we are transferring that number instead of "unlimited".
Unfortunately, changing VIR_DOMAIN_MEMORY_PARAM_UNLIMITED would break
ABI compatibility and thus we have to resort to a different solution.

With this patch every value greater than PARAM_UNLIMITED means
"unlimited".  Even though this may seem misleading, we are already in
such unclear situation when running 3.12 kernel with memory limits set
to 2^63.

One example showing most of the problems at once (with kernel 3.12.2):
 # virsh memtune asdf --hard-limit 9007199254740991 --swap-hard-limit -1
 # echo 12345678901234567890 >\
/sys/fs/cgroup/memory/machine/asdf.libvirt-qemu/memory.soft_limit_in_bytes
 # virsh memtune asdf
 hard_limit     : 18014398509481983
 soft_limit     : 12056327051986884
 swap_hard_limit: 18014398509481983

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2013-12-10 08:38:46 +01:00
Zhou Yimin
036aeca721 Cgroup: Replace 'newpath' with 'newPath'
Unifying codding style, replace 'newpath' with 'newPath'.

From: Zhou Yimin <zhouyimin@huawei.com>
2013-12-06 16:18:14 +01:00
Chen Hanxiao
521cec2aab cgroup: leave blkio cgroup value checking to kernel
The range of valid values for cgroup tunables has
changed in the past and may change again in future
kernels. Avoid hardcoding range checks in libvirt
code, delegating range checking to the kernel itself.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
2013-10-15 12:22:07 +01:00
Chen Hanxiao
501476fccf cgroup: show error when EINVAL is returned
When EINVAL is returned while changing a cgroups value, tell
user that what values are invalid for the field.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
2013-10-15 12:18:47 +01:00
Chen Hanxiao
fc9a416df7 cgroup: fix a comment typo in vircgroup.c
s/shoule/should

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
2013-10-09 17:16:58 +02:00
Peter Krempa
d79fe8b50b cgroup: Move [qemu|lxc]GetCpuBWStatus to vicgroup.c and refactor it
The function existed in two identical instances in lxc and qemu. Move it
to vircgroup.c and simplify it. Refactor the callers too.
2013-09-16 11:32:49 +02:00
Peter Krempa
4baa8d7637 cleanup: Kill usage of access(PATH, F_OK) in favor of virFileExists()
Semantics of the libvirt helper are more clear. This change also allows
to clean up some pieces of code.
2013-09-16 10:37:39 +02:00
Daniel P. Berrange
a48838ad2e Fix launching of VMs on when only logind part of systemd is present
Debian systems may run the 'systemd-logind' daemon, which causes the
/sys/fs/cgroup/systemd  mount to be setup, but no other cgroup
controllers are created. While the LXC driver considers cgroups to
be mandatory, the QEMU driver is supposed to accept them as optional.

We detect whether they are present by looking in /proc/mounts for
any mounts of type 'cgroups', but this is not sufficient. We need to
skip any named mounts (as seen by a name=XXX string in the mount
options), so that we only detect actual resource controllers.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=721979

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-09-12 11:32:36 +01:00
Daniel P. Berrange
f0b6d8d472 Fix cgroups when all are mounted on /sys/fs/cgroup
Some users in Ubuntu/Debian seem to have a setup where all the
cgroup controllers are mounted on /sys/fs/cgroup rather than
any /sys/fs/cgroup/<controller> name. In the loop which detects
which controllers are present for a mount point we were modifying
'mnt_dir' field in the 'struct mntent' var, but not always restoring
the original value. This caused detection to break in the all-in-one
mount setup.

Fix that logic bug and add test case coverage for this mount
setup.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-09-11 11:45:38 +01:00
Roman Bogorodskiy
81b1915773 cgroup macros refactoring, part 5
Complete the refactoring by adding missing stubs so it compiles on
platform without cgroup support.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 16:58:54 -06:00
Roman Bogorodskiy
2d795df3f0 cgroup macros refactoring, part 4
Complete moving to VIR_CGROUP_SUPPORTED

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 16:58:54 -06:00
Roman Bogorodskiy
7f5f270d5f cgroup macros refactoring, part 3
Continue converting to VIR_CGROUP_SUPPORTED

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 16:58:54 -06:00
Roman Bogorodskiy
c419e9b51c cgroup macros refactoring, part 2
- Convert virCgroupGet* to VIR_CGROUP_SUPPORTED
- Convert virCgroup(Get|Set)FreezerState to VIR_CGROUP_SUPPORTED

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 16:58:47 -06:00
Roman Bogorodskiy
02f1fd41f6 cgroup macros refactoring, part 1
- Introduce VIR_CGROUP_SUPPORTED conditional
- Convert virCgroupKill* to use it
- Convert virCgroupIsolateMount() to use it
- Convert virCgroupRemoveRecursively to VIR_CGROUP_SUPPORTED

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 16:15:58 -06:00
Eric Blake
2ff9e54cbf cgroup: functional sort
Make future patches smaller by matching a sane header listing in
the first place.  No semantic change.

* src/util/vircgroup.h: Move free next to new, and controller
functions next to each other.
* src/util/vircgroup.c (virCgroupFree, virCgroupHasController)
(virCgroupPathOfController, virCgroupRemoveRecursively)
(virCgroupRemove): Sort implementation to be closer to header.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 16:08:18 -06:00
Eric Blake
7ccd322b20 cgroup: topological sort
Avoid a forward declaration of a static function.

* src/util/vircgroup.c (virCgroupPartitionNeedsEscaping)
(virCgroupParticionEscape): Move up.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 15:38:37 -06:00
Eric Blake
a91929053c cgroup: use consistent formatting
Format all functions with two blank lines between, and return type
on separate line from function name.  Also break some lines longer
than 80 columns.  This makes the subsequent macro refactoring
less noisy.

* src/util/vircgroup.c: Match prevailing style.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 15:36:35 -06:00
Daniel P. Berrange
2fe2470181 Enable support for systemd-machined in cgroups creation
Make the virCgroupNewMachine method try to use systemd-machined
first. If that fails, then fallback to using the traditional
cgroup setup code path.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-31 19:29:19 +01:00
Daniel P. Berrange
75304eaa1a Cope with races while killing processes
When systemd is involved in managing processes, it may start
killing off & tearing down croups associated with the process
while we're still doing virCgroupKillPainfully. We must
explicitly check for ENOENT and treat it as if we had finished
killing processes

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-31 19:27:28 +01:00
Daniel P. Berrange
aedd46e7e3 Add support for systemd cgroup mount
Systemd uses a named cgroup mount for tracking processes. Add
it as another type of controller, albeit one which we have to
special case in a number of places. In particular we must
never create/delete directories there, nor add tasks. Essentially
the systemd mount is to be considered read-only for libvirt.

With this change both the virCgroupDetectPlacement and
virCgroupCopyPlacement methods must be invoked. The copy
placement method will copy setup for resource controllers
only. The detect placement method will probe for any
named controllers, or resource controllers not already
setup.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-31 19:27:19 +01:00
Eric Blake
a2d0c3f553 build: fix vircgroup build on mingw
The previous patch was incomplete.

  CC       libvirt_util_la-vircgroup.lo
../../src/util/vircgroup.c:70:12: error: 'virCgroupPartitionEscape' declared 'static' but never defined [-Werror=unused-function]
 static int virCgroupPartitionEscape(char **path);
            ^

* src/util/vircgroup.c (virCgroupPartitionEscape): Move forward
declaration inside conditional.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-07-29 08:56:20 -06:00
Daniel P. Berrange
7cf81fa175 Conditionalize build of virCgroupValidateMachineGroup
The virCgroupValidateMachineGroup method calls some functions
which are only conditionally compiled, thus it too must be
made conditional. This fixes the build on non-Linux hosts.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-29 14:36:44 +01:00
Daniel P. Berrange
56b54173ed Skip detecting placement if controller is disabled
If the app has provided a whitelist of controllers to be used,
we skip detecting its mount point. We still, however, fill in
the placement info which later confuses the machine name
validation code. Skip detecting placement if the controller
mount point is not set

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:55:51 +01:00
Daniel P. Berrange
5ec5a22493 Add 'controllers' arg to virCgroupNewDetect
When detecting cgroups we must honour any controllers
whitelist the driver may have.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:55:47 +01:00
Daniel P. Berrange
c101b851c1 Fix detection of 'emulator' cgroup
When a VM has an 'emulator' child cgroup present, we must
strip off that suffix when detecting the cgroup for a
machine

Rename the virCgroupIsValidMachineGroup method to
virCgroupValidateMachineGroup to make a bit clearer
that this isn't simply a boolean check, it will make
changes to the object.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:55:46 +01:00
Daniel P. Berrange
525c9d5a49 Make virCgroupIsValidMachine static
The virCgroupIsValidMachine does not need to be called from
outside the cgroups file now, so make it static.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:55:29 +01:00
Daniel P. Berrange
a45b99ead9 Introduce a more convenient virCgroupNewDetectMachine
Instead of requiring drivers to use a combination of calls
to virCgroupNewDetect and virCgroupIsValidMachine, combine
the two into virCgroupNewDetectMachine

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:47:30 +01:00
Daniel P. Berrange
3068244e85 Protection against doing bad stuff to the root group
Add protection such that the virCgroupRemove and
virCgroupKill* do not do anything to the root cgroup.

Killing all PIDs in the root cgroup does not end well.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 11:42:48 +01:00
Daniel P. Berrange
b333330aa5 New cgroups API for atomically creating machine cgroups
Instead of requiring one API call to create a cgroup and
another to add a task to it, introduce a new API
virCgroupNewMachine which does both jobs at once. This
will facilitate the later code to talk to systemd to
achieve this job which is also atomic.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 11:42:47 +01:00
Roman Bogorodskiy
fa6805e55e Fix virCgroupAvailable() w/o HAVE_GETMNTENT_R defined
virCgroupAvailable() implementation calls getmntent_r
without checking if HAVE_GETMNTENT_R is defined, so it fails
to build on platforms without getmntent_r support.

Make virCgroupAvailable() just return false without
HAVE_GETMNTENT_R.
2013-07-24 15:31:34 +02:00
Daniel P. Berrange
d64e852b5a Remove obsolete cgroups creation apis
The virCgroupNewDomainDriver and virCgroupNewDriver methods
are obsolete now that we can auto-detect existing cgroup
placement. Delete them to reduce code bloat.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:46:31 +01:00
Daniel P. Berrange
e638778eb3 Add API for checking if a cgroup is valid for a domain
Add virCgroupIsValidMachine API to check whether an auto
detected cgroup is valid for a machine. This lets us
check if a VM has just been placed into some generic
shared cgroup, or worse, the root cgroup

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:46:31 +01:00
Daniel P. Berrange
66a7f857f3 Add a virCgroupNewDetect API for finding cgroup placement
Add a virCgroupNewDetect API which is used to initialize a
cgroup object with the placement of an arbitrary process.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:35:26 +01:00
Daniel P. Berrange
0d7f45aea7 Convert remainder of cgroups code to report errors
Convert the remaining methods in vircgroup.c to report errors
instead of returning errno values.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
Daniel P. Berrange
3260fdfab0 Convert the virCgroupKill* APIs to report errors
Instead of returning errno values, change the virCgroupKill*
APIs to fully report errors.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
Daniel P. Berrange
b64dabff27 Report full errors from virCgroupNew*
Instead of returning raw errno values, report full libvirt
errors in virCgroupNew* functions.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
Ján Tomko
cc7329317f cgroup: reuse buffer for getline
Reuse the buffer for getline and track buffer allocation
separately from the string length to prevent unlikely
out-of-bounds memory access.

This fixes the following leak that happened when zero bytes were read:

==404== 120 bytes in 1 blocks are definitely lost in loss record 1,344 of 1,671
==404==    at 0x4C2C71B: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==404==    by 0x906F862: getdelim (iogetdelim.c:68)
==404==    by 0x52A48FB: virCgroupPartitionNeedsEscaping (vircgroup.c:1136)
==404==    by 0x52A0FB4: virCgroupPartitionEscape (vircgroup.c:1171)
==404==    by 0x52A0EA4: virCgroupNewDomainPartition (vircgroup.c:1450)
2013-07-17 14:08:11 +02:00
Daniel P. Berrange
f8b42f3224 Convert 'int i' to 'size_t i' in src/util/ files
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-10 17:40:13 +01:00
Michal Privoznik
a2f8babc7d Adapt to VIR_ALLOC and virAsprintf in src/util/* 2013-07-10 11:07:33 +02:00
Michal Privoznik
8290cbbc38 viralloc: Report OOM error on failure
Similarly to VIR_STRDUP, we want the OOM error to be reported in
VIR_ALLOC and friends.
2013-07-10 11:07:31 +02:00
Michal Privoznik
bc13222185 virCgroupNewPartition: Don't leak @newpath
The @newpath variable is allocated in virCgroupSetPartitionSuffix(). But
it's newer freed.
2013-07-03 09:42:11 +02:00
Ján Tomko
5bc8ecb8d1 Plug leak in virCgroupMoveTask
We only break out of the while loop if *content is an empty string.
However the buffer has been allocated to BUFSIZ + 1 (8193 in my case),
but it gets overwritten in the next for iteration.

Move VIR_FREE right before we overwrite it to avoid the leak.

==5777== 16,386 bytes in 2 blocks are definitely lost in loss record 1,022 of 1,027
==5777==    by 0x5296E28: virReallocN (viralloc.c:184)
==5777==    by 0x52B0C66: virFileReadLimFD (virfile.c:1137)
==5777==    by 0x52B0E1A: virFileReadAll (virfile.c:1199)
==5777==    by 0x529B092: virCgroupGetValueStr (vircgroup.c:534)
==5777==    by 0x529AF64: virCgroupMoveTask (vircgroup.c:1079)

Introduced by 83e4c77.

https://bugzilla.redhat.com/show_bug.cgi?id=978352
2013-06-26 15:38:01 +02:00
Ján Tomko
306c49ffd5 Fix invalid read in virCgroupGetValueStr
Don't check for '\n' at the end of file if zero bytes were read.

Found by valgrind:
==404== Invalid read of size 1
==404==    at 0x529B09F: virCgroupGetValueStr (vircgroup.c:540)
==404==    by 0x529AF64: virCgroupMoveTask (vircgroup.c:1079)
==404==    by 0x1EB475: qemuSetupCgroupForEmulator (qemu_cgroup.c:1061)
==404==    by 0x1D9489: qemuProcessStart (qemu_process.c:3801)
==404==    by 0x18557E: qemuDomainObjStart (qemu_driver.c:5787)
==404==    by 0x190FA4: qemuDomainCreateWithFlags (qemu_driver.c:5839)

Introduced by 0d0b409.

https://bugzilla.redhat.com/show_bug.cgi?id=978356
2013-06-26 15:05:43 +02:00
Ján Tomko
e557766c3b Replace two-state local integers with bool
Found with 'git grep "= 1"'.
2013-06-06 17:22:53 +02:00
Viktor Mihajlovski
eb21408f44 cgroups: Do not enforce nonexistent controllers
Currently, the controllers argument to virCgroupDetect acts both as
a result filter and a required controller specification, which is
a bit overloaded. If both functionalities are needed, it would be
better to have them seperated into a filter and a requirement mask.
The only situation where it is used today is to ensure that only
CPU related controllers are used for the VCPU directories. But here
we clearly do not want to enforce the existence of cpu, cpuacct and
specifically not cpuset at the same time.
This commit changes the semantics of controllers to "filter only".
Should a required mask ever be needed, more work will have to be done.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
2013-05-24 12:11:24 +02:00
Michal Privoznik
eb8e5e8774 Adapt to VIR_STRDUP and VIR_STRNDUP in src/util/vircgroup.c
This commit is separate due to unusual paradigm compared to the
most source files.
2013-05-24 10:10:03 +02:00
Michal Privoznik
b43bb98a31 virCgroupAddTaskStrController: s/-1/-ENOMEM/
Within whole vircgroup.c we 'return -errno', e.g. 'return -ENOMEM'.
However, in this specific function virCgroupAddTaskStrController
we weren't returning -ENOMEM but -1 despite fact that later in
the function we are returning one of errno values indeed.
2013-05-24 10:03:22 +02:00
Eric Blake
83e4c77547 cgroup: be robust against cgroup movement races
https://bugzilla.redhat.com/show_bug.cgi?id=965169 documents a
problem starting domains when cgroups are enabled; I was able
to reliably reproduce the race about 5% of the time when I added
hooks to domain startup by 3 seconds (as that seemed to be about
the length of time that qemu created and then closed a temporary
thread, probably related to aio handling of initially opening
a disk image).  The problem has existed since we introduced
virCgroupMoveTask in commit 9102829 (v0.10.0).

There are some inherent TOCTTOU races when moving tasks between
kernel cgroups, precisely because threads can be created or
completed in the window between when we read a thread id from the
source and when we write to the destination.  As the goal of
virCgroupMoveTask is merely to move ALL tasks into the new
cgroup, it is sufficient to iterate until no more threads are
being created in the old group, and ignoring any threads that
die before we can move them.

It would be nicer to start the threads in the right cgroup to
begin with, but by default, all child threads are created in
the same cgroup as their parent, and we don't want vcpu child
threads in the emulator cgroup, so I don't see any good way
of avoiding the move.  It would also be nice if the kernel were
to implement something like rename() as a way to atomically move
a group of threads from one cgroup to another, instead of forcing
a window where we have to read and parse the source, then format
and write back into the destination.

* src/util/vircgroup.c (virCgroupAddTaskStrController): Ignore
ESRCH, because a thread ended between read and write attempts.
(virCgroupMoveTask): Loop until all threads have moved.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-05-21 11:33:56 -06:00
Osier Yang
3fcc1df2f8 src/utils: Remove the whitespace before ";" 2013-05-21 23:41:45 +08:00
Daniel P. Berrange
c2cf5f1c2a Fix failure to detect missing cgroup partitions
Change bbe97ae968 caused the
QEMU driver to ignore ENOENT errors from cgroups, in order
to cope with missing /proc/cgroups. This is not good though
because many other things can cause ENOENT and should not
be ignored. The callers expect to see ENXIO when cgroups
are not present, so adjust the code to report that errno
when /proc/cgroups is missing

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-17 10:25:15 +01:00