Commit Graph

166 Commits

Author SHA1 Message Date
Daniel P. Berrange
44f79a0bd0 lxc: ensure libvirt_lxc and qemu-nbd move into systemd machine slice
Currently when spawning containers with systemd, the container PID 1
will get moved into the systemd machine slice. Libvirt then manually
moves the libvirt_lxc and qemu-nbd processes into the cgroups associated
with the slice, but skips the systemd controller cgroup. This means that
from systemd's POV, libvirt_lxc and qemu-nbd are still part of the
libvirtd.service unit.

On systemctl daemon-reload, it will notice that libvirt_lxc & qemu-nbd
are in the libvirtd.service unit for the systemd controller, but in the
machine cgroups for resources. Systemd will thus move them back into
the libvirtd.service resource cgroups next time libvirtd is restarted.
This causes libvirtd to kill off the container due to incorrect cgroup
placement.

The solution is to ensure that when moving libvirt_lxc & qemu-nbd, we
also move the systemd cgroup controller placement. Normally this is
not something we ever want todo, but this is a special case as we are
intentionally wanting to move them to a different systemd unit.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 12:46:52 +00:00
Boris Fiuczynski
dbeaa7e666 cgroup: reduce complexity of controller disabling
This patch reduces the complexity of the filtering algorithm in
virCgroupDetect by first correcting the controller mask and then
checking for potential co-mounts without any correlating
controller mask modifications.

If you agree that this patch removes complexity and improves
readability it could simply be squashed into the first patch
of this series.

Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
2016-12-20 11:18:09 +01:00
Boris Fiuczynski
dfcfe0bb9c cgroup: unavailable controller prevents controller disabling
The cgroup controller filtering in virCgroupDetect does not work
properly if the following conditions are met:
1) the host system does not have a cgroup controller which
libvirt requests (unavailable controller) and
2) libvirt is configured to disable a controller (disabled controller) and
3) the disabled controller is located before the unavailable controller
in virCgroupController.

As an example: The memory controller is unavailable and the cpuset
controller is configured to be disabled.
In this scenario trying to start a domain results in the error
error: Controller 'cpuset' is not wanted, but 'memory' is co-mounted: Invalid argument

This error occurs when virCgroupDetect is called with a valid parent group.
The resulting group created by virCgroupCopyMounts holds for cpuset and
memory controller empty mount points. The filtering of disabled controllers
checks for co-mounts by comparing the mount points. The cpuset controller
causes the filtering to occur before the memory controller is marked as to be
ignored by modifying the controller mask since it is unavailable.
Therefore the co-mount detection logic compares the cpuset and memory controller
mount points and since both are empty the memory controller is regarded
erroneously as being co-mounted.

Signed-off-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-12-20 11:17:22 +01:00
Viktor Mihajlovski
ac8ac9e052 cgroup: Use system reported "unlimited" value for comparison
With kernel 3.18 (since commit 3e32cb2e0a12b6915056ff04601cf1bb9b44f967)
the "unlimited" value for cgroup memory limits has changed once again as
its byte value is now computed from a page counter.
The new "unlimited" value reported by the cgroup fs is therefore 2**51-1
pages which is (VIR_DOMAIN_MEMORY_PARAM_UNLIMITED - 3072). This results
e.g. in virsh memtune displaying 9007199254740988 instead of unlimited
for the limits.

This patch uses the value of memory.limit_in_bytes from the cgroup
memory root which is the system's "real" unlimited value for comparison.

See also libvirt commit 231656bbeb for the
history for kernel 3.12 and before.

Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
2016-12-06 16:25:20 +01:00
Michal Privoznik
c2a5a4e7ea virstring: Unify string list function names
We have couple of functions that operate over NULL terminated
lits of strings. However, our naming sucks:

virStringJoin
virStringFreeList
virStringFreeListCount
virStringArrayHasString
virStringGetFirstWithPrefix

We can do better:

virStringListJoin
virStringListFree
virStringListFreeCount
virStringListHasString
virStringListGetFirstWithPrefix

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-11-25 13:54:05 +01:00
Nitesh Konkar
d276da48bc Fix typos and grammar
Signed-off-by: Nitesh Konkar <nitkon12@linux.vnet.ibm.com>
2016-11-23 12:08:15 -05:00
Michal Privoznik
b7d2d4af2b src: Treat PID as signed
This initially started as a fix of some debug printing in
virCgroupDetect. However it turned out that other places suffer
from the similar problem. While dealing with pids, esp. in cases
where we cannot use pid_t for ABI stability reasons, we often
chose an unsigned integer type. This makes no sense as pid_t is
signed.
Also, new syntax-check rule is introduced so we won't repeat this
mistake.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-10-13 17:58:56 +08:00
Michal Privoznik
f3f15cc240 Make sure sys/types.h is included after sys/sysmacros.h
In the latest glibc, major() and minor() functions are marked as
deprecated (glibc commit dbab6577):

  CC       util/libvirt_util_la-vircgroup.lo
util/vircgroup.c: In function 'virCgroupGetBlockDevString':
util/vircgroup.c:768:5: error: '__major_from_sys_types' is deprecated:
  In the GNU C Library, `major' is defined by <sys/sysmacros.h>.
  For historical compatibility, it is currently defined by
  <sys/types.h> as well, but we plan to remove this soon.
  To use `major', include <sys/sysmacros.h> directly.
  If you did not intend to use a system-defined macro `major',
  you should #undef it after including <sys/types.h>.
  [-Werror=deprecated-declarations]
     if (virAsprintf(&ret, "%d:%d ", major(sb.st_rdev), minor(sb.st_rdev)) < 0)
     ^~
In file included from /usr/include/features.h:397:0,
                 from /usr/include/bits/libc-header-start.h:33,
                 from /usr/include/stdio.h:28,
                 from ../gnulib/lib/stdio.h:43,
                 from util/vircgroup.c:26:
/usr/include/sys/sysmacros.h:87:1: note: declared here
 __SYSMACROS_DEFINE_MAJOR (__SYSMACROS_FST_IMPL_TEMPL)
 ^

Moreover, in the glibc commit, there's suggestion to keep
ordering of including of header files as implemented here.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-09-06 17:49:36 +02:00
Peter Krempa
c84c2cb389 util: Extract and rename qemuDomainDelCgroupForThread to virCgroupDelThread 2016-08-24 15:44:47 -04:00
Ján Tomko
cd6e4e5fe4 cgroup: drop INSERT_ELEMENT usage virCgroupPartitionEscape
Use virAsprintf to prepend an underscore to make the code more
readable.
2016-07-26 10:41:26 +02:00
Ján Tomko
994b024624 Use virDirOpenQuiet
Remove all the remaining usage of opendir.
2016-06-24 14:20:57 +02:00
Ján Tomko
42b4a37d68 Use virDirOpenIfExists
Use it instead of opendir everywhere we need to check for ENOENT.
2016-06-24 14:20:57 +02:00
Ján Tomko
e81de04c10 Use virDirOpen
Switch from opendir to virDirOpen everywhere we need to report an error.
2016-06-24 14:20:57 +02:00
Ján Tomko
70a033ab42 Do not ignore hidden files in /sys and /proc
The directories we iterate over are unlikely to contain any entries
starting with a dot, other than '.' and '..' which is already skipped
by virDirRead.
2016-06-23 21:58:38 +02:00
Ján Tomko
fe79c3f2c1 Do not check for '.' and '..' after virDirRead
It skips those directory entries.
2016-06-23 21:58:38 +02:00
Ján Tomko
a4e6f1eb9c Introduce VIR_DIR_CLOSE
Introduce a helper that only calls closedir if DIR* is non-NULL
and sets it to NULL afterwards.
2016-06-23 21:58:33 +02:00
Daniel P. Berrange
eaf18f4c2b nodeinfo: move host CPU APIs out into virhostcpu.c file
Move all APIs with a virHostCPU name prefix out into new
util/virhostcpu.h & util/virhostcpu.c files

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-06-09 18:31:11 +01:00
Daniel P. Berrange
4053350bfe nodeinfo: rename all CPU APIs to have a virHostCPU prefix
In preparation for moving all the CPU related APIs out of
the nodeinfo file, give them a virHostCPU name prefix.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-06-09 18:08:15 +01:00
Daniel P. Berrange
08ea852c25 nodeinfo: remove sysfs_prefix from all methods
Nearly all the methods in the nodeinfo file are given a
'const char *sysfs_prefix' parameter to override the
default sysfs path (/sys/devices/system). Every single
caller passes in NULL for this, except one use in the
unit tests. Furthermore this parameter is totally
Linux-specific, when the APIs are intended to be cross
platform portable.

This removes the sysfs_prefix parameter and instead gives
a new method linuxNodeInfoSetSysFSSystemPath for use by
the test suite.

For two of the methods this hardcodes use of the constant
SYSFS_SYSTEM_PATH, since the test suite does not need to
override the path for thos methods.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-06-09 18:00:18 +01:00
Michal Privoznik
fb377701f2 virCgroupValidateMachineGroup: Reflect change in CGroup struct naming
Fron c3bd0019c0 on instead of creating the following path for
cgroups:

  /sys/fs/cgroupX/$name.libvirt-$driver

we generate rather more verbose one:

  /sys/fs/cgroupX/$driver-$id-$name.libvirt-$driver

where $name is optional and included iff contains allowed chars.
See original commit for more reasoning. Now, problem with the
original commit is that we are unable to start any LXC domain
after it. Because when starting LXC container, the CGroup layout
is created by our lxc_controller process and then detected and
validated by libvirtd. The validation is done by trying to match
detected layout against all the possible patterns for cgroup
paths that we've ever had. And the commit in question forgot to
update this part of the code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-05-06 12:51:06 +02:00
Martin Kletzander
aca4d72b2a Include sysmacros.h where needed
So in glibc-2.23 sys/sysmacros.h is no longer included from sys/types.h
and we don't build because of the usage of major/minor/makedev macros.
Autoconf already has AC_HEADER_MAJOR macro that check where exactly
these functions/macros are defined, so let's use that.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2016-04-18 20:36:57 +02:00
Henning Schild
ff16bde100 qemu_cgroup: use virCgroupAddTask instead of virCgroupMoveTask
qemuProcessSetupEmulator runs at a point in time where there is only
the qemu main thread. Use virCgroupAddTask to put just that one task
into the emulator cgroup. That patch makes virCgroupMoveTask and
virCgroupAddTaskStrController obsolete.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2016-03-01 14:07:27 +00:00
Henning Schild
85d7480654 vircgroup: one central point for adding tasks to cgroups
Use virCgroupAddTaskController in virCgroupAddTask so we have one
single point where we add tasks to cgroups.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2016-03-01 11:20:56 +00:00
Michal Privoznik
6bfb03ae15 vircgroup: Update virCgroupDenyDevicePath stub
In cf113e8d we changed the declaration of
virCgroupAllowDevicePath() and virCgroupDenyDevicePath().
However, while updating the stub for non-cgroup platforms for the
former we forgot to update the latter too causing a build
failure.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-02-17 14:25:35 +01:00
Peter Krempa
cf113e8d54 util: cgroup: Allow ignoring EACCES in virCgroup(Allow|Deny)DevicePath
When adding disk images to ACL we may call those functions on NFS
shares. In that case we might get an EACCES, which isn't really relevant
since NFS would not hold a block device. This patch adds a flag that
allows to stop reporting an error on EACCES to avoid spaming logs.

Currently there's no functional change.
2016-02-17 10:54:05 +01:00
Peter Krempa
9cd5da710e util: cgroup: Drop virCgroup(Allow|Deny)DeviceMajor
Since commit 47e5b5ae virCgroupAllowDevice allows to pass -1 as either
the minor or major device number and it automatically uses '*' in place
of that. Reuse the new approach through the code and drop the duplicated
functions.
2016-02-17 10:54:05 +01:00
Peter Krempa
f42b5c327f util: cgroup: Instrument virCgroupDenyDevice to handle -1 device number as *
Similarly to commit 47e5b5ae virCgroupDenyDevice will handle -1 as *.
2016-02-17 10:54:05 +01:00
Michal Privoznik
a0aa92a24b vircgroup: Update virCgroupGetPercpuStats stump
In the commit 7938b533 we've changed the function signature,
however forgot to update stump that's used on systems without
CGroups causing a build failure.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-02-08 14:06:30 +01:00
Peter Krempa
7938b533d5 cgroup: Prepare for sparse vCPU topologies in virCgroupGetPercpuStats
Pass a bitmap of enabled guest vCPUs to virCgroupGetPercpuStats so that
non-continuous vCPU topologies can be used.
2016-02-08 09:51:34 +01:00
Martin Kletzander
c3bd0019c0 systemd: Modernize machine naming
So, systemd-machined has this philosophy that machine names are like
hostnames and hence should follow the same rules.  But we always allowed
international characters in domain names.  Thus we need to modify the
machine name we are passing to systemd.

In order to change some machine names that we will be passing to systemd,
we also need to call TerminateMachine at the end of a lifetime of a
domain.  Even for domains that were started with older libvirt.  That
can be achieved thanks to virSystemdGetMachineNameByPID().  And because
we can change machine names, we can get rid of the inconsistent and
pointless escaping of domain names when creating machine names.

So this patch modifies the naming in the following way.  It creates the
name as <drivername>-<id>-<name> where invalid hostname characters are
stripped out of the name and if the resulting name is longer, it
truncates it to 64 characters.  That way we can start domains we
couldn't start before.  Well, at least on systemd.

To make it work all together, the machineName (which is needed only with
systemd) is saved in domain's private data.  That way the generation is
moved to the driver and we don't need to pass various unnecessary
arguments to cgroup functions.

The only thing this complicates a bit is the scope generation when
validating a cgroup where we must check both old and new naming, so a
slight modification was needed there.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1282846

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2016-02-05 16:11:50 +01:00
Peter Krempa
58578f83bc cgroup: Clean up virCgroupGetPercpuStats
Use 'ret' for return variable name, clarify use of 'param_idx' and avoid
unnecessary 'success' label. No functional changes. Also document the
function.
2016-02-03 13:10:04 +01:00
Michal Privoznik
c7f5e26b5f vircgroup: Finish renaming of virCgroupIsolateMount
In dc576025c3 we renamed virCgroupIsolateMount function to
virCgroupBindMount. However, we forgot about one occurrence in
section of the code which provides stubs for platforms without
support for CGroups like *BSD for instance.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2016-01-26 17:39:47 +01:00
Daniel P. Berrange
dc576025c3 lxc: don't try to hide parent cgroups inside container
On the host when we start a container, it will be
placed in a cgroup path of

   /machine.slice/machine-lxc\x2ddemo.scope

under /sys/fs/cgroup/*

Inside the containers' namespace we need to setup
/sys/fs/cgroup mounts, and currently will bind
mount /machine.slice/machine-lxc\x2ddemo.scope on
the host to appear as / in the container.

While this may sound nice, it confuses applications
dealing with cgroups, because /proc/$PID/cgroup
now does not match the directory in /sys/fs/cgroup

This particularly causes problems for systems and
will make it create repeated path components in
the cgroup for apps run in the container eg

  /machine.slice/machine-lxc\x2ddemo.scope/machine.slice/machine-lxc\x2ddemo.scope/user.slice/user-0.slice/session-61.scope

This also causes any systemd service that uses
sd-notify to fail to start, because when systemd
receives the notification it won't be able to
identify the corresponding unit it came from.
In particular this break rabbitmq-server startup

Future kernels will provide proper cgroup namespacing
which will handle this problem, but until that time
we should not try to play games with hiding parent
cgroups.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-01-26 16:11:32 +00:00
John Ferlan
d41bd09596 Revert "util: cgroups do not implicitly add task to new machine cgroup"
This reverts commit 71ce475967.

Since commit id 'a41c00b47' has been reverted, this no longer is
necessary
2016-01-14 11:00:25 -05:00
Jasper Lievisse Adriaanse
1b60f1b401 cgroup: don't include sys/mount.h if not needed
As cgroup implementation only works on Linux, it does not
make much sense to include sys/mount.h if other requirements are
not met, such as HAVE_MNTENT_H and HAVE_GETMNTENT_R.

Also, it fixes build on OpenBSD that requires to include sys/param.h
along with sys/mount.h.

Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
2016-01-11 19:56:06 +03:00
Michal Privoznik
f55d1316ad sysconf: Include unistd.h
The manpage for sysconf() suggest including unistd.h as the
function is declared there. Even though we are not hitting any
compile issues currently, let's include the correct header file
instead of relying on some hidden include chain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-24 18:03:50 +01:00
Henning Schild
71ce475967 util: cgroups do not implicitly add task to new machine cgroup
virCgroupNewMachine used to add the pidleader to the newly created
machine cgroup. Do not do this implicit anymore.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2015-12-14 15:43:29 -05:00
Roman Bogorodskiy
46550cde0f util: fix build without cgroup
Commit 89c509a0 added getters for cgroup block device I/O throttling,
however stub versions of these functions have not matching function
prototypes that result in compilation fail on platforms not supporting
cgroup.

Fix build by correcting prototypes of the stubbed functions.

Pushing under build-breaker rule.
2015-08-20 09:42:56 +03:00
Martin Kletzander
89c509a0c1 util: Add getters for cgroup block device I/O throttling
Since now they were not needed, but I sense they will be in a short
while.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2015-08-18 16:25:16 -07:00
Martin Kletzander
ea9db906fc util: Add virCgroupGetBlockDevString
This function translates device paths to "major:minor " string, and all
virCgroupSetBlkioDevice* functions are modified to use it.  It's a
cleanup with no functional change.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2015-08-18 16:16:38 -07:00
Peter Krempa
88f6c007c3 cgroup: Drop resource partition from virSystemdMakeScopeName
The scope name, even according to our docs is
"machine-$DRIVER\x2d$VMNAME.scope" virSystemdMakeScopeName would use the
resource partition name instead of "machine-" if it was specified thus
creating invalid scope paths.

This makes libvirt drop cgroups for a VM that uses custom resource
partition upon reconnecting since the detected scope name would not
match the expected name generated by virSystemdMakeScopeName.

The error is exposed by the following log entry:

debug : virCgroupValidateMachineGroup:302 : Name 'machine-qemu\x2dtestvm.scope' for controller 'cpu' does not match 'testvm', 'testvm.libvirt-qemu' or 'machine-test-qemu\x2dtestvm.scope'

for a "/machine/test" resource and "testvm" vm.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1238570
2015-07-22 07:12:56 +02:00
John Ferlan
51281dcb90 nodeinfo: Add sysfs_prefix to nodeGetPresentCPUBitmap
Add the sysfs_prefix argument to the call to allow for setting the
path for tests to something other than SYSFS_SYSTEM_PATH.
2015-07-13 15:59:32 -04:00
John Ferlan
0456eda317 cgroup: Use virCgroupNewThread
Replace the virCgroupNew{Vcpu|Emulator|IOThread} calls with the common
virCgroupNewThread API

Signed-off-by: John Ferlan <jferlan@redhat.com>
2015-04-09 19:27:08 -04:00
John Ferlan
2cd3a980dc cgroup: Introduce virCgroupNewThread
Create a new common API to replace the virCgroupNew{Vcpu|Emulator|IOThread}
API's using an emum to generate the cgroup name

Signed-off-by: John Ferlan <jferlan@redhat.com>
2015-04-09 19:27:08 -04:00
Michal Privoznik
d65acbde35 vircgroup: Introduce virCgroupControllerAvailable
This new internal API checks if given CGroup controller is
available.  It is going to be needed later when we need to make a
decision whether pin domain memory onto NUMA nodes using cpuset
CGroup controller or using numa_set_membind().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-04-08 11:54:24 +02:00
Michal Privoznik
149a62bc83 virCgroupNew: Enhance debug message
When creating new internal representation of cgroups, all passed
arguments are logged. Well, except for two: pid and pointer for
return value. Lets log them too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-03-30 15:20:24 +02:00
Michal Privoznik
0a09bcdc7f virCgroupNewPartition: Fix comment
The function has no argument named @name rather than @path
instead.  The comment is, however, referring to @name while it
should have been referring to @path really.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-03-30 15:20:20 +02:00
John Ferlan
cf6ab17e45 vircgroup: Fix build issue mingw cross compile
Commit id '2dbfa716' exposed virCgroupDetectMountsFromFile, but did not
add the corresponding entry in the "#else /* !VIR_CGROUP_SUPPORTED */"
section of the module.
2015-03-27 18:09:07 -04:00
John Ferlan
38efd52584 vircgroup: Fix build issue on mingw cross compile
Commit id 'ba1dfc5' added virCgroupSetCpusetMemoryMigrate and
virCgroupGetCpusetMemoryMigrate, but did not add the corresponding
entry points into the "#else /* !VIR_CGROUP_SUPPORTED */" section
2015-03-27 18:09:07 -04:00
Martin Kletzander
ba1dfc5b6a cgroup: Add accessors for cpuset.memory_migrate
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2015-03-20 13:40:02 +01:00