Commit Graph

53 Commits

Author SHA1 Message Date
Martin Kletzander
eaf2c9f891 Move machineName generation from virsystemd into domain_conf
It is more related to a domain as we might use it even when there is
no systemd and it does not use any dbus/systemd functions.  In order
not to use code from conf/ in util/ pass machineName in cgroups code
as a parameter.  That also fixes a leak of machineName in the lxc
driver and cleans up and de-duplicates some code.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2017-07-25 17:02:27 +02:00
Daniel P. Berrange
44f79a0bd0 lxc: ensure libvirt_lxc and qemu-nbd move into systemd machine slice
Currently when spawning containers with systemd, the container PID 1
will get moved into the systemd machine slice. Libvirt then manually
moves the libvirt_lxc and qemu-nbd processes into the cgroups associated
with the slice, but skips the systemd controller cgroup. This means that
from systemd's POV, libvirt_lxc and qemu-nbd are still part of the
libvirtd.service unit.

On systemctl daemon-reload, it will notice that libvirt_lxc & qemu-nbd
are in the libvirtd.service unit for the systemd controller, but in the
machine cgroups for resources. Systemd will thus move them back into
the libvirtd.service resource cgroups next time libvirtd is restarted.
This causes libvirtd to kill off the container due to incorrect cgroup
placement.

The solution is to ensure that when moving libvirt_lxc & qemu-nbd, we
also move the systemd cgroup controller placement. Normally this is
not something we ever want todo, but this is a special case as we are
intentionally wanting to move them to a different systemd unit.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-01-09 12:46:52 +00:00
Peter Krempa
c84c2cb389 util: Extract and rename qemuDomainDelCgroupForThread to virCgroupDelThread 2016-08-24 15:44:47 -04:00
Henning Schild
ff16bde100 qemu_cgroup: use virCgroupAddTask instead of virCgroupMoveTask
qemuProcessSetupEmulator runs at a point in time where there is only
the qemu main thread. Use virCgroupAddTask to put just that one task
into the emulator cgroup. That patch makes virCgroupMoveTask and
virCgroupAddTaskStrController obsolete.

Signed-off-by: Henning Schild <henning.schild@siemens.com>
2016-03-01 14:07:27 +00:00
Peter Krempa
cf113e8d54 util: cgroup: Allow ignoring EACCES in virCgroup(Allow|Deny)DevicePath
When adding disk images to ACL we may call those functions on NFS
shares. In that case we might get an EACCES, which isn't really relevant
since NFS would not hold a block device. This patch adds a flag that
allows to stop reporting an error on EACCES to avoid spaming logs.

Currently there's no functional change.
2016-02-17 10:54:05 +01:00
Peter Krempa
9cd5da710e util: cgroup: Drop virCgroup(Allow|Deny)DeviceMajor
Since commit 47e5b5ae virCgroupAllowDevice allows to pass -1 as either
the minor or major device number and it automatically uses '*' in place
of that. Reuse the new approach through the code and drop the duplicated
functions.
2016-02-17 10:54:05 +01:00
Peter Krempa
7938b533d5 cgroup: Prepare for sparse vCPU topologies in virCgroupGetPercpuStats
Pass a bitmap of enabled guest vCPUs to virCgroupGetPercpuStats so that
non-continuous vCPU topologies can be used.
2016-02-08 09:51:34 +01:00
John Ferlan
b8c0f18654 util: Fix virCgroupNewMachine ATTRIBUTE_NONNULL args
Commit id 'c3bd0019c0' removed arg3, but forgot to adjust the numbers
for NONNULL - caused build failure for coverity
2016-02-06 06:45:46 -05:00
Martin Kletzander
c3bd0019c0 systemd: Modernize machine naming
So, systemd-machined has this philosophy that machine names are like
hostnames and hence should follow the same rules.  But we always allowed
international characters in domain names.  Thus we need to modify the
machine name we are passing to systemd.

In order to change some machine names that we will be passing to systemd,
we also need to call TerminateMachine at the end of a lifetime of a
domain.  Even for domains that were started with older libvirt.  That
can be achieved thanks to virSystemdGetMachineNameByPID().  And because
we can change machine names, we can get rid of the inconsistent and
pointless escaping of domain names when creating machine names.

So this patch modifies the naming in the following way.  It creates the
name as <drivername>-<id>-<name> where invalid hostname characters are
stripped out of the name and if the resulting name is longer, it
truncates it to 64 characters.  That way we can start domains we
couldn't start before.  Well, at least on systemd.

To make it work all together, the machineName (which is needed only with
systemd) is saved in domain's private data.  That way the generation is
moved to the driver and we don't need to pass various unnecessary
arguments to cgroup functions.

The only thing this complicates a bit is the scope generation when
validating a cgroup where we must check both old and new naming, so a
slight modification was needed there.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1282846

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2016-02-05 16:11:50 +01:00
Daniel P. Berrange
dc576025c3 lxc: don't try to hide parent cgroups inside container
On the host when we start a container, it will be
placed in a cgroup path of

   /machine.slice/machine-lxc\x2ddemo.scope

under /sys/fs/cgroup/*

Inside the containers' namespace we need to setup
/sys/fs/cgroup mounts, and currently will bind
mount /machine.slice/machine-lxc\x2ddemo.scope on
the host to appear as / in the container.

While this may sound nice, it confuses applications
dealing with cgroups, because /proc/$PID/cgroup
now does not match the directory in /sys/fs/cgroup

This particularly causes problems for systems and
will make it create repeated path components in
the cgroup for apps run in the container eg

  /machine.slice/machine-lxc\x2ddemo.scope/machine.slice/machine-lxc\x2ddemo.scope/user.slice/user-0.slice/session-61.scope

This also causes any systemd service that uses
sd-notify to fail to start, because when systemd
receives the notification it won't be able to
identify the corresponding unit it came from.
In particular this break rabbitmq-server startup

Future kernels will provide proper cgroup namespacing
which will handle this problem, but until that time
we should not try to play games with hiding parent
cgroups.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-01-26 16:11:32 +00:00
Martin Kletzander
89c509a0c1 util: Add getters for cgroup block device I/O throttling
Since now they were not needed, but I sense they will be in a short
while.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2015-08-18 16:25:16 -07:00
Peter Krempa
88f6c007c3 cgroup: Drop resource partition from virSystemdMakeScopeName
The scope name, even according to our docs is
"machine-$DRIVER\x2d$VMNAME.scope" virSystemdMakeScopeName would use the
resource partition name instead of "machine-" if it was specified thus
creating invalid scope paths.

This makes libvirt drop cgroups for a VM that uses custom resource
partition upon reconnecting since the detected scope name would not
match the expected name generated by virSystemdMakeScopeName.

The error is exposed by the following log entry:

debug : virCgroupValidateMachineGroup:302 : Name 'machine-qemu\x2dtestvm.scope' for controller 'cpu' does not match 'testvm', 'testvm.libvirt-qemu' or 'machine-test-qemu\x2dtestvm.scope'

for a "/machine/test" resource and "testvm" vm.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1238570
2015-07-22 07:12:56 +02:00
John Ferlan
0456eda317 cgroup: Use virCgroupNewThread
Replace the virCgroupNew{Vcpu|Emulator|IOThread} calls with the common
virCgroupNewThread API

Signed-off-by: John Ferlan <jferlan@redhat.com>
2015-04-09 19:27:08 -04:00
John Ferlan
2cd3a980dc cgroup: Introduce virCgroupNewThread
Create a new common API to replace the virCgroupNew{Vcpu|Emulator|IOThread}
API's using an emum to generate the cgroup name

Signed-off-by: John Ferlan <jferlan@redhat.com>
2015-04-09 19:27:08 -04:00
Michal Privoznik
d65acbde35 vircgroup: Introduce virCgroupControllerAvailable
This new internal API checks if given CGroup controller is
available.  It is going to be needed later when we need to make a
decision whether pin domain memory onto NUMA nodes using cpuset
CGroup controller or using numa_set_membind().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-04-08 11:54:24 +02:00
Michal Privoznik
771e6e5a46 virCgroupController: Check the enum fits into 'int'
Throughout our code, the virCgroupController enum is used in two ways.
First as an index to an array of cgroup controllers:

struct virCgroup {
    char *path;

    struct virCgroupController controllers[VIR_CGROUP_CONTROLLER_LAST];
};

Second way is that when calling virCgroupNew() a bitmask of the enum
items can be passed to selectively detect only some controllers. For
instance:

int
virCgroupNewVcpu(virCgroupPtr domain,
                 int vcpuid,
                 bool create,
                 virCgroupPtr *group)
{
    ...
    controllers = ((1 << VIR_CGROUP_CONTROLLER_CPU) |
                   (1 << VIR_CGROUP_CONTROLLER_CPUACCT) |
                   (1 << VIR_CGROUP_CONTROLLER_CPUSET));

    if (virCgroupNew(-1, name, domain, controllers, group) < 0)
        goto cleanup;
}

Even though it's highly unlikely that so many new controllers will be
invented so that we would overflow when constructing the bitmask, it
doesn't hurt to check at compile time either.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-03-30 15:20:28 +02:00
Martin Kletzander
ba1dfc5b6a cgroup: Add accessors for cpuset.memory_migrate
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2015-03-20 13:40:02 +01:00
Daniel P. Berrange
318df5a05f Add support for systemd-machined CreateMachineWithNetwork
systemd-machined introduced a new method CreateMachineWithNetwork
that obsoletes CreateMachine. It expects to be given a list of
VETH/TAP device indexes for the host side device(s) associated
with a container/machine.

This falls back to the old CreateMachine method when the new
one is not supported.
2015-01-15 11:07:07 +00:00
Martin Kletzander
1a80b97ddf util: Add function virCgroupHasEmptyTasks
That function helps checking whether there's a task in that cgroup.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2014-12-16 11:15:27 +01:00
Guido Günther
4882618ed1 qemu: use systemd's TerminateMachine to kill all processes
If we don't properly clean up all processes in the
machine-<vmname>.scope systemd won't remove the cgroup and subsequent vm
starts fail with

  'CreateMachine: File exists'

Additional processes can e.g. be added via

  echo $PID > /sys/fs/cgroup/systemd/machine.slice/machine-${VMNAME}.scope/tasks

but there are other cases like

  http://bugs.debian.org/761521

Invoke TerminateMachine to be on the safe side since systemd tracks the
cgroup anyway. This is a noop if all processes have terminated already.
2014-10-01 20:17:46 +02:00
John Ferlan
3abb95cad4 vircgroup: Introduce virCgroupNewIOThread
Add virCgroupNewIOThread() to mimic virCgroupNewVcpu() except the naming
scheme with use "iothread" rather than "vcpu".
2014-09-15 13:18:56 -04:00
Cédric Bosdonnat
47e5b5ae32 lxc: allow to keep or drop capabilities
Added <capabilities> in the <features> section of LXC domains
configuration. This section can contain elements named after the
capabilities like:

  <mknod state="on"/>, keep CAP_MKNOD capability
  <sys_chroot state="off"/> drop CAP_SYS_CHROOT capability

Users can restrict or give more capabilities than the default using
this mechanism.
2014-07-23 15:12:37 +08:00
Peter Krempa
a48f445100 util: cgroup: Add helper to convert device mode to string
Cgroups code uses VIR_CGROUP_DEVICE_* flags to specify the mode but in
the end it needs to be converted to a string. Add a helper to do it and
use it in the cgroup code before introducing it into the rest of the
code.
2014-07-08 14:34:05 +02:00
Ján Tomko
897808e74f Extend virCgroupGetPercpuStats to fill in vcputime too
Currently, virCgroupGetPercpuStats is only used by the LXC driver,
filling out the CPUTIME stats. qemuDomainGetPercpuStats does this
and also filles out VCPUTIME stats.

Extend virCgroupGetPercpuStats to also report VCPUTIME stats if
nvcpupids is non-zero. In the LXC driver, we don't have cpupids.
In the QEMU driver, there is at least one cpupid for a running domain,
so the behavior shouldn't change for QEMU either.

Also rename getSumVcpuPercpuStats to virCgroupGetPercpuVcpuSum.
2014-04-09 16:24:08 +02:00
Richard Weinberger
6fb42d7cdc Ensure systemd cgroup ownership is delegated to container with userns
This function is needed for user namespaces, where we need to chmod()
the cgroup to the initial uid/gid such that systemd is allowed to
use the cgroup.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2014-02-24 15:35:47 +00:00
Thorsten Behrens
4b3b2f6ceb Implement domainGetCPUStats for lxc driver. 2014-02-20 16:20:09 +01:00
Thorsten Behrens
65158899b7 Make qemuGetDomainTotalCPUStats a virCgroup function.
To reuse this from other drivers, like lxc.
2014-02-20 16:20:09 +01:00
Thorsten Behrens
a2bb187c7e Add util virCgroupGetBlkioIo*Serviced methods.
This reads blkio stats from blkio.throttle.io_service_bytes and
blkio.throttle.io_serviced.
2014-02-20 16:20:09 +01:00
Gao feng
3b431929a2 blkio: Setting throttle blkio cgroup for domain
This patch introduces virCgroupSetBlkioDeviceReadIops,
virCgroupSetBlkioDeviceWriteIops,
virCgroupSetBlkioDeviceReadBps and
virCgroupSetBlkioDeviceWriteBps,

we can use these interfaces to set up throttle
blkio cgroup for domain.

This patch also adds the new throttle blkio cgroup
elements to the test xml.

Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2014-01-20 10:52:44 +08:00
Peter Krempa
d79fe8b50b cgroup: Move [qemu|lxc]GetCpuBWStatus to vicgroup.c and refactor it
The function existed in two identical instances in lxc and qemu. Move it
to vircgroup.c and simplify it. Refactor the callers too.
2013-09-16 11:32:49 +02:00
Eric Blake
2ff9e54cbf cgroup: functional sort
Make future patches smaller by matching a sane header listing in
the first place.  No semantic change.

* src/util/vircgroup.h: Move free next to new, and controller
functions next to each other.
* src/util/vircgroup.c (virCgroupFree, virCgroupHasController)
(virCgroupPathOfController, virCgroupRemoveRecursively)
(virCgroupRemove): Sort implementation to be closer to header.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-08-12 16:08:18 -06:00
Daniel P. Berrange
2fe2470181 Enable support for systemd-machined in cgroups creation
Make the virCgroupNewMachine method try to use systemd-machined
first. If that fails, then fallback to using the traditional
cgroup setup code path.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-31 19:29:19 +01:00
Daniel P. Berrange
aedd46e7e3 Add support for systemd cgroup mount
Systemd uses a named cgroup mount for tracking processes. Add
it as another type of controller, albeit one which we have to
special case in a number of places. In particular we must
never create/delete directories there, nor add tasks. Essentially
the systemd mount is to be considered read-only for libvirt.

With this change both the virCgroupDetectPlacement and
virCgroupCopyPlacement methods must be invoked. The copy
placement method will copy setup for resource controllers
only. The detect placement method will probe for any
named controllers, or resource controllers not already
setup.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-31 19:27:19 +01:00
Daniel P. Berrange
5ec5a22493 Add 'controllers' arg to virCgroupNewDetect
When detecting cgroups we must honour any controllers
whitelist the driver may have.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:55:47 +01:00
Daniel P. Berrange
525c9d5a49 Make virCgroupIsValidMachine static
The virCgroupIsValidMachine does not need to be called from
outside the cgroups file now, so make it static.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:55:29 +01:00
Daniel P. Berrange
a45b99ead9 Introduce a more convenient virCgroupNewDetectMachine
Instead of requiring drivers to use a combination of calls
to virCgroupNewDetect and virCgroupIsValidMachine, combine
the two into virCgroupNewDetectMachine

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:47:30 +01:00
Daniel P. Berrange
b333330aa5 New cgroups API for atomically creating machine cgroups
Instead of requiring one API call to create a cgroup and
another to add a task to it, introduce a new API
virCgroupNewMachine which does both jobs at once. This
will facilitate the later code to talk to systemd to
achieve this job which is also atomic.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 11:42:47 +01:00
Daniel P. Berrange
d64e852b5a Remove obsolete cgroups creation apis
The virCgroupNewDomainDriver and virCgroupNewDriver methods
are obsolete now that we can auto-detect existing cgroup
placement. Delete them to reduce code bloat.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:46:31 +01:00
Daniel P. Berrange
e638778eb3 Add API for checking if a cgroup is valid for a domain
Add virCgroupIsValidMachine API to check whether an auto
detected cgroup is valid for a machine. This lets us
check if a VM has just been placed into some generic
shared cgroup, or worse, the root cgroup

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:46:31 +01:00
Daniel P. Berrange
66a7f857f3 Add a virCgroupNewDetect API for finding cgroup placement
Add a virCgroupNewDetect API which is used to initialize a
cgroup object with the placement of an arbitrary process.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:35:26 +01:00
Daniel P. Berrange
b64dabff27 Report full errors from virCgroupNew*
Instead of returning raw errno values, report full libvirt
errors in virCgroupNew* functions.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
John Ferlan
d94a3cfcfb Fix build breaker with ATTRIBUTE_NONNULL defs
Using "./autogen.sh --system lv_cv_static_analysis=yes" for my daily
Coverity builds resulted in the following error when building:

In file included from util/vircgrouppriv.h:32:0,
                 from util/vircgroup.c:44:
util/vircgroup.h:59:5: error: nonnull argument with out-of-range operand number (argument 1, operand 5)
util/vircgroup.h:74:5: error: nonnull argument references non-pointer operand (argument 1, operand 4)
make[3]: *** [libvirt_util_la-vircgroup.lo] Error 1
make[3]: Leaving directory `/home/jferlan/libvirt.cov.curr/src'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/jferlan/libvirt.cov.curr/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/jferlan/libvirt.cov.curr'
make: *** [all] Error 2
2013-04-16 07:17:00 -04:00
Daniel P. Berrange
e7d8ab016b Add support for perf_event and net_cls cgroup controllers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:32 +01:00
Daniel P. Berrange
1da631ecf3 Add an API for re-mounting cgroups, to isolate the process location
Add a virCgroupIsolateMount method which looks at where the
current process is place in the cgroups (eg /system/demo.lxc.libvirt)
and then remounts the cgroups such that this sub-directory
becomes the root directory from the current process' POV.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:32 +01:00
Daniel P. Berrange
767596bdb4 Remove non-functional code for setting up non-root cgroups
The virCgroupNewDriver method had a 'bool privileged' param.
If a false value was ever passed in, it would simply not
work, since non-root users don't have any privileges to create
new cgroups. Just delete this broken code entirely and make
the QEMU driver skip cgroup setup in non-privileged mode

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00
Daniel P. Berrange
aa8604dd45 Add a new virCgroupNewPartition for setting up resource partitions
A resource partition is an absolute cgroup path, ignoring the
current process placement. Expose a virCgroupNewPartition API
for constructing such cgroups

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00
Daniel P. Berrange
04c18d25f1 Rename virCgroupForXXX to virCgroupNewXXX
Rename all the virCgroupForXXX methods to use the form
virCgroupNewXXX since they are all constructors. Also
make sure the output parameter is the last one in the
list, and annotate all pointers as non-null. Fix up
all callers, and make sure they use true/false not 0/1
for the boolean parameters

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00
Daniel P. Berrange
dca927c82f Rename virCgroupMounted to virCgroupHasController & make it more robust
The virCgroupMounted method is badly named, since a controller can be
mounted, but disabled in the current object. Rename the method to be
virCgroupHasController. Also make it tolerant to a  NULL virCgroupPtr
and out-of-range controller index, to avoid duplication of these
checks in all callers

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-08 14:49:12 +01:00
Daniel P. Berrange
56f27b3bbc Don't create dirs in cgroup controllers we don't want to use
Currently when getting an instance of virCgroupPtr we will
create the path in all cgroup controllers. Only at the virt
driver layer are we attempting to filter controllers. This
is bad because the mere act of creating the dirs in the
controllers can have a functional impact on the kernel,
particularly for performance.

Update the virCgroupForDriver() method to accept a bitmask
of controllers to use. Only create dirs in the controllers
that are requested. When creating cgroups for domains,
respect the active controller list from the parent cgroup

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-05 10:41:54 +01:00
Daniel P. Berrange
804a809a06 Rename virCgroupGetAppRoot to virCgroupForSelf
The virCgroupGetAppRoot is not clear in its meaning. Change
to virCgroupForSelf to highlight that this returns the
cgroup config for the caller's process

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-05 10:41:54 +01:00