Commit Graph

42 Commits

Author SHA1 Message Date
Michal Privoznik
ea7d0ca37c vircgroup: Fix virCgroupKillRecursive() wrt nested controllers
I've encountered the following bug, but only on Gentoo with
systemd and CGroupsV2. I've started an LXC container successfully
but destroying it reported the following error:

  error: Failed to destroy domain 'amd64'
  error: internal error: failed to get cgroup backend for 'pathOfController'

Debugging showed, that CGroup hierarchy is full of surprises:

/sys/fs/cgroup/machine.slice/machine-lxc\x2d861\x2damd64.scope/
└── libvirt
    ├── dev-hugepages.mount
    ├── dev-mqueue.mount
    ├── init.scope
    ├── sys-fs-fuse-connections.mount
    ├── sys-kernel-config.mount
    ├── sys-kernel-debug.mount
    ├── sys-kernel-tracing.mount
    ├── system.slice
    │   ├── console-getty.service
    │   ├── dbus.service
    │   ├── system-getty.slice
    │   ├── system-modprobe.slice
    │   ├── systemd-journald.service
    │   ├── systemd-logind.service
    │   └── tmp.mount
    └── user.slice

For comparison, here's the same container on recent Rawhide:

/sys/fs/cgroup/machine.slice/machine-lxc\x2d13550\x2damd64.scope/
└── libvirt

Anyway, those nested directories should not be a problem, because
virCgroupKillRecursiveInternal() removes them recursively, right?
Sort of. The function really does remove nested directories, but
it assumes that every directory has the same controller as the
rest. Just take a look at virCgroupV2KillRecursive() - it gets
'Any' controller (the first one it found in ".scope") and then
passes it to virCgroupKillRecursiveInternal().

This assumption is not true though. The controllers found in
".scope" are the following:

  cpuset cpu io memory pids

while "libvirt" has fewer:

  cpuset cpu io memory

Up until now it's not problem, because of how we order
controllers internally - "cpu" is the first and thus picking
"Any" controller returns just that. But the rest of directories
has no controllers, their "cgroup.controllers" is just empty.

What fixes the bug is dropping @controller argument from
virCgroupKillRecursiveInternal() and letting each iteration work
pick its own controller.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2021-04-19 11:21:40 +02:00
Michal Privoznik
c8238579fb lib: Drop internal virXXXPtr typedefs
Historically, we declared pointer type to our types:

  typedef struct _virXXX virXXX;
  typedef virXXX *virXXXPtr;

But usefulness of such declaration is questionable, at best.
Unfortunately, we can't drop every such declaration - we have to
carry some over, because they are part of public API (e.g.
virDomainPtr). But for internal types - we can do drop them and
use what every other C project uses 'virXXX *'.

This change was generated by a very ugly shell script that
generated sed script which was then called over each file in the
repository. For the shell script refer to the cover letter:

https://listman.redhat.com/archives/libvir-list/2021-March/msg00537.html

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-04-13 17:00:38 +02:00
Pavel Hrdina
184245f53b vircgroup: introduce nested cgroup to properly work with systemd
When running on host with systemd we register VMs with machined.
In this case systemd creates the root VM cgroup for us. This has some
implications where one of them is that systemd owns all files inside
the root VM cgroup and we should not touch them.

We already use DBus calls for some of the APIs but for the remaining
ones we will continue accessing the files directly. Systemd doesn't
support threaded cgroups so we need to do this.

The reason why we don't use DBus for most of the APIs is that we already
have a code that works with files and we would have to check if systemd
supports each API.

This change introduces new topology on systemd hosts:

$ROOT
  |
  +- machine.slice
     |
     +- machine-qemu\x2d1\x2dvm1.scope
        |
        +- libvirt
           |
           +- emulator
           +- vcpu0
           +- vcpu0

compared to the previous topology:

$ROOT
  |
  +- machine.slice
     |
     +- machine-qemu\x2d1\x2dvm1.scope
        |
        +- emulator
        +- vcpu0
        +- vcpu0

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-02-10 13:37:12 +01:00
Pavel Hrdina
9c1693eff4 vircgroup: use DBus call to systemd for some APIs
When running on host with systemd we register VMs with machined.
In this case systemd creates the root VM cgroup for us. This has some
implications where one of them is that systemd owns all files inside
the root VM cgroup and we should not touch them.

If we change any value in file that systemd knows about it will be
changed to what systemd thinks it should be when executing
`systemctl daemon-reload`.

These are the APIs that we need to call using systemd because they set
limits that are proportional to sibling cgroups.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-02-10 13:37:12 +01:00
Pavel Hrdina
126cb34a20 virt-host-validate: fix detection with cgroups v2
Using virtCgroupNewSelf() is not correct with cgroups v2 because the
the virt-host-validate process is executed from from the same cgroup
context as the terminal and usually not all controllers are enabled
by default.

To do a proper check we need to use the root cgroup to see what
controllers are actually available. Libvirt or systemd ensures that
all controllers are available for VMs as well.

This still doesn't solve the devices controller with cgroups v2 where
there is no controller as it was replaced by eBPF. Currently libvirt
tries to query eBPF programs which usually works only for root as
regular users will get permission denied for that operation.

Fixes: https://gitlab.com/libvirt/libvirt/-/issues/94

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-11-19 01:18:35 +01:00
Pavel Hrdina
8f0f6ff082 vircgrouppriv: fix ATTRIBUTE_NONNULL for virCgroupNewDomainPartition
Commit <99d2c6519ad18651b5959fa0a3366bcb2c1e44f3> removed parameter
from the function but did not modified ATTRIBUTE_NONNULL.

Reported-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2020-11-05 23:15:16 +01:00
Pavel Hrdina
99d2c6519a vircgroup: drop @create from virCgroupNewDomainPartition
All callers pass true.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-11-03 21:26:32 +01:00
Pavel Hrdina
ca7b305631 vircgroup: drop @pid argument from virCgroupNew
Now it is always -1.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-11-03 21:26:32 +01:00
Pavel Hrdina
2eb83e270d vircgroup: drop @parent from virCgroupNew
Now it is always NULL.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-11-03 21:26:32 +01:00
Pavel Hrdina
c88b3712ca vircgroup: remove useless cgroup->path variable
It is only used for debug and error purposes which can be easily
replaced by @placement.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2020-11-03 21:26:32 +01:00
Peter Krempa
b16629f00c util: cgroup: Use GHashTable instead of virHashTable
Rewrite using GHashTable which already has interfaces for using a number
as hash key. Glib's implementation doesn't copy the key by default, so
we need to allocate it, but overal the interface is more suited for this
case.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2020-10-22 15:02:46 +02:00
Pavel Hrdina
48423a0b5d vircgroup: introduce virCgroupV2DevicesAttachProg
This function loads the BPF prog with prepared map into kernel and
attaches it into guest cgroup.  It can be also used to replace existing
program in the cgroup if we need to resize BPF map to store more rules
for devices. The old program will be closed and removed from kernel.

There are two possible ways how to create BPF program:

    - One way is to write simple C-like code which can by compiled into
      BPF object file which can be loaded into kernel using elfutils.

    - The second way is to define macros which look like assembler
      instructions and can be used directly to create BPF program that
      can be directly loaded into kernel.

Since the program is not too complex we can use the second option.

If there is no program, all devices are allowed, if there is some
program it is executed and based on the exit status the access is
denied for 0 and allowed for 1.

Our program will follow these rules:

    - first it will try to look for the specific key using major and
      minor to see if there is any rule for that specific device

    - if there is no specific rule it will try to look for any rule that
      matches only major of the device

    - if there is no match with major it will try the same but with
      minor of the device

    - as the last attempt it will try to look for rule for all devices
      and if there is no match it will return 0 to deny that access

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:05 +01:00
Pavel Hrdina
c23829f18a util: vircgroup: move virCgroupGetValueStr out of virCgroupGetValueForBlkDev
If we need to get a path of specific file and we need to check its
existence before we use it then we can reuse that path to get value
for specific device.  This way we will not build the path again in
virCgroupGetValueForBlkDev.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-06-21 14:35:57 +02:00
Pavel Hrdina
3f741f9ace util: vircgroup: introduce virCgroup(Get|Set)ValueRaw
If we need to get a path of specific file and we need to check its
existence before we use it then we can reuse that path to get/set
values instead of calling the existing get/set value functions which
would be building the path again.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-06-21 14:35:51 +02:00
Cole Robinson
ce5346292a vircgrouppriv.h: Use #pragma once
Acked-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
2019-04-04 18:42:10 -04:00
Daniel P. Berrangé
568a417224 Enforce a standard header file guard symbol name
Require that all headers are guarded by a symbol named

  LIBVIRT_$FILENAME

where $FILENAME is the uppercased filename, with all characters
outside a-z changed into '_'.

Note we do not use a leading __ because that is technically a
namespace reserved for the toolchain.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2018-12-14 10:47:13 +00:00
Daniel P. Berrangé
4cfd709021 Fix many mistakes & inconsistencies in header file layout
This introduces a syntax-check script that validates header files use a
common layout:

  /*
   ...copyright header...
   */
  <one blank line>
  #ifndef SYMBOL
  # define SYMBOL
  ....content....
  #endif /* SYMBOL */

For any file ending priv.h, before the #ifndef, we will require a
guard to prevent bogus imports:

  #ifndef SYMBOL_ALLOW
  # error ....
  #endif /* SYMBOL_ALLOW */
  <one blank line>

The many mistakes this script identifies are then fixed.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2018-12-14 10:46:53 +00:00
Pavel Hrdina
b532546823 vircgroup: introduce virCgroupKillRecursiveCB
The rewrite to support cgroup v2 missed this function.  In cgroup v2
we have different files to track tasks.

We would fail to remove cgroup on non-systemd OSes if there is any
extra process assigned to guest cgroup because we would not kill any
process form the guest cgroup.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-12-14 11:27:18 +08:00
Daniel P. Berrangé
600462834f Remove all Author(s): lines from source file headers
In many files there are header comments that contain an Author:
statement, supposedly reflecting who originally wrote the code.
In a large collaborative project like libvirt, any non-trivial
file will have been modified by a large number of different
contributors. IOW, the Author: comments are quickly out of date,
omitting people who have made significant contribitions.

In some places Author: lines have been added despite the person
merely being responsible for creating the file by moving existing
code out of another file. IOW, the Author: lines give an incorrect
record of authorship.

With this all in mind, the comments are useless as a means to identify
who to talk to about code in a particular file. Contributors will always
be better off using 'git log' and 'git blame' if they need to  find the
author of a particular bit of code.

This commit thus deletes all Author: comments from the source and adds
a rule to prevent them reappearing.

The Copyright headers are similarly misleading and inaccurate, however,
we cannot delete these as they have legal meaning, despite being largely
inaccurate. In addition only the copyright holder is permitted to change
their respective copyright statement.

Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2018-12-13 16:08:38 +00:00
Pavel Hrdina
b79d858518 vircgroup: add support for hybrid configuration
This enables to use both cgroup v1 and v2 at the same time together
with libvirt.  It is supported by kernel and there is valid use-case,
not all controllers are implemented in cgroup v2 so there might be
configurations where administrator would enable these missing
controllers in cgroup v1.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-10-05 15:53:29 +02:00
Pavel Hrdina
b4ddf5ae62 util: introduce cgroup v2 files
Place cgroup v2 backend type before cgroup v1 to make it obvious
that cgroup v2 is preferred implementation.

Following patches will introduce support for hybrid configuration
which will allow us to use both at the same time, but we should
prefer cgroup v2 regardless.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-10-05 15:53:29 +02:00
Pavel Hrdina
65ba48d267 vircgroup: rename controllers to legacy
With the introduction of cgroup v2 there are new names used with
cgroups based on which version is used:

    - legacy: cgroup v1
    - unified: cgroup v2
    - hybrid: cgroup v1 and cgroup v2

Let's use 'legacy' instead of 'cgroupv1' or 'controllers' in our code.

Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
bebf732cfa vircgroup: rename virCgroupController into virCgroupV1Controller
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
5436fd75d8 vircgroup: extract virCgroupV1(Set|Get)CpuCfsQuota
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
900c58b7f9 vircgroup: extract virCgroupV1(Set|Get)Memory*Limit
They all need virCgroupV1GetMemoryUnlimitedKB() so it's easier to
move them in one commit.

Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
8f50f9ca24 vircgroup: extract virCgroupV1(Set|Get)BlkioDeviceWeight
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
44809a28ec vircgroup: extract virCgroupV1GetBlkioIoDeviceServiced
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
064024e70a vircgroup: extract virCgroupV1AddTask
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
b148d08049 vircgroup: extract virCgroupV1Remove
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
152c0f0bf5 vircgroup: extract virCgroupV1MakeGroup
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
61629d5be3 vircgroup: extract virCgroupV1ValidateMachineGroup
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
f60af21594 vircgroup: detect available backend for cgroup
We need to update one test-case because now new cgroup object will be
created only if there is any cgroup backend available.

Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 13:40:22 +02:00
Pavel Hrdina
8b62008d2b vircgrouptest: call virCgroupNewSelf instead virCgroupDetectMounts
This will be required once cgroup v2 is introduced.  The cgroup
detection is not simple and we will have multiple backends so we
should not just jump into the middle of the detection code.

In order to use virCgroupNewSelf we need to create all the remaining
data files:

    - {name}.cgroups represents /proc/cgroups, it is a list of cgroup
      controllers compiled into kernel

    - {name}.self.cgroup represents /proc/self/cgroup, it describes
      cgroups to which the process belongs

For "no-cgroups" we need to modify the expected behavior because
virCgroupNewSelf() will fail if there are no controllers available.

Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 09:59:23 +02:00
Pavel Hrdina
4988f4b347 vircgrouptest: call virCgroupDetectMounts directly
Because we can set which files to return for cgroup tests there
is no need to have special function tailored to run tests.

Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-09-25 09:59:23 +02:00
Pavel Hrdina
8ea2d5a6cb vircgroup: Move function used in tests into vircgrouppriv.h
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-08-13 11:53:53 +02:00
Pavel Hrdina
cd77242504 vircgroup: Introduce standard set of typedefs and use them
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-08-13 11:53:53 +02:00
Pavel Hrdina
70dc671a27 vircgroup: Rename structs to start with underscore
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-08-13 11:53:53 +02:00
Michal Privoznik
e0b46ad623 Revert "util: cgroup: define cleanup function using VIR_DEFINE_AUTOPTR_FUNC"
This reverts commit 4da4a9fe0c.

Turns out, our code relies on virCgroupFree(&var) setting
var = NULL.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2018-07-30 13:28:20 +02:00
Sukrit Bhatnagar
4da4a9fe0c util: cgroup: define cleanup function using VIR_DEFINE_AUTOPTR_FUNC
Using the new VIR_DEFINE_AUTOPTR_FUNC macro defined in
src/util/viralloc.h, define a new wrapper around an existing
cleanup function which will be called when a variable declared
with VIR_AUTOPTR macro goes out of scope. Also, drop the redundant
viralloc.h include, since that has moved from the source module into
the header.

When a variable of type virCgroupPtr is declared using
VIR_AUTOPTR, the function virCgroupFree will be run
automatically on it when it goes out of scope.

This commit also adds an intermediate typedef for virCgroup
type for use with the cleanup macros.

Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2018-07-27 17:19:18 +02:00
Jiri Denemark
2dbfa716e8 tests: Add tests for virCgroupDetectMounts
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-03-18 09:53:24 +01:00
Daniel P. Berrange
83336118db Track symlinks for co-mounted cgroup controllers
If a cgroup controller is co-mounted with another, eg

   /sys/fs/cgroup/cpu,cpuacct

Then it is a requirement that there exist symlinks at

   /sys/fs/cgroup/cpu
   /sys/fs/cgroup/cpuacct

pointing to the real mount point. Add support to virCgroupPtr
to detect and track these symlinks

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:32 +01:00
Daniel P. Berrange
f0e5f92434 Pull definition of structs out of vircgroup.c to vircgrouppriv.h
The definition of structs for cgroups are kept in vircgroup.c since
they are intended to be private from users of the API. To enable
effective testing, however, they need to be accessible. To address
the latter issue, without compronmising the former, this introduces
a new vircgrouppriv.h file to hold the struct definitions.

To prevent other files including this private header, it requires
that __VIR_CGROUP_ALLOW_INCLUDE_PRIV_H__ be defined before inclusion

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-04-15 17:35:31 +01:00