The original logic is incorrect. We would delete the device entry
from eBPF map only if the newval would be same as current val in the
map. In case that the device was allowed only as read-only but later
we remove all permissions for that device it would remain in the table
with empty values.
The old code would still deny the device but it's not working as
intended. Instead we will update the value in advance. If the updated
value is 0 it means that we are removing all permissions so it should
be removed from the map, otherwise we will update the value in map.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1810356
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Now that every caller to copyPlacement doesn't pass absolute path there
is no need to have a condition to handle that case.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Currently this task is done by virCgroupCopyPlacement when the @path
starts with "/".
virCgroupNew is always called with @path starting with "/" and there is
no parent to copy path from. To make it obvious what the code is doing
introduce new helper.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
It is only used for debug and error purposes which can be easily
replaced by @placement.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
With cgroups v2 working with controllers is a bit more complicated then
with cgroups v1 where the controller had to be mounted.
There are two files, cgroups.controllers and cgroup.subtree_control.
The file cgroup.controllers lists all controllers enabled in the current
cgroup and cgroups.subtree_control, as the name suggest, controls which
controllers are enabled for a subtree of cgroups.
Now the issue here is that the current code doesn't make any difference
if the @parent variable is NULL or not because ../cgroup.subtree_control
will list the same controllers as ./cgroup.controllers.
The whole point of the @parent variable is when we are building the
cgroup topology ourselves without systemd help we need to detect which
controllers are enabled in the parent cgroup in order to enable them for
the current cgroup as well and for that we need to check
cgroup.controllers of the parent group.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When libvirtd starts a VM it internally stores a path to the main
cgroup. When we restart libvirtd we should get to the same state.
When we start a VM on host with systemd the cgroup is created for us and
the process is already placed into that cgroup and we detect the path
created by systemd using /proc/$PID/cgroup. After that we create
sub-cgroups and move all threads there.
Once libvirtd is restarted we again detect the cgroup path using
/proc/$PID/cgroup, but in this case we will get a different path because
the main thread was moved to a "emulator" cgroup.
Instead of ignoring the "emulator" directory when validating cgroups
remove it completely when detecting cgroup otherwise cgroups will not
work properly when libvirtd is restarted.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
With cgroups v2 the file cgroup.procs will never be empty if threading
is enabled as it will always have ID of all processes even if all
threads of the processes are moved to sub-cgroups. If that happens the
file cgroup.threads will be empty.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Rewrite using GHashTable which already has interfaces for using a number
as hash key. Glib's implementation doesn't copy the key by default, so
we need to allocate it, but overal the interface is more suited for this
case.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
In virCgroupV2BindMount there is an unused variable containing
what seem to be tmpfs mount options.
Delete it. Unlike with cgroups v1, we do not create a tmpfs
here.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Both accept a NULL value gracefully and virStringFreeList
does not zero the pointer afterwards, so a straight replace
is safe.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
There is nothing in the vircgroup.h header file
requiring virutil.h.
Remove it and include unistd.h in the C files.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
So the issue here is that you can end up with configuration where
you have cgroup v1 and v2 enabled at the same time and the devices
controllers is enabled for cgroup v1.
In cgroup v2 there is no devices controller, the device access is
controlled using BPF and since it is not a cgroup controller both
of them can exists at the same time and both of them are applied while
resolving access to devices.
In order to avoid configuring both BPF and cgroup v1 devices we will
use BPF if possible and otherwise fallback to cgroup v1 devices.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If we want to deny all devices we just need to replace any existing
program with new program with empty map.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If we want to allow all devices with all permissions we need to replace
any existing program that has any rule configured, otherwise we just
need to add new rule which will for example allow read access to all
devices.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In order to deny device we need to check if there is any entry in BPF
map and we need to load the current value from map if there is already
entry for that device. If both values are same we can remove that entry
but if they are different we need to update the entry because we don't
have to deny all access, but for example only write access.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In order to allow device we need to create key and value which will be
used to update BPF map. virBPFUpdateElem() can override existing
entries in BPF map so we need to check if that entry exists in order to
track number of entries in our map.
This can add rule for specific device but major and minor can be both
-1 which follows the same behavior as in cgroup v1.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We need to close our FD that we have for BPF program and map in order
to let kernel remove all resources once the cgroup is removed as well.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There is no exact way how to figure out whether BPF devices support is
compiled into kernel. One way is to check kernel configure options but
this is not reliable as it may not be available. Let's try to do
syscall to which will list BPF cgroup device programs.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In few places we have the following code pattern:
int ret;
... /* @ret is not accessed here */
ret = f(...);
return ret;
This pattern can be written less verbose:
...
return f(...);
This patch was generated with following coccinelle spatch:
@@
type T;
constant C;
expression f;
identifier ret;
@@
-T ret = C;
... when != ret
-ret = f;
-return ret;
+return f;
Afterwards I needed to fix a few places, e.g. comment in
virDomainNetIPParseXML() was removed too because coccinelle
thinks it refers to @ret while in fact it doesn't. Also in few
places it replaced @ret declaration with a few spaces instead of
removing the line. But nothing terribly wrong.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Replace all occurrences of
if (VIR_STRDUP(a, b) < 0)
/* effectively dead code */
with:
a = g_strdup(b);
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
All the callers of these functions only check for a negative
return value.
However, virNetDevOpenvswitchGetVhostuserIfname is documented
as returning 1 for openvswitch interfaces so preserve that.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since commit 44e7f02915
util: rewrite auto cleanup macros to use glib's equivalent
VIR_AUTOFREE is just an alias for g_autofree. Use the GLib macros
directly instead of our custom aliases.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use G_GNUC_UNUSED from GLib instead of ATTRIBUTE_UNUSED.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We're using gnulib to get ffs, ffsl, rotl32, count_one_bits,
and count_leading_zeros. Except for rotl32 they can all be
replaced with gcc/clangs builtins. rotl32 is a one-line
trivial function.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
On Fedora 31, starting a 'mock' build alters /proc/$pid/cgroup,
probably due to usage of systemd-nspawn.
Before:
$ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/...
After:
$ cat /proc/self/cgroup
1:name=systemd:/
0::/user.slice/user-1000.slice/...
The cgroupv2 code mishandles that first line in the second case, which
causes VM startup to fail with: Unable to read from
'/sys/fs/cgroup/machine/cgroup.controllers': No such file or directory
The kernel docs[1] say that the cgroupv2 path will always start with
'0::', which in the code here controllers="". Only set the v2 placement
path when we see that cgroup file entry.
[1] https://www.kernel.org/doc/html/v5.3/admin-guide/cgroup-v2.html#processeshttps://bugzilla.redhat.com/show_bug.cgi?id=1751120
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
When we set cpu.max period we need to parse the cpu.max file first as
it contains both quota and period values separated by space. When only
a single number is written to that file it will set quota. However,
in order to change period we need to write both values.
The code was prepared for that but mistakenly used new line to end the
string with the first value.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1749227
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Introduced by commit <c854e0bd33c7a5afb04a36465bf04f861b2efef5> that
tried to fix an issue where we would fail to parse values from files.
We cannot change the original pointer that is going to be used by
VIR_AUTOFREE.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1747440
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
If the first value in cpu.max is "max" return from function.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1741837
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Our virStrToLong* helpers converts string to integers where it wraps
strtol standard function. After the conversion happens and there are
some remaining invalid characters our helpers will fail if the second
argument is NULL.
We need to pass pointer to string in cases where there are multiple
values in a single file.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1741825
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
In cgroups v2 when a new group is created by default no controller is
enabled so the detection code will not detect any controllers.
When enabling the controllers we should also store them for the group.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
When creating new group for cgroups v2 the we cannot check
cgroups.controllers for that cgroup because the directory is created
later. In that case we should check cgroups.subtree_control of parent
group to get list of controllers enabled for child cgroups.
In order to achieve that we will prefer the parent group if it exists,
the current group will be used only for root group.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
Because of a systemd delegation policy [1] we should not write to any
cgroups files owned by systemd which in case of cgroups v2 includes
'cgroups.subtree_control'.
systemd will enable controllers automatically for us to have them
available for VM cgroups.
[1] <https://github.com/systemd/systemd/blob/master/docs/CGROUP_DELEGATION.md>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This reverts commit 7bca1c9bdc.
As it turns out it's not a good idea on systemd hosts. The root
cgroup can have all controllers enabled but they don't have to be
enabled for sub-cgroups.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When detecting available controllers on host we can be limited by list
of controllers from qemu.conf file.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Currently CPU controller cannot be enabled if there is any real-time
task running and is assigned to non-root cgroup which is the case on
several distributions with graphical environment.
Instead of erroring out treat it as the controller is not available.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
In order to skip controllers that we are not able to activate we need
to return different return value so the caller can decide what to do.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
It might happen that we are not able to enable CPU controller so we
can enable it for thread sub-cgroups only if it's available in parent
cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The assumption that CPU controller would be always enabled is wrong, we
should use any available controller to create a new sub-cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This affects only cgroups v2 where enabled controllers are not based on
available mount points but on the list provided in cgroup.controllers
file. However, moving it will fill in placement as well, so it needs
to be freed together with mount point if we don't need that controller.
Before this patch we were assuming that all controllers available in
root cgroup where available in all other sub-cgroups which was wrong.
In order to fix it we need to move the cgroup controllers detection
after cgroup placement was prepared in order to build correct path for
cgroup.controllers file.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
In cgroups v2 we don't have to detect available controllers every single
time if we are creating a new cgroup based on parent cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In kernel 4.12 there was introduced new BFQ scheduler and in kernel
5.0 the old CFQ scheduler was removed. This has an implication on
the cgroups file names.
If the CFQ controller is enabled we use one file:
io.weight
The new BFQ controller expose one file with different name:
io.bfq.weight
Except for different name they have different syntax.
io.weight:
default $val
major:minor $val
io.bfq.weight:
$val
The difference is that BFQ doesn't support per-device weight.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If we need to get a path of specific file and we need to check its
existence before we use it then we can reuse that path to get value
for specific device. This way we will not build the path again in
virCgroupGetValueForBlkDev.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Standardize on putting the _LAST enum value on the second line
of VIR_ENUM_IMPL invocations. Later patches that add string labels
to VIR_ENUM_IMPL will push most of these to the second line anyways,
so this saves some noise.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
This reverts commit a5e1602090.
Getting rid of unistd.h from our headers will require more work than
just fixing the broken mingw build. Revert it until I have a more
complete proposal.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>