Commit Graph

19 Commits

Author SHA1 Message Date
Pavel Hrdina
07497fc6da vircgroupv2devices: refactor virCgroupV2DevicesRemoveProg
When running on systemd host the cgroup itself is removed by machined
so when we reach this code the directory no longer exist. If libvirtd
was running the whole time between starting and destroying VM the
detection is skipped because we still have both FD in memory. But if
libvirtd was restarted and no operation requiring cgroup devices
executed the FDs would be 0 and libvirt would try to detect them using
the cgroup directory. This results in reporting following errors:

    libvirtd[955]: unable to open '/sys/fs/cgroup/machine.slice/machine-qemu\x2d1\x2dguest.scope/': No such file or directory
    libvirtd[955]: Failed to remove cgroup for guest

When running on non-systemd host where we handle cgroups manually this
would not happen.

When destroying VM it is not necessary to detect the BPF prog and map
because the following code only closes the FDs without doing anything
else. We could run code that would try to detach the BPF prog from the
cgroup but that is not necessary as well. If the cgroup is removed and
there is no other FD open to the prog kernel will cleanup the prog and
map eventually.

Reported-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2021-04-14 12:06:16 +02:00
Michal Privoznik
c8238579fb lib: Drop internal virXXXPtr typedefs
Historically, we declared pointer type to our types:

  typedef struct _virXXX virXXX;
  typedef virXXX *virXXXPtr;

But usefulness of such declaration is questionable, at best.
Unfortunately, we can't drop every such declaration - we have to
carry some over, because they are part of public API (e.g.
virDomainPtr). But for internal types - we can do drop them and
use what every other C project uses 'virXXX *'.

This change was generated by a very ugly shell script that
generated sed script which was then called over each file in the
repository. For the shell script refer to the cover letter:

https://listman.redhat.com/archives/libvir-list/2021-March/msg00537.html

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-04-13 17:00:38 +02:00
Daniel P. Berrangé
fa56310e18 util: tell users that memory locking ulimit is too low for BPF
If running libvirtd via systemd, it gets a 64 MB memlock limit, but if
running from the shell it will only get 64 KB on a Fedora 33 system.
The latter low limit causes any attempt to use BPF to fail and it is
not obvious why.

This improves the error message thus:

  # virsh -c lxc:/// start sh
error: Failed to start domain 'sh'
error: internal error: guest failed to start: Failure in libvirt_lxc startup: failed to initialize device BPF map; locked memory limit for libvirtd probably needs to be raised: Operation not permitted

Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-17 09:16:44 +00:00
Pavel Hrdina
b2b1702341 src: add missing headers to various files
All these headers are indirectly included provided by virfile.h having
virstoragefile.h which will be removed in the following patch.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2021-01-06 13:15:17 +01:00
Michal Privoznik
95b9db4ee2 lib: Prefer WITH_* prefix for #if conditionals
Currently, we are mixing: #if HAVE_BLAH with #if WITH_BLAH.
Things got way better with Pavel's work on meson, but apparently,
mixing these two lead to confusing and easy to miss bugs (see
31fb929eca for instance). While we were forced to use HAVE_
prefix with autotools, we are free to chose our own prefix with
meson and since WITH_ prefix appears to be more popular let's use
it everywhere.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-09-02 10:28:10 +02:00
Pavel Hrdina
7e574d1a07 vircgroupv2devices: fix counting entries in BPF map
BPF syscall BPF_MAP_GET_NEXT_KEY returns -1 if something fails but it
will also return -1 if trying to get next key using the last key in the
map with errno set to ENOENT.

If there are VMs running and libvirtd is restarted and user tries to
call some cgroup devices operation on a VM we need to get the count of
entries in BPF map and it fails which will result in error when trying
to attach/detech devices.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1833321

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
2020-08-11 15:11:15 +02:00
Michal Privoznik
02794cc41d virCgroupV2DevicesAvailable: Print stringified errno in the debug log
In the virCgroupV2DevicesAvailable() function we try to determine
whether CGroups version 2 are available. We do this by opening
what we believe is the CGroup mount point and issuing a BPF call.
When the call fails, a debug message is printed. However, the BPF
call sets errno too. Include it in the debug message to help us
with debugging.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2020-03-10 17:10:21 +01:00
Pavel Hrdina
7d60846962 vircgroupv2devices: free BPF map when replacing with new one
This leaks the FD of BPF map which means it will not be freed.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2020-01-13 15:17:54 +01:00
Michal Privoznik
529100d9f7 vircgroupv2devices: Avoid double close on map FD
When allowing/denying a device in devices CGroupV2 we have to
write a BPF program for it. The program we put there is merely
static and all it does it looks up a device in a hash table (also
known as map in BPF terminology). A map is referenced via an FD
which can be acquired via virBPFCreateMap() and like any other FD
it should be closed when no longer needed. However, we close it
twice: the first time in virCgroupV2DevicesAttachProg() which
closes it unconditionally, and the second time in either
virCgroupV2DevicesCreateProg() or
virCgroupV2DevicesPrepareProg(). Remove the second close.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2020-01-06 17:30:17 +01:00
Michal Privoznik
ff878fe77c vircgroupv2devices: Unexport virCgroupV2DevicesAttachProg()
This function is not called outside of the source file where it's
defined. There's no need to export it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2020-01-06 17:30:17 +01:00
Michal Privoznik
c10b78370d vircgroupv2devices: Fix format string for size_t variable
In virCgroupV2DevicesReallocMap() we are debug printing both
arguments passed to the function. However, the @size argument is
type of size_t but '%lu' is used to format it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2019-11-18 08:53:30 +01:00
Pavel Hrdina
b18b0ce609 vircgroup: introduce virCgroupV2DevicesGetKey
Device rules are stored in BPF map that is a hash type, this function
will create a key based on major and minor id of device.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:38 +01:00
Pavel Hrdina
63cfe7b84d vircgroup: introduce virCgroupV2DeviceGetPerms
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:37 +01:00
Pavel Hrdina
6a24bd75ed vircgroup: introduce virCgroupV2DevicesRemoveProg
We need to close our FD that we have for BPF program and map in order
to let kernel remove all resources once the cgroup is removed as well.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:34 +01:00
Pavel Hrdina
ef747499a5 vircgroup: introduce virCgroupV2DevicesPrepareProg
This function will be called for every virCgroup(Allow|Deny)* API in
order to prepare BPF program for guest.  Since libvirtd can be restarted
at any point we will first try to detect existing progam, if there is
none we will create a new empty BPF program and lastly if we don't have
any space left in the existing BPF map we will create a new copy of the
BPF map with more space and attach a new program with that map into the
guest cgroup.

This solution allows us to start with reasonably small BPF map consuming
only small amount of memory and if needed we can easily extend the BPF
map if there is a lot of host devices used in guest or if user wants to
hot-plug a lot of devices once the guest is running.

Since there is no way how to reallocate existing BPF map we need to
create a new copy if we run out of space in current BPF map.

This overcomes all the limitations in BPF:

    - map used in program has to be created before the program is loaded
      into kernel

    - once map is created you cannot change its size

    - you cannot replace map in existing program

    - you cannot use an array of maps because it can store FD to maps
      of one specific size so we would not be able to use it to overcome
      the second issue

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:33 +01:00
Pavel Hrdina
afa2788662 vircgroup: introduce virCgroupV2DevicesCreateProg
This function creates new BPF program with new empty BPF map with the
default size and attaches it to the guest cgroup.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:32 +01:00
Pavel Hrdina
ce11a5c59f vircgroup: introduce virCgroupV2DevicesDetectProg
This function will be called if libvirtd was restarted while some
domains were running.  It will try to detect existing programs attached
to the guest cgroup.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:31 +01:00
Pavel Hrdina
48423a0b5d vircgroup: introduce virCgroupV2DevicesAttachProg
This function loads the BPF prog with prepared map into kernel and
attaches it into guest cgroup.  It can be also used to replace existing
program in the cgroup if we need to resize BPF map to store more rules
for devices. The old program will be closed and removed from kernel.

There are two possible ways how to create BPF program:

    - One way is to write simple C-like code which can by compiled into
      BPF object file which can be loaded into kernel using elfutils.

    - The second way is to define macros which look like assembler
      instructions and can be used directly to create BPF program that
      can be directly loaded into kernel.

Since the program is not too complex we can use the second option.

If there is no program, all devices are allowed, if there is some
program it is executed and based on the exit status the access is
denied for 0 and allowed for 1.

Our program will follow these rules:

    - first it will try to look for the specific key using major and
      minor to see if there is any rule for that specific device

    - if there is no specific rule it will try to look for any rule that
      matches only major of the device

    - if there is no match with major it will try the same but with
      minor of the device

    - as the last attempt it will try to look for rule for all devices
      and if there is no match it will return 0 to deny that access

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:05 +01:00
Pavel Hrdina
30b6ddc44c vircgroup: introduce virCgroupV2DevicesAvailable
There is no exact way how to figure out whether BPF devices support is
compiled into kernel.  One way is to check kernel configure options but
this is not reliable as it may not be available.  Let's try to do
syscall to which will list BPF cgroup device programs.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2019-11-15 12:58:04 +01:00