So the issue here is that you can end up with configuration where
you have cgroup v1 and v2 enabled at the same time and the devices
controllers is enabled for cgroup v1.
In cgroup v2 there is no devices controller, the device access is
controlled using BPF and since it is not a cgroup controller both
of them can exists at the same time and both of them are applied while
resolving access to devices.
In order to avoid configuring both BPF and cgroup v1 devices we will
use BPF if possible and otherwise fallback to cgroup v1 devices.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If we want to deny all devices we just need to replace any existing
program with new program with empty map.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If we want to allow all devices with all permissions we need to replace
any existing program that has any rule configured, otherwise we just
need to add new rule which will for example allow read access to all
devices.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In order to deny device we need to check if there is any entry in BPF
map and we need to load the current value from map if there is already
entry for that device. If both values are same we can remove that entry
but if they are different we need to update the entry because we don't
have to deny all access, but for example only write access.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In order to allow device we need to create key and value which will be
used to update BPF map. virBPFUpdateElem() can override existing
entries in BPF map so we need to check if that entry exists in order to
track number of entries in our map.
This can add rule for specific device but major and minor can be both
-1 which follows the same behavior as in cgroup v1.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Device rules are stored in BPF map that is a hash type, this function
will create a key based on major and minor id of device.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We need to close our FD that we have for BPF program and map in order
to let kernel remove all resources once the cgroup is removed as well.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This function will be called for every virCgroup(Allow|Deny)* API in
order to prepare BPF program for guest. Since libvirtd can be restarted
at any point we will first try to detect existing progam, if there is
none we will create a new empty BPF program and lastly if we don't have
any space left in the existing BPF map we will create a new copy of the
BPF map with more space and attach a new program with that map into the
guest cgroup.
This solution allows us to start with reasonably small BPF map consuming
only small amount of memory and if needed we can easily extend the BPF
map if there is a lot of host devices used in guest or if user wants to
hot-plug a lot of devices once the guest is running.
Since there is no way how to reallocate existing BPF map we need to
create a new copy if we run out of space in current BPF map.
This overcomes all the limitations in BPF:
- map used in program has to be created before the program is loaded
into kernel
- once map is created you cannot change its size
- you cannot replace map in existing program
- you cannot use an array of maps because it can store FD to maps
of one specific size so we would not be able to use it to overcome
the second issue
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This function creates new BPF program with new empty BPF map with the
default size and attaches it to the guest cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This function will be called if libvirtd was restarted while some
domains were running. It will try to detect existing programs attached
to the guest cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This function loads the BPF prog with prepared map into kernel and
attaches it into guest cgroup. It can be also used to replace existing
program in the cgroup if we need to resize BPF map to store more rules
for devices. The old program will be closed and removed from kernel.
There are two possible ways how to create BPF program:
- One way is to write simple C-like code which can by compiled into
BPF object file which can be loaded into kernel using elfutils.
- The second way is to define macros which look like assembler
instructions and can be used directly to create BPF program that
can be directly loaded into kernel.
Since the program is not too complex we can use the second option.
If there is no program, all devices are allowed, if there is some
program it is executed and based on the exit status the access is
denied for 0 and allowed for 1.
Our program will follow these rules:
- first it will try to look for the specific key using major and
minor to see if there is any rule for that specific device
- if there is no specific rule it will try to look for any rule that
matches only major of the device
- if there is no match with major it will try the same but with
minor of the device
- as the last attempt it will try to look for rule for all devices
and if there is no match it will return 0 to deny that access
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There is no exact way how to figure out whether BPF devices support is
compiled into kernel. One way is to check kernel configure options but
this is not reliable as it may not be available. Let's try to do
syscall to which will list BPF cgroup device programs.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In order to implement devices controller with cgroup v2 we need to
add support for BPF programs, cgroup v2 doesn't have devices controller.
This introduces required helpers wrapping linux syscalls.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Some of our tests try to validate domain XMLs they are working
with (not intentionally, simply because they call top level
domain XML parse function). Anyway, this implies that we build
domain capabilities also - see
virQEMUDriverGetDomainCapabilities(). And since some domain XMLs
are type of 'kvm' the control gets through
virQEMUCapsFillDomainCaps() and virHostCPUGetKVMMaxVCPUs() to
opening /dev/kvm which may be missing on the machine we're
running 'make check'.
Previously, we did not see this issue, because it was masked. If
building domain capabilities failed for whatever reason, we
ignored the failure. Only v5.9.0-207-gc69e6edea3 uncovered the
problem (it changed reval from 0 to -1 if
virQEMUDriverGetDomainCapabilities() fails). Since the referenced
commit is correct, we need to mock access to /dev/kvm in our
tests.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Some layered products such as oVirt have requested a way to avoid being
blocked by guest agent commands when querying a loaded vm. For example,
many guest agent commands are polled periodically to monitor changes,
and rather than blocking the calling process, they'd prefer to simply
time out when an agent query is taking too long.
This patch adds a way for the user to specify a custom agent timeout
that is applied to all agent commands.
One special case to note here is the 'guest-sync' command. 'guest-sync'
is issued internally prior to calling any other command. (For example,
when libvirt wants to call 'guest-get-fsinfo', we first call
'guest-sync' and then call 'guest-get-fsinfo').
Previously, the 'guest-sync' command used a 5-second timeout
(VIR_DOMAIN_QEMU_AGENT_COMMAND_DEFAULT), whereas the actual command that
followed always blocked indefinitely
(VIR_DOMAIN_QEMU_AGENT_COMMAND_BLOCK). As part of this patch, if a
custom timeout is specified that is shorter than
5 seconds, this new timeout is also used for 'guest-sync'. If there is
no custom timeout or if the custom timeout is longer than 5 seconds, we
will continue to use the 5-second timeout.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
With g_mkstemp_full, there is no need to distinguish between
mkostemp and mkostemps (no suffix vs. a suffix of a fixed length),
because the GLib function looks for the XXXXXX pattern everywhere
in the string.
Use S_IRUSR | S_IWUSR for the permissions and do not pass O_RDWR
in flags since it's implied.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This saves us from allocating vars upfront, since GLib deals with
that for us.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Using GRegex simplifies the code since g_match_info_fetch will
copy the matched substring for us.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Now that the cleanup section is empty, the ret variable is no longer
necessary.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This flag is not needed to use extended regular expression syntax
with GRegex and it makes GRegex ignore whitespace in the regex.
Remove the unintended usage, even though it should not matter in this
case.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The 'ramfb' attribute provides a framebuffer to the guest that can be
used as a boot display for the vgpu
For example, the following configuration can be used to provide a vgpu
with a boot display:
<hostdev mode='subsystem' type='mdev' model='vfio-pci' display='on' ramfb='on'>
<source>
<address uuid='$UUID'/>
</source>
</hostdev>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
As suggested by Cole, this patch uses the domain capabilities to
validate the supported video model types. This allows us to remove the
model type validation from qemu_process.c and qemu_domain.c and
consolidates it all in a single place that will automatically adjust
when new domain capabilities are added.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Continue consolidation of video device validation started in previous
patch.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
The goal is to move all of the video device validation to a single place
and use domain caps to validate the supported video device models. Since
qemuDomainDeviceDefValidateVideo() is called from
qemuProcessStartValidate(), these changes should not change anny
behavior.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
In a follow-up commit, we will use the domain capabilities to validate
video device configurations, which means that we also need to make sure
that the domain capabilities include the "none" video device.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
commit 9bfcf0f62d added the
QEMU_CAPS_DEVICE_RAMFB capability but did not set the domain capability.
This patch sets the domain capability for the ramfb device and updates
the tests.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Several tests were not specifying the necessary qemu capabilities for
what they were testing. Due to the way that the video devices are
currently validated, this is not causing any problems. But a change to
video device validation in a following patch would have exposed this
issue and resulted in multiple test failures about the domain
configuration not supporting particular video models.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
This allows us to simplify the function and avoid jumping to 'cleanup'.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
When the virDomainCapsDeviceDefValidate() function returned an error
status (-1), we were aborting the function early, but returning the
default return value (0). This patch properly returns an error in that
case.
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
The ordering of lock manager locks in the libxl driver has a flaw that was
uncovered by a migration error path. In the perform phase of migration, the
source host calls virDomainLockProcessPause to release the lock before
sending the VM to the destination host. If the send fails an attempt is made
to reacquire the lock with virDomainLockProcessResume, but that too can fail
if the destination host has not finished cleaning up the failed VM and
releasing the lock it acquired when starting to receive the VM.
This change delays calling virDomainLockProcessResume in libxlDomainStart
until the VM is successfully created, but before it is unpaused. A similar
approach is used by the qemu driver, avoiding the need to release the lock
if VM creation fails. In the migration perform phase, releasing the lock
with virDomainLockProcessPause is delayed until the VM is successfully
sent to the destination, which avoids reacquiring the lock if the send
fails.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
The virHostGetBootTimeProcfs() function is defined only for Linux
and therefore it's only call should also be done if we're on
Linux.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add a helper which converts qemu emulator capabilities to the domain
capability XML. This will simplify future additions of new features.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Declare the capabilities as enum values and store them in an array. This
makes adding new features more straightforward and simplifies the
formatter which now doesn't require changing.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
While the qemu driver currently implements all domain capability
features, we should initialize all features using the helper similarly
to how we do it in drivers which don't support any.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For future extensions of the domain caps it's useful to have a single
point that initializes all capabilities as unsupported by a driver. The
driver then can enable specific ones.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>