35976 Commits

Author SHA1 Message Date
Boris Fiuczynski
23df65de2f nodedev: immediate update of active config on udev events
When an udev add, change or remove event occurs the mdev active config data
requires an update via mdevctl as the udev does not contain all config data.
This update needs to occur immediately and to be finished before the libvirt
nodedev event is issued to keep the API usage reliable.

After this change, scheduleMdevctlUpdate call is already called in
`udevAddOneDevice` and can therefore be removed in `udevHandleOneDevice`.

Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
2024-06-18 09:00:05 -05:00
Marc Hartmayer
30354f5b1f node_device_udev: Set @def to NULL
@def is owned by @obj after adding it the node device object list. As soon as
the @obj lock has been released, another thread could free @obj and therefore
@def. If now someone accesses @def this would lead to a heap-use-after-free and
therefore most likely to a segmentation fault, therefore set @def to NULL after
the ownership has moved.

While at it, add comments to other code places why @def is set to NULL.

Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
2024-06-18 08:59:46 -05:00
Boris Fiuczynski
7ccf76ea34 nodedev: fix mdev add udev event data handling
Two situations will trigger an udev add event:
 1) the mdev is created when started (transient) or
 2) the mdev was defined and is started
In case 1 there is no node object existing and no config data is copied.
In case 2 copying the active config data of an existing node object will
only copy invalid data. Instead copying the defined config data will
store valid data into the newly added node object.

Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
2024-06-18 08:58:10 -05:00
Adam Julis
e145d182a6 qemu: implement iommu coldplug/unplug
Resolves: https://issues.redhat.com/browse/RHEL-23833
Signed-off-by: Adam Julis <ajulis@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-18 12:17:50 +02:00
Adam Julis
8ce138632c syms: Properly export virDomainIOMMUDefFree()
While the function is exported via header, the symbol itself was not.

Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-18 12:17:50 +02:00
Adam Julis
59f6e226bb qemu_driver: add validation of potential dependencies on cold plug
Although virDomainDeviceDefValidate() is called as a part of
parsing device XML routine, it validates only that single device.
The virDomainDefValidate() function performs a more comprehensive
check. It should detect errors resulting from dependencies
between devices, or a device and some other part of XML config.
Therefore, a call to virDomainDefValidate() is added at the end
of qemuDomainAttachDeviceConfig().

Signed-off-by: Adam Julis <ajulis@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-18 08:46:28 +02:00
Daniel P. Berrangé
3fff8c91b0 network: introduce a "none" firewall backend type
There are two scenarios identified after the recent firewall backend
selection was introduced, which result in libvirtd failing to startup
due to an inability to find either iptables/nftables

 - On Linux if running unprivileged with $PATH lacking the dir
   containing iptables/nftables
 - On non-Linux where iptables/nftables never existed

In the former case, it is preferrable to restore the behaviour whereby
the driver starts successfully. Users will get an error reported when
attempting to start any virtual network, due to the lack of permissions
needed to create bridge devices. This makes the missing firewall backend
irrelevant.

In the latter case, the network driver calls the 'nop' platform
implementation which does not attempt to implement any firewall logic,
just allowing the network to start without firewall rules.

To solve this are number of changes are required

 * Introduce VIR_FIREWALL_BACKEND_NONE, which does nothing except
   report a fatal error from virFirewallApply(). This code path
   is unreachable, since we'll never create a virFirewall
   object with with VIR_FIREWALL_BACKEND_NONE, so the error reporting
   is just a sanity check.

 * Ignore the compile time backend defaults and assume use of
   the 'none' backend if running unprivileged.

   This fixes the first regression, avoiding the failure to start
   libvirtd on Linux in unprivileged context, instead allowing use
   of the driver and expecting a permission denied when creating a
   bridge.

 * Reject the use of compile time backend defaults no non-Linux
   and hardcode the 'none' backend. The non-Linux platforms have
   no firewall implementation at all currently, so there's no
   reason to permit the use of 'firewall_backend_priority'
   meson option.

   This fixes the second regression, avoiding the failure to start
   libvirtd on non-Linux hosts due to non-existant Linux binaries.

 * Change the Linux platform backend to raise an error if the
   firewall backend is 'none'. Again this code path is unreachable
   by default since we'll fail to create the bridge before getting
   here, but if someone modified network.conf to request the 'none'
   backend, this will stop further progress.

 * Change the nop platform backend to raise an error if the
   firewall backend is 'iptables' or 'nftables'. Again this code
   path is unreachable, since we should already have failed to
   find the iptables/nftables binaries on non-Linux hosts, so
   this is just a sanity check.

 * 'none' is not permited as a value in 'firewall_backend_priority'
   meson option, since it is conceptually meaningless to ask for
   that on Linux.

NB, 'firewall_backend_priority' allows repeated options temporarily,
which we don't want. Meson intends to turn this into a hard error

  DEPRECATION: Duplicated values in array option is deprecated. This will become a hard error in the future.

and we can live with the reduced error checking until that happens.

Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-17 15:55:14 +01:00
Michal Privoznik
be58733d90 conf: Drop needless NULL checks guarding virBufferEscapeString()
There's no need to guard virBufferEscapeString() with a call to
NULL as the very first thing the function does is check all three
arguments for NULL.

This patch was generated using the following spatch:

  @@
  expression X, Y, E;
  @@

  - if (E)
      virBufferEscapeString(X, Y, E);

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>
2024-06-17 16:13:28 +02:00
Michal Privoznik
0f5ce3afd4 virprocess: Debug affinity map in virProcessSetAffinity()
The aim of virProcessSetAffinity() is to set affinity of given
process to given CPUs. While we currently print the PID into
logs, the CPU map is not printed. It may help when debugging
weird scenarios.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-06-17 12:30:39 +02:00
Michal Privoznik
095f22db21 qemu_process: Issue an info message when subtracting isolcpus
In one of my previous commits I've made us substract isolcpus
from all online CPUs when setting affinity on QEMU threads. See
commit below for more info on that. Nevertheless, this is
something that surely deserves an entry in log. I've chosen INFO
priority for now. We can promote that to a regular WARN if users
complain.

Fixes: da95bcb6b2d9b04958e0f2603202801dd29debb8
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-06-17 12:30:39 +02:00
Daniel P. Berrangé
f2828880b6 meson: allow systemd sysusersdir to be changed
We currently hardcode the systemd sysusersdir, but it is desirable to be
able to choose a different location in some cases. For example, Fedora
flatpak builds change the RPM %_sysusersdir macro, but we can't currently
honour that.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reported-by: Yaakov Selkowitz <yselkowi@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-13 10:23:11 +01:00
Peter Krempa
39bfd6c888 qemu_validate: Validate support for SCSI emulation support in 'virtio-blk' devices
The support will be dropped soon by qemu, and libvirt is not rejecting
such configurations. Add validation of this explicitly requested config.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-06-12 08:21:12 +02:00
Peter Krempa
126f95c1fe qemuValidateDomainDeviceDefDiskFrontend: Refactor validation of <disk type='lun'>
Use a switch statement for checks based on the disk bus.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-06-12 08:21:11 +02:00
Daniel P. Berrangé
9f549eb8a5 rpc: split TLS cert validation into separate file
The TLS cert validation logic will be reused for the new impl of the
virt-pki-validate tool.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-11 12:50:23 +01:00
Daniel P. Berrangé
14f4de4c73 rpc: refactor method for checking session certificates
This will facilitate moving much of the code into a new file in the
subsequent commit.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-11 12:50:23 +01:00
Daniel P. Berrangé
e66c3bcd0c rpc: split out helpers for TLS cert path location
We'll want to access these paths from outside the TLS context code,
so split them into a standalone file.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-11 12:50:23 +01:00
Georgia Garcia
a2455fd53d virt-aa-helper: use 'include if exists' on .files
Change the 'include' in the AppArmor policy to use 'include if exists'
when including <uuid>.files. Note that 'if exists' is only available
after AppArmor 3.0, therefore a #ifdef check must be added.

When the <uuid>.files is not present, there are some failures in the
AppArmor tools like the following, since they expect the file to exist
when using 'include':

ERROR: Include file /etc/apparmor.d/libvirt/libvirt-8534a409-a460-4fab-a2dd-0e1dce4ff273.files not found

Signed-off-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-06-11 09:46:19 +02:00
Daniel P. Berrangé
a7eb7de531 meson: allow systemd unitdir to be changed
We currently hardcode the systemd unitdir, but it is desirable to be
able to choose a different location in some cases. For examples, Fedora
flatpak builds change the RPM %_unitdir macro, but we can't currently
honour that.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-07 14:04:19 +01:00
Andrea Bolognani
971e767805 qemu: Reject TPM 1.2 in most scenarios
Everywhere we use TPM 2.0 as our default, the chances of TPM
1.2 being supported by the guest OS are very slim. Just reject
such configurations outright.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-06-07 11:13:19 +02:00
Andrea Bolognani
220b2690da qemu: Default to TPM 2.0 in most scenarios
TPM 1.2 is a pretty bad default these days, especially for
architectures which were introduced when TPM 2.0 already existed.

We're already carving out exceptions for several scenarios, but
that's basically backwards: at this point, using TPM 1.2 is the
exception.

Restructure the code so that it reflects reality and we don't
have to remember to update it every time a new architecture is
introduced.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2024-06-07 11:13:16 +02:00
Michal Privoznik
86e511fafb lib: Annotate more function as NULL terminated
While __attribute((sentinel)) (exposed by glib under
G_GNUC_NULL_TERMINATED macro) is a gcc extension, it's supported
by clang too. It's already being used throughout our code but
some functions that take variadic arguments and expect NULL at
the end were lacking such annotation. Fill them in.

After this, there are still some functions left untouched because
they expect a different sentinel than NULL. Unfortunately, glib
does not provide macro for different sentinels. We may come up
with our own, but let's save that for future work.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
2024-06-06 09:29:58 +02:00
Daniel P. Berrangé
3499354e12 interface: fix udev reference leak with invalid flags
The udevInterfaceGetXMLDesc method takes a reference on the udev
driver as its first action. If the virCheckFlags() condition
fails, however, this reference is never released.

Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-05 12:19:12 +01:00
Daniel P. Berrangé
98f1cf88fa rpc: avoid leak of GSource in use for interrupting main loop
We never release the reference on the GSource created for
interrupting the main loop, nor do we remove it from the
main context if our thread is woken up prior to the wakeup
callback firing.

This can result in a leak of GSource objects, along with an
ever growing list of GSources attached to the main context,
which will gradually slow down execution of the loop, as
several operations are O(N) for the number of attached GSource
objects.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-06-05 12:03:24 +01:00
Peter Krempa
f3e8c10fe4 qemu: validate: Fix check for unsupported FS-device bootindex use on un-assigned addresses
When hot-plugging a FS device with un-assigned address with a bootindex
the recently-added validation check would fail as validation on hotplug
is done prior to address assignment.

To fix this problem we can simply relax the check to also pass on _NONE
addresses. Unsupported configurations will still be caught as previous
commit re-checks the definition after address assignment prior to
hotplug.

Resolves: https://issues.redhat.com/browse/RHEL-39271
Fixes: 4690058b6d3dab672bd18ff69c83392245253024
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-05-31 12:54:32 +02:00
Peter Krempa
63b32dbe8b qemu: hotplug: Validate definition of 'FS' device after address allocation
Some of the checks make sense only after the address is allocated and
thus we need to re-do the validation after the address is assigned.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-05-31 12:54:32 +02:00
Peter Krempa
c249f909f3 syms: Properly export 'virDomainDeviceDefValidate'
While the function is exported via header, the symbol itself was not.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2024-05-31 12:54:32 +02:00
Michal Privoznik
a57195e79e log_cleaner: Detect rotated filenames properly
When removing rotated log files, their name is matched against a
regex (@log_regex) and if they contain '.N' suffix the 'N' is
then parsed into an integer. Well, due to a bug in
virLogCleanerParseFilename() this is not how the code works. If
the suffix isn't found then g_match_info_fetch() returns an empty
string instead of NULL which then makes str2int parsing fail.
Just check for this case before parsing the string.

Based on the original patch sent by David.

Reported-by: David Negreira <david.negreira@canonical.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-05-31 08:34:29 +02:00
Michal Privoznik
805b1eec7d qemu_hotplug: Clear QoS if required in qemuDomainChangeNet()
In one of my recent commits, I've introduced
virDomainInterfaceClearQoS() which is a helper that either calls
virNetDevBandwidthClear() ('tc' implementation) or
virNetDevOpenvswitchInterfaceClearQos() (for ovs ifaces). But I
made a micro optimization which leads to a bug: the function
checks whether passed iface has any QoS set and returns early if
it has none. In majority of cases this is right thing to do, but
when removing QoS on virDomainUpdateDeviceFlags() this is
problematic. The new definition (passed as argument to
virDomainInterfaceClearQoS()) contains no QoS (because user
requested its removal) and thus instead of removing the old QoS
setting nothing is done.

Fortunately, the fix is simple - pass olddev which contains the
old QoS setting.

Fixes: 812a146dfe784315edece43d09f8d9e432f8230e
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-05-30 14:08:07 +02:00
Pavel Hrdina
2ea493598f qemu_snapshot: fix memory leak when reverting external snapshot
The code cleaning up virStorageSource doesn't free data allocated by
virStorageSourceInit() so we need to call virStorageSourceDeinit()
explicitly.

Fixes: 8e664737813378d2a1bdeacc2ca8e942327e2cab
Resolves: https://issues.redhat.com/browse/RHEL-33044
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-05-29 15:23:55 +02:00
Andrea Bolognani
957eea376b meson: Improve default firewall backend configuration
The current implementation requires users to configure the
preference as such:

  -Dfirewall_backend_default_1=iptables
  -Dfirewall_backend_default_2=nftables

In addition to being more verbose than one would hope, there
are several things that could go wrong.

First of all, meson performs no validation on the provided
values, so mistakes will only be caught by the compiler.
Additionally, it's entirely possible to provide nonsensical
combinations, such as repeating the same value twice.

Change things so that the preference can now be configured
as such:

  -Dfirewall_backend_priority=iptables,nftables

Checks have been added to prevent invalid values from being
accepted.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
2024-05-28 19:28:58 +02:00
Laine Stump
a4f38f6ffe network: use iif/oif instead of iifname/oifname in nftables rules
iifname/oifname need to lookup the string that contains the name of
the interface each time a packet is checked, while iif/oif compare the
ifindex of the interface, which is included directly in the
packet. Conveniently, the rule is created using the *name* of the
interface (which gets converted to ifindex as the rule is added), so
no extra work is required other than changing the commandline option.

If it was the case that the interface could be deleted and re-added
during the life of the rule, we would have to use Xifname (since
deleting and re-adding the interface would result in ifindex
changing), but for our uses this never happens, so Xif works for us,
and undoubtedly improves performance by at least 0.0000001%.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-05-27 23:53:58 +02:00
Peter Krempa
65cdc37a7e virFileOpenForked: Fix handling of return value from virSocketSendFD()
Commit 91f4ebbac81bc3829da6d5a71d7520a6fc9e358e (v10.0.0-185-g91f4ebbac8)
changed the return value of virSocketSendFD() from 0 to 1 on success.

Unfortunately in 'virFileOpenForked' the return value was used to report
the error back to the main process from the fork'd child. As process
return codes are positive only, the code negates the value of 'ret' and
reports it. This resulted in the parent thinking the process exited with
failure:

 # virsh save avocado-vt-vm1 /mnt/save
 error: Failed to save domain 'avocado-vt-vm1' to /mnt/save
 error: Error from child process creating '/mnt/save': Unknown error 255

This error reproduces on NFS mounts with 'root_squash' enabled. I've
also observed it in one specific migration case when root_squash NFS is
used with following error:

  Failed to open file '/var/lib/libvirt/images/alpine.qcow2': Unknown error 255'

To fix the issue the code is refactored so that it doesn't actually
touch the 'ret' variable needlessly and assigns to it only on failure
cases, which prevents the '1' to be propagated to the parent process as
'255' after negating and storing in the process return code.

Fixes: 91f4ebbac81bc3829da6d5a71d7520a6fc9e358e
Resolves: https://issues.redhat.com/browse/RHEL-36721
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-05-23 14:32:24 +02:00
Peter Krempa
f63cbc7365 virGetGroupList: Refactor and fix callers
Use contemporary style for declarations and automatic memory clearing
for a helper string.

Since the function can't fail any more, remove any mention of returning
errno and remove error checks from all callers.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-05-23 14:32:24 +02:00
Peter Krempa
f2648fca1a virfile: Modernize definition of virFileOpenForked/virFileOpenForceOwnerMode/virFileOpenAs
Declare one argument per line and one variable per line and use boolean
operators at the end of the line rather than at the beginning.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
2024-05-23 14:32:24 +02:00
Laine Stump
afbd1bb89e network: eliminate pointless host input/output rules from nftables backend
The iptables backend (which was used as the model for the nftables
backend) used the same "filter" and "nat" tables used by other
services on the system (e.g. firewalld or any other host firewall
management application), so it was possible that one of those other
services would be blocking DNS, DHCP, or TFTP from guests to the host;
we added our own rules at the beginning of the chain to allow this
traffic no matter if someone else rejected it later.

But with nftables, each service uses their own table, and all traffic
must be acepted by all tables no matter what - it's not possible for
us to just insert a higher priority/earlier rule that will override
some reject rule put in by, e.g., firewalld. Instead the firewalld (or
other) table must be setup by that service to allow the traffic. That,
along with the fact that our table is already "accept by default",
makes it possible to eliminate the individual accept rules for DHCP,
DNS, and TFTP. And once those rules are eliminated, there is no longer
any need for the guest_to_host or host_to_guest tables.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:20:49 -04:00
Laine Stump
958aa7f274 network: rename chains used by network driver nftables backend
Because the chains added by the network driver nftables backend will
go into a table used only by libvirt, we don't need to have "libvirt"
in the chain names. Instead, we can make them more descriptive and
less abrasive (by using lower case, and using full words rather than
abbreviations).

Also (again because nobody else is using the private "libvirt_network"
table) we can directly put our rules into the input ("guest_to_host"),
output ("host_to_guest"), and postrouting ("guest_nat") chains rather
than creating a subordinate chain as done in the iptables backend.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:20:49 -04:00
Laine Stump
0bd7a47356 network: name the nftables table "libvirt_network" rather than "libvirt"
This way when we implement nftables for the nwfilter driver, we can
create a separate table called "libvirt_nwfilter" and everything will
look all symmetrical and stuff.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:20:49 -04:00
Laine Stump
b89c4991da network: add an nftables backend for network driver's firewall construction
Support using nftables to setup the firewall for each virtual network,
rather than iptables. The initial implementation of the nftables
backend creates (almost) exactly the same ruleset as the iptables
backend, determined by running the following commands on a host that
has an active virtual network:

  iptables-save >iptables.txt
  iptables-restore-translate -f iptables.txt

(and the similar ip6tables-save/ip6tables-restore-translate for an
IPv6 network). Correctness of the new backend was checked by comparing
the output of:

  nft list ruleset

when the backend is set to iptables and when it is set to nftables.

This page was used as a guide:

  https://wiki.nftables.org/wiki-nftables/index.php/Moving_from_iptables_to_nftables

The only differences between the rules created by the nftables backed
vs. the iptables backend (aside from a few inconsequential changes in
display order of some chains/options) are:

1) When we add nftables rules, rather than adding them in the
system-created "filter" and "nat" tables, we add them in a private
table (ie only we should be using it) created by us called "libvirt"
(the system-created "filter" and "nat" tables can't be used because
adding any rules to those tables directly with nft will cause failure
of any legacy application attempting to use iptables when it tries to
list the iptables rules (e.g. "iptables -S").

(NB: in nftables only a single table is required for both nat and
filter rules - the chains for each are differentiated by specifying
different "hook" locations for the toplevel chain of each)

2) Since the rules that were added to allow tftp/dns/dhcp traffic from
the guests to the host are unnecessary in the context of nftables,
those rules aren't added.

(Longer explanation: In the case of iptables, all rules were in a
single table, and it was always assumed that there would be some
"catch-all" REJECT rule added by "someone else" in the case that a
packet didn't match any specific rules, so libvirt added these
specific rules to ensure that, no matter what other rules were added
by any other subsystem, the guests would still have functional
tftp/dns/dhcp. For nftables though, the rules added by each subsystem
are in a separate table, and in order for traffic to be accepted, it
must be accepted by *all* tables, so just adding the specific rules to
libvirt's table doesn't help anything (as the default for the libvirt
table is ACCEPT anyway) and it just isn't practical/possible for
libvirt to find *all* other tables and add rules in all of them to
make sure the traffic is accepted. libvirt does this for firewalld (it
creates a "libvirt" zone that allows tftp/dns/dhcp, and adds all
virtual network bridges to that zone), however, so in that case no
extra work is required of the sysadmin.)

3) nftables doesn't support the "checksum mangle" rule (or any
equivalent functionality) that we have historically added to our
iptables rules, so the nftables rules we add have nothing related to
checksum mangling.

(NB: The result of (3) is that if you a) have a very old guest (RHEL5
era or earlier) and b) that guest is using a virtio-net network
device, and c) the virtio-net device is using vhost packet processing
(the default) then DHCP on the guest will fail. You can work around
this by adding <driver name='qemu'/> to the <interface> XML for the
guest).

There are certainly much better nftables rulesets that could be used
instead of those implemented here, and everything is in place to make
future changes to the rules that are used simple and free of surprises
(e.g. the rules that are added have coresponding "removal" commands
added to the network status so that we will always remove exactly the
rules that were previously added rather than trying to remove the
rules that "the current build of libvirt would have added" (which will
be incorrect the first time we run a libvirt with a newly modified
ruleset). For this initial implementation though, I wanted the
nftables rules to be as identical to the iptables rules as possible,
just to make it easier to verify that everything is working.

The backend can be manually chosen using the firewall_backend setting
in /etc/libvirt/network.conf. libvirtd/virtnetworkd will read this
setting when it starts; if there is no explicit setting, it will check
for availability of FIREWALL_BACKEND_DEFAULT_1 and then
FIREWALL_BACKEND_DEFAULT_2 (which are set at build time in
meson_options.txt or by adding -Dfirewall_backend_default_n=blah to
the meson commandline), and use the first backend that is available
(ie, that has the necessary programs installed). The standard
meson_options.txt is set to check for nftables first, and then
iptables.

Although it should be very safe to change the default backend from
iptables to nftables, that change is left for a later patch, to show
how the change in default can be undone if someone really needs to do
that.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:20:07 -04:00
Laine Stump
865eea30f4 meson: stop looking for iptables/ip6tables/ebtables at build time
This was the only reason we required the iptables and ebtables
packages at build time, and many other external commands already have
their binaries found at runtime by looking through $PATH (virCommand
automatically does this), so we may as well do it for these commands
as well.

Since we no longer need iptables or iptables at build time, we can
also drop the BuildRequires for them from the rpm specfile.

Inspired-by: 6aa2fa38b04b802f137e51ebbeb4ca9b67487575
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:20:07 -04:00
Laine Stump
110383fa30 network: save network status when firewall rules are reloaded
In the case that a new version of libvirt is started that uses
different rules to build the network firewall, we need to re-save the
status so that when the network is destroyed (or the *next* time
libvirt is restarted and wants to remove/re-add the firewall), it will
have the proper information to perform the firewall removal.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:20:07 -04:00
Laine Stump
97061d576b network: use previously saved list of firewall removal commands
When destroying a network, the network driver has always assumed that
it knew what firewall rules had been added as the network was
started. This was usually correct - I only recall one time in the past
that the firewall rules added by libvirt were changed. But if the
exact rules used for a network *were* ever changed from one
build/version of libvirt to another, then we would end up attempting
to remove rules that hadn't been added, and could possibly *not*
remove rules that had been added.

The solution to this to not make such brash assumptions about the
past, but instead to save (in the network status object at network
start time) a list of all the rules needed to remove the rules that
were added for the network, and then use that saved list during
network destroy to remove exactly what was previous added.

Beyond making net-destroy more precise, there are other benefits:

1) We can change the details of the rules we add for networks from one
build/release of libvirt to another and painlessly upgrade.

2) The user can switch from one firewall backend to another by simply
changing the setting in network.conf and restarting
libvirtd/virtnetworkd.

In both cases, the restarted libvirtd/virtnetworkd will remove all the
rules that had been previously added (based on the network status),
and then add new rules (saving the new removal commands back into the
network status)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:20:07 -04:00
Laine Stump
0fa79844a1 conf: add a virFirewall object to virNetworkObj
This virFirewall object will store the list of actions required to
remove the firewall that was added for the currently active instance
of the network, so it has been named "fwRemoval" (and when parsed into
XML, the <firewall> element will have the name "fwRemoval").

There are no uses of the fwRemoval object in the virNetworkObj yet,
but everything is in place to add it to the XML when formatted, parse
it from the XML when reading network status, and free the virFirewall
object when the virNetworkObj is freed.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:20:07 -04:00
Laine Stump
df9a505961 util: new functions virFirewallParseXML() and virFirewallFormat()
These functions convert a virFirewall object to/from XML so that it
can be serialized to disk (in a virNetworkObj's status file) and
restored later (e.g. after libvirtd/virtnetworkd is restarted).

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:19:58 -04:00
Laine Stump
b77b2fc314 util: new function virFirewallNewFromRollback()
virFirewallNewFromRollback() creates a new virFirewall object that
contains a copy of the "rollback" commands from an existing
virFirewall object, but in reverse order. The intent is that this
virFirewall be saved and used later to remove the firewall rules that
were added for a network.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:19:48 -04:00
Laine Stump
d24b7501dc util: add name attribute to virFirewall
This will be used to label (via "name='blah'") a firewall when it is
formatted to XML and written to the network status.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:19:36 -04:00
Laine Stump
e1b6b0646f network: turn on auto-rollback for the rules added for virtual networks
So far this will only affect what happens if there is some failure
while applying the firewall rules; the rollback rules aren't yet
persistent beyond that time. More work is needed to remember the
rollback rules while the network is active, and use those rules to
remove the firewall for the network when it is destroyed.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:19:36 -04:00
Laine Stump
e23907635c util: implement rollback rule autocreation for iptables commands
If the VIR_FIREWALL_TRANSACTION_AUTO_ROLLBACK flag is set, each time
an iptables command is executed that is adding a rule or chain, a
corresponding command that will *delete* the same rule/chain is
constructed and added to the list of rollback commands. If we later
want to undo the entire firewall, we can just run those commands.

This isn't yet used anywhere, since
VIR_FIREWALL_TRANSACTION_AUTO_ROLLBACK isn't being set.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:19:36 -04:00
Laine Stump
f94c82b0a6 util: new functions to support adding individual firewall rollback commands
In the past virFirewall required all rollback commands for a group
(those commands necessary to "undo" any rules that had been added in
that group in case of a later failure) to be manually added by
switching into the virFirewall object into "rollback mode" and then
re-calling the inverse of the exact virFirewallAddCmd*() APIs that had
been called to add the original rules (ie. for each
"iptables --insert" command, for rollback we would need to add a
command with all arguments identical except that "--insert" would be
replaced by "--delete").

Because nftables can't search for rules to remove by comparing all the
arguments (it instead expects *only* a handle that is provided via
stdout when the rule was originally added), we won't be able to follow
the iptables method and manually construct the command to undo any
given nft command by just duplicating all the args of the command
(except the action). Instead we will need to be able to automatically
create a rollback command at the time the rule-adding command is
executed (e.g. an "nft delete rule" command that would include the
rule handle returned in stdout by an "nft add rule" command).

In order to make this happen, we need to be able to 1) learn whether
the user of the virFirewall API desires this behavior (handled by a new
transaction flag called VIR_FIREWALL_TRANSACTION_AUTO_ROLLBACK that
can be retrieved with the new virFirewallTransactionGetFlags() API),
and 2) add a new command to the current group's rollback command list (with
the new virFirewallAddRollbackCmd()).

We will actually use this capability in an upcoming patch.

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:19:36 -04:00
Laine Stump
c737f225a9 network: framework to call backend-specific function to init private filter chains
Modify networkSetupPrivateChains() in the network driver to accept a
firewallBackend argument so it will know which backend to call. (right
now it always calls the iptables version of the lower level function,
but in the future it could instead call the nftables version based on
configuration).

But networkSetupPrivateChains() was being called with virOnce(), and
virOnce() doesn't support calling functions that require an argument
(it's based on pthread_once(), which accepts no arguments, so it's not
something we can easily fix in our implementation of virOnce()). To
solve this dilemma, this patch eliminates use of virOnce() by adding a
static lock, and putting all of networkSetupPrivateChains() (including
the setting of "chainInitDone") inside a lock guard - now the places
that used to call it via virOnce() can safely call it directly
instead (adding in the necessary argument to specify backend).

(If it turns out to be significant, we could optimize this by checking
for chainInitDone outside the lock guard, returning immediately if
it's already set, and then moving the setting of chainInitDone up to
the top of the guarded section.)

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:19:36 -04:00
Laine Stump
64b966558c network: support setting firewallBackend from network.conf
It still can have only one useful value ("iptables"), but once a 2nd
value is supported, it will be selectable by setting
"firewall_backend=nftables" in /etc/libvirt/network.conf.

If firewall_backend isn't set in network.conf, then libvirt will check
to see if FIREWALL_BACKEND_DEFAULT_1 is available and, if so, set
that. (Since FIREWALL_BACKEND_DEFAULT_1 is currently "iptables", this
means checking to see it the iptables binary is present on the
system).  If the default backend isn't available, that is considered a
fatal error (since no networks can be started anyway), so an error is
logged and startup of the network driver fails.

NB: network.conf is itself created from network.conf.in at build time,
and the advertised default setting of firewall_backend (in a commented
out line) is set from the meson_options.txt setting
"firewall_backend_default_1". This way the conf file will have correct
information no matter what ordering is chosen for default backend at
build time (as more backends are added, settings will be added for
"firewall_backend_default_n", and those will be settable in
meson_options.txt and on the meson commandline to change the ordering
of the auto-detection when no backend is set in network.conf).

virNetworkLoadDriverConfig() may look more complicated than necessary,
but as additional backends are added, it will be easier to add checks
for those backends (and to re-order the checks based on builders'
preferences).

Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-05-22 23:19:18 -04:00