mirror of
https://gitlab.com/libvirt/libvirt.git
synced 2025-01-23 04:55:18 +00:00
42ab0148dd
Many years ago (April 2010), soon after "vhost" in-kernel packet processing was added to the virtio-net driver, people running RHEL5 virtual machines with a virtio-net interface connected via a libvirt virtual network noticed that when vhost packet processing was enabled, their VMs could no longer get an IP address via DHCP - the guest was ignoring the DHCP response packets sent by the host. (I've been informed by danpb that the same issue had been encountered, and "fixed" even earlier than that, in 2006, with Xen as the hypervisor.) The "gory details" of the 2010 discussion are chronicled here: https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html but basically it was because packet checksums weren't being fully computed on the host side (because QEMU on the host and the NIC driver in the guest had agreed between themselves to turn off checksums because they were unnecessary due to the "link" between the two being entirely in local memory rather than an error-prone physical cable), but 1) a partial checksum was being put into the packets at some point by "someone" 2) the "don't use checksums" info was known by the guest kernel, which would properly ignore the "bad" checksum), and 3) the packets were being read by the dhclient application on the guest side with a "raw" socket (thus bypassing the guest kernel UDP processing that would have known the checksum was irrelevant and ignore it)), The "fix" for this ended up being two-tiered: 1) The ISC DHCP package (which contains the aforementioned dhclient program) made a fix to their dhclient code which caused it to accept packets anyway even if they didn't have a proper checksum (NB: that's not a full explanation, and possibly not accurate). This remedied the problem for guests with an updated dhclient. Here is the code with the fix to ISC DHCP: https://github.com/isc-projects/dhcp/blob/master/common/packet.c#L365 This eliminated the issue for any new/updated guests that had the fixed dhclient, but it didn't solve the problem for existing/old guest images that didn't/couldn't get their dhclient updated. This brings us to: 2) iptables added a new "CHECKSUM" target and "--checksum-fill" action: http://patchwork.ozlabs.org/patch/58525/ and libvirt added an iptables rule for each virtual network to match DHCP response packets and perform --checksum-fill. This way by the time dhclient on the guest read the raw packet, the checksum would be corrected, and the packet would be accepted. This was pushed upstream in libvirt commit v0.8.2-142-gfd5b15ff1a. The word at the time from those more knowledgeable than me was that the bad checksum problem was really specific to ISC's dhclient running on Linux, and so once their fix was in use everywhere dhclient was used, bad checksums would be a thing of the past and the --checksum-fill iptables rules would no longer be needed (but would otherwise be harmless if they were still there). (Plot twist: the dhclient code in fix (1) above apparently is on a Linux-only code path - this is very important later!) Based on this information (and also due to the opinion that fixing it by having iptables modify the packet checksum was really the wrong way to permanently fix things, i.e. an "ugly hack"), the nftables developers made the decision to not implement an equivalent to --checksum-fill in nftables. As a result, when I wrote the nftables firewall backend for libvirt virtual networks earlier this year, it didn't add in any rule to "fix" broken UDP checksums (since there was apparently no equivalent in nftables and, after all, that was fixed somewhere else 14 years ago, right???) But last week, when Rich Jones was doing routine testing using a Fedora 40 host (the first Fedora release to use the nftables backend of libvirt's network driver by default) and a FreeBSD guest, for "some strange reason", the FreeBSD guest was unable to get an IP address from DHCP!! https://www.spinics.net/linux/fedora/libvirt-users/msg14356.html A few quick tests proved that it was the same old "bad checksum" problem from 2010 come back to haunt us - it wasn't a Linux-only issue after all. Phil Sutter and Eric Garver (nftables people) pointed out that, while nftables doesn't have an action that will *compute* the checksum of a packet, it *does* have an action that will set the checksum to 0, and suggested we try adding a "zero the checksum" rule for dhcp response packets to our nftables ruleset. (Why? Because a checksum value of 0 in a IPv4 UDP packet is defined by RFC768 to mean "no checksum generated", implying "checksum not needed"). It turns out that this works - dhclient properly recognizes that a 0 checksum means "don't bother with the checksum", and accepts the packet as valid. So to once again fix this timeless bug, this patch adds such a checksum zeroing rule to the nftables rules setup for each virtual network. This has been verified (on a Fedora 40 host) to fix DHCP with FreeBSD and OpenBSD guests, while not breaking it for Fedora or Windows (10) guests. Fixes: b89c4991daa0ee9371f10937fab3b03c5ffdabc6 Reported-by: Rich Jones <rjones@redhat.com> Fix-Suggested-by: Eric Garver <egarver@redhat.com> Fix-Suggested-by: Phil Sutter <psutter@redhat.com> Signed-off-by: Laine Stump <laine@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
153 lines
1.7 KiB
Plaintext
153 lines
1.7 KiB
Plaintext
nft \
|
|
list \
|
|
table \
|
|
ip \
|
|
libvirt_network
|
|
nft \
|
|
add \
|
|
table \
|
|
ip \
|
|
libvirt_network
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip \
|
|
libvirt_network \
|
|
forward \
|
|
'{ type filter hook forward priority 0; policy accept; }'
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip \
|
|
libvirt_network \
|
|
guest_output
|
|
nft \
|
|
insert \
|
|
rule \
|
|
ip \
|
|
libvirt_network \
|
|
forward \
|
|
counter \
|
|
jump \
|
|
guest_output
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip \
|
|
libvirt_network \
|
|
guest_input
|
|
nft \
|
|
insert \
|
|
rule \
|
|
ip \
|
|
libvirt_network \
|
|
forward \
|
|
counter \
|
|
jump \
|
|
guest_input
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip \
|
|
libvirt_network \
|
|
guest_cross
|
|
nft \
|
|
insert \
|
|
rule \
|
|
ip \
|
|
libvirt_network \
|
|
forward \
|
|
counter \
|
|
jump \
|
|
guest_cross
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip \
|
|
libvirt_network \
|
|
guest_nat \
|
|
'{ type nat hook postrouting priority 100; policy accept; }'
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip \
|
|
libvirt_network \
|
|
postroute_mangle \
|
|
'{ type filter hook postrouting priority 0; policy accept; }'
|
|
nft \
|
|
list \
|
|
table \
|
|
ip6 \
|
|
libvirt_network
|
|
nft \
|
|
add \
|
|
table \
|
|
ip6 \
|
|
libvirt_network
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip6 \
|
|
libvirt_network \
|
|
forward \
|
|
'{ type filter hook forward priority 0; policy accept; }'
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip6 \
|
|
libvirt_network \
|
|
guest_output
|
|
nft \
|
|
insert \
|
|
rule \
|
|
ip6 \
|
|
libvirt_network \
|
|
forward \
|
|
counter \
|
|
jump \
|
|
guest_output
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip6 \
|
|
libvirt_network \
|
|
guest_input
|
|
nft \
|
|
insert \
|
|
rule \
|
|
ip6 \
|
|
libvirt_network \
|
|
forward \
|
|
counter \
|
|
jump \
|
|
guest_input
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip6 \
|
|
libvirt_network \
|
|
guest_cross
|
|
nft \
|
|
insert \
|
|
rule \
|
|
ip6 \
|
|
libvirt_network \
|
|
forward \
|
|
counter \
|
|
jump \
|
|
guest_cross
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip6 \
|
|
libvirt_network \
|
|
guest_nat \
|
|
'{ type nat hook postrouting priority 100; policy accept; }'
|
|
nft \
|
|
add \
|
|
chain \
|
|
ip6 \
|
|
libvirt_network \
|
|
postroute_mangle \
|
|
'{ type filter hook postrouting priority 0; policy accept; }'
|