passt

mirror of https://passt.top/passt synced 2024-12-22 21:55:22 +00:00

Author	SHA1	Message	Date
David Gibson	0804fdbc28	udp: Correctly look up outbound socket with port remappings Commit `bb9bf0bb` ("tcp, udp: Don't precompute port remappings in epoll references") changed the epoll reference for UDP sockets to include the bound port as seen by the socket itself, rather than the bound port it would be translated to on the guest side. As a side effect, it also means that udp_tap_map[] is indexed by the bound port on the host side, rather than on the guest side. This is consistent and a good idea, however we forgot to account for it when finding the correct outgoing socket for packets originating in the guest. This means that if forwarding UDP inbound with a port number change, reply packets would be misdirected. Fix this by applying the reverse mapping before looking up the socket in udp_tap_handler(). While we're at it, use 'port' directly instead of 'uref.port' in udp_sock_init(). Those now always have the same value - failing to realise that is the same error as above. Reported-by: Laurent Jacquot <jk@lutty.net> Link: https://bugs.passt.top/show_bug.cgi?id=87 Fixes: `bb9bf0bb8f` ("tcp, udp: Don't precompute port remappings in epoll references") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-25 00:00:34 +02:00
Laurent Vivier	95601237ef	tcp: Replace TCP buffer structure by an iovec array To be able to provide pointers to TCP headers and IP headers without worrying about alignment in the structure, split the structure into several arrays and point to each part of the frame using an iovec array. Using iovec also allows us to simply ignore the first entry when the vnet length header is not needed. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-19 11:21:09 +02:00
Stefano Brivio	27f1c762b1	conf: Don't fail if the template interface doesn't have a MAC address ...simply resort to using locally-administered address (LAA) as host-side source, instead. Pick 02:00:00:00:00:00, to make it clear that we don't actually care about that address, and also to match the 00 (Administratively Assigned Identifier) quadrant of SLAP (RFC 8948). Otherwise, pasta refuses to start if the template is a tun or Wireguard interface. Link: https://bugs.passt.top/show_bug.cgi?id=49 Link: https://github.com/containers/podman/issues/22320 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-19 11:21:00 +02:00
Stefano Brivio	eca8baa028	conf: We're interested in the MAC address, not in the MAC itself Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-04-19 11:15:36 +02:00
Stefano Brivio	ee338a256e	pasta, util: Align stack area for clones to maximum natural alignment Given that we use this stack pointer as a location to store arbitrary data types from the cloned process, we need to guarantee that its alignment matches any of those possible data types. runsisi reports that pasta gets a SIGBUS in pasta_open_ns() on aarch64, where the alignment requirement for stack pointers is a 16 bytes (same as the size of a long double), and similar requirements actually apply to most architectures we run on. Reported-by: runsisi <runsisi@hust.edu.cn> Link: https://bugs.passt.top/show_bug.cgi?id=85 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-04-19 11:15:27 +02:00
Stefano Brivio	5d5208b67d	treewide: Compilers' name for armv6l and armv7l is "arm" When I switched from 'uname -m' to 'gcc -dumpmachine' to fetch the architecture name for, among others, seccomp.sh, I didn't realise that "armv6l" and "armv7l" are just Linux kernel names -- compilers just call that "arm". Fix the "syscalls" annotation we use to define seccomp profiles accordingly, otherwise pasta will be terminated on sigreturn() on armv6l and armv7l. Fixes: `213c397492` ("passt, pasta: Run-time selection of AVX2 build") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-11 17:34:04 +02:00
David Gibson	954589b64b	test: Verify that podman tests are using the pasta binary we expect Paul Holzinger pointed out that when we invoke the podman tests inside the passt testsuite, the way we point podman at the newly built pasta binary is kind of indirect. It's therefore prudent to check that podman is actually using the binary we expect it to - in particular that it is using the binary built in this tree, not some system installed pasta binary. Suggested-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 16:59:24 +02:00
David Gibson	489b28e216	test: catatonit may not be in $PATH The pasta_podman/bats test script looks for 'catatonit' amongst other tools to be avaiiliable on the host. However, while the podman tests do require catatonit, it doesn't necessarily need to be in the regular path. For example Fedora and RHEL place catatonit in /usr/libexec and podman finds it there fine. Therefore, remove it as an htools dependency. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 16:59:21 +02:00
David Gibson	f9fe3ae5dd	test: Build and download podman as a test asset The pasta_podman/bats test scrpt downloads and builds podman, then runs its pasta specific tests. Downloading from within a test case has some drawbacks: * It can be very tedious if you have poor connectivity to the server * It makes a test that's ostensibly for pasta itself dependent on the state of the github server * It precludes runnning the tests in an isolated network environment The same concerns largely apply to building podman too, because it's pretty common for Go builds to download dependencies themselves. Therefore move the download and build of podman from the test itself, to the Makefile where we prepare other test assets. To avoid cryptic failures if something went wrong with the build, make running the test dependent on having the built podman binary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 16:59:16 +02:00
David Gibson	e8b78217bb	test: Make sure to update mbuto repository We download and use mbuto to build trivial boot images for our VM tests. However, if mbuto is already cloned, we won't update it to the current version. Add some make logic to ensure that we do this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 16:59:13 +02:00
David Gibson	ef2cb13b49	cppcheck: Explicitly give files to check Currently "make cppcheck" invokes cppcheck on ".", so it will check all the .c and .h files it can find in the source tree. This isn't ideal, because it can find files that aren't actually part of the real build, or even stale files which aren't in git. More practically, some upcoming changes are looking at downloading other source trees for some tests. Static errors in there is Not Our Problem, so checking them is both slow and pointless. So, change the Makefile to invoke cppcheck only on the specific source files that are part of the build. For some reason in this format the badBitmaskCheck warnings in seccomp.h which were suppressed by `5beb3472e` ("cppcheck: Avoid errors due to zeroes in bitwise ORs") no longer trigger. That means we get unmatchedSuppression warnings instead. We add an unmatchedSuppression suppression instead of simply removing the original suppressions, just in case this odd behaviour isn't the same for all cppcheck versions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 16:59:11 +02:00
David Gibson	97e8b33f87	netlink: Ignore routes to link-local addresses for selecting interface Since `f919dc7a4b` ("conf, netlink: Don't require a default route to start"), and since `639fdf06ed` ("netlink: Fix selection of template interface") less buggily, we haven't required a default route on the host in order to operate. Instead, if we lack a default route we'll pick an interface with any route, as long as there's only one such interface. If there's more than one, we don't have a good criterion to pick, so we give up with an informational message. Paul Holzinger pointed out that this code considers it ambiguous even if all but one of the interfaces has only routes to link-local addresses (fe80::/10). A route to link-local addresses isn't really useful from pasta's point of view, so ignore them instead. This removes a misleading message in many cases, and a spurious failure in some cases. Suggested-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 16:59:08 +02:00
David Gibson	67a6258918	util: Add helper to return name of address family We have a few places where we want to include the name of the internet protocol version (IPv4 or IPv6) in a message, which we handle with an open-coded ?: expression. This seems like something that might be more widely useful, so make a trivial helper to return the correct string based on the address family. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 16:59:05 +02:00
Stefano Brivio	f4e38b5cd2	netlink: Adjust interface index inside copied nexthop objects too As pasta duplicates host routes into the target namespaces, interface indices might not match, so we go through RTA_OIF attributes and fix them up to match the identifier in the namespace. But RTA_OIF is not the ony attribute specifying interfaces for routes: multipath routes use RTA_MULTIPATH attributes with nexthop objects, which contain in turn interface indices. Fix them up as well. If we don't, and we have at least two host interfaces, and the host interface we use as template isn't the first one (hence the mismatching indices), we'll fail to insert multipath routes with nexthop objects, and ultimately refuse to start as the kernel unexpectedly gives us ENODEV. Link: https://github.com/containers/podman/issues/22192 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-04-05 16:58:52 +02:00
Danish Prakash	88c2f08eba	apparmor: Fix access to procfs namespace entries in pasta's abstraction From an original patch by Danish Prakash. With commit `ff22a78d7b` ("pasta: Don't try to watch namespaces in procfs with inotify, use timer instead"), if a filesystem-bound target namespace is passed on the command line, we'll grab a handle on its parent directory. That commit, however, didn't introduce a matching AppArmor rule. Add it here. To access a network namespace procfs entry, we also need a 'ptrace' rule. See commit `594dce66d3` ("isolation: keep CAP_SYS_PTRACE when required") for details as to when we need this -- essentially, it's about operation with Buildah. Reported-by: Jörg Sonnenberger <joerg@bec.de> Link: https://github.com/containers/buildah/issues/5440 Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840 Fixes: `ff22a78d7b` ("pasta: Don't try to watch namespaces in procfs with inotify, use timer instead") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 12:12:26 +02:00
Stefano Brivio	100919ce74	apparmor: Expand scope of @{run}/user access, allow writing PID files too With Podman's custom networks, pasta will typically need to open the target network namespace at /run/user/<UID>/containers/networks: grant access to anything under /run/user/<UID> instead of limiting it to some subpath. Note that in this case, Podman will need pasta to write out a PID file, so we need write access, for similar locations, too. Reported-by: Jörg Sonnenberger <joerg@bec.de> Link: https://github.com/containers/buildah/issues/5440 Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 12:12:26 +02:00
Stefano Brivio	dc7b7f28b7	apparmor: Add mount rule with explicit, empty source in passt abstraction For the policy to work as expected across either AppArmor commit 9d3f8c6cc05d ("parser: fix parsing of source as mount point for propagation type flags") and commit 300889c3a4b7 ("parser: fix option flag processing for single conditional rules"), we need one mount rule with matching mount options as "source" (that is, without source), and one without mount options and an explicit, empty source. Link: https://github.com/containers/buildah/issues/5440 Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-05 12:12:26 +02:00
Stefano Brivio	bbea2752f6	README.md: Alpine, Guix and OpenSUSE now have packages for passt Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-04-05 12:12:23 +02:00
David Gibson	4988e2b406	tcp: Unconditionally force ACK for all !SYN, !RST packets Currently we set ACK on flags packets only when the acknowledged byte pointer has advanced, or we hadn't previously set a window. This means in particular that we can send a window update with no ACK flag, which doesn't appear to be correct. RFC 9293 requires a receiver to ignore such a packet [0], and indeed it appears that every non-SYN, non-RST packet should have the ACK flag. The reason for the existing logic, rather than always forcing an ACK seems to be to avoid having the packet mistaken as a duplicate ACK which might trigger a fast retransmit. However, earlier tests in the function mean we won't reach here if we don't have either an advance in the ack pointer - which will already set the ACK flag, or a window update - which shouldn't trigger a fast retransmit. [0] https://www.ietf.org/rfc/rfc9293.html#section-3.10.7.4-2.5.2.1 Link: https://github.com/containers/podman/issues/22146 Link: https://bugs.passt.top/show_bug.cgi?id=84 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-26 09:52:04 +01:00
David Gibson	5894a245b9	tcp: Never automatically add the ACK flag to RST packets tcp_send_flag() will sometimes force on the ACK flag for all !SYN packets. This doesn't make sense for RST packets, where plain RST and RST+ACK have somewhat different meanings. AIUI, RST+ACK indicates an abrupt end to a connection, but acknowledges data already sent. Plain RST indicates an abort, when one end receives a packet that doesn't seem to make sense in the context of what it knows about the connection. All of the cases where we send RSTs are the second, so we don't want an ACK flag, but we currently could add one anyway. Change that, so we won't add an ACK to an RST unless the caller explicitly requests it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-26 09:51:58 +01:00
David Gibson	16c2d8da0d	tcp: Rearrange logic for setting ACK flag in tcp_send_flag() We have different paths for controlling the ACK flag for the SYN and !SYN paths. This amounts to sometimes forcing on the ACK flag in the !SYN path regardless of options. We can rearrange things to explicitly be that which will make things neater for some future changes. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-26 09:51:55 +01:00
David Gibson	99355e25b9	tcp: Split handling of DUP_ACK from ACK The DUP_ACK flag to tcp_send_flag() has two effects: first it forces the setting of the ACK flag in the packet, even if we otherwise wouldn't. Secondly, it causes a duplicate of the flags packet to be sent immediately after the first. Setting the ACK flag to tcp_send_flag() also has the first effect, so instead of having DUP_ACK also do that, pass both flags when we need both operations. This slightly simplifies the logic of tcp_send_flag() in a way that makes some future changes easier. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-26 09:51:41 +01:00
Laurent Vivier	71dd405460	util: fix confusion between offset in the iovec array and in the entry In write_remainder() 'skip' is the offset to start the operation from in the iovec array. In iov_skip_bytes(), 'skip' is also the offset in the iovec array but 'offset' is the first unskipped byte in the iovec entry. As write_remainder() uses 'skip' for both, 'skip' is reset to the first unskipped byte in the iovec entry rather to staying the first unskipped byte in the iovec array. Fix the problem by introducing a new variable not to overwrite 'skip' on each loop. Fixes: `8bdb0883b4` ("util: Add write_remainder() helper") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-20 10:06:32 +01:00
David Gibson	639fdf06ed	netlink: Fix selection of template interface Since `f919dc7a4b` ("conf, netlink: Don't require a default route to start"), if there is only one host interface with routes, we will pick that as the template interface, even if there are no default routes for an IP version. Unfortunately this selection had a serious flaw: in some cases it would 'return' in the middle of an nl_foreach() loop, meaning we wouldn't consume all the netlink responses for our query. This could cause later netlink operations to fail as we read leftover responses from the aborted query. Rewrite the interface detection to avoid this problem. While we're there: * Perform detection of both default and non-default routes in a single pass, avoiding an ugly goto * Give more detail on error and working but unusual paths about the situation (no suitable interface, multiple possible candidates, etc.). Fixes: `f919dc7a4b` ("conf, netlink: Don't require a default route to start") Link: https://bugs.passt.top/show_bug.cgi?id=83 Link: https://github.com/containers/podman/issues/22052 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2270257 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Use info(), not warn() for somewhat expected cases where one IP version has no default routes, or no routes at all] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-20 09:34:08 +01:00
David Gibson	d35bcbee90	netlink: Fix handling of NLMSG_DONE in nl_route_dup() A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families") changed netlink behaviour so that the NLMSG_DONE terminating a bunch of responses can go in the same datagram as those responses, rather than in a separate one. Our netlink code is supposed to handle that behaviour, and indeed does so for most cases, using the nl_foreach() macro. However, there was a subtle error in nl_route_dup() which doesn't work with this change. `f00b1534` ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own subtle error. The problem arises because nl_route_dup(), unlike other cases doesn't just make a single pass through all the responses to a netlink request. It needs to get all the routes, then make multiple passes through them. We don't really have anywhere to buffer multiple datagrams, so we only support the case where all the routes fit in a single datagram - but we need to fail gracefully when that's not the case. After receiving the first datagram of responses (with nl_next()) we have a first loop scanning them. It needs to exit when either we run out of messages in the datagram (!NLMSG_OK()) or when we get a message indicating the last response (nl_status() <= 0). What we do after the loop depends on which exit case we had. If we saw the last response, we're done, but otherwise we need to receive more datagrams to discard the rest of the responses. We attempt to check for that second case by re-checking NLMSG_OK(nh, status). However in the got-last-response case, we've altered status from the number of remaining bytes to the error code (usually 0). That means NLMSG_OK() now returns false even if it didn't during the loop check. To fix this we need separate variables for the number of bytes left and the final status code. We also checked status after the loop, but this was redundant: we can only exit the loop with NLMSG_OK() == true if status <= 0. Reported-by: Martin Pitt <mpitt@redhat.com> Fixes: `f00b153414` ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") Fixes: `4d6e9d0816` ("netlink: Always process all responses to a netlink request") Link: https://github.com/containers/podman/issues/22052 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-19 10:23:34 +01:00
Dan Čermák	615d370ca2	fedora: Switch license identifier to SPDX The spec file patch by Dan Čermák was originally contributed at: https://src.fedoraproject.org/rpms/passt/pull-request/1 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-18 08:57:47 +01:00
Stefano Brivio	d989eae308	udp: Translate source address of resolver only for DNS remapped queries Paul reports that if pasta is configured with --dns-forward, and the container queries a resolver which is configured on the host directly, without using the address given for --dns-forward, we'll translate the source address of the response pretending it's coming from the address passed as --dns-forward, and the client will discard the reply. That is, $ cat /etc/resolv.conf 198.51.100.1 $ pasta --config-net --dns-forward 192.0.2.1 nslookup passt.top will not work, because we change the source address of the reply from 198.51.100.1 to 192.0.2.1. But the client contacted 198.51.100.1, and it's from that address that it expects an answer. Add a PORT_DNS_FWD flag for tap-facing ports, which is triggered by activity in the opposite direction as the other flags. If the tap-facing port was seen sending a DNS query that was remapped, we'll remap the source address of the response, otherwise we'll leave it unaffected. Reported-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-18 08:57:40 +01:00
Stefano Brivio	f919dc7a4b	conf, netlink: Don't require a default route to start There might be isolated testing environments where default routes and global connectivity are not needed, a single interface has all non-loopback addresses and routes, and still passt and pasta are expected to work. In this case, it's pretty obvious what our upstream interface should be, so go ahead and select the only interface with at least one route, disabling DHCP and implying --no-map-gw as the documentation already states. If there are multiple interfaces with routes, though, refuse to start, because at that point it's really not clear what we should do. Reported-by: Martin Pitt <mpitt@redhat.com> Link: https://github.com/containers/podman/issues/21896 Signed-off-by: Stefano brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-18 08:57:21 +01:00
Stefano Brivio	f00b153414	netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE Martin reports that, with Fedora Linux kernel version kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64, including commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families"), pasta doesn't exit once the network namespace is gone. Actually, pasta is completely non-functional, at least with default options, because nl_route_dup(), which duplicates routes from the parent namespace into the target namespace at start-up, is stuck on a second receive operation for RTM_GETROUTE. However, with that commit, the kernel is now able to fit the whole response, including the NLMSG_DONE message, into a single datagram, so no further messages will be received. It turns out that commit `4d6e9d0816` ("netlink: Always process all responses to a netlink request") accidentally relied on the fact that we would always get at least two datagrams as a response to RTM_GETROUTE. That is, the test to check if we expect another datagram, is based on the 'status' variable, which is 0 if we just parsed NLMSG_DONE, but we'll also expect another datagram if NLMSG_OK on the last message is false. But NLMSG_OK with a zero length is always false. The problem is that we don't distinguish if status is zero because we got a NLMSG_DONE message, or because we processed all the available datagram bytes. Introduce an explicit check on NLMSG_DONE. We should probably refactor this slightly, for example by introducing a special return code from nl_status(), but this is probably the least invasive fix for the issue at hand. Reported-by: Martin Pitt <mpitt@redhat.com> Link: https://github.com/containers/podman/issues/22052 Fixes: `4d6e9d0816` ("netlink: Always process all responses to a netlink request") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-18 08:56:32 +01:00
David Gibson	d3eb0d7b59	tap: Rename tap_iov_{base,len} These two functions are typically used to calculate values to go into the iov_base and iov_len fields of a struct iovec. They don't have to be used for that, though. Rename them in terms of what they actually do: calculate the base address and total length of the complete frame, including both L2 and tap specific headers. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-14 16:57:43 +01:00
David Gibson	4db947d17c	tap: Implement tap_send() "slow path" in terms of fast path Most times we send frames to the guest it goes via tap_send_frames(). However "slow path" protocols - ARP, ICMP, ICMPv6, DHCP and DHCPv6 - go via tap_send(). As well as being a semantic duplication, tap_send() contains at least one serious problem: it doesn't properly handle short sends, which can be fatal on the qemu socket connection, since frame boundaries will get out of sync. Rewrite tap_send() to call tap_send_frames(). While we're there, rename it tap_send_single() for clarity. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-14 16:57:37 +01:00
David Gibson	1ebe787fe4	tap: Simplify some casts in the tap "slow path" functions We can both remove some variables which differ from others only in type, and slightly improve type safety. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-14 16:57:33 +01:00
David Gibson	2d0e0084b6	tap: Extend tap_send_frames() to allow multi-buffer frames tap_send_frames() takes a vector of buffers and requires exactly one frame per buffer. We have future plans where we want to have multiple buffers per frame in some circumstances, so extend tap_send_frames() to take the number of buffers per frame as a parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Improve comment to rembufs calculation] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-14 16:57:28 +01:00
Stefano Brivio	f67238aa86	passt, log: Call __openlog() earlier, log to stderr until we detach Paul reports that, with commit `15001b39ef` ("conf: set the log level much earlier"), early messages aren't reported to standard error anymore. The reason is that, once the log mask is changed from LOG_EARLY, we don't force logging to stderr, and this mechanism was abused to have early errors on stderr. Now that we drop LOG_EARLY earlier on, this doesn't work anymore. Call __openlog() as soon as we know the mode we're running as, using LOG_PERROR. Then, once we detach, if we're not running from an interactive terminal and logging to standard error is not forced, drop LOG_PERROR from the options. While at it, check if the standard error descriptor refers to a terminal, instead of checking standard output: if the user redirects standard output to /dev/null, they might still want to see messages from standard error. Further, make sure we don't print messages to standard error reporting that we couldn't log to the system logger, if we didn't open a connection yet. That's expected. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: `15001b39ef` ("conf: set the log level much earlier") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-14 08:19:36 +01:00
Stefano Brivio	3fe9878db7	pcap: Use clock_gettime() instead of gettimeofday() POSIX.1-2008 declared gettimeofday() as obsolete, but I'm a dinosaur. Usually, C libraries translate that to the clock_gettime() system call anyway, but this doesn't happen in Jon's environment, and, there, seccomp happily kills pasta(1) when started with --pcap, because we didn't add gettimeofday() to our seccomp profiles. Use clock_gettime() instead. Reported-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-14 08:18:36 +01:00
Stefano Brivio	0761f29a14	passt.1: --{no-,}dhcp-dns and --{no-,}dhcp-search don't take addresses ...they are simple enable/disable options. Fixes: `89678c5157` ("conf, udp: Introduce basic DNS forwarding") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-14 08:18:09 +01:00
Stefano Brivio	4d05ba2c58	conf: Warn if we can't advertise any nameserver via DHCP, NDP, or DHCPv6 We might have read from resolv.conf, or from the command line, a resolver that's reachable via loopback address, but that doesn't mean we can offer that via DHCP, NDP or DHCPv6: warn if there are no resolvers we can offer for a given IP version. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-14 08:17:37 +01:00
Stefano Brivio	43881636c2	conf: Handle addresses passed via --dns just like the ones from resolv.conf ...that is, call add_dns4() and add_dns6() instead of simply adding those to the list of servers we advertise. Most importantly, this will set the 'dns_host' field for the matching IP version, so that, as mentioned in the man page, servers passed via --dns are used for DNS mapping as well, if used in combination with --dns-forward. Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://bugs.passt.top/show_bug.cgi?id=82 Fixes: `89678c5157` ("conf, udp: Introduce basic DNS forwarding") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-14 08:16:04 +01:00
Laurent Vivier	b299942bbd	tap: Capture only packets that are actually sent In tap_send_frames(), if we failed to send all the frames, we must only log the frames that have been sent, not all the frames we wanted to send. Fixes: `dda7945ca9` ("pcap: Handle short writes in pcap_frame()") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:27 +01:00
David Gibson	413c15988e	udp: Use existing helper for UDP checksum on inbound IPv6 packets Currently we open code the calculation of the UDP checksum in udp_update_hdr6(). We calling a helper to handle the IPv6 pseudo-header, and preset the checksum field to 0 so an uninitialised value doesn't get folded in. We already have a helper to do this: csum_udp6() which we use in some slow paths. Use it here as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:25 +01:00
David Gibson	ae69838db0	udp: Avoid unnecessary pointer in udp_update_hdr4() We carry around the source address as a pointer to a constant struct in_addr. But it's silly to carry around a 4 or 8 byte pointer to a 4 byte IPv4 address. Just copy the IPv4 address around by value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:21 +01:00
David Gibson	b0419d150a	udp: Re-order udp_update_hdr[46] for clarity and brevity The order of things in these functions is a bit odd for historical reasons. We initialise some IP header fields early, the more later after making some tests. Likewise we declare some variables without initialisation, but then unconditionally set them to values we could calculate at the start of the function. Previous cleanups have removed the reasons for some of these choices, so reorder for clarity, and where possible move the first assignment into an initialiser. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:19 +01:00
David Gibson	8a842e03cd	udp: Pass data length explicitly to to udp_update_hdr[46] These functions take an index to the L2 buffer whose header information to update. They use that for two things: to locate the buffer pointer itself, and to retrieve the length of the received message from the paralllel udp[46]_l2_mh_sock array. The latter is arguably a failure to separate concerns. Change these functions to explicitly take a buffer pointer and payload length as parameters. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:17 +01:00
David Gibson	76571ae869	udp: Consistent port variable names in udp_update_hdr[46] In these functions we have 'dstport' for the destination port, but 'src_port' for the source port. Change the latter to 'srcport' for consistency. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:15 +01:00
David Gibson	205b140dec	udp: Refactor udp_sock[46]_iov_init() Each of these functions have 3 essentially identical loops in a row. Merge the loops into a single common udp_sock_iov_init() function, calling udp_sock[46]_iov_init_one() helpers to initialize each "slot" in the various parallel arrays. This is slightly neater now, and more naturally allows changes we want to make where more initialization will become common between IPv4 and IPv6. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:36:59 +01:00
Stefano Brivio	860d2764dd	conf: Don't warn if nameservers were found, but won't be advertised Starting from commit `3a2afde87d` ("conf, udp: Drop mostly duplicated dns_send arrays, rename related fields"), we won't add to c->ip4.dns and c->ip6.dns nameservers that can't be used by the guest or container, and we won't advertise them. However, the fact that we don't advertise any nameserver doesn't mean that we didn't find any, and we should warn only if we couldn't find any. This is particularly relevant in case both --dns-forward and --no-map-gw are passed, and a single loopback address is listed in /etc/resolv.conf: we'll forward queries directed to the address specified by --dns-forward to the loopback address we found, we won't advertise that address, so we shouldn't warn: this is a perfectly legitimate usage. Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://github.com/containers/podman/issues/19213 Fixes: `3a2afde87d` ("conf, udp: Drop mostly duplicated dns_send arrays, rename related fields") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Paul Holzinger <pholzing@redhat.com>	2024-03-12 01:50:48 +01:00
David Gibson	4779dfe12f	icmp: Use 'flowside' epoll references for ping sockets Currently ping sockets use a custom epoll reference type which includes the ICMP id. However, now that we have entries in the flow table for ping flows, finding that is sufficient to get everything else we want, including the id. Therefore remove the icmp_epoll_ref type and use the general 'flowside' field for ping sockets. Having done this we no longer need separate EPOLL_TYPE_ICMP and EPOLL_TYPE_ICMPV6 reference types, because we can easily determine which case we have from the flow type. Merge both types into EPOLL_TYPE_PING. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-12 01:49:05 +01:00
David Gibson	02cbdb0b86	icmp: Flow based error reporting Use flow_dbg() and flow_err() helpers to generate flow-linked error messages in most places. Make a few small improvements to the messages while we're at it. This allows us to avoid the awkward 'pname' variables since whether we're dealing with ICMP or ICMPv6 is already built into the flow type which these helpers include. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Coding style fix in icmp_tap_handler()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-12 01:36:04 +01:00
David Gibson	3af5e9fdba	icmp: Store ping socket information in flow table Currently icmp_id_map[][] stores information about ping sockets in a bespoke structure. Move the same information into new types of flow in the flow table. To match that change, replace the existing ICMP timer with a flow-based timer for expiring ping sockets. This has the advantage that we only need to scan the active flows, not all possible ids. We convert icmp_id_map[][] to point to the flow table entries, rather than containing its own information. We do still use that array for locating the right ping flows, rather than using a "flow native" form of lookup for the time being. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Update id_sock description in comment to icmp_ping_new()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-12 01:34:45 +01:00
Stefano Brivio	383a6f67e5	ip: Use regular htons() for non-constant protocol number in L2_BUF_IP4_PSUM instead of htons_constant(), which is for... constants. Fixes: `5bf200ae8a` ("tcp, udp: Don't include destination address in partially precomputed csums") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-08 10:31:14 +01:00

... 6 7 8 9 10 ...

1768 Commits