1
0
mirror of https://passt.top/passt synced 2024-07-02 07:52:41 +00:00
Commit Graph

177 Commits

Author SHA1 Message Date
David Gibson
1ba76c9e8c udp: Single buffer for IPv4, IPv6 headers and metadata
Currently we have separate arrays for IPv4 and IPv6 which contain the
headers for guest-bound packets, and also the originating socket address.
We can combine these into a single array of "metadata" structures with
space for both pre-cooked IPv4 and IPv6 headers, as well as shared space
for the tap specific header and socket address (using sockaddr_inany).

Because we're using IOVs to separately address the pieces of each frame,
these structures don't need to be packed to keep the headers contiguous
so we can more naturally arrange for the alignment we want.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:52 +02:00
David Gibson
d4598e1d18 udp: Use the same buffer for the L2 header for all frames
Currently each tap-bound frame buffer has room for its own ethernet header.
However the ethernet header is always the same for such frames, so now
that we're indirectly referencing the ethernet header via iov, we can use
a single buffer for all of them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:49 +02:00
David Gibson
6170688616 udp: Share payload buffers between IPv4 and IPv6
Currently the IPv4 and IPv6 paths unnecessarily use different buffers for
the UDP payload.  Now that we're handling the various pieces of the UDP
packets with an iov, we can split the payload part of the buffers off into
its own array shared between IPv4 and IPv6.  As well as saving a little
memory, this allows the payload buffers to be neatly page aligned.

With the buffers merged, udp[46]_l2_iov_sock contain exactly the same thing
as each other and can also be merged.  Likewise udp[46]_iov_splice can be
merged together.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:46 +02:00
David Gibson
2d16946bac udp: Explicitly set checksum in guest-bound UDP headers
For IPv4, UDP checksums are optional and can just be set to 0.
udp_update_hdr4() ignores the checksum field entirely.  Since these are set
to 0 during startup, this works as intended for now.

However, we'd like to share payload and UDP header buffers betweem IPv4 and
IPv6, which does calculate UDP checksums.  Therefore, for robustness, we
should explicitly set the checksum field to 0 for guest-bound UDP packets.

In the tap_udp4_send() slow path, however, we do allow IPv4 UDP checksums
to be calculated as a compile time option.  For consistency, use the same
thing in the udp_update_hdr4() path, which will typically initialize to 0,
but calculate a real checksum if configured to do so.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:44 +02:00
David Gibson
6c4d26a364 udp: Combine initialisation of IPv4 and IPv6 iovs
We're going to introduce more sharing between the IPv4 and IPv6 buffer
structures.  Prepare for this by combinng the initialisation functions.
While we're at it remove the misleading "sock" from the name since these
initialise both tap side and sock side structures.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:41 +02:00
David Gibson
3f9bd867b5 udp: Split tap-bound UDP packets into multiple buffers using io vector
When sending to the tap device, currently we assemble the headers and
payload into a single contiguous buffer.  Those are described by a single
struct iovec, then a batch of frames is sent to the device with
tap_send_frames().

In order to better integrate the IPv4 and IPv6 paths, we want the IP
header in a different buffer that might not be contiguous with the
payload.  To prepare for that, split the UDP packet into an iovec of
buffers.  We use the same split that Laurent recently introduced for
TCP for convenience.

This removes the last use of tap_hdr_len_(), tap_frame_base() and
tap_frame_len(), so remove those too.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:38 +02:00
David Gibson
3559899586 iov: Helper macro to construct iovs covering existing variables or fields
Laurent's recent changes mean we use IO vectors much more heavily in the
TCP code.  In many of those cases, and few others around the code base,
individual iovs of these vectors are constructed to exactly cover existing
variables or fields.  We can make initializing such iovs shorter and
clearer with a macro for the purpose.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:31 +02:00
David Gibson
5566386f5f treewide: Standardise variable names for various packet lengths
At various points we need to track the lengths of a packet including or
excluding various different sets of headers.  We don't always use the same
variable names for doing so.  Worse in some places we use the same name
for different things: e.g. tcp_fill_headers[46]() use ip_len for the
length including the IP headers, but then tcp_send_flag() which calls it
uses it to mean the IP payload length only.

To improve clarity, standardise on these names:
   dlen:		L4 protocol payload length ("data length")
   l4len:		plen + length of L4 protocol header
   l3len:		l4len + length of IPv4/IPv6 header
   l2len:		l3len + length of L2 (ethernet) header

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:23 +02:00
David Gibson
9e22c53aa9 checksum: Make csum_ip4_header() take a host endian length
csum_ip4_header() takes the packet length as a network endian value.  In
general it's very error-prone to pass non-native-endian values as a raw
integer.  It's particularly bad here because this differs from other
checksum functions (e.g. proto_ipv4_header_psum()) which take host native
lengths.

It turns out all the callers have easy access to the native endian value,
so switch it to use host order like everything else.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:21 +02:00
David Gibson
34fb381b5a tap: Split tap specific and L2 (ethernet) headers
In some places (well, actually only UDP now) we use struct tap_hdr to
represent both tap backend specific and L2 ethernet headers.  Handling
these together seemed like a good idea at the time, but Laurent's changes
in the TCP code working towards vhost-user support suggest that treating
them separately is more useful, more often.

Alter struct tap_hdr to represent only the TAP backend specific headers.
Updated related helpers and the UDP code to match.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:08 +02:00
David Gibson
0804fdbc28 udp: Correctly look up outbound socket with port remappings
Commit bb9bf0bb ("tcp, udp: Don't precompute port remappings in epoll
references") changed the epoll reference for UDP sockets to include the
bound port as seen by the socket itself, rather than the bound port it
would be translated to on the guest side.  As a side effect, it also means
that udp_tap_map[] is indexed by the bound port on the host side, rather
than on the guest side.  This is consistent and a good idea, however we
forgot to account for it when finding the correct outgoing socket for
packets originating in the guest.  This means that if forwarding UDP
inbound with a port number change, reply packets would be misdirected.

Fix this by applying the reverse mapping before looking up the socket in
udp_tap_handler().  While we're at it, use 'port' directly instead of
'uref.port' in udp_sock_init().  Those now always have the same value -
failing to realise that is the same error as above.

Reported-by: Laurent Jacquot <jk@lutty.net>
Link: https://bugs.passt.top/show_bug.cgi?id=87
Fixes: bb9bf0bb8f ("tcp, udp: Don't precompute port remappings in epoll references")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-04-25 00:00:34 +02:00
Stefano Brivio
d989eae308 udp: Translate source address of resolver only for DNS remapped queries
Paul reports that if pasta is configured with --dns-forward, and the
container queries a resolver which is configured on the host directly,
without using the address given for --dns-forward, we'll translate
the source address of the response pretending it's coming from the
address passed as --dns-forward, and the client will discard the
reply.

That is,

  $ cat /etc/resolv.conf
  198.51.100.1
  $ pasta --config-net --dns-forward 192.0.2.1 nslookup passt.top

will not work, because we change the source address of the reply from
198.51.100.1 to 192.0.2.1. But the client contacted 198.51.100.1, and
it's from that address that it expects an answer.

Add a PORT_DNS_FWD flag for tap-facing ports, which is triggered by
activity in the opposite direction as the other flags. If the
tap-facing port was seen sending a DNS query that was remapped, we'll
remap the source address of the response, otherwise we'll leave it
unaffected.

Reported-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-03-18 08:57:40 +01:00
David Gibson
d3eb0d7b59 tap: Rename tap_iov_{base,len}
These two functions are typically used to calculate values to go into the
iov_base and iov_len fields of a struct iovec.  They don't have to be used
for that, though.  Rename them in terms of what they actually do: calculate
the base address and total length of the complete frame, including both L2
and tap specific headers.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-14 16:57:43 +01:00
David Gibson
2d0e0084b6 tap: Extend tap_send_frames() to allow multi-buffer frames
tap_send_frames() takes a vector of buffers and requires exactly one frame
per buffer.  We have future plans where we want to have multiple buffers
per frame in some circumstances, so extend tap_send_frames() to take the
number of buffers per frame as a parameter.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Improve comment to rembufs calculation]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-14 16:57:28 +01:00
David Gibson
413c15988e udp: Use existing helper for UDP checksum on inbound IPv6 packets
Currently we open code the calculation of the UDP checksum in
udp_update_hdr6().  We calling a helper to handle the IPv6 pseudo-header,
and preset the checksum field to 0 so an uninitialised value doesn't get
folded in.  We already have a helper to do this: csum_udp6() which we use
in some slow paths.  Use it here as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-13 14:37:25 +01:00
David Gibson
ae69838db0 udp: Avoid unnecessary pointer in udp_update_hdr4()
We carry around the source address as a pointer to a constant struct
in_addr.  But it's silly to carry around a 4 or 8 byte pointer to a 4 byte
IPv4 address.  Just copy the IPv4 address around by value.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-13 14:37:21 +01:00
David Gibson
b0419d150a udp: Re-order udp_update_hdr[46] for clarity and brevity
The order of things in these functions is a bit odd for historical reasons.
We initialise some IP header fields early, the more later after making
some tests.  Likewise we declare some variables without initialisation,
but then unconditionally set them to values we could calculate at the
start of the function.

Previous cleanups have removed the reasons for some of these choices, so
reorder for clarity, and where possible move the first assignment into an
initialiser.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-13 14:37:19 +01:00
David Gibson
8a842e03cd udp: Pass data length explicitly to to udp_update_hdr[46]
These functions take an index to the L2 buffer whose header information to
update.  They use that for two things: to locate the buffer pointer itself,
and to retrieve the length of the received message from the paralllel
udp[46]_l2_mh_sock array.  The latter is arguably a failure to separate
concerns.  Change these functions to explicitly take a buffer pointer and
payload length as parameters.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-13 14:37:17 +01:00
David Gibson
76571ae869 udp: Consistent port variable names in udp_update_hdr[46]
In these functions we have 'dstport' for the destination port, but
'src_port' for the source port.  Change the latter to 'srcport' for
consistency.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-13 14:37:15 +01:00
David Gibson
205b140dec udp: Refactor udp_sock[46]_iov_init()
Each of these functions have 3 essentially identical loops in a row.
Merge the loops into a single common udp_sock_iov_init() function, calling
udp_sock[46]_iov_init_one() helpers to initialize each "slot" in the
various parallel arrays.  This is slightly neater now, and more naturally
allows changes we want to make where more initialization will become common
between IPv4 and IPv6.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-13 14:36:59 +01:00
Laurent Vivier
6b22e10a26 tap: make tap_update_mac() generic
Use ethhdr rather than tap_hdr.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-9-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-06 08:03:49 +01:00
Laurent Vivier
7df624e79a checksum: introduce functions to compute the header part checksum for TCP/UDP
The TCP and UDP checksums are computed using the data in the TCP/UDP
payload but also some informations in the IP header (protocol,
length, source and destination addresses).

We add two functions, proto_ipv4_header_psum() and
proto_ipv6_header_psum(), to compute the checksum of the IP
header part.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-ID: <20240303135114.1023026-8-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-06 08:03:47 +01:00
Laurent Vivier
feb4900c25 checksum: use csum_ip4_header() in udp.c and tcp.c
We can find the same function to compute the IPv4 header
checksum in tcp.c, udp.c and tap.c

Use the function defined for tap.c, csum_ip4_header(), but
with the code used in tcp.c and udp.c as it doesn't need a fully
initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-7-lvivier@redhat.com>
[dwg: Fix weird cppcheck regression; it appears to be a problem
 in pre-existing code, but somehow this patch is exposing it]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-06 08:03:44 +01:00
Laurent Vivier
e82b4fe5fc udp: little cleanup in udp_update_hdrX() to prepare future changes
in udp_update_hdr4():

    Assign the source address to src, either b->s_in.sin_addr,
    c->ip4.dns_match or c->ip4.gw and then set b->iph.saddr to src->s_addr.

in udp_update_hdr6():

   Assign the source address to src, either b->s_in6.sin6_addr,
   c->ip6.dns_match, c->ip6.gw or c->ip6.addr_ll.
   Assign the destination to dst, either c->ip6.addr_seen or
   &c->ip6.addr_ll_seen.
   Then set dst to b->ip6h.daddr and src to b->ip6h.saddr.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-ID: <20240303135114.1023026-6-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-06 08:03:41 +01:00
Laurent Vivier
324bd46782 util: move IP stuff from util.[ch] to ip.[ch]
Introduce ip.[ch] file to encapsulate IP protocol handling functions and
structures.  Modify various files to include the new header ip.h when
it's needed.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-5-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-06 08:03:38 +01:00
David Gibson
3b9098aa49 fwd: Rename port_fwd.[ch] and their contents
Currently port_fwd.[ch] contains helpers related to port forwarding,
particular automatic port forwarding.  We're planning to allow much more
flexible sorts of forwarding, including both port translation and NAT based
on the flow table.  This will subsume the existing port forwarding logic,
so rename port_fwd.[ch] to fwd.[ch] with matching updates to all the names
within.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 09:48:27 +01:00
David Gibson
bb9bf0bb8f tcp, udp: Don't precompute port remappings in epoll references
The epoll references for both TCP listening sockets and UDP sockets
includes a port number.  This gives the destination port that traffic
to that socket will be sent to on the other side.  That will usually
be the same as the socket's bound port, but might not if the -t, -u,
-T or -U options are given with different original and forwarded port
numbers.

As we move towards a more flexible forwarding model for passt, it's
going to become possible for that destination port to vary depending
on more things (for example the source or destination address).  So,
it will no longer make sense to have a fixed value for a listening
socket.

Change to simpler semantics where this field in the reference gives
the bound port of the socket.  We apply the translations to the
correct destination port later on, when we're actually forwarding.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 09:47:40 +01:00
David Gibson
0cf6b2d89d inany: Provide more conveniently typed constants for special addresses
Our inany_addr type is used in some places to represent either IPv4 or
IPv6 addresses, and we plan to use it more widely.  We don't yet
provide constants of this type for special addresses (loopback and
"any").  Add some of these, both the IPv4 and IPv6 variants of those
addresses, but typed as union inany_addr.

To avoid actually adding more things to .data we can use some macros and
casting to overlay the IPv6 versions of these with the standard library's
in6addr_loopback and in6addr_any.  For the IPv4 versions we need to create
new constant globals.

For complicated historical reasons, the standard library doesn't
provide constants for IPv4 loopback and any addresses as struct
in_addr.  It just has macros of type in_addr_t == uint32_t, which has
some gotchas w.r.t. endianness.  We can use some more macros to
address this lack, using macros to effectively create these IPv4
constants as pieces of the inany constants above.

We use this last to avoid some awkward temporary variables just used
to get an address of an IPv4 loopback address.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 09:47:28 +01:00
David Gibson
90f1d3b354 udp: Remove unnecessary test for unspecified addr_out
If the configured output address is unspecified, we don't set the bind
address to it when creating a new socket in udp_tap_handler().  That sounds
sensible, but what we're leaving the bind address as is, exactly, the
unspecified address, so this test makes no difference.  Remove it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 06:23:47 +01:00
David Gibson
745fa38169 udp: Fix incorrect usage of IPv6 state in IPv4 path
When forwarding IPv4 packets in udp_tap_handler(), we incorrectly use an
IPv6 address test on our IPv4 address (which could cause an out of bounds
access), and possibly set our bind interface to the IPv6 interface based on
it.  Adjust to correctly look at the IPv4 address and IPv4 interface.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 05:41:03 +01:00
David Gibson
deea5a8437 udp: Small streamline to udp_update_hdr4()
Streamline the logic here slightly, by introducing a 'src' temporary for
brevity.  We also transform the logic for setting/clearing PORT_LOOPBACK.
This makes udp_update_hdr4() more closely match the corresponding logic
from udp_update_udp6().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 05:41:01 +01:00
David Gibson
bc2d0d381c udp: Set pif in epoll reference for ephemeral host sockets
The udp_epoll_ref contains a field for the pif to which the socket belongs.
We fill this in for permanent sockets created with udp_sock_init() and for
spliced sockets, however, we omit it for ephemeral sockets created for
tap originated flows.

This is a bug, although we currently get away with it, because we don't
consult that field for such flows.  Correctly fill it in.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 05:40:59 +01:00
David Gibson
720d777a69 udp: Don't attempt to translate a 0.0.0.0 source address
If an incoming packet has a source address of 0.0.0.0 we translate that to
the gateway address.  This doesn't really make sense, because we have no
way to do a reverse translation for reply packets.

Certain UDP protocols do use an unspecified source address in some
circumstances (e.g. DHCP).  These generally either require no reply, a
multicast reply, or provide a suitable reply address by other means.

In none of those cases does translating it in passt/pasta make sense.  The
best we can really do here is just leave it as is.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 05:40:46 +01:00
David Gibson
4e08d9b9c6 treewide: Use sa_family_t for address family variables
Sometimes we use sa_family_t for variables and parameters containing a
socket address family, other times we use a plain int.  Since sa_family_t
is what's actually used in struct sockaddr and friends, standardise on
that.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-27 12:52:02 +01:00
David Gibson
1e6f92b995 udp: Fix 16-bit overflow in udp_invert_portmap()
The code in udp_invert_portmap() is written based on an incorrect
understanding of C's (arcane) integer promotion rules.  We calculate
'(in_port_t)i + delta' expecting the result to be of type in_port_t (16
bits).  However "small integer types" (those narrower than 'int') are
always promoted to int for expressions, meaning this calculation can
overrun the rdelta[] array.

Fix this, and use a new intermediate for the index, to make it very clear
what it's type is.  We also change i to unsigned, to avoid any possible
confusion from mixing signed and unsigned types.

Link: https://bugs.passt.top/show_bug.cgi?id=80
Reported-by: Laurent Jacquot <jk@lutty.net>
Suggested-by: Laurent Jacquot <jk@lutty.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-20 08:14:28 +01:00
David Gibson
8954c4a91b udp: Assertion in udp_invert_portmap() can be calculated at compile time
All the values in this ASSERT() are known at compile time, so this can be
converted to a static_assert().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-20 08:14:08 +01:00
David Gibson
927cb84fff udp: udp_sock_init_ns() partially duplicats udp_port_rebind_outbound()
Usually automatically forwarded UDP outbound ports are set up by
udp_port_rebind_outbound() called from udp_timer().  However, the very
first time they're created and bound is by udp_sock_init_ns() called from
udp_init().  udp_sock_init_ns() is essentially an unnecessary cut down
version of udp_port_rebind_outbound(), so we can jusat remove it.

Doing so does require moving udp_init() below udp_port_rebind_outbound()'s
definition.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-14 03:24:23 +01:00
David Gibson
96ad5c5acd udp: Don't prematurely (and incorrectly) set up automatic inbound forwards
For automated inbound port forwarding in pasta mode we scan bound ports
within the guest namespace via /proc and bind matching ports on the host to
listen for packets.  For UDP this is usually handled by udp_timer() which
calls port_fwd_scan_udp() followed by udp_port_rebind().  However there's
one initial scan before the the UDP timer is started: we call
port_fwd_scan_udp() from port_fwd_init(), and actually bind the resulting
ports in udp_sock_init_init() called from udp_init().

Unfortunately, the version in udp_sock_init_init() isn't correct.  It
unconditionally opens a new socket for every forwarded port, even if a
socket has already been explicit created with the -u option.  If the
explicitly forwarded ports have particular configuration (such as a
specific bound address address, or one implied by the -o option) those will
not be replicated in the new socket.  We essentially leak the original
correctly configured socket, replacing it with one which might not be
right.

We could make udp_sock_init_init() use udp_port_rebind() to get that right,
but there's actually no point doing so:
 * The initial bind was introduced by ccf6d2a7b4 ("udp: Actually bind
   detected namespace ports in init namespace") at which time we didn't
   periodically scan for bound UDP ports.  Periodic scanning was introduced
   in 457ff122e ("udp,pasta: Periodically scan for ports to automatically
   forward") making the bind from udp_init() redundant.
 * At the time of udp_init(), programs in the guest namespace are likely
   not to have started yet (unless attaching a pre-existing namespace) so
   there's likely not anything to scan for anyway.

So, simply remove the initial, broken socket create/bind, allowing
automatic port forwards to be created the first time udp_timer() runs.

Reported-by: Laurent Jacquot <jk@lutty.net>
Suggested-by: Laurent Jacquot <jk@lutty.net>
Link: https://bugs.passt.top/show_bug.cgi?id=79
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-14 03:24:01 +01:00
David Gibson
8563e7c870 treewide: Standardise on 'now' for current timestamp variables
In a number of places we pass around a struct timespec representing the
(more or less) current time.  Sometimes we call it 'now', and sometimes we
call it 'ts'.  Standardise on the more informative 'now'.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:35:10 +01:00
David Gibson
a179ca6707 treewide: Make a bunch of pointer variables pointers to const
Sufficiently recent cppcheck (I'm using 2.13.0) seems to have added another
warning for pointer variables which could be pointer to const but aren't.
Use this to make a bunch of variables const pointers where they previously
weren't for no particular reason.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-16 21:49:27 +01:00
David Gibson
57de44a4bc util: Make sock_l4() treat empty string ifname like NULL
sock_l4() takes NULL for ifname if you don't want to bind the socket to a
particular interface.  However, for a number of the callers, it's more
natural to use an empty string for that case.  Change sock_l4() to accept
either NULL or an empty string equivalently, and simplify some callers
using that change.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-12-27 19:29:45 +01:00
David Gibson
5cada56186 treewide: Avoid in_addr_t
IPv4 addresses can be stored in an in_addr_t or a struct in_addr.  The
former is just a type alias to a 32-bit integer, so doesn't really give us
any type checking.  Therefore we generally prefer the structure, since we
mostly want to treat IP address as opaque objects.  Fix a few places where
we still use in_addr_t, but can just as easily use struct in_addr.

Note there are still some uses of in_addr_t in conf.c, but those are
justified: since they're doing prefix calculations, they actually need to
look at the internals of the address as an integer.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-12-27 19:29:45 +01:00
David Gibson
546332786c treewide: Use IN4ADDR_LOOPBACK_INIT more widely
We already define IN4ADDR_LOOPBACK_INIT to initialise a struct in_addr to
the loopback address without delving into its internals.  However there are
some places we don't use it, and explicitly look at the internal structure
of struct in_addr, which we generally want to avoid.  Use the define more
widely to avoid that.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-12-27 19:29:45 +01:00
Stefano Brivio
4117bd94f9 port_fwd, util: Don't bind UDP ports with opposite-side bound TCP ports
When pasta periodically scans bound ports and binds them on the other
side in order to forward traffic, we bind UDP ports for corresponding
TCP port numbers, too, to support protocols and applications such as
iperf3 which use UDP port numbers matching the ones used by the TCP
data connection.

If we scan UDP ports in order to bind UDP ports, we skip detection of
the UDP ports we already bound ourselves, to avoid looping back our
own ports. Same with scanning and binding TCP ports.

But if we scan for TCP ports in order to bind UDP ports, we need to
skip bound TCP ports too, otherwise, as David pointed out:

- we find a bound TCP port on side A, and bind the corresponding TCP
  and UDP ports on side B

- at the next periodic scan, we find that UDP port bound on side B,
  and we bind the corresponding UDP port on side A

- at this point, we unbind that UDP port on side B: we would
  otherwise loop back our own port.

To fix this, we need to avoid binding UDP ports that we already
bound, on the other side, as a consequence of finding a corresponding
bound TCP port.

Reproducing this issue is straightforward:

  ./pasta -- iperf3 -s

  # Wait one second, then from another terminal:
  iperf3 -c ::1 -u

Reported-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Analysed-by: David Gibson <david@gibson.dropbear.id.au>
Fixes: 457ff122e3 ("udp,pasta: Periodically scan for ports to automatically forward")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-22 07:19:36 +01:00
David Gibson
457ff122e3 udp,pasta: Periodically scan for ports to automatically forward
pasta supports automatic port forwarding, where we look for listening
sockets in /proc/net (in both namespace and outside) and establish port
forwarding to match.

For TCP we do this scan both at initial startup, then periodically
thereafter.  For UDP however, we currently only scan at start.  So unlike
TCP we won't update forwarding to handle services that start after pasta
has begun.

There's no particular reason for that, other than that we didn't implement
it.  So, remove that difference, by scanning for new UDP forwards
periodically too.  The logic is basically identical to that for TCP, but it
needs some changes to handle the mildly different data structures in the
UDP case.

Link: https://bugs.passt.top/show_bug.cgi?id=45
Link: https://github.com/rootless-containers/rootlesskit/issues/383
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-19 09:08:39 +01:00
David Gibson
de974f0cf1 udp: Remove socket from udp_{tap,splice}_map when timed out
We save sockets bound to particular ports in udp_{tap,splice}_map for
reuse later.  If they're not used for a time, we time them out and close
them. However, when that happened, we weren't actually removing the fds
from the relevant map.  That meant that later interactions on the same port
could get a stale fd from the map.

The stale fd might be closed, leading to unexpected EBADF errors, or it
could have been re-used by a completely different socket bound to a
different port, which could lead to us incorrectly forwarding packets.

Reported-by: Chris Kuhn <kuhnchris@kuhnchris.eu>
Reported-by: Jay <bugs.passt.top@bitsbetwixt.com>
Link: https://bugs.passt.top/show_bug.cgi?id=57
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-07 09:55:08 +01:00
David Gibson
480aa4a108 udp: Consistently use -1 to indicate un-opened sockets in maps
udp uses the udp_tap_map, udp_splice_ns and udp_splice_init tables to keep
track of already opened sockets bound to specific ports.  We need a way to
indicate entries where a socket hasn't been opened, but the code isn't
consistent if this is indicated by a 0 or a -1:
  * udp_splice_sendfrom() and udp_tap_handler() assume that 0 indicates
    an unopened socket
  * udp_sock_init() fills in -1 for a failure to open a socket
  * udp_timer_one() is somewhere in between, treating only strictly
    positive fds as valid

-1 (or, at least, negative) is really the correct choice here, since 0 is
a theoretically valid fd value (if very unlikely in practice).  Change to
use that consistently throughout.

The table does need to be initialised to all -1 values before any calls to
udp_sock_init() which can happen from conf_ports().  Because C doesn't make
it easy to statically initialise non zero values in large tables, this does
require a somewhat awkward call to initialise the table from conf().  This
is the best approach I could see for the short term, with any luck it will
go away at some point when those socket tables are replaced by a unified
flow table.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-07 09:55:03 +01:00
David Gibson
f6d8dc2355 pif: Pass originating pif to tap handler functions
For now, packets passed to the various *_tap_handler() functions always
come from the single "tap" interface.  We want to allow the possibility to
broaden that in future.  As preparation for that, have the code in tap.c
pass the pif id of the originating interface to each of those handler
functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-07 09:53:45 +01:00
David Gibson
732e249376 pif: Record originating pif in listening socket refs
For certain socket types, we record in the epoll ref whether they're
sockets in the namespace, or on the host.  We now have the notion of "pif"
to indicate what "place" a socket is associated with, so generalise the
simple one-bit 'ns' to a pif id.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-07 09:53:41 +01:00
David Gibson
0d73fa2225 udp: Clean up ref initialisation in udp_sock_init()
udp_sock_init() has a number of paths that initialise uref differently.
However some of the fields are initialised the same way in all of them.
Move those fields into the original initialiser to save a few lines.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-07 09:53:35 +01:00