1
0
mirror of https://passt.top/passt synced 2024-09-19 14:00:53 +00:00
Commit Graph

76 Commits

Author SHA1 Message Date
David Gibson
c000f2aba6 flow, icmp: Use general flow forwarding rules for ICMP
Current ICMP hard codes its forwarding rules, and never applies any
translations.  Change it to use the flow_target() function, so that
it's translated the same as TCP (excluding TCP specific port
redirection).

This means that gw mapping now applies to ICMP so "ping <gw address>" will
now ping the host's loopback instead of the actual gw machine.  This
removes the surprising behaviour that the target you ping might not be the
same as you connect to with TCP.

This removes the last user of flow_target_af(), so that's removed as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:33:33 +02:00
David Gibson
4cd753e65c icmp: Manage outbound socket address via flow table
For now when we forward a ping to the host we leave the host side
forwarding address and port blank since we don't necessarily know what
source address and id will be used by the kernel.  When the outbound
address option is active, though, we do know the address at least, so we
can record it in the flowside.

Having done that, use it as the primary source of truth, binding the
outgoing socket based on the information in there.  This allows the
possibility of more complex rules for what outbound address and/or id
we use in future.

To implement this we create a new helper which sets up a new socket based
on information in a flowside, which will also have future uses.  It
behaves slightly differently from the existing ICMP code, in that it
doesn't bind to a specific interface if given a loopback address.  This is
logically correct - the loopback address means we need to operate through
the host's loopback interface, not ifname_out.  We didn't need it in ICMP
because ICMP will never generate a loopback address at this point, however
we intend to change that in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:33:25 +02:00
David Gibson
2faf6fcd8b icmp: Eliminate icmp_id_map
With previous reworks the icmp_id_map data structure is now maintained, but
never used for anything.  Eliminate it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:33:20 +02:00
David Gibson
2f40a01944 icmp: Look up ping flows using flow hash
When we receive a ping packet from the tap interface, we currently locate
the correct flow entry (if present) using an anciliary data structure, the
icmp_id_map[] tables.  However, we can look this up using the flow hash
table - that's what it's for.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:33:16 +02:00
David Gibson
6d76278c21 icmp: Obtain destination addresses from the flowsides
icmp_sock_handler() obtains the guest address from it's most recently
observed IP.  However, this can now be obtained from the common flowside
information.

icmp_tap_handler() builds its socket address for sendto() directly
from the destination address supplied by the incoming tap packet.
This can instead be generated from the flow.

Using the flowsides as the common source of truth here prepares us for
allowing more flexible NAT and forwarding by properly initialising
that flowside information.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:33:13 +02:00
David Gibson
5cffb1bf64 icmp: Remove redundant id field from flow table entry
struct icmp_ping_flow contains a field for the ICMP id of the ping, but
this is now redundant, since the id is also stored as the "port" in the
common flowsides.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:33:06 +02:00
David Gibson
4e2d36e83f flow: Common address information for target side
Require the address and port information for the target (non
initiating) side to be populated when a flow enters TGT state.
Implement that for TCP and ICMP.  For now this leaves some information
redundantly recorded in both generic and type specific fields.  We'll
fix that in later patches.

For TCP we now use the information from the flow to construct the
destination socket address in both tcp_conn_from_tap() and
tcp_splice_connect().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:32:37 +02:00
David Gibson
8012f5ff55 flow: Common address information for initiating side
Handling of each protocol needs some degree of tracking of the
addresses and ports at the end of each connection or flow.  Sometimes
that's explicit (as in the guest visible addresses for TCP
connections), sometimes implicit (the bound and connected addresses of
sockets).

To allow more consistent handling across protocols we want to
uniformly track the address and port at each end of the connection.
Furthermore, because we allow port remapping, and we sometimes need to
apply NAT, the addresses and ports can be different as seen by the
guest/namespace and as by the host.

Introduce 'struct flowside' to keep track of address and port
information related to one side of a flow. Store two of these in the
common fields of a flow to track that information for both sides.

For now we only populate the initiating side, requiring that
information be completed when a flows enter INI.  Later patches will
populate the target side.

For now this leaves some information redundantly recorded in both generic
and type specific fields.  We'll fix that in later patches.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:32:32 +02:00
David Gibson
9b125e7776 flow, icmp, tcp: Clean up helpers for getting flow from index
TCP (both regular and spliced) and ICMP both have macros to retrieve the
relevant protcol specific flow structure from a flow index.  In most cases
what we actually want is to get the specific flow from a sidx.  Replace
those simple macros with a more precise inline, which also asserts that
the flow is of the type we expect.

While we're they're also add a pif_at_sidx() helper to get the interface of
a specific flow & side, which is useful in some places.

Finally, fix some minor style issues in the comments on some of the
existing sidx related helpers.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-17 15:27:27 +02:00
David Gibson
74c1c5efcf util: sock_l4() determine protocol from epoll type rather than the reverse
sock_l4() creates a socket of the given IP protocol number, and adds it to
the epoll state.  Currently it determines the correct tag for the epoll
data based on the protocol.  However, we have some future cases where we
might want different semantics, and therefore epoll types, for sockets of
the same protocol.  So, change sock_l4() to take the epoll type as an
explicit parameter, and determine the protocol from that.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-05 15:26:09 +02:00
David Gibson
8a2accb847 flow: Record the pifs for each side of each flow
Currently we have no generic information flows apart from the type and
state, everything else is specific to the flow type.  Start introducing
generic flow information by recording the pifs which the flow connects.

To keep track of what information is valid, introduce new flow states:
INI for when the initiating side information is complete, and TGT for
when both sides information is complete, but we haven't chosen the
flow type yet.  For now, these states don't do an awful lot, but
they'll become more important as we add more generic information.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-22 23:21:03 +02:00
David Gibson
43571852e6 flow: Make side 0 always be the initiating side
Each flow in the flow table has two sides, 0 and 1, representing the
two interfaces between which passt/pasta will forward data for that flow.
Which side is which is currently up to the protocol specific code:  TCP
uses side 0 for the host/"sock" side and 1 for the guest/"tap" side, except
for spliced connections where it uses 0 for the initiating side and 1 for
the target side.  ICMP also uses 0 for the host/"sock" side and 1 for the
guest/"tap" side, but in its case the latter is always also the initiating
side.

Make this generically consistent by always using side 0 for the initiating
side and 1 for the target side.  This doesn't simplify a lot for now, and
arguably makes TCP slightly more complex, since we add an extra field to
the connection structure to record which is the guest facing side. This is
an interim change, which we'll be able to remove later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>q
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-22 23:21:01 +02:00
David Gibson
0060acd11b flow: Clarify and enforce flow state transitions
Flows move over several different states in their lifetime.  The rules for
these are documented in comments, but they're pretty complex and a number
of the transitions are implicit, which makes this pretty fragile and
error prone.

Change the code to explicitly track the states in a field.  Make all
transitions explicit and logged.  To the extent that it's practical in C,
enforce what can and can't be done in various states with ASSERT()s.

While we're at it, tweak the docs to clarify the restrictions on each state
a bit.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-22 23:20:58 +02:00
David Gibson
7a832a8a0e flow: Properly type callbacks to protocol specific handlers
The flow dispatches deferred and timer handling for flows centrally, but
needs to call into protocol specific code for the handling of individual
flows.  Currently this passes a general union flow *.  It makes more sense
to pass the specific relevant flow type structure.  That brings the check
on the flow type adjacent to casting to the union variant which it tags.

Arguably, this is a slight abstraction violation since it involves the
generic flow code using protocol specific types.  It's already calling into
protocol specific functions, so I don't think this really makes any
difference.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-22 23:20:52 +02:00
David Gibson
5566386f5f treewide: Standardise variable names for various packet lengths
At various points we need to track the lengths of a packet including or
excluding various different sets of headers.  We don't always use the same
variable names for doing so.  Worse in some places we use the same name
for different things: e.g. tcp_fill_headers[46]() use ip_len for the
length including the IP headers, but then tcp_send_flag() which calls it
uses it to mean the IP payload length only.

To improve clarity, standardise on these names:
   dlen:		L4 protocol payload length ("data length")
   l4len:		plen + length of L4 protocol header
   l3len:		l4len + length of IPv4/IPv6 header
   l2len:		l3len + length of L2 (ethernet) header

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:23 +02:00
David Gibson
4779dfe12f icmp: Use 'flowside' epoll references for ping sockets
Currently ping sockets use a custom epoll reference type which includes
the ICMP id.  However, now that we have entries in the flow table for
ping flows, finding that is sufficient to get everything else we want,
including the id.  Therefore remove the icmp_epoll_ref type and use the
general 'flowside' field for ping sockets.

Having done this we no longer need separate EPOLL_TYPE_ICMP and
EPOLL_TYPE_ICMPV6 reference types, because we can easily determine
which case we have from the flow type. Merge both types into
EPOLL_TYPE_PING.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-12 01:49:05 +01:00
David Gibson
02cbdb0b86 icmp: Flow based error reporting
Use flow_dbg() and flow_err() helpers to generate flow-linked error
messages in most places.  Make a few small improvements to the messages
while we're at it.  This allows us to avoid the awkward 'pname' variables
since whether we're dealing with ICMP or ICMPv6 is already built into the
flow type which these helpers include.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Coding style fix in icmp_tap_handler()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-12 01:36:04 +01:00
David Gibson
3af5e9fdba icmp: Store ping socket information in flow table
Currently icmp_id_map[][] stores information about ping sockets in a
bespoke structure.  Move the same information into new types of flow
in the flow table.  To match that change, replace the existing ICMP
timer with a flow-based timer for expiring ping sockets.  This has the
advantage that we only need to scan the active flows, not all possible
ids.

We convert icmp_id_map[][] to point to the flow table entries, rather
than containing its own information.  We do still use that array for
locating the right ping flows, rather than using a "flow native" form
of lookup for the time being.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Update id_sock description in comment to icmp_ping_new()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-12 01:34:45 +01:00
Laurent Vivier
324bd46782 util: move IP stuff from util.[ch] to ip.[ch]
Introduce ip.[ch] file to encapsulate IP protocol handling functions and
structures.  Modify various files to include the new header ip.h when
it's needed.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-5-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-06 08:03:38 +01:00
David Gibson
f6e6e8ad40 inany: Introduce union sockaddr_inany
There are a number of places where we want to handle either a
sockaddr_in or a sockaddr_in6.  In some of those we use a void *,
which works ok and matches some standard library interfaces, but
doesn't give a signature level hint that we're dealing with only
sockaddr_in or sockaddr_in6, not (say) sockaddr_un or another type of
socket address.  Other places we use a sockaddr_storage, which also
works, but has the same problem in addition to allocating more on the
stack than we need to.

Introduce union sockaddr_inany to explictly handle this case: it has
variants for sockaddr_in and sockaddr_in6.  Use it in a number of
places where it's easy to do so.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 09:47:31 +01:00
David Gibson
4e08d9b9c6 treewide: Use sa_family_t for address family variables
Sometimes we use sa_family_t for variables and parameters containing a
socket address family, other times we use a plain int.  Since sa_family_t
is what's actually used in struct sockaddr and friends, standardise on
that.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-27 12:52:02 +01:00
David Gibson
322660b0b9 icmp: Dedicated functions for starting and closing ping sequences
ICMP sockets are cleaned up on a timeout implemented in icmp_timer_one(),
and the logic to do that cleanup is open coded in that function.  Similarly
new sockets are opened when we discover we don't have an existing one in
icmp_tap_handler(), and again the logic is open-coded.

That's not the worst thing, but it's a bit cleaner to have dedicated
functions for the creation and destruction of ping sockets.  This will also
make things a bit easier for future changes we have in mind.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:56 +01:00
David Gibson
b6a4e20aa6 icmp: Validate packets received on ping sockets
We access fields of packets received from ping sockets assuming they're
echo replies, without actually checking that.  Of course, we don't expect
anything else from the kernel, but it's probably best to verify.

While we're at it, also check for short packets, or a receive address of
the wrong family.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:55 +01:00
David Gibson
6e86511f59 icmp: Warn on receive errors from ping sockets
Currently we silently ignore an errors receiving a packet from a ping
socket.  We don't expect that to happen, so it's probably worth reporting
if it does.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:53 +01:00
David Gibson
a325121759 icmp: Consolidate icmp_sock_handler() with icmpv6_sock_handler()
Currently we have separate handlers for ICMP and ICMPv6 ping replies.
Although there are a number of points of difference, with some creative
refactoring we can combine these together sensibly.  Although it doesn't
save a vast amount of code, it does make it clearer that we're performing
basically the same steps for each case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:51 +01:00
David Gibson
70d43f9c05 icmp: Share more between IPv4 and IPv6 paths in icmp_tap_handler()
Currently icmp_tap_handler() consists of two almost disjoint paths for the
IPv4 and IPv6 cases.  The only thing they share is an error message.
We can use some intermediate variables to refactor this to share some more
code between those paths.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:49 +01:00
David Gibson
15be1bfd81 icmp: Simplify socket expiry scanning
Currently we use icmp_act[] to scan for ICMP ids which might have an open
socket which could time out.  However icmp_act[] contains no information
that's not already in icmp_id_map[] - it's just an "index" which allows
scanning for relevant entries with less cache footprint.

We only scan for ICMP socket expiry every 1s, though, so it's not clear
that cache footprint really matters.  Furthermore, there's no strong reason
we need to scan even that often - the timeout is fairly arbitrary and
approximate.

So, eliminate icmp_act[] in favour of directly scanning icmp_id_map[] and
compensate for the cache impact by reducing the scan frequency to once
every 10s.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:46 +01:00
David Gibson
24badd0acf icmp: Use -1 to represent "missing" sockets
icmp_id_map[] contains, amongst other things, fds for "ping" sockets
associated with various ICMP echo ids.  However, we only lazily open()
those sockets, so many will be missing.  We currently represent that with
a 0, which isn't great, since that's technically a valid fd.  Use -1
instead.  This does require initializing the fields in icmp_id_map[] but
we already have an obvious place to do that.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:43 +01:00
David Gibson
43713af50e icmp: Don't attempt to match host IDs to guest IDs
When forwarding pings from tap, currently we create a ping socket with
a socket address whose port is set to the ID of the ping received from the
guest.  This causes the socket to send pings with the same ID on the host.
Although this seems look a good idea for maximum transparency, it's
probably unwise.

First, it's fallible - the bind() could fail, and we already have fallback
logic which will overwrite the packets with the expected guest id if the
id we get on replies doesn't already match.  We might as well do that
unconditionally.

But more importantly, we don't know what else on the host might be using
ping sockets, so we could end up with an ID that's the same as an existing
socket.  You'd expect that to fail the bind() with EADDRINUSE, which would
be fine: we'd fall back to rewriting the reply ids.  However it appears the
kernel (v6.6.3 at least), does *not* fail the bind() and instead it's
"last socket wins" in terms of who gets the replies.  So we could
accidentally intercept ping replies for something else on the host.

So, instead of using bind() to set the id, just let the kernel pick one
and expect to translate the replies back.  Although theoretically this
makes the passt/pasta link a bit less "transparent", essentially nothing
cares about specific ping IDs, much like TCP source ports, which we also
don't preserve.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:41 +01:00
David Gibson
8534cdbfd1 icmp: Don't attempt to handle "wrong direction" ping socket traffic
Linux ICMP "ping" sockets are very specific in what they do.  They let
userspace send ping requests (ICMP_ECHO or ICMP6_ECHO_REQUEST), and receive
matching replies (ICMP_ECHOREPLY or ICMP6_ECHO_REPLY).  They don't let you
intercept or handle incoming ping requests.

In the case of passt/pasta that means we can process echo requests from tap
and forward them to a ping socket, then take the replies from the ping
socket and forward them to tap.  We can't do the reverse: take echo
requests from the host and somehow forward them to the guest. There's
really no way for something outside to initiate a ping to a passt/pasta
connected guest and if there was we'd need an entirely different mechanism
to handle it.

However, we have some logic to deal with packets going in that reverse
direction.  Remove it, since it can't ever be used that way.  While we're
there use defines for the ICMPv6 types, instead of open coded type values.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:38 +01:00
David Gibson
2cb2fe6f89 icmp: Remove redundant initialisation of sendto() address
We initialise the address portion of the sockaddr for sendto() to the
unspecified address, but then always overwrite it with the actual
destination address before we call the sendto().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:35 +01:00
David Gibson
5dffb99892 icmp: Don't set "port" on destination sockaddr for ping sockets
We set the port to the ICMP id on the sendto() address when using ICMP
ping sockets.  However, this has no effect: the ICMP id the kernel
uses is determined only by the "port" on the socket's *bound* address
(which is constructed inside sock_l4(), using the id we also pass to
it).

For unclear reasons this change triggers cppcheck 2.13.0 to give new
"variable could be const pointer" warnings, so make *ih const as well to
fix that.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:36:32 +01:00
David Gibson
8563e7c870 treewide: Standardise on 'now' for current timestamp variables
In a number of places we pass around a struct timespec representing the
(more or less) current time.  Sometimes we call it 'now', and sometimes we
call it 'ts'.  Standardise on the more informative 'now'.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-01-22 23:35:10 +01:00
David Gibson
57de44a4bc util: Make sock_l4() treat empty string ifname like NULL
sock_l4() takes NULL for ifname if you don't want to bind the socket to a
particular interface.  However, for a number of the callers, it's more
natural to use an empty string for that case.  Change sock_l4() to accept
either NULL or an empty string equivalently, and simplify some callers
using that change.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-12-27 19:29:45 +01:00
David Gibson
24d1f6570b icmp: Avoid unnecessary handling of unspecified bind address
We go to some trouble, if the configured output address is unspecified, to
pass NULL to sock_l4().  But while passing NULL is one way to get sock_l4()
not to specify a bind address, passing the "any" address explicitly works
too.  Use this to simplify icmp_tap_handler().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-12-27 19:29:45 +01:00
David Gibson
073f530bfe treewide: Add IN4ADDR_ANY_INIT macro
We already define IN4ADDR_LOOPBACK_INIT to initialise a struct in_addr to
the loopback address, make a similar one for the unspecified / any address.
This avoids messying things with the internal structure of struct in_addr
where we don't care about it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-12-27 19:29:45 +01:00
David Gibson
f6d8dc2355 pif: Pass originating pif to tap handler functions
For now, packets passed to the various *_tap_handler() functions always
come from the single "tap" interface.  We want to allow the possibility to
broaden that in future.  As preparation for that, have the code in tap.c
pass the pif id of the originating interface to each of those handler
functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-07 09:53:45 +01:00
David Gibson
c1d2a070f2 util: Consolidate and improve workarounds for clang-tidy issue 58992
We have several workarounds for a clang-tidy bug where the checker doesn't
recognize that a number of system calls write to - and therefore initialise
- a socket address.  We can't neatly use a suppression, because the bogus
warning shows up some time after the actual system call, when we access
a field of the socket address which clang-tidy erroneously thinks is
uninitialised.

Consolidate these workarounds into one place by using macros to implement
wrappers around affected system calls which add a memset() of the sockaddr
to silence clang-tidy.  This removes the need for the individual memset()
workarounds at the callers - and the somewhat longwinded explanatory
comments.

We can then use a #define to not include the hack in "real" builds, but
only consider it for clang-tidy.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-09-27 17:26:06 +02:00
David Gibson
cee4a2da48 tap: Pass source address to protocol handler functions
The tap code passes the IPv4 or IPv6 destination address of packets it
receives to the protocol specific code.  Currently that protocol code
doesn't use the source address, but we want it to in future.  So, in
preparation, pass the IPv4/IPv6 source address of tap packets to those
functions as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-22 12:15:21 +02:00
David Gibson
05f606ab0b epoll: Split handling of ICMP and ICMPv6 sockets
We have different epoll type values for ICMP and ICMPv6 sockets, but they
both call the same handler function, icmp_sock_handler().  However that
function does essentially nothing in common for the two cases.  So, split
it into icmp_sock_handler() and icmpv6_sock_handler() and dispatch them
separately from the top level.

While we're there remove some parameters that the function was never using
anyway.  Also move the test for c->no_icmp into the functions, so that all
the logic specific to ICMP is within the handler, rather than in the top
level dispatch code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-13 17:30:07 +02:00
David Gibson
3401644453 epoll: Generalize epoll_ref to cover things other than sockets
The epoll_ref type includes fields for the IP protocol of a socket, and the
socket fd.  However, we already have a few things in the epoll which aren't
protocol sockets, and we may have more in future.  Rename these fields to
an abstract "fd type" and file descriptor for more generality.

Similarly, rather than using existing IP protocol numbers for the type,
introduce our own number space.  For now these just correspond to the
supported protocols, but we'll expand on that in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-13 17:29:51 +02:00
David Gibson
8218d99013 Use C11 anonymous members to make poll refs less verbose to use
union epoll_ref has a deeply nested set of structs and unions to let us
subdivide it into the various different fields we want.  This means that
referencing elements can involve an awkward long string of intermediate
fields.

Using C11 anonymous structs and unions lets us do this less clumsily.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:17:57 +02:00
Stefano Brivio
ca2749e1bd passt: Relicense to GPL 2.0, or any later version
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.

Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as long as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.

Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-04-06 18:00:33 +02:00
Stefano Brivio
a9c59dd91b conf, icmp, tcp, udp: Add options to bind to outbound address and interface
I didn't notice earlier: libslirp (and slirp4netns) supports binding
outbound sockets to specific IPv4 and IPv6 addresses, to force the
source addresse selection. If we want to claim feature parity, we
should implement that as well.

Further, Podman supports specifying outbound interfaces as well, but
this is simply done by resolving the primary address for an interface
when the network back-end is started. However, since kernel version
5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for
non-root users"), we can actually bind to a specific interface name,
which doesn't need to be validated in advance.

Implement -o / --outbound ADDR to bind to IPv4 and IPv6 addresses,
and --outbound-if4 and --outbound-if6 to bind IPv4 and IPv6 sockets
to given interfaces.

Given that it probably makes little sense to select addresses and
routes from interfaces different than the ones given for outbound
sockets, also assign those as "template" interfaces, by default,
unless explicitly overridden by '-i'.

For ICMP and UDP, we call sock_l4() to open outbound sockets, as we
already needed to bind to given ports or echo identifiers, and we
can bind() a socket only once: there, pass address (if any) and
interface (if any) for the existing bind() and setsockopt() calls.

For TCP, in general, we wouldn't otherwise bind sockets. Add a
specific helper to do that.

For UDP outbound sockets, we need to know if the final destination
of the socket is a loopback address, before we decide whether it
makes sense to bind the socket at all: move the block mangling the
address destination before the creation of the socket in the IPv4
path. This was already the case for the IPv6 path.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:43:59 +01:00
David Gibson
7c7b68dbe0 Use typing to reduce chances of IPv4 endianness errors
We recently corrected some errors handling the endianness of IPv4
addresses.  These are very easy errors to make since although we mostly
store them in network endianness, we sometimes need to manipulate them in
host endianness.

To reduce the chances of making such mistakes again, change to always using
a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network
endian addresses.  This makes it harder to accidentally do arithmetic or
comparisons on such addresses as if they were host endian.

We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to
directly work with struct in_addr values.  This has the additional benefit
of making the IPv4 and IPv6 paths more visually similar.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:24 +01:00
David Gibson
2b793d94ca Correct some missing endian conversions of IPv4 addresses
The INADDR_LOOPBACK constant is in host endianness, and similarly the
IN_MULTICAST macro expects a host endian address.  However, there are some
places in passt where we use those with network endian values.  This means
that passt will incorrectly allow you to set 127.0.0.1 or a multicast
address as the guest address or DNS forwarding address.  Add the necessary
conversions to correct this.

INADDR_ANY and INADDR_BROADCAST logically behave the same way, although
because they're palindromes it doesn't have an effect in practice.  Change
them to be logically correct while we're there, though.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:03:58 +01:00
Stefano Brivio
f212044940 icmp: Don't discard first reply sequence for a given echo ID
In pasta mode, ICMP and ICMPv6 echo sockets relay back to us any
reply we send: we're on the same host as the target, after all. We
discard them by comparing the last sequence we sent with the sequence
we receive.

However, on the first reply for a given identifier, the sequence
might be zero, depending on the implementation of ping(8): we need
another value to indicate we haven't sent any sequence number, yet.

Use -1 as initialiser in the echo identifier map.

This is visible with Busybox's ping, and was reported by Paul on the
integration at https://github.com/containers/podman/pull/16141, with:

  $ podman run --net=pasta alpine ping -c 2 192.168.188.1

...where only the second reply would be routed back.

Reported-by: Paul Holzinger <pholzing@redhat.com>
Fixes: 33482d5bf2 ("passt: Add PASTA mode, major rework")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-27 00:18:21 +02:00
Stefano Brivio
b062ee47d1 icmp: Add debugging messages for handled replies and requests
...instead of just reporting errors.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-27 00:18:18 +02:00
David Gibson
2dbc622f54 tap: Split tap_ip4_send() into UDP and ICMP variants
tap_ip4_send() has special case logic to compute the checksums for UDP
and ICMP packets, which is a mild layering violation.  By using a suitable
helper we can split it into tap_udp4_send() and tap_icmp4_send() functions
without greatly increasing the code size, this removing that layering
violation.

We make some small changes to the interface while there.  In both cases
we make the destination IPv4 address a parameter, which will be useful
later.  For the UDP variant we make it take just the UDP payload, and it
will generate the UDP header.  For the ICMP variant we pass in the ICMP
header as before.  The inconsistency is because that's what seems to be
the more natural way to invoke the function in the callers in each case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:56 +02:00
David Gibson
9d8dd8b6f4 tap: Split tap_ip6_send() into UDP and ICMP variants
tap_ip6_send() has special case logic to compute the checksums for UDP
and ICMP packets, which is a mild layering violation.  By using a suitable
helper we can split it into tap_udp6_send() and tap_icmp6_send() functions
without greatly increasing the code size, this removing that layering
violation.

We make some small changes to the interface while there.  In both cases
we make the destination IPv6 address a parameter, which will be useful
later.  For the UDP variant we make it take just the UDP payload, and it
will generate the UDP header.  For the ICMP variant we pass in the ICMP
header as before.  The inconsistency is because that's what seems to be
the more natural way to invoke the function in the callers in each case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:48 +02:00