1
0
mirror of https://passt.top/passt synced 2025-01-18 18:35:15 +00:00

146 Commits

Author SHA1 Message Date
Stefano Brivio
d072ac2434 tap: With pasta, don't reset on tap errors, handle write failures
Since commit 0515adceaa8f ("passt, pasta: Namespace-based sandboxing,
defer seccomp policy application"), it makes no sense to close and
reopen the tap device on error: we don't have access to /dev/net/tun
after the initial setup phase.

If we hit ENOBUFS while writing (as reported: in one case because
the kernel actually ran out of memory, with another case under
investigation), or ENOSPC, we're supposed to drop whatever data we
were trying to send: there's no room for it.

Handle EINTR just like we handled EAGAIN/EWOULDBLOCK: there's no
particular reason why sending the same data should fail again.

Anything else I can think of would be an unrecoverable error: exit
with failure then.

While at it, drop a useless cast on the write() call: it takes a
const void * anyway.

Reported-by: Gianluca Stivan <me@yawnt.com>
Reported-by: Chris Kuhn <kuhnchris@kuhnchris.eu>
Fixes: 0515adceaa8f ("passt, pasta: Namespace-based sandboxing, defer seccomp policy application")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-06-23 10:15:10 +02:00
David Gibson
25f1d1a84f tap: Don't update ip6.addr_seen to ::
When we receive packets from the tap side, we update the addr_seen fields
to reflect the last known address of the guest or ns.  For ip4.addr_seen
we, sensibly, only update if the address we've just seen isn't 0 (0.0.0.0).
This case can occur during early DHCP transactions.

We have no equivalent case for IPv6.  We're less likely to hit this,
because DHCPv6 uses link-local addresses, however we can see an source
address of :: with certain multicast operations.  This can bite us if we
try to make an incoming connection very early after starting pasta with
--config-net: we may have only seen some of those multicast packets,
updated addr_seen to :: and not had any "real" packets to update it to a
global address.  I've seen this with some of the avocado test conversions.

In any case, it can never make sense to update addr_seen to ::, so
explicitly exclude that case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-05-17 18:50:34 +02:00
Stefano Brivio
ca2749e1bd passt: Relicense to GPL 2.0, or any later version
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.

Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as long as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.

Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-04-06 18:00:33 +02:00
David Gibson
34ade90957 Work around weird false positives with cppcheck-2.9.1
Commit 89e38f55 "treewide: Fix header includes to build with musl" added
extra #includes to work with musl.  Unfortunately with the cppcheck version
I'm using (cppcheck-2.9-1.fc37.x86_64 in Fedora 37) this causes weird false
positives: specifically cppcheck seems to hit a #error in <bits/unistd.h>
complaining about including it directly instead of via <unistd.h> (which is
not something we're doing).

I have no idea why that would be happening; but I'm guessing it has to be
a bug in the cpp implementation in that cppcheck version.  In any case,
it's possible to work around this by moving the include of <unistd.h>
before the include of <signal.h>.  So, do that.

Fixes: 89e38f55405d ("treewide: Fix header includes to build with musl")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-03-21 16:38:06 +01:00
Chris Kuhn
89e38f5540 treewide: Fix header includes to build with musl
Roughly inspired from a patch by Chris Kuhn: fix up includes so that
we can build against musl: glibc is more lenient as headers generally
include a larger amount of other headers.

Compared to the original patch, I only included what was needed
directly in C files, instead of adding blanket includes in local
header files. It's a bit more involved, but more consistent with the
current (not ideal) situation.

Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Stefano Brivio
d2df763232 log, conf, tap: Define die() as err() plus exit(), drop cppcheck workarounds
If we define die() as a variadic macro, passing __VA_ARGS__ to err(),
and calling exit() outside err() itself, we can drop the workarounds
introduced in commit 36f0199f6ef4 ("conf, tap: Silence two false
positive invalidFunctionArg from cppcheck").

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:55:57 +01:00
Stefano Brivio
36f0199f6e conf, tap: Silence two false positive invalidFunctionArg from cppcheck
The newly introduced die() calls exit(), but cppcheck doesn't see it
and warns about possibly invalid arguments used after the check which
triggers die(). Add return statements to silence the warnings.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 19:19:23 +01:00
David Gibson
42bfd212b1 tap: Eliminate goto from tap_handler()
The goto here really doesn't improve clarity or brevity at all.  Use a
clearer construct.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 18:56:31 +01:00
David Gibson
b62ed9ca0e tap: Don't pcap frames that didn't get sent
In tap_send_frames() we send a number of frames to the tap device, then
also write them to the pcap capture file (if configured).  However the tap
send can partially fail (short write()s or similar), meaning that some
of the requested frames weren't actually sent, but we still write those
frames to the capture file.

We do give a debug message in this case, but it's misleading to add frames
that we know weren't sent to the capture file.  Rework to avoid this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 18:56:20 +01:00
Stefano Brivio
7564b58a7f tap: Use single counter for iov elements in tap_send_frames_pasta()
David points out that using multiple counters to go over the iov
array, namely 'i' and 'iov', makes mistakes easier. We can't just use
'iov', unless we reserve an element with zero iov_len at the end,
which isn't really justified.

Simply use 'i' to iterate over the array.

Link: https://archives.passt.top/passt-dev/Y+mfenvLn3VJ7Dg5@yekko/
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-16 17:33:54 +01:00
Laine Stump
c9af6f92db convert all remaining err() followed by exit() to die()
This actually leaves us with 0 uses of err(), but someone could want
to use it in the future, so we may as well leave it around.

Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 17:32:27 +01:00
Stefano Brivio
ac153595c0 tap: Send frames after the first one in tap_send_frames_pasta()
...instead of repeatedly sending out the first one in iov.

Fixes: e21ee41ac35a ("tcp: Combine two parts of pasta tap send path together")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-12 14:34:38 +01:00
David Gibson
2d553b587a tap: Improve handling of partial frame sends
In passt mode, when writing frames to the qemu socket, we might get a short
send.  If we ignored this and carried on, the qemu socket would get out of
sync, because the bytes we actually sent wouldn't correspond  to the length
header we already sent.  tap_send_frames_passt() handles that by doing a
a blocking send to complete the message, but it has a few flaws:
 * We only attempt to resend once: although it's unlikely in practice,
   nothing prevents the blocking send() from also being short
 * We print a debug error if send() returns non-zero.. but send() returns
   the number of bytes sent, so we actually want it to return the length
   of the remaining data.

Correct those flaws and also be a bit more thorough about reporting
problems here.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:55:01 +01:00
David Gibson
0fb7b2b908 tap: Use different io vector bases depending on tap type
Currently tap_send_frames() expects the frames it is given to include the
vnet_len field, even in pasta mode which doesn't use it (although it need
not be initialized in that case).  To match, tap_iov_base() and
tap_iov_len() construct the frame in that way.

This will inconvenience future changes, so alter things to set the buffers
to include just the frame needed by the tap backend type.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:54:57 +01:00
David Gibson
4b3d38a069 tap: Add "tap headers" abstraction
Currently both the TCP and UDP code need to deal in various places with the
details of the L2 headers, and also the tap-specific "vnet_len" header.
This makes abstracting the tap interface to new backends (e.g. vhost-user
or tun) more difficult.

To improve this abstraction, create a new 'tap_hdr' structure which
represents both L2 (always Ethernet at the moment, but might be vary in
future) and any additional tap specific headers (such as the qemu socket's
vnet_len field).  Provide helper functions and macros to initialize, update
and use it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:54:52 +01:00
David Gibson
6d011c1faa tap, tcp: Move tap send path to tap.c
The functions which do the final steps of sending TCP packets on through
the tap interface - tcp_l2_buf_flush*() - no longer have anything that's
actually specific to TCP in them, other than comments and names.  Move them
all to tap.c.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:54:40 +01:00
Richard W.M. Jones
190169c544 passt, tap: Process data on the socket before HUP/ERR events
In the case where the client writes a packet and then closes the
socket, because we receive EPOLLIN|EPOLLRDHUP together we have a
choice of whether to close the socket immediately, or read the packet
and then close the socket.  Choose the latter.

This should improve fuzzing coverage and arguably is a better choice
even for regular use since dropping packets on close is bad.

See-also: https://archives.passt.top/passt-dev/20221117171805.3746f53a@elisabeth/
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-25 01:40:57 +01:00
Richard W.M. Jones
6b4e68383c passt, tap: Add --fd option
This passes a fully connected stream socket to passt.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
[sbrivio: reuse fd_tap instead of adding a new descriptor,
 imply --one-off on --fd, add to optstring and usage()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-25 01:40:47 +01:00
David Gibson
698e4fd761 style: Minor corrections to function comments
Some style issues and a typo.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-25 01:34:26 +01:00
Stefano Brivio
25dab96205 tap: Revert recently added checks in tap_handler_passt()
This reverts commit 198f87835dc4 ("tap: Return -EIO from
tap_handler_passt() on inconsistent packet stream") and commit
510dace86ccf ("tap: Keep stream consistent if qemu length descriptor
spans two recv() calls").

I can hit occasional failures in perf/passt_tcp tests where we seem
to be getting excess data at the end of a recv(), and for some reason
I couldn't figure out yet, if we just ignore it, subsequent recv()
calls from qemu return correct data. If we close the connection, qemu
can't talk to us anymore, of course.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-16 15:11:22 +01:00
Stefano Brivio
b27d6d121c arp, tap, util: Don't use perror() after seccomp filter is installed
If stderr is closed, after we fork to background, glibc's
implementation of perror() will try to re-open it by calling dup(),
upon which the seccomp filter causes the process to terminate,
because dup() is not included in the list of allowed syscalls.

Replace perror() calls that might happen after isolation_postfork().
We could probably replace all of them, but early ones need a bit more
attention as we have to check whether log.c functions work in early
stages.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-11-16 15:11:13 +01:00
Stefano Brivio
198f87835d tap: Return -EIO from tap_handler_passt() on inconsistent packet stream
While it's important to fail in that case, it makes little sense to
fail quietly: it's better to tell qemu explicitly that something went
wrong and that we won't recover, by closing the socket.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-10 11:17:50 +01:00
Stefano Brivio
510dace86c tap: Keep stream consistent if qemu length descriptor spans two recv() calls
I got all paranoid after triggering a divide-by-zero general
protection fault in passt with a qemu version without the virtio_net
TX hang fix, while flooding UDP. I start thinking this was actually
coming from some random changes I was playing with, but before
reaching this conclusion I reviewed once more the relatively short
path in tap_handler_passt() before we start using packet_*()
functions, and found this.

Never observed in practice, but artificially reproduced with changes
in qemu's socket interface: if we don't receive from qemu a complete
length descriptor in one recv() call, or if we receive a partial one
at the end of one call, we currently disregard the rest, which would
make the stream inconsistent.

Nothing really bad happens, except that from that point on we would
disregard all the packets we get until, if ever, we get the stream
back in sync by chance.

Force reading a complete packet length descriptor with a blocking
recv(), if needed -- not just a complete packet later.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-10 11:17:50 +01:00
Stefano Brivio
11efaefa1e passt, qrap, README: Update notes and documentation for AF_UNIX support in qemu
We can't get rid of qrap quite yet, but at least we should start
telling users it's not going to be needed anymore starting from qemu
7.2.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:32 +01:00
David Gibson
f7653a1446 Use endian-safer typing in struct tap4_l4_t
We recently converted to using struct in_addr rather than bare in_addr_t
or uint32_t to represent IPv4 addresses in network order.  This makes it
harder forget to apply the correct endian conversions.

We omitted the IPv4 addresses stored in struct tap4_l4_t, however.  Convert
those as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:26 +01:00
David Gibson
7c7b68dbe0 Use typing to reduce chances of IPv4 endianness errors
We recently corrected some errors handling the endianness of IPv4
addresses.  These are very easy errors to make since although we mostly
store them in network endianness, we sometimes need to manipulate them in
host endianness.

To reduce the chances of making such mistakes again, change to always using
a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network
endian addresses.  This makes it harder to accidentally do arithmetic or
comparisons on such addresses as if they were host endian.

We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to
directly work with struct in_addr values.  This has the additional benefit
of making the IPv4 and IPv6 paths more visually similar.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:24 +01:00
Stefano Brivio
2d4468ebb7 tap: Support for detection of existing sockets on ramfs
On ramfs, connecting to a non-existent UNIX domain socket yields
EACCESS, instead of ENOENT. This is visible if we use passt directly
on rootfs (a ramfs instance) from an initramfs image.

It's probably wrong for ramfs to return EACCES, but given the
simplicity of the filesystem, I doubt we should try to fix it there
at the possible cost of added complexity.

Also, this whole beauty should go away once qrap-less usage is
established, so just accept EACCES as indication that a conflicting
socket does not, in fact, exist.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-11-04 12:01:09 +01:00
Stefano Brivio
947d756747 tap: Trace received (outbound) ICMP packets in debug mode, too
This only worked for ICMPv6: ICMP packets have no TCP-style header,
so they are handled as a special case before packet sequences are
formed, and the call to tap_packet_debug() was missing.

Fixes: bb708111833e ("treewide: Packet abstraction with mandatory boundary checks")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-27 00:18:16 +02:00
David Gibson
c6845f60a0 dhcp: Use tap_udp4_send() helper in dhcp()
The IPv4 specific dhcp() manually constructs L2 and IP headers to send its
DHCP reply packet, unlike its IPv6 equivalent in dhcpv6.c which uses the
tap_udp6_send() helper.  Now that we've broaded the parameters to
tap_udp4_send() we can use it in dhcp() to avoid some duplicated logic.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:35:00 +02:00
David Gibson
2dbc622f54 tap: Split tap_ip4_send() into UDP and ICMP variants
tap_ip4_send() has special case logic to compute the checksums for UDP
and ICMP packets, which is a mild layering violation.  By using a suitable
helper we can split it into tap_udp4_send() and tap_icmp4_send() functions
without greatly increasing the code size, this removing that layering
violation.

We make some small changes to the interface while there.  In both cases
we make the destination IPv4 address a parameter, which will be useful
later.  For the UDP variant we make it take just the UDP payload, and it
will generate the UDP header.  For the ICMP variant we pass in the ICMP
header as before.  The inconsistency is because that's what seems to be
the more natural way to invoke the function in the callers in each case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:56 +02:00
David Gibson
cb1edae3b5 ndp: Remove unneeded eh_source parameter
ndp() takes a parameter giving the ethernet source address of the packet
it is to respond to, which it uses to determine the destination address to
send the reply packet to.

This is not necessary, because the address will always be the guest's
MAC address.  Even if the guest has just changed MAC address, then either
tap_handler_passt() or tap_handler_pasta() - which are the only call paths
leading to ndp() will have updated c->mac_guest with the new value.

So, remove the parameter, and just use c->mac_guest, making it more
consistent with other paths where we construct packets to send inwards.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:51 +02:00
David Gibson
9d8dd8b6f4 tap: Split tap_ip6_send() into UDP and ICMP variants
tap_ip6_send() has special case logic to compute the checksums for UDP
and ICMP packets, which is a mild layering violation.  By using a suitable
helper we can split it into tap_udp6_send() and tap_icmp6_send() functions
without greatly increasing the code size, this removing that layering
violation.

We make some small changes to the interface while there.  In both cases
we make the destination IPv6 address a parameter, which will be useful
later.  For the UDP variant we make it take just the UDP payload, and it
will generate the UDP header.  For the ICMP variant we pass in the ICMP
header as before.  The inconsistency is because that's what seems to be
the more natural way to invoke the function in the callers in each case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:48 +02:00
David Gibson
f616ca231e Split tap_ip_send() into IPv4 and IPv6 specific functions
The IPv4 and IPv6 paths in tap_ip_send() have very little in common, and
it turns out that every caller (statically) knows if it is using IPv4 or
IPv6.  So split into separate tap_ip4_send() and tap_ip6_send() functions.
Use a new tap_l2_hdr() function for the very small common part.

While we're there, make some minor cleanups:
  - We were double writing some fields in the IPv6 header, so that it
    temporary matched the pseudo-header for checksum calculation.  With
    recent checksum reworks, this isn't neccessary any more.
  - We don't use any IPv4 header options, so use some sizeof() constructs
    instead of some open coded values for header length.
  - The comment used to say that the flow label was for TCP over IPv6, but
    in fact the only thing we used it for was DHCPv6 over UDP traffic

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:45 +02:00
David Gibson
fb5d1c5d7d tap: Remove unhelpeful vnet_pre optimization from tap_send()
Callers of tap_send() can optionally use a small optimization by adding
extra space for the 4 byte length header used on the qemu socket interface.
tap_ip_send() is currently the only user of this, but this is used only
for "slow path" ICMP and DHCP packets, so there's not a lot of value to
the optimization.

Worse, having the two paths here complicates the interface and makes future
cleanups difficult, so just remove it.  I have some plans to bring back the
optimization in a more general way in future, but for now it's just in the
way.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:43 +02:00
David Gibson
f72b63e92f Remove support for TCP packets from tap_ip_send()
tap_ip_send() is never used for TCP packets, we're unlikely to use it for
that in future, and the handling of TCP packets makes other cleanups
unnecessarily awkward.  Remove it.

This is the only user of csum_tcp4(), so we can remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:40 +02:00
David Gibson
a2eb2d310a Add helpers for normal inbound packet destination addresses
tap_ip_send() doesn't take a destination address, because it's specifically
for inbound packets, and the IP addresses of the guest/namespace are
already known to us.  Rather than open-coding this destination address
logic, make helper functions for it which will enable some later cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:38 +02:00
David Gibson
3d8ccb44a6 Add csum_ip4_header() helper to calculate IPv4 header checksums
We calculate IPv4 header checksums in at least two places, in dhcp() and
in tap_ip_send.  Add a helper to handle this calculation in both places.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:34 +02:00
David Gibson
bd4be308fc Add csum_udp4() helper for calculating UDP over IPv4 checksums
At least two places in passt fill in UDP over IPv4 checksums, although
since UDP checksums are optional with IPv4 that just amounts to storing
a 0 (in tap_ip_send()) or leaving a 0 from an earlier initialization (in
dhcp()).  For consistency, add a helper for this "calculation".

Just for the heck of it, add the option (compile time disabled for now) to
calculate real UDP checksums.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:32 +02:00
David Gibson
6905ac75ec Add csum_udp6() helper for calculating UDP over IPv6 checksums
Add a helper for calculating UDP checksums when used over IPv6
For future flexibility, the new helper takes parameters for the fields in
the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
be explicitly constructed.  It also allows the UDP header and payload to
be in separate buffers, although we don't use this yet.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:29 +02:00
David Gibson
67ab617172 Add csum_icmp4() helper for calculating ICMP checksums
Although tap_ip_send() is currently the only place calculating ICMP
checksums, create a helper function for symmetry with ICMPv6.  For
future flexibility it allows the ICMPv6 header and payload to be in
separate buffers.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:26 +02:00
David Gibson
7abd2b0d72 Add csum_icmp6() helper for calculating ICMPv6 checksums
At least two places in passt calculate ICMPv6 checksums, ndp() and
tap_ip_send().  Add a helper to handle this calculation in both places.
For future flexibility, the new helper takes parameters for the fields in
the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
be explicitly constructed.  It also allows the ICMPv6 header and payload to
be in separate buffers, although we don't use this yet.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:21 +02:00
Stefano Brivio
3e2eb4337b conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()
Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in
the target user namespace as we isolate the process, which means
we're unable to bind to low ports at that point.

Bind inbound ports, and only those, before isolate_user(). Keep the
handling of outbound ports (for pasta mode only) after the setup of
the namespace, because that's where we'll bind them.

To this end, initialise the netlink socket for the init namespace
before isolate_user() as well, as we actually need to know the
addresses of the upstream interface before binding ports, in case
they're not explicitly passed by the user.

As we now call nl_sock_init() twice, checking its return code from
conf() twice looks a bit heavy: make it exit(), instead, as we
can't do much if we don't have netlink sockets.

While at it:

- move the v4_only && v6_only options check just after the first
  option processing loop, as this is more strictly related to
  option parsing proper

- update the man page, explaining that CAP_NET_BIND_SERVICE is
  *not* the preferred way to bind ports, because passt and pasta
  can be abused to allow other processes to make effective usage
  of it. Add a note about the recommended sysctl instead

- simplify nl_sock_init_do() now that it's called once for each
  case

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-15 02:10:36 +02:00
Stefano Brivio
2970dc257c tap: Don't check sequence counts when adding packets to pool
This is a minor optimisation possibility I spotted while trying to
debug a hang in tap4_handler(): if we run out of space for packet
sequences, it's fine to add packets to an existing per-sequence pool.

We should check the count of packet sequences only once we realise
that we actually need a new packet sequence.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-15 02:10:36 +02:00
Stefano Brivio
a62ed181db conf, tap: Add option to quit once the client closes the connection
This is practical to avoid explicit lifecycle management in users,
e.g. libvirtd, and is trivial to implement.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
da152331cf Move logging functions to a new file, log.c
Logging to file is going to add some further complexity that we don't
want to squeeze into util.c.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-14 17:38:25 +02:00
David Gibson
eb5e123038 cppcheck: Reduce scope of some variables
Minor style improvement suggested by cppcheck.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-29 12:22:01 +02:00
Stefano Brivio
d30bde3181 tap: Check return value of accept4() before calling getsockopt()
Reported by Coverity (CWE-119):

	Negative value used as argument to a function expecting a
	positive value (for example, size of buffer or allocation)

and harmless, because getsockopt() would return -EBADF if the
socket is -1, so we wouldn't print anything.

Check if accept4() returns a valid socket before calling getsockopt()
on it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-22 16:54:01 +02:00
David Gibson
16f5586bb8 Make substructures for IPv4 and IPv6 specific context information
The context structure contains a batch of fields specific to IPv4 and to
IPv6 connectivity.  Split those out into a sub-structure.

This allows the conf_ip4() and conf_ip6() functions, which take the
entire context but touch very little of it, to be given more specific
parameters, making it clearer what it affects without stepping through the
code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 22:14:07 +02:00
David Gibson
5e12d23acb Separate IPv4 and IPv6 configuration
After recent changes, conf_ip() now has essentially entirely disjoint paths
for IPv4 and IPv6 configuration.  So, it's cleaner to split them out into
different functions conf_ip4() and conf_ip6().

Splitting these out also lets us make the interface a bit nicer, having
them return success or failure directly, rather than manipulating c->v4
and c->v6 to indicate success/failure of the two versions.

Since these functions may also initialize the interface index for each
protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing
tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since
a 0 interface index is never valid).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Whitespace fixes]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-30 22:12:50 +02:00
David Gibson
4bc883aeab Move passt mac_guest init to be more symmetric with pasta
In pasta mode, the guest's MAC address is set up in pasta_ns_cobf() called
from tap_sock_tun_init().  If we have a guest MAC configured with
--ns-mac-addr, this will set the given MAC on the kernel tuntap device, or
if we haven't configured one it will update our record of the guest MAC to
the kernel assigned one from the device.

For passt, we don't initially know the guest's MAC until we receive packets
from it, so we have to initially use a broadcast address.  This is - oddly
- set up in an entirely different place, in conf_ip() conditional on the
mode.

Move it to the logically matching place for passt - tap_sock_unix_init().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 21:57:50 +02:00