1
0
mirror of https://passt.top/passt synced 2024-12-22 21:55:22 +00:00
Commit Graph

69 Commits

Author SHA1 Message Date
Stefano Brivio
346da48fe6 udp: Fix port and address checks for DNS forwarder
First off, as we swap endianness for source ports in
udp_fill_data_v{4,6}(), we want host endianness, not network
endianness. It doesn't actually matter if we use htons() or ntohs()
here, but the current version is confusing.

In the IPv4 path, when we remap DNS answers, we already swapped the
endianness as needed for the source port: don't swap it again,
otherwise we'll not map DNS answers for IPv4.

In the IPv6 path, when we remap DNS answers, we want to check that
they came from our upstream DNS server, not the one configured via
--dns-forward (which doesn't even need to exist for this
functionality to work).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
c1eff9a3c6 conf, tcp, udp: Allow specification of interface to bind to
Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable
SO_BINDTODEVICE for non-root users"), we can bind sockets to
interfaces, if they haven't been bound yet (as in bind()).

Introduce an optional interface specification for forwarded ports,
prefixed by %, that can be passed together with an address.

Reported use case: running local services that use ports we want
to have externally forwarded:
  https://github.com/containers/podman/issues/14425

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
da152331cf Move logging functions to a new file, log.c
Logging to file is going to add some further complexity that we don't
want to squeeze into util.c.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-14 17:38:25 +02:00
Stefano Brivio
5290b6f13e udp: Replace pragma to ignore bogus stringop-overread warning with workaround
Commit c318ffcb4c ("udp: Ignore bogus -Wstringop-overread for
write() from gcc 12.1") uses a gcc pragma to ignore a bogus warning,
which started appearing on gcc 12.1 (aarch64 and x86_64) due to:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483

...but gcc 12.2 has the same issue. Not just that: if LTO is enabled,
the pragma itself is ignored (this wasn't the case with gcc 12.1),
because of:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922

Drop the pragma, and assign a frame (in the networking sense) pointer
from the offset of the Ethernet header in the buffer, then pass it to
write() and pcap(), so that gcc doesn't obsess anymore with the fact
that an Ethernet header is 14 bytes and we're sending more than that.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-29 12:23:10 +02:00
David Gibson
d5b80ccc72 Fix widespread off-by-one error dealing with port numbers
Port numbers (for both TCP and UDP) are 16-bit, and so fit exactly into a
'short'.  USHRT_MAX is therefore the maximum port number and this is widely
used in the code.  Unfortunately, a lot of those places don't actually
want the maximum port number (USHRT_MAX == 65535), they want the total
number of ports (65536).  This leads to a number of potentially nasty
consequences:

 * We have buffer overruns on the port_fwd::delta array if we try to use
   port 65535
 * We have similar potential overruns for the tcp_sock_* arrays
 * Interestingly udp_act had the correct size, but we can calculate it in
   a more direct manner
 * We have a logical overrun of the ports bitmap as well, although it will
   just use an unused bit in the last byte so isnt harmful
 * Many loops don't consider port 65535 (which does mitigate some but not
   all of the buffer overruns above)
 * In udp_invert_portmap() we incorrectly compute the reverse port
   translation for return packets

Correct all these by using a new NUM_PORTS defined explicitly for this
purpose.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
3ede07aac9 Treat port numbers as unsigned
Port numbers are unsigned values, but we're storing them in (signed) int
variables in some places.  This isn't actually harmful, because int is
large enough to hold the entire range of ports.  However in places we don't
want to use an in_port_t (usually to avoid overflow on the last iteration
of a loop) it makes more conceptual sense to use an unsigned int. This will
also avoid some problems with later cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
f5a31ee94c Don't use indirect remap functions for conf_ports()
Now that we've delayed initialization of the UDP specific "reverse" map
until udp_init(), the only difference between the various 'remap' functions
used in conf_ports() is which array they target.  So, simplify by open
coding the logic into conf_ports() with a pointer to the correct mapping
array.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
1467a35b5a udp: Delay initialization of UDP reversed port mapping table
Because it's connectionless, when mapping UDP ports we need, in addition
to the table of deltas for destination ports needed by TCP, we need an
inverted table to translate the source ports on return packets.

Currently we fill out the inverted table at the same time we construct the
main table in udp_remap_to_tap() and udp_remap_to_init().  However, we
don't use either table until after we've initialized UDP, so we can delay
the construction of the reverse table to udp_init().  This makes the
configuration more symmetric between TCP and UDP which will enable further
cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
163dc5f188 Consolidate port forwarding configuration into a common structure
The configuration for how to forward ports in and out of the guest/ns is
divided between several different variables.  For each connect direction
and protocol we have a mode in the udp/tcp context structure, a bitmap
of which ports to forward also in the context structure and an array of
deltas to apply if the outward facing and inward facing port numbers are
different.  This last is a separate global variable, rather than being in
the context structure, for no particular reason.  UDP also requires an
additional array which has the reverse mapping used for return packets.

Consolidate these into a re-used substructure in the context structure.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
af9c98af5f udp: Don't drop zero-length outbound UDP packets
udp_tap_handler() currently skips outbound packets if they have a payload
length of zero.  This is not correct, since in a datagram protocol zero
length packets still have meaning.

Adjust this to correctly forward the zero-length packets by using a msghdr
with msg_iovlen == 0.

Bugzilla: https://bugs.passt.top/show_bug.cgi?id=19

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 11:14:29 +02:00
David Gibson
484652c632 udp: Don't pre-initialize msghdr array
In udp_tap_handler() the array of msghdr structures, mm[], is initialized
to zero.  Since UIO_MAXIOV is 1024, this can be quite a large zero, which
is expensive if we only end up using a few of its entries.  It also makes
it less obvious how we're setting all the control fields at the point we
actually invoke sendmmsg().

Rather than pre-initializing it, just initialize each element as we use it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 11:14:29 +02:00
David Gibson
16f5586bb8 Make substructures for IPv4 and IPv6 specific context information
The context structure contains a batch of fields specific to IPv4 and to
IPv6 connectivity.  Split those out into a sub-structure.

This allows the conf_ip4() and conf_ip6() functions, which take the
entire context but touch very little of it, to be given more specific
parameters, making it clearer what it affects without stepping through the
code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 22:14:07 +02:00
David Gibson
5e12d23acb Separate IPv4 and IPv6 configuration
After recent changes, conf_ip() now has essentially entirely disjoint paths
for IPv4 and IPv6 configuration.  So, it's cleaner to split them out into
different functions conf_ip4() and conf_ip6().

Splitting these out also lets us make the interface a bit nicer, having
them return success or failure directly, rather than manipulating c->v4
and c->v6 to indicate success/failure of the two versions.

Since these functions may also initialize the interface index for each
protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing
tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since
a 0 interface index is never valid).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Whitespace fixes]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-30 22:12:50 +02:00
Stefano Brivio
c318ffcb4c udp: Ignore bogus -Wstringop-overread for write() from gcc 12.1
With current OpenSUSE Tumbleweed on aarch64 (gcc-12-1.3.aarch64) and
on x86_64 (gcc-12-1.4.x86_64), but curiously not on armv7hl
(gcc-12-1.3.armv7hl), gcc warns about using the _pointer_ to the
802.3 header to write the whole frame to the tap descriptor:
  reading between 62 and 4294967357 bytes from a region of size 14

which is bogus:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483

Probably declaring udp_sock_fill_data_v{4,6}() as noinline would
"fix" this, but that's on the data path, so I'd rather not. Use
a gcc pragma instead.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-05-19 16:27:20 +02:00
Stefano Brivio
3c6ae62510 conf, tcp, udp: Allow address specification for forwarded ports
This feature is available in slirp4netns but was missing in passt and
pasta.

Given that we don't do dynamic memory allocation, we need to bind
sockets while parsing port configuration. This means we need to
process all other options first, as they might affect addressing and
IP version support. It also implies a minor rework of how TCP and UDP
implementations bind sockets.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-05-01 07:19:05 +02:00
Stefano Brivio
2b1fbf4631 udp: Out-of-bounds read, CWE-125 in udp_timer()
Not an actual issue due to how it's typically stored, but udp_act
can also be used for ports 65528-65535. Reported by Coverity.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-04-07 11:44:35 +02:00
Stefano Brivio
22ed4467a4 treewide: Unchecked return value from library, CWE-252
All instances were harmless, but it might be useful to have some
debug messages here and there. Reported by Coverity.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-04-07 11:44:35 +02:00
Stefano Brivio
37c228ada8 tap, tcp, udp, icmp: Cut down on some oversized buffers
The existing sizes provide no measurable differences in throughput
and packet rates at this point. They were probably needed as batched
implementations were not complete, but they can be decreased quite a
bit now.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
ad7f57a5b7 udp: Move flags before ts in struct udp_tap_port, avoid end padding
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
48582bf47f treewide: Mark constant references as const
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
bb70811183 treewide: Packet abstraction with mandatory boundary checks
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
3eb19cfd8a tcp, udp, util: Enforce 24-bit limit on socket numbers
This should never happen, but there are no formal guarantees: ensure
socket numbers are below SOCKET_MAX.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
79217b7689 udp: Use flags for local, loopback, and configured unicast binds
There's no value in keeping a separate timestamp for activity and for
aging of local binds, given that they have the same timeout. Reduce
that to a single timestamp, with a flag indicating the local bind.

Also use flags instead of separate int fields for loopback and
configured unicast address usage as source address.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
5eb7604203 udp: Split buffer queueing/writing parts of udp_sock_handler()
...it became too hard to follow: split it off to
udp_sock_fill_data_v{4,6}.

While at it, use IN6_ARE_ADDR_EQUAL(a, b), courtesy of netinet/in.h,
instead of open-coded memcmp().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
0296a57242 udp: Drop _splice from recv, send, sendto static buffer names
It's already implied by the fact they don't have "l2" in their
names, and dropping it improves readability a bit.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
eed6933e6c udp: Explicitly initialise sin6_scope_id and sin_zero in sockaddr_in{,6}
Not functionally needed, but gcc versions 7 to 9 (at least) will
issue a warning otherwise.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-25 22:54:35 +01:00
Stefano Brivio
6c93111864 tcp, udp: Receive batching doesn't pay off when writing single frames to tap
In pasta mode, when we get data from sockets and write it as single
frames to the tap device, we batch receive operations considerably,
and then (conceptually) split the data in many smaller writes.

It looked like an obvious choice, but performance is actually better
if we receive data in many small frame-sized recvmsg()/recvmmsg().

The syscall overhead with the previous behaviour, observed by perf,
comes predominantly from write operations, but receiving data in
shorter chunks probably improves cache locality by a considerable
amount.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
9afd87b733 udp: Allow loopback connections from host using configured unicast address
Likely for testing purposes only: allow connections from host to
guest or namespace using, as connection target, the configured,
possibly global unicast address.

In this case, we have to map the destination address to a link-local
address, and for port-based tracked responses, the source address
needs to be again the unicast address: not loopback, not link-local.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
89678c5157 conf, udp: Introduce basic DNS forwarding
For compatibility with libslirp/slirp4netns users: introduce a
mechanism to map, in the UDP routines, an address facing guest or
namespace to the first IPv4 or IPv6 address resulting from
configuration as resolver. This can be enabled with the new
--dns-forward option.

This implies that sourcing and using DNS addresses and search lists,
passed via command line or read from /etc/resolv.conf, is not bound
anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just
want to use addresses from /etc/resolv.conf as mapping target, while
not passing DNS options via DHCP.

Reflect this in all the involved code paths by differentiating
DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new
options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns,
--no-dhcp-search for passt.

This should be the last bit to enable substantial compatibility
between slirp4netns.sh and slirp4netns(1): pass the --dns-forward
option from the script too.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
0515adceaa passt, pasta: Namespace-based sandboxing, defer seccomp policy application
To reach (at least) a conceptually equivalent security level as
implemented by --enable-sandbox in slirp4netns, we need to create a
new mount namespace and pivot_root() into a new (empty) mountpoint, so
that passt and pasta can't access any filesystem resource after
initialisation.

While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.

With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies after the
initialisation phase, before the main loop, that's where we expect bad
things to happen, potentially. This way, we get back to 22 allowed
syscalls for passt and 34 for pasta, on x86_64.

While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.

We have to open all the file handles we'll ever need before
sandboxing:

- the packet capture file can only be opened once, drop instance
  numbers from the default path and use the (pre-sandbox) PID instead

- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection
  of bound ports in pasta mode, are now opened only once, before
  sandboxing, and their handles are stored in the execution context

- the UNIX domain socket for passt is also bound only once, before
  sandboxing: to reject clients after the first one, instead of
  closing the listening socket, keep it open, accept and immediately
  discard new connection if we already have a valid one

Clarify the (unchanged) behaviour for --netns-only in the man page.

To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.

For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.

We can now drop most of the pasta_child_handler() implementation,
that took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.

Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
caa22aa644 tcp, udp, util: Fixes for bitmap handling on big-endian, casts
Bitmap manipulating functions would otherwise refer to inconsistent
sets of bits on big-endian architectures. While at it, fix up a
couple of casts.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 16:30:59 +01:00
Stefano Brivio
b93c2c1713 passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitions
This is the only remaining Linux-specific include -- drop it to avoid
clang-tidy warnings and to make code more portable.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 07:57:09 +01:00
Stefano Brivio
627e18fa8a passt: Add cppcheck target, test, and address resulting warnings
...mostly false positives, but a number of very relevant ones too,
in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 09:41:13 +02:00
Stefano Brivio
27f5999677 udp: Avoid static initialiser for udp{4,6}_l2_buf
With the new UDP_TAP_FRAMES value, the binary size grows considerably.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 04:49:25 +02:00
Stefano Brivio
85a80f8f63 udp: Fix maximum payload size calculation for IPv4 buffers, bump UDP_TAP_FRAMES
The issue with a higher UDP_TAP_FRAMES was actually coming from a
payload size the guest couldn't digest. Fix that, and bump
UDP_TAP_FRAMES back to 128.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 04:42:09 +02:00
Stefano Brivio
dd942eaa48 passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers
Unions and structs, you all have names now.

Take the chance to enable bugprone-reserved-identifier,
cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy.

Provide a ffsl() weak declaration using gcc built-in.

Start reordering includes, but that's not enough for the
llvm-include-order checker yet.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 04:26:08 +02:00
Stefano Brivio
9618d24700 ndp, dhcpv6, tcp, udp: Always use link-local as source if gateway isn't
This shouldn't happen on any sane configuration, but I just met an
example of that: the default IPv6 gateway on the host is configured
with a global unicast address, we use that as source for RA, DHCPv6
replies, and the guest ignores it. Same later on if we talk TCP or
UDP and the guest has no idea where that address comes from.

Use our link-local address in case the gateway address is global.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 11:10:23 +02:00
Stefano Brivio
12cfa6444c passt: Add clang-tidy Makefile target and test, take care of warnings
Most are just about style and form, but a few were actually
serious mistakes (NDP-related).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:34:22 +02:00
Stefano Brivio
b0b77118fe passt: Address warnings from Clang's scan-build
All false positives so far.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Stefano Brivio
1a563a0cbd passt: Address gcc 11 warnings
A mix of unchecked return values, a missing permission mask for
open(2) with O_CREAT, and some false positives from
-Wstringop-overflow and -Wmaybe-uninitialized.

Reported-by: Martin Hauke <mardnh@gmx.de>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Stefan Hajnoczi
14fe73e766 udp: drop bogus udp_tap_map ts assignment
The 'ts' field is a timestamp so assigning the socket file descriptor is
incorrect. There is no actual bug because the current time is assigned
just a few lines later:

      udp_tap_map[V4][src].sock = s;
      udp_tap_map[V4][src].ts = s;
      ^^^^^^^^^^^ bogus ^^^^^^^^^^
      bitmap_set(udp_act[V4][UDP_ACT_TAP], src);
  }

  udp_tap_map[V4][src].ts = now->tv_sec;
  ^^^^^^^^^^^^^^^ correct ^^^^^^^^^^^^^^

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-15 20:46:17 +02:00
Stefano Brivio
f45891cf26 conf, tcp, udp: Add --no-map-gw to disable mapping gateway address to host
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:19:52 +02:00
Stefano Brivio
66d5930ec7 passt, pasta: Add seccomp support
List of allowed syscalls comes from comments in the form:
	#syscalls <list>

for syscalls needed both in passt and pasta mode, and:
	#syscalls:pasta <list>
	#syscalls:passt <list>

for syscalls specifically needed in pasta or passt mode only.

seccomp.sh builds a list of BPF statements from those comments,
prefixed by a binary search tree to keep lookup fast.

While at it, clean up a bit the Makefile using wildcards.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:15:46 +02:00
Giuseppe Scrivano
9a175cc2ce pasta: Allow specifying paths and names of namespaces
Based on a patch from Giuseppe Scrivano, this adds the ability to:

- specify paths and names of target namespaces to join, instead of
  a PID, also for user namespaces, with --userns

- request to join or create a network namespace only, without
  entering or creating a user namespace, with --netns-only

- specify the base directory for netns mountpoints, with --nsrun-dir

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
[sbrivio: reworked logic to actually join the given namespaces when
 they're not created, implemented --netns-only and --nsrun-dir,
 updated pasta demo script and man page]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-07 04:05:15 +02:00
Stefano Brivio
dd581730e5 tap: Completely de-serialise input message batches
Until now, messages would be passed to protocol handlers in a single
batch only if they happened to be dequeued in a row. Packets
interleaved between different connections would result in multiple
calls to the same protocol handler for a single connection.

Instead, keep track of incoming packet descriptors, arrange them in
sequences, and call protocol handlers only as we completely sorted
input messages in batches.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
ec0bdc10b1 udp: Switch to new socket message after 32KiB instead of 64KiB
For some reason, this measurably improves performance with qemu and
virtio-net.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
c2d86b7475 udp: Decrease UDP_TAP_FRAMES to 16
Similarly to the decrease in TCP_TAP_FRAMES, this improves fairness,
with a very small impact on performance.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
3994fc8f58 udp: Reset iov_base after sending partial message on sendmmsg() failure
We set the length while processing messges, but the starting address is
pre-initialised.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-09 15:40:04 +02:00
Stefano Brivio
ecf1f97564 udp: Fix comparison of seen IPv4 address for local connections
c->addr4_seen is stored in network order.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-09 15:40:04 +02:00
Stefano Brivio
8e9333616a udp: Fix retry mechanism on partial sendmmsg()
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-09 15:40:04 +02:00