The context structure contains a batch of fields specific to IPv4 and to
IPv6 connectivity. Split those out into a sub-structure.
This allows the conf_ip4() and conf_ip6() functions, which take the
entire context but touch very little of it, to be given more specific
parameters, making it clearer what it affects without stepping through the
code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.
Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.
This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
For compatibility with libslirp/slirp4netns users: introduce a
mechanism to map, in the UDP routines, an address facing guest or
namespace to the first IPv4 or IPv6 address resulting from
configuration as resolver. This can be enabled with the new
--dns-forward option.
This implies that sourcing and using DNS addresses and search lists,
passed via command line or read from /etc/resolv.conf, is not bound
anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just
want to use addresses from /etc/resolv.conf as mapping target, while
not passing DNS options via DHCP.
Reflect this in all the involved code paths by differentiating
DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new
options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns,
--no-dhcp-search for passt.
This should be the last bit to enable substantial compatibility
between slirp4netns.sh and slirp4netns(1): pass the --dns-forward
option from the script too.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This is the only remaining Linux-specific include -- drop it to avoid
clang-tidy warnings and to make code more portable.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Unions and structs, you all have names now.
Take the chance to enable bugprone-reserved-identifier,
cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy.
Provide a ffsl() weak declaration using gcc built-in.
Start reordering includes, but that's not enough for the
llvm-include-order checker yet.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This shouldn't happen on any sane configuration, but I just met an
example of that: the default IPv6 gateway on the host is configured
with a global unicast address, we use that as source for RA, DHCPv6
replies, and the guest ignores it. Same later on if we talk TCP or
UDP and the guest has no idea where that address comes from.
Use our link-local address in case the gateway address is global.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Replace libc functions that might dynamically allocate memory with own
implementations or wrappers.
Drop brk(2) from list of allowed syscalls in seccomp profile.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Once we're past the IA_NA or IA_TA option itself, before we start
looking for IA_ADDR suboptions, we need to subtract the length
of the option we parsed so far, otherwise we might end up reading
past the end of the message, or miss some parts.
While at it, streamline calculations in dhcpv6_opt().
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This isn't optional: TCP streams must carry a unique, hard-to-guess,
non-zero label for each direction. Linux, probably among others,
will otherwise refuse to associate packets in a given stream to the
same connection.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
dhcpv6_opt() already reflects consumed bytes on the remaining length,
and that we're not exceeding the message length. At this point, the
remaining length is usually zero.
While at it, drop a useless __packed__ attribute that triggers a gcc
warning.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host
connectivity to an otherwise disconnected, unprivileged network
and user namespace, similarly to slirp4netns. Given that the
implementation is largely overlapping with PASST, no separate binary
is built: 'pasta' (and 'passt4netns' for clarity) both link to
'passt', and the mode of operation is selected depending on how the
binary is invoked. Usage example:
$ unshare -rUn
# echo $$
1871759
$ ./pasta 1871759 # From another terminal
# udhcpc -i pasta0 2>/dev/null
# ping -c1 pasta.pizza
PING pasta.pizza (64.190.62.111) 56(84) bytes of data.
64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms
--- pasta.pizza ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms
# ping -c1 spaghetti.pizza
PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes
64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms
--- spaghetti.pizza ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms
This entails a major rework, especially with regard to the storage of
tracked connections and to the semantics of epoll(7) references.
Indexing TCP and UDP bindings merely by socket proved to be
inflexible and unsuitable to handle different connection flows: pasta
also provides Layer-2 to Layer-2 socket mapping between init and a
separate namespace for local connections, using a pair of splice()
system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local
bindings. For instance, building on the previous example:
# ip link set dev lo up
# iperf3 -s
$ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4
[SUM] 0.00-10.00 sec 52.3 GBytes 44.9 Gbits/sec 283 sender
[SUM] 0.00-10.43 sec 52.3 GBytes 43.1 Gbits/sec receiver
iperf Done.
epoll(7) references now include a generic part in order to
demultiplex data to the relevant protocol handler, using 24
bits for the socket number, and an opaque portion reserved for
usage by the single protocol handlers, in order to track sockets
back to corresponding connections and bindings.
A number of fixes pertaining to TCP state machine and congestion
window handling are also included here.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Add support for a variable amount of DNS servers, including zero,
from /etc/resolv.conf, in DHCP, NDP and DHCPv6 implementations.
Introduce support for domain search list for DHCP (RFC 3397),
NDP (RFC 8106), and DHCPv6 (RFC 3646), also sourced from
/etc/resolv.conf.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
As we support UDP forwarding for packets that are sent to local
ports, we actually need some kind of connection tracking for UDP.
While at it, this commit introduces a number of vaguely related fixes
for issues observed while trying this out. In detail:
- implement an explicit, albeit minimalistic, connection tracking
for UDP, to allow usage of ephemeral ports by the guest and by
the host at the same time, by binding them dynamically as needed,
and to allow mapping address changes for packets with a loopback
address as destination
- set the guest MAC address whenever we receive a packet from tap
instead of waiting for an ARP request, and set it to broadcast on
start, otherwise DHCPv6 might not work if all DHCPv6 requests time
out before the guest starts talking IPv4
- split context IPv6 address into address we assign, global or site
address seen on tap, and link-local address seen on tap, and make
sure we use the addresses we've seen as destination (link-local
choice depends on source address). Similarly, for IPv4, split into
address we assign and address we observe, and use the address we
observe as destination
- introduce a clock_gettime() syscall right after epoll_wait() wakes
up, so that we can remove all the other ones and pass the current
timestamp to tap and socket handlers -- this is additionally needed
by UDP to time out bindings to ephemeral ports and mappings between
loopback address and a local address
- rename sock_l4_add() to sock_l4(), no semantic changes intended
- include <arpa/inet.h> in passt.c before kernel headers so that we
can use <netinet/in.h> macros to check IPv6 address types, and
remove a duplicate <linux/ip.h> inclusion
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
It looks like some versions of ISC's IPv6 dhclient not only discard
the DNS Recursive Name Server option if other options (Domain Search
List? FQDN?) are absent, but they also drop existing entries
configured via SLAAC from /etc/resolv.conf.
Don't pass option 23 until I figure this out, it's anyway redundant
as we pass DNS information via SLAAC (RFC 8106).
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
dhcpv6_opt() needs to subtract option length _before_ returning,
so that callers can conveniently pass the remaining length on
subsequent calls.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The NotOnLink status code needs to be appended to the existing IA
content, because if we omit the requested addresses in the reply,
ISC's dhclient handles it as a NoAddrsAvail response.
Also fix length accounting (we would send a bunch of zeroes after
the IA otherwise), and print an informational message with the
requested address, if it's not appropriate for the link.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This implementation, similarly to the IPv4 DHCP one, hands out a
single address, which is the same as the upstream address for the
host.
This avoids the need for address translation as long as the client
runs a DHCPv6 client. The NDP "Managed" flag is now set in Router
Advertisements.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>