1
0
mirror of https://passt.top/passt synced 2024-07-01 23:42:41 +00:00
Commit Graph

163 Commits

Author SHA1 Message Date
Stefano Brivio
7656a6f888 conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway
Seen in a Google Compute Engine environment with a machine configured
via cloud-init-dhcp, while testing Podman integration for pasta: the
assigned address has a /32 netmask, and there's a default route,
which can be added on the host because there's another route, also
/32, pointing to the default gateway. For example, on the host:

  ip -4 address add 10.156.0.2/32 dev eth0
  ip -4 route add 10.156.0.1/32 dev eth0
  ip -4 route add default via 10.156.0.1

This is not a valid configuration as far as I can tell: if the
address is configured as /32, it shouldn't be used to reach a gateway
outside its derived netmask. However, Linux allows that, and
everything works.

The problem comes when pasta --config-net sources address and default
route from the host, and it can't configure the route in the target
namespace because the gateway is invalid. That is, we would skip
configuring the first route in the example, which results in the
equivalent of doing:

  ip -4 address add 10.156.0.2/32 dev eth0
  ip -4 route add default via 10.156.0.1

where, at this point, 10.156.0.1 is unreachable, and hence invalid
as a gateway.

Sourcing more routes than just the default is doable, but probably
undesirable: pasta users want to provide connectivity to a container,
not reflect exactly whatever trickery is configured on the host.

Add a consistency check and an adjustment: if the configured default
gateway is not reachable, shrink the given netmask until we can reach
it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:32 +01:00
David Gibson
7c7b68dbe0 Use typing to reduce chances of IPv4 endianness errors
We recently corrected some errors handling the endianness of IPv4
addresses.  These are very easy errors to make since although we mostly
store them in network endianness, we sometimes need to manipulate them in
host endianness.

To reduce the chances of making such mistakes again, change to always using
a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network
endian addresses.  This makes it harder to accidentally do arithmetic or
comparisons on such addresses as if they were host endian.

We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to
directly work with struct in_addr values.  This has the additional benefit
of making the IPv4 and IPv6 paths more visually similar.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:24 +01:00
David Gibson
dd3470d9a9 Use IPV4_IS_LOOPBACK more widely
This macro checks if an IPv4 address is in the loopback network
(127.0.0.0/8).  There are two places where we open code an identical check,
use the macro instead.

There are also a number of places we specifically exclude the loopback
address (127.0.0.1), but we should actually be excluding anything in the
loopback network.  Change those sites to use the macro as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:21 +01:00
David Gibson
dd09cceaee Minor improvements to IPv4 netmask handling
There are several minor problems with our parsing of IPv4 netmasks (-n).

First, we don't reject nonsensical netmasks like 0.255.0.255.  Address this
structurally by using prefix length instead of netmask as the primary
variable, only converting (and validating) when we need to.  This has the
added benefit of making some things more uniform with the IPv6 path.

Second, when the user specifies a prefix length, we truncate the output
from strtol() to an integer, which means we would treat -n 4294967320 as
valid (equivalent to 24).  Fix types to check for this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:19 +01:00
David Gibson
2b793d94ca Correct some missing endian conversions of IPv4 addresses
The INADDR_LOOPBACK constant is in host endianness, and similarly the
IN_MULTICAST macro expects a host endian address.  However, there are some
places in passt where we use those with network endian values.  This means
that passt will incorrectly allow you to set 127.0.0.1 or a multicast
address as the guest address or DNS forwarding address.  Add the necessary
conversions to correct this.

INADDR_ANY and INADDR_BROADCAST logically behave the same way, although
because they're palindromes it doesn't have an effect in practice.  Change
them to be logically correct while we're there, though.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:03:58 +01:00
Stefano Brivio
7402951658 conf, passt.1: Don't imply --foreground with --debug
Having -f implied by -d (and --trace) usually saves some typing, but
debug mode in background (with a log file) is quite useful if pasta
is started by Podman, and is probably going to be handy for passt
with libvirt later, too.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-27 00:17:56 +02:00
Stefano Brivio
c11277b94f conf: Don't pass leading ~ to parse_port_range() on exclusions
Commit 84fec4e998 ("Clean up parsing of port ranges") drops the
strspn() call before the parsing of excluded port ranges, because now
we're checking against any stray characters at every step.

However, that also has the effect of passing ~ as first character to
the new parse_port_range(), which makes no sense: we already checked
that ~ is the first character before the call, so skip it.

Alona reported this output:
  Invalid port specifier ~15000,~15001,~15006,~15008,~15020,~15021,~15090

while the whole specifier is indeed valid.

Reported-by: Alona Paz <alkaplan@redhat.com>
Fixes: 84fec4e998 ("Clean up parsing of port ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-24 14:37:22 +02:00
Stefano Brivio
3e2eb4337b conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()
Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in
the target user namespace as we isolate the process, which means
we're unable to bind to low ports at that point.

Bind inbound ports, and only those, before isolate_user(). Keep the
handling of outbound ports (for pasta mode only) after the setup of
the namespace, because that's where we'll bind them.

To this end, initialise the netlink socket for the init namespace
before isolate_user() as well, as we actually need to know the
addresses of the upstream interface before binding ports, in case
they're not explicitly passed by the user.

As we now call nl_sock_init() twice, checking its return code from
conf() twice looks a bit heavy: make it exit(), instead, as we
can't do much if we don't have netlink sockets.

While at it:

- move the v4_only && v6_only options check just after the first
  option processing loop, as this is more strictly related to
  option parsing proper

- update the man page, explaining that CAP_NET_BIND_SERVICE is
  *not* the preferred way to bind ports, because passt and pasta
  can be abused to allow other processes to make effective usage
  of it. Add a note about the recommended sysctl instead

- simplify nl_sock_init_do() now that it's called once for each
  case

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-15 02:10:36 +02:00
David Gibson
eb3d03a588 isolation: Only configure UID/GID mappings in userns when spawning shell
When in passt mode, or pasta mode spawning a command, we create a userns
for ourselves.  This is used both to isolate the pasta/passt process itself
and to run the spawned command, if any.

Since eed17a47 "Handle userns isolation and dropping root at the same time"
we've handled both cases the same, configuring the UID and GID mappings in
the new userns to map whichever UID we're running as to root within the
userns.

This mapping is desirable when spawning a shell or other command, so that
the user gets a root shell with reasonably clear abilities within the
userns and netns.  It's not necessarily essential, though.  When not
spawning a shell, it doesn't really have any purpose: passt itself doesn't
need to be root and can operate fine with an unmapped user (using some of
the capabilities we get when entering the userns instead).

Configuring the uid_map can cause problems if passt is running with any
capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to
allow it to forward low ports.  In this case the kernel makes files in
/proc/pid owned by root rather than the starting user to prevent the user
from interfering with the operation of the capability-enhanced process.
This includes uid_map meaning we are not able to write to it.

Whether this behaviour is correct in the kernel is debatable, but in any
case we might as well avoid problems by only initializing the user mappings
when we really want them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-15 02:10:36 +02:00
David Gibson
c22ebccba8 isolation: Replace drop_caps() with a version that actually does something
The current implementation of drop_caps() doesn't really work because it
attempts to drop capabilities from the bounding set.  That's not the set
that really matters, it's about limiting the abilities of things we might
later exec() rather than our own capabilities.  It also requires
CAP_SETPCAP which we won't usually have.

Replace it with a new version which uses setcap(2) to drop capabilities
from the effective and permitted sets.  For now we leave the inheritable
set as is, since we don't want to preclude the user from passing
inheritable capabilities to the command spawed by pasta.

Correctly dropping caps reveals that we were relying on some capabilities
we'd supposedly dropped.  Re-divide the dropping of capabilities between
isolate_initial(), isolate_user() and isolate_prefork() to make this work.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-15 02:10:36 +02:00
Stefano Brivio
57e2c066e9 conf: Drop excess colons in usage for DHCP and DNS options
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
10236de486 conf: Report usage for --no-netns-quit
Fixes: 745a9ba428 ("pasta: By default, quit if filesystem-bound net namespace goes away")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
c1eff9a3c6 conf, tcp, udp: Allow specification of interface to bind to
Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable
SO_BINDTODEVICE for non-root users"), we can bind sockets to
interfaces, if they haven't been bound yet (as in bind()).

Introduce an optional interface specification for forwarded ports,
prefixed by %, that can be passed together with an address.

Reported use case: running local services that use ports we want
to have externally forwarded:
  https://github.com/containers/podman/issues/14425

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
a62ed181db conf, tap: Add option to quit once the client closes the connection
This is practical to avoid explicit lifecycle management in users,
e.g. libvirtd, and is trivial to implement.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
e23024ccff conf, log, Makefile: Add versioning information
Add a --version option displaying that, and also include this
information in the log files.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-15 02:10:28 +02:00
Stefano Brivio
01efc71ddd log, conf: Add support for logging to file
In some environments, such as KubeVirt pods, we might not have a
system logger available. We could choose to run in foreground, but
this takes away the convenient synchronisation mechanism derived from
forking to background when interfaces are ready.

Add optional logging to file with -l/--log-file and --log-size.

Unfortunately, this means we need to duplicate features that are more
appropriately implemented by a system logger, such as rotation. Keep
that reasonably simple, by using fallocate() with range collapsing
where supported (Linux kernel >= 3.15, extent-based ext4 and XFS) and
falling back to an unsophisticated block-by-block moving of entries
toward the beginning of the file once we reach the (mandatory) size
limit.

While at it, clarify the role of LOG_EMERG in passt.c.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-14 17:38:28 +02:00
Stefano Brivio
51fa9bfd7b conf: Drop duplicate, diverging optstring assignments
This originated as a result of copy and paste to introduce a second
stage for processing options related to port forwarding, has already
bitten David in the past, and just gave me hours of fun.

As a matter of fact, the second set of optstring assignments was
already incorrect, but it didn't matter because the first one was
more restrictive, not allowing optional arguments for -P, -D, -S.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-14 17:38:26 +02:00
Stefano Brivio
da152331cf Move logging functions to a new file, log.c
Logging to file is going to add some further complexity that we don't
want to squeeze into util.c.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-14 17:38:25 +02:00
David Gibson
40901c5437 cppcheck: Use inline suppression for strtok() in conf.c
strtok() is non-reentrant and old-fashioned, so cppcheck would complains
about its use in conf.c if it weren't suppressed.  We're single threaded
and strtok() is convenient though, so it's not really worth reworking at
this time.  Convert this to an inline suppression so it's adjacent to the
code its annotating.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-29 12:22:19 +02:00
David Gibson
ab96da98cd Don't shadow 'i' in conf_ports()
The counter 'i' is used in a number of places in conf_ports(), but in one
of those we unnecessarily shadow it in an inner scope.  We could re-use the
same 'i' every time, but each use is logically separate, so instead remove
the outer declaration and declare it locally in each of the clauses where
we need it.

While we're there change it from a signed to unsigned int, since it's used
to iterate over port numbers which are generally treated as unsigned.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-29 12:22:05 +02:00
David Gibson
68ef4931cb Clean up parsing in conf_runas()
conf_runas() handles several of the different possible cases for the
--runas argument in a slightly odd order.  Although it can parse both
numeric UIDs/GIDs and user/group names, it can't parse a numeric UID
combined with a group name or vice versa.  That's not obviously useful, but
it's slightly surprising gap to have.

Rework the parsing to be more systematic: first split the option into
user and (optional) group parts, then separately parse each part as either
numeric or a name.  As a bonus this removes some clang-tidy warnings.

While we're there also add cppcheck suppressions for getpwnam() and
getgrnam().  It complains about those because they're not reentrant.
passt is single threaded though, and is always likely to be during
this initialization code, even if we multithread later.

There were some existing suppressions for these in the cppcheck
invocation but they're no longer up to date.  Replace them with inline
suppressions which, being next to the code, are more likely to stay
correct.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-29 12:21:58 +02:00
David Gibson
84fec4e998 Clean up parsing of port ranges
conf_ports() parses ranges of ports for the -t, -u, -T and -U options.
The code is quite difficult to the follow, to the point that clang-tidy
and cppcheck disagree on whether one of the pointers can be NULL at some
points.

Rework the code with the use of two new helper functions:
  * parse_port_range() operates a bit like strtoul(), but can parse a whole
    port range specification (e.g. '80' or '1000-1015')
  * next_chunk() does the necessary wrapping around strchr() to advance to
    just after the next given delimiter, while cleanly handling if there
    are no more delimiters

The new version is easier to follow, and also removes some cppcheck
warnings.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-29 12:21:41 +02:00
David Gibson
d5b80ccc72 Fix widespread off-by-one error dealing with port numbers
Port numbers (for both TCP and UDP) are 16-bit, and so fit exactly into a
'short'.  USHRT_MAX is therefore the maximum port number and this is widely
used in the code.  Unfortunately, a lot of those places don't actually
want the maximum port number (USHRT_MAX == 65535), they want the total
number of ports (65536).  This leads to a number of potentially nasty
consequences:

 * We have buffer overruns on the port_fwd::delta array if we try to use
   port 65535
 * We have similar potential overruns for the tcp_sock_* arrays
 * Interestingly udp_act had the correct size, but we can calculate it in
   a more direct manner
 * We have a logical overrun of the ports bitmap as well, although it will
   just use an unused bit in the last byte so isnt harmful
 * Many loops don't consider port 65535 (which does mitigate some but not
   all of the buffer overruns above)
 * In udp_invert_portmap() we incorrectly compute the reverse port
   translation for return packets

Correct all these by using a new NUM_PORTS defined explicitly for this
purpose.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
3ede07aac9 Treat port numbers as unsigned
Port numbers are unsigned values, but we're storing them in (signed) int
variables in some places.  This isn't actually harmful, because int is
large enough to hold the entire range of ports.  However in places we don't
want to use an in_port_t (usually to avoid overflow on the last iteration
of a loop) it makes more conceptual sense to use an unsigned int. This will
also avoid some problems with later cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
0d1886dca0 Pass entire port forwarding configuration substructure to conf_ports()
conf_ports() switches on the optname argument to select the target array
for several updates.  Now that all these maps are in a common structure, we
can simplify by just passing in a pointer to the whole struct port_fwd to
update.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
f5a31ee94c Don't use indirect remap functions for conf_ports()
Now that we've delayed initialization of the UDP specific "reverse" map
until udp_init(), the only difference between the various 'remap' functions
used in conf_ports() is which array they target.  So, simplify by open
coding the logic into conf_ports() with a pointer to the correct mapping
array.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
163dc5f188 Consolidate port forwarding configuration into a common structure
The configuration for how to forward ports in and out of the guest/ns is
divided between several different variables.  For each connect direction
and protocol we have a mode in the udp/tcp context structure, a bitmap
of which ports to forward also in the context structure and an array of
deltas to apply if the outward facing and inward facing port numbers are
different.  This last is a separate global variable, rather than being in
the context structure, for no particular reason.  UDP also requires an
additional array which has the reverse mapping used for return packets.

Consolidate these into a re-used substructure in the context structure.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
1128fa03fe Improve types and names for port forwarding configuration
enum conf_port_type is local to conf.c and is used to track the port
forwarding mode during configuration.  We don't keep it around in the
context structure, however the 'init_detect_ports' and 'ns_detect_ports'
fields in the context are based solely on this.  Rather than changing
encoding, just include the forwarding mode into the context structure.
Move the type definition to a new port_fwd.h, which is kind of trivial at
the moment but will have more stuff later.

While we're there, "conf_port_type" doesn't really convey that this enum is
describing how port forwarding is configured.  Rename it to port_fwd_mode.
The variables (now fields) of this type also have mildly confusing names
since it's not immediately obvious whether 'ns' and 'init' refer to the
source or destination of the packets.  Use "in" (host to guest / init to
ns) and "out" (guest to host / ns to init) instead.

This has the added bonus that we no longer have locals 'udp_init' and
'tcp_init' which shadow global functions.

In addition, add a typedef 'port_fwd_map' for a bitmap of each port number,
which is used in several places.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
Stefano Brivio
4a1b675278 conf, tcp, udp: Arrays for ports need 2^16 values, not 2^16-8
Reported by David but also by Coverity (CWE-119):

	In conf_ports: Out-of-bounds access to a buffer

...not in practice, because the allocation size is rounded up
anyway, but not nice either.

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-22 16:54:09 +02:00
David Gibson
ef6da15732 Allow --userns when pasta spawns a command
Currently --userns is only allowed when pasta is attaching to an existing
netns or PID, and is prohibited when creating a new netns by spawning a
command or shell.

With the new handling of userns, this check isn't neccessary.  I'm not sure
if there's any use case for --userns with a spawned command, but it's
strictly more flexible and requires zero extra code, so we might as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
David Gibson
eed17a47fe Handle userns isolation and dropping root at the same time
passt/pasta can interact with user namespaces in a number of ways:
   1) With --netns-only we'll remain in our original user namespace
   2) With --userns or a PID option to pasta we'll join either the given
      user namespace or that of the PID
   3) When pasta spawns a shell or command we'll start a new user namespace
      for the command and then join it
   4) With passt we'll create a new user namespace when we sandbox()
      ourself

However (3) and (4) turn out to have essentially the same effect.  In both
cases we create one new user namespace.  The spawned command starts there,
and passt/pasta itself will live there from sandbox() onwards.

Because of this, we can simplify user namespace handling by moving the
userns handling earlier, to the same point we drop root in the original
namespace.  Extend the drop_user() function to isolate_user() which does
both.

After switching UID and GID in the original userns, isolate_user() will
either join or create the userns we require.  When we spawn a command with
pasta_start_ns()/pasta_setup_ns() we no longer need to create a userns,
because we're already made one.  sandbox() likewise no longer needs to
create (or join) an userns because we're already in the one we need.

We no longer need c->pasta_userns_fd, since the fd is only used locally
in isolate_user().  Likewise we can replace c->netns_only with a local
in conf(), since it's not used outside there.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
David Gibson
fc1be3d5ab Clean up and rename conf_ns_open()
conf_ns_open() opens file descriptors for the namespaces pasta needs, but
it doesnt really have anything to do with configuration any more.  For
better clarity, move it to pasta.c and rename it pasta_open_ns().  This
makes the symmetry between it and pasta_start_ns() more clear, since these
represent the two basic ways that pasta can operate, either attaching to
an existing namespace/process or spawning a new one.

Since its no longer validating options, the errors it could return
shouldn't cause a usage message.  Just exit directly with an error instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
David Gibson
e8b19a4bd2 Consolidate validation of pasta namespace options
There are a number of different ways to specify namespaces for pasta to
use.  Some combinations are valid and some are not.  Currently validation
for these is spread across several places: conf_ns_pid() validates PID
options specifically.  Near its callsite in conf() several other checks
are made. Some additional checks are made in conf_ns_open() and finally
theres a check just before the call to pasta_start_ns().

This is quite hard to follow.  Make it easier by putting all the validation
logic together in a new conf_pasta_ns() function, which subsumes
conf_ns_pid().  This reveals that some of the checks were redundant with
each other, so remove those.

For good measure, rename conf_netns() to conf_netns_opt() to make it
clearer its handling just the --netns option specifically, not overall
configuration of the netns.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
David Gibson
d72a1e7bb9 Move self-isolation code into a separate file
passt/pasta contains a number of routines designed to isolate passt from
the rest of the system for security.  These are spread through util.c and
passt.c.  Move them together into a new isolation.c file.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
David Gibson
5d3b50c100 Safer handling if we can't open /proc/self/uid_map
passt is allowed to run as "root" (UID 0) in a user namespace, but notas
real root in the init namespace.  We read /proc/self/uid_map to determine
if we're in the init namespace or not.

If we're unable to open /proc/self/uid_map we assume we're ok and
continue running as UID 0.  This seems unwise.  The only instances I
can think of where uid_map won't be available are if the host kernel
doesn't support namespaces, or /proc is not mounted.  In neither case
is it safe to assume we're "not really" root and continue (although in
practice we'd likely fail for other reasons pretty soon anyway).

Therefore, fail with an error in this case, instead of carrying on.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
David Gibson
80d7012b09 Consolidate determination of UID/GID to run as
Currently the logic to work out what UID and GID we will run as is spread
across conf().  If --runas is specified it's handled in conf_runas(),
otherwise it's handled by check_root(), which depends on initialization of
the uid and gid variables by either conf() itself or conf_runas().

Make this clearer by putting all the UID and GID logic into a single
conf_ugid() function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
David Gibson
10c6347747 Split checking for root from dropping root privilege
check_root() both checks to see if we are root (in the init namespace),
and if we are drops to an unprivileged user.  To make future cleanups
simpler, split the checking for root (now in check_root()) from the actual
dropping of privilege (now in drop_root()).

Note that this does slightly alter semantics.  Previously we would only
setuid() if we were originally root (in the init namespace).  Now we will
always setuid() and setgid(), though it won't actually change anything if
we weren't privileged to begin with.  This also means that we will now
always attempt to switch to the user specified with --runas, even if we
aren't (init namespace) root to begin with.  Obviously this will fail with
an error if we weren't privileged to start with.  --help and the man page
are updated accordingly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
David Gibson
7330ae3abf Don't store UID & GID persistently in the context structure
c->uid and c->gid are first set in conf(), and last used in check_root()
itself called from conf().  Therefore these don't need to be fields in the
long lived context structure and can instead be locals in conf().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
Stefano Brivio
bac7dfebe4 conf: Fix getopt_long() optstring for current semantics of -D, -S, -p
Declaring them as required_argument in the longopts array specifies
validation, but doesn't affect how optind is increased after parsing
their values.

Currently, passing one of these options as last option causes pasta
to handle their own values as path to a binary to execute.

Fixes: aae2a9bbf7 ("conf: Use "-D none" and "-S none" instead of missing empty option arguments")
Fixes: bf95322fc1 ("conf: Make the argument to --pcap option mandatory")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-02 17:07:45 +02:00
David Gibson
1392bc5ca0 Allow pasta to take a command to execute
When not given an existing PID or network namspace to attach to, pasta
spawns a shell.  Most commands which can spawn a shell in an altered
environment can also run other commands in that same environment, which can
be useful in automation.

Allow pasta to do the same thing; it can be given an arbitrary command to
run in the network and user namespace which pasta creates.  If neither a
command nor an existing PID or netns to attach to is given, continue to
spawn a default shell, as before.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-08-30 19:43:31 +02:00
David Gibson
c188736cd8 Use explicit --netns option rather than multiplexing with PID
When attaching to an existing namespace, pasta can take a PID or the name
or path of a network namespace as a non-option parameter.  We disambiguate
based on what the parameter looks like.  Make this more explicit by using
a --netns option for explicitly giving the path or name, and treating a
non-option argument always as a PID.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Fix typo in man page]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-08-30 19:43:31 +02:00
David Gibson
9e0dbc8948 More deterministic detection of whether argument is a PID, PATH or NAME
pasta takes as its only non-option argument either a PID to attach to the
namespaces of, a PATH to a network namespace or a NAME of a network
namespace (relative to /run/netns).  Currently to determine which it is
we try all 3 in that order, and if anything goes wrong we move onto the
next.

This has the potential to cause very confusing failure modes.  e.g. if the
argument is intended to be a network namespace name, but a (non-namespace)
file of the same name exists in the current directory.

Make behaviour more predictable by choosing how to treat the argument based
only on the argument's contents, not anything else on the system:
  - If it's a decimal integer treat it as a PID
  - Otherwise, if it has no '/' characters, treat it as a netns name
    (ip-netns doesn't allow '/' in netns names)
  - Otherwise, treat it as a netns path

If you want to open a persistent netns in the current directory, you can
use './netns'.

This also allows us to split the parsing of the PID|PATH|NAME option from
the actual opening of the namespaces.  In turn that allows us to put the
opening of existing namespaces next to the opening of new namespaces in
pasta_start_ns.  That makes the logical flow easier to follow and will
enable later cleanups.

Caveats:
 - The separation of functions mean we will always generate the basename
   and dirname for the netns_quit system, even when using PID namespaces.
   This is pointless, since the netns_quit system doesn't work for non
   persistent namespaces, but is harmless.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-08-30 19:43:31 +02:00
David Gibson
70389d3640 Move ENOENT error message into conf_ns_opt()
After calling conf_ns_opt() we check for -ENOENT and print an error
message, but conf_ns_opt() prints messages for other errors itself.  For
consistency move the ENOENT message into conf_ns_opt() as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-08-30 19:43:31 +02:00
David Gibson
8de488892f Remove --nsrun-dir option
pasta can identify a netns as a "name", which is to say a path relative to
(usually) /run/netns, which is the place that ip(8) creates persistent
network namespaces.  Alternatively a full path to a netns can be given.

The --nsrun-dir option allows the user to change the standard path where
netns names are resolved.  However, there's no real point to this, if the
user wants to override the location of the netns, they can just as easily
use the full path to specify the netns.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-08-30 19:43:31 +02:00
David Gibson
aae2a9bbf7 conf: Use "-D none" and "-S none" instead of missing empty option arguments
Both the -D (--dns) and -S (--search) options take an optional argument.
If the argument is omitted the option is disabled entirely.  However,
handling the optional argument requires some ugly special case handling if
it's the last option on the command line, and has potential ambiguity with
non-option arguments used with pasta.  It can also make it more confusing
to read command lines.

Simplify the logic here by replacing the non-argument versions with an
explicit "-D none" or "-S none".

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Reworked logic to exclude redundant/conflicting options]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-08-30 19:42:52 +02:00
David Gibson
bf95322fc1 conf: Make the argument to --pcap option mandatory
The --pcap or -p option can be used with or without an argument.  If given,
the argument gives the name of the file to save a packet trace to.  If
omitted, we generate a default name in /tmp.

Generating the default name isn't particularly useful though, since making
a suitable name can easily be done by the caller.  Remove this feature.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-08-30 19:17:57 +02:00
David Gibson
60ffc5b6cb Don't unnecessarily avoid CLOEXEC flags
There are several places in the passt code where we have lint overrides
because we're not adding CLOEXEC flags to open or other operations.
Comments suggest this is because it's before we fork() into the background
but we'll need those file descriptors after we're in the background.

However, as the name suggests CLOEXEC closes on exec(), not on fork().  The
only place we exec() is either super early invoke the avx2 version of the
binary, or when we start a shell in pasta mode, which certainly *doesn't*
require the fds in question.

Add the CLOEXEC flag in those places, and remove the lint overrides.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-08-24 18:01:48 +02:00
David Gibson
cc287af173 conf: Fix incorrect bounds checking for sock_path parameter
Looks like a copy-paste error where we're checking against the size of the
pcap field, rather than the sock_path field.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-08-24 18:01:48 +02:00
David Gibson
16f5586bb8 Make substructures for IPv4 and IPv6 specific context information
The context structure contains a batch of fields specific to IPv4 and to
IPv6 connectivity.  Split those out into a sub-structure.

This allows the conf_ip4() and conf_ip6() functions, which take the
entire context but touch very little of it, to be given more specific
parameters, making it clearer what it affects without stepping through the
code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 22:14:07 +02:00
David Gibson
5e12d23acb Separate IPv4 and IPv6 configuration
After recent changes, conf_ip() now has essentially entirely disjoint paths
for IPv4 and IPv6 configuration.  So, it's cleaner to split them out into
different functions conf_ip4() and conf_ip6().

Splitting these out also lets us make the interface a bit nicer, having
them return success or failure directly, rather than manipulating c->v4
and c->v6 to indicate success/failure of the two versions.

Since these functions may also initialize the interface index for each
protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing
tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since
a 0 interface index is never valid).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Whitespace fixes]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-30 22:12:50 +02:00
David Gibson
c984ee5afd Clarify semantics of c->v4 and c->v6 variables
The v4 and v6 fields of the context structure can be confusing, because
they change meaning part way through the code:  Before conf_ip(), they are
booleans which indicate whether the -4 or -6 options have been given.
After conf_ip() they are DISABLED|ENABLED|PROBE enums which indicate
whether the IP version is available (which means both that it was allowed
on the command line and we were able to configure it).  The PROBE variant
of the enum is only used locally within conf_ip() and since recent changes
there it no longer has a real purpose different from ENABLED.

Simplify this all by making the context fields always just a boolean
indicating the availability of the IP version.  They both default to 1, but
can be set to 0 by either command line options or configuration failures.
We use some local variables in conf() for tracking the state of the command
line options on their own.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Minor coding style fix in conf.c]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-30 22:09:38 +02:00
David Gibson
4bc883aeab Move passt mac_guest init to be more symmetric with pasta
In pasta mode, the guest's MAC address is set up in pasta_ns_cobf() called
from tap_sock_tun_init().  If we have a guest MAC configured with
--ns-mac-addr, this will set the given MAC on the kernel tuntap device, or
if we haven't configured one it will update our record of the guest MAC to
the kernel assigned one from the device.

For passt, we don't initially know the guest's MAC until we receive packets
from it, so we have to initially use a broadcast address.  This is - oddly
- set up in an entirely different place, in conf_ip() conditional on the
mode.

Move it to the logically matching place for passt - tap_sock_unix_init().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 21:57:50 +02:00
David Gibson
3f19072640 Initialize host side MAC when in IPv6 only mode
When sending packets to the guest we need a source MAC address, which we
currently take from the host side interface we're using (though it's
basically arbitrary).  However if not given on the command line this MAC
is initialized in an IPv4 specific path, and will end up as
00:00:00:00:00:00 when running "passt 6".  The MAC address is also used
for IPv6 packets, though.

Interestingly, we largely seem to get away with using an all-zero MAC, but
it's probably not a good idea.  Make the IPv6 path pick the MAC address
from its interface if the IPv4 path hasn't already done so.

While we're there, use the existing MAC_IS_ZERO macro to make the code a
little clearer.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 21:57:50 +02:00
David Gibson
06abfcf6d9 Separately locate external interfaces for IPv4 and IPv6
Now that the back end allows passt/pasta to use different external
interfaces for IPv4 and IPv6, use that to do the right thing in the case
that the host has IPv4 and IPv6 connectivity via different interfaces.
If the user hasn't explicitly chosen an interface, separately search for
a suitable external interface for each protocol.

As a bonus, this substantially simplifies the external interface probe.  It
also eliminates a subtle confusing case where in some circumstances we
would pick the first interface in interface index order, and sometimes in
order of routes returned from netlink.  On some network configurations that
could cause tests to fail, because the logic in the tests was subtly
different (it always used route order).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 21:57:50 +02:00
David Gibson
4b2e018d70 Allow different external interfaces for IPv4 and IPv6 connectivity
It's quite plausible for a host to have both IPv4 and IPv6 connectivity,
but only via different interfaces.  For example, this will happen in the
case that IPv6 connectivity is via a tunnel (e.g. 6in4 or 6rd).  It would
also happen in the case that IPv4 access is via a tunnel on an otherwise
IPv6 only local network, which is a setup that might become more common in
the post IPv4 address exhaustion world.

In turns out there's no real need for passt/pasta to get its IPv4 and IPv6
connectivity via the same interface, so we can handle this situation fairly
easily.  Change the core to allow eparate external interfaces for IPv4 and
IPv6.  We don't actually set these separately for now.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 21:50:41 +02:00
Stefano Brivio
b86cd006d3 conf: Reset range endpoints after parsing one excluded port specifier
I forgot to reset the range endpoints after parsing an item of the
comma-separated list in commit 220759efb8 ("conf: Allow to specify
ranges and ports excluded from given ranges") -- fix that.

Fixes: 220759efb8 ("conf: Allow to specify ranges and ports excluded from given ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-14 16:35:57 +02:00
Stefano Brivio
220759efb8 conf: Allow to specify ranges and ports excluded from given ranges
This is useful in environments where we want to forward a large
number of ports, or all non-ephemeral ones, and some other service
running on the host needs a few selected ports.

I'm using ~ as prefix for the specification of excluded ranges and
ports to avoid the need for explicit command line quoting.

Ranges and ports can be excluded from given ranges by adding them
in the comma-separated list, prefixed by ~. Some quick examples:

  -t 5000-6000,~5555: forward ports 5000 to 6000, but not 5555

  -t ~20000-20010: forward all non-ephemeral, allowed ports, except
     for ports 20000 to 20010

...more details in usage message and man page.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-14 01:36:05 +02:00
Stefano Brivio
4de37151c9 conf: Fix initialisation of IPv6 unicast and link-local addresses
In commit 675174d4ba ("conf, tap: Split netlink and pasta
functions, allow interface configuration"), I broke the initial
setting of the observed IPv6 addresses in two ways:

- the size copied from the configured addresses corresponds to an
  IPv4 address, not to an IPv6 address

- the observed link-local address is initialised to the configured
  unicast address, not the link-local one

If we haven't seen the guest using some type of addresses yet, we
should default to the configured values, hence these initial
settings: fix both.

This resulted in UDP flows to the guest from a unique local address
on the network not working before the guest shows passt a valid
address itself, as reported by Alona.

Reported-by: Alona Paz <alkaplan@redhat.com>
Link: https://bugs.passt.top/show_bug.cgi?id=16
Fixes: 675174d4ba ("conf, tap: Split netlink and pasta functions, allow interface configuration")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-14 01:36:05 +02:00
David Gibson
aa8603e6a9 Handle the case of a DNS server on localhost
By default, passt detects the nameserver used by the host system by reading
/etc/resolv.conf, and advertises that to the guest via DHCP.  However this
breaks down if the host's nameserver is local (on 127.0.0.1 or ::1);
connecting to localhost on the guest won't reach the host's nameserver.

Using a local nameserver is a reasonably common case when using dnsmasq
or similar to merge name resolution on a home network with name resolution
from an organization-private VPN.

We already have the gateway mapping support to allow reaching host-local
services from the guest via the address of the default gateway.  Add code
to detect the case of a local DNS server and use the gateway mapping to
advertise it usefully to the guest.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-14 01:32:42 +02:00
David Gibson
c589917e71 Parse resolv.conf with new lineread implementation
Switch the resolv.conf parsing in conf.c to use the new lineread
implementation.  This means that it can now handle a resolv.conf file which
contains blank lines.

There are quite a few other fragilities with the resolv.conf parsing, but
that's out of scope for this patch.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-06 08:10:55 +02:00
Stefano Brivio
465712721e conf: In conf_runas(), on static builds, group information is also unused
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-06-18 09:06:00 +02:00
Stefano Brivio
67103ea556 conf: Fix one Coverity CID 258163 warning, work around another one
In conf_runas(), Coverity reports that we might dereference uid and
gid despite possibly being NULL (CWE-476) because of the check after
the first sscanf(). They can't be NULL, but I actually wanted to
check that UID and GID are non-zero (the user could otherwise pass
--runas root:root and defy the whole mechanism).

Later on, we have the same type of warning for 'gr': it's compared
against NULL, so it might be NULL, which is actually the case: but
in that case, we don't dereference it, because we'll return -ENOENT
right away. Rewrite the clause to silence the warning.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-05-20 10:50:35 +02:00
Stefano Brivio
a951e0b9ef conf: Add --runas option, changing to given UID and GID if started as root
On some systems, user and group "nobody" might not be available. The
new --runas option allows to override the default "nobody" choice if
started as root.

Now that we allow this, drop the initgroups() call that was used to
add any additional groups for the given user, as that might now
grant unnecessarily broad permissions. For instance, several
distributions have a "kvm" group to allow regular user access to
/dev/kvm, and we don't need that in passt or pasta.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-05-19 16:27:20 +02:00
Stefano Brivio
3c6ae62510 conf, tcp, udp: Allow address specification for forwarded ports
This feature is available in slirp4netns but was missing in passt and
pasta.

Given that we don't do dynamic memory allocation, we need to bind
sockets while parsing port configuration. This means we need to
process all other options first, as they might affect addressing and
IP version support. It also implies a minor rework of how TCP and UDP
implementations bind sockets.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-05-01 07:19:05 +02:00
Stefano Brivio
ceddcac74a conf, tap: False "Buffer not null terminated" positives, CWE-170
Those strings are actually guaranteed to be NULL-terminated. Reported
by Coverity.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-04-07 11:44:35 +02:00
Stefano Brivio
e46f67f152 conf: False "Assign instead of compare" positive, CWE-481
This really just needs to be an assignment before line_read() --
turn it into a for loop. Reported by Coverity.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-04-07 11:44:35 +02:00
Stefano Brivio
0786b2e60a conf, packet: Operands don't affect result, CWE-569
Reported by Coverity.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-04-07 11:44:35 +02:00
Stefano Brivio
62c3edd957 treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checks
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
48582bf47f treewide: Mark constant references as const
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
d2e40bb8d9 conf, util, tap: Implement --trace option for extra verbose logging
--debug can be a bit too noisy, especially as single packets or
socket messages are logged: implement a new option, --trace,
implying --debug, that enables all debug messages.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-25 13:21:13 +01:00
Stefano Brivio
e5bd8dbb24 conf, ndp: Disable router advertisements on --config-net
If we statically configure a default route, and also advertise it for
SLAAC, the kernel will try moments later to add the same route:

  ICMPv6: RA: ndisc_router_discovery failed to add default route

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-23 13:21:52 +01:00
Stefano Brivio
5e0c75d609 passt: Drop PASST_LEGACY_NO_OPTIONS sections
...nobody uses those builds anymore.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-22 18:42:51 +01:00
Stefano Brivio
745a9ba428 pasta: By default, quit if filesystem-bound net namespace goes away
This should be convenient for users managing filesystem-bound network
namespaces: monitor the base directory of the namespace and exit if
the namespace given as PATH or NAME target is deleted. We can't add
an inotify watch directly on the namespace directory, that won't work
with nsfs.

Add an option to disable this behaviour, --no-netns-quit.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
89678c5157 conf, udp: Introduce basic DNS forwarding
For compatibility with libslirp/slirp4netns users: introduce a
mechanism to map, in the UDP routines, an address facing guest or
namespace to the first IPv4 or IPv6 address resulting from
configuration as resolver. This can be enabled with the new
--dns-forward option.

This implies that sourcing and using DNS addresses and search lists,
passed via command line or read from /etc/resolv.conf, is not bound
anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just
want to use addresses from /etc/resolv.conf as mapping target, while
not passing DNS options via DHCP.

Reflect this in all the involved code paths by differentiating
DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new
options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns,
--no-dhcp-search for passt.

This should be the last bit to enable substantial compatibility
between slirp4netns.sh and slirp4netns(1): pass the --dns-forward
option from the script too.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
01ae772dcc conf: Given IPv4 address and no netmask, assign RFC 790-style classes
Provide a sane default, instead of /0, if an address is given, and it
doesn't correspond to any host address we could find via netlink.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
eb18f862cb conf: Don't print configuration on --quiet
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
ce4e7b4d5d Makefile, conf, passt: Drop passt4netns references, explicit argc check
Nobody currently calls this as passt4netns, that was the name I used
before 'pasta', drop any reference before it's too late.

While at it, explicitly check that argc is bigger than or equal to
one, just as a defensive measure: argv[0] being NULL is not an issue
anyway.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
0515adceaa passt, pasta: Namespace-based sandboxing, defer seccomp policy application
To reach (at least) a conceptually equivalent security level as
implemented by --enable-sandbox in slirp4netns, we need to create a
new mount namespace and pivot_root() into a new (empty) mountpoint, so
that passt and pasta can't access any filesystem resource after
initialisation.

While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.

With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies after the
initialisation phase, before the main loop, that's where we expect bad
things to happen, potentially. This way, we get back to 22 allowed
syscalls for passt and 34 for pasta, on x86_64.

While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.

We have to open all the file handles we'll ever need before
sandboxing:

- the packet capture file can only be opened once, drop instance
  numbers from the default path and use the (pre-sandbox) PID instead

- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection
  of bound ports in pasta mode, are now opened only once, before
  sandboxing, and their handles are stored in the execution context

- the UNIX domain socket for passt is also bound only once, before
  sandboxing: to reject clients after the first one, instead of
  closing the listening socket, keep it open, accept and immediately
  discard new connection if we already have a valid one

Clarify the (unchanged) behaviour for --netns-only in the man page.

To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.

For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.

We can now drop most of the pasta_child_handler() implementation,
that took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.

Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
292c185553 passt: Address new clang-tidy warnings from LLVM 13.0.1
clang-tidy from LLVM 13.0.1 reports some new warnings from these
checkers:

- altera-unroll-loops, altera-id-dependent-backward-branch: ignore
  for the moment being, add a TODO item

- bugprone-easily-swappable-parameters: ignore, nothing to do about
  those

- readability-function-cognitive-complexity: ignore for the moment
  being, add a TODO item

- altera-struct-pack-align: ignore, alignment is forced in protocol
  headers

- concurrency-mt-unsafe: ignore for the moment being, add a TODO
  item

Fix bugprone-implicit-widening-of-multiplication-result warnings,
though, that's doable and they seem to make sense.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-30 02:59:12 +01:00
Stefano Brivio
ffc3183ac1 conf: Fix support for --stderr as short option (-e)
I forgot --stderr could also be -e, fix handling.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-27 16:44:05 +01:00
Stefano Brivio
33b1bdd079 seccomp: Add a number of alternate and per-arch syscalls
Depending on the C library, but not necessarily in all the
functions we use, statx() might be used instead of stat(),
getdents() instead of getdents64(), readlinkat() instead of
readlink(), openat() instead of open().

On aarch64, it's clone() and not fork(), and dup3() instead of
dup2() -- just allow the existing alternative instead of dealing
with per-arch selections.

Since glibc commit 9a7565403758 ("posix: Consolidate fork
implementation"), we need to allow set_robust_list() for
fork()/clone(), even in a single-threaded context.

On some architectures, epoll_pwait() is provided instead of
epoll_wait(), but never both. Same with newfstat() and
fstat(), sigreturn() and rt_sigreturn(), getdents64() and
getdents(), readlink() and readlinkat(), unlink() and
unlinkat(), whereas pipe() might not be available, but
pipe2() always is, exclusively or not.

Seen on Fedora 34: newfstatat() is used on top of fstat().

syslog() is an actual system call on some glibc/arch combinations,
instead of a connect()/send() implementation.

On ppc64 and ppc64le, _llseek(), recv(), send() and getuid()
are used. For ppc64 only: ugetrlimit() for the getrlimit()
implementation, plus sigreturn() and fcntl64().

On s390x, additionally, we need to allow socketcall() (on top
of socket()), and sigreturn() also for passt (not just for
pasta).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 16:30:59 +01:00
Stefano Brivio
4c7304db85 conf, pasta: Explicitly pass CLONE_{NEWUSER,NEWNET} to setns()
Only allow the intended types of namespaces to be joined via setns()
as a defensive measure.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 16:30:59 +01:00
Stefano Brivio
b93c2c1713 passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitions
This is the only remaining Linux-specific include -- drop it to avoid
clang-tidy warnings and to make code more portable.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 07:57:09 +01:00
Stefano Brivio
627e18fa8a passt: Add cppcheck target, test, and address resulting warnings
...mostly false positives, but a number of very relevant ones too,
in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 09:41:13 +02:00
Stefano Brivio
dd942eaa48 passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers
Unions and structs, you all have names now.

Take the chance to enable bugprone-reserved-identifier,
cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy.

Provide a ffsl() weak declaration using gcc built-in.

Start reordering includes, but that's not enough for the
llvm-include-order checker yet.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 04:26:08 +02:00
Stefano Brivio
12cfa6444c passt: Add clang-tidy Makefile target and test, take care of warnings
Most are just about style and form, but a few were actually
serious mistakes (NDP-related).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:34:22 +02:00
Stefano Brivio
7d24803fb3 conf: Always pass an empty buffer to line_read() in get_dns()
Given that get_dns() touches the buffer read by line_read(), we
can't optimise that by passing the existing buffer.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Stefano Brivio
b0b77118fe passt: Address warnings from Clang's scan-build
All false positives so far.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Stefano Brivio
17600d6d6e netlink, conf: Actually get prefix/mask length
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-19 09:01:27 +02:00
Stefano Brivio
2c7d1ce088 passt: Static builds: don't redefine __vsyslog(), skip getpwnam() and initgroups()
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-16 16:53:40 +02:00
Stefano Brivio
1fd0c9b0e1 util, pasta: Don't read() and lseek() every single line in read_line()
...periodically checking bound ports becomes quite expensive
otherwise.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-16 00:49:33 +02:00
Stefano Brivio
bf63832207 conf, pasta: Create a new namespace also if probing netns options failed
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-15 17:07:16 +02:00
Stefano Brivio
3c6d24dd30 netlink, pasta: Configure MTU of tap interface on --config-net
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:20:34 +02:00
Stefano Brivio
54a19002df conf: Add -P, --pid, to specify a file where own PID is written to
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:20:34 +02:00
Stefano Brivio
1cbd2c8c6b conf: Reset netns_only flag after probing
...if we check whether an option might be a namespace specification,
and it turns out not to be (e.g. with --pcap), we might set
netns_only, but we don't reset it back to 0 if it wasn't set.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:20:34 +02:00
Stefano Brivio
f45891cf26 conf, tcp, udp: Add --no-map-gw to disable mapping gateway address to host
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:19:52 +02:00
Stefano Brivio
fc93f97774 conf: Reset errno before checking port specifier with strtol(3)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:18:50 +02:00
Stefano Brivio
32d07f5e59 passt, pasta: Completely avoid dynamic memory allocation
Replace libc functions that might dynamically allocate memory with own
implementations or wrappers.

Drop brk(2) from list of allowed syscalls in seccomp profile.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:16:03 +02:00
Stefano Brivio
66d5930ec7 passt, pasta: Add seccomp support
List of allowed syscalls comes from comments in the form:
	#syscalls <list>

for syscalls needed both in passt and pasta mode, and:
	#syscalls:pasta <list>
	#syscalls:passt <list>

for syscalls specifically needed in pasta or passt mode only.

seccomp.sh builds a list of BPF statements from those comments,
prefixed by a binary search tree to keep lookup fast.

While at it, clean up a bit the Makefile using wildcards.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:15:46 +02:00
Stefano Brivio
675174d4ba conf, tap: Split netlink and pasta functions, allow interface configuration
Move netlink routines to their own file, and use netlink to configure
or fetch all the information we need, except for the TUNSETIFF ioctl.

Move pasta-specific functions to their own file as well, add
parameters and calls to configure the tap interface in the namespace.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:15:12 +02:00