During testing it is sometimes useful to force traffic which would
normally be forwarded by socket splicing through the tap interface.
In this commit, we add a command switch enabling such functionality
for inbound local traffic.
For outbound local traffic this is much trickier, if even possible,
so leave that for a later commit.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Add virtio and vhost-user functions to connect with QEMU.
$ ./passt --vhost-user
and
# qemu-system-x86_64 ... -m 4G \
-object memory-backend-memfd,id=memfd0,share=on,size=4G \
-numa node,memdev=memfd0 \
-chardev socket,id=chr0,path=/tmp/passt_1.socket \
-netdev vhost-user,id=netdev0,chardev=chr0 \
-device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \
...
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: as suggested by lvivier, include <netinet/if_ether.h>
before including <linux/if_ether.h> as C libraries such as musl
define __UAPI_DEF_ETHHDR in <netinet/if_ether.h> if they already have
a definition of struct ethhdr]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
There are setups where no host interface is available or configured
at all, intentionally or not, temporarily or not, but users expect
(Podman) containers to run in any case as they did with slirp4netns,
and we're now getting reports that we broke such setups at a rather
alarming rate.
To this end, if we don't find any usable host interface, instead of
exiting:
- for IPv4, use 169.254.2.1 as guest/container address and 169.254.2.2
as default gateway
- for IPv6, don't assign any address (forcibly disable DHCPv6), and
use the *first* link-local address we observe to represent the
guest/container. Advertise fe80::1 as default gateway
- use 'tap0' as default interface name for pasta
Change ifi4 and ifi6 in struct ctx to int and accept a special -1
value meaning that no host interface was selected, but the IP family
is enabled. The fact that the kernel uses unsigned int values for
those is not an issue as 1. one can't create so many interfaces
anyway and 2. we otherwise handle those values transparently.
Fix a botched conditional in conf_print() to actually skip printing
DHCPv6 information if DHCPv6 is disabled (and skip printing NDP
information if NDP is disabled).
Link: https://github.com/containers/podman/issues/24614
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Fixes: 90e83d50a9bd ("Don't take "our" MAC address from the host")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
It's not true that there's no mapping by default: there's no mapping
in the --map-guest-addr sense, by default, but in that case
the default --map-host-loopback behaviour prevails.
While at it, fix a typo.
Fixes: 57b7bd2a48a1 ("fwd, conf: Allow NAT of the guest's assigned address")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
In pasta mode, where addressing permits we "splice" connections, forwarding
directly from host socket to guest/container socket without any L2 or L3
processing. This gives us a very large performance improvement when it's
possible.
Since the traffic is from a local socket within the guest, it will go over
the guest's 'lo' interface, and accordingly we set the guest side address
to be the loopback address. However this has a surprising side effect:
sometimes guests will run services that are only supposed to be used within
the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's
forwarding exposes those services to the host, which isn't generally what
we want.
Correct this by instead forwarding inbound "splice" flows to the guest's
external address.
Link: https://github.com/containers/podman/issues/24045
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This section didn't mention the effect of the --map-host-loopback option
which now alters this behaviour. Update it accordingly.
It used "local addresses" to mean specifically 127.0.0.0/8 and ::1.
However, "local" could also refer to link-local addresses or to addresses
of any scope which happen to be configured on the host. Use "loopback
address" to be more precise about this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The description of this option says that it's deprecated, but unlike
--no-copy-addrs and --no-copy-routes it doesn't have a clear label. Add
one to make it easier to spot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When redirecting DNS queries with the --dns-forward option, passt/pasta
needs a host side nameserver to redirect the queries to. This is
controlled by the c->ip[46].dns_host variables. These are set to the
first nameserver listed in the host's /etc/resolv.conf, and there isn't
currently a way to override it from the command line.
Prior to 0b25cac9 ("conf: Treat --dns addresses as guest visible
addresses") it was possible to alter this with the -D/--dns option.
However, doing so was confusing and had some nonsensical edge cases because
-D generally takes guest side addresses, rather than host side addresses.
Add a new --dns-host option to restore this functionality in a more
sensible way.
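
As a rough illustration of the default described above (taking the
first nameserver from resolv.conf-style text), a minimal sketch
follows. The helper name first_nameserver() is hypothetical and the
parsing is deliberately simplified: it is not the actual passt code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first "nameserver" address found in
 * resolv.conf-style text into out. Returns 0 on success, -1 if there
 * is no usable entry.
 */
int first_nameserver(const char *text, char *out, size_t len)
{
	const char *line = text;

	while (line && *line) {
		char addr[64];

		/* Keyword at start of line, followed by blank, then address */
		if (!strncmp(line, "nameserver", 10) &&
		    (line[10] == ' ' || line[10] == '\t') &&
		    sscanf(line + 10, "%63s", addr) == 1 &&
		    strlen(addr) < len) {
			strcpy(out, addr);
			return 0;
		}

		/* Move on to the next line, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return -1;
}
```

An option such as --dns-host would simply take precedence over the
value found this way.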
Link: https://bugs.passt.top/show_bug.cgi?id=102
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In a couple of recent reports, we've seen that it can be useful for pasta
to forward ports from addresses which are not currently configured on the
host, but might be in future. That can be done with the sysctl
net.ipv4.ip_nonlocal_bind, but that does require CAP_NET_ADMIN to set in
the first place. We can allow the same thing on a per-socket basis with
the IP_FREEBIND (or IPV6_FREEBIND) socket option.
Add a --freebind command line argument to enable this socket option on
all listening sockets.
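
For reference, a minimal sketch of what enabling this socket option
looks like for an IPv4 TCP listening socket. The function name
freebind_listen() is an assumption made for this example, and the
sketch covers IPv4 only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on addr:port even if addr is not (yet)
 * configured on any local interface. Returns a socket, -1 on failure.
 */
int freebind_listen(const char *addr, unsigned short port)
{
	struct sockaddr_in sa;
	int one = 1, fd;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);
	if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* IP_FREEBIND must be set before bind() to have any effect */
	if (setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof(one)) ||
	    bind(fd, (struct sockaddr *)&sa, sizeof(sa)) ||
	    listen(fd, 16)) {
		close(fd);
		return -1;
	}

	return fd;
}
```

Unlike the sysctl, this needs no special capability and affects only
the socket it is set on.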
Link: https://bugs.passt.top/show_bug.cgi?id=101
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The guest is usually assigned one of the host's IP addresses. That means
it can't access the host itself via its usual address. The
--map-host-loopback option (enabled by default with the gateway address)
allows the guest to contact the host. However, connections forwarded this
way appear on the host to have originated from the loopback interface,
which isn't always desirable.
Add a new --map-guest-addr option, which acts similarly but forwarded
connections will go to the host's external address, instead of loopback.
If '-a' is used, so the guest's address is not the same as the host's, this
will instead forward to whatever host-visible site is shadowed by the
guest's assigned address.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Because the host and guest share the same IP address with passt/pasta, it's
not possible for the guest to directly address the host. Therefore we
allow packets from the guest going to a special "NAT to host" address to be
redirected to the host, appearing there as though they have both source and
destination address of loopback.
Currently that special address is always the address of the default
gateway (or none). That can be a problem if we want that gateway to be
addressable by the guest. Therefore, allow the special "NAT to host"
address to be overridden on the command line with a new --map-host-loopback
option.
In order to exercise and test it, update the passt_in_ns and perf
tests to use this option and give different mapping addresses for the
two layers of the environment.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Although it's not 100% explicit in the man page, addresses given to
the --dns option are intended to be addresses as seen by the guest.
This differs from addresses taken from the host's /etc/resolv.conf,
which must be translated to guest accessible versions in some cases.
Our implementation is currently inconsistent on this: when using
--dns-forward, you must usually also give --dns with the matching address,
which is meaningful only in the guest's address view. However if you give
--dns with a loopback address, it will be translated like a host view
address.
Move the remapping logic for DNS addresses out of add_dns4() and add_dns6()
into add_dns_resolv() so that it is only applied for host nameserver
addresses, not for nameservers given explicitly with --dns.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Paul reports that setting IPv4 address and gateway manually, using
--address and --gateway, causes pasta to fail inserting IPv6 routes
in a setup where multiple, inter-dependent IPv6 routes are present
on the host.
That's because, currently, any -g option implies --no-copy-routes
altogether, and any -a implies --no-copy-addrs.
Limit this implication to the matching IP version, instead, by having
two copies of no_copy_routes and no_copy_addrs in the context
structure, separately for IPv4 and IPv6.
While at it, change them to 'bool': we had them as 'int' because
getopt_long() used to set them directly, but it hasn't been the case
for a while already.
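
The per-version split can be sketched as follows. The structure and
function names here are hypothetical and much simpler than the real
context structure: the point is only that the IP version of the
argument selects which flag is set.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

struct copy_flags {		/* hypothetical, for illustration only */
	bool no_copy_routes4, no_copy_routes6;
	bool no_copy_addrs4, no_copy_addrs6;
};

/* Record a -g (is_gw true) or -a (is_gw false) argument: only the flag
 * for the matching IP version is set. Returns 0, or -1 if arg is not a
 * valid address.
 */
int note_manual_option(struct copy_flags *f, bool is_gw, const char *arg)
{
	unsigned char buf[sizeof(struct in6_addr)];

	if (inet_pton(AF_INET, arg, buf) == 1) {
		if (is_gw)
			f->no_copy_routes4 = true;
		else
			f->no_copy_addrs4 = true;
		return 0;
	}

	if (inet_pton(AF_INET6, arg, buf) == 1) {
		if (is_gw)
			f->no_copy_routes6 = true;
		else
			f->no_copy_addrs6 = true;
		return 0;
	}

	return -1;
}
```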
Reported-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
passt/pasta has options to redirect DNS requests from the guest to a
different server address on the host side. Currently, however, only UDP
packets to port 53 are considered "DNS requests". This ignores DNS
requests over TCP - less common, but certainly possible. It also ignores
encrypted DNS requests on port 853.
Extend the DNS forwarding logic to handle both of those cases.
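
The extended match can be summarised by a predicate along these
lines. This is a sketch only: the name looks_like_dns() is made up
for this example, and the port is assumed to be in host byte order.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: does a flow with this L4 protocol and
 * destination port (host byte order) count as a DNS request?
 */
bool looks_like_dns(uint8_t proto, uint16_t dport)
{
	if (proto != IPPROTO_UDP && proto != IPPROTO_TCP)
		return false;

	/* 53: plain DNS, 853: encrypted DNS */
	return dport == 53 || dport == 853;
}
```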
Link: https://github.com/containers/podman/issues/23239
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The original behaviour of printing messages to standard error by
default when running from a non-interactive terminal was introduced
because the first KubeVirt integration draft used to start passt in
foreground and get messages via standard error.
For development purposes, the system logger was more convenient at
that point, and passt was running from interactive terminals only if
not started by the KubeVirt integration.
This behaviour was introduced by 84a62b79a2bc ("passt: Also log to
stderr, don't fork to background if not interactive").
Later, I added command-line options in 1e49d194d017 ("passt, pasta:
Introduce command-line options and port re-mapping") and accidentally
reversed this condition, which wasn't a problem as --stderr could
force printing to standard error anyway (and it was used by KubeVirt).
Nowadays, the KubeVirt integration uses a log file (requested via
libvirt configuration), and the same applies for Podman if one
actually needs to look at runtime logs. There are no use cases left,
as far as I know, where passt runs in foreground in non-interactive
terminals.
Seize the chance to reintroduce some sanity here. If we fork to
background, standard error is closed, so --stderr is useless in that
case.
If we run in foreground, there's no harm in printing messages to
standard error, and that accidentally became the default behaviour
anyway, so --stderr is not needed in that case.
It would be needed for non-interactive terminals, but there are no
use cases, and if there were, let's log to standard error anyway:
the user can always redirect standard error to /dev/null if needed.
Before we're up and running, we need to print to standard error anyway
if something happens, otherwise we can't report failure to start in
any kind of usage, stand-alone or in integrations.
So, make --stderr do nothing, and deprecate it.
While at it, drop a left-over comment about --foreground being the
default only for interactive terminals, because it's not the case
anymore.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
On multiple occasions, especially when passt(1) and pasta(1) are used
in integrations such as the one with Podman, the ability to override
earlier options on the command line with later ones would have been
convenient.
Recently, to debug a number of issues happening with Podman, I would
have liked to ask users to share a debug log by passing --debug as
additional option, but pasta refuses --quiet (always passed by Podman)
and --debug at the same time.
On top of this, Podman lets users specify other pasta options in its
containers.conf(5) file, as well as on the command line.
The options from the configuration files are appended together with
the ones from the command line, which makes it impossible for users to
override options from the configuration file, if duplicated options
are refused, unless Podman takes care of sorting them, which is
clearly not sustainable.
For --debug and --trace, somebody took care of this on Podman side at:
https://github.com/containers/common/pull/2052
but this doesn't fix the issue with other options, and we'll have
anyway older versions of Podman around, too.
I think there's some value in telling users about duplicated or
conflicting options, because that might reveal issues in integrations
or accidental misconfigurations, but by now I'm fairly convinced that
the downsides outweigh this.
Drop checks about duplicate options and mutually exclusive ones. In
some cases, we need to also undo a couple of initialisations caused
by earlier options, but this looks like a simplification, overall.
Notable exception: --stderr still conflicts with --log-file, because
users might have the expectation that they don't actually conflict.
But they do conflict in the existing implementation, so it's safer
to make sure that the users notice that.
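
The "last option wins" approach can be illustrated with a minimal
getopt_long() loop. The enum and function below are hypothetical and
handle only --quiet and --debug: they are not the actual conf() code.

```c
#include <getopt.h>
#include <stddef.h>

enum log_mode { MODE_DEFAULT, MODE_QUIET, MODE_DEBUG };

/* Hypothetical parser: a later --quiet or --debug simply overrides an
 * earlier one, instead of being rejected as a conflict.
 */
enum log_mode parse_log_mode(int argc, char **argv)
{
	static const struct option opts[] = {
		{ "quiet", no_argument, NULL, 'q' },
		{ "debug", no_argument, NULL, 'd' },
		{ 0 },
	};
	enum log_mode mode = MODE_DEFAULT;
	int c;

	optind = 1;	/* restart scanning, in case of repeated calls */
	while ((c = getopt_long(argc, argv, "qd", opts, NULL)) != -1) {
		if (c == 'q')
			mode = MODE_QUIET;
		else if (c == 'd')
			mode = MODE_DEBUG;
	}

	return mode;
}
```

With this, an integration that always passes --quiet can still be
asked to append --debug without any reordering.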
Suggested-by: Paul Holzinger <pholzing@redhat.com>
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Paul Holzinger <pholzing@redhat.com>
The SPDX identifier states GPL-2.0-or-later but the copyright section
mentions GPL-3.0 or later causing a mismatch.
Also, correctly refer to the GPL instead of the AGPL.
Signed-off-by: Danish Prakash <contact@danishpraka.sh>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
While commit f919dc7a4b1c ("conf, netlink: Don't require a default
route to start") sounded reasonable in the assumption that, if we
don't find default routes for a given address family, we can still
proceed by selecting an interface with any route *iff it's the only
one for that protocol family*, Jelle reported a further issue in a
similar setup.
There, multiple interfaces are present, and while remote container
connectivity doesn't matter for the container, local connectivity is
desired. There are no default routes, but those multiple interfaces
all have non-default routes, so we should just pick one and start.
Pick the first interface reported by the kernel with any route, if
there are no default routes. There should be no harm in doing so.
Reported-by: Jelle van der Waa <jvanderwaa@redhat.com>
Reported-by: Martin Pitt <mpitt@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2277954
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Holzinger <pholzing@redhat.com>
There might be isolated testing environments where default routes and
global connectivity are not needed, a single interface has all
non-loopback addresses and routes, and still passt and pasta are
expected to work.
In this case, it's pretty obvious what our upstream interface should
be, so go ahead and select the only interface with at least one
route, disabling DHCP and implying --no-map-gw as the documentation
already states.
If there are multiple interfaces with routes, though, refuse to start,
because at that point it's really not clear what we should do.
Reported-by: Martin Pitt <mpitt@redhat.com>
Link: https://github.com/containers/podman/issues/21896
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
We watch network namespace entries to detect when we should quit
(unless --no-netns-quit is passed), and these might be stored in a tmpfs
typically mounted at /run/user/UID or /var/run/user/UID, or found in
procfs at /proc/PID/ns/.
Currently, we try to use inotify for any possible location of those
entries, but inotify, of course, doesn't work on pseudo-filesystems
(see inotify(7)).
The man page reflects this: the description of --no-netns-quit
implies that we won't quit anyway if the namespace is not "bound to
the filesystem".
Well, we won't quit, but, since commit 9e0dbc894813 ("More
deterministic detection of whether argument is a PID, PATH or NAME"),
we try. And, indeed, this is harmless, as the caveat from that
commit message states.
Now, it turns out that Buildah, a tool to create container images,
sharing its codebase with Podman, passes a procfs entry to pasta, and
expects pasta to exit once the network namespace is not needed
anymore, that is, once the original container process, also spawned
by Buildah, terminates.
Get this to work by using the timer fallback mechanism if the
namespace name is passed as a path belonging to a pseudo-filesystem.
This is expected to be procfs, but I covered sysfs and devpts
pseudo-filesystems as well, because nothing actually prevents
creating this kind of directory structure and links there.
Note that fstatfs(), according to some versions of man pages, was
apparently "deprecated" by the LSB. My reasoning for using it is
essentially this:
https://lore.kernel.org/linux-man/f54kudgblgk643u32tb6at4cd3kkzha6hslahv24szs4raroaz@ogivjbfdaqtb/t/#u
...that is, there was no such thing as an LSB deprecation, and
anyway there's no other way to get the filesystem type.
Also note that, while it might sound more obvious to detect the
filesystem type using fstatfs() on the file descriptor itself
(c->pasta_netns_fd), the reported filesystem type for it is nsfs, no
matter what path was given to pasta. If we use the parent directory,
we'll typically have either tmpfs or procfs reported.
If the target namespace is given as a PID, or as a PID-based procfs
entry, we don't risk races if this PID is recycled: our handle on
/proc/PID/ns will always refer to the original namespace associated
with that PID, and we don't re-open this entry from procfs to check
it.
There's, however, a remaining race possibility if the parent process
is not the one associated to the network namespace we operate on: in
that case, the parent might pass a procfs entry associated to a PID
that was recycled by the time we parse it. This can't happen if the
namespace PID matches the one of the parent, because we detach from
the controlling terminal after parsing the namespace reference.
To avoid this type of race, if desired, we could add the option for
the parent to pass a PID file descriptor, that the parent obtained
via pidfd_open(). This is beyond the scope of this change.
Update the man page to reflect that, even if the target network
namespace is passed as a procfs path or a PID, we'll now quit when
the procfs entry is gone.
Reported-by: Paul Holzinger <pholzing@redhat.com>
Link: https://github.com/containers/podman/pull/21563#issuecomment-1948200214
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...or similar, that is, if only excluded ranges are given (implying
we'll forward any other available port). In that case, we'll usually
forward large sets of ports, and it might be inconvenient for the
user to skip excluding single ports that are already taken.
The existing behaviour, that is, exiting only if we fail to bind all
the ports for one given forwarding option, turns out to be
problematic in several respects raised by Paul:
- Podman merges ranges anyway, so we might fail to bind all the ports
from a specific range given by the user, but we'll not fail anyway
because Podman merges it with another one where we succeed in binding
at least one port. At the same time, there should be no semantic
difference between multiple ranges given by a single option and
multiple ranges given as multiple options: it's unexpected and
not documented
- the user might actually rely on a given port to be forwarded to a
given container or a virtual machine, and if connections are
forwarded to an unrelated process, this might raise security
concerns
- given that we can try and fail to bind multiple ports before
exiting (in case we can't bind any), we don't have a specific error
code we can return to the user, so we don't give the user a helpful
indication as to why we couldn't bind ports.
Exit as soon as we fail to create or bind a socket for a given
forwarded port, and report the actual error.
Keep the current behaviour, however, in case the user wants to
forward all the (available) ports for a given protocol, or all the
ports with excluded ranges only. There, it's more reasonable that
the user is expecting partial failures, and it's probably convenient
that we continue with the ports we could forward.
Update the manual page to reflect the new behaviour, and the old
behaviour too in the cases where we keep it.
Suggested-by: Paul Holzinger <pholzing@redhat.com>
Link: https://github.com/containers/podman/pull/21563#issuecomment-1937024642
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Paul Holzinger <pholzing@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If the default route for a given IP version is a multipath one,
instead of refusing to start because there's no RTA_GATEWAY attribute
in the set returned by the kernel, we can just pick one of the paths.
To make this somewhat less arbitrary, pick the path with the highest
weight, if weights differ.
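
The selection itself can be sketched over a simplified list of paths.
The struct below is a hypothetical stand-in for the netlink nexthop
attributes, and letting the first listed path win on equal weights is
an arbitrary choice of this sketch.

```c
#include <stddef.h>

struct nexthop {		/* hypothetical, simplified view of one path */
	const char *gw;		/* gateway address, as text */
	unsigned weight;
};

/* Return the path with the highest weight (the first one on ties), or
 * NULL if there are no paths at all.
 */
const struct nexthop *pick_nexthop(const struct nexthop *nh, size_t n)
{
	const struct nexthop *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!best || nh[i].weight > best->weight)
			best = &nh[i];
	}

	return best;
}
```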
Reported-by: Ed Santiago <santiago@redhat.com>
Link: https://github.com/containers/podman/issues/20927
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
pasta supports automatic port forwarding, where we look for listening
sockets in /proc/net (in both namespace and outside) and establish port
forwarding to match.
For TCP we do this scan both at initial startup and periodically
thereafter. For UDP however, we currently only scan at start. So unlike
TCP we won't update forwarding to handle services that start after pasta
has begun.
There's no particular reason for that, other than that we didn't implement
it. So, remove that difference, by scanning for new UDP forwards
periodically too. The logic is basically identical to that for TCP, but it
needs some changes to handle the mildly different data structures in the
UDP case.
Link: https://bugs.passt.top/show_bug.cgi?id=45
Link: https://github.com/rootless-containers/rootlesskit/issues/383
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Use the newly-introduced NL_DUP mode for nl_addr() to copy all the
addresses associated to the template interface in the outer
namespace, unless --no-copy-addrs (also implied by -a) is given.
This option is introduced as deprecated right away: it's not expected
to be of any use, but it's helpful to keep it around for a while to
debug any suspected issue with this change.
This is done mostly for consistency with routes. It might partially
cover the issue at:
https://bugs.passt.top/show_bug.cgi?id=47
Support multiple addresses per address family
for some use cases, but not the originally intended one: we'll still
use a single outbound address (unless the routing table specifies
different preferred source addresses depending on the destination),
regardless of the address used in the target namespace.
Link: https://bugs.passt.top/show_bug.cgi?id=47
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If we use a template interface without a gateway on the default
route, we can still offer almost complete functionality, except that,
of course, we can't map the gateway address to the outer namespace or
host, and that we have no obvious server address or identifier for
use in DHCP's siaddr and option 54 (Server identifier, mandatory).
Continue, if we have a default route but no default gateway, and
imply --no-map-gw and --no-dhcp in that case. NDP responder and
DHCPv6 should be able to work as usual because we require a
link-local address to be present, and we'll fall back to that.
Together with the previous commits implementing an actual copy of
routes from the outer namespace, this should finally fix the
operation of 'pasta --config-net' for cases where we have a default
route on the host, but no default gateway, as is the case for
tap-style routes, including typical Wireguard endpoints.
Reported-by: me@yawnt.com
Link: https://bugs.passt.top/show_bug.cgi?id=49
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Use the newly-introduced NL_DUP mode for nl_route() to copy all the
routes associated to the template interface in the outer namespace,
unless --no-copy-routes (also implied by -g) is given.
This option is introduced as deprecated right away: it's not expected
to be of any use, but it's helpful to keep it around for a while to
debug any suspected issue with this change.
Otherwise, we can't use default gateways which are not, address-wise,
on the same subnet as the container, as reported by Callum.
Reported-by: Callum Parsey <callum@neoninteger.au>
Link: https://github.com/containers/podman/issues/18539
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.
Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as far as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.
Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Somebody might want to bind listening sockets to a specific
interface, but not a specific address, and there isn't really a
reason to prevent that. For example:
-t %eth0/2022
Alternatively, we support options such as -t 0.0.0.0%eth0/2022 and
-t ::%eth0/2022, but not together, for the same port.
Enable this kind of syntax and add examples to the man page.
Reported-by: Paul Holzinger <pholzing@redhat.com>
Link: https://github.com/containers/podman/issues/14425#issuecomment-1485192195
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
By default, 65520 bytes are advertised, and zero disables DHCP and
NDP options.
Fixes: ec2b58ea4dc4 ("conf, dhcp, ndp: Fix message about default MTU, make NDP consistent")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
I didn't notice earlier: libslirp (and slirp4netns) supports binding
outbound sockets to specific IPv4 and IPv6 addresses, to force the
source address selection. If we want to claim feature parity, we
should implement that as well.
Further, Podman supports specifying outbound interfaces as well, but
this is simply done by resolving the primary address for an interface
when the network back-end is started. However, since kernel version
5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for
non-root users"), we can actually bind to a specific interface name,
which doesn't need to be validated in advance.
Implement -o / --outbound ADDR to bind to IPv4 and IPv6 addresses,
and --outbound-if4 and --outbound-if6 to bind IPv4 and IPv6 sockets
to given interfaces.
Given that it probably makes little sense to select addresses and
routes from interfaces different than the ones given for outbound
sockets, also assign those as "template" interfaces, by default,
unless explicitly overridden by '-i'.
For ICMP and UDP, we call sock_l4() to open outbound sockets, as we
already needed to bind to given ports or echo identifiers, and we
can bind() a socket only once: there, pass address (if any) and
interface (if any) for the existing bind() and setsockopt() calls.
For TCP, in general, we wouldn't otherwise bind sockets. Add a
specific helper to do that.
For UDP outbound sockets, we need to know if the final destination
of the socket is a loopback address, before we decide whether it
makes sense to bind the socket at all: move the block mangling the
address destination before the creation of the socket in the IPv4
path. This was already the case for the IPv6 path.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Based on a patch from Laine, and reports from Laine and Yalan: fix
the "22-80:32-90" example, and improve wording for the other ones:
instead of using "to" to denote the end of a range, use "between ...
and", so that it's clear we're *not* referring to target ports.
Reported-by: Laine Stump <laine@redhat.com>
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Fixes: da20f57f19dc ("passt, qrap: Add man pages")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This passes a fully connected stream socket to passt.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
[sbrivio: reuse fd_tap instead of adding a new descriptor,
imply --one-off on --fd, add to optstring and usage()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We can't get rid of qrap quite yet, but at least we should start
telling users it's not going to be needed anymore starting from qemu
7.2.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Having -f implied by -d (and --trace) usually saves some typing, but
debug mode in background (with a log file) is quite useful if pasta
is started by Podman, and is probably going to be handy for passt
with libvirt later, too.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in
the target user namespace as we isolate the process, which means
we're unable to bind to low ports at that point.
Bind inbound ports, and only those, before isolate_user(). Keep the
handling of outbound ports (for pasta mode only) after the setup of
the namespace, because that's where we'll bind them.
To this end, initialise the netlink socket for the init namespace
before isolate_user() as well, as we actually need to know the
addresses of the upstream interface before binding ports, in case
they're not explicitly passed by the user.
As we now call nl_sock_init() twice, checking its return code from
conf() twice looks a bit heavy: make it exit(), instead, as we
can't do much if we don't have netlink sockets.
While at it:
- move the v4_only && v6_only options check just after the first
option processing loop, as this is more strictly related to
option parsing proper
- update the man page, explaining that CAP_NET_BIND_SERVICE is
*not* the preferred way to bind ports, because passt and pasta
can be abused to allow other processes to make effective usage
of it. Add a note about the recommended sysctl instead
- simplify nl_sock_init_do() now that it's called once for each
case
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable
SO_BINDTODEVICE for non-root users"), we can bind sockets to
interfaces, if they haven't been bound yet (as in bind()).
Introduce an optional interface specification for forwarded ports,
prefixed by %, that can be passed together with an address.
Reported use case: running local services that use ports we want
to have externally forwarded:
https://github.com/containers/podman/issues/14425
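As a rough illustration only (not the code added by this commit), binding
a socket to an interface given as ADDR%IFNAME could look like the sketch
below. The helper names spec_split_ifname() and sock_bind_ifname() are
hypothetical, and the port part of the specification is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: split "ADDR%IFNAME" in place. Returns a pointer to
 * IFNAME, or NULL if no interface was given. 'spec' is left holding ADDR.
 */
static char *spec_split_ifname(char *spec)
{
	char *p;

	assert(spec);

	p = strchr(spec, '%');
	if (!p)
		return NULL;

	*p = '\0';
	return p + 1;
}

/* Hypothetical helper: bind socket 's' to interface 'ifname' before the
 * socket is bound to an address with bind(). Returns 0 on success, -1 on
 * failure.
 */
static int sock_bind_ifname(int s, const char *ifname)
{
	size_t len = strlen(ifname);

	if (!len || len >= IFNAMSIZ)
		return -1;

	return setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, ifname, len + 1);
}
```

On kernels older than 5.7, setsockopt() is expected to fail with EPERM for
unprivileged users, so callers need to handle the error.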
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
This is practical to avoid explicit lifecycle management in users,
e.g. libvirtd, and is trivial to implement.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
In some environments, such as KubeVirt pods, we might not have a
system logger available. We could choose to run in foreground, but
this takes away the convenient synchronisation mechanism derived from
forking to background when interfaces are ready.
Add optional logging to file with -l/--log-file and --log-size.
Unfortunately, this means we need to duplicate features that are more
appropriately implemented by a system logger, such as rotation. Keep
that reasonably simple, by using fallocate() with range collapsing
where supported (Linux kernel >= 3.15, extent-based ext4 and XFS) and
falling back to an unsophisticated block-by-block moving of entries
toward the beginning of the file once we reach the (mandatory) size
limit.
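A minimal sketch of the range collapsing approach, under the assumption
that the amount to drop is rounded up to the filesystem block size (the
function names log_cut_size() and log_drop_head() are made up for this
example and are not the ones used in the implementation):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/* Round the number of bytes to drop up to a multiple of the block size,
 * as FALLOC_FL_COLLAPSE_RANGE only accepts block-aligned ranges.
 */
static off_t log_cut_size(off_t excess, off_t blksize)
{
	assert(blksize > 0 && excess >= 0);

	return (excess + blksize - 1) / blksize * blksize;
}

/* Drop the first 'cut' bytes of the file. Returns 0 on success, -1 with
 * errno set if collapsing is not possible, in which case the caller would
 * fall back to moving entries block by block.
 */
static int log_drop_head(int fd, off_t cut)
{
	return fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, cut);
}
```

The collapsed range must not reach the end of the file, so this can only
be used while some entries remain after the cut.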
While at it, clarify the role of LOG_EMERG in passt.c.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Currently --userns is only allowed when pasta is attaching to an existing
netns or PID, and is prohibited when creating a new netns by spawning a
command or shell.
With the new handling of userns, this check isn't necessary. I'm not sure
if there's any use case for --userns with a spawned command, but it's
strictly more flexible and requires zero extra code, so we might as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
check_root() both checks whether we are root (in the init namespace) and,
if we are, drops to an unprivileged user. To make future cleanups
simpler, split the checking for root (now in check_root()) from the actual
dropping of privilege (now in drop_root()).
Note that this does slightly alter semantics. Previously we would only
setuid() if we were originally root (in the init namespace). Now we will
always setuid() and setgid(), though it won't actually change anything if
we weren't privileged to begin with. This also means that we will now
always attempt to switch to the user specified with --runas, even if we
aren't (init namespace) root to begin with. Obviously this will fail with
an error if we weren't privileged to start with. --help and the man page
are updated accordingly.
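To illustrate the new semantics, here is a simplified sketch of an
unconditional privilege drop. This is not the actual drop_root(): the
name drop_to() is hypothetical, and handling of supplementary groups is
deliberately left out.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: always switch to the given IDs, whether or not we
 * started as root. Returns 0 on success, -1 on failure.
 */
static int drop_to(uid_t uid, gid_t gid)
{
	/* Group first: after setuid() we might not be allowed to change it */
	if (setgid(gid))
		return -1;

	if (setuid(uid))
		return -1;

	/* Sanity check: the switch must have taken effect */
	assert(getuid() == uid && getgid() == gid);

	return 0;
}
```

Called with our own UID and GID, this succeeds without changing anything.
Called by an unprivileged process with a different target, it fails,
which matches the --runas behaviour described above.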
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When not given an existing PID or network namespace to attach to, pasta
spawns a shell. Most commands which can spawn a shell in an altered
environment can also run other commands in that same environment, which can
be useful in automation.
Allow pasta to do the same thing; it can be given an arbitrary command to
run in the network and user namespace which pasta creates. If neither a
command nor an existing PID or netns to attach to is given, continue to
spawn a default shell, as before.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When attaching to an existing namespace, pasta can take a PID or the name
or path of a network namespace as a non-option parameter. We disambiguate
based on what the parameter looks like. Make this more explicit by using
a --netns option for explicitly giving the path or name, and treating a
non-option argument always as a PID.
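With this change, the non-option argument only needs strict numeric
parsing. A possible sketch follows (parse_pid() is a hypothetical name,
not necessarily how conf() does it):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: parse a non-option argument as a PID. Returns the
 * PID, or -1 if the argument is not a positive decimal number.
 */
static pid_t parse_pid(const char *arg)
{
	char *end;
	long v;

	assert(arg);

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0' || v < 1 || v > INT_MAX)
		return -1;

	return (pid_t)v;
}
```

Anything that fails to parse is then an error, instead of being
reinterpreted as a namespace name or path.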
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Fix typo in man page]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
pasta can identify a netns as a "name", which is to say a path relative to
(usually) /run/netns, which is the place that ip(8) creates persistent
network namespaces. Alternatively a full path to a netns can be given.
The --nsrun-dir option allows the user to change the standard path where
netns names are resolved. However, there's no real point to this: if the
user wants to override the location of the netns, they can just as easily
use the full path to specify the netns.
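For reference, name resolution with a fixed directory can be sketched as
below. This assumes that an argument containing a slash is treated as a
path and anything else as a name. The helper netns_path() and the
NETNS_RUN_DIR constant are names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NETNS_RUN_DIR	"/run/netns"	/* Where ip(8) keeps named netns */

/* Hypothetical helper: resolve a netns name or path into 'buf'. Returns 0
 * on success, -1 if the result doesn't fit in 'len' bytes.
 */
static int netns_path(const char *arg, char *buf, size_t len)
{
	int n;

	assert(arg && buf);

	if (strchr(arg, '/'))	/* Looks like a path: use it as it is */
		n = snprintf(buf, len, "%s", arg);
	else			/* Bare name: resolve under the run directory */
		n = snprintf(buf, len, "%s/%s", NETNS_RUN_DIR, arg);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```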
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The man page states that the --userns option can be given either as a path
or as a name relative to --nsrun-dir. This is not correct: as the name
suggests, --nsrun-dir is (correctly) used only for *netns* resolution, not
*userns* resolution.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>