mirror of https://passt.top/passt synced 2024-12-22 13:45:32 +00:00
Commit Graph

82 Commits

Author SHA1 Message Date
Stefano Brivio
32c386834d netlink: Fix typo in function comment for nl_addr_set()
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-08-18 01:29:52 +02:00
Stefano Brivio
d6f0220731 netlink, pasta: Fetch link-local address from namespace interface once it's up
As soon as we bring up the interface, the Linux kernel will set up a
link-local address for it, so we can fetch it and start using it right
away, if we need a link-local address to communicate to the container
before we see any traffic coming from it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-08-18 01:29:52 +02:00
Stefano Brivio
74e508cf79 netlink, pasta: Disable DAD for link-local addresses on namespace interface
It makes no sense for a container or a guest to try and perform
duplicate address detection for their link-local address, as we'll
anyway not relay neighbour solicitations with an unspecified source
address.

While they perform duplicate address detection, the link-local address
is not usable, which prevents us from bringing up containers, in
particular, and communicating with them right away via IPv6.

This is not enough to prevent DAD and reach the container right away:
we'll need a couple more patches.

As we send NLM_F_REPLACE requests right away, while we still have to
read out other addresses on the same socket, we can't use nl_do():
keep track of the last sequence we sent (last address we changed), and
deal with the answers to those NLM_F_REPLACE requests in a separate
loop, later.

Link: https://github.com/containers/podman/pull/23561#discussion_r1711639663
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-08-18 01:29:38 +02:00
Stefano Brivio
0c74068f56 netlink, pasta: Turn nl_link_up() into a generic function to set link flags
In the next patches, we'll reuse it to set flags other than IFF_UP.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-08-15 09:14:47 +02:00
Stefano Brivio
8231ce54c3 netlink, pasta: Split MTU setting functionality out of nl_link_up()
As we'll use nl_link_up() for more than just bringing up devices, it
will become awkward to carry empty MTU values around whenever we call
it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-08-15 09:14:43 +02:00
Stefano Brivio
b91d3373ac netlink: Fix typo in function comment for nl_addr_get()
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-08-15 09:14:29 +02:00
Stefano Brivio
dba7f0f5ce treewide: Replace strerror() calls
Now that we have logging functions embedding perror() functionality,
we can make _some_ calls more terse by using them. In many places,
the strerror() calls are still more convenient because, for example,
they are used in flow debugging functions, or because the return code
variable of interest is not 'errno'.

While at it, convert a few error messages from a scant perror style
to proper failure descriptions.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-06-21 15:32:44 +02:00
Stefano Brivio
62de6140d9 netlink: Strip nexthop identifiers when duplicating routes
If routing daemons set up host routes, for example FRR via OSPF as in
the reported issue, they might add nexthop identifiers (not objects)
that are generally not valid in the target namespace. Strip them off
as well, otherwise we'll get EINVAL from the kernel.

Link: https://github.com/containers/podman/issues/22960
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-06-20 17:03:28 +02:00
Stefano Brivio
f301bb18b5 netlink: Ignore EHOSTUNREACH failures when duplicating routes
To implicitly resolve possible dependencies between routes as we
duplicate them into the target namespace, we go through a set of n
routes n times, and ignore EEXIST responses to netlink messages (we
already inserted the route) and ENETUNREACH (we didn't insert the
route yet, but we need to insert another one first).

Until now, we didn't ignore EHOSTUNREACH responses. However,
NetworkManager users with multiple non-subnet routes for the same
interface report that pasta exits with "no route to host" while
duplicating routes.

This happens because NetworkManager sets the 'noprefixroute' attribute
on addresses, meaning that the kernel won't create subnet routes
automatically depending on the prefix length of the address. We copy
this attribute as we copy the address into the target namespace, and
as a result, the kernel doesn't create subnet routes in the target
namespace either.

This means that the gateway for routes that are inserted later can be
unreachable at some points during the sequence of route duplication.
That is, we don't just have dependencies between regular routes, but
we can also have dependencies between regular routes and subnet
routes, as subnet routes are not automatically inserted in advance.

Link: https://github.com/containers/podman/issues/22824
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-06-19 15:00:55 +02:00
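The retry strategy described in f301bb18b5 above can be sketched as follows. This is an illustration only, not passt's code: dup_routes(), the route_add_fn callback and the toy table are hypothetical stand-ins for sending RTM_NEWROUTE requests into the target namespace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: try to insert route @idx, return 0 or -errno */
typedef int (*route_add_fn)(size_t idx, void *ctx);

/* Go through n routes n times, tolerating errors that only mean "already
 * there" or "something this route depends on is not there yet"
 */
int dup_routes(size_t n, route_add_fn add, void *ctx)
{
	size_t pass, i;

	assert(add);

	for (pass = 0; pass < n; pass++) {
		for (i = 0; i < n; i++) {
			int rc = add(i, ctx);

			if (rc == 0 || rc == -EEXIST ||
			    rc == -ENETUNREACH || rc == -EHOSTUNREACH)
				continue;

			return rc;	/* Anything else is a real failure */
		}
	}

	return 0;
}

/* Toy backend: route i needs route i + 1 (its "gateway") to exist first */
struct toy_table {
	size_t n;
	unsigned char present[8];
};

int toy_add(size_t i, void *ctx)
{
	struct toy_table *t = ctx;

	if (t->present[i])
		return -EEXIST;
	if (i + 1 < t->n && !t->present[i + 1])
		return -EHOSTUNREACH;

	t->present[i] = 1;
	return 0;
}
```

With the toy backend, routes are listed in the worst possible order, and each pass manages to insert exactly one more of them, which is why n passes are enough for n routes. A real implementation would probably also want to report routes that still fail on the last pass; the sketch leaves that out.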
Stefano Brivio
450a6131be netlink: With no default route, pick the first interface with a route
While commit f919dc7a4b ("conf, netlink: Don't require a default
route to start") sounded reasonable in the assumption that, if we
don't find default routes for a given address family, we can still
proceed by selecting an interface with any route *iff it's the only
one for that protocol family*, Jelle reported a further issue in a
similar setup.

There, multiple interfaces are present, and while remote container
connectivity doesn't matter for the container, local connectivity is
desired. There are no default routes, but those multiple interfaces
all have non-default routes, so we should just pick one and start.

Pick the first interface reported by the kernel with any route, if
there are no default routes. There should be no harm in doing so.

Reported-by: Jelle van der Waa <jvanderwaa@redhat.com>
Reported-by: Martin Pitt <mpitt@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2277954
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Holzinger <pholzing@redhat.com>
2024-06-19 15:00:55 +02:00
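The selection rule from 450a6131be above, combined with the single-pass detection and the link-local exclusion from the earlier commits it builds on, might look roughly like this. struct route_info and pick_template_ifi() are hypothetical names made up for this sketch; the actual code works on netlink responses directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one route as reported by the kernel */
struct route_info {
	unsigned int ifi;	/* Interface index, never 0 */
	unsigned char dst_len;	/* Destination prefix length, 0: default */
	bool link_local;	/* Destination is a link-local prefix */
};

/* Return the template interface index, or 0 if no route is usable */
unsigned int pick_template_ifi(const struct route_info *r, size_t n)
{
	unsigned int defifi = 0, anyifi = 0;
	size_t i;

	assert(r || !n);

	/* Single pass, no early return: every response gets consumed */
	for (i = 0; i < n; i++) {
		if (r[i].dst_len == 0) {
			if (!defifi)
				defifi = r[i].ifi;
		} else if (!r[i].link_local && !anyifi) {
			anyifi = r[i].ifi;
		}
	}

	/* A default route wins, otherwise take the first interface seen */
	return defifi ? defifi : anyifi;
}
```

Keeping two candidates and deciding only after the loop avoids returning in the middle of the iteration, which is the problem 639fdf06ed describes.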
David Gibson
3f917b326b netlink, test: Ignore deprecated addresses
When we retrieve or copy host addresses we can include deprecated
addresses, which is not what we want.  Adjust our logic to exclude them.
Similarly our tests can retrieve deprecated addresses, so exclude them
there too.

I hit this in practice because my router sometimes temporarily advertises
an fd00:: prefix before the real delegated IPv6 prefix.  The deprecated
address can hang around for some time messing up my tests.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-22 23:21:09 +02:00
Stefano Brivio
623c2fd621 netlink: Don't duplicate routes referring to unrelated host interfaces
We take care of this in nl_addr_dup(): if the interface index
associated to an address doesn't match the selected host interface
(ifa->ifa_index != ifi_src), we don't copy that address.

But for routes, we just unconditionally update the interface index to
match the index in the target namespace, even if the source interface
didn't match.

This might happen in two cases: with a pre-4.20 kernel without support
for NETLINK_GET_STRICT_CHK, which won't filter routes based on the
interface we pass in the request, as reported by runsisi, and any
kernel with support for multipath routes where any of the nexthops
refers to an unrelated host interface.

In both cases, check the index of the source interface, and avoid
copying unrelated routes.

Reported-by: runsisi <runsisi@hust.edu.cn>
Link: https://bugs.passt.top/show_bug.cgi?id=86
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: runsisi <runsisi@hust.edu.cn>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-05-11 00:52:19 +02:00
Stefano Brivio
76e32022c4 netlink: Fix iterations over nexthop objects
Somewhat confusingly, RTNH_NEXT(), as defined by <linux/rtnetlink.h>,
doesn't take an attribute length parameter like RTA_NEXT() does, and
I just modelled loops over nexthops after RTA loops, forgetting to
decrease the remaining length we pass to RTNH_OK().

In practice, this didn't cause issues in any of the combinations I
checked, at least without the next patch.

We seem to be the only user of RTNH_OK(): even iproute2 has an
open-coded version of it in print_rta_multipath() (ip/iproute.c).

Introduce RTNH_NEXT_AND_DEC(), similar to RTA_NEXT(), and use it.

Fixes: 6c7623d07b ("netlink: Add support to fetch default gateway from multipath routes")
Fixes: f4e38b5cd2 ("netlink: Adjust interface index inside copied nexthop objects too")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-05-02 16:12:45 +02:00
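A sketch of the kind of helper 76e32022c4 above introduces, assuming a definition modelled on RTA_NEXT(); the exact macro in passt may differ, and count_nexthops() is a hypothetical function that exists only to show the loop.

```c
#include <assert.h>
#include <linux/rtnetlink.h>

/* Sketch of a helper modelled on RTA_NEXT(): step to the next nexthop and
 * decrease the remaining length, which plain RTNH_NEXT() doesn't do
 */
#define RTNH_NEXT_AND_DEC(rtnh, len)					\
	((len) -= RTNH_ALIGN((rtnh)->rtnh_len), RTNH_NEXT(rtnh))

/* Count nexthop objects in an RTA_MULTIPATH payload of @len bytes */
int count_nexthops(const void *payload, int len)
{
	const struct rtnexthop *rtnh = payload;
	int n = 0;

	assert(len >= 0);

	/* RTNH_OK() reads rtnh_len before looking at len: check size first */
	while (len >= (int)sizeof(*rtnh) && RTNH_OK(rtnh, len)) {
		n++;
		rtnh = RTNH_NEXT_AND_DEC(rtnh, len);
	}

	return n;
}
```

Without the decrement, RTNH_OK() would keep comparing each entry against the full payload length, so the loop could walk past the end of the attribute.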
Stefano Brivio
d03c4e2020 netlink: Use IFA_F_NODAD also while duplicating addresses from the host
...not just for a single set address (legacy operation with
--no-copy-addrs). I forgot to add this to nl_addr_dup().

Note that we can have two versions of flags: the 8-bit ifa_flags in
ifaddrmsg, and the newer 32-bit version as IFA_FLAGS attribute, which
is given priority if present. Make sure IFA_F_NODAD is set in both.

Without this, a Podman user reports, something along the lines of:
  pasta --config-net -- ping -c1 -6 passt.top

would fail as the kernel would start Duplicate Address Detection
once we configure the address, which can't really work (and doesn't
make sense) in this case.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-04-26 07:46:54 +02:00
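Setting the flag in both places, as d03c4e2020 above describes, could be done along these lines. addr_set_nodad() is a hypothetical name, and the sketch assumes it is handed a complete RTM_NEWADDR message about to be re-sent to the target namespace.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_addr.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Set IFA_F_NODAD in an RTM_NEWADDR message, in both places flags can be */
void addr_set_nodad(struct nlmsghdr *nh)
{
	struct ifaddrmsg *ifa = NLMSG_DATA(nh);
	struct rtattr *rta;
	int na;

	assert(nh->nlmsg_len >= NLMSG_LENGTH(sizeof(*ifa)));

	ifa->ifa_flags |= IFA_F_NODAD;		/* Legacy 8-bit flags */

	for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
	     rta = RTA_NEXT(rta, na)) {
		uint32_t flags;

		if (rta->rta_type != IFA_FLAGS ||
		    RTA_PAYLOAD(rta) < sizeof(flags))
			continue;

		/* 32-bit flags, given priority over ifa_flags if present */
		memcpy(&flags, RTA_DATA(rta), sizeof(flags));
		flags |= IFA_F_NODAD;
		memcpy(RTA_DATA(rta), &flags, sizeof(flags));
	}
}
```

If only ifa_flags were updated, an IFA_FLAGS attribute copied from the host without IFA_F_NODAD would override it.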
Stefano Brivio
bfc83b54c4 netlink: For IPv4, IFA_LOCAL is the interface address, not IFA_ADDRESS
See the comment to the unnamed enum in linux/if_addr.h, which
currently states:

  /*
   * Important comment:
   * IFA_ADDRESS is prefix address, rather than local interface address.
   * It makes no difference for normally configured broadcast interfaces,
   * but for point-to-point IFA_ADDRESS is DESTINATION address,
   * local address is supplied in IFA_LOCAL attribute.
   *
   * [...]
   */

If we fetch IFA_ADDRESS, and we have a point-to-point link with a peer
address configured, we'll source the peer address as "our" address,
and refuse to resolve it in arp().

This was reported with pasta and a tun upstream interface configured
by OpenVPN in "p2p" topology: the target namespace will have similar
addresses and routes as the host, which is fine, and will try to
resolve the point-to-point peer address (because it's the default
gateway).

Given that we configure it as our address (only internally, not
visibly in the namespace), we'll fail to resolve that and traffic
doesn't go anywhere.

Note that this is not the case for IPv6: there, IFA_ADDRESS is the
actual, local address of the interface, and IFA_LOCAL is not
necessarily present, so the comment in linux/if_addr.h doesn't apply
either.

Link: https://github.com/containers/podman/issues/22320
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-04-26 07:46:42 +02:00
David Gibson
97e8b33f87 netlink: Ignore routes to link-local addresses for selecting interface
Since f919dc7a4b ("conf, netlink: Don't require a default route to
start"), and since 639fdf06ed ("netlink: Fix selection of template
interface") less buggily, we haven't required a default route on the host
in order to operate.  Instead, if we lack a default route we'll pick an
interface with any route, as long as there's only one such interface.  If
there's more than one, we don't have a good criterion to pick, so we give
up with an informational message.

Paul Holzinger pointed out that this code considers it ambiguous even if
all but one of the interfaces has only routes to link-local addresses
(fe80::/10).  A route to link-local addresses isn't really useful from
pasta's point of view, so ignore them instead.  This removes a misleading
message in many cases, and a spurious failure in some cases.

Suggested-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-04-05 16:59:08 +02:00
David Gibson
67a6258918 util: Add helper to return name of address family
We have a few places where we want to include the name of the internet
protocol version (IPv4 or IPv6) in a message, which we handle with an
open-coded ?: expression.

This seems like something that might be more widely useful, so make a
trivial helper to return the correct string based on the address family.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-04-05 16:59:05 +02:00
Stefano Brivio
f4e38b5cd2 netlink: Adjust interface index inside copied nexthop objects too
As pasta duplicates host routes into the target namespaces, interface
indices might not match, so we go through RTA_OIF attributes and fix
them up to match the identifier in the namespace.

But RTA_OIF is not the only attribute specifying interfaces for routes:
multipath routes use RTA_MULTIPATH attributes with nexthop objects,
which contain in turn interface indices. Fix them up as well.

If we don't, and we have at least two host interfaces, and the host
interface we use as template isn't the first one (hence the
mismatching indices), we'll fail to insert multipath routes with
nexthop objects, and ultimately refuse to start as the kernel
unexpectedly gives us ENODEV.

Link: https://github.com/containers/podman/issues/22192
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-04-05 16:58:52 +02:00
David Gibson
639fdf06ed netlink: Fix selection of template interface
Since f919dc7a4b ("conf, netlink: Don't require a default route to
start"), if there is only one host interface with routes, we will pick that
as the template interface, even if there are no default routes for an IP
version.  Unfortunately this selection had a serious flaw: in some cases
it would 'return' in the middle of an nl_foreach() loop, meaning we
wouldn't consume all the netlink responses for our query.  This could cause
later netlink operations to fail as we read leftover responses from the
aborted query.

Rewrite the interface detection to avoid this problem.  While we're there:
  * Perform detection of both default and non-default routes in a single
    pass, avoiding an ugly goto
  * Give more detail on error and working but unusual paths about the
    situation (no suitable interface, multiple possible candidates, etc.).

Fixes: f919dc7a4b ("conf, netlink: Don't require a default route to start")
Link: https://bugs.passt.top/show_bug.cgi?id=83
Link: https://github.com/containers/podman/issues/22052
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2270257
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Use info(), not warn() for somewhat expected cases where one
 IP version has no default routes, or no routes at all]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-20 09:34:08 +01:00
David Gibson
d35bcbee90 netlink: Fix handling of NLMSG_DONE in nl_route_dup()
A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
same read() as families") changed netlink behaviour so that the
NLMSG_DONE terminating a bunch of responses can go in the same
datagram as those responses, rather than in a separate one.

Our netlink code is supposed to handle that behaviour, and indeed does
so for most cases, using the nl_foreach() macro.  However, there was a
subtle error in nl_route_dup() which doesn't work with this change.
f00b1534 ("netlink: Don't try to get further datagrams in
nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
subtle error.

The problem arises because nl_route_dup(), unlike other cases, doesn't
just make a single pass through all the responses to a netlink
request.  It needs to get all the routes, then make multiple passes
through them.  We don't really have anywhere to buffer multiple
datagrams, so we only support the case where all the routes fit in a
single datagram - but we need to fail gracefully when that's not the
case.

After receiving the first datagram of responses (with nl_next()) we
have a first loop scanning them.  It needs to exit when either we run
out of messages in the datagram (!NLMSG_OK()) or when we get a message
indicating the last response (nl_status() <= 0).

What we do after the loop depends on which exit case we had.  If we
saw the last response, we're done, but otherwise we need to receive
more datagrams to discard the rest of the responses.

We attempt to check for that second case by re-checking NLMSG_OK(nh,
status).  However in the got-last-response case, we've altered status
from the number of remaining bytes to the error code (usually 0). That
means NLMSG_OK() now returns false even if it didn't during the loop
check.  To fix this we need separate variables for the number of bytes
left and the final status code.

We also checked status after the loop, but this was redundant: we can
only exit the loop with NLMSG_OK() == true if status <= 0.

Reported-by: Martin Pitt <mpitt@redhat.com>
Fixes: f00b153414 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0816 ("netlink: Always process all responses to a netlink request")
Link: https://github.com/containers/podman/issues/22052
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-19 10:23:34 +01:00
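The fix described in d35bcbee90 above boils down to not reusing one variable for two meanings. A minimal sketch, assuming a hypothetical scan_datagram() helper rather than passt's actual nl_route_dup() loop:

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/netlink.h>

/* Walk the messages in one netlink datagram of @len bytes.
 *
 * Return: true if the terminating message was seen, with @status set to 0
 * or to a negative error code, false if more datagrams are expected
 */
bool scan_datagram(void *buf, int len, int *status)
{
	struct nlmsghdr *nh = buf;
	int left = len;		/* Bytes left, never reused as status code */

	assert(status);

	for (; NLMSG_OK(nh, left); nh = NLMSG_NEXT(nh, left)) {
		if (nh->nlmsg_type == NLMSG_DONE) {
			*status = 0;
			return true;
		}

		if (nh->nlmsg_type == NLMSG_ERROR) {
			const struct nlmsgerr *e = NLMSG_DATA(nh);

			*status = e->error;
			return true;
		}

		/* ...a regular response would be handled here... */
	}

	return false;	/* Ran out of bytes: the caller must read more */
}
```

Because the return value and the byte count are separate, the caller can tell "saw NLMSG_DONE" apart from "datagram exhausted" even when both leave a zero behind.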
Stefano Brivio
f919dc7a4b conf, netlink: Don't require a default route to start
There might be isolated testing environments where default routes and
global connectivity are not needed, a single interface has all
non-loopback addresses and routes, and still passt and pasta are
expected to work.

In this case, it's pretty obvious what our upstream interface should
be, so go ahead and select the only interface with at least one
route, disabling DHCP and implying --no-map-gw as the documentation
already states.

If there are multiple interfaces with routes, though, refuse to start,
because at that point it's really not clear what we should do.

Reported-by: Martin Pitt <mpitt@redhat.com>
Link: https://github.com/containers/podman/issues/21896
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-03-18 08:57:21 +01:00
Stefano Brivio
f00b153414 netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE
Martin reports that, with Fedora Linux kernel version
kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64,
including commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same
read() as families"), pasta doesn't exit once the network namespace
is gone.

Actually, pasta is completely non-functional, at least with default
options, because nl_route_dup(), which duplicates routes from the
parent namespace into the target namespace at start-up, is stuck on
a second receive operation for RTM_GETROUTE.

However, with that commit, the kernel is now able to fit the whole
response, including the NLMSG_DONE message, into a single datagram,
so no further messages will be received.

It turns out that commit 4d6e9d0816 ("netlink: Always process all
responses to a netlink request") accidentally relied on the fact that
we would always get at least two datagrams as a response to
RTM_GETROUTE.

That is, the test to check if we expect another datagram, is based
on the 'status' variable, which is 0 if we just parsed NLMSG_DONE,
but we'll also expect another datagram if NLMSG_OK on the last
message is false. But NLMSG_OK with a zero length is always false.

The problem is that we don't distinguish if status is zero because
we got a NLMSG_DONE message, or because we processed all the
available datagram bytes.

Introduce an explicit check on NLMSG_DONE. We should probably
refactor this slightly, for example by introducing a special return
code from nl_status(), but this is probably the least invasive fix
for the issue at hand.

Reported-by: Martin Pitt <mpitt@redhat.com>
Link: https://github.com/containers/podman/issues/22052
Fixes: 4d6e9d0816 ("netlink: Always process all responses to a netlink request")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Paul Holzinger <pholzing@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-03-18 08:56:32 +01:00
David Gibson
9f57983886 netlink: Use const rtnh pointer
6c7623d07 ("netlink: Add support to fetch default gateway from multipath
routes") inadvertently introduced a new cppcheck warning for a variable
which could be a const pointer but isn't.  This occurs with
cppcheck-2.13.0-1.fc39.x86_64 in Fedora 39 at least.

Fixes: 6c7623d07b ("netlink: Add support to fetch default gateway from multipath routes")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-14 01:10:47 +01:00
Stefano Brivio
6c7623d07b netlink: Add support to fetch default gateway from multipath routes
If the default route for a given IP version is a multipath one,
instead of refusing to start because there's no RTA_GATEWAY attribute
in the set returned by the kernel, we can just pick one of the paths.

To make this somewhat less arbitrary, pick the path with the highest
weight, if weights differ.

Reported-by: Ed Santiago <santiago@redhat.com>
Link: https://github.com/containers/podman/issues/20927
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-02-09 13:24:33 +01:00
Stefano Brivio
f091893c1f netlink: Fetch most specific (longest prefix) address in nl_addr_get()
This happened in most cases implicitly before commit eff3bcb245
("netlink: Split nl_addr() into separate operation functions"): while
going through results from netlink, we would only copy an address
into the provided return buffer if no address had been picked yet.

Because of the insertion logic in the kernel (ipv6_link_dev_addr()),
the first returned address would also be the one added last, and, in
case of a Linux guest using a DHCPv6 client as well as SLAAC, that
would be the address assigned via DHCPv6, because SLAAC happens
before the DHCPv6 exchange.

The effect of, instead, picking the last returned address (first
assigned) is visible when passt or pasta runs nested, given that, by
default, they advertise a prefix for SLAAC usage, plus an address via
DHCPv6.

The first level (L1 guest) would get a /64 address by means of SLAAC,
and a /128 address via DHCPv6, the latter matching the address on the
host.

The second level (L2 guest) would also get two addresses: a /64 via
SLAAC (same prefix as the host), and a /128 via DHCPv6, matching the
L1 SLAAC-assigned address, not the one obtained via DHCPv6. That
is, none of the L2 addresses would match the address on the host. The
whole point of having a DHCPv6 server is to avoid (implicit) NAT when
possible, though.

Fix this in a more explicit way than the behaviour we initially had:
pick the first address among the set of most specific ones, by
comparing prefix lengths. Do this for IPv4 and for link-local
addresses, too, to match in any case the implementation of the
default source address selection.

Reported-by: Yalan Zhang <yalzhang@redhat.com>
Fixes: eff3bcb245 ("netlink: Split nl_addr() into separate operation functions")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-12-30 11:45:27 +01:00
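The selection rule from f091893c1f above (first address among the most specific ones) can be illustrated as follows. struct addr_entry and pick_most_specific() are hypothetical; the real code compares prefix lengths while iterating over RTM_NEWADDR responses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one address, in the order the kernel returned it */
struct addr_entry {
	unsigned char prefix_len;
	int id;			/* Stand-in for the address itself */
};

/* Return the first entry with the longest prefix, NULL if there are none */
const struct addr_entry *pick_most_specific(const struct addr_entry *a,
					    size_t n)
{
	const struct addr_entry *best = NULL;
	size_t i;

	assert(a || !n);

	for (i = 0; i < n; i++) {
		/* Strictly greater: on a tie, the earlier entry is kept */
		if (!best || a[i].prefix_len > best->prefix_len)
			best = &a[i];
	}

	return best;
}
```

With a /64 from SLAAC followed by a /128 from DHCPv6, this returns the /128 regardless of the order in which the kernel reports the two.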
Stefano Brivio
06559048e7 treewide: Use 'z' length modifier for size_t/ssize_t conversions
Types size_t and ssize_t are not necessarily long, it depends on the
architecture.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-12-02 03:54:42 +01:00
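As a small illustration of the conversion 06559048e7 above refers to, with a hypothetical format_io_result() helper: %zu matches size_t, and %zi matches the signed counterpart, which is what ssize_t is on the platforms this sketch assumes.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Format a length and a return code without assuming they are 'long' */
int format_io_result(char *buf, size_t size, size_t len, ssize_t rc)
{
	assert(buf && size);

	return snprintf(buf, size, "len %zu, rc %zi", len, rc);
}
```

Using %lu or %li here would happen to work where size_t is as wide as long, and trigger format warnings or wrong output elsewhere.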
Stefano Brivio
b944622969 netlink: Sequence numbers are actually 32 bits wide
Harmless, as we use sequence numbers monotonically anyway, but now
clang-tidy reports:

/home/sbrivio/passt/netlink.c:155:7: error: format specifies type 'unsigned short' but the argument has type '__u32' (aka 'unsigned int') [clang-diagnostic-format,-warnings-as-errors]
                    nh->nlmsg_seq, seq);
                    ^
/home/sbrivio/passt/log.h:26:7: note: expanded from macro 'die'
                err(__VA_ARGS__);                                       \
                    ^~~~~~~~~~~
/home/sbrivio/passt/log.h:19:34: note: expanded from macro 'err'
                                        ^~~~~~~~~~~
Suppressed 222820 warnings (222816 in non-user code, 4 NOLINT).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
1 warning treated as error
make: *** [Makefile:255: clang-tidy] Error 1

Fixes: 9d4ab98d53 ("netlink: Add nl_do() helper for simple operations with error checking")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-11-07 12:22:13 +01:00
David Gibson
6471c7d01b cppcheck: Make many pointers const
Newer versions of cppcheck (as of 2.12.0, at least) added a warning for
pointers which could be declared to point at const data, but aren't.
Based on that, make many pointers throughout the codebase const.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-10-04 23:23:35 +02:00
David Gibson
a7e4bfb857 pasta: Strip RTA_PREFSRC when copying routes to the namespace
Host routes can include a preferred source address (RTA_PREFSRC), which
must be one of the host's addresses.  However when using pasta with -a the
namespace might be given a different address, not on the host.  This seems
to occur pretty routinely depending on the network configuration systems
in place on the host.

With --config-net we will try to copy host routes to the namespace.  If
one of those includes an RTA_PREFSRC, but the namespace doesn't have the
host address, this will fail with -EINVAL, causing pasta to fail.

Fix this by stripping off RTA_PREFSRC attributes from routes as we copy
them to the namespace.  This is by no means infallible, but it should at
least handle common cases for the time being.

Link: https://bugs.passt.top/show_bug.cgi?id=71
Link: https://github.com/containers/podman/pull/19699#issuecomment-1688769287
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-23 15:52:31 +02:00
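One way to strip an attribute from a message that is about to be re-sent, as a7e4bfb857 above does for RTA_PREFSRC, is to overwrite its type in place instead of repacking the message. The sketch below assumes that the kernel skips attributes of type RTA_UNSPEC when parsing the request; route_strip_prefsrc() is a hypothetical name.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Neutralise RTA_PREFSRC attributes of an RTM_NEWROUTE message in place,
 * return the number of attributes affected
 */
int route_strip_prefsrc(struct nlmsghdr *nh)
{
	struct rtmsg *rtm = NLMSG_DATA(nh);
	struct rtattr *rta;
	int na, stripped = 0;

	assert(nh->nlmsg_len >= NLMSG_LENGTH(sizeof(*rtm)));

	for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
	     rta = RTA_NEXT(rta, na)) {
		if (rta->rta_type != RTA_PREFSRC)
			continue;

		/* Assumption: attributes of type RTA_UNSPEC get ignored */
		rta->rta_type = RTA_UNSPEC;
		stripped++;
	}

	return stripped;
}
```

The message length and the offsets of all other attributes stay the same, so no buffer needs to be rebuilt.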
Stefano Brivio
5e4f7b92b0 netlink: Set IFA_ADDRESS, not just IFA_LOCAL, while adding IPv4 addresses
Otherwise, we actually configure the address, but it's not usable
because no local route is added by the kernel.

Link: https://github.com/containers/podman/pull/19699
Fixes: cfe7509e5c ("netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-23 09:34:44 +02:00
David Gibson
da0aeb9080 netlink: Don't propagate host address expiry to the container
When we copy addresses from the host to the container in nl_addr_dup(), we
copy all the address's attributes, including IFA_CACHEINFO, which controls
the address's lifetime.  If the host address is managed by, for example,
DHCP, it will typically have a finite lifetime.

When we copy that lifetime to the pasta container, that lifetime will
remain, meaning the kernel will eventually remove the address, typically
some hours later.  The container, however, won't have the DHCP client or
whatever was managing and maintaining the address in the host, so it will
just lose connectivity.

Long term, we may want to monitor host address changes and reflect them to
the guest.  But for now, we just want to take a snapshot of the host's
address and set those in the container permanently.  We can accomplish that
by stripping off the IFA_CACHEINFO attribute as we copy addresses.

Link: https://github.com/containers/podman/issues/19405
Link: https://bugs.passt.top/show_bug.cgi?id=70
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-16 08:10:04 +02:00
David Gibson
b4f8ffd5c4 netlink: Correctly calculate attribute length for address messages
In nl_addr_get() and nl_addr_dup() we step the attributes attached to each
RTM_NEWADDR message with a loop initialised with IFA_RTA() and
RTM_PAYLOAD() macros.  RTM_PAYLOAD(), however, is for RTM_NEWROUTE messages
(struct rtmsg), not RTM_NEWADDR messages (struct ifaddrmsg).  Consequently
it miscalculates the size and means we can skip some attributes.  Switch
to IFA_PAYLOAD() which we should be using here.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-16 08:10:02 +02:00
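The size of the miscalculation described in b4f8ffd5c4 above can be worked out from the two macros. addr_payload_shortfall() is a hypothetical helper written only to show the arithmetic, assuming the usual 8-byte struct ifaddrmsg and 12-byte struct rtmsg.

```c
#include <assert.h>
#include <linux/if_addr.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Attribute bytes missed if RTM_PAYLOAD() is used on an RTM_NEWADDR message */
int addr_payload_shortfall(const struct nlmsghdr *nh)
{
	assert(nh->nlmsg_len >= NLMSG_LENGTH(sizeof(struct rtmsg)));

	/* Both macros subtract the netlink header plus one family header */
	return (int)IFA_PAYLOAD(nh) - (int)RTM_PAYLOAD(nh);
}
```

Under those assumptions the difference is 4 bytes for any message length, which is enough for RTA_OK() to reject the last attribute of an address message.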
David Gibson
4b9f4c2513 netlink: Remove redundant check on nlmsg_type
In the loop within nl_addr_dup() we check and skip for any messages that
aren't of type RTM_NEWADDR.  This is a leftover that was missed in the
recent big netlink cleanup.  In fact we already check for the message type
in the nl_foreach_oftype() macro, so the explicit test is redundant.
Remove it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-16 08:09:49 +02:00
David Gibson
02b30e7871 netlink: Propagate errors for "dup" operations
We now detect errors on netlink "set" operations while configuring the
pasta namespace with --config-net.  However in many cases rather than
a simple "set" we use a more complex "dup" function to copy
configuration from the host to the namespace.  We're not yet properly
detecting and reporting netlink errors for that case.

Change the "dup" operations to propagate netlink errors to their
caller, pasta_ns_conf() and report them there.

Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Minor formatting changes in pasta_ns_conf()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:32:32 +02:00
David Gibson
5103811e2d netlink: Propagate errors for "dump" operations
Currently if we receive any netlink errors while discovering network
configuration from the host, we'll just ignore it and carry on.  This
might lead to cryptic error messages later on, or even silent
misconfiguration.

We now have the mechanisms to detect errors from get/dump netlink
operations.  Propagate these errors up to the callers and report them usefully.

Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:30:41 +02:00
David Gibson
4d6e9d0816 netlink: Always process all responses to a netlink request
A single netlink request can result in multiple response datagrams.  We
process multiple response datagrams in some circumstances, but there are
cases where we exit early and will leave remaining datagrams in the queue.
These will be flushed in nl_send() before we send another request.

This is confusing, and not what we need to reliably check for errors from
netlink operations.  So, instead, make sure we always process all the
response datagrams whenever we send a request (excepting fatal errors).

In most cases this is just a matter of avoiding early exits from nl_foreach
loops.  nl_route_dup() is a bit trickier, because we need to retain all the
routes we're going to try to copy in a single buffer.  Here we instead use
a secondary buffer to flush any remaining datagrams, and report an error
if there are any additional routes in those datagrams.

Link: https://bugs.passt.top/show_bug.cgi?id=67
Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:30:37 +02:00
David Gibson
8de9805224 netlink: Propagate errors for "set" operations
Currently if anything goes wrong while we're configuring the namespace
network with --config-net, we'll just ignore it and carry on.  This might
lead to a silently unconfigured or misconfigured namespace environment.

For simple "set" operations based on nl_do() we can now detect failures
reported via netlink.  Propagate those errors up to pasta_ns_conf() and
report them usefully.

Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Minor formatting changes in pasta_ns_conf()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:30:22 +02:00
David Gibson
a309318275 netlink: Add nl_foreach_oftype to filter response message types
In most cases when processing response messages, we expect only one type
of message (excepting NLMSG_DONE or NLMSG_ERROR), and so we need a test
and continue to skip anything else.  Add a helper macro to do this.
This also fixes a bug in nl_get_ext_if() where we didn't have such a test
and if we got a message other than RTM_NEWROUTE we would have parsed
its contents as nonsense.

Also add a warning message if we get such an unexpected message type, which
could be useful for debugging if we ever hit it.
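
A type-filtering loop of this kind can be sketched as below. This is only an
illustration, not the project's macro: foreach_oftype() and
count_routes_demo() are hypothetical names, the loop walks a single buffer
that is already in memory, and NLMSG_ERROR handling is left out.

```c
#include <stdio.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical 'for'-style macro: run the following statement only for
 * messages of the given type, warn about anything else, stop at NLMSG_DONE.
 */
#define foreach_oftype(nh, buf, left, type)				\
	for ((nh) = (struct nlmsghdr *)(buf);				\
	     NLMSG_OK(nh, left) && (nh)->nlmsg_type != NLMSG_DONE;	\
	     (nh) = NLMSG_NEXT(nh, left))				\
		if ((nh)->nlmsg_type != (type)) {			\
			fprintf(stderr, "unexpected message type %u\n",	\
				(unsigned)(nh)->nlmsg_type);		\
		} else

/* Build four header-only messages and count the RTM_NEWROUTE ones */
static int count_routes_demo(void)
{
	static const unsigned short types[] = {
		RTM_NEWROUTE, RTM_NEWLINK, RTM_NEWROUTE, NLMSG_DONE
	};
	struct nlmsghdr msgs[4], *nh;
	int left = sizeof(msgs), n = 0;
	unsigned i;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < 4; i++) {
		msgs[i].nlmsg_len = NLMSG_HDRLEN;
		msgs[i].nlmsg_type = types[i];
	}

	foreach_oftype(nh, msgs, left, RTM_NEWROUTE)
		n++;

	return n;
}
```

The trailing "if ... else" lets the macro be followed by an ordinary loop
body, so callers don't need their own type test.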

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:28:29 +02:00
David Gibson
99ddd7ce83 netlink: Split nl_req() to allow processing multiple response datagrams
Currently nl_req() sends the request, and receives a single response
datagram which we then process.  However, a single request can result in
multiple response datagrams.  That happens nearly all the time for DUMP
requests, where the 'DONE' message usually comes in a second datagram after
the NEW{LINK|ADDR|ROUTE} messages.  It can also happen if there are just
too many objects to dump in a single datagram.

Allow our netlink code to process multiple response datagrams by splitting
nl_req() into three different helpers: nl_send() just sends a request,
without getting a response.  nl_status() checks a single message to see if
it indicates the end of the responses for our request.  nl_next() moves onto
the next response message, whether it's in a datagram we already received
or we need to recv() a new one.  We also add a 'for'-style macro to use
these to step through every response message to a request across multiple
datagrams.
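
As a rough sketch of why a single datagram isn't enough, the code below walks
two datagrams in turn until it sees the end of the responses.  It makes
several assumptions: walk_dgram() and demo_two_dgrams() are hypothetical
names, the "datagrams" are arrays in memory rather than the result of
recv(), and sequence numbers aren't checked.

```c
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Walk one datagram.  Return 1 if it held the end of the responses
 * (NLMSG_DONE or NLMSG_ERROR), 0 if the caller needs another datagram.
 * *count is incremented for every other message seen.
 */
static int walk_dgram(const void *buf, size_t len, unsigned *count)
{
	const struct nlmsghdr *nh = buf;
	int left = (int)len;

	for (; NLMSG_OK(nh, left); nh = NLMSG_NEXT(nh, left)) {
		if (nh->nlmsg_type == NLMSG_DONE ||
		    nh->nlmsg_type == NLMSG_ERROR)
			return 1;
		(*count)++;
	}

	return 0;
}

/* Two RTM_NEWADDR messages in one datagram, NLMSG_DONE in a second one */
static unsigned demo_two_dgrams(void)
{
	struct nlmsghdr first[2], second[1];
	unsigned count = 0;

	memset(first, 0, sizeof(first));
	memset(second, 0, sizeof(second));
	first[0].nlmsg_len = first[1].nlmsg_len = NLMSG_HDRLEN;
	first[0].nlmsg_type = first[1].nlmsg_type = RTM_NEWADDR;
	second[0].nlmsg_len = NLMSG_HDRLEN;
	second[0].nlmsg_type = NLMSG_DONE;

	if (walk_dgram(first, sizeof(first), &count))
		return 0;	/* unexpected: end seen in first datagram */
	if (!walk_dgram(second, sizeof(second), &count))
		return 0;	/* unexpected: no end in second datagram */

	return count;
}
```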

While we're at it, be more thorough with checking that our sequence
numbers are in sync.

Link: https://bugs.passt.top/show_bug.cgi?id=67
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:28:26 +02:00
David Gibson
8ec757d003 netlink: Clearer reasoning about the netlink response buffer size
Currently we set NLBUFSIZ large enough for 8192 netlink headers (128kiB in
total), and reference netlink(7).  However, netlink(7) says nothing about
response buffer sizes, and the documents which do mention a size reference
8192 *bytes*, not 8192 headers.

Update NLBUFSIZ to 64kiB with a more detailed rationale.
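
The arithmetic can be checked with a short sketch.  OLD_NLBUFSIZ,
NEW_NLBUFSIZ and kib() are hypothetical names used only for this
illustration; the sizes themselves come from the message above.

```c
#include <stddef.h>
#include <linux/netlink.h>

/* Hypothetical name: the old size counted 8192 headers, not 8192 bytes */
#define OLD_NLBUFSIZ	(8192 * sizeof(struct nlmsghdr))
/* Hypothetical name: 64 KiB, the size the message above settles on */
#define NEW_NLBUFSIZ	((size_t)64 * 1024)

/* Convert a size in bytes to KiB */
static size_t kib(size_t bytes)
{
	return bytes / 1024;
}
```

A netlink header is 16 bytes long, so 8192 of them take 131072 bytes, that
is, 128 KiB.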

Link: https://bugs.passt.top/show_bug.cgi?id=67
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:28:22 +02:00
David Gibson
9d4ab98d53 netlink: Add nl_do() helper for simple operations with error checking
So far we never checked for errors reported on netlink operations via
NLMSG_ERROR messages.  This has led to several subtle and tricky to debug
situations which would have been obvious if we knew that certain netlink
operations had failed.

Introduce an nl_do() helper that performs netlink "do" operations (that is,
making a single change without retrieving complex information) with much
more thorough error checking.  As well as returning an error code if we
get an NLMSG_ERROR message, we also check for unexpected behaviour in
several places.  That way if we've made a mistake in our assumptions about
how netlink works it should result in a clear error rather than some subtle
misbehaviour.
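
The kind of check described here might look roughly like the sketch below.
It is not the actual nl_do(): ack_status() and demo_status() are
hypothetical names, the reply is built in memory instead of being read from
a socket, and -EPROTO is an arbitrary choice for "unexpected behaviour".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>

/* Interpret the single reply to a "do" request with sequence number seq.
 * Return 0 on a plain ACK, the kernel's negative errno on failure, and
 * -EPROTO if the reply isn't what we assumed it would be.
 */
static int ack_status(const struct nlmsghdr *nh, size_t len, __u32 seq)
{
	struct nlmsgerr err;

	if (len < sizeof(*nh) || nh->nlmsg_len < sizeof(*nh) ||
	    nh->nlmsg_len > len)
		return -EPROTO;		/* truncated or malformed message */
	if (nh->nlmsg_seq != seq)
		return -EPROTO;		/* reply to some other request */
	if (nh->nlmsg_type != NLMSG_ERROR)
		return -EPROTO;		/* only an ACK or error is expected */
	if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(err)))
		return -EPROTO;		/* too short to carry an error code */

	memcpy(&err, NLMSG_DATA(nh), sizeof(err));
	return err.error;		/* 0 is an ACK, negative is an errno */
}

/* Build a fake NLMSG_ERROR reply and run it through ack_status() */
static int demo_status(int kernel_error, __u32 reply_seq, __u32 sent_seq)
{
	struct {
		struct nlmsghdr nh;
		struct nlmsgerr err;
	} reply;

	memset(&reply, 0, sizeof(reply));
	reply.nh.nlmsg_len = sizeof(reply);
	reply.nh.nlmsg_type = NLMSG_ERROR;
	reply.nh.nlmsg_seq = reply_seq;
	reply.err.error = kernel_error;

	return ack_status(&reply.nh, sizeof(reply), sent_seq);
}
```

An ACK is itself an NLMSG_ERROR message with the error field set to 0, which
is why one code path covers both success and failure.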

We update those calls to nl_req() that can use the new wrapper to do so.
We will extend those to better handle errors in future.  We don't touch
non-"do" operations for now, those are a bit trickier.

Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:28:19 +02:00
David Gibson
282581ba84 netlink: Fill in netlink header fields from nl_req()
Currently netlink functions need to fill in a full netlink header, as well
as a payload then call nl_req() to submit that to the kernel.  It makes
things a bit terser if we just give the relevant header fields as
parameters to nl_req() and have it complete the header.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:28:16 +02:00
David Gibson
f62600b2df netlink: Treat send() or recv() errors as fatal
Errors on send() or recv() calls on a netlink socket don't indicate errors
with the netlink operations we're attempting, but rather that something's
gone wrong with the mechanics of netlink itself.  We don't really expect
this to ever happen, and if it does, it's not clear what we could do to
recover.

So, treat errors from these calls as fatal, rather than returning the error
up the stack.  This makes handling failures in the callers of nl_req()
simpler.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:28:14 +02:00
David Gibson
0a568c847d netlink: Start sequence number from 1 instead of 0
Netlink messages have a sequence number that's used to match requests to
responses.  It mostly doesn't matter what it is as long as it monotonically
increases, so we just use a global counter which we advance with each
request.

However, we start this counter at 0, so our very first request has sequence
number 0, which is usually reserved for asynchronous messages from the
kernel which aren't in response to a specific request. Since we don't (for
now) use such async messages, this doesn't really matter, but it's not
good practice.  So start the sequence at 1 instead.
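
A minimal sketch of such a counter, assuming hypothetical names nl_seq and
next_seq() that need not match the actual code:

```c
#include <stdint.h>

/* Hypothetical global counter: 0 is left to unsolicited kernel messages */
static uint32_t nl_seq = 1;

/* Return the sequence number for the next request, then advance */
static uint32_t next_seq(void)
{
	return nl_seq++;
}
```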

Link: https://bugs.passt.top/show_bug.cgi?id=67
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:28:09 +02:00
David Gibson
dee7594180 netlink: Make nl_*_dup() use a separate datagram for each request
nl_req() is designed to handle a single netlink request message: it only
receives a single reply datagram for the request, and only waits for a
single NLMSG_DONE or NLMSG_ERROR message at the beginning to clear out
things from previous requests.

However, in both nl_addr_dup() and nl_route_dup() we can send multiple
request messages as a single datagram, with a single nl_req() call.
This can easily mean that the replies nl_req() collects get out of
sync with requests.  We only get away with this because after we call
these functions we don't make any netlink calls where we need to parse
the replies.

This is fragile, so alter nl_*_dup() to make an nl_req() call for each
address it is adding in the target namespace.

For nl_route_dup() this fixes an additional minor problem: because
routes can have dependencies, some of the route add requests might
fail on the first attempt, so we need to repeat the requests a number
of times.  When we did that, we weren't updating the sequence number
on each new attempt.  This works, but not updating the sequence number
for each new request isn't ideal.  Now that we're making the requests
one at a time, it's easier to make sure we update the sequence number
each time.

Link: https://bugs.passt.top/show_bug.cgi?id=67
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:28:00 +02:00
David Gibson
576df71e8b netlink: Explicitly pass netlink sockets to operations
All the netlink operations currently implicitly use one of the two global
netlink sockets, sometimes depending on an 'ns' parameter.  Change them
all to explicitly take the socket to use (or two sockets to use in the case
of the *_dup() functions).  As well as making these functions strictly more
general, it makes the callers easier to follow because we're passing a
socket variable with a name rather than an unexplained '0' or '1' for the
ns parameter.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Minor formatting changes in pasta_ns_conf()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:27:42 +02:00
David Gibson
cfe7509e5c netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t
This improves consistency with IPv6 and makes it harder to misuse these as
some other sort of value.
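
To illustrate the point about types, here is a small sketch with
hypothetical helpers, is_loopback4() and parse_and_check(), that are not
part of the netlink code.  Because the parameter is a struct in_addr, the
compiler rejects a bare integer (a prefix length or an interface index, for
example) passed by mistake, which it wouldn't do for a uint32_t.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: 1 if addr is in 127.0.0.0/8, 0 otherwise */
static int is_loopback4(struct in_addr addr)
{
	return (ntohl(addr.s_addr) >> 24) == 127;
}

/* Parse a dotted-quad string and check it; -1 if it doesn't parse */
static int parse_and_check(const char *s)
{
	struct in_addr a;

	if (inet_pton(AF_INET, s, &a) != 1)
		return -1;

	return is_loopback4(a);
}
```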

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:25:23 +02:00
David Gibson
257a6b0b7e netlink: Split nl_route() into separate operation functions
nl_route() can perform 3 quite different operations based on the 'op'
parameter.  Split this into separate functions for each one.  This requires
more lines of code, but makes the internal logic of each operation much
easier to follow.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:25:20 +02:00
David Gibson
eff3bcb245 netlink: Split nl_addr() into separate operation functions
nl_addr() can perform three quite different operations based on the 'op'
parameter, each of which uses a different subset of the parameters.  Split
them up into a function for each operation.  This does use more lines of
code, but the overlap wasn't that great, and the separated logic is much
easier to follow.

It's also clearer in the callers what we expect the netlink operations to
do, and what information they use.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Minor formatting fixes in pasta_ns_conf()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:24:52 +02:00
David Gibson
e96182e9c2 netlink: Split up functionality of nl_link()
nl_link() performs a number of functions: it can bring links up, set MAC
address and MTU, and also retrieve the existing MAC.  This makes for a small
number of lines of code, but high conceptual complexity: it's quite hard
to follow what's going on in nl_link() itself, and it's also not very
obvious which function its callers intend to use.

Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(),
and nl_link_get_mac().  The first brings up a link, optionally setting the
MTU, the others get or set the MAC address.

This fixes an arguable bug in pasta_ns_conf(): it looks as though that was
intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set.
However, it only actually does so in the !c->pasta_conf_ns case: the fact
that we set up==1 means we would only ever set, never get, the MAC in the
nl_link() call in the other path.  We get away with this because the MAC
will quickly be discovered once we receive packets on the tap interface.
Still, it's neater to always get the MAC address here.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-04 01:18:14 +02:00