To be able to manage buffers inside a shared memory provided
by a VM via a vhost-user interface, we cannot rely on the fact
that buffers are located in a pre-defined memory area and use
a base address and a 32bit offset to address them.
We need a 64bit address, so replace struct desc by struct iovec
and update range checking.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
It's widely considered a legacy option nowadays, and I've haven't seen
clients setting it since Windows 95, but it's convenient for a minimal
DHCP client not using raw IP sockets such as what I'm playing with for
muvm.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
I'm trying to speed up and simplify IP address acquisition in muvm.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
We want to add support for option 80 (Rapid Commit, RFC 4039), whose
length is 0.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
There are setups where no host interface is available or configured
at all, intentionally or not, temporarily or not, but users expect
(Podman) containers to run in any case as they did with slirp4netns,
and we're now getting reports that we broke such setups at a rather
alarming rate.
To this end, if we don't find any usable host interface, instead of
exiting:
- for IPv4, use 169.254.2.1 as guest/container address and 169.254.2.2
as default gateway
- for IPv6, don't assign any address (forcibly disable DHCPv6), and
use the *first* link-local address we observe to represent the
guest/container. Advertise fe80::1 as default gateway
- use 'tap0' as default interface name for pasta
Change ifi4 and ifi6 in struct ctx to int and accept a special -1
value meaning that no host interface was selected, but the IP family
is enabled. The fact that the kernel uses unsigned int values for
those is not an issue as 1. one can't create so many interfaces
anyway and 2. we otherwise handle those values transparently.
Fix a botched conditional in conf_print() to actually skip printing
DHCPv6 information if DHCPv6 is disabled (and skip printing NDP
information if NDP is disabled).
Link: https://github.com/containers/podman/issues/24614
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Since 9a0e544f05 the NDP tests attempt to explicitly wait for DAD to
complete, rather than just having a hard coded sleep. However, the
conditions we use are a bit sloppy and allow for a number of possible cases
where it might not work correctly. Stefano seems to be hitting one of
these (though I'm not sure which) with some later patches.
- We wait for *lack* of a tentative address, so if the first check occurs
before we have even a tentative address it will bypass the delay
- It's not entirely clear if the permanent address will always appear
as soon as the tentative address disappears
- We weren't filtering on interface
- We were doing the filtering with ip-address options rather than in jq.
However in at least in some circumstances this seems to result in an
empty .addr_info field, rather than omitting it entirely, which could
cause us to get the wrong result
So, instead, explicitly wait for the address we need to be present: an
RA provided address on the external interface. While we're here we remove
the requirement that it have global scope: the "kernel_ra" check is already
sufficient to make sure this address comes from an NDP RA, not something
else. If it's not the global scope address we expect, better to check it
and fail, rather than keep waiting.
Fixes: 9a0e544f05 ("test: Improve test for NDP assigned prefix")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This is very visible with muvm, but it also happens with QEMU: we're
sending the first unsolicited router advertisement milliseconds after
the guest connects.
That's usually pointless because, when the hypervisor connects, the
guest is typically not ready yet to process anything of that sort:
it's still booting. And if we happen to send it late enough (still
milliseconds), with muvm, while the message is discarded, it
sometimes (slightly) delays the response to the first solicited
router advertisement, which is the one we need to have coming fast.
Skip sending the unsolicited advertisement on the first timer run,
just calculate the next delay. Keep it simple by observing that we're
probably not trying to reach the 1970s with IPv6.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
By dropping the filter on prefix length, commit 910f4f9103
("test: Don't require 64-bit prefixes in perf tests") broke tests on
setups where two global unicast IPv6 addresses are available, which
is the typical case when the "host" is a VM running under passt with
addresses from SLAAC and DHCPv6, because two addresses will be
returned.
Pick the first one instead. We don't really care about the prefix
length, any of these addresses will work.
Fixes: 910f4f9103 ("test: Don't require 64-bit prefixes in perf tests")
Link: https://archives.passt.top/passt-dev/20241119214344.6b4a5b3a@elisabeth/
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Fixes: 90e83d50a9 ("Don't take "our" MAC address from the host")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
It's not true that there's no mapping by default: there's no mapping
in the --map-guest-addr sense, by default, but in that case
the default --map-host-loopback behaviour prevails.
While at it, fix a typo.
Fixes: 57b7bd2a48 ("fwd, conf: Allow NAT of the guest's assigned address")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
RFC 9293, 3.8.4 says:
Implementers MAY include "keep-alives" in their TCP implementations
(MAY-5), although this practice is not universally accepted. Some
TCP implementations, however, have included a keep-alive mechanism.
To confirm that an idle connection is still active, these
implementations send a probe segment designed to elicit a response
from the TCP peer. Such a segment generally contains SEG.SEQ =
SND.NXT-1 and may or may not contain one garbage octet of data. If
keep-alives are included, the application MUST be able to turn them
on or off for each TCP connection (MUST-24), and they MUST default to
off (MUST-25).
but currently, tcp_data_from_tap() is not aware of this and will
schedule a fast re-transmit on the second keep-alive (because it's
also a duplicate ACK), ignoring the fact that the sequence number was
rewinded to SND.NXT-1.
ACK these keep-alive segments, reset the activity timeout, and ignore
them for the rest.
At some point, we could think of implementing an approximation of
keep-alive segments on outbound sockets, for example by setting
TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL, so that we send a single
keep-alive segment at approximately the same time, and never reset the
connection. That's beyond the scope of this fix, though.
Reported-by: Tim Besard <tim.besard@gmail.com>
Link: https://github.com/containers/podman/discussions/24572
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
We enter the timer handler with the ACK_TO_TAP_DUE flag, call
tcp_prepare_flags() with ACK_IF_NEEDED, and realise that we
acknowledged everything meanwhile, so we return early, but we also
need to reset that flag to avoid unnecessarily scheduling the timer
over and over again until more pending data appears.
I'm not sure if this fixes any real issue, but I've spotted this
in several logs reported by users, including one where we have some
unexpected bursts of high CPU load during TCP transfers at low rates,
from https://github.com/containers/podman/issues/23686.
Link: https://github.com/containers/podman/discussions/24572
Link: https://github.com/containers/podman/issues/23686
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
We recently added support for sending unsolicited NDP Router Advertisement
packets. While we (correctly) disable this if the --no-ra option is given
we incorrectly still send them if --no-ndp is set. Fix the oversight.
Fixes: 6e1e44293e ("ndp: Send unsolicited Router Advertisements")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
ndp_timer() is called right away on the first epoll_wait() cycle,
when the communication channel to the guest isn't ready yet:
1.0038: NDP: sending unsolicited RA, next in 264s
1.0038: tap: failed to send 1 frames of 1
check that it's up before sending it. This effectively delays the
first gratuitous router advertisement, which is probably a good idea
given that we expect the guest to send a router solicitation right
away.
Fixes: 6e1e44293e ("ndp: Send unsolicited Router Advertisements")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If passt or pasta are started as root, we need to read the passwd file
(be it /etc/passwd or whatever sssd provides) to find out UID and GID
of 'nobody' so that we can switch to it.
Instead of a bunch of allow rules for passwd_file_t and sssd macros,
use the more convenient auth_read_passwd() interface which should
cover our usage of getpwnam().
The existing rules weren't actually enough:
# strace -e openat passt -f
[...]
Started as root, will change to nobody.
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/lib64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 4
with corresponding SELinux warnings logged in audit.log.
Reported-by: Minxi Hou <mhou@redhat.com>
Analysed-by: Miloš Malik <mmalik@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently, our NDP implementation only sends Router Advertisements (RA)
when it receives a Router Solicitation (RS) from the guest. However,
RFC 4861 requires that we periodically send unsolicited RAs.
Linux as a guest also requires this: it will send an RS when a link first
comes up, but the route it gets from this will have a finite lifetime (we
set this to 65535s, the maximum allowed, around 18 hours). When that
expires the guest will not send a new RS, but instead expects the route to
have been renewed (if still valid) by an unsolicited RA.
Implement sending unsolicited RAs on a partially randomised timer, as
required by RFC 4861. The RFC also specifies that solicited RAs should
also be delayed, or even omitted, if the next unsolicited RA is soon
enough. For now we don't do that, always sending an immediate RA in
response to an RS. We can get away with this because in our use cases
we expect to just have passt itself and the guest on the link, rather than
a large broadcast domain.
Link: https://github.com/kubevirt/kubevirt/issues/13191
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We have an upcoming case where we need pseudo-random numbers to scatter
timings, but we don't need cryptographically strong random numbers. libc's
built in random() is fine for this purpose, but we should seed it. Extend
secret_init() - the only current user of random numbers - to do this as
well as generating the SipHash secret. Using /dev/random for a PRNG seed
is probably overkill, but it's simple and we only do it once, so we might
as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently secret_init() open codes getting good quality random bytes from
the OS, either via getrandom(2) or reading /dev/random. We're going to
add at least one more place that needs random data in future, so make a
general helper for getting random bytes. While we're there, fix a number
of minor bugs:
- getrandom() can theoretically return a "short read", so handle that case
- getrandom() as well as read can return a transient EINTR
- We would attempt to read data from /dev/random if we failed to open it
(open() returns -1), but not if we opened it as fd 0 (unlikely, but ok)
- More specific error reporting
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we open-code the lifetime of the route we advertise via NDP to be
65535s (the maximum). Change it to a #define.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
There are a number of places we can simply assign IPv6 addresses about,
rather than the current mildly ugly memcpy().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently the large ndp() function responds to all NDP messages we handle,
both parsing the message as necessary and sending the response. Split out
the code to construct and send specific message types into ndp_na() (to
send NA messages) and ndp_ra() (to send RA messages).
As well as breaking up an excessively large function, this is a first step
to being able to send unsolicited NDP messages.
While we're there, remove a slighty ugly goto.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
ndp() has a conditional on message type generating the reply message, then
a tiny amount of common code, then another conditional to send the reply
with slightly different parameters. We can make this a bit neater by
making a helper function for sending the reply, and call it from each of
the different message type paths.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
ndp() updates addr_seen or addr_ll_seen based on the source address of the
received packet. This is redundant since tap6_handler() has already
updated addr_seen for any type of packet, not just NDP.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We pass -I options to cppcheck so that it will find the system headers.
Then we need to pass a bunch more options to suppress the zillions of
cppcheck errors found in those headers.
It turns out, however, that it's not recommended to give the system headers
to cppcheck anyway. Instead it has built-in knowledge of the ANSI libc and
uses that as the basis of its checks. We do need to suppress
missingIncludeSystem warnings instead though.
Not bothering with the system headers makes the cppcheck runtime go from
~37s to ~14s on my machine, which is a pretty nice win.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If CLOSE_RANGE_UNSHARE isn't defined, we define a fallback version of
close_range() which is a (successful) no-op. This is broken in several
ways:
* It doesn't actually fix compile if using old kernel headers, because
the caller of close_range() still directly uses CLOSE_RANGE_UNSHARE
unprotected by ifdefs
* Even if it did fix the compile, it means inconsistent behaviour between
a compile time failure to find the value (we silently don't close files)
and a runtime failure (we die with an error from close_range())
* Silently not closing the files we intend to close for security reasons
is probably not a good idea in any case
We don't want to simply error if close_range() or CLOSE_RANGE_UNSHARE isn't
available, because that would require running on kernel >= 5.9. On the
other hand there's not really any other way to flush all possible fds
leaked by the parent (close() in a loop takes over a minute). So in this
case print a warning and carry on.
As bonus this fixes a cppcheck error I see with some different options I'm
looking to apply in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
util.h has some #ifdefs and weak definitions to handle compatibility with
various kernel versions. Move this to linux_dep.h which handles several
other similar cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
log.c has several #ifdefs on FALLOC_FL_COLLAPSE_RANGE that won't attempt
to use it if not defined. But even if the value is defined at compile
time, it might not be available in the runtime kernel, so we need to check
for errors from a fallocate() call and fall back to other methods.
Simplify this to only need the runtime check by using linux_dep.h to define
FALLOC_FL_COLLAPSE_RANGE if it's not in the kernel headers.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
I have no idea why, but these are reported by clang-tidy (19.2.1) on
Alpine (x86) only:
/home/sbrivio/passt/tap.c:1139:38: error: 'socket' should use SOCK_CLOEXEC where possible [android-cloexec-socket,-warnings-as-errors]
1139 | int fd = socket(AF_UNIX, SOCK_STREAM, 0);
| ^
| | SOCK_CLOEXEC
/home/sbrivio/passt/tap.c:1158:51: error: 'socket' should use SOCK_CLOEXEC where possible [android-cloexec-socket,-warnings-as-errors]
1158 | ex = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);
| ^
| | SOCK_CLOEXEC
/home/sbrivio/passt/tcp.c:1413:44: error: 'socket' should use SOCK_CLOEXEC where possible [android-cloexec-socket,-warnings-as-errors]
1413 | s = socket(af, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_TCP);
| ^
| | SOCK_CLOEXEC
/home/sbrivio/passt/util.c:188:38: error: 'socket' should use SOCK_CLOEXEC where possible [android-cloexec-socket,-warnings-as-errors]
188 | if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
| ^
| | SOCK_CLOEXEC
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
For some reason, this is only reported by clang-tidy 19.1.2 on
Alpine:
/home/sbrivio/passt/passt.c:314:53: error: conditional operator with identical true and false expressions [bugprone-branch-clone,-warnings-as-errors]
314 | nfds = epoll_wait(c.epollfd, events, EPOLL_EVENTS, TIMER_INTERVAL);
| ^
We do have a suppression, but not on the line preceding it, because
we also need a cppcheck suppression there. Use NOLINTBEGIN/NOLINTEND
for the clang-tidy suppression.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
On 32-bit architectures, clang-tidy reports:
/home/pi/passt/tcp.c:728:11: error: performing an implicit widening conversion to type 'uint64_t' (aka 'unsigned long long') of a multiplication performed in type 'unsigned long' [bugprone-implicit-widening-of-multiplication-result,-warnings-as-errors]
728 | if (v >= SNDBUF_BIG)
| ^
/home/pi/passt/util.h:158:22: note: expanded from macro 'SNDBUF_BIG'
158 | #define SNDBUF_BIG (4UL * 1024 * 1024)
| ^
/home/pi/passt/tcp.c:728:11: note: make conversion explicit to silence this warning
728 | if (v >= SNDBUF_BIG)
| ^
/home/pi/passt/util.h:158:22: note: expanded from macro 'SNDBUF_BIG'
158 | #define SNDBUF_BIG (4UL * 1024 * 1024)
| ^~~~~~~~~~~~~~~~~
/home/pi/passt/tcp.c:728:11: note: perform multiplication in a wider type
728 | if (v >= SNDBUF_BIG)
| ^
/home/pi/passt/util.h:158:22: note: expanded from macro 'SNDBUF_BIG'
158 | #define SNDBUF_BIG (4UL * 1024 * 1024)
| ^~~~~~~~~~
/home/pi/passt/tcp.c:730:15: error: performing an implicit widening conversion to type 'uint64_t' (aka 'unsigned long long') of a multiplication performed in type 'unsigned long' [bugprone-implicit-widening-of-multiplication-result,-warnings-as-errors]
730 | else if (v > SNDBUF_SMALL)
| ^
/home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL'
159 | #define SNDBUF_SMALL (128UL * 1024)
| ^
/home/pi/passt/tcp.c:730:15: note: make conversion explicit to silence this warning
730 | else if (v > SNDBUF_SMALL)
| ^
/home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL'
159 | #define SNDBUF_SMALL (128UL * 1024)
| ^~~~~~~~~~~~
/home/pi/passt/tcp.c:730:15: note: perform multiplication in a wider type
730 | else if (v > SNDBUF_SMALL)
| ^
/home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL'
159 | #define SNDBUF_SMALL (128UL * 1024)
| ^~~~~
/home/pi/passt/tcp.c:731:17: error: performing an implicit widening conversion to type 'uint64_t' (aka 'unsigned long long') of a multiplication performed in type 'unsigned long' [bugprone-implicit-widening-of-multiplication-result,-warnings-as-errors]
731 | v -= v * (v - SNDBUF_SMALL) / (SNDBUF_BIG - SNDBUF_SMALL) / 2;
| ^
/home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL'
159 | #define SNDBUF_SMALL (128UL * 1024)
| ^
/home/pi/passt/tcp.c:731:17: note: make conversion explicit to silence this warning
731 | v -= v * (v - SNDBUF_SMALL) / (SNDBUF_BIG - SNDBUF_SMALL) / 2;
| ^
/home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL'
159 | #define SNDBUF_SMALL (128UL * 1024)
| ^~~~~~~~~~~~
/home/pi/passt/tcp.c:731:17: note: perform multiplication in a wider type
731 | v -= v * (v - SNDBUF_SMALL) / (SNDBUF_BIG - SNDBUF_SMALL) / 2;
| ^
/home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL'
159 | #define SNDBUF_SMALL (128UL * 1024)
| ^~~~~
because, wherever we use those thresholds, we define the other term
of comparison as uint64_t. Define the thresholds as unsigned long long
as well, to make sure we match types.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Given that we're comparing against 'n', which is signed, we cast
TAP_BUF_BYTES to ssize_t so that the maximum buffer usage, calculated
as the difference between TAP_BUF_BYTES and ETH_MAX_MTU, will also be
signed.
This doesn't necessarily happen on 32-bit architectures, though. On
armhf and i686, clang-tidy 18.1.8 and 19.1.2 report:
/home/pi/passt/tap.c:1087:16: error: comparison of integers of different signs: 'ssize_t' (aka 'int') and 'unsigned int' [clang-diagnostic-sign-compare,-warnings-as-errors]
1087 | for (n = 0; n <= (ssize_t)TAP_BUF_BYTES - ETH_MAX_MTU; n += len) {
| ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cast the whole difference to ssize_t, as we know it's going to be
positive anyway, instead of relying on that side effect.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
cppcheck 2.14.2 on Alpine reports:
dhcpv6.c:431:32: style: Variable 'client_id' can be declared as pointer to const [constVariablePointer]
struct opt_hdr *ia, *bad_ia, *client_id;
^
It's not only 'client_id': we can declare 'ia' as const pointer too.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
cppcheck 2.16.0 reports:
dhcpv6.c:334:14: style: The comparison 'ia_type == 3' is always true. [knownConditionTrueFalse]
if (ia_type == OPT_IA_NA) {
^
dhcpv6.c:306:12: note: 'ia_type' is assigned value '3' here.
ia_type = OPT_IA_NA;
^
dhcpv6.c:334:14: note: The comparison 'ia_type == 3' is always true.
if (ia_type == OPT_IA_NA) {
^
this is not really the case as we set ia_type to OPT_IA_TA and then
jump back.
Anyway, there's no particular reason to use a goto here: add a trivial
foreach() macro to go through elements of an array and use it instead.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
In the NDP tests we search explicitly for a guest address with prefix
length 64. AFAICT this is an attempt to specifically find the SLAAC
assigned address, rather than something assigned by other means. We can do
that more explicitly by checking for .protocol == "kernel_ra". however.
The SLAAC prefixes we assigned *will* always be 64-bit, that's hard-coded
into our NDP implementation. RFC4862 doesn't really allow anything else
since the interface identifiers for an Ethernet-like link are 64-bits.
Let's actually verify that, rather than just assuming it, by extracting the
prefix length assigned in the guest and checking it as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When determining the namespace's IPv6 address in the perf test setup, we
explicitly filter for addresses with a 64-bit prefix length. There's no
real reason we need that - as long as it's a global address we can use it.
I suspect this was copied without thinking from a similar example in the
NDP tests, where the 64-bit prefix length _is_ meaningful (though it's not
entirely clear if the handling is correct there either).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently nstool die()s on essentially any error. In most cases that's
fine for our purposes. However, it's a problem when in "hold" mode and
getting an IO error on an accept()ed socket. This could just indicate that
the control client aborted prematurely, in which case we don't want to
kill of the namespace we're holding.
Adjust these to print an error, close() the control client socket and
carry on. In addition, we need to explicitly ignore SIGPIPE in order not
to be killed by an abruptly closed client connection.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
nstool in "exec" mode will propagate some signals (specifically SIGTERM) to
the process in the namespace it executes. The signal handler which
accomplishes this is called simply sig_handler(). However, it turns out
we're going to need some other signal handlers, so rename this to the more
specific sig_propagate().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
While experimenting with cppcheck options, I hit several false positives
caused by this bug: https://trac.cppcheck.net/ticket/13227
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We have an ASSERT() verifying that we're able to look up the flow in
udp_reply_sock_handler(). However, we dereference uflow before that in
an initializer, rather defeating the point. Rearrange to avoid that.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We don't modify this structure at all. For some reason cppcheck doesn't
catch this with our current options, but did when I was experimenting with
some different options.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_info.h exists just to contain a modern enough version of struct
tcp_info for our needs, removing compile time dependency on the version of
kernel headers. There are several other cases where we can remove similar
compile time dependencies on kernel version. Prepare for that by renaming
tcp_info.h to linux_dep.h.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
On certain architectures we get a warning about comparison between
different signedness integers in fwd_probe_ephemeral(). This is because
NUM_PORTS evaluates to an unsigned integer. It's a fixed value, though
and we know it will fit in a signed long on anything reasonable, so add
a cast to suppress the warning.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We supply a weak alias for ffsl() in case it's not defined in our libc.
Except.. we don't have any users for it any more, so remove it.
make cppcheck doesn't spot this at present for complicated reasons, but it
might with tweaks to the options I'm experimenting with.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
clangd's default configuration seems to try to treat .h files as C++ not
C. There are many more spurious warnings generated at present, but this
removes some of the most egregious ones.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We probe the available stack limit in the Makefile using rlimit, then use
that to set the size of the stack when we clone() extra threads. But
the rlimit at compile time need not be the same as the rlimit at runtime,
so that's not particularly sensible.
Ideally, we'd set the stack size based on an estimate of the actual
maximum stack usage of all our clone()ed functions. We don't have that
at the moment, but to keep things simple just set it to 1MiB - that's what
the current probe will set things to on my default configuration Fedora 40,
so it's likely to be fine in most cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We insert -DARCH for all compiles, based on TARGET_ARCH determined in the
Makefile. However, this is only used in qrap.c, not anywhere else in
passt or pasta. Only supply this -D when compiling qrap specifically.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we construct the AUDIT_ARCH variable in the Makefile, then pass
it into the C code with -D. The only place that uses it, though is the
BPF filter generated by seccomp.sh. seccomp.sh already needs to do things
differently depending on the arch, so it might as well just insert the
expanded AUDIT_ARCH directly into the generated code, rather than using
a #define. Arguably this is better, even, since it ensures more locally
that the arch the BPF checks for matches the arch seccomp.sh built the
filter for.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
NETNS_RUN_DIR is set in the Makefile, then passed into the C code with
-D. But NETNS_RUN_DIR is just a fixed string, it doesn't depend on any
make probes or variables, so there's really no reason to handle it via the
Makefile. Just move it to a plain #define in conf.c.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Since it's the size of a chunk of memory it would seem logical that
RTA_PAYLOAD() returns size_t. However, it doesn't - it explicitly casts
its result to an int. RTNH_OK(), which often takes the result of
RTA_PAYLOAD() as a parameter compares it to an int, so using size_t can
result in comparison of different-signed integer warnings from clang.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>