We download and use mbuto to build trivial boot images for our VM tests.
However, if mbuto is already cloned, we won't update it to the current
version. Add some make logic to ensure that we do this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently "make cppcheck" invokes cppcheck on ".", so it will check all the
.c and .h files it can find in the source tree. This isn't ideal, because
it can find files that aren't actually part of the real build, or even
stale files which aren't in git.
More practically, some upcoming changes are looking at downloading other
source trees for some tests. Static errors in there is Not Our Problem,
so checking them is both slow and pointless.
So, change the Makefile to invoke cppcheck only on the specific source
files that are part of the build. For some reason in this format the
badBitmaskCheck warnings in seccomp.h which were suppressed by 5beb3472e
("cppcheck: Avoid errors due to zeroes in bitwise ORs") no longer trigger.
That means we get unmatchedSuppression warnings instead. We add an
unmatchedSuppression suppression instead of simply removing the original
suppressions, just in case this odd behaviour isn't the same for all
cppcheck versions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Since f919dc7a4b1c ("conf, netlink: Don't require a default route to
start"), and since 639fdf06ede ("netlink: Fix selection of template
interface") less buggily, we haven't required a default route on the host
in order to operate. Instead, if we lack a default route we'll pick an
interface with any route, as long as there's only one such interface. If
there's more than one, we don't have a good criterion to pick, so we give
up with an informational message.
Paul Holzinger pointed out that this code considers it ambiguous even if
all but one of the interfaces has only routes to link-local addresses
(fe80::/10). A route to link-local addresses isn't really useful from
pasta's point of view, so ignore them instead. This removes a misleading
message in many cases, and a spurious failure in some cases.
Suggested-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We have a few places where we want to include the name of the internet
protocol version (IPv4 or IPv6) in a message, which we handle with an
open-coded ?: expression.
This seems like something that might be more widely useful, so make a
trivial helper to return the correct string based on the address family.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
As pasta duplicates host routes into the target namespaces, interface
indices might not match, so we go through RTA_OIF attributes and fix
them up to match the identifier in the namespace.
But RTA_OIF is not the ony attribute specifying interfaces for routes:
multipath routes use RTA_MULTIPATH attributes with nexthop objects,
which contain in turn interface indices. Fix them up as well.
If we don't, and we have at least two host interfaces, and the host
interface we use as template isn't the first one (hence the
mismatching indices), we'll fail to insert multipath routes with
nexthop objects, and ultimately refuse to start as the kernel
unexpectedly gives us ENODEV.
Link: https://github.com/containers/podman/issues/22192
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
From an original patch by Danish Prakash.
With commit ff22a78d7b52 ("pasta: Don't try to watch namespaces in
procfs with inotify, use timer instead"), if a filesystem-bound
target namespace is passed on the command line, we'll grab a handle
on its parent directory. That commit, however, didn't introduce a
matching AppArmor rule. Add it here.
To access a network namespace procfs entry, we also need a 'ptrace'
rule. See commit 594dce66d3bb ("isolation: keep CAP_SYS_PTRACE when
required") for details as to when we need this -- essentially, it's
about operation with Buildah.
Reported-by: Jörg Sonnenberger <joerg@bec.de>
Link: https://github.com/containers/buildah/issues/5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Fixes: ff22a78d7b52 ("pasta: Don't try to watch namespaces in procfs with inotify, use timer instead")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
With Podman's custom networks, pasta will typically need to open the
target network namespace at /run/user/<UID>/containers/networks:
grant access to anything under /run/user/<UID> instead of limiting it
to some subpath.
Note that in this case, Podman will need pasta to write out a PID
file, so we need write access, for similar locations, too.
Reported-by: Jörg Sonnenberger <joerg@bec.de>
Link: https://github.com/containers/buildah/issues/5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
For the policy to work as expected across either AppArmor commit
9d3f8c6cc05d ("parser: fix parsing of source as mount point for
propagation type flags") and commit 300889c3a4b7 ("parser: fix option
flag processing for single conditional rules"), we need one mount
rule with matching mount options as "source" (that is, without
source), and one without mount options and an explicit, empty source.
Link: https://github.com/containers/buildah/issues/5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we set ACK on flags packets only when the acknowledged byte
pointer has advanced, or we hadn't previously set a window. This means
in particular that we can send a window update with no ACK flag, which
doesn't appear to be correct. RFC 9293 requires a receiver to ignore such
a packet [0], and indeed it appears that every non-SYN, non-RST packet
should have the ACK flag.
The reason for the existing logic, rather than always forcing an ACK seems
to be to avoid having the packet mistaken as a duplicate ACK which might
trigger a fast retransmit. However, earlier tests in the function mean we
won't reach here if we don't have either an advance in the ack pointer -
which will already set the ACK flag, or a window update - which shouldn't
trigger a fast retransmit.
[0] https://www.ietf.org/rfc/rfc9293.html#section-3.10.7.4-2.5.2.1
Link: https://github.com/containers/podman/issues/22146
Link: https://bugs.passt.top/show_bug.cgi?id=84
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_send_flag() will sometimes force on the ACK flag for all !SYN packets.
This doesn't make sense for RST packets, where plain RST and RST+ACK have
somewhat different meanings. AIUI, RST+ACK indicates an abrupt end to
a connection, but acknowledges data already sent. Plain RST indicates an
abort, when one end receives a packet that doesn't seem to make sense in
the context of what it knows about the connection. All of the cases where
we send RSTs are the second, so we don't want an ACK flag, but we currently
could add one anyway.
Change that, so we won't add an ACK to an RST unless the caller explicitly
requests it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We have different paths for controlling the ACK flag for the SYN and !SYN
paths. This amounts to sometimes forcing on the ACK flag in the !SYN path
regardless of options. We can rearrange things to explicitly be that which
will make things neater for some future changes.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The DUP_ACK flag to tcp_send_flag() has two effects: first it forces the
setting of the ACK flag in the packet, even if we otherwise wouldn't.
Secondly, it causes a duplicate of the flags packet to be sent immediately
after the first.
Setting the ACK flag to tcp_send_flag() also has the first effect, so
instead of having DUP_ACK also do that, pass both flags when we need both
operations. This slightly simplifies the logic of tcp_send_flag() in a way
that makes some future changes easier.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In write_remainder() 'skip' is the offset to start the operation from
in the iovec array.
In iov_skip_bytes(), 'skip' is also the offset in the iovec array but
'offset' is the first unskipped byte in the iovec entry.
As write_remainder() uses 'skip' for both, 'skip' is reset to the
first unskipped byte in the iovec entry rather to staying the first
unskipped byte in the iovec array.
Fix the problem by introducing a new variable not to overwrite 'skip'
on each loop.
Fixes: 8bdb0883b441 ("util: Add write_remainder() helper")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Since f919dc7a4b1c ("conf, netlink: Don't require a default route to
start"), if there is only one host interface with routes, we will pick that
as the template interface, even if there are no default routes for an IP
version. Unfortunately this selection had a serious flaw: in some cases
it would 'return' in the middle of an nl_foreach() loop, meaning we
wouldn't consume all the netlink responses for our query. This could cause
later netlink operations to fail as we read leftover responses from the
aborted query.
Rewrite the interface detection to avoid this problem. While we're there:
* Perform detection of both default and non-default routes in a single
pass, avoiding an ugly goto
* Give more detail on error and working but unusual paths about the
situation (no suitable interface, multiple possible candidates, etc.).
Fixes: f919dc7a4b1c ("conf, netlink: Don't require a default route to start")
Link: https://bugs.passt.top/show_bug.cgi?id=83
Link: https://github.com/containers/podman/issues/22052
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2270257
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Use info(), not warn() for somewhat expected cases where one
IP version has no default routes, or no routes at all]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
same read() as families") changed netlink behaviour so that the
NLMSG_DONE terminating a bunch of responses can go in the same
datagram as those responses, rather than in a separate one.
Our netlink code is supposed to handle that behaviour, and indeed does
so for most cases, using the nl_foreach() macro. However, there was a
subtle error in nl_route_dup() which doesn't work with this change.
f00b1534 ("netlink: Don't try to get further datagrams in
nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
subtle error.
The problem arises because nl_route_dup(), unlike other cases doesn't
just make a single pass through all the responses to a netlink
request. It needs to get all the routes, then make multiple passes
through them. We don't really have anywhere to buffer multiple
datagrams, so we only support the case where all the routes fit in a
single datagram - but we need to fail gracefully when that's not the
case.
After receiving the first datagram of responses (with nl_next()) we
have a first loop scanning them. It needs to exit when either we run
out of messages in the datagram (!NLMSG_OK()) or when we get a message
indicating the last response (nl_status() <= 0).
What we do after the loop depends on which exit case we had. If we
saw the last response, we're done, but otherwise we need to receive
more datagrams to discard the rest of the responses.
We attempt to check for that second case by re-checking NLMSG_OK(nh,
status). However in the got-last-response case, we've altered status
from the number of remaining bytes to the error code (usually 0). That
means NLMSG_OK() now returns false even if it didn't during the loop
check. To fix this we need separate variables for the number of bytes
left and the final status code.
We also checked status after the loop, but this was redundant: we can
only exit the loop with NLMSG_OK() == true if status <= 0.
Reported-by: Martin Pitt <mpitt@redhat.com>
Fixes: f00b153414b1 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
Link: https://github.com/containers/podman/issues/22052
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The spec file patch by Dan Čermák was originally contributed at:
https://src.fedoraproject.org/rpms/passt/pull-request/1
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Paul reports that if pasta is configured with --dns-forward, and the
container queries a resolver which is configured on the host directly,
without using the address given for --dns-forward, we'll translate
the source address of the response pretending it's coming from the
address passed as --dns-forward, and the client will discard the
reply.
That is,
$ cat /etc/resolv.conf
198.51.100.1
$ pasta --config-net --dns-forward 192.0.2.1 nslookup passt.top
will not work, because we change the source address of the reply from
198.51.100.1 to 192.0.2.1. But the client contacted 198.51.100.1, and
it's from that address that it expects an answer.
Add a PORT_DNS_FWD flag for tap-facing ports, which is triggered by
activity in the opposite direction as the other flags. If the
tap-facing port was seen sending a DNS query that was remapped, we'll
remap the source address of the response, otherwise we'll leave it
unaffected.
Reported-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
There might be isolated testing environments where default routes and
global connectivity are not needed, a single interface has all
non-loopback addresses and routes, and still passt and pasta are
expected to work.
In this case, it's pretty obvious what our upstream interface should
be, so go ahead and select the only interface with at least one
route, disabling DHCP and implying --no-map-gw as the documentation
already states.
If there are multiple interfaces with routes, though, refuse to start,
because at that point it's really not clear what we should do.
Reported-by: Martin Pitt <mpitt@redhat.com>
Link: https://github.com/containers/podman/issues/21896
Signed-off-by: Stefano brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Martin reports that, with Fedora Linux kernel version
kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64,
including commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same
read() as families"), pasta doesn't exit once the network namespace
is gone.
Actually, pasta is completely non-functional, at least with default
options, because nl_route_dup(), which duplicates routes from the
parent namespace into the target namespace at start-up, is stuck on
a second receive operation for RTM_GETROUTE.
However, with that commit, the kernel is now able to fit the whole
response, including the NLMSG_DONE message, into a single datagram,
so no further messages will be received.
It turns out that commit 4d6e9d0816e2 ("netlink: Always process all
responses to a netlink request") accidentally relied on the fact that
we would always get at least two datagrams as a response to
RTM_GETROUTE.
That is, the test to check if we expect another datagram, is based
on the 'status' variable, which is 0 if we just parsed NLMSG_DONE,
but we'll also expect another datagram if NLMSG_OK on the last
message is false. But NLMSG_OK with a zero length is always false.
The problem is that we don't distinguish if status is zero because
we got a NLMSG_DONE message, or because we processed all the
available datagram bytes.
Introduce an explicit check on NLMSG_DONE. We should probably
refactor this slightly, for example by introducing a special return
code from nl_status(), but this is probably the least invasive fix
for the issue at hand.
Reported-by: Martin Pitt <mpitt@redhat.com>
Link: https://github.com/containers/podman/issues/22052
Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Paul Holzinger <pholzing@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
These two functions are typically used to calculate values to go into the
iov_base and iov_len fields of a struct iovec. They don't have to be used
for that, though. Rename them in terms of what they actually do: calculate
the base address and total length of the complete frame, including both L2
and tap specific headers.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Most times we send frames to the guest it goes via tap_send_frames().
However "slow path" protocols - ARP, ICMP, ICMPv6, DHCP and DHCPv6 - go
via tap_send().
As well as being a semantic duplication, tap_send() contains at least one
serious problem: it doesn't properly handle short sends, which can be fatal
on the qemu socket connection, since frame boundaries will get out of sync.
Rewrite tap_send() to call tap_send_frames(). While we're there, rename it
tap_send_single() for clarity.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We can both remove some variables which differ from others only in type,
and slightly improve type safety.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tap_send_frames() takes a vector of buffers and requires exactly one frame
per buffer. We have future plans where we want to have multiple buffers
per frame in some circumstances, so extend tap_send_frames() to take the
number of buffers per frame as a parameter.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Improve comment to rembufs calculation]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Paul reports that, with commit 15001b39ef1d ("conf: set the log level
much earlier"), early messages aren't reported to standard error
anymore.
The reason is that, once the log mask is changed from LOG_EARLY, we
don't force logging to stderr, and this mechanism was abused to have
early errors on stderr. Now that we drop LOG_EARLY earlier on, this
doesn't work anymore.
Call __openlog() as soon as we know the mode we're running as, using
LOG_PERROR. Then, once we detach, if we're not running from an
interactive terminal and logging to standard error is not forced,
drop LOG_PERROR from the options.
While at it, check if the standard error descriptor refers to a
terminal, instead of checking standard output: if the user redirects
standard output to /dev/null, they might still want to see messages
from standard error.
Further, make sure we don't print messages to standard error reporting
that we couldn't log to the system logger, if we didn't open a
connection yet. That's expected.
Reported-by: Paul Holzinger <pholzing@redhat.com>
Fixes: 15001b39ef1d ("conf: set the log level much earlier")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
POSIX.1-2008 declared gettimeofday() as obsolete, but I'm a dinosaur.
Usually, C libraries translate that to the clock_gettime() system
call anyway, but this doesn't happen in Jon's environment, and,
there, seccomp happily kills pasta(1) when started with --pcap,
because we didn't add gettimeofday() to our seccomp profiles.
Use clock_gettime() instead.
Reported-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
We might have read from resolv.conf, or from the command line, a
resolver that's reachable via loopback address, but that doesn't mean
we can offer that via DHCP, NDP or DHCPv6: warn if there are no
resolvers we can offer for a given IP version.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
...that is, call add_dns4() and add_dns6() instead of simply adding
those to the list of servers we advertise.
Most importantly, this will set the 'dns_host' field for the matching
IP version, so that, as mentioned in the man page, servers passed via
--dns are used for DNS mapping as well, if used in combination with
--dns-forward.
Reported-by: Paul Holzinger <pholzing@redhat.com>
Link: https://bugs.passt.top/show_bug.cgi?id=82
Fixes: 89678c515755 ("conf, udp: Introduce basic DNS forwarding")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Paul Holzinger <pholzing@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
In tap_send_frames(), if we failed to send all the frames, we must
only log the frames that have been sent, not all the frames we wanted
to send.
Fixes: dda7945ca9c9 ("pcap: Handle short writes in pcap_frame()")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we open code the calculation of the UDP checksum in
udp_update_hdr6(). We calling a helper to handle the IPv6 pseudo-header,
and preset the checksum field to 0 so an uninitialised value doesn't get
folded in. We already have a helper to do this: csum_udp6() which we use
in some slow paths. Use it here as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We carry around the source address as a pointer to a constant struct
in_addr. But it's silly to carry around a 4 or 8 byte pointer to a 4 byte
IPv4 address. Just copy the IPv4 address around by value.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The order of things in these functions is a bit odd for historical reasons.
We initialise some IP header fields early, the more later after making
some tests. Likewise we declare some variables without initialisation,
but then unconditionally set them to values we could calculate at the
start of the function.
Previous cleanups have removed the reasons for some of these choices, so
reorder for clarity, and where possible move the first assignment into an
initialiser.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
These functions take an index to the L2 buffer whose header information to
update. They use that for two things: to locate the buffer pointer itself,
and to retrieve the length of the received message from the paralllel
udp[46]_l2_mh_sock array. The latter is arguably a failure to separate
concerns. Change these functions to explicitly take a buffer pointer and
payload length as parameters.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In these functions we have 'dstport' for the destination port, but
'src_port' for the source port. Change the latter to 'srcport' for
consistency.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Each of these functions have 3 essentially identical loops in a row.
Merge the loops into a single common udp_sock_iov_init() function, calling
udp_sock[46]_iov_init_one() helpers to initialize each "slot" in the
various parallel arrays. This is slightly neater now, and more naturally
allows changes we want to make where more initialization will become common
between IPv4 and IPv6.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Starting from commit 3a2afde87dd1 ("conf, udp: Drop mostly duplicated
dns_send arrays, rename related fields"), we won't add to c->ip4.dns
and c->ip6.dns nameservers that can't be used by the guest or
container, and we won't advertise them.
However, the fact that we don't advertise any nameserver doesn't mean
that we didn't find any, and we should warn only if we couldn't find
any.
This is particularly relevant in case both --dns-forward and
--no-map-gw are passed, and a single loopback address is listed in
/etc/resolv.conf: we'll forward queries directed to the address
specified by --dns-forward to the loopback address we found, we
won't advertise that address, so we shouldn't warn: this is a
perfectly legitimate usage.
Reported-by: Paul Holzinger <pholzing@redhat.com>
Link: https://github.com/containers/podman/issues/19213
Fixes: 3a2afde87dd1 ("conf, udp: Drop mostly duplicated dns_send arrays, rename related fields")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Paul Holzinger <pholzing@redhat.com>
Currently ping sockets use a custom epoll reference type which includes
the ICMP id. However, now that we have entries in the flow table for
ping flows, finding that is sufficient to get everything else we want,
including the id. Therefore remove the icmp_epoll_ref type and use the
general 'flowside' field for ping sockets.
Having done this we no longer need separate EPOLL_TYPE_ICMP and
EPOLL_TYPE_ICMPV6 reference types, because we can easily determine
which case we have from the flow type. Merge both types into
EPOLL_TYPE_PING.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Use flow_dbg() and flow_err() helpers to generate flow-linked error
messages in most places. Make a few small improvements to the messages
while we're at it. This allows us to avoid the awkward 'pname' variables
since whether we're dealing with ICMP or ICMPv6 is already built into the
flow type which these helpers include.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Coding style fix in icmp_tap_handler()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently icmp_id_map[][] stores information about ping sockets in a
bespoke structure. Move the same information into new types of flow
in the flow table. To match that change, replace the existing ICMP
timer with a flow-based timer for expiring ping sockets. This has the
advantage that we only need to scan the active flows, not all possible
ids.
We convert icmp_id_map[][] to point to the flow table entries, rather
than containing its own information. We do still use that array for
locating the right ping flows, rather than using a "flow native" form
of lookup for the time being.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Update id_sock description in comment to icmp_ping_new()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
instead of htons_constant(), which is for... constants.
Fixes: 5bf200ae8a1a ("tcp, udp: Don't include destination address in partially precomputed csums")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
As pointed out in review, the documentation comments for iov_skip_bytes()
are more confusing than they should be. Reword them, including updating
parameter names, to make it clearer.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function
tcp_fill_header().
Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to
tcp_fill_headers4() and tcp_fill_headers6()
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-ID: <20240303135114.1023026-10-lvivier@redhat.com>
[dwg: Correct commit message with new function names]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Use ethhdr rather than tap_hdr.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-9-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The TCP and UDP checksums are computed using the data in the TCP/UDP
payload but also some informations in the IP header (protocol,
length, source and destination addresses).
We add two functions, proto_ipv4_header_psum() and
proto_ipv6_header_psum(), to compute the checksum of the IP
header part.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-ID: <20240303135114.1023026-8-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We can find the same function to compute the IPv4 header
checksum in tcp.c, udp.c and tap.c
Use the function defined for tap.c, csum_ip4_header(), but
with the code used in tcp.c and udp.c as it doesn't need a fully
initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-7-lvivier@redhat.com>
[dwg: Fix weird cppcheck regression; it appears to be a problem
in pre-existing code, but somehow this patch is exposing it]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
in udp_update_hdr4():
Assign the source address to src, either b->s_in.sin_addr,
c->ip4.dns_match or c->ip4.gw and then set b->iph.saddr to src->s_addr.
in udp_update_hdr6():
Assign the source address to src, either b->s_in6.sin6_addr,
c->ip6.dns_match, c->ip6.gw or c->ip6.addr_ll.
Assign the destination to dst, either c->ip6.addr_seen or
&c->ip6.addr_ll_seen.
Then set dst to b->ip6h.daddr and src to b->ip6h.saddr.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-ID: <20240303135114.1023026-6-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Introduce ip.[ch] file to encapsulate IP protocol handling functions and
structures. Modify various files to include the new header ip.h when
it's needed.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-5-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Introduce the function csum_unfolded() that computes the unfolded
32-bit checksum of a data buffer, and call it from csum() that returns
the folded value.
Introduce csum_iov() that computes the checksum using csum_folded() on
all vectors of the iovec array and returns the folded result.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-4-lvivier@redhat.com>
[dwg: Fixed trivial cppcheck & clang-tidy regressions]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If buffer is not aligned use sum_16b() only on the not aligned
part, and then use csum_avx2() on the remaining part
Remove unneeded now function csum_unaligned().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-3-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>