When studying the Linux source code and Wireshark dumps, it seems like
the no_frag flag in the IPv4 header is always set. Discussions on the
Internet about this subject indicate that modern routers never fragment
packets, and that fragmentation isn't even supported in many cases.
Adding to this that incoming messages forwarded on the tap interface
never even pass through a router, it seems safe to always set this flag.
This makes the IPv4 headers of forwarded messages identical to those
sent by the external sockets, something we must consider desirable.
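For illustration only, setting the flag when building a header amounts
to something like this (a minimal sketch, not the actual code path;
IP_DF comes from <netinet/ip.h>):

  #include <netinet/ip.h>
  #include <arpa/inet.h>

  static void set_no_frag(struct iphdr *iph)
  {
          iph->frag_off = htons(IP_DF);   /* DF set, fragment offset 0 */
  }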
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Otherwise we build it, but we don't install it. Not an issue that
warrants a release right away, as it's usable anyway.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...instead of unconditionally trying to enable both: mmap2() is the
32-bit ARM variant of mmap() (and perhaps of other architectures too),
but if mmap() is available, valgrind will use that one.
This avoids seccomp.sh warning us about missing mmap2() if mmap() is
present, and is consistent with what we do in vhost-user code.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
On the command line -m 0 means "don't assign an MTU" (letting the guest
use its default). However, internally we use (c->mtu == -1) to represent
that
state. We use (c->mtu == 0) to represent "the user didn't specify on the
command line, so use the default" - but this is only used during conf(),
never afterwards.
This is unnecessarily confusing. We can instead just initialise c->mtu to
its default (65520) before parsing options and use 0 on both the command
line and internally to represent the "don't assign" special case. This
ensures that c->mtu is always 0..65535, so we can store it in a uint16_t
which is more natural.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We're a bit sloppy with parsing MTU which can lead to some surprising,
though fairly harmless, results:
* Passing a non-number like '-m xyz' will not give an error and act like
-m 0
* Junk after a number (e.g. '-m 1500pqr') will be ignored rather than
giving an error
* We parse the MTU as a long, then immediately assign to an int, so on
some platforms certain ludicrously out of bounds values will be
silently truncated, rather than giving an error
Be a bit more thorough with the error checking to avoid that.
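A stricter parse looks roughly like this (sketch only; the function
name and surroundings are illustrative, not the actual implementation):

  #include <errno.h>
  #include <limits.h>
  #include <stdlib.h>

  /* Return the MTU from 'arg', or -EINVAL on junk or out-of-range input */
  static long parse_mtu(const char *arg)
  {
          char *end;
          long mtu;

          errno = 0;
          mtu = strtol(arg, &end, 0);
          if (errno || end == arg || *end || mtu < 0 || mtu > USHRT_MAX)
                  return -EINVAL;

          return mtu;
  }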
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The migration code introduced a number of 'foreach' macros to traverse the
flow table. These aren't inherently tied to migration, so polish up their
naming, move them to flow_table.h, and also use them in flow_defer_handler(),
which is the other place we need to traverse the whole table.
For now we keep foreach_established_tcp_flow() as is.
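The shape of such a macro is roughly the following (illustrative only,
not the exact definition; handling of free slots is omitted, and the
index, when needed, is simply (flow) - flowtab):

  #define flow_foreach(flow)                                            \
          for ((flow) = flowtab; (flow) < flowtab + FLOW_MAX; (flow)++)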
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The foreach macros used to step through flows each take a 'bound' parameter
to only scan part of the flow table. Only one place actually passes a
bound different from FLOW_MAX. So we can simplify every other invocation
by having that one case manually handle the bound.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The foreach macros are odd in that they take two loop counters: an integer
index, and a pointer to the flow. We nearly always want the latter, not
the former, and we can get the index from the pointer trivially when we
need it. So, rearrange the macros not to need the integer index.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Our general logging helpers include a number of _perror() variants
which, like perror(3), include a description of the current errno. We
didn't have those for our flow-specific logging helpers, though. Fill
this gap
with flow_perror() and flow_dbg_perror(), and use them where it's useful.
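Conceptually, these are along the lines of (simplified sketch, not the
actual definition; it assumes the format string is a literal):

  #define flow_perror(f, fmt, ...)                                      \
          flow_err((f), fmt ": %s", ##__VA_ARGS__, strerror(errno))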
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_flow_migrate_source_ext() is passed both the index of the flow it
operates on and the pointer to the connection structure. However, the
former is trivially derived from the latter. Simplify the interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This function existed in drafts of the migration code, but not the final
version. Get rid of the prototype.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_flow_migrate_target_ext() takes a raw union flow *, although it is TCP
specific, and requires a FLOW_TYPE_TCP entry. Our usual convention is that
such functions should take a struct tcp_tap_conn * instead. Convert it to
do so.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
head_cnt is a global variable which tracks how many entries in head[] are
currently used. The fact that it's global obscures the fact that the
lifetime over which it has a meaningful value is quite short: a single
call to tcp_vu_data_from_sock().
Make it a local to tcp_vu_data_from_sock() to make that lifetime clearer.
We keep the head[] array global for now - although technically it has the
same valid lifetime - because it's large enough we might not want to put
it on the stack.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The uses of this macro were removed in d4598e1d18ac ("udp: Use the same
buffer for the L2 header for all frames").
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Fundamentally what packet_check_range() does is to check whether a given
memory range is within the allowed / expected memory set aside for packets
from a particular pool. That range could represent a whole packet (from
packet_add_do()) or part of a packet (from packet_get_do()), but it doesn't
really matter which.
However, we pass the start of the range as two parameters: @start which is
the start of the packet, and @offset which is the offset within the packet
of the range we're interested in. We never use these separately, only as
(start + offset). Simplify the interface of packet_check_range() and
vu_packet_check_range() to directly take the start of the relevant range.
This will allow some additional future improvements.
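That is, roughly (a sketch of the reworked shape; names and the exact
signature are illustrative):

  /* Check whether [ptr, ptr + len) is within the area expected for 'p' */
  static bool packet_check_range(const struct pool *p,
                                 const char *ptr, size_t len);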
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we have a dummy pkt[1] array, which we alias with an array of
a different size via various macros. However, we already require C11 which
includes flexible array members, so we can do better.
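That is, roughly (illustrative sketch, field names simplified):

  struct pool {
          size_t size, count;
          struct iovec pkt[];   /* C11 flexible array member */
  };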
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Option 255 (end of options) doesn't need a length byte. Stop emitting
one, which leaves one extra byte available for the other, dynamic
options.
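For illustration (sketch only; 'opts' and 'offset' are hypothetical
names, not the actual code):

  opts[offset++] = 53;          /* message type: code...        */
  opts[offset++] = 1;           /* ...length...                 */
  opts[offset++] = 5;           /* ...and data (DHCPACK)        */
  opts[offset++] = 255;         /* End of options, single octet */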
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
PCAP=1 ./run migrate/bidirectional gives an overview of how the
whole thing is working.
Add 12 tests in total, checking basic functionality with and without
flows in both directions, with and without sockets in half-closed
states (both inbound and outbound), migration behaviour under traffic
flood, under traffic flood with > 253 flows, and strict checking of
sequences under flood with ramp patterns in both directions.
These tests need preparation and teardown for each case, as we need
to restore the source guest in its own context and pane before we can
test again. Eventually, we could consider alternating source and
target so that we don't need to restart from scratch every time, but
that's beyond the scope of this initial test implementation.
Trick: './run migrate/*' runs all the tests with preparation and
teardown steps.
Co-authored-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This implements flow preparation on the source, transfer of data with
a format roughly inspired by struct tcp_tap_conn, plus a specific
structure for parameters that don't fit in the flow table, and flow
insertion on the target, with all the appropriate window options,
window scaling, MSS, etc.
Contents of pending queues are transferred as well.
The target side is rather convoluted because we first need to create
sockets and switch them to repair mode, before we can apply options
that are *not* stored in the flow table. This also means that, if
we're testing this on the same machine, in the same namespace, we need
to close the listening socket on the source before we can start moving
data.
Further, we need to connect() the socket on the target before we can
restore data queues, but we can't do that (again, on the same machine)
as long as the matching source socket is open, which implies an
arbitrary limit on queue sizes we can transfer, because we can only
dump pending queues on the source as long as the socket is open, of
course.
Co-authored-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In commit e5eefe77435a ("tcp: Refactor to use events instead of
states, split out spliced implementation"), this:
if (!bitmap_isset(rcvlowat_set, conn - ts) &&
readlen > (long)c->tcp.pipe_size / 10) {
(note the !) became:
if (conn->flags & lowat_set_flag &&
readlen > (long)c->tcp.pipe_size / 10) {
in the new tcp_splice_sock_handler().
The intention there is to set SO_RCVLOWAT only if we haven't already
set it. But, with the inverted check, we only set it if it's already
set, that is, never.
Fix the check and re-enable the functionality, which should give us
improved CPU utilisation in non-interactive cases where we are not
transferring at full pipe capacity.
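That is, the condition needs its negation back, roughly:

  if (!(conn->flags & lowat_set_flag) &&
      readlen > (long)c->tcp.pipe_size / 10) {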
Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation")
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If we set the OUT_WAIT_* flag (waiting on EPOLLOUT) for a side of a
given flow, it means that we're blocked, waiting for the receiver to
actually receive data, with a full pipe.
In that case, if we keep EPOLLIN set for the socket on the other side
(our receiving side), we'll get into a loop such as:
41.0230: pasta: epoll event on connected spliced TCP socket 108 (events: 0x00000001)
41.0230: Flow 1 (TCP connection (spliced)): -1 from read-side call
41.0230: Flow 1 (TCP connection (spliced)): -1 from write-side call (passed 8192)
41.0230: Flow 1 (TCP connection (spliced)): event at tcp_splice_sock_handler:577
41.0230: pasta: epoll event on connected spliced TCP socket 108 (events: 0x00000001)
41.0230: Flow 1 (TCP connection (spliced)): -1 from read-side call
41.0230: Flow 1 (TCP connection (spliced)): -1 from write-side call (passed 8192)
41.0230: Flow 1 (TCP connection (spliced)): event at tcp_splice_sock_handler:577
leading to 100% CPU usage, of course.
Drop EPOLLIN on our receiving side as long as we're waiting for
output readiness on the other side.
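Schematically (sketch only, flag and variable names are illustrative):

  /* While the other side waits for EPOLLOUT, don't watch for input */
  uint32_t events = EPOLLRDHUP;

  if (!(conn->flags & OUT_WAIT_OTHER_SIDE))
          events |= EPOLLIN;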
Link: https://github.com/containers/podman/issues/23686#issuecomment-2661036584
Link: https://www.reddit.com/r/podman/comments/1iph50j/pasta_high_cpu_on_podman_rootless_container/
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
GET_VRING_BASE stops the queue, clearing the call and kick fds. However,
we don't clear vring.avail. That means that if vu_queue_notify() is called
it won't realise the queue isn't ready and will die with an EBADFD.
We get this during migration, because for some reason, qemu reconfigures
the vhost-user device when a migration is triggered. There's a window
between the GET_VRING_BASE and re-establishing the call fd where the
notify function can be called, causing a crash.
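A minimal sketch of the idea (not the actual field layout):

  /* In the GET_VRING_BASE handler: mark the queue as not ready */
  vq->vring.avail = NULL;

  /* In vu_queue_notify(): bail out if the queue isn't ready */
  if (!vq->vring.avail)
          return;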
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
I added this a long long time ago because it dramatically improved
throughput back then: with rmem_max and wmem_max >= 4 MiB, we would
force send and receive buffer sizes for TCP sockets to the maximum
allowed value.
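That is, we'd do roughly this for each TCP socket (sketch; 'v' being
the maximum allowed buffer size):

  setsockopt(s, SOL_SOCKET, SO_RCVBUF, &v, sizeof(v));
  setsockopt(s, SOL_SOCKET, SO_SNDBUF, &v, sizeof(v));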
This effectively disables TCP auto-tuning, which would otherwise allow
us to exceed those limits, as crazy as it might sound. But in any
case, it made sense.
Now that we have zero (internal) copies on every path, plus vhost-user
support, it turns out that these settings are entirely obsolete. I get
substantially the same throughput in every test we perform, even with
very short durations (one second).
The settings are not just useless: they actually cause us quite some
trouble on guest state migration, because they lead to huge queues
that need to be moved as well.
Drop those settings.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Once we get a FIN segment from the container/guest, we enter something
resembling CLOSE_WAIT (from the perspective of the peer), but that
doesn't mean that we should stop processing window updates from the
guest and checking for socket data if the guest acknowledges
something.
If we don't do that, we can very easily run into a situation where we
send a burst of data to the tap, get a zero window update, along with
a FIN segment, because the flow is meant to be unidirectional, and now
the connection will be stuck forever, because we'll ignore updates.
Reproducer, server:
$ pasta --config-net -t 9999 -- sh -c 'echo DONE | socat TCP-LISTEN:9997,shut-down STDIO'
and client:
$ ./test/rampstream send 50000 | socat -u STDIN TCP:$LOCAL_ADDR:9997
2025/02/13 09:14:45 socat[2997126] E write(5, 0x55f5dbf47000, 8192): Broken pipe
While at it, update the message string for the third passive close
state (which we see in this case): it's CLOSE_WAIT, not LAST_ACK.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This doesn't actually belong to passt's own policy: we should export
an interface and libvirt's policy should use it, because passt's
policy shouldn't be aware of svirt_image_t at all.
However, libvirt doesn't maintain its own policy, which makes policy
updates rather involved. Add this workaround to ensure --vhost-user
is working in combination with libvirt, as it might take ages before
we can get the proper rule in libvirt's policy.
Reported-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...other than being convenient, they might be reasonably
representative of typical stand-alone usage.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
So that we can bind inbound sockets to specific addresses, like we
already do for outbound sockets.
While at it, change the error message in tcp_conn_from_tap() to match
this one.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This will close all the sockets we currently have open in repair mode,
and completes our migration tasks as source. If the hypervisor wants
to have us back at this point, somebody needs to restart us.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In vhost-user mode, by default, create a second UNIX domain socket
accepting connections from passt-repair, with the usual listener
socket.
When we need to set or clear TCP_REPAIR on sockets, we'll send them
via SCM_RIGHTS to passt-repair, which sets the socket option values we
ask for.
To that end, introduce batched functions to request TCP_REPAIR
settings on sockets, so that we don't have to send a single message
for each socket, on migration. When needed, repair_flush() will
send the message and check for the reply.
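The fd passing itself is the usual SCM_RIGHTS dance, roughly (sketch
of a single-fd transfer; names such as send_fd() and 'cmd' are
illustrative, and the real code batches multiple descriptors):

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  static int send_fd(int repair_sock, int fd, int8_t cmd)
  {
          union {
                  char buf[CMSG_SPACE(sizeof(int))];
                  struct cmsghdr align;
          } u = { 0 };
          struct iovec iov = { .iov_base = &cmd, .iov_len = sizeof(cmd) };
          struct msghdr msg = { 0 };
          struct cmsghdr *cmsg;

          msg.msg_iov = &iov;
          msg.msg_iovlen = 1;
          msg.msg_control = u.buf;
          msg.msg_controllen = sizeof(u.buf);

          cmsg = CMSG_FIRSTHDR(&msg);
          cmsg->cmsg_level = SOL_SOCKET;
          cmsg->cmsg_type = SCM_RIGHTS;
          cmsg->cmsg_len = CMSG_LEN(sizeof(int));
          memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

          return sendmsg(repair_sock, &msg, 0) < 0 ? -1 : 0;
  }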
Co-authored-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Most of the information in struct ctx doesn't need to be migrated.
Either it's strictly back end information which is allowed to differ
between the two ends, or it must already be configured identically on
the two ends.
There are a few exceptions though. In particular passt learns several
addresses of the guest by observing what it sends out. If we lose
this information across migration we might get away with it, but if
there are active flows we might misdirect some packets before
re-learning the guest address.
Avoid this by migrating the guest's observed addresses.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Coding style stuff, comments, etc.]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Introduce facilities for guest migration on top of the current
vhost-user infrastructure, moving vu_migrate() and related functions
to migrate.c.
Versioned migration stages define function pointers to be called on
source or target, or data sections that need to be transferred.
The migration header consists of a magic number, a version number for the
encoding, and a "compat_version" which represents the oldest version which
is compatible with the current one. We don't use it yet, but that allows
for the future possibility of backwards compatible protocol extensions.
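The header is, roughly (sketch; field widths and names are
illustrative):

  struct migrate_header {
          uint32_t magic;          /* identifies the stream     */
          uint32_t version;        /* encoding of this sender   */
          uint32_t compat_version; /* oldest compatible version */
  };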
Co-authored-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
head_cnt represents the number of frames we're going to forward to the
guest in tcp_vu_sock_recv(), each of which could require multiple
buffers ("elements"). We initialise it with as many frames as we can
find space for in vu buffers, and we then need to adjust it down to
the number of frames we actually (partially) filled.
We adjust it down based on the number of individual buffers used by the
data from recvmsg(). At this point 'i' is *one greater than* that
number of buffers, so we need to discard all (unused) frames with a
buffer index >= i, instead of > i.
Reported-by: David Gibson <david@gibson.dropbear.id.au>
[david: Contributed actual commit message]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This probably doesn't cover all the cases where we should send a
zero-window probe, but it's rather unobtrusive and obvious, so start
from here, also because I just observed this case (without the fix
from the previous patch, to take into account window information from
keep-alive segments).
If we hit the ACK timeout, and try re-sending data from the socket,
if the window is zero, we'll just fail again, go back to the timer,
and so on, until we hit the maximum number of re-transmissions and
reset the connection.
Don't do that: forcibly try to send something by implementing the
equivalent of a zero-window probe in this case.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
It looks like a detail, but it's critical if we're dealing with
somebody, such as our near-future selves, using TCP_REPAIR to migrate TCP
connections in the guest or container.
The last packet sent from the 'source' process/guest/container
typically reports a small window, or zero, because the guest/container
hadn't been draining it for a while.
The next packet, appearing as the target sets TCP_REPAIR_OFF on the
migrated socket, is a keep-alive (also called "window probe" in CRIU
or TCP_REPAIR-related code), and it comes with an updated window
value, reflecting the pre-migration "regular" value.
If we ignore it, it might take a while/forever before we realise we
can actually restart sending.
Fixes: 238c69f9af45 ("tcp: Acknowledge keep-alive segments, ignore them for the rest")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Both DHCPv4 and DHCPv6 have the capability to pass a hostname to
clients: DHCPv4 uses option 12 (hostname), while DHCPv6 uses option 39
(client FQDN). Some virt deployments, such as kubevirt, expect the
VirtualMachine name to be used as the guest hostname.
This change adds the following arguments:
- -H --hostname NAME to configure the hostname DHCPv4 option (12)
- --fqdn NAME to configure the client FQDN option for both DHCPv4 (81)
  and DHCPv6 (39)
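For instance (illustrative invocation, other options omitted):

  $ passt -H myvm --fqdn myvm.example.test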
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This should be a relatively common case, and I'm a bit surprised it's
been broken since I added the "gateway mapping" functionality. But it
doesn't happen with Podman, nor with systemd-resolved or similar local
proxies, nor with servers where the gateway is typically just a router
and not a DNS resolver. That could be why nobody noticed until now.
By default, we'll map the address of the default gateway, in
containers and guests, to represent "the host", so that we have a
well-defined way to reach the host. Say:
0.0029: NAT to host 127.0.0.1: 192.168.100.1
But if the host gateway is also a DNS resolver:
0.0029: DNS:
0.0029: 192.168.100.1
then we'll send DNS queries directed to it to the host instead:
0.0372: Flow 0 (INI): TAP [192.168.100.157]:41892 -> [192.168.100.1]:53 => ?
0.0372: Flow 0 (TGT): INI -> TGT
0.0373: Flow 0 (TGT): TAP [192.168.100.157]:41892 -> [192.168.100.1]:53 => HOST [0.0.0.0]:41892 -> [127.0.0.1]:53
0.0373: Flow 0 (UDP flow): TGT -> TYPED
0.0373: Flow 0 (UDP flow): TAP [192.168.100.157]:41892 -> [192.168.100.1]:53 => HOST [0.0.0.0]:41892 -> [127.0.0.1]:53
0.0373: Flow 0 (UDP flow): Side 0 hash table insert: bucket: 31049
0.0374: Flow 0 (UDP flow): TYPED -> ACTIVE
0.0374: Flow 0 (UDP flow): TAP [192.168.100.157]:41892 -> [192.168.100.1]:53 => HOST [0.0.0.0]:41892 -> [127.0.0.1]:53
which doesn't quite work, of course:
0.0374: pasta: epoll event on UDP reply socket 95 (events: 0x00000008)
0.0374: ICMP error on UDP socket 95: Connection refused
unless the host is a resolver itself... but then we wouldn't find the
address of the gateway in its /etc/resolv.conf, presumably.
Fix this by making an exception for DNS traffic: if the default
gateway is a resolver, match on DNS traffic going to the default
gateway, and explicitly forward it to the configured resolver.
Reported-by: Prafulla Giri <prafulla.giri@protonmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
It looks like me, myself and I couldn't agree on the "simple" protocol
between passt and passt-repair. The man page and passt say it's one
confirmation per command, but the passt-repair implementation had one
confirmation per socket instead.
This caused all sorts of mysterious issues with repair mode
pseudo-randomly enabled, and leading to hours of fun (mostly not
mine). Oops.
Switch to one confirmation per command (of course).
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
The logic composing the DHCP reply reuses the request message to build
it. Future long options, such as FQDN, may exceed the size of the
request message and overrun its lower bound.
Create a new reply message instead, with a fixed option size of 308
bytes, fill in the relevant fields from the request, and add the
generated options on top. This way, the lower bound of the reply no
longer depends on the request.
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
While main() conventionally returns int, and we need a return at the
end of the function to avoid compiler warnings, turning that return
into _exit() to avoid exit handlers triggers a Coverity warning. It's
unreachable code anyway, so switch that single occurrence back to a
plain return.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
There's no inverse function for CMSG_LEN(), so we need to loop over
SCM_MAX_FD (253) possible input values. The previous calculation is
clearly wrong, as not every int takes CMSG_LEN(sizeof(int)) in cmsg
data.
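Conceptually, a sketch of the loop (SCM_MAX_FD being the 253 mentioned
above):

  int fds = -1, i;

  for (i = 0; i <= SCM_MAX_FD; i++) {
          if (CMSG_LEN(sizeof(int) * i) == cmsg->cmsg_len) {
                  fds = i;      /* number of descriptors in this message */
                  break;
          }
  }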
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If we use glibc's perror(), we need to allow dup() and fcntl() in our
seccomp profiles, which are a bit too much for this simple helper. On
top of that, we would probably need a wrapper to avoid allocation for
translated messages.
While at it: ECONNRESET is just a close() from passt, treat it like
EOF.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If libvirtd is triggered by an unprivileged user, the virt-aa-helper
mechanism doesn't work, because per-VM profiles can't be instantiated,
and as a result libvirtd runs unconfined.
This means passt can't start, because the passt subprofile from
libvirt's profile is not loaded either.
Example:
$ virsh start alpine
error: Failed to start domain 'alpine'
error: internal error: Child process (passt --one-off --socket /run/user/1000/libvirt/qemu/run/passt/1-alpine-net0.socket --pid /run/user/1000/libvirt/qemu/run/passt/1-alpine-net0-passt.pid --tcp-ports 40922:2) unexpected fatal signal 11
Add an annoying workaround for the time being. Much better than
encouraging users to start guests as root, or to disable AppArmor
altogether.
Reported-by: Prafulla Giri <prafulla.giri@protonmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...perhaps I should adopt the healthy habit of actually reading
headers instead of using my mental copy.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
When building against musl headers:
- sizeof() needs stddef.h, as it should be;
- we can't initialise a struct msghdr by simply listing fields in
order, as they contain explicit padding fields. Use field names
instead.
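That is, something like this (sketch):

  struct iovec iov = { 0 };
  struct msghdr msg = {
          .msg_iov    = &iov,
          .msg_iovlen = 1,
  };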
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>