passt

mirror of https://passt.top/passt synced 2025-02-24 11:52:19 +00:00

Author	SHA1	Message	Date
David Gibson	0349cf637f	util: Rename and make global vu_remove_watch() vu_remove_watch() is used in vhost_user.c to remove an fd from the global epoll set. There's nothing really vhost user specific about it though, so rename, move to util.c and use it in a bunch of places outside vhost_user.c where it makes things marginally more readable. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-03 07:32:51 +01:00
David Gibson	10c4a9e1b3	tcp: Always pass NULL event with EPOLL_CTL_DEL In tcp_epoll_ctl() we pass an event pointer with EPOLL_CTL_DEL, even though it will be ignored. It's possible this was a workaround for pre-2.6.9 kernels which required a non-NULL pointer here, but we rely on the kernel accepting NULL events for EPOLL_CTL_DEL in lots of other places. Use NULL instead for simplicity and consistency. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-03 07:32:37 +01:00
Laurent Vivier	dd6a6854c7	vhost-user: Implement an empty VHOST_USER_SEND_RARP command Passt cannot manage and doesn't need to manage the broadcast of a fake RARP, but QEMU will report an error message if Passt doesn't implement it. Implement an empty SEND_RARP command to silence QEMU error message. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-24 21:40:05 +01:00
Stefano Brivio	d477a1fb03	netlink: Skip loopback interface while looking for a template There might be reasons to have routes on the loopback interface, for example Any-IP/AnyIP routes as implemented by Linux kernel commit ab79ad14a2d5 ("ipv6: Implement Any-IP support for IPv6."). If we use the loopback interface as a template, though, we'll pick 'lo' (typically) as interface name for our tap interface, but we'll already have an interface called 'lo' in the target namespace, and as we TUNSETIFF on it, we'll fail with EINVAL, because it's not a tap interface. Skip the loopback interface while looking for a template interface or, more accurately, skip the interface with index 1. Strictly speaking, we should fetch interface flags via RTM_GETLINK instead, and check for IFF_LOOPBACK, but interleaving that request while we're iterating over routes is unnecessarily complicated. Link: https://www.reddit.com/r/podman/comments/1i6pj7u/starting_pod_without_external_network/ Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-01-24 21:39:52 +01:00
Laurent Vivier	4f2c8e7913	vhost_user: Drop packet with unsupported iovec array If the iovec array cannot be managed, drop it rather than passing the second entry to tap_add_packet(). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> 2025_01_21.4f2c8e7	2025-01-21 14:30:42 +01:00
Stefano Brivio	ec5c4d936d	tcp: Set PSH flag for last incoming packets in a batch So far we omitted setting PSH flags for inbound traffic altogether: as we ignore the nature of the data we're sending, we can't conclude that some data is more or less urgent. This works fine with Linux guests, as the Linux kernel doesn't do much with it, on input: it will generally deliver data to the application layer without delay. However, with Windows, things change: if we don't set the PSH flag on interactive inbound traffic, we can expect long delays before the data is delivered to the application. This is very visible with RDP, where packets we send on behalf of the RDP client are delivered with delays exceeding one second: $ tshark -r rdp.pcap -td -Y 'frame.number in { 33170 .. 33173 }' --disable-protocol tls 33170 0.030296 93.235.154.248 → 88.198.0.164 54 TCP 49012 → 3389 [ACK] Seq=13820 Ack=285229 Win=387968 Len=0 33171 0.985412 88.198.0.164 → 93.235.154.248 105 TCP 3389 → 49012 [PSH, ACK] Seq=285229 Ack=13820 Win=63198 Len=51 33172 0.030373 93.235.154.248 → 88.198.0.164 54 TCP 49012 → 3389 [ACK] Seq=13820 Ack=285280 Win=387968 Len=0 33173 1.383776 88.198.0.164 → 93.235.154.248 424 TCP 3389 → 49012 [PSH, ACK] Seq=285280 Ack=13820 Win=63198 Len=370 in this example (packet capture taken by passt), frame #33172 is a mouse event sent by the RDP client, and frame #33173 is the first event (display reacting to click) sent back by the server. This appears as a 1.4 s delay before we get frame #33173. If we set PSH, instead: $ tshark -r rdp_psh.pcap -td -Y 'frame.number in { 314 .. 317 }' --disable-protocol tls 314 0.002503 93.235.154.248 → 88.198.0.164 170 TCP 51066 → 3389 [PSH, ACK] Seq=7779 Ack=74047 Win=31872 Len=116 315 0.000557 88.198.0.164 → 93.235.154.248 54 TCP 3389 → 51066 [ACK] Seq=79162 Ack=7895 Win=62872 Len=0 316 0.012752 93.235.154.248 → 88.198.0.164 170 TCP 51066 → 3389 [PSH, ACK] Seq=7895 Ack=79162 Win=31872 Len=116 317 0.011927 88.198.0.164 → 93.235.154.248 107 TCP 3389 → 51066 [PSH, ACK] Seq=79162 Ack=8011 Win=62756 Len=53 here, in frame #316, our mouse event is delivered without a delay and receives a response in approximately 12 ms. Set PSH on the last segment for any batch we dequeue from the socket, that is, set it whenever we know that we might not be sending data to the same port for a while. Reported-by: NN708 Link: https://bugs.passt.top/show_bug.cgi?id=107 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-21 14:28:44 +01:00
Stefano Brivio	db2c91ae86	tcp: Set ACK flag on all RST segments, even for client in SYN-SENT state Somewhat curiously, RFC 9293, section 3.10.7.3, states: If the state is SYN-SENT, then [...] Second, check the RST bit: - If the RST bit is set, [...] o If the ACK was acceptable, then signal to the user "error: connection reset", drop the segment, enter CLOSED state, delete TCB, and return. Otherwise (no ACK), drop the segment and return. which matches verbatim RFC 793, pages 66-67, and is implemented as-is by tcp_rcv_synsent_state_process() in the Linux kernel, that is: /* No ACK in the segment / if (th->rst) { / rfc793: * "If the RST bit is set * * Otherwise (no ACK) drop the segment and return." / goto discard_and_undo; } meaning that if a client is in SYN-SENT state, and we send a RST segment once we realise that we can't establish the outbound connection, the client will ignore our segment and will need to pointlessly wait until the connection times out instead of aborting it right away. The ACK flag on a RST, in this case, doesn't really seem to have any function, but we must set it nevertheless. The ACK sequence number is already correct because we always set it before calling tcp_prepare_flags(), whenever relevant. This leaves us with no cases where we should not* set the ACK flag on non-SYN segments, so always set the ACK flag for RST segments. Note that non-SYN, non-RST segments were already covered by commit 4988e2b40631 ("tcp: Unconditionally force ACK for all !SYN, !RST packets"). Reported-by: Dirk Janssen <Dirk.Janssen@schiphol.nl> Reported-by: Roeland van de Pol <Roeland.van.de.Pol@schiphol.nl> Reported-by: Robert Floor <Robert.Floor@schiphol.nl> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-21 14:28:44 +01:00
Stefano Brivio	54bb972cfb	tcp: Disable Nagle's algorithm (set TCP_NODELAY) on all sockets Following up on 725acd111ba3 ("tcp_splice: Set (again) TCP_NODELAY on both sides"), David argues that, in general, we don't know what kind of TCP traffic we're dealing with, on any side or path. TCP segments might have been delivered to our socket with a PSH flag, but we don't have a way to know about it. Similarly, the guest might send us segments with PSH or URG set, but we don't know if we should generally TCP_CORK sockets and uncork on those flags, because that would assume they're running a Linux kernel (and a particular version of it) matching the kernel that delivers outbound packets for us. Given that we can't make any assumption and everything might very well be interactive traffic, disable Nagle's algorithm on all non-spliced sockets as well. After all, John Nagle himself is nowadays recommending that delayed ACKs should never be enabled together with his algorithm, but we don't have a practical way to ensure that our environment is free from delayed ACKs (TCP_QUICKACK is not really usable for this purpose): https://news.ycombinator.com/item?id=34180239 Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-01-21 14:28:37 +01:00
Stefano Brivio	8757834d14	tcp: Buffer sizes are not inherited on accept()/accept4() ...so it's pointless to set SO_RCVBUF and SO_SNDBUF on listening sockets. Call tcp_sock_set_bufsize() after accept4(), for inbound sockets. As we didn't have large buffer sizes set for inbound sockets for a long time (they are set explicitly only if the maximum size is big enough, more than than the ~200 KiB default), I ran some more throughput tests for this one, and I see slightly better numbers (say, 17 gbps instead of 15 gbps guest to host without vhost-user). Fixes: 904b86ade7db ("tcp: Rework window handling, timers, add SO_RCVLOWAT and pools for sockets/pipes") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-01-21 14:28:14 +01:00
Laurent Vivier	c96a88d550	vhost_user: remove ASSERT() on iovec number Replace ASSERT() on the number of iovec in the element and on the first entry length by a debug() message. Signed-off-by: Laurent Vivier <lvivier@redhat.com> [sbrivio: Fix typo in failure message] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Laurent Vivier	412ed4f09f	vhost-user: Report to front-end we support VHOST_USER_PROTOCOL_F_DEVICE_STATE Report to front-end that we support device state commands: VHOST_USER_CHECK_DEVICE_STATE VHOST_USER_SET_LOG_BASE These feature is needed to transfer backend state using frontend channel. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Laurent Vivier	31d70024be	vhost-user: add VHOST_USER_SET_DEVICE_STATE_FD command Set the file descriptor to use to transfer the backend device state during migration. Signed-off-by: Laurent Vivier <lvivier@redhat.com> [sbrivio: Fixed nits and coding style here and there] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Laurent Vivier	878e163454	vhost-user: add VHOST_USER_CHECK_DEVICE_STATE command After transferring the back-end’s internal state during migration, check whether the back-end was able to successfully fully process the state. The value returned indicates success or error; 0 is success, any non-zero value is an error. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Laurent Vivier	78c73e9395	vhost-user: Report to front-end we support VHOST_USER_PROTOCOL_F_LOG_SHMFD This features allows QEMU to be migrated. We need also to report VHOST_F_LOG_ALL. This protocol feature reports we can log the page update and implement VHOST_USER_SET_LOG_BASE and VHOST_USER_SET_LOG_FD. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Laurent Vivier	3c1d91b816	vhost-user: add VHOST_USER_SET_LOG_BASE command Sets logging shared memory space. When the back-end has VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature, the log memory fd is provided in the ancillary data of VHOST_USER_SET_LOG_BASE message, the size and offset of shared memory area provided in the message. Signed-off-by: Laurent Vivier <lvivier@redhat.com> [sbrivio: Fix coding style in a bunch of places] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Laurent Vivier	538312af19	vhost-user: Pass vu_dev to more virtio functions vu_dev will be needed to log page update. Add the parameter to: vring_used_write() vu_queue_fill_by_index() vu_queue_fill() vring_used_idx_set() vu_queue_flush() The new parameter is unused for now. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Laurent Vivier	b04195c60f	vhost-user: add VHOST_USER_SET_LOG_FD command VHOST_USER_SET_LOG_FD is an optional message with an eventfd in ancillary data, it may be used to inform the front-end that the log has been modified. Signed-off-by: Laurent Vivier <lvivier@redhat.com> [sbrivio: Fix comment to vu_set_log_fd_exec()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Laurent Vivier	6016e04a3a	vhost-user: update protocol features and commands list vhost-user protocol specification has been updated with feature flags and commands we will need to implement migration. Signed-off-by: Laurent Vivier <lvivier@redhat.com> [sbrivio: Fix comment to union vhost_user_payload] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-20 19:51:24 +01:00
Stefano Brivio	a8f4fc481c	tcp: Mask EPOLLIN altogether if we're blocked waiting on an ACK from the guest There are pretty much two cases of the (misnomer) STALLED: in one case, we could send more data to the guest if it becomes available, and in another case, we can't, because we filled the window. If, in this second case, we keep EPOLLIN enabled, but never read from the socket, we get short but CPU-annoying storms of EPOLLIN events, upon which we reschedule the ACK timeout handler, never read from the socket, go back to epoll_wait(), and so on: timerfd_settime(76, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=2, tv_nsec=0}}, NULL) = 0 epoll_wait(3, [{events=EPOLLIN, data={u32=10497, u64=38654716161}}], 8, 1000) = 1 timerfd_settime(76, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=2, tv_nsec=0}}, NULL) = 0 epoll_wait(3, [{events=EPOLLIN, data={u32=10497, u64=38654716161}}], 8, 1000) = 1 timerfd_settime(76, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=2, tv_nsec=0}}, NULL) = 0 epoll_wait(3, [{events=EPOLLIN, data={u32=10497, u64=38654716161}}], 8, 1000) = 1 also known as: 29.1517: Flow 2 (TCP connection): timer expires in 2.000s 29.1517: Flow 2 (TCP connection): timer expires in 2.000s 29.1517: Flow 2 (TCP connection): timer expires in 2.000s which, for some reason, becomes very visible with muvm and aria2c downloading from a server nearby in parallel chunks. That's because EPOLLIN isn't cleared if we don't read from the socket, and even with EPOLLET, epoll_wait() will repeatedly wake us up until we actually read something. In this case, we don't want to subscribe to EPOLLIN at all: all we're waiting for is an ACK segment from the guest. Differentiate this case with a new connection flag, ACK_FROM_TAP_BLOCKS, which doesn't just indicate that we're waiting for an ACK from the guest (ACK_FROM_TAP_DUE), but also that we're blocked waiting for it. If this flag is set before we set STALLED, EPOLLIN will be masked while we set EPOLLET because of STALLED. Whenever we clear STALLED, we also clear this flag. This is definitely not elegant, but it's a minimal fix. We can probably simplify this at a later point by having a category of connection flags directly corresponding to epoll flags, and dropping STALLED altogether, or, perhaps, always using EPOLLET (but we need a mechanism to re-check sockets for pending data if we can't temporarily write to the guest). I suspect that this might also be implied in https://github.com/containers/podman/issues/23686, hence the Link: tag. It doesn't necessarily mean I'm fixing it (I can't reproduce that). Link: https://github.com/containers/podman/issues/23686 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-16 21:15:33 +01:00
Stefano Brivio	b8f573cdc2	tcp: Set EPOLLET when when reading from a socket fails with EAGAIN Before SO_PEEK_OFF support was introduced by commit e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available"), we would peek data from sockets using a "discard" buffer as first iovec element, so that, unless we had no pending data at all, we would always get a positive return code from recvmsg() (except for closing connections or errors). If we couldn't send more data to the guest, in the window, we would set the STALLED flag (causing the epoll descriptor to switch to edge-triggered mode), and return early from tcp_data_from_sock(). With SO_PEEK_OFF, we don't have a discard buffer, and if there's data on the socket, but nothing beyond our current peeking offset, we'll get EAGAIN instead of our current "discard" length. In that case, we return even earlier, and we don't set EPOLLET on the socket as a result. As reported by Asahi Lina, this causes event loops where the kernel is signalling socket readiness, because there's data we didn't dequeue yet (waiting for the guest to acknowledge it), but we won't actually peek anything new, and return early without setting EPOLLET. This is the original report, mentioning the originally proposed fix: -- When there is unacknowledged data in the inbound socket buffer, passt leaves the socket in the epoll instance to accept new data from the server. Since there is already data in the socket buffer, an epoll without EPOLLET will repeatedly fire while no data is processed, busy-looping the CPU: epoll_pwait(3, [...], 8, 1000, NULL, 8) = 4 recvmsg(25, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(169, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(111, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(180, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) epoll_pwait(3, [...], 8, 1000, NULL, 8) = 4 recvmsg(25, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(169, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(111, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(180, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) Add in the missing EPOLLET flag for this case. This brings CPU usage down from around ~80% when downloading over TCP, to ~5% (use case: passt as network transport for muvm, downloading Steam games). -- we can't set EPOLLET unconditionally though, at least right now, because we don't monitor the guest tap for EPOLLOUT in case we fail to write on that side because we filled up that buffer (and not the window of a TCP connection). Instead, rely on the observation that, once a connection is established, we only get EAGAIN on recvmsg() if we are attempting to peek data from a socket with a non-zero peeking offset: we only peek when there's pending data on a socket, and in that case, if we peek without offset, we'll always see some data. And if we peek data with a non-zero offset and get EAGAIN, that means that we're either waiting for more data to arrive on the socket (which would cause further wake-ups, even with EPOLLET), or we're waiting for the guest to acknowledge some of it, which would anyway cause a wake-up. In that case, it's safe to set STALLED and, in turn, EPOLLET on the socket, which fixes the EPOLLIN event loop. While we're establishing a connection from the socket side, though, we'll call, once, tcp_{buf,vu}_data_from_sock() to see if we got any data while we were waiting for SYN, ACK from the guest. See the comment at the end of tcp_conn_from_sock_finish(). And if there's no data queued on the socket as we check, we'll also get EAGAIN, even if our peeking offset is zero. For this reason, we need to additionally check that 'already_sent' is not zero, meaning, explicitly, that our peeking offset is not zero. Reported-by: Asahi Lina <lina@asahilina.net> Fixes: e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-16 21:15:33 +01:00
Stefano Brivio	22cf08ba00	tcp: Don't subscribe to EPOLLOUT events on STALLED I inadvertently added that in an unrelated change, but it doesn't make sense: STALLED means we have pending socket data that we can't write to the guest, not the other way around. Fixes: bb708111833e ("treewide: Packet abstraction with mandatory boundary checks") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-16 21:15:33 +01:00
Stefano Brivio	707f77b0a9	tcp: Fix ACK sequence getting out of sync on EPOLLOUT wake-up In the next patches, I'm extending the usage of STALLED to a few more cases. Doing so revealed this issue: if we set STALLED and, consequently, EPOLLOUT (which is wrong, fixed later) right after we set a connection to ESTABLISHED (which also happened by mistake while I was preparing another change), with the guest sending data together with the final ACK in the handshake, say: 41.3661: vhost-user: got kick_data: 0000000000000001 idx: 1 41.3662: Flow 2 (NEW): FREE -> NEW 41.3663: Flow 2 (INI): NEW -> INI 41.3663: Flow 2 (INI): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => ? 41.3665: Flow 2 (TGT): INI -> TGT 41.3666: Flow 2 (TGT): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003 41.3667: Flow 2 (TCP connection): TGT -> TYPED 41.3667: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003 41.3669: Flow 2 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT 41.3670: Flow 2 (TCP connection): Side 0 hash table insert: bucket: 339814 41.3672: Flow 2 (TCP connection): TYPED -> ACTIVE 41.3673: Flow 2 (TCP connection): TAP [2a01:4f8:222:904::2]:52536 -> [2001:db8:9a55::1]:10003 => HOST [::]:0 -> [2001:db8:9a55::1]:10003 41.3674: Flow 2 (TCP connection): TAP_SYN_ACK_SENT: SYN_SENT -> SYN_RCVD 41.3675: Flow 2 (TCP connection): ACK_FROM_TAP_DUE 41.3675: Flow 2 (TCP connection): timer expires in 10.000s 41.3675: vhost-user: got kick_data: 0000000000000001 idx: 1 41.3676: Flow 2 (TCP connection): ACK_FROM_TAP_DUE dropped 41.3676: Flow 2 (TCP connection): ESTABLISHED: SYN_RCVD -> ESTABLISHED 41.3678: Flow 2 (TCP connection): STALLED 41.3678: vhost-user: got kick_data: 0000000000000002 idx: 1 41.3679: Flow 2 (TCP connection): ACK_TO_TAP_DUE 41.3680: Flow 2 (TCP connection): timer expires in 0.010s 41.3680: Flow 2 (TCP connection): STALLED dropped we'll immediately get an EPOLLOUT event, call tcp_update_seqack_wnd(), but ignore window and ACK sequence update. At this point, we think we acknowledged all the data to the guest (but we didn't) and we'll happily proceed to clear the ACK_TO_TAP_DUE flag: 41.3780: Flow 2 (TCP connection): ACK_TO_TAP_DUE dropped 41.3780: Flow 2 (TCP connection): timer expires in 7200.000s 41.5754: vhost-user: got kick_data: 0000000000000001 idx: 1 41.9956: vhost-user: got kick_data: 0000000000000001 idx: 1 42.8275: vhost-user: got kick_data: 0000000000000001 idx: 1 while the guest starts retransmitting that data desperately, without ever getting an ACK segment from us: 1433 38.746353 2a01:4f8:222:904::2 → 2001:db8:9a55::1 94 TCP 54312 → 10003 [SYN] Seq=0 Win=65460 Len=0 MSS=65460 SACK_PERM TSval=1089126192 TSecr=0 WS=128 1434 38.747357 2001:db8:9a55::1 → 2a01:4f8:222:904::2 82 TCP 10003 → 54312 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=61440 WS=256 1435 38.747500 2a01:4f8:222:904::2 → 2001:db8:9a55::1 74 TCP 54312 → 10003 [ACK] Seq=1 Ack=1 Win=65536 Len=0 1436 38.747769 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192 1437 38.747798 2a01:4f8:222:904::2 → 2001:db8:9a55::1 32841 TCP 54312 → 10003 [ACK] Seq=8193 Ack=1 Win=65536 Len=32767 1438 38.748049 2001:db8:9a55::1 → 2a01:4f8:222:904::2 74 TCP [TCP Window Update] 10003 → 54312 [ACK] Seq=1 Ack=1 Win=65280 Len=0 1439 38.954044 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192 1440 39.370096 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192 1441 40.202135 2a01:4f8:222:904::2 → 2001:db8:9a55::1 8266 TCP [TCP Retransmission] 54312 → 10003 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=8192 because seq_ack_to_tap is already set to the sequence after frame number 1437 in the example. For some reason, I could only reproduce this with vhost-user, IPv6, and passt running under valgrind while taking captures. Even under these conditions, it happens quite rarely. Forcibly send an ACK segment if we update the ACK sequence (or the advertised window). Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-16 21:15:33 +01:00
Laurent Vivier	1b95bd6fa1	vhost_user: fix multibuffer from linux Under some conditions, linux can provide several buffers in the same element (multiple entries in the iovec array). I didn't identify what changed between the kernel guest that provides one buffer and the one that provides several (doesn't seem to be a kernel change or a configuration change). Fix the following assert: ASSERTION FAILED in virtqueue_map_desc (virtio.c:402): num_sg < max_num_sg What I can see is the buffer can be splitted in two iovecs: - vnet header - packet data This change manages this special case but the real fix will be to allow tap_add_packet() to manage iovec array. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-15 23:05:31 +01:00
Stefano Brivio	f04b483d15	test/pasta_podman: Run Podman tests on a single CPU thread Increasingly often, I'm getting occasional failures of the same type as https://github.com/containers/podman/issues/24147. I guess it mostly depends on the system load. It will be a while until I'll actually run tests on a kernel including my fix for it, kernel commit a502ea6fa94b ("udp: Deal with race between UDP socket address change and rehash"), so add a horrible workaround using taskset(1), for the moment. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-15 22:16:32 +01:00
Laurent Vivier	2c174f1fe8	checksum: fix checksum with odd base address csum_unfolded() must call csum_avx2() with a 32byte aligned base address. To be able to do that if the buffer is not correctly aligned, it splits the buffers in 2 parts, the second part is 32byte aligned and can be used with csum_avx2(), the first part is the remaining part, that is not 32byte aligned and we use sum_16b() to compute the checksum. A problem appears if the length of the first part is odd because the checksum is using 16bit words to do the checksum. If the length is odd, when the second part is computed, all words are shifted by 1 byte, meaning weight of upper and lower byte is swapped. For instance a 13 bytes buffer: bytes: aa AA bb BB cc CC dd DD ee EE ff FF gg 16bit words: AAaa BBbb CCcc DDdd EEee FFff 00gg If we don't split the sequence, the checksum is: AAaa + BBbb + CCcc + DDdd + EEee + FFff + 00gg If we split the sequence with an even length for the first part: (AAaa + BBbb) + (CCcc + DDdd + EEee + FFff + 00gg) But if the first part has an odd length: (AAaa + BBbb + 00cc) + (ddCC + eeDD + ffEE + ggFF) To avoid the problem, do not call csum_avx2() if the first part cannot have an even length, and compute the checksum of all the buffer using sum_16b(). This is slower but it can only happen if the buffer base address is odd, and this can only happen if the binary is built using '-Os', and that means we have chosen to prioritize size over speed. Reported-by: Mike Jones <mike@mjones.io> Link: https://bugs.passt.top/show_bug.cgi?id=108 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Added comment explaining why we check for pad & 1] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-10 22:20:23 +01:00
Stefano Brivio	725acd111b	tcp_splice: Set (again) TCP_NODELAY on both sides In commit 7ecf69329787 ("pasta, tcp: Don't set TCP_CORK on spliced sockets") I just assumed that we wouldn't benefit from disabling Nagle's algorithm once we drop TCP_CORK (and its 200ms fixed delay). It turns out that with some patterns, such as a PostgreSQL server in a container receiving parameterised, short queries, for which pasta sees several short inbound messages (Parse, Bind, Describe, Execute and Sync commands getting each one their own packet, 5 to 49 bytes TCP payload each), we'll read them usually in two batches, and send them in matching batches, for example: 9165.2467: pasta: epoll event on connected spliced TCP socket 117 (events: 0x00000001) 9165.2468: Flow 0 (TCP connection (spliced)): 76 from read-side call 9165.2468: Flow 0 (TCP connection (spliced)): 76 from write-side call (passed 524288) 9165.2469: pasta: epoll event on connected spliced TCP socket 117 (events: 0x00000001) 9165.2470: Flow 0 (TCP connection (spliced)): 15 from read-side call 9165.2470: Flow 0 (TCP connection (spliced)): 15 from write-side call (passed 524288) 9165.2944: pasta: epoll event on connected spliced TCP socket 118 (events: 0x00000001) and the kernel delivers the first one, waits for acknowledgement from the receiver, then delivers the second one. This adds very substantial and unnecessary delay. It's usually a fixed ~40ms between the two batches, which is clearly unacceptable for loopback connections. In this example, the delay is shown by the timestamp of the response from socket 118. The peer (server) doesn't actually take that long (less than a millisecond), but it takes that long for the kernel to deliver our request. To avoid batching and delays, disable Nagle's algorithm by setting TCP_NODELAY on both internal and external sockets: this way, we get one inbound packet for each original message, we transfer them right away, and the kernel delivers them to the process in the container as they are, without delay. We can do this safely as we don't care much about network utilisation when there's in fact pretty much no network (loopback connections). This is unfortunately not visible in the TCP request-response tests from the test suite because, with smaller messages (we use one byte), Nagle's algorithm doesn't even kick in. It's probably not trivial to implement a universal test covering this case. Fixes: 7ecf69329787 ("pasta, tcp: Don't set TCP_CORK on spliced sockets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-06 10:10:29 +01:00
Stefano Brivio	3876fc780d	seccomp: Unconditionally allow accept(2) even if accept4(2) is present On Alpine Linux 3.21, passt aborts right away as soon as QEMU connects to it. Most likely, this has always been the case with musl, because since musl commit dc01e2cbfb29 ("add fallback emulation for accept4 on old kernels"), accept4() without flags is implemented using accept(). However, I guess that nobody realised earlier because it's typically pasta(1) being used on musl-based distributions, and the only place where we call accept4() without flags is tap_listen_handler(). Add accept() to the list of allowed system calls regardless of the presence of accept4(). Reported-by: NN708 <nn708@outlook.com> Link: https://bugs.passt.top/show_bug.cgi?id=106 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-01-05 23:49:11 +01:00
Laurent Vivier	898e853635	virtio: Use const pointer for vu_dev We don't modify the structure in some virtio functions. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-05 23:48:59 +01:00
Stefano Brivio	324233bd9b	udp_flow: Don't block multicast and broadcast messages It was reported that SSDP notifications sent from a container (with e.g. minidlna) stopped appearing on the network starting from commit 1db4f773e87f ("udp: Improve detail of UDP endpoint sanity checking"). As a minimal reproducer using minidlnad(8): $ mkdir /tmp/minidlna $ cat conf media_dir=/tmp/minidlna db_dir=/tmp/minidlna $ ./pasta -d --config-net -- sh -c '/usr/sbin/minidlnad -p 31337 -S -f conf -P /dev/null & (sleep 1; killall minidlnad)' [...] 1.0327: Flow 0 (NEW): FREE -> NEW 1.0327: Flow 0 (INI): NEW -> INI 1.0327: Flow 0 (INI): TAP [88.198.0.164]:54185 -> [239.255.255.250]:1900 => ? 1.0327: Flow 0 (INI): Invalid endpoint on UDP packet 1.0327: Flow 0 (FREE): INI -> FREE 1.0328: Flow 0 (FREE): TAP [88.198.0.164]:54185 -> [239.255.255.250]:1900 => ? 1.0328: Dropping datagram with no flow TAP 88.198.0.164:54185 -> 239.255.255.250:1900 This is an actual regression as there's no particular reason to block outbound multicast UDP packets. And even if we don't handle multicast groups in any particular way (https://bugs.passt.top/show_bug.cgi?id=2, "Add IGMP/MLD proxy"), there's no reason to block inbound multicast or broadcast packets either, should they ever be somehow delivered to passt or pasta. Let multicast and broadcast packets through, refusing only to establish flows with unspecified endpoint, as those would actually cause havoc in the flow table. IP-wise, SSDP notifications look like this (after this patch), inside and outside: $ pasta -p /tmp/minidlna.pcap --config-net -- sh -c '/usr/sbin/minidlnad -p 31337 -S -f minidlna.conf -P /dev/null & (sleep 1; killall minidlnad)' [...] $ tshark -a packets:1 -r /tmp/minidlna.pcap ssdp 2 0.074808 88.198.0.164 ? 239.255.255.250 SSDP 200 NOTIFY * HTTP/1.1 # tshark -i ens3 -a packets:1 multicast 2>/dev/null 1 0.000000000 88.198.0.164 ? 239.255.255.250 SSDP 200 NOTIFY * HTTP/1.1 Link: https://github.com/containers/podman/issues/24871 Fixes: 1db4f773e87f ("udp: Improve detail of UDP endpoint sanity checking") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-05 23:48:59 +01:00
Stefano Brivio	2385b69a66	Makefile: Report error and stop if we can't set TARGET I don't think it's necessarily productive to check all the possible error conditions in the Makefile, but this one is annoying: issue 'make' without a C compiler, then install one, and build again. Then run passt and it will mysteriously terminate on epoll_wait(), because seccomp.h is good enough to build against, but the resulting seccomp filter doesn't allow any system call. Not really fun to debug. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-01-05 23:48:37 +01:00
Stefano Brivio	e5ba8adef7	README: Mark vhost-user as supported Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-12 10:50:48 +01:00
Stefano Brivio	09478d55fe	treewide: Dodge dynamic memory allocation in strerror() from glibc > 2.40 With glibc commit 25a5eb4010df ("string: strerror, strsignal cannot use buffer after dlmopen (bug 32026)"), strerror() now needs, at least on x86, the getrandom() and brk() system calls, in order to fill in the locale-translated error message. But getrandom() and brk() are not allowed by our seccomp profiles. This became visible on Fedora Rawhide with the "podman login and logout" Podman tests, defined at test/e2e/login_logout_test.go in the Podman source tree, where pasta would terminate upon printing error descriptions (at least the ones related to the SO_ERROR queue for spliced connections). Avoid dynamic memory allocation by calling strerrordesc_np() instead, which is a GNU function returning a static, untranslated version of the error description. If it's not available, keep calling strerror(), which at that point should be simple enough as to be usable (at least, that's currently the case for musl). Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://github.com/containers/podman/issues/24804 Analysed-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Paul Holzinger <pholzing@redhat.com> 2024_12_11.09478d5	2024-12-11 12:21:23 +01:00
Jon Maloy	e24f026222	pasta: make it possible to disable socket splicing During testing it is sometimes useful to force traffic which would normally be forwared by socket splicing through the tap interface. In this commit, we add a command switch enabling such funtionality for inbound local traffic. For outbound local traffic this is much trickier, if even possible, so leave that for a later commit. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-11 01:47:37 +01:00
Laurent Vivier	947f5cdb93	tap: Call vu_init() with --fd We need to initialize vhost-user structures with --fd too. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-10 12:26:56 +01:00
Laurent Vivier	2139ad33fc	tap: Use a common function to start a new connection Merge code from tap_backend_init(), tap_sock_tun_init() and tap_listen_handler() to set epoll_ref entry and to add it to epollfd. No functionality change Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-10 12:26:34 +01:00
Laurent Vivier	8996d183c5	udp_vu: update segment size In udp_vu_sock_recv(), collect a segment with a size defined to IP_MAX_MTU + ETH_HLEN + sizeof(struct virtio_net_hdr_mrg_rxbuf) The original version double counted the IP header: IP_MAX_MTU includes the IP header, and so did hdrlen. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-05 21:08:58 +01:00
David Gibson	190829705e	flow: Remove over-zealous sanity checks in flow_sidx_hash() In flow_sidx_hash() we verify that the flow we're hashing doesn't have an unspecified endpoint address, or zero for either port. The hash table only works if we're looking for exact matches of address and port, and this is attempting to catch any cases where we might have left address or port unpopulated or filled with a wildcard. This doesn't really work though, because there are cases where unspecified addresses or zero ports are correct: * We already use unspecified addresses for our address in cases where we don't know the specific local address for that side, and exclude the obvious extra check on side->oaddr for that reason. * Zero port numbers aren't strictly forbidden over the wire. We forbid them for TCP & UDP because they can't safely be handled on the socket side. However for ICMP a zero id, which goes in the port field is valid. * Possible future flow types (for example, for multicast protocols) might legitimately have an unspecified address. Although it makes them easier to miss, these sorts of sanity checks really have to be done at the protocol / flow type layer, and we already do so. Remove the checks in flow_sidx_hash() other than checking that the pif is specified. Reported-by: Stefan <steffhip@gmail.com> Link: https://bugs.passt.top/show_bug.cgi?id=105 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-05 21:08:58 +01:00
David Gibson	1db4f773e8	udp: Improve detail of UDP endpoint sanity checking In udp_flow_new() we reject a flow if the endpoint isn't unicast, or it has a zero endpoint port. Those conditions aren't strictly illegal, but we can't safely handle them at present: * Multicast UDP endpoints are certainly possible, but our current flow tracking only makes sense for simple unicast flows - we'll need different handling if we want to handle multicast flows in future * It's not entirely clear if port 0 is RFC-ishly correct, but for socket interfaces port 0 sometimes has a special meaning such as "pick the port for me, kernel". That makes flows on port 0 unsafe to forward in the usual way. For the same reason we also can't safely handle port 0 as our port. In principle that's also true for our address, however in the case of flows initiated from a socket, we may not know our address since the socket could be bound to 0.0.0.0 or ::, so we can only verify that our address is unicast for flows initiated from the tap side. Refine the current check in udp_flow_new() to slightly more detailed checks in udp_flow_from_sock() and udp_flow_from_tap() to make what is and isn't handled clearer. This makes this checking more similar to what we do for TCP connections. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-05 21:08:58 +01:00
Stefano Brivio	966fdc8749	perf/passt_vu_tcp: Make it shine Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 15:06:44 +01:00
Laurent Vivier	020c8b7127	tcp_vu: Compute IPv4 header checksum if dlen changes In tcp_vu_data_from_sock() we compute IPv4 header checksum only for the first and the last packets, and re-use the first packet checksum for all the other packets as the content of the header doesn't change. It's more accurate to check the dlen value to know if the checksum should change as dlen is the only information that can change in the loop. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
Laurent Vivier	d9c0f8eefb	Makefile: Use make internal string functions TARGET_ARCH is computed from '$(CC) -dumpmachine' using external bash commands like echo, cut, tr and sed. This can be done using make internal string functions. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
David Gibson	b6e79efa0b	tcp_vu: Remove unnecessary tcp_vu_update_check() function Because the vhost-user <-> virtio-net path ignores checksums, we usually don't calculate them when sending packets to the guest. So, we always pass no_tcp_csum=true to tcp_fill_headers(). We do want accurate checksums when capturing packets though, so the captures don't show bogus values. Currently we handle this by updating the checksum field immediately before writing the packet to the capture file, using tcp_vu_update_check(). This is unnecessary, though: in each case tcp_fill_headers() is called not very long before, so we can alter its no_tcp_csum parameter pased on whether we're generating captures or not. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
David Gibson	a6348cad51	tcp: Merge tcp_fill_headers[46]() with each other We have different versions of this function for IPv4 and IPv6, but the caller already requires some IP version specific code to get the right header pointers. Instead, have a common function that fills either an IPv4 or an IPv6 header based on which header pointer it is passed. This allows us to remove a small amount of code duplication and make a few slightly ugly conditionals. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
David Gibson	2abf5ab7f3	tcp: Merge tcp_update_check_tcp[46]() The only reason we need separate functions for the IPv4 and IPv6 case is to calculate the checksum of the IP pseudo-header, which is different for the two cases. However, the caller already knows which path it's on and can access the values needed for the pseudo-header partial sum more easily than tcp_update_check_tcp[46]() can. So, merge these functions into a single tcp_update_csum() function that just takes the pseudo-header partial sum, calculated in the caller. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
David Gibson	08ea3cc581	tcp: Pass TCP header and payload separately to tcp_fill_headers[46]() At the moment these take separate pointers to the tap specific and IP headers, but expect the TCP header and payload as a single tcp_payload_t. As well as being slightly inconsistent, this involves some slightly iffy pointer shenanigans when called on the flags path with a tcp_flags_t instead of a tcp_payload_t. More importantly, it's inconvenient for the upcoming vhost-user case, where the TCP header and payload might not be contiguous. Furthermore, the payload itself might not be contiguous. So, pass the TCP header as its own pointer, and the TCP payload as an IO vector. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
David Gibson	2ee07697c4	tcp: Pass TCP header and payload separately to tcp_update_check_tcp[46]() Currently these expects both the TCP header and payload in a single IOV, and goes to some trouble to locate the checksum field within it. In the current caller we've already know where the TCP header is, so we might as well just pass it in. This will need to work a bit differently for vhost-user, but that code already needs to locate the TCP header for other reasons, so again we can just pass it in. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
David Gibson	67151090bc	iov, checksum: Replace csum_iov() with csum_iov_tail() We usually want to checksum only the tail part of a frame, excluding at least some headers. csum_iov() does that for a frame represented as an IO vector, not actually summing the entire IO vector. We now have struct iov_tail to explicitly represent this construct, so replace csum_iov() with csum_iov_tail() taking that representation rather than 3 parameters. We propagate the same change to csum_udp4() and csum_udp6() which take similar parameters. This slightly simplifies the code, and will allow some further simplifications as struct iov_tail is more widely used. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
David Gibson	f931103171	iov: iov tail helpers In the vhost-user code we have a number of places where we need to locate a particular header within the guest-supplied IO vector. We need to work out which buffer the header is in, and verify that it's contiguous and aligned as we need. At the moment this is open-coded, but introduce a helper to make this more straightforward. We add a new datatype 'struct iov_tail' representing an IO vector from which we've logically consumed some number of headers. The IOV_PULL_HEADER macro consumes a new header from the vector, returning a pointer and updating the iov_tail. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-28 14:03:16 +01:00
Stefano Brivio	804a7ce94a	tcp_vu: Change 'dlen' to ssize_t in tcp_vu_data_from_sock() ...to quickly suppress a false positive from Coverity, which assumes that iov_size is 0 and 'dlen' might overflow as a result (with hdrlen being 66). An ASSERT() in tcp_vu_sock_recv() already guarantees that iov_size(iov, buf_cnt) here is anyway greater than 'hdrlen'. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2024-11-27 16:49:21 +01:00
Laurent Vivier	00cc2303fd	Fix build on 32bit target Fix the following errors when built with CFLAGS="-m32 -U__AVX2__": packet.c:57:23: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘size_t’ {aka ‘unsigned int’} [-Wformat=] 57 \| trace("packet offset plus length %lu from size %lu, " 58 \| "%s:%i", start - p->buf + len + offset, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \| \| \| size_t {aka unsigned int} packet.c:57:23: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 6 has type ‘size_t’ {aka ‘unsigned int’} [-Wformat=] 57 \| trace("packet offset plus length %lu from size %lu, " \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 58 \| "%s:%i", start - p->buf + len + offset, 59 \| p->buf_size, func, line); \| ~~~~~~~~~~~ \| \| \| size_t {aka unsigned int} vhost_user.c:139:32: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 139 \| return (void )(qemu_addr - r->qva + r->mmap_addr + \| ^ vhost_user.c:439:32: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 439 \| munmap((void )r->mmap_addr, r->size + r->mmap_offset); \| ^ vhost_user.c:900:32: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 900 \| munmap((void )r->mmap_addr, r->size + r->mmap_offset); \| ^ virtio.c:111:32: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 111 \| return (void )(guest_addr - r->gpa + r->mmap_addr + \| ^ vu_common.c:37:27: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 37 \| char m = (char )dev_region->mmap_addr; \| ^ Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-27 16:49:21 +01:00

1 2 3 4 5 ...

1846 Commits