mirror of
https://passt.top/passt
synced 2025-01-21 11:45:15 +00:00
tcp: Set EPOLLET when when reading from a socket fails with EAGAIN
Before SO_PEEK_OFF support was introduced by commit e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available"), we would peek data from sockets using a "discard" buffer as first iovec element, so that, unless we had no pending data at all, we would always get a positive return code from recvmsg() (except for closing connections or errors). If we couldn't send more data to the guest, in the window, we would set the STALLED flag (causing the epoll descriptor to switch to edge-triggered mode), and return early from tcp_data_from_sock(). With SO_PEEK_OFF, we don't have a discard buffer, and if there's data on the socket, but nothing beyond our current peeking offset, we'll get EAGAIN instead of our current "discard" length. In that case, we return even earlier, and we don't set EPOLLET on the socket as a result. As reported by Asahi Lina, this causes event loops where the kernel is signalling socket readiness, because there's data we didn't dequeue yet (waiting for the guest to acknowledge it), but we won't actually peek anything new, and return early without setting EPOLLET. This is the original report, mentioning the originally proposed fix: -- When there is unacknowledged data in the inbound socket buffer, passt leaves the socket in the epoll instance to accept new data from the server. Since there is already data in the socket buffer, an epoll without EPOLLET will repeatedly fire while no data is processed, busy-looping the CPU: epoll_pwait(3, [...], 8, 1000, NULL, 8) = 4 recvmsg(25, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(169, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(111, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(180, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) epoll_pwait(3, [...], 8, 1000, NULL, 8) = 4 recvmsg(25, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(169, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(111, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) recvmsg(180, {msg_namelen=0}, MSG_PEEK) = -1 EAGAIN (Resource temporarily unavailable) Add in the missing EPOLLET flag for this case. This brings CPU usage down from around ~80% when downloading over TCP, to ~5% (use case: passt as network transport for muvm, downloading Steam games). -- we can't set EPOLLET unconditionally though, at least right now, because we don't monitor the guest tap for EPOLLOUT in case we fail to write on that side because we filled up that buffer (and not the window of a TCP connection). Instead, rely on the observation that, once a connection is established, we only get EAGAIN on recvmsg() if we are attempting to peek data from a socket with a non-zero peeking offset: we only peek when there's pending data on a socket, and in that case, if we peek without offset, we'll always see some data. And if we peek data with a non-zero offset and get EAGAIN, that means that we're either waiting for more data to arrive on the socket (which would cause further wake-ups, even with EPOLLET), or we're waiting for the guest to acknowledge some of it, which would anyway cause a wake-up. In that case, it's safe to set STALLED and, in turn, EPOLLET on the socket, which fixes the EPOLLIN event loop. While we're establishing a connection from the socket side, though, we'll call, once, tcp_{buf,vu}_data_from_sock() to see if we got any data while we were waiting for SYN, ACK from the guest. See the comment at the end of tcp_conn_from_sock_finish(). And if there's no data queued on the socket as we check, we'll also get EAGAIN, even if our peeking offset is zero. For this reason, we need to additionally check that 'already_sent' is not zero, meaning, explicitly, that our peeking offset is not zero. Reported-by: Asahi Lina <lina@asahilina.net> Fixes: e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This commit is contained in:
parent
22cf08ba00
commit
b8f573cdc2
@ -359,6 +359,9 @@ int tcp_buf_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
|
||||
return -errno;
|
||||
}
|
||||
|
||||
if (already_sent) /* No new data and EAGAIN: set EPOLLET */
|
||||
conn_flag(c, conn, STALLED);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
Loading…
x
Reference in New Issue
Block a user