<!---
SPDX-License-Identifier: GPL-2.0-or-later
Copyright (c) 2021-2022 Red Hat GmbH
Author: Stefano Brivio <sbrivio@redhat.com>
-->

<style scoped>
.mobile_hide {
	visibility: hidden;
	display: none;
}
img {
	visibility: hidden;
	display: none;
}
li {
	margin: 10px;
}
@media only screen and (min-width: 768px) {
	.mobile_hide {
		visibility: visible;
		display: inherit;
	}
	img {
		visibility: visible;
		display: inherit;
	}
	li {
		margin: 0px;
	}
}
.mobile_show {
	visibility: visible;
	display: inherit;
}
@media only screen and (min-width: 768px) {
	.mobile_show {
		visibility: hidden;
		display: none;
	}
}
</style>
# passt: Plug A Simple Socket Transport
_passt_ implements a translation layer between a Layer-2 network interface and
native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't
require any capabilities or privileges, and it can be used as a simple
replacement for Slirp.

<div class="mobile_hide">
<picture>
	<source type="image/webp" srcset="/builds/latest/web/passt_overview.webp">
	<source type="image/png" srcset="/builds/latest/web/passt_overview.png">
	<img src="/builds/latest/web/passt_overview.png" usemap="#image-map" class="bright" style="z-index: 20; position: relative;" alt="Overview diagram of passt">
</picture>
<map name="image-map" id="map_overview" class="mobile_hide">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/tcp.7.html" coords="229,275,246,320,306,294,287,249" shape="poly">
	<area class="map_area" target="_blank" href="https://lwn.net/Articles/420799/" coords="230,201,243,246,297,232,289,186" shape="poly">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/udp.7.html" coords="234,129,236,175,297,169,293,126" shape="poly">
	<area class="map_area" target="_blank" href="https://en.wiktionary.org/wiki/passen#German" coords="387,516,841,440,847,476,393,553" shape="poly">
	<area class="map_area" target="_blank" href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/udp.c" coords="398,123,520,157" shape="rect">
	<area class="map_area" target="_blank" href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/ping.c" coords="397,164,517,197" shape="rect">
	<area class="map_area" target="_blank" href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp.c" coords="398,203,516,237" shape="rect">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/unix.7.html" coords="569,306,674,359" shape="rect">
	<area class="map_area" target="_blank" href="/passt/tree/udp.c" coords="719,152,740,176,792,134,768,108" shape="poly">
	<area class="map_area" target="_blank" href="/passt/tree/icmp.c" coords="727,206,827,120,854,150,754,238" shape="poly">
	<area class="map_area" target="_blank" href="/passt/tree/tcp.c" coords="730,273,774,326,947,176,902,119" shape="poly">
	<area class="map_area" target="_blank" href="/passt/tree/igmp.c" coords="865,273,912,295" shape="rect">
	<area class="map_area" target="_blank" href="/passt/tree/arp.c" coords="854,300,897,320" shape="rect">
	<area class="map_area" target="_blank" href="/passt/tree/ndp.c" coords="869,325,909,344" shape="rect">
	<area class="map_area" target="_blank" href="/passt/tree/mld.c" coords="924,267,964,289" shape="rect">
	<area class="map_area" target="_blank" href="/passt/tree/dhcpv6.c" coords="918,297,986,317" shape="rect">
	<area class="map_area" target="_blank" href="/passt/tree/dhcp.c" coords="931,328,981,352" shape="rect">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/udp.7.html" coords="1073,115,1059,154,1120,176,1133,137" shape="poly">
	<area class="map_area" target="_blank" href="https://lwn.net/Articles/420799/" coords="966,113,942,152,1000,175,1017,136" shape="poly">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/tcp.7.html" coords="1059,175,1039,213,1098,237,1116,197" shape="poly">
	<area class="map_area" target="_blank" href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/udp.c" coords="1203,154,1326,189" shape="rect">
	<area class="map_area" target="_blank" href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/ping.c" coords="1202,195,1327,228" shape="rect">
	<area class="map_area" target="_blank" href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp.c" coords="1204,236,1327,269" shape="rect">
	<area class="map_area" target="_blank" href="https://en.wikipedia.org/wiki/OSI_model#Layer_architecture" coords="1159,52,1325,147" shape="rect">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man4/veth.4.html" coords="1119,351,1157,339,1198,340,1236,345,1258,359,1229,377,1176,377,1139,375,1114,365" shape="poly">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man4/veth.4.html" coords="1044,471,1090,461,1126,462,1150,464,1176,479,1160,491,1121,500,1081,501,1044,491,1037,483" shape="poly">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/network_namespaces.7.html" coords="240,379,524,452" shape="rect">
	<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/netlink.7.html" coords="1119,278,1117,293,1165,304,1169,288" shape="poly">
	<area class="map_area" target="_blank" href="/passt/tree/conf.c" coords="989,294,1040,264,1089,280,986,344" shape="poly">
</map>
<canvas id="map_highlight" style="border: 0px; z-index: 10; position: fixed; pointer-events: none"></canvas>
</div>
<script>
/* Resize and move the highlight canvas so that it overlays the given element */
function canvas_position(el) {
	var rect = el.getBoundingClientRect();
	var canvas = document.getElementById('map_highlight');
	canvas.width = rect.right - rect.left;
	canvas.height = rect.bottom - rect.top;
	canvas.style.left = rect.left + 'px';
	canvas.style.top = rect.top + 'px';
}
/* Outline and shade the hovered map area on the canvas */
function map_hover() {
	var coords = this.coords.split(',');
	var canvas = document.getElementById('map_highlight');
	var ctx = canvas.getContext('2d');
	canvas_position(this);
	ctx.fillStyle = 'rgba(255, 255, 255, .3)';
	ctx.lineWidth = 1.5;
	ctx.strokeStyle = 'rgba(255, 255, 100, 1)';
	ctx.beginPath();
	ctx.setLineDash([15, 15]);
	if (this.shape == "poly") {
		ctx.moveTo(coords[0], coords[1]);
		/* Coordinates come in x,y pairs */
		for (var item = 2; item < coords.length - 1; item += 2) {
			ctx.lineTo(coords[item], coords[item + 1]);
		}
	} else if (this.shape == "rect") {
		ctx.rect(coords[0], coords[1],
			 coords[2] - coords[0], coords[3] - coords[1]);
	}
	ctx.closePath();
	ctx.stroke();
	ctx.fill();
}
/* Clear the highlight once the pointer leaves the area */
function map_out() {
	var canvas = document.getElementById('map_highlight');
	var ctx = canvas.getContext('2d');
	ctx.clearRect(0, 0, canvas.width, canvas.height);
}
var map_areas = document.getElementsByClassName("map_area");
for (var i = 0; i < map_areas.length; i++) {
	map_areas[i].onmouseover = map_hover;
	map_areas[i].onmouseout = map_out;
}
</script>

# pasta: Pack A Subtle Tap Abstraction
_pasta_ (same binary as _passt_, different command) offers equivalent
functionality, for network namespaces: traffic is forwarded using a tap
interface inside the namespace, without the need to create further interfaces on
the host, hence not requiring any capabilities or privileges.

It also implements a tap bypass path for local connections: packets with a local
destination address are moved directly between Layer-4 sockets, avoiding Layer-2
translations, using the _splice_(2) and _recvmmsg_(2)/_sendmmsg_(2) system calls
for TCP and UDP, respectively.
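
To give an idea of what such a bypass looks like for a stream socket, here is a
minimal sketch of forwarding data between two sockets with _splice_(2), going
through a pipe so that the payload never reaches user space. It is not taken
from the _pasta_ sources: `splice_forward()` and `splice_forward_demo()` are
hypothetical names, the demo uses `AF_UNIX` socket pairs in place of TCP
sockets, and error handling is reduced to the bare minimum.

```c
#define _GNU_SOURCE		/* for splice(2) */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move up to len bytes from socket 'from' to socket 'to'
 * through the pipe 'pipefd', without copying the payload to user space.
 * Returns the number of bytes forwarded, 0 on end of stream, -1 on error.
 */
ssize_t splice_forward(int from, int to, int pipefd[2], size_t len)
{
	ssize_t in, out, done = 0;

	assert(len > 0);

	/* Socket to pipe: data stays in kernel buffers */
	in = splice(from, NULL, pipefd[1], NULL, len, SPLICE_F_MOVE);
	if (in <= 0)
		return in;

	/* Pipe to socket: drain everything that was just queued */
	while (done < in) {
		out = splice(pipefd[0], NULL, to, NULL, (size_t)(in - done),
			     SPLICE_F_MOVE);
		if (out <= 0)
			return -1;
		done += out;
	}

	return done;
}

/* Self-check with two AF_UNIX socket pairs standing in for the two sockets:
 * returns 0 if "hello" written on one side comes out of the other side.
 * Descriptors are left open until process exit, for brevity.
 */
int splice_forward_demo(void)
{
	int a[2], b[2], p[2];
	char buf[8] = { 0 };

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) ||
	    socketpair(AF_UNIX, SOCK_STREAM, 0, b) || pipe(p))
		return -1;

	if (write(a[0], "hello", 5) != 5 ||
	    splice_forward(a[1], b[0], p, 5) != 5 ||
	    read(b[1], buf, sizeof(buf)) != 5)
		return -1;

	return memcmp(buf, "hello", 5) ? -1 : 0;
}
```

Calling `splice_forward_demo()` from a `main()` on Linux is enough to try it
out. The UDP path mentioned above relies on _recvmmsg_(2)/_sendmmsg_(2) instead
and is not covered by this sketch.
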

<div class="mobile_hide">
<picture>
	<source type="image/webp" srcset="/builds/latest/web/pasta_overview.webp">
	<source type="image/png" srcset="/builds/latest/web/pasta_overview.png">
	<img src="/builds/latest/web/pasta_overview.png" class="bright" style="z-index: 20; position: relative;" alt="Overview diagram of pasta">
</picture>
</div>

- [Motivation](#motivation)
- [Features](#features)
- [Interfaces and Environment](#interfaces-and-environment)
- [Services](#services)
- [Addresses](#addresses)
- [Protocols](#protocols)
- [Ports](#ports)
- [Demo](#demo)
- [Continuous Integration](#continuous-integration)
- [Performance](#performance_1)
- [Try it](#try-it)
- [Contribute](#contribute)
- [Security and Vulnerability Reports](#security-and-vulnerability-reports)

See also the [man page](/builds/latest/web/passt.1.html).

## Motivation
### passt

When container workloads are moved to virtual machines, the network traffic is
typically forwarded by interfaces operating at data link level. Some components
in the containers ecosystem (such as _service meshes_), however, expect
applications to run locally, with visible sockets and processes, for the
purposes of socket redirection, monitoring, and port mapping.

To solve this issue, user mode networking, as provided e.g. by _libslirp_,
can be used. Existing solutions implement a full TCP/IP stack, replaying traffic
on sockets that are local to the pod of the service mesh. This creates the
illusion of application processes running on the same host, possibly separated
by user namespaces.

While being almost transparent to the service mesh infrastructure, that kind of
solution comes with a number of downsides:

* three different TCP/IP stacks (guest, adaptation and host) need to be
  traversed for every service request
* addressing needs to be coordinated to create the pretense of consistent
  addresses and routes between guest and host environments. This typically needs
  a NAT with masquerading, or some form of packet bridging
* the traffic seen by the service mesh and observable externally is a distant
  replica of the packets forwarded to and from the guest environment:
    * TCP congestion windows and network buffering mechanisms in general operate
      differently from what would be naturally expected by the application
    * protocols carrying addressing information might pose additional challenges,
      as the applications don't see the same set of addresses and routes as they
      would if deployed with regular containers

_passt_ implements a thinner layer between guest and host, that only implements
what's strictly needed to pretend processes are running locally. The TCP
adaptation doesn't keep per-connection packet buffers, and reflects observed
sending windows and acknowledgements between the two sides. This TCP adaptation
is needed as _passt_ runs without the `CAP_NET_RAW` capability: it can't create
raw IP sockets on the pod, and therefore needs to map packets at Layer-2 to
Layer-4 sockets offered by the host kernel.

See also a
[detailed illustration](https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md)
of the problem and what led to this approach.

### pasta

On Linux, regular users can create network namespaces and run application
services inside them. However, connecting namespaces to other namespaces and to
external hosts requires the creation of network interfaces, such as `veth`
pairs, which in turn requires elevated privileges or the `CAP_NET_ADMIN`
capability. _pasta_, similarly to _slirp4netns_, solves this problem by creating
a tap interface available to processes in the namespace, and mapping network
traffic outside the namespace using native Layer-4 sockets.

Existing approaches typically implement a full, generic TCP/IP stack for this
translation between data and transport layers, without the possibility of
speeding up local connections, and usually requiring NAT. _pasta_:

* avoids the need for a generic, full-fledged TCP/IP stack by coordinating TCP
  connection dynamics between sender and receiver
* offers a fast bypass path for local connections: if a process connects to
  another process on the same host across namespaces, data is directly forwarded
  using pairs of Layer-4 sockets
* with default options, maps routing and addressing information to the
  namespace, avoiding any need for NAT

## Features
✅: done/supported, ❌: out of scope, 🛠: in progress/being considered
⌚: nice-to-have, eventually
### Protocols
* ✅ IPv4
    * ✅ all features, except for
        * ❌ fragmentation
* ✅ IPv6
    * ✅ all features, except for
        * ❌ fragmentation
        * ❌ jumbograms
* ✅ [TCP](/passt/tree/tcp.c)
    * ✅ Window Scaling (RFC 7323)
    * ✅ Defenses against Sequence Number Attacks (RFC 6528)
    * ⌚ [Protection Against Wrapped Sequences](https://bugs.passt.top/show_bug.cgi?id=1) (PAWS, RFC 7323)
    * ⌚ [Timestamps](https://bugs.passt.top/show_bug.cgi?id=1) (RFC 7323)
    * ❌ Selective Acknowledgment (RFC 2018)
* ✅ [UDP](/passt/tree/udp.c)
* ✅ ICMP/ICMPv6 Echo
* ⌚ [IGMP/MLD](https://bugs.passt.top/show_bug.cgi?id=2) proxy
* ⌚ [SCTP](https://bugs.passt.top/show_bug.cgi?id=3)

### Portability
* Linux
    * ✅ starting from 4.18 kernel version
    * ✅ starting from 3.13 kernel version
    * ✅ run-time selection of AVX2 build
    * C libraries:
        * ✅ glibc
        * ✅ [_musl_](https://bugs.passt.top/show_bug.cgi?id=4)
        * ⌚ [_uClibc-ng_](https://bugs.passt.top/show_bug.cgi?id=5)
* ⌚ [FreeBSD](https://bugs.passt.top/show_bug.cgi?id=6),
  [Darwin](https://bugs.passt.top/show_bug.cgi?id=6)
* ⌚ [NetBSD](https://bugs.passt.top/show_bug.cgi?id=7),
  [OpenBSD](https://bugs.passt.top/show_bug.cgi?id=7)
* ⌚ [Win2k](https://bugs.passt.top/show_bug.cgi?id=8)

### Security
* ✅ no dynamic memory allocation (`sbrk`(2), `brk`(2), `mmap`(2) [blocked via
  `seccomp`](/passt/tree/seccomp.sh))
* ✅ root operation not allowed outside user namespaces
* ✅ all capabilities dropped, other than `CAP_NET_BIND_SERVICE` (if granted)
* ✅ with default options, user, mount, IPC, UTS, PID namespaces are detached
* ✅ no external dependencies (other than a standard C library)
* ✅ restrictive seccomp profiles (30 syscalls allowed for _passt_, 41 for
  _pasta_ on x86_64)
* ✅ examples of [AppArmor](/passt/tree/contrib/apparmor) and
  [SELinux](/passt/tree/contrib/selinux) profiles available
* ✅ static checkers in continuous integration (clang-tidy, cppcheck)
* ✅️ clearly defined boundary-checked packet abstraction
* 🛠️ ~5 000 LoC target
* ⌚ [fuzzing](https://bugs.passt.top/show_bug.cgi?id=9), _packetdrill_ tests
* ⌚ stricter [synflood protection](https://bugs.passt.top/show_bug.cgi?id=10)
* 💡 [add](https://lists.passt.top/) [your](https://bugs.passt.top/)
  [ideas](https://chat.passt.top)
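
As an illustration of what a boundary-checked packet abstraction can look like,
the sketch below keeps packet descriptors (offset and length) in a pool that
has no storage of its own, and only hands out data after checking it against
both the descriptor and the underlying buffer. This is an assumption-laden toy,
not the code in the tree: `struct pool`, `pool_add()`, `pool_get()` and
`POOL_MAX` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_MAX 8	/* arbitrary pool size for this sketch */

struct desc {
	size_t offset;	/* start of the packet within the buffer */
	size_t len;	/* length of the packet */
};

struct pool {
	const char *buf;	/* storage owned by the caller, not by the pool */
	size_t buf_size;
	size_t count;
	struct desc pkt[POOL_MAX];
};

/* Queue a packet: refuse anything that isn't entirely inside the buffer.
 * Returns 0 on success, -1 if the pool is full or the range is invalid.
 */
int pool_add(struct pool *p, size_t offset, size_t len)
{
	assert(p && p->buf);

	if (p->count >= POOL_MAX)
		return -1;
	/* Written this way to avoid overflowing offset + len */
	if (offset > p->buf_size || len > p->buf_size - offset)
		return -1;

	p->pkt[p->count].offset = offset;
	p->pkt[p->count].len = len;
	p->count++;

	return 0;
}

/* Fetch len bytes at 'offset' inside packet 'idx'.
 * Returns a pointer into the buffer, or NULL if the request is out of bounds.
 */
const char *pool_get(const struct pool *p, size_t idx, size_t offset,
		     size_t len)
{
	assert(p && p->buf);

	if (idx >= p->count)
		return NULL;
	if (offset > p->pkt[idx].len || len > p->pkt[idx].len - offset)
		return NULL;

	return p->buf + p->pkt[idx].offset + offset;
}
```

With a 64-byte buffer, for instance, `pool_add(&p, 60, 8)` is rejected because
the packet would end past the buffer, and `pool_get()` returns `NULL` for any
read crossing the end of the packet it refers to.
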
### Configurability
* ✅ all addresses, ports, port ranges
* ✅ optional NAT, not required
* ✅ all protocols
* ✅ _pasta_: auto-detection of bound ports
* ⌚ run-time configuration of port ranges without autodetection
* ⌚ configuration of port ranges for autodetection
* 💡 [add](https://lists.passt.top/) [your](https://bugs.passt.top/)
  [ideas](https://chat.passt.top)

### Performance
* ✅ maximum two (cache hot) copies on every data path
* ✅ _pasta_: zero-copy for local connections by design (no configuration
  needed)
* ✅ generalised coalescing and batching on every path for every supported
  protocol
* ✅ 4 to 50 times IPv4 TCP throughput of existing, conceptually similar
  solutions depending on MTU (UDP and IPv6 hard to compare)
* 🛠 [_vhost-user_ support](https://bugs.passt.top/show_bug.cgi?id=25) for
  maximum one copy on every data path and lower request-response latency
* ⌚ [multithreading](https://bugs.passt.top/show_bug.cgi?id=13)
* ⌚ [raw IP socket support](https://bugs.passt.top/show_bug.cgi?id=14) if
  `CAP_NET_RAW` is granted
* ⌚ eBPF support (might not improve performance over vhost-user)

### Interfaces
* ✅ native [qemu](https://bugs.passt.top/show_bug.cgi?id=11) support (_passt_)
* ✅ native [libvirt](https://bugs.passt.top/show_bug.cgi?id=12) support
  (_passt_)
* ✅ Podman [integration](https://github.com/containers/podman/pull/16141)
  (_pasta_)
* ✅ bug-to-bug compatible
  [_slirp4netns_ replacement](/passt/tree/slirp4netns.sh)
* ✅ out-of-tree patch for
  [Kata Containers](/passt/tree/contrib/kata-containers) available
* ⌚ drop-in replacement for VPNKit (rootless Docker)

### Availability
* official packages for:
    * ✅ [Alpine Linux](https://pkgs.alpinelinux.org/packages?name=passt)
    * ✅ [Arch Linux](https://archlinux.org/packages/extra/x86_64/passt/) ([aarch64](https://archlinuxarm.org/packages/aarch64/passt), [i486](https://www.archlinux32.org/packages/?q=passt))
    * ✅ [CentOS Stream](https://gitlab.com/redhat/centos-stream/rpms/passt)
    * ✅ [Debian](https://tracker.debian.org/pkg/passt)
    * ✅ [Fedora](https://src.fedoraproject.org/rpms/passt)
    * ✅ [Gentoo](https://packages.gentoo.org/packages/net-misc/passt)
    * ✅ [GNU Guix](https://packages.guix.gnu.org/packages/passt/)
    * ✅ [OpenSUSE](https://build.opensuse.org/package/requests/Virtualization:containers/passt)
    * ✅ [Ubuntu](https://launchpad.net/ubuntu/+source/passt)
    * ✅ [Void Linux](https://voidlinux.org/packages/?q=passt)
* unofficial packages for:
    * ✅ [EPEL, Mageia](https://copr.fedorainfracloud.org/coprs/sbrivio/passt/)
* ✅ unofficial [packages](https://passt.top/builds/latest/x86_64/) from x86_64
  static builds for other RPM-based distributions
* ✅ unofficial [packages](https://passt.top/builds/latest/x86_64/) from x86_64
  static builds for other Debian-based distributions
* ✅ testing on non-x86_64 architectures (aarch64, armv7l, i386, ppc64, ppc64le,
  s390x)

### Services
* ✅ built-in [ARP proxy](/passt/tree/arp.c)
* ✅ minimalistic [DHCP server](/passt/tree/dhcp.c)
* ✅ minimalistic [NDP proxy](/passt/tree/ndp.c) with router advertisements and
  SLAAC support
* ✅ minimalistic [DHCPv6 server](/passt/tree/dhcpv6.c)
* ⌚ fine-grained configurability of DHCP, NDP, DHCPv6 options

## Interfaces and Environment

_passt_ exchanges packets with _qemu_ via a UNIX domain socket, using the
`socket` back-end in qemu. This is supported since qemu 7.2.

For older versions, the [qrap](/passt/tree/qrap.c) wrapper can be used to
connect to a UNIX domain socket and to start qemu, which can then use the file
descriptor that's already open.

This approach, compared to using a _tap_ device, doesn't require any security
capabilities, as we don't need to create any interface.

_pasta_ runs out of the box with any recent (post-3.8) Linux kernel.
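
The descriptor hand-off performed by a wrapper such as _qrap_ can be pictured
with the sketch below: connect to the UNIX domain socket, move the connection
to a descriptor number agreed upon in advance, and leave it open so that the
program started next inherits it. This only illustrates the idea and is not
the _qrap_ source: `unix_connect_as_fd()` is a hypothetical helper, and both
the socket path and the descriptor number are chosen by the caller.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to the UNIX domain socket at 'path' and make the connection
 * available as descriptor number 'target_fd', so that a program started
 * afterwards with exec() inherits it under a known number.
 * Returns target_fd on success, -1 on error.
 */
int unix_connect_as_fd(const char *path, int target_fd)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int s;

	assert(path && target_fd >= 0);

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	strcpy(addr.sun_path, path);

	s = socket(AF_UNIX, SOCK_STREAM, 0);
	if (s < 0)
		return -1;

	if (connect(s, (struct sockaddr *)&addr, sizeof(addr))) {
		close(s);
		return -1;
	}

	if (s != target_fd) {
		/* The copy made by dup2() doesn't have FD_CLOEXEC set, so it
		 * stays open across exec()
		 */
		if (dup2(s, target_fd) < 0) {
			close(s);
			return -1;
		}
		close(s);
	}

	return target_fd;
}
```

A wrapper would then call one of the _exec_(3) functions to start the
hypervisor, telling it which descriptor number to use. The exact command line
depends on the qemu version and is left out here.
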
## Services

_passt_ and _pasta_ provide some minimalistic implementations of networking
services:

* [ARP proxy](/passt/tree/arp.c), that resolves the address of
  the host (which is used as gateway) to the original MAC address of the host
* [DHCP server](/passt/tree/dhcp.c), a simple implementation
  handing out one single IPv4 address to the guest or namespace, namely, the
  same address as the first one configured for the upstream host interface, and
  passing the nameservers configured on the host
* [NDP proxy](/passt/tree/ndp.c), which can also assign prefix
  and nameserver using SLAAC
* [DHCPv6 server](/passt/tree/dhcpv6.c): a simple
  implementation handing out one single IPv6 address to the guest or namespace,
  namely, the same address as the first one configured for the upstream host
  interface, and passing the nameservers configured on the host

## Addresses

For IPv4, the guest or namespace is assigned, via DHCP, the same address as the
upstream interface of the host, and the same default gateway as the default
gateway of the host. Addresses are translated in case the guest is seen using a
different address from the assigned one.

For IPv6, the guest or namespace is assigned, via SLAAC, a prefix derived from
the address of the upstream interface of the host, the same default route as the
default route of the host, and, if a DHCPv6 client is running in the guest or
namespace, also the same address as the upstream address of the host. This means
that, with a DHCPv6 client in the guest or namespace, addresses don't need to be
translated. Should the client use a different address, the destination address
is translated for packets going to the guest or to the namespace.
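
As a small worked example of "a prefix derived from the address of the upstream
interface", the sketch below keeps the upper 64 bits of an IPv6 address and
clears the interface identifier. The 64-bit prefix length is an assumption made
for this example (it is the usual one for SLAAC), and `prefix64_from_addr()`
and `prefix64_matches()` are hypothetical helpers, not functions from the tree.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Derive the /64 prefix of 'addr': keep the upper 64 bits, clear the rest */
void prefix64_from_addr(const struct in6_addr *addr, struct in6_addr *prefix)
{
	assert(addr && prefix);

	memset(prefix, 0, sizeof(*prefix));
	memcpy(prefix->s6_addr, addr->s6_addr, 8);
}

/* Returns 1 if 'text' is a valid IPv6 address whose /64 prefix equals the
 * address given as 'expected', 0 otherwise (including on parsing errors)
 */
int prefix64_matches(const char *text, const char *expected)
{
	struct in6_addr addr, prefix, want;

	if (inet_pton(AF_INET6, text, &addr) != 1 ||
	    inet_pton(AF_INET6, expected, &want) != 1)
		return 0;

	prefix64_from_addr(&addr, &prefix);

	return !memcmp(&prefix, &want, sizeof(prefix));
}
```

With a host address of `2001:db8:1:2:aaaa:bbbb:cccc:dddd`, the prefix computed
this way is `2001:db8:1:2::`, which is what a guest using SLAAC would combine
with its own interface identifier.
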

### Local connections with _passt_

For UDP and TCP, for both IPv4 and IPv6, packets from the host addressed to a
loopback address are forwarded to the guest with their source address changed to
the address of the gateway or first hop of the default route. This mapping is
reversed in the other direction.
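
The mapping can be summarised, for IPv4 only, by the two functions below. They
are a simplified reading of the paragraph above rather than the actual
implementation: ports and connection state are ignored, the reverse mapping is
assumed to target `127.0.0.1`, and all names (`map_src_to_guest()`,
`map_dst_from_guest()`, `addr4()`) are made up for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>

/* 127.0.0.0/8 check on an address in network byte order */
static int is_loopback4(struct in_addr a)
{
	return (ntohl(a.s_addr) >> 24) == 127;
}

/* Host to guest: a loopback source is presented as the gateway address */
struct in_addr map_src_to_guest(struct in_addr src, struct in_addr gw)
{
	return is_loopback4(src) ? gw : src;
}

/* Guest to host: the gateway as destination becomes the loopback address */
struct in_addr map_dst_from_guest(struct in_addr dst, struct in_addr gw)
{
	struct in_addr lo = { .s_addr = htonl(INADDR_LOOPBACK) };

	return dst.s_addr == gw.s_addr ? lo : dst;
}

/* Parse a dotted-quad address, aborting on malformed input */
struct in_addr addr4(const char *text)
{
	struct in_addr a;
	int ok = inet_pton(AF_INET, text, &a);

	assert(ok == 1);
	(void)ok;

	return a;
}
```

For a gateway at `192.0.2.1`, a packet sent by the host from `127.0.0.1` reaches
the guest with source `192.0.2.1`, and a reply from the guest to `192.0.2.1` is
delivered to `127.0.0.1` on the host. Any other address passes through
unchanged.
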
### Local connections with _pasta_
Packets addressed to a loopback address in either namespace are directly
forwarded to the corresponding (or configured) port in the other namespace.
Similarly to _passt_, packets from the non-init namespace addressed to the
default gateway, which are therefore sent via the tap device, will have their
destination address translated to the loopback address.
## Protocols

_passt_ and _pasta_ support TCP, UDP and ICMP/ICMPv6 echo (requests and
replies). More details about the TCP implementation are described in the
[theory of operation](/passt/tree/tcp.c), and similarly for
[UDP](/passt/tree/udp.c).

An IGMP/MLD proxy is currently a work in progress.

## Ports
### passt
To avoid the need for explicit port mapping configuration, _passt_ can bind to
all unbound non-ephemeral (0-49152) TCP and UDP ports. Binding to low ports
(0-1023) will fail without additional capabilities, and ports already bound
(service proxies, etc.) will also not be used. Smaller subsets of ports, with
port translations, are also configurable.

UDP ephemeral ports are bound dynamically, as the guest uses them.

If all ports are forwarded, service proxies and other services running in the
container need to be started before _passt_ starts.
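
The "bind what is available, skip the rest" behaviour can be illustrated with a
short Python sketch. It is not the _passt_ implementation: the helper name
`bind_available` is hypothetical, it only handles TCP over IPv4, and it binds to
the loopback address so that running it is harmless.

```python
import socket


def bind_available(ports, host="127.0.0.1"):
    """Try to bind a listening TCP socket on each port.

    Returns (bound, skipped): a dict of port -> socket for successful binds,
    and a list of ports that could not be bound.
    """
    bound, skipped = {}, []
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EACCES for low ports without additional capabilities,
            # or EADDRINUSE if another service already holds the port
            sock.close()
            skipped.append(port)
            continue
        sock.listen()
        bound[port] = sock
    return bound, skipped


if __name__ == "__main__":
    # Arbitrary small range chosen for the demonstration
    bound, skipped = bind_available(range(8000, 8005))
    print("bound:", sorted(bound), "skipped:", skipped)
    for sock in bound.values():
        sock.close()
```

Ports that are already in use simply end up in `skipped`, which is why services
that need a given port have to be running before all ports are forwarded.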
### pasta
With default options, _pasta_ scans for bound ports on init and non-init
namespaces, and automatically forwards them from the other side. Port forwarding
is fully configurable with command line options.
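
One way to find bound TCP ports on Linux is to read `/proc/net/tcp`, where
listening sockets have state `0A`. The Python sketch below parses that format
from a string; it is meant to illustrate the idea of scanning for bound ports,
not to describe how _pasta_ does it, and the sample data and the
`listening_ports` helper are made up for the example.

```python
def listening_ports(proc_net_tcp: str) -> set:
    """Return the set of local ports in LISTEN state.

    proc_net_tcp is text in the format of /proc/net/tcp.
    """
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0x0A is TCP_LISTEN
            # local_address is HEXADDR:HEXPORT
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345
   1: 0100007F:A1B2 0100007F:1F90 01 00000000:00000000 00:00000000 00000000  1000        0 12346
"""

print(sorted(listening_ports(SAMPLE)))  # [8080]
```

On a real system the same function could be fed the contents of
`/proc/net/tcp`, read from inside the namespace of interest.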
## Demo
### pasta

<link rel="stylesheet" type="text/css" href="/static/asciinema-player.css" />
<script src="/static/asciinema-player.min.js"></script>
<div class="mobile_hide" id="demo_pasta_div" style="display: grid; grid-template-columns: 1fr 1fr;">
  <div id="demo_pasta" style="width: 99%;"></div>
  <div id="demo_podman" style="width: 99%;"></div>
</div>
<script>
if (getComputedStyle(document.getElementById('demo_pasta_div'))['visibility'] == "visible") {
  demo_pasta_player = AsciinemaPlayer.create('/builds/latest/web/demo_pasta.cast',
                                             document.getElementById('demo_pasta'),
                                             { cols: 130, rows: 41,
                                               preload: true, poster: 'npt:0:4'
                                             });
  demo_podman_player = AsciinemaPlayer.create('/builds/latest/web/demo_podman.cast',
                                              document.getElementById('demo_podman'),
                                              { cols: 130, rows: 41,
                                                preload: true, poster: 'npt:0:4'
                                              });
}
</script>
<div class="mobile_show">
<p><a href="/builds/latest/web/demo_pasta.html">Overview of pasta functionality</a></p>
<p><a href="/builds/latest/web/demo_podman.html">Overview of Podman operation with pasta</a></p>
</div>

### passt

<div class="mobile_hide" id="demo_passt" style="width: 70%; height: auto; max-height: 90%"></div>
<script>
if (getComputedStyle(document.getElementById('demo_passt'))['visibility'] == "visible") {
  demo_passt_player = AsciinemaPlayer.create('/builds/latest/web/demo_passt.cast',
                                             document.getElementById('demo_passt'),
                                             { cols: 130, rows: 41,
                                               preload: true, poster: 'npt:0:4'
                                             });
}
</script>
<div class="mobile_show">
<p><a href="/builds/latest/web/demo_passt.html">Overview of passt functionality</a></p>
</div>

## Continuous Integration

<div class="mobile_hide" id="ci" style="width: 90%; height: auto; max-height: 90%"></div>
<script>
if (getComputedStyle(document.getElementById('ci'))['visibility'] == "visible") {
  ci_player = AsciinemaPlayer.create('/builds/latest/web/ci.cast',
                                     document.getElementById('ci'),
                                     { cols: 240, rows: 51, poster: 'npt:999:0' }
                                    );
}
</script>
<div class="mobile_hide"><script src="/builds/latest/web/ci.js"></script></div>
<div class="mobile_show">
<p><a href="/builds/latest/web/ci.html">Continuous integration test run</a></p>
</div>

See also the [test logs](/builds/latest/test/).

## Performance

<script src="/builds/latest/web/perf.js"></script>

## Try it

### passt

* build from source:

        git clone https://passt.top/passt
        cd passt
        make

    * alternatively, install one of the [available packages](#availability)

  Static binaries and packages are simply built with:

        make pkgs

* have a look at the _man_ page for synopsis and options:

        man ./passt.1

* run the demo script, which detaches user and network namespaces, configures
  the new network namespace using `pasta`, starts `passt` and, optionally,
  `qemu`:

        doc/demo.sh

    * alternatively, you can use
      [libvirt](https://libvirt.org/formatdomain.html#userspace-slirp-or-passt-connection)
      to start QEMU

* and that's it, you should now have TCP connections, UDP, and ICMP/ICMPv6
  echo working from/to the guest for IPv4 and IPv6

* to connect to a service on the VM, just connect to the same port directly
  with the address of the current network namespace

### pasta

* build from source:

        git clone https://passt.top/passt
        cd passt
        make

    * alternatively, install one of the [available packages](#availability)

  Static binaries and packages are simply built with:

        make pkgs

* have a look at the _man_ page for synopsis and options:

        man ./pasta.1

* start pasta with:

        ./pasta

    * alternatively, use it directly with Podman (since Podman 4.3.2, or with
      commit [`aa47e05ae4a0`](https://github.com/containers/podman/commit/aa47e05ae4a0d14a338cbe106b7eb9cdf098a529)):

            podman run --net=pasta ...

* you're now inside a new user and network namespace. For IPv6, SLAAC happens
  right away as _pasta_ sets up the interface, but DHCPv6 support is available
  as well. For IPv4, configure the interface with a DHCP client:

        dhclient

  and, optionally:

        dhclient -6

    * alternatively, start pasta as:

            ./pasta --config-net

      to let pasta configure networking in the namespace by itself, using
      `netlink`

    * ...or run the demo script:

            doc/demo.sh

* and that's it, you should now have TCP connections, UDP, and ICMP/ICMPv6
  echo working from/to the namespace for IPv4 and IPv6

* to connect to a service inside the namespace, just connect to the same port
  using the loopback address.

## Contribute
### [Mailing Lists](/passt/lists)

* Submit, review patches, and discuss development ideas on
  [`passt-dev`](https://lists.passt.top/postorius/lists/passt-dev.passt.top/)

* Ask your questions and discuss usage needs on
  [`passt-user`](https://lists.passt.top/postorius/lists/passt-user.passt.top/)

### [Bug Reports and Feature Requests](/passt/bugs)

* **Pick up an [open bug](https://bugs.passt.top/buglist.cgi?bug_severity=blocker&bug_severity=quite%20bad&bug_severity=normal&bug_severity=minor&columnlist=bug_status%2Ccomponent%2Cpriority%2Cbug_severity%2Cassigned_to%2Cshort_desc%2Cchangeddate&known_name=Open%20bugs%2C%20by%20priority&list_id=85&query_based_on=Open%20bugs%2C%20by%20priority&query_format=advanced&resolution=---)**
* **Implement a [feature request](https://bugs.passt.top/buglist.cgi?bug_severity=enhancement&bug_severity=feature&columnlist=bug_status%2Ccomponent%2Cpriority%2Cbug_severity%2Cassigned_to%2Cshort_desc%2Cchangeddate&known_name=Features%2C%20by%20priority&list_id=81&order=priority%2Cbug_status%2Cassigned_to%2Cbug_id&query_based_on=Features%2C%20by%20priority&query_format=advanced&resolution=---)**
* Browse all [open items](https://bugs.passt.top/buglist.cgi?columnlist=bug_status%2Ccomponent%2Cpriority%2Cbug_severity%2Cassigned_to%2Cshort_desc%2Cchangeddate&known_name=All%20items%2C%20by%20priority&list_id=83&query_based_on=All%20items%2C%20by%20priority&query_format=advanced&resolution=---)
* ...or [file a bug](https://bugs.passt.top/enter_bug.cgi)

### [Chat](/passt/chat)

* Somebody might be available on [IRC](https://irc.passt.top) on `#passt` at
  [Libera.Chat](https://libera.chat/)

### Weekly development [meeting](https://pad.passt.top/p/weekly)

* Open to everybody! Feel free to join and propose a different time directly on
  the agenda.

## Security and Vulnerability Reports

* Please send an email to [passt-sec](mailto:passt-sec@passt.top), private list,
  no subscription required