mirror of
https://passt.top/passt
synced 2024-12-22 13:45:32 +00:00
README: pasta mode, CI, performance, updated links, etc.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
parent
b216df04a1
commit
cc8db1c5bc
255
README.md
@@ -1,11 +1,13 @@
|
|||||||
|
<span style="font-weight: bold; color: red;">While functional and tested to some extent, this project is still in early development phase: don't use in production or critical environments yet.</span>
|
||||||
|
|
||||||
# passt: Plug A Simple Socket Transport
|
# passt: Plug A Simple Socket Transport
|
||||||
|
|
||||||
_passt_ implements a translation layer between a Layer-2 network interface (tap)
|
_passt_ implements a translation layer between a Layer-2 network interface and
|
||||||
and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't
|
native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't
|
||||||
require any capabilities or privileges, and it can be used as a simple
|
require any capabilities or privileges, and it can be used as a simple
|
||||||
replacement for Slirp.
|
replacement for Slirp.
|
||||||
|
|
||||||
<img src="/builds/passt_overview.png" usemap="#image-map" class="bright" style="z-index: 20; position: relative;">
|
<img src="/builds/latest/web/passt_overview.png" usemap="#image-map" class="bright" style="z-index: 20; position: relative;">
|
||||||
<map name="image-map" id="map_overview">
|
<map name="image-map" id="map_overview">
|
||||||
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/tcp.7.html" coords="229,275,246,320,306,294,287,249" shape="poly">
|
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/tcp.7.html" coords="229,275,246,320,306,294,287,249" shape="poly">
|
||||||
<area class="map_area" target="_blank" href="https://lwn.net/Articles/420799/" coords="230,201,243,246,297,232,289,186" shape="poly">
|
<area class="map_area" target="_blank" href="https://lwn.net/Articles/420799/" coords="230,201,243,246,297,232,289,186" shape="poly">
|
||||||
@@ -35,7 +37,7 @@ replacement for Slirp.
|
|||||||
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man4/veth.4.html" coords="1044,471,1090,461,1126,462,1150,464,1176,479,1160,491,1121,500,1081,501,1044,491,1037,483" shape="poly">
|
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man4/veth.4.html" coords="1044,471,1090,461,1126,462,1150,464,1176,479,1160,491,1121,500,1081,501,1044,491,1037,483" shape="poly">
|
||||||
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/network_namespaces.7.html" coords="240,379,524,452" shape="rect">
|
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/network_namespaces.7.html" coords="240,379,524,452" shape="rect">
|
||||||
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/netlink.7.html" coords="1119,278,1117,293,1165,304,1169,288" shape="poly">
|
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/netlink.7.html" coords="1119,278,1117,293,1165,304,1169,288" shape="poly">
|
||||||
<area class="map_area" target="_blank" href="https://passt.top/passt/tree/passt.c#n195" coords="989,294,1040,264,1089,280,986,344" shape="poly">
|
<area class="map_area" target="_blank" href="https://passt.top/passt/tree/conf.c" coords="989,294,1040,264,1089,280,986,344" shape="poly">
|
||||||
</map>
|
</map>
|
||||||
<canvas id="map_highlight" style="border: 0px; z-index: 10; position: fixed; pointer-events: none"></canvas>
|
<canvas id="map_highlight" style="border: 0px; z-index: 10; position: fixed; pointer-events: none"></canvas>
|
||||||
<script>
|
<script>
|
||||||
@@ -92,17 +94,35 @@ for (var i = 0; i < map_areas.length; i++) {
|
|||||||
}
|
}
|
||||||
</script>
|
</script>
|
||||||
|
|
||||||
- [General idea](#general-idea)
|
# pasta: Pack A Subtle Tap Abstraction
|
||||||
|
|
||||||
|
_pasta_ (same binary as _passt_, different command) offers equivalent
|
||||||
|
functionality, for network namespaces: traffic is forwarded using a tap
|
||||||
|
interface inside the namespace, without the need to create further interfaces on
|
||||||
|
the host, hence not requiring any capabilities or privileges.
|
||||||
|
|
||||||
|
It also implements a tap bypass path for local connections: packets with a local
|
||||||
|
destination address are moved directly between Layer-4 sockets, avoiding Layer-2
|
||||||
|
translations, using the _splice_(2) and _recvmmsg_(2)/_sendmmsg_(2) system calls
|
||||||
|
for TCP and UDP, respectively.
|
||||||
|
|
||||||
|
<img src="/builds/latest/web/pasta_overview.png" class="bright" style="z-index: 20; position: relative;">
|
||||||
|
|
||||||
|
- [Motivation](#motivation)
|
||||||
- [Non-functional Targets](#non-functional-targets)
|
- [Non-functional Targets](#non-functional-targets)
|
||||||
- [Interfaces and Environment](#interfaces-and-environment)
|
- [Interfaces and Environment](#interfaces-and-environment)
|
||||||
- [Services](#services)
|
- [Services](#services)
|
||||||
- [Addresses](#addresses)
|
- [Addresses](#addresses)
|
||||||
- [Protocols](#protocols)
|
- [Protocols](#protocols)
|
||||||
- [Ports](#ports)
|
- [Ports](#ports)
|
||||||
|
- [Continuous Integration](#continuous-integration)
|
||||||
|
- [Performance](#performance)
|
||||||
- [Try it](#try-it)
|
- [Try it](#try-it)
|
||||||
- [Contribute](#contribute)
|
- [Contribute](#contribute)
|
||||||
|
|
||||||
## General idea
|
## Motivation
|
||||||
|
|
||||||
|
### passt
|
||||||
|
|
||||||
When container workloads are moved to virtual machines, the network traffic is
|
When container workloads are moved to virtual machines, the network traffic is
|
||||||
typically forwarded by interfaces operating at data link level. Some components
|
typically forwarded by interfaces operating at data link level. Some components
|
||||||
@@ -110,19 +130,17 @@ in the containers ecosystem (such as _service meshes_), however, expect
|
|||||||
applications to run locally, with visible sockets and processes, for the
|
applications to run locally, with visible sockets and processes, for the
|
||||||
purposes of socket redirection, monitoring, port mapping.
|
purposes of socket redirection, monitoring, port mapping.
|
||||||
|
|
||||||
To solve this issue, user mode networking as provided e.g. by _Slirp_,
|
To solve this issue, user mode networking, as provided e.g. by _libslirp_,
|
||||||
_libslirp_, _slirp4netns_ can be used. However, these existing solutions
|
can be used. Existing solutions implement a full TCP/IP stack, replaying traffic
|
||||||
implement a full TCP/IP stack, replaying traffic on sockets that are local to
|
on sockets that are local to the pod of the service mesh. This creates the
|
||||||
the pod of the service mesh. This creates the illusion of application processes
|
illusion of application processes running on the same host, possibly separated
|
||||||
running on the same host, eventually separated by user namespaces.
|
by user namespaces.
|
||||||
|
|
||||||
While being almost transparent to the service mesh infrastructure, that kind of
|
While being almost transparent to the service mesh infrastructure, that kind of
|
||||||
solution comes with a number of downsides:
|
solution comes with a number of downsides:
|
||||||
|
|
||||||
* three different TCP/IP stacks (guest, adaptation and host) need to be
|
* three different TCP/IP stacks (guest, adaptation and host) need to be
|
||||||
traversed for every service request. There are no chances to implement
|
traversed for every service request
|
||||||
zero-copy mechanisms, and the amount of context switches increases
|
|
||||||
dramatically
|
|
||||||
* addressing needs to be coordinated to create the pretense of consistent
|
* addressing needs to be coordinated to create the pretense of consistent
|
||||||
addresses and routes between guest and host environments. This typically needs
|
addresses and routes between guest and host environments. This typically needs
|
||||||
a NAT with masquerading, or some form of packet bridging
|
a NAT with masquerading, or some form of packet bridging
|
||||||
@@ -135,21 +153,43 @@ solution comes with a number of downsides:
|
|||||||
would if deployed with regular containers
|
would if deployed with regular containers
|
||||||
|
|
||||||
_passt_ implements a thinner layer between guest and host, that only implements
|
_passt_ implements a thinner layer between guest and host, that only implements
|
||||||
what's strictly needed to pretend processes are running locally. A further, full
|
what's strictly needed to pretend processes are running locally. The TCP
|
||||||
TCP/IP stack is not necessarily needed. Some sort of TCP adaptation is needed,
|
adaptation doesn't keep per-connection packet buffers, and reflects observed
|
||||||
however, as this layer runs without the `CAP_NET_RAW` capability: we can't
|
sending windows and acknowledgements between the two sides. This TCP adaptation
|
||||||
create raw IP sockets on the pod, and therefore need to map packets at Layer-2
|
is needed as _passt_ runs without the `CAP_NET_RAW` capability: it can't create
|
||||||
to Layer-4 sockets offered by the host kernel.
|
raw IP sockets on the pod, and therefore needs to map packets at Layer-2 to
|
||||||
|
Layer-4 sockets offered by the host kernel.
|
||||||
|
|
||||||
The problem and this approach are illustrated in more detail, with diagrams,
|
The problem and this approach are illustrated in more detail, with diagrams,
|
||||||
[here](https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md).
|
[here](https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md).
|
||||||
|
|
||||||
|
### pasta
|
||||||
|
|
||||||
|
On Linux, regular users can create network namespaces and run application
|
||||||
|
services inside them. However, connecting namespaces to other namespaces and to
|
||||||
|
external hosts requires the creation of network interfaces, such as `veth`
|
||||||
|
pairs, which in turn requires elevated privileges or the `CAP_NET_ADMIN`
|
||||||
|
capability. _pasta_, similarly to _slirp4netns_, solves this problem by creating
|
||||||
|
a tap interface available to processes in the namespace, and mapping network
|
||||||
|
traffic outside the namespace using native Layer-4 sockets.
|
||||||
|
|
||||||
|
Existing approaches typically implement a full, generic TCP/IP stack for this
|
||||||
|
translation between data and transport layers, without the possibility of
|
||||||
|
speeding up local connections, and usually requiring NAT. _pasta_:
|
||||||
|
* avoids the need for a generic, full-fledged TCP/IP stack by coordinating TCP
|
||||||
|
connection dynamics between sender and receiver
|
||||||
|
* offers a fast bypass path for local connections: if a process connects to
|
||||||
|
another process on the same host across namespaces, data is directly forwarded
|
||||||
|
using pairs of Layer-4 sockets
|
||||||
|
* with default options, maps routing and addressing information to the
|
||||||
|
namespace, avoiding any need for NAT
|
||||||
|
|
||||||
## Non-functional Targets
|
## Non-functional Targets
|
||||||
|
|
||||||
Security and maintainability goals:
|
Security and maintainability goals:
|
||||||
|
|
||||||
* no dynamic memory allocation
|
* no dynamic memory allocation
|
||||||
* ~2 000 LoC target
|
* ~5 000 LoC target
|
||||||
* no external dependencies
|
* no external dependencies
|
||||||
|
|
||||||
## Interfaces and Environment
|
## Interfaces and Environment
|
||||||
@@ -166,83 +206,125 @@ TCP. Two temporary solutions are available:
|
|||||||
This approach, compared to using a _tap_ device, doesn't require any security
|
This approach, compared to using a _tap_ device, doesn't require any security
|
||||||
capabilities, as we don't need to create any interface.
|
capabilities, as we don't need to create any interface.
|
||||||
|
|
||||||
|
_pasta_ runs out of the box with any recent (post-3.8) Linux kernel.
|
||||||
|
|
||||||
## Services
|
## Services
|
||||||
|
|
||||||
_passt_ provides some minimalistic implementations of networking services that
|
_passt_ and _pasta_ provide some minimalistic implementations of networking
|
||||||
can't practically run on the host:
|
services:
|
||||||
|
|
||||||
* [ARP proxy](https://passt.top/passt/tree/arp.c), that resolves the address of
|
* [ARP proxy](https://passt.top/passt/tree/arp.c), that resolves the address of
|
||||||
the host (which is used as gateway) to the original MAC address of the host
|
the host (which is used as gateway) to the original MAC address of the host
|
||||||
* [DHCP server](https://passt.top/passt/tree/dhcp.c), a simple implementation
|
* [DHCP server](https://passt.top/passt/tree/dhcp.c), a simple implementation
|
||||||
handing out one single IPv4 address to the guest, namely, the same address as
|
handing out one single IPv4 address to the guest or namespace, namely, the
|
||||||
the first one configured for the upstream host interface, and passing the
|
same address as the first one configured for the upstream host interface, and
|
||||||
nameservers configured on the host
|
passing the nameservers configured on the host
|
||||||
* [NDP proxy](https://passt.top/passt/tree/ndp.c), which can also assign prefix
|
* [NDP proxy](https://passt.top/passt/tree/ndp.c), which can also assign prefix
|
||||||
and nameserver using SLAAC
|
and nameserver using SLAAC
|
||||||
* [DHCPv6 server](https://passt.top/passt/tree/dhcpv6.c): a simple
|
* [DHCPv6 server](https://passt.top/passt/tree/dhcpv6.c): a simple
|
||||||
implementation handing out one single IPv6 address to the guest, namely, the
|
implementation handing out one single IPv6 address to the guest or namespace,
|
||||||
the same address as the first one configured for the upstream host interface,
|
namely, the same address as the first one configured for the upstream host
|
||||||
and passing the first nameserver configured on the host
|
interface, and passing the nameservers configured on the host
|
||||||
|
|
||||||
## Addresses
|
## Addresses
|
||||||
|
|
||||||
For IPv4, the guest is assigned, via DHCP, the same address as the upstream
|
For IPv4, the guest or namespace is assigned, via DHCP, the same address as the
|
||||||
interface of the host, and the same default gateway as the default gateway of
|
upstream interface of the host, and the same default gateway as the default
|
||||||
the host. Addresses are translated in case the guest is seen using a different
|
gateway of the host. Addresses are translated in case the guest is seen using a
|
||||||
address from the assigned one.
|
different address from the assigned one.
|
||||||
|
|
||||||
For IPv6, the guest is assigned, via SLAAC, the same prefix as the upstream
|
For IPv6, the guest or namespace is assigned, via SLAAC, the same prefix as the
|
||||||
interface of the host, the same default route as the default route of the
|
upstream interface of the host, the same default route as the default route of
|
||||||
host, and, if a DHCPv6 client is running on the guest, also the same address as
|
the host, and, if a DHCPv6 client is running in the guest or namespace, also the
|
||||||
the upstream address of the host. This means that, with a DHCPv6 client on the
|
same address as the upstream address of the host. This means that, with a DHCPv6
|
||||||
guest, addresses don't need to be translated. Should the client use a different
|
client in the guest or namespace, addresses don't need to be translated. Should
|
||||||
address, the destination address is translated for packets going to the guest.
|
the client use a different address, the destination address is translated for
|
||||||
|
packets going to the guest or to the namespace.
|
||||||
|
|
||||||
For UDP and TCP, for both IPv4 and IPv6, packets addressed to a loopback address
|
### Local connections with _passt_
|
||||||
are forwarded to the guest with their source address changed to the address of
|
|
||||||
the gateway or first hop of the default route. This mapping is reversed as the
|
For UDP and TCP, for both IPv4 and IPv6, packets from the host addressed to a
|
||||||
guest replies to those packets (on the same TCP connection, or using destination
|
loopback address are forwarded to the guest with their source address changed to
|
||||||
port and address that were used as source for UDP).
|
the address of the gateway or first hop of the default route. This mapping is
|
||||||
|
reversed in the other direction.
|
||||||
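The loopback mapping described for _passt_ can be sketched as a pair of pure functions. This is only an illustration in Python, not code taken from _passt_ (which is written in C): the function names, the `gateway` and `loopback` parameters, and the example addresses are all hypothetical.

```python
import ipaddress


def source_seen_by_guest(src, gateway):
    """Host to guest: a loopback source is replaced by the gateway address."""
    addr = ipaddress.ip_address(src)
    if addr.is_loopback:
        return ipaddress.ip_address(gateway)
    return addr


def destination_seen_by_host(dst, gateway, loopback):
    """Guest to host: the reverse mapping, gateway address back to loopback."""
    addr = ipaddress.ip_address(dst)
    if addr == ipaddress.ip_address(gateway):
        return ipaddress.ip_address(loopback)
    return addr


# Hypothetical addresses taken from the documentation ranges
print(source_seen_by_guest("127.0.0.1", "192.0.2.1"))
print(destination_seen_by_host("192.0.2.1", "192.0.2.1", "127.0.0.1"))
```

Addresses that are neither loopback (on the way in) nor the gateway (on the way back) pass through unchanged in this sketch.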
|
|
||||||
|
### Local connections with _pasta_
|
||||||
|
|
||||||
|
Packets addressed to a loopback address in either namespace are directly
|
||||||
|
forwarded to the corresponding (or configured) port in the other namespace.
|
||||||
|
Similarly to _passt_, packets from the non-init namespace addressed to the
|
||||||
|
default gateway, which are therefore sent via the tap device, will have their
|
||||||
|
destination address translated to the loopback address.
|
||||||
|
|
||||||
## Protocols
|
## Protocols
|
||||||
|
|
||||||
_passt_ supports TCP, UDP and ICMP/ICMPv6 echo (requests and replies). More
|
_passt_ and _pasta_ support TCP, UDP and ICMP/ICMPv6 echo (requests and
|
||||||
details about the TCP implementation are available
|
replies). More details about the TCP implementation are available
|
||||||
[here](https://passt.top/passt/tree/tcp.c), and for the UDP
|
[here](https://passt.top/passt/tree/tcp.c), and for the UDP
|
||||||
implementation [here](https://passt.top/passt/tree/udp.c).
|
implementation [here](https://passt.top/passt/tree/udp.c).
|
||||||
|
|
||||||
An IGMP proxy is currently work in progress.
|
An IGMP/MLD proxy is currently a work in progress.
|
||||||
|
|
||||||
## Ports
|
## Ports
|
||||||
|
|
||||||
To avoid the need for explicit port mapping configuration, _passt_ binds to all
|
### passt
|
||||||
unbound non-ephemeral (0-49152) TCP and UDP ports. Binding to low ports (0-1023)
|
|
||||||
will fail without additional capabilities, and ports already bound (service
|
To avoid the need for explicit port mapping configuration, _passt_ can bind to
|
||||||
proxies, etc.) will also not be used.
|
all unbound non-ephemeral (0-49152) TCP and UDP ports. Binding to low ports
|
||||||
|
(0-1023) will fail without additional capabilities, and ports already bound
|
||||||
|
(service proxies, etc.) will also not be used. Smaller subsets of ports, with
|
||||||
|
port translations, are also configurable.
|
||||||
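The "bind to everything that is free, skip the rest" behaviour can be illustrated with a short Python sketch. It is not the _passt_ implementation: the helper name `bind_unbound_ports` is hypothetical, only TCP over IPv4 is covered, and the sockets are bound to the loopback address to keep the example self-contained.

```python
import socket


def bind_unbound_ports(ports, host="127.0.0.1"):
    """Try to listen on each TCP port; return (sockets by port, skipped ports)."""
    bound, skipped = {}, []
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
        except OSError:
            # Port already in use, or a low port without the needed capability
            sock.close()
            skipped.append(port)
        else:
            bound[port] = sock
    return bound, skipped


# Occupy one port first, then check that it is skipped instead of being fatal
busy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
busy.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
busy.listen()
busy_port = busy.getsockname()[1]

bound, skipped = bind_unbound_ports([busy_port])
print("bound:", sorted(bound), "skipped:", skipped)
```

A failed `bind()` is treated as a normal outcome here, which matches the description above: ports that are unavailable are simply not forwarded.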
|
|
||||||
UDP ephemeral ports are bound dynamically, as the guest uses them.
|
UDP ephemeral ports are bound dynamically, as the guest uses them.
|
||||||
|
|
||||||
Service proxies and other services running in the container need to be started
|
If all ports are forwarded, service proxies and other services running in the
|
||||||
before _passt_ starts.
|
container need to be started before _passt_ starts.
|
||||||
|
|
||||||
|
### pasta
|
||||||
|
|
||||||
|
With default options, _pasta_ scans for bound ports on init and non-init
|
||||||
|
namespaces, and automatically forwards them from the other side. Port forwarding
|
||||||
|
is fully configurable with command line options.
|
||||||
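How the scan for bound ports is done is not described here. One way such a scan could work on Linux, shown purely as an assumption, is to read the listening sockets from the `/proc/net/tcp` table of each namespace. The Python sketch below parses text in that layout; the function name and the sample data are hypothetical.

```python
LISTEN = "0A"  # TCP_LISTEN state as printed in /proc/net/tcp


def listening_ports(proc_net_tcp_text):
    """Return the sorted list of local ports in LISTEN state."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4 or fields[3] != LISTEN:
            continue
        _addr, port_hex = fields[1].rsplit(":", 1)
        ports.add(int(port_hex, 16))
    return sorted(ports)


# Hypothetical sample in the /proc/net/tcp layout: one listening socket on
# port 22 (0x0016) and one established connection, which is ignored
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345
   1: 0100007F:C350 0100007F:0016 01 00000000:00000000 00:00000000 00000000  1000        0 12346
"""
print(listening_ports(SAMPLE))
```

On a real system the input would come from reading `/proc/net/tcp` (and `/proc/net/tcp6`) inside the namespace of interest; the ports found on one side would then be the candidates to forward from the other.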
|
|
||||||
|
## Continuous Integration
|
||||||
|
|
||||||
|
<p><video id="ci_video" style="width: 90%; height: auto; max-height: 90%" controls>
|
||||||
|
<source src="/builds/latest/web/ci.webm" type="video/webm">
|
||||||
|
</video></p>
|
||||||
|
|
||||||
|
<script src="/builds/latest/web/ci.js"></script>
|
||||||
|
|
||||||
|
Test logs [here](https://passt.top/builds/latest/test/).
|
||||||
|
|
||||||
|
## Performance
|
||||||
|
|
||||||
|
<script src="/builds/latest/web/perf.js"></script>
|
||||||
|
|
||||||
## Try it
|
## Try it
|
||||||
|
|
||||||
|
### passt
|
||||||
|
|
||||||
* build from source:
|
* build from source:
|
||||||
|
|
||||||
git clone https://passt.top/passt
|
git clone https://passt.top/passt
|
||||||
cd passt
|
cd passt
|
||||||
make
|
make
|
||||||
|
|
||||||
* to make _passt_ not fork into background when it starts, and to get verbose
|
* alternatively, static builds for x86_64, with or without AVX2 instructions,
|
||||||
debug information, build with:
|
as of the latest commit are also available for convenience
|
||||||
|
[here](https://passt.top/builds/latest/x86_64/avx2/) and
|
||||||
|
[here](https://passt.top/builds/latest/x86_64/). Convenience, non-official
|
||||||
|
packages for Debian (and derivatives) and RPM-based distributions are also
|
||||||
|
available there. These binaries and packages are simply built with:
|
||||||
|
|
||||||
CFLAGS="-DDEBUG" make
|
CFLAGS="-static" make avx2
|
||||||
|
make pkgs
|
||||||
|
make static
|
||||||
|
make pkgs
|
||||||
|
|
||||||
* a static build for x86_64 as of the latest commit is also available for
|
* have a look at the _man_ page for synopsis and options:
|
||||||
convenience [here](https://passt.top/builds/static/). These binaries are
|
|
||||||
simply built with:
|
|
||||||
|
|
||||||
CFLAGS="-static" make
|
man ./passt.1
|
||||||
|
|
||||||
* run the demo script, that creates a network namespace called `passt`, sets up
|
* run the demo script, that creates a network namespace called `passt`, sets up
|
||||||
a _veth_ pair and addresses, together with NAT for IPv4 and NDP
|
a _veth_ pair and addresses, together with NAT for IPv4 and NDP
|
||||||
@@ -283,14 +365,51 @@ before _passt_ starts.
|
|||||||
|
|
||||||
ssh 192.0.2.2
|
ssh 192.0.2.2
|
||||||
|
|
||||||
|
### pasta
|
||||||
|
|
||||||
|
* build from source:
|
||||||
|
|
||||||
|
git clone https://passt.top/passt
|
||||||
|
cd passt
|
||||||
|
make
|
||||||
|
|
||||||
|
* alternatively, static builds for x86_64, with or without AVX2 instructions,
|
||||||
|
as of the latest commit are also available for convenience
|
||||||
|
[here](https://passt.top/builds/latest/x86_64/avx2/) and
|
||||||
|
[here](https://passt.top/builds/latest/x86_64/). Convenience, non-official
|
||||||
|
packages for Debian (and derivatives) and RPM-based distributions are also
|
||||||
|
available there. These binaries and packages are simply built with:
|
||||||
|
|
||||||
|
CFLAGS="-static" make avx2
|
||||||
|
make pkgs
|
||||||
|
make static
|
||||||
|
make pkgs
|
||||||
|
|
||||||
|
* have a look at the _man_ page for synopsis and options:
|
||||||
|
|
||||||
|
man ./pasta.1
|
||||||
|
|
||||||
|
* start pasta with:
|
||||||
|
|
||||||
|
./pasta
|
||||||
|
|
||||||
|
* you're now inside a new user and network namespace. For IPv6, SLAAC happens
|
||||||
|
right away as _pasta_ sets up the interface, but DHCPv6 support is available
|
||||||
|
as well. For IPv4, configure the interface with a DHCP client:
|
||||||
|
|
||||||
|
dhclient
|
||||||
|
|
||||||
|
and, optionally:
|
||||||
|
|
||||||
|
dhclient -6
|
||||||
|
|
||||||
|
* and that's it, you should now have TCP connections, UDP, and ICMP/ICMPv6
|
||||||
|
echo working from/to the namespace for IPv4 and IPv6
|
||||||
|
|
||||||
|
* to connect to a service inside the namespace, just connect to the same port
|
||||||
|
using the loopback address.
|
||||||
|
|
||||||
## Contribute
|
## Contribute
|
||||||
|
|
||||||
Send patches and issue reports to [sbrivio@redhat.com](mailto:sbrivio@redhat.com).
|
Public bug tracker and mailing lists are coming soon. For the time being, send
|
||||||
|
patches and issue reports to [sbrivio@redhat.com](mailto:sbrivio@redhat.com).
|
||||||
<p><video id="ci_video" style="width: 90%; height: auto; max-height: 90%" controls>
|
|
||||||
<source src="/builds/ci.mp4" type="video/mp4">
|
|
||||||
</video></p>
|
|
||||||
|
|
||||||
<script src="/builds/perf.js"></script>
|
|
||||||
|
|
||||||
<script src="/builds/video_links.js"></script>
|
|
||||||
|