mirror of https://passt.top/passt synced 2024-06-30 23:12:39 +00:00
passt implements a translation layer between a Layer-2 network interface and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't require any capabilities or privileges, and it can be used as a simple replacement for Slirp.
Stefano Brivio faff133629 dhcpv6: Fix REPLY messages with NotOnLink status code
The NotOnLink status code needs to be appended to the existing IA
content, because if we omit the requested addresses in the reply,
ISC's dhclient handles it as a NoAddrsAvail response.

Also fix length accounting (we would send a bunch of zeroes after
the IA otherwise), and print an informational message with the
requested address, if it's not appropriate for the link.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-04-21 17:15:23 +02:00
doc passt: Set soft limit for number of open files to hard limit 2021-03-18 12:58:07 +01:00
libvirt passt: Add libvirt patch for qemu UNIX socket domain back-end 2021-03-21 00:08:42 +01:00
qemu passt: qemu patch for direct UNIX domain connection without the qrap wrapper 2021-03-18 12:58:07 +01:00
arp.c passt: Assorted fixes from "fresh eyes" review 2021-02-21 11:55:49 +01:00
arp.h passt: Assorted fixes from "fresh eyes" review 2021-02-21 11:55:49 +01:00
dhcp.c dhcp: Remove left-over comment about "forced" options 2021-03-26 12:19:56 +01:00
dhcp.h passt: Assorted fixes from "fresh eyes" review 2021-02-21 11:55:49 +01:00
dhcpv6.c dhcpv6: Fix REPLY messages with NotOnLink status code 2021-04-21 17:15:23 +02:00
dhcpv6.h passt: Introduce a DHCPv6 server 2021-04-13 22:37:40 +02:00
icmp.c passt: Introduce ICMP echo proxy 2021-03-18 12:58:03 +01:00
icmp.h passt: Introduce ICMP echo proxy 2021-03-18 12:58:03 +01:00
igmp.c passt: Create dummy igmp.c, mld.c files for image map in README 2021-04-13 22:41:04 +02:00
Makefile passt: Introduce a DHCPv6 server 2021-04-13 22:37:40 +02:00
mld.c passt: Create dummy igmp.c, mld.c files for image map in README 2021-04-13 22:41:04 +02:00
ndp.c passt: Introduce a DHCPv6 server 2021-04-13 22:37:40 +02:00
ndp.h passt: Assorted fixes from "fresh eyes" review 2021-02-21 11:55:49 +01:00
passt.c passt: Make UNIX domain socket world-writable and world-readable 2021-04-13 22:37:40 +02:00
passt.h passt: Introduce ICMP echo proxy 2021-03-18 12:58:03 +01:00
qrap.c passt: New design and implementation with native Layer 4 sockets 2021-02-16 09:28:55 +01:00
README.md README: Don't let <canvas> steal pointer events 2021-04-13 22:54:08 +02:00
siphash.c tcp: Add siphash implementation for initial sequence numbers 2021-03-17 10:57:36 +01:00
siphash.h tcp: Add siphash implementation for initial sequence numbers 2021-03-17 10:57:36 +01:00
tap.c passt: New design and implementation with native Layer 4 sockets 2021-02-16 09:28:55 +01:00
tap.h passt: New design and implementation with native Layer 4 sockets 2021-02-16 09:28:55 +01:00
tcp.c tcp: Don't dereference IPv4 addresses 2021-03-20 22:19:15 +01:00
tcp.h tcp: Add struct for TCP execution context, move hash_secret to it 2021-03-17 10:57:41 +01:00
udp.c udp: Fix typo in tcp_tap_handler() documentation 2021-03-17 10:57:42 +01:00
udp.h passt: New design and implementation with native Layer 4 sockets 2021-02-16 09:28:55 +01:00
util.c passt: Run in background, add message logging with severities 2021-03-18 12:58:07 +01:00
util.h passt: Run in background, add message logging with severities 2021-03-18 12:58:07 +01:00

passt: Plug A Simple Socket Transport

passt implements a translation layer between a Layer-2 network interface (tap) and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't require any capabilities or privileges, and it can be used as a simple replacement for Slirp.

General idea

When container workloads are moved to virtual machines, the network traffic is typically forwarded by interfaces operating at data link level. Some components in the containers ecosystem (such as service meshes), however, expect applications to run locally, with visible sockets and processes, for the purposes of socket redirection, monitoring, port mapping.

To solve this issue, user mode networking, as provided e.g. by Slirp, libslirp, or slirp4netns, can be used. However, these existing solutions implement a full TCP/IP stack, replaying traffic on sockets that are local to the pod of the service mesh. This creates the illusion of application processes running on the same host, possibly separated by user namespaces.

While being almost transparent to the service mesh infrastructure, that kind of solution comes with a number of downsides:

  • three different TCP/IP stacks (guest, adaptation and host) need to be traversed for every service request. There is no chance to implement zero-copy mechanisms, and the number of context switches increases dramatically
  • addressing needs to be coordinated to create the pretense of consistent addresses and routes between guest and host environments. This typically needs a NAT with masquerading, or some form of packet bridging
  • the traffic seen by the service mesh and observable externally is a distant replica of the packets forwarded to and from the guest environment:
    • TCP congestion windows and network buffering mechanisms in general operate differently from what would be naturally expected by the application
    • protocols carrying addressing information might pose additional challenges, as the applications don't see the same set of addresses and routes as they would if deployed with regular containers

passt implements a thinner layer between guest and host, that only implements what's strictly needed to pretend processes are running locally. A further, full TCP/IP stack is not necessarily needed. Some sort of TCP adaptation is needed, however, as this layer runs without the CAP_NET_RAW capability: we can't create raw IP sockets on the pod, and therefore need to map packets at Layer-2 to Layer-4 sockets offered by the host kernel.
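
As a rough illustration of the first step in mapping Layer-2 frames to Layer-4 sockets, the sketch below reads the IP protocol number from an Ethernet frame, which is what decides whether a TCP, UDP, or ICMP socket would be needed. It is not taken from passt's sources: the function name `frame_l4_proto()` and the `SK_` constants are hypothetical, and the sketch assumes untagged frames (no VLAN header) and ignores IPv6 extension headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only */
#define SK_ETH_HLEN		14u	/* destination MAC, source MAC, EtherType */
#define SK_ETHERTYPE_IPV4	0x0800u
#define SK_ETHERTYPE_IPV6	0x86ddu

/* Return the IP protocol number (6 for TCP, 17 for UDP, 1 for ICMP,
 * 58 for ICMPv6, ...) carried by an untagged Ethernet frame, or -1 if the
 * frame is too short or is neither IPv4 nor IPv6. */
int frame_l4_proto(const uint8_t *frame, size_t len)
{
	unsigned int ethertype;

	assert(frame != NULL);

	if (len < SK_ETH_HLEN)
		return -1;

	/* EtherType is stored in network byte order at offset 12 */
	ethertype = (unsigned int)frame[12] << 8 | frame[13];

	/* IPv4: minimal header is 20 bytes, "Protocol" is at offset 9 */
	if (ethertype == SK_ETHERTYPE_IPV4 && len >= SK_ETH_HLEN + 20)
		return frame[SK_ETH_HLEN + 9];

	/* IPv6: fixed header is 40 bytes, "Next Header" is at offset 6 */
	if (ethertype == SK_ETHERTYPE_IPV6 && len >= SK_ETH_HLEN + 40)
		return frame[SK_ETH_HLEN + 6];

	return -1;
}
```

A caller would then use the returned protocol number to pick the kind of host socket that handles the payload.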

The problem and this approach are illustrated in more detail, with diagrams, here.

Non-functional Targets

Security and maintainability goals:

  • no dynamic memory allocation
  • ~2 000 LoC target
  • no external dependencies

Interfaces and Environment

passt exchanges packets with qemu via a UNIX domain socket, using the socket back-end in qemu. Currently, qemu can only connect to a listening process via TCP. Two temporary solutions are available:

  • a patch for qemu
  • a wrapper, qrap, that connects to a UNIX domain socket and starts qemu, which can now use the file descriptor that's already opened

This approach, compared to using a tap device, doesn't require any security capabilities, as we don't need to create any interface.
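
The connection step performed by a wrapper such as qrap can be sketched as follows. This is a minimal illustration, not qrap's actual code: the function name `unix_connect()` is hypothetical, and the use of a stream socket (`SOCK_STREAM`) is an assumption based on qemu's socket back-end otherwise connecting over TCP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a listening UNIX domain stream socket at 'path'.
 * Returns a connected file descriptor, or -1 on failure.
 * Hypothetical helper, for illustration only. */
int unix_connect(const char *path)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd;

	assert(path != NULL);

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;	/* path would not fit in sun_path */
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

No capability is needed for any of these calls. A wrapper could then hand the returned descriptor to the qemu process it starts, for instance with the `/tmp/passt.socket` path used in the examples in this document.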

Services

passt provides some minimalistic implementations of networking services that can't practically run on the host:

  • ARP proxy, that resolves the address of the host (which is used as gateway) to the original MAC address of the host
  • DHCP server, a simple implementation handing out one single IPv4 address to the guest, namely, the same address as the first one configured for the upstream host interface, and passing the nameservers configured on the host
  • NDP proxy, which can also assign prefix and nameserver using SLAAC
  • DHCPv6 server, a simple implementation handing out one single IPv6 address to the guest, namely, the same address as the first one configured for the upstream host interface, and passing the first nameserver configured on the host
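
To make the ARP proxy item more concrete, here is a minimal sketch of how a reply could be built for a request that asks for the gateway address. It follows the Ethernet/IPv4 ARP layout from RFC 826, but it is not passt's implementation: the function name `arp_proxy_reply()` and the `SK_` constants are hypothetical, and the sketch handles only the 28-byte ARP payload, not the enclosing Ethernet header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only */
#define SK_ARP_LEN	28	/* ARP payload for Ethernet and IPv4 */
#define SK_ARP_REQUEST	1
#define SK_ARP_REPLY	2

/* Byte offsets within the ARP payload (RFC 826, Ethernet/IPv4 case) */
#define SK_ARP_OP	6	/* operation, 2 bytes, network order */
#define SK_ARP_SHA	8	/* sender hardware address, 6 bytes */
#define SK_ARP_SPA	14	/* sender protocol address, 4 bytes */
#define SK_ARP_THA	18	/* target hardware address, 6 bytes */
#define SK_ARP_TPA	24	/* target protocol address, 4 bytes */

/* If 'req' is an ARP request for 'gw_ip', write a reply announcing
 * 'host_mac' into 'rep' and return 1. Otherwise return 0 and leave
 * 'rep' untouched. */
int arp_proxy_reply(const uint8_t req[SK_ARP_LEN], const uint8_t gw_ip[4],
		    const uint8_t host_mac[6], uint8_t rep[SK_ARP_LEN])
{
	/* Hardware type 1 (Ethernet), protocol 0x0800 (IPv4), sizes 6 and 4 */
	static const uint8_t hdr[6] = { 0x00, 0x01, 0x08, 0x00, 6, 4 };

	assert(req && gw_ip && host_mac && rep);

	if (memcmp(req, hdr, sizeof(hdr)))
		return 0;	/* not Ethernet/IPv4 */
	if (req[SK_ARP_OP] != 0 || req[SK_ARP_OP + 1] != SK_ARP_REQUEST)
		return 0;	/* not a request */
	if (memcmp(req + SK_ARP_TPA, gw_ip, 4))
		return 0;	/* asking for some other address */

	memcpy(rep, hdr, sizeof(hdr));
	rep[SK_ARP_OP] = 0;
	rep[SK_ARP_OP + 1] = SK_ARP_REPLY;
	memcpy(rep + SK_ARP_SHA, host_mac, 6);		/* answer: host MAC */
	memcpy(rep + SK_ARP_SPA, gw_ip, 4);
	memcpy(rep + SK_ARP_THA, req + SK_ARP_SHA, 6);	/* back to the asker */
	memcpy(rep + SK_ARP_TPA, req + SK_ARP_SPA, 4);

	return 1;
}
```

Requests for any other address are left unanswered in this sketch, since only the gateway address needs to resolve for the guest to reach the host.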

Addresses

For IPv4, the guest is assigned, via DHCP, the same address as the upstream interface of the host, and the same default gateway as the default gateway of the host. Addresses are never translated.

For IPv6, the guest is assigned, via SLAAC, the same prefix as the upstream interface of the host, the same default route as the default route of the host, and, if a DHCPv6 client is running on the guest, also the same address as the upstream address of the host. This means that, with a DHCPv6 client on the guest, addresses don't need to be translated. Should the client use a different address, the destination address is translated for packets going to the guest.

Protocols

passt supports TCP, UDP and ICMP/ICMPv6 echo (requests and replies). More details about the TCP implementation are available here, and for the UDP implementation here.

An IGMP proxy is currently a work in progress.

Ports

To avoid the need for explicit port mapping configuration, passt binds to all unbound non-ephemeral (0-49151) TCP ports and all unbound (0-65535) UDP ports. Binding to low ports (0-1023) will fail without additional capabilities, and ports already bound (service proxies, etc.) will also not be used.

Service proxies and other services running in the container need to be started before passt starts.
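
The "bind whatever is still free" behaviour described above can be sketched like this. It is an illustration under assumptions, not passt's code: the helper names `try_bind()` and `bind_range()` are hypothetical, only IPv4 is covered, and sockets are bound to all local addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to bind a socket of 'type' (SOCK_STREAM or SOCK_DGRAM) to 'port' on
 * all IPv4 addresses. Returns the bound descriptor, or -1 if the port is
 * not available (already bound, or privileged). Hypothetical helper. */
int try_bind(int type, uint16_t port)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(port),
		.sin_addr = { .s_addr = htonl(INADDR_ANY) },
	};
	int fd = socket(AF_INET, type, 0);

	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;	/* skip this port, as described above */
	}

	return fd;
}

/* Bind every available port in [first, last]. Descriptors are stored in
 * fds[port - first], with -1 where binding failed. Returns the number of
 * ports actually bound. */
int bind_range(int type, uint16_t first, uint16_t last, int *fds)
{
	int bound = 0;

	assert(fds && first <= last);

	/* 32-bit counter, so that the loop also ends when last is 65535 */
	for (uint32_t port = first; port <= last; port++) {
		fds[port - first] = try_bind(type, (uint16_t)port);
		bound += fds[port - first] >= 0;
	}

	return bound;
}
```

Because a port that is already bound is simply skipped, a service proxy started first keeps its port. Binding tens of thousands of ports this way also needs a correspondingly high limit on open files.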

Try it

  • build from source:

      git clone https://passt.top/passt
      cd passt
      make
    
  • a static build for x86_64 as of the latest commit is also available for convenience here. These binaries are simply built with:

      CFLAGS="-static" make
    
  • run the demo script, which creates a network namespace called passt, sets up a veth pair and addresses, together with NAT for IPv4 and NDP proxying for IPv6, then starts passt in the network namespace:

      doc/demo.sh
    
  • from the same network namespace, start qemu. At the moment, qemu doesn't support UNIX domain sockets for the socket back-end. Two alternatives:

    • use the qrap wrapper, which maps a tap socket descriptor to passt's UNIX domain socket, for example:

          ip netns exec passt ./qrap 5 qemu-system-x86_64 ... -net socket,fd=5 -net nic,model=virtio ...
      
    • or patch qemu with this patch and start it like this:

          qemu-system-x86_64 ... -net socket,connect=/tmp/passt.socket -net nic,model=virtio
      
  • alternatively, you can use libvirt, with this patch, to start qemu (with the patch mentioned above), with this kind of network interface configuration:

      <interface type='client'>
        <mac address='52:54:00:02:6b:60'/>
        <source path='/tmp/passt.socket'/>
        <model type='virtio'/>
        <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </interface>
    
  • and that's it, you should now have TCP connections, UDP, and ICMP/ICMPv6 echo working from/to the guest for IPv4 and IPv6

  • to connect to a service on the VM, just connect to the same port directly with the address of the network namespace. For example, to ssh to the guest, from the main namespace on the host:

      ssh 192.0.2.2
    

Contribute

Send patches and issue reports to sbrivio@redhat.com.