This document shows how to set up a Kata Containers environment using passt to
implement user-mode networking: contrary to other networking models currently
implemented, this kind of setup requires no elevated privileges or capabilities
as far as networking is concerned.

This proof-of-concept uses CRI-O as the container runtime implementation, which
is controlled directly without resorting to a full Kubernetes environment.

# Pre-requisites

* Go and Rust toolchains, typically provided by distribution packages
* the usual tools, such as git, make, etc.
* a 4.x qemu version, or more recent, with a working virtiofsd executable
  (provided at least by Debian, Ubuntu, Fedora packages)

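
The list above can be checked quickly before starting. The following is a minimal sketch, not part of the original setup: `check_tools` is a hypothetical helper, and the command names in the usage comment are assumptions that may differ between distributions. `virtiofsd` is left out because it is often installed outside `PATH`.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from
# PATH. Prints one missing command per line and returns non-zero if any
# command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example invocation (command names are assumptions, adjust as needed):
# check_tools go cargo git make qemu-system-x86_64 || echo "tools missing"
```
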
# Fetch and prepare components

## CRI-O

CRI-O is the container runtime. On one side, it implements the Kubernetes CRI
(Container Runtime Interface), which we'll drive manually with `crictl` here.
On the other side, it supports OCI (Open Container Initiative) runtimes, and
Kata Containers is one of them.

### Fetch

    git clone https://github.com/cri-o/cri-o.git

### Build

    cd cri-o
    make

### Install

As root:

    make install

### Configure

Configuration is now at `/etc/crio/crio.conf`. This would also be the case for
distribution packages. Some specific configuration items for Kata Containers
are:

    # Cgroup management implementation used for the runtime.
    cgroup_manager = "cgroupfs"

    # manage_ns_lifecycle determines whether we pin and remove namespaces
    # and manage their lifecycle
    manage_ns_lifecycle = true

and the following section, which can be added at the end, defines a special type
of runtime, the `vm` type. This is needed to run the Kata Containers runtime
instead of the default `crun` choice:

    [crio.runtime.runtimes.kata]
    runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
    runtime_type = "vm"
    runtime_root = "/run/vc"

Note that we don't have a `containerd-shim-kata-v2` binary yet: we'll deal with
that in the next steps.

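
Adding the runtime section by hand is easy to repeat by mistake. As a sketch, assuming the section goes at the end of the file as described above, a hypothetical `add_kata_runtime` helper could append it only when it is not already there:

```shell
#!/bin/sh
# Hypothetical helper: append the Kata Containers runtime section to a CRI-O
# configuration file, unless a [crio.runtime.runtimes.kata] section is
# already present in it.
add_kata_runtime() {
    conf=$1
    if grep -qF '[crio.runtime.runtimes.kata]' "$conf" 2>/dev/null; then
        return 0
    fi
    cat >> "$conf" <<'EOF'

[crio.runtime.runtimes.kata]
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
runtime_type = "vm"
runtime_root = "/run/vc"
EOF
}

# As root:
# add_kata_runtime /etc/crio/crio.conf
```

The check is a plain string match on the section header, so it does not notice a section that exists but has different values.
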
## CNI plugins

CNI plugins are actually binaries, run by CRI-O, used to configure networking on
the host as well as on the pod side. A few network topologies are offered, with
very limited capabilities.

### Fetch

    git clone https://github.com/containernetworking/plugins

### Build

    cd plugins
    ./build_linux.sh

### Install

As root:

    mkdir -p /opt/cni/bin
    cp bin/* /opt/cni/bin/

### Configure

The path where CNI configurations are located is configurable in
`/etc/crio/crio.conf`, see the `network_dir` parameter there. Assuming the
default value, we need to provide at least one configuration under
`/etc/cni/net.d/`. For example:

    # cat /etc/cni/net.d/50-kata-sandbox.conf
    {
        "cniVersion": "0.3.0",
        "name": "crio-bridge",
        "type": "bridge",
        "bridge": "cni0",
        "isGateway": true,
        "ipMasq": true,
        "ipam": {
            "type": "host-local",
            "subnet": "10.88.0.0/16",
            "routes": [
                { "dst": "0.0.0.0/0" }
            ]
        }
    }

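
The example configuration refers to two plugin types, `bridge` and `host-local`, which need to be present in the plugin directory. A small sketch to verify this, where `check_cni_plugins` is a hypothetical helper and `/opt/cni/bin` is the install path used above:

```shell
#!/bin/sh
# Hypothetical helper: check that each named plugin exists as an executable
# file in the given CNI binary directory. Prints the missing plugins and
# returns non-zero if any is missing.
check_cni_plugins() {
    dir=$1
    shift
    status=0
    for plugin in "$@"; do
        if [ ! -x "$dir/$plugin" ]; then
            echo "missing: $plugin"
            status=1
        fi
    done
    return "$status"
}

# check_cni_plugins /opt/cni/bin bridge host-local
```
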
## crictl

`crictl` is needed to control CRI-O in lieu of Kubernetes.

### Fetch

    git clone https://github.com/kubernetes-sigs/cri-tools.git

### Build

    cd cri-tools
    make

### Install

As root:

    make install

## mbuto

We'll use `mbuto` to build a minimal virtual machine image for usage with the
Kata Containers runtime.

### Fetch

    git clone https://mbuto.lameexcu.se/mbuto

## Kata Containers

### Fetch

    git clone https://github.com/kata-containers/kata-containers

### Patch

The current upstream version doesn't support the _passt_ networking model yet.
Use the patch from this directory to add it:

    patch -p1 < 0001-virtcontainers-agent-Add-passt-networking-model-and-.patch

### Build

    make -C src/runtime
    make -C src/agent LIBC=gnu

### Install

As root:

    make -C src/runtime install
    cp src/agent/target/x86_64-unknown-linux-gnu/release/kata-agent /usr/libexec/
    chmod 755 /usr/libexec/kata-agent

### Build the Virtual Machine image

    cd mbuto
    ./mbuto -f /tmp/kata.img

See `mbuto -h` for additional parameters, such as choice of kernel version,
kernel modules, program add-ons, etc. `mbuto` will print some configuration
parameters to be used in the configuration of the Kata Containers runtime below.
For example:

    $ ./mbuto -c lz4 -f /tmp/kata.img
    Not running as root, won't keep cpio mounted
    Size: bin 12M lib 59M kmod 1.4M total 70M compressed 33M
    Kata Containers [hypervisor.qemu] configuration:

    kernel = "/boot/vmlinuz-5.10.0-6-amd64"
    initrd = "/tmp/kata.img"

### Configure

The configuration file at this point is located at
`/usr/share/defaults/kata-containers/configuration-qemu.toml`. Some parameters
of general interest are:

    [hypervisor.qemu]
    kernel = "/boot/vmlinuz-5.10.0-6-amd64"
    initrd = "/tmp/kata.img"

where we can use the values indicated earlier by `mbuto`. Currently, the default
path for the `virtiofsd` daemon doesn't work for all distributions: ensure that
it matches the actual location on your system. For example, on Debian:

    virtio_fs_daemon = "/usr/lib/qemu/virtiofsd"

We'll then need to enable the `passt` networking model for the runtime. In the
`[runtime]` section:

    internetworking_model = "passt"

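
A wrong `kernel` or `initrd` path only shows up later, when the virtual machine fails to start. The following sketch checks both paths up front. `toml_value` and `check_kata_images` are hypothetical helpers, and the parsing is deliberately naive: it takes the first uncommented `key = "value"` line and ignores which section it belongs to.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first uncommented
# 'key = "value"' line for the given key. Sections are not distinguished.
toml_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\(.*\)\".*/\1/p" "$2" |
        head -n 1
}

# Hypothetical helper: check that the kernel and initrd named in the
# configuration file exist on the host.
check_kata_images() {
    conf=$1
    for key in kernel initrd; do
        path=$(toml_value "$key" "$conf")
        if [ -z "$path" ] || [ ! -e "$path" ]; then
            echo "$key: '$path' not found"
            return 1
        fi
    done
}

# check_kata_images /usr/share/defaults/kata-containers/configuration-qemu.toml
```
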
# Run an example container

## Fetch

We'll now need a container image to run as an example. With `podman` installed
via distribution package, we can import one:

    podman pull docker.io/i386/busybox

## Configure

Now we can define configuration files for the pod and the container we want to
create and start:

    $ cat pod-config.json
    {
        "metadata": {
            "name": "kata-sandbox",
            "namespace": "default",
            "attempt": 1,
            "uid": "hdishd83djaidwnduwk28bcsb"
        },
        "logDirectory": "/tmp",
        "linux": {
        }
    }

    $ cat container-busybox.json
    {
        "metadata": {
            "name": "kata-busybox"
        },
        "image": {
            "image": "docker.io/i386/busybox"
        },
        "command": [
            "sleep", "6000"
        ],
        "log_path": "kata-busybox.log",
        "linux": {
        }
    }

## Run the container workload

Assuming we have `pod-config.json` and `container-busybox.json` defined above,
we can now:

### start CRI-O

    crio -l debug

### create the pod and run a container inside it

    c=$(crictl start $(crictl create $(crictl runp --runtime=kata pod-config.json) container-busybox.json pod-config.json))

### verify that addresses are properly configured

    crictl exec $c ip ad sh

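
The nested one-liner hides which step failed if something goes wrong. As a sketch, the same three `crictl` calls (`runp`, `create`, `start`) can be run one at a time, stopping at the first failure. `run_kata_container` is a hypothetical wrapper name, not something provided by `crictl`:

```shell
#!/bin/sh
# Hypothetical wrapper: create a pod with the kata runtime, create a
# container inside it, and start it. Stops at the first failing step and
# prints the container ID on success.
run_kata_container() {
    pod_config=$1
    container_config=$2
    pod=$(crictl runp --runtime=kata "$pod_config") || return 1
    ctr=$(crictl create "$pod" "$container_config" "$pod_config") || return 1
    crictl start "$ctr" >/dev/null || return 1
    echo "$ctr"
}

# c=$(run_kata_container pod-config.json container-busybox.json)
# crictl exec "$c" ip ad sh
```
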
## Enable support for ICMP/ICMPv6 Echo Request

_passt_ can replicate ICMP Echo Requests sent by the workload, and propagate the
replies back. However, as it's not running as root, we need to enable so-called
_ping_ sockets for unprivileged users. From the namespace created by CRI-O for
this container:

    sysctl -w net.ipv4.ping_group_range="0 2147483647"

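
The sysctl value is a pair of group IDs, lowest and highest, that are allowed to open _ping_ sockets. A minimal sketch to check whether a given group ID falls inside such a pair follows. `gid_in_ping_range` is a hypothetical helper, and it tests a single group ID only:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given group ID lies within a
# "low high" pair, the format used by net.ipv4.ping_group_range.
gid_in_ping_range() {
    gid=$1
    # Split the pair on whitespace into $1 (low) and $2 (high)
    set -- $2
    [ "$gid" -ge "$1" ] && [ "$gid" -le "$2" ]
}

# From the same namespace:
# gid_in_ping_range "$(id -g)" "$(sysctl -n net.ipv4.ping_group_range)"
```
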
# Troubleshooting

## Redirect qemu's console output to file

Agent errors and kernel messages should be accessible via the named UNIX domain
socket at `/run/vc/vm/*/console.sock`, provided `agent.debug_console` is enabled
in `kernel_params` of `configuration.toml`, but this won't work if the agent
doesn't start. To get those messages anyway, we can wrap `qemu` and,
additionally, have all the output piped to a file:

    $ cat /usr/local/bin/qemu.sh
    #!/bin/sh

    /usr/bin/qemu-system-x86_64 "$@" -serial file:/tmp/qemu.log 2>/tmp/qemu_err.log

Now, use this as the path for `qemu` in `configuration.toml`:

    [hypervisor.qemu]
    path = "/usr/local/bin/qemu.sh"

and don't forget to add `console=ttyS0` to the kernel parameters, so that kernel
messages will also be included:

    kernel_params = "... console=ttyS0"

## Debug console

See the `kata-console` script in the
[kata-vfio-tools repository](https://github.com/dgibson/kata-vfio-tools) for a
convenient helper to access the debug console provided by the agent.