passt/contrib/kata-containers/README.md

This document shows how to set up a Kata Containers environment using passt to
implement user-mode networking: contrary to other networking models currently
implemented, this kind of setup requires no elevated privileges or capabilities
as far as networking is concerned.

This proof-of-concept uses CRI-O as implementation container runtime, which is
controlled directly without resorting to a full Kubernetes environment.

# Pre-requisites

* Go and rust toolchains, typically provided by distribution packages
* the usual tools, such as git, make, etc.
* a 4.x qemu version, or more recent, with a working virtiofsd executable
  (provided at least by Debian, Ubuntu, Fedora packages)

# Fetch and prepare components

## CRI-O

CRI-O is the container runtime. It implements the Kubernetes CRI (Container
Runtime Interface) on one side -- and we'll handle that part manually with
`crictl` here, and on the other side it supports OCI (Open Container Initiative)
runtimes -- Kata Containers is one of them.

### Fetch

    git clone https://github.com/cri-o/cri-o.git

### Build

    cd cri-o
    make

### Install

As root:

    make install

### Configure

Configuration is now at `/etc/crio/crio.conf`. This would also be the case for
distribution packages. Some specific configuration items for Kata Containers
are:

    # Cgroup management implementation used for the runtime.
    cgroup_manager = "cgroupfs"

    # manage_ns_lifecycle determines whether we pin and remove namespaces
    # and manage their lifecycle
    manage_ns_lifecycle = true

and the following section, that can be added at the end, defines a special type
of runtime, the `vm` type. This is needed to run the Kata Containers runtime
instead of the default `crun` choice:

    [crio.runtime.runtimes.kata]
    runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
    runtime_type = "vm"
    runtime_root = "/run/vc"

Note that we don't have a containerd-shim-kata-v2 binary yet, we'll deal with
that in the next steps.

## CNI plugins

CNI plugins are actually binaries, run by CRI-O, used to configure networking on
the host as well as on the pod side. A few network topologies are offered, with
very limited capabilities.

### Fetch

    git clone https://github.com/containernetworking/plugins

### Build

    cd plugins
    ./build_linux.sh

### Install

As root:

    mkdir -p /opt/cni/bin
    cp bin/* /opt/cni/bin/


### Configure

The path where CNI configurations are located is configurable in
`/etc/crio/crio.conf`, see the `network_dir` parameter there. Assuming the
default value, we need to provide at least one configuration under
`/etc/cni/net.d/`. For example:

    # cat /etc/cni/net.d/50-kata-sandbox.conf
    {
        "cniVersion": "0.3.0",
        "name": "crio-bridge",
        "type": "bridge",
        "bridge": "cni0",
        "isGateway": true,
        "ipMasq": true,
        "ipam": {
            "type": "host-local",
            "subnet": "10.88.0.0/16",
            "routes": [
                { "dst": "0.0.0.0/0" }
            ]
        }
    }

## crictl

`crictl` is needed to control CRI-O in lieu of Kubernetes.

### Fetch

    git clone https://github.com/kubernetes-sigs/cri-tools.git

### Build

    cd cri-tools
    make

### Install

As root:

    make install

## mbuto

We'll use `mbuto` to build a minimal virtual machine image for usage with the
Kata Containers runtime.

### Fetch

    git clone https://mbuto.lameexcu.se/mbuto

## Kata Containers

### Fetch

    git clone https://github.com/kata-containers/kata-containers

### Patch

The current upstream version doesn't support the _passt_ networking model yet,
use the patch from this directory to add it:

    patch -p1 < 0001-virtcontainers-agent-Add-passt-networking-model-and-.patch

### Build

    make -C src/runtime
    make -C src/agent LIBC=gnu

### Install

As root:

    make -C src/runtime install
    cp src/agent/target/x86_64-unknown-linux-gnu/release/kata-agent /usr/libexec/
    chmod 755 /usr/libexec/kata-agent

### Build the Virtual Machine image

    cd mbuto
    ./mbuto -f /tmp/kata.img

See `mbuto -h` for additional parameters, such as choice of kernel version,
kernel modules, program add-ons, etc. `mbuto` will print some configuration
parameters to be used in the configuration of the Kata Containers runtime below.
For example:

    $ ./mbuto -c lz4 -f /tmp/kata.img
    Not running as root, won't keep cpio mounted
    Size: bin   12M lib   59M kmod  1.4M total   70M compressed   33M
    Kata Containers [hypervisor.qemu] configuration:

    	kernel = "/boot/vmlinuz-5.10.0-6-amd64"
    	initrd = "/tmp/kata.img"

### Configure

The configuration file at this point is located at
`/usr/share/defaults/kata-containers/configuration-qemu.toml`. Some parameters of general interest are:

    [hypervisor.qemu]
    kernel = "/boot/vmlinuz-5.10.0-6-amd64"
    initrd = "/tmp/kata.img"

where we can use the values indicated earlier by `mbuto`. Currently, the default
path for the `virtiofsd` daemon doesn't work for all distributions, ensure that
it matches. For example, on Debian:

    virtio_fs_daemon = "/usr/lib/qemu/virtiofsd"

we'll then need to enable the `passt` networking model for the runtime. In the
`[runtime]` section:

    	internetworking_model=passt

# Run an example container

## Fetch

We'll now need an image of a container to run as example. With `podman`
installed via distribution package, we can import one:

    podman pull docker.io/i386/busybox

## Configure

Now we can define configuration files for pod and container we want to create
and start:

    $ cat pod-config.json
    {
        "metadata": {
            "name": "kata-sandbox",
            "namespace": "default",
            "attempt": 1,
            "uid": "hdishd83djaidwnduwk28bcsb"
        },
        "logDirectory": "/tmp",
        "linux": {
        }
    }

    $ cat container-busybox.json
    {
      "metadata": {
          "name": "kata-busybox"
      },
      "image": {
          "image": "docker.io/i386/busybox"
      },
      "command": [
          "sleep", "6000"
      ],
      "log_path":"kata-busybox.log",
      "linux": {
      }
    }

## Run the container workload

Assuming we have `pod-config.json` and `container-busybox.json` defined above,
we can now:

### start CRI-O

    crio -l debug

### create the pod and run a container inside it

    c=$(crictl start $(crictl create $(crictl runp --runtime=kata pod-config.json) container-dpdk.json pod-config.json))

### verify that addresses are properly configured

    crictl exec $c ip addr show

## Enable support for ICMP/ICMPv6 Echo Request

_passt_ can replicate ICMP Echo Requests sent by the workload, and propagate the
replies back. However, as it's not running as root, we need to enable so-called
_ping_ sockets for unprivileged users. From the namespace created by CRI-O for
this container:

    sysctl -w net.ipv4.ping_group_range=net.ipv4.ping_group_range = 0 2147483647

# Troubleshooting

## Redirect qemu's console output to file

Agent errors and kernel messages should be accessible via named UNIX domain
socket at `/run/vc/vm/*/console.sock`, provided `agent.debug_console` is enabled
in `kernel_params` of `configuration.toml` but this won't work if the agent
doesn't start. In order to get those, we can wrap `qemu` and get, additionally,
all the output piped to a file:

    $ cat /usr/local/bin/qemu.sh
    #!/bin/sh

    /usr/bin/qemu-system-x86_64 "$@" -serial file:/tmp/qemu.log 2>/tmp/qemu_err.log

now, use this as path for `qemu` in `configuration.toml`:

    [hypervisor.qemu]
    path = "/usr/local/bin/qemu.sh"

and don't forget to add `console=ttyS0` to the kernel parameters, so that kernel
messages will also be included:

    kernel_params = "... console=ttyS0"

## Debug console

See the `kata-console` script in the
[kata-vfio-tools repository](https://github.com/dgibson/kata-vfio-tools) for a
convenient helper to access the debug console provided by the agent.