mirror of
https://github.com/cloud-hypervisor/cloud-hypervisor.git
synced 2024-12-22 05:35:20 +00:00
docs: Provide documentation for creating custom image for VFIO CI
Extend the existing `custom-image.md` document with a new section on how to create a custom image that contains NVIDIA drivers that are required for our VFIO baremetal CI.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
parent e23f4e0783
commit 9f7ccb34cd
@@ -158,3 +158,172 @@ as we might need to update the direct kernel boot command line, replacing
`/dev/vda1` with the appropriate partition number.

Update all references to the previous image name to the new one.

## NVIDIA image for VFIO baremetal CI

This section describes how to create a cloud image that contains the
necessary NVIDIA drivers for our VFIO baremetal CI.

### Download base image

We usually start from one of the custom cloud images we have previously created,
but we can use a stock cloud image as well.

```bash
wget https://cloud-hypervisor.azureedge.net/jammy-server-cloudimg-amd64-custom-20221118-1.raw
mv jammy-server-cloudimg-amd64-custom-20221118-1.raw jammy-server-cloudimg-amd64-nvidia.raw
```

### Extend the image size

The NVIDIA drivers consume lots of space, which is why we must resize the image
before we proceed any further.

```bash
qemu-img resize jammy-server-cloudimg-amd64-nvidia.raw 5G
```
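
As a quick sanity check, the new size can be confirmed before touching the partition table. The sketch below is not part of the original procedure: it assumes GNU `stat` is available on the host, and `image_size_bytes` is a hypothetical helper name.

```bash
# Print the apparent size of a file in bytes (GNU stat).
image_size_bytes() {
    stat -c %s "$1"
}

# qemu-img interprets the 5G suffix as 5 GiB.
expected_bytes=$((5 * 1024 * 1024 * 1024))

image=jammy-server-cloudimg-amd64-nvidia.raw
if [ -f "$image" ] && [ "$(image_size_bytes "$image")" -eq "$expected_bytes" ]; then
    echo "$image has the expected size"
fi
```

If nothing is printed, the image is either missing from the current directory or has not been resized yet.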

### Resize the partition

We use `parted` for fixing the GPT after the image was resized, as well as for
resizing the `Linux` partition.

```bash
sudo parted jammy-server-cloudimg-amd64-nvidia.raw

(parted) print
Warning: Not all of the space available to jammy-server-cloudimg-amd64-nvidia.raw
appears to be used, you can fix the GPT to use all of the space (an extra 5873664
blocks) or continue with the current setting?
Fix/Ignore? Fix
Model: (file)
Disk jammy-server-cloudimg-amd64-nvidia.raw: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   2361MB  2245MB  ext4

(parted) resizepart 1 5369MB
(parted) print
Model: (file)
Disk jammy-server-cloudimg-amd64-nvidia.raw: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   5369MB  5252MB  ext4

(parted) quit
```
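
The numbers reported by `parted` can be cross-checked with a little shell arithmetic. The sketch below only uses values from the output above (512-byte sectors, 5873664 extra blocks), and it assumes that the "blocks" in the warning are 512-byte sectors, which is consistent with the reported sizes.

```bash
sector_size=512
new_size_bytes=$((5 * 1024 * 1024 * 1024))   # image size after the resize (5 GiB)
new_sectors=$((new_size_bytes / sector_size))
extra_sectors=5873664                        # taken from the parted warning
old_size_bytes=$(((new_sectors - extra_sectors) * sector_size))

echo "new size: $new_size_bytes bytes"   # 5368709120, shown by parted as 5369MB
echo "old size: $old_size_bytes bytes"   # 2361393152, roughly 2361MB
```

The old size lines up with the 2361MB end of partition 1 before `resizepart` was run.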

### Create a macvtap interface

Follow the [macvtap documentation](docs/macvtap-bridge.md) to set up a
macvtap interface that provides your VM with proper connectivity.

### Boot the image

It is particularly important to boot with a `cloud-init` disk attached to the
VM, because `cloud-init` automatically grows the Linux `ext4` filesystem to fill
the partition that we resized earlier. The `$mac` and `$tapdevice` variables
used below are assumed to have been set while creating the macvtap interface.

```bash
./cloud-hypervisor \
    --kernel hypervisor-fw \
    --disk path=jammy-server-cloudimg-amd64-nvidia.raw path=/tmp/ubuntu-cloudinit.img \
    --cpus boot=4 \
    --memory size=4G \
    --net fd=3,mac=$mac 3<>$"$tapdevice"
```
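
Once the VM is up, it is worth checking that the root filesystem really grew. The following is a minimal sketch to run inside the guest, not part of the original procedure; `root_fs_size_mib` is a hypothetical helper name.

```bash
# Total size of the filesystem mounted on /, in MiB.
# -P forces the POSIX single-line output format, -k reports 1024-byte blocks.
root_fs_size_mib() {
    df -Pk / | awk 'NR == 2 { print int($2 / 1024) }'
}

echo "Root filesystem size: $(root_fs_size_mib) MiB"
```

A value in the region of 5 GiB, rather than roughly 2 GiB, suggests that `cloud-init` resized the filesystem; the exact figure depends on filesystem overhead.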

### Bring up connectivity

If your network has a DHCP server, run the following from your VM:

```bash
sudo dhclient
```

Otherwise, assign an IP address manually and set the DNS server as well. The
addresses below are examples and depend on your actual network.

```bash
sudo ip addr add 192.168.2.10/24 dev ens4
sudo ip link set up dev ens4
sudo ip route add default via 192.168.2.1
sudo resolvectl dns ens4 8.8.8.8
```

#### Check connectivity and update the image

```bash
sudo apt update
sudo apt upgrade
```

### Install NVIDIA drivers

The following steps and commands are taken from the
[NVIDIA official documentation](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts)
about Tesla compute cards.

```bash
# Derive the repository name from /etc/os-release
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
# Install the NVIDIA repository keyring and remove the outdated signing key
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-key del 7fa2af80
# Refresh the package lists and install the drivers
sudo apt update
sudo apt -y install cuda-drivers
```
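
To see what the `distribution` variable expands to, the same expression can be run against a stand-in for `/etc/os-release`. This is only an illustrative sketch: the temporary file and its two sample values (for Ubuntu 22.04) are assumptions, not read from a real system.

```bash
# Stand-in for /etc/os-release with sample Ubuntu 22.04 values.
os_release=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="22.04"\n' > "$os_release"

# Same expression as in the installation steps, reading the stand-in file:
# concatenate ID and VERSION_ID, then strip the dot.
distribution=$(. "$os_release"; echo "$ID$VERSION_ID" | sed -e 's/\.//g')
echo "$distribution"   # prints ubuntu2204

rm -f "$os_release"
```

The result is the directory name used in the NVIDIA repository URL.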

### Check the `nvidia-smi` tool

Quickly validate that you can find and run the `nvidia-smi` command from your
VM. At this point it should fail, since no NVIDIA card has been passed through
to the VM and therefore no NVIDIA driver is loaded.
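
The check can be sketched as a small shell function that separates "the tool is missing" from "the tool is installed but has no GPU to talk to". `check_nvidia_smi` is a hypothetical helper name, and the messages are illustrative.

```bash
# Report whether nvidia-smi is installed and whether it can reach a GPU.
check_nvidia_smi() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver packages are not installed"
    elif nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi succeeded: a GPU is visible"
    else
        echo "nvidia-smi is installed but failed, as expected without a GPU"
    fi
}

check_nvidia_smi
```

For the image built here, the last message is the expected outcome.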

### Work around the LA57 reboot issue

Add `reboot=a` to `GRUB_CMDLINE_LINUX` in `/etc/default/grub` so that the VM
is booted with the ACPI reboot type. This resolves a reboot issue when
running on 5-level paging systems.

```bash
# Edit GRUB_CMDLINE_LINUX to include reboot=a
sudo vim /etc/default/grub
# Regenerate the GRUB configuration and reboot to apply it
sudo update-grub
sudo reboot
```
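
The manual edit with `vim` can also be scripted. The following is a sketch, not part of the original procedure: `add_reboot_param` is a hypothetical helper that assumes GNU `sed` and a double-quoted `GRUB_CMDLINE_LINUX="..."` line. The demonstration runs on a scratch file with made-up sample values rather than on the real `/etc/default/grub`.

```bash
# Append reboot=a to GRUB_CMDLINE_LINUX in the given file, unless already present.
add_reboot_param() {
    local grub_file=$1
    if grep -q '^GRUB_CMDLINE_LINUX=.*reboot=a' "$grub_file"; then
        return 0
    fi
    sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 reboot=a"/' "$grub_file"
}

# Demonstration on a scratch file with sample content.
scratch=$(mktemp)
printf '%s\n' 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' \
              'GRUB_CMDLINE_LINUX="console=ttyS0"' > "$scratch"
add_reboot_param "$scratch"
cat "$scratch"
rm -f "$scratch"
```

On the real file, the helper would need to run as root against `/etc/default/grub`, followed by `sudo update-grub`. After the reboot, `grep -o 'reboot=a' /proc/cmdline` should print the parameter if it was picked up.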

### Remove previous logins

Since our integration tests rely on past logins to count the number of reboots,
we must clear the list.

```bash
# Run these from a root shell, since the log files are not writable by a regular user
>/var/log/lastlog
>/var/log/wtmp
>/var/log/btmp
```
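
To confirm that the login records are gone, the files can be checked for leftover content. This is a minimal sketch; `report_nonempty` is a hypothetical helper name.

```bash
# Print every file from the argument list that exists and is not empty.
report_nonempty() {
    local f
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "$f is not empty"
        fi
    done
}

report_nonempty /var/log/lastlog /var/log/wtmp /var/log/btmp
```

No output means all three files are empty or absent.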

### Clear history

```bash
history -c
rm /home/cloud/.bash_history
```

### Reset cloud-init

This is mandatory as we want `cloud-init` provisioning to work again when a new
VM is booted with this image.

```bash
sudo cloud-init clean
```