Mirror of https://github.com/cloud-hypervisor/cloud-hypervisor.git, synced 2024-10-02 11:35:46 +00:00
docs: Provide documentation for creating custom image for VFIO CI
Extend the existing `custom-image.md` document with a new section on how to
create a custom image that contains the NVIDIA drivers required for our VFIO
baremetal CI.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
parent e23f4e0783
commit 9f7ccb34cd
@@ -158,3 +158,172 @@ as we might need to update the direct kernel boot command line, replacing
`/dev/vda1` with the appropriate partition number.

Update all references to the previous image name to the new one.

## NVIDIA image for VFIO baremetal CI

This section describes how to create a cloud image that contains the NVIDIA
drivers required by our VFIO baremetal CI.

### Download base image

We usually start from one of the custom cloud images we have previously
created, but a stock cloud image works as well.

```bash
wget https://cloud-hypervisor.azureedge.net/jammy-server-cloudimg-amd64-custom-20221118-1.raw
mv jammy-server-cloudimg-amd64-custom-20221118-1.raw jammy-server-cloudimg-amd64-nvidia.raw
```

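The new file name can be derived from the base image name instead of being typed by hand. This is a minimal sketch, assuming the base image follows the `-custom-<date>` naming shown above; the variable names are illustrative only.

```bash
base_image="jammy-server-cloudimg-amd64-custom-20221118-1.raw"
# Strip the suffix that starts at "-custom-" and append the new suffix.
nvidia_image="${base_image%-custom-*}-nvidia.raw"
echo "$nvidia_image"
```
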
### Extend the image size

The NVIDIA drivers take up a lot of disk space, so we must grow the image
before going any further.

```bash
# -f raw states the image format explicitly instead of relying on probing.
qemu-img resize -f raw jammy-server-cloudimg-amd64-nvidia.raw 5G
```

### Resize the partition

We use `parted` to fix the GPT after the image has been resized, and to resize
the Linux partition (partition 1).

```bash
sudo parted jammy-server-cloudimg-amd64-nvidia.raw

(parted) print
Warning: Not all of the space available to jammy-server-cloudimg-amd64-nvidia.raw
appears to be used, you can fix the GPT to use all of the space (an extra 5873664
blocks) or continue with the current setting?
Fix/Ignore? Fix
Model: (file)
Disk jammy-server-cloudimg-amd64-nvidia.raw: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   2361MB  2245MB  ext4

(parted) resizepart 1 5369MB
(parted) print
Model: (file)
Disk jammy-server-cloudimg-amd64-nvidia.raw: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
14      1049kB  5243kB  4194kB                     bios_grub
15      5243kB  116MB   111MB   fat32              boot, esp
 1      116MB   5369MB  5252MB  ext4

(parted) quit
```

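The `5369MB` value passed to `resizepart` follows from the `5G` given to `qemu-img`. The arithmetic below is a sketch under two assumptions: `qemu-img` reads `5G` as 5 * 1024^3 bytes while `parted` reports decimal megabytes (10^6 bytes), and the "blocks" in the warning are 512-byte sectors.

```bash
# 5G as understood by qemu-img: 5 * 1024^3 bytes.
size_bytes=$((5 * 1024 * 1024 * 1024))
# Convert to decimal megabytes, rounding to the nearest unit.
size_mb=$(((size_bytes + 500000) / 1000000))

# Size of the image before the resize, derived from the 5873664 extra
# blocks reported in the parted warning.
total_sectors=$((size_bytes / 512))
old_bytes=$(((total_sectors - 5873664) * 512))
old_mb=$(((old_bytes + 500000) / 1000000))

echo "new size: ${size_bytes} bytes (~${size_mb}MB)"
echo "old size: ${old_bytes} bytes (~${old_mb}MB)"
```

This gives 5368709120 bytes (about 5369MB) for the resized image and 2361393152 bytes (about 2361MB) for the original one, which matches the end of partition 1 before and after the resize.
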
### Create a macvtap interface

Follow the [macvtap documentation](docs/macvtap-bridge.md) to set up a macvtap
interface that gives your VM proper connectivity.

### Boot the image

It is particularly important to boot with a `cloud-init` disk attached to the
VM, as `cloud-init` automatically resizes the Linux `ext4` filesystem to fill
the partition that we have just resized.

```bash
# $mac and $tapdevice are assumed to have been set during the macvtap setup.
./cloud-hypervisor \
    --kernel hypervisor-fw \
    --disk path=jammy-server-cloudimg-amd64-nvidia.raw path=/tmp/ubuntu-cloudinit.img \
    --cpus boot=4 \
    --memory size=4G \
    --net fd=3,mac=$mac 3<>"$tapdevice"
```

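Before booting, it can help to confirm that the files named in the command are present. The helper below is a hypothetical sketch (`missing_files` is not part of Cloud Hypervisor): it reports each missing path on stderr and prints the number of missing paths on stdout.

```bash
missing_files() {
    local count=0 f
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Paths taken from the boot command; adjust them to your setup.
missing_files hypervisor-fw \
    jammy-server-cloudimg-amd64-nvidia.raw \
    /tmp/ubuntu-cloudinit.img
```

A result of `0` means all three files were found in the expected locations.
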
### Bring up connectivity

If your network has a DHCP server, run the following from your VM:

```bash
sudo dhclient
```

Otherwise, assign an IP address manually (the addresses depend on your actual
network) and set the DNS server address as well:

```bash
sudo ip addr add 192.168.2.10/24 dev ens4
sudo ip link set up dev ens4
sudo ip route add default via 192.168.2.1
sudo resolvectl dns ens4 8.8.8.8
```

#### Check connectivity and update the image

```bash
sudo apt update
sudo apt upgrade
```

### Install NVIDIA drivers

The following steps and commands are taken from the
[official NVIDIA documentation](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts)
for Tesla compute cards.

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-key del 7fa2af80
sudo apt update
sudo apt -y install cuda-drivers
```

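To see what the `distribution` line computes, the same pipeline can be run against a sample file. This sketch assumes an `os-release` containing `ID=ubuntu` and `VERSION_ID="22.04"`, which is what a Jammy image is expected to report; the temporary file stands in for `/etc/os-release`.

```bash
# Sample os-release content (assumed values for Ubuntu 22.04).
os_release=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="22.04"\n' > "$os_release"

# Same pipeline as above: source the file, join ID and VERSION_ID, drop the dot.
distribution=$(. "$os_release"; echo "$ID$VERSION_ID" | sed -e 's/\.//g')
echo "$distribution"

rm -f "$os_release"
```

With these values the result is `ubuntu2204`, which is the directory name that ends up in the repository URL.
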
### Check the `nvidia-smi` tool

Quickly validate that you can find and run the `nvidia-smi` command from your
VM. At this point it should fail, since no NVIDIA card has been passed through
to the VM and therefore no NVIDIA driver is loaded.

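A small check along these lines separates the case where the tool is not installed from the case where it is installed but cannot reach a GPU. `check_tool` is a hypothetical helper, not something shipped with the drivers.

```bash
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# In the VM this is expected to report the tool as found once the drivers
# are installed, even though running nvidia-smi itself still fails.
check_tool nvidia-smi || true
```
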
### Work around the LA57 reboot issue

Add `reboot=a` to `GRUB_CMDLINE_LINUX` in `/etc/default/grub` so that the VM
boots with the ACPI reboot type. This resolves a reboot issue when running on
systems with 5-level paging.

```bash
sudo vim /etc/default/grub
sudo update-grub
sudo reboot
```

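The edit can also be scripted instead of done in an editor. The sketch below works on a temporary sample file rather than the real `/etc/default/grub`, and assumes GNU `sed` and a double-quoted, single-line `GRUB_CMDLINE_LINUX="..."` entry. It is not idempotent: running it twice appends the option twice.

```bash
# Sample file standing in for /etc/default/grub (assumed content).
grub_file=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\nGRUB_CMDLINE_LINUX=""\n' > "$grub_file"

# Append reboot=a inside the quotes, then drop the leading space that is
# left behind when the original value was empty.
sed -i -E \
    -e 's/^GRUB_CMDLINE_LINUX="(.*)"$/GRUB_CMDLINE_LINUX="\1 reboot=a"/' \
    -e 's/^GRUB_CMDLINE_LINUX=" /GRUB_CMDLINE_LINUX="/' \
    "$grub_file"

result=$(grep '^GRUB_CMDLINE_LINUX=' "$grub_file")
echo "$result"
rm -f "$grub_file"
```

On the real file the same `sed` call would need `sudo`, followed by `sudo update-grub` as shown above. After the reboot, looking for `reboot=a` in `/proc/cmdline` is one way to confirm the option took effect.
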
### Remove previous logins

Since our integration tests rely on past logins to count the number of reboots,
we must make sure the list is cleared.

```bash
# These files are owned by root, so truncate them through sudo.
sudo truncate -s 0 /var/log/lastlog /var/log/wtmp /var/log/btmp
```

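To confirm that the logs are empty afterwards, a check such as the following can be used. `all_empty` is a hypothetical helper; it relies on `[ -s file ]`, which is true only when the file exists and has a non-zero size.

```bash
all_empty() {
    local f
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "not empty: $f"
            return 1
        fi
    done
    echo "all empty"
}

all_empty /var/log/lastlog /var/log/wtmp /var/log/btmp || true
```
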
### Clear history

```bash
history -c
rm /home/cloud/.bash_history
```

### Reset cloud-init

This is mandatory, as we want `cloud-init` provisioning to run again when a new
VM is booted with this image.

```bash
sudo cloud-init clean
```