Commit Graph

1926 Commits

Author SHA1 Message Date
Rob Bradford
6230929d51 openapi: Add thp option to MemoryConfig
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-09 16:51:21 +00:00
Rob Bradford
f603afc46e vmm: Make Transparent Huge Pages controllable (default on)
Add MemoryConfig::thp and `--memory thp=on|off` to allow control of
Transparent Huge Pages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-09 16:51:21 +00:00
Rob Bradford
b68add2d0d vmm: Enable THP when using anonymous memory
If the memory is not backed by a file then it is possible to enable
Transparent Huge Pages on the memory and take advantage of the benefits
of huge pages without requiring the specific allocation of an appropriate
number of huge pages.

TEST=Boot and see that in /proc/`pidof cloud-hypervisor`/smaps that the
region is now THPeligible (and that also pages are being used.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-09 16:51:21 +00:00
Rob Bradford
6e0bd73c90 build: Bump linux-loader from 0.6.0 to 0.7.0
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-02 11:02:00 +00:00
Bo Chen
a9ec0f33c0 misc: Fix clippy issues
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-02 09:41:43 +01:00
Rob Bradford
f4495de143 vmm: Improve handling of shared memory backing
As huge pages are always MAP_SHARED then where the shared memory would
be checked (for vhost-user and local migration) we can also check
instead for huge pages.

The checking is also extended to cover the memory zones based
configuration as well.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Rob Bradford
99d9a3d299 vmm: memory_manager: Avoid MAP_PRIVATE CoW with VFIO for hugepages too
We can't use MAP_ANONYMOUS and still have huge pages so MAP_SHARED is
effectively required when using huge pages.

Unfortunately it is not as simple as always forcing MAP_SHARED if
hugepages is on as this might be inappropriate in the backing file case
hence why there is additional complexity of assigning to mmap_flags on
each case and the MAP_SHARED is only turned on for the anonymous file
huge page case as well as anonymous shared file case.

See: #4805

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Rob Bradford
df7c728399 vmm: memory_manager: Only file back memory when required
If we do not need an anonymous file backing the memory then do not
create one.

As a side effect this addresses an issue with CoW (mmap with MAP_PRIVATE
but no MAP_ANONYMOUS) when the memory is pinned for VFIO.

Fixes: #4805

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Rob Bradford
1e5a4e8d77 vmm: memory_manager: Split filesystem backed and anonymous RAM creation
This simplifies the code somewhat making the code paths more readable.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Rob Bradford
ff3fb91ba6 vmm: Refactor creation of the FileOffset for GuestRegionMmap::new()
Create this earlier so that it is possible to pass a None in for
anonymous mappings.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Jinrong Liang
cb171d4a23 device_manager: Avoid checking io_uring support when it's not needed
After testing, io_uring_is_supported() causes about 38ms of
overhead when creating virtio-blk. By modifying the position
of io_uring_is_supported(), the overhead of creating virtio-blk
is reduced to less than 1ms when we close io_uring.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
2022-10-27 22:21:51 -07:00
Wei Liu
b99b2bc990 memory_manager: use MFD_CLOEXEC flag when creating memory fd
Until there is a need for sharing the memory fd with a child process, we
should err on the safe side to close it on exec.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-10-27 09:20:08 +02:00
Sebastien Boeuf
1f0e5eb66a vmm: virtio-devices: Restore every VirtioDevice upon creation
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.

Here is the list of devices now following the new restore design:

- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-24 14:17:08 +02:00
Sebastien Boeuf
157db33d65 vmm: Refactor hypervisor::Vm creation on restore
This prevents from leaking implementation details to lib.rs, and rather
keep them in vm.rs.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-24 14:17:08 +02:00
Fabiano Fidêncio
b4e3942708 api: Fix vm.add-device argument type
The add_device() function, from the device manager code, takes a
DeviceConfig as a parameter, instead of a VmAddDevice.

The change was originally done as part of 34412c9b41 and it didn't
break Kata Containers because the VmAddDevice and DeviceConfig structs
share most of their fields, besides the optional for serialization
`pci_segment`, which is not used by the client yet.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-21 11:09:55 -07:00
Sebastien Boeuf
c52ccf3992 vmm: migration: Create destination VM right before to restore it
This is preliminary work to ensure a migrated VM is created right before
it is restored. This will be useful when moving to a design where the VM
is both created and restored simultaneously from the Snapshot.

In details, that means the MemoryManager is the object that must be
created upon receiving the config from the source VM, so that memory
content can be later received and filled into the GuestMemory.
Only after these steps happened, the snapshot is received from the
source VM, and the actual Vm object can be created from both the
snapshot and the MemoryManager previously created.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-18 17:14:29 +02:00
Rob Bradford
a75d71f2c8 vmm: Reduce logging severity for unknown MMIO/PIO device accesses
These look alarming if you are booting with the a distro kernel which is
now a recommended approach.

See: #4786

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-17 10:08:36 -07:00
Bo Chen
96209e7a16 vmm: Remove the explicit call to 'Snapshottable:restore()'
The restore path of MemoryManager is handled specially without
implementing a `Snapshottable:restore()`. Removing the explicit call to
it along the migration code path to avoid confusions.

See: #4783

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-17 10:07:44 -07:00
Sebastien Boeuf
099cdd2af8 virtio-devices, vmm: vdpa: Implement live migration support
Vdpa now implements the Migratable trait, which allows the device to be
added to the DeviceTree and therefore allows live migrating any vDPA
device that supports being suspended.

Given a vDPA device can't be resumed from a suspended state without
having to reset everything, we don't support pause/resume for a vDPA
device, as well as snapshot/restore (which requires resume to be
supported).

In order for the migration to work locally, reusing the same device on
the same host machine, the vhost-vdpa handler is dropped after the
snapshot has been performed, which allows the destination VM to open the
device without any conflict about the device being busy.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-13 10:03:23 +02:00
Sebastien Boeuf
22be5f9d0f vmm: Extend list of authorized ioctls for vDPA
Adding VHOST_VDPA_GET_CONFIG_SIZE and VHOST_VDPA_SUSPEND to the list of
authorized ioctls for the vmm thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-13 10:03:23 +02:00
Bo Chen
37c3b0429a vmm: Make MemoryManager::create_ram_region() public
So that it can be reused externally, such as for fuzzing.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-12 16:09:27 +01:00
Anatol Belski
a18b08c682 seccomp: mshv: Allow create partition ioctl
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2022-10-11 09:05:24 +01:00
Bo Chen
29cf637f3f vmm: Move 'default_serial/console()' to vm_config.rs
In this way, we have all functions related to generate default values of
vm-config structs in the same location.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-07 09:13:15 -07:00
Rob Bradford
83cc554f90 vmm: Remove deprecated VmConfig::{kernel, initramfs, cmdline} members
These have been replaced by members of PayloadConfig and should be
removed in v28.0 (mentioned in v26.0 release notes.)

Fixes: #4737

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-06 14:25:29 +01:00
Rob Bradford
7d8d27c1b4 vmm: Rename queue size / number of queues constants
These constants still referenced the long removed (separate vhost-user
structs.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-06 14:25:29 +01:00
Rob Bradford
d692dfb8e3 vmm: Move impl Default for ... to vm_config.rs
This is consistent when considering that some structs have a
`#[derive(Default)`] so it makes sense for the default implementations
to be in the same location.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-06 14:25:29 +01:00
Rob Bradford
7ad58457b0 vmm: Split structs from logic that make up VmConfig
Place the data structures that are required for constructing a VmConfig
into it's own module from the logic that exists to suppot them.

This is useful as a consumer of the API can now clearly see what data
structures make up the API for creating VMs.

This has no functional change and I made no attempt to clean up the
ordering (it's as in the original file) nor any other clean up.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-06 14:25:29 +01:00
Sebastien Boeuf
89677c3181 build: Bump clap from 3.2.22 to 4.0.9
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.22 to 4.0.9.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](clap-rs/clap@v3.2.22...v4.0.9)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Moving to the major version 4 introduced some breaking changes which had
to be handled manually.

Fixes #4709

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-05 12:59:14 +01:00
Rob Bradford
2daab89987 vmm: Remove legacy firmware loading
This functionality was deprecated and is for removal in the upcoming
release.

Fixes: #4511

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-03 17:09:02 +01:00
Rob Bradford
1bc63e7848 vmm: Remove legacy I/O ports for ACPI
These addresses have been superseded and replaced with other I/O ports.

Fixes: #4483

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-03 17:08:57 +01:00
Bo Chen
2115a41568 openapi: Add 'firmware' to 'PayloadConfig'
This option is needed for the openapi consumer (e.g. Kata Containers) to
load firmware (e.g. td-shim) for booting.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-01 08:45:21 +01:00
Rob Bradford
06eb82d239 build: Consolidate "gdb" build feature into "guest_debug"
This simplifies the CI process but also logical with the existing
functionality under "guest_debug" (dumping guest memory).

Fixes: #4679

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-27 14:30:57 +01:00
Sebastien Boeuf
3bf3cca70a vhost_user_net: Allow user to set MTU
Adding the support for the user to set the MTU for the vhost-user-net
backend, which allows the integration test to be extended with the test
of the MTU parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-27 10:37:35 +01:00
Sebastien Boeuf
903c08f8a1 net: Don't override default TAP interface MTU
Adjust MTU logic such that:
1. Apply an MTU to the TAP interface if the user supplies it
2. Always query the TAP interface for the MTU and expose that.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-27 10:37:35 +01:00
Rob Bradford
b2d1dd65f3 build: Remove "fwdebug" and "common" feature flags
This simplifes the buld and checks with very little overhead and the
fwdebug device is I/O port device on 0x402 that can be used by edk2 as a
very simple character device.

See: #4679

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-26 10:16:33 -07:00
Rob Bradford
66c092e69b build: Bump linux-loader from 0.5.0 to 0.6.0
Bumps [linux-loader](https://github.com/rust-vmm/linux-loader) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/rust-vmm/linux-loader/releases)
- [Changelog](https://github.com/rust-vmm/linux-loader/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rust-vmm/linux-loader/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: linux-loader
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-24 09:54:18 +00:00
Rob Bradford
1202b9a07a vmm: Add some tracing of boot sequence
Add tracing of the VM boot sequence from the point at which the request
to create a VM is received to the hand-off to the vCPU threads running.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-22 18:09:31 +01:00
Sebastien Boeuf
76dbf85b79 net: Give the user the ability to set MTU
Add a new "mtu" parameter to the NetConfig structure and therefore to
the --net option. This allows Cloud Hypervisor's users to define the
Maximum Transmission Unit (MTU) they want to use for the network
interface that they create.

In details, there are two main aspects. On the one hand, the TAP
interface is created with the proper MTU if it is provided. And on the
other hand the guest is made aware of the MTU through the VIRTIO
configuration. That means the MTU is properly set on both the TAP on the
host and the network interface in the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-21 16:20:57 +02:00
Sebastien Boeuf
f38056fc9e virtio-devices, vmm: Simplify virtio-mem resize operation
There's no need to delegate the resize operation to the virtio-mem
thread. This can come directly from the vmm thread which will use the
Mem object to update the VIRTIO configuration and trigger the interrupt
for the guest to be notified.

In order to achieve what's described above, the VirtioMemZone structure
now has a handle onto the Mem object directly. This avoids the need for
intermediate Resize and ResizeSender structures.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-20 13:43:40 +02:00
Rob Bradford
f32487f8e8 misc: Automatic beta clippy fixes
e.g. cargo clippy --all --tests --all-targets --fix --features=..

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-20 10:59:48 +01:00
Sebastien Boeuf
1849ffff31 vmm: Remove "amx" feature gate
Given the AMX x86 feature has been made available since kernel v5.17,
and given we don't have any test validating this feature, there's no
need to keep it behing a Rust feature gate.

Fixes #3996

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-16 15:03:31 +01:00
Rob Bradford
0e52be0909 vmm: Ensure default deserialisation for "amx" feature bit
This allows a migration from a binary not compiled with struct member to
be completed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-16 15:03:31 +01:00
Sebastien Boeuf
3793ffe888 vmm: config: Move TDX to rely on PayloadConfig
Removing the option --tdx to specify that we want to run a TD VM. Rely
on --platform option by adding the "tdx" boolean parameter. This is the
new way for enabling TDX with Cloud Hypervisor.

Along with this change, the way to retrieve the firmware path has been
updated to rely on the recently introduced PayloadConfig structure.

Fixes #4556

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-05 12:14:59 +01:00
Sebastien Boeuf
b3bef3adda vmm: acpi: Don't declare MMIO config space through PCI buses
The PCI buses should not declare the address space related to the MMIO
config space given it's already declared in the MCFG table and through
the motherboard device PNP0C02 in the DSDT table.

The PCI MMIO config region for the segment was being wrongly exposed as
part of the _CRS for the ACPI bus device (using Memory32Fixed). Exposing
it via this object was ineffectual as the equivalent entry in the
PNP0C02 (_SB_.MBRD) marked those ranges as not usable via the kernel.
Either way, with both devices used by the kernel, the kernel will not
try and use those memory ranges for the device BARs. However under
td-shim on TDX the PNP0C02 device is not on the permitted list of
devices so the the memory ranges were not marked as unusable resulting
in the kernel attempting to allocate BARs that collided with the PCI
MMIO configuration space.

This is based on the kernel documentation PCI/acpi-info.rst which relies
on ACPI and PCI Firmware specifications. And here are the interesting
quotes from this document:

"""
Prior to the addition of Extended Address Space descriptors, the failure
of Consumer/Producer meant there was no way to describe bridge registers
in the PNP0A03/PNP0A08 device itself. The workaround was to describe the
bridge registers (including ECAM space) in PNP0C02 catch-all devices.
With the exception of ECAM, the bridge register space is device-specific
anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need
to know about it.

PNP0C02 “motherboard” devices are basically a catch-all. There’s no
programming model for them other than “don’t use these resources for
anything else.” So a PNP0C02 _CRS should claim any address space that is
(1) not claimed by _CRS under any other device object in the ACPI
namespace and (2) should not be assigned by the OS to something else.

The address range reported in the MCFG table or by _CBA method (see
Section 4.1.3) must be reserved by declaring a motherboard resource. For
most systems, the motherboard resource would appear at the root of the
ACPI namespace (under _SB) in a node with a _HID of EISAID (PNP0C02),
and the resources in this case should not be claimed in the root PCI
bus’s _CRS. The resources can optionally be returned in Int15 E820 or
EFIGetMemoryMap as reserved memory but must always be reported through
ACPI as a motherboard resource.
"""

This change has been manually tested by running a VM with multiple
segments (4 segments), and by hotplugging an additional disk to the
segment number 2 (third segment).

From one shell:
"""
cloud-hypervisor \
    --cpus boot=1 \
    --memory size=1G \
    --kernel vmlinux \
    --cmdline "root=/dev/vda1 rw console=hvc0" \
    --disk path=jammy-server-cloudimg.raw \
    --api-socket /tmp/ch.sock \
    --platform num_pci_segments=4
"""

From another shell (after the VM is booted):
"""
ch-remote \
    --api-socket=/tmp/ch.sock \
    add-disk \
    path=test-disk.raw,id=disk2,pci_segment=2
"""

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-02 14:14:23 +02:00
Nuno Das Neves
784a3aaf3c devices: gic: use VgicConfig everywhere
Use VgicConfig to initialize Vgic.
Use Gic::create_default_config everywhere so we don't always recompute
redist/msi registers.
Add a helper create_test_vgic_config for tests in hypervisor crate.

Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
2022-08-31 08:33:05 +01:00
Jianyong Wu
3a19573c69 vmm: unify payload load chain on AArch64 with x86_64
AArch64 can share the same way of loading payload with x86_64. It makes
the payload loading more consistent between different arches.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-08-31 08:32:08 +01:00
Michael Zhao
b65639fad3 vmm:AArch64: move uefi_flash to memory manager
uefi_flash is used when load firmware, that is load payload depends on
device manager. move uefi_flash to memory manager can eliminate the
dependency.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-31 08:32:08 +01:00
Jianyong Wu
9b1452f258 vmm:AArch64: load standalone firmware
A new firmware item has been added into payload config, we need
extend ability to load standalone firmware on AArch64.
"load_kernel" method will be the entry of image loading work including
kernel and firmware.

This change is back compatible. So, we can either load firmware through
"-kernel" like before or "-firmware".

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-08-31 08:32:08 +01:00
Jianyong Wu
2054c8699a vmm:AArch64: add load_firmware method
Later, we will load standalone firmware. So, refactor load_kernel
by abstracting load_firmware method.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-08-31 08:32:08 +01:00
Sebastien Boeuf
8c02648ac9 vmm: device_manager: Update virtio-console for proper PTY support
Given the virtio-console is now able to buffer its output when no PTY is
connected on the other end, the device manager code is updated to enable
this. Moving the endpoint type from FilePair to PtyPair enables the
proper codepath in the virtio-console implementation, as well as
updating the PTY resize code, and forcing the PTY to always be
non-blocking.

The non-blocking behavior is required to avoid blocking the guest that
would be waiting on the virtio-console driver. When receiving an
EWOULDBLOCK error, the output will simply be redirected to the temporary
buffer so that it can be later flushed.

The PTY resize logic has been slightly modified to ensure the PTY file
descriptors are closed. It avoids the child process to keep a hold onto
the PTY device, which would have caused the PTY to believe something is
connected on the other end, which would have prevented the detection of
any new connection on the PTY.

Fixes #4521

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-30 13:47:51 +02:00
Sebastien Boeuf
a940f525a8 vmm: Move SerialBuffer to its own crate
We want to be able to reuse the SerialBuffer from the virtio-devices
crate, particularly from the virtio-console implementation. That's why
we move the SerialBuffer definition to its own crate so that it can be
accessed from both vmm and virtio-devices crates, without creating any
cyclic dependency.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-30 13:47:51 +02:00
Sebastien Boeuf
63462fd8ab vmm: serial_manager: Iterate again on EINTR
If the epoll_wait() call returns EINTR, this only means a signal has
been delivered before any of the file descriptors registered triggered
an event or before the end of the timeout (if timeout isn't -1). For
that reason, we should simply try to listen on the epoll loop again.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-30 13:47:51 +02:00
Sebastien Boeuf
0bcb6ff061 vmm: Limit the size of the SerialBuffer
We must limit how much the buffer can grow, otherwise this could lead
the process to consume all the memory on the machine. This could happen
if the output from the guest was very important and nothing would
connect to the PTY for a long time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-24 12:14:59 +02:00
Michael Zhao
d66d64c325 vmm: Restrict the maximum number of HW breakpoints
Set the maximum number of HW breakpoints according to the value returned
from `Hypervisor::get_guest_debug_hw_bps()`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-23 16:57:12 +02:00
Wei Liu
3e6b0a5eab vmm: unify TranslateVirtualAddress error for both x86_64 and aarch64
Using anyhow::Error should cover both architectures.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-08-22 09:37:21 -07:00
Michael Zhao
c798b958f3 vmm: Extend seccomp rules for GDB
Add 'KVM_SET_GUEST_DEBUG' ioctl to seccomp filter rules.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-21 17:07:26 +08:00
Michael Zhao
0522e40933 vmm: Implement translate_gva on AArch64
On AArch64, `translate_gva` API is not provided by KVM. We implemented
it in VMM by walking through translation tables.

Address translation is big topic, here we only focus the scenario that
happens in VMM while debugging kernel. This `translate_gva`
implementation is restricted to:
 - Exception Level 1
 - Translate high address range only (kernel space)

This implementation supports following Arm-v8a features related to
address translation:
 - FEAT_LPA
 - FEAT_LVA
 - FEAT_LPA2

The implementation supports page sizes of 4KiB, 16KiB and 64KiB.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-21 17:07:26 +08:00
Michael Zhao
5febdec81a vmm: Enable gdbstub on AArch64
The `gva_translate` function is still missing, it will be added with a
separate commit.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-21 17:07:26 +08:00
Nuno Das Neves
fdc8546eef vmm: aarch64: Use GIC_V3_* consts instead of magic numbers in create_madt()
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
2022-08-21 17:06:48 +08:00
Sebastien Boeuf
cdcd4d259e vmm: serial: Wait for PTY to be available before writing to it
The goal of this patch is to provide a reliable way to detect when the
other end of the PTY is connected, and therefore be able to identify
when we can write to the PTY device. This is needed because writing to
the PTY device when the other end isn't connected causes the loss of
the written bytes.

The way to detect the connection on the other end of the PTY is by
knowing the other end is disconnected at first with the presence of the
EPOLLHUP event. Later on, when the connection happens, EPOLLHUP is not
triggered anymore, and that's when we can assume it's okay to write to
the PTY main device.

It's important to note we had to ensure the file descriptor for the
other end was closed, otherwise we would have never seen the EPOLLHUP
event. And we did so by removing the "sub" field from the PtyPair
structure as it was keeping the associated File opened.

Fixes #3170

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-19 14:39:06 +01:00
Rob Bradford
396f9ce2c6 vmm: Deprecate non-PVH firmware loading
Curently all the firmware blobs we support can use PVH loading.

See: #4511

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-18 17:29:44 +01:00
Rob Bradford
282a1001ef vmm: x86_64: Rename load_firmware() to reflect its purpose
This function only supports loading legacy, non-PVH firmware binaries.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
0d682e185f vmm: x86_64: Add support for firmware loading
Since our firmware files are still designed to be used via PVH use the
load_kernel() function to load the firmware falling back to legacy
firmware loading if necessary.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
8ec5a248cd main, vmm: Add option to pass firmware parameter in payload
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
763ea7da42 vmm: x86_64: Split payload loading into it's own function
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
2856074d12 vmm: x86_64: Make kernel loading use PayloadConfig
Minor refactoring to start supporting loading a firmware payload

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
485900eeb4 vmm: x86_64: Use more general name for payload handling
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
6988da79d2 vmm: x86_64: Split legacy firmware loading into own function
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Sebastien Boeuf
98f949d35d vmm: Add new I/O ports for ACPI shutdown and PM timer devices
Adding new I/O ports for both the ACPI shutdown and the ACPI PM timer
devices so they can be triggered from both addresses. The reason for
this change is that TDX expects only certain I/O ports to be enabled
based on what QEMU exposes. We follow this to avoid new ports from being
opened exclusively for Cloud Hypervisor.

We have to keep the former I/O ports available given all firmwares
haven't been updated yet. Once we reach a point where we know both Rust
Hypervisor Firmware, OVMF, TDVF and TDSHIM have been updated with the
new port values, we'll be able to remove the former ports.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-11 11:46:09 +01:00
Rob Bradford
8c22c03e1e vmm: openapi: Switch to describing new payload API
The old API remains usable, and will remain usable for two releases but
we should only advertise the new API.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-10 22:20:07 +01:00
Rob Bradford
51fdc48817 vmm: openapi: Fix OpenAPI YAML file formatting
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-10 22:20:07 +01:00
Rob Bradford
cef51a9de0 vmm: Encompass guest payload configuration in PayloadConfig
Introduce a new top level member of VmConfig called PayloadConfig that
(currently) encompasses the kernel, commandline and initramfs for the
guest to use.

In future this can be extended for firmware use. The existing
"--kernel", "--cmdline" and "initramfs" CLI parameters now fill the
PayloadConfig.

Any config supplied which uses the now deprecated config members have
those members mapped to the new version with a warning.

See: #4445

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-10 15:12:34 +01:00
Rob Bradford
6bc46ba9c1 vmm: config: Reject VFIO devices with the same path
By checking in the validation logic we get checking for both devices
specified in the initial config but also hotplug too.

Fixes: #4453

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-09 14:32:35 +02:00
Rob Bradford
ea58d2f68a vmm: config: Enhance test_cpu_parsing to add "affinity" parameter
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-08 16:23:00 +01:00
Rob Bradford
d295de4cd5 option_parser: Move test_option_parser to option_parser crate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-08 16:23:00 +01:00
Wei Liu
53aecf9341 vmm: add oem_strings to openapi
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-08-08 08:59:19 +01:00
Wei Liu
57e9b80123 vmm: provide oem_strings option
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-08-08 08:59:19 +01:00
lizhaoxin1
65f42c1f62 vmm: openapi: Add uuid to PlatformConfig
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
2022-08-04 09:20:06 +02:00
lizhaoxin1
bc3a276b43 arch, vmm: Expose platform uuid via SMBIOS
Parse and set uuid.

Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-04 09:20:06 +02:00
lizhaoxin1
3abc1e1e51 vmm: config: Add "uuid" option to "--platform"
The uuid indicates the unique ID of a virtual machine.
cloud-hypervisor takes the uuid passed by libvirt
and uses it to initialize cloud-init.

Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
2022-08-04 09:20:06 +02:00
Bo Chen
1125fd2667 vmm: api: Use 'BTreeMap' for 'HttpRoutes'
In this way, we get the values sorted by its key by default, which is
useful for the 'http_api' fuzzer.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-03 10:18:24 +01:00
Bo Chen
eb056d374a vmm: Make 'EpollContext::add_event()' public
So that it can be reused by other crate, e.g. from fuzz targets.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-03 10:18:24 +01:00
Sebastien Boeuf
4d74525bdc vmm: Remove unused "poll_queue" from DiskConfig
The parameter "poll_queue" was useful at the time Cloud Hypervisor was
responsible for spawning vhost-user backends, as it was carrying the
information the vhost-user-block backend should have this option enabled
or not.

It's been quite some time that we walked away from this design, as we
now expect a management layer to be responsible for running vhost-user
backends.

That's the reason why we can remove "poll_queue" from the DiskConfig
structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-02 15:10:11 +02:00
Michael Zhao
7199119bb2 hypervisor: Remove Vcpu::read_mpidr() on AArch64
Replaced `read_mpidr()` with `get_sys_reg()`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-29 11:45:12 +01:00
Michael Zhao
cd7f36a713 hypervisor: Remove get/set_reg() on AArch64
`Vcpu::get/set_reg()` were only invoked in Vcpu itself.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-29 11:45:12 +01:00
Michael Zhao
f7b6d99c2d hypervisor: Remove get/set_sys_regs() on AArch64
`hypervisor::Vcpu::get/set_sys_regs()` are only used in Vcpu internally.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-29 11:45:12 +01:00
Rob Bradford
857edc71a9 vmm: cpu: Remove now unused CpuManager::vcpus_paused()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-26 09:22:25 +02:00
Rob Bradford
0e29379bcf vmm: Make gdb break/resuming more resilient
When starting the VM such that it is already on a breakpoint (via
stop_on_boot) when attached to gdb then start the vCPUs in a paused
state rather than starting the vCPUs later (upon resume).

Further, make the resumption/break of the VM more resilient by only
attempting to resume the vCPUs if were are already in a break point and
only attempting to pause/break if we were already running.

Fixes: #4354

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-26 09:22:25 +02:00
Rob Bradford
a749182777 vmm: acpi: Use ACPI platform device addresses from DeviceManager
Remove the hardcoded addresses.

Also remove PM_TMR_BLK as spec compliant implementation will use
X_PM_TMR_BLK over this field.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-25 16:16:06 +01:00
Rob Bradford
2e8eb96ef6 vmm: device_manager: Store ACPI platform addresses for later use
These are ready for inclusion in the FACP table.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-25 16:16:06 +01:00
Wei Liu
ad33f7c5e6 vmm: return seccomp rules according to hypervisors
That requires stashing the hypervisor type into various places.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-22 12:50:12 +01:00
Wei Liu
a96a5d7816 hypervisor, vmm: use new vfio-ioctls
Use the new vfio-ioctls APIs. Drop Cloud Hypervisor's Device trait
since it is no longer needed.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-21 23:37:53 +01:00
Wei Liu
f84ddedb1a hypervisor, vmm: introduce trait functions for aarch64 PMU
The original code uses kvm_device_attr directly outside of the
hyeprvisor crate. That leaks hypervisor details.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-21 23:37:53 +01:00
Wei Liu
f21fc1dcb6 hypervisor: x86: provide a generic MsrEntry structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-20 10:13:41 +01:00
Wei Liu
4d2cc3778f hypervisor: move away from MsrEntries type
It is a flexible array. Switch to vector and slice instead.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-20 10:13:41 +01:00
Wei Liu
05e5106b9b hypervisor x86: provide a generic LapicState structure
This requires making get/set_lapic_reg part of the type.

For the moment we cannot provide a default variant for the new type,
because picking one will be wrong for the other hypervisor, so I just
drop the test cases that requires LapicState::default().

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-19 09:38:38 +01:00
Wei Liu
6a8c0fc887 hypervisor: provide a generic FpuState structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-18 22:15:30 +01:00
Wei Liu
08135fa085 hypervisor: provide a generic CpudIdEntry structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-18 22:15:30 +01:00
Wei Liu
45fbf840db hypervisor, vmm: move away from CpuId type
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.

Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.

This allows us to decouple CpuIdEntry from hypervisors more easily.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-18 22:15:30 +01:00
Wei Liu
f1ab86fecb hypervisor: x86: provide a generic SpecialRegisters structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-15 10:21:43 +01:00
Wei Liu
75797827d5 hypervisor: x86: provide a generic SegmentRegister structure
And drop SegmentRegisterOps since it is no longer required.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-15 10:21:43 +01:00
Wei Liu
8b7781e267 hypervisor: x86: provide a generic StandardRegisters structure
We only need to do this for x86 since MSHV does not have aarch64 support
yet. This reduces unnecessary code churn.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-15 10:21:43 +01:00
Wei Liu
4201bf4011 hypervisor: provide a generic ClockData structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 22:09:04 +01:00
Wei Liu
beb4f86b82 hypervisor, vmm: drop VmState and code
VmState was introduced to hold hypervisor specific VM state. KVM does
not need it and MSHV does not really use it yet.

Just drop the code. It can be easily revived once there is a need.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 22:09:04 +01:00
Alyssa Ross
a455917db5 vmm: fix missed API or debug events
Previously, we were assuming that every time an eventfd notified us,
there was only a single event waiting for us.  This meant that if,
while one API request was being processed, two more arrived, the
second one would not be processed (until the next one arrived, when it
would be processed instead of that event, and so on).  To fix this,
make sure we're processing the number of API and debug requests we've
been told have arrived, rather than just one.  This is easy to
demonstrate by sending lots of API events and adding some sleeps to
make sure multiple events can arrive while each is being processed.

For other uses of eventfd, like the exit event, this doesn't matter —
even if we've received multiple exit events in quick succession, we
only need to exit once.  So I've only made this change where receiving
an event is non-idempotent, i.e. where it matters that we process the
event the right number of times.

Technically, reset requests are also non-idempotent — there's an
observable difference between a VM resetting once, and a VM resetting
once and then immediately resetting again.  But I've left that alone
for now because two resets in immediate succession doesn't sound like
something anyone would ever want to me.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2022-07-14 17:44:11 +01:00
Michael Zhao
2d8635f04a hypervisor: Refactor system_registers on AArch64
Function `system_registers` took mutable vector reference and modified
the vector content. Now change the definition to `get/set` style.
And rename to `get/set_sys_regs` to align with other functions.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-14 22:55:19 +08:00
Michael Zhao
c445513976 hypervisor: Refactor core_registers on AArch64
On AArch64, the function `core_registers` and `set_core_registers` are
the same thing of `get/set_regs` on x86_64. Now the names are aligned.
This will benefit supporting `gdb`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-14 22:55:19 +08:00
Wei Liu
0e8769d76a device_manager: assert passthrough_device has the correct type
There is a lot of unsafe code in such a small function. Add an assert
to help detect issues earlier.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 08:09:50 +01:00
Wei Liu
84bbaf06d1 hypervisor: turn boot_msr_entries into a trait method
This allows dispatching to either KVM or MSHV automatically.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-08 16:49:58 +01:00
Rob Bradford
121729a3b0 vmm: Split signal handling for VM and VMM signals
The VM specific signal (currently only SIGWINCH) should only be handled
when the VM is running.

The generic VMM signals (SIGINT and SIGTERM) need handling at all times.

Split the signal handling into two separate threads which have differing
lifetimes.

Tested by:
1.) Boot full VM and check resize handling (SIGWINCH) works & sending
    SIGTERM leads to cleanup (tested that API socket is removed.)
2.) Start without a VM and send SIGTERM/SIGINT and observe cleanup (API
    socket removed)
3.) Boot full VM, delete VM and observe 2.) holds.
4.) Boot full VM, delete VM, recreate VM and observe 1.) holds.

Fixes: #4269

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-08 15:15:46 +01:00
Rob Bradford
93237f0106 vmm: Set MADT "Online Capable" flag
The Linux kernel now checks for this before marking CPUs as
hotpluggable:

commit aa06e20f1be628186f0c2dcec09ea0009eb69778
Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Wed Sep 8 16:41:46 2021 -0500

    x86/ACPI: Don't add CPUs that are not online capable

    A number of systems are showing "hotplug capable" CPUs when they
    are not really hotpluggable.  This is because the MADT has extra
    CPU entries to support different CPUs that may be inserted into
    the socket with different numbers of cores.

    Starting with ACPI 6.3 the spec has an Online Capable bit in the
    MADT used to determine whether or not a CPU is hotplug capable
    when the enabled bit is not set.

    Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html?#local-apic-flags
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-01 18:45:05 +01:00
Rob Bradford
adf5881757 build: #[allow(clippy::significant_drop_in_scrutinee) in some crates
This check is new in the beta version of clippy and exists to avoid
potential deadlocks by highlighting when the test in an if or for loop
is something that holds a lock. In many cases we would need to make
significant refactorings to be able to pass this check so disable in the
affected crates.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
b57d7b258d build: Fix beta clippy issue (needless_return)
warning: unneeded `return` statement
   --> pci/src/vfio_user.rs:627:13
    |
627 | /             return Err(std::io::Error::new(
628 | |                 std::io::ErrorKind::Other,
629 | |                 format!("Region not found for 0x{:x}", gpa),
630 | |             ));
    | |_______________^
    |
    = note: `#[warn(clippy::needless_return)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_return
help: remove `return`
    |
627 ~             Err(std::io::Error::new(
628 +                 std::io::ErrorKind::Other,
629 +                 format!("Region not found for 0x{:x}", gpa),
630 +             ))
    |

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
2716bc3311 build: Fix beta clippy issue (derive_partial_eq_without_eq)
warning: you are deriving `PartialEq` and can implement `Eq`
  --> vmm/src/serial_manager.rs:59:30
   |
59 | #[derive(Debug, Clone, Copy, PartialEq)]
   |                              ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
2e664dca64 vmm: Always reset the console mode on VMM exit
Tested:

1. SIGTERM based
2. VM shutdown/poweroff
3. Injected VM boot failure after calling Vm::setup_tty()

Fixes: #4248

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-28 16:45:27 +01:00
Rob Bradford
65ec6631fb vmm: cpu: Store the vCPU snapshots in ascending order
The snapshots are stored in a BTree which is ordered however as the ids
are strings lexical ordering places "11" ahead of "2". So encode the
vCPU id with zero padding so it is lexically sorted.

This fixes issues with CPU restore on aarch64.

See: #4239

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-27 16:20:57 +01:00
Wei Liu
bccd7c7e48 vmm: drop Sync+Send bounds for EndpointHandler
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-06-20 23:28:57 +01:00
Wei Liu
8fa1098629 vmm: switch from lazy_static to once_cell
Once_cell does not require using macro and is slated to become part of
Rust std at some point.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-06-20 16:03:07 +01:00
Sebastien Boeuf
335a4e1cc0 vmm: api: Expose kvm_hyperv parameter in OpenAPI description
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-17 15:11:53 +01:00
Sebastien Boeuf
81ba70a497 pci, vmm: Defer mapping VFIO MMIO regions on restore
When restoring a VM, the restore codepath will take care of mapping the
MMIO regions based on the information from the snapshot, rather than
having the mapping being performed during device creation.

When the device is created, information such as which BARs contain the
MSI-X tables are missing, preventing to perform the mapping of the MMIO
regions.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
7df7061610 pci, vmm: Add migratable support to vfio-user devices
Based on recent changes to VfioUserPciDevice, the vfio-user devices can
now be migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
c021dda267 pci, vmm: Add migratable support to VFIO devices
Based on recent changes to VfioPciDevice, the VFIO devices can now be
migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Rob Bradford
94fb9f817d vmm: Fix clippy issues under "guest_debug" feature
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-08 11:40:56 +01:00
Michael Zhao
a7a15d56dd aarch64: Move setup_regs to hypervisor
`setup_regs` of AArch64 calls KVM sepecific code. Now move it to
`hypervisor` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 11:07:46 +01:00
Sebastien Boeuf
65dc1c83a9 vmm: cpu: Save and restore CPU states during snapshot/restore
Based on recent KVM host patches (merged in Linux 5.16), it's forbidden
to call into KVM_SET_CPUID2 after the first successful KVM_RUN returned.
That means saving CPU states during the pause sequence, and restoring
these states during the resume sequence will not work with the current
design starting with kernel version 5.16.

In order to solve this problem, let's simply move the save/restore logic
to the snapshot/restore sequences rather than the pause/resume ones.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-06 11:07:29 +01:00
Sebastien Boeuf
3edaa8adb6 vmm: Ensure restore matches boot sequence
The vCPU is created and set after all the devices on a VM's boot.
There's no reason to follow a different order on the restore codepath as
this could cause some unexpected behaviors.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-06 11:07:17 +01:00
Michael Zhao
9260c3816e vmm: Update unit test for GIC refactoring
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
5d45d6d0fb vmm: Move GIC unit test to hypervisor crate
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
957d3a7443 aarch64: Simplify GIC related structs definition
Combined the `GicDevice` struct in `arch` crate and the `Gic` struct in
`devices` crate.

After moving the KVM specific code for GIC in `arch`, a very thin wapper
layer `GicDevice` was left in `arch` crate. It is easy to combine it
with the `Gic` in `devices` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
04949755c0 arch: Switch to new GIC interface
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Rob Bradford
ade3a9c8f6 virtio-devices, vmm: Optimised async virtio device activation
In order to ensure that the virtio device thread is spawned from the vmm
thread we use an asynchronous activation mechanism for the virtio
devices. This change optimises that code so that we do not need to
iterate through all virtio devices on the platform in order to find the
one that requires activation. We solve this by creating a separate short
lived VirtioPciDeviceActivator that holds the required state for the
activation (e.g. the clones of the queues) this can then be stored onto
the device manager ready for asynchronous activation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-01 09:42:02 +02:00
Yi Wang
dbeb922882 doc: add vm coredump support
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
8b585b96c1 vmm: enable coredump
Based on the newly added guest_debug feature, this patch adds http
endpoint support.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
ccb604e1e1 vmm: add cpu segment note for coredump
The crash tool use a special note segment which named 'QEMU' to
analyze kaslr info and so on. If we don't add the 'QEMU' note
segment, crash tool can't find linux version to move on.

For now, the most convenient way is to add 'QEMU' note segment to
make crash tool happy.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
2022-05-30 13:41:40 +02:00
Yi Wang
0e65ca4a6c vmm: save guest memory for coredump
Guest memory is needed for analysis in crash tool, so save it
for coredump.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
7e280b6f70 vmm: save elf header for coredump
The vmcore file of guest is an elf format, so the first step of coredump
is to save the elf header.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
2022-05-30 13:41:40 +02:00
Yi Wang
90034fd6ba vmm: add GuestDebuggable trait
It's useful to dump the guest, which named coredump so that crash
tool can be used to analysize it when guest hung up.

Let's add GuestDebuggable trait and Coredumpxxx error to support
coredump firstly.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Rob Bradford
465db7f08c vmm: config: Remove mergeable option from PmemConfig
Fixes: #3968

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:48:49 +02:00
Rob Bradford
55c5961f43 vmm: config: Remove dax & cache_size options from FsConfig
Fixes: #3889

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Rob Bradford
7c3582b4a8 vmm: config: Fix error message regarding use of cache size without dax
The error message incorrectly said that the user was trying to combine
cache_size without dax whereas it is only usuable with dax.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Rob Bradford
979797786d vmm: Remove DAX cache setup for virtio-fs devices
Remove the code from the DeviceManager that prepares the DAX cache since
the functionality has now been removed.

Fixes: #3889

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Michael Zhao
0fd6521759 aarch64: Avoid depending on layout in GIC code
Removing the dependency on `layout` helps moving GIC code into
`hypervisor` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-05-27 10:57:50 +08:00
Michael Zhao
3fe20cc09a aarch64: Remove GicDevice trait
`GicDevice` trait was defined for the common part of GicV3 and ITS.
Now that the standalone GicV3 do not exist, `GicDevice` is not needed.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-05-27 10:57:50 +08:00
Rob Bradford
fa07d83565 Revert "virtio-devices, vmm: Optimised async virtio device activation"
This reverts commit f160572f9d.

There has been increased flakiness around the live migration tests since
this was merged. Speculatively reverting to see if there is increased
stability.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-21 21:27:33 +01:00
Rob Bradford
f160572f9d virtio-devices, vmm: Optimised async virtio device activation
In order to ensure that the virtio device thread is spawned from the vmm
thread we use an asynchronous activation mechanism for the virtio
devices. This change optimises that code so that we do not need to
iterate through all virtio devices on the platform in order to find the
one that requires activation. We solve this by creating a separate short
lived VirtioPciDeviceActivator that holds the required state for the
activation (e.g. the clones of the queues) this can then be stored onto
the device manager ready for asynchronous activation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-20 17:07:13 +01:00
Sebastien Boeuf
49db713124 virtio-devices, vmm: Remove unused macro rules
Latest cargo beta version raises warnings about unused macro rules.
Simply remove them to fix the beta build.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-20 09:59:43 +01:00
Maksym Pavlenko
3a0429c998 cargo: Clean up serde dependencies
There is no need to include serde_derive separately,
as it can be specified as serde feature instead.

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-05-18 08:21:19 +02:00
Rob Bradford
16a9882153 vmm: cpu: tdx: Don't use fd suffix for something not an FD
The hypervisor::Vcpu is the abstraction over the fd.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
218be2642e hypervisor: Explicitly pub use at the hypervisor crate top-level
Explicitly re-export types from the hypervisor specific modules. This
makes it much clearer what the common functionality that is exposed is.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
cd0df05808 vmm, arch: CpuId is x86_64 specific so import from the x86_64 module
It will be removed as a top-level export from the hypervisor crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
d3f66f8702 hypervisor: Make vm module private
And thus only export what is necessary through a `pub use`. This is
consistent with some of the other modules and makes it easier to
understand what the external interface of the hypervisor crate is.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
b1bd87df19 vmm: Simplify MsiInterruptManager generics
By taking advantage of the fact that IrqRoutingEntry is exported by the
hypervisor crate (that is typedef'ed to the hypervisor specific version)
then the code for handling the MsiInterruptManager can be simplified.

This is particularly useful if in this future it is not a typedef but
rather a wrapper type.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-11 11:19:14 +01:00
Rob Bradford
3f9e8d676a hypervisor: Move creation of irq routing struct to hypervisor crate
This removes the requirement to leak as many datastructures from the
hypervisor crate into the vmm crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-11 11:19:14 +01:00
Rob Bradford
c2c813599d vmm: Don't use kvm_ioctls directly
The IoEventAddress is re-exported through the crate at the top-level.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-10 15:57:43 +01:00
Rob Bradford
387d56879b vmm, hypervisor: Clean up nomenclature around offloading VM operations
The trait and functionality is about operations on the VM rather than
the VMM so should be named appropriately. This clashed with with
existing struct for the concrete implementation that was renamed
appropriately.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-10 13:10:01 +01:00
Sebastien Boeuf
5f722d0d3f vmm: Fix loading RAW firmware
Whenever going through the codepath of loading a RAW firmware, we always
add an extra RAM region to the guest memory through the memory manager.
But we must be careful to use the updated guest memory rather than a
previous reference that wasn't containing the new region, as this can
lead to the following error:

VmBoot(FirmwareLoad(InvalidGuestAddress(GuestAddress(4290772992))))

This is fixed by the current patch, getting the latest reference onto
the guest memory from the memory manager right after the new region has
been added.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-06 18:13:28 +02:00
Bo Chen
42c19e14c5 vmm: Add 'shutdown()' to vCPU seccomp filter
This is required when hot-removing a vfio-user device. Details code path
below:

Thread 6 "vcpu0" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7f8196889700 (LWP 2358305)]
0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
78	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
  0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
  0x000056189240737d in std::sys::unix::net::Socket::shutdown ()
    at library/std/src/sys/unix/net.rs:383
  std::os::unix::net::stream::UnixStream::shutdown () at library/std/src/os/unix/net/stream.rs:479
  0x000056189210e23d in vfio_user::Client::shutdown (self=0x7f8190014300)
    at vfio_user/src/lib.rs:787
  0x00005618920b9d02 in <pci::vfio_user::VfioUserPciDevice as core::ops::drop::Drop>::drop (
    self=0x7f819002d7c0) at pci/src/vfio_user.rs:551
  0x00005618920b8787 in core::ptr::drop_in_place<pci::vfio_user::VfioUserPciDevice> ()
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920b92e3 in core::ptr::drop_in_place<core::cell::UnsafeCell<dyn pci::device::PciDevice>>
    () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920b9362 in core::ptr::drop_in_place<std::sync::mutex::Mutex<dyn pci::device::PciDevice>> () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920d8a3e in alloc::sync::Arc<T>::drop_slow (self=0x7f81968852b8)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1092
  0x00005618920ba273 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7f81968852b8)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1688
 0x00005618920b76fb in core::ptr::drop_in_place<alloc::sync::Arc<std::sync::mutex::Mutex<dyn pci::device::PciDevice>>> ()
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
 0x0000561891b5e47d in vmm::device_manager::DeviceManager::eject_device (self=0x7f8190009600,
    pci_segment_id=0, device_id=3) at vmm/src/device_manager.rs:4000
 0x0000561891b674bc in <vmm::device_manager::DeviceManager as vm_device:🚌:BusDevice>::write (
    self=0x7f8190009600, base=70368744108032, offset=8, data=&[u8](size=4) = {...})
    at vmm/src/device_manager.rs:4625
 0x00005618921927d5 in vm_device:🚌:Bus::write (self=0x7f8190006e00, addr=70368744108040,
    data=&[u8](size=4) = {...}) at vm-device/src/bus.rs:235
 0x0000561891b72e10 in <vmm::vm::VmOps as hypervisor::vm::VmmOps>::mmio_write (
    self=0x7f81900097b0, gpa=70368744108040, data=&[u8](size=4) = {...}) at vmm/src/vm.rs:378
 0x0000561892133ae2 in <hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run (
    self=0x7f8190013c90) at hypervisor/src/kvm/mod.rs:1114
 0x0000561891914e85 in vmm::cpu::Vcpu::run (self=0x7f819001b230) at vmm/src/cpu.rs:348
 0x000056189189f2cb in vmm::cpu::CpuManager::start_vcpu::{{closure}}::{{closure}} ()
    at vmm/src/cpu.rs:953

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-05 15:33:26 -07:00
Sebastien Boeuf
058a61148c vmm: Factorize net creation
Since both Net and vhost_user::Net implement the Migratable trait, we
can factorize the common part to simplify the code related to the net
creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
Sebastien Boeuf
425902b296 vmm: Factorize disk creation
Since both Block and vhost_user::Blk implement the Migratable trait, we
can factorize the common part to simplify the code related to the disk
creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
Sebastien Boeuf
54f39aa8cb vmm: Validate vhost-user-block/net are not configured with iommu=on
Extend the validate() function for both DiskConfig and NetConfig so that
we return an error if a vhost-user-block or vhost-user-net device is
expected to be placed behind the virtual IOMMU. Since these devices
don't support this feature, we can't allow iommu to be set to true in
these cases.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
Rob Bradford
707cea2182 vmm, devices: Move logging of 0x80 timestamp to its own device
This is a cleaner approach to handling the I/O port write to 0x80.
Whilst doing this also use generate the timestamp at the start of the VM
creation. For consistency use the same timestamp for the ARM equivalent.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 23:02:53 +01:00
Rob Bradford
c47e3b8689 gdb: Do not use VmmOps for memory manipulation
We don't use the VmmOps trait directly for manipulating memory in the
core of the VMM as it's really designed for the MSHV crate to handle
instruction decoding. As I plan to make this trait MSHV specific to
allow reduced locking for MMIO and PIO handling when running on KVM this
use should be removed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 11:33:02 -07:00
Bo Chen
7fe399598d vmm: device_manager: Map MMIO regions to the guest correctly
To correctly map MMIO regions to the guest, we will need to wait for valid
MMIO region information which is generated from 'PciDevice::allocate_bars()'
(as a part of 'DeviceManager::add_pci_device()').

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-04 13:53:47 +02:00
Rob Bradford
1dfe4eda5c vmm: Prevent "internal" identifiers being used by user
For devices that cannot be named by the user use the "__" prefix to
identify them as internal devices. Check that any identifiers provided
in the config do not clash with those internal names. This prevents the
user from creating a disk such as "__serial" which would then cause a
failure in unpredictable manner.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 12:34:11 +02:00
Sebastien Boeuf
6e101f479c vmm: Ensure hotplugged device identifier is unique
Whenever a device (virtio, vfio, vfio-user or vdpa) is hotplugged, we
must verify the provided identifier is unique, otherwise we must return
an error.

Particularly, this will prevent issues with identifiers for serial,
console, IOAPIC, balloon, rng, watchdog, iommu and gpio since all of
these are hardcoded by the VMM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-03 18:34:24 +01:00
Rob Bradford
6d4862245d vmm: Generate event when device is removed
The new event contains the BDF and the device id:

{
  "timestamp": {
    "secs": 2,
    "nanos": 731073396
  },
  "source": "vm",
  "event": "device-removed",
  "properties": {
    "bdf": "0000:00:02.0",
    "id": "test-disk"
  }
}

Fixes: #4038

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-03 17:10:36 +02:00
Sebastien Boeuf
a5a2e591c9 vmm: Remove FsConfig from VmConfig when unplugging fs device
All hotpluggable devices were properly removed from the VmConfig when a
remove-device command was issued, except for the "fs" type. Fix this
lack of support as it is causing the integration tests to fail with the
recent addition of verifying that identifiers are unique.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
Sebastien Boeuf
677c8831af vmm: Ensure uniqueness of generated identifiers
The device identifiers generated from the DeviceManager were not
guaranteed to be unique since they were not taking the list of
identifiers provided through the configuration.

By returning the list of unique identifiers from the configuration, and
by providing it to the DeviceManager, the generation of new identifiers
can rely both on the DeviceTree and the list of IDs from the
configuration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
Sebastien Boeuf
634c53ea50 vmm: config: Validate provided identifiers are unique
A valid configuration means we can only accept unique identifiers from
the user.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
LiHui
ec0c1b01c4 vmm: api: Do not delete the API socket on API server creation
The socket will safely deleted on shutdown and so it is not necessary to
delete the API socket when starting the HTTP server.

Fixes: #4026

Signed-off-by: LiHui <andrewli@kubesphere.io>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 18:40:49 +01:00
Rob Bradford
f17aa3755f vmm: Add clarifying comment about Vm::entry_point()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
744a049007 vmm: Parallelise functionality with kernel loading
Move fuctionality earlier in the boot so as to run in parallel with the
loading of the kernel.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
e70bd069b3 vmm: Load kernel asynchronously
Start loading the kernel as possible in the VM in a separate thread.
Whilst it is loading other work can be carried out such as initialising
the devices.

The biggest performance improvement is seen with a more complex set of
devices. If using e.g. four virtio-net devices then the time to start the
kernel improves by 20-30ms. With the simplest configuration the
improvement was of the order of 2-3ms.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
bfeb3120f5 vmm: Refactor kernel loading to decouple from Vm struct
This will allow the kernel to be loaded from another thread.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
ce6d88d187 vmm: Merge aarch64 use statements
These were in their own block and not organised lexically.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
56fe4c61af vmm: Duplicate Vm::entry_point() across architectures
These will have very different implementations when asynchronously
loading the kernel.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
1d1a087fc5 vmm: Refactor kernel command line generation
This allows the same code for generating the kernel command line to be
used on both aarch64 and x86_64 when the latter starts loading the
kernel in asynchronously.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
f1276c58d2 vmm: Commandline inject from devices is aarch64 specific
This is not required for x86_64 and maintains a tight coupling between
kernel loading and the DeviceManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
da33eb5e8c vmm: device_manager: Remove extra whitespace lines
These originated from the removal of the acpi feature gate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Fabiano Fidêncio
fdeb4f7c46 Revert "vmm, openapi: Token Bucket fields should be uint64"
This reverts commit 87eed369cd.

The reason we're reverting this is that OpenAPI Specification[0] doesn't
know how to deal with unsigned types. :-/

Right now the best to do is keep it as it's, as an int64, and try to fix
OpenAPI, or even switch to swagger, as the latter knows how to properly
deal with those.  However, switching to swagger is far from being an 1:1
transition and will require time to experiment, thus reverting this for
now seems the best approach.

[0]: https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#data-types

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 09:26:38 +02:00
Fabiano Fidêncio
87eed369cd vmm, openapi: Token Bucket fields should be uint64
The Token Bucket fields are, on the Cloud Hypervisor side, u64.
However, we expose those as int64 in the OpenAPI YAML file.

With that in mind, let's adjust the yaml file to expose those as uint64.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 13:16:02 +02:00
Rob Bradford
79f4c2db01 vmm: Enable virtio-iommu in VmConfig::validate()
This means that the automatic enabling of the virtio-iommu will also be
applied to VMs creates via the API as well as the CLI.

Fixes: #4016

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-26 12:27:00 +01:00
Rob Bradford
bf9f79081a vmm: Only create ACPI memory manager DSDT when resizable
If using the ACPI based hotplug only memory can be added so if the
hotplug RAM size is the same as the boot RAM size then do not include
the memory manager DSDT entries.

Also: this change simplifies the code marginally by making the
HotplugMethod enum Copyable.

This was identified from the following perf output:

     1.78%     0.00%  vmm              cloud-hypervisor      [.] <vmm::memory_manager::MemorySlots as acpi_tables::aml::Aml>::append_aml_bytes
            |
            ---<vmm::memory_manager::MemorySlots as acpi_tables::aml::Aml>::append_aml_bytes
               <vmm::memory_manager::MemorySlot as acpi_tables::aml::Aml>::append_aml_bytes
               acpi_tables::aml::Name::new
               <acpi_tables::aml::Path as acpi_tables::aml::Aml>::append_aml_bytes
               __libc_malloc

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-26 13:07:19 +02:00
Rob Bradford
62f17ccf8c vmm: Improve error handling for vmm::vm::Error
In particular implement thiserror::Error, cleanup wording and remove
unused errors.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
cb03540ffd vmm: config: Derive thiserror::Error
No further changes are necessary that adding a #[derive(Error)] as there
is a manual implementation of Display.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
0270d697ab vmm: cpu: Improve Error reporting
Remove unused enum members, improve error messages and implement
thiserror::Error.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
47529796d0 arch: Improve arch::Error
Remove unused error enum entries, improve wording and derive
thiserror::Error.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
1c786610b7 vmm: api: Don't use clashing struct name for Error
Import vmm::Error as VmmError to allow the use of thiserror::Error to
avoid clashing names.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Sebastien Boeuf
eb6daa2fc3 pci: Store MSI interrupt manager in VfioCommon
Extend VfioCommon structure to own the MSI interrupt manager. This will
be useful for implementing the restore code path.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-22 16:16:48 +02:00
Rob Bradford
adb3dcdc13 vmm: openapi: Add serial_number to PlatformConfig
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
e972eb7c74 arch, vmm: Expose platform serial_number via SMBIOS
Fixes: #4002

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
203dfdc156 vmm: config: Add "serial_number" option to "--platform"
This carries a string that is exposed via DMI/SMBIOS and is particularly
useful for cloud-init initialisation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
4a04d1f8f2 vmm: seccomp: Allow SYS_rseq as required by newer glibc
glibc 2.35 as shipped by Fedora 36 now uses the rseq syscall.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 13:02:51 +01:00
Rob Bradford
4ca066f077 vmm: api: Simplify error reporting from HTTP to internal API calls
Use a single enum member for representing errors from the internal API.
This avoids the ugly duplication of the API call name in the error
message:

e.g.

$ target/debug/ch-remote --api-socket /tmp/api resize --cpus 2
Error running command: Server responded with an error: InternalServerError: VmResize(VmResize(CpuManager(DesiredVCpuCountExceedsMax)))

Becomes:

$ target/debug/ch-remote --api-socket /tmp/api resize --cpus 2
Error running command: Server responded with an error: InternalServerError: ApiError(VmResize(CpuManager(DesiredVCpuCountExceedsMax)))

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-20 19:39:05 +01:00
Sebastien Boeuf
11e9f43305 vmm: Use new Resource type PciBar
Instead of defining some very generic resources as PioAddressRange or
MmioAddressRange for each PCI BAR, let's move to the new Resource type
PciBar in order to make things clearer. This allows the code for being
more readable, but also removes the need for hard assumptions about the
MMIO and PIO ranges. PioAddressRange and MmioAddressRange types can be
used to describe everything except PCI BARs. BARs are very special as
they can be relocated and have special information we want to carry
along with them.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
89218b6d1e pci: Replace BAR tuple with PciBarConfiguration
In order to make the code more consistent and easier to read, we remove
the former tuple that was used to describe a BAR, replacing it with the
existing structure PciBarConfiguration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
1795afadb8 vmm: Factorize algorithm finding HOB memory resources
By factorizing the algorithm untangling TDVF sections from guest RAM
into a dedicated function, we can write some unit tests to validate it
properly achieves what we expect.

Adding the "tdx" feature to the unit tests, otherwise it wouldn't get
tested.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 15:23:12 +02:00
Sebastien Boeuf
5264d545dd pci, vmm: Extend PciDevice trait to support BAR relocation
By adding a new method id() to the PciDevice trait, we allow the caller
to retrieve a unique identifier. This is used in the context of BAR
relocation to identify the device being relocated, so that we can update
the DeviceTree resources for all PCI devices (and not only
VirtioPciDevice).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
0c34846ef6 vmm: Return new PCI resources from add_pci_device()
By returning the new PCI resources from add_pci_device(), we allow the
factorization of the code translating the BARs into resources. This
allows VIRTIO, VFIO and vfio-user to add the resources to the DeviceTree
node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
4f172ae4b6 vmm: Retrieve PCI resources for VFIO and vfio-user devices
Relying on the function introduced recently to get the PCI resources and
handle the restore case, both VFIO and vfio-user device creation paths
now have access to PCI resources, which can be provided to the function
add_pci_device().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00