1725 Commits

Author SHA1 Message Date
Wei Liu
8b7781e267 hypervisor: x86: provide a generic StandardRegisters structure
We only need to do this for x86 since MSHV does not have aarch64 support
yet. This reduces unnecessary code churn.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-15 10:21:43 +01:00
Wei Liu
4201bf4011 hypervisor: provide a generic ClockData structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 22:09:04 +01:00
Wei Liu
beb4f86b82 hypervisor, vmm: drop VmState and code
VmState was introduced to hold hypervisor specific VM state. KVM does
not need it and MSHV does not really use it yet.

Just drop the code. It can be easily revived once there is a need.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 22:09:04 +01:00
Alyssa Ross
a455917db5 vmm: fix missed API or debug events
Previously, we were assuming that every time an eventfd notified us,
there was only a single event waiting for us.  This meant that if,
while one API request was being processed, two more arrived, the
second one would not be processed (until the next one arrived, when it
would be processed instead of that event, and so on).  To fix this,
make sure we're processing the number of API and debug requests we've
been told have arrived, rather than just one.  This is easy to
demonstrate by sending lots of API events and adding some sleeps to
make sure multiple events can arrive while each is being processed.

For other uses of eventfd, like the exit event, this doesn't matter —
even if we've received multiple exit events in quick succession, we
only need to exit once.  So I've only made this change where receiving
an event is non-idempotent, i.e. where it matters that we process the
event the right number of times.

Technically, reset requests are also non-idempotent — there's an
observable difference between a VM resetting once, and a VM resetting
once and then immediately resetting again.  But I've left that alone
for now because two resets in immediate succession doesn't sound like
something anyone would ever want to me.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2022-07-14 17:44:11 +01:00
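Editor's sketch for a455917db5 (not the actual vmm code): an eventfd read returns the number of notifications accumulated since the last read, so the handler should loop that many times instead of assuming one event per wake-up. `FakeEventFd` and `drain_requests` are hypothetical names, and the eventfd is modelled with a plain counter so that the example stays self-contained.

```rust
use std::collections::VecDeque;

/// Stand-in for an eventfd in counter mode: writes add to a counter,
/// and a read returns the accumulated count and resets it to zero.
struct FakeEventFd {
    counter: u64,
}

impl FakeEventFd {
    fn write(&mut self, n: u64) {
        self.counter += n;
    }

    fn read(&mut self) -> u64 {
        std::mem::take(&mut self.counter)
    }
}

/// Process as many queued requests as the eventfd reports,
/// rather than exactly one per notification.
fn drain_requests(evt: &mut FakeEventFd, queue: &mut VecDeque<String>) -> Vec<String> {
    let pending = evt.read();
    let mut handled = Vec::new();
    for _ in 0..pending {
        if let Some(request) = queue.pop_front() {
            handled.push(request);
        }
    }
    handled
}
```

If three requests are queued (with three writes to the eventfd) before the handler wakes up, a single call to `drain_requests` handles all three; processing only one per wake-up would leave the others waiting for the next notification.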
Michael Zhao
2d8635f04a hypervisor: Refactor system_registers on AArch64
The function `system_registers` took a mutable vector reference and modified
the vector content. Change the definition to `get/set` style and rename it
to `get/set_sys_regs` to align with other functions.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-14 22:55:19 +08:00
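Editor's sketch for 2d8635f04a: the `get/set` style described above, shown on a hypothetical `FakeVcpu` type. The register values are plain `u64`s here purely for illustration; the real functions work on hypervisor register structures and return errors.

```rust
/// Hypothetical vCPU holding a few system register values.
struct FakeVcpu {
    sys_regs: Vec<u64>,
}

impl FakeVcpu {
    /// "get" style: return the state instead of filling a caller-provided vector.
    fn get_sys_regs(&self) -> Vec<u64> {
        self.sys_regs.clone()
    }

    /// "set" style: take the state by shared reference and apply it.
    fn set_sys_regs(&mut self, state: &[u64]) {
        self.sys_regs = state.to_vec();
    }
}
```

The getter no longer needs a `&mut Vec` argument, so a save followed by a restore reads as two symmetric calls.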
Michael Zhao
c445513976 hypervisor: Refactor core_registers on AArch64
On AArch64, the functions `core_registers` and `set_core_registers` are
the equivalent of `get/set_regs` on x86_64. Now the names are aligned.
This will help with supporting `gdb`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-14 22:55:19 +08:00
Wei Liu
0e8769d76a device_manager: assert passthrough_device has the correct type
There is a lot of unsafe code in such a small function. Add an assert
to help detect issues earlier.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 08:09:50 +01:00
Wei Liu
84bbaf06d1 hypervisor: turn boot_msr_entries into a trait method
This allows dispatching to either KVM or MSHV automatically.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-08 16:49:58 +01:00
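Editor's sketch for 84bbaf06d1: making `boot_msr_entries` a trait method lets callers go through a trait object and get the backend-specific list without naming KVM or MSHV. The trait, the struct names and the MSR lists below are cut-down placeholders, not the contents of the hypervisor crate.

```rust
/// One model-specific register and the value it should hold at boot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MsrEntry {
    index: u32,
    data: u64,
}

/// Cut-down, hypothetical stand-in for the trait that owns the method.
trait Vcpu {
    fn boot_msr_entries(&self) -> Vec<MsrEntry>;
}

struct KvmVcpu;
struct MshvVcpu;

impl Vcpu for KvmVcpu {
    fn boot_msr_entries(&self) -> Vec<MsrEntry> {
        // Placeholder entries for illustration only.
        vec![
            MsrEntry { index: 0x174, data: 0 },
            MsrEntry { index: 0x175, data: 0 },
        ]
    }
}

impl Vcpu for MshvVcpu {
    fn boot_msr_entries(&self) -> Vec<MsrEntry> {
        // Placeholder entry for illustration only.
        vec![MsrEntry { index: 0x174, data: 0 }]
    }
}

/// The caller is backend-agnostic: dispatch happens through the trait object.
fn msr_count(vcpu: &dyn Vcpu) -> usize {
    vcpu.boot_msr_entries().len()
}
```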
Rob Bradford
121729a3b0 vmm: Split signal handling for VM and VMM signals
The VM specific signal (currently only SIGWINCH) should only be handled
when the VM is running.

The generic VMM signals (SIGINT and SIGTERM) need handling at all times.

Split the signal handling into two separate threads which have differing
lifetimes.

Tested by:
1.) Boot full VM and check resize handling (SIGWINCH) works & sending
    SIGTERM leads to cleanup (tested that API socket is removed.)
2.) Start without a VM and send SIGTERM/SIGINT and observe cleanup (API
    socket removed)
3.) Boot full VM, delete VM and observe 2.) holds.
4.) Boot full VM, delete VM, recreate VM and observe 1.) holds.

Fixes: #4269

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-08 15:15:46 +01:00
Rob Bradford
93237f0106 vmm: Set MADT "Online Capable" flag
The Linux kernel now checks for this before marking CPUs as
hotpluggable:

commit aa06e20f1be628186f0c2dcec09ea0009eb69778
Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Wed Sep 8 16:41:46 2021 -0500

    x86/ACPI: Don't add CPUs that are not online capable

    A number of systems are showing "hotplug capable" CPUs when they
    are not really hotpluggable.  This is because the MADT has extra
    CPU entries to support different CPUs that may be inserted into
    the socket with different numbers of cores.

    Starting with ACPI 6.3 the spec has an Online Capable bit in the
    MADT used to determine whether or not a CPU is hotplug capable
    when the enabled bit is not set.

    Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html?#local-apic-flags
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-01 18:45:05 +01:00
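Editor's sketch for 93237f0106, assuming the Local APIC flags layout from ACPI 6.3 and later (bit 0 is Enabled, bit 1 is Online Capable, and Online Capable is only meaningful when Enabled is clear). The constant and function names are hypothetical, and the real MADT code may structure this differently.

```rust
/// Local APIC flags, per ACPI 6.3+: bit 0 = Enabled, bit 1 = Online Capable.
const MADT_CPU_ENABLED: u32 = 1 << 0;
const MADT_CPU_ONLINE_CAPABLE: u32 = 1 << 1;

/// CPUs present at boot are marked Enabled; the remaining slots up to the
/// maximum are marked Online Capable so the guest treats them as hotpluggable.
fn local_apic_flags(cpu_id: u8, boot_vcpus: u8) -> u32 {
    if cpu_id < boot_vcpus {
        MADT_CPU_ENABLED
    } else {
        MADT_CPU_ONLINE_CAPABLE
    }
}
```

With `boot_vcpus = 2` and four possible CPUs, ids 0 and 1 get flags `0x1` and ids 2 and 3 get flags `0x2`.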
Rob Bradford
adf5881757 build: #[allow(clippy::significant_drop_in_scrutinee)] in some crates
This check is new in the beta version of clippy and exists to avoid
potential deadlocks by highlighting when the test in an if or for loop
is something that holds a lock. In many cases we would need to make
significant refactorings to be able to pass this check so disable in the
affected crates.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
b57d7b258d build: Fix beta clippy issue (needless_return)
warning: unneeded `return` statement
   --> pci/src/vfio_user.rs:627:13
    |
627 | /             return Err(std::io::Error::new(
628 | |                 std::io::ErrorKind::Other,
629 | |                 format!("Region not found for 0x{:x}", gpa),
630 | |             ));
    | |_______________^
    |
    = note: `#[warn(clippy::needless_return)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_return
help: remove `return`
    |
627 ~             Err(std::io::Error::new(
628 +                 std::io::ErrorKind::Other,
629 +                 format!("Region not found for 0x{:x}", gpa),
630 +             ))
    |

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
2716bc3311 build: Fix beta clippy issue (derive_partial_eq_without_eq)
warning: you are deriving `PartialEq` and can implement `Eq`
  --> vmm/src/serial_manager.rs:59:30
   |
59 | #[derive(Debug, Clone, Copy, PartialEq)]
   |                              ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
2e664dca64 vmm: Always reset the console mode on VMM exit
Tested:

1. SIGTERM based
2. VM shutdown/poweroff
3. Injected VM boot failure after calling Vm::setup_tty()

Fixes: #4248

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-28 16:45:27 +01:00
Rob Bradford
65ec6631fb vmm: cpu: Store the vCPU snapshots in ascending order
The snapshots are stored in a BTree, which is ordered. However, as the ids
are strings, lexical ordering places "11" ahead of "2". So encode the
vCPU id with zero padding so it is lexically sorted.

This fixes issues with CPU restore on aarch64.

See: #4239

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-27 16:20:57 +01:00
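Editor's sketch for 65ec6631fb: string keys in a `BTreeMap` sort lexically, so zero padding is needed for the iteration order to match the numeric vCPU order. The helper names and the padding width of three digits are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: encode a vCPU id so lexical order equals numeric order.
fn snapshot_key(vcpu_id: u8) -> String {
    format!("{:03}", vcpu_id)
}

/// Insert the ids into a BTreeMap keyed by string and return the keys
/// in iteration order, with or without zero padding.
fn ordered_keys(ids: &[u8], padded: bool) -> Vec<String> {
    let mut map = BTreeMap::new();
    for &id in ids {
        let key = if padded { snapshot_key(id) } else { id.to_string() };
        map.insert(key, id);
    }
    map.keys().cloned().collect()
}
```

For the ids 1, 2 and 11, the unpadded keys iterate as "1", "11", "2", while the padded keys iterate as "001", "002", "011".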
Wei Liu
bccd7c7e48 vmm: drop Sync+Send bounds for EndpointHandler
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-06-20 23:28:57 +01:00
Wei Liu
8fa1098629 vmm: switch from lazy_static to once_cell
once_cell does not require using a macro and is slated to become part of
Rust std at some point.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-06-20 16:03:07 +01:00
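Editor's sketch for 8fa1098629: macro-free lazy initialisation of a static. The commit itself uses the once_cell crate; to keep this example free of external dependencies it uses `std::sync::OnceLock` (stable since Rust 1.70), the standard-library counterpart of that design. The table contents and the function name are made up.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Lazily initialised, process-wide table; built on first access only.
fn endpoint_ids() -> &'static HashMap<&'static str, u32> {
    static TABLE: OnceLock<HashMap<&'static str, u32>> = OnceLock::new();
    TABLE.get_or_init(|| {
        let mut table = HashMap::new();
        table.insert("endpoint-a", 1);
        table.insert("endpoint-b", 2);
        table
    })
}
```

Every call returns a reference to the same map, and no `lazy_static!` block is needed to declare it.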
Sebastien Boeuf
335a4e1cc0 vmm: api: Expose kvm_hyperv parameter in OpenAPI description
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-17 15:11:53 +01:00
Sebastien Boeuf
81ba70a497 pci, vmm: Defer mapping VFIO MMIO regions on restore
When restoring a VM, the restore codepath will take care of mapping the
MMIO regions based on the information from the snapshot, rather than
having the mapping being performed during device creation.

When the device is created, information such as which BARs contain the
MSI-X tables is missing, which prevents the MMIO regions from being
mapped.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
7df7061610 pci, vmm: Add migratable support to vfio-user devices
Based on recent changes to VfioUserPciDevice, the vfio-user devices can
now be migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
c021dda267 pci, vmm: Add migratable support to VFIO devices
Based on recent changes to VfioPciDevice, the VFIO devices can now be
migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Rob Bradford
94fb9f817d vmm: Fix clippy issues under "guest_debug" feature
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-08 11:40:56 +01:00
Michael Zhao
a7a15d56dd aarch64: Move setup_regs to hypervisor
`setup_regs` of AArch64 calls KVM specific code. Now move it to
`hypervisor` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 11:07:46 +01:00
Sebastien Boeuf
65dc1c83a9 vmm: cpu: Save and restore CPU states during snapshot/restore
Based on recent KVM host patches (merged in Linux 5.16), it's forbidden
to call into KVM_SET_CPUID2 after the first successful KVM_RUN returned.
That means saving CPU states during the pause sequence, and restoring
these states during the resume sequence will not work with the current
design starting with kernel version 5.16.

In order to solve this problem, let's simply move the save/restore logic
to the snapshot/restore sequences rather than the pause/resume ones.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-06 11:07:29 +01:00
Sebastien Boeuf
3edaa8adb6 vmm: Ensure restore matches boot sequence
The vCPU is created and set after all the devices on a VM's boot.
There's no reason to follow a different order on the restore codepath as
this could cause some unexpected behaviors.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-06 11:07:17 +01:00
Michael Zhao
9260c3816e vmm: Update unit test for GIC refactoring
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
5d45d6d0fb vmm: Move GIC unit test to hypervisor crate
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
957d3a7443 aarch64: Simplify GIC related structs definition
Combined the `GicDevice` struct in `arch` crate and the `Gic` struct in
`devices` crate.

After moving the KVM specific code for GIC in `arch`, a very thin wrapper
layer `GicDevice` was left in `arch` crate. It is easy to combine it
with the `Gic` in `devices` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
04949755c0 arch: Switch to new GIC interface
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Rob Bradford
ade3a9c8f6 virtio-devices, vmm: Optimised async virtio device activation
In order to ensure that the virtio device thread is spawned from the vmm
thread we use an asynchronous activation mechanism for the virtio
devices. This change optimises that code so that we do not need to
iterate through all virtio devices on the platform in order to find the
one that requires activation. We solve this by creating a separate short
lived VirtioPciDeviceActivator that holds the required state for the
activation (e.g. the clones of the queues) this can then be stored onto
the device manager ready for asynchronous activation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-01 09:42:02 +02:00
Yi Wang
dbeb922882 doc: add vm coredump support
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
8b585b96c1 vmm: enable coredump
Based on the newly added guest_debug feature, this patch adds http
endpoint support.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
ccb604e1e1 vmm: add cpu segment note for coredump
The crash tool uses a special note segment named 'QEMU' to
analyze KASLR info and so on. If we don't add the 'QEMU' note
segment, the crash tool can't find the Linux version and cannot proceed.

For now, the most convenient way is to add 'QEMU' note segment to
make crash tool happy.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
2022-05-30 13:41:40 +02:00
Yi Wang
0e65ca4a6c vmm: save guest memory for coredump
Guest memory is needed for analysis in crash tool, so save it
for coredump.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
7e280b6f70 vmm: save elf header for coredump
The guest's vmcore file is in ELF format, so the first step of coredump
is to save the ELF header.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
2022-05-30 13:41:40 +02:00
Yi Wang
90034fd6ba vmm: add GuestDebuggable trait
It's useful to dump the guest, which is called a coredump, so that the crash
tool can be used to analyze it when the guest hangs.

Let's first add the GuestDebuggable trait and the Coredumpxxx errors to
support coredump.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Rob Bradford
465db7f08c vmm: config: Remove mergeable option from PmemConfig
Fixes: #3968

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:48:49 +02:00
Rob Bradford
55c5961f43 vmm: config: Remove dax & cache_size options from FsConfig
Fixes: #3889

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Rob Bradford
7c3582b4a8 vmm: config: Fix error message regarding use of cache size without dax
The error message incorrectly said that the user was trying to combine
cache_size without dax whereas it is only usable with dax.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Rob Bradford
979797786d vmm: Remove DAX cache setup for virtio-fs devices
Remove the code from the DeviceManager that prepares the DAX cache since
the functionality has now been removed.

Fixes: #3889

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Michael Zhao
0fd6521759 aarch64: Avoid depending on layout in GIC code
Removing the dependency on `layout` helps moving GIC code into
`hypervisor` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-05-27 10:57:50 +08:00
Michael Zhao
3fe20cc09a aarch64: Remove GicDevice trait
`GicDevice` trait was defined for the common part of GicV3 and ITS.
Now that the standalone GicV3 does not exist, `GicDevice` is not needed.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-05-27 10:57:50 +08:00
Rob Bradford
fa07d83565 Revert "virtio-devices, vmm: Optimised async virtio device activation"
This reverts commit f160572f9d.

There has been increased flakiness around the live migration tests since
this was merged. Speculatively reverting to see if there is increased
stability.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-21 21:27:33 +01:00
Rob Bradford
f160572f9d virtio-devices, vmm: Optimised async virtio device activation
In order to ensure that the virtio device thread is spawned from the vmm
thread we use an asynchronous activation mechanism for the virtio
devices. This change optimises that code so that we do not need to
iterate through all virtio devices on the platform in order to find the
one that requires activation. We solve this by creating a separate short
lived VirtioPciDeviceActivator that holds the required state for the
activation (e.g. the clones of the queues) this can then be stored onto
the device manager ready for asynchronous activation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-20 17:07:13 +01:00
Sebastien Boeuf
49db713124 virtio-devices, vmm: Remove unused macro rules
Latest cargo beta version raises warnings about unused macro rules.
Simply remove them to fix the beta build.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-20 09:59:43 +01:00
Maksym Pavlenko
3a0429c998 cargo: Clean up serde dependencies
There is no need to include serde_derive separately,
as it can be specified as serde feature instead.

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-05-18 08:21:19 +02:00
Rob Bradford
16a9882153 vmm: cpu: tdx: Don't use fd suffix for something not an FD
The hypervisor::Vcpu is the abstraction over the fd.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
218be2642e hypervisor: Explicitly pub use at the hypervisor crate top-level
Explicitly re-export types from the hypervisor specific modules. This
makes it much clearer what the common functionality that is exposed is.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
cd0df05808 vmm, arch: CpuId is x86_64 specific so import from the x86_64 module
It will be removed as a top-level export from the hypervisor crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
d3f66f8702 hypervisor: Make vm module private
And thus only export what is necessary through a `pub use`. This is
consistent with some of the other modules and makes it easier to
understand what the external interface of the hypervisor crate is.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
b1bd87df19 vmm: Simplify MsiInterruptManager generics
By taking advantage of the fact that IrqRoutingEntry is exported by the
hypervisor crate (that is typedef'ed to the hypervisor specific version)
then the code for handling the MsiInterruptManager can be simplified.

This is particularly useful if in the future it is not a typedef but
rather a wrapper type.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-11 11:19:14 +01:00
Rob Bradford
3f9e8d676a hypervisor: Move creation of irq routing struct to hypervisor crate
This removes the requirement to leak as many datastructures from the
hypervisor crate into the vmm crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-11 11:19:14 +01:00
Rob Bradford
c2c813599d vmm: Don't use kvm_ioctls directly
The IoEventAddress is re-exported through the crate at the top-level.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-10 15:57:43 +01:00
Rob Bradford
387d56879b vmm, hypervisor: Clean up nomenclature around offloading VM operations
The trait and functionality are about operations on the VM rather than
the VMM so should be named appropriately. This clashed with the
existing struct for the concrete implementation, which was renamed
appropriately.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-10 13:10:01 +01:00
Sebastien Boeuf
5f722d0d3f vmm: Fix loading RAW firmware
Whenever going through the codepath of loading a RAW firmware, we always
add an extra RAM region to the guest memory through the memory manager.
But we must be careful to use the updated guest memory rather than a
previous reference that wasn't containing the new region, as this can
lead to the following error:

VmBoot(FirmwareLoad(InvalidGuestAddress(GuestAddress(4290772992))))

This is fixed by the current patch, getting the latest reference onto
the guest memory from the memory manager right after the new region has
been added.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-06 18:13:28 +02:00
Bo Chen
42c19e14c5 vmm: Add 'shutdown()' to vCPU seccomp filter
This is required when hot-removing a vfio-user device. Detailed code path
below:

Thread 6 "vcpu0" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7f8196889700 (LWP 2358305)]
0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
78	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
  0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
  0x000056189240737d in std::sys::unix::net::Socket::shutdown ()
    at library/std/src/sys/unix/net.rs:383
  std::os::unix::net::stream::UnixStream::shutdown () at library/std/src/os/unix/net/stream.rs:479
  0x000056189210e23d in vfio_user::Client::shutdown (self=0x7f8190014300)
    at vfio_user/src/lib.rs:787
  0x00005618920b9d02 in <pci::vfio_user::VfioUserPciDevice as core::ops::drop::Drop>::drop (
    self=0x7f819002d7c0) at pci/src/vfio_user.rs:551
  0x00005618920b8787 in core::ptr::drop_in_place<pci::vfio_user::VfioUserPciDevice> ()
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920b92e3 in core::ptr::drop_in_place<core::cell::UnsafeCell<dyn pci::device::PciDevice>>
    () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920b9362 in core::ptr::drop_in_place<std::sync::mutex::Mutex<dyn pci::device::PciDevice>> () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920d8a3e in alloc::sync::Arc<T>::drop_slow (self=0x7f81968852b8)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1092
  0x00005618920ba273 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7f81968852b8)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1688
 0x00005618920b76fb in core::ptr::drop_in_place<alloc::sync::Arc<std::sync::mutex::Mutex<dyn pci::device::PciDevice>>> ()
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
 0x0000561891b5e47d in vmm::device_manager::DeviceManager::eject_device (self=0x7f8190009600,
    pci_segment_id=0, device_id=3) at vmm/src/device_manager.rs:4000
 0x0000561891b674bc in <vmm::device_manager::DeviceManager as vm_device::bus::BusDevice>::write (
    self=0x7f8190009600, base=70368744108032, offset=8, data=&[u8](size=4) = {...})
    at vmm/src/device_manager.rs:4625
 0x00005618921927d5 in vm_device::bus::Bus::write (self=0x7f8190006e00, addr=70368744108040,
    data=&[u8](size=4) = {...}) at vm-device/src/bus.rs:235
 0x0000561891b72e10 in <vmm::vm::VmOps as hypervisor::vm::VmmOps>::mmio_write (
    self=0x7f81900097b0, gpa=70368744108040, data=&[u8](size=4) = {...}) at vmm/src/vm.rs:378
 0x0000561892133ae2 in <hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run (
    self=0x7f8190013c90) at hypervisor/src/kvm/mod.rs:1114
 0x0000561891914e85 in vmm::cpu::Vcpu::run (self=0x7f819001b230) at vmm/src/cpu.rs:348
 0x000056189189f2cb in vmm::cpu::CpuManager::start_vcpu::{{closure}}::{{closure}} ()
    at vmm/src/cpu.rs:953

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-05 15:33:26 -07:00
Sebastien Boeuf
058a61148c vmm: Factorize net creation
Since both Net and vhost_user::Net implement the Migratable trait, we
can factorize the common part to simplify the code related to the net
creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
Sebastien Boeuf
425902b296 vmm: Factorize disk creation
Since both Block and vhost_user::Blk implement the Migratable trait, we
can factorize the common part to simplify the code related to the disk
creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
Sebastien Boeuf
54f39aa8cb vmm: Validate vhost-user-block/net are not configured with iommu=on
Extend the validate() function for both DiskConfig and NetConfig so that
we return an error if a vhost-user-block or vhost-user-net device is
expected to be placed behind the virtual IOMMU. Since these devices
don't support this feature, we can't allow iommu to be set to true in
these cases.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
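Editor's sketch for 54f39aa8cb: a `validate()` check that rejects a vhost-user device placed behind the virtual IOMMU. The struct is a cut-down, hypothetical stand-in with only the two relevant fields, and the error type is invented for the example.

```rust
/// Cut-down, hypothetical stand-in for a disk or net configuration.
struct DeviceConfig {
    vhost_user: bool,
    iommu: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum ValidationError {
    /// vhost-user devices cannot sit behind the virtual IOMMU.
    IommuNotSupportedWithVhostUser,
}

impl DeviceConfig {
    fn validate(&self) -> Result<(), ValidationError> {
        if self.vhost_user && self.iommu {
            return Err(ValidationError::IommuNotSupportedWithVhostUser);
        }
        Ok(())
    }
}
```

Rejecting the combination at validation time reports the problem before any device is created.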
Rob Bradford
707cea2182 vmm, devices: Move logging of 0x80 timestamp to its own device
This is a cleaner approach to handling the I/O port write to 0x80.
While doing this, also generate the timestamp at the start of the VM
creation. For consistency use the same timestamp for the ARM equivalent.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 23:02:53 +01:00
Rob Bradford
c47e3b8689 gdb: Do not use VmmOps for memory manipulation
We don't use the VmmOps trait directly for manipulating memory in the
core of the VMM as it's really designed for the MSHV crate to handle
instruction decoding. As I plan to make this trait MSHV specific to
allow reduced locking for MMIO and PIO handling when running on KVM this
use should be removed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 11:33:02 -07:00
Bo Chen
7fe399598d vmm: device_manager: Map MMIO regions to the guest correctly
To correctly map MMIO regions to the guest, we will need to wait for valid
MMIO region information which is generated from 'PciDevice::allocate_bars()'
(as a part of 'DeviceManager::add_pci_device()').

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-04 13:53:47 +02:00
Rob Bradford
1dfe4eda5c vmm: Prevent "internal" identifiers being used by user
For devices that cannot be named by the user use the "__" prefix to
identify them as internal devices. Check that any identifiers provided
in the config do not clash with those internal names. This prevents the
user from creating a disk such as "__serial" which would then cause a
failure in an unpredictable manner.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 12:34:11 +02:00
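Editor's sketch for 1dfe4eda5c, assuming the check amounts to rejecting user-supplied identifiers that start with the reserved "__" prefix. The function name and the error text are hypothetical.

```rust
/// Prefix reserved for identifiers the VMM assigns itself (per the commit message).
const INTERNAL_PREFIX: &str = "__";

/// Reject user-provided identifiers that could clash with internal device names.
fn validate_user_id(id: &str) -> Result<(), String> {
    if id.starts_with(INTERNAL_PREFIX) {
        return Err(format!(
            "identifier '{}' uses the reserved '{}' prefix",
            id, INTERNAL_PREFIX
        ));
    }
    Ok(())
}
```

A disk named "__serial" is refused up front, while an ordinary name such as "test-disk" passes.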
Sebastien Boeuf
6e101f479c vmm: Ensure hotplugged device identifier is unique
Whenever a device (virtio, vfio, vfio-user or vdpa) is hotplugged, we
must verify the provided identifier is unique, otherwise we must return
an error.

Particularly, this will prevent issues with identifiers for serial,
console, IOAPIC, balloon, rng, watchdog, iommu and gpio since all of
these are hardcoded by the VMM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-03 18:34:24 +01:00
Rob Bradford
6d4862245d vmm: Generate event when device is removed
The new event contains the BDF and the device id:

{
  "timestamp": {
    "secs": 2,
    "nanos": 731073396
  },
  "source": "vm",
  "event": "device-removed",
  "properties": {
    "bdf": "0000:00:02.0",
    "id": "test-disk"
  }
}

Fixes: #4038

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-03 17:10:36 +02:00
Sebastien Boeuf
a5a2e591c9 vmm: Remove FsConfig from VmConfig when unplugging fs device
All hotpluggable devices were properly removed from the VmConfig when a
remove-device command was issued, except for the "fs" type. Fix this
lack of support as it is causing the integration tests to fail with the
recent addition of verifying that identifiers are unique.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
Sebastien Boeuf
677c8831af vmm: Ensure uniqueness of generated identifiers
The device identifiers generated from the DeviceManager were not
guaranteed to be unique since they were not taking the list of
identifiers provided through the configuration.

By returning the list of unique identifiers from the configuration, and
by providing it to the DeviceManager, the generation of new identifiers
can rely both on the DeviceTree and the list of IDs from the
configuration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
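Editor's sketch for 677c8831af: generating a device name that collides neither with identifiers from the configuration nor with devices already created. The function name, the naming scheme (`prefix` followed by a counter) and the use of `BTreeSet` are assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Return the first "{prefix}{n}" name that is used neither by the
/// configuration nor by already-created devices.
fn next_device_name(
    prefix: &str,
    config_ids: &BTreeSet<String>,
    tree_ids: &BTreeSet<String>,
) -> String {
    let mut counter = 0usize;
    loop {
        let candidate = format!("{}{}", prefix, counter);
        if !config_ids.contains(&candidate) && !tree_ids.contains(&candidate) {
            return candidate;
        }
        counter += 1;
    }
}
```

Checking both sets matters: a name that is free in the device tree may still be claimed by a user-provided identifier that has not been instantiated yet.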
Sebastien Boeuf
634c53ea50 vmm: config: Validate provided identifiers are unique
A valid configuration means we can only accept unique identifiers from
the user.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
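Editor's sketch for 634c53ea50: rejecting a configuration in which the same identifier appears twice. `check_unique` is a hypothetical helper and the error is a plain string for brevity.

```rust
use std::collections::BTreeSet;

/// Fail on the first identifier that has already been seen.
fn check_unique<'a>(ids: impl IntoIterator<Item = &'a str>) -> Result<(), String> {
    let mut seen = BTreeSet::new();
    for id in ids {
        // `insert` returns false when the value was already present.
        if !seen.insert(id) {
            return Err(format!("identifier '{}' is not unique", id));
        }
    }
    Ok(())
}
```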
LiHui
ec0c1b01c4 vmm: api: Do not delete the API socket on API server creation
The socket will be safely deleted on shutdown and so it is not necessary to
delete the API socket when starting the HTTP server.

Fixes: #4026

Signed-off-by: LiHui <andrewli@kubesphere.io>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 18:40:49 +01:00
Rob Bradford
f17aa3755f vmm: Add clarifying comment about Vm::entry_point()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
744a049007 vmm: Parallelise functionality with kernel loading
Move functionality earlier in the boot so as to run in parallel with the
loading of the kernel.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
e70bd069b3 vmm: Load kernel asynchronously
Start loading the kernel as early as possible in the VM in a separate thread.
Whilst it is loading other work can be carried out such as initialising
the devices.

The biggest performance improvement is seen with a more complex set of
devices. If using e.g. four virtio-net devices then the time to start the
kernel improves by 20-30ms. With the simplest configuration the
improvement was of the order of 2-3ms.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
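Editor's sketch for e70bd069b3: overlapping kernel loading with device initialisation by running the load in its own thread and joining it only when the entry point is needed. The functions are placeholders: the sleep stands in for the load, and the entry address and device names are made up.

```rust
use std::thread;
use std::time::Duration;

/// Placeholder for kernel loading; returns a made-up entry address.
fn load_kernel() -> u64 {
    thread::sleep(Duration::from_millis(10));
    0x10_0000
}

/// Placeholder for device initialisation.
fn init_devices(count: usize) -> Vec<String> {
    (0..count).map(|i| format!("virtio-net{}", i)).collect()
}

fn boot() -> (u64, Vec<String>) {
    // Start loading the kernel in a separate thread.
    let loader = thread::spawn(load_kernel);
    // Meanwhile, carry out other work on the calling thread.
    let devices = init_devices(4);
    // Block on the load only once its result is actually required.
    let entry = loader.join().expect("kernel loading thread panicked");
    (entry, devices)
}
```

The more work there is to overlap (for example several devices to set up), the more of the load time is hidden, which is consistent with the larger gain the commit reports for complex configurations.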
Rob Bradford
bfeb3120f5 vmm: Refactor kernel loading to decouple from Vm struct
This will allow the kernel to be loaded from another thread.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
ce6d88d187 vmm: Merge aarch64 use statements
These were in their own block and not organised lexically.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
56fe4c61af vmm: Duplicate Vm::entry_point() across architectures
These will have very different implementations when asynchronously
loading the kernel.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
1d1a087fc5 vmm: Refactor kernel command line generation
This allows the same code for generating the kernel command line to be
used on both aarch64 and x86_64 when the latter starts loading the
kernel asynchronously.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
f1276c58d2 vmm: Commandline inject from devices is aarch64 specific
This is not required for x86_64 and maintains a tight coupling between
kernel loading and the DeviceManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
da33eb5e8c vmm: device_manager: Remove extra whitespace lines
These originated from the removal of the acpi feature gate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Fabiano Fidêncio
fdeb4f7c46 Revert "vmm, openapi: Token Bucket fields should be uint64"
This reverts commit 87eed369cd.

The reason we're reverting this is that OpenAPI Specification[0] doesn't
know how to deal with unsigned types. :-/

Right now the best thing to do is keep it as is, as an int64, and try to fix
OpenAPI, or even switch to swagger, as the latter knows how to properly
deal with those.  However, switching to swagger is far from being a 1:1
transition and will require time to experiment, thus reverting this for
now seems the best approach.

[0]: https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#data-types

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 09:26:38 +02:00
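Editor's sketch for fdeb4f7c46: why exposing a `u64` field as `int64` is lossy at the top of the range. The helper name is hypothetical; it only shows that values above `i64::MAX` have no signed 64-bit representation.

```rust
use std::convert::TryFrom;

/// Convert a u64 field to the range an `int64` schema type can carry,
/// or None when the value does not fit.
fn to_int64(value: u64) -> Option<i64> {
    i64::try_from(value).ok()
}
```

Values up to `i64::MAX` convert unchanged, while anything larger (up to `u64::MAX`) cannot be expressed through an `int64` field.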
Fabiano Fidêncio
87eed369cd vmm, openapi: Token Bucket fields should be uint64
The Token Bucket fields are, on the Cloud Hypervisor side, u64.
However, we expose those as int64 in the OpenAPI YAML file.

With that in mind, let's adjust the yaml file to expose those as uint64.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 13:16:02 +02:00
Rob Bradford
79f4c2db01 vmm: Enable virtio-iommu in VmConfig::validate()
This means that the automatic enabling of the virtio-iommu will also be
applied to VMs created via the API as well as the CLI.

Fixes: #4016

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-26 12:27:00 +01:00
Rob Bradford
bf9f79081a vmm: Only create ACPI memory manager DSDT when resizable
With ACPI based hotplug, memory can only be added. So if the hotplug
RAM size is the same as the boot RAM size, do not include the memory
manager DSDT entries.

Also: this change simplifies the code marginally by making the
HotplugMethod enum Copy.

This was identified from the following perf output:

     1.78%     0.00%  vmm              cloud-hypervisor      [.] <vmm::memory_manager::MemorySlots as acpi_tables::aml::Aml>::append_aml_bytes
            |
            ---<vmm::memory_manager::MemorySlots as acpi_tables::aml::Aml>::append_aml_bytes
               <vmm::memory_manager::MemorySlot as acpi_tables::aml::Aml>::append_aml_bytes
               acpi_tables::aml::Name::new
               <acpi_tables::aml::Path as acpi_tables::aml::Aml>::append_aml_bytes
               __libc_malloc

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-26 13:07:19 +02:00
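
The condition described in this commit can be sketched as follows. This
is an illustration under stated assumptions, not the code from the
commit: the function name needs_memory_manager_dsdt, its parameters and
the variant names are hypothetical. Only the HotplugMethod enum being
Copy and the "same size means not resizable" rule come from the message.

```rust
// Variant names are assumed for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum HotplugMethod {
    Acpi,
    VirtioMem,
}

/// Hypothetical check: the ACPI memory manager entries are only worth
/// generating when ACPI hotplug is in use and there is room to grow.
/// Because the enum is Copy, it can be passed by value.
fn needs_memory_manager_dsdt(method: HotplugMethod, boot_ram: u64, hotplug_ram: u64) -> bool {
    // ACPI hotplug can only add memory, so an equal size means nothing
    // can ever be plugged and the entries would be dead weight.
    method == HotplugMethod::Acpi && hotplug_ram > boot_ram
}
```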
Rob Bradford
62f17ccf8c vmm: Improve error handling for vmm::vm::Error
In particular, implement thiserror::Error, clean up wording and remove
unused errors.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
cb03540ffd vmm: config: Derive thiserror::Error
No further changes are necessary than adding a #[derive(Error)] as there
is a manual implementation of Display.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
0270d697ab vmm: cpu: Improve Error reporting
Remove unused enum members, improve error messages and implement
thiserror::Error.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
47529796d0 arch: Improve arch::Error
Remove unused error enum entries, improve wording and derive
thiserror::Error.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
1c786610b7 vmm: api: Don't use clashing struct name for Error
Import vmm::Error as VmmError so that thiserror::Error can be used
without clashing names.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Sebastien Boeuf
eb6daa2fc3 pci: Store MSI interrupt manager in VfioCommon
Extend VfioCommon structure to own the MSI interrupt manager. This will
be useful for implementing the restore code path.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-22 16:16:48 +02:00
Rob Bradford
adb3dcdc13 vmm: openapi: Add serial_number to PlatformConfig
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
e972eb7c74 arch, vmm: Expose platform serial_number via SMBIOS
Fixes: #4002

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
203dfdc156 vmm: config: Add "serial_number" option to "--platform"
This carries a string that is exposed via DMI/SMBIOS and is particularly
useful for cloud-init initialisation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
4a04d1f8f2 vmm: seccomp: Allow SYS_rseq as required by newer glibc
glibc 2.35 as shipped by Fedora 36 now uses the rseq syscall.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 13:02:51 +01:00
Rob Bradford
4ca066f077 vmm: api: Simplify error reporting from HTTP to internal API calls
Use a single enum member for representing errors from the internal API.
This avoids the ugly duplication of the API call name in the error
message:

e.g.

$ target/debug/ch-remote --api-socket /tmp/api resize --cpus 2
Error running command: Server responded with an error: InternalServerError: VmResize(VmResize(CpuManager(DesiredVCpuCountExceedsMax)))

Becomes:

$ target/debug/ch-remote --api-socket /tmp/api resize --cpus 2
Error running command: Server responded with an error: InternalServerError: ApiError(VmResize(CpuManager(DesiredVCpuCountExceedsMax)))

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-20 19:39:05 +01:00
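
A small sketch of why the message shrinks, using simplified stand-in
types. The enum names below are hypothetical and only mirror the
variants quoted in the output above. With a single wrapper variant for
internal API errors, the derived Debug output names the API call once.

```rust
// Hypothetical, simplified error types; not the real vmm definitions.
#[derive(Debug)]
enum CpuManagerError {
    DesiredVCpuCountExceedsMax,
}

#[derive(Debug)]
enum VmError {
    CpuManager(CpuManagerError),
}

#[derive(Debug)]
enum ApiError {
    VmResize(VmError),
}

// One variant wraps every internal API error, instead of one HTTP-level
// variant per API call (which produced VmResize(VmResize(...))).
#[derive(Debug)]
enum HttpError {
    ApiError(ApiError),
}

fn resize_failure() -> HttpError {
    HttpError::ApiError(ApiError::VmResize(VmError::CpuManager(
        CpuManagerError::DesiredVCpuCountExceedsMax,
    )))
}
```

Formatting resize_failure() with {:?} gives the nested form shown in
the "Becomes" example.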
Sebastien Boeuf
11e9f43305 vmm: Use new Resource type PciBar
Instead of defining some very generic resources as PioAddressRange or
MmioAddressRange for each PCI BAR, let's move to the new Resource type
PciBar in order to make things clearer. This makes the code more
readable and also removes the need for hard assumptions about the
MMIO and PIO ranges. PioAddressRange and MmioAddressRange types can be
used to describe everything except PCI BARs. BARs are very special as
they can be relocated and have special information we want to carry
along with them.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
89218b6d1e pci: Replace BAR tuple with PciBarConfiguration
In order to make the code more consistent and easier to read, we remove
the former tuple that was used to describe a BAR, replacing it with the
existing structure PciBarConfiguration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
1795afadb8 vmm: Factorize algorithm finding HOB memory resources
By factorizing the algorithm untangling TDVF sections from guest RAM
into a dedicated function, we can write some unit tests to validate it
properly achieves what we expect.

Add the "tdx" feature to the unit tests, otherwise this code would not
get tested.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 15:23:12 +02:00
Sebastien Boeuf
5264d545dd pci, vmm: Extend PciDevice trait to support BAR relocation
By adding a new method id() to the PciDevice trait, we allow the caller
to retrieve a unique identifier. This is used in the context of BAR
relocation to identify the device being relocated, so that we can update
the DeviceTree resources for all PCI devices (and not only
VirtioPciDevice).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
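
The idea of looking a device up by id() during BAR relocation can be
sketched like this. Everything except the id() method name is an
assumption made for illustration: the Option<String> return type, the
DummyDevice type, the relocate_bar function and the HashMap standing in
for the DeviceTree are all hypothetical.

```rust
use std::collections::HashMap;

// Reduced stand-in for the PciDevice trait; only id() is shown.
trait PciDevice {
    fn id(&self) -> Option<String>;
}

// Hypothetical device used to exercise the sketch.
struct DummyDevice {
    name: String,
}

impl PciDevice for DummyDevice {
    fn id(&self) -> Option<String> {
        Some(self.name.clone())
    }
}

/// Hypothetical device tree: device id -> BAR base addresses.
/// Returns true if a BAR at `old_base` was found and moved to `new_base`.
fn relocate_bar(
    tree: &mut HashMap<String, Vec<u64>>,
    device: &dyn PciDevice,
    old_base: u64,
    new_base: u64,
) -> bool {
    // Devices without an identifier cannot be located in the tree.
    let id = match device.id() {
        Some(id) => id,
        None => return false,
    };
    match tree.get_mut(&id) {
        Some(bars) => match bars.iter_mut().find(|b| **b == old_base) {
            Some(bar) => {
                *bar = new_base;
                true
            }
            None => false,
        },
        None => false,
    }
}
```

Because the lookup goes through the trait, the same path works for any
device type that implements it, not only one concrete struct.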
Sebastien Boeuf
0c34846ef6 vmm: Return new PCI resources from add_pci_device()
By returning the new PCI resources from add_pci_device(), we allow the
factorization of the code translating the BARs into resources. This
allows VIRTIO, VFIO and vfio-user to add the resources to the DeviceTree
node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
4f172ae4b6 vmm: Retrieve PCI resources for VFIO and vfio-user devices
Relying on the function introduced recently to get the PCI resources and
handle the restore case, both VFIO and vfio-user device creation paths
now have access to PCI resources, which can be provided to the function
add_pci_device().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
0f12fe9b3b vmm: Factorize retrieval of PCI resources
Create a dedicated function for getting the PCI segment, b/d/f and
optional resources. This is meant for handling the potential case of a
restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00