Commit Graph

1959 Commits

Author SHA1 Message Date
Rui Chang
2b457584e0 vmm: add add-user-device support in cloud-hypervisor.yaml
The change is missed when add "add-user-device" support in
53b2e19934, use this commit to fix it.

Signed-off-by: Rui Chang <rui.chang@arm.com>
2023-11-21 09:13:22 +00:00
Thomas Barrett
45b01d592a vmm: assign each pci segment 32-bit mmio allocator
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-11-20 15:33:50 -08:00
Yi Wang
a69d8c63b3 vmm: speed up JSON load when reading snap files
We found that it's slow to load JSON when reading snap files. As
described in [1], using from_slice instead of from_reader can fix
this.

Also, fix the error type being returned.

1. https://github.com/serde-rs/json/issues/160

Signed-off-by: Yi Wang <foxywang@tencent.com>
2023-11-16 14:56:04 -08:00
Bo Chen
d4892f41b3 misc: Stop using deprecated functions from vm-memory crate
See: https://github.com/rust-vmm/vm-memory/pull/247

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-11-14 09:17:42 +00:00
Yong He
bb38e4e599 vmm: Allow simultaneously set serial and console as TTY mode
Cloud Hypovrisor supports legacy serial device and virito console device
for VMs. Using legacy serial device, CH can capture full VM console logs,
but its implementation is based on KVM PIO emulation and has poor
performance. Using the virtio console device, the VM console logs will
be sent to CH through the virtio ring, the performance is better, but CH
will only capture the VM console logs after the virtio console device is
initialized, the VM early startup logs will be discarded.

This patch provides a way to enable both the legacy serial device and the
virtio console device as a TTY mode by setting the leagcy serial port as
the VM's early printk device and setting the virtio console as the VM's
main console device.

Then CH can capture early boot logs from the legacy serial device and
capture later logs from the virito console device with better performance.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-11-02 11:06:30 -07:00
Bo Chen
d2f71cebca virtio-devices, vmm: Update seccomp list
The seccompiler v0.4.0 started to use `seccomp` syscall instead of the
`prctl` syscall. Also, threads for virtio-deivces should not need any of
these syscalls anyway.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-10-31 15:34:17 +00:00
Thomas Barrett
bae13c5c56 block: add aio disk backend
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-10-25 10:19:23 -07:00
Muminul Islam
afe798fc19 vmm: Fix clippy warnings
This patch fixes following warnings:

error: boolean to int conversion using if
   --> vmm/src/vm.rs:866:42
|
|                       .create_vm_with_type(if sev_snp_enabled.into() {
    |  __________________________________________^
| |                         1 // SEV_SNP_ENABLED
| |                     } else {
| |                         0 // SEV_SNP_DISABLED
| |                     })
| |_____________________^ help: replace with from: `u64::from(sev_snp_enabled.into())`
|
  = note: `-D clippy::bool-to-int-with-if` implied by `-D warnings`
  = note: `sev_snp_enabled.into() as u64` or `sev_snp_enabled.into().into()` can also be valid options
  = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#bool_to_int_with_if

error: useless conversion to the same type: `bool`
   --> vmm/src/vm.rs:866:45
|
|                     .create_vm_with_type(if sev_snp_enabled.into() {
|                                             ^^^^^^^^^^^^^^^^^^^^^^ help: consider removing `.into()`: `sev_snp_enabled`
|
  = note: `-D clippy::useless-conversion` implied by `-D warnings`
  = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_conversion

error: could not compile `vmm` due to 2 previous errors

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2023-10-23 16:03:16 -07:00
Bo Chen
43a6eda400 vmm: Add help information for "--numa pci_segments="
See: #5844

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-10-20 11:44:28 -07:00
Wei Liu
7bc3452139 main: switch command parsing to use clap
Partially revert 111225a2a5
and add the new dbus and pvpanic arguments.

As we are switching back to clap observe the following changes.

A few examples:

1. `-v -v -v` needs to be written as`-vvv`
2. `--disk D1 --disk D2` and others need to be written as `--disk D1 D2`.
3. `--option value` needs to be written as `--option=value.`

Change integration tests to adapt to the breaking changes.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
2023-10-20 11:44:28 -07:00
Thomas Barrett
3029fbeafd vmm: Allow assignment of PCI segments to NUMA node
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-10-18 11:18:15 -07:00
Bo Chen
0b4c153d4d arch, vmm: Clear AMX CPUID bits if the feature is not enabled
Fixes: #5833

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-10-18 11:13:12 -07:00
Bo Chen
7dd260f82f arch, vmm: Add new struct CpuidConfig
This struct contains all configuration fields that controls the way how
we generate CPUID for the guest on x86_64. This allows cleaner extension
when adding new configuration fields.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-10-18 11:13:12 -07:00
Bo Chen
aa6e83126c vmm: tdx: Fix a deadlock while accessing vm_config
The lock to `vm_config` is held for accessing `cpus.kvm_hyperv` passing
as a reference to `arch::generate_common_cpuid()`, so acquiring the same
lock again while calling to the same function is a deadlock.

Fixes: 3793ffe888

Reported-by: Yi Wang <foxywang@tencent.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-10-17 19:41:33 +01:00
Jinank Jain
1b59ab3d7b vmm, hypervisor: Initialize SEV-SNP VM
As part of this initialization for a SEV-SNP VM on MSHV, it is required
that we transition the guest state to secure state using partition
hypercall. This implies all the created VPs will transition to secure
state and could access the guest encrypted memory.

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2023-10-17 17:45:28 +01:00
Anatol Belski
311fc05417 cpu: Store hypervisor object directly instead of separate props
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2023-10-17 18:43:22 +02:00
Anatol Belski
b52966a12c cpu: Implement AMD compatible topology handling
cpu: Pass APIC id explicitly where needed
topology: Set subleaf number explicitly

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2023-10-17 18:43:22 +02:00
Praveen K Paladugu
044f3f758e serial_manager: Remove serial socket
Remove the backend socket of serial port while shutting down guest.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2023-10-05 15:26:29 +01:00
Praveen K Paladugu
6d1077fc3c vmm: Unix socket backend for serial port
Cloud-Hypervisor takes a path for Unix socket, where it will listen
on. Users can connect to the other end of the socket and access serial
port on the guest.

    "--serial socket=/path/to/socket" is the cmdline option to pass to
cloud-hypervisor.

Users can use socat like below to access guest's serial port once the
guest starts to boot:

    socat -,crnl UNIX-CONNECT:/path/to/socket

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2023-10-05 15:26:29 +01:00
Bo Chen
ff651e0e28 vmm: Report enabled features from the '/vmm.ping' endpoint
Fixes: #5817

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-10-05 08:40:50 +01:00
Bo Chen
9abb12fd71 vmm: Return the right error from Vcpu::snapshot()
Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-09-27 08:51:40 +01:00
Rob Bradford
44f200d67d hypervisor: Set destination vCPU TSC frequency to source
Include the TSC frequency as part of the KVM state so that it will be
restored at the destination.

This ensures migration works correctly between hosts that have a
different TSC frequency if the guest is running with TSC as the source
of timekeeping.

Fixes: #5786

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-09-20 09:13:42 -07:00
Thomas Barrett
c4e8e653ac block: Add support for user specified ID_SERIAL
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-09-11 12:50:41 +01:00
Julian Stecklina
0d9749282a vmm: simplify EntryPoint
EntryPoint had an optional entry_addr, but there is no usage of this
struct that makes it necessary that the address is optional.

Remove the Option to avoid being able to express things that are not
useful.

Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
2023-09-09 10:46:51 +01:00
Philipp Schuster
7bf0cc1ed5 misc: Fix various spelling errors using typos
This fixes all typos found by the typos utility with respect to the config file.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2023-09-09 10:46:21 +01:00
Rob Bradford
4548de194d build: Bump acpi_tables version
Fix newly added deprecation for mispelling of cacheable.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-09-07 13:58:33 +01:00
Jinank Jain
200cba0e20 vmm: Refactor VM creation workflow
This refactoring is required to add support for creating SEV-SNP enabled
VM.

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2023-09-07 12:52:27 +01:00
Jinank Jain
5fd79571b7 vmm: Add a feature flag for SEV-SNP support
This feature flag gates the development for SEV-SNP enabled guest.

Also add a helper function to identify if SNP should be enabled for the
guest.

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2023-09-07 12:52:27 +01:00
Omer Faruk Bayram
2ed96cd3ed vmm: dbus: broadcast event_monitor events over the DBus API
This commit builds on top of the `Monitor::subscribe` function and
makes it possible to broadcast events published from `event-monitor`
over D-Bus.

The broadcasting functionality is enabled if the D-Bus API is enabled
and users who wish to also enable the file based `event-monitor` can do
so with the CLI arg `--event-monitor`.

Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2023-08-28 17:01:03 -07:00
Omer Faruk Bayram
e02efe9ba0 event_monitor: make it possible to subscribe to Monitor
Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2023-08-28 17:01:03 -07:00
Rob Bradford
9d5c5a6410 vmm: sigwinch_listener: Remove unncessary mut from reference
warning: this argument is a mutable reference, but not used mutably
   --> vmm/src/sigwinch_listener.rs:121:38
    |
121 | fn set_foreground_process_group(tty: &mut File) -> io::Result<()> {
    |                                      ^^^^^^^^^ help: consider changing to: `&File`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_ref_mut

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-22 12:01:54 +01:00
Rob Bradford
8d072fef15 vmm: device_manager: Remove unnecessary mut from reference
warning: this argument is a mutable reference, but not used mutably
    --> vmm/src/device_manager.rs:1908:35
     |
1908 |     fn set_raw_mode(&mut self, f: &mut dyn AsRawFd) -> vmm_sys_util::errno::Result<()> {
     |                                   ^^^^^^^^^^^^^^^^ help: consider changing to: `&dyn AsRawFd`
     |
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_ref_mut
     = note: `#[warn(clippy::needless_pass_by_ref_mut)]` on by default

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-22 12:01:54 +01:00
Rob Bradford
0bead9ebe1 vmm: cpu: Fix slow vector initialization
warning: slow zero-filling initialization
    --> vmm/src/cpu.rs:1780:9
     |
1779 |         let mut mat_data: Vec<u8> = Vec::new();
     |                                     ---------- help: consider replacing this with: `vec![0; std::mem::size_of_val(&lapic)]`
1780 |         mat_data.resize(std::mem::size_of_val(&lapic), 0);
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     |
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#slow_vector_initialization
     = note: `#[warn(clippy::slow_vector_initialization)]` on by default

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-22 12:01:54 +01:00
Yi Wang
d46dd4b31f vmm: cpu: Add pending removed vcpu check to avoid resize vcpu hang
Add pending removed vcpu check according to VcpuState.removing, which
can avoid cloud hypervisor hangup during continual vcpu resize.

Fix #5419

Signed-off-by: Yi Wang <foxywang@tencent.com>
2023-08-20 10:40:43 +01:00
Omer Faruk Bayram
a0c8bf4f9f vmm: seccomp: implement seccomp filtering for the event-monitor thread
Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2023-08-09 17:22:25 +01:00
Omer Faruk Bayram
02e1c54426 event_monitor: refactor the implementation to support concurrent access
This patch modifies `event_monitor` to ensure that concurrent access to
`event_log` from multiple threads is safe. Previously, the `event_log`
function would acquire a reference to the event log file and write
to it without doing any synchronization, which made it prone to
data races. This issue likely went under the radar because the
relevant `SAFETY` comment on the unsafe block was incomplete.

The new implementation spawns a dedicated thread named `event-monitor`
solely for writing to the file. It uses the MPMC channel exposed by
`flume` to pass messages to the `event-monitor` thread. Since
`flume::Sender<T>` implements `Sync`, it is safe for multiple threads
to share it and send messages to the `event-monitor` thread.
This is not possible with `std::sync::mpsc::Sender<T>` since it's
`!Sync`, meaning it is not safe for it to be shared between different
threads.

The `event_monitor::set_monitor` function now only initializes
the required global state and returns an instance of the
`Monitor` struct. This decouples the actual logging logic from the
`event_monitor` crate. The `event-monitor` thread is then spawned by
the `vmm` crate.

Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2023-08-09 17:22:25 +01:00
Rob Bradford
a00d29867c fuzz, vmm: Avoid infinite loop in CMOS fuzzer
With the addition of the spinning waiting for the exit event to be
received in the CMOS device a regression was introduced into the CMOS
fuzzer. Since there is nothing to receive the event in the fuzzer and
there is nothing to update the bit the that the device is looping on;
introducing an infinite loop.

Use an Option<> type so that when running the device in the fuzzer no
Arc<AtomicBool> is provided effectively disabling the spinning logic.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=61165

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-07 08:04:55 +08:00
Rob Bradford
06dc708515 vmm: Only return from reset driven I/O once event received
The reset system is asynchronous with an I/O event (PIO or MMIO) for
ACPI/i8042/CMOS triggering a write to the reset_evt event handler. The
VMM thread will pick up this event on the VMM main loop and then trigger
a shutdown in the CpuManager. However since there is some delay between
the CPU threads being marked to be killed (through the
CpuManager::cpus_kill_signalled bool) it is possible for the guest vCPU
that triggered the exit to be re-entered when the vCPU KVM_RUN is called
after the I/O exit is completed.

This is undesirable and in particular the Linux kernel will attempt to
jump to real mode after a CMOS based exit - this is unsupported in
nested KVM on AMD on Azure and will trigger an error in KVM_RUN.

Solve this problem by spinning in the device that has triggered the
reset until the vcpus_kill_signalled boolean has been updated
indicating that the VMM thread has received the event and called
CpuManager::shutdown(). In particular if this bool is set then the vCPU
threads will not re-enter the guest.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-04 09:57:25 +08:00
Yong He
0149e65081 vm-device: support batch update interrupt source group GSI
Split interrupt source group restore into two steps, first restore
the irqfd for each interrupt source entry, and second restore the
GSI routing of the entire interrupt source group.

This patch will reduce restore latency of interrupt source group,
and in a 200-concurrent restore test, the patch reduced the
average IOAPIC restore time from 15ms to 1ms.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-08-03 15:58:36 +01:00
Yi Wang
3225c0c7c8 vmm: Automatically pause VM for coredump
If the VMM is not already paused then pause the VM prior to executing
the coredump and then resume it after. If the VM is already paused then
the original state is maintained.

Signed-off-by: Yi Wang <foxywang@tencent.com>
2023-07-31 17:05:46 +01:00
Praveen K Paladugu
dd09a0a890 vmm: Extend mshv ioctls in seccomp filters
Add MSHV_CREATE_DEVICE, MSHV_SET_DEVICE_ATTR ioctls to filters. These
ioctls are required to passthrough PCI devices on mshv.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2023-07-26 10:14:43 -07:00
Yi Wang
92cf2816b0 vmm: Ensure vcpu pause is synchronised
The pause of vcpu is async now, which makes the vm pause is not
synchronised. As virtio device calls paused_sync wait() to make
sure device_manager pause synchronously, if we make vcpu pause
synchronously, the vm pause can be synchronously then. After
vm.pause() returns the vm is really paused now.

This patch adds a AtomicBool variable to mark vcpu paused state,
to make sure the pause of CpuManager is synchronised.

Signed-off-by: Yi Wang <foxywang@tencent.com>
2023-07-25 09:48:24 -07:00
Yu Li
447cad3861 block: merge qcow, vhdx and block_util into block crate
This commit merges crates `qcow`, `vhdx` and `block_util` into the
crate `block`, which can allow `qcow` to use functions from `block_util`
without introducing a circular crate dependency.

This commit is based on crosvm implementation:
f2eecc4152

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-19 13:52:43 +01:00
Rob Bradford
0039f45276 vmm: Use LocalX2Apic in MADT/MAT
Using this over the LocalApic supports APIC IDs (and hence number of
vCPUs) above 256 (size increases from u8 to u32.)

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-07-18 09:35:13 -07:00
Rob Bradford
b792d4751d vmm: Encode ACPI name for CPUs using hexadecimal
This increases the number of CPUs that are supported.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-07-17 09:57:51 -07:00
Alyssa Ross
69bd0036d9 vmm: support removing devices before VM is booted
If the VM has been configured but not yet booted, all we need to do to
support removing a device is to remove it from the config, so it will
never be created.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-07-14 10:01:23 -07:00
Alyssa Ross
f346687e3d vmm: default GDB to false when deserializing
This fixes the valid VM config unit tests, which would otherwise fail
to deserialize their expected JSON config due to the missing "gdb" field.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-07-14 09:34:39 -07:00
Yu Li
63226e2b80 build: Fix beta clippy issue (arc_with_non_send_sync)
warning: usage of `Arc<T>` where `T` is not `Send` or `Sync`
   --> virtio-devices/src/vsock/device.rs:376:22
    |
376 |             backend: Arc::new(RwLock::new(backend)),
    |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: consider using `Rc<T>` instead or wrapping `T` in a std::sync type like `Mutex<T>`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#arc_with_non_send_sync
    = note: `#[warn(clippy::arc_with_non_send_sync)]` on by default

The vsock backend may be shared between threads, so the type `B` in
`Vsock` should be `VsockBackend` and `Sync`.

Considering that `api_receiver` and `gdb_receiver` are only used in vmm
threads, the `Arc` can be replaced by `Rc`.

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-13 08:16:30 -07:00
Yu Li
d0dbc7fb4d build: Fix beta clippy issue (useless_vec)
warning: useless use of `vec!`
   --> test_infra/src/lib.rs:111:30
    |
111 |             let mut events = vec![epoll::Event::new(epoll::Events::empty(), 0); 1];
    |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: you can use an array directly: `[epoll::Event::new(epoll::Events::empty(), 0); 1]`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_vec
    = note: `#[warn(clippy::useless_vec)]` on by default

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-13 08:16:30 -07:00
Manish Goregaokar
6fdba7ca11 build: Allow disabling io_uring
This gives users the chance to reduce the number of dependencies
included, which is generally good practice and also reduces code size.

Furthermore, `io_uring` specifically is a strong contender for something
one may wish to disable due to the syscall API's many security issues[1]

 [1]: https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html

Signed-off-by: Manish Goregaokar <manishsmail@gmail.com>
2023-07-11 06:19:30 -07:00