Commit Graph

677 Commits

Author SHA1 Message Date
Thomas Barrett
b750c332aa vmm: add NVIDIA GPUDirect P2P support
On platforms where PCIe P2P is supported, inject a PCI capability into
NVIDIA GPU to indicate support.

Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2024-02-29 09:26:29 +00:00
acarp
035c4b20fb block: Set an option to pin virtio block threads to host cpus
Currently the only way to set the affinity for virtio block threads is
to boot the VM, search for the tid of each of the virtio block threads,
then set the affinity manually. This commit adds an option to pin virtio
block queues to specific host cpus (similar to pinning vcpus to host
cpus). A queue_affinity option has been added to the disk flag in
the cli to specify a mapping of queue indices to host cpus.

Signed-off-by: acarp <acarp@crusoeenergy.com>
2024-02-13 09:05:57 +00:00
Rob Bradford
e70bf59809 vmm: Directly clone console resize pipe
Beta clippy fix:

warning: this call to `as_ref.map(...)` does nothing
    --> vmm/src/device_manager.rs🔢9
     |
1234 |         self.console_resize_pipe.as_ref().map(Arc::clone)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `self.console_resize_pipe.clone()`
     |
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_asref
     = note: `#[warn(clippy::useless_asref)]` on by default

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2024-02-07 09:25:40 +00:00
Philipp Schuster
e50a641126 devices: add debug-console device
This commit adds the debug-console (or debugcon) device to CHV. It is a
very simple device on I/O port 0xe9 supported by QEMU and BOCHS. It is
meant for printing information as easy as possible, without any
necessary configuration from the guest at all.

It is primarily interesting to OS/kernel and firmware developers as they
can produce output as soon as the guest starts without any configuration
of a serial device or similar. Furthermore, a kernel hacker might use
this device for information of type B whereas information of type A are
printed to the serial device.

This device is not used by default by Linux, Windows, or any other
"real" OS, but only by toy kernels and during firmware development.

In the CLI, it can be configured similar to --console or --serial with
the --debug-console parameter.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2024-01-25 10:25:14 -08:00
Thomas Barrett
c297d8d796 vmm: use RateLimiterGroup for virtio-blk devices
Add a 'rate_limit_groups' field to VmConfig that defines a set of
named RateLimiterGroups.

When the 'rate_limit_group' field of DiskConfig is defined, all
virtio-blk queues will be rate-limited by a shared RateLimiterGroup.
The lifecycle of all RateLimiterGroups is tied to the Vm.
A RateLimiterGroup may exist even if no Disks are configured to use
the RateLimiterGroup. Disks may be hot-added or hot-removed from the
RateLimiterGroup.

When the 'rate_limiter' field of DiskConfig is defined, we construct
an anonymous RateLimiterGroup whose lifecycle is tied to the Disk.
This is primarily done for api backwards compatability. Importantly,
the behavior is not the same! This implementation rate_limits the
aggregate bandwidth / iops of an individual disk rather than the
bandwidth / iops of an individual queue of a disk.

When neither the 'rate_limit_group' or the 'rate_limiter' fields of
DiskConfig is defined, the Disk is not rate-limited.

Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2024-01-03 10:21:06 -08:00
Markus Sütter
0e9513f2b7 vmm: Allow IP configuration on named TAP interfaces
This commit changes existing behavior of named TAP interfaces.
When booting a VM with configuration for a named TAP interface,
cloud-hypervisor will create the interface and apply a given
IP configuration to that interface. If the named interface
already exists on the system, the configuration is NOT overwritten.

Setting the ip and netmask fields in a tap interface configuration
for a named tap interface now works by handing this configuration
to the virtio_devices::Net object when it is created with a name.

This commit also touches net_util to make sure that the ip configuration
of existing TAP interfaces is not modified with ip or netmask handed to
open_tap.

Signed-off-by: Markus Sütter <markus.suetter@secunet.com>
2023-12-05 08:59:04 -08:00
Thomas Barrett
45b01d592a vmm: assign each pci segment 32-bit mmio allocator
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-11-20 15:33:50 -08:00
Thomas Barrett
bae13c5c56 block: add aio disk backend
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-10-25 10:19:23 -07:00
Thomas Barrett
3029fbeafd vmm: Allow assignment of PCI segments to NUMA node
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-10-18 11:18:15 -07:00
Praveen K Paladugu
6d1077fc3c vmm: Unix socket backend for serial port
Cloud-Hypervisor takes a path for Unix socket, where it will listen
on. Users can connect to the other end of the socket and access serial
port on the guest.

    "--serial socket=/path/to/socket" is the cmdline option to pass to
cloud-hypervisor.

Users can use socat like below to access guest's serial port once the
guest starts to boot:

    socat -,crnl UNIX-CONNECT:/path/to/socket

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2023-10-05 15:26:29 +01:00
Thomas Barrett
c4e8e653ac block: Add support for user specified ID_SERIAL
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-09-11 12:50:41 +01:00
Philipp Schuster
7bf0cc1ed5 misc: Fix various spelling errors using typos
This fixes all typos found by the typos utility with respect to the config file.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2023-09-09 10:46:21 +01:00
Rob Bradford
4548de194d build: Bump acpi_tables version
Fix newly added deprecation for mispelling of cacheable.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-09-07 13:58:33 +01:00
Rob Bradford
8d072fef15 vmm: device_manager: Remove unnecessary mut from reference
warning: this argument is a mutable reference, but not used mutably
    --> vmm/src/device_manager.rs:1908:35
     |
1908 |     fn set_raw_mode(&mut self, f: &mut dyn AsRawFd) -> vmm_sys_util::errno::Result<()> {
     |                                   ^^^^^^^^^^^^^^^^ help: consider changing to: `&dyn AsRawFd`
     |
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_ref_mut
     = note: `#[warn(clippy::needless_pass_by_ref_mut)]` on by default

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-22 12:01:54 +01:00
Rob Bradford
a00d29867c fuzz, vmm: Avoid infinite loop in CMOS fuzzer
With the addition of the spinning waiting for the exit event to be
received in the CMOS device a regression was introduced into the CMOS
fuzzer. Since there is nothing to receive the event in the fuzzer and
there is nothing to update the bit the that the device is looping on;
introducing an infinite loop.

Use an Option<> type so that when running the device in the fuzzer no
Arc<AtomicBool> is provided effectively disabling the spinning logic.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=61165

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-07 08:04:55 +08:00
Rob Bradford
06dc708515 vmm: Only return from reset driven I/O once event received
The reset system is asynchronous with an I/O event (PIO or MMIO) for
ACPI/i8042/CMOS triggering a write to the reset_evt event handler. The
VMM thread will pick up this event on the VMM main loop and then trigger
a shutdown in the CpuManager. However since there is some delay between
the CPU threads being marked to be killed (through the
CpuManager::cpus_kill_signalled bool) it is possible for the guest vCPU
that triggered the exit to be re-entered when the vCPU KVM_RUN is called
after the I/O exit is completed.

This is undesirable and in particular the Linux kernel will attempt to
jump to real mode after a CMOS based exit - this is unsupported in
nested KVM on AMD on Azure and will trigger an error in KVM_RUN.

Solve this problem by spinning in the device that has triggered the
reset until the vcpus_kill_signalled boolean has been updated
indicating that the VMM thread has received the event and called
CpuManager::shutdown(). In particular if this bool is set then the vCPU
threads will not re-enter the guest.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-04 09:57:25 +08:00
Yu Li
447cad3861 block: merge qcow, vhdx and block_util into block crate
This commit merges crates `qcow`, `vhdx` and `block_util` into the
crate `block`, which can allow `qcow` to use functions from `block_util`
without introducing a circular crate dependency.

This commit is based on crosvm implementation:
f2eecc4152

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-19 13:52:43 +01:00
Manish Goregaokar
6fdba7ca11 build: Allow disabling io_uring
This gives users the chance to reduce the number of dependencies
included, which is generally good practice and also reduces code size.

Furthermore, `io_uring` specifically is a strong contender for something
one may wish to disable due to the syscall API's many security issues[1]

 [1]: https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html

Signed-off-by: Manish Goregaokar <manishsmail@gmail.com>
2023-07-11 06:19:30 -07:00
Yi Wang
d99c0c0d1d devices: pvpanic: add method for DeviceManager
Add method for DeviceManager to invoke.

Signed-off-by: Yi Wang <foxywang@tencent.com>
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-07-06 11:14:54 +01:00
Rob Bradford
f485922b78 build: Bump acpi_tables from cb5f06c to 05a6091
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-06-08 17:28:02 +00:00
Wei Liu
ba1e89139a pci: aml: support up to 256 PCI segments
Originally the AML only accepted one hex number for PCI segment
numbering. Change it to accept two numbers. That makes it possible to
add up to 256 PCI segments.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2023-05-02 09:34:05 +01:00
Alyssa Ross
c90a0ffff6 vmm: reset to the original termios
Previously, we used two different functions for configuring ttys.
vmm_sys_util::terminal::Terminal::set_raw_mode() was used to configure
stdio ttys, and cfmakeraw() was used to configure ptys created by
cloud-hypervisor.  When I centralized the stdio tty cleanup, I also
switched to using cfmakeraw() everywhere, to avoid duplication.

cfmakeraw sets the OPOST flag, but when we later reset the ttys, we
used vmm_sys_util::terminal::Terminal::set_canon_mode(), which does
not unset this flag.  This meant that the terminal was getting mostly,
but not fully, reset.

To fix this without depending on the implementation of cfmakeraw(),
let's just store the original termios for stdio terminals, and restore
them to exactly the state we found them in when cloud-hypervisor exits.

Fixes: b6feae0a ("vmm: only touch the tty flags if it's being used")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-05-02 09:33:53 +01:00
Bo Chen
a9623c7a28 vmm: Add valid FDs for TAP devices to 'VmConfig::preserved_fds'
In this way, valid FDs for TAP devices will be closed when the holding
VmConfig instance is destroyed.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-04-17 16:33:29 +01:00
Alyssa Ross
b6feae0ace vmm: only touch the tty flags if it's being used
When neither serial nor console are connected to the tty,
cloud-hypervisor shouldn't touch the tty at all.  One way in which
this is annoying is that if I am running cloud-hypervisor without it
using my terminal, I expect to be able to suspend it with ^Z like any
other process, but that doesn't work if it's put the terminal into raw
mode.

Instead of putting the tty into raw mode when a VM is created or
restored, do it when a serial or console device is created.  Since we
now know it can't be put into raw mode until the Vm object is created,
we can move setting it back to canon mode into the drop handler for
that object, which should always be run in normal operation.  We still
also put the tty into canon mode in the SIGTERM / SIGINT handler, but
check whether the tty was actually used, rather than whether stdin is
a tty.  This requires passing on_tty around as an atomic boolean.

I explored more of an abstraction over the tty — having an object that
encapsulated stdout and put the tty into raw mode when initialized and
into canon mode when dropped — but it wasn't practical, mostly due to
the special requirements of the signal handler.  I also investigated
whether the SIGWINCH listener process could be used here, which I
think would have worked but I'm hesitant to involve it in serial
handling as well as conosle handling.

There's no longer a check for whether the file descriptor is a tty
before setting it into canon mode — it's redundant, because if it's
not a tty it just won't respond to the ioctl.

Tested by shutting down through the API, SIGTERM, and an error
injected after setting raw mode.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-17 16:33:17 +01:00
Alyssa Ross
38a1b45783 vmm: use the SIGWINCH listener for TTYs too
Previously, we were only using it for PTYs, because for PTYs there's
no alternative.  But since we have to have it for PTYs anyway, if we
also use it for TTYs, we can eliminate all of the code that handled
SIGWINCH for TTYs.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Alyssa Ross
e9841db486 vmm: don't ignore errors from SIGWINCH listener
Now that the SIGWINCH listener has fallbacks for older kernels, we
don't expect it to routinely fail, so if there's an error setting it
up, we want to know about it.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Alyssa Ross
505f4dfa53 vmm: close all unused fds in sigwinch listener
The PTY main file descriptor had to be introduced as a parameter to
start_sigwinch_listener, so that it could be closed in the child.
Really the SIGWINCH listener process should not have any file
descriptors open, except for the ones it needs to function, so let's
make it more robust by having it close all other file descriptors.

For recent kernels, we can do this very conveniently with
close_range(2), but for older kernels, we have to fall back to closing
open file descriptors one at a time.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Rob Bradford
73c4156775 vmm, devices: Update to latest acpi_tables crate API
Significant API changes have occured, most significantly is the switch
to an approach which does not require vm-memory and can run no_std.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-03-03 13:08:36 +00:00
Yong He
76d6d28f3e vmm: do not start signal thread to resize console if no need
Now cloud hypervisor will start signal thread to catch
SIGWINCH signal, cloud hypervisor then will resize the
guest console via vconsole.

This patch skip starting signal thread when there is no
need to resize guest console, such as console is not
configured.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-02-28 09:40:07 -08:00
Yong He
3494080e2f vmm: add configuration for network offloading features
Add new configuration for offloading features, including
Checksum/TSO/UFO, and set these offloading features as
enabled by default.

Fixes: #4792.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-01-12 09:05:45 +00:00
Rob Bradford
5e52729453 misc: Automatically fix cargo clippy issues added in 1.65 (stable)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-12-14 14:27:19 +00:00
Sebastien Boeuf
3931b99d4e vm-migration: Introduce new constructor for Snapshot
This simplifies the Snapshot creation as we expect a SnapshotData to be
provided most of the time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
4ae6b595d7 vm-migration: Rename add_data_section() into add_data()
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
748018ace3 vm-migration: Don't store the id as part of Snapshot structure
The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
4517b76a23 vm-migration: Rename SnapshotDataSection into SnapshotData
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
5b3bcfa233 vm-migration: Snapshot should have a unique SnapshotDataSection
There's no reason to carry a HashMap of SnapshotDataSection per
Snapshot. And given we now provide at most one SnapshotDataSection per
Snapshot, there's no need to keep the id part of the SnapshotDataSection
structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Rob Bradford
b3e3a5fdd7 vmm: Fix clippy on musl toolchains
The datatype used for the ioctl() C library call is different between it
and the glibc toolchains. The easiest solution is to have the compiler
type cast to type of the parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-12-07 17:50:48 +00:00
Michael Zhao
b173f6f654 vmm,devices: Change Gic snapshot and restore path
The snapshot and restore of AArch64 Gic was done in Vm. Now it is moved
to DeviceManager.

The benefit is that the restore can be done while the Gic is created in
DeviceManager.

While the moving of state data from Vm snapshot to DeviceManager
snapshot breaks the compatability of migration from older versions.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-12-01 17:07:25 +01:00
Sebastien Boeuf
a6959a7469 vmm: Move DeviceManager to new restore design
Based on all the work that has already been merged, it is now possible
to fully move DeviceManager out of the previous restore model, meaning
there's no need for a dedicated restore() function to be implemented
there.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 13:46:31 +01:00
Sebastien Boeuf
b62a40efae virtio-devices, vmm: Always restore virtio devices in paused state
Following the new restore design, it is not appropriate to set every
virtio device threads into a paused state after they've been started.

This is why we remove the line of code pausing the devices only after
they've been restored, and replace it with a small patch in every virtio
device implementation. When a virtio device is created as part of a
restored VM, the associated "paused" boolean is set to true. This
ensures the corresponding thread will be directly parked when being
started, avoiding the thread to be in a different state than the one it
was on the source VM during the snapshot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 09:27:00 +01:00
Sebastien Boeuf
90b5014a50 vmm: device_manager: Remove 'restoring' attribute
Given 'restoring' isn't needed anymore from the DeviceManager structure,
let's simplify it.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
cc3706afe1 pci, vmm: Move VfioPciDevice and VfioUserPciDevice to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
81862e8ed3 devices, vmm: Move Gpio to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
9fbf52b998 devices, vmm: Move Pl011 to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
0bd910e8b0 devices, vmm: Move Serial to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
ef92e55998 devices, vmm: Move Ioapic to new restore design
Moving the Ioapic object to the new restore design, meaning the Ioapic
is created directly with the right state, and it shares the same
codepath as when it's created from scratch.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:18:21 +01:00
Sebastien Boeuf
a50b3784fe virtio-devices: Create a proper result type for VirtioPciDevice
Creating a dedicated Result type for VirtioPciDevice, associated with
the new VirtioPciDeviceError enum. This allows for a clearer handling of
the errors generated through VirtioPciDevice::new().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
Sebastien Boeuf
eae8043890 pci, virtio-devices: Move VirtioPciDevice to the new restore design
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.

It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
Michael Zhao
7d16c74020 vmm: Refactor AArch64 GIC initialization process
In the new process, `device::Gic::new()` covers additional actions:
1. Creating `hypervisor::vGic`
2. Initializing interrupt routings

The change makes the vGic device ready in the beginning of
`DeviceManager::create_devices()`. This can unblock the GIC related
devices initialization in the `DeviceManager`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-11-23 11:49:57 +01:00
Sebastien Boeuf
ec01062ada vmm: Switch order between DeviceManager and CpuManager creation
The CpuManager is now created before the DeviceManager. This is required
as preliminary work for creating the vCPUs before the DeviceManager,
which is required to ensure both x86_64 and aarch64 follow the same
sequence.

It's important to note the optimization for faster PIO accesses on the
PCI config space had to be removed given the VmOps was required by the
CpuManager and by the Vcpu by extension. But given the PciConfigIo is
created as part of the DeviceManager, there was no proper way of moving
things around so that we could provide PciConfigIo early enough.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 11:49:57 +01:00