Commit Graph

663 Commits

Author SHA1 Message Date
Rob Bradford
a00d29867c fuzz, vmm: Avoid infinite loop in CMOS fuzzer
With the addition of the spinning waiting for the exit event to be
received in the CMOS device a regression was introduced into the CMOS
fuzzer. Since there is nothing to receive the event in the fuzzer and
there is nothing to update the bit the that the device is looping on;
introducing an infinite loop.

Use an Option<> type so that when running the device in the fuzzer no
Arc<AtomicBool> is provided effectively disabling the spinning logic.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=61165

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-07 08:04:55 +08:00
Rob Bradford
06dc708515 vmm: Only return from reset driven I/O once event received
The reset system is asynchronous with an I/O event (PIO or MMIO) for
ACPI/i8042/CMOS triggering a write to the reset_evt event handler. The
VMM thread will pick up this event on the VMM main loop and then trigger
a shutdown in the CpuManager. However since there is some delay between
the CPU threads being marked to be killed (through the
CpuManager::cpus_kill_signalled bool) it is possible for the guest vCPU
that triggered the exit to be re-entered when the vCPU KVM_RUN is called
after the I/O exit is completed.

This is undesirable and in particular the Linux kernel will attempt to
jump to real mode after a CMOS based exit - this is unsupported in
nested KVM on AMD on Azure and will trigger an error in KVM_RUN.

Solve this problem by spinning in the device that has triggered the
reset until the vcpus_kill_signalled boolean has been updated
indicating that the VMM thread has received the event and called
CpuManager::shutdown(). In particular if this bool is set then the vCPU
threads will not re-enter the guest.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-04 09:57:25 +08:00
Yu Li
447cad3861 block: merge qcow, vhdx and block_util into block crate
This commit merges crates `qcow`, `vhdx` and `block_util` into the
crate `block`, which can allow `qcow` to use functions from `block_util`
without introducing a circular crate dependency.

This commit is based on crosvm implementation:
f2eecc4152

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-19 13:52:43 +01:00
Manish Goregaokar
6fdba7ca11 build: Allow disabling io_uring
This gives users the chance to reduce the number of dependencies
included, which is generally good practice and also reduces code size.

Furthermore, `io_uring` specifically is a strong contender for something
one may wish to disable due to the syscall API's many security issues[1]

 [1]: https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html

Signed-off-by: Manish Goregaokar <manishsmail@gmail.com>
2023-07-11 06:19:30 -07:00
Yi Wang
d99c0c0d1d devices: pvpanic: add method for DeviceManager
Add method for DeviceManager to invoke.

Signed-off-by: Yi Wang <foxywang@tencent.com>
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-07-06 11:14:54 +01:00
Rob Bradford
f485922b78 build: Bump acpi_tables from cb5f06c to 05a6091
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-06-08 17:28:02 +00:00
Wei Liu
ba1e89139a pci: aml: support up to 256 PCI segments
Originally the AML only accepted one hex number for PCI segment
numbering. Change it to accept two numbers. That makes it possible to
add up to 256 PCI segments.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2023-05-02 09:34:05 +01:00
Alyssa Ross
c90a0ffff6 vmm: reset to the original termios
Previously, we used two different functions for configuring ttys.
vmm_sys_util::terminal::Terminal::set_raw_mode() was used to configure
stdio ttys, and cfmakeraw() was used to configure ptys created by
cloud-hypervisor.  When I centralized the stdio tty cleanup, I also
switched to using cfmakeraw() everywhere, to avoid duplication.

cfmakeraw sets the OPOST flag, but when we later reset the ttys, we
used vmm_sys_util::terminal::Terminal::set_canon_mode(), which does
not unset this flag.  This meant that the terminal was getting mostly,
but not fully, reset.

To fix this without depending on the implementation of cfmakeraw(),
let's just store the original termios for stdio terminals, and restore
them to exactly the state we found them in when cloud-hypervisor exits.

Fixes: b6feae0a ("vmm: only touch the tty flags if it's being used")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-05-02 09:33:53 +01:00
Bo Chen
a9623c7a28 vmm: Add valid FDs for TAP devices to 'VmConfig::preserved_fds'
In this way, valid FDs for TAP devices will be closed when the holding
VmConfig instance is destroyed.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-04-17 16:33:29 +01:00
Alyssa Ross
b6feae0ace vmm: only touch the tty flags if it's being used
When neither serial nor console are connected to the tty,
cloud-hypervisor shouldn't touch the tty at all.  One way in which
this is annoying is that if I am running cloud-hypervisor without it
using my terminal, I expect to be able to suspend it with ^Z like any
other process, but that doesn't work if it's put the terminal into raw
mode.

Instead of putting the tty into raw mode when a VM is created or
restored, do it when a serial or console device is created.  Since we
now know it can't be put into raw mode until the Vm object is created,
we can move setting it back to canon mode into the drop handler for
that object, which should always be run in normal operation.  We still
also put the tty into canon mode in the SIGTERM / SIGINT handler, but
check whether the tty was actually used, rather than whether stdin is
a tty.  This requires passing on_tty around as an atomic boolean.

I explored more of an abstraction over the tty — having an object that
encapsulated stdout and put the tty into raw mode when initialized and
into canon mode when dropped — but it wasn't practical, mostly due to
the special requirements of the signal handler.  I also investigated
whether the SIGWINCH listener process could be used here, which I
think would have worked but I'm hesitant to involve it in serial
handling as well as conosle handling.

There's no longer a check for whether the file descriptor is a tty
before setting it into canon mode — it's redundant, because if it's
not a tty it just won't respond to the ioctl.

Tested by shutting down through the API, SIGTERM, and an error
injected after setting raw mode.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-17 16:33:17 +01:00
Alyssa Ross
38a1b45783 vmm: use the SIGWINCH listener for TTYs too
Previously, we were only using it for PTYs, because for PTYs there's
no alternative.  But since we have to have it for PTYs anyway, if we
also use it for TTYs, we can eliminate all of the code that handled
SIGWINCH for TTYs.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Alyssa Ross
e9841db486 vmm: don't ignore errors from SIGWINCH listener
Now that the SIGWINCH listener has fallbacks for older kernels, we
don't expect it to routinely fail, so if there's an error setting it
up, we want to know about it.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Alyssa Ross
505f4dfa53 vmm: close all unused fds in sigwinch listener
The PTY main file descriptor had to be introduced as a parameter to
start_sigwinch_listener, so that it could be closed in the child.
Really the SIGWINCH listener process should not have any file
descriptors open, except for the ones it needs to function, so let's
make it more robust by having it close all other file descriptors.

For recent kernels, we can do this very conveniently with
close_range(2), but for older kernels, we have to fall back to closing
open file descriptors one at a time.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Rob Bradford
73c4156775 vmm, devices: Update to latest acpi_tables crate API
Significant API changes have occured, most significantly is the switch
to an approach which does not require vm-memory and can run no_std.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-03-03 13:08:36 +00:00
Yong He
76d6d28f3e vmm: do not start signal thread to resize console if no need
Now cloud hypervisor will start signal thread to catch
SIGWINCH signal, cloud hypervisor then will resize the
guest console via vconsole.

This patch skip starting signal thread when there is no
need to resize guest console, such as console is not
configured.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-02-28 09:40:07 -08:00
Yong He
3494080e2f vmm: add configuration for network offloading features
Add new configuration for offloading features, including
Checksum/TSO/UFO, and set these offloading features as
enabled by default.

Fixes: #4792.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-01-12 09:05:45 +00:00
Rob Bradford
5e52729453 misc: Automatically fix cargo clippy issues added in 1.65 (stable)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-12-14 14:27:19 +00:00
Sebastien Boeuf
3931b99d4e vm-migration: Introduce new constructor for Snapshot
This simplifies the Snapshot creation as we expect a SnapshotData to be
provided most of the time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
4ae6b595d7 vm-migration: Rename add_data_section() into add_data()
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
748018ace3 vm-migration: Don't store the id as part of Snapshot structure
The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
4517b76a23 vm-migration: Rename SnapshotDataSection into SnapshotData
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
5b3bcfa233 vm-migration: Snapshot should have a unique SnapshotDataSection
There's no reason to carry a HashMap of SnapshotDataSection per
Snapshot. And given we now provide at most one SnapshotDataSection per
Snapshot, there's no need to keep the id part of the SnapshotDataSection
structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Rob Bradford
b3e3a5fdd7 vmm: Fix clippy on musl toolchains
The datatype used for the ioctl() C library call is different between it
and the glibc toolchains. The easiest solution is to have the compiler
type cast to type of the parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-12-07 17:50:48 +00:00
Michael Zhao
b173f6f654 vmm,devices: Change Gic snapshot and restore path
The snapshot and restore of AArch64 Gic was done in Vm. Now it is moved
to DeviceManager.

The benefit is that the restore can be done while the Gic is created in
DeviceManager.

While the moving of state data from Vm snapshot to DeviceManager
snapshot breaks the compatability of migration from older versions.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-12-01 17:07:25 +01:00
Sebastien Boeuf
a6959a7469 vmm: Move DeviceManager to new restore design
Based on all the work that has already been merged, it is now possible
to fully move DeviceManager out of the previous restore model, meaning
there's no need for a dedicated restore() function to be implemented
there.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 13:46:31 +01:00
Sebastien Boeuf
b62a40efae virtio-devices, vmm: Always restore virtio devices in paused state
Following the new restore design, it is not appropriate to set every
virtio device threads into a paused state after they've been started.

This is why we remove the line of code pausing the devices only after
they've been restored, and replace it with a small patch in every virtio
device implementation. When a virtio device is created as part of a
restored VM, the associated "paused" boolean is set to true. This
ensures the corresponding thread will be directly parked when being
started, avoiding the thread to be in a different state than the one it
was on the source VM during the snapshot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 09:27:00 +01:00
Sebastien Boeuf
90b5014a50 vmm: device_manager: Remove 'restoring' attribute
Given 'restoring' isn't needed anymore from the DeviceManager structure,
let's simplify it.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
cc3706afe1 pci, vmm: Move VfioPciDevice and VfioUserPciDevice to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
81862e8ed3 devices, vmm: Move Gpio to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
9fbf52b998 devices, vmm: Move Pl011 to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
0bd910e8b0 devices, vmm: Move Serial to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
ef92e55998 devices, vmm: Move Ioapic to new restore design
Moving the Ioapic object to the new restore design, meaning the Ioapic
is created directly with the right state, and it shares the same
codepath as when it's created from scratch.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:18:21 +01:00
Sebastien Boeuf
a50b3784fe virtio-devices: Create a proper result type for VirtioPciDevice
Creating a dedicated Result type for VirtioPciDevice, associated with
the new VirtioPciDeviceError enum. This allows for a clearer handling of
the errors generated through VirtioPciDevice::new().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
Sebastien Boeuf
eae8043890 pci, virtio-devices: Move VirtioPciDevice to the new restore design
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.

It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
Michael Zhao
7d16c74020 vmm: Refactor AArch64 GIC initialization process
In the new process, `device::Gic::new()` covers additional actions:
1. Creating `hypervisor::vGic`
2. Initializing interrupt routings

The change makes the vGic device ready in the beginning of
`DeviceManager::create_devices()`. This can unblock the GIC related
devices initialization in the `DeviceManager`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-11-23 11:49:57 +01:00
Sebastien Boeuf
ec01062ada vmm: Switch order between DeviceManager and CpuManager creation
The CpuManager is now created before the DeviceManager. This is required
as preliminary work for creating the vCPUs before the DeviceManager,
which is required to ensure both x86_64 and aarch64 follow the same
sequence.

It's important to note the optimization for faster PIO accesses on the
PCI config space had to be removed given the VmOps was required by the
CpuManager and by the Vcpu by extension. But given the PciConfigIo is
created as part of the DeviceManager, there was no proper way of moving
things around so that we could provide PciConfigIo early enough.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 11:49:57 +01:00
Rob Bradford
e37ec26ccf vmm: Remove PCI PIO optimisation
This optimisation provided some peformance improvement when measured by
perf however when considered in terms of boot time peformance this
optimisation doesn't have any impact measurable using our
peformance-metrics tooling.

Removing this optimisation helps simplify the VMM internals as it allows
the reordering of the VM creation process permitting refactoring of the
restore code path.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-22 19:47:53 +00:00
Wei Liu
d05586f520 vmm: modify or provide safety comments
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-11-18 12:50:01 +00:00
Praveen K Paladugu
09e79a5e9b vmm: Add tpm device to mmio bus
Add tpm device to mmio bus if appropriate cmdline arguments were
passed.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2022-11-15 16:42:21 +00:00
Praveen K Paladugu
af261f231c vmm: Add required acpi entries for vtpm device
Add an TPM2 entry to DSDT ACPI table. Add a TPM2 table to guest's ACPI.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: Sean Yoo <t-seanyoo@microsoft.com>
2022-11-15 16:42:21 +00:00
Bo Chen
a9ec0f33c0 misc: Fix clippy issues
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-02 09:41:43 +01:00
Jinrong Liang
cb171d4a23 device_manager: Avoid checking io_uring support when it's not needed
After testing, io_uring_is_supported() causes about 38ms of
overhead when creating virtio-blk. By modifying the position
of io_uring_is_supported(), the overhead of creating virtio-blk
is reduced to less than 1ms when we close io_uring.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
2022-10-27 22:21:51 -07:00
Sebastien Boeuf
1f0e5eb66a vmm: virtio-devices: Restore every VirtioDevice upon creation
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.

Here is the list of devices now following the new restore design:

- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-24 14:17:08 +02:00
Sebastien Boeuf
099cdd2af8 virtio-devices, vmm: vdpa: Implement live migration support
Vdpa now implements the Migratable trait, which allows the device to be
added to the DeviceTree and therefore allows live migrating any vDPA
device that supports being suspended.

Given a vDPA device can't be resumed from a suspended state without
having to reset everything, we don't support pause/resume for a vDPA
device, as well as snapshot/restore (which requires resume to be
supported).

In order for the migration to work locally, reusing the same device on
the same host machine, the vhost-vdpa handler is dropped after the
snapshot has been performed, which allows the destination VM to open the
device without any conflict about the device being busy.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-13 10:03:23 +02:00
Rob Bradford
1bc63e7848 vmm: Remove legacy I/O ports for ACPI
These addresses have been superseded and replaced with other I/O ports.

Fixes: #4483

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-03 17:08:57 +01:00
Sebastien Boeuf
903c08f8a1 net: Don't override default TAP interface MTU
Adjust MTU logic such that:
1. Apply an MTU to the TAP interface if the user supplies it
2. Always query the TAP interface for the MTU and expose that.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-27 10:37:35 +01:00
Rob Bradford
b2d1dd65f3 build: Remove "fwdebug" and "common" feature flags
This simplifes the buld and checks with very little overhead and the
fwdebug device is I/O port device on 0x402 that can be used by edk2 as a
very simple character device.

See: #4679

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-26 10:16:33 -07:00
Rob Bradford
1202b9a07a vmm: Add some tracing of boot sequence
Add tracing of the VM boot sequence from the point at which the request
to create a VM is received to the hand-off to the vCPU threads running.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-22 18:09:31 +01:00
Sebastien Boeuf
76dbf85b79 net: Give the user the ability to set MTU
Add a new "mtu" parameter to the NetConfig structure and therefore to
the --net option. This allows Cloud Hypervisor's users to define the
Maximum Transmission Unit (MTU) they want to use for the network
interface that they create.

In details, there are two main aspects. On the one hand, the TAP
interface is created with the proper MTU if it is provided. And on the
other hand the guest is made aware of the MTU through the VIRTIO
configuration. That means the MTU is properly set on both the TAP on the
host and the network interface in the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-21 16:20:57 +02:00
Sebastien Boeuf
f38056fc9e virtio-devices, vmm: Simplify virtio-mem resize operation
There's no need to delegate the resize operation to the virtio-mem
thread. This can come directly from the vmm thread which will use the
Mem object to update the VIRTIO configuration and trigger the interrupt
for the guest to be notified.

In order to achieve what's described above, the VirtioMemZone structure
now has a handle onto the Mem object directly. This avoids the need for
intermediate Resize and ResizeSender structures.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-20 13:43:40 +02:00