Commit Graph

651 Commits

Author SHA1 Message Date
Alyssa Ross
505f4dfa53 vmm: close all unused fds in sigwinch listener
The PTY main file descriptor had to be introduced as a parameter to
start_sigwinch_listener, so that it could be closed in the child.
Really the SIGWINCH listener process should not have any file
descriptors open, except for the ones it needs to function, so let's
make it more robust by having it close all other file descriptors.

For recent kernels, we can do this very conveniently with
close_range(2), but for older kernels, we have to fall back to closing
open file descriptors one at a time.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Rob Bradford
73c4156775 vmm, devices: Update to latest acpi_tables crate API
Significant API changes have occured, most significantly is the switch
to an approach which does not require vm-memory and can run no_std.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-03-03 13:08:36 +00:00
Yong He
76d6d28f3e vmm: do not start signal thread to resize console if no need
Now cloud hypervisor will start signal thread to catch
SIGWINCH signal, cloud hypervisor then will resize the
guest console via vconsole.

This patch skip starting signal thread when there is no
need to resize guest console, such as console is not
configured.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-02-28 09:40:07 -08:00
Yong He
3494080e2f vmm: add configuration for network offloading features
Add new configuration for offloading features, including
Checksum/TSO/UFO, and set these offloading features as
enabled by default.

Fixes: #4792.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-01-12 09:05:45 +00:00
Rob Bradford
5e52729453 misc: Automatically fix cargo clippy issues added in 1.65 (stable)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-12-14 14:27:19 +00:00
Sebastien Boeuf
3931b99d4e vm-migration: Introduce new constructor for Snapshot
This simplifies the Snapshot creation as we expect a SnapshotData to be
provided most of the time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
4ae6b595d7 vm-migration: Rename add_data_section() into add_data()
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
748018ace3 vm-migration: Don't store the id as part of Snapshot structure
The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
4517b76a23 vm-migration: Rename SnapshotDataSection into SnapshotData
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Sebastien Boeuf
5b3bcfa233 vm-migration: Snapshot should have a unique SnapshotDataSection
There's no reason to carry a HashMap of SnapshotDataSection per
Snapshot. And given we now provide at most one SnapshotDataSection per
Snapshot, there's no need to keep the id part of the SnapshotDataSection
structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-09 10:26:06 +01:00
Rob Bradford
b3e3a5fdd7 vmm: Fix clippy on musl toolchains
The datatype used for the ioctl() C library call is different between it
and the glibc toolchains. The easiest solution is to have the compiler
type cast to type of the parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-12-07 17:50:48 +00:00
Michael Zhao
b173f6f654 vmm,devices: Change Gic snapshot and restore path
The snapshot and restore of AArch64 Gic was done in Vm. Now it is moved
to DeviceManager.

The benefit is that the restore can be done while the Gic is created in
DeviceManager.

While the moving of state data from Vm snapshot to DeviceManager
snapshot breaks the compatability of migration from older versions.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-12-01 17:07:25 +01:00
Sebastien Boeuf
a6959a7469 vmm: Move DeviceManager to new restore design
Based on all the work that has already been merged, it is now possible
to fully move DeviceManager out of the previous restore model, meaning
there's no need for a dedicated restore() function to be implemented
there.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 13:46:31 +01:00
Sebastien Boeuf
b62a40efae virtio-devices, vmm: Always restore virtio devices in paused state
Following the new restore design, it is not appropriate to set every
virtio device threads into a paused state after they've been started.

This is why we remove the line of code pausing the devices only after
they've been restored, and replace it with a small patch in every virtio
device implementation. When a virtio device is created as part of a
restored VM, the associated "paused" boolean is set to true. This
ensures the corresponding thread will be directly parked when being
started, avoiding the thread to be in a different state than the one it
was on the source VM during the snapshot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 09:27:00 +01:00
Sebastien Boeuf
90b5014a50 vmm: device_manager: Remove 'restoring' attribute
Given 'restoring' isn't needed anymore from the DeviceManager structure,
let's simplify it.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
cc3706afe1 pci, vmm: Move VfioPciDevice and VfioUserPciDevice to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
81862e8ed3 devices, vmm: Move Gpio to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
9fbf52b998 devices, vmm: Move Pl011 to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
0bd910e8b0 devices, vmm: Move Serial to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
ef92e55998 devices, vmm: Move Ioapic to new restore design
Moving the Ioapic object to the new restore design, meaning the Ioapic
is created directly with the right state, and it shares the same
codepath as when it's created from scratch.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:18:21 +01:00
Sebastien Boeuf
a50b3784fe virtio-devices: Create a proper result type for VirtioPciDevice
Creating a dedicated Result type for VirtioPciDevice, associated with
the new VirtioPciDeviceError enum. This allows for a clearer handling of
the errors generated through VirtioPciDevice::new().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
Sebastien Boeuf
eae8043890 pci, virtio-devices: Move VirtioPciDevice to the new restore design
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.

It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
Michael Zhao
7d16c74020 vmm: Refactor AArch64 GIC initialization process
In the new process, `device::Gic::new()` covers additional actions:
1. Creating `hypervisor::vGic`
2. Initializing interrupt routings

The change makes the vGic device ready in the beginning of
`DeviceManager::create_devices()`. This can unblock the GIC related
devices initialization in the `DeviceManager`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-11-23 11:49:57 +01:00
Sebastien Boeuf
ec01062ada vmm: Switch order between DeviceManager and CpuManager creation
The CpuManager is now created before the DeviceManager. This is required
as preliminary work for creating the vCPUs before the DeviceManager,
which is required to ensure both x86_64 and aarch64 follow the same
sequence.

It's important to note the optimization for faster PIO accesses on the
PCI config space had to be removed given the VmOps was required by the
CpuManager and by the Vcpu by extension. But given the PciConfigIo is
created as part of the DeviceManager, there was no proper way of moving
things around so that we could provide PciConfigIo early enough.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 11:49:57 +01:00
Rob Bradford
e37ec26ccf vmm: Remove PCI PIO optimisation
This optimisation provided some peformance improvement when measured by
perf however when considered in terms of boot time peformance this
optimisation doesn't have any impact measurable using our
peformance-metrics tooling.

Removing this optimisation helps simplify the VMM internals as it allows
the reordering of the VM creation process permitting refactoring of the
restore code path.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-22 19:47:53 +00:00
Wei Liu
d05586f520 vmm: modify or provide safety comments
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-11-18 12:50:01 +00:00
Praveen K Paladugu
09e79a5e9b vmm: Add tpm device to mmio bus
Add tpm device to mmio bus if appropriate cmdline arguments were
passed.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2022-11-15 16:42:21 +00:00
Praveen K Paladugu
af261f231c vmm: Add required acpi entries for vtpm device
Add an TPM2 entry to DSDT ACPI table. Add a TPM2 table to guest's ACPI.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: Sean Yoo <t-seanyoo@microsoft.com>
2022-11-15 16:42:21 +00:00
Bo Chen
a9ec0f33c0 misc: Fix clippy issues
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-02 09:41:43 +01:00
Jinrong Liang
cb171d4a23 device_manager: Avoid checking io_uring support when it's not needed
After testing, io_uring_is_supported() causes about 38ms of
overhead when creating virtio-blk. By modifying the position
of io_uring_is_supported(), the overhead of creating virtio-blk
is reduced to less than 1ms when we close io_uring.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
2022-10-27 22:21:51 -07:00
Sebastien Boeuf
1f0e5eb66a vmm: virtio-devices: Restore every VirtioDevice upon creation
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.

Here is the list of devices now following the new restore design:

- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-24 14:17:08 +02:00
Sebastien Boeuf
099cdd2af8 virtio-devices, vmm: vdpa: Implement live migration support
Vdpa now implements the Migratable trait, which allows the device to be
added to the DeviceTree and therefore allows live migrating any vDPA
device that supports being suspended.

Given a vDPA device can't be resumed from a suspended state without
having to reset everything, we don't support pause/resume for a vDPA
device, as well as snapshot/restore (which requires resume to be
supported).

In order for the migration to work locally, reusing the same device on
the same host machine, the vhost-vdpa handler is dropped after the
snapshot has been performed, which allows the destination VM to open the
device without any conflict about the device being busy.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-13 10:03:23 +02:00
Rob Bradford
1bc63e7848 vmm: Remove legacy I/O ports for ACPI
These addresses have been superseded and replaced with other I/O ports.

Fixes: #4483

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-03 17:08:57 +01:00
Sebastien Boeuf
903c08f8a1 net: Don't override default TAP interface MTU
Adjust MTU logic such that:
1. Apply an MTU to the TAP interface if the user supplies it
2. Always query the TAP interface for the MTU and expose that.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-27 10:37:35 +01:00
Rob Bradford
b2d1dd65f3 build: Remove "fwdebug" and "common" feature flags
This simplifes the buld and checks with very little overhead and the
fwdebug device is I/O port device on 0x402 that can be used by edk2 as a
very simple character device.

See: #4679

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-26 10:16:33 -07:00
Rob Bradford
1202b9a07a vmm: Add some tracing of boot sequence
Add tracing of the VM boot sequence from the point at which the request
to create a VM is received to the hand-off to the vCPU threads running.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-22 18:09:31 +01:00
Sebastien Boeuf
76dbf85b79 net: Give the user the ability to set MTU
Add a new "mtu" parameter to the NetConfig structure and therefore to
the --net option. This allows Cloud Hypervisor's users to define the
Maximum Transmission Unit (MTU) they want to use for the network
interface that they create.

In details, there are two main aspects. On the one hand, the TAP
interface is created with the proper MTU if it is provided. And on the
other hand the guest is made aware of the MTU through the VIRTIO
configuration. That means the MTU is properly set on both the TAP on the
host and the network interface in the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-21 16:20:57 +02:00
Sebastien Boeuf
f38056fc9e virtio-devices, vmm: Simplify virtio-mem resize operation
There's no need to delegate the resize operation to the virtio-mem
thread. This can come directly from the vmm thread which will use the
Mem object to update the VIRTIO configuration and trigger the interrupt
for the guest to be notified.

In order to achieve what's described above, the VirtioMemZone structure
now has a handle onto the Mem object directly. This avoids the need for
intermediate Resize and ResizeSender structures.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-20 13:43:40 +02:00
Nuno Das Neves
784a3aaf3c devices: gic: use VgicConfig everywhere
Use VgicConfig to initialize Vgic.
Use Gic::create_default_config everywhere so we don't always recompute
redist/msi registers.
Add a helper create_test_vgic_config for tests in hypervisor crate.

Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
2022-08-31 08:33:05 +01:00
Michael Zhao
b65639fad3 vmm:AArch64: move uefi_flash to memory manager
uefi_flash is used when load firmware, that is load payload depends on
device manager. move uefi_flash to memory manager can eliminate the
dependency.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-31 08:32:08 +01:00
Sebastien Boeuf
8c02648ac9 vmm: device_manager: Update virtio-console for proper PTY support
Given the virtio-console is now able to buffer its output when no PTY is
connected on the other end, the device manager code is updated to enable
this. Moving the endpoint type from FilePair to PtyPair enables the
proper codepath in the virtio-console implementation, as well as
updating the PTY resize code, and forcing the PTY to always be
non-blocking.

The non-blocking behavior is required to avoid blocking the guest that
would be waiting on the virtio-console driver. When receiving an
EWOULDBLOCK error, the output will simply be redirected to the temporary
buffer so that it can be later flushed.

The PTY resize logic has been slightly modified to ensure the PTY file
descriptors are closed. It avoids the child process to keep a hold onto
the PTY device, which would have caused the PTY to believe something is
connected on the other end, which would have prevented the detection of
any new connection on the PTY.

Fixes #4521

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-30 13:47:51 +02:00
Sebastien Boeuf
cdcd4d259e vmm: serial: Wait for PTY to be available before writing to it
The goal of this patch is to provide a reliable way to detect when the
other end of the PTY is connected, and therefore be able to identify
when we can write to the PTY device. This is needed because writing to
the PTY device when the other end isn't connected causes the loss of
the written bytes.

The way to detect the connection on the other end of the PTY is by
knowing the other end is disconnected at first with the presence of the
EPOLLHUP event. Later on, when the connection happens, EPOLLHUP is not
triggered anymore, and that's when we can assume it's okay to write to
the PTY main device.

It's important to note we had to ensure the file descriptor for the
other end was closed, otherwise we would have never seen the EPOLLHUP
event. And we did so by removing the "sub" field from the PtyPair
structure as it was keeping the associated File opened.

Fixes #3170

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-19 14:39:06 +01:00
Sebastien Boeuf
98f949d35d vmm: Add new I/O ports for ACPI shutdown and PM timer devices
Adding new I/O ports for both the ACPI shutdown and the ACPI PM timer
devices so they can be triggered from both addresses. The reason for
this change is that TDX expects only certain I/O ports to be enabled
based on what QEMU exposes. We follow this to avoid new ports from being
opened exclusively for Cloud Hypervisor.

We have to keep the former I/O ports available given all firmwares
haven't been updated yet. Once we reach a point where we know both Rust
Hypervisor Firmware, OVMF, TDVF and TDSHIM have been updated with the
new port values, we'll be able to remove the former ports.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-11 11:46:09 +01:00
Rob Bradford
2e8eb96ef6 vmm: device_manager: Store ACPI platform addresses for later use
These are ready for inclusion in the FACP table.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-25 16:16:06 +01:00
Wei Liu
ad33f7c5e6 vmm: return seccomp rules according to hypervisors
That requires stashing the hypervisor type into various places.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-22 12:50:12 +01:00
Wei Liu
a96a5d7816 hypervisor, vmm: use new vfio-ioctls
Use the new vfio-ioctls APIs. Drop Cloud Hypervisor's Device trait
since it is no longer needed.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-21 23:37:53 +01:00
Wei Liu
0e8769d76a device_manager: assert passthrough_device has the correct type
There is a lot of unsafe code in such a small function. Add an assert
to help detect issues earlier.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 08:09:50 +01:00
Sebastien Boeuf
81ba70a497 pci, vmm: Defer mapping VFIO MMIO regions on restore
When restoring a VM, the restore codepath will take care of mapping the
MMIO regions based on the information from the snapshot, rather than
having the mapping being performed during device creation.

When the device is created, information such as which BARs contain the
MSI-X tables are missing, preventing to perform the mapping of the MMIO
regions.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
7df7061610 pci, vmm: Add migratable support to vfio-user devices
Based on recent changes to VfioUserPciDevice, the vfio-user devices can
now be migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
c021dda267 pci, vmm: Add migratable support to VFIO devices
Based on recent changes to VfioPciDevice, the VFIO devices can now be
migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00