This moves the devices creation out of the dedicated restore function
which will be eventually removed.
This factorizes the creation of all devices into a single location.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows the clock restoration to be moved out of the dedicated
restore function, which will eventually be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on all the work that has already been merged, it is now possible
to fully move DeviceManager out of the previous restore model, meaning
there's no need for a dedicated restore() function to be implemented
there.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Every Vcpu is now created with the right state if there's an available
snapshot associated with it. This simplifies the restore logic.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Following the new restore design, it is not appropriate to set every
virtio device threads into a paused state after they've been started.
This is why we remove the line of code pausing the devices only after
they've been restored, and replace it with a small patch in every virtio
device implementation. When a virtio device is created as part of a
restored VM, the associated "paused" boolean is set to true. This
ensures the corresponding thread will be directly parked when being
started, avoiding the thread to be in a different state than the one it
was on the source VM during the snapshot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving the Ioapic object to the new restore design, meaning the Ioapic
is created directly with the right state, and it shares the same
codepath as when it's created from scratch.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Creating a dedicated Result type for VirtioPciDevice, associated with
the new VirtioPciDeviceError enum. This allows for a clearer handling of
the errors generated through VirtioPciDevice::new().
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.
It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In the new process, `device::Gic::new()` covers additional actions:
1. Creating `hypervisor::vGic`
2. Initializing interrupt routings
The change makes the vGic device ready in the beginning of
`DeviceManager::create_devices()`. This can unblock the GIC related
devices initialization in the `DeviceManager`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Moving the creation of the vCPUs before the DeviceManager gets created
will allow for the aarch64 vGIC to be created before the DeviceManager
as well in a follow up patch. The end goal being to adopt the same
creation sequence for both x86_64 and aarch64, and keeping in mind that
the vGIC requires every vCPU to be created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Split the vCPU creation into two distincts parts. On the one hand we
create the actual Vcpu object with the creation of the hypervisor::Vcpu.
And on the other hand, we configure the existing Vcpu, setting registers
to proper values (such as setting the entry point).
This will allow for further work to move the creation earlier in the
boot, so that the hypervisor::Vcpu will be already created when the
DeviceManager gets created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The CpuManager is now created before the DeviceManager. This is required
as preliminary work for creating the vCPUs before the DeviceManager,
which is required to ensure both x86_64 and aarch64 follow the same
sequence.
It's important to note the optimization for faster PIO accesses on the
PCI config space had to be removed given the VmOps was required by the
CpuManager and by the Vcpu by extension. But given the PciConfigIo is
created as part of the DeviceManager, there was no proper way of moving
things around so that we could provide PciConfigIo early enough.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This optimisation provided some peformance improvement when measured by
perf however when considered in terms of boot time peformance this
optimisation doesn't have any impact measurable using our
peformance-metrics tooling.
Removing this optimisation helps simplify the VMM internals as it allows
the reordering of the VM creation process permitting refactoring of the
restore code path.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add an TPM2 entry to DSDT ACPI table. Add a TPM2 table to guest's ACPI.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: Sean Yoo <t-seanyoo@microsoft.com>
If the memory is not backed by a file then it is possible to enable
Transparent Huge Pages on the memory and take advantage of the benefits
of huge pages without requiring the specific allocation of an appropriate
number of huge pages.
TEST=Boot and see that in /proc/`pidof cloud-hypervisor`/smaps that the
region is now THPeligible (and that also pages are being used.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As huge pages are always MAP_SHARED then where the shared memory would
be checked (for vhost-user and local migration) we can also check
instead for huge pages.
The checking is also extended to cover the memory zones based
configuration as well.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We can't use MAP_ANONYMOUS and still have huge pages so MAP_SHARED is
effectively required when using huge pages.
Unfortunately it is not as simple as always forcing MAP_SHARED if
hugepages is on as this might be inappropriate in the backing file case
hence why there is additional complexity of assigning to mmap_flags on
each case and the MAP_SHARED is only turned on for the anonymous file
huge page case as well as anonymous shared file case.
See: #4805
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If we do not need an anonymous file backing the memory then do not
create one.
As a side effect this addresses an issue with CoW (mmap with MAP_PRIVATE
but no MAP_ANONYMOUS) when the memory is pinned for VFIO.
Fixes: #4805
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
After testing, io_uring_is_supported() causes about 38ms of
overhead when creating virtio-blk. By modifying the position
of io_uring_is_supported(), the overhead of creating virtio-blk
is reduced to less than 1ms when we close io_uring.
Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Until there is a need for sharing the memory fd with a child process, we
should err on the safe side to close it on exec.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.
Here is the list of devices now following the new restore design:
- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The add_device() function, from the device manager code, takes a
DeviceConfig as a parameter, instead of a VmAddDevice.
The change was originally done as part of 34412c9b41 and it didn't
break Kata Containers because the VmAddDevice and DeviceConfig structs
share most of their fields, besides the optional for serialization
`pci_segment`, which is not used by the client yet.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is preliminary work to ensure a migrated VM is created right before
it is restored. This will be useful when moving to a design where the VM
is both created and restored simultaneously from the Snapshot.
In details, that means the MemoryManager is the object that must be
created upon receiving the config from the source VM, so that memory
content can be later received and filled into the GuestMemory.
Only after these steps happened, the snapshot is received from the
source VM, and the actual Vm object can be created from both the
snapshot and the MemoryManager previously created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
These look alarming if you are booting with the a distro kernel which is
now a recommended approach.
See: #4786
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The restore path of MemoryManager is handled specially without
implementing a `Snapshottable:restore()`. Removing the explicit call to
it along the migration code path to avoid confusions.
See: #4783
Signed-off-by: Bo Chen <chen.bo@intel.com>
Vdpa now implements the Migratable trait, which allows the device to be
added to the DeviceTree and therefore allows live migrating any vDPA
device that supports being suspended.
Given a vDPA device can't be resumed from a suspended state without
having to reset everything, we don't support pause/resume for a vDPA
device, as well as snapshot/restore (which requires resume to be
supported).
In order for the migration to work locally, reusing the same device on
the same host machine, the vhost-vdpa handler is dropped after the
snapshot has been performed, which allows the destination VM to open the
device without any conflict about the device being busy.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding VHOST_VDPA_GET_CONFIG_SIZE and VHOST_VDPA_SUSPEND to the list of
authorized ioctls for the vmm thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In this way, we have all functions related to generate default values of
vm-config structs in the same location.
Signed-off-by: Bo Chen <chen.bo@intel.com>
These have been replaced by members of PayloadConfig and should be
removed in v28.0 (mentioned in v26.0 release notes.)
Fixes: #4737
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This is consistent when considering that some structs have a
`#[derive(Default)`] so it makes sense for the default implementations
to be in the same location.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Place the data structures that are required for constructing a VmConfig
into it's own module from the logic that exists to suppot them.
This is useful as a consumer of the API can now clearly see what data
structures make up the API for creating VMs.
This has no functional change and I made no attempt to clean up the
ordering (it's as in the original file) nor any other clean up.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.22 to 4.0.9.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](clap-rs/clap@v3.2.22...v4.0.9)
---
updated-dependencies:
- dependency-name: clap
dependency-type: direct:production
update-type: version-update:semver-major
...
Moving to the major version 4 introduced some breaking changes which had
to be handled manually.
Fixes#4709
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This option is needed for the openapi consumer (e.g. Kata Containers) to
load firmware (e.g. td-shim) for booting.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This simplifies the CI process but also logical with the existing
functionality under "guest_debug" (dumping guest memory).
Fixes: #4679
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Adding the support for the user to set the MTU for the vhost-user-net
backend, which allows the integration test to be extended with the test
of the MTU parameter.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adjust MTU logic such that:
1. Apply an MTU to the TAP interface if the user supplies it
2. Always query the TAP interface for the MTU and expose that.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This simplifes the buld and checks with very little overhead and the
fwdebug device is I/O port device on 0x402 that can be used by edk2 as a
very simple character device.
See: #4679
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add tracing of the VM boot sequence from the point at which the request
to create a VM is received to the hand-off to the vCPU threads running.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new feature "tracing" that enables tracing functionality via the
"tracer" crate (sadly features and crates cannot share the same name.)
Setup: tracer::start()
The main functionality is a tracer::trace_scope()! macro that will add
trace points for the duration of the scope. Tracing events are per
thread.
Finish: tracer::end() this will write the trace file (pretty printed
JSON) to a file in the current directory.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new "mtu" parameter to the NetConfig structure and therefore to
the --net option. This allows Cloud Hypervisor's users to define the
Maximum Transmission Unit (MTU) they want to use for the network
interface that they create.
In details, there are two main aspects. On the one hand, the TAP
interface is created with the proper MTU if it is provided. And on the
other hand the guest is made aware of the MTU through the VIRTIO
configuration. That means the MTU is properly set on both the TAP on the
host and the network interface in the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need to delegate the resize operation to the virtio-mem
thread. This can come directly from the vmm thread which will use the
Mem object to update the VIRTIO configuration and trigger the interrupt
for the guest to be notified.
In order to achieve what's described above, the VirtioMemZone structure
now has a handle onto the Mem object directly. This avoids the need for
intermediate Resize and ResizeSender structures.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Given the AMX x86 feature has been made available since kernel v5.17,
and given we don't have any test validating this feature, there's no
need to keep it behing a Rust feature gate.
Fixes#3996
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Multiple rust-vmm crates must be updated at once given the vm-memory one
has been updated and they all rely on vm-memory.
- vm-memory from 0.8.0 to 0.9.0
- vhost from 0.4.0 to 0.5.0
- virtio-queue from 0.5.0 to 0.6.0
- vhost-user-backend from 0.6.0 to 0.7.0
- linux-loader from 0.4.0 to 0.5.0
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>