The custom 'clone' duplicates 'preserved_fds' so that the validation
logic can be safely carried out on the clone of the VmConfig.
The custom 'drop' ensures 'preserved_fds' are safely closed when the
holding VmConfig instance is destroyed.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Preserved FDs are the ones that share the same life-time as its holding
VmConfig instance, such as FDs for creating TAP devices.
Preserved FDs will stay open as long as the holding VmConfig instance is
valid, and will be closed when the holding VmConfig instance is destroyed.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Having PMU in guests isn't critical, and not all hardware supports
it (e.g. Apple Silicon).
CpuManager::init_pmu already has a fallback for if PMU is not
supported by the VCPU, but we weren't getting that far, because we
would always try to initialise the VCPU with KVM_ARM_VCPU_PMU_V3, and
then bail when it returned with EINVAL.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Once error occur, vcpu thread may exit, this should
be critical event for the whole VM, we should fire
exit event and set vcpu state.
If we don't set vcpu state, the shutdown process
will hang at signal_thread, which is waiting the
vcpu state to change.
Signed-off-by: Yong He <alexyonghe@tencent.com>
This hypervisor leaf includes details of the TSC frequency if that is
available from KVM. This can be used to efficiently calculate time
passed when there is an invariant TSC.
TEST=Run `cpuid` in the guest and observe the frequency populated.
Fixes: #5178
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The code of the stable branch diverges from the main branch, so we
can't directly backport the corresponding commit to fix the clippy
issues.
See: commit 5e52729453
Signed-off-by: Bo Chen <chen.bo@intel.com>
MSHV does not require to ensure MMIO/PIO exits complete
before pausing. This patch makes sure the above requirement
by checking the hypervisor type run-time.
Fixes#5037
Signed-off-by: Muminul Islam <muislam@microsoft.com>
(cherry picked from commit 4e3bc20f2c)
Add an TPM2 entry to DSDT ACPI table. Add a TPM2 table to guest's ACPI.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: Sean Yoo <t-seanyoo@microsoft.com>
If the memory is not backed by a file then it is possible to enable
Transparent Huge Pages on the memory and take advantage of the benefits
of huge pages without requiring the specific allocation of an appropriate
number of huge pages.
TEST=Boot and see that in /proc/`pidof cloud-hypervisor`/smaps that the
region is now THPeligible (and that also pages are being used.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As huge pages are always MAP_SHARED then where the shared memory would
be checked (for vhost-user and local migration) we can also check
instead for huge pages.
The checking is also extended to cover the memory zones based
configuration as well.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We can't use MAP_ANONYMOUS and still have huge pages so MAP_SHARED is
effectively required when using huge pages.
Unfortunately it is not as simple as always forcing MAP_SHARED if
hugepages is on as this might be inappropriate in the backing file case
hence why there is additional complexity of assigning to mmap_flags on
each case and the MAP_SHARED is only turned on for the anonymous file
huge page case as well as anonymous shared file case.
See: #4805
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If we do not need an anonymous file backing the memory then do not
create one.
As a side effect this addresses an issue with CoW (mmap with MAP_PRIVATE
but no MAP_ANONYMOUS) when the memory is pinned for VFIO.
Fixes: #4805
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
After testing, io_uring_is_supported() causes about 38ms of
overhead when creating virtio-blk. By modifying the position
of io_uring_is_supported(), the overhead of creating virtio-blk
is reduced to less than 1ms when we close io_uring.
Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Until there is a need for sharing the memory fd with a child process, we
should err on the safe side to close it on exec.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.
Here is the list of devices now following the new restore design:
- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The add_device() function, from the device manager code, takes a
DeviceConfig as a parameter, instead of a VmAddDevice.
The change was originally done as part of 34412c9b41 and it didn't
break Kata Containers because the VmAddDevice and DeviceConfig structs
share most of their fields, besides the optional for serialization
`pci_segment`, which is not used by the client yet.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is preliminary work to ensure a migrated VM is created right before
it is restored. This will be useful when moving to a design where the VM
is both created and restored simultaneously from the Snapshot.
In details, that means the MemoryManager is the object that must be
created upon receiving the config from the source VM, so that memory
content can be later received and filled into the GuestMemory.
Only after these steps happened, the snapshot is received from the
source VM, and the actual Vm object can be created from both the
snapshot and the MemoryManager previously created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
These look alarming if you are booting with the a distro kernel which is
now a recommended approach.
See: #4786
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The restore path of MemoryManager is handled specially without
implementing a `Snapshottable:restore()`. Removing the explicit call to
it along the migration code path to avoid confusions.
See: #4783
Signed-off-by: Bo Chen <chen.bo@intel.com>
Vdpa now implements the Migratable trait, which allows the device to be
added to the DeviceTree and therefore allows live migrating any vDPA
device that supports being suspended.
Given a vDPA device can't be resumed from a suspended state without
having to reset everything, we don't support pause/resume for a vDPA
device, as well as snapshot/restore (which requires resume to be
supported).
In order for the migration to work locally, reusing the same device on
the same host machine, the vhost-vdpa handler is dropped after the
snapshot has been performed, which allows the destination VM to open the
device without any conflict about the device being busy.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding VHOST_VDPA_GET_CONFIG_SIZE and VHOST_VDPA_SUSPEND to the list of
authorized ioctls for the vmm thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>