As reported by the periodic CI runs, it may take more time for the NVMe
device to present in the guest after being hotplugged as a VFIO user
device on `aarch64` (especially under high load). Let's increase the
timeout after device hotplug from `1s` to `10s` to increase the test
stability.
Fixes: #3495
Signed-off-by: Bo Chen <chen.bo@intel.com>
Similarly to the previous commit restricting the cpu resizing error only
to the situations where the vcpu amount has changed, let's do the same
with the memory and be consistent throughout our code base.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
188078467d made clear that resize should
only happen when dealing with a "dynamic" CpuManager. Although this is
very much correct, it causes a regression on Kata Containers (and on any
other consumer of Cloud Hypervisor) in cases where a resize would be
triggered but the vCPUs values wouldn't be changed.
There's no doubt Kata Containers could do better and do not call a
resize in such situations, and that's something that should **also** be
solved there. However, we should also work this around on Cloud
Hypervisor side as it introduces a regression with the current Kata
Containers code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Extend the Hypervisor API in order to retrieve the TDX capabilities from
the underlying hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By enabling the VIRTIO feature VIRTIO_F_IOMMU_PLATFORM for all
vhost-user devices when needed, we force the guest to use the DMA API,
making these devices compatible with TDX. By using DMA API, the guest
triggers the TDX codepath to share some of the guest memory, in
particular the virtqueues and associated buffers so that the VMM and
vhost-user backends/processes can access this memory.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If EFI reset fails on the Linux kernel then it will fallthrough to CMOS
reset. Implement this as one of our reset solutions.
Fixes: #3912
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These were erroneously skipping features for the unit tests and the
"build" target for dev_cli.sh
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These are never run by the CI and is inconsistent with the way we build
test which is specified inside the .github workflows.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This includes the removal of testing without the "acpi" feature. The
command have been reordered to reduce the amount of recompilation
required.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It seems the vdpa_sim_block isn't behaving properly after the vhost
device is closed, as it sometimes returns EBUSY when we try to open it
again. The easiest way to deal with this issue is by simplifying the
integration test, avoid to plug the same device after it's been
unplugged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit unifies the custom linux kernel build in x86, Arm, and
performance metrics to the same function. Therefore, when bumping
the kernel version, we can make sure we only need to make the change
in one place.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
With the addition of the amx feature, add a new build workflow to
enable the feature and a clippy quality check.
Signed-off-by: William Douglas <william.douglas@intel.com>
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.
On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.
This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.
The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.
Signed-off-by: William Douglas <william.douglas@intel.com>
Now that we rely on vhost v0.4.0, which contains the fix for
get_iova_range(), we don't need the workaround anymore, and we can
actually call into the dedicated function.
Fixes#3861
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Rely on newly released versions of the vhost and vhost-user-backend
crates from rust-vmm.
The new vhost version includes the fixes needed for vDPA.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit makes sure that the custom linux kernel is always
rebuilt when running the performance metrics tests, and therefore
changes to the kernel config file is always caught.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Explain the reason why using vDPA might be interesting and how to use it
with Cloud Hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Disable the DAX feature from the virtio-fs implementation as the feature
is still not stable. The feature is deprecated, meaning the 'dax'
parameter will be removed in about 2 releases cycles.
In the meantime, the parameter value is ignored and forced to be
disabled.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The test is sporadically failing whenever we try to hotplug the vDPA
device we've just unplugged. This is causing the kernel to complain with
EBUSY because the device hasn't been released yet. This is happening
because the CI system is under very high load, therefore taking quite
some time to the host to update the state of this device.
The easy way to fix such issue is by increasing the sleep time between
the unplug and the replug.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When running non-dynamic or with virtio-mem for hotplug the ACPI
functionality should not be included on the DSDT nor does the
MemoryManager need to be placed on the MMIO bus.
Fixes: #3883
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This is now consistent with not supplying the _CRS for the device when
CpuManager is not dynamic.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The file descriptor provided to fs_slave_map() and fs_slave_io() is
passed as a AsRawFd trait, meaning the caller owns it. For that reason,
there's no need for these functions to close the file descriptor as it
will be closed later on anyway.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>