1734 Commits

Author SHA1 Message Date
Rob Bradford
4a04d1f8f2 vmm: seccomp: Allow SYS_rseq as required by newer glibc
glibc 2.35 as shipped by Fedora 36 now uses the rseq syscall.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 13:02:51 +01:00
Rob Bradford
4ca066f077 vmm: api: Simplify error reporting from HTTP to internal API calls
Use a single enum member for representing errors from the internal API.
This avoids the ugly duplication of the API call name in the error
message:

e.g.

$ target/debug/ch-remote --api-socket /tmp/api resize --cpus 2
Error running command: Server responded with an error: InternalServerError: VmResize(VmResize(CpuManager(DesiredVCpuCountExceedsMax)))

Becomes:

$ target/debug/ch-remote --api-socket /tmp/api resize --cpus 2
Error running command: Server responded with an error: InternalServerError: ApiError(VmResize(CpuManager(DesiredVCpuCountExceedsMax)))

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-20 19:39:05 +01:00
Sebastien Boeuf
11e9f43305 vmm: Use new Resource type PciBar
Instead of defining some very generic resources as PioAddressRange or
MmioAddressRange for each PCI BAR, let's move to the new Resource type
PciBar in order to make things clearer. This allows the code for being
more readable, but also removes the need for hard assumptions about the
MMIO and PIO ranges. PioAddressRange and MmioAddressRange types can be
used to describe everything except PCI BARs. BARs are very special as
they can be relocated and have special information we want to carry
along with them.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
89218b6d1e pci: Replace BAR tuple with PciBarConfiguration
In order to make the code more consistent and easier to read, we remove
the former tuple that was used to describe a BAR, replacing it with the
existing structure PciBarConfiguration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
1795afadb8 vmm: Factorize algorithm finding HOB memory resources
By factorizing the algorithm untangling TDVF sections from guest RAM
into a dedicated function, we can write some unit tests to validate it
properly achieves what we expect.

Adding the "tdx" feature to the unit tests, otherwise it wouldn't get
tested.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 15:23:12 +02:00
Sebastien Boeuf
5264d545dd pci, vmm: Extend PciDevice trait to support BAR relocation
By adding a new method id() to the PciDevice trait, we allow the caller
to retrieve a unique identifier. This is used in the context of BAR
relocation to identify the device being relocated, so that we can update
the DeviceTree resources for all PCI devices (and not only
VirtioPciDevice).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
0c34846ef6 vmm: Return new PCI resources from add_pci_device()
By returning the new PCI resources from add_pci_device(), we allow the
factorization of the code translating the BARs into resources. This
allows VIRTIO, VFIO and vfio-user to add the resources to the DeviceTree
node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
4f172ae4b6 vmm: Retrieve PCI resources for VFIO and vfio-user devices
Relying on the function introduced recently to get the PCI resources and
handle the restore case, both VFIO and vfio-user device creation paths
now have access to PCI resources, which can be provided to the function
add_pci_device().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
0f12fe9b3b vmm: Factorize retrieval of PCI resources
Create a dedicated function for getting the PCI segment, b/d/f and
optional resources. This is meant for handling the potential case of a
restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
6e084572d4 pci, virtio: Make virtio-pci BAR restoration more generic
Updating the way of restoring BAR addresses for virtio-pci by providing
a more generic approach that will be reused for other PciDevice
implementations (i.e VfioPcidevice and VfioUserPciDevice).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Rob Bradford
b212f2823d vmm: Deprecate mergeable option from virtio-pmem
KSM would never merge the file backed pages so this option has no
effect.

See: #3968

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-12 07:12:25 -07:00
Rob Bradford
ed87e42e6f vm-device, pci, devices: Remove InterruptSourceGroup::{un}mask
The calls to these functions are always preceded by a call to
InterruptSourceGroup::update(). By adding a masked boolean to that
function call it possible to remove 50% of the calls to the
KVM_SET_GSI_ROUTING ioctl as the the update will correctly handle the
masked or unmasked case.

This causes the ioctl to disappear from the perf report for a boot of
the VM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-11 22:56:48 +01:00
Michael Zhao
d1b2a3fca9 aarch64: Add a memory-simulated flash for UEFI
EDK2 execution requires a flash device at address 0.

The new added device is not a fully functional flash. It doesn't
implement any spec of a flash device. Instead, a piece of memory is used
to simulate the flash simply.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-11 09:51:34 +01:00
Michael Zhao
298a5580a9 aarch64: Remove unnecessary function definitions
This is a refactoring commit to simplify source code.
Removed some functions that only return a layout const.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-08 11:08:43 -07:00
Michael Zhao
656425a328 aarch64: Align the data types in layout
Some addresses defined in `layout.rs` were of type `GuestAddress`, and
are `u64`. Now align the types of all the `*_START` definitions to
`GuestAddress`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-08 11:08:43 -07:00
Michael Zhao
848d88c122 aarch64: Reserve a hole in 32-bit space
The reserved space is for devices.
Some devices (like TPM) require arbitrary addresses close to 4GiB.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-05 11:04:52 +08:00
Michael Zhao
a3dbc3b415 aarch64: Change RAM_START type GuestAddress
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-05 11:04:52 +08:00
Michael Zhao
ef9f37cd5f aarch64: Rename RAM_64BIT_START in layout
`RAM_64BIT_START` was set to 1 GiB, not a real 64-bit address. Now
rename it `RAM_START` to avoid confusion.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-05 11:04:52 +08:00
Sebastien Boeuf
e76a5969e8 vmm: Add iommu parameter to VdpaConfig
Add a new iommu parameter to VdpaConfig in order to place the vDPA
device behind a virtual IOMMU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-05 00:09:52 +02:00
Sebastien Boeuf
00ce8277aa vmm: tdx: Fix the logic for generating HOB memory resources
The list of memory resources provided through the HOB wasn't accurate
because of the broken logic. The fix provides correct ranges to the
firmware.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-01 18:24:32 +01:00
Sebastien Boeuf
70222ffc1a vmm: tdx: Only report TempMem as reserved memory
Based on latest QEMU patches from branch tdx-qemu-2022.03.29-v7.0.0-rc1
we should only report as memory resources the TempMem sections from TDVF
sections.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-01 18:24:32 +01:00
Rob Bradford
7fd76eff05 vmm: Don't error if live resizing is not possible
The introduction of a error if live resizing is not possible is a
regression compared to the original behaviour where the new size would
be stored in the config and reflected in the next boot. This behaviour
was also inconsistent with the effect of resizing with no VM booted.

Instead of generating an error allow the code to go ahead and update the
config so that the new size will be available upon the reboot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-31 17:04:53 +01:00
Bo Chen
eed2a0d06b vmm: Add 'libc::SYS_shutdown' to vmm 'seccomp' filter list
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-03-31 09:22:07 +01:00
Fabiano Fidêncio
f049867cd9 vmm,memory_manager: Deny resizing only if the ram amount has changed
Similarly to the previous commit restricting the cpu resizing error only
to the situations where the vcpu amount has changed, let's do the same
with the memory and be consistent throughout our code base.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-30 21:29:08 +01:00
Fabiano Fidêncio
2c8045343c vmm,cpu: Deny resizing only if the vcpu amount has changed
188078467db42f50f5b7e7a7969738ebf8aec95c made clear that resize should
only happen when dealing with a "dynamic" CpuManager.  Although this is
very much correct, it causes a regression on Kata Containers (and on any
other consumer of Cloud Hypervisor) in cases where a resize would be
triggered but the vCPUs values wouldn't be changed.

There's no doubt Kata Containers could do better and do not call a
resize in such situations, and that's something that should **also** be
solved there.  However, we should also work this around on Cloud
Hypervisor side as it introduces a regression with the current Kata
Containers code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-30 21:29:08 +01:00
Sebastien Boeuf
3c973fa7ce virtio-devices: vhost-user: Add support for TDX
By enabling the VIRTIO feature VIRTIO_F_IOMMU_PLATFORM for all
vhost-user devices when needed, we force the guest to use the DMA API,
making these devices compatible with TDX. By using DMA API, the guest
triggers the TDX codepath to share some of the guest memory, in
particular the virtqueues and associated buffers so that the VMM and
vhost-user backends/processes can access this memory.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-30 10:32:23 +02:00
Rob Bradford
ca68b9e7a9 build: Remove "cmos" feature gate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-29 15:20:58 +01:00
Rob Bradford
e0d3efec6e devices: cmos: Implement CMOS based reset
If EFI reset fails on the Linux kernel then it will fallthrough to CMOS
reset. Implement this as one of our reset solutions.

Fixes: #3912

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-29 15:20:58 +01:00
Rob Bradford
7c0cf8cc23 arch, devices, vmm: Remove "acpi" feature gate
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-28 09:18:29 -07:00
William Douglas
6b0df31e5d vmm: Add support for enabling AMX in vm guests
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.

On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.

This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.

The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.

Signed-off-by: William Douglas <william.douglas@intel.com>
2022-03-25 14:11:54 -07:00
Bo Chen
639a7dd73a vmm: Improve 'test_config_validation' with precise Err assertions
Fixed: #3879

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-03-25 09:17:05 +00:00
Sebastien Boeuf
afd9f17b73 virtio-fs: Deprecate the DAX feature
Disable the DAX feature from the virtio-fs implementation as the feature
is still not stable. The feature is deprecated, meaning the 'dax'
parameter will be removed in about 2 releases cycles.

In the meantime, the parameter value is ignored and forced to be
disabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-24 10:39:11 -07:00
Rob Bradford
7a8061818e vmm: Don't expose MemoryManager ACPI functionality unless required
When running non-dynamic or with virtio-mem for hotplug the ACPI
functionality should not be included on the DSDT nor does the
MemoryManager need to be placed on the MMIO bus.

Fixes: #3883

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-24 13:17:51 +00:00
Rob Bradford
f6dfb42a64 vmm: cpu: Don't place CpuManager on MMIO bus when non-dynamic
This is now consistent with not supplying the _CRS for the device when
CpuManager is not dynamic.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-24 13:17:39 +00:00
Rob Bradford
bbf7fd5372 vmm: Reject memory resizing on TDX
This is similar to the dynamic concept used in CpuManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-23 23:15:20 +00:00
Rob Bradford
1756b23aea vmm: device_manager: Check IOMMU placed device hotplug
Rather than just printing a message return an error back through the API
if the user attempts to hotplug a device that supports being behind an
IOMMU where that device isn't placed on an IOMMU segment.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00
Rob Bradford
0834eca8d4 vmm: config: Validate IOMMU configuration
Ensure devices that are specified to be on a PCI segment that is behind
the IOMMU are IOMMU enabled if possible or error out for those devices
that do not support it.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00
Rob Bradford
6d2224f1ba vmm: device_manager: Create IOMMU mapping for hotplugged virtio devices
Previously it was not possible to enable vIOMMU for a virtio device.
However with the ability to place an entire PCI segment behind the
IOMMU the IOMMU mapping needs to be setup for the virtio device if it is
behind the IOMMU.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00
Rob Bradford
54b65107b1 vmm: config: Validate vDPA devices in configuration
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00
Rob Bradford
3b8a017257 vmm: acpi: Print total size of ACPI tables
This can already be calculated by the summing the tables reported by the
Linux kernel but this is more convenient.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-21 14:41:46 +00:00
Sebastien Boeuf
9c95109a6b vmm: Streamline reboot code path
Separate the destruction and cleanup of original VM and the creation of
the new one. In particular have a clear hand off point for resources
(e.g. reset EventFd) used by the new VM from the original. In the
situation where vm.shutdown() generates an error this also avoids the
Vmm reference to the Vm (self.vm) from being maintained.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:50 +01:00
Sebastien Boeuf
3fea5f5396 vmm: Add support for hotplugging a vDPA device
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Sebastien Boeuf
c73c6039c3 vmm: Enable vDPA support
Based on the newly added Vdpa device along with the new vdpa parameter,
this patch enables the support for vDPA devices.

It's important to note this the only virtio device for which we provide
an ExternalDmaMapping instance. This will allow for the right DMA ranges
to be mapped/unmapped.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Sebastien Boeuf
72169686fe vmm: Add a vDPA device parameter
Introduce a new --vdpa parameter associated with a VdpaConfig for the
future creation of a Vdpa device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Rob Bradford
7324b0e514 vmm: cpu: Only include hotplug/unplug related AML code if dynamic
This will significantly reduce the size of the DSDT and the effort
required to parse them if there is no requirement to support
hotplug/unplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-17 13:46:21 +00:00
Rob Bradford
188078467d vmm: cpu: Deny resizing if CpuManager is not dynamic
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-17 13:46:21 +00:00
Rob Bradford
e5cb13588b vmm: cpu: Add concept of making CpuManager dynamic
If the CpuManager is dynamic it devices CPUs can be
hotplugged/unplugged.

Since TDX does not support CPU hotplug this is currently the only
determinator as to whether the CpuManager is dynamic.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-17 13:46:21 +00:00
LiHui
b0be5ff8ad API: fix http hang for vmm.ping/vm.create/vm.info/vmm.shutdown
vmm.ping/vm.info will hang for PUT method, vm.create/vmm.shutdonw hang for GET method.
Because these four APIs do not write the response body when the HTTP method does not match.

Signed-off-by: LiHui <andrewli@kubesphere.io>
2022-03-11 11:56:14 +00:00
Sebastien Boeuf
9d46890dc0 vmm: device_manager: Make virtio DMA mapping conditional on vIOMMU
In case the virtio device which requires DMA mapping is placed behind a
virtual IOMMU, we shouldn't map/unmap any region manually. Instead, we
provide the DMA handler to the virtio-iommu device so that it can
trigger the proper mappings.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-11 12:37:17 +01:00
Sebastien Boeuf
a4f742277b vmm: device_manager: Handle DMA mapping for virtio devices
If a virtio device is associated with a DMA handler, the DMA mapping and
unmapping is performed from the device manager through the handler.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-11 12:37:17 +01:00