1748 Commits

Author SHA1 Message Date
Rob Bradford
da33eb5e8c vmm: device_manager: Remove extra whitespace lines
These originated from the removal of the acpi feature gate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Fabiano Fidêncio
fdeb4f7c46 Revert "vmm, openapi: Token Bucket fields should be uint64"
This reverts commit 87eed369cd091db76ed8542750803659e729b239.

The reason we're reverting this is that OpenAPI Specification[0] doesn't
know how to deal with unsigned types. :-/

Right now the best to do is keep it as it's, as an int64, and try to fix
OpenAPI, or even switch to swagger, as the latter knows how to properly
deal with those.  However, switching to swagger is far from being an 1:1
transition and will require time to experiment, thus reverting this for
now seems the best approach.

[0]: https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#data-types

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 09:26:38 +02:00
Fabiano Fidêncio
87eed369cd vmm, openapi: Token Bucket fields should be uint64
The Token Bucket fields are, on the Cloud Hypervisor side, u64.
However, we expose those as int64 in the OpenAPI YAML file.

With that in mind, let's adjust the yaml file to expose those as uint64.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 13:16:02 +02:00
Rob Bradford
79f4c2db01 vmm: Enable virtio-iommu in VmConfig::validate()
This means that the automatic enabling of the virtio-iommu will also be
applied to VMs creates via the API as well as the CLI.

Fixes: #4016

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-26 12:27:00 +01:00
Rob Bradford
bf9f79081a vmm: Only create ACPI memory manager DSDT when resizable
If using the ACPI based hotplug only memory can be added so if the
hotplug RAM size is the same as the boot RAM size then do not include
the memory manager DSDT entries.

Also: this change simplifies the code marginally by making the
HotplugMethod enum Copyable.

This was identified from the following perf output:

     1.78%     0.00%  vmm              cloud-hypervisor      [.] <vmm::memory_manager::MemorySlots as acpi_tables::aml::Aml>::append_aml_bytes
            |
            ---<vmm::memory_manager::MemorySlots as acpi_tables::aml::Aml>::append_aml_bytes
               <vmm::memory_manager::MemorySlot as acpi_tables::aml::Aml>::append_aml_bytes
               acpi_tables::aml::Name::new
               <acpi_tables::aml::Path as acpi_tables::aml::Aml>::append_aml_bytes
               __libc_malloc

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-26 13:07:19 +02:00
Rob Bradford
62f17ccf8c vmm: Improve error handling for vmm::vm::Error
In particular implement thiserror::Error, cleanup wording and remove
unused errors.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
cb03540ffd vmm: config: Derive thiserror::Error
No further changes are necessary that adding a #[derive(Error)] as there
is a manual implementation of Display.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
0270d697ab vmm: cpu: Improve Error reporting
Remove unused enum members, improve error messages and implement
thiserror::Error.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
47529796d0 arch: Improve arch::Error
Remove unused error enum entries, improve wording and derive
thiserror::Error.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
1c786610b7 vmm: api: Don't use clashing struct name for Error
Import vmm::Error as VmmError to allow the use of thiserror::Error to
avoid clashing names.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Sebastien Boeuf
eb6daa2fc3 pci: Store MSI interrupt manager in VfioCommon
Extend VfioCommon structure to own the MSI interrupt manager. This will
be useful for implementing the restore code path.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-22 16:16:48 +02:00
Rob Bradford
adb3dcdc13 vmm: openapi: Add serial_number to PlatformConfig
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
e972eb7c74 arch, vmm: Expose platform serial_number via SMBIOS
Fixes: #4002

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
203dfdc156 vmm: config: Add "serial_number" option to "--platform"
This carries a string that is exposed via DMI/SMBIOS and is particularly
useful for cloud-init initialisation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 17:17:08 +02:00
Rob Bradford
4a04d1f8f2 vmm: seccomp: Allow SYS_rseq as required by newer glibc
glibc 2.35 as shipped by Fedora 36 now uses the rseq syscall.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 13:02:51 +01:00
Rob Bradford
4ca066f077 vmm: api: Simplify error reporting from HTTP to internal API calls
Use a single enum member for representing errors from the internal API.
This avoids the ugly duplication of the API call name in the error
message:

e.g.

$ target/debug/ch-remote --api-socket /tmp/api resize --cpus 2
Error running command: Server responded with an error: InternalServerError: VmResize(VmResize(CpuManager(DesiredVCpuCountExceedsMax)))

Becomes:

$ target/debug/ch-remote --api-socket /tmp/api resize --cpus 2
Error running command: Server responded with an error: InternalServerError: ApiError(VmResize(CpuManager(DesiredVCpuCountExceedsMax)))

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-20 19:39:05 +01:00
Sebastien Boeuf
11e9f43305 vmm: Use new Resource type PciBar
Instead of defining some very generic resources as PioAddressRange or
MmioAddressRange for each PCI BAR, let's move to the new Resource type
PciBar in order to make things clearer. This allows the code for being
more readable, but also removes the need for hard assumptions about the
MMIO and PIO ranges. PioAddressRange and MmioAddressRange types can be
used to describe everything except PCI BARs. BARs are very special as
they can be relocated and have special information we want to carry
along with them.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
89218b6d1e pci: Replace BAR tuple with PciBarConfiguration
In order to make the code more consistent and easier to read, we remove
the former tuple that was used to describe a BAR, replacing it with the
existing structure PciBarConfiguration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 12:54:09 -07:00
Sebastien Boeuf
1795afadb8 vmm: Factorize algorithm finding HOB memory resources
By factorizing the algorithm untangling TDVF sections from guest RAM
into a dedicated function, we can write some unit tests to validate it
properly achieves what we expect.

Adding the "tdx" feature to the unit tests, otherwise it wouldn't get
tested.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-19 15:23:12 +02:00
Sebastien Boeuf
5264d545dd pci, vmm: Extend PciDevice trait to support BAR relocation
By adding a new method id() to the PciDevice trait, we allow the caller
to retrieve a unique identifier. This is used in the context of BAR
relocation to identify the device being relocated, so that we can update
the DeviceTree resources for all PCI devices (and not only
VirtioPciDevice).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
0c34846ef6 vmm: Return new PCI resources from add_pci_device()
By returning the new PCI resources from add_pci_device(), we allow the
factorization of the code translating the BARs into resources. This
allows VIRTIO, VFIO and vfio-user to add the resources to the DeviceTree
node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
4f172ae4b6 vmm: Retrieve PCI resources for VFIO and vfio-user devices
Relying on the function introduced recently to get the PCI resources and
handle the restore case, both VFIO and vfio-user device creation paths
now have access to PCI resources, which can be provided to the function
add_pci_device().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
0f12fe9b3b vmm: Factorize retrieval of PCI resources
Create a dedicated function for getting the PCI segment, b/d/f and
optional resources. This is meant for handling the potential case of a
restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Sebastien Boeuf
6e084572d4 pci, virtio: Make virtio-pci BAR restoration more generic
Updating the way of restoring BAR addresses for virtio-pci by providing
a more generic approach that will be reused for other PciDevice
implementations (i.e VfioPcidevice and VfioUserPciDevice).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-14 12:11:37 +02:00
Rob Bradford
b212f2823d vmm: Deprecate mergeable option from virtio-pmem
KSM would never merge the file backed pages so this option has no
effect.

See: #3968

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-12 07:12:25 -07:00
Rob Bradford
ed87e42e6f vm-device, pci, devices: Remove InterruptSourceGroup::{un}mask
The calls to these functions are always preceded by a call to
InterruptSourceGroup::update(). By adding a masked boolean to that
function call it possible to remove 50% of the calls to the
KVM_SET_GSI_ROUTING ioctl as the the update will correctly handle the
masked or unmasked case.

This causes the ioctl to disappear from the perf report for a boot of
the VM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-11 22:56:48 +01:00
Michael Zhao
d1b2a3fca9 aarch64: Add a memory-simulated flash for UEFI
EDK2 execution requires a flash device at address 0.

The new added device is not a fully functional flash. It doesn't
implement any spec of a flash device. Instead, a piece of memory is used
to simulate the flash simply.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-11 09:51:34 +01:00
Michael Zhao
298a5580a9 aarch64: Remove unnecessary function definitions
This is a refactoring commit to simplify source code.
Removed some functions that only return a layout const.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-08 11:08:43 -07:00
Michael Zhao
656425a328 aarch64: Align the data types in layout
Some addresses defined in `layout.rs` were of type `GuestAddress`, and
are `u64`. Now align the types of all the `*_START` definitions to
`GuestAddress`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-08 11:08:43 -07:00
Michael Zhao
848d88c122 aarch64: Reserve a hole in 32-bit space
The reserved space is for devices.
Some devices (like TPM) require arbitrary addresses close to 4GiB.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-05 11:04:52 +08:00
Michael Zhao
a3dbc3b415 aarch64: Change RAM_START type GuestAddress
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-05 11:04:52 +08:00
Michael Zhao
ef9f37cd5f aarch64: Rename RAM_64BIT_START in layout
`RAM_64BIT_START` was set to 1 GiB, not a real 64-bit address. Now
rename it `RAM_START` to avoid confusion.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-04-05 11:04:52 +08:00
Sebastien Boeuf
e76a5969e8 vmm: Add iommu parameter to VdpaConfig
Add a new iommu parameter to VdpaConfig in order to place the vDPA
device behind a virtual IOMMU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-05 00:09:52 +02:00
Sebastien Boeuf
00ce8277aa vmm: tdx: Fix the logic for generating HOB memory resources
The list of memory resources provided through the HOB wasn't accurate
because of the broken logic. The fix provides correct ranges to the
firmware.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-01 18:24:32 +01:00
Sebastien Boeuf
70222ffc1a vmm: tdx: Only report TempMem as reserved memory
Based on latest QEMU patches from branch tdx-qemu-2022.03.29-v7.0.0-rc1
we should only report as memory resources the TempMem sections from TDVF
sections.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-04-01 18:24:32 +01:00
Rob Bradford
7fd76eff05 vmm: Don't error if live resizing is not possible
The introduction of a error if live resizing is not possible is a
regression compared to the original behaviour where the new size would
be stored in the config and reflected in the next boot. This behaviour
was also inconsistent with the effect of resizing with no VM booted.

Instead of generating an error allow the code to go ahead and update the
config so that the new size will be available upon the reboot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-31 17:04:53 +01:00
Bo Chen
eed2a0d06b vmm: Add 'libc::SYS_shutdown' to vmm 'seccomp' filter list
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-03-31 09:22:07 +01:00
Fabiano Fidêncio
f049867cd9 vmm,memory_manager: Deny resizing only if the ram amount has changed
Similarly to the previous commit restricting the cpu resizing error only
to the situations where the vcpu amount has changed, let's do the same
with the memory and be consistent throughout our code base.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-30 21:29:08 +01:00
Fabiano Fidêncio
2c8045343c vmm,cpu: Deny resizing only if the vcpu amount has changed
188078467db42f50f5b7e7a7969738ebf8aec95c made clear that resize should
only happen when dealing with a "dynamic" CpuManager.  Although this is
very much correct, it causes a regression on Kata Containers (and on any
other consumer of Cloud Hypervisor) in cases where a resize would be
triggered but the vCPUs values wouldn't be changed.

There's no doubt Kata Containers could do better and do not call a
resize in such situations, and that's something that should **also** be
solved there.  However, we should also work this around on Cloud
Hypervisor side as it introduces a regression with the current Kata
Containers code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-30 21:29:08 +01:00
Sebastien Boeuf
3c973fa7ce virtio-devices: vhost-user: Add support for TDX
By enabling the VIRTIO feature VIRTIO_F_IOMMU_PLATFORM for all
vhost-user devices when needed, we force the guest to use the DMA API,
making these devices compatible with TDX. By using DMA API, the guest
triggers the TDX codepath to share some of the guest memory, in
particular the virtqueues and associated buffers so that the VMM and
vhost-user backends/processes can access this memory.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-30 10:32:23 +02:00
Rob Bradford
ca68b9e7a9 build: Remove "cmos" feature gate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-29 15:20:58 +01:00
Rob Bradford
e0d3efec6e devices: cmos: Implement CMOS based reset
If EFI reset fails on the Linux kernel then it will fallthrough to CMOS
reset. Implement this as one of our reset solutions.

Fixes: #3912

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-29 15:20:58 +01:00
Rob Bradford
7c0cf8cc23 arch, devices, vmm: Remove "acpi" feature gate
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-28 09:18:29 -07:00
William Douglas
6b0df31e5d vmm: Add support for enabling AMX in vm guests
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.

On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.

This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.

The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.

Signed-off-by: William Douglas <william.douglas@intel.com>
2022-03-25 14:11:54 -07:00
Bo Chen
639a7dd73a vmm: Improve 'test_config_validation' with precise Err assertions
Fixed: #3879

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-03-25 09:17:05 +00:00
Sebastien Boeuf
afd9f17b73 virtio-fs: Deprecate the DAX feature
Disable the DAX feature from the virtio-fs implementation as the feature
is still not stable. The feature is deprecated, meaning the 'dax'
parameter will be removed in about 2 releases cycles.

In the meantime, the parameter value is ignored and forced to be
disabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-24 10:39:11 -07:00
Rob Bradford
7a8061818e vmm: Don't expose MemoryManager ACPI functionality unless required
When running non-dynamic or with virtio-mem for hotplug the ACPI
functionality should not be included on the DSDT nor does the
MemoryManager need to be placed on the MMIO bus.

Fixes: #3883

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-24 13:17:51 +00:00
Rob Bradford
f6dfb42a64 vmm: cpu: Don't place CpuManager on MMIO bus when non-dynamic
This is now consistent with not supplying the _CRS for the device when
CpuManager is not dynamic.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-24 13:17:39 +00:00
Rob Bradford
bbf7fd5372 vmm: Reject memory resizing on TDX
This is similar to the dynamic concept used in CpuManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-23 23:15:20 +00:00
Rob Bradford
1756b23aea vmm: device_manager: Check IOMMU placed device hotplug
Rather than just printing a message return an error back through the API
if the user attempts to hotplug a device that supports being behind an
IOMMU where that device isn't placed on an IOMMU segment.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00