This simplifies the CI process but also logical with the existing
functionality under "guest_debug" (dumping guest memory).
Fixes: #4679
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add tracing of the VM boot sequence from the point at which the request
to create a VM is received to the hand-off to the vCPU threads running.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Given the AMX x86 feature has been made available since kernel v5.17,
and given we don't have any test validating this feature, there's no
need to keep it behing a Rust feature gate.
Fixes#3996
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Use VgicConfig to initialize Vgic.
Use Gic::create_default_config everywhere so we don't always recompute
redist/msi registers.
Add a helper create_test_vgic_config for tests in hypervisor crate.
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
On AArch64, `translate_gva` API is not provided by KVM. We implemented
it in VMM by walking through translation tables.
Address translation is big topic, here we only focus the scenario that
happens in VMM while debugging kernel. This `translate_gva`
implementation is restricted to:
- Exception Level 1
- Translate high address range only (kernel space)
This implementation supports following Arm-v8a features related to
address translation:
- FEAT_LPA
- FEAT_LVA
- FEAT_LPA2
The implementation supports page sizes of 4KiB, 16KiB and 64KiB.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
When starting the VM such that it is already on a breakpoint (via
stop_on_boot) when attached to gdb then start the vCPUs in a paused
state rather than starting the vCPUs later (upon resume).
Further, make the resumption/break of the VM more resilient by only
attempting to resume the vCPUs if were are already in a break point and
only attempting to pause/break if we were already running.
Fixes: #4354
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The original code uses kvm_device_attr directly outside of the
hyeprvisor crate. That leaks hypervisor details.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This requires making get/set_lapic_reg part of the type.
For the moment we cannot provide a default variant for the new type,
because picking one will be wrong for the other hypervisor, so I just
drop the test cases that requires LapicState::default().
Signed-off-by: Wei Liu <liuwe@microsoft.com>
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.
Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.
This allows us to decouple CpuIdEntry from hypervisors more easily.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
We only need to do this for x86 since MSHV does not have aarch64 support
yet. This reduces unnecessary code churn.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Function `system_registers` took mutable vector reference and modified
the vector content. Now change the definition to `get/set` style.
And rename to `get/set_sys_regs` to align with other functions.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
On AArch64, the function `core_registers` and `set_core_registers` are
the same thing of `get/set_regs` on x86_64. Now the names are aligned.
This will benefit supporting `gdb`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The Linux kernel now checks for this before marking CPUs as
hotpluggable:
commit aa06e20f1be628186f0c2dcec09ea0009eb69778
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Wed Sep 8 16:41:46 2021 -0500
x86/ACPI: Don't add CPUs that are not online capable
A number of systems are showing "hotplug capable" CPUs when they
are not really hotpluggable. This is because the MADT has extra
CPU entries to support different CPUs that may be inserted into
the socket with different numbers of cores.
Starting with ACPI 6.3 the spec has an Online Capable bit in the
MADT used to determine whether or not a CPU is hotplug capable
when the enabled bit is not set.
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html?#local-apic-flags
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The snapshots are stored in a BTree which is ordered however as the ids
are strings lexical ordering places "11" ahead of "2". So encode the
vCPU id with zero padding so it is lexically sorted.
This fixes issues with CPU restore on aarch64.
See: #4239
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Based on recent KVM host patches (merged in Linux 5.16), it's forbidden
to call into KVM_SET_CPUID2 after the first successful KVM_RUN returned.
That means saving CPU states during the pause sequence, and restoring
these states during the resume sequence will not work with the current
design starting with kernel version 5.16.
In order to solve this problem, let's simply move the save/restore logic
to the snapshot/restore sequences rather than the pause/resume ones.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The crash tool use a special note segment which named 'QEMU' to
analyze kaslr info and so on. If we don't add the 'QEMU' note
segment, crash tool can't find linux version to move on.
For now, the most convenient way is to add 'QEMU' note segment to
make crash tool happy.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Explicitly re-export types from the hypervisor specific modules. This
makes it much clearer what the common functionality that is exposed is.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
And thus only export what is necessary through a `pub use`. This is
consistent with some of the other modules and makes it easier to
understand what the external interface of the hypervisor crate is.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The trait and functionality is about operations on the VM rather than
the VMM so should be named appropriately. This clashed with with
existing struct for the concrete implementation that was renamed
appropriately.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We don't use the VmmOps trait directly for manipulating memory in the
core of the VMM as it's really designed for the MSHV crate to handle
instruction decoding. As I plan to make this trait MSHV specific to
allow reduced locking for MMIO and PIO handling when running on KVM this
use should be removed.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Some addresses defined in `layout.rs` were of type `GuestAddress`, and
are `u64`. Now align the types of all the `*_START` definitions to
`GuestAddress`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The introduction of a error if live resizing is not possible is a
regression compared to the original behaviour where the new size would
be stored in the config and reflected in the next boot. This behaviour
was also inconsistent with the effect of resizing with no VM booted.
Instead of generating an error allow the code to go ahead and update the
config so that the new size will be available upon the reboot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
188078467d made clear that resize should
only happen when dealing with a "dynamic" CpuManager. Although this is
very much correct, it causes a regression on Kata Containers (and on any
other consumer of Cloud Hypervisor) in cases where a resize would be
triggered but the vCPUs values wouldn't be changed.
There's no doubt Kata Containers could do better and do not call a
resize in such situations, and that's something that should **also** be
solved there. However, we should also work this around on Cloud
Hypervisor side as it introduces a regression with the current Kata
Containers code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.
On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.
This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.
The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.
Signed-off-by: William Douglas <william.douglas@intel.com>
This is now consistent with not supplying the _CRS for the device when
CpuManager is not dynamic.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This will significantly reduce the size of the DSDT and the effort
required to parse them if there is no requirement to support
hotplug/unplug.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the CpuManager is dynamic it devices CPUs can be
hotplugged/unplugged.
Since TDX does not support CPU hotplug this is currently the only
determinator as to whether the CpuManager is dynamic.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit adds event fds and the event handler to send/receive
requests and responses from the GDB thread. It also adds `--gdb` flag to
enable GDB stub feature.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
This commit adds initial gdb.rs implementation for `Debuggable` trait to
describe a debuggable component. Some part of the trait bound
implementations is based on the crosvm GDB stub code [1].
[1] https://github.com/google/crosvm/blob/main/src/gdb.rs
Signed-off-by: Akira Moroo <retrage01@gmail.com>
Based on the helpers from the hypervisor crate, the VMM can identify
what type of hypercall has been issued through the KVM_EXIT_TDX reason.
For now, we only log warnings and set the status to INVALID_OPERAND
since these hypercalls aren't supported. The proper handling will be
implemented later.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the object returned from CpuManager.create_vcpu() is never used,
we can avoid the cloning of this object.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As per this kernel documentation:
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals.
The pending state of the operation is not preserved in state which is
visible to userspace, thus userspace should ensure that the operation is
completed before performing a live migration. Userspace can re-enter the
guest with an unmasked signal pending or with the immediate_exit field set
to complete pending operations without allowing any further instructions
to be executed.
Since we capture the state as part of the pause and override it as part
of the resume we must ensure the state is consistent otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.
Fixes: #3658
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These are the leftovers from the commit 8155be2:
arch: aarch64: vm_memory is not required when configuring vcpu
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Drop the unused parameter throughout the code base.
Also take the chance to drop a needless clone.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
With the introduction of a new option `affinity` to the `cpus`
parameter, Cloud Hypervisor can now let the user choose the set
of host CPUs where to run each vCPU.
This is useful when trying to achieve CPU pinning, as well as making
sure the VM runs on a specific NUMA node.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allocator allocates 64-bit MMIO addresses for use with platform
devices e.g. ACPI control devices and ensures there is no overlap with
PCI address space ranges which can cause issues with PCI device
remapping.
Use this allocator the ACPI platform devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When using PVH for booting (which we use for all firmwares and direct
kernel boot) the Linux kernel does not configure LA57 correctly. As such
we need to limit the address space to the maximum 4-level paging address
space.
If the user knows that their guest image can take advantage of the
5-level addressing and they need it for their workload then they can
increase the physical address space appropriately.
This PR removes the TDX specific handling as the new address space limit
is below the one that that code specified.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These statements are useful for understanding the cause of reset or
shutdown of the VM and are not spammy so should be included at info!()
level.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We are relying on applying empty 'seccomp' filters to support the
'--seccomp false' option, which will be treated as an error with the
updated 'seccompiler' crate. This patch fixes this issue by explicitly
checking whether the 'seccomp' filter is empty before applying the
filter.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This is to make sure the NUMA node data structures can be accessed
both from the `vmm` crate and `arch` crate.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The AArch64 platform provides a NUMA binding for the device tree,
which means on AArch64 platform, the NUMA setup can be extended to
more than the ACPI feature.
Based on above, this commit extends the NUMA setup and data
structures to following scenarios:
- All AArch64 platform
- x86_64 platform with ACPI feature enabled
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Michael Zhao <Michael.Zhao@arm.com>
The optional Processor Properties Topology Table (PPTT) table is
used to describe the topological structure of processors controlled
by the OSPM, and their shared resources, such as caches. The table
can also describe additional information such as which nodes in the
processor topology constitute a physical package.
The ACPI PPTT table supports topology descriptions for ACPI guests.
Therefore, this commit adds the PPTT table for AArch64 to enable
CPU topology feature for ACPI.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
In an Arm system, the hierarchy of CPUs is defined through three
entities that are used to describe the layout of physical CPUs in
the system:
- cluster
- core
- thread
All these three entities have their own FDT node field. Therefore,
This commit adds an AArch64-specific helper to pass the config from
the Cloud Hypervisor command line to the `configure_system`, where
eventually the `create_fdt` is called.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
With the new beta version, clippy complains about redundant allocation
when using Arc<Box<dyn T>>, and suggests replacing it simply with
Arc<dyn T>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When running TDX guest, the Guest Physical Address space is limited by
a shared bit that is located on bit 47 for 4 level paging, and on bit 51
for 5 level paging (when GPAW bit is 1). In order to keep things simple,
and since a 47 bits address space is 128TiB large, we ensure to limit
the physical addressable space to 47 bits when runnning TDX.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This refactoring ensures all CPUID related operations are centralized in
`arch::x86_64` module, and exposes only two related public functions to
the vmm crate, e.g. `generate_common_cpuid` and `configure_vcpu`.
Signed-off-by: Bo Chen <chen.bo@intel.com>
In order to uniquely identify each SGX EPC section, we introduce a
mandatory option `id` to the `--sgx-epc` parameter.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We have been building Cloud Hypervisor with command like:
`cargo build --no-default-features --features ...`.
After implementing ACPI, we donot have to use specify all features
explicitly. Default build command `cargo build` can work.
This commit fixed some build warnings with default build option and
changed github workflow correspondingly.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Issue from beta verion of clippy:
error: field is never read: `type`
--> vmm/src/cpu.rs:235:5
|
235 | pub r#type: u8,
| ^^^^^^^^^^^^^^
|
= note: `-D dead-code` implied by `-D warnings`
Signed-off-by: Bo Chen <chen.bo@intel.com>
In order to allow a hotplugged vCPU to be assigned to the correct NUMA
node in the guest, the DSDT table must expose the _PXM method for each
vCPU. This method defines the proximity domain to which each vCPU should
be attached to.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.
The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Simplified definition block of CPU's on AArch64. It is not complete yet.
Guest boots. But more is to do in future:
- Fix the error in ACPI definition blocks (seen in boot messages)
- Implement CPU hot-plug controller
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
These messages are predominantly during the boot process but will also
occur during events such as hotplug.
These cover all the significant steps of the boot and can be helpful for
diagnosing performance and functionality issues during the boot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By disabling this KVM feature, we prevent the guest from using APF
(Asynchronous Page Fault) mechanism. The kernel has recently switched to
using interrupts to notify about a page being ready, but for some
reasons, this is causing unexpected behavior with Cloud Hypervisor, as
it will make the vcpu thread spin at 100%.
While investigating the issue, it's better to disable the KVM feature to
prevent 100% CPU usage in some cases.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There are two parts:
- Unconditionally zero the output area. The length of the incoming
vector has been seen from 1 to 4 bytes, even though just the first
byte might need to be handled. But also, this ensures any possibly
unhandled offset will return zeroed result to the caller. The former
implementation used an I/O port which seems to behave differently from
MMIO and wouldn't require explicit output zeroing.
- An access with zero offset still takes place and needs to be handled.
Fixes#2437.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
The following is from the Hyper-V specification v6.0b.
Cpuid leaf 0x40000003 EDX:
Bit 3: Support for physical CPU dynamic partitioning events is
available.
When Windows determines to be running under a hypervisor, it will
require this cpuid bit to be set to support dynamic CPU operations.
Cpuid leaf 0x40000004 EAX:
Bit 5: Recommend using relaxed timing for this partition. If
used, the VM should disable any watchdog timeouts that
rely on the timely delivery of external interrupts.
This bit has been figured out as required after seeing guest BSOD
when CPU hotplug bit is enabled. Race conditions seem to arise after a
hotplug operation, when a system watchdog has expired.
Closes#1799.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
warning: name `LocalAPIC` contains a capitalized acronym
--> vmm/src/cpu.rs:197:8
|
197 | struct LocalAPIC {
| ^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `LocalApic`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: name `SDT` contains a capitalized acronym
--> acpi_tables/src/sdt.rs:27:12
|
27 | pub struct SDT {
| ^^^ help: consider making the acronym lowercase, except the initial letter: `Sdt`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>