Every Vcpu is now created with the right state if there's an available
snapshot associated with it. This simplifies the restore logic.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving the creation of the vCPUs before the DeviceManager gets created
will allow for the aarch64 vGIC to be created before the DeviceManager
as well in a follow up patch. The end goal being to adopt the same
creation sequence for both x86_64 and aarch64, and keeping in mind that
the vGIC requires every vCPU to be created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Split the vCPU creation into two distincts parts. On the one hand we
create the actual Vcpu object with the creation of the hypervisor::Vcpu.
And on the other hand, we configure the existing Vcpu, setting registers
to proper values (such as setting the entry point).
This will allow for further work to move the creation earlier in the
boot, so that the hypervisor::Vcpu will be already created when the
DeviceManager gets created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The CpuManager is now created before the DeviceManager. This is required
as preliminary work for creating the vCPUs before the DeviceManager,
which is required to ensure both x86_64 and aarch64 follow the same
sequence.
It's important to note the optimization for faster PIO accesses on the
PCI config space had to be removed given the VmOps was required by the
CpuManager and by the Vcpu by extension. But given the PciConfigIo is
created as part of the DeviceManager, there was no proper way of moving
things around so that we could provide PciConfigIo early enough.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This simplifies the CI process but also logical with the existing
functionality under "guest_debug" (dumping guest memory).
Fixes: #4679
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add tracing of the VM boot sequence from the point at which the request
to create a VM is received to the hand-off to the vCPU threads running.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Given the AMX x86 feature has been made available since kernel v5.17,
and given we don't have any test validating this feature, there's no
need to keep it behing a Rust feature gate.
Fixes#3996
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Use VgicConfig to initialize Vgic.
Use Gic::create_default_config everywhere so we don't always recompute
redist/msi registers.
Add a helper create_test_vgic_config for tests in hypervisor crate.
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
On AArch64, `translate_gva` API is not provided by KVM. We implemented
it in VMM by walking through translation tables.
Address translation is big topic, here we only focus the scenario that
happens in VMM while debugging kernel. This `translate_gva`
implementation is restricted to:
- Exception Level 1
- Translate high address range only (kernel space)
This implementation supports following Arm-v8a features related to
address translation:
- FEAT_LPA
- FEAT_LVA
- FEAT_LPA2
The implementation supports page sizes of 4KiB, 16KiB and 64KiB.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
When starting the VM such that it is already on a breakpoint (via
stop_on_boot) when attached to gdb then start the vCPUs in a paused
state rather than starting the vCPUs later (upon resume).
Further, make the resumption/break of the VM more resilient by only
attempting to resume the vCPUs if were are already in a break point and
only attempting to pause/break if we were already running.
Fixes: #4354
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The original code uses kvm_device_attr directly outside of the
hyeprvisor crate. That leaks hypervisor details.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This requires making get/set_lapic_reg part of the type.
For the moment we cannot provide a default variant for the new type,
because picking one will be wrong for the other hypervisor, so I just
drop the test cases that requires LapicState::default().
Signed-off-by: Wei Liu <liuwe@microsoft.com>
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.
Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.
This allows us to decouple CpuIdEntry from hypervisors more easily.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
We only need to do this for x86 since MSHV does not have aarch64 support
yet. This reduces unnecessary code churn.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Function `system_registers` took mutable vector reference and modified
the vector content. Now change the definition to `get/set` style.
And rename to `get/set_sys_regs` to align with other functions.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
On AArch64, the function `core_registers` and `set_core_registers` are
the same thing of `get/set_regs` on x86_64. Now the names are aligned.
This will benefit supporting `gdb`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The Linux kernel now checks for this before marking CPUs as
hotpluggable:
commit aa06e20f1be628186f0c2dcec09ea0009eb69778
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Wed Sep 8 16:41:46 2021 -0500
x86/ACPI: Don't add CPUs that are not online capable
A number of systems are showing "hotplug capable" CPUs when they
are not really hotpluggable. This is because the MADT has extra
CPU entries to support different CPUs that may be inserted into
the socket with different numbers of cores.
Starting with ACPI 6.3 the spec has an Online Capable bit in the
MADT used to determine whether or not a CPU is hotplug capable
when the enabled bit is not set.
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html?#local-apic-flags
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The snapshots are stored in a BTree which is ordered however as the ids
are strings lexical ordering places "11" ahead of "2". So encode the
vCPU id with zero padding so it is lexically sorted.
This fixes issues with CPU restore on aarch64.
See: #4239
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Based on recent KVM host patches (merged in Linux 5.16), it's forbidden
to call into KVM_SET_CPUID2 after the first successful KVM_RUN returned.
That means saving CPU states during the pause sequence, and restoring
these states during the resume sequence will not work with the current
design starting with kernel version 5.16.
In order to solve this problem, let's simply move the save/restore logic
to the snapshot/restore sequences rather than the pause/resume ones.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The crash tool use a special note segment which named 'QEMU' to
analyze kaslr info and so on. If we don't add the 'QEMU' note
segment, crash tool can't find linux version to move on.
For now, the most convenient way is to add 'QEMU' note segment to
make crash tool happy.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Explicitly re-export types from the hypervisor specific modules. This
makes it much clearer what the common functionality that is exposed is.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
And thus only export what is necessary through a `pub use`. This is
consistent with some of the other modules and makes it easier to
understand what the external interface of the hypervisor crate is.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The trait and functionality is about operations on the VM rather than
the VMM so should be named appropriately. This clashed with with
existing struct for the concrete implementation that was renamed
appropriately.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We don't use the VmmOps trait directly for manipulating memory in the
core of the VMM as it's really designed for the MSHV crate to handle
instruction decoding. As I plan to make this trait MSHV specific to
allow reduced locking for MMIO and PIO handling when running on KVM this
use should be removed.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Some addresses defined in `layout.rs` were of type `GuestAddress`, and
are `u64`. Now align the types of all the `*_START` definitions to
`GuestAddress`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The introduction of a error if live resizing is not possible is a
regression compared to the original behaviour where the new size would
be stored in the config and reflected in the next boot. This behaviour
was also inconsistent with the effect of resizing with no VM booted.
Instead of generating an error allow the code to go ahead and update the
config so that the new size will be available upon the reboot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
188078467d made clear that resize should
only happen when dealing with a "dynamic" CpuManager. Although this is
very much correct, it causes a regression on Kata Containers (and on any
other consumer of Cloud Hypervisor) in cases where a resize would be
triggered but the vCPUs values wouldn't be changed.
There's no doubt Kata Containers could do better and do not call a
resize in such situations, and that's something that should **also** be
solved there. However, we should also work this around on Cloud
Hypervisor side as it introduces a regression with the current Kata
Containers code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>