The following is from the Hyper-V specification v6.0b.
Cpuid leaf 0x40000003 EDX:
Bit 3: Support for physical CPU dynamic partitioning events is
available.
When Windows determines to be running under a hypervisor, it will
require this cpuid bit to be set to support dynamic CPU operations.
Cpuid leaf 0x40000004 EAX:
Bit 5: Recommend using relaxed timing for this partition. If
used, the VM should disable any watchdog timeouts that
rely on the timely delivery of external interrupts.
This bit has been figured out as required after seeing guest BSOD
when CPU hotplug bit is enabled. Race conditions seem to arise after a
hotplug operation, when a system watchdog has expired.
Closes#1799.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
warning: name `LocalAPIC` contains a capitalized acronym
--> vmm/src/cpu.rs:197:8
|
197 | struct LocalAPIC {
| ^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `LocalApic`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: name `SDT` contains a capitalized acronym
--> acpi_tables/src/sdt.rs:27:12
|
27 | pub struct SDT {
| ^^^ help: consider making the acronym lowercase, except the initial letter: `Sdt`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When booting with TDX no kernel is supplied as the TDFV is responsible
for loading the OS. The requirement to have the kernel is still
currently enforced at the validation entry point; this change merely
changes function prototypes and stored state to use Option<> to support.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In particular update for the vmm-sys-util upgrade and all the other
dependent packages. This requires an updated forked version of
kvm-bindings (due to updated vfio-ioctls) but allowed the removal of our
forked version of kvm-ioctls.
The changes to the API from kvm-ioctls and vmm-sys-util required some
other minor changes to the code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the function can never return an error this is now a clippy failure:
error: this function's return value is unnecessarily wrapped by `Result`
--> virtio-devices/src/watchdog.rs:215:5
|
215 | / fn set_state(&mut self, state: &WatchdogState) -> io::Result<()> {
216 | | self.common.avail_features = state.avail_features;
217 | | self.common.acked_features = state.acked_features;
218 | | // When restoring enable the watchdog if it was previously enabled. We reset the timer
... |
223 | | Ok(())
224 | | }
| |_____^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_wraps
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This reflects that it generates CPUID state used across all vCPUs.
Further ensure that errors from this function get correctly propagated.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for populating the CPUID with KVM HyperV emulation details from
the per-vCPU CPUID handling code to the shared CPUID handling code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for populating the CPUID with details of the CPU
identification from the per-vCPU CPUID handling code to the shared CPUID
handling code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for populating the CPUID with details of the maximum
address space from the per-vCPU CPUID handling code to the shared CPUID
handling code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
_EJx built in should not return.
dsdt.dsl 813: Return (CEJ0 (0x00))
Warning 3104 - ^ Reserved method should not return a value (_EJ0)
dsdt.dsl 813: Return (CEJ0 (0x00))
Error 6080 - ^ Called method returns no value
Fixes: #2216
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The mutex timeout should be 0xffff rather than 0xfff to disable the
timeout feature.
dsdt.dsl 745: Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning 3130 - ^ Result is not used, possible operator timeout will be missed
dsdt.dsl 767: Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning 3130 - ^ Result is not used, possible operator timeout will be missed
dsdt.dsl 775: Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning 3130 - ^ Result is not used, possible operator timeout will be missed
Fixes: #2216
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This skeleton commit brings in the support for compiling aarch64 with
the "acpi" feature ready to the ACPI enabling. It builds on the work to
move the ACPI hotplug devices from I/O ports to MMIO and conditionalises
any code that is x86_64 only (i.e. because it uses an I/O port.)
Filling in the aarch64 specific details in tables such as the MADT it
out of the scope.
See: #2178
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This can be uses to indicate to the caller that it should wait on the
barrier before returning as there is some asynchronous activity
triggered by the write which requires the KVM exit to block until it's
completed.
This is useful for having vCPU thread wait for the VMM thread to proceed
to activate the virtio devices.
See #1863
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There are some code base and function which are purely KVM specific for
now and we don't have those supports in mshv at the moment but we have plan
for the future. We are doing a feature guard with KVM. For example, KVM has
mp_state, cpu clock support, which we don't have for mshv. In order to build
those code we are making the code base for KVM specific compilation.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
When a total ordering between multiple atomic variables is not required
then use Ordering::Acquire with atomic loads and Ordering::Release with
atomic stores.
This will improve performance as this does not require a memory fence
on x86_64 which Ordering::SeqCst will use.
Add a comment to the code in the vCPU handling code where it operates on
multiple atomics to explain why Ordering::SeqCst is required.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This interface is used by the vCPU thread to delegate responsibility for
handling MMIO/PIO operations and to support different approaches than a
VM exit.
During profiling I found that we were spending 13.75% of the boot CPU
uage acquiring access to the object holding the VmmOps via
ArcSwap::load_full()
13.75% 6.02% vcpu0 cloud-hypervisor [.] arc_swap::ArcSwapAny<T,S>::load_full
|
---arc_swap::ArcSwapAny<T,S>::load_full
|
--13.43%--<hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run
std::sys_common::backtrace::__rust_begin_short_backtrace
core::ops::function::FnOnce::call_once{{vtable-shim}}
std::sys::unix:🧵:Thread:🆕:thread_start
However since the object implementing VmmOps does not need to be mutable
and it is only used from the vCPU side we can change the ownership to
being a simple Arc<> that is passed in when calling create_vcpu().
This completely removes the above CPU usage from subsequent profiles.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The logic to handle AArch64 system event was: SHUTDOWN and RESET were
all treated as RESET.
Now we handle them differently:
- RESET event will trigger Vmm::vm_reboot(),
- SHUTDOWN event will trigger Vmm::vm_shutdown().
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Now Vcpu::run() returns a boolean value to VcpuManager, indicating
whether the VM is going to reboot (false) or just continue (true).
Moving the handling of hypervisor VCPU run result from Vcpu to
VcpuManager gives us the flexibility to handle more scenarios like
shutting down on AArch64.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The snasphot/restore feature is not working because some CPU states are
not properly saved, which means they can't be restored later on.
First thing, we ensure the CPUID is stored so that it can be properly
restored later. The code is simplified and pushed down to the hypervisor
crate.
Second thing, we identify for each vCPU if the Hyper-V SynIC device is
emulated or not. In case it is, that means some specific MSRs will be
set by the guest. These MSRs must be saved in order to properly restore
the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We turn on that emulation for Windows. Windows does not have KVM's PV
clock, so calling notify_guest_clock_paused results in an error.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
If the user specified a maximum physical bits value through the
`max_phys_bits` option from `--cpus` parameter, the guest CPUID
will be patched accordingly to ensure the guest will find the
right amount of physical bits.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The OneRegister literally means "one (arbitrary) register". Just call it
"Register" instead. There is no need to inherit KVM's naming scheme in
the hypervisor agnostic code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Run loop in hypervisor needs a callback mechanism to access resources
like guest memory, mmio, pio etc.
VmmOps trait is introduced here, which is implemented by vmm module.
While handling vcpuexits in run loop, this trait allows hypervisor
module access to the above mentioned resources via callbacks.
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The MTRR feature was missing from the CPUID, which is causing the guest
to ignore the MTRR settings exposed through dedicated MSRs.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Similarly as the VM booting process, on AArch64 systems,
the vCPUs should be created before the creation of GIC. This
commit refactors the vCPU save/restore code to achieve the
above-mentioned restoring order.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Since calling `KVM_GET_ONE_REG` before `KVM_VCPU_INIT` will
result in an error: Exec format error (os error 8). This commit
decouples the vCPU init process from `configure_vcpus`. Therefore
in the process of restoring the vCPUs, these vCPUs can be
initialized separately before started.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The construction of `GICR_TYPER` register will need vCPU states.
Therefore this commit adds methods to extract saved vCPU states
from the cpu manager.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Adds 3 more unit test cases for AArch64:
*save_restore_core_regs
*save_restore_system_regs
*get_set_mpstate
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit ports code from firecracker and refactors the existing
AArch64 code as the preparation for implementing save/restore
AArch64 vCPU, including:
1. Modification of `arm64_core_reg` macro to retrive the index of
arm64 core register and implemention of a helper to determine if
a register is a system register.
2. Move some macros and helpers in `arch` crate to the `hypervisor`
crate.
3. Added related unit tests for above functions and macros.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Inject CPUID leaves for advertising KVM HyperV support when the
"kvm_hyperv" toggle is enabled. Currently we only enable a selection of
features required to boot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently we don't need to do anything to service these exits but when
the synthetic interrupt controller is active an exit will be triggered
to notify the VMM of details of the synthetic interrupt page.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This removes the dependency of the pci crate on the devices crate which
now only contains the device implementations themselves.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Some OS might check for duplicates and bail out, if it can't create a
distinct mapping. According to ACPI 5.0 section 6.1.12, while _UID is
optional, it becomes required when there are multiple devices with the
same _HID.
Signed-off-by: Anatol Belski <ab@php.net>
The support for SGX is exposed to the guest through CPUID 0x12. KVM
passes static subleaves 0 and 1 from the host to the guest, without
needing any modification from the VMM itself.
But SGX also relies on dynamic subleaves 2 through N, used for
describing each EPC section. This is not handled by KVM, which means
the VMM is in charge of setting each subleaf starting from index 2
up to index N, depending on the number of EPC sections.
These subleaves 2 through N are not listed as part of the supported
CPUID entries from KVM. But it's important to set them as long as index
0 and 1 are present and indicate that SGX is supported.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of passing the GuestMemoryMmap directly to the CpuManager upon
its creation, it's better to pass a reference to the MemoryManager. This
way we will be able to know if SGX EPC region along with one or multiple
sections are present.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to move the hypervisor specific parts of the VM exit handling
path, we're defining a generic, hypervisor agnostic VM exit enum.
This is what the hypervisor's Vcpu run() call should return when the VM
exit can not be completely handled through the hypervisor specific bits.
For KVM based hypervisors, this means directly forwarding the IO related
exits back to the VMM itself. For other hypervisors that e.g. rely on the
VMM to decode and emulate instructions, this means the decoding itself
would happen in the hypervisor crate exclusively, and the rest of the VM
exit handling would be handled through the VMM device model implementation.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Fix test_vm unit test by using the new abstraction and dropping some
dead code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The fd naming is quite KVM specific. Since we're now using the
hypervisor crate abstractions, we can rename those into something more
readable and meaningful. Like e.g. vcpu or vm.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We need consistency between pause/resume and snapshot/restore
operations. The symmetrical behavior of pausing/snapshotting
and restoring/resuming has been introduced recently, and we must
now ensure that no matter if we're using pause/resume or
snapshot/restore features, the resulting VM should be running in
the exact same way.
That's why the vCPU state is now stored upon VM pausing. The snapshot
operation being a simple serialization of the previously saved state.
The same way, the vCPU state is now restored upon VM resuming. The
restore operation being a simple deserialization of the previously
restored state.
It's interesting to note that this patch ensures time consistency from a
guest perspective, no matter which clocksource is being used. From a
previous patch, the KVM clock was saved/restored upon VM pause/resume.
We now have the same behavior for TSC, as the TSC from the vCPUs are
saved/restored upon VM pause/resume too.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>