This hypervisor leaf includes details of the TSC frequency if that is
available from KVM. This can be used to efficiently calculate time
passed when there is an invariant TSC.
TEST=Run `cpuid` in the guest and observe the frequency populated.
Fixes: #5178
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This is required for booting Linux:
From: https://lore.kernel.org/all/20221028141220.29217-3-kirill.shutemov@linux.intel.com/
"""
Virtualization Exceptions (#VE) are delivered to TDX guests due to
specific guest actions such as using specific instructions or accessing
a specific MSR.
Notable reason for #VE is access to specific guest physical addresses.
It requires special security considerations as it is not fully in
control of the guest kernel. VMM can remove a page from EPT page table
and trigger #VE on access.
The primary use-case for #VE on a memory access is MMIO: VMM removes
page from EPT to trigger exception in the guest which allows guest to
emulate MMIO with hypercalls.
MMIO only happens on shared memory. All conventional kernel memory is
private. This includes everything from kernel stacks to kernel text.
Handling exceptions on arbitrary accesses to kernel memory is
essentially impossible as handling #VE may require access to memory
that also triggers the exception.
TDX module provides mechanism to disable #VE delivery on access to
private memory. If SEPT_VE_DISABLE TD attribute is set, private EPT
violation will not be reflected to the guest as #VE, but will trigger
exit to VMM.
Make sure the attribute is set by VMM. Panic otherwise.
There's small window during the boot before the check where kernel has
early #VE handler. But the handler is only for port I/O and panic as
soon as it sees any other #VE reason.
SEPT_VE_DISABLE makes SEPT violation unrecoverable and terminating the
TD is the only option.
Kernel has no legitimate use-cases for #VE on private memory. It is
either a guest kernel bug (like access of unaccepted memory) or
malicious/buggy VMM that removes guest page that is still in use.
In both cases terminating TD is the right thing to do.
"""
With this change Cloud Hypervisor can boot the current Linux guest
kernel.
Reported-By: Jiaqi Gao <jiaqi.gao@intel.com
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to comply with latest TDX version, we rely onto the branch
kvm-upstream-2022.08.07-v5.19-rc8 from https://github.com/intel/tdx
repository. Updates are based on changes that happened in
arch/x86/include/uapi/asm/kvm.h headers file.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The double underscore made it different from how other projects would
name this particular macro.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The latest kvm-ioctls contains a breaking change to its API. Now Arm's
get/set_one_reg use u128 instead of u64.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Use VgicConfig to initialize Vgic.
Use Gic::create_default_config everywhere so we don't always recompute
redist/msi registers.
Add a helper create_test_vgic_config for tests in hypervisor crate.
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Set the maximum number of HW breakpoints according to the value returned
from `Hypervisor::get_guest_debug_hw_bps()`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Added `Hypervisor::get_guest_debug_hw_bps()` for fetching the number of
supported hardware breakpoints.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The field can be set directly.
This eliminates one place where dyn Device is used outside of KVM
aarch64 code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The code was moved from the vmm crate to the hypervisor crate. After the
move it is trivially obvious that it only works with KVM. Use concrete
types where possible.
This allows us to drop create_device from the Vm trait.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The original code uses kvm_device_attr directly outside of the
hyeprvisor crate. That leaks hypervisor details.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This requires making get/set_lapic_reg part of the type.
For the moment we cannot provide a default variant for the new type,
because picking one will be wrong for the other hypervisor, so I just
drop the test cases that requires LapicState::default().
Signed-off-by: Wei Liu <liuwe@microsoft.com>
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.
Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.
This allows us to decouple CpuIdEntry from hypervisors more easily.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
We only need to do this for x86 since MSHV does not have aarch64 support
yet. This reduces unnecessary code churn.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
VmState was introduced to hold hypervisor specific VM state. KVM does
not need it and MSHV does not really use it yet.
Just drop the code. It can be easily revived once there is a need.
Signed-off-by: Wei Liu <liuwe@microsoft.com>