Extend the Hypervisor API in order to retrieve the TDX capabilities from
the underlying hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit adds `VmExit::Debug` for x86/KVM. When the guest hits a
hardware breakpoint, `VcpuExit::Debug` vm exit occurs. This vm exit
will be handled with code implemented in the following commits.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
This commit adds `set_guest_debug` implementation for x86/KVM. This
function sets hardware breakpoints and single step to debug registers.
NOTE: The `set_guest_debug` implementation is based on the crosvm
implementation [1].
[1]
https://github.com/google/crosvm/blob/main/hypervisor/src/kvm/x86_64.rs
Signed-off-by: Akira Moroo <retrage01@gmail.com>
This commit adds `translate_gva` for x86/KVM. The same name function is
already implemented for MSHV, but the implementation differs as
KVM_TRANSLATE does not take the flag argument and does not return status
code. This change requires the newer version of kvm-ioctls [1].
[1]
97ff779b6e
Signed-off-by: Akira Moroo <retrage01@gmail.com>
Relying on the recent additions to the kvm-ioctls crate, this commit
implements the support for providing the exit reason details to the
caller, which allows the identification of the type of hypercall that
was issued. It also introduces a way for the consumer to set the status
code that must be sent back to the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As per this kernel documentation:
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals.
The pending state of the operation is not preserved in state which is
visible to userspace, thus userspace should ensure that the operation is
completed before performing a live migration. Userspace can re-enter the
guest with an unmasked signal pending or with the immediate_exit field set
to complete pending operations without allowing any further instructions
to be executed.
Since we capture the state as part of the pause and override it as part
of the resume we must ensure the state is consistent otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.
Fixes: #3658
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Like devicefd, vcpufd also has ability to set/has attribute through kvm
ioctl. These traits are used when enable PMU on arm64, so add it here.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
If the guest hasn't initialised a PV clock then the KVM_KVMCLOCK_CTRL
ioctl will return -EINVAL. Therefore if running in the firmware or an OS
that doesn't use the PV clock then we should ignore that error
Tested by migrating a VM that has not yet booted into the Linux kernel
(just in firmware) by specifying no disk image:
e.g. target/debug/cloud-hypervisor --kernel ~/workloads/hypervisor-fw --api-socket /tmp/api --serial tty --console off
Fixes: #3586
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extending the Vm trait with set_identity_map_address() in order to
expose this ioctl to the VMM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Place the 3 page TSS at an explicit location in the 32-bit address space
to avoid conflicting with the loaded raw firmware.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The variable tmp was never initialized. Calling assume_init when the
content is not yet initialized causes immediate undefined behaviour.
We also cannot create any intermediate references because they will be
subject to the same requirements for references -- the referred object
must be valid.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
When the synthetic interrupt controller is enabled, an extra set of MSRs
must be stored in case of migration. There was one MSR missing in the
list, HV_X64_MSR_SINT14 corresponding to the 15th interrupt source from
the synthetic interrupt controller.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Update the kvm-bindings dependency so that Cloud Hypervisor now depends
on the version 0.5.0, which is based on Linux kernel v5.13.0. We still
have to rely on a forked version to be able to serialize all the KVM
structures we need.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Definition of kvm_tdx_init_vm used by INIT_VM has been updated in latest
kernel, needing an update on the Cloud Hypervisor side as well.
Update structure TdxInitVm to fit this change and avoid -EINVAL to be
returned by the kernel.
Signed-off-by: Jiaqi Gao <jiaqi.gao@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Right now, get_dirty_log API has two parameters,
slot and memory_size.
MSHV needs gpa to retrieve the page states. GPA is
needed as MSHV returns the state base on PFN.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
With the new beta version, clippy complains about redundant allocation
when using Arc<Box<dyn T>>, and suggests replacing it simply with
Arc<dyn T>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Following KVM interfaces, the `hypervisor` crate now provides interfaces
to start/stop the dirty pages logging on a per region basis, and asks
its users (e.g. the `vmm` crate) to iterate over the regions that needs
dirty pages log. MSHV only has a global control to start/stop dirty
pages log on all regions at once.
This patch refactors related APIs from the `hypervisor` crate to provide
a global control to start/stop dirty pages log (following MSHV's
behaviors), and keeps tracking the regions need dirty pages log for
KVM. It avoids leaking hypervisor-specific behaviors out of the
`hypervisor` crate.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch extends slightly the current live-migration code path with
the ability to dynamically start and stop logging dirty-pages, which
relies on two new methods added to the `hypervisor::vm::Vm` Trait. This
patch also contains a complete implementation of the two new methods
based on `kvm` and placeholders for `mshv` in the `hypervisor` crate.
Fixes: #2858
Signed-off-by: Bo Chen <chen.bo@intel.com>
We need a dedicated function to enable the SGX attribute capability
through the Hypervisor abstraction.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.
MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.
Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.
This fixes user memory region removal on MSHV.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Issue from beta verion of clippy:
Error: --> vm-virtio/src/queue.rs:700:59
|
700 | if let Some(used_event) = self.get_used_event(&mem) {
| ^^^^ help: change this to: `mem`
|
= note: `-D clippy::needless-borrow` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
Signed-off-by: Bo Chen <chen.bo@intel.com>
Not all AArch64 platforms support IPAs up to 40 bits. Since the
kvm-ioctl crate now supports `get_host_ipa_limit` for AArch64,
when creating the VM, it is better to get the IPA size from the
host and use that to create the VM.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit adds a helper `get_host_ipa_limit` to the AArch64
`KvmHypervisor` struct. This helper can be used to get the
`Host_IPA_Limit`, which is the maximum possible value for
IPA_Bits on the host and is dependent on the CPU capability
and the kernel configuration.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
error: name `FinalizeTDX` contains a capitalized acronym
--> vmm/src/vm.rs:274:5
|
274 | FinalizeTDX(hypervisor::HypervisorVmError),
| ^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `FinalizeTdx`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: name `TranslateGVA` contains a capitalized acronym
--> hypervisor/src/arch/emulator/mod.rs:51:5
|
51 | TranslateGVA(#[source] anyhow::Error),
| ^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `TranslateGva`
|
= note: `#[warn(clippy::upper_case_acronyms)]` on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
refactor vec_with_array_field to common hypervisor code
so that mshv can also make use of it.
Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
On AArch64, interrupt controller (GIC) is emulated by KVM. VMM need to
set IRQ routing for devices, including legacy ones.
Before this commit, IRQ routing was only set for MSI. Legacy routing
entries of type KVM_IRQ_ROUTING_IRQCHIP were missing. That is way legacy
devices (like serial device ttyS0) does not work.
The setting of X86 IRQ routing entries are not impacted.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Add API to the hypervisor interface and implement for KVM to allow the
special TDX KVM ioctls on the VM and vCPU FDs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In particular update for the vmm-sys-util upgrade and all the other
dependent packages. This requires an updated forked version of
kvm-bindings (due to updated vfio-ioctls) but allowed the removal of our
forked version of kvm-ioctls.
The changes to the API from kvm-ioctls and vmm-sys-util required some
other minor changes to the code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: field assignment outside of initializer for an instance created with Default::default()
Error: --> hypervisor/src/kvm/mod.rs:1239:9
|
1239 | state.mp_state = self.get_mp_state()?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D clippy::field-reassign-with-default` implied by `-D warnings`
note: consider initializing the variable with `kvm::aarch64::VcpuKvmState { mp_state: self.get_mp_state()?, ..Default::default() }` and removing relevant reassignments
--> hypervisor/src/kvm/mod.rs:1237:9
|
1237 | let mut state = CpuState::default();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#field_reassign_with_default
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: field assignment outside of initializer for an instance created with Default::default()
--> hypervisor/src/kvm/mod.rs:318:9
|
318 | cap.cap = KVM_CAP_SPLIT_IRQCHIP;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D clippy::field-reassign-with-default` implied by `-D warnings`
note: consider initializing the variable with `kvm_bindings::kvm_enable_cap { cap: KVM_CAP_SPLIT_IRQCHIP, ..Default::default() }` and removing relevant reassignments
--> hypervisor/src/kvm/mod.rs:317:9
|
317 | let mut cap: kvm_enable_cap = Default::default();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#field_reassign_with_default
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently these two macros(msr, msr_data) reside both on kvm and mshv
module. Definition is same for both module. Moving them to arch/x86
module eliminates redundancy and makes more sense.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
When a total ordering between multiple atomic variables is not required
then use Ordering::Acquire with atomic loads and Ordering::Release with
atomic stores.
This will improve performance as this does not require a memory fence
on x86_64 which Ordering::SeqCst will use.
Add a comment to the code in the vCPU handling code where it operates on
multiple atomics to explain why Ordering::SeqCst is required.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to validate emulated memory accesses, we need to be able to get
all the segments descriptor attributes.
This is done by abstracting the SegmentRegister attributes through a
trait that each hypervisor will have to implement.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This interface is used by the vCPU thread to delegate responsibility for
handling MMIO/PIO operations and to support different approaches than a
VM exit.
During profiling I found that we were spending 13.75% of the boot CPU
uage acquiring access to the object holding the VmmOps via
ArcSwap::load_full()
13.75% 6.02% vcpu0 cloud-hypervisor [.] arc_swap::ArcSwapAny<T,S>::load_full
|
---arc_swap::ArcSwapAny<T,S>::load_full
|
--13.43%--<hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run
std::sys_common::backtrace::__rust_begin_short_backtrace
core::ops::function::FnOnce::call_once{{vtable-shim}}
std::sys::unix:🧵:Thread:🆕:thread_start
However since the object implementing VmmOps does not need to be mutable
and it is only used from the vCPU side we can change the ownership to
being a simple Arc<> that is passed in when calling create_vcpu().
This completely removes the above CPU usage from subsequent profiles.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Cloning the ArcSwapOption (like the ArcSwap) does not act like a
.clone() on an Arc, instead an entirely new ArcSwap is created with the
same contents. To correctly share the ArcSwap needs to be placed inside
an Arc.
See: 2433d5719b (diff-6c6d94533c44c19bd1416ef17bad1a878e63dca6e98d59181228fbe8f967c62bR6)
Due to this being wrongly used ::clone() was removed from
ArcSwap/ArcSwapOption in 1.0.0.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>