Relying on the recent additions to the kvm-ioctls crate, this commit
implements the support for providing the exit reason details to the
caller, which allows the identification of the type of hypercall that
was issued. It also introduces a way for the consumer to set the status
code that must be sent back to the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A new fork of the kvm-bindings crate has been submitted to the
ch-v0.5.0-tdx branch. It contains updated bindings for x86 to support
TDX.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As per this kernel documentation:
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals.
The pending state of the operation is not preserved in state which is
visible to userspace, thus userspace should ensure that the operation is
completed before performing a live migration. Userspace can re-enter the
guest with an unmasked signal pending or with the immediate_exit field set
to complete pending operations without allowing any further instructions
to be executed.
Since we capture the state as part of the pause and override it as part
of the resume we must ensure the state is consistent otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.
Fixes: #3658
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Like devicefd, vcpufd also has ability to set/has attribute through kvm
ioctl. These traits are used when enable PMU on arm64, so add it here.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
If the guest hasn't initialised a PV clock then the KVM_KVMCLOCK_CTRL
ioctl will return -EINVAL. Therefore if running in the firmware or an OS
that doesn't use the PV clock then we should ignore that error
Tested by migrating a VM that has not yet booted into the Linux kernel
(just in firmware) by specifying no disk image:
e.g. target/debug/cloud-hypervisor --kernel ~/workloads/hypervisor-fw --api-socket /tmp/api --serial tty --console off
Fixes: #3586
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extending the Vm trait with set_identity_map_address() in order to
expose this ioctl to the VMM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Place the 3 page TSS at an explicit location in the 32-bit address space
to avoid conflicting with the loaded raw firmware.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Simple aggregate types are Sync by default. There is no need to use
`impl Sync` for MockVmm (a simple struct).
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The variable tmp was never initialized. Calling assume_init when the
content is not yet initialized causes immediate undefined behaviour.
We also cannot create any intermediate references because they will be
subject to the same requirements for references -- the referred object
must be valid.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Because anyhow version 1.0.46 has been yanked, let's move back to the
previous version 1.0.45.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This was causing some issues because of the use of 2 different versions
for the vm-memmory crate. We'll wait for all dependencies to be properly
resolved before we move to 0.7.0.
This reverts commit 76b6c62d07.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Hypercall register needs to be saved and restored for
TLB flush and IPI synthetic features enablement.
Enabling these two synthetic features improves
guest performance.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
When the synthetic interrupt controller is enabled, an extra set of MSRs
must be stored in case of migration. There was one MSR missing in the
list, HV_X64_MSR_SINT14 corresponding to the 15th interrupt source from
the synthetic interrupt controller.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Updating kvm-ioctls from 0.9.0 to 0.10.0 now that Cloud Hypervisor
relies on kvm-bindings 0.5.0.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Update the kvm-bindings dependency so that Cloud Hypervisor now depends
on the version 0.5.0, which is based on Linux kernel v5.13.0. We still
have to rely on a forked version to be able to serialize all the KVM
structures we need.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The rust-vmm crates we're pulling from git have renamed their main
branches. We need to update the branch names we're giving to Cargo,
or people who don't have these dependencies cached will get errors
like this when trying to build:
error: failed to get `vm-fdt` as a dependency of package `arch v0.1.0 (/home/src/cloud-hypervisor/arch)`
Caused by:
failed to load source for dependency `vm-fdt`
Caused by:
Unable to update https://github.com/rust-vmm/vm-fdt?branch=master#031572a6
Caused by:
object not found - no match for id (031572a6edc2f566a7278f1e17088fc5308d27ab); class=Odb (9); code=NotFound (-3)
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Definition of kvm_tdx_init_vm used by INIT_VM has been updated in latest
kernel, needing an update on the Cloud Hypervisor side as well.
Update structure TdxInitVm to fit this change and avoid -EINVAL to be
returned by the kernel.
Signed-off-by: Jiaqi Gao <jiaqi.gao@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This doesn't really affect the build as we ship a Cargo.lock with fixed
versions in. However for clarity it makes sense to use fixed versions
throughout and let dependabot update them.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Moving the MSHV crate form Cloud Hypervisor to rust-vmm
is done. This patch update the MSHV referent to rust-vmm
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This patch modify the existing live migration code
to support MSHV. Adds couple of new functions to enable
and disable dirty page tracking. Add missing IOCTL
to the seccomp rules for live migration.
Adds necessary flags for MSHV.
This changes don't affect KVM functionality at all.
In order to get better performance it is good to
enable dirty page tracking when we start live migration
and disable it when the migration is done.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Right now, get_dirty_log API has two parameters,
slot and memory_size.
MSHV needs gpa to retrieve the page states. GPA is
needed as MSHV returns the state base on PFN.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
With the new beta version, clippy complains about redundant allocation
when using Arc<Box<dyn T>>, and suggests replacing it simply with
Arc<dyn T>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Following KVM interfaces, the `hypervisor` crate now provides interfaces
to start/stop the dirty pages logging on a per region basis, and asks
its users (e.g. the `vmm` crate) to iterate over the regions that needs
dirty pages log. MSHV only has a global control to start/stop dirty
pages log on all regions at once.
This patch refactors related APIs from the `hypervisor` crate to provide
a global control to start/stop dirty pages log (following MSHV's
behaviors), and keeps tracking the regions need dirty pages log for
KVM. It avoids leaking hypervisor-specific behaviors out of the
`hypervisor` crate.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch extends slightly the current live-migration code path with
the ability to dynamically start and stop logging dirty-pages, which
relies on two new methods added to the `hypervisor::vm::Vm` Trait. This
patch also contains a complete implementation of the two new methods
based on `kvm` and placeholders for `mshv` in the `hypervisor` crate.
Fixes: #2858
Signed-off-by: Bo Chen <chen.bo@intel.com>
We need a dedicated function to enable the SGX attribute capability
through the Hypervisor abstraction.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.
MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.
Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.
This fixes user memory region removal on MSHV.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This vcpu API is necessary for MSHV related debugging.
These two registers controls the vcpu_run in the
/dev/mshv driver code.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Issue from beta verion of clippy:
Error: --> vm-virtio/src/queue.rs:700:59
|
700 | if let Some(used_event) = self.get_used_event(&mem) {
| ^^^^ help: change this to: `mem`
|
= note: `-D clippy::needless-borrow` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
Signed-off-by: Bo Chen <chen.bo@intel.com>
Initialize MTRR defType register the same way the KVM code does - WB caching by default.
Tested with latest mshv code.
Without this patch, these lines are present in guest serial log:
[ 0.000032] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.000036] CPU MTRRs all blank - virtualized system.
This indicates the guest is detecting the set MTRR.
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This patch addresses this issue https://github.com/rust-lang/rust-bindgen/pull/2064.
While we access field of packed struct the compiler can generate the
correct code to create a temporary variable to access the packed struct
field. Access withing {} ensures that.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Not all AArch64 platforms support IPAs up to 40 bits. Since the
kvm-ioctl crate now supports `get_host_ipa_limit` for AArch64,
when creating the VM, it is better to get the IPA size from the
host and use that to create the VM.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit adds a helper `get_host_ipa_limit` to the AArch64
`KvmHypervisor` struct. This helper can be used to get the
`Host_IPA_Limit`, which is the maximum possible value for
IPA_Bits on the host and is dependent on the CPU capability
and the kernel configuration.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Mark it as unreachable for now in the default implementation as this is
currently only used on tdx code path which is KVM only.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: name `FinalizeTDX` contains a capitalized acronym
--> vmm/src/vm.rs:274:5
|
274 | FinalizeTDX(hypervisor::HypervisorVmError),
| ^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `FinalizeTdx`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: an implementation of `From` is preferred since it gives you `Into<_>` for free where the reverse isn't true
--> hypervisor/src/vm.rs:41:1
|
41 | impl Into<u64> for DataMatch {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(clippy::from_over_into)]` on by default
= help: consider to implement `From` instead
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#from_over_into
warning: 1 warning emitted
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: name `TranslateGVA` contains a capitalized acronym
--> hypervisor/src/arch/emulator/mod.rs:51:5
|
51 | TranslateGVA(#[source] anyhow::Error),
| ^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `TranslateGva`
|
= note: `#[warn(clippy::upper_case_acronyms)]` on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
refactor vec_with_array_field to common hypervisor code
so that mshv can also make use of it.
Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
Bumps iced-x86 from 1.10.3 to 1.11.0.
Manual update of the code was needed since memory_displacement() was
deprecated and replaced with either memory_displacement32() or
memory_displacement64().
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
On AArch64, interrupt controller (GIC) is emulated by KVM. VMM need to
set IRQ routing for devices, including legacy ones.
Before this commit, IRQ routing was only set for MSI. Legacy routing
entries of type KVM_IRQ_ROUTING_IRQCHIP were missing. That is way legacy
devices (like serial device ttyS0) does not work.
The setting of X86 IRQ routing entries are not impacted.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Add API to the hypervisor interface and implement for KVM to allow the
special TDX KVM ioctls on the VM and vCPU FDs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These need to be updated together as the kvm-ioctls depends upon a
strictly newer version of kvm-bindings which requires a rebase in the CH
fork.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
So far we've only had the need to emulate one instruction. There is no
need to use a HashMap when a simple tuple for the initial mapping will
do.
We can bring back the HashMap once more sophisticated use cases surface.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This avoids code complexity down the line when we get around
implementing Windows support.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
In particular update for the vmm-sys-util upgrade and all the other
dependent packages. This requires an updated forked version of
kvm-bindings (due to updated vfio-ioctls) but allowed the removal of our
forked version of kvm-ioctls.
The changes to the API from kvm-ioctls and vmm-sys-util required some
other minor changes to the code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
According to Intel's mnemonic (which is used by iced-x86) the first
argument is destination while the second is source.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
We don't have an easy way to figure out if a GPA points to normal memory
or device memory, but the guest's normal memory regions shouldn't
overlap with device regions. We can simply try to do a normal memory
read / write, and proceed to do device memory read / write if that
fails.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
OVMF would use string IO on those ports. String IO has not been
implemented, so that leads to panics.
Skip them explicitly in MSHV. Leave a long-ish comment in code to
explain the situation. We should properly implement string IO once it
becomes feasible / necessary.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
If the function can never return an error this is now a clippy failure:
error: this function's return value is unnecessarily wrapped by `Result`
--> virtio-devices/src/watchdog.rs:215:5
|
215 | / fn set_state(&mut self, state: &WatchdogState) -> io::Result<()> {
216 | | self.common.avail_features = state.avail_features;
217 | | self.common.acked_features = state.acked_features;
218 | | // When restoring enable the watchdog if it was previously enabled. We reset the timer
... |
223 | | Ok(())
224 | | }
| |_____^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_wraps
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Swap the last two parameters of guest_mem_{read,write} to be consistent
with other read / write functions.
Use more descriptive parameter names.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
MOV R/RM is a special case of MOVZX, so we generalize the mov_r_rm macro
to make it support both instructions.
Fixes: #2227
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
error: field assignment outside of initializer for an instance created with Default::default()
Error: --> hypervisor/src/kvm/mod.rs:1239:9
|
1239 | state.mp_state = self.get_mp_state()?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D clippy::field-reassign-with-default` implied by `-D warnings`
note: consider initializing the variable with `kvm::aarch64::VcpuKvmState { mp_state: self.get_mp_state()?, ..Default::default() }` and removing relevant reassignments
--> hypervisor/src/kvm/mod.rs:1237:9
|
1237 | let mut state = CpuState::default();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#field_reassign_with_default
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: field assignment outside of initializer for an instance created with Default::default()
--> hypervisor/src/kvm/mod.rs:318:9
|
318 | cap.cap = KVM_CAP_SPLIT_IRQCHIP;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D clippy::field-reassign-with-default` implied by `-D warnings`
note: consider initializing the variable with `kvm_bindings::kvm_enable_cap { cap: KVM_CAP_SPLIT_IRQCHIP, ..Default::default() }` and removing relevant reassignments
--> hypervisor/src/kvm/mod.rs:317:9
|
317 | let mut cap: kvm_enable_cap = Default::default();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#field_reassign_with_default
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
And fix related warnings: op_kind and op_register are being deprecated
as they might panic.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently these two macros(msr, msr_data) reside both on kvm and mshv
module. Definition is same for both module. Moving them to arch/x86
module eliminates redundancy and makes more sense.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Handle CPU exits, adding instruction emulations.
Keep CPU specific data inside vmm for later use.
Co-Developed-by: Nuno Das Neves <nudasnev@microsoft.com>
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Co-Developed-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Co-Developed-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Co-Developed-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This patch adds the definition and implementation
MshvEmulatorContext which is platform emulation for Hyper-V.
Co-Developed-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Co-Developed-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
A software emulated TLB. This is mostly used by
the instruction emulator to cache gva to gpa
translations passed from the hypervisor.
Co-Developed-by: Nuno Das Neves <nudasnev@microsoft.com>
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Co-Developed-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Co-Developed-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Co-Developed-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
We don't have IrqFd and IOEventFd support in the kernel for now.
So an emulation layer is needed. In the future, we will be adding this
support in the kernel.
Co-Developed-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
vmmops trait object is needed to get access some
of the upper level vmm functionalities i.e guest
memory access, IO read write etc.
Co-Developed-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Adding hv_state (hyperv state) to Vm and Vcpu struct for mshv.
This state is needed to keep some kernel data(for now hypercall page)
in the vmm.
Co-Developed-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Implement hypervisor, Vm, Vcpu crate at a minimal
functionalities. Also adds the mshv feature gate,
separates out the functionalities between kvm and
mshv inside the vmm crate.
Co-Developed-by: Nuno Das Neves <nudasnev@microsoft.com>
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Co-Developed-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Co-Developed-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Co-Developed-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This is the initial folder structure of the mshv module inside
the hypervisor crate. The aim of this module is to support Microsoft
Hyper-V as a supported Hypervisor.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
There are some code base and function which are purely KVM specific for
now and we don't have those supports in mshv at the moment but we have plan
for the future. We are doing a feature guard with KVM. For example, KVM has
mp_state, cpu clock support, which we don't have for mshv. In order to build
those code we are making the code base for KVM specific compilation.
Signed-off-by: Muminul Islam <muislam@microsoft.com>