Adding hv_state (hyperv state) to Vm and Vcpu struct for mshv.
This state is needed to keep some kernel data(for now hypercall page)
in the vmm.
Co-Developed-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Implement hypervisor, Vm, Vcpu crate at a minimal
functionalities. Also adds the mshv feature gate,
separates out the functionalities between kvm and
mshv inside the vmm crate.
Co-Developed-by: Nuno Das Neves <nudasnev@microsoft.com>
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Co-Developed-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Co-Developed-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Co-Developed-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This is the initial folder structure of the mshv module inside
the hypervisor crate. The aim of this module is to support Microsoft
Hyper-V as a supported Hypervisor.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
There are some code base and function which are purely KVM specific for
now and we don't have those supports in mshv at the moment but we have plan
for the future. We are doing a feature guard with KVM. For example, KVM has
mp_state, cpu clock support, which we don't have for mshv. In order to build
those code we are making the code base for KVM specific compilation.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
The customized hashmap macro can't be lifted to common MockVMM code.
MockVMM only needs a collection to iterate over to get initial register
states. A vector is just as good as a hashmap.
Switch to use a vector to store initial register states. This allows us
to drop the hashmap macro everywhere.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Unfortunately Rust stable does not yet have inline ASM support the flag
calculation will have to be implemented in software.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Unfortunately it seems patch entries are ignored when obtaining
dependencies from another workspace.
Remove the problematic kvm-ioctls and kvm-bindings patch entries and use
the forked repository directly.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There is no need to have a pair of curly brackets for structures without
any member.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The mapping between code and its handler is static. We can drop the
HashMap in favour of a static match expression.
This has two benefits:
1. No more memory allocation and deallocation for the HashMap.
2. Shorter look-up time.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
When a total ordering between multiple atomic variables is not required
then use Ordering::Acquire with atomic loads and Ordering::Release with
atomic stores.
This will improve performance as this does not require a memory fence
on x86_64 which Ordering::SeqCst will use.
Add a comment to the code in the vCPU handling code where it operates on
multiple atomics to explain why Ordering::SeqCst is required.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When the x86 instruction decoder tells us about some missing bytes from
the instruction stream, we call into the platform fetch method and
emulate one last instruction.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In preparation for the instruction fetching step, we modify the decoding
loop so that we can check what the last decoding error is.
We also switch to explictly using decode_out() which removes a 32 bytes
copy compared to decode().
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to validate emulated memory accesses, we need to be able to get
all the segments descriptor attributes.
This is done by abstracting the SegmentRegister attributes through a
trait that each hypervisor will have to implement.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We need to be able to build the CPU mode from its state in order to
start implementing mode related checks in the x86 emulator.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The MockVMM platform will be used by other instructions emulation
implementations, but also by the emulator framework.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The observation here is PlatformEmulator can be seen as the context for
emulation to take place. It should be rather easy to construct a context
that satisfies the lifetime constraints for instruction emulation.
The thread doing the emulation will have full ownership over the
context, so this removes the need to wrap PlatformEmulator in Arc and
Mutex, as well as the need for the context to be either Clone or Copy.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The emulator gets a CPU state from a CpuStateManager instance, emulates
the passed instructions stream and returns the modified CPU state.
The emulator is a skeleton for now since it comes with an empty
instruction mnemonic map.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
And an InstructionMap helper structure to map x86 mnemonic codes
to instruction handlers.
Any instruction emulation implementation should then boil down with
implementing InstructionHandler for any supported mnemonic.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Minimal will be defined by the amount of emulated instructions.
Carrying all GPRs, all CRs, segment registers and table registers should
cover quite a few instructions.
Co-developed-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
For efficiently emulating x86 instructions, we need to build and pass a
CPU state copy/reference to instruction emulation handlers. Those handlers
will typically modify the CPU state and let the caller commit those
changes back through the PlatformEmulator trait set_cpu_state method.
Hypervisors typically have internal CPU state structures, that maps back
to the correspinding kernel APIs. By implementing the CpuState trait,
instruction emulators will be able to directly work on CPU state
instances that are directly consumable by the underlying hypervisor and
its kernel APIs.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to emulate instructions, we need a way to get access to some of
the guest resources. The PlatformEmulator interface provides guest
memory and CPU state access to emulator implementations.
Typically, an hypervisor will implement PlatformEmulator for architecture
specific instruction emulators to build their framework on top of.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We will need the GDT API for the hypervisor's x86 instruction
emulator implementation, it's better if the arch crate depends on the
hypervisor one rather than the other way around.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This interface is used by the vCPU thread to delegate responsibility for
handling MMIO/PIO operations and to support different approaches than a
VM exit.
During profiling I found that we were spending 13.75% of the boot CPU
uage acquiring access to the object holding the VmmOps via
ArcSwap::load_full()
13.75% 6.02% vcpu0 cloud-hypervisor [.] arc_swap::ArcSwapAny<T,S>::load_full
|
---arc_swap::ArcSwapAny<T,S>::load_full
|
--13.43%--<hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run
std::sys_common::backtrace::__rust_begin_short_backtrace
core::ops::function::FnOnce::call_once{{vtable-shim}}
std::sys::unix:🧵:Thread:🆕:thread_start
However since the object implementing VmmOps does not need to be mutable
and it is only used from the vCPU side we can change the ownership to
being a simple Arc<> that is passed in when calling create_vcpu().
This completely removes the above CPU usage from subsequent profiles.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Cloning the ArcSwapOption (like the ArcSwap) does not act like a
.clone() on an Arc, instead an entirely new ArcSwap is created with the
same contents. To correctly share the ArcSwap needs to be placed inside
an Arc.
See: 2433d5719b (diff-6c6d94533c44c19bd1416ef17bad1a878e63dca6e98d59181228fbe8f967c62bR6)
Due to this being wrongly used ::clone() was removed from
ArcSwap/ArcSwapOption in 1.0.0.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The logic to handle AArch64 system event was: SHUTDOWN and RESET were
all treated as RESET.
Now we handle them differently:
- RESET event will trigger Vmm::vm_reboot(),
- SHUTDOWN event will trigger Vmm::vm_shutdown().
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The snasphot/restore feature is not working because some CPU states are
not properly saved, which means they can't be restored later on.
First thing, we ensure the CPUID is stored so that it can be properly
restored later. The code is simplified and pushed down to the hypervisor
crate.
Second thing, we identify for each vCPU if the Hyper-V SynIC device is
emulated or not. In case it is, that means some specific MSRs will be
set by the guest. These MSRs must be saved in order to properly restore
the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
"Using a stable sort consumes more memory and cpu cycles. Because values
which compare equal are identical, preserving their relative order (the
guarantee that a stable sort provides) means nothing, while the extra
costs still apply."
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Instead of having the hypervisor crate embedding Cloud-Hypervisor forks
from the rust-vmm project, it's more appropriate to leave the rust-vmm
references in the hypervisor crate, and have the root Cargo.toml being
patched.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The OneRegister literally means "one (arbitrary) register". Just call it
"Register" instead. There is no need to inherit KVM's naming scheme in
the hypervisor agnostic code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Run loop in hypervisor needs a callback mechanism to access resources
like guest memory, mmio, pio etc.
VmmOps trait is introduced here, which is implemented by vmm module.
While handling vcpuexits in run loop, this trait allows hypervisor
module access to the above mentioned resources via callbacks.
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A new version of vm-memory was released upstream which resulted in some
components pulling in that new version. Update the version number used
to point to the latest version but continue to use our patched version
due to the fix for #1258
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit implements the `get_device_attr` method for the
`KVM_GET_DEVICE_ATTR` ioctl. This ioctl will be used in retrieving
the GIC status.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit adds methods to save/restore AArch64 vCPU registers,
including:
1. The AArch64 `VcpuKvmState` structure.
2. Some `Vcpu` trait methods of the `KvmVcpu` structure to
enable the save/restore of the AArch64 vCPU states.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit ports code from firecracker and refactors the existing
AArch64 code as the preparation for implementing save/restore
AArch64 vCPU, including:
1. Modification of `arm64_core_reg` macro to retrive the index of
arm64 core register and implemention of a helper to determine if
a register is a system register.
2. Move some macros and helpers in `arch` crate to the `hypervisor`
crate.
3. Added related unit tests for above functions and macros.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Currently we don't need to do anything to service these exits but when
the synthetic interrupt controller is active an exit will be triggered
to notify the VMM of details of the synthetic interrupt page.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The new function already checks if the API version is compatible. There
is no need to expose the get_api_version function to code outside
hypervisor crate.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
We may need to store hypervisor speciific data to the VM. This support is
needed for Microsoft hyperv implementations. This patch introduces two
new definitions to Vm trait and implements for KVM.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Based on the way KVM_GET_MSRS and KVM_SET_MSRS work, both function are
very unlikely to fail, as they simply stop looping through the list of
MSRs as soon as getting or setting one fails. This is causing some
issues with the snapshot/restore feature, as on some platforms, we only
save a subset of the list of MSRs, leading to unproper way of saving the
VM.
The way to address this issue is by checking the number of MSRs get/set
matches the expected amount from the list. In case it does not match, we
simply ignore the failing MSR and continue getting/setting the rest of
the list. By doing this by iterations, we end up getting/setting as many
MSRs as the platform can support.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Make set_gsi_routing take a list of IrqRoutingEntry. The construction of
hypervisor specific structure is left to set_gsi_routing.
Now set_gsi_routes, which is part of the interrupt module, is only
responsible for constructing a list of routing entries.
This further splits hypervisor specific code from hypervisor agnostic
code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
That function is going to return a handle for passthrough related
operations.
Move create_kvm_device code there.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
It returns an hypervisor object depending on which hypervisor is
configured. Currently it only supports KVM.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The support for SGX is exposed to the guest through CPUID 0x12. KVM
passes static subleaves 0 and 1 from the host to the guest, without
needing any modification from the VMM itself.
But SGX also relies on dynamic subleaves 2 through N, used for
describing each EPC section. This is not handled by KVM, which means
the VMM is in charge of setting each subleaf starting from index 2
up to index N, depending on the number of EPC sections.
These subleaves 2 through N are not listed as part of the supported
CPUID entries from KVM. But it's important to set them as long as index
0 and 1 are present and indicate that SGX is supported.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In this commit we saved the BDF of a PCI device and set it to "devid"
in GSI routing entry, because this field is mandatory for GICv3-ITS.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
It gets bubbled all the way up from hypervsior crate to top-level
Cargo.toml.
Cloud Hypervisor can't function without KVM at this point, so make it
a default feature.
Fix all scripts that use --no-default-features.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
In order to move the hypervisor specific parts of the VM exit handling
path, we're defining a generic, hypervisor agnostic VM exit enum.
This is what the hypervisor's Vcpu run() call should return when the VM
exit can not be completely handled through the hypervisor specific bits.
For KVM based hypervisors, this means directly forwarding the IO related
exits back to the VMM itself. For other hypervisors that e.g. rely on the
VMM to decode and emulate instructions, this means the decoding itself
would happen in the hypervisor crate exclusively, and the rest of the VM
exit handling would be handled through the VMM device model implementation.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Fix test_vm unit test by using the new abstraction and dropping some
dead code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
On x86 architecture, we need to save a list of MSRs as part of the vCPU
state. By providing the full list of MSRs supported by KVM, this patch
fixes the remaining snapshot/restore issues, as the vCPU is restored
with all its previous states.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new function to the hypervisor trait so that the caller can
retrieve the list of MSRs supported by this hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Some vCPU states such as MP_STATE can be modified while retrieving
other states. For this reason, it's important to follow a specific
order that will ensure a state won't be modified after it has been
saved. Comments about ordering requirements have been copied over
from Firecracker commit 57f4c7ca14a31c5536f188cacb669d2cad32b9ca.
This patch also set the previously saved VCPU_EVENTS, as this was
missing from the restore codepath.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Initially the licensing was just Apache-2.0. This patch changes
the licensing to dual license Apache-2.0 OR BSD-3-Clause
Signed-off-by: Muminul Islam <muislam@microsoft.com>
When set_user_memory_region was moved to hypervisor crate, it was turned
into a safe function that wrapped around an unsafe call. All but one
call site had the safety statements removed. But safety statement was
not moved inside the wrapper function.
Add the safety statement back to help reasoning in the future. Also
remove that one last instance where the safety statement is not needed .
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This commit fixes some warnings introduced in the previous
hyperviosr crate PR.Removed some unused variables from arch/aarch64
module.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Implement the vCPU state getter and setter separately from the initial
KVM Hypervisor trait implementation, mostly for readability purposes.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
For each of the traits we are defining kvm related structures
and add the trait implementation to the structs. For more information
please see the kvm-ioctls and kvm-bindings crate.
This is a standalone implementation that does not include the switch of
the Cloud-Hypervisor vmm and arch crates to it.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
As the only hypervisor that Cloud-Hypervisor supports is KVM, the
Hypervisor trait accomodates for the upcoming KVM implementation.
This trait will be instanciated at build time through hypervisor
specific features, i.e. it's not aiming at run-time selection of
hypervisors for Cloud-Hypervisor.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This VM trait should be implemented by each underlying hypervisor.
Previously created hypervisor object should create the VM based on
already selected hypervisor. This is just the trait definition. For each
of supported hypervisor we need to implement the trait. Later we will
implement this trait for KVM and then Microsoft Hyper-V.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This Vcpu trait should be implemented by each underlying hypervisor.
Previously created hypervisor object should create the VM based on
already selected hypervisor and Vm object should create this vcpu
object based on same hyperviosr. Each of this object should be
referenced by trait object i.e <dyn Vcpu>.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The purpose of this trait is to add support for other hypervisors than
KVM, like e.g. Microsoft Hyper-V.
Further commits will define additional hypervisor related traits like
Vcpu and Vm. Each of the supported hypervisor will need to implement all
traits defined from the hypervisor crate.
Signed-off-by: Muminul Islam <muislam@microsoft.com>