Commit Graph

115 Commits

Author SHA1 Message Date
Rob Bradford
57ce0986f7 vmm: cpu: Add functionality for enabling TDX for all vCPUs
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-08 18:30:00 +00:00
Rob Bradford
c8cad394b5 vmm: cpu: Expose the common/shared CPUID data for all vCPUs
This allows the CPUID data to be passed into the VM level ioctl used for
initalizing TDX.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-08 18:30:00 +00:00
Rob Bradford
c48e82915d vmm: Make kernel optional in VM internals
When booting with TDX no kernel is supplied as the TDFV is responsible
for loading the OS. The requirement to have the kernel is still
currently enforced at the validation entry point; this change merely
changes function prototypes and stored state to use Option<> to support.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-08 18:30:00 +00:00
Rob Bradford
f8875acec2 misc: Bulk upgrade dependencies
In particular update for the vmm-sys-util upgrade and all the other
dependent packages. This requires an updated forked version of
kvm-bindings (due to updated vfio-ioctls) but allowed the removal of our
forked version of kvm-ioctls.

The changes to the API from kvm-ioctls and vmm-sys-util required some
other minor changes to the code.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-26 11:31:08 +00:00
Rob Bradford
9c5be6f660 build: Remove unnecessary Result<> returns
If the function can never return an error this is now a clippy failure:

error: this function's return value is unnecessarily wrapped by `Result`
   --> virtio-devices/src/watchdog.rs:215:5
    |
215 | /     fn set_state(&mut self, state: &WatchdogState) -> io::Result<()> {
216 | |         self.common.avail_features = state.avail_features;
217 | |         self.common.acked_features = state.acked_features;
218 | |         // When restoring enable the watchdog if it was previously enabled. We reset the timer
...   |
223 | |         Ok(())
224 | |     }
    | |_____^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_wraps

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-11 18:18:44 +00:00
Rob Bradford
50a995b63d vmm: Rename patch_cpuid() to generate_common_cpuid()
This reflects that it generates CPUID state used across all vCPUs.
Further ensure that errors from this function get correctly propagated.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-09 16:02:25 +00:00
Rob Bradford
ccdea0274c vmm, arch: Move KVM HyperV emulation handling to shared CPUID code
Move the code for populating the CPUID with KVM HyperV emulation details from
the per-vCPU CPUID handling code to the shared CPUID handling code.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-09 16:02:25 +00:00
Rob Bradford
688ead51c6 vmm, arch: Move CPU identification handling to shared CPUID code
Move the code for populating the CPUID with details of the CPU
identification from the per-vCPU CPUID handling code to the shared CPUID
handling code.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-09 16:02:25 +00:00
Rob Bradford
9792c9aafa vmm, arch: Move max_phys_bits handling to shared CPUID code
Move the code for populating the CPUID with details of the maximum
address space from the per-vCPU CPUID handling code to the shared CPUID
handling code.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-09 16:02:25 +00:00
Rob Bradford
952f9bd3fe vmm: acpi: Remove incorrect return statement
_EJx built in should not return.

dsdt.dsl    813:                 Return (CEJ0 (0x00))
Warning  3104 -                            ^ Reserved method should not return a value (_EJ0)

dsdt.dsl    813:                 Return (CEJ0 (0x00))
Error    6080 -                            ^ Called method returns no value

Fixes: #2216

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-28 14:30:34 +01:00
Rob Bradford
c29caf2a85 vmm: acpi: Fix incorrect mutex timeout value
The mutex timeout should be 0xffff rather than 0xfff to disable the
timeout feature.

dsdt.dsl    745:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

dsdt.dsl    767:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

dsdt.dsl    775:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

Fixes: #2216

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-28 14:30:34 +01:00
Rob Bradford
76e15a4240 vmm: acpi: Support compiling ACPI code on aarch64
This skeleton commit brings in the support for compiling aarch64 with
the "acpi" feature ready to the ACPI enabling. It builds on the work to
move the ACPI hotplug devices from I/O ports to MMIO and conditionalises
any code that is x86_64 only (i.e. because it uses an I/O port.)

Filling in the aarch64 specific details in tables such as the MADT it
out of the scope.

See: #2178

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-26 15:19:02 +08:00
Rob Bradford
55a3a38e14 vmm: acpi: Move CpuManager ACPI device to an MMIO address
Migrate the CpuManager from a fixed I/O port address to an allocated
MMIO address.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-22 16:08:41 +01:00
Rob Bradford
1fc6d50f3e misc: Make Bus::write() return an Option<Arc<Barrier>>
This can be uses to indicate to the caller that it should wait on the
barrier before returning as there is some asynchronous activity
triggered by the write which requires the KVM exit to block until it's
completed.

This is useful for having vCPU thread wait for the VMM thread to proceed
to activate the virtio devices.

See #1863

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-17 11:23:53 +00:00
Muminul Islam
9ce6c3b75c hypervisor, vmm: Feature guard KVM specific code
There are some code base and function which are purely KVM specific for
now and we don't have those supports in mshv at the moment but we have plan
for the future. We are doing a feature guard with KVM. For example, KVM has
mp_state, cpu clock support,  which we don't have for mshv. In order to build
those code we are making the code base for KVM specific compilation.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2020-12-09 14:55:20 +01:00
Rob Bradford
ffaab46934 misc: Use a more relaxed memory model when possible
When a total ordering between multiple atomic variables is not required
then use Ordering::Acquire with atomic loads and Ordering::Release with
atomic stores.

This will improve performance as this does not require a memory fence
on x86_64 which Ordering::SeqCst will use.

Add a comment to the code in the vCPU handling code where it operates on
multiple atomics to explain why Ordering::SeqCst is required.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-02 19:04:30 +01:00
Rob Bradford
b2608ca285 vmm: cpu: Fix clippy issues inside test
Found by:  cargo clippy --all-features --all --tests

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-26 09:32:46 +01:00
Rob Bradford
0fec326582 hypervisor, vmm: Remove shared ownership of VmmOps
This interface is used by the vCPU thread to delegate responsibility for
handling MMIO/PIO operations and to support different approaches than a
VM exit.

During profiling I found that we were spending 13.75% of the boot CPU
uage acquiring access to the object holding the VmmOps via
ArcSwap::load_full()

    13.75%     6.02%  vcpu0            cloud-hypervisor    [.] arc_swap::ArcSwapAny<T,S>::load_full
            |
            ---arc_swap::ArcSwapAny<T,S>::load_full
               |
                --13.43%--<hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run
                          std::sys_common::backtrace::__rust_begin_short_backtrace
                          core::ops::function::FnOnce::call_once{{vtable-shim}}
                          std::sys::unix:🧵:Thread:🆕:thread_start

However since the object implementing VmmOps does not need to be mutable
and it is only used from the vCPU side we can change the ownership to
being a simple Arc<> that is passed in when calling create_vcpu().

This completely removes the above CPU usage from subsequent profiles.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-19 00:16:02 +01:00
Michael Zhao
093a581ee1 vmm: Implement VM rebooting on AArch64
The logic to handle AArch64 system event was: SHUTDOWN and RESET were
all treated as RESET.

Now we handle them differently:
- RESET event will trigger Vmm::vm_reboot(),
- SHUTDOWN event will trigger Vmm::vm_shutdown().

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-10-30 17:14:44 +00:00
Michael Zhao
69394c9c35 vmm: Handle hypervisor VCPU run result from Vcpu to VcpuManager
Now Vcpu::run() returns a boolean value to VcpuManager, indicating
whether the VM is going to reboot (false) or just continue (true).
Moving the handling of hypervisor VCPU run result from Vcpu to
VcpuManager gives us the flexibility to handle more scenarios like
shutting down on AArch64.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-10-30 17:14:44 +00:00
Sebastien Boeuf
28e12e9f3a vmm, hypervisor: Fix snapshot/restore for Windows guest
The snasphot/restore feature is not working because some CPU states are
not properly saved, which means they can't be restored later on.

First thing, we ensure the CPUID is stored so that it can be properly
restored later. The code is simplified and pushed down to the hypervisor
crate.

Second thing, we identify for each vCPU if the Hyper-V SynIC device is
emulated or not. In case it is, that means some specific MSRs will be
set by the guest. These MSRs must be saved in order to properly restore
the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-21 19:11:03 +01:00
Wei Liu
d667ed0c70 vmm: don't call notify_guest_clock_paused when Hyper-V emulation is on
We turn on that emulation for Windows. Windows does not have KVM's PV
clock, so calling notify_guest_clock_paused results in an error.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-10-15 19:14:25 +02:00
Sebastien Boeuf
1b9890b807 vmm: cpu: Set CPU physical bits based on user input
If the user specified a maximum physical bits value through the
`max_phys_bits` option from `--cpus` parameter, the guest CPUID
will be patched accordingly to ensure the guest will find the
right amount of physical bits.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-13 18:58:36 +02:00
Wei Liu
ed1fdd1f7d hypervisor, arch: rename "OneRegister" and relevant code
The OneRegister literally means "one (arbitrary) register". Just call it
"Register" instead. There is no need to inherit KVM's naming scheme in
the hypervisor agnostic code.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-10-08 08:55:10 +02:00
Praveen Paladugu
71c435ce91 hypervisor, vmm: Introduce VmmOps trait
Run loop in hypervisor needs a callback mechanism to access resources
like guest memory, mmio, pio etc.

VmmOps trait is introduced here, which is implemented by vmm module.
While handling vcpuexits in run loop, this trait allows hypervisor
module access to the above mentioned resources via callbacks.

Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-02 16:42:55 +01:00
Sebastien Boeuf
c85e396ce5 vmm: cpu: x86: Enable MTRR feature in CPUID
The MTRR feature was missing from the CPUID, which is causing the guest
to ignore the MTRR settings exposed through dedicated MSRs.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-25 15:03:52 +02:00
Henry Wang
c6b47d39e0 vmm: refactor vCPU save/restore code in restoring VM
Similarly as the VM booting process, on AArch64 systems,
the vCPUs should be created before the creation of GIC. This
commit refactors the vCPU save/restore code to achieve the
above-mentioned restoring order.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Henry Wang
970a5a410d vmm: decouple vCPU init from configure_vcpus
Since calling `KVM_GET_ONE_REG` before `KVM_VCPU_INIT` will
result in an error: Exec format error (os error 8). This commit
decouples the vCPU init process from `configure_vcpus`. Therefore
in the process of restoring the vCPUs, these vCPUs can be
initialized separately before started.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Henry Wang
47e65cd341 vmm: AArch64: add methods to get saved vCPU states
The construction of `GICR_TYPER` register will need vCPU states.
Therefore this commit adds methods to extract saved vCPU states
from the cpu manager.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Henry Wang
9dd188a8e8 tests: AArch64: Add unit test cases for vCPU save/restore
Adds 3 more unit test cases for AArch64:

*save_restore_core_regs
*save_restore_system_regs
*get_set_mpstate

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Henry Wang
e3d45be6f7 AArch64: Preparation for vCPU save/restore
This commit ports code from firecracker and refactors the existing
AArch64 code as the preparation for implementing save/restore
AArch64 vCPU, including:

1. Modification of `arm64_core_reg` macro to retrive the index of
arm64 core register and implemention of a helper to determine if
a register is a system register.

2. Move some macros and helpers in `arch` crate to the `hypervisor`
crate.

3. Added related unit tests for above functions and macros.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Rob Bradford
27c28fa3b0 vmm, arch: Enable KVM HyperV support
Inject CPUID leaves for advertising KVM HyperV support when the
"kvm_hyperv" toggle is enabled. Currently we only enable a selection of
features required to boot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-09-16 16:08:01 +01:00
Rob Bradford
da642fcf7f hypervisor: Add "HyperV" exit to list of KVM exits
Currently we don't need to do anything to service these exits but when
the synthetic interrupt controller is active an exit will be triggered
to notify the VMM of details of the synthetic interrupt page.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-09-16 16:08:01 +01:00
Bo Chen
2612a6df29 vmm: seccomp: Add seccomp filters for the vcpu worker thread
Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-09-11 07:42:31 +02:00
Rob Bradford
15025d71b1 devices, vm-device: Move BusDevice and Bus into vm-device
This removes the dependency of the pci crate on the devices crate which
now only contains the device implementations themselves.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-09-10 09:35:38 +01:00
Samuel Ortiz
e5ce6dc43c vmm: cpu: Warn if the guest is trying to access unregistered IO ranges
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-09-04 14:39:58 +02:00
Sebastien Boeuf
871138d5cc vm-migration: Make snapshot() mutable
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Michael Zhao
afc98a5ec9 vmm: Fix AArch64 clippy warnings of vmm and other crates
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-08-24 10:59:08 +02:00
Anatol Belski
eba42c392f devices: acpi: Add UID to devices with common HID
Some OS might check for duplicates and bail out, if it can't create a
distinct mapping. According to ACPI 5.0 section 6.1.12, while _UID is
optional, it becomes required when there are multiple devices with the
same _HID.

Signed-off-by: Anatol Belski <ab@php.net>
2020-08-14 08:52:02 +02:00
Wei Liu
d80e383dbb arch: move test cases to vmm crate
This saves us from adding a "kvm" feature to arch crate merely for the
purpose of running tests.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-15 17:21:07 +02:00
Sebastien Boeuf
e10d9b13d4 arch, hypervisor, vmm: Patch CPUID subleaves to expose EPC sections
The support for SGX is exposed to the guest through CPUID 0x12. KVM
passes static subleaves 0 and 1 from the host to the guest, without
needing any modification from the VMM itself.

But SGX also relies on dynamic subleaves 2 through N, used for
describing each EPC section. This is not handled by KVM, which means
the VMM is in charge of setting each subleaf starting from index 2
up to index N, depending on the number of EPC sections.

These subleaves 2 through N are not listed as part of the supported
CPUID entries from KVM. But it's important to set them as long as index
0 and 1 are present and indicate that SGX is supported.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-07-15 15:08:56 +02:00
Sebastien Boeuf
1603786374 vmm: Pass MemoryManager through CpuManager creation
Instead of passing the GuestMemoryMmap directly to the CpuManager upon
its creation, it's better to pass a reference to the MemoryManager. This
way we will be able to know if SGX EPC region along with one or multiple
sections are present.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-07-15 15:08:56 +02:00
Wei Liu
a4f484bc5e hypervisor: Define a VM-Exit abstraction
In order to move the hypervisor specific parts of the VM exit handling
path, we're defining a generic, hypervisor agnostic VM exit enum.

This is what the hypervisor's Vcpu run() call should return when the VM
exit can not be completely handled through the hypervisor specific bits.
For KVM based hypervisors, this means directly forwarding the IO related
exits back to the VMM itself. For other hypervisors that e.g. rely on the
VMM to decode and emulate instructions, this means the decoding itself
would happen in the hypervisor crate exclusively, and the rest of the VM
exit handling would be handled through the VMM device model implementation.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

Fix test_vm unit test by using the new abstraction and dropping some
dead code.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-06 12:59:43 +01:00
Samuel Ortiz
3db4c003a3 vmm: cpu: Rename fd variable into something more meaningful
The fd naming is quite KVM specific. Since we're now using the
hypervisor crate abstractions, we can rename those into something more
readable and meaningful. Like e.g. vcpu or vm.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-07-06 09:35:30 +01:00
Samuel Ortiz
618722cdca hypervisor: cpu: Rename state getter and setter
vcpu.{set_}cpu_state() is a stutter.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-07-06 09:35:30 +01:00
Sebastien Boeuf
f6eeba781b vmm: Save and restore vCPU states during pause/resume operations
We need consistency between pause/resume and snapshot/restore
operations. The symmetrical behavior of pausing/snapshotting
and restoring/resuming has been introduced recently, and we must
now ensure that no matter if we're using pause/resume or
snapshot/restore features, the resulting VM should be running in
the exact same way.

That's why the vCPU state is now stored upon VM pausing. The snapshot
operation being a simple serialization of the previously saved state.
The same way, the vCPU state is now restored upon VM resuming. The
restore operation being a simple deserialization of the previously
restored state.

It's interesting to note that this patch ensures time consistency from a
guest perspective, no matter which clocksource is being used. From a
previous patch, the KVM clock was saved/restored upon VM pause/resume.
We now have the same behavior for TSC, as the TSC from the vCPUs are
saved/restored upon VM pause/resume too.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-25 12:01:34 +02:00
Sebastien Boeuf
18e7d7a1f7 vmm: cpu: Resume before shutdown in a specific way
Instead of calling the resume() function from the CpuManager, which
involves more than what is needed from the shutdown codepath, and
potentially ends up with a deadlock, we replace it with a subset.

The full resume operation is reserved for a VM that has been paused.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-25 12:01:34 +02:00
Sebastien Boeuf
65132fb99d vmm: Implement Pausable trait for Vcpu
We want each Vcpu to store the vCPU state upon VM pausing. This is the
reason why we need to explicitly implement the Pausable trait for the
Vcpu structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-25 12:01:34 +02:00
Sebastien Boeuf
4a81d65f79 vmm: Notify the guest about vCPUs being paused
Through the newly added API notify_guest_clock_paused(), this patch
improves the vCPU pause operation by letting the guest know that each
vCPU is being paused. This is important to avoid soft lockups detection
from the guest that could happen because the VM has been paused for more
than 20 seconds.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-24 12:38:56 +02:00
Sebastien Boeuf
9fa8438063 vmm: Fill CpuManager's vCPU list on restore path
It's important that on restore path, the CpuManager's vCPU gets filled
with each new vCPU that is being created. In order to cover both boot
and restore paths, the list is being filled from the common function
create_vcpu().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-24 12:38:56 +02:00