By taking advantage of the fact that IrqRoutingEntry is exported by the
hypervisor crate (that is typedef'ed to the hypervisor specific version)
then the code for handling the MsiInterruptManager can be simplified.
This is particularly useful if in this future it is not a typedef but
rather a wrapper type.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This removes the requirement to leak as many datastructures from the
hypervisor crate into the vmm crate.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The calls to these functions are always preceded by a call to
InterruptSourceGroup::update(). By adding a masked boolean to that
function call it possible to remove 50% of the calls to the
KVM_SET_GSI_ROUTING ioctl as the the update will correctly handle the
masked or unmasked case.
This causes the ioctl to disappear from the perf report for a boot of
the VM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When mask a msi irq, we set the entry.masked to be true, so kvm
hypervisor will not pass the gsi to kernel through KVM_SET_GSI_ROUTING
ioctl which update kvm->irq_routing. This will trigger kernel
panic on AMD platform when the gsi is the largest one in kernel
kvm->irqfds.items:
crash> bt
PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8"
#0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
#1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
#2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
#3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
#4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
#5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
#6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
[exception RIP: svm_update_pi_irte+227]
RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086
RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001
RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8
RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200
R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
#8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
#9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b
RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020
RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0
R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0
R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
To solve this problem, move route.disable() before set_gsi_routes() to
remove the gsi from irqfds.items first.
This problem only exists on AMD platform, 'cause on Intel platform
kernel just return when update irte while it only prints a warning on
AMD.
Also, this patch adjusts the order of enable() and set_gsi_routes() in
unmask(), which should do no harm.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
When mask a msi irq, we set the entry.masked to be true, so kvm
hypervisor will not pass the gsi to kernel through KVM_SET_GSI_ROUTING
ioctl which update kvm->irq_routing. This will trigger kernel
panic on AMD platform when the gsi is the largest one in kernel
kvm->irqfds.items:
crash> bt
PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8"
#0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
#1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
#2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
#3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
#4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
#5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
#6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
[exception RIP: svm_update_pi_irte+227]
RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086
RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001
RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8
RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200
R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
#8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
#9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b
RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020
RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0
R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0
R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
To solve this problem, move route.disable() before set_gsi_routes() to
remove the gsi from irqfds.items first.
This problem only exists on AMD platform, 'cause on Intel platform
kernel just return when update irte while it only prints a warning on
AMD.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
After introducing multiple PCI segments, the `devid` value in
`kvm_irq_routing_entry` exceeds the maximum supported range on AArch64.
This commit restructed the `devid` to the allowed range.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
With the new beta version, clippy complains about redundant allocation
when using Arc<Box<dyn T>>, and suggests replacing it simply with
Arc<dyn T>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
On FDT, VMM can allocate IRQ from 0 for devices.
But on ACPI, the lowest range below 32 has to be avoided.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The original code had a generic type E. It was later replaced by a
concrete type. The code should have been simplified when the replacement
happened.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Drop the generic type E and use IrqRoutngEntry directly. This allows
dropping a bunch of trait bounds from code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Their make_entry functions look the same now. Extract the logic to a
common function.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
On AArch64, interrupt controller (GIC) is emulated by KVM. VMM need to
set IRQ routing for devices, including legacy ones.
Before this commit, IRQ routing was only set for MSI. Legacy routing
entries of type KVM_IRQ_ROUTING_IRQCHIP were missing. That is way legacy
devices (like serial device ttyS0) does not work.
The setting of X86 IRQ routing entries are not impacted.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Both GIC and IOAPIC must implement a new method notifier() in order to
provide the caller with an EventFd corresponding to the IRQ it refers
to.
This is needed in anticipation for supporting INTx with VFIO PCI
devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for supporting the notifier function for the legacy
interrupt source group, we need this function to return an EventFd
instead of a reference to this same EventFd.
The reason is we can't return a reference when there's an Arc<Mutex<>>
involved in the call chain.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There are some code base and function which are purely KVM specific for
now and we don't have those supports in mshv at the moment but we have plan
for the future. We are doing a feature guard with KVM. For example, KVM has
mp_state, cpu clock support, which we don't have for mshv. In order to build
those code we are making the code base for KVM specific compilation.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
When a total ordering between multiple atomic variables is not required
then use Ordering::Acquire with atomic loads and Ordering::Release with
atomic stores.
This will improve performance as this does not require a memory fence
on x86_64 which Ordering::SeqCst will use.
Add a comment to the code in the vCPU handling code where it operates on
multiple atomics to explain why Ordering::SeqCst is required.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This interface is used by the vCPU thread to delegate responsibility for
handling MMIO/PIO operations and to support different approaches than a
VM exit.
During profiling I found that we were spending 13.75% of the boot CPU
uage acquiring access to the object holding the VmmOps via
ArcSwap::load_full()
13.75% 6.02% vcpu0 cloud-hypervisor [.] arc_swap::ArcSwapAny<T,S>::load_full
|
---arc_swap::ArcSwapAny<T,S>::load_full
|
--13.43%--<hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run
std::sys_common::backtrace::__rust_begin_short_backtrace
core::ops::function::FnOnce::call_once{{vtable-shim}}
std::sys::unix:🧵:Thread:🆕:thread_start
However since the object implementing VmmOps does not need to be mutable
and it is only used from the vCPU side we can change the ownership to
being a simple Arc<> that is passed in when calling create_vcpu().
This completely removes the above CPU usage from subsequent profiles.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Virtio-mmio is removed, now virtio-pci is the only option for virtio
transport layer. We use MSI for PCI device interrupt. While GICv2, the
legacy interrupt controller, doesn't support MSI. So GICv2 is not very
practical for Cloud-hypervisor, we can remove it.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
MsiInterruptGroup doesn't need to know the internal field names of
InterruptRoute. Introduce two helper functions to eliminate references
to irq_fd. This is done similarly to the enable and disable helper
functions.
Also drop the pub keyword from InterruptRoute fields. It is not needed
anymore.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
There is no point in manually dropping the lock for gsi_msi_routes then
instantly grabbing it again in set_gsi_routes.
Make set_gsi_routes take a reference to the routing hashmap instead.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This commit adds a function which allows to save RDIST pending
tables to the guest RAM, as well as unit test case for it.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit adds the unit test cases for getting/setting the GIC
distributor, redistributor and ICC registers.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Shrink GICDevice trait to contain hypervisor agnostic API's only, which
are used in generating FDT.
Move all KVM specific logic into KvmGICDevice trait.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Make set_gsi_routing take a list of IrqRoutingEntry. The construction of
hypervisor specific structure is left to set_gsi_routing.
Now set_gsi_routes, which is part of the interrupt module, is only
responsible for constructing a list of routing entries.
This further splits hypervisor specific code from hypervisor agnostic
code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
In this commit we saved the BDF of a PCI device and set it to "devid"
in GSI routing entry, because this field is mandatory for GICv3-ITS.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The _fd suffix is KVM specific. But since it now point to an hypervisor
agnostic hypervisor::Vm implementation, we should just rename it vm.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The logic can be shared among hypervisor implementations.
The 'static bound is used such that we don't need to deal with extra
lifetime parameter everywhere. It should be okay because we know the
entry type E doesn't contain any reference.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This trait contains a function which produces a interrupt routing entry.
Implement that trait for KvmRoutingEntry and rewrite the update
function.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The observation is only the route entry is hypervisor dependent.
Keep a definition of KvmMsiInterruptManager to avoid too much code
churn.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The observation is that only the route field is hypervisor specific.
Provide a new function in blanket implementation. Also redefine
KvmRoutingEntry with RoutingEntry to avoid code churn.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The observation is that the GSI hashmap remains untouched before getting
passed into the MSI interrupt manager. We can create that hashmap
directly in the interrupt manager's new function.
The drops one import from the interrupt module.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The structure is tightly coupled with KVM. It uses KVM specific
structures and calls. Add Kvm prefix to it.
Microsoft hypervisor will implement its own interrupt group(s) later.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Start moving the vmm, arch and pci crates to being hypervisor agnostic
by using the hypervisor trait and abstractions. This is not a complete
switch and there are still some remaining KVM dependencies.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
IOAPIC, a X86 specific interrupt controller, is referenced by device
manager and CPU manager. To work with more architectures, a common
type for all architectures is needed.
This commit introduces trait InterruptController to provide architecture
agnostic functions. Device manager and CPU manager can use it without
caring what the underlying device is.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
We want to prevent from losing interrupts while they are masked. The
way they can be lost is due to the internals of how they are connected
through KVM. An eventfd is registered to a specific GSI, and then a
route is associated with this same GSI.
The current code adds/removes a route whenever a mask/unmask action
happens. Problem with this approach, KVM will consume the eventfd but
it won't be able to find an associated route and eventually it won't
be able to deliver the interrupt.
That's why this patch introduces a different way of masking/unmasking
the interrupts, simply by registering/unregistering the eventfd with the
GSI. This way, when the vector is masked, the eventfd is going to be
written but nothing will happen because KVM won't consume the event.
Whenever the unmask happens, the eventfd will be registered with a
specific GSI, and if there's some pending events, KVM will trigger them,
based on the route associated with the GSI.
Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the InterruptManager trait depend on an InterruptType forces
implementations into supporting potentially very different kind of
interrupts from the same code base. What we're defining through the
current, interrupt type based create_group() method is a need for having
different interrupt managers for different kind of interrupts.
By associating the InterruptManager trait to an interrupt group
configuration type, we create a cleaner design to support that need as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>