With the VIRTIO_F_EVENT_IDX handling now conducted inside the
virtio-queue crate it is necessary to activate the functionality on
every queue if it is negotiatated. Otherwise this leads to a failure of
the guest to signal to the host that there is something in the available
queue as the queue's internal state has not been configured correctly.
Fixes: #3829
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit 223d0cf787)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When mask a msi irq, we set the entry.masked to be true, so kvm
hypervisor will not pass the gsi to kernel through KVM_SET_GSI_ROUTING
ioctl which update kvm->irq_routing. This will trigger kernel
panic on AMD platform when the gsi is the largest one in kernel
kvm->irqfds.items:
crash> bt
PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8"
#0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
#1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
#2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
#3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
#4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
#5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
#6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
[exception RIP: svm_update_pi_irte+227]
RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086
RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001
RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8
RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200
R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
#8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
#9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b
RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020
RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0
R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0
R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
To solve this problem, move route.disable() before set_gsi_routes() to
remove the gsi from irqfds.items first.
This problem only exists on AMD platform, 'cause on Intel platform
kernel just return when update irte while it only prints a warning on
AMD.
Also, this patch adjusts the order of enable() and set_gsi_routes() in
unmask(), which should do no harm.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
(cherry picked from commit 5375b84e3b)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When mask a msi irq, we set the entry.masked to be true, so kvm
hypervisor will not pass the gsi to kernel through KVM_SET_GSI_ROUTING
ioctl which update kvm->irq_routing. This will trigger kernel
panic on AMD platform when the gsi is the largest one in kernel
kvm->irqfds.items:
crash> bt
PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8"
#0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
#1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
#2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
#3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
#4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
#5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
#6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
[exception RIP: svm_update_pi_irte+227]
RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086
RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001
RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8
RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200
R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
#8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
#9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b
RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020
RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0
R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0
R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
To solve this problem, move route.disable() before set_gsi_routes() to
remove the gsi from irqfds.items first.
This problem only exists on AMD platform, 'cause on Intel platform
kernel just return when update irte while it only prints a warning on
AMD.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
(cherry picked from commit db9e5e5a87)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The PCI spec does not specify that the access has to be of a specific
size.
Fixes: #3714
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit 9c6e7c4a4b)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As per this kernel documentation:
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals.
The pending state of the operation is not preserved in state which is
visible to userspace, thus userspace should ensure that the operation is
completed before performing a live migration. Userspace can re-enter the
guest with an unmasked signal pending or with the immediate_exit field set
to complete pending operations without allowing any further instructions
to be executed.
Since we capture the state as part of the pause and override it as part
of the resume we must ensure the state is consistent otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.
Fixes: #3658
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit 507912385a)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When freeing memory sometimes glibc will attempt to read
"/proc/sys/vm/overcommit_memory" to find out how it should release the
blocks. This happens sporadically with Cloud Hypervisor but has been
seen in use. It is not necessary to add the read() syscall to the list
as it is already included in the virtio devices common set. Similarly
the vCPU and vmm threads already have both these in the allowed list.
Fixes: #3609
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
(cherry picked from commit 53caa565bb)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Both OVMF and RHF firmwares triggered an error when O_DIRECT was used
because they didn't align the buffers to the block sector size.
In order to prevent regressions, we're adding a new test validating the
VM can properly boot when the OS disk is opened with O_DIRECT and booted
from the rust-hypervisor-fw.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever the backing file of our virtio-block device is opened with
O_DIRECT, there's a requirement about the buffer address and size to be
aligned to the sector size.
We know virtio-block requests are sector aligned in terms of size, but
we must still check if the buffer address is. In case it's not, we
create an intermediate buffer that will be passed through the system
call. In case of a write operation, the content of the non-aligned
buffer must be copied beforehand, and in case of a read operation, the
content of the aligned buffer must be copied to the non-aligned one
after the operation has been completed.
Fixes#3587
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This table is listed as required in the ARM Base Boot Requirements
document. The particular need arises to make the serial debugging of
Windows guest functional.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
If the guest hasn't initialised a PV clock then the KVM_KVMCLOCK_CTRL
ioctl will return -EINVAL. Therefore if running in the firmware or an OS
that doesn't use the PV clock then we should ignore that error
Tested by migrating a VM that has not yet booted into the Linux kernel
(just in firmware) by specifying no disk image:
e.g. target/debug/cloud-hypervisor --kernel ~/workloads/hypervisor-fw --api-socket /tmp/api --serial tty --console off
Fixes: #3586
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
From 15358ef79d: Device Mapper Multipath
config can avoid systemd errors related to Device Mapper multipath while
guest booting.
From 46672c384c: CONFIG_NVME_MULTIPATH is
needed to fix the observed guest hanging issue cased by systemd crash
while booting.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The SPDK-NVMe is needed for the integration test for vfio_user.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
AArch64 does not use IOBAR, and current code of panics the whole VMM if
we need to allocate the IOBAR.
This commit checks if IOBAR is enabled before the arch conditional code
of IOBAR allocation and if the IOBAR is not enabled, we can just skip
the IOBAR allocation and do nothing.
Fixes: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/3479
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Ensure all pending virtio activations (as triggered by MMIO write on the
vCPU threads leading to a barrier wait) are completed before pausing the
vCPUs as otherwise there will a deadlock with the VMM waiting for the
vCPU to acknowledge it's pause and the vCPU waiting for the VMM to
activate the device and release the barrier.
Fixes: #3585
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The code can be written in a better form and the clippy warning
suppression can be dropped.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
These are the leftovers from the commit 8155be2:
arch: aarch64: vm_memory is not required when configuring vcpu
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Update documentation and CI to rely on the new CLOUDHV.fd firmware built
from the newly introduced target CloudHvX64.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By pinning the OVMF version, we will be able to update the EDK2 fork
with a new version without potentially breaking our Cloud Hypervisor CI.
Once the new version is ready on the EDK2 fork, we'll be able to update
Cloud Hypervisor codebase, replacing the fixed version with the latest,
as well as replacing OVMF.fd with CLOUDHV.fd. This is because we'll
start building from the new target CloudHvX64.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When in local migration mode send the FDs for the guest memory over the
socket along with the slot that the FD is associated with. This removes
the requirement for copying the guest RAM and gives significantly faster
live migration performance (of the order of 3s to 60ms).
Fixes: #3566
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create the VM using the FDs (wrapped in Files) that have been received
during the migration process.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If this FD (wrapped in a File) is supplied when the RAM region is being
created use that over creating a new one.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This function is used to open an FD (wrapped in a File) that points to
guest memory from memfd_create() or backed on the filesystem.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>