We fixed the L2 and L3 level cache sharing issues and confirmed that the
L2 level cache is independent, while the L3 level cache is shared per-socket.
See:#5505
Signed-off-by: zhongbingnan <zhongbingnan@bytedance.com>
This patch modifies `event_monitor` to ensure that concurrent access to
`event_log` from multiple threads is safe. Previously, the `event_log`
function would acquire a reference to the event log file and write
to it without doing any synchronization, which made it prone to
data races. This issue likely went under the radar because the
relevant `SAFETY` comment on the unsafe block was incomplete.
The new implementation spawns a dedicated thread named `event-monitor`
solely for writing to the file. It uses the MPMC channel exposed by
`flume` to pass messages to the `event-monitor` thread. Since
`flume::Sender<T>` implements `Sync`, it is safe for multiple threads
to share it and send messages to the `event-monitor` thread.
This is not possible with `std::sync::mpsc::Sender<T>` since it's
`!Sync`, meaning it is not safe for it to be shared between different
threads.
The `event_monitor::set_monitor` function now only initializes
the required global state and returns an instance of the
`Monitor` struct. This decouples the actual logging logic from the
`event_monitor` crate. The `event-monitor` thread is then spawned by
the `vmm` crate.
Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
These tests require a binary from a previous release to perform a
migration. That binary doesn't yet have the shutdown fixes and thus
these tests are supposed to fail. The tests are to be re-enabled after
the next release.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
The fixup_msix_region() function added in
a718716831
made the assumption that MSI-X was always available. This is the case
with many VFIO devices and all our virtio devices but created regression
with MSI devices.
Simply return the existing region size if MSI-X is not supported by the
device.
Fixes: #5649
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
With the addition of the spinning waiting for the exit event to be
received in the CMOS device a regression was introduced into the CMOS
fuzzer. Since there is nothing to receive the event in the fuzzer and
there is nothing to update the bit the that the device is looping on;
introducing an infinite loop.
Use an Option<> type so that when running the device in the fuzzer no
Arc<AtomicBool> is provided effectively disabling the spinning logic.
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=61165
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
As the container image can be used on both Intel and AMD,
ensure the SPDK binaries are compatible. This implies -march=skylake
passed to the underlaying toolchain.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Now feature "mshv" can be built together with "kvm". There is no need to
use "--no-default-features" any more.
Fixes: #5647
Signed-off-by: Bo Chen <chen.bo@intel.com>
The reset system is asynchronous with an I/O event (PIO or MMIO) for
ACPI/i8042/CMOS triggering a write to the reset_evt event handler. The
VMM thread will pick up this event on the VMM main loop and then trigger
a shutdown in the CpuManager. However since there is some delay between
the CPU threads being marked to be killed (through the
CpuManager::cpus_kill_signalled bool) it is possible for the guest vCPU
that triggered the exit to be re-entered when the vCPU KVM_RUN is called
after the I/O exit is completed.
This is undesirable and in particular the Linux kernel will attempt to
jump to real mode after a CMOS based exit - this is unsupported in
nested KVM on AMD on Azure and will trigger an error in KVM_RUN.
Solve this problem by spinning in the device that has triggered the
reset until the vcpus_kill_signalled boolean has been updated
indicating that the VMM thread has received the event and called
CpuManager::shutdown(). In particular if this bool is set then the vCPU
threads will not re-enter the guest.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Currently if container registry is inaccessible the image will be built
locally and that takes time. This patch adds support to use mirror
registry. To use a different registry CTR_IMAGE environment variable
must be set. For example:
CTR_IMAGE="registry/cloud-hypervisor" scripts/dev_cli.sh
or
export CTR_IMAGE="registry/cloud-hypervisor"
scripts/dev_cli.sh
Signed-off-by: Ruslan Mstoi <ruslan.mstoi@intel.com>
Split interrupt source group restore into two steps, first restore
the irqfd for each interrupt source entry, and second restore the
GSI routing of the entire interrupt source group.
This patch will reduce restore latency of interrupt source group,
and in a 200-concurrent restore test, the patch reduced the
average IOAPIC restore time from 15ms to 1ms.
Signed-off-by: Yong He <alexyonghe@tencent.com>
The VFIO tests based on the NVMe emulation framework require cause a
high number of file descriptors to be opened on the system. On systems
with a low limit on opened files, tests like test_vfio_user can possibly
fail with "too many open files" exeption. This issue is fixed by raising
the corresponding limit.
A tricky detail here is, that the limit has to be changed in the test
container and not on the host. As the the container is executed in the
privileged mode, the setting in it will override the host value.
Related: #5426
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Add integration test of coredump with no need pause.
As file of coredump has been tested in test_coredump(), so this
patch only test vm state after coredump.
Signed-off-by: Yi Wang <foxywang@tencent.com>
If the VMM is not already paused then pause the VM prior to executing
the coredump and then resume it after. If the VM is already paused then
the original state is maintained.
Signed-off-by: Yi Wang <foxywang@tencent.com>
As the commit 2f15d027c05f (KVM: x86: Properly handle APF vs disabled
LAPIC situation) has been merged into kernel 5.13, so update the minimum
performance host kernel version to that.
Signed-off-by: Yi Wang <foxywang@tencent.com>
The commit b92fe648e9 (vmm: cpu: Disable KVM_FEATURE_ASYNC_PF_INT in
CPUID) disabled APF (Asynchronous Page Fault) mechanism to address
problem that makes vcpu thread spin 100%. As the actual issue is in
KVM, which has been merged in commit 2f15d027c05f (KVM: x86: Properly
handle APF vs disabled LAPIC situation) since 2021, so it's okay to
re-enable APF now.
Signed-off-by: Yi Wang <foxywang@tencent.com>
Add MSHV_CREATE_DEVICE, MSHV_SET_DEVICE_ATTR ioctls to filters. These
ioctls are required to passthrough PCI devices on mshv.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>