Similar to balloon inflation, memory allocation is also constrained to
align with the page size. Therefore, memory is allocated in units of the
host page size, one page at a time, until all host pages that the memory
range requested by the guest are managed. If the requested size is
smaller than the page size, the entire page will still be allocated
because smaller allocations are not possible due to the page size
limitation.
Fixes: cloud-hypervisor#5369
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Currently, virtio-balloon can't work well with page size other than 4k.
The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
we can only actually discard memory in units of the host page size.
We get some idea from [1] to solve this issue.
What has been done in this commit:
For balloon inflation:
A bitmap is employed to track the memory range to be released in 4k
granularity. Once it accumulates to one host page size, the corresponding
page is released, and the bitmap is cleared to handle the next record.
This process continues until all the memory range is managed. Memory will
only be released when a consecutive set of balloon request entries from
the same host page reaches the full host page size. If a balloon request
entry from a different host page is encountered, the bitmap and the base
host page address will be reset. Consequently, memory is released in
units of the page size, ensuring efficient memory management. That's say
if memory range length to be released smaller than page size or if the
guest scatters requests each of whose size is smaller than page size
across different host pages no memory will be released.
[1] https://patchwork.kernel.org/project/qemu-devel/patch/20190214043916.22128-6-david@gibson.dropbear.id.au/
Fixes: cloud-hypervisor#5369
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Add pending removed vcpu check according to VcpuState.removing, which
can avoid cloud hypervisor hangup during continual vcpu resize.
Fix#5419
Signed-off-by: Yi Wang <foxywang@tencent.com>
These had to be previously disabled because a released binary has to be
used in course of the test scenario. The necessary functionality is now
supported with the release 34.0, thus these tests should now pass.
Fixes: #5660
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
We fixed the L2 and L3 level cache sharing issues and confirmed that the
L2 level cache is independent, while the L3 level cache is shared per-socket.
See:#5505
Signed-off-by: zhongbingnan <zhongbingnan@bytedance.com>
This patch modifies `event_monitor` to ensure that concurrent access to
`event_log` from multiple threads is safe. Previously, the `event_log`
function would acquire a reference to the event log file and write
to it without doing any synchronization, which made it prone to
data races. This issue likely went under the radar because the
relevant `SAFETY` comment on the unsafe block was incomplete.
The new implementation spawns a dedicated thread named `event-monitor`
solely for writing to the file. It uses the MPMC channel exposed by
`flume` to pass messages to the `event-monitor` thread. Since
`flume::Sender<T>` implements `Sync`, it is safe for multiple threads
to share it and send messages to the `event-monitor` thread.
This is not possible with `std::sync::mpsc::Sender<T>` since it's
`!Sync`, meaning it is not safe for it to be shared between different
threads.
The `event_monitor::set_monitor` function now only initializes
the required global state and returns an instance of the
`Monitor` struct. This decouples the actual logging logic from the
`event_monitor` crate. The `event-monitor` thread is then spawned by
the `vmm` crate.
Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
These tests require a binary from a previous release to perform a
migration. That binary doesn't yet have the shutdown fixes and thus
these tests are supposed to fail. The tests are to be re-enabled after
the next release.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
The fixup_msix_region() function added in
a718716831
made the assumption that MSI-X was always available. This is the case
with many VFIO devices and all our virtio devices but created regression
with MSI devices.
Simply return the existing region size if MSI-X is not supported by the
device.
Fixes: #5649
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
With the addition of the spinning waiting for the exit event to be
received in the CMOS device a regression was introduced into the CMOS
fuzzer. Since there is nothing to receive the event in the fuzzer and
there is nothing to update the bit the that the device is looping on;
introducing an infinite loop.
Use an Option<> type so that when running the device in the fuzzer no
Arc<AtomicBool> is provided effectively disabling the spinning logic.
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=61165
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
As the container image can be used on both Intel and AMD,
ensure the SPDK binaries are compatible. This implies -march=skylake
passed to the underlaying toolchain.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Now feature "mshv" can be built together with "kvm". There is no need to
use "--no-default-features" any more.
Fixes: #5647
Signed-off-by: Bo Chen <chen.bo@intel.com>
The reset system is asynchronous with an I/O event (PIO or MMIO) for
ACPI/i8042/CMOS triggering a write to the reset_evt event handler. The
VMM thread will pick up this event on the VMM main loop and then trigger
a shutdown in the CpuManager. However since there is some delay between
the CPU threads being marked to be killed (through the
CpuManager::cpus_kill_signalled bool) it is possible for the guest vCPU
that triggered the exit to be re-entered when the vCPU KVM_RUN is called
after the I/O exit is completed.
This is undesirable and in particular the Linux kernel will attempt to
jump to real mode after a CMOS based exit - this is unsupported in
nested KVM on AMD on Azure and will trigger an error in KVM_RUN.
Solve this problem by spinning in the device that has triggered the
reset until the vcpus_kill_signalled boolean has been updated
indicating that the VMM thread has received the event and called
CpuManager::shutdown(). In particular if this bool is set then the vCPU
threads will not re-enter the guest.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Currently if container registry is inaccessible the image will be built
locally and that takes time. This patch adds support to use mirror
registry. To use a different registry CTR_IMAGE environment variable
must be set. For example:
CTR_IMAGE="registry/cloud-hypervisor" scripts/dev_cli.sh
or
export CTR_IMAGE="registry/cloud-hypervisor"
scripts/dev_cli.sh
Signed-off-by: Ruslan Mstoi <ruslan.mstoi@intel.com>
Split interrupt source group restore into two steps, first restore
the irqfd for each interrupt source entry, and second restore the
GSI routing of the entire interrupt source group.
This patch will reduce restore latency of interrupt source group,
and in a 200-concurrent restore test, the patch reduced the
average IOAPIC restore time from 15ms to 1ms.
Signed-off-by: Yong He <alexyonghe@tencent.com>