This PR addresses a bug in which the cpu topology of a guest
with non power-of-two number of cores is incorrect. For example,
in some contexts, a virtual machine with 2-sockets and 12-cores
will incorrectly believe that 16 cores are on socket 1 and 8
cores are on socket 2. In other cases, common topology enumeration
software such as hwloc will crash.
The root of the problem was the way that cloud-hypervisor generates
apic_id. On x86_64, the (x2) apic_id embeds information about cpu
topology. The cpuid instruction is primarily used to discover the
number of sockets, dies, cores, threads, etc. Using this information,
the (x2) apic_id is masked to determine which {core, die, socket} the
cpu is on. When the cpu topology is not a power of two
(e.g. a 12-core machine), this requires non-contiguous (x2) apic_id.
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
This patch defines a new function 'generate_ram_ranges', to generate
usable physical memory ranges for the guest based on the existing guest
memory managed by VMM. This function is also made public, so that it can
be reused, say by the IGVM loader in the future [1].
No functional change.
See: #6020
Signed-off-by: Bo Chen <chen.bo@intel.com>
Now, default values for vcpu topology are 0s, that is not correct and may
lead to bug. Fix it by setting default value to 1s. Also add check in
case one or more of these values are zero.
Fixes: #5892
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
This struct contains all configuration fields that controls the way how
we generate CPUID for the guest on x86_64. This allows cleaner extension
when adding new configuration fields.
Signed-off-by: Bo Chen <chen.bo@intel.com>
EntryPoint had an optional entry_addr, but there is no usage of this
struct that makes it necessary that the address is optional.
Remove the Option to avoid being able to express things that are not
useful.
Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de>
This fixes all typos found by the typos utility with respect to the config file.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Update to the latest vm-memory and all the crates that also depend upon
it.
Fix some deprecation warnings.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
This feature flag gates the development for SEV-SNP enabled guest.
Also add a helper function to identify if SNP should be enabled for the
guest.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
We fixed the L2 and L3 level cache sharing issues and confirmed that the
L2 level cache is independent, while the L3 level cache is shared per-socket.
See:#5505
Signed-off-by: zhongbingnan <zhongbingnan@bytedance.com>
The commit b92fe648e9 (vmm: cpu: Disable KVM_FEATURE_ASYNC_PF_INT in
CPUID) disabled APF (Asynchronous Page Fault) mechanism to address
problem that makes vcpu thread spin 100%. As the actual issue is in
KVM, which has been merged in commit 2f15d027c05f (KVM: x86: Properly
handle APF vs disabled LAPIC situation) since 2021, so it's okay to
re-enable APF now.
Signed-off-by: Yi Wang <foxywang@tencent.com>
Using the data from sysfs forward the host host cache layout to the
guest using the FDT tables.
TEST=The host cache layout (from sysfs) can be seen in inside the guest
using lscpu.
Signed-off-by: zhongbingnan <zhongbingnan@bytedance.com>
This patch clarifies the assumptions we have regarding the guest address
space layout while creating memory mapping in E820 on x86_64 and fdt on
aarch64. It also explicitly checks on these assumptions and report
errors if these assumptions do not hold.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The previous `arch_memory_regions` function will provide some memory
regions with the specified memory size and fill all the previous
regions before using the next one, but sometimes there may be no need
to fill up the previous one, e.g., the previous one should be aligned
with hugepage size.
This commit make `arch_memory_regions` function not take any
parameters and return the max available regions, the memory manager
can use them on demand.
Fixes: #5463
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
The original codes did not consider that the previous memory region
might not be full and always set it to the maximum size.
This commit fixes this problem by creating memory mappings based on
the actual memory details in both E820 on x86_64 and fdt on aarch64.
Fixes: #5463
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
Program the APIC ID (CPUID leaf 0x1 EBX) with the CPU id. This resolves
an issue where the EDKII firmware expects the APIC ID to vary per-CPU.
Fixes: #5475
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Bump to the latest rust-vmm crates, including vm-memory, vfio,
vfio-bindings, vfio-user, virtio-bindings, virtio-queue, linux-loader,
vhost, and vhost-user-backend,
Signed-off-by: Bo Chen <chen.bo@intel.com>
This will allow the SIGWINCH listener to run on kernels older than
5.5, although on those kernels it will have to make 64 syscalls to
reset all the signal handlers.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
This doesn't need to be rendered in the HTML API documentation, and
wouldn't be formatted correctly if it was.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
This hypervisor leaf includes details of the TSC frequency if that is
available from KVM. This can be used to efficiently calculate time
passed when there is an invariant TSC.
TEST=Run `cpuid` in the guest and observe the frequency populated.
Fixes: #5178
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Before Linux v6.0, AArch64 didn't support "socket" in "cpu-map"
(CPU topology) of FDT.
We found that clusters can be used in the same way of sockets. That is
the way we implemented the socket settings in Cloud Hypervisor. But in
fact it was a bug.
Linux commit 26a2b7 fixed the mistake. So the cluster nodes can no
longer act as sockets. And in a following commit dea8c0, sockets were
supported.
This patch fixed the way to configure sockets. In each socket, a default
cluster was added to contain all the cores, because cluster layer is
mandatory in CPU topology on AArch64.
This fix will break the socket settings on the guests where the kernel
version is lower than v6.0. In that case, if socket number is set to
more than 1, the kernel will treat that as FDT mistake and all the CPUs
will be put in single cluster of single socket.
The patch only impacts the case of using FDT, not ACPI.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
e.g. on QEMU on KVM:
cloud-hypervisor: 17.079406ms: <vmm> INFO:arch/src/x86_64/mod.rs:565 -- Running under nested virtualisation. Hypervisor string: KVMKVMKVM
Or under Azure:
cloud-hypervisor: 3.881263ms: <vmm> INFO:arch/src/x86_64/mod.rs:565 -- Running under nested virtualisation. Hypervisor string: Microsoft Hv
Fixes: #5067
Signed-off-by: Rob Bradford <robert.bradford@intel.com>