This PR addresses a bug in which the cpu topology of a guest
with non power-of-two number of cores is incorrect. For example,
in some contexts, a virtual machine with 2-sockets and 12-cores
will incorrectly believe that 16 cores are on socket 1 and 8
cores are on socket 2. In other cases, common topology enumeration
software such as hwloc will crash.
The root of the problem was the way that cloud-hypervisor generates
apic_id. On x86_64, the (x2) apic_id embeds information about cpu
topology. The cpuid instruction is primarily used to discover the
number of sockets, dies, cores, threads, etc. Using this information,
the (x2) apic_id is masked to determine which {core, die, socket} the
cpu is on. When the cpu topology is not a power of two
(e.g. a 12-core machine), this requires non-contiguous (x2) apic_id.
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
(cherry picked from commit 5c0b66529a5ea053ce50f6a67b4de3bfb9071696)
According to crates.io the crc-any crate is actively maintained which
avoids issues with the crc32c crate and the nightly compiler.
Fixes: #6168
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit d516374c39d99dffd69ef856c41b73e939c7096c)
This patch bumps the following crates, including `kvm-bindings@0.7.0`*,
`kvm-ioctls@0.16.0`**, `linux-loader@0.11.0`, `versionize@0.2.0`,
`versionize_derive@0.1.6`***, `vhost@0.10.0`,
`vhost-user-backend@0.13.1`, `virtio-queue@0.11.0`, `vm-memory@0.14.0`,
`vmm-sys-util@0.12.1`, and the latest of `vfio-bindings`, `vfio-ioctls`,
`mshv-bindings`,`mshv-ioctls`, and `vfio-user`.
* A fork of the `kvm-bindings` crate is being used to support
serialization of various structs for migration [1]. Also, code changes
are made to accommodate the updated `struct xsave` from the Linux
kernel. Note: these changes related to `struct xsave` break
live-upgrade.
** The new `kvm-ioctls` crate introduced breaking changes for
the `get/set_one_reg` API on `aarch64` [2], so code changes are made to
the new APIs.
*** A fork of the `versionize_derive` crate is being used to support
versionize on packed structs [3].
[1] https://github.com/cloud-hypervisor/kvm-bindings/tree/ch-v0.7.0
[2] https://github.com/rust-vmm/kvm-ioctls/pull/223
[3] https://github.com/cloud-hypervisor/versionize_derive/tree/ch-0.1.6Fixes: #6072
Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 3ce0fef7fd546467398c914dbc74d8542e45cf6f)
When the HTT flag CPUID.1.EDX[HTT] is 0, it indicates that there is
only a single logical processor in the package. When HTT is 1, it
indicates that CPUID.1.EBX[23:16] contains the number of logical
processors in the package.
When this information is not included in CPUID leaf 0x1, some cpu
topology enumeration software such as hwloc are known to crash.
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
(cherry picked from commit 5ec47d4883666387ea58d2d9124838c2639d1e37)
When using amd topology, the svm feature flag on cpuid leaf
0x8000_0001.ecx is overwritten. We update the amd cpu topology
logic to use the flag values that originated in
KVM_GET_SUPPORTED_CPUID ioctl and override as necessary.
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
(cherry picked from commit 7bc764d4e0da03bdbeb0d0f734b368d618944ac3)
Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit d245e624275ea7d93b31e056666103fe16827040)
When running on the merge group this workflow is run twice - once for
the create event (merge queue creates a new branch) and once for the
merge_group event. Unfortunately the second event would cause the first
to be cancelled - unfortunately sometimes that second event is the
create event where the job in the workflow only runs if it is also a
tag.
By creating distinct concurrency groups for each event type then the
cross cancellation can be avoided.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 6f49d7f192beb7224ec7187debe250e97909ad23)
The workers share a common public IP address and often GitHub will
reject attempts to access the API due to exceeding the anonymous rate
limit threshold.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 0f71956d6df9c241b205ae3c68e38dfd98f73d68)
Run these workflows as part of the merge queue to help improve testing
coverage.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit cdafe5344d09b5f7fd731dec90657ed0c1a5b5f8)
When a bare-metal worker is canceled, its workspace can be left with
files owned by the root user as a result of running tests from our
container. This patch add a step to fix workspace permissions for such
case before checking out code.
Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit f48942ce3f12f507ea5530b926aaf631d914dadd)
It takes longer time to restore a VM on a VM with 16 cores comparing
with ones with 64 cores.
Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 071806785187e28d3567d6f2471de07fdad07c76)
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 7d60ab70e6d1d061f20524b85318cd650f88995f)
If the PR updated cancel outstanding jobs to conserve resources.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 1db30405e13995172bd45386a49a54e7c7a5f621)
The DCO tool doesn't understand merge_groups but we still need to have a
valid status check to allow the merge group to proceed.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 96cc1ba76c620f0648a64c025111309c967a3f79)
This takes a long time and duplicates existing checks on the pull
requests.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 022f375ef86f6b099b68144c9a85dcecc95492ef)
Run all the tests on the merge queue.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 81b95023c47123799413d1220c06fa0c5885cea6)
When running with the merge queue the tests will be fully executed.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit f15ca1aec398c8180fc0602b67f6e10e9abeab0f)
And clean up some of the whitespace formatting so that the "name" and
"on" are grouped away from the "jobs".
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit d9f48505fec88940025e6844cc541e84ffbff6dd)
Most of our CI workers are now running form GitHub actions, so we are
ready to disable Jenkins CI workers.
See: #6231
Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 1d098949b9f3ed7965d3ff0d4fc1fcb348f33506)
Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit ba6bfee4fffa892a4f8e9a31b65a108786a65261)
Add top-level timeout for the jobs and also more agressive per step
timeouts.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 1fe2771a0ddb05d5e952eb67a18a34d656efe3a7)
To reduce issues caused by flaky tests split the musl and glibc jobs
into separate jobs. This means fewer jobs will need to be restarted for
flaky tests. This will also increase CI throughput since the musl builds
account for ~40% of the total CI time when run together with glibc.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 2e4079becb785e1b948b9cd4ae97ca3ab846a9ef)
This will help handle flakiness in the builds by requiring the minimum
number of restarts.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit d32de07be7a1fb2e5dea1a09906578107ed9b5df)
Use the matrix to add a build runnind on the AMD variant of the garm
runner.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 84a6da5e93f15b1df4e088255f51e2840b15c041)
The bionic image was being downloaded and converted but no test uses
this image any longer.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
(cherry picked from commit 6930370a03f8764b0054925515420fd20f3169df)
The 'test_vfio_user' is prone to fail when the system is under high
workloads with errors:
```
Error while connecting to /var/tmp/spdk.sock
Is SPDK application running?
Error details: Invalid or non-existing address: '/var/tmp/spdk.sock'
```
This is because SPDK is not fully functional before we request to
create a nvme device using the vfio_user protocol. This patch stabilize
this test with allowing retires to execute host commands.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch defines a new function 'generate_ram_ranges', to generate
usable physical memory ranges for the guest based on the existing guest
memory managed by VMM. This function is also made public, so that it can
be reused, say by the IGVM loader in the future [1].
No functional change.
See: #6020
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch adds missing new lines after functions,
fixes few typos in the comments, adds few missing
comments to SNP related functions.
Signed-off-by: Muminul Islam <muislam@microsoft.com>