Move the live migration tests to a 'jammy' worker rather than
'jammy-small'. This type of worker has more CPUs (64 vs 16) and more RAM
(256G vs 64G), which should reduce the time it takes to run each test.
With this improvement, the tests should no longer fail because the
timeout is reached.
A second improvement is to reduce the number of vCPUs created for each
VM. The point is simply to check that we can migrate a VM with multiple
vCPUs, so using 2 instead of 6 should be enough whenever possible.
When testing NUMA, we can't lower the number of vCPUs since a fairly
complex topology is expected there.
Also, the total number of vCPUs is reduced from 12 to 4 (again, when not
testing with NUMA).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There is no point in wasting resources building with Jenkins if the
change only modifies the fuzzers.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This reverts commit 0d0013c46e86c18d383f3e180fa8959091bba8b9.
The Groovy shell script execution engine does not accept a single
backslash as the escape character, so we need to add another backslash
to escape the backslash character. This will most likely fix the issue
we saw with the CI.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
To conserve resources, it is better not to run the CI when the only
changes are in fuzz/Cargo.toml or fuzz/Cargo.lock.
Fixes #4148
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
For now only generate the boot time related tests as the full metrics
test suite needs some more time to bed in.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There is no need to run the Jenkins CI on pull requests that only modify
Markdown files.
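A minimal sketch of the "skip CI when only certain files change" rule,
written in Python rather than the Jenkinsfile's Groovy. The helper name
should_skip_ci and the glob patterns are hypothetical, chosen to mirror
the Markdown-only and fuzz-only cases described in these commits; they
are not taken from the actual Jenkinsfile.

```python
from fnmatch import fnmatch

# Hypothetical patterns: a change confined to these paths does not need
# the Jenkins CI. Note that fnmatch's "*" also matches "/".
SKIP_PATTERNS = ["*.md", "fuzz/*"]


def should_skip_ci(changed_files, patterns=SKIP_PATTERNS):
    """Return True only if every changed file matches at least one skip pattern."""
    if not changed_files:
        # Nothing to judge on: run the CI to be safe.
        return False
    return all(
        any(fnmatch(path, pattern) for pattern in patterns)
        for path in changed_files
    )
```

Requiring that every changed file match (all over any) is what keeps a
mixed pull request, such as a Markdown edit plus a Rust change, running
the full CI.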
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since AZURE_CONNECTION_STRING is only useful for the Windows build,
let's remove it from other builds where it's not invoked.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend the test_simple_launch() integration test to validate that Cloud
Hypervisor boots correctly with both rust-hypervisor-fw and OVMF on
x86_64 platforms.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Bump the OVMF binary version, along with the UEFI documentation, to
reflect the latest set of patches on top of the tianocore/edk2 'master'
branch. These patches can be found on the 'ch' branch of the Cloud
Hypervisor fork.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The Jenkins master is now known as the controller and the agent it provides is
called "built-in".
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Since we ran into issues while using the Azure credentials plugin for
Jenkins, let's rely directly on the Azure CLI to download the Windows
guest image along with the modified OVMF firmware.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
To avoid regressions in OVS-DPDK support, a new integration test is
added. This test runs two VMs, each attached to a distinct OVS port,
with both ports connected to an OVS bridge. Once the VMs are running,
the test validates that the connection between the two VMs works
correctly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since SGX and VFIO tests don't need to be run on pull requests, we
should not have to wait for the corresponding node (bionic-sgx or
bionic-vfio) to become available in order to skip them.
Fixes #2607
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on dnsmasq running on the host, the Windows guests are now
allocated the expected IP addresses. This allows multiple VMs, and
therefore multiple tests, to run in parallel.
The end goal is to reduce the time spent running Windows integration
tests.
Fixes #1891
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on an NVIDIA Tesla T4 card present in the SGX machine, this patch
enables bare-metal VFIO testing, validated by running several NVIDIA
tools in the guest. The guest image has been prepared to include all the
software needed to run these tests.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the SGX server is down for maintenance, all builds are waiting on
the node agent to answer, causing all PRs to be blocked.
Let's temporarily disable the SGX CI until the server is back up.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because we're back on transient builders, let's download the image
every time.
This reverts commit b5653d52787a0be5f58d557fb06ff798ac280c45.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Temporarily disable fast failing to try and make progress on CI
stabilisation to allow more test coverage.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If we rely on timeouts at the top level we can get builds being aborted
simply because they took too long to be scheduled rather than because
the actual integration tests took too long.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Build testing of changes happens on GitHub Actions, and the integration
tests will build the binary again (with different feature flags). So
these earlier build operations are just wasted time on the critical
path.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extend the Cloud Hypervisor CI to allow testing SGX on a dedicated
machine where a special image and kernels are available.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The problem with the previous solution was that the Cleanup stage was
not executed when previous stages failed. We fix this by adding a post
section that always executes after the Aarch64 build completes, no
matter whether it failed, succeeded, or was aborted.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cleanup of the Aarch64 machine can't be done as part of the parallel
stage, as it would often be skipped. When the build is aborted because
another parallel stage failed, the post actions are simply not
performed. That's why we need a dedicated stage, outside of the parallel
ones, to clean up the Aarch64 machine.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We recently added the ability for the gnu and musl workers to retry if
integration tests were not passing, relying on some simple Jenkins
options. Unfortunately, this is not working as expected, since the
retries never pass either. The suspected reason is the machine itself,
which might be scheduled on specific hardware that makes our VMs more
error-prone.
Bottom line: on a faulty machine, the tests will always fail, so there
is no added value in retrying on the same machine.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If the build was aborted then the dev_cli.sh code that is responsible
for changing the file ownership will not get run. This results in the
failure to delete some of the files in the workspace.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>