Relying on dnsmasq running on the host, the Windows guest are now
getting allocated with the expected IP addresses. This allows for
multiple VMs, therefore multiple tests to run in parallel.
The end goal is to reduce the time spent running Windows integration
tests.
Fixes#1891
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new wrapping structure allows for a better factorization of the
Windows specific code. This makes each test simpler and easier to read.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By using a Box around the DiskConfig trait, it becomes Sized. For that
reason, we can pass the DiskConfig to the Guest so that it can own it.
This allows for further simplification as the Guest does not need to be
bound to a specific lifetime, which makes things easier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Refactor the existing vhost-user-net integration tests in order to
extend it with an extra test for vhost-user client mode.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on guest Ubuntu image 21.04, including a 5.11 kernel, this patch
adds some additional tests to the VFIO baremetal integration tests. It
adds a test for ACPI memory hotplug, another one for virtio-mem memory
hotplug, and finally a test for hotplugging the NVIDIA card.
The existing test already taking care of the reboot has been renamed.
The script running "cargo test" has been modified to run only one thread
at a time, so that each test run sequentially. This is mandatory since
the card can't be shared across multiple VMs.
Fixes#2404
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support most recent Ubuntu distributions, we must update
the way of detecting a reboot through the journal since there is no
more "-- Reboot --" logs.
Using the `--list-boots` option is the preferred way for getting the
boot count information from journalctl command. We simply need to add 1
to the count in order to get the reboot count.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving to latest Ubuntu version as the guest image is needed to move to
more recent guest kernel (5.11). With more recent kernels, we'll be able
to add hotplug and virtio-mem tests to the VFIO baremetal CI.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The CI is failing due the git server that the submodules required for
this fork of QEMU need to build from is unavailable.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
test_reboot became flaky after the refactoring of the VM reboot code.
This is because we removed the ability to specify a custom timeout.
This patch fixes the issue by allowing a custom timeout of 120s to be
set.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Factorize NVIDIA GPU checks into its own function so that it can be
reused.
Factorize linux guest reboot into its own function to reduce the amount
of code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Use the PVH vmlinux for all tests (with the exception of the specific
bzImage test.)
See: #2231
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The new kernel 5.12 requires the devices to be manually bound to
vfio-pci while adding a new_id is only needed once per
device_id:vendor_id pair.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It appears that mshv is not yet there to succeed with these tests. It is
suggested to ignore them and enable later one by one as the
functionality gets fixed.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
We now reply on the value from '/sys/kernel/mm/ksm/shared_pages' to
validate our "--memory mergeable=on|off" option. For `mergeable=on`,
we are expecting to see more 'shared_pages' reported by the kernel when
we start more VMs with this option. For `mergeable=off`, we are
expecting the 'shared_pages' value to be always 0, as we are assuming
the rest of the system (in our CI) is not using mergeable memory.
Fixes: #2138
Signed-off-by: Bo Chen <chen.bo@intel.com>
The MCRS method returns a 64-bit memory range descriptor. The
calculation is supposed to be done as follows:
max = min + len - 1
However, every operand is represented not as a QWORD but as combination
of two DWORDs for high and low part. Till now, the calculation was done
this way, please see also inline comments:
max.lo = min.lo + len.lo //this may overflow, need to carry over to high
max.hi = min.hi + len.hi
max.hi = max.hi - 1 // subtraction needs to happen on the low part
This calculation has been corrected the following way:
max.lo = min.lo + len.lo
max.hi = min.hi + len.hi + (max.lo < min.lo) // check for overflow
max.lo = max.lo - 1 // subtract from low part
The relevant part from the generated ASL for the MCRS method:
```
Method (MCRS, 1, Serialized)
{
Acquire (MLCK, 0xFFFF)
\_SB.MHPC.MSEL = Arg0
Name (MR64, ResourceTemplate ()
{
QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x0000000000000000, // Range Minimum
0xFFFFFFFFFFFFFFFE, // Range Maximum
0x0000000000000000, // Translation Offset
0xFFFFFFFFFFFFFFFF, // Length
,, _Y00, AddressRangeMemory, TypeStatic)
})
CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL) // _MIN: Minimum Base Address
CreateDWordField (MR64, 0x12, MINH)
CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL) // _MAX: Maximum Base Address
CreateDWordField (MR64, 0x1A, MAXH)
CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL) // _LEN: Length
CreateDWordField (MR64, 0x2A, LENH)
MINL = \_SB.MHPC.MHBL
MINH = \_SB.MHPC.MHBH
LENL = \_SB.MHPC.MHLL
LENH = \_SB.MHPC.MHLH
MAXL = (MINL + LENL) /* \_SB_.MHPC.MCRS.LENL */
MAXH = (MINH + LENH) /* \_SB_.MHPC.MCRS.LENH */
If ((MAXL < MINL))
{
MAXH += One /* \_SB_.MHPC.MCRS.MAXH */
}
MAXL -= One
Release (MLCK)
Return (MR64) /* \_SB_.MHPC.MCRS.MR64 */
}
```
Fixes#1800.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Since using bzImage is now deprecated, let's update the SGX integration
test to rely on vmlinux instead.
Fixes#2476
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both changes aim to document the absence of the CPU hot-remove
functionality on Windows.
Closes#2457.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Update the Ubuntu Focal image used as the guest image. It's based on the
latest Focal image released on April 1st 2021, and customized to include
all the utilities we need. As usual, snapd and pollinate services have
been removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Fixes the current codebase so that every cargo clippy can be run with
the beta toolchain without any error.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It must be specified as excluded from the workspace as it must not be
built on non-test targets due to issues with the ssh2 dependency and the
musl toolchain.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This includes:
* OS disk image management
* Cloud init creation
* SSH to guest access
* Waiting for guest to boot
This will be useful in other projects that want to do similar things in
their integration tests.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on a NVIDIA Tesla T4 card present in the SGX machine, this patch
enables baremetal VFIO testing, validated by running several NVIDIA
tools in the guest. The guest image has been prepared to include all the
software needed to run these tests.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Enabled all "ttyS0" related test cases:
- test_serial_off
- test_serial_tty
- test_serial_file
Enabled mandatory guest kernel driver for "ns16550a" on AArch64.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
This removes the dependency on "tempdir" which in turn depends on the
large rand dependency chain.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the function can never return an error this is now a clippy failure:
error: this function's return value is unnecessarily wrapped by `Result`
--> virtio-devices/src/watchdog.rs:215:5
|
215 | / fn set_state(&mut self, state: &WatchdogState) -> io::Result<()> {
216 | | self.common.avail_features = state.avail_features;
217 | | self.common.acked_features = state.acked_features;
218 | | // When restoring enable the watchdog if it was previously enabled. We reset the timer
... |
223 | | Ok(())
224 | | }
| |_____^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_wraps
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add the ability for cloud-hypervisor to create, manage and monitor a
pty for serial and/or console I/O from a user. The reasoning for
having cloud-hypervisor create the ptys is so that clients, libvirt
for example, could exit and later re-open the pty without causing I/O
issues. If the clients were responsible for creating the pty, when
they exit the main pty fd would close and cause cloud-hypervisor to
get I/O errors on writes.
Ideally the main and subordinate pty fds would be kept in the main
vmm's Vm structure. However, because the device manager owns parsing
the configuration for the serial and console devices, the information
is instead stored in new fields under the DeviceManager structure
directly.
From there hooking up the main fd is intended to look as close to
handling stdin and stdout on the tty as possible (there is some future
work ahead for perhaps moving support for the pty into the
vmm_sys_utils crate).
The main fd is used for reading user input and writing to output of
the Vm device. The subordinate fd is used to setup raw mode and it is
kept open in order to avoid I/O errors when clients open and close the
pty device.
The ability to handle multiple inputs as part of this change is
intentional. The current code allows serial and console ptys to be
created and both be used as input. There was an implementation gap
though with the queue_input_bytes needing to be modified so the pty
handlers for serial and console could access the methods on the serial
and console structures directly. Without this change only a single
input source could be processed as the console would switch based on
its input type (this is still valid for tty and isn't otherwise
modified).
Signed-off-by: William Douglas <william.r.douglas@gmail.com>
Let's create a fixed VHD disk file from the existing RAW file thanks to
qemu-img, and create a new integration test to validate that
Cloud-Hypervisor can boot VHD disk image.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By using `net_util::open_tap` to create the TAP interface, the created
interface will be deleted when the returned variable (`net_utils::Tap`)
is dropped.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The Windows image is quite large (about 20GiB), hence it takes some time
to copy it for every test in order to avoid potential corruption.
One way to mitigate that without compromising on safety between each
test is by using device mapper. By creating a read-only base, we ensure
the image won't be modified by any of the tests, and by creating one
snapshot for each test, we avoid copying the entire image each time.
A dedicated Copy On Write disk image is created to handle any change
that might be performed on the base image, letting the tests behave as
expected.
Fixes#2155
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By relying on the Guest object, Windows dedicated tests copy the Windows
guest image before booting from it. The point being to avoid corruption
between multiple tests. This is already how the rest of the integration
tests work, Windows tests were the only ones missing this feature.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This image does not have the pollinate service which can sometimes fail
and prevent SSH from starting as it marks itself as a prerequisite. This
service will never fully succeed as it tries to make a network
connection which will fail inside our test VMs.
Fixes: #2113
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Using --net=host is not necessary for any of the integration tests, so
let's use the default network option called "bridge".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Some sporadic failures were due to an early connection to the VM while
it was not fully ready. Increasing sleep times fixes these issues.
Fixes#2104
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Given we already check the connected IP address matches the expected
guest IP address, the check on the "booted" message is not needed.
Fixes: #2117
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This test is very flaky and regularly causing CI failures. Until we can
identify the root cause we should disable this test.
See: #2103
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Simplify our image handling by not copying both QCOW2 and raw images for
every test. Allow the test to choose QCOW2 or raw by specifying the
image name manually. A follow on patch will add explicity QCOW2 tests.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When an SSH command fails we want to be able to see, via a panic() why
and where it failed. Replace use of .unwrap_or_default() from SSH
command calls to ensure that we can see the location of the panic.
Also enhance the existing SSH output code to show the error if there is
one.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The boot time for direct kernel boot based tests is significantly
quicker than booting via the firmware and stock kernel as it triggers a
reboot during the boot process due to the initrd handling.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When doing a direct kernel boot only have console=ttyS0 in the command
line if we are explicitly testing the serial output. The default
behaviour is `--serial null` so this output will not be visible but will
trigger a KVM exit for every byte which is very costly when running
under nested virtualization.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Starting the virtio device threads from the VMM thread has slowed down
the start of the VM when running on a highly contested system like the
CI.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
On the CI we are seeing that sometimes the epoll is receiving these
errors which do not indicate a failure but that we should retry.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As we switched to focal for this test we no longer get any output during
the boot unless serial is used over virtio-console.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There have been a lot of flakes around tests such as
test_virtio_fs_hotplug_dax_on_w_vhost_user_fs_daemon() or
test_virtio_fs_hotplug_dax_on() which all try and hotplug memory.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With the removal of vhost-user self-spawning support we should migrate
the tests to use the binaries so that we can remove the functionality
from the cloud-hypervisor binary itself.
See: #1925
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
While the addressable space size reduction of 4k in necessary due to
the Linux bug, the 64k alignment of the addressable space size is
required by Windows. This patch satisfies both.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Set the test case test_snapshot_restore X86 only, instead of excluding
it from test command line.
The command line option was added because we used to support migration
with Virtio-MMIO, but not Virtio-PCI.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Tests not ported include 1) the ones that start guest VMs without
network (e.g. test_net_hotplug, test_initramfs), 2) test_vfio that
involves l2 guest. Also, some tests that use bionic guest image are
given extended timeout (120s) for 'wait_vm_boot'.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Instead of waiting blindly with fixed amount of sleeping time, we can
use the `wait-timeout` crate to explicitly wait VM shutdown (with a
timeout). It can reduces the execution time of some tests
substantially. Also, this patch increases the `shutdown` timeout for
'test_reboot', which should fix the recent sporadic failures on this
test.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Instead of blindly waiting for 20-40s for the guest VM to boot, this
patch waits the notification from the guest VM explicitly by using a
simple TcpListener on the host and a custom systemd service in the
guest.
This patch also ported few tests to use this new machanism, while more
tests are to be ported.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Now that virtio-balloon is not declared as part of the --memory
parameter, the integration tests are updated to keep the correct
behavior.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Given the increased amount of output from cloud-hypervisor, this patch
also increased the PIPE_SIZE to 32MB (from 256KB).
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch prints the complete commandline when launching
cloud-hypervisor. It also prints the details of the `ssh` command if
the command is failing.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This is a new integration test running Windows as a guest with Cloud
Hypervisor. Once the VM is booted, the test connects to the guest
through SSH and shutdown the VM. If this succeeds, this means the VM
was properly booted to userspace and that the network was functional.
Important to note that because this test generates lots of logs, it
requires a large pipe size for both stdout and stderr.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Introduce a new test that will validate the new option `max_phys_bits`
from the `--cpus` parameter behaves as expected.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since all unit and integration tests are run inside containers because
they are called from dev_cli.sh, they always run as root. That's why
both unit and integration scripts can be simplified as they don't need
to apply specific capabilities and run cargo tests in a dedicated 'kvm'
group.
Fixes#1683
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>