Move the live migration tests to a 'jammy' worker rather than
'jammy-small'. This type of worker has more CPUs (64 vs 16) and more RAM
(256G vs 64G), which should improve the time it takes to run each test.
With this improvement, the tests shouldn't fail anymore due to the
timeout being reached.
A second improvement is to reduce the number of vCPUs created for each
VM. The point is simply to check that we can migrate a VM with multiple
vCPUs, therefore using 2 instead of 6 should be enough when possible.
When testing NUMA, we can't lower the number of vCPUs since a fairly
complex topology is expected there.
Also, the total number of vCPUs is reduced from 12 to 4 (again, when not
testing with NUMA).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Fuzz the HTTP API handling code with a minimal HTTP API
receiver (e.g. mocking the behavior of our "vmm" thread).
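A minimal sketch of the layout, with hypothetical names (the real fuzzer
feeds the bytes into Cloud Hypervisor's HTTP handling code rather than
this stand-in):

    #![no_main]
    use libfuzzer_sys::fuzz_target;
    use std::sync::mpsc::{channel, Sender};
    use std::thread;

    // Stand-in for the API request type carried over the channel.
    struct ApiRequest {
        response_sender: Sender<Result<(), ()>>,
    }

    fn handle_http_request(_data: &[u8], api_sender: Sender<ApiRequest>) {
        // The real handler parses the bytes as an HTTP request; this
        // stand-in only mimics the part that sends an ApiRequest.
        let (response_sender, _response_receiver) = channel();
        let _ = api_sender.send(ApiRequest { response_sender });
    }

    fuzz_target!(|data: &[u8]| {
        let (api_sender, api_receiver) = channel();

        // Minimal "vmm" stand-in: acknowledge every request so the HTTP
        // handler never blocks waiting for a response.
        let mock_vmm = thread::spawn(move || {
            while let Ok(req) = api_receiver.recv() {
                let _ = req.response_sender.send(Ok(()));
            }
        });

        handle_http_request(data, api_sender);
        let _ = mock_vmm.join();
    });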
Signed-off-by: Bo Chen <chen.bo@intel.com>
The parameter "poll_queue" was useful at the time Cloud Hypervisor was
responsible for spawning vhost-user backends, as it was carrying the
information the vhost-user-block backend should have this option enabled
or not.
It's been quite some time since we moved away from this design, as we
now expect a management layer to be responsible for running vhost-user
backends.
That's the reason why we can remove "poll_queue" from the DiskConfig
structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Improve error handling on the steps creating the block device so that
we can understand whether qemu-img or losetup is the faulty command
leading to an empty device path.
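A rough sketch of the idea (the helper and its arguments are
illustrative, not the actual test code):

    use std::process::Command;

    // Run a command and, on failure, panic with its stderr so it is
    // obvious whether qemu-img or losetup is the step that broke.
    fn run_or_panic(cmd: &str, args: &[&str]) -> String {
        let output = Command::new(cmd)
            .args(args)
            .output()
            .unwrap_or_else(|e| panic!("failed to spawn {cmd}: {e}"));
        assert!(
            output.status.success(),
            "{cmd} failed: {}",
            String::from_utf8_lossy(&output.stderr)
        );
        String::from_utf8_lossy(&output.stdout).trim().to_string()
    }

    fn create_loop_device(image: &str) -> String {
        run_or_panic("qemu-img", &["create", "-f", "raw", image, "16M"]);
        // An empty device path here would previously go unnoticed.
        run_or_panic("losetup", &["--show", "-f", image])
    }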
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both of these tests have been sporadically failing through multiple CI
runs. The reason is related to cloud-init, which fails to run the
"init-local" script during the second boot of the VM. This causes the
network interface to not be available, and therefore the test can't SSH
into the VM as expected. The root cause is the filesystem and cache
corruption that happens on the cloud-init disk.
The way to prevent this issue is to sync the guest filesystem before we
shut it down, and as a safety net, we also wait a few seconds for the
shutdown command to complete inside the guest before we trigger the API
shutdown or delete.
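A minimal sketch of the sequence, using an illustrative ssh helper
rather than the test framework's own:

    use std::process::Command;
    use std::thread;
    use std::time::Duration;

    // Flush the guest filesystem, ask the guest to power off, then wait
    // before using the VMM API.
    fn sync_and_shutdown(guest_ip: &str) {
        let ssh = |cmd: &str| {
            Command::new("ssh")
                .args(["-o", "StrictHostKeyChecking=no", guest_ip, cmd])
                .status()
                .expect("failed to run ssh");
        };

        // Flush dirty pages so the cloud-init disk isn't left with a
        // corrupted filesystem or cache for the next boot.
        ssh("sync");
        ssh("sudo shutdown -h now");
        // Give the in-guest shutdown a few seconds to complete before the
        // API shutdown or delete is triggered.
        thread::sleep(Duration::new(5, 0));
    }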
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we rely on pop_descriptor_chain() rather than iter() to iterate
over a queue, there's no longer a borrow on the queue itself, meaning we
can invoke add_used() directly from the iteration loop. This simplifies
the processing of the queues for each virtio device, and brings a
possible performance improvement given we don't have to iterate twice
over the list of descriptors to invoke add_used().
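A minimal sketch of the resulting loop, with stand-in types rather than
the actual virtio-queue API:

    // Stand-ins that only show the control flow.
    struct DescriptorChain {
        head_index: u16,
    }

    struct Queue {
        chains: Vec<DescriptorChain>,
    }

    impl Queue {
        // Returns the next chain as an owned value, so the queue is not
        // kept borrowed the way iter() does.
        fn pop_descriptor_chain(&mut self) -> Option<DescriptorChain> {
            self.chains.pop()
        }

        fn add_used(&mut self, head_index: u16, _len: u32) {
            // Record the completed chain in the used ring.
            let _ = head_index;
        }
    }

    fn process_queue(queue: &mut Queue) {
        // Each chain is owned by the loop body, so add_used() can be
        // invoked right away instead of collecting the chains and looping
        // over them a second time once the iter() borrow has ended.
        while let Some(chain) = queue.pop_descriptor_chain() {
            let used_len = 0u32; // would be the number of bytes written
            queue.add_used(chain.head_index, used_len);
        }
    }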
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Using pop_descriptor_chain() is much more appropriate than iter() since
it recreates the iterator every time, which avoids borrowing the queue
and allows the virtio-net implementation to match all the other ones.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The new virtio-queue version introduced some breaking changes which need
to be addressed so that Cloud Hypervisor can still work with this
version.
The most important change is the removal of the guest memory handle
from the Queue, meaning the caller has to provide it to multiple methods
of the QueueT trait.
One interesting aspect is that QueueT has been widely extended to
provide every getter and setter we need to access and update the Queue
structure without having direct access to its internal fields.
This patch ports all the virtio and vhost-user devices to this new crate
definition. It also updates both the vhost-user-block and vhost-user-net
backends based on the updated vhost-user-backend crate, as well as the
fuzz directory.
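As a rough illustration of the new calling convention (signatures
simplified, not the exact QueueT definition):

    // The guest memory handle is now an explicit argument to the queue
    // methods instead of being stored inside the Queue.
    struct GuestMemoryMmap;

    trait QueueT {
        fn set_ready(&mut self, ready: bool);
        fn add_used(&mut self, mem: &GuestMemoryMmap, head: u16, len: u32);
    }

    fn complete_request(
        queue: &mut impl QueueT,
        mem: &GuestMemoryMmap,
        head: u16,
        len: u32,
    ) {
        // Every device now threads its guest memory handle through.
        queue.add_used(mem, head, len);
    }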
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There is no point in wasting resources building with Jenkins if the
change only modifies the fuzzers.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Adding a new limitation related to the TDX guest kernel, as it doesn't
allow for most ACPI devices, meaning PCI hotplug through ACPI isn't
supported unless the 'tdx_disable_filter' boot parameter is used.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Handle the case where `body_offset` is `None` instead of calling
`unwrap()`, which leads to a panic.
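A minimal sketch of the pattern (types and names are illustrative):

    // Surface an error instead of panicking when the request carries no
    // body offset.
    struct Request {
        body_offset: Option<usize>,
    }

    #[derive(Debug)]
    enum HttpError {
        MissingBody,
    }

    fn body_start(req: &Request) -> Result<usize, HttpError> {
        // Previously: req.body_offset.unwrap(), which panics on None.
        req.body_offset.ok_or(HttpError::MissingBody)
    }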
Signed-off-by: Maximilian Nitsch <maximilian.nitsch@d3tn.com>
When starting the VM such that it is already on a breakpoint (via
stop_on_boot) while attached to gdb, start the vCPUs in a paused state
rather than starting them later (upon resume).
Further, make the resumption/break of the VM more resilient by only
attempting to resume the vCPUs if we are already on a breakpoint, and
only attempting to pause/break if we were already running.
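A rough sketch of the guards, with the state handling simplified to an
enum:

    // Illustrative only; the real code drives the actual vCPU threads.
    #[derive(PartialEq)]
    enum VcpuState {
        Running,
        Paused,
    }

    fn resume_for_gdb(state: &mut VcpuState) {
        // Only resume if we are currently sitting on a breakpoint.
        if *state == VcpuState::Paused {
            *state = VcpuState::Running;
        }
    }

    fn pause_for_gdb(state: &mut VcpuState) {
        // Only break if the vCPUs are actually running.
        if *state == VcpuState::Running {
            *state = VcpuState::Paused;
        }
    }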
Fixes: #4354
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the hardcoded addresses.
Also remove PM_TMR_BLK as a spec-compliant implementation will use
X_PM_TMR_BLK over this field.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Break the receive loop of the API client when the VMM shuts down the
socket connection. A shutdown is indicated by the return value 0 of the
`recv()` system call.[^1][^2] This case was not handled before, so the
API client kept trying indefinitely to receive more bytes and never
returned.
[^1]: https://linux.die.net/man/2/recv
[^2]: https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read
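A minimal sketch of the loop shape (buffer handling simplified):

    use std::io::Read;

    // read() returning Ok(0) means the peer shut down the connection, so
    // the loop must stop instead of retrying forever.
    fn receive_response<R: Read>(socket: &mut R) -> std::io::Result<Vec<u8>> {
        let mut response = Vec::new();
        let mut buf = [0u8; 4096];
        loop {
            match socket.read(&mut buf)? {
                0 => break, // socket shut down by the VMM
                n => response.extend_from_slice(&buf[..n]),
            }
        }
        Ok(response)
    }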
Signed-off-by: Maximilian Nitsch <maximilian.nitsch@d3tn.com>