Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.
Here is the list of devices now following the new restore design:
- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is preliminary work to ensure a migrated VM is created right before
it is restored. This will be useful when moving to a design where the VM
is both created and restored simultaneously from the Snapshot.
In details, that means the MemoryManager is the object that must be
created upon receiving the config from the source VM, so that memory
content can be later received and filled into the GuestMemory.
Only after these steps happened, the snapshot is received from the
source VM, and the actual Vm object can be created from both the
snapshot and the MemoryManager previously created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
These have been replaced by members of PayloadConfig and should be
removed in v28.0 (mentioned in v26.0 release notes.)
Fixes: #4737
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Place the data structures that are required for constructing a VmConfig
into it's own module from the logic that exists to suppot them.
This is useful as a consumer of the API can now clearly see what data
structures make up the API for creating VMs.
This has no functional change and I made no attempt to clean up the
ordering (it's as in the original file) nor any other clean up.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This simplifies the CI process but also logical with the existing
functionality under "guest_debug" (dumping guest memory).
Fixes: #4679
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add tracing of the VM boot sequence from the point at which the request
to create a VM is received to the hand-off to the vCPU threads running.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Removing the option --tdx to specify that we want to run a TD VM. Rely
on --platform option by adding the "tdx" boolean parameter. This is the
new way for enabling TDX with Cloud Hypervisor.
Along with this change, the way to retrieve the firmware path has been
updated to rely on the recently introduced PayloadConfig structure.
Fixes#4556
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We want to be able to reuse the SerialBuffer from the virtio-devices
crate, particularly from the virtio-console implementation. That's why
we move the SerialBuffer definition to its own crate so that it can be
accessed from both vmm and virtio-devices crates, without creating any
cyclic dependency.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Set the maximum number of HW breakpoints according to the value returned
from `Hypervisor::get_guest_debug_hw_bps()`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Introduce a new top level member of VmConfig called PayloadConfig that
(currently) encompasses the kernel, commandline and initramfs for the
guest to use.
In future this can be extended for firmware use. The existing
"--kernel", "--cmdline" and "initramfs" CLI parameters now fill the
PayloadConfig.
Any config supplied which uses the now deprecated config members have
those members mapped to the new version with a warning.
See: #4445
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.
Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.
This allows us to decouple CpuIdEntry from hypervisors more easily.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Previously, we were assuming that every time an eventfd notified us,
there was only a single event waiting for us. This meant that if,
while one API request was being processed, two more arrived, the
second one would not be processed (until the next one arrived, when it
would be processed instead of that event, and so on). To fix this,
make sure we're processing the number of API and debug requests we've
been told have arrived, rather than just one. This is easy to
demonstrate by sending lots of API events and adding some sleeps to
make sure multiple events can arrive while each is being processed.
For other uses of eventfd, like the exit event, this doesn't matter —
even if we've received multiple exit events in quick succession, we
only need to exit once. So I've only made this change where receiving
an event is non-idempotent, i.e. where it matters that we process the
event the right number of times.
Technically, reset requests are also non-idempotent — there's an
observable difference between a VM resetting once, and a VM resetting
once and then immediately resetting again. But I've left that alone
for now because two resets in immediate succession doesn't sound like
something anyone would ever want to me.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
The VM specific signal (currently only SIGWINCH) should only be handled
when the VM is running.
The generic VMM signals (SIGINT and SIGTERM) need handling at all times.
Split the signal handling into two separate threads which have differing
lifetimes.
Tested by:
1.) Boot full VM and check resize handling (SIGWINCH) works & sending
SIGTERM leads to cleanup (tested that API socket is removed.)
2.) Start without a VM and send SIGTERM/SIGINT and observe cleanup (API
socket removed)
3.) Boot full VM, delete VM and observe 2.) holds.
4.) Boot full VM, delete VM, recreate VM and observe 1.) holds.
Fixes: #4269
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This check is new in the beta version of clippy and exists to avoid
potential deadlocks by highlighting when the test in an if or for loop
is something that holds a lock. In many cases we would need to make
significant refactorings to be able to pass this check so disable in the
affected crates.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: you are deriving `PartialEq` and can implement `Eq`
--> vmm/src/serial_manager.rs:59:30
|
59 | #[derive(Debug, Clone, Copy, PartialEq)]
| ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Based on the newly added guest_debug feature, this patch adds http
endpoint support.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's useful to dump the guest, which named coredump so that crash
tool can be used to analysize it when guest hung up.
Let's add GuestDebuggable trait and Coredumpxxx error to support
coredump firstly.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There is no need to include serde_derive separately,
as it can be specified as serde feature instead.
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.
On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.
This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.
The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.
Signed-off-by: William Douglas <william.douglas@intel.com>
Separate the destruction and cleanup of original VM and the creation of
the new one. In particular have a clear hand off point for resources
(e.g. reset EventFd) used by the new VM from the original. In the
situation where vm.shutdown() generates an error this also avoids the
Vmm reference to the Vm (self.vm) from being maintained.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce a new --vdpa parameter associated with a VdpaConfig for the
future creation of a Vdpa device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit adds event fds and the event handler to send/receive
requests and responses from the GDB thread. It also adds `--gdb` flag to
enable GDB stub feature.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
This commit adds initial gdb.rs implementation for `Debuggable` trait to
describe a debuggable component. Some part of the trait bound
implementations is based on the crosvm GDB stub code [1].
[1] https://github.com/google/crosvm/blob/main/src/gdb.rs
Signed-off-by: Akira Moroo <retrage01@gmail.com>
Instead of erroring out when trying to change the configuration of the
VM somewhere between the VM was created but not yet booted, let's allow
users to change that without any issue, as long as the VM has already
been created.
Fixes: #3639
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add very basic unit for the vm_add_$device() functions, so we can
easily expand those when changing its behaviour in the coming commits.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of doing the validation of the configuration change as part of
the vm, let's do this in the uper layer, in the Vmm.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In order to allow for human readable output for the VM configuration, we
pull it out of the snapshot, which becomes effectively the list of
states from the VM. The configuration is stored through a dedicated file
in JSON format (not including any binary output).
Having the ability to read and modify the VM configuration manually
between the snapshot and restore phases makes debugging easier, as well
as empowers users for extending the use cases relying on the
snapshot/restore feature.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we introduced a separate method to indicate when the migration
is started, both start_dirty_log() and stop_dirty_log() don't have to
carry an implicit meaning as they can focus entirely on the dirty log
being started or stopped.
For that reason, we can now safely move stop_dirty_log() to the code
section performing non-local migration. It makes only sense to stop
logging dirty pages if this has been started before.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to clearly decouple when the migration is started compared to
when the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we don't end up using
start_dirty_log() for identifying when the migration has been started.
And similarly, we rely on the already existing complete_migration()
method to know when the migration has been ended.
A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was because the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When in local migration mode send the FDs for the guest memory over the
socket along with the slot that the FD is associated with. This removes
the requirement for copying the guest RAM and gives significantly faster
live migration performance (of the order of 3s to 60ms).
Fixes: #3566
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create the VM using the FDs (wrapped in Files) that have been received
during the migration process.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This crate was used in the integration tests to allow the tests to
continue and clean up after a failure. This isn't necessary in the unit
tests and adds a large build dependency chain including an unmaintained
crate.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Saving the KVM clock and restoring it is key for correct behaviour of
the VM when doing snapshot/restore or live migration. The clock is
restored to the KVM state as part of the Vm::resume() method prior to
that it must be extracted from the state object and stored for later use
by this method. This change simplifies the extraction and storage part
so that it is done in the same way for both snapshot/restore and live
migration.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Always properly initialize vectors so that we don't run in undefined
behaviors when the vector gets dropped.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In particular use the accessor for getting the device id from the bdf.
As a side effect the VIOT table is now segment aware.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Instead of creating a MemoryManager from scratch, let's reuse the same
code path used by snapshot/restore, so that memory regions are created
identically to what they were on the source VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>