It's useful to dump the guest, which named coredump so that crash
tool can be used to analysize it when guest hung up.
Let's add GuestDebuggable trait and Coredumpxxx error to support
coredump firstly.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There is no need to include serde_derive separately,
as it can be specified as serde feature instead.
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.
On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.
This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.
The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.
Signed-off-by: William Douglas <william.douglas@intel.com>
Separate the destruction and cleanup of original VM and the creation of
the new one. In particular have a clear hand off point for resources
(e.g. reset EventFd) used by the new VM from the original. In the
situation where vm.shutdown() generates an error this also avoids the
Vmm reference to the Vm (self.vm) from being maintained.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce a new --vdpa parameter associated with a VdpaConfig for the
future creation of a Vdpa device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit adds event fds and the event handler to send/receive
requests and responses from the GDB thread. It also adds `--gdb` flag to
enable GDB stub feature.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
This commit adds initial gdb.rs implementation for `Debuggable` trait to
describe a debuggable component. Some part of the trait bound
implementations is based on the crosvm GDB stub code [1].
[1] https://github.com/google/crosvm/blob/main/src/gdb.rs
Signed-off-by: Akira Moroo <retrage01@gmail.com>
Instead of erroring out when trying to change the configuration of the
VM somewhere between the VM was created but not yet booted, let's allow
users to change that without any issue, as long as the VM has already
been created.
Fixes: #3639
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add very basic unit for the vm_add_$device() functions, so we can
easily expand those when changing its behaviour in the coming commits.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of doing the validation of the configuration change as part of
the vm, let's do this in the uper layer, in the Vmm.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In order to allow for human readable output for the VM configuration, we
pull it out of the snapshot, which becomes effectively the list of
states from the VM. The configuration is stored through a dedicated file
in JSON format (not including any binary output).
Having the ability to read and modify the VM configuration manually
between the snapshot and restore phases makes debugging easier, as well
as empowers users for extending the use cases relying on the
snapshot/restore feature.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we introduced a separate method to indicate when the migration
is started, both start_dirty_log() and stop_dirty_log() don't have to
carry an implicit meaning as they can focus entirely on the dirty log
being started or stopped.
For that reason, we can now safely move stop_dirty_log() to the code
section performing non-local migration. It makes only sense to stop
logging dirty pages if this has been started before.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to clearly decouple when the migration is started compared to
when the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we don't end up using
start_dirty_log() for identifying when the migration has been started.
And similarly, we rely on the already existing complete_migration()
method to know when the migration has been ended.
A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was because the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When in local migration mode send the FDs for the guest memory over the
socket along with the slot that the FD is associated with. This removes
the requirement for copying the guest RAM and gives significantly faster
live migration performance (of the order of 3s to 60ms).
Fixes: #3566
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create the VM using the FDs (wrapped in Files) that have been received
during the migration process.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This crate was used in the integration tests to allow the tests to
continue and clean up after a failure. This isn't necessary in the unit
tests and adds a large build dependency chain including an unmaintained
crate.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Saving the KVM clock and restoring it is key for correct behaviour of
the VM when doing snapshot/restore or live migration. The clock is
restored to the KVM state as part of the Vm::resume() method prior to
that it must be extracted from the state object and stored for later use
by this method. This change simplifies the extraction and storage part
so that it is done in the same way for both snapshot/restore and live
migration.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Always properly initialize vectors so that we don't run in undefined
behaviors when the vector gets dropped.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In particular use the accessor for getting the device id from the bdf.
As a side effect the VIOT table is now segment aware.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Instead of creating a MemoryManager from scratch, let's reuse the same
code path used by snapshot/restore, so that memory regions are created
identically to what they were on the source VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Move the PciSegment struct and the associated code to a new file. This
will allow some clearer separation between the core DeviceManager and
PCI handling.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When using PVH for booting (which we use for all firmwares and direct
kernel boot) the Linux kernel does not configure LA57 correctly. As such
we need to limit the address space to the maximum 4-level paging address
space.
If the user knows that their guest image can take advantage of the
5-level addressing and they need it for their workload then they can
increase the physical address space appropriately.
This PR removes the TDX specific handling as the new address space limit
is below the one that that code specified.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This change switches from handling serial input in the VMM thread to
its own thread controlled by the SerialManager.
The motivation for this change is to avoid the VMM thread being unable
to process events while serial input is happening and vice versa.
The change also makes future work flushing the serial buffer on PTY
connections easier.
Signed-off-by: William Douglas <william.douglas@intel.com>
When a pty is resized (using the TIOCSWINSZ ioctl -- see ioctl_tty(2)),
the kernel will send a SIGWINCH signal to the pty's foreground process
group to notify it of the resize. This is the only way to be notified
by the kernel of a pty resize.
We can't just make the cloud-hypervisor process's process group the
foreground process group though, because a process can only set the
foreground process group of its controlling terminal, and
cloud-hypervisor's controlling terminal will often be the terminal the
user is running it in. To work around this, we fork a subprocess in a
new process group, and set its process group to be the foreground
process group of the pty. The subprocess additionally must be running
in a new session so that it can have a different controlling
terminal. This subprocess writes a byte to a pipe every time the pty
is resized, and the virtio-console device can listen for this in its
epoll loop.
Alternatives I considered were to have the subprocess just send
SIGWINCH to its parent, and to use an eventfd instead of a pipe.
I decided against the signal approach because re-purposing a signal
that has a very specific meaning (even if this use was only slightly
different to its normal meaning) felt unclean, and because it would
have required using pidfds to avoid race conditions if
cloud-hypervisor had terminated, which added complexity. I decided
against using an eventfd because using a pipe instead allows the child
to be notified (via poll(2)) when nothing is reading from the pipe any
more, meaning it can be reliably notified of parent death and
terminate itself immediately.
I used clone3(2) instead of fork(2) because without
CLONE_CLEAR_SIGHAND the subprocess would inherit signal-hook's signal
handlers, and there's no other straightforward way to restore all signal
handlers to their defaults in the child process. The only way to do
it would be to iterate through all possible signals, or maintain a
global list of monitored signals ourselves (vmm:vm::HANDLED_SIGNALS is
insufficient because it doesn't take into account e.g. the SIGSYS
signal handler that catches seccomp violations).
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Move the processing of the input from stdin, PTY or file from the VMM
thread to the existing virtio-console thread. The handling of the resize
of a virtio-console has not changed but the name of the struct used to
support that has been renamed to reflect its usage.
Fixes: #3060
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Introduce a dynamic buffer for storing output from the serial port. The
SerialBuffer implements std::io::Write and can be used in place of the
direct output for the serial device.
The internals of the buffer is a vector that grows dynamically based on
demand up to a fixed size at which point old data will be overwritten.
Currently the buffer is only flushed upon writes.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the indirection of a dispatch table and simply use the enum as
the event data for the events.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use two separate events for the console and serial PTY and then drive
the handling of the inputs on the PTY separately. This results in the
correct behaviour when both console and serial are attached to the PTY
as they are triggered separately on the epoll so events are not lost.
Fixes: #3012
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We are relying on applying empty 'seccomp' filters to support the
'--seccomp false' option, which will be treated as an error with the
updated 'seccompiler' crate. This patch fixes this issue by explicitly
checking whether the 'seccomp' filter is empty before applying the
filter.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The code wasn't doing what it was expected to. The '?' was simply
returning the error to the top level function, meaning the Err() case in
the match was never hit. Moving the whole logic to a dedicated function
allows to identify when something got wrong without propagating to the
calling function, so that we can still stop the dirty logging and
unpause the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the migration succeeds, the destination VM will be correctly
running, with potential vhost-user backends attached to it. We can't let
the source VM trying to reconnect to the same backends, which is why
it's safer to shutdown the source VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that Migratable provides the methods for starting, stopping and
retrieving the dirty pages, we move the existing code to these new
functions.
No functional change intended.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch adds a fallback path for sending live migration, where it
ensures the following behavior of source VM post live-migration:
1. The source VM will be paused only when the migration is completed
successfully, or otherwise it will keep running;
2. The source VM will always stop dirty pages logging.
Fixes: #2895
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch adds a common function "Vmm::vm_check_cpuid_compatibility()"
to be shared by both live-migration and snapshot/restore.
Signed-off-by: Bo Chen <chen.bo@intel.com>
We now send not only the 'VmConfig' at the 'Command::Config' step of
live migration, but also send the 'common CPUID'. In this way, we can
check the compatibility of CPUID features between the source and
destination VMs, and abort live migration early if needed.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch extends slightly the current live-migration code path with
the ability to dynamically start and stop logging dirty-pages, which
relies on two new methods added to the `hypervisor::vm::Vm` Trait. This
patch also contains a complete implementation of the two new methods
based on `kvm` and placeholders for `mshv` in the `hypervisor` crate.
Fixes: #2858
Signed-off-by: Bo Chen <chen.bo@intel.com>
It ensures all handlers for `ApiRequest` in `control_loop` are
consistent and minimum and should read better.
No functional changes.
Signed-off-by: Bo Chen <chen.bo@intel.com>