We want to be able to reuse the SerialBuffer from the virtio-devices
crate, particularly from the virtio-console implementation. That's why
we move the SerialBuffer definition to its own crate so that it can be
accessed from both vmm and virtio-devices crates, without creating any
cyclic dependency.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Set the maximum number of HW breakpoints according to the value returned
from `Hypervisor::get_guest_debug_hw_bps()`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Introduce a new top level member of VmConfig called PayloadConfig that
(currently) encompasses the kernel, commandline and initramfs for the
guest to use.
In future this can be extended for firmware use. The existing
"--kernel", "--cmdline" and "initramfs" CLI parameters now fill the
PayloadConfig.
Any config supplied which uses the now deprecated config members have
those members mapped to the new version with a warning.
See: #4445
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.
Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.
This allows us to decouple CpuIdEntry from hypervisors more easily.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Previously, we were assuming that every time an eventfd notified us,
there was only a single event waiting for us. This meant that if,
while one API request was being processed, two more arrived, the
second one would not be processed (until the next one arrived, when it
would be processed instead of that event, and so on). To fix this,
make sure we're processing the number of API and debug requests we've
been told have arrived, rather than just one. This is easy to
demonstrate by sending lots of API events and adding some sleeps to
make sure multiple events can arrive while each is being processed.
For other uses of eventfd, like the exit event, this doesn't matter —
even if we've received multiple exit events in quick succession, we
only need to exit once. So I've only made this change where receiving
an event is non-idempotent, i.e. where it matters that we process the
event the right number of times.
Technically, reset requests are also non-idempotent — there's an
observable difference between a VM resetting once, and a VM resetting
once and then immediately resetting again. But I've left that alone
for now because two resets in immediate succession doesn't sound like
something anyone would ever want to me.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
The VM specific signal (currently only SIGWINCH) should only be handled
when the VM is running.
The generic VMM signals (SIGINT and SIGTERM) need handling at all times.
Split the signal handling into two separate threads which have differing
lifetimes.
Tested by:
1.) Boot full VM and check resize handling (SIGWINCH) works & sending
SIGTERM leads to cleanup (tested that API socket is removed.)
2.) Start without a VM and send SIGTERM/SIGINT and observe cleanup (API
socket removed)
3.) Boot full VM, delete VM and observe 2.) holds.
4.) Boot full VM, delete VM, recreate VM and observe 1.) holds.
Fixes: #4269
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This check is new in the beta version of clippy and exists to avoid
potential deadlocks by highlighting when the test in an if or for loop
is something that holds a lock. In many cases we would need to make
significant refactorings to be able to pass this check so disable in the
affected crates.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: you are deriving `PartialEq` and can implement `Eq`
--> vmm/src/serial_manager.rs:59:30
|
59 | #[derive(Debug, Clone, Copy, PartialEq)]
| ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Based on the newly added guest_debug feature, this patch adds http
endpoint support.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's useful to dump the guest, which named coredump so that crash
tool can be used to analysize it when guest hung up.
Let's add GuestDebuggable trait and Coredumpxxx error to support
coredump firstly.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There is no need to include serde_derive separately,
as it can be specified as serde feature instead.
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.
On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.
This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.
The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.
Signed-off-by: William Douglas <william.douglas@intel.com>
Separate the destruction and cleanup of original VM and the creation of
the new one. In particular have a clear hand off point for resources
(e.g. reset EventFd) used by the new VM from the original. In the
situation where vm.shutdown() generates an error this also avoids the
Vmm reference to the Vm (self.vm) from being maintained.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce a new --vdpa parameter associated with a VdpaConfig for the
future creation of a Vdpa device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit adds event fds and the event handler to send/receive
requests and responses from the GDB thread. It also adds `--gdb` flag to
enable GDB stub feature.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
This commit adds initial gdb.rs implementation for `Debuggable` trait to
describe a debuggable component. Some part of the trait bound
implementations is based on the crosvm GDB stub code [1].
[1] https://github.com/google/crosvm/blob/main/src/gdb.rs
Signed-off-by: Akira Moroo <retrage01@gmail.com>
Instead of erroring out when trying to change the configuration of the
VM somewhere between the VM was created but not yet booted, let's allow
users to change that without any issue, as long as the VM has already
been created.
Fixes: #3639
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add very basic unit for the vm_add_$device() functions, so we can
easily expand those when changing its behaviour in the coming commits.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of doing the validation of the configuration change as part of
the vm, let's do this in the uper layer, in the Vmm.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In order to allow for human readable output for the VM configuration, we
pull it out of the snapshot, which becomes effectively the list of
states from the VM. The configuration is stored through a dedicated file
in JSON format (not including any binary output).
Having the ability to read and modify the VM configuration manually
between the snapshot and restore phases makes debugging easier, as well
as empowers users for extending the use cases relying on the
snapshot/restore feature.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we introduced a separate method to indicate when the migration
is started, both start_dirty_log() and stop_dirty_log() don't have to
carry an implicit meaning as they can focus entirely on the dirty log
being started or stopped.
For that reason, we can now safely move stop_dirty_log() to the code
section performing non-local migration. It makes only sense to stop
logging dirty pages if this has been started before.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to clearly decouple when the migration is started compared to
when the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we don't end up using
start_dirty_log() for identifying when the migration has been started.
And similarly, we rely on the already existing complete_migration()
method to know when the migration has been ended.
A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was because the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When in local migration mode send the FDs for the guest memory over the
socket along with the slot that the FD is associated with. This removes
the requirement for copying the guest RAM and gives significantly faster
live migration performance (of the order of 3s to 60ms).
Fixes: #3566
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create the VM using the FDs (wrapped in Files) that have been received
during the migration process.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This crate was used in the integration tests to allow the tests to
continue and clean up after a failure. This isn't necessary in the unit
tests and adds a large build dependency chain including an unmaintained
crate.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Saving the KVM clock and restoring it is key for correct behaviour of
the VM when doing snapshot/restore or live migration. The clock is
restored to the KVM state as part of the Vm::resume() method prior to
that it must be extracted from the state object and stored for later use
by this method. This change simplifies the extraction and storage part
so that it is done in the same way for both snapshot/restore and live
migration.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Always properly initialize vectors so that we don't run in undefined
behaviors when the vector gets dropped.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In particular use the accessor for getting the device id from the bdf.
As a side effect the VIOT table is now segment aware.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Instead of creating a MemoryManager from scratch, let's reuse the same
code path used by snapshot/restore, so that memory regions are created
identically to what they were on the source VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Move the PciSegment struct and the associated code to a new file. This
will allow some clearer separation between the core DeviceManager and
PCI handling.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When using PVH for booting (which we use for all firmwares and direct
kernel boot) the Linux kernel does not configure LA57 correctly. As such
we need to limit the address space to the maximum 4-level paging address
space.
If the user knows that their guest image can take advantage of the
5-level addressing and they need it for their workload then they can
increase the physical address space appropriately.
This PR removes the TDX specific handling as the new address space limit
is below the one that that code specified.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This change switches from handling serial input in the VMM thread to
its own thread controlled by the SerialManager.
The motivation for this change is to avoid the VMM thread being unable
to process events while serial input is happening and vice versa.
The change also makes future work flushing the serial buffer on PTY
connections easier.
Signed-off-by: William Douglas <william.douglas@intel.com>
When a pty is resized (using the TIOCSWINSZ ioctl -- see ioctl_tty(2)),
the kernel will send a SIGWINCH signal to the pty's foreground process
group to notify it of the resize. This is the only way to be notified
by the kernel of a pty resize.
We can't just make the cloud-hypervisor process's process group the
foreground process group though, because a process can only set the
foreground process group of its controlling terminal, and
cloud-hypervisor's controlling terminal will often be the terminal the
user is running it in. To work around this, we fork a subprocess in a
new process group, and set its process group to be the foreground
process group of the pty. The subprocess additionally must be running
in a new session so that it can have a different controlling
terminal. This subprocess writes a byte to a pipe every time the pty
is resized, and the virtio-console device can listen for this in its
epoll loop.
Alternatives I considered were to have the subprocess just send
SIGWINCH to its parent, and to use an eventfd instead of a pipe.
I decided against the signal approach because re-purposing a signal
that has a very specific meaning (even if this use was only slightly
different to its normal meaning) felt unclean, and because it would
have required using pidfds to avoid race conditions if
cloud-hypervisor had terminated, which added complexity. I decided
against using an eventfd because using a pipe instead allows the child
to be notified (via poll(2)) when nothing is reading from the pipe any
more, meaning it can be reliably notified of parent death and
terminate itself immediately.
I used clone3(2) instead of fork(2) because without
CLONE_CLEAR_SIGHAND the subprocess would inherit signal-hook's signal
handlers, and there's no other straightforward way to restore all signal
handlers to their defaults in the child process. The only way to do
it would be to iterate through all possible signals, or maintain a
global list of monitored signals ourselves (vmm:vm::HANDLED_SIGNALS is
insufficient because it doesn't take into account e.g. the SIGSYS
signal handler that catches seccomp violations).
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Move the processing of the input from stdin, PTY or file from the VMM
thread to the existing virtio-console thread. The handling of the resize
of a virtio-console has not changed but the name of the struct used to
support that has been renamed to reflect its usage.
Fixes: #3060
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Introduce a dynamic buffer for storing output from the serial port. The
SerialBuffer implements std::io::Write and can be used in place of the
direct output for the serial device.
The internals of the buffer is a vector that grows dynamically based on
demand up to a fixed size at which point old data will be overwritten.
Currently the buffer is only flushed upon writes.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the indirection of a dispatch table and simply use the enum as
the event data for the events.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use two separate events for the console and serial PTY and then drive
the handling of the inputs on the PTY separately. This results in the
correct behaviour when both console and serial are attached to the PTY
as they are triggered separately on the epoll so events are not lost.
Fixes: #3012
Signed-off-by: Rob Bradford <robert.bradford@intel.com>