Commit Graph

198 Commits

Author SHA1 Message Date
Sebastien Boeuf
a940f525a8 vmm: Move SerialBuffer to its own crate
We want to be able to reuse the SerialBuffer from the virtio-devices
crate, particularly from the virtio-console implementation. That's why
we move the SerialBuffer definition to its own crate so that it can be
accessed from both vmm and virtio-devices crates, without creating any
cyclic dependency.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-30 13:47:51 +02:00
Michael Zhao
d66d64c325 vmm: Restrict the maximum number of HW breakpoints
Set the maximum number of HW breakpoints according to the value returned
from `Hypervisor::get_guest_debug_hw_bps()`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-23 16:57:12 +02:00
Rob Bradford
cef51a9de0 vmm: Encompass guest payload configuration in PayloadConfig
Introduce a new top level member of VmConfig called PayloadConfig that
(currently) encompasses the kernel, commandline and initramfs for the
guest to use.

In future this can be extended for firmware use. The existing
"--kernel", "--cmdline" and "initramfs" CLI parameters now fill the
PayloadConfig.

Any config supplied which uses the now deprecated config members have
those members mapped to the new version with a warning.

See: #4445

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-10 15:12:34 +01:00
Bo Chen
eb056d374a vmm: Make 'EpollContext::add_event()' public
So that it can be reused by other crate, e.g. from fuzz targets.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-03 10:18:24 +01:00
Wei Liu
ad33f7c5e6 vmm: return seccomp rules according to hypervisors
That requires stashing the hypervisor type into various places.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-22 12:50:12 +01:00
Wei Liu
08135fa085 hypervisor: provide a generic CpudIdEntry structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-18 22:15:30 +01:00
Wei Liu
45fbf840db hypervisor, vmm: move away from CpuId type
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.

Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.

This allows us to decouple CpuIdEntry from hypervisors more easily.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-18 22:15:30 +01:00
Alyssa Ross
a455917db5 vmm: fix missed API or debug events
Previously, we were assuming that every time an eventfd notified us,
there was only a single event waiting for us.  This meant that if,
while one API request was being processed, two more arrived, the
second one would not be processed (until the next one arrived, when it
would be processed instead of that event, and so on).  To fix this,
make sure we're processing the number of API and debug requests we've
been told have arrived, rather than just one.  This is easy to
demonstrate by sending lots of API events and adding some sleeps to
make sure multiple events can arrive while each is being processed.

For other uses of eventfd, like the exit event, this doesn't matter —
even if we've received multiple exit events in quick succession, we
only need to exit once.  So I've only made this change where receiving
an event is non-idempotent, i.e. where it matters that we process the
event the right number of times.

Technically, reset requests are also non-idempotent — there's an
observable difference between a VM resetting once, and a VM resetting
once and then immediately resetting again.  But I've left that alone
for now because two resets in immediate succession doesn't sound like
something anyone would ever want to me.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2022-07-14 17:44:11 +01:00
Rob Bradford
121729a3b0 vmm: Split signal handling for VM and VMM signals
The VM specific signal (currently only SIGWINCH) should only be handled
when the VM is running.

The generic VMM signals (SIGINT and SIGTERM) need handling at all times.

Split the signal handling into two separate threads which have differing
lifetimes.

Tested by:
1.) Boot full VM and check resize handling (SIGWINCH) works & sending
    SIGTERM leads to cleanup (tested that API socket is removed.)
2.) Start without a VM and send SIGTERM/SIGINT and observe cleanup (API
    socket removed)
3.) Boot full VM, delete VM and observe 2.) holds.
4.) Boot full VM, delete VM, recreate VM and observe 1.) holds.

Fixes: #4269

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-08 15:15:46 +01:00
Rob Bradford
adf5881757 build: #[allow(clippy::significant_drop_in_scrutinee) in some crates
This check is new in the beta version of clippy and exists to avoid
potential deadlocks by highlighting when the test in an if or for loop
is something that holds a lock. In many cases we would need to make
significant refactorings to be able to pass this check so disable in the
affected crates.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
2716bc3311 build: Fix beta clippy issue (derive_partial_eq_without_eq)
warning: you are deriving `PartialEq` and can implement `Eq`
  --> vmm/src/serial_manager.rs:59:30
   |
59 | #[derive(Debug, Clone, Copy, PartialEq)]
   |                              ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Wei Liu
8fa1098629 vmm: switch from lazy_static to once_cell
Once_cell does not require using macro and is slated to become part of
Rust std at some point.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-06-20 16:03:07 +01:00
Yi Wang
8b585b96c1 vmm: enable coredump
Based on the newly added guest_debug feature, this patch adds http
endpoint support.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
90034fd6ba vmm: add GuestDebuggable trait
It's useful to dump the guest, which named coredump so that crash
tool can be used to analysize it when guest hung up.

Let's add GuestDebuggable trait and Coredumpxxx error to support
coredump firstly.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Maksym Pavlenko
3a0429c998 cargo: Clean up serde dependencies
There is no need to include serde_derive separately,
as it can be specified as serde feature instead.

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-05-18 08:21:19 +02:00
Rob Bradford
cd0df05808 vmm, arch: CpuId is x86_64 specific so import from the x86_64 module
It will be removed as a top-level export from the hypervisor crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
62f17ccf8c vmm: Improve error handling for vmm::vm::Error
In particular implement thiserror::Error, cleanup wording and remove
unused errors.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
7c0cf8cc23 arch, devices, vmm: Remove "acpi" feature gate
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-28 09:18:29 -07:00
William Douglas
6b0df31e5d vmm: Add support for enabling AMX in vm guests
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.

On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.

This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.

The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.

Signed-off-by: William Douglas <william.douglas@intel.com>
2022-03-25 14:11:54 -07:00
Sebastien Boeuf
9c95109a6b vmm: Streamline reboot code path
Separate the destruction and cleanup of original VM and the creation of
the new one. In particular have a clear hand off point for resources
(e.g. reset EventFd) used by the new VM from the original. In the
situation where vm.shutdown() generates an error this also avoids the
Vmm reference to the Vm (self.vm) from being maintained.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:50 +01:00
Sebastien Boeuf
3fea5f5396 vmm: Add support for hotplugging a vDPA device
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Sebastien Boeuf
72169686fe vmm: Add a vDPA device parameter
Introduce a new --vdpa parameter associated with a VdpaConfig for the
future creation of a Vdpa device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Akira Moroo
2451c4d833 vmm: Implement GDB event handler to enable --gdb flag
This commit adds event fds and the event handler to send/receive
requests and responses from the GDB thread. It also adds `--gdb` flag to
enable GDB stub feature.

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Akira Moroo
f1c4705638 vmm: Add Debuggable trait implementation
This commit adds initial gdb.rs implementation for `Debuggable` trait to
describe a debuggable component. Some part of the trait bound
implementations is based on the crosvm GDB stub code [1].

[1] https://github.com/google/crosvm/blob/main/src/gdb.rs

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Fabiano Fidêncio
5d2db68f67 vmm: lib: Allow config changes before the VM is booted
Instead of erroring out when trying to change the configuration of the
VM somewhere between the VM was created but not yet booted, let's allow
users to change that without any issue, as long as the VM has already
been created.

Fixes: #3639

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 11:07:26 -08:00
Fabiano Fidêncio
b780a916bb vmm: lib: Add unit tests
Let's add very basic unit for the vm_add_$device() functions, so we can
easily expand those when changing its behaviour in the coming commits.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 11:07:26 -08:00
Fabiano Fidêncio
16782e8c6d vmm: lib: Do the config validation in the Vmm
Instead of doing the validation of the configuration change as part of
the vm, let's do this in the uper layer, in the Vmm.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 11:07:26 -08:00
Sebastien Boeuf
10676b74dc vmm: Split VM config and VM state for snapshot/restore
In order to allow for human readable output for the VM configuration, we
pull it out of the snapshot, which becomes effectively the list of
states from the VM. The configuration is stored through a dedicated file
in JSON format (not including any binary output).

Having the ability to read and modify the VM configuration manually
between the snapshot and restore phases makes debugging easier, as well
as empowers users for extending the use cases relying on the
snapshot/restore feature.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-08 15:06:49 +00:00
Sebastien Boeuf
b3ca1d90e9 vmm: Stop dirty logging only if it has been started
Now that we introduced a separate method to indicate when the migration
is started, both start_dirty_log() and stop_dirty_log() don't have to
carry an implicit meaning as they can focus entirely on the dirty log
being started or stopped.

For that reason, we can now safely move stop_dirty_log() to the code
section performing non-local migration. It makes only sense to stop
logging dirty pages if this has been started before.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-03 13:33:26 +01:00
lizhaoxin1
a45e458c50 vm-migration: Add start_migration() to Migratable trait
In order to clearly decouple when the migration is started compared to
when the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we don't end up using
start_dirty_log() for identifying when the migration has been started.
And similarly, we rely on the already existing complete_migration()
method to know when the migration has been ended.

A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was because the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.

Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-03 13:33:26 +01:00
Rob Bradford
88952cc500 vmm: Send FDs across unix socket for migration when in local mode
When in local migration mode send the FDs for the guest memory over the
socket along with the slot that the FD is associated with. This removes
the requirement for copying the guest RAM and gives significantly faster
live migration performance (of the order of 3s to 60ms).

Fixes: #3566

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
1676fffaad vmm: Check shared memory is enabled for local migration
This is required so that the receiving process can access the existing
process's memory.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
1daef5e8c9 vmm: Propagate the set of memory slots to FDs received in migration
Create the VM using the FDs (wrapped in Files) that have been received
during the migration process.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
735658a49d vm-migration: Add MemoryFd command for setting FDs for memory
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
b9c260c0de vmm, ch-remote: Add "local" option to send-migration API
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
e4763b47f1 vmm, build: Remove use of "credibility" from unit tests
This crate was used in the integration tests to allow the tests to
continue and clean up after a failure. This isn't necessary in the unit
tests and adds a large build dependency chain including an unmaintained
crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-05 12:35:50 +01:00
Rob Bradford
a29e53e436 vmm: Move KVM clock saving to common Vm::restore() method
Saving the KVM clock and restoring it is key for correct behaviour of
the VM when doing snapshot/restore or live migration. The clock is
restored to the KVM state as part of the Vm::resume() method prior to
that it must be extracted from the state object and stored for later use
by this method. This change simplifies the extraction and storage part
so that it is done in the same way for both snapshot/restore and live
migration.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-12-06 11:23:16 +00:00
Wei Liu
ff0e92ab88 vmm: add a safety comment for EpollContext
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-11-17 23:12:11 +00:00
Sebastien Boeuf
c8e3c1eed6 clippy: Make sure to initialize data
Always properly initialize vectors so that we don't run in undefined
behaviors when the vector gets dropped.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-11-10 10:23:43 +01:00
Rob Bradford
ae83e3b383 vmm: Use PciBdf throughout in order to remove manual bit manipulation
In particular use the accessor for getting the device id from the bdf.
As a side effect the VIOT table is now segment aware.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Sebastien Boeuf
58d8206e2b migration: Use MemoryManager restore code path
Instead of creating a MemoryManager from scratch, let's reuse the same
code path used by snapshot/restore, so that memory regions are created
identically to what they were on the source VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Rob Bradford
84fc0e093d vmm: Move PciSegment to new file
Move the PciSegment struct and the associated code to a new file. This
will allow some clearer separation between the core DeviceManager and
PCI handling.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-05 10:54:07 +01:00
Rob Bradford
83066cf58e vmm: Set a default maximum physical address size
When using PVH for booting (which we use for all firmwares and direct
kernel boot) the Linux kernel does not configure LA57 correctly. As such
we need to limit the address space to the maximum 4-level paging address
space.

If the user knows that their guest image can take advantage of the
5-level addressing and they need it for their workload then they can
increase the physical address space appropriately.

This PR removes the TDX specific handling as the new address space limit
is below the one that that code specified.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-01 08:59:15 -07:00
William Douglas
46f6d9597d vmm: Switch to using the serial_manager for serial input
This change switches from handling serial input in the VMM thread to
its own thread controlled by the SerialManager.

The motivation for this change is to avoid the VMM thread being unable
to process events while serial input is happening and vice versa.

The change also makes future work flushing the serial buffer on PTY
connections easier.

Signed-off-by: William Douglas <william.douglas@intel.com>
2021-09-17 11:15:35 +01:00
Alyssa Ross
330b5ea3be vmm: notify virtio-console of pty resizes
When a pty is resized (using the TIOCSWINSZ ioctl -- see ioctl_tty(2)),
the kernel will send a SIGWINCH signal to the pty's foreground process
group to notify it of the resize.  This is the only way to be notified
by the kernel of a pty resize.

We can't just make the cloud-hypervisor process's process group the
foreground process group though, because a process can only set the
foreground process group of its controlling terminal, and
cloud-hypervisor's controlling terminal will often be the terminal the
user is running it in.  To work around this, we fork a subprocess in a
new process group, and set its process group to be the foreground
process group of the pty.  The subprocess additionally must be running
in a new session so that it can have a different controlling
terminal.  This subprocess writes a byte to a pipe every time the pty
is resized, and the virtio-console device can listen for this in its
epoll loop.

Alternatives I considered were to have the subprocess just send
SIGWINCH to its parent, and to use an eventfd instead of a pipe.
I decided against the signal approach because re-purposing a signal
that has a very specific meaning (even if this use was only slightly
different to its normal meaning) felt unclean, and because it would
have required using pidfds to avoid race conditions if
cloud-hypervisor had terminated, which added complexity.  I decided
against using an eventfd because using a pipe instead allows the child
to be notified (via poll(2)) when nothing is reading from the pipe any
more, meaning it can be reliably notified of parent death and
terminate itself immediately.

I used clone3(2) instead of fork(2) because without
CLONE_CLEAR_SIGHAND the subprocess would inherit signal-hook's signal
handlers, and there's no other straightforward way to restore all signal
handlers to their defaults in the child process.  The only way to do
it would be to iterate through all possible signals, or maintain a
global list of monitored signals ourselves (vmm:vm::HANDLED_SIGNALS is
insufficient because it doesn't take into account e.g. the SIGSYS
signal handler that catches seccomp violations).

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2021-09-14 15:43:25 +01:00
Rob Bradford
b6b686c71c vmm: Shutdown VMM if API thread panics
See: #3031

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-10 10:52:08 -07:00
Rob Bradford
c2144b5690 vmm, virtio-console: Move input reading into virtio-console thread
Move the processing of the input from stdin, PTY or file from the VMM
thread to the existing virtio-console thread. The handling of the resize
of a virtio-console has not changed but the name of the struct used to
support that has been renamed to reflect its usage.

Fixes: #3060

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-02 21:17:33 +01:00
Rob Bradford
d92707afc5 vmm: Introduce a SerialBuffer for buffering serial output
Introduce a dynamic buffer for storing output from the serial port. The
SerialBuffer implements std::io::Write and can be used in place of the
direct output for the serial device.

The internals of the buffer is a vector that grows dynamically based on
demand up to a fixed size at which point old data will be overwritten.
Currently the buffer is only flushed upon writes.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-02 13:52:18 +01:00
Rob Bradford
63637eba31 vmm: Simplify epoll handling for VMM main loop
Remove the indirection of a dispatch table and simply use the enum as
the event data for the events.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-31 21:30:11 +01:00
Rob Bradford
4d2a4e2805 vmm: Handle epoll events for PTYs separately
Use two separate events for the console and serial PTY and then drive
the handling of the inputs on the PTY separately. This results in the
correct behaviour when both console and serial are attached to the PTY
as they are triggered separately on the epoll so events are not lost.

Fixes: #3012

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-25 13:33:32 +01:00