AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.
On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.
This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.
The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.
Signed-off-by: William Douglas <william.douglas@intel.com>
Disable the DAX feature from the virtio-fs implementation as the feature
is still not stable. The feature is deprecated, meaning the 'dax'
parameter will be removed in about 2 releases cycles.
In the meantime, the parameter value is ignored and forced to be
disabled.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce a new --vdpa parameter associated with a VdpaConfig for the
future creation of a Vdpa device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Replace the thread for handling SIGSYS with a simple signal handler.
This resolves inconsistent delivery of signals to the SIGSYS thread due
to other threads manipulating the signals.
Tested by removing key syscalls from vCPU and virtio device filters and
observing correct notice.
Fixes: #3811
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit adds event fds and the event handler to send/receive
requests and responses from the GDB thread. It also adds `--gdb` flag to
enable GDB stub feature.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
Passing no boot related parameters (e.g. no --kernel) is used for e.g.
receiving a live migration or an API based boot.
marvin:~/src/cloud-hypervisor (2022-01-11-live-migration-with-fds *)$ target/debug/cloud-hypervisor --api-socket /tmp/api2
thread 'main' panicked at '`tdx` is not a name of an argument or a group.
Make sure you're using the name of the argument itself and not the name of short or long flags.', /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/clap-3.0.6/src/parse/matches/arg_matches.rs:598:14
stack backtrace:
0: rust_begin_unwind
at /rustc/7d6f948173ccb18822bab13d548c65632db5f0aa/library/std/src/panicking.rs:498:5
1: core::panicking::panic_fmt
at /rustc/7d6f948173ccb18822bab13d548c65632db5f0aa/library/core/src/panicking.rs:107:14
2: clap::parse::matches::arg_matches::ArgMatches::get_arg
at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/clap-3.0.6/src/parse/matches/arg_matches.rs:1052:17
3: clap::parse::matches::arg_matches::ArgMatches::is_present
at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/clap-3.0.6/src/parse/matches/arg_matches.rs:598:9
4: cloud_hypervisor::start_vmm
at ./src/main.rs:530:46
5: cloud_hypervisor::main
at ./src/main.rs:566:27
6: core::ops::function::FnOnce::call_once
at /rustc/7d6f948173ccb18822bab13d548c65632db5f0aa/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: unneeded late initalization
--> src/main.rs:134:5
|
134 | let mut app: App;
| ^^^^^^^^^^^^^^^^^
|
= note: `#[warn(clippy::needless_late_init)]` on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_late_init
help: declare `app` here
|
138 | let mut app: App = App::new("cloud-hypervisor")
| ~~~~~~~~~~~~~~~~
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This crate was used in the integration tests to allow the tests to
continue and clean up after a failure. This isn't necessary in the unit
tests and adds a large build dependency chain including an unmaintained
crate.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extend the existing list of options available for the 'cpus' parameter
with the newly added option 'affinity'.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With the introduction of a new option `affinity` to the `cpus`
parameter, Cloud Hypervisor can now let the user choose the set
of host CPUs where to run each vCPU.
This is useful when trying to achieve CPU pinning, as well as making
sure the VM runs on a specific NUMA node.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This currently contains only the number over PCI segments to create.
This is limited to 16 at the moment which should allow 496 user specified
PCI devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When using PVH for booting (which we use for all firmwares and direct
kernel boot) the Linux kernel does not configure LA57 correctly. As such
we need to limit the address space to the maximum 4-level paging address
space.
If the user knows that their guest image can take advantage of the
5-level addressing and they need it for their workload then they can
increase the physical address space appropriately.
This PR removes the TDX specific handling as the new address space limit
is below the one that that code specified.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The argument `prefault` is provided in MemoryManager, but it can
only be used by SGX and restore.
With prefault (MAP_POPULATE) been set, subsequent page faults will
decrease during running, although it will make boot slower.
This commit adds `prefault` in MemoryConfig and MemoryZoneConfig.
To resolve conflict between memory and restore, argument
`prefault` has been changed from `bool` to `Option<bool>`, when
its value is None, config from memory will be used, otherwise
argument in Option will be used.
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
This allows Cloud Hypervisor to be run under `perf` as some of the
signals will already be blocked in the child process.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Despite setting up a dedicated thread for signal handling, we weren't
making sure that the signals we were listening for there were actually
dispatched to the right thread. While the signal-hook provides an
iterator API, so we can know that we're only processing the signals
coming out of the iterator on our signal handling thread, the actual
signal handling code from signal-hook, which pushes the signals onto
the iterator, can run on any thread. This can lead to seccomp
violations when the signal-hook signal handler does something that
isn't allowed on that thread by our seccomp policy.
To reproduce, resize a terminal running cloud-hypervisor continuously
for a few minutes. Eventually, the kernel will deliver a SIGWINCH to
a thread with a restrictive seccomp policy, and a seccomp violation
will trigger.
As part of this change, it's also necessary to allow rt_sigreturn(2)
on the signal handling thread, so signal handlers are actually allowed
to run on it. The fact that this didn't seem to be needed before
makes me think that signal handlers were almost _never_ actually
running on the signal handling thread.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
This allows the user to specify devices that are running in a different
userspace process and communicated with vfio-user.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Enable specifically for the add-net command the ability to send file
descriptors along with the HTTP request. This is useful to hotplug a
macvtap interface after the VMM has already been started.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The help of arguments `memory` and `memory-zone` missing a comma.
Before adding, these parts are as follows:
> hugepage_size=<hugepage_size>hotplug_method=acpi|virtio-mem
After adding, these parts will be:
> hugepage_size=<hugepage_size>,hotplug_method=acpi|virtio-mem
Signed-off-by: Yukiteru Lee <wfly1998@sina.com>
Issue from beta verion of clippy:
Error: --> vm-virtio/src/queue.rs:700:59
|
700 | if let Some(used_event) = self.get_used_event(&mem) {
| ^^^^ help: change this to: `mem`
|
= note: `-D clippy::needless-borrow` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
Signed-off-by: Bo Chen <chen.bo@intel.com>
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
To avoid race issues where the api-socket may not be created by the
time a cloud-hypervisor caller is ready to look for it, enable the
caller to pass the api-socket fd directly.
Avoid breaking current callers by allowing the --api-socket path to be
passed as it is now in addition to through the path argument.
Signed-off-by: William Douglas <william.r.douglas@gmail.com>
This allows the return of errors which will be printed using the
existing code and removes panic()s
Fixes: #2342
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows the return of errors which will be printed using the
existing code and removes panic()s
Fixes: #2342
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: name `StartVMMThread` contains a capitalized acronym
--> src/main.rs:50:5
|
50 | StartVMMThread(#[source] vmm::Error),
| ^^^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `StartVmmThread`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: name `InvalidCPUCount` contains a capitalized acronym
--> src/bin/ch-remote.rs:24:5
|
24 | InvalidCPUCount(std::num::ParseIntError),
| ^^^^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `InvalidCpuCount`
|
= note: `#[warn(clippy::upper_case_acronyms)]` on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the startup message containing incomplete VM configuration
details. It's a bit unusual for a tool such as Cloud Hypervisor to print
those kind of details which are a direct representation of the command
line.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Only if we have a valid API server path then create the API server. For
now this has no functional change there is a default API server path in
the clap handling but rather prepares to do so optionally.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Replace "--monitor-fd" with "--event-monitor" which can either take
"fd=<int>" or "path=<path>" which can point to e.g. a named pipe and
allow more flexibility.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add the ability for cloud-hypervisor to create, manage and monitor a
pty for serial and/or console I/O from a user. The reasoning for
having cloud-hypervisor create the ptys is so that clients, libvirt
for example, could exit and later re-open the pty without causing I/O
issues. If the clients were responsible for creating the pty, when
they exit the main pty fd would close and cause cloud-hypervisor to
get I/O errors on writes.
Ideally the main and subordinate pty fds would be kept in the main
vmm's Vm structure. However, because the device manager owns parsing
the configuration for the serial and console devices, the information
is instead stored in new fields under the DeviceManager structure
directly.
From there hooking up the main fd is intended to look as close to
handling stdin and stdout on the tty as possible (there is some future
work ahead for perhaps moving support for the pty into the
vmm_sys_utils crate).
The main fd is used for reading user input and writing to output of
the Vm device. The subordinate fd is used to setup raw mode and it is
kept open in order to avoid I/O errors when clients open and close the
pty device.
The ability to handle multiple inputs as part of this change is
intentional. The current code allows serial and console ptys to be
created and both be used as input. There was an implementation gap
though with the queue_input_bytes needing to be modified so the pty
handlers for serial and console could access the methods on the serial
and console structures directly. Without this change only a single
input source could be processed as the console would switch based on
its input type (this is still valid for tty and isn't otherwise
modified).
Signed-off-by: William Douglas <william.r.douglas@gmail.com>
This allows the user to use an alternative huge page size otherwise the
default size will be used.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If we receive SIGSYS and identify it as a seccomp violation then give
friendly instructions on how to debug further. We are unable to decode
the siginfo_t struct ourselves due to https://github.com/rust-lang/libc/issues/716Fixes: #2139
Signed-off-by: Rob Bradford <robert.bradford@intel.com>