Commit Graph

483 Commits

Author SHA1 Message Date
Songqian Li
33c15ca273 vmm: remove pub use vm_config in config
This patch removes pub import vm_config in config.rs to eliminate
the ambiguity of vm_comfig reference.

Signed-off-by: Songqian Li <sionli@tencent.com>
2024-09-30 08:18:02 +00:00
Ruoqing He
61e57e1cb1 misc: Further improve imports styling
By introducing `imports_granularity="Module"` format strategy,
effectively groups imports from the same module into one line or block,
improving maintainability and readability.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-09-29 16:13:48 +00:00
Rob Bradford
88a9f79944 misc: Adapt consistent import style formatting
Historically the Cloud Hypervisor coding style has been to ensure that
all imports are ordered and placed in a single group. Unfortunately
cargo fmt has no support for ensuring that all imports are in a single
group so if whitespace lines were added as part of the import statements
then they would only be odered correctly in the group.

By adopting "group_imports="StdExternalCrate" we can enforce a style
where imports are placed in at most three groups for std, external
crates and the crate itself. Choosing a style enforceable by the tooling
reduces the reviewer burden.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2024-09-29 13:08:12 +01:00
Songqian Li
cc9899e09d vmm: remove unused mutex in api
This patch removes locks in VmCreate request and VmInfo response
since we needn't use a lock here and should ensure that internal
implementation is transparent to the runtime.

Signed-off-by: Songqian Li <sionli@tencent.com>
2024-09-28 14:02:04 +00:00
Yuanchu Xie
5f18ac3bc0 devices: Add pvmemcontrol device
Pvmemcontrol provides a way for the guest to control its physical memory
properties, and enables optimizations and security features. For
example, the guest can provide information to the host where parts of a
hugepage may be unbacked, or sensitive data may not be swapped out, etc.

Pvmemcontrol allows guests to manipulate its gPTE entries in the SLAT,
and also some other properties of the memory map the back's host memory.
This is achieved by using the KVM_CAP_SYNC_MMU capability. When this
capability is available, the changes in the backing of the memory region
on the host are automatically reflected into the guest. For example, an
mmap() or madvise() that affects the region will be made visible
immediately.

There are two components of the implementation: the guest Linux driver
and Virtual Machine Monitor (VMM) device. A guest-allocated shared
buffer is negotiated per-cpu through a few PCI MMIO registers, the VMM
device assigns a unique command for each per-cpu buffer. The guest
writes its pvmemcontrol request in the per-cpu buffer, then writes the
corresponding command into the command register, calling into the VMM
device to perform the pvmemcontrol request.

The synchronous per-cpu shared buffer approach avoids the kick and busy
waiting that the guest would have to do with virtio virtqueue transport.

The Cloud Hypervisor component can be enabled with --pvmemcontrol.

Co-developed-by: Stanko Novakovic <stanko@google.com>
Co-developed-by: Pasha Tatashin <tatashin@google.com>
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
2024-08-05 22:41:56 +00:00
Praveen K Paladugu
bd180bc3eb main: rename landlock_config to landlock_rules
To keep the naming consistent, rename all uses of landlock_config
to landlock_rules.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-08-05 17:46:30 +00:00
Praveen K Paladugu
d2f0e8aebb Revert "vmm: make landlock configs VMM-level config"
This reverts commit 94929889ac.
This revert moves landlock config back to VMConfig.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-08-05 17:46:30 +00:00
Wei Liu
94929889ac vmm: make landlock configs VMM-level config
This requires stashing the config values in `struct Vmm`. The configs
should be validated before before creating the VMM thread. Refactor the
code and update documentation where necessary.

The place where the rules are applied remain the same.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2024-07-06 04:42:58 +00:00
Praveen K Paladugu
11c17ca319 main: Enable landlock on main thread
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-07-06 04:42:58 +00:00
Praveen K Paladugu
130c988380 vmm: Enable Landlock on signal-handler thread
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-07-06 04:42:58 +00:00
Praveen K Paladugu
8c76a3e4b5 vmm: Enable Landlock on event-monitor thread
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-07-06 04:42:58 +00:00
Praveen K Paladugu
1d89f98edf vmm: Introduce landlock-rules cmdline param
Users can use this parameter to pass extra paths that 'vmm' and its
child threads can use at runtime. Hotplug is the primary usecase for
this parameter.

In order to hotplug devices that use local files: disks, memory zones,
pmem devices etc, users can use this option to pass the path/s that will
be used during hotplug while starting cloud-hypervisor. Doing this will
allow landlock to add required rules to grant access to these paths when
cloud-hypervisor process starts.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2024-07-06 04:42:58 +00:00
Praveen K Paladugu
287dbd4fc9 vmm: Introduce landlock cmdline parameter
Users can use this cmdline option to enable/disable Landlock based
sandboxing while running cloud-hypervisor.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-07-06 04:42:58 +00:00
Josh Soref
42e9632c53 misc: Fix spelling issues
Misspellings were identified by:
  https://github.com/marketplace/actions/check-spelling

* Initial corrections based on forbidden patterns from the action
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections
* Adding markdown bullets to readme credits section

Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
2024-06-08 16:31:30 +00:00
Alexandru Matei
091ce85473 main: update expand_fdtable comment
Updated the comment so it is sync with the code

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-05-28 14:50:40 +01:00
Alexandru Matei
f13d8f1412 main: fix high latency generated by file handle creation
Whenever the file descriptor table is full, Linux expands it by doubling
it's size.
The filesystem code that does this uses RCU synchronization to ensure
all pre-existing RCU read-side critical sections have completed. The
latency induced by this synchronization is a big part of the total time
required to restore a snapshot.

The kernel has an optimization in code, where it doesn't call
synchronize_rcu() if there is only one thread in the process. We can
take advantage of this optimization by expanding the descriptor table at
the application start, when it has only one thread.

This commit tries to expand the table to 4096 entries, this way we avoid
any expansion that could take place later.

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-05-25 01:09:10 +00:00
Omer Faruk Bayram
036e7e3797 vmm: ch-remote: replace deprecated zbus macros with new equivalents
Fixes deprecation related warnings introduced in #6400.

Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2024-05-23 12:20:06 +00:00
Purna Pavan Chandra
555c4c41ab ch-remote: allow fds to be sent along with 'restore'
Enable restore command the ability to send file descriptors along with
HTTP request. This is useful when restoring a VM with explicit FDs
passed to NetConfig(s).

Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
2024-05-14 10:52:46 +00:00
Yi Wang
4fd5070f5d ch-remote: fix help of remove-device
remove-device can remove not only VFIO device but also pci device.

No functional change.

Signed-off-by: Yi Wang <foxywang@tencent.com>
2024-05-09 14:34:30 +00:00
Thomas Barrett
e7e856d8ac vmm: add pci_segment mmio aperture configs
When using multiple PCI segments, the 32-bit and 64-bit mmio
aperture is split equally between each segment. Add an option
to configure the 'weight'. For example, a PCI segment with a
`mmio32_aperture_weight` of 2 will be allocated twice as much
32-bit mmio space as a normal PCI segment.

Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2024-04-24 09:35:19 +00:00
Yi Wang
1708561c74 ch-remote: add support for nmi
Adding the wrapping layer to be able to trigger NMI for the guest
from the ch-remote tool.

Signed-off-by: Yi Wang <foxywang@tencent.com>
2024-03-04 10:02:38 +00:00
Alexandru Matei
1091494320 vmm: http: graceful shutdown of the http api thread
This commit ensures that the HttpApi thread flushes all the responses
before the application shuts down. Without this step, in case of a
VmmShutdown request the application might terminate before the
thread sends a response.

Fixes: #6247

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-02-29 12:34:30 +00:00
Chris Webb
0310c5726f main: Show help text when run without arguments
cloud-hypervisor, ch-remote, vhost-user-block and vhost-user-net all
need at least one argument to do anything useful, so printing command
help is helpful when they are run without arguments or a subcommand.

Use clap::Command::arg_required_else_help(true) to do this.

Signed-off-by: Chris Webb <chris@arachsys.com>
2024-02-24 09:35:37 +00:00
Chris Webb
5627c26405 ch-remote: Fix crash when run with no subcommand
ch-remote crashes when run with --api-socket but no subcommand:

  $ target/release/ch-remote --api-socket /tmp/api
  thread 'main' panicked at src/bin/ch-remote.rs:509:14:
  internal error: entered unreachable code

Use clap::Command::subcommand_required(true) to yield a more friendly
error in this case.

Signed-off-by: Chris Webb <chris@arachsys.com>
2024-02-24 09:35:37 +00:00
Muminul Islam
aa6c486a6b vmm: add host-data as a command line argument
The host data provided at launch. Data is passed
to the hypervisor during the completion of the
isolated import.

Host Data provided by the hypervisor during guest launch.
The firmware includes this value in all attestation
reports for the guest.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2024-02-23 13:32:56 -08:00
Rob Bradford
adb318f4cd misc: Remove redundant "use" imports
With the nightly toolchain (2024-02-18) cargo check will flag up
redundant imports either because they are pulled in by the prelude on
earlier match.

Remove those redundant imports.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2024-02-19 17:54:30 +00:00
Chris Webb
09f3658999 vmm: Avoid zombie sigwinch_listener processes
When a guest running on a terminal reboots, the sigwinch_listener
subprocess exits and a new one restarts. The parent never wait()s
for children, so the old subprocess remains as a zombie. With further
reboots, more and more zombies build up.

As there are no other children for which we want the exit status,
the easiest fix is to take advantage of the implicit reaping specified
by POSIX when we set the disposition of SIGCHLD to SIG_IGN.

For this to work, we also need to set the correct default exit signal
of SIGCHLD when using clone3() CLONE_CLEAR_SIGHAND. Unlike the fallback
fork() path, clone_args::default() initialises the exit signal to zero,
which results in a child with non-standard reaping behaviour.

Signed-off-by: Chris Webb <chris@arachsys.com>
2024-02-19 17:08:47 +00:00
Muminul Islam
56dbb8f0db main: Support igvm as a payload
Currently kernel and firmware are checked as a payload.
IGVM should be checked as well. Otherwise, it hangs indefinitely.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2024-02-08 09:46:45 -08:00
Bo Chen
c1f4a7b295 main: Clarify truncate behavior for event monitor file
Fix beta clippy issue:

error: file opened with `create`, but `truncate` behavior not defined
   --> src/main.rs:624:26
    |
624 |                         .create(true)
    |                          ^^^^^^^^^^^^- help: add: `.truncate(true)`
    |
    = help: if you intend to overwrite an existing file entirely, call `.truncate(true)`
    = help: if you instead know that you may want to keep some parts of the old file, call `.truncate(false)`
    = help: alternatively, use `.append(true)` to append to the file instead of overwriting it
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#suspicious_open_options
    = note: `-D clippy::suspicious-open-options` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::suspicious_open_options)]`

Signed-off-by: Bo Chen <chen.bo@intel.com>
2024-02-07 09:25:40 +00:00
Philipp Schuster
e50a641126 devices: add debug-console device
This commit adds the debug-console (or debugcon) device to CHV. It is a
very simple device on I/O port 0xe9 supported by QEMU and BOCHS. It is
meant for printing information as easy as possible, without any
necessary configuration from the guest at all.

It is primarily interesting to OS/kernel and firmware developers as they
can produce output as soon as the guest starts without any configuration
of a serial device or similar. Furthermore, a kernel hacker might use
this device for information of type B whereas information of type A are
printed to the serial device.

This device is not used by default by Linux, Windows, or any other
"real" OS, but only by toy kernels and during firmware development.

In the CLI, it can be configured similar to --console or --serial with
the --debug-console parameter.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2024-01-25 10:25:14 -08:00
Alyssa Ross
7674196113 vmm: remove Default impls for config
These Default implementations either don't produce valid configs, are
no longer used outside of tests, or both.

For the tests, we can define our own local "default" values that make
the most sense for the tests, without worrying about what's
a (somewhat) sensible "global" default value.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-01-23 12:44:44 +00:00
Alyssa Ross
4ca18c082e vmm: use trait objects for API actions
Uses of the old ApiRequest enum conflated two different concerns:
identifying an API request endpoint, and storing data for an API
request.  This led to ApiRequest values being passed around with junk
data just to communicate a request type, which forced all API request
body types to implement Default, which in some cases doesn't make any
sense — what's the "default" path for a vhost-user socket?  The
nonsensical Default values have led to tests relying on being able to
use nonsensical data, which is an impediment to adding better
validation for these types.

Rather than having API request types be represented by an enum, which
has to carry associated body data everywhere it's used, it makes more
sense to represent API request types as trait objects.  These can have
an associated type for the type of the request body, and this makes it
possible to pass API request types and data around as siblings in a
type-safe way without forcing them into a single value even where it
doesn't make sense.  Trait objects also give us dynamic dispatch,
which lets us get rid of several large match blocks.

To keep it possible to fuzz the HTTP API, all the Vmm methods called
by the HTTP API are pulled out into a trait, so the fuzzer can provide
its own stub implementation of the VMM.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-01-17 10:20:02 +00:00
Thomas Barrett
c297d8d796 vmm: use RateLimiterGroup for virtio-blk devices
Add a 'rate_limit_groups' field to VmConfig that defines a set of
named RateLimiterGroups.

When the 'rate_limit_group' field of DiskConfig is defined, all
virtio-blk queues will be rate-limited by a shared RateLimiterGroup.
The lifecycle of all RateLimiterGroups is tied to the Vm.
A RateLimiterGroup may exist even if no Disks are configured to use
the RateLimiterGroup. Disks may be hot-added or hot-removed from the
RateLimiterGroup.

When the 'rate_limiter' field of DiskConfig is defined, we construct
an anonymous RateLimiterGroup whose lifecycle is tied to the Disk.
This is primarily done for api backwards compatability. Importantly,
the behavior is not the same! This implementation rate_limits the
aggregate bandwidth / iops of an individual disk rather than the
bandwidth / iops of an individual queue of a disk.

When neither the 'rate_limit_group' or the 'rate_limiter' fields of
DiskConfig is defined, the Disk is not rate-limited.

Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2024-01-03 10:21:06 -08:00
Jinank Jain
42477207e2 misc: Add check for sev_snp and target_arch
Feature 'sev_snp' can only be enabled when the target_arch is 'x86_64'

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2023-12-18 08:55:43 -08:00
Muminul Islam
13ef424bf1 vmm: Add IGVM to the config/commandline
This patch adds igvm to the Vm config and params as well as
the command line argument to pass igvm file to load into
guest memory. The file must maintain the IGVM format.
The CLI option is featured guarded by igvm feature gate.

The IGVM(Independent Guest Virtual Machine) file format
is designed to encapsulate all information required to
launch a virtual machine on any given virtualization stack,
with support for different isolation technologies such as
AMD SEV-SNP and Intel TDX.

At a conceptual level, this file format is a set of commands created
by the tool that generated the file, used by the loader to construct
the initial guest state. The file format also contains measurement
information that the underlying platform will use to confirm that
the file was loaded correctly and signed by the appropriate authorities.

The IGVM file is generated by the tool:
https://github.com/microsoft/igvm-tooling

The IGVM file is parsed by the following crates:
https://github.com/microsoft/igvm

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2023-12-08 09:22:42 -08:00
Bo Chen
a4d83ce9c5 main: Add the '--serial socket=' option help information
See: #5708

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-11-02 08:15:41 +00:00
Ravi kumar Veeramally
d1f337aef1 ch-remote: switch to clap
Porting back using clap crate

Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
2023-10-20 11:44:28 -07:00
Wei Liu
7bc3452139 main: switch command parsing to use clap
Partially revert 111225a2a5
and add the new dbus and pvpanic arguments.

As we are switching back to clap observe the following changes.

A few examples:

1. `-v -v -v` needs to be written as`-vvv`
2. `--disk D1 --disk D2` and others need to be written as `--disk D1 D2`.
3. `--option value` needs to be written as `--option=value.`

Change integration tests to adapt to the breaking changes.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
2023-10-20 11:44:28 -07:00
Praveen K Paladugu
6d1077fc3c vmm: Unix socket backend for serial port
Cloud-Hypervisor takes a path for Unix socket, where it will listen
on. Users can connect to the other end of the socket and access serial
port on the guest.

    "--serial socket=/path/to/socket" is the cmdline option to pass to
cloud-hypervisor.

Users can use socat like below to access guest's serial port once the
guest starts to boot:

    socat -,crnl UNIX-CONNECT:/path/to/socket

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2023-10-05 15:26:29 +01:00
Bo Chen
1e01b5eabc main: Report enabled features from CLI with "--version -v"
Fixes: #5817

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-10-05 08:40:50 +01:00
Jinank Jain
1b9ce69afa src: Add compile time check for SNP and TDX
We need to make sure that SEV-SNP and TDX are not enabled at the same
time. As these two features belong to mutually exclusive hardware
vendors. So, we should make sure that these two features are not enabled
at the same. Thus, a compile time check for it.

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2023-09-07 12:52:27 +01:00
Omer Faruk Bayram
2ed96cd3ed vmm: dbus: broadcast event_monitor events over the DBus API
This commit builds on top of the `Monitor::subscribe` function and
makes it possible to broadcast events published from `event-monitor`
over D-Bus.

The broadcasting functionality is enabled if the D-Bus API is enabled
and users who wish to also enable the file based `event-monitor` can do
so with the CLI arg `--event-monitor`.

Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2023-08-28 17:01:03 -07:00
Rob Bradford
1dd1850747 ch-remote: dbus: Remove unnecessary mut from reference
warning: this argument is a mutable reference, but not used mutably
   --> src/bin/ch-remote.rs:397:52
    |
397 | fn dbus_api_do_command(toplevel: &TopLevel, proxy: &mut DBusApi1ProxyBlocking<'_>) -> ApiResult {
    |                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider changing to: `&DBusApi1ProxyBlocking<'_>`
    |
    = note: this is cfg-gated and may require further changes
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_ref_mut
    = note: `#[warn(clippy::needless_pass_by_ref_mut)]` on by default

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-22 12:01:54 +01:00
Omer Faruk Bayram
a0c8bf4f9f vmm: seccomp: implement seccomp filtering for the event-monitor thread
Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2023-08-09 17:22:25 +01:00
Omer Faruk Bayram
02e1c54426 event_monitor: refactor the implementation to support concurrent access
This patch modifies `event_monitor` to ensure that concurrent access to
`event_log` from multiple threads is safe. Previously, the `event_log`
function would acquire a reference to the event log file and write
to it without doing any synchronization, which made it prone to
data races. This issue likely went under the radar because the
relevant `SAFETY` comment on the unsafe block was incomplete.

The new implementation spawns a dedicated thread named `event-monitor`
solely for writing to the file. It uses the MPMC channel exposed by
`flume` to pass messages to the `event-monitor` thread. Since
`flume::Sender<T>` implements `Sync`, it is safe for multiple threads
to share it and send messages to the `event-monitor` thread.
This is not possible with `std::sync::mpsc::Sender<T>` since it's
`!Sync`, meaning it is not safe for it to be shared between different
threads.

The `event_monitor::set_monitor` function now only initializes
the required global state and returns an instance of the
`Monitor` struct. This decouples the actual logging logic from the
`event_monitor` crate. The `event-monitor` thread is then spawned by
the `vmm` crate.

Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2023-08-09 17:22:25 +01:00
Yu Li
f03c3b737f main: add missing comma in for net param
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-14 09:36:27 -07:00
Yu Li
d0dbc7fb4d build: Fix beta clippy issue (useless_vec)
warning: useless use of `vec!`
   --> test_infra/src/lib.rs:111:30
    |
111 |             let mut events = vec![epoll::Event::new(epoll::Events::empty(), 0); 1];
    |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: you can use an array directly: `[epoll::Event::new(epoll::Events::empty(), 0); 1]`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_vec
    = note: `#[warn(clippy::useless_vec)]` on by default

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-13 08:16:30 -07:00
Yi Wang
d99c0c0d1d devices: pvpanic: add method for DeviceManager
Add method for DeviceManager to invoke.

Signed-off-by: Yi Wang <foxywang@tencent.com>
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-07-06 11:14:54 +01:00
Yu Li
499989fb17 logger: use write with \r\n instead of writeln
The device manager will set tty or pty to raw mode, all the `\n` will
be LF without CR, which makes the output difficult to read.

This commit solves it by using `write` with `\r\n` instead of
`writeln`, which can print CR and LF explicitly.

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-06-16 14:15:03 -07:00
Omer Faruk Bayram
a64d27f841 ch-remote: full support for calling the D-Bus API
Support calling into the D-Bus API in a non-breaking way.

Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
2023-06-06 10:18:26 -07:00