73 Commits

Author SHA1 Message Date
Ruoqing He
6164aa0885 misc: Replace div_round_up operation with div_ceil
As clippy of rust-toolchain version 1.83.0-beta.1 suggests, replace
manually implemented `div_round_up!` and the like with `div_ceil` from
std.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-10-18 17:46:39 +00:00
Songqian Li
33c15ca273 vmm: remove pub use vm_config in config
This patch removes pub import vm_config in config.rs to eliminate
the ambiguity of vm_comfig reference.

Signed-off-by: Songqian Li <sionli@tencent.com>
2024-09-30 08:18:02 +00:00
Ruoqing He
61e57e1cb1 misc: Further improve imports styling
By introducing `imports_granularity="Module"` format strategy,
effectively groups imports from the same module into one line or block,
improving maintainability and readability.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-09-29 16:13:48 +00:00
Rob Bradford
f041c940a7 build: Apply cargo fmt check to fuzz workspace
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2024-09-29 13:08:12 +01:00
Songqian Li
cc9899e09d vmm: remove unused mutex in api
This patch removes locks in VmCreate request and VmInfo response
since we needn't use a lock here and should ensure that internal
implementation is transparent to the runtime.

Signed-off-by: Songqian Li <sionli@tencent.com>
2024-09-28 14:02:04 +00:00
Ruoqing He
f436231cba fuzz: Wrap params of FilePair with Arc
The construction of `FilePair` in `virtio_devices` component has changed
in 287887c, wrapping the parameters with `Arc` to fix fuzz build.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2024-09-27 15:58:21 +00:00
Jinank Jain
2049b2c377 fuzz: Fix rustfmt warning
Imports were not sorted lexicographically, thus causing rustfmt warning.

Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
2024-09-02 14:22:22 +00:00
Yuanchu Xie
5f18ac3bc0 devices: Add pvmemcontrol device
Pvmemcontrol provides a way for the guest to control its physical memory
properties, and enables optimizations and security features. For
example, the guest can provide information to the host where parts of a
hugepage may be unbacked, or sensitive data may not be swapped out, etc.

Pvmemcontrol allows guests to manipulate its gPTE entries in the SLAT,
and also some other properties of the memory map the back's host memory.
This is achieved by using the KVM_CAP_SYNC_MMU capability. When this
capability is available, the changes in the backing of the memory region
on the host are automatically reflected into the guest. For example, an
mmap() or madvise() that affects the region will be made visible
immediately.

There are two components of the implementation: the guest Linux driver
and Virtual Machine Monitor (VMM) device. A guest-allocated shared
buffer is negotiated per-cpu through a few PCI MMIO registers, the VMM
device assigns a unique command for each per-cpu buffer. The guest
writes its pvmemcontrol request in the per-cpu buffer, then writes the
corresponding command into the command register, calling into the VMM
device to perform the pvmemcontrol request.

The synchronous per-cpu shared buffer approach avoids the kick and busy
waiting that the guest would have to do with virtio virtqueue transport.

The Cloud Hypervisor component can be enabled with --pvmemcontrol.

Co-developed-by: Stanko Novakovic <stanko@google.com>
Co-developed-by: Pasha Tatashin <tatashin@google.com>
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
2024-08-05 22:41:56 +00:00
Praveen K Paladugu
bd180bc3eb main: rename landlock_config to landlock_rules
To keep the naming consistent, rename all uses of landlock_config
to landlock_rules.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-08-05 17:46:30 +00:00
Praveen K Paladugu
d2f0e8aebb Revert "vmm: make landlock configs VMM-level config"
This reverts commit 94929889ac2489015d60ca1480a5d412e2792c57.
This revert moves landlock config back to VMConfig.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-08-05 17:46:30 +00:00
Wei Liu
94929889ac vmm: make landlock configs VMM-level config
This requires stashing the config values in `struct Vmm`. The configs
should be validated before before creating the VMM thread. Refactor the
code and update documentation where necessary.

The place where the rules are applied remain the same.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2024-07-06 04:42:58 +00:00
Praveen K Paladugu
1d89f98edf vmm: Introduce landlock-rules cmdline param
Users can use this parameter to pass extra paths that 'vmm' and its
child threads can use at runtime. Hotplug is the primary usecase for
this parameter.

In order to hotplug devices that use local files: disks, memory zones,
pmem devices etc, users can use this option to pass the path/s that will
be used during hotplug while starting cloud-hypervisor. Doing this will
allow landlock to add required rules to grant access to these paths when
cloud-hypervisor process starts.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2024-07-06 04:42:58 +00:00
Praveen K Paladugu
287dbd4fc9 vmm: Introduce landlock cmdline parameter
Users can use this cmdline option to enable/disable Landlock based
sandboxing while running cloud-hypervisor.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2024-07-06 04:42:58 +00:00
Rob Bradford
3def18f502 fuzz: Fix use of "guest_debug" conditional code
Enable the use of the vmm crate with the "guest_debug" feature and make
the code that exercises that in the fuzzer unconditional on
"guest_debug" as a feature (as that is not specified as a feature in the
fuzz workspace itself.)

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2024-05-08 08:10:28 +00:00
Thomas Barrett
e7e856d8ac vmm: add pci_segment mmio aperture configs
When using multiple PCI segments, the 32-bit and 64-bit mmio
aperture is split equally between each segment. Add an option
to configure the 'weight'. For example, a PCI segment with a
`mmio32_aperture_weight` of 2 will be allocated twice as much
32-bit mmio space as a normal PCI segment.

Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2024-04-24 09:35:19 +00:00
Ruslan Mstoi
5e9886bba4 build: add REUSE Compliance Check
In accordance with reuse requirements:
- Place each license file in the LICENSES/ directory
- Add missing SPDX-License-Identifier to files.
- Add .reuse/dep5 to bulk-license files

Fixes: #5887

Signed-off-by: Ruslan Mstoi <ruslan.mstoi@intel.com>
2024-04-19 17:35:45 +00:00
Yi Wang
f40dd4a993 vmm: add endpoint api for NMI support
Add http endpoint for trigger nmi.

Signed-off-by: Yi Wang <foxywang@tencent.com>
2024-03-04 10:02:38 +00:00
acarp
035c4b20fb block: Set an option to pin virtio block threads to host cpus
Currently the only way to set the affinity for virtio block threads is
to boot the VM, search for the tid of each of the virtio block threads,
then set the affinity manually. This commit adds an option to pin virtio
block queues to specific host cpus (similar to pinning vcpus to host
cpus). A queue_affinity option has been added to the disk flag in
the cli to specify a mapping of queue indices to host cpus.

Signed-off-by: acarp <acarp@crusoeenergy.com>
2024-02-13 09:05:57 +00:00
Philipp Schuster
e50a641126 devices: add debug-console device
This commit adds the debug-console (or debugcon) device to CHV. It is a
very simple device on I/O port 0xe9 supported by QEMU and BOCHS. It is
meant for printing information as easy as possible, without any
necessary configuration from the guest at all.

It is primarily interesting to OS/kernel and firmware developers as they
can produce output as soon as the guest starts without any configuration
of a serial device or similar. Furthermore, a kernel hacker might use
this device for information of type B whereas information of type A are
printed to the serial device.

This device is not used by default by Linux, Windows, or any other
"real" OS, but only by toy kernels and during firmware development.

In the CLI, it can be configured similar to --console or --serial with
the --debug-console parameter.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2024-01-25 10:25:14 -08:00
Alyssa Ross
7674196113 vmm: remove Default impls for config
These Default implementations either don't produce valid configs, are
no longer used outside of tests, or both.

For the tests, we can define our own local "default" values that make
the most sense for the tests, without worrying about what's
a (somewhat) sensible "global" default value.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-01-23 12:44:44 +00:00
Alyssa Ross
4ca18c082e vmm: use trait objects for API actions
Uses of the old ApiRequest enum conflated two different concerns:
identifying an API request endpoint, and storing data for an API
request.  This led to ApiRequest values being passed around with junk
data just to communicate a request type, which forced all API request
body types to implement Default, which in some cases doesn't make any
sense — what's the "default" path for a vhost-user socket?  The
nonsensical Default values have led to tests relying on being able to
use nonsensical data, which is an impediment to adding better
validation for these types.

Rather than having API request types be represented by an enum, which
has to carry associated body data everywhere it's used, it makes more
sense to represent API request types as trait objects.  These can have
an associated type for the type of the request body, and this makes it
possible to pass API request types and data around as siblings in a
type-safe way without forcing them into a single value even where it
doesn't make sense.  Trait objects also give us dynamic dispatch,
which lets us get rid of several large match blocks.

To keep it possible to fuzz the HTTP API, all the Vmm methods called
by the HTTP API are pulled out into a trait, so the fuzzer can provide
its own stub implementation of the VMM.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-01-17 10:20:02 +00:00
Alyssa Ross
9d5dfa879b fuzz: fix unused import warnings
Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-01-08 17:39:05 +00:00
Muminul Islam
13ef424bf1 vmm: Add IGVM to the config/commandline
This patch adds igvm to the Vm config and params as well as
the command line argument to pass igvm file to load into
guest memory. The file must maintain the IGVM format.
The CLI option is featured guarded by igvm feature gate.

The IGVM(Independent Guest Virtual Machine) file format
is designed to encapsulate all information required to
launch a virtual machine on any given virtualization stack,
with support for different isolation technologies such as
AMD SEV-SNP and Intel TDX.

At a conceptual level, this file format is a set of commands created
by the tool that generated the file, used by the loader to construct
the initial guest state. The file format also contains measurement
information that the underlying platform will use to confirm that
the file was loaded correctly and signed by the appropriate authorities.

The IGVM file is generated by the tool:
https://github.com/microsoft/igvm-tooling

The IGVM file is parsed by the following crates:
https://github.com/microsoft/igvm

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2023-12-08 09:22:42 -08:00
Thomas Barrett
c4e8e653ac block: Add support for user specified ID_SERIAL
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-09-11 12:50:41 +01:00
Philipp Schuster
7bf0cc1ed5 misc: Fix various spelling errors using typos
This fixes all typos found by the typos utility with respect to the config file.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2023-09-09 10:46:21 +01:00
Rob Bradford
a00d29867c fuzz, vmm: Avoid infinite loop in CMOS fuzzer
With the addition of the spinning waiting for the exit event to be
received in the CMOS device a regression was introduced into the CMOS
fuzzer. Since there is nothing to receive the event in the fuzzer and
there is nothing to update the bit the that the device is looping on;
introducing an infinite loop.

Use an Option<> type so that when running the device in the fuzzer no
Arc<AtomicBool> is provided effectively disabling the spinning logic.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=61165

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-07 08:04:55 +08:00
Rob Bradford
06dc708515 vmm: Only return from reset driven I/O once event received
The reset system is asynchronous with an I/O event (PIO or MMIO) for
ACPI/i8042/CMOS triggering a write to the reset_evt event handler. The
VMM thread will pick up this event on the VMM main loop and then trigger
a shutdown in the CpuManager. However since there is some delay between
the CPU threads being marked to be killed (through the
CpuManager::cpus_kill_signalled bool) it is possible for the guest vCPU
that triggered the exit to be re-entered when the vCPU KVM_RUN is called
after the I/O exit is completed.

This is undesirable and in particular the Linux kernel will attempt to
jump to real mode after a CMOS based exit - this is unsupported in
nested KVM on AMD on Azure and will trigger an error in KVM_RUN.

Solve this problem by spinning in the device that has triggered the
reset until the vcpus_kill_signalled boolean has been updated
indicating that the VMM thread has received the event and called
CpuManager::shutdown(). In particular if this bool is set then the vCPU
threads will not re-enter the guest.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-04 09:57:25 +08:00
Yong He
0149e65081 vm-device: support batch update interrupt source group GSI
Split interrupt source group restore into two steps, first restore
the irqfd for each interrupt source entry, and second restore the
GSI routing of the entire interrupt source group.

This patch will reduce restore latency of interrupt source group,
and in a 200-concurrent restore test, the patch reduced the
average IOAPIC restore time from 15ms to 1ms.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-08-03 15:58:36 +01:00
Yu Li
447cad3861 block: merge qcow, vhdx and block_util into block crate
This commit merges crates `qcow`, `vhdx` and `block_util` into the
crate `block`, which can allow `qcow` to use functions from `block_util`
without introducing a circular crate dependency.

This commit is based on crosvm implementation:
f2eecc4152

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-19 13:52:43 +01:00
Yong He
3494080e2f vmm: add configuration for network offloading features
Add new configuration for offloading features, including
Checksum/TSO/UFO, and set these offloading features as
enabled by default.

Fixes: #4792.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-01-12 09:05:45 +00:00
Bo Chen
51307dd509 fuzz: Add fuzzer for 'linux loader' cmdline
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-12-12 13:50:28 +00:00
Bo Chen
32ded2c72b fuzz: Add fuzzer for 'linux loader'
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-12-12 13:50:28 +00:00
Bo Chen
e2e02c8f69 fuzz: Add fuzzer for virtio-net
To synthesize the interactions between the virtio-net device and the tap
interface, the fuzzer utilizes a pair of unix domain sockets: one socket
(e.g. the dummy tap frontend) is used to construct the 'net_util::Tap'
instance for creating a virtio-net device; the other socket (e.g. the
dummy tap backend) is used in a epoll loop for handling the tx and rx
requests from the virtio-net device.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-30 12:13:14 +00:00
Sebastien Boeuf
0bd910e8b0 devices, vmm: Move Serial to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Rob Bradford
f603afc46e vmm: Make Transparent Huge Pages controllable (default on)
Add MemoryConfig::thp and `--memory thp=on|off` to allow control of
Transparent Huge Pages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-09 16:51:21 +00:00
Bo Chen
ef8fb9bd25 fuzz: Add fuzzer for virtio-console
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-03 09:10:41 -07:00
Sebastien Boeuf
1f0e5eb66a vmm: virtio-devices: Restore every VirtioDevice upon creation
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.

Here is the list of devices now following the new restore design:

- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-24 14:17:08 +02:00
Bo Chen
802f489e4d fuzz: Add fuzzer for virtio-iommu
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-21 14:21:42 +01:00
Bo Chen
7b31871a36 fuzz: mem: Avoid using hugepages
The kernel will trigger a SIGBUS upon hugetlb page faults when there is
no huge pages available. We neither have a way to ensure enough huge
pages available on the host system, nor have a way to gracefully report
the lack of huge pages in advance from Cloud Hypervisor. For these
reasons, we have to avoid using huge pages from the virtio-mem fuzzer to
avoid SIGBUS errors.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-18 09:09:35 +01:00
Bo Chen
342851c88c fuzz: Add fuzzer for virtio-mem
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-12 16:09:27 +01:00
Bo Chen
4fea40f008 fuzz: Balloon: Reduce the guest memory size and queue size
With the guest memory size of 1MB, a valid descriptor size can be close
to the guest memory size (e.g. 1MB) and can contain close to 256k
valid pfn entries (each entry is 4 bytes). Multiplying the queue
size (e.g. 256), there can be close to 64 millions pfn entries to
process in a single request. This is why the oss-fuzz reported a
timeout (with a limit of 60s).

By reducing the guest memory size and the queue size, the worst-case now
is 8 million pfn entries for fuzzing, which can be finished in around 20
seconds according to my local experiment.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-04 09:40:12 +01:00
Bo Chen
ef603fde4c fuzz: Reduce the guest memory size for balloon fuzzer
As the virt queues are initialized with random bytes from the fuzzing
engine, a descriptor buffer for the available ring can have a very large
length (e.g. up to 4GB). This means there can be up to 1 billion
entries (e.g. page frame number) for virtio-balloon to process a signal
available descriptor (given each entry is 4 bytes). This is the reason
why oss-fuzz reported a hanging issue for this fuzzer, where the
generated descriptor buffer length is 4,278,321,152.

We can avoid this kind of long execution by reducing the size of guest
memory. For example, with 1MB of guest memory, the number of descriptor
entries for processing is limited ~256K.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-23 08:28:07 +01:00
Rob Bradford
194b59f44b fuzz: Don't overload meaning of reset()
This function is for really for the transport layer to trigger a device
reset. Instead name it appropriately for the fuzzing specific use case.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-22 11:01:41 -07:00
Bo Chen
ab0b3f1b7b fuzz: Add fuzzer for virtio-balloon
The fuzzer exercises the inflate, deflate and reporting events of
virtio-balloon via creating three queues and kicking three events.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-20 11:05:53 +02:00
Bo Chen
e1b483fc88 fuzz: Add fuzzer for virtio-rng
To make the fuzzer faster and more effective, the guest memory is
setup with a much smaller size (comparing with other virtio device
fuzzers) and  a hole between the memory for holding virtio queue and
the rest of guest data. It brings two benefits: 1) avoid writing large
chunk of data from 'urandom' into the available descriptor chain (which
makes the fuzzer faster); 2) reduce substantial amount of overwrites to
the virtio queue data by the data from 'urandom (which makes the fuzzer
more deterministic and hence effective).

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-20 11:05:53 +02:00
Bo Chen
f815fcbb5d fuzz: Add fuzzer for virtio-watchdog
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-20 11:05:53 +02:00
Bo Chen
91b8b00f95 fuzz: Add fuzzer for virtio-pmem
The fuzzer is focusing on the virtio-pmem code that processes guest
inputs (e.g. virt queues). Given 'flush' is the only virtio-pmem
request, the fuzzer is essentially testing the code for parsing and
error handling.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-15 09:08:11 +01:00
Bo Chen
67a89f4538 fuzz: Setup virt queue with proper addresses
To make the fuzzers more focused and more efficient, we now provide
default addresses for the descriptor table, available ring, and used
ring, which ensures the virt-queue has a valid memory layout (e.g. no
overlapping between descriptor tables, available ring, and used ring).

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-15 09:08:11 +01:00
Bo Chen
742d6858f7 fuzz: block: Setup the virt queue based on the fuzzed input bytes
Instead of always fuzzing virt-queues with default values (mostly 0s),
the fuzzer now initializes the virt-queue based on the fuzzed input
bytes, such as the tail position of the available ring, queue size
selected by driver, descriptor table address, available ring address,
used ring address, etc. In this way, the fuzzer can explore the
virtio-block code path with various virt-queue setup.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-01 08:39:28 +02:00
Bo Chen
6cb214f15c fuzz: block: Rely on custom EpollHelper::run and VirtioCommon:reset
This commit also extends the copyright header.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-30 14:01:33 -07:00