57 Commits

Author SHA1 Message Date
Yi Wang
f40dd4a993 vmm: add endpoint api for NMI support
Add http endpoint for trigger nmi.

Signed-off-by: Yi Wang <foxywang@tencent.com>
2024-03-04 10:02:38 +00:00
acarp
035c4b20fb block: Set an option to pin virtio block threads to host cpus
Currently the only way to set the affinity for virtio block threads is
to boot the VM, search for the tid of each of the virtio block threads,
then set the affinity manually. This commit adds an option to pin virtio
block queues to specific host cpus (similar to pinning vcpus to host
cpus). A queue_affinity option has been added to the disk flag in
the cli to specify a mapping of queue indices to host cpus.

Signed-off-by: acarp <acarp@crusoeenergy.com>
2024-02-13 09:05:57 +00:00
Philipp Schuster
e50a641126 devices: add debug-console device
This commit adds the debug-console (or debugcon) device to CHV. It is a
very simple device on I/O port 0xe9 supported by QEMU and BOCHS. It is
meant for printing information as easy as possible, without any
necessary configuration from the guest at all.

It is primarily interesting to OS/kernel and firmware developers as they
can produce output as soon as the guest starts without any configuration
of a serial device or similar. Furthermore, a kernel hacker might use
this device for information of type B whereas information of type A are
printed to the serial device.

This device is not used by default by Linux, Windows, or any other
"real" OS, but only by toy kernels and during firmware development.

In the CLI, it can be configured similar to --console or --serial with
the --debug-console parameter.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2024-01-25 10:25:14 -08:00
Alyssa Ross
7674196113 vmm: remove Default impls for config
These Default implementations either don't produce valid configs, are
no longer used outside of tests, or both.

For the tests, we can define our own local "default" values that make
the most sense for the tests, without worrying about what's
a (somewhat) sensible "global" default value.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-01-23 12:44:44 +00:00
Alyssa Ross
4ca18c082e vmm: use trait objects for API actions
Uses of the old ApiRequest enum conflated two different concerns:
identifying an API request endpoint, and storing data for an API
request.  This led to ApiRequest values being passed around with junk
data just to communicate a request type, which forced all API request
body types to implement Default, which in some cases doesn't make any
sense — what's the "default" path for a vhost-user socket?  The
nonsensical Default values have led to tests relying on being able to
use nonsensical data, which is an impediment to adding better
validation for these types.

Rather than having API request types be represented by an enum, which
has to carry associated body data everywhere it's used, it makes more
sense to represent API request types as trait objects.  These can have
an associated type for the type of the request body, and this makes it
possible to pass API request types and data around as siblings in a
type-safe way without forcing them into a single value even where it
doesn't make sense.  Trait objects also give us dynamic dispatch,
which lets us get rid of several large match blocks.

To keep it possible to fuzz the HTTP API, all the Vmm methods called
by the HTTP API are pulled out into a trait, so the fuzzer can provide
its own stub implementation of the VMM.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-01-17 10:20:02 +00:00
Alyssa Ross
9d5dfa879b fuzz: fix unused import warnings
Signed-off-by: Alyssa Ross <hi@alyssa.is>
2024-01-08 17:39:05 +00:00
Muminul Islam
13ef424bf1 vmm: Add IGVM to the config/commandline
This patch adds igvm to the Vm config and params as well as
the command line argument to pass igvm file to load into
guest memory. The file must maintain the IGVM format.
The CLI option is featured guarded by igvm feature gate.

The IGVM(Independent Guest Virtual Machine) file format
is designed to encapsulate all information required to
launch a virtual machine on any given virtualization stack,
with support for different isolation technologies such as
AMD SEV-SNP and Intel TDX.

At a conceptual level, this file format is a set of commands created
by the tool that generated the file, used by the loader to construct
the initial guest state. The file format also contains measurement
information that the underlying platform will use to confirm that
the file was loaded correctly and signed by the appropriate authorities.

The IGVM file is generated by the tool:
https://github.com/microsoft/igvm-tooling

The IGVM file is parsed by the following crates:
https://github.com/microsoft/igvm

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2023-12-08 09:22:42 -08:00
Thomas Barrett
c4e8e653ac block: Add support for user specified ID_SERIAL
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
2023-09-11 12:50:41 +01:00
Philipp Schuster
7bf0cc1ed5 misc: Fix various spelling errors using typos
This fixes all typos found by the typos utility with respect to the config file.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
2023-09-09 10:46:21 +01:00
Rob Bradford
a00d29867c fuzz, vmm: Avoid infinite loop in CMOS fuzzer
With the addition of the spinning waiting for the exit event to be
received in the CMOS device a regression was introduced into the CMOS
fuzzer. Since there is nothing to receive the event in the fuzzer and
there is nothing to update the bit the that the device is looping on;
introducing an infinite loop.

Use an Option<> type so that when running the device in the fuzzer no
Arc<AtomicBool> is provided effectively disabling the spinning logic.

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=61165

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-07 08:04:55 +08:00
Rob Bradford
06dc708515 vmm: Only return from reset driven I/O once event received
The reset system is asynchronous with an I/O event (PIO or MMIO) for
ACPI/i8042/CMOS triggering a write to the reset_evt event handler. The
VMM thread will pick up this event on the VMM main loop and then trigger
a shutdown in the CpuManager. However since there is some delay between
the CPU threads being marked to be killed (through the
CpuManager::cpus_kill_signalled bool) it is possible for the guest vCPU
that triggered the exit to be re-entered when the vCPU KVM_RUN is called
after the I/O exit is completed.

This is undesirable and in particular the Linux kernel will attempt to
jump to real mode after a CMOS based exit - this is unsupported in
nested KVM on AMD on Azure and will trigger an error in KVM_RUN.

Solve this problem by spinning in the device that has triggered the
reset until the vcpus_kill_signalled boolean has been updated
indicating that the VMM thread has received the event and called
CpuManager::shutdown(). In particular if this bool is set then the vCPU
threads will not re-enter the guest.

Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
2023-08-04 09:57:25 +08:00
Yong He
0149e65081 vm-device: support batch update interrupt source group GSI
Split interrupt source group restore into two steps, first restore
the irqfd for each interrupt source entry, and second restore the
GSI routing of the entire interrupt source group.

This patch will reduce restore latency of interrupt source group,
and in a 200-concurrent restore test, the patch reduced the
average IOAPIC restore time from 15ms to 1ms.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-08-03 15:58:36 +01:00
Yu Li
447cad3861 block: merge qcow, vhdx and block_util into block crate
This commit merges crates `qcow`, `vhdx` and `block_util` into the
crate `block`, which can allow `qcow` to use functions from `block_util`
without introducing a circular crate dependency.

This commit is based on crosvm implementation:
f2eecc4152

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2023-07-19 13:52:43 +01:00
Yong He
3494080e2f vmm: add configuration for network offloading features
Add new configuration for offloading features, including
Checksum/TSO/UFO, and set these offloading features as
enabled by default.

Fixes: #4792.

Signed-off-by: Yong He <alexyonghe@tencent.com>
2023-01-12 09:05:45 +00:00
Bo Chen
51307dd509 fuzz: Add fuzzer for 'linux loader' cmdline
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-12-12 13:50:28 +00:00
Bo Chen
32ded2c72b fuzz: Add fuzzer for 'linux loader'
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-12-12 13:50:28 +00:00
Bo Chen
e2e02c8f69 fuzz: Add fuzzer for virtio-net
To synthesize the interactions between the virtio-net device and the tap
interface, the fuzzer utilizes a pair of unix domain sockets: one socket
(e.g. the dummy tap frontend) is used to construct the 'net_util::Tap'
instance for creating a virtio-net device; the other socket (e.g. the
dummy tap backend) is used in a epoll loop for handling the tx and rx
requests from the virtio-net device.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-30 12:13:14 +00:00
Sebastien Boeuf
0bd910e8b0 devices, vmm: Move Serial to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Rob Bradford
f603afc46e vmm: Make Transparent Huge Pages controllable (default on)
Add MemoryConfig::thp and `--memory thp=on|off` to allow control of
Transparent Huge Pages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-09 16:51:21 +00:00
Bo Chen
ef8fb9bd25 fuzz: Add fuzzer for virtio-console
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-03 09:10:41 -07:00
Sebastien Boeuf
1f0e5eb66a vmm: virtio-devices: Restore every VirtioDevice upon creation
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.

Here is the list of devices now following the new restore design:

- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-24 14:17:08 +02:00
Bo Chen
802f489e4d fuzz: Add fuzzer for virtio-iommu
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-21 14:21:42 +01:00
Bo Chen
7b31871a36 fuzz: mem: Avoid using hugepages
The kernel will trigger a SIGBUS upon hugetlb page faults when there is
no huge pages available. We neither have a way to ensure enough huge
pages available on the host system, nor have a way to gracefully report
the lack of huge pages in advance from Cloud Hypervisor. For these
reasons, we have to avoid using huge pages from the virtio-mem fuzzer to
avoid SIGBUS errors.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-18 09:09:35 +01:00
Bo Chen
342851c88c fuzz: Add fuzzer for virtio-mem
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-12 16:09:27 +01:00
Bo Chen
4fea40f008 fuzz: Balloon: Reduce the guest memory size and queue size
With the guest memory size of 1MB, a valid descriptor size can be close
to the guest memory size (e.g. 1MB) and can contain close to 256k
valid pfn entries (each entry is 4 bytes). Multiplying the queue
size (e.g. 256), there can be close to 64 millions pfn entries to
process in a single request. This is why the oss-fuzz reported a
timeout (with a limit of 60s).

By reducing the guest memory size and the queue size, the worst-case now
is 8 million pfn entries for fuzzing, which can be finished in around 20
seconds according to my local experiment.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-04 09:40:12 +01:00
Bo Chen
ef603fde4c fuzz: Reduce the guest memory size for balloon fuzzer
As the virt queues are initialized with random bytes from the fuzzing
engine, a descriptor buffer for the available ring can have a very large
length (e.g. up to 4GB). This means there can be up to 1 billion
entries (e.g. page frame number) for virtio-balloon to process a signal
available descriptor (given each entry is 4 bytes). This is the reason
why oss-fuzz reported a hanging issue for this fuzzer, where the
generated descriptor buffer length is 4,278,321,152.

We can avoid this kind of long execution by reducing the size of guest
memory. For example, with 1MB of guest memory, the number of descriptor
entries for processing is limited ~256K.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-23 08:28:07 +01:00
Rob Bradford
194b59f44b fuzz: Don't overload meaning of reset()
This function is for really for the transport layer to trigger a device
reset. Instead name it appropriately for the fuzzing specific use case.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-22 11:01:41 -07:00
Bo Chen
ab0b3f1b7b fuzz: Add fuzzer for virtio-balloon
The fuzzer exercises the inflate, deflate and reporting events of
virtio-balloon via creating three queues and kicking three events.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-20 11:05:53 +02:00
Bo Chen
e1b483fc88 fuzz: Add fuzzer for virtio-rng
To make the fuzzer faster and more effective, the guest memory is
setup with a much smaller size (comparing with other virtio device
fuzzers) and  a hole between the memory for holding virtio queue and
the rest of guest data. It brings two benefits: 1) avoid writing large
chunk of data from 'urandom' into the available descriptor chain (which
makes the fuzzer faster); 2) reduce substantial amount of overwrites to
the virtio queue data by the data from 'urandom (which makes the fuzzer
more deterministic and hence effective).

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-20 11:05:53 +02:00
Bo Chen
f815fcbb5d fuzz: Add fuzzer for virtio-watchdog
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-20 11:05:53 +02:00
Bo Chen
91b8b00f95 fuzz: Add fuzzer for virtio-pmem
The fuzzer is focusing on the virtio-pmem code that processes guest
inputs (e.g. virt queues). Given 'flush' is the only virtio-pmem
request, the fuzzer is essentially testing the code for parsing and
error handling.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-15 09:08:11 +01:00
Bo Chen
67a89f4538 fuzz: Setup virt queue with proper addresses
To make the fuzzers more focused and more efficient, we now provide
default addresses for the descriptor table, available ring, and used
ring, which ensures the virt-queue has a valid memory layout (e.g. no
overlapping between descriptor tables, available ring, and used ring).

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-15 09:08:11 +01:00
Bo Chen
742d6858f7 fuzz: block: Setup the virt queue based on the fuzzed input bytes
Instead of always fuzzing virt-queues with default values (mostly 0s),
the fuzzer now initializes the virt-queue based on the fuzzed input
bytes, such as the tail position of the available ring, queue size
selected by driver, descriptor table address, available ring address,
used ring address, etc. In this way, the fuzzer can explore the
virtio-block code path with various virt-queue setup.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-09-01 08:39:28 +02:00
Bo Chen
6cb214f15c fuzz: block: Rely on custom EpollHelper::run and VirtioCommon:reset
This commit also extends the copyright header.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-30 14:01:33 -07:00
Bo Chen
0b182be65e fuzz: block: Remove meaningless setup to the virt-queue
The current fuzzer defines a 'format' for the random input 'bytes' from
libfuzzer, but this 'format' failed to improve the fuzzing
efficiency. Instead, the 'format' parsing process obfuscates the fuzzer and
makes the fuzzing engine much harder to focus on the actual fuzzing
target (e.g. virtio-block queue event handling). It is actually worse than
simply using the random inputs as the virt queue content for fuzzing.

We can later introduce a different 'format' to the input 'bytes' for
better fuzzing, say focusing more on virito-block fuzzing through
ensuring the virt queue content always has a valid 'available'
descriptor chain to process.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-11 09:35:46 +02:00
Bo Chen
fbec4a070d fuzz: block: Ensure the virtio-block thread is killed and joined
This also ensures that the 'queue_evt' is fully processed, as we enforce
the main thread is waiting for the virtio-block thread to process the
'kill_evt' which is after the 'queue_evt' processing.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-11 09:35:46 +02:00
Bo Chen
5ba3b80e83 fuzz: block: Ensure a queue event is properly processed
Currently the main thread returns immediately after sending a 'queue'
event which is rarely received and processed by the virtio-block
thread (unless system is in high workload). In this way, the fuzzer is
mostly doing nothing and is unable to reproduce its behavior
deterministically (from the same inputs). This patch relies on a
'level-triggered' epoll to ensure a 'queue' event is properly processed
before return from the main thread.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-11 09:35:46 +02:00
Bo Chen
1125fd2667 vmm: api: Use 'BTreeMap' for 'HttpRoutes'
In this way, we get the values sorted by its key by default, which is
useful for the 'http_api' fuzzer.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-03 10:18:24 +01:00
Bo Chen
e5155bab62 fuzz: Add fuzzer for HTTP API
Fuzz the HTTP API handling code with a minimum HTTP API
receiver (e.g. mocking the behavior of our "vmm" thread).

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-03 10:18:24 +01:00
Sebastien Boeuf
a423bf13ad virtio: Port codebase to the latest virtio-queue version
The new virtio-queue version introduced some breaking changes which need
to be addressed so that Cloud Hypervisor can still work with this
version.

The most important change is about removing a handle to the guest memory
from the Queue, meaning the caller has to provide the guest memory
handle for multiple methods from the QueueT trait.

One interesting aspect is that QueueT has been widely extended to
provide every getter and setter we need to access and update the Queue
structure without having direct access to its internal fields.

This patch ports all the virtio and vhost-user devices to this new crate
definition. It also updates both vhost-user-block and vhost-user-net
backends based on the updated vhost-user-backend crate. It also updates
the fuzz directory.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-07-29 17:41:32 +01:00
Rob Bradford
a330c531b0 fuzz: Add new fuzzer for emulated cmos device
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-27 18:12:56 +01:00
Rob Bradford
e4211272ad fuzz: Add new fuzzer for emulated serial device
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-27 18:12:56 +01:00
Sebastien Boeuf
3f62a172b2 virtio-devices: Pass a list of tuples for virtqueues
Instead of passing separately a list of Queues and the equivalent list
of EventFds, we consolidate these two through a tuple along with the
queue index.

The queue index can be very useful if looking for the actual index
related to the queue, no matter if other queues have been enabled or
not.

It's also convenient to have the EventFd associated with the Queue so
that we don't have to carry two lists with the same amount of items.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-07-21 14:28:41 +02:00
Sebastien Boeuf
de3e003e3e virtio-devices: Handle virtio queues interrupts from transport layer
Instead of relying on the virtio-queue crate to store the information
about the MSI-X vectors for each queue, we handle this directly from the
PCI transport layer.

This is the first step in getting closer to the upstream version of
virtio-queue so that we can eventually move fully to the upstream
version.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-01-25 12:01:12 +01:00
Sebastien Boeuf
0162d73ed8 virtio-queue: Update crate based on latest rust-vmm/vm-virtio
This crate contains up to date definition of the Queue, AvailIter,
DescriptorChain and Descriptor structures forked from the upstream
crate rust-vmm/vm-virtio 27b18af01ee2d9564626e084a758a2b496d2c618.

The following patches have been applied on top of this base in order to
make it work correctly with Cloud Hypervisor requirements:

- Add MSI vector field to the Queue

  In order to help with MSI/MSI-X support, it is convenient to store the
  value of the interrupt vector inside the Queue directly.

- Handle address translations

  For devices with access to data in memory being translated, we add to
  the Queue the ability to translate the address stored in the
  descriptor.
  It is very helpful as it performs the translation right after the
  untranslated address is read from memory, avoiding any errors from
  happening from the consumer's crate perspective. It also allows the
  consumer to reduce greatly the amount of duplicated code for applying
  the translation in many different places.

- Add helpers for Queue structure

  They are meant to help crate's consumers getting/setting information
  about the Queue.

These patches can be found on the 'ch' branch from the Cloud Hypervisor
fork: https://github.com/cloud-hypervisor/vm-virtio.git

This patch takes care of updating the Cloud Hypervisor code in
virtio-devices and vm-virtio to build correctly with the latest version
of virtio-queue.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-01-06 10:02:40 +00:00
Sebastien Boeuf
0249e8641a Move Cloud Hypervisor to virtio-queue crate
Relying on the vm-virtio/virtio-queue crate from rust-vmm which has been
copied inside the Cloud Hypervisor tree, the entire codebase is moved to
the new definition of a Queue and other related structures.

The reason for this move is to follow the upstream until we get some
agreement for the patches that we need on top of that to make it
properly work with Cloud Hypervisor.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-22 11:38:55 +02:00
Rob Bradford
687d646c60 virtio-devices, vmm: Shutdown VMM on virtio thread panic
Shutdown the VMM in the virtio (or VMM side of vhost-user) thread
panics.

See: #3031

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-08 09:40:36 +01:00
Fazla Mehrab
98fc38c465 fuzz: fuzz testing for VHDx block device is added
The fuzzer needs to take a larger input for the whole disk image to
be most useful. Since the file is small we can test by reading and
writing over the whole file.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Fazla Mehrab <akm.fazla.mehrab@intel.com>
2021-08-19 11:43:19 +02:00
Bo Chen
2d2463ce04 fuzz: Move to the seccompiler crate
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Sebastien Boeuf
d278e9f39b fuzz: block: Test a RAW file instead QCOW
Instead of running the generic block fuzzer with QCOW, it's better to
use a RAW file since it's less complex and it will focus on virtqueues.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-11 08:55:54 -07:00