The cumulative average formula [1] requires to use signed integers
for proper calculations, while calculated result (e.g. cumulative
average) is always positive. This patch reflects the above requirements
in our code.
[1] https://en.wikipedia.org/wiki/Moving_average#Cumulative_averageFixes: #5745
Signed-off-by: Bo Chen <chen.bo@intel.com>
There is a "LATENCY_SCALE" being used for calculating cumulative average
latency, so it should also be used for the latency of the first op.
See: #5712
Signed-off-by: Bo Chen <chen.bo@intel.com>
Update to the latest vm-memory and all the crates that also depend upon
it.
Fix some deprecation warnings.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Logically until we have handled the first operation the latency is
infinite; this logic was applied to the minimum latency originally but
this patch extends that logic to the maximum and average latency.
To prevent the initial average latency being skewed by the inclusion of
infinity the average value is initally seeded with the first measured
latency.
Fixes: #5704
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Since kernel v6.3 the vsock packet is not split over two descriptors
and is instead included in a single one.
This change is based on the discovery and fix identified by Stefano
Garzarella for the vm-virtio vsock implementation and adapted for our
very different codebase.
Fixes: #5691
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
warning: this argument is a mutable reference, but not used mutably
--> virtio-devices/src/transport/pci_common_config.rs💯17
|
100 | queues: &mut [Queue],
| ^^^^^^^^^^^^ help: consider changing to: `&[Queue]`
|
= warning: changing this function will impact semver compatibility
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_ref_mut
= note: `#[warn(clippy::needless_pass_by_ref_mut)]` on by default
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Similar to balloon inflation, memory allocation is also constrained to
align with the page size. Therefore, memory is allocated in units of the
host page size, one page at a time, until all host pages that the memory
range requested by the guest are managed. If the requested size is
smaller than the page size, the entire page will still be allocated
because smaller allocations are not possible due to the page size
limitation.
Fixes: cloud-hypervisor#5369
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Currently, virtio-balloon can't work well with page size other than 4k.
The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
we can only actually discard memory in units of the host page size.
We get some idea from [1] to solve this issue.
What has been done in this commit:
For balloon inflation:
A bitmap is employed to track the memory range to be released in 4k
granularity. Once it accumulates to one host page size, the corresponding
page is released, and the bitmap is cleared to handle the next record.
This process continues until all the memory range is managed. Memory will
only be released when a consecutive set of balloon request entries from
the same host page reaches the full host page size. If a balloon request
entry from a different host page is encountered, the bitmap and the base
host page address will be reset. Consequently, memory is released in
units of the page size, ensuring efficient memory management. That's say
if memory range length to be released smaller than page size or if the
guest scatters requests each of whose size is smaller than page size
across different host pages no memory will be released.
[1] https://patchwork.kernel.org/project/qemu-devel/patch/20190214043916.22128-6-david@gibson.dropbear.id.au/
Fixes: cloud-hypervisor#5369
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
This commit merges crates `qcow`, `vhdx` and `block_util` into the
crate `block`, which can allow `qcow` to use functions from `block_util`
without introducing a circular crate dependency.
This commit is based on crosvm implementation:
f2eecc4152
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
warning: usage of `Arc<T>` where `T` is not `Send` or `Sync`
--> virtio-devices/src/vsock/device.rs:376:22
|
376 | backend: Arc::new(RwLock::new(backend)),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: consider using `Rc<T>` instead or wrapping `T` in a std::sync type like `Mutex<T>`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#arc_with_non_send_sync
= note: `#[warn(clippy::arc_with_non_send_sync)]` on by default
The vsock backend may be shared between threads, so the type `B` in
`Vsock` should be `VsockBackend` and `Sync`.
Considering that `api_receiver` and `gdb_receiver` are only used in vmm
threads, the `Arc` can be replaced by `Rc`.
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
warning: useless use of `vec!`
--> test_infra/src/lib.rs:111:30
|
111 | let mut events = vec![epoll::Event::new(epoll::Events::empty(), 0); 1];
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: you can use an array directly: `[epoll::Event::new(epoll::Events::empty(), 0); 1]`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_vec
= note: `#[warn(clippy::useless_vec)]` on by default
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
warning: casting raw pointers to the same type and constness is unnecessary (`*const protocol::MemoryRange` -> `*const protocol::MemoryRange`)
--> vm-migration/src/protocol.rs:280:17
|
280 | self.data.as_ptr() as *const MemoryRange as *const u8,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `self.data.as_ptr()`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_cast
= note: `#[warn(clippy::unnecessary_cast)]` on by default
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
This gives users the chance to reduce the number of dependencies
included, which is generally good practice and also reduces code size.
Furthermore, `io_uring` specifically is a strong contender for something
one may wish to disable due to the syscall API's many security issues[1]
[1]: https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html
Signed-off-by: Manish Goregaokar <manishsmail@gmail.com>
Remove "enum_variant_names" clippy. Enumeration variant names should
specify their variant, not repeat the enumeration name.
Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
SerialBuffer uses VecDeque::extend, which calls realloc, which a
maximum buffer size of 1 MiB. Starting at allocation sizes of
128 KiB, musl's mallocng allocator will use mremap for the allocation.
Since this was not permitted by the seccomp rules, heavy write load
could crash cloud-hypervisor with a seccomp failure. (Encountered
using virtio-console, but I don't see any reason it wouldn't happen
for the legacy serial device too.)
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Bump to the latest rust-vmm crates, including vm-memory, vfio,
vfio-bindings, vfio-user, virtio-bindings, virtio-queue, linux-loader,
vhost, and vhost-user-backend,
Signed-off-by: Bo Chen <chen.bo@intel.com>
Don't import via glob to avoid (unused) objects colliding in the
namespace. This fixes a beta clippy issue.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Cloud Hypervisor's vhost-user implementation will reconnect if it gets
disconnected from the backend. That means connections happen inside
the vhost-user seccomp sandbox, so all syscalls used in reconnecting
have to be allowed in that sandbox.
clock_nanosleep is used by Glibc, and nanosleep is used by musl.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Doc comments are Markdown, and can include HTML tags. Anything in
angle brackets will therefore be inserted as an HTML tag into
rustdoc's output. If that's not intentional, the left angle bracket
needs to be escaped.
I haven't fixed the doc comments in src/main.rs, because argh doesn't
understand the escaping, so the backslashes would show up in the
--help output. I've opened https://github.com/google/argh/issues/159
about that.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
These need to be //! comments, because they apply to the module as a
whole, not to whatever directly follows the comment. Using ///
comments here resulted in documentation being attached to the wrong
thing, or not rendered at all.
I've also checked the Markdown formatting of these comments as
rendered by rustdoc, and fixed it where appropriate.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
This change is important to do a proper resource cleanup. We decided
to do this repetitive approach as VirtioCommon can't implement Drop
without major changes to the corresponding code. Also, devices such as
Net can't easily use the epoll_threads-abstraction from VirtioCommon as
it has multiple threads with different semantics.
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Add new configuration for offloading features, including
Checksum/TSO/UFO, and set these offloading features as
enabled by default.
Fixes: #4792.
Signed-off-by: Yong He <alexyonghe@tencent.com>
Add new latency counters for virtio-block device, including
minimal latency, maximal latency, and average latency for block
read and write.
The average latency is calculated based on cumulative average.
Signed-off-by: Yong He <alexyonghe@tencent.com>
Rather than aggregate the completion list into an intermediate vector
instead adjust the API to provide one completion item at a time.
With DHAT this shows the number of heap allocations has decreased.
Before:
dhat: Total: 623,852 bytes in 8,157 blocks
After:
dhat: Total: 380,444 bytes in 3,469 blocks
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
During analysis of the asynchrous block I/O handling it was observed
that the majority of the time the completion events occur in the same
order as submissions. Further the maximum number of inflight requests
during the boot time is much lower than the size of the queue.
Through the use of a double ended queue (VecDequeue) with a reasonable
pre-allocation capacity we can have O(1) allocation free addition of
items to the list of inflight requests and mostly O(1) matching of
completed requests to submissions.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There is duplicated code when handlin queue events in handle_event()
refactor and introduce a new helper function.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no reason to carry a HashMap of SnapshotDataSection per
Snapshot. And given we now provide at most one SnapshotDataSection per
Snapshot, there's no need to keep the id part of the SnapshotDataSection
structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In particular update to latest linux-loader release and point to latest
vfio repository for both crates hosted there.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Following the new restore design, it is not appropriate to set every
virtio device threads into a paused state after they've been started.
This is why we remove the line of code pausing the devices only after
they've been restored, and replace it with a small patch in every virtio
device implementation. When a virtio device is created as part of a
restored VM, the associated "paused" boolean is set to true. This
ensures the corresponding thread will be directly parked when being
started, avoiding the thread to be in a different state than the one it
was on the source VM during the snapshot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Three functions are added:
* 'Tap::new_for_fuzzing()' a custom constructor that creates a dummy
`Tap` interface directly from `File` backed by Unix domain socket;
* 'Tap::mtu()' a custom function that returns hard-coded mtu;
* 'Net::wait_for_epoll_threads()'.
Two functions are reused with modifications to work with the dummy 'Tap'
interface:
* 'Net::new_with_tap()' is made public for fuzzing;
* 'Net::activate()' is modified to not call into 'Tap::set_offload()'
for fuzzing.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Creating a dedicated Result type for VirtioPciDevice, associated with
the new VirtioPciDeviceError enum. This allows for a clearer handling of
the errors generated through VirtioPciDevice::new().
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.
It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
TEST=Boot `--disk readonly=on` along with a guest that tries to write
(unmodified hypervisor-fw) and observe that the virtio device thread no
longer panics.
Fixes: #4888
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It's perfectly reasonable to expect if that some virtio threads trigger
libc behaviour that needs mprotect() that all virtio threads would do
the same.
Fixes: #4874
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It is required to close all file descriptors pointing to an opened TAP
device prior to reopening the TAP device; otherwise it will return
-EBUSY as the device can only be opened once (excluding MQ use cases.)
When rebooting the VM the virtio-net threads would still be running and
so the TAP file descriptor may not have been closed. To ensure that the
TAP FD is closed wait for all the epoll threads to exit after receiving the
KILL_EVENT.
Fixes: #4868
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
An integer overflow from our virtio-mem device can be triggered
from (misbehaved) guest driver with malicious requests. This patch
handles this integer overflow explicitly and treats it as an invalid
request.
Note: this bug was detected by our virtio-mem fuzzer through 'oss-fuzz'.
Signed-off-by: Bo Chen <chen.bo@intel.com>
To support all virtio-devices, this patch replaces the customized
EpollHelper::run` with customized `EpollHelper::run_with_timeout` for
fuzzing.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Since the processing of the console inputs was moved from the VMM thread
to the virtio-console thread (#3061), we have been using the 'FILE_EVENT'
to handle input from stdin/pty/file, which made 'INPUT_EVENT' obsoleted.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.
Here is the list of devices now following the new restore design:
- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Considering error messages will be mostly nested, ensuring no
punctuation at the end will make the error log more readable.
Signed-off-by: Bo Chen <chen.bo@intel.com>
With known number of queues and queue events, we can make each of them
more explicit and avoid using vector/direct indexing, which is cleaner
and slightly more efficient.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The the number of queues and associated events is known and fixed. We
can define and use each of them explicitly and avoid using vector (and
hence direct indexing), which is cleaner and slightly more efficient.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The the number of queues and associated events is known and fixed. We
can define and use each of them explicitly and avoid using vector (and
hence direct indexing), which is cleaner and slightly more efficient.
Also, this refactoring makes it clearer that we are not handling "event
queue" events (as "_event_queue" is not being used intentionally).
Signed-off-by: Bo Chen <chen.bo@intel.com>
In this way, the virtio-iommu code can properly report an error when
a wrong number of queues is provided, instead of triggering an
out-of-bound error.
Signed-off-by: Bo Chen <chen.bo@intel.com>