Add new latency counters for virtio-block device, including
minimal latency, maximal latency, and average latency for block
read and write.
The average latency is calculated based on cumulative average.
Signed-off-by: Yong He <alexyonghe@tencent.com>
Rather than aggregate the completion list into an intermediate vector
instead adjust the API to provide one completion item at a time.
With DHAT this shows the number of heap allocations has decreased.
Before:
dhat: Total: 623,852 bytes in 8,157 blocks
After:
dhat: Total: 380,444 bytes in 3,469 blocks
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
During analysis of the asynchrous block I/O handling it was observed
that the majority of the time the completion events occur in the same
order as submissions. Further the maximum number of inflight requests
during the boot time is much lower than the size of the queue.
Through the use of a double ended queue (VecDequeue) with a reasonable
pre-allocation capacity we can have O(1) allocation free addition of
items to the list of inflight requests and mostly O(1) matching of
completed requests to submissions.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There is duplicated code when handlin queue events in handle_event()
refactor and introduce a new helper function.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no reason to carry a HashMap of SnapshotDataSection per
Snapshot. And given we now provide at most one SnapshotDataSection per
Snapshot, there's no need to keep the id part of the SnapshotDataSection
structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In particular update to latest linux-loader release and point to latest
vfio repository for both crates hosted there.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Following the new restore design, it is not appropriate to set every
virtio device threads into a paused state after they've been started.
This is why we remove the line of code pausing the devices only after
they've been restored, and replace it with a small patch in every virtio
device implementation. When a virtio device is created as part of a
restored VM, the associated "paused" boolean is set to true. This
ensures the corresponding thread will be directly parked when being
started, avoiding the thread to be in a different state than the one it
was on the source VM during the snapshot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Three functions are added:
* 'Tap::new_for_fuzzing()' a custom constructor that creates a dummy
`Tap` interface directly from `File` backed by Unix domain socket;
* 'Tap::mtu()' a custom function that returns hard-coded mtu;
* 'Net::wait_for_epoll_threads()'.
Two functions are reused with modifications to work with the dummy 'Tap'
interface:
* 'Net::new_with_tap()' is made public for fuzzing;
* 'Net::activate()' is modified to not call into 'Tap::set_offload()'
for fuzzing.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Creating a dedicated Result type for VirtioPciDevice, associated with
the new VirtioPciDeviceError enum. This allows for a clearer handling of
the errors generated through VirtioPciDevice::new().
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.
It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
TEST=Boot `--disk readonly=on` along with a guest that tries to write
(unmodified hypervisor-fw) and observe that the virtio device thread no
longer panics.
Fixes: #4888
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It's perfectly reasonable to expect if that some virtio threads trigger
libc behaviour that needs mprotect() that all virtio threads would do
the same.
Fixes: #4874
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It is required to close all file descriptors pointing to an opened TAP
device prior to reopening the TAP device; otherwise it will return
-EBUSY as the device can only be opened once (excluding MQ use cases.)
When rebooting the VM the virtio-net threads would still be running and
so the TAP file descriptor may not have been closed. To ensure that the
TAP FD is closed wait for all the epoll threads to exit after receiving the
KILL_EVENT.
Fixes: #4868
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
An integer overflow from our virtio-mem device can be triggered
from (misbehaved) guest driver with malicious requests. This patch
handles this integer overflow explicitly and treats it as an invalid
request.
Note: this bug was detected by our virtio-mem fuzzer through 'oss-fuzz'.
Signed-off-by: Bo Chen <chen.bo@intel.com>
To support all virtio-devices, this patch replaces the customized
EpollHelper::run` with customized `EpollHelper::run_with_timeout` for
fuzzing.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Since the processing of the console inputs was moved from the VMM thread
to the virtio-console thread (#3061), we have been using the 'FILE_EVENT'
to handle input from stdin/pty/file, which made 'INPUT_EVENT' obsoleted.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to prevent from going through
the restoration twice.
Here is the list of devices now following the new restore design:
- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Considering error messages will be mostly nested, ensuring no
punctuation at the end will make the error log more readable.
Signed-off-by: Bo Chen <chen.bo@intel.com>
With known number of queues and queue events, we can make each of them
more explicit and avoid using vector/direct indexing, which is cleaner
and slightly more efficient.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The the number of queues and associated events is known and fixed. We
can define and use each of them explicitly and avoid using vector (and
hence direct indexing), which is cleaner and slightly more efficient.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The the number of queues and associated events is known and fixed. We
can define and use each of them explicitly and avoid using vector (and
hence direct indexing), which is cleaner and slightly more efficient.
Also, this refactoring makes it clearer that we are not handling "event
queue" events (as "_event_queue" is not being used intentionally).
Signed-off-by: Bo Chen <chen.bo@intel.com>
In this way, the virtio-iommu code can properly report an error when
a wrong number of queues is provided, instead of triggering an
out-of-bound error.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Vdpa now implements the Migratable trait, which allows the device to be
added to the DeviceTree and therefore allows live migrating any vDPA
device that supports being suspended.
Given a vDPA device can't be resumed from a suspended state without
having to reset everything, we don't support pause/resume for a vDPA
device, as well as snapshot/restore (which requires resume to be
supported).
In order for the migration to work locally, reusing the same device on
the same host machine, the vhost-vdpa handler is dropped after the
snapshot has been performed, which allows the destination VM to open the
device without any conflict about the device being busy.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to anticipate for migration support, we need to be able to
create a Vdpa object without VhostKernVdpa object associated with it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When restoring a VM, the BAR type can be found directly from the
snapshot resources. It is more reliable than the previous method which
was using self.use_64bit_bar from VirtioPciDevice because at the time
the BARs are allocated, the VirtioDevice hasn't been restored yet,
meaning the way to determine the value of use_64bit_bar is wrong for a
device like vDPA. At this time, the device type is not known and relying
on the stored resources is the only reliable way.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The RNG device never reads from the guest memory it reads from a file
and writes to the guest memory.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Don't silently ignore the descriptors provided by the guest. This is
consistent with other devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With the virtio-rng device the descriptors that are provided by the
guest must be writable and of non-zero length. Also propagate an error
if writing to the guest memory fails.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Adjust MTU logic such that:
1. Apply an MTU to the TAP interface if the user supplies it
2. Always query the TAP interface for the MTU and expose that.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This function is for really for the transport layer to trigger a device
reset. Instead name it appropriately for the fuzzing specific use case.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new "mtu" parameter to the NetConfig structure and therefore to
the --net option. This allows Cloud Hypervisor's users to define the
Maximum Transmission Unit (MTU) they want to use for the network
interface that they create.
In details, there are two main aspects. On the one hand, the TAP
interface is created with the proper MTU if it is provided. And on the
other hand the guest is made aware of the MTU through the VIRTIO
configuration. That means the MTU is properly set on both the TAP on the
host and the network interface in the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need to delegate the resize operation to the virtio-mem
thread. This can come directly from the vmm thread which will use the
Mem object to update the VIRTIO configuration and trigger the interrupt
for the guest to be notified.
In order to achieve what's described above, the VirtioMemZone structure
now has a handle onto the Mem object directly. This avoids the need for
intermediate Resize and ResizeSender structures.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need to delegate the resize operation to the virtio-balloon
thread. This can come directly from the vmm thread which will use the
Balloon object to update the VIRTIO configuration and trigger the
interrupt for the guest to be notified.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Update the implementation of the process_queue() function to match all
other virtio devices implementations. This solves some issue related to
potential out-of-bound accesses to the former used_desc_heads list.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Multiple rust-vmm crates must be updated at once given the vm-memory one
has been updated and they all rely on vm-memory.
- vm-memory from 0.8.0 to 0.9.0
- vhost from 0.4.0 to 0.5.0
- virtio-queue from 0.5.0 to 0.6.0
- vhost-user-backend from 0.6.0 to 0.7.0
- linux-loader from 0.4.0 to 0.5.0
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It provides fuzzer a reliable way to wait for a sequence of events
to complete for virtio-devices while not using a fixed timeout to
maintain the full speed of fuzzing.
Take virtio-block as an example, the 'queue event' with a valid
available queue setup can trigger a 'completion event'. This is a
meaningful virtio-block code path of processing guest inputs which is
our target for fuzzing virtio devices.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Through multiple changes, this patch aims at providing a reliable
solution for detecting the state of the PTY's connection. Being able to
find out when the other end of the PTY is connected is essential to
prevent the loss of data being output through the PTY. When the PTY
isn't connected, the output is buffered through the SerialBuffer, the
same solution that was created for the serial port initially.
Fixes#4521
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extending and improving both the structure and the trait allows for more
flexibility regarding what can be achieved with the epoll loop. It
allows for a timeout to be configured instead of the default blocking
behavior. There is a new method in the trait to notify the caller that
the timeout has been reached. And there's a new knob to be notified with
the full list of events before the internal code will actually loop over
every event.
All of these new features are not affecting the previous behavior, and
using EpollHelper::run() should be unchanged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Reads from the random file may only be partial, e.g., if the random file is an ordinary text
file. When that happens, the device needs to signal to the driver that only parts of the buffer have
been overwritten.
Signed-off-by: Markus Napierkowski <markus.napierkowski@cyberus-technology.de>
Remove the use of 'unwrap()' that assumes the guest address for request
status is always valid, which avoid virtio-block thread panic on
malformed descriptors from the guest.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Some dependencies are not tracking the latest version in the .toml file
so update all dependencies to the latest version.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that we rely on pop_descriptor_chain() rather than iter() to iterate
over a queue, there's no more borrow on the queue itself, meaning we can
invoke add_used() directly for the iteration loop. This simplifies the
processing of the queues for each virtio device, and bring some possible
performance improvement given we don't have to iterate twice over the
list of descriptors to invoke add_used().
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Using pop_descriptor_chain() is much more appropriate than iter() since
it recreates the iterator every time, avoiding the queue to be borrowed
and allowing the virtio-net implementation to match all the other ones.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The new virtio-queue version introduced some breaking changes which need
to be addressed so that Cloud Hypervisor can still work with this
version.
The most important change is about removing a handle to the guest memory
from the Queue, meaning the caller has to provide the guest memory
handle for multiple methods from the QueueT trait.
One interesting aspect is that QueueT has been widely extended to
provide every getter and setter we need to access and update the Queue
structure without having direct access to its internal fields.
This patch ports all the virtio and vhost-user devices to this new crate
definition. It also updates both vhost-user-block and vhost-user-net
backends based on the updated vhost-user-backend crate. It also updates
the fuzz directory.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever a virtio reset happens, the vhost-user backend should be
notified that the vring should be stopped. This is performed by calling
GET_VRING_BASE on the appropriate queue indexes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Rather than relying on the amount of queues to enable or disable the
queue that have been activated, we rely on the actual queue indexes
provided through the tuple including the queue index, the Queue and the
EventFd. By storing the list of indexes, we simplify the code and also
make it more accurate in case some queues aren't activated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of passing separately a list of Queues and the equivalent list
of EventFds, we consolidate these two through a tuple along with the
queue index.
The queue index can be very useful if looking for the actual index
related to the queue, no matter if other queues have been enabled or
not.
It's also convenient to have the EventFd associated with the Queue so
that we don't have to carry two lists with the same amount of items.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When preparing the activator, we must provide the correct queue index to
clone the right EventFd associated with the queue.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's not mandatory for the virtio-fs driver to enable all virtqueues
provided by the backend since all it needs is one request queue to work
correctly. Therefore we lower the minimal amount of enabled queues to 1.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
And along with virtio-queue, we must also bump vhost-user-backend from
0.3.0 to 0.5.0 (since it relies on virtio-queue 0.4.0).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The vhost-user backend was always provided the maximum queue size but
this is incorrect. Instead it must be informed of the actual queue size
that has been negotiated with the virtio driver running in the guest.
This ensures proper functioning of vhost-user-block with the Rust
Hypervisor Firmware, which uses a hardcoded queue size of 16.
Partially fixes#4285
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The latest vhost-user specification describes VHOST_USER_RESET_OWNER
command as deprecated with the following explanation:
This is no longer used. Used to be sent to request disabling all
rings, but some back-ends interpreted it to also discard connection
state (this interpretation would lead to bugs). It is recommended that
back-ends either ignore this message, or use it to disable all rings.
Also, it's been observed that when using either Rust Hypervisor Firmware
or EDK2 OVMF firmware with SPDK (using the block device as the boot
disk), the virtio reset that happens when the firmware no longer needs
to access the block device caused a failure by triggering the command
VHOST_USER_RESET_OWNER.
For all these reasons, this patch simplifies the virtio reset
implementation by simply disabling the virtqueues and no longer calling
into VHOST_USER_RESET_OWNER.
Partially fixes#4285
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This check is new in the beta version of clippy and exists to avoid
potential deadlocks by highlighting when the test in an if or for loop
is something that holds a lock. In many cases we would need to make
significant refactorings to be able to pass this check so disable in the
affected crates.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: accessing first element with `data.get(0)`
--> virtio-devices/src/transport/pci_device.rs:1055:34
|
1055 | if let Some(v) = data.get(0) {
| ^^^^^^^^^^^ help: try: `data.first()`
|
= note: `#[warn(clippy::get_first)]` on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#get_first
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
warning: you are deriving `PartialEq` and can implement `Eq`
--> vmm/src/serial_manager.rs:59:30
|
59 | #[derive(Debug, Clone, Copy, PartialEq)]
| ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to ensure that the virtio device thread is spawned from the vmm
thread we use an asynchronous activation mechanism for the virtio
devices. This change optimises that code so that we do not need to
iterate through all virtio devices on the platform in order to find the
one that requires activation. We solve this by creating a separate short
lived VirtioPciDeviceActivator that holds the required state for the
activation (e.g. the clones of the queues) this can then be stored onto
the device manager ready for asynchronous activation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>