For vhost-user devices, memory should be shared between CLH and
vhost-user backend. However, madvise DONTNEED doesn't working in
this case. So, let's use fallocate PUNCH_HOLE to discard those
memory regions instead.
Signed-off-by: Li Hangjing <lihangjing@bytedance.com>
Since the slave request handler is common to all vhost-user devices, the
same way the reconnection is, it makes sense to handle the requests from
the backend through the same thread.
The reconnection thread now handles both a reconnection as well as any
request coming from the backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The control queue was missing rt_sigprocmask syscall, which was causing
a crash when the VM was shutdown.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We thought we could move the control queue to the backend as it was
making some good sense. Unfortunately, doing so was a wrong design
decision as it broke the compatibility with OVS-DPDK backend.
This is why this commit moves the control queue back to the VMM side,
meaning an additional thread is being run for handling the communication
with the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows the guest to reprogram the offload settings and mitigates
issues where the Linux kernel tries to reprogram the queues even when
the feature is not advertised.
Fixes: #2528
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Cleanup the control queue handling in preparation for supporting
alternative commands.
Note that this change does not make the MQ handling spec compliant.
According to the specification MQ should only be enabled once the number
of queue pairs the guest would like to use has been specified. The only
improvement towards the specication in this change is correct error
handling if the guest specifies an inappropriate number of queues (out
of range.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
On x86_64 architecture, multiple syscalls were missing when shutting
down the vhost-user-net device along with the VM. This was causing the
usual crash related to seccomp filters.
This commit adds these missing syscalls to fix the issue.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Create two functions for registering/unregistering DMA mapping handlers,
each handler being associated with a VFIO device.
Whenever the plugged_size is modified (which means triggered by the
virtio-mem driver in the guest), the virtio-mem backend is responsible
for updating the DMA mappings related to every VFIO device through the
handler previously provided.
It's important to update the map when the handler is either registered
or unregistered as well, as we don't want to miss some plugged memory
that would have been added before the VFIO device is added to the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In particular update for the vmm-sys-util upgrade and all the other
dependent packages. This requires an updated forked version of
kvm-bindings (due to updated vfio-ioctls) but allowed the removal of our
forked version of kvm-ioctls.
The changes to the API from kvm-ioctls and vmm-sys-util required some
other minor changes to the code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The main idea behind this commit is to remove all the complexity
associated with TX/RX handling for virtio-net. By using writev() and
readv() syscalls, we could get rid of intermediate buffers for both
queues.
The complexity regarding the TAP registration has been simplified as
well. The RX queue is only processed when some data are ready to be
read from TAP. The event related to the RX queue getting more
descriptors only serves the purpose to register the TAP file if it's not
already.
With all these simplifications, the code is more readable but more
performant as well. We can see an improvement of 10% for a single
queue device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If the function can never return an error this is now a clippy failure:
error: this function's return value is unnecessarily wrapped by `Result`
--> virtio-devices/src/watchdog.rs:215:5
|
215 | / fn set_state(&mut self, state: &WatchdogState) -> io::Result<()> {
216 | | self.common.avail_features = state.avail_features;
217 | | self.common.acked_features = state.acked_features;
218 | | // When restoring enable the watchdog if it was previously enabled. We reset the timer
... |
223 | | Ok(())
224 | | }
| |_____^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_wraps
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Using directly preadv and pwritev, we can simply use a RawFd instead of
a file, and we don't need to use the more complex implementation from
the qcow crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit adds the asynchronous support for fixed VHD disk files.
It introduces FixedVhd as a new ImageType, moving the image type
detection to the block_util crate (instead of qcow crate).
It creates a new vhd module in the block_util crate in order to handle
VHD footer, following the VHD specification.
It creates a new fixed_vhd_async module in the block_util crate to
implement the asynchronous version of fixed VHD disk file. It relies on
io_uring.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that BlockIoUring is the only implementation of virtio-block,
handling both synchronous and asynchronous backends based on the
AsyncIo trait, we can rename it to Block.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that both synchronous and asynchronous backends rely on the
asynchronous version of virtio-block (namely BlockIoUring), we can
get rid of the synchronous version (namely Block).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the synchronous QCOW file implementation present in the qcow
crate, we created a new qcow_sync module in block_util that ports this
synchronous implementation to the AsyncIo trait.
The point is to reuse virtio-blk asynchronous implementation for both
synchronous and asynchronous backends.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the synchronous RAW file implementation present in the qcow
crate, we created a new raw_sync module in block_util that ports this
synchronous implementation to the AsyncIo trait.
The point is to reuse virtio-blk asynchronous implementation for both
synchronous and asynchronous backends.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
On aarch64, the openat() syscall was missing from the seccomp filters
list, preventing the test_watchdog from running properly.
Fixes#2103
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This device operates a single virtq. When the driver offers a descriptor
to the device it is interpreted as a "ping" to indicate that the guest
is alive. A periodic timer fires and if when the timer is fired there
has not been a "ping" from the guest then the device will reset the VM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
While using the virtio-iommu device involving L2 scenario, and tearing
things down all the way from L2 back to L0 exposed some bad syscalls
that were not part of the authorized list.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The missing syscall rt_sigprocmask(2) was triggered for the musl build
upon rebooting the VM, and was causing the VM to be killed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By testing manually the memory resizing through virtio-mem, several
missing syscalls have been identified.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We observed CI instability for the past couple of days. This
instability is confirmed to be a result of incomplete seccomp
filters. Given the filter on 'virtio_vsock' is recently added and
is missing 'brk', it is likely to be the root cause of the
instability.
Signed-off-by: Bo Chen <chen.bo@intel.com>
"debug!" marco is used in virtio-devices/src/epoll_helper.rs. When"-vvv"
and "--log-file" option was specified, the missing "SYS_write" rule
caused a "bad system call" crash.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
From the experiments of running integration tests on my local machine,
auditd occationally reported the 'brk' syscall is needed for the
'virtio-rng' worker thread.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch adds the seccomp filter list for the virtio_net thread, while
the list was already added for the virtio_net_ctl thread.
Partially fixes: #925
Signed-off-by: Bo Chen <chen.bo@intel.com>
The current seccomp filter for virtio-net is actually for the worker
thread 'virtio_net_ctl' (not the actual worker thread
'virtio_net'). This patch introduces changes to distinguish those two
worker threads and seccomp filters.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The seccomp filters specific to the virtio-net threads must contain
dup() syscall now that we ported the epoll code to the EpollHelper.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The seccomp filters specific to the virtio-console thread must contain
dup syscall now that we ported the epoll code to the EpollHelper.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>