The vhost-user backend was always provided the maximum queue size but
this is incorrect. Instead it must be informed of the actual queue size
that has been negotiated with the virtio driver running in the guest.
This ensures proper functioning of vhost-user-block with the Rust
Hypervisor Firmware, which uses a hardcoded queue size of 16.
Partially fixes#4285
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The latest vhost-user specification describes VHOST_USER_RESET_OWNER
command as deprecated with the following explanation:
This is no longer used. Used to be sent to request disabling all
rings, but some back-ends interpreted it to also discard connection
state (this interpretation would lead to bugs). It is recommended that
back-ends either ignore this message, or use it to disable all rings.
Also, it's been observed that when using either Rust Hypervisor Firmware
or EDK2 OVMF firmware with SPDK (using the block device as the boot
disk), the virtio reset that happens when the firmware no longer needs
to access the block device caused a failure by triggering the command
VHOST_USER_RESET_OWNER.
For all these reasons, this patch simplifies the virtio reset
implementation by simply disabling the virtqueues and no longer calling
into VHOST_USER_RESET_OWNER.
Partially fixes#4285
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of relying on the virtio-queue crate to store the information
about the MSI-X vectors for each queue, we handle this directly from the
PCI transport layer.
This is the first step in getting closer to the upstream version of
virtio-queue so that we can eventually move fully to the upstream
version.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This crate contains up to date definition of the Queue, AvailIter,
DescriptorChain and Descriptor structures forked from the upstream
crate rust-vmm/vm-virtio 27b18af01ee2d9564626e084a758a2b496d2c618.
The following patches have been applied on top of this base in order to
make it work correctly with Cloud Hypervisor requirements:
- Add MSI vector field to the Queue
In order to help with MSI/MSI-X support, it is convenient to store the
value of the interrupt vector inside the Queue directly.
- Handle address translations
For devices with access to data in memory being translated, we add to
the Queue the ability to translate the address stored in the
descriptor.
It is very helpful as it performs the translation right after the
untranslated address is read from memory, avoiding any errors from
happening from the consumer's crate perspective. It also allows the
consumer to reduce greatly the amount of duplicated code for applying
the translation in many different places.
- Add helpers for Queue structure
They are meant to help crate's consumers getting/setting information
about the Queue.
These patches can be found on the 'ch' branch from the Cloud Hypervisor
fork: https://github.com/cloud-hypervisor/vm-virtio.git
This patch takes care of updating the Cloud Hypervisor code in
virtio-devices and vm-virtio to build correctly with the latest version
of virtio-queue.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the vm-virtio/virtio-queue crate from rust-vmm which has been
copied inside the Cloud Hypervisor tree, the entire codebase is moved to
the new definition of a Queue and other related structures.
The reason for this move is to follow the upstream until we get some
agreement for the patches that we need on top of that to make it
properly work with Cloud Hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This will be helpful to support the creation of a MemoryRangeTable from
virtio-mem, as it uses 2M pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introducing a new function to factorize a small part of the
initialization that is shared between a full reinitialization and a
restoration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It was incorrect to call Vec::from_raw_parts() on the address pointing
to the shared memory log region since Vec is a Rust specific structure
that doesn't directly translate into bytes. That's why we use the same
function from std::slice in order to create a proper slice out of the
memory region, which is then copied into a Vec.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding the common vhost-user code for starting logging dirty pages when
the migration is started, and its counterpart for stopping, as well as
the code in charge of retrieving the bitmap of the dirty pages that have
been logged.
All these functions are meant to be leveraged from vhost-user devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding a simple field `migration_support` to VhostUserHandle in order to
store the information about the device supporting migration or not. The
value of this flag depends on the feature set negotiated with the
backend. It's considered as supporting migration if VHOST_F_LOG_ALL is
present in the virtio features and if VHOST_USER_PROTOCOL_F_LOG_SHMFD is
present in the vhost-user protocol features.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
backend like SPDK required to know how many virt queues to be handled
before gets VHOST_USER_SET_INFLIGHT_FD message.
fix dpdk core dump while processing vhost_user_set_inflight_fd:
#0 0x00007fffef47c347 in vhost_user_set_inflight_fd (pdev=0x7fffe2895998, msg=0x7fffe28956f0, main_fd=545) at ../lib/librte_vhost/vhost_user.c:1570
#1 0x00007fffef47e7b9 in vhost_user_msg_handler (vid=0, fd=545) at ../lib/librte_vhost/vhost_user.c:2735
#2 0x00007fffef46bac0 in vhost_user_read_cb (connfd=545, dat=0x7fffdc0008c0, remove=0x7fffe2895a64) at ../lib/librte_vhost/socket.c:309
#3 0x00007fffef45b3f6 in fdset_event_dispatch (arg=0x7fffef6dc2e0 <vhost_user+8192>) at ../lib/librte_vhost/fd_man.c:286
#4 0x00007ffff09926f3 in rte_thread_init (arg=0x15ee180) at ../lib/librte_eal/common/eal_common_thread.c:175
Signed-off-by: Arafatms <arafatms@outlook.com>
The vhost-user-net backend needs to prepare all queues before enabling vring.
For example, DPVGW will report 'the RX queue can't find' error, if we enable
vring immediately after kicking it out.
Signed-off-by: Arafatms <arafatms@outlook.com>
This patch moves all vhost-user common functions behind a new structure
VhostUserHandle. There is no functional changes intended, the only goal
being to prepare for storing information through this new structure,
limiting the amount of parameters that are needed for each function.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This dependency bump needed some manual handling since the API changed
quite a lot regarding some RawFd being changed into either File or
AsRawFd traits.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that vhost crate allows the caller to set the header flags, we can
set NEED_REPLY whenever the REPLY_ACK protocol feature is supported from
both ends.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Vhost user INFLIGHT_SHMFD protocol feature supports inflight I/O
tracking, this commit implement the vhost-user device (master) support
of the feature. Till this commit, specific vhost-user devices (blk, fs,
or net) have not enable this feature.
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Add the support for reconnecting the backend request handler after a
disconnection/crash happened.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We should try to read the last avail index from the vring memory aera. This
is necessary when handling vhost-user socket reconnection.
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Function "GuestMemory::with_regions(_mut)" were mainly temporary methods
to access the regions in `GuestMemory` as the lack of iterator-based
access, and hence they are deprecated in the upstream vm-memory crate [1].
[1] https://github.com/rust-vmm/vm-memory/issues/133
Signed-off-by: Bo Chen <chen.bo@intel.com>
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.
The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.
Signed-off-by: Bo Chen <chen.bo@intel.com>
When in client mode, the VMM will retry connecting the backend for a
minute before giving up.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit implements the reconnection feature for vhost-user-net in
case the connection with the backend is shutdown.
The guest has no knowledge about what happens when a disconnection
occurs, which simplifies the code to handle the reconnection. The
reconnection happens with the backend, and the VMM goes through the
vhost-user setup once again.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The VIRTIO features should not be set before they are acked from the
guest. This code was only present to overcome a vhost crate limitation
that was expecting the VIRTIO features to be set before we could fetch
and set the protocol features.
The vhost crate has been recently fixed by removing the limitation,
therefore there's no need for this workaround in the Cloud Hypervisor
codebase anymore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Factorize the virtio features and vhost-user protocol features
negotiation through a common function that blk, fs and net
implementations can directly rely on.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Make sure the virtio features are set upon device activation. At the
time the device is activated, we know the guest acknowledged the
features, which mean it's safe to set them back to the backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio features are negotiated and set at the time the device is
created, hence there's no need to set the features again while going
through the vhost-user setup that is performed upon queue activation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Some refactoring is performed in order to always expect the irqfd to be
provided by VirtioInterrupt trait. In case no irqfd is available, we
simply fail initializing the vhost-user device. This allows for further
simplification since we can assume the interrupt will always be
triggered directly by the vhost-user backend without proxying through
the VMM. This allows for complete removal of the dedicated thread for
both block and net.
vhost-user-fs is a bit more complex as it requires the slave request
protocol feature in order to support DAX. That's why we still need the
VMM to interfere and therefore run a dedicated thread for it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Assuming vhost-user devices support CONFIGURE_MEM_SLOTS protocol
feature, we introduce a new method to the VirtioDevice trait in order to
update one single memory at a time.
In case CONFIGURE_MEM_SLOTS is not supported by the backend (feature not
acked), we fallback onto the current way of updating the memory
mappings, that is with SET_MEM_TABLE.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There is no need to get the vring base when resetting the vhost-user
device. This was mostly ignored, but in some cases, it was causing some
actual errors.
A reset must simply be a combination of disabling the vrings along with
the reset of the owner.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The vhost crate from rust-vmm is ready, which is why we do the switch
from the Cloud Hypervisor fork to the upstream crate.
At the same time, we rename the crate from vhost_rs to vhost.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that ExternalDmaMapping is defined in vm-device, let's use it from
there.
This commit also defines the function get_host_address_range() to move
away from the vfio-ioctls dependency.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Split the generic virtio code (queues and device type) from the
VirtioDevice trait, transport and device implementations.
This also simplifies the feature handling in vhost_user_backend as the
vm-virtio crate is no longer has any features.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>