The new virtio-queue version introduced some breaking changes which need
to be addressed so that Cloud Hypervisor can still work with this
version.
The most important change is about removing a handle to the guest memory
from the Queue, meaning the caller has to provide the guest memory
handle for multiple methods from the QueueT trait.
One interesting aspect is that QueueT has been widely extended to
provide every getter and setter we need to access and update the Queue
structure without having direct access to its internal fields.
This patch ports all the virtio and vhost-user devices to this new crate
definition. It also updates both vhost-user-block and vhost-user-net
backends based on the updated vhost-user-backend crate. It also updates
the fuzz directory.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Rather than relying on the amount of queues to enable or disable the
queue that have been activated, we rely on the actual queue indexes
provided through the tuple including the queue index, the Queue and the
EventFd. By storing the list of indexes, we simplify the code and also
make it more accurate in case some queues aren't activated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of passing separately a list of Queues and the equivalent list
of EventFds, we consolidate these two through a tuple along with the
queue index.
The queue index can be very useful if looking for the actual index
related to the queue, no matter if other queues have been enabled or
not.
It's also convenient to have the EventFd associated with the Queue so
that we don't have to carry two lists with the same amount of items.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By enabling the VIRTIO feature VIRTIO_F_IOMMU_PLATFORM for all
vhost-user devices when needed, we force the guest to use the DMA API,
making these devices compatible with TDX. By using DMA API, the guest
triggers the TDX codepath to share some of the guest memory, in
particular the virtqueues and associated buffers so that the VMM and
vhost-user backends/processes can access this memory.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
For vhost-user devices, we don't want to loose the vhost-user protocol
feature through the negotiation between guest and device. Since we know
VIRTIO has no knowledge of the vhost-user protocol feature, there is no
way it would ever be acknowledged by the guest. For that reason, we
create each vhost-user device with the set of acked features containing
the vhost-user protocol feature is this one was part of the available
list.
Having the set of acked features containing this bit allows for solving
a bug that was happening through the migration process since the
vhost-user protocol feature wasn't explicitely enabled.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to clearly decouple when the migration is started compared to
when the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we don't end up using
start_dirty_log() for identifying when the migration has been started.
And similarly, we rely on the already existing complete_migration()
method to know when the migration has been ended.
A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was because the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since we're trying to move away from the translation happening in the
virtio-queue crate, the device itself is performing the address
translation when needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the vm-virtio/virtio-queue crate from rust-vmm which has been
copied inside the Cloud Hypervisor tree, the entire codebase is moved to
the new definition of a Queue and other related structures.
The reason for this move is to follow the upstream until we get some
agreement for the patches that we need on top of that to make it
properly work with Cloud Hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce a common solution for spawning the virtio threads which will
make it easier to add the panic handling.
During this effort I discovered that there were no seccomp filters
registered for the vhost-user-net thread nor the vhost-user-block
thread. This change also incorporates basic seccomp filters for those as
part of the refactoring.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We are relying on applying empty 'seccomp' filters to support the
'--seccomp false' option, which will be treated as an error with the
updated 'seccompiler' crate. This patch fixes this issue by explicitly
checking whether the 'seccomp' filter is empty before applying the
filter.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Introducing a new structure VhostUserCommon allowing to factorize a lot
of the code shared between the vhost-user devices (block, fs and net).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We cannot let vhost-user devices connect to the backend when the Block,
Fs or Net object is being created during a restore/migration. The reason
is we can't have two VMs (source and destination) connected to the same
backend at the same time. That's why we must delay the connection with
the vhost-user backend until the restoration is performed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to prevent the vhost-user devices from reconnecting to the
backend after the migration has been successfully performed, we make
sure to kill the thread in charge of handling the reconnection
mechanism.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
During a migration, the vhost-user device talks to the backend to
retrieve the dirty pages. Once done with this, a snapshot will be taken,
meaning there's no need to communicate with the backend anymore. Closing
the communication is needed to let the destination VM being able to
connect to the same backend.
That's why we shutdown the communication with the backend in case a
migration has been started and we're asked for a snapshot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This anticipates the need for creating a new Blk, Fs or Net object
without having performed the connection with the vhost-user backend yet.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the common vhost-user code can handle logging dirty pages
through shared memory, we need to advertise it to the vhost-user
backends with the protocol feature VHOST_USER_PROTOCOL_F_LOG_SHMFD.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding the support for snapshot/restore feature for all supported
vhost-user devices.
The complexity of vhost-user-fs device makes it only partially
compatible with the feature. When using the DAX feature, there's no way
to store and remap what was previously mapped in the DAX region. And
when not using the cache region, if the filesystem is mounted, it fails
to be properly restored as this would require a special command to let
the backend know that it must remount what was already mounted before.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch moves all vhost-user common functions behind a new structure
VhostUserHandle. There is no functional changes intended, the only goal
being to prepare for storing information through this new structure,
limiting the amount of parameters that are needed for each function.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the reconnection thread took on the responsibility to handle
backend initiated requests as well, the variable naming should reflect
this by avoiding the "reconnect" prefix.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Vhost user INFLIGHT_SHMFD protocol feature supports inflight I/O
tracking, this commit implement the vhost-user device (master) support
of the feature. Till this commit, specific vhost-user devices (blk, fs,
or net) have not enable this feature.
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Add the support for reconnecting the backend request handler after a
disconnection/crash happened.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the slave request handler is common to all vhost-user devices, the
same way the reconnection is, it makes sense to handle the requests from
the backend through the same thread.
The reconnection thread now handles both a reconnection as well as any
request coming from the backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.
The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Add a helper to VirtioCommon which returns duplicates of the EventFds
for kill and pause event.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The reconnection code is moved to the vhost-user module as it is a
common place to be shared across all vhost-user devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit implements the reconnection feature for vhost-user-net in
case the connection with the backend is shutdown.
The guest has no knowledge about what happens when a disconnection
occurs, which simplifies the code to handle the reconnection. The
reconnection happens with the backend, and the VMM goes through the
vhost-user setup once again.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We thought we could move the control queue to the backend as it was
making some good sense. Unfortunately, doing so was a wrong design
decision as it broke the compatibility with OVS-DPDK backend.
This is why this commit moves the control queue back to the VMM side,
meaning an additional thread is being run for handling the communication
with the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A lot of the VIRTIO reserved features should be supported or not by the
vhost-user backend. That means on the VMM side, these features should be
available, so that they don't get lost during the negotiation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving helpers to the net_util crate since we don't want virtio-net
common code to be split between two places. The net_util crate should be
the only place to host virtio-net common code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Factorize the virtio features and vhost-user protocol features
negotiation through a common function that blk, fs and net
implementations can directly rely on.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Make sure the virtio features are set upon device activation. At the
time the device is activated, we know the guest acknowledged the
features, which mean it's safe to set them back to the backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio features are negotiated and set at the time the device is
created, hence there's no need to set the features again while going
through the vhost-user setup that is performed upon queue activation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the control queue is correctly handled by the backend, there's
no need to handle it as well from the VMM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Some refactoring is performed in order to always expect the irqfd to be
provided by VirtioInterrupt trait. In case no irqfd is available, we
simply fail initializing the vhost-user device. This allows for further
simplification since we can assume the interrupt will always be
triggered directly by the vhost-user backend without proxying through
the VMM. This allows for complete removal of the dedicated thread for
both block and net.
vhost-user-fs is a bit more complex as it requires the slave request
protocol feature in order to support DAX. That's why we still need the
VMM to interfere and therefore run a dedicated thread for it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding the support for an OVS vhost-user backend to connect as the
vhost-user client. This means we introduce with this patch a new
option to our `--net` parameter. This option is called 'server' in order
to ask the VMM to run as the server for the vhost-user socket.
Fixes#1745
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows the guest to reprogram the offload settings and mitigates
issues where the Linux kernel tries to reprogram the queues even when
the feature is not advertised.
Fixes: #2528
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Cleanup the control queue handling in preparation for supporting
alternative commands.
Note that this change does not make the MQ handling spec compliant.
According to the specification MQ should only be enabled once the number
of queue pairs the guest would like to use has been specified. The only
improvement towards the specication in this change is correct error
handling if the guest specifies an inappropriate number of queues (out
of range.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>