Following the new restore design, it is not appropriate to set every
virtio device thread into a paused state after it has been started.
This is why we remove the line of code that paused the devices only after
they had been restored, and replace it with a small patch in every virtio
device implementation. When a virtio device is created as part of a
restored VM, the associated "paused" boolean is set to true. This
ensures the corresponding thread is parked as soon as it is started,
which prevents the thread from being in a different state than the one
it was in on the source VM when the snapshot was taken.
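A minimal sketch of the "start paused" idea, assuming a shared atomic flag
and std thread parking. DeviceSketch, WorkerState, activate() and resume()
are hypothetical names for illustration, not the actual virtio-devices code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::Sender;
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// States reported by the hypothetical worker thread.
#[derive(Debug, PartialEq)]
pub enum WorkerState {
    Parked,
    Running,
}

/// Hypothetical stand-in for a virtio device's common state.
pub struct DeviceSketch {
    paused: Arc<AtomicBool>,
}

impl DeviceSketch {
    /// A device created as part of a restored VM starts out paused.
    pub fn new(restoring: bool) -> Self {
        DeviceSketch {
            paused: Arc::new(AtomicBool::new(restoring)),
        }
    }

    /// Spawns the worker. If `paused` is already set, the thread parks
    /// before doing any work, matching the state on the source VM.
    pub fn activate(&self, tx: Sender<WorkerState>) -> JoinHandle<()> {
        let paused = Arc::clone(&self.paused);
        thread::spawn(move || {
            if paused.load(Ordering::SeqCst) {
                tx.send(WorkerState::Parked).unwrap();
                // Loop because park() may return spuriously.
                while paused.load(Ordering::SeqCst) {
                    thread::park();
                }
            }
            tx.send(WorkerState::Running).unwrap();
        })
    }

    /// Clears the flag and wakes the worker.
    pub fn resume(&self, worker: &JoinHandle<()>) {
        self.paused.store(false, Ordering::SeqCst);
        worker.thread().unpark();
    }
}
```

If resume() races ahead of the worker reaching park(), the unpark token
makes the next park() return immediately, so the worker still proceeds.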
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation that every
device provided has been removed to avoid going through the restoration
twice.
Here is the list of devices now following the new restore design:
- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)
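A minimal sketch of passing an optional state to the constructor instead of
calling restore() afterwards. RngSketch, RngStateSketch and
DEFAULT_FEATURES are hypothetical; only VIRTIO_F_VERSION_1 being feature
bit 32 comes from the VIRTIO specification.

```rust
/// Hypothetical snapshot state; the real device state structs differ.
#[derive(Clone, Debug, PartialEq)]
pub struct RngStateSketch {
    pub avail_features: u64,
    pub acked_features: u64,
}

pub struct RngSketch {
    pub avail_features: u64,
    pub acked_features: u64,
    pub paused: bool,
}

/// Hypothetical default feature set: VIRTIO_F_VERSION_1 (bit 32).
const DEFAULT_FEATURES: u64 = 1 << 32;

impl RngSketch {
    /// The device is built directly in its restored form, so the
    /// restoration only happens once.
    pub fn new(state: Option<RngStateSketch>) -> Self {
        match state {
            Some(s) => RngSketch {
                avail_features: s.avail_features,
                acked_features: s.acked_features,
                paused: true, // restored devices start paused
            },
            None => RngSketch {
                avail_features: DEFAULT_FEATURES,
                acked_features: 0,
                paused: false,
            },
        }
    }
}
```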
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The new virtio-queue version introduced some breaking changes which need
to be addressed so that Cloud Hypervisor can still work with this
version.
The most important change is the removal of the guest memory handle
from the Queue, meaning the caller has to provide the guest memory
handle to several methods of the QueueT trait.
One interesting aspect is that QueueT has been widely extended to
provide every getter and setter we need to access and update the Queue
structure without having direct access to its internal fields.
This patch ports all the virtio and vhost-user devices to this new crate
definition. It also updates both vhost-user-block and vhost-user-net
backends based on the updated vhost-user-backend crate. It also updates
the fuzz directory.
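A minimal sketch of a queue that no longer owns a guest memory handle and
instead receives it per call, with getters/setters replacing direct field
access. GuestMemorySketch, VecMemory and QueueSketch are hypothetical
stand-ins, not the virtio-queue or vm-memory APIs. The only layout detail
used is that the available ring starts with a u16 flags field followed by
the u16 idx field.

```rust
/// Hypothetical minimal guest-memory abstraction.
pub trait GuestMemorySketch {
    fn read_u16(&self, addr: u64) -> Option<u16>;
}

/// Guest memory backed by a plain byte vector, for illustration only.
pub struct VecMemory(pub Vec<u8>);

impl GuestMemorySketch for VecMemory {
    fn read_u16(&self, addr: u64) -> Option<u16> {
        let start = usize::try_from(addr).ok()?;
        let bytes = self.0.get(start..start.checked_add(2)?)?;
        Some(u16::from_le_bytes([bytes[0], bytes[1]]))
    }
}

/// The queue stores only its own state, with no guest memory handle.
#[derive(Default)]
pub struct QueueSketch {
    size: u16,
    avail_ring: u64,
    next_avail: u16,
}

impl QueueSketch {
    // Getters/setters instead of direct access to internal fields.
    pub fn size(&self) -> u16 {
        self.size
    }
    pub fn set_size(&mut self, size: u16) {
        self.size = size;
    }
    pub fn set_avail_ring_address(&mut self, addr: u64) {
        self.avail_ring = addr;
    }
    pub fn set_next_avail(&mut self, idx: u16) {
        self.next_avail = idx;
    }

    /// Entries made available by the driver but not yet consumed.
    /// The caller supplies the guest memory for this access.
    pub fn pending<M: GuestMemorySketch>(&self, mem: &M) -> Option<u16> {
        let avail_idx = mem.read_u16(self.avail_ring.checked_add(2)?)?;
        Some(avail_idx.wrapping_sub(self.next_avail))
    }
}
```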
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Rather than relying on the number of queues to enable or disable the
queues that have been activated, we rely on the actual queue indexes
provided through the tuple that includes the queue index, the Queue and
the EventFd. Storing the list of indexes simplifies the code and also
makes it more accurate in case some queues aren't activated.
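A minimal sketch of toggling vrings by their stored indexes rather than by
iterating over 0..num_queues. BackendSketch, VhostUserSketch and
set_vring_enable() here are hypothetical stand-ins for the real vhost-user
types.

```rust
/// Hypothetical backend recording which vrings were toggled.
#[derive(Default)]
pub struct BackendSketch {
    pub calls: Vec<(usize, bool)>,
}

impl BackendSketch {
    pub fn set_vring_enable(&mut self, index: usize, enable: bool) {
        self.calls.push((index, enable));
    }
}

pub struct VhostUserSketch {
    backend: BackendSketch,
    /// Indexes of the queues that were actually activated.
    queue_indexes: Vec<usize>,
}

impl VhostUserSketch {
    pub fn new(queue_indexes: Vec<usize>) -> Self {
        VhostUserSketch {
            backend: BackendSketch::default(),
            queue_indexes,
        }
    }

    /// Toggle exactly the activated queues, so a gap (e.g. queue 1 left
    /// disabled by the driver) is respected.
    pub fn enable_vrings(&mut self, enable: bool) {
        for &index in &self.queue_indexes {
            self.backend.set_vring_enable(index, enable);
        }
    }

    pub fn calls(&self) -> &[(usize, bool)] {
        &self.backend.calls
    }
}
```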
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of passing a list of Queues and the equivalent list of EventFds
separately, we consolidate them into a tuple along with the queue index.
The queue index is useful for looking up the actual index of the queue,
regardless of whether other queues have been enabled or not.
It's also convenient to have the EventFd associated with the Queue so
that we don't have to carry two lists with the same number of items.
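A minimal sketch of the consolidated activation argument, one
(index, Queue, EventFd) tuple per activated queue. QueueStub, EventFdStub
and activate() are hypothetical placeholders for the real Queue, EventFd
and device activation code.

```rust
/// Hypothetical stand-ins for the real Queue and EventFd types.
pub struct QueueStub {
    pub size: u16,
}
pub struct EventFdStub {
    pub id: u32,
}

/// Each queue travels with its index and its notifier, so the two lists
/// can't get out of sync and gaps in the enabled set stay visible.
pub fn activate(queues: Vec<(usize, QueueStub, EventFdStub)>) -> Vec<usize> {
    let mut indexes = Vec::with_capacity(queues.len());
    for (index, queue, evt) in queues {
        // A real device would hand `queue` and `evt` to its worker here.
        println!("queue {index}: size {}, event {}", queue.size, evt.id);
        indexes.push(index);
    }
    indexes
}
```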
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's not mandatory for the virtio-fs driver to enable all virtqueues
provided by the backend since all it needs to work correctly is one
request queue. Therefore we lower the minimum number of enabled queues
to 1.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By enabling the VIRTIO feature VIRTIO_F_IOMMU_PLATFORM for all
vhost-user devices when needed, we force the guest to use the DMA API,
making these devices compatible with TDX. By using the DMA API, the guest
triggers the TDX codepath to share some of the guest memory, in
particular the virtqueues and associated buffers, so that the VMM and
vhost-user backends/processes can access this memory.
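A minimal sketch of advertising the feature bit only when needed.
VIRTIO_F_IOMMU_PLATFORM is feature bit 33 (renamed VIRTIO_F_ACCESS_PLATFORM
in VIRTIO 1.1). The avail_features() helper and its force_dma_api parameter
are hypothetical.

```rust
/// Feature bit number of VIRTIO_F_IOMMU_PLATFORM.
pub const VIRTIO_F_IOMMU_PLATFORM: u64 = 33;

/// Hypothetical helper: set the bit only when the VM needs it (e.g. a TDX
/// guest), which forces the guest driver through the DMA API.
pub fn avail_features(base: u64, force_dma_api: bool) -> u64 {
    if force_dma_api {
        base | (1 << VIRTIO_F_IOMMU_PLATFORM)
    } else {
        base
    }
}
```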
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The file descriptor provided to fs_slave_map() and fs_slave_io() is
passed through the AsRawFd trait, meaning the caller still owns it. For
that reason, there's no need for these functions to close the file
descriptor, as it will be closed later on anyway.
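A minimal sketch of working on a borrowed descriptor without closing it,
using std's AsRawFd/FromRawFd and ManuallyDrop. read_borrowed() is a
hypothetical helper in the spirit of fs_slave_io(), not its actual body.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::mem::ManuallyDrop;
use std::os::unix::io::{AsRawFd, FromRawFd};

/// Hypothetical helper: it receives the descriptor through `AsRawFd`,
/// so it only borrows it and must not close it.
pub fn read_borrowed(fd: &dyn AsRawFd, buf: &mut [u8]) -> io::Result<()> {
    // SAFETY: the caller keeps `fd` open for the duration of this call.
    // ManuallyDrop stops the temporary File from closing it on drop.
    let mut file = ManuallyDrop::new(unsafe { File::from_raw_fd(fd.as_raw_fd()) });
    file.read_exact(buf)
}
```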
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
For vhost-user devices, we don't want to lose the vhost-user protocol
feature during the negotiation between the guest and the device. Since
VIRTIO has no knowledge of the vhost-user protocol feature, there is no
way it would ever be acknowledged by the guest. For that reason, we
create each vhost-user device with a set of acked features containing
the vhost-user protocol feature if it was part of the available
list.
Having the set of acked features contain this bit fixes a bug that
happened during the migration process, since the vhost-user protocol
feature wasn't explicitly enabled.
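A minimal sketch of seeding the acked features with the protocol feature
bit. VHOST_USER_F_PROTOCOL_FEATURES is bit 30 in the vhost-user protocol.
The two helper functions are hypothetical.

```rust
/// VHOST_USER_F_PROTOCOL_FEATURES is bit 30 of the vhost-user feature set.
pub const VHOST_USER_F_PROTOCOL_FEATURES: u64 = 1 << 30;

/// Hypothetical helper: the guest never acks bit 30 because it is not a
/// VIRTIO feature, so the device starts with it acked when available.
pub fn initial_acked_features(avail_features: u64) -> u64 {
    avail_features & VHOST_USER_F_PROTOCOL_FEATURES
}

/// Hypothetical helper: merging the guest's acks keeps the pre-seeded bit.
pub fn ack_features(acked: u64, avail: u64, guest_acked: u64) -> u64 {
    acked | (guest_acked & avail)
}
```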
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to clearly decouple when the migration is started from when
the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we no longer use
start_dirty_log() to identify when the migration has started.
Similarly, we rely on the already existing complete_migration()
method to know when the migration has ended.
A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was that the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.
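A minimal sketch of a dedicated migration-start hook. MigratableSketch is a
reduced, hypothetical trait. The name start_migration() and the reset
behavior in complete_migration() are assumptions for illustration.

```rust
/// Hypothetical reduced Migratable-style trait with no-op defaults.
pub trait MigratableSketch {
    fn start_migration(&mut self) {}
    fn start_dirty_log(&mut self) {}
    fn complete_migration(&mut self) {}
}

#[derive(Default)]
pub struct VhostUserNetSketch {
    pub migration_started: bool,
    pub dirty_logging: bool,
}

impl MigratableSketch for VhostUserNetSketch {
    // The migration start no longer piggybacks on start_dirty_log(), so
    // a local migration that never enables dirty logging sets the flag.
    fn start_migration(&mut self) {
        self.migration_started = true;
    }
    fn start_dirty_log(&mut self) {
        self.dirty_logging = true;
    }
    fn complete_migration(&mut self) {
        self.migration_started = false;
        self.dirty_logging = false;
    }
}
```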
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the vm-virtio/virtio-queue crate from rust-vmm, which has been
copied inside the Cloud Hypervisor tree, the entire codebase is moved to
the new definition of a Queue and other related structures.
The reason for this move is to follow upstream until we reach an
agreement on the patches that we need on top of it to make it work
properly with Cloud Hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Setting reply_ack should depend on whether the set of acknowledged
protocol features contains the REPLY_ACK flag.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce a common solution for spawning the virtio threads which will
make it easier to add the panic handling.
During this effort I discovered that there were no seccomp filters
registered for the vhost-user-net thread nor the vhost-user-block
thread. This change also incorporates basic seccomp filters for those as
part of the refactoring.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We are relying on applying empty 'seccomp' filters to support the
'--seccomp false' option, which will be treated as an error with the
updated 'seccompiler' crate. This patch fixes this issue by explicitly
checking whether the 'seccomp' filter is empty before applying the
filter.
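A minimal sketch of the emptiness check. The apply closure stands in for
whatever installs the BPF program. apply_filter_if_any() is hypothetical
and deliberately avoids naming any seccompiler function.

```rust
/// Hypothetical wrapper: an empty program (from `--seccomp false`) is
/// skipped instead of being handed to a loader that would reject it.
/// Returns Ok(true) if a filter was installed.
pub fn apply_filter_if_any<T, E>(
    filter: &[T],
    apply: impl FnOnce(&[T]) -> Result<(), E>,
) -> Result<bool, E> {
    if filter.is_empty() {
        return Ok(false); // nothing to install
    }
    apply(filter)?;
    Ok(true)
}
```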
Signed-off-by: Bo Chen <chen.bo@intel.com>
Introduce a new structure VhostUserCommon, allowing a lot of the code
shared between the vhost-user devices (block, fs and net) to be factorized.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We cannot let vhost-user devices connect to the backend when the Block,
Fs or Net object is being created during a restore/migration. The reason
is we can't have two VMs (source and destination) connected to the same
backend at the same time. That's why we must delay the connection with
the vhost-user backend until the restoration is performed.
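A minimal sketch of deferring the backend connection during restore.
VhostUserBlkSketch, Connection and the restoring parameter are hypothetical.
connect() is a placeholder that does not open a real socket.

```rust
use std::io;

/// Hypothetical connection handle standing in for a vhost-user master.
pub struct Connection {
    pub socket_path: String,
}

pub struct VhostUserBlkSketch {
    socket_path: String,
    vu: Option<Connection>,
}

impl VhostUserBlkSketch {
    /// Connect immediately on a normal boot. During restore/migration we
    /// must not, since the source VM may still be connected.
    pub fn new(socket_path: &str, restoring: bool) -> io::Result<Self> {
        let mut dev = VhostUserBlkSketch {
            socket_path: socket_path.to_string(),
            vu: None,
        };
        if !restoring {
            dev.connect()?;
        }
        Ok(dev)
    }

    fn connect(&mut self) -> io::Result<()> {
        // Placeholder for the real socket connection.
        self.vu = Some(Connection {
            socket_path: self.socket_path.clone(),
        });
        Ok(())
    }

    /// Called once the restoration is performed.
    pub fn restore(&mut self) -> io::Result<()> {
        self.connect()
    }

    pub fn is_connected(&self) -> bool {
        self.vu.is_some()
    }
}
```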
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to prevent the vhost-user devices from reconnecting to the
backend after the migration has been successfully performed, we make
sure to kill the thread in charge of handling the reconnection
mechanism.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
During a migration, the vhost-user device talks to the backend to
retrieve the dirty pages. Once this is done, a snapshot is taken,
meaning there's no need to communicate with the backend anymore. Closing
the communication is needed to let the destination VM connect to the
same backend.
That's why we shut down the communication with the backend when a
migration has been started and we're asked for a snapshot.
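A minimal sketch of closing the backend connection when snapshotting
during a migration, assuming the connection is a std UnixStream.
VhostUserDeviceSketch and its fields are hypothetical.

```rust
use std::net::Shutdown;
use std::os::unix::net::UnixStream;

/// Hypothetical device holding its backend connection as a Unix socket.
pub struct VhostUserDeviceSketch {
    pub backend: Option<UnixStream>,
    pub migration_started: bool,
}

impl VhostUserDeviceSketch {
    /// Dirty pages have been collected by now, so a snapshot taken during
    /// a migration shuts the socket down, freeing the backend for the
    /// destination VM.
    pub fn snapshot(&mut self) -> std::io::Result<()> {
        if self.migration_started {
            if let Some(stream) = self.backend.take() {
                stream.shutdown(Shutdown::Both)?;
            }
        }
        // ... serialize device state here ...
        Ok(())
    }
}
```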
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This anticipates the need for creating a new Blk, Fs or Net object
without having performed the connection with the vhost-user backend yet.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the common vhost-user code can handle logging dirty pages
through shared memory, we need to advertise it to the vhost-user
backends with the protocol feature VHOST_USER_PROTOCOL_F_LOG_SHMFD.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add support for the snapshot/restore feature to all supported
vhost-user devices.
The complexity of the vhost-user-fs device makes it only partially
compatible with the feature. When using the DAX feature, there's no way
to store and remap what was previously mapped in the DAX region. And
when not using the cache region, if the filesystem is mounted, it fails
to be properly restored, as this would require a special command to let
the backend know that it must remount what was already mounted before.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch moves all vhost-user common functions behind a new structure
VhostUserHandle. No functional change is intended; the only goal is to
prepare for storing information in this new structure, limiting the
number of parameters needed by each function.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This dependency bump needed some manual handling since the API changed
quite a lot, with some RawFd parameters being changed to either File or
the AsRawFd trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Issue from beta version of clippy:
Error: --> vm-virtio/src/queue.rs:700:59
|
700 | if let Some(used_event) = self.get_used_event(&mem) {
| ^^^^ help: change this to: `mem`
|
= note: `-D clippy::needless-borrow` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
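A minimal sketch of the pattern clippy flags and its fix. When mem is
already a reference, passing &mem creates a double reference that is
immediately auto-dereferenced. MemSketch and get_used_event() are
hypothetical stand-ins. The notification condition is the standard
VIRTIO_RING_F_EVENT_IDX check.

```rust
/// Hypothetical stand-in for guest memory.
pub struct MemSketch {
    pub used_event: Option<u16>,
}

/// Hypothetical accessor taking a reference, like get_used_event().
pub fn get_used_event(mem: &MemSketch) -> Option<u16> {
    mem.used_event
}

pub fn needs_notification(mem: &MemSketch, old_idx: u16, new_idx: u16) -> bool {
    // `mem` is already `&MemSketch`; `get_used_event(&mem)` would pass a
    // `&&MemSketch`, which is what clippy::needless_borrow reports.
    if let Some(used_event) = get_used_event(mem) {
        new_idx.wrapping_sub(used_event).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
    } else {
        true
    }
}
```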
Signed-off-by: Bo Chen <chen.bo@intel.com>
Since the reconnection thread took on the responsibility to handle
backend initiated requests as well, the variable naming should reflect
this by avoiding the "reconnect" prefix.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The vhost-user INFLIGHT_SHMFD protocol feature supports inflight I/O
tracking. This commit implements support for the feature on the
vhost-user device (master) side. As of this commit, the specific
vhost-user devices (blk, fs, or net) have not enabled this feature yet.
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Add the support for reconnecting the backend request handler after a
disconnection/crash happened.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the slave request handler is common to all vhost-user devices, the
same way the reconnection is, it makes sense to handle the requests from
the backend through the same thread.
The reconnection thread now handles both a reconnection as well as any
request coming from the backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit enables socket reconnection for vhost-user-fs backends. Note
that, as of this commit:
- Re-establishing the slave communication channel is not supported, so
socket reconnection does not support virtiofsd with DAX enabled.
- Inflight I/O tracking and restoring are not supported. Therefore, only
virtio-fs daemons that are not processing inflight requests can work
normally after reconnection.
- For the restarted virtiofsd to work normally after reconnection, its
internal state must also be recovered. This is not Cloud Hypervisor's
responsibility. If the virtio-fs daemon does not support saving or
restoring its internal state, the filesystem should be re-mounted in
the guest after socket reconnection.
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.
The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.
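A minimal sketch of a memory region generic over a "bitmap backend" whose
default is a no-op, so existing users compile unchanged. BitmapSketch,
DirtyPages and RegionSketch are hypothetical simplifications, not
vm-memory's actual types. A 4 KiB page size is assumed.

```rust
use std::collections::BTreeSet;

/// Hypothetical bitmap trait.
pub trait BitmapSketch: Default {
    fn mark_dirty(&mut self, offset: usize, len: usize);
}

/// No-op backend: code that does not track dirty pages pays nothing.
impl BitmapSketch for () {
    fn mark_dirty(&mut self, _offset: usize, _len: usize) {}
}

const PAGE_SIZE: usize = 4096;

/// Tracking backend recording dirty page numbers.
#[derive(Default)]
pub struct DirtyPages(pub BTreeSet<usize>);

impl BitmapSketch for DirtyPages {
    fn mark_dirty(&mut self, offset: usize, len: usize) {
        if len == 0 {
            return;
        }
        let first = offset / PAGE_SIZE;
        let last = (offset + len - 1) / PAGE_SIZE;
        self.0.extend(first..=last);
    }
}

/// The extra generic parameter selects the bitmap backend.
pub struct RegionSketch<B: BitmapSketch = ()> {
    data: Vec<u8>,
    pub bitmap: B,
}

impl<B: BitmapSketch> RegionSketch<B> {
    pub fn new(size: usize) -> Self {
        RegionSketch { data: vec![0; size], bitmap: B::default() }
    }

    /// Every VMM-side write also marks the touched pages dirty.
    pub fn write(&mut self, offset: usize, buf: &[u8]) {
        self.data[offset..offset + buf.len()].copy_from_slice(buf);
        self.bitmap.mark_dirty(offset, buf.len());
    }
}
```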
Signed-off-by: Bo Chen <chen.bo@intel.com>
Add a helper to VirtioCommon which returns duplicates of the EventFds
for the kill and pause events.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Whether many of the VIRTIO reserved features are supported is up to the
vhost-user backend. That means on the VMM side, these features should be
available, so that they don't get lost during the negotiation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Factorize the virtio features and vhost-user protocol features
negotiation through a common function that blk, fs and net
implementations can directly rely on.
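A minimal sketch of a shared negotiation helper: keep what both sides
support, and only negotiate protocol features if the
VHOST_USER_F_PROTOCOL_FEATURES bit (bit 30) survives. negotiate() and
Negotiated are hypothetical, not the actual Cloud Hypervisor function.

```rust
/// VHOST_USER_F_PROTOCOL_FEATURES (bit 30).
const VHOST_USER_F_PROTOCOL_FEATURES: u64 = 1 << 30;

/// Result of the hypothetical shared helper for blk, fs and net.
#[derive(Debug, PartialEq)]
pub struct Negotiated {
    pub acked_features: u64,
    pub acked_protocol_features: u64,
}

pub fn negotiate(
    vmm_features: u64,
    backend_features: u64,
    vmm_protocol: u64,
    backend_protocol: u64,
) -> Negotiated {
    let acked_features = vmm_features & backend_features;
    let acked_protocol_features = if (acked_features & VHOST_USER_F_PROTOCOL_FEATURES) != 0 {
        vmm_protocol & backend_protocol
    } else {
        0
    };
    Negotiated { acked_features, acked_protocol_features }
}
```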
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Make sure the virtio features are set upon device activation. At the
time the device is activated, we know the guest acknowledged the
features, which means it's safe to set them back to the backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio features are negotiated and set at the time the device is
created, hence there's no need to set the features again while going
through the vhost-user setup that is performed upon queue activation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Some refactoring is performed in order to always expect the irqfd to be
provided by the VirtioInterrupt trait. If no irqfd is available, we
simply fail to initialize the vhost-user device. This allows for further
simplification since we can assume the interrupt will always be
triggered directly by the vhost-user backend without being proxied
through the VMM. This allows for the complete removal of the dedicated
thread for both block and net.
vhost-user-fs is a bit more complex as it requires the slave request
protocol feature in order to support DAX. That's why we still need the
VMM to intervene, and therefore to run a dedicated thread for it.
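A minimal sketch of failing device setup when any queue lacks an irqfd.
VirtioInterruptSketch, SetupError and vring_call_fds() are hypothetical.
Descriptors are plain i32 values here.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SetupError {
    MissingIrqFd(usize),
}

/// Hypothetical reduced interrupt trait: the irqfd for a queue, if any.
pub trait VirtioInterruptSketch {
    fn notifier(&self, queue_index: usize) -> Option<i32>;
}

/// Collect one irqfd per queue, failing if any is missing, so the
/// backend can always signal the guest directly without a VMM thread.
pub fn vring_call_fds(
    interrupt: &dyn VirtioInterruptSketch,
    queue_indexes: &[usize],
) -> Result<Vec<i32>, SetupError> {
    queue_indexes
        .iter()
        .map(|&i| interrupt.notifier(i).ok_or(SetupError::MissingIrqFd(i)))
        .collect()
}
```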
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
error: reference to packed field is unaligned
--> virtio-devices/src/vhost_user/fs.rs:85:21
|
85 | fs.flags[i].bits() as i32,
| ^^^^^^^^^^^
|
= note: `-D unaligned-references` implied by `-D warnings`
= warning: this was previously accepted by the compiler but is being
phased out; it will become a hard error in a future release!
= note: for more information, see issue #82523
<https://github.com/rust-lang/rust/issues/82523>
= note: fields of packed structs are not properly aligned, and
creating a misaligned reference is undefined behavior (even if that
reference is never dereferenced)
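A minimal sketch of avoiding a reference to a packed field by copying the
field into an aligned local first. FsMapSketch and MapFlags are
hypothetical. The bits(&self) accessor mimics the bitflags-style call in
the error above.

```rust
/// Hypothetical flags newtype whose accessor takes `&self`.
#[derive(Clone, Copy)]
pub struct MapFlags(u64);

impl MapFlags {
    pub fn bits(&self) -> u64 {
        self.0
    }
}

/// Hypothetical packed message layout.
#[repr(C, packed)]
pub struct FsMapSketch {
    pub tag: u8,
    pub flags: [MapFlags; 2],
}

pub fn flag_bits(fs: &FsMapSketch, i: usize) -> i32 {
    // `fs.flags[i].bits()` would borrow a possibly misaligned packed
    // field. Copying the array into a local makes the borrow aligned.
    let flags = fs.flags;
    flags[i].bits() as i32
}
```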
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: name `TYPE_UNKNOWN` contains a capitalized acronym
--> vm-virtio/src/lib.rs:48:5
|
48 | TYPE_UNKNOWN = 0xFF,
| ^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `Type_Unknown`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that virtio devices can be updated with add_memory_region(), there's
no need to keep update_memory() around.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>