FS_IO is part of the actions a vhost-user-fs daemon can ask the VMM to
perform on its behalf. It is meant to read/write the content from a file
descriptor directly into a guest memory region. This region can either
be a RAM region or the dedicated cache region for virtio-fs.
The way FS_IO was implemented, it was only expecting the guest physical
address provided through the "cache_offset" field to refer to the cache
region. Unfortunately, this was only implementing FS_IO partially.
This patch extends the existing FS_IO implementation by checking the GPA
against the cache region as a first step, but if it is not part of the
cache region address range, then we fallback onto searching for a RAM
region that could match. If there is a matching RAM region, we retrieve
the corresponding host address to let the VMM read/write from/to it.
Fixes: #1054
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
This patch implements the Snapshottable trait for virtio-console, which
enables migration support for it. A VM with a virtio-console device
attached can be snapshot and then restored without issues.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
This brings the migration support to virtio-pmem device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
The frame buffer must be updated depending on the amount read from it,
which depends on the number and depth of descriptors available at the
time of the processing.
This patch handles this buffer update, and allow for large buffers to be
correctly processed in multiple rounds.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
On the restore path, using the available and used indexes read from
memory to fill the Queue structure was a mistake. Indeed, the available
index is written from the guest and it reflects the last available index
in the descriptor table. But the driver might have queued a lot of
buffers which have not yet been used by the device. This leads to a
situation where the next_avail from Queue is completely different from
the one we can read from memory.
Instead, the right way to determine the next_avail index that should be
used by the device is by relying on the used index from the memory. This
index represents the correct information we're looking for as it has
been updated before the snapshot to let the guest know the next index to
process.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
First, this modifies the existing helpers on how to get indexes for
available and used rings from memory. Instead of updating the queue
through each helper, they are now used as simple getters.
Based on these new getters, we could create a new helper to determine if
the queue has some available descriptors already queued from the driver
side. This helper is going to be particularly helpful when trying to
determine from a virtio thread if a queue is already loaded with some
available buffers that can be used to send information to the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the virtio-fs device is backed by a vhost-user process, it is
important to implement the proper shutdown() function from the
VirtioDevice trait, as vhost-user-blk and vhost-user-net do.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When hot-unplugging the virtio-pmem from the VM, we don't remove the
associated userspace mapping. This patch will let us fix this in a
following patch. For now, it simply adapts the code so that the Pmem
device knows about the mapping associated with it. By knowing about it,
it can expose it to the caller through the new userspace_mappings()
function.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This will help when we will implement the hot-unplug of the virtio-fs
device, as we will have to remove correctly the userspace mappings
associated with the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce new getter function to the VirtioDevice trait, as it will
allow the caller to retrieve the list of userspace mappings associated
with the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In the context of the shared memory region used by virtio-fs in order to
support DAX feature, the shared region is exposed as a dedicated PCI
BAR, and it is backed by a KVM userspace mapping.
Upon BAR remapping, the BAR is moved to a different location in the
guest address space, and the KVM mapping must be updated accordingly.
Additionally, we need the VirtioDevice to report the updated guest
address through the shared memory region returned by get_shm_regions().
That's why a new setter is added to the VirtioDevice trait, so that
after the mapping has been updated for KVM, we can tell the VirtioDevice
the new guest address the shared region is located at.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding the shared memory regions to the list of BARs, we make sure
the DeviceManager will register it as a BAR on the PCI bus. Without
this, when PCI BAR reprogramming happens, the PCI bus errors since it
does not know about any BAR at the specified address.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Any virtio device relying on the mmio transport layer can be snapshotted
and restored thanks to this new patch. From the MmioDevice perspective,
it is mainly a matter of saving the information about the virtqueues as
the restore path will need them to activate the device (if needed
because it has been activated before being snapshotted).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for adding snapshot/restore support to virtio devices,
this commit introduces two new helpers updating the available and used
indexes of a queue, relying on the guest memory.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit relies on serde to serialize and deserialize the content of
a Queue structure. This will be useful information to store when
implementing snapshot/restore feature for virtio devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Add support for specifying the PCI revision in the PCI configuration and
populate this with the value of 1 for virtio-pci devices.
The virtio-pci specification is slightly ambiguous only saying that
transitional (i.e. devices that support legacy and virtio 1.0) should
set this to 0. In practice it seems that software expects the revision
to be set to 1 for modern only devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add an accessor to return the underlying VirtioDevice. This is useful
for managing the removal of the device from internal datastructures when
handling virtio-pci device unplug.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to support freeing the memory that is allocated we need to make
sure that we update the internal representation so that free_bars() can
correctly free the memory if the device has its BARs moved.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Implement the free_bars() method from the PciDevice trait which is used
as part of the device removal process. Although there is only one BAR
allocated by VirtioPciDevice simplify the code by using a vector.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the release of the managed memory region from the DeviceManager to
the vhost-user-fs device. This ensures that the memory will be freed when
the device is unplugged which will lead to it being Drop()ed.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the release of the managed memory region from the DeviceManager to
the virtio-pmem device. This ensures that the memory will be freed when
the device is unplugged which will lead to it being Drop()ed.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.
A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can be made of multiple snapshot
itself.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.
Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Return libc::EINVAL instead of custom "Wrong offset" error, as mmap(2)
returns EINVAL when offset/len is invalid.
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
In fs_slave_map/unmap/sync, we only made sure offset < cache_size, but
didn't validate (offset + len). We should ensure [offset, offset+len]
is within cache range as well.
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
The basic idea of virtio-mem is to provide a flexible, cross-architecture
memory hot plug and hot unplug solution that avoids many limitations
imposed by existing technologies, architectures, and interfaces. More
details can be found in https://lkml.org/lkml/2019/12/12/681.
This commit add virtio-mem device.
Signed-off-by: Hui Zhu <teawater@antfin.com>
We made sure gpa is in cache range, but not the end addr of request,
which is (gpa + len). If the end addr of request is beyond dax cache
window, vmm would corrupt guest memory or crash.
Fix it by making sure end addr of request is within cache range as well.
And while we're at it, return EFAULT if the request is out of range, as
write(2)/read(2) returns EFAULT when buffer is outside accessible
address space.
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
In order to keep vhost-user backend to work across guest memory resizing
happening when memory is hot-plugged or hot-unplugged, both blk, net and
fs devices are implementing the notifier to let the backend know.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By factorizing the setup of the memory table for vhost-user, we
anticipate the fact that vhost-user devices are going to reuse this
function when the guest memory will be updated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio devices backed by a vhost-user backend must send an update to
the associated backend with the new file descriptors corresponding to
the memory regions.
This patch allows such devices to be notified when such update needs to
happen.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Virtio-fs daemon expects fs_slave_io() returns the number of bytes
read/written on success, but we always return 0 and make userspace think
nothing has been read/written.
Fix it by returning the actual bytes read/written. Note that This
depends on the corresponding fix in vhost crate.
Fixes: #949
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
On x86_64, a hint to the compiler is not enough, we need to issue a
MFENCE instruction. Replace the Acquire fence with a SeqCst one.
Without this, it's still possible to miss an used_event update,
leading to the omission of a notification, possibly stalling the
vring.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Virtiofs's dax window can be used as read/write's source (e.g. mmap a file
on virtiofs), but the dax window area is not shared with vhost-user
backend, i.e. virtiofs daemon.
To make those IO work, addresses of this kind of IO source are routed to
VMM via FS_IO requests to perform a read/write from an fd directly to the
given GPA.
This adds the support of FS_IO request to clh's vhost-user-fs master part.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
"DescriptorChain"s are tied to the lifetime of the referenced
GuestMemoryMmap object (for good reasons), but sometimes (i.e., when
processing descriptors from different contexts) we may need to switch
them to point a different GuestMemoryMmap.
Here we introduce the structure DescriptorHead, which holds the data
needed to rebuild a DescriptorChain, the method "get_head" which
returns the DescriptorHead for a DescriptorChain, and the method
"new_from_head", which allows to create a new DescriptorChain with a
DescriptorHead and a new reference to a GuestMemoryMmap.
Signed-off-by: Sergio Lopez <slp@redhat.com>
get_used_event is used from vhost_user_backend:needs_notification to
check whether an interrupt must be sent to the guest to notify there
are new items in the queue. Shorten the update window by asking the
the compiler to inline this method, so a write won't slip between the
read of the memory contents and the actual check.
Signed-off-by: Sergio Lopez <slp@redhat.com>
When a virtio device is paused an event is written to the appropriate
"pause" EventFd for the device. This will be noticed by the the device's
epoll_wait(), an atomic bool checked an if true then the thread is
parked(). When resuming the bool is reset and the thread is unpark()ed.
However the event triggering the pause is still in the EventFd so the
epoll_wait() will continue to return but because the boolean is not set
the thread will not be park()ed but instead we will busy loop around an
event that is not being consumed.
The solution is to drain the "pause" EventFd when the event is first
received and thus the epoll_wait() will only return for the pause event
once. This resolves the infinite epoll_wait() wake-ups.
Fixes: #869
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The details of the SHM regions or the lack of, which is used by
virtio-fs DAX, is communicated through configuration fields on the
virtio-mmio memory region. Implement the necessary fields to return the
SHM entries and in particular return a length of (u64)-1 which is used
by the kernel to indicate there are no SHM regions.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As cloud-hypervisor/vhost crate (dragonball branch) is ready to be used,
switch vhost_rs from internal crate to the external one.
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Current device configuration space offset value is 0, we need to
update that value to VHOST_USER_CONFIG_OFFSET(0x100) to follow the spec
Fixes#844
Signed-off-by: Arron Wang <arron.wang@intel.com>
Indirect descriptors is a virtio feature that allows the driver to
store a table of descriptors anywhere in memory, pointing to it from a
virtqueue ring's descriptor with a particular flag.
We can't seamlessly transition from an iterator over a conventional
descriptor chain to an indirect chain, so Queue users need to
explicitly support this feature by calling Queue::is_indirect() and
Queue::new_from_indirect().
Signed-off-by: Sergio Lopez <slp@redhat.com>