The virtio devices backed by a vhost-user backend must send an update to
the associated backend with the new file descriptors corresponding to
the memory regions.
This patch allows such devices to be notified when such update needs to
happen.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Virtio-fs daemon expects fs_slave_io() returns the number of bytes
read/written on success, but we always return 0 and make userspace think
nothing has been read/written.
Fix it by returning the actual bytes read/written. Note that This
depends on the corresponding fix in vhost crate.
Fixes: #949
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
On x86_64, a hint to the compiler is not enough, we need to issue a
MFENCE instruction. Replace the Acquire fence with a SeqCst one.
Without this, it's still possible to miss an used_event update,
leading to the omission of a notification, possibly stalling the
vring.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Virtiofs's dax window can be used as read/write's source (e.g. mmap a file
on virtiofs), but the dax window area is not shared with vhost-user
backend, i.e. virtiofs daemon.
To make those IO work, addresses of this kind of IO source are routed to
VMM via FS_IO requests to perform a read/write from an fd directly to the
given GPA.
This adds the support of FS_IO request to clh's vhost-user-fs master part.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
"DescriptorChain"s are tied to the lifetime of the referenced
GuestMemoryMmap object (for good reasons), but sometimes (i.e., when
processing descriptors from different contexts) we may need to switch
them to point a different GuestMemoryMmap.
Here we introduce the structure DescriptorHead, which holds the data
needed to rebuild a DescriptorChain, the method "get_head" which
returns the DescriptorHead for a DescriptorChain, and the method
"new_from_head", which allows to create a new DescriptorChain with a
DescriptorHead and a new reference to a GuestMemoryMmap.
Signed-off-by: Sergio Lopez <slp@redhat.com>
get_used_event is used from vhost_user_backend:needs_notification to
check whether an interrupt must be sent to the guest to notify there
are new items in the queue. Shorten the update window by asking the
the compiler to inline this method, so a write won't slip between the
read of the memory contents and the actual check.
Signed-off-by: Sergio Lopez <slp@redhat.com>
When a virtio device is paused an event is written to the appropriate
"pause" EventFd for the device. This will be noticed by the the device's
epoll_wait(), an atomic bool checked an if true then the thread is
parked(). When resuming the bool is reset and the thread is unpark()ed.
However the event triggering the pause is still in the EventFd so the
epoll_wait() will continue to return but because the boolean is not set
the thread will not be park()ed but instead we will busy loop around an
event that is not being consumed.
The solution is to drain the "pause" EventFd when the event is first
received and thus the epoll_wait() will only return for the pause event
once. This resolves the infinite epoll_wait() wake-ups.
Fixes: #869
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The details of the SHM regions or the lack of, which is used by
virtio-fs DAX, is communicated through configuration fields on the
virtio-mmio memory region. Implement the necessary fields to return the
SHM entries and in particular return a length of (u64)-1 which is used
by the kernel to indicate there are no SHM regions.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As cloud-hypervisor/vhost crate (dragonball branch) is ready to be used,
switch vhost_rs from internal crate to the external one.
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Current device configuration space offset value is 0, we need to
update that value to VHOST_USER_CONFIG_OFFSET(0x100) to follow the spec
Fixes#844
Signed-off-by: Arron Wang <arron.wang@intel.com>
Indirect descriptors is a virtio feature that allows the driver to
store a table of descriptors anywhere in memory, pointing to it from a
virtqueue ring's descriptor with a particular flag.
We can't seamlessly transition from an iterator over a conventional
descriptor chain to an indirect chain, so Queue users need to
explicitly support this feature by calling Queue::is_indirect() and
Queue::new_from_indirect().
Signed-off-by: Sergio Lopez <slp@redhat.com>
VIRTIO_RING_F_EVENT_IDX is a virtio feature that allows to avoid
device <-> driver notifications under some circunstances, most
notably when actively polling the queue.
This commit implements support for in in the vm-virtio
crate. Consumers of this crate will also need to add support for it by
exposing the feature and calling using update_avail_event() and
get_used_event() accordingly.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap> with
GuestMemoryAtomic<GuestMemoryMmap>.
The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.
Fixes#735
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows the VMM to explicitly shutdown devices as part of the VM
shutdown ahead of what Drop::drop() would do.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By detecting if an existing tap interface supports multiqueue, we now
have the information to determine if the command line parameters
regarding the number of queues is correct.
Fixes#738
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Remove duplicated code across the different devices by handling
the virtio feature pages in VirtioDevice itself rather than
in the backends. This works as no virtio devices use feature
bits beyond 64-bits.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Having the InterruptManager trait depend on an InterruptType forces
implementations into supporting potentially very different kind of
interrupts from the same code base. What we're defining through the
current, interrupt type based create_group() method is a need for having
different interrupt managers for different kind of interrupts.
By associating the InterruptManager trait to an interrupt group
configuration type, we create a cleaner design to support that need as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The existing code taking care of the epoll loop was too restrictive as
it was considering all errors the same. But in case the error is EINTR,
this means the syscall has been interrupted while waiting, and it should
be resumed to wait again.
This patch enforces the parsing of the returned error and prevent the
code from assuming EINTR should be handled as all other errors.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The previous code only support one queue, and we need
to support MQ in vhost user block device. This patch
can work with SPDK with MQ setting.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Based on the new structures previously introduced, the new topology
feature is being fully implemented through this commit. This allows
the description of the devices attached to the virtual IOMMU, which
is why a new function attach_devices() has been introduced. It gives
the virtual IOMMU device the full list of devices which must be attached
to it, letting the device share this information through its virtio
configuration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio-iommu device defines a new virtio feature allowing the
topology to be discovered fully through virtio configuration.
By topology, it means describing the devices attached to the virtual
IOMMU. This is currently managed through ACPI with IORT and VIOT table,
but this is another way of describing it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio capability VIRTIO_PCI_CAP_PCI_CFG is exposed through the
device's PCI config space the same way other virtio-pci capabilities
are exposed.
The main and important difference is that this specific capability is
designed as a way for the guest to access virtio capabilities without
mapping the PCI BAR. This is very rarely used, but it can be useful when
it is too early for the guest to be able to map the BARs.
One thing to note, this special feature MUST be implemented, based on
the virtio specification.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to anticipate the need to support more features related to the
access of a device's PCI config space, this commits changes the self
reference in the function read_config_register() to be mutable.
This also brings some more flexibility for any implementation of the
PciDevice trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces a clear definition of the virtio-fs
configuration structure, allowing vhost-user-fs device to
rely on it.
This makes the code more readable for developers.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit reuses the clear definition of the virtio-blk
configuration structure, allowing both vhost-user-blk and
virtio-blk devices to rely on it.
This makes the code more readable for developers.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces a clear definition of the virtio-net
configuration structure, allowing both vhost-user-net and
virtio-net devices to rely on it.
This makes the code more readable for developers.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit improves the existing virtio-blk implementation, allowing
for better I/O performance. The cost for the end user is to accept
allocating more vCPUs to the virtual machine, so that multiple I/O
threads can run in parallel.
One thing to notice, the amount of vCPUs must be egal or superior to the
amount of queues dedicated to the virtio-blk device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The trait bound and non trait bound virtio devices can use the same
inner implementation.
Also, the virtio pausable trait definiton can also be factorized.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Now that we have factorized the common virtio pausable implementation,
it's cleaner to have a dedicated macro for control queue devices rather
than overload the macro prototype.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
By adding an internal layer of abstraction (the hidden VirtioPausable
trait), we can factorize the virtio common code.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Now that we unified epoll_thread to potentially be a vector of threads,
it makes sense to make it a plural field.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Although only the block and net virtio devices can actually be multi
threaded (for now), handling them as special cases makes the code more
complex.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The build is run against "--all-features", "pci,acpi", "pci" and "mmio"
separately. The clippy validation must be run against the same set of
features in order to validate the code is correct.
Because of these new checks, this commit includes multiple fixes
related to the errors generated when manually running the checks.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need for assign_irq() or assign_msix() functions from the
PciDevice trait, as we can see it's never used anywhere in the codebase.
That's why it's better to remove these methods from the trait, and
slightly adapt the existing code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that KVM specific interrupts are handled through InterruptManager
trait implementation, the vm-virtio crate does not need to rely on
kvm_ioctls and kvm_bindings crates.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.
By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.
Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupt, no matter the type of
interrupt and no matter which hypervisor might be in use.
Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>