If VIRTIO_RING_F_EVENT_IDX is negotiated only generate suppress
interrupts if the guest has asked us to do so.
Fixes: #788
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In some situations it is seen that the first interrupt sent to the guest
is lost upon a restore (due to the tap worker being awake ahead of the
vPUs).
This causes problems with VIRTIO_RING_F_EVENT_IDX interrupt suppression
as the guest will not be interrupted again in order to mitigate this we
always interrupt the guest until the device itself has been signalled by
the guest.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This requires exposing the struct members and also using Option<..>
types for the main epoll fd and the memory as they are initialised later
in vhost-user-net.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split handling of behaviour that is independent of the device itself so
that it can be reused in the vhost-user-net device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split out functions that work just on the TAP device and queues. Whilst
doing so also improve the error handling to return Results rather than
drop errors.
This change also addresses a bug where the TAP event suppression could
ineffectual because it was being enabled immediately after it may have
been disabled:
resume_rx -> rx_single_frame -> unregister_listener -> resume_rx ->
register_listener.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
According to the virtio spec the guest should always be interrupted when
"used" descriptors are returned from the device to the driver. However
this was not the case for the TX queue in either the virtio-net
implementation or the vhost-user-net implementation.
This would have meant that the guest could end up with a reduced TX
throughput as it would not know that the packets had been dispatched via
the VMM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new "host_mac" parameter to "--net" and "--net-backend" and use
this to set the MAC address on the tap interface. If no address is given
one is randomly assigned and is stored in the config.
Support for vhost-user-net self spawning was also included.
Fixes: #1177
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This identifier is chosen from the DeviceManager so that it will manage
all identifiers across the VM, which will ensure uniqueness.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.
A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can be made of multiple snapshot
itself.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.
Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
When a virtio device is paused an event is written to the appropriate
"pause" EventFd for the device. This will be noticed by the the device's
epoll_wait(), an atomic bool checked an if true then the thread is
parked(). When resuming the bool is reset and the thread is unpark()ed.
However the event triggering the pause is still in the EventFd so the
epoll_wait() will continue to return but because the boolean is not set
the thread will not be park()ed but instead we will busy loop around an
event that is not being consumed.
The solution is to drain the "pause" EventFd when the event is first
received and thus the epoll_wait() will only return for the pause event
once. This resolves the infinite epoll_wait() wake-ups.
Fixes: #869
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap> with
GuestMemoryAtomic<GuestMemoryMmap>.
The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.
Fixes#735
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Remove duplicated code across the different devices by handling
the virtio feature pages in VirtioDevice itself rather than
in the backends. This works as no virtio devices use feature
bits beyond 64-bits.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
This commit introduces a clear definition of the virtio-net
configuration structure, allowing both vhost-user-net and
virtio-net devices to rely on it.
This makes the code more readable for developers.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we have factorized the common virtio pausable implementation,
it's cleaner to have a dedicated macro for control queue devices rather
than overload the macro prototype.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Now that we unified epoll_thread to potentially be a vector of threads,
it makes sense to make it a plural field.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.
Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.
For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Update the common part in net_util.rs under vm-virtio to add mq
support, meanwhile enable mq for virtio-net device, vhost-user-net
device and vhost-user-net backend. Multiple threads will be created,
one thread will be responsible to handle one queue pair separately.
To gain the better performance, it requires to have the same amount
of vcpus as queue pair numbers defined for the net device, due to
the cpu affinity.
Multiple thread support is not added for vhost-user-net backend
currently, it will be added in future.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
While feature VIRTIO_NET_F_CTRL_VQ is negotiated, control queue
will exits besides the Tx/Rx virtqueues, an epoll handler should
be started to monitor and handle the control queue event.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Since the common parts are put into net_util.rs under vm-virtio,
refactoring code for virtio-net device, vhost-user-net device
and backend to shrink the code size and improve readability
meanwhile.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Due to the amount of code currently duplicated across virtio devices,
the stats for this commit is on the large side but it's mostly more
duplicated code, unfortunately.
Migratable and Snapshotable placeholder implementations are provided as
well, making all virtio devices Migratable.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Adding virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly asked by
the user. The need for this feature is to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio specification defines a device can be reset, which was not
supported by this virtio-net implementation. The reason it is needed is
to support unbinding this device from the guest driver, and rebind it to
vfio-pci driver.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that virtio-bindings is a crate part of the rust-vmm project, we
want to rely on this one instead of the local one we had so far.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By making the registration functions immutable, this patch prevents from
self borrowing issues with the RwLock on self.mem.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Following the refactoring of the code allowing multiple threads to
access the same instance of the guest memory, this patch goes one step
further by adding RwLock to it. This anticipates the future need for
being able to modify the content of the guest memory at runtime.
The reasons for adding regions to an existing guest memory could be:
- Add virtio-pmem and virtio-fs regions after the guest memory was
created.
- Support future hotplug of devices, memory, or anything that would
require more memory at runtime.
Because most of the time, the lock will be taken as read only, using
RwLock instead of Mutex is the right approach.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The VMM guest memory was cloned (copied) everywhere the code needed to
have ownership of it. In order to clean the code, and in anticipation
for future support of modifying this guest memory instance at runtime,
it is important that every part of the code share the same instance.
Because VirtioDevice implementations need to have access to it from
different threads, that's why Arc must be used in this case.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When there are no available descriptors in the queue (observed when the
network interface hasn't been brought up by the kernel) stop waiting for
notifications that the TAP fd should be read from.
This avoids a situation where the TAP device has data avaiable and wakes
up the virtio-net thread only for the virtio-net thread not read that
data as it has nowhere to put it.
When there are descriptors available in the queue then we resume waiting
for the epoll event on the TAP fd.
This bug demonstrated itself as 100% CPU usage for cloud-hypervisor
binary prior to the guest network interface being brought up. The
solution was inspired by the Firecracker virtio-net code.
Fixes: #208
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The numbr of bytes read was being incorrectly increased by the potential
length of the end of the sliced data rather than the number of bytes
that was in the range. This caused a panic when the the network was
under load by using iperf.
It's important to note that in the Firecracker code base the function
that read_slice() returns the number of bytes read which is used to
increment this counter. The VM memory version however only returns the
empty unit "()".
Fixes: #166
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The structure of the vmm-sys-util crate has changed with lots of code
moving to submodules.
This change adjusts the use of the imported structs to reference the
submodules.
Fixes: #145
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The existing code taking care of the epoll loop was too restrictive as
it was propagating the error returned from the epoll_wait() syscall, no
matter what was the error. This causes the epoll loop to be broken,
leading to a non-functional virtio device.
This patch enforces the parsing of the returned error and prevent from
the error propagation in case it is EINTR, which stands for Interrupted.
In case the epoll loop is interrupted, it is appropriate to retry.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As per the VIRTIO specification, every virtio device configuration can
be updated while the guest is running. The guest needs to be notified
when this happens, and it can be done in two different ways, depending
on the type of interrupt being used for those devices.
In case the device uses INTx, the allocated IRQ pin is shared between
queues and configuration updates. The way for the guest to differentiate
between an interrupt meant for a virtqueue or meant for a configuration
update is tied to the value of the ISR status field. This field is a
simple 32 bits bitmask where only bit 0 and 1 can be changed, the rest
is reserved.
In case the device uses MSI/MSI-X, the driver should allocate a
dedicated vector for configuration updates. This case is much simpler as
it only requires the device to send the appropriate MSI vector.
The cloud-hypervisor codebase was not supporting the update of a virtio
device configuration. This patch extends the existing VirtioInterrupt
closure to accept a type that can be Config or Queue, so that based on
this type, the closure implementation can make the right choice about
which interrupt pin or vector to trigger.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because we cannot always assume the irq fd will be the way to send
an IRQ to the guest, this means we cannot make the assumption that
every virtio device implementation should expect an EventFd to
trigger an IRQ.
This commit organizes the code related to virtio devices so that it
now expects a Rust closure instead of a known EventFd. This lets the
caller decide what should be done whenever a device needs to trigger
an interrupt to the guest.
The closure will allow for other type of interrupt mechanism such as
MSI to be implemented. From the device perspective, it could be a
pin based interrupt or an MSI, it does not matter since the device
will simply call into the provided callback, passing the appropriate
Queue as a reference. This design keeps the device model generic.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to provide connectivity through network interface between
host and guest, this patch introduces the virtio-net backend.
This code is based on Firecracker commit
d4a89cdc0bd2867f821e3678328dabad6dd8b767
It is a trimmed down version of the original files as it removes the
rate limiter support. It has been ported to support vm-memory crate
and the epoll handler has been modified in order to run a dedicated
epoll loop from the device itself. This epoll loop runs in its own
dedicated thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>