Each virtio thread was reading/draining the pause_evt pipe when
detecting the associated event. Problem is, when a virtio device has
multiple threads, they all share the same pause_evt pipe, which can
prevent some threads from receiving the event. If the first thread to
catch the event is quickly clearing the pipe, some other threads might
simply miss the event and they will not enter the "paused" state as
expected.
This is a behavior that was spotted with virtio-net as it usually uses
2 threads by default (1 for TX/RX queues and 1 for the control queue).
The way to solve this issue is by letting each thread drain the pipe
during the resume codepath, that is after the thread has been unparked.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The same way the VM and the vCPUs are restored in a paused state, all
devices associated with the device manager must be restored in the same
paused state.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 78ca0a942f32140465c67ea4b45d68c52c72d751.
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 6dbe8e021a64ba3742081741a7538cdfd93a102e.
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 78819f35f63f5777a58e3e1e774b3270b32881ed.
The vsock TX buffer flush operation would report inconsistent results,
under specific circumstances.
The flush operation is performed in two steps, since it's dealing with a
ring buffer, an the data to be flushed may wrap around. If the first
step was successful, but the second one failed, the whole flush
operation would report an error, thus causing flow control accounting to
lose track of the bytes that were successfully written by the first
pass.
This commit changes the flush behavior to always report success when
some data has been written to the backing stream.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 2da612a9cdce85c91fb54ab22d950ec6ccc93b27.
Fixed a bug introduced by a271d08f0b1ba0ee82761cd49244b6a8017bcede,
whereby the flow control accouting would be off by a few bytes, for
host-initiated connections.
The connection ack message ("OK <port_num><CR>") was accounted for as
data sent by the guest, so its length was substracted from the total
amount of data the guest was allowed to send.
This commit changes the way this ack message is sent, so that it
bypasses flow control accouting.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 109e631566350867dafa4b16c3919dfd1533eeea.
This commit changes the vsock connection state machine behavior to absorb
any EWOULDBLOCK errors recevied while handling an EPOLLOUT event. Previously,
this condition would lead to immediate connection termination.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 660d18cf7fee5b38c3b1b17a5da6544b9025909d.
Apparently, epoll_wait sometimes yields false EPOLLIN events (i.e. events
follwing which read() would fail with EWOULDBLOCK). This would cause the
vsock connection state machine to terminate connections, since an error
was detected on the underlying Unix socket.
This commit changes the vsock connection state machine code to handle such
erroneous EPOLLIN events by absorbing EWOULDBLOCK read() errors.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 1cc8b8a678eb28b20f5843556bdb7fbb2dfa6284.
Fixed a logical error in the vsock flow control, that would cause credit
update packets to not be sent at the right time.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is d2475773557c82d2abad2fc8bdf69e7d01444109.
Fixed a vsock muxer issue that would cause a connection to be removed
from the RX queue, even though it still had pending RX data.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch adds `is_empty` method to VsockPacket to fix the
following clippy error:
error: item `vsock::packet::VsockPacket` has a public `len` method but no corresponding `is_empty` method
--> vm-virtio/src/vsock/packet.rs💯1
|
100 | / impl VsockPacket {
101 | | /// Create the packet wrapper from a TX virtq chain head.
102 | | ///
103 | | /// The chain head is expected to hold valid packet header data. A following packet buffer
... |
334 | | }
335 | | }
| |_^
|
= note: `-D clippy::len-without-is-empty` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
The API has change to use generic GuestMemory trait:
pub fn get_host_address_range<M: GuestMemory>(
mem: &M,
addr: GuestAddress,
size: usize,
) -> Option<*mut u8> {
Signed-off-by: Arron Wang <arron.wang@intel.com>
It's not possible to call UnixListener::Bind() on an existing file so
unlink the created socket when shutting down the Vsock device.
This will allow the VM to be rebooted with a vsock device.
Fixes: #1083
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This identifier is chosen from the DeviceManager so that it will manage
all identifiers across the VM, which will ensure uniqueness.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.
A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can be made of multiple snapshot
itself.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.
Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
When a virtio device is paused an event is written to the appropriate
"pause" EventFd for the device. This will be noticed by the the device's
epoll_wait(), an atomic bool checked an if true then the thread is
parked(). When resuming the bool is reset and the thread is unpark()ed.
However the event triggering the pause is still in the EventFd so the
epoll_wait() will continue to return but because the boolean is not set
the thread will not be park()ed but instead we will busy loop around an
event that is not being consumed.
The solution is to drain the "pause" EventFd when the event is first
received and thus the epoll_wait() will only return for the pause event
once. This resolves the infinite epoll_wait() wake-ups.
Fixes: #869
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap> with
GuestMemoryAtomic<GuestMemoryMmap>.
The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.
Fixes#735
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Remove duplicated code across the different devices by handling
the virtio feature pages in VirtioDevice itself rather than
in the backends. This works as no virtio devices use feature
bits beyond 64-bits.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
The existing code taking care of the epoll loop was too restrictive as
it was considering all errors the same. But in case the error is EINTR,
this means the syscall has been interrupted while waiting, and it should
be resumed to wait again.
This patch enforces the parsing of the returned error and prevent the
code from assuming EINTR should be handled as all other errors.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding an internal layer of abstraction (the hidden VirtioPausable
trait), we can factorize the virtio common code.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Now that we unified epoll_thread to potentially be a vector of threads,
it makes sense to make it a plural field.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Although only the block and net virtio devices can actually be multi
threaded (for now), handling them as special cases makes the code more
complex.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.
Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.
For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 1db04ccc69862f30b7814f30024d112d1b86b80e.
Changed the host-initiated vsock connection protocol to include a
trivial handshake.
The new protocol looks like this:
- [host] CONNECT <port><LF>
- [guest/success] OK <assigned_host_port><LF>
On connection failure, the host host connection is reset without any
accompanying message, as before.
This allows host software to more easily detect connection failures, for
instance when attempting to connect to a guest server that may have not
yet started listening for client connections.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The vsock packets that we're building are resolving guest addresses to
host ones and use the latter as raw pointers.
If the corresponding guest mapped buffer spans across several regions in
the guest, they will do so in the host as well. Since we have no
guarantees that host regions are contiguous, it may lead the VMM into
trying to access memory outside of its memory space.
For now we fix that by ensuring that the guest buffers do not span
across several regions. If they do, we error out.
Ideally, we should enhance the rust-vmm memory model to support safe
acces across host regions.
Fixes CVE-2019-18960
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Due to the amount of code currently duplicated across virtio devices,
the stats for this commit is on the large side but it's mostly more
duplicated code, unfortunately.
Migratable and Snapshotable placeholder implementations are provided as
well, making all virtio devices Migratable.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Adding virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly asked by
the user. The need for this feature is to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio specification defines a device can be reset, which was not
supported by this virtio-vsock implementation. The reason it is needed
is to support unbinding this device from the guest driver, and rebind
it to vfio-pci driver.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This unit testing porting effort is based off of Firecracker commit
1e1cb6f8f8003e0bdce11d265f0feb23249a03f6
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is the last step connecting the dots between the virtio-vsock
device and the bulk of the logic hosted in the unix and csm modules.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit relies on the new vsock::unix module to create the backend
that will be used from the virtio-vsock device.
The concept of backend is interesting here as it would allow for a vhost
kernel backend to be plugged if that was needed someday.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This code porting is based off of Firecracker commit
1e1cb6f8f8003e0bdce11d265f0feb23249a03f6
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This code porting is based off of Firecracker commit
1e1cb6f8f8003e0bdce11d265f0feb23249a03f6
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There is a lot of code related to this virtio-vsock hybrid
implementation, that's why it's better to keep it under its
own module.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>