As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.
The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
DescriptorChain::is_valid() wrongly used .checked_offset() to attempt to
validate that the descriptor's data is in valid memory. This works in
all cases except where the guest has placed the data at the very end of
the guest memory as the offset + offset will be outside the range (as
the combined offset will be the next byte and as such out of the guest
memory). Instead use the function .check_range() takes an offset and a
length to validate
This fixes issues see with error messages featuring the
DescriptorChainTooShort error.
Fixes: #2424
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: name `TYPE_UNKNOWN` contains a capitalized acronym
--> vm-virtio/src/lib.rs:48:5
|
48 | TYPE_UNKNOWN = 0xFF,
| ^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `Type_Unknown`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This returns trues if this descriptor has another descriptor linked to
it. Not whether this descriptor chain has another one following it.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In case of the virtio frontend driver doesn't need interupts for
certain queue event, it may explicitly write VIRTIO_MSI_NO_VECTOR
to the virtio common configuration, or it may doesn't configure
the event type vector at all.
This patch initializes both MSI-X configuration vector and queue vector
with VIRTIO_MSI_NO_VECTOR, so that the backend drivers won't trigger
unexpected interrupts to the guest.
Signed-off-by: Zide Chen <zide.chen@intel.com>
We should use full memory barrier to ensure both guest and us
can see the correct avail_idx and avail_event_idx. Something
like this pattern:
VM: CLH:
update vring.avail->idx update avail_event_idx
mb() mb()
read avail_event_idx read vring.avail->idx
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
A new version of vm-memory was released upstream which resulted in some
components pulling in that new version. Update the version number used
to point to the latest version but continue to use our patched version
due to the fix for #1258
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Misspellings were identified by https://github.com/marketplace/actions/check-spelling
* Initial corrections suggested by Google Sheets
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Split the generic virtio code (queues and device type) from the
VirtioDevice trait, transport and device implementations.
This also simplifies the feature handling in vhost_user_backend as the
vm-virtio crate is no longer has any features.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Each virtio thread was reading/draining the pause_evt pipe when
detecting the associated event. Problem is, when a virtio device has
multiple threads, they all share the same pause_evt pipe, which can
prevent some threads from receiving the event. If the first thread to
catch the event is quickly clearing the pipe, some other threads might
simply miss the event and they will not enter the "paused" state as
expected.
This is a behavior that was spotted with virtio-net as it usually uses
2 threads by default (1 for TX/RX queues and 1 for the control queue).
The way to solve this issue is by letting each thread drain the pipe
during the resume codepath, that is after the thread has been unparked.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Through the counters() function on the trait expose the accumulated
counters.
TEST=Observe that the counters from the VM match those from the tap on
the host (RX-TX inverted) and inside the guest (non inverted.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The counters are a hash of counter name to (wrapping) u64 value. The
interpretation layer is responsible for converting this data into a
rate.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The same way the VM and the vCPUs are restored in a paused state, all
devices associated with the device manager must be restored in the same
paused state.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 78ca0a942f32140465c67ea4b45d68c52c72d751.
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 6dbe8e021a64ba3742081741a7538cdfd93a102e.
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 78819f35f63f5777a58e3e1e774b3270b32881ed.
The vsock TX buffer flush operation would report inconsistent results,
under specific circumstances.
The flush operation is performed in two steps, since it's dealing with a
ring buffer, an the data to be flushed may wrap around. If the first
step was successful, but the second one failed, the whole flush
operation would report an error, thus causing flow control accounting to
lose track of the bytes that were successfully written by the first
pass.
This commit changes the flush behavior to always report success when
some data has been written to the backing stream.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 2da612a9cdce85c91fb54ab22d950ec6ccc93b27.
Fixed a bug introduced by a271d08f0b1ba0ee82761cd49244b6a8017bcede,
whereby the flow control accouting would be off by a few bytes, for
host-initiated connections.
The connection ack message ("OK <port_num><CR>") was accounted for as
data sent by the guest, so its length was substracted from the total
amount of data the guest was allowed to send.
This commit changes the way this ack message is sent, so that it
bypasses flow control accouting.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 109e631566350867dafa4b16c3919dfd1533eeea.
This commit changes the vsock connection state machine behavior to absorb
any EWOULDBLOCK errors recevied while handling an EPOLLOUT event. Previously,
this condition would lead to immediate connection termination.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 660d18cf7fee5b38c3b1b17a5da6544b9025909d.
Apparently, epoll_wait sometimes yields false EPOLLIN events (i.e. events
follwing which read() would fail with EWOULDBLOCK). This would cause the
vsock connection state machine to terminate connections, since an error
was detected on the underlying Unix socket.
This commit changes the vsock connection state machine code to handle such
erroneous EPOLLIN events by absorbing EWOULDBLOCK read() errors.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is 1cc8b8a678eb28b20f5843556bdb7fbb2dfa6284.
Fixed a logical error in the vsock flow control, that would cause credit
update packets to not be sent at the right time.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch has been cherry-picked from the Firecracker tree. The
reference commit is d2475773557c82d2abad2fc8bdf69e7d01444109.
Fixed a vsock muxer issue that would cause a connection to be removed
from the RX queue, even though it still had pending RX data.
Signed-off-by: Dan Horobeanu <dhr@amazon.com>
Signed-off-by: Gabriel Ionescu <gbi@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This patch adds `is_empty` method to VsockPacket to fix the
following clippy error:
error: item `vsock::packet::VsockPacket` has a public `len` method but no corresponding `is_empty` method
--> vm-virtio/src/vsock/packet.rs💯1
|
100 | / impl VsockPacket {
101 | | /// Create the packet wrapper from a TX virtq chain head.
102 | | ///
103 | | /// The chain head is expected to hold valid packet header data. A following packet buffer
... |
334 | | }
335 | | }
| |_^
|
= note: `-D clippy::len-without-is-empty` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This removes the need to use CAP_NET_ADMIN privileges and instead the
host MAC addres is either provided by the user or alternatively it is
retrieved from the kernel.
TEST=Run cloud-hypervisor without CAP_NET_ADMIN permission and a
preconfigured tap device:
sudo ip tuntap add name tap0 mode tap
sudo ifconfig tap0 192.168.249.1 netmask 255.255.255.0 up
cargo clean
cargo build
target/debug/cloud-hypervisor --serial tty --console off --kernel ~/src/rust-hypervisor-firmware/target/target/release/hypervisor-fw --disk path=~/workloads/clear-33190-kvm.img --net tap=tap0
VM was also rebooted to check that works correctly.
Fixes: #1274
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The API has change to use generic GuestMemory trait:
pub fn get_host_address_range<M: GuestMemory>(
mem: &M,
addr: GuestAddress,
size: usize,
) -> Option<*mut u8> {
Signed-off-by: Arron Wang <arron.wang@intel.com>
If VIRTIO_RING_F_EVENT_IDX is negotiated only generate suppress
interrupts if the guest has asked us to do so.
Fixes: #788
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In some situations it is seen that the first interrupt sent to the guest
is lost upon a restore (due to the tap worker being awake ahead of the
vPUs).
This causes problems with VIRTIO_RING_F_EVENT_IDX interrupt suppression
as the guest will not be interrupted again in order to mitigate this we
always interrupt the guest until the device itself has been signalled by
the guest.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This requires exposing the struct members and also using Option<..>
types for the main epoll fd and the memory as they are initialised later
in vhost-user-net.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split handling of behaviour that is independent of the device itself so
that it can be reused in the vhost-user-net device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split out functions that work just on the TAP device and queues. Whilst
doing so also improve the error handling to return Results rather than
drop errors.
This change also addresses a bug where the TAP event suppression could
ineffectual because it was being enabled immediately after it may have
been disabled:
resume_rx -> rx_single_frame -> unregister_listener -> resume_rx ->
register_listener.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>