For the synchronous backends efficiently preserve the order for
completion requests through the use of VecDequeue. Preserving the order
is not required but is beneficial as it matches the existing
optimisation that looks to match completions and requests.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than aggregate the completion list into an intermediate vector
instead adjust the API to provide one completion item at a time.
With DHAT this shows the number of heap allocations has decreased.
Before:
dhat: Total: 623,852 bytes in 8,157 blocks
After:
dhat: Total: 380,444 bytes in 3,469 blocks
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The number of buffers in the request is usually one so by using SmallVec
the number of heap allocations can be significantly reduced.
DHAT reports:
Before:
dhat: Total: 1,166,412 bytes in 40,383 blocks
After:
dhat: Total: 623,852 bytes in 8,157 blocks
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than passing the vector of iovecs for the I/O to act on pass a
reference to the slice of values inside them. This removes the explicit
container type from the API allowing the use of e.g. SmallVec.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When using DHAT[1] for analysis the amount heap allocations change from:
dhat: Total: 3,186,536 bytes in 41,452 blocks
to
dhat: Total: 1,059,816 bytes in 34,747 blocks
When running against virtio-block; this still more allocations than
virtio-pmem but a significant improvement without any cost.
[1] https://docs.rs/dhat/latest/dhat/
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Between musl and glibc there is a difference in the signature of the
ioctl libc function. Use an anonymous cast to force the type coversion.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The probing logic wasn't updated to reflect the actual opcodes in use
for io_uring which are vectored read/writes not the unvectored versions.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Considering error messages will be mostly nested, ensuring no
punctuation at the end will make the error log more readable.
Signed-off-by: Bo Chen <chen.bo@intel.com>
warning: you are deriving `PartialEq` and can implement `Eq`
--> vmm/src/serial_manager.rs:59:30
|
59 | #[derive(Debug, Clone, Copy, PartialEq)]
| ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Renaming translate() to translate_gva() to clarify we want to translate
a GVA address into a GPA.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new trait simplifies the address translation of a GuestAddress by
having GuestAddress implementing it.
The three crates virtio-devices, block_util and net_util have been
updated accordingly to rely on this new trait, helping with code
readability and limiting the amount of duplicated code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving the whole codebase to rely on the AccessPlatform definition from
vm-virtio so that we can fully remove it from virtio-queue crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since we're trying to move away from the translation happening in the
virtio-queue crate, the device itself is performing the address
translation when needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever the backing file of our virtio-block device is opened with
O_DIRECT, there's a requirement about the buffer address and size to be
aligned to the sector size.
We know virtio-block requests are sector aligned in terms of size, but
we must still check if the buffer address is. In case it's not, we
create an intermediate buffer that will be passed through the system
call. In case of a write operation, the content of the non-aligned
buffer must be copied beforehand, and in case of a read operation, the
content of the aligned buffer must be copied to the non-aligned one
after the operation has been completed.
Fixes#3587
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code can be written in a better form and the clippy warning
suppression can be dropped.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This crate contains up to date definition of the Queue, AvailIter,
DescriptorChain and Descriptor structures forked from the upstream
crate rust-vmm/vm-virtio 27b18af01ee2d9564626e084a758a2b496d2c618.
The following patches have been applied on top of this base in order to
make it work correctly with Cloud Hypervisor requirements:
- Add MSI vector field to the Queue
In order to help with MSI/MSI-X support, it is convenient to store the
value of the interrupt vector inside the Queue directly.
- Handle address translations
For devices with access to data in memory being translated, we add to
the Queue the ability to translate the address stored in the
descriptor.
It is very helpful as it performs the translation right after the
untranslated address is read from memory, avoiding any errors from
happening from the consumer's crate perspective. It also allows the
consumer to reduce greatly the amount of duplicated code for applying
the translation in many different places.
- Add helpers for Queue structure
They are meant to help crate's consumers getting/setting information
about the Queue.
These patches can be found on the 'ch' branch from the Cloud Hypervisor
fork: https://github.com/cloud-hypervisor/vm-virtio.git
This patch takes care of updating the Cloud Hypervisor code in
virtio-devices and vm-virtio to build correctly with the latest version
of virtio-queue.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
For simplicity this trait implements a default version that has a
topology with 512 byte (i.e. sector) recommended sizes.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The observation is that the code in question was used to bridge
synchronized and asynchronized code.
We can group the functions for that purpose under an adaptor trait. To
limit the scope of locking, the users of the trait are required to
implement a method to return a MutexGuard for the underlying file.
This then allows us to use concrete types (QcowFile and Vhdx) in code,
which is easier to read than a bunch of traits.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Previously mutex (semaphore) and file were separated. The code needed to
create artificial scopes to use mutex to protect file.
Rewrite the code to be idiomatic. The file itself is turned into a trait
object and placed inside the mutex. This requires providing a new
ReadWriteSeekFile trait to unify all helper functions.
The rewrite further simplified vhdx_sync code. The original code
contained two mutex'es for no apparent reason.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
As part of checking if io_uring is supported various functionality is
tested. The test for whether io_uring supports EventFds is very time
consuming (~10ms) however this test can be removed as a later test will
test for functionality added after this one.
The support for register_eventfd() was released in Linux 5.1 but the
support for register_probe() was released in Linux 5.4. So if the latter
is present the former also is.
Before:
cloud-hypervisor: 4.880411ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/home/rob/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk0"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 14.105123ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
cloud-hypervisor: 14.134837ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/tmp/disk"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk1"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 14.221869ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
After:
cloud-hypervisor: 3.140716ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/home/rob/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk0"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 3.376027ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
cloud-hypervisor: 3.40446ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/tmp/disk"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk1"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 3.513969ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the vm-virtio/virtio-queue crate from rust-vmm which has been
copied inside the Cloud Hypervisor tree, the entire codebase is moved to
the new definition of a Queue and other related structures.
The reason for this move is to follow the upstream until we get some
agreement for the patches that we need on top of that to make it
properly work with Cloud Hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
vhdx_sync.rs in block_util implements traits to represent the vhdx
crate as a supported block device in the cloud hypervisor. The vhdx
is added to the block device list in device_manager.rs at the vmm
crate so that it can automatically detect a vhdx disk and invoke the
corresponding crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Fazla Mehrab <akm.fazla.mehrab@intel.com>
Instead of panicking with an expect() function, the QcowDiskSync::new
function now propagates the error properly. This ensures the VMM will
not panic, which might be the source of weird errors if only one thread
exits while the VMM continues to run.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Issue from beta verion of clippy:
Error: --> vm-virtio/src/queue.rs:700:59
|
700 | if let Some(used_event) = self.get_used_event(&mem) {
| ^^^^ help: change this to: `mem`
|
= note: `-D clippy::needless-borrow` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
Signed-off-by: Bo Chen <chen.bo@intel.com>
As discussed in the working PR in the upstream vm-memory crate repo,
some special functions (e.g. return raw pointers to the wrapped guest
memory) require manual dirty page tracking from their users (e.g.the
VMM). One of the special functions is `VolatileSlice::as_ptr(), which is
used in our code base for supporting async block I/O. This patch
manually mark dirty for guest pages touched while reading from block
devices.
Signed-off-by: Bo Chen <chen.bo@intel.com>
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.
The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.
Signed-off-by: Bo Chen <chen.bo@intel.com>
With current serde_derive it is possible to #[derive(Serialize)] on
packed structures if they implement Copy. This allows the removal of the
manual equivalent code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: writing `&PathBuf` instead of `&Path` involves a new object where a slice will do.
--> block_util/src/lib.rs:68:31
|
68 | fn build_device_id(disk_path: &PathBuf) -> result::Result<String, Error> {
| ^^^^^^^^ help: change this to: `&Path`
|
= note: `-D clippy::ptr-arg` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
error: writing `&PathBuf` instead of `&Path` involves a new object where a slice will do.
--> block_util/src/lib.rs:83:39
|
83 | pub fn build_disk_image_id(disk_path: &PathBuf) -> Vec<u8> {
| ^^^^^^^^ help: change this to: `&Path`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
error: writing `&PathBuf` instead of `&Path` involves a new object where a slice will do.
--> block_util/src/lib.rs:68:31
|
68 | fn build_device_id(disk_path: &PathBuf) -> result::Result<String, Error> {
| ^^^^^^^^ help: change this to: `&Path`
|
= note: `-D clippy::ptr-arg` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
error: writing `&PathBuf` instead of `&Path` involves a new object where a slice will do.
--> block_util/src/lib.rs:83:39
|
83 | pub fn build_disk_image_id(disk_path: &PathBuf) -> Vec<u8> {
| ^^^^^^^^ help: change this to: `&Path`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: name `TYPE_UNKNOWN` contains a capitalized acronym
--> vm-virtio/src/lib.rs:48:5
|
48 | TYPE_UNKNOWN = 0xFF,
| ^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `Type_Unknown`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
error: name `GetDeviceID` contains a capitalized acronym
--> block_util/src/lib.rs:138:5
|
138 | GetDeviceID,
| ^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `GetDeviceId`
|
= note: `-D clippy::upper-case-acronyms` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There are multiple reports of DescriptorChainTooShort errors and so add
some extra debugging to aid the debugging of this issue.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The vhd module is the implementation of the VHD specification, which is
why it is important to unit test it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the simplified version of the synchronous support for RAW
disk files, the new fixed_vhd_sync module in the block_util crate
introduces the synchronous support for fixed VHD disk files.
With this patch, the fixed VHD support is complete as it is implemented
in both synchronous and asynchronous versions.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Using directly preadv and pwritev, we can simply use a RawFd instead of
a file, and we don't need to use the more complex implementation from
the qcow crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>