Now that all the preliminary work has been merged to make Cloud
Hypervisor work with the upstream virtio-queue crate from the
rust-vmm/vm-virtio repository, we can move the whole codebase over to it
and remove the local copy of the virtio-queue crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new trait simplifies the address translation of a GuestAddress by
having GuestAddress implement it.
The three crates virtio-devices, block_util and net_util have been
updated accordingly to rely on this new trait, helping with code
readability and limiting the amount of duplicated code.
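A minimal sketch of the shape this takes, with assumed names (the
Translatable trait and the local stand-in for the translation hook are
illustrative, not the exact definitions in the tree):

use vm_memory::GuestAddress;

/// Illustrative stand-in for the platform translation hook (the real
/// definition lives in vm-virtio); the method name is an assumption.
pub trait AccessPlatform {
    fn translate(&self, base: u64, size: u64) -> std::io::Result<u64>;
}

/// Hypothetical trait: any address-like type that can be translated
/// through an optional AccessPlatform.
pub trait Translatable: Sized {
    fn translate(&self, access_platform: Option<&dyn AccessPlatform>, len: usize) -> Self;
}

impl Translatable for GuestAddress {
    fn translate(&self, access_platform: Option<&dyn AccessPlatform>, len: usize) -> Self {
        match access_platform {
            // Translate the address when a platform is provided, keeping
            // the original value on error in this simplified sketch.
            Some(ap) => GuestAddress(ap.translate(self.0, len as u64).unwrap_or(self.0)),
            None => *self,
        }
    }
}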
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving the whole codebase to rely on the AccessPlatform definition from
vm-virtio so that we can fully remove it from the virtio-queue crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since we're trying to move away from the translation happening in the
virtio-queue crate, the device itself now performs the address
translation when needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever the backing file of our virtio-block device is opened with
O_DIRECT, the buffer address and size passed to the system call must be
aligned to the sector size.
We know virtio-block requests are sector aligned in terms of size, but
we must still check whether the buffer address is. If it is not, we
create an intermediate aligned buffer that is passed to the system call.
For a write operation, the content of the non-aligned buffer must be
copied into it beforehand; for a read operation, the content of the
aligned buffer must be copied back to the non-aligned one after the
operation has completed.
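A rough sketch of the write path, with illustrative names (the alignment
constant and the helper are assumptions, not the exact block_util code):

use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::io;

/// O_DIRECT requires buffer addresses (and sizes) aligned to the sector size.
const SECTOR_SIZE: usize = 512;

/// Write `buf` (whose length is sector aligned, as virtio-block requests
/// are) through an O_DIRECT-backed writer, bouncing through a
/// sector-aligned intermediate buffer when the address is not aligned.
fn write_possibly_unaligned<W>(mut do_write: W, buf: &[u8]) -> io::Result<usize>
where
    W: FnMut(&[u8]) -> io::Result<usize>,
{
    if (buf.as_ptr() as usize) % SECTOR_SIZE == 0 {
        // Address already aligned: pass the guest buffer straight through.
        return do_write(buf);
    }

    // Allocate a sector-aligned intermediate buffer.
    let layout = Layout::from_size_align(buf.len(), SECTOR_SIZE).unwrap();
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // For a write, copy the unaligned content in before the syscall;
        // for a read, the copy back would happen after the syscall completes.
        std::ptr::copy_nonoverlapping(buf.as_ptr(), ptr, buf.len());
        let result = do_write(std::slice::from_raw_parts(ptr, buf.len()));
        dealloc(ptr, layout);
        result
    }
}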
Fixes: #3587
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code can be written in a better form and the clippy warning
suppression can be dropped.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This crate contains up-to-date definitions of the Queue, AvailIter,
DescriptorChain and Descriptor structures, forked from the upstream
rust-vmm/vm-virtio crate at commit 27b18af01ee2d9564626e084a758a2b496d2c618.
The following patches have been applied on top of this base in order to
make it work correctly with Cloud Hypervisor requirements:
- Add MSI vector field to the Queue
In order to help with MSI/MSI-X support, it is convenient to store the
value of the interrupt vector inside the Queue directly.
- Handle address translations
For devices whose access to memory is subject to address translation,
the Queue gains the ability to translate the address stored in each
descriptor.
This is very helpful because the translation is performed right after
the untranslated address is read from memory, preventing errors from
the consumer crate's perspective. It also lets the consumer greatly
reduce the amount of duplicated code for applying the translation in
many different places.
- Add helpers for the Queue structure
They are meant to help the crate's consumers get and set information
about the Queue; a rough sketch of the vector field and such helpers
follows this list.
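An illustrative sketch of these additions (field and method names are
assumptions, not the exact fork API):

/// Illustrative sketch only: an MSI/MSI-X vector stored directly in the
/// queue, plus small getter/setter helpers for consumers.
pub struct Queue {
    size: u16,
    ready: bool,
    /// Interrupt vector associated with this queue (MSI/MSI-X support).
    vector: u16,
}

impl Queue {
    /// Helper letting consumers associate an interrupt vector with the queue.
    pub fn set_vector(&mut self, vector: u16) {
        self.vector = vector;
    }

    /// Helper returning the interrupt vector associated with the queue.
    pub fn vector(&self) -> u16 {
        self.vector
    }

    /// Helper exposing the queue size negotiated by the guest.
    pub fn size(&self) -> u16 {
        self.size
    }

    /// Helper exposing whether the guest has marked the queue ready.
    pub fn ready(&self) -> bool {
        self.ready
    }
}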
These patches can be found on the 'ch' branch from the Cloud Hypervisor
fork: https://github.com/cloud-hypervisor/vm-virtio.git
This patch takes care of updating the Cloud Hypervisor code in
virtio-devices and vm-virtio to build correctly with the latest version
of virtio-queue.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
For simplicity, this trait provides a default implementation that
reports a topology with 512-byte (i.e. sector-sized) recommended sizes.
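A minimal sketch of what such a default might look like (the trait and
struct names are assumptions, not necessarily the actual block_util
definitions):

/// Illustrative disk topology: all sizes in bytes.
#[derive(Clone, Copy, Debug)]
pub struct DiskTopology {
    pub logical_block_size: u64,
    pub physical_block_size: u64,
    pub minimum_io_size: u64,
    pub optimal_io_size: u64,
}

impl Default for DiskTopology {
    fn default() -> Self {
        // Default to plain 512-byte sectors everywhere.
        DiskTopology {
            logical_block_size: 512,
            physical_block_size: 512,
            minimum_io_size: 512,
            optimal_io_size: 512,
        }
    }
}

pub trait BlockBacked {
    /// Default implementation: report a simple 512-byte sector topology;
    /// backends that know better can override this.
    fn topology(&self) -> DiskTopology {
        DiskTopology::default()
    }
}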
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The observation is that the code in question was used to bridge
synchronous and asynchronous code.
We can group the functions for that purpose under an adaptor trait. To
limit the scope of locking, the users of the trait are required to
implement a method to return a MutexGuard for the underlying file.
This then allows us to use concrete types (QcowFile and Vhdx) in code,
which is easier to read than a bunch of traits.
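A rough sketch of the adaptor shape, with assumed names (the real trait
in block_util may differ):

use std::io::{Read, Seek, SeekFrom, Write};
use std::sync::MutexGuard;

/// Illustrative adaptor bridging the synchronous image types (QcowFile,
/// Vhdx, ...) to the asynchronous I/O path. Implementors only expose a
/// guard on their underlying file; provided methods can then hold the
/// lock for the shortest possible scope.
pub trait AsyncAdaptor<F>
where
    F: Read + Write + Seek,
{
    /// Return a guard on the file protected by the implementor's mutex.
    fn file(&mut self) -> MutexGuard<F>;

    /// Provided helper: read at `offset`, holding the lock only for the
    /// duration of the seek and read.
    fn read_at(&mut self, buf: &mut [u8], offset: u64) -> std::io::Result<usize> {
        let mut file = self.file();
        file.seek(SeekFrom::Start(offset))?;
        file.read(buf)
    }
}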
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Previously, the mutex (semaphore) and the file were separate. The code
needed to create artificial scopes in order to use the mutex to protect
the file.
Rewrite the code to be idiomatic. The file itself is turned into a trait
object and placed inside the mutex. This requires providing a new
ReadWriteSeekFile trait to unify all helper functions.
The rewrite further simplifies the vhdx_sync code. The original code
contained two mutexes for no apparent reason.
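Roughly, the resulting shape (names and bounds here are illustrative):

use std::io::{Read, Seek, Write};
use std::sync::Mutex;

/// Unifying trait so helper functions can work on any file-like backend.
pub trait ReadWriteSeekFile: Read + Write + Seek {}

// Blanket implementation: anything readable, writable and seekable counts.
impl<T: Read + Write + Seek> ReadWriteSeekFile for T {}

/// The file is now a trait object living inside the mutex, so locking
/// the mutex is the only way to reach it (no artificial scopes needed).
pub struct VhdxDiskSync {
    file: Mutex<Box<dyn ReadWriteSeekFile + Send>>,
}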
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
As part of checking whether io_uring is supported, various functionality
is tested. The test for whether io_uring supports EventFds is very time
consuming (~10ms); however, it can be removed because a later test checks
for functionality that was added to the kernel after this one.
Support for register_eventfd() was released in Linux 5.1, while support
for register_probe() was released in Linux 5.4, so if the latter is
present the former is too.
Before:
cloud-hypervisor: 4.880411ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/home/rob/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk0"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 14.105123ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
cloud-hypervisor: 14.134837ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/tmp/disk"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk1"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 14.221869ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
After:
cloud-hypervisor: 3.140716ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/home/rob/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk0"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 3.376027ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
cloud-hypervisor: 3.40446ms: <vmm> INFO:vmm/src/device_manager.rs:1916 -- Creating virtio-block device: DiskConfig { path: Some("/tmp/disk"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk1"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 3.513969ms: <vmm> INFO:vmm/src/device_manager.rs:1998 -- Using asynchronous RAW disk file (io_uring)
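For reference, a probe-based check along these lines is sufficient on its
own, since a kernel that can register a probe necessarily supports
eventfd registration (sketch using the io-uring crate; the opcodes
checked are illustrative):

use io_uring::{opcode, IoUring, Probe};

/// Returns true when the kernel's io_uring support is recent enough:
/// register_probe() only exists from Linux 5.4, which already implies
/// register_eventfd() (Linux 5.1), so no separate EventFd test is needed.
fn io_uring_is_recent_enough() -> bool {
    let ring = match IoUring::new(8) {
        Ok(ring) => ring,
        Err(_) => return false,
    };

    let mut probe = Probe::new();
    if ring.submitter().register_probe(&mut probe).is_err() {
        return false;
    }

    // Check the specific opcodes the block device relies on (illustrative).
    probe.is_supported(opcode::Readv::CODE) && probe.is_supported(opcode::Writev::CODE)
}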
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the vm-virtio/virtio-queue crate from rust-vmm, which has been
copied inside the Cloud Hypervisor tree, the entire codebase is moved to
the new definition of a Queue and other related structures.
The reason for this move is to follow upstream until we reach agreement
on the patches we need on top of it to make it work properly with Cloud
Hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This was causing some issues because of the use of two different versions
of the vm-memory crate. We'll wait for all dependencies to be properly
resolved before we move to 0.7.0.
This reverts commit 76b6c62d07.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
vhdx_sync.rs in block_util implements the traits needed to expose the
vhdx crate as a supported block device in Cloud Hypervisor. The VHDX
format is added to the block device list in device_manager.rs in the vmm
crate so that it can automatically detect a VHDX disk and invoke the
corresponding crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Fazla Mehrab <akm.fazla.mehrab@intel.com>
Instead of panicking with an expect() function, the QcowDiskSync::new
function now propagates the error properly. This ensures the VMM will
not panic; such a panic could be the source of confusing errors if only
one thread exits while the VMM continues to run.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This doesn't really affect the build as we ship a Cargo.lock with fixed
versions in it. However, for clarity, it makes sense to use fixed versions
throughout and let dependabot update them.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Issue from the beta version of clippy:
Error: --> vm-virtio/src/queue.rs:700:59
|
700 | if let Some(used_event) = self.get_used_event(&mem) {
| ^^^^ help: change this to: `mem`
|
= note: `-D clippy::needless-borrow` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
Signed-off-by: Bo Chen <chen.bo@intel.com>
As discussed in the ongoing PR in the upstream vm-memory crate repo,
some special functions (e.g. those that return raw pointers to the
wrapped guest memory) require manual dirty page tracking from their
users (e.g. the VMM). One of these special functions is
`VolatileSlice::as_ptr()`, which is used in our code base to support
async block I/O. This patch manually marks as dirty the guest pages
touched while reading from block devices.
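A sketch of the idea, assuming vm-memory's Bitmap interface (the exact
call site in the async block path differs):

use vm_memory::bitmap::{Bitmap, BitmapSlice};
use vm_memory::VolatileSlice;

/// Sketch: when the raw pointer from VolatileSlice::as_ptr() is handed
/// to the kernel (e.g. for an io_uring read), vm-memory cannot observe
/// the write, so the touched range is marked dirty by hand once the
/// operation completes.
fn mark_guest_buffer_dirty<B: BitmapSlice>(slice: &VolatileSlice<B>, len: usize) {
    // Mark `len` bytes from the start of the slice as written.
    slice.bitmap().mark_dirty(0, len);
}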
Signed-off-by: Bo Chen <chen.bo@intel.com>
As the first step toward completing live migration with tracking of
dirty pages written by the VMM, this commit patches the dependent
vm-memory crate to the upstream version with the dirty-page-tracking
capability. Most changes are due to the updated `GuestMemoryMmap`,
`GuestRegionMmap`, and `MmapRegion` structs, which now take an additional
generic type parameter to specify which 'bitmap backend' is used.
These changes should be transparent to the rest of the code base, e.g.
all unit/integration tests should pass without additional changes.
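Concretely, the change mostly amounts to threading the bitmap type
parameter through the guest memory types, along these lines (using
AtomicBitmap as the backend is an assumption for illustration):

use vm_memory::bitmap::AtomicBitmap;

// The guest memory types now carry the bitmap backend used for
// dirty-page tracking as a generic parameter; crate-local aliases keep
// the rest of the code unchanged.
pub type GuestMemoryMmap = vm_memory::GuestMemoryMmap<AtomicBitmap>;
pub type GuestRegionMmap = vm_memory::GuestRegionMmap<AtomicBitmap>;
pub type MmapRegion = vm_memory::MmapRegion<AtomicBitmap>;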
Signed-off-by: Bo Chen <chen.bo@intel.com>
With current serde_derive it is possible to #[derive(Serialize)] on
packed structures if they implement Copy. This allows the removal of the
manual equivalent code.
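For example, a definition along these lines now derives cleanly (the
struct shown is illustrative; serde_derive copies the fields out because
the type is Copy, instead of taking references to packed fields):

use serde::Serialize;

// serde_derive can generate Serialize for a packed struct as long as it
// is Copy, since taking references to packed fields is not allowed.
#[derive(Clone, Copy, Serialize)]
#[repr(C, packed)]
struct VirtioBlkGeometry {
    cylinders: u16,
    heads: u8,
    sectors: u8,
}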
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: writing `&PathBuf` instead of `&Path` involves a new object where a slice will do.
--> block_util/src/lib.rs:68:31
|
68 | fn build_device_id(disk_path: &PathBuf) -> result::Result<String, Error> {
| ^^^^^^^^ help: change this to: `&Path`
|
= note: `-D clippy::ptr-arg` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
error: writing `&PathBuf` instead of `&Path` involves a new object where a slice will do.
--> block_util/src/lib.rs:83:39
|
83 | pub fn build_disk_image_id(disk_path: &PathBuf) -> Vec<u8> {
| ^^^^^^^^ help: change this to: `&Path`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
error: name `TYPE_UNKNOWN` contains a capitalized acronym
--> vm-virtio/src/lib.rs:48:5
|
48 | TYPE_UNKNOWN = 0xFF,
| ^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `Type_Unknown`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
error: name `GetDeviceID` contains a capitalized acronym
--> block_util/src/lib.rs:138:5
|
138 | GetDeviceID,
| ^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `GetDeviceId`
|
= note: `-D clippy::upper-case-acronyms` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There are multiple reports of DescriptorChainTooShort errors, so add
some extra debug output to aid the investigation of this issue.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The vhd module is the implementation of the VHD specification, which is
why it is important to unit test it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the simplified version of the synchronous support for RAW
disk files, the new fixed_vhd_sync module in the block_util crate
introduces the synchronous support for fixed VHD disk files.
With this patch, the fixed VHD support is complete as it is implemented
in both synchronous and asynchronous versions.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By using preadv and pwritev directly, we can simply use a RawFd instead
of a File, and we don't need the more complex implementation from the
qcow crate.
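A minimal sketch of the approach, calling preadv(2) directly on the
RawFd (error handling simplified):

use std::io;
use std::os::unix::io::RawFd;

/// Read into a set of buffers at `offset` using preadv(2) directly on
/// the raw file descriptor, without going through a File object.
fn read_vectored_at(fd: RawFd, iovecs: &[libc::iovec], offset: libc::off_t) -> io::Result<usize> {
    // Safe as long as every iovec points to valid, writable memory for
    // its full length (the caller's responsibility here).
    let ret = unsafe { libc::preadv(fd, iovecs.as_ptr(), iovecs.len() as libc::c_int, offset) };
    if ret < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(ret as usize)
}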
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit adds the asynchronous support for fixed VHD disk files.
It introduces FixedVhd as a new ImageType, moving the image type
detection to the block_util crate (instead of the qcow crate).
It creates a new vhd module in the block_util crate in order to handle
the VHD footer, following the VHD specification.
It creates a new fixed_vhd_async module in the block_util crate to
implement the asynchronous version of fixed VHD disk files. It relies on
io_uring.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the QCOW and RAW synchronous implementations are very similar, it
makes sense to introduce some common functions that can be shared
between the two.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the synchronous QCOW file implementation present in the qcow
crate, we created a new qcow_sync module in block_util that ports this
synchronous implementation to the AsyncIo trait.
The point is to reuse the virtio-blk asynchronous implementation for both
synchronous and asynchronous backends.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the synchronous RAW file implementation present in the qcow
crate, we created a new raw_sync module in block_util that ports this
synchronous implementation to the AsyncIo trait.
The point is to reuse the virtio-blk asynchronous implementation for both
synchronous and asynchronous backends.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the new DiskFile and AsyncIo traits, the implementation of
asynchronous block support does not have to be tied to io_uring anymore.
Instead, the only thing the virtio-blk implementation knows is that it
is using an asynchronous implementation of the underlying disk file.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both the DiskFile and AsyncIo traits are introduced to allow all kinds
of files (RAW, QCOW, VHD) to handle asynchronous access to the
underlying file.
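In spirit, the two traits look something like this (a simplified sketch,
not the exact block_util signatures):

use std::io;
use vmm_sys_util::eventfd::EventFd;

/// A disk image backend: reports its size and can hand out an
/// asynchronous I/O context.
pub trait DiskFile: Send + Sync {
    fn size(&mut self) -> io::Result<u64>;
    fn new_async_io(&self, ring_depth: u32) -> io::Result<Box<dyn AsyncIo>>;
}

/// An asynchronous I/O context: requests are submitted with a user data
/// token and completions are reaped later, signalled through an EventFd.
pub trait AsyncIo: Send {
    fn notifier(&self) -> &EventFd;
    fn read_vectored(&mut self, offset: libc::off_t, iovecs: Vec<libc::iovec>, user_data: u64) -> io::Result<()>;
    fn write_vectored(&mut self, offset: libc::off_t, iovecs: Vec<libc::iovec>, user_data: u64) -> io::Result<()>;
    fn fsync(&mut self, user_data: Option<u64>) -> io::Result<()>;
    fn complete(&mut self) -> Vec<(u64, i32)>;
}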
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Small patch creating a dedicated `block_io_uring_is_supported()`
function for the non-io_uring case, so that we can simplify the
code in the DeviceManager.
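A sketch of the shape this takes (the feature name and the probing helper
are assumptions for illustration):

// With io_uring support compiled in, actually probe the running kernel.
#[cfg(feature = "io_uring")]
pub fn block_io_uring_is_supported() -> bool {
    // probe_io_uring_capabilities() is a hypothetical helper standing in
    // for the real kernel capability check.
    probe_io_uring_capabilities()
}

// Without io_uring support, the answer is trivially "no", and the
// DeviceManager can call the same function unconditionally.
#[cfg(not(feature = "io_uring"))]
pub fn block_io_uring_is_supported() -> bool {
    false
}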
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A new version of vm-memory was released upstream, which resulted in some
components pulling in that new version. Update the version number used
to point to the latest version, but continue to use our patched version
due to the fix for #1258.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Some operations complete directly after they have been submitted, which
means they are not handled asynchronously and therefore don't generate
any ioevent. This is the reason why we were not processing some of the
completed operations, which led to some unpredictable behaviors.
Forcing all io_uring operations submitted through the submission queue
to be asynchronous simplifies the code, as it ensures the completion of
every operation will generate an ioevent, so no operation is missed.
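With the io-uring crate this amounts to setting the ASYNC flag on every
submission queue entry, along these lines (the opcode shown is
illustrative):

use io_uring::{opcode, squeue, types};

/// Build a readv submission entry that is forced to run asynchronously:
/// with squeue::Flags::ASYNC the operation never completes inline at
/// submit time, so every completion raises the registered eventfd.
fn build_async_readv(fd: i32, iovecs: &[libc::iovec], user_data: u64) -> squeue::Entry {
    // The real code also sets the file offset for the request; omitted here.
    opcode::Readv::new(types::Fd(fd), iovecs.as_ptr(), iovecs.len() as u32)
        .build()
        .flags(squeue::Flags::ASYNC)
        .user_data(user_data)
}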
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The Windows virtio-block driver puts multiple data descriptors between
the header and the status footer. To handle this, when parsing, iterate
over the descriptor chain until the end is reached, accumulating the
address and length pairs in a vector. For execution, iterate over the
vector and make sequential reads from the disk for each data descriptor.
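Sketched with illustrative types (the real code walks the virtio
descriptor chain and handles the header and status descriptors
separately):

use vm_memory::GuestAddress;

/// Minimal stand-in for a descriptor as seen while walking the chain.
struct Desc {
    addr: GuestAddress,
    len: u32,
}

/// Collect the (address, length) pair of every data descriptor found
/// between the request header and the status footer. Execution then
/// iterates over this vector, issuing one sequential disk access per pair.
fn collect_data_descriptors(chain: impl Iterator<Item = Desc>) -> Vec<(GuestAddress, u32)> {
    let mut ranges = Vec::new();
    for desc in chain {
        ranges.push((desc.addr, desc.len));
    }
    ranges
}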
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation of supporting multiple virtio descriptors, let's make
sure the read/write operations are performed with vectored I/O.
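For instance, a write can gather several buffers into a single syscall
(a minimal sketch using the standard library's vectored write):

use std::fs::File;
use std::io::{self, IoSlice, Write};

/// Write several guest buffers to the disk file with a single vectored
/// write instead of one write per buffer.
fn write_buffers(file: &mut File, buffers: &[&[u8]]) -> io::Result<usize> {
    let slices: Vec<IoSlice> = buffers.iter().map(|b| IoSlice::new(b)).collect();
    file.write_vectored(&slices)
}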
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>