Whether we're translating a GVA or a GPA, we must return an error if
the address couldn't be translated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever a virtio device is placed behind a vIOMMU, we have some code in
pci_common_config.rs to translate the queue addresses (descriptor table,
available ring and used ring) from GVA to GPA, so that they can be used
correctly.
But in case of vDPA, we also need to provide the queue addresses to the
vhost backend. And since the vhost backend deals with consistent IOVAs,
all addresses being provided should be GVAs if the device is placed
behind a vIOMMU. For that reason, we perform a translation of the queue
addresses back from GPA to GVA if necessary, and only to be provided to
the vhost backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case an external mapping would have been added after the virtio-iommu
device has been activated, it would simply have been ignored because the
code wasn't using a shared object between the vmm thread and the iommu
thread. This behavior is only triggered on the hotplug codepath, and
only if the hotplugged device is placed behind the virtual IOMMU.
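As a minimal sketch of the idea, assuming the mappings are kept in a table
keyed by a device identifier, the snippet below shares one table between
both threads so that an entry added after activation is still visible.
ExternalMapping, SharedMappings and both helper functions are hypothetical
names for this illustration, not the actual Cloud Hypervisor types.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for an external DMA mapping handler.
#[derive(Debug, Clone, PartialEq)]
pub struct ExternalMapping {
    pub name: String,
}

/// One table shared by every thread that holds a clone of the Arc.
pub type SharedMappings = Arc<Mutex<BTreeMap<u32, ExternalMapping>>>;

/// Called from the vmm thread, possibly after the device was activated.
pub fn add_external_mapping(table: &SharedMappings, device_id: u32, name: &str) {
    table.lock().unwrap().insert(
        device_id,
        ExternalMapping {
            name: name.to_string(),
        },
    );
}

/// Simulates the iommu thread: it looks the mapping up through its own
/// handle to the same table, so late additions are not lost.
pub fn lookup_from_iommu_thread(
    table: &SharedMappings,
    device_id: u32,
) -> Option<ExternalMapping> {
    let table = Arc::clone(table);
    thread::spawn(move || table.lock().unwrap().get(&device_id).cloned())
        .join()
        .unwrap()
}
```

Had each thread owned a private copy of the table, the hotplugged entry
would only exist on the vmm side, which matches the symptom described here.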
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation of vDPA needing to translate a GPA back into a GVA, we
extend the existing traits DmaRemapping and AccessPlatform to perform
such an operation.
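A rough sketch of a trait covering both directions could look like the
following. The method name translate_gpa(), the signatures and the
LinearWindow type are assumptions made for this example and do not claim
to match the real definitions.

```rust
use std::io;

/// Sketch of a translation trait covering both directions.
/// The signatures are assumed for illustration only.
pub trait AccessPlatform {
    /// Translate a guest virtual address (IOVA) into a guest physical address.
    fn translate_gva(&self, base: u64, size: u64) -> io::Result<u64>;
    /// Translate a guest physical address back into a guest virtual address.
    fn translate_gpa(&self, base: u64, size: u64) -> io::Result<u64>;
}

/// Hypothetical mapping: a single contiguous window of `len` bytes that
/// maps GVA `gva` onto GPA `gpa`.
pub struct LinearWindow {
    pub gva: u64,
    pub gpa: u64,
    pub len: u64,
}

impl LinearWindow {
    /// Rebase `[base, base + size)` from the window starting at `from`
    /// onto the window starting at `to`, failing if it does not fit.
    fn rebase(&self, base: u64, size: u64, from: u64, to: u64) -> io::Result<u64> {
        let err = || {
            io::Error::new(
                io::ErrorKind::Other,
                format!("failed to translate {:#x}", base),
            )
        };
        let end = base.checked_add(size).ok_or_else(err)?;
        let limit = from.checked_add(self.len).ok_or_else(err)?;
        if base < from || end > limit {
            return Err(err());
        }
        to.checked_add(base - from).ok_or_else(err)
    }
}

impl AccessPlatform for LinearWindow {
    fn translate_gva(&self, base: u64, size: u64) -> io::Result<u64> {
        self.rebase(base, size, self.gva, self.gpa)
    }

    fn translate_gpa(&self, base: u64, size: u64) -> io::Result<u64> {
        self.rebase(base, size, self.gpa, self.gva)
    }
}
```

Both directions return an error rather than a fallback address when the
range is not covered by the mapping.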
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Renaming translate() to translate_gva() to clarify we want to translate
a GVA into a GPA.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By enabling the VIRTIO feature VIRTIO_F_IOMMU_PLATFORM for all
vhost-user devices when needed, we force the guest to use the DMA API,
making these devices compatible with TDX. By using DMA API, the guest
triggers the TDX codepath to share some of the guest memory, in
particular the virtqueues and associated buffers so that the VMM and
vhost-user backends/processes can access this memory.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we rely on vhost v0.4.0, which contains the fix for
get_iova_range(), we don't need the workaround anymore, and we can
actually call into the dedicated function.
Fixes: #3861
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The file descriptor provided to fs_slave_map() and fs_slave_io() is
passed as an AsRawFd trait, meaning the caller owns it. For that reason,
there's no need for these functions to close the file descriptor as it
will be closed later on anyway.
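The ownership rule can be illustrated with a small sketch. borrow_fd() and
caller() are hypothetical helpers, not the fs_slave_map() or fs_slave_io()
implementations, and the example assumes a Unix host where /dev/null exists.

```rust
use std::fs::File;
use std::os::unix::io::{AsRawFd, RawFd};

/// Borrows the descriptor for the duration of the call. The function must
/// not close it, because the caller still owns the underlying object.
pub fn borrow_fd(fd: &dyn AsRawFd) -> RawFd {
    // No close() here: the owner closes the descriptor when it is dropped.
    fd.as_raw_fd()
}

/// Caller side: `File` owns the descriptor and closes it exactly once,
/// when it goes out of scope.
pub fn caller() -> std::io::Result<RawFd> {
    let file = File::open("/dev/null")?;
    let raw = borrow_fd(&file);
    // `file` is still valid after the call and can keep being used.
    let _ = file.metadata()?;
    Ok(raw)
}
```

Closing the raw descriptor inside borrow_fd() would lead to a second close
when `file` is dropped, possibly on a descriptor number reused by then.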
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
vDPA is a kernel framework introduced fairly recently in order to handle
devices complying with virtio specification on their datapath, while the
control path is vendor specific. For the datapath, that means the
virtqueues are handled through DMA directly between the hardware and the
guest, while the control path goes through the vDPA framework,
eventually exposed through a vhost-vdpa device.
vDPA, like VFIO, aims at achieving baremetal performance for devices
that are passed into a VM. But unlike VFIO, it provides a simpler/better
framework for achieving migration. Because the DMA accesses between the
device and the guest are going through virtio queues, migration can be
achieved way more easily, and doesn't require each device driver to
implement the migration support. In the VFIO case, each vendor is
expected to provide an implementation of the VFIO migration framework,
which makes things harder as it must be done for each and every device.
To summarize, the point is to support migration for hardware devices
through which we can achieve baremetal performance.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Given that some virtio devices might need some DMA handling, we provide a
way to store this through the VirtioPciDevice layer, so that it can be
accessed when the PCI device is removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With the VIRTIO_F_EVENT_IDX handling now conducted inside the
virtio-queue crate it is necessary to activate the functionality on
every queue if it is negotiated. Otherwise this leads to a failure of
the guest to signal to the host that there is something in the available
queue as the queue's internal state has not been configured correctly.
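The activation step can be sketched as follows. The Queue type below is a
simplified stand-in for the one from the virtio-queue crate, and
configure_queues() is a hypothetical helper; only the feature bit position
(29) comes from the VIRTIO specification.

```rust
/// Bit position of the event index feature in the VIRTIO feature set.
pub const VIRTIO_F_EVENT_IDX: u64 = 29;

/// Simplified stand-in for a virtqueue; only the event index flag is modeled.
#[derive(Default)]
pub struct Queue {
    event_idx_enabled: bool,
}

impl Queue {
    pub fn set_event_idx(&mut self, enabled: bool) {
        self.event_idx_enabled = enabled;
    }

    pub fn event_idx_enabled(&self) -> bool {
        self.event_idx_enabled
    }
}

/// Propagate the negotiated feature to every queue before the device
/// starts processing them.
pub fn configure_queues(queues: &mut [Queue], acked_features: u64) {
    let event_idx = (acked_features & (1 << VIRTIO_F_EVENT_IDX)) != 0;
    for queue in queues.iter_mut() {
        queue.set_event_idx(event_idx);
    }
}
```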
Fixes: #3829
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
After writing to an address, Windows 11 on ARM64 unconditionally reads
it back. It is harmless. Drop the error message to avoid spamming.
Fixes: #3732
Signed-off-by: Wei Liu <liuwe@microsoft.com>
error: writing `&mut Vec` instead of `&mut [_]` involves a new object where a slice will do
  --> virtio-devices/src/transport/pci_common_config.rs:93:17
   |
93 |         queues: &mut Vec<Queue<GuestMemoryAtomic<GuestMemoryMmap>>>,
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: change this to: `&mut [Queue<GuestMemoryAtomic<GuestMemoryMmap>>]`
   |
   = note: `-D clippy::ptr-arg` implied by `-D warnings`
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
Signed-off-by: Akira Moroo <retrage01@gmail.com>
For vhost-user devices, we don't want to lose the vhost-user protocol
feature through the negotiation between guest and device. Since we know
VIRTIO has no knowledge of the vhost-user protocol feature, there is no
way it would ever be acknowledged by the guest. For that reason, we
create each vhost-user device with the set of acked features containing
the vhost-user protocol feature if this one was part of the available
list.
Having the set of acked features containing this bit allows for solving
a bug that was happening through the migration process since the
vhost-user protocol feature wasn't explicitly enabled.
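A minimal sketch of the seeding logic, assuming features are tracked as
u64 bitmasks. initial_acked_features() and ack_features() are hypothetical
helpers; the bit position (30) is the one used by the vhost-user protocol.

```rust
/// Bit position of the vhost-user protocol features flag.
pub const VHOST_USER_F_PROTOCOL_FEATURES: u64 = 30;

/// Seed the acked set with the protocol feature bit when the backend
/// advertises it, since the guest will never acknowledge it itself.
pub fn initial_acked_features(avail_features: u64) -> u64 {
    avail_features & (1 << VHOST_USER_F_PROTOCOL_FEATURES)
}

/// Merge what the guest acknowledged on top of the seeded set, keeping
/// only the bits the device actually offered.
pub fn ack_features(seeded: u64, avail_features: u64, guest_acked: u64) -> u64 {
    seeded | (guest_acked & avail_features)
}
```

With this seeding, the bit is still present in the acked set after the
guest negotiation, which is what the migration path relies on.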
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Implement the VIRTIO_BALLOON_F_REPORTING feature, indicating to the
guest it can report sets of free pages. A new virtqueue dedicated to
receiving the information about the free pages is created. The VMM
releases the memory by punching holes with fallocate() if the guest
memory is backed by a file, and by calling madvise() to tell the host
which ranges of memory are no longer needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding a new parameter free_page_reporting=on|off to the balloon device
so that we can enable the corresponding feature from virtio-balloon.
Running a VM with a balloon device where this feature is enabled allows
the guest to report pages that are free from guest's perspective. This
information is used by the VMM to release the corresponding pages on the
host.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Improving the existing code for better readability and in anticipation
of adding an additional virtqueue for the free page reporting feature.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This should not occur as ioeventfd is used for notification. Such an
error message would have made the discovery of the underlying cause of
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to clearly decouple when the migration is started compared to
when the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we don't end up using
start_dirty_log() for identifying when the migration has been started.
And similarly, we rely on the already existing complete_migration()
method to know when the migration has been ended.
A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was that the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.
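The decoupling can be sketched as below. The trait is reduced to the three
methods discussed here, the name start_migration() and all signatures are
assumptions, and VhostUserDevice is a hypothetical device used only to show
the flag handling.

```rust
/// Reduced sketch of a migration trait with a dedicated "started" hook.
/// Default implementations do nothing.
pub trait Migratable {
    fn start_dirty_log(&mut self) {}
    fn start_migration(&mut self) {}
    fn complete_migration(&mut self) {}
}

/// Hypothetical device tracking both states independently.
#[derive(Default)]
pub struct VhostUserDevice {
    pub migration_started: bool,
    pub dirty_log_started: bool,
}

impl Migratable for VhostUserDevice {
    fn start_dirty_log(&mut self) {
        // Only concerns dirty page tracking, not the migration state.
        self.dirty_log_started = true;
    }

    fn start_migration(&mut self) {
        self.migration_started = true;
    }

    fn complete_migration(&mut self) {
        self.migration_started = false;
    }
}
```

In a local migration where dirty logging is never started, the flag is now
set through start_migration() alone.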
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that all the preliminary work has been merged to make Cloud
Hypervisor work with the upstream crate virtio-queue from
rust-vmm/vm-virtio repository, we can move the whole codebase and remove
the local copy of the virtio-queue crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new trait simplifies the address translation of a GuestAddress by
having GuestAddress implement it.
The three crates virtio-devices, block_util and net_util have been
updated accordingly to rely on this new trait, helping with code
readability and limiting the amount of duplicated code.
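A sketch of what such a trait could look like is given below. The name
Translatable, the signatures, the local GuestAddress newtype and the
FixedOffset platform are all assumptions made for this example.

```rust
use std::io;

/// Reduced stand-in for the platform-specific translation hook.
pub trait AccessPlatform {
    fn translate_gva(&self, base: u64, size: u64) -> io::Result<u64>;
}

/// Local newtype standing in for the vm-memory GuestAddress.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GuestAddress(pub u64);

/// Hypothetical helper trait: translate only when a platform is present.
pub trait Translatable: Sized {
    fn translate_gva(
        self,
        access_platform: Option<&dyn AccessPlatform>,
        len: usize,
    ) -> io::Result<Self>;
}

impl Translatable for GuestAddress {
    fn translate_gva(
        self,
        access_platform: Option<&dyn AccessPlatform>,
        len: usize,
    ) -> io::Result<Self> {
        match access_platform {
            // Device behind a vIOMMU: ask the platform for the translation.
            Some(platform) => platform
                .translate_gva(self.0, len as u64)
                .map(GuestAddress),
            // No vIOMMU: the address is already usable as-is.
            None => Ok(self),
        }
    }
}

/// Hypothetical platform adding a fixed offset to every address.
pub struct FixedOffset(pub u64);

impl AccessPlatform for FixedOffset {
    fn translate_gva(&self, base: u64, _size: u64) -> io::Result<u64> {
        base.checked_add(self.0)
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "address overflow"))
    }
}
```

Call sites then shrink to a single method call on the address, whether or
not an access platform has been set.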
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving the whole codebase to rely on the AccessPlatform definition from
vm-virtio so that we can fully remove it from virtio-queue crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving away from the virtio-queue mechanism for descriptor address
translation. Instead, we enable the new mechanism added to every
VirtioDevice implementation, by setting the AccessPlatform trait if one
can be found.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since we're trying to move away from the translation happening in the
virtio-queue crate, the device itself is performing the address
translation when needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new method set_access_platform() to the VirtioDevice trait in
order to allow an AccessPlatform trait to be set up on any virtio device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Upon the enablement of the queue by the guest, we perform a translation
of the descriptor table, the available ring and used ring addresses
prior to enabling the device itself. This only applies to the case where
the device is placed behind a vIOMMU, which is the reason why the
translation is needed. Indeed, the addresses allocated by the guest are
IOVAs which must be translated into GPAs before we can access the queue.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of relying on the virtio-queue crate to store the information
about the MSI-X vectors for each queue, we handle this directly from the
PCI transport layer.
This is the first step in getting closer to the upstream version of
virtio-queue so that we can eventually move fully to the upstream
version.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When freeing memory sometimes glibc will attempt to read
"/proc/sys/vm/overcommit_memory" to find out how it should release the
blocks. This happens sporadically with Cloud Hypervisor but has been
seen in use. It is not necessary to add the read() syscall to the list
as it is already included in the virtio devices common set. Similarly
the vCPU and vmm threads already have both these in the allowed list.
Fixes: #3609
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Whenever the backing file of our virtio-block device is opened with
O_DIRECT, the buffer address and size must be aligned to the sector
size.
We know virtio-block requests are sector aligned in terms of size, but
we must still check if the buffer address is. In case it's not, we
create an intermediate buffer that will be passed through the system
call. In case of a write operation, the content of the non-aligned
buffer must be copied beforehand, and in case of a read operation, the
content of the aligned buffer must be copied to the non-aligned one
after the operation has been completed.
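The intermediate buffer handling can be sketched as follows, assuming a
512-byte sector size. AlignedBuf, prepare_write() and complete_read() are
hypothetical names, and the actual read or write system call is left out.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::slice;

/// Sector size assumed for this sketch.
pub const SECTOR_SIZE: usize = 512;

/// Heap buffer whose start address is aligned to `SECTOR_SIZE`.
pub struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    pub fn new(len: usize) -> Self {
        assert!(len > 0, "zero-sized buffers are not supported in this sketch");
        let layout = Layout::from_size_align(len, SECTOR_SIZE).expect("invalid layout");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        AlignedBuf { ptr, layout }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialized bytes we own.
        unsafe { slice::from_raw_parts(self.ptr, self.layout.size()) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: same as above, and `&mut self` guarantees exclusivity.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

pub fn is_aligned(addr: usize) -> bool {
    addr % SECTOR_SIZE == 0
}

/// Write path: if the guest buffer is misaligned, copy its content into an
/// aligned buffer before the system call. Returns None when no copy is needed.
pub fn prepare_write(guest: &[u8]) -> Option<AlignedBuf> {
    if guest.is_empty() || is_aligned(guest.as_ptr() as usize) {
        return None;
    }
    let mut bounce = AlignedBuf::new(guest.len());
    bounce.as_mut_slice().copy_from_slice(guest);
    Some(bounce)
}

/// Read path: once the system call has filled the aligned buffer, copy the
/// data back into the misaligned guest buffer of the same length.
pub fn complete_read(bounce: &AlignedBuf, guest: &mut [u8]) {
    guest.copy_from_slice(bounce.as_slice());
}
```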
Fixes: #3587
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This crate contains up-to-date definitions of the Queue, AvailIter,
DescriptorChain and Descriptor structures forked from the upstream
crate rust-vmm/vm-virtio 27b18af01ee2d9564626e084a758a2b496d2c618.
The following patches have been applied on top of this base in order to
make it work correctly with Cloud Hypervisor requirements:
- Add MSI vector field to the Queue
  In order to help with MSI/MSI-X support, it is convenient to store the
  value of the interrupt vector inside the Queue directly.
- Handle address translations
  For devices with access to data in memory being translated, we add to
  the Queue the ability to translate the address stored in the
  descriptor.
  It is very helpful as it performs the translation right after the
  untranslated address is read from memory, preventing errors on the
  consumer crate's side. It also allows the consumer to greatly reduce
  the amount of duplicated code for applying the translation in many
  different places.
- Add helpers for Queue structure
  They are meant to help the crate's consumers get and set information
  about the Queue.
These patches can be found on the 'ch' branch from the Cloud Hypervisor
fork: https://github.com/cloud-hypervisor/vm-virtio.git
This patch takes care of updating the Cloud Hypervisor code in
virtio-devices and vm-virtio to build correctly with the latest version
of virtio-queue.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When forwarding an epoll event from the unix muxer to the
targeted connection event handler, the eventset the connection
registered is forwarded instead of the actual epoll
operation (IN/OUT).
For example, if the connection was registered for EPOLLIN,
and receives an EPOLLOUT, the connection will actually handle
an EPOLLOUT.
This is the root cause of a previous bug, which caused the
introduction of some workarounds (e.g. handling EWOULDBLOCK
when reading after receiving EPOLLIN, which should never happen).
When matching the connection, we retrieve and use the evset of
the connection instead of the one passed as a parameter.
The compiler does not complain about an unused variable because
it was first logged in a debug! statement.
This is an unfortunate naming mistake that caused a lot of problems.
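The shadowing problem can be reproduced in a few lines. Connection,
forward_buggy() and forward_fixed() are hypothetical names for this
sketch, with format!() standing in for the debug! logging; the EPOLLIN and
EPOLLOUT values are the Linux ones.

```rust
pub const EPOLLIN: u32 = 0x001;
pub const EPOLLOUT: u32 = 0x004;

/// Hypothetical connection recording which event sets it was asked to handle.
pub struct Connection {
    pub registered: u32,
    pub handled: Vec<u32>,
}

impl Connection {
    pub fn notify(&mut self, evset: u32) {
        self.handled.push(evset);
    }
}

/// Buggy variant: the parameter is "used" by the log line, then shadowed by
/// the registered event set, so no unused variable warning is emitted.
pub fn forward_buggy(conn: &mut Connection, evset: u32) {
    let _logged = format!("received event set {:#x}", evset);
    let evset = conn.registered;
    conn.notify(evset);
}

/// Fixed variant: forward the event set that epoll actually reported.
pub fn forward_fixed(conn: &mut Connection, evset: u32) {
    let _logged = format!("received event set {:#x}", evset);
    conn.notify(evset);
}
```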
Fixes: #3497
Signed-off-by: Eduard Kyvenko <eduard.kyvenko@gmail.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If the disk is backed by a block device on the host, a non-default
topology will be available and that topology can be advertised by virtio
block.
Fixes: #3262
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This reverts commit 58d25b3ccc.
This change introduced a regression when running iperf with the guest
running as the server:
marvin:~/src/cloud-hypervisor ((58d25b3c...))$ iperf -c 192.168.249.2
------------------------------------------------------------
Client connecting to 192.168.249.2, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.249.1 port 47078 connected with 192.168.249.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.40 sec 14.0 MBytes 11.3 Mbits/sec
marvin:~/src/cloud-hypervisor ((58d25b3c...))$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.249.1 port 5001 connected with 192.168.249.2 port 42866
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.01 sec 51.2 GBytes 44.0 Gbits/sec
Fixes: #3450
Signed-off-by: Rob Bradford <robert.bradford@intel.com>