On the restore path, using the available and used indexes read from
memory to fill the Queue structure was a mistake. Indeed, the available
index is written from the guest and it reflects the last available index
in the descriptor table. But the driver might have queued a lot of
buffers which have not yet been used by the device. This leads to a
situation where the next_avail from Queue is completely different from
the one we can read from memory.
Instead, the right way to determine the next_avail index that should be
used by the device is by relying on the used index from the memory. This
index represents the correct information we're looking for as it has
been updated before the snapshot to let the guest know the next index to
process.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
First, this modifies the existing helpers on how to get indexes for
available and used rings from memory. Instead of updating the queue
through each helper, they are now used as simple getters.
Based on these new getters, we could create a new helper to determine if
the queue has some available descriptors already queued from the driver
side. This helper is going to be particularly helpful when trying to
determine from a virtio thread if a queue is already loaded with some
available buffers that can be used to send information to the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding the shared memory regions to the list of BARs, we make sure
the DeviceManager will register it as a BAR on the PCI bus. Without
this, when PCI BAR reprogramming happens, the PCI bus errors since it
does not know about any BAR at the specified address.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Any virtio device relying on the mmio transport layer can be snapshotted
and restored thanks to this new patch. From the MmioDevice perspective,
it is mainly a matter of saving the information about the virtqueues as
the restore path will need them to activate the device (if needed
because it has been activated before being snapshotted).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add support for specifying the PCI revision in the PCI configuration and
populate this with the value of 1 for virtio-pci devices.
The virtio-pci specification is slightly ambiguous only saying that
transitional (i.e. devices that support legacy and virtio 1.0) should
set this to 0. In practice it seems that software expects the revision
to be set to 1 for modern only devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add an accessor to return the underlying VirtioDevice. This is useful
for managing the removal of the device from internal datastructures when
handling virtio-pci device unplug.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to support freeing the memory that is allocated we need to make
sure that we update the internal representation so that free_bars() can
correctly free the memory if the device has its BARs moved.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Implement the free_bars() method from the PciDevice trait which is used
as part of the device removal process. Although there is only one BAR
allocated by VirtioPciDevice simplify the code by using a vector.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.
A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can be made of multiple snapshot
itself.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.
Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
The details of the SHM regions or the lack of, which is used by
virtio-fs DAX, is communicated through configuration fields on the
virtio-mmio memory region. Implement the necessary fields to return the
SHM entries and in particular return a length of (u64)-1 which is used
by the kernel to indicate there are no SHM regions.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap> with
GuestMemoryAtomic<GuestMemoryMmap>.
The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.
Fixes#735
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Remove duplicated code across the different devices by handling
the virtio feature pages in VirtioDevice itself rather than
in the backends. This works as no virtio devices use feature
bits beyond 64-bits.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Having the InterruptManager trait depend on an InterruptType forces
implementations into supporting potentially very different kind of
interrupts from the same code base. What we're defining through the
current, interrupt type based create_group() method is a need for having
different interrupt managers for different kind of interrupts.
By associating the InterruptManager trait to an interrupt group
configuration type, we create a cleaner design to support that need as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The virtio capability VIRTIO_PCI_CAP_PCI_CFG is exposed through the
device's PCI config space the same way other virtio-pci capabilities
are exposed.
The main and important difference is that this specific capability is
designed as a way for the guest to access virtio capabilities without
mapping the PCI BAR. This is very rarely used, but it can be useful when
it is too early for the guest to be able to map the BARs.
One thing to note, this special feature MUST be implemented, based on
the virtio specification.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to anticipate the need to support more features related to the
access of a device's PCI config space, this commits changes the self
reference in the function read_config_register() to be mutable.
This also brings some more flexibility for any implementation of the
PciDevice trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need for assign_irq() or assign_msix() functions from the
PciDevice trait, as we can see it's never used anywhere in the codebase.
That's why it's better to remove these methods from the trait, and
slightly adapt the existing code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.
By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.
Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupt, no matter the type of
interrupt and no matter which hypervisor might be in use.
Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The point is to be able to retrieve directly the event fd related to
the interrupt, as this might optimize the way VirtioDevice devices are
implemented.
For instance, this can be used by vhost-user devices to provide
vhost-user backends directly with the event fd triggering the
interrupt related to a virtqueue.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.
Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.
For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that MsixConfig has access to the irq_fd descriptors associated with
each vector, it can directly write to it anytime it needs to trigger an
interrupt.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.
Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
According to virtio spec, for used buffer notifications, if
MSI-X capability is enabled, and queue msix vector is
VIRTIO_MSI_NO_VECTOR 0xffff, the device must not deliver an
interrupt for that virtqueue.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
The following commit broke this unit test:
"""
vmm: Convert virtio devices to Arc<Mutex<T>>
Migratable devices can be virtio or legacy devices.
In any case, they can potentially be tracked through one of the IO bus
as an Arc<Mutex<dyn BusDevice>>. In order for the DeviceManager to also
keep track of such devices as Migratable trait objects, they must be
shared as mutable atomic references, i.e. Arc<Mutex<T>>. That forces all
Migratable objects to be tracked as Arc<Mutex<dyn Migratable>>.
Virtio devices are typically migratable, and thus for them to be
referenced by the DeviceManager, they now should be built as
Arc<Mutex<VirtioDevice>>.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
"""
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Due to the amount of code currently duplicated across virtio devices,
the stats for this commit is on the large side but it's mostly more
duplicated code, unfortunately.
Migratable and Snapshotable placeholder implementations are provided as
well, making all virtio devices Migratable.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Migratable devices can be virtio or legacy devices.
In any case, they can potentially be tracked through one of the IO bus
as an Arc<Mutex<dyn BusDevice>>. In order for the DeviceManager to also
keep track of such devices as Migratable trait objects, they must be
shared as mutable atomic references, i.e. Arc<Mutex<T>>. That forces all
Migratable objects to be tracked as Arc<Mutex<dyn Migratable>>.
Virtio devices are typically migratable, and thus for them to be
referenced by the DeviceManager, they now should be built as
Arc<Mutex<VirtioDevice>>.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to group together some functions that can be shared across
virtio transport layers, this commit introduces a new trait called
VirtioTransport.
The first function of this trait being ioeventfds() as it is needed from
both virtio-mmio and virtio-pci devices, represented by MmioDevice and
VirtioPciDevice structures respectively.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The specific part of PCI BAR reprogramming that happens for a virtio PCI
device is the update of the ioeventfds addresses KVM should listen to.
This should not be triggered for every BAR reprogramming associated with
the virtio device since a virtio PCI device might have multiple BARs.
The update of the ioeventfds addresses should only happen when the BAR
related to those addresses is being moved.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The PciDevice trait is supposed to describe only functions related to
PCI. The specific method ioeventfds() has nothing to do with PCI, but
instead would be more specific to virtio transport devices.
This commit removes the ioeventfds() method from the PciDevice trait,
adding some convenient helper as_any() to retrieve the Any trait from
the structure behing the PciDevice trait. This is the only way to keep
calling into ioeventfds() function from VirtioPciDevice, so that we can
still properly reprogram the PCI BAR.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the value being written to the BAR, the implementation can
now detect if the BAR is being moved to another address. If that is the
case, it invokes move_bar() function from the DeviceRelocation trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the type of BAR, we can now provide the correct address related
to a BAR index provided by the caller.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case some virtio devices are attached to the virtual IOMMU, their
vring addresses need to be translated from IOVA into GPA. Otherwise it
makes no sense to try to access them, and they would cause out of range
errors.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The 32 or 64 bits type for the memory BAR was not set correctly. This
patch ensure the right type is applied to the BAR.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Currently all devices and guest memory share the same 64GiB
allocation. With guest memory working upwards and devices working
downwards. This creates issues if you want to either have a VM with a
large amount of memory or want to have devices with a large allocation
(e.g. virtio-pmem.)
As it is possible for the hypervisor to place devices anywhere in its
address range it is required for simplistic users like the firmware to
set up an identity page table mapping across the full range. Currently
the hypervisor sets up an identify mapping of 1GiB which the firmware
extends to 64GiB to match the current address space size of the
hypervisor.
A simpler solution is to place the device needed for booting with the
firmware (virtio-block) inside the 32-bit memory hole. This allows the
firmware to easily access the block device and paves the way for
increasing the address space beyond the current 64GiB limit.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Derived from the crosvm code at 5656c124af2bb956dba19e409a269ca588c685e3
and adapted to work within cloud-hypervisor:
Main differences:
* Interrupt handling is done via a VirtioInterrupt turned into a
devices::Interrupt
* GuestMemory -> GuestMemoryMmap
* Differences in read/write for BusDevice
* Different crates for EventFd and GuestAddress
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Following the refactoring of the code allowing multiple threads to
access the same instance of the guest memory, this patch goes one step
further by adding RwLock to it. This anticipates the future need for
being able to modify the content of the guest memory at runtime.
The reasons for adding regions to an existing guest memory could be:
- Add virtio-pmem and virtio-fs regions after the guest memory was
created.
- Support future hotplug of devices, memory, or anything that would
require more memory at runtime.
Because most of the time, the lock will be taken as read only, using
RwLock instead of Mutex is the right approach.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The VMM guest memory was cloned (copied) everywhere the code needed to
have ownership of it. In order to clean the code, and in anticipation
for future support of modifying this guest memory instance at runtime,
it is important that every part of the code share the same instance.
Because VirtioDevice implementations need to have access to it from
different threads, that's why Arc must be used in this case.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Latest clippy version complains about our existing code for the
following reasons:
- trait objects without an explicit `dyn` are deprecated
- `...` range patterns are deprecated
- lint `clippy::const_static_lifetime` has been renamed to
`clippy::redundant_static_lifetimes`
- unnecessary `unsafe` block
- unneeded return statement
All these issues have been fixed through this patch, and rustfmt has
been run to cleanup potential formatting errors due to those changes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In the context of shared memory regions, they could not be present for
most of the virtio devices. For this reason, we prefer dedicate a BAR
for the shared memory regions.
Another reason is that memory regions, if there are several, can be
allocated all at once as a contiguous region, which then can be used as
its own BAR. It would be more complicated to try to allocate the BAR 0
holding the regular information about the virtio-pci device along with
the shared memory regions.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch cleans up the VirtioDevice trait. Since some function are PCI
specific and since they are not even used, it makes sense to remove them
from the trait definition.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the latest version of the virtio specification, the structure
virtio_pci_cap has been updated and a new structure virtio_pci_cap64 has
been introduced.
virtio_pci_cap now includes a field "id" that does not modify the
existing structure size since there was a 3 bytes reserved field
already there. The id is used in the context of shared memory regions
which need to be identified since there could be more than one of this
kind of capability.
virtio_pci_cap64 is a new structure that includes virtio_pci_cap and
extends it to allow 64 bits offsets and 64 bits region length. This is
used in the context of shared memory regions capability, as we might need
to describe regions of 4G or more, that could be placed at a 4G offset
or more in the associated BAR.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The length of the PCI capability as it is being calculated by the guest
was not accurate since it was not including the implicit 2 bytes offset.
The reason for this offset is that the structure itself does not contain
the capability ID (1 byte) and the next capability pointer (1 byte), but
the structure exposed through PCI config space does include those bytes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the newly added MSI-X helper, the interrupt callback checks
the interrupts are enabled on the device before to try triggering the
interrupt.
Fixes#156
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The structure of the vmm-sys-util crate has changed with lots of code
moving to submodules.
This change adjusts the use of the imported structs to reference the
submodules.
Fixes: #145
Signed-off-by: Rob Bradford <robert.bradford@intel.com>