The point is to make sure the DeviceManager holds a strong reference to
each BusDevice inserted on the IO and MMIO buses. This will allow these
buses to hold Weak references to the BusDevice devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The method add_vfio_device() from the DeviceManager needs to be mutable
if we later want to update some of the DeviceManager's internal fields
from this same function.
This commit simply makes the changes needed to turn this function into a
mutable one.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By inserting the DeviceManager on the IO bus, we introduced a cyclic
dependency:
DeviceManager ---> AddressManager ---> Bus ---> BusDevice
      ^                                             |
      |                                             |
      +---------------------------------------------+
This cycle needs to be broken by having the bus hold a Weak reference
instead of an Arc (which is a strong reference).
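A minimal std-only sketch of how a Weak reference breaks the cycle. It assumes a simplified, hypothetical BusDevice trait with a single read() method and a hypothetical Bus that stores (base, len, Weak) entries; the real Bus and BusDevice types are richer. The owner (the DeviceManager in the text) keeps the Arc, so once the strong references are gone the bus lookup fails instead of keeping the device alive.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical, simplified stand-in for the BusDevice trait.
pub trait BusDevice: Send {
    fn read(&mut self, offset: u64, data: &mut [u8]);
}

/// Hypothetical bus that only holds weak references to its devices.
#[derive(Default)]
pub struct Bus {
    devices: Vec<(u64, u64, Weak<Mutex<dyn BusDevice>>)>,
}

impl Bus {
    /// Register a device; only a Weak is stored, the caller keeps the Arc.
    pub fn insert(&mut self, device: &Arc<Mutex<dyn BusDevice>>, base: u64, len: u64) {
        self.devices.push((base, len, Arc::downgrade(device)));
    }

    /// Find the device covering `addr` and return (offset, device).
    /// Returns None if no range matches or the owner already dropped the device.
    pub fn get_device(&self, addr: u64) -> Option<(u64, Arc<Mutex<dyn BusDevice>>)> {
        self.devices
            .iter()
            .find(|(base, len, _)| addr >= *base && addr < *base + *len)
            .and_then(|(base, _, weak)| weak.upgrade().map(|dev| (addr - *base, dev)))
    }
}

/// Hypothetical device used only for the demonstration: fills reads with a byte.
pub struct ScratchDevice(pub u8);

impl BusDevice for ScratchDevice {
    fn read(&mut self, _offset: u64, data: &mut [u8]) {
        data.fill(self.0);
    }
}
```

Because the bus upgrades the Weak on every access, dropping the last Arc held by the owner is enough to make the device unreachable from the bus.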
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Ensure the configuration is updated after a new device has been
hotplugged. In the event of a reboot, this means the new VM will be
started with the device that had previously been hotplugged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit finalizes the VFIO PCI hotplug support, based on all the
previous commits preparing for it.
Note that vIOMMU is not supported yet. This means we can hotplug VFIO
PCI devices, but we cannot attach them to an existing or a new
virtio-iommu device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This factorization is important as it allows both the standard
codepath and the VFIO PCI hotplug codepath to rely on the same function
to add a new VFIO PCI device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the new command "add-device", which lets a user
hotplug a VFIO PCI device into an already running VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Through the BusDevice implementation from the DeviceManager, and by
inserting the DeviceManager on the IO bus for a specific IO port range,
the VMM now has the ability to handle PCI device hotplug.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation of inserting the DeviceManager on the IO/MMIO buses,
the DeviceManager must implement the BusDevice trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Create a small method that first hotplugs all the devices identified by
the PCIU bitmap, and then hot-unplugs all the devices identified by the
PCID bitmap.
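The ordering above can be modeled in Rust as follows. This is only a model: the actual method is ACPI AML (working together with DVNT), not Rust. It assumes 32 slots with one bit per slot, and assumes that slots flagged in PCIU receive the ACPI "Device Check" notification (1) while slots flagged in PCID receive "Eject Request" (3); scan_slots is a hypothetical name.

```rust
/// ACPI-defined device notification values.
pub const NOTIFY_DEVICE_CHECK: u8 = 1;
pub const NOTIFY_EJECT_REQUEST: u8 = 3;

/// Iterate over the slot numbers whose bit is set in a 32-bit slot bitmap.
fn slots_in(bitmap: u32) -> impl Iterator<Item = u32> {
    (0..32).filter(move |slot| (bitmap & (1u32 << *slot)) != 0)
}

/// Model of the scan: first every slot flagged in PCIU (hotplug), then every
/// slot flagged in PCID (hot-unplug), as (slot, notify value) pairs.
pub fn scan_slots(pciu: u32, pcid: u32) -> Vec<(u32, u8)> {
    slots_in(pciu)
        .map(|slot| (slot, NOTIFY_DEVICE_CHECK))
        .chain(slots_in(pcid).map(|slot| (slot, NOTIFY_EJECT_REQUEST)))
        .collect()
}
```

Processing additions before removals matches the order described in the commit message.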
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The _EJ0 method provides the guest OS with a way to notify the VMM that
the device has been properly ejected. Only after this point can the VMM
fully remove the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new PHPR device in the DSDT table introduces some specific
operation regions and the associated fields.
PCIU stands for "PCI up", which identifies PCI devices that must be
added.
PCID stands for "PCI down", which identifies PCI devices that must be
removed.
B0EJ stands for "Bus 0 eject", which identifies which device on the bus
has been ejected by the guest OS.
Thanks to these fields, the VMM and the guest OS can communicate while
performing hotplug/hotunplug operations.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adds the DVNT method to the PCI0 device in the DSDT table. This new
method is responsible for checking each slot and notifying the guest OS
if one of the slots is supposed to be added or removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the ACPI support for describing the 32 device
slots attached to the main PCI host bridge.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation of the support for device hotplug, this commit moves the
DeviceManager object into an Arc<Mutex<>> when the DeviceManager is
being created. The reason is that the DeviceManager needs to implement
the BusDevice trait and be provided to the IO bus, so that IO accesses
related to device hotplug can be handled correctly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap>> with
GuestMemoryAtomic<GuestMemoryMmap>.
The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to implement it in Cloud-Hypervisor.
Fixes #735
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If no socket is supplied when enabling "vhost_user=true" on "--disk",
follow the "exe" path in the /proc entry for this process and launch the
block backend (via the vmm_path field).
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When a virtio-fs device is created with a dedicated shared region, by
default the region should be mapped as PROT_NONE so that no pages can be
faulted in.
Only when the guest mounts the virtiofs filesystem can we expect the
VMM, on behalf of the backend, to create new mappings in the reserved
shared window, using PROT_READ and/or PROT_WRITE.
Fixes #763
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If no socket is supplied when enabling "vhost_user=true" on "--net",
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field).
Currently this only supports creating a new tap interface, as the network
backend also only supports that.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It is necessary to do this at the start of the VMM execution rather than
later, as it must be done in the main thread in order to satisfy the
checks required by PTRACE_MODE_READ_FSCREDS (see proc(5) and
ptrace(2)).
The alternative is to run with CAP_SYS_PTRACE, but that has its own
disadvantages.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Having the InterruptManager trait depend on an InterruptType forces
implementations to support potentially very different kinds of
interrupts from the same code base. What the current, interrupt type
based create_group() method actually expresses is the need for
different interrupt managers for different kinds of interrupts.
By associating the InterruptManager trait with an interrupt group
configuration type, we create a cleaner design to support that need, as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).
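A minimal sketch of the associated-type design described above, assuming hypothetical LegacyIrqGroupConfig and MsiIrqGroupConfig structs and a simplified InterruptSourceGroup that only records GSIs. It is meant to show the shape of the trait, not the actual vm-device API: each manager picks its configuration type at compile time instead of dispatching on a runtime InterruptType.

```rust
/// Hypothetical configuration for a legacy (pin-based) interrupt group.
pub struct LegacyIrqGroupConfig {
    pub irq: u32,
}

/// Hypothetical configuration for an MSI/MSI-X interrupt group.
pub struct MsiIrqGroupConfig {
    pub base: u32,
    pub count: u32,
}

/// Simplified interrupt source group: just the GSIs it covers.
#[derive(Debug, PartialEq)]
pub struct InterruptSourceGroup {
    pub gsis: Vec<u32>,
}

/// One manager supports exactly one kind of interrupt, selected by its
/// associated configuration type.
pub trait InterruptManager {
    type GroupConfig;
    fn create_group(&self, config: Self::GroupConfig) -> InterruptSourceGroup;
}

pub struct LegacyInterruptManager;
pub struct MsiInterruptManager;

impl InterruptManager for LegacyInterruptManager {
    type GroupConfig = LegacyIrqGroupConfig;
    fn create_group(&self, config: LegacyIrqGroupConfig) -> InterruptSourceGroup {
        InterruptSourceGroup { gsis: vec![config.irq] }
    }
}

impl InterruptManager for MsiInterruptManager {
    type GroupConfig = MsiIrqGroupConfig;
    fn create_group(&self, config: MsiIrqGroupConfig) -> InterruptSourceGroup {
        // One GSI per vector, starting at `base`.
        InterruptSourceGroup {
            gsis: (config.base..config.base + config.count).collect(),
        }
    }
}
```

Passing an MSI configuration to the legacy manager is now a compile-time type error rather than a runtime "unsupported interrupt type" case.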
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We create 2 different interrupt managers for separately handling
creation of legacy and MSI interrupt groups.
Doing so allows us to have a cleaner interrupt manager and IOAPIC
initialization path. It also prepares for an InterruptManager trait
design improvement where we remove the interrupt source type dependency
by associating an interrupt configuration type with the trait.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
A reference to the VmFd is stored on the AddressManager, so it is not
necessary to pass the VmInfo into all methods that need it, as it can
be obtained from the AddressManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The DeviceManager has a reference to the MemoryManager so use that to
get the GuestMemoryMmap rather than the version stored in the VmInfo
struct.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the use of vm_info in methods to get the config and instead use
the config stored on the DeviceManager itself.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. This paves the way for more easily storing state on
the DeviceManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. A follow-up commit will change the callee functions
that create the devices themselves.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Modify these functions to take an &mut self and become methods on
DeviceManager. This allows the removal of some in/out parameters and
paves the way for further refactoring and simplification.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The MemoryManager should only be included on the I/O bus when doing ACPI
builds as that is the only time it will be interrogated.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently the MemoryManager is only used on the ACPI code paths after
the DeviceManager has been created. This will change in a future commit
as part of the refactoring, so for now always include it but name it
with an underscore prefix to indicate it might not always be used.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that devices attached to the virtual IOMMU are described through
virtio configuration, there is no need for the DeviceManager to store
the list of IDs for all these devices. Instead, things are handled
locally when PCI devices are being added.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of relying on the ACPI tables to describe the devices attached
to the virtual IOMMU, let's use the virtio topology, as the ACPI support
is getting deprecated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-block and
vhost-user-block. For now it is necessary to specify both vhost_user
and socket parameters as auto activation is not yet implemented. The wce
parameter for supporting "Write Cache Enabling" is also added to the
disk configuration.
The original command line parameter is still supported for now and will
be removed in a future release.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-net and vhost-user-net.
For now it is necessary to specify both vhost_user and socket parameters
as auto activation is not yet implemented. The original command line
parameter is still supported for now.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit improves the existing virtio-blk implementation, allowing
for better I/O performance. The cost for the end user is to accept
allocating more vCPUs to the virtual machine, so that multiple I/O
threads can run in parallel.
Note that the number of vCPUs must be greater than or equal to the
number of queues dedicated to the virtio-blk device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Devices like virtio-pmem and virtio-fs require some dedicated memory
region to be mapped. The memory mapping from the DeviceManager is being
replaced by the usage of MmapRegion from the vm-memory crate.
The unmap will happen automatically when the MmapRegion is dropped,
which should happen when the DeviceManager gets dropped.
Fixes #240
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Move GED device reporting of required device type to scan into an MMIO
region rather than an I/O port.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than having the MemoryManager device sit on the I/O bus,
allocate space for MMIO and add it to the MMIO bus.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit relies on the interrupt manager and the resulting interrupt
source group to abstract the knowledge about KVM and how interrupts are
updated and delivered.
This allows the entire "devices" crate to be freed from kvm_ioctls and
kvm_bindings dependencies.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The interrupt manager is passed to the IOAPIC creation, and the IOAPIC
now creates an InterruptSourceGroup for MSI interrupts based on it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By introducing a new InterruptManager dedicated to the IOAPIC, we don't
have to solve the chicken-and-egg problem of whether the
InterruptManager or the Ioapic should be created first. It's also
totally fine to have two interrupt manager instances, as they both share
the same list of GSI routes and the same allocator.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
vhost_user_blk already has it, so it's only fair to give it to
virtio-blk too. Extend DiskConfig with a 'direct' property, honor
it while opening the file backing the disk image, and pass it to
vm_virtio::RawFile.
Fixes #631
Signed-off-by: Sergio Lopez <slp@redhat.com>
vhost_user_blk already has it, so it's only fair to give it to
virtio-blk too. Extend DiskConfig with a 'readonly' property, and pass
it to vm_virtio::Block.
Signed-off-by: Sergio Lopez <slp@redhat.com>
The build is run against "--all-features", "pci,acpi", "pci" and "mmio"
separately. The clippy validation must be run against the same set of
features in order to validate the code is correct.
Because of these new checks, this commit includes multiple fixes
related to the errors generated when manually running the checks.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need for the assign_irq() or assign_msix() functions from the
PciDevice trait, as they are never used anywhere in the codebase.
That's why it's better to remove these methods from the trait, and
slightly adapt the existing code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the InterruptManager is never stored into any structure, it should
be passed as a reference instead of being cloned.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Additionally, since it removes the last bit relying on the Interrupt
trait, the trait and its implementation can be removed from the
codebase.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By having a reference to the IOAPIC, the KvmInterruptManager will be
able to properly initialize the legacy interrupt source group.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the InterruptManager be shared across both PCI and MMIO
devices, this commit moves the initialization earlier in the code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.
By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given an MsiInterruptGroup.
Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupts, no matter the type of
interrupt and no matter which hypervisor might be in use.
Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Callbacks are not the most idiomatic way of programming in Rust. The
right way is to use a Trait to provide multiple implementations of the
same interface.
Additionally, a Trait allows multiple functions to be defined, whereas
using callbacks means that a new callback must be introduced for each
new function we want to add.
For these two reasons, the current commit turns the existing
VirtioInterrupt callback into a Trait of the same name.
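A minimal sketch of the callback-to-trait change, assuming a hypothetical VirtioInterruptType enum and a trigger() signature that takes an optional queue index; the real trait's parameters may differ. A test double that counts interrupts shows how a second implementation plugs in without any new callback type.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical interrupt kinds a virtio device can raise.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum VirtioInterruptType {
    Config,
    Queue,
}

/// Trait replacing a boxed callback: each transport provides its own impl,
/// and further methods can be added here later without new callback types.
pub trait VirtioInterrupt: Send + Sync {
    fn trigger(
        &self,
        int_type: VirtioInterruptType,
        queue_index: Option<u16>,
    ) -> std::io::Result<()>;
}

/// Test implementation that only counts how many interrupts were raised.
#[derive(Default)]
pub struct CountingInterrupt {
    pub queue_irqs: AtomicU32,
    pub config_irqs: AtomicU32,
}

impl VirtioInterrupt for CountingInterrupt {
    fn trigger(
        &self,
        int_type: VirtioInterruptType,
        _queue_index: Option<u16>,
    ) -> std::io::Result<()> {
        let counter = match int_type {
            VirtioInterruptType::Config => &self.config_irqs,
            VirtioInterruptType::Queue => &self.queue_irqs,
        };
        counter.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }
}
```

Devices would then hold an `Arc<dyn VirtioInterrupt>` instead of a boxed closure.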
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.
Once this is done, the GSI allocation and irq_fd creation are performed
by MsixConfig directly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because we will need to share the same list of GSI routes across
multiple PCI devices (virtio-pci, VFIO), this commit moves the creation
of this list to a higher-level location in the code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Use RawFile as backend instead of File. This allows us to abstract
the access to the actual image with a specialized layer, so we have a
place where we can deal with the low-level peculiarities.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Doing I/O on an image opened with O_DIRECT imposes certain
restrictions, requiring the following elements to be aligned:
- Address of the source/destination memory buffer.
- File offset.
- Length of the data to be read/written.
The actual alignment value depends on various elements, and according
to open(2) "(...) there is currently no filesystem-independent
interface for an application to discover these restrictions (...)".
To discover this value, we iterate through a list of alignments
(currently 512 and 4096), calling pread() with each one and checking
whether the operation succeeded.
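The discovery loop can be sketched as below. To keep it std-only, the actual pread() on an O_DIRECT file descriptor is abstracted behind a `probe` closure (an assumption of this sketch); discover_alignment and is_aligned are hypothetical helper names. The candidate order 512 then 4096 follows the text.

```rust
/// Candidate alignments, in the order they are tried.
pub const CANDIDATE_ALIGNMENTS: [usize; 2] = [512, 4096];

/// True if the buffer address, file offset and length are all multiples of `align`,
/// i.e. the three O_DIRECT constraints listed above.
pub fn is_aligned(buf_addr: usize, offset: u64, len: usize, align: usize) -> bool {
    buf_addr % align == 0 && offset % align as u64 == 0 && len % align == 0
}

/// Return the first candidate for which `probe` reports a successful read.
/// In a real setup `probe(align)` would pread() `align` bytes at offset 0
/// into a buffer aligned to `align`, returning false on EINVAL.
pub fn discover_alignment<F>(mut probe: F) -> Option<usize>
where
    F: FnMut(usize) -> bool,
{
    CANDIDATE_ALIGNMENTS.iter().copied().find(|&align| probe(align))
}
```

Trying the smallest alignment first keeps I/O granularity as fine as the backing device allows.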
We also extend RawFile so it can be used as a backend for QcowFile,
so the latter can be easily adapted to support O_DIRECT too.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Update the common part in net_util.rs under vm-virtio to add multiqueue
(mq) support, and enable mq for the virtio-net device, the vhost-user-net
device and the vhost-user-net backend. Multiple threads will be created,
each one responsible for handling one queue pair.
To get the best performance, the number of vCPUs must match the number
of queue pairs defined for the net device, due to CPU affinity.
Multiple thread support is not yet added for the vhost-user-net
backend; it will be added in the future.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Add support to allow VMMs to open the same tap device many times,
creating multiple file descriptors for it.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Since the common parts are now in net_util.rs under vm-virtio, refactor
the code for the virtio-net device, vhost-user-net device and backend to
shrink the code size and improve readability.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Use independent bits for storing whether a CPU or memory device has
changed when reporting changes via the ACPI GED interrupt. This prevents
a later notification from squashing an earlier one and ensures that
hotplugging both CPU and memory at the same time succeeds.
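A minimal sketch of the independent-bits idea, assuming hypothetical bit assignments (bit 0 for CPUs, bit 1 for memory) and an AtomicU32 where the VMM may use a different field type. Notifications are OR'ed in rather than stored, so a memory notification cannot overwrite a pending CPU one; the guest read returns all pending bits and clears them at once, which is also why the AML side should read the value only once.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical bit assignments: one independent bit per device class.
pub const CPU_DEVICES_CHANGED: u32 = 1 << 0;
pub const MEMORY_DEVICES_CHANGED: u32 = 1 << 1;

/// Pending GED notification state.
#[derive(Default)]
pub struct GedNotification {
    pending: AtomicU32,
}

impl GedNotification {
    /// OR in the new bit so an earlier, unread notification is preserved.
    pub fn notify(&self, flags: u32) {
        self.pending.fetch_or(flags, Ordering::SeqCst);
    }

    /// Guest read: return every pending bit and clear them in one step.
    pub fn read_and_clear(&self) -> u32 {
        self.pending.swap(0, Ordering::SeqCst)
    }
}
```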
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If there is a GED interrupt and the field indicates that the memory
devices have changed, trigger a scan of the memory devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When the value is read from the I/O port via the ACPI AML functions to
determine what has been triggered, the notification value is reset,
preventing a second read from exposing the value. If we need to support
multiple types of GED notification (such as memory hotplug), then we
should avoid reading the value multiple times.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If neither PCI nor MMIO is built in, we should not bother creating any
virtio devices at all.
When building a minimal VMM made of a kernel with an initramfs and a
serial console, the RNG virtio device is still created even though there
is no way it can ever get probed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from one map to
another whilst still giving good read performance. It is inside an Arc so that we
can use a single ArcSwap for all users.
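The RCU-like pattern can be sketched with std only. This is a stand-in, not ArcSwap: AtomicSnapshot is a hypothetical name, and unlike ArcSwap its readers take a short read lock, so it is not lock-free. It shows the semantics that matter here: readers grab an Arc snapshot that stays valid, while a writer publishes a whole new map for future readers.

```rust
use std::sync::{Arc, RwLock};

/// Std-only stand-in for Arc<ArcSwap<T>>: readers take a cheap Arc
/// snapshot, writers publish a complete replacement value.
pub struct AtomicSnapshot<T> {
    current: RwLock<Arc<T>>,
}

impl<T> AtomicSnapshot<T> {
    pub fn new(value: T) -> Self {
        AtomicSnapshot {
            current: RwLock::new(Arc::new(value)),
        }
    }

    /// Returns the map in effect right now; it stays valid even after a swap.
    pub fn load(&self) -> Arc<T> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Replaces the map for future readers and returns the previous one.
    pub fn swap(&self, value: T) -> Arc<T> {
        let mut guard = self.current.write().unwrap();
        std::mem::replace(&mut *guard, Arc::new(value))
    }
}
```

Wrapping the holder itself in an Arc, as the text describes, lets every device share the same instance.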
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This removes the need to handle a mutable integer and also centralises
the allocation of these slot numbers.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The memory manager is responsible for setting up the guest memory and in
the long term will also handle addition of guest memory.
This commit moves the code for creating the backing memory and
populating the allocator into the new implementation, trying to keep
changes to other code as minimal as possible.
Follow-on commits will further reduce some of the duplicated code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
To reflect updated clippy rules:
error: `if` chain can be rewritten with `match`
    --> vmm/src/device_manager.rs:1508:25
     |
1508 | /                         if ret > 0 {
1509 | |                             debug!("MSI message successfully delivered");
1510 | |                         } else if ret == 0 {
1511 | |                             warn!("failed to deliver MSI message, blocked by guest");
1512 | |                         }
     | |_________________________^
     |
     = note: `-D clippy::comparison-chain` implied by `-D warnings`
     = help: Consider rewriting the `if` chain to use `cmp` and `match`.
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#comparison_chain
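A std-only sketch of the rewrite the lint asks for. The debug!/warn! logging calls (from the log crate) are replaced by returned message strings so the example runs on its own, and msi_delivery_message is a hypothetical name. `Ordering::Less` maps to `None` because the original chain had no branch for negative values.

```rust
use std::cmp::Ordering;

/// Returns the message the original if/else-if chain would log for `ret`.
pub fn msi_delivery_message(ret: i32) -> Option<&'static str> {
    match ret.cmp(&0) {
        Ordering::Greater => Some("MSI message successfully delivered"),
        Ordering::Equal => Some("failed to deliver MSI message, blocked by guest"),
        // No branch existed for this case in the original chain.
        Ordering::Less => None,
    }
}
```

The `match` forces every ordering to be handled explicitly, which is the point of the comparison_chain lint.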
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Address updated clippy errors:
error: redundant clone
   --> vmm/src/device_manager.rs:699:32
    |
699 |             .insert(acpi_device.clone(), 0x3c0, 0x4)
    |                                ^^^^^^^^ help: remove this
    |
    = note: `-D clippy::redundant-clone` implied by `-D warnings`
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:699:21
    |
699 |             .insert(acpi_device.clone(), 0x3c0, 0x4)
    |                     ^^^^^^^^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
error: redundant clone
   --> vmm/src/device_manager.rs:737:26
    |
737 |             .insert(i8042.clone(), 0x61, 0x4)
    |                          ^^^^^^^^ help: remove this
    |
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:737:21
    |
737 |             .insert(i8042.clone(), 0x61, 0x4)
    |                     ^^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
error: redundant clone
   --> vmm/src/device_manager.rs:754:29
    |
754 |                 .insert(cmos.clone(), 0x70, 0x2)
    |                             ^^^^^^^^ help: remove this
    |
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:754:25
    |
754 |                 .insert(cmos.clone(), 0x70, 0x2)
    |                         ^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Continue to notify on all vCPUs, but separate the notification
functionality into two methods: CSCN, which walks through all the CPUs,
and CTFY, which notifies based on the numerical CPU id. This is an
interim step towards only notifying on changed CPUs and ultimately CPU
removal.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Since the Snapshotable placeholder and Migratable traits are provided as
well, the DeviceManager object and all its objects are now Migratable.
All Migratable devices are tracked as Arc<Mutex<dyn Migratable>>
references.
Keeping track of all migratable devices allows for implementing the
Migratable trait for the DeviceManager structure, making the whole
device model potentially migratable.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Migratable devices can be virtio or legacy devices.
In any case, they can potentially be tracked through one of the IO buses
as an Arc<Mutex<dyn BusDevice>>. In order for the DeviceManager to also
keep track of such devices as Migratable trait objects, they must be
shared as mutable atomic references, i.e. Arc<Mutex<T>>. That forces all
Migratable objects to be tracked as Arc<Mutex<dyn Migratable>>.
Virtio devices are typically migratable, and thus for them to be
referenced by the DeviceManager, they now should be built as
Arc<Mutex<VirtioDevice>>.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The FsConfig structure has been recently adjusted so that the default
value matches between OpenAPI and CLI. Unfortunately, with the current
description, there is no way from the OpenAPI to describe a cache_size
value "None", so that DAX does not get enabled. Usually, using a Rust
"Option" works because the default value is None. But in this case, the
default value is Some(8G), which means we cannot describe a None.
This commit tackles the problem, introducing an explicit parameter
"dax", and leaving "cache_size" as a simple u64 integer.
This way, the default value is dax=true and cache_size=8G, but it gives
the opportunity to disable DAX entirely with dax=false, in which case the
cache_size value is simply ignored.
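A sketch of the resulting configuration shape, keeping only the two fields named above (the real FsConfig has more). It assumes "8G" means 8 GiB (8 << 30 bytes); dax_window_size is a hypothetical helper showing how cache_size is ignored when dax=false. Both fields are plain types, so each can be described in OpenAPI with a concrete default.

```rust
/// Illustrative subset of FsConfig: a plain bool and u64 instead of Option<u64>.
#[derive(Clone, Debug, PartialEq)]
pub struct FsConfig {
    pub dax: bool,
    pub cache_size: u64,
}

impl Default for FsConfig {
    fn default() -> Self {
        FsConfig {
            dax: true,
            cache_size: 8u64 << 30, // 8 GiB
        }
    }
}

impl FsConfig {
    /// Size of the DAX window to create, if any; cache_size is ignored when dax=false.
    pub fn dax_window_size(&self) -> Option<u64> {
        if self.dax {
            Some(self.cache_size)
        } else {
            None
        }
    }
}
```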
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We want to set different default configurations for vhost-user-net and
vhost-user-blk, which is the reason why the common part corresponding to
the number of queues and the queue size cannot be embedded.
This prepares for the following commit, matching API and CLI behaviors.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Remove the previously hardcoded IRQ number used for the GED device.
Instead allocate the IRQ using the allocator and use that value in the
definition in the ACPI device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for handling the creation of the DSDT entries for devices
into the DeviceManager.
This will make it easier to handle device hotplug and also in the future
remove some hardcoded ACPI constants.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When consumers of the HTTP API try to interact with cloud-hypervisor,
they have to provide the equivalent of the config structure related to
each component they need. The problem is that the Rust enum type
"Option" cannot be expressed in the OpenAPI YAML definition.
This patch intends to fix this inconsistency between what is possible
through the CLI and what's possible through the HTTP API by using the
simple types bool and int64 instead of Option<u64>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Previously the device setup code assumed that if no IOAPIC was passed in
then the device should be added to the kernel irqchip. As an earlier
change meant that there was always a userspace IOAPIC, this kernel-based
code can be removed.
The accessor still returns an Option type to leave scope for
implementing a situation without an IOAPIC (no serial or GED device).
This change does not add support for a no-IOAPIC mode, as the original
code did not either.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The KVM_SET_GSI_ROUTING ioctl is very simple: it overrides the previous
routes configuration with the new ones being applied. This means the
caller, in this case cloud-hypervisor, needs to maintain the list of all
interrupts which need to be active at all times. This allows multiple
devices to be correctly passed through to the VM and be functional at
the same time.
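A std-only sketch of why the VMM keeps the full route list. MsiRoute and GsiRoutes are hypothetical, simplified types (the real code builds kvm_irq_routing entries). Every update returns the complete table, which is what would be handed to KVM_SET_GSI_ROUTING, so programming one device's vector never drops another device's route.

```rust
use std::collections::BTreeMap;

/// Simplified MSI route: the GSI plus the MSI address/data pair.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MsiRoute {
    pub gsi: u32,
    pub address: u64,
    pub data: u32,
}

/// Keeps every active route, because each KVM_SET_GSI_ROUTING call
/// replaces the whole routing table rather than adding to it.
#[derive(Default)]
pub struct GsiRoutes {
    routes: BTreeMap<u32, MsiRoute>,
}

impl GsiRoutes {
    /// Add or update one route, then return the full table to hand to KVM.
    pub fn set_route(&mut self, route: MsiRoute) -> Vec<MsiRoute> {
        self.routes.insert(route.gsi, route);
        self.full_table()
    }

    /// Drop one route (e.g. vector masked or device removed), then rebuild.
    pub fn remove_route(&mut self, gsi: u32) -> Vec<MsiRoute> {
        self.routes.remove(&gsi);
        self.full_table()
    }

    /// In the VMM this would be converted into KVM routing entries and
    /// applied with one ioctl; here it is simply returned, ordered by GSI.
    fn full_table(&self) -> Vec<MsiRoute> {
        self.routes.values().copied().collect()
    }
}
```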
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add ability to notify via the GED device that there is some new hotplug
activity. This will be used by the CpuManager (and later DeviceManager
itself) to notify of new hotplug activity.
Currently it has a hardcoded IRQ of 5 as the ACPI tables also need to
refer to this IRQ and the IRQ allocation does not permit the allocation
of specific IRQs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Since the kvm crates now depend on vmm-sys-util, the bump must be
atomic.
The kvm-bindings 0.2.0 and kvm-ioctls 0.4.0 crates come with a few API
changes, one of them being the use of a kvm_ioctls specific error type.
Porting our code to that type makes for a fairly large diff stat.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In case the VM is started with the flag "--pmem mergeable=on", it means
the user expects the guest persistent memory pages to be marked as
mergeable. This commit relies on the madvise(MADV_MERGEABLE) system call
to inform the host kernel about these pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When adding devices to the guest, and populating the device model, we
should prefix the routines with add_. When we're just creating the
device objects but not yet adding them we use make_.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
MMIO devices creation code into its own routine.
Fixes: #441
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
PCI devices creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
ACPI device creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
legacy devices creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
console creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Ensure that we tell the allocator about all the I/O ports that we are
using for I/O bus attached devices (serial, i8042, ACPI device.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to group together some functions that can be shared across
virtio transport layers, this commit introduces a new trait called
VirtioTransport.
The first function of this trait is ioeventfds(), as it is needed by
both virtio-mmio and virtio-pci devices, represented by the MmioDevice
and VirtioPciDevice structures respectively.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that kvm-ioctls has been updated, the function unregister_ioevent()
can be used to remove an eventfd previously associated with some specific
PIO or MMIO guest address. Particularly, it is useful for the PCI BAR
reprogramming case, as we want to ensure the eventfd will only get
triggered by the new BAR address, and not the old one.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We need to rely on the latest kvm-ioctls version to benefit from the
recent addition of unregister_ioevent(), allowing us to detach a
previously registered eventfd from a PIO or MMIO guest address.
Because of this update, we had to modify the current constraint we had
on the vmm-sys-util crate, using ">= 0.1.1" instead of being strictly
tied to "0.2.0".
Once the dependency conflict resolved, this commit took care of fixing
build issues caused by recent modification of kvm-ioctls relying on
EventFd reference instead of RawFd.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The part of PCI BAR reprogramming specific to a virtio PCI device is the
update of the ioeventfd addresses KVM should listen to.
This should not be triggered for every BAR reprogramming associated with
the virtio device, since a virtio PCI device might have multiple BARs.
The ioeventfd addresses should only be updated when the BAR containing
those addresses is being moved.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
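A minimal sketch of the check described above, under assumed names: NotifyBarState and on_bar_moved() are hypothetical, and the notification offset and multiplier are illustrative values. It only yields addresses to unregister and re-register when the moved BAR is the one holding the notification region.

```rust
// Assumed layout: queue-notification region at offset 0x6000 of one BAR,
// with 4 bytes between consecutive queue notify addresses.
const NOTIFICATION_OFFSET: u64 = 0x6000;
const NOTIFY_OFF_MULTIPLIER: u64 = 4;

/// Hypothetical per-device state needed to decide whether a BAR move
/// affects the ioeventfds.
pub struct NotifyBarState {
    /// Current guest address of the BAR holding the notification region.
    pub notify_bar_addr: u64,
    pub num_queues: u64,
}

impl NotifyBarState {
    fn notify_addrs(&self, bar_base: u64) -> Vec<u64> {
        (0..self.num_queues)
            .map(|i| bar_base + NOTIFICATION_OFFSET + i * NOTIFY_OFF_MULTIPLIER)
            .collect()
    }

    /// Called when one of the device's BARs moves. Returns (old, new)
    /// address pairs whose ioeventfds must be unregistered and registered
    /// again, or nothing if a different BAR was moved.
    pub fn on_bar_moved(&mut self, old_base: u64, new_base: u64) -> Vec<(u64, u64)> {
        if old_base != self.notify_bar_addr {
            return Vec::new();
        }
        let old = self.notify_addrs(old_base);
        let new = self.notify_addrs(new_base);
        self.notify_bar_addr = new_base;
        old.into_iter().zip(new).collect()
    }
}
```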
The PciDevice trait is supposed to describe only functions related to
PCI. The ioeventfds() method has nothing to do with PCI; it is specific
to virtio transport devices.
This commit removes the ioeventfds() method from the PciDevice trait and
adds a convenient as_any() helper to retrieve the Any trait from the
structure behind the PciDevice trait. This is the only way to keep
calling the ioeventfds() function from VirtioPciDevice, so that we can
still properly reprogram the PCI BAR.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
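The as_any() pattern can be sketched as follows, relying only on std::any::Any. The device structs are simplified stand-ins and ioeventfds_for() is a hypothetical helper: the trait object is downcast to the concrete virtio type, and any other PCI device simply yields None.

```rust
use std::any::Any;

/// PCI-only trait, with an escape hatch to reach the concrete type.
pub trait PciDevice {
    fn as_any(&mut self) -> &mut dyn Any;
}

/// Stand-in for the virtio PCI device; ioeventfds() is reduced to
/// returning notification addresses.
pub struct VirtioPciDevice {
    pub num_queues: u64,
}

impl VirtioPciDevice {
    pub fn ioeventfds(&self, base_addr: u64) -> Vec<u64> {
        (0..self.num_queues).map(|i| base_addr + i * 4).collect()
    }
}

impl PciDevice for VirtioPciDevice {
    fn as_any(&mut self) -> &mut dyn Any {
        self
    }
}

/// Any non-virtio PCI device in this sketch.
pub struct OtherPciDevice;

impl PciDevice for OtherPciDevice {
    fn as_any(&mut self) -> &mut dyn Any {
        self
    }
}

/// BAR reprogramming path: only virtio devices have ioeventfds to move.
pub fn ioeventfds_for(dev: &mut dyn PciDevice, base_addr: u64) -> Option<Vec<u64>> {
    dev.as_any()
        .downcast_ref::<VirtioPciDevice>()
        .map(|virtio| virtio.ioeventfds(base_addr))
}
```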
Storing a strong reference to the AddressManager behind the
DeviceRelocation trait results in a cyclic reference count.
Use a weak reference to break that dependency.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
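A minimal sketch of breaking the cycle with std::sync::Weak, assuming a simplified DeviceRelocation trait with a two-argument move_bar() (the real trait carries more parameters). AddressManager here only records moves, and PciBus is reduced to its relocation handle.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Simplified relocation trait for this sketch.
pub trait DeviceRelocation: Send + Sync {
    fn move_bar(&self, old_base: u64, new_base: u64) -> Result<(), String>;
}

/// Stand-in for the AddressManager; it only records the moves here.
pub struct AddressManager {
    pub moves: Mutex<Vec<(u64, u64)>>,
}

impl DeviceRelocation for AddressManager {
    fn move_bar(&self, old_base: u64, new_base: u64) -> Result<(), String> {
        self.moves.lock().unwrap().push((old_base, new_base));
        Ok(())
    }
}

/// The PCI side only holds a Weak reference, so it never keeps the
/// AddressManager alive and no reference-count cycle can form.
pub struct PciBus {
    pub relocation: Weak<dyn DeviceRelocation>,
}

impl PciBus {
    pub fn relocate(&self, old_base: u64, new_base: u64) -> Result<(), String> {
        // upgrade() fails once every strong reference has been dropped.
        let reloc = self
            .relocation
            .upgrade()
            .ok_or_else(|| "AddressManager no longer exists".to_string())?;
        reloc.move_bar(old_base, new_base)
    }
}
```

The cost of this design is that every use must upgrade() and handle the case where the owner is already gone.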
Based on the value being written to the BAR, the implementation can
now detect if the BAR is being moved to another address. If that is the
case, it invokes the move_bar() function from the DeviceRelocation trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
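A sketch of the detection logic for a 32-bit memory BAR, assuming a simplified three-argument move_bar(). Bar, write_bar() and RecordingRelocation are hypothetical names. An all-ones write is treated as the guest's size probe, and the low four flag bits are masked off before comparing addresses.

```rust
/// Simplified relocation hook for this sketch.
pub trait DeviceRelocation {
    fn move_bar(&mut self, old_base: u64, new_base: u64, len: u64);
}

/// Test double that records each requested move.
#[derive(Default)]
pub struct RecordingRelocation {
    pub moves: Vec<(u64, u64, u64)>,
}

impl DeviceRelocation for RecordingRelocation {
    fn move_bar(&mut self, old_base: u64, new_base: u64, len: u64) {
        self.moves.push((old_base, new_base, len));
    }
}

/// A 32-bit memory BAR; its low 4 bits are read-only flag bits.
pub struct Bar {
    pub addr: u64,
    pub size: u64,
}

const BAR_MEM_ADDR_MASK: u32 = 0xffff_fff0;

/// Handles a guest write to the BAR register and returns true if the
/// write relocated the BAR.
pub fn write_bar(bar: &mut Bar, value: u32, reloc: &mut dyn DeviceRelocation) -> bool {
    // Writing all ones is the guest probing the BAR size, not a move.
    if value == 0xffff_ffff {
        return false;
    }
    let new_addr = u64::from(value & BAR_MEM_ADDR_MASK);
    if new_addr == bar.addr {
        return false;
    }
    reloc.move_bar(bar.addr, new_addr, bar.size);
    bar.addr = new_addr;
    true
}
```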
In order to trigger the PCI BAR reprogramming from PciConfigIo and
PciConfigMmio, we need the PciBus to hold onto the trait
implementation of DeviceRelocation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By implementing the DeviceRelocation trait for the AddressManager
structure, we now have a way to let the PCI BAR reprogramming happen.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to reuse the SystemAllocator later at runtime, it is moved into
the new AddressManager structure. The goal is to keep hold of the
SystemAllocator and both the IO and MMIO buses so that we can use them
later.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When a VFIO device is attached behind a virtual IOMMU, we should not
automatically map the entire guest memory for this specific device.
A VFIO device attached to the virtual IOMMU will be driven with IOVAs,
hence we should simply wait for the mapping requests coming from the
virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When a VFIO device is created and attached to the virtual IOMMU, an
ExternalDmaMapping trait implementation is created and associated with
the device. The idea is to build a hash map of device IDs with their
associated trait implementation.
This hash map is provided to the virtual IOMMU device so that it knows
how to properly trigger external mappings associated with VFIO devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
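A sketch of the device ID to ExternalDmaMapping map, assuming a map/unmap trait shape. RecordingDmaMapping is a test double standing in for the VFIO-backed implementation, and handle_map() is a hypothetical helper showing how the virtual IOMMU could dispatch a guest MAP request only to devices that have an external handler.

```rust
use std::collections::HashMap;
use std::io;
use std::sync::{Arc, Mutex};

/// Assumed trait shape for external DMA mappings.
pub trait ExternalDmaMapping: Send + Sync {
    fn map(&self, iova: u64, gpa: u64, size: u64) -> io::Result<()>;
    fn unmap(&self, iova: u64, size: u64) -> io::Result<()>;
}

/// Test double standing in for a VFIO-backed implementation.
#[derive(Default)]
pub struct RecordingDmaMapping {
    // iova -> (gpa, size)
    pub mapped: Mutex<HashMap<u64, (u64, u64)>>,
}

impl ExternalDmaMapping for RecordingDmaMapping {
    fn map(&self, iova: u64, gpa: u64, size: u64) -> io::Result<()> {
        self.mapped.lock().unwrap().insert(iova, (gpa, size));
        Ok(())
    }

    fn unmap(&self, iova: u64, _size: u64) -> io::Result<()> {
        match self.mapped.lock().unwrap().remove(&iova) {
            Some(_) => Ok(()),
            None => Err(io::Error::new(io::ErrorKind::NotFound, "IOVA not mapped")),
        }
    }
}

/// Device ID -> mapping handler, handed to the virtual IOMMU.
pub type ExternalMappings = HashMap<u32, Arc<dyn ExternalDmaMapping>>;

/// Forwards a guest MAP request; returns false for devices without an
/// external handler (e.g. virtio devices that translate on their own).
pub fn handle_map(
    mappings: &ExternalMappings,
    device_id: u32,
    iova: u64,
    gpa: u64,
    size: u64,
) -> io::Result<bool> {
    match mappings.get(&device_id) {
        Some(handler) => {
            handler.map(iova, gpa, size)?;
            Ok(true)
        }
        None => Ok(false),
    }
}
```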
With this implementation of the trait ExternalDmaMapping, we now have
the tool to provide to the virtual IOMMU to trigger the map/unmap on
behalf of the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The VFIO container is the object needed to update the VFIO mapping
associated with a VFIO device. This patch allows the device manager
to have access to the VFIO container.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch attaches VFIO devices to the virtual IOMMU when the user
requests it through the "iommu=on" option. This simply takes care of
adding the PCI device ID to the ACPI IORT table.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit is the glue between the virtio-pci devices attached to the
vIOMMU, and the IORT ACPI table exposing them to the guest as sitting
behind this vIOMMU.
An important thing is the trait implementation provided to the virtio
vrings for each device attached to the vIOMMU, as they need to perform
proper address translation before they can access the buffers.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When some virtio devices are attached to the virtual IOMMU, their vring
addresses need to be translated from IOVA into GPA. Otherwise, accessing
them makes no sense and would cause out of range errors.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
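A minimal sketch of the IOVA to GPA translation, assuming a per-device table filled by guest MAP requests. IovaMappings is a hypothetical name. The closest mapping starting at or below the IOVA is looked up, and addresses outside every mapped range yield None instead of being accessed.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-device IOVA mapping table.
#[derive(Default)]
pub struct IovaMappings {
    // iova start -> (gpa start, size)
    map: BTreeMap<u64, (u64, u64)>,
}

impl IovaMappings {
    pub fn map(&mut self, iova: u64, gpa: u64, size: u64) {
        self.map.insert(iova, (gpa, size));
    }

    /// Translates a vring address from IOVA to GPA. None means the address
    /// is outside any mapped range and must not be accessed.
    pub fn translate(&self, iova: u64) -> Option<u64> {
        let (&start, &(gpa, size)) = self.map.range(..=iova).next_back()?;
        let offset = iova - start;
        if offset < size {
            Some(gpa + offset)
        } else {
            None
        }
    }
}
```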
Add the virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly requested
by the user. This feature is needed to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
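The feature negotiation side can be sketched as a single bit operation. VIRTIO_F_IOMMU_PLATFORM is feature bit 33 in the virtio specification (renamed VIRTIO_F_ACCESS_PLATFORM in virtio 1.1); avail_features() is a hypothetical helper, and the `iommu` flag stands for the user's explicit request.

```rust
/// Feature bit number from the virtio specification.
pub const VIRTIO_F_IOMMU_PLATFORM: u32 = 33;

/// Hypothetical helper: the device's advertised features, with
/// VIRTIO_F_IOMMU_PLATFORM added only when the user asked for it.
pub fn avail_features(base: u64, iommu: bool) -> u64 {
    let mut features = base;
    if iommu {
        features |= 1u64 << VIRTIO_F_IOMMU_PLATFORM;
    }
    features
}
```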
The virtio specification defines that a device can be reset, which was
not supported by this virtio-console implementation. Reset support is
needed to unbind this device from the guest driver and rebind it to the
vfio-pci driver.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The vhost-user configuration fields point to a vm_virtio structure
(VhostUserConfig), and in order to make the whole config serializable
(through the serde crate for example), we'd have to add a serde
dependency to the vm_virtio crate.
Instead, we use a local, serializable structure and convert it to
VhostUserConfig from the DeviceManager code.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
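A sketch of the local-structure-plus-conversion approach. The fields of both structs are assumed for illustration, and VhostUserOptions is a hypothetical name for the local type. In the real code the local type is the one that would carry the serde derives, so vm_virtio stays free of that dependency.

```rust
/// Stand-in for vm_virtio's VhostUserConfig (fields assumed).
pub struct VhostUserConfig {
    pub sock: String,
    pub num_queues: usize,
    pub queue_size: u16,
}

/// Local configuration type owned by the VMM; this is where
/// #[derive(Serialize, Deserialize)] would go.
#[derive(Clone, Debug, PartialEq)]
pub struct VhostUserOptions {
    pub sock: String,
    pub num_queues: usize,
    pub queue_size: u16,
}

/// Conversion performed by the DeviceManager when creating the device.
impl From<&VhostUserOptions> for VhostUserConfig {
    fn from(opts: &VhostUserOptions) -> Self {
        VhostUserConfig {
            sock: opts.sock.clone(),
            num_queues: opts.num_queues,
            queue_size: opts.queue_size,
        }
    }
}
```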
Based on crosvm revision b5237bbcf074eb30cf368a138c0835081e747d71, add
a CMOS device. This helps environments that can't use the KVM clock to
get the current time (e.g. Windows and EFI).
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
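A heavily reduced sketch of how a guest reads the time from a CMOS RTC: it writes a register index to port 0x70 (bit 7 is the NMI-disable bit) and reads the value, BCD encoded by default, from port 0x71. This is not the crosvm code; Cmos here takes the time as constructor arguments instead of reading the host clock, and omits status registers and NVRAM.

```rust
// Standard CMOS RTC register indices.
const RTC_SECONDS: u8 = 0x00;
const RTC_MINUTES: u8 = 0x02;
const RTC_HOURS: u8 = 0x04;

/// Hypothetical reduced CMOS device with an explicit time.
pub struct Cmos {
    index: u8,
    seconds: u8,
    minutes: u8,
    hours: u8,
}

/// RTC registers default to BCD encoding, e.g. 59 -> 0x59.
fn to_bcd(v: u8) -> u8 {
    ((v / 10) << 4) | (v % 10)
}

impl Cmos {
    pub fn new(hours: u8, minutes: u8, seconds: u8) -> Self {
        Cmos { index: 0, seconds, minutes, hours }
    }

    /// Guest write to the index port (0x70); bit 7 disables NMIs, so only
    /// the low 7 bits select a register.
    pub fn write_index(&mut self, value: u8) {
        self.index = value & 0x7f;
    }

    /// Guest read from the data port (0x71).
    pub fn read_data(&self) -> u8 {
        match self.index {
            RTC_SECONDS => to_bcd(self.seconds),
            RTC_MINUTES => to_bcd(self.minutes),
            RTC_HOURS => to_bcd(self.hours),
            // Status registers and NVRAM are omitted in this sketch.
            _ => 0,
        }
    }
}
```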
Refactor the PCI data structures to move the device ownership to a PciBus
struct. This PciBus struct can then be used by both a PciConfigIo and
PciConfigMmio in order to expose the configuration space via both IO
port and also via MMIO for PCI MMCONFIG.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
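To illustrate why both front-ends can share one PciBus, here is a sketch of how each decodes a guest access into the same (bus, device, function, offset) tuple. The bit layouts follow the legacy 0xCF8 configuration address register and the PCI Express MMCONFIG (ECAM) offset encoding; ConfigAddr, decode_cf8() and decode_ecam() are hypothetical names.

```rust
/// Decoded PCI configuration-space address.
#[derive(Debug, PartialEq)]
pub struct ConfigAddr {
    pub bus: u8,
    pub device: u8,
    pub function: u8,
    pub offset: u16,
}

/// Port I/O mechanism: value written by the guest to port 0xCF8.
pub fn decode_cf8(value: u32) -> Option<ConfigAddr> {
    // Bit 31 is the enable bit; without it the access is ignored.
    if value & 0x8000_0000 == 0 {
        return None;
    }
    Some(ConfigAddr {
        bus: ((value >> 16) & 0xff) as u8,
        device: ((value >> 11) & 0x1f) as u8,
        function: ((value >> 8) & 0x7) as u8,
        offset: (value & 0xfc) as u16,
    })
}

/// MMCONFIG: offset of the access from the MMIO config window base.
pub fn decode_ecam(offset: u64) -> ConfigAddr {
    ConfigAddr {
        bus: ((offset >> 20) & 0xff) as u8,
        device: ((offset >> 15) & 0x1f) as u8,
        function: ((offset >> 12) & 0x7) as u8,
        offset: (offset & 0xfff) as u16,
    }
}
```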
In order to avoid introducing a dependency on arch in the devices crate,
pass the constant into the IOAPIC device creation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rearrange "use" statements and rename variables and fields to
indicate they might be unused.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This removes the register_devices() function, spreading its
functionality across the places where the devices are created.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Mark exit_evt with an underscore as it may be unused (it is ignored if
the "acpi" feature is not turned on).
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add (non-default) support for using MMIO for virtio devices. This can be
tested by:
cargo build --no-default-features --features "mmio"
All necessary options will be injected into the kernel command line.
Fixes: #243
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
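A sketch of the command line injection, using the `virtio_mmio.device=<size>@<baseaddr>:<irq>` syntax documented for the Linux virtio_mmio driver. The helper names are hypothetical, and expressing the size in KiB is a choice made for this sketch.

```rust
/// Formats one device; `size` is expressed in KiB here, so this sketch
/// expects a multiple of 1024.
pub fn virtio_mmio_param(size: u64, base: u64, irq: u32) -> String {
    debug_assert!(size % 1024 == 0, "size must be a multiple of 1 KiB");
    format!("virtio_mmio.device={}K@0x{:08x}:{}", size / 1024, base, irq)
}

/// Appends one parameter per (size, base, irq) device to the command line.
pub fn append_virtio_mmio_devices(cmdline: &mut String, devices: &[(u64, u64, u32)]) {
    for &(size, base, irq) in devices {
        if !cmdline.is_empty() {
            cmdline.push(' ');
        }
        cmdline.push_str(&virtio_mmio_param(size, base, irq));
    }
}
```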
Rather than calling it at the very start of the VM execution (i.e. when
the VCPUs are created), do it as part of the DeviceManager creation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create the virtio devices independently of adding them to the PCI bus.
Instead, accrue the devices in a vector and add them to the bus en masse.
This will allow the virtio device creation to be used independently of
PCI based transport.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit relies on the new vsock::unix module to create the backend
that will be used from the virtio-vsock device.
The concept of a backend is interesting here, as it would allow a vhost
kernel backend to be plugged in if that was ever needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on previous patch introducing the new flag "--vsock", this commit
creates a new virtio-vsock device based on the presence of this flag.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The default number of MSI-X vectors allocated was 2, which is the minimum
defined by the virtio specification. The reason for this minimum is that
virtio needs at least one interrupt to signal that configuration changed
and at least one to specify something happened regarding the virtqueues.
But this current implementation is not optimal because our VMM supports
as many MSI-X vectors as allowed by the MSI-X specification (2048 max).
For that reason, the current patch relies on the number of virtqueues
needed by the virtio device to determine the right amount of MSI-X
vectors needed. It's important not to forget the dedicated vector for
any configuration change too.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
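The sizing rule can be sketched as one vector per virtqueue plus one for configuration changes, bounded by the 2048-entry MSI-X limit. Keeping the minimum of 2 for devices with no queues is an assumption of this sketch, and msix_vectors() is a hypothetical name.

```rust
/// Upper bound on the MSI-X table size.
const MSIX_MAX_VECTORS: usize = 2048;
/// Minimum kept for devices with no queues (assumed here).
const MSIX_MIN_VECTORS: usize = 2;

/// One vector per virtqueue plus one dedicated to configuration changes.
pub fn msix_vectors(num_queues: usize) -> u16 {
    num_queues
        .saturating_add(1)
        .clamp(MSIX_MIN_VECTORS, MSIX_MAX_VECTORS) as u16
}
```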
Refactor out DeviceManager into its own file. This is part of a bigger
effort to reduce complexity in the vm.rs file but will also allow future
separation to allow making PCI support optional.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>