Commit Graph

109 Commits

Author SHA1 Message Date
Sebastien Boeuf
75e22ff34e vmm: Use LegacyUserspaceInterruptGroup for serial device
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
8d7c4ea334 vmm: Use LegacyUserspaceInterruptGroup for mmio devices
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
f70c9937fb vmm: Add ioapic to KvmInterruptManager
By having a reference to the IOAPIC, the KvmInterruptManager is going
to be able to initialize properly the legacy interrupt source group.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
e73cb1ff80 vmm: Initialize InterruptManager sooner
In order to let the InterruptManager be shared across both PCI and MMIO
devices, this commit moves the initialization earlier in the code.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-21 10:44:48 +01:00
Sebastien Boeuf
4bb12a2d8d interrupt: Reorganize all interrupt management with InterruptManager
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.

By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.

Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupt, no matter the type of
interrupt and no matter which hypervisor might be in use.

Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
c396baca46 vm-virtio: Modify VirtioInterrupt callback into a trait
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.

Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.

For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
2381f32ae0 msix: Add gsi_msi_routes to MsixConfig
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
9b60fcdc39 msix: Add VmFd to MsixConfig
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
86c760a0d9 msix: Add SystemAllocator to MsixConfig
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.

Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sebastien Boeuf
f5704d32b3 vmm: Move gsi_msi_routes creation to be shared across all PCI devices
Because we will need to share the same list of GSI routes across
multiple PCI devices (virtio-pci, VFIO), this commit moves the creation
of such list to a higher level location in the code.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-01-17 23:43:45 +01:00
Sergio Lopez
a14aee9213 qcow: Use RawFile as backend instead of File
Use RawFile as backend instead of File. This allows us to abstract
the access to the actual image with a specialized layer, so we have a
place where we can deal with the low-level peculiarities.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
Sergio Lopez
c5a656c9dc vm-virtio: block: Add support for alignment restrictions
Doing I/O on an image opened with O_DIRECT requires to adhere to
certain restrictions, requiring the following elements to be aligned:

 - Address of the source/destination memory buffer.
 - File offset.
 - Length of the data to be read/written.

The actual alignment value depends on various elements, and according
to open(2) "(...) there is currently no filesystem-independent
interface for an application to discover these restrictions (...)".

To discover such value, we iterate through a list of alignments
(currently, 512 and 4096) calling pread() with each one and checking
if the operation succeeded.

We also extend RawFile so it can be used as a backend for QcowFile,
so the later can be easily adapted to support O_DIRECT too.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-01-17 17:28:44 +00:00
Cathy Zhang
652e7b9b8a vm-virtio: Implement multiple queue support for net devices
Update the common part in net_util.rs under vm-virtio to add mq
support, meanwhile enable mq for virtio-net device, vhost-user-net
device and vhost-user-net backend. Multiple threads will be created,
one thread will be responsible to handle one queue pair separately.
To gain the better performance, it requires to have the same amount
of vcpus as queue pair numbers defined for the net device, due to
the cpu affinity.

Multiple thread support is not added for vhost-user-net backend
currently, it will be added in future.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
4ab88a8173 net_util: Add multiple queue support for tap
Add support to allow VMMs to open the same tap device many times, it will
create multiple file descriptors meanwhile.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Cathy Zhang
1ae7deb393 vm-virtio: Implement refactor for net devices and backend
Since the common parts are put into net_util.rs under vm-virtio,
refactoring code for virtio-net device, vhost-user-net device
and backend to shrink the code size and improve readability
meanwhile.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-01-17 12:06:19 +01:00
Rob Bradford
8b500d7873 deps: Bump vm-memory and linux-loader version
The function GuestMemory::end_addr() has been renamed to last_addr()

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
7310ab6fa7 devices, vmm: Use a bit field for ACPI GED interrupt type
Use independent bits for storing whether there is a CPU or memory device
changed when reporting changes via ACPI GED interrupt. This prevents a
later notification squashing an earlier one and ensure that hotplugging
both CPU and memory at the same time succeeds.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
4e414f0d84 vmm: device_manager: Scan memory devices upon GED interrupt
If there is a GED interrupt and the field indicates that the memory
device has changed triggers a scan of the memory devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
8ecf736982 vmm: device_manager: Add the MemoryManager to the I/O bus
Now that the MemoryManager has I/O port functionality it needs to be
exposed on the I/O bus.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Rob Bradford
78dcb1862c vmm: device_manager: Store the type of notification in a local value
When the value is read from the I/O port via the ACPI AML functions to
determine what has been triggered the notifiction value is reset
preventing a second read from exposing the value. If we need support
multiple types of GED notification (such as memory hotplug) then we
should avoid reading the value multiple times.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-15 20:21:22 +01:00
Samuel Ortiz
5788d36583 vmm: Do not create virtio devices when missing a transport
If neither PCI or MMIO are built in, we should not bother creating any
virtio devices at all.
When building a minimal VMM made of a kernel with an initramfs and a
serial console, the RNG virtio device is still created even though there
is no way it can ever get probed.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-01-14 07:42:09 +01:00
Rob Bradford
b2589d4f3f vm-virtio, vmm, vfio: Store GuestMemoryMmap in an Arc<ArcSwap<T>>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.

Not covered by this change is replacing the GuestMemoryMmap itself.

This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-02 13:20:11 +00:00
Rob Bradford
a551398135 vmm: device_manager: Use MemoryManager to create KVM mapping
Use the newly exported funtionality to reduce the amount of duplicated
code.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
7df88793a0 vmm: device_manager: Get device range from MemoryManager
This removes the duplication of these values.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
61cfe3e72d vmm: Obtain sequential KVM memory slot numbers from MemoryManager
This removes the need to handle a mutable integer and also centralises
the allocation of these slot numbers.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
260cebb8cf vmm: Introduce MemoryManager
The memory manager is responsible for setting up the guest memory and in
the long term will also handle addition of guest memory.

In this commit move code for creating the backing memory and populating
the allocator into the new implementation trying to make as minimal
changes to other code as possible.

Follow on commits will further reduce some of the duplicated code.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-23 10:25:40 +00:00
Rob Bradford
d5682cd306 vmm: device_manager: Rewrite if chain using match
To reflect updated clippy rules:

error: `if` chain can be rewritten with `match`
    --> vmm/src/device_manager.rs:1508:25
     |
1508 | /                         if ret > 0 {
1509 | |                             debug!("MSI message successfully delivered");
1510 | |                         } else if ret == 0 {
1511 | |                             warn!("failed to deliver MSI message, blocked by guest");
1512 | |                         }
     | |_________________________^
     |
     = note: `-D clippy::comparison-chain` implied by `-D warnings`
     = help: Consider rewriting the `if` chain to use `cmp` and `match`.
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#comparison_chain

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-20 00:52:03 +01:00
Rob Bradford
e25a47b32c vmm: device_manager: Remove redundant clones
Address updated clippy errors:

error: redundant clone
   --> vmm/src/device_manager.rs:699:32
    |
699 |             .insert(acpi_device.clone(), 0x3c0, 0x4)
    |                                ^^^^^^^^ help: remove this
    |
    = note: `-D clippy::redundant-clone` implied by `-D warnings`
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:699:21
    |
699 |             .insert(acpi_device.clone(), 0x3c0, 0x4)
    |                     ^^^^^^^^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone

error: redundant clone
   --> vmm/src/device_manager.rs:737:26
    |
737 |             .insert(i8042.clone(), 0x61, 0x4)
    |                          ^^^^^^^^ help: remove this
    |
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:737:21
    |
737 |             .insert(i8042.clone(), 0x61, 0x4)
    |                     ^^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone

error: redundant clone
   --> vmm/src/device_manager.rs:754:29
    |
754 |                 .insert(cmos.clone(), 0x70, 0x2)
    |                             ^^^^^^^^ help: remove this
    |
note: this value is dropped without further use
   --> vmm/src/device_manager.rs:754:25
    |
754 |                 .insert(cmos.clone(), 0x70, 0x2)
    |                         ^^^^
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-20 00:52:03 +01:00
Rob Bradford
e8313e3e69 vmm: acpi: Refactor ACPI CPU notification
Continue to notify on all vCPUs but instead separate the notification
functionality into two methods, CSCN that walks through all the CPUs
and CTFY which notifies based on the numerical CPU id. This is an
interim step towards only notifying on changed CPUs and ultimately CPU
removal.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-16 23:57:14 +01:00
Samuel Ortiz
f0b7412495 vmm: device_manager: Add all virtio devices to the migratable list
We want to track all migratable devices through the DeviceManager.

Fixes: #341

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 08:50:36 +01:00
Samuel Ortiz
35dd1523c9 vmm: device_manager: Implement the Pausable trait
Since the Snapshotable placeholder and Migratable traits are provided as
well, the DeviceManager object and all its objects are now Migratable.

All Migratable devices are tracked as Arc<Mutex<dyn Migratable>>
references.

Keeping track of all migratable devices allows for implementing the
Migratable trait for the DeviceManager structure, making the whole
device model potentially migratable.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 08:50:36 +01:00
Samuel Ortiz
35d7721683 vmm: Convert virtio devices to Arc<Mutex<T>>
Migratable devices can be virtio or legacy devices.
In any case, they can potentially be tracked through one of the IO bus
as an Arc<Mutex<dyn BusDevice>>. In order for the DeviceManager to also
keep track of such devices as Migratable trait objects, they must be
shared as mutable atomic references, i.e. Arc<Mutex<T>>. That forces all
Migratable objects to be tracked as Arc<Mutex<dyn Migratable>>.

Virtio devices are typically migratable, and thus for them to be
referenced by the DeviceManager, they now should be built as
Arc<Mutex<VirtioDevice>>.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-12-12 08:50:36 +01:00
Sebastien Boeuf
64c5e3d8cb vmm: api: Adjust FsConfig for OpenAPI
The FsConfig structure has been recently adjusted so that the default
value matches between OpenAPI and CLI. Unfortunately, with the current
description, there is no way from the OpenAPI to describe a cache_size
value "None", so that DAX does not get enabled. Usually, using a Rust
"Option" works because the default value is None. But in this case, the
default value is Some(8G), which means we cannot describe a None.

This commit tackles the problem, introducing an explicit parameter
"dax", and leaving "cache_size" as a simple u64 integer.

This way, the default value is dax=true and cache_size=8G, but it lets
the opportunity to disable DAX entirely with dax=false, which will
simply ignore the cache_size value.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-11 15:50:24 +00:00
Sebastien Boeuf
5e0bbf9c3b vmm: Don't factorize vhost-user configurations
We want to set different default configurations for vhost-user-net and
vhost-user-blk, which is the reason why the common part corresponding to
the number of queues and the queue size cannot be embedded.

This prepares for the following commit, matching API and CLI behaviors.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-11 15:50:24 +00:00
Rob Bradford
ba59c62044 vmm, devices: Remove hardcoded IRQ number for GED device
Remove the previously hardcoded IRQ number used for the GED device.
Instead allocate the IRQ using the allocator and use that value in the
definition in the ACPI device.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-09 16:58:00 +00:00
Sebastien Boeuf
aa94e9b8f3 Revert "vmm: api: Modify FsConfig to be OpenAPI friendly"
This reverts commit defc5dcd9c.
2019-12-06 18:08:10 +00:00
Rob Bradford
9b1ba14f2d vmm: Delegate device related ACPI DSDT table work to DeviceManager
Move the code for handling the creation of the DSDT entries for devices
into the DeviceManager.

This will make it easier to handle device hotplug and also in the future
remove some hardcoded ACPI constants.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-06 17:44:00 +00:00
Sebastien Boeuf
defc5dcd9c vmm: api: Modify FsConfig to be OpenAPI friendly
When consumer of the HTTP API try to interact with cloud-hypervisor,
they have to provide the equivalent of the config structure related to
each component they need. Problem is, the Rust enum type "Option" cannot
be obtained from the OpenAPI YAML definition.

This patch intends to fix this inconsistency between what is possible
through the CLI and what's possible through the HTTP API by using simple
types bool and int64 instead of Option<u64>.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-06 06:38:48 -08:00
Rob Bradford
59d01712ad vmm: Remove kernel based IOAPIC handling from the device manager
Previously the device setup code assumed that if no IOAPIC was passed in
then the device should be added to the kernel irqchip. As an earlier
change meant that there was always a userspace IOAPIC this kernel based
code can be removed.

The accessor still returns an Option type to leave scope for
implementing a situation without an IOAPIC (no serial or GED device).
This change does not add support no-IOAPIC mode as the original code did
not either.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-06 12:34:06 +01:00
Rob Bradford
9b1cb9621f vmm: Remove pin based interrupt setup for virtio devices
With MSI now required remove pin based interrupt support from all the
virtio PCI device setup.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-06 12:34:06 +01:00
Rob Bradford
1722708612 vmm: Switch to storing VmConfig inside an Arc<Mutex<>>
This permits the runtime reconfiguration of the VM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-05 16:39:19 +00:00
Sebastien Boeuf
08258d5dad vfio: pci: Allow multiple devices to be passed through
The KVM_SET_GSI_ROUTING ioctl is very simple, it overrides the previous
routes configuration with the new ones being applied. This means the
caller, in this case cloud-hypervisor, needs to maintain the list of all
interrupts which needs to be active at all times. This allows to
correctly support multiple devices to be passed through the VM and being
functional at the same time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-12-04 08:48:17 +01:00
Rob Bradford
791ca3388f vmm: device_manager: Add ability to notify via GED device
Add ability to notify via the GED device that there is some new hotplug
activity. This will be used by the CpuManager (and later DeviceManager
itself) to notify of new hotplug activity.

Currently it has a hardcoded IRQ of 5 as the ACPI tables also need to
refer to this IRQ and the IRQ allocation does not permit the allocation
of specific IRQs.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-02 13:49:04 +00:00
Rob Bradford
7ad68d499a vmm: device_manager: Allocate I/O port for ACPI shutdown device
The refactoring in ce1765c8af dropped the
code to allocate the I/O port.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-12-02 13:49:04 +00:00
Samuel Ortiz
0f21781fbe cargo: Bump the kvm and vmm-sys-util crates
Since the kvm crates now depend on vmm-sys-util, the bump must be
atomic.
The kvm-bindings and ioctls 0.2.0 and 0.4.0 crates come with a few API
changes, one of them being the use of a kvm_ioctls specific error type.
Porting our code to that type makes for a fairly large diff stat.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-11-29 17:48:02 +00:00
Sebastien Boeuf
f979380620 vmm: Mark guest persistent memory pages as mergeable
In case the VM is started with the flag "--pmem mergeable=on", it means
the user expects the guest persistent memory pages to be marked as
mergeable. This commit relies on the madvise(MADV_MERGEABLE) system call
to inform the host kernel about these pages.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-11-22 15:28:10 +00:00
Rob Bradford
50c8335d3d vmm: device_manager: Expose the SystemAllocator
This allows other code to allocate I/O ports for use on the (already)
exposed IO bus.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-11-21 09:17:15 -08:00
Samuel Ortiz
f0e618431d vmm: device_manager: Use consistent naming when adding devices
When adding devices to the guest, and populating the device model, we
should prefix the routines with add_. When we're just creating the
device objects but not yet adding them we use make_.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-11-19 13:36:21 -08:00
Samuel Ortiz
a2ee681665 vmm: device_manager: Add an MMIO devices creation routine
In order to reduce the DeviceManager's new() complexity, we can move the
MMIO devices creation code into its own routine.

Fixes: #441

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-11-19 13:36:21 -08:00
Samuel Ortiz
79b8f8e477 vmm: device_manager: Add a PCI devices creation routine
In order to reduce the DeviceManager's new() complexity, we can move the
PCI devices creation code into its own routine.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-11-19 13:36:21 -08:00