Commit Graph

484 Commits

Author SHA1 Message Date
Sebastien Boeuf
6cbdb9aa47 vmm: api: Introduce new "remove-device" HTTP endpoint
This commit introduces the new command "remove-device" that will let a
user hot-unplug a VFIO PCI device from an already running VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
991f3bb5da vmm: Remove VFIO device from everywhere it is referenced
This commit implements the eject function so that a VFIO device will be
removed from any bus it might sit on, and from any list it might be
stored in.

The idea is to reach a point where no reference to the device remains
anywhere in the code, so that its Drop implementation is invoked and the
device is fully removed from the VMM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
6adebbc6a0 vmm: Detect when guest notifies about ejecting PCI device
When the guest OS is done removing a PCI device, it will invoke the _EJ0
ACPI method associated with the device. This will trigger a port IO
write to a region known by the VMM. Upon this write, the VMM will trap
the VM exit and retrieve the written value.

Based on the value, the VMM will invoke its eject_device() method to
finalize the removal of the device.
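The following is a minimal Rust sketch of that eject path. It assumes, as in QEMU-style ACPI PCI hotplug, that the value written to B0EJ is a bitmap where bit N identifies slot N. The `B0EJ_OFFSET` value, the `slots` map and the `io_write` method are hypothetical stand-ins, not the real DeviceManager API.

```rust
use std::collections::BTreeMap;

// Hypothetical offset of the B0EJ field within the hotplug IO port region.
const B0EJ_OFFSET: u64 = 0x8;

// Hypothetical, heavily simplified DeviceManager.
struct DeviceManager {
    // slot number -> device name; stands in for the real device bookkeeping
    slots: BTreeMap<u8, String>,
}

impl DeviceManager {
    fn eject_device(&mut self, slot: u8) {
        // The real implementation removes the device from every bus and list.
        self.slots.remove(&slot);
    }

    // Invoked with the offset and data of a trapped port IO write.
    fn io_write(&mut self, offset: u64, data: &[u8]) {
        if offset != B0EJ_OFFSET || data.len() != 4 {
            return;
        }
        let mut bitmap = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
        while bitmap != 0 {
            let slot = bitmap.trailing_zeros() as u8; // lowest set bit = slot
            self.eject_device(slot);
            bitmap &= bitmap - 1; // clear the bit just handled
        }
    }
}

fn main() {
    let mut dm = DeviceManager {
        slots: BTreeMap::from([(1, "vfio0".to_string()), (3, "vfio1".to_string())]),
    };
    // The guest's _EJ0 for slot 3 writes 1 << 3 to B0EJ.
    dm.io_write(B0EJ_OFFSET, &(1u32 << 3).to_le_bytes());
    println!("{:?}", dm.slots); // {1: "vfio0"}
}
```

Writes to any other offset are ignored here. The real handler also serves the PCIU and PCID fields from the same region.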

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
08604ac6a8 vmm: Store PCI devices as Any devices from DeviceManager
As we try to keep track of every PCI device related to the VM, we don't
want to have separate lists depending on the concrete type implementing
the PciDevice trait. We also want to be able to cast the actual type
into any trait or concrete type.

The most efficient way to solve all these issues is to store every
device as an Arc<dyn Any + Send + Sync>. This gives the ability to
downcast into the appropriate concrete type, and then to cast back into
any trait that we might need.
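The sketch below shows the store, downcast and cast-back pattern using the standard library's `Arc::downcast` on `Arc<dyn Any + Send + Sync>`. The `PciDevice` trait, the device structs and `vfio_device_names` are hypothetical simplifications, not the real cloud-hypervisor types.

```rust
use std::any::Any;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the real trait and device types.
trait PciDevice {
    fn name(&self) -> String;
}

struct VfioPciDevice {
    id: u32,
}

impl PciDevice for VfioPciDevice {
    fn name(&self) -> String {
        format!("vfio{}", self.id)
    }
}

struct VirtioPciDevice;

// Downcast every stored device to a concrete type, then use it through a trait.
fn vfio_device_names(devices: &[Arc<dyn Any + Send + Sync>]) -> Vec<String> {
    devices
        .iter()
        // Arc::downcast returns Err(original) when the concrete type differs.
        .filter_map(|dev| Arc::clone(dev).downcast::<Mutex<VfioPciDevice>>().ok())
        .map(|vfio| {
            // Cast the concrete type back into a trait object.
            let pci: Arc<Mutex<dyn PciDevice>> = vfio;
            // Bind first so the lock guard is dropped before `pci`.
            let name = pci.lock().unwrap().name();
            name
        })
        .collect()
}

fn main() {
    // One list for every device, whatever its concrete type.
    let devices: Vec<Arc<dyn Any + Send + Sync>> = vec![
        Arc::new(Mutex::new(VfioPciDevice { id: 0 })),
        Arc::new(Mutex::new(VirtioPciDevice)),
    ];
    println!("{:?}", vfio_device_names(&devices)); // ["vfio0"]
}
```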

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Sebastien Boeuf
0f99d3f7cc vmm: Store VFIO device's name and its PCI b/d/f
Add a new list storing the device names across the entire codebase. VFIO
devices are added to the list whenever a new one is created. By default,
each VFIO device is given a name "vfioX" where X is the first available
integer.
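This is a sketch of the "first available integer" naming rule. The `next_vfio_name` helper and the use of a `HashSet` for the name list are assumptions made for illustration.

```rust
use std::collections::HashSet;

// Hypothetical helper: "vfioX" where X is the first integer not already used.
fn next_vfio_name(used: &HashSet<String>) -> String {
    (0u32..)
        .map(|i| format!("vfio{i}"))
        .find(|name| !used.contains(name))
        .expect("ran out of vfio names")
}

fn main() {
    let mut names = HashSet::new();
    for _ in 0..3 {
        let name = next_vfio_name(&names);
        names.insert(name);
    }
    names.remove("vfio1"); // e.g. vfio1 was unplugged
    println!("{}", next_vfio_name(&names)); // vfio1
}
```

Because freed names are reused, a device unplugged and then re-added will get the lowest free index rather than a new, ever-growing one.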

Along with this new list of names, another list is created, associating
each PCI device's name with its b/d/f (bus/device/function). This will
help keep track of the created devices so that we can implement unplug
functionality.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-10 17:05:06 +00:00
Rob Bradford
f0a3e7c4a1 build: Bump linux-loader and vm-memory dependencies
linux-loader now uses the released vm-memory so we must move to that
version at the same time.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-05 11:01:30 +01:00
Sebastien Boeuf
09829c44b2 vmm: Remove IO bus strong reference from Vm
The Vm structure used to store a strong reference to the IO bus. This
is no longer needed since the AddressManager is logically the one
holding this strong reference. This has been made possible by the
introduction of Weak references in the Bus structure itself.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
2dbb376175 vmm: Remove all Weak references from DeviceManager
Now that the BusDevice devices are stored as Weak references by the
IO and MMIO buses, there's no need to use Weak references from the
DeviceManager anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
9e915a0284 vmm: Remove all Weak references from CpuManager
Now that the BusDevice devices are stored as Weak references by the
IO and MMIO buses, there's no need to use Weak references from the
CpuManager anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
49268bff3b pci: Remove all Weak references from PciBus
Now that the BusDevice devices are stored as Weak references by the IO
and MMIO buses, there's no need to use Weak references from the PciBus
anymore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
7773812f58 vmm: Store the list of BusDevice devices from DeviceManager
The point is to make sure the DeviceManager holds a strong reference to
each BusDevice inserted on the IO and MMIO buses. This will allow these
buses to hold Weak references to the BusDevice devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
d0820cc026 vmm: Make add_vfio_device mutable
The method add_vfio_device() from the DeviceManager needs to be mutable
if we later want to be able to update some internal fields of the
DeviceManager from this same function.

This commit simply makes the changes needed for this function to be
mutable.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
948f808da6 vm: Rename DeviceManager field in Vm structure
It's more logical to name the field referring to the DeviceManager as
"device_manager" instead of "devices".

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 18:46:44 +01:00
Sebastien Boeuf
d47f733e51 vmm: Break the cyclic dependency between DeviceManager and IO bus
By inserting the DeviceManager on the IO bus, we introduced a cyclic
dependency:

  DeviceManager ---> AddressManager ---> Bus ---> BusDevice
        ^                                             |
        |                                             |
        +---------------------------------------------+

This cycle needs to be broken by inserting a Weak reference instead of
an Arc (which is a strong reference).
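The sketch below shows why the Weak link matters. It reproduces the cycle from the diagram with simplified, hypothetical types (`Bus`, `AddressManager`, `DeviceManager`, a one-method `BusDevice`), not the real ones. The bus upgrades its Weak reference on each access, and once the last strong reference goes away the DeviceManager is actually freed instead of leaking.

```rust
use std::sync::{Arc, Mutex, Weak};

// Hypothetical, simplified stand-ins for the real types.
trait BusDevice {
    fn read(&mut self, data: &mut [u8]);
}

#[derive(Default)]
struct Bus {
    // Weak references: the bus does not keep its devices alive.
    devices: Mutex<Vec<Weak<Mutex<dyn BusDevice>>>>,
}

impl Bus {
    fn insert(&self, device: Weak<Mutex<dyn BusDevice>>) {
        self.devices.lock().unwrap().push(device);
    }

    // Returns false if the device at `index` is missing or already dropped.
    fn read(&self, index: usize, data: &mut [u8]) -> bool {
        let device = self.devices.lock().unwrap().get(index).and_then(|w| w.upgrade());
        match device {
            Some(device) => {
                device.lock().unwrap().read(data);
                true
            }
            None => false,
        }
    }
}

struct AddressManager {
    _io_bus: Arc<Bus>,
}

struct DeviceManager {
    _address_manager: Arc<AddressManager>,
}

impl BusDevice for DeviceManager {
    fn read(&mut self, data: &mut [u8]) {
        data.fill(0xab);
    }
}

// Returns (served while alive, served after the last Arc was dropped).
fn demo() -> (bool, bool) {
    let io_bus = Arc::new(Bus::default());
    let address_manager = Arc::new(AddressManager { _io_bus: Arc::clone(&io_bus) });
    let device_manager: Arc<Mutex<dyn BusDevice>> =
        Arc::new(Mutex::new(DeviceManager { _address_manager: address_manager }));
    // DeviceManager -> AddressManager -> Bus -(Weak)-> DeviceManager
    io_bus.insert(Arc::downgrade(&device_manager));

    let mut data = [0u8; 1];
    let alive = io_bus.read(0, &mut data);
    // Dropping the only strong reference frees the DeviceManager (and its
    // AddressManager); with an Arc stored on the bus this would leak instead.
    drop(device_manager);
    (alive, io_bus.read(0, &mut data))
}

fn main() {
    println!("{:?}", demo()); // (true, false)
}
```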

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
c1af13efeb vmm: Update VmConfig when adding new device
Ensures the configuration is updated after a new device has been
hotplugged. In the event of a reboot, this means the new VM will be
started with the new device that had been previously hotplugged.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
a86f4369a7 vmm: Add VFIO PCI device hotplug support
This commit finalizes the VFIO PCI hotplug support, based on all the
previous commits preparing for it.

Note that this does not support vIOMMU yet. This means we can hotplug
VFIO PCI devices, but we cannot attach them to an existing or a new
virtio-iommu device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
320fea0eaf vmm: Factorize VFIO PCI device creation
This factorization is important as it allows both the standard
codepath and the VFIO PCI hotplug codepath to rely on the same function
to perform the addition of a new VFIO PCI device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
00716f90a0 vmm: Store virtio-iommu device from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
5902dfa403 vmm: Store VFIO KVM device from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
d9c1b4396e vmm: Store MSI InterruptManager from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
02adc4061a vmm: Store PciBus from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
d0218e94a3 vmm: Trigger hotplug notification to the guest
Whenever the user wants to hotplug a new VFIO PCI device, the VMM will
have to trigger a hotplug notification through the GED device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
0e58741a09 vmm: api: Introduce new "add-device" HTTP endpoint
This commit introduces the new command "add-device" that will let a user
hotplug a VFIO PCI device to an already running VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
0f1396acef vmm: Insert PCI device hotplug operation region on IO bus
Through the BusDevice implementation from the DeviceManager, and by
inserting the DeviceManager on the IO bus for a specific IO port range,
the VMM now has the ability to handle PCI device hotplug.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
65774e8a78 vmm: Implement BusDevice for DeviceManager
In anticipation of inserting the DeviceManager on the IO/MMIO buses,
the DeviceManager must implement the BusDevice trait.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
8dbc84318c vmm: acpi: Add PCNT method to invoke DVNT
Create a small method that first performs the hotplug of all the devices
identified by the PCIU bitmap, and then performs the hot-unplug of all
the devices identified by the PCID bitmap.
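DVNT and PCNT are AML methods in the DSDT, so the following is only a Rust model of their logic. It shows the PCIU-then-PCID ordering over the 32 slots of the main PCI host bridge. The use of the standard ACPI Notify values 1 (Device Check) for added slots and 3 (Eject Request) for removed slots is an assumption, as are the `dvnt` and `pcnt` function names.

```rust
// Standard ACPI Notify values; assumed here to be what the hotplug AML uses.
const DEVICE_CHECK: u8 = 1;
const EJECT_REQUEST: u8 = 3;

// Rust model of DVNT: one (slot, notify value) pair per set bit of `bitmap`,
// covering the 32 slots of the PCI host bridge.
fn dvnt(bitmap: u32, value: u8) -> impl Iterator<Item = (u8, u8)> {
    (0u8..32)
        .filter(move |&slot| (bitmap & (1u32 << slot)) != 0)
        .map(move |slot| (slot, value))
}

// Rust model of PCNT: hotplug the PCIU slots first, then hot-unplug PCID ones.
fn pcnt(pciu: u32, pcid: u32) -> Vec<(u8, u8)> {
    dvnt(pciu, DEVICE_CHECK).chain(dvnt(pcid, EJECT_REQUEST)).collect()
}

fn main() {
    // Slot 1 was hotplugged and slot 3 is being unplugged.
    for (slot, value) in pcnt(1 << 1, 1 << 3) {
        println!("slot {slot}: Notify value {value}");
    }
}
```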

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
c62db97a81 vmm: acpi: Add _EJ0 to each PCI device slot
The _EJ0 method provides the guest OS with a way to notify the VMM that
the device has been properly ejected from the guest OS. Only after this
point can the VMM fully remove the device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
4dc2a39f3a vmm: acpi: Create PHPR container
This new PHPR device in the DSDT table introduces some specific
operation regions and the associated fields.

PCIU stands for "PCI up", which identifies PCI devices that must be
added.
PCID stands for "PCI down", which identifies PCI devices that must be
removed.
B0EJ stands for "Bus 0 eject", which identifies which device on the bus
has been ejected by the guest OS.

Thanks to these fields, the VMM and the guest OS can communicate while
performing hotplug/hotunplug operations.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
c3a0685e2d vmm: acpi: Add notification method for PCI device slots
Adds the DVNT method to the PCI0 device in the DSDT table. This new
method is responsible for checking each slot and notifying the guest OS
if one of the slots is supposed to be added or removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
5a68d5b6a7 vmm: acpi: Create PCI device slots
This commit introduces the ACPI support for describing the 32 device
slots attached to the main PCI host bridge.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Bin Liu
d6e6901957 vmm/api: Fix vm.info response definition
Update cloud-hypervisor.yaml to match the latest code.

Fixes: #841

Signed-off-by: liubin <liubin0329@gmail.com>
2020-03-03 09:34:25 +01:00
Sebastien Boeuf
8142c823ed vmm: Move DeviceManager into an Arc<Mutex<>>
In anticipation of the support for device hotplug, this commit moves the
DeviceManager object into an Arc<Mutex<>> when the DeviceManager is
being created. The reason is that we need the DeviceManager to implement
the BusDevice trait and then provide it to the IO bus, so that IO
accesses related to device hotplug can be handled correctly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-27 11:12:31 +01:00
Qiu Wenbo
9de3ace8c7 devices: implement Aml trait for GED device
Fixes: #657

Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
2020-02-25 08:32:16 +00:00
Sebastien Boeuf
b77fdeba2d msi/msi-x: Prevent from losing masked interrupts
We want to avoid losing interrupts while they are masked. They can be
lost because of the internals of how they are connected through KVM. An
eventfd is registered to a specific GSI, and then a route is associated
with this same GSI.

The current code adds/removes a route whenever a mask/unmask action
happens. The problem with this approach is that KVM will consume the
eventfd but won't be able to find an associated route, so it won't be
able to deliver the interrupt.

That's why this patch introduces a different way of masking/unmasking
the interrupts, simply by registering/unregistering the eventfd with the
GSI. This way, when the vector is masked, the eventfd is going to be
written but nothing will happen because KVM won't consume the event.
Whenever the unmask happens, the eventfd will be registered with a
specific GSI, and if there are pending events, KVM will trigger them
based on the route associated with the GSI.
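The Rust simulation below models the masking scheme. It makes no KVM calls. In the real VMM, registering and unregistering correspond to assigning and deassigning an irqfd with the KVM_IRQFD ioctl. The `MsiVector` type and its fields are hypothetical. The point it illustrates is that the route is never removed, so an event written while masked stays pending in the eventfd and is delivered on unmask.

```rust
// Hypothetical model of one MSI vector wired through KVM. The GSI route is
// set up once and never removed; masking only toggles whether the eventfd
// is registered (irqfd) with that GSI.
#[derive(Default)]
struct MsiVector {
    eventfd_count: u64, // eventfd counter: writes accumulate until consumed
    registered: bool,   // is the eventfd currently registered with the GSI?
    delivered: u64,     // interrupts KVM injected into the guest
}

impl MsiVector {
    // Device side: signal the interrupt by writing to the eventfd.
    fn trigger(&mut self) {
        self.eventfd_count += 1;
        self.kvm_poll();
    }

    fn mask(&mut self) {
        self.registered = false; // the eventfd keeps accumulating writes
    }

    fn unmask(&mut self) {
        self.registered = true;
        self.kvm_poll(); // a pending event is picked up on registration
    }

    // KVM consumes the eventfd only while it is registered, and since the
    // route is always present, a consumed event is always delivered.
    fn kvm_poll(&mut self) {
        if self.registered && self.eventfd_count > 0 {
            self.eventfd_count = 0;
            self.delivered += 1;
        }
    }
}

fn main() {
    let mut vector = MsiVector { registered: true, ..Default::default() };
    vector.mask();
    vector.trigger(); // fires while masked: stays pending, nothing is lost
    assert_eq!(vector.delivered, 0);
    vector.unmask();
    println!("delivered: {}", vector.delivered); // delivered: 1
}
```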

Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-25 08:31:14 +00:00
Rob Bradford
bba5ef3a59 vmm: Remove deprecated CPU syntax
Remove the old way of specifying the number of vCPUs to use.

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Rob Bradford
374ac77c63 main, vmm: Remove deprecated --vhost-user-net
This has been superseded by using --net with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Rob Bradford
ffd816ebfa main, vmm: Remove deprecated --vhost-user-blk
This has been superseded by using --disk with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
dependabot-preview[bot]
f190cb05b5 build(deps): bump libc from 0.2.66 to 0.2.67
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.66 to 0.2.67.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.66...0.2.67)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-02-21 08:03:30 +00:00
Sergio Lopez
d2f1749edb vmm: config: Add poll_queue property to DiskConfig
Recently, vhost_user_block gained the ability to actively poll the
queue, a feature that can be disabled with the poll_queue property.

This change adds this property to DiskConfig, so it can be used
through the "disk" argument.

For the moment, it can only be used when vhost_user=true, but this
will change once virtio-block gets the poll_queue feature too.

Fixes: #787

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Sergio Lopez
378dd81204 vmm: openapi: Add missing "direct" knob to DiskConfig
Add missing "direct" knob that should be exposed through the REST API.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Sergio Lopez
056f5481ac vmm: openapi: Fix "readonly" and "wce" defaults in DiskConfig
Fix "readonly" and "wce" defaults in cloud-hypervisor.yaml to match
their respective defaults in config.rs:DiskConfig.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Samuel Ortiz
c49e31a6d9 vmm: api: Return a resize error when resize fails
And not a VmCreate one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:26:12 +01:00
Samuel Ortiz
ebc6391bea vmm: api: Fix resize command typos
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:26:12 +01:00
Samuel Ortiz
9de755334d vmm: openapi: Update DiskConfig
It's missing a few knobs (readonly, vhost, wce) that should be exposed
through the REST API.

Fixes: #790

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:17:50 +01:00
Rob Bradford
ed1e7817cc vmm: Workaround double reboot triggered by the kernel
The kernel does not adhere to the ACPI specification (probably to work
around broken hardware) and rather than busy looping after requesting an
ACPI reset it will attempt to reset by other mechanisms (such as i8042
reset.)

In order to trigger a reset, the devices write to an EventFd (called
reset_evt). This is used by the VMM to identify whether a reset is
requested and make the VM reboot. As the reset_evt is part of the VMM and
reused for both the old and new VM, it is possible for the newly booted
VM to immediately get reset as there is an old event sitting in the
EventFd.

The simplest solution is to "drain" the reset_evt EventFd on reboot to
make sure that there are no spurious events in the EventFd.
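The following is a minimal model of the drain-on-reboot fix. `EventFdModel` is a hypothetical std-only stand-in that mimics a non-semaphore eventfd: writes add to a counter, and a read returns the counter and resets it to zero. With a real eventfd, the fd must be non-blocking (EFD_NONBLOCK) for such a drain read not to block when nothing is pending.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical stand-in for an EventFd: write() adds to a counter and
// try_read() returns and clears it, like a non-semaphore eventfd.
#[derive(Default)]
struct EventFdModel(AtomicU64);

impl EventFdModel {
    fn write(&self, v: u64) {
        self.0.fetch_add(v, Ordering::SeqCst);
    }

    // Non-blocking read: None if no event is pending.
    fn try_read(&self) -> Option<u64> {
        match self.0.swap(0, Ordering::SeqCst) {
            0 => None,
            n => Some(n),
        }
    }
}

// Hypothetical reboot path: drain before booting the new VM.
fn reboot(reset_evt: &EventFdModel) {
    // Discard whatever is pending; the value itself is irrelevant.
    let _ = reset_evt.try_read();
    // ...tear down the old VM and boot the new one here...
}

fn main() {
    let reset_evt = EventFdModel::default();
    reset_evt.write(1); // ACPI reset request
    assert_eq!(reset_evt.try_read(), Some(1)); // VMM sees it and reboots
    reset_evt.write(1); // late i8042 reset from the same (old) kernel
    reboot(&reset_evt); // stale event drained here
    assert_eq!(reset_evt.try_read(), None); // new VM is not reset spuriously
    println!("no spurious reset");
}
```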

Fixes: #783

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-19 18:51:14 +01:00
Sebastien Boeuf
793d4e7b8d vmm: Move codebase to GuestMemoryAtomic from vm-memory
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap>> with
GuestMemoryAtomic<GuestMemoryMmap>.

The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.

Fixes #735

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-19 13:48:19 +00:00
Rob Bradford
1f6cbad01a vmm: Add support for spawning vhost-user-block backend
If no socket is supplied when enabling "vhost_user=true" on "--disk",
follow the "exe" path in the /proc entry for this process and launch the
block backend (via the vmm_path field.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-18 08:43:47 +00:00
Sebastien Boeuf
3edc2bd6ab vmm: Prevent memory overcommitment through virtio-fs shared regions
When a virtio-fs device is created with a dedicated shared region, by
default the region should be mapped as PROT_NONE so that no pages can be
faulted in.

It's only when the guest performs the mount of the virtiofs filesystem
that we can expect the VMM, on behalf of the backend, to perform some
new mappings in the reserved shared window, using PROT_READ and/or
PROT_WRITE.

Fixes #763

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-17 15:03:47 +01:00
Rob Bradford
bc75c1b4e1 vmm: Add support for spawning vhost-user-net backend
If no socket is supplied when enabling "vhost_user=true" on "--net",
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field.)

Currently this only supports creating a new tap interface as the network
backend also only supports that.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Rob Bradford
b04eb4770b vmm: Follow the "exe" symlink from the PID directory in /proc
It is necessary to do this at the start of the VMM execution rather than
later as it must be done in the main thread in order to satisfy the
checks required by PTRACE_MODE_READ_FSCREDS (see proc(5) and
ptrace(2))
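The following Linux-only sketch resolves the VMM binary by following the "exe" symlink in the process's PID directory, intended to run once at startup on the main thread. The `vmm_path` function name mirrors the field mentioned in the backend-spawning commits but is otherwise hypothetical.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::process;

// Resolve the VMM's own binary by following /proc/<pid>/exe. Meant to be
// called once, early in main(), on the main thread.
fn vmm_path() -> io::Result<PathBuf> {
    fs::read_link(format!("/proc/{}/exe", process::id()))
}

fn main() {
    // Keep the result around (e.g. to spawn a vhost-user backend later).
    match vmm_path() {
        Ok(path) => println!("VMM binary: {}", path.display()),
        Err(e) => eprintln!("cannot resolve VMM binary: {e}"),
    }
}
```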

The alternative is to run with CAP_SYS_PTRACE, but that has its
disadvantages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00