Commit Graph

450 Commits

Author SHA1 Message Date
Sebastien Boeuf
a86f4369a7 vmm: Add VFIO PCI device hotplug support
This commit finalizes the VFIO PCI hotplug support, based on all the
previous commits preparing for it.

One thing to notice, this does not support vIOMMU yet. This means we can
hotplug VFIO PCI devices, but we cannot attach them to an existing or a
new virtio-iommu device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
320fea0eaf vmm: Factorize VFIO PCI device creation
This factorization is very important as it will allow both the standard
codepath and the VFIO PCI hotplug codepath to rely on the same function
to perform the addition of a new VFIO PCI device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
00716f90a0 vmm: Store virtio-iommu device from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
5902dfa403 vmm: Store VFIO KVM device from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
d9c1b4396e vmm: Store MSI InterruptManager from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
02adc4061a vmm: Store PciBus from DeviceManager
Helps with future refactoring of VFIO device creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
d0218e94a3 vmm: Trigger hotplug notification to the guest
Whenever the user wants to hotplug a new VFIO PCI device, the VMM will
have to trigger a hotplug notification through the GED device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
0e58741a09 vmm: api: Introduce new "add-device" HTTP endpoint
This commit introduces the new command "add-device" that will let a user
hotplug a VFIO PCI device to an already running VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
0f1396acef vmm: Insert PCI device hotplug operation region on IO bus
Through the BusDevice implementation from the DeviceManager, and by
inserting the DeviceManager on the IO bus for a specific IO port range,
the VMM now has the ability to handle PCI device hotplug.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
65774e8a78 vmm: Implement BusDevice for DeviceManager
In anticipation of inserting the DeviceManager on the IO/MMIO buses,
the DeviceManager must implement the BusDevice trait.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
8dbc84318c vmm: acpi: Add PCNT method to invoke DVNT
Create a small method that will perform both hotplug of all the devices
identified by PCIU bitmap, and then perform the hotunplug of all the
devices identified by the PCID bitmap.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
c62db97a81 vmm: acpi: Add _EJ0 to each PCI device slot
The _EJ0 method provides the guest OS a way to notify the VMM that the
device has been properly ejected from the guest OS. Only after this
point, the VMM can fully remove the device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
4dc2a39f3a vmm: acpi: Create PHPR container
This new PHPR device in the DSDT table introduces some specific
operation regions and the associated fields.

PCIU stands for "PCI up", which identifies PCI devices that must be
added.
PCID stands for "PCI down", which identifies PCI devices that must be
removed.
B0EJ stands for "Bus 0 eject", which identifies which device on the bus
has been ejected by the guest OS.

Thanks to these fields, the VMM and the guest OS can communicate while
performing hotplug/hotunplug operations.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
c3a0685e2d vmm: acpi: Add notification method for PCI device slots
Adds the DVNT method to the PCI0 device in the DSDT table. This new
method is responsible for checking each slot and notify the guest OS if
one of the slots is supposed to be added or removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Sebastien Boeuf
5a68d5b6a7 vmm: acpi: Create PCI device slots
This commit introduces the ACPI support for describing the 32 device
slots attached to the main PCI host bridge.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-04 12:06:02 +00:00
Bin Liu
d6e6901957 vmm/api: Fix vm.info response definition
Update cloud-hypervisor.yaml with latest code.

Fixes: #841

Signed-off-by: liubin <liubin0329@gmail.com>
2020-03-03 09:34:25 +01:00
Sebastien Boeuf
8142c823ed vmm: Move DeviceManager into an Arc<Mutex<>>
In anticipation of the support for device hotplug, this commit moves the
DeviceManager object into an Arc<Mutex<>> when the DeviceManager is
being created. The reason is, we need the DeviceManager to implement the
BusDevice trait and then provide it to the IO bus, so that IO accesses
related to device hotplug can be handled correctly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-27 11:12:31 +01:00
Qiu Wenbo
9de3ace8c7 devices: implement Aml trait for GED device
Fixes: #657

Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
2020-02-25 08:32:16 +00:00
Sebastien Boeuf
b77fdeba2d msi/msi-x: Prevent from losing masked interrupts
We want to prevent from losing interrupts while they are masked. The
way they can be lost is due to the internals of how they are connected
through KVM. An eventfd is registered to a specific GSI, and then a
route is associated with this same GSI.

The current code adds/removes a route whenever a mask/unmask action
happens. Problem with this approach, KVM will consume the eventfd but
it won't be able to find an associated route and eventually it won't
be able to deliver the interrupt.

That's why this patch introduces a different way of masking/unmasking
the interrupts, simply by registering/unregistering the eventfd with the
GSI. This way, when the vector is masked, the eventfd is going to be
written but nothing will happen because KVM won't consume the event.
Whenever the unmask happens, the eventfd will be registered with a
specific GSI, and if there's some pending events, KVM will trigger them,
based on the route associated with the GSI.

Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-25 08:31:14 +00:00
Rob Bradford
bba5ef3a59 vmm: Remove deprecated CPU syntax
Remove the old way of specifying the number of vCPUs to use.

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Rob Bradford
374ac77c63 main, vmm: Remove deprecated --vhost-user-net
This has been superseded by using --net with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Rob Bradford
ffd816ebfa main, vmm: Remove deprecated --vhost-user-blk
This has been superseded by using --disk with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Sergio Lopez
d2f1749edb vmm: config: Add poll_queue property to DiskConfig
Recently, vhost_user_block gained the ability of actively polling the
queue, a feature that can be disabled with the poll_queue property.

This change adds this property to DiskConfig, so it can be used
through the "disk" argument.

For the moment, it can only be used when vhost_user=true, but this
will change once virtio-block gets the poll_queue feature too.

Fixes: #787

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Sergio Lopez
378dd81204 vmm: openapi: Add missing "direct" knob to DiskConfig
Add missing "direct" knob that should be exposed through the REST API.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Sergio Lopez
056f5481ac vmm: openapi: Fix "readonly" and "wce" defaults in DiskConfig
Fix "readonly" and "wce" defaults in cloud-hypervisor.yaml to match
their respective defaults in config.rs:DiskConfig.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-20 18:06:54 +01:00
Samuel Ortiz
c49e31a6d9 vmm: api: Return a resize error when resize fails
And not a VmCreate one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:26:12 +01:00
Samuel Ortiz
ebc6391bea vmm: api: Fix resize command typos
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:26:12 +01:00
Samuel Ortiz
9de755334d vmm: openapi: Update DiskConfig
It's missing a few knobs (readonly, vhost, wce) that should be exposed
through the rest API.

Fixes: #790

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-20 12:17:50 +01:00
Rob Bradford
ed1e7817cc vmm: Workaround double reboot triggered by the kernel
The kernel does not adhere to the ACPI specification (probably to work
around broken hardware) and rather than busy looping after requesting an
ACPI reset it will attempt to reset by other mechanisms (such as i8042
reset.)

In order to trigger a reset the devices write to an EventFd (called
reset_evt.) This is used by the VMM to identify if a reset is requested
and make the VM reboot. As the reset_evt is part of the VMM and reused
for both the old and new VM it is possible for the newly booted VM to
immediately get reset as there is an old event sitting in the EventFd.

The simplest solution is to "drain" the reset_evt EventFd on reboot to
make sure that there is no spurious events in the EventFd.

Fixes: #783

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-19 18:51:14 +01:00
Sebastien Boeuf
793d4e7b8d vmm: Move codebase to GuestMemoryAtomic from vm-memory
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap> with
GuestMemoryAtomic<GuestMemoryMmap>.

The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.

Fixes #735

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-19 13:48:19 +00:00
Rob Bradford
1f6cbad01a vmm: Add support for spawning vhost-user-block backend
If no socket is supplied when enabling "vhost_user=true" on "--disk"
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-18 08:43:47 +00:00
Sebastien Boeuf
3edc2bd6ab vmm: Prevent memory overcommitment through virtio-fs shared regions
When a virtio-fs device is created with a dedicated shared region, by
default the region should be mapped as PROT_NONE so that no pages can be
faulted in.

It's only when the guest performs the mount of the virtiofs filesystem
that we can expect the VMM, on behalf of the backend, to perform some
new mappings in the reserved shared window, using PROT_READ and/or
PROT_WRITE.

Fixes #763

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-17 15:03:47 +01:00
Rob Bradford
bc75c1b4e1 vmm: Add support for spawning vhost-user-net backend
If no socket is supplied when enabling "vhost_user=true" on "--net"
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field.)

Currently this only supports creating a new tap interface as the network
backend also only supports that.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Rob Bradford
b04eb4770b vmm: Follow the "exe" symlink from the PID directory in /proc
It is necessary to do this at the start of the VMM execution rather than
later as it must be done in the main thread in order to satisfy the
checks required by PTRACE_MODE_READ_FSCREDS (see proc(5) and
ptrace(2))

The alternative is to run as CAP_SYS_PTRACE but that has its
disadvantages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Rob Bradford
7c9e8b103f vmm: device_manager: Shutdown all virtio devices
When the DeviceManager is dropped explicitly shutdown() all virtio
devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Sebastien Boeuf
3447e226d9 dependencies: bump vm-memory from 4237db3 to f3d1c27
This commit updates Cloud-Hypervisor to rely on the latest version of
the vm-memory crate.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-06 11:40:45 +01:00
Sebastien Boeuf
62ccccc303 vmm: Make sure to retry creating the VM on EINTR
If the ioctl syscall KVM_CREATE_VM gets interrupted while creating the
VM, it is expected that we should retry since EINTR should not be
considered a standard error.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-05 12:06:21 +01:00
Samuel Ortiz
da2b3c92d3 vm-device: interrupt: Remove InterruptType dependencies and definitions
Having the InterruptManager trait depend on an InterruptType forces
implementations into supporting potentially very different kind of
interrupts from the same code base. What we're defining through the
current, interrupt type based create_group() method is a need for having
different interrupt managers for different kind of interrupts.

By associating the InterruptManager trait to an interrupt group
configuration type, we create a cleaner design to support that need as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-04 19:32:45 +01:00
Samuel Ortiz
84fc807bc6 interrupt: Interrupt manager split
We create 2 different interrupt managers for separately handling
creation of legacy and MSI interrupt groups.
Doing so allows us to have a cleaner interrupt manager and IOAPIC
initialization path. It also prepares for an InterruptManager trait
design improvement where we remove the interrupt source type dependency
by associating an interrupt configuration type to the trait.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-04 19:32:45 +01:00
Rob Bradford
880a57c920 vmm: Remove VmInfo struct
After refactoring the VmInfo struct is no longer needed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
07bc292fa5 vmm: device_manager: Get VmFd from AddressManager
A reference to the VmFd is stored on the AddressManager so it is not
necessary to pass in the VmInfo into all methods that need it as it can
be obtained from the AddressManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
6411c3ae42 vmm: device_manager: Use MemoryManager to get guest memory
The DeviceManager has a reference to the MemoryManager so use that to
get the GuestMemoryMmap rather than the version stored in the VmInfo
struct.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
066fc6c0d1 vmm: device_manager: Get VM config from the struct member
Remove the use of vm_info in methods to get the config and instead use
the config stored on the DeviceManager itself.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
77ae3de4f3 vmm: device_manager: Make legacy device addition a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
599275b610 vmm: device_manager: Make ACPI device creation a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
b8c1b2e174 vmm: device_manager: Make console creation a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
b5440e2d0a vmm: device_manager: Make virtio device creation functions methods
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. This prepares the way to more easily store state on
the DeviceManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
e90c6f3c44 vmm: device_manager: Make make_virtio_devices a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. A follow-up commit will change the callee functions
that create the devices themselves.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
dbc09ad0ef vmm: device_manager: Make add_vfio_devices a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00
Rob Bradford
d9e1c2cd22 vmm: device_manager: Make add_virtio_pci_device a method
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 12:28:30 +00:00