This patch completes the series by connecting the dots between the HTTP
frontend and the device manager backend.
Any request to hotplug a VFIO, disk, fs, pmem, net, or vsock device will
now return a response including the device name and the place of the
device in the PCI topology.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Pass from the device manager to the calling code the information about
the PCI device that has just been hotplugged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now the flow of both architectures are aligned to:
1. load kernel
2. create VCPU's
3. configure system
4. start VCPU's
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Between X86 and AArch64, there is some difference in booting a VM:
- X86_64 can setup IOAPIC before creating any VCPU.
- AArch64 have to create VCPU's before creating GIC.
The old process is:
1. load_kernel()
load kernel binary
configure system
2. activate_vcpus()
create & start VCPU's
So we need to separate "activate_vcpus" into "create_vcpus" and
"activate_vcpus" (to start vcpus only). Setup GIC and create FDT
between the 2 steps.
The new procedure is:
1. load_kernel()
load kernel binary
(X86_64) configure system
2. create VCPU's
3. (AArch64) setup GIC
4. (AArch64) configure system
5. start VCPU's
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
This is a preparing commit to build and test CH on AArch64. All building
issues were fixed, but no functionality was introduced.
For X86, the logic of code was not changed at all.
For ARM, the architecture specific part is still empty. And we applied
some tricks to workaround lint warnings. But such code will be replaced
later by other commits with real functionality.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The ch branch has been rebased to incorporate the latest upstream code
requiring a small change to the unit tests.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
To ensure that the DeviceManager threads (such as those used for virtio
devices) are cleaned up it is necessary to unpark them so that they get
cleanly terminated as part of the shutdown.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In case the VM is created from scratch, the devices should be created
after the DeviceManager has been created. But this should not affect the
restore codepath, as in this case the devices should be created as part
of the restore() function.
It's necessary to perform this differentiation as the restore must go
through the following steps:
- Create the DeviceManager
- Restore the DeviceManager with the right state
- Create the devices based on the restored DeviceManager's device tree
- Restore each device based on the restored DeviceManager's device tree
That's why this patch leverages the recent split of the DeviceManager's
creation to achieve what's needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit performs the split of the DeviceManager's creation into two
separate functions by moving anything related to device's creation after
the DeviceManager structure has been initialized.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If the current state is paused that means most of the handles got killed by pthread_kill
We need to unpark those threads to make the shutdown worked. Otherwise
The shutdown API hangs and the API is not responding afterwards. So
before the shutdown call we need to resume the VM make it succeed.
Fixes: #817
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Adds DeviceManager method `make_virtio_fs_device` which creates a single
device, and modifies `make_virtio_fs_devices` to use this method.
Implements the new `vm.add-fs route`.
Signed-off-by: Dean Sheather <dean@coder.com>
We can now allow guests that specify an initramfs to boot
using the PVH boot protocol.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
When performing an API boot validate the configuration. For now only
some very basic validation is performed but in subsequent commits
the validation will be extended.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that the restore path uses RestoreConfig structure, we add a new
parameter called "prefault" to it. This will give the user the ability
to populate the pages corresponding to the mapped regions backed by the
snapshotted memory files.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The goal here is to move the restore parameters into a dedicated
structure that can be reused from the entire codebase, making the
addition or removal of a parameter easier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When CoW can be used, the VM restoration time is reduced, but the pages
are not populated. This can lead to some slowness from the guest when
accessing these pages.
Depending on the use case, we might prefer a slower boot time for better
performances from guest runtime. The way to achieve this is to prefault
the pages in this case, using the MAP_POPULATE flag along with CoW.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever a MemoryManager is restored from a snapshot, the memory regions
associated with it might need to directly back the mapped memory for
increased performances. If that's the case, a list of external regions
is provided and the MemoryManager should simply ignore what's coming
from the MemoryConfig.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The MemoryManager is somehow a special case, as its restore() function
was not implemented as part of the Snapshottable trait. Instead, and
because restoring memory regions rely both on vm.json and every memory
region snapshot file, the memory manager is restored at creation time.
This makes the restore path slightly different from CpuManager, Vcpu,
DeviceManager and Vm, but achieve the correct restoration of the
MemoryManager along with its memory regions filled with the correct
content.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is only implementing the send() function in order to store all Vm
states into a file.
This needs to be extended for live migration, by adding more transport
methods, and also the recv() function must be implemented.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
By aggregating snapshots from the CpuManager, the MemoryManager and the
DeviceManager, Vm implements the snapshot() function from the
Snapshottable trait.
And by restoring snapshots from the CpuManager, the MemoryManager and
the DeviceManager, Vm implements the restore() function from the
Snapshottable trait.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Implement the Snapshottable trait for Vcpu, and then implements it for
CpuManager. Note that CpuManager goes through the Snapshottable
implementation of Vcpu for every vCPU in order to implement the
Snapshottable trait for itself.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.
A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can be made of multiple snapshot
itself.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.
Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Whenever the memory is resized, it's important to retrieve the new
region to pass it down to the device manager, this way it can decide
what to do with it.
Also, there's no need to use a boolean as we can instead use an Option
to carry the information about the region. In case of virtio-mem, there
will be no region since the whole memory has been reserved up front by
the VMM at boot. This means only the ACPI hotplug will return a region
and is the only method that requires the memory to be updated from the
device manager.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Commit 2adddce2 reorganized the crate for a cleaner multi architecture
(x86_64 and aarch64) support.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
For now, the codebase does not support booting from initramfs with PVH
boot protocol, therefore we need to fallback to the legacy boot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
* load the initramfs File into the guest memory, aligned to page size
* finally setup the initramfs address and its size into the boot params
(in configure_64bit_boot)
Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
The persistent memory will be hotplugged via DeviceManager and saved in
the config for later use.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit adds new option hotplug_method to memory config.
It can set the hotplug method to "acpi" or "virtio-mem".
Signed-off-by: Hui Zhu <teawater@antfin.com>
The persistent memory will be hotplugged via DeviceManager and saved in
the config for later use.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Whenever the VM memory is resized, DeviceManager needs to be notified
so that it can subsequently notify each virtio devices about it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This feature is stable and there is no need for this to be behind a
flag. This will also reduce the time needed to run the integration test
as we will not be running them all again under the flag.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
I spent a few minutes trying to understand why we were unconditionally
updating the VM config memory size, even if the guest memory resizing
did not happen.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Fill the hvm_start_info and related memory map structures as
specified in the PVH boot protocol. Write the data structures
to guest memory at the GPA that will be stored in %rbx when
the guest starts.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
In order to properly initialize the kvm regs/sregs structs for
the guest, the load_kernel() return type must specify which
boot protocol to use with the entry point address it returns.
Make load_kernel() return an EntryPoint struct containing the
required information. This structure will later be used
in the vCPU configuration methods to setup the appropriate
initial conditions for the guest.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Add a new id option to the VFIO hotplug command so that it matches the
VFIO coldplug semantic.
This is done by refactoring the existing code for VFIO hotplug, where
VmAddDeviceData structure is replaced by DeviceConfig. This structure is
the one used whenever a VFIO device is coldplugged, which is why it
makes sense to reuse it for the hotplug codepath.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit ensures that when a VFIO device is hot-unplugged from the
VM, it is also removed from the VmConfig. This prevents a potential
reboot from creating the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the new command "remove-device" that will let a
user hot-unplug a VFIO PCI device from an already running VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The Vm structure was used to store a strong reference to the IO bus.
This is not needed anymore since the AddressManager is logically the
one holding this strong reference. This has been made possible by the
introduction of Weak references on the Bus structure itself.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The method add_vfio_device() from the DeviceManager needs to be mutable
if we want later to be able to update some internal fields from the
DeviceManager from this same function.
This commit simply takes care of making the necessary changes to change
this function as mutable.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's more logical to name the field referring to the DeviceManager as
"device_manager" instead of "devices".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By inserting the DeviceManager on the IO bus, we introduced some cyclic
dependency:
DeviceManager ---> AddressManager ---> Bus ---> BusDevice
^ |
| |
+---------------------------------------------+
This cycle needs to be broken by inserting a Weak reference instead of
an Arc (considered as a strong reference).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Ensures the configuration is updated after a new device has been
hotplugged. In the event of a reboot, this means the new VM will be
started with the new device that had been previously hotplugged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit finalizes the VFIO PCI hotplug support, based on all the
previous commits preparing for it.
One thing to notice, this does not support vIOMMU yet. This means we can
hotplug VFIO PCI devices, but we cannot attach them to an existing or a
new virtio-iommu device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever the user wants to hotplug a new VFIO PCI device, the VMM will
have to trigger a hotplug notification through the GED device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the new command "add-device" that will let a user
hotplug a VFIO PCI device to an already running VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation of the support for device hotplug, this commit moves the
DeviceManager object into an Arc<Mutex<>> when the DeviceManager is
being created. The reason is, we need the DeviceManager to implement the
BusDevice trait and then provide it to the IO bus, so that IO accesses
related to device hotplug can be handled correctly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap> with
GuestMemoryAtomic<GuestMemoryMmap>.
The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.
Fixes#735
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It is necessary to do this at the start of the VMM execution rather than
later as it must be done in the main thread in order to satisfy the
checks required by PTRACE_MODE_READ_FSCREDS (see proc(5) and
ptrace(2))
The alternative is to run as CAP_SYS_PTRACE but that has its
disadvantages.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the ioctl syscall KVM_CREATE_VM gets interrupted while creating the
VM, it is expected that we should retry since EINTR should not be
considered a standard error.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The build is run against "--all-features", "pci,acpi", "pci" and "mmio"
separately. The clippy validation must be run against the same set of
features in order to validate the code is correct.
Because of these new checks, this commit includes multiple fixes
related to the errors generated when manually running the checks.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Use independent bits for storing whether there is a CPU or memory device
changed when reporting changes via ACPI GED interrupt. This prevents a
later notification squashing an earlier one and ensure that hotplugging
both CPU and memory at the same time succeeds.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If a new amount of RAM is requested in the VmResize command try and
hotplug if it an increase (MemoryManager::Resize() silently ignores
decreases.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Generate and expose the DSDT table entries required to support memory
hotplug. The AML methods call into the MemoryManager via I/O ports
exposed as fields.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
For now the new memory size is only used after a reboot but support for
hotplugging memory will be added in a later commit.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This specifies how much address space should be reserved for hotplugging
of RAM. This space is reserved by adding move the start of the device
area by the desired amount.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to be able to support resizing either vCPUs or memory or both
make the fields in the resize command optional.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This removes the need to handle a mutable integer and also centralises
the allocation of these slot numbers.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The memory manager is responsible for setting up the guest memory and in
the long term will also handle addition of guest memory.
In this commit move code for creating the backing memory and populating
the allocator into the new implementation trying to make as minimal
changes to other code as possible.
Follow on commits will further reduce some of the duplicated code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Since the Snapshotable placeholder and Migratable traits are provided as
well, the DeviceManager object and all its objects are now Migratable.
All Migratable devices are tracked as Arc<Mutex<dyn Migratable>>
references.
Keeping track of all migratable devices allows for implementing the
Migratable trait for the DeviceManager structure, making the whole
device model potentially migratable.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Now that the GED device does not use a hardcoded IRQ number the starting
IRQ number can be restored (needed for the hardcoded serial port IRQ.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for handling the creation of the DSDT entries for devices
into the DeviceManager.
This will make it easier to handle device hotplug and also in the future
remove some hardcoded ACPI constants.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for generating the MADT (APIC) table and the DSDT
generation for CPU related functionality into the CpuManager.
There is no functional change just code rearrangement.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Previously the device setup code assumed that if no IOAPIC was passed in
then the device should be added to the kernel irqchip. As an earlier
change meant that there was always a userspace IOAPIC this kernel based
code can be removed.
The accessor still returns an Option type to leave scope for
implementing a situation without an IOAPIC (no serial or GED device).
This change does not add support no-IOAPIC mode as the original code did
not either.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Update the configuration after a resize to ensure that after a reboot
the added vCPUs are preserved.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The tty mode remains raw mode when cloud-hypervisor is terminted by
SIGTERM or SIGINT. The terminal is unusable due to echoing is
disabled which is really annoying.
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>