Update for clippy in Rust 1.50.0:
error: Unnecessary nested match
--> vmm/src/vm.rs:419:17
|
419 | / if let vm_device::BusError::MissingAddressRange = e {
420 | | warn!("Guest MMIO write to unregistered address 0x{:x}", gpa);
421 | | }
| |_________________^
|
= note: `-D clippy::collapsible-match` implied by `-D warnings`
help: The outer pattern can be modified to include the inner pattern.
--> vmm/src/vm.rs:418:17
|
418 | Err(e) => {
| ^ Replace this binding
419 | if let vm_device::BusError::MissingAddressRange = e {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ with this pattern
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_match
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Swap the last two parameters of guest_mem_{read,write} to be consistent
with other read / write functions.
Use more descriptive parameter names.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Add the ability for cloud-hypervisor to create, manage and monitor a
pty for serial and/or console I/O from a user. The reasoning for
having cloud-hypervisor create the ptys is so that clients, libvirt
for example, could exit and later re-open the pty without causing I/O
issues. If the clients were responsible for creating the pty, when
they exit the main pty fd would close and cause cloud-hypervisor to
get I/O errors on writes.
Ideally the main and subordinate pty fds would be kept in the main
vmm's Vm structure. However, because the device manager owns parsing
the configuration for the serial and console devices, the information
is instead stored in new fields under the DeviceManager structure
directly.
From there hooking up the main fd is intended to look as close to
handling stdin and stdout on the tty as possible (there is some future
work ahead for perhaps moving support for the pty into the
vmm_sys_utils crate).
The main fd is used for reading user input and writing to output of
the Vm device. The subordinate fd is used to setup raw mode and it is
kept open in order to avoid I/O errors when clients open and close the
pty device.
The ability to handle multiple inputs as part of this change is
intentional. The current code allows serial and console ptys to be
created and both be used as input. There was an implementation gap
though with the queue_input_bytes needing to be modified so the pty
handlers for serial and console could access the methods on the serial
and console structures directly. Without this change only a single
input source could be processed as the console would switch based on
its input type (this is still valid for tty and isn't otherwise
modified).
Signed-off-by: William Douglas <william.r.douglas@gmail.com>
This skeleton commit brings in the support for compiling aarch64 with
the "acpi" feature ready to the ACPI enabling. It builds on the work to
move the ACPI hotplug devices from I/O ports to MMIO and conditionalises
any code that is x86_64 only (i.e. because it uses an I/O port.)
Filling in the aarch64 specific details in tables such as the MADT it
out of the scope.
See: #2178
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use the ACPI GED device to trigger a notitifcation of type
POWER_BUTTON_CHANGED which will ultimately lead to the guest being
notified.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Renamed this bitfield as it will also be used for non-hotplug purposes
such as synthesising a power button.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When a device is ready to be activated signal to the VMM thread via an
EventFd that there is a device to be activated. When the VMM receives a
notification on the EventFd that there is a device to be activated
notify the device manager to attempt to activate any devices that have
not been activated.
As a side effect the VMM thread will create the virtio device threads.
Fixes: #1863
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There are some code base and function which are purely KVM specific for
now and we don't have those supports in mshv at the moment but we have plan
for the future. We are doing a feature guard with KVM. For example, KVM has
mp_state, cpu clock support, which we don't have for mshv. In order to build
those code we are making the code base for KVM specific compilation.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
When using an PIO write to 0x80 which is a special case handle that and
then return without going through the resolve.
This removes an extra warning that is reported.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The DeviceNode cannot be fully represented as it embeds a Rust style
enum (i.e. with data) which is instead represented by a simple
associative array.
Fixes: #1167
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This interface is used by the vCPU thread to delegate responsibility for
handling MMIO/PIO operations and to support different approaches than a
VM exit.
During profiling I found that we were spending 13.75% of the boot CPU
uage acquiring access to the object holding the VmmOps via
ArcSwap::load_full()
13.75% 6.02% vcpu0 cloud-hypervisor [.] arc_swap::ArcSwapAny<T,S>::load_full
|
---arc_swap::ArcSwapAny<T,S>::load_full
|
--13.43%--<hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run
std::sys_common::backtrace::__rust_begin_short_backtrace
core::ops::function::FnOnce::call_once{{vtable-shim}}
std::sys::unix:🧵:Thread:🆕:thread_start
However since the object implementing VmmOps does not need to be mutable
and it is only used from the vCPU side we can change the ownership to
being a simple Arc<> that is passed in when calling create_vcpu().
This completely removes the above CPU usage from subsequent profiles.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows code running in the VMM to access the VM's MemoryManager's
functionality for managing the dirty log including resetting it but also
generating a table.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Prior to sending the memory the full state is not needed only the
configuration. This is sufficient to create the appropriate structures
in the guest and have the memory allocations ready for filling.
Update the protocol documentation to add a separate config step and move
the state to after the memory is transferred. As the VM is created in a
separate step to restoring it the requires a slightly different
constructor as well as saving the VM object for the subsequent commands.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This is tested by:
Source VMM:
target/debug/cloud-hypervisor --kernel ~/src/linux/vmlinux \
--pmem file=~/workloads/focal.raw --cpus boot=1 \
--memory size=2048M \
--cmdline"root=/dev/pmem0p1 console=ttyS0" --serial tty --console off \
--api-socket=/tmp/api1 -v
Destination VMM:
target/debug/cloud-hypervisor --api-socket=/tmp/api2 -v
And the following commands:
target/debug/ch-remote --api-socket=/tmp/api1 pause
target/debug/ch-remote --api-socket=/tmp/api2 receive-migration unix:/tmp/foo &
target/debug/ch-remote --api-socket=/tmp/api1 send-migration unix:/tmp/foo
target/debug/ch-remote --api-socket=/tmp/api2 resume
The VM is then responsive on the destination VMM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows the code to be reused when creating the VM from a snapshot
when doing VM migration.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This also removes the need to lookup up the "exe" symlink for finding
the VMM executable path.
Fixes: #1925
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The logic to handle AArch64 system event was: SHUTDOWN and RESET were
all treated as RESET.
Now we handle them differently:
- RESET event will trigger Vmm::vm_reboot(),
- SHUTDOWN event will trigger Vmm::vm_shutdown().
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Rather than filling the guest memory from a file at the point of the the
guest memory region being created instead fill from the file later. This
simplifies the region creation code but also adds flexibility for
sourcing the guest memory from a source other than an on disk file.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that we have a new dedicated way of asking for a balloon through the
CLI and the REST API, we can move all the balloon code to the device
manager. This allows us to simplify the memory manager, which is already
quite complex.
It also simplifies the behavior of the balloon resizing command. Instead
of providing the expected size for the RAM, which is complex when memory
zones are involved, it now expects the balloon size. This is a much more
straightforward behavior as it really resizes the balloon to the desired
size. Additionally to the simplication, the benefit of this approach is
that it does not need to be tied to the memory manager at all.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Before Virtio-mmio was removed, we passed an optional PCI space address
parameter to AArch64 code for generating FDT. The address is none if the
transport is MMIO.
Now Virtio-PCI is the only option, the parameter is mandatory.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Virtio-mmio is removed, now virtio-pci is the only option for virtio
transport layer. We use MSI for PCI device interrupt. While GICv2, the
legacy interrupt controller, doesn't support MSI. So GICv2 is not very
practical for Cloud-hypervisor, we can remove it.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
If the user specified a maximum physical bits value through the
`max_phys_bits` option from `--cpus` parameter, the guest CPUID
will be patched accordingly to ensure the guest will find the
right amount of physical bits.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If the user provided a maximum physical bits value for the vCPUs, the
memory manager will adapt the guest physical address space accordingly
so that devices are not placed further than the specified value.
It's important to note that if the number exceed what is available on
the host, the smaller number will be picked.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Run loop in hypervisor needs a callback mechanism to access resources
like guest memory, mmio, pio etc.
VmmOps trait is introduced here, which is implemented by vmm module.
While handling vcpuexits in run loop, this trait allows hypervisor
module access to the above mentioned resources via callbacks.
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio-balloon change the memory size is asynchronous.
VirtioBalloonConfig.actual of balloon device show current balloon size.
This commit add memory_actual_size to vm.info to show memory actual size.
Signed-off-by: Hui Zhu <teawater@antfin.com>
Write to the exit_evt EventFD which will trigger all the devices and
vCPUs to exit. This is slightly cleaner than just exiting the process as
any temporary files will be removed.
Fixes: #1242
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The states of GIC should be part of the VM states. This commit
enables the AArch64 VM states save/restore by adding save/restore
of GIC states.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Similarly as the VM booting process, on AArch64 systems,
the vCPUs should be created before the creation of GIC. This
commit refactors the vCPU save/restore code to achieve the
above-mentioned restoring order.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
In AArch64 systems, the state of GIC device can only be
retrieved from `KVM_GET_DEVICE_ATTR` ioctl. Therefore to implement
saving/restoring the GIC states, we need to make sure that the
GIC object (either the file descriptor or the device itself) can
be extracted after the VM is started.
This commit refactors the code of GIC creation by adding a new
field `gic_device_entity` in device manager and methods to set/get
this field. The GIC object can be therefore saved in the device
manager after calling `arch::configure_system`.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
virtio-mem device would use 'VIRTIO_MEM_F_ACPI_PXM' to add memory to NUMA
node, which MUST be existed, otherwise it will be assigned to node id 0,
even if user specify different node id.
According ACPI spec about Memory Affinity Structure, system hardware
supports hot-add memory region using 'Hot Pluggable | Enabled' flags.
Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>
Based on all the preparatory work achieved through previous commits,
this patch updates the 'hotplugged_size' field for both MemoryConfig and
MemoryZoneConfig structures when either the whole memory is resized, or
simply when a memory zone is resized.
This fixes the lack of support for rebooting a VM with the right amount
of memory plugged in.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that e820 tables are created from the 'boot_guest_memory', we can
simplify the memory manager code by adding the virtio-mem regions when
they are created. There's no need to wait for the first hotplug to
insert these regions.
This also anticipates the need for starting a VM with some memory
already plugged into the virtio-mem region.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to differentiate the 'boot' memory regions from the virtio-mem
regions, we store what we call 'boot_guest_memory'. This is useful to
provide the adequate list of regions to the configure_system() function
as it expects only the list of regions that should be exposed through
the e820 table.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that virtio-mem device accept a guest NUMA node as parameter, we
retrieve this information from the list of NUMA nodes. Based on the
memory zone associated with the virtio-mem device, we obtain the NUMA
node identifier, which we provide to the virtio-mem device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Implement a new VM action called 'resize-zone' allowing the user to
resize one specific memory zone at a time. This relies on all the
preliminary work from the previous commits to resize each virtio-mem
device independently from each others.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to anticipate the need for storing memory regions along with
virtio-mem information for each memory zone, we create a new structure
MemoryZone that will replace Vec<Arc<GuestRegionMmap>> in the hash map
MemoryZones.
This makes thing more logical as MemoryZones becomes a list of
MemoryZone sorted by their identifier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The goal of this commit is to rename the existing NUMA option 'id' with
'guest_numa_id'. This is done without any modification to the way this
option behaves.
The reason for the rename is caused by the observation that all other
parameters with an option called 'id' expect a string to be provided.
Because in this particular case we expect a u32 representing a proximity
domain from the ACPI specification, it's better to name it with a more
explicit name.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the previous changes introducing new options for both memory
zones and NUMA configuration, this patch changes the behavior of the
NUMA node definition. Instead of relying on the memory zones to define
the guest NUMA nodes, everything goes through the --numa parameter. This
allows for defining NUMA nodes without associating any particular memory
range to it. And in case one wants to associate one or multiple memory
ranges to it, the expectation is to describe a list of memory zone
through the --numa parameter.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the NumaConfig which now provides distance information, we can
internally update the list of NUMA nodes with the exact distances they
should be located from other nodes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>