If the user provided a maximum physical bits value for the vCPUs, the
memory manager will adapt the guest physical address space accordingly
so that devices are not placed further than the specified value.
It's important to note that if the number exceed what is available on
the host, the smaller number will be picked.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Run loop in hypervisor needs a callback mechanism to access resources
like guest memory, mmio, pio etc.
VmmOps trait is introduced here, which is implemented by vmm module.
While handling vcpuexits in run loop, this trait allows hypervisor
module access to the above mentioned resources via callbacks.
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio-balloon change the memory size is asynchronous.
VirtioBalloonConfig.actual of balloon device show current balloon size.
This commit add memory_actual_size to vm.info to show memory actual size.
Signed-off-by: Hui Zhu <teawater@antfin.com>
Write to the exit_evt EventFD which will trigger all the devices and
vCPUs to exit. This is slightly cleaner than just exiting the process as
any temporary files will be removed.
Fixes: #1242
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The states of GIC should be part of the VM states. This commit
enables the AArch64 VM states save/restore by adding save/restore
of GIC states.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Similarly as the VM booting process, on AArch64 systems,
the vCPUs should be created before the creation of GIC. This
commit refactors the vCPU save/restore code to achieve the
above-mentioned restoring order.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
In AArch64 systems, the state of GIC device can only be
retrieved from `KVM_GET_DEVICE_ATTR` ioctl. Therefore to implement
saving/restoring the GIC states, we need to make sure that the
GIC object (either the file descriptor or the device itself) can
be extracted after the VM is started.
This commit refactors the code of GIC creation by adding a new
field `gic_device_entity` in device manager and methods to set/get
this field. The GIC object can be therefore saved in the device
manager after calling `arch::configure_system`.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
virtio-mem device would use 'VIRTIO_MEM_F_ACPI_PXM' to add memory to NUMA
node, which MUST be existed, otherwise it will be assigned to node id 0,
even if user specify different node id.
According ACPI spec about Memory Affinity Structure, system hardware
supports hot-add memory region using 'Hot Pluggable | Enabled' flags.
Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>
Based on all the preparatory work achieved through previous commits,
this patch updates the 'hotplugged_size' field for both MemoryConfig and
MemoryZoneConfig structures when either the whole memory is resized, or
simply when a memory zone is resized.
This fixes the lack of support for rebooting a VM with the right amount
of memory plugged in.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that e820 tables are created from the 'boot_guest_memory', we can
simplify the memory manager code by adding the virtio-mem regions when
they are created. There's no need to wait for the first hotplug to
insert these regions.
This also anticipates the need for starting a VM with some memory
already plugged into the virtio-mem region.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to differentiate the 'boot' memory regions from the virtio-mem
regions, we store what we call 'boot_guest_memory'. This is useful to
provide the adequate list of regions to the configure_system() function
as it expects only the list of regions that should be exposed through
the e820 table.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that virtio-mem device accept a guest NUMA node as parameter, we
retrieve this information from the list of NUMA nodes. Based on the
memory zone associated with the virtio-mem device, we obtain the NUMA
node identifier, which we provide to the virtio-mem device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Implement a new VM action called 'resize-zone' allowing the user to
resize one specific memory zone at a time. This relies on all the
preliminary work from the previous commits to resize each virtio-mem
device independently from each others.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to anticipate the need for storing memory regions along with
virtio-mem information for each memory zone, we create a new structure
MemoryZone that will replace Vec<Arc<GuestRegionMmap>> in the hash map
MemoryZones.
This makes thing more logical as MemoryZones becomes a list of
MemoryZone sorted by their identifier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The goal of this commit is to rename the existing NUMA option 'id' with
'guest_numa_id'. This is done without any modification to the way this
option behaves.
The reason for the rename is caused by the observation that all other
parameters with an option called 'id' expect a string to be provided.
Because in this particular case we expect a u32 representing a proximity
domain from the ACPI specification, it's better to name it with a more
explicit name.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the previous changes introducing new options for both memory
zones and NUMA configuration, this patch changes the behavior of the
NUMA node definition. Instead of relying on the memory zones to define
the guest NUMA nodes, everything goes through the --numa parameter. This
allows for defining NUMA nodes without associating any particular memory
range to it. And in case one wants to associate one or multiple memory
ranges to it, the expectation is to describe a list of memory zone
through the --numa parameter.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the NumaConfig which now provides distance information, we can
internally update the list of NUMA nodes with the exact distances they
should be located from other nodes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the list of CPUs defined through the NumaConfig, this patch
will update the internal list of CPUs attached to each NUMA node.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch added the seccomp_filter module to the virtio-devices crate
by taking reference code from the vmm crate. This patch also adds
allowed-list for the virtio-block worker thread.
Partially fixes: #925
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch propagates the SeccompAction value from main to the
Vm struct constructor (i.e. Vm::new_from_memory_manager), so that we can
use it to construct the DeviceManager and CpuManager struct for
controlling the behavior of the seccomp filters for vcpu/virtio-device
worker threads.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Shrink GICDevice trait to contain hypervisor agnostic API's only, which
are used in generating FDT.
Move all KVM specific logic into KvmGICDevice trait.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
SGX expects the EPC region to be reported as "reserved" from the e820
table. This patch adds a new entry to the table if SGX is enabled.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of passing the GuestMemoryMmap directly to the CpuManager upon
its creation, it's better to pass a reference to the MemoryManager. This
way we will be able to know if SGX EPC region along with one or multiple
sections are present.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the presence of one or multiple SGX EPC sections from the VM
configuration, the MemoryManager will allocate a contiguous block of
guest address space to hold the entire EPC region. Within this EPC
region, each EPC section is memory mapped.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit store balloon size to MemoryConfig.
After reboot, virtio-balloon can use this size to inflate back to
the size before reboot.
Signed-off-by: Hui Zhu <teawater@antfin.com>
In order to move the hypervisor specific parts of the VM exit handling
path, we're defining a generic, hypervisor agnostic VM exit enum.
This is what the hypervisor's Vcpu run() call should return when the VM
exit can not be completely handled through the hypervisor specific bits.
For KVM based hypervisors, this means directly forwarding the IO related
exits back to the VMM itself. For other hypervisors that e.g. rely on the
VMM to decode and emulate instructions, this means the decoding itself
would happen in the hypervisor crate exclusively, and the rest of the VM
exit handling would be handled through the VMM device model implementation.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Fix test_vm unit test by using the new abstraction and dropping some
dead code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The fd naming is quite KVM specific. Since we're now using the
hypervisor crate abstractions, we can rename those into something more
readable and meaningful.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Split the generic virtio code (queues and device type) from the
VirtioDevice trait, transport and device implementations.
This also simplifies the feature handling in vhost_user_backend as the
vm-virtio crate is no longer has any features.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Because we don't want the guest to miss any event triggered by the
emulation of devices, it is important to resume all vCPUs before we can
resume the DeviceManager with all its associated devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When set_user_memory_region was moved to hypervisor crate, it was turned
into a safe function that wrapped around an unsafe call. All but one
call site had the safety statements removed. But safety statement was
not moved inside the wrapper function.
Add the safety statement back to help reasoning in the future. Also
remove that one last instance where the safety statement is not needed .
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
That removes one more KVM-ism in VMM crate.
Note that there are more KVM specific code in those files to be split
out, but we're not at that stage yet.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Collate the virtio device counters in DeviceManager for each device that
exposes any and expose it through the recently added HTTP API.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The counters are a hash of device name to hash of counter name to u64
value. Currently the API is only implemented with a stub that returns an
empty set of counters.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to maintain correct time when doing pause/resume and
snapshot/restore operations, this patch stores the clock value
on pause, and restore it on resume. Because snapshot/restore
expects a VM to be paused before the snapshot and paused after
the restore, this covers the migration use case too.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>