virtio-mem device would use 'VIRTIO_MEM_F_ACPI_PXM' to add memory to NUMA
node, which MUST be existed, otherwise it will be assigned to node id 0,
even if user specify different node id.
According ACPI spec about Memory Affinity Structure, system hardware
supports hot-add memory region using 'Hot Pluggable | Enabled' flags.
Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>
Use zone.host_numa_node to create memory zone, so that memory zone
can apply memory policy in according with host numa node ID
Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>
If after the creation of the self-spawned backend, the VMM cannot create
the corresponding vhost-user frontend, the VMM must kill the freshly
spawned process in order to ensure the error propagation can happen.
In case the child process would still be around, the VMM cannot return
the error as it waits onto the child to terminate.
This should help us identify when self-spawned failures are caused by a
connection being refused between the VMM and the backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When the VMM is terminated by receiving a SIGTERM signal, the signal
handler thread must be able to invoke ioctl(TCGETS) and ioctl(TCSETS)
without error.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Build testing of changes happens on GitHub actions and the integration
tests will build the binary (with different feature flags) again. So
these earlier build operations are just wasted time on the critical
path.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Only use `cloud-hypervisor` when referring to the binary itself and
prefer Cloud Hypervisor when referring to the project.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Some operations complete directly after they have been submitted, which
means they are not submitted asynchronously and therefore they don't
generate any ioevent. This is the reason why we are not processing some
of the completed operations, which leads to some unpredictable
behaviors.
Forcing all io_uring operations submitted to the SQE to be asynchronous
helps simplifying the code as it ensures the completion of every
operation will generate an ioevent, therefore no operation is missed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that virtio-mem supports reboot, we extend the existing integration
tests to validate the amount of RAM after reboot is the same as before
the reboot, but also that we can still resize down the VM or the memory
zone after the reboot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The missing syscall rt_sigprocmask(2) was triggered for the musl build
upon rebooting the VM, and was causing the VM to be killed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to facilitate the memory hotplug for our users, let's enable
the automatic memory onlining in our guest kernel by activating the
kernel config option 'CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE'.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on all the preparatory work achieved through previous commits,
this patch updates the 'hotplugged_size' field for both MemoryConfig and
MemoryZoneConfig structures when either the whole memory is resized, or
simply when a memory zone is resized.
This fixes the lack of support for rebooting a VM with the right amount
of memory plugged in.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding a new field to VirtioMemZone structure, as it lets us associate
with a particular virtio-mem region the amount of memory that should be
plugged in at boot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch simplifies the code as we have one single Option for the
VirtioMemZone. This also prepares for storing additional information
related to the virtio-mem region.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add the new option 'hotplugged_size' to both --memory-zone and --memory
parameters so that we can let the user specify a certain amount of
memory being plugged at boot.
This is also part of making sure we can store the virtio-mem size over a
reboot of the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit gives the possibility to create a virtio-mem device with
some memory already plugged into it. This is preliminary work to be
able to reboot a VM with the virtio-mem region being already resized.
Signed-off-by: Hui Zhu <teawater@antfin.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that e820 tables are created from the 'boot_guest_memory', we can
simplify the memory manager code by adding the virtio-mem regions when
they are created. There's no need to wait for the first hotplug to
insert these regions.
This also anticipates the need for starting a VM with some memory
already plugged into the virtio-mem region.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to differentiate the 'boot' memory regions from the virtio-mem
regions, we store what we call 'boot_guest_memory'. This is useful to
provide the adequate list of regions to the configure_system() function
as it expects only the list of regions that should be exposed through
the e820 table.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend the existing test to validate that each NUMA node gets assigned
the right amount of memory after each memory zone has been resized.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we can resize each memory zone independently, this commit
extends the memory zone related test by validating 'vm.resize-zone'
works correctly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio-mem driver is generating some warnings regarding both size
and alignment of the virtio-mem region if not based on 128MiB:
The alignment of the physical start address can make some memory
unusable.
The alignment of the physical end address can make some memory
unusable.
For these reasons, the current patch enforces virtio-mem regions to be
128MiB aligned and checks the size provided by the user is a multiple of
128MiB.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that virtio-mem device accept a guest NUMA node as parameter, we
retrieve this information from the list of NUMA nodes. Based on the
memory zone associated with the virtio-mem device, we obtain the NUMA
node identifier, which we provide to the virtio-mem device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Implement support for associating a virtio-mem device with a specific
guest NUMA node, based on the ACPI proximity domain identifier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
For more consistency and help reading the code better, this commit
renames all 'virtiomem*' variables into 'virtio_mem*'.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By testing manually the memory resizing through virtio-mem, several
missing syscalls have been identified.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Implement a new VM action called 'resize-zone' allowing the user to
resize one specific memory zone at a time. This relies on all the
preliminary work from the previous commits to resize each virtio-mem
device independently from each others.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding a new parameter 'id' to the virtiomem_resize() function, we
prepare this function to be usable for both global memory resizing and
memory zone resizing.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's important to return the region covered by virtio-mem the first time
it is inserted as the device manager must update all devices with this
information.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the previous code changes, we can now update the MemoryManager
code to create one virtio-mem region and resizing handler per memory
zone. This will naturally create one virtio-mem device per memory zone
from the DeviceManager's code which has been previously updated as well.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for resizing support of an individual memory zone,
this commit introduces a new option 'hotplug_size' to '--memory-zone'
parameter. This defines the amount of memory that can be added through
each specific memory zone.
Because memory zone resize is tied to virtio-mem, make sure the user
selects 'virtio-mem' hotplug method, otherwise return an error.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both MemoryManager and DeviceManager are updated through this commit to
handle the creation of multiple virtio-mem devices if needed. For now,
only the framework is in place, but the behavior remains the same, which
means only the memory zone created from '--memory' generates a
virtio-mem region that can be used for resize.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to anticipate the need for storing memory regions along with
virtio-mem information for each memory zone, we create a new structure
MemoryZone that will replace Vec<Arc<GuestRegionMmap>> in the hash map
MemoryZones.
This makes thing more logical as MemoryZones becomes a list of
MemoryZone sorted by their identifier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Inject CPUID leaves for advertising KVM HyperV support when the
"kvm_hyperv" toggle is enabled. Currently we only enable a selection of
features required to boot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently we don't need to do anything to service these exits but when
the synthetic interrupt controller is active an exit will be triggered
to notify the VMM of details of the synthetic interrupt page.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>