It's important to return the region covered by virtio-mem the first time
it is inserted as the device manager must update all devices with this
information.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the previous code changes, we can now update the MemoryManager
code to create one virtio-mem region and resizing handler per memory
zone. This will naturally create one virtio-mem device per memory zone
from the DeviceManager's code which has been previously updated as well.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for resizing support of an individual memory zone,
this commit introduces a new option 'hotplug_size' to '--memory-zone'
parameter. This defines the amount of memory that can be added through
each specific memory zone.
Because memory zone resize is tied to virtio-mem, make sure the user
selects 'virtio-mem' hotplug method, otherwise return an error.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both MemoryManager and DeviceManager are updated through this commit to
handle the creation of multiple virtio-mem devices if needed. For now,
only the framework is in place, but the behavior remains the same, which
means only the memory zone created from '--memory' generates a
virtio-mem region that can be used for resize.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to anticipate the need for storing memory regions along with
virtio-mem information for each memory zone, we create a new structure
MemoryZone that will replace Vec<Arc<GuestRegionMmap>> in the hash map
MemoryZones.
This makes thing more logical as MemoryZones becomes a list of
MemoryZone sorted by their identifier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This removes the dependency of the pci crate on the devices crate which
now only contains the device implementations themselves.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The way to describe guest NUMA nodes has been updated through previous
commits, letting the user describe the full NUMA topology through the
--numa parameter (or NumaConfig).
That's why we can remove the deprecated and unused 'guest_numa_node'
option.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the previous changes introducing new options for both memory
zones and NUMA configuration, this patch changes the behavior of the
NUMA node definition. Instead of relying on the memory zones to define
the guest NUMA nodes, everything goes through the --numa parameter. This
allows for defining NUMA nodes without associating any particular memory
range to it. And in case one wants to associate one or multiple memory
ranges to it, the expectation is to describe a list of memory zone
through the --numa parameter.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we have an identifier per memory zone, and in order to keep
track of the memory regions associated with the memory zones, we create
and store a map referencing list of memory regions per memory zone ID.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for allowing memory zones to be removed, but also in
anticipation for refactoring NUMA parameter, we introduce a mandatory
'id' option to the --memory-zone parameter.
This forces the user to provide a unique identifier for each memory zone
so that we can refer to these.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the NumaConfig which now provides distance information, we can
internally update the list of NUMA nodes with the exact distances they
should be located from other nodes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the list of CPUs defined through the NumaConfig, this patch
will update the internal list of CPUs attached to each NUMA node.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the 'guest_numa_node' option, we create and store a list of
NUMA nodes in the MemoryManager. The point being to associate a list of
memory regions to each node, so that we can later create the ACPI tables
with the proper memory range information.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With the introduction of this new option, the user will be able to
describe if a particular memory zone should belong to a specific NUMA
node from a guest perspective.
For instance, using '--memory-zone size=1G,guest_numa_node=2' would let
the user describe that a memory zone of 1G in the guest should be
exposed as being associated with the NUMA node 2.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Given that ACPI uses u32 as the type for the Proximity Domain, we can
use u32 instead of u64 as the type for 'host_numa_node' option.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Let's narrow down the limitation related to mbind() by allowing shared
mappings backed by a file backed by RAM. This leaves the restriction on
only for mappings backed by a regular file.
With this patch, host NUMA node can be specified even if using
vhost-user devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the new option 'host_numa_node' from the 'memory-zone'
parameter, the user can now define which NUMA node from the host
should be used to back the current memory zone.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since memory zones have been introduced, it is now possible for a user
to specify multiple backends for the guest RAM. By adding a new option
'host_numa_node' to the 'memory-zone' parameter, we allow the guest RAM
to be backed by memory that might come from a specific NUMA node on the
host.
The option expects a node identifier, specifying which NUMA node should
be used to allocate the memory associated with a specific memory zone.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The flag 'mergeable' should only apply to the entire guest RAM, which is
why it is removed from the MemoryZoneConfig as it is defined as a global
parameter at the MemoryConfig level.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Factorize the codepath between simple memory and multiple memory zones.
This simplifies the way regions are memory mapped, as everything relies
on the same codepath. This is performed by creating a memory zone on the
fly for the specific use case where --memory is used with size being
different from 0. Internally, the code can rely on memory zones to
create the memory regions forming the guest memory.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
After the introduction of user defined memory zones, we can now remove
the deprecated 'file' option from --memory parameter. This makes this
parameter simpler, letting more advanced users define their own custom
memory zones through the dedicated parameter.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
User defined memory regions can now support being snapshot and restored,
therefore this commit removes the restrictions that were applied through
earlier commit.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By factorizing a lot of code into create_ram_region(), this commit
achieves the simplification of the restore codepath. Additionally, it
makes user defined memory zones compatible with snapshot/restore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
First thing, this patch introduces a new function to identify if a file
descriptor is linked to any hard link on the system. This can let the
VMM know if the file can be accessed by the user, or if the file will
be destroyed as soon as the VMM releases the file descriptor.
Based on this information, and associated with the knowledge about the
region being MAP_SHARED or not, the VMM can now decide to skip the copy
of the memory region content. If the user has access to the file from
the filesystem, and if the file has been mapped as MAP_SHARED, we can
consider the guest memory region content to be present in this file at
any point in time. That's why in this specific case, there's no need for
performing the copy of the memory region content into a dedicated file.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Let's not assume that a backing file is going to be the result from
a snapshot for each memory region. These regions might be backed by
a file on the host filesystem (not a temporary file in host RAM), which
means they don't need to be copied and stored into dedicated files.
That's why this commit prepares for further changes by introducing an
optional PathBuf associated with the snapshot of each memory region.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the memory size is 0, this means the user defined memory
zones are used as a way to specify how to back the guest memory.
This is the first step in supporting complex use cases where the user
can define exactly which type of memory from the host should back the
memory from the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for the need to map part of a file with the function
create_ram_region(), it is extended to accept a file offset as argument.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the provided backing file is an actual file and not a directory,
we should not truncate it, as we expect the file to already be the right
size.
This change will be important once we try to map the same file through
multiple memory mappings. We can't let the file be truncated as the
second mapping wouldn't work properly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Some OS might check for duplicates and bail out, if it can't create a
distinct mapping. According to ACPI 5.0 section 6.1.12, while _UID is
optional, it becomes required when there are multiple devices with the
same _HID.
Signed-off-by: Anatol Belski <ab@php.net>
The SGX EPC region must be exposed through the ACPI tables so that the
guest can detect its presence. The guest only get the full range from
ACPI, as the specific EPC sections are directly described through the
CPUID of each vCPU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the presence of one or multiple SGX EPC sections from the VM
configuration, the MemoryManager will allocate a contiguous block of
guest address space to hold the entire EPC region. Within this EPC
region, each EPC section is memory mapped.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit store balloon size to MemoryConfig.
After reboot, virtio-balloon can use this size to inflate back to
the size before reboot.
Signed-off-by: Hui Zhu <teawater@antfin.com>
The code is purely for maintaining an internal counter. It is not really
tied to KVM.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The fd naming is quite KVM specific. Since we're now using the
hypervisor crate abstractions, we can rename those into something more
readable and meaningful.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Split the generic virtio code (queues and device type) from the
VirtioDevice trait, transport and device implementations.
This also simplifies the feature handling in vhost_user_backend as the
vm-virtio crate is no longer has any features.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
That removes one more KVM-ism in VMM crate.
Note that there are more KVM specific code in those files to be split
out, but we're not at that stage yet.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Start moving the vmm, arch and pci crates to being hypervisor agnostic
by using the hypervisor trait and abstractions. This is not a complete
switch and there are still some remaining KVM dependencies.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Implemented GSI allocator and system allocator for AArch64.
Renamed some layout definitions to align more code between architectures.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
In order to workaround a Linux bug that happens when we place devices at
the end of the physical address space on recent hardware (52 bits limit)
we reduce the MMIO address space by one 4k page. This way, nothing gets
allocated in the last 4k of the address space, which is negligible given
the amount of space in the address space.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is a preparing commit to build and test CH on AArch64. All building
issues were fixed, but no functionality was introduced.
For X86, the logic of code was not changed at all.
For ARM, the architecture specific part is still empty. And we applied
some tricks to workaround lint warnings. But such code will be replaced
later by other commits with real functionality.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The new 'shared' and 'hugepages' controls aim to replace the 'file'
option in MemoryConfig. This patch also updated all related integration
tests to use the new controls (instead of providing explicit paths to
"/dev/shm" or "/dev/hugepages").
Fixes: #1011
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
Replace alignment calculation of start address with functionally
equivalent version that does not assume that the block size is a power
of two.
Signed-off-by: Martin Xu <martin.xu@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>