Commit Graph

900 Commits

Author SHA1 Message Date
Sebastien Boeuf
629befdb4a vmm: acpi: Add CPUs to NUMA nodes
Based on the list of CPUs related to each NUMA node, Processor Local
x2APIC Affinity structures are created and included into the SRAT table.

This describes which CPUs are part of each node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 15:25:00 +02:00
Sebastien Boeuf
db28db8567 vmm: Update NUMA nodes based on NumaConfig
Relying on the list of CPUs defined through the NumaConfig, this patch
will update the internal list of CPUs attached to each NUMA node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 15:25:00 +02:00
Sebastien Boeuf
42f963d6f2 main, vmm: Add new --numa parameter
Through this new parameter, we give users the opportunity to specify a
set of CPUs attached to a NUMA node that has been previously created
from the --memory-zone parameter.

This parameter will be extended in the future to describe the distance
between multiple nodes.

For instance, if a user wants to attach CPUs 0, 1, 2 and 6 to a NUMA
node, here are two different ways of doing so:
Either
	./cloud-hypervisor ... --numa id=0,cpus=0-2:6
Or
	./cloud-hypervisor ... --numa id=0,cpus=0:1:2:6

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 15:25:00 +02:00
Sebastien Boeuf
65a23c6fc6 vmm: acpi: Create the SRAT table
The SRAT table (System Resource Affinity Table) is needed to describe
NUMA nodes and how memory ranges and CPUs are attached to them.

For now it simply attaches a list of Memory Affinity structures based on
the list of NUMA nodes created from the VMM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 14:11:49 +02:00
Sebastien Boeuf
cf81254a8d vmm: memory_manager: Create a NUMA node list
Based on the 'guest_numa_node' option, we create and store a list of
NUMA nodes in the MemoryManager. The point being to associate a list of
memory regions to each node, so that we can later create the ACPI tables
with the proper memory range information.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 14:11:49 +02:00
Sebastien Boeuf
768dbd1fb0 vmm: Add 'guest_numa_node' option to 'memory-zone'
With the introduction of this new option, the user will be able to
describe if a particular memory zone should belong to a specific NUMA
node from a guest perspective.

For instance, using '--memory-zone size=1G,guest_numa_node=2' would let
the user describe that a memory zone of 1G in the guest should be
exposed as being associated with the NUMA node 2.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 14:11:49 +02:00
Sebastien Boeuf
274c001eab vmm: Use u32 instead of u64 for host_numa_node option
Given that ACPI uses u32 as the type for the Proximity Domain, we can
use u32 instead of u64 as the type for 'host_numa_node' option.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 13:29:42 +02:00
Michael Zhao
a95b6bbd8b vmm: Add seccomp rules for starting vhost-user-net backend on AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-08-31 08:19:23 +02:00
Hui Zhu
f7b3581645 cloud-hypervisor.yaml: MemoryConfig: Add balloon_size
"struct MemoryConfig" has balloon_size but not in MemoryConfig
of cloud-hypervisor.yaml.
This commit adds it.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-08-28 09:58:39 +02:00
Sebastien Boeuf
a8a9e61c3d vmm: memory_manager: Allow host NUMA for RAM backed files
Let's narrow down the limitation related to mbind() by allowing shared
mappings backed by a file backed by RAM. This leaves the restriction on
only for mappings backed by a regular file.

With this patch, host NUMA node can be specified even if using
vhost-user devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
1b4591aecc vmm: memory_manager: Apply NUMA policy to memory zones
Relying on the new option 'host_numa_node' from the 'memory-zone'
parameter, the user can now define which NUMA node from the host
should be used to back the current memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
e6f585a31c vmm: Add 'host_numa_nodes' option to memory zones
Since memory zones have been introduced, it is now possible for a user
to specify multiple backends for the guest RAM. By adding a new option
'host_numa_node' to the 'memory-zone' parameter, we allow the guest RAM
to be backed by memory that might come from a specific NUMA node on the
host.

The option expects a node identifier, specifying which NUMA node should
be used to allocate the memory associated with a specific memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
ad5d0e4713 vmm: Remove 'mergeable' from memory zones
The flag 'mergeable' should only apply to the entire guest RAM, which is
why it is removed from the MemoryZoneConfig as it is defined as a global
parameter at the MemoryConfig level.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 07:26:49 +02:00
Sebastien Boeuf
89e7774b96 vmm: openapi: Don't expect cmdline to always be there
The 'cmdline' parameter should not be required as it is not needed when
the 'kernel' parameter is the rust-hypervisor-fw, which means the kernel
and the associated command line will be found from the EFI partition.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:49:05 +02:00
Sebastien Boeuf
e8149380b7 vmm: memory_manager: Factorize memory regions creation
Factorize the codepath between simple memory and multiple memory zones.
This simplifies the way regions are memory mapped, as everything relies
on the same codepath. This is performed by creating a memory zone on the
fly for the specific use case where --memory is used with size being
different from 0. Internally, the code can rely on memory zones to
create the memory regions forming the guest memory.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
c58dd761f4 vmm: Remove 'file' option from MemoryConfig
After the introduction of user defined memory zones, we can now remove
the deprecated 'file' option from --memory parameter. This makes this
parameter simpler, letting more advanced users define their own custom
memory zones through the dedicated parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
5bf7113768 vmm: memory_manager: Remove restrictions about snapshot/restore
User defined memory regions can now support being snapshot and restored,
therefore this commit removes the restrictions that were applied through
earlier commit.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
2583d572fc vmm: memory_manager: Simplify how to restore memory regions
By factorizing a lot of code into create_ram_region(), this commit
achieves the simplification of the restore codepath. Additionally, it
makes user defined memory zones compatible with snapshot/restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
b14c861c6f vmm: memory_manager: Store memory regions content only when necessary
First thing, this patch introduces a new function to identify if a file
descriptor is linked to any hard link on the system. This can let the
VMM know if the file can be accessed by the user, or if the file will
be destroyed as soon as the VMM releases the file descriptor.

Based on this information, and associated with the knowledge about the
region being MAP_SHARED or not, the VMM can now decide to skip the copy
of the memory region content. If the user has access to the file from
the filesystem, and if the file has been mapped as MAP_SHARED, we can
consider the guest memory region content to be present in this file at
any point in time. That's why in this specific case, there's no need for
performing the copy of the memory region content into a dedicated file.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
d1ce52f3a8 vmm: memory_manager: Make backing file from snapshot optional
Let's not assume that a backing file is going to be the result from
a snapshot for each memory region. These regions might be backed by
a file on the host filesystem (not a temporary file in host RAM), which
means they don't need to be copied and stored into dedicated files.

That's why this commit prepares for further changes by introducing an
optional PathBuf associated with the snapshot of each memory region.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
871138d5cc vm-migration: Make snapshot() mutable
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
c13721fdbd vmm: memory_manager: Handle user defined memory zones
In case the memory size is 0, this means the user defined memory
zones are used as a way to specify how to back the guest memory.

This is the first step in supporting complex use cases where the user
can define exactly which type of memory from the host should back the
memory from the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
7cd3867e2c vmm: memory_manager: Provide file offset through create_ram_region()
In anticipation for the need to map part of a file with the function
create_ram_region(), it is extended to accept a file offset as argument.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
59d4a56ab7 vmm: memory_manager: Don't truncate backing file
In case the provided backing file is an actual file and not a directory,
we should not truncate it, as we expect the file to already be the right
size.

This change will be important once we try to map the same file through
multiple memory mappings. We can't let the file be truncated as the
second mapping wouldn't work properly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
be475ddc22 main, vmm: Let the user define distincts memory zones
Introducing a new CLI option --memory-zone letting the user specify
custom memory zones. When this option is present, the --memory size
must be explicitly set to 0.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
d25ec66bb6 vmm: memory_manager: Simplify start_addr()
Small simplification for the function calculating the start address.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Anatol Belski
12212d2966 pci: device_manager: Remove hardcoded I/O port assignment
It is otherwise seems to be able to cause resource conflicts with
Windows APCI_HAL. The OS might do a better job on assigning resources
to this device, withouth them to be requested explicitly. 0xcf8 and
0xcfc are only what is certainly needed for the PCI device enumeration.

Signed-off-by: Anatol Belski <anatol.belski@microsoft.com>
2020-08-25 09:00:06 +02:00
Michael Zhao
afc98a5ec9 vmm: Fix AArch64 clippy warnings of vmm and other crates
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-08-24 10:59:08 +02:00
Muminul Islam
92b4499c1e vmm, hypervisor: Add vmstate to snapshot and restore path
Signed-off-by: Muminul Islam <muislam@microsoft.com>
2020-08-24 08:48:15 +02:00
Bo Chen
02d87833f0 virtio-devices: seccomp: Add seccomp filters for vhost_blk thread
This patch enables the seccomp filters for the vhost_blk worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
896b9a1d4b virtio-devices: seccomp: Add seccomp filter for vhost_net_ctl thread
This patch enables the seccomp filters for the vhost_net_ctl worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
02d63149fe virtio-devices: seccomp: Add seccomp filters for vhost_fs thread
This patch enables the seccomp filters for the vhost_fs worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
c82ded8afa virtio-devices: seccomp: Add seccomp filters for balloon thread
This patch enables the seccomp filters for the balloon worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
c460178723 virtio-devices: seccomp: Add seccomp filters for mem thread
This patch enables the seccomp filters for the mem worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
4539236690 virtio-devices: seccomp: Add seccomp filters for iommu thread
This patch enables the seccomp filters for the iommu worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-17 21:08:49 +02:00
Anatol Belski
eba42c392f devices: acpi: Add UID to devices with common HID
Some OS might check for duplicates and bail out, if it can't create a
distinct mapping. According to ACPI 5.0 section 6.1.12, while _UID is
optional, it becomes required when there are multiple devices with the
same _HID.

Signed-off-by: Anatol Belski <ab@php.net>
2020-08-14 08:52:02 +02:00
Sebastien Boeuf
bdef54ead6 vmm: Add brk syscall to the API thread
The brk syscall is not always called as the system might not need it.
But when it's needed from the API thread, this causes the thread to
terminate as it is not part of the authorized list of syscalls.

This should fix some sporadic failures on the CI with the musl build.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-11 15:04:21 +01:00
Jose Carlos Venegas Munoz
90acb01bad vmm: seccomp: add mprotect to API thread filter
Add mprotect to API thread rules. Prevent the VMM is
killed when it is used.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-08-05 21:35:21 +01:00
Bo Chen
dc71d2765a virtio-devices: seccomp: Add seccomp filters for pmem thread
This patch enables the seccomp filters for the pmem worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-05 08:13:31 +01:00
Bo Chen
d77977536d virtio-devices: seccomp: Add seccomp filters for net thread
This patch enables the seccomp filters for the net worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-05 08:13:31 +01:00
Bo Chen
276df6b71c virtio-devices: seccomp: Add seccomp filters for console thread
This patch enables the seccomp filters for the console worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-05 08:13:31 +01:00
Bo Chen
a426221167 virtio-devices: seccomp: Add seccomp filters for rng thread
This patch enables the seccomp filters for the rng worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-05 08:13:31 +01:00
Bo Chen
704edd544c virtio-devices: seccomp: Add seccomp_filter module
This patch added the seccomp_filter module to the virtio-devices crate
by taking reference code from the vmm crate. This patch also adds
allowed-list for the virtio-block worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
ff7ed8f628 vmm: Propagate the SeccompAction value to the Vm struct constructor
This patch propagates the SeccompAction value from main to the
Vm struct constructor (i.e. Vm::new_from_memory_manager), so that we can
use it to construct the DeviceManager and CpuManager struct for
controlling the behavior of the seccomp filters for vcpu/virtio-device
worker threads.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
8e74637ebb main, vmm: seccomp: Add the '--seccomp log' option
This patch extends the CLI option '--seccomp' to accept the 'log'
parameter in addition 'true/false'. It also refactors the
vmm::seccomp_filters module to support both "SeccompAction::Trap" and
"SeccompAction::Log".

Fixes: #1180

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
b41884a406 main, vmm: seccomp: Use SeccompAction instead of SeccompLevel
This patch replaces the usage of 'SeccompLevel' with 'SeccompAction',
which is the first step to support the 'log' action over system
calls that are not on the allowed list of seccomp filters.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Sebastien Boeuf
8f0bf82648 io_uring: Add new feature gate
By adding a new io_uring feature gate, we let the user the possibility
to choose if he wants to enable the io_uring improvements or not.
Since the io_uring feature depends on the availability on recent host
kernels, it's better if we leave it off for now.

As soon as our CI will have support for a kernel 5.6 with all the
features needed from io_uring, we'll enable this feature gate
permanently.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-03 14:15:01 +01:00
Sebastien Boeuf
917027c55b vmm: Rely on virtio-blk io_uring when possible
In case the host supports io_uring and the specific io_uring options
needed, the VMM will choose the asynchronous version of virtio-blk.
This will enable better I/O performances compared to the default
synchronous version.

This is also important to note the VMM won't be able to use the
asynchronous version if the backend image is in QCOW format.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-03 14:15:01 +01:00
Praveen Paladugu
afa8ecc90c vmm: add validation for network parameters
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
2020-07-31 09:07:12 +02:00
Wei Liu
a52b614a61 vmm: device_manager: console input should be only consumed by one device
Cloud Hypervisor allows either the serial or virtio console to output to
TTY, but TTY input is pushed to both.

This is not correct. When Linux guest is configured to spawn TTYs on
both ttyS0 and hvc0, the user effectively issues the same commands twice
in different TTYs.

Fix this by only direct input to the one choice that is using host side
TTY.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-30 18:05:01 +02:00