Commit Graph

893 Commits

Author SHA1 Message Date
Michael Zhao
a95b6bbd8b vmm: Add seccomp rules for starting vhost-user-net backend on AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-08-31 08:19:23 +02:00
Hui Zhu
f7b3581645 cloud-hypervisor.yaml: MemoryConfig: Add balloon_size
"struct MemoryConfig" has balloon_size but not in MemoryConfig
of cloud-hypervisor.yaml.
This commit adds it.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-08-28 09:58:39 +02:00
Sebastien Boeuf
a8a9e61c3d vmm: memory_manager: Allow host NUMA for RAM backed files
Let's narrow down the limitation related to mbind() by allowing shared
mappings backed by a file backed by RAM. This leaves the restriction on
only for mappings backed by a regular file.

With this patch, host NUMA node can be specified even if using
vhost-user devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
1b4591aecc vmm: memory_manager: Apply NUMA policy to memory zones
Relying on the new option 'host_numa_node' from the 'memory-zone'
parameter, the user can now define which NUMA node from the host
should be used to back the current memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
e6f585a31c vmm: Add 'host_numa_nodes' option to memory zones
Since memory zones have been introduced, it is now possible for a user
to specify multiple backends for the guest RAM. By adding a new option
'host_numa_node' to the 'memory-zone' parameter, we allow the guest RAM
to be backed by memory that might come from a specific NUMA node on the
host.

The option expects a node identifier, specifying which NUMA node should
be used to allocate the memory associated with a specific memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
ad5d0e4713 vmm: Remove 'mergeable' from memory zones
The flag 'mergeable' should only apply to the entire guest RAM, which is
why it is removed from the MemoryZoneConfig as it is defined as a global
parameter at the MemoryConfig level.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 07:26:49 +02:00
Sebastien Boeuf
89e7774b96 vmm: openapi: Don't expect cmdline to always be there
The 'cmdline' parameter should not be required as it is not needed when
the 'kernel' parameter is the rust-hypervisor-fw, which means the kernel
and the associated command line will be found from the EFI partition.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:49:05 +02:00
Sebastien Boeuf
e8149380b7 vmm: memory_manager: Factorize memory regions creation
Factorize the codepath between simple memory and multiple memory zones.
This simplifies the way regions are memory mapped, as everything relies
on the same codepath. This is performed by creating a memory zone on the
fly for the specific use case where --memory is used with size being
different from 0. Internally, the code can rely on memory zones to
create the memory regions forming the guest memory.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
c58dd761f4 vmm: Remove 'file' option from MemoryConfig
After the introduction of user defined memory zones, we can now remove
the deprecated 'file' option from --memory parameter. This makes this
parameter simpler, letting more advanced users define their own custom
memory zones through the dedicated parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
5bf7113768 vmm: memory_manager: Remove restrictions about snapshot/restore
User defined memory regions can now support being snapshot and restored,
therefore this commit removes the restrictions that were applied through
earlier commit.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
2583d572fc vmm: memory_manager: Simplify how to restore memory regions
By factorizing a lot of code into create_ram_region(), this commit
achieves the simplification of the restore codepath. Additionally, it
makes user defined memory zones compatible with snapshot/restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
b14c861c6f vmm: memory_manager: Store memory regions content only when necessary
First thing, this patch introduces a new function to identify if a file
descriptor is linked to any hard link on the system. This can let the
VMM know if the file can be accessed by the user, or if the file will
be destroyed as soon as the VMM releases the file descriptor.

Based on this information, and associated with the knowledge about the
region being MAP_SHARED or not, the VMM can now decide to skip the copy
of the memory region content. If the user has access to the file from
the filesystem, and if the file has been mapped as MAP_SHARED, we can
consider the guest memory region content to be present in this file at
any point in time. That's why in this specific case, there's no need for
performing the copy of the memory region content into a dedicated file.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
d1ce52f3a8 vmm: memory_manager: Make backing file from snapshot optional
Let's not assume that a backing file is going to be the result from
a snapshot for each memory region. These regions might be backed by
a file on the host filesystem (not a temporary file in host RAM), which
means they don't need to be copied and stored into dedicated files.

That's why this commit prepares for further changes by introducing an
optional PathBuf associated with the snapshot of each memory region.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
871138d5cc vm-migration: Make snapshot() mutable
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
c13721fdbd vmm: memory_manager: Handle user defined memory zones
In case the memory size is 0, this means the user defined memory
zones are used as a way to specify how to back the guest memory.

This is the first step in supporting complex use cases where the user
can define exactly which type of memory from the host should back the
memory from the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
7cd3867e2c vmm: memory_manager: Provide file offset through create_ram_region()
In anticipation for the need to map part of a file with the function
create_ram_region(), it is extended to accept a file offset as argument.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
59d4a56ab7 vmm: memory_manager: Don't truncate backing file
In case the provided backing file is an actual file and not a directory,
we should not truncate it, as we expect the file to already be the right
size.

This change will be important once we try to map the same file through
multiple memory mappings. We can't let the file be truncated as the
second mapping wouldn't work properly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
be475ddc22 main, vmm: Let the user define distincts memory zones
Introducing a new CLI option --memory-zone letting the user specify
custom memory zones. When this option is present, the --memory size
must be explicitly set to 0.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
d25ec66bb6 vmm: memory_manager: Simplify start_addr()
Small simplification for the function calculating the start address.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Anatol Belski
12212d2966 pci: device_manager: Remove hardcoded I/O port assignment
It is otherwise seems to be able to cause resource conflicts with
Windows APCI_HAL. The OS might do a better job on assigning resources
to this device, withouth them to be requested explicitly. 0xcf8 and
0xcfc are only what is certainly needed for the PCI device enumeration.

Signed-off-by: Anatol Belski <anatol.belski@microsoft.com>
2020-08-25 09:00:06 +02:00
Michael Zhao
afc98a5ec9 vmm: Fix AArch64 clippy warnings of vmm and other crates
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-08-24 10:59:08 +02:00
Muminul Islam
92b4499c1e vmm, hypervisor: Add vmstate to snapshot and restore path
Signed-off-by: Muminul Islam <muislam@microsoft.com>
2020-08-24 08:48:15 +02:00
Bo Chen
02d87833f0 virtio-devices: seccomp: Add seccomp filters for vhost_blk thread
This patch enables the seccomp filters for the vhost_blk worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
896b9a1d4b virtio-devices: seccomp: Add seccomp filter for vhost_net_ctl thread
This patch enables the seccomp filters for the vhost_net_ctl worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
02d63149fe virtio-devices: seccomp: Add seccomp filters for vhost_fs thread
This patch enables the seccomp filters for the vhost_fs worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
c82ded8afa virtio-devices: seccomp: Add seccomp filters for balloon thread
This patch enables the seccomp filters for the balloon worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
c460178723 virtio-devices: seccomp: Add seccomp filters for mem thread
This patch enables the seccomp filters for the mem worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-19 08:33:58 +02:00
Bo Chen
4539236690 virtio-devices: seccomp: Add seccomp filters for iommu thread
This patch enables the seccomp filters for the iommu worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-17 21:08:49 +02:00
Anatol Belski
eba42c392f devices: acpi: Add UID to devices with common HID
Some OS might check for duplicates and bail out, if it can't create a
distinct mapping. According to ACPI 5.0 section 6.1.12, while _UID is
optional, it becomes required when there are multiple devices with the
same _HID.

Signed-off-by: Anatol Belski <ab@php.net>
2020-08-14 08:52:02 +02:00
Sebastien Boeuf
bdef54ead6 vmm: Add brk syscall to the API thread
The brk syscall is not always called as the system might not need it.
But when it's needed from the API thread, this causes the thread to
terminate as it is not part of the authorized list of syscalls.

This should fix some sporadic failures on the CI with the musl build.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-11 15:04:21 +01:00
Jose Carlos Venegas Munoz
90acb01bad vmm: seccomp: add mprotect to API thread filter
Add mprotect to API thread rules. Prevent the VMM is
killed when it is used.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-08-05 21:35:21 +01:00
Bo Chen
dc71d2765a virtio-devices: seccomp: Add seccomp filters for pmem thread
This patch enables the seccomp filters for the pmem worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-05 08:13:31 +01:00
Bo Chen
d77977536d virtio-devices: seccomp: Add seccomp filters for net thread
This patch enables the seccomp filters for the net worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-05 08:13:31 +01:00
Bo Chen
276df6b71c virtio-devices: seccomp: Add seccomp filters for console thread
This patch enables the seccomp filters for the console worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-05 08:13:31 +01:00
Bo Chen
a426221167 virtio-devices: seccomp: Add seccomp filters for rng thread
This patch enables the seccomp filters for the rng worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-05 08:13:31 +01:00
Bo Chen
704edd544c virtio-devices: seccomp: Add seccomp_filter module
This patch added the seccomp_filter module to the virtio-devices crate
by taking reference code from the vmm crate. This patch also adds
allowed-list for the virtio-block worker thread.

Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
ff7ed8f628 vmm: Propagate the SeccompAction value to the Vm struct constructor
This patch propagates the SeccompAction value from main to the
Vm struct constructor (i.e. Vm::new_from_memory_manager), so that we can
use it to construct the DeviceManager and CpuManager struct for
controlling the behavior of the seccomp filters for vcpu/virtio-device
worker threads.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
8e74637ebb main, vmm: seccomp: Add the '--seccomp log' option
This patch extends the CLI option '--seccomp' to accept the 'log'
parameter in addition 'true/false'. It also refactors the
vmm::seccomp_filters module to support both "SeccompAction::Trap" and
"SeccompAction::Log".

Fixes: #1180

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
b41884a406 main, vmm: seccomp: Use SeccompAction instead of SeccompLevel
This patch replaces the usage of 'SeccompLevel' with 'SeccompAction',
which is the first step to support the 'log' action over system
calls that are not on the allowed list of seccomp filters.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Sebastien Boeuf
8f0bf82648 io_uring: Add new feature gate
By adding a new io_uring feature gate, we let the user the possibility
to choose if he wants to enable the io_uring improvements or not.
Since the io_uring feature depends on the availability on recent host
kernels, it's better if we leave it off for now.

As soon as our CI will have support for a kernel 5.6 with all the
features needed from io_uring, we'll enable this feature gate
permanently.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-03 14:15:01 +01:00
Sebastien Boeuf
917027c55b vmm: Rely on virtio-blk io_uring when possible
In case the host supports io_uring and the specific io_uring options
needed, the VMM will choose the asynchronous version of virtio-blk.
This will enable better I/O performances compared to the default
synchronous version.

This is also important to note the VMM won't be able to use the
asynchronous version if the backend image is in QCOW format.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-03 14:15:01 +01:00
Praveen Paladugu
afa8ecc90c vmm: add validation for network parameters
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
2020-07-31 09:07:12 +02:00
Wei Liu
a52b614a61 vmm: device_manager: console input should be only consumed by one device
Cloud Hypervisor allows either the serial or virtio console to output to
TTY, but TTY input is pushed to both.

This is not correct. When Linux guest is configured to spawn TTYs on
both ttyS0 and hvc0, the user effectively issues the same commands twice
in different TTYs.

Fix this by only direct input to the one choice that is using host side
TTY.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-30 18:05:01 +02:00
Wei Liu
5ed794a44c vmm: device_manager: rename console_input to virtio_console_input
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-30 18:05:01 +02:00
Wei Liu
3e68867bb7 vmm: device_manager: eliminate KvmMsiInterruptManager from the new function
The logic to create an MSI interrupt manager is applicable to Hyper-V as
well.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-30 08:00:33 +02:00
Wei Liu
218ec563fc vmm: fix warnings when KVM is not enabled
Some imports are only used by KVM. Some variables and code become dead
or unused when KVM is not enabled.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-28 21:08:39 +01:00
Jianyong Wu
d24b110519 seccomp: AArch64: Add SYS_unlinkat to seccomp whitelist
This commit fixes an "Bad syscall" error when shutting down the VM
on AArch64 by adding the SYS_unlinkat syscall to the seccomp
whitelist.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2020-07-27 07:25:07 +00:00
Rob Bradford
9ae44aeada vmm: acpi_tables: Fix PM timer I/O port width
Ensure that the width of the I/O port is correctly set to 32-bits in the
generic address used for the X_PM_TMR_BLK. Do this by type
parameterising GenericAddress::io_port_address() fuction.

TEST=Boot with clocksource=acpi_pm and observe no errors in the dmesg.

Fixes: #1496

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-07-23 17:48:22 +02:00
Rob Bradford
aae5d988e1 devices: vmm: Add ACPI PM timer
This is a counter exposed via an I/O port that runs at 3.579545MHz. Here
we use a hardcoded I/O and expose the details through the FADT table.

TEST=Boot Linux kernel and see the following in dmesg:

[    0.506198] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-07-23 13:10:21 +01:00
Wei Liu
f03afea0d6 device_manager: document unsafe block in add_vfio_device
It is not immediately obvious why the conversion is safe. Document the
safety guarantee.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-21 17:13:10 +01:00