Commit Graph

1400 Commits

Author SHA1 Message Date
Rob Bradford
63637eba31 vmm: Simplify epoll handling for VMM main loop
Remove the indirection of a dispatch table and simply use the enum as
the event data for the events.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-31 21:30:11 +01:00
Bo Chen
b82bb55927 vmm: openapi: use the right default values
This patch fixes couple of typos for the default values from the openapi
yaml file.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-27 15:58:23 +01:00
Rob Bradford
4d2a4e2805 vmm: Handle epoll events for PTYs separately
Use two separate events for the console and serial PTY and then drive
the handling of the inputs on the PTY separately. This results in the
correct behaviour when both console and serial are attached to the PTY
as they are triggered separately on the epoll so events are not lost.

Fixes: #3012

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-25 13:33:32 +01:00
Rob Bradford
6233f6f68e vmm: Send tty input to correct destination
Check the config to find out which device is attached to the tty and
then send the input from the user into that device (serial or
virtio-console.)

Fixes: #3005

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-25 10:08:25 +01:00
Fazla Mehrab
5db4dede28 block_util, vhdx: vhdx crate integration with the cloud hypervisor
vhdx_sync.rs in block_util implements traits to represent the vhdx
crate as a supported block device in the cloud hypervisor. The vhdx
is added to the block device list in device_manager.rs at the vmm
crate so that it can automatically detect a vhdx disk and invoke the
corresponding crate.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Fazla Mehrab <akm.fazla.mehrab@intel.com>
2021-08-19 11:43:19 +02:00
Bo Chen
9aba1fdee6 virtio-devices, vmm: Use syscall definitions from the libc crate
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Bo Chen
864a5e4fe0 virtio-devices, vmm: Simplify 'get_seccomp_rules'
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Bo Chen
7d38a1848b virtio-devices, vmm: Fix the '--seccomp false' option
We are relying on applying empty 'seccomp' filters to support the
'--seccomp false' option, which will be treated as an error with the
updated 'seccompiler' crate. This patch fixes this issue by explicitly
checking whether the 'seccomp' filter is empty before applying the
filter.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Bo Chen
08ac3405f5 virtio-devices, vmm: Move to the seccompiler crate
Fixes: #2929

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Rob Bradford
9d35a10fd4 vmm: cpu: Shutdown VMM on vCPU thread panic
If the vCPU thread panics then catch it and trigger the shutdown of the
VMM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-13 09:19:54 +02:00
Rob Bradford
53b2e19934 vmm: Add support for hotplugging user devices
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-12 13:19:04 +01:00
Henry Wang
bcae6c41e3 vmm, doc: Forbid same memory zone in multiple NUMA nodes
It is forbidden that the same memory zone belongs to more than one
NUMA node. This commit adds related validation to the `--numa`
parameter to prevent the user from specifying such configuration.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-08-12 10:49:02 +02:00
Henry Wang
5a0a4bc505 arch: Add optional distance-map node to FDT
The optional device tree node distance-map describes the relative
distance (memory latency) between all NUMA nodes.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-08-12 10:49:02 +02:00
Henry Wang
165364e08b vmm: Move NUMA node data structures to arch
This is to make sure the NUMA node data structures can be accessed
both from the `vmm` crate and `arch` crate.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-08-12 10:49:02 +02:00
Henry Wang
20aa811de7 vmm: Extend NUMA setup to more than ACPI
The AArch64 platform provides a NUMA binding for the device tree,
which means on AArch64 platform, the NUMA setup can be extended to
more than the ACPI feature.

Based on above, this commit extends the NUMA setup and data
structures to following scenarios:

- All AArch64 platform
- x86_64 platform with ACPI feature enabled

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Michael Zhao <Michael.Zhao@arm.com>
2021-08-12 10:49:02 +02:00
Sebastien Boeuf
4918c1ca7f block_util, vmm: Propagate error on QcowDiskSync creation
Instead of panicking with an expect() function, the QcowDiskSync::new
function now propagates the error properly. This ensures the VMM will
not panic, which might be the source of weird errors if only one thread
exits while the VMM continues to run.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-11 16:44:28 -07:00
Sebastien Boeuf
4735cb8563 vmm, virtio-devices: Restore vhost-user devices in a dedicated way
We cannot let vhost-user devices connect to the backend when the Block,
Fs or Net object is being created during a restore/migration. The reason
is we can't have two VMs (source and destination) connected to the same
backend at the same time. That's why we must delay the connection with
the vhost-user backend until the restoration is performed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-10 12:36:58 -07:00
Sebastien Boeuf
71c7dff32b vmm: Fix the error handling logic when migration fails
The code wasn't doing what it was expected to. The '?' was simply
returning the error to the top level function, meaning the Err() case in
the match was never hit. Moving the whole logic to a dedicated function
allows to identify when something got wrong without propagating to the
calling function, so that we can still stop the dirty logging and
unpause the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-10 12:36:58 -07:00
Sebastien Boeuf
db444715fd vmm: Shutdown VM after migration succeeded
In case the migration succeeds, the destination VM will be correctly
running, with potential vhost-user backends attached to it. We can't let
the source VM trying to reconnect to the same backends, which is why
it's safer to shutdown the source VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-10 12:36:58 -07:00
Sebastien Boeuf
5a83ebce64 vmm: Notify Migratable objects about migration being complete
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-10 12:36:58 -07:00
Sebastien Boeuf
06729bb3ba vmm: Provide a restoring state to the DeviceManager
In anticipation for creating vhost-user devices in a different way when
being restored compared to a fresh start, this commit introduces a new
boolean created by the Vm depending on the use case, and passed down to
the DeviceManager. In the future, the DeviceManager will use this flag
to assess how vhost-user devices should be created.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-10 12:36:58 -07:00
Rob Bradford
5e74848ab4 vmm: seccomp: Permit syscalls used for vfio-user on vCPU thread
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-10 16:01:00 +01:00
Rob Bradford
3efccd0fef vmm: config: Ensure shared memory is enabled if using user-devices
Correct operation of user devices (vfio-user) requires shared memory so
flag this to prevent it from failing in strange ways.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-10 16:01:00 +01:00
Rob Bradford
b28063a7b4 vmm: Create user devices from config
Create the vfio-user / user devices from the config. Currently hotplug
of the devices is not supported nor can they be placed behind the
(virt-)iommu.

Removal of the coldplugged device is however supported.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-10 16:01:00 +01:00
Rob Bradford
7fbec7113e main, config: Add support for --user-device
This allows the user to specify devices that are running in a different
userspace process and communicated with vfio-user.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-10 16:01:00 +01:00
Markus Theil
5b0d4bb398 virtio-devices: seccomp: allow unix socket connect in vsock thread
Allow vsocks to connect to Unix sockets on the host running
cloud-hypervisor with enabled seccomp.

Reported-by: Philippe Schaaf <philippe.schaaf@secunet.com>
Tested-by: Franz Girlich <franz.girlich@tu-ilmenau.de>
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
2021-08-06 08:44:47 -07:00
Henry Wang
27a285257e vmm: cpu: Add PPTT table for AArch64
The optional Processor Properties Topology Table (PPTT) table is
used to describe the topological structure of processors controlled
by the OSPM, and their shared resources, such as caches. The table
can also describe additional information such as which nodes in the
processor topology constitute a physical package.

The ACPI PPTT table supports topology descriptions for ACPI guests.
Therefore, this commit adds the PPTT table for AArch64 to enable
CPU topology feature for ACPI.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-08-05 21:19:16 +08:00
Henry Wang
7fb980f17b arch, vmm: Pass cpu topology configuation to FDT
In an Arm system, the hierarchy of CPUs is defined through three
entities that are used to describe the layout of physical CPUs in
the system:

- cluster
- core
- thread

All these three entities have their own FDT node field. Therefore,
This commit adds an AArch64-specific helper to pass the config from
the Cloud Hypervisor command line to the `configure_system`, where
eventually the `create_fdt` is called.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-08-05 21:19:16 +08:00
Sebastien Boeuf
5c6139bbff vmm: Finalize migration support for all devices
Make sure the DeviceManager is triggered for all migration operations.
The dirty pages are merged from MemoryManager and DeviceManager before
to be sent up to the Vmm in lib.rs.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-05 06:07:00 -07:00
Sebastien Boeuf
0411064271 vmm: Refactor migration through Migratable trait
Now that Migratable provides the methods for starting, stopping and
retrieving the dirty pages, we move the existing code to these new
functions.

No functional change intended.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-05 06:07:00 -07:00
Sebastien Boeuf
e9637d3733 vmm: device_manager: Fully implement Migratable trait
This patch connects the dots between the vm.rs code and each Migratable
device, in order to make sure Migratable methods are correctly invoked
when migration happens.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-05 06:07:00 -07:00
Sebastien Boeuf
79425b6aa8 vm-migration, vmm: Extend methods for MemoryRangeTable
In anticipation for supporting the merge of multiple dirty pages coming
from multiple devices, this patch factorizes the creation of a
MemoryRangeTable from a bitmap, as well as providing a simple method for
merging the dirty pages regions under a single MemoryRangeTable.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-05 06:07:00 -07:00
Muminul Islam
83c44a2411 vmm, virtio-devices: Add missing seccomp rules for MSHV
This patch adds all the seccomp rules missing for MSHV.
With this patch MSFT internal CI runs with seccomp enabled.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-08-03 11:09:07 -07:00
Bo Chen
902fe20d41 vmm: Add fallback handling for sending live migration
This patch adds a fallback path for sending live migration, where it
ensures the following behavior of source VM post live-migration:

1. The source VM will be paused only when the migration is completed
successfully, or otherwise it will keep running;

2. The source VM will always stop dirty pages logging.

Fixes: #2895

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-03 09:26:12 +01:00
Muminul Islam
3baa0c3721 vmm: Add MSHV_VP_TRANSLATE_GVA to seccomp rule
This rule is needed to boot windows guest.
This bug was introduced while we tried to boot
windows guest on MSHV.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-29 16:29:53 +01:00
Muminul Islam
81895b9b40 hypervisor: Implement start/stop_dirty_log for MSHV
This patch modify the existing live migration code
to support MSHV. Adds couple of new functions to enable
and disable dirty page tracking. Add missing IOCTL
to the seccomp rules for live migration.
Adds necessary flags for MSHV.
This changes don't affect KVM functionality at all.

In order to get better performance it is good to
enable dirty page tracking when we start live migration
and disable it when the migration is done.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-29 16:29:53 +01:00
Muminul Islam
fdecba6958 hypervisor: MSHV needs gpa to retrieve dirty logs
Right now, get_dirty_log API has two parameters,
slot and memory_size.
MSHV needs gpa to retrieve the page states. GPA is
needed as MSHV returns the state base on PFN.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-29 16:29:53 +01:00
Sebastien Boeuf
12db6e5068 vmm: Allow restoring virtio-fs with no cache region
It's totally acceptable to snapshot and restore a virtio-fs device that
has no cache region, since this is a valid mode of functioning for
virtio-fs itself.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-29 06:35:03 -07:00
Sebastien Boeuf
dcc646f5b1 clippy: Fix redundant allocations
With the new beta version, clippy complains about redundant allocation
when using Arc<Box<dyn T>>, and suggests replacing it simply with
Arc<dyn T>.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-29 13:28:57 +02:00
Bo Chen
b00a6a8519 vmm: Create guest memory regions with explicit dirty-pages-log flags
As we are now using an global control to start/stop dirty pages log from
the `hypervisor` crate, we need to explicitly tell the hypervisor (KVM)
whether a region needs dirty page tracking when it is created.

This reverts commit f063346de3.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-28 09:08:32 -07:00
Bo Chen
e7c9954dc1 hypervisor, vmm: Abstract the interfaces to start/stop dirty log
Following KVM interfaces, the `hypervisor` crate now provides interfaces
to start/stop the dirty pages logging on a per region basis, and asks
its users (e.g. the `vmm` crate) to iterate over the regions that needs
dirty pages log. MSHV only has a global control to start/stop dirty
pages log on all regions at once.

This patch refactors related APIs from the `hypervisor` crate to provide
a global control to start/stop dirty pages log (following MSHV's
behaviors), and keeps tracking the regions need dirty pages log for
KVM. It avoids leaking hypervisor-specific behaviors out of the
`hypervisor` crate.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-28 09:08:32 -07:00
Bo Chen
ca09638491 vmm: Add CPUID compatibility check for snapshot/restore
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-28 09:26:02 +02:00
Bo Chen
0835198ddd vmm: Factorize CPUID check for live-migration and snapshot/restore
This patch adds a common function "Vmm::vm_check_cpuid_compatibility()"
to be shared by both live-migration and snapshot/restore.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-28 09:26:02 +02:00
Bo Chen
6d9c1eb638 arch, vmm: Add CPUID check to the 'Config' step of live migration
We now send not only the 'VmConfig' at the 'Command::Config' step of
live migration, but also send the 'common CPUID'. In this way, we can
check the compatibility of CPUID features between the source and
destination VMs, and abort live migration early if needed.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-28 09:26:02 +02:00
Bo Chen
f063346de3 vmm: Create guest memory regions without dirty-pages-log by default
With the support of dynamically turning on/off dirty-pages-log during
live-migration (only for guest RAM regions), we now can create guest
memory regions without dirty-pages-log by default both for guest RAM
regions and other regions backed by file/device.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-26 09:19:35 -07:00
Bo Chen
5e0d498582 hypervisor, vmm: Add dynamic control of logging dirty pages
This patch extends slightly the current live-migration code path with
the ability to dynamically start and stop logging dirty-pages, which
relies on two new methods added to the `hypervisor::vm::Vm` Trait. This
patch also contains a complete implementation of the two new methods
based on `kvm` and placeholders for `mshv` in the `hypervisor` crate.

Fixes: #2858

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-26 09:19:35 -07:00
Sebastien Boeuf
0ac4545c5b vmm: Extend seccomp filters with fcntl() for HTTP thread
Whenever a file descriptor is sent through the control message, it
requires fcntl() syscall to handle it, meaning we must allow it through
the list of syscalls authorized for the HTTP thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-21 15:34:22 +02:00
Sebastien Boeuf
3e482c9c74 vmm: Limit physical address space for TDX
When running TDX guest, the Guest Physical Address space is limited by
a shared bit that is located on bit 47 for 4 level paging, and on bit 51
for 5 level paging (when GPAW bit is 1). In order to keep things simple,
and since a 47 bits address space is 128TiB large, we ensure to limit
the physical addressable space to 47 bits when runnning TDX.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-20 15:00:04 +02:00
Sebastien Boeuf
05f7651cf5 vmm: Force VIRTIO_F_IOMMU_PLATFORM when running TDX
When running a TDX guest, we need the virtio drivers to use the DMA API
to share specific memory pages with the VMM on the host. The point is to
let the VMM get access to the pages related to the buffers pointed by
the virtqueues.

The way to force the virtio drivers to use the DMA API is by exposing
the virtio devices with the feature VIRTIO_F_IOMMU_PLATFORM. This is a
feature indicating the device will require some address translation, as
it will not deal directly with physical addresses.

Cloud Hypervisor takes care of this requirement by adding a generic
parameter called "force_iommu". This parameter value is decided based on
the "tdx" feature gate, and then passed to the DeviceManager. It's up to
the DeviceManager to use this parameter on every virtio device creation,
which will imply setting the VIRTIO_F_IOMMU_PLATFORM feature.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-20 14:47:01 +02:00
Bo Chen
569be6e706 arch, vmm: Move "generate_common_cpuid" from "CpuManager" to "arch"
This refactoring ensures all CPUID related operations are centralized in
`arch::x86_64` module, and exposes only two related public functions to
the vmm crate, e.g. `generate_common_cpuid` and `configure_vcpu`.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-19 09:59:34 -07:00
Sebastien Boeuf
d4316d0228 vmm: http: Allow file descriptor to be sent with add-net
In order to let a separate process open a TAP device and pass the file
descriptor through the control message mechanism, this patch adds the
support for sending a file descriptor over to the Cloud Hypervisor
process along with the add-net HTTP API command.

The implementation uses the NetConfig structure mutably to update the
list of fds with the one passed through control message. The list should
always be empty prior to this, as it makes no sense to provide a list of
fds once the Cloud Hypervisor process has already been started.

It is important to note that reboot is supported since the file
descriptor is duplicated upon receival, letting the VM only use the
duplicated one. The original file descriptor is kept open in order to
support a potential reboot.

Fixes #2525

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-19 09:51:32 +02:00
Sebastien Boeuf
d68c388cac vmm: Update seccomp filters for HTTP thread
The micro-http crate now uses recvmsg() syscall in order to receive file
descriptors through control messages. This means the syscall must be
part of the authorized list in the seccomp filters.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-15 08:13:48 +00:00
Sebastien Boeuf
6b710209b1 numa: Add optional sgx_epc_sections field to NumaConfig
This new option allows the user to define a list of SGX EPC sections
attached to a specific NUMA node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-09 14:45:30 +02:00
Sebastien Boeuf
9aedabe11e sgx: Add mandatory id field to SgxEpcConfig
In order to uniquely identify each SGX EPC section, we introduce a
mandatory option `id` to the `--sgx-epc` parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-09 14:45:30 +02:00
Sebastien Boeuf
17c99ae00a vmm: Enable provisioning for SGX guest
The guest can see that SGX supports provisioning as it is exposed
through the CPUID. This patch enables the proper backing of this
feature by having the host open the provisioning device and enable
this capability through the hypervisor.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-07 14:56:38 +02:00
Sebastien Boeuf
5b6d424a77 arch, vmm: Fix TDVF section handling
This patch fixes a few things to support TDVF correctly.

The HOB memory resources must contain EFI_RESOURCE_ATTRIBUTE_ENCRYPTED
attribute.

Any section with a base address within the already allocated guest RAM
must not be allocated.

The list of TD_HOB memory resources should contain both TempMem and
TdHob sections as well.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-06 11:47:43 +02:00
Henry Wang
4da3bdcd6e vmm: Split restore device_manager and devices
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-07-05 22:51:56 +02:00
Henry Wang
95ca4fb15e vmm: vm: Enable snapshot/restore of GICv3ITS
This commit enables the snapshot/restore of GICv3ITS in the process
of VM snapshot/restore.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-07-05 22:51:56 +02:00
Wei Liu
1f2915bff0 vmm: hypervisor: split set_user_memory_region to two functions
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.

MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.

Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.

This fixes user memory region removal on MSHV.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:45:45 +02:00
Wei Liu
71bbaf556f vmm: seccomp: add seccomp rules for MSHV
Add a minimum set of rules that allow Cloud Hypervisor to run Linux on
top of Microsoft Hypervisor.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:44:02 +02:00
Wei Liu
8819bb0f21 vmm: seccomp: make use of KVM feature
The to-be-introduced MSHV rules don't need to contain KVM rules and vice
versa.

Put KVM constants into to a module. This avoids the warnings about
dead code in the future.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:44:02 +02:00
Henry Wang
054c036e81 vmm: acpi: Add AArch64 vCPUs to SRAT table
This commit introduces the `ProcessorGiccAffinity` struct for the
AArch64 platform. This struct will be created and included into
the SRAT table to enable AArch64 NUMA setup.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-06-25 10:22:40 +01:00
Michael Zhao
239e39ddbc vmm: Fix clippy warnings on AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-24 08:59:53 -07:00
Bo Chen
5768dcc320 vmm: Refactor slightly vm_boot and 'control_loop'
It ensures all handlers for `ApiRequest` in `control_loop` are
consistent and minimum and should read better.

No functional changes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 16:01:39 +02:00
Bo Chen
1075209e2a vmm: Handle ApiRequest::VmCreate in a separate function
It simplifies a bit the `Vmm::control_loop` and reads better to be
consistent with other `ApiRequest` handlers. Also, it removes the
repetitive `ApiError::VmAlreadyCreated` and makes `ApiError::VmCreate`
useful.

No functional changes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 16:01:39 +02:00
Michael Zhao
3613b4c096 aarch64: Enable default build option
We have been building Cloud Hypervisor with command like:
`cargo build --no-default-features --features ...`.

After implementing ACPI, we donot have to use specify all features
explicitly. Default build command `cargo build` can work.

This commit fixed some build warnings with default build option and
changed github workflow correspondingly.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-24 13:13:27 +01:00
Bo Chen
c4be0f4235 clippy: Address the issue 'needless-collect'
error: avoid using `collect()` when not needed
   --> vmm/src/vm.rs:630:86
    |
630 |             let node_id_list: Vec<u32> = configs.iter().map(|cfg| cfg.guest_numa_id).collect();
    |                                                                                      ^^^^^^^
...
664 |                         if !node_id_list.contains(&dest) {
    |                             ---------------------------- the iterator could be used here instead
    |
    = note: `-D clippy::needless-collect` implied by `-D warnings`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_collect

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 08:55:43 +02:00
Bo Chen
5825ab2dd4 clippy: Address the issue 'needless-borrow'
Issue from beta verion of clippy:

Error:    --> vm-virtio/src/queue.rs:700:59
    |
700 |             if let Some(used_event) = self.get_used_event(&mem) {
    |                                                           ^^^^ help: change this to: `mem`
    |
    = note: `-D clippy::needless-borrow` implied by `-D warnings`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 08:55:43 +02:00
Bo Chen
585269ecb9 clippy: Address the issue 'field is never read'
Issue from beta verion of clippy:

error: field is never read: `type`
   --> vmm/src/cpu.rs:235:5
    |
235 |     pub r#type: u8,
    |     ^^^^^^^^^^^^^^
    |
    = note: `-D dead-code` implied by `-D warnings`

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 08:55:43 +02:00
Rob Bradford
4d25eaa24a vmm: Add I/O port range to PCI bus resources
The Linux kernel expects that any PCI devices that advertise I/O bars
have use an address that is within the range advertised by the bus
itself. Unfortunately we were not advertising any I/O ports associated
with the PCI bus in the ACPI tables.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-06-23 16:48:52 +01:00
Rob Bradford
b56e1217b6 vmm: tdx: Add KVM_FEATURE_STEAL_TIME_BIT to filtered bits
Filter out the KVM_FEATURE_STEAL_TIME_BIT when running with TDX.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-06-18 15:54:10 +01:00
Sebastien Boeuf
a36ac96444 vmm: cpu_manager: Add _PXM ACPI method to each vCPU
In order to allow a hotplugged vCPU to be assigned to the correct NUMA
node in the guest, the DSDT table must expose the _PXM method for each
vCPU. This method defines the proximity domain to which each vCPU should
be attached to.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-17 16:08:46 +02:00
Sebastien Boeuf
09c3ddd47d vmm: memory_manager: Remove _PXM from ACPI memory slot
The _PXM method always return 0, which is wrong since the SRAT might
tell differently. The point of the _PXM method is to be evaluated by the
guest OS when some new memory slot is being plugged, but this will never
happen for Cloud Hypervisor since using NUMA nodes along with memory
hotplug only works for virtio-mem.

Memory hotplug through ACPI will only happen when there's only one NUMA
node exposed to the guest, which means the _PXM method won't be needed
at all.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-17 16:08:46 +02:00
Sebastien Boeuf
07f3075773 vmm: device_manager: Tie PCI bus to NUMA node 0
Make sure the unique PCI bus is tied to the default NUMA node 0, and
update the documentation to let the users know about this special case.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-17 16:08:46 +02:00
Fei Li
aa27f0e743 virtio-balloon: add deflate_on_oom support
Sometimes we need balloon deflate automatically to give memory
back to guest, especially for some low priority guest processes
under memory pressure. Enable deflate_on_oom to support this.

Usage: --balloon "size=0,deflate_on_oom=on" \

Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
2021-06-16 09:55:22 +02:00
Sebastien Boeuf
a6fe4aa7e9 virtio-devices, vmm: Update virtio-iommu to rely on VIOT
Since using the VIRTIO configuration to expose the virtual IOMMU
topology has been deprecated, the virtio-iommu implementation must be
updated.

In order to follow the latest patchset that is about to be merged in the
upstream Linux kernel, it must rely on ACPI, and in particular the newly
introduced VIOT table to expose the information about the list of PCI
devices attached to the virtual IOMMU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-15 17:05:59 +02:00
Michael Zhao
1a7a76511f vmm: Enable pty console on AArch64
Allowed syscall "SYS_readlinkat" on AArch64 which is required by pty.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-10 15:05:25 +02:00
Jianyong Wu
b8b5dccfd8 aarch64: Enable UEFI image loading
Implemented an architecture specific function for loading UEFI binary.

Changed the logic of loading kernel image:
1. First try to load the image as kernel in PE format;
2. If failed, try again to load it as formatless UEFI binary.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-06-09 18:36:59 +08:00
Jianyong Wu
6880692a78 vmm, acpi: Add DSM method to ACPI
_DSM (Device Specific Method) is a control method that enables devices
to provide device specific control functions. Linux kernel will evaluate
this device then initialize preserve_config in acpi pci initialization.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-06-09 18:36:59 +08:00
Rob Bradford
3dc15a9259 vmm: tdx: Don't access same locked structure twice
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-06-03 17:29:05 +02:00
Bo Chen
7839e121f6 vmm: Add dirty pages tracked by vm_memory::bitmap to live migration
Live migration currently handles guest memory writes from the guest
through the KVM dirty page tracking and sends those dirty pages to the
destination. This patch augments the live migration support with dirty
page tracking of writes from the VMM to the guest memory(e.g. virtio
devices).

Fixes: #2458

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Bo Chen
2c4fa258a6 virtio-devices, vmm: Deprecate "GuestMemory::with_regions(_mut)"
Function "GuestMemory::with_regions(_mut)" were mainly temporary methods
to access the regions in `GuestMemory` as the lack of iterator-based
access, and hence they are deprecated in the upstream vm-memory crate [1].

[1] https://github.com/rust-vmm/vm-memory/issues/133

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Bo Chen
b5bcdbaf48 misc: Upgrade to use the vm-memory crate w/ dirty-page-tracking
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.

The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Rob Bradford
c357adae44 vmm: tdx: Clear unsupported KVM PV features
This matches with the features that QEMU clears as they are not
supported with TDX.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-06-01 23:00:54 +02:00
Michael Zhao
7f3fa39d81 vmm: Remove enable_interrupt_controller()
After adding "get_interrupt_controller()" function in DeviceManager,
"enable_interrupt_controller()" became redundant, because the latter
one is the a simple wrapper on the interrupt controller.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-01 16:56:43 +01:00
Michael Zhao
9a5f3fc2a7 vmm: Remove "gicr" handling from DeviceManager
The function used to calculate "gicr-typer" value has nothing with
DeviceManager. Now it is moved to AArch64 specific files.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-01 16:56:43 +01:00
Michael Zhao
7932cd22ca vmm: Remove GIC entity set/get from DeviceManager
Moved the set/get functions from vmm::DeviceManager to devices::Gic.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-01 16:56:43 +01:00
Michael Zhao
195eba188a vmm: Split create_gic() from configure_system()
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-01 16:56:43 +01:00
Sebastien Boeuf
e9cc23ea94 virtio-devices: vhost_user: net: Move control queue back
We thought we could move the control queue to the backend as it was
making some good sense. Unfortunately, doing so was a wrong design
decision as it broke the compatibility with OVS-DPDK backend.

This is why this commit moves the control queue back to the VMM side,
meaning an additional thread is being run for handling the communication
with the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-05-26 16:09:32 +01:00
Michael Zhao
ff46fb69d0 aarch64: Fix IRQ number setting for ACPI
On FDT, VMM can allocate IRQ from 0 for devices.
But on ACPI, the lowest range below 32 has to be avoided.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-05-25 10:20:37 +02:00
Henry Wang
efc583c13e acpi: AArch64: Enable PSCI in FADT
This commit enables the PSCI (Power State Coordination Interface)
for the AArch64 platform, which allows the VMM to manage the power
status of the guest. Also, multiple vCPUs can be brought up using
PSCI.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-05-25 10:20:37 +02:00
Henry Wang
d882f8c928 acpi: Implement IORT for AArch64
This commit implements the IO Remapping Table (IORT) for AArch64.
The IORT is one of the required ACPI table for AArch64, since
it describes the GICv3ITS node.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-05-25 10:20:37 +02:00
Henry Wang
213da7d862 acpi: Implement SPCR on AArch64
This commit implements an AArch64-required ACPI table: Serial
Port Console Redirection Table (SPCR). The table provides
information about the configuration and use of the serial port
or non-legacy UART interface.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-05-25 10:20:37 +02:00
Henry Wang
9c5528490e acpi: Implement GTDT on AArch64
This commit implements an AArch64-specific ACPI table: Generic
Timer Description Table (GTDT). The GTDT provides OSPM with
information about a system’s Generic Timers configuration.

The Generic Timer (GT) is a standard timer interface implemented
on ARM processor-based systems.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-05-25 10:20:37 +02:00
Michael Zhao
7bfb51489b acpi: Implement MCFG for AArch64
Added the final PCI bus number in MCFG table. This field is mandatory on
AArch64. On X86 it is optional.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-05-25 10:20:37 +02:00
Michael Zhao
e1ef141112 acpi: Enable DSDT for CpuManager on AArch64
Simplified definition block of CPU's on AArch64. It is not complete yet.
Guest boots. But more is to do in future:
- Fix the error in ACPI definition blocks (seen in boot messages)
- Implement CPU hot-plug controller

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-05-25 10:20:37 +02:00
Michael Zhao
5f27d649a6 acpi: Enable DSDT for DeviceManager on AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-05-25 10:20:37 +02:00
Michael Zhao
cab103d172 acpi: Implement MADT on AArch64
Added following structures in MADT table:
- GICC: GIC CPU interface structure
- GICD: GIC Distributor structure
- GICR: GIC Redistributor Structure
- GICITS: GIC Interrupt Translation Service Structure

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-05-25 10:20:37 +02:00
Rob Bradford
f840327ffb vmm: Version MemoryManager state
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-21 15:29:52 +02:00
Sebastien Boeuf
9a7199a116 virtio-devices: vhost_user: blk: Cleanup device creation
Prepare the device creation so that it can be factorized in a follow up
commit.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-05-21 12:03:54 +02:00
renlei4
65a39f43cb vmm: support restore KVM clock in migration
In migration, vm object is created by new_from_migration with
NULL kvm clock. so vm.set_clock will not be called during vm resume.
If the guest using kvm-clock, the ticks will be stopped after migration.

As clock was already saved to snapshot, add a method to restore it before
vm resume in migration. after that, guest's kvm-clock works well.

Signed-off-by: Ren Lei <ren.lei4@zte.com.cn>
2021-05-20 14:32:49 +01:00
ren lei
324c924ca4 vmm: fix KVM clock lost during restore form snapshot
Connecting a restored KVM clock vm will take long time, as clock
is NOT restored immediately after vm resume from snapshot.

this is because 9ce6c3b incorrectly remove vm_snapshot.clock, and
always pass None to new_from_memory_manager, which will result to
kvm_set_clock() never be called during restore from snapshot.

Fixes: 9ce6c3b
Signed-off-by: Ren Lei <ren.lei4@zte.com.cn>
2021-05-20 11:31:01 +02:00
Sebastien Boeuf
5d2df70a79 virtio-devices: vhost_user: net: Remove control queue
Now that the control queue is correctly handled by the backend, there's
no need to handle it as well from the VMM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-05-19 18:21:47 +02:00
Rob Bradford
7ae631e0c1 vmm: Hypervisor::check_required_extensions() is not x86_64 only
This function is implemented for aarch64 so should be called on it.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-19 17:11:30 +02:00
Rob Bradford
0cf9218d3f hypervisor, vmm: Add default Hypervisor::check_required_extensions()
This allows the removal of KVM specific compile time checks on this
function.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-19 17:11:30 +02:00
Rob Bradford
2439625785 hypervisor: Cleanup unused Hypervisor trait members
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-19 17:11:30 +02:00
Michael Zhao
01acea3102 acpi: Refactor create_acpi_tables()
Split the code of create_acpi_tables() into multiple functions for
easy extension in future.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-05-19 09:59:59 +02:00
Michael Zhao
e4bb6409ae aarch64: Change memory layout for UEFI & ACPI
Before this change, the FDT was loaded at the end of RAM. The address of
FDT was not fixed.
While UEFI (edk2 now) requires fixed address to find FDT and RSDP.
Now the FDT is moved to the beginning of RAM, which is a fixed address.
RSDP is wrote to 2 MiB after FDT, also a fixed address.
Kernel comes 2 MiB after RSDP.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-05-18 23:24:09 +02:00
Rob Bradford
b282ff44d4 vmm: Enhance boot with info!() level messages
These messages are predominantly during the boot process but will also
occur during events such as hotplug.

These cover all the significant steps of the boot and can be helpful for
diagnosing performance and functionality issues during the boot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-18 20:45:38 +02:00
Dayu Liu
8160c2884b docs: Fix some typos in docs and comments
Fix some typos or misspellings without functional change.

Signed-off-by: Dayu Liu <liu.dayu@zte.com.cn>
2021-05-18 17:19:12 +01:00
Rob Bradford
5296bd10c1 vmm: tdx: Reorder TDX ioctls
Separate the population of the memory and the HOB from the TDX
initialisation of the memory so that the latter can happen after the CPU
is initialised.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-17 07:39:52 +02:00
Rob Bradford
496ceed1d0 misc: Remove unnecessary "extern crate"
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-12 17:26:11 +02:00
Rob Bradford
b8f5911c4e misc: Remove unused errors from public interface
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-11 13:37:19 +02:00
Rob Bradford
30b74e74cd vmm: tdx: Reject attempt to use --kernel with --tdx
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-10 15:12:41 +01:00
Sebastien Boeuf
b5c6b04b36 virtio-devices, vmm: vhost: net: Add client mode support
Adding the support for an OVS vhost-user backend to connect as the
vhost-user client. This means we introduce with this patch a new
option to our `--net` parameter. This option is called 'server' in order
to ask the VMM to run as the server for the vhost-user socket.

Fixes #1745

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-05-05 16:05:51 +02:00
Rob Bradford
7e0ccce225 vmm: config: Validate that vCPUs is sufficient for MQ queue count
Fixes: #2563

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-04 19:49:34 +02:00
Rob Bradford
2ad615cd32 vmm: Validate config upon hotplug
Create a temporary copy of the config, add the new device and validate
that. This needs to be done separately to adding it to the config to
avoid race conditions that might be result in config changes being
overwritten.

Fixes: #2564

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-04 19:49:20 +02:00
Rob Bradford
151668b260 vmm: Simplify adding devices to config
To handle that devices are stored in an Option<Vec<T>> and reduce
duplicated code use generic function to add the devices to the the
struct.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-04 19:49:20 +02:00
Rob Bradford
f237789753 vmm: Output config used for booting VM at info!() level
This helps with debugging VM behaviour when the VM is created via an
API.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-04 14:27:04 +01:00
Rob Bradford
da8136e49d arch, vmm: Remove support for LinuxBoot
By supporting just PVH boot on x86-64 we simplify our boot path
substatially.

Fixes: #2231

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-30 16:16:48 +02:00
Mikko Ylinen
3b18caf229 sgx: update virt EPC device path and docs
The latest kvm-sgx code has renamed sgx_virt_epc device node
to sgx_vepc. Update cloud-hypervisor code and documentation to
follow this.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-04-30 16:16:01 +02:00
William Douglas
a2cfe71c0a vmm: Update the http thread's seccomp filter
Because the http thread no longer needs to create the api socket,
remove the socket, bind and listen syscalls from the seccomp filter.

Signed-off-by: William Douglas <william.douglas@intel.com>
2021-04-29 09:44:40 +01:00
William Douglas
b8779ddc9e vmm: Create the api socket fd to pass to the http server
Instead of using the http server's method to have it create the
fd (causing the http thread to need to support the socket, bind and
listen syscalls). Create the socket fd in the vmm thread and use the
http server's new method supporting passing in this fd for the api
socket.

Signed-off-by: William Douglas <william.douglas@intel.com>
2021-04-29 09:44:40 +01:00
William Douglas
767b4f0e59 main: Enable the api-socket to be passed as an fd
To avoid race issues where the api-socket may not be created by the
time a cloud-hypervisor caller is ready to look for it, enable the
caller to pass the api-socket fd directly.

Avoid breaking current callers by allowing the --api-socket path to be
passed as it is now in addition to through the path argument.

Signed-off-by: William Douglas <william.r.douglas@gmail.com>
2021-04-26 14:40:49 -07:00
Rob Bradford
01f0c1e313 vmm: Simplify memory state to support Versionize
In order to support using Versionize for state structures it is necessary
to use simpler, primitive, data types in the state definitions used for
snapshot restore.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-23 14:24:16 +01:00
Sebastien Boeuf
0b00442022 vmm: acpi: Allow reading from B0EJ field
Windows guests read this field upon PCI device ejection. Let's make sure
we don't return an error as this is valid. We simply return an empty u32
since the ejection is done right away upon write access, which means
there's no pending ejection that might be reported to the guest.

Here is the error that was shown during PCI device removal:

ERROR:vmm/src/device_manager.rs:3960 -- Accessing unknown location at
base 0x7ffffee000, offset 0x8

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-04-21 16:11:54 +01:00
Rob Bradford
a7c4483b8b vmm: Directly (de)serialise CpuManager, DeviceManager and MemoryManager state
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-20 18:58:37 +02:00
Bo Chen
78796f96b7 vmm: Refine the granularity of dirty memory tracking
Instead of tracking on a block level of 64 pages, we are now collecting
dirty pages one by one. It improves the efficiency of dirty memory
tracking while live migration.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-04-19 17:17:14 +02:00
Sebastien Boeuf
b92fe648e9 vmm: cpu: Disable KVM_FEATURE_ASYNC_PF_INT in CPUID
By disabling this KVM feature, we prevent the guest from using APF
(Asynchronous Page Fault) mechanism. The kernel has recently switched to
using interrupts to notify about a page being ready, but for some
reasons, this is causing unexpected behavior with Cloud Hypervisor, as
it will make the vcpu thread spin at 100%.

While investigating the issue, it's better to disable the KVM feature to
prevent 100% CPU usage in some cases.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-04-15 10:08:45 +01:00
Wei Liu
810ed7e887 vmm: interrupt: drop unnecessary type from impl
The original code had a generic type E. It was later replaced by a
concrete type. The code should have been simplified when the replacement
happened.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-04-13 14:18:14 +01:00
Rob Bradford
86e4067437 vmm: config: Reject reserved fd from network config
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-13 14:29:18 +02:00
Rob Bradford
e0c0d0e142 vmm: config: Validate network configuration
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-13 14:29:18 +02:00
Rob Bradford
17072e9a6f vmm: seccomp: Add missing SYS_newfstatat
This is used when running on a new libc like Fedora34.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-12 18:02:29 +02:00
Anatol Belski
e1cc702327 memory_manager: Fix address range calculation in MemorySlot
The MCRS method returns a 64-bit memory range descriptor. The
calculation is supposed to be done as follows:

max = min + len - 1

However, every operand is represented not as a QWORD but as combination
of two DWORDs for high and low part. Till now, the calculation was done
this way, please see also inline comments:

max.lo = min.lo + len.lo //this may overflow, need to carry over to high
max.hi = min.hi + len.hi
max.hi = max.hi - 1 // subtraction needs to happen on the low part

This calculation has been corrected the following way:

max.lo = min.lo + len.lo
max.hi = min.hi + len.hi + (max.lo < min.lo) // check for overflow
max.lo = max.lo - 1 // subtract from low part

The relevant part from the generated ASL for the MCRS method:
```
Method (MCRS, 1, Serialized)
{
    Acquire (MLCK, 0xFFFF)
    \_SB.MHPC.MSEL = Arg0
    Name (MR64, ResourceTemplate ()
    {
	QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
	    0x0000000000000000, // Granularity
	    0x0000000000000000, // Range Minimum
	    0xFFFFFFFFFFFFFFFE, // Range Maximum
	    0x0000000000000000, // Translation Offset
	    0xFFFFFFFFFFFFFFFF, // Length
	    ,, _Y00, AddressRangeMemory, TypeStatic)
    })
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL)  // _MIN: Minimum Base Address
    CreateDWordField (MR64, 0x12, MINH)
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL)  // _MAX: Maximum Base Address
    CreateDWordField (MR64, 0x1A, MAXH)
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL)  // _LEN: Length
    CreateDWordField (MR64, 0x2A, LENH)
    MINL = \_SB.MHPC.MHBL
    MINH = \_SB.MHPC.MHBH
    LENL = \_SB.MHPC.MHLL
    LENH = \_SB.MHPC.MHLH
    MAXL = (MINL + LENL) /* \_SB_.MHPC.MCRS.LENL */
    MAXH = (MINH + LENH) /* \_SB_.MHPC.MCRS.LENH */
    If ((MAXL < MINL))
    {
	MAXH += One /* \_SB_.MHPC.MCRS.MAXH */
    }

    MAXL -= One
    Release (MLCK)
    Return (MR64) /* \_SB_.MHPC.MCRS.MR64 */
}
```

Fixes #1800.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2021-04-12 16:20:19 +02:00
Sebastien Boeuf
46f96f27a4 vmm: Add missing syscall for vCPU unplug
clock_nanosleep() is triggered when hot-unplugging a vCPU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-03-31 10:51:52 +01:00
Bo Chen
5d8de62362 vmm: openapi: Add rate_limiter_config to the NetConfig
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-03-30 19:47:43 +02:00
Bo Chen
b176ddfe2a virtio-devices, vmm: Add rate limiter for the TX queue of virtio-net
Partially fixes: #1286

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-03-30 19:47:43 +02:00
Sebastien Boeuf
73e8fd4d72 clippy: Fix codebase to compile with beta toolchain
Fixes the current codebase so that every cargo clippy can be run with
the beta toolchain without any error.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-03-29 15:56:23 +01:00
Anatol Belski
9e9aba7c0b CpuManager: Fix MMIO read handling
There are two parts:

- Unconditionally zero the output area. The length of the incoming
  vector has been seen from 1 to 4 bytes, even though just the first
  byte might need to be handled. But also, this ensures any possibly
  unhandled offset will return zeroed result to the caller. The former
  implementation used an I/O port which seems to behave differently from
  MMIO and wouldn't require explicit output zeroing.
- An access with zero offset still takes place and needs to be handled.

Fixes #2437.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2021-03-29 13:51:31 +01:00
Rob Bradford
431c16dc44 vmm: Use definition of MmioDeviceInfo from arch
Remove duplicated copies from vmm.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-29 12:06:07 +02:00
Gaelan Steele
21506a4a76 vmm: reorder constructor fields to match struct
Satisfies nightly clippy.

Signed-off-by: Gaelan Steele <gbs@canishe.com>
2021-03-29 09:55:29 +02:00
Gaelan Steele
b161a570ec vmm: use Option::map and Option::cloned
It's more concise, more idiomatic Rust, and satisfies nightly clippy.

Signed-off-by: Gaelan Steele <gbs@canishe.com>
2021-03-29 09:55:29 +02:00
Anatol Belski
5b168f54a6 hyperv: Fix CPU hotadd
The following is from the Hyper-V specification v6.0b.

Cpuid leaf 0x40000003 EDX:

Bit 3: Support for physical CPU dynamic partitioning events is
available.

When Windows determines to be running under a hypervisor, it will
require this cpuid bit to be set to support dynamic CPU operations.

Cpuid leaf 0x40000004 EAX:

Bit 5: Recommend using relaxed timing for this partition. If
used, the VM should disable any watchdog timeouts that
rely on the timely delivery of external interrupts.

This bit has been figured out as required after seeing guest BSOD
when CPU hotplug bit is enabled. Race conditions seem to arise after a
hotplug operation, when a system watchdog has expired.

Closes #1799.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2021-03-26 14:06:51 +01:00
Rob Bradford
40da6210f4 aarch64: Address Rust 1.51.0 clippy issue (upper_case_acroynms)
error: name `GPIOInterruptDisabled` contains a capitalized acronym

Error:   --> devices/src/legacy/gpio_pl061.rs:46:5
   |
46 |     GPIOInterruptDisabled,
   |     ^^^^^^^^^^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `GpioInterruptDisabled`
   |
   = note: `-D clippy::upper-case-acronyms` implied by `-D warnings`
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
3c6dfd7709 tdx: Address Rust 1.51.0 clippy issue (upper_case_acroynms)
error: name `FinalizeTDX` contains a capitalized acronym
   --> vmm/src/vm.rs:274:5
    |
274 |     FinalizeTDX(hypervisor::HypervisorVmError),
    |     ^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `FinalizeTdx`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
c5d15fd938 devices: Address Rust 1.51.0 clippy issue (upper_case_acroynms)
warning: name `AcpiPMTimerDevice` contains a capitalized acronym
   --> devices/src/acpi.rs:175:12
    |
175 | pub struct AcpiPMTimerDevice {
    |            ^^^^^^^^^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `AcpiPmTimerDevice`
    |
    = note: `#[warn(clippy::upper_case_acronyms)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
827229d8e4 pci: Address Rust 1.51.0 clippy issue (upper_case_acroynms)
warning: name `IORegion` contains a capitalized acronym
   --> pci/src/configuration.rs:320:5
    |
320 |     IORegion = 0x01,
    |     ^^^^^^^^ help: consider making the acronym lowercase, except the initial letter (notice the capitalization): `IoRegion`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
3b8d1f1411 vmm: Address Rust 1.51.0 clippy issue (vec_init_then_push)
warning: calls to `push` immediately after creation
   --> vmm/src/cpu.rs:630:9
    |
630 | /         let mut cpuid_patches = Vec::new();
631 | |
632 | |         // Patch tsc deadline timer bit
633 | |         cpuid_patches.push(CpuidPatch {
...   |
662 | |             edx_bit: Some(MTRR_EDX_BIT),
663 | |         });
    | |___________^ help: consider using the `vec![]` macro: `let mut cpuid_patches = vec![..];`
    |
    = note: `#[warn(clippy::vec_init_then_push)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#vec_init_then_push

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
9762c8bc28 vmm: Address Rust 1.51.0 clippy issue (upper_case_acroynms)
warning: name `LocalAPIC` contains a capitalized acronym
   --> vmm/src/cpu.rs:197:8
    |
197 | struct LocalAPIC {
    |        ^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `LocalApic`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
7c302373ed vmm: Address Rust 1.51.0 clippy issue (needless_question_mark)
warning: Question mark operator is useless here
   --> vmm/src/seccomp_filters.rs:493:5
    |
493 | /     Ok(SeccompFilter::new(
494 | |         rules.into_iter().collect(),
495 | |         SeccompAction::Trap,
496 | |     )?)
    | |_______^
    |
    = note: `#[warn(clippy::needless_question_mark)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark
help: try
    |
493 |     SeccompFilter::new(
494 |         rules.into_iter().collect(),
495 |         SeccompAction::Trap,
496 |     )
    |

warning: Question mark operator is useless here
   --> vmm/src/seccomp_filters.rs:507:5
    |
507 | /     Ok(SeccompFilter::new(
508 | |         rules.into_iter().collect(),
509 | |         SeccompAction::Log,
510 | |     )?)
    | |_______^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark
help: try
    |
507 |     SeccompFilter::new(
508 |         rules.into_iter().collect(),
509 |         SeccompAction::Log,
510 |     )
    |

warning: Question mark operator is useless here
   --> vmm/src/vm.rs:887:9
    |
887 |         Ok(CString::new(cmdline).map_err(Error::CmdLineCString)?)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `CString::new(cmdline).map_err(Error::CmdLineCString)`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00