1342 Commits

Author SHA1 Message Date
Muminul Islam
3937e03c02 vmm, virtio-devices: Extend mshv feature
Some seccomp rules in virtio-devices are needed for MSHV but not for
KVM. Add those rules only behind the MSHV feature guard.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-15 11:05:11 -07:00
Sebastien Boeuf
d68c388cac vmm: Update seccomp filters for HTTP thread
The micro-http crate now uses recvmsg() syscall in order to receive file
descriptors through control messages. This means the syscall must be
part of the authorized list in the seccomp filters.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-15 08:13:48 +00:00
Wei Liu
39bc444db4 vmm, vm-device: make use of the kvm feature gate in vfio-ioctls
The vfio-ioctls crate now contains a KVM feature gate. Make use of it in
Cloud Hypervisor.

That crate has two users. For the vmm crate it is straightforward. For
the vm-device crate, we introduce a KVM feature gate as well so that the
vmm crate can pass on the configuration.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-15 09:35:51 +02:00
Sebastien Boeuf
6b710209b1 numa: Add optional sgx_epc_sections field to NumaConfig
This new option allows the user to define a list of SGX EPC sections
attached to a specific NUMA node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-09 14:45:30 +02:00
Sebastien Boeuf
9aedabe11e sgx: Add mandatory id field to SgxEpcConfig
In order to uniquely identify each SGX EPC section, we introduce a
mandatory option `id` to the `--sgx-epc` parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-09 14:45:30 +02:00
dependabot[bot]
5effa20a5b build: bump libc from 0.2.97 to 0.2.98
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.97 to 0.2.98.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.97...0.2.98)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-08 04:15:16 +00:00
Sebastien Boeuf
17c99ae00a vmm: Enable provisioning for SGX guest
The guest can see that SGX supports provisioning as it is exposed
through the CPUID. This patch enables the proper backing of this
feature by having the host open the provisioning device and enable
this capability through the hypervisor.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-07 14:56:38 +02:00
Sebastien Boeuf
5b6d424a77 arch, vmm: Fix TDVF section handling
This patch fixes a few things to support TDVF correctly.

The HOB memory resources must contain EFI_RESOURCE_ATTRIBUTE_ENCRYPTED
attribute.

Any section with a base address within the already allocated guest RAM
must not be allocated.

The list of TD_HOB memory resources should contain both TempMem and
TdHob sections as well.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-06 11:47:43 +02:00
Henry Wang
4da3bdcd6e vmm: Split restore device_manager and devices
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-07-05 22:51:56 +02:00
Henry Wang
95ca4fb15e vmm: vm: Enable snapshot/restore of GICv3ITS
This commit enables the snapshot/restore of GICv3ITS in the process
of VM snapshot/restore.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-07-05 22:51:56 +02:00
Wei Liu
1f2915bff0 vmm: hypervisor: split set_user_memory_region to two functions
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.

MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.

Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.

This fixes user memory region removal on MSHV.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:45:45 +02:00
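
The split described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Region`, `MemoryOps`, `SlotBackend` and `RangeBackend` are hypothetical names, not the types of the Cloud Hypervisor `hypervisor` crate, and the backends only model bookkeeping rather than issuing real ioctls.

```rust
use std::collections::HashMap;

/// Hypothetical description of a guest memory region (not the real crate type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Region {
    pub slot: u32,
    pub guest_addr: u64,
    pub size: u64,
}

/// Hypothetical trait with separate create and remove entry points.
pub trait MemoryOps {
    fn create_user_memory_region(&mut self, region: Region) -> Result<(), String>;
    fn remove_user_memory_region(&mut self, region: Region) -> Result<(), String>;
}

/// Slot-based backend (KVM-like): both calls funnel into a single "set"
/// call, where a size of 0 means "remove the slot".
#[derive(Default)]
pub struct SlotBackend {
    slots: HashMap<u32, Region>,
}

impl SlotBackend {
    fn set_user_memory_region(&mut self, region: Region) -> Result<(), String> {
        if region.size == 0 {
            self.slots.remove(&region.slot);
        } else {
            self.slots.insert(region.slot, region);
        }
        Ok(())
    }
}

impl MemoryOps for SlotBackend {
    fn create_user_memory_region(&mut self, region: Region) -> Result<(), String> {
        self.set_user_memory_region(region)
    }

    fn remove_user_memory_region(&mut self, mut region: Region) -> Result<(), String> {
        // Removal is expressed as a zero-sized region for the same slot.
        region.size = 0;
        self.set_user_memory_region(region)
    }
}

/// Range-based backend (MSHV-like): not slot based, so removal needs the
/// real address and size of the region.
#[derive(Default)]
pub struct RangeBackend {
    ranges: Vec<(u64, u64)>,
}

impl MemoryOps for RangeBackend {
    fn create_user_memory_region(&mut self, region: Region) -> Result<(), String> {
        self.ranges.push((region.guest_addr, region.size));
        Ok(())
    }

    fn remove_user_memory_region(&mut self, region: Region) -> Result<(), String> {
        let before = self.ranges.len();
        self.ranges
            .retain(|&r| r != (region.guest_addr, region.size));
        if self.ranges.len() == before {
            Err("no matching region".to_string())
        } else {
            Ok(())
        }
    }
}
```

With this shape, a caller that passed a zero size to signal removal would fail on the range-based backend, which matches the bug the commit message describes.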
Wei Liu
71bbaf556f vmm: seccomp: add seccomp rules for MSHV
Add a minimum set of rules that allow Cloud Hypervisor to run Linux on
top of Microsoft Hypervisor.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:44:02 +02:00
Wei Liu
8819bb0f21 vmm: seccomp: make use of KVM feature
The to-be-introduced MSHV rules don't need to contain KVM rules and vice
versa.

Put KVM constants into a module. This avoids the warnings about
dead code in the future.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:44:02 +02:00
Henry Wang
054c036e81 vmm: acpi: Add AArch64 vCPUs to SRAT table
This commit introduces the `ProcessorGiccAffinity` struct for the
AArch64 platform. This struct will be created and included into
the SRAT table to enable AArch64 NUMA setup.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-06-25 10:22:40 +01:00
Michael Zhao
239e39ddbc vmm: Fix clippy warnings on AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-24 08:59:53 -07:00
Bo Chen
5768dcc320 vmm: Refactor slightly vm_boot and 'control_loop'
It ensures all handlers for `ApiRequest` in `control_loop` are
consistent and minimal, and should read better.

No functional changes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 16:01:39 +02:00
Bo Chen
1075209e2a vmm: Handle ApiRequest::VmCreate in a separate function
It slightly simplifies `Vmm::control_loop` and reads better by being
consistent with other `ApiRequest` handlers. Also, it removes the
repetitive `ApiError::VmAlreadyCreated` and makes `ApiError::VmCreate`
useful.

No functional changes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 16:01:39 +02:00
Michael Zhao
3613b4c096 aarch64: Enable default build option
We have been building Cloud Hypervisor with a command like:
`cargo build --no-default-features --features ...`.

After implementing ACPI, we do not have to specify all features
explicitly. The default build command `cargo build` now works.

This commit fixes some build warnings with the default build option and
updates the GitHub workflow accordingly.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-24 13:13:27 +01:00
Bo Chen
c4be0f4235 clippy: Address the issue 'needless-collect'
error: avoid using `collect()` when not needed
   --> vmm/src/vm.rs:630:86
    |
630 |             let node_id_list: Vec<u32> = configs.iter().map(|cfg| cfg.guest_numa_id).collect();
    |                                                                                      ^^^^^^^
...
664 |                         if !node_id_list.contains(&dest) {
    |                             ---------------------------- the iterator could be used here instead
    |
    = note: `-D clippy::needless-collect` implied by `-D warnings`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_collect

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 08:55:43 +02:00
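
For readers unfamiliar with the `needless-collect` lint quoted above, here is a minimal sketch of the flagged pattern and one possible rewrite. `NumaConfig` is a hypothetical stand-in that models only the `guest_numa_id` field seen in the lint output, and the rewrite shown is not necessarily the one applied in the commit.

```rust
/// Hypothetical stand-in for a NUMA config entry; only the field that
/// appears in the lint output is modeled.
pub struct NumaConfig {
    pub guest_numa_id: u32,
}

/// Pattern flagged by the lint: a temporary Vec is built only so that
/// `contains` can be called on it.
pub fn has_node_collect(configs: &[NumaConfig], dest: u32) -> bool {
    let node_id_list: Vec<u32> = configs.iter().map(|cfg| cfg.guest_numa_id).collect();
    node_id_list.contains(&dest)
}

/// Lint-friendly variant: query the iterator directly, without the
/// intermediate allocation.
pub fn has_node_iter(configs: &[NumaConfig], dest: u32) -> bool {
    configs.iter().any(|cfg| cfg.guest_numa_id == dest)
}
```

Both functions return the same answer; the second simply skips the intermediate `Vec`.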
Bo Chen
5825ab2dd4 clippy: Address the issue 'needless-borrow'
Issue from beta version of clippy:

Error:    --> vm-virtio/src/queue.rs:700:59
    |
700 |             if let Some(used_event) = self.get_used_event(&mem) {
    |                                                           ^^^^ help: change this to: `mem`
    |
    = note: `-D clippy::needless-borrow` implied by `-D warnings`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 08:55:43 +02:00
Bo Chen
585269ecb9 clippy: Address the issue 'field is never read'
Issue from beta version of clippy:

error: field is never read: `type`
   --> vmm/src/cpu.rs:235:5
    |
235 |     pub r#type: u8,
    |     ^^^^^^^^^^^^^^
    |
    = note: `-D dead-code` implied by `-D warnings`

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-24 08:55:43 +02:00
Rob Bradford
4d25eaa24a vmm: Add I/O port range to PCI bus resources
The Linux kernel expects that any PCI devices that advertise I/O BARs
use an address that is within the range advertised by the bus
itself. Unfortunately we were not advertising any I/O ports associated
with the PCI bus in the ACPI tables.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-06-23 16:48:52 +01:00
Rob Bradford
b56e1217b6 vmm: tdx: Add KVM_FEATURE_STEAL_TIME_BIT to filtered bits
Filter out the KVM_FEATURE_STEAL_TIME_BIT when running with TDX.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-06-18 15:54:10 +01:00
Sebastien Boeuf
a36ac96444 vmm: cpu_manager: Add _PXM ACPI method to each vCPU
In order to allow a hotplugged vCPU to be assigned to the correct NUMA
node in the guest, the DSDT table must expose the _PXM method for each
vCPU. This method defines the proximity domain to which each vCPU should
be attached.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-17 16:08:46 +02:00
Sebastien Boeuf
09c3ddd47d vmm: memory_manager: Remove _PXM from ACPI memory slot
The _PXM method always returns 0, which is wrong since the SRAT might
say otherwise. The point of the _PXM method is to be evaluated by the
guest OS when some new memory slot is being plugged, but this will never
happen for Cloud Hypervisor since using NUMA nodes along with memory
hotplug only works for virtio-mem.

Memory hotplug through ACPI will only happen when there's only one NUMA
node exposed to the guest, which means the _PXM method won't be needed
at all.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-17 16:08:46 +02:00
Sebastien Boeuf
07f3075773 vmm: device_manager: Tie PCI bus to NUMA node 0
Make sure the unique PCI bus is tied to the default NUMA node 0, and
update the documentation to let the users know about this special case.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-17 16:08:46 +02:00
Fei Li
aa27f0e743 virtio-balloon: add deflate_on_oom support
Sometimes we need the balloon to deflate automatically to give memory
back to the guest, especially for some low priority guest processes
under memory pressure. Enable deflate_on_oom to support this.

Usage: --balloon "size=0,deflate_on_oom=on" \

Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
2021-06-16 09:55:22 +02:00
Sebastien Boeuf
a6fe4aa7e9 virtio-devices, vmm: Update virtio-iommu to rely on VIOT
Since using the VIRTIO configuration to expose the virtual IOMMU
topology has been deprecated, the virtio-iommu implementation must be
updated.

In order to follow the latest patchset that is about to be merged in the
upstream Linux kernel, it must rely on ACPI, and in particular the newly
introduced VIOT table to expose the information about the list of PCI
devices attached to the virtual IOMMU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-15 17:05:59 +02:00
dependabot[bot]
428c637506 build: bump libc from 0.2.96 to 0.2.97
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.96 to 0.2.97.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.96...0.2.97)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-14 09:50:38 +00:00
Michael Zhao
1a7a76511f vmm: Enable pty console on AArch64
Allowed the "SYS_readlinkat" syscall on AArch64, which is required by pty.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-10 15:05:25 +02:00
Jianyong Wu
b8b5dccfd8 aarch64: Enable UEFI image loading
Implemented an architecture-specific function for loading a UEFI binary.

Changed the logic of loading the kernel image:
1. First try to load the image as a kernel in PE format;
2. If that fails, try again to load it as a formatless UEFI binary.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-06-09 18:36:59 +08:00
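
The two-step loading order described in the commit above can be sketched as below. This is an illustration only: `LoadedImage`, `load_pe_kernel`, `load_raw_uefi` and `load_image` are hypothetical names, and the header check (the arm64 Linux `Image` magic `ARM\x64` at byte offset 56) is used here as a simple stand-in for the real PE loader.

```rust
/// Hypothetical result type recording which loader accepted the image.
#[derive(Debug, PartialEq)]
pub enum LoadedImage {
    PeKernel,
    RawUefi,
}

/// Offset and value of the arm64 Linux Image header magic.
const ARM64_MAGIC_OFFSET: usize = 56;
const ARM64_MAGIC: [u8; 4] = *b"ARM\x64";

/// Stand-in for the PE kernel loader: accept the image only if the
/// header magic is present.
fn load_pe_kernel(image: &[u8]) -> Result<LoadedImage, String> {
    match image.get(ARM64_MAGIC_OFFSET..ARM64_MAGIC_OFFSET + 4) {
        Some(magic) if magic == &ARM64_MAGIC[..] => Ok(LoadedImage::PeKernel),
        _ => Err("not a PE kernel image".to_string()),
    }
}

/// Stand-in for the formatless UEFI loader: any non-empty blob is accepted.
fn load_raw_uefi(image: &[u8]) -> Result<LoadedImage, String> {
    if image.is_empty() {
        Err("empty image".to_string())
    } else {
        Ok(LoadedImage::RawUefi)
    }
}

/// Try the PE kernel loader first, then fall back to the raw UEFI loader.
pub fn load_image(image: &[u8]) -> Result<LoadedImage, String> {
    load_pe_kernel(image).or_else(|_| load_raw_uefi(image))
}
```

The fallback only runs when the first loader returns an error, so a valid kernel image is never reinterpreted as a raw binary.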
Jianyong Wu
6880692a78 vmm, acpi: Add DSM method to ACPI
_DSM (Device Specific Method) is a control method that enables devices
to provide device specific control functions. The Linux kernel evaluates
this method and then initializes preserve_config during ACPI PCI initialization.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2021-06-09 18:36:59 +08:00
dependabot[bot]
c866ccb3a1 build: bump libc from 0.2.95 to 0.2.96
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.95 to 0.2.96.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.95...0.2.96)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-09 07:27:37 +00:00
Rob Bradford
3dc15a9259 vmm: tdx: Don't access same locked structure twice
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-06-03 17:29:05 +02:00
Bo Chen
7839e121f6 vmm: Add dirty pages tracked by vm_memory::bitmap to live migration
Live migration currently handles guest memory writes from the guest
through the KVM dirty page tracking and sends those dirty pages to the
destination. This patch augments the live migration support with dirty
page tracking of writes from the VMM to the guest memory (e.g. virtio
devices).

Fixes: #2458

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
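
As a rough illustration of combining the two dirty-page sources named in the commit above, the sketch below merges two bitmaps with a bitwise OR. It assumes one bit per page, 64 pages per `u64` word, and bitmaps of equal length; the function names are hypothetical and this is not the Cloud Hypervisor or vm-memory API.

```rust
/// Merge two dirty bitmaps (one bit per page, 64 pages per word).
/// A page is dirty if either the hypervisor or the VMM wrote to it.
/// Assumes both slices have the same length; extra words are ignored.
pub fn merge_dirty_bitmaps(hypervisor: &[u64], vmm: &[u64]) -> Vec<u64> {
    hypervisor
        .iter()
        .zip(vmm.iter())
        .map(|(a, b)| a | b)
        .collect()
}

/// List the indices of the dirty pages recorded in a bitmap.
pub fn dirty_pages(bitmap: &[u64]) -> Vec<usize> {
    let mut pages = Vec::new();
    for (word_idx, word) in bitmap.iter().enumerate() {
        for bit in 0..64 {
            if word & (1u64 << bit) != 0 {
                pages.push(word_idx * 64 + bit);
            }
        }
    }
    pages
}
```

The merged bitmap is what a migration loop would walk to decide which pages to resend.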
Bo Chen
2c4fa258a6 virtio-devices, vmm: Deprecate "GuestMemory::with_regions(_mut)"
The "GuestMemory::with_regions(_mut)" functions were mainly temporary
methods to access the regions in `GuestMemory` due to the lack of
iterator-based access, and hence they are deprecated in the upstream
vm-memory crate [1].

[1] https://github.com/rust-vmm/vm-memory/issues/133

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Bo Chen
b5bcdbaf48 misc: Upgrade to use the vm-memory crate w/ dirty-page-tracking
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs, which now take an additional generic type
parameter to specify what 'bitmap backend' is used.

The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
dependabot[bot]
f3d3d8daed build: bump signal-hook from 0.3.8 to 0.3.9
Bumps [signal-hook](https://github.com/vorner/signal-hook) from 0.3.8 to 0.3.9.
- [Release notes](https://github.com/vorner/signal-hook/releases)
- [Changelog](https://github.com/vorner/signal-hook/blob/master/CHANGELOG.md)
- [Commits](https://github.com/vorner/signal-hook/compare/v0.3.8...v0.3.9)

---
updated-dependencies:
- dependency-name: signal-hook
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-02 07:48:03 +00:00
Rob Bradford
c357adae44 vmm: tdx: Clear unsupported KVM PV features
This matches with the features that QEMU clears as they are not
supported with TDX.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-06-01 23:00:54 +02:00
Michael Zhao
7f3fa39d81 vmm: Remove enable_interrupt_controller()
After adding "get_interrupt_controller()" function in DeviceManager,
"enable_interrupt_controller()" became redundant, because the latter
is a simple wrapper on the interrupt controller.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-01 16:56:43 +01:00
Michael Zhao
9a5f3fc2a7 vmm: Remove "gicr" handling from DeviceManager
The function used to calculate the "gicr-typer" value has nothing to do with
DeviceManager. Now it is moved to AArch64 specific files.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-01 16:56:43 +01:00
Michael Zhao
7932cd22ca vmm: Remove GIC entity set/get from DeviceManager
Moved the set/get functions from vmm::DeviceManager to devices::Gic.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-01 16:56:43 +01:00
Michael Zhao
195eba188a vmm: Split create_gic() from configure_system()
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-01 16:56:43 +01:00
Sebastien Boeuf
e9cc23ea94 virtio-devices: vhost_user: net: Move control queue back
We thought we could move the control queue to the backend as it
seemed to make sense. Unfortunately, doing so was the wrong design
decision as it broke compatibility with the OVS-DPDK backend.

This is why this commit moves the control queue back to the VMM side,
meaning an additional thread is being run for handling the communication
with the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-05-26 16:09:32 +01:00
dependabot[bot]
6c245f6cf1 build: Manual seccomp bump
Seccomp needs to be bumped in the main tree and fuzz at the same time.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-26 14:39:43 +02:00
dependabot[bot]
dd92715ed2 build: Bump libc from 0.2.94 to 0.2.95
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.94 to 0.2.95.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.94...0.2.95)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-26 07:18:40 +00:00
Michael Zhao
ff46fb69d0 aarch64: Fix IRQ number setting for ACPI
On FDT, VMM can allocate IRQ from 0 for devices.
But on ACPI, the lowest range below 32 has to be avoided.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-05-25 10:20:37 +02:00
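
The IRQ numbering rule in the commit above can be sketched with a small allocator. `IrqAllocator` and `ACPI_IRQ_BASE` are hypothetical names. The base of 32 follows the commit message; one plausible reading is that on a GIC, interrupt IDs 0-15 are SGIs and 16-31 are PPIs, so shared peripheral interrupts start at 32.

```rust
/// Lowest IRQ number handed out when the guest is described through ACPI.
/// On a GIC, IDs 0-15 are SGIs and 16-31 are PPIs.
const ACPI_IRQ_BASE: u32 = 32;

/// Hypothetical sequential IRQ allocator.
pub struct IrqAllocator {
    next: u32,
}

impl IrqAllocator {
    /// With FDT, allocation starts at 0; with ACPI, the range below 32 is skipped.
    pub fn new(use_acpi: bool) -> Self {
        let base = if use_acpi { ACPI_IRQ_BASE } else { 0 };
        IrqAllocator { next: base }
    }

    /// Return the next free IRQ number.
    pub fn allocate(&mut self) -> u32 {
        let irq = self.next;
        self.next += 1;
        irq
    }
}
```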
Henry Wang
efc583c13e acpi: AArch64: Enable PSCI in FADT
This commit enables the PSCI (Power State Coordination Interface)
for the AArch64 platform, which allows the VMM to manage the power
status of the guest. Also, multiple vCPUs can be brought up using
PSCI.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-05-25 10:20:37 +02:00
Henry Wang
d882f8c928 acpi: Implement IORT for AArch64
This commit implements the IO Remapping Table (IORT) for AArch64.
The IORT is one of the required ACPI tables for AArch64, since
it describes the GICv3ITS node.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-05-25 10:20:37 +02:00
Henry Wang
213da7d862 acpi: Implement SPCR on AArch64
This commit implements an AArch64-required ACPI table: Serial
Port Console Redirection Table (SPCR). The table provides
information about the configuration and use of the serial port
or non-legacy UART interface.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2021-05-25 10:20:37 +02:00