2161 Commits

Author SHA1 Message Date
Sebastien Boeuf
a6959a7469 vmm: Move DeviceManager to new restore design
Based on all the work that has already been merged, it is now possible
to fully move DeviceManager out of the previous restore model, meaning
there's no need for a dedicated restore() function to be implemented
there.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 13:46:31 +01:00
Sebastien Boeuf
4487c8376b vmm: Move CpuManager and Vcpu to the new restore design
Every Vcpu is now created with the right state if there's an available
snapshot associated with it. This simplifies the restore logic.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 09:27:00 +01:00
Sebastien Boeuf
b62a40efae virtio-devices, vmm: Always restore virtio devices in paused state
Following the new restore design, it is not appropriate to put every
virtio device thread into a paused state after it has been started.

This is why we remove the line of code pausing the devices only after
they've been restored, and replace it with a small patch in every virtio
device implementation. When a virtio device is created as part of a
restored VM, the associated "paused" boolean is set to true. This
ensures the corresponding thread is parked as soon as it starts, so the
thread is never in a different state from the one it was in on the
source VM during the snapshot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-12-01 09:27:00 +01:00
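
The "paused" boolean described in the commit above can be illustrated with a minimal sketch. This is not the cloud-hypervisor code: `Worker`, `spawn_worker` and `resume` are hypothetical names, and the sketch only assumes that a worker thread checks a shared flag and parks itself before doing any work.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a virtio device worker thread.
struct Worker {
    paused: Arc<AtomicBool>,
    handle: JoinHandle<u32>,
}

/// Spawn the worker. `restored` mirrors "created as part of a restored VM":
/// when true, the paused flag is set before the thread ever runs.
fn spawn_worker(restored: bool) -> Worker {
    let paused = Arc::new(AtomicBool::new(restored));
    let flag = Arc::clone(&paused);
    let handle = thread::spawn(move || {
        // Park right away while the flag is set. The loop guards against
        // spurious wakeups from `park()`.
        while flag.load(Ordering::SeqCst) {
            thread::park();
        }
        // Placeholder for the real event loop.
        42
    });
    Worker { paused, handle }
}

impl Worker {
    /// Clear the flag first, then wake the parked thread.
    fn resume(&self) {
        self.paused.store(false, Ordering::SeqCst);
        self.handle.thread().unpark();
    }
}

fn main() {
    let worker = spawn_worker(true);
    worker.resume();
    println!("worker returned {}", worker.handle.join().unwrap());
}
```

Setting the flag at construction time means there is no window in which a restored thread runs before being paused, which is the property the commit message is after.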
Bo Chen
ec94ae31ee vmm: EpollContext: Allow to add custom epoll events for fuzzing
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-30 12:13:14 +00:00
Sebastien Boeuf
90b5014a50 vmm: device_manager: Remove 'restoring' attribute
Given 'restoring' isn't needed anymore from the DeviceManager structure,
let's simplify it.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Sebastien Boeuf
cc3706afe1 pci, vmm: Move VfioPciDevice and VfioUserPciDevice to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-29 13:46:30 +01:00
Rob Bradford
6f8bd27cf7 build: Bulk update dependencies
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-28 16:57:49 +00:00
Sebastien Boeuf
81862e8ed3 devices, vmm: Move Gpio to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
9fbf52b998 devices, vmm: Move Pl011 to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
0bd910e8b0 devices, vmm: Move Serial to new restore design
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:37:29 +00:00
Sebastien Boeuf
ef92e55998 devices, vmm: Move Ioapic to new restore design
Moving the Ioapic object to the new restore design, meaning the Ioapic
is created directly with the right state, and it shares the same
codepath as when it's created from scratch.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-25 17:18:21 +01:00
Sebastien Boeuf
a50b3784fe virtio-devices: Create a proper result type for VirtioPciDevice
Creating a dedicated Result type for VirtioPciDevice, associated with
the new VirtioPciDeviceError enum. This allows for a clearer handling of
the errors generated through VirtioPciDevice::new().

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
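
A dedicated error enum paired with a Result alias, as described in the commit above, might look roughly like the sketch below. Only the names `VirtioPciDevice`, `VirtioPciDeviceError` and `VirtioPciDevice::new()` come from the commit message; the variants, the constructor parameters, the alias name `VirtioPciDeviceResult` and the "one vector per queue plus one" rule are assumptions made for illustration.

```rust
use std::fmt;

/// Error enum named after the commit; the variants are hypothetical.
#[derive(Debug, PartialEq)]
enum VirtioPciDeviceError {
    NoQueues,
    NotEnoughMsixVectors { needed: u16, available: u16 },
}

impl fmt::Display for VirtioPciDeviceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoQueues => write!(f, "device exposes no queues"),
            Self::NotEnoughMsixVectors { needed, available } => {
                write!(f, "need {needed} MSI-X vectors, only {available} available")
            }
        }
    }
}

impl std::error::Error for VirtioPciDeviceError {}

/// Dedicated result alias (hypothetical name, to avoid shadowing std's Result).
type VirtioPciDeviceResult<T> = Result<T, VirtioPciDeviceError>;

#[derive(Debug)]
struct VirtioPciDevice {
    id: String,
    msix_vectors: u16,
}

impl VirtioPciDevice {
    /// Hypothetical constructor: every failure maps to a specific variant.
    fn new(id: &str, num_queues: u16, available_vectors: u16) -> VirtioPciDeviceResult<Self> {
        if num_queues == 0 {
            return Err(VirtioPciDeviceError::NoQueues);
        }
        // Assumption for the sketch: one vector per queue plus one for config.
        let needed = num_queues.saturating_add(1);
        if available_vectors < needed {
            return Err(VirtioPciDeviceError::NotEnoughMsixVectors {
                needed,
                available: available_vectors,
            });
        }
        Ok(Self { id: id.to_string(), msix_vectors: needed })
    }
}

fn main() {
    match VirtioPciDevice::new("_disk0", 2, 1) {
        Ok(dev) => println!("created {} with {} vectors", dev.id, dev.msix_vectors),
        Err(e) => println!("failed to create device: {e}"),
    }
}
```

Callers can then match on the specific variant instead of inspecting a generic error.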
Sebastien Boeuf
eae8043890 pci, virtio-devices: Move VirtioPciDevice to the new restore design
The code for restoring a VirtioPciDevice has been updated, including the
dependencies VirtioPciCommonConfig, MsixConfig and PciConfiguration.

It's important to note that both PciConfiguration and MsixConfig still
have restore() implementations because Vfio and VfioUser devices still
rely on the old way for restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 18:37:40 +00:00
Michael Zhao
7d16c74020 vmm: Refactor AArch64 GIC initialization process
In the new process, `device::Gic::new()` covers additional actions:
1. Creating `hypervisor::vGic`
2. Initializing interrupt routings

The change makes the vGic device ready at the beginning of
`DeviceManager::create_devices()`. This can unblock the initialization of
GIC-related devices in the `DeviceManager`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-11-23 11:49:57 +01:00
Sebastien Boeuf
86e7f07485 vmm: cpu: Create vCPUs before the DeviceManager
Moving the creation of the vCPUs before the DeviceManager gets created
will allow the aarch64 vGIC to be created before the DeviceManager
as well in a follow-up patch. The end goal is to adopt the same
creation sequence for both x86_64 and aarch64, keeping in mind that
the vGIC requires every vCPU to be created.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 11:49:57 +01:00
Sebastien Boeuf
578780ed0c vmm: cpu: Split vCPU creation
Split the vCPU creation into two distinct parts. On the one hand we
create the actual Vcpu object with the creation of the hypervisor::Vcpu.
And on the other hand, we configure the existing Vcpu, setting registers
to proper values (such as setting the entry point).

This will allow for further work to move the creation earlier in the
boot, so that the hypervisor::Vcpu will be already created when the
DeviceManager gets created.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-11-23 11:49:57 +01:00
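
The two-phase vCPU setup described in the commit above can be sketched as follows. The `Vcpu` and `CpuManager` types here are simplified, hypothetical stand-ins (no hypervisor handle, a single `entry_point` field), and the method names `create_vcpus` and `configure_vcpus` are assumptions chosen for the example.

```rust
/// Simplified stand-in for a vCPU: created first, configured later.
#[derive(Debug)]
struct Vcpu {
    id: u8,
    entry_point: Option<u64>,
}

impl Vcpu {
    /// Phase 1: create the object (the real code would also create the
    /// hypervisor-side vCPU here).
    fn new(id: u8) -> Self {
        Vcpu { id, entry_point: None }
    }

    /// Phase 2: set registers, reduced here to recording the entry point.
    fn configure(&mut self, entry_point: u64) {
        self.entry_point = Some(entry_point);
    }
}

struct CpuManager {
    vcpus: Vec<Vcpu>,
}

impl CpuManager {
    fn new() -> Self {
        CpuManager { vcpus: Vec::new() }
    }

    /// Can run early in boot, before any device exists.
    fn create_vcpus(&mut self, count: u8) {
        self.vcpus = (0..count).map(Vcpu::new).collect();
    }

    /// Runs later, once the entry point is known.
    fn configure_vcpus(&mut self, entry_point: u64) {
        for vcpu in &mut self.vcpus {
            vcpu.configure(entry_point);
        }
    }
}

fn main() {
    let mut manager = CpuManager::new();
    manager.create_vcpus(2);
    // Device setup that needs existing vCPUs would happen here.
    manager.configure_vcpus(0x10_0000);
    for vcpu in &manager.vcpus {
        println!("vcpu {} entry point {:?}", vcpu.id, vcpu.entry_point);
    }
}
```

Separating the phases lets other components depend on the vCPUs existing without depending on their final register state.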
Sebastien Boeuf
ec01062ada vmm: Switch order between DeviceManager and CpuManager creation
The CpuManager is now created before the DeviceManager. This is required
as preliminary work for creating the vCPUs before the DeviceManager,
which is required to ensure both x86_64 and aarch64 follow the same
sequence.

It's important to note the optimization for faster PIO accesses on the
PCI config space had to be removed given the VmOps was required by the
CpuManager and by the Vcpu by extension. But given the PciConfigIo is
created as part of the DeviceManager, there was no proper way of moving
things around so that we could provide PciConfigIo early enough.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-11-23 11:49:57 +01:00
Rob Bradford
e37ec26ccf vmm: Remove PCI PIO optimisation
This optimisation provided some performance improvement when measured by
perf. However, when considered in terms of boot time performance, this
optimisation doesn't have any impact measurable using our
performance-metrics tooling.

Removing this optimisation helps simplify the VMM internals as it allows
the reordering of the VM creation process permitting refactoring of the
restore code path.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-22 19:47:53 +00:00
Wei Liu
d05586f520 vmm: modify or provide safety comments
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-11-18 12:50:01 +00:00
Wei Liu
d274fe9cb8 vmm: fix tdx check
The field has been moved in 3793ffe888db.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-11-18 12:50:01 +00:00
Praveen K Paladugu
09e79a5e9b vmm: Add tpm device to mmio bus
Add tpm device to mmio bus if appropriate cmdline arguments were
passed.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2022-11-15 16:42:21 +00:00
Praveen K Paladugu
af261f231c vmm: Add required acpi entries for vtpm device
Add a TPM2 entry to the DSDT ACPI table. Add a TPM2 table to the guest's ACPI tables.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: Sean Yoo <t-seanyoo@microsoft.com>
2022-11-15 16:42:21 +00:00
Praveen K Paladugu
7122e2989c vmm: Add tpm parameter
Add an optional --tpm parameter that takes a UNIX domain
socket from swtpm.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2022-11-15 16:42:21 +00:00
dependabot[bot]
f93aa42319 build: Bump once_cell from 1.15.0 to 1.16.0
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.15.0 to 1.16.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.15.0...v1.16.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-12 08:30:12 +00:00
Rob Bradford
6230929d51 openapi: Add thp option to MemoryConfig
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-09 16:51:21 +00:00
Rob Bradford
f603afc46e vmm: Make Transparent Huge Pages controllable (default on)
Add MemoryConfig::thp and `--memory thp=on|off` to allow control of
Transparent Huge Pages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-09 16:51:21 +00:00
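
As an illustration of the `thp=on|off` toggle with its default of "on", here is a minimal sketch of how such a key could be read from a comma-separated `--memory` value. The `parse_thp` function is hypothetical and is not the parser used by cloud-hypervisor; only the `thp=on|off` syntax and the default come from the commit message.

```rust
/// Hypothetical helper: extract the `thp` setting from a `--memory` value
/// such as "size=1G,thp=off". Defaults to `true` when the key is absent.
fn parse_thp(memory_arg: &str) -> Result<bool, String> {
    let mut thp = true; // default on, as stated in the commit subject
    for item in memory_arg.split(',') {
        if let Some(value) = item.strip_prefix("thp=") {
            thp = match value {
                "on" => true,
                "off" => false,
                other => return Err(format!("invalid thp value: {other}")),
            };
        }
    }
    Ok(thp)
}

fn main() {
    for arg in ["size=1G", "size=1G,thp=off", "size=1G,thp=maybe"] {
        println!("{arg} -> {:?}", parse_thp(arg));
    }
}
```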
Rob Bradford
b68add2d0d vmm: Enable THP when using anonymous memory
If the memory is not backed by a file then it is possible to enable
Transparent Huge Pages on the memory and take advantage of the benefits
of huge pages without requiring the specific allocation of an appropriate
number of huge pages.

TEST=Boot and see that in /proc/`pidof cloud-hypervisor`/smaps that the
region is now THPeligible (and that also pages are being used.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-09 16:51:21 +00:00
Rob Bradford
6e0bd73c90 build: Bump linux-loader from 0.6.0 to 0.7.0
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-11-02 11:02:00 +00:00
Bo Chen
a9ec0f33c0 misc: Fix clippy issues
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-11-02 09:41:43 +01:00
dependabot[bot]
9266ea4995 build: Bump clap from 4.0.17 to 4.0.18
Bumps [clap](https://github.com/clap-rs/clap) from 4.0.17 to 4.0.18.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.0.17...v4.0.18)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-01 00:41:51 +00:00
Rob Bradford
f4495de143 vmm: Improve handling of shared memory backing
As huge pages are always MAP_SHARED, wherever shared memory would
be checked (for vhost-user and local migration) we can check for
huge pages as well.

The checking is also extended to cover the memory zones based
configuration as well.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Rob Bradford
99d9a3d299 vmm: memory_manager: Avoid MAP_PRIVATE CoW with VFIO for hugepages too
We can't use MAP_ANONYMOUS and still have huge pages so MAP_SHARED is
effectively required when using huge pages.

Unfortunately it is not as simple as always forcing MAP_SHARED when
hugepages is on, as this might be inappropriate in the backing file case.
Hence the additional complexity of assigning mmap_flags in each case:
MAP_SHARED is only turned on for the anonymous file huge page case and
the anonymous shared file case.

See: #4805

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Rob Bradford
df7c728399 vmm: memory_manager: Only file back memory when required
If we do not need an anonymous file backing the memory then do not
create one.

As a side effect this addresses an issue with CoW (mmap with MAP_PRIVATE
but no MAP_ANONYMOUS) when the memory is pinned for VFIO.

Fixes: #4805

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
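
One possible reading of the memory_manager commits in this series is the decision table sketched below. It is an interpretation, not the actual implementation: the `Backing` and `MemoryPlan` types and the `plan_memory` function are hypothetical, and the rules are inferred only from the commit messages (an anonymous file is created only when sharing or huge pages require it, MAP_SHARED is not forced for user-provided backing files, and THP applies only to memory that is not file backed).

```rust
/// Hypothetical description of how a guest RAM region gets backed.
#[derive(Debug, PartialEq)]
enum Backing {
    /// Plain anonymous mapping, no file descriptor at all.
    Anonymous,
    /// Anonymous file created by the VMM (for example a memfd).
    AnonymousFile,
    /// File supplied by the user.
    UserFile,
}

#[derive(Debug, PartialEq)]
struct MemoryPlan {
    backing: Backing,
    map_shared: bool,
    thp: bool,
}

/// Inferred rules, see the assumptions stated in the lead-in.
fn plan_memory(user_file: bool, shared: bool, hugepages: bool, thp: bool) -> MemoryPlan {
    if user_file {
        // Do not force MAP_SHARED here: honour the requested sharing mode.
        MemoryPlan { backing: Backing::UserFile, map_shared: shared, thp: false }
    } else if shared || hugepages {
        // A file is required, and huge pages are effectively always shared.
        MemoryPlan { backing: Backing::AnonymousFile, map_shared: true, thp: false }
    } else {
        // No file needed: private anonymous memory, eligible for THP.
        MemoryPlan { backing: Backing::Anonymous, map_shared: false, thp }
    }
}

fn main() {
    println!("{:?}", plan_memory(false, false, false, true));
    println!("{:?}", plan_memory(false, false, true, true));
    println!("{:?}", plan_memory(true, false, true, true));
}
```

Under this reading, the private non-anonymous mapping that caused the CoW issue with VFIO pinning no longer arises for memory the VMM allocates itself.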
Rob Bradford
1e5a4e8d77 vmm: memory_manager: Split filesystem backed and anonymous RAM creation
This simplifies the code somewhat making the code paths more readable.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Rob Bradford
ff3fb91ba6 vmm: Refactor creation of the FileOffset for GuestRegionMmap::new()
Create this earlier so that it is possible to pass a None in for
anonymous mappings.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-31 22:28:29 +00:00
Jinrong Liang
cb171d4a23 device_manager: Avoid checking io_uring support when it's not needed
After testing, io_uring_is_supported() causes about 38ms of
overhead when creating virtio-blk. By moving the call to
io_uring_is_supported(), the overhead of creating virtio-blk
is reduced to less than 1ms when io_uring is disabled.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
2022-10-27 22:21:51 -07:00
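
The idea of skipping an expensive probe when its result cannot matter can be sketched with short-circuit evaluation. This is an assumption about the shape of the fix, not the actual patch: `should_use_io_uring` and its `disabled_by_config` parameter are hypothetical, and the closure stands in for io_uring_is_supported().

```rust
use std::cell::Cell;

/// Only run the (expensive) probe when io_uring has not been disabled.
/// `&&` short-circuits, so `probe` is never called if the first operand
/// is false.
fn should_use_io_uring(disabled_by_config: bool, probe: impl FnOnce() -> bool) -> bool {
    !disabled_by_config && probe()
}

fn main() {
    // Count how often the stand-in probe actually runs.
    let probes = Cell::new(0u32);

    let disabled = should_use_io_uring(true, || {
        probes.set(probes.get() + 1);
        true
    });
    println!("disabled: use io_uring = {disabled}, probes run = {}", probes.get());

    let enabled = should_use_io_uring(false, || {
        probes.set(probes.get() + 1);
        true
    });
    println!("enabled: use io_uring = {enabled}, probes run = {}", probes.get());
}
```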
dependabot[bot]
bc310bb173 build: Bump libc from 0.2.135 to 0.2.137
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.135 to 0.2.137.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.135...0.2.137)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-27 23:57:43 +00:00
Wei Liu
b99b2bc990 memory_manager: use MFD_CLOEXEC flag when creating memory fd
Until there is a need for sharing the memory fd with a child process, we
should err on the safe side to close it on exec.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-10-27 09:20:08 +02:00
Sebastien Boeuf
1f0e5eb66a vmm: virtio-devices: Restore every VirtioDevice upon creation
Following the new design proposal to improve the restore codepath when
migrating a VM, all virtio devices are supplied with an optional state
they can use to restore from. The restore() implementation every device
was providing has been removed in order to avoid going through
the restoration twice.

Here is the list of devices now following the new restore design:

- Block (virtio-block)
- Net (virtio-net)
- Rng (virtio-rng)
- Fs (vhost-user-fs)
- Blk (vhost-user-block)
- Net (vhost-user-net)
- Pmem (virtio-pmem)
- Vsock (virtio-vsock)
- Mem (virtio-mem)
- Balloon (virtio-balloon)
- Watchdog (virtio-watchdog)
- Vdpa (vDPA)
- Console (virtio-console)
- Iommu (virtio-iommu)

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-24 14:17:08 +02:00
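
The "optional state supplied at creation" pattern from the commit above might look like the following sketch. The `Rng` and `RngState` types here are simplified, hypothetical versions with made-up fields; the point is only that one constructor serves both the fresh and the restored path, so no separate restore() step is needed.

```rust
/// Hypothetical snapshot state for a device.
#[derive(Clone, Debug, PartialEq)]
struct RngState {
    avail_features: u64,
    acked_features: u64,
}

/// Hypothetical, heavily simplified device.
#[derive(Debug)]
struct Rng {
    id: String,
    avail_features: u64,
    acked_features: u64,
    paused: bool,
}

/// Placeholder default feature set for a freshly created device.
const DEFAULT_FEATURES: u64 = 1 << 32;

impl Rng {
    /// One constructor for both paths: with `Some(state)` the device starts
    /// out restored (and paused), with `None` it starts from defaults.
    fn new(id: &str, state: Option<RngState>) -> Self {
        let (avail_features, acked_features, paused) = match state {
            Some(s) => (s.avail_features, s.acked_features, true),
            None => (DEFAULT_FEATURES, 0, false),
        };
        Rng { id: id.to_string(), avail_features, acked_features, paused }
    }

    /// Capture the state that a later `new(.., Some(state))` can consume.
    fn state(&self) -> RngState {
        RngState {
            avail_features: self.avail_features,
            acked_features: self.acked_features,
        }
    }
}

fn main() {
    let mut source = Rng::new("_rng", None);
    source.acked_features = DEFAULT_FEATURES; // pretend the guest negotiated
    let restored = Rng::new("_rng", Some(source.state()));
    println!("{} restored, paused = {}", restored.id, restored.paused);
}
```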
Sebastien Boeuf
157db33d65 vmm: Refactor hypervisor::Vm creation on restore
This avoids leaking implementation details to lib.rs and keeps them
in vm.rs instead.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-24 14:17:08 +02:00
dependabot[bot]
40df6c3787 build: Bump serde from 1.0.145 to 1.0.147
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.145 to 1.0.147.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.145...v1.0.147)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-24 08:41:05 +00:00
Fabiano Fidêncio
b4e3942708 api: Fix vm.add-device argument type
The add_device() function, from the device manager code, takes a
DeviceConfig as a parameter, instead of a VmAddDevice.

The change was originally done as part of 34412c9b41126 and it didn't
break Kata Containers because the VmAddDevice and DeviceConfig structs
share most of their fields, apart from `pci_segment`, which is optional
for serialization and not used by the client yet.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-21 11:09:55 -07:00
dependabot[bot]
e710e21744 build: Bump anyhow from 1.0.65 to 1.0.66
Bumps [anyhow](https://github.com/dtolnay/anyhow) from 1.0.65 to 1.0.66.
- [Release notes](https://github.com/dtolnay/anyhow/releases)
- [Commits](https://github.com/dtolnay/anyhow/compare/1.0.65...1.0.66)

---
updated-dependencies:
- dependency-name: anyhow
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-21 00:04:03 +00:00
dependabot[bot]
d7afa3c47e build: Bump serde_json from 1.0.86 to 1.0.87
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.86 to 1.0.87.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.86...v1.0.87)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-19 23:44:35 +00:00
dependabot[bot]
f63cf2ebc0 build: Bump clap from 4.0.15 to 4.0.17
Bumps [clap](https://github.com/clap-rs/clap) from 4.0.15 to 4.0.17.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.0.15...v4.0.17)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-18 23:47:28 +00:00
Sebastien Boeuf
c52ccf3992 vmm: migration: Create destination VM right before to restore it
This is preliminary work to ensure a migrated VM is created right before
it is restored. This will be useful when moving to a design where the VM
is both created and restored simultaneously from the Snapshot.

In detail, that means the MemoryManager is the object that must be
created upon receiving the config from the source VM, so that memory
content can later be received and filled into the GuestMemory.
Only after these steps have happened is the snapshot received from the
source VM, and the actual Vm object can be created from both the
snapshot and the MemoryManager previously created.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-18 17:14:29 +02:00
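
The ordering described in the commit above (config, then MemoryManager, then memory content, then snapshot, then Vm) can be sketched with plain ownership rules. All types below are hypothetical, simplified stand-ins that only share names with the real ones; guest memory is modelled as a byte vector and the constructor names `from_config` and `from_snapshot` are assumptions.

```rust
struct VmConfig {
    memory_size: usize,
}

struct Snapshot {
    id: String,
}

struct MemoryManager {
    guest_memory: Vec<u8>,
}

impl MemoryManager {
    /// Step 1: created as soon as the config arrives from the source VM.
    fn from_config(config: &VmConfig) -> Self {
        MemoryManager { guest_memory: vec![0; config.memory_size] }
    }

    /// Step 2: memory content is received and written into guest memory.
    fn fill(&mut self, offset: usize, data: &[u8]) -> Result<(), String> {
        let end = offset
            .checked_add(data.len())
            .ok_or_else(|| "range overflow".to_string())?;
        let dst = self
            .guest_memory
            .get_mut(offset..end)
            .ok_or_else(|| "write outside guest memory".to_string())?;
        dst.copy_from_slice(data);
        Ok(())
    }
}

struct Vm {
    memory: MemoryManager,
    snapshot_id: String,
}

impl Vm {
    /// Step 3: the Vm is built last, from both the snapshot and the
    /// already-populated MemoryManager, which it takes ownership of.
    fn from_snapshot(snapshot: Snapshot, memory: MemoryManager) -> Self {
        Vm { memory, snapshot_id: snapshot.id }
    }
}

fn main() -> Result<(), String> {
    let config = VmConfig { memory_size: 16 };
    let mut memory = MemoryManager::from_config(&config);
    memory.fill(4, &[1, 2, 3])?;
    let vm = Vm::from_snapshot(Snapshot { id: "vm".to_string() }, memory);
    println!("{} restored with {} bytes", vm.snapshot_id, vm.memory.guest_memory.len());
    Ok(())
}
```

Because `from_snapshot` consumes the MemoryManager, the Vm cannot be constructed before the memory object exists.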
Rob Bradford
a75d71f2c8 vmm: Reduce logging severity for unknown MMIO/PIO device accesses
These look alarming if you are booting with a distro kernel, which is
now a recommended approach.

See: #4786

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-10-17 10:08:36 -07:00
Bo Chen
96209e7a16 vmm: Remove the explicit call to 'Snapshottable:restore()'
The restore path of MemoryManager is handled specially without
implementing a `Snapshottable::restore()`. Remove the explicit call to
it along the migration code path to avoid confusion.

See: #4783

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-10-17 10:07:44 -07:00
dependabot[bot]
58066e2da4 build: Bump clap from 4.0.14 to 4.0.15
Bumps [clap](https://github.com/clap-rs/clap) from 4.0.14 to 4.0.15.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.0.14...v4.0.15)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-14 00:12:55 +00:00
Sebastien Boeuf
099cdd2af8 virtio-devices, vmm: vdpa: Implement live migration support
Vdpa now implements the Migratable trait, which allows the device to be
added to the DeviceTree and therefore allows live migrating any vDPA
device that supports being suspended.

Given a vDPA device can't be resumed from a suspended state without
having to reset everything, we don't support pause/resume for a vDPA
device, as well as snapshot/restore (which requires resume to be
supported).

In order for the migration to work locally, reusing the same device on
the same host machine, the vhost-vdpa handler is dropped after the
snapshot has been performed, which allows the destination VM to open the
device without any conflict about the device being busy.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-13 10:03:23 +02:00