cloud-hypervisor

mirror of https://github.com/cloud-hypervisor/cloud-hypervisor.git synced 2024-12-23 14:15:19 +00:00

Author	SHA1	Message	Date
Sebastien Boeuf	63c6c78c4e	vmm: memory_manager: Factorize configuration validation In order to simplify the MemoryManager::new() function, let's move the memory configuration validation to its own function. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-10-06 18:35:49 -07:00
Rob Bradford	84fc0e093d	vmm: Move PciSegment to new file Move the PciSegment struct and the associated code to a new file. This will allow some clearer separation between the core DeviceManager and PCI handling. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-10-05 10:54:07 +01:00
Rob Bradford	0eb78ab177	vmm: Extract PCI related state from DeviceManager Move the PCI related state from the DeviceManager struct to a PciSegment struct inside the DeviceManager. This is in preparation for multiple segment support. Currently this state is just the bus itself, the MMIO and PIO config devices and hotplug related state. The main change that this required is using the Arc<Mutex<PciBus>> in the device addition logic in order to ensure that the bus could be created earlier. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-10-05 10:54:07 +01:00
Bo Chen	1a4747a20f	Build: Seccompiler: Move to use the released version from crates.io Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-10-01 11:34:54 -07:00
Rob Bradford	83066cf58e	vmm: Set a default maximum physical address size When using PVH for booting (which we use for all firmwares and direct kernel boot) the Linux kernel does not configure LA57 correctly. As such we need to limit the address space to the maximum 4-level paging address space. If the user knows that their guest image can take advantage of the 5-level addressing and they need it for their workload then they can increase the physical address space appropriately. This PR removes the TDX specific handling as the new address space limit is below the one that that code specified. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-10-01 08:59:15 -07:00
Sebastien Boeuf	495e444ca6	vmm: Add ACPI tables to TdVmmData when running TDX Whenever running TDX, we must pass the ACPI tables to the TDVF firmware running in the guest. The proper way to do this is by adding the tables to the TdHob as a TdVmmData type, so that TDVF will know how to access these tables and expose them to the guest OS. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-30 06:35:55 -07:00
Sebastien Boeuf	b99a3a7dc9	vmm: Factorize ACPI tables creation inside boot() function Instead of having the ACPI tables being created both in x86_64 and aarch64 implementations of configure_system(), we can remove the duplicated code by moving the ACPI tables creation in vm.rs inside the boot() function. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-30 06:35:55 -07:00
Yu Li	08021087ec	vmm: add prefault option in memory and memory-zone The argument `prefault` is provided in MemoryManager, but it can only be used by SGX and restore. With prefault (MAP_POPULATE) set, fewer page faults occur while the VM is running, although boot becomes slower. This commit adds `prefault` in MemoryConfig and MemoryZoneConfig. To resolve the conflict between memory and restore, the argument `prefault` has been changed from `bool` to `Option<bool>`: when its value is None, the config from memory is used, otherwise the value in the Option is used. Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>	2021-09-29 14:17:35 +02:00
Sebastien Boeuf	59031531b6	vmm: Simplify the way memory is snapshot and restored By using a single file for storing the memory ranges, we simplify the way snapshot/restore works by avoiding multiple files, but the main and more important point is that we now have a way to save only the ranges that matter. In particular, the ranges related to virtio-mem regions are not always fully hotplugged, meaning we don't want to save the entire region. That's where the usage of memory ranges is interesting as it lets us optimize the snapshot/restore process when one or multiple virtio-mem regions are involved. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Sebastien Boeuf	1ea63f50a1	vmm: Move MemoryRangeTable creation to the MemoryManager The function memory_range_table() will be reused by the MemoryManager in a following patch to describe all the ranges that we should snapshot. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Sebastien Boeuf	86f86c5348	vmm: Optimize migration for virtio-mem Copy only the memory ranges that have been plugged through virtio-mem, allowing for an interesting optimization regarding the time it takes to migrate a large virtio-mem device. Even if the hotpluggable space is very large (say 64GiB), if only 1GiB has been previously added to the VM, only 1GiB will be sent to the destination VM, avoiding the transfer of the remaining 63GiB which are unused. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Sebastien Boeuf	e390775bcb	vmm, virtio-devices: Move BlocksState creation to the MemoryManager By creating the BlocksState object in the MemoryManager, we can directly provide it to the virtio-mem device when being created. This will allow the MemoryManager through each VirtioMemZone to have a handle onto the blocks that are plugged at any point in time. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Sebastien Boeuf	a1caa6549a	vmm: Add page size as a parameter for MemoryRangeTable::from_bitmap() This will be helpful to support the creation of a MemoryRangeTable from virtio-mem, as it uses 2M pages. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Sebastien Boeuf	d7115ec656	virtio-devices: mem: Add snapshot/restore support Adding the snapshot/restore support along with migration as well, allowing a VM with virtio-mem devices attached to be properly migrated. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Sebastien Boeuf	7bbcc0f849	vmm: memory_manager: Make sure the hotplugged_size is up to date The amount of memory plugged in the virtio-mem region should always be kept up to date in the hotplugged_size field from VirtioMemZone. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Sebastien Boeuf	c4dc7a583d	vmm: memory_manager: Simplify the MemoryManager structure There's no need to duplicate the GuestMemory for snapshot purpose, as we always have a handle onto the GuestMemory through the guest_memory field. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Sebastien Boeuf	74485924b1	vmm: memory_manager: Simplification to avoid unnecessary locking Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-28 10:15:22 -07:00
Rob Bradford	4889999277	vmm: Only advertise a single PCI bus Since we only support a single PCI bus right now advertise only a single bus in the ACPI tables. This reduces the number of VM exits from probing substantially. Number of PCI config I/O port exits: 17871 -> 1551 (91% reduction) with direct kernel boot. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-28 14:10:10 +02:00
dependabot[bot]	eda0dc20d3	build: bump libc from 0.2.102 to 0.2.103 Bumps [libc](https://github.com/rust-lang/libc) from 0.2.102 to 0.2.103. - [Release notes](https://github.com/rust-lang/libc/releases) - [Commits](https://github.com/rust-lang/libc/compare/0.2.102...0.2.103) --- updated-dependencies: - dependency-name: libc dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2021-09-28 10:45:35 +00:00
Rob Bradford	b50519651c	vmm: Simplify slot eject code in PCI ACPI device code Use a simpler method for extracting the affected slot on the eject command. Also update the terminology to reflect that this is a slot rather than a bdf (which is what device id refers to elsewhere.) Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-28 12:03:23 +02:00
William Douglas	a8f063db7c	vmm: Refactor serial buffer to allow flush on PTY when writable Refactor the serial buffer handling in order to write the serial buffer's output to a PTY connected after the serial device stops being written to by the guest. This change moves the serial buffer initialization inside the serial manager. That is done to allow the serial buffer to be made aware of the PTY and epoll fds needed in order to modify the EpollDispatch::File trigger. These are then used by the serial buffer to trigger an epoll event when the PTY fd is writable and the buffer has content in it. They are also used to remove the trigger when the buffer is emptied in order to avoid unnecessary wake-ups. Signed-off-by: William Douglas <william.douglas@intel.com>	2021-09-27 14:18:21 +01:00
Sebastien Boeuf	b910a7922d	vmm: Fix migration when writing/reading big chunks of data Both read_exact_from() and write_all_to() functions from the GuestMemory trait implementation in vm-memory are buggy. They should retry until they have written or read the expected amount of data, but instead they simply return an error on a partial read or write. This causes the migration to fail when trying to send a large amount of data through the migration socket, due to large memory regions. This should be eventually fixed in vm-memory, and here is the link to follow up on the issue: https://github.com/rust-vmm/vm-memory/issues/174 Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-27 11:13:56 +02:00
Rob Bradford	1a2d0e6dd8	build: bump linux-loader from 0.3.0 to 0.4.0 Requires manual change to command line loading. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-24 09:11:57 +00:00
Michael Zhao	d72af85c42	vmm: Add "_CCA" field to ACPI DSDT table "_CCA" is required by DMA configuration on AArch64. Signed-off-by: Michael Zhao <michael.zhao@arm.com>	2021-09-24 07:57:57 +01:00
Rob Bradford	43365ade2e	vmm, pci: Implement virtio-mem support for vfio-user Implement the infrastructure that lets a virtio-mem device map the guest memory into the device. This is necessary since with virtio-mem zones memory can be added or removed and the vfio-user device must be informed. Fixes: #3025 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-21 15:42:49 +01:00
Rob Bradford	e9d67dc405	vmm: pci: Move creation of vfio_user::Client to DeviceManager By moving this from the VfioUserPciDevice to DeviceManager the client can be reused for handling DMA mapping behind an IOMMU. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-21 15:42:49 +01:00
Rob Bradford	fd4f32fa69	virtio-mem: Support multiple mappings For vfio-user the mapping handler is per device and needs to be removed when the device is unplugged. For VFIO the mapping handler is for the default VFIO container (used when no vIOMMU is used - using a vIOMMU does not require mappings with virtio-mem). To represent these two use cases use an enum for the handlers that are stored. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-21 15:42:49 +01:00
dependabot[bot]	d826b4fbdc	build: bump arc-swap from 1.3.2 to 1.4.0 Bumps [arc-swap](https://github.com/vorner/arc-swap) from 1.3.2 to 1.4.0. - [Release notes](https://github.com/vorner/arc-swap/releases) - [Changelog](https://github.com/vorner/arc-swap/blob/master/CHANGELOG.md) - [Commits](https://github.com/vorner/arc-swap/compare/v1.3.2...v1.4.0) --- updated-dependencies: - dependency-name: arc-swap dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2021-09-19 17:12:50 +00:00
Rob Bradford	0faa7afac2	vmm: Add fast path for PCI config IO port Looking up devices on the port I/O bus is time consuming during the boot as there is an O(lg n) tree lookup and the overhead from taking a lock on the bus contents. Avoid this by adding a fast path that uses the hardcoded port address and size and directs PCI config requests directly to the device. Command line: target/release/cloud-hypervisor --kernel ~/src/linux/vmlinux --cmdline "root=/dev/vda1 console=ttyS0" --serial tty --console off --disk path=~/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw --api-socket /tmp/api PIO exit: 17913 PCI fast path: 17871 Percentage on fast path: 99.8% perf before: marvin:~/src/cloud-hypervisor (main )$ perf report -g | grep resolve 6.20% 6.20% vcpu0 cloud-hypervisor [.] vm_device::bus::Bus::resolve perf after: marvin:~/src/cloud-hypervisor (2021-09-17-ioapic-fast-path )$ perf report -g | grep resolve 0.08% 0.08% vcpu0 cloud-hypervisor [.] vm_device::bus::Bus::resolve The compromise required to implement this fast path is bringing the creation of the PciConfigIo device into the DeviceManager::new() so that it can be used in the VmmOps struct which is created before DeviceManager::create_devices() is called. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-17 17:09:45 +01:00
Michael Zhao	b3fa56544c	virtio-devices: iommu: Support AArch64 The MSI IOVA address on X86 and AArch64 is different. This commit refactored the code to receive the MSI IOVA address and size from device_manager, which provides the actual IOVA space data for both architectures. Signed-off-by: Michael Zhao <michael.zhao@arm.com>	2021-09-17 12:19:46 +02:00
Michael Zhao	253c06d3ba	arch/aarch64: Add virtio-iommu device in FDT Add a virtio-iommu node into FDT if iommu option is turned on. Now we support only one virtio-iommu device. Signed-off-by: Michael Zhao <michael.zhao@arm.com>	2021-09-17 12:19:46 +02:00
William Douglas	46f6d9597d	vmm: Switch to using the serial_manager for serial input This change switches from handling serial input in the VMM thread to its own thread controlled by the SerialManager. The motivation for this change is to avoid the VMM thread being unable to process events while serial input is happening and vice versa. The change also makes future work flushing the serial buffer on PTY connections easier. Signed-off-by: William Douglas <william.douglas@intel.com>	2021-09-17 11:15:35 +01:00
William Douglas	7b4f56e372	vmm: Add new serial_manager for serial input handling This change adds a SerialManager with its own epoll handling that should be created and run by the DeviceManager when creating an appropriately configured console (serial tty or pty). Both stdin and pty input are handled by the SerialManager. The stdin and pty specific methods used by the VMM should be removed in a future commit. Signed-off-by: William Douglas <william.douglas@intel.com>	2021-09-17 11:15:35 +01:00
William Douglas	d6a2f48b32	vmm: device_manager: Make PtyPair implement Clone The clone method for PtyPair should have been an impl of the Clone trait but the method ended up not being used. Future work will make use of the trait however so correct the missing trait implementation. Signed-off-by: William Douglas <william.douglas@intel.com>	2021-09-17 11:15:35 +01:00
dependabot[bot]	f67b3f79ea	build: bump vmm-sys-util from 0.8.0 to 0.9.0 Bumps [vmm-sys-util](https://github.com/rust-vmm/vmm-sys-util) from 0.8.0 to 0.9.0. - [Release notes](https://github.com/rust-vmm/vmm-sys-util/releases) - [Changelog](https://github.com/rust-vmm/vmm-sys-util/blob/main/CHANGELOG.md) - [Commits](https://github.com/rust-vmm/vmm-sys-util/compare/v0.8.0...v0.9.0) --- updated-dependencies: - dependency-name: vmm-sys-util dependency-type: direct:production update-type: version-update:semver-minor ... This needed a bunch of manual updates as well, including vfio-ioctls and vhost crates. The vhost crate is being patched with the latest version from rust-vmm because the version 0.1.0 on crates.io doesn't include the patches we need yet. Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-16 14:01:19 +01:00
dependabot[bot]	c1e896dddb	build: bump libc from 0.2.101 to 0.2.102 Bumps [libc](https://github.com/rust-lang/libc) from 0.2.101 to 0.2.102. - [Release notes](https://github.com/rust-lang/libc/releases) - [Commits](https://github.com/rust-lang/libc/compare/0.2.101...0.2.102) --- updated-dependencies: - dependency-name: libc dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2021-09-15 17:23:46 +00:00
Sebastien Boeuf	a6040d7a30	vmm: Create a single VFIO container For most use cases, there is no need to create multiple VFIO containers as it causes unwanted behaviors. Especially when passing multiple devices from the same IOMMU group, we need to use the same container so that it can properly list the groups that have been already opened. The correct logic was already there in vfio-ioctls, but it was incorrectly used from our VMM implementation. For the special case where we put a VFIO device behind a vIOMMU, we must create one container per device, as we need to control the DMA mappings per device, which is performed at the container level. Because we must keep one container per device, the vIOMMU use case prevents multiple devices attached to the same IOMMU group from being passed through to the VM. But this is a limitation that we are fine with, especially since the vIOMMU doesn't let us group multiple devices in the same group from a guest perspective. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-09-15 09:08:13 -07:00
dependabot[bot]	8836715c2d	build: bump serde_json from 1.0.67 to 1.0.68 Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.67 to 1.0.68. - [Release notes](https://github.com/serde-rs/json/releases) - [Commits](https://github.com/serde-rs/json/compare/v1.0.67...v1.0.68) --- updated-dependencies: - dependency-name: serde_json dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2021-09-15 00:06:23 +00:00
Alyssa Ross	330b5ea3be	vmm: notify virtio-console of pty resizes When a pty is resized (using the TIOCSWINSZ ioctl -- see ioctl_tty(2)), the kernel will send a SIGWINCH signal to the pty's foreground process group to notify it of the resize. This is the only way to be notified by the kernel of a pty resize. We can't just make the cloud-hypervisor process's process group the foreground process group though, because a process can only set the foreground process group of its controlling terminal, and cloud-hypervisor's controlling terminal will often be the terminal the user is running it in. To work around this, we fork a subprocess in a new process group, and set its process group to be the foreground process group of the pty. The subprocess additionally must be running in a new session so that it can have a different controlling terminal. This subprocess writes a byte to a pipe every time the pty is resized, and the virtio-console device can listen for this in its epoll loop. Alternatives I considered were to have the subprocess just send SIGWINCH to its parent, and to use an eventfd instead of a pipe. I decided against the signal approach because re-purposing a signal that has a very specific meaning (even if this use was only slightly different to its normal meaning) felt unclean, and because it would have required using pidfds to avoid race conditions if cloud-hypervisor had terminated, which added complexity. I decided against using an eventfd because using a pipe instead allows the child to be notified (via poll(2)) when nothing is reading from the pipe any more, meaning it can be reliably notified of parent death and terminate itself immediately. I used clone3(2) instead of fork(2) because without CLONE_CLEAR_SIGHAND the subprocess would inherit signal-hook's signal handlers, and there's no other straightforward way to restore all signal handlers to their defaults in the child process. 
The only way to do it would be to iterate through all possible signals, or maintain a global list of monitored signals ourselves (vmm:vm::HANDLED_SIGNALS is insufficient because it doesn't take into account e.g. the SIGSYS signal handler that catches seccomp violations). Signed-off-by: Alyssa Ross <hi@alyssa.is>	2021-09-14 15:43:25 +01:00
Alyssa Ross	28382a1491	virtio-devices: determine tty size in console This prepares us to be able to handle console resizes in the console device's epoll loop, which we'll have to do if the output is a pty, since we won't get SIGWINCH from it. Signed-off-by: Alyssa Ross <hi@alyssa.is>	2021-09-14 15:43:25 +01:00
dependabot[bot]	f3778a7fc7	build: bump anyhow from 1.0.43 to 1.0.44 Bumps [anyhow](https://github.com/dtolnay/anyhow) from 1.0.43 to 1.0.44. - [Release notes](https://github.com/dtolnay/anyhow/releases) - [Commits](https://github.com/dtolnay/anyhow/compare/1.0.43...1.0.44) --- updated-dependencies: - dependency-name: anyhow dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2021-09-14 00:22:00 +00:00
Alyssa Ross	8abe8c679b	seccomp: allow mmap everywhere brk is allowed Musl often uses mmap to allocate memory where Glibc would use brk. This has caused seccomp violations for me on the API and signal handling threads. Signed-off-by: Alyssa Ross <hi@alyssa.is>	2021-09-10 12:01:31 -07:00
Rob Bradford	b6b686c71c	vmm: Shutdown VMM if API thread panics See: #3031 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-10 10:52:08 -07:00
Rob Bradford	171d12943d	vmm: memory_manager: Increase robustness of MemoryManager control device See: #1289 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-10 10:23:19 -07:00
Rob Bradford	bdc44cd8bc	vmm: cpu: Increase robustness of CpuManager control device See: #1289 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-10 10:22:05 -07:00
Bo Chen	4f37a273d9	vmm: Fix clippy issue error: all if blocks contain the same code at the end --> vmm/src/memory_manager.rs:884:9 | 884 | / Ok(mm) 885 | | } | |_________^ Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-09-08 13:31:19 -07:00
Rob Bradford	d64a77a5c6	vmm: Shutdown VMM if signal thread panics See: #3031 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-08 11:26:48 -07:00
Rob Bradford	e0d05683ab	vmm: Split up functions for creating signal handler and tty setup These are quite separate and should be in their own functions. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-08 11:26:48 -07:00
Rob Bradford	387753ae1d	vmm: Remove concept of "input_enabled" This concept ends up being broken with multiple types of input connected e.g. console on TTY and serial on PTY. Already the code for checking for injecting into the serial device checks that the serial is configured. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-08 11:26:48 -07:00
Rob Bradford	951ad3495e	vmm: Only resize virtio-console when attached to TTY Fixes: #3092 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-08 11:26:48 -07:00
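
Commits 86f86c5348 and a1caa6549a above describe copying only the virtio-mem ranges that are actually plugged, using a bitmap plus a page size (2M for virtio-mem). The Rust sketch below illustrates that idea only: it is not the cloud-hypervisor implementation. `MemoryRange` and `ranges_from_bitmap` are hypothetical names chosen for this example, and the bitmap layout (one bit per block, lowest bit first) is an assumption.

```rust
/// A contiguous guest memory range: start address and length in bytes.
/// Hypothetical type for illustration, not the project's own definition.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MemoryRange {
    pub gpa: u64,
    pub length: u64,
}

/// Turn a bitmap of plugged blocks into contiguous ranges.
///
/// Assumed layout: bit `i` of word `w` covers the block starting at
/// `start_addr + (w * 64 + i) * page_size`. Adjacent set bits are merged
/// into one range, including across word boundaries.
pub fn ranges_from_bitmap(bitmap: &[u64], start_addr: u64, page_size: u64) -> Vec<MemoryRange> {
    let mut ranges = Vec::new();
    let mut current: Option<MemoryRange> = None;

    for (word_idx, word) in bitmap.iter().enumerate() {
        for bit in 0..64u64 {
            let plugged = (*word >> bit) & 1 == 1;
            if plugged {
                if let Some(range) = current.as_mut() {
                    // Extend the range that is currently open.
                    range.length += page_size;
                } else {
                    // Open a new range at this block.
                    let block = word_idx as u64 * 64 + bit;
                    current = Some(MemoryRange {
                        gpa: start_addr + block * page_size,
                        length: page_size,
                    });
                }
            } else if let Some(range) = current.take() {
                // An unplugged block closes the open range.
                ranges.push(range);
            }
        }
    }

    // Flush a range that runs to the end of the bitmap.
    if let Some(range) = current {
        ranges.push(range);
    }
    ranges
}
```

With a 2 MiB page size, a bitmap word of `0b1011` yields two ranges: one of 4 MiB starting at the base address and one of 2 MiB starting three blocks in. Only those ranges would need to be sent or saved, which is the saving the 64GiB versus 1GiB example in 86f86c5348 refers to.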


1477 Commits