cloud-hypervisor

mirror of https://github.com/cloud-hypervisor/cloud-hypervisor.git synced 2024-11-05 19:41:27 +00:00

Author	SHA1	Message	Date
William Douglas	46f6d9597d	vmm: Switch to using the serial_manager for serial input This change switches from handling serial input in the VMM thread to its own thread controlled by the SerialManager. The motivation for this change is to avoid the VMM thread being unable to process events while serial input is happening and vice versa. The change also makes future work flushing the serial buffer on PTY connections easier. Signed-off-by: William Douglas <william.douglas@intel.com>	2021-09-17 11:15:35 +01:00
Alyssa Ross	330b5ea3be	vmm: notify virtio-console of pty resizes When a pty is resized (using the TIOCSWINSZ ioctl -- see ioctl_tty(2)), the kernel will send a SIGWINCH signal to the pty's foreground process group to notify it of the resize. This is the only way to be notified by the kernel of a pty resize. We can't just make the cloud-hypervisor process's process group the foreground process group though, because a process can only set the foreground process group of its controlling terminal, and cloud-hypervisor's controlling terminal will often be the terminal the user is running it in. To work around this, we fork a subprocess in a new process group, and set its process group to be the foreground process group of the pty. The subprocess additionally must be running in a new session so that it can have a different controlling terminal. This subprocess writes a byte to a pipe every time the pty is resized, and the virtio-console device can listen for this in its epoll loop. Alternatives I considered were to have the subprocess just send SIGWINCH to its parent, and to use an eventfd instead of a pipe. I decided against the signal approach because re-purposing a signal that has a very specific meaning (even if this use was only slightly different to its normal meaning) felt unclean, and because it would have required using pidfds to avoid race conditions if cloud-hypervisor had terminated, which added complexity. I decided against using an eventfd because using a pipe instead allows the child to be notified (via poll(2)) when nothing is reading from the pipe any more, meaning it can be reliably notified of parent death and terminate itself immediately. I used clone3(2) instead of fork(2) because without CLONE_CLEAR_SIGHAND the subprocess would inherit signal-hook's signal handlers, and there's no other straightforward way to restore all signal handlers to their defaults in the child process. 
The only way to do it would be to iterate through all possible signals, or maintain a global list of monitored signals ourselves (vmm:vm::HANDLED_SIGNALS is insufficient because it doesn't take into account e.g. the SIGSYS signal handler that catches seccomp violations). Signed-off-by: Alyssa Ross <hi@alyssa.is>	2021-09-14 15:43:25 +01:00
Alyssa Ross	28382a1491	virtio-devices: determine tty size in console This prepares us to be able to handle console resizes in the console device's epoll loop, which we'll have to do if the output is a pty, since we won't get SIGWINCH from it. Signed-off-by: Alyssa Ross <hi@alyssa.is>	2021-09-14 15:43:25 +01:00
Rob Bradford	d64a77a5c6	vmm: Shutdown VMM if signal thread panics See: #3031 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-08 11:26:48 -07:00
Rob Bradford	e0d05683ab	vmm: Split up functions for creating signal handler and tty setup These are quite separate and should be in their own functions. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-08 11:26:48 -07:00
Rob Bradford	387753ae1d	vmm: Remove concept of "input_enabled" This concept ends up being broken with multiple types of input connected e.g. console on TTY and serial on PTY. Already the code for checking for injecting into the serial device checks that the serial is configured. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-08 11:26:48 -07:00
Rob Bradford	0dbb2683e3	vmm: Consolidate duplicated code for setting up signal handler Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-08 11:26:48 -07:00
Alyssa Ross	7549149bb5	vmm: ensure signal handlers run on the right thread Despite setting up a dedicated thread for signal handling, we weren't making sure that the signals we were listening for there were actually dispatched to the right thread. While the signal-hook provides an iterator API, so we can know that we're only processing the signals coming out of the iterator on our signal handling thread, the actual signal handling code from signal-hook, which pushes the signals onto the iterator, can run on any thread. This can lead to seccomp violations when the signal-hook signal handler does something that isn't allowed on that thread by our seccomp policy. To reproduce, resize a terminal running cloud-hypervisor continuously for a few minutes. Eventually, the kernel will deliver a SIGWINCH to a thread with a restrictive seccomp policy, and a seccomp violation will trigger. As part of this change, it's also necessary to allow rt_sigreturn(2) on the signal handling thread, so signal handlers are actually allowed to run on it. The fact that this didn't seem to be needed before makes me think that signal handlers were almost _never_ actually running on the signal handling thread. Signed-off-by: Alyssa Ross <hi@alyssa.is>	2021-09-02 21:33:31 +01:00
Rob Bradford	c2144b5690	vmm, virtio-console: Move input reading into virtio-console thread Move the processing of the input from stdin, PTY or file from the VMM thread to the existing virtio-console thread. The handling of the resize of a virtio-console has not changed but the name of the struct used to support that has been renamed to reflect its usage. Fixes: #3060 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-09-02 21:17:33 +01:00
Henry Wang	0d01eac1d4	vmm: Do the downcast of GicDevice in a safer way for AArch64 Downcasting of GicDevice trait might fail. Therefore we try to downcast the trait first and only if the downcasting succeeded we can then use the object to call methods. Otherwise, do nothing and log the failure. Signed-off-by: Henry Wang <Henry.Wang@arm.com>	2021-09-02 15:18:41 +01:00
Rob Bradford	4d2a4e2805	vmm: Handle epoll events for PTYs separately Use two separate events for the console and serial PTY and then drive the handling of the inputs on the PTY separately. This results in the correct behaviour when both console and serial are attached to the PTY as they are triggered separately on the epoll so events are not lost. Fixes: #3012 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-08-25 13:33:32 +01:00
Rob Bradford	6233f6f68e	vmm: Send tty input to correct destination Check the config to find out which device is attached to the tty and then send the input from the user into that device (serial or virtio-console.) Fixes: #3005 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-08-25 10:08:25 +01:00
Bo Chen	7d38a1848b	virtio-devices, vmm: Fix the '--seccomp false' option We are relying on applying empty 'seccomp' filters to support the '--seccomp false' option, which will be treated as an error with the updated 'seccompiler' crate. This patch fixes this issue by explicitly checking whether the 'seccomp' filter is empty before applying the filter. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-08-18 10:42:19 +02:00
Bo Chen	08ac3405f5	virtio-devices, vmm: Move to the seccompiler crate Fixes: #2929 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-08-18 10:42:19 +02:00
Rob Bradford	53b2e19934	vmm: Add support for hotplugging user devices Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-08-12 13:19:04 +01:00
Henry Wang	5a0a4bc505	arch: Add optional `distance-map` node to FDT The optional device tree node distance-map describes the relative distance (memory latency) between all NUMA nodes. Signed-off-by: Henry Wang <Henry.Wang@arm.com>	2021-08-12 10:49:02 +02:00
Henry Wang	165364e08b	vmm: Move NUMA node data structures to `arch` This is to make sure the NUMA node data structures can be accessed both from the `vmm` crate and `arch` crate. Signed-off-by: Henry Wang <Henry.Wang@arm.com>	2021-08-12 10:49:02 +02:00
Henry Wang	20aa811de7	vmm: Extend NUMA setup to more than ACPI The AArch64 platform provides a NUMA binding for the device tree, which means on AArch64 platform, the NUMA setup can be extended to more than the ACPI feature. Based on above, this commit extends the NUMA setup and data structures to following scenarios: - All AArch64 platform - x86_64 platform with ACPI feature enabled Signed-off-by: Henry Wang <Henry.Wang@arm.com> Signed-off-by: Michael Zhao <Michael.Zhao@arm.com>	2021-08-12 10:49:02 +02:00
Sebastien Boeuf	5a83ebce64	vmm: Notify Migratable objects about migration being complete Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-08-10 12:36:58 -07:00
Sebastien Boeuf	06729bb3ba	vmm: Provide a restoring state to the DeviceManager In anticipation for creating vhost-user devices in a different way when being restored compared to a fresh start, this commit introduces a new boolean created by the Vm depending on the use case, and passed down to the DeviceManager. In the future, the DeviceManager will use this flag to assess how vhost-user devices should be created. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-08-10 12:36:58 -07:00
Henry Wang	7fb980f17b	arch, vmm: Pass cpu topology configuration to FDT In an Arm system, the hierarchy of CPUs is defined through three entities that are used to describe the layout of physical CPUs in the system: - cluster - core - thread All these three entities have their own FDT node field. Therefore, this commit adds an AArch64-specific helper to pass the config from the Cloud Hypervisor command line to the `configure_system`, where eventually the `create_fdt` is called. Signed-off-by: Henry Wang <Henry.Wang@arm.com>	2021-08-05 21:19:16 +08:00
Sebastien Boeuf	5c6139bbff	vmm: Finalize migration support for all devices Make sure the DeviceManager is triggered for all migration operations. The dirty pages are merged from MemoryManager and DeviceManager before being sent up to the Vmm in lib.rs. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-08-05 06:07:00 -07:00
Sebastien Boeuf	0411064271	vmm: Refactor migration through Migratable trait Now that Migratable provides the methods for starting, stopping and retrieving the dirty pages, we move the existing code to these new functions. No functional change intended. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-08-05 06:07:00 -07:00
Sebastien Boeuf	dcc646f5b1	clippy: Fix redundant allocations With the new beta version, clippy complains about redundant allocation when using Arc<Box<dyn T>>, and suggests replacing it simply with Arc<dyn T>. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-07-29 13:28:57 +02:00
Bo Chen	b00a6a8519	vmm: Create guest memory regions with explicit dirty-pages-log flags As we are now using a global control to start/stop dirty pages log from the `hypervisor` crate, we need to explicitly tell the hypervisor (KVM) whether a region needs dirty page tracking when it is created. This reverts commit `f063346de3`. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-07-28 09:08:32 -07:00
Bo Chen	ca09638491	vmm: Add CPUID compatibility check for snapshot/restore Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-07-28 09:26:02 +02:00
Bo Chen	f063346de3	vmm: Create guest memory regions without dirty-pages-log by default With the support of dynamically turning on/off dirty-pages-log during live-migration (only for guest RAM regions), we now can create guest memory regions without dirty-pages-log by default both for guest RAM regions and other regions backed by file/device. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-07-26 09:19:35 -07:00
Bo Chen	5e0d498582	hypervisor, vmm: Add dynamic control of logging dirty pages This patch extends slightly the current live-migration code path with the ability to dynamically start and stop logging dirty-pages, which relies on two new methods added to the `hypervisor::vm::Vm` Trait. This patch also contains a complete implementation of the two new methods based on `kvm` and placeholders for `mshv` in the `hypervisor` crate. Fixes: #2858 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-07-26 09:19:35 -07:00
Sebastien Boeuf	3e482c9c74	vmm: Limit physical address space for TDX When running TDX guest, the Guest Physical Address space is limited by a shared bit that is located on bit 47 for 4 level paging, and on bit 51 for 5 level paging (when GPAW bit is 1). In order to keep things simple, and since a 47 bits address space is 128TiB large, we limit the physical addressable space to 47 bits when running TDX. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-07-20 15:00:04 +02:00
Sebastien Boeuf	05f7651cf5	vmm: Force VIRTIO_F_IOMMU_PLATFORM when running TDX When running a TDX guest, we need the virtio drivers to use the DMA API to share specific memory pages with the VMM on the host. The point is to let the VMM get access to the pages related to the buffers pointed by the virtqueues. The way to force the virtio drivers to use the DMA API is by exposing the virtio devices with the feature VIRTIO_F_IOMMU_PLATFORM. This is a feature indicating the device will require some address translation, as it will not deal directly with physical addresses. Cloud Hypervisor takes care of this requirement by adding a generic parameter called "force_iommu". This parameter value is decided based on the "tdx" feature gate, and then passed to the DeviceManager. It's up to the DeviceManager to use this parameter on every virtio device creation, which will imply setting the VIRTIO_F_IOMMU_PLATFORM feature. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-07-20 14:47:01 +02:00
Sebastien Boeuf	6b710209b1	numa: Add optional `sgx_epc_sections` field to NumaConfig This new option allows the user to define a list of SGX EPC sections attached to a specific NUMA node. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-07-09 14:45:30 +02:00
Sebastien Boeuf	17c99ae00a	vmm: Enable provisioning for SGX guest The guest can see that SGX supports provisioning as it is exposed through the CPUID. This patch enables the proper backing of this feature by having the host open the provisioning device and enable this capability through the hypervisor. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-07-07 14:56:38 +02:00
Sebastien Boeuf	5b6d424a77	arch, vmm: Fix TDVF section handling This patch fixes a few things to support TDVF correctly. The HOB memory resources must contain EFI_RESOURCE_ATTRIBUTE_ENCRYPTED attribute. Any section with a base address within the already allocated guest RAM must not be allocated. The list of TD_HOB memory resources should contain both TempMem and TdHob sections as well. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-07-06 11:47:43 +02:00
Henry Wang	4da3bdcd6e	vmm: Split restore device_manager and devices Signed-off-by: Henry Wang <Henry.Wang@arm.com>	2021-07-05 22:51:56 +02:00
Henry Wang	95ca4fb15e	vmm: vm: Enable snapshot/restore of GICv3ITS This commit enables the snapshot/restore of GICv3ITS in the process of VM snapshot/restore. Signed-off-by: Henry Wang <Henry.Wang@arm.com>	2021-07-05 22:51:56 +02:00
Wei Liu	1f2915bff0	vmm: hypervisor: split set_user_memory_region to two functions Previously the same function was used to both create and remove regions. This worked on KVM because it uses size 0 to indicate removal. MSHV has two calls -- one for creation and one for removal. It also requires having the size field available because it is not slot based. Split set_user_memory_region to {create/remove}_user_memory_region. For KVM they still use set_user_memory_region underneath, but for MSHV they map to different functions. This fixes user memory region removal on MSHV. Signed-off-by: Wei Liu <liuwe@microsoft.com>	2021-07-05 09:45:45 +02:00
Michael Zhao	239e39ddbc	vmm: Fix clippy warnings on AArch64 Signed-off-by: Michael Zhao <michael.zhao@arm.com>	2021-06-24 08:59:53 -07:00
Bo Chen	5768dcc320	vmm: Refactor slightly `vm_boot` and 'control_loop' It ensures all handlers for `ApiRequest` in `control_loop` are consistent and minimum and should read better. No functional changes. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-24 16:01:39 +02:00
Bo Chen	c4be0f4235	clippy: Address the issue 'needless-collect' error: avoid using `collect()` when not needed --> vmm/src/vm.rs:630:86 \| 630 \| let node_id_list: Vec<u32> = configs.iter().map(\|cfg\| cfg.guest_numa_id).collect(); \| ^^^^^^^ ... 664 \| if !node_id_list.contains(&dest) { \| ---------------------------- the iterator could be used here instead \| = note: `-D clippy::needless-collect` implied by `-D warnings` = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_collect Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-24 08:55:43 +02:00
Bo Chen	5825ab2dd4	clippy: Address the issue 'needless-borrow' Issue from beta version of clippy: Error: --> vm-virtio/src/queue.rs:700:59 \| 700 \| if let Some(used_event) = self.get_used_event(&mem) { \| ^^^^ help: change this to: `mem` \| = note: `-D clippy::needless-borrow` implied by `-D warnings` = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-24 08:55:43 +02:00
Sebastien Boeuf	a36ac96444	vmm: cpu_manager: Add _PXM ACPI method to each vCPU In order to allow a hotplugged vCPU to be assigned to the correct NUMA node in the guest, the DSDT table must expose the _PXM method for each vCPU. This method defines the proximity domain to which each vCPU should be attached to. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-06-17 16:08:46 +02:00
Jianyong Wu	b8b5dccfd8	aarch64: Enable UEFI image loading Implemented an architecture specific function for loading UEFI binary. Changed the logic of loading kernel image: 1. First try to load the image as kernel in PE format; 2. If failed, try again to load it as formatless UEFI binary. Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>	2021-06-09 18:36:59 +08:00
Rob Bradford	3dc15a9259	vmm: tdx: Don't access same locked structure twice Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-06-03 17:29:05 +02:00
Bo Chen	2c4fa258a6	virtio-devices, vmm: Deprecate "GuestMemory::with_regions(_mut)" Function "GuestMemory::with_regions(_mut)" were mainly temporary methods to access the regions in `GuestMemory` as the lack of iterator-based access, and hence they are deprecated in the upstream vm-memory crate [1]. [1] https://github.com/rust-vmm/vm-memory/issues/133 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-03 08:34:45 +01:00
Bo Chen	b5bcdbaf48	misc: Upgrade to use the vm-memory crate w/ dirty-page-tracking As the first step to complete live-migration with tracking dirty-pages written by the VMM, this commit patches the dependent vm-memory crate to the upstream version with the dirty-page-tracking capability. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter to specify what 'bitmap backend' is used. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-03 08:34:45 +01:00
Rob Bradford	c357adae44	vmm: tdx: Clear unsupported KVM PV features This matches with the features that QEMU clears as they are not supported with TDX. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-06-01 23:00:54 +02:00
Michael Zhao	7f3fa39d81	vmm: Remove enable_interrupt_controller() After adding "get_interrupt_controller()" function in DeviceManager, "enable_interrupt_controller()" became redundant, because the latter one is a simple wrapper on the interrupt controller. Signed-off-by: Michael Zhao <michael.zhao@arm.com>	2021-06-01 16:56:43 +01:00
Michael Zhao	9a5f3fc2a7	vmm: Remove "gicr" handling from DeviceManager The function used to calculate "gicr-typer" value has nothing to do with DeviceManager. Now it is moved to AArch64 specific files. Signed-off-by: Michael Zhao <michael.zhao@arm.com>	2021-06-01 16:56:43 +01:00
Michael Zhao	7932cd22ca	vmm: Remove GIC entity set/get from DeviceManager Moved the set/get functions from vmm::DeviceManager to devices::Gic. Signed-off-by: Michael Zhao <michael.zhao@arm.com>	2021-06-01 16:56:43 +01:00
Michael Zhao	195eba188a	vmm: Split create_gic() from configure_system() Signed-off-by: Michael Zhao <michael.zhao@arm.com>	2021-06-01 16:56:43 +01:00
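
Commit 0d01eac1d4 in the log above describes downcasting the GicDevice trait object first and only calling methods if the downcast succeeded, logging the failure otherwise. The following is a minimal sketch of that general pattern using `std::any::Any`, not the code from the commit: the `Device` trait, the `GicV3` and `OtherDevice` structs, the `vcpu_count` field, and the `gic_vcpu_count` helper are all hypothetical names chosen for illustration.

```rust
use std::any::Any;

// Hypothetical stand-in for a device trait that exposes itself as `Any`
// so that callers can attempt a checked downcast.
trait Device {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical concrete device we want to reach through the trait object.
struct GicV3 {
    vcpu_count: u64,
}

// Hypothetical unrelated device, used to show the failing path.
struct OtherDevice;

impl Device for GicV3 {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Device for OtherDevice {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the vCPU count if `dev` is a `GicV3`.
/// On any other type it logs to stderr and returns `None` instead of panicking.
fn gic_vcpu_count(dev: &dyn Device) -> Option<u64> {
    match dev.as_any().downcast_ref::<GicV3>() {
        Some(gic) => Some(gic.vcpu_count),
        None => {
            eprintln!("device is not a GicV3, skipping");
            None
        }
    }
}
```

Calling `gic_vcpu_count(&GicV3 { vcpu_count: 4 })` yields `Some(4)`, while passing `&OtherDevice` yields `None` and leaves the caller free to carry on. The checked `downcast_ref` is what replaces an `unwrap()` that could abort the VMM.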


414 Commits
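
The clippy `needless_collect` report quoted in commit c4be0f4235 flags a `Vec<u32>` that is collected only so that `contains` can be called on it. The sketch below shows one way such a warning is commonly resolved; the actual change in that commit may differ. `NumaConfig` here is a pared-down hypothetical struct with only the `guest_numa_id` field named in the error message, and both function names are made up for illustration.

```rust
// Hypothetical, reduced stand-in for the NUMA config type; only the field
// named in the clippy message is modelled.
struct NumaConfig {
    guest_numa_id: u32,
}

/// The flagged shape: builds a temporary Vec just to test membership.
fn has_node_collect(configs: &[NumaConfig], dest: u32) -> bool {
    let node_id_list: Vec<u32> = configs.iter().map(|cfg| cfg.guest_numa_id).collect();
    node_id_list.contains(&dest)
}

/// The iterator-only shape: same result, no intermediate allocation.
fn has_node_iter(configs: &[NumaConfig], dest: u32) -> bool {
    configs.iter().any(|cfg| cfg.guest_numa_id == dest)
}
```

Both functions return the same answer for any input. The second stops at the first match and allocates nothing, which is the behaviour the lint is nudging towards.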