cloud-hypervisor

mirror of https://github.com/cloud-hypervisor/cloud-hypervisor.git synced 2025-01-19 02:55:20 +00:00

Author	SHA1	Message	Date
Bo Chen	7839e121f6	vmm: Add dirty pages tracked by vm_memory::bitmap to live migration Live migration currently handles guest memory writes from the guest through the KVM dirty page tracking and sends those dirty pages to the destination. This patch augments the live migration support with dirty page tracking of writes from the VMM to the guest memory(e.g. virtio devices). Fixes: #2458 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-03 08:34:45 +01:00
Bo Chen	2c4fa258a6	virtio-devices, vmm: Deprecate "GuestMemory::with_regions(_mut)" The "GuestMemory::with_regions(_mut)" functions were mainly temporary methods to access the regions in `GuestMemory` given the lack of iterator-based access, and hence they are deprecated in the upstream vm-memory crate [1]. [1] https://github.com/rust-vmm/vm-memory/issues/133 Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-03 08:34:45 +01:00
Bo Chen	b5bcdbaf48	misc: Upgrade to use the vm-memory crate w/ dirty-page-tracking As the first step to complete live-migration with tracking dirty-pages written by the VMM, this commit patches the dependent vm-memory crate to the upstream version with the dirty-page-tracking capability. Most changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and `MmapRegion` structs which are taking an additional generic type parameter to specify what 'bitmap backend' is used. The above changes should be transparent to the rest of the code base, e.g. all unit/integration tests should pass without additional changes. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-06-03 08:34:45 +01:00
Rob Bradford	f840327ffb	vmm: Version MemoryManager state Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-05-21 15:29:52 +02:00
Rob Bradford	496ceed1d0	misc: Remove unnecessary "extern crate" Now all crates use edition = "2018" then the majority of the "extern crate" statements can be removed. Only those for importing macros need to remain. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-05-12 17:26:11 +02:00
Rob Bradford	b8f5911c4e	misc: Remove unused errors from public interface Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-05-11 13:37:19 +02:00
Mikko Ylinen	3b18caf229	sgx: update virt EPC device path and docs The latest kvm-sgx code has renamed sgx_virt_epc device node to sgx_vepc. Update cloud-hypervisor code and documentation to follow this. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-04-30 16:16:01 +02:00
Rob Bradford	01f0c1e313	vmm: Simplify memory state to support Versionize In order to support using Versionize for state structures it is necessary to use simpler, primitive, data types in the state definitions used for snapshot restore. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-04-23 14:24:16 +01:00
Rob Bradford	a7c4483b8b	vmm: Directly (de)serialise CpuManager, DeviceManager and MemoryManager state Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-04-20 18:58:37 +02:00
Bo Chen	78796f96b7	vmm: Refine the granularity of dirty memory tracking Instead of tracking on a block level of 64 pages, we are now collecting dirty pages one by one. It improves the efficiency of dirty memory tracking during live migration. Signed-off-by: Bo Chen <chen.bo@intel.com>	2021-04-19 17:17:14 +02:00
Anatol Belski	e1cc702327	memory_manager: Fix address range calculation in MemorySlot The MCRS method returns a 64-bit memory range descriptor. The calculation is supposed to be done as follows: max = min + len - 1 However, every operand is represented not as a QWORD but as combination of two DWORDs for high and low part. Till now, the calculation was done this way, please see also inline comments: max.lo = min.lo + len.lo //this may overflow, need to carry over to high max.hi = min.hi + len.hi max.hi = max.hi - 1 // subtraction needs to happen on the low part This calculation has been corrected the following way: max.lo = min.lo + len.lo max.hi = min.hi + len.hi + (max.lo < min.lo) // check for overflow max.lo = max.lo - 1 // subtract from low part The relevant part from the generated ASL for the MCRS method: ``` Method (MCRS, 1, Serialized) { Acquire (MLCK, 0xFFFF) \_SB.MHPC.MSEL = Arg0 Name (MR64, ResourceTemplate () { QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite, 0x0000000000000000, // Granularity 0x0000000000000000, // Range Minimum 0xFFFFFFFFFFFFFFFE, // Range Maximum 0x0000000000000000, // Translation Offset 0xFFFFFFFFFFFFFFFF, // Length ,, _Y00, AddressRangeMemory, TypeStatic) }) CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL) // _MIN: Minimum Base Address CreateDWordField (MR64, 0x12, MINH) CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL) // _MAX: Maximum Base Address CreateDWordField (MR64, 0x1A, MAXH) CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL) // _LEN: Length CreateDWordField (MR64, 0x2A, LENH) MINL = \_SB.MHPC.MHBL MINH = \_SB.MHPC.MHBH LENL = \_SB.MHPC.MHLL LENH = \_SB.MHPC.MHLH MAXL = (MINL + LENL) /* \_SB_.MHPC.MCRS.LENL */ MAXH = (MINH + LENH) /* \_SB_.MHPC.MCRS.LENH */ If ((MAXL < MINL)) { MAXH += One /* \_SB_.MHPC.MCRS.MAXH */ } MAXL -= One Release (MLCK) Return (MR64) /* \_SB_.MHPC.MCRS.MR64 */ } ``` Fixes #1800. Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>	2021-04-12 16:20:19 +02:00
Rob Bradford	9762c8bc28	vmm: Address Rust 1.51.0 clippy issue (upper_case_acronyms) warning: name `LocalAPIC` contains a capitalized acronym --> vmm/src/cpu.rs:197:8 | 197 | struct LocalAPIC { | ^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `LocalApic` | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-03-26 11:32:09 +00:00
Rob Bradford	db6516931d	acpi_tables: Address Rust 1.51.0 clippy issue (upper_case_acronyms) error: name `SDT` contains a capitalized acronym --> acpi_tables/src/sdt.rs:27:12 | 27 | pub struct SDT { | ^^^ help: consider making the acronym lowercase, except the initial letter: `Sdt` | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-03-26 11:32:09 +00:00
Rob Bradford	ab4b30edd3	vmm: Switch MemoryManager::send() to url_to_path() This continues the work in cc78a597cd579f49971d379626021300196ef548 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-03-12 16:52:55 +01:00
Rob Bradford	cc78a597cd	vmm: Simplify snapshot/restore path handling Extend the existing url_to_path() to take the URL string and then use that to simplify the snapshot/restore code paths. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-03-12 13:03:01 +01:00
Rob Bradford	b02aff5761	vmm: memory_manager: Disable dirty page logging when running on TDX It is not permitted to have this enabled in memory that is part of a TD. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-03-08 18:30:00 +00:00
Sebastien Boeuf	d6db2fdf96	vmm: memory_manager: Add ACPI hotplug region to default memory zone When memory is resized through ACPI, a new region is added to the guest memory. This region must also be added to the corresponding memory zone in order to keep everything in sync. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2021-03-05 10:38:42 +01:00
Rob Bradford	deedfcdc35	vmm: Improve restore error message about URL conversion Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-02-23 11:07:48 +00:00
Rob Bradford	38c41a5074	vmm: memory_manager: Extract code for allocating new memory This function can then be used by the TDX code to allocate the memory at specific locations required for the TDVF to run from. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-02-16 18:38:57 +01:00
Rob Bradford	7928a697dc	vmm: Support configurable huge pages in MemoryManager Use the newly added hugepages_size option if provided by the user to pick a huge page size when creating the memfd region. If none is specified use the system default. Sadly different huge pages cannot be tested by an integration test as creating a pool of the non-default size cannot be done at runtime (requires kernel to be booted with certain parameters.) TEST=Manually tested with a kernel booted with both 1GiB and 2MiB huge pages (hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=512) Fixes: #2230 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-02-05 09:24:02 +00:00
Rob Bradford	29607f38ad	vmm: config: Add a hugepage_size option This allows the user to use an alternative huge page size otherwise the default size will be used. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-02-05 09:24:02 +00:00
Sebastien Boeuf	c397c9c95e	vmm, virtio-devices: mem: Don't use MADV_DONTNEED on hugepages This commit introduces a new information to the VirtioMemZone structure in order to know if the memory zone is backed by hugepages. Based on this new information, the virtio-mem device is now able to determine if madvise(MADV_DONTNEED) should be performed or not. The madvise documentation specifies that MADV_DONTNEED advice will fail if the memory range has been allocated with some hugepages. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com> Signed-off-by: Hui Zhu <teawater@antfin.com>	2021-02-04 17:52:30 +00:00
Rob Bradford	9ea087597f	vmm: acpi: Ensure field size address matches ResourceTag dsdt.dsl 960: CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL) // _MIN: Minimum Base Address Warning 3128 - ResourceTag larger than Field ^ (Size mismatch, Tag: 64 bits, Field: 32 bits) dsdt.dsl 962: CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL) // _MAX: Maximum Base Address Warning 3128 - ResourceTag larger than Field ^ (Size mismatch, Tag: 64 bits, Field: 32 bits) dsdt.dsl 964: CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL) // _LEN: Length Warning 3128 - ResourceTag larger than Field ^ (Size mismatch, Tag: 64 bits, Field: 32 bits) Fixes: #2216 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-01-28 14:30:34 +01:00
Rob Bradford	c29caf2a85	vmm: acpi: Fix incorrect mutex timeout value The mutex timeout should be 0xffff rather than 0xfff to disable the timeout feature. dsdt.dsl 745: Acquire (\_SB.PRES.CPLK, 0x0FFF) Warning 3130 - ^ Result is not used, possible operator timeout will be missed dsdt.dsl 767: Acquire (\_SB.PRES.CPLK, 0x0FFF) Warning 3130 - ^ Result is not used, possible operator timeout will be missed dsdt.dsl 775: Acquire (\_SB.PRES.CPLK, 0x0FFF) Warning 3130 - ^ Result is not used, possible operator timeout will be missed Fixes: #2216 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-01-28 14:30:34 +01:00
Rob Bradford	6006068951	vmm: acpi: Move MemoryManager ACPI device to an MMIO address Migrate the MemoryManager from a fixed I/O port address to an allocated MMIO address. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-01-22 16:08:41 +01:00
Mikko Ylinen	f583aa9d30	sgx: update virt EPC device path and docs Based on the LKML feedback, the devices under /dev/sgx/* are not justified. SGX RFC v40 moves the SGX device nodes to /dev/sgx_* and this is reflected in kvm-sgx (next branch) too. Update cloud-hypervisor code and documentation to follow this. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2021-01-05 15:48:17 +00:00
Rob Bradford	fabd63072b	misc: Remove unnecessary literal casts Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-01-04 13:46:37 +01:00
Rob Bradford	faba6a3fb3	vmm: memory_manager: Use workaround for conditional function arguments With Rust 1.49 using attributes on a function parameter is not allowed. The recommended workaround is to put it in a new block. error[E0658]: attributes on expressions are experimental --> vmm/src/memory_manager.rs:698:17 | 698 | #[cfg(target_arch = "x86_64")] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = note: see issue #15701 <https://github.com/rust-lang/rust/issues/15701> for more information error: removing an expression is not supported in this position --> vmm/src/memory_manager.rs:698:17 | 698 | #[cfg(target_arch = "x86_64")] | Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2021-01-04 13:46:37 +01:00
Rob Bradford	1fc6d50f3e	misc: Make Bus::write() return an Option<Arc<Barrier>> This can be used to indicate to the caller that it should wait on the barrier before returning as there is some asynchronous activity triggered by the write which requires the KVM exit to block until it's completed. This is useful for having vCPU thread wait for the VMM thread to proceed to activate the virtio devices. See #1863 Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2020-12-17 11:23:53 +00:00
Rob Bradford	c62e409827	memory_manager: Generate a MemoryRangeTable for dirty ranges In order to do this we must extend the MemoryManager API to add the ability to specify the tracking of the dirty pages when creating the userspace mappings and also keep track of the userspace mappings that have been created for RAM regions. Currently the dirty pages are collected into ranges based on a block level of 64 pages. The algorithm could be tweaked to create smaller ranges but for now if any page in the block of 64 is dirty the whole block is added to the range. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2020-11-17 16:57:11 +00:00
Rob Bradford	8baa244ec1	hypervisor: Add control for dirty page logging When creating a userspace mapping provide a control for enabling the logging of dirty pages. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2020-11-17 16:57:11 +00:00
Anatol Belski	b399287430	memory_manager: Make addressable space size 64k aligned While the addressable space size reduction of 4k is necessary due to the Linux bug, the 64k alignment of the addressable space size is required by Windows. This patch satisfies both. Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>	2020-11-16 16:39:11 +00:00
Rob Bradford	dfe2dadb3e	vmm: memory_manager: Make the snapshot source directory an Option This allows the code to be reused when creating the VM from a snapshot when doing VM migration. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2020-11-11 11:07:24 +01:00
Rob Bradford	cb88ceeae8	vmm: memory_manager: Move the restoration of guest memory later Rather than filling the guest memory from a file at the point of the guest memory region being created, instead fill from the file later. This simplifies the region creation code but also adds flexibility for sourcing the guest memory from a source other than an on disk file. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2020-10-30 12:31:47 +01:00
Rob Bradford	21db6f53c8	vmm: memory_manager: Write all guest region to disk As a mirror of bdbea19e239b22e781a7df3caf8db04675e15553 which ensured that GuestMemoryMmap::read_exact_from() was used to read all the file to the region ensure that all the guest memory region is written to disk. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2020-10-27 12:11:31 -07:00
Sebastien Boeuf	7e127df415	vmm: memory_manager: Replace 'ext_region' by 'saved_region' Any occurrence of a variable containing `ext_region` is replaced with the less confusing name `saved_region`. The point is to clearly identify the memory regions that might have been saved during a snapshot, while the `ext` standing for `external` was pretty unclear. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2020-10-23 21:59:52 +02:00
Sebastien Boeuf	c0e8e5b53f	vmm: memory_manager: Replace 'backing_file' variable names In the context of saving the memory regions content through snapshot, using the term "backing file" brings confusion with the actual backing file that might back the memory mapping. To avoid such conflicting naming, the 'backing_file' field from the MemoryRegion structure gets replaced with 'content', as this is designating the potential file containing the memory region data. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2020-10-23 21:59:52 +02:00
Rob Bradford	bdbea19e23	vmm: memory_manager: Completely fill guest ram from snapshot Use GuestRegionMmap::read_exact_from() to ensure that all of the file is read into the guest. This addresses an issue where GuestRegionMmap::read_from() was only copying the first 2GiB of the memory and so led to snapshot-restore failing when the guest RAM was 2GiB or greater. This change also propagates any error from the copying upwards. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2020-10-23 17:56:19 +01:00
Rob Bradford	a60b437f89	vmm: memory_manager: Always copy anonymous RAM regions from disk When restoring if a region of RAM is backed by anonymous memory i.e from memfd_create() then copy the contents of the ram from the file that has been saved to disk. Previously the code would map the memory from that file into the guest using a MAP_PRIVATE mapping. This has the effect of minimising the restore time but provides an issue where the restored VM does not have the same structure as the snapshotted VM, in particular memory is backed by files in the restored VM that were anonymously backed in the original. This creates two problems: * The snapshot data is mapped from files for the pages of the guest which prevents the storage from being reclaimed. * When snapshotting again the guest memory will not be correctly saved as it will have looked like it was backed by a file so it will not be written to disk but as it is a MAP_PRIVATE mapping the changes will never be written to the disk again. This results in incorrect behaviour. Signed-off-by: Rob Bradford <robert.bradford@intel.com>	2020-10-23 12:34:32 +02:00
Sebastien Boeuf	3594685279	vmm: Move balloon code from MemoryManager to DeviceManager Now that we have a new dedicated way of asking for a balloon through the CLI and the REST API, we can move all the balloon code to the device manager. This allows us to simplify the memory manager, which is already quite complex. It also simplifies the behavior of the balloon resizing command. Instead of providing the expected size for the RAM, which is complex when memory zones are involved, it now expects the balloon size. This is a much more straightforward behavior as it really resizes the balloon to the desired size. In addition to the simplification, the benefit of this approach is that it does not need to be tied to the memory manager at all. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2020-10-22 16:33:16 +02:00
Sebastien Boeuf	aec88e20d7	vmm: memory_manager: Rely on physical bits for address space size If the user provided a maximum physical bits value for the vCPUs, the memory manager will adapt the guest physical address space accordingly so that devices are not placed further than the specified value. It's important to note that if the number exceed what is available on the host, the smaller number will be picked. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2020-10-13 18:58:36 +02:00
Bo Chen	e9738a4a49	vmm: Replace the use of 'unchecked_add' with 'checked_add' The 'GuestAddress::unchecked_add' function has undefined behavior when an overflow occurs. Its alternative 'checked_add' requires us to handle the overflow explicitly. Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-10-13 12:09:22 +02:00
Bo Chen	9ab2a34b40	vmm: Remove reserved 256M gaps for hotplugging memory with ACPI We are now reserving a 256M gap in the guest address space each time when hotplugging memory with ACPI, which prevents users from hotplugging memory to the maximum size they requested. We confirm that there is no need to reserve this gap. This patch removes the 'reserved gaps'. It also refactors the 'MemoryManager::start_addr' so that it is rounding-up to 128M alignment when hotplugged memory is allowed with ACPI. Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-10-13 12:09:22 +02:00
Bo Chen	10f380f95b	vmm: Report no error when resizing to current memory size with ACPI We now try to create a ram region of size 0 when the requested memory size is the same as current memory size. It results in an error of `GuestMemoryRegion(Mmap(Os { code: 22, kind: InvalidInput, message: "Invalid argument" }))`. This error is not meaningful to users and we should not report it. Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-10-12 08:46:38 +02:00
Bo Chen	789ee7b3e4	vmm: Support resizing memory up to and including hotplug size The start address after the hotplugged memory can be the start address of device area. Fixes: #1803 Signed-off-by: Bo Chen <chen.bo@intel.com>	2020-10-10 09:51:32 +02:00
Hui Zhu	c75f8b2f89	virtio-balloon: Add memory_actual_size to vm.info to show memory actual size virtio-balloon changes the memory size asynchronously. VirtioBalloonConfig.actual of the balloon device shows the current balloon size. This commit adds memory_actual_size to vm.info to show the actual memory size. Signed-off-by: Hui Zhu <teawater@antfin.com>	2020-10-01 17:46:30 +02:00
Praveen Paladugu	f10872e706	vmm: fix clippy warnings Signed-off-by: Praveen Paladugu <prapal@microsoft.com>	2020-09-26 14:07:12 +01:00
Josh Soref	5c3f4dbe6f	ch: Fix various misspelled words Misspellings were identified by https://github.com/marketplace/actions/check-spelling * Initial corrections suggested by Google Sheets * Additional corrections by Google Chrome auto-suggest * Some manual corrections Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>	2020-09-23 08:59:31 +01:00
Jiangbo Wu	223189c063	mm: Apply zone's property instead of global config Apply memory zone's property for associated virtio-mem regions. Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>	2020-09-22 09:56:37 +02:00
Jiangbo Wu	80be8ac0dc	mm: Apply memory policy for virtio-mem region Use zone.host_numa_node to create memory zone, so that memory zone can apply memory policy in accordance with host numa node ID Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>	2020-09-22 09:56:37 +02:00
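
The carry handling described in commit e1cc702327 (max = min + len - 1, with each 64-bit operand held as two 32-bit halves) can be sketched in Rust as follows. This is only an illustration under stated assumptions, not the code from the repository: `range_max`, `split` and `join` are hypothetical helper names, and the sketch additionally borrows from the high half when the low half is zero, which the commit message does not discuss.

```rust
/// Hypothetical helper: inclusive end of a range, max = min + len - 1,
/// where every 64-bit value is passed as a (hi, lo) pair of 32-bit halves.
fn range_max(min: (u32, u32), len: (u32, u32)) -> (u32, u32) {
    let (min_hi, min_lo) = min;
    let (len_hi, len_lo) = len;

    // Add the low halves; `carry` is true when the 32-bit sum wrapped around.
    let (sum_lo, carry) = min_lo.overflowing_add(len_lo);
    // Add the high halves plus the carry out of the low half.
    let sum_hi = min_hi.wrapping_add(len_hi).wrapping_add(carry as u32);

    // Subtract one from the low half, borrowing from the high half
    // when the low half is zero (an addition made by this sketch).
    let (max_lo, borrow) = sum_lo.overflowing_sub(1);
    let max_hi = sum_hi.wrapping_sub(borrow as u32);
    (max_hi, max_lo)
}

/// Hypothetical helper: split a 64-bit value into (hi, lo) halves.
fn split(v: u64) -> (u32, u32) {
    ((v >> 32) as u32, v as u32)
}

/// Hypothetical helper: join (hi, lo) halves back into a 64-bit value.
fn join(p: (u32, u32)) -> u64 {
    ((p.0 as u64) << 32) | p.1 as u64
}

fn main() {
    // The low halves overflow here, so the carry must reach the high half.
    let min = 0x0000_0001_FFFF_F000u64;
    let len = 0x0000_0000_0000_2000u64;
    let max = join(range_max(split(min), split(len)));
    assert_eq!(max, min + len - 1);
    println!("{:#x}", max);
}
```

Comparing the result against plain 64-bit arithmetic, as `main` does, is a quick way to check that the carry reaches the high half.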


248 Commits