Commit Graph

244 Commits

Author SHA1 Message Date
Rob Bradford
715a7d9065 vmm: Add convenience API for getting slots to FDs mapping
This will be used for sending those file descriptors for local
migration.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
b95e46565c vmm: Support using existing files for memory slots
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
eeba1d3ad8 vmm: Support using an existing FD for memory
If this FD (wrapped in a File) is supplied when the RAM region is being
created use that over creating a new one.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
271e17bd79 vmm: Extract code for opening a file for memory
This function is used to open an FD (wrapped in a File) that points to
guest memory from memfd_create() or backed on the filesystem.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Sebastien Boeuf
7bb343dce8 vmm: Improve logging related to memory management
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-12-04 23:04:32 +01:00
Ziye Yang
b09cbb8493 vmm: Add constant SGX_PAGE_SIZE in memory_manager.rs
Purpose: Do not directly use 0x1000 but use predefined constant value.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
2021-12-03 10:06:15 +00:00
Rob Bradford
185f0c1bf3 vmm: Port MemoryManager to Aml::append_aml_bytes()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-08 16:46:30 +00:00
Rob Bradford
a2e02a8fff vmm: Add SGX section creation logging
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
def98faf37 vmm, vm-allocator: Introduce an allocator for platform devices
This allocator allocates 64-bit MMIO addresses for use with platform
devices e.g. ACPI control devices and ensures there is no overlap with
PCI address space ranges which can cause issues with PCI device
remapping.

Use this allocator the ACPI platform devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
afe95e5a2a vmm: Use an allocator specifically for RAM regions
Rather than use the system MMIO allocator for RAM use an allocator that
covers the full RAM range.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
b8fee11822 vmm: Place SGX EPC region between RAM and device area
Increase the start of the device area to accomodate the SGX EPC area.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
e20be3e147 vmm: Check hotplug memory against end of RAM not start of device area
This is because the SGX region will be placed between the end of ram and
the start of the device area.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
ec81f377b6 vmm: Refactor SGX setup to inside MemoryManager::new()
This makes it possible to manually allocate the SGX region after the end
of RAM region.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Sebastien Boeuf
58d8206e2b migration: Use MemoryManager restore code path
Instead of creating a MemoryManager from scratch, let's reuse the same
code path used by snapshot/restore, so that memory regions are created
identically to what they were on the source VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
1e1e61614c vmm: memory_manager: Leverage new codepath for snapshot/restore
Now that all the pieces are in place, we can restore a VM with the new
codepath that restores properly all memory regions, allowing for ACPI
memory hotplug to work properly with snapshot/restore feature.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
6a55768d94 vmm: Create MemoryManager from restore data
Extending the MemoryManager::new() function to be able to create a
MemoryManager from data that have been previously stored instead of
always creating everything from scratch.

This change brings real added value as it allows a VM to be restored
respecting the proper memory layout instead of hoping the regions will
be created the way they were before.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
5b177b205b arch, vmm: Extend the data being snapshot
Storing multiple data coming from the MemoryManager in order to be able
to restore without creating everything from scratch.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
f440976a7c vmm: memory_manager: Add a way to restore memory regions properly
This new function will be able to restore memory regions and memory
zones based on the GuestMemoryMapping list that will be provided through
snapshot/restore and migration phases.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
0d573ae86c vmm: memory_manager: Add file_offset to GuestRamMapping
This will help restoring the region with the correct file offset for the
memory mapping.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
01420f5195 vmm: memory_manager: Add virtio_mem to GuestRamMapping
This will help identify if the range belongs to a virtio-mem region or
not.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
dfb1829f65 vmm: memory_manager: Add zone_id to GuestRamMapping
This can help identifying which zone relates to which memory range.
This is going to be useful when recreating GuestMemory regions from
the previous layout instead of having to recreate everything from
scratch.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
b5d11f72b3 vmm: memory_manager: Factorize allocation of ranges
Create a dedicated function to factorize the allocation of the memory
ranges, and helping with the simplification of MemoryManager::new()
function.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
00951f17d4 vmm: memory_manager: Simplify regions creation
By updating the list of GuestMemory regions with the virtio-mem ones
before the creation of the MemoryManager, we know the GuestMemory is up
to date and the allocation of memory ranges is simplified afterwards.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
63c6c78c4e vmm: memory_manager: Factorize configuration validation
In order to simplify MemoryManager::new() function. let's move the
memory configuration validation to its own function.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Yu Li
08021087ec vmm: add prefault option in memory and memory-zone
The argument `prefault` is provided in MemoryManager, but it can
only be used by SGX and restore.
With prefault (MAP_POPULATE) been set, subsequent page faults will
decrease during running, although it will make boot slower.

This commit adds `prefault` in MemoryConfig and MemoryZoneConfig.
To resolve conflict between memory and restore, argument
`prefault` has been changed from `bool` to `Option<bool>`, when
its value is None, config from memory will be used, otherwise
argument in Option will be used.

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2021-09-29 14:17:35 +02:00
Sebastien Boeuf
59031531b6 vmm: Simplify the way memory is snapshot and restored
By using a single file for storing the memory ranges, we simplify the
way snapshot/restore works by avoiding multiples files, but the main and
more important point is that we have now a way to save only the ranges
that matter. In particular, the ranges related to virtio-mem regions are
not always fully hotplugged, meaning we don't want to save the entire
region. That's where the usage of memory ranges is interesting as it
lets us optimize the snapshot/restore process when one or multiple
virtio-mem regions are involved.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
1ea63f50a1 vmm: Move MemoryRangeTable creation to the MemoryManager
The function memory_range_table() will be reused by the MemoryManager in
a following patch to describe all the ranges that we should snapshot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
e390775bcb vmm, virtio-devices: Move BlocksState creation to the MemoryManager
By creating the BlocksState object in the MemoryManager, we can directly
provide it to the virtio-mem device when being created. This will allow
the MemoryManager through each VirtioMemZone to have a handle onto the
blocks that are plugged at any point in time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
a1caa6549a vmm: Add page size as a parameter for MemoryRangeTable::from_bitmap()
This will be helpful to support the creation of a MemoryRangeTable from
virtio-mem, as it uses 2M pages.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
d7115ec656 virtio-devices: mem: Add snapshot/restore support
Adding the snapshot/restore support along with migration as well,
allowing a VM with virtio-mem devices attached to be properly
migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
7bbcc0f849 vmm: memory_manager: Make sure the hotplugged_size is up to date
The amount of memory plugged in the virtio-mem region should always be
kept up to date in the hotplugged_size field from VirtioMemZone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
c4dc7a583d vmm: memory_manager: Simplify the MemoryManager structure
There's no need to duplicate the GuestMemory for snapshot purpose, as we
always have a handle onto the GuestMemory through the guest_memory
field.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
74485924b1 vmm: memory_manager: Simplification to avoid unnecessary locking
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Rob Bradford
171d12943d vmm: memory_manager: Increase robustness of MemoryManager control device
See: #1289

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-10 10:23:19 -07:00
Bo Chen
4f37a273d9 vmm: Fix clippy issue
error: all if blocks contain the same code at the end
   --> vmm/src/memory_manager.rs:884:9
    |
884 | /             Ok(mm)
885 | |         }
    | |_________^

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-09-08 13:31:19 -07:00
Sebastien Boeuf
0411064271 vmm: Refactor migration through Migratable trait
Now that Migratable provides the methods for starting, stopping and
retrieving the dirty pages, we move the existing code to these new
functions.

No functional change intended.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-05 06:07:00 -07:00
Sebastien Boeuf
79425b6aa8 vm-migration, vmm: Extend methods for MemoryRangeTable
In anticipation for supporting the merge of multiple dirty pages coming
from multiple devices, this patch factorizes the creation of a
MemoryRangeTable from a bitmap, as well as providing a simple method for
merging the dirty pages regions under a single MemoryRangeTable.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-08-05 06:07:00 -07:00
Muminul Islam
fdecba6958 hypervisor: MSHV needs gpa to retrieve dirty logs
Right now, get_dirty_log API has two parameters,
slot and memory_size.
MSHV needs gpa to retrieve the page states. GPA is
needed as MSHV returns the state base on PFN.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-29 16:29:53 +01:00
Bo Chen
b00a6a8519 vmm: Create guest memory regions with explicit dirty-pages-log flags
As we are now using an global control to start/stop dirty pages log from
the `hypervisor` crate, we need to explicitly tell the hypervisor (KVM)
whether a region needs dirty page tracking when it is created.

This reverts commit f063346de3.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-28 09:08:32 -07:00
Bo Chen
e7c9954dc1 hypervisor, vmm: Abstract the interfaces to start/stop dirty log
Following KVM interfaces, the `hypervisor` crate now provides interfaces
to start/stop the dirty pages logging on a per region basis, and asks
its users (e.g. the `vmm` crate) to iterate over the regions that needs
dirty pages log. MSHV only has a global control to start/stop dirty
pages log on all regions at once.

This patch refactors related APIs from the `hypervisor` crate to provide
a global control to start/stop dirty pages log (following MSHV's
behaviors), and keeps tracking the regions need dirty pages log for
KVM. It avoids leaking hypervisor-specific behaviors out of the
`hypervisor` crate.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-28 09:08:32 -07:00
Bo Chen
f063346de3 vmm: Create guest memory regions without dirty-pages-log by default
With the support of dynamically turning on/off dirty-pages-log during
live-migration (only for guest RAM regions), we now can create guest
memory regions without dirty-pages-log by default both for guest RAM
regions and other regions backed by file/device.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-26 09:19:35 -07:00
Bo Chen
5e0d498582 hypervisor, vmm: Add dynamic control of logging dirty pages
This patch extends slightly the current live-migration code path with
the ability to dynamically start and stop logging dirty-pages, which
relies on two new methods added to the `hypervisor::vm::Vm` Trait. This
patch also contains a complete implementation of the two new methods
based on `kvm` and placeholders for `mshv` in the `hypervisor` crate.

Fixes: #2858

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-26 09:19:35 -07:00
Sebastien Boeuf
9aedabe11e sgx: Add mandatory id field to SgxEpcConfig
In order to uniquely identify each SGX EPC section, we introduce a
mandatory option `id` to the `--sgx-epc` parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-09 14:45:30 +02:00
Sebastien Boeuf
17c99ae00a vmm: Enable provisioning for SGX guest
The guest can see that SGX supports provisioning as it is exposed
through the CPUID. This patch enables the proper backing of this
feature by having the host open the provisioning device and enable
this capability through the hypervisor.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-07 14:56:38 +02:00
Wei Liu
1f2915bff0 vmm: hypervisor: split set_user_memory_region to two functions
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.

MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.

Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.

This fixes user memory region removal on MSHV.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:45:45 +02:00
Sebastien Boeuf
09c3ddd47d vmm: memory_manager: Remove _PXM from ACPI memory slot
The _PXM method always return 0, which is wrong since the SRAT might
tell differently. The point of the _PXM method is to be evaluated by the
guest OS when some new memory slot is being plugged, but this will never
happen for Cloud Hypervisor since using NUMA nodes along with memory
hotplug only works for virtio-mem.

Memory hotplug through ACPI will only happen when there's only one NUMA
node exposed to the guest, which means the _PXM method won't be needed
at all.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-17 16:08:46 +02:00
Bo Chen
7839e121f6 vmm: Add dirty pages tracked by vm_memory::bitmap to live migration
Live migration currently handles guest memory writes from the guest
through the KVM dirty page tracking and sends those dirty pages to the
destination. This patch augments the live migration support with dirty
page tracking of writes from the VMM to the guest memory(e.g. virtio
devices).

Fixes: #2458

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Bo Chen
2c4fa258a6 virtio-devices, vmm: Deprecate "GuestMemory::with_regions(_mut)"
Function "GuestMemory::with_regions(_mut)" were mainly temporary methods
to access the regions in `GuestMemory` as the lack of iterator-based
access, and hence they are deprecated in the upstream vm-memory crate [1].

[1] https://github.com/rust-vmm/vm-memory/issues/133

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Bo Chen
b5bcdbaf48 misc: Upgrade to use the vm-memory crate w/ dirty-page-tracking
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.

The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Rob Bradford
f840327ffb vmm: Version MemoryManager state
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-21 15:29:52 +02:00
Rob Bradford
496ceed1d0 misc: Remove unnecessary "extern crate"
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-12 17:26:11 +02:00
Rob Bradford
b8f5911c4e misc: Remove unused errors from public interface
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-11 13:37:19 +02:00
Mikko Ylinen
3b18caf229 sgx: update virt EPC device path and docs
The latest kvm-sgx code has renamed sgx_virt_epc device node
to sgx_vepc. Update cloud-hypervisor code and documentation to
follow this.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-04-30 16:16:01 +02:00
Rob Bradford
01f0c1e313 vmm: Simplify memory state to support Versionize
In order to support using Versionize for state structures it is necessary
to use simpler, primitive, data types in the state definitions used for
snapshot restore.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-23 14:24:16 +01:00
Rob Bradford
a7c4483b8b vmm: Directly (de)serialise CpuManager, DeviceManager and MemoryManager state
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-20 18:58:37 +02:00
Bo Chen
78796f96b7 vmm: Refine the granularity of dirty memory tracking
Instead of tracking on a block level of 64 pages, we are now collecting
dirty pages one by one. It improves the efficiency of dirty memory
tracking while live migration.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-04-19 17:17:14 +02:00
Anatol Belski
e1cc702327 memory_manager: Fix address range calculation in MemorySlot
The MCRS method returns a 64-bit memory range descriptor. The
calculation is supposed to be done as follows:

max = min + len - 1

However, every operand is represented not as a QWORD but as combination
of two DWORDs for high and low part. Till now, the calculation was done
this way, please see also inline comments:

max.lo = min.lo + len.lo //this may overflow, need to carry over to high
max.hi = min.hi + len.hi
max.hi = max.hi - 1 // subtraction needs to happen on the low part

This calculation has been corrected the following way:

max.lo = min.lo + len.lo
max.hi = min.hi + len.hi + (max.lo < min.lo) // check for overflow
max.lo = max.lo - 1 // subtract from low part

The relevant part from the generated ASL for the MCRS method:
```
Method (MCRS, 1, Serialized)
{
    Acquire (MLCK, 0xFFFF)
    \_SB.MHPC.MSEL = Arg0
    Name (MR64, ResourceTemplate ()
    {
	QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
	    0x0000000000000000, // Granularity
	    0x0000000000000000, // Range Minimum
	    0xFFFFFFFFFFFFFFFE, // Range Maximum
	    0x0000000000000000, // Translation Offset
	    0xFFFFFFFFFFFFFFFF, // Length
	    ,, _Y00, AddressRangeMemory, TypeStatic)
    })
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL)  // _MIN: Minimum Base Address
    CreateDWordField (MR64, 0x12, MINH)
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL)  // _MAX: Maximum Base Address
    CreateDWordField (MR64, 0x1A, MAXH)
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL)  // _LEN: Length
    CreateDWordField (MR64, 0x2A, LENH)
    MINL = \_SB.MHPC.MHBL
    MINH = \_SB.MHPC.MHBH
    LENL = \_SB.MHPC.MHLL
    LENH = \_SB.MHPC.MHLH
    MAXL = (MINL + LENL) /* \_SB_.MHPC.MCRS.LENL */
    MAXH = (MINH + LENH) /* \_SB_.MHPC.MCRS.LENH */
    If ((MAXL < MINL))
    {
	MAXH += One /* \_SB_.MHPC.MCRS.MAXH */
    }

    MAXL -= One
    Release (MLCK)
    Return (MR64) /* \_SB_.MHPC.MCRS.MR64 */
}
```

Fixes #1800.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2021-04-12 16:20:19 +02:00
Rob Bradford
9762c8bc28 vmm: Address Rust 1.51.0 clippy issue (upper_case_acroynms)
warning: name `LocalAPIC` contains a capitalized acronym
   --> vmm/src/cpu.rs:197:8
    |
197 | struct LocalAPIC {
    |        ^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `LocalApic`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
db6516931d acpi_tables: Address Rust 1.51.0 clippy issue (upper_case_acronyms)
error: name `SDT` contains a capitalized acronym
  --> acpi_tables/src/sdt.rs:27:12
   |
27 | pub struct SDT {
   |            ^^^ help: consider making the acronym lowercase, except the initial letter: `Sdt`
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
ab4b30edd3 vmm: Switch MemoryManager::send() to url_to_path()
This continues the work in cc78a597cd

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-12 16:52:55 +01:00
Rob Bradford
cc78a597cd vmm: Simplify snapshot/restore path handling
Extend the existing url_to_path() to take the URL string and then use
that to simplify the snapshot/restore code paths.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-12 13:03:01 +01:00
Rob Bradford
b02aff5761 vmm: memory_manager: Disable dirty page logging when running on TDX
It is not permitted to have this enabled in memory that is part of a TD.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-08 18:30:00 +00:00
Sebastien Boeuf
d6db2fdf96 vmm: memory_manager: Add ACPI hotplug region to default memory zone
When memory is resized through ACPI, a new region is added to the guest
memory. This region must also be added to the corresponding memory zone
in order to keep everything in sync.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-03-05 10:38:42 +01:00
Rob Bradford
deedfcdc35 vmm: Improve restore error message about URL conversion
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-23 11:07:48 +00:00
Rob Bradford
38c41a5074 vmm: memory_manager: Extract code for allocating new memory
This function can then be used by the TDX code to allocate the memory at
specific locations required for the TDVF to run from.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-16 18:38:57 +01:00
Rob Bradford
7928a697dc vmm: Support configurable huge pages in MemoryManager
Use the newly added hugepages_size option if provided by the user to
pick a huge page size when creating the memfd region. If none is
specified use the system default.

Sadly different huge pages cannot be tested by an integration test as
creating a pool of the non-default size cannot be done at runtime
(requires kernel to be booted with certain parameters.)

TETS=Manually tested with a kernel booted with both 1GiB and 2MiB huge
pages (hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=512)

Fixes: #2230

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-05 09:24:02 +00:00
Rob Bradford
29607f38ad vmm: config: Add a hugepage_size option
This allows the user to use an alternative huge page size otherwise the
default size will be used.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-05 09:24:02 +00:00
Sebastien Boeuf
c397c9c95e vmm, virtio-devices: mem: Don't use MADV_DONTNEED on hugepages
This commit introduces a new information to the VirtioMemZone structure
in order to know if the memory zone is backed by hugepages.

Based on this new information, the virtio-mem device is now able to
determine if madvise(MADV_DONTNEED) should be performed or not. The
madvise documentation specifies that MADV_DONTNEED advice will fail if
the memory range has been allocated with some hugepages.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-02-04 17:52:30 +00:00
Rob Bradford
9ea087597f vmm: acpi: Ensure field size address matches ResourceTag
dsdt.dsl    960:             CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL)  // _MIN: Minimum Base Address
Warning  3128 -                              ResourceTag larger than Field ^  (Size mismatch, Tag: 64 bits, Field: 32 bits)

dsdt.dsl    962:             CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL)  // _MAX: Maximum Base Address
Warning  3128 -                              ResourceTag larger than Field ^  (Size mismatch, Tag: 64 bits, Field: 32 bits)

dsdt.dsl    964:             CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL)  // _LEN: Length
Warning  3128 -                              ResourceTag larger than Field ^  (Size mismatch, Tag: 64 bits, Field: 32 bits)

Fixes: #2216

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-28 14:30:34 +01:00
Rob Bradford
c29caf2a85 vmm: acpi: Fix incorrect mutex timeout value
The mutex timeout should be 0xffff rather than 0xfff to disable the
timeout feature.

dsdt.dsl    745:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

dsdt.dsl    767:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

dsdt.dsl    775:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

Fixes: #2216

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-28 14:30:34 +01:00
Rob Bradford
6006068951 vmm: acpi: Move MemoryManager ACPI device to an MMIO address
Migrate the MemoryManager from a fixed I/O port address to an allocated
MMIO address.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-22 16:08:41 +01:00
Mikko Ylinen
f583aa9d30 sgx: update virt EPC device path and docs
Based on the LKML feedback, the devices under /dev/sgx/* are
not justified. SGX RFC v40 moves the SGX device nodes to /dev/sgx_*
and this is reflected in kvm-sgx (next branch) too.

Update cloud-hypervisor code and documentation to follow this.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-05 15:48:17 +00:00
Rob Bradford
fabd63072b misc: Remove unnecessary literal casts
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-04 13:46:37 +01:00
Rob Bradford
faba6a3fb3 vmm: memory_manager: Use workaround for conditional function arguments
With Rust 1.49 using attributes on a function parameter is not allowed.
The recommended workaround is to put it in a new block.

error[E0658]: attributes on expressions are experimental
   --> vmm/src/memory_manager.rs:698:17
    |
698 |                 #[cfg(target_arch = "x86_64")]
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #15701 <https://github.com/rust-lang/rust/issues/15701> for more information

error: removing an expression is not supported in this position
   --> vmm/src/memory_manager.rs:698:17
    |
698 |                 #[cfg(target_arch = "x86_64")]
    |

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-04 13:46:37 +01:00
Rob Bradford
1fc6d50f3e misc: Make Bus::write() return an Option<Arc<Barrier>>
This can be uses to indicate to the caller that it should wait on the
barrier before returning as there is some asynchronous activity
triggered by the write which requires the KVM exit to block until it's
completed.

This is useful for having vCPU thread wait for the VMM thread to proceed
to activate the virtio devices.

See #1863

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-17 11:23:53 +00:00
Rob Bradford
c62e409827 memory_manager: Generate a MemoryRangeTable for dirty ranges
In order to do this we must extend the MemoryManager API to add the
ability to specify the tracking of the dirty pages when creating the
userspace mappings and also keep track of the userspace mappings that
have been created for RAM regions.

Currently the dirty pages are collected into ranges based on a block
level of 64 pages. The algorithm could be tweaked to create smaller
ranges but for now if any page in the block of 64 is dirty the whole
block is added to the range.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-17 16:57:11 +00:00
Rob Bradford
8baa244ec1 hypervisor: Add control for dirty page logging
When creating a userspace mapping provide a control for enabling the
logging of dirty pages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-17 16:57:11 +00:00
Anatol Belski
b399287430 memory_manager: Make addressable space size 64k aligned
While the addressable space size reduction of 4k in necessary due to
the Linux bug, the 64k alignment of the addressable space size is
required by Windows. This patch satisfies both.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2020-11-16 16:39:11 +00:00
Rob Bradford
dfe2dadb3e vmm: memory_manager: Make the snapshot source directory an Option
This allows the code to be reused when creating the VM from a snapshot
when doing VM migration.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-11 11:07:24 +01:00
Rob Bradford
cb88ceeae8 vmm: memory_manager: Move the restoration of guest memory later
Rather than filling the guest memory from a file at the point of the the
guest memory region being created instead fill from the file later. This
simplifies the region creation code but also adds flexibility for
sourcing the guest memory from a source other than an on disk file.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-30 12:31:47 +01:00
Rob Bradford
21db6f53c8 vmm: memory_manager: Write all guest region to disk
As a mirror of bdbea19e23 which ensured
that GuestMemoryMmap::read_exact_from() was used to read all the file to
the region ensure that all the guest memory region is written to disk.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-27 12:11:31 -07:00
Sebastien Boeuf
7e127df415 vmm: memory_manager: Replace 'ext_region' by 'saved_region'
Any occurrence of of a variable containing `ext_region` is replaced with
the less confusing name `saved_region`. The point is to clearly identify
the memory regions that might have been saved during a snapshot, while
the `ext` standing for `external` was pretty unclear.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-23 21:59:52 +02:00
Sebastien Boeuf
c0e8e5b53f vmm: memory_manager: Replace 'backing_file' variable names
In the context of saving the memory regions content through snapshot,
using the term "backing file" brings confusion with the actual backing
file that might back the memory mapping.

To avoid such conflicting naming, the 'backing_file' field from the
MemoryRegion structure gets replaced with 'content', as this is
designating the potential file containing the memory region data.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-23 21:59:52 +02:00
Rob Bradford
bdbea19e23 vmm: memory_manager: Completely fill guest ram from snapshot
Use GuestRegionMmap::read_exact_from() to ensure that all of the file is
read into the guest. This addresses an issue where
GuestRegionMmap::read_from() was only copying the first 2GiB of the
memory and so lead to snapshot-restore was failing when the guest RAM
was 2GiB or greater.

This change also propagates any error from the copying upwards.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-23 17:56:19 +01:00
Rob Bradford
a60b437f89 vmm: memory_manager: Always copy anonymous RAM regions from disk
When restoring if a region of RAM is backed by anonymous memory i.e from
memfd_create() then copy the contents of the ram from the file that has
been saved to disk.

Previously the code would map the memory from that file into the guest
using a MAP_PRIVATE mapping. This has the effect of
minimising the restore time but provides an issue where the restored VM
does not have the same structure as the snapshotted VM, in particular
memory is backed by files in the restored VM that were anonymously
backed in the original.

This creates two problems:

* The snapshot data is mapped from files for the pages of the guest
  which prevents the storage from being reclaimed.
* When snapshotting again the guest memory will not be correctly saved
  as it will have looked like it was backed by a file so it will not be
  written to disk but as it is a MAP_PRIVATE mapping the changes will
  never be written to the disk again. This results in incorrect
  behaviour.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-23 12:34:32 +02:00
Sebastien Boeuf
3594685279 vmm: Move balloon code from MemoryManager to DeviceManager
Now that we have a new dedicated way of asking for a balloon through the
CLI and the REST API, we can move all the balloon code to the device
manager. This allows us to simplify the memory manager, which is already
quite complex.

It also simplifies the behavior of the balloon resizing command. Instead
of providing the expected size for the RAM, which is complex when memory
zones are involved, it now expects the balloon size. This is a much more
straightforward behavior as it really resizes the balloon to the desired
size. Additionally to the simplication, the benefit of this approach is
that it does not need to be tied to the memory manager at all.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-22 16:33:16 +02:00
Sebastien Boeuf
aec88e20d7 vmm: memory_manager: Rely on physical bits for address space size
If the user provided a maximum physical bits value for the vCPUs, the
memory manager will adapt the guest physical address space accordingly
so that devices are not placed further than the specified value.

It's important to note that if the number exceed what is available on
the host, the smaller number will be picked.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-13 18:58:36 +02:00
Bo Chen
e9738a4a49 vmm: Replace the use of 'unchecked_add' with 'checked_add'
The 'GuestAddress::unchecked_add' function has undefined behavior when
an overflow occurs. Its alternative 'checked_add' requires use to handle
the overflow explicitly.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-13 12:09:22 +02:00
Bo Chen
9ab2a34b40 vmm: Remove reserved 256M gaps for hotplugging memory with ACPI
We are now reserving a 256M gap in the guest address space each time
when hotplugging memory with ACPI, which prevents users from hotplugging
memory to the maximum size they requested. We confirm that there is no
need to reserve this gap.

This patch removes the 'reserved gaps'. It also refactors the
'MemoryManager::start_addr' so that it is rounding-up to 128M alignment
when hotplugged memory is allowed with ACPI.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-13 12:09:22 +02:00
Bo Chen
10f380f95b vmm: Report no error when resizing to current memory size with ACPI
We now try to create a ram region of size 0 when the requested memory
size is the same as current memory size. It results in an error of
`GuestMemoryRegion(Mmap(Os { code: 22, kind: InvalidInput, message:
"Invalid argument" }))`. This error is not meaningful to users and we
should not report it.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-12 08:46:38 +02:00
Bo Chen
789ee7b3e4 vmm: Support resizing memory up to and including hotplug size
The start address after the hottplugged memory can be the start address of
device area.

Fixes: #1803

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-10 09:51:32 +02:00
Hui Zhu
c75f8b2f89 virtio-balloon: Add memory_actual_size to vm.info to show memory actual size
The virtio-balloon change the memory size is asynchronous.
VirtioBalloonConfig.actual of balloon device show current balloon size.

This commit add memory_actual_size to vm.info to show memory actual size.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-10-01 17:46:30 +02:00
Praveen Paladugu
f10872e706 vmm: fix clippy warnings
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
2020-09-26 14:07:12 +01:00
Josh Soref
5c3f4dbe6f ch: Fix various misspelled words
Misspellings were identified by https://github.com/marketplace/actions/check-spelling
* Initial corrections suggested by Google Sheets
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-09-23 08:59:31 +01:00
Jiangbo Wu
223189c063 mm: Apply zone's property instread of global config
Apply memory zone's property for associated virtio-mem regions.

Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>
2020-09-22 09:56:37 +02:00
Jiangbo Wu
80be8ac0dc mm: Apply memory policy for virtio-mem region
Use zone.host_numa_node to create memory zone, so that memory zone
can apply memory policy in according with host numa node ID

Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>
2020-09-22 09:56:37 +02:00
Sebastien Boeuf
1e1a50ef70 vmm: Update memory configuration upon virtio-mem resizing
Based on all the preparatory work achieved through previous commits,
this patch updates the 'hotplugged_size' field for both MemoryConfig and
MemoryZoneConfig structures when either the whole memory is resized, or
simply when a memory zone is resized.

This fixes the lack of support for rebooting a VM with the right amount
of memory plugged in.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
de2b917f55 vmm: Add hotplugged_size to VirtioMemZone
Adding a new field to VirtioMemZone structure, as it lets us associate
with a particular virtio-mem region the amount of memory that should be
plugged in at boot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
3faf8605f3 vmm: Group virtio-mem fields under a dedicated structure
This patch simplifies the code as we have one single Option for the
VirtioMemZone. This also prepares for storing additional information
related to the virtio-mem region.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
4e1b78e1ff vmm: Add 'hotplugged_size' to memory parameters
Add the new option 'hotplugged_size' to both --memory-zone and --memory
parameters so that we can let the user specify a certain amount of
memory being plugged at boot.

This is also part of making sure we can store the virtio-mem size over a
reboot of the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00