This allocator allocates 64-bit MMIO addresses for use with platform
devices, e.g. ACPI control devices, and ensures there is no overlap with
PCI address space ranges, which could otherwise cause issues with PCI
device remapping.
Use this allocator for the ACPI platform devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
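A minimal sketch of how such an allocator can stay clear of PCI address
space (names and window bounds are illustrative assumptions, not the
actual Cloud Hypervisor code): the allocation window simply starts above
the PCI 64-bit hole, so handed-out addresses can never overlap it.
```
struct PlatformMmioAllocator {
    next: u64, // next candidate address
    end: u64,  // exclusive end of the platform MMIO window
}

impl PlatformMmioAllocator {
    fn new(start: u64, end: u64) -> Self {
        Self { next: start, end }
    }

    // `align` must be a non-zero power of two.
    fn allocate(&mut self, size: u64, align: u64) -> Option<u64> {
        let base = self.next.checked_add(align - 1)? & !(align - 1);
        let top = base.checked_add(size)?;
        if top > self.end {
            return None;
        }
        self.next = top;
        Some(base)
    }
}

fn main() {
    // Window placed above the end of the PCI 64-bit hole (made-up bounds).
    let mut allocator = PlatformMmioAllocator::new(0xf_0000_0000, 0xf_1000_0000);
    let gpa = allocator.allocate(0x1000, 0x1000).unwrap();
    println!("platform device MMIO at 0x{gpa:x}");
}
```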
Rather than use the system MMIO allocator for RAM, use an allocator that
covers the full RAM range.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This is because the SGX region will be placed between the end of RAM and
the start of the device area.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Instead of creating a MemoryManager from scratch, let's reuse the same
code path used by snapshot/restore, so that memory regions are created
identically to what they were on the source VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that all the pieces are in place, we can restore a VM with the new
code path that properly restores all memory regions, allowing ACPI
memory hotplug to work properly with the snapshot/restore feature.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extending the MemoryManager::new() function to be able to create a
MemoryManager from data that has been previously stored, instead of
always creating everything from scratch.
This change brings real added value as it allows a VM to be restored
with the proper memory layout, instead of hoping the regions will be
created the way they were before.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Storing multiple pieces of data coming from the MemoryManager in order
to be able to restore without creating everything from scratch.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new function will be able to restore memory regions and memory
zones based on the GuestMemoryMapping list that will be provided through
snapshot/restore and migration phases.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This can help identify which zone relates to which memory range.
This is going to be useful when recreating GuestMemory regions from
the previous layout instead of having to recreate everything from
scratch.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Create a dedicated function to factorize the allocation of the memory
ranges, helping simplify the MemoryManager::new() function.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By updating the list of GuestMemory regions with the virtio-mem ones
before the creation of the MemoryManager, we know the GuestMemory is up
to date and the allocation of memory ranges is simplified afterwards.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to simplify the MemoryManager::new() function, let's move the
memory configuration validation to its own function.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The argument `prefault` is provided in MemoryManager, but it can
only be used by SGX and restore.
With prefault (MAP_POPULATE) set, subsequent page faults decrease at
runtime, although it makes boot slower.
This commit adds `prefault` in MemoryConfig and MemoryZoneConfig.
To resolve the conflict between the memory configuration and restore,
the argument `prefault` has been changed from `bool` to `Option<bool>`:
when its value is `None`, the config from memory is used; otherwise the
value carried in the `Option` is used.
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
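A minimal sketch of the `Option<bool>` resolution described above
(function and parameter names are illustrative, not the actual code):
```
fn effective_prefault(config_prefault: bool, prefault: Option<bool>) -> bool {
    // `None` means "use the value from the memory config";
    // `Some(v)` (e.g. from restore or SGX) overrides it.
    prefault.unwrap_or(config_prefault)
}

fn main() {
    // Restore can force prefaulting regardless of the config...
    assert!(effective_prefault(false, Some(true)));
    // ...while a regular boot falls back to the config value.
    assert!(!effective_prefault(false, None));
}
```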
By using a single file for storing the memory ranges, we simplify the
way snapshot/restore works by avoiding multiple files, but the more
important point is that we now have a way to save only the ranges that
matter. In particular, the ranges related to virtio-mem regions are not
always fully hotplugged, meaning we don't want to save the entire
region. That's where the usage of memory ranges is interesting, as it
lets us optimize the snapshot/restore process when one or multiple
virtio-mem regions are involved.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The function memory_range_table() will be reused by the MemoryManager in
a following patch to describe all the ranges that we should snapshot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By creating the BlocksState object in the MemoryManager, we can directly
provide it to the virtio-mem device when it is created. This will allow
the MemoryManager, through each VirtioMemZone, to have a handle onto the
blocks that are plugged at any point in time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
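A sketch of the handle sharing described above (BlocksState simplified
to one bool per virtio-mem block; not the actual code):
```
use std::sync::{Arc, Mutex};

struct BlocksState(Vec<bool>);

fn main() {
    let num_blocks = 1024;
    let blocks_state = Arc::new(Mutex::new(BlocksState(vec![false; num_blocks])));

    // One clone is handed to the virtio-mem device at creation time...
    let for_device = Arc::clone(&blocks_state);
    // ...which marks blocks as the guest plugs them:
    for_device.lock().unwrap().0[0] = true;

    // ...while the MemoryManager keeps its own handle and can observe
    // which blocks are plugged at any point in time:
    let plugged = blocks_state.lock().unwrap().0.iter().filter(|b| **b).count();
    println!("{plugged} block(s) plugged");
}
```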
This will be helpful to support the creation of a MemoryRangeTable from
virtio-mem, as it uses 2M pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding snapshot/restore support, along with migration support, allowing
a VM with virtio-mem devices attached to be properly migrated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The amount of memory plugged in the virtio-mem region should always be
kept up to date in the hotplugged_size field from VirtioMemZone.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need to duplicate the GuestMemory for snapshot purposes, as
we always have a handle onto the GuestMemory through the guest_memory
field.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Fix the following clippy error:
```
error: all if blocks contain the same code at the end
   --> vmm/src/memory_manager.rs:884:9
    |
884 | /         Ok(mm)
885 | |     }
    | |_________^
```
Signed-off-by: Bo Chen <chen.bo@intel.com>
Now that Migratable provides the methods for starting, stopping and
retrieving the dirty pages, we move the existing code to these new
functions.
No functional change intended.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation for supporting the merge of multiple dirty pages coming
from multiple devices, this patch factorizes the creation of a
MemoryRangeTable from a bitmap, as well as providing a simple method for
merging the dirty pages regions under a single MemoryRangeTable.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
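A minimal sketch of the bitmap-to-ranges conversion (struct, constant
and function names are illustrative, not the actual MemoryRangeTable
implementation): consecutive dirty pages are merged into one range.
```
const PAGE_SIZE: u64 = 4096;

#[derive(Debug)]
struct MemoryRange {
    gpa: u64,
    length: u64,
}

fn ranges_from_bitmap(base_gpa: u64, bitmap: &[u64]) -> Vec<MemoryRange> {
    let mut ranges: Vec<MemoryRange> = Vec::new();
    for (word_idx, word) in bitmap.iter().enumerate() {
        for bit in 0..64u64 {
            if *word & (1u64 << bit) == 0 {
                continue;
            }
            let gpa = base_gpa + ((word_idx as u64) * 64 + bit) * PAGE_SIZE;
            // Merge with the previous range when the new page is contiguous.
            let merged = match ranges.last_mut() {
                Some(last) if last.gpa + last.length == gpa => {
                    last.length += PAGE_SIZE;
                    true
                }
                _ => false,
            };
            if !merged {
                ranges.push(MemoryRange { gpa, length: PAGE_SIZE });
            }
        }
    }
    ranges
}

fn main() {
    // Pages 0, 1 and 3 dirty: two ranges, [0x0, 0x2000) and [0x3000, 0x4000).
    println!("{:?}", ranges_from_bitmap(0, &[0b1011]));
}
```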
Right now, the get_dirty_log API has two parameters,
slot and memory_size.
MSHV needs the GPA to retrieve the page states, as
MSHV returns the state based on the PFN.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
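A plausible shape of the updated API (an assumption sketched from the
description above, not the exact upstream signature):
```
pub trait Vm {
    fn get_dirty_log(
        &self,
        slot: u32,        // memory slot identifier (what KVM keys on)
        base_gpa: u64,    // guest physical base of the region (what MSHV needs)
        memory_size: u64, // size of the region in bytes
    ) -> Result<Vec<u64>, std::io::Error>; // one bit per page
}
```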
As we are now using a global control to start/stop dirty pages logging
from the `hypervisor` crate, we need to explicitly tell the hypervisor
(KVM) whether a region needs dirty page tracking when it is created.
This reverts commit f063346de3.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Following KVM interfaces, the `hypervisor` crate now provides interfaces
to start/stop the dirty pages logging on a per-region basis, and asks
its users (e.g. the `vmm` crate) to iterate over the regions that need
dirty pages logging. MSHV only has a global control to start/stop dirty
pages logging on all regions at once.
This patch refactors the related APIs from the `hypervisor` crate to
provide a global control to start/stop dirty pages logging (following
MSHV's behavior), and keeps tracking the regions that need dirty pages
logging for KVM. This avoids leaking hypervisor-specific behaviors out
of the `hypervisor` crate.
Signed-off-by: Bo Chen <chen.bo@intel.com>
With the support of dynamically turning on/off dirty-pages logging
during live migration (only for guest RAM regions), we can now create
guest memory regions without dirty-pages logging by default, both for
guest RAM regions and other regions backed by file/device.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch slightly extends the current live-migration code path with
the ability to dynamically start and stop logging dirty pages, which
relies on two new methods added to the `hypervisor::vm::Vm` trait. This
patch also contains a complete implementation of the two new methods
based on `kvm` and placeholders for `mshv` in the `hypervisor` crate.
Fixes: #2858
Signed-off-by: Bo Chen <chen.bo@intel.com>
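A sketch of what the two new trait methods could look like (method names
and error type are assumptions based on the description above, not the
exact code):
```
pub trait Vm {
    /// Turn on dirty-pages logging for guest RAM regions.
    fn start_dirty_log(&self) -> Result<(), std::io::Error>;
    /// Turn it back off once the migration converges.
    fn stop_dirty_log(&self) -> Result<(), std::io::Error>;
}
```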
In order to uniquely identify each SGX EPC section, we introduce a
mandatory option `id` to the `--sgx-epc` parameter.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The guest can see that SGX supports provisioning as it is exposed
through the CPUID. This patch enables the proper backing of this
feature by having the host open the provisioning device and enable
this capability through the hypervisor.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.
MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.
Split set_user_memory_region into {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.
This fixes user memory region removal on MSHV.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
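A sketch of the split interface (field and error types are illustrative
assumptions): both calls carry the full region description, so MSHV has
the size available on removal as well.
```
pub struct UserMemoryRegion {
    pub slot: u32,            // used by KVM, which is slot based
    pub guest_phys_addr: u64,
    pub memory_size: u64,     // required by MSHV even for removal
    pub userspace_addr: u64,
}

pub trait Vm {
    fn create_user_memory_region(&self, region: UserMemoryRegion) -> Result<(), std::io::Error>;
    fn remove_user_memory_region(&self, region: UserMemoryRegion) -> Result<(), std::io::Error>;
}
```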
The _PXM method always returns 0, which is wrong since the SRAT might
say otherwise. The point of the _PXM method is to be evaluated by the
guest OS when some new memory slot is being plugged, but this will never
happen for Cloud Hypervisor since using NUMA nodes along with memory
hotplug only works for virtio-mem.
Memory hotplug through ACPI will only happen when there's only one NUMA
node exposed to the guest, which means the _PXM method won't be needed
at all.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Live migration currently handles guest memory writes from the guest
through the KVM dirty page tracking and sends those dirty pages to the
destination. This patch augments the live migration support with dirty
page tracking of writes from the VMM to the guest memory (e.g. virtio
devices).
Fixes: #2458
Signed-off-by: Bo Chen <chen.bo@intel.com>
Function "GuestMemory::with_regions(_mut)" were mainly temporary methods
to access the regions in `GuestMemory` as the lack of iterator-based
access, and hence they are deprecated in the upstream vm-memory crate [1].
[1] https://github.com/rust-vmm/vm-memory/issues/133
Signed-off-by: Bo Chen <chen.bo@intel.com>
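A sketch of moving off the deprecated helpers, assuming the
iterator-based access that replaced them upstream (`GuestMemory::iter()`):
```
use vm_memory::{GuestMemory, GuestMemoryRegion};

fn total_guest_memory<M: GuestMemory>(mem: &M) -> u64 {
    // Before: mem.with_regions(|_idx, region| { ... })
    // After: plain iteration over the regions.
    mem.iter().map(|region| region.len()).sum()
}
```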
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.
The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.
Signed-off-by: Bo Chen <chen.bo@intel.com>
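A sketch of how the new generic parameter can be kept transparent to the
rest of the code base (assuming vm-memory's `AtomicBitmap`, behind its
"backend-bitmap" feature, as the dirty-bitmap backend): a handful of
type aliases at the crate root.
```
use vm_memory::bitmap::AtomicBitmap;

pub type GuestMemoryMmap = vm_memory::GuestMemoryMmap<AtomicBitmap>;
pub type GuestRegionMmap = vm_memory::GuestRegionMmap<AtomicBitmap>;
pub type MmapRegion = vm_memory::MmapRegion<AtomicBitmap>;
```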
Now that all crates use edition = "2018", the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The latest kvm-sgx code has renamed sgx_virt_epc device node
to sgx_vepc. Update cloud-hypervisor code and documentation to
follow this.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
In order to support using Versionize for state structures it is
necessary to use simpler, primitive data types in the state definitions
used for snapshot restore.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Instead of tracking at a block level of 64 pages, we are now collecting
dirty pages one by one. This improves the efficiency of dirty memory
tracking during live migration.
Signed-off-by: Bo Chen <chen.bo@intel.com>
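An illustrative contrast between the two granularities (not the actual
code): with block-level tracking, a single set bit marks a whole 64-page
block dirty, while page-level tracking counts only the bits actually set.
```
fn dirty_pages_block_level(bitmap: &[u64]) -> u64 {
    // Any set bit dirties the whole 64-page block.
    bitmap.iter().filter(|word| **word != 0).count() as u64 * 64
}

fn dirty_pages_page_level(bitmap: &[u64]) -> u64 {
    // Only pages whose bit is set are dirty.
    bitmap.iter().map(|word| word.count_ones() as u64).sum()
}

fn main() {
    let bitmap = [0b1u64, 0, 0b101];
    assert_eq!(dirty_pages_block_level(&bitmap), 128);
    assert_eq!(dirty_pages_page_level(&bitmap), 3);
}
```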
The MCRS method returns a 64-bit memory range descriptor. The
calculation is supposed to be done as follows:
```
max = min + len - 1
```
However, every operand is represented not as a QWORD but as a
combination of two DWORDs for the high and low parts. Until now, the
calculation was done this way, please see also the inline comments:
```
max.lo = min.lo + len.lo // this may overflow, need to carry over to high
max.hi = min.hi + len.hi
max.hi = max.hi - 1      // subtraction needs to happen on the low part
```
This calculation has been corrected in the following way:
```
max.lo = min.lo + len.lo
max.hi = min.hi + len.hi + (max.lo < min.lo) // check for overflow
max.lo = max.lo - 1                          // subtract from low part
```
The relevant part from the generated ASL for the MCRS method:
```
Method (MCRS, 1, Serialized)
{
    Acquire (MLCK, 0xFFFF)
    \_SB.MHPC.MSEL = Arg0
    Name (MR64, ResourceTemplate ()
    {
        QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000000000000000, // Range Minimum
            0xFFFFFFFFFFFFFFFE, // Range Maximum
            0x0000000000000000, // Translation Offset
            0xFFFFFFFFFFFFFFFF, // Length
            ,, _Y00, AddressRangeMemory, TypeStatic)
    })
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL) // _MIN: Minimum Base Address
    CreateDWordField (MR64, 0x12, MINH)
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL) // _MAX: Maximum Base Address
    CreateDWordField (MR64, 0x1A, MAXH)
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL) // _LEN: Length
    CreateDWordField (MR64, 0x2A, LENH)
    MINL = \_SB.MHPC.MHBL
    MINH = \_SB.MHPC.MHBH
    LENL = \_SB.MHPC.MHLL
    LENH = \_SB.MHPC.MHLH
    MAXL = (MINL + LENL) /* \_SB_.MHPC.MCRS.LENL */
    MAXH = (MINH + LENH) /* \_SB_.MHPC.MCRS.LENH */
    If ((MAXL < MINL))
    {
        MAXH += One /* \_SB_.MHPC.MCRS.MAXH */
    }
    MAXL -= One
    Release (MLCK)
    Return (MR64) /* \_SB_.MHPC.MCRS.MR64 */
}
```
Fixes #1800.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>