Commit Graph

205 Commits

Author SHA1 Message Date
Bo Chen
e7c9954dc1 hypervisor, vmm: Abstract the interfaces to start/stop dirty log
Following KVM interfaces, the `hypervisor` crate now provides interfaces
to start/stop the dirty pages logging on a per region basis, and asks
its users (e.g. the `vmm` crate) to iterate over the regions that needs
dirty pages log. MSHV only has a global control to start/stop dirty
pages log on all regions at once.

This patch refactors related APIs from the `hypervisor` crate to provide
a global control to start/stop dirty pages log (following MSHV's
behaviors), and keeps tracking the regions need dirty pages log for
KVM. It avoids leaking hypervisor-specific behaviors out of the
`hypervisor` crate.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-28 09:08:32 -07:00
Bo Chen
f063346de3 vmm: Create guest memory regions without dirty-pages-log by default
With the support of dynamically turning on/off dirty-pages-log during
live-migration (only for guest RAM regions), we now can create guest
memory regions without dirty-pages-log by default both for guest RAM
regions and other regions backed by file/device.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-26 09:19:35 -07:00
Bo Chen
5e0d498582 hypervisor, vmm: Add dynamic control of logging dirty pages
This patch extends slightly the current live-migration code path with
the ability to dynamically start and stop logging dirty-pages, which
relies on two new methods added to the `hypervisor::vm::Vm` Trait. This
patch also contains a complete implementation of the two new methods
based on `kvm` and placeholders for `mshv` in the `hypervisor` crate.

Fixes: #2858

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-07-26 09:19:35 -07:00
Sebastien Boeuf
9aedabe11e sgx: Add mandatory id field to SgxEpcConfig
In order to uniquely identify each SGX EPC section, we introduce a
mandatory option `id` to the `--sgx-epc` parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-09 14:45:30 +02:00
Sebastien Boeuf
17c99ae00a vmm: Enable provisioning for SGX guest
The guest can see that SGX supports provisioning as it is exposed
through the CPUID. This patch enables the proper backing of this
feature by having the host open the provisioning device and enable
this capability through the hypervisor.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-07 14:56:38 +02:00
Wei Liu
1f2915bff0 vmm: hypervisor: split set_user_memory_region to two functions
Previously the same function was used to both create and remove regions.
This worked on KVM because it uses size 0 to indicate removal.

MSHV has two calls -- one for creation and one for removal. It also
requires having the size field available because it is not slot based.

Split set_user_memory_region to {create/remove}_user_memory_region. For
KVM they still use set_user_memory_region underneath, but for MSHV they
map to different functions.

This fixes user memory region removal on MSHV.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:45:45 +02:00
Sebastien Boeuf
09c3ddd47d vmm: memory_manager: Remove _PXM from ACPI memory slot
The _PXM method always return 0, which is wrong since the SRAT might
tell differently. The point of the _PXM method is to be evaluated by the
guest OS when some new memory slot is being plugged, but this will never
happen for Cloud Hypervisor since using NUMA nodes along with memory
hotplug only works for virtio-mem.

Memory hotplug through ACPI will only happen when there's only one NUMA
node exposed to the guest, which means the _PXM method won't be needed
at all.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-06-17 16:08:46 +02:00
Bo Chen
7839e121f6 vmm: Add dirty pages tracked by vm_memory::bitmap to live migration
Live migration currently handles guest memory writes from the guest
through the KVM dirty page tracking and sends those dirty pages to the
destination. This patch augments the live migration support with dirty
page tracking of writes from the VMM to the guest memory(e.g. virtio
devices).

Fixes: #2458

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Bo Chen
2c4fa258a6 virtio-devices, vmm: Deprecate "GuestMemory::with_regions(_mut)"
Function "GuestMemory::with_regions(_mut)" were mainly temporary methods
to access the regions in `GuestMemory` as the lack of iterator-based
access, and hence they are deprecated in the upstream vm-memory crate [1].

[1] https://github.com/rust-vmm/vm-memory/issues/133

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Bo Chen
b5bcdbaf48 misc: Upgrade to use the vm-memory crate w/ dirty-page-tracking
As the first step to complete live-migration with tracking dirty-pages
written by the VMM, this commit patches the dependent vm-memory crate to
the upstream version with the dirty-page-tracking capability. Most
changes are due to the updated `GuestMemoryMmap`, `GuestRegionMmap`, and
`MmapRegion` structs which are taking an additional generic type
parameter to specify what 'bitmap backend' is used.

The above changes should be transparent to the rest of the code base,
e.g. all unit/integration tests should pass without additional changes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-06-03 08:34:45 +01:00
Rob Bradford
f840327ffb vmm: Version MemoryManager state
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-21 15:29:52 +02:00
Rob Bradford
496ceed1d0 misc: Remove unnecessary "extern crate"
Now all crates use edition = "2018" then the majority of the "extern
crate" statements can be removed. Only those for importing macros need
to remain.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-12 17:26:11 +02:00
Rob Bradford
b8f5911c4e misc: Remove unused errors from public interface
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-05-11 13:37:19 +02:00
Mikko Ylinen
3b18caf229 sgx: update virt EPC device path and docs
The latest kvm-sgx code has renamed sgx_virt_epc device node
to sgx_vepc. Update cloud-hypervisor code and documentation to
follow this.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-04-30 16:16:01 +02:00
Rob Bradford
01f0c1e313 vmm: Simplify memory state to support Versionize
In order to support using Versionize for state structures it is necessary
to use simpler, primitive, data types in the state definitions used for
snapshot restore.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-23 14:24:16 +01:00
Rob Bradford
a7c4483b8b vmm: Directly (de)serialise CpuManager, DeviceManager and MemoryManager state
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-20 18:58:37 +02:00
Bo Chen
78796f96b7 vmm: Refine the granularity of dirty memory tracking
Instead of tracking on a block level of 64 pages, we are now collecting
dirty pages one by one. It improves the efficiency of dirty memory
tracking while live migration.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-04-19 17:17:14 +02:00
Anatol Belski
e1cc702327 memory_manager: Fix address range calculation in MemorySlot
The MCRS method returns a 64-bit memory range descriptor. The
calculation is supposed to be done as follows:

max = min + len - 1

However, every operand is represented not as a QWORD but as combination
of two DWORDs for high and low part. Till now, the calculation was done
this way, please see also inline comments:

max.lo = min.lo + len.lo //this may overflow, need to carry over to high
max.hi = min.hi + len.hi
max.hi = max.hi - 1 // subtraction needs to happen on the low part

This calculation has been corrected the following way:

max.lo = min.lo + len.lo
max.hi = min.hi + len.hi + (max.lo < min.lo) // check for overflow
max.lo = max.lo - 1 // subtract from low part

The relevant part from the generated ASL for the MCRS method:
```
Method (MCRS, 1, Serialized)
{
    Acquire (MLCK, 0xFFFF)
    \_SB.MHPC.MSEL = Arg0
    Name (MR64, ResourceTemplate ()
    {
	QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
	    0x0000000000000000, // Granularity
	    0x0000000000000000, // Range Minimum
	    0xFFFFFFFFFFFFFFFE, // Range Maximum
	    0x0000000000000000, // Translation Offset
	    0xFFFFFFFFFFFFFFFF, // Length
	    ,, _Y00, AddressRangeMemory, TypeStatic)
    })
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL)  // _MIN: Minimum Base Address
    CreateDWordField (MR64, 0x12, MINH)
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL)  // _MAX: Maximum Base Address
    CreateDWordField (MR64, 0x1A, MAXH)
    CreateQWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL)  // _LEN: Length
    CreateDWordField (MR64, 0x2A, LENH)
    MINL = \_SB.MHPC.MHBL
    MINH = \_SB.MHPC.MHBH
    LENL = \_SB.MHPC.MHLL
    LENH = \_SB.MHPC.MHLH
    MAXL = (MINL + LENL) /* \_SB_.MHPC.MCRS.LENL */
    MAXH = (MINH + LENH) /* \_SB_.MHPC.MCRS.LENH */
    If ((MAXL < MINL))
    {
	MAXH += One /* \_SB_.MHPC.MCRS.MAXH */
    }

    MAXL -= One
    Release (MLCK)
    Return (MR64) /* \_SB_.MHPC.MCRS.MR64 */
}
```

Fixes #1800.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2021-04-12 16:20:19 +02:00
Rob Bradford
9762c8bc28 vmm: Address Rust 1.51.0 clippy issue (upper_case_acroynms)
warning: name `LocalAPIC` contains a capitalized acronym
   --> vmm/src/cpu.rs:197:8
    |
197 | struct LocalAPIC {
    |        ^^^^^^^^^ help: consider making the acronym lowercase, except the initial letter: `LocalApic`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
db6516931d acpi_tables: Address Rust 1.51.0 clippy issue (upper_case_acronyms)
error: name `SDT` contains a capitalized acronym
  --> acpi_tables/src/sdt.rs:27:12
   |
27 | pub struct SDT {
   |            ^^^ help: consider making the acronym lowercase, except the initial letter: `Sdt`
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#upper_case_acronyms

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
ab4b30edd3 vmm: Switch MemoryManager::send() to url_to_path()
This continues the work in cc78a597cd

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-12 16:52:55 +01:00
Rob Bradford
cc78a597cd vmm: Simplify snapshot/restore path handling
Extend the existing url_to_path() to take the URL string and then use
that to simplify the snapshot/restore code paths.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-12 13:03:01 +01:00
Rob Bradford
b02aff5761 vmm: memory_manager: Disable dirty page logging when running on TDX
It is not permitted to have this enabled in memory that is part of a TD.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-08 18:30:00 +00:00
Sebastien Boeuf
d6db2fdf96 vmm: memory_manager: Add ACPI hotplug region to default memory zone
When memory is resized through ACPI, a new region is added to the guest
memory. This region must also be added to the corresponding memory zone
in order to keep everything in sync.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-03-05 10:38:42 +01:00
Rob Bradford
deedfcdc35 vmm: Improve restore error message about URL conversion
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-23 11:07:48 +00:00
Rob Bradford
38c41a5074 vmm: memory_manager: Extract code for allocating new memory
This function can then be used by the TDX code to allocate the memory at
specific locations required for the TDVF to run from.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-16 18:38:57 +01:00
Rob Bradford
7928a697dc vmm: Support configurable huge pages in MemoryManager
Use the newly added hugepages_size option if provided by the user to
pick a huge page size when creating the memfd region. If none is
specified use the system default.

Sadly different huge pages cannot be tested by an integration test as
creating a pool of the non-default size cannot be done at runtime
(requires kernel to be booted with certain parameters.)

TETS=Manually tested with a kernel booted with both 1GiB and 2MiB huge
pages (hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=512)

Fixes: #2230

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-05 09:24:02 +00:00
Rob Bradford
29607f38ad vmm: config: Add a hugepage_size option
This allows the user to use an alternative huge page size otherwise the
default size will be used.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-05 09:24:02 +00:00
Sebastien Boeuf
c397c9c95e vmm, virtio-devices: mem: Don't use MADV_DONTNEED on hugepages
This commit introduces a new information to the VirtioMemZone structure
in order to know if the memory zone is backed by hugepages.

Based on this new information, the virtio-mem device is now able to
determine if madvise(MADV_DONTNEED) should be performed or not. The
madvise documentation specifies that MADV_DONTNEED advice will fail if
the memory range has been allocated with some hugepages.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-02-04 17:52:30 +00:00
Rob Bradford
9ea087597f vmm: acpi: Ensure field size address matches ResourceTag
dsdt.dsl    960:             CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._MIN, MINL)  // _MIN: Minimum Base Address
Warning  3128 -                              ResourceTag larger than Field ^  (Size mismatch, Tag: 64 bits, Field: 32 bits)

dsdt.dsl    962:             CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._MAX, MAXL)  // _MAX: Maximum Base Address
Warning  3128 -                              ResourceTag larger than Field ^  (Size mismatch, Tag: 64 bits, Field: 32 bits)

dsdt.dsl    964:             CreateDWordField (MR64, \_SB.MHPC.MCRS._Y00._LEN, LENL)  // _LEN: Length
Warning  3128 -                              ResourceTag larger than Field ^  (Size mismatch, Tag: 64 bits, Field: 32 bits)

Fixes: #2216

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-28 14:30:34 +01:00
Rob Bradford
c29caf2a85 vmm: acpi: Fix incorrect mutex timeout value
The mutex timeout should be 0xffff rather than 0xfff to disable the
timeout feature.

dsdt.dsl    745:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

dsdt.dsl    767:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

dsdt.dsl    775:             Acquire (\_SB.PRES.CPLK, 0x0FFF)
Warning  3130 -                                           ^ Result is not used, possible operator timeout will be missed

Fixes: #2216

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-28 14:30:34 +01:00
Rob Bradford
6006068951 vmm: acpi: Move MemoryManager ACPI device to an MMIO address
Migrate the MemoryManager from a fixed I/O port address to an allocated
MMIO address.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-22 16:08:41 +01:00
Mikko Ylinen
f583aa9d30 sgx: update virt EPC device path and docs
Based on the LKML feedback, the devices under /dev/sgx/* are
not justified. SGX RFC v40 moves the SGX device nodes to /dev/sgx_*
and this is reflected in kvm-sgx (next branch) too.

Update cloud-hypervisor code and documentation to follow this.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2021-01-05 15:48:17 +00:00
Rob Bradford
fabd63072b misc: Remove unnecessary literal casts
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-04 13:46:37 +01:00
Rob Bradford
faba6a3fb3 vmm: memory_manager: Use workaround for conditional function arguments
With Rust 1.49 using attributes on a function parameter is not allowed.
The recommended workaround is to put it in a new block.

error[E0658]: attributes on expressions are experimental
   --> vmm/src/memory_manager.rs:698:17
    |
698 |                 #[cfg(target_arch = "x86_64")]
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #15701 <https://github.com/rust-lang/rust/issues/15701> for more information

error: removing an expression is not supported in this position
   --> vmm/src/memory_manager.rs:698:17
    |
698 |                 #[cfg(target_arch = "x86_64")]
    |

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-04 13:46:37 +01:00
Rob Bradford
1fc6d50f3e misc: Make Bus::write() return an Option<Arc<Barrier>>
This can be uses to indicate to the caller that it should wait on the
barrier before returning as there is some asynchronous activity
triggered by the write which requires the KVM exit to block until it's
completed.

This is useful for having vCPU thread wait for the VMM thread to proceed
to activate the virtio devices.

See #1863

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-17 11:23:53 +00:00
Rob Bradford
c62e409827 memory_manager: Generate a MemoryRangeTable for dirty ranges
In order to do this we must extend the MemoryManager API to add the
ability to specify the tracking of the dirty pages when creating the
userspace mappings and also keep track of the userspace mappings that
have been created for RAM regions.

Currently the dirty pages are collected into ranges based on a block
level of 64 pages. The algorithm could be tweaked to create smaller
ranges but for now if any page in the block of 64 is dirty the whole
block is added to the range.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-17 16:57:11 +00:00
Rob Bradford
8baa244ec1 hypervisor: Add control for dirty page logging
When creating a userspace mapping provide a control for enabling the
logging of dirty pages.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-17 16:57:11 +00:00
Anatol Belski
b399287430 memory_manager: Make addressable space size 64k aligned
While the addressable space size reduction of 4k in necessary due to
the Linux bug, the 64k alignment of the addressable space size is
required by Windows. This patch satisfies both.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2020-11-16 16:39:11 +00:00
Rob Bradford
dfe2dadb3e vmm: memory_manager: Make the snapshot source directory an Option
This allows the code to be reused when creating the VM from a snapshot
when doing VM migration.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-11 11:07:24 +01:00
Rob Bradford
cb88ceeae8 vmm: memory_manager: Move the restoration of guest memory later
Rather than filling the guest memory from a file at the point of the the
guest memory region being created instead fill from the file later. This
simplifies the region creation code but also adds flexibility for
sourcing the guest memory from a source other than an on disk file.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-30 12:31:47 +01:00
Rob Bradford
21db6f53c8 vmm: memory_manager: Write all guest region to disk
As a mirror of bdbea19e23 which ensured
that GuestMemoryMmap::read_exact_from() was used to read all the file to
the region ensure that all the guest memory region is written to disk.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-27 12:11:31 -07:00
Sebastien Boeuf
7e127df415 vmm: memory_manager: Replace 'ext_region' by 'saved_region'
Any occurrence of of a variable containing `ext_region` is replaced with
the less confusing name `saved_region`. The point is to clearly identify
the memory regions that might have been saved during a snapshot, while
the `ext` standing for `external` was pretty unclear.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-23 21:59:52 +02:00
Sebastien Boeuf
c0e8e5b53f vmm: memory_manager: Replace 'backing_file' variable names
In the context of saving the memory regions content through snapshot,
using the term "backing file" brings confusion with the actual backing
file that might back the memory mapping.

To avoid such conflicting naming, the 'backing_file' field from the
MemoryRegion structure gets replaced with 'content', as this is
designating the potential file containing the memory region data.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-23 21:59:52 +02:00
Rob Bradford
bdbea19e23 vmm: memory_manager: Completely fill guest ram from snapshot
Use GuestRegionMmap::read_exact_from() to ensure that all of the file is
read into the guest. This addresses an issue where
GuestRegionMmap::read_from() was only copying the first 2GiB of the
memory and so lead to snapshot-restore was failing when the guest RAM
was 2GiB or greater.

This change also propagates any error from the copying upwards.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-23 17:56:19 +01:00
Rob Bradford
a60b437f89 vmm: memory_manager: Always copy anonymous RAM regions from disk
When restoring if a region of RAM is backed by anonymous memory i.e from
memfd_create() then copy the contents of the ram from the file that has
been saved to disk.

Previously the code would map the memory from that file into the guest
using a MAP_PRIVATE mapping. This has the effect of
minimising the restore time but provides an issue where the restored VM
does not have the same structure as the snapshotted VM, in particular
memory is backed by files in the restored VM that were anonymously
backed in the original.

This creates two problems:

* The snapshot data is mapped from files for the pages of the guest
  which prevents the storage from being reclaimed.
* When snapshotting again the guest memory will not be correctly saved
  as it will have looked like it was backed by a file so it will not be
  written to disk but as it is a MAP_PRIVATE mapping the changes will
  never be written to the disk again. This results in incorrect
  behaviour.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-23 12:34:32 +02:00
Sebastien Boeuf
3594685279 vmm: Move balloon code from MemoryManager to DeviceManager
Now that we have a new dedicated way of asking for a balloon through the
CLI and the REST API, we can move all the balloon code to the device
manager. This allows us to simplify the memory manager, which is already
quite complex.

It also simplifies the behavior of the balloon resizing command. Instead
of providing the expected size for the RAM, which is complex when memory
zones are involved, it now expects the balloon size. This is a much more
straightforward behavior as it really resizes the balloon to the desired
size. Additionally to the simplication, the benefit of this approach is
that it does not need to be tied to the memory manager at all.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-22 16:33:16 +02:00
Sebastien Boeuf
aec88e20d7 vmm: memory_manager: Rely on physical bits for address space size
If the user provided a maximum physical bits value for the vCPUs, the
memory manager will adapt the guest physical address space accordingly
so that devices are not placed further than the specified value.

It's important to note that if the number exceed what is available on
the host, the smaller number will be picked.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-13 18:58:36 +02:00
Bo Chen
e9738a4a49 vmm: Replace the use of 'unchecked_add' with 'checked_add'
The 'GuestAddress::unchecked_add' function has undefined behavior when
an overflow occurs. Its alternative 'checked_add' requires use to handle
the overflow explicitly.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-13 12:09:22 +02:00
Bo Chen
9ab2a34b40 vmm: Remove reserved 256M gaps for hotplugging memory with ACPI
We are now reserving a 256M gap in the guest address space each time
when hotplugging memory with ACPI, which prevents users from hotplugging
memory to the maximum size they requested. We confirm that there is no
need to reserve this gap.

This patch removes the 'reserved gaps'. It also refactors the
'MemoryManager::start_addr' so that it is rounding-up to 128M alignment
when hotplugged memory is allowed with ACPI.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-13 12:09:22 +02:00
Bo Chen
10f380f95b vmm: Report no error when resizing to current memory size with ACPI
We now try to create a ram region of size 0 when the requested memory
size is the same as current memory size. It results in an error of
`GuestMemoryRegion(Mmap(Os { code: 22, kind: InvalidInput, message:
"Invalid argument" }))`. This error is not meaningful to users and we
should not report it.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-12 08:46:38 +02:00
Bo Chen
789ee7b3e4 vmm: Support resizing memory up to and including hotplug size
The start address after the hottplugged memory can be the start address of
device area.

Fixes: #1803

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-10-10 09:51:32 +02:00
Hui Zhu
c75f8b2f89 virtio-balloon: Add memory_actual_size to vm.info to show memory actual size
The virtio-balloon change the memory size is asynchronous.
VirtioBalloonConfig.actual of balloon device show current balloon size.

This commit add memory_actual_size to vm.info to show memory actual size.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-10-01 17:46:30 +02:00
Praveen Paladugu
f10872e706 vmm: fix clippy warnings
Signed-off-by: Praveen Paladugu <prapal@microsoft.com>
2020-09-26 14:07:12 +01:00
Josh Soref
5c3f4dbe6f ch: Fix various misspelled words
Misspellings were identified by https://github.com/marketplace/actions/check-spelling
* Initial corrections suggested by Google Sheets
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2020-09-23 08:59:31 +01:00
Jiangbo Wu
223189c063 mm: Apply zone's property instread of global config
Apply memory zone's property for associated virtio-mem regions.

Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>
2020-09-22 09:56:37 +02:00
Jiangbo Wu
80be8ac0dc mm: Apply memory policy for virtio-mem region
Use zone.host_numa_node to create memory zone, so that memory zone
can apply memory policy in according with host numa node ID

Signed-off-by: Jiangbo Wu <jiangbo.wu@intel.com>
2020-09-22 09:56:37 +02:00
Sebastien Boeuf
1e1a50ef70 vmm: Update memory configuration upon virtio-mem resizing
Based on all the preparatory work achieved through previous commits,
this patch updates the 'hotplugged_size' field for both MemoryConfig and
MemoryZoneConfig structures when either the whole memory is resized, or
simply when a memory zone is resized.

This fixes the lack of support for rebooting a VM with the right amount
of memory plugged in.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
de2b917f55 vmm: Add hotplugged_size to VirtioMemZone
Adding a new field to VirtioMemZone structure, as it lets us associate
with a particular virtio-mem region the amount of memory that should be
plugged in at boot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
3faf8605f3 vmm: Group virtio-mem fields under a dedicated structure
This patch simplifies the code as we have one single Option for the
VirtioMemZone. This also prepares for storing additional information
related to the virtio-mem region.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
4e1b78e1ff vmm: Add 'hotplugged_size' to memory parameters
Add the new option 'hotplugged_size' to both --memory-zone and --memory
parameters so that we can let the user specify a certain amount of
memory being plugged at boot.

This is also part of making sure we can store the virtio-mem size over a
reboot of the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
8b5202aa5a vmm: Always add virtio-mem region upon VM creation
Now that e820 tables are created from the 'boot_guest_memory', we can
simplify the memory manager code by adding the virtio-mem regions when
they are created. There's no need to wait for the first hotplug to
insert these regions.

This also anticipates the need for starting a VM with some memory
already plugged into the virtio-mem region.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
66fc557015 vmm: Store boot guest memory and use it for boot sequence
In order to differentiate the 'boot' memory regions from the virtio-mem
regions, we store what we call 'boot_guest_memory'. This is useful to
provide the adequate list of regions to the configure_system() function
as it expects only the list of regions that should be exposed through
the e820 table.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
1798ed8194 vmm: virtio-mem: Enforce alignment and size requirements
The virtio-mem driver is generating some warnings regarding both size
and alignment of the virtio-mem region if not based on 128MiB:

The alignment of the physical start address can make some memory
unusable.
The alignment of the physical end address can make some memory
unusable.

For these reasons, the current patch enforces virtio-mem regions to be
128MiB aligned and checks the size provided by the user is a multiple of
128MiB.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
0658559880 vmm: memory_manager: Rename 'use_zones' with 'user_provided_zones'
This brings more clarity on the meaning of this boolean.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
775f3346e3 vmm: Rename 'virtiomem' to 'virtio_mem'
For more consistency and help reading the code better, this commit
renames all 'virtiomem*' variables into 'virtio_mem*'.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
015c78411e vmm: Add a 'resize-zone' action to the API actions
Implement a new VM action called 'resize-zone' allowing the user to
resize one specific memory zone at a time. This relies on all the
preliminary work from the previous commits to resize each virtio-mem
device independently from each others.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
141df701dd vmm: memory_manager: Make virtiomem_resize function generic
By adding a new parameter 'id' to the virtiomem_resize() function, we
prepare this function to be usable for both global memory resizing and
memory zone resizing.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
34331d3e72 vmm: memory_manager: Fix virtio-mem resize
It's important to return the region covered by virtio-mem the first time
it is inserted as the device manager must update all devices with this
information.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
adc59a6f15 vmm: memory_manager: Create one virtio-mem per memory zone
Based on the previous code changes, we can now update the MemoryManager
code to create one virtio-mem region and resizing handler per memory
zone. This will naturally create one virtio-mem device per memory zone
from the DeviceManager's code which has been previously updated as well.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
c645a72c17 vmm: Add 'hotplug_size' to memory zones
In anticipation for resizing support of an individual memory zone,
this commit introduces a new option 'hotplug_size' to '--memory-zone'
parameter. This defines the amount of memory that can be added through
each specific memory zone.

Because memory zone resize is tied to virtio-mem, make sure the user
selects 'virtio-mem' hotplug method, otherwise return an error.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
30ff7e108f vmm: Prepare code to accept multiple virtio-mem devices
Both MemoryManager and DeviceManager are updated through this commit to
handle the creation of multiple virtio-mem devices if needed. For now,
only the framework is in place, but the behavior remains the same, which
means only the memory zone created from '--memory' generates a
virtio-mem region that can be used for resize.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Sebastien Boeuf
b173b6c5b4 vmm: Create a MemoryZone structure
In order to anticipate the need for storing memory regions along with
virtio-mem information for each memory zone, we create a new structure
MemoryZone that will replace Vec<Arc<GuestRegionMmap>> in the hash map
MemoryZones.

This makes thing more logical as MemoryZones becomes a list of
MemoryZone sorted by their identifier.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 19:20:04 +02:00
Rob Bradford
15025d71b1 devices, vm-device: Move BusDevice and Bus into vm-device
This removes the dependency of the pci crate on the devices crate which
now only contains the device implementations themselves.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-09-10 09:35:38 +01:00
Sebastien Boeuf
1970ee89da main, vmm: Remove guest_numa_node option from memory zones
The way to describe guest NUMA nodes has been updated through previous
commits, letting the user describe the full NUMA topology through the
--numa parameter (or NumaConfig).

That's why we can remove the deprecated and unused 'guest_numa_node'
option.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-07 07:37:14 +02:00
Sebastien Boeuf
f21c04166a vmm: Move NUMA node list creation to Vm structure
Based on the previous changes introducing new options for both memory
zones and NUMA configuration, this patch changes the behavior of the
NUMA node definition. Instead of relying on the memory zones to define
the guest NUMA nodes, everything goes through the --numa parameter. This
allows for defining NUMA nodes without associating any particular memory
range to it. And in case one wants to associate one or multiple memory
ranges to it, the expectation is to describe a list of memory zone
through the --numa parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-07 07:37:14 +02:00
Sebastien Boeuf
5d7215915f vmm: memory_manager: Store a list of memory zones
Now that we have an identifier per memory zone, and in order to keep
track of the memory regions associated with the memory zones, we create
and store a map referencing list of memory regions per memory zone ID.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-07 07:37:14 +02:00
Sebastien Boeuf
3ff82b4b65 main, vmm: Add mandatory id to memory zones
In anticipation for allowing memory zones to be removed, but also in
anticipation for refactoring NUMA parameter, we introduce a mandatory
'id' option to the --memory-zone parameter.

This forces the user to provide a unique identifier for each memory zone
so that we can refer to these.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-07 07:37:14 +02:00
Sebastien Boeuf
9548e7e857 vmm: Update NUMA node distances internally
Based on the NumaConfig which now provides distance information, we can
internally update the list of NUMA nodes with the exact distances they
should be located from other nodes.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 18:09:01 +02:00
Sebastien Boeuf
db28db8567 vmm: Update NUMA nodes based on NumaConfig
Relying on the list of CPUs defined through the NumaConfig, this patch
will update the internal list of CPUs attached to each NUMA node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 15:25:00 +02:00
Sebastien Boeuf
cf81254a8d vmm: memory_manager: Create a NUMA node list
Based on the 'guest_numa_node' option, we create and store a list of
NUMA nodes in the MemoryManager. The point being to associate a list of
memory regions to each node, so that we can later create the ACPI tables
with the proper memory range information.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 14:11:49 +02:00
Sebastien Boeuf
768dbd1fb0 vmm: Add 'guest_numa_node' option to 'memory-zone'
With the introduction of this new option, the user will be able to
describe if a particular memory zone should belong to a specific NUMA
node from a guest perspective.

For instance, using '--memory-zone size=1G,guest_numa_node=2' would let
the user describe that a memory zone of 1G in the guest should be
exposed as being associated with the NUMA node 2.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 14:11:49 +02:00
Sebastien Boeuf
274c001eab vmm: Use u32 instead of u64 for host_numa_node option
Given that ACPI uses u32 as the type for the Proximity Domain, we can
use u32 instead of u64 as the type for 'host_numa_node' option.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 13:29:42 +02:00
Sebastien Boeuf
a8a9e61c3d vmm: memory_manager: Allow host NUMA for RAM backed files
Let's narrow down the limitation related to mbind() by allowing shared
mappings backed by a file backed by RAM. This leaves the restriction on
only for mappings backed by a regular file.

With this patch, host NUMA node can be specified even if using
vhost-user devices.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
1b4591aecc vmm: memory_manager: Apply NUMA policy to memory zones
Relying on the new option 'host_numa_node' from the 'memory-zone'
parameter, the user can now define which NUMA node from the host
should be used to back the current memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
e6f585a31c vmm: Add 'host_numa_nodes' option to memory zones
Since memory zones have been introduced, it is now possible for a user
to specify multiple backends for the guest RAM. By adding a new option
'host_numa_node' to the 'memory-zone' parameter, we allow the guest RAM
to be backed by memory that might come from a specific NUMA node on the
host.

The option expects a node identifier, specifying which NUMA node should
be used to allocate the memory associated with a specific memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
ad5d0e4713 vmm: Remove 'mergeable' from memory zones
The flag 'mergeable' should only apply to the entire guest RAM, which is
why it is removed from the MemoryZoneConfig as it is defined as a global
parameter at the MemoryConfig level.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 07:26:49 +02:00
Sebastien Boeuf
e8149380b7 vmm: memory_manager: Factorize memory regions creation
Factorize the codepath between simple memory and multiple memory zones.
This simplifies the way regions are memory mapped, as everything relies
on the same codepath. This is performed by creating a memory zone on the
fly for the specific use case where --memory is used with size being
different from 0. Internally, the code can rely on memory zones to
create the memory regions forming the guest memory.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
c58dd761f4 vmm: Remove 'file' option from MemoryConfig
After the introduction of user defined memory zones, we can now remove
the deprecated 'file' option from --memory parameter. This makes this
parameter simpler, letting more advanced users define their own custom
memory zones through the dedicated parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
5bf7113768 vmm: memory_manager: Remove restrictions about snapshot/restore
User defined memory regions can now support being snapshot and restored,
therefore this commit removes the restrictions that were applied through
earlier commit.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
2583d572fc vmm: memory_manager: Simplify how to restore memory regions
By factorizing a lot of code into create_ram_region(), this commit
achieves the simplification of the restore codepath. Additionally, it
makes user defined memory zones compatible with snapshot/restore.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
b14c861c6f vmm: memory_manager: Store memory regions content only when necessary
First thing, this patch introduces a new function to identify if a file
descriptor is linked to any hard link on the system. This can let the
VMM know if the file can be accessed by the user, or if the file will
be destroyed as soon as the VMM releases the file descriptor.

Based on this information, and associated with the knowledge about the
region being MAP_SHARED or not, the VMM can now decide to skip the copy
of the memory region content. If the user has access to the file from
the filesystem, and if the file has been mapped as MAP_SHARED, we can
consider the guest memory region content to be present in this file at
any point in time. That's why in this specific case, there's no need for
performing the copy of the memory region content into a dedicated file.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
d1ce52f3a8 vmm: memory_manager: Make backing file from snapshot optional
Let's not assume that a backing file is going to be the result from
a snapshot for each memory region. These regions might be backed by
a file on the host filesystem (not a temporary file in host RAM), which
means they don't need to be copied and stored into dedicated files.

That's why this commit prepares for further changes by introducing an
optional PathBuf associated with the snapshot of each memory region.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
871138d5cc vm-migration: Make snapshot() mutable
There will be some cases where the implementation of the snapshot()
function from the Snapshottable trait will require to modify some
internal data, therefore we make this possible by updating the trait
definition with snapshot(&mut self).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
c13721fdbd vmm: memory_manager: Handle user defined memory zones
In case the memory size is 0, this means the user defined memory
zones are used as a way to specify how to back the guest memory.

This is the first step in supporting complex use cases where the user
can define exactly which type of memory from the host should back the
memory from the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
7cd3867e2c vmm: memory_manager: Provide file offset through create_ram_region()
In anticipation for the need to map part of a file with the function
create_ram_region(), it is extended to accept a file offset as argument.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
59d4a56ab7 vmm: memory_manager: Don't truncate backing file
In case the provided backing file is an actual file and not a directory,
we should not truncate it, as we expect the file to already be the right
size.

This change will be important once we try to map the same file through
multiple memory mappings. We can't let the file be truncated as the
second mapping wouldn't work properly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
d25ec66bb6 vmm: memory_manager: Simplify start_addr()
Small simplification for the function calculating the start address.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Michael Zhao
afc98a5ec9 vmm: Fix AArch64 clippy warnings of vmm and other crates
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-08-24 10:59:08 +02:00
Anatol Belski
eba42c392f devices: acpi: Add UID to devices with common HID
Some OS might check for duplicates and bail out, if it can't create a
distinct mapping. According to ACPI 5.0 section 6.1.12, while _UID is
optional, it becomes required when there are multiple devices with the
same _HID.

Signed-off-by: Anatol Belski <ab@php.net>
2020-08-14 08:52:02 +02:00