When using PVH for booting (which we use for all firmware and direct
kernel boot), the Linux kernel does not configure LA57 correctly. As
such, we need to limit the address space to the maximum 4-level paging
address space.
If the user knows that their guest image can take advantage of the
5-level addressing and they need it for their workload then they can
increase the physical address space appropriately.
This PR removes the TDX-specific handling, as the new address space
limit is below the one that the TDX code specified.
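A minimal sketch of the clamping described above. The constant name, the
function name, and the value 46 are assumptions made for illustration;
they are not taken from this change.

```rust
/// Hypothetical cap on guest physical address bits under 4-level paging.
/// The value 46 is an assumption for this sketch.
const MAX_PHYS_BITS_4_LEVEL: u8 = 46;

/// Pick the number of physical address bits exposed to the guest.
/// `host_bits` is what the host CPU reports; `user_bits` is an optional
/// value supplied by a user who knows the guest can use 5-level paging.
fn guest_phys_bits(host_bits: u8, user_bits: Option<u8>) -> u8 {
    match user_bits {
        // An explicit user value is honoured, but never beyond the host.
        Some(bits) => bits.min(host_bits),
        // Default: stay within the 4-level paging limit.
        None => host_bits.min(MAX_PHYS_BITS_4_LEVEL),
    }
}
```

With no user value, a host reporting 52 bits is clamped to the cap,
while a user-supplied 48 is passed through as long as the host supports
it.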
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Whenever running TDX, we must pass the ACPI tables to the TDVF firmware
running in the guest. The proper way to do this is by adding the tables
to the TdHob as a TdVmmData type, so that TDVF will know how to access
these tables and expose them to the guest OS.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of creating the ACPI tables in both the x86_64 and aarch64
implementations of configure_system(), we can remove the duplicated
code by moving the ACPI table creation into the boot() function in
vm.rs.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
On AArch64, device hotplug can be enabled with ACPI. Therefore,
this commit enables the hotplug test cases for the following devices:
- PCI bar reprogramming
- virtio-disk
- virtio-net
- macvtap
- virtio-vsock
- virtio-pmem: Works with the latest reference kernel
- virtio-fs: Works with the latest reference kernel
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Currently vfio and nested virtualization are not used on AArch64,
and SGX is an x86_64-only feature. Therefore this commit adds
architecture gates to the helper functions related to vfio, SGX, and
nested virtualization to silence warnings when building tests on the
AArch64 platform.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Update the introduction of the `prefault` option.
In addition, this commit does the following:
- Rearrange options, synchronize order with `config.rs`.
- Break long lines in `hugepages`.
- Update old example of `hugepages` in memory zone.
Signed-off-by: Li Yu <liyu.yukiteru@bytedance.com>
Memory hotplug and virtio-balloon work on arm64 with:
- memory hotplug: An updated kernel using ACPI
- virtio balloon: `stress` installed in the cloud image
Therefore, we can enable their test cases in the integration tests.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
On MSHV, some of the integration test cases are not supported yet
or are still in progress. This patch disables all those test cases.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
The argument `prefault` is provided in MemoryManager, but it can
only be used by SGX and restore.
With prefault (MAP_POPULATE) set, fewer page faults occur at runtime,
at the cost of a slower boot.
This commit adds `prefault` to MemoryConfig and MemoryZoneConfig.
To resolve the conflict between the memory config and restore, the
`prefault` argument has been changed from `bool` to `Option<bool>`.
When its value is None, the value from the memory config is used;
otherwise the value carried in the Option is used.
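The resolution rule above can be sketched as follows. The function and
parameter names are hypothetical and only illustrate the
`Option<bool>` fallback; they are not the actual MemoryManager code.

```rust
/// Resolve the effective prefault setting.
/// `restore_prefault` is the optional override (for example from
/// restore); `config_prefault` is the value from the memory config.
fn effective_prefault(restore_prefault: Option<bool>, config_prefault: bool) -> bool {
    // None means "no override": fall back to the memory config value.
    restore_prefault.unwrap_or(config_prefault)
}
```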
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
Setting the reply_ack should depend on the set of acknowledged features
containing the REPLY_ACK flag.
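As an illustration, the check could look like the sketch below. It
assumes the acknowledged features are the vhost-user protocol features
held in a plain `u64`; the constant and function names are made up for
this example.

```rust
/// Bit 3 of the vhost-user protocol features
/// (VHOST_USER_PROTOCOL_F_REPLY_ACK).
const PROTOCOL_F_REPLY_ACK: u64 = 1 << 3;

/// Return true only if REPLY_ACK is part of the acknowledged features.
fn reply_ack_enabled(acked_protocol_features: u64) -> bool {
    (acked_protocol_features & PROTOCOL_F_REPLY_ACK) != 0
}
```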
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend the existing live migration test with NUMA in order to properly
validate that virtio-mem works with live migration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By using a single file for storing the memory ranges, we simplify the
way snapshot/restore works by avoiding multiple files. More importantly,
we now have a way to save only the ranges that matter. In particular,
virtio-mem regions are not always fully hotplugged, meaning we don't
want to save the entire region. This is where memory ranges are useful:
they let us optimize the snapshot/restore process when one or more
virtio-mem regions are involved.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
During snapshot/restore we will need to store this structure, which is
why it must derive the Versionize trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The function memory_range_table() will be reused by the MemoryManager in
a following patch to describe all the ranges that we should snapshot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Copy only the memory ranges that have been plugged through virtio-mem,
allowing for an interesting optimization regarding the time it takes to
migrate a large virtio-mem device. Even if the hotpluggable space is
very large (say 64GiB), if only 1GiB has been previously added to the
VM, only 1GiB will be sent to the destination VM, avoiding the transfer
of the remaining 63GiB which are unused.
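A rough sketch of how plugged blocks could be turned into ranges to
copy. The 2 MiB block size, the `&[bool]` representation of the block
state, and the function name are assumptions for this example only.

```rust
/// Assumed virtio-mem block size for this sketch (2 MiB).
const BLOCK_SIZE: u64 = 2 << 20;

/// Convert per-block plug state into (offset, length) byte ranges,
/// merging consecutive plugged blocks into a single range.
fn plugged_ranges(blocks: &[bool]) -> Vec<(u64, u64)> {
    let mut ranges = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &plugged) in blocks.iter().enumerate() {
        match (plugged, start) {
            // A plugged block opens a new run.
            (true, None) => start = Some(i),
            // An unplugged block closes the current run.
            (false, Some(s)) => {
                ranges.push((s as u64 * BLOCK_SIZE, (i - s) as u64 * BLOCK_SIZE));
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the region.
    if let Some(s) = start {
        ranges.push((s as u64 * BLOCK_SIZE, (blocks.len() - s) as u64 * BLOCK_SIZE));
    }
    ranges
}
```

Only the returned ranges would need to be sent; unplugged blocks never
appear in the result.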
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to correctly support the snapshot/restore and migration use
cases, we must be careful with the ranges that we discard by punching
holes. On restore, there might be some ranges already plugged in,
meaning they should not be discarded. That's why we loop over the list
of blocks to discard only the ranges that are marked as unplugged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By creating the BlocksState object in the MemoryManager, we can provide
it directly to the virtio-mem device when it is created. This allows
the MemoryManager, through each VirtioMemZone, to keep a handle on the
blocks that are plugged at any point in time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is going to be useful to let virtio-mem report the list of ranges
that are currently plugged, so that both snapshot/restore and migration
will copy only what is needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This will be helpful to support the creation of a MemoryRangeTable from
virtio-mem, as it uses 2M pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add snapshot/restore support along with migration, allowing a VM with
virtio-mem devices attached to be properly snapshotted, restored and
migrated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The amount of memory plugged in the virtio-mem region should always be
kept up to date in the hotplugged_size field from VirtioMemZone.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need to duplicate the GuestMemory for snapshot purposes, as
we always have a handle on the GuestMemory through the guest_memory
field.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since we only support a single PCI bus right now, advertise only a
single bus in the ACPI tables. This substantially reduces the number of
VM exits caused by probing.
Number of PCI config I/O port exits: 17871 -> 1551 (91% reduction) with
direct kernel boot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use a simpler method for extracting the affected slot on the eject
command. Also update the terminology to reflect that this is a slot
rather than a bdf (which is what device id refers to elsewhere).
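One possible shape for such an extraction, shown only as a sketch. It
assumes the eject command carries a 32-bit bitmap with one bit per
slot, which this message does not state; the function name is
hypothetical.

```rust
/// Return the slot numbers set in an eject bitmap, lowest first.
/// Assumption: bit N set means "eject slot N".
fn ejected_slots(mut slot_bitmap: u32) -> Vec<u32> {
    let mut slots = Vec::new();
    while slot_bitmap != 0 {
        // Index of the lowest set bit is the slot number.
        let slot = slot_bitmap.trailing_zeros();
        slots.push(slot);
        // Clear that bit and continue with the remaining slots.
        slot_bitmap &= !(1 << slot);
    }
    slots
}
```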
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Refactor the serial buffer handling in order to write the serial
buffer's output to a PTY connected after the serial device stops being
written to by the guest.
This change moves the serial buffer initialization inside the serial
manager. That is done to allow the serial buffer to be made aware of
the PTY and epoll fds needed in order to modify the
EpollDispatch::File trigger. These are then used by the serial buffer
to trigger an epoll event when the PTY fd is writable and the buffer
has content in it. They are also used to remove the trigger when the
buffer is emptied in order to avoid unnecessary wake-ups.
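The buffering behaviour can be sketched as below. This is a simplified
model and not the actual serial manager code: the `SerialBuffer` layout,
the method names, and the use of a returned `bool` in place of real
epoll calls are all assumptions made for illustration.

```rust
use std::collections::VecDeque;
use std::io::{self, Write};

/// Holds guest output that could not yet be written to the PTY.
struct SerialBuffer<W: Write> {
    pending: VecDeque<u8>,
    out: W,
}

impl<W: Write> SerialBuffer<W> {
    fn new(out: W) -> Self {
        SerialBuffer { pending: VecDeque::new(), out }
    }

    /// Write as much pending data as the output accepts.
    /// Returns true if data remains (keep or add the "writable"
    /// trigger) and false once the buffer is empty (remove it).
    fn flush_pending(&mut self) -> io::Result<bool> {
        while !self.pending.is_empty() {
            let (front, _) = self.pending.as_slices();
            match self.out.write(front) {
                Ok(0) => break,
                Ok(n) => {
                    self.pending.drain(..n);
                }
                // Output not ready: keep the data for a later wake-up.
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(!self.pending.is_empty())
    }

    /// Queue new guest output, then try to flush it immediately.
    fn push(&mut self, data: &[u8]) -> io::Result<bool> {
        self.pending.extend(data);
        self.flush_pending()
    }
}
```

In this model, the caller would register the writable trigger when
`push` returns true and drop it once `flush_pending` returns false.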
Signed-off-by: William Douglas <william.douglas@intel.com>
In preparation for reorganizing how the serial output is constructed,
add methods to the serial devices for setting the out buffer after the
device is created.
Also add a method to flush the output buffer, which will be used to
write the buffer to the PTY fd once the PTY is writable.
Signed-off-by: William Douglas <william.douglas@intel.com>
In the integration tests, we fetch the latest EDK2 code from its master
branch and build it. However, EDK2 master is updated frequently and the
build is time consuming, so this costs a lot of time in CI and in local
testing. Floating on top of a busy master branch also brings potential
risk when tracking and debugging issues.
Now that Cloud Hypervisor support in EDK2 is stable, we can pin the
EDK2 software versions to avoid unnecessary updating and building.
We can update the versions manually every few months.
The commit also optimizes the build process by enabling multi-threaded
compilation.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Add a bash function to the integration test script to check out the
source code of a Git repository at a specified branch and commit.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
These packages will be used to compile `stress` from source, and
`stress` will be used by the virtio-balloon integration test.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Both read_exact_from() and write_all_to() functions from the GuestMemory
trait implementation in vm-memory are buggy. They should retry until
they have written or read the expected amount of data, but instead they
simply return an error on a partial read or write. This causes the
migration to fail when trying to send large amounts of data through the
migration socket, due to large memory regions.
This should eventually be fixed in vm-memory. The issue is tracked at:
https://github.com/rust-vmm/vm-memory/issues/174
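For reference, the retry behaviour described above looks roughly like
the following sketch, written against `std::io::Write` rather than the
vm-memory API. The function name is hypothetical and this is not the
workaround code itself.

```rust
use std::io::{self, Write};

/// Keep writing until the whole buffer has been accepted, instead of
/// failing on the first partial write.
fn write_all_retrying<W: Write>(dst: &mut W, mut buf: &[u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match dst.write(buf) {
            // The destination accepts nothing more: report it as an error.
            Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
            // Partial write: advance past what was written and retry.
            Ok(n) => buf = &buf[n..],
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

The read side follows the same pattern, looping until the expected
number of bytes has been read.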
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This resolves issues between the released version of cargo fuzz and the
nightly toolchain. See rust-fuzz/cargo-fuzz#276
Signed-off-by: Rob Bradford <robert.bradford@intel.com>