Adding some bits to the existing live migration test with NUMA in order
to properly validate that virtio-mem works with live migration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By using a single file for storing the memory ranges, we simplify the
way snapshot/restore works by avoiding multiple files, but the main and
more important point is that we now have a way to save only the ranges
that matter. In particular, the ranges related to virtio-mem regions are
not always fully hotplugged, meaning we don't want to save the entire
region. That's where the usage of memory ranges is interesting as it
lets us optimize the snapshot/restore process when one or multiple
virtio-mem regions are involved.
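A minimal sketch of the single-file layout, assuming guest memory is modelled as one byte slice starting at GPA 0 and that the range table itself is persisted separately (e.g. in the saved VM state) so restore can replay the ranges in the same order. `MemoryRange`, `save_ranges()` and `restore_ranges()` are hypothetical names for illustration, not the MemoryManager's actual code:

```rust
use std::io::{self, Write};

/// Hypothetical guest-physical range entry (start address and length in bytes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryRange {
    pub gpa: u64,
    pub length: u64,
}

/// Convert a range into slice bounds, rejecting ranges outside guest memory.
fn slice_bounds(r: &MemoryRange, mem_len: usize) -> io::Result<(usize, usize)> {
    let start = r.gpa as usize;
    let end = start
        .checked_add(r.length as usize)
        .filter(|&e| e <= mem_len)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range outside guest memory"))?;
    Ok((start, end))
}

/// Write only the listed ranges, back to back, into one file-like writer.
/// Returns the total number of bytes written.
pub fn save_ranges<W: Write>(guest_mem: &[u8], ranges: &[MemoryRange], file: &mut W) -> io::Result<u64> {
    let mut written = 0;
    for r in ranges {
        let (start, end) = slice_bounds(r, guest_mem.len())?;
        file.write_all(&guest_mem[start..end])?;
        written += r.length;
    }
    Ok(written)
}

/// Read the ranges back from the single file, in the same order they were saved.
pub fn restore_ranges(file: &[u8], ranges: &[MemoryRange], guest_mem: &mut [u8]) -> io::Result<()> {
    let mut offset = 0usize;
    for r in ranges {
        let (start, end) = slice_bounds(r, guest_mem.len())?;
        let len = end - start;
        let chunk = file
            .get(offset..offset + len)
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "snapshot file too short"))?;
        guest_mem[start..end].copy_from_slice(chunk);
        offset += len;
    }
    Ok(())
}
```

Because the file holds only the listed ranges, its size tracks what is actually plugged rather than the full size of each region.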
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
During snapshot/restore we will need to store this structure, which is
why it must derive the Versionize trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The function memory_range_table() will be reused by the MemoryManager in
a following patch to describe all the ranges that we should snapshot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Copy only the memory ranges that have been plugged through virtio-mem,
allowing for an interesting optimization regarding the time it takes to
migrate a large virtio-mem device. Even if the hotpluggable space is
very large (say 64GiB), if only 1GiB has been previously added to the
VM, only 1GiB will be sent to the destination VM, avoiding the transfer
of the remaining 63GiB which are unused.
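The arithmetic above as a tiny sketch: the amount transferred is the sum of the plugged range lengths and does not depend on the size of the hotpluggable region. `MemoryRange` and `bytes_to_send()` are hypothetical names:

```rust
/// Hypothetical guest-physical range entry.
#[derive(Debug, Clone, Copy)]
pub struct MemoryRange {
    pub gpa: u64,
    pub length: u64,
}

pub const GIB: u64 = 1 << 30;

/// Bytes a migration would send when only the plugged ranges are copied.
pub fn bytes_to_send(plugged: &[MemoryRange]) -> u64 {
    plugged.iter().map(|r| r.length).sum()
}
```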
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to correctly support the snapshot/restore and migration use
cases, we must be careful with the ranges that we discard by punching
holes. On restore, there might be some ranges already plugged in,
meaning they should not be discarded. That's why we loop over the list
of blocks to discard only the ranges that are marked as unplugged.
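A sketch of the discard loop, assuming the block states are kept as one `bool` per block and that consecutive unplugged blocks are coalesced before punching holes. `PlugMap`, `ranges()` and the `punch_hole` callback are hypothetical; on Linux the callback would typically wrap something like `fallocate()` with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE` on the backing file:

```rust
/// Hypothetical per-block plug state: `true` means the block is plugged.
pub struct PlugMap(pub Vec<bool>);

impl PlugMap {
    /// Coalesce consecutive blocks whose state equals `plugged` into
    /// (offset, length) byte ranges relative to the start of the region.
    pub fn ranges(&self, plugged: bool, block_size: u64) -> Vec<(u64, u64)> {
        let mut out = Vec::new();
        let mut start: Option<usize> = None;
        for (i, &state) in self.0.iter().enumerate() {
            match (state == plugged, start) {
                (true, None) => start = Some(i),
                (false, Some(s)) => {
                    out.push((s as u64 * block_size, (i - s) as u64 * block_size));
                    start = None;
                }
                _ => {}
            }
        }
        if let Some(s) = start {
            out.push((s as u64 * block_size, (self.0.len() - s) as u64 * block_size));
        }
        out
    }
}

/// Punch holes only for unplugged blocks, leaving plugged (e.g. restored) data intact.
pub fn discard_unplugged<F: FnMut(u64, u64)>(map: &PlugMap, block_size: u64, mut punch_hole: F) {
    for (offset, length) in map.ranges(false, block_size) {
        punch_hole(offset, length);
    }
}
```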
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By creating the BlocksState object in the MemoryManager, we can directly
provide it to the virtio-mem device when it is created. This will allow
the MemoryManager through each VirtioMemZone to have a handle onto the
blocks that are plugged at any point in time.
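A sketch of the shared-handle idea, assuming the state is shared through `Arc<Mutex<…>>`: the manager creates it, keeps one clone in its zone and gives the other to the device, so plug requests handled by the device are immediately visible to the manager. The fields and methods shown (and the `VirtioMemDevice` name) are hypothetical simplifications of `BlocksState`, `VirtioMemZone` and the virtio-mem device:

```rust
use std::sync::{Arc, Mutex};

/// Simplified stand-in for BlocksState: one plug flag per block.
pub struct BlocksState {
    plugged: Vec<bool>,
}

impl BlocksState {
    pub fn new(nblocks: usize) -> Self {
        BlocksState { plugged: vec![false; nblocks] }
    }
    pub fn set_range(&mut self, first: usize, count: usize, plugged: bool) {
        for b in &mut self.plugged[first..first + count] {
            *b = plugged;
        }
    }
    pub fn plugged_count(&self) -> usize {
        self.plugged.iter().filter(|&&p| p).count()
    }
}

/// Device side: updates the shared state when the guest plugs/unplugs blocks.
pub struct VirtioMemDevice {
    blocks_state: Arc<Mutex<BlocksState>>,
}

impl VirtioMemDevice {
    pub fn handle_request(&self, first: usize, count: usize, plug: bool) {
        self.blocks_state.lock().unwrap().set_range(first, count, plug);
    }
}

/// Manager side: reads the same state, e.g. when building a snapshot.
pub struct VirtioMemZone {
    blocks_state: Arc<Mutex<BlocksState>>,
}

impl VirtioMemZone {
    pub fn plugged_blocks(&self) -> usize {
        self.blocks_state.lock().unwrap().plugged_count()
    }
}

/// Create the state once in the manager and hand a clone to the device.
pub fn create_zone_and_device(nblocks: usize) -> (VirtioMemZone, VirtioMemDevice) {
    let state = Arc::new(Mutex::new(BlocksState::new(nblocks)));
    let zone = VirtioMemZone { blocks_state: Arc::clone(&state) };
    let device = VirtioMemDevice { blocks_state: state };
    (zone, device)
}
```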
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is going to be useful to let virtio-mem report the list of ranges
that are currently plugged, so that both snapshot/restore and migration
will copy only what is needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This will be helpful to support the creation of a MemoryRangeTable from
virtio-mem, as it uses 2M pages.
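A sketch of building ranges from a bitmap with a configurable page size, assuming bit `i` (LSB first within each `u64` word) marks page `i` after `start_addr`. With a 2 MiB page size, each bit corresponds to one virtio-mem block. `ranges_from_bitmap()` and `MemoryRange` are hypothetical names, not necessarily the MemoryRangeTable API:

```rust
/// Hypothetical guest-physical range entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryRange {
    pub gpa: u64,
    pub length: u64,
}

/// Coalesce set bits into contiguous ranges; `page_size` is what each bit covers.
pub fn ranges_from_bitmap(bitmap: &[u64], start_addr: u64, page_size: u64) -> Vec<MemoryRange> {
    let mut ranges: Vec<MemoryRange> = Vec::new();
    for (word_idx, &word) in bitmap.iter().enumerate() {
        for bit in 0..64u64 {
            if word & (1u64 << bit) == 0 {
                continue;
            }
            let gpa = start_addr + (word_idx as u64 * 64 + bit) * page_size;
            // Extend the previous range if this page directly follows it.
            let contiguous = matches!(ranges.last(), Some(last) if last.gpa + last.length == gpa);
            if contiguous {
                ranges.last_mut().unwrap().length += page_size;
            } else {
                ranges.push(MemoryRange { gpa, length: page_size });
            }
        }
    }
    ranges
}
```

Passing the page size in, rather than hardcoding 4 KiB, lets the same helper serve both dirty-page bitmaps and virtio-mem block bitmaps.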
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding snapshot/restore support, along with migration support,
allowing a VM with virtio-mem devices attached to be properly
migrated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The amount of memory plugged in the virtio-mem region should always be
kept up to date in the hotplugged_size field from VirtioMemZone.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need to duplicate the GuestMemory for snapshot purposes, as we
always have a handle onto the GuestMemory through the guest_memory
field.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since we only support a single PCI bus right now, advertise only a single
bus in the ACPI tables. This reduces the number of VM exits from probing
substantially.
Number of PCI config I/O port exits: 17871 -> 1551 (91% reduction) with
direct kernel boot.
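A quick check of the reduction figure: (17871 - 1551) / 17871 = 16320 / 17871 ≈ 0.913, i.e. the quoted 91%. `reduction_percent()` is a hypothetical helper:

```rust
/// Percentage reduction from `before` to `after`, rounded down to a whole percent.
/// Assumes `before > 0` and `after <= before`.
pub fn reduction_percent(before: u64, after: u64) -> u64 {
    assert!(before > 0 && after <= before);
    (before - after) * 100 / before
}
```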
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use a simpler method for extracting the affected slot on the eject
command. Also update the terminology to reflect that this is a slot rather
than a BDF (which is what device id refers to elsewhere).
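A sketch of extracting slots with `trailing_zeros()`, assuming the guest writes a little-endian 32-bit value with one bit set per slot to eject (32 slots per bus). The register layout and `ejected_slots()` are assumptions for illustration:

```rust
/// Decode the slot numbers named in an eject write (one bit per slot).
pub fn ejected_slots(data: [u8; 4]) -> Vec<u8> {
    let mut bitmap = u32::from_le_bytes(data);
    let mut slots = Vec::new();
    while bitmap != 0 {
        // Index of the lowest set bit is the slot number.
        let slot = bitmap.trailing_zeros();
        slots.push(slot as u8);
        // Clear the lowest set bit and continue.
        bitmap &= bitmap - 1;
    }
    slots
}
```

A slot number only identifies the device position on the bus, which is why it is narrower than a full bus/device/function triple.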
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Refactor the serial buffer handling so that the serial buffer's output
is written to a PTY even when the PTY is connected after the guest has
stopped writing to the serial device.
This change moves the serial buffer initialization inside the serial
manager. That is done to allow the serial buffer to be made aware of
the PTY and epoll fds needed in order to modify the
EpollDispatch::File trigger. These are then used by the serial buffer
to trigger an epoll event when the PTY fd is writable and the buffer
has content in it. They are also used to remove the trigger when the
buffer is emptied in order to avoid unnecessary wake-ups.
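A minimal sketch of the buffering logic, assuming the PTY fd is non-blocking so that a full or unconnected PTY surfaces as `WouldBlock`. The `set_writable_interest` callback is a hypothetical stand-in for adding or removing the writable epoll trigger on the PTY fd; `SerialBuffer` here is a simplified illustration, not the actual struct:

```rust
use std::collections::VecDeque;
use std::io::{self, Write};

pub struct SerialBuffer<W: Write> {
    buffer: VecDeque<u8>,
    out: W,
    // Whether the writable trigger is currently armed.
    interest: bool,
    // Hypothetical hook: arm (true) or disarm (false) the writable trigger.
    set_writable_interest: Box<dyn FnMut(bool)>,
}

impl<W: Write> SerialBuffer<W> {
    pub fn new(out: W, set_writable_interest: Box<dyn FnMut(bool)>) -> Self {
        SerialBuffer { buffer: VecDeque::new(), out, interest: false, set_writable_interest }
    }

    /// Arm the trigger while data is pending; disarm it once the buffer is empty.
    fn update_interest(&mut self) {
        let want = !self.buffer.is_empty();
        if want != self.interest {
            (self.set_writable_interest)(want);
            self.interest = want;
        }
    }

    /// Write as much buffered data as the output accepts without blocking.
    pub fn flush_buffer(&mut self) -> io::Result<()> {
        while !self.buffer.is_empty() {
            let pending = self.buffer.make_contiguous();
            match self.out.write(pending) {
                Ok(0) => break,
                Ok(n) => {
                    self.buffer.drain(..n);
                }
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        self.update_interest();
        Ok(())
    }
}

impl<W: Write> Write for SerialBuffer<W> {
    /// Guest output is always accepted into the buffer, then flushed opportunistically.
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.buffer.extend(buf);
        self.flush_buffer()?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.flush_buffer()
    }
}
```

Disarming the trigger when the buffer drains is what avoids repeated wake-ups on an always-writable PTY.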
Signed-off-by: William Douglas <william.douglas@intel.com>
In preparation for reorganizing how the serial output is constructed,
add methods to the serial devices for setting the out buffer after the
device is created.
Also add a method to enable flushing the output buffer to be used to
write the buffer to the PTY fd once the PTY is writable.
Signed-off-by: William Douglas <william.douglas@intel.com>
In the integration tests, we fetch the latest EDK2 code from its master
branch and build it. However, EDK2 master is updated frequently, and the
build is time consuming, taking a lot of time in both CI and local
testing. Floating on top of a busy master branch also brings potential
risks for tracking and debugging.
Now that Cloud Hypervisor support in EDK2 is stable, we can pin
the EDK2 software versions to avoid unnecessary updating and building.
We can update the versions manually every few months.
The commit also speeds up the build process by enabling multi-threaded
compilation.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Added a bash function to the integration test script to check out the
source code of a Git repo at a specified branch and commit.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
These packages will be used to compile `stress` from source, and
the `stress` will be used by the virtio-balloon integration test.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Both read_exact_from() and write_all_to() functions from the GuestMemory
trait implementation in vm-memory are buggy. They should retry until
they have written or read the amount of data that was expected, but
instead they simply return an error when a single read or write
transfers less than expected. This causes the migration to fail when
trying to send large amounts of data through the migration socket, due
to large memory regions.
This should eventually be fixed in vm-memory; here is the link to
follow up on the issue: https://github.com/rust-vmm/vm-memory/issues/174
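A sketch of the retry behaviour these functions should have, written against plain `std::io` traits rather than the vm-memory API (the standard `Read::read_exact` and `Write::write_all` already loop this way). `read_exact_retry()` and `write_all_retry()` are hypothetical names:

```rust
use std::io::{self, Read, Write};

/// Keep reading until `buf` is full; a short read is not an error.
pub fn read_exact_retry<R: Read>(src: &mut R, mut buf: &mut [u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match src.read(buf) {
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "source closed before all bytes were read",
                ))
            }
            Ok(n) => {
                // Advance past the bytes that were filled in.
                let rest = buf;
                buf = &mut rest[n..];
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Keep writing until all of `data` has been accepted; a short write is not an error.
pub fn write_all_retry<W: Write>(dst: &mut W, mut data: &[u8]) -> io::Result<()> {
    while !data.is_empty() {
        match dst.write(data) {
            Ok(0) => return Err(io::Error::new(io::ErrorKind::WriteZero, "destination accepted no bytes")),
            Ok(n) => data = &data[n..],
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

Sockets routinely return fewer bytes than requested for large buffers, which is why a single call is not enough.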
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This resolves issues between released version of cargo fuzz and nightly.
See rust-fuzz/cargo-fuzz#276
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Refactored the test case `test_virtio_iommu` to adapt to different
architectures and to the choice between ACPI and FDT. In the ACPI case,
a Focal image with a modified kernel is tested.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
On AArch64, ACPI must work with UEFI (EDK2), so the kernel is always
loaded from the disk image. We cannot directly specify a custom kernel
while using ACPI.
To use a custom kernel, we have to replace the kernel file in the disk
image by:
- Making a copy of the Focal `raw` image
- Mounting the rootfs with `libguestfs-tools`
- Replacing the compressed kernel file
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Installed `libguestfs-tools` to replace the kernel file in the cloud image.
Installed a kernel, as `libguestfs-tools` requires one.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Implement the infrastructure that lets a virtio-mem device map the guest
memory into the device. This is necessary since, with virtio-mem zones,
memory can be added or removed, and the vfio-user device must be
informed.
Fixes: #3025
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By moving this from the VfioUserPciDevice to DeviceManager the client
can be reused for handling DMA mapping behind an IOMMU.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
For vfio-user, the mapping handler is per device and needs to be removed
when the device is unplugged.
For VFIO, the mapping handler is for the default VFIO container (used
when no vIOMMU is used; using a vIOMMU does not require mappings with
virtio-mem).
To represent these two use cases, use an enum for the handlers that are
stored.
Adding snapshot/restore support, along with migration support,
allowing a VM with a virtio-balloon device attached to be properly
migrated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Given the 'virtiofsd' executable is used in multiple CI workers,
installing it directly into the Docker image is more efficient and can
save CI time.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Looking up devices on the port I/O bus is time consuming during the
boot as there is an O(lg n) tree lookup and the overhead from taking a
lock on the bus contents.
Avoid this by adding a fast path that uses the hardcoded port address
and size and directs PCI config requests directly to the device.
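A simplified sketch of the fast path, assuming the legacy PCI configuration mechanism #1 ports (0xCF8 address, 0xCFC data, 8 bytes in total) and a `BTreeMap` keyed by base port as the slow path. `PioBus`, `PioDevice` and `pio_read()` are hypothetical, and the bus lock described above is omitted:

```rust
use std::collections::BTreeMap;

/// PCI config address/data ports: 0xCF8..0xD00.
const PCI_CONFIG_IO_PORT: u16 = 0xcf8;
const PCI_CONFIG_IO_PORT_SIZE: u16 = 0x8;

/// Hypothetical port I/O device interface.
pub trait PioDevice {
    fn read(&mut self, offset: u16, data: &mut [u8]);
}

pub struct PioBus {
    /// Slow path: devices keyed by base port, each with its size.
    pub devices: BTreeMap<u16, (u16, Box<dyn PioDevice>)>,
    /// Fast path target, created up front.
    pub pci_config_io: Box<dyn PioDevice>,
}

impl PioBus {
    /// Returns true if some device handled the read.
    pub fn pio_read(&mut self, port: u16, data: &mut [u8]) -> bool {
        // Fast path: a fixed range check, no tree lookup.
        if (PCI_CONFIG_IO_PORT..PCI_CONFIG_IO_PORT + PCI_CONFIG_IO_PORT_SIZE).contains(&port) {
            self.pci_config_io.read(port - PCI_CONFIG_IO_PORT, data);
            return true;
        }
        // Slow path: device with the greatest base <= port, if port is within its size.
        if let Some((&base, (size, device))) = self.devices.range_mut(..=port).next_back() {
            if port - base < *size {
                device.read(port - base, data);
                return true;
            }
        }
        false
    }
}
```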
Command line:
target/release/cloud-hypervisor --kernel ~/src/linux/vmlinux --cmdline "root=/dev/vda1 console=ttyS0" --serial tty --console off --disk path=~/workloads/focal-server-cloudimg-amd64-custom-20210609-0.raw --api-socket /tmp/api
PIO exit: 17913
PCI fast path: 17871
Percentage on fast path: 99.8%
perf before:
marvin:~/src/cloud-hypervisor (main *)$ perf report -g | grep resolve
6.20% 6.20% vcpu0 cloud-hypervisor [.] vm_device::bus::Bus::resolve
perf after:
marvin:~/src/cloud-hypervisor (2021-09-17-ioapic-fast-path *)$ perf report -g | grep resolve
0.08% 0.08% vcpu0 cloud-hypervisor [.] vm_device::bus::Bus::resolve
The compromise required to implement this fast path is moving the
creation of the PciConfigIo device into DeviceManager::new() so that
it can be used in the VmmOps struct which is created before
DeviceManager::create_devices() is called.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Added a section to the "Usage" chapter of "iommu.md" to describe the
special behavior of virtio-iommu when working with FDT on AArch64.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
On AArch64, the virtual IOMMU is currently only tested with FDT, not ACPI.
With FDT, the IOMMU behaves a bit differently than with ACPI.
All the devices on the PCI bus will be attached to the virtual IOMMU,
except the virtio-iommu device itself. So these devices will all be
added to IOMMU groups and appear in the folder '/sys/kernel/iommu_groups/'.
As a result, on AArch64 IOMMU group '0' contains "0000:00:01.0", which
is the console device. But on x86, the console device is not attached to
the IOMMU, so IOMMU group '0' contains "0000:00:02.0", which is the first
disk.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>