Change the block size to the standard 512-byte sector size so that
disk images can be used (since their partition tables will be specified
in terms of 512-byte sectors).
Also remove the hugepages=on option from the command line, as it is not
necessary.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When mapping the region into the guest, ensure that all the fields are
updated correctly, as the unmap code path checks that they are set.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit adds an AArch64-only integration test case called
`test_guest_numa_nodes_dt`, making it possible to test the NUMA
configuration in the FDT on the AArch64 platform.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The NUMA memory configuration is described by the `--memory-zone` and
`--numa` parameters on the Cloud Hypervisor command line. This commit
adds that NUMA memory configuration to the FDT memory node.
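A minimal sketch of what such a memory node could look like when built
with the vm-fdt crate; the helper name and the region values are
illustrative, not the actual implementation:

    use vm_fdt::{FdtWriter, FdtWriterResult};

    // Emit one FDT memory node for a NUMA node's memory region and tag
    // it with its `numa-node-id`. Addresses/sizes are placeholders.
    fn create_numa_memory_node(
        fdt: &mut FdtWriter,
        node_id: u32,
        base: u64,
        size: u64,
    ) -> FdtWriterResult<()> {
        let mem = fdt.begin_node(&format!("memory@{:x}", base))?;
        fdt.property_string("device_type", "memory")?;
        // `reg` holds the <base size> pair describing the region.
        fdt.property_array_u64("reg", &[base, size])?;
        // Associate the region with its NUMA node.
        fdt.property_u32("numa-node-id", node_id)?;
        fdt.end_node(mem)?;
        Ok(())
    }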
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The same memory zone must not belong to more than one NUMA node. This
commit adds validation of the `--numa` parameter to prevent the user
from specifying such a configuration.
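The idea behind the check, as a sketch (the types and error handling
are illustrative, not the actual Cloud Hypervisor code):

    use std::collections::HashSet;

    // Every memory zone id referenced by `--numa` entries must be
    // unique across all NUMA nodes.
    fn validate_zones_unique(
        numa_nodes: &[(u32, Vec<String>)], // (node id, memory zone ids)
    ) -> Result<(), String> {
        let mut seen = HashSet::new();
        for (node_id, zones) in numa_nodes {
            for zone in zones {
                if !seen.insert(zone.as_str()) {
                    return Err(format!(
                        "memory zone '{}' assigned to more than one \
                         NUMA node (node {})",
                        zone, node_id
                    ));
                }
            }
        }
        Ok(())
    }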
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
For the purpose of identification, each NUMA node is associated
with a unique token known as a `numa-node-id`. For the purpose of
device tree binding, a `numa-node-id` is a 32-bit integer.
The CPU node is associated with a NUMA node by the presence of a
`numa-node-id` property which contains the node id of the device.
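For example, the CPU node binding could be written like this with
vm-fdt (the `reg` encoding and helper name are simplified for
illustration):

    use vm_fdt::{FdtWriter, FdtWriterResult};

    // Tag a CPU node with the NUMA node its vCPU belongs to. Per the
    // devicetree NUMA binding, `numa-node-id` is a 32-bit integer.
    fn create_cpu_node(
        fdt: &mut FdtWriter,
        cpu_id: u32,
        numa_node_id: u32,
    ) -> FdtWriterResult<()> {
        let cpu = fdt.begin_node(&format!("cpu@{:x}", cpu_id))?;
        fdt.property_string("device_type", "cpu")?;
        fdt.property_u32("reg", cpu_id)?;
        fdt.property_u32("numa-node-id", numa_node_id)?;
        fdt.end_node(cpu)?;
        Ok(())
    }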
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The optional device tree node distance-map describes the relative
distance (memory latency) between all NUMA nodes.
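A sketch of that node as it could be emitted with vm-fdt; per the
devicetree NUMA binding, `distance-matrix` is a flat list of
(from, to, distance) triplets, and the distances below are made-up
values for two nodes:

    use vm_fdt::{FdtWriter, FdtWriterResult};

    fn create_distance_map(fdt: &mut FdtWriter) -> FdtWriterResult<()> {
        let map = fdt.begin_node("distance-map")?;
        fdt.property_string("compatible", "numa-distance-map-v1")?;
        fdt.property_array_u32(
            "distance-matrix",
            &[
                0, 0, 10, // node 0 -> node 0 (local)
                0, 1, 20, // node 0 -> node 1
                1, 1, 10, // node 1 -> node 1 (local)
            ],
        )?;
        fdt.end_node(map)?;
        Ok(())
    }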
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This is to make sure the NUMA node data structures can be accessed
from both the `vmm` and `arch` crates.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The AArch64 platform provides a NUMA binding for the device tree,
which means that on AArch64 the NUMA setup can be extended beyond the
ACPI feature.
Based on the above, this commit extends the NUMA setup and data
structures to the following scenarios:
- All AArch64 platforms
- x86_64 platforms with the ACPI feature enabled
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Michael Zhao <Michael.Zhao@arm.com>
Introducing a new structure, VhostUserCommon, allowing us to factorize
a lot of the code shared between the vhost-user devices (block, fs and
net).
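The shape of the factorization, as an illustrative sketch only (the
field names are assumptions, not the actual definition):

    use std::sync::{Arc, Mutex};

    // Placeholder for whatever type wraps the vhost-user master
    // connection in the real code.
    pub struct VhostUserConnection;

    // State every vhost-user device (block, fs, net) needs, pulled
    // into one shared structure.
    pub struct VhostUserCommon {
        // Handle to the backend connection, once established.
        pub vu: Option<Arc<Mutex<VhostUserConnection>>>,
        // Features acked during negotiation.
        pub acked_features: u64,
        // Path to the backend socket, kept for reconnection.
        pub socket_path: String,
    }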
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of panicking with an expect() function, the QcowDiskSync::new
function now propagates the error properly. This ensures the VMM will
not panic; such a panic might be the source of weird errors, since
only one thread exits while the VMM continues to run.
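The pattern in miniature (types and signature are illustrative, not
the actual ones):

    use std::fs::File;

    #[derive(Debug)]
    pub enum DiskError {
        OpenFile(std::io::Error),
    }

    pub struct QcowDiskSync {
        file: File,
    }

    impl QcowDiskSync {
        // Map the I/O error into the device's error type and let the
        // caller decide, rather than calling expect(), which would
        // panic just this thread while the VMM keeps running.
        pub fn new(path: &str) -> Result<Self, DiskError> {
            let file = File::open(path).map_err(DiskError::OpenFile)?;
            Ok(QcowDiskSync { file })
        }
    }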
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of running the generic block fuzzer with a QCOW file, it's
better to use a RAW file, since it's less complex and keeps the
fuzzing focused on the virtqueues.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We cannot let vhost-user devices connect to the backend when the Block,
Fs or Net object is being created during a restore/migration. The reason
is that we can't have two VMs (source and destination) connected to the
same backend at the same time. That's why we must delay the connection
to the vhost-user backend until the restoration is performed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introducing a new function to factorize a small part of the
initialization that is shared between a full reinitialization and a
restoration.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code wasn't doing what it was expected to do. The '?' was simply
returning the error to the top-level function, meaning the Err() case in
the match was never hit. Moving the whole logic to a dedicated function
allows us to identify when something went wrong without propagating the
error to the calling function, so that we can still stop the dirty
logging and unpause the VM.
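The pitfall and the fix, in miniature (function names are
illustrative):

    fn send_memory() -> Result<(), String> {
        Err("transfer failed".into()) // imagine the transfer fails
    }

    // Broken shape: on Err, `?` returns immediately to *our* caller,
    // so the cleanup below never runs on the error path.
    fn migrate_broken() -> Result<(), String> {
        send_memory()?;
        // stop dirty logging, unpause the VM... (success path only)
        Ok(())
    }

    // Fixed shape: keep the fallible logic in a dedicated function
    // and inspect its result here, so cleanup runs in both cases.
    fn migrate_fixed() -> Result<(), String> {
        let res = migrate_broken();
        // stop dirty logging, unpause the VM... (both paths)
        res
    }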
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If the migration succeeds, the destination VM will be running
correctly, with any vhost-user backends attached to it. We can't let
the source VM try to reconnect to the same backends, which is why it's
safer to shut down the source VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to prevent the vhost-user devices from reconnecting to the
backend after the migration has been successfully performed, we make
sure to kill the thread in charge of handling the reconnection
mechanism.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a way to let every Migratable object know when the migration is
complete, so they can take appropriate actions.
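A sketch of such a hook: a defaulted trait method so that only the
objects that need to react (e.g. vhost-user devices) override it. The
method name and error type below stand in for the real ones:

    #[derive(Debug)]
    pub struct MigratableError(pub String);

    pub trait Migratable {
        // Called once the migration has completed successfully.
        // Default is a no-op.
        fn complete_migration(&mut self) -> Result<(), MigratableError> {
            Ok(())
        }
    }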
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
During a migration, the vhost-user device talks to the backend to
retrieve the dirty pages. Once this is done, a snapshot will be taken,
meaning there's no need to communicate with the backend anymore. Closing
the communication is needed to let the destination VM connect to the
same backend.
That's why we shut down the communication with the backend when a
migration has been started and we're asked for a snapshot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This anticipates the need for creating a new Blk, Fs or Net object
without having yet connected to the vhost-user backend.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation of creating vhost-user devices differently when being
restored, compared to a fresh start, this commit introduces a new
boolean, created by the Vm depending on the use case and passed down to
the DeviceManager. In the future, the DeviceManager will use this flag
to decide how vhost-user devices should be created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It was incorrect to call Vec::from_raw_parts() on the address pointing
to the shared memory log region, since Vec is a Rust-specific structure
that doesn't directly translate into raw bytes. That's why we use the
equivalent function from std::slice to create a proper slice out of the
memory region, which is then copied into a Vec.
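The fix in miniature (the helper name is illustrative):

    /// Copy `len` bytes of the shared-memory log region into an owned
    /// Vec.
    ///
    /// # Safety
    /// `addr` must point to `len` readable bytes for the whole call.
    unsafe fn read_log_region(addr: *const u8, len: usize) -> Vec<u8> {
        // A slice is just (pointer, length), so this is a valid view
        // of the mapping; `to_vec()` then copies it into Rust-owned
        // memory. Vec::from_raw_parts() here would be undefined
        // behavior: Vec assumes it owns a heap allocation and would
        // try to free the mapping on drop.
        std::slice::from_raw_parts(addr, len).to_vec()
    }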
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the common vhost-user code can handle logging dirty pages
through shared memory, we need to advertise it to the vhost-user
backends with the protocol feature VHOST_USER_PROTOCOL_F_LOG_SHMFD.
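Sketch of the negotiation change, assuming the rust-vmm vhost crate's
flag names and an example set of other features already advertised:

    use vhost::vhost_user::message::VhostUserProtocolFeatures;

    // Add LOG_SHMFD to the protocol features advertised to the
    // backend, alongside whatever the device already negotiates.
    fn protocol_features() -> VhostUserProtocolFeatures {
        VhostUserProtocolFeatures::MQ
            | VhostUserProtocolFeatures::REPLY_ACK
            | VhostUserProtocolFeatures::LOG_SHMFD
    }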
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Correct operation of user devices (vfio-user) requires shared memory,
so flag this requirement to prevent the devices from failing in strange
ways.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create the vfio-user / user devices from the config. Currently,
hotplug of the devices is not supported, nor can they be placed behind
the (virt-)iommu.
Removal of a coldplugged device is, however, supported.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Taking advantage of the refactored VFIO code, implement a new
VfioUserPciDevice that wraps the client for vfio-user and exposes the
BusDevice and PciDevice implementations to the VMM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows the user to specify devices that are running in a
different userspace process and communicated with via vfio-user.
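For illustration only, assuming the `--user-device` option added by
this series; the socket path is a placeholder and other required VM
options (kernel, disk, ...) are omitted:

    cloud-hypervisor \
        --memory size=1G,shared=on \
        --user-device socket=/tmp/vfio-user.sock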
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Implement most of the client side (i.e. the VMM side) of the vfio-user
protocol:
https://github.com/nutanix/libvfio-user/blob/master/docs/vfio-user.rst
Items that are not implemented (because they are optimisations, or are
unused due to alternative solutions):
* VFIO_USER_DMA_READ/WRITE - a way for the server to read guest memory
  when that memory is not shared by fd, i.e. when the client doesn't
  support fd sharing. Since we do support sharing the memory by fd,
  this is not required.
* VFIO_USER_GET_REGION_IO_FDS - an optimisation to bypass the VMM by
  having KVM talk directly to the backend using ioregionfd.
* VFIO_USER_DIRTY_PAGES - needed for the implementation of live
  migration.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>