Since the set of devices behind the IOMMU cannot be changed at runtime,
we offer the ability to place all devices on user-chosen segments behind
the IOMMU. This allows devices to be hotplugged behind the IOMMU,
provided they are assigned to a segment that is itself placed behind the
IOMMU.
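For illustration, a VM could be launched along these lines, with the
network device placed on a segment behind the IOMMU (the exact platform
and device parameter names used here are assumptions and may differ):

    cloud-hypervisor \
        --kernel vmlinux \
        --cmdline "console=hvc0 root=/dev/vda1" \
        --disk path=rootfs.img \
        --platform num_pci_segments=2,iommu_segments=1 \
        --net tap=,pci_segment=1

Devices hotplugged later onto segment 1 would then also sit behind the
virtual IOMMU.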
Fixes: #911
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new parameter, free_page_reporting=on|off, to the balloon device so
that we can enable the corresponding feature from virtio-balloon.
Running a VM with a balloon device where this feature is enabled allows
the guest to report pages that are free from the guest's perspective.
This information is used by the VMM to release the corresponding pages on
the host.
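For example, the feature can be turned on when creating the balloon
device (the balloon size and other values are only illustrative):

    cloud-hypervisor \
        --kernel vmlinux \
        --cmdline "console=hvc0 root=/dev/vda1" \
        --disk path=rootfs.img \
        --balloon size=1G,free_page_reporting=on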
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to allow for human-readable output of the VM configuration, we
pull it out of the snapshot, which effectively becomes the list of states
from the VM. The configuration is stored in a dedicated file in JSON
format (not including any binary output).
Having the ability to read and modify the VM configuration manually
between the snapshot and restore phases makes debugging easier, and it
empowers users to extend the use cases relying on the snapshot/restore
feature.
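As an illustration of the intended workflow (the socket paths, snapshot
URL and configuration file name below are assumptions):

    # Pause the VM and take a snapshot
    ch-remote --api-socket /tmp/ch.sock pause
    ch-remote --api-socket /tmp/ch.sock snapshot file:///tmp/snapshot

    # Inspect or tweak the configuration by hand before restoring
    vi /tmp/snapshot/config.json

    # Restore the VM from the (possibly modified) snapshot
    cloud-hypervisor --api-socket /tmp/ch-restore.sock \
        --restore source_url=file:///tmp/snapshot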
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As per this kernel documentation:
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals.
The pending state of the operation is not preserved in state which is
visible to userspace, thus userspace should ensure that the operation is
completed before performing a live migration. Userspace can re-enter the
guest with an unmasked signal pending or with the immediate_exit field set
to complete pending operations without allowing any further instructions
to be executed.
Since we capture the state as part of the pause and override it as part
of the resume, we must ensure the state is consistent; otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.
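A minimal sketch of the idea, based on the kvm-ioctls and libc crates
(this is not the actual Cloud Hypervisor code; error handling and the
exact integration point in the pause path are simplified):

    use kvm_ioctls::{VcpuExit, VcpuFd};

    /// Complete any pending MMIO/PIO operation before the vCPU state is
    /// captured for a pause, so that the saved state is consistent.
    fn flush_pending_io(vcpu: &mut VcpuFd) {
        // Ask KVM to return to userspace right after finishing any
        // incomplete operation, without running further guest code.
        vcpu.set_kvm_immediate_exit(1);
        match vcpu.run() {
            // KVM_RUN fails with EINTR when immediate_exit is set; the
            // pending operation has now been completed.
            Err(e) if e.errno() == libc::EINTR => (),
            Ok(VcpuExit::Intr) => (),
            // Any other exit would go through the regular exit handling.
            _ => (),
        }
        vcpu.set_kvm_immediate_exit(0);
    }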
Fixes: #3658
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If a payload is found in the TDVF sections, then after it has been copied
to guest memory, make sure to create the corresponding TdPayload
structure and insert it through the HOB.
Signed-off-by: Jiaqi Gao <jiaqi.gao@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In the case of TDX, if a kernel and/or a command line are provided by the
user, they can't be treated the same way as for the non-TDX case. That
is why this patch ensures the function load_kernel() is only invoked for
the non-TDX case.
For the TDX case, whenever TDVF contains Payload and/or PayloadParam
sections, the file provided through --kernel and the parameters provided
through --cmdline are copied to the locations specified by each TDVF
section.
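For illustration, a TD can then be booted directly into a kernel along
these lines (the exact TDX-related syntax is an assumption and may vary
between Cloud Hypervisor versions):

    cloud-hypervisor \
        --tdx firmware=tdvf.fd \
        --kernel vmlinux \
        --cmdline "console=hvc0 root=/dev/vda1" \
        --disk path=rootfs.img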
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The TDVF specification has been updated with the ability to provide a
specific payload, which means we will be able to achieve direct kernel
boot.
For that reason, let's not prevent the user from using the --kernel
parameter when running with TDX.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Make sure Cloud Hypervisor relies on the upstream and actively maintained
vfio-ioctls crate from the rust-vmm/vfio repository instead of the
deprecated version coming from the rust-vmm/vfio-ioctls repository.
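In practice this is a dependency switch along these lines (the exact git
reference is illustrative):

    [dependencies]
    vfio-ioctls = { git = "https://github.com/rust-vmm/vfio", branch = "main" }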
Fixes: #3673
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we have introduced a separate method to indicate when the
migration is started, start_dirty_log() and stop_dirty_log() no longer
have to carry an implicit meaning and can focus entirely on the dirty
log being started or stopped.
For that reason, we can now safely move stop_dirty_log() to the code
section performing non-local migration. It only makes sense to stop
logging dirty pages if logging has been started beforehand.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to clearly decouple when the migration is started from when the
dirty logging is started, we introduce a new method to the Migratable
trait. This clarifies the semantics, as we no longer end up using
start_dirty_log() to identify when the migration has started. Similarly,
we rely on the already existing complete_migration() method to know when
the migration has ended.
A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was that the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.
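A rough sketch of the resulting trait shape (the name start_migration()
for the new method and the error type are assumptions; real
implementations override the relevant methods):

    #[derive(Debug)]
    pub struct MigratableError(pub String);

    pub trait Migratable {
        /// Newly introduced: marks the start of the migration itself.
        fn start_migration(&mut self) -> Result<(), MigratableError> {
            Ok(())
        }
        /// Now strictly about dirty logging, with no implicit
        /// "migration started" meaning attached to it.
        fn start_dirty_log(&mut self) -> Result<(), MigratableError> {
            Ok(())
        }
        fn stop_dirty_log(&mut self) -> Result<(), MigratableError> {
            Ok(())
        }
        /// Already existing: marks the end of the migration.
        fn complete_migration(&mut self) -> Result<(), MigratableError> {
            Ok(())
        }
    }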
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
While cloud-hypervisor does support receiving the file descriptors of a
tuntap device, advertising the fds structure via the OpenAPI definition
can lead to misinterpretations of what can and what should be done.
A consumer of the API might think they could simply set the file
descriptors there directly, or even pass them as a byte array. However,
the proper way to go in those cases is actually to send them via
send_msg(), together with the request.
As hacking the OpenAPI auto-generated code to properly do this is not
*that* trivial, and as doing so during a `create VM` request is not
supported, we'd better not advertise them.
Please, for more details, also check:
https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3607#issuecomment-1020935523
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that all the preliminary work has been merged to make Cloud
Hypervisor work with the upstream virtio-queue crate from the
rust-vmm/vm-virtio repository, we can move the whole codebase over to it
and remove the local copy of the virtio-queue crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the latest code from the micro-http crate, this patch adds
support for multiple file descriptors to be sent along with the add-net
request. This means we can now hotplug a multiqueue network interface
into the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Move the whole codebase to rely on the AccessPlatform definition from
vm-virtio so that we can fully remove it from the virtio-queue crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If a PMU is enabled in a VM, we also need to initialize the PMU
when the VM is restored. Otherwise, vCPUs cannot be started after
the VM is restored.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
When enabling the PMU on arm64, the KVM_HAS_DEVICE_ATTR ioctl gets
blocked by seccomp, so add it to the authorized list.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Relying on helpers for creating the ACPI tables and for adding each table
to the HOB, this patch connects the dots to provide the set of ACPI
tables to the TD firmware.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The way to create ACPI tables for TDX is different as each table must be
passed through the HOB. This means the XSDT table is not required since
the firmware will take care of creating it. The same goes for the RSDP:
it is the firmware's responsibility to provide it to the guest.
That's why this patch creates a TDX-dedicated function, returning a list
of Sdt objects, which will let the calling code copy the content of each
table through the HOB.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In the case of TDX, we don't want to create the ACPI tables the same way
we do for all the other use cases. That's because the ACPI tables don't
need to be written to guest memory at a specific address; instead, they
are passed directly through the HOB.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's been decided the ACPI tables will be passed to the firmware in a
different way, rather than using TD_VMM_DATA. Since TD_VMM_DATA was
introduced for this purpose, there's no reason to keep it in our
codebase.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This table is listed as required in the ARM Base Boot Requirements
document. In particular, it is needed to make serial debugging of
Windows guests functional.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Ensure all pending virtio activations (as triggered by an MMIO write on a
vCPU thread leading to a barrier wait) are completed before pausing the
vCPUs, as otherwise there will be a deadlock: the VMM waits for the vCPU
to acknowledge its pause while the vCPU waits for the VMM to activate the
device and release the barrier.
Fixes: #3585
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These are leftovers from commit 8155be2:
arch: aarch64: vm_memory is not required when configuring vcpu
Signed-off-by: Henry Wang <Henry.Wang@arm.com>