This commit adds `KVM_SET_GUEST_DEBUG` and `KVM_TRANSLATE` ioctls to
seccomp filter to enable guest debugging without `--seccomp=false`.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
This commit adds `VmState::BreakPoint` to handle hardware breakpoint.
The VM will enter this state when a breakpoint hits or a debugger
interrupts the execution.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
42b5d4a2f7 has changed how the PciBdf
field of a DeviceNode is represented (from an int32 to its own struct).
To avoid marshelling / demarshelling issues for the projects relying on
the openapi auto generated code, let's propagate the change, updating
the yaml file accordingly.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`Dies per package` setting of VCPU topology doesnot apply on AArch64.
Now we only accept `1` value. This way we can make the `dies` field
transparent, avoid it from impacting the topology setting.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Based on the helpers from the hypervisor crate, the VMM can identify
what type of hypercall has been issued through the KVM_EXIT_TDX reason.
For now, we only log warnings and set the status to INVALID_OPERAND
since these hypercalls aren't supported. The proper handling will be
implemented later.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the object returned from CpuManager.create_vcpu() is never used,
we can avoid the cloning of this object.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By having the DeviceNode storing a PciBdf, we simplify the internal code
as well as allow for custom Serialize/Deserialize implementation for the
PciBdf structure. These custom implementations let us display the PCI
s/b/d/f in a human readable format.
Fixes#3711
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As we've added support for cold adding devices to a VM that was created
but not already started, we should propagate the `204` response
generated on those cases to the yaml file, so openapi-generator can
produce the correct client code on the go side, to handle both `200` and
`204` successful results.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of erroring out when trying to change the configuration of the
VM somewhere between the VM was created but not yet booted, let's allow
users to change that without any issue, as long as the VM has already
been created.
Fixes: #3639
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add very basic unit for the vm_add_$device() functions, so we can
easily expand those when changing its behaviour in the coming commits.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of doing the validation of the configuration change as part of
the vm, let's do this in the uper layer, in the Vmm.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's move add_to_config to config.rs so it can be used from both inside
and outside of the vm.rs file.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TDx support is already present on the project for quite some time, but
the TDx configuration was not yet exposed to the ones using CH via the
OpenAPI auto generated code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Since the devices behind the IOMMU cannot be changed at runtime we offer
the ability to place all devices on user chosen segments behind the
IOMMU. This allows the hotplugging of devices behind the IOMMU provided
that they are assigned to a segment that is located behind the iommu.
Fixes: #911
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Adding a new parameter free_page_reporting=on|off to the balloon device
so that we can enable the corresponding feature from virtio-balloon.
Running a VM with a balloon device where this feature is enabled allows
the guest to report pages that are free from guest's perspective. This
information is used by the VMM to release the corresponding pages on the
host.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to allow for human readable output for the VM configuration, we
pull it out of the snapshot, which becomes effectively the list of
states from the VM. The configuration is stored through a dedicated file
in JSON format (not including any binary output).
Having the ability to read and modify the VM configuration manually
between the snapshot and restore phases makes debugging easier, as well
as empowers users for extending the use cases relying on the
snapshot/restore feature.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As per this kernel documentation:
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals.
The pending state of the operation is not preserved in state which is
visible to userspace, thus userspace should ensure that the operation is
completed before performing a live migration. Userspace can re-enter the
guest with an unmasked signal pending or with the immediate_exit field set
to complete pending operations without allowing any further instructions
to be executed.
Since we capture the state as part of the pause and override it as part
of the resume we must ensure the state is consistent otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.
Fixes: #3658
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If a payload is found in the TDVF section, and after it's been copied to
the guest memory, make sure to create the corresponding TdPayload
structure and insert it through the HOB.
Signed-off-by: Jiaqi Gao <jiaqi.gao@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case of TDX, if a kernel and/or a command line are provided by the
user, they can't be treated the same way as for the non-TDX case. That
is why this patch ensures the function load_kernel() is only invoked for
the non-TDX case.
For the TDX case, whenever TDVF contains a Payload and/or PayloadParam
sections, the file provided through --kernel and the parameters provided
through --cmdline are copied at the locations specified by each TDVF
section.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The TDVF specification has been updated with the ability to provide a
specific payload, which means we will be able to achieve direct kernel
boot.
For that reason, let's not prevent the user from using --kernel
parameter when running with TDX.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that we introduced a separate method to indicate when the migration
is started, both start_dirty_log() and stop_dirty_log() don't have to
carry an implicit meaning as they can focus entirely on the dirty log
being started or stopped.
For that reason, we can now safely move stop_dirty_log() to the code
section performing non-local migration. It makes only sense to stop
logging dirty pages if this has been started before.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to clearly decouple when the migration is started compared to
when the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we don't end up using
start_dirty_log() for identifying when the migration has been started.
And similarly, we rely on the already existing complete_migration()
method to know when the migration has been ended.
A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was because the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
While cloud-hypervisor does support receiving the file descriptors of a
tuntap device, advertising the fds structure via the openAPI can lead to
misinterpretations of what can and what should be done.
An unadvertised consumer will think that they could rather just set the
file descriptors there directly, or even pass them as a byte array.
However, the proper way to go in those cases would be actually sending
those via send_msg(), together with the request.
As hacking the openAPI auto-generated code to properly do this is not
*that* trivial, and as doing so during a `create VM` request is not
supported, we better not advertising those.
Please, for more details, also check:
https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3607#issuecomment-1020935523
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Based on the latest code from the micro-http crate, this patch adds the
support for multiple file descriptors to be sent along with the add-net
request. This means we can now hotplug multiqueue network interface to
the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Moving the whole codebase to rely on the AccessPlatform definition from
vm-virtio so that we can fully remove it from virtio-queue crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If a PMU is enabled in a VM, we also need to initialize the PMU
when the VM is restored. Otherwise, vCPUs cannot be started after
the VM is restored.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
When enable PMU on arm64, ioctl with group KVM_HAS_DEVICE_ATTR will be
blocked by seccomp, add it to authorized list.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Relying on helpers for creating the ACPI tables and to add each table to
the HOB, this patch connects the dot to provide the set of ACPI tables
to the TD firmware.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The way to create ACPI tables for TDX is different as each table must be
passed through the HOB. This means the XSDT table is not required since
the firmware will take care of creating it. Same for RSDP, this is
firmware responsibility to provide it to the guest.
That's why this patch creates a TDX dedicated function, returning a list
of Sdt objects, which will let the calling code copy the content of each
table through the HOB.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case of TDX, we don't want to create the ACPI tables the same way we
do for all the other use cases. That's because the ACPI tables don't
need to be written to guest memory at a specific address, instead they
are passed directly through the HOB.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's been decided the ACPI tables will be passed to the firmware in a
different way, rather than using TD_VMM_DATA. Since TD_VMM_DATA was
introduced for this purpose, there's no reason to keep it in our
codebase.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This table is listed as required in the ARM Base Boot Requirements
document. The particular need arises to make the serial debugging of
Windows guest functional.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Ensure all pending virtio activations (as triggered by MMIO write on the
vCPU threads leading to a barrier wait) are completed before pausing the
vCPUs as otherwise there will a deadlock with the VMM waiting for the
vCPU to acknowledge it's pause and the vCPU waiting for the VMM to
activate the device and release the barrier.
Fixes: #3585
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These are the leftovers from the commit 8155be2:
arch: aarch64: vm_memory is not required when configuring vcpu
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
When in local migration mode send the FDs for the guest memory over the
socket along with the slot that the FD is associated with. This removes
the requirement for copying the guest RAM and gives significantly faster
live migration performance (of the order of 3s to 60ms).
Fixes: #3566
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create the VM using the FDs (wrapped in Files) that have been received
during the migration process.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>