Commit Graph

1610 Commits

Author SHA1 Message Date
Fabiano Fidêncio
2c8045343c vmm,cpu: Deny resizing only if the vcpu amount has changed
188078467d made clear that resize should
only happen when dealing with a "dynamic" CpuManager.  Although this is
very much correct, it causes a regression on Kata Containers (and on any
other consumer of Cloud Hypervisor) in cases where a resize would be
triggered but the vCPUs values wouldn't be changed.

There's no doubt Kata Containers could do better and do not call a
resize in such situations, and that's something that should **also** be
solved there.  However, we should also work this around on Cloud
Hypervisor side as it introduces a regression with the current Kata
Containers code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-30 21:29:08 +01:00
Sebastien Boeuf
3c973fa7ce virtio-devices: vhost-user: Add support for TDX
By enabling the VIRTIO feature VIRTIO_F_IOMMU_PLATFORM for all
vhost-user devices when needed, we force the guest to use the DMA API,
making these devices compatible with TDX. By using DMA API, the guest
triggers the TDX codepath to share some of the guest memory, in
particular the virtqueues and associated buffers so that the VMM and
vhost-user backends/processes can access this memory.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-30 10:32:23 +02:00
Rob Bradford
ca68b9e7a9 build: Remove "cmos" feature gate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-29 15:20:58 +01:00
Rob Bradford
e0d3efec6e devices: cmos: Implement CMOS based reset
If EFI reset fails on the Linux kernel then it will fallthrough to CMOS
reset. Implement this as one of our reset solutions.

Fixes: #3912

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-29 15:20:58 +01:00
Rob Bradford
7c0cf8cc23 arch, devices, vmm: Remove "acpi" feature gate
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-28 09:18:29 -07:00
William Douglas
6b0df31e5d vmm: Add support for enabling AMX in vm guests
AMX is an x86 extension adding hardware units for matrix
operations (int and float dot products). The goal of the extension is
to provide performance enhancements for these common operations.

On Linux, AMX requires requesting the permission from the kernel prior
to use. Guests wanting to make use of the feature need to have the
request made prior to starting the vm.

This change then adds the first --cpus features option amx that when
passed will enable AMX usage for guests (needs a 5.17+ kernel) or
exits with failure.

The activation is done in the CpuManager of the VMM thread as it
allows migration and snapshot/restore to work fairly painlessly for
AMX enabled workloads.

Signed-off-by: William Douglas <william.douglas@intel.com>
2022-03-25 14:11:54 -07:00
Bo Chen
639a7dd73a vmm: Improve 'test_config_validation' with precise Err assertions
Fixed: #3879

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-03-25 09:17:05 +00:00
Sebastien Boeuf
afd9f17b73 virtio-fs: Deprecate the DAX feature
Disable the DAX feature from the virtio-fs implementation as the feature
is still not stable. The feature is deprecated, meaning the 'dax'
parameter will be removed in about 2 releases cycles.

In the meantime, the parameter value is ignored and forced to be
disabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-24 10:39:11 -07:00
Rob Bradford
7a8061818e vmm: Don't expose MemoryManager ACPI functionality unless required
When running non-dynamic or with virtio-mem for hotplug the ACPI
functionality should not be included on the DSDT nor does the
MemoryManager need to be placed on the MMIO bus.

Fixes: #3883

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-24 13:17:51 +00:00
Rob Bradford
f6dfb42a64 vmm: cpu: Don't place CpuManager on MMIO bus when non-dynamic
This is now consistent with not supplying the _CRS for the device when
CpuManager is not dynamic.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-24 13:17:39 +00:00
Rob Bradford
bbf7fd5372 vmm: Reject memory resizing on TDX
This is similar to the dynamic concept used in CpuManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-23 23:15:20 +00:00
Rob Bradford
1756b23aea vmm: device_manager: Check IOMMU placed device hotplug
Rather than just printing a message return an error back through the API
if the user attempts to hotplug a device that supports being behind an
IOMMU where that device isn't placed on an IOMMU segment.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00
Rob Bradford
0834eca8d4 vmm: config: Validate IOMMU configuration
Ensure devices that are specified to be on a PCI segment that is behind
the IOMMU are IOMMU enabled if possible or error out for those devices
that do not support it.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00
Rob Bradford
6d2224f1ba vmm: device_manager: Create IOMMU mapping for hotplugged virtio devices
Previously it was not possible to enable vIOMMU for a virtio device.
However with the ability to place an entire PCI segment behind the
IOMMU the IOMMU mapping needs to be setup for the virtio device if it is
behind the IOMMU.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00
Rob Bradford
54b65107b1 vmm: config: Validate vDPA devices in configuration
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-22 11:13:20 +00:00
Rob Bradford
3b8a017257 vmm: acpi: Print total size of ACPI tables
This can already be calculated by the summing the tables reported by the
Linux kernel but this is more convenient.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-21 14:41:46 +00:00
Sebastien Boeuf
9c95109a6b vmm: Streamline reboot code path
Separate the destruction and cleanup of original VM and the creation of
the new one. In particular have a clear hand off point for resources
(e.g. reset EventFd) used by the new VM from the original. In the
situation where vm.shutdown() generates an error this also avoids the
Vmm reference to the Vm (self.vm) from being maintained.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:50 +01:00
Sebastien Boeuf
3fea5f5396 vmm: Add support for hotplugging a vDPA device
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Sebastien Boeuf
c73c6039c3 vmm: Enable vDPA support
Based on the newly added Vdpa device along with the new vdpa parameter,
this patch enables the support for vDPA devices.

It's important to note this the only virtio device for which we provide
an ExternalDmaMapping instance. This will allow for the right DMA ranges
to be mapped/unmapped.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Sebastien Boeuf
72169686fe vmm: Add a vDPA device parameter
Introduce a new --vdpa parameter associated with a VdpaConfig for the
future creation of a Vdpa device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Rob Bradford
7324b0e514 vmm: cpu: Only include hotplug/unplug related AML code if dynamic
This will significantly reduce the size of the DSDT and the effort
required to parse them if there is no requirement to support
hotplug/unplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-17 13:46:21 +00:00
Rob Bradford
188078467d vmm: cpu: Deny resizing if CpuManager is not dynamic
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-17 13:46:21 +00:00
Rob Bradford
e5cb13588b vmm: cpu: Add concept of making CpuManager dynamic
If the CpuManager is dynamic it devices CPUs can be
hotplugged/unplugged.

Since TDX does not support CPU hotplug this is currently the only
determinator as to whether the CpuManager is dynamic.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-03-17 13:46:21 +00:00
LiHui
b0be5ff8ad API: fix http hang for vmm.ping/vm.create/vm.info/vmm.shutdown
vmm.ping/vm.info will hang for PUT method, vm.create/vmm.shutdonw hang for GET method.
Because these four APIs do not write the response body when the HTTP method does not match.

Signed-off-by: LiHui <andrewli@kubesphere.io>
2022-03-11 11:56:14 +00:00
Sebastien Boeuf
9d46890dc0 vmm: device_manager: Make virtio DMA mapping conditional on vIOMMU
In case the virtio device which requires DMA mapping is placed behind a
virtual IOMMU, we shouldn't map/unmap any region manually. Instead, we
provide the DMA handler to the virtio-iommu device so that it can
trigger the proper mappings.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-11 12:37:17 +01:00
Sebastien Boeuf
a4f742277b vmm: device_manager: Handle DMA mapping for virtio devices
If a virtio device is associated with a DMA handler, the DMA mapping and
unmapping is performed from the device manager through the handler.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-11 12:37:17 +01:00
Sebastien Boeuf
86bc313f38 virtio-devices, vmm: Register a DMA handler to VirtioPciDevice
Given that some virtio device might need some DMA handling, we provide a
way to store this through the VirtioPciDevice layer, so that it can be
accessed when the PCI device is removed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-11 12:37:17 +01:00
Sebastien Boeuf
54d63e774c vmm: device_manager: Extend MetaVirtioDevice with a DMA handler
In anticipation for handling potential DMA mapping/unmapping operations for a
virtio device, we extend the MetaVirtioDevice with an additional field
that holds an optional DMA handler.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-11 12:37:17 +01:00
Sebastien Boeuf
f801b0fc72 vmm: device_manager: Factorize virtio device tuple into structure
The tuple of information related to each virtio device is too big, and
it's better to factorize it through a dedicated structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-11 12:37:17 +01:00
Sebastien Boeuf
80296b9497 vmm: device_manager: Remove typedef VirtioDeviceArc
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-11 12:37:17 +01:00
Yi Wang
5375b84e3b vmm: interrupt: fix msi mask irq causing kernel panic on AMD
When mask a msi irq, we set the entry.masked to be true, so kvm
hypervisor will not pass the gsi to kernel through KVM_SET_GSI_ROUTING
ioctl which update kvm->irq_routing. This will trigger kernel
panic on AMD platform when the gsi is the largest one in kernel
kvm->irqfds.items:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
 #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
 #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
 #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
 #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
 #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
 #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

To solve this problem, move route.disable() before set_gsi_routes() to
remove the gsi from irqfds.items first.

This problem only exists on AMD platform, 'cause on Intel platform
kernel just return when update irte while it only prints a warning on
AMD.

Also, this patch adjusts the order of enable() and set_gsi_routes() in
unmask(), which should do no harm.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
2022-03-10 09:27:50 +01:00
Yi Wang
db9e5e5a87 vmm: interrupt: fix msi mask irq causing kernel panic on AMD
When mask a msi irq, we set the entry.masked to be true, so kvm
hypervisor will not pass the gsi to kernel through KVM_SET_GSI_ROUTING
ioctl which update kvm->irq_routing. This will trigger kernel
panic on AMD platform when the gsi is the largest one in kernel
kvm->irqfds.items:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
 #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
 #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
 #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
 #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
 #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
 #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

To solve this problem, move route.disable() before set_gsi_routes() to
remove the gsi from irqfds.items first.

This problem only exists on AMD platform, 'cause on Intel platform
kernel just return when update irte while it only prints a warning on
AMD.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
2022-03-10 09:27:50 +01:00
Wei Liu
4cf22e4ec7 arch: do not hardcode MMIO region length in MmioDeviceInfo
Add a field for its length and fix up users.

Things work just because all hardcoded values agree with each other.
This is prone to breakage.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-03-04 15:21:48 +08:00
Feng Ye
6c1fe07d90 openapi: Mark ReceiveMigrationData.receiver_url as required
Signed-off-by: Feng Ye <yefeng@smartx.com>
2022-02-24 09:17:22 +01:00
Sebastien Boeuf
00fbd77494 vmm: api: Make 'local' optional in SendMigrationData
Make sure the OpenAPI definition matches the code.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-23 14:37:41 +01:00
Feng Ye
c504f302e9 vmm: api: Make VmSendMigrationData.local optional
Fixes: #3756

Signed-off-by: Feng Ye <yefeng@smartx.com>
2022-02-23 11:56:09 +00:00
Akira Moroo
2451c4d833 vmm: Implement GDB event handler to enable --gdb flag
This commit adds event fds and the event handler to send/receive
requests and responses from the GDB thread. It also adds `--gdb` flag to
enable GDB stub feature.

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Akira Moroo
23bb629241 vmm: Add stop_on_boot to Vm to stop VM on boot
This commit adds `stop_on_boot` to `Vm` so that the VM stops before
starting on boot requested. This change is required to keep the target
VM stopped before a debugger attached as the user expected.

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Akira Moroo
bae63a8b8c vmm: Add debug_request to send debug request
This commit adds `Vm::debug_request` to handle `GdbRequestPayload`,
which will be sent from the GDB thread.

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Akira Moroo
2f430e08e1 vmm: Implement multicore GDB stub support
This commit adds GDB stub implementation with multicore support. This
implementaton is based on the gdbstub crate example code [1].

[1]
https://github.com/daniel5151/gdbstub/tree/master/examples/armv4t_multicore

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Akira Moroo
f1c4705638 vmm: Add Debuggable trait implementation
This commit adds initial gdb.rs implementation for `Debuggable` trait to
describe a debuggable component. Some part of the trait bound
implementations is based on the crosvm GDB stub code [1].

[1] https://github.com/google/crosvm/blob/main/src/gdb.rs

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Akira Moroo
a2a492f3df seccomp: Add ioctls to seccomp filter for guest debug
This commit adds `KVM_SET_GUEST_DEBUG` and `KVM_TRANSLATE` ioctls to
seccomp filter to enable guest debugging without `--seccomp=false`.

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Akira Moroo
f452e51488 vmm: Add BreakPoint to VmState
This commit adds `VmState::BreakPoint` to handle hardware breakpoint.
The VM will enter this state when a breakpoint hits or a debugger
interrupts the execution.

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Fabiano Fidêncio
dd77070f16 openapi: Update the PciBdf type
42b5d4a2f7 has changed how the PciBdf
field of a DeviceNode is represented (from an int32 to its own struct).

To avoid marshelling / demarshelling issues for the projects relying on
the openapi auto generated code, let's propagate the change, updating
the yaml file accordingly.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-22 15:10:08 +00:00
Michael Zhao
0fc3fad363 vmm: Limit "Dies" in VCPU topology on AArch64
`Dies per package` setting of VCPU topology doesnot apply on AArch64.
Now we only accept `1` value. This way we can make the `dies` field
transparent, avoid it from impacting the topology setting.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-02-22 09:21:00 +08:00
Michael Zhao
0fa31539eb vmm: Add default VCPU topology in PPTT on AArch64
When VCPU topology is not specified, fill the PPTT with default setting.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-02-22 09:21:00 +08:00
Sebastien Boeuf
0ac094c0d1 vmm: Handle TDX hypercalls with INVALID_OPERAND
Based on the helpers from the hypervisor crate, the VMM can identify
what type of hypercall has been issued through the KVM_EXIT_TDX reason.

For now, we only log warnings and set the status to INVALID_OPERAND
since these hypercalls aren't supported. The proper handling will be
implemented later.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-18 14:41:07 +01:00
Sebastien Boeuf
a3dfe726f8 vmm: cpu: Avoid useless cloning of Arc<Mutex<Vcpu>>
Since the object returned from CpuManager.create_vcpu() is never used,
we can avoid the cloning of this object.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-18 14:41:07 +01:00
Sebastien Boeuf
42b5d4a2f7 pci, vmm: Update DeviceNode to store PciBdf instead of u32
By having the DeviceNode storing a PciBdf, we simplify the internal code
as well as allow for custom Serialize/Deserialize implementation for the
PciBdf structure. These custom implementations let us display the PCI
s/b/d/f in a human readable format.

Fixes #3711

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-16 11:57:23 +00:00
Fabiano Fidêncio
5752a2a4fb openapi: Add the 204 response to vm-add-* actions
As we've added support for cold adding devices to a VM that was created
but not already started, we should propagate the `204` response
generated on those cases to the yaml file, so openapi-generator can
produce the correct client code on the go side, to handle both `200` and
`204` successful results.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 11:07:26 -08:00
Fabiano Fidêncio
5d2db68f67 vmm: lib: Allow config changes before the VM is booted
Instead of erroring out when trying to change the configuration of the
VM somewhere between the VM was created but not yet booted, let's allow
users to change that without any issue, as long as the VM has already
been created.

Fixes: #3639

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 11:07:26 -08:00
Fabiano Fidêncio
b780a916bb vmm: lib: Add unit tests
Let's add very basic unit for the vm_add_$device() functions, so we can
easily expand those when changing its behaviour in the coming commits.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 11:07:26 -08:00
Fabiano Fidêncio
16782e8c6d vmm: lib: Do the config validation in the Vmm
Instead of doing the validation of the configuration change as part of
the vm, let's do this in the uper layer, in the Vmm.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 11:07:26 -08:00
Fabiano Fidêncio
bd024bffb1 vmm: config: Move add_to_config to config.rs
Let's move add_to_config to config.rs so it can be used from both inside
and outside of the vm.rs file.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 11:07:26 -08:00
Fabiano Fidêncio
55479a64d2 openapi: Expose TDx configuration
TDx support is already present on the project for quite some time, but
the TDx configuration was not yet exposed to the ones using CH via the
OpenAPI auto generated code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-14 11:12:12 +01:00
Rob Bradford
57184f110a openapi: Add PlatformConfig to OpenAPI spec
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-11 11:20:04 +00:00
Rob Bradford
20b9f95afd vmm: Attach all devices from specified segments to the IOMMU
Since the devices behind the IOMMU cannot be changed at runtime we offer
the ability to place all devices on user chosen segments behind the
IOMMU. This allows the hotplugging of devices behind the IOMMU provided
that they are assigned to a segment that is located behind the iommu.

Fixes: #911

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-11 11:20:04 +00:00
Rob Bradford
6994b33a24 vmm: Add "iommu_segments" to --platform
This provides a list of segments on which all devices will be placed
behind the IOMMU.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-11 11:20:04 +00:00
Sebastien Boeuf
052f38fa96 vmm: Enable guest to report free pages through virtio-balloon
Adding a new parameter free_page_reporting=on|off to the balloon device
so that we can enable the corresponding feature from virtio-balloon.

Running a VM with a balloon device where this feature is enabled allows
the guest to report pages that are free from guest's perspective. This
information is used by the VMM to release the corresponding pages on the
host.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-11 12:10:07 +01:00
Rob Bradford
5e19422fcf vmm: config: Fix PCI segment validation error format string
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-09 13:50:36 +00:00
Rob Bradford
26d1a76ad9 vmm: config: Validate balloon size is less than RAM size
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-09 13:50:36 +00:00
Sebastien Boeuf
10676b74dc vmm: Split VM config and VM state for snapshot/restore
In order to allow for human readable output for the VM configuration, we
pull it out of the snapshot, which becomes effectively the list of
states from the VM. The configuration is stored through a dedicated file
in JSON format (not including any binary output).

Having the ability to read and modify the VM configuration manually
between the snapshot and restore phases makes debugging easier, as well
as empowers users for extending the use cases relying on the
snapshot/restore feature.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-08 15:06:49 +00:00
Rob Bradford
507912385a vmm: Ensure that PIO and MMIO exits complete before pausing
As per this kernel documentation:

      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
      operations are complete (and guest state is consistent) only after userspace
      has re-entered the kernel with KVM_RUN.  The kernel side will first finish
      incomplete operations and then check for pending signals.

      The pending state of the operation is not preserved in state which is
      visible to userspace, thus userspace should ensure that the operation is
      completed before performing a live migration.  Userspace can re-enter the
      guest with an unmasked signal pending or with the immediate_exit field set
      to complete pending operations without allowing any further instructions
      to be executed.

Since we capture the state as part of the pause and override it as part
of the resume we must ensure the state is consistent otherwise we will
lose the results of the MMIO or PIO operation that caused the exit from
which we paused.

Fixes: #3658

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-02-07 15:26:22 +00:00
Sebastien Boeuf
832f09a075 vmm: tdx: Insert payload into the HOB
If a payload is found in the TDVF section, and after it's been copied to
the guest memory, make sure to create the corresponding TdPayload
structure and insert it through the HOB.

Signed-off-by: Jiaqi Gao <jiaqi.gao@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-04 13:57:56 +01:00
Sebastien Boeuf
3c421593c3 vmm: tdx: Don't load the kernel the usual way
In case of TDX, if a kernel and/or a command line are provided by the
user, they can't be treated the same way as for the non-TDX case. That
is why this patch ensures the function load_kernel() is only invoked for
the non-TDX case.

For the TDX case, whenever TDVF contains a Payload and/or PayloadParam
sections, the file provided through --kernel and the parameters provided
through --cmdline are copied at the locations specified by each TDVF
section.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-04 13:57:56 +01:00
Sebastien Boeuf
7b93a8dd78 vmm: config: Allow --kernel to be used with TDX
The TDVF specification has been updated with the ability to provide a
specific payload, which means we will be able to achieve direct kernel
boot.

For that reason, let's not prevent the user from using --kernel
parameter when running with TDX.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-04 13:57:56 +01:00
Sebastien Boeuf
b3ca1d90e9 vmm: Stop dirty logging only if it has been started
Now that we introduced a separate method to indicate when the migration
is started, both start_dirty_log() and stop_dirty_log() don't have to
carry an implicit meaning as they can focus entirely on the dirty log
being started or stopped.

For that reason, we can now safely move stop_dirty_log() to the code
section performing non-local migration. It makes only sense to stop
logging dirty pages if this has been started before.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-03 13:33:26 +01:00
lizhaoxin1
a45e458c50 vm-migration: Add start_migration() to Migratable trait
In order to clearly decouple when the migration is started compared to
when the dirty logging is started, we introduce a new method to the
Migratable trait. This clarifies the semantics as we don't end up using
start_dirty_log() for identifying when the migration has been started.
And similarly, we rely on the already existing complete_migration()
method to know when the migration has been ended.

A bug was reported when running a local migration with a vhost-user-net
device in server mode. The reason was because the migration_started
variable was never set to "true", since the start_dirty_log() function
was never invoked.

Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-02-03 13:33:26 +01:00
Fabiano Fidêncio
0dafd47a7c vmm: openapi: Remove mention to net fds
While cloud-hypervisor does support receiving the file descriptors of a
tuntap device, advertising the fds structure via the openAPI can lead to
misinterpretations of what can and what should be done.

An unadvertised consumer will think that they could rather just set the
file descriptors there directly, or even pass them as a byte array.
However, the proper way to go in those cases would be actually sending
those via send_msg(), together with the request.

As hacking the openAPI auto-generated code to properly do this is not
*that* trivial, and as doing so during a `create VM` request is not
supported, we better not advertising those.

Please, for more details, also check:
https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3607#issuecomment-1020935523

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-01-31 10:38:28 +00:00
Sebastien Boeuf
4e46a1bc3c vmm: api: Support multiple fds with add-net
Based on the latest code from the micro-http crate, this patch adds the
support for multiple file descriptors to be sent along with the add-net
request. This means we can now hotplug multiqueue network interface to
the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-01-31 10:37:53 +00:00
Sebastien Boeuf
8eed276d14 vm-virtio: Define AccessPlatform trait
Moving the whole codebase to rely on the AccessPlatform definition from
vm-virtio so that we can fully remove it from virtio-queue crate.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-01-27 10:00:20 +00:00
Henry Wang
8f4aa07a80 vmm: vm: Init PMU during the VM restore process
If a PMU is enabled in a VM, we also need to initialize the PMU
when the VM is restored. Otherwise, vCPUs cannot be started after
the VM is restored.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2022-01-21 17:59:36 +08:00
Jianyong Wu
5462fd810c seccomp: add ioctl group to seccomp authorized list for arm64
When enable PMU on arm64, ioctl with group KVM_HAS_DEVICE_ATTR will be
blocked by seccomp, add it to authorized list.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-01-21 17:59:36 +08:00
Jianyong Wu
81c5855184 fdt: add PMU node to fdt
PMU node in fdt stores some important info like irq number.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-01-21 17:59:36 +08:00
Jianyong Wu
53060874a7 vmm: Init PMU for vcpu when create vm
PMU is needed in guest for performance profiling, thus should be
enabled.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-01-21 17:59:36 +08:00
Sebastien Boeuf
7b9a110540 vmm: tdx: Pass ACPI tables through the HOB
Relying on helpers for creating the ACPI tables and to add each table to
the HOB, this patch connects the dot to provide the set of ACPI tables
to the TD firmware.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-01-20 16:50:55 +00:00
Sebastien Boeuf
ea0729c016 vmm: acpi: Create ACPI tables for TDX
The way to create ACPI tables for TDX is different as each table must be
passed through the HOB. This means the XSDT table is not required since
the firmware will take care of creating it. Same for RSDP, this is
firmware responsibility to provide it to the guest.

That's why this patch creates a TDX dedicated function, returning a list
of Sdt objects, which will let the calling code copy the content of each
table through the HOB.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-01-20 16:50:55 +00:00
Sebastien Boeuf
cdc14815be vmm: tdx: Only create ACPI tables if not running TDX
In case of TDX, we don't want to create the ACPI tables the same way we
do for all the other use cases. That's because the ACPI tables don't
need to be written to guest memory at a specific address, instead they
are passed directly through the HOB.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-01-20 16:50:55 +00:00
Sebastien Boeuf
4fda4ad6c9 arch, vmm: tdx: Remove TD_VMM_DATA mechanism
It's been decided the ACPI tables will be passed to the firmware in a
different way, rather than using TD_VMM_DATA. Since TD_VMM_DATA was
introduced for this purpose, there's no reason to keep it in our
codebase.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-01-20 16:50:55 +00:00
Anatol Belski
e2a8a1483f acpi: aarch64: Implement DBG2 table
This table is listed as required in the ARM Base Boot Requirements
document. The particular need arises to make the serial debugging of
Windows guest functional.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2022-01-20 09:11:21 +08:00
Michael Zhao
1db7718589 pci, vmm: Pass PCI BDF to vfio and vfio_user
On AArch64, PCI BDF is used for devId in MSI-X routing entry.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-01-18 18:00:00 -08:00
Rob Bradford
4ecc778efe vmm: Avoid deadlock between virtio device activation and vcpu pausing
Ensure all pending virtio activations (as triggered by MMIO write on the
vCPU threads leading to a barrier wait) are completed before pausing the
vCPUs as otherwise there will a deadlock with the VMM waiting for the
vCPU to acknowledge it's pause and the vCPU waiting for the VMM to
activate the device and release the barrier.

Fixes: #3585

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 17:30:06 -08:00
Wei Liu
ef05354c81 memory_manager: drop unneeded clippy suppressions
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-01-18 17:23:27 -08:00
Wei Liu
277cfd07ba device_manager: use if let to drop single match
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-01-18 17:23:27 -08:00
Henry Wang
14ba3f68d3 vmm: cpu: Remove unused import in unit tests
These are the leftovers from the commit 8155be2:
arch: aarch64: vm_memory is not required when configuring vcpu

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2022-01-18 20:34:50 +08:00
Rob Bradford
70f7f64e23 vmm: api: Add "local" option to OpenAPI YAML file
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
88952cc500 vmm: Send FDs across unix socket for migration when in local mode
When in local migration mode send the FDs for the guest memory over the
socket along with the slot that the FD is associated with. This removes
the requirement for copying the guest RAM and gives significantly faster
live migration performance (of the order of 3s to 60ms).

Fixes: #3566

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
715a7d9065 vmm: Add convenience API for getting slots to FDs mapping
This will be used for sending those file descriptors for local
migration.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
1676fffaad vmm: Check shared memory is enabled for local migration
This is required so that the receiving process can access the existing
process's memory.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
1daef5e8c9 vmm: Propagate the set of memory slots to FDs received in migration
Create the VM using the FDs (wrapped in Files) that have been received
during the migration process.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
735658a49d vm-migration: Add MemoryFd command for setting FDs for memory
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
b95e46565c vmm: Support using existing files for memory slots
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
eeba1d3ad8 vmm: Support using an existing FD for memory
If this FD (wrapped in a File) is supplied when the RAM region is being
created use that over creating a new one.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
271e17bd79 vmm: Extract code for opening a file for memory
This function is used to open an FD (wrapped in a File) that points to
guest memory from memfd_create() or backed on the filesystem.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Rob Bradford
b9c260c0de vmm, ch-remote: Add "local" option to send-migration API
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-18 09:07:47 +00:00
Wei Liu
8155be2e6b arch: aarch64: vm_memory is not required when configuring vcpu
Drop the unused parameter throughout the code base.

Also take the chance to drop a needless clone.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-01-14 16:03:12 -08:00
Fabiano Fidêncio
fb1755d85d vmm: openapi: Fix "fds" field name for NetConfig
We've been currently using "fd" as the field name, but it should be
called "fds" since  6664e5a6e7 introduced
the name change on the structure field.

Fixes: #3560

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-01-12 16:44:51 +01:00
Fabiano Fidêncio
cb15ae5462 vmm: openapi: Fix default value for tap
`tap` has its default value set to `None`, but in the openapi yaml file
we've been setting it to `""`.

When using this code on the Kata Containers side we'd be hit by a non
expected behaviour of cloud-hypervisor, as even when using a different
method to initialise the `tuntap` device the code would be treated as if
using `--net tap` (which is a valid use-case).

Related: #3554

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-01-10 13:11:33 +00:00
Rob Bradford
70af81d755 vmm: config: Fix clippy (unnecessary_to_owned) issue
warning: unnecessary use of `to_string`
    --> vmm/src/config.rs:2199:38
     |
2199 | ...                   .get(&memory_zone.to_string())
     |                            ^^^^^^^^^^^^^^^^^^^^^^^^ help: use: `memory_zone`
     |
     = note: `#[warn(clippy::unnecessary_to_owned)]` on by default
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_to_owned

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-07 08:16:26 -08:00
Rob Bradford
8fa3864ae8 vmm: acpi: Fix clippy (needless_late_init) issue
warning: unneeded late initalization
   --> vmm/src/acpi.rs:525:5
    |
525 |     let mut prev_tbl_len: u64;
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(clippy::needless_late_init)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_late_init
help: declare `prev_tbl_len` here
    |
552 |     let mut prev_tbl_len: u64 = madt.len() as u64;
    |     ~~~~~~~~~~~~~~~~~~~~~~~~~

warning: unneeded late initalization
   --> vmm/src/acpi.rs:526:5
    |
526 |     let mut prev_tbl_off: GuestAddress;
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_late_init
help: declare `prev_tbl_off` here
    |
553 |     let mut prev_tbl_off: GuestAddress = madt_offset;
    |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-07 08:16:26 -08:00