Commit Graph

1839 Commits

Author SHA1 Message Date
Sebastien Boeuf
76dbf85b79 net: Give the user the ability to set MTU
Add a new "mtu" parameter to the NetConfig structure and therefore to
the --net option. This allows Cloud Hypervisor's users to define the
Maximum Transmission Unit (MTU) they want to use for the network
interface that they create.

In details, there are two main aspects. On the one hand, the TAP
interface is created with the proper MTU if it is provided. And on the
other hand the guest is made aware of the MTU through the VIRTIO
configuration. That means the MTU is properly set on both the TAP on the
host and the network interface in the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-21 16:20:57 +02:00
Sebastien Boeuf
f38056fc9e virtio-devices, vmm: Simplify virtio-mem resize operation
There's no need to delegate the resize operation to the virtio-mem
thread. This can come directly from the vmm thread which will use the
Mem object to update the VIRTIO configuration and trigger the interrupt
for the guest to be notified.

In order to achieve what's described above, the VirtioMemZone structure
now has a handle onto the Mem object directly. This avoids the need for
intermediate Resize and ResizeSender structures.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-20 13:43:40 +02:00
Rob Bradford
f32487f8e8 misc: Automatic beta clippy fixes
e.g. cargo clippy --all --tests --all-targets --fix --features=..

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-20 10:59:48 +01:00
Sebastien Boeuf
1849ffff31 vmm: Remove "amx" feature gate
Given the AMX x86 feature has been made available since kernel v5.17,
and given we don't have any test validating this feature, there's no
need to keep it behing a Rust feature gate.

Fixes #3996

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-16 15:03:31 +01:00
Rob Bradford
0e52be0909 vmm: Ensure default deserialisation for "amx" feature bit
This allows a migration from a binary not compiled with struct member to
be completed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-09-16 15:03:31 +01:00
Sebastien Boeuf
3793ffe888 vmm: config: Move TDX to rely on PayloadConfig
Removing the option --tdx to specify that we want to run a TD VM. Rely
on --platform option by adding the "tdx" boolean parameter. This is the
new way for enabling TDX with Cloud Hypervisor.

Along with this change, the way to retrieve the firmware path has been
updated to rely on the recently introduced PayloadConfig structure.

Fixes #4556

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-05 12:14:59 +01:00
Sebastien Boeuf
b3bef3adda vmm: acpi: Don't declare MMIO config space through PCI buses
The PCI buses should not declare the address space related to the MMIO
config space given it's already declared in the MCFG table and through
the motherboard device PNP0C02 in the DSDT table.

The PCI MMIO config region for the segment was being wrongly exposed as
part of the _CRS for the ACPI bus device (using Memory32Fixed). Exposing
it via this object was ineffectual as the equivalent entry in the
PNP0C02 (_SB_.MBRD) marked those ranges as not usable via the kernel.
Either way, with both devices used by the kernel, the kernel will not
try and use those memory ranges for the device BARs. However under
td-shim on TDX the PNP0C02 device is not on the permitted list of
devices so the the memory ranges were not marked as unusable resulting
in the kernel attempting to allocate BARs that collided with the PCI
MMIO configuration space.

This is based on the kernel documentation PCI/acpi-info.rst which relies
on ACPI and PCI Firmware specifications. And here are the interesting
quotes from this document:

"""
Prior to the addition of Extended Address Space descriptors, the failure
of Consumer/Producer meant there was no way to describe bridge registers
in the PNP0A03/PNP0A08 device itself. The workaround was to describe the
bridge registers (including ECAM space) in PNP0C02 catch-all devices.
With the exception of ECAM, the bridge register space is device-specific
anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need
to know about it.

PNP0C02 “motherboard” devices are basically a catch-all. There’s no
programming model for them other than “don’t use these resources for
anything else.” So a PNP0C02 _CRS should claim any address space that is
(1) not claimed by _CRS under any other device object in the ACPI
namespace and (2) should not be assigned by the OS to something else.

The address range reported in the MCFG table or by _CBA method (see
Section 4.1.3) must be reserved by declaring a motherboard resource. For
most systems, the motherboard resource would appear at the root of the
ACPI namespace (under _SB) in a node with a _HID of EISAID (PNP0C02),
and the resources in this case should not be claimed in the root PCI
bus’s _CRS. The resources can optionally be returned in Int15 E820 or
EFIGetMemoryMap as reserved memory but must always be reported through
ACPI as a motherboard resource.
"""

This change has been manually tested by running a VM with multiple
segments (4 segments), and by hotplugging an additional disk to the
segment number 2 (third segment).

From one shell:
"""
cloud-hypervisor \
    --cpus boot=1 \
    --memory size=1G \
    --kernel vmlinux \
    --cmdline "root=/dev/vda1 rw console=hvc0" \
    --disk path=jammy-server-cloudimg.raw \
    --api-socket /tmp/ch.sock \
    --platform num_pci_segments=4
"""

From another shell (after the VM is booted):
"""
ch-remote \
    --api-socket=/tmp/ch.sock \
    add-disk \
    path=test-disk.raw,id=disk2,pci_segment=2
"""

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-02 14:14:23 +02:00
Nuno Das Neves
784a3aaf3c devices: gic: use VgicConfig everywhere
Use VgicConfig to initialize Vgic.
Use Gic::create_default_config everywhere so we don't always recompute
redist/msi registers.
Add a helper create_test_vgic_config for tests in hypervisor crate.

Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
2022-08-31 08:33:05 +01:00
Jianyong Wu
3a19573c69 vmm: unify payload load chain on AArch64 with x86_64
AArch64 can share the same way of loading payload with x86_64. It makes
the payload loading more consistent between different arches.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-08-31 08:32:08 +01:00
Michael Zhao
b65639fad3 vmm:AArch64: move uefi_flash to memory manager
uefi_flash is used when load firmware, that is load payload depends on
device manager. move uefi_flash to memory manager can eliminate the
dependency.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-31 08:32:08 +01:00
Jianyong Wu
9b1452f258 vmm:AArch64: load standalone firmware
A new firmware item has been added into payload config, we need
extend ability to load standalone firmware on AArch64.
"load_kernel" method will be the entry of image loading work including
kernel and firmware.

This change is back compatible. So, we can either load firmware through
"-kernel" like before or "-firmware".

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-08-31 08:32:08 +01:00
Jianyong Wu
2054c8699a vmm:AArch64: add load_firmware method
Later, we will load standalone firmware. So, refactor load_kernel
by abstracting load_firmware method.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-08-31 08:32:08 +01:00
Sebastien Boeuf
8c02648ac9 vmm: device_manager: Update virtio-console for proper PTY support
Given the virtio-console is now able to buffer its output when no PTY is
connected on the other end, the device manager code is updated to enable
this. Moving the endpoint type from FilePair to PtyPair enables the
proper codepath in the virtio-console implementation, as well as
updating the PTY resize code, and forcing the PTY to always be
non-blocking.

The non-blocking behavior is required to avoid blocking the guest that
would be waiting on the virtio-console driver. When receiving an
EWOULDBLOCK error, the output will simply be redirected to the temporary
buffer so that it can be later flushed.

The PTY resize logic has been slightly modified to ensure the PTY file
descriptors are closed. It avoids the child process to keep a hold onto
the PTY device, which would have caused the PTY to believe something is
connected on the other end, which would have prevented the detection of
any new connection on the PTY.

Fixes #4521

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-30 13:47:51 +02:00
Sebastien Boeuf
a940f525a8 vmm: Move SerialBuffer to its own crate
We want to be able to reuse the SerialBuffer from the virtio-devices
crate, particularly from the virtio-console implementation. That's why
we move the SerialBuffer definition to its own crate so that it can be
accessed from both vmm and virtio-devices crates, without creating any
cyclic dependency.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-30 13:47:51 +02:00
Sebastien Boeuf
63462fd8ab vmm: serial_manager: Iterate again on EINTR
If the epoll_wait() call returns EINTR, this only means a signal has
been delivered before any of the file descriptors registered triggered
an event or before the end of the timeout (if timeout isn't -1). For
that reason, we should simply try to listen on the epoll loop again.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-30 13:47:51 +02:00
Sebastien Boeuf
0bcb6ff061 vmm: Limit the size of the SerialBuffer
We must limit how much the buffer can grow, otherwise this could lead
the process to consume all the memory on the machine. This could happen
if the output from the guest was very important and nothing would
connect to the PTY for a long time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-24 12:14:59 +02:00
Michael Zhao
d66d64c325 vmm: Restrict the maximum number of HW breakpoints
Set the maximum number of HW breakpoints according to the value returned
from `Hypervisor::get_guest_debug_hw_bps()`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-23 16:57:12 +02:00
Wei Liu
3e6b0a5eab vmm: unify TranslateVirtualAddress error for both x86_64 and aarch64
Using anyhow::Error should cover both architectures.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-08-22 09:37:21 -07:00
Michael Zhao
c798b958f3 vmm: Extend seccomp rules for GDB
Add 'KVM_SET_GUEST_DEBUG' ioctl to seccomp filter rules.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-21 17:07:26 +08:00
Michael Zhao
0522e40933 vmm: Implement translate_gva on AArch64
On AArch64, `translate_gva` API is not provided by KVM. We implemented
it in VMM by walking through translation tables.

Address translation is big topic, here we only focus the scenario that
happens in VMM while debugging kernel. This `translate_gva`
implementation is restricted to:
 - Exception Level 1
 - Translate high address range only (kernel space)

This implementation supports following Arm-v8a features related to
address translation:
 - FEAT_LPA
 - FEAT_LVA
 - FEAT_LPA2

The implementation supports page sizes of 4KiB, 16KiB and 64KiB.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-21 17:07:26 +08:00
Michael Zhao
5febdec81a vmm: Enable gdbstub on AArch64
The `gva_translate` function is still missing, it will be added with a
separate commit.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-21 17:07:26 +08:00
Nuno Das Neves
fdc8546eef vmm: aarch64: Use GIC_V3_* consts instead of magic numbers in create_madt()
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
2022-08-21 17:06:48 +08:00
Sebastien Boeuf
cdcd4d259e vmm: serial: Wait for PTY to be available before writing to it
The goal of this patch is to provide a reliable way to detect when the
other end of the PTY is connected, and therefore be able to identify
when we can write to the PTY device. This is needed because writing to
the PTY device when the other end isn't connected causes the loss of
the written bytes.

The way to detect the connection on the other end of the PTY is by
knowing the other end is disconnected at first with the presence of the
EPOLLHUP event. Later on, when the connection happens, EPOLLHUP is not
triggered anymore, and that's when we can assume it's okay to write to
the PTY main device.

It's important to note we had to ensure the file descriptor for the
other end was closed, otherwise we would have never seen the EPOLLHUP
event. And we did so by removing the "sub" field from the PtyPair
structure as it was keeping the associated File opened.

Fixes #3170

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-19 14:39:06 +01:00
Rob Bradford
396f9ce2c6 vmm: Deprecate non-PVH firmware loading
Curently all the firmware blobs we support can use PVH loading.

See: #4511

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-18 17:29:44 +01:00
Rob Bradford
282a1001ef vmm: x86_64: Rename load_firmware() to reflect its purpose
This function only supports loading legacy, non-PVH firmware binaries.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
0d682e185f vmm: x86_64: Add support for firmware loading
Since our firmware files are still designed to be used via PVH use the
load_kernel() function to load the firmware falling back to legacy
firmware loading if necessary.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
8ec5a248cd main, vmm: Add option to pass firmware parameter in payload
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
763ea7da42 vmm: x86_64: Split payload loading into it's own function
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
2856074d12 vmm: x86_64: Make kernel loading use PayloadConfig
Minor refactoring to start supporting loading a firmware payload

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
485900eeb4 vmm: x86_64: Use more general name for payload handling
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Rob Bradford
6988da79d2 vmm: x86_64: Split legacy firmware loading into own function
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-17 09:50:42 +01:00
Sebastien Boeuf
98f949d35d vmm: Add new I/O ports for ACPI shutdown and PM timer devices
Adding new I/O ports for both the ACPI shutdown and the ACPI PM timer
devices so they can be triggered from both addresses. The reason for
this change is that TDX expects only certain I/O ports to be enabled
based on what QEMU exposes. We follow this to avoid new ports from being
opened exclusively for Cloud Hypervisor.

We have to keep the former I/O ports available given all firmwares
haven't been updated yet. Once we reach a point where we know both Rust
Hypervisor Firmware, OVMF, TDVF and TDSHIM have been updated with the
new port values, we'll be able to remove the former ports.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-11 11:46:09 +01:00
Rob Bradford
8c22c03e1e vmm: openapi: Switch to describing new payload API
The old API remains usable, and will remain usable for two releases but
we should only advertise the new API.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-10 22:20:07 +01:00
Rob Bradford
51fdc48817 vmm: openapi: Fix OpenAPI YAML file formatting
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-10 22:20:07 +01:00
Rob Bradford
cef51a9de0 vmm: Encompass guest payload configuration in PayloadConfig
Introduce a new top level member of VmConfig called PayloadConfig that
(currently) encompasses the kernel, commandline and initramfs for the
guest to use.

In future this can be extended for firmware use. The existing
"--kernel", "--cmdline" and "initramfs" CLI parameters now fill the
PayloadConfig.

Any config supplied which uses the now deprecated config members have
those members mapped to the new version with a warning.

See: #4445

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-10 15:12:34 +01:00
Rob Bradford
6bc46ba9c1 vmm: config: Reject VFIO devices with the same path
By checking in the validation logic we get checking for both devices
specified in the initial config but also hotplug too.

Fixes: #4453

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-09 14:32:35 +02:00
Rob Bradford
ea58d2f68a vmm: config: Enhance test_cpu_parsing to add "affinity" parameter
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-08 16:23:00 +01:00
Rob Bradford
d295de4cd5 option_parser: Move test_option_parser to option_parser crate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-08 16:23:00 +01:00
Wei Liu
53aecf9341 vmm: add oem_strings to openapi
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-08-08 08:59:19 +01:00
Wei Liu
57e9b80123 vmm: provide oem_strings option
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-08-08 08:59:19 +01:00
lizhaoxin1
65f42c1f62 vmm: openapi: Add uuid to PlatformConfig
Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
2022-08-04 09:20:06 +02:00
lizhaoxin1
bc3a276b43 arch, vmm: Expose platform uuid via SMBIOS
Parse and set uuid.

Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-08-04 09:20:06 +02:00
lizhaoxin1
3abc1e1e51 vmm: config: Add "uuid" option to "--platform"
The uuid indicates the unique ID of a virtual machine.
cloud-hypervisor takes the uuid passed by libvirt
and uses it to initialize cloud-init.

Signed-off-by: lizhaoxin1 <Lxiaoyouling@163.com>
2022-08-04 09:20:06 +02:00
Bo Chen
1125fd2667 vmm: api: Use 'BTreeMap' for 'HttpRoutes'
In this way, we get the values sorted by its key by default, which is
useful for the 'http_api' fuzzer.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-03 10:18:24 +01:00
Bo Chen
eb056d374a vmm: Make 'EpollContext::add_event()' public
So that it can be reused by other crate, e.g. from fuzz targets.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-08-03 10:18:24 +01:00
Sebastien Boeuf
4d74525bdc vmm: Remove unused "poll_queue" from DiskConfig
The parameter "poll_queue" was useful at the time Cloud Hypervisor was
responsible for spawning vhost-user backends, as it was carrying the
information the vhost-user-block backend should have this option enabled
or not.

It's been quite some time that we walked away from this design, as we
now expect a management layer to be responsible for running vhost-user
backends.

That's the reason why we can remove "poll_queue" from the DiskConfig
structure.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-08-02 15:10:11 +02:00
Michael Zhao
7199119bb2 hypervisor: Remove Vcpu::read_mpidr() on AArch64
Replaced `read_mpidr()` with `get_sys_reg()`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-29 11:45:12 +01:00
Michael Zhao
cd7f36a713 hypervisor: Remove get/set_reg() on AArch64
`Vcpu::get/set_reg()` were only invoked in Vcpu itself.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-29 11:45:12 +01:00
Michael Zhao
f7b6d99c2d hypervisor: Remove get/set_sys_regs() on AArch64
`hypervisor::Vcpu::get/set_sys_regs()` are only used in Vcpu internally.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-29 11:45:12 +01:00
Rob Bradford
857edc71a9 vmm: cpu: Remove now unused CpuManager::vcpus_paused()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-26 09:22:25 +02:00
Rob Bradford
0e29379bcf vmm: Make gdb break/resuming more resilient
When starting the VM such that it is already on a breakpoint (via
stop_on_boot) when attached to gdb then start the vCPUs in a paused
state rather than starting the vCPUs later (upon resume).

Further, make the resumption/break of the VM more resilient by only
attempting to resume the vCPUs if were are already in a break point and
only attempting to pause/break if we were already running.

Fixes: #4354

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-26 09:22:25 +02:00
Rob Bradford
a749182777 vmm: acpi: Use ACPI platform device addresses from DeviceManager
Remove the hardcoded addresses.

Also remove PM_TMR_BLK as spec compliant implementation will use
X_PM_TMR_BLK over this field.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-25 16:16:06 +01:00
Rob Bradford
2e8eb96ef6 vmm: device_manager: Store ACPI platform addresses for later use
These are ready for inclusion in the FACP table.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-25 16:16:06 +01:00
Wei Liu
ad33f7c5e6 vmm: return seccomp rules according to hypervisors
That requires stashing the hypervisor type into various places.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-22 12:50:12 +01:00
Wei Liu
a96a5d7816 hypervisor, vmm: use new vfio-ioctls
Use the new vfio-ioctls APIs. Drop Cloud Hypervisor's Device trait
since it is no longer needed.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-21 23:37:53 +01:00
Wei Liu
f84ddedb1a hypervisor, vmm: introduce trait functions for aarch64 PMU
The original code uses kvm_device_attr directly outside of the
hyeprvisor crate. That leaks hypervisor details.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-21 23:37:53 +01:00
Wei Liu
f21fc1dcb6 hypervisor: x86: provide a generic MsrEntry structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-20 10:13:41 +01:00
Wei Liu
4d2cc3778f hypervisor: move away from MsrEntries type
It is a flexible array. Switch to vector and slice instead.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-20 10:13:41 +01:00
Wei Liu
05e5106b9b hypervisor x86: provide a generic LapicState structure
This requires making get/set_lapic_reg part of the type.

For the moment we cannot provide a default variant for the new type,
because picking one will be wrong for the other hypervisor, so I just
drop the test cases that requires LapicState::default().

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-19 09:38:38 +01:00
Wei Liu
6a8c0fc887 hypervisor: provide a generic FpuState structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-18 22:15:30 +01:00
Wei Liu
08135fa085 hypervisor: provide a generic CpudIdEntry structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-18 22:15:30 +01:00
Wei Liu
45fbf840db hypervisor, vmm: move away from CpuId type
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.

Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.

This allows us to decouple CpuIdEntry from hypervisors more easily.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-18 22:15:30 +01:00
Wei Liu
f1ab86fecb hypervisor: x86: provide a generic SpecialRegisters structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-15 10:21:43 +01:00
Wei Liu
75797827d5 hypervisor: x86: provide a generic SegmentRegister structure
And drop SegmentRegisterOps since it is no longer required.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-15 10:21:43 +01:00
Wei Liu
8b7781e267 hypervisor: x86: provide a generic StandardRegisters structure
We only need to do this for x86 since MSHV does not have aarch64 support
yet. This reduces unnecessary code churn.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-15 10:21:43 +01:00
Wei Liu
4201bf4011 hypervisor: provide a generic ClockData structure
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 22:09:04 +01:00
Wei Liu
beb4f86b82 hypervisor, vmm: drop VmState and code
VmState was introduced to hold hypervisor specific VM state. KVM does
not need it and MSHV does not really use it yet.

Just drop the code. It can be easily revived once there is a need.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 22:09:04 +01:00
Alyssa Ross
a455917db5 vmm: fix missed API or debug events
Previously, we were assuming that every time an eventfd notified us,
there was only a single event waiting for us.  This meant that if,
while one API request was being processed, two more arrived, the
second one would not be processed (until the next one arrived, when it
would be processed instead of that event, and so on).  To fix this,
make sure we're processing the number of API and debug requests we've
been told have arrived, rather than just one.  This is easy to
demonstrate by sending lots of API events and adding some sleeps to
make sure multiple events can arrive while each is being processed.

For other uses of eventfd, like the exit event, this doesn't matter —
even if we've received multiple exit events in quick succession, we
only need to exit once.  So I've only made this change where receiving
an event is non-idempotent, i.e. where it matters that we process the
event the right number of times.

Technically, reset requests are also non-idempotent — there's an
observable difference between a VM resetting once, and a VM resetting
once and then immediately resetting again.  But I've left that alone
for now because two resets in immediate succession doesn't sound like
something anyone would ever want to me.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2022-07-14 17:44:11 +01:00
Michael Zhao
2d8635f04a hypervisor: Refactor system_registers on AArch64
Function `system_registers` took mutable vector reference and modified
the vector content. Now change the definition to `get/set` style.
And rename to `get/set_sys_regs` to align with other functions.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-14 22:55:19 +08:00
Michael Zhao
c445513976 hypervisor: Refactor core_registers on AArch64
On AArch64, the function `core_registers` and `set_core_registers` are
the same thing of `get/set_regs` on x86_64. Now the names are aligned.
This will benefit supporting `gdb`.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-07-14 22:55:19 +08:00
Wei Liu
0e8769d76a device_manager: assert passthrough_device has the correct type
There is a lot of unsafe code in such a small function. Add an assert
to help detect issues earlier.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-14 08:09:50 +01:00
Wei Liu
84bbaf06d1 hypervisor: turn boot_msr_entries into a trait method
This allows dispatching to either KVM or MSHV automatically.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-08 16:49:58 +01:00
Rob Bradford
121729a3b0 vmm: Split signal handling for VM and VMM signals
The VM specific signal (currently only SIGWINCH) should only be handled
when the VM is running.

The generic VMM signals (SIGINT and SIGTERM) need handling at all times.

Split the signal handling into two separate threads which have differing
lifetimes.

Tested by:
1.) Boot full VM and check resize handling (SIGWINCH) works & sending
    SIGTERM leads to cleanup (tested that API socket is removed.)
2.) Start without a VM and send SIGTERM/SIGINT and observe cleanup (API
    socket removed)
3.) Boot full VM, delete VM and observe 2.) holds.
4.) Boot full VM, delete VM, recreate VM and observe 1.) holds.

Fixes: #4269

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-08 15:15:46 +01:00
Rob Bradford
93237f0106 vmm: Set MADT "Online Capable" flag
The Linux kernel now checks for this before marking CPUs as
hotpluggable:

commit aa06e20f1be628186f0c2dcec09ea0009eb69778
Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Wed Sep 8 16:41:46 2021 -0500

    x86/ACPI: Don't add CPUs that are not online capable

    A number of systems are showing "hotplug capable" CPUs when they
    are not really hotpluggable.  This is because the MADT has extra
    CPU entries to support different CPUs that may be inserted into
    the socket with different numbers of cores.

    Starting with ACPI 6.3 the spec has an Online Capable bit in the
    MADT used to determine whether or not a CPU is hotplug capable
    when the enabled bit is not set.

    Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html?#local-apic-flags
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-07-01 18:45:05 +01:00
Rob Bradford
adf5881757 build: #[allow(clippy::significant_drop_in_scrutinee) in some crates
This check is new in the beta version of clippy and exists to avoid
potential deadlocks by highlighting when the test in an if or for loop
is something that holds a lock. In many cases we would need to make
significant refactorings to be able to pass this check so disable in the
affected crates.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
b57d7b258d build: Fix beta clippy issue (needless_return)
warning: unneeded `return` statement
   --> pci/src/vfio_user.rs:627:13
    |
627 | /             return Err(std::io::Error::new(
628 | |                 std::io::ErrorKind::Other,
629 | |                 format!("Region not found for 0x{:x}", gpa),
630 | |             ));
    | |_______________^
    |
    = note: `#[warn(clippy::needless_return)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_return
help: remove `return`
    |
627 ~             Err(std::io::Error::new(
628 +                 std::io::ErrorKind::Other,
629 +                 format!("Region not found for 0x{:x}", gpa),
630 +             ))
    |

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
2716bc3311 build: Fix beta clippy issue (derive_partial_eq_without_eq)
warning: you are deriving `PartialEq` and can implement `Eq`
  --> vmm/src/serial_manager.rs:59:30
   |
59 | #[derive(Debug, Clone, Copy, PartialEq)]
   |                              ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-30 20:50:45 +01:00
Rob Bradford
2e664dca64 vmm: Always reset the console mode on VMM exit
Tested:

1. SIGTERM based
2. VM shutdown/poweroff
3. Injected VM boot failure after calling Vm::setup_tty()

Fixes: #4248

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-28 16:45:27 +01:00
Rob Bradford
65ec6631fb vmm: cpu: Store the vCPU snapshots in ascending order
The snapshots are stored in a BTree which is ordered however as the ids
are strings lexical ordering places "11" ahead of "2". So encode the
vCPU id with zero padding so it is lexically sorted.

This fixes issues with CPU restore on aarch64.

See: #4239

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-27 16:20:57 +01:00
Wei Liu
bccd7c7e48 vmm: drop Sync+Send bounds for EndpointHandler
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-06-20 23:28:57 +01:00
Wei Liu
8fa1098629 vmm: switch from lazy_static to once_cell
Once_cell does not require using macro and is slated to become part of
Rust std at some point.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-06-20 16:03:07 +01:00
Sebastien Boeuf
335a4e1cc0 vmm: api: Expose kvm_hyperv parameter in OpenAPI description
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-17 15:11:53 +01:00
Sebastien Boeuf
81ba70a497 pci, vmm: Defer mapping VFIO MMIO regions on restore
When restoring a VM, the restore codepath will take care of mapping the
MMIO regions based on the information from the snapshot, rather than
having the mapping being performed during device creation.

When the device is created, information such as which BARs contain the
MSI-X tables are missing, preventing to perform the mapping of the MMIO
regions.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
7df7061610 pci, vmm: Add migratable support to vfio-user devices
Based on recent changes to VfioUserPciDevice, the vfio-user devices can
now be migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Sebastien Boeuf
c021dda267 pci, vmm: Add migratable support to VFIO devices
Based on recent changes to VfioPciDevice, the VFIO devices can now be
migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-09 09:19:58 +02:00
Rob Bradford
94fb9f817d vmm: Fix clippy issues under "guest_debug" feature
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-08 11:40:56 +01:00
Michael Zhao
a7a15d56dd aarch64: Move setup_regs to hypervisor
`setup_regs` of AArch64 calls KVM sepecific code. Now move it to
`hypervisor` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 11:07:46 +01:00
Sebastien Boeuf
65dc1c83a9 vmm: cpu: Save and restore CPU states during snapshot/restore
Based on recent KVM host patches (merged in Linux 5.16), it's forbidden
to call into KVM_SET_CPUID2 after the first successful KVM_RUN returned.
That means saving CPU states during the pause sequence, and restoring
these states during the resume sequence will not work with the current
design starting with kernel version 5.16.

In order to solve this problem, let's simply move the save/restore logic
to the snapshot/restore sequences rather than the pause/resume ones.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-06 11:07:29 +01:00
Sebastien Boeuf
3edaa8adb6 vmm: Ensure restore matches boot sequence
The vCPU is created and set after all the devices on a VM's boot.
There's no reason to follow a different order on the restore codepath as
this could cause some unexpected behaviors.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-06-06 11:07:17 +01:00
Michael Zhao
9260c3816e vmm: Update unit test for GIC refactoring
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
5d45d6d0fb vmm: Move GIC unit test to hypervisor crate
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
957d3a7443 aarch64: Simplify GIC related structs definition
Combined the `GicDevice` struct in `arch` crate and the `Gic` struct in
`devices` crate.

After moving the KVM specific code for GIC in `arch`, a very thin wapper
layer `GicDevice` was left in `arch` crate. It is easy to combine it
with the `Gic` in `devices` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Michael Zhao
04949755c0 arch: Switch to new GIC interface
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-06-06 10:17:26 +08:00
Rob Bradford
ade3a9c8f6 virtio-devices, vmm: Optimised async virtio device activation
In order to ensure that the virtio device thread is spawned from the vmm
thread we use an asynchronous activation mechanism for the virtio
devices. This change optimises that code so that we do not need to
iterate through all virtio devices on the platform in order to find the
one that requires activation. We solve this by creating a separate short
lived VirtioPciDeviceActivator that holds the required state for the
activation (e.g. the clones of the queues) this can then be stored onto
the device manager ready for asynchronous activation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-06-01 09:42:02 +02:00
Yi Wang
dbeb922882 doc: add vm coredump support
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
8b585b96c1 vmm: enable coredump
Based on the newly added guest_debug feature, this patch adds http
endpoint support.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
ccb604e1e1 vmm: add cpu segment note for coredump
The crash tool use a special note segment which named 'QEMU' to
analyze kaslr info and so on. If we don't add the 'QEMU' note
segment, crash tool can't find linux version to move on.

For now, the most convenient way is to add 'QEMU' note segment to
make crash tool happy.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
2022-05-30 13:41:40 +02:00
Yi Wang
0e65ca4a6c vmm: save guest memory for coredump
Guest memory is needed for analysis in crash tool, so save it
for coredump.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Yi Wang
7e280b6f70 vmm: save elf header for coredump
The vmcore file of guest is an elf format, so the first step of coredump
is to save the elf header.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
2022-05-30 13:41:40 +02:00
Yi Wang
90034fd6ba vmm: add GuestDebuggable trait
It's useful to dump the guest, which named coredump so that crash
tool can be used to analysize it when guest hung up.

Let's add GuestDebuggable trait and Coredumpxxx error to support
coredump firstly.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Co-authored-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-30 13:41:40 +02:00
Rob Bradford
465db7f08c vmm: config: Remove mergeable option from PmemConfig
Fixes: #3968

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:48:49 +02:00
Rob Bradford
55c5961f43 vmm: config: Remove dax & cache_size options from FsConfig
Fixes: #3889

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Rob Bradford
7c3582b4a8 vmm: config: Fix error message regarding use of cache size without dax
The error message incorrectly said that the user was trying to combine
cache_size without dax whereas it is only usuable with dax.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Rob Bradford
979797786d vmm: Remove DAX cache setup for virtio-fs devices
Remove the code from the DeviceManager that prepares the DAX cache since
the functionality has now been removed.

Fixes: #3889

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-27 09:47:13 +02:00
Michael Zhao
0fd6521759 aarch64: Avoid depending on layout in GIC code
Removing the dependency on `layout` helps moving GIC code into
`hypervisor` crate.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-05-27 10:57:50 +08:00
Michael Zhao
3fe20cc09a aarch64: Remove GicDevice trait
`GicDevice` trait was defined for the common part of GicV3 and ITS.
Now that the standalone GicV3 do not exist, `GicDevice` is not needed.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-05-27 10:57:50 +08:00
Rob Bradford
fa07d83565 Revert "virtio-devices, vmm: Optimised async virtio device activation"
This reverts commit f160572f9d.

There has been increased flakiness around the live migration tests since
this was merged. Speculatively reverting to see if there is increased
stability.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-21 21:27:33 +01:00
Rob Bradford
f160572f9d virtio-devices, vmm: Optimised async virtio device activation
In order to ensure that the virtio device thread is spawned from the vmm
thread we use an asynchronous activation mechanism for the virtio
devices. This change optimises that code so that we do not need to
iterate through all virtio devices on the platform in order to find the
one that requires activation. We solve this by creating a separate short
lived VirtioPciDeviceActivator that holds the required state for the
activation (e.g. the clones of the queues) this can then be stored onto
the device manager ready for asynchronous activation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-20 17:07:13 +01:00
Sebastien Boeuf
49db713124 virtio-devices, vmm: Remove unused macro rules
Latest cargo beta version raises warnings about unused macro rules.
Simply remove them to fix the beta build.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-20 09:59:43 +01:00
Maksym Pavlenko
3a0429c998 cargo: Clean up serde dependencies
There is no need to include serde_derive separately,
as it can be specified as serde feature instead.

Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2022-05-18 08:21:19 +02:00
Rob Bradford
16a9882153 vmm: cpu: tdx: Don't use fd suffix for something not an FD
The hypervisor::Vcpu is the abstraction over the fd.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
218be2642e hypervisor: Explicitly pub use at the hypervisor crate top-level
Explicitly re-export types from the hypervisor specific modules. This
makes it much clearer what the common functionality that is exposed is.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
cd0df05808 vmm, arch: CpuId is x86_64 specific so import from the x86_64 module
It will be removed as a top-level export from the hypervisor crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
d3f66f8702 hypervisor: Make vm module private
And thus only export what is necessary through a `pub use`. This is
consistent with some of the other modules and makes it easier to
understand what the external interface of the hypervisor crate is.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-13 15:39:22 +02:00
Rob Bradford
b1bd87df19 vmm: Simplify MsiInterruptManager generics
By taking advantage of the fact that IrqRoutingEntry is exported by the
hypervisor crate (that is typedef'ed to the hypervisor specific version)
then the code for handling the MsiInterruptManager can be simplified.

This is particularly useful if in this future it is not a typedef but
rather a wrapper type.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-11 11:19:14 +01:00
Rob Bradford
3f9e8d676a hypervisor: Move creation of irq routing struct to hypervisor crate
This removes the requirement to leak as many datastructures from the
hypervisor crate into the vmm crate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-11 11:19:14 +01:00
Rob Bradford
c2c813599d vmm: Don't use kvm_ioctls directly
The IoEventAddress is re-exported through the crate at the top-level.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-10 15:57:43 +01:00
Rob Bradford
387d56879b vmm, hypervisor: Clean up nomenclature around offloading VM operations
The trait and functionality is about operations on the VM rather than
the VMM so should be named appropriately. This clashed with with
existing struct for the concrete implementation that was renamed
appropriately.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-10 13:10:01 +01:00
Sebastien Boeuf
5f722d0d3f vmm: Fix loading RAW firmware
Whenever going through the codepath of loading a RAW firmware, we always
add an extra RAM region to the guest memory through the memory manager.
But we must be careful to use the updated guest memory rather than a
previous reference that wasn't containing the new region, as this can
lead to the following error:

VmBoot(FirmwareLoad(InvalidGuestAddress(GuestAddress(4290772992))))

This is fixed by the current patch, getting the latest reference onto
the guest memory from the memory manager right after the new region has
been added.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-06 18:13:28 +02:00
Bo Chen
42c19e14c5 vmm: Add 'shutdown()' to vCPU seccomp filter
This is required when hot-removing a vfio-user device. Details code path
below:

Thread 6 "vcpu0" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7f8196889700 (LWP 2358305)]
0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
78	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
  0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
  0x000056189240737d in std::sys::unix::net::Socket::shutdown ()
    at library/std/src/sys/unix/net.rs:383
  std::os::unix::net::stream::UnixStream::shutdown () at library/std/src/os/unix/net/stream.rs:479
  0x000056189210e23d in vfio_user::Client::shutdown (self=0x7f8190014300)
    at vfio_user/src/lib.rs:787
  0x00005618920b9d02 in <pci::vfio_user::VfioUserPciDevice as core::ops::drop::Drop>::drop (
    self=0x7f819002d7c0) at pci/src/vfio_user.rs:551
  0x00005618920b8787 in core::ptr::drop_in_place<pci::vfio_user::VfioUserPciDevice> ()
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920b92e3 in core::ptr::drop_in_place<core::cell::UnsafeCell<dyn pci::device::PciDevice>>
    () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920b9362 in core::ptr::drop_in_place<std::sync::mutex::Mutex<dyn pci::device::PciDevice>> () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920d8a3e in alloc::sync::Arc<T>::drop_slow (self=0x7f81968852b8)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1092
  0x00005618920ba273 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7f81968852b8)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1688
 0x00005618920b76fb in core::ptr::drop_in_place<alloc::sync::Arc<std::sync::mutex::Mutex<dyn pci::device::PciDevice>>> ()
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
 0x0000561891b5e47d in vmm::device_manager::DeviceManager::eject_device (self=0x7f8190009600,
    pci_segment_id=0, device_id=3) at vmm/src/device_manager.rs:4000
 0x0000561891b674bc in <vmm::device_manager::DeviceManager as vm_device:🚌:BusDevice>::write (
    self=0x7f8190009600, base=70368744108032, offset=8, data=&[u8](size=4) = {...})
    at vmm/src/device_manager.rs:4625
 0x00005618921927d5 in vm_device:🚌:Bus::write (self=0x7f8190006e00, addr=70368744108040,
    data=&[u8](size=4) = {...}) at vm-device/src/bus.rs:235
 0x0000561891b72e10 in <vmm::vm::VmOps as hypervisor::vm::VmmOps>::mmio_write (
    self=0x7f81900097b0, gpa=70368744108040, data=&[u8](size=4) = {...}) at vmm/src/vm.rs:378
 0x0000561892133ae2 in <hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run (
    self=0x7f8190013c90) at hypervisor/src/kvm/mod.rs:1114
 0x0000561891914e85 in vmm::cpu::Vcpu::run (self=0x7f819001b230) at vmm/src/cpu.rs:348
 0x000056189189f2cb in vmm::cpu::CpuManager::start_vcpu::{{closure}}::{{closure}} ()
    at vmm/src/cpu.rs:953

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-05 15:33:26 -07:00
Sebastien Boeuf
058a61148c vmm: Factorize net creation
Since both Net and vhost_user::Net implement the Migratable trait, we
can factorize the common part to simplify the code related to the net
creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
Sebastien Boeuf
425902b296 vmm: Factorize disk creation
Since both Block and vhost_user::Blk implement the Migratable trait, we
can factorize the common part to simplify the code related to the disk
creation.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
Sebastien Boeuf
54f39aa8cb vmm: Validate vhost-user-block/net are not configured with iommu=on
Extend the validate() function for both DiskConfig and NetConfig so that
we return an error if a vhost-user-block or vhost-user-net device is
expected to be placed behind the virtual IOMMU. Since these devices
don't support this feature, we can't allow iommu to be set to true in
these cases.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-05 13:08:41 +02:00
Rob Bradford
707cea2182 vmm, devices: Move logging of 0x80 timestamp to its own device
This is a cleaner approach to handling the I/O port write to 0x80.
Whilst doing this also use generate the timestamp at the start of the VM
creation. For consistency use the same timestamp for the ARM equivalent.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 23:02:53 +01:00
Rob Bradford
c47e3b8689 gdb: Do not use VmmOps for memory manipulation
We don't use the VmmOps trait directly for manipulating memory in the
core of the VMM as it's really designed for the MSHV crate to handle
instruction decoding. As I plan to make this trait MSHV specific to
allow reduced locking for MMIO and PIO handling when running on KVM this
use should be removed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 11:33:02 -07:00
Bo Chen
7fe399598d vmm: device_manager: Map MMIO regions to the guest correctly
To correctly map MMIO regions to the guest, we will need to wait for valid
MMIO region information which is generated from 'PciDevice::allocate_bars()'
(as a part of 'DeviceManager::add_pci_device()').

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-04 13:53:47 +02:00
Rob Bradford
1dfe4eda5c vmm: Prevent "internal" identifiers being used by user
For devices that cannot be named by the user use the "__" prefix to
identify them as internal devices. Check that any identifiers provided
in the config do not clash with those internal names. This prevents the
user from creating a disk such as "__serial" which would then cause a
failure in unpredictable manner.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-04 12:34:11 +02:00
Sebastien Boeuf
6e101f479c vmm: Ensure hotplugged device identifier is unique
Whenever a device (virtio, vfio, vfio-user or vdpa) is hotplugged, we
must verify the provided identifier is unique, otherwise we must return
an error.

Particularly, this will prevent issues with identifiers for serial,
console, IOAPIC, balloon, rng, watchdog, iommu and gpio since all of
these are hardcoded by the VMM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-03 18:34:24 +01:00
Rob Bradford
6d4862245d vmm: Generate event when device is removed
The new event contains the BDF and the device id:

{
  "timestamp": {
    "secs": 2,
    "nanos": 731073396
  },
  "source": "vm",
  "event": "device-removed",
  "properties": {
    "bdf": "0000:00:02.0",
    "id": "test-disk"
  }
}

Fixes: #4038

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-05-03 17:10:36 +02:00
Sebastien Boeuf
a5a2e591c9 vmm: Remove FsConfig from VmConfig when unplugging fs device
All hotpluggable devices were properly removed from the VmConfig when a
remove-device command was issued, except for the "fs" type. Fix this
lack of support as it is causing the integration tests to fail with the
recent addition of verifying that identifiers are unique.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
Sebastien Boeuf
677c8831af vmm: Ensure uniqueness of generated identifiers
The device identifiers generated from the DeviceManager were not
guaranteed to be unique since they were not taking the list of
identifiers provided through the configuration.

By returning the list of unique identifiers from the configuration, and
by providing it to the DeviceManager, the generation of new identifiers
can rely both on the DeviceTree and the list of IDs from the
configuration.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
Sebastien Boeuf
634c53ea50 vmm: config: Validate provided identifiers are unique
A valid configuration means we can only accept unique identifiers from
the user.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-02 13:26:15 +02:00
LiHui
ec0c1b01c4 vmm: api: Do not delete the API socket on API server creation
The socket will safely deleted on shutdown and so it is not necessary to
delete the API socket when starting the HTTP server.

Fixes: #4026

Signed-off-by: LiHui <andrewli@kubesphere.io>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 18:40:49 +01:00
Rob Bradford
f17aa3755f vmm: Add clarifying comment about Vm::entry_point()
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
744a049007 vmm: Parallelise functionality with kernel loading
Move fuctionality earlier in the boot so as to run in parallel with the
loading of the kernel.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
e70bd069b3 vmm: Load kernel asynchronously
Start loading the kernel as possible in the VM in a separate thread.
Whilst it is loading other work can be carried out such as initialising
the devices.

The biggest performance improvement is seen with a more complex set of
devices. If using e.g. four virtio-net devices then the time to start the
kernel improves by 20-30ms. With the simplest configuration the
improvement was of the order of 2-3ms.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
bfeb3120f5 vmm: Refactor kernel loading to decouple from Vm struct
This will allow the kernel to be loaded from another thread.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
ce6d88d187 vmm: Merge aarch64 use statements
These were in their own block and not organised lexically.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
56fe4c61af vmm: Duplicate Vm::entry_point() across architectures
These will have very different implementations when asynchronously
loading the kernel.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
1d1a087fc5 vmm: Refactor kernel command line generation
This allows the same code for generating the kernel command line to be
used on both aarch64 and x86_64 when the latter starts loading the
kernel in asynchronously.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
f1276c58d2 vmm: Commandline inject from devices is aarch64 specific
This is not required for x86_64 and maintains a tight coupling between
kernel loading and the DeviceManager.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Rob Bradford
da33eb5e8c vmm: device_manager: Remove extra whitespace lines
These originated from the removal of the acpi feature gate.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-29 11:03:38 +01:00
Fabiano Fidêncio
fdeb4f7c46 Revert "vmm, openapi: Token Bucket fields should be uint64"
This reverts commit 87eed369cd.

The reason we're reverting this is that OpenAPI Specification[0] doesn't
know how to deal with unsigned types. :-/

Right now the best to do is keep it as it's, as an int64, and try to fix
OpenAPI, or even switch to swagger, as the latter knows how to properly
deal with those.  However, switching to swagger is far from being an 1:1
transition and will require time to experiment, thus reverting this for
now seems the best approach.

[0]: https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#data-types

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 09:26:38 +02:00
Fabiano Fidêncio
87eed369cd vmm, openapi: Token Bucket fields should be uint64
The Token Bucket fields are, on the Cloud Hypervisor side, u64.
However, we expose those as int64 in the OpenAPI YAML file.

With that in mind, let's adjust the yaml file to expose those as uint64.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-27 13:16:02 +02:00
Rob Bradford
79f4c2db01 vmm: Enable virtio-iommu in VmConfig::validate()
This means that the automatic enabling of the virtio-iommu will also be
applied to VMs creates via the API as well as the CLI.

Fixes: #4016

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-26 12:27:00 +01:00
Rob Bradford
bf9f79081a vmm: Only create ACPI memory manager DSDT when resizable
If using the ACPI based hotplug only memory can be added so if the
hotplug RAM size is the same as the boot RAM size then do not include
the memory manager DSDT entries.

Also: this change simplifies the code marginally by making the
HotplugMethod enum Copyable.

This was identified from the following perf output:

     1.78%     0.00%  vmm              cloud-hypervisor      [.] <vmm::memory_manager::MemorySlots as acpi_tables::aml::Aml>::append_aml_bytes
            |
            ---<vmm::memory_manager::MemorySlots as acpi_tables::aml::Aml>::append_aml_bytes
               <vmm::memory_manager::MemorySlot as acpi_tables::aml::Aml>::append_aml_bytes
               acpi_tables::aml::Name::new
               <acpi_tables::aml::Path as acpi_tables::aml::Aml>::append_aml_bytes
               __libc_malloc

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-26 13:07:19 +02:00
Rob Bradford
62f17ccf8c vmm: Improve error handling for vmm::vm::Error
In particular implement thiserror::Error, cleanup wording and remove
unused errors.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
cb03540ffd vmm: config: Derive thiserror::Error
No further changes are necessary that adding a #[derive(Error)] as there
is a manual implementation of Display.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
0270d697ab vmm: cpu: Improve Error reporting
Remove unused enum members, improve error messages and implement
thiserror::Error.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00
Rob Bradford
47529796d0 arch: Improve arch::Error
Remove unused error enum entries, improve wording and derive
thiserror::Error.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-22 17:46:41 +01:00