e.g. on QEMU on KVM:
cloud-hypervisor: 17.079406ms: <vmm> INFO:arch/src/x86_64/mod.rs:565 -- Running under nested virtualisation. Hypervisor string: KVMKVMKVM
Or under Azure:
cloud-hypervisor: 3.881263ms: <vmm> INFO:arch/src/x86_64/mod.rs:565 -- Running under nested virtualisation. Hypervisor string: Microsoft Hv
Fixes: #5067
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Without breaking the former way of declaring them. This is simply based
on the presence of the GUID TDX Metadata offset. If not present, we
consider the firmware is quite old and therefore we fallback onto the
previous way to expose memory resources.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The preferred way of retrieving the offset where to find the TDVF
descriptor structure is by going through a table of GUIDs that can be
found at a specific offset in the firmware file. If the expected GUIDs
can't be found, we can fallback onto the former way, which is to read
directly the value at a specific offset in the file.
This patch implements the new mechanism without breaking compatibility
for older firmwares as it keeps supporting the previous mechanism.
As a reference, here is the documentation from the EDK2 code, and
particularly from the OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm file:
```
GUIDed structure. To traverse this you should first verify the
presence of the table footer guid
(96b582de-1fb2-45f7-baea-a366c55a082d) at 0xffffffd0. If that
is found, the two bytes at 0xffffffce are the entire table length.
The table is composed of structures with the form:
Data (arbitrary bytes identified by guid)
length from start of data to end of guid (2 bytes)
guid (16 bytes)
so work back from the footer using the length to traverse until you
either find the guid you're looking for or run off the beginning of
the table.
```
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When configuring the vCPUs it is only necessary to provide the guest
memory when booting fresh (for populating the guest memory). As such
refactor the vCPU configuration to remove the use of the
GuestMemoryMmap stored on the CpuManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the KVM version is too old (pre Linux 5.7) then fetch the CPUID
information from the host and use that in the guest. We prefer the KVM
version over the host version as that would use the CPUID for the
running CPU vs the CPU that runs this code which might be different due
to a hybrid topology.
Fixes: #4918
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add TPM's CRB Interface specific address ranges to layouts
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: Sean Yoo <t-seanyoo@microsoft.com>
On AArch64, `translate_gva` API is not provided by KVM. We implemented
it in VMM by walking through translation tables.
Address translation is big topic, here we only focus the scenario that
happens in VMM while debugging kernel. This `translate_gva`
implementation is restricted to:
- Exception Level 1
- Translate high address range only (kernel space)
This implementation supports following Arm-v8a features related to
address translation:
- FEAT_LPA
- FEAT_LVA
- FEAT_LPA2
The implementation supports page sizes of 4KiB, 16KiB and 64KiB.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The original value didn't include the size of the entry point structure.
The caused the last entry to be corrupted by other code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Previously the code used the SmbiosSysInfo structure for that purpose.
Handle 0x0003, DMI type 127, 27 bytes
<TRUNCATED>
Wrong DMI structures length: 130 bytes announced, structures occupy 131 bytes.
Fix this by using the correct structure and padding.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
There is no need to manually implement Clone for some structures.
It is better to explicitly spell out repr(C) to avoid the compiler
reordering the fields.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
This requires making get/set_lapic_reg part of the type.
For the moment we cannot provide a default variant for the new type,
because picking one will be wrong for the other hypervisor, so I just
drop the test cases that requires LapicState::default().
Signed-off-by: Wei Liu <liuwe@microsoft.com>
CpuId is an alias type for the flexible array structure type over
CpuIdEntry. The type itself and the type of the element in the array
portion are tied to the underlying hypervisor.
Switch to using CpuIdEntry slice or vector directly. The construction of
CpuId type is left to hypervisors.
This allows us to decouple CpuIdEntry from hypervisors more easily.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
We only need to do this for x86 since MSHV does not have aarch64 support
yet. This reduces unnecessary code churn.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
warning: you are deriving `PartialEq` and can implement `Eq`
--> vmm/src/serial_manager.rs:59:30
|
59 | #[derive(Debug, Clone, Copy, PartialEq)]
| ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Combined the `GicDevice` struct in `arch` crate and the `Gic` struct in
`devices` crate.
After moving the KVM specific code for GIC in `arch`, a very thin wapper
layer `GicDevice` was left in `arch` crate. It is easy to combine it
with the `Gic` in `devices` crate.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
`GicDevice` trait was defined for the common part of GicV3 and ITS.
Now that the standalone GicV3 do not exist, `GicDevice` is not needed.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Why combine:
- GicV3 is not required alone
- GicV3 and GicV3Its has separate snapshot/pause code. But the code of
GicV3 was never used.
- Reduce the code complexity of GIC related traits and structs.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
There is no need to include serde_derive separately,
as it can be specified as serde feature instead.
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
This variable name is residual from when these functions acted directly
on the vCPU fd rather than the hypervisor wrapper.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
EDK2 execution requires a flash device at address 0.
The new added device is not a fully functional flash. It doesn't
implement any spec of a flash device. Instead, a piece of memory is used
to simulate the flash simply.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Size of `memory` node in FDT was reduced by 4MiB. Now it is returned.
In memory regions, a `Ram` region of 4MiB was created at address 0. Now
it is removed.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
This is a refactoring commit to simplify source code.
Removed some functions that only return a layout const.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Some addresses defined in `layout.rs` were of type `GuestAddress`, and
are `u64`. Now align the types of all the `*_START` definitions to
`GuestAddress`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The reserved space is for devices.
Some devices (like TPM) require arbitrary addresses close to 4GiB.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
`RAM_64BIT_START` was set to 1 GiB, not a real 64-bit address. Now
rename it `RAM_START` to avoid confusion.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Based on latest QEMU patches from branch tdx-qemu-2022.03.29-v7.0.0-rc1
we don't need EFI_RESOURCE_ATTRIBUTE_ENCRYPTED as part of the attributes
we must enable with EFI_RESOURCE_SYSTEM_MEMORY and
EFI_RESOURCE_MEMORY_RESERVED resource types.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Compile this feature in by default as it's well supported on both
aarch64 and x86_64 and we only officially support using it (no non-acpi
binaries are available.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a field for its length and fix up users.
Things work just because all hardcoded values agree with each other.
This is prone to breakage.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The logic of determining VCPU index in creating `cpu-map` node of FDT
can be optimized.
The code is invoked when VCPU topology is specified.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
In Flattened Device Tree (FDT) on AArch64, the VCPU topology is
represented by `cpu-map` node. The source code of creating the node
can be simplified.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Adding a new method to the TdHob structure so that we can easily insert
a HOB_PAYLOAD_INFO_TABLE into the HOB.
Signed-off-by: Jiaqi Gao <jiaqi.gao@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the recent updates of the TDVF specification introducing new
types of TDVF sections, let's extend the enum in our code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding a new method to the TdHob structure so that we can easily insert
a ACPI_TABLE_HOB into the HOB.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's been decided the ACPI tables will be passed to the firmware in a
different way, rather than using TD_VMM_DATA. Since TD_VMM_DATA was
introduced for this purpose, there's no reason to keep it in our
codebase.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Drop the unused parameter throughout the code base.
Also take the chance to drop a needless clone.
No functional change intended.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
error: unneeded late initalization
Error: --> arch/src/aarch64/gic/gicv3_its.rs:127:9
|
127 | let attr: u64;
| ^^^^^^^^^^^^^^
|
= note: `-D clippy::needless-late-init` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_late_init
help: declare `attr` here
|
128 | let attr: u64 = if save {
| +++++++++++++++
help: remove the assignments from the branches
|
129 ~ u64::from(kvm_bindings::KVM_DEV_ARM_ITS_SAVE_TABLES)
130 | } else {
131 ~ u64::from(kvm_bindings::KVM_DEV_ARM_ITS_RESTORE_TABLES)
|
help: add a semicolon after the `if` expression
|
132 | };
| +
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Each PCI device under a root complex is uniquely identified by its
Requester ID (AKA RID). A Requester ID is a triplet of a Bus number,
Device number, and Function number.
MSIs may be distinguished in part through the use of sideband data
accompanying writes. In the case of PCI devices, this sideband data
may be derived from the Requester ID. A mechanism is required to
associate a device with both the MSI controllers it can address,
and the sideband data that will be associated with its writes to
those controllers.
This commit adds the `msi-map` property for PCI nodes, therefore
creating MSI mapping for each PCI device.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit rewrites the `create_pci_node` in the FDT creator to
create multiple PCI nodes based on the vector of `PciSpaceInfo`,
and each PCI node in FDT reflects a PCI segment.
- The PCI MMIO config space, 32 bits PCI device space and 64 bits
PCI device space is re-calculated based on the `PciSpaceInfo` for
each PCI segment.
- A new FDT property `linux,pci-domain` is added.
- The virtio-iommu node is only created for the first PCI segment.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The constant `PCI_MMIO_CONFIG_SIZE` defined in `vmm/pci_segment.rs`
describes the MMIO configuation size for each PCI segment. However,
this name conflicts with the `PCI_MMCONFIG_SIZE` defined in `layout.rs`
in the `arch` crate, which describes the memory size of the PCI MMIO
configuration region.
Therefore, this commit renames the `PCI_MMIO_CONFIG_SIZE` to
`PCI_MMIO_CONFIG_SIZE_PER_SEGMENT` and moves this constant from `vmm`
crate to `arch` crate.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Currently, a tuple containing PCI space start address and PCI space
size is used to pass the PCI space information to the FDT creator.
In order to support the multiple PCI segment for FDT, more information
such as the PCI segment ID should be passed to the FDT creator. If we
still use a tuple to store these information, the code flexibility and
readablity will be harmed.
To address this issue, this commit replaces the tuple containing the
PCI space information to a structure `PciSpaceInfo` and uses a vector
of `PciSpaceInfo` to store PCI space information for each segment, so
that multiple PCI segment information can be passed to the FDT together.
Note that the scope of this commit will only contain the refactor of
original code, the actual multiple PCI segments support will be in
following series, and for now `--platform num_pci_segments` should only
be 1.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
In order to avoid the identity map region to conflict with a possible
firmware being placed in the last 4MiB of the 4GiB range, we must set
the address to a chosen location. And it makes the most sense to have
this region placed right after the TSS region.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Place the 3 page TSS at an explicit location in the 32-bit address space
to avoid conflicting with the loaded raw firmware.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Reduce the size of the reserved 32-bit address space to the range used
by both the PCI MMIO config data and the 32-bit PCI device space.
This avoids issues when using firmware that is loaded into the very top
of the 32-bit address space as the RAM conflicts with the reserved
memory.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the provided binary isn't an ELF binary assume that it is a firmware
to be loaded in directly. In this case we shouldn't program any of the
registers as KVM starts in that state.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This test generates an array of random numbers and then applies the same
trivial algorithm twice -- once in set_apic_delivery_mode and another
time in an anonymous function.
Its usefulness is limited. Drop it to remove one unsafe in code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Instead of creating a MemoryManager from scratch, let's reuse the same
code path used by snapshot/restore, so that memory regions are created
identically to what they were on the source VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Storing multiple data coming from the MemoryManager in order to be able
to restore without creating everything from scratch.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Move the definition of MSI space to layout.rs, so other crates can
reference it. Now it is needed by virtio-iommu.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Add a virtio-iommu node into FDT if iommu option is turned on. Now we
support only one virtio-iommu device.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
This commit implements the GIC (including both GICv3 and GICv3ITS)
Pausable trait. The pause of device manager will trigger a "pause"
of GIC, where we flush GIC pending tables and ITS tables to the
guest RAM.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Based on `--memory-zone` and `--numa` param in the Cloud Hypervisor
cmdline, the NUMA memory configuration is described. This commit
adds such NUMA memory configuration to the FDT memory node.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
For the purpose of identification, each NUMA node is associated
with a unique token known as a `numa-node-id`. For the purpose of
device tree binding, a `numa-node-id` is a 32-bit integer.
The CPU node is associated with a NUMA node by the presence of a
`numa-node-id` property which contains the node id of the device.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The optional device tree node distance-map describes the relative
distance (memory latency) between all NUMA nodes.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>