We must explicitly mark these values as u8 as the function that consumes
them takes a T and needs to use the specific width.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We will need the GDT API for the hypervisor's x86 instruction
emulator implementation, it's better if the arch crate depends on the
hypervisor one rather than the other way around.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Before Virtio-mmio was removed, we passed an optional PCI space address
parameter to AArch64 code for generating FDT. The address is none if the
transport is MMIO.
Now Virtio-PCI is the only option, the parameter is mandatory.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Virtio-mmio is removed, now virtio-pci is the only option for virtio
transport layer. We use MSI for PCI device interrupt. While GICv2, the
legacy interrupt controller, doesn't support MSI. So GICv2 is not very
practical for Cloud-hypervisor, we can remove it.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
If the user specified a maximum physical bits value through the
`max_phys_bits` option from `--cpus` parameter, the guest CPUID
will be patched accordingly to ensure the guest will find the
right amount of physical bits.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The OneRegister literally means "one (arbitrary) register". Just call it
"Register" instead. There is no need to inherit KVM's naming scheme in
the hypervisor agnostic code.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
A new version of vm-memory was released upstream which resulted in some
components pulling in that new version. Update the version number used
to point to the latest version but continue to use our patched version
due to the fix for #1258
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to speed up the Linux boot (so as to avoid it having to scan a
large number of pages) place the MP table directly after the SMBIOS
table if there is sufficient room. The start address of the SMBIOS table
is one of the three (and the largest) location that the MP table can
also be located at.
Before:
[ 0.000399] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.014945] check: Scanning 1 areas for low memory corruption
After:
[ 0.000284] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.000421] found SMP MP-table at [mem 0x000f0090-0x000f009f]
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The states of GIC should be part of the VM states. This commit
enables the AArch64 VM states save/restore by adding save/restore
of GIC states.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Currently for AArch64, the GICv3-ITS is tried to be created first
when PCI is not needed, which is unnecessary. This commit fixes
the problem.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Since calling `KVM_GET_ONE_REG` before `KVM_VCPU_INIT` will
result in an error: Exec format error (os error 8). This commit
decouples the vCPU init process from `configure_vcpus`. Therefore
in the process of restoring the vCPUs, these vCPUs can be
initialized separately before started.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The value of GIC register `GICR_TYPER` is needed in restoring
the GIC states. This commit adds a field in the GIC device struct
and a method to construct its value.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
In AArch64 systems, the state of GIC device can only be
retrieved from `KVM_GET_DEVICE_ATTR` ioctl. Therefore to implement
saving/restoring the GIC states, we need to make sure that the
GIC object (either the file descriptor or the device itself) can
be extracted after the VM is started.
This commit refactors the code of GIC creation by adding a new
field `gic_device_entity` in device manager and methods to set/get
this field. The GIC object can be therefore saved in the device
manager after calling `arch::configure_system`.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit adds a function which allows to save RDIST pending
tables to the guest RAM, as well as unit test case for it.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit implements the `get_device_attr` method for the
`KVM_GET_DEVICE_ATTR` ioctl. This ioctl will be used in retrieving
the GIC status.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit moves the GIC-related code to a separate module.
Therefore the implementation of GIC registers can be introduced
to the new module.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit ports code from firecracker and refactors the existing
AArch64 code as the preparation for implementing save/restore
AArch64 vCPU, including:
1. Modification of `arm64_core_reg` macro to retrive the index of
arm64 core register and implemention of a helper to determine if
a register is a system register.
2. Move some macros and helpers in `arch` crate to the `hypervisor`
crate.
3. Added related unit tests for above functions and macros.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Misspellings were identified by https://github.com/marketplace/actions/check-spelling
* Initial corrections suggested by Google Sheets
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Inject CPUID leaves for advertising KVM HyperV support when the
"kvm_hyperv" toggle is enabled. Currently we only enable a selection of
features required to boot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Shrink GICDevice trait to contain hypervisor agnostic API's only, which
are used in generating FDT.
Move all KVM specific logic into KvmGICDevice trait.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
This commit enables some mmio-related integration test cases on
AArch64, including:
* some vhost_user test cases
* virtio-blk test cases
* pmem test cases
Also this commit contains a bug fix in creating virtio-blk device.
Previously, when creating the FDT, the virtio-blk device was
labeled in the reverse order of address allocation.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
The retry order to create virtual GIC is GICv3-ITS, GICv3 and GICv2.
But there was not log message to show what was finally created.
The log messages also mute the warning for unused "log" crate.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
SGX expects the EPC region to be reported as "reserved" from the e820
table. This patch adds a new entry to the table if SGX is enabled.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The support for SGX is exposed to the guest through CPUID 0x12. KVM
passes static subleaves 0 and 1 from the host to the guest, without
needing any modification from the VMM itself.
But SGX also relies on dynamic subleaves 2 through N, used for
describing each EPC section. This is not handled by KVM, which means
the VMM is in charge of setting each subleaf starting from index 2
up to index N, depending on the number of EPC sections.
These subleaves 2 through N are not listed as part of the supported
CPUID entries from KVM. But it's important to set them as long as index
0 and 1 are present and indicate that SGX is supported.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the presence of one or multiple SGX EPC sections from the VM
configuration, the MemoryManager will allocate a contiguous block of
guest address space to hold the entire EPC region. Within this EPC
region, each EPC section is memory mapped.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The type is now hypervisor::Vm. Switch from KVM specific name vm_fd to a
generic name just like 8186a8eee6 ("vmm: interrupt: Rename vm_fd").
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The OVMF firmware loops around looking for an entry marking the end of
the table. Without this entry processing the tables is an infinite loop.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Taken from crosvm: 44336b913126d73f9f8d6854f57aac92b5db809e and adapted
for Cloud Hypervisor.
This is basic and incomplete support but Linux correctly finds the DMI
data based on this:
root@clr-c6ed47bc1c9d473d9a3a8bddc50ee4cb ~ # dmesg | grep -i dmi
[ 0.000000] DMI: Cloud Hypervisor cloud-hypervisor, BIOS 0
root@clr-c6ed47bc1c9d473d9a3a8bddc50ee4cb ~ # dmesg | grep -i smbio
[ 0.000000] SMBIOS 3.2.0 present.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit fixes some warnings introduced in the previous
hyperviosr crate PR.Removed some unused variables from arch/aarch64
module.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Start moving the vmm, arch and pci crates to being hypervisor agnostic
by using the hypervisor trait and abstractions. This is not a complete
switch and there are still some remaining KVM dependencies.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
There are two CPUID leaves for handling CPU topology, 0xb and 0x1f. The
difference between the two is that the 0x1f leaf (Extended Topology
Leaf) supports exposing multiple die packages.
Fixes: #1284
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The extended topology leaf (0x1f) also needs to have the APIC ID (which
is the KVM cpu ID) set. This mirrors the APIC ID set on the 0xb topology
leaf
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently, not every feature of the cloud-hypervisor is enabled
on AArch64, which means that on AArch64 machines, the
`run_unit_tests.sh` needs to be tailored and some unit test cases
should be run on x86_64 only.
Also this commit fixes the typo and unifies `Arm64` and `AArch64`
in the AArch64 document.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Implemented GSI allocator and system allocator for AArch64.
Renamed some layout definitions to align more code between architectures.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
For correctness, when the CPUID supports the LA57 feature, the VMM sets
the CR4.LA57 register, which means a fifth level of page table might be
needed. Even if it's not needed because the kernel should not use
addresses over 1GiB, it's better to define this new level anyway.
This patch only applies to the Linux boot codepath, which means it
affects both vmlinux without PVH and bzImage binaries. The bzImage
does not need this since the page tables and CR4 registers are set in
the decompression code from the kernel.
And for vmlinux with PVH, if we follow the PVH specification, the kernel
must be responsible for setting things up, but the implementation is
missing. This means for now that PVH does not support LA57 with 5 levels
of paging.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the host CPU exposes the support for LA57 feature through its
cpuid, the CR4.LA57 bit is enabled accordingly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When booting VM on AArch64 machines, we need to construct the
flattened device tree before loading kernel. Hence here we add
the implementation of the flattened device tree for AArch64.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
As on AArch64 systems we need register mpidr to create the
flattened device tree, here in this commit we add ported AArch64
register implementation from Firecracker and related changes to
make this commit build.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit adds ported code of Generic Interrupt Controller (GIC)
software implementation for AArch64, including both GICv2 and
GICv3 devices. These GIC devices are actually emulated by the
host kernel through KVM and will be used in the guest VM as the
interrupt controller for AArch64.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
This commit adds the memory layout design for AArch64 in
`arch/src/aarch64/layout.rs` and related changes in
`arch/src/lib.rs` to make sure this commit can build.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Now the flow of both architectures are aligned to:
1. load kernel
2. create VCPU's
3. configure system
4. start VCPU's
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
This is a preparing commit to build and test CH on AArch64. All building
issues were fixed, but no functionality was introduced.
For X86, the logic of code was not changed at all.
For ARM, the architecture specific part is still empty. And we applied
some tricks to workaround lint warnings. But such code will be replaced
later by other commits with real functionality.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Use the ACPI feature to control whether to build the mptable. This is
necessary as the mptable and ACPI RSDP table can easily overwrite each
other leading to it failing to boot.
TEST=Compile with default features and see that --cpus boot=48 now
works, try with --no-default-features --features "pci" and observe the
--cpus boot=48 also continues to work.
Fixes: #1132
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The setup_mptables() call which is not used on ACPI builds has a side
effect of testing whether there was enough RAM which one of the unit
tests was relying on. Add a similar check for the RSDP address.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Fill and write to guest memory the necessary boot module
structure to allow a guest using the PVH boot protocol
to load an initramfs image.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
We need the project to rely on kvm-bindings and kvm-ioctls branches
which include the serde derive to be able to serialize and deserialize
some KVM structures.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Commit 2adddce2 reorganized the crate for a cleaner multi architecture
(x86_64 and aarch64) support.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
* load the initramfs File into the guest memory, aligned to page size
* finally setup the initramfs address and its size into the boot params
(in configure_64bit_boot)
Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
We set it to 0xff, which is for unregistered loaders.
The kernel checks that the bootloader ID is set when e.g. loading
ramdisks, so not setting it when we get a bootparams header from the
loader will prevent the kernel from loading ramdisks.
Fixes: #918
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Validate correct GDT entries, initial segment configuration, and control
register bits that are required by PVH boot protocol.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Expand the unit tests to cover the configure_system() code when
using the PVH boot protocol. Verify the method for adding memory
map table entries in the format specified by PVH boot protocol.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Fill the hvm_start_info and related memory map structures as
specified in the PVH boot protocol. Write the data structures
to guest memory at the GPA that will be stored in %rbx when
the guest starts.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
In order to properly initialize the kvm regs/sregs structs for
the guest, the load_kernel() return type must specify which
boot protocol to use with the entry point address it returns.
Make load_kernel() return an EntryPoint struct containing the
required information. This structure will later be used
in the vCPU configuration methods to setup the appropriate
initial conditions for the guest.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Create supporting definitions to use the hvm start info and memory
map table entry struct definitions from the linux-loader crate in
order to enable PVH boot protocol support
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Since the kvm crates now depend on vmm-sys-util, the bump must be
atomic.
The kvm-bindings and ioctls 0.2.0 and 0.4.0 crates come with a few API
changes, one of them being the use of a kvm_ioctls specific error type.
Porting our code to that type makes for a fairly large diff stat.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Remove ACPI table creation from arch crate to the vmm crate simplifying
arch::configure_system()
GuestAddress(0) is used to mean no RSDP table rather than adding
complexity with a conditional argument or an Option type as it will
evaluate to a zero value which would be the default anyway.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add basic processor details to the DSDT table. The code has to be
slightly convoluted (with the second pass over the cpu_devices vector)
in order to keep the objects alive long enough in order to be able to
take their reference.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We need to rely on the latest kvm-ioctls version to benefit from the
recent addition of unregister_ioevent(), allowing us to detach a
previously registered eventfd to a PIO or MMIO guest address.
Because of this update, we had to modify the current constraint we had
on the vmm-sys-util crate, using ">= 0.1.1" instead of being strictly
tied to "0.2.0".
Once the dependency conflict resolved, this commit took care of fixing
build issues caused by recent modification of kvm-ioctls relying on
EventFd reference instead of RawFd.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
To avoid a clash with to_bytes() for the unsigned integer types that is
coming in a future release.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This was verified by comparing the ASL from disassembling the DSDT
before and after. All the individual AML components themselves are also
unit tested.
Fixes: #352
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The virtual IOMMU exposed through virtio-iommu device has a dependency
on ACPI. It needs to expose the device ID of the virtio-iommu device,
and all the other devices attached to this virtual IOMMU. The IDs are
expressed from a PCI bus perspective, based on segment, bus, device and
function.
The guest relies on the topology description provided by the IORT table
to attach devices to the virtio-iommu device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The PCI Express Firmware specification says that the region may
be included in the E820 tables (but it must always be in the ACPI
tables.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The PCI Express Firmware spec says that the region to be used for PCI
MMCONFIG should be reserved as part of the motherboard's resources in
the ACPI tables.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The PCI MMCONFIG area must be below 4GiB and must not be part of the
device space. Shrink the device area and put the PCI MMCONFIG region
above it.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Patch the table with the currently used constants. This will be relevant
when we want to adjust the size of the PCI device area to accomodate the
PCI MMCONFIG region.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
These are part of RAM and are used as the initial page table entries for
booting the OS and firmware (identity mapping.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Using the existing layout module start documenting the major regions of
RAM and those areas that are reserved. Some of the constants have also
been renamed to be more consistent and some functions that returned
constant variables have been replaced.
Future commits will move more constants into this file to make it the
canonical source of information about the memory layout.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The last byte was missing from the E820 RAM area. This was due to the
function using the last address relative to the first address in the
range to calculate the size. This incorrectly calculated the size by
one. This produced incorrect E820 tables like this:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ffffffe] usable
Signed-off-by: Rob Bradford <robert.bradford@intel.com>