Ideally, we can defer the address space allocation until we start the
vCPUs for the very first time, because the VM will not access the memory
until the CPUs start running. Thus there is no need to allocate the
address space eagerly; we can wait until we are about to start the
vCPUs for the first time.
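
The deferral can be illustrated with a minimal sketch. The `Vm` and
`AddressSpace` types and the `start_vcpus` method below are hypothetical
stand-ins and not the actual Cloud Hypervisor types; the sketch only
shows allocating on the first start and reusing the result afterwards.

```rust
/// Hypothetical stand-in for a guest address space.
struct AddressSpace {
    size: u64,
}

/// Hypothetical VM wrapper that allocates its address space lazily.
struct Vm {
    memory_size: u64,
    address_space: Option<AddressSpace>,
    /// Counts how many times an allocation actually happened.
    allocations: u32,
}

impl Vm {
    /// Creating the VM does not allocate the address space.
    fn new(memory_size: u64) -> Self {
        Vm {
            memory_size,
            address_space: None,
            allocations: 0,
        }
    }

    /// Allocates the address space on the first call only, then
    /// returns the existing one on every later call.
    fn start_vcpus(&mut self) -> &AddressSpace {
        if self.address_space.is_none() {
            self.allocations += 1;
            self.address_space = Some(AddressSpace {
                size: self.memory_size,
            });
        }
        self.address_space.as_ref().unwrap()
    }
}
```
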
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
This hypervisor leaf includes details of the TSC frequency if that is
available from KVM. This can be used to efficiently calculate time
passed when there is an invariant TSC.
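
As a rough illustration of the calculation a guest could do, the sketch
below converts a TSC tick delta into nanoseconds. It assumes the
frequency is reported in kHz; the function name and signature are
hypothetical and not part of this change.

```rust
/// Converts the number of TSC ticks between `start` and `end` into
/// nanoseconds, assuming an invariant TSC running at `tsc_khz` kHz.
/// Returns `None` if the frequency is unknown (zero) or the result
/// does not fit in a `u64`.
fn tsc_delta_to_ns(start: u64, end: u64, tsc_khz: u32) -> Option<u64> {
    if tsc_khz == 0 {
        return None;
    }
    // wrapping_sub tolerates a counter that wrapped around once.
    let ticks = end.wrapping_sub(start) as u128;
    // ticks / (kHz * 1000) seconds == ticks * 1_000_000 / kHz nanoseconds.
    let ns = ticks * 1_000_000 / tsc_khz as u128;
    if ns > u64::MAX as u128 {
        None
    } else {
        Some(ns as u64)
    }
}
```
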
TEST=Run `cpuid` in the guest and observe the frequency populated.
Fixes: #5178
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Updates include:
- Add references to 'TDX Tools'
- Expand instructions on building and using TDShim
- Add version information of guest/host kernel, TDVF, TDShim being tested
Signed-off-by: Bo Chen <chen.bo@intel.com>
This is required for booting Linux:
From: https://lore.kernel.org/all/20221028141220.29217-3-kirill.shutemov@linux.intel.com/
"""
Virtualization Exceptions (#VE) are delivered to TDX guests due to
specific guest actions such as using specific instructions or accessing
a specific MSR.
Notable reason for #VE is access to specific guest physical addresses.
It requires special security considerations as it is not fully in
control of the guest kernel. VMM can remove a page from EPT page table
and trigger #VE on access.
The primary use-case for #VE on a memory access is MMIO: VMM removes
page from EPT to trigger exception in the guest which allows guest to
emulate MMIO with hypercalls.
MMIO only happens on shared memory. All conventional kernel memory is
private. This includes everything from kernel stacks to kernel text.
Handling exceptions on arbitrary accesses to kernel memory is
essentially impossible as handling #VE may require access to memory
that also triggers the exception.
TDX module provides mechanism to disable #VE delivery on access to
private memory. If SEPT_VE_DISABLE TD attribute is set, private EPT
violation will not be reflected to the guest as #VE, but will trigger
exit to VMM.
Make sure the attribute is set by VMM. Panic otherwise.
There's small window during the boot before the check where kernel has
early #VE handler. But the handler is only for port I/O and panic as
soon as it sees any other #VE reason.
SEPT_VE_DISABLE makes SEPT violation unrecoverable and terminating the
TD is the only option.
Kernel has no legitimate use-cases for #VE on private memory. It is
either a guest kernel bug (like access of unaccepted memory) or
malicious/buggy VMM that removes guest page that is still in use.
In both cases terminating TD is the right thing to do.
"""
With this change Cloud Hypervisor can boot the current Linux guest
kernel.
Reported-By: Jiaqi Gao <jiaqi.gao@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Do the following:
1. Use from_be_bytes to drop mutable slices.
2. Check for the exact buffer size throughout.
3. Simplify ptm_to_request where possible.
4. Make error messages style consistent.
Fix a typo in code comment while at it.
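
Items 1 and 2 can be sketched as follows. The `ParseError` type and the
`parse_be_u32` helper are hypothetical names used for illustration only;
they are not the functions touched by this change.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    WrongSize { expected: usize, actual: usize },
}

/// Parses a big-endian u32 from a buffer that must be exactly 4 bytes.
/// No mutable slice is needed: the bytes are copied into a fixed-size
/// array and decoded with `from_be_bytes`.
fn parse_be_u32(buf: &[u8]) -> Result<u32, ParseError> {
    // The conversion fails unless the slice length is exactly 4.
    let bytes = <[u8; 4]>::try_from(buf).map_err(|_| ParseError::WrongSize {
        expected: 4,
        actual: buf.len(),
    })?;
    Ok(u32::from_be_bytes(bytes))
}
```
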
Signed-off-by: Wei Liu <liuwe@microsoft.com>
There is no guarantee that the write can send the whole buffer at once.
In those rare occasions, we should return a sensible error.
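
A minimal sketch of the check, assuming a generic `std::io::Write`
target. The helper name `write_checked` and the choice of
`ErrorKind::WriteZero` are illustrative assumptions, not the code in
this change.

```rust
use std::io::{self, Write};

/// Issues a single write and reports an error if the writer accepted
/// fewer bytes than requested.
fn write_checked<W: Write>(writer: &mut W, buf: &[u8]) -> io::Result<()> {
    let written = writer.write(buf)?;
    if written != buf.len() {
        return Err(io::Error::new(
            io::ErrorKind::WriteZero,
            format!("short write: sent {} of {} bytes", written, buf.len()),
        ));
    }
    Ok(())
}
```

An alternative would be `write_all`, which retries until the whole
buffer is sent; the sketch reports the short write instead, as
described above.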
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The largest possible PTM response is only 16 bytes. Size the output
buffer correctly.
In the socket read function, rely on the caller to provide a
sufficiently large buffer. That eliminates another large stack variable.
In total this saves almost 8KB stack space.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Change "thead" to "thread".
Also make sure the two messages are distinguishable by adding "vmm" and
"vm" prefixes.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
The number of aligned operations cannot be larger than the number of
descriptors. Initializing the capacity to 1 is good enough per the
observation that most of the time there is only one data descriptor in a
given request.
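
The sizing choice can be sketched as below. The `Descriptor` struct and
the `unaligned_addrs` helper are hypothetical; they only show a vector
that starts with capacity 1 and can never grow beyond the number of
descriptors.

```rust
/// Hypothetical data descriptor for the sketch.
struct Descriptor {
    addr: u64,
}

/// Collects the addresses that need an aligned bounce operation.
/// `alignment` is assumed to be non-zero.
fn unaligned_addrs(descs: &[Descriptor], alignment: u64) -> Vec<u64> {
    // Most requests carry a single data descriptor, so reserve one slot
    // and let the vector grow only in the rare multi-descriptor case.
    let mut ops = Vec::with_capacity(1);
    for desc in descs {
        if desc.addr % alignment != 0 {
            // At most one entry is pushed per descriptor.
            ops.push(desc.addr);
        }
    }
    ops
}
```
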
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Before Linux v6.0, AArch64 didn't support "socket" in "cpu-map"
(CPU topology) of FDT.
We found that clusters could be used in the same way as sockets, and
that is how we implemented the socket settings in Cloud Hypervisor. But
in fact it was a bug.
Linux commit 26a2b7 fixed the mistake, so the cluster nodes can no
longer act as sockets. A following commit, dea8c0, added support for
sockets.
This patch fixes the way sockets are configured. In each socket, a
default cluster is added to contain all the cores, because the cluster
layer is mandatory in the CPU topology on AArch64.
This fix will break the socket settings on guests where the kernel
version is lower than v6.0. In that case, if the socket number is set to
more than 1, the kernel will treat that as an FDT mistake and all the
CPUs will be put in a single cluster of a single socket.
The patch only impacts the case of using FDT, not ACPI.
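
The resulting node layout can be sketched as below. The function name
and parameters are hypothetical, and the sketch only builds the
`cpu-map` node paths (one default `cluster0` per socket); it does not
use the real FDT writer.

```rust
/// Builds the "cpu-map" node path of every core, placing all cores of
/// a socket inside a single default cluster ("cluster0").
fn cpu_map_core_paths(sockets: u32, cores_per_socket: u32) -> Vec<String> {
    let mut paths = Vec::new();
    for socket in 0..sockets {
        for core in 0..cores_per_socket {
            paths.push(format!(
                "/cpus/cpu-map/socket{}/cluster0/core{}",
                socket, core
            ));
        }
    }
    paths
}
```
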
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Fix lowercase label to avoid "mkfs.fat: Warning: lowercase labels
might not work properly on some systems".
Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
This patch adds a global execution timeout to the Jenkinsfile to avoid
Jenkins pipelines pending indefinitely, such as when certain worker
nodes are not available. The global execution timeout is now set to
4 hours, which is derived from the total timeout of our longest stage
(e.g. the `Worker build`).
Fixes: #5148
Signed-off-by: Bo Chen <chen.bo@intel.com>
A first-time user of cloud-hypervisor and the Rust environment
gets build errors without this.
Signed-off-by: Ravi kumar Veeramally <ravikumar.veeramally@intel.com>
Right now the integration test fails during the test run if
/dev/mshv or /dev/kvm does not exist. We should exit early
instead of proceeding when the device is not present.
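
The early check can be sketched as below. This is an illustration in
Rust rather than the actual test script; the helper name is
hypothetical, and it assumes that finding any one of the candidate
device nodes is enough to proceed.

```rust
use std::path::Path;

/// Returns the first candidate path that exists, or `None` if none of
/// them is present. A caller would exit early on `None`.
fn first_available<'a>(candidates: &[&'a str]) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .find(|path| Path::new(path).exists())
}
```

A caller could pass `&["/dev/mshv", "/dev/kvm"]` and abort with a clear
message before any test starts if the result is `None`.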
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Make the code more idiomatic by wrapping the actual size configured in
the returning Result type. This further allows simplifying
get_buffer_size.
The debug message in startup_tpm is more useful if it prints out the
actual size rather than the wanted size.
No functional change.
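
The pattern can be sketched as below. The `Backend` struct, its fields,
the error type, and the convention that a wanted size of 0 only queries
the current size are all assumptions made for illustration; they are
not the emulator code itself.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum BufferError {
    InvalidLimits,
}

/// Hypothetical backend with a configurable buffer size.
struct Backend {
    current: usize,
    min: usize,
    max: usize,
}

impl Backend {
    /// Requests `wanted` bytes and returns the size actually configured.
    /// A wanted size of 0 leaves the size unchanged (assumed convention).
    fn set_buffer_size(&mut self, wanted: usize) -> Result<usize, BufferError> {
        if self.min > self.max {
            return Err(BufferError::InvalidLimits);
        }
        if wanted != 0 {
            self.current = wanted.clamp(self.min, self.max);
        }
        Ok(self.current)
    }

    /// Because the setter already returns the actual size, the getter
    /// reduces to a query with a wanted size of 0.
    fn get_buffer_size(&mut self) -> Result<usize, BufferError> {
        self.set_buffer_size(0)
    }
}
```
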
Signed-off-by: Wei Liu <liuwe@microsoft.com>