If the VMM is not already paused then pause the VM prior to executing
the coredump and then resume it after. If the VM is already paused then
the original state is maintained.
Signed-off-by: Yi Wang <foxywang@tencent.com>
If the VM has been configured but not yet booted, all we need to do to
support removing a device is to remove it from the config, so it will
never be created.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
The refactoring on deferring address space allocation (#5169) broke TDX,
as TDX initialization needs to access guest memory for encryption and
measurement of guest pages.
Signed-off-by: Bo Chen <chen.bo@intel.com>
When I refactored this to centralise resetting the tty into
DeviceManager::drop, I tested that the tty was reset if an error
happened on the vmm thread, but not on the main thread. It turns out
that if an error happened on the main thread, the process would just
exit, so drop handlers on other threads wouldn't get run.
To fix this, I've changed start_vmm() to write to the VMM's exit
eventfd and then join the thread if an error happens after the vmm
thread is started.
Fixes: b6feae0a ("vmm: only touch the tty flags if it's being used")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Previously, we used two different functions for configuring ttys.
vmm_sys_util::terminal::Terminal::set_raw_mode() was used to configure
stdio ttys, and cfmakeraw() was used to configure ptys created by
cloud-hypervisor. When I centralized the stdio tty cleanup, I also
switched to using cfmakeraw() everywhere, to avoid duplication.
cfmakeraw sets the OPOST flag, but when we later reset the ttys, we
used vmm_sys_util::terminal::Terminal::set_canon_mode(), which does
not unset this flag. This meant that the terminal was getting mostly,
but not fully, reset.
To fix this without depending on the implementation of cfmakeraw(),
let's just store the original termios for stdio terminals, and restore
them to exactly the state we found them in when cloud-hypervisor exits.
Fixes: b6feae0a ("vmm: only touch the tty flags if it's being used")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
When neither serial nor console are connected to the tty,
cloud-hypervisor shouldn't touch the tty at all. One way in which
this is annoying is that if I am running cloud-hypervisor without it
using my terminal, I expect to be able to suspend it with ^Z like any
other process, but that doesn't work if it's put the terminal into raw
mode.
Instead of putting the tty into raw mode when a VM is created or
restored, do it when a serial or console device is created. Since we
now know it can't be put into raw mode until the Vm object is created,
we can move setting it back to canon mode into the drop handler for
that object, which should always be run in normal operation. We still
also put the tty into canon mode in the SIGTERM / SIGINT handler, but
check whether the tty was actually used, rather than whether stdin is
a tty. This requires passing on_tty around as an atomic boolean.
I explored more of an abstraction over the tty — having an object that
encapsulated stdout and put the tty into raw mode when initialized and
into canon mode when dropped — but it wasn't practical, mostly due to
the special requirements of the signal handler. I also investigated
whether the SIGWINCH listener process could be used here, which I
think would have worked but I'm hesitant to involve it in serial
handling as well as conosle handling.
There's no longer a check for whether the file descriptor is a tty
before setting it into canon mode — it's redundant, because if it's
not a tty it just won't respond to the ioctl.
Tested by shutting down through the API, SIGTERM, and an error
injected after setting raw mode.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
If the VM is shut down, either it's going to be started again, in
which case we still want to be in raw mode, or the process is about to
exit, in which case canon mode will be set at the end of main.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Previously, we were only using it for PTYs, because for PTYs there's
no alternative. But since we have to have it for PTYs anyway, if we
also use it for TTYs, we can eliminate all of the code that handled
SIGWINCH for TTYs.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Now cloud hypervisor will start signal thread to catch
SIGWINCH signal, cloud hypervisor then will resize the
guest console via vconsole.
This patch skip starting signal thread when there is no
need to resize guest console, such as console is not
configured.
Signed-off-by: Yong He <alexyonghe@tencent.com>
We can ideally defer the address space allocation till we start the
vCPUs for the very first time. Because the VM will not access the memory
until the CPUs start running. Thus there is no need to allocate the
address space eagerly and wait till the time we are going to start the
vCPUs for the first time.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Change "thead" to "thread".
Also make sure the two messages are distinguishable by adding "vmm" and
"vm" prefix.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
In order to comply with latest TDX version, we rely onto the branch
kvm-upstream-2022.08.07-v5.19-rc8 from https://github.com/intel/tdx
repository. Updates are based on changes that happened in
arch/x86/include/uapi/asm/kvm.h headers file.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This simplifies the Snapshot creation as we expect a SnapshotData to be
provided most of the time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no reason to carry a HashMap of SnapshotDataSection per
Snapshot. And given we now provide at most one SnapshotDataSection per
Snapshot, there's no need to keep the id part of the SnapshotDataSection
structure.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Without breaking the former way of declaring them. This is simply based
on the presence of the GUID TDX Metadata offset. If not present, we
consider the firmware is quite old and therefore we fallback onto the
previous way to expose memory resources.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The coredump functionality is only implemented for x86_64 so it should
only be compiled in there.
Fixes: #4964
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
TDX was broken by the recent refactoring moving the vCPU creation
earlier than before. The simple and correct way to fix this problem is
by moving the TDX initialization right before the vCPUs creation. The
rest of the TDX setup can remain where it is.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This removes the storage of the GuestMemoryMmap on the CpuManager
further allowing the decoupling of the CpuManager from the
MemoryManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When configuring the vCPUs it is only necessary to provide the guest
memory when booting fresh (for populating the guest memory). As such
refactor the vCPU configuration to remove the use of the
GuestMemoryMmap stored on the CpuManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Thanks to the new way of restoring Vm, we can now create the Vm object
directly with the appropriate VmState rather than having to patch it at
a later time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
No need to provide a boolean to know if the VM is being restored given
we already have this information from the Option<Snapshot>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now the entire codebase has been moved to the new restore design, we can
complete the work by creating a dedicated restore() function for the Vm
object and get rid of the method restore() from the Snapshottable trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The snapshot and restore of AArch64 Gic was done in Vm. Now it is moved
to DeviceManager.
The benefit is that the restore can be done while the Gic is created in
DeviceManager.
While the moving of state data from Vm snapshot to DeviceManager
snapshot breaks the compatability of migration from older versions.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Given the recent factorization that happened in vm.rs, we're now able to
merge Vm::new_from_snapshot with Vm::new.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This moves the devices creation out of the dedicated restore function
which will be eventually removed.
This factorizes the creation of all devices into a single location.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows the clock restoration to be moved out of the dedicated
restore function, which will eventually be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on all the work that has already been merged, it is now possible
to fully move DeviceManager out of the previous restore model, meaning
there's no need for a dedicated restore() function to be implemented
there.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Every Vcpu is now created with the right state if there's an available
snapshot associated with it. This simplifies the restore logic.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In the new process, `device::Gic::new()` covers additional actions:
1. Creating `hypervisor::vGic`
2. Initializing interrupt routings
The change makes the vGic device ready in the beginning of
`DeviceManager::create_devices()`. This can unblock the GIC related
devices initialization in the `DeviceManager`.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Moving the creation of the vCPUs before the DeviceManager gets created
will allow for the aarch64 vGIC to be created before the DeviceManager
as well in a follow up patch. The end goal being to adopt the same
creation sequence for both x86_64 and aarch64, and keeping in mind that
the vGIC requires every vCPU to be created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Split the vCPU creation into two distincts parts. On the one hand we
create the actual Vcpu object with the creation of the hypervisor::Vcpu.
And on the other hand, we configure the existing Vcpu, setting registers
to proper values (such as setting the entry point).
This will allow for further work to move the creation earlier in the
boot, so that the hypervisor::Vcpu will be already created when the
DeviceManager gets created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
The CpuManager is now created before the DeviceManager. This is required
as preliminary work for creating the vCPUs before the DeviceManager,
which is required to ensure both x86_64 and aarch64 follow the same
sequence.
It's important to note the optimization for faster PIO accesses on the
PCI config space had to be removed given the VmOps was required by the
CpuManager and by the Vcpu by extension. But given the PciConfigIo is
created as part of the DeviceManager, there was no proper way of moving
things around so that we could provide PciConfigIo early enough.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This optimisation provided some peformance improvement when measured by
perf however when considered in terms of boot time peformance this
optimisation doesn't have any impact measurable using our
peformance-metrics tooling.
Removing this optimisation helps simplify the VMM internals as it allows
the reordering of the VM creation process permitting refactoring of the
restore code path.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add an TPM2 entry to DSDT ACPI table. Add a TPM2 table to guest's ACPI.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-authored-by: Sean Yoo <t-seanyoo@microsoft.com>