This commit applies the previously created seccomp filter
to the `DBusApi` thread.
Also encloses the main loop of the `DBusApi` thread using
`std::panic::catch_unwind` and `AssertUnwindSafe` in order to mirror
the behavior of the HTTP API.
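
A minimal sketch of the pattern, using only the standard library. The
`spawn_guarded` helper and the `exit_requested` flag are hypothetical
names for illustration; how the real thread reports the failure is not
shown here.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Runs `main_loop` on a new thread. A panic inside the loop is caught
/// instead of silently killing the thread, and `exit_requested` is set so
/// that the rest of the process can react. The thread returns `true` if
/// the loop finished without panicking.
pub fn spawn_guarded<F>(exit_requested: Arc<AtomicBool>, main_loop: F) -> thread::JoinHandle<bool>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        // AssertUnwindSafe tells the compiler we accept that state captured
        // by the closure may be observed in a broken state after a panic.
        let outcome = catch_unwind(AssertUnwindSafe(main_loop));
        if outcome.is_err() {
            exit_requested.store(true, Ordering::SeqCst);
        }
        outcome.is_ok()
    })
}
```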
Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
This commit adds support for graceful shutdown of the DBusApi thread
using `futures::channel::oneshot` channels. By using oneshot channels,
we ensure that the thread has enough time to send a response to the
`VmmShutdown` method call before it is terminated. Without this step,
the thread may be terminated before it can send a response, resulting
in an error message on the client side stating that the message
recipient disconnected from the message bus without providing a reply.
Also changes the default values for DBus service name, object path
and interface name.
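
The ordering this relies on can be sketched as follows. To stay
self-contained, the sketch swaps in `std::sync::mpsc` for
`futures::channel::oneshot`, and the function and request names are
illustrative only: the reply is sent first, and only then is the
shutdown signal raised.

```rust
use std::sync::mpsc;
use std::thread;

/// Serves `requests` on a worker thread and returns the replies that were
/// sent. The worker answers `VmmShutdown` *before* signalling that it is
/// done, so the owner never tears it down with a reply still pending.
pub fn serve_until_shutdown(requests: Vec<&'static str>) -> Vec<String> {
    let (reply_tx, reply_rx) = mpsc::channel::<String>();
    // Stand-in for the oneshot channel: used for exactly one message.
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let worker = thread::spawn(move || {
        for request in requests {
            // 1. Send the reply to the caller.
            reply_tx.send(format!("{}: ok", request)).unwrap();
            if request == "VmmShutdown" {
                // 2. Only now tell the owner that it may stop the thread.
                let _ = done_tx.send(());
                break;
            }
        }
    });

    // Wait for the shutdown signal (or for the worker to run out of work).
    let _ = done_rx.recv();
    worker.join().unwrap();
    reply_rx.iter().collect()
}
```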
Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
This commit introduces three new dependencies: `zbus`, `futures`
and `blocking`. `blocking` is used to call the Internal API in zbus'
async context which is driven by `futures::executor`. They are all
behind the `dbus_api` feature flag.
The D-Bus API implementation is behind the same `dbus_api` feature
flag as well.
Signed-off-by: Omer Faruk Bayram <omer.faruk@sartura.hr>
The refactoring on deferring address space allocation (#5169) broke TDX,
as TDX initialization needs to access guest memory for encryption and
measurement of guest pages.
Signed-off-by: Bo Chen <chen.bo@intel.com>
The current implementation of breadth-first traversal for the device
tree uses a temporary vector, which causes unnecessary memory copies.
Remove it and do the traversal within the nodes vector itself.
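
The idea can be sketched as below, assuming a simplified `Node` type
that is not the real device tree node: the result vector doubles as the
work queue, so no second vector is needed.

```rust
pub struct Node {
    pub name: String,
    pub children: Vec<Node>,
}

/// Returns the nodes of the tree in breadth-first order.
/// `nodes` is both the output and the queue: the index `i` marks the next
/// node whose children still have to be appended.
pub fn breadth_first(root: &Node) -> Vec<&Node> {
    let mut nodes = vec![root];
    let mut i = 0;
    while i < nodes.len() {
        // Copy the reference out so `nodes` can be extended below.
        let current = nodes[i];
        nodes.extend(current.children.iter());
        i += 1;
    }
    nodes
}
```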
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Unlike KVM, there's no internal handling for topology under MSHV. Thus,
if no topology has been passed during the CH launch, use the boot CPUs
count to construct the topology struct.
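
A sketch of such a fallback. The `CpuTopology` struct, its field names
and the choice of a single package with one thread per core are
assumptions made for illustration, not the layout used in the tree.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CpuTopology {
    pub threads_per_core: u8,
    pub cores_per_die: u8,
    pub dies_per_package: u8,
    pub packages: u8,
}

/// Uses the configured topology when there is one; otherwise builds a flat
/// topology in which every boot vCPU is a core of a single die and package.
pub fn effective_topology(configured: Option<CpuTopology>, boot_vcpus: u8) -> CpuTopology {
    configured.unwrap_or(CpuTopology {
        threads_per_core: 1,
        cores_per_die: boot_vcpus,
        dies_per_package: 1,
        packages: 1,
    })
}
```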
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Originally the AML only accepted one hex number for PCI segment
numbering. Change it to accept two numbers. That makes it possible to
add up to 256 PCI segments.
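
To illustrate why a second hex digit raises the limit: ACPI object
names are four characters long, so a prefix followed by one hex digit
can name 16 segments, while a shorter prefix followed by two digits can
name 256. The `PC` prefix and the helper name below are assumptions for
this sketch.

```rust
/// Builds a four-character ACPI name for a PCI segment, or `None` if the
/// segment number does not fit into two hex digits.
pub fn pci_segment_name(segment_id: u16) -> Option<String> {
    if segment_id > 0xFF {
        return None;
    }
    // Two zero-padded upper-case hex digits: 00..=FF, i.e. 256 names.
    Some(format!("PC{:02X}", segment_id))
}
```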
Signed-off-by: Wei Liu <liuwe@microsoft.com>
When I refactored this to centralise resetting the tty into
DeviceManager::drop, I tested that the tty was reset if an error
happened on the vmm thread, but not on the main thread. It turns out
that if an error happened on the main thread, the process would just
exit, so drop handlers on other threads wouldn't get run.
To fix this, I've changed start_vmm() to write to the VMM's exit
eventfd and then join the thread if an error happens after the vmm
thread is started.
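
The shape of the fix can be sketched like this. An `mpsc` channel
stands in for the exit eventfd, and `TtyGuard`, `late_setup` and the
signature of `start_vmm` are hypothetical simplifications. The point is
that the guard's drop handler only runs if the thread is asked to exit
and then joined.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Stand-in for the object whose drop handler resets the tty.
struct TtyGuard {
    restored: Arc<AtomicBool>,
}

impl Drop for TtyGuard {
    fn drop(&mut self) {
        self.restored.store(true, Ordering::SeqCst);
    }
}

/// Starts the worker thread, then runs `late_setup` on the calling thread.
/// If `late_setup` fails, the worker is told to exit and is joined before
/// the error is returned, so its drop handlers have run by then.
pub fn start_vmm<F>(
    restored: Arc<AtomicBool>,
    late_setup: F,
) -> Result<(mpsc::Sender<()>, thread::JoinHandle<()>), String>
where
    F: FnOnce() -> Result<(), String>,
{
    let (exit_tx, exit_rx) = mpsc::channel::<()>();
    let vmm_thread = thread::spawn(move || {
        let _guard = TtyGuard { restored };
        // Block until asked to exit (or until the sender goes away).
        let _ = exit_rx.recv();
    });

    if let Err(e) = late_setup() {
        let _ = exit_tx.send(()); // "write to the exit eventfd"
        let _ = vmm_thread.join(); // wait for the drop handlers to run
        return Err(e);
    }
    Ok((exit_tx, vmm_thread))
}
```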
Fixes: b6feae0a ("vmm: only touch the tty flags if it's being used")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Previously, we used two different functions for configuring ttys.
vmm_sys_util::terminal::Terminal::set_raw_mode() was used to configure
stdio ttys, and cfmakeraw() was used to configure ptys created by
cloud-hypervisor. When I centralized the stdio tty cleanup, I also
switched to using cfmakeraw() everywhere, to avoid duplication.
cfmakeraw() clears the OPOST flag, but when we later reset the ttys, we
used vmm_sys_util::terminal::Terminal::set_canon_mode(), which does
not set this flag again. This meant that the terminal was getting mostly,
but not fully, reset.
To fix this without depending on the implementation of cfmakeraw(),
let's just store the original termios for stdio terminals, and restore
them to exactly the state we found them in when cloud-hypervisor exits.
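
The difference between the two approaches can be shown with a small
model. It does not touch a real terminal: `Flags` is a hypothetical
stand-in for the relevant termios bits, `make_raw` models raw mode
turning OPOST off as described in cfmakeraw(3), and `set_canon` models
a reset that only turns the line-discipline flags back on.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Flags {
    pub opost: bool,
    pub icanon: bool,
    pub echo: bool,
    pub isig: bool,
}

/// Model of raw mode: output processing and line discipline are disabled.
pub fn make_raw() -> Flags {
    Flags { opost: false, icanon: false, echo: false, isig: false }
}

/// Model of a partial reset: OPOST is left as it was found.
pub fn set_canon(mut flags: Flags) -> Flags {
    flags.icanon = true;
    flags.echo = true;
    flags.isig = true;
    flags
}

/// Remembers the state found at start-up so it can be put back exactly.
pub struct SavedTerminal {
    original: Flags,
}

impl SavedTerminal {
    pub fn enter_raw(current: &mut Flags) -> Self {
        let original = *current;
        *current = make_raw();
        SavedTerminal { original }
    }

    pub fn restore(&self, current: &mut Flags) {
        *current = self.original;
    }
}
```

In this model, `set_canon` after `enter_raw` leaves `opost` off, while
`restore` returns every flag to its original value without needing to
know which ones raw mode changed.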
Fixes: b6feae0a ("vmm: only touch the tty flags if it's being used")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
In particular the Sdt::write() API requires that the value implements
AsBytes and copies the slice representation into the table data. This
avoids unaligned writes which can cause a panic with the updated
toolchain.
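
A simplified sketch of the copy-based approach. `Table` is a
hypothetical stand-in for the SDT buffer, and `to_le_bytes()` stands in
for the byte view that AsBytes provides. Copying bytes works at any
offset, whereas casting `&mut data[offset]` to a `*mut u32` and writing
through it requires the address to be suitably aligned.

```rust
pub struct Table {
    data: Vec<u8>,
}

impl Table {
    pub fn new(len: usize) -> Self {
        Table { data: vec![0; len] }
    }

    /// Copies `bytes` into the table at `offset`. Panics if the range is
    /// out of bounds, but never performs an unaligned typed write.
    pub fn write_bytes(&mut self, offset: usize, bytes: &[u8]) {
        self.data[offset..offset + bytes.len()].copy_from_slice(bytes);
    }

    /// Writes a little-endian u32 at an arbitrary (possibly odd) offset.
    pub fn write_u32(&mut self, offset: usize, value: u32) {
        self.write_bytes(offset, &value.to_le_bytes());
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.data
    }
}
```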
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
For structures that are used in SDT ACPI tables it is necessary for them
to implement this trait for the newly safe Sdt::write() API.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
This is used on older kernels where close_range() is not available.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Fixes: 505f4dfa ("vmm: close all unused fds in sigwinch listener")
On KVM this is provided by an ioctl, on MSHV this is constant. Although
there is a HV_MAXIMUM_PROCESSORS constant the MSHV ioctl API is limited
to u8.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
The custom 'clone' duplicates 'preserved_fds' so that the validation
logic can be safely carried out on the clone of the VmConfig.
The custom 'drop' ensures 'preserved_fds' are safely closed when the
holding VmConfig instance is destroyed.
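
A minimal sketch of the two impls on a cut-down `Config` type that
holds only the `preserved_fds` member. It uses the standard library's
fd wrappers to duplicate and close descriptors; the real code may call
into libc directly.

```rust
use std::os::unix::io::{BorrowedFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};

pub struct Config {
    pub preserved_fds: Option<Vec<RawFd>>,
}

impl Clone for Config {
    fn clone(&self) -> Self {
        let preserved_fds = self.preserved_fds.as_ref().map(|fds| {
            fds.iter()
                .map(|fd| {
                    // SAFETY: `self` owns `fd`, so it is open for the
                    // duration of this borrow.
                    let borrowed = unsafe { BorrowedFd::borrow_raw(*fd) };
                    // Duplicate, so the clone owns its own descriptor.
                    borrowed
                        .try_clone_to_owned()
                        .expect("duplicating fd failed")
                        .into_raw_fd()
                })
                .collect()
        });
        Config { preserved_fds }
    }
}

impl Drop for Config {
    fn drop(&mut self) {
        if let Some(fds) = self.preserved_fds.take() {
            for fd in fds {
                // SAFETY: this instance is the sole owner of `fd`;
                // dropping the OwnedFd closes it exactly once.
                drop(unsafe { OwnedFd::from_raw_fd(fd) });
            }
        }
    }
}
```

Because every clone holds its own duplicates, dropping the clone used
for validation does not close the descriptors of the original.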
Signed-off-by: Bo Chen <chen.bo@intel.com>
Preserved FDs are the ones that share the same lifetime as their holding
VmConfig instance, such as FDs for creating TAP devices.
Preserved FDs will stay open as long as the holding VmConfig instance is
valid, and will be closed when the holding VmConfig instance is destroyed.
Signed-off-by: Bo Chen <chen.bo@intel.com>
When neither serial nor console are connected to the tty,
cloud-hypervisor shouldn't touch the tty at all. One way in which
this is annoying is that if I am running cloud-hypervisor without it
using my terminal, I expect to be able to suspend it with ^Z like any
other process, but that doesn't work if it's put the terminal into raw
mode.
Instead of putting the tty into raw mode when a VM is created or
restored, do it when a serial or console device is created. Since we
now know it can't be put into raw mode until the Vm object is created,
we can move setting it back to canon mode into the drop handler for
that object, which should always be run in normal operation. We still
also put the tty into canon mode in the SIGTERM / SIGINT handler, but
check whether the tty was actually used, rather than whether stdin is
a tty. This requires passing on_tty around as an atomic boolean.
I explored more of an abstraction over the tty — having an object that
encapsulated stdout and put the tty into raw mode when initialized and
into canon mode when dropped — but it wasn't practical, mostly due to
the special requirements of the signal handler. I also investigated
whether the SIGWINCH listener process could be used here, which I
think would have worked but I'm hesitant to involve it in serial
handling as well as console handling.
There's no longer a check for whether the file descriptor is a tty
before setting it into canon mode — it's redundant, because if it's
not a tty it just won't respond to the ioctl.
Tested by shutting down through the API, SIGTERM, and an error
injected after setting raw mode.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
If the VM is shut down, either it's going to be started again, in
which case we still want to be in raw mode, or the process is about to
exit, in which case canon mode will be set at the end of main.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Having PMU in guests isn't critical, and not all hardware supports
it (e.g. Apple Silicon).
CpuManager::init_pmu already has a fallback for if PMU is not
supported by the VCPU, but we weren't getting that far, because we
would always try to initialise the VCPU with KVM_ARM_VCPU_PMU_V3, and
then bail when it returned with EINVAL.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Previously, we were only using it for PTYs, because for PTYs there's
no alternative. But since we have to have it for PTYs anyway, if we
also use it for TTYs, we can eliminate all of the code that handled
SIGWINCH for TTYs.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Now that the SIGWINCH listener has fallbacks for older kernels, we
don't expect it to routinely fail, so if there's an error setting it
up, we want to know about it.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
This will allow the SIGWINCH listener to run on kernels older than
5.5, although on those kernels it will have to make 64 syscalls to
reset all the signal handlers.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
The PTY main file descriptor had to be introduced as a parameter to
start_sigwinch_listener, so that it could be closed in the child.
Really the SIGWINCH listener process should not have any file
descriptors open, except for the ones it needs to function, so let's
make it more robust by having it close all other file descriptors.
For recent kernels, we can do this very conveniently with
close_range(2), but for older kernels, we have to fall back to closing
open file descriptors one at a time.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Significant API changes have occurred; the most significant is the
switch to an approach which does not require vm-memory and can run
no_std.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Currently, cloud-hypervisor always starts a signal thread to catch
SIGWINCH, and then resizes the guest console via vconsole.
This patch skips starting the signal thread when there is no
need to resize the guest console, such as when no console is
configured.
Signed-off-by: Yong He <alexyonghe@tencent.com>
PR #2333 added an I/O rate limiter to the block device, with some
options in `DiskConfig`. PR #2401 added a rate limiter to the
virtio-net device with the same options, but it still reports
`Error::ParseDisk`.
This commit fixes it with the correct values.
Fixes: #2401
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
Once an error occurs, the vcpu thread may exit. This should be a
critical event for the whole VM, so we should fire the exit event
and set the vcpu state.
If we don't set the vcpu state, the shutdown process will hang at
signal_thread, which is waiting for the vcpu state to change.
Signed-off-by: Yong He <alexyonghe@tencent.com>
We need to provide valid FDs while creating 'NetConfig' instances even
for unit tests. Closing invalid FDs would cause random unit test
failures.
Also, two identical 'NetConfig' instances are no longer allowed,
because that would lead to closing the same FD twice. This is consistent
with the fact that a clone of a 'NetConfig' instance is no
longer *equal* to the instance itself.
Fixes: #5203
Signed-off-by: Bo Chen <chen.bo@intel.com>
These are owned by the config (and are duplicated before being used to
create the `Tap` for the virtio-net device).
By implementing Drop on NetConfig we have issues with moving out of
members that don't implement the Copy trait. This requires a small
adjustment to the unit tests that use the Default::default() function.
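
The restriction can be reproduced with a cut-down struct (the fields
below are illustrative, not the full NetConfig). Struct update syntax
moves the remaining fields out of the temporary default value, which
the compiler rejects for a type that implements Drop; building the
default first and then assigning fields avoids the move.

```rust
#[derive(Default)]
pub struct NetConfig {
    pub tap: Option<String>,
    pub num_queues: usize,
    pub fds: Option<Vec<i32>>,
}

impl Drop for NetConfig {
    fn drop(&mut self) {
        // The real implementation would close `fds` here.
    }
}

pub fn net_fixture() -> NetConfig {
    // Not allowed (error E0509, cannot move out of a type that implements
    // `Drop`), because `tap` and `fds` are not `Copy`:
    //
    //     NetConfig { num_queues: 4, ..Default::default() }
    //
    // Mutating a complete default value works instead:
    let mut config = NetConfig::default();
    config.num_queues = 4;
    config
}
```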
Fixes: #5197
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The custom version duplicates any FDs that have been provided so that
the validation logic used on hotplug, which takes a clone of the config,
can be safely carried out.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This code is identical to what is in this repository. When a release
gets made we can then switch to that.
Fixes: #5122
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If swtpm becomes unresponsive, the guest gets blocked at "recvmsg" on
the TPM's data FD. This change adds a timeout to the data FD socket. If
swtpm becomes unresponsive, the guest waits for "timeout" (secs) and
continues to run after returning an I/O error to TPM commands.
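
The effect of a receive timeout can be sketched with the standard
library, assuming a Unix stream socket; the helper name is
hypothetical and the real code may set the socket option differently.
With the timeout in place, a read on a silent peer returns an I/O error
instead of blocking forever.

```rust
use std::io::{self, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Reads from `stream`, giving up with an error after `timeout` if the
/// peer sends nothing.
pub fn read_with_timeout(
    stream: &mut UnixStream,
    timeout: Duration,
    buf: &mut [u8],
) -> io::Result<usize> {
    // Sets the receive timeout on the underlying socket.
    stream.set_read_timeout(Some(timeout))?;
    stream.read(buf)
}
```

The caller can then map the timeout error to a failed TPM command and
let the guest continue.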
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
We can defer the address space allocation until we start the vCPUs for
the very first time, because the VM will not access the memory until the
vCPUs start running. Thus there is no need to allocate the address space
eagerly; we can wait until we are about to start the vCPUs for the first
time.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
This hypervisor leaf includes details of the TSC frequency if that is
available from KVM. This can be used to efficiently calculate time
passed when there is an invariant TSC.
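
As a worked example of the calculation this enables: with an invariant
TSC, elapsed time is the tick delta divided by the frequency. The
sketch assumes the frequency is reported in kHz, and the function name
is hypothetical.

```rust
/// Converts a TSC delta into nanoseconds, given the TSC frequency in kHz.
/// Returns `None` if the frequency is unknown (zero) or the result does
/// not fit into a u64.
pub fn elapsed_nanos(tsc_start: u64, tsc_end: u64, tsc_khz: u32) -> Option<u64> {
    if tsc_khz == 0 {
        return None;
    }
    let ticks = tsc_end.wrapping_sub(tsc_start) as u128;
    // ticks / (khz * 1000) is seconds; multiply by 1e9 for nanoseconds,
    // which simplifies to ticks * 1e6 / khz. u128 avoids overflow.
    u64::try_from(ticks * 1_000_000 / tsc_khz as u128).ok()
}
```

For a 3 GHz TSC (3,000,000 kHz), a delta of 3,000,000,000 ticks
corresponds to one second.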
TEST=Run `cpuid` in the guest and observe the frequency populated.
Fixes: #5178
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Change "thead" to "thread".
Also make sure the two messages are distinguishable by adding "vmm" and
"vm" prefix.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
In order to comply with the latest TDX version, we rely on the branch
kvm-upstream-2022.08.07-v5.19-rc8 from the https://github.com/intel/tdx
repository. Updates are based on changes that happened in the
arch/x86/include/uapi/asm/kvm.h header file.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A few breaking changes:
1. `-vvv` needs to be written as `-v -v -v`.
2. `--disk D1 D2` and others need to be written as `--disk D1 --disk D2`.
3. `--option=value` needs to be written as `--option value`
Change integration tests to adapt to the breaking changes.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Add new configuration for offloading features, including
Checksum/TSO/UFO, and set these offloading features as
enabled by default.
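
A sketch of what such a configuration could look like. The struct, the
option names and the `on`/`off` value syntax are assumptions for
illustration; only the enabled-by-default behaviour is taken from the
description above.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct OffloadConfig {
    pub offload_csum: bool,
    pub offload_tso: bool,
    pub offload_ufo: bool,
}

impl Default for OffloadConfig {
    /// All offloading features are enabled unless explicitly turned off.
    fn default() -> Self {
        OffloadConfig { offload_csum: true, offload_tso: true, offload_ufo: true }
    }
}

impl OffloadConfig {
    /// Applies a single `key=value` style option (hypothetical syntax).
    pub fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        let enabled = match value {
            "on" => true,
            "off" => false,
            other => return Err(format!("invalid value: {}", other)),
        };
        match key {
            "offload_csum" => self.offload_csum = enabled,
            "offload_tso" => self.offload_tso = enabled,
            "offload_ufo" => self.offload_ufo = enabled,
            other => return Err(format!("unknown option: {}", other)),
        }
        Ok(())
    }
}
```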
Fixes: #4792
Signed-off-by: Yong He <alexyonghe@tencent.com>
MSHV does not need to ensure that MMIO/PIO exits complete before
pausing. This patch applies that requirement only where it is needed by
checking the hypervisor type at run time.
Fixes: #5037
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This functionality has been obsoleted by our native support for
hugepages and shared memory.
See: #5082
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
To align the logging messages with the rest of the code, this
message should be aligned with another similar occurrence in
epoll_helper.rs
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
The double underscore made it different from how other projects would
name this particular macro.
No functional change.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Remove from the documentation and API definition but continue to
support using the field (with a deprecation warning).
See: #4837
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This simplifies the Snapshot creation as we expect a SnapshotData to be
provided most of the time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The information about the identifier related to a Snapshot is only
relevant from the BTreeMap perspective, which is why we can get rid of
the duplicated identifier in every Snapshot structure.
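
The resulting shape can be sketched as below, with a byte vector
standing in for the snapshot data and hypothetical method names: the
identifier exists only as the BTreeMap key, and the snapshot itself
carries no copy of it.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default, PartialEq)]
pub struct Snapshot {
    /// Serialized state of this component, if it has any.
    pub data: Option<Vec<u8>>,
    /// Child snapshots, keyed by their identifier.
    pub snapshots: BTreeMap<String, Snapshot>,
}

impl Snapshot {
    /// Creates a leaf snapshot directly from its data.
    pub fn from_data(data: Vec<u8>) -> Self {
        Snapshot { data: Some(data), snapshots: BTreeMap::new() }
    }

    /// Registers `snapshot` under `id`; the id is not stored anywhere else.
    pub fn add_snapshot(&mut self, id: String, snapshot: Snapshot) {
        self.snapshots.insert(id, snapshot);
    }
}
```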
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>