Serial device doesnt support resize semantics. Setting up the
console resize pipe in the Serial device setup path, overwrites
the setup done as part of virtio-console.
Signed-off-by: BharatNarasimman <bharatn@microsoft.com>
This patch removes locks in VmCreate request and VmInfo response
since we needn't use a lock here and should ensure that internal
implementation is transparent to the runtime.
Signed-off-by: Songqian Li <sionli@tencent.com>
Modify `Cargo.toml` in each member crate to follow the dependencies
specified in root `Cargo.toml` file.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
For a VM with virt-console enabled, when a reboot is requested, the
console devices are closed during the shutdown path. As part of this
the sigwinch listener process and the console resizer pipe are closed.
For the new incarnation of the VM, fresh set of console devices are
setup and a new console resizer pipe is created. The new VM should
be setup to use the newly created console devices including the console
resizer pipe.
Reading from the older console resizer pipe results in unexpected eof
error and terminates the cloud hypervisor process.
Signed-off-by: BharatNarasimman <bharatn@microsoft.com>
Rebooting a VM fails with the following error when debug assertions
are enabled:
fatal runtime error: IO Safety violation: owned file descriptor already closed
This happens because FromRawFd::from_raw_fd is used on RawFds stored
in ConsoleInfo every time a VM begins to boot, so the second
time (after a reboot, or if the first attempt to boot via the API
failed), the fd will be closed. Until this assertion is hit, the code
is operating on either closed file descriptors, or new file
descriptors for something completely different. If debug assertions
are disabled, it will just continue doing this with unpredictable
results.
To fix this, and prevent the problem reocurring, ownership of the
console file descriptors needs to be properly tracked, using Rust's
type system, so this commit refactors the console code to do that.
The file descriptors are now passed around with reference counts, so
they won't be closed prematurely. The obvious way to do this would be
to just have each member of ConsoleInfo be an Arc<File>, but we need
to accomodate that serial console file descriptors can also be
sockets. We can't just store an OwnedFd and convert it when it's
used, because we only get a reference from the Arc, so we need to
store the descriptors as their concrete types in an enum. Since this
basically duplicates the ConsoleOutputMode enum from the config, the
ConsoleOutputMode enum is now not used past constructing the
ConsoleInfo.
So that ownership can be represented consistently, the debug console's
tty mode now uses its own stdout descriptor.
I'm still using .try_clone().unwrap() (i.e. dup()) to clone file
descriptors for Endpoint::FilePair and Endpoint::TtyPair, because I
assume there's a reason for them not just to hold a single file
descriptor.
I've also retained the existing behaviour of having serial manager
ignore the tty file descriptor passed to it (which is stdout), and
instead using stdin. It looks a lot weirder now, because it has to
explicitly indicate it's ignoring the fd with an underscore binding.
Fixes: 52eebaf6 ("vmm: refactor DeviceManager to use console_info")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
The assignment of console_resize_pipe in the TTY case seems to have
been accidentally deleted. I've put it back, but since this is adding
code, I used the new safe API for checking whether a file is a
terminal, introduced in Rust 1.70.0. We should probably use that
everywhere, but that's out of scope of this bug fix.
Fixes: 52eebaf6 ("vmm: refactor DeviceManager to use console_info")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
I've moved this so that it's just after the enum definition, which
will hopefully make it less easy to miss if events are added/removed
again in future.
Fixes: 6d1077fc ("vmm: Unix socket backend for serial port")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
And modify to code to use the updated interfaces.
Arguments for map_guest_memory, get_dirty_bitmap, vp.run(),
import_isolated_pages, modify_gpa_host_access have changed.
Update these to use the new interfaces, including new MSHV_*
definitions, and remove some redundant arguments.
Update seccomp IOCTLs to reflect interface changes.
Fix irq-related definitions naming.
Bump vfio-ioctls to support mshv v0.3.0.
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Bound checks for virtio-mem and ACPI memory hotplug are off by
one and two, respectively. This prevents users to fully use the reserved
memory hotplug size.
For ACPI, if we specific `--memory size=2G,hotplug_size=4G` and run
`ch-remote resize --memory 6G`, cloud-hypervisor will report the
following error because of the incorrect bound check:
`<vmm> ERROR:vmm/src/lib.rs:1631 -- Error when resizing VM:
MemoryManager(InsufficientHotplugRam)`
Similarly, for virtio-mem, cloud-hypervisor will fail the incorrect
bound check and abort the resize. The VM will see the following error
in dmesg:
`virtio_mem virtio3: unknown error, marking device broken: -22`
This patch has fixed both bound checks and ensure that users can
hot add memory up to the reserved hotplug size.
Signed-off-by: Yuhong Zhong <yz@cs.columbia.edu>
DeviceManager::add_virtio_console_device used to create the console
resize pipe and assign it to self.console_resize_pipe, but when this
was changed to use console_info, that was deleted without replacement.
This meant that, even though the console resize pipe was created by
pre_create_console_devices, the DeviceManager never found out about
it, so console resize didn't work (at least for pty consoles).
To fix this, the console resize pipe needs to be passed to the Vm
initializer, which is already supported, it was just previously not
used for new VMs.
Since DeviceManager already stores the console resize pipe in an Arc,
and Vmm also needs a copy of it, the sensible thing to do is change
DeviceManager::new to take Arc, and then we don't need to dup the file
descriptor, which could fail.
Fixes: 52eebaf6 ("vmm: refactor DeviceManager to use console_info")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Use the StandardRegisters defined in the hypervisor crate instead of
re-defining it from MSHV/KVM crate.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
This fixes a issue of running vm compiled in debug with Rust
1.80.0 or later, where this check was introduced.
Signed-off-by: Wenyu Huang <huangwenyuu@outlook.com>
With this we are removing the CloudHypervisor definition of
StandardRegisters instead using an enum which contains different
variants of StandardRegisters coming from their bindigs crate.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Use the IOCTL numbers directly from mshv-ioctls instead of hardcoding
them in the seccomp filters.
Remove seccomp rules for unused ioctls:
MSHV_GET_VERSION_INFO,
MSHV_ASSERT_INTERRUPT.
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Passing AccessPlatform trait to virtio-device for requesting
restricting page access during IO for SEV-SNP guest.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Implement AccessPlatform for SEV-SNP guest to access
restricted page using IO. VMM calls MSHV api to get access
of the pages, MSHV requests guest to release the access.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Add a structure to hold the reference of the Vm trait
from Hypervisor crate to access of restricted page
from SEV-SNP guest.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Pvmemcontrol provides a way for the guest to control its physical memory
properties, and enables optimizations and security features. For
example, the guest can provide information to the host where parts of a
hugepage may be unbacked, or sensitive data may not be swapped out, etc.
Pvmemcontrol allows guests to manipulate its gPTE entries in the SLAT,
and also some other properties of the memory map the back's host memory.
This is achieved by using the KVM_CAP_SYNC_MMU capability. When this
capability is available, the changes in the backing of the memory region
on the host are automatically reflected into the guest. For example, an
mmap() or madvise() that affects the region will be made visible
immediately.
There are two components of the implementation: the guest Linux driver
and Virtual Machine Monitor (VMM) device. A guest-allocated shared
buffer is negotiated per-cpu through a few PCI MMIO registers, the VMM
device assigns a unique command for each per-cpu buffer. The guest
writes its pvmemcontrol request in the per-cpu buffer, then writes the
corresponding command into the command register, calling into the VMM
device to perform the pvmemcontrol request.
The synchronous per-cpu shared buffer approach avoids the kick and busy
waiting that the guest would have to do with virtio virtqueue transport.
The Cloud Hypervisor component can be enabled with --pvmemcontrol.
Co-developed-by: Stanko Novakovic <stanko@google.com>
Co-developed-by: Pasha Tatashin <tatashin@google.com>
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
BusDevice trait functions currently holds a mutable reference to self,
and exclusive access is guaranteed by taking a Mutex when dispatched by
the Bus object. However, this prevents individual devices from serving
accesses that do not require an mutable reference or is better served
with different synchronization primitives. We switch Bus to dispatch via
BusDeviceSync, which holds a shared reference, and delegate locking to
the BusDeviceSync trait implementation for Mutex<BusDevice>.
Other changes are made to make use of the dyn BusDeviceSync
trait object.
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
HV APIC(i.e., synthetic APIC controller exposed by Microsoft Hypervisor)
does not support one-shot operation using a TSC deadline value. Due to
which we see the following backtrace inside the guest when running with
hypervisor-fw/OVMF:
[ 0.560765] unchecked MSR access error: WRMSR to 0x832 (tried to
write 0x00000000000400ec) at rIP: 0xffffffff8f473594
(native_write_msr+0x4/0x30)
[ 0.560765] Call Trace:
[ 0.560765] ? native_apic_msr_write+0x2b/0x30
[ 0.560765] __setup_APIC_LVTT+0xbc/0xe0
[ 0.560765] lapic_timer_set_oneshot+0x27/0x30
[ 0.560765] clockevents_switch_state+0xaf/0xf0
[ 0.560765] tick_setup_periodic+0x47/0x90
[ 0.560765] tick_setup_device.isra.0+0x7c/0x110
[ 0.560765] tick_check_new_device+0xce/0xf0
[ 0.560765] clockevents_register_device+0x82/0x170
[ 0.560765] clockevents_config_and_register+0x2f/0x40
[ 0.560765] setup_APIC_timer+0xe1/0xf0
[ 0.560765] setup_boot_APIC_clock+0x5f/0x66
[ 0.560765] native_smp_prepare_cpus+0x1d6/0x286
[ 0.560765] kernel_init_freeable+0xcf/0x255
[ 0.560765] ? rest_init+0xb0/0xb0
[ 0.560765] kernel_init+0xe/0x110
[ 0.560765] ret_from_fork+0x22/0x40
Also, if this feature is exposed guest would not finish booting and get
stuck right before unpacking the root filesystem.
Fixes: 06e8d1c40 ("hypervisor: mshv: fix topology for Intel HW on MSHV")
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
This requires stashing the config values in `struct Vmm`. The configs
should be validated before before creating the VMM thread. Refactor the
code and update documentation where necessary.
The place where the rules are applied remain the same.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Add file/dir paths from landlock-rules arguments to ruleset. Invoke
apply_landlock on VmConfig to apply config specific rules to ruleset.
Once done, any threads spawned by vmm thread will be automatically
sandboxed with the ruleset in vmm thread.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Introduce ApplyLandlock trait and add implementations to VmConfig
elements with PathBufs. This trait adds config specific rules to
landlock ruleset.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>