Bound checks for virtio-mem and ACPI memory hotplug are off by
one and two, respectively. This prevents users to fully use the reserved
memory hotplug size.
For ACPI, if we specific `--memory size=2G,hotplug_size=4G` and run
`ch-remote resize --memory 6G`, cloud-hypervisor will report the
following error because of the incorrect bound check:
`<vmm> ERROR:vmm/src/lib.rs:1631 -- Error when resizing VM:
MemoryManager(InsufficientHotplugRam)`
Similarly, for virtio-mem, cloud-hypervisor will fail the incorrect
bound check and abort the resize. The VM will see the following error
in dmesg:
`virtio_mem virtio3: unknown error, marking device broken: -22`
This patch has fixed both bound checks and ensure that users can
hot add memory up to the reserved hotplug size.
Signed-off-by: Yuhong Zhong <yz@cs.columbia.edu>
DeviceManager::add_virtio_console_device used to create the console
resize pipe and assign it to self.console_resize_pipe, but when this
was changed to use console_info, that was deleted without replacement.
This meant that, even though the console resize pipe was created by
pre_create_console_devices, the DeviceManager never found out about
it, so console resize didn't work (at least for pty consoles).
To fix this, the console resize pipe needs to be passed to the Vm
initializer, which is already supported, it was just previously not
used for new VMs.
Since DeviceManager already stores the console resize pipe in an Arc,
and Vmm also needs a copy of it, the sensible thing to do is change
DeviceManager::new to take Arc, and then we don't need to dup the file
descriptor, which could fail.
Fixes: 52eebaf6 ("vmm: refactor DeviceManager to use console_info")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Use the StandardRegisters defined in the hypervisor crate instead of
re-defining it from MSHV/KVM crate.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
This fixes a issue of running vm compiled in debug with Rust
1.80.0 or later, where this check was introduced.
Signed-off-by: Wenyu Huang <huangwenyuu@outlook.com>
With this we are removing the CloudHypervisor definition of
StandardRegisters instead using an enum which contains different
variants of StandardRegisters coming from their bindigs crate.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Use the IOCTL numbers directly from mshv-ioctls instead of hardcoding
them in the seccomp filters.
Remove seccomp rules for unused ioctls:
MSHV_GET_VERSION_INFO,
MSHV_ASSERT_INTERRUPT.
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Passing AccessPlatform trait to virtio-device for requesting
restricting page access during IO for SEV-SNP guest.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Implement AccessPlatform for SEV-SNP guest to access
restricted page using IO. VMM calls MSHV api to get access
of the pages, MSHV requests guest to release the access.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Add a structure to hold the reference of the Vm trait
from Hypervisor crate to access of restricted page
from SEV-SNP guest.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Pvmemcontrol provides a way for the guest to control its physical memory
properties, and enables optimizations and security features. For
example, the guest can provide information to the host where parts of a
hugepage may be unbacked, or sensitive data may not be swapped out, etc.
Pvmemcontrol allows guests to manipulate its gPTE entries in the SLAT,
and also some other properties of the memory map the back's host memory.
This is achieved by using the KVM_CAP_SYNC_MMU capability. When this
capability is available, the changes in the backing of the memory region
on the host are automatically reflected into the guest. For example, an
mmap() or madvise() that affects the region will be made visible
immediately.
There are two components of the implementation: the guest Linux driver
and Virtual Machine Monitor (VMM) device. A guest-allocated shared
buffer is negotiated per-cpu through a few PCI MMIO registers, the VMM
device assigns a unique command for each per-cpu buffer. The guest
writes its pvmemcontrol request in the per-cpu buffer, then writes the
corresponding command into the command register, calling into the VMM
device to perform the pvmemcontrol request.
The synchronous per-cpu shared buffer approach avoids the kick and busy
waiting that the guest would have to do with virtio virtqueue transport.
The Cloud Hypervisor component can be enabled with --pvmemcontrol.
Co-developed-by: Stanko Novakovic <stanko@google.com>
Co-developed-by: Pasha Tatashin <tatashin@google.com>
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
BusDevice trait functions currently holds a mutable reference to self,
and exclusive access is guaranteed by taking a Mutex when dispatched by
the Bus object. However, this prevents individual devices from serving
accesses that do not require an mutable reference or is better served
with different synchronization primitives. We switch Bus to dispatch via
BusDeviceSync, which holds a shared reference, and delegate locking to
the BusDeviceSync trait implementation for Mutex<BusDevice>.
Other changes are made to make use of the dyn BusDeviceSync
trait object.
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
HV APIC(i.e., synthetic APIC controller exposed by Microsoft Hypervisor)
does not support one-shot operation using a TSC deadline value. Due to
which we see the following backtrace inside the guest when running with
hypervisor-fw/OVMF:
[ 0.560765] unchecked MSR access error: WRMSR to 0x832 (tried to
write 0x00000000000400ec) at rIP: 0xffffffff8f473594
(native_write_msr+0x4/0x30)
[ 0.560765] Call Trace:
[ 0.560765] ? native_apic_msr_write+0x2b/0x30
[ 0.560765] __setup_APIC_LVTT+0xbc/0xe0
[ 0.560765] lapic_timer_set_oneshot+0x27/0x30
[ 0.560765] clockevents_switch_state+0xaf/0xf0
[ 0.560765] tick_setup_periodic+0x47/0x90
[ 0.560765] tick_setup_device.isra.0+0x7c/0x110
[ 0.560765] tick_check_new_device+0xce/0xf0
[ 0.560765] clockevents_register_device+0x82/0x170
[ 0.560765] clockevents_config_and_register+0x2f/0x40
[ 0.560765] setup_APIC_timer+0xe1/0xf0
[ 0.560765] setup_boot_APIC_clock+0x5f/0x66
[ 0.560765] native_smp_prepare_cpus+0x1d6/0x286
[ 0.560765] kernel_init_freeable+0xcf/0x255
[ 0.560765] ? rest_init+0xb0/0xb0
[ 0.560765] kernel_init+0xe/0x110
[ 0.560765] ret_from_fork+0x22/0x40
Also, if this feature is exposed guest would not finish booting and get
stuck right before unpacking the root filesystem.
Fixes: 06e8d1c40 ("hypervisor: mshv: fix topology for Intel HW on MSHV")
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
This requires stashing the config values in `struct Vmm`. The configs
should be validated before before creating the VMM thread. Refactor the
code and update documentation where necessary.
The place where the rules are applied remain the same.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Add file/dir paths from landlock-rules arguments to ruleset. Invoke
apply_landlock on VmConfig to apply config specific rules to ruleset.
Once done, any threads spawned by vmm thread will be automatically
sandboxed with the ruleset in vmm thread.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Introduce ApplyLandlock trait and add implementations to VmConfig
elements with PathBufs. This trait adds config specific rules to
landlock ruleset.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Users can use this parameter to pass extra paths that 'vmm' and its
child threads can use at runtime. Hotplug is the primary usecase for
this parameter.
In order to hotplug devices that use local files: disks, memory zones,
pmem devices etc, users can use this option to pass the path/s that will
be used during hotplug while starting cloud-hypervisor. Doing this will
allow landlock to add required rules to grant access to these paths when
cloud-hypervisor process starts.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Users can use this cmdline option to enable/disable Landlock based
sandboxing while running cloud-hypervisor.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
landlock syscalls are required by event_monitor, signal_handler,
http-server and vmm threads. Rest of the threads are spawned by the vmm
thread and they automatically inherit the ruleset from the vmm thread.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
In 42e9632c53 a fix was made to address a
typo in the taplo configuration file. Fixing this typo indicated that
many Cargo.toml files were no longer adhering to the formatting rules.
Fix the formatting by running `taplo fmt`.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
While checking if the console device is a tty use the cloned fd instead
of libc::STDOUT_FILENO.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Console devices are created after vm_config is received and the created
devices are passed Vm during vm_receive_state.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>