With this we are removing the CloudHypervisor definition of
StandardRegisters instead using an enum which contains different
variants of StandardRegisters coming from their bindigs crate.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Use the IOCTL numbers directly from mshv-ioctls instead of hardcoding
them in the seccomp filters.
Remove seccomp rules for unused ioctls:
MSHV_GET_VERSION_INFO,
MSHV_ASSERT_INTERRUPT.
Signed-off-by: Nuno Das Neves <nudasnev@microsoft.com>
Passing AccessPlatform trait to virtio-device for requesting
restricting page access during IO for SEV-SNP guest.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Implement AccessPlatform for SEV-SNP guest to access
restricted page using IO. VMM calls MSHV api to get access
of the pages, MSHV requests guest to release the access.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Add a structure to hold the reference of the Vm trait
from Hypervisor crate to access of restricted page
from SEV-SNP guest.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Pvmemcontrol provides a way for the guest to control its physical memory
properties, and enables optimizations and security features. For
example, the guest can provide information to the host where parts of a
hugepage may be unbacked, or sensitive data may not be swapped out, etc.
Pvmemcontrol allows guests to manipulate its gPTE entries in the SLAT,
and also some other properties of the memory map the back's host memory.
This is achieved by using the KVM_CAP_SYNC_MMU capability. When this
capability is available, the changes in the backing of the memory region
on the host are automatically reflected into the guest. For example, an
mmap() or madvise() that affects the region will be made visible
immediately.
There are two components of the implementation: the guest Linux driver
and Virtual Machine Monitor (VMM) device. A guest-allocated shared
buffer is negotiated per-cpu through a few PCI MMIO registers, the VMM
device assigns a unique command for each per-cpu buffer. The guest
writes its pvmemcontrol request in the per-cpu buffer, then writes the
corresponding command into the command register, calling into the VMM
device to perform the pvmemcontrol request.
The synchronous per-cpu shared buffer approach avoids the kick and busy
waiting that the guest would have to do with virtio virtqueue transport.
The Cloud Hypervisor component can be enabled with --pvmemcontrol.
Co-developed-by: Stanko Novakovic <stanko@google.com>
Co-developed-by: Pasha Tatashin <tatashin@google.com>
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
BusDevice trait functions currently holds a mutable reference to self,
and exclusive access is guaranteed by taking a Mutex when dispatched by
the Bus object. However, this prevents individual devices from serving
accesses that do not require an mutable reference or is better served
with different synchronization primitives. We switch Bus to dispatch via
BusDeviceSync, which holds a shared reference, and delegate locking to
the BusDeviceSync trait implementation for Mutex<BusDevice>.
Other changes are made to make use of the dyn BusDeviceSync
trait object.
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
This reverts commit 94929889ac2489015d60ca1480a5d412e2792c57.
This revert moves landlock config back to VMConfig.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
This requires stashing the config values in `struct Vmm`. The configs
should be validated before before creating the VMM thread. Refactor the
code and update documentation where necessary.
The place where the rules are applied remain the same.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Add file/dir paths from landlock-rules arguments to ruleset. Invoke
apply_landlock on VmConfig to apply config specific rules to ruleset.
Once done, any threads spawned by vmm thread will be automatically
sandboxed with the ruleset in vmm thread.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Introduce ApplyLandlock trait and add implementations to VmConfig
elements with PathBufs. This trait adds config specific rules to
landlock ruleset.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Users can use this parameter to pass extra paths that 'vmm' and its
child threads can use at runtime. Hotplug is the primary usecase for
this parameter.
In order to hotplug devices that use local files: disks, memory zones,
pmem devices etc, users can use this option to pass the path/s that will
be used during hotplug while starting cloud-hypervisor. Doing this will
allow landlock to add required rules to grant access to these paths when
cloud-hypervisor process starts.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Users can use this cmdline option to enable/disable Landlock based
sandboxing while running cloud-hypervisor.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
landlock syscalls are required by event_monitor, signal_handler,
http-server and vmm threads. Rest of the threads are spawned by the vmm
thread and they automatically inherit the ruleset from the vmm thread.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
While checking if the console device is a tty use the cloned fd instead
of libc::STDOUT_FILENO.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Console devices are created after vm_config is received and the created
devices are passed Vm during vm_receive_state.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
During vm_shutdown or vm_snapshot, all the console devices will be
closed. When this happens stdout (FD #2) will also be closed as the
console device using these FD is closed. If the VM were to be started
later, FD#2 can be assigned to a different file. But
pre_create_console_devices looks for FD#2 while opening tty device,
which could point to any file.
To avoid this problem, the STDOUT FD is duplicated when being
assigned to a console device. Even if the console devices were to be
closed, the duplicated FD will be closed and FD#2 will continue to
point to STDOUT.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
While adding console devices, DeviceManager will now use the FDs in
console_info instead of creating them.
To reduce the size of this commit, I marked some variables are unused
with '_' prefix. All those variables are cleaned up in next commit.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Use pre_create_console_devices method to create and populate console
device FDs into console_info in Vmm Object.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
With this change all the information to manage console devices is now
available within Vmm Object.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Introduce ConsoleInfo struct. This struct will be used to store FDs of
console devices created in pre_create_console_devices and passed to
vm_boot.
Move set_raw_mode, create_pty methods to console_devices.rs to
consolidate console management methods into a single module.
Lastly, copy the logic to create and configure console devices into
pre_create_console_devices method.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Misspellings were identified by:
https://github.com/marketplace/actions/check-spelling
* Initial corrections based on forbidden patterns from the action
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections
* Adding markdown bullets to readme credits section
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
Currently by default each core is allocated it's own socket. Basically
it is n socket 1 core 1 thread/core kind of a structure as witnessed
from within the guest.
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
This is not a good default topology because resources are distributed
across multiple sockets. For example, a Linux guest with multi socket
configuration will have to calibrate TSC per socket due to which it
might observe a higher amount of boot time than usual.
A better idea for default topology would be 1 socket n core 1
thread/core which ensure better resource locality.
After this change topology would change to:
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Fixes: #6497
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
The original code gave an owned fd to UnixListener. That made the same
fd wrapped into two owned files.
When the files were dropped, the same fd would be closed more than once.
A newly introduced check in Rust's stdlib caught that error.
A newly cloned fd should be given to UnixListener.
Fixes: #6485
Signed-off-by: Wei Liu <liuwe@microsoft.com>
For MSHV we always create frozen partition, so we
resume the VM during boot. Also during pause and resume
VM events we call hypervisor specific API.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Consume FDs passed via SCM_RIGHTs to VmRestore API and assign them
appropriately to RestoredNetConfig's fds field.
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
'NetConfig' FDs, when explicitly passed via SCM_RIGHTS during VM
creation, are marked as invalid during snapshot. See: #6332.
So, Restore should support input for the new net FDs. This patch adds
new field 'net_fds' to 'RestoreConfig'. The FDs passed using this new
field are replaced into the 'fds' field of NetConfig appropriately.
The 'validate()' function ensures all net devices from 'VmConfig' backed
by FDs have a corresponding 'RestoreNetConfig' with a matched 'id' and
expected number of FDs.
The unit tests provide different inputs to parse and validate functions
to make sure parsing and error handling is as per expectation.
Fixes#6286
Signed-off-by: Purna Pavan Chandra <paekkaladevi@linux.microsoft.com>
Co-authored-by: Bo Chen <chen.bo@intel.com>
This is to resolve the inconsistencies from our openapi specification,
as default values do not make sense for required fields.
Reported-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
As MSHV also implements set/get_clock data, this patch
removes the KVM feature guard and make it x86_64 only and
both for KVM and MSHV.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
The contents of this crate may change and cause conflicts - re-exporting
the contents is unnecessary.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
The members for {Io, Mmio}{Read, Write} are unused as instead exits of
those types are handled through the VmOps interface. Removing these is
also a prerequisite due to changes in the mutability of the
VcpuFd::run() method.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
When using multiple PCI segments, the 32-bit and 64-bit mmio
aperture is split equally between each segment. Add an option
to configure the 'weight'. For example, a PCI segment with a
`mmio32_aperture_weight` of 2 will be allocated twice as much
32-bit mmio space as a normal PCI segment.
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>