Pvmemcontrol provides a way for the guest to control its physical memory
properties, and enables optimizations and security features. For
example, the guest can inform the host that parts of a hugepage may be
unbacked, or that sensitive data must not be swapped out, etc.
Pvmemcontrol allows a guest to manipulate its gPTE entries in the SLAT,
and also some other properties of the host memory that backs the guest's
memory map.
This is achieved by using the KVM_CAP_SYNC_MMU capability. When this
capability is available, the changes in the backing of the memory region
on the host are automatically reflected into the guest. For example, an
mmap() or madvise() that affects the region will be made visible
immediately.
There are two components to the implementation: the guest Linux driver
and the Virtual Machine Monitor (VMM) device. A guest-allocated shared
buffer is negotiated per-cpu through a few PCI MMIO registers, and the
VMM device assigns a unique command for each per-cpu buffer. The guest
writes its pvmemcontrol request into the per-cpu buffer, then writes the
corresponding command into the command register, calling into the VMM
device to perform the pvmemcontrol request.
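As a rough illustration of this flow, here is a minimal sketch in Rust; the
request layout, register handling, and names below are assumptions for
illustration, not the actual guest driver or device code:

```rust
// Illustrative sketch only: struct layout and names are assumptions.
#[repr(C)]
#[derive(Clone, Copy)]
struct PvmemcontrolRequest {
    op: u64,   // requested operation (e.g. "unback", "don't swap")
    gpa: u64,  // guest-physical start of the target range
    size: u64, // length of the range in bytes
}

/// Submit one request on the current CPU's shared buffer.
fn submit(
    per_cpu_buf: &mut PvmemcontrolRequest,
    command_reg: *mut u32,
    command: u32,
    req: PvmemcontrolRequest,
) {
    // 1. Place the request in the guest-allocated per-cpu buffer.
    *per_cpu_buf = req;
    // 2. Write the command the VMM assigned to this buffer into the MMIO
    //    command register. The write traps into the VMM device, which
    //    services the request synchronously, so no kick or polling is needed.
    unsafe { core::ptr::write_volatile(command_reg, command) };
}
```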
The synchronous per-cpu shared buffer approach avoids the kick and busy
waiting that the guest would have to do with virtio virtqueue transport.
The Cloud Hypervisor component can be enabled with --pvmemcontrol.
Co-developed-by: Stanko Novakovic <stanko@google.com>
Co-developed-by: Pasha Tatashin <tatashin@google.com>
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
HV APIC (i.e., the synthetic APIC controller exposed by the Microsoft
Hypervisor) does not support one-shot operation using a TSC deadline
value. As a result, we see the following backtrace inside the guest when
running with
hypervisor-fw/OVMF:
[ 0.560765] unchecked MSR access error: WRMSR to 0x832 (tried to
write 0x00000000000400ec) at rIP: 0xffffffff8f473594
(native_write_msr+0x4/0x30)
[ 0.560765] Call Trace:
[ 0.560765] ? native_apic_msr_write+0x2b/0x30
[ 0.560765] __setup_APIC_LVTT+0xbc/0xe0
[ 0.560765] lapic_timer_set_oneshot+0x27/0x30
[ 0.560765] clockevents_switch_state+0xaf/0xf0
[ 0.560765] tick_setup_periodic+0x47/0x90
[ 0.560765] tick_setup_device.isra.0+0x7c/0x110
[ 0.560765] tick_check_new_device+0xce/0xf0
[ 0.560765] clockevents_register_device+0x82/0x170
[ 0.560765] clockevents_config_and_register+0x2f/0x40
[ 0.560765] setup_APIC_timer+0xe1/0xf0
[ 0.560765] setup_boot_APIC_clock+0x5f/0x66
[ 0.560765] native_smp_prepare_cpus+0x1d6/0x286
[ 0.560765] kernel_init_freeable+0xcf/0x255
[ 0.560765] ? rest_init+0xb0/0xb0
[ 0.560765] kernel_init+0xe/0x110
[ 0.560765] ret_from_fork+0x22/0x40
Also, if this feature is exposed, the guest does not finish booting and
gets stuck right before unpacking the root filesystem.
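Assuming the fix is simply to stop advertising the TSC-deadline timer to the
guest when running on MSHV, a minimal sketch of masking the corresponding
CPUID bit (CPUID.01H:ECX bit 24) could look like the following; the fixup
hook and names are illustrative assumptions, not the actual hypervisor code:

```rust
// Sketch only: clear the TSC-deadline timer feature bit so the guest's
// APIC timer falls back to a mode the synthetic APIC supports.
const TSC_DEADLINE_TIMER: u32 = 1 << 24; // CPUID.01H:ECX[24]

fn mask_tsc_deadline(leaf: u32, ecx: &mut u32) {
    if leaf == 0x1 {
        *ecx &= !TSC_DEADLINE_TIMER;
    }
}
```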
Fixes: 06e8d1c40 ("hypervisor: mshv: fix topology for Intel HW on MSHV")
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
In 42e9632c53 a fix was made to address a
typo in the taplo configuration file. Fixing this typo revealed that
many Cargo.toml files were no longer adhering to the formatting rules.
Fix the formatting by running `taplo fmt`.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Code in this crate is conditional on this feature, so it is necessary to
expose it as a new feature and use that feature as a dependency when the
feature is enabled on the vmm crate.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
The "dhat-heap" feature needs to be enabled inside the vmm crate as a
depenency from the top-level as there is build time check for that
feature inside the vmm crate.
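A minimal sketch of the kind of feature forwarding this implies in the
top-level Cargo.toml; the exact wiring in the tree may differ:

```toml
# Sketch: enabling "dhat-heap" at the top level turns it on in the vmm
# crate, so the build-time check inside vmm sees it enabled.
[features]
dhat-heap = ["vmm/dhat-heap"]
```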
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
Update the vhost-user-backend crate version used along with related
crates (vhost and virtio-queue). This requires minor changes to the
types used for the memory in the backends with the use of the
BitmapMmapRegion type for the Bitmap implementation.
Signed-off-by: Rob Bradford <rbradford@rivosinc.com>
This patch bumps the following crates, including `kvm-bindings@0.7.0`*,
`kvm-ioctls@0.16.0`**, `linux-loader@0.11.0`, `versionize@0.2.0`,
`versionize_derive@0.1.6`***, `vhost@0.10.0`,
`vhost-user-backend@0.13.1`, `virtio-queue@0.11.0`, `vm-memory@0.14.0`,
`vmm-sys-util@0.12.1`, and the latest of `vfio-bindings`, `vfio-ioctls`,
`mshv-bindings`,`mshv-ioctls`, and `vfio-user`.
* A fork of the `kvm-bindings` crate is being used to support
serialization of various structs for migration [1]. Also, code changes
are made to accommodate the updated `struct xsave` from the Linux
kernel. Note: these changes related to `struct xsave` break
live-upgrade.
** The new `kvm-ioctls` crate introduced breaking changes to
the `get/set_one_reg` API on `aarch64` [2], so code changes are made to
adopt the new APIs (see the sketch after the links below).
*** A fork of the `versionize_derive` crate is being used to support
versionize on packed structs [3].
[1] https://github.com/cloud-hypervisor/kvm-bindings/tree/ch-v0.7.0
[2] https://github.com/rust-vmm/kvm-ioctls/pull/223
[3] https://github.com/cloud-hypervisor/versionize_derive/tree/ch-0.1.6
Fixes: #6072
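A rough sketch of how callers change with the slice-based aarch64 API from
[2], assuming the signatures take byte buffers; the register id and error
handling here are illustrative placeholders:

```rust
use kvm_ioctls::VcpuFd;

// Sketch: with kvm-ioctls 0.16, get/set_one_reg on aarch64 work on byte
// slices rather than plain integer values.
#[cfg(target_arch = "aarch64")]
fn roundtrip_reg(vcpu: &VcpuFd, reg_id: u64) -> u64 {
    // Read the register into a caller-provided byte buffer.
    let mut bytes = [0u8; 8];
    vcpu.get_one_reg(reg_id, &mut bytes).expect("get_one_reg");
    let value = u64::from_le_bytes(bytes);

    // Write the register back from a byte slice.
    vcpu.set_one_reg(reg_id, &value.to_le_bytes())
        .expect("set_one_reg");
    value
}
```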
Signed-off-by: Bo Chen <chen.bo@intel.com>
Add a 'rate_limit_groups' field to VmConfig that defines a set of
named RateLimiterGroups.
When the 'rate_limit_group' field of DiskConfig is defined, all
virtio-blk queues will be rate-limited by a shared RateLimiterGroup.
The lifecycle of all RateLimiterGroups is tied to the Vm.
A RateLimiterGroup may exist even if no Disks are configured to use
the RateLimiterGroup. Disks may be hot-added or hot-removed from the
RateLimiterGroup.
When the 'rate_limiter' field of DiskConfig is defined, we construct
an anonymous RateLimiterGroup whose lifecycle is tied to the Disk.
This is primarily done for API backwards compatibility. Importantly,
the behavior is not the same! This implementation rate-limits the
aggregate bandwidth / iops of an individual disk rather than the
bandwidth / iops of an individual queue of a disk.
When neither the 'rate_limit_group' nor the 'rate_limiter' field of
DiskConfig is defined, the Disk is not rate-limited.
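A sketch of how the new configuration pieces relate; the field shapes and
type names below are assumptions based on the description above, not the
exact structs in the tree:

```rust
// Illustrative only: approximate shape of the new knobs.
struct RateLimiterGroupConfig { /* id, bandwidth, ops, ... */ }
struct RateLimiterConfig { /* bandwidth, ops, ... */ }

struct VmConfig {
    /// Named groups shared across disks; their lifecycle is tied to the Vm.
    rate_limit_groups: Option<Vec<RateLimiterGroupConfig>>,
    // ...
}

struct DiskConfig {
    /// Names a shared group: all virtio-blk queues of the disk share it.
    rate_limit_group: Option<String>,
    /// Legacy field: builds an anonymous group tied to the disk, limiting
    /// the disk's aggregate bandwidth / iops rather than each queue.
    rate_limiter: Option<RateLimiterConfig>,
    // ...
}
```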
Signed-off-by: Thomas Barrett <tbarrett@crusoeenergy.com>
Currently there are some inconsistencies in Cargo.toml which are causing
the following warnings during the build process:
Error parsing Cargo.toml manifest, fallback to caching entire file:
Invalid TOML document: expected key-value, found comma
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
vmm: Add igvm module and loader module
Add a separate module named igvm to the vmm crate
with definitions to parse and load IGVM files into guest memory.
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This patch adds igvm to the Vm config and params, as well as
the command-line argument to pass an IGVM file to load into
guest memory. The file must be in the IGVM format.
The CLI option is guarded by the igvm feature gate.
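A sketch of the shape this gives the payload configuration; the field names,
placement, and surrounding struct are assumptions for illustration:

```rust
use std::path::PathBuf;

// Illustrative only: an optional IGVM payload path, compiled in only when
// the "igvm" feature is enabled.
struct PayloadConfig {
    kernel: Option<PathBuf>,
    #[cfg(feature = "igvm")]
    igvm: Option<PathBuf>,
    // ...
}
```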
The IGVM (Independent Guest Virtual Machine) file format
is designed to encapsulate all information required to
launch a virtual machine on any given virtualization stack,
with support for different isolation technologies such as
AMD SEV-SNP and Intel TDX.
At a conceptual level, this file format is a set of commands created
by the tool that generated the file, used by the loader to construct
the initial guest state. The file format also contains measurement
information that the underlying platform will use to confirm that
the file was loaded correctly and signed by the appropriate authorities.
The IGVM file is generated by the tool:
https://github.com/microsoft/igvm-tooling
The IGVM file is parsed by the following crates:
https://github.com/microsoft/igvm
Signed-off-by: Muminul Islam <muislam@microsoft.com>