Commit Graph

1389 Commits

Author SHA1 Message Date
Rob Bradford
e3c35a3579 vmm: Allow specifying the PCI segment ID when adding virtio PCI device
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
7a4606f800 vmm: Implement ACPI hotplug/unplug handling for PCI segments
For the bus scanning the GED AML code now calls into a PSCN method that
scans all buses. This approach was chosen since it handles the case
correctly where one GED interrupt is services for two hotplugs on
distinct segments.

The PCIU and PCID field values are now determined by the PSEG field that
is uses to select which segment those values should be used for.
Similarly _EJ0 will notify based on the value of _SEG.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
49f19e061b vmm: Use device's segment when removing a device
The segment ID has been stored in the DeviceTree.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
d33d254921 vmm: Remove hardcoded zero PCI segment id
Replace the hardcoded zero PCI segment id when adding devices to the bus
and extend the DeviceTree to hold the PCI segment id.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
b8b0dab1ae vmm: Add segment_id parameter to DeviceManager::add_pci_device
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
c118d7d7d3 vmm: Only fill in PIO and 32-bit MMIO space on zero segment
Since each segment must have disjoint address spaces only advertise
address space in the 32-bit range and the PIO address space on the
default (zero) segment.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
3059ba4305 vmm: Refactor PCI segment creation to support non-default segment
Split PciSegment::new_default_segment() into a separate
PciSegment::new() and those parts required only for the default segment
(PIO PCI config device.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
080ce9b068 vmm: Populate MCFG table with details of all PCI segments
The MCFG table holds the PCI MMIO config details for all the MMIO PCI
config devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
c886d71d29 vmm: Add MMIO & PIO config devices for all PCI segments
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
4f5c179b9b vmm: Construct PCI DSDT data from all segments
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
fbb385834a vmm: Use a vector to store multiple segments
For now this still contains just one segment but is expanding in
preparation for more segments.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
b4fc02857f vmm: Advertise PCI MMIO config range for PCI bus
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
b55f009b8a vmm: Calculate MMIO config address based on segment id
This means that each segment can have its own PCI MMIO config device
without overlapping.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
b59f1d90dd vmm: Expose _SEG with segment ID for PCI bus
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
a7fba8105f vmm: Customise PCI device name based on segment id
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Rob Bradford
8b67298ad8 vmm: Move PCI bus DSDT data onto PciSegment
This commit moves the code that generates the DSDT data for the PCI bus
into PciSegment making no functional changes to the generated AML.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-02 16:55:42 +00:00
Wei Liu
23f63262e7 vmm: drop underscore from used variables
Variables that start with underscore are used to silence rustc.
Normally those variables are not used in code.

This patch drops the underscore from variables that are used. This is
less confusing to readers.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-10-28 13:38:20 +01:00
Bo Chen
455b0d12e9 vmm: Remove VFIO user device from VmConfig upon device unplug
It ensures we won't recreate the unplugged device on reboot.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-10-28 09:42:52 +01:00
Rob Bradford
beb0c0707f vmm: Move logging output for the debug (0x80) port to info!()
This makes it much easier to use since the info!() level produces far
fewer messages and thus has less overhead.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-26 16:48:09 +01:00
Sebastien Boeuf
0249e8641a Move Cloud Hypervisor to virtio-queue crate
Relying on the vm-virtio/virtio-queue crate from rust-vmm which has been
copied inside the Cloud Hypervisor tree, the entire codebase is moved to
the new definition of a Queue and other related structures.

The reason for this move is to follow the upstream until we get some
agreement for the patches that we need on top of that to make it
properly work with Cloud Hypervisor.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-22 11:38:55 +02:00
Rob Bradford
e9ea9d63f8 vmm: Use assert!() rather than if+panic
As identified by the new beta clippy.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-19 19:42:36 +01:00
Rob Bradford
2cccdc5ddd vmm: Naturally align PCI BARs on relocation
When allocating PCI MMIO BARs they should always be naturally aligned
(i.e. aligned to the size of the BAR itself.)

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-15 14:54:18 -07:00
Rob Bradford
c25bd447a1 vmm: Ensure that allocate_bars() is called before mmio_regions()
The allocate_bars method has a side effect which collates the BARs used
for the device and stores them internally. Ensure that any use of this
internal state is after the state is created otherwise no MMIO regions
will be seen and so none will be mapped.

Fixes: #3237

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-14 10:14:33 -07:00
Sebastien Boeuf
58d8206e2b migration: Use MemoryManager restore code path
Instead of creating a MemoryManager from scratch, let's reuse the same
code path used by snapshot/restore, so that memory regions are created
identically to what they were on the source VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
1e1e61614c vmm: memory_manager: Leverage new codepath for snapshot/restore
Now that all the pieces are in place, we can restore a VM with the new
codepath that restores properly all memory regions, allowing for ACPI
memory hotplug to work properly with snapshot/restore feature.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
6a55768d94 vmm: Create MemoryManager from restore data
Extending the MemoryManager::new() function to be able to create a
MemoryManager from data that have been previously stored instead of
always creating everything from scratch.

This change brings real added value as it allows a VM to be restored
respecting the proper memory layout instead of hoping the regions will
be created the way they were before.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
5b177b205b arch, vmm: Extend the data being snapshot
Storing multiple data coming from the MemoryManager in order to be able
to restore without creating everything from scratch.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
f440976a7c vmm: memory_manager: Add a way to restore memory regions properly
This new function will be able to restore memory regions and memory
zones based on the GuestMemoryMapping list that will be provided through
snapshot/restore and migration phases.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
0d573ae86c vmm: memory_manager: Add file_offset to GuestRamMapping
This will help restoring the region with the correct file offset for the
memory mapping.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
01420f5195 vmm: memory_manager: Add virtio_mem to GuestRamMapping
This will help identify if the range belongs to a virtio-mem region or
not.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
dfb1829f65 vmm: memory_manager: Add zone_id to GuestRamMapping
This can help identifying which zone relates to which memory range.
This is going to be useful when recreating GuestMemory regions from
the previous layout instead of having to recreate everything from
scratch.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
b5d11f72b3 vmm: memory_manager: Factorize allocation of ranges
Create a dedicated function to factorize the allocation of the memory
ranges, and helping with the simplification of MemoryManager::new()
function.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
00951f17d4 vmm: memory_manager: Simplify regions creation
By updating the list of GuestMemory regions with the virtio-mem ones
before the creation of the MemoryManager, we know the GuestMemory is up
to date and the allocation of memory ranges is simplified afterwards.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Sebastien Boeuf
63c6c78c4e vmm: memory_manager: Factorize configuration validation
In order to simplify MemoryManager::new() function. let's move the
memory configuration validation to its own function.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-10-06 18:35:49 -07:00
Rob Bradford
84fc0e093d vmm: Move PciSegment to new file
Move the PciSegment struct and the associated code to a new file. This
will allow some clearer separation between the core DeviceManager and
PCI handling.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-05 10:54:07 +01:00
Rob Bradford
0eb78ab177 vmm: Extract PCI related state from DeviceManager
Move the PCI related state from the DeviceManager struct to a PciSegment
struct inside the DeviceManager. This is in preparation for multiple
segment support. Currently this state is just the bus itself, the MMIO
and PIO config devices and hotplug related state.

The main change that this required is using the Arc<Mutex<PciBus>> in
the device addition logic in order to ensure that
the bus could be created earlier.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-05 10:54:07 +01:00
Rob Bradford
83066cf58e vmm: Set a default maximum physical address size
When using PVH for booting (which we use for all firmwares and direct
kernel boot) the Linux kernel does not configure LA57 correctly. As such
we need to limit the address space to the maximum 4-level paging address
space.

If the user knows that their guest image can take advantage of the
5-level addressing and they need it for their workload then they can
increase the physical address space appropriately.

This PR removes the TDX specific handling as the new address space limit
is below the one that that code specified.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-10-01 08:59:15 -07:00
Sebastien Boeuf
495e444ca6 vmm: Add ACPI tables to TdVmmData when running TDX
Whenever running TDX, we must pass the ACPI tables to the TDVF firmware
running in the guest. The proper way to do this is by adding the tables
to the TdHob as a TdVmmData type, so that TDVF will know how to access
these tables and expose them to the guest OS.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-30 06:35:55 -07:00
Sebastien Boeuf
b99a3a7dc9 vmm: Factorize ACPI tables creation inside boot() function
Instead of having the ACPI tables being created both in x86_64 and
aarch64 implementations of configure_system(), we can remove the
duplicated code by moving the ACPI tables creation in vm.rs inside the
boot() function.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-30 06:35:55 -07:00
Yu Li
08021087ec vmm: add prefault option in memory and memory-zone
The argument `prefault` is provided in MemoryManager, but it can
only be used by SGX and restore.
With prefault (MAP_POPULATE) been set, subsequent page faults will
decrease during running, although it will make boot slower.

This commit adds `prefault` in MemoryConfig and MemoryZoneConfig.
To resolve conflict between memory and restore, argument
`prefault` has been changed from `bool` to `Option<bool>`, when
its value is None, config from memory will be used, otherwise
argument in Option will be used.

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2021-09-29 14:17:35 +02:00
Sebastien Boeuf
59031531b6 vmm: Simplify the way memory is snapshot and restored
By using a single file for storing the memory ranges, we simplify the
way snapshot/restore works by avoiding multiples files, but the main and
more important point is that we have now a way to save only the ranges
that matter. In particular, the ranges related to virtio-mem regions are
not always fully hotplugged, meaning we don't want to save the entire
region. That's where the usage of memory ranges is interesting as it
lets us optimize the snapshot/restore process when one or multiple
virtio-mem regions are involved.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
1ea63f50a1 vmm: Move MemoryRangeTable creation to the MemoryManager
The function memory_range_table() will be reused by the MemoryManager in
a following patch to describe all the ranges that we should snapshot.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
86f86c5348 vmm: Optimize migration for virtio-mem
Copy only the memory ranges that have been plugged through virtio-mem,
allowing for an interesting optimization regarding the time it takes to
migrate a large virtio-mem device. Even if the hotpluggable space is
very large (say 64GiB), if only 1GiB has been previously added to the
VM, only 1GiB will be sent to the destination VM, avoiding the transfer
of the remaining 63GiB which are unused.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
e390775bcb vmm, virtio-devices: Move BlocksState creation to the MemoryManager
By creating the BlocksState object in the MemoryManager, we can directly
provide it to the virtio-mem device when being created. This will allow
the MemoryManager through each VirtioMemZone to have a handle onto the
blocks that are plugged at any point in time.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
a1caa6549a vmm: Add page size as a parameter for MemoryRangeTable::from_bitmap()
This will be helpful to support the creation of a MemoryRangeTable from
virtio-mem, as it uses 2M pages.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
d7115ec656 virtio-devices: mem: Add snapshot/restore support
Adding the snapshot/restore support along with migration as well,
allowing a VM with virtio-mem devices attached to be properly
migrated.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
7bbcc0f849 vmm: memory_manager: Make sure the hotplugged_size is up to date
The amount of memory plugged in the virtio-mem region should always be
kept up to date in the hotplugged_size field from VirtioMemZone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
c4dc7a583d vmm: memory_manager: Simplify the MemoryManager structure
There's no need to duplicate the GuestMemory for snapshot purpose, as we
always have a handle onto the GuestMemory through the guest_memory
field.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Sebastien Boeuf
74485924b1 vmm: memory_manager: Simplification to avoid unnecessary locking
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-09-28 10:15:22 -07:00
Rob Bradford
4889999277 vmm: Only advertise a single PCI bus
Since we only support a single PCI bus right now advertise only a single
bus in the ACPI tables. This reduces the number of VM exits from probing
substantially.

Number of PCI config I/O port exits: 17871 -> 1551 (91% reduction) with
direct kernel boot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-09-28 14:10:10 +02:00