After refactoring a common function is used to setup these slots and
that function takes care of allocating a new slot so it is not necessary
to reserve the initial region slots.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.
By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.
Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupt, no matter the type of
interrupt and no matter which hypervisor might be in use.
Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
After the skeleton of InterruptManager and InterruptSourceGroup traits
have been implemented, this new commit takes care of fully implementing
the content of KvmInterruptManager (InterruptManager trait) and
MsiInterruptGroup (InterruptSourceGroup).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces an empty implementation of both InterruptManager
and InterruptSourceGroup traits, as a proper basis for further
implementation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.
Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.
For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.
Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because we will need to share the same list of GSI routes across
multiple PCI devices (virtio-pci, VFIO), this commit moves the creation
of such list to a higher level location in the code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Use RawFile as backend instead of File. This allows us to abstract
the access to the actual image with a specialized layer, so we have a
place where we can deal with the low-level peculiarities.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Doing I/O on an image opened with O_DIRECT requires to adhere to
certain restrictions, requiring the following elements to be aligned:
- Address of the source/destination memory buffer.
- File offset.
- Length of the data to be read/written.
The actual alignment value depends on various elements, and according
to open(2) "(...) there is currently no filesystem-independent
interface for an application to discover these restrictions (...)".
To discover such value, we iterate through a list of alignments
(currently, 512 and 4096) calling pread() with each one and checking
if the operation succeeded.
We also extend RawFile so it can be used as a backend for QcowFile,
so the later can be easily adapted to support O_DIRECT too.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Update the common part in net_util.rs under vm-virtio to add mq
support, meanwhile enable mq for virtio-net device, vhost-user-net
device and vhost-user-net backend. Multiple threads will be created,
one thread will be responsible to handle one queue pair separately.
To gain the better performance, it requires to have the same amount
of vcpus as queue pair numbers defined for the net device, due to
the cpu affinity.
Multiple thread support is not added for vhost-user-net backend
currently, it will be added in future.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Add num_queues and queue_size for virtio-net device to make them configurable,
while add the associated options in command line.
Update cloud-hypervisor.yaml with the new options for NetConfig.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Add support to allow VMMs to open the same tap device many times, it will
create multiple file descriptors meanwhile.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Since the common parts are put into net_util.rs under vm-virtio,
refactoring code for virtio-net device, vhost-user-net device
and backend to shrink the code size and improve readability
meanwhile.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Use independent bits for storing whether there is a CPU or memory device
changed when reporting changes via ACPI GED interrupt. This prevents a
later notification squashing an earlier one and ensure that hotplugging
both CPU and memory at the same time succeeds.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If a new amount of RAM is requested in the VmResize command try and
hotplug if it an increase (MemoryManager::Resize() silently ignores
decreases.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If there is a GED interrupt and the field indicates that the memory
device has changed triggers a scan of the memory devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Generate and expose the DSDT table entries required to support memory
hotplug. The AML methods call into the MemoryManager via I/O ports
exposed as fields.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Expose the details of hotplug RAM slots via an I/O port. This will be
consumed by the ACPI DSDT tables to report the hotplug memory details to
the guest.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a "resize()" method on MemoryManager which will create a new memory
allocation based on the difference between the desired RAM amount and
the amount already in use. After allocating the added RAM using the same
backing method as the boot RAM store the details in a vector and update
the KVM map and create a new GuestMemoryMmap and replace all the users.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
For now the new memory size is only used after a reboot but support for
hotplugging memory will be added in a later commit.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When the value is read from the I/O port via the ACPI AML functions to
determine what has been triggered the notifiction value is reset
preventing a second read from exposing the value. If we need support
multiple types of GED notification (such as memory hotplug) then we
should avoid reading the value multiple times.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This specifies how much address space should be reserved for hotplugging
of RAM. This space is reserved by adding move the start of the device
area by the desired amount.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to be able to support resizing either vCPUs or memory or both
make the fields in the resize command optional.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Make the GuestMemoryMmap from a Vec<Arc<GuestRegionMmap>> by using this
method we can persist a set of regions in the MemoryManager and then
extend this set with a newly created region. Ultimately that will allow
the hotplug of memory.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If neither PCI or MMIO are built in, we should not bother creating any
virtio devices at all.
When building a minimal VMM made of a kernel with an initramfs and a
serial console, the RNG virtio device is still created even though there
is no way it can ever get probed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Because virtio-iommu is still evolving (as it's only partly upstream),
some pieces like the ACPI declaration of the different nodes and devices
attached to the virtual IOMMU are changing.
This patch introduces a new ACPI table called VIOT, standing as the high
level table overseeing the IORT table and associated subtables.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This function will be useful for other parts of the VMM that also
estabilish their own mappings.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This removes the need to handle a mutable integer and also centralises
the allocation of these slot numbers.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The memory manager is responsible for setting up the guest memory and in
the long term will also handle addition of guest memory.
In this commit move code for creating the backing memory and populating
the allocator into the new implementation trying to make as minimal
changes to other code as possible.
Follow on commits will further reduce some of the duplicated code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
To reflect updated clippy rules:
error: `if` chain can be rewritten with `match`
--> vmm/src/device_manager.rs:1508:25
|
1508 | / if ret > 0 {
1509 | | debug!("MSI message successfully delivered");
1510 | | } else if ret == 0 {
1511 | | warn!("failed to deliver MSI message, blocked by guest");
1512 | | }
| |_________________________^
|
= note: `-D clippy::comparison-chain` implied by `-D warnings`
= help: Consider rewriting the `if` chain to use `cmp` and `match`.
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#comparison_chain
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Address updated clippy errors:
error: redundant clone
--> vmm/src/device_manager.rs:699:32
|
699 | .insert(acpi_device.clone(), 0x3c0, 0x4)
| ^^^^^^^^ help: remove this
|
= note: `-D clippy::redundant-clone` implied by `-D warnings`
note: this value is dropped without further use
--> vmm/src/device_manager.rs:699:21
|
699 | .insert(acpi_device.clone(), 0x3c0, 0x4)
| ^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
error: redundant clone
--> vmm/src/device_manager.rs:737:26
|
737 | .insert(i8042.clone(), 0x61, 0x4)
| ^^^^^^^^ help: remove this
|
note: this value is dropped without further use
--> vmm/src/device_manager.rs:737:21
|
737 | .insert(i8042.clone(), 0x61, 0x4)
| ^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
error: redundant clone
--> vmm/src/device_manager.rs:754:29
|
754 | .insert(cmos.clone(), 0x70, 0x2)
| ^^^^^^^^ help: remove this
|
note: this value is dropped without further use
--> vmm/src/device_manager.rs:754:25
|
754 | .insert(cmos.clone(), 0x70, 0x2)
| ^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When the running OS has been told that a CPU should be removed it will
shutdown the CPU and then signal to the hypervisor via the "_EJ0" method
on the device that ultimately writes into an I/O port than the vCPU
should be shutdown. Upon notification the hypervisor signals to the
individual thread that it should shutdown and waits for that thread to
end.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Allow the resizing of the number of vCPUs to less than the current
active vCPUs. This does not currently remove them from the system but
the kernel will take them offline.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When we add a vCPU set an "inserting" boolean that is exposed as an ACPI
field that will be checked for and reset when the ACPI GED notification
for CPU devices happens.
This change is a precursor for CPU unplug.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Continue to notify on all vCPUs but instead separate the notification
functionality into two methods, CSCN that walks through all the CPUs
and CTFY which notifies based on the numerical CPU id. This is an
interim step towards only notifying on changed CPUs and ultimately CPU
removal.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In anticipation for the writing of unit tests comparing two VmConfig
structures, this commit derives the PartialEq trait for VmConfig and
all embedded structures.
This patch also derives the Debug trait for the same set of structures
so that we can print them to facilitate debugging.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The OpenAPI should not have to provide a command line since the CLI
considers the command line as an empty string if nothing is provided.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This brings more modularity to the code, which will be helpful when we
will later test the CLI and OpenAPI generate the same VmConfig output.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the Snapshotable placeholder and Migratable traits are provided as
well, the DeviceManager object and all its objects are now Migratable.
All Migratable devices are tracked as Arc<Mutex<dyn Migratable>>
references.
Keeping track of all migratable devices allows for implementing the
Migratable trait for the DeviceManager structure, making the whole
device model potentially migratable.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Migratable devices can be virtio or legacy devices.
In any case, they can potentially be tracked through one of the IO bus
as an Arc<Mutex<dyn BusDevice>>. In order for the DeviceManager to also
keep track of such devices as Migratable trait objects, they must be
shared as mutable atomic references, i.e. Arc<Mutex<T>>. That forces all
Migratable objects to be tracked as Arc<Mutex<dyn Migratable>>.
Virtio devices are typically migratable, and thus for them to be
referenced by the DeviceManager, they now should be built as
Arc<Mutex<VirtioDevice>>.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The FsConfig structure has been recently adjusted so that the default
value matches between OpenAPI and CLI. Unfortunately, with the current
description, there is no way from the OpenAPI to describe a cache_size
value "None", so that DAX does not get enabled. Usually, using a Rust
"Option" works because the default value is None. But in this case, the
default value is Some(8G), which means we cannot describe a None.
This commit tackles the problem, introducing an explicit parameter
"dax", and leaving "cache_size" as a simple u64 integer.
This way, the default value is dax=true and cache_size=8G, but it lets
the opportunity to disable DAX entirely with dax=false, which will
simply ignore the cache_size value.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the CLI and the HTTP API behave the same regarding the
VhostUserBlkConfig structure, this patch defines some default values
for num_queues, queue_size and wce.
num_queues is 1, queue_size is 128 and wce is true.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the CLI and the HTTP API behave the same regarding the
VhostUserNetConfig structure, this patch defines some default values
for num_queues, queue_size and mac.
num_queues is 2 since that's a pair of TX/RX queues, queue_size is 256
and mac is a randomly generated value.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We want to set different default configurations for vhost-user-net and
vhost-user-blk, which is the reason why the common part corresponding to
the number of queues and the queue size cannot be embedded.
This prepares for the following commit, matching API and CLI behaviors.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A simple patch making sure the field "file" is provisioned with the same
default value through CLI and OpenAPI.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Just making sure we have a serde default for the field "file" since it
is not a required field in the OpenAPI definition.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
All structures match between the OpenAPI definition and the internal
configuration code, that's why CpuConfig is being renamed into
CpusConfig.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The signal handling for vCPU signals has changed in the latest release
so switch to the new API.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to let the CLI and the HTTP API behave the same regarding the
FsConfig structure, this patch defines some default values for
num_queues, queue_size and the cache_size.
num_queues is set to 1, queue_size is set to 1024, and cache_size is set
to Some(8G) which means that DAX is enabled by default with a shared
region of 8GiB.
Fixes#508
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the CLI and the HTTP API behave the same regarding the
NetConfig structure, this patch defines some default values for tap, ip,
mask, mac and iommu.
tap is None, ip is 192.168.249.1, mask is 255.255.255.0, mac is a
randomly generated value, and iommu is false.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the GED device does not use a hardcoded IRQ number the starting
IRQ number can be restored (needed for the hardcoded serial port IRQ.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the previously hardcoded IRQ number used for the GED device.
Instead allocate the IRQ using the allocator and use that value in the
definition in the ACPI device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for handling the creation of the DSDT entries for devices
into the DeviceManager.
This will make it easier to handle device hotplug and also in the future
remove some hardcoded ACPI constants.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for generating the MADT (APIC) table and the DSDT
generation for CPU related functionality into the CpuManager.
There is no functional change just code rearrangement.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When consumer of the HTTP API try to interact with cloud-hypervisor,
they have to provide the equivalent of the config structure related to
each component they need. Problem is, the Rust enum type "Option" cannot
be obtained from the OpenAPI YAML definition.
This patch intends to fix this inconsistency between what is possible
through the CLI and what's possible through the HTTP API by using simple
types bool and int64 instead of Option<u64>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Previously the device setup code assumed that if no IOAPIC was passed in
then the device should be added to the kernel irqchip. As an earlier
change meant that there was always a userspace IOAPIC this kernel based
code can be removed.
The accessor still returns an Option type to leave scope for
implementing a situation without an IOAPIC (no serial or GED device).
This change does not add support no-IOAPIC mode as the original code did
not either.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Update the configuration after a resize to ensure that after a reboot
the added vCPUs are preserved.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This was causing issues when the kernel was trying to reset the
interrupt and making the reboot fail.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The tty mode remains raw mode when cloud-hypervisor is terminted by
SIGTERM or SIGINT. The terminal is unusable due to echoing is
disabled which is really annoying.
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
Some physical address bits may become reserved in page table when SME
is enabled on AMD platform. Guest will trigger a reserved bit
violation page fault in this case due to write these reserved bits to 1
in page table. We need reduce the reserved bits to get the right
physical address range.
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
The KVM_SET_GSI_ROUTING ioctl is very simple, it overrides the previous
routes configuration with the new ones being applied. This means the
caller, in this case cloud-hypervisor, needs to maintain the list of all
interrupts which needs to be active at all times. This allows to
correctly support multiple devices to be passed through the VM and being
functional at the same time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Call the "CTFY" method that will itself call Notify() on the CPU objects
in the ACPI namespace.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add ability to notify via the GED device that there is some new hotplug
activity. This will be used by the CpuManager (and later DeviceManager
itself) to notify of new hotplug activity.
Currently it has a hardcoded IRQ of 5 as the ACPI tables also need to
refer to this IRQ and the IRQ allocation does not permit the allocation
of specific IRQs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently only increasing the number of vCPUs is supported but in the
future it will be extended.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When configuring a processor after boot as a hotplug CPU we only
configure a subset of the CPU state. In particular we should not
configure the FPU, segment registers (or reconfigure the paging which is
a side-effect of that) nor the main registers. Achieve this by making
the function take an Option type for the start address.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Refactor the vCPU thread starting so that there is the possibility to
bring on extra vCPU threads.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently this just holds the thread handle but will be enlarged to
encompass details such as whether the vCPU is currently being inserted
or ejected.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The MADT table contains the details of all the potential vCPUs and
whether they are present at boot (as indicated by the flags field.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When initialising the ACPI tables and configuring the VM use the new
accessor on the CpuManager to get the number of boot vCPUs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>