Move GED device reporting of required device type to scan into an MMIO
region rather than an I/O port.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than have the MemoryManager device sit on the I/O bus allocate
space for MMIO and add it to the MMIO bus.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit relies on the interrupt manager and the resulting interrupt
source group to abstract the knowledge about KVM and how interrupts are
updated and delivered.
This allows the entire "devices" crate to be freed from kvm_ioctls and
kvm_bindings dependencies.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The interrupt manager is passed to the IOAPIC creation, and the IOAPIC
now creates an InterruptSourceGroup for MSI interrupts based on it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By introducing a new InterruptManager dedicated to the IOAPIC, we don't
have to solve the chicken and eggs problem about which of the
InterruptManager or the Ioapic should be created first. It's also
totally fine to have two interrupt manager instances as they both share
the same list of GSI routes and the same allocator.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
vhost_user_blk already has it, so it's only fair to give it to
virtio-blk too. Extend DiskConfig with a 'direct' property, honor
it while opening the file backing the disk image, and pass it to
vm_virtio::RawFile.
Fixes#631
Signed-off-by: Sergio Lopez <slp@redhat.com>
vhost_user_blk already has it, so it's only fair to give it to
virtio-blk too. Extend DiskConfig with a 'readonly' properly, and pass
it to vm_virtio::Block.
Signed-off-by: Sergio Lopez <slp@redhat.com>
The build is run against "--all-features", "pci,acpi", "pci" and "mmio"
separately. The clippy validation must be run against the same set of
features in order to validate the code is correct.
Because of these new checks, this commit includes multiple fixes
related to the errors generated when manually running the checks.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need for assign_irq() or assign_msix() functions from the
PciDevice trait, as we can see it's never used anywhere in the codebase.
That's why it's better to remove these methods from the trait, and
slightly adapt the existing code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the InterruptManager is never stored into any structure, it should
be passed as a reference instead of being cloned.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Additionally, since it removes the last bit relying on the Interrupt
trait, the trait and its implementation can be removed from the
codebase.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the previous commits, the legacy interrupt implementation can
be completed. The IOAPIC handler is used to deliver the interrupt that
will be triggered through the trigger() method.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By having a reference to the IOAPIC, the KvmInterruptManager is going
to be able to initialize properly the legacy interrupt source group.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to be able to use the InterruptManager abstraction with
virtio-mmio devices, this commit introduces InterruptSourceGroup's
skeleton for legacy interrupts.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When KvmInterruptManager initializes a new InterruptSourceGroup, it's
only for PCI_MSI_IRQ case that it needs to allocate the GSI and create a
new InterruptRoute. That's why this commit moves the general code into
the specific use case.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When the base InterruptIndex is different from 0, the loop allocating
GSI and HashMap entries won't work as expected. The for loop needs to
start from base, but the limit must be base+count so that we allocate
a number of "count" entries.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the InterruptManager be shared across both PCI and MMIO
devices, this commit moves the initialization earlier in the code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
After refactoring a common function is used to setup these slots and
that function takes care of allocating a new slot so it is not necessary
to reserve the initial region slots.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.
By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.
Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupt, no matter the type of
interrupt and no matter which hypervisor might be in use.
Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
After the skeleton of InterruptManager and InterruptSourceGroup traits
have been implemented, this new commit takes care of fully implementing
the content of KvmInterruptManager (InterruptManager trait) and
MsiInterruptGroup (InterruptSourceGroup).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces an empty implementation of both InterruptManager
and InterruptSourceGroup traits, as a proper basis for further
implementation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.
Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.
For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.
Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because we will need to share the same list of GSI routes across
multiple PCI devices (virtio-pci, VFIO), this commit moves the creation
of such list to a higher level location in the code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Use RawFile as backend instead of File. This allows us to abstract
the access to the actual image with a specialized layer, so we have a
place where we can deal with the low-level peculiarities.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Doing I/O on an image opened with O_DIRECT requires to adhere to
certain restrictions, requiring the following elements to be aligned:
- Address of the source/destination memory buffer.
- File offset.
- Length of the data to be read/written.
The actual alignment value depends on various elements, and according
to open(2) "(...) there is currently no filesystem-independent
interface for an application to discover these restrictions (...)".
To discover such value, we iterate through a list of alignments
(currently, 512 and 4096) calling pread() with each one and checking
if the operation succeeded.
We also extend RawFile so it can be used as a backend for QcowFile,
so the later can be easily adapted to support O_DIRECT too.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Update the common part in net_util.rs under vm-virtio to add mq
support, meanwhile enable mq for virtio-net device, vhost-user-net
device and vhost-user-net backend. Multiple threads will be created,
one thread will be responsible to handle one queue pair separately.
To gain the better performance, it requires to have the same amount
of vcpus as queue pair numbers defined for the net device, due to
the cpu affinity.
Multiple thread support is not added for vhost-user-net backend
currently, it will be added in future.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Add num_queues and queue_size for virtio-net device to make them configurable,
while add the associated options in command line.
Update cloud-hypervisor.yaml with the new options for NetConfig.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Add support to allow VMMs to open the same tap device many times, it will
create multiple file descriptors meanwhile.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Since the common parts are put into net_util.rs under vm-virtio,
refactoring code for virtio-net device, vhost-user-net device
and backend to shrink the code size and improve readability
meanwhile.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Use independent bits for storing whether there is a CPU or memory device
changed when reporting changes via ACPI GED interrupt. This prevents a
later notification squashing an earlier one and ensure that hotplugging
both CPU and memory at the same time succeeds.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If a new amount of RAM is requested in the VmResize command try and
hotplug if it an increase (MemoryManager::Resize() silently ignores
decreases.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If there is a GED interrupt and the field indicates that the memory
device has changed triggers a scan of the memory devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Generate and expose the DSDT table entries required to support memory
hotplug. The AML methods call into the MemoryManager via I/O ports
exposed as fields.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Expose the details of hotplug RAM slots via an I/O port. This will be
consumed by the ACPI DSDT tables to report the hotplug memory details to
the guest.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a "resize()" method on MemoryManager which will create a new memory
allocation based on the difference between the desired RAM amount and
the amount already in use. After allocating the added RAM using the same
backing method as the boot RAM store the details in a vector and update
the KVM map and create a new GuestMemoryMmap and replace all the users.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
For now the new memory size is only used after a reboot but support for
hotplugging memory will be added in a later commit.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When the value is read from the I/O port via the ACPI AML functions to
determine what has been triggered the notifiction value is reset
preventing a second read from exposing the value. If we need support
multiple types of GED notification (such as memory hotplug) then we
should avoid reading the value multiple times.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This specifies how much address space should be reserved for hotplugging
of RAM. This space is reserved by adding move the start of the device
area by the desired amount.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to be able to support resizing either vCPUs or memory or both
make the fields in the resize command optional.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Make the GuestMemoryMmap from a Vec<Arc<GuestRegionMmap>> by using this
method we can persist a set of regions in the MemoryManager and then
extend this set with a newly created region. Ultimately that will allow
the hotplug of memory.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If neither PCI or MMIO are built in, we should not bother creating any
virtio devices at all.
When building a minimal VMM made of a kernel with an initramfs and a
serial console, the RNG virtio device is still created even though there
is no way it can ever get probed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Because virtio-iommu is still evolving (as it's only partly upstream),
some pieces like the ACPI declaration of the different nodes and devices
attached to the virtual IOMMU are changing.
This patch introduces a new ACPI table called VIOT, standing as the high
level table overseeing the IORT table and associated subtables.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This function will be useful for other parts of the VMM that also
estabilish their own mappings.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This removes the need to handle a mutable integer and also centralises
the allocation of these slot numbers.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The memory manager is responsible for setting up the guest memory and in
the long term will also handle addition of guest memory.
In this commit move code for creating the backing memory and populating
the allocator into the new implementation trying to make as minimal
changes to other code as possible.
Follow on commits will further reduce some of the duplicated code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
To reflect updated clippy rules:
error: `if` chain can be rewritten with `match`
--> vmm/src/device_manager.rs:1508:25
|
1508 | / if ret > 0 {
1509 | | debug!("MSI message successfully delivered");
1510 | | } else if ret == 0 {
1511 | | warn!("failed to deliver MSI message, blocked by guest");
1512 | | }
| |_________________________^
|
= note: `-D clippy::comparison-chain` implied by `-D warnings`
= help: Consider rewriting the `if` chain to use `cmp` and `match`.
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#comparison_chain
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Address updated clippy errors:
error: redundant clone
--> vmm/src/device_manager.rs:699:32
|
699 | .insert(acpi_device.clone(), 0x3c0, 0x4)
| ^^^^^^^^ help: remove this
|
= note: `-D clippy::redundant-clone` implied by `-D warnings`
note: this value is dropped without further use
--> vmm/src/device_manager.rs:699:21
|
699 | .insert(acpi_device.clone(), 0x3c0, 0x4)
| ^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
error: redundant clone
--> vmm/src/device_manager.rs:737:26
|
737 | .insert(i8042.clone(), 0x61, 0x4)
| ^^^^^^^^ help: remove this
|
note: this value is dropped without further use
--> vmm/src/device_manager.rs:737:21
|
737 | .insert(i8042.clone(), 0x61, 0x4)
| ^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
error: redundant clone
--> vmm/src/device_manager.rs:754:29
|
754 | .insert(cmos.clone(), 0x70, 0x2)
| ^^^^^^^^ help: remove this
|
note: this value is dropped without further use
--> vmm/src/device_manager.rs:754:25
|
754 | .insert(cmos.clone(), 0x70, 0x2)
| ^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_clone
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When the running OS has been told that a CPU should be removed it will
shutdown the CPU and then signal to the hypervisor via the "_EJ0" method
on the device that ultimately writes into an I/O port than the vCPU
should be shutdown. Upon notification the hypervisor signals to the
individual thread that it should shutdown and waits for that thread to
end.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Allow the resizing of the number of vCPUs to less than the current
active vCPUs. This does not currently remove them from the system but
the kernel will take them offline.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When we add a vCPU set an "inserting" boolean that is exposed as an ACPI
field that will be checked for and reset when the ACPI GED notification
for CPU devices happens.
This change is a precursor for CPU unplug.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Continue to notify on all vCPUs but instead separate the notification
functionality into two methods, CSCN that walks through all the CPUs
and CTFY which notifies based on the numerical CPU id. This is an
interim step towards only notifying on changed CPUs and ultimately CPU
removal.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In anticipation for the writing of unit tests comparing two VmConfig
structures, this commit derives the PartialEq trait for VmConfig and
all embedded structures.
This patch also derives the Debug trait for the same set of structures
so that we can print them to facilitate debugging.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The OpenAPI should not have to provide a command line since the CLI
considers the command line as an empty string if nothing is provided.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This brings more modularity to the code, which will be helpful when we
will later test the CLI and OpenAPI generate the same VmConfig output.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the Snapshotable placeholder and Migratable traits are provided as
well, the DeviceManager object and all its objects are now Migratable.
All Migratable devices are tracked as Arc<Mutex<dyn Migratable>>
references.
Keeping track of all migratable devices allows for implementing the
Migratable trait for the DeviceManager structure, making the whole
device model potentially migratable.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Migratable devices can be virtio or legacy devices.
In any case, they can potentially be tracked through one of the IO bus
as an Arc<Mutex<dyn BusDevice>>. In order for the DeviceManager to also
keep track of such devices as Migratable trait objects, they must be
shared as mutable atomic references, i.e. Arc<Mutex<T>>. That forces all
Migratable objects to be tracked as Arc<Mutex<dyn Migratable>>.
Virtio devices are typically migratable, and thus for them to be
referenced by the DeviceManager, they now should be built as
Arc<Mutex<VirtioDevice>>.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The FsConfig structure has been recently adjusted so that the default
value matches between OpenAPI and CLI. Unfortunately, with the current
description, there is no way from the OpenAPI to describe a cache_size
value "None", so that DAX does not get enabled. Usually, using a Rust
"Option" works because the default value is None. But in this case, the
default value is Some(8G), which means we cannot describe a None.
This commit tackles the problem, introducing an explicit parameter
"dax", and leaving "cache_size" as a simple u64 integer.
This way, the default value is dax=true and cache_size=8G, but it lets
the opportunity to disable DAX entirely with dax=false, which will
simply ignore the cache_size value.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the CLI and the HTTP API behave the same regarding the
VhostUserBlkConfig structure, this patch defines some default values
for num_queues, queue_size and wce.
num_queues is 1, queue_size is 128 and wce is true.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the CLI and the HTTP API behave the same regarding the
VhostUserNetConfig structure, this patch defines some default values
for num_queues, queue_size and mac.
num_queues is 2 since that's a pair of TX/RX queues, queue_size is 256
and mac is a randomly generated value.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We want to set different default configurations for vhost-user-net and
vhost-user-blk, which is the reason why the common part corresponding to
the number of queues and the queue size cannot be embedded.
This prepares for the following commit, matching API and CLI behaviors.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A simple patch making sure the field "file" is provisioned with the same
default value through CLI and OpenAPI.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Just making sure we have a serde default for the field "file" since it
is not a required field in the OpenAPI definition.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
All structures match between the OpenAPI definition and the internal
configuration code, that's why CpuConfig is being renamed into
CpusConfig.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The signal handling for vCPU signals has changed in the latest release
so switch to the new API.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to let the CLI and the HTTP API behave the same regarding the
FsConfig structure, this patch defines some default values for
num_queues, queue_size and the cache_size.
num_queues is set to 1, queue_size is set to 1024, and cache_size is set
to Some(8G) which means that DAX is enabled by default with a shared
region of 8GiB.
Fixes#508
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the CLI and the HTTP API behave the same regarding the
NetConfig structure, this patch defines some default values for tap, ip,
mask, mac and iommu.
tap is None, ip is 192.168.249.1, mask is 255.255.255.0, mac is a
randomly generated value, and iommu is false.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the GED device does not use a hardcoded IRQ number the starting
IRQ number can be restored (needed for the hardcoded serial port IRQ.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the previously hardcoded IRQ number used for the GED device.
Instead allocate the IRQ using the allocator and use that value in the
definition in the ACPI device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for handling the creation of the DSDT entries for devices
into the DeviceManager.
This will make it easier to handle device hotplug and also in the future
remove some hardcoded ACPI constants.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the code for generating the MADT (APIC) table and the DSDT
generation for CPU related functionality into the CpuManager.
There is no functional change just code rearrangement.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When consumer of the HTTP API try to interact with cloud-hypervisor,
they have to provide the equivalent of the config structure related to
each component they need. Problem is, the Rust enum type "Option" cannot
be obtained from the OpenAPI YAML definition.
This patch intends to fix this inconsistency between what is possible
through the CLI and what's possible through the HTTP API by using simple
types bool and int64 instead of Option<u64>.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Previously the device setup code assumed that if no IOAPIC was passed in
then the device should be added to the kernel irqchip. As an earlier
change meant that there was always a userspace IOAPIC this kernel based
code can be removed.
The accessor still returns an Option type to leave scope for
implementing a situation without an IOAPIC (no serial or GED device).
This change does not add support no-IOAPIC mode as the original code did
not either.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Update the configuration after a resize to ensure that after a reboot
the added vCPUs are preserved.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This was causing issues when the kernel was trying to reset the
interrupt and making the reboot fail.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The tty mode remains raw mode when cloud-hypervisor is terminted by
SIGTERM or SIGINT. The terminal is unusable due to echoing is
disabled which is really annoying.
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
Some physical address bits may become reserved in page table when SME
is enabled on AMD platform. Guest will trigger a reserved bit
violation page fault in this case due to write these reserved bits to 1
in page table. We need reduce the reserved bits to get the right
physical address range.
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
The KVM_SET_GSI_ROUTING ioctl is very simple, it overrides the previous
routes configuration with the new ones being applied. This means the
caller, in this case cloud-hypervisor, needs to maintain the list of all
interrupts which needs to be active at all times. This allows to
correctly support multiple devices to be passed through the VM and being
functional at the same time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Call the "CTFY" method that will itself call Notify() on the CPU objects
in the ACPI namespace.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add ability to notify via the GED device that there is some new hotplug
activity. This will be used by the CpuManager (and later DeviceManager
itself) to notify of new hotplug activity.
Currently it has a hardcoded IRQ of 5 as the ACPI tables also need to
refer to this IRQ and the IRQ allocation does not permit the allocation
of specific IRQs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently only increasing the number of vCPUs is supported but in the
future it will be extended.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When configuring a processor after boot as a hotplug CPU we only
configure a subset of the CPU state. In particular we should not
configure the FPU, segment registers (or reconfigure the paging which is
a side-effect of that) nor the main registers. Achieve this by making
the function take an Option type for the start address.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Refactor the vCPU thread starting so that there is the possibility to
bring on extra vCPU threads.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently this just holds the thread handle but will be enlarged to
encompass details such as whether the vCPU is currently being inserted
or ejected.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The MADT table contains the details of all the potential vCPUs and
whether they are present at boot (as indicated by the flags field.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When initialising the ACPI tables and configuring the VM use the new
accessor on the CpuManager to get the number of boot vCPUs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Since the kvm crates now depend on vmm-sys-util, the bump must be
atomic.
The kvm-bindings and ioctls 0.2.0 and 0.4.0 crates come with a few API
changes, one of them being the use of a kvm_ioctls specific error type.
Porting our code to that type makes for a fairly large diff stat.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
context ID on vsock man defines a 32-bits value, openapi default integer
is a signed 32-bits value.
This could lead to miss one bit during castings for typed client
implmentations. Lets increase the range of valid values by requesting an
int64.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
In case the VM is started with the flag "--pmem mergeable=on", it means
the user expects the guest persistent memory pages to be marked as
mergeable. This commit relies on the madvise(MADV_MERGEABLE) system call
to inform the host kernel about these pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the user indicate if the persistent memory pages should
be marked as mergeable or not, a new option is being introduced.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the VM is started with the flag "--memory mergeable=on", it
means the user expects the guest RAM pages to be marked as mergeable.
This commit relies on the madvise(MADV_MERGEABLE) system call to inform
the host kernel about these pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the user indicate if the guest RAM pages should be
marked as mergeable or not, a new option is being introduced.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When vmm.ping give a response, we expect get the version from
the VMM not the vmm create
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
vmm.ping will help to check if http API server is up and
running.
This also removes the vmm.info endpoint.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
The CPU manager uses an I/O port and to prevent potential clashes with
assignment for PCI devices ensure that it is allocated by the allocator.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than hardcode the CPU status for all the CPUs instead query from
the CPU manager via the I/O port that is is on via the ACPI tables.
Each CPU device has a _STA method that calls into the CSTA method which
reads and writes the I/O ports via the PRST field which exposes the I/O
port through and OpRegion.
As we only support boot CPUS report that all the CPUs are enabled for
now.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The Linux kernel expects all CPUs, whether they be enabled or disabled
to have an _MAT entry containing the LAPIC details for this CPU with the
enabled bit set to 1 (in the flags.)
In the MADT table the same bit is used to determine if the CPU is
present at boot vs available later.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When adding devices to the guest, and populating the device model, we
should prefix the routines with add_. When we're just creating the
device objects but not yet adding them we use make_.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
MMIO devices creation code into its own routine.
Fixes: #441
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
PCI devices creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
ACPI device creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
ACPI device creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
legacy devices creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
console creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Move CpuManager, Vcpu and related functionality to its own module (and
file) inside the VMM crate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In most cases we return a 204 (No Content) and not a 201.
In those cases, we do not send any HTTP body back at all.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The new micro_http package provides a built-in HttpServer wrapper for
running a more robust HTTP server based on the package HTTP API.
Switching to this implementation allows us to, among other things,
handle HTTP requests that are larger than 1024 bytes.
Fixes: #423
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The HTTP API responses are encoded in json
Suggested-by: Samuel Ortiz <sameo@linux.intel.com>
Tested-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Update micro_http create to allow set content type.
Suggested-by: Samuel Ortiz <sameo@linux.intel.com>
Tested-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Pull details of vCPU management (booting, pausing, resuming, shutdown)
into it's own structure. This will ultimately enable this to be moved to
its own file and encapsulate all the vCPU handling for the VMM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove ACPI table creation from arch crate to the vmm crate simplifying
arch::configure_system()
GuestAddress(0) is used to mean no RSDP table rather than adding
complexity with a conditional argument or an Option type as it will
evaluate to a zero value which would be the default anyway.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In vm_reboot, while build the new vm, the old one pointed by self.vm
is not released, that is, the tap opened by self.vm is not closed
either. As a result, the associated dev name slot in host kernel is
still in use state, which prevents the new build from picking it up as
the new opened tap's name, but to use the name in next slot finally.
Call self.vm_shutdown instead here since it has call take() on vm reference,
which could ensure the old vm is destructed before the new vm build.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Ensure that we tell the allocator about all the I/O ports that we are
using for I/O bus attached devices (serial, i8042, ACPI device.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to group together some functions that can be shared across
virtio transport layers, this commit introduces a new trait called
VirtioTransport.
The first function of this trait being ioeventfds() as it is needed from
both virtio-mmio and virtio-pci devices, represented by MmioDevice and
VirtioPciDevice structures respectively.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that kvm-ioctls has been updated, the function unregister_ioevent()
can be used to remove eventfd previously associated with some specific
PIO or MMIO guest address. Particularly, it is useful for the PCI BAR
reprogramming case, as we want to ensure the eventfd will only get
triggered by the new BAR address, and not the old one.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We need to rely on the latest kvm-ioctls version to benefit from the
recent addition of unregister_ioevent(), allowing us to detach a
previously registered eventfd to a PIO or MMIO guest address.
Because of this update, we had to modify the current constraint we had
on the vmm-sys-util crate, using ">= 0.1.1" instead of being strictly
tied to "0.2.0".
Once the dependency conflict resolved, this commit took care of fixing
build issues caused by recent modification of kvm-ioctls relying on
EventFd reference instead of RawFd.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The specific part of PCI BAR reprogramming that happens for a virtio PCI
device is the update of the ioeventfds addresses KVM should listen to.
This should not be triggered for every BAR reprogramming associated with
the virtio device since a virtio PCI device might have multiple BARs.
The update of the ioeventfds addresses should only happen when the BAR
related to those addresses is being moved.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The PciDevice trait is supposed to describe only functions related to
PCI. The specific method ioeventfds() has nothing to do with PCI, but
instead would be more specific to virtio transport devices.
This commit removes the ioeventfds() method from the PciDevice trait,
adding some convenient helper as_any() to retrieve the Any trait from
the structure behing the PciDevice trait. This is the only way to keep
calling into ioeventfds() function from VirtioPciDevice, so that we can
still properly reprogram the PCI BAR.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Storing a strong reference to the AddressManager behind the
DeviceRelocation trait results in a cyclic reference count.
Use a weak reference to break that dependency.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the value being written to the BAR, the implementation can
now detect if the BAR is being moved to another address. If that is the
case, it invokes move_bar() function from the DeviceRelocation trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to trigger the PCI BAR reprogramming from PciConfigIo and
PciConfigMmmio, we need the PciBus to have a hold onto the trait
implementation of DeviceRelocation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By implementing the DeviceRelocation trait for the AddressManager
structure, we now have a way to let the PCI BAR reprogramming happen.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to reuse the SystemAllocator later at runtime, it is moved into
the new structure AddressManager. The goal is to have a hold onto the
SystemAllocator and both IO and MMIO buses so that we can use them
later.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case a VFIO devices is being attached behind a virtual IOMMU, we
should not automatically map the entire guest memory for the specific
device.
A VFIO device attached to the virtual IOMMU will be driven with IOVAs,
hence we should simply wait for the requests coming from the virtual
IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When VFIO devices are created and if the device is attached to the
virtual IOMMU, the ExternalDmaMapping trait implementation is created
and associated with the device. The idea is to build a hash map of
device IDs with their associated trait implementation.
This hash map is provided to the virtual IOMMU device so that it knows
how to properly trigger external mappings associated with VFIO devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With this implementation of the trait ExternalDmaMapping, we now have
the tool to provide to the virtual IOMMU to trigger the map/unmap on
behalf of the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The VFIO container is the object needed to update the VFIO mapping
associated with a VFIO device. This patch allows the device manager
to have access to the VFIO container.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch attaches VFIO devices to the virtual IOMMU if they are
identified as they should be, based on the option "iommu=on". This
simply takes care of adding the PCI device ID to the ACPI IORT table.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a VFIO device should be attached to the virtual
IOMMU or not. That's why we introduce an extra option "iommu" with the
value "on" or "off". By default, the device is not attached, which means
"iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Fix invalid type for version:
- VmInfo.version.type string
Change Null value from enum as it has problems to build clients with
openapi tools.
- ConsoleConfig.mode.enum Null -> Nil
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
We should return an explicit error when the transition from on VM state
to another is invalid.
The valid_transition() routine for the VmState enum essentially
describes the VM state machine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to pause a VM, we signal all the vCPU threads to get them out
of vmx non-root. Once out, the vCPU thread will check for a an atomic
pause boolean. If it's set to true, then the thread will park until
being resumed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
So that we don't need to forward an ExitBehaviour up to the VMM thread.
This simplifies the control loop and the VMM thread even further.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This commit is the glue between the virtio-pci devices attached to the
vIOMMU, and the IORT ACPI table exposing them to the guest as sitting
behind this vIOMMU.
An important thing is the trait implementation provided to the virtio
vrings for each device attached to the vIOMMU, as they need to perform
proper address translation before they can access the buffers.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-vsock device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-console device should be attached to
this virtual IOMMU or not. That's why we introduce an extra option
"iommu" with the value "on" or "off". By default, the device is not
attached, which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-pmem device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-rng device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-net device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-blk device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
One side effect of this new option is that we had to introduce a new
option for the disk path, simply called "path=".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding a simple iommu boolean field to the VmConfig structure so that we
can later use it to create a virtio-iommu device for the current VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtual IOMMU exposed through virtio-iommu device has a dependency
on ACPI. It needs to expose the device ID of the virtio-iommu device,
and all the other devices attached to this virtual IOMMU. The IDs are
expressed from a PCI bus perspective, based on segment, bus, device and
function.
The guest relies on the topology description provided by the IORT table
to attach devices to the virtio-iommu device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>