It's missing a few knobs (readonly, vhost, wce) that should be exposed
through the rest API.
Fixes: #790
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The kernel does not adhere to the ACPI specification (probably to work
around broken hardware) and rather than busy looping after requesting an
ACPI reset it will attempt to reset by other mechanisms (such as i8042
reset.)
In order to trigger a reset the devices write to an EventFd (called
reset_evt.) This is used by the VMM to identify if a reset is requested
and make the VM reboot. As the reset_evt is part of the VMM and reused
for both the old and new VM it is possible for the newly booted VM to
immediately get reset as there is an old event sitting in the EventFd.
The simplest solution is to "drain" the reset_evt EventFd on reboot to
make sure that there is no spurious events in the EventFd.
Fixes: #783
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap> with
GuestMemoryAtomic<GuestMemoryMmap>.
The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.
Fixes#735
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If no socket is supplied when enabling "vhost_user=true" on "--disk"
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When a virtio-fs device is created with a dedicated shared region, by
default the region should be mapped as PROT_NONE so that no pages can be
faulted in.
It's only when the guest performs the mount of the virtiofs filesystem
that we can expect the VMM, on behalf of the backend, to perform some
new mappings in the reserved shared window, using PROT_READ and/or
PROT_WRITE.
Fixes#763
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If no socket is supplied when enabling "vhost_user=true" on "--net"
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field.)
Currently this only supports creating a new tap interface as the network
backend also only supports that.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It is necessary to do this at the start of the VMM execution rather than
later as it must be done in the main thread in order to satisfy the
checks required by PTRACE_MODE_READ_FSCREDS (see proc(5) and
ptrace(2))
The alternative is to run as CAP_SYS_PTRACE but that has its
disadvantages.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the ioctl syscall KVM_CREATE_VM gets interrupted while creating the
VM, it is expected that we should retry since EINTR should not be
considered a standard error.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the InterruptManager trait depend on an InterruptType forces
implementations into supporting potentially very different kind of
interrupts from the same code base. What we're defining through the
current, interrupt type based create_group() method is a need for having
different interrupt managers for different kind of interrupts.
By associating the InterruptManager trait to an interrupt group
configuration type, we create a cleaner design to support that need as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We create 2 different interrupt managers for separately handling
creation of legacy and MSI interrupt groups.
Doing so allows us to have a cleaner interrupt manager and IOAPIC
initialization path. It also prepares for an InterruptManager trait
design improvement where we remove the interrupt source type dependency
by associating an interrupt configuration type to the trait.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
A reference to the VmFd is stored on the AddressManager so it is not
necessary to pass in the VmInfo into all methods that need it as it can
be obtained from the AddressManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The DeviceManager has a reference to the MemoryManager so use that to
get the GuestMemoryMmap rather than the version stored in the VmInfo
struct.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the use of vm_info in methods to get the config and instead use
the config stored on the DeviceManager itself.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. This prepares the way to more easily store state on
the DeviceManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. A follow-up commit will change the callee functions
that create the devices themselves.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Modify these functions to take an &mut self and become methods on
DeviceManager. This allows the removal of some in/out parameters and
leads the way to further refactoring and simplification.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The MemoryManager should only be included on the I/O bus when doing ACPI
builds as that is the only time it will be interrogated.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently the MemoryManager is only used on the ACPI code paths after
the DeviceManager has been created. This will change in a future commit
as part of the refactoring so for now always include it but name it with
underscore prefix to indicate it might not always be used.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that devices attached to the virtual IOMMU are described through
virtio configuration, there is no need for the DeviceManager to store
the list of IDs for all these devices. Instead, things are handled
locally when PCI devices are being added.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of relying on the ACPI tables to describe the devices attached
to the virtual IOMMU, let's use the virtio topology, as the ACPI support
is getting deprecated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-block and
vhost-user-block. For now it is necessary to specify both vhost_user
and socket parameters as auto activation is not yet implemented. The wce
parameter for supporting "Write Cache Enabling" is also added to the
disk configuration.
The original command line parameter is still supported for now and will
be removed in a future release.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-net and vhost-user-net.
For now it is necessary to specify both vhost_user and socket parameters
as auto activation is not yet implemented. The original command line
parameter is still supported for now.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit improves the existing virtio-blk implementation, allowing
for better I/O performance. The cost for the end user is to accept
allocating more vCPUs to the virtual machine, so that multiple I/O
threads can run in parallel.
One thing to notice, the amount of vCPUs must be egal or superior to the
amount of queues dedicated to the virtio-blk device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The number of queues and the size of each queue were not configurable.
In anticipation for adding multiqueue support, this commit introduces
some new parameters to let the user decide about the number of queues
and the queue size.
Note that the default values for each of these parameters are identical
to the default values used for vhost-user-blk, that is 1 for the number
of queues and 128 for the queue size.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Devices like virtio-pmem and virtio-fs require some dedicated memory
region to be mapped. The memory mapping from the DeviceManager is being
replaced by the usage of MmapRegion from the vm-memory crate.
The unmap will happen automatically when the MmapRegion will be dropped,
which should happen when the DeviceManager gets dropped.
Fixes#240
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Move GED device reporting of required device type to scan into an MMIO
region rather than an I/O port.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than have the MemoryManager device sit on the I/O bus allocate
space for MMIO and add it to the MMIO bus.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit relies on the interrupt manager and the resulting interrupt
source group to abstract the knowledge about KVM and how interrupts are
updated and delivered.
This allows the entire "devices" crate to be freed from kvm_ioctls and
kvm_bindings dependencies.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The interrupt manager is passed to the IOAPIC creation, and the IOAPIC
now creates an InterruptSourceGroup for MSI interrupts based on it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By introducing a new InterruptManager dedicated to the IOAPIC, we don't
have to solve the chicken and eggs problem about which of the
InterruptManager or the Ioapic should be created first. It's also
totally fine to have two interrupt manager instances as they both share
the same list of GSI routes and the same allocator.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
vhost_user_blk already has it, so it's only fair to give it to
virtio-blk too. Extend DiskConfig with a 'direct' property, honor
it while opening the file backing the disk image, and pass it to
vm_virtio::RawFile.
Fixes#631
Signed-off-by: Sergio Lopez <slp@redhat.com>
vhost_user_blk already has it, so it's only fair to give it to
virtio-blk too. Extend DiskConfig with a 'readonly' properly, and pass
it to vm_virtio::Block.
Signed-off-by: Sergio Lopez <slp@redhat.com>
The build is run against "--all-features", "pci,acpi", "pci" and "mmio"
separately. The clippy validation must be run against the same set of
features in order to validate the code is correct.
Because of these new checks, this commit includes multiple fixes
related to the errors generated when manually running the checks.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There's no need for assign_irq() or assign_msix() functions from the
PciDevice trait, as we can see it's never used anywhere in the codebase.
That's why it's better to remove these methods from the trait, and
slightly adapt the existing code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since the InterruptManager is never stored into any structure, it should
be passed as a reference instead of being cloned.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Additionally, since it removes the last bit relying on the Interrupt
trait, the trait and its implementation can be removed from the
codebase.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit replaces the way legacy interrupts were handled with the
brand new implementation of the legacy InterruptSourceGroup for KVM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Relying on the previous commits, the legacy interrupt implementation can
be completed. The IOAPIC handler is used to deliver the interrupt that
will be triggered through the trigger() method.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By having a reference to the IOAPIC, the KvmInterruptManager is going
to be able to initialize properly the legacy interrupt source group.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to be able to use the InterruptManager abstraction with
virtio-mmio devices, this commit introduces InterruptSourceGroup's
skeleton for legacy interrupts.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When KvmInterruptManager initializes a new InterruptSourceGroup, it's
only for PCI_MSI_IRQ case that it needs to allocate the GSI and create a
new InterruptRoute. That's why this commit moves the general code into
the specific use case.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When the base InterruptIndex is different from 0, the loop allocating
GSI and HashMap entries won't work as expected. The for loop needs to
start from base, but the limit must be base+count so that we allocate
a number of "count" entries.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the InterruptManager be shared across both PCI and MMIO
devices, this commit moves the initialization earlier in the code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
After refactoring a common function is used to setup these slots and
that function takes care of allocating a new slot so it is not necessary
to reserve the initial region slots.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Based on all the previous changes, we can at this point replace the
entire interrupt management with the implementation of InterruptManager
and InterruptSourceGroup traits.
By using KvmInterruptManager from the DeviceManager, we can provide both
VirtioPciDevice and VfioPciDevice a way to pick the kind of
InterruptSourceGroup they want to create. Because they choose the type
of interrupt to be MSI/MSI-X, they will be given a MsiInterruptGroup.
Both MsixConfig and MsiConfig are responsible for the update of the GSI
routes, which is why, by passing the MsiInterruptGroup to them, they can
still perform the GSI route management without knowing implementation
details. That's where the InterruptSourceGroup is powerful, as it
provides a generic way to manage interrupt, no matter the type of
interrupt and no matter which hypervisor might be in use.
Once the full replacement has been achieved, both SystemAllocator and
KVM specific dependencies can be removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
After the skeleton of InterruptManager and InterruptSourceGroup traits
have been implemented, this new commit takes care of fully implementing
the content of KvmInterruptManager (InterruptManager trait) and
MsiInterruptGroup (InterruptSourceGroup).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces an empty implementation of both InterruptManager
and InterruptSourceGroup traits, as a proper basis for further
implementation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Callbacks are not the most Rust idiomatic way of programming. The right
way is to use a Trait to provide multiple implementation of the same
interface.
Additionally, a Trait will allow for multiple functions to be defined
while using callbacks means that a new callback must be introduced for
each new function we want to add.
For these two reasons, the current commit modifies the existing
VirtioInterrupt callback into a Trait of the same name.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating KVM GSI routes at
some point, it is necessary that it can access the list of routes
contained by gsi_msi_routes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because MsixConfig will be responsible for updating the KVM GSI routes
at some point, it must have access to the VmFd to invoke the KVM ioctl
KVM_SET_GSI_ROUTING.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The point here is to let MsixConfig take care of the GSI allocation,
which means the SystemAllocator must be passed from the vmm crate all
the way down to the pci crate.
Once this is done, the GSI allocation and irq_fd creation is performed
by MsixConfig directly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because we will need to share the same list of GSI routes across
multiple PCI devices (virtio-pci, VFIO), this commit moves the creation
of such list to a higher level location in the code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Use RawFile as backend instead of File. This allows us to abstract
the access to the actual image with a specialized layer, so we have a
place where we can deal with the low-level peculiarities.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Doing I/O on an image opened with O_DIRECT requires to adhere to
certain restrictions, requiring the following elements to be aligned:
- Address of the source/destination memory buffer.
- File offset.
- Length of the data to be read/written.
The actual alignment value depends on various elements, and according
to open(2) "(...) there is currently no filesystem-independent
interface for an application to discover these restrictions (...)".
To discover such value, we iterate through a list of alignments
(currently, 512 and 4096) calling pread() with each one and checking
if the operation succeeded.
We also extend RawFile so it can be used as a backend for QcowFile,
so the later can be easily adapted to support O_DIRECT too.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Update the common part in net_util.rs under vm-virtio to add mq
support, meanwhile enable mq for virtio-net device, vhost-user-net
device and vhost-user-net backend. Multiple threads will be created,
one thread will be responsible to handle one queue pair separately.
To gain the better performance, it requires to have the same amount
of vcpus as queue pair numbers defined for the net device, due to
the cpu affinity.
Multiple thread support is not added for vhost-user-net backend
currently, it will be added in future.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Add num_queues and queue_size for virtio-net device to make them configurable,
while add the associated options in command line.
Update cloud-hypervisor.yaml with the new options for NetConfig.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Add support to allow VMMs to open the same tap device many times, it will
create multiple file descriptors meanwhile.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Since the common parts are put into net_util.rs under vm-virtio,
refactoring code for virtio-net device, vhost-user-net device
and backend to shrink the code size and improve readability
meanwhile.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Use independent bits for storing whether there is a CPU or memory device
changed when reporting changes via ACPI GED interrupt. This prevents a
later notification squashing an earlier one and ensure that hotplugging
both CPU and memory at the same time succeeds.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If a new amount of RAM is requested in the VmResize command try and
hotplug if it an increase (MemoryManager::Resize() silently ignores
decreases.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If there is a GED interrupt and the field indicates that the memory
device has changed triggers a scan of the memory devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Generate and expose the DSDT table entries required to support memory
hotplug. The AML methods call into the MemoryManager via I/O ports
exposed as fields.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Expose the details of hotplug RAM slots via an I/O port. This will be
consumed by the ACPI DSDT tables to report the hotplug memory details to
the guest.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a "resize()" method on MemoryManager which will create a new memory
allocation based on the difference between the desired RAM amount and
the amount already in use. After allocating the added RAM using the same
backing method as the boot RAM store the details in a vector and update
the KVM map and create a new GuestMemoryMmap and replace all the users.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
For now the new memory size is only used after a reboot but support for
hotplugging memory will be added in a later commit.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When the value is read from the I/O port via the ACPI AML functions to
determine what has been triggered the notifiction value is reset
preventing a second read from exposing the value. If we need support
multiple types of GED notification (such as memory hotplug) then we
should avoid reading the value multiple times.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This specifies how much address space should be reserved for hotplugging
of RAM. This space is reserved by adding move the start of the device
area by the desired amount.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to be able to support resizing either vCPUs or memory or both
make the fields in the resize command optional.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Make the GuestMemoryMmap from a Vec<Arc<GuestRegionMmap>> by using this
method we can persist a set of regions in the MemoryManager and then
extend this set with a newly created region. Ultimately that will allow
the hotplug of memory.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If neither PCI or MMIO are built in, we should not bother creating any
virtio devices at all.
When building a minimal VMM made of a kernel with an initramfs and a
serial console, the RNG virtio device is still created even though there
is no way it can ever get probed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Because virtio-iommu is still evolving (as it's only partly upstream),
some pieces like the ACPI declaration of the different nodes and devices
attached to the virtual IOMMU are changing.
This patch introduces a new ACPI table called VIOT, standing as the high
level table overseeing the IORT table and associated subtables.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>