Call the "CTFY" method that will itself call Notify() on the CPU objects
in the ACPI namespace.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add ability to notify via the GED device that there is some new hotplug
activity. This will be used by the CpuManager (and later DeviceManager
itself) to notify of new hotplug activity.
Currently it has a hardcoded IRQ of 5 as the ACPI tables also need to
refer to this IRQ and the IRQ allocation does not permit the allocation
of specific IRQs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently only increasing the number of vCPUs is supported but in the
future it will be extended.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When configuring a processor after boot as a hotplug CPU we only
configure a subset of the CPU state. In particular we should not
configure the FPU, segment registers (or reconfigure the paging which is
a side-effect of that) nor the main registers. Achieve this by making
the function take an Option type for the start address.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Refactor the vCPU thread starting so that there is the possibility to
bring on extra vCPU threads.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently this just holds the thread handle but will be enlarged to
encompass details such as whether the vCPU is currently being inserted
or ejected.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The MADT table contains the details of all the potential vCPUs and
whether they are present at boot (as indicated by the flags field.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When initialising the ACPI tables and configuring the VM use the new
accessor on the CpuManager to get the number of boot vCPUs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Since the kvm crates now depend on vmm-sys-util, the bump must be
atomic.
The kvm-bindings and ioctls 0.2.0 and 0.4.0 crates come with a few API
changes, one of them being the use of a kvm_ioctls specific error type.
Porting our code to that type makes for a fairly large diff stat.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
context ID on vsock man defines a 32-bits value, openapi default integer
is a signed 32-bits value.
This could lead to miss one bit during castings for typed client
implmentations. Lets increase the range of valid values by requesting an
int64.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
In case the VM is started with the flag "--pmem mergeable=on", it means
the user expects the guest persistent memory pages to be marked as
mergeable. This commit relies on the madvise(MADV_MERGEABLE) system call
to inform the host kernel about these pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the user indicate if the persistent memory pages should
be marked as mergeable or not, a new option is being introduced.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the VM is started with the flag "--memory mergeable=on", it
means the user expects the guest RAM pages to be marked as mergeable.
This commit relies on the madvise(MADV_MERGEABLE) system call to inform
the host kernel about these pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the user indicate if the guest RAM pages should be
marked as mergeable or not, a new option is being introduced.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When vmm.ping give a response, we expect get the version from
the VMM not the vmm create
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
vmm.ping will help to check if http API server is up and
running.
This also removes the vmm.info endpoint.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
The CPU manager uses an I/O port and to prevent potential clashes with
assignment for PCI devices ensure that it is allocated by the allocator.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than hardcode the CPU status for all the CPUs instead query from
the CPU manager via the I/O port that is is on via the ACPI tables.
Each CPU device has a _STA method that calls into the CSTA method which
reads and writes the I/O ports via the PRST field which exposes the I/O
port through and OpRegion.
As we only support boot CPUS report that all the CPUs are enabled for
now.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The Linux kernel expects all CPUs, whether they be enabled or disabled
to have an _MAT entry containing the LAPIC details for this CPU with the
enabled bit set to 1 (in the flags.)
In the MADT table the same bit is used to determine if the CPU is
present at boot vs available later.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When adding devices to the guest, and populating the device model, we
should prefix the routines with add_. When we're just creating the
device objects but not yet adding them we use make_.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
MMIO devices creation code into its own routine.
Fixes: #441
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
PCI devices creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
ACPI device creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
ACPI device creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
legacy devices creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
console creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Move CpuManager, Vcpu and related functionality to its own module (and
file) inside the VMM crate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In most cases we return a 204 (No Content) and not a 201.
In those cases, we do not send any HTTP body back at all.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The new micro_http package provides a built-in HttpServer wrapper for
running a more robust HTTP server based on the package HTTP API.
Switching to this implementation allows us to, among other things,
handle HTTP requests that are larger than 1024 bytes.
Fixes: #423
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The HTTP API responses are encoded in json
Suggested-by: Samuel Ortiz <sameo@linux.intel.com>
Tested-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Update micro_http create to allow set content type.
Suggested-by: Samuel Ortiz <sameo@linux.intel.com>
Tested-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Pull details of vCPU management (booting, pausing, resuming, shutdown)
into it's own structure. This will ultimately enable this to be moved to
its own file and encapsulate all the vCPU handling for the VMM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove ACPI table creation from arch crate to the vmm crate simplifying
arch::configure_system()
GuestAddress(0) is used to mean no RSDP table rather than adding
complexity with a conditional argument or an Option type as it will
evaluate to a zero value which would be the default anyway.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In vm_reboot, while build the new vm, the old one pointed by self.vm
is not released, that is, the tap opened by self.vm is not closed
either. As a result, the associated dev name slot in host kernel is
still in use state, which prevents the new build from picking it up as
the new opened tap's name, but to use the name in next slot finally.
Call self.vm_shutdown instead here since it has call take() on vm reference,
which could ensure the old vm is destructed before the new vm build.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Ensure that we tell the allocator about all the I/O ports that we are
using for I/O bus attached devices (serial, i8042, ACPI device.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to group together some functions that can be shared across
virtio transport layers, this commit introduces a new trait called
VirtioTransport.
The first function of this trait being ioeventfds() as it is needed from
both virtio-mmio and virtio-pci devices, represented by MmioDevice and
VirtioPciDevice structures respectively.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that kvm-ioctls has been updated, the function unregister_ioevent()
can be used to remove eventfd previously associated with some specific
PIO or MMIO guest address. Particularly, it is useful for the PCI BAR
reprogramming case, as we want to ensure the eventfd will only get
triggered by the new BAR address, and not the old one.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We need to rely on the latest kvm-ioctls version to benefit from the
recent addition of unregister_ioevent(), allowing us to detach a
previously registered eventfd to a PIO or MMIO guest address.
Because of this update, we had to modify the current constraint we had
on the vmm-sys-util crate, using ">= 0.1.1" instead of being strictly
tied to "0.2.0".
Once the dependency conflict resolved, this commit took care of fixing
build issues caused by recent modification of kvm-ioctls relying on
EventFd reference instead of RawFd.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The specific part of PCI BAR reprogramming that happens for a virtio PCI
device is the update of the ioeventfds addresses KVM should listen to.
This should not be triggered for every BAR reprogramming associated with
the virtio device since a virtio PCI device might have multiple BARs.
The update of the ioeventfds addresses should only happen when the BAR
related to those addresses is being moved.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The PciDevice trait is supposed to describe only functions related to
PCI. The specific method ioeventfds() has nothing to do with PCI, but
instead would be more specific to virtio transport devices.
This commit removes the ioeventfds() method from the PciDevice trait,
adding some convenient helper as_any() to retrieve the Any trait from
the structure behing the PciDevice trait. This is the only way to keep
calling into ioeventfds() function from VirtioPciDevice, so that we can
still properly reprogram the PCI BAR.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Storing a strong reference to the AddressManager behind the
DeviceRelocation trait results in a cyclic reference count.
Use a weak reference to break that dependency.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the value being written to the BAR, the implementation can
now detect if the BAR is being moved to another address. If that is the
case, it invokes move_bar() function from the DeviceRelocation trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to trigger the PCI BAR reprogramming from PciConfigIo and
PciConfigMmmio, we need the PciBus to have a hold onto the trait
implementation of DeviceRelocation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By implementing the DeviceRelocation trait for the AddressManager
structure, we now have a way to let the PCI BAR reprogramming happen.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to reuse the SystemAllocator later at runtime, it is moved into
the new structure AddressManager. The goal is to have a hold onto the
SystemAllocator and both IO and MMIO buses so that we can use them
later.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case a VFIO devices is being attached behind a virtual IOMMU, we
should not automatically map the entire guest memory for the specific
device.
A VFIO device attached to the virtual IOMMU will be driven with IOVAs,
hence we should simply wait for the requests coming from the virtual
IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When VFIO devices are created and if the device is attached to the
virtual IOMMU, the ExternalDmaMapping trait implementation is created
and associated with the device. The idea is to build a hash map of
device IDs with their associated trait implementation.
This hash map is provided to the virtual IOMMU device so that it knows
how to properly trigger external mappings associated with VFIO devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With this implementation of the trait ExternalDmaMapping, we now have
the tool to provide to the virtual IOMMU to trigger the map/unmap on
behalf of the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The VFIO container is the object needed to update the VFIO mapping
associated with a VFIO device. This patch allows the device manager
to have access to the VFIO container.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch attaches VFIO devices to the virtual IOMMU if they are
identified as they should be, based on the option "iommu=on". This
simply takes care of adding the PCI device ID to the ACPI IORT table.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a VFIO device should be attached to the virtual
IOMMU or not. That's why we introduce an extra option "iommu" with the
value "on" or "off". By default, the device is not attached, which means
"iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Fix invalid type for version:
- VmInfo.version.type string
Change Null value from enum as it has problems to build clients with
openapi tools.
- ConsoleConfig.mode.enum Null -> Nil
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
We should return an explicit error when the transition from on VM state
to another is invalid.
The valid_transition() routine for the VmState enum essentially
describes the VM state machine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to pause a VM, we signal all the vCPU threads to get them out
of vmx non-root. Once out, the vCPU thread will check for a an atomic
pause boolean. If it's set to true, then the thread will park until
being resumed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
So that we don't need to forward an ExitBehaviour up to the VMM thread.
This simplifies the control loop and the VMM thread even further.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This commit is the glue between the virtio-pci devices attached to the
vIOMMU, and the IORT ACPI table exposing them to the guest as sitting
behind this vIOMMU.
An important thing is the trait implementation provided to the virtio
vrings for each device attached to the vIOMMU, as they need to perform
proper address translation before they can access the buffers.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-vsock device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-console device should be attached to
this virtual IOMMU or not. That's why we introduce an extra option
"iommu" with the value "on" or "off". By default, the device is not
attached, which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-pmem device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-rng device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-net device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-blk device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
One side effect of this new option is that we had to introduce a new
option for the disk path, simply called "path=".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding a simple iommu boolean field to the VmConfig structure so that we
can later use it to create a virtio-iommu device for the current VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtual IOMMU exposed through virtio-iommu device has a dependency
on ACPI. It needs to expose the device ID of the virtio-iommu device,
and all the other devices attached to this virtual IOMMU. The IDs are
expressed from a PCI bus perspective, based on segment, bus, device and
function.
The guest relies on the topology description provided by the IORT table
to attach devices to the virtio-iommu device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case some virtio devices are attached to the virtual IOMMU, their
vring addresses need to be translated from IOVA into GPA. Otherwise it
makes no sense to try to access them, and they would cause out of range
errors.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly asked by
the user. The need for this feature is to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly asked by
the user. The need for this feature is to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly asked by
the user. The need for this feature is to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly asked by
the user. The need for this feature is to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly asked by
the user. The need for this feature is to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding virtio feature VIRTIO_F_IOMMU_PLATFORM when explicitly asked by
the user. The need for this feature is to be able to attach the virtio
device to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The virtio specification defines a device can be reset, which was not
supported by this virtio-console implementation. The reason it is needed
is to support unbinding this device from the guest driver, and rebind it
to vfio-pci driver.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As of commit 2b94334a, Firecracker includes all the changes we need.
We can now switch to using it instead of carrying a copy.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>