Based on the seccomp crate, we create a new vmm module responsible for
creating a seccomp filter that will be applied to the VMM main thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The seccomp crate from Firecracker is nicely implemented, documented and
tested, which is a good reason for relying on it to create and apply
seccomp filters.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This opens the backing file read-only, makes the pages in the mmap()
read-only and also makes the KVM mapping read-only. The file is also
mapped with MAP_PRIVATE to make the changes local to this process only.
This is functional alternative to having support for making a
virtio-pmem device readonly. Unfortunately there is no concept of
readonly virtio-pmem (or any type of NVDIMM/PMEM) in the Linux kernel so
to be able to have a block device that is appears readonly in the guest
requires significant specification and kernel changes.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use this boolean to turn on the KVM_MEM_READONLY flag to indicate that
this memory mapping should not be writable by the VM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
According to `asm-generic/termios.h`, the `struct winsize` should be:
struct winsize {
unsigned short ws_row;
unsigned short ws_col;
unsigned short ws_xpixel;
unsigned short ws_ypixel;
};
The ioctl of TIOCGWINSZ will trigger a segfault on aarch64.
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
This feature is stable and there is no need for this to be behind a
flag. This will also reduce the time needed to run the integration test
as we will not be running them all again under the flag.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This table currently contains only all the VFIO devices and it should
really contain all the PCI devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Previously this was only returned if the device had an IOMMU mapping and
whether the device should be added to the virtio-iommu. This was already
captured earlier as part of creating the device so use that information
instead.
Always returning the B/D/F is helpful as it facilitates virtio PCI
device hotplug.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
I spent a few minutes trying to understand why we were unconditionally
updating the VM config memory size, even if the guest memory resizing
did not happen.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The IORT table for virtio-iommu use was removed and replaced with a
purely virtio based solution. Although the table construction was
removed these structures were left behind.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use a new feature called "pvh_boot" to enable using the PVH boot
protocol if the guest kernel supports it. The feature can be enabled
by building with:
cargo build [--release] --features "pvh_boot"
Once performance has been evaluated, this can be made part of the
default set of features so that any guest that supports it boots
using PVH as the preferred option as is the case in QEMU.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Fill the hvm_start_info and related memory map structures as
specified in the PVH boot protocol. Write the data structures
to guest memory at the GPA that will be stored in %rbx when
the guest starts.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
In order to properly initialize the kvm regs/sregs structs for
the guest, the load_kernel() return type must specify which
boot protocol to use with the entry point address it returns.
Make load_kernel() return an EntryPoint struct containing the
required information. This structure will later be used
in the vCPU configuration methods to setup the appropriate
initial conditions for the guest.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
When using "--disk" with a vhost socket and not using self spawning then
it is not necessary or helpful to specify the path.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By using a Vec to hold the list of devices on the PciBus, there's a
problem when we use unplug. Indeed, the vector of devices gets reduced
and if the unplugged device was not the last one from the list, every
other device after this one is shifted on the bus.
To solve this problem, a HashMap is used. This allows to keep track of
the exact place where each device stands on the bus.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The option desired_ram is in byte, make larger the amount of memory to
add.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
With some of the factorization that happened to be able to support VFIO
hotplug, one mistake was made. In case a vIOMMU is created through a
virtio-iommu device, and no matter the "iommu" option value from the
VFIO device parameter, the VFIO device was always placed behind the
virtual IOMMU.
This commit fixes this wrong behavior by making sure the device
configuration is taken into account to decide if it should be attached
or not to the virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new id option to the VFIO hotplug command so that it matches the
VFIO coldplug semantic.
This is done by refactoring the existing code for VFIO hotplug, where
VmAddDeviceData structure is replaced by DeviceConfig. This structure is
the one used whenever a VFIO device is coldplugged, which is why it
makes sense to reuse it for the hotplug codepath.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add the ability to specify the "id" associated with a device, by adding
an extra option to the parameter --device.
This new option is not mandatory, and by default, the VMM will take care
of finding a unique identifier.
If the identifier provided by the user through this new option is not
unique, an error will be thrown and the VM won't be started.
Fixes#881
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The 32 bits MMIO address space is handled separately from the 64 bits
one. For this reason, we need to invoke the appropriate freeing function
to remove a range from this address space.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that PciDevice trait has a dedicated function to remove the bars,
the DeviceManager can invoke this function whenever a PCI device is
unplugged from the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Upon removal of a PCI device, make sure we don't hold onto the device ID
as it could be reused for another device later.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to handle the case where devices are very often plugged and
unplugged from a VM, we need to handle the PCI device ID allocation
better.
Any PCI device could be removed, which means we cannot simply rely on
the vector size to give the next available PCI device ID.
That's why this patch stores in memory the information about the 32
slots availability. Based on this information, whenever a new slot is
needed, the code can correctly provide an available ID, or simply return
an error because all slots are taken.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit ensures that when a VFIO device is hot-unplugged from the
VM, it is also removed from the VmConfig. This prevents a potential
reboot from creating the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new field to the DeviceConfig, allowing the VMM to allocate a name
to the VFIO devices.
By identifying a VFIO device with a unique name, we can make sure a user
can properly unplug it at any time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the new command "remove-device" that will let a
user hot-unplug a VFIO PCI device from an already running VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit implements the eject function so that a VFIO device will be
removed from any bus it might sit on, and from any list it might be
stored in.
The idea is to reach a point where there is no reference of the device
anywhere in the code, so that the Drop implementation will be invoked
and so that the device will be fully removed from the VMM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When the guest OS is done removing a PCI device, it will invoke the _EJ0
method from ACPI, associated with the device. This will trigger a port
IO write to a region known by the VMM. Upon this writing, the VMM will
trap the VM exit and retrieve the written value.
Based on the value, the VMM will invoke its eject_device() method to
finalize the removal of the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As we try to keep track of every PCI device related to the VM, we don't
want to have separate lists depending on the concrete type associated
with the PciDevice trait. Also, we want to be able to cast the actual
type into any trait or concrete type.
The most efficient way to solve all these issues is to store every
device as an Arc<dyn Any + Send + Sync>. This gives the ability to
downcast into the appropriate concrete type, and then to cast back into
any trait that we might need.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new list storing the device names across the entire codebase. VFIO
devices are added to the list whenever a new one is created. By default,
each VFIO device is given a name "vfioX" where X is the first available
integer.
Along with this new list of names, another list is created, grouping PCI
device's name with its associated b/d/f. This will be useful to keep
track of the created devices so that we can implement unplug
functionality.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The Vm structure was used to store a strong reference to the IO bus.
This is not needed anymore since the AddressManager is logically the
one holding this strong reference. This has been made possible by the
introduction of Weak references on the Bus structure itself.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the BusDevice devices are stored as Weak references by the
IO and MMIO buses, there's no need to use Weak references from the
DeviceManager anymore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the BusDevice devices are stored as Weak references by the
IO and MMIO buses, there's no need to use Weak references from the
CpuManager anymore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the BusDevice devices are stored as Weak references by the IO
and MMIO buses, there's no need to use Weak references from the PciBus
anymore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The point is to make sure the DeviceManager holds a strong reference of
each BusDevice inserted on the IO and MMIO buses. This will allow these
buses to hold Weak references onto the BusDevice devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The method add_vfio_device() from the DeviceManager needs to be mutable
if we want later to be able to update some internal fields from the
DeviceManager from this same function.
This commit simply takes care of making the necessary changes to change
this function as mutable.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's more logical to name the field referring to the DeviceManager as
"device_manager" instead of "devices".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By inserting the DeviceManager on the IO bus, we introduced some cyclic
dependency:
DeviceManager ---> AddressManager ---> Bus ---> BusDevice
^ |
| |
+---------------------------------------------+
This cycle needs to be broken by inserting a Weak reference instead of
an Arc (considered as a strong reference).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Ensures the configuration is updated after a new device has been
hotplugged. In the event of a reboot, this means the new VM will be
started with the new device that had been previously hotplugged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit finalizes the VFIO PCI hotplug support, based on all the
previous commits preparing for it.
One thing to notice, this does not support vIOMMU yet. This means we can
hotplug VFIO PCI devices, but we cannot attach them to an existing or a
new virtio-iommu device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This factorization is very important as it will allow both the standard
codepath and the VFIO PCI hotplug codepath to rely on the same function
to perform the addition of a new VFIO PCI device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>