Anticipating the need for a slightly different function for restoring
vCPUs, this patch factorizes most of the vCPU creation, so that it can
be reused for migration purposes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
These two new helpers will be useful to capture a vCPU state and being
able to restore it at a later time.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In anticipation of the CpuManager aggregating all Vcpu snapshots
together, this change makes sure the CpuManager has a handle onto
every vCPU.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Based on the list of Migratable devices stored by the DeviceManager, the
DeviceManager can implement the Snapshottable trait by aggregating all
device snapshots together.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Serial and Ioapic both implement the Migratable trait, hence the
DeviceManager can store them in the list of Migratable devices.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
We need the project to rely on kvm-bindings and kvm-ioctls branches
which include the serde derive to be able to serialize and deserialize
some KVM structures.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The parse_size helper function can now be consolidated into the
ByteSized FromStr implementation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that all parsing code makes use of Toggle and its FromStr support,
move the helper function into the from_str() implementation.
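
A minimal sketch of what folding the helper into from_str() could look
like. The Toggle layout, the error type and the accepted spellings
("on", "off", "true", "false", empty string) are assumptions made for
illustration, not taken from the actual code.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the Toggle type mentioned above.
#[derive(Debug, PartialEq)]
pub struct Toggle(pub bool);

/// Hypothetical error carrying the rejected input.
#[derive(Debug, PartialEq)]
pub struct ToggleParseError(pub String);

impl FromStr for Toggle {
    type Err = ToggleParseError;

    // The logic that used to live in a free helper function now sits
    // directly in the trait implementation, so callers use .parse().
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_lowercase().as_str() {
            // Assumption: an empty value means "off".
            "" | "off" | "false" => Ok(Toggle(false)),
            "on" | "true" => Ok(Toggle(true)),
            _ => Err(ToggleParseError(s.to_owned())),
        }
    }
}
```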
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The integration tests and documentation make use of empty value strings
like "--net tap=". Accept them but return None so that the default value
will be used as expected.
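
The behaviour described above can be sketched with a simplified parser.
This is not the real OptionParser: the struct layout, the add(),
parse() and get() signatures and the String error type are all
assumptions chosen to keep the example short.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical model of a "key=value,key=value" parser.
#[derive(Default)]
pub struct OptionParser {
    options: HashMap<String, Option<String>>,
}

impl OptionParser {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register a known option name.
    pub fn add(&mut self, name: &str) -> &mut Self {
        self.options.insert(name.to_owned(), None);
        self
    }

    /// Split the input on commas and store each value under its key.
    pub fn parse(&mut self, input: &str) -> Result<(), String> {
        for item in input.split(',').filter(|s| !s.is_empty()) {
            let mut parts = item.splitn(2, '=');
            let key = parts.next().unwrap_or("");
            let value = parts
                .next()
                .ok_or_else(|| format!("missing '=' in {}", item))?;
            match self.options.get_mut(key) {
                Some(slot) => *slot = Some(value.to_owned()),
                None => return Err(format!("unknown option {}", key)),
            }
        }
        Ok(())
    }

    /// An empty value such as "tap=" is accepted by parse() but is
    /// reported as None here, so the caller falls back to its default.
    pub fn get(&self, name: &str) -> Option<String> {
        self.options
            .get(name)
            .and_then(|v| v.clone())
            .filter(|v| !v.is_empty())
    }
}
```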
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This simplifies the parsing of the option by using OptionParser along
with its automatic conversion behaviour.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Byte sizes are quantities ending in "K", "M", "G" and by implementing
this type with a FromStr implementation the values can be converted
using .parse().
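
As an illustration, such a type could look roughly like the sketch
below. The tuple-struct layout, the error type, and the choice of binary
multiples (K = 2^10, M = 2^20, G = 2^30) are assumptions made for this
example.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the ByteSized type described above.
#[derive(Debug, PartialEq)]
pub struct ByteSized(pub u64);

/// Hypothetical error carrying the rejected input.
#[derive(Debug, PartialEq)]
pub struct ByteSizedParseError(pub String);

impl FromStr for ByteSized {
    type Err = ByteSizedParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        // Map the optional trailing suffix to a power-of-two shift.
        // Assumption: suffixes denote binary multiples.
        let (digits, shift) = if let Some(d) = s.strip_suffix('K') {
            (d, 10)
        } else if let Some(d) = s.strip_suffix('M') {
            (d, 20)
        } else if let Some(d) = s.strip_suffix('G') {
            (d, 30)
        } else {
            (s, 0)
        };
        let value: u64 = digits
            .parse()
            .map_err(|_| ByteSizedParseError(s.to_owned()))?;
        // Reject values that would overflow instead of wrapping.
        value
            .checked_mul(1u64 << shift)
            .map(ByteSized)
            .ok_or_else(|| ByteSizedParseError(s.to_owned()))
    }
}
```

With this in place a size option can be read with something like
"512M".parse::<ByteSized>().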
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Before porting over to OptionParser add a unit test to validate the
current memory parsing code. This showed up a bug where the "size=" was
always required. Temporarily resolve this by assigning the string a
default value which will later be replaced when the code is refactored.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This will be used to simplify and consolidate much of the parsing code
used for command line parameters.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.
A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As a component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can itself be made of multiple
snapshots.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.
Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.
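
A rough sketch of the aggregation idea, assuming a much simplified
Snapshottable trait. The field names, the Device type and the exact
method signature are hypothetical; the Transportable side (send/recv) is
left out.

```rust
use std::collections::HashMap;

/// One piece of component state (hypothetical layout).
#[derive(Clone, Debug, PartialEq)]
pub struct MigrationSection {
    pub data: Vec<u8>,
}

/// Map of component IDs to their list of migration sections.
pub type MigrationSnapshot = HashMap<String, Vec<MigrationSection>>;

pub trait Snapshottable {
    fn snapshot(&self) -> MigrationSnapshot;
}

/// Hypothetical leaf component with some opaque state.
pub struct Device {
    pub id: String,
    pub state: Vec<u8>,
}

impl Snapshottable for Device {
    fn snapshot(&self) -> MigrationSnapshot {
        let section = MigrationSection { data: self.state.clone() };
        let mut snapshot = MigrationSnapshot::new();
        snapshot.insert(self.id.clone(), vec![section]);
        snapshot
    }
}

/// A component made of several sub-components merges their snapshots.
pub struct DeviceManager {
    pub devices: Vec<Box<dyn Snapshottable>>,
}

impl Snapshottable for DeviceManager {
    fn snapshot(&self) -> MigrationSnapshot {
        let mut snapshot = MigrationSnapshot::new();
        for device in &self.devices {
            snapshot.extend(device.snapshot());
        }
        snapshot
    }
}
```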
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
On some systems, Cloud-Hypervisor uses the open() system call, which is
why it should be part of the seccomp filters whitelist.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both clock_gettime and gettimeofday syscalls were missing when running
Cloud-Hypervisor on a Linux host without vDSO enabled. On a system with
vDSO enabled, the syscalls performed by vDSO were not filtered, that's
why we didn't have to whitelist them.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Extend the update_memory() method from DeviceManager so that VFIO PCI
devices can update their DMA mappings to the physical IOMMU, after a
memory hotplug has been performed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever the memory is resized, it's important to retrieve the new
region to pass it down to the device manager, this way it can decide
what to do with it.
Also, there's no need to use a boolean as we can instead use an Option
to carry the information about the region. In case of virtio-mem, there
will be no region since the whole memory has been reserved up front by
the VMM at boot. This means only the ACPI hotplug will return a region
and is the only method that requires the memory to be updated from the
device manager.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Commit 2adddce2 reorganized the crate for a cleaner multi architecture
(x86_64 and aarch64) support.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
For now, the codebase does not support booting from initramfs with PVH
boot protocol, therefore we need to fall back to the legacy boot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
* load the initramfs File into the guest memory, aligned to page size
* finally setup the initramfs address and its size into the boot params
(in configure_64bit_boot)
Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
Currently unused, the initramfs argument is added to the CLI
and stored in vmm::config::VmConfig as an Option(InitramfsConfig(PathBuf))
Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
The persistent memory will be hotplugged via DeviceManager and saved in
the config for later use.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split it into a method that creates a single device which is called by
the multiple device version so this can be used when dynamically adding
a device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit adds a new option, hotplug_method, to the memory config.
It sets the hotplug method to either "acpi" or "virtio-mem".
Signed-off-by: Hui Zhu <teawater@antfin.com>
The persistent memory will be hotplugged via DeviceManager and saved in
the config for later use.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split it into a method that creates a single device which is called by
the multiple device version so this can be used when dynamically adding
a device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split it into a method that creates a single device which is called by
the multiple device version so this can be used when dynamically adding
a device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Whenever the VM memory is resized, DeviceManager needs to be notified
so that it can subsequently notify each virtio device about it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This separates the filters used between the VMM and API threads, so that
we can apply different rules for each thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the application of the seccomp filter to the VMM
thread. The filter is empty for now (SeccompLevel::None).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the seccomp crate, we create a new vmm module responsible for
creating a seccomp filter that will be applied to the VMM main thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The seccomp crate from Firecracker is nicely implemented, documented and
tested, which is a good reason for relying on it to create and apply
seccomp filters.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This opens the backing file read-only, makes the pages in the mmap()
read-only and also makes the KVM mapping read-only. The file is also
mapped with MAP_PRIVATE to make the changes local to this process only.
This is a functional alternative to having support for making a
virtio-pmem device readonly. Unfortunately there is no concept of
readonly virtio-pmem (or any type of NVDIMM/PMEM) in the Linux kernel, so
having a block device that appears readonly in the guest would require
significant specification and kernel changes.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use this boolean to turn on the KVM_MEM_READONLY flag to indicate that
this memory mapping should not be writable by the VM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
According to `asm-generic/termios.h`, the `struct winsize` should be:
    struct winsize {
        unsigned short ws_row;
        unsigned short ws_col;
        unsigned short ws_xpixel;
        unsigned short ws_ypixel;
    };
The ioctl of TIOCGWINSZ will trigger a segfault on aarch64.
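
For reference, a Rust-side mirror of that layout could be sketched as
below. The type name WinSize and the WINSIZE_BYTES constant are
hypothetical; the point is only that four unsigned shorts map to four
u16 fields with a C-compatible layout. A Rust struct that is smaller
than what the kernel writes is one plausible way for the ioctl to
corrupt memory.

```rust
use std::mem::size_of;

/// Mirror of `struct winsize`: four consecutive unsigned shorts.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct WinSize {
    pub ws_row: u16,
    pub ws_col: u16,
    pub ws_xpixel: u16,
    pub ws_ypixel: u16,
}

/// Size in bytes of the buffer handed to TIOCGWINSZ in this sketch.
pub const WINSIZE_BYTES: usize = size_of::<WinSize>();
```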
Signed-off-by: Qiu Wenbo <qiuwenbo@phytium.com.cn>
This feature is stable and there is no need for this to be behind a
flag. This will also reduce the time needed to run the integration test
as we will not be running them all again under the flag.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This table currently contains only the VFIO devices and it should
really contain all the PCI devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Previously this was only returned if the device had an IOMMU mapping and
whether the device should be added to the virtio-iommu. This was already
captured earlier as part of creating the device so use that information
instead.
Always returning the B/D/F is helpful as it facilitates virtio PCI
device hotplug.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
I spent a few minutes trying to understand why we were unconditionally
updating the VM config memory size, even if the guest memory resizing
did not happen.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The IORT table for virtio-iommu use was removed and replaced with a
purely virtio based solution. Although the table construction was
removed these structures were left behind.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use a new feature called "pvh_boot" to enable using the PVH boot
protocol if the guest kernel supports it. The feature can be enabled
by building with:
    cargo build [--release] --features "pvh_boot"
Once performance has been evaluated, this can be made part of the
default set of features so that any guest that supports it boots
using PVH as the preferred option as is the case in QEMU.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Fill the hvm_start_info and related memory map structures as
specified in the PVH boot protocol. Write the data structures
to guest memory at the GPA that will be stored in %rbx when
the guest starts.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
In order to properly initialize the kvm regs/sregs structs for
the guest, the load_kernel() return type must specify which
boot protocol to use with the entry point address it returns.
Make load_kernel() return an EntryPoint struct containing the
required information. This structure will later be used
in the vCPU configuration methods to setup the appropriate
initial conditions for the guest.
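
One possible shape for this, as a sketch only. The enum and struct
names, the select_entry_point() helper and the InitialRegs type are
assumptions for illustration. The %rbx convention for PVH comes from
the PVH boot description in this series; %rsi pointing at the boot
params is the usual 64-bit Linux boot convention.

```rust
/// Which boot protocol the returned entry point expects.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BootProtocol {
    LinuxBoot,
    PvhBoot,
}

/// Entry point address plus the protocol to use with it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct EntryPoint {
    pub entry_addr: u64,
    pub protocol: BootProtocol,
}

/// Hypothetical helper: prefer the PVH entry point when the kernel
/// advertises one, otherwise use the Linux 64-bit entry point.
pub fn select_entry_point(pvh_entry: Option<u64>, linux_entry: u64) -> EntryPoint {
    match pvh_entry {
        Some(addr) => EntryPoint { entry_addr: addr, protocol: BootProtocol::PvhBoot },
        None => EntryPoint { entry_addr: linux_entry, protocol: BootProtocol::LinuxBoot },
    }
}

/// Tiny subset of the vCPU registers relevant to this sketch.
#[derive(Debug, Default, PartialEq)]
pub struct InitialRegs {
    pub rip: u64,
    pub rbx: u64,
    pub rsi: u64,
}

/// The vCPU setup branches on the protocol carried by EntryPoint.
pub fn initial_regs(entry: EntryPoint, boot_data_addr: u64) -> InitialRegs {
    match entry.protocol {
        // PVH: %rbx holds the address of hvm_start_info.
        BootProtocol::PvhBoot => InitialRegs {
            rip: entry.entry_addr,
            rbx: boot_data_addr,
            rsi: 0,
        },
        // Linux 64-bit boot: %rsi holds the address of the boot params.
        BootProtocol::LinuxBoot => InitialRegs {
            rip: entry.entry_addr,
            rbx: 0,
            rsi: boot_data_addr,
        },
    }
}
```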
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
When using "--disk" with a vhost socket and not using self spawning then
it is not necessary or helpful to specify the path.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By using a Vec to hold the list of devices on the PciBus, there's a
problem when we use unplug. Indeed, the vector of devices gets reduced
and if the unplugged device was not the last one from the list, every
other device after this one is shifted on the bus.
To solve this problem, a HashMap is used. This makes it possible to keep track of
the exact place where each device stands on the bus.
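
The difference can be sketched as follows. This is a toy model: the
PciBus here stores device names instead of real device objects, and the
method names are hypothetical.

```rust
use std::collections::HashMap;

/// Toy bus keyed by PCI device ID, so removing one entry never moves
/// the others (unlike Vec::remove, which shifts later elements).
#[derive(Default)]
pub struct PciBus {
    devices: HashMap<u32, String>,
}

impl PciBus {
    pub fn add_device(&mut self, device_id: u32, name: &str) -> Result<(), String> {
        if self.devices.contains_key(&device_id) {
            return Err(format!("slot {} already in use", device_id));
        }
        self.devices.insert(device_id, name.to_owned());
        Ok(())
    }

    pub fn remove_by_device(&mut self, device_id: u32) -> Option<String> {
        self.devices.remove(&device_id)
    }

    pub fn device_at(&self, device_id: u32) -> Option<&str> {
        self.devices.get(&device_id).map(String::as_str)
    }
}
```

After unplugging the device in slot 1, the device in slot 2 is still
found at slot 2.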
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The desired_ram option is expressed in bytes, so increase the amount of
memory to add.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
With some of the factorization that happened to be able to support VFIO
hotplug, one mistake was made. In case a vIOMMU is created through a
virtio-iommu device, and no matter the "iommu" option value from the
VFIO device parameter, the VFIO device was always placed behind the
virtual IOMMU.
This commit fixes this wrong behavior by making sure the device
configuration is taken into account to decide if it should be attached
or not to the virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new id option to the VFIO hotplug command so that it matches the
VFIO coldplug semantic.
This is done by refactoring the existing code for VFIO hotplug, where
VmAddDeviceData structure is replaced by DeviceConfig. This structure is
the one used whenever a VFIO device is coldplugged, which is why it
makes sense to reuse it for the hotplug codepath.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add the ability to specify the "id" associated with a device, by adding
an extra option to the parameter --device.
This new option is not mandatory, and by default, the VMM will take care
of finding a unique identifier.
If the identifier provided by the user through this new option is not
unique, an error will be thrown and the VM won't be started.
Fixes #881
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The 32-bit MMIO address space is handled separately from the 64-bit
one. For this reason, we need to invoke the appropriate freeing function
to remove a range from this address space.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that PciDevice trait has a dedicated function to remove the bars,
the DeviceManager can invoke this function whenever a PCI device is
unplugged from the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Upon removal of a PCI device, make sure we don't hold onto the device ID
as it could be reused for another device later.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to handle the case where devices are very often plugged and
unplugged from a VM, we need to handle the PCI device ID allocation
better.
Any PCI device could be removed, which means we cannot simply rely on
the vector size to give the next available PCI device ID.
That's why this patch stores in memory the information about the 32
slots availability. Based on this information, whenever a new slot is
needed, the code can correctly provide an available ID, or simply return
an error because all slots are taken.
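
A minimal sketch of such slot tracking, assuming a plain array of 32
booleans. The type and method names (SlotAllocator, next_id, put_id) and
the error enum are hypothetical.

```rust
const NUM_DEVICE_IDS: usize = 32;

#[derive(Debug, PartialEq)]
pub enum SlotError {
    NoneAvailable,
    InvalidId(u32),
}

/// Tracks which of the 32 PCI device slots are currently taken.
pub struct SlotAllocator {
    used: [bool; NUM_DEVICE_IDS],
}

impl SlotAllocator {
    pub fn new() -> Self {
        SlotAllocator { used: [false; NUM_DEVICE_IDS] }
    }

    /// Return the lowest free slot, or an error if all are taken.
    pub fn next_id(&mut self) -> Result<u32, SlotError> {
        for (idx, taken) in self.used.iter_mut().enumerate() {
            if !*taken {
                *taken = true;
                return Ok(idx as u32);
            }
        }
        Err(SlotError::NoneAvailable)
    }

    /// Release a slot so that it can be handed out again later.
    pub fn put_id(&mut self, id: u32) -> Result<(), SlotError> {
        match self.used.get_mut(id as usize) {
            Some(taken) => {
                *taken = false;
                Ok(())
            }
            None => Err(SlotError::InvalidId(id)),
        }
    }
}
```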
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit ensures that when a VFIO device is hot-unplugged from the
VM, it is also removed from the VmConfig. This prevents a potential
reboot from creating the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new field to the DeviceConfig, allowing the VMM to allocate a name
to the VFIO devices.
By identifying a VFIO device with a unique name, we can make sure a user
can properly unplug it at any time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the new command "remove-device" that will let a
user hot-unplug a VFIO PCI device from an already running VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit implements the eject function so that a VFIO device will be
removed from any bus it might sit on, and from any list it might be
stored in.
The idea is to reach a point where there is no reference of the device
anywhere in the code, so that the Drop implementation will be invoked
and so that the device will be fully removed from the VMM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When the guest OS is done removing a PCI device, it will invoke the _EJ0
method from ACPI, associated with the device. This will trigger a port
IO write to a region known by the VMM. Upon this writing, the VMM will
trap the VM exit and retrieve the written value.
Based on the value, the VMM will invoke its eject_device() method to
finalize the removal of the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As we try to keep track of every PCI device related to the VM, we don't
want to have separate lists depending on the concrete type associated
with the PciDevice trait. Also, we want to be able to cast the actual
type into any trait or concrete type.
The most efficient way to solve all these issues is to store every
device as an Arc<dyn Any + Send + Sync>. This gives the ability to
downcast into the appropriate concrete type, and then to cast back into
any trait that we might need.
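
The downcast-then-recast pattern can be sketched like this. The
PciDevice trait is reduced to a single method and VfioPciDevice is a
placeholder struct; only Arc::downcast and the unsized coercion are
standard library behaviour.

```rust
use std::any::Any;
use std::sync::{Arc, Mutex};

/// Reduced, hypothetical version of the PciDevice trait.
pub trait PciDevice: Send {
    fn name(&self) -> String;
}

/// Placeholder concrete device type.
pub struct VfioPciDevice {
    pub name: String,
}

impl PciDevice for VfioPciDevice {
    fn name(&self) -> String {
        self.name.clone()
    }
}

/// Recover a trait object from the type-erased handle: first downcast
/// to the concrete type, then coerce into the trait we need.
pub fn as_pci_device(
    any: Arc<dyn Any + Send + Sync>,
) -> Option<Arc<Mutex<dyn PciDevice>>> {
    any.downcast::<Mutex<VfioPciDevice>>()
        .ok()
        .map(|dev| dev as Arc<Mutex<dyn PciDevice>>)
}
```

A single list of Arc<dyn Any + Send + Sync> can then hold every device,
whatever its concrete type.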
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new list storing the device names across the entire codebase. VFIO
devices are added to the list whenever a new one is created. By default,
each VFIO device is given a name "vfioX" where X is the first available
integer.
Along with this new list of names, another list is created, grouping each
PCI device's name with its associated b/d/f. This will be useful to keep
track of the created devices so that we can implement unplug
functionality.
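
The "first available integer" naming could be sketched as below. The
function name and the use of a HashSet for the list of names are
assumptions.

```rust
use std::collections::HashSet;

/// Return "vfioX" where X is the smallest integer not already used.
pub fn next_vfio_name(used: &HashSet<String>) -> String {
    let mut index = 0usize;
    loop {
        let candidate = format!("vfio{}", index);
        if !used.contains(&candidate) {
            return candidate;
        }
        index += 1;
    }
}
```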
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The Vm structure was used to store a strong reference to the IO bus.
This is not needed anymore since the AddressManager is logically the
one holding this strong reference. This has been made possible by the
introduction of Weak references on the Bus structure itself.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the BusDevice devices are stored as Weak references by the
IO and MMIO buses, there's no need to use Weak references from the
DeviceManager anymore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the BusDevice devices are stored as Weak references by the
IO and MMIO buses, there's no need to use Weak references from the
CpuManager anymore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that the BusDevice devices are stored as Weak references by the IO
and MMIO buses, there's no need to use Weak references from the PciBus
anymore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The point is to make sure the DeviceManager holds a strong reference of
each BusDevice inserted on the IO and MMIO buses. This will allow these
buses to hold Weak references onto the BusDevice devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The method add_vfio_device() from the DeviceManager needs to be mutable
if we want later to be able to update some internal fields from the
DeviceManager from this same function.
This commit simply takes care of making the necessary changes to make
this function mutable.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It's more logical to name the field referring to the DeviceManager as
"device_manager" instead of "devices".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By inserting the DeviceManager on the IO bus, we introduced some cyclic
dependency:
DeviceManager ---> AddressManager ---> Bus ---> BusDevice
^ |
| |
+---------------------------------------------+
This cycle needs to be broken by inserting a Weak reference instead of
an Arc (considered as a strong reference).
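
A reduced sketch of a bus holding Weak references. The Bus and
BusDevice definitions are simplified stand-ins (index-based lookup
instead of address ranges) and Dummy is a placeholder device.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Simplified stand-in for the BusDevice trait.
pub trait BusDevice: Send {
    fn read(&mut self, offset: u64) -> u8;
}

/// The bus only keeps Weak references, so it never keeps a device
/// (or the DeviceManager itself) alive on its own.
#[derive(Default)]
pub struct Bus {
    devices: Vec<Weak<Mutex<dyn BusDevice>>>,
}

impl Bus {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn insert(&mut self, device: &Arc<Mutex<dyn BusDevice>>) {
        self.devices.push(Arc::downgrade(device));
    }

    /// Returns None if the device has already been dropped.
    pub fn read(&self, index: usize, offset: u64) -> Option<u8> {
        let device = self.devices.get(index)?.upgrade()?;
        let value = device.lock().unwrap().read(offset);
        Some(value)
    }
}

/// Placeholder device used to exercise the bus.
pub struct Dummy;

impl BusDevice for Dummy {
    fn read(&mut self, _offset: u64) -> u8 {
        0xab
    }
}
```

Because the bus does not bump the strong count, dropping the owner's
Arc is enough to release the device.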
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Ensures the configuration is updated after a new device has been
hotplugged. In the event of a reboot, this means the new VM will be
started with the new device that had been previously hotplugged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit finalizes the VFIO PCI hotplug support, based on all the
previous commits preparing for it.
Note that this does not support vIOMMU yet. This means we can
hotplug VFIO PCI devices, but we cannot attach them to an existing or a
new virtio-iommu device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This factorization is very important as it will allow both the standard
codepath and the VFIO PCI hotplug codepath to rely on the same function
to perform the addition of a new VFIO PCI device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whenever the user wants to hotplug a new VFIO PCI device, the VMM will
have to trigger a hotplug notification through the GED device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the new command "add-device" that will let a user
hotplug a VFIO PCI device to an already running VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Through the BusDevice implementation from the DeviceManager, and by
inserting the DeviceManager on the IO bus for a specific IO port range,
the VMM now has the ability to handle PCI device hotplug.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation of inserting the DeviceManager on the IO/MMIO buses,
the DeviceManager must implement the BusDevice trait.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Create a small method that will perform both hotplug of all the devices
identified by PCIU bitmap, and then perform the hotunplug of all the
devices identified by the PCID bitmap.
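
The scan order can be modelled in Rust as follows (the real method lives
in the ACPI tables, not in Rust). The sketch assumes that bit N of each
32-bit bitmap corresponds to slot N; the function and enum names are
hypothetical.

```rust
/// Action to take for a given slot.
#[derive(Debug, PartialEq)]
pub enum SlotAction {
    Add(u32),
    Remove(u32),
}

/// List the slots whose bit is set in the bitmap.
/// Assumption: bit N stands for slot N.
pub fn slots_in_bitmap(bitmap: u32) -> Vec<u32> {
    (0u32..32)
        .filter(|&slot| (bitmap & (1u32 << slot)) != 0)
        .collect()
}

/// First handle every slot flagged in PCIU, then every slot in PCID.
pub fn scan_slots(pciu: u32, pcid: u32) -> Vec<SlotAction> {
    let mut actions: Vec<SlotAction> = slots_in_bitmap(pciu)
        .into_iter()
        .map(SlotAction::Add)
        .collect();
    actions.extend(slots_in_bitmap(pcid).into_iter().map(SlotAction::Remove));
    actions
}
```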
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The _EJ0 method provides the guest OS a way to notify the VMM that the
device has been properly ejected from the guest OS. Only after this
point, the VMM can fully remove the device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new PHPR device in the DSDT table introduces some specific
operation regions and the associated fields.
PCIU stands for "PCI up", which identifies PCI devices that must be
added.
PCID stands for "PCI down", which identifies PCI devices that must be
removed.
B0EJ stands for "Bus 0 eject", which identifies which device on the bus
has been ejected by the guest OS.
Thanks to these fields, the VMM and the guest OS can communicate while
performing hotplug/hotunplug operations.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adds the DVNT method to the PCI0 device in the DSDT table. This new
method is responsible for checking each slot and notifying the guest OS
if one of the slots is supposed to be added or removed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the ACPI support for describing the 32 device
slots attached to the main PCI host bridge.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In anticipation of the support for device hotplug, this commit moves the
DeviceManager object into an Arc<Mutex<>> when the DeviceManager is
being created. The reason is, we need the DeviceManager to implement the
BusDevice trait and then provide it to the IO bus, so that IO accesses
related to device hotplug can be handled correctly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We want to avoid losing interrupts while they are masked. The
way they can be lost is due to the internals of how they are connected
through KVM. An eventfd is registered to a specific GSI, and then a
route is associated with this same GSI.
The current code adds/removes a route whenever a mask/unmask action
happens. The problem with this approach is that KVM will consume the
eventfd but won't be able to find an associated route, so it won't
be able to deliver the interrupt.
That's why this patch introduces a different way of masking/unmasking
the interrupts, simply by registering/unregistering the eventfd with the
GSI. This way, when the vector is masked, the eventfd is going to be
written but nothing will happen because KVM won't consume the event.
Whenever the unmask happens, the eventfd will be registered with a
specific GSI, and if there are pending events, KVM will trigger them,
based on the route associated with the GSI.
Suggested-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Recently, vhost_user_block gained the ability of actively polling the
queue, a feature that can be disabled with the poll_queue property.
This change adds this property to DiskConfig, so it can be used
through the "disk" argument.
For the moment, it can only be used when vhost_user=true, but this
will change once virtio-block gets the poll_queue feature too.
Fixes: #787
Signed-off-by: Sergio Lopez <slp@redhat.com>
Fix "readonly" and "wce" defaults in cloud-hypervisor.yaml to match
their respective defaults in config.rs:DiskConfig.
Signed-off-by: Sergio Lopez <slp@redhat.com>
It's missing a few knobs (readonly, vhost, wce) that should be exposed
through the rest API.
Fixes: #790
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The kernel does not adhere to the ACPI specification (probably to work
around broken hardware) and rather than busy looping after requesting an
ACPI reset it will attempt to reset by other mechanisms (such as i8042
reset.)
In order to trigger a reset the devices write to an EventFd (called
reset_evt.) This is used by the VMM to identify if a reset is requested
and make the VM reboot. As the reset_evt is part of the VMM and reused
for both the old and new VM it is possible for the newly booted VM to
immediately get reset as there is an old event sitting in the EventFd.
The simplest solution is to "drain" the reset_evt EventFd on reboot to
make sure that there are no spurious events in the EventFd.
Fixes: #783
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the latest vm-memory version, including the freshly
introduced structure GuestMemoryAtomic, this patch replaces every
occurrence of Arc<ArcSwap<GuestMemoryMmap>> with
GuestMemoryAtomic<GuestMemoryMmap>.
The point is to rely on the common RCU-like implementation from
vm-memory so that we don't have to do it from Cloud-Hypervisor.
Fixes #735
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If no socket is supplied when enabling "vhost_user=true" on "--disk"
follow the "exe" path in the /proc entry for this process and launch the
block backend (via the vmm_path field.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When a virtio-fs device is created with a dedicated shared region, by
default the region should be mapped as PROT_NONE so that no pages can be
faulted in.
It's only when the guest performs the mount of the virtiofs filesystem
that we can expect the VMM, on behalf of the backend, to perform some
new mappings in the reserved shared window, using PROT_READ and/or
PROT_WRITE.
Fixes #763
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
If no socket is supplied when enabling "vhost_user=true" on "--net"
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field.)
Currently this only supports creating a new tap interface as the network
backend also only supports that.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It is necessary to do this at the start of the VMM execution rather than
later as it must be done in the main thread in order to satisfy the
checks required by PTRACE_MODE_READ_FSCREDS (see proc(5) and
ptrace(2))
The alternative is to run as CAP_SYS_PTRACE but that has its
disadvantages.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the ioctl syscall KVM_CREATE_VM gets interrupted while creating the
VM, it is expected that we should retry since EINTR should not be
considered a standard error.
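
The retry logic can be sketched generically. The helper name
retry_on_eintr is hypothetical and the closure stands in for the actual
KVM_CREATE_VM call; EINTR surfaces in Rust as
io::ErrorKind::Interrupted.

```rust
use std::io;

/// Re-run `op` for as long as it fails with EINTR; any other result
/// (success or a different error) is returned to the caller.
pub fn retry_on_eintr<T, F>(mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    loop {
        match op() {
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            other => return other,
        }
    }
}
```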
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the InterruptManager trait depend on an InterruptType forces
implementations into supporting potentially very different kinds of
interrupts from the same code base. What we're defining through the
current, interrupt type based create_group() method is a need for having
different interrupt managers for different kinds of interrupts.
By associating the InterruptManager trait to an interrupt group
configuration type, we create a cleaner design to support that need as
we're basically saying that one interrupt manager should have the single
responsibility of supporting one kind of interrupt (defined through its
configuration).
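
The design can be illustrated with an associated type on the trait. This
is a simplified sketch: the config structs, the manager types and the
return value (a plain list of interrupt numbers) are assumptions, not
the real interfaces.

```rust
/// Each manager handles exactly one kind of interrupt, identified by
/// the configuration type it accepts.
pub trait InterruptManager {
    type GroupConfig;

    /// Simplified: returns the interrupt numbers making up the group.
    fn create_group(&self, config: Self::GroupConfig) -> Result<Vec<u32>, String>;
}

/// Hypothetical configuration for a single legacy (pin-based) IRQ.
pub struct LegacyIrqGroupConfig {
    pub irq: u32,
}

/// Hypothetical configuration for a contiguous range of MSI vectors.
pub struct MsiIrqGroupConfig {
    pub base: u32,
    pub count: u32,
}

pub struct LegacyInterruptManager;
pub struct MsiInterruptManager;

impl InterruptManager for LegacyInterruptManager {
    type GroupConfig = LegacyIrqGroupConfig;

    fn create_group(&self, config: Self::GroupConfig) -> Result<Vec<u32>, String> {
        Ok(vec![config.irq])
    }
}

impl InterruptManager for MsiInterruptManager {
    type GroupConfig = MsiIrqGroupConfig;

    fn create_group(&self, config: Self::GroupConfig) -> Result<Vec<u32>, String> {
        if config.count == 0 {
            return Err("an MSI group needs at least one vector".to_owned());
        }
        Ok((config.base..config.base + config.count).collect())
    }
}
```

Passing an MSI configuration to the legacy manager is now a compile-time
type error rather than a runtime check on an interrupt type value.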
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We create 2 different interrupt managers for separately handling
creation of legacy and MSI interrupt groups.
Doing so allows us to have a cleaner interrupt manager and IOAPIC
initialization path. It also prepares for an InterruptManager trait
design improvement where we remove the interrupt source type dependency
by associating an interrupt configuration type to the trait.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
A reference to the VmFd is stored on the AddressManager so it is not
necessary to pass in the VmInfo into all methods that need it as it can
be obtained from the AddressManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The DeviceManager has a reference to the MemoryManager so use that to
get the GuestMemoryMmap rather than the version stored in the VmInfo
struct.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove the use of vm_info in methods to get the config and instead use
the config stored on the DeviceManager itself.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. This prepares the way to more easily store state on
the DeviceManager.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove some in/out parameters and instead rely on them as members of the
&mut self parameter. A follow-up commit will change the callee functions
that create the devices themselves.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Modify these functions to take an &mut self and become methods on
DeviceManager. This allows the removal of some in/out parameters and
leads the way to further refactoring and simplification.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The MemoryManager should only be included on the I/O bus when doing ACPI
builds as that is the only time it will be interrogated.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently the MemoryManager is only used on the ACPI code paths after
the DeviceManager has been created. This will change in a future commit
as part of the refactoring so for now always include it but name it with
underscore prefix to indicate it might not always be used.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that devices attached to the virtual IOMMU are described through
virtio configuration, there is no need for the DeviceManager to store
the list of IDs for all these devices. Instead, things are handled
locally when PCI devices are being added.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of relying on the ACPI tables to describe the devices attached
to the virtual IOMMU, let's use the virtio topology, as the ACPI support
is getting deprecated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-block and
vhost-user-block. For now it is necessary to specify both vhost_user
and socket parameters as auto activation is not yet implemented. The wce
parameter for supporting "Write Cache Enabling" is also added to the
disk configuration.
The original command line parameter is still supported for now and will
be removed in a future release.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-net and vhost-user-net.
For now it is necessary to specify both vhost_user and socket parameters
as auto activation is not yet implemented. The original command line
parameter is still supported for now.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit improves the existing virtio-blk implementation, allowing
for better I/O performance. The cost for the end user is to accept
allocating more vCPUs to the virtual machine, so that multiple I/O
threads can run in parallel.
Note that the number of vCPUs must be equal to or greater than the
number of queues dedicated to the virtio-blk device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>