Explicit call to 'close()' is required on file descriptors allocated
from 'epoll::create()', which is missing for the 'EpollContext' and
'VringWorker'. This patch enforces to close the file descriptors by
reusing the Drop trait of the 'File' struct.
Signed-off-by: Bo Chen <chen.bo@intel.com>
This module will be dedicated to DeviceNode and DeviceTree definitions
along with some dedicated unit tests.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adds DeviceManager method `make_virtio_fs_device` which creates a single
device, and modifies `make_virtio_fs_devices` to use this method.
Implements the new `vm.add-fs route`.
Signed-off-by: Dean Sheather <dean@coder.com>
Now that the restore path uses RestoreConfig structure, we add a new
parameter called "prefault" to it. This will give the user the ability
to populate the pages corresponding to the mapped regions backed by the
snapshotted memory files.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The goal here is to move the restore parameters into a dedicated
structure that can be reused from the entire codebase, making the
addition or removal of a parameter easier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This connects the dots together, making the request from the user reach
the actual implementation for restoring the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This connects the dots together, making the request from the user reach
the actual implementation for snapshotting the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is only implementing the send() function in order to store all Vm
states into a file.
This needs to be extended for live migration, by adding more transport
methods, and also the recv() function must be implemented.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
By aggregating snapshots from the CpuManager, the MemoryManager and the
DeviceManager, Vm implements the snapshot() function from the
Snapshottable trait.
And by restoring snapshots from the CpuManager, the MemoryManager and
the DeviceManager, Vm implements the restore() function from the
Snapshottable trait.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Implement the Snapshottable trait for Vcpu, and then implements it for
CpuManager. Note that CpuManager goes through the Snapshottable
implementation of Vcpu for every vCPU in order to implement the
Snapshottable trait for itself.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
A Snapshottable component can snapshot itself and
provide a MigrationSnapshot payload as a result.
A MigrationSnapshot payload is a map of component IDs to a list of
migration sections (MigrationSection). As component can be made of
several Migratable sub-components (e.g. the DeviceManager and its
device objects), a migration snapshot can be made of multiple snapshot
itself.
A snapshot is a list of migration sections, each section being a
component state snapshot. Having multiple sections allows for easier and
backward compatible migration payload extensions.
Once created, a migratable component snapshot may be transported and this
is what the Transportable trait defines, through 2 methods: send and recv.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
This separates the filters used between the VMM and API threads, so that
we can apply different rules for each thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the application of the seccomp filter to the VMM
thread. The filter is empty for now (SeccompLevel::None).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a new id option to the VFIO hotplug command so that it matches the
VFIO coldplug semantic.
This is done by refactoring the existing code for VFIO hotplug, where
VmAddDeviceData structure is replaced by DeviceConfig. This structure is
the one used whenever a VFIO device is coldplugged, which is why it
makes sense to reuse it for the hotplug codepath.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the new command "remove-device" that will let a
user hot-unplug a VFIO PCI device from an already running VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the new command "add-device" that will let a user
hotplug a VFIO PCI device to an already running VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The kernel does not adhere to the ACPI specification (probably to work
around broken hardware) and rather than busy looping after requesting an
ACPI reset it will attempt to reset by other mechanisms (such as i8042
reset.)
In order to trigger a reset the devices write to an EventFd (called
reset_evt.) This is used by the VMM to identify if a reset is requested
and make the VM reboot. As the reset_evt is part of the VMM and reused
for both the old and new VM it is possible for the newly booted VM to
immediately get reset as there is an old event sitting in the EventFd.
The simplest solution is to "drain" the reset_evt EventFd on reboot to
make sure that there is no spurious events in the EventFd.
Fixes: #783
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If no socket is supplied when enabling "vhost_user=true" on "--net"
follow the "exe" path in the /proc entry for this process and launch the
network backend (via the vmm_path field.)
Currently this only supports creating a new tap interface as the network
backend also only supports that.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It is necessary to do this at the start of the VMM execution rather than
later as it must be done in the main thread in order to satisfy the
checks required by PTRACE_MODE_READ_FSCREDS (see proc(5) and
ptrace(2))
The alternative is to run as CAP_SYS_PTRACE but that has its
disadvantages.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit introduces an empty implementation of both InterruptManager
and InterruptSourceGroup traits, as a proper basis for further
implementation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
For now the new memory size is only used after a reboot but support for
hotplugging memory will be added in a later commit.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to be able to support resizing either vCPUs or memory or both
make the fields in the resize command optional.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows us to change the memory map that is being used by the
devices via an atomic swap (by replacing the map with another one). The
ArcSwap provides the mechanism for atomically swapping from to another
whilst still giving good read performace. It is inside an Arc so that we
can use a single ArcSwap for all users.
Not covered by this change is replacing the GuestMemoryMmap itself.
This change also removes some vertical whitespace from use blocks in the
files that this commit also changed. Vertical whitespace was being used
inconsistently and broke rustfmt's behaviour of ordering the imports as
it would only do it within the block.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The memory manager is responsible for setting up the guest memory and in
the long term will also handle addition of guest memory.
In this commit move code for creating the backing memory and populating
the allocator into the new implementation trying to make as minimal
changes to other code as possible.
Follow on commits will further reduce some of the duplicated code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently only increasing the number of vCPUs is supported but in the
future it will be extended.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When vmm.ping give a response, we expect get the version from
the VMM not the vmm create
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
vmm.ping will help to check if http API server is up and
running.
This also removes the vmm.info endpoint.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Move CpuManager, Vcpu and related functionality to its own module (and
file) inside the VMM crate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove ACPI table creation from arch crate to the vmm crate simplifying
arch::configure_system()
GuestAddress(0) is used to mean no RSDP table rather than adding
complexity with a conditional argument or an Option type as it will
evaluate to a zero value which would be the default anyway.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In vm_reboot, while build the new vm, the old one pointed by self.vm
is not released, that is, the tap opened by self.vm is not closed
either. As a result, the associated dev name slot in host kernel is
still in use state, which prevents the new build from picking it up as
the new opened tap's name, but to use the name in next slot finally.
Call self.vm_shutdown instead here since it has call take() on vm reference,
which could ensure the old vm is destructed before the new vm build.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
In order to pause a VM, we signal all the vCPU threads to get them out
of vmx non-root. Once out, the vCPU thread will check for a an atomic
pause boolean. If it's set to true, then the thread will park until
being resumed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
So that we don't need to forward an ExitBehaviour up to the VMM thread.
This simplifies the control loop and the VMM thread even further.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We used to have errors definitions spread across vmm, vm, api,
and http.
We now have a cleaner separation: All API routines only return an
ApiResult. All VM operations, including the VMM wrappers, return a
VmResult. This makes it easier to carry errors up to the HTTP caller.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to support further use cases where a VM configuration could be
modified through the HTTP API, we only store the passed VM config when
being asked to create a VM. The actual creation will happen when booting
a new config for the first time.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We use the serde crate to serialize and deserialize the VmVConfig
structure. This structure will be passed from the HTTP API caller as a
JSON payload and we need to deserialize it into a VmConfig.
For a convenient use of the HTTP API, we also provide Default traits
implementations for some of the VmConfig fields (vCPUs, memory, etc...).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Cloud Hyper HTTP server runs a synchronous, multi-threaded
loop that receives HTTP requests and tries to call the corresponding
endpoint handlers for the requests URIs.
An endpoint handler will parse the HTTP request and potentially
translate it into and IPC request. The handler holds an notifier and an
mspc Sender for respectively notifying and sending the IPC payload to
the VMM API server. The handler then waits for an API server response
and translate it back into an HTTP response.
The HTTP server is responsible for sending the reponse back to the
caller.
The HTTP server uses a static routes hash table that maps URIs to
endpoint handlers.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We now start the main VMM thread, which will be listening for VM and IPC
related events.
In order to start the configured VM, we no longer directly call the VM
API but we use the IPC instead, to first create and then start a VM.
Fixes: #303
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Based on the newly defined Cloud Hypervisor IPC, those helpers send
VmCreate and VmStart requests respectively. This will be used by the
main thread to create and start a VM based on the CLI parameters.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This starts the main, single VMM thread, which:
1. Creates the VMM instance
2. Starts the VMM control loop
3. Manages the VMM control loop exits for handling resets and shutdowns.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Unlike the Vmm structure we removed with commit bdfd1a3f, this new one
is really meant to represent the VM monitoring/management object.
For that, we implement a control loop that will replace the one that's
currently embedded within the Vm structure itself.
This will allow us to decouple the VM lifecycle management from the VM
object itself, by having a constantly running VMM control loop.
Besides the VM specific events (exit, reset, stdin for now), the VMM
control loop also handles all the Cloud Hypervisor IPC requests.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The VMM thread and control loop will be the sole consumer of the
EpollContext and EpollDispatch API, so let's move it to lib.rs.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Cloud Hypervisor IPC is a simple, mpsc based protocol for threads to
send command to the furture VMM thread. This patch adds the API
definition for that IPC, which will be used by both the main thread
to e.g. start a new VM based on the CLI arguments and the future HTTP
server to relay external requests received from a local Unix domain
socket.
We are moving it to its own "api" module because this is where the
external API (HTTP based) will also be implemented.
The VMM thread will be listening for IPC requests from an mpsc receiver,
process them and send a response back through another mpsc channel.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
As we're going to move the control loop to the VMM thread, the exit and
reset EventFds are no longer going to be owned by the VM.
We pass a copy of them when creating the Vm instead.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to transfer the control loop to a separate VMM thread, we want
to shrink the VM control loop to a bare minimum.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Once passed to the VM creation routine, a VmConfig structure is
immutable. We can simply carry a Arc of it instead of a reference.
This also allows us to remove any lifetime bound from our VM.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Vmm structure is just a placeholder for the KVM instance. We can
create it directly from the VM creation routine instead.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We can integrate the kernel loading into the VM start method.
The VM start flow is then: Vm::new() -> vm.start(), which feels more
natural.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With ACPI disabled there is no way to support both reset and shutdown so
make the VMM exit if the VM is rebootet (via i8042 or triple-fault
reset.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Refactor out DeviceManager into it's own file. This is part of a bigger
effort to reduce complexity in the vm.rs file but will also allow future
separation to allow making PCI support optional.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Being able to reboot requires us to identify all the resources we are
leaking and cleaning those up before we can enable reboot. For now if
the user requests a reboot then shutdown instead.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a 2nd EventFd to the VM to control resetting (rebooting) the VM this
supplements the EventFd used for managing shutdown of the VM.
The default behaviour on i8042 or triple-fault based reset is currently
unchanged i.e. it will trigger a shutdown.
In order to support restarting the VM it was necessary to make start()
function take a reference to the config.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The command line parsing of the user input was not properly
abstracted from the vmm specific code. In the case of --net,
the parsing was done when the device manager was adding devices.
In order to fix this confusion, this patch introduces a new
module "config" dedicated to the translation of a VmParams
structure into a VmCfg structure. The former is built based
on the input provided by the user, while the latter is the
result of the parsing of every options.
VmCfg is meant to be consumed by the vmm specific code, and
it is also a fully public structure so that it can directly
be built from a testing environment.
Fixes#31
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use a catchall case for all reasons that we do not handle, and
move the vCPU run switch into its own function.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Based on the Firecracker devices crate from commit 9cdb5b2.
It is a trimmed down version compared to the Firecracker one, to remove
a bunch of pulled dependencies (logger, metrics, rate limiter, etc...).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>