We used to have errors definitions spread across vmm, vm, api,
and http.
We now have a cleaner separation: All API routines only return an
ApiResult. All VM operations, including the VMM wrappers, return a
VmResult. This makes it easier to carry errors up to the HTTP caller.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to support further use cases where a VM configuration could be
modified through the HTTP API, we only store the passed VM config when
being asked to create a VM. The actual creation will happen when booting
a new config for the first time.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We use the serde crate to serialize and deserialize the VmVConfig
structure. This structure will be passed from the HTTP API caller as a
JSON payload and we need to deserialize it into a VmConfig.
For a convenient use of the HTTP API, we also provide Default traits
implementations for some of the VmConfig fields (vCPUs, memory, etc...).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The linux_loader crate Cmdline struct is not serializable.
Instead of forcing the upstream create to carry a serde dependency, we
simply use a String for the passed command line and build the actual
CmdLine when we need it (in vm::new()).
Also, the cmdline offset is not a configuration knob, so we remove it.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
They point to a vm_virtio structure (VhostUserConfig) and in order to
make the whole config serializable (through the serde crate for
example), we'd have to add a serde dependency to the vm_virtio crate.
Instead we use a local, serializable structure and convert it to
VhostUserConfig from the DeviceManager code.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The kernel path was the only mandatory command line option.
With the addition of the --api-socket option, we can run without a
kernel path and get it later through the API.
Since we can end up with VM configurations that are no longer valid by
default, we need to provide a validation check for it. For now, if the
kernel path is not defined, the VM configuration is invalid.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Cloud Hyper HTTP server runs a synchronous, multi-threaded
loop that receives HTTP requests and tries to call the corresponding
endpoint handlers for the requests URIs.
An endpoint handler will parse the HTTP request and potentially
translate it into and IPC request. The handler holds an notifier and an
mspc Sender for respectively notifying and sending the IPC payload to
the VMM API server. The handler then waits for an API server response
and translate it back into an HTTP response.
The HTTP server is responsible for sending the reponse back to the
caller.
The HTTP server uses a static routes hash table that maps URIs to
endpoint handlers.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The cloud-hypervisor API uses HTTP as a transport and is accessible
through a local UNIX socket.
The API root path is /api/v1 and is a collection of RPC-style methods.
All methods are static, unlike typical REST APIs. Variable (e.g. device
IDs) are passed through the request body.
Fixes: #244
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Based off of crosvm revision b5237bbcf074eb30cf368a138c0835081e747d71
add a CMOS device. This environments that can't use KVM clock to get the
current time (e.g. Windows and EFI.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Refactor the PCI datastructures to move the device ownership to a PciBus
struct. This PciBus struct can then be used by both a PciConfigIo and
PciConfigMmio in order to expose the configuration space via both IO
port and also via MMIO for PCI MMCONFIG.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to avoid introducing a dependency on arch in the devices crate
pass the constant in to the IOAPIC device creation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Using the existing layout module start documenting the major regions of
RAM and those areas that are reserved. Some of the constants have also
been renamed to be more consistent and some functions that returned
constant variables have been replaced.
Future commits will move more constants into this file to make it the
canonical source of information about the memory layout.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We now start the main VMM thread, which will be listening for VM and IPC
related events.
In order to start the configured VM, we no longer directly call the VM
API but we use the IPC instead, to first create and then start a VM.
Fixes: #303
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Based on the newly defined Cloud Hypervisor IPC, those helpers send
VmCreate and VmStart requests respectively. This will be used by the
main thread to create and start a VM based on the CLI parameters.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This starts the main, single VMM thread, which:
1. Creates the VMM instance
2. Starts the VMM control loop
3. Manages the VMM control loop exits for handling resets and shutdowns.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Unlike the Vmm structure we removed with commit bdfd1a3f, this new one
is really meant to represent the VM monitoring/management object.
For that, we implement a control loop that will replace the one that's
currently embedded within the Vm structure itself.
This will allow us to decouple the VM lifecycle management from the VM
object itself, by having a constantly running VMM control loop.
Besides the VM specific events (exit, reset, stdin for now), the VMM
control loop also handles all the Cloud Hypervisor IPC requests.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The VMM thread and control loop will be the sole consumer of the
EpollContext and EpollDispatch API, so let's move it to lib.rs.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Cloud Hypervisor IPC is a simple, mpsc based protocol for threads to
send command to the furture VMM thread. This patch adds the API
definition for that IPC, which will be used by both the main thread
to e.g. start a new VM based on the CLI arguments and the future HTTP
server to relay external requests received from a local Unix domain
socket.
We are moving it to its own "api" module because this is where the
external API (HTTP based) will also be implemented.
The VMM thread will be listening for IPC requests from an mpsc receiver,
process them and send a response back through another mpsc channel.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
As we're going to move the control loop to the VMM thread, the exit and
reset EventFds are no longer going to be owned by the VM.
We pass a copy of them when creating the Vm instead.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to handle the VM STDIN stream from a separate VMM thread
without having to export the DeviceManager, we simply add a console
handling method to the Vm structure.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to transfer the control loop to a separate VMM thread, we want
to shrink the VM control loop to a bare minimum.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Once passed to the VM creation routine, a VmConfig structure is
immutable. We can simply carry a Arc of it instead of a reference.
This also allows us to remove any lifetime bound from our VM.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Vmm structure is just a placeholder for the KVM instance. We can
create it directly from the VM creation routine instead.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We can integrate the kernel loading into the VM start method.
The VM start flow is then: Vm::new() -> vm.start(), which feels more
natural.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Convert Path to PathBuf and remove the associated lifetime.
Now we can remove the VmConfig associated lifetime.
Fixes#298
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Since vhost-user-blk use same error definition with vhost-user-net,
those errors need define to common usage.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Probe for the size of the host physical address range and use that to
establish the address range for the VM. This removes the limitation on
the size of the VM RAM and gives more space for the devices.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
After the 32-bit gap the memory is shared between the devices and the
RAM. Ensure that the ACPI tables correctly indicate where the RAM ends
and the device area starts by patching the precompiled tables. We get
the following valid output now from the PCI bus probing (8GiB guest)
[ 0.317757] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
[ 0.319035] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
[ 0.320215] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[ 0.321431] pci_bus 0000:00: resource 7 [mem 0xc0000000-0xfebfffff window]
[ 0.322613] pci_bus 0000:00: resource 8 [mem 0x240000000-0xfffffffff window]
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rerrange "use" statements and make rename variables and fields to
indicate they might be unused.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This removes the register_devices() function with all that functionality
spread across the places where the devices are created.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Mark exit_evt with an underscore it may be unused (it is ignored if the
"acpi" feature is not turned on.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add (non-default) support for using MMIO for virtio devices. This can be
tested by:
cargo build --no-default-features --features "mmio"
All necessary options will be included injected into the kernel
commandline.
Fixes: #243
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than calling it at the very start of the VM execution (i.e. when
the VCPUs are created) do it as part of the DeviceManager creation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create the virtio devices independently of adding them to the PCI bus.
Instead accrue the devices in a vector and add them to the bus en-masse.
This will allow the virtio device creation to be used independently of
PCI based transport.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than sending a signal to the signal handler used for handling
SIGWINCH calls instead use the crate provided termination method. This
also unregisters the signal handler which also means that there won't be
a leaked signal handler remaining.
This leaked signal handler is what was causing a failure to cleanup up
the thread on subsequent requests breaking two reboots in a row.
Fixes: #252
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With ACPI disabled there is no way to support both reset and shutdown so
make the VMM exit if the VM is rebootet (via i8042 or triple-fault
reset.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit relies on the new vsock::unix module to create the backend
that will be used from the virtio-vsock device.
The concept of backend is interesting here as it would allow for a vhost
kernel backend to be plugged if that was needed someday.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on previous patch introducing the new flag "--vsock", this commit
creates a new virtio-vsock device based on the presence of this flag.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The new flag vsock is meant to be used in order to create a VM with a
virtio-vsock device attached to it. Two parameters are needed with this
device, "cid" representing the guest context ID, and "sock" representing
the UNIX socket path which can be accessed from the host.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The default number of MSI-X vector allocated was 2, which is the minimum
defined by the virtio specification. The reason for this minimum is that
virtio needs at least one interrupt to signal that configuration changed
and at least one to specify something happened regarding the virtqueues.
But this current implementation is not optimal because our VMM supports
as many MSI-X vectors as allowed by the MSI-X specification (2048 max).
For that reason, the current patch relies on the number of virtqueues
needed by the virtio device to determine the right amount of MSI-X
vectors needed. It's important not to forget the dedicated vector for
any configuration change too.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Refactor out DeviceManager into it's own file. This is part of a bigger
effort to reduce complexity in the vm.rs file but will also allow future
separation to allow making PCI support optional.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The command "cargo build --no-default-features" does not recursively
disable the default features across the workspace. Instead add an acpi
feature at the top-level, making it default, and then make that feature
conditional on all the crate acpi features.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
For virtio-fs and virtio-pmem regions of memory are manually mapped into
the address space of the VMM. In order to cleanly reboot we need to
unmap those regions.
Fixes: #223
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Do this by using the same mechanism as the vCPU threads by sending a
signal to the thread. As this is the same mechanism reuse the same code
and rename the "vcpus" member to "threads" to indicate this represents
both the vCPU threads and also the signal handler thread.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Put the ACPI support behind a feature and ensure that the code compiles
without that feature by adding an extra build to Travis.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As part of the cleanup of the VM shutdown all the vCPU threads. This is
achieved by toggling a shared atomic boolean variable which is checked
in the vCPU loop. To trigger the vCPU code to look at this boolean it is
necessary to send a signal to the vCPU which will interrupt the running
KVM_RUN ioctl.
Fixes: #229
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Being able to reboot requires us to identify all the resources we are
leaking and cleaning those up before we can enable reboot. For now if
the user requests a reboot then shutdown instead.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Sadly only the first few characters of the thread name is preserved so
use a shorter name for the vCPU thread for now. Also give the signal
handling thread a name.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add an I/O port "device" to handle requests from the kernel to shutdown
or trigger a reboot, borrowing an I/O used for ACPI on the Q35 platform.
The details of this I/O port are included in the FADT
(SLEEP_STATUS_REG/SLEEP_CONTROL_REG/RESET_REG) with the details of the
value to write in the FADT for reset (RESET_VALUE) and in the DSDT for
shutdown (S5 -> 0x05)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a 2nd EventFd to the VM to control resetting (rebooting) the VM this
supplements the EventFd used for managing shutdown of the VM.
The default behaviour on i8042 or triple-fault based reset is currently
unchanged i.e. it will trigger a shutdown.
In order to support restarting the VM it was necessary to make start()
function take a reference to the config.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Only add the ACPI PNP device for the COM1 serial port if it is not
turned off with "--serial off"
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently when the VCPU thread exits on an error the VMM continues to
run with no way of shutting down the main thread.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a revision 2 RSDP table only supporting an XSDT along with support
for creating generic SDT based tables.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This patch factorizes the existing virtio-fs code by relying onto the
common code part of the vhost_user module in the vm-virtio crate.
In details, it factorizes the vhost-user setup, and reuses the error
types defined by the module instead of defining its own types.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
vhost-user-net introduced a new module vhost_user inside the vm-virtio
crate. Because virtio-fs is actually vhost-user-fs, it belongs to this
new module and needs to be moved there.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The currently directory handling process to open tempfile by
OpenOptions with custom_flags(O_TMPFILE) is workable for tmp
filesystem, but not workable for hugetlbfs, add new directory
handling process which works fine for both tmpfs and hugetlbfs.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Following the refactoring of the code allowing multiple threads to
access the same instance of the guest memory, this patch goes one step
further by adding RwLock to it. This anticipates the future need for
being able to modify the content of the guest memory at runtime.
The reasons for adding regions to an existing guest memory could be:
- Add virtio-pmem and virtio-fs regions after the guest memory was
created.
- Support future hotplug of devices, memory, or anything that would
require more memory at runtime.
Because most of the time, the lock will be taken as read only, using
RwLock instead of Mutex is the right approach.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The VMM guest memory was cloned (copied) everywhere the code needed to
have ownership of it. In order to clean the code, and in anticipation
for future support of modifying this guest memory instance at runtime,
it is important that every part of the code share the same instance.
Because VirtioDevice implementations need to have access to it from
different threads, that's why Arc must be used in this case.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Latest clippy version complains about our existing code for the
following reasons:
- trait objects without an explicit `dyn` are deprecated
- `...` range patterns are deprecated
- lint `clippy::const_static_lifetime` has been renamed to
`clippy::redundant_static_lifetimes`
- unnecessary `unsafe` block
- unneeded return statement
All these issues have been fixed through this patch, and rustfmt has
been run to cleanup potential formatting errors due to those changes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We timestamp the VM creation time, and log the elapsed time between that
instant and the debug ioport events.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The 0x80 IO port is typically used for BIOS debugging and testing on
bare metal x86 platforms.
We use that port and its dedicated 16 debug codes to time and track the
guest boot process.
Fixes#63
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the cache_size parameter from virtio-fs device is not empty, the
VMM creates a dedicated memory region where the shared files will be
memory mapped by the virtio-fs device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support the more performant version of virtio-fs, that is
the one relying on a shared memory region between host and guest, we
introduce two new parameters to the --fs device.
The "dax" parameter allows the user to choose if he wants to use the
shared memory region with virtio-fs. By default, this parameter is "on".
The "cache_size" parameter allows the user to specify the amount of
memory that should be shared between host and guest. By default, the
value of this parameter is 8Gib as advised by virtio-fs maintainers.
Note that dax=off and cache_size are incompatible.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>