205 Commits

Author SHA1 Message Date
Sebastien Boeuf
294c26bfb7 vmm: Add virtio-pmem support to cloud-hypervisor
This patch plumbs the virtio-pmem device to the VMM. By adding a new
command line option "--pmem", we can now expose some persistent memory
to the guest OS, backed by the provided source.

The point of having such support in cloud-hypervisor is to be able to
share some memory between the host and the guest as DAXable.
One interesting use case is to boot directly from an image passed
through virtio-pmem, instead of going through virtio-blk. This can
allow good performances while avoiding the guest cache, which would
prevent the VM memory footprint from growing too much.

Fixes #68

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-01 14:38:55 +01:00
Sebastien Boeuf
1cb2378499 vmm: Add support for multiple virtio-fs devices
Until now, the VMM was only accepting a single instance of a virtio-fs
device. This commit extend the virtio-fs support by allowing several
devices to be created for a single VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-27 21:46:00 +02:00
Sebastien Boeuf
53085c7ccc memory: Allow memory to be backed by a file
In the context of vhost-user, we need the guest RAM to be backed by
a file in order to be accessed by an external process. This patch
adds the new flag "file=" to the "--memory" option so that we can
specify from the command line if the memory needs to be backed, and
by which specific file.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-27 21:46:00 +02:00
Sebastien Boeuf
2ede30b6d3 vmm: Add virtio-fs support to the VMM
The user can now share some files and directories with the guest by
providing the corresponding vhost-user socket. The virtiofsd daemon
should be started by the user before to start the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-27 21:46:00 +02:00
Jing Liu
30266a41be vm-memory usage: vm-memory latest codes rename MmapError to Error
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
2019-06-26 08:33:46 -07:00
Jing Liu
9da2343cb7 device: Improvement for BusDevice trait and PciDevice trait
BusDevice includes two methods which are only for PCI devices, which should
be as members of PciDevice trait for a better clean high level APIs.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
2019-06-25 06:17:30 -07:00
Sebastien Boeuf
5e803ab18f vmm: Integrate userspace IOAPIC
The previous commit introduced a userspace implementation of an IOAPIC
and this commits aims to plumb it into the cloud-hypervisor VMM.

Here is the list of new things brought by this patch:
- Update the rust-vmm/kvm-ioctls dependency to benefit from latest
  patches including the support for split irqchip, and the vector
  being returned when a VM exit is caused by an EOI.
- Enable the split irqchip (which means no IOAPIC or PIC is emulated
  in kernel). This is done conditionally based on the support of the
  TSC_DEADLINE_TIMER from both KVM and the underlying CPU. The
  dependency on TSC_DEADLINE_TIMER is related to KVM which does not
  support creating the in kernel PIT if it has a split irqchip.
- Rely on callbacks to handle the following use cases:
  - in kernel IOAPIC + serial IRQ (pin based)
  - in kernel IOAPIC + virtio-pci MSI-X
  - in kernel IOAPIC + virtio-pci IRQ (pin based)
  - userspace IOAPIC + serial IRQ (pin based)
  - userspace IOAPIC + virtio-pci MSI-X
  - userspace IOAPIC + virtio-pci IRQ (pin based)

Fixes #13

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-21 10:09:34 +02:00
Sebastien Boeuf
c8c4a4d444 devices: Create Interrupt trait to abstract interrupt delivery
This commit anticipate the future need from having support for both
in kernel and userspace IOAPIC. The way to signal an interrupt from
the serial device will vary depending on the use case, but this should
be independent from the serial implementation itself.

That's why this patch provides a generic trait for the serial device
to call from, so that it can trigger interrupts independently from the
IOAPIC type chosen (in kernel vs userspace).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-21 10:09:34 +02:00
Cathy Zhang
429b53a672 vmm: Add bzimage loader support
VMM may load different format kernel image to start guest, we currently
only have elf loader support, so add bzimage loader support in case
that VMM would like to load bzimage.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2019-06-13 11:28:20 +01:00
Chao Peng
0f54429848 vmm: Move all the CPUID related code to CpuidPatch
As more CPUID handling and CpuidPatch common code being added, it's
reasonable to move all the common code to the same place and in the
future we may consider move it to individual file when neccesary.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-06-13 07:06:44 +02:00
Chao Peng
a0f4376eb0 vmm: Set the APIC ID in the extended topology
KVM exposes CPUID 0BH when host supports that, but the APIC ID that KVM
provides is the host APIC ID so we need replace that with ours.

Without this Linux guest reports something like:
[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 APIC: 21

Fixes #42

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-06-13 07:06:44 +02:00
Sebastien Boeuf
0d0d19e223 vmm: Enable TSC_DEADLINE_TIMER allows for PIT emulation removal
As mentioned in the KVM documentation, TSC_DEADLINE_TIMER feature
needs some special checks to validate that it is supported as the
cpuid will always report it as disabled.

We need to use the KVM_CHECK_EXTENSION ioctl to request the value
of KVM_CAP_TSC_DEADLINE_TIMER. In case it is supported through
the local APIC emulation provided by the CREATE_IRQCHIP in KVM,
we have to set manually this feature by patching the cpuid.

Here quoted from the KVM documentation:
```
The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always
returned as false, since the feature depends on KVM_CREATE_IRQCHIP
for local APIC support. Instead it is reported via

  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

if that returns true and you use KVM_CREATE_IRQCHIP, or if you
emulate the feature in userspace, then you can enable the feature
for KVM_SET_CPUID2.
```
This patch implements the behavior described above, and this allows
the VMM to remove the emulated Programmable Interval Timer (PIT) when
the TSC_DEADLINE_TIMER feature can be enabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-10 09:11:47 -07:00
Sebastien Boeuf
24dbe7003a irq: Fix pin based interrupt for virtio-pci
When the KVM capability KVM_CAP_SIGNAL_MSI is not present, the VMM
falls back from MSI-X onto pin based interrupts. Unfortunately, this
was not working as expected because the VirtioPciDevice object was
always creating an MSI-X capability structure in the PCI configuration
space. This was causing the guest drivers to expect MSI-X interrupts
instead of the pin based generated ones.

This patch takes care of avoiding the creation of a dedicated MSI-X
capability structure when MSI is not supported by KVM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 18:19:52 +01:00
Sebastien Boeuf
47a4065aaf interrupt: Use a single closure to describe pin based and MSI-X
In order to factorize the complexity brought by closures, this commit
merges IrqClosure and MsixClosure into a generic InterruptDelivery one.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
8df05b72dc vmm: Add MSI-X support to virtio-pci devices
In order to allow virtio-pci devices to use MSI-X messages instead
of legacy pin based interrupts, this patch implements the MSI-X
support for cloud-hypervisor. The VMM code and virtio-pci bits have
been modified based on the "msix" module previously added to the pci
crate.

Fixes #12

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
13a065d2cd dep: Rely on latest kvm-ioctls crate
In order to have access to the newly added signal_msi() function
from the kvm-ioctls crate, this commit updates the version of the
kvm-ioctls to the latest one.

Because set_user_memory_region() has been swtiched to "unsafe", we
also need to handle this small change in our cloud-hypervisor code
directly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
d3c7b45542 interrupt: Make IRQ delivery generic
Because we cannot always assume the irq fd will be the way to send
an IRQ to the guest, this means we cannot make the assumption that
every virtio device implementation should expect an EventFd to
trigger an IRQ.

This commit organizes the code related to virtio devices so that it
now expects a Rust closure instead of a known EventFd. This lets the
caller decide what should be done whenever a device needs to trigger
an interrupt to the guest.

The closure will allow for other type of interrupt mechanism such as
MSI to be implemented. From the device perspective, it could be a
pin based interrupt or an MSI, it does not matter since the device
will simply call into the provided callback, passing the appropriate
Queue as a reference. This design keeps the device model generic.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Rob Bradford
f63d4a7418 vm: Disable stdin and terminal reconfiguration when headless
When not running on a tty (tested with libc's isatty()) disable stdin
and do not reconfigure the terminal.

This is required to ensure that the VM responds correctly when running
in a headless environment such as Jenkins.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 11:03:13 +01:00
Rob Bradford
425841a4fe vm: Do not explictly exit on reset
Instead return from the control_loop() and calling function cleanly.
This is helpful for the testing framework as that means we can launch
multiple VMs in a row.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 11:03:13 +01:00
Sebastien Boeuf
e5e651895b config: Reorganize command line parsing
The command line parsing of the user input was not properly
abstracted from the vmm specific code. In the case of --net,
the parsing was done when the device manager was adding devices.

In order to fix this confusion, this patch introduces a new
module "config" dedicated to the translation of a VmParams
structure into a VmCfg structure. The former is built based
on the input provided by the user, while the latter is the
result of the parsing of every options.

VmCfg is meant to be consumed by the vmm specific code, and
it is also a fully public structure so that it can directly
be built from a testing environment.

Fixes #31

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-05-24 17:08:52 +01:00
Rob Bradford
a09f918adc main, vmm: Add support for multiple --disk options
Store the list of disks in a Vec<PathBuf> and then iterate over that
when creating the block devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-05-22 17:46:59 +01:00
Samuel Ortiz
8bb71fad76 vmm: Simplify the vcpu run switch
Use a catchall case for all reasons that we do not handle, and
move the vCPU run switch into its own function.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-21 07:56:17 -07:00
Sebastien Boeuf
c1f1fe713f vm: Propagate errors appropriately
In order to get meaningful error messages, we want to make sure all
errors are passed up the call stack. This patch fixes this previous
limitation by separating errors related to the DeviceManager from
errors related to the Vm.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-14 05:34:35 +01:00
Chao Peng
6ecdd98634 virtio: Enable qcow support for virtio-block
With this enabled, one can pass a QCOW format disk
image with '--disk' switch.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-05-13 22:08:29 +01:00
Samuel Ortiz
2c94529660 vmm: Propagate boot_kernel errors properly
So that our error traces are more meaningful.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-10 16:32:39 +02:00
Samuel Ortiz
83dadb818f vmm: Remove useless memory setting log
We don't really need to tell everyone where the host and guest memory
address is...

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-10 16:32:39 +02:00
Samuel Ortiz
3f38b42f05 vmm: Fix the Error enum comment
Our error handling is no longer only related to KVM ioctls.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-10 16:32:39 +02:00
Samuel Ortiz
cacce5f7c4 vmm: Use random local MAC address as the default one
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-10 16:32:39 +02:00
Sebastien Boeuf
5934f30fde vmm: Add support for letting the VMM create the TAP interface
Until now, the only way to get some networking with cloud-hypervisor
was to let the user create a TAP interface first, and then to provide
the name of this interface to the VMM.

This patch extend the previous behavior by adding the support for the
creation of a brand new TAP interface from the VMM itself. In case no
interface name is provided through "tap=<if_name>", we will assume
the user wants the VMM to create and set the interface on its behalf,
no matter the value of other parameters (ip, mask, and mac).
In this same scenario, because the user expects the VMM to create the
TAP interface, he can also provide the associated IP address and subnet
mask associated with it. In case those values are not provided, some
default ones will be picked.

No matter the value of "tap", the MAC address will always be set, and
if no value is provided, the VMM will come up with a default value for
it.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-10 16:32:39 +02:00
Chao Peng
8e7579b20e vm-virtio: Add virtio-rng implementation
Most of the code is taken from crosvm(bbd24c5) but is modified to
be adapted to the current VirtioDevice definition and epoll
implementation.

A new command option '--rng' is provided and it gives one the option
to override the entropy source which is /dev/urandom by default.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-05-10 16:32:39 +02:00
Chao Peng
97865b605f vmm: Provide a common method to build a virtio PCI device
Since more virtio devices will be added and this code can be reused
for any type of virtio device.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-05-10 16:32:39 +02:00
Sebastien Boeuf
c0be6642ad vmm: Leverage virtio-net to provide connectivity
This patch expand the device registration to add a new virtio-net
device in case the user provide the appropriate flag --net from the
command line.

If the flag is provided, the code will parse the TAP interface name
and the expected MAC address from the command line. The VM will be
connected to the provided TAP interface, and it will communicate the
MAC address to the virtio-net driver.

If the flag is not provided, the VM will not register any virtio-net
device, therefore it will not have any connectivity with the host.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-10 16:32:39 +02:00
Samuel Ortiz
040ea5432d cloud-hypervisor: Add proper licensing
Add the BSD and Apache license.
Make all crosvm references point to the BSD license.
Add the right copyrights and identifier to our VMM code.
Add Intel copyright to the vm-virtio and pci crates.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-09 15:44:17 +02:00
Samuel Ortiz
8f05773eae vmm: Fix build warning
Use the VM vcpus vector instead of creating a mutable one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-09 15:44:17 +02:00
Sebastien Boeuf
5c9fc816de serial: Set terminal in raw mode
In order to have proper output from the serial, we need to setup the
terminal in raw mode. When the VM is shutting down, it is also the
VMM responsibility to set the terminal back into canonical mode if we
don't want to get any weird behavior from the shell.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-09 15:44:17 +02:00
Sebastien Boeuf
112418d928 main: Add kernel command line support
In order to let the user choose which kernel parameters to append, the
kernel boot parameters can be now specified from the command line.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-09 15:44:01 +02:00
Sebastien Boeuf
1270d09301 cloud-hypervisor: Add --disk option to provide VM rootfs
Based on the new virtio-blk support, this commit allows any user to
specify a --disk option in order to select the rootfs it wants to
use for the VM.

For now it assumes the partition 3 /dev/vd3 is the one where we can
find the rootfs.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-08 08:55:09 +02:00
Sebastien Boeuf
b67e0b3dad vmm: Use virtio-blk to support booting from disk image
After the virtio-blk device support has been introduced in the
previous commit, the vmm need to rely on this new device to boot
from disk images instead of initrd built into the kernel.

In order to achieve the proper support of virtio-blk, this commit
had to handle a few things:

  - Register an ioevent fd for each virtqueue. This important to be
    notified from the virtio driver that something has been written
    on the queue.

  - Fix the retrieval of 64bits BAR address. This is needed to provide
    the right address which need to be registered as the notification
    address from the virtio driver.

  - Fix the write_bar and read_bar functions. They were both assuming
    to be provided with an address, from which they were trying to
    find the associated offset. But the reality is that the offset is
    directly provided by the Bus layer.

  - Register a new virtio-blk device as a virtio-pci device from the
    vm.rs code. When the VM is started, it expects a block device to
    be created, using this block device as the VM rootfs.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-08 08:55:09 +02:00
Chao Peng
2a539ab176 vmm: Expose Hypervisor CPUID bit
This is required at least for kvm-clock.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-05-08 08:55:06 +02:00
Samuel Ortiz
0adc3481df vmm: Add PCI root
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-08 08:55:06 +02:00
Sebastien Boeuf
342bdc3619 devices: Add support for i8042 reset device
Introduce emulation of i8042 device to allow the guest to stop the
VM by issuing a reset event.

The device has been copied over from the Crosvm code base, relying on
the commit 0268e26e1ac9e09aa51d733482c5df139cd8d588.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-08 08:55:00 +02:00
Sebastien Boeuf
29b90a8aee vmm: Create and handle an exit event
An exit event is required to be created and handled for the purpose
of letting the guest kernel stop the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-08 08:40:42 +02:00
Sebastien Boeuf
afbf824a48 vmm: Handle stdin from a generic epoll loop
Instead of handling stdin in its own separate loop, we use a generic
one that can be reused for other events handling.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-05-08 08:40:42 +02:00
Samuel Ortiz
a7bdf5ee48 vmm: Register an irqfd for our serial device
And get console input working.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-08 08:40:42 +02:00
Samuel Ortiz
c6c5e10a04 vmm: Add a basic stdin loop
After starting all vCPUs, we loop for STDIN input.
We need a more scalable eventfd control loop, obviously.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-08 08:40:42 +02:00
Samuel Ortiz
0b6ec34505 vmm: Retry running a CPU when getting EAGAIN or EINTR from the run ioctl
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-08 08:40:42 +02:00
Samuel Ortiz
25f4063da6 cloud-hypervisor: Add the --memory option
You guessed it: To specify the amount of memory for the VM.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-08 08:40:42 +02:00
Samuel Ortiz
59b5e53c40 cloud-hypervisor: Add the --cpus option
You guessed it: To specify the number of vcpus.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-08 08:40:42 +02:00
Samuel Ortiz
1853b350ee cloud-hypervisor: Add devices crate
Based on the Firecracker devices crate from commit 9cdb5b2.

It is a trimmed down version compared to the Firecracker one, to remove
a bunch of pulled dependencies (logger, metrics, rate limiter, etc...).

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-08 08:40:42 +02:00
Samuel Ortiz
7e2d1aca2d vmm: Boot kernel
Our command line was not copied properly since we were not allocating
enough space for it.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-08 08:40:42 +02:00