Commit Graph

86 Commits

Author SHA1 Message Date
Sebastien Boeuf
1a484a82f9 vmm: Don't break from epoll loop on EINTR
The existing epoll loop was too restrictive: it propagated every error
returned by the epoll_wait() syscall, whatever the error was. This broke
the epoll loop and led to the termination of the VM.

This patch inspects the returned error and does not propagate it when it
is EINTR, which stands for Interrupted. When the epoll loop is
interrupted, it is appropriate to retry.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-08-02 08:37:34 +01:00
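As an illustration of the retry behaviour described in 1a484a82f9, here is a minimal sketch using only the standard library. The `wait` closure stands in for a wrapper around epoll_wait(); it and the function name are assumptions made for this example, not the code from the vmm crate.

```rust
use std::io;

/// Calls `wait` until it returns anything other than an "interrupted" error.
/// `wait` is a hypothetical stand-in for a wrapper around epoll_wait().
fn wait_retrying_on_eintr<F>(mut wait: F) -> io::Result<usize>
where
    F: FnMut() -> io::Result<usize>,
{
    loop {
        match wait() {
            // EINTR is reported as ErrorKind::Interrupted: retry the wait.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Success or any other error is handed back to the caller.
            other => return other,
        }
    }
}
```

Only the interrupted case is swallowed; every other error still propagates, so genuine failures keep terminating the loop.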
Sebastien Boeuf
532f6a96f3 vmm: Factorize VM related information into a structure
To fix the clippy error complaining that the number of arguments passed
to a function exceeds the maximum of 7, this patch groups those
parameters into a single structure called VmInfo.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-08-02 08:35:16 +01:00
Sebastien Boeuf
c0756c429d vmm: Increase memory slot from virtio-pmem
Since virtio-pmem uses a KVM user memory region, it needs to increment
the slot index in use to prevent any conflict with later VFIO
allocations (used for mapping mappable memory BARs).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-08-02 08:35:16 +01:00
Samuel Ortiz
fa41ddd94f arch: Add a Reserved memory region to the memory hole
We add a Reserved region type at the end of the memory hole to prevent
32-bit device allocations from overlapping with architectural address
ranges such as the IOAPIC, TSS or APIC ones.

Eventually we should remove that reserved range by allocating all the
architectural ranges before letting 32-bit devices use the memory hole.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-25 11:45:38 +01:00
Samuel Ortiz
299d887856 arch: Add SubRegion memory type
We want to be able to differentiate between memory regions that must be
managed separately from the main address space (e.g. the 32-bit memory
hole) and ones that are reserved (i.e. ones from which we don't want to
allow the VMM to allocate address ranges).

We are going to use a reserved memory region for restricting the 32-bit
memory hole from expanding beyond the IOAPIC and TSS addresses.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-25 11:45:38 +01:00
Sebastien Boeuf
d92d797896 vfio: Update memory slot index to support multiple VFIO devices
In order to correctly support multiple VFIO devices, we need to
increment the memory slot index every time it is being used to set some
user memory region through KVM. That's why the mem_slot parameter is
made mutable.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-25 11:45:38 +01:00
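The slot bookkeeping described in d92d797896 can be sketched as a counter that is passed by mutable reference and bumped each time a user memory region is registered. The helper name below is hypothetical and the KVM call itself is left out.

```rust
/// Returns the slot index to use now and advances the shared counter, so
/// that the next user memory region gets a different slot.
/// `next_mem_slot` is a hypothetical helper written for this sketch.
fn next_mem_slot(mem_slot: &mut u32) -> u32 {
    let slot = *mem_slot;
    *mem_slot += 1;
    slot
}
```

Each caller that sets a user memory region takes its slot from this counter, which is why the parameter has to be mutable.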
Sebastien Boeuf
b9f677c46c vmm: Fix the memory slot index
The memory slot index provided to the DeviceManager was wrong since
only the RAM memory regions are set as user memory regions to KVM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-25 11:45:38 +01:00
Sebastien Boeuf
b5eab43aa5 vfio: Create a global KVM VFIO device for all VFIO devices
KVM does not support creating multiple KVM VFIO devices, which gets in
the way of supporting multiple VFIO devices. This commit creates one
global KVM VFIO device shared by every VFIO device, which makes it
possible to pass several devices through to the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-25 11:45:38 +01:00
Samuel Ortiz
4d16ca8ae7 vmm: Support direct device assignment
With the VFIO crate, we can now support directly assigned PCI devices
into cloud-hypervisor guests.

We support assigning multiple host devices, through the --device command
line parameter. This parameter takes the host device sysfs path.

Fixes: #60

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-24 11:55:08 +02:00
Samuel Ortiz
4e48309660 vm: Factorize all virtio devices creation routines
Our DeviceManager::new() routine is reaching north of 250 lines.
For the sake of simplicity and readability, extract all virtio device
creation code into its own routines.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-23 08:41:37 +01:00
fazlamehrab
24438e0390 vm-virtio: Enable the vmm support for virtio-console
To use the implemented virtio console device, the users can select one
of the three options ("off", "tty" or "file=/path/to/the/file") with
the command line argument "--console". By default, the console is
enabled as a device named "hvc0" (option: tty). When "off" option is
used, the console device is not added to the VM configuration at all.

Signed-off-by: A K M Fazla Mehrab <fazla.mehrab.akm@intel.com>
2019-07-22 23:08:56 +01:00
Sebastien Boeuf
f98a69f42e vm-allocator: Introduce an MMIO hole address allocator
With this new AddressAllocator as part of the SystemAllocator, the
VMM can now decide with finer granularity where to place memory.

By allocating the RAM and the hole into the MMIO address space, we
ensure that no memory will be allocated by accident where the RAM or
where the hole is.
And by creating the new MMIO hole address space, we create a subset
of the entire MMIO address space where we can place 32-bit BARs, for
example.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-22 09:51:16 -07:00
Samuel Ortiz
0a04a950a1 vm-allocator: Expand the IRQ allocation API to support GSI
GSI (Global System Interrupt) extends the plain linear array of IRQs.
For example, it takes IOAPICs into account.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-22 09:51:16 -07:00
Chao Peng
96fb38a5aa vm-allocator: Align address at allocation time
AddressAllocator supports alignment, but there are occasions where the
alignment is known only when we call allocate(). One example is a PCI
BAR, which is naturally aligned, meaning that we have to align the base
address to its size.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-07-22 09:51:16 -07:00
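To illustrate the natural alignment mentioned in 96fb38a5aa, the sketch below rounds a candidate base address up to a power-of-two alignment such as a PCI BAR size. The function name and the choice to return `None` on invalid input are assumptions for this example, not the vm-allocator API.

```rust
/// Rounds `addr` up to the next multiple of `align`.
/// Returns None if `align` is not a power of two or if the result would
/// overflow. `align_up` is a hypothetical helper written for this sketch.
fn align_up(addr: u64, align: u64) -> Option<u64> {
    if !align.is_power_of_two() {
        return None;
    }
    let mask = align - 1;
    // Add the mask, then clear the low bits to land on an aligned address.
    addr.checked_add(mask).map(|a| a & !mask)
}
```

For a naturally aligned BAR, the caller would pass the BAR size as `align`.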
Chao Peng
af7cd74e04 vm-allocator: Make port IO non optional
This is only for allocating the port IO address range.
If a platform does not have PIO devices at all, the address
range will simply be unused.
So, simplify the vm-allocator data structure by making both
MMIO and PIO mandatory.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-07-22 09:51:16 -07:00
Sebastien Boeuf
1268165040 pci: Allow for registering IO and Memory BAR
This patch adds support for both IO and Memory BARs by expecting
the function allocate_bars() to identify the type of each BAR.
Based on the type, register_mapping() inserts the address range on the
appropriate bus (PIO or MMIO).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-22 09:50:10 -07:00
Rob Bradford
cb81f8be5b vmm: Make serial port controllable via command line
Add a "--serial" command line option that takes as input either "off",
"tty" (default and current behaviour) or "file=/path/to/file".

When "--serial off" is used the serial device is not added to the VM
configuration at all.

Integration tests added that check for interrupts present (or not) and
that when sending to a file the file contains the expected serial
output.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-07-11 12:17:58 +01:00
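The three accepted "--serial" values from cb81f8be5b could be mapped to an enum along these lines. The type and function names are hypothetical and only show the shape of the parsing, not the code in the vmm crate.

```rust
use std::path::PathBuf;

/// Hypothetical representation of the "--serial" choices.
#[derive(Debug, PartialEq)]
enum SerialMode {
    Off,
    Tty,
    File(PathBuf),
}

/// Parses "off", "tty" or "file=/path/to/file"; anything else is an error.
fn parse_serial(arg: &str) -> Result<SerialMode, String> {
    match arg {
        "off" => Ok(SerialMode::Off),
        "tty" => Ok(SerialMode::Tty),
        _ => match arg.strip_prefix("file=") {
            // Reject an empty path after "file=".
            Some(path) if !path.is_empty() => Ok(SerialMode::File(PathBuf::from(path))),
            _ => Err(format!("invalid --serial value: {}", arg)),
        },
    }
}
```

With `SerialMode::Off`, the caller would simply skip adding the serial device to the VM configuration.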
Samuel Ortiz
7ed073805d config: Fix default memory size parameter
We need to give it a suffix.

Fixes: #96

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-07-10 08:35:10 +02:00
Sebastien Boeuf
d9ce29117e vmm: Flag --disk should be optional
Now that cloud-hypervisor VMM supports virtio-pmem, it can directly
boot a VM from an image exposed as a persistent memory block device.

That's why there is no need to force the --disk option as being
mandatory.

Fixes #90

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-09 21:58:02 +02:00
Sebastien Boeuf
f0a76ad424 vmm: Add support for multiple virtio-net devices
Until now, the VMM was only accepting a single instance of virtio-net
device. This commit extends the virtio-net support by allowing several
devices to be created for a single VM.

Fixes #71

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-09 18:55:30 +01:00
Sebastien Boeuf
a2947f9a9f cli: Accept K,M,G suffixes for size parameters
For every parameter dealing with a size as option, such as memory or
virtio-pmem, the CLI can now parse sizes with the suffixes K, M or G.

Fixes #70

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-09 15:22:26 +01:00
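A rough sketch of the suffix handling described in a2947f9a9f follows. It assumes binary multiples (K = 2^10, M = 2^20, G = 2^30 bytes), which the commit message does not state, and `parse_size` is a hypothetical name.

```rust
/// Parses a size such as "4096", "512M" or "1G" into a number of bytes.
/// Assumption: K, M and G are binary multiples (2^10, 2^20, 2^30).
fn parse_size(input: &str) -> Result<u64, String> {
    let s = input.trim();
    // Split off the optional one-letter suffix and pick the matching shift.
    let (digits, shift) = match s.chars().last() {
        Some('K') => (&s[..s.len() - 1], 10),
        Some('M') => (&s[..s.len() - 1], 20),
        Some('G') => (&s[..s.len() - 1], 30),
        _ => (s, 0),
    };
    let value: u64 = digits
        .parse()
        .map_err(|_| format!("invalid size: {}", input))?;
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("size overflows u64: {}", input))
}
```

A value without a suffix is taken as a plain byte count in this sketch.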
Jing Liu
2bb0b22cc1 pci: Refine pci topology
PciConfigIo is a legacy pci bus dispatcher, which manages all pci
devices including a pci root bridge. However, it is unnecessary to
design a complex hierarchy which redirects every access by PciRoot.

Since the pci root bridge is also a pci device instance that only
involves simple config space reads and writes, and PciConfigIo actually
acts as a pci bus that dispatches accesses by resolving resources on
VM exit, we rearrange the code to make the pci hierarchy clean.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
2019-07-09 10:01:18 +02:00
Rob Bradford
49d6b495d5 vmm: Remove println! from debugging
Remove println! left over from virtio-fs development.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-07-02 13:50:50 +02:00
Sebastien Boeuf
34e09923a5 vmm: Add support for multiple virtio-pmem devices
Until now, the VMM was only accepting a single instance of virtio-pmem
device. This commit extends the virtio-pmem support by allowing several
devices to be created for a single VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-01 14:38:55 +01:00
Sebastien Boeuf
294c26bfb7 vmm: Add virtio-pmem support to cloud-hypervisor
This patch plumbs the virtio-pmem device to the VMM. By adding a new
command line option "--pmem", we can now expose some persistent memory
to the guest OS, backed by the provided source.

The point of having such support in cloud-hypervisor is to be able to
share some memory between the host and the guest as DAXable.
One interesting use case is to boot directly from an image passed
through virtio-pmem, instead of going through virtio-blk. This can
allow good performance while avoiding the guest cache, which would
prevent the VM memory footprint from growing too much.

Fixes #68

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-07-01 14:38:55 +01:00
Sebastien Boeuf
1cb2378499 vmm: Add support for multiple virtio-fs devices
Until now, the VMM was only accepting a single instance of a virtio-fs
device. This commit extends the virtio-fs support by allowing several
devices to be created for a single VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-27 21:46:00 +02:00
Sebastien Boeuf
53085c7ccc memory: Allow memory to be backed by a file
In the context of vhost-user, we need the guest RAM to be backed by
a file in order to be accessed by an external process. This patch
adds the new flag "file=" to the "--memory" option so that we can
specify from the command line if the memory needs to be backed, and
by which specific file.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-27 21:46:00 +02:00
Sebastien Boeuf
2ede30b6d3 vmm: Add virtio-fs support to the VMM
The user can now share some files and directories with the guest by
providing the corresponding vhost-user socket. The virtiofsd daemon
should be started by the user before starting the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-27 21:46:00 +02:00
Jing Liu
30266a41be vm-memory usage: vm-memory latest codes rename MmapError to Error
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
2019-06-26 08:33:46 -07:00
Jing Liu
9da2343cb7 device: Improvement for BusDevice trait and PciDevice trait
BusDevice includes two methods which are only for PCI devices; they
should be members of the PciDevice trait to keep the high-level APIs clean.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
2019-06-25 06:17:30 -07:00
Sebastien Boeuf
5e803ab18f vmm: Integrate userspace IOAPIC
The previous commit introduced a userspace implementation of an IOAPIC
and this commit aims to plumb it into the cloud-hypervisor VMM.

Here is the list of new things brought by this patch:
- Update the rust-vmm/kvm-ioctls dependency to benefit from latest
  patches including the support for split irqchip, and the vector
  being returned when a VM exit is caused by an EOI.
- Enable the split irqchip (which means no IOAPIC or PIC is emulated
  in kernel). This is done conditionally based on the support of the
  TSC_DEADLINE_TIMER from both KVM and the underlying CPU. The
  dependency on TSC_DEADLINE_TIMER is related to KVM which does not
  support creating the in kernel PIT if it has a split irqchip.
- Rely on callbacks to handle the following use cases:
  - in kernel IOAPIC + serial IRQ (pin based)
  - in kernel IOAPIC + virtio-pci MSI-X
  - in kernel IOAPIC + virtio-pci IRQ (pin based)
  - userspace IOAPIC + serial IRQ (pin based)
  - userspace IOAPIC + virtio-pci MSI-X
  - userspace IOAPIC + virtio-pci IRQ (pin based)

Fixes #13

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-21 10:09:34 +02:00
Sebastien Boeuf
c8c4a4d444 devices: Create Interrupt trait to abstract interrupt delivery
This commit anticipates the future need to support both in kernel and
userspace IOAPIC. The way to signal an interrupt from the serial device
will vary depending on the use case, but this should be independent
from the serial implementation itself.

That's why this patch provides a generic trait for the serial device
to call from, so that it can trigger interrupts independently from the
IOAPIC type chosen (in kernel vs userspace).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-21 10:09:34 +02:00
Rob Bradford
c4c8b9314d build: Switch over to using rust-vmm linux-loader crate
With everything now merged upstream we no longer need to rely on Cathy's
fork.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-13 11:28:20 +01:00
Cathy Zhang
429b53a672 vmm: Add bzimage loader support
The VMM may need to load kernel images in different formats to start a
guest. We currently only have ELF loader support, so add bzImage loader
support for the case where the VMM needs to load a bzImage.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2019-06-13 11:28:20 +01:00
Chao Peng
0f54429848 vmm: Move all the CPUID related code to CpuidPatch
As more CPUID handling and CpuidPatch common code is being added, it is
reasonable to move all the common code to the same place. In the future
we may consider moving it to a separate file when necessary.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-06-13 07:06:44 +02:00
Chao Peng
a0f4376eb0 vmm: Set the APIC ID in the extended topology
KVM exposes CPUID 0BH when the host supports it, but the APIC ID that KVM
provides is the host APIC ID, so we need to replace it with ours.

Without this, a Linux guest reports something like:
[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 APIC: 21

Fixes #42

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-06-13 07:06:44 +02:00
Sebastien Boeuf
0d0d19e223 vmm: Enable TSC_DEADLINE_TIMER allows for PIT emulation removal
As mentioned in the KVM documentation, TSC_DEADLINE_TIMER feature
needs some special checks to validate that it is supported as the
cpuid will always report it as disabled.

We need to use the KVM_CHECK_EXTENSION ioctl to request the value
of KVM_CAP_TSC_DEADLINE_TIMER. In case it is supported through
the local APIC emulation provided by the CREATE_IRQCHIP in KVM,
we have to set this feature manually by patching the cpuid.

Quoting the KVM documentation:
```
The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always
returned as false, since the feature depends on KVM_CREATE_IRQCHIP
for local APIC support. Instead it is reported via

  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

if that returns true and you use KVM_CREATE_IRQCHIP, or if you
emulate the feature in userspace, then you can enable the feature
for KVM_SET_CPUID2.
```
This patch implements the behavior described above, and this allows
the VMM to remove the emulated Programmable Interval Timer (PIT) when
the TSC_DEADLINE_TIMER feature can be enabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-10 09:11:47 -07:00
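The cpuid patching described in 0d0d19e223 amounts to setting bit 24 of ECX in leaf 1 once the capability check has succeeded. The sketch below uses a simplified, hypothetical `CpuidEntry` type and takes the result of the KVM_CHECK_EXTENSION query as a plain boolean; it does not call into KVM.

```rust
/// Simplified stand-in for one CPUID entry (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct CpuidEntry {
    function: u32,
    ecx: u32,
}

/// Bit 24 of ECX in CPUID leaf 1, as stated in the quoted KVM documentation.
const TSC_DEADLINE_TIMER_ECX_BIT: u32 = 24;

/// Sets the TSC deadline timer bit in leaf 1 when `cap_supported` is true.
/// `cap_supported` represents the outcome of the
/// KVM_CAP_TSC_DEADLINE_TIMER check, which is not performed here.
fn patch_tsc_deadline(entries: &mut [CpuidEntry], cap_supported: bool) {
    if !cap_supported {
        return;
    }
    for entry in entries.iter_mut().filter(|e| e.function == 1) {
        entry.ecx |= 1 << TSC_DEADLINE_TIMER_ECX_BIT;
    }
}
```

The patched entries would then be handed to KVM_SET_CPUID2, as the quoted documentation describes.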
Sebastien Boeuf
24dbe7003a irq: Fix pin based interrupt for virtio-pci
When the KVM capability KVM_CAP_SIGNAL_MSI is not present, the VMM
falls back from MSI-X onto pin based interrupts. Unfortunately, this
was not working as expected because the VirtioPciDevice object was
always creating an MSI-X capability structure in the PCI configuration
space. This was causing the guest drivers to expect MSI-X interrupts
instead of the pin based generated ones.

This patch takes care of avoiding the creation of a dedicated MSI-X
capability structure when MSI is not supported by KVM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 18:19:52 +01:00
Sebastien Boeuf
47a4065aaf interrupt: Use a single closure to describe pin based and MSI-X
In order to factorize the complexity brought by closures, this commit
merges IrqClosure and MsixClosure into a generic InterruptDelivery one.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
8df05b72dc vmm: Add MSI-X support to virtio-pci devices
In order to allow virtio-pci devices to use MSI-X messages instead
of legacy pin based interrupts, this patch implements the MSI-X
support for cloud-hypervisor. The VMM code and virtio-pci bits have
been modified based on the "msix" module previously added to the pci
crate.

Fixes #12

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
13a065d2cd dep: Rely on latest kvm-ioctls crate
In order to have access to the newly added signal_msi() function
from the kvm-ioctls crate, this commit updates the version of the
kvm-ioctls to the latest one.

Because set_user_memory_region() has been switched to "unsafe", we
also need to handle this small change in our cloud-hypervisor code
directly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
d3c7b45542 interrupt: Make IRQ delivery generic
Because we cannot always assume the irq fd will be the way to send
an IRQ to the guest, this means we cannot make the assumption that
every virtio device implementation should expect an EventFd to
trigger an IRQ.

This commit organizes the code related to virtio devices so that it
now expects a Rust closure instead of a known EventFd. This lets the
caller decide what should be done whenever a device needs to trigger
an interrupt to the guest.

The closure will allow other types of interrupt mechanisms, such as
MSI, to be implemented. From the device perspective, it could be a
pin based interrupt or an MSI, it does not matter since the device
will simply call into the provided callback, passing the appropriate
Queue as a reference. This design keeps the device model generic.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
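The design described in d3c7b45542 can be sketched as a device that stores a callback and invokes it with a reference to the queue that needs attention. `Queue`, `InterruptCb` and `Device` below are hypothetical simplifications, not the types from the vm-virtio crate.

```rust
use std::io;
use std::sync::Arc;

/// Hypothetical, stripped-down queue: only an index is kept for this sketch.
struct Queue {
    index: u16,
}

/// The caller decides what "trigger an interrupt" means (pin based, MSI...).
type InterruptCb = Arc<dyn Fn(&Queue) -> io::Result<()> + Send + Sync>;

/// Hypothetical device that knows nothing about the delivery mechanism.
struct Device {
    interrupt_cb: InterruptCb,
}

impl Device {
    fn new(interrupt_cb: InterruptCb) -> Self {
        Device { interrupt_cb }
    }

    /// Asks the caller-provided closure to notify the guest for `queue`.
    fn signal_used_queue(&self, queue: &Queue) -> io::Result<()> {
        (self.interrupt_cb)(queue)
    }
}
```

Swapping a pin based interrupt for an MSI then only changes the closure passed to `Device::new`; the device code stays the same.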
Rob Bradford
f63d4a7418 vm: Disable stdin and terminal reconfiguration when headless
When not running on a tty (tested with libc's isatty()) disable stdin
and do not reconfigure the terminal.

This is required to ensure that the VM responds correctly when running
in a headless environment such as Jenkins.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 11:03:13 +01:00
Rob Bradford
425841a4fe vm: Do not explicitly exit on reset
Instead, return cleanly from control_loop() and from the calling function.
This is helpful for the testing framework as that means we can launch
multiple VMs in a row.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 11:03:13 +01:00
Sebastien Boeuf
842515c2f1 vendor: Add vmm-sys-util duplicate
Since the top-level Cargo.toml specifies a vmm-sys-util revision
but not the sub crates, Cargo.lock points at 2 different crates.
cargo vendor copies both of them into the vendor directory but
forces the build to use the one coming from the top level driven
requirement.

Although this is a waste of space, this is a cargo vendor limitation
that we have to live with for now.

Also, because the dependency on linux-loader had to be updated,
we had to specify a newly introduced feature called "elf".

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 10:12:04 +02:00
Samuel Ortiz
a6b7715f4b vendor: Move to the rust-vmm vmm-sys-util package
Locked to 60fe35be but no longer dependent on liujing2 repo.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-06-04 17:51:52 +02:00
Sebastien Boeuf
e5e651895b config: Reorganize command line parsing
The command line parsing of the user input was not properly
abstracted from the vmm specific code. In the case of --net,
the parsing was done when the device manager was adding devices.

In order to fix this confusion, this patch introduces a new
module "config" dedicated to the translation of a VmParams
structure into a VmCfg structure. The former is built based
on the input provided by the user, while the latter is the
result of parsing every option.

VmCfg is meant to be consumed by the vmm specific code, and
it is also a fully public structure so that it can directly
be built from a testing environment.

Fixes #31

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-05-24 17:08:52 +01:00
Rob Bradford
a09f918adc main, vmm: Add support for multiple --disk options
Store the list of disks in a Vec<PathBuf> and then iterate over that
when creating the block devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-05-22 17:46:59 +01:00
Samuel Ortiz
8bb71fad76 vmm: Simplify the vcpu run switch
Use a catchall case for all reasons that we do not handle, and
move the vCPU run switch into its own function.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-21 07:56:17 -07:00
Samuel Ortiz
9299502955 cloud-hypervisor: Switch to crates.io kvm-ioctls
Fixes: #15

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-15 05:59:08 +01:00