Commit Graph

8371 Commits

Author SHA1 Message Date
Sebastien Boeuf
5e803ab18f vmm: Integrate userspace IOAPIC
The previous commit introduced a userspace implementation of an IOAPIC
and this commits aims to plumb it into the cloud-hypervisor VMM.

Here is the list of new things brought by this patch:
- Update the rust-vmm/kvm-ioctls dependency to benefit from latest
  patches including the support for split irqchip, and the vector
  being returned when a VM exit is caused by an EOI.
- Enable the split irqchip (which means no IOAPIC or PIC is emulated
  in kernel). This is done conditionally based on the support of the
  TSC_DEADLINE_TIMER from both KVM and the underlying CPU. The
  dependency on TSC_DEADLINE_TIMER is related to KVM which does not
  support creating the in kernel PIT if it has a split irqchip.
- Rely on callbacks to handle the following use cases:
  - in kernel IOAPIC + serial IRQ (pin based)
  - in kernel IOAPIC + virtio-pci MSI-X
  - in kernel IOAPIC + virtio-pci IRQ (pin based)
  - userspace IOAPIC + serial IRQ (pin based)
  - userspace IOAPIC + virtio-pci MSI-X
  - userspace IOAPIC + virtio-pci IRQ (pin based)

Fixes #13

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-21 10:09:34 +02:00
Sebastien Boeuf
950bd205c6 devices: Add userspace IOAPIC implementation
The goal for cloud-hypervisor is to keep the host safe. With this in
mind, we want to emulate as much as possible in userspace instead of
in kernel directly.

The IOAPIC is a good candidate to move from kernel to userspace, which
is why this commit introduces a userspace implementation of the IOAPIC
82093AA based on the documentation:
https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf

This code is inspired from the files devices/src/ioapic.rs and
devices/src/split_irqchip_common.rs from the crosvm codebase. The
reference version used being 6c1e23eee3065b3f3d6fc4fb992ac9884dbabf68.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-21 10:09:34 +02:00
Sebastien Boeuf
c8c4a4d444 devices: Create Interrupt trait to abstract interrupt delivery
This commit anticipate the future need from having support for both
in kernel and userspace IOAPIC. The way to signal an interrupt from
the serial device will vary depending on the use case, but this should
be independent from the serial implementation itself.

That's why this patch provides a generic trait for the serial device
to call from, so that it can trigger interrupts independently from the
IOAPIC type chosen (in kernel vs userspace).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-21 10:09:34 +02:00
Sebastien Boeuf
2a7fbe8eae CI: Fix the Ubuntu VM update stuck on an interactive window
We need to export the variable DEBIAN_FRONTEND=noninteractive from the
Jenkinsfile if we want to make sure the VM update won't get stuck into
an interactive window.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-17 23:24:38 +02:00
Samuel Ortiz
fe43e86193 README: Use a permanent Slack invite link
The previous one was mistakenly set to expire after 30 days.

Fixes: #59

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-06-17 06:19:55 -07:00
Rob Bradford
c4c8b9314d build: Switch over to using rust-vmm linux-loader crate
With everything now merged upstream we no longer need to rely on Cathy's
fork.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-13 11:28:20 +01:00
Rob Bradford
226d3366d2 tests: Add direct boot test using bzImage
This supplements our direct booting test that uses vmlinux.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-13 11:28:20 +01:00
Cathy Zhang
429b53a672 vmm: Add bzimage loader support
VMM may load different format kernel image to start guest, we currently
only have elf loader support, so add bzimage loader support in case
that VMM would like to load bzimage.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2019-06-13 11:28:20 +01:00
Chao Peng
0f54429848 vmm: Move all the CPUID related code to CpuidPatch
As more CPUID handling and CpuidPatch common code being added, it's
reasonable to move all the common code to the same place and in the
future we may consider move it to individual file when neccesary.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-06-13 07:06:44 +02:00
Chao Peng
a0f4376eb0 vmm: Set the APIC ID in the extended topology
KVM exposes CPUID 0BH when host supports that, but the APIC ID that KVM
provides is the host APIC ID so we need replace that with ours.

Without this Linux guest reports something like:
[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 APIC: 21

Fixes #42

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
2019-06-13 07:06:44 +02:00
Sebastien Boeuf
0d0d19e223 vmm: Enable TSC_DEADLINE_TIMER allows for PIT emulation removal
As mentioned in the KVM documentation, TSC_DEADLINE_TIMER feature
needs some special checks to validate that it is supported as the
cpuid will always report it as disabled.

We need to use the KVM_CHECK_EXTENSION ioctl to request the value
of KVM_CAP_TSC_DEADLINE_TIMER. In case it is supported through
the local APIC emulation provided by the CREATE_IRQCHIP in KVM,
we have to set manually this feature by patching the cpuid.

Here quoted from the KVM documentation:
```
The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always
returned as false, since the feature depends on KVM_CREATE_IRQCHIP
for local APIC support. Instead it is reported via

  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

if that returns true and you use KVM_CREATE_IRQCHIP, or if you
emulate the feature in userspace, then you can enable the feature
for KVM_SET_CPUID2.
```
This patch implements the behavior described above, and this allows
the VMM to remove the emulated Programmable Interval Timer (PIT) when
the TSC_DEADLINE_TIMER feature can be enabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-10 09:11:47 -07:00
Sebastien Boeuf
946a5d4f21 build: Update Cargo.lock for syn crate update
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-10 09:11:47 -07:00
Rob Bradford
72f3a69796 tests: Add test for booting from vmlinux
Download and build a Linux kernel and use the vmlinux produced as the
kernel used with a direct boot kernel test.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-10 16:49:34 +01:00
Rob Bradford
445b484881 fixup! resources: Shrink 5.0 kernel config 2019-06-10 16:49:34 +01:00
Rob Bradford
a45f473683 tests: Loosen memory check requirements
With slide variations in the kernel the memory size checks can fail so
round down the testing numbers to the nearest multiple of 1000 to make
the tests more stable.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-10 16:49:34 +01:00
Rob Bradford
52ce042125 tests: Bump the Clear Linux version
Switch the Clear Linux version to a newer release and cache that in an
azure bucket in the same region to improve the CI speed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-10 16:49:34 +01:00
Rob Bradford
fa0f1c8ab8 resources: Shrink 5.0 kernel config
Remove some of the kernel configuration options that are not necessary
for manual testing and for testing with the CI in order to reduce the
kernel build time.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-10 16:49:34 +01:00
Sebastien Boeuf
24dbe7003a irq: Fix pin based interrupt for virtio-pci
When the KVM capability KVM_CAP_SIGNAL_MSI is not present, the VMM
falls back from MSI-X onto pin based interrupts. Unfortunately, this
was not working as expected because the VirtioPciDevice object was
always creating an MSI-X capability structure in the PCI configuration
space. This was causing the guest drivers to expect MSI-X interrupts
instead of the pin based generated ones.

This patch takes care of avoiding the creation of a dedicated MSI-X
capability structure when MSI is not supported by KVM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 18:19:52 +01:00
Sebastien Boeuf
4be3dfeb37 build: Update Cargo.lock for linux-loader crate update
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 18:19:52 +01:00
Sebastien Boeuf
4d98dcb077 msix: Handle MSI-X device masking
As mentioned in the PCI specification, the Function Mask from the
Message Control Register can be set to prevent a device from injecting
MSI-X messages. This supersedes the vector masking as it interacts at
the device level.

Here quoted from the specification:
For MSI and MSI-X, while a vector is masked, the function is prohibited
from sending the associated message, and the function must set the
associated Pending bit whenever the function would otherwise send the
message. When software unmasks a vector whose associated Pending bit is
set, the function must schedule sending the associated message, and
clear the Pending bit as soon as the message has been sent. Note that
clearing the MSI-X Function Mask bit may result in many messages
needing to be sent.

This commit implements the behavior described above by reorganizing
the way the PCI configuration space is being written. It is indeed
important to be able to catch a change in the Message Control
Register without having to implement it for every PciDevice
implementation. Instead, the PciConfiguration has been modified to
take care of handling any update made to this register.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 13:33:53 +01:00
Sebastien Boeuf
d810c7712d msix: Handle MSI-X vector masking
The current MSI-X implementation completely ignores the values found
in the Vector Control register related to a specific vector, and never
updates the Pending Bit Array.

According to the PCI specification, MSI-X vectors can be masked
through the Vector Control register on bit 0. If this bit is set,
the device should not inject any MSI message. When the device
runs into such situation, it must not inject the interrupt, but
instead it must update the bit corresponding to the vector number
in the Pending Bit Array.

Later on, if/when the Vector Control register is updated, and if
the bit 0 is flipped from 0 to 1, the device must look into the PBA
to find out if there was a pending interrupt for this specific
vector. If that's the case, an MSI message is injected and the
bit from the PBA is cleared.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 13:33:53 +01:00
Sebastien Boeuf
42378caa8b vm-virtio: Fix alignment and MSI-X table size on the BAR
As mentioned in the PCI specification:

If a dedicated Base Address register is not feasible, it is
recommended that a function isolate the MSI-X structures from
the non-MSI-X structures with aligned 8 KB ranges rather than
the mandatory aligned 4 KB ranges.

That's why this patch ensures that each structure present on the
BAR is 8KiB aligned.

It also fixes the MSI-X table and PBA sizes so that they can support
up to 2048 vectors, as specified for MSI-X.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 13:33:53 +01:00
Sebastien Boeuf
edd1279609 pci: Allow QWORD read and write to MSI-X table
As mentioned in the PCI specification, MSI-X table supports both
DWORD and QWORD accesses:

For all accesses to MSI-X Table and MSI-X PBA fields, software must
use aligned full DWORD or aligned full QWORD transactions; otherwise,
the result is undefined.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 13:33:53 +01:00
Sebastien Boeuf
00cdbbc673 pci: Make MSI-X PBA read only
Relying on the PCI specification, the Pending Bit Array is read only.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-07 13:33:53 +01:00
Rob Bradford
bbd0f5eebb build: Update Cargo.lock for linux-loader crate update
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-07 12:39:42 +01:00
Rob Bradford
b0a575d361 tests: Add a test for PCI MSI
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
47a4065aaf interrupt: Use a single closure to describe pin based and MSI-X
In order to factorize the complexity brought by closures, this commit
merges IrqClosure and MsixClosure into a generic InterruptDelivery one.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
8df05b72dc vmm: Add MSI-X support to virtio-pci devices
In order to allow virtio-pci devices to use MSI-X messages instead
of legacy pin based interrupts, this patch implements the MSI-X
support for cloud-hypervisor. The VMM code and virtio-pci bits have
been modified based on the "msix" module previously added to the pci
crate.

Fixes #12

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
13a065d2cd dep: Rely on latest kvm-ioctls crate
In order to have access to the newly added signal_msi() function
from the kvm-ioctls crate, this commit updates the version of the
kvm-ioctls to the latest one.

Because set_user_memory_region() has been swtiched to "unsafe", we
also need to handle this small change in our cloud-hypervisor code
directly.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
4b53dc4921 pci: Add MSI-X implementation
In order to support MSI-X, this commit adds to the pci crate a new
module called "msix". This module brings all the necessary pieces
to let any PCI device implement MSI-X support.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Sebastien Boeuf
d3c7b45542 interrupt: Make IRQ delivery generic
Because we cannot always assume the irq fd will be the way to send
an IRQ to the guest, this means we cannot make the assumption that
every virtio device implementation should expect an EventFd to
trigger an IRQ.

This commit organizes the code related to virtio devices so that it
now expects a Rust closure instead of a known EventFd. This lets the
caller decide what should be done whenever a device needs to trigger
an interrupt to the guest.

The closure will allow for other type of interrupt mechanism such as
MSI to be implemented. From the device perspective, it could be a
pin based interrupt or an MSI, it does not matter since the device
will simply call into the provided callback, passing the appropriate
Queue as a reference. This design keeps the device model generic.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 15:27:35 +01:00
Rob Bradford
1f53488003 tests: Switch to launching by command
Launch the test binary by command rather than using using the vmm layer.
This makes it easier to manage the running VM as you can explicitly kill
it.

Also switch to using credibility for the tests which catches assertions
and continues with subsequent commands and reports the issues at the
end. This means it is possible to cleanup even on failed test runs.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 11:03:13 +01:00
Rob Bradford
ddce3df826 tests: Add basic integration testing
Add basic integration testing of the hypervisor using a cloud-init to
configure the VM at boot and SSH to control it at runtime.

Initial test just boots the VM up checks some basic resources and
reboots. With a second test that calls into the first to check that
subsequent tests work correctly.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 11:03:13 +01:00
Rob Bradford
f63d4a7418 vm: Disable stdin and terminal reconfiguration when headless
When not running on a tty (tested with libc's isatty()) disable stdin
and do not reconfigure the terminal.

This is required to ensure that the VM responds correctly when running
in a headless environment such as Jenkins.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 11:03:13 +01:00
Rob Bradford
425841a4fe vm: Do not explictly exit on reset
Instead return from the control_loop() and calling function cleanly.
This is helpful for the testing framework as that means we can launch
multiple VMs in a row.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-06 11:03:13 +01:00
Samuel Ortiz
74a21f24e1 vendor: Remove vendoring
The cargo interaction with the .cargo/config does not meet our
requirements.

Regardless of .cargo/config explicitly replacing our external sources
with vendored ones, cargo build will rely first on Cargo.lock to update
its local source cache. If a dependency has been push forced, build
fails because of our top level Cargo.toml description.
This prevents us from actually pinning dependencies, which defeats the
vendoring purpose.

We're removing vendoring for now, until we understand it better.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-06-06 10:00:16 +01:00
Sebastien Boeuf
842515c2f1 vendor: Add vmm-sys-util duplicate
Since the top-level Cargo.toml specifies a vmm-sys-util revision
but not the sub crates, Cargo.lock points at 2 different crates.
cargo vendor copies both of them into the vendor directory but
forces the build to use the one coming from the top level driven
requirement.

Although this is a waste of space, this is a cargo vendor limitation
that we have to live with for now.

Also, because the dependency onto linux-loader had to be updated,
we had to specify a newly introduced feature called "elf".

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2019-06-06 10:12:04 +02:00
Samuel Ortiz
89fc75d5d3 docs: Initial vendoring documentation
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-06-04 17:51:52 +02:00
Samuel Ortiz
a6b7715f4b vendor: Move to the rust-vmm vmm-sys-util package
Locked to 60fe35be but no longer dependent on liujing2 repo.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-06-04 17:51:52 +02:00
Samuel Ortiz
d5f5648b37 vendor: Add vendored dependencies
We use cargo vendor to generate a .cargo/config file and the vendor
directory. Vendoring allows us to lock our dependencies and to modify
them easily from the top level Cargo.toml.

We vendor all dependencies, including the crates.io ones, which allows
for network isolated builds.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-06-04 17:51:52 +02:00
Rob Bradford
e3f7bc2e9d build: Update Cargo.lock to reflect changed dependencies
This also adds a new comment to Cargo.lock that is introduced by newer
cargo.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-06-03 14:18:18 +01:00
Jing Liu
8370a5bcc2 vmm: Repair the port IO memory alignment
The IO memory alignment should be set as byte alignment instead of 0x400
which is copied from crosvm.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
2019-05-28 08:05:55 -07:00
Sebastien Boeuf
e5e651895b config: Reorganize command line parsing
The command line parsing of the user input was not properly
abstracted from the vmm specific code. In the case of --net,
the parsing was done when the device manager was adding devices.

In order to fix this confusion, this patch introduces a new
module "config" dedicated to the translation of a VmParams
structure into a VmCfg structure. The former is built based
on the input provided by the user, while the latter is the
result of the parsing of every options.

VmCfg is meant to be consumed by the vmm specific code, and
it is also a fully public structure so that it can directly
be built from a testing environment.

Fixes #31

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-05-24 17:08:52 +01:00
Rob Bradford
9900daacf8 README: Update for new --disk usage
And fix the use of "also" that remained when the two sections on usage
were flipped around.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-05-22 17:46:59 +01:00
Rob Bradford
a09f918adc main, vmm: Add support for multiple --disk options
Store the list of disks in a Vec<PathBuf> and then iterate over that
when creating the block devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2019-05-22 17:46:59 +01:00
Samuel Ortiz
52790424f2 vm-allocator: Force documenting all public APIs
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-22 07:44:32 +02:00
Samuel Ortiz
9f247751e7 vm-allocator: Allow for freeing system resources
We allow freeing PIO and MMIO address ranges for now.

Fixes: #27

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-22 07:44:32 +02:00
Samuel Ortiz
4b451b01d9 vm-allocator: Allow for freeing address ranges
We can only free ranges that exactly map an already allocated one, i.e.
this is not a range resizing.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-22 07:44:32 +02:00
Samuel Ortiz
8bb71fad76 vmm: Simplify the vcpu run switch
Use a catchall case for all reasons that we do not handle, and
move the vCPU run switch into its own function.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2019-05-21 07:56:17 -07:00
Logan Saso
6615d55223 Revert "main: Fix --net behavior"
This reverts commit 8e9e7601f5.
2019-05-17 21:22:21 +01:00