This should reduce the integration testing time considerably. When a
custom kernel is no longer required we can pull the kernel from a
tarball again.
Fixes: #100
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The VFIO integration test first boots a QEMU guest and then assigns the
QEMU virtio-pci networking device into a nested cloud-hypervisor guest.
We then check that we can ssh into the nested guest and verify that it's
running with the right kernel command line.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With the VFIO crate, we can now support directly assigned PCI devices
into cloud-hypervisor guests.
We support assigning multiple host devices through the --device command
line parameter, which takes the host device sysfs path.
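For instance, a guest could be launched with one passed-through host
device like this (flags abbreviated, sysfs path illustrative):

    cloud-hypervisor \
        --kernel vmlinux \
        --disk os.img \
        --device /sys/bus/pci/devices/0000:00:04.0/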
Fixes: #60
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
VFIO explicitly tells us whether an MMIO region can be mapped into the guest
address space or not. Except for MSI-X table BARs, we try to map them
into the guest whenever VFIO allows us to do so. This avoids unnecessary
VM exits when the guest tries to access those regions.
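As a minimal sketch of that check (the flag constant comes from the
VFIO uapi in linux/vfio.h; surrounding plumbing elided):

    // linux/vfio.h: VFIO_REGION_INFO_FLAG_MMAP is bit 2 of
    // vfio_region_info.flags.
    const VFIO_REGION_INFO_FLAG_MMAP: u32 = 1 << 2;

    // True when VFIO allows the region to be mmap'ed, and therefore
    // mapped directly into the guest address space.
    fn region_mappable(region_flags: u32) -> bool {
        region_flags & VFIO_REGION_INFO_FLAG_MMAP != 0
    }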
Signed-off-by: Zhang, Xiong Y <xiong.y.zhang@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We track all MSI and MSI-X capabilities changes, which allows us to also
track all MSI and MSI-X table changes.
With both pieces of information we can build KVM IRQ routing tables and
map the physical device MSI/X vectors to the guest ones. Once that
mapping is in place we can toggle the VFIO IRQ API accordingly and
enable or disable MSI or MSI-X interrupts, from the physical device up to
the guest.
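A minimal sketch of one such route, using the kvm-bindings
definitions (assuming the guest-programmed MSI address and data have
already been trapped; the actual code may differ):

    use kvm_bindings::{kvm_irq_routing_entry, KVM_IRQ_ROUTING_MSI};

    // Build one routing entry mapping a guest GSI to the MSI
    // address/data the guest programmed into the capability.
    fn msi_route(gsi: u32, msi_addr: u64, msi_data: u32) -> kvm_irq_routing_entry {
        let mut entry = kvm_irq_routing_entry {
            gsi,
            type_: KVM_IRQ_ROUTING_MSI,
            ..Default::default()
        };
        entry.u.msi.address_lo = msi_addr as u32;
        entry.u.msi.address_hi = (msi_addr >> 32) as u32;
        entry.u.msi.data = msi_data;
        entry
    }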
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to properly manage the VFIO device interrupt settings, we need
to keep track of both MSI and MSI-X PCI config capabilities changes.
When the guest programs the device for interrupt delivery, it writes to
the MSI and MSI-X capabilities. This information must be trapped and
cached in order to map the physical device interrupt delivery path to
the guest one. In other words, tracking MSI and MSI-X capabilities will
allow us to accurately build the KVM interrupt routes.
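For reference, the enable bit to be trapped sits in the Message
Control word, two bytes into the capability. A sketch based on the
PCI spec layout:

    // Message Control lives at offset 2 within the MSI capability,
    // after the capability ID and next-pointer bytes.
    const MSI_MSG_CTL_OFFSET: u64 = 2;
    const MSI_MSG_CTL_ENABLE: u16 = 1 << 0;

    // Called when a guest config write lands inside the MSI capability.
    fn msi_enable_requested(offset: u64, data: &[u8]) -> bool {
        offset == MSI_MSG_CTL_OFFSET
            && data.len() >= 2
            && u16::from_le_bytes([data[0], data[1]]) & MSI_MSG_CTL_ENABLE != 0
    }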
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This brings the initial PCI support to the VFIO crate.
The VfioPciDevice is the main structure and holds an inner VfioDevice.
VfioPciDevice implements the PCI trait, leaving the IRQ assignments
empty as this will be driven by both the guest and the VFIO PCI device,
not by the VMM.
As we must trap BAR programming from the guest (we don't want to program
the actual device with guest addresses), we use our local PCI
configuration cache to read and write BARs.
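A hedged sketch of that trap path (names illustrative, not the
crate's exact code): guest BAR writes land in the cache, never in the
physical device.

    // Illustrative per-device cache of the guest-visible config
    // space. BAR accesses are redirected here so the physical device
    // never sees guest addresses.
    struct ConfigCache {
        regs: [u32; 64],
    }

    impl ConfigCache {
        fn write_bar(&mut self, reg_idx: usize, value: u32) {
            self.regs[reg_idx] = value;
        }

        fn read_bar(&self, reg_idx: usize) -> u32 {
            self.regs[reg_idx]
        }
    }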
Signed-off-by: Zhang, Xiong Y <xiong.y.zhang@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Virtual Function I/O (VFIO) kernel subsystem exposes a vast and
relatively complex userspace API. This commit abstracts and simplifies
this API into both an internal and external API.
The external API is to be consumed by VFIO device implementations through
the VfioDevice structure. A VfioDevice instance can:
- Enable and disable all interrupts (INTX, MSI and MSI-X) on the
underlying VFIO device.
- Read and write all of the VFIO device memory regions.
- Set the system's IOMMU tables for the underlying device.
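A sketch of that external surface, with illustrative names and
signatures rather than the crate's exact ones:

    use std::io;
    use vmm_sys_util::eventfd::EventFd;

    trait VfioDeviceApi {
        // Interrupt control (INTX, MSI and MSI-X), backed by eventfds.
        fn enable_irq(&self, irq_index: u32, fds: Vec<&EventFd>) -> io::Result<()>;
        fn disable_irq(&self, irq_index: u32) -> io::Result<()>;

        // Raw access to the device memory regions.
        fn region_read(&self, region_index: u32, offset: u64, data: &mut [u8]);
        fn region_write(&self, region_index: u32, offset: u64, data: &[u8]);

        // Program the system IOMMU with the guest memory mappings.
        fn setup_dma_map(&self) -> io::Result<()>;
    }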
Signed-off-by: Zhang, Xiong Y <xiong.y.zhang@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Our DeviceManager::new() routine is reaching north of 250 lines.
For simplicity and readability's sake, extract all virtio device
creation code into dedicated routines.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Two integration tests are added for testing the implemented virtio
console device for single port operation. One checks the device's
presence and simple stdout operation. The other checks stdout
redirection to a file (option: file) using the virtio console.
Signed-off-by: A K M Fazla Mehrab <fazla.mehrab.akm@intel.com>
To use the implemented virtio console device, users can select one
of three options ("off", "tty" or "file=/path/to/the/file") with
the command line argument "--console". By default, the console is
enabled as a device named "hvc0" (option: tty). When the "off" option
is used, the console device is not added to the VM configuration at
all.
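For example, redirecting the console to a file could look like this
(other flags elided):

    cloud-hypervisor ... --console file=/tmp/console.log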
Signed-off-by: A K M Fazla Mehrab <fazla.mehrab.akm@intel.com>
The virtio console device is a console for the communication between
the host and guest userspace. It has two parts: the device and the
driver. The console device is implemented here as a virtio-pci device
to the guest. On the other side, the guest OS is expected to have a
character device driver which provides an interface to the userspace
applications.
The console device can have multiple ports where each port has one
transmit queue and one receive queue. The current implementation only
supports one port. For data IO communication, one or more empty
buffers are placed in the receive queue for incoming data, and
outgoing characters are placed in the transmit queue. The detailed
spec can be found at the following link:
https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.pdf#e7
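For reference, the device configuration layout defined by that spec
looks like this (a sketch; fields are little-endian on the wire):

    #[repr(C, packed)]
    struct VirtioConsoleConfig {
        cols: u16,         // valid with VIRTIO_CONSOLE_F_SIZE
        rows: u16,         // valid with VIRTIO_CONSOLE_F_SIZE
        max_nr_ports: u32, // valid with VIRTIO_CONSOLE_F_MULTIPORT
        emerg_wr: u32,     // valid with VIRTIO_CONSOLE_F_EMERG_WRITE
    }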
Apart from the console, for communication between the guest and the
host, Cloud Hypervisor has a legacy serial device implemented. However,
the implementation of a console device lets us be independent of legacy
pin-based interrupts without losing the logs and access to the VM.
Signed-off-by: A K M Fazla Mehrab <fazla.mehrab.akm@intel.com>
With this new AddressAllocator as part of the SystemAllocator, the
VMM can now decide with finer granularity where to place memory.
By allocating the RAM and the hole into the MMIO address space, we
ensure that no memory will accidentally be allocated where the RAM or
the hole is.
And by creating the new MMIO hole address space, we create a subset
of the entire MMIO address space where we can place 32-bit BARs, for
example.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The requested address for a range can be the base of the entire
address space; this is a valid use case.
In particular, when creating an MMIO address space of 0-64GiB, we
might want to create a range of 0-1GiB if the RAM of our VM is 1GiB.
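A minimal sketch of that case (constructor and allocate() signatures
illustrative):

    // A 0-64GiB MMIO address space whose very first range, starting
    // at base 0, holds 1GiB of RAM.
    let mut mmio = AddressAllocator::new(GuestAddress(0), 64 << 30);
    let ram = mmio.allocate(Some(GuestAddress(0)), 1 << 30);
    assert!(ram.is_some());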
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch fixes the function first_available_range() responsible
for finding the first range that could fit the requested size.
The algorithm was working, that is, it allocated ranges from the end
of the address space, because we created an empty region right at the
end. The problem is that the VMM might request some specific
allocations at a fixed address, to allocate the RAM for example. In
this case, the RAM range could be 0-1GiB, which means that with the
previous algorithm, the next available range would have been found
right after 1GiB.
This is not the intended behavior, and that's why the algorithm has
been fixed by this patch, making sure to walk down existing ranges
starting from the end.
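The fixed lookup, as a simplified sketch using a plain map of base
address to length rather than the crate's actual types:

    use std::collections::BTreeMap;

    // Walk down the existing ranges from the end of the address space
    // and return the base of the first gap large enough for `size`.
    fn first_available_range(
        ranges: &BTreeMap<u64, u64>, // base -> length
        size: u64,
        space_end: u64,
    ) -> Option<u64> {
        let mut gap_end = space_end;
        for (&base, &len) in ranges.iter().rev() {
            if gap_end.checked_sub(size)? >= base + len {
                return Some(gap_end - size);
            }
            gap_end = base;
        }
        gap_end.checked_sub(size)
    }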
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A GSI (Global System Interrupt) is an extension of a plain linear array
of IRQs; it takes IOAPICs into account, for example.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
There is alignment support for AddressAllocator, but there are
occasions where the alignment is known only when we call allocate().
One example is a PCI BAR, which is naturally aligned, meaning we have
to align its base address to its size.
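With that in place, a naturally aligned BAR allocation can look like
this sketch (signature illustrative):

    // A 16KiB memory BAR must sit on a 16KiB boundary, so pass the
    // size itself as the alignment.
    let bar_size: u64 = 16 << 10;
    let bar_addr = allocator.allocate(None, bar_size, Some(bar_size));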
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
This is only for allocating the port IO address range.
If a platform does not have PIO devices at all, the address
range will simply be unused.
So, simplify the vm-allocator data structure by making both
MMIO and PIO mandatory.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
This patch adds support for both IO and Memory BARs by expecting the
function allocate_bars() to identify the type of each BAR.
Based on the type, register_mapping() inserts the address range on the
appropriate bus (PIO or MMIO).
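In outline, the registration path looks something like this sketch
(illustrative types, not the exact code):

    enum PciBarType {
        Io,
        Memory,
    }

    struct Buses {
        pio: Vec<(u64, u64)>,  // stand-in for the PIO bus ranges
        mmio: Vec<(u64, u64)>, // stand-in for the MMIO bus ranges
    }

    impl Buses {
        fn register_mapping(&mut self, bar_type: PciBarType, base: u64, len: u64) {
            match bar_type {
                PciBarType::Io => self.pio.push((base, len)),
                PciBarType::Memory => self.mmio.push((base, len)),
            }
        }
    }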
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The way write_reg() was implemented, it did not preserve the bits
that are supposed to be read-only whenever the guest wrote to one of
them. That's why this commit takes care of protecting those bits,
preventing them from being updated.
The tricky part is about the BARs since we also need to handle the very
specific case where the BAR is being written with all 1's. In that case
we want to return the size of the BAR instead of its address.
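A simplified sketch of that sizing path (assuming a power-of-two BAR
size, with the low flag bits kept read-only):

    // On a write of all 1s the guest is sizing the BAR: latch the
    // size mask so the next read returns the encoded size. Otherwise
    // store the new base address, preserving the read-only flag bits.
    fn bar_write(cached: &mut u32, bar_size: u32, flag_bits: u32, value: u32) {
        let addr_mask = !(bar_size - 1);
        if value == 0xffff_ffff {
            *cached = addr_mask | flag_bits;
        } else {
            *cached = (value & addr_mask) | flag_bits;
        }
    }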
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A BAR can be one of three different types: IO, 32-bit Memory, or
64-bit Memory. The VMM needs a way to set the right type depending on
its needs.
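A sketch of the type selector (the crate's actual enum may differ):

    enum PciBarRegionType {
        IoRegion,
        Memory32BitRegion,
        Memory64BitRegion,
    }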
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support use cases that require MSI, the pci crate is
being expanded with the description of an MSI PCI capability
structure through the new MsiCap Rust structure.
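The registers being described, as a sketch of the PCI spec layout
(the capability ID and next-pointer bytes that precede them, and the
optional per-vector mask registers, are omitted):

    #[repr(C, packed)]
    struct MsiCap {
        msg_ctl: u16,     // Message Control: enable, multi-message bits
        msg_addr_lo: u32, // Message Address
        msg_addr_hi: u32, // Message Upper Address (64-bit devices)
        msg_data: u16,    // Message Data
    }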
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit enhances the current msi-x code hosted in the pci crate
in order to be reused by the vfio crate. Specifically, it creates
several useful methods for the MsixCap structure that can simplify
the caller's code.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The QCOW2 format is documented here:
https://git.qemu.org/?p=qemu.git;a=blob;f=docs/interop/qcow2.txt;hb=HEAD
The only difference between v2 and v3 is the addition of some extra
fields into the header in v3 for which there are default values in v2.
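The version is easy to probe: it is a big-endian u32 at byte offset
4, right after the magic. A sketch:

    use std::fs::File;
    use std::io::{Read, Result, Seek, SeekFrom};

    // Read the qcow2 version field (2 or 3) from an image header.
    fn qcow_version(image: &mut File) -> Result<u32> {
        let mut buf = [0u8; 4];
        image.seek(SeekFrom::Start(4))?;
        image.read_exact(&mut buf)?;
        Ok(u32::from_be_bytes(buf))
    }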
This introduces a new unit test for the behaviour, but it has also
been manually verified by converting the image from v3 to v2 with a
command like:
qemu-img convert -O qcow2 -o compat=0.10 clear-29620-cloud.img clear-29620-cloud.img.v2
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In some situations it is possible for the setting of the capabilities
to fail due to the variable naming of the build artifacts, resulting
in the first parameter to setcap being rejected and thus the whole
command failing.
Use xargs -n 1 to ensure that every potential target independently has
its caps set.
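An illustrative shape of the fix (the exact find pattern and
capability set are whatever the CI script uses):

    cargo test --all --no-run
    find target/debug -maxdepth 2 -type f -name 'cloud-hypervisor-*' \
        | xargs -n 1 sudo setcap cap_net_admin+ep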
Further it was observed that in some situations the binary produced by
cargo test --all --no-run would not be used and instead a new binary
would be produced when the test was run using the second method. This
again would result in test failures as that binary did not have the
desired capabilities set. Therefore build the test binaries with the
same methodology used to run them.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
On the Jenkins build slaves, disk I/O is a bottleneck, so make /tmp a
tmpfs, which removes I/O issues when running lots of VMs at the same
time.
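For example:

    sudo mount -t tmpfs tmpfs /tmp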
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Create a struct to handle all the details for the guest under test
including details of network and disks.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Allow replacement of the network details used for the VM. By replacing
those from the file checked into the source tree we can continue to use
the file in the tree for manual testing but adjust the network per-VM to
allow parallel testing.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In the future this will provide the basis for the ability to customise
the cloud-init file per VM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By sleeping earlier, we speed up the tests, as the SSH connection
will complete on the first attempt, thus alleviating timeout and
backoff delays.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use the tempdir crate to create a temporary directory that is deleted
when the structure goes out of scope.
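The pattern, roughly (the prefix string is illustrative):

    use tempdir::TempDir;

    fn setup() -> std::io::Result<()> {
        // Deleted automatically when `dir` goes out of scope.
        let dir = TempDir::new("cloud-hypervisor-test")?;
        let _disk_path = dir.path().join("disk.img");
        // ... create test files under dir.path() ...
        Ok(())
    }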
Use this temporary directory for all temporary test files created by the
tests. The cloud init file is still in /tmp as that is created by the
test wrapper code.
This is the first stage towards being able to run the integration tests
in parallel.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The addition of [workspace] to the top level Cargo.toml is necessary to
have the binaries colocated.
The Cargo.lock files have also been refreshed by the change to the
Cargo.toml.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The type of interrupt_evt has changed along with the addition of an
msix_config member for the virtio device.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than relying on shared memory for a temporary file for QCOW
testing, use the tempfile crate to get a temporary file. The vector
cache tests also need a trivial update after the refactor.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With the adoption of the rust-vmm linux-loader crate, some small
changes were needed to update the unit tests to reflect this change:
* configure_system now takes an extra parameter
* the e820 entry structure comes from the linux-loader crate
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Some refactoring has taken place since the unit tests were written:
The read/write methods in BusDevice now take a base address, and the
interrupt handling code has changed, necessitating a new TestInterrupt
struct.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The older version of pnet had a bug which broke some of the behaviour
that the unit tests relied upon.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a "--serial" command line that takes as input either "off", "tty"
(default and current behaviour) and "file=/path/to/file".
When "--serial off" is used the serial device is not added to the VM
configuration at all.
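For example, capturing the serial output to a file (other flags
elided):

    cloud-hypervisor ... --serial file=/tmp/serial.log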
Integration tests are added that check whether serial interrupts are
present (or not) and that, when sending output to a file, the file
contains the expected serial output.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>