Zeroing a cluster will be done from more than one place in qcow.rs soon,
add a helper to reduce duplication.
Change-Id: Idb40539f8e4ed2338fc84c0d53b37c913f2d90fe
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1697122
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 13c219139504d0a191948fd205c835a2505cadba)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There are many corner cases when handling sizes that approach u64::max.
Limit the files to 16TB.
BUG=979458
TEST=Added unittest to check large disks fail
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Change-Id: I93a87c17267ae69102f8d46ced9dbea8c686d093
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1679892
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 93b0c02227f3acf2c6ff127976875cf3ce98c003)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The extra % operation will be slower, but none of these divisions are in
hot paths. They are only used during setup. Many of these operations
take untrusted input from the disk file, so need to be hardened.
BUG=979458
TEST=unit tests still pass
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Change-Id: I0e93c73b345faf643da53ea41bde3349d756bdc7
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1679891
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit eecbccc4d9d70b2fd63681a2b3ced6a6aafe81bb)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Before this change, a corrupt or malicious qcow file could cause crosvm
to allocate absurd amounts of memory. The fuzzer found this case,
limit the L1 table size so it can't cause issues.
BUG=chromium:974123
TEST=run fuzzer locally, add unit test
Change-Id: Ieb6db6c87f71df726b3cc9a98404581fe32fb1ce
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1660890
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 70d7bad28414e4b0d8bdf2d5eb85618a3b1e83c6)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Start with a valid header so the invalid cluster bits field is tested in
isolation. Before this change the test would pass even if the cluster
bits check was removed from the code because the header was invalid for
other reasons.
BUG=none
TEST=this is a test
Change-Id: I5c09417ae3f974522652a50cb0fdc5dc0e10dd44
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1660889
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit c9f254b1921335231b32550b5ae6b8416e1ca7aa)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
qcow currently makes assumptions about being able to fit the L1 and
refcount tables in memory. This isn't necessarily true for very large
files. The new limits to the size of in-memory buffers still allow 1TB
files with default cluster sizes. Because there aren't any 1 TB
chromebooks available yet, limit disks to that size.
This works around issues found by clusterfuzz related to large files
built with small clusters.
BUG=972165,972175,972160,972172
TEST=fuzzer locally + new unit tests
Change-Id: I15d5d8e3e61213780ff6aea5759b155c63d49ea3
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1651460
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit a094f91d2cc96e9eeb0681deb81c37e9a85e7a55)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
u32's get multiplied together and can overflow. A usize was being
returned, make everything a u64 to make sure it fits.
Change-Id: I87071d294f4e62247c9ae72244db059a7b528b62
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1651459
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-by: Zach Reizner <zachr@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 21c941786ea0cb72114f3e9c7c940471664862b5)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a lower limit because cases such as eight byte clusters aren't
practical and aren't worth handling, tracking a cluster costs 16 bytes.
Also put an upper limit on the cluster size, choose 21 bits to match
qemu.
Change-Id: Ifcab081d0e630b5d26b0eafa552bd7c695821686
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1651458
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit cae80e321acdccb1591124f6bf657758f1e75d1d)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When reading from or writing to the MSI-X table, the function provided
by the PCI crate expects the offset to start from the beginning of the
table. That's why it is VFIO specific code to be responsible for
providing the right offset, which means it needs to be the offset
substracted by the beginning of the MSI-X table offset.
This bug was not discovered until we tested VFIO with some device where
the MSI-X table was placed on a BAR at an offset different from 0x0.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The offsets returned by the table_offset() and pba_offset() function
were wrong as they were shifting the value by 3 bits. The MSI-X spec
defines the MSI-X table and PBA offsets as being defined on 3-31 bits,
but this does not mean it has to be shifted. Instead, the address is
still on 32 bits and assumes the LSB bits 0-2 are set to 0.
VFIO was working fine with devices were the MSI-X offset were 0x0, but
the bug was found on a device where the offset was non-null.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The existing code taking care of the epoll loop was too restrictive as
it was propagating the error returned from the epoll_wait() syscall, no
matter what was the error. This causes the epoll loop to be broken,
leading to a non-functional virtio device.
This patch enforces the parsing of the returned error and prevent from
the error propagation in case it is EINTR, which stands for Interrupted.
In case the epoll loop is interrupted, it is appropriate to retry.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The existing code taking care of the epoll loop was too restrictive as
it was propagating the error returned from the epoll_wait() syscall, no
matter what was the error. This causes the epoll loop to be broken,
leading to the VM termination.
This patch enforces the parsing of the returned error and prevent from
the error propagation in case it is EINTR, which stands for Interrupted.
In case the epoll loop is interrupted, it is appropriate to retry.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to fix the clippy error complaining about the number of
arguments passed to a function exceeding the maximum of 7 arguments,
this patch factorizes those parameters into a more global one called
VmInfo.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since virtio-pmem uses a KVM user memory region, it needs to increment
the slot index in use to prevent from any conflict with further VFIO
allocations (used for mapping mappable memory BARs).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In the context of VFIO, we use Vt-d, which means we rely on an IOMMU.
Depending on the IOMMU capability, and in particular if it is not able
to perform SC (Snooping Control), the memory will not be tagged as WB
by KVM, but instead the vCPU will rely on its MTRR/PAT MSRs to find the
appropriate way of interact with specific memory regions.
Because when Vt-d is not involved KVM sets the memory as WB (write-back)
the VMM should set the memory default as WB. That's why this patch sets
the MSR MTRRdefType with the default memory type being WB.
One thing that it is worth noting is that we might have to specifically
create some UC (uncacheable) regions if we see some issues with the
ranges corresponding to the MMIO ranges that should trap.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce a new DiskConfig implementation for Ubuntu Bionic with
different cloud init preparation details and use this when testing with
test_simple_launch.
Adjust the memory expectation for downwards as the EFI boot results in a
slightly different memory map. Also enable serial port as Ubuntu does
not support a virtio-console based boot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Download Bionic image and convert to raw (the QCOW version of the file
doesn't currently work) for running the integration tests.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than embed the disks in a vector and have integer indicies into
the vector for the different disks instead abstract this through an enum
type used on the DiskConfig trait.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extract the network configuration into its own struct and also extract
the prepare_files() and prepare_cloudinit() functions into a struct for
the Clear Linux distribution.
This struct is behind a trait that is used by the Guest implementation
to prepare the files.
This will allow a different implementation to be used for the Ubuntu
disk files.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the newly added code in the pci crate, the vfio crate can now
properly expose an expansion ROM BAR if the device has one.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The expansion ROM BAR can be considered like a 32-bit memory BAR with a
slight difference regarding the amount of reserved bits at the beginning
of its 32-bit value. Bit 0 indicates if the BAR is enabled or disabled,
while bits 1-10 are reserved. The remaining upper 21 bits hold the BAR
address.
This commit extends the pci crate in order to support expansion ROM BAR.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support VFIO for devices supporting multiple functions,
we need to mask the multi-function bit (bit 7 from Header Type byte).
Otherwise, the guest kernel ends up tryng to enumerate those devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Every PCI device is exposed as a separate device, on a specific PCI
slot, and we explicitely don't support to expose a device as a multi
function device. For this reason, this patch makes sure the enumeration
of the PCI bus will not find some multi function device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add cloud-init data for Ubuntu and introduce a convenience script that
can be used to generate cloud-init disk images for manual testing.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
And rename the default user from "admin" to "cloud" as the admin user
clashes with a standard user and group name on Ubuntu.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As per the VIRTIO specification, every virtio device configuration can
be updated while the guest is running. The guest needs to be notified
when this happens, and it can be done in two different ways, depending
on the type of interrupt being used for those devices.
In case the device uses INTx, the allocated IRQ pin is shared between
queues and configuration updates. The way for the guest to differentiate
between an interrupt meant for a virtqueue or meant for a configuration
update is tied to the value of the ISR status field. This field is a
simple 32 bits bitmask where only bit 0 and 1 can be changed, the rest
is reserved.
In case the device uses MSI/MSI-X, the driver should allocate a
dedicated vector for configuration updates. This case is much simpler as
it only requires the device to send the appropriate MSI vector.
The cloud-hypervisor codebase was not supporting the update of a virtio
device configuration. This patch extends the existing VirtioInterrupt
closure to accept a type that can be Config or Queue, so that based on
this type, the closure implementation can make the right choice about
which interrupt pin or vector to trigger.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We add a Reserved region type at the end of the memory hole to prevent
32-bit devices allocations to overlap with architectural address ranges
like IOAPIC, TSS or APIC ones.
Eventually we should remove that reserved range by allocating all the
architectural ranges before letting 32-bit devices use the memory hole.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We want to be able to differentiate between memory regions that must be
managed separately from the main address space (e.g. the 32-bit memory
hole) and ones that are reserved (i.e. from which we don't want to allow
the VMM to allocate address ranges.
We are going to use a reserved memory region for restricting the 32-bit
memory hole from expanding beyond the IOAPIC and TSS addresses.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Since our VFIO code does not support pin based interrupt, but only MSI
and MSI-X, it is cleaner to not expose any Interrupt Pin to the guest by
setting its value to 0.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Until the codebase can properly expose the ROM BAR into the guest, it is
better to disable it for now, returning always 0 when the register is
being read.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The IO BAR alignment was already set to 4 bytes, this patch simply added
a comment for it.
The Memory BAR alignment was also set to the right value, but it was not
explained why 0x1000 was needed, and also why 0x10 could sometimes be
used as correct alignment.
A Memory BAR must be aligned at least on 16 bytes since the first 4 bits
are dedicated to some specific information about the BAR itself. But in
case a BAR is identified as mappable from VFIO, this means our VMM might
memory map it into the VMM address space, and set KVM accordingly using
the ioctl KVM_SET_USER_MEMORY_REGION. In case of KVM, we have to take
into account that it expects addresses to be page aligned, which means
4K in this case.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to correctly support multiple VFIO devices, we need to
increment the memory slot index every time it is being used to set some
user memory region through KVM. That's why the mem_slot parameter is
made mutable.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The memory slot index provided to the DeviceManager was wrong since
only the RAM memory regions are set as user memory regions to KVM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
KVM does not support multiple KVM VFIO devices to be created when
trying to support multiple VFIO devices. This commit creates one
global KVM VFIO device being shared with every VFIO device, which
makes possible the support for passing several devices through the
VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
There is one corner case which was not properly handled by the current
code from our AddressAllocator. If both the address start (from the
next range) and the requested region size are already aligned on the
same value as "alignment", when doing the substract of the requested
size + alignment, the returned address is already aligned. The problem
is that we might end up overlapping with an existing range since the
check between the available delta and the requested size does not take
into account a full extra alignment.
By substracting the requested size + alignment - 1 from the address
start of the next range, we ensure this kind of corner case would not
happen since the address won't be naturally aligned and after some
adjustment from the function align_address(), the correct start address
will be returned.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The check performed on the end address was wrong since the end address
was actually the address right after the end. To get the right end
address, we need to add (region size - 1) to the start address.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The newer kernel is resulting in entropy being slightly lower than
previously. Adjust the expected entropy downwards.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This should reduce the integration testing time considerably. When a
custom kernel is no longer required we can pull kernel from tarball
again.
Fixes: #100
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The VFIO integration test first boots a QEMU guest and then assigns the
QEMU virtio-pci networking device into a nested cloud-hypervisor guest.
We then check that we can ssh into the nested guest and verify that it's
running with the right kernel command line.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With the VFIO crate, we can now support directly assigned PCI devices
into cloud-hypervisor guests.
We support assigning multiple host devices, through the --device command
line parameter. This parameter takes the host device sysfs path.
Fixes: #60
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
VFIO explictly tells us if a MMIO region can be mapped into the guest
address space or not. Except for MSI-X table BARs, we try to map them
into the guest whenever VFIO allows us to do so. This avoids unnecessary
VM exits when the guest tries to access those regions.
Signed-off-by: Zhang, Xiong Y <xiong.y.zhang@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>