This patch enables the vhost-user protocol features that let the slave
initiate requests towards the master (VMM). It also takes care of
receiving the requests from the slave and taking appropriate actions
based on the request type.
The flow now works as follows:
- The VMM creates a region of memory that is made available to the
guest by exposing it through the virtio-fs PCI BAR 2.
- The virtio-fs device is created by the VMM, exposing some protocol
feature bits to virtiofsd, letting it know that it can send requests
to the VMM through a dedicated socket.
- On behalf of the guest driver asking to read or write a file,
virtiofsd sends a request to the VMM, asking for a file descriptor to
be mapped into the shared memory region at a specific offset (see the
sketch after this list).
- The guest can directly read/write the file at the offset of the
memory region.
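A minimal sketch of how the VMM side could service such a map request,
with hypothetical names; the real message layout belongs to the
vhost-user protocol extension:

    // Sketch only: map the file descriptor sent by virtiofsd into the
    // cache region at the requested offset (all names hypothetical).
    use std::os::unix::io::RawFd;

    unsafe fn handle_map_request(
        cache_host_addr: *mut libc::c_void, // start of the BAR 2 mapping
        fd: RawFd,
        fd_offset: libc::off_t,
        cache_offset: usize,
        len: usize,
    ) -> std::io::Result<()> {
        let ret = libc::mmap(
            cache_host_addr.add(cache_offset),
            len,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_SHARED | libc::MAP_FIXED, // overlay inside the region
            fd,
            fd_offset,
        );
        if ret == libc::MAP_FAILED {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    }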
This implementation is more performant than the one relying exclusively
on the virtqueues. With the virtqueues, the content of the file must be
copied into the queues every time the guest asks to access it.
With the shared memory region, the virtqueues become the control plane,
where the libfuse commands are sent to virtiofsd. The data plane is the
memory region itself, which needs no extra copy of the file content. The
only penalty is that the first time a file is accessed, it needs to be
mapped into the VMM's virtual address space.
Another case where this solution will not perform as well as expected
is when a file is larger than the region itself. The file then needs to
be mapped in several pieces and, worse, remapped every time it is
accessed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When the cache_size parameter of the virtio-fs device is not empty, the
VMM creates a dedicated memory region where the shared files will be
memory mapped by the virtio-fs device.
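One plausible way to create such a region, sketched under the
assumption that it only needs to be reserved up front and populated
lazily by the map requests:

    // Reserve cache_size bytes of address space for the cache region;
    // file mappings are overlaid later at specific offsets.
    let cache_size: usize = 8 << 30; // taken from the cache_size parameter
    let host_addr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            cache_size,
            libc::PROT_NONE, // no access until a file is mapped in
            libc::MAP_ANONYMOUS | libc::MAP_PRIVATE | libc::MAP_NORESERVE,
            -1,
            0,
        )
    };
    assert_ne!(host_addr, libc::MAP_FAILED);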
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support the more performant version of virtio-fs, the one
relying on a shared memory region between host and guest, we introduce
two new parameters to the --fs device.
The "dax" parameter lets the user choose whether to use the shared
memory region with virtio-fs. By default, this parameter is "on".
The "cache_size" parameter lets the user specify the amount of memory
that should be shared between host and guest. By default, the value of
this parameter is 8GiB, as advised by the virtio-fs maintainers.
Note that dax=off and cache_size are incompatible.
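A hypothetical invocation (the other --fs fields are illustrative and
may not match the exact CLI syntax):

    cloud-hypervisor ... --fs tag=myfs,sock=/tmp/virtiofs.sock,dax=on,cache_size=8G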
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Shared memory regions are not present for most virtio devices. For this
reason, we prefer to dedicate a separate BAR to the shared memory
regions.
Another reason is that the memory regions, if there are several, can be
allocated all at once as one contiguous range, which can then be exposed
as its own BAR. It would be more complicated to make BAR 0, which holds
the regular information about the virtio-pci device, also hold the
shared memory regions.
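A sketch of the idea, with hypothetical names: lay all the regions out
back to back so that a single BAR can cover them.

    struct ShmRegion {
        len: u64,
        bar_offset: u64,
    }

    // Returns the total size to request for the dedicated BAR.
    fn layout(regions: &mut [ShmRegion]) -> u64 {
        let mut offset = 0;
        for r in regions.iter_mut() {
            r.bar_offset = offset;
            offset += r.len;
        }
        offset
    }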
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch cleans up the VirtioDevice trait. Since some functions are
PCI specific and are not even used, it makes sense to remove them from
the trait definition.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the newly added SharedMemoryConfig capability in the virtio
specification, and given that it is not tied to the type of transport
(PCI or MMIO), we can add to the VirtioDevice trait a new method that
provides the shared memory regions associated with the device.
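A sketch of what the trait addition could look like; the names are
assumptions, not the exact signature:

    pub struct SharedMemoryRegion {
        pub id: u8,
        pub len: u64,
    }

    pub trait VirtioDevice {
        // ...existing methods...

        /// Shared memory regions, if the device has any. Transports
        /// (PCI today, MMIO later) decide how to expose them.
        fn shm_regions(&self) -> Option<Vec<SharedMemoryRegion>> {
            None
        }
    }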
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Based on the latest version of the virtio specification, the structure
virtio_pci_cap has been updated and a new structure virtio_pci_cap64 has
been introduced.
virtio_pci_cap now includes an "id" field that does not change the
existing structure size, since there was already a 3-byte reserved field
there. The id is used in the context of shared memory regions, which
need to be identified since there can be more than one capability of
this kind.
virtio_pci_cap64 is a new structure that includes virtio_pci_cap and
extends it to allow 64-bit offsets and 64-bit region lengths. This is
used for the shared memory regions capability, as we might need to
describe regions of 4GiB or more, which could be placed at an offset of
4GiB or more in the associated BAR.
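The layouts described by the specification, transcribed to Rust:

    #[repr(C, packed)]
    struct VirtioPciCap {
        cap_vndr: u8,     // PCI vendor-specific capability ID
        cap_next: u8,     // next capability pointer
        cap_len: u8,      // capability length
        cfg_type: u8,     // which structure this capability describes
        bar: u8,          // BAR holding the structure
        id: u8,           // new: tells multiple shm capabilities apart
        padding: [u8; 2], // remainder of the former 3-byte reserved field
        offset: u32,      // little-endian offset within the BAR
        length: u32,      // little-endian length of the structure
    }

    #[repr(C, packed)]
    struct VirtioPciCap64 {
        cap: VirtioPciCap,
        offset_hi: u32, // high 32 bits of the offset
        length_hi: u32, // high 32 bits of the length
    }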
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The length of the PCI capability, as calculated for the guest, was not
accurate since it did not include the implicit 2-byte offset. The reason
for this offset is that the structure itself does not contain the
capability ID (1 byte) and the next capability pointer (1 byte), but the
structure exposed through PCI config space does include those bytes.
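A sketch of the corrected computation, with hypothetical context:

    // The structure handed to add_capability() starts after the 1-byte
    // capability ID and the 1-byte next pointer, but the reported
    // length must cover those two bytes as well.
    let cap_len = (2 + std::mem::size_of::<VirtioPciCap>()) as u8;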
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Whether the communication comes from the master or the slave, the
receiver should always reply with an ack if the message header specifies
that a reply is expected.
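Sketched with assumed names; per the vhost-user spec, bit 3 of the
header flags is the need_reply bit:

    const VHOST_USER_NEED_REPLY: u32 = 1 << 3;

    // After processing any message, decide whether an ack is owed.
    fn ack_payload(hdr_flags: u32, success: bool) -> Option<u64> {
        if hdr_flags & VHOST_USER_NEED_REPLY != 0 {
            Some(if success { 0 } else { 1 }) // 0 conventionally means OK
        } else {
            None
        }
    }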
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because the way to mount a virtio-fs filesystem changed with the newest
kernel, we need to update the mount command in our integration tests.
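With recent kernels the mount boils down to something like this (tag
and mount point are illustrative):

    mount -t virtiofs myfs /mnt/virtiofs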
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to be able to use the latest version of the virtiofsd binary,
the integration tests will now build it directly from a branch located
on sboeuf's QEMU fork. In the same way the kernel is hosted on sboeuf's
Linux kernel fork, this allows updating the virtio-fs daemon based on
the latest patches from the virtio-fs maintainers.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The last kernel we were using included some manual porting of the
virtio-pmem and virtio-fs patches. By moving to 5.3-rc3, we are now
closer to upstream, since the virtio-pmem patches are part of the Linux
kernel at this point. Additionally, this kernel includes the latest
patches from the virtio-fs maintainers, which work with the latest
version of virtiofsd.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The number of bytes read was being incorrectly increased by the end
index of the sliced data rather than by the number of bytes that were
in the range. This caused a panic when the network was under load from
iperf.
It's important to note that in the Firecracker code base, read_slice()
returns the number of bytes read, which is used to increment this
counter. The vm-memory version, however, only returns the empty unit
"()".
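A sketch of the corrected accounting, with a hypothetical signature;
since the vm-memory read_slice() returns (), the caller must derive the
count from the range it actually read:

    fn copy_range(frame: &mut [u8], src: &[u8], mut read_count: usize) -> usize {
        let start = read_count;
        let end = std::cmp::min(frame.len(), start + src.len());
        frame[start..end].copy_from_slice(&src[..end - start]);
        read_count += end - start; // bytes in the range, not `end` itself
        read_count
    }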
Fixes: #166
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
One of the features of the virtio console device is that its size can
be configured and updated. Our first iteration of the console device
implementation lacked this feature; as a result, it had a fixed default
size which could not be changed. This commit implements the console
config feature and lets us change the console size from the VMM side.
During the activation of the device, the VMM reads the current terminal
size, sets the console configuration accordingly, and lets the driver
know about this configuration by sending an interrupt. Later, if
someone changes the terminal size, the VMM detects the corresponding
event, updates the configuration, and sends an interrupt as before. As
a result, the console device driver in the guest updates the console
size.
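A sketch of the terminal-size read, assuming libc and a config layout
along the lines of the virtio console config:

    use libc::{ioctl, winsize, STDOUT_FILENO, TIOCGWINSZ};

    #[repr(C)]
    struct ConsoleConfig {
        cols: u16,
        rows: u16,
    }

    fn read_terminal_size() -> Option<ConsoleConfig> {
        let mut ws: winsize = unsafe { std::mem::zeroed() };
        // SAFETY: TIOCGWINSZ only writes into the provided winsize.
        if unsafe { ioctl(STDOUT_FILENO, TIOCGWINSZ, &mut ws) } == 0 {
            Some(ConsoleConfig { cols: ws.ws_col, rows: ws.ws_row })
        } else {
            None
        }
    }

On a resize, the VMM refreshes this config and raises a
configuration-change interrupt so the guest driver re-reads it.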
Signed-off-by: A K M Fazla Mehrab <fazla.mehrab.akm@intel.com>
Poor performance was observed when booting kernels with "console=ttyS0"
and the serial port disabled.
This change introduces a "null" console output mode and makes it the
default for the serial console. In this case the serial port is
advertised just like in the other output modes, but there is no input
and any output is dropped.
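A sketch of the "null" output: keep advertising the port but discard
everything written to it (std::io::sink() achieves the same):

    struct NullOutput;

    impl std::io::Write for NullOutput {
        fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
            Ok(buf.len()) // pretend everything was written
        }
        fn flush(&mut self) -> std::io::Result<()> {
            Ok(())
        }
    }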
Fixes: #163
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Relying on the newly added MSI-X helper, the interrupt callback checks
that interrupts are enabled on the device before trying to trigger one.
Fixes: #156
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to check if a device's interrupts are enabled, this patch adds
a helper function to the MsixConfig structure so that at any point in
time we can check whether an interrupt should be delivered.
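A sketch of the helper; the field is an assumption, but bit 15 of the
MSI-X Message Control register really is the MSI-X Enable bit:

    pub struct MsixConfig {
        msg_ctl: u16, // cached copy of the Message Control register
    }

    impl MsixConfig {
        pub fn enabled(&self) -> bool {
            const MSIX_ENABLE: u16 = 1 << 15;
            self.msg_ctl & MSIX_ENABLE != 0
        }
    }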
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The vector control register is the fourth 32-bit word of each MSI-X
table entry. For that reason, it is located at offset 0xc, not 0x10 as
implemented. This commit fixes the current MSI-X code, allowing proper
reading and writing of each vector control register in the MSI-X table.
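For reference, the MSI-X table entry layout (16 bytes per entry):

    const MSIX_TABLE_ENTRY_SIZE: u64 = 16;
    const MSG_ADDR_LO_OFFSET: u64 = 0x0;
    const MSG_ADDR_HI_OFFSET: u64 = 0x4;
    const MSG_DATA_OFFSET: u64 = 0x8;
    const VECTOR_CTL_OFFSET: u64 = 0xc; // not 0x10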
Fixes: #156
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This makes the log macros (error!, warn!, info!, etc.) in the code
work. It currently defaults to showing only error! messages, but the
verbosity can be increased by passing additional "-v" flags on the
command line.
By default log output goes onto stderr but it can also be sent to a
file.
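A hypothetical invocation (the exact flag for file output may differ):

    cloud-hypervisor -v -v --log-file ch.log ...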
Fixes: #121
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With the addition of commands that run after the integration tests, the
exit code of the tests was lost.
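The usual shell pattern, sketched with a hypothetical script name:

    ./run_integration_tests.sh
    RES=$?
    # ...post-test commands that would otherwise clobber $?...
    exit $RES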
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The structure provided to the add_capability() function should only
contain what comes after the capability ID and the next capability
pointer, which occupy the first WORD.
Because the structure TestCap included the _vndr and _next fields,
those were written right after the first WORD, while the assertion
expected to find the values of the len and foo fields there.
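A sketch of the corrected test structure; add_capability() writes the
capability ID and the next pointer itself, so the struct must start
right after that first WORD:

    #[repr(C, packed)]
    struct TestCap {
        len: u8,
        foo: u8,
    }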
Fixes: #105
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The structure of the vmm-sys-util crate has changed with lots of code
moving to submodules.
This change adjusts the use of the imported structs to reference the
submodules.
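For example, EventFd moved under the eventfd submodule:

    -use vmm_sys_util::EventFd;
    +use vmm_sys_util::eventfd::EventFd;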
Fixes: #145
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Update all dependencies with "cargo upgrade", with the exception of
vmm-sys-util, which needs some extra porting work.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the header puts the refcount table outside the file size or if it
specifies a table much larger than needed, fail to open the file.
These might not be hard qcow errors, but they are situations that crosvm
will never encounter.
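A sketch of the checks, with hypothetical names:

    fn check_refcount_table(
        file_size: u64,
        table_offset: u64,
        table_bytes: u64,
        max_needed_bytes: u64,
    ) -> Result<(), &'static str> {
        let end = table_offset
            .checked_add(table_bytes)
            .ok_or("refcount table offset overflows")?;
        if end > file_size {
            return Err("refcount table lies outside the file");
        }
        if table_bytes > max_needed_bytes {
            return Err("refcount table much larger than needed");
        }
        Ok(())
    }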
BUG=986061
TEST=fuzzer with new test cases completes in less than 5 seconds.
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Change-Id: If048c96f6255ca81740e20f3f4eb7669467dbb7b
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1716365
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 969a0b49ff0a9afbca18230181542bbe7e06b8f7)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Zeroing a cluster will soon be done from more than one place in
qcow.rs; add a helper to reduce duplication.
Change-Id: Idb40539f8e4ed2338fc84c0d53b37c913f2d90fe
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1697122
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 13c219139504d0a191948fd205c835a2505cadba)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There are many corner cases when handling sizes that approach u64::MAX.
Limit files to 16TB.
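A sketch of the limit; the constant name is an assumption:

    const MAX_QCOW_FILE_SIZE: u64 = 16 << 40; // 16 TB

    fn check_virtual_size(virtual_size: u64) -> Result<(), &'static str> {
        if virtual_size > MAX_QCOW_FILE_SIZE {
            Err("disk image too large")
        } else {
            Ok(())
        }
    }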
BUG=979458
TEST=Added unittest to check large disks fail
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Change-Id: I93a87c17267ae69102f8d46ced9dbea8c686d093
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1679892
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 93b0c02227f3acf2c6ff127976875cf3ce98c003)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The extra % operation will be slower, but none of these divisions are
in hot paths; they are only used during setup. Many of these operations
take untrusted input from the disk file, so they need to be hardened.
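The pattern in question, sketched: round-up division via an extra
modulo instead of the classic (n + d - 1) / d, which overflows when n
is near u64::MAX:

    fn div_round_up_u64(n: u64, d: u64) -> u64 {
        n / d + if n % d != 0 { 1 } else { 0 }
    }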
BUG=979458
TEST=unit tests still pass
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Change-Id: I0e93c73b345faf643da53ea41bde3349d756bdc7
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1679891
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit eecbccc4d9d70b2fd63681a2b3ced6a6aafe81bb)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Before this change, a corrupt or malicious qcow file could cause crosvm
to allocate absurd amounts of memory. The fuzzer found this case; limit
the L1 table size so it can't cause issues.
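A sketch of the guard; the limit value here is an assumption, not the
exact number picked:

    const MAX_L1_ENTRIES: u64 = 1 << 21;

    fn check_l1_size(l1_size: u32) -> Result<(), &'static str> {
        if u64::from(l1_size) > MAX_L1_ENTRIES {
            Err("header requests an absurdly large L1 table")
        } else {
            Ok(())
        }
    }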
BUG=chromium:974123
TEST=run fuzzer locally, add unit test
Change-Id: Ieb6db6c87f71df726b3cc9a98404581fe32fb1ce
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1660890
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 70d7bad28414e4b0d8bdf2d5eb85618a3b1e83c6)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Start with a valid header so the invalid cluster bits field is tested in
isolation. Before this change the test would pass even if the cluster
bits check was removed from the code because the header was invalid for
other reasons.
BUG=none
TEST=this is a test
Change-Id: I5c09417ae3f974522652a50cb0fdc5dc0e10dd44
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1660889
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit c9f254b1921335231b32550b5ae6b8416e1ca7aa)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
qcow currently assumes that the L1 and refcount tables fit in memory.
This isn't necessarily true for very large files. The new limits on the
size of in-memory buffers still allow 1TB files with default cluster
sizes. Because there aren't any 1TB chromebooks available yet, limit
disks to that size.
This works around issues found by clusterfuzz related to large files
built with small clusters.
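A worked example of why 1TB stays manageable, assuming qemu's default
64KiB clusters: one L2 table maps (65536 / 8) * 65536 bytes = 512MiB,
so a 1TiB disk needs 2048 L1 entries, i.e. a 16KiB in-memory L1 table.

    fn l1_table_bytes(disk_size: u64, cluster_size: u64) -> u64 {
        let bytes_per_l2 = (cluster_size / 8) * cluster_size; // 512 MiB
        let l1_entries =
            disk_size / bytes_per_l2 + u64::from(disk_size % bytes_per_l2 != 0);
        l1_entries * 8 // 8-byte table entries
    }
    // l1_table_bytes(1 << 40, 1 << 16) == 16384 (16 KiB)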
BUG=972165,972175,972160,972172
TEST=fuzzer locally + new unit tests
Change-Id: I15d5d8e3e61213780ff6aea5759b155c63d49ea3
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1651460
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit a094f91d2cc96e9eeb0681deb81c37e9a85e7a55)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
u32s get multiplied together and can overflow. A usize was being
returned; make everything a u64 to make sure it fits.
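The fix pattern, sketched: widen before multiplying so the product
cannot overflow.

    fn table_bytes(entries: u32, entry_size: u32) -> u64 {
        u64::from(entries) * u64::from(entry_size)
    }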
Change-Id: I87071d294f4e62247c9ae72244db059a7b528b62
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1651459
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-by: Zach Reizner <zachr@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit 21c941786ea0cb72114f3e9c7c940471664862b5)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a lower limit, because cases such as eight-byte clusters aren't
practical and aren't worth handling; tracking a cluster costs 16 bytes.
Also put an upper limit on the cluster size; choose 21 bits to match
qemu.
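A sketch of the bounds check; the lower bound shown is an assumption,
the upper one matches the 21 bits mentioned above:

    const MIN_CLUSTER_BITS: u32 = 9;  // 512-byte clusters
    const MAX_CLUSTER_BITS: u32 = 21; // 2 MiB clusters, matching qemu

    fn cluster_bits_ok(bits: u32) -> bool {
        (MIN_CLUSTER_BITS..=MAX_CLUSTER_BITS).contains(&bits)
    }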
Change-Id: Ifcab081d0e630b5d26b0eafa552bd7c695821686
Signed-off-by: Dylan Reid <dgreid@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1651458
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
(cherry picked from crosvm commit cae80e321acdccb1591124f6bf657758f1e75d1d)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When reading from or writing to the MSI-X table, the function provided
by the pci crate expects the offset to start from the beginning of the
table. That's why the VFIO-specific code is responsible for providing
the right offset, which means the start of the MSI-X table must be
subtracted from the BAR offset.
This bug was not discovered until we tested VFIO with a device where
the MSI-X table was placed in a BAR at an offset other than 0x0.
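A sketch of the adjustment, with hypothetical names: convert the
BAR-relative offset into a table-relative one before calling into the
pci crate:

    fn table_relative_offset(bar_offset: u64, table_start_in_bar: u64) -> u64 {
        bar_offset - table_start_in_bar
    }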
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>