This config option provided very little value and instead we now enable
this feature (which then lets the guest control the cache mode)
unconditionally.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the method that is used to decide whether the guest should be
signalled into the Queue implementation from vm-virtio. This removes
duplicated code between vhost_user_backend and the vm-virtio block
implementation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new "host_mac" parameter to "--net" and "--net-backend" and use
this to set the MAC address on the tap interface. If no address is given
one is randomly assigned and is stored in the config.
Support for vhost-user-net self spawning was also included.
Fixes: #1177
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Implement seccomp; we use one filter for all threads.
The syscall list comes from the C daemon with syscalls added
as I hit them.
The default behaviour is to kill the process, this normally gets
audit logged.
--seccomp none disables seccomp
log Just logs violations but doesn't stop it
trap causes a signal to be be sent that can be trapped.
If you suspect you're hitting a seccomp action then you can
check the audit log; you could also switch to running with 'log'
to collect a bunch of calls to report.
To see where the syscalls are coming from use 'trap' with a debugger
or coredump to backtrace it.
This can be improved for some syscalls to restrict the parameters
to some syscalls to make them more restrictive.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Implement support for setting up a sandbox for running the
service. The technique for this has been borrowed from virtiofsd, and
consists on switching to new PID, mount and network namespaces, and
then switching root to the directory to be shared.
Future patches will implement additional hardening features like
dropping capabilities and seccomp filters.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Changes is vhost crate require VhostUserDaemon users to create and
provide a vhost::Listener in advance. This allows us to adopt
sandboxing strategies in the future, by being able to create the UNIX
socket before switching to a restricted namespace.
Update also the reference to vhost crate in Cargo.lock to point to the
latest commit from the dragonball branch.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Rather than repeat syntax for the vhost-user-block backend in multiple
places store it in one place and reference it from the required places.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than repeat syntax for the vhost-user-net backend in multiple
places store it in one place and reference it from the required places.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The ch-remote usage says:
OPTIONS:
--api-socket <api-socket> HTTP API socket path (UNIX domain socket).
However it doesn't seem to actually accept that syntax, instead
requiring "--api-socket=". This may be a clap bug however it is resolved
by setting the number of arguments requires to exactly one. Which is
also the actual correct number.
Fixes: #1117Fixes: #1116
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It's possible to have multiple vsock devices so in preparation for
hotplug/unplug it is important to be able to have a unique identifier
for each device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Check that if any device using vhost-user (net & disk with
vhost_user=true) or virtio-fs is enabled then check shared memory is
also enabled.
Fixes: #848
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The new 'shared' and 'hugepages' controls aim to replace the 'file'
option in MemoryConfig. This patch also updated all related integration
tests to use the new controls (instead of providing explicit paths to
"/dev/shm" or "/dev/hugepages").
Fixes: #1011
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
Currently unimplemented. Once implemented, this API will allow for
creating virtio-fs devices in the VM after it has booted.
Signed-off-by: Dean Sheather <dean@coder.com>
By adding a "thread_id" parameter to handle_event(), the backend crate
can now indicate to the backend implementation which thread triggered
the processing of some events.
This is applied to vhost-user-net backend and allows for simplifying a
lot the code since each thread is identical.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding the "thread_index" parameter to the function exit_event() from
the VhostUserBackend trait, the backend crate now has the ability to ask
the backend implementation about the exit event related to a specific
thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By changing the mutability of this function, after adapting all
backends, we should be able to implement multithreads with
multiqueues support without hitting a bottleneck on the backend
locking.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both blk, net and fs backends have been updated to avoid the requirement
of having handle_event(&mut self). This will allow the backend crate to
avoid taking a write lock onto the backend object, which will remove the
potential contention point when multiple threads will be handling
multiqueues.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As the VmConfig::Parse() also does validation work it only make sense to
parse the VM options on the VM boot path only.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Replace the existing VmConfig::valid() check with a call into
.validate() as part of earlier config setup or boot API checks.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The goal here is to move the restore parameters into a dedicated
structure that can be reused from the entire codebase, making the
addition or removal of a parameter easier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
currently unused, the initramfs argument is added to the cli,
and stored in vmm::config:VmConfig as an Option(InitramfsConfig(PathBuf))
Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
This commit adds new option hotplug_method to memory config.
It can set the hotplug method to "acpi" or "virtio-mem".
Signed-off-by: Hui Zhu <teawater@antfin.com>
This change introduces a new CLI option --seccomp. This allows the user
to enable/disable the seccomp filters when needed. Because the user now
has the possibility to disable the seccomp filters, and because the
Cloud-Hypervisor project wants to enforce the maximum security by
default, the seccomp filters are now applied by default.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the application of the seccomp filter to the VMM
thread. The filter is empty for now (SeccompLevel::None).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This opens the backing file read-only, makes the pages in the mmap()
read-only and also makes the KVM mapping read-only. The file is also
mapped with MAP_PRIVATE to make the changes local to this process only.
This is functional alternative to having support for making a
virtio-pmem device readonly. Unfortunately there is no concept of
readonly virtio-pmem (or any type of NVDIMM/PMEM) in the Linux kernel so
to be able to have a block device that is appears readonly in the guest
requires significant specification and kernel changes.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extended attributes (xattr) support has a huge impact on write
performance. The reason for this is that, if enabled, FUSE sends a
setxattr request after each write operation, and due to the inode
locking inside the kernel during said request, the ability to execute
the operations in parallel becomes heavily limited.
Signed-off-by: Sergio Lopez <slp@redhat.com>
This change enables vhost_user_fs to process multiple requests in
parallel by scheduling them into a ThreadPool (from the Futures
crate).
Parallelism on a single file is limited by the nature of the operation
executed on it. A recent commit replaced the Mutex that protects the
File within HandleData with a RwLock, to allow some operations (at
this moment, only "read" and "write") to proceed in parallel by
acquiring a read lock.
A more complex approach was also implemented [1], involving
instrumentation through vhost_user_backend to be able to serialize
completions, reducing the pressure on the vring RwLock. This strategy
improved the performance on some corner cases, while making it worse
on other, more common ones. This fact, in addition to it requiring
wider changes through the source code, prompted me to drop it in favor
of this one.
[1] https://github.com/slp/cloud-hypervisor/tree/vuf_async
Signed-off-by: Sergio Lopez <slp@redhat.com>
This option was superseded by using "--net" with "vhost_user=true". This
option wasn't being parsed any more but was left over.
Fixes: #806
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Some of the help strings had extra newlines in them or otherwise strange
wrapping. The strings were rewrapped with the nightly version of rustfmt
that supports string formatting.
Fixes: #899
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This change, combined with the compiler hint to inline get_used_event,
shortens the window between the memory read and the actual check by
calling get_used_event from needs_notification.
Without it, when putting enough pressure on the vring, it's possible
that a notification is wrongly omitted, causing the queue to stall.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add the ability to specify the "id" associated with a device, by adding
an extra option to the parameter --device.
This new option is not mandatory, and by default, the VMM will take care
of finding a unique identifier.
If the identifier provided by the user through this new option is not
unique, an error will be thrown and the VM won't be started.
Fixes#881
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Support sending a request body this will usually be JSON encoded data
representing the details of the request.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit introduces a basic implementation of a remote control of a
running VMM implementing a subset of the API.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We weren't processing events arriving at the HIPRIO queue, which
implied ignoring FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET
requests.
One effect of this issue was that file descriptors weren't closed on
the server, so it eventually hits RLIMIT_NOFILE. Additionally, the
guest OS may hang while attempting to unmount the filesystem.
Signed-off-by: Sergio Lopez <slp@redhat.com>
vhost_user_fs doesn't really support all vhost protocol features, just
MQ and SLAVE_REQ, so return that in protocol_features().
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add helpers to Vring and VhostUserSlaveReqHandler for EVENT_IDX, so
consumers of this crate can make use of this feature.
Signed-off-by: Sergio Lopez <slp@redhat.com>
The integration test test_memory_mergeable_on has been fairly unstable
for quite some time now. Because it can take some time for the VM to be
spawned and to be able to perform a correct measure of the PSS, this
commit simply increases the time before such measure is done.
This should return more accurate PSS results, which should help
stabilize the test.
Fixes#781
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that vhost_user_fs rust daemon supports virtiofs's dax mode, this adds
the two dax tests accordingly.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
This adds the missing part of supporting virtiofs dax on the slave end,
that is, receiving a socket pair fd from the master end to set up a
communication channel for sending setupmapping & removemapping messages.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Add a build-script to propagate the git commit hash to other crates at
compile time through environment variables, and display the hash along
with the '--version' option.
Fixes#729
Signed-off-by: Bo Chen <chen.bo@intel.com>
test_vfio has been failing consistently on the CI so mark it with
a "#[ignore]" and then forceably build it again but ignore the build
result.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than handling the KILL_EVENT in the event handler itself use the
newly added support in VhostUserBackend for providing a kill event
framework.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Check the rust formatting rather than just reformatting code on the CI
agent.
Also fix a formatting error that slipped in whilst the cargo fmt check
was not working correctly.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
A previous version of this change attempted to avoid panicking by not
using .expect() when handling an error when attempting to write to the
log file. Unfortunately the macro eprintln!() that was used to replace
the .expect() also has the behaviour of panicking if stderr cannot be
used. Instead swallow the error completely as if writing to the log has
failed at logging time it is almost certainly the case that any message
about the log would also not be seen.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Since the vhost-user-blk binary will be removed and the newer
release will integrate this block backend into cloud-hypervisor
binary. The block backend code has been added num_queues cmdline
support, we need update multiple queues help info for this
block-backend in the cloud-hypervisor.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
We measure the memory overhead that the VMM process adds to the guest VM
and compare it with a maximum acceptable limit. The test is run against
a simple VM, running 1 vCPU and 512MB of RAM. Although this is not by
any mean a comprehensive VMM overhead measurement, it will allow us to
detect when and if any PR makes our code cross an arbitrary memory
overhead threshold.
Fixes: #64
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
As this can happen during the running of the VMM we should be very
careful not to panic() as that can lead to a thread being used by the VM
disappearing.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Adding the num_queues parameter for vhost-user-blk backend, which
can enable MQ support in the backend.
This patch has enabled the MQ support from handle_event, and the
vhost-user-backend crate will enable multiple threads to call this
handle_event to handle read/write operations.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-block and
vhost-user-block. For now it is necessary to specify both vhost_user
and socket parameters as auto activation is not yet implemented. The wce
parameter for supporting "Write Cache Enabling" is also added to the
disk configuration.
The original command line parameter is still supported for now and will
be removed in a future release.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a socket and vhost_user parameter to this option so that the same
configuration option can be used for both virtio-net and vhost-user-net.
For now it is necessary to specify both vhost_user and socket parameters
as auto activation is not yet implemented. The original command line
parameter is still supported for now.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to reduce the amount of times VMs are being started through
integration tests, this commit consolidates very similar tests related
to virtio-blk into a single one.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Just add a new integration test to verify that multiqueue support is
correctly supported and that we can find the right amount of queues in
the guest.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The number of queues and the size of each queue were not configurable.
In anticipation for adding multiqueue support, this commit introduces
some new parameters to let the user decide about the number of queues
and the queue size.
Note that the default values for each of these parameters are identical
to the default values used for vhost-user-blk, that is 1 for the number
of queues and 128 for the queue size.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add explicit exclusions with --net-backend from --block-backend (and
vice-versa.) And also with "--kernel" as this is the option for "VM boot" that is never optional.
Ideally we would conflcit the backend arguments against the "vm-group"
however this does not work as it includes some arguments that have a
default value set and thus clap thinks those arguments are always
provided. Conflicting with "--kernel" is thus a reasonable compromise.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split the basic launching functionality into its own function in the
newly added vhost_user_block crate.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split the basic launching functionality into its own function in the
newly added vhost_user_net crate.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extract the majority of the code that provides the vhost-user-block
backend into its own crate and port the binary to use it.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extract the majority of the code that provides the vhost-user-net
backend into its own crate and port the binary to use it.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Generate the VM params until after the logging has been enabled. This
will make it possible to reuse the logging for vhost backends.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Previous commit added 'direct' support for virtio-blk to be able to
open files backing disk images with O_DIRECT, so add an integration
test for this new feature.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Previous commit added 'readonly' support for virtio-blk to be able to
configure read-only devices, so add an integration test for this new
feature.
Signed-off-by: Sergio Lopez <slp@redhat.com>
This test is a variant of test_boot_vhost_user_blk(), named
test_boot_vhost_user_blk_direct(), that instances the vhost-user-blk
daemon with 'direct=true', to test this recently introduced feature
for opening files with O_DIRECT.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add missing WCE (write-cache enable) property support. This not only
an enhancement, but also a fix for a bug.
Right now, when vhost_user_blk uses a qcow2 image, it doesn't write
the QCOW2 metadata until the guest explicitly requests a flush. In
practice, this is equivalent to the write back semantic.
Without WCE, the guest assumes write through for the virtio_blk
device, and doesn't send those flush requests. Adding support for WCE,
and enabling it by default, we ensure the guest does send said
requests.
Supporting "WCE = false" would require updating our qcow2
implementation to ensure that, when required, it honors the write
through semantics by not deferring the updates to QCOW2 metadata.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add support for opening the disk images with O_DIRECT. This allows
bypassing the host's file system cache, which is useful to avoid
polluting its cache and for better data integrity.
This mode of operation can be enabled by adding the "direct=<bool>"
parameter to the "backend" argument:
./target/debug/vhost_user_blk --backend image=test.raw,sock=/tmp/vhostblk,direct=true
The "direct" parameter defaults to "false", to preserve the original
behavior.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Use RawFile as backend instead of File. This allows us to abstract
the access to the actual image with a specialized layer, so we have a
place where we can deal with the low-level peculiarities.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Doing I/O on an image opened with O_DIRECT requires to adhere to
certain restrictions, requiring the following elements to be aligned:
- Address of the source/destination memory buffer.
- File offset.
- Length of the data to be read/written.
The actual alignment value depends on various elements, and according
to open(2) "(...) there is currently no filesystem-independent
interface for an application to discover these restrictions (...)".
To discover such value, we iterate through a list of alignments
(currently, 512 and 4096) calling pread() with each one and checking
if the operation succeeded.
We also extend RawFile so it can be used as a backend for QcowFile,
so the later can be easily adapted to support O_DIRECT too.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Update queue number with 4 to verify if vhost-user-net device
and backend could work well with multiple queue.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
There are two new options num_queues and queue_size defined for
virtio-net, add them in test_valid_vm_config_net which is used
to validate that both the CLI and the OpenAPI will generate the
same configuration.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Update the common part in net_util.rs under vm-virtio to add mq
support, meanwhile enable mq for virtio-net device, vhost-user-net
device and vhost-user-net backend. Multiple threads will be created,
one thread will be responsible to handle one queue pair separately.
To gain the better performance, it requires to have the same amount
of vcpus as queue pair numbers defined for the net device, due to
the cpu affinity.
Multiple thread support is not added for vhost-user-net backend
currently, it will be added in future.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Add num_queues and queue_size for virtio-net device to make them configurable,
while add the associated options in command line.
Update cloud-hypervisor.yaml with the new options for NetConfig.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Since the common parts are put into net_util.rs under vm-virtio,
refactoring code for virtio-net device, vhost-user-net device
and backend to shrink the code size and improve readability
meanwhile.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
This specifies how much address space should be reserved for hotplugging
of RAM. This space is reserved by adding move the start of the device
area by the desired amount.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With the latest version of virtiofsd, some changes have been made
to the socket path parameter. This needs to be updated in the cloud
hypervisor codebase, so that our CI keeps running correctly.
Fixes#611
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The cache= option from the virtiofsd backend is meant to be used
depending on how the user wants to use the guest page cache. For most
use cases, we want to bypass the guest page cache to reduce the guest
memory footprint, which is why "cache=none" should be the default.
"cache=always" is the other option when the user wants to do something
specific by using the guest page cache, but should not be the default
anymore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With latest kernel, virtio-fs mount command has been simplified, and
this needs to be applied everywhere in our tests and documentation.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because we don't always reach the expected footprint improvements with
KSM, let's review the numbers. By reducing the expectations and
increasing the amount of pages to scan, this should stabilize the CI.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch is to check if block device is readonly
when backend set readonly=true.
The lsblk command can show the RO value in the guest.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
The current backend only support rw, and we also need
add readonly support.
The new command:
vhost_user_blk \
--backend "image=/home/test.img, \
sock=/home/path/vhost.socket, \
readonly=true"
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
The goal here is to ensure that CLI and OpenAPI both behave as closely
as possible, and also that they behave as expected.
Leveraging the reorganization of the code, we can now compare two
VmConfig structures generated from one CLI entry on one side, and from
an OpenAPI entry (JSON payload) on the other side.
Fixes#535
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This brings more modularity to the code, which will be helpful when we
will later test the CLI and OpenAPI generate the same VmConfig output.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This brings more modularity to the code, which will be helpful when we
will later test the CLI and OpenAPI generate the same VmConfig output.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This brings more modularity to the code, which will be helpful when we
will later test the CLI and OpenAPI generate the same VmConfig output.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
On our CI the /tmp filesystem is mounted as tmpfs and this is the
location where the test disk images are located. When the CI worker
nodes have less memory and fewer CPUs the tmpfs fills up as the tests
run in parallel.
Introduce a mechanism to reduce the parallelism of the tests based on
starvation of the tmpfs disk availability.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By default, and in order to avoid falling into the legacy CLI usage, the
CPU argument should at least include "boot=" to define the number of
CPUs.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to validate that multiple devices can be passed through and
they are still fully functional, this patch extends the existing VFIO
test to pass a second virtio-net device, and verifies that both
interfaces are functional by ssh'ing into each network interface.
Fixes#503
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The easiest way to detect if the kernel is willing to accept hotplug
vCPUs is to check the dmesg output.
Switch the test to bionic as the Clear Cloud image lacks "dmesg."
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to validate the new virtio-fs daemon written in Rust is
behaving correctly, a new integration test has been added. Important to
note that for now, only a test with cache=none and dax=off can be added
since the daemon does not support shared memory region yet.
The long term goal being to replace virtiofsd with vhost_user_daemon
once it will reach parity regarding the supported features.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because the vhost_user_backend crate needs some changes to support
moving the process to a different mount namespace and perform a pivot
root, it is not possible to change '/' to the given shared directory.
This commit, as a temporary measure, let the code point at the given
shared directory.
The long term solution is to perform the mount namespace change and the
pivot root as this will provide greater security.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch implements a vhost-user-fs daemon based on Rust. It only
supports communicating through the virtqueues. The support for the
shared memory region associated with DAX will be added later.
It relies on all the code copied over from the crosvm repository, based
on the commit 961461350c0b6824e5f20655031bf6c6bf6b7c30.
It also relies on the vhost_user_backend crate, limiting the amount of
code needed to get this daemon up and running.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Instead of using bash and awk, using Rust allows us to retrieve
information about a VM process with the right permissions as we are not
forced to spawn a new child process.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The test validates that when the mergeable option is enabled, the
resulting PSS for two instances of cloud-hypervisor is lower than two
instances not using the mergeable flag.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the user indicate if the persistent memory pages should
be marked as mergeable or not, a new option is being introduced.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the user indicate if the guest RAM pages should be
marked as mergeable or not, a new option is being introduced.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When vmm.ping give a response, we expect get the version from
the VMM not the vmm create
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
vmm.ping will help to check if http API server is up and
running.
This also removes the vmm.info endpoint.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Use the new vhost-user-blk backend for the integration tests,
eliminating the need for building vubd using the implementation in
QEMU.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Create a vhost-user-blk backend using vhost-user-backend and following
the conventions established by the existing vhost-user-net
implementation.
This backend is based on https://github.com/slp/vhost-user-backend,
but a bit simplified, making it closer to the original implementation
in Firecracker. The main features missing are EVENT_IDX, support for
asynchronous I/O and multiqueue, but it's still fully functional and
provides a good starting point for evolving it into a more complete
implementation.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Extend VhostUserBackend trait with protocol_features(), so device
backend implementations can freely define which protocol features they
want to support.
Signed-off-by: Sergio Lopez <slp@redhat.com>
A new ClearLinux image has been uploaded to the Azure storage account.
It is based off of the ClearLinux cloudguest image 31310 version, with
two extra bundles added to it.
First bundle is sysadmin-basic to include utility like netcat, and the
second bundle is iperf, adding the iperf binary to the image.
The image is 2G in size.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When VFIO devices are created and if the device is attached to the
virtual IOMMU, the ExternalDmaMapping trait implementation is created
and associated with the device. The idea is to build a hash map of
device IDs with their associated trait implementation.
This hash map is provided to the virtual IOMMU device so that it knows
how to properly trigger external mappings associated with VFIO devices.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a VFIO device should be attached to the virtual
IOMMU or not. That's why we introduce an extra option "iommu" with the
value "on" or "off". By default, the device is not attached, which means
"iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Clean up the error handling and ensure that where possible errors are
propagated. Make use of std::convert::From in order to translate error
types.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Simplify the check for the unusual situation where the memory is not
configured by using .ok_or() on the option to convert it to a result.
This cleans up a bunch of extra indentation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Remove messages that are left over from the development of the project
that represent normal operation for the backend. This cleans up the
console output and improves performance.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We pause a VM from the API, then SSH'ing into it should fail.
After resuming, SSH'ing should work again.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Because the L2 VM running in the VFIO integration test is actually
running as L3 (since the CI runs in a VM), it can take quite some
time for this VM to boot.
The way to solve this issue is to extend the sleep time before to try
communicating with the L2 VM, but also to speed up the boot time by
using virtio-console instead of serial. We suspect the use of serial,
implying PIO VM exits for each character on the serial port is quite
expensive compared to the paravirtualized console.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Azure virtual machines can have private IPs in the 172.16.x.x range,
causing some issues with the VFIO test. By using 172.17.x.x for this
test, we avoid IP conflicts.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that our custom kernel includes all the patches for the full support
of virtio-iommu, we can go one step further by attaching the virtio-net
device to the virtual IOMMU and use it to SSH some commands validating
both disks and the network card are isolated into their own IOMMU group.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that cloud-hypervisor can expose a virtual IOMMU to its guest VM,
the integration test validating the VFIO support with virtio-net can be
updated to use cloud-hypervisor exclusively.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-vsock device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-console device should be attached to
this virtual IOMMU or not. That's why we introduce an extra option
"iommu" with the value "on" or "off". By default, the device is not
attached, which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-pmem device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-rng device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-net device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Having the virtual IOMMU created with --iommu is one thing, but we also
need a way to decide if a virtio-blk device should be attached to this
virtual IOMMU or not. That's why we introduce an extra option "iommu"
with the value "on" or "off". By default, the device is not attached,
which means "iommu=off".
One side effect of this new option is that we had to introduce a new
option for the disk path, simply called "path=".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We use the serde crate to serialize and deserialize the VmVConfig
structure. This structure will be passed from the HTTP API caller as a
JSON payload and we need to deserialize it into a VmConfig.
For a convenient use of the HTTP API, we also provide Default traits
implementations for some of the VmConfig fields (vCPUs, memory, etc...).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The kernel path was the only mandatory command line option.
With the addition of the --api-socket option, we can run without a
kernel path and get it later through the API.
Since we can end up with VM configurations that are no longer valid by
default, we need to provide a validation check for it. For now, if the
kernel path is not defined, the VM configuration is invalid.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The API server will unconditionally run through a UNIX domain socket
which default path is /run/user/<uid>/cloud-hypervisor.<pid>.
The --api-socket command line option allows to override that default
value with some custom socket path.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With the API server socket option, we will be able to support a model
where the user can start cloud-hypervisor with no options or an
alternative API server socket path. In this case, we don't want to try
to start a new guest VM, and for that we need to know if the user has
set any VM configuration at all. Grouping all VM configuration specific
options together is one way to be able to know about it.
If the user has not set any VM configuration, we only start the API
server. If it has set anything, we will verify that the overall
configuration is valid and will implicitly convert that configuration
into a request to the API server.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
An integration test relying on the new vhost-user-net backend now
replaces the previous test using the QEMU test backend. This allows
us to avoid building the QEMU backend, and we now really exercise the
vhost-user-net implementation as it is used for the ssh communication
in this test.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Create vhost-user-net backend with Tap interface, to offload network
transaction from cloud-hypervisor. The goal is to provide flexibility
about the backend being in use, but also more security as it will allow
users to isolate the backend with different security profiles since it
will run as a dedicated process on the host.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We now start the main VMM thread, which will be listening for VM and IPC
related events.
In order to start the configured VM, we no longer directly call the VM
API but we use the IPC instead, to first create and then start a VM.
Fixes: #303
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Unlike the Vmm structure we removed with commit bdfd1a3f, this new one
is really meant to represent the VM monitoring/management object.
For that, we implement a control loop that will replace the one that's
currently embedded within the Vm structure itself.
This will allow us to decouple the VM lifecycle management from the VM
object itself, by having a constantly running VMM control loop.
Besides the VM specific events (exit, reset, stdin for now), the VMM
control loop also handles all the Cloud Hypervisor IPC requests.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Once passed to the VM creation routine, a VmConfig structure is
immutable. We can simply carry a Arc of it instead of a reference.
This also allows us to remove any lifetime bound from our VM.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Vmm structure is just a placeholder for the KVM instance. We can
create it directly from the VM creation routine instead.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Now that vhost-user has been fixed regarding size of the virtqueues,
booting a VM with the firmware from a vhost-user-blk backend actually
works. That's why this commit updates the previously introduced
integration test to make it use the firmware instead of direct kernel
boot.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit extends the existing integration test related to
vhost-user-blk by validating the block image contains one file
"foo" containing "bar".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Currently we need use backend device from Qemu to test vhost-user-blk
device. Once the rust backend is ready, we will replace it.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Now that we have expanded our address space we should add automated
testing to try VMs that use a large amount of RAM.
As the hypervisor does an anonymous mmap() for the backing of memory and
during a typical test boot the guest will not touch it all it should be
possible to test large RAM VMs even if that exceeds the RAM of the host
machine.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Label tests that definitely can't function with virtio-mmio (because
they use the firmware) and within those that can be used mark individual
assertions that will no longer hold (around PCI.)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This patch extends the current set of integration tests to correctly
validate that virtio-vsock is functional. It establishes a communication
between host and guest relying on the newly integrated vsock device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Rely on the newly generated Clear Linux image for the integration
testing of cloud-hypervisor. The image has been generated using the
Clear Linux clr-installer tooling, which means it is in compliance with
the Clear Linux licensing.
This new image contains one more bundle that was not part of the default
cloudguest image. This bundle is basic-sysadmin, and contains both nc
and socat utilities.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The goal here is to decouple the Guest instance from the ssh connection
to send some commands to the guest. The reason being to allow ssh
commands to be issued from a different thread, which can be useful to
wait for the end of a command with a thread.join().
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As part of the reboot test check that the binary cleanly terminated
after the subsequent shutdown.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The new flag vsock is meant to be used in order to create a VM with a
virtio-vsock device attached to it. Two parameters are needed with this
device, "cid" representing the guest context ID, and "sock" representing
the UNIX socket path which can be accessed from the host.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The default number of MSI-X vector allocated was 2, which is the minimum
defined by the virtio specification. The reason for this minimum is that
virtio needs at least one interrupt to signal that configuration changed
and at least one to specify something happened regarding the virtqueues.
But this current implementation is not optimal because our VMM supports
as many MSI-X vectors as allowed by the MSI-X specification (2048 max).
For that reason, the current patch relies on the number of virtqueues
needed by the virtio device to determine the right amount of MSI-X
vectors needed. It's important not to forget the dedicated vector for
any configuration change too.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Being able to reboot requires us to identify all the resources we are
leaking and cleaning those up before we can enable reboot. For now if
the user requests a reboot then shutdown instead.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that we have ACPI shutdown support "reboot" will actually reboot the
VM rather than trigger the VMM to exit.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The virtiofsd daemon takes a bit of time creating and listening on the
socket. By adding 10s timeout, we make sure the vhost-user socket has
been properly created before the VMM tries to connect to it.
Also, the daemon needs cap_dac_override capabilities to access debugfs
filesystem.
Last thing, both virtio-fs and virtio-pmem tests were slightly different
from the others since they were not explicitly killing cloud-hypervisor
and virtiofsd processes once the test was done.
Fixes#182
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
panic()ing after a panic() has already been recovered by the credibility
test system (i.e. after an aver! has failed) results in an abort which
triggers SIGILL.
Adjust the SSH based commands to generate a Result<...,Error> which we
then either propagate through the test block. Or if the function is
directly being evaluated in an aver! macro call .unwrap_with_default()
(or .unwrap_or() in the case where the default would be wrong.)
See #182
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When virtio-fs is being tested through the integration tests, there is
one specific test where DAX and cache region are disabled. In this case
the virtiofsd daemon should be used with the correct option cache=none
instead of cache=always.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Latest clippy version complains about our existing code for the
following reasons:
- trait objects without an explicit `dyn` are deprecated
- `...` range patterns are deprecated
- lint `clippy::const_static_lifetime` has been renamed to
`clippy::redundant_static_lifetimes`
- unnecessary `unsafe` block
- unneeded return statement
All these issues have been fixed through this patch, and rustfmt has
been run to cleanup potential formatting errors due to those changes.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By introducing new kernel configuration related to DAX support, the
tests are not working as they were before. The format of the image
passed through virtio-pmem needs to be in proper raw format, otherwise
the virtio-pmem driver cannot complete its probing.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The existing integration tests are extended to support both use cases
where dax=on and dax=off.
In order to support DAX, the kernel configuration needs to be updated to
include CONFIG_FS_DAX and CONFIG_ZONE_DEVICE.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support the more performant version of virtio-fs, that is
the one relying on a shared memory region between host and guest, we
introduce two new parameters to the --fs device.
The "dax" parameter allows the user to choose if he wants to use the
shared memory region with virtio-fs. By default, this parameter is "on".
The "cache_size" parameter allows the user to specify the amount of
memory that should be shared between host and guest. By default, the
value of this parameter is 8Gib as advised by virtio-fs maintainers.
Note that dax=off and cache_size are incompatible.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because the way to mount virtio-fs filesystem changed with newest
kernel, we need to update the mount command in our integration tests.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Poor performance was observed when booting kernels with "console=ttyS0"
and the serial port disabled.
This change introduces a "null" console output mode and makes it the
default for the serial console. In this case the serial port
is advertised as per other output modes but there is no input and any
output is dropped.
Fixes: #163
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This makes the log macros (error!, warn!, info!, etc) in the code work.
It currently defaults to showing only error! messages, but by passing an
increasing number of "-v"s on the command line the verbosity can be
increased.
By default log output goes onto stderr but it can also be sent to a
file.
Fixes: #121
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Introduce a new DiskConfig implementation for Ubuntu Bionic with
different cloud init preparation details and use this when testing with
test_simple_launch.
Adjust the memory expectation for downwards as the EFI boot results in a
slightly different memory map. Also enable serial port as Ubuntu does
not support a virtio-console based boot.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than embed the disks in a vector and have integer indicies into
the vector for the different disks instead abstract this through an enum
type used on the DiskConfig trait.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extract the network configuration into its own struct and also extract
the prepare_files() and prepare_cloudinit() functions into a struct for
the Clear Linux distribution.
This struct is behind a trait that is used by the Guest implementation
to prepare the files.
This will allow a different implementation to be used for the Ubuntu
disk files.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
And rename the default user from "admin" to "cloud" as the admin user
clashes with a standard user and group name on Ubuntu.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The newer kernel is resulting in entropy being slightly lower than
previously. Adjust the expected entropy downwards.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This should reduce the integration testing time considerably. When a
custom kernel is no longer required we can pull kernel from tarball
again.
Fixes: #100
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The VFIO integration test first boots a QEMU guest and then assigns the
QEMU virtio-pci networking device into a nested cloud-hypervisor guest.
We then check that we can ssh into the nested guest and verify that it's
running with the right kernel command line.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With the VFIO crate, we can now support directly assigned PCI devices
into cloud-hypervisor guests.
We support assigning multiple host devices, through the --device command
line parameter. This parameter takes the host device sysfs path.
Fixes: #60
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Two integration tests are added for testing the implemented virtio
console device for single port operation. One checks the presence
and the simple stdout operation. The other test checks the stdout
on file (option: file) using virtio console.
Signed-off-by: A K M Fazla Mehrab <fazla.mehrab.akm@intel.com>
To use the implemented virtio console device, the users can select one
of the three options ("off", "tty" or "file=/path/to/the/file") with
the command line argument "--console". By default, the console is
enabled as a device named "hvc0" (option: tty). When "off" option is
used, the console device is not added to the VM configuration at all.
Signed-off-by: A K M Fazla Mehrab <fazla.mehrab.akm@intel.com>
Create a struct to handle all the details for the guest under test
including details of network and disks.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Allow replacement of the network details used for the VM. By replacing
those from the file checked into the source tree we can continue to use
the file in the tree for manual testing but adjust the network per-VM to
allow parallel testing.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In the future this will provide the basis for the ability to customise
the cloud-init file per VM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
By sleeping more earlier this will speed up the tests as the SSH
connection will complete on the first attempt and thus alleviate timeout
and backoff delays.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Use the tempdir crate to create a temporary directory that is deleted
when the structure goes out of scope.
Use this temporary directory for all temporary test files created by the
tests. The cloud init file is still in /tmp as that is created by the
test wrapper code.
This is the first stage towards being able to run the integration tests
in parallel.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a "--serial" command line that takes as input either "off", "tty"
(default and current behaviour) and "file=/path/to/file".
When "--serial off" is used the serial device is not added to the VM
configuration at all.
Integration tests added that check for interrupts present (or not) and
that when sending to a file the file contains the expected serial
output.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that cloud-hypervisor VMM supports virtio-pmem, it can directly
boot a VM from an image exposed as a persistent memory block device.
That's why there is no need to force the --disk option as being
mandatory.
Fixes#90
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Until now, the VMM was only accepting a single instance of virtio-net
device. This commit extends the virtio-net support by allowing several
devices to be created for a single VM.
Fixes#71
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
For every parameter dealing with a size as option, such as memory or
virtio-pmem, the CLI can now parse sizes with the suffixes K, M or G.
Fixes#70
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>