Start moving the vmm, arch and pci crates to being hypervisor agnostic
by using the hypervisor trait and abstractions. This is not a complete
switch and there are still some remaining KVM dependencies.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This needed to be updated to include specifying the boot and maxmium
vCPUs as well as the newly added topology for those vCPUs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows the user to optionally specify the desired CPU topology. All
parts of the topology must be specified and the product of all parts
must match the maximum vCPUs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently, not every feature of the cloud-hypervisor is enabled
on AArch64, which means that on AArch64 machines, the
`run_unit_tests.sh` needs to be tailored and some unit test cases
should be run on x86_64 only.
Also this commit fixes the typo and unifies `Arm64` and `AArch64`
in the AArch64 document.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
If the server returns an error then print out the response body if there
is one present.
Fixes: #1262
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This config option provided very little value and instead we now enable
this feature (which then lets the guest control the cache mode)
unconditionally.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the method that is used to decide whether the guest should be
signalled into the Queue implementation from vm-virtio. This removes
duplicated code between vhost_user_backend and the vm-virtio block
implementation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new "host_mac" parameter to "--net" and "--net-backend" and use
this to set the MAC address on the tap interface. If no address is given
one is randomly assigned and is stored in the config.
Support for vhost-user-net self spawning was also included.
Fixes: #1177
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Implement seccomp; we use one filter for all threads.
The syscall list comes from the C daemon with syscalls added
as I hit them.
The default behaviour is to kill the process, this normally gets
audit logged.
--seccomp none disables seccomp
log Just logs violations but doesn't stop it
trap causes a signal to be be sent that can be trapped.
If you suspect you're hitting a seccomp action then you can
check the audit log; you could also switch to running with 'log'
to collect a bunch of calls to report.
To see where the syscalls are coming from use 'trap' with a debugger
or coredump to backtrace it.
This can be improved for some syscalls to restrict the parameters
to some syscalls to make them more restrictive.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Implement support for setting up a sandbox for running the
service. The technique for this has been borrowed from virtiofsd, and
consists on switching to new PID, mount and network namespaces, and
then switching root to the directory to be shared.
Future patches will implement additional hardening features like
dropping capabilities and seccomp filters.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Changes is vhost crate require VhostUserDaemon users to create and
provide a vhost::Listener in advance. This allows us to adopt
sandboxing strategies in the future, by being able to create the UNIX
socket before switching to a restricted namespace.
Update also the reference to vhost crate in Cargo.lock to point to the
latest commit from the dragonball branch.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Rather than repeat syntax for the vhost-user-block backend in multiple
places store it in one place and reference it from the required places.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than repeat syntax for the vhost-user-net backend in multiple
places store it in one place and reference it from the required places.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The ch-remote usage says:
OPTIONS:
--api-socket <api-socket> HTTP API socket path (UNIX domain socket).
However it doesn't seem to actually accept that syntax, instead
requiring "--api-socket=". This may be a clap bug however it is resolved
by setting the number of arguments requires to exactly one. Which is
also the actual correct number.
Fixes: #1117Fixes: #1116
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It's possible to have multiple vsock devices so in preparation for
hotplug/unplug it is important to be able to have a unique identifier
for each device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Check that if any device using vhost-user (net & disk with
vhost_user=true) or virtio-fs is enabled then check shared memory is
also enabled.
Fixes: #848
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The new 'shared' and 'hugepages' controls aim to replace the 'file'
option in MemoryConfig. This patch also updated all related integration
tests to use the new controls (instead of providing explicit paths to
"/dev/shm" or "/dev/hugepages").
Fixes: #1011
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
Currently unimplemented. Once implemented, this API will allow for
creating virtio-fs devices in the VM after it has booted.
Signed-off-by: Dean Sheather <dean@coder.com>
By adding a "thread_id" parameter to handle_event(), the backend crate
can now indicate to the backend implementation which thread triggered
the processing of some events.
This is applied to vhost-user-net backend and allows for simplifying a
lot the code since each thread is identical.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding the "thread_index" parameter to the function exit_event() from
the VhostUserBackend trait, the backend crate now has the ability to ask
the backend implementation about the exit event related to a specific
thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By changing the mutability of this function, after adapting all
backends, we should be able to implement multithreads with
multiqueues support without hitting a bottleneck on the backend
locking.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both blk, net and fs backends have been updated to avoid the requirement
of having handle_event(&mut self). This will allow the backend crate to
avoid taking a write lock onto the backend object, which will remove the
potential contention point when multiple threads will be handling
multiqueues.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As the VmConfig::Parse() also does validation work it only make sense to
parse the VM options on the VM boot path only.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Replace the existing VmConfig::valid() check with a call into
.validate() as part of earlier config setup or boot API checks.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The goal here is to move the restore parameters into a dedicated
structure that can be reused from the entire codebase, making the
addition or removal of a parameter easier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
currently unused, the initramfs argument is added to the cli,
and stored in vmm::config:VmConfig as an Option(InitramfsConfig(PathBuf))
Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
This commit adds new option hotplug_method to memory config.
It can set the hotplug method to "acpi" or "virtio-mem".
Signed-off-by: Hui Zhu <teawater@antfin.com>
This change introduces a new CLI option --seccomp. This allows the user
to enable/disable the seccomp filters when needed. Because the user now
has the possibility to disable the seccomp filters, and because the
Cloud-Hypervisor project wants to enforce the maximum security by
default, the seccomp filters are now applied by default.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the application of the seccomp filter to the VMM
thread. The filter is empty for now (SeccompLevel::None).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This opens the backing file read-only, makes the pages in the mmap()
read-only and also makes the KVM mapping read-only. The file is also
mapped with MAP_PRIVATE to make the changes local to this process only.
This is functional alternative to having support for making a
virtio-pmem device readonly. Unfortunately there is no concept of
readonly virtio-pmem (or any type of NVDIMM/PMEM) in the Linux kernel so
to be able to have a block device that is appears readonly in the guest
requires significant specification and kernel changes.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extended attributes (xattr) support has a huge impact on write
performance. The reason for this is that, if enabled, FUSE sends a
setxattr request after each write operation, and due to the inode
locking inside the kernel during said request, the ability to execute
the operations in parallel becomes heavily limited.
Signed-off-by: Sergio Lopez <slp@redhat.com>
This change enables vhost_user_fs to process multiple requests in
parallel by scheduling them into a ThreadPool (from the Futures
crate).
Parallelism on a single file is limited by the nature of the operation
executed on it. A recent commit replaced the Mutex that protects the
File within HandleData with a RwLock, to allow some operations (at
this moment, only "read" and "write") to proceed in parallel by
acquiring a read lock.
A more complex approach was also implemented [1], involving
instrumentation through vhost_user_backend to be able to serialize
completions, reducing the pressure on the vring RwLock. This strategy
improved the performance on some corner cases, while making it worse
on other, more common ones. This fact, in addition to it requiring
wider changes through the source code, prompted me to drop it in favor
of this one.
[1] https://github.com/slp/cloud-hypervisor/tree/vuf_async
Signed-off-by: Sergio Lopez <slp@redhat.com>
This option was superseded by using "--net" with "vhost_user=true". This
option wasn't being parsed any more but was left over.
Fixes: #806
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Some of the help strings had extra newlines in them or otherwise strange
wrapping. The strings were rewrapped with the nightly version of rustfmt
that supports string formatting.
Fixes: #899
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This change, combined with the compiler hint to inline get_used_event,
shortens the window between the memory read and the actual check by
calling get_used_event from needs_notification.
Without it, when putting enough pressure on the vring, it's possible
that a notification is wrongly omitted, causing the queue to stall.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add the ability to specify the "id" associated with a device, by adding
an extra option to the parameter --device.
This new option is not mandatory, and by default, the VMM will take care
of finding a unique identifier.
If the identifier provided by the user through this new option is not
unique, an error will be thrown and the VM won't be started.
Fixes#881
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Support sending a request body this will usually be JSON encoded data
representing the details of the request.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit introduces a basic implementation of a remote control of a
running VMM implementing a subset of the API.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We weren't processing events arriving at the HIPRIO queue, which
implied ignoring FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET
requests.
One effect of this issue was that file descriptors weren't closed on
the server, so it eventually hits RLIMIT_NOFILE. Additionally, the
guest OS may hang while attempting to unmount the filesystem.
Signed-off-by: Sergio Lopez <slp@redhat.com>
vhost_user_fs doesn't really support all vhost protocol features, just
MQ and SLAVE_REQ, so return that in protocol_features().
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add helpers to Vring and VhostUserSlaveReqHandler for EVENT_IDX, so
consumers of this crate can make use of this feature.
Signed-off-by: Sergio Lopez <slp@redhat.com>
The integration test test_memory_mergeable_on has been fairly unstable
for quite some time now. Because it can take some time for the VM to be
spawned and to be able to perform a correct measure of the PSS, this
commit simply increases the time before such measure is done.
This should return more accurate PSS results, which should help
stabilize the test.
Fixes#781
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Now that vhost_user_fs rust daemon supports virtiofs's dax mode, this adds
the two dax tests accordingly.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
This adds the missing part of supporting virtiofs dax on the slave end,
that is, receiving a socket pair fd from the master end to set up a
communication channel for sending setupmapping & removemapping messages.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Add a build-script to propagate the git commit hash to other crates at
compile time through environment variables, and display the hash along
with the '--version' option.
Fixes#729
Signed-off-by: Bo Chen <chen.bo@intel.com>
test_vfio has been failing consistently on the CI so mark it with
a "#[ignore]" and then forceably build it again but ignore the build
result.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than handling the KILL_EVENT in the event handler itself use the
newly added support in VhostUserBackend for providing a kill event
framework.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Check the rust formatting rather than just reformatting code on the CI
agent.
Also fix a formatting error that slipped in whilst the cargo fmt check
was not working correctly.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
A previous version of this change attempted to avoid panicking by not
using .expect() when handling an error when attempting to write to the
log file. Unfortunately the macro eprintln!() that was used to replace
the .expect() also has the behaviour of panicking if stderr cannot be
used. Instead swallow the error completely as if writing to the log has
failed at logging time it is almost certainly the case that any message
about the log would also not be seen.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Since the vhost-user-blk binary will be removed and the newer
release will integrate this block backend into cloud-hypervisor
binary. The block backend code has been added num_queues cmdline
support, we need update multiple queues help info for this
block-backend in the cloud-hypervisor.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
We measure the memory overhead that the VMM process adds to the guest VM
and compare it with a maximum acceptable limit. The test is run against
a simple VM, running 1 vCPU and 512MB of RAM. Although this is not by
any mean a comprehensive VMM overhead measurement, it will allow us to
detect when and if any PR makes our code cross an arbitrary memory
overhead threshold.
Fixes: #64
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
As this can happen during the running of the VMM we should be very
careful not to panic() as that can lead to a thread being used by the VM
disappearing.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Adding the num_queues parameter for vhost-user-blk backend, which
can enable MQ support in the backend.
This patch has enabled the MQ support from handle_event, and the
vhost-user-backend crate will enable multiple threads to call this
handle_event to handle read/write operations.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>