In anticipation for allowing memory zones to be removed, but also in
anticipation for refactoring NUMA parameter, we introduce a mandatory
'id' option to the --memory-zone parameter.
This forces the user to provide a unique identifier for each memory zone
so that we can refer to these.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Through this new parameter, we give users the opportunity to specify a
set of CPUs attached to a NUMA node that has been previously created
from the --memory-zone parameter.
This parameter will be extended in the future to describe the distance
between multiple nodes.
For instance, if a user wants to attach CPUs 0, 1, 2 and 6 to a NUMA
node, here are two different ways of doing so:
Either
./cloud-hypervisor ... --numa id=0,cpus=0-2:6
Or
./cloud-hypervisor ... --numa id=0,cpus=0:1:2:6
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With the introduction of this new option, the user will be able to
describe if a particular memory zone should belong to a specific NUMA
node from a guest perspective.
For instance, using '--memory-zone size=1G,guest_numa_node=2' would let
the user describe that a memory zone of 1G in the guest should be
exposed as being associated with the NUMA node 2.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Since memory zones have been introduced, it is now possible for a user
to specify multiple backends for the guest RAM. By adding a new option
'host_numa_node' to the 'memory-zone' parameter, we allow the guest RAM
to be backed by memory that might come from a specific NUMA node on the
host.
The option expects a node identifier, specifying which NUMA node should
be used to allocate the memory associated with a specific memory zone.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The flag 'mergeable' should only apply to the entire guest RAM, which is
why it is removed from the MemoryZoneConfig as it is defined as a global
parameter at the MemoryConfig level.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
After the introduction of user defined memory zones, we can now remove
the deprecated 'file' option from --memory parameter. This makes this
parameter simpler, letting more advanced users define their own custom
memory zones through the dedicated parameter.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introducing a new CLI option --memory-zone letting the user specify
custom memory zones. When this option is present, the --memory size
must be explicitly set to 0.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch extends the CLI option '--seccomp' to accept the 'log'
parameter in addition 'true/false'. It also refactors the
vmm::seccomp_filters module to support both "SeccompAction::Trap" and
"SeccompAction::Log".
Fixes: #1180
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch replaces the usage of 'SeccompLevel' with 'SeccompAction',
which is the first step to support the 'log' action over system
calls that are not on the allowed list of seccomp filters.
Signed-off-by: Bo Chen <chen.bo@intel.com>
Introducing the new CLI option --sgx-epc along with the OpenAPI
structure SgxEpcConfig, so that a user can now enable one or multiple
SGX Enclave Page Cache sections within a contiguous region from the
guest address space.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit store balloon size to MemoryConfig.
After reboot, virtio-balloon can use this size to inflate back to
the size before reboot.
Signed-off-by: Hui Zhu <teawater@antfin.com>
The binary is still built in the same location but the source code and
the dependencies for it come from the vhost_user_block crate itself.
The binary will be built with:
`cargo build --all --bin vhost_user_block` or just `cargo build --all`
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The binary is still built in the same location but the source code and
the dependencies for it come from the vhost_user_net crate itself.
The binary will be built with:
`cargo build --all --bin vhost_user_net` or just `cargo build --all`
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The binary is still built in the same location but the source code and
the dependencies for it come from the vhost_user_fs crate itself.
The binary will be built with:
`cargo build --all --bin vhost_user_fs` or just `cargo build --all`
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split the generic virtio code (queues and device type) from the
VirtioDevice trait, transport and device implementations.
This also simplifies the feature handling in vhost_user_backend as the
vm-virtio crate is no longer has any features.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Start moving the vmm, arch and pci crates to being hypervisor agnostic
by using the hypervisor trait and abstractions. This is not a complete
switch and there are still some remaining KVM dependencies.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This needed to be updated to include specifying the boot and maxmium
vCPUs as well as the newly added topology for those vCPUs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This allows the user to optionally specify the desired CPU topology. All
parts of the topology must be specified and the product of all parts
must match the maximum vCPUs.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Currently, not every feature of the cloud-hypervisor is enabled
on AArch64, which means that on AArch64 machines, the
`run_unit_tests.sh` needs to be tailored and some unit test cases
should be run on x86_64 only.
Also this commit fixes the typo and unifies `Arm64` and `AArch64`
in the AArch64 document.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>
If the server returns an error then print out the response body if there
is one present.
Fixes: #1262
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This config option provided very little value and instead we now enable
this feature (which then lets the guest control the cache mode)
unconditionally.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the method that is used to decide whether the guest should be
signalled into the Queue implementation from vm-virtio. This removes
duplicated code between vhost_user_backend and the vm-virtio block
implementation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new "host_mac" parameter to "--net" and "--net-backend" and use
this to set the MAC address on the tap interface. If no address is given
one is randomly assigned and is stored in the config.
Support for vhost-user-net self spawning was also included.
Fixes: #1177
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Implement seccomp; we use one filter for all threads.
The syscall list comes from the C daemon with syscalls added
as I hit them.
The default behaviour is to kill the process, this normally gets
audit logged.
--seccomp none disables seccomp
log Just logs violations but doesn't stop it
trap causes a signal to be be sent that can be trapped.
If you suspect you're hitting a seccomp action then you can
check the audit log; you could also switch to running with 'log'
to collect a bunch of calls to report.
To see where the syscalls are coming from use 'trap' with a debugger
or coredump to backtrace it.
This can be improved for some syscalls to restrict the parameters
to some syscalls to make them more restrictive.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Implement support for setting up a sandbox for running the
service. The technique for this has been borrowed from virtiofsd, and
consists on switching to new PID, mount and network namespaces, and
then switching root to the directory to be shared.
Future patches will implement additional hardening features like
dropping capabilities and seccomp filters.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Changes is vhost crate require VhostUserDaemon users to create and
provide a vhost::Listener in advance. This allows us to adopt
sandboxing strategies in the future, by being able to create the UNIX
socket before switching to a restricted namespace.
Update also the reference to vhost crate in Cargo.lock to point to the
latest commit from the dragonball branch.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Rather than repeat syntax for the vhost-user-block backend in multiple
places store it in one place and reference it from the required places.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than repeat syntax for the vhost-user-net backend in multiple
places store it in one place and reference it from the required places.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The ch-remote usage says:
OPTIONS:
--api-socket <api-socket> HTTP API socket path (UNIX domain socket).
However it doesn't seem to actually accept that syntax, instead
requiring "--api-socket=". This may be a clap bug however it is resolved
by setting the number of arguments requires to exactly one. Which is
also the actual correct number.
Fixes: #1117Fixes: #1116
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
It's possible to have multiple vsock devices so in preparation for
hotplug/unplug it is important to be able to have a unique identifier
for each device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Check that if any device using vhost-user (net & disk with
vhost_user=true) or virtio-fs is enabled then check shared memory is
also enabled.
Fixes: #848
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The new 'shared' and 'hugepages' controls aim to replace the 'file'
option in MemoryConfig. This patch also updated all related integration
tests to use the new controls (instead of providing explicit paths to
"/dev/shm" or "/dev/hugepages").
Fixes: #1011
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
Currently unimplemented. Once implemented, this API will allow for
creating virtio-fs devices in the VM after it has booted.
Signed-off-by: Dean Sheather <dean@coder.com>
By adding a "thread_id" parameter to handle_event(), the backend crate
can now indicate to the backend implementation which thread triggered
the processing of some events.
This is applied to vhost-user-net backend and allows for simplifying a
lot the code since each thread is identical.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By adding the "thread_index" parameter to the function exit_event() from
the VhostUserBackend trait, the backend crate now has the ability to ask
the backend implementation about the exit event related to a specific
thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By changing the mutability of this function, after adapting all
backends, we should be able to implement multithreads with
multiqueues support without hitting a bottleneck on the backend
locking.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Both blk, net and fs backends have been updated to avoid the requirement
of having handle_event(&mut self). This will allow the backend crate to
avoid taking a write lock onto the backend object, which will remove the
potential contention point when multiple threads will be handling
multiqueues.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
As the VmConfig::Parse() also does validation work it only make sense to
parse the VM options on the VM boot path only.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Replace the existing VmConfig::valid() check with a call into
.validate() as part of earlier config setup or boot API checks.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The goal here is to move the restore parameters into a dedicated
structure that can be reused from the entire codebase, making the
addition or removal of a parameter easier.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
currently unused, the initramfs argument is added to the cli,
and stored in vmm::config:VmConfig as an Option(InitramfsConfig(PathBuf))
Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
This commit adds new option hotplug_method to memory config.
It can set the hotplug method to "acpi" or "virtio-mem".
Signed-off-by: Hui Zhu <teawater@antfin.com>
This change introduces a new CLI option --seccomp. This allows the user
to enable/disable the seccomp filters when needed. Because the user now
has the possibility to disable the seccomp filters, and because the
Cloud-Hypervisor project wants to enforce the maximum security by
default, the seccomp filters are now applied by default.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit introduces the application of the seccomp filter to the VMM
thread. The filter is empty for now (SeccompLevel::None).
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This opens the backing file read-only, makes the pages in the mmap()
read-only and also makes the KVM mapping read-only. The file is also
mapped with MAP_PRIVATE to make the changes local to this process only.
This is functional alternative to having support for making a
virtio-pmem device readonly. Unfortunately there is no concept of
readonly virtio-pmem (or any type of NVDIMM/PMEM) in the Linux kernel so
to be able to have a block device that is appears readonly in the guest
requires significant specification and kernel changes.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extended attributes (xattr) support has a huge impact on write
performance. The reason for this is that, if enabled, FUSE sends a
setxattr request after each write operation, and due to the inode
locking inside the kernel during said request, the ability to execute
the operations in parallel becomes heavily limited.
Signed-off-by: Sergio Lopez <slp@redhat.com>
This change enables vhost_user_fs to process multiple requests in
parallel by scheduling them into a ThreadPool (from the Futures
crate).
Parallelism on a single file is limited by the nature of the operation
executed on it. A recent commit replaced the Mutex that protects the
File within HandleData with a RwLock, to allow some operations (at
this moment, only "read" and "write") to proceed in parallel by
acquiring a read lock.
A more complex approach was also implemented [1], involving
instrumentation through vhost_user_backend to be able to serialize
completions, reducing the pressure on the vring RwLock. This strategy
improved the performance on some corner cases, while making it worse
on other, more common ones. This fact, in addition to it requiring
wider changes through the source code, prompted me to drop it in favor
of this one.
[1] https://github.com/slp/cloud-hypervisor/tree/vuf_async
Signed-off-by: Sergio Lopez <slp@redhat.com>
This option was superseded by using "--net" with "vhost_user=true". This
option wasn't being parsed any more but was left over.
Fixes: #806
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Some of the help strings had extra newlines in them or otherwise strange
wrapping. The strings were rewrapped with the nightly version of rustfmt
that supports string formatting.
Fixes: #899
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This change, combined with the compiler hint to inline get_used_event,
shortens the window between the memory read and the actual check by
calling get_used_event from needs_notification.
Without it, when putting enough pressure on the vring, it's possible
that a notification is wrongly omitted, causing the queue to stall.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add the ability to specify the "id" associated with a device, by adding
an extra option to the parameter --device.
This new option is not mandatory, and by default, the VMM will take care
of finding a unique identifier.
If the identifier provided by the user through this new option is not
unique, an error will be thrown and the VM won't be started.
Fixes#881
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Support sending a request body this will usually be JSON encoded data
representing the details of the request.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit introduces a basic implementation of a remote control of a
running VMM implementing a subset of the API.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We weren't processing events arriving at the HIPRIO queue, which
implied ignoring FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET
requests.
One effect of this issue was that file descriptors weren't closed on
the server, so it eventually hits RLIMIT_NOFILE. Additionally, the
guest OS may hang while attempting to unmount the filesystem.
Signed-off-by: Sergio Lopez <slp@redhat.com>
vhost_user_fs doesn't really support all vhost protocol features, just
MQ and SLAVE_REQ, so return that in protocol_features().
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add helpers to Vring and VhostUserSlaveReqHandler for EVENT_IDX, so
consumers of this crate can make use of this feature.
Signed-off-by: Sergio Lopez <slp@redhat.com>
The integration test test_memory_mergeable_on has been fairly unstable
for quite some time now. Because it can take some time for the VM to be
spawned and to be able to perform a correct measure of the PSS, this
commit simply increases the time before such measure is done.
This should return more accurate PSS results, which should help
stabilize the test.
Fixes#781
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>