Commit Graph

365 Commits

Author SHA1 Message Date
Sebastien Boeuf
3ff82b4b65 main, vmm: Add mandatory id to memory zones
In anticipation for allowing memory zones to be removed, but also in
anticipation for refactoring NUMA parameter, we introduce a mandatory
'id' option to the --memory-zone parameter.

This forces the user to provide a unique identifier for each memory zone
so that we can refer to these.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-07 07:37:14 +02:00
Sebastien Boeuf
42f963d6f2 main, vmm: Add new --numa parameter
Through this new parameter, we give users the opportunity to specify a
set of CPUs attached to a NUMA node that has been previously created
from the --memory-zone parameter.

This parameter will be extended in the future to describe the distance
between multiple nodes.

For instance, if a user wants to attach CPUs 0, 1, 2 and 6 to a NUMA
node, here are two different ways of doing so:
Either
	./cloud-hypervisor ... --numa id=0,cpus=0-2:6
Or
	./cloud-hypervisor ... --numa id=0,cpus=0:1:2:6

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 15:25:00 +02:00
Sebastien Boeuf
768dbd1fb0 vmm: Add 'guest_numa_node' option to 'memory-zone'
With the introduction of this new option, the user will be able to
describe if a particular memory zone should belong to a specific NUMA
node from a guest perspective.

For instance, using '--memory-zone size=1G,guest_numa_node=2' would let
the user describe that a memory zone of 1G in the guest should be
exposed as being associated with the NUMA node 2.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-01 14:11:49 +02:00
Sebastien Boeuf
e6f585a31c vmm: Add 'host_numa_nodes' option to memory zones
Since memory zones have been introduced, it is now possible for a user
to specify multiple backends for the guest RAM. By adding a new option
'host_numa_node' to the 'memory-zone' parameter, we allow the guest RAM
to be backed by memory that might come from a specific NUMA node on the
host.

The option expects a node identifier, specifying which NUMA node should
be used to allocate the memory associated with a specific memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
ad5d0e4713 vmm: Remove 'mergeable' from memory zones
The flag 'mergeable' should only apply to the entire guest RAM, which is
why it is removed from the MemoryZoneConfig as it is defined as a global
parameter at the MemoryConfig level.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 07:26:49 +02:00
Sebastien Boeuf
c58dd761f4 vmm: Remove 'file' option from MemoryConfig
After the introduction of user defined memory zones, we can now remove
the deprecated 'file' option from --memory parameter. This makes this
parameter simpler, letting more advanced users define their own custom
memory zones through the dedicated parameter.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Sebastien Boeuf
be475ddc22 main, vmm: Let the user define distincts memory zones
Introducing a new CLI option --memory-zone letting the user specify
custom memory zones. When this option is present, the --memory size
must be explicitly set to 0.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-25 16:43:10 +02:00
Bo Chen
8e74637ebb main, vmm: seccomp: Add the '--seccomp log' option
This patch extends the CLI option '--seccomp' to accept the 'log'
parameter in addition 'true/false'. It also refactors the
vmm::seccomp_filters module to support both "SeccompAction::Trap" and
"SeccompAction::Log".

Fixes: #1180

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
b41884a406 main, vmm: seccomp: Use SeccompAction instead of SeccompLevel
This patch replaces the usage of 'SeccompLevel' with 'SeccompAction',
which is the first step to support the 'log' action over system
calls that are not on the allowed list of seccomp filters.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Wei Liu
085d165f8a bin: switch to hypervisor::new
Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-07-15 17:21:07 +02:00
Sebastien Boeuf
d9244e9f4c vmm: Add option for enabling SGX EPC regions
Introducing the new CLI option --sgx-epc along with the OpenAPI
structure SgxEpcConfig, so that a user can now enable one or multiple
SGX Enclave Page Cache sections within a contiguous region from the
guest address space.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-07-15 15:08:56 +02:00
Hui Zhu
800220acbb virtio-balloon: Store the balloon size to support reboot
This commit store balloon size to MemoryConfig.
After reboot, virtio-balloon can use this size to inflate back to
the size before reboot.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-07-07 17:25:13 +01:00
Hui Zhu
8ffbc3d031 vmm: api: ch-remote: Add balloon to VmResizeData
Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-07-07 17:25:13 +01:00
Hui Zhu
8b6b97b86f vmm: Add virtio-balloon support
This commit adds new option balloon to memory config.
Set it to on will open the balloon function.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-07-07 17:25:13 +01:00
Rob Bradford
72802b34cd vhost_user_block: Move binary into vhost_user_block crate
The binary is still built in the same location but the source code and
the dependencies for it come from the vhost_user_block crate itself.

The binary will be built with:

`cargo build --all --bin vhost_user_block` or just `cargo build --all`

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-07-06 10:56:10 +02:00
Rob Bradford
6959d27e8c vhost_user_net: Move binary into vhost_user_net crate
The binary is still built in the same location but the source code and
the dependencies for it come from the vhost_user_net crate itself.

The binary will be built with:

`cargo build --all --bin vhost_user_net` or just `cargo build --all`

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-07-06 10:56:10 +02:00
Rob Bradford
d6a05ceabb vhost_user_fs: Move binary into vhost_user_fs crate
The binary is still built in the same location but the source code and
the dependencies for it come from the vhost_user_fs crate itself.

The binary will be built with:

`cargo build --all --bin vhost_user_fs` or just `cargo build --all`

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-07-06 10:56:10 +02:00
Rob Bradford
2a6eb31d5b vm-virtio, virtio-devices: Split device implementation from virt queues
Split the generic virtio code (queues and device type) from the
VirtioDevice trait, transport and device implementations.

This also simplifies the feature handling in vhost_user_backend as the
vm-virtio crate is no longer has any features.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-07-02 17:09:28 +01:00
Wei Liu
b27439b6ed arch, hypervisor, vmm: KvmHyperVisor -> KvmHypervisor
"Hypervisor" is one word. The "v" shouldn't be capitalised.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2020-06-25 10:25:13 +02:00
Rob Bradford
9b7afd4aac bin: ch-remote: Implement "counters" command
This is used to obtain the counters from the VM. The raw JSON data is
presented to the user.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-06-25 07:02:44 +02:00
Muminul Islam
e4dee57e81 arch, pci, vmm: Initial switch to the hypervisor crate
Start moving the vmm, arch and pci crates to being hypervisor agnostic
by using the hypervisor trait and abstractions. This is not a complete
switch and there are still some remaining KVM dependencies.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-06-22 15:03:15 +02:00
Rob Bradford
284891b5df main: Populate "--cpus" with appropriate syntax definition
This needed to be updated to include specifying the boot and maxmium
vCPUs as well as the newly added topology for those vCPUs.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-06-17 12:18:09 +02:00
Rob Bradford
4a0439a993 vmm: config: Extend CpusConfig to add the topology
This allows the user to optionally specify the desired CPU topology. All
parts of the topology must be specified and the product of all parts
must match the maximum vCPUs.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-06-17 12:18:09 +02:00
Henry Wang
99e72be169 unit tests: Fix unit tests and docs for AArch64
Currently, not every feature of the cloud-hypervisor is enabled
on AArch64, which means that on AArch64 machines, the
`run_unit_tests.sh` needs to be tailored and some unit test cases
should be run on x86_64 only.

Also this commit fixes the typo and unifies `Arm64` and `AArch64`
in the AArch64 document.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-06-15 17:28:05 +01:00
Anatol Belski
abd6204d27 source: Fix file permissions
Rust sources and some data files should not be executable. The perms are
set to 644.

Signed-off-by: Anatol Belski <ab@php.net>
2020-06-10 18:47:27 +01:00
Bo Chen
eda9bfc7a1 vhost_user_fs: Replace the '--sock' parameter with '--socket'
We are keeping the '--sock' parameter for backward compatibility.

Fixes: #1091

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-06-08 17:41:12 +02:00
Bo Chen
a8cdf2f070 tests,vm-virtio,vmm: Use 'socket' for all CLI/API parameters
This patch unifies the inconsistent uses of 'socket' and 'sock' from our
CLI/API parameters.

Fixes: #1091

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-06-08 17:41:12 +02:00
Rob Bradford
90e7accf8b ch-remote: Show response body from error
If the server returns an error then print out the response body if there
is one present.

Fixes: #1262

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-06-08 17:35:54 +02:00
Rob Bradford
c31ad72ee9 build: Address issues found by 1.43.0 clippy
These are mostly due to use of "bare use" statements and unnecessary vector
creation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-27 19:32:12 +02:00
Rob Bradford
3497eeff49 main: Set the umask to 0077
This ensures that all created filed are only read/write for the current user.

Fixes: #1240

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-27 16:46:51 +01:00
Rob Bradford
af8292b623 vmm, config, vhost_user_blk: remove "wce" parameter
This config option provided very little value and instead we now enable
this feature (which then lets the guest control the cache mode)
unconditionally.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-21 08:40:43 +02:00
Rob Bradford
a813b57f59 vm-virtio, vhost_user_{fs,block,backend}: Move EVENT_IDX handling
Move the method that is used to decide whether the guest should be
signalled into the Queue implementation from vm-virtio. This removes
duplicated code between vhost_user_backend and the vm-virtio block
implementation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-20 12:56:25 +02:00
Rob Bradford
1b8b5ac179 vhost-user_net, vm-virtio, vmm: Permit host MAC address setting
Add a new "host_mac" parameter to "--net" and "--net-backend" and use
this to set the MAC address on the tap interface. If no address is given
one is randomly assigned and is stored in the config.

Support for vhost-user-net self spawning was also included.

Fixes: #1177

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-15 11:45:09 +01:00
Dr. David Alan Gilbert
4120a7dee9 vhost_user_fs: Add seccomp
Implement seccomp; we use one filter for all threads.
The syscall list comes from the C daemon with syscalls added
as I hit them.

The default behaviour is to kill the process, this normally gets
audit logged.

--seccomp none  disables seccomp
          log   Just logs violations but doesn't stop it
          trap  causes a signal to be be sent that can be trapped.

If you suspect you're hitting a seccomp action then you can
check the audit log;  you could also switch to running with 'log'
to collect a bunch of calls to report.
To see where the syscalls are coming from use 'trap' with a debugger
or coredump to backtrace it.

This can be improved for some syscalls to restrict the parameters
to some syscalls to make them more restrictive.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-14 18:56:19 +02:00
Sergio Lopez
6aab0a5458 vhost_user_fs: Implement support for optional sandboxing
Implement support for setting up a sandbox for running the
service. The technique for this has been borrowed from virtiofsd, and
consists on switching to new PID, mount and network namespaces, and
then switching root to the directory to be shared.

Future patches will implement additional hardening features like
dropping capabilities and seccomp filters.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-05-14 17:16:23 +02:00
Sergio Lopez
c4bf383fd7 vhost_user_*: Create a vhost::Listener in advance
Changes is vhost crate require VhostUserDaemon users to create and
provide a vhost::Listener in advance. This allows us to adopt
sandboxing strategies in the future, by being able to create the UNIX
socket before switching to a restricted namespace.

Update also the reference to vhost crate in Cargo.lock to point to the
latest commit from the dragonball branch.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-05-14 17:16:23 +02:00
Rob Bradford
f3f398eb44 vhost_user_block: Consolidate the vhost-user-block backend syntax
Rather than repeat syntax for the vhost-user-block backend in multiple
places store it in one place and reference it from the required places.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-11 09:40:40 +02:00
Rob Bradford
3220292d45 vhost_user_net: Consolidate the vhost-user-net backend syntax
Rather than repeat syntax for the vhost-user-net backend in multiple
places store it in one place and reference it from the required places.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-11 09:40:40 +02:00
Rob Bradford
0d720cc3d8 bin: ch-remote: Ensure ch-remote supports syntax it advertises
The ch-remote usage says:

OPTIONS:
        --api-socket <api-socket>    HTTP API socket path (UNIX domain socket).

However it doesn't seem to actually accept that syntax, instead
requiring "--api-socket=". This may be a clap bug however it is resolved
by setting the number of arguments requires to exactly one. Which is
also the actual correct number.

Fixes: #1117
Fixes: #1116

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-08 09:54:50 +01:00
Rob Bradford
d5bfa2dfc8 vmm, vhost_user_block: Make parameter names match --disk
Make the --block-backend parameters match the --disk parameters.

Fixes: #898

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-30 15:20:55 +02:00
Rob Bradford
6c2bca5f1b bin: ch-remote: Add support for adding vsock devices
Add support for adding a vsock device using the HTTP API.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-29 12:44:49 +01:00
Rob Bradford
f8501a3bd3 vmm: config: Move --vsock syntax to VsockConfig
This means it can be reused with ch-remote.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-29 12:44:49 +01:00
Sebastien Boeuf
6e049e0da1 vmm: Add an identifier to the --vsock device
It's possible to have multiple vsock devices so in preparation for
hotplug/unplug it is important to be able to have a unique identifier
for each device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-29 12:44:49 +01:00
Rob Bradford
10348f73e4 vmm, main: Support only zero or one vsock devices
The Linux kernel does not support multiple virtio-vsock devices.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-28 20:07:18 +02:00
Rob Bradford
7481e4d959 vmm: config: Validate that shared memory is enabled if using vhost-user
Check that if any device using vhost-user (net & disk with
vhost_user=true) or virtio-fs is enabled then check shared memory is
also enabled.

Fixes: #848

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-24 16:01:49 +01:00
Bo Chen
3f42f86d81 vmm: Add the 'shared' and 'hugepages' controls to MemoryConfig
The new 'shared' and 'hugepages' controls aim to replace the 'file'
option in MemoryConfig. This patch also updated all related integration
tests to use the new controls (instead of providing explicit paths to
"/dev/shm" or "/dev/hugepages").

Fixes: #1011

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-04-23 21:39:51 +02:00
Dean Sheather
bb2139a408 vmm/api: Add vm.add-fs route
Currently unimplemented. Once implemented, this API will allow for
creating virtio-fs devices in the VM after it has booted.

Signed-off-by: Dean Sheather <dean@coder.com>
2020-04-20 20:36:26 +02:00
Sebastien Boeuf
1a0a2c0182 vhost_user_backend: Provide the thread ID to handle_event()
By adding a "thread_id" parameter to handle_event(), the backend crate
can now indicate to the backend implementation which thread triggered
the processing of some events.

This is applied to vhost-user-net backend and allows for simplifying a
lot the code since each thread is identical.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-14 14:11:41 +02:00
Sebastien Boeuf
cfffb7edb0 vhost_user_backend: Allow for one exit_event per thread
By adding the "thread_index" parameter to the function exit_event() from
the VhostUserBackend trait, the backend crate now has the ability to ask
the backend implementation about the exit event related to a specific
thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-14 14:11:41 +02:00
Sebastien Boeuf
40e4dc6339 vhost_user_backend: Change handle_event as immutable
By changing the mutability of this function, after adapting all
backends, we should be able to implement multithreads with
multiqueues support without hitting a bottleneck on the backend
locking.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-14 14:11:41 +02:00
Sebastien Boeuf
8f434df1fb vhost_user: Adapt backends to let handle_event be immutable
Both blk, net and fs backends have been updated to avoid the requirement
of having handle_event(&mut self). This will allow the backend crate to
avoid taking a write lock onto the backend object, which will remove the
potential contention point when multiple threads will be handling
multiqueues.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-14 14:11:41 +02:00
Rob Bradford
31928fb103 main: Consistently use eprintln!() for error messages
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-08 12:06:09 +01:00
Rob Bradford
11dd609fa5 main: Only try and parse VM options on VM boot path
As the VmConfig::Parse() also does validation work it only make sense to
parse the VM options on the VM boot path only.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-08 12:06:09 +01:00
Rob Bradford
aaf382eee2 vmm: Move kernel check to VmConfig::validate() method
Replace the existing VmConfig::valid() check with a call into
.validate() as part of earlier config setup or boot API checks.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-08 12:06:09 +01:00
Sebastien Boeuf
a517ca23a0 vmm: Move restore parameters into common RestoreConfig structure
The goal here is to move the restore parameters into a dedicated
structure that can be reused from the entire codebase, making the
addition or removal of a parameter easier.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-08 10:56:14 +02:00
Rob Bradford
22958261aa main: Print human readable error for command line error
Fixes: #367

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-06 10:31:24 +01:00
Sebastien Boeuf
3ef1c00cfb ch-remote: Fix snapshot and restore subcommands
So that they are listed and can be used as expected.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-02 17:55:30 +01:00
Sebastien Boeuf
dc97b67dac main: Fix restore CLI
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-02 17:55:30 +01:00
Sebastien Boeuf
859a96181f ch-remote: Add --restore option
Introduce restore wrapper to ch-remote.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-02 13:24:25 +01:00
Sebastien Boeuf
35c0ea6c25 ch-remote: Add --snapshot option
Introduce the snapshot wrapper to ch-remote.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-02 13:24:25 +01:00
Samuel Ortiz
fe2d884605 main: Support VM restore from the command line
Through the new CLI --restore option.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-04-02 13:24:25 +01:00
Damjan Georgievski
4db252b418 main, vmm: add --initramfs cli option
currently unused, the initramfs argument is added to the cli,
and stored in vmm::config:VmConfig as an Option(InitramfsConfig(PathBuf))

Signed-off-by: Damjan Georgievski <gdamjan@gmail.com>
2020-03-26 11:59:03 +01:00
Rob Bradford
f3f4d07595 ch-remote: Add support for hotplugging network devices
Call the new HTTP API for hotplugging network devices using the same
syntax as coldplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 17:58:06 +01:00
Rob Bradford
9df601a1df bin, vmm: Centralise the net syntax
This will allow the syntax to be reused with cloud-hypervsor binary and
ch-remote.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 17:58:06 +01:00
Hui Zhu
4a7a2cff8c tests: Add test for hotplug_size and hotplug_method
Add test for hotplug_size and hotplug_method.

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-03-25 15:54:16 +01:00
Hui Zhu
e6b934a56a vmm: Add support for virtio-mem
This commit adds new option hotplug_method to memory config.
It can set the hotplug method to "acpi" or "virtio-mem".

Signed-off-by: Hui Zhu <teawater@antfin.com>
2020-03-25 15:54:16 +01:00
Rob Bradford
0b0510108d ch-remote: Add support for hotplugging persistent memory
Call the new HTTP API for hotplugging persistent memory using the same
syntax as coldplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 13:18:17 +01:00
Rob Bradford
a7296bbb52 bin, vmm: Centralise the pmem syntax
This will allow the syntax to be reused with cloud-hypervisor binary and
ch-remote.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 13:18:17 +01:00
Rob Bradford
05ce2dc820 ch-remote: Add support for hotplugging disks
Call the new HTTP API for hotplugging disks using the same syntax as
disk coldplug.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Rob Bradford
66da29d8dd bin, vmm: Centralise the disk syntax
This will allow the syntax to be reused with cloud-hypervsor binary and
ch-remote.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-25 09:35:53 +00:00
Sebastien Boeuf
5120c275a2 main: Add seccomp support
This change introduces a new CLI option --seccomp. This allows the user
to enable/disable the seccomp filters when needed. Because the user now
has the possibility to disable the seccomp filters, and because the
Cloud-Hypervisor project wants to enforce the maximum security by
default, the seccomp filters are now applied by default.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Sebastien Boeuf
db62cb3f4d vmm: Add seccomp filter to the VMM thread
This commit introduces the application of the seccomp filter to the VMM
thread. The filter is empty for now (SeccompLevel::None).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Rob Bradford
f7197e8415 vmm: Add a "discard_writes=" to --pmem
This opens the backing file read-only, makes the pages in the mmap()
read-only and also makes the KVM mapping read-only. The file is also
mapped with MAP_PRIVATE to make the changes local to this process only.

This is functional alternative to having support for making a
virtio-pmem device readonly. Unfortunately there is no concept of
readonly virtio-pmem (or any type of NVDIMM/PMEM) in the Linux kernel so
to be able to have a block device that is appears readonly in the guest
requires significant specification and kernel changes.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-20 14:46:34 +01:00
Rob Bradford
477bc17f18 bin: Share VFIO device syntax between cloud-hypervisor and ch-remote
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-18 23:38:55 +00:00
Sergio Lopez
07cc73bddc vhost_user_fs: add a flag to disable extended attributes
Extended attributes (xattr) support has a huge impact on write
performance. The reason for this is that, if enabled, FUSE sends a
setxattr request after each write operation, and due to the inode
locking inside the kernel during said request, the ability to execute
the operations in parallel becomes heavily limited.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-13 15:20:34 +00:00
Sergio Lopez
710520e9a1 vhost_user_fs: Process requests in parallel with a thread pool
This change enables vhost_user_fs to process multiple requests in
parallel by scheduling them into a ThreadPool (from the Futures
crate).

Parallelism on a single file is limited by the nature of the operation
executed on it. A recent commit replaced the Mutex that protects the
File within HandleData with a RwLock, to allow some operations (at
this moment, only "read" and "write") to proceed in parallel by
acquiring a read lock.

A more complex approach was also implemented [1], involving
instrumentation through vhost_user_backend to be able to serialize
completions, reducing the pressure on the vring RwLock. This strategy
improved the performance on some corner cases, while making it worse
on other, more common ones. This fact, in addition to it requiring
wider changes through the source code, prompted me to drop it in favor
of this one.

[1] https://github.com/slp/cloud-hypervisor/tree/vuf_async

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-13 15:20:34 +00:00
Rob Bradford
4579afa091 vmm: For --disk error if socket and path is specified
This is an error as the path should be specfied by the unmanaged
backend.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-13 11:41:52 +00:00
Rob Bradford
4f2469e054 main: Remove "--vhost-user-net"
This option was superseded by using "--net" with "vhost_user=true". This
option wasn't being parsed any more but was left over.

Fixes: #806

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-13 11:00:43 +00:00
Rob Bradford
ca3b39c0be bin: Fix wrapping in help strings
Some of the help strings had extra newlines in them or otherwise strange
wrapping. The strings were rewrapped with the nightly version of rustfmt
that supports string formatting.

Fixes: #899

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-12 18:03:18 +00:00
Sergio Lopez
3957d1ee27 vhost_user_backend: call get_used_event from needs_notification
This change, combined with the compiler hint to inline get_used_event,
shortens the window between the memory read and the actual check by
calling get_used_event from needs_notification.

Without it, when putting enough pressure on the vring, it's possible
that a notification is wrongly omitted, causing the queue to stall.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-12 14:34:21 +00:00
Rob Bradford
9a7d9c9465 ch-remote: Support removing VFIO devices
Add a "remove-device" command that allows removing VFIO devices from the
VM after boot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-12 12:56:10 +01:00
Rob Bradford
0d53ba4395 ch-remote: Support adding VFIO devices
Add an "add-device" command that allows adding VFIO devices to the VM
after boot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-12 12:56:10 +01:00
Rob Bradford
babefbd9bf main: Remove spurious second help line for "--device"
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-12 12:56:10 +01:00
Sebastien Boeuf
9023444ad3 vmm: Add id field to --device through CLI
Add the ability to specify the "id" associated with a device, by adding
an extra option to the parameter --device.

This new option is not mandatory, and by default, the VMM will take care
of finding a unique identifier.

If the identifier provided by the user through this new option is not
unique, an error will be thrown and the VM won't be started.

Fixes #881

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-11 13:10:57 +00:00
Rob Bradford
21160f7490 ch-remote: Add "resize" command
This command lets you change the number of vCPUs and RAM that the VM
has.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-11 08:22:09 +01:00
Rob Bradford
bb2d04b39d ch-remote: Add support for sending a request body
Support sending a request body this will usually be JSON encoded data
representing the details of the request.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-11 08:22:09 +01:00
Rob Bradford
bde4f735ab ch-remote: Refactor HTTP response handling
Extract HTTP response handling into its own function.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-11 08:22:09 +01:00
Rob Bradford
ba8cd4d55a bin: Introduce "ch-remote" for controlling VMM
This commit introduces a basic implementation of a remote control of a
running VMM implementing a subset of the API.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-09 15:03:01 +00:00
Cathy Zhang
6341736286 vhost_user_net: Provide tap option for vhost_user_net backend
Provide vhost_user_net backend with the tap option, it allows to
use the existing tap interface.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-03-05 15:09:20 +00:00
Sergio Lopez
531f4ff6b0 vhost_user_fs: Remove an unneeded unwrap in handle_event
Remove an unneeded unwrap in handle_event.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-26 12:08:12 +01:00
Sergio Lopez
e52129efb4 vhost_user_fs: Process events from HIPRIO queue
We weren't processing events arriving at the HIPRIO queue, which
implied ignoring FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET
requests.

One effect of this issue was that file descriptors weren't closed on
the server, so it eventually hits RLIMIT_NOFILE. Additionally, the
guest OS may hang while attempting to unmount the filesystem.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-26 12:08:12 +01:00
Sergio Lopez
1c5562b656 vhost_user_fs: Add support for EVENT_IDX
Now that Queue supports EVENT_IDX, expose the feature and add support
for it in vhost_user_fs.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-25 11:12:50 +00:00
Sergio Lopez
eae4f1d249 vhost_user_fs: Add support for indirect descriptors
Now that Queue supports indirect descriptors, expose the feature and
support them in vhost_user_fs too.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-25 11:12:50 +00:00
Sergio Lopez
ea0bc240fd vhost_user_fs: Be honest about protocol supported features
vhost_user_fs doesn't really support all vhost protocol features, just
MQ and SLAVE_REQ, so return that in protocol_features().

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-25 11:12:50 +00:00
Rob Bradford
d7b0b9842d tests: Move integration tests to their own directory
Simplify main.rs by moving the integration tests to their own directory.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-25 10:42:54 +00:00
Rob Bradford
374ac77c63 main, vmm: Remove deprecated --vhost-user-net
This has been superseded by using --net with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Rob Bradford
ffd816ebfa main, vmm: Remove deprecated --vhost-user-blk
This has been superseded by using --disk with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Sergio Lopez
5c06b7f862 vhost_user_block: Implement optional static polling
Actively polling the virtqueue significantly reduces the latency of
each I/O operation, at the expense of using more CPU time. This
features is specially useful when using low-latency devices (SSD,
NVMe) as the backend.

This change implements static polling. When a request arrives after
being idle, vhost_user_block will keep checking the virtqueue for new
requests, until POLL_QUEUE_US (50us) has passed without finding one.

POLL_QUEUE_US is defined to be 50us, based on the current latency of
enterprise SSDs (< 30us) and the overhead of the emulation.

This feature is enabled by default, and can be disabled by using the
"poll_queue" parameter of "block-backend".

This is a test using null_blk as a backend for the image, with the
following parameters:

 - null_blk gb=20 nr_devices=1 irqmode=2 completion_nsec=0 no_sched=1

With "poll_queue=false":

fio --ioengine=sync --bs=4k --rw randread --name randread --direct=1
--filename=/dev/vdb --time_based --runtime=10

randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.14
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=169MiB/s][r=43.2k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=433: Tue Feb 18 11:12:59 2020
  read: IOPS=43.2k, BW=169MiB/s (177MB/s)(1688MiB/10001msec)
    clat (usec): min=17, max=836, avg=21.64, stdev= 3.81
     lat (usec): min=17, max=836, avg=21.77, stdev= 3.81
    clat percentiles (nsec):
     |  1.00th=[19328],  5.00th=[19840], 10.00th=[20352], 20.00th=[21120],
     | 30.00th=[21376], 40.00th=[21376], 50.00th=[21376], 60.00th=[21632],
     | 70.00th=[21632], 80.00th=[21888], 90.00th=[22144], 95.00th=[22912],
     | 99.00th=[28544], 99.50th=[30336], 99.90th=[39168], 99.95th=[42752],
     | 99.99th=[71168]
   bw (  KiB/s): min=168440, max=188496, per=100.00%, avg=172912.00, stdev=3975.63, samples=19
   iops        : min=42110, max=47124, avg=43228.00, stdev=993.91, samples=19
  lat (usec)   : 20=5.90%, 50=94.08%, 100=0.02%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  cpu          : usr=10.35%, sys=25.82%, ctx=432417, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=432220,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=169MiB/s (177MB/s), 169MiB/s-169MiB/s (177MB/s-177MB/s), io=1688MiB (1770MB), run=10001-10001msec

Disk stats (read/write):
  vdb: ios=427867/0, merge=0/0, ticks=7346/0, in_queue=0, util=99.04%

With "poll_queue=true" (default):

fio --ioengine=sync --bs=4k --rw randread --name randread --direct=1
--filename=/dev/vdb --time_based --runtime=10

randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.14
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=260MiB/s][r=66.7k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=422: Tue Feb 18 11:14:47 2020
  read: IOPS=68.5k, BW=267MiB/s (280MB/s)(2674MiB/10001msec)
    clat (usec): min=10, max=966, avg=13.60, stdev= 3.49
     lat (usec): min=10, max=966, avg=13.70, stdev= 3.50
    clat percentiles (nsec):
     |  1.00th=[11200],  5.00th=[11968], 10.00th=[11968], 20.00th=[12224],
     | 30.00th=[12992], 40.00th=[13504], 50.00th=[13760], 60.00th=[13888],
     | 70.00th=[14016], 80.00th=[14144], 90.00th=[14272], 95.00th=[14656],
     | 99.00th=[20352], 99.50th=[23936], 99.90th=[35072], 99.95th=[36096],
     | 99.99th=[47872]
   bw (  KiB/s): min=265456, max=296456, per=100.00%, avg=274229.05, stdev=13048.14, samples=19
   iops        : min=66364, max=74114, avg=68557.26, stdev=3262.03, samples=19
  lat (usec)   : 20=98.84%, 50=1.15%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  cpu          : usr=8.24%, sys=21.15%, ctx=684669, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=684611,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=267MiB/s (280MB/s), 267MiB/s-267MiB/s (280MB/s-280MB/s), io=2674MiB (2804MB), run=10001-10001msec

Disk stats (read/write):
  vdb: ios=677855/0, merge=0/0, ticks=7026/0, in_queue=0, util=99.04%

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-19 17:13:47 +00:00
Sergio Lopez
1ef6996207 vhost_user_backend: Add helpers for EVENT_IDX
Add helpers to Vring and VhostUserSlaveReqHandler for EVENT_IDX, so
consumers of this crate can make use of this feature.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-19 17:13:47 +00:00
Sebastien Boeuf
ddf6caf955 ci: Improve test_memory_mergeable_on stability
The integration test test_memory_mergeable_on has been fairly unstable
for quite some time now. Because it can take some time for the VM to be
spawned and to be able to perform a correct measure of the PSS, this
commit simply increases the time before such measure is done.
This should return more accurate PSS results, which should help
stabilize the test.

Fixes #781

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-19 12:36:28 +00:00