Commit Graph

245 Commits

Author SHA1 Message Date
Sebastien Boeuf
5120c275a2 main: Add seccomp support
This change introduces a new CLI option --seccomp. This allows the user
to enable/disable the seccomp filters when needed. Because the user now
has the possibility to disable the seccomp filters, and because the
Cloud-Hypervisor project wants to enforce the maximum security by
default, the seccomp filters are now applied by default.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Sebastien Boeuf
db62cb3f4d vmm: Add seccomp filter to the VMM thread
This commit introduces the application of the seccomp filter to the VMM
thread. The filter is empty for now (SeccompLevel::None).

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Rob Bradford
f7197e8415 vmm: Add a "discard_writes=" to --pmem
This opens the backing file read-only, makes the pages in the mmap()
read-only and also makes the KVM mapping read-only. The file is also
mapped with MAP_PRIVATE to make the changes local to this process only.

This is functional alternative to having support for making a
virtio-pmem device readonly. Unfortunately there is no concept of
readonly virtio-pmem (or any type of NVDIMM/PMEM) in the Linux kernel so
to be able to have a block device that is appears readonly in the guest
requires significant specification and kernel changes.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-20 14:46:34 +01:00
Rob Bradford
477bc17f18 bin: Share VFIO device syntax between cloud-hypervisor and ch-remote
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-18 23:38:55 +00:00
Sergio Lopez
07cc73bddc vhost_user_fs: add a flag to disable extended attributes
Extended attributes (xattr) support has a huge impact on write
performance. The reason for this is that, if enabled, FUSE sends a
setxattr request after each write operation, and due to the inode
locking inside the kernel during said request, the ability to execute
the operations in parallel becomes heavily limited.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-13 15:20:34 +00:00
Sergio Lopez
710520e9a1 vhost_user_fs: Process requests in parallel with a thread pool
This change enables vhost_user_fs to process multiple requests in
parallel by scheduling them into a ThreadPool (from the Futures
crate).

Parallelism on a single file is limited by the nature of the operation
executed on it. A recent commit replaced the Mutex that protects the
File within HandleData with a RwLock, to allow some operations (at
this moment, only "read" and "write") to proceed in parallel by
acquiring a read lock.

A more complex approach was also implemented [1], involving
instrumentation through vhost_user_backend to be able to serialize
completions, reducing the pressure on the vring RwLock. This strategy
improved the performance on some corner cases, while making it worse
on other, more common ones. This fact, in addition to it requiring
wider changes through the source code, prompted me to drop it in favor
of this one.

[1] https://github.com/slp/cloud-hypervisor/tree/vuf_async

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-13 15:20:34 +00:00
Rob Bradford
4579afa091 vmm: For --disk error if socket and path is specified
This is an error as the path should be specfied by the unmanaged
backend.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-13 11:41:52 +00:00
Rob Bradford
4f2469e054 main: Remove "--vhost-user-net"
This option was superseded by using "--net" with "vhost_user=true". This
option wasn't being parsed any more but was left over.

Fixes: #806

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-13 11:00:43 +00:00
Rob Bradford
ca3b39c0be bin: Fix wrapping in help strings
Some of the help strings had extra newlines in them or otherwise strange
wrapping. The strings were rewrapped with the nightly version of rustfmt
that supports string formatting.

Fixes: #899

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-12 18:03:18 +00:00
Sergio Lopez
3957d1ee27 vhost_user_backend: call get_used_event from needs_notification
This change, combined with the compiler hint to inline get_used_event,
shortens the window between the memory read and the actual check by
calling get_used_event from needs_notification.

Without it, when putting enough pressure on the vring, it's possible
that a notification is wrongly omitted, causing the queue to stall.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-03-12 14:34:21 +00:00
Rob Bradford
9a7d9c9465 ch-remote: Support removing VFIO devices
Add a "remove-device" command that allows removing VFIO devices from the
VM after boot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-12 12:56:10 +01:00
Rob Bradford
0d53ba4395 ch-remote: Support adding VFIO devices
Add an "add-device" command that allows adding VFIO devices to the VM
after boot.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-12 12:56:10 +01:00
Rob Bradford
babefbd9bf main: Remove spurious second help line for "--device"
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-12 12:56:10 +01:00
Sebastien Boeuf
9023444ad3 vmm: Add id field to --device through CLI
Add the ability to specify the "id" associated with a device, by adding
an extra option to the parameter --device.

This new option is not mandatory, and by default, the VMM will take care
of finding a unique identifier.

If the identifier provided by the user through this new option is not
unique, an error will be thrown and the VM won't be started.

Fixes #881

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-11 13:10:57 +00:00
Rob Bradford
21160f7490 ch-remote: Add "resize" command
This command lets you change the number of vCPUs and RAM that the VM
has.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-11 08:22:09 +01:00
Rob Bradford
bb2d04b39d ch-remote: Add support for sending a request body
Support sending a request body this will usually be JSON encoded data
representing the details of the request.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-11 08:22:09 +01:00
Rob Bradford
bde4f735ab ch-remote: Refactor HTTP response handling
Extract HTTP response handling into its own function.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-11 08:22:09 +01:00
Rob Bradford
ba8cd4d55a bin: Introduce "ch-remote" for controlling VMM
This commit introduces a basic implementation of a remote control of a
running VMM implementing a subset of the API.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-03-09 15:03:01 +00:00
Cathy Zhang
6341736286 vhost_user_net: Provide tap option for vhost_user_net backend
Provide vhost_user_net backend with the tap option, it allows to
use the existing tap interface.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2020-03-05 15:09:20 +00:00
Sergio Lopez
531f4ff6b0 vhost_user_fs: Remove an unneeded unwrap in handle_event
Remove an unneeded unwrap in handle_event.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-26 12:08:12 +01:00
Sergio Lopez
e52129efb4 vhost_user_fs: Process events from HIPRIO queue
We weren't processing events arriving at the HIPRIO queue, which
implied ignoring FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET
requests.

One effect of this issue was that file descriptors weren't closed on
the server, so it eventually hits RLIMIT_NOFILE. Additionally, the
guest OS may hang while attempting to unmount the filesystem.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-26 12:08:12 +01:00
Sergio Lopez
1c5562b656 vhost_user_fs: Add support for EVENT_IDX
Now that Queue supports EVENT_IDX, expose the feature and add support
for it in vhost_user_fs.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-25 11:12:50 +00:00
Sergio Lopez
eae4f1d249 vhost_user_fs: Add support for indirect descriptors
Now that Queue supports indirect descriptors, expose the feature and
support them in vhost_user_fs too.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-25 11:12:50 +00:00
Sergio Lopez
ea0bc240fd vhost_user_fs: Be honest about protocol supported features
vhost_user_fs doesn't really support all vhost protocol features, just
MQ and SLAVE_REQ, so return that in protocol_features().

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-25 11:12:50 +00:00
Rob Bradford
d7b0b9842d tests: Move integration tests to their own directory
Simplify main.rs by moving the integration tests to their own directory.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-25 10:42:54 +00:00
Rob Bradford
374ac77c63 main, vmm: Remove deprecated --vhost-user-net
This has been superseded by using --net with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Rob Bradford
ffd816ebfa main, vmm: Remove deprecated --vhost-user-blk
This has been superseded by using --disk with vhost_user=true and
socket=<socket>

Fixes: #678

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-24 07:26:31 +01:00
Sergio Lopez
5c06b7f862 vhost_user_block: Implement optional static polling
Actively polling the virtqueue significantly reduces the latency of
each I/O operation, at the expense of using more CPU time. This
features is specially useful when using low-latency devices (SSD,
NVMe) as the backend.

This change implements static polling. When a request arrives after
being idle, vhost_user_block will keep checking the virtqueue for new
requests, until POLL_QUEUE_US (50us) has passed without finding one.

POLL_QUEUE_US is defined to be 50us, based on the current latency of
enterprise SSDs (< 30us) and the overhead of the emulation.

This feature is enabled by default, and can be disabled by using the
"poll_queue" parameter of "block-backend".

This is a test using null_blk as a backend for the image, with the
following parameters:

 - null_blk gb=20 nr_devices=1 irqmode=2 completion_nsec=0 no_sched=1

With "poll_queue=false":

fio --ioengine=sync --bs=4k --rw randread --name randread --direct=1
--filename=/dev/vdb --time_based --runtime=10

randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.14
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=169MiB/s][r=43.2k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=433: Tue Feb 18 11:12:59 2020
  read: IOPS=43.2k, BW=169MiB/s (177MB/s)(1688MiB/10001msec)
    clat (usec): min=17, max=836, avg=21.64, stdev= 3.81
     lat (usec): min=17, max=836, avg=21.77, stdev= 3.81
    clat percentiles (nsec):
     |  1.00th=[19328],  5.00th=[19840], 10.00th=[20352], 20.00th=[21120],
     | 30.00th=[21376], 40.00th=[21376], 50.00th=[21376], 60.00th=[21632],
     | 70.00th=[21632], 80.00th=[21888], 90.00th=[22144], 95.00th=[22912],
     | 99.00th=[28544], 99.50th=[30336], 99.90th=[39168], 99.95th=[42752],
     | 99.99th=[71168]
   bw (  KiB/s): min=168440, max=188496, per=100.00%, avg=172912.00, stdev=3975.63, samples=19
   iops        : min=42110, max=47124, avg=43228.00, stdev=993.91, samples=19
  lat (usec)   : 20=5.90%, 50=94.08%, 100=0.02%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  cpu          : usr=10.35%, sys=25.82%, ctx=432417, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=432220,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=169MiB/s (177MB/s), 169MiB/s-169MiB/s (177MB/s-177MB/s), io=1688MiB (1770MB), run=10001-10001msec

Disk stats (read/write):
  vdb: ios=427867/0, merge=0/0, ticks=7346/0, in_queue=0, util=99.04%

With "poll_queue=true" (default):

fio --ioengine=sync --bs=4k --rw randread --name randread --direct=1
--filename=/dev/vdb --time_based --runtime=10

randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.14
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=260MiB/s][r=66.7k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=422: Tue Feb 18 11:14:47 2020
  read: IOPS=68.5k, BW=267MiB/s (280MB/s)(2674MiB/10001msec)
    clat (usec): min=10, max=966, avg=13.60, stdev= 3.49
     lat (usec): min=10, max=966, avg=13.70, stdev= 3.50
    clat percentiles (nsec):
     |  1.00th=[11200],  5.00th=[11968], 10.00th=[11968], 20.00th=[12224],
     | 30.00th=[12992], 40.00th=[13504], 50.00th=[13760], 60.00th=[13888],
     | 70.00th=[14016], 80.00th=[14144], 90.00th=[14272], 95.00th=[14656],
     | 99.00th=[20352], 99.50th=[23936], 99.90th=[35072], 99.95th=[36096],
     | 99.99th=[47872]
   bw (  KiB/s): min=265456, max=296456, per=100.00%, avg=274229.05, stdev=13048.14, samples=19
   iops        : min=66364, max=74114, avg=68557.26, stdev=3262.03, samples=19
  lat (usec)   : 20=98.84%, 50=1.15%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=0.01%
  cpu          : usr=8.24%, sys=21.15%, ctx=684669, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=684611,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=267MiB/s (280MB/s), 267MiB/s-267MiB/s (280MB/s-280MB/s), io=2674MiB (2804MB), run=10001-10001msec

Disk stats (read/write):
  vdb: ios=677855/0, merge=0/0, ticks=7026/0, in_queue=0, util=99.04%

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-19 17:13:47 +00:00
Sergio Lopez
1ef6996207 vhost_user_backend: Add helpers for EVENT_IDX
Add helpers to Vring and VhostUserSlaveReqHandler for EVENT_IDX, so
consumers of this crate can make use of this feature.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2020-02-19 17:13:47 +00:00
Sebastien Boeuf
ddf6caf955 ci: Improve test_memory_mergeable_on stability
The integration test test_memory_mergeable_on has been fairly unstable
for quite some time now. Because it can take some time for the VM to be
spawned and to be able to perform a correct measure of the PSS, this
commit simply increases the time before such measure is done.
This should return more accurate PSS results, which should help
stabilize the test.

Fixes #781

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-02-19 12:36:28 +00:00
Liu Bo
4970e2f703 vhost-user-fs: add dax tests for vhost_user_fs rust daemon
Now that vhost_user_fs rust daemon supports virtiofs's dax mode, this adds
the two dax tests accordingly.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
2020-02-19 07:52:50 +00:00
Liu Bo
59958f0a61 vhost_user_fs: add the ability to set slave req fd
This adds the missing part of supporting virtiofs dax on the slave end,
that is, receiving a socket pair fd from the master end to set up a
communication channel for sending setupmapping & removemapping messages.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
2020-02-19 07:52:50 +00:00
Rob Bradford
f7378bc092 tests: Add self spawning vhost-user-block test
Also rename the net self spawning test to differentiate it from the the
block one.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-18 08:43:47 +00:00
Rob Bradford
7485a0c1f7 Revert "build: Don't fail build on test_vfio failure"
This reverts commit 014844d0da.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-17 08:23:02 +00:00
Rob Bradford
f03602a4c9 tests: Add self spawning vhost-user-net test
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-14 17:32:49 +00:00
Bo Chen
ebd83699dc main: Display git commit hash with the '--version' option
Add a build-script to propagate the git commit hash to other crates at
compile time through environment variables, and display the hash along
with the '--version' option.

Fixes #729

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-02-14 10:00:14 +01:00
Rob Bradford
e8e4f43d52 tests: Use hugepages for test_vfio
Using hugepages with VFIO and virtiofs can improve the performance of
the test_vfio test.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-13 17:55:21 +01:00
Rob Bradford
014844d0da build: Don't fail build on test_vfio failure
test_vfio has been failing consistently on the CI so mark it with
a "#[ignore]" and then forceably build it again but ignore the build
result.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-12 15:37:47 +00:00
Rob Bradford
da7f31d4bc bin: vhost_user_fs: Port to new exit event strategy
Rather than handling the KILL_EVENT in the event handler itself use the
newly added support in VhostUserBackend for providing a kill event
framework.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-11 15:21:07 +01:00
Rob Bradford
7f032c8bb3 bin: vhost_user_fs: Shutdown worker thread on exit
This will ensure a clean shutdown of the backend

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-10 09:16:49 +01:00
Rob Bradford
99cb8dc0a4 bin: vhost_user_fs use error! macro logging for consistency
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-10 09:16:49 +01:00
Rob Bradford
6e6b2b84fe scripts: Check the Rust formatting is valid
Check the rust formatting rather than just reformatting code on the CI
agent.

Also fix a formatting error that slipped in whilst the cargo fmt check
was not working correctly.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-04 16:47:05 +01:00
Rob Bradford
bc053f1b13 main: Ignore error on log writing
A previous version of this change attempted to avoid panicking by not
using .expect() when handling an error when attempting to write to the
log file. Unfortunately the macro eprintln!() that was used to replace
the .expect() also has the behaviour of panicking if stderr cannot be
used. Instead swallow the error completely as if writing to the log has
failed at logging time it is almost certainly the case that any message
about the log would also not be seen.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-04 12:22:23 +00:00
Yang Zhong
91739be120 main: Add help info for block-backend
Since the vhost-user-blk binary will be removed and the newer
release will integrate this block backend into cloud-hypervisor
binary. The block backend code has been added num_queues cmdline
support, we need update multiple queues help info for this
block-backend in the cloud-hypervisor.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
2020-02-04 11:06:17 +00:00
Samuel Ortiz
ae6cf4c922 tests: integration: Add memory overhead test
We measure the memory overhead that the VMM process adds to the guest VM
and compare it with a maximum acceptable limit. The test is run against
a simple VM, running 1 vCPU and 512MB of RAM. Although this is not by
any mean a comprehensive VMM overhead measurement, it will allow us to
detect when and if any PR makes our code cross an arbitrary memory
overhead threshold.

Fixes: #64

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2020-02-04 09:23:56 +00:00
Rob Bradford
7cb61d3960 main: Don't panic (by calling .expect()) if writing to the log fails
As this can happen during the running of the VMM we should be very
careful not to panic() as that can lead to a thread being used by the VM
disappearing.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-02-03 19:08:08 +01:00
Yang Zhong
789a39a2d5 ci: Add MQ support in the test cases
The CI need to add MQ support, and the vcpu number should
be bigger than or equal to queue number.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
2020-02-03 09:49:27 +01:00
Yang Zhong
99da1dff90 vhost-user-blk: Add MQ support in backend
Adding the num_queues parameter for vhost-user-blk backend, which
can enable MQ support in the backend.

This patch has enabled the MQ support from handle_event, and the
vhost-user-backend crate will enable multiple threads to call this
handle_event to handle read/write operations.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
2020-02-03 09:49:27 +01:00
Rob Bradford
b4d04bdff6 tests: Add CLI <-> API validation test for --disk changes
Check that the CLI generates the JSON data as expected.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-29 08:06:37 +00:00
Rob Bradford
12f4cd951a tests: Use "--disks" for vhost-user-block testing
Use newly unified command line options for testing vhost-user based
block support.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-01-29 08:06:37 +00:00