Commit Graph

71 Commits

Author SHA1 Message Date
Bo Chen
9aba1fdee6 virtio-devices, vmm: Use syscall definitions from the libc crate
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Bo Chen
864a5e4fe0 virtio-devices, vmm: Simplify 'get_seccomp_rules'
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Bo Chen
08ac3405f5 virtio-devices, vmm: Move to the seccompiler crate
Fixes: #2929

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Rob Bradford
5e74848ab4 vmm: seccomp: Permit syscalls used for vfio-user on vCPU thread
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-10 16:01:00 +01:00
Markus Theil
5b0d4bb398 virtio-devices: seccomp: allow unix socket connect in vsock thread
Allow vsocks to connect to Unix sockets on the host running
cloud-hypervisor with enabled seccomp.

Reported-by: Philippe Schaaf <philippe.schaaf@secunet.com>
Tested-by: Franz Girlich <franz.girlich@tu-ilmenau.de>
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
2021-08-06 08:44:47 -07:00
Muminul Islam
83c44a2411 vmm, virtio-devices: Add missing seccomp rules for MSHV
This patch adds all the seccomp rules missing for MSHV.
With this patch MSFT internal CI runs with seccomp enabled.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-08-03 11:09:07 -07:00
Muminul Islam
3baa0c3721 vmm: Add MSHV_VP_TRANSLATE_GVA to seccomp rule
This rule is needed to boot windows guest.
This bug was introduced while we tried to boot
windows guest on MSHV.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-29 16:29:53 +01:00
Muminul Islam
81895b9b40 hypervisor: Implement start/stop_dirty_log for MSHV
This patch modify the existing live migration code
to support MSHV. Adds couple of new functions to enable
and disable dirty page tracking. Add missing IOCTL
to the seccomp rules for live migration.
Adds necessary flags for MSHV.
This changes don't affect KVM functionality at all.

In order to get better performance it is good to
enable dirty page tracking when we start live migration
and disable it when the migration is done.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-29 16:29:53 +01:00
Sebastien Boeuf
0ac4545c5b vmm: Extend seccomp filters with fcntl() for HTTP thread
Whenever a file descriptor is sent through the control message, it
requires fcntl() syscall to handle it, meaning we must allow it through
the list of syscalls authorized for the HTTP thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-21 15:34:22 +02:00
Sebastien Boeuf
d68c388cac vmm: Update seccomp filters for HTTP thread
The micro-http crate now uses recvmsg() syscall in order to receive file
descriptors through control messages. This means the syscall must be
part of the authorized list in the seccomp filters.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-15 08:13:48 +00:00
Wei Liu
71bbaf556f vmm: seccomp: add seccomp rules for MSHV
Add a minimum set of rules that allow Cloud Hypervisor to run Linux on
top of Microsoft Hypervisor.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:44:02 +02:00
Wei Liu
8819bb0f21 vmm: seccomp: make use of KVM feature
The to-be-introduced MSHV rules don't need to contain KVM rules and vice
versa.

Put KVM constants into to a module. This avoids the warnings about
dead code in the future.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:44:02 +02:00
Michael Zhao
1a7a76511f vmm: Enable pty console on AArch64
Allowed syscall "SYS_readlinkat" on AArch64 which is required by pty.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-10 15:05:25 +02:00
William Douglas
a2cfe71c0a vmm: Update the http thread's seccomp filter
Because the http thread no longer needs to create the api socket,
remove the socket, bind and listen syscalls from the seccomp filter.

Signed-off-by: William Douglas <william.douglas@intel.com>
2021-04-29 09:44:40 +01:00
Rob Bradford
17072e9a6f vmm: seccomp: Add missing SYS_newfstatat
This is used when running on a new libc like Fedora34.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-12 18:02:29 +02:00
Sebastien Boeuf
46f96f27a4 vmm: Add missing syscall for vCPU unplug
clock_nanosleep() is triggered when hot-unplugging a vCPU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-03-31 10:51:52 +01:00
Rob Bradford
7c302373ed vmm: Address Rust 1.51.0 clippy issue (needless_question_mark)
warning: Question mark operator is useless here
   --> vmm/src/seccomp_filters.rs:493:5
    |
493 | /     Ok(SeccompFilter::new(
494 | |         rules.into_iter().collect(),
495 | |         SeccompAction::Trap,
496 | |     )?)
    | |_______^
    |
    = note: `#[warn(clippy::needless_question_mark)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark
help: try
    |
493 |     SeccompFilter::new(
494 |         rules.into_iter().collect(),
495 |         SeccompAction::Trap,
496 |     )
    |

warning: Question mark operator is useless here
   --> vmm/src/seccomp_filters.rs:507:5
    |
507 | /     Ok(SeccompFilter::new(
508 | |         rules.into_iter().collect(),
509 | |         SeccompAction::Log,
510 | |     )?)
    | |_______^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark
help: try
    |
507 |     SeccompFilter::new(
508 |         rules.into_iter().collect(),
509 |         SeccompAction::Log,
510 |     )
    |

warning: Question mark operator is useless here
   --> vmm/src/vm.rs:887:9
    |
887 |         Ok(CString::new(cmdline).map_err(Error::CmdLineCString)?)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `CString::new(cmdline).map_err(Error::CmdLineCString)`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
a0c07474a3 vmm: seccomp: Add KVM_MEMORY_ENCRYPT_OP ioctl to seccomp filter
This is the basis for TDX based operations on the various KVM file
descriptors.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-09 16:26:06 +01:00
Sebastien Boeuf
4ed0e1a3c8 net_util: Simplify TX/RX queue handling
The main idea behind this commit is to remove all the complexity
associated with TX/RX handling for virtio-net. By using writev() and
readv() syscalls, we could get rid of intermediate buffers for both
queues.

The complexity regarding the TAP registration has been simplified as
well. The RX queue is only processed when some data are ready to be
read from TAP. The event related to the RX queue getting more
descriptors only serves the purpose to register the TAP file if it's not
already.

With all these simplifications, the code is more readable but more
performant as well. We can see an improvement of 10% for a single
queue device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-22 10:39:23 +00:00
Rob Bradford
c1d9edbfc0 vmm: seccomp: Add getrandom to vCPU thread filter
This can be triggered upon device reset.

Fixes: #2278

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-18 16:15:13 +00:00
Sebastien Boeuf
9353856426 vmm: Fix seccomp filters for vCPUs
Depending on the host OS the code for looking up the time for the CMOS
make require extra syscalls to be permitted for the vCPU thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-11 11:24:57 +00:00
Sebastien Boeuf
19167e7647 pci: vfio: Implement INTx support
With all the preliminary work done in the previous commits, we can
update the VFIO implementation to support INTx along with MSI and MSI-X.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-10 17:34:56 +00:00
William Douglas
48963e322a Enable pty console
Add the ability for cloud-hypervisor to create, manage and monitor a
pty for serial and/or console I/O from a user. The reasoning for
having cloud-hypervisor create the ptys is so that clients, libvirt
for example, could exit and later re-open the pty without causing I/O
issues. If the clients were responsible for creating the pty, when
they exit the main pty fd would close and cause cloud-hypervisor to
get I/O errors on writes.

Ideally the main and subordinate pty fds would be kept in the main
vmm's Vm structure. However, because the device manager owns parsing
the configuration for the serial and console devices, the information
is instead stored in new fields under the DeviceManager structure
directly.

From there hooking up the main fd is intended to look as close to
handling stdin and stdout on the tty as possible (there is some future
work ahead for perhaps moving support for the pty into the
vmm_sys_utils crate).

The main fd is used for reading user input and writing to output of
the Vm device. The subordinate fd is used to setup raw mode and it is
kept open in order to avoid I/O errors when clients open and close the
pty device.

The ability to handle multiple inputs as part of this change is
intentional. The current code allows serial and console ptys to be
created and both be used as input. There was an implementation gap
though with the queue_input_bytes needing to be modified so the pty
handlers for serial and console could access the methods on the serial
and console structures directly. Without this change only a single
input source could be processed as the console would switch based on
its input type (this is still valid for tty and isn't otherwise
modified).

Signed-off-by: William Douglas <william.r.douglas@gmail.com>
2021-02-09 10:03:28 +00:00
Sebastien Boeuf
c6854c5a97 block_util: Simplify RAW synchronous implementation
Using directly preadv and pwritev, we can simply use a RawFd instead of
a file, and we don't need to use the more complex implementation from
the qcow crate.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-01 13:45:08 +00:00
Bo Chen
8d418e1fca vmm: Refine the seccomp filter list for the vCPU thread
This patch refines the sccomp filter list for the vCPU thread, as we are
no longer spawning virtio-device threads from the vCPU thread.

Fixes: #2170

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-01-21 09:41:40 +00:00
Rob Bradford
cb826aa2f1 vmm: seccomp: Add open() to vCPU permitted syscalls
Older libc (like RHEL7) uses open() rather than openat(). This was
demonstrated through a failure to open /etc/localtime as used by
gmtime() libc call trigged from the vCPU thread (CMOS device.)

Fixes: #2111

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-11 11:24:13 +01:00
Rob Bradford
14af74cb5b vmm: seccomp: Allow clock_gettime() on the vCPU thread
If the vCPU thread calls log!() the time difference between the call
time and the boot up time is reported. On most environments and
architectures this covered by a vDSO call rather than a syscall. However
on some platforms this turns into a syscall.

Fixes: #2080

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-04 10:13:42 -08:00
Rob Bradford
444905071b vmm: seccomp: Permit TUNGETIFF through the filter
This is used to obtain the TAP device name.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-17 22:51:30 +01:00
Rob Bradford
1ab1341775 vmm: seccomp_filters: Add KVM_GET_DIRTY_LOG to permitted calls
The live migration support added use of this ioctl but it wasn't
included in the permitted list.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-25 01:27:26 +01:00
Sebastien Boeuf
28e12e9f3a vmm, hypervisor: Fix snapshot/restore for Windows guest
The snasphot/restore feature is not working because some CPU states are
not properly saved, which means they can't be restored later on.

First thing, we ensure the CPUID is stored so that it can be properly
restored later. The code is simplified and pushed down to the hypervisor
crate.

Second thing, we identify for each vCPU if the Hyper-V SynIC device is
emulated or not. In case it is, that means some specific MSRs will be
set by the guest. These MSRs must be saved in order to properly restore
the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-21 19:11:03 +01:00
Rob Bradford
885ee9567b vmm: Add support for creating virtio-watchdog
The watchdog device is created through the "--watchdog" parameter. At
most a single watchdog can be created per VM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-21 16:02:39 +01:00
Sebastien Boeuf
c02a02edfc vmm: Allow unlink syscall for vCPU threads
Without the unlink(2) syscall being allowed, Cloud-Hypervisor crashes
when we remove a virtio-vsock device that has been previously added.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-06 16:05:59 +01:00
Henry Wang
3ea4a0797d vmm: seccomp: unify AArch64 and x86_64 FTRUNCATE syscall
The definition of libc::SYS_ftruncate on AArch64 is different
from that on x86_64. This commit unifies the previously hard-coded
syscall number for AArch64.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Henry Wang
48544e4e82 vmm: seccomp: whitelist KVM_GET_REG_LIST in seccomp
`KVM_GET_REG_LIST` ioctl is needed in save/restore AArch64 vCPU.
Therefore we whitelist this ioctl in seccomp.

Also this commit unifies the `SYS_FTRUNCATE` syscall for x86_64
and AArch64.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Sebastien Boeuf
7c346c3844 vmm: Kill vhost-user self-spawned process on failure
If after the creation of the self-spawned backend, the VMM cannot create
the corresponding vhost-user frontend, the VMM must kill the freshly
spawned process in order to ensure the error propagation can happen.

In case the child process would still be around, the VMM cannot return
the error as it waits onto the child to terminate.

This should help us identify when self-spawned failures are caused by a
connection being refused between the VMM and the backend.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-18 17:26:25 +01:00
Sebastien Boeuf
555c5c5d9c vmm: Add missing syscalls to signal thread
When the VMM is terminated by receiving a SIGTERM signal, the signal
handler thread must be able to invoke ioctl(TCGETS) and ioctl(TCSETS)
without error.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-18 13:40:10 +01:00
Rob Bradford
41a9b1adef vmm: Add missing syscall to vCPU thread
Fixes: #1717

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-09-18 13:40:10 +01:00
Sebastien Boeuf
b3435d51d9 vmm: cpu: Add missing io_uring syscalls to vCPU threads
Some of the io_uring setup happens upon activation of the virtio-blk
device, which is initially triggered through an MMIO VM exit. That's why
the vCPU threads must authorize io_uring related syscalls.

This commit ensures the virtio-blk io_uring implementation can be used
along with the seccomp filters enabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 11:59:47 +02:00
Bo Chen
9682d74763 vmm: seccomp: Add seccomp filters for signal_handler worker thread
This patch covers the last worker thread with dedicated secomp filters.

Fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-09-11 07:42:31 +02:00
Bo Chen
2612a6df29 vmm: seccomp: Add seccomp filters for the vcpu worker thread
Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-09-11 07:42:31 +02:00
Michael Zhao
a95b6bbd8b vmm: Add seccomp rules for starting vhost-user-net backend on AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-08-31 08:19:23 +02:00
Sebastien Boeuf
1b4591aecc vmm: memory_manager: Apply NUMA policy to memory zones
Relying on the new option 'host_numa_node' from the 'memory-zone'
parameter, the user can now define which NUMA node from the host
should be used to back the current memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
bdef54ead6 vmm: Add brk syscall to the API thread
The brk syscall is not always called as the system might not need it.
But when it's needed from the API thread, this causes the thread to
terminate as it is not part of the authorized list of syscalls.

This should fix some sporadic failures on the CI with the musl build.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-11 15:04:21 +01:00
Jose Carlos Venegas Munoz
90acb01bad vmm: seccomp: add mprotect to API thread filter
Add mprotect to API thread rules. Prevent the VMM is
killed when it is used.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-08-05 21:35:21 +01:00
Bo Chen
8e74637ebb main, vmm: seccomp: Add the '--seccomp log' option
This patch extends the CLI option '--seccomp' to accept the 'log'
parameter in addition 'true/false'. It also refactors the
vmm::seccomp_filters module to support both "SeccompAction::Trap" and
"SeccompAction::Log".

Fixes: #1180

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
b41884a406 main, vmm: seccomp: Use SeccompAction instead of SeccompLevel
This patch replaces the usage of 'SeccompLevel' with 'SeccompAction',
which is the first step to support the 'log' action over system
calls that are not on the allowed list of seccomp filters.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Sebastien Boeuf
917027c55b vmm: Rely on virtio-blk io_uring when possible
In case the host supports io_uring and the specific io_uring options
needed, the VMM will choose the asynchronous version of virtio-blk.
This will enable better I/O performances compared to the default
synchronous version.

This is also important to note the VMM won't be able to use the
asynchronous version if the backend image is in QCOW format.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-03 14:15:01 +01:00
Jianyong Wu
d24b110519 seccomp: AArch64: Add SYS_unlinkat to seccomp whitelist
This commit fixes an "Bad syscall" error when shutting down the VM
on AArch64 by adding the SYS_unlinkat syscall to the seccomp
whitelist.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2020-07-27 07:25:07 +00:00
Michael Zhao
726e45e0ce vmm: Divide Seccomp KVM IOCTL rules by architecture
Refactored the construction of KVM IOCTL rules for Seccomp.
Separating the rules by architecture can reduce the risk of bugs and
attacks.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-07-06 13:40:38 +01:00
Michael Zhao
8820e9e133 vmm: Fix Seccomp filter for AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-07-02 08:46:24 +01:00