Commit Graph

49 Commits

Author SHA1 Message Date
William Douglas
48963e322a Enable pty console
Add the ability for cloud-hypervisor to create, manage and monitor a
pty for serial and/or console I/O from a user. The reasoning for
having cloud-hypervisor create the ptys is so that clients, libvirt
for example, could exit and later re-open the pty without causing I/O
issues. If the clients were responsible for creating the pty, when
they exit the main pty fd would close and cause cloud-hypervisor to
get I/O errors on writes.

Ideally the main and subordinate pty fds would be kept in the main
vmm's Vm structure. However, because the device manager owns parsing
the configuration for the serial and console devices, the information
is instead stored in new fields under the DeviceManager structure
directly.

From there hooking up the main fd is intended to look as close to
handling stdin and stdout on the tty as possible (there is some future
work ahead for perhaps moving support for the pty into the
vmm_sys_utils crate).

The main fd is used for reading user input and writing to output of
the Vm device. The subordinate fd is used to setup raw mode and it is
kept open in order to avoid I/O errors when clients open and close the
pty device.

The ability to handle multiple inputs as part of this change is
intentional. The current code allows serial and console ptys to be
created and both be used as input. There was an implementation gap
though with the queue_input_bytes needing to be modified so the pty
handlers for serial and console could access the methods on the serial
and console structures directly. Without this change only a single
input source could be processed as the console would switch based on
its input type (this is still valid for tty and isn't otherwise
modified).

Signed-off-by: William Douglas <william.r.douglas@gmail.com>
2021-02-09 10:03:28 +00:00
Sebastien Boeuf
c6854c5a97 block_util: Simplify RAW synchronous implementation
Using directly preadv and pwritev, we can simply use a RawFd instead of
a file, and we don't need to use the more complex implementation from
the qcow crate.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-01 13:45:08 +00:00
Bo Chen
8d418e1fca vmm: Refine the seccomp filter list for the vCPU thread
This patch refines the sccomp filter list for the vCPU thread, as we are
no longer spawning virtio-device threads from the vCPU thread.

Fixes: #2170

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-01-21 09:41:40 +00:00
Rob Bradford
cb826aa2f1 vmm: seccomp: Add open() to vCPU permitted syscalls
Older libc (like RHEL7) uses open() rather than openat(). This was
demonstrated through a failure to open /etc/localtime as used by
gmtime() libc call trigged from the vCPU thread (CMOS device.)

Fixes: #2111

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-11 11:24:13 +01:00
Rob Bradford
14af74cb5b vmm: seccomp: Allow clock_gettime() on the vCPU thread
If the vCPU thread calls log!() the time difference between the call
time and the boot up time is reported. On most environments and
architectures this covered by a vDSO call rather than a syscall. However
on some platforms this turns into a syscall.

Fixes: #2080

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-01-04 10:13:42 -08:00
Rob Bradford
444905071b vmm: seccomp: Permit TUNGETIFF through the filter
This is used to obtain the TAP device name.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-12-17 22:51:30 +01:00
Rob Bradford
1ab1341775 vmm: seccomp_filters: Add KVM_GET_DIRTY_LOG to permitted calls
The live migration support added use of this ioctl but it wasn't
included in the permitted list.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-11-25 01:27:26 +01:00
Sebastien Boeuf
28e12e9f3a vmm, hypervisor: Fix snapshot/restore for Windows guest
The snasphot/restore feature is not working because some CPU states are
not properly saved, which means they can't be restored later on.

First thing, we ensure the CPUID is stored so that it can be properly
restored later. The code is simplified and pushed down to the hypervisor
crate.

Second thing, we identify for each vCPU if the Hyper-V SynIC device is
emulated or not. In case it is, that means some specific MSRs will be
set by the guest. These MSRs must be saved in order to properly restore
the VM.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-21 19:11:03 +01:00
Rob Bradford
885ee9567b vmm: Add support for creating virtio-watchdog
The watchdog device is created through the "--watchdog" parameter. At
most a single watchdog can be created per VM.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-10-21 16:02:39 +01:00
Sebastien Boeuf
c02a02edfc vmm: Allow unlink syscall for vCPU threads
Without the unlink(2) syscall being allowed, Cloud-Hypervisor crashes
when we remove a virtio-vsock device that has been previously added.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-10-06 16:05:59 +01:00
Henry Wang
3ea4a0797d vmm: seccomp: unify AArch64 and x86_64 FTRUNCATE syscall
The definition of libc::SYS_ftruncate on AArch64 is different
from that on x86_64. This commit unifies the previously hard-coded
syscall number for AArch64.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Henry Wang
48544e4e82 vmm: seccomp: whitelist KVM_GET_REG_LIST in seccomp
`KVM_GET_REG_LIST` ioctl is needed in save/restore AArch64 vCPU.
Therefore we whitelist this ioctl in seccomp.

Also this commit unifies the `SYS_FTRUNCATE` syscall for x86_64
and AArch64.

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
2020-09-23 12:37:25 +01:00
Sebastien Boeuf
7c346c3844 vmm: Kill vhost-user self-spawned process on failure
If after the creation of the self-spawned backend, the VMM cannot create
the corresponding vhost-user frontend, the VMM must kill the freshly
spawned process in order to ensure the error propagation can happen.

In case the child process would still be around, the VMM cannot return
the error as it waits onto the child to terminate.

This should help us identify when self-spawned failures are caused by a
connection being refused between the VMM and the backend.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-18 17:26:25 +01:00
Sebastien Boeuf
555c5c5d9c vmm: Add missing syscalls to signal thread
When the VMM is terminated by receiving a SIGTERM signal, the signal
handler thread must be able to invoke ioctl(TCGETS) and ioctl(TCSETS)
without error.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-18 13:40:10 +01:00
Rob Bradford
41a9b1adef vmm: Add missing syscall to vCPU thread
Fixes: #1717

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-09-18 13:40:10 +01:00
Sebastien Boeuf
b3435d51d9 vmm: cpu: Add missing io_uring syscalls to vCPU threads
Some of the io_uring setup happens upon activation of the virtio-blk
device, which is initially triggered through an MMIO VM exit. That's why
the vCPU threads must authorize io_uring related syscalls.

This commit ensures the virtio-blk io_uring implementation can be used
along with the seccomp filters enabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-09-16 11:59:47 +02:00
Bo Chen
9682d74763 vmm: seccomp: Add seccomp filters for signal_handler worker thread
This patch covers the last worker thread with dedicated secomp filters.

Fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-09-11 07:42:31 +02:00
Bo Chen
2612a6df29 vmm: seccomp: Add seccomp filters for the vcpu worker thread
Partially fixes: #925

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-09-11 07:42:31 +02:00
Michael Zhao
a95b6bbd8b vmm: Add seccomp rules for starting vhost-user-net backend on AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-08-31 08:19:23 +02:00
Sebastien Boeuf
1b4591aecc vmm: memory_manager: Apply NUMA policy to memory zones
Relying on the new option 'host_numa_node' from the 'memory-zone'
parameter, the user can now define which NUMA node from the host
should be used to back the current memory zone.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-27 08:39:38 -07:00
Sebastien Boeuf
bdef54ead6 vmm: Add brk syscall to the API thread
The brk syscall is not always called as the system might not need it.
But when it's needed from the API thread, this causes the thread to
terminate as it is not part of the authorized list of syscalls.

This should fix some sporadic failures on the CI with the musl build.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-11 15:04:21 +01:00
Jose Carlos Venegas Munoz
90acb01bad vmm: seccomp: add mprotect to API thread filter
Add mprotect to API thread rules. Prevent the VMM is
killed when it is used.

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
2020-08-05 21:35:21 +01:00
Bo Chen
8e74637ebb main, vmm: seccomp: Add the '--seccomp log' option
This patch extends the CLI option '--seccomp' to accept the 'log'
parameter in addition 'true/false'. It also refactors the
vmm::seccomp_filters module to support both "SeccompAction::Trap" and
"SeccompAction::Log".

Fixes: #1180

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Bo Chen
b41884a406 main, vmm: seccomp: Use SeccompAction instead of SeccompLevel
This patch replaces the usage of 'SeccompLevel' with 'SeccompAction',
which is the first step to support the 'log' action over system
calls that are not on the allowed list of seccomp filters.

Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-08-04 11:40:49 +02:00
Sebastien Boeuf
917027c55b vmm: Rely on virtio-blk io_uring when possible
In case the host supports io_uring and the specific io_uring options
needed, the VMM will choose the asynchronous version of virtio-blk.
This will enable better I/O performances compared to the default
synchronous version.

This is also important to note the VMM won't be able to use the
asynchronous version if the backend image is in QCOW format.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-08-03 14:15:01 +01:00
Jianyong Wu
d24b110519 seccomp: AArch64: Add SYS_unlinkat to seccomp whitelist
This commit fixes an "Bad syscall" error when shutting down the VM
on AArch64 by adding the SYS_unlinkat syscall to the seccomp
whitelist.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2020-07-27 07:25:07 +00:00
Michael Zhao
726e45e0ce vmm: Divide Seccomp KVM IOCTL rules by architecture
Refactored the construction of KVM IOCTL rules for Seccomp.
Separating the rules by architecture can reduce the risk of bugs and
attacks.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-07-06 13:40:38 +01:00
Michael Zhao
8820e9e133 vmm: Fix Seccomp filter for AArch64
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-07-02 08:46:24 +01:00
Sebastien Boeuf
e35d4c5b28 hypervisor: Store all supported MSRs
On x86 architecture, we need to save a list of MSRs as part of the vCPU
state. By providing the full list of MSRs supported by KVM, this patch
fixes the remaining snapshot/restore issues, as the vCPU is restored
with all its previous states.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-30 14:03:03 +01:00
Sebastien Boeuf
e2b5c78dc5 hypervisor: Re-order vCPU state for storing and restoring
Some vCPU states such as MP_STATE can be modified while retrieving
other states. For this reason, it's important to follow a specific
order that will ensure a state won't be modified after it has been
saved. Comments about ordering requirements have been copied over
from Firecracker commit 57f4c7ca14a31c5536f188cacb669d2cad32b9ca.

This patch also set the previously saved VCPU_EVENTS, as this was
missing from the restore codepath.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-30 14:03:03 +01:00
Sebastien Boeuf
9f4714c32a vmm: Extend seccomp filters with KVM_KVMCLOCK_CTRL
Now that the VMM uses KVM_KVMCLOCK_CTRL from the KVM API, it must be
added to the seccomp filters list.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-24 12:38:56 +02:00
Sebastien Boeuf
f5150aa261 vmm: Extend seccomp filters with KVM_GET_CLOCK and KVM_SET_CLOCK
Now that the VMM uses both KVM_GET_CLOCK and KVM_SET_CLOCK from the KVM
API, they must be added to the seccomp filters list.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-23 14:36:01 +01:00
Sebastien Boeuf
a998e89375 build(deps): bump signal-hook from 0.1.15 to 0.1.16
Bumps [signal-hook](https://github.com/vorner/signal-hook) from 0.1.15
to 0.1.16.
- [Release notes](https://github.com/vorner/signal-hook/releases)
- [Changelog](https://github.com/vorner/signal-hook/blob/master/CHANGELOG.md)
- [Commits](vorner/signal-hook@v0.1.15...v0.1.16)

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-06-22 14:09:11 +01:00
Rob Bradford
929d70bc7f net_util: Only try and enable the TAP device if it not already enabled
This allows an existing TAP interface to be used without needing
CAP_NET_ADMIN permissions on the Cloud Hypervisor binary as the ioctl to
bring up the interface is avoided.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-06-08 17:56:10 +02:00
Rob Bradford
c31ad72ee9 build: Address issues found by 1.43.0 clippy
These are mostly due to use of "bare use" statements and unnecessary vector
creation.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-27 19:32:12 +02:00
Rob Bradford
0728bece0c vmm: seccomp: Ensure that umask() can be reprogrammed
When doing self spawning the child will attempt to set the umask() again. Let
it through the seccomp rules so long as it the safe mask again.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-27 16:46:51 +01:00
Michael Zhao
1befae872d build: Fixed build errors and warnings on AArch64
This is a preparing commit to build and test CH on AArch64. All building
issues were fixed, but no functionality was introduced.
For X86, the logic of code was not changed at all.
For ARM, the architecture specific part is still empty. And we applied
some tricks to workaround lint warnings. But such code will be replaced
later by other commits with real functionality.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2020-05-21 11:56:26 +01:00
Rob Bradford
11049401ce vmm: seccomp: Add ioctl() commands interface hardware address
This is necessary to support setting the MAC address on the tap
interface on the host.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-05-15 11:45:09 +01:00
Sebastien Boeuf
68fc432978 vmm: Update seccomp filters with clock_nanosleep
The clock_nanosleep system call needs to be whitelisted since the commit
12e00c0f45 introduced the use of a sleep()
function. Without this patch, we can see an error when the VM is paused
or killed.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-05-15 12:34:53 +02:00
Rob Bradford
8cef35745b vmm: seccomp: Add fork, gettid and pipe2 syscalls to permitted list
This is needed for self spawning with the musl target.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-29 17:57:01 +01:00
Rob Bradford
ce7678f29f vmm: seccomp: Add tkill syscall to permitted list
This is needed for rebooting on the musl target.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-29 17:57:01 +01:00
Rob Bradford
12758d7fad vmm: seccomp: Add epoll_pwait syscall to permitted list
This is needed for basic operation on the musl target.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2020-04-29 17:57:01 +01:00
Bo Chen
3f42f86d81 vmm: Add the 'shared' and 'hugepages' controls to MemoryConfig
The new 'shared' and 'hugepages' controls aim to replace the 'file'
option in MemoryConfig. This patch also updated all related integration
tests to use the new controls (instead of providing explicit paths to
"/dev/shm" or "/dev/hugepages").

Fixes: #1011

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Bo Chen <chen.bo@intel.com>
2020-04-23 21:39:51 +02:00
Sebastien Boeuf
b1554642e4 vmm: seccomp: Add missing mremap() syscall
While testing self spawned vhost-user backends, it appeared that the
backend was aborting due to a missing system call in the seccomp
filters. mremap() was the culprit and this patch simply adds it to the
whitelist.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-14 14:11:41 +02:00
Sebastien Boeuf
b9f9f01fcc vmm: Extend seccomp filters to allow snapshot/restore
A few KVM ioctls were missing in order to perform both snapshot and
restore while keeping seccomp enabled.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-07 12:26:10 +02:00
Sebastien Boeuf
2d17f4384a vmm: seccomp: Add missing open() syscall
On some systems, the open() system call is used by Cloud-Hypervisor,
that's why it should be part of the seccomp filters whitelist.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-04-02 09:56:48 +02:00
Sebastien Boeuf
e4ea8b0bef vmm: Add missing syscalls to the seccomp filters
Both clock_gettime and gettimeofday syscalls where missing when running
Cloud-Hypervisor on a Linux host without vDSO enabled. On a system with
vDSO enabled, the syscalls performed by vDSO were not filtered, that's
why we didn't have to whitelist them.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-27 16:50:52 +00:00
Sebastien Boeuf
feb8d7ae90 vmm: Separate seccomp filters between VMM and API threads
This separates the filters used between the VMM and API threads, so that
we can apply different rules for each thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00
Sebastien Boeuf
cb98d90097 vmm: Create new seccomp_filter module
Based on the seccomp crate, we create a new vmm module responsible for
creating a seccomp filter that will be applied to the VMM main thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2020-03-24 14:59:57 +01:00