Commit Graph

99 Commits

Author SHA1 Message Date
Alyssa Ross
3c0b389c82 vmm: allow getdents64 in seccomp filter
This is used on older kernels where close_range() is not available.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
Fixes: 505f4dfa ("vmm: close all unused fds in sigwinch listener")
2023-04-22 11:40:17 +01:00
Alyssa Ross
38a1b45783 vmm: use the SIGWINCH listener for TTYs too
Previously, we were only using it for PTYs, because for PTYs there's
no alternative.  But since we have to have it for PTYs anyway, if we
also use it for TTYs, we can eliminate all of the code that handled
SIGWINCH for TTYs.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Alyssa Ross
505f4dfa53 vmm: close all unused fds in sigwinch listener
The PTY main file descriptor had to be introduced as a parameter to
start_sigwinch_listener, so that it could be closed in the child.
Really the SIGWINCH listener process should not have any file
descriptors open, except for the ones it needs to function, so let's
make it more robust by having it close all other file descriptors.

For recent kernels, we can do this very conveniently with
close_range(2), but for older kernels, we have to fall back to closing
open file descriptors one at a time.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2023-04-05 11:23:06 +01:00
Praveen K Paladugu
1143d54ee0 tpm: Add recv timeout while running recvmsg
If swtpm becomes unresponsive, guest gets blocked at "recvmsg" on tpm's
data FD. This change adds a timeout to the data fd socket. If swtpm
becomes unresponsive guest waits for "timeout" (secs) and continues to
run after returning an I/O error to tpm commands.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
2023-02-10 17:49:03 +01:00
Rob Bradford
c22c4675b3 arch, hypervisor: Populate CPUID leaf 0x4000_0010 (TSC frequency)
This hypervisor leaf includes details of the TSC frequency if that is
available from KVM. This can be used to efficiently calculate time
passed when there is an invariant TSC.

TEST=Run `cpuid` in the guest and observe the frequency populated.

Fixes: #5178

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2023-02-09 18:32:21 +01:00
Rob Bradford
a48d7c281e vmm: seccomp: Remove unreachable patterns
Make HypervisorType enum's members conditional on build time features.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-12-13 18:10:42 +00:00
Sebastien Boeuf
22be5f9d0f vmm: Extend list of authorized ioctls for vDPA
Adding VHOST_VDPA_GET_CONFIG_SIZE and VHOST_VDPA_SUSPEND to the list of
authorized ioctls for the vmm thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-10-13 10:03:23 +02:00
Anatol Belski
a18b08c682 seccomp: mshv: Allow create partition ioctl
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
2022-10-11 09:05:24 +01:00
Sebastien Boeuf
76dbf85b79 net: Give the user the ability to set MTU
Add a new "mtu" parameter to the NetConfig structure and therefore to
the --net option. This allows Cloud Hypervisor's users to define the
Maximum Transmission Unit (MTU) they want to use for the network
interface that they create.

In details, there are two main aspects. On the one hand, the TAP
interface is created with the proper MTU if it is provided. And on the
other hand the guest is made aware of the MTU through the VIRTIO
configuration. That means the MTU is properly set on both the TAP on the
host and the network interface in the guest.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-09-21 16:20:57 +02:00
Michael Zhao
c798b958f3 vmm: Extend seccomp rules for GDB
Add 'KVM_SET_GUEST_DEBUG' ioctl to seccomp filter rules.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2022-08-21 17:07:26 +08:00
Wei Liu
ad33f7c5e6 vmm: return seccomp rules according to hypervisors
That requires stashing the hypervisor type into various places.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2022-07-22 12:50:12 +01:00
Sebastien Boeuf
49db713124 virtio-devices, vmm: Remove unused macro rules
Latest cargo beta version raises warnings about unused macro rules.
Simply remove them to fix the beta build.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-05-20 09:59:43 +01:00
Bo Chen
42c19e14c5 vmm: Add 'shutdown()' to vCPU seccomp filter
This is required when hot-removing a vfio-user device. Details code path
below:

Thread 6 "vcpu0" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7f8196889700 (LWP 2358305)]
0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
78	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
  0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
  0x000056189240737d in std::sys::unix::net::Socket::shutdown ()
    at library/std/src/sys/unix/net.rs:383
  std::os::unix::net::stream::UnixStream::shutdown () at library/std/src/os/unix/net/stream.rs:479
  0x000056189210e23d in vfio_user::Client::shutdown (self=0x7f8190014300)
    at vfio_user/src/lib.rs:787
  0x00005618920b9d02 in <pci::vfio_user::VfioUserPciDevice as core::ops::drop::Drop>::drop (
    self=0x7f819002d7c0) at pci/src/vfio_user.rs:551
  0x00005618920b8787 in core::ptr::drop_in_place<pci::vfio_user::VfioUserPciDevice> ()
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920b92e3 in core::ptr::drop_in_place<core::cell::UnsafeCell<dyn pci::device::PciDevice>>
    () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920b9362 in core::ptr::drop_in_place<std::sync::mutex::Mutex<dyn pci::device::PciDevice>> () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
  0x00005618920d8a3e in alloc::sync::Arc<T>::drop_slow (self=0x7f81968852b8)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1092
  0x00005618920ba273 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7f81968852b8)
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1688
 0x00005618920b76fb in core::ptr::drop_in_place<alloc::sync::Arc<std::sync::mutex::Mutex<dyn pci::device::PciDevice>>> ()
    at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
 0x0000561891b5e47d in vmm::device_manager::DeviceManager::eject_device (self=0x7f8190009600,
    pci_segment_id=0, device_id=3) at vmm/src/device_manager.rs:4000
 0x0000561891b674bc in <vmm::device_manager::DeviceManager as vm_device:🚌:BusDevice>::write (
    self=0x7f8190009600, base=70368744108032, offset=8, data=&[u8](size=4) = {...})
    at vmm/src/device_manager.rs:4625
 0x00005618921927d5 in vm_device:🚌:Bus::write (self=0x7f8190006e00, addr=70368744108040,
    data=&[u8](size=4) = {...}) at vm-device/src/bus.rs:235
 0x0000561891b72e10 in <vmm::vm::VmOps as hypervisor::vm::VmmOps>::mmio_write (
    self=0x7f81900097b0, gpa=70368744108040, data=&[u8](size=4) = {...}) at vmm/src/vm.rs:378
 0x0000561892133ae2 in <hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run (
    self=0x7f8190013c90) at hypervisor/src/kvm/mod.rs:1114
 0x0000561891914e85 in vmm::cpu::Vcpu::run (self=0x7f819001b230) at vmm/src/cpu.rs:348
 0x000056189189f2cb in vmm::cpu::CpuManager::start_vcpu::{{closure}}::{{closure}} ()
    at vmm/src/cpu.rs:953

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-05-05 15:33:26 -07:00
Rob Bradford
4a04d1f8f2 vmm: seccomp: Allow SYS_rseq as required by newer glibc
glibc 2.35 as shipped by Fedora 36 now uses the rseq syscall.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-04-21 13:02:51 +01:00
Bo Chen
eed2a0d06b vmm: Add 'libc::SYS_shutdown' to vmm 'seccomp' filter list
Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-03-31 09:22:07 +01:00
Sebastien Boeuf
c73c6039c3 vmm: Enable vDPA support
Based on the newly added Vdpa device along with the new vdpa parameter,
this patch enables the support for vDPA devices.

It's important to note this the only virtio device for which we provide
an ExternalDmaMapping instance. This will allow for the right DMA ranges
to be mapped/unmapped.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2022-03-18 12:28:40 +01:00
Akira Moroo
a2a492f3df seccomp: Add ioctls to seccomp filter for guest debug
This commit adds `KVM_SET_GUEST_DEBUG` and `KVM_TRANSLATE` ioctls to
seccomp filter to enable guest debugging without `--seccomp=false`.

Signed-off-by: Akira Moroo <retrage01@gmail.com>
2022-02-23 11:16:09 +00:00
Jianyong Wu
5462fd810c seccomp: add ioctl group to seccomp authorized list for arm64
When enable PMU on arm64, ioctl with group KVM_HAS_DEVICE_ATTR will be
blocked by seccomp, add it to authorized list.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-01-21 17:59:36 +08:00
Rob Bradford
0fcbcea275 vmm: seccomp: Remove set_tid_address syscall from seccomp filter
The origins of the requirement for this syscall in the seccomp filter
list are unknown.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-06 09:52:39 -08:00
Rob Bradford
10d1922393 vmm: seccomp: Remove arch_prctl syscall from seccomp filter
The origins of the requirement for this syscall in the seccomp filter
list are unknown.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2022-01-06 09:52:24 -08:00
Rob Bradford
cbc388c7e2 vmm: Add ioctls to seccomp filter for block topology detection
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-12-17 12:42:10 +01:00
Rob Bradford
bde81405a8 vmm: seccomp: Remove fork & evecve syscalls
These were use for the self spawning vhost-user device feature that has
been removed.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-12-16 20:56:50 +01:00
Sebastien Boeuf
03a606c7ec arch, vmm: Place KVM identity map region after TSS region
In order to avoid the identity map region to conflict with a possible
firmware being placed in the last 4MiB of the 4GiB range, we must set
the address to a chosen location. And it makes the most sense to have
this region placed right after the TSS region.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-12-04 19:33:34 +00:00
Rob Bradford
419870ae45 vmm: Add epoll_ctl() syscall to vCPU seccomp filter
Fix seccomp violation when trying to add the out FD to the epoll loop
when the serial buffer needs to be flushed.

0x00007ffff7dc093e in epoll_ctl () at ../sysdeps/unix/syscall-template.S:120
0x0000555555db9b6d in epoll::ctl (epfd=56, op=epoll::ControlOptions::EPOLL_CTL_MOD, fd=55, event=...)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/epoll-4.3.1/src/lib.rs:155
0x00005555556f5127 in vmm::serial_buffer::SerialBuffer::add_out_poll (self=0x7fffe800b5d0) at vmm/src/serial_buffer.rs:101
0x00005555556f583d in vmm::serial_buffer::{impl#1}::write (self=0x7fffe800b5d0, buf=...) at vmm/src/serial_buffer.rs:139
0x0000555555a30b10 in std::io::Write::write_all<vmm::serial_buffer::SerialBuffer> (self=0x7fffe800b5d0, buf=...)
    at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/io/mod.rs:1527
0x0000555555ab82fb in devices::legacy::serial::Serial::handle_write (self=0x7fffe800b520, offset=0, v=13) at devices/src/legacy/serial.rs:217
0x0000555555ab897f in devices::legacy::serial::{impl#2}::write (self=0x7fffe800b520, _base=1016, offset=0, data=...) at devices/src/legacy/serial.rs:295
0x0000555555f30e95 in vm_device:🚌:Bus::write (self=0x7fffe8006ce0, addr=1016, data=...) at vm-device/src/bus.rs:235
0x00005555559406d4 in vmm::vm::{impl#4}::pio_write (self=0x7fffe8009640, port=1016, data=...) at vmm/src/vm.rs:459

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-11-16 07:27:46 -08:00
Sebastien Boeuf
932c8c9713 vmm: Add CPU affinity support
With the introduction of a new option `affinity` to the `cpus`
parameter, Cloud Hypervisor can now let the user choose the set
of host CPUs where to run each vCPU.

This is useful when trying to achieve CPU pinning, as well as making
sure the VM runs on a specific NUMA node.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-11-12 09:40:37 +00:00
Alyssa Ross
330b5ea3be vmm: notify virtio-console of pty resizes
When a pty is resized (using the TIOCSWINSZ ioctl -- see ioctl_tty(2)),
the kernel will send a SIGWINCH signal to the pty's foreground process
group to notify it of the resize.  This is the only way to be notified
by the kernel of a pty resize.

We can't just make the cloud-hypervisor process's process group the
foreground process group though, because a process can only set the
foreground process group of its controlling terminal, and
cloud-hypervisor's controlling terminal will often be the terminal the
user is running it in.  To work around this, we fork a subprocess in a
new process group, and set its process group to be the foreground
process group of the pty.  The subprocess additionally must be running
in a new session so that it can have a different controlling
terminal.  This subprocess writes a byte to a pipe every time the pty
is resized, and the virtio-console device can listen for this in its
epoll loop.

Alternatives I considered were to have the subprocess just send
SIGWINCH to its parent, and to use an eventfd instead of a pipe.
I decided against the signal approach because re-purposing a signal
that has a very specific meaning (even if this use was only slightly
different to its normal meaning) felt unclean, and because it would
have required using pidfds to avoid race conditions if
cloud-hypervisor had terminated, which added complexity.  I decided
against using an eventfd because using a pipe instead allows the child
to be notified (via poll(2)) when nothing is reading from the pipe any
more, meaning it can be reliably notified of parent death and
terminate itself immediately.

I used clone3(2) instead of fork(2) because without
CLONE_CLEAR_SIGHAND the subprocess would inherit signal-hook's signal
handlers, and there's no other straightforward way to restore all signal
handlers to their defaults in the child process.  The only way to do
it would be to iterate through all possible signals, or maintain a
global list of monitored signals ourselves (vmm:vm::HANDLED_SIGNALS is
insufficient because it doesn't take into account e.g. the SIGSYS
signal handler that catches seccomp violations).

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2021-09-14 15:43:25 +01:00
Alyssa Ross
8abe8c679b seccomp: allow mmap everywhere brk is allowed
Musl often uses mmap to allocate memory where Glibc would use brk.
This has caused seccomp violations for me on the API and signal
handling threads.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2021-09-10 12:01:31 -07:00
Alyssa Ross
7549149bb5 vmm: ensure signal handlers run on the right thread
Despite setting up a dedicated thread for signal handling, we weren't
making sure that the signals we were listening for there were actually
dispatched to the right thread.  While the signal-hook provides an
iterator API, so we can know that we're only processing the signals
coming out of the iterator on our signal handling thread, the actual
signal handling code from signal-hook, which pushes the signals onto
the iterator, can run on any thread.  This can lead to seccomp
violations when the signal-hook signal handler does something that
isn't allowed on that thread by our seccomp policy.

To reproduce, resize a terminal running cloud-hypervisor continuously
for a few minutes.  Eventually, the kernel will deliver a SIGWINCH to
a thread with a restrictive seccomp policy, and a seccomp violation
will trigger.

As part of this change, it's also necessary to allow rt_sigreturn(2)
on the signal handling thread, so signal handlers are actually allowed
to run on it.  The fact that this didn't seem to be needed before
makes me think that signal handlers were almost _never_ actually
running on the signal handling thread.

Signed-off-by: Alyssa Ross <hi@alyssa.is>
2021-09-02 21:33:31 +01:00
Bo Chen
9aba1fdee6 virtio-devices, vmm: Use syscall definitions from the libc crate
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Bo Chen
864a5e4fe0 virtio-devices, vmm: Simplify 'get_seccomp_rules'
Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Bo Chen
08ac3405f5 virtio-devices, vmm: Move to the seccompiler crate
Fixes: #2929

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-08-18 10:42:19 +02:00
Rob Bradford
5e74848ab4 vmm: seccomp: Permit syscalls used for vfio-user on vCPU thread
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-08-10 16:01:00 +01:00
Markus Theil
5b0d4bb398 virtio-devices: seccomp: allow unix socket connect in vsock thread
Allow vsocks to connect to Unix sockets on the host running
cloud-hypervisor with enabled seccomp.

Reported-by: Philippe Schaaf <philippe.schaaf@secunet.com>
Tested-by: Franz Girlich <franz.girlich@tu-ilmenau.de>
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
2021-08-06 08:44:47 -07:00
Muminul Islam
83c44a2411 vmm, virtio-devices: Add missing seccomp rules for MSHV
This patch adds all the seccomp rules missing for MSHV.
With this patch MSFT internal CI runs with seccomp enabled.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-08-03 11:09:07 -07:00
Muminul Islam
3baa0c3721 vmm: Add MSHV_VP_TRANSLATE_GVA to seccomp rule
This rule is needed to boot windows guest.
This bug was introduced while we tried to boot
windows guest on MSHV.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-29 16:29:53 +01:00
Muminul Islam
81895b9b40 hypervisor: Implement start/stop_dirty_log for MSHV
This patch modify the existing live migration code
to support MSHV. Adds couple of new functions to enable
and disable dirty page tracking. Add missing IOCTL
to the seccomp rules for live migration.
Adds necessary flags for MSHV.
This changes don't affect KVM functionality at all.

In order to get better performance it is good to
enable dirty page tracking when we start live migration
and disable it when the migration is done.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
2021-07-29 16:29:53 +01:00
Sebastien Boeuf
0ac4545c5b vmm: Extend seccomp filters with fcntl() for HTTP thread
Whenever a file descriptor is sent through the control message, it
requires fcntl() syscall to handle it, meaning we must allow it through
the list of syscalls authorized for the HTTP thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-21 15:34:22 +02:00
Sebastien Boeuf
d68c388cac vmm: Update seccomp filters for HTTP thread
The micro-http crate now uses recvmsg() syscall in order to receive file
descriptors through control messages. This means the syscall must be
part of the authorized list in the seccomp filters.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-07-15 08:13:48 +00:00
Wei Liu
71bbaf556f vmm: seccomp: add seccomp rules for MSHV
Add a minimum set of rules that allow Cloud Hypervisor to run Linux on
top of Microsoft Hypervisor.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:44:02 +02:00
Wei Liu
8819bb0f21 vmm: seccomp: make use of KVM feature
The to-be-introduced MSHV rules don't need to contain KVM rules and vice
versa.

Put KVM constants into to a module. This avoids the warnings about
dead code in the future.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
2021-07-05 09:44:02 +02:00
Michael Zhao
1a7a76511f vmm: Enable pty console on AArch64
Allowed syscall "SYS_readlinkat" on AArch64 which is required by pty.

Signed-off-by: Michael Zhao <michael.zhao@arm.com>
2021-06-10 15:05:25 +02:00
William Douglas
a2cfe71c0a vmm: Update the http thread's seccomp filter
Because the http thread no longer needs to create the api socket,
remove the socket, bind and listen syscalls from the seccomp filter.

Signed-off-by: William Douglas <william.douglas@intel.com>
2021-04-29 09:44:40 +01:00
Rob Bradford
17072e9a6f vmm: seccomp: Add missing SYS_newfstatat
This is used when running on a new libc like Fedora34.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-04-12 18:02:29 +02:00
Sebastien Boeuf
46f96f27a4 vmm: Add missing syscall for vCPU unplug
clock_nanosleep() is triggered when hot-unplugging a vCPU.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-03-31 10:51:52 +01:00
Rob Bradford
7c302373ed vmm: Address Rust 1.51.0 clippy issue (needless_question_mark)
warning: Question mark operator is useless here
   --> vmm/src/seccomp_filters.rs:493:5
    |
493 | /     Ok(SeccompFilter::new(
494 | |         rules.into_iter().collect(),
495 | |         SeccompAction::Trap,
496 | |     )?)
    | |_______^
    |
    = note: `#[warn(clippy::needless_question_mark)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark
help: try
    |
493 |     SeccompFilter::new(
494 |         rules.into_iter().collect(),
495 |         SeccompAction::Trap,
496 |     )
    |

warning: Question mark operator is useless here
   --> vmm/src/seccomp_filters.rs:507:5
    |
507 | /     Ok(SeccompFilter::new(
508 | |         rules.into_iter().collect(),
509 | |         SeccompAction::Log,
510 | |     )?)
    | |_______^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark
help: try
    |
507 |     SeccompFilter::new(
508 |         rules.into_iter().collect(),
509 |         SeccompAction::Log,
510 |     )
    |

warning: Question mark operator is useless here
   --> vmm/src/vm.rs:887:9
    |
887 |         Ok(CString::new(cmdline).map_err(Error::CmdLineCString)?)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `CString::new(cmdline).map_err(Error::CmdLineCString)`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_question_mark

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-26 11:32:09 +00:00
Rob Bradford
a0c07474a3 vmm: seccomp: Add KVM_MEMORY_ENCRYPT_OP ioctl to seccomp filter
This is the basis for TDX based operations on the various KVM file
descriptors.

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-03-09 16:26:06 +01:00
Sebastien Boeuf
4ed0e1a3c8 net_util: Simplify TX/RX queue handling
The main idea behind this commit is to remove all the complexity
associated with TX/RX handling for virtio-net. By using writev() and
readv() syscalls, we could get rid of intermediate buffers for both
queues.

The complexity regarding the TAP registration has been simplified as
well. The RX queue is only processed when some data are ready to be
read from TAP. The event related to the RX queue getting more
descriptors only serves the purpose to register the TAP file if it's not
already.

With all these simplifications, the code is more readable but more
performant as well. We can see an improvement of 10% for a single
queue device.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-22 10:39:23 +00:00
Rob Bradford
c1d9edbfc0 vmm: seccomp: Add getrandom to vCPU thread filter
This can be triggered upon device reset.

Fixes: #2278

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-18 16:15:13 +00:00
Sebastien Boeuf
9353856426 vmm: Fix seccomp filters for vCPUs
Depending on the host OS the code for looking up the time for the CMOS
make require extra syscalls to be permitted for the vCPU thread.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
2021-02-11 11:24:57 +00:00
Sebastien Boeuf
19167e7647 pci: vfio: Implement INTx support
With all the preliminary work done in the previous commits, we can
update the VFIO implementation to support INTx along with MSI and MSI-X.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2021-02-10 17:34:56 +00:00