Latest cargo beta version raises warnings about unused macro rules.
Simply remove them to fix the beta build.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is required when hot-removing a vfio-user device. Details code path
below:
Thread 6 "vcpu0" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7f8196889700 (LWP 2358305)]
0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
78 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
0x00007f8196dae7ab in shutdown () at ../sysdeps/unix/syscall-template.S:78
0x000056189240737d in std::sys::unix::net::Socket::shutdown ()
at library/std/src/sys/unix/net.rs:383
std::os::unix::net::stream::UnixStream::shutdown () at library/std/src/os/unix/net/stream.rs:479
0x000056189210e23d in vfio_user::Client::shutdown (self=0x7f8190014300)
at vfio_user/src/lib.rs:787
0x00005618920b9d02 in <pci::vfio_user::VfioUserPciDevice as core::ops::drop::Drop>::drop (
self=0x7f819002d7c0) at pci/src/vfio_user.rs:551
0x00005618920b8787 in core::ptr::drop_in_place<pci::vfio_user::VfioUserPciDevice> ()
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
0x00005618920b92e3 in core::ptr::drop_in_place<core::cell::UnsafeCell<dyn pci::device::PciDevice>>
() at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
0x00005618920b9362 in core::ptr::drop_in_place<std::sync::mutex::Mutex<dyn pci::device::PciDevice>> () at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
0x00005618920d8a3e in alloc::sync::Arc<T>::drop_slow (self=0x7f81968852b8)
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1092
0x00005618920ba273 in <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop (self=0x7f81968852b8)
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/alloc/src/sync.rs:1688
0x00005618920b76fb in core::ptr::drop_in_place<alloc::sync::Arc<std::sync::mutex::Mutex<dyn pci::device::PciDevice>>> ()
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ptr/mod.rs:188
0x0000561891b5e47d in vmm::device_manager::DeviceManager::eject_device (self=0x7f8190009600,
pci_segment_id=0, device_id=3) at vmm/src/device_manager.rs:4000
0x0000561891b674bc in <vmm::device_manager::DeviceManager as vm_device:🚌:BusDevice>::write (
self=0x7f8190009600, base=70368744108032, offset=8, data=&[u8](size=4) = {...})
at vmm/src/device_manager.rs:4625
0x00005618921927d5 in vm_device:🚌:Bus::write (self=0x7f8190006e00, addr=70368744108040,
data=&[u8](size=4) = {...}) at vm-device/src/bus.rs:235
0x0000561891b72e10 in <vmm::vm::VmOps as hypervisor::vm::VmmOps>::mmio_write (
self=0x7f81900097b0, gpa=70368744108040, data=&[u8](size=4) = {...}) at vmm/src/vm.rs:378
0x0000561892133ae2 in <hypervisor::kvm::KvmVcpu as hypervisor::cpu::Vcpu>::run (
self=0x7f8190013c90) at hypervisor/src/kvm/mod.rs:1114
0x0000561891914e85 in vmm::cpu::Vcpu::run (self=0x7f819001b230) at vmm/src/cpu.rs:348
0x000056189189f2cb in vmm::cpu::CpuManager::start_vcpu::{{closure}}::{{closure}} ()
at vmm/src/cpu.rs:953
Signed-off-by: Bo Chen <chen.bo@intel.com>
Based on the newly added Vdpa device along with the new vdpa parameter,
this patch enables the support for vDPA devices.
It's important to note this the only virtio device for which we provide
an ExternalDmaMapping instance. This will allow for the right DMA ranges
to be mapped/unmapped.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit adds `KVM_SET_GUEST_DEBUG` and `KVM_TRANSLATE` ioctls to
seccomp filter to enable guest debugging without `--seccomp=false`.
Signed-off-by: Akira Moroo <retrage01@gmail.com>
When enable PMU on arm64, ioctl with group KVM_HAS_DEVICE_ATTR will be
blocked by seccomp, add it to authorized list.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
In order to avoid the identity map region to conflict with a possible
firmware being placed in the last 4MiB of the 4GiB range, we must set
the address to a chosen location. And it makes the most sense to have
this region placed right after the TSS region.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Fix seccomp violation when trying to add the out FD to the epoll loop
when the serial buffer needs to be flushed.
0x00007ffff7dc093e in epoll_ctl () at ../sysdeps/unix/syscall-template.S:120
0x0000555555db9b6d in epoll::ctl (epfd=56, op=epoll::ControlOptions::EPOLL_CTL_MOD, fd=55, event=...)
at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/epoll-4.3.1/src/lib.rs:155
0x00005555556f5127 in vmm::serial_buffer::SerialBuffer::add_out_poll (self=0x7fffe800b5d0) at vmm/src/serial_buffer.rs:101
0x00005555556f583d in vmm::serial_buffer::{impl#1}::write (self=0x7fffe800b5d0, buf=...) at vmm/src/serial_buffer.rs:139
0x0000555555a30b10 in std::io::Write::write_all<vmm::serial_buffer::SerialBuffer> (self=0x7fffe800b5d0, buf=...)
at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/io/mod.rs:1527
0x0000555555ab82fb in devices::legacy::serial::Serial::handle_write (self=0x7fffe800b520, offset=0, v=13) at devices/src/legacy/serial.rs:217
0x0000555555ab897f in devices::legacy::serial::{impl#2}::write (self=0x7fffe800b520, _base=1016, offset=0, data=...) at devices/src/legacy/serial.rs:295
0x0000555555f30e95 in vm_device:🚌:Bus::write (self=0x7fffe8006ce0, addr=1016, data=...) at vm-device/src/bus.rs:235
0x00005555559406d4 in vmm::vm::{impl#4}::pio_write (self=0x7fffe8009640, port=1016, data=...) at vmm/src/vm.rs:459
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With the introduction of a new option `affinity` to the `cpus`
parameter, Cloud Hypervisor can now let the user choose the set
of host CPUs where to run each vCPU.
This is useful when trying to achieve CPU pinning, as well as making
sure the VM runs on a specific NUMA node.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When a pty is resized (using the TIOCSWINSZ ioctl -- see ioctl_tty(2)),
the kernel will send a SIGWINCH signal to the pty's foreground process
group to notify it of the resize. This is the only way to be notified
by the kernel of a pty resize.
We can't just make the cloud-hypervisor process's process group the
foreground process group though, because a process can only set the
foreground process group of its controlling terminal, and
cloud-hypervisor's controlling terminal will often be the terminal the
user is running it in. To work around this, we fork a subprocess in a
new process group, and set its process group to be the foreground
process group of the pty. The subprocess additionally must be running
in a new session so that it can have a different controlling
terminal. This subprocess writes a byte to a pipe every time the pty
is resized, and the virtio-console device can listen for this in its
epoll loop.
Alternatives I considered were to have the subprocess just send
SIGWINCH to its parent, and to use an eventfd instead of a pipe.
I decided against the signal approach because re-purposing a signal
that has a very specific meaning (even if this use was only slightly
different to its normal meaning) felt unclean, and because it would
have required using pidfds to avoid race conditions if
cloud-hypervisor had terminated, which added complexity. I decided
against using an eventfd because using a pipe instead allows the child
to be notified (via poll(2)) when nothing is reading from the pipe any
more, meaning it can be reliably notified of parent death and
terminate itself immediately.
I used clone3(2) instead of fork(2) because without
CLONE_CLEAR_SIGHAND the subprocess would inherit signal-hook's signal
handlers, and there's no other straightforward way to restore all signal
handlers to their defaults in the child process. The only way to do
it would be to iterate through all possible signals, or maintain a
global list of monitored signals ourselves (vmm:vm::HANDLED_SIGNALS is
insufficient because it doesn't take into account e.g. the SIGSYS
signal handler that catches seccomp violations).
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Musl often uses mmap to allocate memory where Glibc would use brk.
This has caused seccomp violations for me on the API and signal
handling threads.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Despite setting up a dedicated thread for signal handling, we weren't
making sure that the signals we were listening for there were actually
dispatched to the right thread. While the signal-hook provides an
iterator API, so we can know that we're only processing the signals
coming out of the iterator on our signal handling thread, the actual
signal handling code from signal-hook, which pushes the signals onto
the iterator, can run on any thread. This can lead to seccomp
violations when the signal-hook signal handler does something that
isn't allowed on that thread by our seccomp policy.
To reproduce, resize a terminal running cloud-hypervisor continuously
for a few minutes. Eventually, the kernel will deliver a SIGWINCH to
a thread with a restrictive seccomp policy, and a seccomp violation
will trigger.
As part of this change, it's also necessary to allow rt_sigreturn(2)
on the signal handling thread, so signal handlers are actually allowed
to run on it. The fact that this didn't seem to be needed before
makes me think that signal handlers were almost _never_ actually
running on the signal handling thread.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Allow vsocks to connect to Unix sockets on the host running
cloud-hypervisor with enabled seccomp.
Reported-by: Philippe Schaaf <philippe.schaaf@secunet.com>
Tested-by: Franz Girlich <franz.girlich@tu-ilmenau.de>
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
This patch adds all the seccomp rules missing for MSHV.
With this patch MSFT internal CI runs with seccomp enabled.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This rule is needed to boot windows guest.
This bug was introduced while we tried to boot
windows guest on MSHV.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
This patch modify the existing live migration code
to support MSHV. Adds couple of new functions to enable
and disable dirty page tracking. Add missing IOCTL
to the seccomp rules for live migration.
Adds necessary flags for MSHV.
This changes don't affect KVM functionality at all.
In order to get better performance it is good to
enable dirty page tracking when we start live migration
and disable it when the migration is done.
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Whenever a file descriptor is sent through the control message, it
requires fcntl() syscall to handle it, meaning we must allow it through
the list of syscalls authorized for the HTTP thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The micro-http crate now uses recvmsg() syscall in order to receive file
descriptors through control messages. This means the syscall must be
part of the authorized list in the seccomp filters.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The to-be-introduced MSHV rules don't need to contain KVM rules and vice
versa.
Put KVM constants into to a module. This avoids the warnings about
dead code in the future.
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Because the http thread no longer needs to create the api socket,
remove the socket, bind and listen syscalls from the seccomp filter.
Signed-off-by: William Douglas <william.douglas@intel.com>
The main idea behind this commit is to remove all the complexity
associated with TX/RX handling for virtio-net. By using writev() and
readv() syscalls, we could get rid of intermediate buffers for both
queues.
The complexity regarding the TAP registration has been simplified as
well. The RX queue is only processed when some data are ready to be
read from TAP. The event related to the RX queue getting more
descriptors only serves the purpose to register the TAP file if it's not
already.
With all these simplifications, the code is more readable but more
performant as well. We can see an improvement of 10% for a single
queue device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Depending on the host OS the code for looking up the time for the CMOS
make require extra syscalls to be permitted for the vCPU thread.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
With all the preliminary work done in the previous commits, we can
update the VFIO implementation to support INTx along with MSI and MSI-X.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add the ability for cloud-hypervisor to create, manage and monitor a
pty for serial and/or console I/O from a user. The reasoning for
having cloud-hypervisor create the ptys is so that clients, libvirt
for example, could exit and later re-open the pty without causing I/O
issues. If the clients were responsible for creating the pty, when
they exit the main pty fd would close and cause cloud-hypervisor to
get I/O errors on writes.
Ideally the main and subordinate pty fds would be kept in the main
vmm's Vm structure. However, because the device manager owns parsing
the configuration for the serial and console devices, the information
is instead stored in new fields under the DeviceManager structure
directly.
From there hooking up the main fd is intended to look as close to
handling stdin and stdout on the tty as possible (there is some future
work ahead for perhaps moving support for the pty into the
vmm_sys_utils crate).
The main fd is used for reading user input and writing to output of
the Vm device. The subordinate fd is used to setup raw mode and it is
kept open in order to avoid I/O errors when clients open and close the
pty device.
The ability to handle multiple inputs as part of this change is
intentional. The current code allows serial and console ptys to be
created and both be used as input. There was an implementation gap
though with the queue_input_bytes needing to be modified so the pty
handlers for serial and console could access the methods on the serial
and console structures directly. Without this change only a single
input source could be processed as the console would switch based on
its input type (this is still valid for tty and isn't otherwise
modified).
Signed-off-by: William Douglas <william.r.douglas@gmail.com>
Using directly preadv and pwritev, we can simply use a RawFd instead of
a file, and we don't need to use the more complex implementation from
the qcow crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch refines the sccomp filter list for the vCPU thread, as we are
no longer spawning virtio-device threads from the vCPU thread.
Fixes: #2170
Signed-off-by: Bo Chen <chen.bo@intel.com>
Older libc (like RHEL7) uses open() rather than openat(). This was
demonstrated through a failure to open /etc/localtime as used by
gmtime() libc call trigged from the vCPU thread (CMOS device.)
Fixes: #2111
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If the vCPU thread calls log!() the time difference between the call
time and the boot up time is reported. On most environments and
architectures this covered by a vDSO call rather than a syscall. However
on some platforms this turns into a syscall.
Fixes: #2080
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The live migration support added use of this ioctl but it wasn't
included in the permitted list.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The snasphot/restore feature is not working because some CPU states are
not properly saved, which means they can't be restored later on.
First thing, we ensure the CPUID is stored so that it can be properly
restored later. The code is simplified and pushed down to the hypervisor
crate.
Second thing, we identify for each vCPU if the Hyper-V SynIC device is
emulated or not. In case it is, that means some specific MSRs will be
set by the guest. These MSRs must be saved in order to properly restore
the VM.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The watchdog device is created through the "--watchdog" parameter. At
most a single watchdog can be created per VM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Without the unlink(2) syscall being allowed, Cloud-Hypervisor crashes
when we remove a virtio-vsock device that has been previously added.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The definition of libc::SYS_ftruncate on AArch64 is different
from that on x86_64. This commit unifies the previously hard-coded
syscall number for AArch64.
Signed-off-by: Henry Wang <Henry.Wang@arm.com>