This was causing some issues because of the use of 2 different versions
for the vm-memmory crate. We'll wait for all dependencies to be properly
resolved before we move to 0.7.0.
This reverts commit 76b6c62d07.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
These structs are not read on the VMM side but are used in communication
with the guest.
As identified by the new beta clippy.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
There's no need to patch the vhost crate anymore since the fixes we were
looking for have been released as part of 0.2.0 on crates.io.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Setting the reply_ack should depend on the set of acknowledged features
containing the REPLY_ACK flag.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to support correctly the snapshot/restore and migration use
cases, we must be careful with the ranges that we discard by punching
holes. On restore, there might be some ranges already plugged in,
meaning they should not be discarded. That's why we loop over the list
of blocks to discard only the ranges that are marked as unplugged.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By creating the BlocksState object in the MemoryManager, we can directly
provide it to the virtio-mem device when being created. This will allow
the MemoryManager through each VirtioMemZone to have a handle onto the
blocks that are plugged at any point in time.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This is going to be useful to let virtio-mem report the list of ranges
that are currently plugged, so that both snapshot/restore and migration
will copy only what is needed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This will be helpful to support the creation of a MemoryRangeTable from
virtio-mem, as it uses 2M pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Adding the snapshot/restore support along with migration as well,
allowing a VM with virtio-mem devices attached to be properly
migrated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Implement the infrastructure that lets a virtio-mem device map the guest
memory into the device. This is necessary since with virtio-mem zones
memory can be added or removed and the vfio-user device must be
informed.
Fixes: #3025
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
For vfio-user the mapping handler is per device and needs to be removed
when the device in unplugged.
For VFIO the mapping handler is for the default VFIO container (used
when no vIOMMU is used - using a vIOMMU does not require mappings with
virtio-mem)
To represent these two use cases use an enum for the handlers that are
stored.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Adding the snapshot/restore support along with migration as well,
allowing a VM with a virtio-balloon device attached to be properly
migrated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The MSI IOVA address on X86 and AArch64 is different.
This commit refactored the code to receive the MSI IOVA address and size
from device_manager, which provides the actual IOVA space data for both
architectures.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
For most use cases, there is no need to create multiple VFIO containers
as it causes unwanted behaviors. Especially when passing multiple
devices from the same IOMMU group, we need to use the same container so
that it can properly list the groups that have been already opened. The
correct logic was already there in vfio-ioctls, but it was incorrectly
used from our VMM implementation.
For the special case where we put a VFIO device behind a vIOMMU, we must
create one container per device, as we need to control the DMA mappings
per device, which is performed at the container level. Because we must
keep one container per device, the vIOMMU use case prevents multiple
devices attached to the same IOMMU group to be passed through the VM.
But this is a limitation that we are fine with, especially since the
vIOMMU doesn't let us group multiple devices in the same group from a
guest perspective.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When a pty is resized (using the TIOCSWINSZ ioctl -- see ioctl_tty(2)),
the kernel will send a SIGWINCH signal to the pty's foreground process
group to notify it of the resize. This is the only way to be notified
by the kernel of a pty resize.
We can't just make the cloud-hypervisor process's process group the
foreground process group though, because a process can only set the
foreground process group of its controlling terminal, and
cloud-hypervisor's controlling terminal will often be the terminal the
user is running it in. To work around this, we fork a subprocess in a
new process group, and set its process group to be the foreground
process group of the pty. The subprocess additionally must be running
in a new session so that it can have a different controlling
terminal. This subprocess writes a byte to a pipe every time the pty
is resized, and the virtio-console device can listen for this in its
epoll loop.
Alternatives I considered were to have the subprocess just send
SIGWINCH to its parent, and to use an eventfd instead of a pipe.
I decided against the signal approach because re-purposing a signal
that has a very specific meaning (even if this use was only slightly
different to its normal meaning) felt unclean, and because it would
have required using pidfds to avoid race conditions if
cloud-hypervisor had terminated, which added complexity. I decided
against using an eventfd because using a pipe instead allows the child
to be notified (via poll(2)) when nothing is reading from the pipe any
more, meaning it can be reliably notified of parent death and
terminate itself immediately.
I used clone3(2) instead of fork(2) because without
CLONE_CLEAR_SIGHAND the subprocess would inherit signal-hook's signal
handlers, and there's no other straightforward way to restore all signal
handlers to their defaults in the child process. The only way to do
it would be to iterate through all possible signals, or maintain a
global list of monitored signals ourselves (vmm:vm::HANDLED_SIGNALS is
insufficient because it doesn't take into account e.g. the SIGSYS
signal handler that catches seccomp violations).
Signed-off-by: Alyssa Ross <hi@alyssa.is>
This prepares us to be able to handle console resizes in the console
device's epoll loop, which we'll have to do if the output is a pty,
since we won't get SIGWINCH from it.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Musl often uses mmap to allocate memory where Glibc would use brk.
This has caused seccomp violations for me on the API and signal
handling threads.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
As well as reducing the amount of code this also improves the binary
size slightly:
cargo bloat --release -n 2000 --bin cloud-hypervisor | grep virtio_devices::seccomp_filters::get_seccomp_rules
Before:
0.1% 0.2% 7.8KiB virtio_devices virtio_devices::seccomp_filters::get_seccomp_rules
After:
0.0% 0.1% 3.0KiB virtio_devices virtio_devices::seccomp_filters::get_seccomp_rules
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Introduce a common solution for spawning the virtio threads which will
make it easier to add the panic handling.
During this effort I discovered that there were no seccomp filters
registered for the vhost-user-net thread nor the vhost-user-block
thread. This change also incorporates basic seccomp filters for those as
part of the refactoring.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the processing of the input from stdin, PTY or file from the VMM
thread to the existing virtio-console thread. The handling of the resize
of a virtio-console has not changed but the name of the struct used to
support that has been renamed to reflect its usage.
Fixes: #3060
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
vhdx_sync.rs in block_util implements traits to represent the vhdx
crate as a supported block device in the cloud hypervisor. The vhdx
is added to the block device list in device_manager.rs at the vmm
crate so that it can automatically detect a vhdx disk and invoke the
corresponding crate.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Fazla Mehrab <akm.fazla.mehrab@intel.com>
We are relying on applying empty 'seccomp' filters to support the
'--seccomp false' option, which will be treated as an error with the
updated 'seccompiler' crate. This patch fixes this issue by explicitly
checking whether the 'seccomp' filter is empty before applying the
filter.
Signed-off-by: Bo Chen <chen.bo@intel.com>