Correctly implement the virtio specification by setting the writeback
field on the request based on the algorithm in the spec.
TEST=Boot with hypervisor-firmware with CH in verbose mode. See info
level messages saying cache mode is writethrough in firmware (no support
for flush or WCE). Once in the Linux kernel see messages that mode is
writeback.
Fixes: #1216
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
When this is set to false the write needs to be followed by a flush on
the underlying disk (leading to a fsync()).
The default behaviour is not changed with this change.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Previous to adding a a trait method to inform the backends of the acked
features backends can use features than the guest has not enabled which
could lead to unpredictable results.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Move the method that is used to decide whether the guest should be
signalled into the Queue implementation from vm-virtio. This removes
duplicated code between vhost_user_backend and the vm-virtio block
implementation.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The implementation of this virtio block (and vhost-user block) command
called a function that was a no-op on Linux. Use the same function as
virtio-pmem to ensure that data is not lost when the guest asks for it
to be flused to disk.
Fixes: #399
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Now that snapshot/restore support has been enabled for virtio-vsock, the
corresponding integration test is expanded with some validation that
virtio-vsock supports to be snapshot and restored.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When compiled with pci feature, the integration test now validates that
/dev/vdb can be correctly read while being placed behing a virtual
IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
For both virtio-mmio and virtio-pci transport layers, we were setting
every field from the saved snapshot during a restore. This is a problem
when we don't want to override specific fields such as iommu_mapping_cb
because the saved snapshot doesn't contain the appropriate information.
That's why this commit sets only the appropriate field from the saved
snapshot during a restore.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Provide implementation for both snapshot() and restore() methods from
the Snapshottable trait, so that we can snapshot and restore a VM with
devices attached to a virtual IOMMU.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The integration test validating that --serial off works correctly was
not properly written as it was using the FW, which by default would use
the kernel command line found in the EFI partition. Unfortunately, this
kernel command line was including "console=ttyS0", which causes the
kernel to try to write to the serial port, even if there's no serial
port being emulated.
The problem is, when no emulation of the serial port is provided, the
default value returned on 0x3f8 is 0, which makes the guest kernel think
that some data needs to be read.
The only way to avoid all this is by ensuring we can control the kernel
command line by removing any occurence of "console=ttyS0" from it.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Explicit call to 'close()' is required on file descriptors allocated
from 'epoll::create()', which is missing for the 'EpollContext' and
'VringWorker'. This patch enforces to close the file descriptors by
reusing the Drop trait of the 'File' struct.
Signed-off-by: Bo Chen <chen.bo@intel.com>
According to the virtio spec the guest should always be interrupted when
"used" descriptors are returned from the device to the driver. However
this was not the case for the TX queue in either the virtio-net
implementation or the vhost-user-net implementation.
This would have meant that the guest could end up with a reduced TX
throughput as it would not know that the packets had been dispatched via
the VMM.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
As the parsing code is reused the flush feature is already implemented
and ready to be used.
Fixes: #1197
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add missing syscall used by the musl build.
TEST=scripts/dev_cli.sh tests --libc musl --integration -- vhost_user_fs_daemon
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extend the set of tests we have for virtio-net and vhost-user-net to
check for host MAC address setting.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add a new "host_mac" parameter to "--net" and "--net-backend" and use
this to set the MAC address on the tap interface. If no address is given
one is randomly assigned and is stored in the config.
Support for vhost-user-net self spawning was also included.
Fixes: #1177
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Do this by reading the HW address information and then modifying the
HW address to match the desired address. Preserving the rest of the
state including the address type.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The clock_nanosleep system call needs to be whitelisted since the commit
12e00c0f45 introduced the use of a sleep()
function. Without this patch, we can see an error when the VM is paused
or killed.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Implement seccomp; we use one filter for all threads.
The syscall list comes from the C daemon with syscalls added
as I hit them.
The default behaviour is to kill the process, this normally gets
audit logged.
--seccomp none disables seccomp
log Just logs violations but doesn't stop it
trap causes a signal to be be sent that can be trapped.
If you suspect you're hitting a seccomp action then you can
check the audit log; you could also switch to running with 'log'
to collect a bunch of calls to report.
To see where the syscalls are coming from use 'trap' with a debugger
or coredump to backtrace it.
This can be improved for some syscalls to restrict the parameters
to some syscalls to make them more restrictive.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Like the actions that don't take data such as "pause" or "resume" use a
common handler implementation to remove duplicated code for handling
simple endpoints like the hotplug ones.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Many of the API requests take a similar form with a single data item
(i.e. config for a device hotplug) expand the VmAction enum to handle
those actions and a single function to dispatch those API events.
For now port the existing helper functions to use this new API. In the
future the HTTP layer can create the VmAction directly avoiding the
extra layer of indirection.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than save the save a function pointer and use that instead the
underlying action. This is useful for two reasons:
1. We can ensure that we generate HttpErrors in the same way as the
other endpoints where API error variant should be determined by the
request being made not the underlying error.
2. It can be extended to handle other generic actions where the function
prototype differs slightly.
As result of this refactoring it was found that the "vm.delete" endpoint
was not connected so address that issue.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Extend the EndpointHandler trait to include automatic support for
handling PUT requests. This will allow the removal of lots of duplicated
code in the following commit from the API handling code.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Implement support for setting up a sandbox for running the
service. The technique for this has been borrowed from virtiofsd, and
consists on switching to new PID, mount and network namespaces, and
then switching root to the directory to be shared.
Future patches will implement additional hardening features like
dropping capabilities and seccomp filters.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Changes is vhost crate require VhostUserDaemon users to create and
provide a vhost::Listener in advance. This allows us to adopt
sandboxing strategies in the future, by being able to create the UNIX
socket before switching to a restricted namespace.
Update also the reference to vhost crate in Cargo.lock to point to the
latest commit from the dragonball branch.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Allow callers to provide a file descriptor for /proc/self/fd. This is
useful for sandboxing, as we may be running in a namespace that
doesn't have access to /proc.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Open a file descriptor to /proc/self/fd instead of /proc. We aren't
using any other entries from that directory, and doing this allows us
to keep working even if /proc is no longer present in our
namespace (useful for sandboxing).
Signed-off-by: Sergio Lopez <slp@redhat.com>
There's a simple way to trigger PCI BAR reprogramming for a given
device, by removing it and then hotplugging it back. The Linux kernel
will simply choose to place the BARs at different location than the ones
chosen by Cloud-Hypervisor. By doing so, and creating the snapshot after
this hotplug operation, we can manage to validate that the resource are
correctly restored for a given virtio-pci device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>