This reverts commit 66fde245b39b71dfc2e46803bcaf640f8175bcb5.
The commit broke the VFIO support for MSI. Issue needs to be
investigated but in the meantime, it is safer to fix the codebase.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The setcap tool is unusual in its behaviour in that it will process each
parameter separately and abort if one of the parameters is invalid (e.g.
after a symlink.) But any previous parameters will be correctly
processed. This means that depending on the generated test binary name
an invalid entry might occur before it and thus abort the setcap.
Fix to only setcap on the test binary by excluding the other files.
Because the binary name is based on a hash the PR that introduced this
version worked but once merged the hash changed and broke the build.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to validate the new virtio-fs daemon written in Rust is
behaving correctly, a new integration test has been added. Important to
note that for now, only a test with cache=none and dax=off can be added
since the daemon does not support shared memory region yet.
The long term goal being to replace virtiofsd with vhost_user_daemon
once it will reach parity regarding the supported features.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Because the vhost_user_backend crate needs some changes to support
moving the process to a different mount namespace and perform a pivot
root, it is not possible to change '/' to the given shared directory.
This commit, as a temporary measure, let the code point at the given
shared directory.
The long term solution is to perform the mount namespace change and the
pivot root as this will provide greater security.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This patch implements a vhost-user-fs daemon based on Rust. It only
supports communicating through the virtqueues. The support for the
shared memory region associated with DAX will be added later.
It relies on all the code copied over from the crosvm repository, based
on the commit 961461350c0b6824e5f20655031bf6c6bf6b7c30.
It also relies on the vhost_user_backend crate, limiting the amount of
code needed to get this daemon up and running.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a Server type that links the FUSE protocol with the virtio
transport. It parses messages sent on the virtio queue and then
calls the appropriate method of the Filesystem trait.
This code has been ported over from crosvm commit
961461350c0b6824e5f20655031bf6c6bf6b7c30.
One small modification has been applied to the original code. Because
cloud-hypervisor didn't have the macro used by crosvm, the match
statement in the function handle_message() has been updated.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Introduce helpers to split a virtio descriptor into its readable part on
one side, and into its writable part on the other side. This is useful
to separate the request from the reply.
This code has been ported over from crosvm commit
961461350c0b6824e5f20655031bf6c6bf6b7c30.
Two important modifications have been applied to the original code:
- GuestMemory is replaced by GuestMemoryMmap from the vm-memory crate,
which comes with different ways of accessing the memory regions.
- VolatileSlice has different methods, which means the code has been
updated accordingly.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The vm_memory implementation for VolatileSlice is able to read and write
to a source or destination which implements a Read or Write trait.
Unfortunately, this is not enough for this specific use case as we need
to be able to write to a file at a specific offset, which is not
provided by the Read or Write trait.
This code has been ported over from crosvm commit
961461350c0b6824e5f20655031bf6c6bf6b7c30.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add a "passthrough" file system implementation that just forwards its
requests to the appropriate system call.
This code has been ported over from crosvm commit
961461350c0b6824e5f20655031bf6c6bf6b7c30.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add the `Filesystem` trait, which is the main interface between the
transport and the actual file system implementation.
This code has been ported over from crosvm commit
961461350c0b6824e5f20655031bf6c6bf6b7c30.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The multikey module provides a BTreeMap implementation that can use one
of 2 different kinds of keys to look up a value. This is needed by the
virtio-fs server since it needs to be able to look up keys either by
u64 or by a (ino_t, dev_t) pair.
This code has been ported over from crosvm commit
961461350c0b6824e5f20655031bf6c6bf6b7c30.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
To be able to deal with FUSE requests, this commit introduces FUSE
definitions, copied over from the crosvm commit
961461350c0b6824e5f20655031bf6c6bf6b7c30.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new crate will be dedicated to vhost_user_fs specific code that can
be used as a library from the vhost-user-fs daemon.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to iterate over a chain of descriptor chains, this code has
been ported over from crosvm, based on the commit
961461350c0b6824e5f20655031bf6c6bf6b7c30.
The main modification compared to the original code is the way the
sorting between readable and writable descriptors happens.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
By giving the same caps to both cloud-hypervisor and the test binary, we
can access information under /proc related to the VM PID.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Instead of using bash and awk, using Rust allows us to retrieve
information about a VM process with the right permissions as we are not
forced to spawn a new child process.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The test validates that when the mergeable option is enabled, the
resulting PSS for two instances of cloud-hypervisor is lower than two
instances not using the mergeable flag.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the VM is started with the flag "--pmem mergeable=on", it means
the user expects the guest persistent memory pages to be marked as
mergeable. This commit relies on the madvise(MADV_MERGEABLE) system call
to inform the host kernel about these pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the user indicate if the persistent memory pages should
be marked as mergeable or not, a new option is being introduced.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case the VM is started with the flag "--memory mergeable=on", it
means the user expects the guest RAM pages to be marked as mergeable.
This commit relies on the madvise(MADV_MERGEABLE) system call to inform
the host kernel about these pages.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In order to let the user indicate if the guest RAM pages should be
marked as mergeable or not, a new option is being introduced.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When vmm.ping give a response, we expect get the version from
the VMM not the vmm create
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
vmm.ping will help to check if http API server is up and
running.
This also removes the vmm.info endpoint.
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
The CPU manager uses an I/O port and to prevent potential clashes with
assignment for PCI devices ensure that it is allocated by the allocator.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Rather than hardcode the CPU status for all the CPUs instead query from
the CPU manager via the I/O port that is is on via the ACPI tables.
Each CPU device has a _STA method that calls into the CSTA method which
reads and writes the I/O ports via the PRST field which exposes the I/O
port through and OpRegion.
As we only support boot CPUS report that all the CPUs are enabled for
now.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The Linux kernel expects all CPUs, whether they be enabled or disabled
to have an _MAT entry containing the LAPIC details for this CPU with the
enabled bit set to 1 (in the flags.)
In the MADT table the same bit is used to determine if the CPU is
present at boot vs available later.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This is necessary as adding support for NamedFields requires a PkgLength
calculation that does not include the length itself.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
We disabled msi/msix twice inside Drop trait for VfioPciDevice,
which resulted in error message "Could not disable MSI-X". Eliminating
this error by check whether the msi/msix capability is enabled.
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
CC: Liu Jiang <gerry@linux.alibaba.com>
The comments of vfio kernel module said that individual subindex
interrupts can be disabled using the -1 value for DATA_EVENTFD or
the index can be disabled as a whole with:
flags = (DATA_NONE|ACTION_TRIGGER), count = 0.
Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com>
CC: Liu Jiang <gerry@linux.alibaba.com>
Update all references to the new repository locations. Many of these will
redirect however the one used for the hypervisor-fw binary does not so
this is required to allow the builds to pass.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This commit introduces a dedicated document describing the device model
supported by cloud-hypervisor VMM.
It needs to be updated anytime a new device will be added in the future.
Fixes#437
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
When adding devices to the guest, and populating the device model, we
should prefix the routines with add_. When we're just creating the
device objects but not yet adding them we use make_.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
MMIO devices creation code into its own routine.
Fixes: #441
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
PCI devices creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
ACPI device creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
In order to reduce the DeviceManager's new() complexity, we can move the
ACPI device creation code into its own routine.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>