A new version of vm-memory was released upstream which resulted in some
components pulling in that new version. Update the version number used
to point to the latest version but continue to use our patched version
due to the fix for #1258
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Misspellings were identified by https://github.com/marketplace/actions/check-spelling
* Initial corrections suggested by Google Sheets
* Additional corrections by Google Chrome auto-suggest
* Some manual corrections
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Increase the number of open files limit for the sandboxed process to the
maximum allowed in the system. The maximum is obtained by reading the
/proc/sys/fs/nr_open sysctl file, and the setting is done using the setrlimit
syscall. Failure to read or parse the nr_open file, or to set the rlimit
results in a panic.
Signed-off-by: Ricardo Koller <ricarkol@gmail.com>
The binary is still built in the same location but the source code and
the dependencies for it come from the vhost_user_fs crate itself.
The binary will be built with:
`cargo build --all --bin vhost_user_fs` or just `cargo build --all`
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Split the generic virtio code (queues and device type) from the
VirtioDevice trait, transport and device implementations.
This also simplifies the feature handling in vhost_user_backend as the
vm-virtio crate is no longer has any features.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
This corresponds to QEMU's 63659fe74e76f5c52854 commit.
the setattr code uses both fchmod and fchmodat in different cases,
however we only had fchmodat in the whitelist.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This is a preparing commit to build and test CH on AArch64. All building
issues were fixed, but no functionality was introduced.
For X86, the logic of code was not changed at all.
For ARM, the architecture specific part is still empty. And we applied
some tricks to workaround lint warnings. But such code will be replaced
later by other commits with real functionality.
Signed-off-by: Michael Zhao <michael.zhao@arm.com>
Add missing syscall used by the musl build.
TEST=scripts/dev_cli.sh tests --libc musl --integration -- vhost_user_fs_daemon
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Implement seccomp; we use one filter for all threads.
The syscall list comes from the C daemon with syscalls added
as I hit them.
The default behaviour is to kill the process, this normally gets
audit logged.
--seccomp none disables seccomp
log Just logs violations but doesn't stop it
trap causes a signal to be be sent that can be trapped.
If you suspect you're hitting a seccomp action then you can
check the audit log; you could also switch to running with 'log'
to collect a bunch of calls to report.
To see where the syscalls are coming from use 'trap' with a debugger
or coredump to backtrace it.
This can be improved for some syscalls to restrict the parameters
to some syscalls to make them more restrictive.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Implement support for setting up a sandbox for running the
service. The technique for this has been borrowed from virtiofsd, and
consists on switching to new PID, mount and network namespaces, and
then switching root to the directory to be shared.
Future patches will implement additional hardening features like
dropping capabilities and seccomp filters.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Allow callers to provide a file descriptor for /proc/self/fd. This is
useful for sandboxing, as we may be running in a namespace that
doesn't have access to /proc.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Open a file descriptor to /proc/self/fd instead of /proc. We aren't
using any other entries from that directory, and doing this allows us
to keep working even if /proc is no longer present in our
namespace (useful for sandboxing).
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add the WRITE_KILL_PRIV write flag, corresponding to
FUSE_WRITE_KILL_PRIV introduced in 7.31, and use to only remove the
setuid and setgid bits (by switching credentials) conditionally.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add support for MAX_PAGES, corresonding to FUSE_MAX_PAGES introduced
in FUSE 7.28.
This allows us to negotiate with the kernel the maximum number of
pages that we support to transfer in a single request.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Add support for FOPEN_CACHE_DIR, a flag that allows us to tell the
guest that it's safe to cache a directory, introduced in FUSE 7.28.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Virtio-fs daemon expects fs_slave_io() returns the number of bytes
read/written on success, but we always return 0 and make userspace think
nothing has been read/written.
Fix it by returning the actual bytes read/written. Note that This
depends on the corresponding fix in vhost crate.
Fixes: #949
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Implement missing support for FUSE_LSEEK, which basically implies
calling to libc::lseek on the file handle. As this operation alters
the file offset, we take a write lock on the File's RwLock.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Extended attributes (xattr) support has a huge impact on write
performance. The reason for this is that, if enabled, FUSE sends a
setxattr request after each write operation, and due to the inode
locking inside the kernel during said request, the ability to execute
the operations in parallel becomes heavily limited.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Replace HandleData's File Mutex with a RwLock to have more granularity
on the lock. This allows operations on the same File that are safe to
be run in parallel (at this moment, read and write), to acquire a read
lock to avoid waiting on each other.
Signed-off-by: Sergio Lopez <slp@redhat.com>