The old semantics of virFork() violates the priciple of good
usability: it requires the caller to check the pid argument
after use, *even when virFork returned -1*, in order to properly
abort a child process that failed setup done immediately after
fork() - that is, the caller must call _exit() in the child.
While uses in virfile.c did this correctly, uses in 'virsh
lxc-enter-namespace' and 'virt-login-shell' would happily return
from the calling function in both the child and the parent,
leading to very confusing results. [Thankfully, I found the
problem by inspection, and can't actually trigger the double
return on error without an LD_PRELOAD library.]
It is much better if the semantics of virFork are impossible
to abuse. Looking at virFork(), the parent could only ever
return -1 with a non-negative pid if it misused pthread_sigmask,
but this never happens. Up until this patch series, the child
could return -1 with non-negative pid if it fails to set up
signals correctly, but we recently fixed that to make the child
call _exit() at that point instead of forcing the caller to do
it. Thus, the return value and contents of the pid argument are
now redundant (a -1 return now happens only for failure to fork,
a child 0 return only happens for a successful 0 pid, and a
parent 0 return only happens for a successful non-zero pid),
so we might as well return the pid directly rather than an
integer of whether it succeeded or failed; this is also good
from the interface design perspective as users are already
familiar with fork() semantics.
One last change in this patch: before returning the pid directly,
I found cases where using virProcessWait unconditionally on a
cleanup path of a virFork's -1 pid return would be nicer if there
were a way to avoid it overwriting an earlier message. While
such paths are a bit harder to come by with my change to a direct
pid return, I decided to keep the virProcessWait change in this
patch.
* src/util/vircommand.h (virFork): Change signature.
* src/util/vircommand.c (virFork): Guarantee that child will only
return on success, to simplify callers. Return pid rather than
status, now that the situations are always the same.
(virExec): Adjust caller, also avoid open-coding process death.
* src/util/virprocess.c (virProcessWait): Tweak semantics when pid
is -1.
(virProcessRunInMountNamespace): Adjust caller.
* src/util/virfile.c (virFileAccessibleAs, virFileOpenForked)
(virDirCreate): Likewise.
* tools/virt-login-shell.c (main): Likewise.
* tools/virsh-domain.c (cmdLxcEnterNamespace): Likewise.
* tests/commandtest.c (test23): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Auditing all callers of virCommandRun and virCommandWait that
passed a non-NULL pointer for exit status turned up some
interesting observations. Many callers were merely passing
a pointer to avoid the overall command dying, but without
caring what the exit status was - but these callers would
be better off treating a child death by signal as an abnormal
exit. Other callers were actually acting on the status, but
not all of them remembered to filter by WIFEXITED and convert
with WEXITSTATUS; depending on the platform, this can result
in a status being reported as 256 times too big. And among
those that correctly parse the output, it gets rather verbose.
Finally, there were the callers that explicitly checked that
the status was 0, and gave their own message, but with fewer
details than what virCommand gives for free.
So the best idea is to move the complexity out of callers and
into virCommand - by default, we return the actual exit status
already cleaned through WEXITSTATUS and treat signals as a
failed command; but the few callers that care can ask for raw
status and act on it themselves.
* src/util/vircommand.h (virCommandRawStatus): New prototype.
* src/libvirt_private.syms (util/command.h): Export it.
* docs/internals/command.html.in: Document it.
* src/util/vircommand.c (virCommandRawStatus): New function.
(virCommandWait): Adjust semantics.
* tests/commandtest.c (test1): Test it.
* daemon/remote.c (remoteDispatchAuthPolkit): Adjust callers.
* src/access/viraccessdriverpolkit.c (virAccessDriverPolkitCheck):
Likewise.
* src/fdstream.c (virFDStreamCloseInt): Likewise.
* src/lxc/lxc_process.c (virLXCProcessStart): Likewise.
* src/qemu/qemu_command.c (qemuCreateInBridgePortWithHelper):
Likewise.
* src/xen/xen_driver.c (xenUnifiedXendProbe): Simplify.
* tests/reconnect.c (mymain): Likewise.
* tests/statstest.c (mymain): Likewise.
* src/bhyve/bhyve_process.c (virBhyveProcessStart)
(virBhyveProcessStop): Don't overwrite virCommand error.
* src/libvirt.c (virConnectAuthGainPolkit): Likewise.
* src/openvz/openvz_driver.c (openvzDomainGetBarrierLimit)
(openvzDomainSetBarrierLimit): Likewise.
* src/util/virebtables.c (virEbTablesOnceInit): Likewise.
* src/util/viriptables.c (virIpTablesOnceInit): Likewise.
* src/util/virnetdevveth.c (virNetDevVethCreate): Fix debug
message.
* src/qemu/qemu_capabilities.c (virQEMUCapsInitQMP): Add comment.
* src/storage/storage_backend_iscsi.c
(virStorageBackendISCSINodeUpdate): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
Right now, a caller waiting for a child process either requires
the child to have status 0, or must use WIFEXITED() and friends
itself. But in many cases, we want the middle ground of treating
fatal signals as an error, and directly accessing the normal exit
value without having to use WEXITSTATUS(), in order to easily
detect an expected non-zero exit status. This adds the middle
ground to the low-level virProcessWait; the next patch will add
it to virCommand.
* src/util/virprocess.h (virProcessWait): Alter signature.
* src/util/virprocess.c (virProcessWait): Add parameter.
(virProcessRunInMountNamespace): Adjust caller.
* src/util/vircommand.c (virCommandWait): Likewise.
* src/util/virfile.c (virFileAccessibleAs): Likewise.
* src/lxc/lxc_container.c (lxcContainerHasReboot)
(lxcContainerAvailable): Likewise.
* daemon/libvirtd.c (daemonForkIntoBackground): Likewise.
* tools/virt-login-shell.c (main): Likewise.
* tools/virsh-domain.c (cmdLxcEnterNamespace): Likewise.
* tests/testutils.c (virtTestCaptureProgramOutput): Likewise.
* tests/commandtest.c (test23): Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
When a child fails without exec'ing, we want a well-known status;
best is to match what env(1), nice(1), su(1), and other wrapper
programs do. This patch adds enum values that later patches will
use, and sets up virFork as the first client of EXIT_CANCELED
for errors detected prior to even attempting exec, as well as
virExec to distinguish between a missing executable vs. a binary
that cannot be executed.
This is a slight semantic change in the unlikely case of a child
process failing to restore its signal mask - we now kill the
child with a known status instead of relying on the caller to
notice and do an appropriate _exit(). A subsequent patch will
make further cleanups based on an audit of all callers.
* src/internal.h (EXIT_CANCELED, EXIT_CANNOT_INVOKE)
(EXIT_ENOENT): New enum.
* src/util/vircommand.c (virFork): Document specific exit value if
child aborts early.
(virExec): Distinguish between various exec failures.
* tests/commandtest.c (test1): Enhance test.
(test22): New test.
Signed-off-by: Eric Blake <eblake@redhat.com>
There are some units within libvirt that utilize virCommand API to run
some commands and deserve own unit testing. These units are, however,
not desired to be rewritten to dig virCommand API usage out. As a great
example virNetDevBandwidth could be used. The problem with the bandwidth
unit is: it uses virCommand API heavily. Therefore we need a mechanism
to not really run a command, but rather see its string representation
after which we can decide if the unit construct the correct sequence of
commands or not.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Most of our code base uses space after comma but not before;
fix the remaining uses before adding a syntax check.
* src/util/vircommand.c: Consistently use commas.
* src/util/virlog.c: Likewise.
* src/util/virnetdevbandwidth.c: Likewise.
* src/util/virnetdevmacvlan.c: Likewise.
* src/util/virnetdevvportprofile.c: Likewise.
* src/util/virnetlink.c: Likewise.
* src/util/virpci.c: Likewise.
* src/util/virsysinfo.c: Likewise.
* src/util/virusb.c: Likewise.
Signed-off-by: Eric Blake <eblake@redhat.com>
When running setuid, we must be careful about what env vars
we allow commands to inherit from us. Replace the
virCommandAddEnvPass function with two new ones which do
filtering
virCommandAddEnvPassAllowSUID
virCommandAddEnvPassBlockSUID
And make virCommandAddEnvPassCommon use the appropriate
ones
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCommandAddEnvPassCommon method ignored the failure to
pre-allocate the env variable array with VIR_RESIZE_N. While
this is harmless, it confuses the test harness which is trying
to validate OOM handling of every individual allocation call.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The VIR_FREE() macro will cast away any const-ness. This masked a
number of places where we passed a 'const char *' string to
VIR_FREE. Fortunately in all of these cases, the variable was not
in fact const data, but a heap allocated string. Fix all the
variable declarations to reflect this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=964358
POSIX states that multi-threaded apps should not use functions
that are not async-signal-safe between fork and exec, yet we
were using getpwuid_r and initgroups. Although rare, it is
possible to hit deadlock in the child, when it tries to grab
a mutex that was already held by another thread in the parent.
I actually hit this deadlock when testing multiple domains
being started in parallel with a command hook, with the following
backtrace in the child:
Thread 1 (Thread 0x7fd56bbf2700 (LWP 3212)):
#0 __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007fd5761e7388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007fd5761e7257 in __pthread_mutex_lock (mutex=0x7fd56be00360)
at pthread_mutex_lock.c:61
#3 0x00007fd56bbf9fc5 in _nss_files_getpwuid_r (uid=0, result=0x7fd56bbf0c70,
buffer=0x7fd55c2a65f0 "", buflen=1024, errnop=0x7fd56bbf25b8)
at nss_files/files-pwd.c:40
#4 0x00007fd575aeff1d in __getpwuid_r (uid=0, resbuf=0x7fd56bbf0c70,
buffer=0x7fd55c2a65f0 "", buflen=1024, result=0x7fd56bbf0cb0)
at ../nss/getXXbyYY_r.c:253
#5 0x00007fd578aebafc in virSetUIDGID (uid=0, gid=0) at util/virutil.c:1031
#6 0x00007fd578aebf43 in virSetUIDGIDWithCaps (uid=0, gid=0, capBits=0,
clearExistingCaps=true) at util/virutil.c:1388
#7 0x00007fd578a9a20b in virExec (cmd=0x7fd55c231f10) at util/vircommand.c:654
#8 0x00007fd578a9dfa2 in virCommandRunAsync (cmd=0x7fd55c231f10, pid=0x0)
at util/vircommand.c:2247
#9 0x00007fd578a9d74e in virCommandRun (cmd=0x7fd55c231f10, exitstatus=0x0)
at util/vircommand.c:2100
#10 0x00007fd56326fde5 in qemuProcessStart (conn=0x7fd53c000df0,
driver=0x7fd55c0dc4f0, vm=0x7fd54800b100, migrateFrom=0x0, stdin_fd=-1,
stdin_path=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
flags=1) at qemu/qemu_process.c:3694
...
The solution is to split the work of getpwuid_r/initgroups into the
unsafe portions (getgrouplist, called pre-fork) and safe portions
(setgroups, called post-fork).
* src/util/virutil.h (virSetUIDGID, virSetUIDGIDWithCaps): Adjust
signature.
* src/util/virutil.c (virSetUIDGID): Add parameters.
(virSetUIDGIDWithCaps): Adjust clients.
* src/util/vircommand.c (virExec): Likewise.
* src/util/virfile.c (virFileAccessibleAs, virFileOpenForked)
(virDirCreate): Likewise.
* src/security/security_dac.c (virSecurityDACSetProcessLabel):
Likewise.
* src/lxc/lxc_container.c (lxcContainerSetID): Likewise.
* configure.ac (AC_CHECK_FUNCS_ONCE): Check for setgroups, not
initgroups.
Signed-off-by: Eric Blake <eblake@redhat.com>
Recent changes uncovered a NEGATIVE_RETURNS in the return from sysconf()
when processing a for loop in virtTestCaptureProgramExecChild() in
testutils.c
Code review uncovered 3 other code paths with the same condition that
weren't found by Covirity, so fixed those as well.
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
This patch adds two sets of functions:
1) lower level virProcessSet*() functions that will immediately set
the RLIMIT_MEMLOCK. RLIMIT_NPROC, or RLIMIT_NOFILE of either the
current process (using setrlimit()) or any other process (using
prlimit()). "current process" is indicated by passing a 0 for pid.
2) functions for virCommand* that will setup a virCommand object to
set those limits at a later time just after it has forked a new
process, but before it execs the new program.
configure.ac has prlimit and setrlimit added to the list of functions
to check for, and the low level functions log an "unsupported" error)
on platforms that don't support those functions.
My commit 7a2e845a86 (and its
prerequisites) managed to effectively ignore the
clear_emulator_capabilities setting in qemu.conf (visible in the code
as the VIR_EXEC_CLEAR_CAPS flag when qemu is being exec'ed), with the
result that the capabilities are always cleared regardless of the
qemu.conf setting. This patch fixes it by passing the flag through to
virSetUIDGIDWithCaps(), which uses it to decide whether or not to
clear existing capabilities before adding in those that were
requested.
Note that the existing capabilities are *always* cleared if the new
process is going to run as non-root, since the whole point of running
non-root is to have the capabilities removed (it's still possible to
maintain individual capabilities as needed using the capBits argument
though).
CC libvirt_util_la-vircommand.lo
../../src/util/vircommand.c:2358:1: error: 'virCommandHandshakeChild' defined but not used [-Werror=unused-function]
The function is only implemented inside #ifndef WIN32.
* src/util/vircommand.c (virCommandHandshakeChild): Hoist earlier,
so that win32 build doesn't hit an unused forward declaration.
virCommand was previously calling virSetUIDGID() to change the uid and
gid of the child process, then separately calling
virSetCapabilities(). This did not work if the desired uid was != 0,
since a setuid to anything other than 0 normally clears all
capabilities bits.
The solution is to use the new virSetUIDGIDWithCaps(), sending it the
uid, gid, and capabilities bits. This will get the new process setup
properly.
Since the static functions virSetCapabilities() and
virClearCapabilities are no longer called, they have been removed.
NOTE: When combined with "filecap $path-to-qemu sys_rawio", this patch
will make CAP_SYS_RAWIO (which is required for passthrough of generic
scsi commands to a guest - see commits e8daeeb, 177db08, 397e6a7, and
74e0349) be retained by qemu when necessary. Apparently that
capability has been broken for non-root qemu ever since it was
originally added.
This is an interim measure to make sure everything still works in this
order. The next step will be to perform capabilities drop and
setuid/gid as a single operation (which is the only way to keep any
capabilities when switching to a non-root uid).
virCommand gets two new APIs: virCommandSetSELinuxLabel() and
virCommandSetAppArmorProfile(), which both save a copy of a
null-terminated string in the virCommand. During virCommandRun, if the
string is non-NULL and we've been compiled with AppArmor and/or
SELinux security driver support, the appropriate security library
function is called for the child process, using the string that was
previously set. In the case of SELinux, setexeccon_raw() is called,
and for AppArmor, aa_change_profile() is called.
This functionality has been added so that users of virCommand can use
the upcoming virSecurityManagerSetChildProcessLabel() prior to running
a child process, rather than needing to setup a hook function to be
called (and in turn call virSecurityManagerSetProcessLabel()) *during*
the setup of the child process.
If a uid and/or gid is specified for a command, it will be set just
after the user-supplied post-fork "hook" function is called.
The intent is that this can replace user hook functions that set
uid/gid. This moves the setting of uid/gid and dropping of
capabilities closer to each other, which is important since the two
should really be done at the same time (libcapng provides a single
function that does both, which we will be unable to use, but want to
mimic as closely as possible).
All args except "cmd" in the call to virExec are now redundant, since
they can all be found in cmd, so remove the args and reference the
data directly in cmd. One exception to this is that "infd" was being
modified within virExec, and modifying the original in cmd caused make
check failures, so cmd->infd is copied to a local, and the local is
used during virExec().
virExecWithHook is only called from one place, so it always has the
same "hook" function (virHookCommand), and the data sent to that
function is always a virCommandPtr, so eliminate the function and
generic data from the arglist, and replace it with "virCommandPtr
cmd". The call to (hook)(data) is replaced with
"virHookCommand(cmd)". Finally, virExecWithHook is renamed to virExec.
Indentation has been updated only for code that will remain after the
next patch, which will remove all other args to virExec (since they
are now redundant, as they're all members of virCommandPtr).
Currently, if a command wants to do asynchronous IO, a callback
is registered in the libvirtd eventloop to handle writes and
reads. However, there's a race in virCommandWait. The eventloop
may already be executing the callback, while virCommandWait is
mangling internal state of virCommand. To deal with it, we need
to either introduce locking or spawn a separate thread where we
poll() on stdio from child. The former, however, requires to
unlock all mutexes held, as the event loop may execute other
callbacks which tries to lock one of the mutexes, deadlock and
thus never wake us up. So it's safer to spawn a separate thread.
Currently, if we want to feed stdin, or catch stdout or stderr of a
virCommand we have to use virCommandRun(). When using virCommandRunAsync()
we have to register FD handles by hand. This may lead to code duplication.
Hence, introduce an internal API, which does this automatically within
virCommandRunAsync(). The intended usage looks like this:
virCommandPtr cmd = virCommandNew*(...);
char *buf = NULL;
...
virCommandSetOutputBuffer(cmd, &buf);
virCommandDoAsyncIO(cmd);
if (virCommandRunAsync(cmd, NULL) < 0)
goto cleanup;
...
if (virCommandWait(cmd, NULL) < 0)
goto cleanup;
/* @buf now contains @cmd's stdout */
VIR_DEBUG("STDOUT: %s", NULLSTR(buf));
...
cleanup:
VIR_FREE(buf);
virCommandFree(cmd);
Note, that both stdout and stderr buffers may change until virCommandWait()
returns.