My commit 7a2e845a86 (and its
prerequisites) managed to effectively ignore the
clear_emulator_capabilities setting in qemu.conf (visible in the code
as the VIR_EXEC_CLEAR_CAPS flag when qemu is being exec'ed), with the
result that the capabilities are always cleared regardless of the
qemu.conf setting. This patch fixes it by passing the flag through to
virSetUIDGIDWithCaps(), which uses it to decide whether or not to
clear existing capabilities before adding in those that were
requested.
Note that the existing capabilities are *always* cleared if the new
process is going to run as non-root, since the whole point of running
non-root is to have the capabilities removed (it's still possible to
maintain individual capabilities as needed using the capBits argument
though).
A value which is equal to a integer maximum such as LLONG_MAX is
a valid integer value.
The patch fix the following error:
1, virsh memtune vm --swap-hard-limit -1
2, virsh start vm
In debug mode, it shows error like:
virScaleInteger:1813 : numerical overflow:\
value too large: 9007199254740991KiB
uid_t and gid_t are opaque types, ranging from s32 to u32 to u64.
Explicitly cast the magic -1 to the appropriate type.
Signed-off-by: Philipp Hahn <hahn@univention.de>
The uid_t|gid_t values are explicitly casted to "unsigned long", but the
printf() still used "%d", which is for signed values.
Change the format to "%u".
Signed-off-by: Philipp Hahn <hahn@univention.de>
Normally when a process' uid is changed to non-0, all the capabilities
bits are cleared, even those explicitly set with calls to
capng_update()/capng_apply() made immediately before setuid. And
*after* the process' uid has been changed, it no longer has the
necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities
bits, it is necessary to either call capng_change_id() (which
unfortunately doesn't currently call initgroups to setup auxiliary
group membership), or to perform the small amount of calisthenics
contained in the new utility function virSetUIDGIDWithCaps().
Another very important difference between the capabilities
setting/clearing in virSetUIDGIDWithCaps() and virCommand's
virSetCapabilities() (which it will replace in the next patch) is that
the new function properly clears the capabilities bounding set, so it
will not be possible for a child process to set any new
capabilities.
A short description of what is done by virSetUIDGIDWithCaps():
1) clear all capabilities then set all those desired by the caller (in
capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed
to change the capabilities bounding set).
2) call prctl(), telling it that we want to maintain current
capabilities across an upcoming setuid().
3) switch to the new uid/gid
4) again call prctl(), telling it we will no longer want capabilities
maintained if this process does another setuid().
5) clear the capabilities that we added to allow us to
setuid/setgid/change the bounding set (unless they were also requested
by the caller via the virCommand API).
Because the modification/maintaining of capabilities is intermingled
with setting the uid, this is necessarily done in a single function,
rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at
time of execve) for a process that has uid != 0, the *file*
capabilities of the binary being executed must also have the desired
capabilities bit(s) set (see "man 7 capabilities"). This can be done
with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio").
Rather than treating uid:gid of 0:0 as a NOP, we blindly pass that
through to the lower layers. However, we *do* check for a requested
value of "-1" to mean "don't change this setting". setregid() and
setreuid() already interpret -1 as a NOP, so this is just an
optimization, but we are also calling getpwuid_r and initgroups, and
it's unclear what the former would do with a uid of -1.
Currently, whenever somebody calls saferead() on nonblocking FD
(safewrite() is totally interchangeable for purpose of this message)
he might get wrong return value. For instance, in the first iteration
some data is read. The number of bytes read is stored into local
variable 'nread'. However, in next iterations we can get -1 from
read() with errno == EAGAIN, in which case the -1 is returned despite
fact some data has already been read. So the caller gets confused.
Bare read() should be used for nonblocking FD.
There's no need to do lots of readlink() calls to canonicalize
a name if we're only going to use stat() on it, since stat()
already chases symlinks.
* src/util/virutil.c (virGetDeviceID): Let stat() do the symlink
chasing.
"virGetDeviceID" could be used across the sources, but it doesn't
relate with this series, and could be done later.
* src/util/virutil.h: (Declare virGetDeviceID, and
vir{Get,Set}DeviceUnprivSGIO)
* src/util/virutil.c: (Implement virGetDeviceID and
vir{Get,Set}DeviceUnprivSGIO)
* src/libvirt_private.syms: Export private symbols of upper helpers