Fix libvirtd hang since fork() was called while another thread had
security manager locked.
We have the stack security driver, which internally manages other security drivers,
just call them "top" and "nested".
We call virSecurityStackPreFork() to lock the top one, and it also locks
and then unlocks the nested drivers prior to fork. Then in qemuSecurityPostFork(),
it unlocks the top one, but not the nested ones. Thus, if one of the nested
drivers ("dac" or "selinux") is still locked, it will cause a deadlock. If we always
surround nested locks with top lock, it is always secure. Because we have got top lock
before fork child libvirtd.
However, it is not always the case in the current code, We discovered this case:
the nested list obtained through the qemuSecurityGetNested() will be locked directly
for subsequent use, such as in virQEMUDriverCreateCapabilities(), where the nested list
is locked using qemuSecurityGetDOI, but the top one is not locked beforehand.
The problem stack is as follows:
libvirtd thread1 libvirtd thread2 child libvirtd
| | |
| | |
virsh capabilities qemuProcessLanuch |
| | |
| lock top |
| | |
lock nested | |
| | |
| fork------------------->|(nested lock held by thread1)
| | |
| | |
unlock nested unlock top unlock top
|
|
qemuSecuritySetSocketLabel
|
|
lock nested (deadlock)
In this commit, we ensure that the top lock is acquired before the nested lock,
so during fork, it's not possible for another task to acquire the nested lock.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1303031
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
While glibc provides qsort(), which usually is just a mergesort,
until sorting arrays so huge that temporary array used by
mergesort would not fit into physical memory (which in our case
is never), we are not guaranteed it'll use mergesort. The
advantage of mergesort is clear - it's stable. IOW, if we have an
array of values parsed from XML, qsort() it and produce some
output based on those values, we can then compare the output with
some expected output, line by line.
But with newer glibc this is all history. After [1], qsort() is
no longer mergesort but introsort instead, which is not stable.
This is suboptimal, because in some cases we want to preserve
order of equal items. For instance, in ebiptablesApplyNewRules(),
nwfilter rules are sorted by their priority. But if two rules
have the same priority, we want to keep them in the order they
appear in the XML. Since it's hard/needless work to identify
places where stable or unstable sorting is needed, let's just
play it safe and use stable sorting everywhere.
Fortunately, glib provides g_qsort_with_data() which indeed
implement mergesort and it's a drop in replacement for qsort(),
almost. It accepts fifth argument (pointer to opaque data), that
is passed to comparator function, which then accepts three
arguments.
We have to keep one occurance of qsort() though - in NSS module
which deliberately does not link with glib.
1: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=03bf8357e8291857a435afcc3048e0b697b6cc04
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Normally when a child process is started by libvirt, the SELinux label
of that process is set to virtd_t (plus an MCS range). In at least one
case (passt) we need for the SELinux label of a child process label to
match the label that the binary would have transitioned to
automatically if it had been run standalone (in the case of passt,
that label is passt_t).
This patch modifies virSecuritySELinuxSetChildProcessLabel() (and all
the functions above it in the call chain) so that the toplevel
function can set a new argument "useBinarySpecificLabel" to true. If
it is true, then virSecuritySELinuxSetChildProcessLabel() will call
the new function virSecuritySELinuxContextSetFromFile(), which uses
the selinux library function security_compute_create() to determine
what would be the label of the new process if it had been run
standalone (rather than being run by libvirt) - the MCS range from the
normally-used label is added to this newly derived label, and that is
what is used for the new process rather than whatever is in the
domain's security label (which will usually be virtd_t).
In order to easily verify that nothing was broken by these changes to
the call chain, all callers currently set useBinarySpecificPath =
false, so all behavior should be completely unchanged. (The next
patch will set it to true only for the case of running passt.)
https://bugzilla.redhat.com/2172267
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The virSecurityDomainSetTPMLabels() and
virSecurityDomainRestoreTPMLabels() APIs set/restore label on two
files/directories:
1) the TPM state (tpm->data.emulator.storagepath), and
2) the TPM log file (tpm->data.emulator.logfile).
Soon there will be a need to set the label on the log file but
not on the state. Therefore, extend these APIs for a boolean flag
that when set does both, but when unset does only 2).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This change was generated using the following spatch:
@ rule1 @
expression a;
identifier f;
@@
<...
- f(*a);
... when != a;
- *a = NULL;
+ g_clear_pointer(a, f);
...>
@ rule2 @
expression a;
identifier f;
@@
<...
- f(a);
... when != a;
- a = NULL;
+ g_clear_pointer(&a, f);
...>
Then, I left some of the changes out, like tools/nss/ (which
doesn't link with glib) and put back a comment in
qemuBlockJobProcessEventCompletedActiveCommit() which coccinelle
decided to remove (I have no idea why).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Attaching a newly created vhostuser port to a VM fails due to an
apparmor denial
internal error: unable to execute QEMU command 'chardev-add': Failed
to bind socket to /run/openvswitch/vhu838c4d29-c9: Permission denied
In the case of a net device type VIR_DOMAIN_NET_TYPE_VHOSTUSER, the
underlying chardev is not labeled in qemuDomainAttachNetDevice prior
to calling qemuMonitorAttachCharDev.
A simple fix would be to call qemuSecuritySetChardevLabel using the
embedded virDomainChrSourceDef in the virDomainNetDef vhostuser data,
but this incurs the risk of incorrectly restoring the label. E.g.
consider the DAC driver behavior with a vhostuser net device, which
uses a socket for the chardev backend. The DAC driver uses XATTRS to
store original labelling information, but XATTRS are not compatible
with sockets. Without the original labelling information, the socket
labels will be restored with root ownership, preventing other
less-privileged processes from connecting to the socket.
This patch avoids overloading chardev labelling with vhostuser net
devices by introducing virSecurityManager{Set,Restore}NetdevLabel,
which is currently only implemented for the apparmor driver. The
new APIs are then used to set and restore labels for the vhostuser
net devices.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Use virAppendElement instead of virInsertElementsN to implement
VIR_APPEND_ELEMENT which allows us to remove error handling as the
only relevant errors were removed when switching to aborting memory
allocation functions.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In a few places it may happen that the array we want to sort is
still NULL (e.g. because there were no leases found, no paths for
secdriver to lock or no cache banks). However, passing NULL to
qsort() is undefined and even though glibc plays nicely we
shouldn't rely on undefined behaviour.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Tim Wiederhake <twiederh@redhat.com>
Historically, we declared pointer type to our types:
typedef struct _virXXX virXXX;
typedef virXXX *virXXXPtr;
But usefulness of such declaration is questionable, at best.
Unfortunately, we can't drop every such declaration - we have to
carry some over, because they are part of public API (e.g.
virDomainPtr). But for internal types - we can do drop them and
use what every other C project uses 'virXXX *'.
This change was generated by a very ugly shell script that
generated sed script which was then called over each file in the
repository. For the shell script refer to the cover letter:
https://listman.redhat.com/archives/libvir-list/2021-March/msg00537.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The argument (if not NULL) points to the file the domain is
restoring from. On QEMU command line this used to be '-incoming
$path', but we've switched to passing FD ages ago and thus this
argument is used only in AppArmor (which loads the profile on
domain start). Anyway, the argument does not refer to stdin,
rename it to 'incomingPath' then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
These APIs were removed/renamed in v6.5.0-rc1~142 and v6.5.0-rc1~141
because they deemed unused. And if it wasn't for the RFE [1] things
would stay that way.
The RFE asks for us to not change DAC ownership on the file a domain is
restoring from. We have been doing that for ages (if not forever),
nevertheless it's annoying because if the restore file is on an NFS
remembering owner won't help - NFS doesn't support XATTRs yet. But more
importantly, there is no need for us to chown() the file because when
restoring the domain the file is opened and the FD is then passed to
QEMU. Therefore, we really need only to set SELinux and AppArmor.
This reverts bd22eec903976c5c51b1d00e335c315699e5acd6.
This partially reverts 4ccbd207f213066c000f43eb544eb00ec745023b.
The difference to the original code is that secdrivers are now
not required to provide dummy implementation to avoid
virReportUnsupportedError(). The callback is run if it exists, if
it doesn't zero is returned without any error.
1: https://bugzilla.redhat.com/show_bug.cgi?id=1851016
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
When locking files for metadata change, we open() them for R/W
access. The write access is needed because we want to acquire
exclusive (write) lock (to mutually exclude with other daemons
trying to modify XATTRs on the same file). Anyway, the open()
might fail if the file lives on a RO filesystem. Well, if that's
the case, ignore the error and continue with the next file on the
list. We won't change any seclabel on the file anyway - there is
nothing to remember then.
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In the light of recent commit of 9d83281382 fix the comment that
says directories can't be locked. Well, in general they can, but
not in our case.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The new name is virSecurityManagerDomainRestorePathLabel().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
After previous commit this function is used no more.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The AppArmor secdriver does not use labels to grant access to
resources. Therefore, it doesn't use XATTRs and hence it lacks
implementation of .domainMoveImageMetadata callback. This leads
to a harmless but needless error message appearing in the logs:
virSecurityManagerMoveImageMetadata:476 : this function is not
supported by the connection driver: virSecurityManagerMoveImageMetadata
Closes: https://gitlab.com/libvirt/libvirt/-/issues/25
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This API allows drivers to separate out handling of @stdin_path
of virSecurityManagerSetAllLabel(). The thing is, the QEMU driver
uses transactions for virSecurityManagerSetAllLabel() which
relabels devices from inside of domain's namespace. This is what
we usually want. Except when resuming domain from a file. The
file is opened before any namespace is set up and the FD is
passed to QEMU to read the migration stream from. Because of
this, the file lives outside of the namespace and if it so
happens that the file is a block device (i.e. it lives under
/dev) its copy will be created in the namespace. But the FD that
is passed to QEMU points to the original living in the host and
not in the namespace. So relabeling the file inside the namespace
helps nothing.
But if we have a separate API for relabeling the restore file
then the QEMU driver can continue calling
virSecurityManagerSetAllLabel() with transactions enabled and
call this new API without transactions.
We already have an API for relabeling a single file
(virSecurityManagerDomainSetPathLabel()) but in case of SELinux
it uses @imagelabel (which allows RW access) and we want to use
@content_context (which allows RO access).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
The way that our file locking works is that we open() the file we
want to lock and then use fcntl(fd, F_SETLKW, ...) to lock it.
The problem is, we are doing all of these as root which doesn't
work if the file lives on root squashed NFS, because if it does
then the open() fails. The way to resolve this is to make this a
non fatal error and leave callers deal with this (i.e. disable
remembering) - implemented in the previous commit.
https://bugzilla.redhat.com/show_bug.cgi?id=1804672
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
There are some cases where we want to remember the original owner
of a file but we fail to lock it for XATTR change (e.g. root
squashed NFS). If that is the case we error out and refuse to
start a domain. Well, we can do better if we disable remembering
for paths we haven't locked successfully.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
So far, in the lock state we are storing only the file
descriptors of the files we've locked. Therefore, when unlocking
them and something does wrong the only thing we can report is FD
number, which is not user friendly at all. But if we store paths
among with FDs we can do better error reporting.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The S_ISSOCK macro is not available on Windows platforms.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Currently, when security driver is not available users are informed that
it wasn't found which can be confusing.
1. Update error message
2. Add comment to domain doc
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by Sebastian Mitterle <smitterl@redhat.com>
Use G_GNUC_UNUSED from GLib instead of ATTRIBUTE_UNUSED.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This function returns the name of the secdriver. Since the name
is invariant we don't really need to lock the manager - it won't
change.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
This function is in fact returning the name of the virtualization
driver that registered the security manager/driver.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
In upcoming commits, virSecurityManagerSetAllLabel() will perform
rollback in case of failure by calling
virSecurityManagerRestoreAllLabel(). But in order to do that, the
former needs to have @migrated argument so that it can be passed
to the latter.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
The usleep function was missing on older mingw versions, but we can rely
on it existing everywhere these days. It may only support times upto 1
second in duration though, so we'll prefer to use g_usleep instead.
The commandhelper program is not changed since that can't link to glib.
Fortunately it doesn't need to build on Windows platforms either.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
If there are two paths on the list that are the same we need to
lock it only once. Because when we try to lock it the second time
then open() fails. And if it didn't, locking it the second time
would fail for sure. After all, it is sufficient to lock all
paths just once satisfy the caller.
Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Document why we need to sort paths while it's still fresh in my
memory.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The purpose of this API is to allow caller move XATTRs (or remove
them) from one file to another. This will be needed when moving
top level of disk chain (either by introducing new HEAD or
removing it).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This reverts commit a5e16020907e91bca1b0ab6c4ee5dbbdcccf6a54.
Getting rid of unistd.h from our headers will require more work than
just fixing the broken mingw build. Revert it until I have a more
complete proposal.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
util/virutil.h bogously included unistd.h. Drop it and replace it by
including it directly where needed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Now that we have replacement in the form of the image labeling function
we can drop the unnecessary functions by replacing all callers.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Security labeling of disks consists of labeling of the disk image
itself and it's backing chain. Modify
virSecurityManager[Set|Restore]ImageLabel to take a boolean flag that
will label the full chain rather than the top image itself.
This allows to delete/unify some parts of the code and will also
simplify callers in some cases.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
In many files there are header comments that contain an Author:
statement, supposedly reflecting who originally wrote the code.
In a large collaborative project like libvirt, any non-trivial
file will have been modified by a large number of different
contributors. IOW, the Author: comments are quickly out of date,
omitting people who have made significant contribitions.
In some places Author: lines have been added despite the person
merely being responsible for creating the file by moving existing
code out of another file. IOW, the Author: lines give an incorrect
record of authorship.
With this all in mind, the comments are useless as a means to identify
who to talk to about code in a particular file. Contributors will always
be better off using 'git log' and 'git blame' if they need to find the
author of a particular bit of code.
This commit thus deletes all Author: comments from the source and adds
a rule to prevent them reappearing.
The Copyright headers are similarly misleading and inaccurate, however,
we cannot delete these as they have legal meaning, despite being largely
inaccurate. In addition only the copyright holder is permitted to change
their respective copyright statement.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This reverts commit 3e26b476b5f322353bf0dcd8e3f037ca672b8c62.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Trying to use virlockd to lock metadata turns out to be too big
gun. Since we will always spawn a separate process for relabeling
we are safe to use thread unsafe POSIX locks and take out
virtlockd completely out of the picture.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When metadata locking is enabled that means the security commit
processing will be run in a fork similar to how namespaces use fork()'s
for processing. This is done to ensure libvirt can properly and
synchronously modify the metadata to store the original owner data.
Since fork()'s (e.g. virFork) have been seen as a performance bottleneck
being able to disable them allows the admin to choose whether the
performance 'hit' is worth the extra 'security' of being able to
remember the original owner of a lock.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Two new APIs are added so that security driver can lock and
unlock paths it wishes to touch. These APIs are not for other
drivers to call but security drivers (DAC and SELinux). That is
the reason these APIs are not exposed through our
libvirt_private.syms file.
Three interesting things happen in this commit. The first is the
global @lockManagerMutex. Unfortunately, this has to exist so that
there is only one thread talking to virtlockd at a time. If there
were more threads and one of them closed the connection
prematurely, it would cause virtlockd killing libvirtd. Instead
of complicated code that would handle that, let's have a mutex
and keep the code simple.
The second interesting thing is keeping connection open between
lock and unlock API calls. This is achieved by duplicating client
FD and keeping it open until unlock is called. This trick is used
by regular disk content locking code when the FD is leaked to
qemu.
Finally, the third thing is polling implemented at client side.
Since virtlockd has only one thread that handles locking
requests, all it can do is either acquire lock or error out.
Therefore, the polling has to be implemented in client. The
polling is capped at 60 second timeout, which should be plenty
since the metadata lock is held only for a fraction of a second.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Now that we know what metadata lock manager user wishes to use we
can load it when initializing security driver. This is achieved
by adding new argument to virSecurityManagerNewDriver() and
subsequently to all functions that end up calling it.
The cfg.mk change is needed in order to allow lock_manager.h
inclusion in security driver without 'syntax-check' complaining.
This is safe thing to do as locking APIs will always exist (it's
only backend implementation that changes). However, instead of
allowing the include for all other drivers (like cpu, network,
and so on) allow it only for security driver. This will still
trigger the error if including from other drivers.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
It will be desirable to run transactions more often than we
currently do. Even if the domain we're relabeling the paths for
does not run in a namespace. If that's the case, there is no need
to fork() as we are already running in the right namespace. To
differentiate whether transaction code should fork() or not the
@pid argument now accepts -1 (which means do not fork).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The virSecurityStackAddNested() can fail in which case
virSecurityManagerNewStack() should fail too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>