https://bugzilla.redhat.com/show_bug.cgi?id=1632833
When doing a SCSI passthrough we don't put format= onto the
command line. This causes qemu to probe the format automatically
which ends up in a warning in the domain log and possible qemu
disabling writes to the first block (according to the warning
message).
Based-on-work-of: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The @alloc object returned by virDomainResctrlVcpuMatch is not
properly referenced and un-referenced in virDomainCachetuneDefParse.
This patch fixes this problem.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The URI parser used by libvirt does not populate uri->path if the
trailing slash is missing. The code virStorageSourceParseBackingURI
would then not populate src->path.
As only NBD network disks are allowed to have the 'name' field in the
XML defining the disk source omitted we'd generate an invalid XML which
we'd not parse again.
Fix it by populating src->path with an empty string if the uri is
lacking slash.
As pointed out above NBD is special in this case since we actually allow
it being NULL. The URI path is used as export name. Since an empty
export does not make sense the new approach clears the src->path if the
trailing slash is present but nothing else.
Add test cases now to cover all the various cases for NBD and non-NBD
uris as there was to time only 1 test abusing the quirk witout slash for
NBD and all other URIs contained the slash or in case of NBD also the
export name.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The name is misleading. Change it to 'uristr' so that 'path' can be
reused in the proper context later.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
If the same source gets built twice ('build same source on different
hosts at different times') the resulting files may differ.
Fix this by sorting the hash keys before usage.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
There are couple of things wrong with the current implementation.
The first one is that in the first loop the code tries to build a
list of fuse.glusterfs mount points. Well, since the strings are
allocated in a temporary buffer and are not duplicated this
results in wrong decision made later in the code.
The second problem is that the code does not take into account
subtree mounts. For instance, if there's a fuse.gluster mounted
at /some/path and another FS mounted at /some/path/subdir the
code would not recognize this subdir mount.
Reported-by: Han Han <hhan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
If the given path is already a mount point (e.g. a bind mount of
a file, or simply a direct mount point of a FS), then our code
fails to detect that because the first thing it does is cutting
off part after last slash '/'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
On s390x the struct member f_type of statsfs is hard coded to 'unsigned
int'. Change virFileIsSharedFixFUSE() to take a 'long long int' and use
a temporary to avoid pointer-casting.
This fixes the following error:
../../src/util/virfile.c:3578:38: error: cast increases required alignment of target type [-Werror=cast-align]
virFileIsSharedFixFUSE(path, (long *) &sb.f_type);
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Bjoern Walk <bwalk@linux.ibm.com>
virFileReadValueUint does not log errors for non-existient files,
it merely returns -2.
Commit 12093f1 introduced this.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
-net name= will be deprecated in QEMU 3.1:
commit 101625a4d4ac7e96227a156bc5f6d21a9cc383cd
net: Deprecate the "name" parameter of -net
git describe: v3.0.0-791-g101625a4d4
Use the id option instead, supported since QEMU 1.2:
commit 6687b79d636cd60ed9adb1177d0d946b58fa7717
convert net_client_init() to OptsVisitor
git describe: v1.0-3564-g6687b79d63 contains: v1.2.0-rc0~142^2~8
Thankfully, libvirt only uses -net for non-PCI, non-virtio NICs
on ARM.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
We now explicitly handle media change elsewhere so we can drop the
switch statement. This will also make it more intuitive once CDROM
device hotplug might be supported.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Disk hotplug has slightly different semantics from media changing. Move
the media change code out and add proper initialization of the new
source object and proper cleanups if something fails.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The disk hotplug code also overloads media change which is not ideal.
This will allow splitting out of the media change code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The disk storage source needs to be prepared if we want to use -blockdev
or secrets for the new media image. It does not hurt to do the same for
the legacy hotplug code as well.
Unfortunately helpers like qemuDomainPrepareDiskSource take
virDomainDiskDef as an argument and it would be hard to fix them to take
an explicit source, so the function also temporarily replaces disk->src
for the new source in this function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Some functions require us to replace disk->src with the new source for
them to work properly. To avoid confusion all places which allow
explicit virStorageSource should get the appropriate definition.
The legacy code fortunately does not need anything from the old source
so that does not require modifications.
Blockdev does require the old definition so we'll pass it explicitly.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Since the code is also used when changing media we need to allow
specifying explicit source for which we are going to prepare. With this
change callers don't have to replace disk->src with the new source
definition for generating these.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
qemu media changing code tried to assume old media's format for the new
one if that was not specified. Since the format will always be present
it does not make sense to keep the code around.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Old media changing code does not bother setting up the secrets for new
media or actually removing/adding of the corresponding objects.
Additionally it uses secrets setup for the old image to be removed as
the secret for the new image which is wrong.
Remove the support for secrets while changing media for the legacy
approach. The only reasonable way to fix it is when using blockdev.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
While the idea was good the implementation not so much as we need to
take into account the old disk data and the new source. The code will be
consolidated later in a different way.
This reverts commit 663b1d55de.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Preparing the storage source prior to assigning the alias will not work
as the names of the certain objects depend on the alias for the legacy
hotplug case as we generate the object names for the secrets based on
the alias.
This reverts commit 192fdaa614.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
This enables to use both cgroup v1 and v2 at the same time together
with libvirt. It is supported by kernel and there is valid use-case,
not all controllers are implemented in cgroup v2 so there might be
configurations where administrator would enable these missing
controllers in cgroup v1.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In order to set CPU cfs period using cgroup v2 'cpu.max' interface
we need to load the current value of CPU cfs quota first because
format of 'cpu.max' interface is '$quota $period' and in order to
change 'period' we need to write 'quota' as well. Writing only one
number changes only 'quota'.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In cgroups v2 we need to handle threads and processes differently.
If you need to move a process you need to write its pid into
cgrou.procs file and it will move the process with all its threads
as well. The whole process will be moved if you use tid of any thread.
In order to move only threads at first we need to create threaded group
and after that we can write the relevant thread tids into cgroup.threads
file. Threads can be moved only into cgroups that are children of
cgroup of its process.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When creating cgroup hierarchy we need to enable controllers in the
parent cgroup in order to be usable. That means writing "+{controller}"
into cgroup.subtree_control file. We can enable only controllers that
are enabled for parent cgroup, that means we need to do that for the
whole cgroup tree.
Cgroups for threads needs to be handled differently in cgroup v2. There
are two types of controllers:
- domain controllers: these cannot be enabled for threads
- threaded controllers: these can be enabled for threads
In addition there are multiple types of cgroups:
- domain: normal cgroup
- domain threaded: a domain cgroup that serves as root for threaded
cgroups
- domain invalid: invalid cgroup, can be changed into threaded, this
is the default state if you create subgroup inside
domain threaded group or threaded group
- threaded: threaded cgroup which can have domain threaded or
threaded as parent group
In order to create threaded cgroup it's sufficient to write "threaded"
into cgroup.type file, it will automatically make parent cgroup
"domain threaded" if it was only "domain". In case the parent cgroup
is already "domain threaded" or "threaded" it will modify only the type
of current cgroup. After that we can enable threaded controllers.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Cgroup v2 has only single mount point for all controllers. The list
of controllers is stored in cgroup.controllers file, name of controllers
are separated by space.
In cgroup v2 there is no cpuacct controller, the cpu.stat file always
exists with usage stats.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If the placement was copied from parent or set to absolute path
there is nothing to do, otherwise set the placement based on
process placement from /proc/self/cgroup or /proc/{pid}/cgroup.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
When reconnecting to a domain we are validating the cgroup name.
In case of cgroup v2 we need to validate only the new format for host
without systemd '{machinename}.libvirt-{drivername}' or scope name
generated by systemd.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We cannot detect only mount points to figure out whether cgroup v2
is available because systemd uses cgroup v2 for process tracking and
all controllers are mounted as cgroup v1 controllers.
To make sure that this is no the situation we need to check
'cgroup.controllers' file if it's not empty to make sure that cgroup
v2 is not mounted only for process tracking.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Place cgroup v2 backend type before cgroup v1 to make it obvious
that cgroup v2 is preferred implementation.
Following patches will introduce support for hybrid configuration
which will allow us to use both at the same time, but we should
prefer cgroup v2 regardless.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
My commit d6b8838 fixed the uid:gid for the pre-created UNIX sockets
but did not account for the different umask of libvirtd and QEMU.
Since commit 0e1a1a8c we set umask to '0002' for the QEMU process.
Manually tune-up the permissions to match what we would have gotten
if QEMU had created the socket.
https://bugzilla.redhat.com/show_bug.cgi?id=1633389
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1632711
GlusterFS is typically safe when it comes to migration. It's a
network FS after all. However, it can be mounted via FUSE driver
they provide. If that is the case we fail to identify it and
think migration is not safe and require VIR_MIGRATE_UNSAFE flag.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
In commit v4.7.0-168-g993d85ae5e I introduced two Icelake CPU models,
but failed to actually include them in the CPU map index.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We switched to opening mode='bind' sockets ourselves:
commit 30fb2276d8
qemu: support passing pre-opened UNIX socket listen FD
in v4.5.0-rc1~251
Then fixed qemuBuildChrChardevStr to change libvirtd's label
while creating the socket:
commit b0c6300fc4
qemu: ensure FDs passed to QEMU for chardevs have correct SELinux labels
v4.5.0-rc1~52
Also add labeling of these sockets to the DAC driver.
Instead of duplicating the logic which decides whether libvirt should
pre-create the socket, assume an existing path meaning that it was created
by libvirt.
https://bugzilla.redhat.com/show_bug.cgi?id=1633389
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This function updates the used QEMU capabilities of @vm by querying
the QEMU capabilities cache.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Move the fetch of @ipAddrLeft to after the goto skip_instantiate
and remove the (req->binding) guard since we know that as long
as req->binding is created, then req->threadkey is filled in.
Found by Coverity
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Commit 87a8a30d6 added the function based on the virsh function,
but used an unsigned long long instead of a double and thus that
limits the maximum result.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
When libxlDomainMigrationDstPrepare adds the @args to an
virNetSocketAddIOCallback using libxlMigrateDstReceive as
the target of the virNetSocketIOFunc @func with the knowledge
that the libxlMigrateDstReceive will virObjectUnref @args
at the end thus not needing to Unref during normal processing
for libxlDomainMigrationDstPrepare.
However, Coverity believes there's an issue with this. The
problem is there can be @nsocks virNetSocketAddIOCallback's
added, but only one virObjectUnref. That means the first
one done will Unref and the subsequent callers may not get
the @args (or @opaque) as they expected. If there's only
one socket returned from virNetSocketNewListenTCP, then sure
that works. However, if it returned more than one there's
going to be a problem.
To resolve this, since we start with 1 reference from the
virObjectNew for @args, we will add 1 reference for each
time @args is used for virNetSocketAddIOCallback. Then
since libxlDomainMigrationDstPrepare would be done with
@args, move it's virObjectUnref from the error: label to
the done: label (since error: falls through). That way
once the last IOCallback is done, then @args will be freed.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Remove the "!params" check from the condition since it's possible
someone could pass a non NULL value there, but a 0 for the nparams
and thus continue on. The external API only checks if @nparams is
non-zero, then check for NULL @params.
Found by Coverity
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
The pointer related to uml_driver needs to be checked before its usage
inside the function. Some attributes of the driver are being accessed
while the pointer is NULL considering the current logic.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This reverts commit 1602aa28f8.
There is no need to call virCgroupRemove() nor virCgroupFree() if
virCgroupEnableMissingControllers() fails because it will not modify
'group' at all.
The cleanup of directories is done in virCgroupMakeGroup().
Reviewed-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Turns out, there are couple of bugs that prevent this feature
from being operational. Given how close to the release we are
disable the feature temporarily. Hopefully, it can be enabled
back after all the bugs are fixed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Cgroups are linux specific and we need to make sure that the code is
compiled only on linux. On different OSes it fails the compilation:
../../src/util/vircgroupv1.c:65:19: error: variable has incomplete type 'struct mntent'
struct mntent entry;
^
../../src/util/vircgroupv1.c:65:12: note: forward declaration of 'struct mntent'
struct mntent entry;
^
../../src/util/vircgroupv1.c:74:12: error: implicit declaration of function 'getmntent_r' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
while (getmntent_r(mounts, &entry, buf, sizeof(buf)) != NULL) {
^
../../src/util/vircgroupv1.c:814:39: error: use of undeclared identifier 'MS_NOSUID'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:814:49: error: use of undeclared identifier 'MS_NODEV'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:814:58: error: use of undeclared identifier 'MS_NOEXEC'
if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
^
../../src/util/vircgroupv1.c:841:65: error: use of undeclared identifier 'MS_BIND'
if (mount(src, group->legacy[i].mountPoint, "none", MS_BIND,
^
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
All the system headers are used only if we are compiling on linux
and they all are present otherwise we would have seen build errors
because in our tests/vircgrouptest.c we use only __linux__ to check
whether to skip the cgroup tests or not.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
tests/vircgrouptest.c uses #ifdef __linux__ for a long time and no
failure was reported so far so it's safe to assume that __linux__ is
good enough to guard cgroup code.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Let's ignore the checking of interface type when we call the function
qemuARPGetInterfaces to get IP from host's arp table.
Signed-off-by: Lin Ma <lma@suse.com>
Reviewed-by: Chen Hanxiao <chenhanxiao@gmail.com>
It may happen that in the list of paths/disk sources to relabel
there is a disk source. If that is the case, the path is NULL. In
that case, we shouldn't try to lock the path. It's likely a
network disk anyway and therefore there is nothing to lock.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
This shouldn't be needed per-se. Security manager shouldn't
disappear during transactions - it's immutable. However, it
doesn't hurt to grab a reference either - transaction code uses
it after all.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This array is allocated in virSecuritySELinuxContextListAppend()
but never freed. This commit is essentially the same as ca25026.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Use the mnemonic macros of libdbus for 1 (TRUE) and 0 (FALSE).
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Report a debug message if dbus_watch_handle() returns FALSE.
dbus_watch_handle() returns FALSE if there wasn't enough memory for
reading or writing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
As documented at
https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga2522ac5075dfe0a1535471f6e045e1ee
the creator of a non-shared D-Bus connection has to release the last
reference after closing for freeing.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Bjoern Walk <bwalk@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Stefan Zimmermann <stzi@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Grab a ref for info->bus (a DBus connection) as long as the while loop
is running. With the grabbed reference it is ensured that info->bus
isn't freed as long as the while loop is executed. This is necessary
as it's allowed to drop the last ref for the bus connection in a
handler.
There was already a bug of this kind in libdbus itself:
https://bugs.freedesktop.org/show_bug.cgi?id=15635.
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The only place where VIR_DOMAIN_EVENT_RESUMED should be generated is the
RESUME event handler to make sure we don't generate duplicate events or
state changes. In the worse case the duplicity can revert or cover
changes done by other event handlers.
For example, after QEMU sent RESUME, BLOCK_IO_ERROR, and STOP events
we could happily mark the domain as running and report
VIR_DOMAIN_EVENT_RESUMED to registered clients.
https://bugzilla.redhat.com/show_bug.cgi?id=1612943
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Thanks to the previous commit the RESUME event handler knows what reason
should be used when changing the domain state to VIR_DOMAIN_RUNNING, but
the emitted VIR_DOMAIN_EVENT_RESUMED event still uses a generic
VIR_DOMAIN_EVENT_RESUMED_UNPAUSED detail. Luckily, the event detail can
be easily deduced from the running reason, which saves us from having to
pass one more value to the handler.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Whenever we get the RESUME event from QEMU, we change the state of the
affected domain to VIR_DOMAIN_RUNNING with VIR_DOMAIN_RUNNING_UNPAUSED
reason. This is fine if the domain is resumed unexpectedly, but when we
sent "cont" to QEMU we usually have a better reason for the state
change. The better reason is used in qemuProcessStartCPUs which also
sets the domain state to running if qemuMonitorStartCPUs reports
success. Thus we may end up with two state updates in a row, but the
final reason is correct.
This patch is a preparation for dropping the state change done in
qemuMonitorStartCPUs for which we need to pass the actual running reason
to the RESUME event handler and use it there instead of
VIR_DOMAIN_RUNNING_UNPAUSED.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This patch replaces some rather generic VIR_DOMAIN_RUNNING_UNPAUSED
reasons when changing domain state to running with more specific ones.
All of them are done when libvirtd reconnects to an existing domain
after being restarted and sees an unfinished migration or save.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT was defined but not used anywhere
in our event generation code. This fixes qemuDomainRevertToSnapshot to
properly report why the domain was resumed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
A deadlock situation can occur when autostarting a LXC domain 'guest'
due to two threads attempting to take opposing locks while holding
opposing locks (AB BA problem). Thread A takes and holds the 'vm' lock
while attempting to take the 'client' lock, meanwhile, thread B takes
and holds the 'client' lock while attempting to take the 'vm' lock.
The potential for this can be seen as follows:
Thread A:
virLXCProcessAutostartDomain (takes vm lock)
--> virLXCProcessStart
--> virLXCProcessConnectMonitor
--> virLXCMonitorNew
--> virNetClientSetCloseCallback (wants client lock)
Thread B:
virNetClientIncomingEvent (takes client lock)
--> virNetClientIOHandleInput
--> virNetClientCallDispatch
--> virNetClientCallDispatchMessage
--> virNetClientProgramDispatch
--> virLXCMonitorHandleEventInit
--> virLXCProcessMonitorInitNotify (wants vm lock)
Since these threads are scheduled independently and are preemptible it
is possible for the deadlock scenario to occur where each thread locks
their first lock but both will fail to get their second lock and just
spin forever. You get something like:
virLXCProcessAutostartDomain (takes vm lock)
--> virLXCProcessStart
--> virLXCProcessConnectMonitor
--> virLXCMonitorNew
<...>
virNetClientIncomingEvent (takes client lock)
--> virNetClientIOHandleInput
--> virNetClientCallDispatch
--> virNetClientCallDispatchMessage
--> virNetClientProgramDispatch
--> virLXCMonitorHandleEventInit
--> virLXCProcessMonitorInitNotify (wants vm lock but spins)
<...>
--> virNetClientSetCloseCallback (wants client lock but spins)
Neither thread ever gets the lock it needs to be able to continue
while holding the lock that the other thread needs.
The actual window for preemption which can cause this deadlock is
rather small, between the calls to virNetClientProgramNew() and
execution of virNetClientSetCloseCallback(), both in
virLXCMonitorNew(). But it can be seen in real world use that this
small window is enough.
By moving the call to virNetClientSetCloseCallback() ahead of
virNetClientProgramNew() we can close any possible chance of the
deadlock taking place. There should be no other implications to the
move since the close callback (in the unlikely event was called) will
spin on the vm lock. The remaining work that takes place between the
old call location of virNetClientSetCloseCallback() and the new
location is unaffected by the move.
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
With the introduction of cgroup v2 there are new names used with
cgroups based on which version is used:
- legacy: cgroup v1
- unified: cgroup v2
- hybrid: cgroup v1 and cgroup v2
Let's use 'legacy' instead of 'cgroupv1' or 'controllers' in our code.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
They all need virCgroupV1GetMemoryUnlimitedKB() so it's easier to
move them in one commit.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We need to update one test-case because now new cgroup object will be
created only if there is any cgroup backend available.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We will need to extract current cgroup v1 implementation into separate
backend because there will be new cgroup v2 implementation and both will
have to co-exist.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
This will be required once cgroup v2 is introduced. The cgroup
detection is not simple and we will have multiple backends so we
should not just jump into the middle of the detection code.
In order to use virCgroupNewSelf we need to create all the remaining
data files:
- {name}.cgroups represents /proc/cgroups, it is a list of cgroup
controllers compiled into kernel
- {name}.self.cgroup represents /proc/self/cgroup, it describes
cgroups to which the process belongs
For "no-cgroups" we need to modify the expected behavior because
virCgroupNewSelf() will fail if there are no controllers available.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Because we can set which files to return for cgroup tests there
is no need to have special function tailored to run tests.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Once we introduce cgroup v2 support we need to handle processes and
threads differently.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Use flags in virCgroupAddTaskInternal instead of boolean parameter.
Following patch will add new flag to indicate thread instead of process.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
In cgroup v2 we need to handle processes and threads differently,
following patch will introduce virCgroupAddThread.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If we are on host with systemd we need to build cgroup hierarchy
ourselves for controllers that are not managed by systemd.
As a starting parent we need to force root group because
virCgroupMakeGroup() takes that parent in order to inherit values
for cpuset controller.
By default cpuset controller is managed by systemd so we will never
hit the issue but for v2 cgroups we need to use parent cgroup every
time.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
If virCgroupEnableMissingControllers() fails it could have already
created some directories, we should clean it up as well.
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>