qemu: Signal domain condition in qemuProcessStop a bit later

Signaling the condition before vm->def->id is reset to -1 is dangerous:
in case a waiting thread wakes up, it does not see anything interesting
(the domain is still marked as running) and just enters virDomainObjWait
where it waits forever because the condition will never be signalled
again.

Originally it was impossible to get into such a situation because the vm
object was locked the whole time between signaling the condition and
resetting vm->def->id. Since commit 860a999802 (released in 6.8.0),
however, qemuProcessStop calls qemuDomainObjStopWorker between
virDomainObjBroadcast and setting vm->def->id to -1, and that call
unlocks the vm object, giving other threads a chance to wake up and
possibly hang.

In the real world, this can be easily reproduced by killing, destroying,
or just shutting down (from the guest OS) a domain while it is being
migrated somewhere else. The migration job would never finish.

So let's make sure we delay signaling the domain condition to the point
when a woken up thread can detect the domain is not active anymore.
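
The ordering described above can be sketched outside libvirt with plain
POSIX threads. This is only an illustration under stated assumptions:
struct demo_domain and the demo_* functions are hypothetical stand-ins,
not libvirt's virDomainObj, virDomainObjWait, or virDomainObjBroadcast.
The waiter re-checks its predicate after every wakeup, and the stopping
side changes the state before it broadcasts, so the woken thread can see
that the domain is no longer active.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the domain object; not libvirt's virDomainObj. */
struct demo_domain {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int id;             /* -1 means "not active", mirroring vm->def->id */
    bool waiter_ready;  /* set once the waiter is about to start waiting */
    int seen_id;        /* id the waiter observed when it stopped waiting */
};

/* Waiter: sleeps until the domain is no longer active. */
static void *demo_waiter(void *opaque)
{
    struct demo_domain *dom = opaque;

    pthread_mutex_lock(&dom->lock);
    dom->waiter_ready = true;
    pthread_cond_broadcast(&dom->cond);

    /* Re-check the predicate after every wakeup. A wakeup that arrives
     * while id is still set simply sends us back to sleep. */
    while (dom->id != -1)
        pthread_cond_wait(&dom->cond, &dom->lock);

    dom->seen_id = dom->id;
    pthread_mutex_unlock(&dom->lock);
    return NULL;
}

/* Stopper: change the state first, broadcast afterwards. */
static void demo_stop(struct demo_domain *dom)
{
    pthread_mutex_lock(&dom->lock);

    /* Demo only: make sure the waiter really is waiting. */
    while (!dom->waiter_ready)
        pthread_cond_wait(&dom->cond, &dom->lock);

    dom->id = -1;                        /* mark the domain inactive ...  */
    pthread_cond_broadcast(&dom->cond);  /* ... and only then wake waiters */
    pthread_mutex_unlock(&dom->lock);
}

/* Runs one waiter and one stopper; returns the id the waiter saw,
 * or -2 if the waiter thread could not be created. */
int run_stop_demo(void)
{
    struct demo_domain dom;
    pthread_t thread;
    int seen;

    pthread_mutex_init(&dom.lock, NULL);
    pthread_cond_init(&dom.cond, NULL);
    dom.id = 42;
    dom.waiter_ready = false;
    dom.seen_id = 0;

    if (pthread_create(&thread, NULL, demo_waiter, &dom) != 0) {
        pthread_cond_destroy(&dom.cond);
        pthread_mutex_destroy(&dom.lock);
        return -2;
    }

    demo_stop(&dom);
    pthread_join(thread, NULL);
    seen = dom.seen_id;

    pthread_cond_destroy(&dom.cond);
    pthread_mutex_destroy(&dom.lock);
    return seen;
}
```

Swapping the two statements in demo_stop and dropping the lock between
them would recreate the window described in this commit message: the
waiter could wake up, still see the old id, and go back to sleep with no
further broadcast coming.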

https://bugzilla.redhat.com/show_bug.cgi?id=1949869

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Jiri Denemark 2021-07-16 15:52:50 +02:00
parent 370ac3d25c
commit 364995ed57

@@ -7852,9 +7852,6 @@ void qemuProcessStop(virQEMUDriver *driver,
     if (g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCallback)
         driver->inhibitCallback(false, driver->inhibitOpaque);
 
-    /* Wake up anything waiting on domain condition */
-    virDomainObjBroadcast(vm);
-
     if ((timestamp = virTimeStringNow()) != NULL) {
         qemuDomainLogAppendMessage(driver, vm, "%s: shutting down, reason=%s\n",
                                    timestamp,
@@ -7925,6 +7922,9 @@ void qemuProcessStop(virQEMUDriver *driver,
 
     vm->def->id = -1;
 
+    /* Wake up anything waiting on domain condition */
+    virDomainObjBroadcast(vm);
+
     virFileDeleteTree(priv->libDir);
     virFileDeleteTree(priv->channelTargetDir);
 