qemu: Fix hang when migration is canceled at the last moment

When a migration is canceled very late once virtual CPUs are already stopped, QEMU will automatically resume them. If this happens after we exited a waiting loop in qemuMigrationSrcWaitForCompletion, but before a loop that tries to make sure CPUs are stopped by waiting for the appropriate event, we may end up waiting forever because the CPUs are running (they were resumed by migrate_cancel), but the STOP event is already gone.

This is possible because we enter monitor for fetching migration statistics at which point other APIs can be processed and migration may change its state. We should recheck the state when we get back from the monitor code.

https://issues.redhat.com/browse/RHEL-52493

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
2025-01-20 16:28:20 +01:00
commit 0ca8d870a2
parent ab10c0695d
1 changed file with 7 additions and 0 deletions
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2169,6 +2169,13 @@ qemuMigrationSrcWaitForCompletion(virDomainObj *vm,

    ignore_value(qemuMigrationAnyFetchStats(vm, asyncJob, jobData, NULL));

+    /* We need to recheck migration status here as it might have changed while
+     * we were fetching statistics. For example, the migration might have been
+     * canceled.
+     */
+    if ((rv = qemuMigrationAnyCompleted(vm, asyncJob, dconn, flags)) < 0)
+        return rv;
+
    qemuDomainJobDataUpdateTime(jobData);
    qemuDomainJobDataUpdateDowntime(jobData);
    g_clear_pointer(&vm->job->completed, virDomainJobDataFree);
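
The race described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `FakeDomain`, `MigState`, `fake_fetch_stats`, `fake_check_completed`, and `finish_migration` are hypothetical names invented for illustration and are not libvirt APIs, and the return codes are made up. The model assumes that a "monitor call" is a point where another API may cancel the migration, and it represents the would-be hang as a distinct return value instead of actually blocking.

```c
#include <stdbool.h>

/* Hypothetical, simplified migration states; not libvirt's real enum. */
typedef enum {
    MIG_ACTIVE,
    MIG_COMPLETED,
    MIG_CANCELLED
} MigState;

/* Hypothetical stand-in for the domain object. */
typedef struct {
    MigState state;
    bool cpus_running;
    bool cancel_during_fetch; /* simulate a cancel arriving mid-fetch */
} FakeDomain;

/* Stand-in for a monitor call. While it runs, other APIs may be
 * processed, so the migration state can change underneath the caller. */
static void fake_fetch_stats(FakeDomain *dom)
{
    if (dom->cancel_during_fetch) {
        dom->state = MIG_CANCELLED;
        dom->cpus_running = true; /* canceling resumes the CPUs */
    }
}

/* Returns 0 if the migration is still completed, -1 otherwise. */
static int fake_check_completed(const FakeDomain *dom)
{
    return dom->state == MIG_COMPLETED ? 0 : -1;
}

/* Models the tail of the wait-for-completion step.
 *   0  -> safe to proceed, CPUs are stopped
 *  -1  -> the recheck noticed the state change and bailed out
 *  -2  -> would wait forever for a STOP event that never comes */
int finish_migration(FakeDomain *dom, bool recheck)
{
    fake_fetch_stats(dom);

    /* The fix: look at the state again after leaving the monitor. */
    if (recheck && fake_check_completed(dom) < 0)
        return -1;

    /* The next step waits for CPUs to stop. If they were resumed by a
     * cancel, the STOP event is already gone and the wait never ends. */
    if (dom->cpus_running)
        return -2;

    return 0;
}
```

Calling `finish_migration` with `recheck` set to `false` on a domain whose `cancel_during_fetch` is `true` yields the modeled hang (`-2`), while the same scenario with `recheck` set to `true` returns early with `-1`, mirroring the effect of the added `qemuMigrationAnyCompleted` check.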