qemu: Report error from both sides of migration

When migration fails in the Perform phase, we call Finish on the
destination host with cancelled=1, take the error from there, and report
it to the user. This works well if the error on the destination caused
the migration to fail. But in other cases the main error may be reported
by the source, and the destination would just be complaining about a
broken migration stream.

In other words, we don't really know which error caused the migration to
fail and we have no way of detecting that. So instead of choosing one
error, this patch will combine the error messages from both sides of
migration into a single message and report it to the user. The result
would be, for example:

    operation failed: migration failed. Message from the source host:
    operation failed: job 'migration out' failed: Certificate does not
    match the hostname ble.bla. Message from the destination host:
    operation failed: job 'migration in' failed: load of migration
    failed: Invalid argument

And yes, this is ugly, but I wasn't able to come up with a better way of
fixing this issue.
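
As a rough illustration of the combination described above, the
following standalone sketch formats two messages the same way. It is not
the libvirt code: struct demo_error and demo_combine_errors() are
hypothetical names made up for this example, and plain snprintf() stands
in for virReportError().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the message part of a libvirt error;
 * this is NOT the real virError structure. */
struct demo_error {
    const char *message;
};

/* Combine the source and destination messages into a single string.
 * Returns -1 if either error (or its message) is missing, otherwise
 * the number of characters the full message needs (as snprintf does),
 * so a caller can detect truncation. */
static int
demo_combine_errors(char *buf, size_t buflen,
                    const struct demo_error *src,
                    const struct demo_error *dst)
{
    assert(buf != NULL && buflen > 0);

    if (!src || !dst || !src->message || !dst->message)
        return -1;

    return snprintf(buf, buflen,
                    "migration failed. Message from the source host: %s. "
                    "Message from the destination host: %s",
                    src->message, dst->message);
}
```

A caller would pass the error saved from the source side as src and the
error reported by Finish as dst, and fall back to the original error
when the function returns -1.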

https://issues.redhat.com/browse/RHEL-58933

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jiri Denemark 2025-01-14 13:30:52 +01:00
parent 14fc6812df
commit 899bf2746a
2 changed files with 26 additions and 26 deletions

@@ -3430,29 +3430,29 @@ virDomainMigrateVersion3Full(virDomainPtr domain,
         if (ddomain) {
             VIR_ERROR(_("finish step ignored that migration was cancelled"));
         } else {
-            /* If Finish reported a useful error, use it instead of the
-             * original "migration unexpectedly failed" error.
+            virErrorPtr err = virGetLastError();
+            /* When both Confirm and Finish reported an error in QEMU driver,
+             * we don't really know which error is the root cause. Let's report
+             * both errors to the user.
              *
              * This is ugly but we can't do better with the APIs we have. We
              * only replace the error if Finish was called with cancelled == 1
              * and reported a real error (old libvirt would report an error
-             * from RPC instead of MIGRATE_FINISH_OK), which only happens when
-             * the domain died on destination. To further reduce a possibility
-             * of false positives we also check that Perform returned
-             * VIR_ERR_OPERATION_FAILED.
+             * from RPC instead of MIGRATE_FINISH_OK).
              */
             if (orig_err &&
                 orig_err->domain == VIR_FROM_QEMU &&
-                orig_err->code == VIR_ERR_OPERATION_FAILED) {
-                virErrorPtr err = virGetLastError();
-                if (err &&
-                    err->domain == VIR_FROM_QEMU &&
-                    err->code != VIR_ERR_MIGRATE_FINISH_OK) {
-                    g_clear_pointer(&orig_err, virFreeError);
-                }
+                orig_err->code == VIR_ERR_OPERATION_FAILED &&
+                err &&
+                err->domain == VIR_FROM_QEMU &&
+                err->code != VIR_ERR_MIGRATE_FINISH_OK) {
+                virReportError(VIR_ERR_OPERATION_FAILED,
+                               _("migration failed. Message from the source host: %1$s. Message from the destination host: %2$s"),
+                               orig_err->message, err->message);
+                g_clear_pointer(&orig_err, virFreeError);
             }
         }
     }
 
     /* If ddomain is NULL, then we were unable to start
      * the guest on the target, and must restart on the

@@ -5904,29 +5904,29 @@ qemuMigrationSrcPerformPeer2Peer3(virQEMUDriver *driver,
         if (ddomain) {
             VIR_ERROR(_("finish step ignored that migration was cancelled"));
         } else {
-            /* If Finish reported a useful error, use it instead of the
-             * original "migration unexpectedly failed" error.
+            virErrorPtr err = virGetLastError();
+            /* When both Confirm and Finish reported an error in QEMU driver,
+             * we don't really know which error is the root cause. Let's report
+             * both errors to the user.
              *
              * This is ugly but we can't do better with the APIs we have. We
              * only replace the error if Finish was called with cancelled == 1
              * and reported a real error (old libvirt would report an error
-             * from RPC instead of MIGRATE_FINISH_OK), which only happens when
-             * the domain died on destination. To further reduce a possibility
-             * of false positives we also check that Perform returned
-             * VIR_ERR_OPERATION_FAILED.
+             * from RPC instead of MIGRATE_FINISH_OK).
              */
             if (orig_err &&
                 orig_err->domain == VIR_FROM_QEMU &&
-                orig_err->code == VIR_ERR_OPERATION_FAILED) {
-                virErrorPtr err = virGetLastError();
-                if (err &&
-                    err->domain == VIR_FROM_QEMU &&
-                    err->code != VIR_ERR_MIGRATE_FINISH_OK) {
-                    g_clear_pointer(&orig_err, virFreeError);
-                }
+                orig_err->code == VIR_ERR_OPERATION_FAILED &&
+                err &&
+                err->domain == VIR_FROM_QEMU &&
+                err->code != VIR_ERR_MIGRATE_FINISH_OK) {
+                virReportError(VIR_ERR_OPERATION_FAILED,
+                               _("migration failed. Message from the source host: %1$s. Message from the destination host: %2$s"),
+                               orig_err->message, err->message);
+                g_clear_pointer(&orig_err, virFreeError);
             }
         }
     }
 
     /* If ddomain is NULL, then we were unable to start
      * the guest on the target, and must restart on the