It just calls migrate QMP command with resume=true without having to
worry about migration capabilities or parameters, storage migration,
etc. since everything has already been done in the normal Perform phase.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
qemuMigrationSrcRun does a lot of thing before and after telling QEMU to
start the migration. Let's make the core reusable by moving it to a new
qemuMigrationSrcStart function.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
To make the code flow a bit more sensible.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Mostly we just need to check whether the domain is in a failed post-copy
migration that can be resumed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This phase marks a migration protocol as broken in a post-copy phase.
Libvirt is no longer actively watching the migration in this phase as
the migration API that started the migration failed.
This may either happen when post-copy migration really fails (QEMU
enters postcopy-paused migration state) or when the migration still
progresses between both QEMU processes, but libvirt lost control of it
because the connection between libvirt daemons (in p2p migration) or a
daemon and client (non-p2p migration) was closed. For example, when one
of the daemons was restarted.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Both qemuMigrationJobSetPhase and qemuMigrationJobStartPhase were doing
the same thing, which made little sense. Let's follow the difference
between qemuDomainObjSetJobPhase and qemuDomainObjStartJobPhase and
change job owner only in qemuMigrationJobStartPhase.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We will want to update migration phase without affecting job ownership.
Either in the thread that already owns the job or from an event handler
which only changes the phase (of a job no-one owns) without assuming it.
Let's move the ownership change to a new qemuDomainObjStartJobPhase
helper and let qemuDomainObjSetJobPhase set the phase without touching
ownership.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The check can reveal a serious bug in our migration code and we should
not silently ignore it.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Into a new qemuMigrationCheckPhase helper, which can be reused in other
places.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
When recovering from a failed post-copy migration, we need to go through
all migration phases again, but don't need to repeat all the steps in
each phase. Let's create a new set of migration phases dedicated to
post-copy recovery so that we can easily distinguish between normal and
recovery code.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Turn the final part of Begin phase formatting a domain XML for migration
into a reusable helper.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
When libvirt daemon is restarted during an active post-copy migration,
we do not always mark the migration as broken. In this phase libvirt is
not really needed for migration to finish successfully. In fact the
migration could have even finished while libvirt was not running or it
may still be happily running.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
So far migration could only be completed while a migration API was
running and waiting for the migration to finish. In case such API could
not be called (the connection that initiated the migration is broken)
the migration would just be aborted or left in a "don't know what to do"
state. But this will change soon and we will be able to successfully
complete such migration once we get the corresponding event from QEMU.
This is specific to post-copy migration when vCPUs are already running
on the destination and we're only waiting for all memory pages to be
transferred. Such post-copy migration (which no-one is actively
watching) is called unattended migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
When reconnecting to an active domain we need to use a different job
structure than the one referenced from the VM object.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Normally migrationPort is released in the Finish phase, but we need to
make sure it is properly released also in case qemuMigrationDstFinish is
not called at all. Currently the only callback which is called in this
situation qemuMigrationDstPrepareCleanup which already releases
migrationPort. This patch adds similar handling to additional callbacks
which will be used in the future.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
By separating it into a dedicated qemuMigrationSrcComplete function
which can be later called in other places.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The function which started a migration phase should also finish it by
calling qemuMigrationJobFinish/qemuMigrationJobContinue so that the code
is easier to follow.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
- qemuMigrationDstFinish
- qemuMigrationDstFinishOffline
- qemuMigrationDstFinishActive
- qemuMigrationDstFinishFresh
- qemuMigrationDstFinishResume
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
To keep all cookie handling (parsing and formatting) in the same
function.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
- qemuMigrationDstFinish
- qemuMigrationDstFinishOffline
- qemuMigrationDstFinishActive
- qemuMigrationDstFinishFresh
- qemuMigrationDstFinishResume
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Refactors qemuMigrationDstFinish by moving some parts to a dedicated
function for easier introduction of postcopy resume code without
duplicating common parts of the Finish phase. The goal is to have the
following call graph:
- qemuMigrationDstFinish
- qemuMigrationDstFinishOffline
- qemuMigrationDstFinishActive
- qemuMigrationDstFinishFresh
- qemuMigrationDstFinishResume
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We want to prevent our error path that can potentially kill the domain
on the destination host from overwriting an error reported earlier, but
we were only doing so in one specific path when starting vCPUs fails.
Let's do it in all paths.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The comment about QEMU < 0.10.6 has been irrelevant for years.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
By separating it into a dedicated qemuMigrationDstComplete function
which can be later called in other places.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The final part of Finish phase will be refactored into a dedicated
function and we don't want to generate the cookie there.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Let's call it "error" so that it's clear the label is only used in
failure path.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Most of the code in "endjob" label is executed only on failure. Let's
duplicate the rest so that the label can be used only in error path
making the success path easier to follow and refactor.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Code executed only when dom != NULL can be moved before "endjob" label,
to the only place where dom is set.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
We don't need the object until we get to the "endjob" label.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
When connection breaks during post-copy migration, QEMU enters
'postcopy-paused' state. We need to handle this state and make the
situation visible to upper layers.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Migration is a job which takes some time and if it succeeds, there's
nothing to call another migration on. If a migration fails, it might
make sense to rerun it with different arguments, but this would only be
done once the first migration fails rather than while it is still
running.
If this was not enough, the migration job now stays active even if
post-copy migration fails and anyone possibly retrying the migration
would be waiting for the job timeout just to get a suboptimal error
message.
So let's special case getting a migration job when another one is
already active.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
When migration fails after it already switched to post-copy phase on the
source, but early enough that we haven't called Finish on the
destination yet, we know the vCPUs were not started on the destination
and the source host still has a complete state of the domain. Thus we
can just ignore the fact post-copy phase started and normally abort the
migration and resume vCPUs on the source.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
When post-copy migration fails, we can't just abort the migration and
resume the domain on the source host as it is already running on the
destination host and no host has a complete state of the domain memory.
Instead of the current approach of just marking the domain on both ends
as paused/running with a post-copy failed sub state, we will keep the
migration job active (even though the migration API will return failure)
so that the state is more visible and we can better control what APIs
can be called on the domains and even allow for resuming the migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The function never returns anything but zero.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
The events would normally be triggered only if we're changing domain
state. But most of the time the domain is already in the right state and
we're just changing its substate from {PAUSED,RUNNING}_POSTCOPY to
*_POSTCOPY_FAILED. Let's emit lifecycle events explicitly when post-copy
migration fails to make the failure visible without polling.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
There's no need to artificially pause a domain when post-copy fails
from our point of view unless QEMU connection is broken too as migration
may still be progressing well.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The virStorageSourceGetActualType() function returns either
virStorageSource->type (which is of type virStorageType), or
virStorageSourcePoolDef->type, which really stores a value of the
same enum. Thus, the latter struct can be changed so that the
virStorageSourceGetActualType() function can return correct type
instead of generic int.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
There are three places (two in domain_conf.c and one in
qemu_migration.c) where a virStorageSource->type is typecasted to
virStorageType (for the purpose of catching missing enum member
in a switch() statement at compile time). This is needless,
because as of v8.2.0-rc1~120 the struct member is of proper type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
My recent commit v8.3.0-201-gc500955e95 tried to fix a regression which
would cause the function to return success even if virCloseCallbacksSet
failed. But due to a strange code flow in the function introduced an
opposite regression. The function would return NULL on success when
called without VIR_MIGRATE_CHANGE_PROTECTION flag.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commit v8.3.0-152-g49ef0f95c6 removed explicit VIR_FREE from
qemuMigrationBegin, effectively reverting v1.2.14-57-g77ddd0bba2
The xml variable was used to hold the return value and thus had to be
unset when an error happened after xml was already non-NULL. Such code
may be quite confusing though and we usually avoid it by not storing
anything to a return variable until everything succeeded.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Remove the argument from the function prototypes and the callback
handler.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Access the 'driver' struct from the private data rather than the passed
opaque pointer in preparation to remove the opaque pointer.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
All QEMU versions we care about already support migration events.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Acquiring job introduced in commit [1] to fix a race described in the
commit. Actually it does not help because we get domain in create API
before acuiring job. Then [2] fixed the race but [1] was not reverted even
it is does not required by [2] to work properly.
[1] commit b629c64e5e0a32ef439b8eeb3a697e2cd76f3248
Author: Martin Kletzander <mkletzan@redhat.com>
Date: Thu Oct 30 14:38:35 2014 +0100
qemu: avoid rare race when undefining domain
[2] commit c7d1c139ca3402e875002753952e80ce8054374e
Author: Martin Kletzander <mkletzan@redhat.com>
Date: Thu Dec 11 11:14:08 2014 +0100
qemu: avoid rare race when undefining domain
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@openvz.org>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>