If the job wasn't started, we don't need to end the synchronous job. Add
a note and drop the unnecessary calls.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Rather than directly modifying fields in the qemuBlockJobDataPtr
structure, add a set of functions which perform the state transitions.
This will help later when adding more complexity to the job handling.
APIs introduced in this patch are:
  qemuBlockJobDiskNew - prepare for starting a new blockjob on a disk
  qemuBlockJobDiskGetJob - get the block job data structure for a disk
For individual job state manipulation the following APIs are added:
  qemuBlockJobStarted - sets the job as started with qemu. Until then
                        the job can be cancelled without asking qemu.
  qemuBlockJobStartupFinalize - finalizes job startup. If the job was
                                started in qemu already, just releases
                                the reference to the job object.
                                Otherwise clears everything as if the
                                job was never started.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
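The transitions described in the commit message above can be pictured with a small model. This is only a sketch under stated assumptions: the Sketch* types, the reference counting scheme, and the field names are hypothetical stand-ins and do not reproduce the real qemuBlockJobData layout or the real function signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-disk block job data. */
typedef struct {
    int refs;      /* references held on the job object */
    bool active;   /* a job is registered on the disk */
    bool started;  /* qemu has been asked to start the job */
} SketchBlockJob;

typedef struct {
    SketchBlockJob job; /* the disk itself owns one reference */
} SketchDisk;

/* Prepare for starting a new job; the caller receives its own reference. */
static SketchBlockJob *
sketchBlockJobDiskNew(SketchDisk *disk)
{
    assert(disk != NULL);
    disk->job.active = true;
    disk->job.started = false;
    disk->job.refs++;
    return &disk->job;
}

/* Mark the job as started in qemu; from now on cancelling needs qemu. */
static void
sketchBlockJobStarted(SketchBlockJob *job)
{
    job->started = true;
}

/* Finish startup: roll back if qemu never started the job, then drop the
 * caller's reference. */
static void
sketchBlockJobStartupFinalize(SketchBlockJob *job)
{
    if (!job->started)
        job->active = false;
    job->refs--;
}
```

In this model a failure path can call the finalize helper unconditionally: it either just drops the reference or also undoes the registration, depending on whether the started transition happened.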
Extract the disk mirroring startup code from the loop into a separate
function to allow cleaner cleanup paths.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When cancelling job after a reconnect we can now use the disk block job
state rather than having to re-detect it in the migration code.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Internally we do a 'block-copy' to accommodate non-shared storage
migration but the code did not record that the block job was active on
the disk when starting the copy job. Since we handle block job
completion regardless of the job being registered it's not a problem
yet, but it soon will become one.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
All the public APIs of the qemu_blockjob module operate on a 'disk'.
Since I'll be adding APIs which operate on a job later let's rename the
existing ones.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If migration is cancelled or the confirm phase fails, the domain
should be kept on the source even if VIR_MIGRATE_UNDEFINE_SOURCE
was requested.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
There are some checks done when parsing a migration cookie. For
instance, one of the checks ensures that the domain is not being
migrated onto the same host. If that is the case, then we are in
big trouble because the @vm is the same domain object used by the
source and it already has some jobs set, so recovering
from failed cookie parsing would be needlessly hard.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The function currently takes virDomainObjPtr because it's using
both: the domain definition and domain private data.
Unfortunately, this means that in prepare phase we can't parse
migration cookie before putting incoming domain def onto domain
objects list (addressed in the very next commit). Change the
arguments so that virDomainDef and private data are passed
instead of virDomainObjPtr.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
There are several functions called in the cleanup path. Some of
them report errors (e.g. qemuDomainRemoveInactiveJob()), which
may overwrite an error reported earlier with a less useful
message.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
The only place where VIR_DOMAIN_EVENT_RESUMED should be generated is the
RESUME event handler to make sure we don't generate duplicate events or
state changes. In the worst case the duplicate event can revert or mask
changes done by other event handlers.
For example, after QEMU sent RESUME, BLOCK_IO_ERROR, and STOP events
we could happily mark the domain as running and report
VIR_DOMAIN_EVENT_RESUMED to registered clients.
https://bugzilla.redhat.com/show_bug.cgi?id=1612943
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When a domain is killed on the source host while it is being migrated
and libvirtd is waiting for the migration to finish (waiting for the
domain condition in qemuMigrationSrcWaitForCompletion), the run-time
state including priv->job.current may already be freed once
virDomainObjWait returns with -1. Thus the priv->job.current pointer
cached in jobInfo is no longer valid and setting jobInfo->status may
crash the daemon.
https://bugzilla.redhat.com/show_bug.cgi?id=1593137
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
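The hazard described in the commit message above is a pointer cached before a wait that can outlive the data it points to. The following is a minimal model of one way to avoid it, not the actual patch: the Sketch* names, the status constants, and the simulated wait are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { SKETCH_JOB_ACTIVE = 0, SKETCH_JOB_COMPLETED = 1 };

typedef struct { int status; } SketchJobInfo;

typedef struct {
    bool active;
    SketchJobInfo *current; /* freed by cleanup when the domain dies */
} SketchDomain;

/* Model of waiting on the domain condition. When domainDies is true the
 * cleanup code runs while we wait and frees the job data. */
static int
sketchDomainWait(SketchDomain *dom, bool domainDies)
{
    if (domainDies) {
        free(dom->current);
        dom->current = NULL;
        dom->active = false;
        return -1;
    }
    return 0;
}

/* Returns 0 on completion, -1 if the domain went away while waiting. */
static int
sketchWaitForCompletion(SketchDomain *dom, bool domainDies)
{
    SketchJobInfo *jobInfo = dom->current; /* cached before the wait */

    assert(jobInfo != NULL);
    if (sketchDomainWait(dom, domainDies) < 0)
        return -1; /* jobInfo may dangle here; it must not be touched */

    jobInfo->status = SKETCH_JOB_COMPLETED;
    return 0;
}
```

The point of the model is the early return: once the wait reports failure, the cached pointer is treated as invalid and no status is written through it.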
Once we called qemuDomainObjEnterRemote to talk to the destination
daemon during a peer to peer migration, the vm lock is released and we
only hold an async job. If the source domain dies at this point the
monitor EOF callback is allowed to do its job and (among other things)
clear all private data irrelevant for stopped domain. Thus when we call
qemuDomainObjExitRemote, the domain may already be gone and we should
avoid touching runtime private data (such as current job info).
In other words, after acquiring the lock in qemuDomainObjExitRemote we
need to check that the domain is still alive, unless we're doing
offline migration.
https://bugzilla.redhat.com/show_bug.cgi?id=1589730
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
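The liveness check described in the commit message above might look roughly like the model below. It is a sketch only: the SketchDomain type, the checkActive parameter, and both helper names are hypothetical and are not the signatures of qemuDomainObjEnterRemote or qemuDomainObjExitRemote.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool locked;
    bool active; /* false once the qemu process is gone */
} SketchDomain;

/* Drop the lock while talking to the destination daemon. */
static void
sketchEnterRemote(SketchDomain *dom)
{
    assert(dom->locked);
    dom->locked = false;
}

/* Re-acquire the lock; with checkActive set, report failure if the domain
 * died in the meantime so callers stop touching runtime private data. */
static int
sketchExitRemote(SketchDomain *dom, bool checkActive)
{
    dom->locked = true;
    if (checkActive && !dom->active)
        return -1;
    return 0;
}
```

A caller in this model would pass `!offline` as checkActive, since an offline migration never had a running domain to begin with.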
The variable is used to store the offline migration capability of the
destination daemon. Let's call it 'dstOffline' so that we can later use
'offline' to indicate whether we were asked to do offline migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
And replace all calls with virObjectEventStateQueue such that:
qemuDomainEventQueue(driver, event);
becomes:
virObjectEventStateQueue(driver->domainEventState, event);
And remove NULL checking from all callers.
Signed-off-by: Anya Harter <aharter@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
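Dropping the NULL checks in callers only works if the queueing function itself ignores a NULL event. The model below illustrates that pattern under this assumption; the Sketch* types and functions are hypothetical and are not the libvirt event API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int type; } SketchEvent;
typedef struct { size_t queued; } SketchEventState;

/* Assumed behaviour: a NULL event is silently ignored, so callers can pass
 * the result of a failed event constructor straight through. */
static void
sketchEventStateQueue(SketchEventState *state, SketchEvent *event)
{
    assert(state != NULL);
    if (!event)
        return;
    state->queued++;
}

/* Stand-in for an event constructor that can fail and return NULL. */
static SketchEvent *
sketchEventNew(SketchEvent *storage, int type)
{
    if (type < 0)
        return NULL;
    storage->type = type;
    return storage;
}
```

With the check centralized in the queue function, a caller can write `sketchEventStateQueue(&state, sketchEventNew(&storage, type))` without an intermediate `if (event)`.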
Replace instances where we previously called virGetLastError just to
either get the code or to check if an error exists with
virGetLastErrorCode to avoid a validity pre-check.
Signed-off-by: Ramy Elkest <ramyelkest@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
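The difference between the two call patterns can be shown with a small stand-alone model. The Sketch* names below are hypothetical stand-ins so that the example compiles without libvirt headers; the only property assumed is that the code-returning accessor reports an "OK" code when no error object exists.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_ERR_OK = 0, SKETCH_ERR_OPERATION_FAILED = 9 };

typedef struct { int code; } SketchError;

static SketchError *sketchLastError; /* NULL when no error was reported */

static SketchError *
sketchGetLastError(void)
{
    return sketchLastError;
}

static int
sketchGetLastErrorCode(void)
{
    return sketchLastError ? sketchLastError->code : SKETCH_ERR_OK;
}

/* Old pattern: fetch the object, check it for validity, then read the code. */
static int
sketchHasErrorOld(void)
{
    SketchError *err = sketchGetLastError();
    return err != NULL && err->code != SKETCH_ERR_OK;
}

/* New pattern: the accessor hides the validity check. */
static int
sketchHasErrorNew(void)
{
    return sketchGetLastErrorCode() != SKETCH_ERR_OK;
}
```

Both helpers give the same answer in every state of the model; the second simply has no pointer to validate.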
The alias of the secret for decrypting the TLS passphrase is only
needed for TLS setup. Stop passing it around.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This has been broken since commit v4.0.0-165-g93412bb827 which added
jobInfo->statsType enum to distinguish various statistics types. During
migration the type will always be QEMU_DOMAIN_JOB_STATS_TYPE_MIGRATION,
however the destination code consuming the statistics data from
migration cookie failed to properly set the type. So even though
everything was filled in, the type remained *_NONE and any attempt to
fetch the statistics data of a completed migration on the destination
host failed.
https://bugzilla.redhat.com/show_bug.cgi?id=1584071
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
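The failure mode described above, where fully populated statistics stay unreadable because the type tag was never set, can be reproduced in a tiny model. The enum, struct, and function names here are hypothetical and only mirror the idea of jobInfo->statsType.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SKETCH_STATS_TYPE_NONE = 0,
    SKETCH_STATS_TYPE_MIGRATION,
} SketchStatsType;

typedef struct {
    SketchStatsType statsType;
    unsigned long long dataTotal;
} SketchJobInfo;

/* Destination side: copy statistics out of the cookie and tag their type.
 * Forgetting the second assignment is the bug modelled here. */
static void
sketchConsumeCookieStats(SketchJobInfo *job, unsigned long long dataTotal)
{
    job->dataTotal = dataTotal;
    job->statsType = SKETCH_STATS_TYPE_MIGRATION;
}

/* Readers refuse to report statistics whose type is unknown. */
static int
sketchGetCompletedStats(const SketchJobInfo *job, unsigned long long *dataTotal)
{
    assert(dataTotal != NULL);
    if (job->statsType == SKETCH_STATS_TYPE_NONE)
        return -1;
    *dataTotal = job->dataTotal;
    return 0;
}
```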
Implement the secure way to transport non-shared storage data across
migrations. The new approach uses blockdev-add to create the NBD client
so that the TLS secret object can be specified.
https://bugzilla.redhat.com/show_bug.cgi?id=1300772
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Separate the code relevant for this approach so that we can later add a
second implementation without making the function messy.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Drop the mention of 'drive mirror' from the function names and mention
NBD. This will help when adding the 'blockdev mirror' migration code
which will allow using TLS.
Additionally fix some of the function comments to make more sense.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The initiation of a synchronous block job in the NBD storage migration
code was placed after entering the monitor and thus after the lock on
the VM object was released. Thankfully nothing bad could happen in this
situation since the migration job prevents any disk detaches or other
modifications of the domain object.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
The pointer to the qemu driver is already included in domain object's
private data, so does not need to be passed as yet another parameter
when the domain object is already passed.
Also removes parameter 'driver' from functions which had it just because of
qemuBlockJobUpdate.
Signed-off-by: Roland Schulz <schullzroll@gmail.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
When adding a new object to the domain object list, there should
have been 2 virObjectRef calls, one for each list into which
the object was placed. These would match the 2 virObjectUnref calls
that occur during Remove: virHashRemoveEntry invokes
virObjectFreeHashData, as set up in virDomainObjListNew, when the
element is removed from each hash table.
Some drivers (libxl, lxc, qemu, and vz) handled this inconsistency
by calling virObjectRef upon successful return from virDomainObjListAdd
in order to use virDomainObjEndAPI when done with the returned @vm.
While others (bhyve, openvz, test, and vmware) handled this via only
calling virObjectUnlock upon successful return from virDomainObjListAdd.
This patch will "unify" the approach to use virDomainObjEndAPI
for any @vm successfully returned from virDomainObjListAdd.
Because list removal is so tightly coupled with list addition,
this patch fixes the list removal algorithm to return the object
as entered - "locked and reffed". This way, the callers can then
decide how to uniformly handle add/remove success and failure.
This removes the onus on the caller to "specially handle" the
@vm during removal processing.
The Add/Remove logic allows for some logic simplification such
as in libxl where we can Remove the @vm directly rather than
needing to set a @remove_dom boolean and removing after the
libxlDomainObjEndJob completes as the @vm is locked/reffed.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
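The reference accounting described in the commit message above can be followed step by step in a small model. This is a sketch under assumptions: the SketchDomainObj type and its counters are hypothetical, and the two internal lists are represented only by the two references they hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int refs;
    bool locked;
    bool listed;
} SketchDomainObj;

/* Add: take one reference per internal list and return the object
 * "locked and reffed" for the caller. */
static SketchDomainObj *
sketchListAdd(SketchDomainObj *obj)
{
    obj->refs = 1;   /* reference owned by the caller */
    obj->refs += 2;  /* one reference for each of the two lists */
    obj->locked = true;
    obj->listed = true;
    return obj;
}

/* Remove: drop both list references but leave the object exactly as the
 * caller passed it in, locked and reffed. */
static void
sketchListRemove(SketchDomainObj *obj)
{
    assert(obj->locked && obj->listed && obj->refs >= 3);
    obj->refs -= 2;
    obj->listed = false;
}

/* Uniform way to finish with an object: unlock, unref, clear the pointer. */
static void
sketchEndAPI(SketchDomainObj **obj)
{
    (*obj)->locked = false;
    (*obj)->refs--;
    *obj = NULL;
}
```

Because removal leaves the caller's lock and reference untouched, the same end-of-API helper works after both a successful add and a removal.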
Use the TLS env for migration when starting the NBD server if TLS is
enabled for migration.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
To allow encryption of the non-shared storage migration NBD connection
we will need to instantiate the NBD server with the TLS env.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
When a VM is destroyed while being migrated (waiting in
qemuMigrationSrcWaitForCompletion) the private object cleanup code frees
the 'current' job info. Since the migration code attempts to set up
various aspects of the current job even on failure this results in a
crash.
Job data is cleared in qemuDomainObjPrivateDataClear since commit
888aa4b6b9db.
Fix this by skipping all of the code which requires the qemu process to
be alive if the VM is not active any more.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Since libvirt is currently not able to set up the NBD migration stream
secured by TLS we should not allow such migration since data would be
transferred unencrypted.
This will break compatibility of TLS migration if non-shared storage is
requested but the security implications are more severe.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Trying to delete the non-existent TLS objects results in ugly error
messages in the log, which could easily confuse users. Let's avoid this
confusion by not trying to delete the objects if we were not asked to
enable TLS migration and thus we didn't create the objects anyway.
This patch restores the behavior to the state before "qemu: Reset all
migration parameters".
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
We store the flags passed to the API which started the migration. Let's
use them instead of a separate bool to check if post-copy migration was
requested.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
When an async job is running, we sometimes need to know how it was
started to distinguish between several types of the job, e.g., post-copy
vs. normal migration. So far we added a specific bool item to
qemuDomainJobObj for such cases, which doesn't scale very well and
storing such bools in status XML would be painful so we didn't do it.
A better approach is to store the flags passed to the API which started
the async job, which can be easily stored in status XML.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
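Storing the API flags once and deriving per-job properties from them could look like the model below. It is a sketch only: the flag constants, the SketchJobObj type, and the helper names are hypothetical and do not match virDomainMigrateFlags values or the qemuDomainJobObj layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the public migration flags. */
enum {
    SKETCH_MIGRATE_LIVE     = 1 << 0,
    SKETCH_MIGRATE_POSTCOPY = 1 << 1,
};

typedef struct {
    bool asyncActive;
    unsigned long apiFlags; /* flags of the API call that started the job */
} SketchJobObj;

/* Remember how the async job was started. */
static void
sketchBeginAsyncJob(SketchJobObj *job, unsigned long apiFlags)
{
    assert(!job->asyncActive);
    job->asyncActive = true;
    job->apiFlags = apiFlags;
}

/* Derived property; no dedicated bool needs to be kept in sync. */
static bool
sketchJobIsPostcopy(const SketchJobObj *job)
{
    return job->asyncActive && (job->apiFlags & SKETCH_MIGRATE_POSTCOPY) != 0;
}
```

A single integer is also straightforward to serialize, which is the property the commit message relies on for the status XML.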
When an always-on migration capability is supposed to be enabled on both
sides of migration, each side can only enable the feature if it is
enabled by the other side.
Thus the source host sends a list of supported migration capabilities in
the migration cookie generated in the Begin phase. The destination host
consumes the list in the Prepare phase and decides what capabilities can
be enabled when starting a QEMU process for incoming migration. Once
done the destination sends the list of supported capabilities back to
the source where it is used during the Perform phase to determine what
capabilities can be automatically enabled.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
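The exchange described in the commit message above boils down to both sides computing the same intersection. The following model assumes capabilities are represented as bits in an unsigned integer; the capability names, the SketchCookie type, and the helper are hypothetical and say nothing about the real cookie format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits. */
enum {
    SKETCH_CAP_A = 1 << 0,
    SKETCH_CAP_B = 1 << 1,
    SKETCH_CAP_C = 1 << 2,
};

/* The part of a migration cookie that carries supported capabilities. */
typedef struct {
    unsigned int supported;
} SketchCookie;

/* Capabilities one side may enable automatically: those that are always-on
 * candidates, supported locally, and advertised by the peer's cookie. */
static unsigned int
sketchAutoCaps(unsigned int localSupported,
               const SketchCookie *remote,
               unsigned int alwaysOn)
{
    assert(remote != NULL);
    return alwaysOn & localSupported & remote->supported;
}
```

In the model the destination calls the helper in Prepare with the source's cookie, and the source calls it in Perform with the destination's reply; since the expression is symmetric, both arrive at the same set.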
Some migration capabilities may be enabled automatically, but only if
both sides of migration support them. Thus we need to be able to transfer
the list of supported migration capabilities in the migration cookie.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Since every parameter or capability set in qemuMigrationCompression
structure is now reflected in qemuMigrationParams structure, we can
replace qemuMigrationAnyCompressionDump with a new API which will work
on qemuMigrationParams.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
There's no need to call this API explicitly in the migration code. We
can pass the compression parameters to qemuMigrationParamsFromFlags and
it can internally call qemuMigrationParamsSetCompression to apply them
to the qemuMigrationParams structure.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Propagate the calls up the stack to the point where
qemuMigrationParamsFromFlags is called. The end goal achieved in the
following few patches is to merge compression parameters into the
general migration parameters code.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Most migration capabilities are directly connected with
virDomainMigrateFlags so qemuMigrationParamsFromFlags can automatically
enable them.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Some migration capabilities are always enabled if QEMU supports them. We
can just drop the explicit code for them and let
qemuMigrationParamsCheck automatically set such capabilities.
QEMU_MONITOR_MIGRATION_CAPS_EVENTS would normally be one of the always
on features, but it is the only feature we want to enable even for other
jobs which internally use migration (such as save and snapshot). Hence
this capability is set very early after libvirtd connects to QEMU
monitor.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
It's just a tiny wrapper around qemuMigrationParamsSetCapability and
setting priv->job.postcopyEnabled is not something qemuMigrationParams
code should be doing anyway so let the callers do it.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Every migration entry point in qemu_driver is supposed to call
qemuMigrationParamsFromFlags to transform flags and parameters into
qemuMigrationParams structure and pass the result to qemuMigration*
APIs.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Instead of checking each capability at the time we want to set it in
qemuMigrationParamsSetCapability we can check all of them at once in
qemuMigrationParamsCheck.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
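Checking every requested capability in one pass, rather than at each set call, can be sketched as a single mask comparison. The bitmask representation, the capability names, and the function below are assumptions made for illustration and are not the qemuMigrationParamsCheck implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits. */
enum {
    SKETCH_CAP_XBZRLE   = 1 << 0,
    SKETCH_CAP_POSTCOPY = 1 << 1,
    SKETCH_CAP_COMPRESS = 1 << 2,
};

/* Validate all requested capabilities at once. On failure, *missing holds
 * the requested bits that the running QEMU does not support. */
static int
sketchParamsCheck(unsigned int requested,
                  unsigned int supported,
                  unsigned int *missing)
{
    assert(missing != NULL);
    *missing = requested & ~supported;
    return *missing ? -1 : 0;
}
```

Setting a capability then becomes a plain bit assignment with no error path, and all unsupported requests are reported from one place.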
We reached the point when qemuMigrationParamsApply is the only API which
sends migration parameters and capabilities to QEMU. Thus all but the
TLS parameters can be set before we ask QEMU for the current values of
all parameters in qemuMigrationParamsCheck.
Supported migration capabilities are queried as soon as libvirt connects
to QEMU monitor so we can check them anytime.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Prefer xbzrle-cache-size migration parameter over the special
migrate-set-cache-size QMP command.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>