Commit Graph

492 Commits

Author SHA1 Message Date
Jiri Denemark
eb084a733b qemu: Report more migration statistics
memory_dirty_rate corresponds to dirty-pages-rate in QEMU and
memory_iteration is what QEMU reports in dirty-sync-count.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-01-08 18:18:58 +01:00
Jiri Denemark
b638b9b35c qemu: Create a proper type for migration status enum
The enum will be called qemuMonitorMigrationStatus.
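
For illustration, a hedged sketch of what the new typed enum might look
like; the member list is an assumption modeled on the
QEMU_MONITOR_MIGRATION_STATUS_* values quoted elsewhere in this log, not
libvirt's verbatim definition:

    /* Assumed members; the real enum lives in qemu_monitor.h. */
    typedef enum {
        QEMU_MONITOR_MIGRATION_STATUS_INACTIVE,
        QEMU_MONITOR_MIGRATION_STATUS_ACTIVE,
        QEMU_MONITOR_MIGRATION_STATUS_COMPLETED,
        QEMU_MONITOR_MIGRATION_STATUS_ERROR,
        QEMU_MONITOR_MIGRATION_STATUS_CANCELLING,
        QEMU_MONITOR_MIGRATION_STATUS_CANCELLED,

        QEMU_MONITOR_MIGRATION_STATUS_LAST
    } qemuMonitorMigrationStatus;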

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-01-08 18:18:58 +01:00
Jiri Denemark
09bbd96239 qemu: Rename qemuMonitorMigrationStatus struct
The structure actually contains migration statistics rather than just
the status, as the name suggests. Renaming it to
qemuMonitorMigrationStats removes the confusion.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-01-08 18:18:58 +01:00
Jiri Denemark
f87668b70e qemu: Fix NBD migration with default listenAddress
My commit 674afcb09e moved computing the
default listen address from qemuMigrationPrepareAny to
qemuMigrationPrepareIncoming. However, I didn't notice listenAddress was
later passed to qemuMigrationStartNBDServer. Thus, it would be called
with the original value of listenAddress (NULL).

Let's add the updated listen address to qemuProcessIncomingDef and use
it when starting NBD servers.

Reported-by: Michael Chapman <mike@very.puzzling.org>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-01-08 10:39:20 +01:00
Jiri Denemark
0e747f2029 qemu: Add debug message to spice migration
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2016-01-07 13:20:38 +01:00
Michael Chapman
28c9eea032 qemu: do not leak NBD disk data in migration cookie
Valgrind complained:

==18990== 20 (16 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 188 of 996
==18990==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==18990==    by 0x5292E9B: virAllocN (viralloc.c:191)
==18990==    by 0x2221E731: qemuMigrationCookieXMLParseStr (qemu_migration.c:1012)
==18990==    by 0x2221F390: qemuMigrationEatCookie (qemu_migration.c:1413)
==18990==    by 0x222228CE: qemuMigrationPrepareAny (qemu_migration.c:3463)
==18990==    by 0x22224121: qemuMigrationPrepareDirect (qemu_migration.c:3865)
==18990==    by 0x22251C25: qemuDomainMigratePrepare3Params (qemu_driver.c:12414)
==18990==    by 0x5389EE0: virDomainMigratePrepare3Params (libvirt-domain.c:5107)
==18990==    by 0x1278DB: remoteDispatchDomainMigratePrepare3ParamsHelper (remote.c:5425)
==18990==    by 0x53FF287: virNetServerProgramDispatch (virnetserverprogram.c:437)
==18990==    by 0x540523D: virNetServerProcessMsg (virnetserver.c:135)
==18990==    by 0x54052C7: virNetServerHandleJob (virnetserver.c:156)
==18990==
==18990== 20 (16 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 189 of 996
==18990==    at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==18990==    by 0x5292E9B: virAllocN (viralloc.c:191)
==18990==    by 0x2221E731: qemuMigrationCookieXMLParseStr (qemu_migration.c:1012)
==18990==    by 0x2221F390: qemuMigrationEatCookie (qemu_migration.c:1413)
==18990==    by 0x222249D2: qemuMigrationRun (qemu_migration.c:4395)
==18990==    by 0x22226365: doNativeMigrate (qemu_migration.c:4693)
==18990==    by 0x22228E45: qemuMigrationPerform (qemu_migration.c:5553)
==18990==    by 0x2225144B: qemuDomainMigratePerform3Params (qemu_driver.c:12621)
==18990==    by 0x539F5D8: virDomainMigratePerform3Params (libvirt-domain.c:5206)
==18990==    by 0x127305: remoteDispatchDomainMigratePerform3ParamsHelper (remote.c:5557)
==18990==    by 0x53FF287: virNetServerProgramDispatch (virnetserverprogram.c:437)
==18990==    by 0x540523D: virNetServerProcessMsg (virnetserver.c:135)

If we're replacing the NBD data, it's simplest to free the old object
(including the disk list) and allocate a new one.
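
A minimal self-contained sketch of that free-and-reallocate approach,
with hypothetical stand-in types (libvirt's real cookie structures
differ):

    #include <stdlib.h>

    /* Hypothetical stand-in for the cookie's NBD data. */
    struct nbdData {
        char **disks;
        size_t ndisks;
    };

    static void
    nbdDataFree(struct nbdData *nbd)
    {
        size_t i;
        if (!nbd)
            return;
        for (i = 0; i < nbd->ndisks; i++)
            free(nbd->disks[i]);
        free(nbd->disks);
        free(nbd);
    }

    /* Replacing: free the old object, disk list included, then install
     * the freshly parsed one instead of patching fields in place. */
    static void
    nbdDataReplace(struct nbdData **slot, struct nbdData *parsed)
    {
        nbdDataFree(*slot);
        *slot = parsed;
    }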

Signed-off-by: Michael Chapman <mike@very.puzzling.org>
2016-01-04 14:54:23 +01:00
Daniel P. Berrange
a48539c013 qemu: convert monitor to use qemuDomainLogContextPtr indirectly
Currently the QEMU monitor is given an FD to the logfile. This
won't work in the future with virtlogd, so it needs to use the
qemuDomainLogContextPtr instead, but it shouldn't directly
access that object either. So define a callback that the
monitor can use for reporting errors from the log file.
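
A self-contained sketch of that indirection; all names here are
hypothetical stand-ins, not libvirt's actual types:

    #include <stdio.h>

    /* Instead of handing the monitor a log-file FD, hand it an opaque
     * context plus a callback for reporting errors from the log. */
    typedef void (*logErrorReportFunc)(void *opaque, const char *msg);

    struct monitor {
        void *logOpaque;
        logErrorReportFunc logReport;
    };

    static void
    stderrLogReport(void *opaque, const char *msg)
    {
        (void)opaque;
        fprintf(stderr, "log error: %s\n", msg);
    }

    static void
    monitorFailed(struct monitor *mon)
    {
        /* The monitor never touches the log context directly. */
        mon->logReport(mon->logOpaque, "qemu closed the monitor");
    }

    int main(void)
    {
        struct monitor mon = { NULL, stderrLogReport };
        monitorFailed(&mon);
        return 0;
    }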

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-11-26 14:30:15 +00:00
Daniel P. Berrange
69b0992178 qemu: unify code for reporting errors from QEMU log files
There are two pretty similar functions qemuProcessReadLog and
qemuProcessReadChildErrors. Both read from the QEMU log file
and try to strip out libvirt messages. The latter then reports
an error, while the former lets the callers report an error.

Re-write qemuProcessReadLog so that it uses a single read
into a dynamically allocated buffer. Then introduce a new
qemuProcessReportLogError that calls qemuProcessReadLog
and reports an error.

Convert all callers to use qemuProcessReportLogError.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-11-26 14:30:15 +00:00
Jiri Denemark
e7c6f45759 qemu: Use qemuProcessLaunch in migration Prepare phase
Using qemuProcess{Init,Launch,FinishStartup} allows us to run
pre-migration commands on the destination before asking QEMU to wait for
incoming migration data.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-11-25 15:27:31 +01:00
Jiri Denemark
ad1012978f qemu: Skip starting NBD servers for offline migration
NBD storage migration will not work with offline migration anyway, and
we already checked that the user did not ask for it. Thus it doesn't
make sense to keep the code after the 'done' label, where we jump in
case of offline migration.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-11-25 15:27:31 +01:00
Jiri Denemark
95e2415b95 qemu: Kill QEMU process if Prepare phase fails
Some failure paths in qemuMigrationPrepareAny forgot to kill the
just-started QEMU process. This patch fixes that by combining the 'stop'
and 'endjob' labels into a new label 'stopjob'. This name was chosen to avoid
confusion with the most common semantics of 'endjob'. Normally, 'endjob'
is always called at the end of an API to stop the job we entered at the
beginning. In qemuMigrationPrepareAny we only want to stop the job in
failure path; on success we need to carry the job over to the Finish
phase.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-11-25 15:27:31 +01:00
Jiri Denemark
674afcb09e qemu: Separate incoming URI generation from qemuMigrationPrepareAny
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-11-25 15:27:31 +01:00
Jiri Denemark
2c4ba8b4f3 qemu: Use -incoming defer for migrations
Traditionally, we pass the incoming migration URI on the QEMU command
line, which has some drawbacks. Depending on the URI, QEMU may
initialize its migration state immediately without giving us a chance to
set any additional migration parameters (this applies mainly to fd:
URIs). For some URIs the monitor may be completely blocked from the
beginning until migration is finished, which means we may be stuck in
the qmp_capabilities command without being able to send any QMP
commands.

QEMU solved this by introducing a "defer" parameter for the -incoming
command line option. This tells QEMU to prepare for an incoming
migration while the actual incoming URI is sent later using the
migrate-incoming QMP command. Before calling this command we can talk to
the monitor normally and even set any migration parameters which will be
honored by the incoming migration.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-11-19 09:41:23 +01:00
Jiri Denemark
34b9fe6101 qemu: Move incoming URI code to qemu_migration
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-11-19 09:41:23 +01:00
Jiri Denemark
6d1f8899a6 qemu: Refactor waiting for completed migration on destination
Move the code from qemuMigrationFinish into a dedicated function.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-11-19 09:41:23 +01:00
Daniel P. Berrange
2e90c9daf9 qemu: assume support for all migration protocols except rdma
Since we require QEMU 0.12.0, we can assume that QEMU supports
all of the fd, tcp, unix and exec migration protocols.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-11-10 10:38:17 +00:00
Peter Krempa
afb792bd38 qemu: migration: Actually error out on unsupported migration flag
The code reported that a migration flag is unsupported but didn't jump
to the error label. Probably an oversight in commit f88af9dc that
introduced the flag checking.
2015-11-05 15:23:37 +01:00
Peter Krempa
f59808b724 qemu: migration: Properly parse memory hotplug migration flag
Since the flag was not enabled when 'eating' the migration cookie,
libvirt reported a bogus error when memory hotplug was enabled:

 unsupported migration cookie feature memory-hotplug

The error was ignored though due to a bug in the code so it slipped
through testing.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1278404
2015-11-05 15:20:21 +01:00
Luyao Huang
926a98de21 qemu: fix migration flag undefinesource not working
In commit f41be296, we moved the vm->persistent check into
qemuDomainRemoveInactive, but in some places we did not clear
vm->persistent before calling qemuDomainRemoveInactive, and just
called it expecting the inactive VM to be removed.

Signed-off-by: Luyao Huang <lhuang@redhat.com>
2015-10-27 10:43:52 +01:00
Shivaprasad G Bhat
b39a1fe165 Close the source fd if the destination qemu exits during tunnelled migration
Tunnelled migration can hang if the destination qemu exits despite all the
ABI checks. This happens whenever the destination qemu exits before the
complete transfer is noticed by the source qemu. The savevm state checks
at runtime can fail at the destination and cause qemu to error out.
The source qemu can't notice it as the EPIPE is not propagated to it.
The qemuMigrationIOFunc() notices the stream being broken from
virStreamSend() and it cleans up the stream alone. The
qemuMigrationWaitForCompletion() would never get to 100% transfer
completion. The qemuMigrationWaitForCompletion() never breaks out as
well since the ssh connection to the destination is healthy, and the
source qemu also thinks the migration is ongoing as the FD to which it
transfers is never closed or broken. So, the migration will hang
forever. Even Ctrl-C on the virsh migrate wouldn't be honoured. Close
the source side FD when there is an error in the stream. That way, the
source qemu updates itself and qemuMigrationWaitForCompletion() notices
the failure.

Close the FD for all kinds of errors to be sure. The error message is not
copied for EPIPE so that the destination error is copied instead later.

Note:
Reproducible with repeated migrations between Power hosts running in different
subcores-per-core modes.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
2015-10-16 13:26:32 +02:00
Peter Krempa
01b4baba59 qemu: migration: Skip few checks while doing offline migration
qemuMigrationIsAllowed would disallow offline migration if the VM
contained host devices or memory modules. Since during offline migration
we don't transfer any state, we can safely migrate VMs with such a
configuration.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1265049
2015-10-07 09:10:05 +02:00
Peter Krempa
b6c44af0f7 qemu: migration: Use migration flags in qemuMigrationIsAllowed
Use the migration @flags for checking various migration aspects rather
than picking them out as booleans. Document the new semantics in the
function header.
2015-10-07 09:09:22 +02:00
Peter Krempa
f558c66f17 qemu: migration: Drop @def from qemuMigrationIsAllowed
Now that qemuMigrationIsAllowed is always called with @vm, we can drop
the @def argument and simplify the control flow.

Additionally the comment is invalid so drop it.
2015-10-07 09:09:21 +02:00
Peter Krempa
b866991f0c qemu: migration: Split source and destination migration checks
Extract the hostdev check from qemuMigrationIsAllowed into a separate
function since that is the only part that needs to be done in the v2
migration protocol prepare phase on the destination. All other checks
were added when the v3 protocol existed so they don't need to be
extracted.

This change will allow us to drop the @def argument for
qemuMigrationIsAllowed and further simplify the function.
2015-10-07 09:08:59 +02:00
Jiri Denemark
be5347bb72 qemu: Wait until destination QEMU consumes all migration data
Even though QEMU on the source host reports completed migration and thus
we move to the Finish phase, QEMU on the destination host may still be
processing migration data. Thus before we can start guest CPUs on the
destination, we have to wait for a completed migration event.

https://bugzilla.redhat.com/show_bug.cgi?id=1265902

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-10-06 16:15:21 +02:00
Jiri Denemark
829c62b7a5 qemu: Make updating stats in qemuMigrationCheckJobStatus optional
With new QEMU which supports migration events,
qemuMigrationCheckJobStatus needs to explicitly query QEMU for migration
statistics once migration is completed to make sure the caller sees
up-to-date statistics with both old and new QEMU. However, some callers
are not interested in the statistics at all and once we start waiting
for a completed migration on the destination host too, checking the
statistics would even fail. Let's push the decision whether to update
the statistics or not to the caller.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-10-06 16:15:21 +02:00
Jiri Denemark
2af983f4c4 qemu: Introduce flags in qemuMigrationCompleted
The function already has two bool parameters and we will need to add a
new one. Let's switch to flags to make the callers readable.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-10-06 16:15:21 +02:00
Jiri Denemark
b106c8b910 qemu: Copy completed migration stats only on success
The destination host gets detailed statistics about the current
migration from the source host via the migration cookie and copies them to
the domain object so that they can be queried using
virDomainGetJobStats. However, we should only copy statistics to the
domain object when migration finished successfully.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-10-06 16:15:21 +02:00
Jiri Denemark
d27c66dbaa qemu: Always update migration times on destination
Even if we are migrating a domain with VIR_MIGRATE_PAUSED flag set, we
should still update the total time of the migration. Updating downtime
doesn't hurt either, even though we don't actually start guest CPUs.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-10-06 16:15:21 +02:00
Michal Privoznik
f41be29635 qemu: Move vm->persistent check into qemuDomainRemoveInactive
So far we have the following pattern occurring over and over
again:

  if (!vm->persistent)
      qemuDomainRemoveInactive(driver, vm);

It's safe to put the check into the function and save some LoC.
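
A self-contained sketch of the refactor, with stub types standing in for
libvirt's far larger real structures:

    #include <stdbool.h>
    #include <stdio.h>

    /* Stub types; illustrative only. */
    typedef struct { int dummy; } virQEMUDriver;
    typedef struct { bool persistent; } virDomainObj;

    /* The guard now lives in the callee, so every caller can drop its
     * "if (!vm->persistent)" wrapper and call unconditionally. */
    static void
    qemuDomainRemoveInactive(virQEMUDriver *driver, virDomainObj *vm)
    {
        (void)driver;
        if (vm->persistent)
            return;
        printf("removing inactive transient domain\n");
    }

    int main(void)
    {
        virQEMUDriver driver = { 0 };
        virDomainObj vm = { .persistent = true };
        qemuDomainRemoveInactive(&driver, &vm); /* safely a no-op */
        return 0;
    }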

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-09-24 10:52:38 +02:00
Peter Krempa
59173c3dd9 conf: Add XML parser flag that will allow us to do incompatible updates
Add a new parser flag that will mark code paths that parse XML files
wich will not be used with existing VM state so that post parse
callbacks can possibly do ABI incompatible changes if needed.
2015-09-22 16:09:27 +02:00
Peter Krempa
1891cad542 conf: Add helper to determine whether memory hotplug is enabled for a vm
Add a simple helper so that the code doesn't have to rewrite the same
condition multiple times.
2015-09-22 16:09:27 +02:00
Jiri Denemark
79ccfec803 qemu: Fix some corner cases in persistent migration
When persistently migrating a domain to a destination host where the
same domain already exists (i.e., it is persistent and shut down at the
destination), we would happily throw away the original persistent
definition without properly freeing it. And when updating the definition
fails for some reason, we don't properly revert to the original state,
leaving the domain broken.

In addition to fixing these issues, the patch also makes sure the domain
definition parsed from a migration cookie is either used or freed.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-09-18 14:40:55 +02:00
Jiri Denemark
c641d55083 qemu: Queue events in migration Finish phase ASAP
For quite a long time we haven't needed to postpone queueing events
until the end of the function, since we no longer hold the big driver
lock.
Let's make the code of qemuMigrationFinish simpler by queuing events at
the time we generate them.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-09-18 13:50:04 +02:00
Jiri Denemark
cda2afac79 qemuDomainEventQueue: Check if event is non-NULL
Every single call to qemuDomainEventQueue() uses the following pattern:

    if (event)
        qemuDomainEventQueue(driver, event);

Let's move the check for valid event to qemuDomainEventQueue and
simplify all callers.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-09-18 13:50:03 +02:00
Jiri Denemark
5f7ad32778 qemu: Don't report false errors in migration protocol v2
Finish is the final state in v2 of our migration protocol. If something
fails, we have no option to abort the migration and resume the original
domain. Non-fatal errors (such as failure to start guest CPUs or make
the domain persistent) have to be treated as success. Keeping the domain
running while reporting the failure was just asking for trouble.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-09-18 13:50:03 +02:00
Jiri Denemark
cc40c28410 qemu: Kill domain when migration finish fails
Whenever something fails during incoming migration in the Finish phase
before we started guest CPUs, we need to kill the domain in addition to
reporting the failure.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-09-18 13:50:03 +02:00
Jiri Denemark
f5c509623f qemu: Don't fail migration on save status failure
When we save status XML at the point during migration where we have
already started the domain on the destination, we can't really go back and
abort migration. Thus the only thing we can do is to log a warning and
report success.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-09-18 13:50:02 +02:00
Jiri Denemark
8874d37f94 qemu: Simplify qemuMigrationFinish
Offline migration is quite special because we don't really need to do
anything but make the domain persistent. Let's do it separately from
normal migration to avoid cluttering the code with
!(flags & VIR_MIGRATE_OFFLINE).

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-09-18 13:49:49 +02:00
Jiri Denemark
a86b188567 qemu: Split qemuMigrationFinish
Separate code which makes incoming domain persistent into
qemuMigrationPersist.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-09-18 10:52:39 +02:00
Peter Krempa
a98e5a7815 qemu: migration: Relax enforcement of memory hotplug support
If the current live definition does not have memory hotplug enabled but
the persistent one does, libvirt would reject migration if the
destination does not support memory hotplug, even if the user didn't
want to persist the VM at the destination and thus the XML containing
the memory hotplug definition would not be used. To fix this corner case
the code will check for memory hotplug in the newDef only if
VIR_MIGRATE_PERSIST_DEST was used.
2015-09-09 09:39:55 +02:00
John Ferlan
ea3c5f25eb qemu: Check virGetLastError return value for migration finish failure
Commit id '2e7cea243' added a check for an error from Finish instead
of 'unexpected error'; however, if for some reason there wasn't an
error, then virGetLastError could return NULL, resulting in a NULL
pointer deref at err->domain.
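
A sketch of the resulting defensive pattern against libvirt's public
error API (the exact check in the migration code differs):

    #include <stdio.h>
    #include <libvirt/virterror.h>

    /* virGetLastError() may return NULL when no error was recorded,
     * so it must be checked before dereferencing err->domain. */
    static void
    reportLastError(void)
    {
        virErrorPtr err = virGetLastError();

        if (!err) {
            fprintf(stderr, "operation failed: unknown error\n");
            return;
        }
        fprintf(stderr, "operation failed: domain=%d code=%d: %s\n",
                err->domain, err->code,
                err->message ? err->message : "no message");
    }

    int main(void)
    {
        reportLastError(); /* prints "unknown error" if nothing failed */
        return 0;
    }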
2015-09-04 15:19:04 -04:00
Peter Krempa
6da3b694cc qemu: Forbid image pre-creation for non-shared storage migration
Libvirt doesn't reliably know the location of the backing chain when
pre-creating images for non-shared migration. This isn't a problem for
full copy, but incremental copy requires the information.

Forbid pre-creating the image in cases where incremental migration is
required. This limitation can perhaps be lifted once libvirt fully
supports loading of backing chain information from the XML.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1249587
2015-08-05 17:24:59 +02:00
Jiri Denemark
e8d0166e1d qemu: Do not reset labels when migration fails
When stopping a domain on the destination host after a failed migration,
we need to avoid resetting security labels since the domain is still
running on the source host. While we were correctly doing so in some
cases, there were still some paths which did this wrong.

https://bugzilla.redhat.com/show_bug.cgi?id=1242904

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-07-31 15:15:12 +02:00
Peter Krempa
136f3de411 qemu: Reject migration with memory-hotplug if destination doesn't support it
If the destination libvirt doesn't support memory hotplug (all the
support was introduced by adding new elements), the destination would
attempt to start qemu with an invalid configuration. The worse part is
that qemu might hang in such a situation.

Fix this by sending a required migration feature called 'memory-hotplug'
to the destination. If the destination doesn't recognize it, it will fail
the migration.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1248350
2015-07-30 16:44:02 +02:00
Michal Privoznik
cd043390ff qemuMigrationRun: Don't leak @fd
If we are migrating to a UNIX socket, we accept() a connection
from qemu and use that FD to set up a tunnel. However, the FD is
not closed as often as it should be.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-07-15 11:40:41 +02:00
Jiri Denemark
2e7cea2435 qemu: Use error from Finish instead of "unexpectedly failed"
When QEMU exits on destination during migration, the source reports
either success (if the failure happened at the very end) or unhelpful
"unexpectedly failed" error message. However, the Finish API called on
the destination may report a real error so let's use it instead of the
generic one.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-07-10 11:47:13 +02:00
Jiri Denemark
44c42b564d qemu: Don't report false error from MigrateFinish
virDomainMigrateFinish* APIs were unfortunately designed to return the
pointer to the domain on destination and NULL on error. This looks OK in
normal cases but the same API is also called when we know migration
failed and thus we expect Finish to return NULL even if it actually did
all it was supposed to do without any error. The call is defined to
return a nonnull domain pointer over RPC, which means returning NULL
will always result in an error being sent. If this was not in fact an
error, the API itself wouldn't set anything in the thread-local
virError, which makes the RPC layer come up with its own "Library
function returned error but did not set virError" error.

This is quite confusing and also hard to detect by the caller. This
patch adds a special error code which can be used to check that Finish
successfully aborted migration.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-07-10 11:47:13 +02:00
Jiri Denemark
e68f395fcb qemu: Remember incoming migration errors
If QEMU fails during incoming migration, the domain disappears including
a possibly useful error message read from QEMU log file. Let's remember
the error in virQEMUDriver so that Finish can report more than just "no
such domain".

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-07-10 11:47:13 +02:00
Jiri Denemark
3409f5bc4e qemu: Wait for migration events on domain condition
Since we already support the MIGRATION event, we just need to make sure
the domain condition is signalled whenever a p2p connection drops or the
domain is paused due to an IO error, and we can avoid waking up every
50 ms to check whether something happened.
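
In generic pthread terms the change amounts to the sketch below; libvirt
wraps the same pattern in its own domain condition helpers rather than
using raw pthreads directly:

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static bool state_changed;

    /* Waiter: blocks until a handler signals; no 50 ms polling. */
    void
    waitForStateChange(void)
    {
        pthread_mutex_lock(&lock);
        while (!state_changed)
            pthread_cond_wait(&cond, &lock);
        state_changed = false;
        pthread_mutex_unlock(&lock);
    }

    /* Called from the MIGRATION event handler, when a p2p connection
     * drops, or when the domain pauses due to an IO error. */
    void
    signalStateChange(void)
    {
        pthread_mutex_lock(&lock);
        state_changed = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }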

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-07-09 21:57:30 +02:00
Jiri Denemark
6d2edb6a42 qemu: Update migration state according to MIGRATION event
We don't need to call query-migrate every 50ms when we get the current
migration state via MIGRATION event.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-07-09 21:53:35 +02:00
Luyao Huang
09444724bc qemu: Avoid removing persistent config if migration fails
When migration fails in qemuMigrationPrepareAny, we unconditionally call
qemuDomainRemoveInactive, which should only be called for transient
domains. The check for !vm->persistent was accidentally removed by
commit 540c339.

Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-25 10:18:39 +02:00
Jiri Denemark
d823fa6f64 qemu: cancel drive mirrors when p2p connection breaks
When a connection to the destination host during a p2p migration drops,
we know we will have to cancel the migration; it doesn't make sense to
waste resources by trying to finish the migration. We already do so
after sending "migrate" command to QEMU and we should do it while
waiting for drive mirrors to become ready too.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:19:49 +02:00
Jiri Denemark
d29c45587b qemu: Refactor qemuMigrationWaitForCompletion
Checking the status of all parts of a migration and aborting it when
something fails is complex and makes the waiting loop hard to read.
This patch moves all the checks into a separate function similarly to
what was done for drive mirror loops.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:12 +02:00
Jiri Denemark
92b5bcccaa qemu: Don't pass redundant job name around
Instead of passing the current job name to several functions which
already know what the current job is, we can generate the name where we
actually need to use it.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:12 +02:00
Jiri Denemark
c1a7f199e8 qemu: Refactor qemuMigrationUpdateJobStatus
Once we start waiting for migration events instead of polling
query-migrate, priv->job.current will not be regularly updated anymore
because we will get the current status directly from the events. Thus
virDomainGetJob{Info,Stats} will have to query QEMU, but they can't just
blindly update priv->job.current structure. This patch introduces
qemuMigrationFetchJobStatus which just fills in a caller supplied
structure and makes qemuMigrationUpdateJobStatus a tiny wrapper around
it.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:12 +02:00
Jiri Denemark
2ad46e5b0e qemu: Do not poll for spice migration status
QEMU_CAPS_SEAMLESS_MIGRATION capability says QEMU supports
SPICE_MIGRATE_COMPLETED event. Thus we can just drop all code which
polls query-spice and replace it with waiting for the event.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:11 +02:00
Jiri Denemark
d814c70b3b qemu: Use domain condition for asyncAbort
To avoid polling for asyncAbort flag changes.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:11 +02:00
Jiri Denemark
e8f263e0d0 qemu: Cancel disk mirrors after libvirtd restart
When libvirtd is restarted during migration, we properly cancel the
ongoing migration (unless it managed to almost finish before the
restart). But if we were also migrating storage using NBD, we would
completely forget about the running disk mirrors.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:11 +02:00
Jiri Denemark
40cd0290dc qemu: Make qemuMigrationCancelDriveMirror usable without async job
We don't have an async job when reconnecting to existing domains after
libvirtd restart.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:10 +02:00
Jiri Denemark
a9ba39a1a7 qemu: Abort migration early if disk mirror failed
Abort migration as soon as we detect that some of the disk mirrors
failed. There's no sense in trying to finish memory migration first.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:10 +02:00
Jiri Denemark
cebb110f73 qemu: Cancel storage migration in parallel
Instead of cancelling disk mirrors sequentially, let's just call
block-job-cancel for all migrating disks and then wait until all
disappear.

In case we cancel disk mirrors at the end of a successful migration, we
also need to check that all block jobs completed successfully. Otherwise we
have to abort the migration.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:10 +02:00
Jiri Denemark
4172b96a3e qemu: Use domain condition for synchronous block jobs
By switching block jobs to use domain conditions, we can drop some
pretty complicated code in NBD storage migration.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:15:10 +02:00
Jiri Denemark
39564891f8 qemu: Properly report failed migration
Because we are polling we may detect some errors after we asked QEMU for
migration status even though they occurred before. If this happens and
QEMU reports migration completed successfully, we would happily report
the migration succeeded even though we should have cancelled it because
of the other error.

In practice it is not a big issue now, but it will become a much bigger
issue once the check for storage migration status is moved inside the
loop in qemuMigrationWaitForCompletion.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-06-19 15:13:16 +02:00
Pavel Boldin
93a19e283e qemu: migration: selective block device migration
https://bugzilla.redhat.com/show_bug.cgi?id=1203032

Implement a `migrate_disks' parameter for the QEMU driver. This multi-
value parameter can be used to explicitly specify what block devices
are to be migrated using the NBD server. Tunnelled migration using NBD
is yet to be done.

Signed-off-by: Pavel Boldin <pboldin@mirantis.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-06-18 16:46:09 +02:00
Michal Privoznik
cb7297c150 qemuMigrationDriveMirror: Force raw format for NBD
When playing with disk migration lately, I've noticed this warning in
domain logs:

WARNING: Image format was not specified for 'nbd://masina:49153/drive-virtio-disk0' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.

So I started digging into qemu source code to see what has triggered
the warning. I'd expect qemu to know the formats of the guest's disks
since we tell it about them on the command line. This led me to
qmp_drive_mirror() where
the following can be found:

    if (!has_format) {
        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
    }

So, the format is automatically initialized from the disk iff mode !=
"existing". Unfortunately, in migration we are tied to using this mode
(NBD doesn't support creating new images). Therefore the only way to
avoid this warning is to pass the format explicitly. The discussion on
the mailing list [1] resulted in the code that always forces the NBD
export to use the "raw" format.

[1] https://www.redhat.com/archives/libvir-list/2015-June/msg00153.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Pavel Boldin <pboldin@mirantis.com>
2015-06-18 16:46:09 +02:00
Michal Privoznik
9c5efd1afd qemuMigrationBeginPhase: Fix function header indentation
This function returns a string (domain XML). Since commit d3ce7363,
when it was first introduced, it has been indented incorrectly:

static char
*qemuMigrationBeginPhase(..)

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-06-18 16:46:09 +02:00
Daniel P. Berrange
d587704cc7 rpc: allow selection of TCP address family
By default, getaddrinfo() will return addresses for both
IPv4 and IPv6 if both protocols are enabled, and so the
RPC code will listen/connect to both protocols too. There
may be cases where it is desirable to restrict this to
just one of the two protocols, so add an 'int family'
parameter to all the TCP related APIs.
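
A self-contained illustration of the mechanism: the hints.ai_family
field is what such an 'int family' parameter feeds into (16509 is
libvirtd's conventional TCP port; the real RPC code is more involved):

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>

    int main(void)
    {
        struct addrinfo hints, *res, *p;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;    /* AF_UNSPEC returns IPv4 and IPv6 */
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_PASSIVE;  /* addresses suitable for listening */

        if (getaddrinfo(NULL, "16509", &hints, &res) != 0)
            return 1;
        for (p = res; p; p = p->ai_next)
            printf("family=%d\n", p->ai_family); /* only AF_INET now */
        freeaddrinfo(res);
        return 0;
    }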

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-06-11 12:11:18 +01:00
Ján Tomko
12b949dfb2 maint: remove incorrect apostrophes from 'its'
2015-06-04 10:01:42 +02:00
Jiri Denemark
82cffb58a1 Use virDomainDiskByName where appropriate
Most virDomainDiskIndexByName callers do not care about the index; what
they really want is a disk def pointer.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-05-21 14:35:02 +02:00
Jiri Denemark
a692277873 qemu: Don't give up on first error in qemuMigrationCancelDriveMirror
When cancelling drive mirror, always try to do that for all disks even
if it fails for some of them. Report the first error we saw.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-05-15 08:05:31 +02:00
Jiri Denemark
5139924b8d qemu: Keep track of what disks are being migrated
Instead of redoing the same filtering over and over every time we need to
walk through all disks which are being migrated.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-05-15 08:05:31 +02:00
Jiri Denemark
46a7a49535 Move QEMU-only fields from virDomainDiskDef into privateData
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-05-15 08:05:31 +02:00
Jiri Denemark
078717e151 Rename virDomainHasBlockjob as qemuDomainHasBlockjob
And move it to qemu_domain.[ch] because this API is QEMU-only.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-05-15 08:05:26 +02:00
zhang bo
7eb5b4bf6f qemuMigrationPrepareAny: Drop useless variable @now
As of eeb008dbfc the variable is not used anymore. Drop it.

Signed-off-by: Wang Yufei <james.wangyufei@huawei.com>
Signed-off-by: Zhang Bo <oscar.zhangbo@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-05-13 16:50:20 +02:00
Jiri Denemark
fc3601a308 qemu: Properly rename persistent def after migration
When migrating a domain while changing its name and using
VIR_MIGRATE_PERSIST_DEST flag, libvirt would fail to properly change the
name in the persistent definition. The inconsistency results in weird
behavior when dumping domain XML, destroying the domain, restarting
libvirtd and likely in several other situations.

Since the new name is already stored in vm->def->name, we just need to
make sure the persistent definition uses this new name too.

https://bugzilla.redhat.com/show_bug.cgi?id=1076354

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-05-04 22:59:51 +02:00
Jiri Denemark
b45ec56f58 qemu: Forbid unsupported parameters for tunnelled migration
Neither the migrate URI nor the listen address makes any sense for
tunnelled migration.

https://bugzilla.redhat.com/show_bug.cgi?id=1066375
https://bugzilla.redhat.com/show_bug.cgi?id=1073233

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
2015-05-04 15:06:33 +02:00
Michael Chapman
99725f946c qemu: migration: use sync block job helpers
In qemuMigrationDriveMirror we can start all disk mirrors in parallel.
We wait until they are all ready, or one of them aborts.

In qemuMigrationCancelDriveMirror, we wait until all mirrors are
properly stopped. This is necessary to ensure that the destination VM is
fully in sync with the (paused) source VM.

If a drive mirror cannot be cancelled, then the destination is not in a
consistent state. In this case it is not safe to continue with the
migration.

Signed-off-by: Michael Chapman <mike@very.puzzling.org>
2015-04-29 13:11:42 +02:00
Jiri Denemark
aa9f139599 migration: Usable time statistics without requiring NTP
virDomainGetJobStats is able to report statistics of a completed
migration; however, to get usable downtime and total time statistics,
both hosts have to keep synchronized time. To provide at least some
estimation of the times even when NTP daemons are not running on both
hosts, we can just ignore the time needed to transfer a migration cookie
to the destination host. The result will also be inaccurate, but a bit
more predictable. The total/down time will just be at least what we
report.

https://bugzilla.redhat.com/show_bug.cgi?id=1213434
2015-04-24 15:02:00 +02:00
Michal Privoznik
79d14a9930 Introduce virDomainObjEndAPI
This is basically turning qemuDomObjEndAPI into a more general
function. Other drivers which get a reference to domain objects may
benefit from this function too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-04-24 13:22:45 +02:00
Peter Krempa
bd57977391 qemu: migration: Refactor hostdev validation in migration check
The hostdev check can error out right away.
2015-04-22 14:05:50 +02:00
Cole Robinson
835cf84b7e domain: conf: Drop expectedVirtTypes
This needs to be specified in way too many places for a simple validation
check. The ostype/arch/virttype validation checks later in
DomainDefParseXML should catch most of the cases that this was covering.
2015-04-20 16:43:43 -04:00
Huanle Han
c61ded8a7d qemu: fix index error when clean up vport profile
1. 'last_good_net' indicates the index of the last successfully
configured net, so def->nets[last_good_net] should also be cleaned up if
an error occurs.

2. If an error occurs in 'virNetDevMacVLanVPortProfileRegisterCallback'
(second 'goto err_exit' in the loop), we should also do the
'virNetDevVPortProfileDisassociate' cleanup for the
'virNetDevVPortProfileAssociate' (first code block in the loop). So we
should consider the net successfully configured after the first code
block in the loop finishes.

Signed-off-by: Huanle Han <hanxueluo@gmail.com>
2015-04-14 14:49:15 +02:00
Peter Krempa
cfc0a3d4ce qemu: blockjob: Separate qemuDomainBlockJobAbort from qemuDomainBlockJobImpl
Sacrifice a few lines of code in favor of the code being more readable.
2015-04-14 10:00:56 +02:00
Michal Privoznik
65a88572ad qemuMigrationPrecreateStorage: Fix debug message
When pre-creating storage for domains, we need to find corresponding
disk in the XML on the destination (domain XML may differ there, e.g.
disk is accessible under different path). For better debugging, I'm
printing all info I received on a disk. But there was a typo when
printing the disk capacity: "%lluu" instead of "%llu".

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-04-13 11:40:57 +02:00
Xing Lin
522e81cbb5 qemu_migration.c: sleep first before checking for migration status.
The problem with the previous implementation is that even when
qemuMigrationUpdateJobStatus() detects a migration job has completed, it
will sleep for 50 ms (which is unnecessary and only adds to the VM
pause time).
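
A toy model of the reordering; the completion test is a hypothetical
stand-in for qemuMigrationUpdateJobStatus():

    #include <stdbool.h>
    #include <unistd.h>

    /* Hypothetical stand-in; the real code queries the QEMU monitor. */
    static bool
    migrationJobCompleted(void)
    {
        static int polls;
        return ++polls >= 3; /* pretend the job finishes on poll #3 */
    }

    int main(void)
    {
        /* Sleep first, then check: once completion is detected the
         * loop exits immediately instead of napping another 50 ms. */
        for (;;) {
            usleep(50 * 1000);
            if (migrationJobCompleted())
                break;
        }
        return 0;
    }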

Signed-off-by: Xing Lin <xinglin@cs.utah.edu>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-04-13 09:52:28 +02:00
Michael Chapman
72df8314f0 qemu_migrate: use nested job when adding NBD to cookie
qemuMigrationCookieAddNBD is usually called from within an async
MIGRATION_OUT or MIGRATION_IN job, so it needs to start a nested job.

(The one exception is during the Begin phase when change protection
isn't enabled, but qemuDomainObjEnterMonitorAsync will behave the same
as qemuDomainObjEnterMonitor in this case.)

This bug was encountered with a libvirt client that repeatedly queries
the disk mirroring block job info during a migration. If one of these
queries occurs just as the Perform migration cookie is baked, libvirt
crashes.

Relevant logs are as follows:

    6701: warning : qemuDomainObjEnterMonitorInternal:1544 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
[1] 6701: info : qemuMonitorSend:972 : QEMU_MONITOR_SEND_MSG: mon=0x7fefdc004700 msg={"execute":"query-block","id":"libvirt-629"}
[2] 6699: info : qemuMonitorIOWrite:503 : QEMU_MONITOR_IO_WRITE: mon=0x7fefdc004700 buf={"execute":"query-block","id":"libvirt-629"}
[3] 6704: info : qemuMonitorSend:972 : QEMU_MONITOR_SEND_MSG: mon=0x7fefdc004700 msg={"execute":"query-block-jobs","id":"libvirt-630"}
[4] 6699: info : qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_REPLY: mon=0x7fefdc004700 reply={"return": [...], "id": "libvirt-629"}
    6699: error : qemuMonitorJSONIOProcessLine:211 : internal error: Unexpected JSON reply '{"return": [...], "id": "libvirt-629"}'

At [1] qemuMonitorBlockStatsUpdateCapacity sends its request, then waits
on mon->notify. At [2] the request is written out to the monitor socket.
At [3] qemuMonitorBlockJobInfo sends its request, and also waits on
mon->notify. The reply from the first request is received at [4].
However, qemuMonitorJSONIOProcessLine is not expecting this reply since
the second request hadn't completed sending. The reply is dropped and an
error is returned.

qemuMonitorIO signals mon->notify twice during its error handling,
waking up both of the threads waiting on it. One of them clears mon->msg
as it exits qemuMonitorSend; the other crashes:

  qemuMonitorSend (mon=0x7fefdc004700, msg=<value optimized out>) at qemu/qemu_monitor.c:975
  975         while (!mon->msg->finished) {
  (gdb) print mon->msg
  $1 = (qemuMonitorMessagePtr) 0x0

Signed-off-by: Michael Chapman <mike@very.puzzling.org>
2015-04-08 10:30:17 +02:00
Michael Chapman
e5d729ba42 qemu: fix race between disk mirror fail and cancel
If a VM migration is aborted, a disk mirror may be failed by QEMU before
libvirt has a chance to cancel it. The disk->mirrorState remains at
_ABORT in this case, and this breaks subsequent mirrorings of that disk.

We should instead check the mirrorState directly and transition to _NONE
if it is already aborted. Do the check *after* aborting the block job in
QEMU to avoid a race.

Signed-off-by: Michael Chapman <mike@very.puzzling.org>
2015-04-08 09:45:47 +02:00
Michael Chapman
77ddd0bba2 qemu: fix error propagation in qemuMigrationBegin
If virCloseCallbacksSet fails, qemuMigrationBegin must return NULL to
indicate an error occurred.

Signed-off-by: Michael Chapman <mike@very.puzzling.org>
2015-04-08 09:45:47 +02:00
Shanzhi Yu
ffe3d3e886 conf: Rename virDomainHasDiskMirror and detect block jobs properly
virDomainHasDiskMirror() currently detects only jobs that add the mirror
elements. Since some operations like migration are interlocked by
existing block jobs on the given domain, the check needs to be extended
to cover regular jobs too.

This patch renames virDomainHasDiskMirror to virDomainHasDiskBlockjob
and adds an argument that allows selecting whether it returns true only
for block copy jobs, as those interlock making the domain persistent.

The other two uses trigger on any block job type.

Signed-off-by: Shanzhi Yu <shyu@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
2015-04-02 10:37:47 +02:00
Peter Krempa
c5710066e8 qemu: migration: Forbid migration with memory modules lacking info
Make sure that libvirt has all vital information needed to reliably
represent configuration of guest's memory devices in case of a
migration.

This patch forbids migration in case the required slot number and module
base address are not present (failed to be loaded from qemu via
monitor).
2015-03-23 14:25:15 +01:00
Michael Chapman
a1b1805155 qemu: skip precreation of network disks
Commit cf54c60699 introduced the ability
to create missing storage volumes during migration. For network disks,
however, we may not necessarily be able to detect whether they already
exist -- there is no straightforward way to map the disk to a storage
volume, and even if there were it's possible no configured storage pool
actually contains the disk.

It is better to assume the network disk exists in this case, rather than
aborting the migration completely. If the volume really is missing, QEMU
will generate an appropriate error later in the migration.

Signed-off-by: Michael Chapman <mike@very.puzzling.org>
2015-03-23 10:25:20 +01:00
Eric Blake
e2660cb8a6 qemu: track 'cancelling' migration state
In qemu 2.3, the migration status will include 'cancelling' in the
window between when an asynchronous cancel has been requested and
when the migration is actually halted.  Previously, qemu hid this
state and reported 'active'.  Libvirt manages the sequence okay
even when the string is unrecognized (that is, it will report an
unknown state:

Migration: [ 69 %]^Cerror: internal error: unexpected migration status in cancelling.

but the migration is still cancelled), but recognizing the string
makes for a smoother user experience.

* src/qemu/qemu_monitor.h
(QEMU_MONITOR_MIGRATION_STATUS_CANCELLING): Add enum.
* src/qemu/qemu_monitor.c (qemuMonitorMigrationStatus): Map it.
* src/qemu/qemu_migration.c (qemuMigrationUpdateJobStatus): Adjust
clients.
* src/qemu/qemu_monitor_json.c
(qemuMonitorJSONGetMigrationStatusReply): Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
2015-03-18 14:59:34 -06:00
Laine Stump
451547a422 util: clean up #includes of virnetdevopenvswitch.h
virnetdevopenvswitch.h declares a few functions that can be called to
add ports to and remove them from OVS bridges, and retrieve the
migration data for a port. It does not contain any data definitions
that are used by domain_conf.h. But for some reason, domain_conf.h was
#including virnetdevopenvswitch.h; the files that actually use it should
be directly #including it instead. This adds a
few lines to the project, but saves all the files that don't need it
from the extra computing, and makes the dependencies more clear cut.
2015-03-18 14:43:47 -04:00
Pavel Hrdina
cf521fc8ba memtune: change the way how we store unlimited value
There was a mess in the way we store the unlimited value for memory
limits and how we handled values provided by the user. Internally there
were two possible ways to store the unlimited value: as 0 or as
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED. Because we chose to store memory
limits as unsigned long long, we cannot use -1 to represent unlimited.
It's much easier for us to say that everything at or above
VIR_DOMAIN_MEMORY_PARAM_UNLIMITED means unlimited and leave 0 as a valid
value, despite that it makes no sense to set a limit to 0.
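
A sketch of the resulting convention using libvirt's public constant;
the inclusive boundary matches the description above, and the helper
name here is an assumption:

    #include <stdbool.h>
    #include <libvirt/libvirt.h>

    /* Values at or above VIR_DOMAIN_MEMORY_PARAM_UNLIMITED mean "no
     * limit"; 0 remains an ordinary (if useless) limit value. */
    static bool
    memLimitIsUnlimited(unsigned long long limit)
    {
        return limit >= VIR_DOMAIN_MEMORY_PARAM_UNLIMITED;
    }

    int main(void)
    {
        /* 0 must not be treated as unlimited. */
        return memLimitIsUnlimited(0) ? 1 : 0;
    }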

Remove the unnecessary function virCompareLimitUlong. The test update
is to prevent 0 from being misused as unlimited in the future.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1146539

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2015-03-06 11:52:24 +01:00
Michal Privoznik
37cf163ab2 virQEMUCapsCacheLookupCopy: Pass machine type
It will come in handy in the near future when we filter some
capabilities based on it.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-02-20 13:27:59 +01:00
Michal Privoznik
80c5f10e86 qemuMigrationDriveMirror: Listen to events
https://bugzilla.redhat.com/show_bug.cgi?id=1179678

When migrating with storage, libvirt iterates over domain disks and
instructs qemu to migrate the ones we are interested in (shared, RO and
source-less disks are skipped). The disks are migrated in series. No
new disk is transferred until the previous one has been quiesced.
This is checked on the qemu monitor via the 'query-block-jobs' command.
If the disk has been quiesced, it practically went from copying its
content to mirroring state, where all disk writes are mirrored to the
other side of the migration too. Having said that, there's one inherent
error in the design. The monitor command we use reports only active
jobs. So if the job fails for whatever reason, we will not see it
anymore in the command output. And this can happen fairly easily: just
try to migrate a domain with storage. If the storage migration fails
(e.g. due to ENOSPC on the destination) we resume the guest on the
destination and let it run on a partly copied disk.

The proper fix is what even the comment in the code says: listen for
qemu events instead of polling. If the storage migration changes state,
an event is emitted and we can act accordingly: either consider the disk
copied and continue the process, or consider the disk mangled and abort
the migration.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-02-19 14:12:38 +01:00
Luyao Huang
45853b5289 qemu: fix crash when migrateuri has no scheme
https://bugzilla.redhat.com/show_bug.cgi?id=1191355

When we attempt to migrate a vm with a migrateuri that has no scheme:

 # virsh migrate test4 --live qemu+ssh://lhuang/system --migrateuri 127.0.0.1

target libvirtd will crash because uri->scheme is NULL in
qemuMigrationPrepareDirect on this line:

     if (STRNEQ(uri->scheme, "tcp") &&

Add a value check before this line. Also fix a similar bug in
doNativeMigrate, which could only happen when the destination libvirtd
returned an incorrect URI.
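
A self-contained sketch of the added check; STRNEQ is redefined here so
the snippet compiles on its own, and the accepted scheme list is
illustrative rather than the exact one in qemuMigrationPrepareDirect:

    #include <stdio.h>
    #include <string.h>

    #define STRNEQ(a, b) (strcmp(a, b) != 0) /* libvirt's helper macro */

    struct testURI { const char *scheme; };

    static int
    checkMigrateScheme(struct testURI *uri)
    {
        if (!uri->scheme) { /* the missing NULL check */
            fprintf(stderr, "missing scheme in migration URI\n");
            return -1;
        }
        if (STRNEQ(uri->scheme, "tcp") && STRNEQ(uri->scheme, "rdma")) {
            fprintf(stderr, "unsupported scheme: %s\n", uri->scheme);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        struct testURI uri = { NULL }; /* e.g. --migrateuri 127.0.0.1 */
        return checkMigrateScheme(&uri) < 0 ? 1 : 0;
    }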

Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2015-02-11 13:20:30 +01:00
Luyao Huang
1b2c9ce752 qemu: Properly report error on uuid mismatch in the migration cookie
Add the missing jump to the error label when the uuid in the
migration cookie XML does not match the uuid of the migrated
domain.

Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2015-02-05 08:14:36 +01:00
Ján Tomko
5c703ca396 Always check return value of qemuDomainObjExitMonitor
Depending on the context, either error out if the domain
has disappeared in the meantime, or just ignore the value
to allow marking the function as ATTRIBUTE_RETURN_CHECK.
2015-01-19 10:12:32 +01:00