Calling qemuProcessStop without a job opens the door to race conditions
with qemuDomainObjExitMonitor called from another thread. A real-world
example of such a race condition:
- migration thread (A) calls qemuMigrationWaitForSpice
- another thread (B) starts processing qemuDomainAbortJob API
- thread B signals thread A via qemuDomainObjAbortAsyncJob
- thread B enters monitor (qemuDomainObjEnterMonitor)
- thread B calls qemuMonitorSend
- thread A awakens and calls qemuProcessStop
- thread A calls qemuMonitorClose and sets priv->mon to NULL
- thread B calls qemuDomainObjExitMonitor with priv->mon == NULL
=> monitor stays ref'ed and locked
Depending on how lucky we are, the race may result in a memory leak or
it can even deadlock libvirtd's event loop if it tries to lock the
monitor to process an event received before qemuMonitorClose was called.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
After removing the capability check for fd migration, the code that was
left behind didn't quite make sense: the old exec migration would be used
when pipe() failed. Remove the old code and make a failure of pipe() a
hard error.
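The failure path then becomes a plain hard error, roughly like this
(a sketch only; variable names are illustrative):
    int dataFD[2] = { -1, -1 };
    if (pipe(dataFD) < 0) {
        /* no exec fallback anymore: report the system error and bail out */
        virReportSystemError(errno, "%s", _("cannot create pipe"));
        goto cleanup;
    }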
This additionally removes usage of virCgroupAllowDevicePath outside of
qemu_cgroup.c.
https://bugzilla.redhat.com/show_bug.cgi?id=1293351
Since we already have virtio channel events, we know when the guest
agent inside the guest has (dis-)connected. Instead of blindly
connecting to a socket that no one is listening on, we can just
follow what qemu-ga does. This has the nice benefit that we don't
need to 'guest-ping' the agent just to time out and find out that
nobody is listening.
The way this commit is implemented:
- don't connect in qemuProcessLaunch directly; defer that to the event
  callback which already follows the agent (processSerialChangedEvent)
- after migration is settled, before we resume vCPUs, ask qemu
  whether somebody is listening on the socket and, if so, connect
  to it.
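A rough sketch of the event-driven part (simplified; the exact helper and
field names in libvirt differ slightly):
    /* In the VSERPORT_CHANGE (serial state changed) event handler:
     * open the agent connection only once the guest has actually
     * connected to the virtio channel, and drop it on disconnect. */
    if (connected)
        qemuConnectAgent(driver, vm);
    else
        qemuAgentClose(priv->agent);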
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
When starting a qemu process, there are certain checks done to ensure
that the configuration makes sense. Extract them into a separate
function so that they can be reused in the test code.
The virDomainObjFormat and virDomainSaveStatus methods
both call into virDomainDefFormat, so they should be providing
a non-NULL virCapsPtr instance.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
virDomainSaveConfig calls virDomainDefFormat, which was setting the caps
to NULL, thus keeping the old behaviour (i.e. not looking at
netprefix). This patch adds the virCapsPtr argument to the function,
allowing the configuration to be saved while skipping interface names
that were registered with virCapabilitiesSetNetPrefix().
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
This function may be called with @dconnuri == NULL, e.g. from
virDomainMigrateToURI3() if the flags are missing the
VIR_MIGRATE_PEER2PEER flag. Moreover, all later functions called
from here wrap it in NULLSTR(), so why not do the same here?
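For instance (an illustrative use; NULLSTR simply maps NULL to a printable
placeholder):
    VIR_DEBUG("dconnuri=%s", NULLSTR(dconnuri));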
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
memory_dirty_rate corresponds to dirty-pages-rate in QEMU and
memory_iteration is what QEMU reports in dirty-sync-count.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The structure actually contains migration statistics rather than just
the status as the name suggests. Renaming it to
qemuMonitorMigrationStats removes the confusion.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
My commit 674afcb09e3d33500cfbbcf870ebf92cb99ecfa3 moved computing the
default listen address from qemuMigrationPrepareAny to
qemuMigrationPrepareIncoming. However, I didn't notice listenAddress was
later passed to qemuMigrationStartNBDServer. Thus, it would be called
with the original value of listenAddress (NULL).
Let's add the updated listen address to qemuProcessIncomingDef and use
it when starting NBD servers.
Reported-by: Michael Chapman <mike@very.puzzling.org>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Valgrind complained:
==18990== 20 (16 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 188 of 996
==18990== at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==18990== by 0x5292E9B: virAllocN (viralloc.c:191)
==18990== by 0x2221E731: qemuMigrationCookieXMLParseStr (qemu_migration.c:1012)
==18990== by 0x2221F390: qemuMigrationEatCookie (qemu_migration.c:1413)
==18990== by 0x222228CE: qemuMigrationPrepareAny (qemu_migration.c:3463)
==18990== by 0x22224121: qemuMigrationPrepareDirect (qemu_migration.c:3865)
==18990== by 0x22251C25: qemuDomainMigratePrepare3Params (qemu_driver.c:12414)
==18990== by 0x5389EE0: virDomainMigratePrepare3Params (libvirt-domain.c:5107)
==18990== by 0x1278DB: remoteDispatchDomainMigratePrepare3ParamsHelper (remote.c:5425)
==18990== by 0x53FF287: virNetServerProgramDispatch (virnetserverprogram.c:437)
==18990== by 0x540523D: virNetServerProcessMsg (virnetserver.c:135)
==18990== by 0x54052C7: virNetServerHandleJob (virnetserver.c:156)
==18990==
==18990== 20 (16 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 189 of 996
==18990== at 0x4A057BB: calloc (vg_replace_malloc.c:593)
==18990== by 0x5292E9B: virAllocN (viralloc.c:191)
==18990== by 0x2221E731: qemuMigrationCookieXMLParseStr (qemu_migration.c:1012)
==18990== by 0x2221F390: qemuMigrationEatCookie (qemu_migration.c:1413)
==18990== by 0x222249D2: qemuMigrationRun (qemu_migration.c:4395)
==18990== by 0x22226365: doNativeMigrate (qemu_migration.c:4693)
==18990== by 0x22228E45: qemuMigrationPerform (qemu_migration.c:5553)
==18990== by 0x2225144B: qemuDomainMigratePerform3Params (qemu_driver.c:12621)
==18990== by 0x539F5D8: virDomainMigratePerform3Params (libvirt-domain.c:5206)
==18990== by 0x127305: remoteDispatchDomainMigratePerform3ParamsHelper (remote.c:5557)
==18990== by 0x53FF287: virNetServerProgramDispatch (virnetserverprogram.c:437)
==18990== by 0x540523D: virNetServerProcessMsg (virnetserver.c:135)
If we're replacing the NBD data, it's simplest to free the old object
(including the disk list) and allocate a new one.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Currently the QEMU monitor is given an FD to the logfile. This
won't work in the future with virtlogd, so it needs to use the
qemuDomainLogContextPtr instead, but it shouldn't directly
access that object either. So define a callback that the
monitor can use for reporting errors from the log file.
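The callback might look roughly like this (a sketch only; the typedef and
argument names here are illustrative, not the exact ones introduced by the
patch):
    /* Called by the monitor when it hits EOF or an error and wants the
     * caller to extract a useful error message from the QEMU log. */
    typedef void (*qemuMonitorReportLogError)(qemuMonitorPtr mon,
                                              const char *msg,
                                              void *opaque);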
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are two pretty similar functions, qemuProcessReadLog and
qemuProcessReadChildErrors. Both read from the QEMU log file
and try to strip out libvirt messages. The latter then reports
an error, while the former lets the callers report an error.
Rewrite qemuProcessReadLog so that it uses a single read
into a dynamically allocated buffer, then introduce a new
qemuProcessReportLogError that calls qemuProcessReadLog
and reports an error.
Convert all callers to use qemuProcessReportLogError.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Using qemuProcess{Init,Launch,FinishStartup} allows us to run
pre-migration commands on destination before asking QEMU to wait for
incoming migration data.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
NBD storage migration will not work with offline migration anyway, and we
already checked that the user did not ask for it. Thus it doesn't make
sense to keep the code after the 'done' label, where we jump in case of
offline migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Some failure paths in qemuMigrationPrepareAny forgot to kill the
just-started QEMU process. This patch fixes that by combining the 'stop'
and 'endjob' labels into a new 'stopjob' label. This name was chosen to
avoid confusion with the most common semantics of 'endjob': normally,
'endjob' is always called at the end of an API to stop the job we entered
at the beginning. In qemuMigrationPrepareAny we only want to stop the job
on the failure path; on success we need to carry the job over to the
Finish phase.
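In simplified form, the resulting control flow looks roughly like this
(an illustrative shape only, not the actual libvirt code):
    /* Sketch: both failure paths end the job, but only 'stopjob'
     * also kills the freshly started QEMU process. */
    static int
    prepare_sketch(int qemu_started, int rest_ok)
    {
        if (!qemu_started)
            goto cleanup;       /* nothing to kill yet */
        if (!rest_ok)
            goto stopjob;       /* QEMU is running and must be killed */
        return 0;               /* success: job carried over to Finish */
     stopjob:
        /* stop the freshly started QEMU process here */
     cleanup:
        /* end the migration job here */
        return -1;
    }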
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Traditionally, we pass the incoming migration URI on the QEMU command
line, which has some drawbacks. Depending on the URI, QEMU may initialize
its migration state immediately without giving us a chance to set any
additional migration parameters (this applies mainly to fd: URIs). For
some URIs the monitor may be completely blocked from the beginning until
migration is finished, which means we may be stuck in the qmp_capabilities
command without being able to send any QMP commands.
QEMU solved this by introducing the "defer" parameter for the -incoming
command line option. This tells QEMU to prepare for an incoming migration
while the actual incoming URI is sent using the migrate-incoming QMP
command. Before calling this command we can talk to the monitor as usual
and even set migration parameters, which will be honored by the
incoming migration.
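For example, the destination QEMU can be started with
    -incoming defer
on its command line and later told where to listen via QMP (the URI below
is just an example):
    { "execute": "migrate-incoming",
      "arguments": { "uri": "tcp:[::]:49152" } }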
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Since we require QEMU 0.12.0, we can assume that QEMU supports
all of the fd, tcp, unix and exec migration protocols.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code reported that a migration flag is unsupported but didn't jump
to the error label. Probably an oversight in commit f88af9dc, which
introduced the flag checking.
Since the flag was not enabled when 'eating' the migration cookie,
libvirt reported a bogus error when memory hotplug was enabled:
  unsupported migration cookie feature memory-hotplug
The error was ignored, though, due to a bug in the code, so it slipped
through testing.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1278404
In commit f41be296 we moved the vm->persistent check into
qemuDomainRemoveInactive, but in some places we did not change
vm->persistent before calling qemuDomainRemoveInactive and just called it
to remove the inactive vm.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Tunnelled migration can hang if the destination qemu exits despite all the
ABI checks. This happens whenever the destination qemu exits before the
complete transfer is noticed by the source qemu. The savevm state checks at
runtime can fail at the destination and cause qemu to error out.
The source qemu can't notice it as the EPIPE is not propagated to it.
qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
and cleans up the stream alone, so qemuMigrationWaitForCompletion() would
never get to 100% transfer completion.
qemuMigrationWaitForCompletion() never breaks out either, since the ssh
connection to the destination is healthy, and the source qemu also thinks
the migration is ongoing as the FD to which it transfers is never closed
or broken. So the migration hangs forever; even Ctrl-C on the virsh
migrate wouldn't be honoured. Close the source side FD when there is an
error in the stream. That way, the source qemu updates itself and
qemuMigrationWaitForCompletion() notices the failure.
Close the FD for all kinds of errors to be sure. The error message is not
copied for EPIPE so that the destination error is copied instead later.
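Roughly (a sketch; the structure and field names are illustrative):
    /* In qemuMigrationIOFunc's error path: force-close the FD the
     * source QEMU is writing to, so it notices the broken migration
     * and qemuMigrationWaitForCompletion() can bail out. */
    VIR_FORCE_CLOSE(data->sock);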
Note:
Reproducible with repeated migrations between Power hosts running in different
subcores-per-core modes.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
qemuMigrationIsAllowed would disallow offline migration if the VM
contained host devices or memory modules. Since during offline migration
we don't transfer any state, we can safely migrate VMs with such a
configuration.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1265049
Use the migration @flags for checking various migration aspects rather
than picking them out as booleans. Document the new semantics in the
function header.
Now that qemuMigrationIsAllowed is always called with @vm, we can drop
the @def argument and simplify the control flow.
Additionally, the comment is invalid, so drop it.
Extract the hostdev check from qemuMigrationIsAllowed into a separate
function since that is the only part that needs to be done in the v2
migration protocol prepare phase on the destination. All other checks
were added when the v3 protocol already existed, so they don't need to be
extracted.
This change will allow dropping the @def argument of
qemuMigrationIsAllowed and further simplifying the function.
Even though QEMU on the source host reports completed migration and thus
we move to the Finish phase, QEMU on the destination host may still be
processing migration data. Thus before we can start guest CPUs on the
destination, we have to wait for a completed migration event.
https://bugzilla.redhat.com/show_bug.cgi?id=1265902
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
With new QEMU which supports migration events,
qemuMigrationCheckJobStatus needs to explicitly query QEMU for migration
statistics once migration is completed to make sure the caller sees
up-to-date statistics with both old and new QEMU. However, some callers
are not interested in the statistics at all and once we start waiting
for a completed migration on the destination host too, checking the
statistics would even fail. Let's push the decision whether to update
the statistics or not to the caller.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The function already has two bool parameters and we will need to add a
new one. Let's switch to flags to make the callers more readable.
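For illustration only (the actual flag names may differ), the idea is:
    typedef enum {
        QEMU_MIGRATION_COMPLETED_ABORT_ON_ERROR = (1 << 0),
        QEMU_MIGRATION_COMPLETED_UPDATE_STATS   = (1 << 1),
        /* further flags can be added without touching every caller */
    } qemuMigrationCompletedFlags;
so call sites pass a readable combination of named flags instead of a row
of anonymous true/false arguments.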
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The destination host gets detailed statistics about the current
migration from the source host via the migration cookie and copies them
to the domain object so that they can be queried using
virDomainGetJobStats. However, we should only copy the statistics to the
domain object when migration finished successfully.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Even if we are migrating a domain with the VIR_MIGRATE_PAUSED flag set, we
should still update the total time of the migration. Updating downtime
doesn't hurt either, even though we don't actually start guest CPUs.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
So far we have the following pattern occurring over and over
again:
    if (!vm->persistent)
        qemuDomainRemoveInactive(driver, vm);
It's safe to put the check into the function and save some LoC.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Add a new parser flag that will mark code paths that parse XML files
which will not be used with existing VM state, so that post parse
callbacks can possibly do ABI-incompatible changes if needed.
When persistently migrating a domain to a destination host where the
same domain already exists (i.e., it is persistent and shut down at the
destination), we would happily throw away the original persistent
definition without properly freeing it. And when updating the definition
fails for some reason, we don't properly revert to the original state,
leaving the domain broken.
In addition to fixing these issues, the patch also makes sure the domain
definition parsed from a migration cookie is either used or freed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
For quite a long time we have not needed to postpone queueing events
until the end of the function, since we no longer hold the big driver
lock. Let's make the code of qemuMigrationFinish simpler by queueing
events at the time we generate them.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Every single call to qemuDomainEventQueue() uses the following pattern:
    if (event)
        qemuDomainEventQueue(driver, event);
Let's move the check for a valid event into qemuDomainEventQueue and
simplify all callers.
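With the check inside, the helper is roughly (a sketch of the idea, not
necessarily the exact final code):
    void
    qemuDomainEventQueue(virQEMUDriverPtr driver, virObjectEventPtr event)
    {
        if (event)
            virObjectEventStateQueue(driver->domainEventState, event);
    }
and every caller shrinks to a plain qemuDomainEventQueue(driver, event).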
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Finish is the final state in v2 of our migration protocol. If something
fails, we have no way to abort the migration and resume the original
domain. Non-fatal errors (such as a failure to start guest CPUs or make
the domain persistent) have to be treated as success. Keeping the domain
running while reporting the failure was just asking for trouble.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Whenever something fails during incoming migration in the Finish phase
before we have started guest CPUs, we need to kill the domain in addition
to reporting the failure.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When we save status XML at the point during migration where we have
already started the domain on the destination, we can't really go back
and abort the migration. Thus the only thing we can do is log a warning
and report success.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Offline migration is quite special because we don't really need to do
anything but make the domain persistent. Let's do it separately from
normal migration to avoid cluttering the code with
!(flags & VIR_MIGRATE_OFFLINE).
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>