<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>Domain state capture using Libvirt</h1>
<ul id="toc"></ul>
<p>
To help application developers choose the operations that best
suit their needs, this page compares the different means for
capturing state related to a domain managed by libvirt.
</p>
<p>
The information here is primarily geared towards capturing the
state of an active domain. Capturing the state of an inactive
domain essentially amounts to copying the contents of guest
disks, followed by a fresh boot of the same domain configuration
with disks restored back to that saved state.
</p>
<h2><a id="definitions">State capture trade-offs</a></h2>
<p>One of the features made possible with virtual machines is live
migration: transferring all state related to the guest from
one host to another with minimal interruption to the guest's
activity. In this case, state includes domain memory (including
register and device contents) and domain storage (whether the
guest's view of the disks is backed by local storage on the
host, or by the hypervisor accessing shared storage over a
network). A clever observer will then note that if all state is
available for live migration, then there is nothing stopping a
user from saving some or all of that state at a given point in
time in order to be able to later rewind guest execution back to
the state it previously had. The astute reader will also realize
that state capture at any level requires that the data be
stored and managed by some mechanism. This data might fit
in a single file, or more likely require a chain of related
files, and may require synchronization with third-party tools
built around managing the amount of data resulting from
capturing the state of multiple guests that each use multiple
disks.
</p>
<p>
There are several libvirt APIs associated with capturing the
state of a guest, which can later be used to rewind that guest
to the conditions it was in earlier. The following is a list of
trade-offs and differences between the various facets that
affect capturing domain state for active domains:
</p>
<dl>
<dt>Duration</dt>
<dd>Capturing state can be a lengthy process, so while the
captured state ideally represents an atomic point in time
corresponding to something the guest was actually executing,
capturing state tends to focus on minimizing guest downtime
while performing the rest of the state capture in parallel
with guest execution. Some interfaces require up-front
preparation (the state captured is not complete until the API
ends, which may be some time after the command was first
started), while other interfaces track the state when the
command was first issued, regardless of the time spent in
capturing the rest of the state. Also, time spent in state
capture may be longer than the time required for live
migration, when state must be duplicated rather than shared.
</dd>
<dt>Amount of state</dt>
<dd>For an online guest, there is a choice between capturing the
guest's memory (all that is needed during live migration when
the storage is already shared between source and destination),
the guest's disk state (all that is needed if there are no
pending guest I/O transactions that would be lost without the
corresponding memory state), or both together. Reverting to
partial state may still be viable, but typically, booting from
captured disk state without corresponding memory is comparable
to rebooting a machine that lost power before I/O could be
flushed. Guests may need to use proper journaling methods to
avoid problems when booting from partial state.
</dd>
<dt>Quiescing of data</dt>
<dd>Even if a guest has no pending I/O, capturing disk state may
catch the guest at a time when the contents of the disk are
inconsistent. Cooperating with the guest to perform data
quiescing is an optional step to ensure that captured disk
state is fully consistent, rather than just crash-consistent,
without requiring additional memory state. But guest
cooperation may also have time constraints, where the guest
can rightfully panic if there is too much downtime while I/O
is frozen.
</dd>
<dt>Quantity of files</dt>
<dd>When capturing state, some approaches store all state within
the same file (internal), while others expand a chain of
related files that must be used together (external), resulting
in more files that a management application must track.
</dd>
<dt>Impact to guest definition</dt>
<dd>Capturing state may require temporary changes to the guest
definition, such as associating new files into the domain
definition. While state capture should never impact the
running guest, a change to the domain's active XML may affect
other host operations being performed on the domain.
</dd>
<dt>Third-party integration</dt>
<dd>When capturing state, there are trade-offs in how much of the
process must be done directly by the hypervisor, and how much
can be off-loaded to third-party software. Since capturing
state is not instantaneous, it is essential that any
third-party integration see consistent data even if the
running guest continues to modify that data after the point in
time of the capture.</dd>
<dt>Full vs. incremental</dt>
<dd>When periodically repeating the action of state capture, it
is useful to minimize the amount of state that must be
captured by exploiting the relation to a previous capture,
such as focusing only on the portions of the disk that the
guest has modified in the meantime. Some approaches are able
to take advantage of checkpoints to provide an incremental
backup, while others are only capable of a full backup even if
that means re-capturing unchanged portions of the disk.</dd>
<dt>Local vs. remote</dt>
<dd>Domains that use only remote storage may need just
some mechanism to keep track of guest memory state while using
external means to manage storage. Still, hypervisor and guest
cooperation to ensure points in time when no I/O is in flight
across the network can be important for properly capturing
disk state.</dd>
<dt>Network latency</dt>
<dd>Whether the network carries domain storage or saved domain
state bound for remote storage, network latency affects how long
it takes to capture snapshot data. Dedicated network capacity,
bandwidth, or quality of service levels may play a role, as may
planning for how much of the backup process needs to be local.</dd>
</dl>
<p>
An example of the various facets in action is migration of a
running guest. In order for the guest to be able to resume on
the destination at the same place it left off at the source, the
hypervisor has to get to a point where execution on the source
is stopped, the last remaining changes that occurred since the
migration started are transferred, and the guest is started
on the target. The management software must therefore keep track
of the starting point and of any changes since that starting
point, which is typically done with dirty page tracking (for
memory) or dirty disk block bitmaps (for storage). At some point
during the migration, the management software must freeze the
source guest, transfer the remaining dirty data, and then start
the guest on the target. This period of time must be minimal. To
minimize overall migration time, one is advised to use a
dedicated network connection with a high quality of service.
</p>
<p>
Alternatively, saving the current state of the running guest can
just be a point-in-time operation which doesn't require updating
the "last vestiges" of state prior to writing out the saved state
file. The state file captures whatever is current at that point
in time and may contain incomplete data; if it is used to restart
the guest, this could cause confusion or problems because some
operation was not completed, depending on where in time the
operation was commenced.
</p>
<h2><a id="apis">State capture APIs</a></h2>
<p>With those definitions, the following libvirt APIs related to
state capture have these properties:</p>
<dl>
<dt><a href="html/libvirt-libvirt-domain.html#virDomainManagedSave"><code>virDomainManagedSave</code></a></dt>
<dd>This API saves guest memory, with libvirt managing all of
the saved state, then stops the guest. While stopped, the
disks can be copied by a third party. However, since any
subsequent restart of the guest through the libvirt API will
restore the memory state (which typically only works if the disk
state is unchanged in the meantime), and since it is not possible
to get at the memory state that libvirt is managing, this is not
viable as a means for rolling back to earlier saved states; it is
better suited to situations such as suspending a guest prior to
rebooting the host in order to resume the guest when the host is
back up. This API also has the drawback of potentially long guest
downtime, and therefore does not lend itself well to live
backups.</dd>
<dt><a href="html/libvirt-libvirt-domain.html#virDomainSave"><code>virDomainSave</code></a></dt>
<dd>This API is similar to virDomainManagedSave(), but moves the
burden of managing the stored memory state to the user. As
such, the user can now couple saved state with copies of the
disks to perform a revert to an arbitrary earlier saved state,
with <a href="html/libvirt-libvirt-domain.html#virDomainRestore"><code>virDomainRestore()</code></a>
restoring the memory portion. However, changing who manages the
memory state does not change the drawback of potentially long
guest downtime when capturing state.</dd>
<dt><a href="html/libvirt-libvirt-domain-snapshot.html#virDomainSnapshotCreateXML"><code>virDomainSnapshotCreateXML</code></a></dt>
<dd>This API wraps several approaches for capturing guest state,
with a general premise of creating a snapshot (where the
current guest resources are frozen in time and a new wrapper
layer is opened for tracking subsequent guest changes). It
can operate on both offline and running guests, can choose
whether to capture the state of memory, disk, or both when
used on a running guest, and can choose between internal and
external storage for captured state. However, it is geared
towards post-event captures (when capturing both memory and
disk state, the disk state is not captured until all memory
state has been collected first). With QEMU as the
hypervisor, internal snapshots currently have lengthy downtime
that is incompatible with freezing guest I/O, but external
snapshots are quick when memory contents are not also saved.
Since creating an external snapshot changes which disk image
resource is in use by the guest, this API can be coupled
with <a href="html/libvirt-libvirt-domain.html#virDomainBlockCommit"><code>virDomainBlockCommit()</code></a>
to switch the guest back to its original disk image; before that
live commit, a third-party tool can read the original image,
which is now the backing file of the new overlay. See also
the <a href="formatsnapshot.html">XML details</a> used with
this command.</dd>
<dt><a href="html/libvirt-libvirt-domain.html#virDomainFSFreeze"><code>virDomainFSFreeze</code></a>, <a href="html/libvirt-libvirt-domain.html#virDomainFSThaw"><code>virDomainFSThaw</code></a></dt>
<dd>This pair of APIs does not directly capture guest state, but
can be used to coordinate with a trusted live guest that state
capture is about to happen, and therefore guest I/O should be
quiesced so that the state capture is fully consistent, rather
than merely crash consistent. Some APIs are able to
automatically perform a freeze and thaw via a flags parameter
(for example, <code>VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE</code> for
virDomainSnapshotCreateXML()), rather than having to make
separate calls to these functions. Also, note that freezing guest
I/O is only possible with trusted guests running a guest agent,
and that some guests place maximum time limits on how long I/O
can be frozen.</dd>
<dt><a href="html/libvirt-libvirt-domain-checkpoint.html#virDomainCheckpointCreateXML"><code>virDomainCheckpointCreateXML</code></a></dt>
<dd>This API does not actually capture guest state; rather, it
makes it possible to track which portions of guest disks have
changed between a checkpoint and the current live execution of
the guest. However, while it is possible to use this API to
create checkpoints in isolation, it is more typical to create
a checkpoint as a side-effect of starting a new incremental
backup with <code>virDomainBackupBegin()</code>, since a
second incremental backup is most useful when using the
checkpoint created during the first. See also
the <a href="formatcheckpoint.html">XML details</a> used with
this command.</dd>
<dt><a href="html/libvirt-libvirt-domain.html#virDomainBackupBegin"><code>virDomainBackupBegin</code></a>, <a href="html/libvirt-libvirt-domain.html#virDomainAbortJob"><code>virDomainAbortJob</code></a></dt>
<dd>virDomainBackupBegin() wraps approaches for capturing the state
of disks of a running guest, but does not track accompanying guest
memory state. The capture is consistent to the start of the
operation, where the captured state is stored independently
from the disk image in use with the guest and where it can be
easily integrated with a third-party tool for capturing the disk
state. Since the backup operation is stored externally from
the guest resources, there is no need to commit data back in
at the completion of the operation. A push-mode backup, where
the hypervisor writes the backup itself, finishes on its own,
while a pull-mode backup, where a third-party client reads the
data over NBD, is ended with virDomainAbortJob() once that client
is done. When coupled with checkpoints, this can be used to
capture incremental backups instead of full ones.</dd>
</dl>
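<p>As a rough illustration of the virDomainManagedSave() workflow
described above, the following C sketch suspends a guest before a
host reboot and resumes it afterwards. It assumes the caller already
holds a <code>virDomainPtr</code> for a running, persistent domain;
the function names are hypothetical and error reporting is reduced
to messages on stderr.</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;libvirt/libvirt.h&gt;

/* Hypothetical helper: save guest memory to libvirt-managed storage
 * and stop the guest, e.g. before the host is rebooted. */
int suspend_for_host_reboot(virDomainPtr dom)
{
    if (virDomainManagedSave(dom, 0) == -1) {
        fprintf(stderr, "managed save failed\n");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: start the guest again once the host is back up. */
int resume_after_host_reboot(virDomainPtr dom)
{
    /* 1 if a managed save image exists, 0 if not, -1 on error */
    int has_image = virDomainHasManagedSaveImage(dom, 0);

    if (has_image == -1)
        return -1;
    if (has_image == 0)
        fprintf(stderr, "no managed save image, guest will boot fresh\n");

    /* virDomainCreate() resumes from the managed save image if there
     * is one; otherwise it performs a normal boot. */
    return virDomainCreate(dom);
}
</pre>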
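<p>By contrast, with virDomainSave() the saved memory image is an
ordinary file owned by the application. A minimal sketch, assuming a
caller-chosen <code>state_path</code> and leaving the disk copy and
disk restore steps as comments (how disks are copied depends on the
storage in use), might look like this; the function names are
hypothetical.</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;libvirt/libvirt.h&gt;

/* Capture memory state to a file chosen by the caller (hypothetical
 * path).  The guest is stopped once the save completes, so its disk
 * images can then be copied while nothing is writing to them. */
int save_guest_state(virDomainPtr dom, const char *state_path)
{
    if (virDomainSave(dom, state_path) == -1) {
        fprintf(stderr, "failed to save memory state to %s\n", state_path);
        return -1;
    }
    /* ... copy the guest's disk images here, while the guest is off ... */
    return 0;
}

/* Revert to the saved point.  The disk images must first be put back
 * exactly as they were when state_path was written; restoring memory
 * on top of changed disks risks corrupting guest data. */
int revert_guest_state(virConnectPtr conn, const char *state_path)
{
    /* ... restore the matching disk image copies here ... */
    if (virDomainRestore(conn, state_path) == -1) {
        fprintf(stderr, "failed to restore from %s\n", state_path);
        return -1;
    }
    return 0;
}
</pre>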
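<p>The following C sketch combines two of the entries above: it
creates an external, disk-only snapshot with
virDomainSnapshotCreateXML() and passes
<code>VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE</code>, so that libvirt asks
the guest agent to freeze and thaw guest filesystems instead of the
caller invoking virDomainFSFreeze() and virDomainFSThaw() directly.
The snapshot name, the disk target <code>vda</code>, and the overlay
path are hypothetical, and a guest agent is assumed to be running in
the guest.</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;libvirt/libvirt.h&gt;

/* Hypothetical snapshot XML: disk "vda" gets a new external qcow2
 * overlay, and its current image becomes the overlay's backing file. */
static const char *snapshot_xml =
    "&lt;domainsnapshot&gt;"
    "  &lt;name&gt;backup-tmp&lt;/name&gt;"
    "  &lt;disks&gt;"
    "    &lt;disk name='vda' snapshot='external'&gt;"
    "      &lt;source file='/var/lib/libvirt/images/guest.backup-tmp.qcow2'/&gt;"
    "    &lt;/disk&gt;"
    "  &lt;/disks&gt;"
    "&lt;/domainsnapshot&gt;";

/* Returns the new snapshot, or NULL on failure; the caller releases
 * the handle with virDomainSnapshotFree(). */
virDomainSnapshotPtr take_quiesced_disk_snapshot(virDomainPtr dom)
{
    /* DISK_ONLY: capture disks but not memory, keeping downtime short.
     * QUIESCE:   freeze and thaw guest filesystems via the guest agent.
     * ATOMIC:    if any part fails, leave the domain unchanged. */
    unsigned int flags = VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
                         VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE |
                         VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC;
    virDomainSnapshotPtr snap =
        virDomainSnapshotCreateXML(dom, snapshot_xml, flags);

    if (!snap)
        fprintf(stderr, "snapshot failed (is the guest agent running?)\n");
    return snap;
}
</pre>
<p>Once a third-party tool has read the now-stable original image,
the overlay can be merged back with virDomainBlockCommit(), as in the
first of the examples below.</p>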
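<p>A checkpoint on its own copies no guest data. As a hedged sketch,
the code below creates a standalone checkpoint with
virDomainCheckpointCreateXML() so that the hypervisor starts
recording which blocks of a disk change. The checkpoint name
<code>ckpt-1</code>, the disk target <code>vda</code>, and the
function name are hypothetical, and the disk is assumed to be a
qcow2 image, which QEMU uses to store the persistent dirty
bitmap.</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;libvirt/libvirt.h&gt;

/* Hypothetical checkpoint XML: start tracking changed blocks of disk
 * "vda" from this point on, using a persistent dirty bitmap. */
static const char *checkpoint_xml =
    "&lt;domaincheckpoint&gt;"
    "  &lt;name&gt;ckpt-1&lt;/name&gt;"
    "  &lt;disks&gt;"
    "    &lt;disk name='vda' checkpoint='bitmap'/&gt;"
    "  &lt;/disks&gt;"
    "&lt;/domaincheckpoint&gt;";

/* Returns 0 on success, -1 on failure. */
int start_change_tracking(virDomainPtr dom)
{
    /* No guest data is copied here; the hypervisor only begins
     * recording which blocks change after this point. */
    virDomainCheckpointPtr ckpt =
        virDomainCheckpointCreateXML(dom, checkpoint_xml, 0);

    if (!ckpt) {
        fprintf(stderr, "failed to create checkpoint\n");
        return -1;
    }
    /* Freeing the handle does not delete the checkpoint itself. */
    virDomainCheckpointFree(ckpt);
    return 0;
}
</pre>
<p>A later backup can name this checkpoint in the
<code>&lt;incremental&gt;</code> element of its backup XML so that
only the blocks changed since the checkpoint are copied.</p>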
<h2><a id="examples">Examples</a></h2>
<p>The following two sequences both accomplish the task of
capturing the disk state of a running guest, then wrapping
things up so that the guest is still running with the same file
as its disk image as before the sequence of operations began.
The difference between the two sequences boils down to the
impact of an unexpected interruption at any point in the
middle of the sequence: with such an interruption, the first
example leaves the guest tied to a temporary wrapper file rather
than the original disk, and requires manual cleanup of the
domain definition, while the second example has no impact on the
domain definition.</p>
<p>1. Backup via temporary snapshot</p>
<pre>
virDomainFSFreeze()
virDomainSnapshotCreateXML(VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)
virDomainFSThaw()
third-party copy the backing file to backup storage # most time spent here
virDomainBlockCommit(VIR_DOMAIN_BLOCK_COMMIT_ACTIVE) per disk
wait for commit ready event per disk
virDomainBlockJobAbort(VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) per disk
</pre>
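<p>The first sequence might be written in C roughly as follows. This
is a sketch for a single disk with the hypothetical target
<code>vda</code>; <code>copy_backing_file_to_backup_storage()</code>
is a hypothetical stand-in for the third-party copy step, and the
wait for the commit to become ready is done by polling
virDomainGetBlockJobInfo() rather than by registering for the block
job event, which keeps the example short but is less precise.</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;
#include &lt;libvirt/libvirt.h&gt;

#define DISK "vda"  /* hypothetical disk target; repeat per disk */

/* Hypothetical stand-in for the third-party tool that copies the
 * original image (now the read-only backing file) to backup storage. */
static int copy_backing_file_to_backup_storage(const char *disk)
{
    printf("copying backing file of %s\n", disk);
    return 0;
}

int backup_via_temporary_snapshot(virDomainPtr dom)
{
    /* Without a disks element, libvirt applies its default per-disk
     * behaviour, which for a disk-only snapshot is normally an external
     * overlay with a generated file name.  NO_METADATA keeps this
     * temporary snapshot out of libvirt's snapshot list. */
    const char *xml =
        "&lt;domainsnapshot&gt;&lt;name&gt;backup-tmp&lt;/name&gt;&lt;/domainsnapshot&gt;";
    virDomainSnapshotPtr snap;
    virDomainBlockJobInfo info;
    int ret = -1;

    if (virDomainFSFreeze(dom, NULL, 0, 0) == -1)  /* all filesystems */
        return -1;
    snap = virDomainSnapshotCreateXML(dom, xml,
                                      VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
                                      VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA);
    /* Thaw right away, whether or not the snapshot succeeded. */
    if (virDomainFSThaw(dom, NULL, 0, 0) == -1 || !snap)
        goto cleanup;

    /* The guest now writes to the overlay, so the original is stable. */
    if (copy_backing_file_to_backup_storage(DISK) == -1)
        goto cleanup;

    /* Active commit of the overlay into its immediate backing file. */
    if (virDomainBlockCommit(dom, DISK, NULL, NULL, 0,
                             VIR_DOMAIN_BLOCK_COMMIT_ACTIVE |
                             VIR_DOMAIN_BLOCK_COMMIT_SHALLOW) == -1)
        goto cleanup;

    /* Poll until the job has copied everything (cur == end); this
     * stands in for waiting on the VIR_DOMAIN_BLOCK_JOB_READY event. */
    do {
        sleep(1);
        if (virDomainGetBlockJobInfo(dom, DISK, &amp;info, 0) != 1)
            goto cleanup;  /* error, or the job disappeared */
    } while (info.end == 0 || info.cur != info.end);

    /* Pivot: switch the guest back to its original disk image.  If this
     * function fails before here, the guest stays on the overlay. */
    if (virDomainBlockJobAbort(dom, DISK,
                               VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) == 0)
        ret = 0;

 cleanup:
    if (snap)
        virDomainSnapshotFree(snap);
    return ret;
}
</pre>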
<p>2. Direct backup</p>
<pre>
virDomainFSFreeze()
virDomainBackupBegin()
virDomainFSThaw()
wait for push mode event, or pull data over NBD # most time spent here
virDomainAbortJob() # pull mode only; a push mode job ends on its own
</pre>
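<p>A corresponding C sketch of the second sequence follows, using push
mode, where the hypervisor writes the backup itself. The disk target
<code>vda</code>, the backup file path, the checkpoint name, and the
function name are hypothetical. Completion is detected by polling
virDomainGetJobInfo() instead of waiting for the job-completed event;
a pull-mode backup would instead serve the data over NBD and be
finished with virDomainAbortJob() once the third-party client is
done.</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;
#include &lt;libvirt/libvirt.h&gt;

/* Hypothetical push-mode backup XML: write disk "vda" to a new qcow2 file. */
static const char *backup_xml =
    "&lt;domainbackup mode='push'&gt;"
    "  &lt;disks&gt;"
    "    &lt;disk name='vda' type='file'&gt;"
    "      &lt;target file='/backups/guest-vda.qcow2'/&gt;"
    "      &lt;driver type='qcow2'/&gt;"
    "    &lt;/disk&gt;"
    "  &lt;/disks&gt;"
    "&lt;/domainbackup&gt;";

/* Optional checkpoint created at the same instant, so that a later
 * backup can list it as incremental and copy only changed blocks. */
static const char *checkpoint_xml =
    "&lt;domaincheckpoint&gt;&lt;name&gt;ckpt-1&lt;/name&gt;&lt;/domaincheckpoint&gt;";

int direct_backup(virDomainPtr dom)
{
    virDomainJobInfo info;
    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int type;
    int rc;

    if (virDomainFSFreeze(dom, NULL, 0, 0) == -1)
        return -1;
    rc = virDomainBackupBegin(dom, backup_xml, checkpoint_xml, 0);
    /* The backup is consistent as of its start, so thaw immediately. */
    if (virDomainFSThaw(dom, NULL, 0, 0) == -1 || rc == -1)
        return -1;

    /* Push mode: the job finishes on its own; poll until none is active. */
    do {
        sleep(1);
        if (virDomainGetJobInfo(dom, &amp;info) == -1)
            return -1;
    } while (info.type != VIR_DOMAIN_JOB_NONE);

    /* Ask for the statistics of the job that just finished. */
    if (virDomainGetJobStats(dom, &amp;type, &amp;params, &amp;nparams,
                             VIR_DOMAIN_JOB_STATS_COMPLETED) == -1)
        return -1;
    virTypedParamsFree(params, nparams);
    return type == VIR_DOMAIN_JOB_COMPLETED ? 0 : -1;
}
</pre>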
</body>
</html>