<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>

    <h1>Domain state capture using Libvirt</h1>

    <ul id="toc"></ul>

    <p>
      In order to aid application developers to choose which
      operations best suit their needs, this page compares the
      different means for capturing state related to a domain managed
      by libvirt.
    </p>

    <p>
      The information here is primarily geared towards capturing the
      state of an active domain. Capturing the state of an inactive
      domain essentially amounts to copying the contents of guest
      disks, followed by a fresh boot of the same domain configuration
      with disks restored back to that saved state.
    </p>

    <h2><a id="definitions">State capture trade-offs</a></h2>

    <p>One of the features made possible with virtual machines is live
      migration -- transferring all state related to the guest from
      one host to another with minimal interruption to the guest's
      activity. In this case, state includes domain memory (including
      register and device contents), and domain storage (whether the
      guest's view of the disks are backed by local storage on the
      host, or by the hypervisor accessing shared storage over a
      network).  A clever observer will then note that if all state is
      available for live migration, then there is nothing stopping a
      user from saving some or all of that state at a given point of
      time in order to be able to later rewind guest execution back to
      the state it previously had. The astute reader will also realize
      that state capture at any level requires that the data must be
      stored and managed by some mechanism. This processing might fit
      in a single file, or more likely require a chain of related
      files, and may require synchronization with third-party tools
      built around managing the amount of data resulting from
      capturing the state of multiple guests that each use multiple
      disks.
    </p>

    <p>
      There are several libvirt APIs associated with capturing the
      state of a guest, which can later be used to rewind that guest
      to the conditions it was in earlier.  The following is a list of
      trade-offs and differences between the various facets that
      affect capturing domain state for active domains:
    </p>

    <dl>
      <dt>Duration</dt>
      <dd>Capturing state can be a lengthy process, so while the
        captured state ideally represents an atomic point in time
        corresponding to something the guest was actually executing,
        capturing state tends to focus on minimizing guest downtime
        while performing the rest of the state capture in parallel
        with guest execution.  Some interfaces require up-front
        preparation (the state captured is not complete until the API
        ends, which may be some time after the command was first
        started), while other interfaces track the state when the
        command was first issued, regardless of the time spent in
        capturing the rest of the state.  Also, time spent in state
        capture may be longer than the time required for live
        migration, when state must be duplicated rather than shared.
      </dd>

      <dt>Amount of state</dt>
      <dd>For an online guest, there is a choice between capturing the
        guest's memory (all that is needed during live migration when
        the storage is already shared between source and destination),
        the guest's disk state (all that is needed if there are no
        pending guest I/O transactions that would be lost without the
        corresponding memory state), or both together.  Reverting to
        partial state may still be viable, but typically, booting from
        captured disk state without corresponding memory is comparable
        to rebooting a machine that had power cut before I/O could be
        flushed. Guests may need to use proper journaling methods to
        avoid problems when booting from partial state.
      </dd>

      <dt>Quiescing of data</dt>
      <dd>Even if a guest has no pending I/O, capturing disk state may
        catch the guest at a time when the contents of the disk are
        inconsistent. Cooperating with the guest to perform data
        quiescing is an optional step to ensure that captured disk
        state is fully consistent without requiring additional memory
        state, rather than just crash-consistent.  But guest
        cooperation may also have time constraints, where the guest
        can rightfully panic if there is too much downtime while I/O
        is frozen.
      </dd>

      <dt>Quantity of files</dt>
      <dd>When capturing state, some approaches store all state within
        the same file (internal), while others expand a chain of
        related files that must be used together (external), for more
        files that a management application must track.
      </dd>

      <dt>Impact to guest definition</dt>
      <dd>Capturing state may require temporary changes to the guest
        definition, such as associating new files into the domain
        definition. While state capture should never impact the
        running guest, a change to the domain's active XML may have
        impact on other host operations being performed on the domain.
      </dd>

      <dt>Third-party integration</dt>
      <dd>When capturing state, there are tradeoffs to how much of the
        process must be done directly by the hypervisor, and how much
        can be off-loaded to third-party software.  Since capturing
        state is not instantaneous, it is essential that any
        third-party integration see consistent data even if the
        running guest continues to modify that data after the point in
        time of the capture.</dd>

      <dt>Full vs. incremental</dt>
      <dd>When periodically repeating the action of state capture, it
        is useful to minimize the amount of state that must be
        captured by exploiting the relation to a previous capture,
        such as focusing only on the portions of the disk that the
        guest has modified in the meantime.  Some approaches are able
        to take advantage of checkpoints to provide an incremental
        backup, while others are only capable of a full backup even if
        that means re-capturing unchanged portions of the disk.</dd>

      <dt>Local vs. remote</dt>
      <dd>Domains that completely use remote storage may only need
        some mechanism to keep track of guest memory state while using
        external means to manage storage. Still, hypervisor and guest
        cooperation to ensure points in time when no I/O is in flight
        across the network can be important for properly capturing
        disk state.</dd>

      <dt>Network latency</dt>
      <dd>Whether it's domain storage or saving domain state into
        remote storage, network latency has an impact on snapshot
        data. Having dedicated network capacity, bandwidth, or quality
        of service levels may play a role, as well as planning for how
        much of the backup process needs to be local.</dd>
    </dl>

    <p>
      An example of the various facets in action is migration of a
      running guest. In order for the guest to be able to resume on
      the destination at the same place it left off at the source, the
      hypervisor has to get to a point where execution on the source
      is stopped, the last remaining changes occurring since the
      migration started are then transferred, and the guest is started
      on the target. The management software thus must keep track of
      the starting point and any changes since the starting
      point. These last changes are often referred to as dirty page
      tracking or dirty disk block bitmaps. At some point in time
      during the migration, the management software must freeze the
      source guest, transfer the dirty data, and then start the guest
      on the target. This period of time must be minimal. To minimize
      overall migration time, one is advised to use a dedicated
      network connection with a high quality of service. Alternatively
      saving the current state of the running guest can just be a
      point in time type operation which doesn't require updating the
      "last vestiges" of state prior to writing out the saved state
      file. The state file is the point in time of whatever is current
      and may contain incomplete data which if used to restart the
      guest could cause confusion or problems because some operation
      wasn't completed depending upon where in time the operation was
      commenced.
    </p>

    <h2><a id="apis">State capture APIs</a></h2>
    <p>With those definitions, the following libvirt APIs related to
      state capture have these properties:</p>
    <dl>
      <dt><a href="html/libvirt-libvirt-domain.html#virDomainManagedSave"><code>virDomainManagedSave</code></a></dt>
      <dd>This API saves guest memory, with libvirt managing all of
        the saved state, then stops the guest. While stopped, the
        disks can be copied by a third party.  However, since any
        subsequent restart of the guest by libvirt API will restore
        the memory state (which typically only works if the disk state
        is unchanged in the meantime), and since it is not possible to
        get at the memory state that libvirt is managing, this is not
        viable as a means for rolling back to earlier saved states,
        but is rather more suited to situations such as suspending a
        guest prior to rebooting the host in order to resume the guest
        when the host is back up. This API also has a drawback of
        potentially long guest downtime, and therefore does not lend
        itself well to live backups.</dd>

      <dt><a href="html/libvirt-libvirt-domain.html#virDomainSave"><code>virDomainSave</code></a></dt>
      <dd>This API is similar to virDomainManagedSave(), but moves the
        burden on managing the stored memory state to the user. As
        such, the user can now couple saved state with copies of the
        disks to perform a revert to an arbitrary earlier saved state.
        However, changing who manages the memory state does not change
        the drawback of potentially long guest downtime when capturing
        state.</dd>

      <dt><a href="html/libvirt-libvirt-domain-snapshot.html#virDomainSnapshotCreateXML"><code>virDomainSnapshotCreateXML</code></a></dt>
      <dd>This API wraps several approaches for capturing guest state,
        with a general premise of creating a snapshot (where the
        current guest resources are frozen in time and a new wrapper
        layer is opened for tracking subsequent guest changes).  It
        can operate on both offline and running guests, can choose
        whether to capture the state of memory, disk, or both when
        used on a running guest, and can choose between internal and
        external storage for captured state.  However, it is geared
        towards post-event captures (when capturing both memory and
        disk state, the disk state is not captured until all memory
        state has been collected first).  Using QEMU as the
        hypervisor, internal snapshots currently have lengthy downtime
        that is incompatible with freezing guest I/O, but external
        snapshots are quick when memory contents are not also saved.
        Since creating an external snapshot changes which disk image
        resource is in use by the guest, this API can be coupled
        with <a href="html/libvirt-libvirt-domain.html#virDomainBlockCommit"><code>virDomainBlockCommit()</code></a>
        to restore things back to the guest using its original disk
        image, where a third-party tool can read the backing file
        prior to the live commit.  See also
        the <a href="formatsnapshot.html">XML details</a> used with
        this command.</dd>

      <dt><a href="html/libvirt-libvirt-domain.html#virDomainFSFreeze"><code>virDomainFSFreeze</code></a>, <a href="html/libvirt-libvirt-domain.html#virDomainFSThaw"><code>virDomainFSThaw</code></a></dt>
      <dd>This pair of APIs does not directly capture guest state, but
        can be used to coordinate with a trusted live guest that state
        capture is about to happen, and therefore guest I/O should be
        quiesced so that the state capture is fully consistent, rather
        than merely crash consistent.  Some APIs are able to
        automatically perform a freeze and thaw via a flags parameter,
        rather than having to make separate calls to these
        functions. Also, note that freezing guest I/O is only possible
        with trusted guests running a guest agent, and that some
        guests place maximum time limits on how long I/O can be
        frozen.</dd>

      <dt><a href="html/libvirt-libvirt-domain-checkpoint.html#virDomainCheckpointCreateXML"><code>virDomainCheckpointCreateXML</code></a></dt>
      <dd>This API does not actually capture guest state, rather it
        makes it possible to track which portions of guest disks have
        changed between a checkpoint and the current live execution of
        the guest.  However, while it is possible use this API to
        create checkpoints in isolation, it is more typical to create
        a checkpoint as a side-effect of starting a new incremental
        backup with <code>virDomainBackupBegin()</code> or at the
        creation of an external snapshot
        with <code>virDomainSnapshotCreateXML2()</code>, since a
        second incremental backup is most useful when using the
        checkpoint created during the first.  See also
        the <a href="formatcheckpoint.html">XML details</a> used with
        this command.</dd>

      <dt><a href="html/libvirt-libvirt-domain.html#virDomainBackupBegin"><code>virDomainBackupBegin</code></a>, <a href="html/libvirt-libvirt-domain.html#virDomainBackupEnd"><code>virDomainBackupEnd</code></a></dt>
      <dd>This API wraps approaches for capturing the state of disks
        of a running guest, but does not track accompanying guest
        memory state.  The capture is consistent to the start of the
        operation, where the captured state is stored independently
        from the disk image in use with the guest and where it can be
        easily integrated with a third-party for capturing the disk
        state.  Since the backup operation is stored externally from
        the guest resources, there is no need to commit data back in
        at the completion of the operation.  When coupled with
        checkpoints, this can be used to capture incremental backups
        instead of full.</dd>
    </dl>

    <h2><a id="examples">Examples</a></h2>
    <p>The following two sequences both accomplish the task of
      capturing the disk state of a running guest, then wrapping
      things up so that the guest is still running with the same file
      as its disk image as before the sequence of operations began.
      The difference between the two sequences boils down to the
      impact of an unexpected interruption made at any point in the
      middle of the sequence: with such an interruption, the first
      example leaves the guest tied to a temporary wrapper file rather
      than the original disk, and requires manual clean up of the
      domain definition; while the second example has no impact to the
      domain definition.</p>

    <p>1. Backup via temporary snapshot
      <pre>
virDomainFSFreeze()
virDomainSnapshotCreateXML(VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)
virDomainFSThaw()
third-party copy the backing file to backup storage # most time spent here
virDomainBlockCommit(VIR_DOMAIN_BLOCK_COMMIT_ACTIVE) per disk
wait for commit ready event per disk
virDomainBlockJobAbort() per disk
      </pre></p>

    <p>2. Direct backup
      <pre>
virDomainFSFreeze()
virDomainBackupBegin()
virDomainFSThaw()
wait for push mode event, or pull data over NBD # most time spent here
virDomainBackupEnd()
    </pre></p>

  </body>
</html>