When the backup job is terminated normally, the security label is
restored by the blockjob finishing handler.
If the VM dies or is destroyed that wouldn't happen as the blockjob
handler wouldn't be called.
Restore the security label on the disk backup store in the cases where we
remember that the job was still running at the point when
'qemuBackupJobTerminate' was called.
Not resetting the security label also means that we leak the xattr
attributes remembering the label, which prevents any further use of the
file; this is a problem especially for block devices.
This also requires that the call to 'qemuBackupJobTerminate' from
'qemuProcessStop' happens only after 'vm->pid' was reset, as otherwise
the security subdrivers attempt to enter the process namespace, which
fails if the process is no longer running.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1939082
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
qemuBackupBegin can take a full backup of the disks (excluding any
operations with bitmaps) without needing to wait for blockdev-reopen
support in qemu.
Add a check that no checkpoint creation is required and the disk backup
mode isn't VIR_DOMAIN_BACKUP_DISK_BACKUP_MODE_INCREMENTAL.
The call to virDomainBackupAlignDisks is moved earlier as it initializes
the disk backup mode if it's not present in the user config.
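A minimal sketch of the shape of that check, using hypothetical,
simplified names (the real code uses the VIR_DOMAIN_BACKUP_DISK_BACKUP_MODE_*
values; the helper below returns whether blockdev-reopen is still
required):

    #include <stdbool.h>

    /* Illustrative types only; not the actual libvirt identifiers. */
    typedef enum {
        BACKUP_DISK_MODE_FULL,
        BACKUP_DISK_MODE_INCREMENTAL,
    } backupDiskMode;

    /* A full backup with no checkpoint creation can proceed without
     * blockdev-reopen; anything else still has to wait for qemu support. */
    static bool
    backupNeedsBlockdevReopen(bool createsCheckpoint, backupDiskMode mode)
    {
        return createsCheckpoint || mode == BACKUP_DISK_MODE_INCREMENTAL;
    }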
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Commit cb29e4e801 didn't take into account that the VM can be inactive
when it's destroyed. This means that the job would remain active even
after the VM became inactive.
To fix this properly:
1) Remove the bogus VM liveness check and early return
(reverts the aforementioned commit)
2) Conditionalize the stats assignment only when the stats object is
present
(properly fix the crash when VM dies when reconnecting)
3) End the asyncjob only when it was already set
(prevent corruption of priv->jobs_queued)
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1937598
Fixes: cb29e4e801
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'qemuBackupJobTerminate' needs the API flags to see whether
VIR_DOMAIN_BACKUP_BEGIN_REUSE_EXTERNAL was requested. Unfortunately, when called via
qemuProcessReconnect()->qemuProcessStop() early (e.g. if the qemu
process died while we were reconnecting) the job is cleared temporarily
so that other APIs can be called. This would mean that we couldn't clean
up the files in some cases.
Save the 'apiFlags' inside the backup object and set it from the
'qemuDomainJobObj' 'apiFlags' member when reconnecting to a VM.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If the VM isn't active, calculating the job stats doesn't make sense.
Additionally this prevents a crash of libvirtd if qemu terminated while
libvirtd wasn't running:
Thread 28 "init-backup-tes" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb9310640 (LWP 3201116)]
qemuDomainJobInfoUpdateTime (jobInfo=0x0) at ../../../libvirt/src/qemu/qemu_domainjob.c:275
275 if (!jobInfo->started)
(gdb) bt
#0 qemuDomainJobInfoUpdateTime (jobInfo=0x0) at ../../../libvirt/src/qemu/qemu_domainjob.c:275
#1 0x00007fffcba1a12d in qemuBackupJobTerminate (vm=0x7fff9c1bc840, jobstatus=QEMU_DOMAIN_JOB_STATUS_CANCELED) at ../../../libvirt/src/qemu/qemu_backup.c:563
#2 0x00007fffcbaefcae in qemuProcessStop
(driver=0x7fff9c144ff0, vm=0x7fff9c1bc840, reason=VIR_DOMAIN_SHUTOFF_DAEMON, asyncJob=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
at ../../../libvirt/src/qemu/qemu_process.c:7812
#3 0x00007fffcbaf2a10 in qemuProcessReconnect (opaque=<optimized out>) at ../../../libvirt/src/qemu/qemu_process.c:8578
#4 0x00007ffff7c46bb5 in virThreadHelper (data=<optimized out>) at ../../../libvirt/src/util/virthread.c:233
#5 0x00007ffff6e453f9 in start_thread () at /lib64/libpthread.so.0
#6 0x00007ffff766fb53 in clone () at /lib64/libc.so.6
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
An upcoming patch will remove unnecessary actions if the VM crashed. The
cleanup needs to be performed always and thus needs to be moved earlier.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The code handles XML bits and the internal definition and thus should be
in the conf directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Up until now we had runtime code and XML-related code in the same
source file inside the util directory.
This patch extracts the runtime part into the new storage_file
directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The refactor in 0316c28a45 used an incorrect source variable to
initialize the variable which holds the name of the bitmap which needs
to be deleted after the backup job finishes. This resulted in deleting
the source bitmap of the backup rather than the temporary one.
Use 'dd->incrementalBitmap' which holds the temporary bitmap name
instead of 'dd->backupdisk->incremental' which holds the name of the
source bitmap which is used by the backup.
Fixes: 0316c28a45
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1908647
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
It's a technical detail in qemu that QCOW2 is needed for a pull-mode
backup.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Libvirt's backup code has two modes:
1) push - where qemu actively writes the difference since the checkpoint
into the output file
2) pull - where we instruct qemu to expose a frozen disk state along
with a bitmap of blocks which changed since the checkpoint
For push mode, qemu needs the temporary bitmap in which we calculate
the actual changes to be present on the block node backing the disk.
For pull mode, where we expose the bitmap via NBD, qemu wants the bitmap
to be present on the exported block node, which is the scratch file.
Until now we've calculated the bitmap twice and installed it both on the
scratch file and on the disk node, but we don't need to since we know
where it's needed.
Pass in the 'pull' flag and use it to decide where to install the bitmap
and when to register the bitmap name with the blockjob.
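As a rough illustration with hypothetical, simplified types (the real
code deals with node names of the storage sources and qemu 'transaction'
actions):

    #include <stdbool.h>

    typedef struct {
        const char *diskNode;    /* block node backing the disk          */
        const char *scratchNode; /* scratch node exported via NBD (pull) */
    } backupDisk;

    /* Decide on which block node the temporary bitmap has to live. */
    static const char *
    backupBitmapTarget(const backupDisk *dd, bool pull)
    {
        /* pull: the client reads the bitmap from the exported scratch node;
         * push: qemu consumes the bitmap on the node backing the disk. */
        return pull ? dd->scratchNode : dd->diskNode;
    }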
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Verify that the checkpoint requested by an incremental backup exists.
Unfortunately validating whether the checkpoint configuration actually
matches the disk may not be reasonably feasible as the disk may have
been renamed/snapshotted/etc. We still rely on bitmap presence.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
If we don't have a consistent chain of bitmaps for the backup to proceed,
we'd report the VIR_ERR_INVALID_ARG error code, which makes it hard to
decide whether an incremental backup even makes sense.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Don't hide our use of GHashTable behind our typedef. This will also
promote the use of glib's hash functions directly.
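For example, callers can now use the glib API directly (standalone
example, not libvirt code):

    #include <glib.h>
    #include <stdio.h>

    int main(void)
    {
        /* plain GHashTable, no project-local typedef hiding it */
        GHashTable *table = g_hash_table_new_full(g_str_hash, g_str_equal,
                                                  g_free, g_free);

        g_hash_table_insert(table, g_strdup("vda"), g_strdup("libvirt-1-format"));
        printf("vda -> %s\n", (const char *) g_hash_table_lookup(table, "vda"));

        g_hash_table_unref(table);
        return 0;
    }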
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Matt Coleman <matt@datto.com>
Centralize the logic deciding which arguments to use when exporting a
block backend via NBD to a single place so that it can be centrally
fixed in upcoming commits to support the new export method via
'block-export-add'.
Additionally this allows simplifying the migration caller since the
logic deciding which arguments to use is extracted from it too.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We've put the aliases into the backup job definition after the status
XML was already written so they didn't appear in the on-disk state.
Move the code putting them into the private definition earlier, so that
the status XML update done by saving blockjobs already writes them out.
Also add a comment noting that the block job status update writes the
status XML.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1870488
Fixes: 423576679a
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Commit 423576679a, which implemented TLS, forgot to remove the comment.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
g_new() is used in only 3 places. Switching them to g_new0() will do
no harm, reduce confusion, and help me sleep better at night knowing
that all allocated memory is initialized to 0 :-) (Yes, I *know* that
in all three cases the associated memory is immediately assigned some
other value. Today.)
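For reference, the difference between the two (standalone example, not
libvirt code):

    #include <glib.h>
    #include <stdio.h>

    typedef struct { int id; char *name; } item;

    int main(void)
    {
        item *a = g_new(item, 1);   /* members start out uninitialized   */
        item *b = g_new0(item, 1);  /* members are guaranteed to be zero */

        printf("b->id = %d, b->name = %p\n", b->id, (void *) b->name);

        g_free(a);
        g_free(b);
        return 0;
    }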
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use the configured TLS env to set up encryption of the TLS transport.
https://bugzilla.redhat.com/show_bug.cgi?id=1822631
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The bitmap name used for the incremental backup would be leaked
otherwise.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Originally the function was cleaning up only a failed job, but now
there's other stuff that needs to be cleared too.
Make only the steps which clean up after a failed job depend on the
'started' field and always execute the rest of the code.
This fixes a leak of the backup job tracking object and the blockdev-add
helper data.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The code would repeatedly mark the first disk's blockjob as started
rather than accessing all the blockjobs. Fix the dereferencing operator.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Two functions called in sequence both initialized the virStorageSource
backing 'store', leading to a memleak.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The cleanup path expects that 'def' is assigned to 'priv->backup', but
that's not the case for early failures. Add a check to stop 'def' from
being overwritten so that it can be freed.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reuse qemuBlockGetBitmapMergeActions, which allows removal of the ad-hoc
implementation of bitmap merging for backup. The new approach is simpler
and also more robust in case some of the bitmaps break, as it removes
the dependency on the whole chain of bitmaps working.
The new approach also allows backups if a snapshot is created outside of
libvirt.
Additionally the code is greatly simplified.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Fetch the checkpoint list for every disk specifically based on the new
per-disk 'incremental' field.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
If a disk is not captured by one of the intermediate checkpoints the
code would fail, but we can easily calculate the bitmaps to merge
correctly by skipping over checkpoints which don't describe the disk.
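A sketch of the idea with hypothetical, simplified types (the real code
walks libvirt's checkpoint definitions):

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical types for illustration only. */
    typedef struct { const char *bitmap; bool capturesDisk; } checkpoint;

    /* Collect the bitmap names to merge, skipping checkpoints which don't
     * describe the disk instead of failing the whole backup. */
    static size_t
    collectBitmaps(const checkpoint *chain, size_t nchain,
                   const char **names, size_t max)
    {
        size_t found = 0;

        for (size_t i = 0; i < nchain && found < max; i++) {
            if (!chain[i].capturesDisk)
                continue; /* disk not captured by this checkpoint */

            names[found++] = chain[i].bitmap;
        }

        return found;
    }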
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The algorithm is getting quite complex. Split out the lookup of the
range of backing chain storage sources, and the bitmaps contained in
them, which correspond to one checkpoint.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
We still have to use -drive to instantiate sd disks. Combining that with
the new logic for blockjobs would be very complicated and not worth it,
given that 'sd' cards work only on a few rarely used machine types of
uncommon architectures and libvirt didn't implement support for 'sd'
bus controllers. This will allow us to use -blockdev for other disk
kinds on such machines while sacrificing block jobs.
Note: this is currently a no-op as we mask out the QEMU_CAPS_BLOCKDEV
capability if any of the disks has bus='sd'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If a backup job fails midway it's hard to figure out what happened as
it runs asynchronously. Use the VIR_DOMAIN_JOB_ERRMSG job statistics
field to pass through the error from the first failed backup-blockjob
so that both the consumer of the virDomainGetJobStats and the
corresponding event can see the error.
event 'job-completed' for domain backup-test:
operation: 9
time_elapsed: 46
disk_total: 104857600
disk_processed: 10158080
disk_remaining: 94699520
success: 0
errmsg: No space left on device
virsh domjobinfo backup-test --completed --anystats
Job type: Failed
Operation: Backup
Time elapsed: 46 ms
File processed: 9.688 MiB
File remaining: 90.312 MiB
File total: 100.000 MiB
Error message: No space left on device
https://bugzilla.redhat.com/show_bug.cgi?id=1812827
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In order to add a string to qemuDomainJobInfo we must ensure that it's
freed and copied properly. Add helpers to copy and free the structure
and adjust the code to use them properly for the new semantics.
Additionally the allocation is changed to g_new0 as it includes the
type and thus makes it very easy to grep for all the allocations of a
given type.
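A simplified illustration of such helpers (the struct below is
hypothetical and much smaller than the real qemuDomainJobInfo):

    #include <glib.h>
    #include <string.h>

    typedef struct {
        unsigned long long timeElapsed;
        char *errmsg;                    /* the new string member */
    } jobInfo;

    static jobInfo *
    jobInfoCopy(const jobInfo *src)
    {
        jobInfo *ret = g_new0(jobInfo, 1);

        memcpy(ret, src, sizeof(*ret));
        ret->errmsg = g_strdup(src->errmsg);   /* deep-copy the string */
        return ret;
    }

    static void
    jobInfoFree(jobInfo *info)
    {
        if (!info)
            return;

        g_free(info->errmsg);
        g_free(info);
    }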
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
We always tried to install a backing store for the image even if it didn't
make sense, e.g. for a full backup into a raw image. Additionally we
didn't record the backing file into the qcow2 metadata so the image
itself contained the diff of data but reading from it would be
incomplete as it depends on the backing image.
This patch fixes both issues by carefully installing the correct backing
file when appropriate and also recording it into the metadata when
creating the image.
https://bugzilla.redhat.com/show_bug.cgi?id=1813310
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The code attempting to clean up after a failed pull-mode backup job
wrongly entered the monitor but didn't clean up nor exit the monitor due
to a logic bug. Fix the condition.
Introduced in a1521f84a5
https://bugzilla.redhat.com/show_bug.cgi?id=1817327
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When preparing images for block jobs we modify their seclabels so
that QEMU can open them. However, as mentioned in the previous
commit, secdrivers base some of their decisions on whether the image
they are working on is the top of the backing chain. Fortunately,
in the places where we call secdrivers we know this, and the
information can be passed to secdrivers.
The problem is the following: after the first blockcommit from
the base to one of the parents the XATTRs on the base image are
not cleared and therefore the second attempt to do another
blockcommit fails. This is caused by blockcommit code calling
qemuSecuritySetImageLabel() over the base image, possibly
multiple times (to ensure RW/RO access). A naive fix would be to
call the restore function. But this is not possible, because that
would deny QEMU access to the base image. Fortunately, we
can use the fact that seclabels are remembered only for the top
of the backing chain and not for the rest of the backing chain.
And thanks to the previous commit we can tell secdrivers which
images are top of the backing chain.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1803551
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Use the glib allocation function that never returns NULL and remove the
now dead-code checks from all callers.
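Illustration of why the checks become dead code (standalone sketch, not
libvirt code):

    #include <glib.h>

    typedef struct { int dummy; } thing;

    static thing *
    thingNew(void)
    {
        /* g_new0() aborts the process on OOM instead of returning NULL,
         * so the old "if (!ret) return NULL;" style checks in callers
         * can simply be removed. */
        return g_new0(thing, 1);
    }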
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Create a wrapper for qemuMonitorBlockGetNamedNodeData named
qemuBlockGetNamedNodeData. The purpose of the wrapper is to integrate
the monitor handling functionality and, in the future, possible
qemuCaps-based flags.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use the user-configured name of the bitmap when merging the appropriate
bitmaps for an incremental backup so that the user can see it as
configured. Additionally expose the default bitmap name if nothing is
configured.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Pass the exportname as configured when exporting the image via NBD and
fill it with the default if it's not configured.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
An inactive VM doesn't have qemuCaps set, thus we'd never properly report
that VM backups are supported only for running VMs.
Move the capability check after the active check.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
We insert the checkpoint metadata into the list of checkpoints prior to
actually creating the on-disk bits. If the 'transaction' or any other
steps done between inserting the checkpoint and creating the on-disk
data fail we'd end up with an unusable checkpoint that would vanish
after libvirtd restart.
Prevent this by rolling back the metadata if we didn't actually take and
record the checkpoint.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Commit d75f865fb9 caused a job deadlock if
a VM running the backup job is destroyed, as it removed the
cleanup of the async job type and there was nothing left to clean up the
backup job.
Add an explicit cleanup of the backup job when destroying a VM.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
When cancelling the blockjobs as part of failed backup job startup
recovery we didn't pass in the correct async job type. Luckily the block
job handler and cancellation code paths use no block job at all
currently so those were correct.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Now that we delete the images elsewhere it's not required. Additionally
it's safe to do so, as we never released an upstream version which
required this to be in place.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
While qemu is running both locations are semantically identical, but the
move will allow us to fix the scenario where the VM is destroyed or
crashes, in which case we'd leak the images.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In contrast to snapshots, the backup job does not complain when the
backup job's store file has a backing image pre-configured. It's actually
required so that the NBD server exposes all the data properly.
Remove our fake termination and use the existing disk source as the
backing image.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
g_get_real_time() returns the time since the epoch in microseconds.
It uses gettimeofday() internally, while libvirt used clock_gettime()
because it is declared async-signal-safe. In practice gettimeofday()
is also async-signal-safe *provided* the timezone parameter is
NULL. This is indeed the case in g_get_real_time().
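Standalone example of the glib call:

    #include <glib.h>
    #include <stdio.h>

    int main(void)
    {
        /* microseconds since the Unix epoch, as a gint64 */
        gint64 now = g_get_real_time();

        printf("%lld.%06lld seconds since the epoch\n",
               (long long) (now / G_USEC_PER_SEC),
               (long long) (now % G_USEC_PER_SEC));
        return 0;
    }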
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
To allow backups to work across external snapshots we need to improve the
algorithm which calculates which bitmaps to merge.
The algorithm must look for appropriately named bitmaps in the image and
possibly descend into a backing image if the current image does not have
the bitmap.
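The shape of the lookup, with hypothetical, simplified types (the real
code operates on the virStorageSource backing chain):

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical, simplified types for illustration only. */
    typedef struct image image;
    struct image {
        const char **bitmaps;  /* NULL-terminated list of bitmap names */
        image *backing;        /* next image in the backing chain      */
    };

    /* Walk the chain from 'top' towards the base and return the first
     * image which contains a bitmap named 'name', or NULL if none does. */
    static image *
    findBitmapImage(image *top, const char *name)
    {
        for (image *img = top; img; img = img->backing) {
            for (size_t i = 0; img->bitmaps && img->bitmaps[i]; i++) {
                if (strcmp(img->bitmaps[i], name) == 0)
                    return img;
            }
        }

        return NULL;
    }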
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The function will require the bitmap topology for the full
implementation. To facilitate testing, add the propagation of the
necessary data beforehand so that the test code can stay unchanged
throughout the upcoming changes.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>