qemu: blockjob: Transition into 'ready' state only from expected states

In certain rare occasions qemu can transition a block job which was
already 'ready' into 'standby' and then back. If this happens in the
following order libvirt will get confused about the actual job state:

1) the block copy job is 'ready' (job->state == QEMU_BLOCKJOB_STATE_READY)

2) user calls qemuDomainBlockJobAbort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
   flag but without VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC

3) the block job is switched to synchronous event handling

4) the block job blips to 'standby' and back to 'ready', the event is
   not processed since the blockjob is in sync mode for now

5) qemuDomainBlockJobPivot is called:
    5.1) 'job-complete' QMP command is issued
    5.2) job->state is set to QEMU_BLOCKJOB_STATE_PIVOTING

6) code for synchronous-wait for the job completion in qemuDomainBlockJobAbort
   is invoked

7) the waiting loop calls qemuBlockJobUpdate:

    7.1) job->newstate is QEMU_BLOCKJOB_STATE_READY due to 4)
    7.2) qemuBlockJobEventProcess is called
    7.3) the handler for QEMU_BLOCKJOB_STATE_READY overwrites
         job->state from QEMU_BLOCKJOB_STATE_PIVOTING to QEMU_BLOCKJOB_STATE_READY

8) qemuDomainBlockJobAbort is looking for a finished job, so waits again

9) qemu finishes the blockjob and transitions it into 'concluded' state

10) qemuBlockJobUpdate is triggered again, this time finalizing the job.
    10.1) job->newstate is = QEMU_BLOCKJOB_STATE_CONCLUDED
          job->state is = QEMU_BLOCKJOB_STATE_READY
    10.2) qemuBlockJobEventProcessConcluded is called, the function
          checks whether there was an error with the blockjob. Since
          there was no error job->newstate becomes
          QEMU_BLOCKJOB_STATE_COMPLETED.
    10.3) qemuBlockJobEventProcessConcludedTransition selects the action
          for the appropriate block job type where we have:

    case QEMU_BLOCKJOB_TYPE_COPY:
        if (job->state == QEMU_BLOCKJOB_STATE_PIVOTING && success)
            qemuBlockJobProcessEventConcludedCopyPivot(driver, vm, job, asyncJob);
        else
            qemuBlockJobProcessEventConcludedCopyAbort(driver, vm, job, asyncJob);
        break;

          Since job->state is QEMU_BLOCKJOB_STATE_READY,
          qemuBlockJobProcessEventConcludedCopyAbort is called.

This patch forbids transitions to QEMU_BLOCKJOB_STATE_READY if the
previous job state isn't QEMU_BLOCKJOB_STATE_RUNNING or
QEMU_BLOCKJOB_STATE_NEW.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1951507
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This commit is contained in:
Peter Krempa 2021-04-20 12:58:28 +02:00
parent 16c69e7aae
commit 81e770c2f1

View File

@ -1697,14 +1697,21 @@ qemuBlockJobEventProcess(virQEMUDriver *driver,
break;
case QEMU_BLOCKJOB_STATE_READY:
/* mirror may be NULL for copy job corresponding to migration */
if (job->disk) {
job->disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
qemuBlockJobEmitEvents(driver, vm, job->disk, job->type, job->newstate);
/* in certain cases qemu can blip out and back into 'ready' state for
* a blockjob. In cases when we already are past RUNNING the job such
* as when pivoting/aborting this could reset the internally set job
* state, thus we ignore it if the job isn't in expected state */
if (job->state == QEMU_BLOCKJOB_STATE_NEW ||
job->state == QEMU_BLOCKJOB_STATE_RUNNING) {
/* mirror may be NULL for copy job corresponding to migration */
if (job->disk) {
job->disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
qemuBlockJobEmitEvents(driver, vm, job->disk, job->type, job->newstate);
}
job->state = job->newstate;
qemuDomainSaveStatus(vm);
}
job->state = job->newstate;
job->newstate = -1;
qemuDomainSaveStatus(vm);
break;
case QEMU_BLOCKJOB_STATE_NEW: