mirror of
https://gitlab.com/libvirt/libvirt.git
synced 2024-10-13 01:29:16 +00:00
540c339a25
There is one problem that causes various errors in the daemon. When domain is waiting for a job, it is unlocked while waiting on the condition. However, if that domain is for example transient and being removed in another API (e.g. cancelling incoming migration), it get's unref'd. If the first call, that was waiting, fails to get the job, it unref's the domain object, and because it was the last reference, it causes clearing of the whole domain object. However, when finishing the call, the domain must be unlocked, but there is no way for the API to know whether it was cleaned or not (unless there is some ugly temporary variable, but let's scratch that). The root cause is that our APIs don't ref the objects they are using and all use the implicit reference that the object has when it is in the domain list. That reference can be removed when the API is waiting for a job. And because each domain doesn't do its ref'ing, it results in the ugly checking of the return value of virObjectUnref() that we have everywhere. This patch changes qemuDomObjFromDomain() to ref the domain (using virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which should be the only function in which the return value of virObjectUnref() is checked. This makes all reference counting deterministic and makes the code a bit clearer. Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
311 lines
9.2 KiB
Plaintext
311 lines
9.2 KiB
Plaintext
QEMU Driver Threading: The Rules
|
|
=================================
|
|
|
|
This document describes how thread safety is ensured throughout
|
|
the QEMU driver. The criteria for this model are:
|
|
|
|
- Objects must never be exclusively locked for any prolonged time
|
|
- Code which sleeps must be able to time out after suitable period
|
|
- Must be safe against dispatch of asynchronous events from monitor
|
|
|
|
|
|
Basic locking primitives
|
|
------------------------
|
|
|
|
There are a number of locks on various objects
|
|
|
|
* virQEMUDriverPtr
|
|
|
|
The qemu_conf.h file has inline comments describing the locking
|
|
needs for each field. Any field marked immutable, self-locking
|
|
can be accessed without the driver lock. For other fields there
|
|
are typically helper APIs in qemu_conf.c that provide serialized
|
|
access to the data. No code outside qemu_conf.c should ever
|
|
acquire this lock
|
|
|
|
* virDomainObjPtr
|
|
|
|
Will be locked after calling any of the virDomainFindBy{ID,Name,UUID}
|
|
methods. However, preferred method is qemuDomObjFromDomain() that uses
|
|
virDomainFindByUUIDRef() which also increases the reference counter and
|
|
finds the domain in the domain list without blocking all other lookups.
|
|
When the domain is locked and the reference increased, the prefered way of
|
|
decrementing the reference counter and unlocking the domain is using the
|
|
qemuDomObjEndAPI() function.
|
|
|
|
Lock must be held when changing/reading any variable in the virDomainObjPtr
|
|
|
|
If the lock needs to be dropped & then re-acquired for a short period of
|
|
time, the reference count must be incremented first using virDomainObjRef().
|
|
There is no need to increase the reference count if qemuDomObjFromDomain()
|
|
was used for looking up the domain. In this case there is one reference
|
|
already added by that function.
|
|
|
|
This lock must not be held for anything which sleeps/waits (i.e. monitor
|
|
commands).
|
|
|
|
|
|
|
|
* qemuMonitorPrivatePtr: Job conditions
|
|
|
|
Since virDomainObjPtr lock must not be held during sleeps, the job
|
|
conditions provide additional protection for code making updates.
|
|
|
|
Qemu driver uses two kinds of job conditions: asynchronous and
|
|
normal.
|
|
|
|
Asynchronous job condition is used for long running jobs (such as
|
|
migration) that consist of several monitor commands and it is
|
|
desirable to allow calling a limited set of other monitor commands
|
|
while such job is running. This allows clients to, e.g., query
|
|
statistical data, cancel the job, or change parameters of the job.
|
|
|
|
Normal job condition is used by all other jobs to get exclusive
|
|
access to the monitor and also by every monitor command issued by an
|
|
asynchronous job. When acquiring normal job condition, the job must
|
|
specify what kind of action it is about to take and this is checked
|
|
against the allowed set of jobs in case an asynchronous job is
|
|
running. If the job is incompatible with current asynchronous job,
|
|
it needs to wait until the asynchronous job ends and try to acquire
|
|
the job again.
|
|
|
|
Immediately after acquiring the virDomainObjPtr lock, any method
|
|
which intends to update state must acquire either asynchronous or
|
|
normal job condition. The virDomainObjPtr lock is released while
|
|
blocking on these condition variables. Once the job condition is
|
|
acquired, a method can safely release the virDomainObjPtr lock
|
|
whenever it hits a piece of code which may sleep/wait, and
|
|
re-acquire it after the sleep/wait. Whenever an asynchronous job
|
|
wants to talk to the monitor, it needs to acquire nested job (a
|
|
special kind of normal job) to obtain exclusive access to the
|
|
monitor.
|
|
|
|
Since the virDomainObjPtr lock was dropped while waiting for the
|
|
job condition, it is possible that the domain is no longer active
|
|
when the condition is finally obtained. The monitor lock is only
|
|
safe to grab after verifying that the domain is still active.
|
|
|
|
|
|
* qemuMonitorPtr: Mutex
|
|
|
|
Lock to be used when invoking any monitor command to ensure safety
|
|
wrt any asynchronous events that may be dispatched from the monitor.
|
|
It should be acquired before running a command.
|
|
|
|
The job condition *MUST* be held before acquiring the monitor lock
|
|
|
|
The virDomainObjPtr lock *MUST* be held before acquiring the monitor
|
|
lock.
|
|
|
|
The virDomainObjPtr lock *MUST* then be released when invoking the
|
|
monitor command.
|
|
|
|
|
|
Helper methods
|
|
--------------
|
|
|
|
To lock the virDomainObjPtr
|
|
|
|
virObjectLock()
|
|
- Acquires the virDomainObjPtr lock
|
|
|
|
virObjectUnlock()
|
|
- Releases the virDomainObjPtr lock
|
|
|
|
|
|
|
|
To acquire the normal job condition
|
|
|
|
qemuDomainObjBeginJob()
|
|
- Waits until the job is compatible with current async job or no
|
|
async job is running
|
|
- Waits for job.cond condition 'job.active != 0' using virDomainObjPtr
|
|
mutex
|
|
- Rechecks if the job is still compatible and repeats waiting if it
|
|
isn't
|
|
- Sets job.active to the job type
|
|
|
|
|
|
qemuDomainObjEndJob()
|
|
- Sets job.active to 0
|
|
- Signals on job.cond condition
|
|
|
|
|
|
|
|
To acquire the asynchronous job condition
|
|
|
|
qemuDomainObjBeginAsyncJob()
|
|
- Waits until no async job is running
|
|
- Waits for job.cond condition 'job.active != 0' using virDomainObjPtr
|
|
mutex
|
|
- Rechecks if any async job was started while waiting on job.cond
|
|
and repeats waiting in that case
|
|
- Sets job.asyncJob to the asynchronous job type
|
|
|
|
|
|
qemuDomainObjEndAsyncJob()
|
|
- Sets job.asyncJob to 0
|
|
- Broadcasts on job.asyncCond condition
|
|
|
|
|
|
|
|
To acquire the QEMU monitor lock
|
|
|
|
qemuDomainObjEnterMonitor()
|
|
- Acquires the qemuMonitorObjPtr lock
|
|
- Releases the virDomainObjPtr lock
|
|
|
|
qemuDomainObjExitMonitor()
|
|
- Releases the qemuMonitorObjPtr lock
|
|
- Acquires the virDomainObjPtr lock
|
|
|
|
These functions must not be used by an asynchronous job.
|
|
|
|
|
|
To acquire the QEMU monitor lock as part of an asynchronous job
|
|
|
|
qemuDomainObjEnterMonitorAsync()
|
|
- Validates that the right async job is still running
|
|
- Acquires the qemuMonitorObjPtr lock
|
|
- Releases the virDomainObjPtr lock
|
|
- Validates that the VM is still active
|
|
|
|
qemuDomainObjExitMonitor()
|
|
- Releases the qemuMonitorObjPtr lock
|
|
- Acquires the virDomainObjPtr lock
|
|
|
|
These functions are for use inside an asynchronous job; the caller
|
|
must check for a return of -1 (VM not running, so nothing to exit).
|
|
Helper functions may also call this with QEMU_ASYNC_JOB_NONE when
|
|
used from a sync job (such as when first starting a domain).
|
|
|
|
|
|
To keep a domain alive while waiting on a remote command
|
|
|
|
qemuDomainObjEnterRemote()
|
|
- Releases the virDomainObjPtr lock
|
|
|
|
qemuDomainObjExitRemote()
|
|
- Acquires the virDomainObjPtr lock
|
|
|
|
|
|
Design patterns
|
|
---------------
|
|
|
|
|
|
* Accessing something directly to do with a virDomainObjPtr
|
|
|
|
virDomainObjPtr obj;
|
|
|
|
obj = qemuDomObjFromDomain(dom);
|
|
|
|
...do work...
|
|
|
|
qemuDomObjEndAPI(&obj);
|
|
|
|
|
|
* Updating something directly to do with a virDomainObjPtr
|
|
|
|
virDomainObjPtr obj;
|
|
|
|
obj = qemuDomObjFromDomain(dom);
|
|
|
|
qemuDomainObjBeginJob(obj, QEMU_JOB_TYPE);
|
|
|
|
...do work...
|
|
|
|
qemuDomainObjEndJob(obj);
|
|
|
|
qemuDomObjEndAPI(&obj);
|
|
|
|
|
|
* Invoking a monitor command on a virDomainObjPtr
|
|
|
|
virDomainObjPtr obj;
|
|
qemuDomainObjPrivatePtr priv;
|
|
|
|
obj = qemuDomObjFromDomain(dom);
|
|
|
|
qemuDomainObjBeginJob(obj, QEMU_JOB_TYPE);
|
|
|
|
...do prep work...
|
|
|
|
if (virDomainObjIsActive(vm)) {
|
|
qemuDomainObjEnterMonitor(obj);
|
|
qemuMonitorXXXX(priv->mon);
|
|
qemuDomainObjExitMonitor(obj);
|
|
}
|
|
|
|
...do final work...
|
|
|
|
qemuDomainObjEndJob(obj);
|
|
qemuDomObjEndAPI(&obj);
|
|
|
|
|
|
* Running asynchronous job
|
|
|
|
virDomainObjPtr obj;
|
|
qemuDomainObjPrivatePtr priv;
|
|
|
|
obj = qemuDomObjFromDomain(dom);
|
|
|
|
qemuDomainObjBeginAsyncJob(obj, QEMU_ASYNC_JOB_TYPE);
|
|
qemuDomainObjSetAsyncJobMask(obj, allowedJobs);
|
|
|
|
...do prep work...
|
|
|
|
if (qemuDomainObjEnterMonitorAsync(driver, obj,
|
|
QEMU_ASYNC_JOB_TYPE) < 0) {
|
|
/* domain died in the meantime */
|
|
goto error;
|
|
}
|
|
...start qemu job...
|
|
qemuDomainObjExitMonitor(driver, obj);
|
|
|
|
while (!finished) {
|
|
if (qemuDomainObjEnterMonitorAsync(driver, obj,
|
|
QEMU_ASYNC_JOB_TYPE) < 0) {
|
|
/* domain died in the meantime */
|
|
goto error;
|
|
}
|
|
...monitor job progress...
|
|
qemuDomainObjExitMonitor(driver, obj);
|
|
|
|
virObjectUnlock(obj);
|
|
sleep(aWhile);
|
|
virObjectLock(obj);
|
|
}
|
|
|
|
...do final work...
|
|
|
|
qemuDomainObjEndAsyncJob(obj);
|
|
qemuDomObjEndAPI(&obj);
|
|
|
|
|
|
* Coordinating with a remote server for migration
|
|
|
|
virDomainObjPtr obj;
|
|
qemuDomainObjPrivatePtr priv;
|
|
|
|
obj = qemuDomObjFromDomain(dom);
|
|
|
|
qemuDomainObjBeginAsyncJob(obj, QEMU_ASYNC_JOB_TYPE);
|
|
|
|
...do prep work...
|
|
|
|
if (virDomainObjIsActive(vm)) {
|
|
qemuDomainObjEnterRemote(obj);
|
|
...communicate with remote...
|
|
qemuDomainObjExitRemote(obj);
|
|
/* domain may have been stopped while we were talking to remote */
|
|
if (!virDomainObjIsActive(vm)) {
|
|
qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
|
|
_("guest unexpectedly quit"));
|
|
}
|
|
}
|
|
|
|
...do final work...
|
|
|
|
qemuDomainObjEndAsyncJob(obj);
|
|
qemuDomObjEndAPI(&obj);
|