After Jirka's migration patches, libvirt listens for migration
events from qemu instead of actively polling the monitor. There is,
however, a small regression (introduced in 6d2edb6a42d0d41). The
problem is that the current status of the migration job is updated in
qemuProcessHandleMigrationStatus only if a migration job was
started. But eventually any asynchronous job may result in a
migration. Therefore, since such a job is not strictly a
migration job, the internal state was not updated and later checks failed:
virsh # save fedora22 /tmp/fedora22_ble.save
error: Failed to save domain fedora22 to /tmp/fedora22_ble.save
error: operation failed: domain save job: is not active
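A minimal sketch of the shape of the fix (names follow the libvirt
sources, but the exact guard and field names in the final patch may
differ): the MIGRATION event handler should accept any active async job
instead of insisting on a migration job.

  static int
  qemuProcessHandleMigrationStatus(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
                                   virDomainObjPtr vm,
                                   int status,
                                   void *opaque ATTRIBUTE_UNUSED)
  {
      qemuDomainObjPrivatePtr priv;

      virObjectLock(vm);
      priv = vm->privateData;

      /* Previously this bailed out unless a *migration* job was active,
       * so save/dump jobs (which also migrate, just to a file) never got
       * their status updated.  Accept any active async job instead. */
      if (priv->job.asyncJob == QEMU_ASYNC_JOB_NONE) {
          VIR_DEBUG("got MIGRATION event without an active async job");
          goto cleanup;
      }

      /* record the new migration state on the current job */
      priv->job.current->status.status = status;

   cleanup:
      virObjectUnlock(vm);
      return 0;
  }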
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
If QEMU fails during incoming migration, the domain disappears, including
a possibly useful error message read from the QEMU log file. Let's remember
the error in virQEMUDriver so that Finish can report more than just "no
such domain".
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Since we already support the MIGRATION event, we just need to make sure
the domain condition is signalled whenever a p2p connection drops or the
domain is paused due to an I/O error, and we can then avoid waking up
every 50 ms to check whether something happened.
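A rough sketch of the waiting side under this scheme (helper and field
names are illustrative): the event handlers call virDomainObjBroadcast()
on the domain condition, so the migration loop can simply sleep on it.

  /* Instead of: while (migrating) { usleep(50 * 1000); query-migrate; }
   * block on the per-domain condition and let the MIGRATION event, an
   * IO-error suspend, or a dropped p2p connection wake us up. */
  while (jobInfo->status.status == QEMU_MONITOR_MIGRATION_STATUS_ACTIVE) {
      if (virDomainObjWait(vm) < 0)
          return -1;
      /* re-evaluate the status that the event handlers updated */
  }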
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
We don't need to call query-migrate every 50 ms when we get the current
migration state via the MIGRATION event.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Even if QEMU supports migration events, it doesn't send them by default;
we have to enable them by calling migrate-set-capabilities. Let's enable
migration events whenever we can and clear QEMU_CAPS_MIGRATION_EVENT in
case migrate-set-capabilities does not support events.
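Roughly what the negotiation looks like; the monitor helper name and its
signature are assumptions made for illustration, only the capability flag
comes from the text above.

  /* Called once the QMP monitor is up; sketch only. */
  if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT)) {
      /* issues migrate-set-capabilities with the "events" capability */
      if (qemuMonitorSetMigrationCapability(priv->mon,
                                            QEMU_MONITOR_MIGRATION_CAPS_EVENTS) < 0) {
          VIR_DEBUG("Cannot enable migration events; "
                    "clearing QEMU_CAPS_MIGRATION_EVENT");
          virQEMUCapsClear(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT);
      }
  }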
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Some guests lock the tray, so the QEMU eject command will simply fail to
eject the media. But the guest OS can handle this eject attempt by
unlocking the tray and opening it. In that case, we should try again to
actually eject the media.
If the first attempt fails and we don't detect a tray_open event, we fail
with the error from the monitor. If we do receive that event, we know
that the guest reacted properly to the eject request, unlocked the tray
and opened it. In this case, we need to run the command again to actually
eject the media from the device. The reason for calling it again is that
QEMU doesn't wait for the guest to react; it immediately reports an error
that the tray is locked.
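Sketch of the retry logic (the event-waiting helper is hypothetical; the
real patch wires this into the media-change code path):

  /* First eject attempt; QEMU may refuse right away with "tray locked"
   * even though the guest will unlock and open the tray in a moment. */
  if (qemuMonitorEjectMedia(priv->mon, driveAlias, force) < 0) {
      /* hypothetical helper: wait briefly for the tray-open event */
      if (qemuHotplugWaitForTrayEject(vm, disk) < 0)
          goto error;

      /* The guest reacted and the tray is open now; eject for real. */
      if (qemuMonitorEjectMedia(priv->mon, driveAlias, force) < 0)
          goto error;
  }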
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1147471
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The QEMU_CAPS_SEAMLESS_MIGRATION capability says QEMU supports the
SPICE_MIGRATE_COMPLETED event. Thus we can just drop all the code which
polls query-spice and replace it with waiting for the event.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When libvirtd is restarted during migration, we properly cancel the
ongoing migration (unless it managed to almost finish before the
restart). But if we were also migrating storage using NBD, we would
completely forget about the running disk mirrors.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
By switching block jobs to use domain conditions, we can drop some
pretty complicated code in NBD storage migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
After libvirt issues the balloon resize command, the current balloon
size needs to be changed to the maximum memory size, since the vCPUs were
not started and thus the balloon driver could not return the memory.
Since GetXMLDesc and other APIs return the balloon size without updating
it when they are not able to obtain the job, and the memory balloon does
not support the asynchronous event, the sizing might be incorrect.
Since the monitor code now supports ullongs when setting balloon size,
drop the legacy code with overflow checking.
Additionally, the comment mentioning that the job is treated as a sync
job no longer makes sense since the monitor is entered
asynchronously.
Store the emulator pinning CPU mask as a pure virBitmap rather than a
virDomainPinDef, since it stores only the bitmap, and refactor
qemuDomainPinEmulator to do the same operations in a much saner way.
As a side effect, virDomainEmulatorPinAdd and virDomainEmulatorPinDel can
be removed since they don't add any value.
When <vcpu ... cpuset=""> is not specified, the vcpupin array is not
guaranteed to be allocated to hold def->vcpus entries. This would cause a
crash for TCG since it does not report thread IDs for vCPUs.
Most virDomainDiskIndexByName callers do not care about the index; what
they really want is a disk def pointer.
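For illustration, the thin wrapper this enables could look like the
following (the exact name and signature in the tree may differ):

  virDomainDiskDefPtr
  virDomainDiskByName(virDomainDefPtr def,
                      const char *name,
                      bool allow_ambiguous)
  {
      int idx = virDomainDiskIndexByName(def, name, allow_ambiguous);

      /* callers get the disk definition directly, or NULL if not found */
      return idx < 0 ? NULL : def->disks[idx];
  }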
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When starting a domain, if the domain specifies security drivers we do not
have loaded, we fail. However, we don't check for this during
reconnect, so any operation relying on security driver functionality will fail.
If someone e.g. starts a domain with the selinux driver loaded, then changes
the security driver to 'none' in the config, restarts the daemon and calls dump/save/..,
QEMU will return an error.
As we shouldn't kill the domain, we should at least log an error to let the
user know that the domain reconnect wasn't completely clean.
https://bugzilla.redhat.com/show_bug.cgi?id=1183893
So far, we are not reporting whether numatune was even defined. The
value of zero is blindly returned (which maps onto
VIR_DOMAIN_NUMATUNE_MEM_STRICT). Unfortunately, we are making
decisions based on this value. Instead, we should not only return
the correct value, but also report to the caller whether the value is
valid at all.
For better viewing of this patch use '-w'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Other threads may be blocked in qemuBlockJobSyncWait. Ensure that
they're woken up when the domain is stopped.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
If we received zero iothreads from the monitor, but were perhaps
expecting to receive something, then the code was skipping the check
that ensures what's in the monitor matches our expectations. So invert
the checks: first check that what we get back matches expectations, and
only then check whether zero iothreads were returned.
Rather than have a separate routine to parse the alias of an iothread
returned from qemu in order to get the iothread_id value, parse the alias
when returning and just return the iothread_id in qemuMonitorIOThreadInfoPtr.
This set of patches removes the function, changes the "char *name" to
"unsigned int" and handles all the fallout.
Remove the iothreadspin array from cputune and replace it with a cpumask
stored in the iothreadids list.
Adjust the test output because our printing now goes in order of the
iothreadids list.
Add 'thread_id' to the virDomainIOThreadIDDef as a means to store the
'thread_id' as returned from the live qemu monitor data.
Remove the iothreadpids list from _qemuDomainObjPrivate and replace it with
the new iothreadids 'thread_id' element.
Rather than use the default numbering scheme of 1..number of iothreads
defined for the domain, use the iothreadids list for the iothread_id.
Since the iothreadids list keeps track of the iothread_ids, these are
now used in the many places where a for loop would previously "know"
that the ID was "+ 1" from the array index.
The new tests cover the usage of <iothreadid> values for the exact number
of iothreads as well as the usage of fewer <iothreadid> values than there
are iothreads (falling back to the default numbering scheme for the rest).
If a user hot-attaches the guest agent channel, libvirt would ignore it
until libvirtd is restarted or the VM itself is shut down/destroyed and
started again.
This patch adds code that opens or closes the guest agent connection
according to the state of the guest agent channel, as reported by
connect/disconnect events.
To allow opening the channel from the event handler, qemuConnectAgent
needed to be exported.
When the guest agent channel gets hotplugged to a VM, libvirt would
still report that "QEMU guest agent is not configured" rather than
stating that the connection has not been established yet.
Currently the code isn't able to connect to the agent after hotplug,
but that will change in a later patch.
As the qemuFindAgentConfig() helper is quite helpful in this case, move
it to a more usable place and export it.
This is basically turning qemuDomObjEndAPI into a more general
function. Other drivers which get a reference to domain objects may
benefit from this function too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1198645
Once upon a time, there was a little domain. And the domain was pinned
onto a NUMA node and hadn't fully allocated its memory:
<memory unit='KiB'>2355200</memory>
<currentMemory unit='KiB'>1048576</currentMemory>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
Oh little me, said the domain, what will I do with so little memory.
If only I had a few megabytes more. But the old admin noticed the
whimpering, barely audible to the untrained human ear. And good admin
that he was, he gave the domain yet more memory. But the old NUMA topology
witch forbade allocating more memory on node zero. So he
decided to allocate it on a different node:
virsh # numatune little_domain --nodeset 0-1
virsh # setmem little_domain 2355200
The little domain was happy. For a while. Until a bad creature with
sharp teeth came. Every process in the system was afraid of him.
The OOM Killer they called him. Oh no, he's after the little domain.
There's no escape.
Do you kids know why? Because when the little domain was born, her
father, Libvirt, called numa_set_membind(). So even if the admin
allowed her to allocate memory from other nodes in the cgroups, the
membind() forbade it.
So what's the lesson? Libvirt should rely on cgroups whenever
possible and use numa_set_membind() only as a last-ditch effort.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The destination libvirt daemon in a migration may segfault if the client
disconnects immediately after the migration has begun:
# virsh -c qemu+tls://remote/system list --all
Id Name State
----------------------------------------------------
...
# timeout --signal KILL 1 \
virsh migrate example qemu+tls://remote/system \
--verbose --compressed --live --auto-converge \
--abort-on-error --unsafe --persistent \
--undefinesource --copy-storage-all --xml example.xml
Killed
# virsh -c qemu+tls://remote/system list --all
error: failed to connect to the hypervisor
error: unable to connect to server at 'remote:16514': Connection refused
The crash is in:
1531 void
1532 qemuDomainObjEndJob(virQEMUDriverPtr driver, virDomainObjPtr obj)
1533 {
1534 qemuDomainObjPrivatePtr priv = obj->privateData;
1535 qemuDomainJob job = priv->job.active;
1536
1537 priv->jobs_queued--;
Backtrace:
#0 at qemuDomainObjEndJob at qemu/qemu_domain.c:1537
#1 in qemuDomainRemoveInactive at qemu/qemu_domain.c:2497
#2 in qemuProcessAutoDestroy at qemu/qemu_process.c:5646
#3 in virCloseCallbacksRun at util/virclosecallbacks.c:350
#4 in qemuConnectClose at qemu/qemu_driver.c:1154
...
qemuDomainRemoveInactive calls virDomainObjListRemove, which in this
case is holding the last remaining reference to the domain.
qemuDomainRemoveInactive then calls qemuDomainObjEndJob, but the domain
object has been freed and poisoned by then.
This patch bumps the domain's refcount until qemuDomainRemoveInactive
has completed. We also ensure qemuProcessAutoDestroy does not return the
domain to virCloseCallbacksRun to be unlocked in this case. There is
similar logic in bhyveProcessAutoDestroy and lxcProcessAutoDestroy
(which call virDomainObjListRemove directly).
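Sketch of the approach (simplified; the real function does more cleanup
in between, and the locking details are omitted):

  void
  qemuDomainRemoveInactive(virQEMUDriverPtr driver,
                           virDomainObjPtr vm)
  {
      /* The domain list may hold the last reference, so take our own
       * before removing the domain from the list ... */
      virObjectRef(vm);
      virDomainObjListRemove(driver->domains, vm);

      /* ... do the remaining cleanup that still touches 'vm'
       * (remove status/config files, end the job, ...) ... */

      /* ... and only now allow the object to be freed. */
      virObjectUnref(vm);
  }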
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Instead of always using controller 0 and incrementing the port number,
respect the maximum port numbers of the controllers and use all of them.
Ports for virtio consoles are quietly reserved, but not formatted
(neither in the XML nor on the QEMU command line).
Also reject duplicate virtio-serial addresses.
https://bugzilla.redhat.com/show_bug.cgi?id=890606
https://bugzilla.redhat.com/show_bug.cgi?id=1076708
Test changes:
* virtio-auto.args
  Filling out the port when just the controller is specified
  switched from using "maxport + 1" to the first free port on the
  controller.
* virtio-autoassign.args
  Filling out the address when no <address> is specified now uses
  all the controllers instead of just controller 0, and also discards
  the bus value.
* xml -> xml output of virtio-auto
  The port assignment is no longer done as a part of XML parsing,
  so the unspecified values stay 0.
Two places would call qemuPrepareCpumap() with priv->autoNodeset to
convert it to a cpuset. Remove the function and use the prepared cpuset
automatically.
When the synchronous pivot option is selected, libvirt would not update
the backing chain until the job was exited. Some applications then
received invalid data as their job serialized first.
This patch removes the polling that waited for the ABORT/PIVOT job
completion and replaces it with a condition. If a synchronous operation
is requested, the update of the XML is executed in the job of the caller
of the synchronous request. Otherwise the monitor event callback uses a
separate worker to update the backing chain with a new job.
This is a regression since 1a92c719101e5bfa6fe2b78006ad04c7f075ea28
When the ABORT job is finished synchronously you get the following call
stack:
#0 qemuBlockJobEventProcess
#1 qemuDomainBlockJobImpl
#2 qemuDomainBlockJobAbort
#3 virDomainBlockJobAbort
While previously or while using the _ASYNC flag you'd get:
#0 qemuBlockJobEventProcess
#1 processBlockJobEvent
#2 qemuProcessEventHandler
#3 virThreadPoolWorker
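Rough sketch of the synchronous path under the new scheme (field and
helper names here are illustrative, not the exact libvirt code): the
caller registers interest, waits on the domain condition until the
BLOCK_JOB event handler records the result, and then processes the event
in its own job; the async path instead hands the stored status to a
worker thread.

  /* Synchronous caller (e.g. blockjob abort without the _ASYNC flag): */
  qemuBlockJobSyncBegin(disk);

  /* ... issue block-job-cancel / block-job-complete on the monitor ... */

  while (disk->blockJobStatus == -1) {
      /* the event handler stores the status and broadcasts the
       * domain condition */
      if (virDomainObjWait(vm) < 0)
          goto error;
  }

  /* update the backing chain in this thread's job */
  qemuBlockJobEventProcess(driver, vm, disk,
                           disk->blockJobType, disk->blockJobStatus);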
When using 'dimm' memory devices with qemu, some of the information,
like the slot number and base address, needs to be reloaded from qemu
after the process starts so that it reflects the actual state. The state
then allows memory devices to be used across migrations.
virnetdevopenvswitch.h declares a few functions that can be called to
add ports to and remove them from OVS bridges, and to retrieve the
migration data for a port. It does not contain any data definitions
that are used by domain_conf.h. But for some reason, domain_conf.h
was #including virnetdevopenvswitch.h; instead, the files that actually
use those functions should be directly #including it. This adds a
few lines to the project, but saves all the files that don't need it
from the extra computing, and makes the dependencies more clear-cut.
When libvirt is starting a domain, it reports the state as SHUTOFF until
it's RUNNING. This is not ideal because domain startup may take a long
time (usually because of some configuration issues, firewalls blocking
access to network disks, etc.) and domain lists provided by libvirt look
awkward. One can see weird shutoff domains with IDs in a list of active
domains or even shutoff transient domains. In any case, it looks more
like a bug in libvirt than a normal state a domain goes through.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Use the utilities introduced in the previous patches so the qemu
driver is able to create tap devices that are bound (and unbound
on domain destruction) to Midonet virtual ports.
Signed-off-by: Antoni Segura Puimedon <toni+libvirt@midokura.com>
We're parsing the memballoon status period as an unsigned int, but when
we're trying to set it, both we and qemu use a signed int. That means
large values will get wrapped around to negative ones, resulting in an
error.
Basically the same problem as commit e3a7b874 was dealing with when
updating a live domain.
QEMU changed the accepted value to int64 in commit 1f9296b5, but even
values like INT_MAX don't make sense since the value passed is in
seconds. Hence adding a capability flag for this change isn't worth it.
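A standalone illustration of the wrap-around (not libvirt code): a period
that fits in an unsigned int turns negative when squeezed through a
signed int.

  #include <stdio.h>

  int
  main(void)
  {
      unsigned int period = 3000000000u; /* value parsed from the XML */
      int sent = (int) period;           /* what the old code handed to QEMU */

      printf("parsed: %u, sent: %d\n", period, sent);
      /* parsed: 3000000000, sent: -1294967296 -> QEMU rejects it */
      return 0;
  }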
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1140958
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
As pointed out by jtomko in his review of the IOThreads pinning code:
http://www.redhat.com/archives/libvir-list/2015-March/msg00495.html
there are some comments sprinkled in indicating IOThreads were using
the same structure as the VcpuPin code...
This is the first patch of a few that will change the virDomainVcpuPin*
structures and code to just virDomainPin* - starting with the data
structure naming...
As there are two possible approaches to defining a domain's memory size -
one used with legacy, non-NUMA VMs, configured in the <memory> element,
and a per-node approach on NUMA machines - the user needs to make
sure that both are specified correctly in the NUMA case.
To avoid this burden on the user I'd like to replace the NUMA case with
automatic totaling of the memory size. To achieve this I need to replace
direct access to the virDomainMemtune's 'max_balloon' field with
two separate getters depending on the desired size.
The two sizes are needed because:
1) the startup memory size doesn't include memory modules in some
hypervisors;
2) after startup these count towards the usable memory size.
Note that the comments for the functions are future-aware and document
state that will be present only after a few later patches.
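The getters could be sketched roughly as follows (names and field layout
are illustrative, not the final API):

  /* Memory the guest is started with; memory modules are not included.
   * (The series replaces this with automatic totaling of the per-NUMA-node
   * sizes when NUMA is configured.) */
  unsigned long long
  virDomainDefGetMemoryInitial(const virDomainDef *def)
  {
      return def->mem.max_balloon;
  }

  /* Memory usable once the domain is running: the startup size plus all
   * hotpluggable memory modules. */
  unsigned long long
  virDomainDefGetMemoryActual(const virDomainDef *def)
  {
      unsigned long long ret = virDomainDefGetMemoryInitial(def);
      size_t i;

      for (i = 0; i < def->nmems; i++)
          ret += def->mems[i]->size;

      return ret;
  }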
Surprisingly we did not grab a VM job when a block job finished, and we'd
happily rewrite the backing chain data. This made it possible to crash
libvirt when two backing chain updates were queued in quick succession,
among other badness.
To fix it, add yet another handler to the helper thread that handles
monitor events that require a job.