After libvirt issues the balloon resize command, the current balloon
size needs to be changed to the maximum memory size since the vCPUs were
not started and thus the balloon driver could not return the memory.
Since GetXMLDesc and other APIs return the balloon size without updating
it in case they are not able to obtain the job and the memory balloon
does not support the asynchronous event the sizing might be incorrect.
Since the monitor code now supports ullongs when setting balloon size,
drop the legacy code with overflow checking.
Additionally the comment mentioning that the job is treated as a sync
job does not make sense any more since the monitor is entered
asynchronously.
Store the emulator pinning cpu mask as a pure virBitmap rather than the
virDomainPinDef since it stores only the bitmap and refactor
qemuDomainPinEmulator to do the same operations in a much saner way.
As a side effect virDomainEmulatorPinAdd and virDomainEmulatorPinDel can
be removed since they don't add any value.
In case when <vcpu ... cpuset=""> is not specified, the vcpupin array is
not guaranteed to be allocated to def->vcpus. This would cause a crash
for TCG since it does not report thread IDs for vCPUs.
Most virDomainDiskIndexByName callers do not care about the index; what
they really want is a disk def pointer.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
When starting a domain, if a domain specifies security drivers we do not have
loaded, we fail. However we don't check for this during
reconnect, so any operation relying on security driver functionality would fail.
If someone e.g. starts a domain with selinux driver loaded, then they change
the security driver to 'none' in config, restart the daemon and call dump/save/..,
QEMU will return an error.
As we shouldn't kill the domain, we should at least log an error to let the
user know that domain reconnect wasn't completely clean.
https://bugzilla.redhat.com/show_bug.cgi?id=1183893
So far, we are not reporting if numatune was even defined. The
value of zero is blindly returned (which maps onto
VIR_DOMAIN_NUMATUNE_MEM_STRICT). Unfortunately, we are making
decisions based on this value. Instead, we should not only return
the correct value, but report to the caller if the value is valid
at all.
For better viewing of this patch use '-w'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Other threads may be blocked in qemuBlockJobSyncWait. Ensure that
they're woken up when the domain is stopped.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
If we received zero iothreads from the monitor, but were perhaps
expecting to receive something, then the code was skipping the check
to ensure what's in the monitor matches our expectations. So invert
the checks to check that what we get back matches expectations and
then check there are zero iothreads returned.
Rather than have a separate routine to parse the alias of an iothread
returned from qemu in order to get the iothread_id value, parse the alias
when returning and just return the iothread_id in qemuMonitorIOThreadInfoPtr
This set of patches removes the function, changes the "char *name" to
"unsigned int" and handles all the fallout.
Remove the iothreadspin array from cputune and replace with a cpumask
to be stored in the iothreadids list.
Adjust the test output because our printing goes in order of the iothreadids
list now.
Add 'thread_id' to the virDomainIOThreadIDDef as a means to store the
'thread_id' as returned from the live qemu monitor data.
Remove the iothreadpids list from _qemuDomainObjPrivate and replace with
the new iothreadids 'thread_id' element.
Rather than use the default numbering scheme of 1..number of iothreads
defined for the domain, use the iothreadid's list for the iothread_id
Since iothreadids list keeps track of the iothread_id's, these are
now used in place of the many places where a for loop would "know"
that the ID was "+ 1" from the array element.
The new tests ensure usage of the <iothreadid> values for an exact number
of iothreads and the usage of a smaller number of <iothreadid> values than
iothreads that exist (and usage of the default numbering scheme).
If a user hot-attaches the guest agent channel libvirt would ignore it
until the restart of libvirtd or shutdown/destroy and start of the VM
itself.
This patch adds code that opens or closes the guest agent connection
according to the state of the guest agent channel according to
connect/disconnect events.
To allow opening the channel from the event handler qemuConnectAgent
needed to be exported.
When the guest agent channel gets hotplugged to a VM, libvirt would
still report that "QEMU guest agent is not configured" rather than
stating that the connection was not established yet.
Currently the code won't be able to connect to the agent after hotplug
but that will change in a later patch.
As the qemuFindAgentConfig() helper is quite helpful in this case move
it to a more usable place and export it.
This is basically turning qemuDomObjEndAPI into a more general
function. Other drivers which gets a reference to domain objects may
benefit from this function too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1198645
Once upon a time, there was a little domain. And the domain was pinned
onto a NUMA node and hasn't fully allocated its memory:
<memory unit='KiB'>2355200</memory>
<currentMemory unit='KiB'>1048576</currentMemory>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
Oh little me, said the domain, what will I do with so little memory.
If I only had a few megabytes more. But the old admin noticed the
whimpering, barely audible to untrained human ear. And good admin he
was, he gave the domain yet more memory. But the old NUMA topology
witch forbade to allocate more memory on the node zero. So he
decided to allocate it on a different node:
virsh # numatune little_domain --nodeset 0-1
virsh # setmem little_domain 2355200
The little domain was happy. For a while. Until bad, sharp teeth
shaped creature came. Every process in the system was afraid of him.
The OOM Killer they called him. Oh no, he's after the little domain.
There's no escape.
Do you kids know why? Because when the little domain was born, her
father, Libvirt, called numa_set_membind(). So even if the admin
allowed her to allocate memory from other nodes in the cgroups, the
membind() forbid it.
So what's the lesson? Libvirt should rely on cgroups, whenever
possible and use numa_set_membind() as the last ditch effort.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The destination libvirt daemon in a migration may segfault if the client
disconnects immediately after the migration has begun:
# virsh -c qemu+tls://remote/system list --all
Id Name State
----------------------------------------------------
...
# timeout --signal KILL 1 \
virsh migrate example qemu+tls://remote/system \
--verbose --compressed --live --auto-converge \
--abort-on-error --unsafe --persistent \
--undefinesource --copy-storage-all --xml example.xml
Killed
# virsh -c qemu+tls://remote/system list --all
error: failed to connect to the hypervisor
error: unable to connect to server at 'remote:16514': Connection refused
The crash is in:
1531 void
1532 qemuDomainObjEndJob(virQEMUDriverPtr driver, virDomainObjPtr obj)
1533 {
1534 qemuDomainObjPrivatePtr priv = obj->privateData;
1535 qemuDomainJob job = priv->job.active;
1536
1537 priv->jobs_queued--;
Backtrace:
#0 at qemuDomainObjEndJob at qemu/qemu_domain.c:1537
#1 in qemuDomainRemoveInactive at qemu/qemu_domain.c:2497
#2 in qemuProcessAutoDestroy at qemu/qemu_process.c:5646
#3 in virCloseCallbacksRun at util/virclosecallbacks.c:350
#4 in qemuConnectClose at qemu/qemu_driver.c:1154
...
qemuDomainRemoveInactive calls virDomainObjListRemove, which in this
case is holding the last remaining reference to the domain.
qemuDomainRemoveInactive then calls qemuDomainObjEndJob, but the domain
object has been freed and poisoned by then.
This patch bumps the domain's refcount until qemuDomainRemoveInactive
has completed. We also ensure qemuProcessAutoDestroy does not return the
domain to virCloseCallbacksRun to be unlocked in this case. There is
similar logic in bhyveProcessAutoDestroy and lxcProcessAutoDestroy
(which call virDomainObjListRemove directly).
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
Instead of always using controller 0 and incrementing port number,
respect the maximum port numbers of controllers and use all of them.
Ports for virtio consoles are quietly reserved, but not formatted
(neither in XML nor on QEMU command line).
Also rejects duplicate virtio-serial addresses.
https://bugzilla.redhat.com/show_bug.cgi?id=890606https://bugzilla.redhat.com/show_bug.cgi?id=1076708
Test changes:
* virtio-auto.args
Filling out the port when just the controller is specified.
switched from using
maxport + 1
to:
first free port on the controller
* virtio-autoassign.args
Filling out the address when no <address> is specified.
Started using all the controllers instead of 0, also discards
the bus value.
* xml -> xml output of virtio-auto
The port assignment is no longer done as a part of XML parsing,
so the unspecified values stay 0.
Two places would call to qemuPrepareCpumap() with priv->autoNodeset to
convert it to a cpuset. Remove the function and use the prepared cpuset
automatically.
When the synchronous pivot option is selected, libvirt would not update
the backing chain until the job was exitted. Some applications then
received invalid data as their job serialized first.
This patch removes polling to wait for the ABORT/PIVOT job completion
and replaces it with a condition. If a synchronous operation is
requested the update of the XML is executed in the job of the caller of
the synchronous request. Otherwise the monitor event callback uses a
separate worker to update the backing chain with a new job.
This is a regression since 1a92c719101e5bfa6fe2b78006ad04c7f075ea28
When the ABORT job is finished synchronously you get the following call
stack:
#0 qemuBlockJobEventProcess
#1 qemuDomainBlockJobImpl
#2 qemuDomainBlockJobAbort
#3 virDomainBlockJobAbort
While previously or while using the _ASYNC flag you'd get:
#0 qemuBlockJobEventProcess
#1 processBlockJobEvent
#2 qemuProcessEventHandler
#3 virThreadPoolWorker
When using 'dimm' memory devices with qemu, some of the information
like the slot number and base address need to be reloaded from qemu
after process start so that it reflects the actual state. The state then
allows to use memory devices across migrations.
virnetdevopenvswitch.h declares a few functions that can be called to
add ports to and remove them from OVS bridges, and retrieve the
migration data for a port. It does not contain any data definitions
that are used by domain_conf.h. But for some reason, domain_conf.h
virnetdevopenvswitch.h should be directly #including it. This adds a
few lines to the project, but saves all the files that don't need it
from the extra computing, and makes the dependencies more clear cut.
When libvirt is starting a domain, it reports the state as SHUTOFF until
it's RUNNING. This is not ideal because domain startup may take a long
time (usually because of some configuration issues, firewalls blocking
access to network disks, etc.) and domain lists provided by libvirt look
awkward. One can see weird shutoff domains with IDs in a list of active
domains or even shutoff transient domains. In any case, it looks more
like a bug in libvirt than a normal state a domain goes through.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Use the utilities introduced in the previous patches so the qemu
driver is able to create tap devices that are bound (and unbound
on domain destroyal) to Midonet virtual ports.
Signed-off-by: Antoni Segura Puimedon <toni+libvirt@midokura.com>
We're parsing memballoon status period as unsigned int, but when we're
trying to set it, both we and qemu use signed int. That means large
values will get wrapped around to negative one resulting in error.
Basically the same problem as commit e3a7b874 was dealing with when
updating live domain.
QEMU changed the accepted value to int64 in commit 1f9296b5, but even
values as INT_MAX don't make sense since the value passed means seconds.
Hence adding capability flag for this change isn't worth it.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1140958
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
As pointed out by jtomko in his review of the IOThreads pinning code:
http://www.redhat.com/archives/libvir-list/2015-March/msg00495.html
there are some comments sprinkled in indicating IOThreads were using
the same structure as the VcpuPin code...
This is the first patch of a few that will change the virDomainVcpuPin*
structures and code to just virDomainPin* - starting with the data
structure naming...
As there are two possible approaches to define a domain's memory size -
one used with legacy, non-NUMA VMs configured in the <memory> element
and per-node based approach on NUMA machines - the user needs to make
sure that both are specified correctly in the NUMA case.
To avoid this burden on the user I'd like to replace the NUMA case with
automatic totaling of the memory size. To achieve this I need to replace
direct access to the virDomainMemtune's 'max_balloon' field with
two separate getters depending on the desired size.
The two sizes are needed as:
1) Startup memory size doesn't include memory modules in some
hypervisors.
2) After startup these count as the usable memory size.
Note that the comments for the functions are future aware and document
state that will be present after a few later patches.
Surprisingly we did not grab a VM job when a block job finished and we'd
happily rewrite the backing chain data. This made it possible to crash
libvirt when queueing two backing chains tightly and other badness.
To fix it, add yet another handler to the helper thread that handles
monitor events that require a job.
https://bugzilla.redhat.com/show_bug.cgi?id=1197600
So, libvirt uses pid file to track pid of started qemus. Whenever
a domain is started, its pid is put into corresponding pid file.
The pid file path is generated based on domain name and stored
into domain object internals. However, it's not stored in the
status XML and therefore lost on daemon restarts. Hence, later,
when domain is being shut down, the daemon does not know which
pid file to unlink, and the correct pid file is left behind. To
avoid this, lets generate the pid file path again in
qemuProcessReconnect().
Reported-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Instead of checking defaultMode for every channel that has no mode
configured, test it only once outside of channel loop. This fixes a bug
that in case all possible channels are fore example set to insecure, but
defaultMode is set to secure, we wouldn't auto-generate TLS port. This
results in failure while starting a guest.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1143832
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
We have two different places that needs to be updated while touching
code for allocation spice ports. Add a bool option to
'qemuProcessSPICEAllocatePorts' function to switch between true and fake
allocation so we can use this function also in qemu_driver to generate
native domain definition.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Since adding the support for scheduler policy settings in commit
8680ea97, there are two enums with the same information. That was
caused by rewriting the patch since first draft.
Find out thanks to clang, but there was no impact whatsoever.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Upon BLOCK_JOB_COMPLETED event delivery, we check if the job has
completed (in qemuMonitorJSONHandleBlockJobImpl()). For better image,
the event looks something like this:
"timestamp": {"seconds": 1423582694, "microseconds": 372666}, "event":
"BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len":
8412790784, "offset": 409993216, "speed": 8796093022207, "type":
"mirror", "error": "No space left on device"}}
If "len" does not equal "offset" it's considered an error, and we can
clearly see "error" field filled in. However, later in the event
processing this case was handled no differently to case of job being
aborted via separate API. It's time that we start differentiate these
two because of the future work.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Currently, upon BLOCK_JOB_* event, disk->mirrorState is not updated
each time. The callback code handling the events checks if a blockjob
was started via our public APIs prior to setting the mirrorState.
However, some block jobs may be started internally (e.g. during
storage migration), in which case we don't bother with setting
disk->mirror (there's nothing we can set it to anyway), or other
fields. But it will come handy if we update the mirrorState in these
cases too. The event wasn't delivered just for fun - we've started the
job after all.
So, in this commit, the mirrorState is set to whatever job status
we've obtained. Of course, there are some actions on some statuses
that we want to perform. But instead of if {} else if {} else {} ...
enumeration, let's move to switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
We do have a check for valid per-domain security model, however we still
do permit an invalid security model for a domain's device (those which
are specified with <source> element).
This patch introduces a new function virSecurityManagerCheckAllLabel
which compares user specified security model against currently
registered security drivers. That being said, it also permits 'none'
being specified as a device security model.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1165485
Signed-off-by: Ján Tomko <jtomko@redhat.com>
If a previous commit I fixed the incorrect handling of vcpu pids
for TCG mode QEMU:
commit b07f3d821dfb11a118ee75ea275fd6ab737d9500
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Thu Dec 18 16:34:39 2014 +0000
Don't setup fake CPU pids for old QEMU
The code assumes that def->vcpus == nvcpupids, so when we setup
fake CPU pids for old QEMU with nvcpupids == 1, we cause the
later code to read off the end of the array. This has fun results
like sche_setaffinity(0, ...) which changes libvirtd's own CPU
affinity, or even better sched_setaffinity($RANDOM, ...) which
changes the affinity of a random OS process.
The intent was that this would merely disable the ability to set
per-vCPU affinity. It should still have been possible to set VM
level host CPU affinity.
Unfortunately, when you set <vcpu cpuset='0-1'>4</vcpu>, the XML
parser will internally take this & initialize an entry in the
def->cputune.vcpupin array for every VCPU. IOW this is implicitly
being treated as
<cputune>
<vcpupin cpuset='0-1' vcpu='0'/>
<vcpupin cpuset='0-1' vcpu='1'/>
<vcpupin cpuset='0-1' vcpu='2'/>
<vcpupin cpuset='0-1' vcpu='3'/>
</cputune>
Even more fun, the faked cputune elements are hidden from view when
querying the live XML, because their cpuset mask is the same as the
VM default cpumask.
The upshot was that it was impossible to set VM level CPU affinity.
To fix this we must update qemuProcessSetVcpuAffinities so that it
only reports a fatal error if the per-VCPU cpu mask is different
from the VM level cpu mask.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>