Currently QEMU capabilities are initialized before the QEMU driver
sets ownership on its various directories. The upshot is that if
you change the user/group in the qemu.conf file, libvirtd will fail
to probe QEMU the first time it is run after the config change.
Moving QEMU capabilities initialization to after the chown() calls
fixes this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This previous commit
commit 1a50ba2cb0
Author: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Date: Mon Nov 26 15:17:13 2012 +0100
qemu: Fix QMP Capabability Probing Failure
which attempted to make sure the QEMU process used for probing
ran as the right user id, caused serious performance regression
and unreliability in probing. The -daemonize switch in QEMU
guarantees that the monitor socket is present before the parent
process exits. This means libvirtd is guaranteed to be able to
connect immediately. By switching from -daemonize to the
virCommandDaemonize API libvirtd was no longer synchronized with
QEMU's startup process. The result was that the QEMU monitor
failed to open and went into its 200ms sleep loop. This happened
for all 25 binaries resulting in 5 seconds worth of sleeping
at libvirtd startup. In addition sometimes when libvirt connected,
QEMU would be partially initialized and crash causing total
failure to probe that binary.
This commit reverts the previous change, ensuring we do use the
-daemonize flag to QEMU. Startup delay is cut from 7 seconds
to 2 seconds on my machine, which is on a par with what it was
prior to the capabilities rewrite.
To deal with the fact that QEMU needs to be able to create the
pidfile, we switch pidfile location from runDir to libDir, which
QEMU is guaranteed to be able to write to.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there is no reason to hold the qemu driver locked
throughout the whole API execution. Moreover, we can use the
new qemuDomObjFromDomain() internal API to look up the domain.
Hosts for rbd are ceph monitor daemons. These have fixed IP addresses,
so they are often referenced by IP rather than hostname for
convenience, or to avoid relying on DNS. Using IPv4 addresses as the
host name works already, but IPv6 addresses require rbd-specific
escaping because the colon is used as an option separator in the
string passed to qemu.
Escape these colons, and enclose the IPv6 address in square brackets
so it is distinguished from the port, which is currently mandatory.
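
The escaping described above can be sketched roughly as follows. This is a minimal illustration, not libvirt's actual code: format_rbd_host() is a hypothetical helper, and the choice to join host and port with an escaped colon is an assumption about the string qemu expects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not libvirt's API): write "host:port" into out using
 * rbd-style escaping. Every ':' is written as "\:"; a host that itself
 * contains ':' is treated as an IPv6 literal and wrapped in [ ].
 * Returns 0 on success, -1 if out is too small.
 */
static int format_rbd_host(const char *host, const char *port,
                           char *out, size_t outlen)
{
    int is_ipv6 = strchr(host, ':') != NULL;
    size_t n = 0;

    assert(out != NULL && outlen > 0);

/* Append one character, always leaving room for the trailing NUL. */
#define PUT(c) do { if (n + 1 >= outlen) return -1; out[n++] = (c); } while (0)
    if (is_ipv6)
        PUT('[');
    for (const char *p = host; *p; p++) {
        if (*p == ':')
            PUT('\\');
        PUT(*p);
    }
    if (is_ipv6)
        PUT(']');
    PUT('\\');            /* assumed: host/port separator is also escaped */
    PUT(':');
    for (const char *p = port; *p; p++)
        PUT(*p);
#undef PUT
    out[n] = '\0';
    return 0;
}
```

With this sketch an IPv4 host passes through unchanged apart from the separator, while an IPv6 host gains brackets and escaped colons.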
Acked-by: Osier Yang <jyang@redhat.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876829 complains that
if a guest is put into S3 state (such as via virsh dompmsuspend)
and then an external snapshot is taken, qemu forcefully transitions
the domain to paused, but libvirt doesn't reflect that change
internally. Thus, a user has to use 'virsh suspend' to get libvirt
back in sync with qemu state, and if the user doesn't know this
trick, then the guest appears hung.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateActiveExternal):
Track fact that qemu wakes up a suspended domain on migration.
The previous fix to avoid leaking securityDriverNames forgot to
handle the case of securityDriverNames being NULL, leading to
a crash.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The autodestroy callback code has the following function
called from a hash iterator
qemuDriverCloseCallbackRun(void *payload,
const void *name,
void *opaque)
{
...
char *uuidstr = name;
...
dom = closeDef->cb(data->driver, dom, data->conn);
if (dom)
virObjectUnlock(dom);
virHashRemoveEntry(data->driver->closeCallbacks, uuidstr);
}
The closeDef->cb function may well cause the current callback
to be removed, if it shuts down 'dom'. As such the use of
'uuidstr' in virHashRemoveEntry is accessing free'd memory.
We must make a copy of the uuid str before invoking the
callback to be safe.
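
The copy-before-callback pattern can be sketched with a toy table. Everything here (struct table, close_cb, run_close_callback) is a hypothetical stand-in for the real hash table and callback, meant only to show why the key must be duplicated first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy single-entry table standing in for the closeCallbacks hash.
 * The table owns 'key'; removing the entry frees it. */
struct table {
    char *key;
    int removals;
};

static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

static int table_init(struct table *t, const char *key)
{
    t->key = copy_string(key);
    t->removals = 0;
    return t->key ? 0 : -1;
}

static void table_remove(struct table *t, const char *key)
{
    if (t->key && strcmp(t->key, key) == 0) {
        free(t->key);
        t->key = NULL;
        t->removals++;
    }
}

/* Stand-in for closeDef->cb: shutting the domain down drops its entry. */
static void close_cb(struct table *t)
{
    if (t->key)
        table_remove(t, t->key);
}

/* 'name' points at the key stored inside the table. */
static int run_close_callback(struct table *t, const char *name)
{
    assert(name != NULL);

    char *uuidstr = copy_string(name);  /* copy BEFORE the callback */
    if (!uuidstr)
        return -1;

    close_cb(t);                 /* may free the memory behind 'name' */
    table_remove(t, uuidstr);    /* safe: reads only our private copy */
    free(uuidstr);
    return 0;
}
```

Passing 'name' itself to table_remove() after close_cb() would read freed memory, which is the bug the commit describes.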
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This will allow storing additional topology data in the NUMA topology
definition.
This patch changes the storage type and fixes fallout of the change
across the drivers using it.
This patch also changes semantics of adding new NUMA cell information.
Until now the data were re-allocated and copied to the topology
definition. This patch changes the addition function to steal the
pointer to a pre-allocated structure to simplify the code.
The way in which memory balloon suppression was handled for S390
is flawed for a number of reasons.
1. Just preventing the default balloon to be created in the case
of VIR_ARCH_S390[X] is not sufficient. An explicit memballoon
element in the guest definition will still be honored, resulting
both in a -balloon option and the allocation of a PCI bus address,
neither being supported.
2. Prohibiting balloon for S390 altogether at a domain_conf level
is not a good solution either, as there's work in progress on the QEMU
side to implement a virtio-balloon device, although in
conjunction with a new machine type. Suppressing the balloon
should therefore be done at the QEMU driver level depending
on the present capabilities.
Therefore we remove the conditional suppression of the default
balloon in domain_conf.c.
Further, we are claiming the memballoon device for virtio-s390
during device address assignment to prevent it from being considered
as a PCI device.
Finally, we suppress the generation of the balloon command line option
if this is a virtio-s390 machine.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Should have been done in commit 56fd513 already, but was missed
due to oversight: qemuDomainSendKey didn't release the driver lock
in its cleanup section. This fixes an issue introduced by commit
8c5d2ba.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=892079
One of my previous patches (f2a4e5f176) tried to fix libvirtd
crashing on domain destroy. However, we need to copy the pattern from
qemuProcessHandleMonitorEOF() instead of decrementing reference
counter. The rationale for this is, if qemu process is dying due
to domain being destroyed, we obtain EOF on both the monitor and
agent sockets. However, if the exit is expected, qemuProcessStop
is called, which cleans both agent and monitor sockets up. We
want qemuAgentClose() to be called iff the EOF is not expected,
so we don't leak an FD and memory. Moreover, there could be a race
with qemuProcessHandleMonitorEOF() which could have already
closed the agent socket, in which case we don't want to do
anything.
Adds a "ram" attribute globally to the video.model element, that changes
the resulting qemu command line only if video.type == "qxl".
<video>
<model type='qxl' ram='65536' vram='65536' heads='1'/>
</video>
That attribute gets a default value of 64*1024. The schema is unchanged
for other video element types.
The resulting qemu command line change is the addition of
-global qxl-vga.ram_size=<ram>*1024
or
-global qxl.ram_size=<ram>*1024
for the main and secondary qxl devices respectively.
The default for the qxl ram bar is 64*1024 kilobytes (the same as the
default qxl vram bar size).
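
As a rough sketch of how the -global value could be derived from the XML attribute: format_qxl_ram_global() is a hypothetical helper, not the function libvirt uses, and it only formats the part after -global.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the value passed after -global.
 * 'ram_kib' is the ram attribute from the XML (in KiB); the command line
 * carries <ram>*1024, i.e. bytes.
 * 'primary' selects qxl-vga (main device) vs qxl (secondary device).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_qxl_ram_global(int primary, unsigned long long ram_kib,
                                 char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);

    int n = snprintf(out, outlen, "%s.ram_size=%llu",
                     primary ? "qxl-vga" : "qxl", ram_kib * 1024ULL);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the default of 64*1024 the resulting value is 67108864 bytes.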
This avoids "Event negative_returns: A negative constant "-1" is passed as
an argument to a parameter that cannot be negative.". The called function
uses -1 to determine whether it needs to traverse all the hostdevs.
The snapshot name is used to create path to the definition save file.
When the name contains slashes the creation of the file fails. Reject
such names.
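
A minimal sketch of such a check follows. The helper name and the "<dir>/<name>.xml" layout are assumptions made for illustration, not taken from the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<dir>/<name>.xml", refusing names with '/'.
 * Returns 0 on success, -1 if the name is rejected or the buffer is small. */
static int snapshot_xml_path(const char *dir, const char *name,
                             char *out, size_t outlen)
{
    assert(dir != NULL && name != NULL);

    /* A slash would make the name point into another directory. */
    if (strchr(name, '/'))
        return -1;

    int n = snprintf(out, outlen, "%s/%s.xml", dir, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```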
When the snapshot definition can't be saved, the
qemuDomainSnapshotCreate function succeeded without filling some of the
fields in the internal definition.
This patch removes the snapshot and returns failure if the XML file
cannot be written.
When running virDomainDestroy, we need to make sure that no other
background thread cleans up the domain while we're doing our work.
This can happen if we release the domain object while in the
middle of work, because the monitor might detect EOF in this window.
For this reason we have a 'beingDestroyed' flag to stop the monitor
from doing its normal cleanup. Unfortunately this flag was only
being used to protect qemuDomainBeginJob, and not qemuProcessKill.
This left open a race condition where either libvirtd could crash,
or alternatively report bogus error messages about the domain already
having been destroyed to the caller.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The driver mutex was unlocked in qemuDomainModifyDeviceFlags before
entering qemuDomainObjBeginJobWithDriver where it will be unlocked once
more leaving it in an undefined state. The result was that two
threads were simultaneously looking up the domain hash table during
multiple parallel device attach/detach operations.
Luckily this triggered a virHashIterationError.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
The QEMU driver default max port is 65535, but it then increments
this by 1 to 65536. This maps to 0 in an unsigned short :-( This
was apparently done so that for() loops could use "< max" instead
of "<= max". Remove this insanity and just make the loop do the
right thing.
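
The wrap-around and the inclusive loop can be illustrated like this. The sketch assumes a 16-bit unsigned short, as on common platforms; the function names are hypothetical and not from the driver.

```c
#include <assert.h>

#define PORT_MAX 65535

/* Count ports in the inclusive range [min, max]. The loop counter is an
 * int so that "port <= max" terminates even when max == 65535; an
 * unsigned short counter would wrap to 0 and never exceed max. */
static int count_ports(unsigned short min, unsigned short max)
{
    int count = 0;
    for (int port = min; port <= max; port++)
        count++;
    return count;
}

/* The old "+1" trick: 65535 + 1 does not fit in 16 bits, so the
 * exclusive upper bound silently becomes 0. */
static unsigned short old_exclusive_max(void)
{
    return (unsigned short)(PORT_MAX + 1);
}
```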
In commit c4bbaaf8, caps->arch was checked uninitialized, rendering the
whole check useless.
This patch moves the conditional setting of QEMU_CAPS_NO_ACPI to
qemuCapsInitQMP, and removes the no longer needed exception for S390.
It also clears the flag for all non-x86 archs instead of just S390 in
qemuCapsInitHelp.
The virDomainObj, qemuAgent, qemuMonitor, lxcMonitor classes
all require a mutex, so they can be switched to use virObjectLockable.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After live change of cpu counts, the number of processor threads is
verified. This patch makes use of this approach to check if qemu ignored
the request for cpu hot-unplug and report an appropriate message.
Currently all classes must directly inherit from virObject.
This patch allows for an arbitrarily deep hierarchy. There's not much
to this aside from chaining up the 'dispose' handlers from
each class & providing APIs to check types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Pass the stub driver name directly to pciDettachDevice and pciReAttachDevice to
suit different libvirt drivers. For example, the qemu driver prefers pci-stub, but
Xen prefers pciback.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Add an optional 'type' attribute to <target> element of serial port
device. There are two choices for its value, 'isa-serial' and
'usb-serial'. For backward compatibility, when the 'type' attribute is
missing, 'isa-serial' is chosen as before.
Libvirt XML sample
<serial type='pty'>
<target type='usb-serial' port='0'/>
<address type='usb' bus='0' port='1'/>
</serial>
qemu commandline:
qemu ${other_vm_args} \
-chardev pty,id=charserial0 \
-device usb-serial,chardev=charserial0,id=serial0,bus=usb.0,port=1
https://bugzilla.redhat.com/show_bug.cgi?id=892079
With current code, if user calls virDomainPMSuspendForDuration()
followed by virDomainDestroy(), the former API checks for qemu agent
presence, which will evaluate as true (if agent is configured). While
talking to qemu agent, the qemu driver is unlocked, so the latter API
starts executing. However, if machine dies meanwhile, libvirtd gets
EOF on the agent socket and qemuProcessHandleAgentEOF() is called. The
handler clears the reference to the qemu agent while the destroy API is already
holding a reference to it. This leads to NULL dereferencing later in
the code. Therefore, the agent pointer should be set to NULL only if
we are the exclusive owner of it.
While OOM can have knock-on effects that trash a system, generally
the first symptom is one of memory thrashing.
* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.
Perform all the appropriate plumbing.
When qemu/KVM VMs are paused manually through a monitor not owned by libvirt,
libvirt will think of them as "paused" even after they are resumed and
effectively running. With this patch the discrepancy goes away.
This is meant to address bug 892791.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on the domain definition and a
magic equation, and sets it upon domain startup. The rationale was
that if there's a memory leak or exploit in qemu, we should prevent
it from trashing the host system. However, the equation was too tight,
as it didn't reflect what the kernel counts as memory used by a
process. Since many hosts do have swap, nobody has noticed anything,
because if the hard memory limit is reached, the process can continue
allocating memory in swap. However, if there is no swap on the host,
the process gets killed by the OOM killer. In our case, that is the
qemu process.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel way of accounting the memory
for process. That is, even the kernel caches are counted within the
memory used by a process (within cgroups at least). Hence the magic
equation has to be changed:
limit = 1.5 * (domain memory + total video memory) + (32MB for cache
per each disk) + 200MB
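
The equation above, worked through in code. default_hard_limit_kib() is a hypothetical name, and treating all sizes as KiB is an assumption made to keep the arithmetic in integers.

```c
#include <assert.h>

#define KIB_PER_MIB 1024ULL

/* All sizes in KiB. Hypothetical helper mirroring the formula above. */
static unsigned long long
default_hard_limit_kib(unsigned long long domain_mem_kib,
                       unsigned long long video_mem_kib,
                       unsigned int ndisks)
{
    unsigned long long limit = domain_mem_kib + video_mem_kib;

    limit += limit / 2;                    /* 1.5 * (memory + video) */
    limit += ndisks * 32 * KIB_PER_MIB;    /* 32MB of cache per disk */
    limit += 200 * KIB_PER_MIB;            /* fixed 200MB on top */
    return limit;
}
```

For a 1 GiB guest with 16 MiB of video memory and two disks this gives 1.5 * 1064960 + 65536 + 204800 = 1867776 KiB, roughly 1.8 GiB.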
This is the QEMU backend code for the SCLP console support.
It includes SCLP capability detection, QEMU command line generation
and a test case.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Since we daemonized QEMU for capabilities probing there is a long
delay if QEMU fails to launch. This is because we're not passing in
any virDomainObjPtr instance and thus the monitor code can not
check to see if the PID is still alive.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current code is initializing capabilities before setting
directory permissions. Thus the QEMU binaries being run may
not have the ability to create the UNIX monitor socket on
the first run of libvirtd.
This prevents domain starting and disk attaching if the shared disk's
setting conflicts with other active domain(s), e.g. a domain has
"sgio" set to "filtered" while another active domain is using the
same disk with it set to "unfiltered".
This introduces a hash table for qemu driver, to store the shared
disk's info as (@major:minor, @ref_count). @ref_count is the number
of domains which share the disk.
Since we only care about whether the disk supports unprivileged SG_IO
commands, and SG_IO commands only make sense for block disks,
this patch only manages (adds/removes hash entries for) shared disks
that are block disks.
* src/qemu/qemu_conf.h: (Add member 'sharedDisks' of type
virHashTablePtr; Declare helpers
qemuGetSharedDiskKey, qemuAddSharedDisk
and qemuRemoveSharedDisk)
* src/qemu/qemu_conf.c (Implement the 3 helpers)
* src/qemu/qemu_process.c (Update 'sharedDisks' when domain
starting and shutdown)
* src/qemu/qemu_driver.c (Update 'sharedDisks' when attaching
or detaching disk).
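
The reference counting keyed on major:minor can be sketched with a small fixed-size table. This is an illustration only: the struct and function names are hypothetical, the real code uses a virHashTablePtr, and the sgio conflict check itself is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SHARED 16
#define KEY_LEN 32

/* Toy stand-in for the driver's hash table: key "major:minor" -> ref count. */
struct shared_disks {
    char keys[MAX_SHARED][KEY_LEN];
    int refs[MAX_SHARED];
};

static void make_key(int maj, int min, char *key)
{
    snprintf(key, KEY_LEN, "%d:%d", maj, min);
}

static int find_slot(const struct shared_disks *t, const char *key)
{
    for (int i = 0; i < MAX_SHARED; i++)
        if (t->refs[i] > 0 && strcmp(t->keys[i], key) == 0)
            return i;
    return -1;
}

/* Returns the new ref count, or -1 if the table is full. */
static int add_shared_disk(struct shared_disks *t, int maj, int min)
{
    char key[KEY_LEN];
    make_key(maj, min, key);

    int i = find_slot(t, key);
    if (i >= 0)
        return ++t->refs[i];
    for (i = 0; i < MAX_SHARED; i++) {
        if (t->refs[i] == 0) {
            strcpy(t->keys[i], key);
            return t->refs[i] = 1;
        }
    }
    return -1;
}

/* Returns the remaining ref count (0 means the entry is gone), -1 if absent. */
static int remove_shared_disk(struct shared_disks *t, int maj, int min)
{
    char key[KEY_LEN];
    make_key(maj, min, key);

    int i = find_slot(t, key);
    return i < 0 ? -1 : --t->refs[i];
}
```

A count above 1 means another active domain already uses the disk, which is the point at which a conflicting setting could be detected.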
When the disk alignment check done while redefining an existing snapshot
failed, the qemu driver attempted to free the existing snapshot. As in
the cleanup path the definition of the snapshot wasn't assigned, the
cleanup code dereferenced a NULL pointer.
This patch changes the behavior on error paths while redefining snapshot
in two ways:
1) On failure, modifications done on the snapshot definition object are
rolled back.
2) The previous definition of the data isn't freed until it's certain it
won't be needed any more.
This change avoids the segfault and additionally the snapshot doesn't
vanish if redefinition fails for some reason.
This also changes the function signature to take a
virDomainChrSourceDefPtr instead of just a path, since it needs to
differentiate behavior based on source->type.
The functionality provided in virchrdev.c (previously virconsole.c) is
applicable to other types of character devices besides consoles, such
as channels. This patch is just code motion, renaming things such as
"console" or "pty", instead using more general terms such as
"character device" or "device path".
Since 4c993d8a we failed to set this important capability, which
allows starting a domain with QXL video card. We set DEVICE_QXL
capability bit instead, which is not necessarily wrong. Anyway, if
qemu supports the new '-device qxl' it supports older '-vga qxl'
as well. The latter is used for the primary (the first) qxl video
card, the former for other video cards.
Commit b3f2b4ca5c left buf unallocated in
the case of QMP capability probing being used, leading to a segfault in
strlen in the cleanup path.
This patch opens the log and allocates the buffer if QMP probing was
used, so we can display the helpful error message.
Despite our great effort we still parsed qemu log output.
We wouldn't have noticed had the upcoming qemu 1.4 not changed the
format of the logs slightly. Anyway, now we should gather
all interesting knobs like pty paths from monitor. Moreover,
since for historical reasons the first console can be just
an alias to the first serial port, we need to check this and
copy the pty path if that's the case to the first console.
This reverts commit 28224c4d2a
which shouldn't be needed at all because with current qemu
we obtain all paths from 'query-chardev' output. We ought
not parse log output at all anymore.
Since 586502189edf9fd0f89a83de96717a2ea826fdb0 qemu commit, the log
lines reporting the chardev's path have changed from:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device redirected to /dev/pts/5
char device redirected to /dev/pts/6
char device redirected to /dev/pts/7
to:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device compat_monitor0 redirected to /dev/pts/5
char device serial0 redirected to /dev/pts/6
char device serial1 redirected to /dev/pts/7
However, with current code we are not prepared for such a change, which
results in us being unable to start any domain.
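
One way to tolerate both log formats is to anchor on the fixed prefix and on "redirected to", ignoring whatever sits in between. The function below is a hypothetical sketch, not the parser libvirt uses.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical parser: return a pointer to the pty path in a
 * "char device ... redirected to PATH" line, or NULL if the line does not
 * match. Works for both the old and the new format because it skips
 * whatever sits between the prefix and "redirected to ".
 */
static const char *chardev_log_path(const char *line)
{
    static const char prefix[] = "char device ";
    static const char needle[] = "redirected to ";

    assert(line != NULL);

    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return NULL;

    const char *p = strstr(line + sizeof(prefix) - 1, needle);
    return p ? p + sizeof(needle) - 1 : NULL;
}
```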
Many internal qemu APIs must find the domain object from the passed
virDomainPtr. With the function Peter introduced, we can use it
instead of copying the same lines throughout the code.
Since we switched to QMP probing, the object types are spelled out
explicitly, i.e. virtio-net-pci. This has effectively disabled
the capability detection of s390 virtio devices. The trivial fix
is to add the s390 virtio types explicitly to qemuCapsObjectProps.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=888426
The code for doing a block-copy was supposed to track the destination
file in drive->mirror, but was set up to do all mallocs prior to
starting the copy so that OOM wouldn't leave things partially started.
However, the wrong variable was being written; later in the code we
silently did 'disk->mirror = mirror' which was still NULL, and thus
leaking memory and leaving libvirt to think that the mirror job was
never started, which prevented a pivot operation after a copy.
Problem introduced in commit 35c7701c6.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Initialize correct
variable.
To bring in line with new naming practice, rename the
src/util/cgroup.{h,c} files to vircgroup.{h,c}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, only PTY backend serial devices are considered for pseries.
All kinds of serial devices need to be supported.
This patch fixes the problem that it doesn't work
when specifying the source type as file.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
ACPI is only supported on the x86 platform; PPC can't support it.
So QEMU_CAPS_NO_ACPI shouldn't be set.
This patch removes the QEMU_CAPS_NO_ACPI capability for
non-x86 platforms.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Historically there was an inconsistency in handling of the
itanium arch. The xen driver & CPU model code treated it
as 'ia64' but the QEMU capabilities code used 'itanium'. On
the grounds that no one has ever seriously used itanium
with QEMU, while RHEL shipped itanium with Xen, we should
favour 'ia64' as the canonical format.
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of the datatype change.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When LXC labels USB devices during hotplug, it is running in
host context, so it needs to pass in a vroot path to the
container root.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
'-device VGA' maps to '-vga std'
'-device cirrus-vga' maps to '-vga cirrus'
'-device qxl-vga' maps to '-vga qxl'
(there is also '-device qxl' for secondary devices)
'-device vmware-svga' maps to '-vga vmware'
For qemu(>=1.2), we can use -device to replace -vga for video
device. For the primary video device, the patch tries to use 0x2
slot for matching old qemu. If the 0x2 slot is allocated already,
the addr property could help for using any available slot.
For qemu(< 1.2), we keep using -vga for primary device.
QEMU_CAPS_DEVICE_QXL -device qxl
QEMU_CAPS_DEVICE_VGA -device VGA
QEMU_CAPS_DEVICE_CIRRUS_VGA -device cirrus-vga
QEMU_CAPS_DEVICE_VMWARE_SVGA -device vmware-svga
QEMU_CAPS_DEVICE_VIDEO_PRIMARY /* safe to use -device XXX
for primary video device */
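
The -vga to -device correspondence listed above amounts to a small lookup table. The sketch below covers only the primary-device names; the struct and function names are hypothetical and the capability checks are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct video_map {
    const char *vga;     /* argument to -vga */
    const char *device;  /* model name for -device (primary device) */
};

/* Pairs taken from the list above. */
static const struct video_map primary_video_map[] = {
    { "std",    "VGA" },
    { "cirrus", "cirrus-vga" },
    { "qxl",    "qxl-vga" },
    { "vmware", "vmware-svga" },
};

/* Return the -device model for a -vga argument, or NULL if unknown. */
static const char *primary_device_for_vga(const char *vga)
{
    size_t count = sizeof(primary_video_map) / sizeof(primary_video_map[0]);

    assert(vga != NULL);

    for (size_t i = 0; i < count; i++)
        if (strcmp(primary_video_map[i].vga, vga) == 0)
            return primary_video_map[i].device;
    return NULL;
}
```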
Fix a typo in qemuCapsObjectTypes, the string 'qxl' here
should be -device qxl rather than -vga [...|qxl|..]
Noticed these while building on FreeBSD.
* src/qemu/qemu_monitor.c (qemuMonitorBlockInfoLookup): Rename
variable to avoid 'devname' collision.
* src/qemu/qemu_driver.c (qemuDomainInterfaceStats): Mark unused
variable.
When a network device's bridge connection is changed by
virDomainUpdateDevice, libvirt first removes the netdev's tap from its
old bridge, then adds it to the new bridge. Sometimes, due to a
network being destroyed while a guest device is still attached, the
tap may already be "removed" from the old bridge (or the old bridge
may not even exist any more); the existing code was needlessly failing
the update when this happened, making it impossible to recover from
the situation without completely detaching (i.e. removing) the netdev
from the guest and re-attaching.
Instead of failing the entire operation when removal of the tap from
the old bridge fails, this patch changes qemuDomainChangeNetBridge to
just log a warning and continue, allowing a reasonable recovery from
the situation.
(you'll appreciate this change if you ever accidentally destroy a
network while your guests are still using it).
Refactor virLockManagerPluginNew() so that the caller does
not need to pass in the config file path itself - just the
config directory and driver name.
Fix QEMU to actually pass in a config file when creating the
default lock manager plugin, rather than NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>