To enable locking to be introduced to the security manager
objects later, turn virSecurityManager into a virObjectLockable
class
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
From qemu's point of view these are still just tap devices, so there's
no reason they shouldn't work with vhost-net; as a matter of fact,
Raja Sivaramakrishnan <srajag00@yahoo.com> verified on libvir-list
that at least the qemu_command.c part of this patch works:
https://www.redhat.com/archives/libvir-list/2012-December/msg01314.html
(the hotplug case is extrapolation on my part).
The 'driver->caps' pointer can be changed on the fly. Accessing
it currently requires the global driver lock. Isolate this
access in a single helper, so a future patch can relax the
locking constraints.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To avoid confusion between 'virCapsPtr' and 'qemuCapsPtr'
do some renaming of various fucntions/variables. All
instances of 'qemuCapsPtr' are renamed to 'qemuCaps'. To
avoid that clashing with the 'qemuCaps' typedef though,
rename the latter to virQEMUCaps.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To enable virCapabilities instances to be reference counted,
turn it into a virObject. All cases of virCapabilitiesFree
turn into virObjectUnref
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupPtr instance APIs are safe to use without locking
in the QEMU driver, since all internal state they rely on is
immutable. Update the comment to reflect this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We are requesting for stderr catching for all cases in
virFileWrapperFdNew(). There is no need to have a separate
function just to report an error, esp. when we can do it in
virFileWrapperFdClose().
The qemuParseGlusterString() replaced dst->src without a VIR_FREE() of
what was in there before.
The qemuBuildCommandLine() did not properly free the boot_buf depending
on various usages.
The qemuParseCommandLineDisk() had numerous paths that didn't clean up
the virDomainDiskDefPtr def properly. Adjust the logic to go through an
error: label before cleanup in order to free the resource.
Currently the activePciHostdevs, inactivePciHostdevsd and
activeUsbHostdevs lists are all implicitly protected by the
QEMU driver lock. Now that the lists all inherit from the
virObjectLockable, we can make the locking explicit, removing
the dependency on the QEMU driver lock for correctness.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow modifications to the lists to be synchronized, convert
virPCIDeviceList and virUSBDeviceList into virObjectLockable
classes. The locking, however, will not be self-contained. The
users of these classes will have to call virObjectLock/Unlock
in the critical regions.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When iterating over USB host devices to setup cgroups, the
usbDevice object was leaked in both LXC and QEMU driers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The QEMU driver struct has a 'qemuVersion' field that was previously
used to cache the version lookup from capabilities. With the recent
QEMU capabilities rewrite the caching happens at a lower level so
this field is pointless. Removing it avoids worries about locking
when updating it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The duplicate VM checking should be done atomically with
virDomainObjListAdd, so shoud not be a separate function.
Instead just use flags to indicate what kind of checks are
required.
This pair, used in virDomainCreateXML:
if (virDomainObjListIsDuplicate(privconn->domains, def, 1) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, false)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
NULL)))
goto cleanup;
This pair, used in virDomainRestoreFlags:
if (virDomainObjListIsDuplicate(privconn->domains, def, 1) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, true)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
NULL)))
goto cleanup;
This pair, used in virDomainDefineXML:
if (virDomainObjListIsDuplicate(privconn->domains, def, 0) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, false)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
0, NULL)))
goto cleanup;
Otherwise, we get a lot of scary (but harmless) noise in the logs:
2013-02-05 15:35:48.555+0000: 8637: error : qemuMonitorJSONCheckError:353 : internal error unable to execute QEMU command 'add-fd': Parameter 'fdset-id' expects an existing fdset-id
one for every qemu 1.2 binary that we probe.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONAddFd): During
probe, avoid logging failures.
As a step towards making virDomainObjList thread-safe turn it
into an opaque virObject, preventing any direct access to its
internals.
As part of this a new method virDomainObjListForEach is
introduced to replace all existing usage of virHashForEach
Currently the virQEMUDriverPtr struct contains an wide variety
of data with varying access needs. Move all the static config
data into a dedicated virQEMUDriverConfigPtr object. The only
locking requirement is to hold the driver lock, while obtaining
an instance of virQEMUDriverConfigPtr. Once a reference is held
on the config object, it can be used completely lockless since
it is immutable.
NB, not all APIs correctly hold the driver lock while getting
a reference to the config object in this patch. This is safe
for now since the config is never updated on the fly. Later
patches will address this fully.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If a compression binary prints something to stderr, currently
it is discarded. However, it can contain useful data from
debugging POV, so we should catch it.
If a decompression binary prints something to stderr, currently
it is discarded. However, it can contain useful data from
debugging POV, so we should catch it.
Add support for QEMU -add-fd command line parameter detection.
This intentionally rejects qemu 1.2, where 'add-fd' QMP did
not allow full control of set ids, and where there was no command
line counterpart, but accepts qemu 1.3.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add entry points for calling the qemu 'add-fd' and 'remove-fd'
monitor commands. There is no entry point for 'query-fdsets';
the assumption is that a developer can use
virsh qemu-monitor-command domain '{"execute":"query-fdsets"}'
when debugging issues, and that meanwhile, libvirt is responsible
enough to remember what fds it associated with what fdsets.
Likewise, on the 'add-fd' command, it is assumed that libvirt
will always pass a set id, rather than letting qemu autogenerate
the next available id number.
* src/qemu/qemu_monitor.c (qemuMonitorAddFd, qemuMonitorRemoveFd):
New functions.
* src/qemu/qemu_monitor.h (qemuMonitorAddFd, qemuMonitorRemoveFd):
New prototypes.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONAddFd)
(qemuMonitorJSONRemoveFd): New functions.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONAddFd)
(qemuMonitorJSONRemoveFd): New prototypes.
https://bugzilla.redhat.com/show_bug.cgi?id=894723
Currently, if qemuProcessStart() succeeds, but it's decompression
binary that returns nonzero status, we don't kill the qemu process,
but remove it from internal domain list, leaving the qemu process
hanging around totally uncontrolled.
https://bugzilla.redhat.com/show_bug.cgi?id=892289
It seems like with new udev within guest OS, the tray is locked,
so we need to:
- 'eject'
- wait for tray to open
- 'change'
Moreover, even when doing bare 'eject', we should check for
'tray_open' as guest may have locked the tray. However, the
waiting phase shouldn't be unbounded, so I've chosen 10 retries
maximum, each per 500ms. This should give enough time for guest
to eject a media and open the tray.
With our code, we fail to query for tray-open attribute currently.
That's because in HMP it is 'tray-open' and in QMP it's 'tray_open'.
It always has been. However, we got it exactly the opposite.
A logic bug meant we reported KVM was possible for every
architecture, merely based on whether the query-kvm command
exists. We should instead have been doing it based on whether
the query-kvm command returns 'present: 1'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently QEMU capabilities are initialized before the QEMU driver
sets ownership on its various directories. The upshot is that if
you change the user/group in the qemu.conf file, libvirtd will fail
to probe QEMU the first time it is run after the config change.
Moving QEMU capabilities initialization to after the chown() calls
fixes this
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This previous commit
commit 1a50ba2cb0
Author: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Date: Mon Nov 26 15:17:13 2012 +0100
qemu: Fix QMP Capabability Probing Failure
which attempted to make sure the QEMU process used for probing
ran as the right user id, caused serious performance regression
and unreliability in probing. The -daemonize switch in QEMU
guarantees that the monitor socket is present before the parent
process exits. This means libvirtd is guaranteed to be able to
connect immediately. By switching from -daemonize to the
virCommandDaemonize API libvirtd was no longer synchronized with
QEMU's startup process. The result was that the QEMU monitor
failed to open and went into its 200ms sleep loop. This happened
for all 25 binaries resulting in 5 seconds worth of sleeping
at libvirtd startup. In addition sometimes when libvirt connected,
QEMU would be partially initialized and crash causing total
failure to probe that binary.
This commit reverts the previous change, ensuring we do use the
-daemonize flag to QEMU. Startup delay is cut from 7 seconds
to 2 seconds on my machine, which is on a par with what it was
prior to the capabilities rewrite.
To deal with the fact that QEMU needs to be able to create the
pidfile, we switch pidfile location fron runDir to libDir, which
QEMU is guaranteed to be able to write to.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there is no reason to hold qemu driver locked
throughout whole API execution. Moreover, we can use the
new qemuDomObjFromDomain() internal API to lookup domain then.
Hosts for rbd are ceph monitor daemons. These have fixed IP addresses,
so they are often referenced by IP rather than hostname for
convenience, or to avoid relying on DNS. Using IPv4 addresses as the
host name works already, but IPv6 addresses require rbd-specific
escaping because the colon is used as an option separator in the
string passed to qemu.
Escape these colons, and enclose the IPv6 address in square brackets
so it is distinguished from the port, which is currently mandatory.
Acked-by: Osier Yang <jyang@redhat.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876829 complains that
if a guest is put into S3 state (such as via virsh dompmsuspend)
and then an external snapshot is taken, qemu forcefully transitions
the domain to paused, but libvirt doesn't reflect that change
internally. Thus, a user has to use 'virsh suspend' to get libvirt
back in sync with qemu state, and if the user doesn't know this
trick, then the guest appears hung.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateActiveExternal):
Track fact that qemu wakes up a suspended domain on migration.
The previous fix to avoid leaking securityDriverNames forgot to
handle the case of securityDriverNames being NULL, leading to
a crash
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The autodestroy callback code has the following function
called from a hash iterator
qemuDriverCloseCallbackRun(void *payload,
const void *name,
void *opaque)
{
...
char *uuidstr = name
...
dom = closeDef->cb(data->driver, dom, data->conn);
if (dom)
virObjectUnlock(dom);
virHashRemoveEntry(data->driver->closeCallbacks, uuidstr);
}
The closeDef->cb function may well cause the current callback
to be removed, if it shuts down 'dom'. As such the use of
'uuidstr' in virHashRemoveEntry is accessing free'd memory.
We must make a copy of the uuid str before invoking the
callback to be safe.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This will allow storing additional topology data in the NUMA topology
definition.
This patch changes the storage type and fixes fallout of the change
across the drivers using it.
This patch also changes semantics of adding new NUMA cell information.
Until now the data were re-allocated and copied to the topology
definition. This patch changes the addition function to steal the
pointer to a pre-allocated structure to simplify the code.
The way in that memory balloon suppression was handled for S390
is flawed for a number or reasons.
1. Just preventing the default balloon to be created in the case
of VIR_ARCH_S390[X] is not sufficient. An explicit memballoon
element in the guest definition will still be honored, resulting
both in a -balloon option and the allocation of a PCI bus address,
neither being supported.
2. Prohibiting balloon for S390 altogether at a domain_conf level
is no good solution either as there's work in progress on the QEMU
side to implement a virtio-balloon device, although in
conjunction with a new machine type. Suppressing the balloon
should therefore be done at the QEMU driver level depending
on the present capabilities.
Therefore we remove the conditional suppression of the default
balloon in domain_conf.c.
Further, we are claiming the memballoon device for virtio-s390
during device address assignment to prevent it from being considered
as a PCI device.
Finally, we suppress the generation of the balloon command line option
if this is a virtio-s390 machine.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Should have been done in commit 56fd513 already, but was missed
due to oversight: qemuDomainSendKey didn't release the driver lock
in its cleanup section. This fixes an issue introduced by commit
8c5d2ba.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=892079
One of my previous patches (f2a4e5f176) tried to fix crashing
libvirtd on domain detroy. However, we need to copy pattern from
qemuProcessHandleMonitorEOF() instead of decrementing reference
counter. The rationale for this is, if qemu process is dying due
to domain being destroyed, we obtain EOF on both the monitor and
agent sockets. However, if the exit is expected, qemuProcessStop
is called, which cleans both agent and monitor sockets up. We
want qemuAgentClose() to be called iff the EOF is not expected,
so we don't leak an FD and memory. Moreover, there could be race
with qemuProcessHandleMonitorEOF() which could have already
closed the agent socket, in which case we don't want to do
anything.
Adds a "ram" attribute globally to the video.model element, that changes
the resulting qemu command line only if video.type == "qxl".
<video>
<model type='qxl' ram='65536' vram='65536' heads='1'/>
</video>
That attribute gets a default value of 64*1024. The schema is unchanged
for other video element types.
The resulting qemu command line change is the addition of
-global qxl-vga.ram_size=<ram>*1024
or
-global qxl.ram_size=<ram>*1024
For the main and secondary qxl devices respectively.
The default for the qxl ram bar is 64*1024 kilobytes (the same as the
default qxl vram bar size).
This avoids "Event negative_returns: A negative constant "-1" is passed as
an argument to a parameter that cannot be negative.". The called function
uses -1 to determine whether it needs to traverse all the hostdevs.
The snapshot name is used to create path to the definition save file.
When the name contains slashes the creation of the file fails. Reject
such names.
When the snapshot definition can't be saved, the
qemuDomainSnapshotCreate function succeeded without filling some of the
fields in the internal definition.
This patch removes the snapshot and returns failure if the XML file
cannot be written.
When running virDomainDestroy, we need to make sure that no other
background thread cleans up the domain while we're doing our work.
This can happen if we release the domain object while in the
middle of work, because the monitor might detect EOF in this window.
For this reason we have a 'beingDestroyed' flag to stop the monitor
from doing its normal cleanup. Unfortunately this flag was only
being used to protect qemuDomainBeginJob, and not qemuProcessKill
This left open a race condition where either libvirtd could crash,
or alternatively report bogus error messages about the domain already
having been destroyed to the caller
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The driver mutex was unlocked in qemuDomainModifyDeviceFlags before
entering qemuDomainObjBeginJobWithDriver where it will be unlocked once
more leaving it in an undefined state. The result was that two
threads were simultaneously looking up the domain hash table during
multiple parallel device attach/detach operations.
Luckily this triggered a virHashIterationError.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
The QEMU driver default max port is 65535, but it then increments
this by 1 to 65536. This maps to 0 in an unsigned short :-( This
was apparently done so that for() loops could use "< max" instead
of "<= max". Remove this insanity and just make the loop do the
right thing.
In commit c4bbaaf8, caps->arch was checked uninitialized, rendering the
whole check useless.
This patch moves the conditional setting of QEMU_CAPS_NO_ACPI to
qemuCapsInitQMP, and removes the no longer needed exception for S390.
It also clears the flag for all non-x86 archs instead of just S390 in
qemuCapsInitHelp.
The virDomainObj, qemuAgent, qemuMonitor, lxcMonitor classes
all require a mutex, so can be switched to use virObjectLockable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After live change of cpu counts, the number of processor threads is
verified. This patch makes use of this approach to check if qemu ignored
the request for cpu hot-unplug and report an appropriate message.
Currently all classes must directly inherit from virObject.
This allows for arbitrarily deep hierarchy. There's not much
to this aside from chaining up the 'dispose' handlers from
each class & providing APIs to check types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Pass stub driver name directly to pciDettachDevice and pciReAttachDevice to fit
for different libvirt drivers. For example, qemu driver prefers pci-stub, but
Xen prefers pciback.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Add an optional 'type' attribute to <target> element of serial port
device. There are two choices for its value, 'isa-serial' and
'usb-serial'. For backward compatibility, when attribute 'type' is
missing the 'isa-serial' will be chosen as before.
Libvirt XML sample
<serial type='pty'>
<target type='usb-serial' port='0'/>
<address type='usb' bus='0' port='1'/>
</serial>
qemu commandline:
qemu ${other_vm_args} \
-chardev pty,id=charserial0 \
-device usb-serial,chardev=charserial0,id=serial0,bus=usb.0,port=1
https://bugzilla.redhat.com/show_bug.cgi?id=892079
With current code, if user calls virDomainPMSuspendForDuration()
followed by virDomainDestroy(), the former API checks for qemu agent
presence, which will evaluate as true (if agent is configured). While
talking to qemu agent, the qemu driver is unlocked, so the latter API
starts executing. However, if machine dies meanwhile, libvirtd gets
EOF on the agent socket and qemuProcessHandleAgentEOF() is called. The
handler clears reference to qemu agent while the destroy API already
holding a reference to it. This leads to NULL dereferencing later in
the code. Therefore, the agent pointer should be set to NULL only if
we are the exclusive owner of it.
While OOM can have knock-on effects that trash a system, generally
the first symptom is one of memory thrashing.
* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.
Perform all the appropriate plumbing.
When qemu/KVM VMs are paused manually through a monitor not-owned by libvirt,
libvirt will think of them as "paused" event after they are resumed and
effectively running. With this patch the discrepancy goes away.
This is meant to address bug 892791.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on domain definition and magic
equation and set it upon the domain startup. The rationale behind was,
if there's a memory leak or exploit in qemu, we should prevent the
host system trashing. However, the equation was too tightening, as it
didn't reflect what the kernel counts into the memory used by a
process. Since many hosts do have a swap, nobody hasn't noticed
anything, because if hard memory limit is reached, process can
continue allocating memory on a swap. However, if there is no swap on
the host, the process gets killed by OOM killer. In our case, the qemu
process it is.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel way of accounting the memory
for process. That is, even the kernel caches are counted within the
memory used by a process (within cgroups at least). Hence the magic
equation has to be changed:
limit = 1.5 * (domain memory + total video memory) + (32MB for cache
per each disk) + 200MB
This is the QEMU backend code for the SCLP console support.
It includes SCLP capability detection, QEMU command line generation
and a test case.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Since we daemonized QEMU for capabilities probing there is a long
time if QEMU fails to launch. This is because we're not passing in
any virDomainObjPtr instance and thus the monitor code can not
check to see if the PID is still alive.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current code is initializing capabilities before setting
directory permissions. Thus the QEMU binaries being run may
not have the ability to create the UNIX monitor socket on
the first run of libvirtd.
This prevents domain starting and disk attaching if the shared disk's
setting conflicts with other active domain(s), E.g. A domain with
"sgio" set as "filtered", however, another active domain is using
it set as "unfiltered".
This introduces a hash table for qemu driver, to store the shared
disk's info as (@major:minor, @ref_count). @ref_count is the number
of domains which shares the disk.
Since we only care about if the disk support unprivileged SG_IO
commands, and the SG_IO commands only make sense for block disk,
this patch only manages (add/remove hash entry) the shared disk for
block disk.
* src/qemu/qemu_conf.h: (Add member 'sharedDisks' of type
virHashTablePtr; Declare helpers
qemuGetSharedDiskKey, qemuAddSharedDisk
and qemuRemoveSharedDisk)
* src/qemu/qemu_conf.c (Implement the 3 helpers)
* src/qemu/qemu_process.c (Update 'sharedDisks' when domain
starting and shutdown)
* src/qemu/qemu_driver.c (Update 'sharedDisks' when attaching
or detaching disk).
When the disk alignment check done while redefining an existing snapshot
failed, the qemu driver attempted to free the existing snapshot. As in
the cleanup path the definition of the snapshot wasn't assigned, the
cleanup code dereferenced a NULL pointer.
This patch changes the behavior on error paths while redefining snapshot
in two ways:
1) On failure, modifications done on the snapshot definition object are
rolled back.
2) The previous definition of the data isn't freed until it's certain it
won't be needed any more.
This change avoids the segfault and additionally the snapshot doesn't
vanish if redefinition fails for some reason.
This also changes the function signature to take a
virDomainChrSourceDefPtr instead of just a path, since it needs to
differentiate behavior based on source->type.
The functionality provided in virchrdev.c (previously virconsole.c) is
applicable to other types of character devices besides consoles, such
as channels. This patch is just code motion, renaming things such as
"console" or "pty", instead using more general terms such as
"character device" or "device path".
Since 4c993d8a we failed to set this important capability, which
allows starting a domain with QXL video card. We set DEVICE_QXL
capability bit instead, which is not necessary wrong. Anyway, if
qemu supports the new '-device qxl' it supports older '-vga qxl'
as well. The latter is used for the primary (the first) qxl video
card, the former for other video cards.
Commit b3f2b4ca5c left buf unallocated in
the case of QMP capability probing being used, leading to a segfault in
strlen in the cleanup path.
This patch opens the log and allocates the buffer if QMP probing was
used, so we can display the helpful error message.
Despite our great effort we still parsed qemu log output.
We wouldn't notice unless upcoming qemu 1.4 changed the
format of the logs slightly. Anyway, now we should gather
all interesting knobs like pty paths from monitor. Moreover,
since for historical reasons the first console can be just
an alias to the first serial port, we need to check this and
copy the pty path if that's the case to the first console.
This reverts commit 28224c4d2a
which shouldn't be needed at all because with current qemu
we obtain all paths from 'query-chardev' output. We ought
not parse log output at all anymore.
Since 586502189edf9fd0f89a83de96717a2ea826fdb0 qemu commit, the log
lines reporting chardev's path has changed from:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device redirected to /dev/pts/5
char device redirected to /dev/pts/6
char device redirected to /dev/pts/7
to:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device compat_monitor0 redirected to /dev/pts/5
char device serial0 redirected to /dev/pts/6
char device serial1 redirected to /dev/pts/7
However, with current code we are not prepared for such change, which
results in us being unable to start any domain.
Many internal qemu APIs must find domain object from passed
virDomainPtr. And with function Peter's introduced, we can use it
instead of copying multiple lines among code.
Since we switched to QMP probing, the object types are spelled out
explicitly, i.e. virtio-net-pci. This has effectively disabled
the capability detection of s390 virtio devices. The trivial fix
is to add the s390 virtio types explicitly to qemuCapsObjectProps.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=888426
The code for doing a block-copy was supposed to track the destination
file in drive->mirror, but was set up to do all mallocs prior to
starting the copy so that OOM wouldn't leave things partially started.
However, the wrong variable was being written; later in the code we
silently did 'disk->mirror = mirror' which was still NULL, and thus
leaking memory and leaving libvirt to think that the mirror job was
never started, which prevented a pivot operation after a copy.
Problem introduced in commit 35c7701c6.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Initialize correct
variable.
To bring in line with new naming practice, rename the=
src/util/cgroup.{h,c} files to vircgroup.{h,c}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, it only considers PTY backend serial devices for pseries.
It need to support all kinds of serial devices.
This patch is to fix the problem which is that it doesn't work
when specifying source type as file.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
ACPI is only supported on x86 platform, PPC can't support it.
So QEMU_CAPS_NO_ACPI shouldn't be set.
This patch is to remove QEMU_CAPS_NO_ACPI capability for
non-x86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Historically there was an inconsistency in handling of the
itanium arch. The xen driver & CPU model code treated it
as 'ia64' but the QEMU capabilities code used 'itanium'. On
the grounds that no one has ever seriously used itanium
with QEMU, while RHEL shipped itanium with Xen, we should
favour 'ia64' as the canonical format
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of datatype change
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When LXC labels USB devices during hotplug, it is running in
host context, so it needs to pass in a vroot path to the
container root.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
'-device VGA' maps to '-vga std'
'-device cirrus-vga' maps to '-vga cirrus'
'-device qxl-vga' maps to '-vga qxl'
(there is also '-device qxl' for secondary devices)
'-device vmware-svga' maps to '-vga vmware'
For qemu(>=1.2), we can use -device to replace -vga for video
device. For the primary video device, the patch tries to use 0x2
slot for matching old qemu. If the 0x2 slot is allocated already,
the addr property could help for using any available slot.
For qemu(< 1.2), we keep using -vga for primary device.
QEMU_CAPS_DEVICE_QXL -device qxl
QEMU_CAPS_DEVICE_VGA -device VGA
QEMU_CAPS_DEVICE_CIRRUS_VGA -device cirrus-vga
QEMU_CAPS_DEVICE_VMWARE_SVGA -device vmware-svga
QEMU_CAPS_DEVICE_VIDEO_PRIMARY /* safe to use -device XXX
for primary video device */
Fix a typo in qemuCapsObjectTypes, the string 'qxl' here
should be -device qxl rather than -vga [...|qxl|..]
Noticed these while building on FreeBSD.
* src/qemu/qemu_monitor.c (qemuMonitorBlockInfoLookup): Rename
variable to avoid 'devname' collision.
* src/qemu/qemu_driver.c (qemuDomainInterfaceStats): Mark unused
variable.
When a network device's bridge connection is changed by
virDomainUpdateDevice, libvirt first removes the netdev's tap from its
old bridge, then adds it to the new bridge. Sometimes, due to a
network being destroyed while a guest device is still attached, the
tap may already be "removed" from the old bridge (or the old bridge
may not even exist any more); the existing code was needlessly failing
the update when this happened, making it impossible to recover from
the situation without completely detaching (i.e. removing) the netdev
from the guest and re-attaching.
Instead of failing the entire operation when removal of the tap from
the old bridge fails, this patch changes qemuDomainChangeNetBridge to
just log a warning and continue, allowing a reasonable recover from
the situation.
(you'll appreciate this change if you ever accidentally destroy a
network while your guests are still using it).
Refactor virLockManagerPluginNew() so that the caller does
not need to pass in the config file path itself - just the
config directory and driver name.
Fix QEMU to actually pass in a config file when creating the
default lock manager plugin, rather than NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
* Autotools changes:
- Don't assume Qemu is Linux-only
- Check Linux headers only on Linux
- Disable firewalld on FreeBSD
* Initctl:
Initctl seem to present only on Linux, so stub it on other platforms
* Raw I/O: Linux-only as well
* Headers cleanup
This patch adds a new domain lookup helper qemuDomObjFromDomainDriver
that lookups the domain and leaves the driver locked. The driver is
returned as the second argument of that function. If the lookup fails
the driver is unlocked to help avoid cleanup codepaths.
This patch also improves docs for the helpers.
When a qemu domain is backed by huge pages, apparmor needs to grant the domain
rw access to files under the hugetlbfs mount point. Add a hook, called in
qemu_process.c, which ends up adding the read-write access through
virt-aa-helper. Qemu will be creating a randomly named file under the
mountpoint and unlinking it as soon as it has mmap()d it, therefore we
cannot predict the full pathname, but for the same reason it is generally
safe to provide access to $path/**.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This patch exports qemuMigrationIsAllowed and adds a new parameter to it
to denote if it's a remote migration or a local migration. Local
migrations are used in snapshots and saving of the machine state and
have fewer restrictions. This patch also adjusts callers of the function
and tweaks some error messages to be more universal.
Currently there is no way to detect it via QMP and requesting "-sandbox
off" works correctly even if it was compiled out, so this will work
unless someone both requests the sandbox in qemu.conf and builds QEMU
without the support for it.
These classes can borrow unused bandwidth. Basically,
only egress qdsics can have classes, therefore we can
do this kind of traffic shaping only on host's outgoing,
that is domain's incoming traffic.
When the disk snapshot part of an external system checkpoint fails the
memory image is retained. This patch adds code to remove the image in
such case.
In case the snapshot code isn't able to restart CPUs after an external
checkpoint we would leak a copy of the domains XML definition. This
patch fixes the cleanup path.
False positive, but it breaks the build with gcc-4.6.3.
qemu/qemu_migration.c:2931:37: error: 'offline' may be used
uninitialized in this function [-Werror=uninitialized]
qemu/qemu_migration.c:2887:10: note: 'offline' was declared here
When restarting CPUs after an external snapshot, the restarting function
was called without the appropriate async job type. This caused that a
new sync job wasn't created and allowed races in the monitor.
Offline migration transfers inactive definition of a domain (which may
or may not be active). After successful completion, the domain remains
in its current state on source host and is defined but inactive on
destination host. It's a bit more clever than virDomainGetXMLDesc() on
source host followed by virDomainDefineXML() on destination host, as
offline migration will run pre-migration hook to update the domain XML
on destination host. Currently, copying non-shared storage is not
supported during offline migration.
Offline migration can be requested with a new migration flag called
VIR_MIGRATE_OFFLINE (which has to be combined with
VIR_MIGRATE_PERSIST_DEST flag).
This fixes a problem that showed up during testing of:
https://bugzilla.redhat.com/show_bug.cgi?id=881480
Due to a logic error in the function that gets the name of the bridge
an interface connects to, any time a bridge was specified directly
(type='bridge') rather than indirectly (type='network'), An error
would be logged (although the operation would then complete
successfully):
Network type 6 is not supported
The final virReportError() in the function
qemuDomainNetGetBridgeName() was apparently avoided in the past with a
"goto cleanup" at the end of each case, but the case of bridge somehow
no longer has that final goto cleanup.
The proper solution is anyway to not rely on goto's, but put the error
log inside an else {} clause, so that it's executed only if the type
is neither bridge nor network (in reality, this function should only
ever be called for those two types, that's why this is an internal
error).
While making this change, the error message was also tuned to be more
correct (since it's not really the type of the network, but the type
of the interface, and it *is* otherwise supported, it's just that the
interface type in question doesn't *have* a bridge device associated
with it, or at least we don't know how to get it).
If a network interface model is not specified, libvirt will run
into an unchecked NULL pointer coredump. On the other hand if
the empty model is ignored, a PCI bus address would be generated,
which is not supported by S390.
Since the only valid network type model for S390 is virtio,
we use this as the default value, which is the same for QEMU.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Things are supposed to look like:
<machine canonical='pc-0.12'>pc</machine>
But are currently swapped. This can cause many VMs to revert to having
machine type='pc' which will affect save/restore across qemu upgrades.
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
Unmanaged PCI devices were only leaked if pciDeviceListAdd failed but
managed devices were always leaked. And leaking PCI device is likely to
leave PCI config file descriptor open. This patch fixes
qemuReattachPciDevice to either free the PCI device or add it to the
inactivePciHostdevs list.
An attempt to attach device that is already attached to a domain results
in the following error:
virsh # attach-device rhel6 pci2 --persistent
error: Failed to attach device from pci2
error: invalid argument: device is already in the domain configuration
The "invalid argument" error code looks wrong, we usually use "operation
invalid" when the action cannot be done in current state.
Only one error in qemu_monitor was already using the relatively
new OPERATION_UNSUPPORTED error, even though it is a better fit
for all of the messages related to options that are unsupported
due to the version of qemu in use rather than due to a user's
XML or .conf file choice. Suggested by Osier Yang.
* src/qemu/qemu_monitor.c (qemuMonitorSendFileHandle)
(qemuMonitorAddHostNetwork, qemuMonitorRemoveHostNetwork)
(qemuMonitorAttachDrive, qemuMonitorDiskSnapshot)
(qemuMonitorDriveMirror, qemuMonitorTransaction)
(qemuMonitorBlockCommit, qemuMonitorDrivePivot)
(qemuMonitorBlockJob, qemuMonitorSystemWakeup)
(qemuMonitorGetVersion, qemuMonitorGetMachines)
(qemuMonitorGetCPUDefinitions, qemuMonitorGetCommands)
(qemuMonitorGetEvents, qemuMonitorGetKVMState)
(qemuMonitorGetObjectTypes, qemuMonitorGetObjectProps)
(qemuMonitorGetTargetArch): Use better error category.
Without this patch, attempts to create a disk snapshot when qemu
is too old results in a cryptic message:
virsh # snapshot-create 23 --disk-only
error: operation failed: Failed to take snapshot: unknown command: 'snapshot_blkdev'
Now it reports:
virsh # snapshot-create 23 --disk-only
error: unsupported configuration: live disk snapshot not supported with this QEMU binary
All versions of qemu that support live disk snapshot also support
QMP (basically upstream qemu 1.1 and later, and backports to RHEL 6.2).
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DISK_SNAPSHOT): New
capability.
* src/qemu/qemu_capabilities.c (qemuCaps): Track it.
(qemuCapsProbeQMPCommands): Set it.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive): Use
it.
* src/qemu/qemu_monitor.c (qemuMonitorDiskSnapshot): Simplify.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskSnapshot):
Likewise.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextDiskSnapshot):
Delete.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextDiskSnapshot):
Likewise.
Currently to deal with auto-shutdown libvirtd must periodically
poll all stateful drivers. Thus sucks because it requires
acquiring both the driver lock and locks on every single virtual
machine. Instead pass in a "inhibit" callback to virStateInitialize
which drivers can invoke whenever they want to inhibit shutdown
due to existance of active VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since we can't (currently) rely on the ability to provide blanket
support for all possible network changes by calling the toplevel
netdev hostside disconnect/connect functions (due to qemu only
supporting a lockstep between initialization of host side and guest
side of devices), in order to support live change of an interface's
nwfilter we need to make a special purpose function to only call the
nwfilter teardown and setup functions if the filter for an interface
(or its parameters) changes. The pattern is nearly identical to that
used to change the bridge that an interface is connected to.
This patch was inspired by a request from Guido Winkelmann
<guido@sagersystems.de>, who tested an earlier version.
The fact that only the guest agent, or ACPI flag can be used
when requesting reboot/shutdown is merely a limitation of the
QEMU driver impl at this time. Thus it should not be in
libvirt.c code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The default machine type must be stored in the first element of
the caps->machineTypes array. This was done for help output
parsing but not for QMP probing.
Added a helper function qemuSetDefaultMachine to apply the same
fix up for both probing methods.
Further, it was necessary to set caps->nmachineTypes after QMP
probing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=872292
Libvirt should not attempt to call a QMP command that has not been
documented in qemu.git - if future qemu introduces a command by the
same name but with subtly different semantics, then libvirt will be
broken when trying to use that command.
We also had some code that could never be reached - some of our
commands have an alternate for new vs. old qemu HMP commands; but
if we are new enough to support QMP, we only need a fallback to
the new HMP counterpart, and don't need to try for a QMP counterpart
for the old HMP version.
See also this attempt to convert the three snapshot commands to QMP:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01597.html
although it looks like that will still not happen before qemu 1.3.
That thread eventually decided that qemu would use the name
'save-vm' rather than 'savevm', which mitigates the fact that
libvirt's attempt to use a QMP 'savevm' would be broken, but we
might not be as lucky on the other commands.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONSetCPU)
(qemuMonitorJSONAddDrive, qemuMonitorJSONDriveDel)
(qemuMonitorJSONCreateSnapshot, qemuMonitorJSONLoadSnapshot)
(qemuMonitorJSONDeleteSnapshot): Use only HMP fallback for now.
(qemuMonitorJSONAddHostNetwork, qemuMonitorJSONRemoveHostNetwork)
(qemuMonitorJSONAttachDrive, qemuMonitorJSONGetGuestDriveAddress):
Delete; QMP implies QEMU_CAPS_DEVICE, which prefers AddNetdev,
RemoveNetdev, and AddDrive anyways (qemu_hotplug.c has all callers).
* src/qemu/qemu_monitor.c (qemuMonitorAddHostNetwork)
(qemuMonitorRemoveHostNetwork, qemuMonitorAttachDrive): Reflect
deleted commands.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONAddHostNetwork)
(qemuMonitorJSONRemoveHostNetwork, qemuMonitorJSONAttachDrive):
Likewise.
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
Commit 1b2ebf95 overlooked that there were two spots affected.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice):
Transfer backing chain before deletion.
* src/qemu/qemu_driver.c (qemuDomainDetachDeviceDiskLive): Fix
spacing (partly to ensure a different-looking patch).
This patch adds two labels and gets rid of a ton of duplicated code.
This patch also fixes some error message and switches most of them to
proper error reporting functions.
This patch adds macros to help retrieve configuration values from qemu
driver's configuration. Some configuration options are grouped
together in the process.
The virStateInitialize method and several cgroups methods were
using an 'int privileged' parameter or similar for dual-state
values. These are better represented with the bool type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As of 1a50ba2cb0 we fail to connect to the
monitor instead of getting an exit status != 0 from qemu itself. This
breaks capabilities probing for the non QMP case.
Replace the following names
* struct qemu_snap_remove with virQEMUSnapRemovePtr
* struct qemu_snap_reparent with virQEMUSnapReparentPtr
* struct qemu_save_header with virQEMUSaveHeaderPtr
* enum qemu_save_formats with virQEMUSaveFormat
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Remove the obsolete 'qemud' naming prefix and underscore
based type name. Introduce virQEMUDriverPtr as the replacement,
in common with LXC driver naming style
Throughout the code, we've always used VIR_DOMAIN_SHUTDOWN* flags
even for virDomainReboot() API and its implementation. Fortunately,
the appropriate macros has the same value. But if we want to keep
things consistent, we should be using the correct macros. This
patch doesn't break anything, luckily.
Error messages produced while dispatching guest agent commands didn't
have an apparent reference to the fact that they are dealing with guest
agent commands. This patch fixes up some of the messages to contain that
reference.
using qemu guest agent. As said in previous patch,
@mountPoint must be NULL and @flags zero because
qemu guest agent doesn't support these arguments
yet. If qemu learns them, we can start supporting
them as well.
With QMP capability probing, the version was not set.
virsh version returns:
...
Cannot extract running QEMU hypervisor version
This is fixed by computing caps->version from QMP major,
minor, micro values.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
QMP Capability probing will fail if QEMU cannot bind to the
QMP monitor socket in the qemu_driver->libDir directory.
That's because the child process is stripped of all
capabilities and this directory is chown'ed to the configured
QEMU user/group (normally qemu:qemu) by the QEMU driver.
To prevent this from happening, the driver startup will now pass
the QEMU uid and gid down to the capability probing code.
All capability probing invocations of QEMU will be run with
the configured QEMU uid instead of libvirtd's.
Furter, the pid file handling is moved to libvirt, as QEMU
cannot write to the qemu_driver->runDir (root:root). This also
means that the libvirt daemonizing must be used.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
If qemuMonitorOpenUnix is called without a related pid, i.e. for
QMP probing, a connect failure can happen as the result of a race.
Without a pid there is no retry and thus we give up too early.
This changes the code to retry if no pid is supplied.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Change some legacy function names to use 'qemu' as their
prefix instead of 'qemud' which was a hang over from when
the QEMU driver ran inside a separate daemon
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
The actual regression is due to a latent bug. The hot unplug code
was computing the set of files needing cgroup ACL revocation based
on the XML passed in by the user, rather than based on the domain's
details on which disk was being deleted. As long as the revoke
path was always recomputing the backing chain, this didn't really
matter; but now that we want to compute the chain exactly once and
remember that computation, we need to hang on to the backing chain
until after the revoke has happened.
* src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice):
Transfer backing chain before deletion.
This patch introduces the RNG schema and updates necessary data strucutures
to allow various hypervisors to make use of Gluster protocol as one of the
supported network disk backend. Next patch will add support to make use of
this feature in Qemu since it now supports Gluster protocol as one of the
network based storage backend.
Two new optional attributes for <host> element are introduced - 'transport'
and 'socket'. Valid transport values are tcp, unix or rdma. If none specified,
tcp is assumed. If transport is unix, socket specifies path to unix socket.
This patch allows users to specify disks on gluster backends like this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume1/image'>
<host name='example.org' port='6000' transport='tcp'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume2/image'>
<host transport='unix' socket='/path/to/sock'/>
</source>
<target dev='vdb' bus='virtio'/>
</disk>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Although we require various C99 features, we don't yet require a
complete C99 compiler. On RHEL 5, compilation complained:
qemu/qemu_command.c: In function 'qemuBuildGraphicsCommandLine':
qemu/qemu_command.c:4688: error: 'for' loop initial declaration used outside C99 mode
* src/qemu/qemu_command.c (qemuBuildGraphicsCommandLine): Declare
variable sooner.
* src/qemu/qemu_process.c (qemuProcessInitPasswords): Likewise.
The error "... but the cause is unknown" appeared for XMLs similar to
this:
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/dev/zero'/>
<target dev='sr0'/>
</disk>
Notice unsupported disk type (for the driver), but also no address
specified. The first part is not a problem and we should not abort
immediately because of that, but the combination with the address
unknown was causing an unspecified error.
While fixing this, I added an error to one place where this return
value was not managed properly.
I have been testing libvirt v1.0.0 for deployment within my
organization, and in the process discovered what appears to be a bug
that breaks virsh attach-device, when attaching an RBD volume to an
instance. First, here is the error presented, with v1.0.0 (this worked
in v0.10.2):
[root@host ~]# virsh attach-device W5APQ8 G84VV1.xml
error: Failed to attach device from G84VV1.xml
error: cannot open file 'dc3-1-test/G84VV1': No such file or directory
Using git bisect, I narrowed the problem down to this as the first
commit to break this setup:
4d34c92947 is the first bad commit
Upcoming patches for revert-and-clone branching of snapshots need
to be able to copy a domain definition; make this step reusable.
* src/conf/domain_conf.h (virDomainDefCopy): New prototype.
* src/conf/domain_conf.c (virDomainObjCopyPersistentDef): Split...
(virDomainDefCopy): ...into new function.
(virDomainObjSetDefTransient): Use it.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Use it.
This patch adds a helper to determine if snapshots are external and uses
the helper to fix detection of those in snapshot deletion code.
Snapshots are external if they have an external memory image or if the
disk locations are external. As mixed snapshots are forbidden for now
we need to check just one disk to know.
Currently, if user calls virDomainAbortJob we just issue
'migrate_cancel' and hope for the best. However, if user calls
the API in wrong phase when migration hasn't been started yet
(perform phase) the cancel request is just ignored. With this
patch, the request is remembered and as soon as perform phase
starts, migration is cancelled.
For S390, the default console target type cannot be of type 'serial'.
It is necessary to at least interpret the 'arch' attribute
value of the os/type element to produce the correct default type.
Therefore we need to extend the signature of defaultConsoleTargetType
to account for architecture. As a consequence all the drivers
supporting this capability function must be updated.
Despite the amount of changed files, the only change in behavior is
that for S390 the default console target type will be 'virtio'.
N.B.: A more future-proof approach could be to to use hypervisor
specific capabilities to determine the best possible console type.
For instance one could add an opaque private data pointer to the
virCaps structure (in case of QEMU to hold capsCache) which could
then be passed to the defaultConsoleTargetType callback to determine
the console target type.
Seems to be however a bit overengineered for the use case...
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When the libvirt daemon is restarted it tries to reconnect to running
qemu domains. Since commit d38897a5d4 the
re-connection code runs in separate threads. In the original
implementation the maximum of domain ID's (that is used as an
initializer for numbering guests created next) while libvirt was
reconnecting to the guest.
With the threaded implementation this opens a possibility for race
conditions with the thread that is autostarting guests. When there's a
guest running with id 1 and the daemon is restarted. The autostart code
is reached first and spawns the first guest that should be autostarted
as id 1. This results into the following unwanted situation:
# virsh list
Id Name State
----------------------------------------------------
1 guest1 running
1 guest2 running
This patch extracts the detection code before the re-connection threads
are started so that the maximum id of the guests being reconnected to is
known.
The only semantic change created by this is if the guest with greatest ID
quits before we are able to reconnect it's ID is used anyway as the
greatest one as without this patch the greatest ID of a process we could
successfuly reconnect to would be used.
This patch adds support for external disk snapshots of inactive domains.
The snapshot is created by calling using qemu-img by calling:
qemu-img create -f format_of_snapshot -o
backing_file=/path/to/src,backing_fmt=format_of_backing_image
/path/to/snapshot
in case the backing image format is known or probing is allowed and
otherwise:
qemu-img create -f format_of_snapshot -o backing_file=/path/to/src
/path/to/snapshot
on each of the disks selected for snapshotting. This patch also modifies
the snapshot preparing function to support creating external snapshots
and to sanitize arguments. For now the user isn't able to mix external
and internal snapshots but this restriction might be lifted in the
future.
Some operations, APIs needs domain to be paused prior operation can be
performed, e.g. (managed-) save of a domain. The processors should be
restored in the end. However, if 'cont' fails for some reason, we log a
message but this is not sufficient as an event should be emitted as
well. Mgmt application can then decide what to do.
The code that was split out into the qemuDomainSaveMemory expands the
pointer containing the XML description of the domain that it gets from
higher layers. If the pointer changes the old one is invalid and the
upper layer function tries to free it causing an abort.
This patch changes the expansion of the original string to a new
allocation and copy of the contents.
qemu is sensitive to the order of arguments passed. Hence, if a
device requires a controller, the controller cmd string must
precede device cmd string. The same apply for controllers, when
for instance ccid controller requires usb controller. So
controllers create partial ordering in which they should be added
to qemu cmd line.
Some of the pre-snapshot check have restrictions wired in regarding
configuration options that influence taking of external checkpoints.
This patch removes restrictions that would inhibit taking of such a
snapshot.
This patch adds support to take external system checkpoints.
The functionality is layered on top of the previous disk-only snapshot
code. When the checkpoint is requested the domain memory is saved to the
memory image file using migration to file. (The user may specify to
take the memory image while the guest is live with the
VIR_DOMAIN_SNAPSHOT_CREATE_LIVE flag.)
The memory save image shares format with the image created by
virDomainSave() API.
Before now, libvirt supported only internal snapshots for active guests.
This patch renames this function to qemuDomainSnapshotCreateActiveInternal
to prepare the grounds for external active snapshots.
The new external system checkpoints will require an async job while the
snapshot is taken. This patch adds QEMU_ASYNC_JOB_SNAPSHOT to track this
job type.
The code that saves domain memory by migration to file can be reused
while doing external checkpoints of a machine. This patch extracts the
common code and places it in a separate function.
When pausing the guest while migration is running (to speed up
convergence) the virDomainSuspend API checks if the migration job is
active before entering the job. This could cause a possible race if the
virDomainSuspend is called while the job is active but ends before the
Suspend API enters the job (this would require that the migration is
aborted). This would cause a incorrect event to be emitted.
Both system checkpoint snapshots and disk snapshots were iterating
over all disks, doing a final sanity check before doing any work.
But since future patches will allow offline snapshots to be either
external or internal, it makes sense to share the pass over all
disks, and then relax restrictions in that pass as new modes are
implemented. Future patches can then handle external disks when
the domain is offline, then handle offline --disk-snapshot, and
finally, combine with migration to file to gain a complete external
system checkpoint snapshot of an active domain without using 'savevm'.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotIsAllowed): Merge...
(qemuDomainSnapshotPrepare): ...into one function.
(qemuDomainSnapshotCreateXML): Update caller.
Now that the XML supports listing internal snapshots, it is worth
always populating the <memory> and <disks> element to match.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Always
parse disk info and set memory info.
The libvirt coding standard is to use 'function(...args...)'
instead of 'function (...args...)'. A non-trivial number of
places did not follow this rule and are fixed in this patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=871273
when using virsh qemu-attach to attach an existing qemu process,
if it misses the -M option in qemu command line, libvirtd crashed
because the NULL value of def->os.machine in later use.
Example:
/usr/libexec/qemu-kvm -name foo \
-cdrom /var/lib/libvirt/images/boot.img \
-monitor unix:/tmp/demo,server,nowait \
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
This patch tries to set default machine type if the value of
def->os.machine is still NULL after qemu command line parsing.
Per the code comment in qemuCapsInitQMPBasic() and commit 43e23c7, we
should only use QMP for capabilities probing starting with 1.2 and
newer. The old code had dead logic that probed on 1.0 and newer.
Signed-off-by: Eric Blake <eblake@redhat.com>
The string comparison logic was inverted and matched the first drive
that does *not* have the name we search for.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The QEMU -drive id= begins with libvirt's QEMU host drive prefix
("drive-"), which is stripped off in several places two convert between
host ("-drive") and guest ("-device") device names.
In the case of BlkIoTune it is unnecessary to strip the QEMU host drive
prefix because we operate on "info block"/"query-block" output that uses
host drive names.
Stripping the prefix incorrectly caused string comparisons to fail since
we were comparing the guest device name against the host device name.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
QEMU uses 'i386' for its 32-bit x86 architecture, but libvirt
wants that to be 'i686', so we must fix it up
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=871756
Commit cd1e8d1 assumed that systems new enough to have journald
also have mkostemp; but this is not true for uclibc.
For that matter, use of mkstemp[s] is unsafe in a multi-threaded
program. We should prefer mkostemp[s] in the first place.
* bootstrap.conf (gnulib_modules): Add mkostemp, mkostemps; drop
mkstemp and mkstemps.
* cfg.mk (sc_prohibit_mkstemp): New syntax check.
* tools/virsh.c (vshEditWriteToTempFile): Adjust caller.
* src/qemu/qemu_driver.c (qemuDomainScreenshot)
(qemudDomainMemoryPeek): Likewise.
* src/secret/secret_driver.c (replaceFile): Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainScreenshot): Likewise.
https://bugzilla.redhat.com/show_bug.cgi?id=871312
Recent fixes made almost all the right steps to make emulator pinned
to the cpuset of the whole domain in case <emulatorpin> isn't
specified, but qemudDomainGetEmulatorPinInfo still reports all the
CPUs even when cpuset is specified. This patch fixes that.
When there is no 'qemu-kvm' binary and the emulator used for a machine
is, for example, 'qemu-system-x86_64' that, by default, runs without
kvm enabled, libvirt still supplies '-no-kvm' option to this process,
even though it does not recognize such option (making the start of a
domain fail in that case).
This patch fixes building a command-line for QEMU machines without KVM
acceleration and is based on following assumptions:
- QEMU_CAPS_KVM flag means that QEMU is running KVM accelerated
machines by default (without explicitly requesting that using a
command-line option). It is the closest to the truth according to
the code with the only exception being the comment next to the
flag, so it's fixed in this patch as well.
- QEMU_CAPS_ENABLE_KVM flag means that QEMU is, by default, running
without KVM acceleration and in case we need KVM acceleration it
needs to be explicitly instructed to do so. This is partially
true for the past (this option essentially means that QEMU
recognizes the '-enable-kvm' option, even though it's almost the
same).
Currently, we use iohelper when saving/restoring a domain.
However, if there's some kind of error (like I/O) it is not
propagated to libvirt. Since it is not qemu who is doing
the actual write() it will not get error. The iohelper does.
Therefore we should check for iohelper errors as it makes
libvirt more user friendly.
In the XML warning, we print a virsh command line that can be used to
edit that XML. This patch prints UUIDs if the entity name contains
special characters (like shell metacharacters, or "--" that would break
parsing of the XML comment). If the entity doesn't have a UUID, just
print the virsh command that can be used to edit it.
When using block copy to pivot over to a new chain, the backing files
for the new chain might still need labeling (particularly if the user
passes --reuse-ext with a relative backing file name). Relabeling a
file that is already labeled won't hurt, so this just labels the entire
chain at the point of the pivot. Doing the relabel of the chain uses
the fact that we already safely probed the file type of an external
file at the start of the block copy.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Relabel chain before
asking qemu to pivot.
Use the recent addition of qemuDomainPrepareDiskChainElement to
obtain locking manager lease, permit a block device through cgroups,
and set the SELinux label; then audit the fact that we hand a new
file over to qemu. Alas, releasing the lease and label at the end
of the mirroring is a trickier prospect (we would have to trace the
backing chain of both source and destination, and be sure not to
revoke rights to any part of the chain that is shared), so for now,
virDomainBlockJobAbort still leaves things with additional access
granted (as block-pull and block-commit have the same problem of
not clamping access after completion, a future cleanup would cover
all three commands).
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling.
Support the REUSE_EXT flag, in part by copying sanity checks from
snapshot code. This code introduces a case of probing an external
file for its type; such an action would be a security risk if the
existing file is supposed to be raw but the contents resemble some
other format; however, since the virDomainBlockRebase API has a
flag to force treating the file as raw rather than probe, we can
assume that probing is safe in all other instances. Besides, if
we don't probe or force raw, then qemu will.
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT
flag.
(qemuDomainBlockCopy): Wire up flag, and add some sanity checks.
Minimal patch to wire up all the pieces in the previous patches
to actually enable a block copy job. By minimal, I mean that
qemu creates the file (that is, no REUSE_EXT flag support yet),
SELinux must be disabled, a lock manager is not informed, and the
audit logs aren't updated. But those will be added as
improvements in future patches.
This patch is designed so that if we ever add a future API
virDomainBlockCopy with more bells and whistles (such as letting
the user specify a destination image format different than the
source), where virDomainBlockRebase is a wrapper around the
simpler portions of the new functionality, then the new API can
just reuse the new qemuDomainBlockCopy function and already
support _SHALLOW and _REUSE_EXT flags. Also note that libvirt.c
already filtered the new flags if _COPY is not present, so that
we are not impacting the case of BlockRebase being a wrapper
around BlockPull.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function.
(qemuDomainBlockRebase): Call it when appropriate.
Since libvirt drops locks between issuing a monitor command and
getting a response, it is possible for libvirtd to be restarted
before getting a response on a block-job-complete command; worse, it
is also possible for the guest to shut itself down during the window
while libvirtd is down, ending the qemu process. A management app
needs to know if the pivot happened (and the destination file
contains guest contents not in the source) or failed (and the source
file contains guest contents not in the destination), but since
the job is finished, 'query-block-jobs' no longer tracks the
status of the job, and if the qemu process itself has disappeared,
even 'query-block' cannot be checked to ask qemu its current state.
At the time of this patch, the design for persistent bitmap has not
been clarified, so a followup patch will be needed once qemu
actually figures out how to expose it, and we figure out how to use
it. In the meantime, we have a solution that avoids the worst of
the problem. [This problem was first analyzed with the RHEL 6.3
__com.redhat_drive-reopen command; which partly explains why
upstream qemu 1.3 ditched the drive-reopen idea and went with
block-job-complete plus persistent bitmap instead.]
If we surround 'drive-reopen' with a pause/resume pair, then we can
guarantee that the guest cannot modify either source or destination
files in the window of libvirtd uncertainty, and the management app
is guaranteed that either libvirt knows the outcome and reported it
correctly; or that on libvirtd restart, the guest will still be
paused and that the qemu process cannot have disappeared due to
guest shutdown; and use that as a clue that the management app must
implement recovery protocol, with both source and destination files
still being in sync and with 'query-block' still being an option as
part of that recovery. My testing shows that the pause window will
typically be only a fraction of a second.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Pause around
drive-reopen.
(qemuDomainBlockJobImpl): Update caller.
This is the bare minimum to end a copy job (of course, until a
later patch adds the ability to start a copy job, this patch
doesn't do much in isolation; I've just split the patches to
ease the review).
This patch intentionally avoids SELinux, lock manager, and audit
actions. Also, if libvirtd restarts at the exact moment that a
'block-job-complete' is in flight, the proposed proper way to
detect the outcome of that would be with a persistent bitmap and
some additional query commands when libvirtd restarts. This
patch is enough to test the common case of success when used
correctly, while saving the subtleties of proper cleanup for
worst-case errors for later.
When a mirror job is started, cancelling the job safely reverts back
to the source disk, regardless of whether the destination is in
phase 1 (streaming, in which case the destination is worthless) or
phase 2 (mirroring, in which case the destination is synced up to
the source at the time of the cancel). Our existing code does just
fine in either phase, other than some bookkeeping cleanup; this
implements live block copy.
Ideas for future enhancements via new flags:
Depending on when persistent bitmap support is added, it may be
worth adding a VIR_DOMAIN_REBASE_COPY_ATOMIC flag that fails up
front if we detect an older qemu with risky pivot operation.
Interesting side note: while snapshot-create --disk-only creates a
copy of the disk at a point in time by moving the domain on to a
new file (the copy is the file now in the just-extended backing
chain), blockjob --abort of a copy job creates a copy of the disk
while keeping the domain on the original file. There may be
potential improvements to the snapshot code to exploit block copy
over multiple disks all at one point in time. And, if
'block-job-cancel' were made part of 'transaction', you could
copy multiple disks at the same point in time without pausing
the domain. This also implies we may want to add a --quiesce flag
to virDomainBlockJobAbort, so that when breaking a mirror (whether
by cancel or pivot), the side of the mirror that we are abandoning
is at least in a stable state with regards to guest I/O.
* src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag.
(qemuDomainBlockPivot): New helper function.
(qemuDomainBlockJobImpl): Implement it.
Handle the new type of block copy event and info. Of course,
this patch does nothing until a later patch actually allows the
creation/abort of a block copy job.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_READY): New
block job status.
* src/libvirt.c (virDomainBlockRebase): Document the event.
* src/qemu/qemu_monitor_json.c (eventHandlers): New event.
(qemuMonitorJSONHandleBlockJobReady): New function.
(qemuMonitorJSONGetBlockJobInfoOne): Translate new job type.
(qemuMonitorJSONHandleBlockJobImpl): Handle new event and job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Recognize
the event to minimize snooping.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful
info query to save effort on a pivot request.
For now, disk migration via block copy job is not implemented in
libvirt. But when we do implement it, we have to deal with the
fact that qemu does not yet provide an easy way to re-start a qemu
process with mirroring still intact. Paolo has proposed an idea
for a persistent dirty bitmap that might make this possible, but
until that design is complete, it's hard to say what changes
libvirt would need. Even something like 'virDomainSave' becomes
hairy, if you realize the implications that 'virDomainRestore'
would be stuck with recreating the same mirror layout.
But if we step back and look at the bigger picture, we realize that
the initial client of live storage migration via disk mirroring is
oVirt, which always uses transient domains, and that if a transient
domain is destroyed while a mirror exists, oVirt can easily restart
the storage migration by creating a new domain that visits just the
source storage, with no loss in data.
We can make life a lot easier by being cowards for now, forbidding
certain operations on a domain. This patch guarantees that we
never get in a state where we would have to restart a domain with
a mirroring block copy, by preventing saves, snapshots, migration,
hot unplug of a disk in use, and conversion to a persistent domain
(thankfully, it is still relatively easy to 'virsh undefine' a
running domain to temporarily make it transient, run tests on
'virsh blockcopy', then 'virsh define' to restore the persistence).
Later, if the qemu design is enhanced, we can relax our code.
The change to qemudDomainDefine looks a bit odd for undoing an
assignment, rather than probing up front to avoid the assignment,
but this is because of how virDomainAssignDef combines both a
lookup and assignment into a single function call.
* src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype.
* src/conf/domain_conf.c (virDomainHasDiskMirror): New function.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous
actions while block copy is already in action.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
Upstream qemu 1.3 is adding two new monitor commands, 'drive-mirror'
and 'block-job-complete'[1], which can drive live block copy and
storage migration. [Additionally, RHEL 6.3 had backported an earlier
version of most of the same functionality, but under the names
'__com.redhat_drive-mirror' and '__com.redhat_drive-reopen' and with
slightly different JSON arguments, and has been using patches similar
to these upstream patches for several months now.]
The libvirt API virDomainBlockRebase as already committed for 0.9.12
is flexible enough to expose the basics of block copy, but some
additional features in the 'drive-mirror' qemu command, such as
setting error policy, setting granularity, or using a persistent
bitmap, may later require a new libvirt API virDomainBlockCopy. I
will wait to add that API until we know more about what qemu 1.3
will finally provide.
This patch caters only to the upstream qemu 1.3 interface, although
I have proven that the changes for RHEL 6.3 can be isolated to
just qemu_monitor_json.c, and the rest of this series will
gracefully handle either interface once the JSON differences are
papered over in a downstream patch.
For consistency with other block job commands, libvirt must handle
the bandwidth argument as MiB/sec from the user, even though qemu
exposes the speed argument as bytes/sec; then again, qemu rounds
up to cluster size internally, so using MiB hides the worst effects
of that rounding if you pass small numbers.
[1]https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04123.html
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR)
(QEMU_CAPS_DRIVE_REOPEN): New bits.
* src/qemu/qemu_capabilities.c (qemuCaps): Name them.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
them.
(qemuMonitorJSONDriveMirror, qemuMonitorDrivePivot): New functions.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror)
(qemuMonitorDrivePivot): Declare them.
* src/qemu/qemu_monitor.c (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): New passthroughs.
* src/qemu/qemu_monitor.h (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): Declare them.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=862515
which describes inconsistencies in dealing with duplicate mac
addresses on network devices in a domain.
(at any rate, it resolves *almost* everything, and prints out an
informative error message for the one problem that isn't solved, but
has a workaround.)
A synopsis of the problems:
1) you can't do a persistent attach-interface of a device with a mac
address that matches an existing device.
2) you *can* do a live attach-interface of such a device.
3) you *can* directly edit a domain and put in two devices with
matching mac addresses.
4) When running virsh detach-device (live or config), only MAC address
is checked when matching the device to remove, so the first device
with the desired mac address will be removed. This isn't always the
one that's wanted.
5) when running virsh detach-interface (live or config), the only two
items that can be specified to match against are mac address and model
type (virtio, etc) - if multiple netdevs match both of those
attributes, it again just finds the first one added and assumes that
is the only match.
Since it is completely valid to have multiple network devices with the
same MAC address (although it can cause problems in many cases, there
*are* valid use cases), what is needed is:
1) remove the restriction that prohibits doing a persistent add of a
netdev with a duplicate mac address.
2) enhance the backend of virDomainDetachDeviceFlags to check for
something that *is* guaranteed unique (but still work with just mac
address, as long as it yields only a single results.
This patch does three things:
1) removes the check for duplicate mac address during a persistent
netdev attach.
2) unifies the searching for both live and config detach of netdevices
in the subordinate functions of qemuDomainModifyDeviceFlags() to use the
new function virDomainNetFindIdx (which matches mac address and PCI
address if available, checking for duplicates if only mac address was
specified). This function returns -2 if multiple matches are found,
allowing the callers to print out an appropriate message.
Steps 1 & 2 are enough to fully fix the problem when using virsh
attach-device and detach-device (which require an XML description of
the device rather than a bunch of commandline args)
3) modifies the virsh detach-interface command to check for multiple
matches of mac address and show an error message suggesting use of the
detach-device command in cases where there are multiple matching mac
addresses.
Later we should decide how we want to input a PCI address on the virsh
commandline, and enhance detach-interface to take a --address option,
eliminating the need to use detach-device
* src/conf/domain_conf.c
* src/conf/domain_conf.h
* src/libvirt_private.syms
* added new virDomainNetFindIdx function
* removed now unused virDomainNetIndexByMac and
virDomainNetRemoveByMac
* src/qemu/qemu_driver.c
* remove check for duplicate max from qemuDomainAttachDeviceConfig
* use virDomainNetFindIdx/virDomainNetRemove instead
of virDomainNetRemoveByMac in qemuDomainDetachDeviceConfig
* use virDomainNetFindIdx instead of virDomainIndexByMac
in qemuDomainUpdateDeviceConfig
* src/qemu/qemu_hotplug.c
* use virDomainNetFindIdx instead of a homespun loop in
qemuDomainDetachNetDevice.
* tools/virsh-domain.c: modified detach-interface command as described
above
It turns out that the cpuacct results properly account for offline
cpus, and always returns results for every possible cpu, not just
the online ones. So there is no need to check the map of online
cpus in the first place, merely only a need to know the maximum
possible cpu. Meanwhile, virNodeGetCPUBitmap had a subtle change
from returning the maximum id to instead returning the width of
the bitmap (one larger than the maximum id) in commit 2f4c5338,
which made this code encounter some off-by-one logic leading to
bad error messages when a cpu was offline:
$ virsh cpu-stats dom
error: Failed to virDomainGetCPUStats()
error: An error occurred, but the cause is unknown
Cleaning this up unraveled a chain of other unused variables.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Drop
pointless check for cpumap changes, and use correct number of
cpus. Simplify signature.
(qemuDomainGetCPUStats): Adjust caller.
* src/nodeinfo.h (nodeGetCPUCount): New prototype.
(nodeGetCPUBitmap): Drop unused parameter.
* src/nodeinfo.c (nodeGetCPUBitmap): Likewise.
(nodeGetCPUMap): Adjust caller.
(nodeGetCPUCount): New function.
* src/libvirt_private.syms (nodeinfo.h): Export it.
Callers should not need to know what the name of the file to
be read in the Linux-specific version of nodeGetCPUmap;
furthermore, qemu cares about online cpus, not present cpus,
when determining which cpus to skip.
While at it, I fixed the fact that we were computing the maximum
online cpu id by doing a slow iteration, when what we really want
to know is the max available cpu.
* src/nodeinfo.h (nodeGetCPUmap): Rename...
(nodeGetCPUBitmap): ...and simplify signature.
* src/nodeinfo.c (linuxParseCPUmax): New function.
(linuxParseCPUmap): Simplify and alter signature.
(nodeGetCPUBitmap): Change implementation.
* src/libvirt_private.syms (nodeinfo.h): Reflect rename.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Update
caller.
On one hand, numad probably will manage the affinity of domain process
dynamically in future. On the other hand, even numad won't manage it,
it still could confusion. Let's make things simpler enough to avoid
the lair for now.
When the cpu placement model is "auto", it sets the affinity for
domain process with the advisory nodeset from numad, however,
creating cgroup for the domain process (called emulator thread
in some contexts) later overrides that with pinning it to all
available pCPUs.
How to reproduce:
* Configure the domain with "auto" placement for <vcpu>, e.g.
<vcpu placement='auto'>4</vcpu>
* % virsh start dom
* % cat /proc/$dompid/status
Though the emulator cgroup cause conflicts, but we can't simply
prohibit creating it, as other tunables are still useful, such
as "emulator_period", which is used by API
virDomainSetSchedulerParameter. So this patch doesn't prohibit
creating the emulator cgroup, but inherit the nodeset from numad,
and reset the affinity for domain process.
* src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator
to accept the passed nodenet
* src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset
Abstract the codes to prepare cpumap into a helper a function,
which can be used later.
* src/qemu/qemu_process.h: Declare qemuPrepareCpumap
* src/qemu/qemu_process.c: Implement qemuPrepareCpumap, and use it.