https://bugzilla.redhat.com/show_bug.cgi?id=896685 points out
a regression caused by commit 38c4a9c - libvirt only labels
the backing chain if the backing chain cache is populated, but
the code to populate the cache was only conditionally performed
if cgroup labeling was necessary.
* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Hoist cache setup...
* src/qemu/qemu_process.c (qemuProcessStart): ...earlier into
caller, where it is now unconditional.
To avoid having to hold the qemu driver lock while iterating through
close callbacks and calling them. This fixes a real deadlock when a
domain which is being migrated from another host gets autodestoyed as a
result of broken connection to the other host.
The max value of number of cpus to compute(id) should not
be equal or greater than max cpu number.
The bug ocurrs when id value is equal to max cpu number which
leads to the off-by-one error in the following for loop.
# virsh cpu-stats guest --start 1
error: Failed to virDomainGetCPUStats()
error: internal error cpuacct parse error
Currently, if lzop decompression binary produces a warning, it
doesn't exit with zero status but 2 instead. Terrifying, but
true. However, warnings may be ignored using '--ignore-warn'
command line argument. Moreover, in which case, the exit status
will be zero.
For both AttachDevice and UpdateDevice APIs, if the disk device
is 'cdrom' or 'floppy', the operations could be ejecting, updating,
and inserting. For either ejecting or updating, the shared disk
entry of the original disk src has to be removed, because it's
not useful anymore.
And since the original disk def will be changed, new disk def passed
as argument will be free'ed in qemuDomainChangeEjectableMedia, so
we need to copy the orignal disk def before
qemuDomainChangeEjectableMedia, to use it for qemuRemoveSharedDisk.
The disk def could be free'ed by qemuDomainChangeEjectableMedia,
which can thus cause crash if we reference the disk pointer. On
the other hand, we have to remove the added shared disk entry from
the table on error codepath.
The hash entry is changed from "ref" to {ref, @domains}. With this, the
caller can simply call qemuRemoveSharedDisk, without afraid of removing
the entry belongs to other domains. qemuProcessStart will obviously
benifit from it on error codepath (which calls qemuProcessStop to do
the cleanup).
Based on moving various checking into qemuAddSharedDisk, this
avoids the caller using it in wrong ways. Also this adds two
new checking for qemuCheckSharedDisk (disk device not 'lun'
and kernel doesn't support unpriv_sgio simply returns 0).
Due to "feature"/"features" nasty typo, any features marked as mandatory
by one side of a migration are silently considered optional by the other
side. The following is the code that formats mandatory features in
migration cookie:
for (i = 0 ; i < QEMU_MIGRATION_COOKIE_FLAG_LAST ; i++) {
if (mig->flagsMandatory & (1 << i))
virBufferAsprintf(buf, " <feature name='%s'/>\n",
qemuMigrationCookieFlagTypeToString(i));
}
Some functions were using virDomainDeviceInfo where virDevicePCIAddress
would suffice. Some were only using integers for slots and functions,
assuming the bus numbers are always 0.
Switch from virDomainDeviceInfoPtr to virDevicePCIAddressPtr:
qemuPCIAddressAsString
qemuDomainPCIAddressCheckSlot
qemuDomainPCIAddressReserveAddr
qemuDomainPCIAddressReleaseAddr
Switch from int slot to virDevicePCIAddressPtr:
qemuDomainPCIAddressReserveSlot
qemuDomainPCIAddressReleaseSlot
qemuDomainPCIAddressGetNextSlot
Deleted functions (they would take the same parameters
as ReserveAddr/ReleaseAddr do now.)
qemuDomainPCIAddressReserveFunction
qemuDomainPCIAddressReleaseFunction
The conversion to qemuCaps dropped the ability with qemu{,-kvm} 1.2 and
newer to set the lost tick policy for the PIT. While the
-no-kvm-pit-reinjection option is depreacated, it is still supported at
least through 1.4, it is better to not lose the functionality.
The qemu driver had been calling virSecurityManagerSetProcessLabel()
from a "pre-exec hook" function that is run after the child is forked,
but before exec'ing qemu. This is problematic because the uid and gid
of the child are set by the security driver, but capabilities are
dropped by virCommand - such separation doesn't work; the two
operations must be done together or the capabilities do not transfer
properly to the child process.
This patch switches to using virSecurityManagerSetChildProcessLabel(),
which is called prior to virCommandRun() (rather than being called
*during* virCommandrun() by the hook function), and doesn't set the
UID/GID/security label directly, but instead merely informs virCommand
what it should set them all to when the time is appropriate.
This lets virCommand choose to do the uid/gid and caps dropping all at
the same time if it wants (it does *want* to, but isn't doing so yet;
that's for an upcoming patch).
Setting the uid/gid of the child process was the only thing done by
the hook function in this case, and that can now be done more simply
with virCommandSetUID/GID.
With the majority of fields in the virQEMUDriverPtr struct
now immutable or self-locking, there is no need for practically
any methods to be using the QEMU driver lock. Only a handful
of helper APIs in qemu_conf.c now need it
The hook scripts used by virCommand must be careful wrt
accessing any mutexes that may have been held by other
threads in the parent process. With the recent refactoring
there are 2 potential flaws lurking, which will become real
deadlock bugs once the global QEMU driver lock is removed.
Remove use of the QEMU driver lock from the hook function
by passing in the 'virQEMUDriverConfigPtr' instance directly.
Add functions to the virSecurityManager to be invoked before
and after fork, to ensure the mutex is held by the current
thread. This allows it to be safely used in the hook script
in the child process.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the APIs for managing the shared disk list take
a virHashTablePtr as the primary argument. This is bad
because it requires the caller to deal with locking of
the QEMU driver. Switch the APIs to take the full
virQEMUDriverPtr instance
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add locking to virSecurityManagerXXX APIs, so that use of the
security drivers is internally serialized. This avoids the need
to rely on the global driver locks to achieve serialization
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To enable locking to be introduced to the security manager
objects later, turn virSecurityManager into a virObjectLockable
class
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
From qemu's point of view these are still just tap devices, so there's
no reason they shouldn't work with vhost-net; as a matter of fact,
Raja Sivaramakrishnan <srajag00@yahoo.com> verified on libvir-list
that at least the qemu_command.c part of this patch works:
https://www.redhat.com/archives/libvir-list/2012-December/msg01314.html
(the hotplug case is extrapolation on my part).
The 'driver->caps' pointer can be changed on the fly. Accessing
it currently requires the global driver lock. Isolate this
access in a single helper, so a future patch can relax the
locking constraints.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To avoid confusion between 'virCapsPtr' and 'qemuCapsPtr'
do some renaming of various fucntions/variables. All
instances of 'qemuCapsPtr' are renamed to 'qemuCaps'. To
avoid that clashing with the 'qemuCaps' typedef though,
rename the latter to virQEMUCaps.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To enable virCapabilities instances to be reference counted,
turn it into a virObject. All cases of virCapabilitiesFree
turn into virObjectUnref
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupPtr instance APIs are safe to use without locking
in the QEMU driver, since all internal state they rely on is
immutable. Update the comment to reflect this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We are requesting for stderr catching for all cases in
virFileWrapperFdNew(). There is no need to have a separate
function just to report an error, esp. when we can do it in
virFileWrapperFdClose().
The qemuParseGlusterString() replaced dst->src without a VIR_FREE() of
what was in there before.
The qemuBuildCommandLine() did not properly free the boot_buf depending
on various usages.
The qemuParseCommandLineDisk() had numerous paths that didn't clean up
the virDomainDiskDefPtr def properly. Adjust the logic to go through an
error: label before cleanup in order to free the resource.
Currently the activePciHostdevs, inactivePciHostdevsd and
activeUsbHostdevs lists are all implicitly protected by the
QEMU driver lock. Now that the lists all inherit from the
virObjectLockable, we can make the locking explicit, removing
the dependency on the QEMU driver lock for correctness.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow modifications to the lists to be synchronized, convert
virPCIDeviceList and virUSBDeviceList into virObjectLockable
classes. The locking, however, will not be self-contained. The
users of these classes will have to call virObjectLock/Unlock
in the critical regions.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When iterating over USB host devices to setup cgroups, the
usbDevice object was leaked in both LXC and QEMU driers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The QEMU driver struct has a 'qemuVersion' field that was previously
used to cache the version lookup from capabilities. With the recent
QEMU capabilities rewrite the caching happens at a lower level so
this field is pointless. Removing it avoids worries about locking
when updating it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The duplicate VM checking should be done atomically with
virDomainObjListAdd, so shoud not be a separate function.
Instead just use flags to indicate what kind of checks are
required.
This pair, used in virDomainCreateXML:
if (virDomainObjListIsDuplicate(privconn->domains, def, 1) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, false)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
NULL)))
goto cleanup;
This pair, used in virDomainRestoreFlags:
if (virDomainObjListIsDuplicate(privconn->domains, def, 1) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, true)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
NULL)))
goto cleanup;
This pair, used in virDomainDefineXML:
if (virDomainObjListIsDuplicate(privconn->domains, def, 0) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, false)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
0, NULL)))
goto cleanup;
Otherwise, we get a lot of scary (but harmless) noise in the logs:
2013-02-05 15:35:48.555+0000: 8637: error : qemuMonitorJSONCheckError:353 : internal error unable to execute QEMU command 'add-fd': Parameter 'fdset-id' expects an existing fdset-id
one for every qemu 1.2 binary that we probe.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONAddFd): During
probe, avoid logging failures.
As a step towards making virDomainObjList thread-safe turn it
into an opaque virObject, preventing any direct access to its
internals.
As part of this a new method virDomainObjListForEach is
introduced to replace all existing usage of virHashForEach
Currently the virQEMUDriverPtr struct contains an wide variety
of data with varying access needs. Move all the static config
data into a dedicated virQEMUDriverConfigPtr object. The only
locking requirement is to hold the driver lock, while obtaining
an instance of virQEMUDriverConfigPtr. Once a reference is held
on the config object, it can be used completely lockless since
it is immutable.
NB, not all APIs correctly hold the driver lock while getting
a reference to the config object in this patch. This is safe
for now since the config is never updated on the fly. Later
patches will address this fully.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If a compression binary prints something to stderr, currently
it is discarded. However, it can contain useful data from
debugging POV, so we should catch it.
If a decompression binary prints something to stderr, currently
it is discarded. However, it can contain useful data from
debugging POV, so we should catch it.
Add support for QEMU -add-fd command line parameter detection.
This intentionally rejects qemu 1.2, where 'add-fd' QMP did
not allow full control of set ids, and where there was no command
line counterpart, but accepts qemu 1.3.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add entry points for calling the qemu 'add-fd' and 'remove-fd'
monitor commands. There is no entry point for 'query-fdsets';
the assumption is that a developer can use
virsh qemu-monitor-command domain '{"execute":"query-fdsets"}'
when debugging issues, and that meanwhile, libvirt is responsible
enough to remember what fds it associated with what fdsets.
Likewise, on the 'add-fd' command, it is assumed that libvirt
will always pass a set id, rather than letting qemu autogenerate
the next available id number.
* src/qemu/qemu_monitor.c (qemuMonitorAddFd, qemuMonitorRemoveFd):
New functions.
* src/qemu/qemu_monitor.h (qemuMonitorAddFd, qemuMonitorRemoveFd):
New prototypes.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONAddFd)
(qemuMonitorJSONRemoveFd): New functions.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONAddFd)
(qemuMonitorJSONRemoveFd): New prototypes.
https://bugzilla.redhat.com/show_bug.cgi?id=894723
Currently, if qemuProcessStart() succeeds, but it's decompression
binary that returns nonzero status, we don't kill the qemu process,
but remove it from internal domain list, leaving the qemu process
hanging around totally uncontrolled.
https://bugzilla.redhat.com/show_bug.cgi?id=892289
It seems like with new udev within guest OS, the tray is locked,
so we need to:
- 'eject'
- wait for tray to open
- 'change'
Moreover, even when doing bare 'eject', we should check for
'tray_open' as guest may have locked the tray. However, the
waiting phase shouldn't be unbounded, so I've chosen 10 retries
maximum, each per 500ms. This should give enough time for guest
to eject a media and open the tray.
With our code, we fail to query for tray-open attribute currently.
That's because in HMP it is 'tray-open' and in QMP it's 'tray_open'.
It always has been. However, we got it exactly the opposite.
A logic bug meant we reported KVM was possible for every
architecture, merely based on whether the query-kvm command
exists. We should instead have been doing it based on whether
the query-kvm command returns 'present: 1'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently QEMU capabilities are initialized before the QEMU driver
sets ownership on its various directories. The upshot is that if
you change the user/group in the qemu.conf file, libvirtd will fail
to probe QEMU the first time it is run after the config change.
Moving QEMU capabilities initialization to after the chown() calls
fixes this
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This previous commit
commit 1a50ba2cb0
Author: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Date: Mon Nov 26 15:17:13 2012 +0100
qemu: Fix QMP Capabability Probing Failure
which attempted to make sure the QEMU process used for probing
ran as the right user id, caused serious performance regression
and unreliability in probing. The -daemonize switch in QEMU
guarantees that the monitor socket is present before the parent
process exits. This means libvirtd is guaranteed to be able to
connect immediately. By switching from -daemonize to the
virCommandDaemonize API libvirtd was no longer synchronized with
QEMU's startup process. The result was that the QEMU monitor
failed to open and went into its 200ms sleep loop. This happened
for all 25 binaries resulting in 5 seconds worth of sleeping
at libvirtd startup. In addition sometimes when libvirt connected,
QEMU would be partially initialized and crash causing total
failure to probe that binary.
This commit reverts the previous change, ensuring we do use the
-daemonize flag to QEMU. Startup delay is cut from 7 seconds
to 2 seconds on my machine, which is on a par with what it was
prior to the capabilities rewrite.
To deal with the fact that QEMU needs to be able to create the
pidfile, we switch pidfile location fron runDir to libDir, which
QEMU is guaranteed to be able to write to.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there is no reason to hold qemu driver locked
throughout whole API execution. Moreover, we can use the
new qemuDomObjFromDomain() internal API to lookup domain then.
Hosts for rbd are ceph monitor daemons. These have fixed IP addresses,
so they are often referenced by IP rather than hostname for
convenience, or to avoid relying on DNS. Using IPv4 addresses as the
host name works already, but IPv6 addresses require rbd-specific
escaping because the colon is used as an option separator in the
string passed to qemu.
Escape these colons, and enclose the IPv6 address in square brackets
so it is distinguished from the port, which is currently mandatory.
Acked-by: Osier Yang <jyang@redhat.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876829 complains that
if a guest is put into S3 state (such as via virsh dompmsuspend)
and then an external snapshot is taken, qemu forcefully transitions
the domain to paused, but libvirt doesn't reflect that change
internally. Thus, a user has to use 'virsh suspend' to get libvirt
back in sync with qemu state, and if the user doesn't know this
trick, then the guest appears hung.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateActiveExternal):
Track fact that qemu wakes up a suspended domain on migration.
The previous fix to avoid leaking securityDriverNames forgot to
handle the case of securityDriverNames being NULL, leading to
a crash
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The autodestroy callback code has the following function
called from a hash iterator
qemuDriverCloseCallbackRun(void *payload,
const void *name,
void *opaque)
{
...
char *uuidstr = name
...
dom = closeDef->cb(data->driver, dom, data->conn);
if (dom)
virObjectUnlock(dom);
virHashRemoveEntry(data->driver->closeCallbacks, uuidstr);
}
The closeDef->cb function may well cause the current callback
to be removed, if it shuts down 'dom'. As such the use of
'uuidstr' in virHashRemoveEntry is accessing free'd memory.
We must make a copy of the uuid str before invoking the
callback to be safe.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This will allow storing additional topology data in the NUMA topology
definition.
This patch changes the storage type and fixes fallout of the change
across the drivers using it.
This patch also changes semantics of adding new NUMA cell information.
Until now the data were re-allocated and copied to the topology
definition. This patch changes the addition function to steal the
pointer to a pre-allocated structure to simplify the code.
The way in that memory balloon suppression was handled for S390
is flawed for a number or reasons.
1. Just preventing the default balloon to be created in the case
of VIR_ARCH_S390[X] is not sufficient. An explicit memballoon
element in the guest definition will still be honored, resulting
both in a -balloon option and the allocation of a PCI bus address,
neither being supported.
2. Prohibiting balloon for S390 altogether at a domain_conf level
is no good solution either as there's work in progress on the QEMU
side to implement a virtio-balloon device, although in
conjunction with a new machine type. Suppressing the balloon
should therefore be done at the QEMU driver level depending
on the present capabilities.
Therefore we remove the conditional suppression of the default
balloon in domain_conf.c.
Further, we are claiming the memballoon device for virtio-s390
during device address assignment to prevent it from being considered
as a PCI device.
Finally, we suppress the generation of the balloon command line option
if this is a virtio-s390 machine.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Should have been done in commit 56fd513 already, but was missed
due to oversight: qemuDomainSendKey didn't release the driver lock
in its cleanup section. This fixes an issue introduced by commit
8c5d2ba.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=892079
One of my previous patches (f2a4e5f176) tried to fix crashing
libvirtd on domain detroy. However, we need to copy pattern from
qemuProcessHandleMonitorEOF() instead of decrementing reference
counter. The rationale for this is, if qemu process is dying due
to domain being destroyed, we obtain EOF on both the monitor and
agent sockets. However, if the exit is expected, qemuProcessStop
is called, which cleans both agent and monitor sockets up. We
want qemuAgentClose() to be called iff the EOF is not expected,
so we don't leak an FD and memory. Moreover, there could be race
with qemuProcessHandleMonitorEOF() which could have already
closed the agent socket, in which case we don't want to do
anything.
Adds a "ram" attribute globally to the video.model element, that changes
the resulting qemu command line only if video.type == "qxl".
<video>
<model type='qxl' ram='65536' vram='65536' heads='1'/>
</video>
That attribute gets a default value of 64*1024. The schema is unchanged
for other video element types.
The resulting qemu command line change is the addition of
-global qxl-vga.ram_size=<ram>*1024
or
-global qxl.ram_size=<ram>*1024
For the main and secondary qxl devices respectively.
The default for the qxl ram bar is 64*1024 kilobytes (the same as the
default qxl vram bar size).
This avoids "Event negative_returns: A negative constant "-1" is passed as
an argument to a parameter that cannot be negative.". The called function
uses -1 to determine whether it needs to traverse all the hostdevs.
The snapshot name is used to create path to the definition save file.
When the name contains slashes the creation of the file fails. Reject
such names.
When the snapshot definition can't be saved, the
qemuDomainSnapshotCreate function succeeded without filling some of the
fields in the internal definition.
This patch removes the snapshot and returns failure if the XML file
cannot be written.
When running virDomainDestroy, we need to make sure that no other
background thread cleans up the domain while we're doing our work.
This can happen if we release the domain object while in the
middle of work, because the monitor might detect EOF in this window.
For this reason we have a 'beingDestroyed' flag to stop the monitor
from doing its normal cleanup. Unfortunately this flag was only
being used to protect qemuDomainBeginJob, and not qemuProcessKill
This left open a race condition where either libvirtd could crash,
or alternatively report bogus error messages about the domain already
having been destroyed to the caller
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The driver mutex was unlocked in qemuDomainModifyDeviceFlags before
entering qemuDomainObjBeginJobWithDriver where it will be unlocked once
more leaving it in an undefined state. The result was that two
threads were simultaneously looking up the domain hash table during
multiple parallel device attach/detach operations.
Luckily this triggered a virHashIterationError.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
The QEMU driver default max port is 65535, but it then increments
this by 1 to 65536. This maps to 0 in an unsigned short :-( This
was apparently done so that for() loops could use "< max" instead
of "<= max". Remove this insanity and just make the loop do the
right thing.
In commit c4bbaaf8, caps->arch was checked uninitialized, rendering the
whole check useless.
This patch moves the conditional setting of QEMU_CAPS_NO_ACPI to
qemuCapsInitQMP, and removes the no longer needed exception for S390.
It also clears the flag for all non-x86 archs instead of just S390 in
qemuCapsInitHelp.
The virDomainObj, qemuAgent, qemuMonitor, lxcMonitor classes
all require a mutex, so can be switched to use virObjectLockable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After live change of cpu counts, the number of processor threads is
verified. This patch makes use of this approach to check if qemu ignored
the request for cpu hot-unplug and report an appropriate message.
Currently all classes must directly inherit from virObject.
This allows for arbitrarily deep hierarchy. There's not much
to this aside from chaining up the 'dispose' handlers from
each class & providing APIs to check types.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Pass stub driver name directly to pciDettachDevice and pciReAttachDevice to fit
for different libvirt drivers. For example, qemu driver prefers pci-stub, but
Xen prefers pciback.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Add an optional 'type' attribute to <target> element of serial port
device. There are two choices for its value, 'isa-serial' and
'usb-serial'. For backward compatibility, when attribute 'type' is
missing the 'isa-serial' will be chosen as before.
Libvirt XML sample
<serial type='pty'>
<target type='usb-serial' port='0'/>
<address type='usb' bus='0' port='1'/>
</serial>
qemu commandline:
qemu ${other_vm_args} \
-chardev pty,id=charserial0 \
-device usb-serial,chardev=charserial0,id=serial0,bus=usb.0,port=1
https://bugzilla.redhat.com/show_bug.cgi?id=892079
With current code, if user calls virDomainPMSuspendForDuration()
followed by virDomainDestroy(), the former API checks for qemu agent
presence, which will evaluate as true (if agent is configured). While
talking to qemu agent, the qemu driver is unlocked, so the latter API
starts executing. However, if machine dies meanwhile, libvirtd gets
EOF on the agent socket and qemuProcessHandleAgentEOF() is called. The
handler clears reference to qemu agent while the destroy API already
holding a reference to it. This leads to NULL dereferencing later in
the code. Therefore, the agent pointer should be set to NULL only if
we are the exclusive owner of it.
While OOM can have knock-on effects that trash a system, generally
the first symptom is one of memory thrashing.
* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.
Perform all the appropriate plumbing.
When qemu/KVM VMs are paused manually through a monitor not-owned by libvirt,
libvirt will think of them as "paused" event after they are resumed and
effectively running. With this patch the discrepancy goes away.
This is meant to address bug 892791.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on domain definition and magic
equation and set it upon the domain startup. The rationale behind was,
if there's a memory leak or exploit in qemu, we should prevent the
host system trashing. However, the equation was too tightening, as it
didn't reflect what the kernel counts into the memory used by a
process. Since many hosts do have a swap, nobody hasn't noticed
anything, because if hard memory limit is reached, process can
continue allocating memory on a swap. However, if there is no swap on
the host, the process gets killed by OOM killer. In our case, the qemu
process it is.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel way of accounting the memory
for process. That is, even the kernel caches are counted within the
memory used by a process (within cgroups at least). Hence the magic
equation has to be changed:
limit = 1.5 * (domain memory + total video memory) + (32MB for cache
per each disk) + 200MB
This is the QEMU backend code for the SCLP console support.
It includes SCLP capability detection, QEMU command line generation
and a test case.
Signed-off-by: J.B. Joret <jb@linux.vnet.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Since we daemonized QEMU for capabilities probing there is a long
time if QEMU fails to launch. This is because we're not passing in
any virDomainObjPtr instance and thus the monitor code can not
check to see if the PID is still alive.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current code is initializing capabilities before setting
directory permissions. Thus the QEMU binaries being run may
not have the ability to create the UNIX monitor socket on
the first run of libvirtd.
This prevents domain starting and disk attaching if the shared disk's
setting conflicts with other active domain(s), E.g. A domain with
"sgio" set as "filtered", however, another active domain is using
it set as "unfiltered".
This introduces a hash table for qemu driver, to store the shared
disk's info as (@major:minor, @ref_count). @ref_count is the number
of domains which shares the disk.
Since we only care about if the disk support unprivileged SG_IO
commands, and the SG_IO commands only make sense for block disk,
this patch only manages (add/remove hash entry) the shared disk for
block disk.
* src/qemu/qemu_conf.h: (Add member 'sharedDisks' of type
virHashTablePtr; Declare helpers
qemuGetSharedDiskKey, qemuAddSharedDisk
and qemuRemoveSharedDisk)
* src/qemu/qemu_conf.c (Implement the 3 helpers)
* src/qemu/qemu_process.c (Update 'sharedDisks' when domain
starting and shutdown)
* src/qemu/qemu_driver.c (Update 'sharedDisks' when attaching
or detaching disk).
When the disk alignment check done while redefining an existing snapshot
failed, the qemu driver attempted to free the existing snapshot. As in
the cleanup path the definition of the snapshot wasn't assigned, the
cleanup code dereferenced a NULL pointer.
This patch changes the behavior on error paths while redefining snapshot
in two ways:
1) On failure, modifications done on the snapshot definition object are
rolled back.
2) The previous definition of the data isn't freed until it's certain it
won't be needed any more.
This change avoids the segfault and additionally the snapshot doesn't
vanish if redefinition fails for some reason.
This also changes the function signature to take a
virDomainChrSourceDefPtr instead of just a path, since it needs to
differentiate behavior based on source->type.
The functionality provided in virchrdev.c (previously virconsole.c) is
applicable to other types of character devices besides consoles, such
as channels. This patch is just code motion, renaming things such as
"console" or "pty", instead using more general terms such as
"character device" or "device path".
Since 4c993d8a we failed to set this important capability, which
allows starting a domain with QXL video card. We set DEVICE_QXL
capability bit instead, which is not necessary wrong. Anyway, if
qemu supports the new '-device qxl' it supports older '-vga qxl'
as well. The latter is used for the primary (the first) qxl video
card, the former for other video cards.
Commit b3f2b4ca5c left buf unallocated in
the case of QMP capability probing being used, leading to a segfault in
strlen in the cleanup path.
This patch opens the log and allocates the buffer if QMP probing was
used, so we can display the helpful error message.
Despite our great effort we still parsed qemu log output.
We wouldn't notice unless upcoming qemu 1.4 changed the
format of the logs slightly. Anyway, now we should gather
all interesting knobs like pty paths from monitor. Moreover,
since for historical reasons the first console can be just
an alias to the first serial port, we need to check this and
copy the pty path if that's the case to the first console.
This reverts commit 28224c4d2a
which shouldn't be needed at all because with current qemu
we obtain all paths from 'query-chardev' output. We ought
not parse log output at all anymore.
Since 586502189edf9fd0f89a83de96717a2ea826fdb0 qemu commit, the log
lines reporting chardev's path has changed from:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device redirected to /dev/pts/5
char device redirected to /dev/pts/6
char device redirected to /dev/pts/7
to:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device compat_monitor0 redirected to /dev/pts/5
char device serial0 redirected to /dev/pts/6
char device serial1 redirected to /dev/pts/7
However, with current code we are not prepared for such change, which
results in us being unable to start any domain.
Many internal qemu APIs must find domain object from passed
virDomainPtr. And with function Peter's introduced, we can use it
instead of copying multiple lines among code.
Since we switched to QMP probing, the object types are spelled out
explicitly, i.e. virtio-net-pci. This has effectively disabled
the capability detection of s390 virtio devices. The trivial fix
is to add the s390 virtio types explicitly to qemuCapsObjectProps.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=888426
The code for doing a block-copy was supposed to track the destination
file in drive->mirror, but was set up to do all mallocs prior to
starting the copy so that OOM wouldn't leave things partially started.
However, the wrong variable was being written; later in the code we
silently did 'disk->mirror = mirror' which was still NULL, and thus
leaking memory and leaving libvirt to think that the mirror job was
never started, which prevented a pivot operation after a copy.
Problem introduced in commit 35c7701c6.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Initialize correct
variable.
To bring in line with new naming practice, rename the=
src/util/cgroup.{h,c} files to vircgroup.{h,c}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, it only considers PTY backend serial devices for pseries.
It need to support all kinds of serial devices.
This patch is to fix the problem which is that it doesn't work
when specifying source type as file.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
ACPI is only supported on x86 platform, PPC can't support it.
So QEMU_CAPS_NO_ACPI shouldn't be set.
This patch is to remove QEMU_CAPS_NO_ACPI capability for
non-x86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Historically there was an inconsistency in handling of the
itanium arch. The xen driver & CPU model code treated it
as 'ia64' but the QEMU capabilities code used 'itanium'. On
the grounds that no one has ever seriously used itanium
with QEMU, while RHEL shipped itanium with Xen, we should
favour 'ia64' as the canonical format
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of datatype change
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When LXC labels USB devices during hotplug, it is running in
host context, so it needs to pass in a vroot path to the
container root.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
'-device VGA' maps to '-vga std'
'-device cirrus-vga' maps to '-vga cirrus'
'-device qxl-vga' maps to '-vga qxl'
(there is also '-device qxl' for secondary devices)
'-device vmware-svga' maps to '-vga vmware'
For qemu(>=1.2), we can use -device to replace -vga for video
device. For the primary video device, the patch tries to use 0x2
slot for matching old qemu. If the 0x2 slot is allocated already,
the addr property could help for using any available slot.
For qemu(< 1.2), we keep using -vga for primary device.
QEMU_CAPS_DEVICE_QXL -device qxl
QEMU_CAPS_DEVICE_VGA -device VGA
QEMU_CAPS_DEVICE_CIRRUS_VGA -device cirrus-vga
QEMU_CAPS_DEVICE_VMWARE_SVGA -device vmware-svga
QEMU_CAPS_DEVICE_VIDEO_PRIMARY /* safe to use -device XXX
for primary video device */
Fix a typo in qemuCapsObjectTypes, the string 'qxl' here
should be -device qxl rather than -vga [...|qxl|..]
Noticed these while building on FreeBSD.
* src/qemu/qemu_monitor.c (qemuMonitorBlockInfoLookup): Rename
variable to avoid 'devname' collision.
* src/qemu/qemu_driver.c (qemuDomainInterfaceStats): Mark unused
variable.
When a network device's bridge connection is changed by
virDomainUpdateDevice, libvirt first removes the netdev's tap from its
old bridge, then adds it to the new bridge. Sometimes, due to a
network being destroyed while a guest device is still attached, the
tap may already be "removed" from the old bridge (or the old bridge
may not even exist any more); the existing code was needlessly failing
the update when this happened, making it impossible to recover from
the situation without completely detaching (i.e. removing) the netdev
from the guest and re-attaching.
Instead of failing the entire operation when removal of the tap from
the old bridge fails, this patch changes qemuDomainChangeNetBridge to
just log a warning and continue, allowing a reasonable recover from
the situation.
(you'll appreciate this change if you ever accidentally destroy a
network while your guests are still using it).
Refactor virLockManagerPluginNew() so that the caller does
not need to pass in the config file path itself - just the
config directory and driver name.
Fix QEMU to actually pass in a config file when creating the
default lock manager plugin, rather than NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
* Autotools changes:
- Don't assume Qemu is Linux-only
- Check Linux headers only on Linux
- Disable firewalld on FreeBSD
* Initctl:
Initctl seem to present only on Linux, so stub it on other platforms
* Raw I/O: Linux-only as well
* Headers cleanup
This patch adds a new domain lookup helper qemuDomObjFromDomainDriver
that lookups the domain and leaves the driver locked. The driver is
returned as the second argument of that function. If the lookup fails
the driver is unlocked to help avoid cleanup codepaths.
This patch also improves docs for the helpers.
When a qemu domain is backed by huge pages, apparmor needs to grant the domain
rw access to files under the hugetlbfs mount point. Add a hook, called in
qemu_process.c, which ends up adding the read-write access through
virt-aa-helper. Qemu will be creating a randomly named file under the
mountpoint and unlinking it as soon as it has mmap()d it, therefore we
cannot predict the full pathname, but for the same reason it is generally
safe to provide access to $path/**.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This patch exports qemuMigrationIsAllowed and adds a new parameter to it
to denote if it's a remote migration or a local migration. Local
migrations are used in snapshots and saving of the machine state and
have fewer restrictions. This patch also adjusts callers of the function
and tweaks some error messages to be more universal.
Currently there is no way to detect it via QMP and requesting "-sandbox
off" works correctly even if it was compiled out, so this will work
unless someone both requests the sandbox in qemu.conf and builds QEMU
without the support for it.
These classes can borrow unused bandwidth. Basically,
only egress qdsics can have classes, therefore we can
do this kind of traffic shaping only on host's outgoing,
that is domain's incoming traffic.
When the disk snapshot part of an external system checkpoint fails the
memory image is retained. This patch adds code to remove the image in
such case.
In case the snapshot code isn't able to restart CPUs after an external
checkpoint we would leak a copy of the domains XML definition. This
patch fixes the cleanup path.
False positive, but it breaks the build with gcc-4.6.3.
qemu/qemu_migration.c:2931:37: error: 'offline' may be used
uninitialized in this function [-Werror=uninitialized]
qemu/qemu_migration.c:2887:10: note: 'offline' was declared here
When restarting CPUs after an external snapshot, the restarting function
was called without the appropriate async job type. This caused that a
new sync job wasn't created and allowed races in the monitor.
Offline migration transfers inactive definition of a domain (which may
or may not be active). After successful completion, the domain remains
in its current state on source host and is defined but inactive on
destination host. It's a bit more clever than virDomainGetXMLDesc() on
source host followed by virDomainDefineXML() on destination host, as
offline migration will run pre-migration hook to update the domain XML
on destination host. Currently, copying non-shared storage is not
supported during offline migration.
Offline migration can be requested with a new migration flag called
VIR_MIGRATE_OFFLINE (which has to be combined with
VIR_MIGRATE_PERSIST_DEST flag).
This fixes a problem that showed up during testing of:
https://bugzilla.redhat.com/show_bug.cgi?id=881480
Due to a logic error in the function that gets the name of the bridge
an interface connects to, any time a bridge was specified directly
(type='bridge') rather than indirectly (type='network'), An error
would be logged (although the operation would then complete
successfully):
Network type 6 is not supported
The final virReportError() in the function
qemuDomainNetGetBridgeName() was apparently avoided in the past with a
"goto cleanup" at the end of each case, but the case of bridge somehow
no longer has that final goto cleanup.
The proper solution is anyway to not rely on goto's, but put the error
log inside an else {} clause, so that it's executed only if the type
is neither bridge nor network (in reality, this function should only
ever be called for those two types, that's why this is an internal
error).
While making this change, the error message was also tuned to be more
correct (since it's not really the type of the network, but the type
of the interface, and it *is* otherwise supported, it's just that the
interface type in question doesn't *have* a bridge device associated
with it, or at least we don't know how to get it).
If a network interface model is not specified, libvirt will run
into an unchecked NULL pointer coredump. On the other hand if
the empty model is ignored, a PCI bus address would be generated,
which is not supported by S390.
Since the only valid network type model for S390 is virtio,
we use this as the default value, which is the same for QEMU.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Things are supposed to look like:
<machine canonical='pc-0.12'>pc</machine>
But are currently swapped. This can cause many VMs to revert to having
machine type='pc' which will affect save/restore across qemu upgrades.
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
Unmanaged PCI devices were only leaked if pciDeviceListAdd failed but
managed devices were always leaked. And leaking PCI device is likely to
leave PCI config file descriptor open. This patch fixes
qemuReattachPciDevice to either free the PCI device or add it to the
inactivePciHostdevs list.
An attempt to attach device that is already attached to a domain results
in the following error:
virsh # attach-device rhel6 pci2 --persistent
error: Failed to attach device from pci2
error: invalid argument: device is already in the domain configuration
The "invalid argument" error code looks wrong, we usually use "operation
invalid" when the action cannot be done in current state.
Only one error in qemu_monitor was already using the relatively
new OPERATION_UNSUPPORTED error, even though it is a better fit
for all of the messages related to options that are unsupported
due to the version of qemu in use rather than due to a user's
XML or .conf file choice. Suggested by Osier Yang.
* src/qemu/qemu_monitor.c (qemuMonitorSendFileHandle)
(qemuMonitorAddHostNetwork, qemuMonitorRemoveHostNetwork)
(qemuMonitorAttachDrive, qemuMonitorDiskSnapshot)
(qemuMonitorDriveMirror, qemuMonitorTransaction)
(qemuMonitorBlockCommit, qemuMonitorDrivePivot)
(qemuMonitorBlockJob, qemuMonitorSystemWakeup)
(qemuMonitorGetVersion, qemuMonitorGetMachines)
(qemuMonitorGetCPUDefinitions, qemuMonitorGetCommands)
(qemuMonitorGetEvents, qemuMonitorGetKVMState)
(qemuMonitorGetObjectTypes, qemuMonitorGetObjectProps)
(qemuMonitorGetTargetArch): Use better error category.
Without this patch, attempts to create a disk snapshot when qemu
is too old results in a cryptic message:
virsh # snapshot-create 23 --disk-only
error: operation failed: Failed to take snapshot: unknown command: 'snapshot_blkdev'
Now it reports:
virsh # snapshot-create 23 --disk-only
error: unsupported configuration: live disk snapshot not supported with this QEMU binary
All versions of qemu that support live disk snapshot also support
QMP (basically upstream qemu 1.1 and later, and backports to RHEL 6.2).
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DISK_SNAPSHOT): New
capability.
* src/qemu/qemu_capabilities.c (qemuCaps): Track it.
(qemuCapsProbeQMPCommands): Set it.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive): Use
it.
* src/qemu/qemu_monitor.c (qemuMonitorDiskSnapshot): Simplify.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskSnapshot):
Likewise.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextDiskSnapshot):
Delete.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextDiskSnapshot):
Likewise.
Currently to deal with auto-shutdown libvirtd must periodically
poll all stateful drivers. Thus sucks because it requires
acquiring both the driver lock and locks on every single virtual
machine. Instead pass in a "inhibit" callback to virStateInitialize
which drivers can invoke whenever they want to inhibit shutdown
due to existance of active VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since we can't (currently) rely on the ability to provide blanket
support for all possible network changes by calling the toplevel
netdev hostside disconnect/connect functions (due to qemu only
supporting a lockstep between initialization of host side and guest
side of devices), in order to support live change of an interface's
nwfilter we need to make a special purpose function to only call the
nwfilter teardown and setup functions if the filter for an interface
(or its parameters) changes. The pattern is nearly identical to that
used to change the bridge that an interface is connected to.
This patch was inspired by a request from Guido Winkelmann
<guido@sagersystems.de>, who tested an earlier version.
The fact that only the guest agent, or ACPI flag can be used
when requesting reboot/shutdown is merely a limitation of the
QEMU driver impl at this time. Thus it should not be in
libvirt.c code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The default machine type must be stored in the first element of
the caps->machineTypes array. This was done for help output
parsing but not for QMP probing.
Added a helper function qemuSetDefaultMachine to apply the same
fix up for both probing methods.
Further, it was necessary to set caps->nmachineTypes after QMP
probing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=872292
Libvirt should not attempt to call a QMP command that has not been
documented in qemu.git - if future qemu introduces a command by the
same name but with subtly different semantics, then libvirt will be
broken when trying to use that command.
We also had some code that could never be reached - some of our
commands have an alternate for new vs. old qemu HMP commands; but
if we are new enough to support QMP, we only need a fallback to
the new HMP counterpart, and don't need to try for a QMP counterpart
for the old HMP version.
See also this attempt to convert the three snapshot commands to QMP:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01597.html
although it looks like that will still not happen before qemu 1.3.
That thread eventually decided that qemu would use the name
'save-vm' rather than 'savevm', which mitigates the fact that
libvirt's attempt to use a QMP 'savevm' would be broken, but we
might not be as lucky on the other commands.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONSetCPU)
(qemuMonitorJSONAddDrive, qemuMonitorJSONDriveDel)
(qemuMonitorJSONCreateSnapshot, qemuMonitorJSONLoadSnapshot)
(qemuMonitorJSONDeleteSnapshot): Use only HMP fallback for now.
(qemuMonitorJSONAddHostNetwork, qemuMonitorJSONRemoveHostNetwork)
(qemuMonitorJSONAttachDrive, qemuMonitorJSONGetGuestDriveAddress):
Delete; QMP implies QEMU_CAPS_DEVICE, which prefers AddNetdev,
RemoveNetdev, and AddDrive anyways (qemu_hotplug.c has all callers).
* src/qemu/qemu_monitor.c (qemuMonitorAddHostNetwork)
(qemuMonitorRemoveHostNetwork, qemuMonitorAttachDrive): Reflect
deleted commands.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONAddHostNetwork)
(qemuMonitorJSONRemoveHostNetwork, qemuMonitorJSONAttachDrive):
Likewise.
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
Commit 1b2ebf95 overlooked that there were two spots affected.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice):
Transfer backing chain before deletion.
* src/qemu/qemu_driver.c (qemuDomainDetachDeviceDiskLive): Fix
spacing (partly to ensure a different-looking patch).
This patch adds two labels and gets rid of a ton of duplicated code.
This patch also fixes some error message and switches most of them to
proper error reporting functions.
This patch adds macros to help retrieve configuration values from qemu
driver's configuration. Some configuration options are grouped
together in the process.
The virStateInitialize method and several cgroups methods were
using an 'int privileged' parameter or similar for dual-state
values. These are better represented with the bool type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As of 1a50ba2cb0 we fail to connect to the
monitor instead of getting an exit status != 0 from qemu itself. This
breaks capabilities probing for the non QMP case.
Replace the following names
* struct qemu_snap_remove with virQEMUSnapRemovePtr
* struct qemu_snap_reparent with virQEMUSnapReparentPtr
* struct qemu_save_header with virQEMUSaveHeaderPtr
* enum qemu_save_formats with virQEMUSaveFormat
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Remove the obsolete 'qemud' naming prefix and underscore
based type name. Introduce virQEMUDriverPtr as the replacement,
in common with LXC driver naming style
Throughout the code, we've always used VIR_DOMAIN_SHUTDOWN* flags
even for virDomainReboot() API and its implementation. Fortunately,
the appropriate macros has the same value. But if we want to keep
things consistent, we should be using the correct macros. This
patch doesn't break anything, luckily.
Error messages produced while dispatching guest agent commands didn't
have an apparent reference to the fact that they are dealing with guest
agent commands. This patch fixes up some of the messages to contain that
reference.
using qemu guest agent. As said in previous patch,
@mountPoint must be NULL and @flags zero because
qemu guest agent doesn't support these arguments
yet. If qemu learns them, we can start supporting
them as well.
With QMP capability probing, the version was not set.
virsh version returns:
...
Cannot extract running QEMU hypervisor version
This is fixed by computing caps->version from QMP major,
minor, micro values.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
QMP Capability probing will fail if QEMU cannot bind to the
QMP monitor socket in the qemu_driver->libDir directory.
That's because the child process is stripped of all
capabilities and this directory is chown'ed to the configured
QEMU user/group (normally qemu:qemu) by the QEMU driver.
To prevent this from happening, the driver startup will now pass
the QEMU uid and gid down to the capability probing code.
All capability probing invocations of QEMU will be run with
the configured QEMU uid instead of libvirtd's.
Furter, the pid file handling is moved to libvirt, as QEMU
cannot write to the qemu_driver->runDir (root:root). This also
means that the libvirt daemonizing must be used.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
If qemuMonitorOpenUnix is called without a related pid, i.e. for
QMP probing, a connect failure can happen as the result of a race.
Without a pid there is no retry and thus we give up too early.
This changes the code to retry if no pid is supplied.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Change some legacy function names to use 'qemu' as their
prefix instead of 'qemud' which was a hang over from when
the QEMU driver ran inside a separate daemon
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
The actual regression is due to a latent bug. The hot unplug code
was computing the set of files needing cgroup ACL revocation based
on the XML passed in by the user, rather than based on the domain's
details on which disk was being deleted. As long as the revoke
path was always recomputing the backing chain, this didn't really
matter; but now that we want to compute the chain exactly once and
remember that computation, we need to hang on to the backing chain
until after the revoke has happened.
* src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice):
Transfer backing chain before deletion.
This patch introduces the RNG schema and updates necessary data strucutures
to allow various hypervisors to make use of Gluster protocol as one of the
supported network disk backend. Next patch will add support to make use of
this feature in Qemu since it now supports Gluster protocol as one of the
network based storage backend.
Two new optional attributes for <host> element are introduced - 'transport'
and 'socket'. Valid transport values are tcp, unix or rdma. If none specified,
tcp is assumed. If transport is unix, socket specifies path to unix socket.
This patch allows users to specify disks on gluster backends like this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume1/image'>
<host name='example.org' port='6000' transport='tcp'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume2/image'>
<host transport='unix' socket='/path/to/sock'/>
</source>
<target dev='vdb' bus='virtio'/>
</disk>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Although we require various C99 features, we don't yet require a
complete C99 compiler. On RHEL 5, compilation complained:
qemu/qemu_command.c: In function 'qemuBuildGraphicsCommandLine':
qemu/qemu_command.c:4688: error: 'for' loop initial declaration used outside C99 mode
* src/qemu/qemu_command.c (qemuBuildGraphicsCommandLine): Declare
variable sooner.
* src/qemu/qemu_process.c (qemuProcessInitPasswords): Likewise.
The error "... but the cause is unknown" appeared for XMLs similar to
this:
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/dev/zero'/>
<target dev='sr0'/>
</disk>
Notice unsupported disk type (for the driver), but also no address
specified. The first part is not a problem and we should not abort
immediately because of that, but the combination with the address
unknown was causing an unspecified error.
While fixing this, I added an error to one place where this return
value was not managed properly.
I have been testing libvirt v1.0.0 for deployment within my
organization, and in the process discovered what appears to be a bug
that breaks virsh attach-device, when attaching an RBD volume to an
instance. First, here is the error presented, with v1.0.0 (this worked
in v0.10.2):
[root@host ~]# virsh attach-device W5APQ8 G84VV1.xml
error: Failed to attach device from G84VV1.xml
error: cannot open file 'dc3-1-test/G84VV1': No such file or directory
Using git bisect, I narrowed the problem down to this as the first
commit to break this setup:
4d34c92947 is the first bad commit
Upcoming patches for revert-and-clone branching of snapshots need
to be able to copy a domain definition; make this step reusable.
* src/conf/domain_conf.h (virDomainDefCopy): New prototype.
* src/conf/domain_conf.c (virDomainObjCopyPersistentDef): Split...
(virDomainDefCopy): ...into new function.
(virDomainObjSetDefTransient): Use it.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Use it.
This patch adds a helper to determine if snapshots are external and uses
the helper to fix detection of those in snapshot deletion code.
Snapshots are external if they have an external memory image or if the
disk locations are external. As mixed snapshots are forbidden for now
we need to check just one disk to know.
Currently, if user calls virDomainAbortJob we just issue
'migrate_cancel' and hope for the best. However, if user calls
the API in wrong phase when migration hasn't been started yet
(perform phase) the cancel request is just ignored. With this
patch, the request is remembered and as soon as perform phase
starts, migration is cancelled.
For S390, the default console target type cannot be of type 'serial'.
It is necessary to at least interpret the 'arch' attribute
value of the os/type element to produce the correct default type.
Therefore we need to extend the signature of defaultConsoleTargetType
to account for architecture. As a consequence all the drivers
supporting this capability function must be updated.
Despite the amount of changed files, the only change in behavior is
that for S390 the default console target type will be 'virtio'.
N.B.: A more future-proof approach could be to to use hypervisor
specific capabilities to determine the best possible console type.
For instance one could add an opaque private data pointer to the
virCaps structure (in case of QEMU to hold capsCache) which could
then be passed to the defaultConsoleTargetType callback to determine
the console target type.
Seems to be however a bit overengineered for the use case...
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When the libvirt daemon is restarted it tries to reconnect to running
qemu domains. Since commit d38897a5d4 the
re-connection code runs in separate threads. In the original
implementation the maximum of domain ID's (that is used as an
initializer for numbering guests created next) while libvirt was
reconnecting to the guest.
With the threaded implementation this opens a possibility for race
conditions with the thread that is autostarting guests. When there's a
guest running with id 1 and the daemon is restarted. The autostart code
is reached first and spawns the first guest that should be autostarted
as id 1. This results into the following unwanted situation:
# virsh list
Id Name State
----------------------------------------------------
1 guest1 running
1 guest2 running
This patch extracts the detection code before the re-connection threads
are started so that the maximum id of the guests being reconnected to is
known.
The only semantic change created by this is if the guest with greatest ID
quits before we are able to reconnect it's ID is used anyway as the
greatest one as without this patch the greatest ID of a process we could
successfuly reconnect to would be used.
This patch adds support for external disk snapshots of inactive domains.
The snapshot is created by calling using qemu-img by calling:
qemu-img create -f format_of_snapshot -o
backing_file=/path/to/src,backing_fmt=format_of_backing_image
/path/to/snapshot
in case the backing image format is known or probing is allowed and
otherwise:
qemu-img create -f format_of_snapshot -o backing_file=/path/to/src
/path/to/snapshot
on each of the disks selected for snapshotting. This patch also modifies
the snapshot preparing function to support creating external snapshots
and to sanitize arguments. For now the user isn't able to mix external
and internal snapshots but this restriction might be lifted in the
future.
Some operations, APIs needs domain to be paused prior operation can be
performed, e.g. (managed-) save of a domain. The processors should be
restored in the end. However, if 'cont' fails for some reason, we log a
message but this is not sufficient as an event should be emitted as
well. Mgmt application can then decide what to do.
The code that was split out into the qemuDomainSaveMemory expands the
pointer containing the XML description of the domain that it gets from
higher layers. If the pointer changes the old one is invalid and the
upper layer function tries to free it causing an abort.
This patch changes the expansion of the original string to a new
allocation and copy of the contents.
qemu is sensitive to the order of arguments passed. Hence, if a
device requires a controller, the controller cmd string must
precede device cmd string. The same apply for controllers, when
for instance ccid controller requires usb controller. So
controllers create partial ordering in which they should be added
to qemu cmd line.
Some of the pre-snapshot check have restrictions wired in regarding
configuration options that influence taking of external checkpoints.
This patch removes restrictions that would inhibit taking of such a
snapshot.
This patch adds support to take external system checkpoints.
The functionality is layered on top of the previous disk-only snapshot
code. When the checkpoint is requested the domain memory is saved to the
memory image file using migration to file. (The user may specify to
take the memory image while the guest is live with the
VIR_DOMAIN_SNAPSHOT_CREATE_LIVE flag.)
The memory save image shares format with the image created by
virDomainSave() API.
Before now, libvirt supported only internal snapshots for active guests.
This patch renames this function to qemuDomainSnapshotCreateActiveInternal
to prepare the grounds for external active snapshots.
The new external system checkpoints will require an async job while the
snapshot is taken. This patch adds QEMU_ASYNC_JOB_SNAPSHOT to track this
job type.
The code that saves domain memory by migration to file can be reused
while doing external checkpoints of a machine. This patch extracts the
common code and places it in a separate function.
When pausing the guest while migration is running (to speed up
convergence) the virDomainSuspend API checks if the migration job is
active before entering the job. This could cause a possible race if the
virDomainSuspend is called while the job is active but ends before the
Suspend API enters the job (this would require that the migration is
aborted). This would cause a incorrect event to be emitted.
Both system checkpoint snapshots and disk snapshots were iterating
over all disks, doing a final sanity check before doing any work.
But since future patches will allow offline snapshots to be either
external or internal, it makes sense to share the pass over all
disks, and then relax restrictions in that pass as new modes are
implemented. Future patches can then handle external disks when
the domain is offline, then handle offline --disk-snapshot, and
finally, combine with migration to file to gain a complete external
system checkpoint snapshot of an active domain without using 'savevm'.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotIsAllowed): Merge...
(qemuDomainSnapshotPrepare): ...into one function.
(qemuDomainSnapshotCreateXML): Update caller.
Now that the XML supports listing internal snapshots, it is worth
always populating the <memory> and <disks> element to match.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Always
parse disk info and set memory info.
The libvirt coding standard is to use 'function(...args...)'
instead of 'function (...args...)'. A non-trivial number of
places did not follow this rule and are fixed in this patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=871273
when using virsh qemu-attach to attach an existing qemu process,
if it misses the -M option in qemu command line, libvirtd crashed
because the NULL value of def->os.machine in later use.
Example:
/usr/libexec/qemu-kvm -name foo \
-cdrom /var/lib/libvirt/images/boot.img \
-monitor unix:/tmp/demo,server,nowait \
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
This patch tries to set default machine type if the value of
def->os.machine is still NULL after qemu command line parsing.
Per the code comment in qemuCapsInitQMPBasic() and commit 43e23c7, we
should only use QMP for capabilities probing starting with 1.2 and
newer. The old code had dead logic that probed on 1.0 and newer.
Signed-off-by: Eric Blake <eblake@redhat.com>
The string comparison logic was inverted and matched the first drive
that does *not* have the name we search for.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The QEMU -drive id= begins with libvirt's QEMU host drive prefix
("drive-"), which is stripped off in several places two convert between
host ("-drive") and guest ("-device") device names.
In the case of BlkIoTune it is unnecessary to strip the QEMU host drive
prefix because we operate on "info block"/"query-block" output that uses
host drive names.
Stripping the prefix incorrectly caused string comparisons to fail since
we were comparing the guest device name against the host device name.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
QEMU uses 'i386' for its 32-bit x86 architecture, but libvirt
wants that to be 'i686', so we must fix it up
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=871756
Commit cd1e8d1 assumed that systems new enough to have journald
also have mkostemp; but this is not true for uclibc.
For that matter, use of mkstemp[s] is unsafe in a multi-threaded
program. We should prefer mkostemp[s] in the first place.
* bootstrap.conf (gnulib_modules): Add mkostemp, mkostemps; drop
mkstemp and mkstemps.
* cfg.mk (sc_prohibit_mkstemp): New syntax check.
* tools/virsh.c (vshEditWriteToTempFile): Adjust caller.
* src/qemu/qemu_driver.c (qemuDomainScreenshot)
(qemudDomainMemoryPeek): Likewise.
* src/secret/secret_driver.c (replaceFile): Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainScreenshot): Likewise.
https://bugzilla.redhat.com/show_bug.cgi?id=871312
Recent fixes made almost all the right steps to make emulator pinned
to the cpuset of the whole domain in case <emulatorpin> isn't
specified, but qemudDomainGetEmulatorPinInfo still reports all the
CPUs even when cpuset is specified. This patch fixes that.
When there is no 'qemu-kvm' binary and the emulator used for a machine
is, for example, 'qemu-system-x86_64' that, by default, runs without
kvm enabled, libvirt still supplies '-no-kvm' option to this process,
even though it does not recognize such option (making the start of a
domain fail in that case).
This patch fixes building a command-line for QEMU machines without KVM
acceleration and is based on following assumptions:
- QEMU_CAPS_KVM flag means that QEMU is running KVM accelerated
machines by default (without explicitly requesting that using a
command-line option). It is the closest to the truth according to
the code with the only exception being the comment next to the
flag, so it's fixed in this patch as well.
- QEMU_CAPS_ENABLE_KVM flag means that QEMU is, by default, running
without KVM acceleration and in case we need KVM acceleration it
needs to be explicitly instructed to do so. This is partially
true for the past (this option essentially means that QEMU
recognizes the '-enable-kvm' option, even though it's almost the
same).
Currently, we use iohelper when saving/restoring a domain.
However, if there's some kind of error (like I/O) it is not
propagated to libvirt. Since it is not qemu who is doing
the actual write() it will not get error. The iohelper does.
Therefore we should check for iohelper errors as it makes
libvirt more user friendly.
In the XML warning, we print a virsh command line that can be used to
edit that XML. This patch prints UUIDs if the entity name contains
special characters (like shell metacharacters, or "--" that would break
parsing of the XML comment). If the entity doesn't have a UUID, just
print the virsh command that can be used to edit it.
When using block copy to pivot over to a new chain, the backing files
for the new chain might still need labeling (particularly if the user
passes --reuse-ext with a relative backing file name). Relabeling a
file that is already labeled won't hurt, so this just labels the entire
chain at the point of the pivot. Doing the relabel of the chain uses
the fact that we already safely probed the file type of an external
file at the start of the block copy.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Relabel chain before
asking qemu to pivot.
Use the recent addition of qemuDomainPrepareDiskChainElement to
obtain locking manager lease, permit a block device through cgroups,
and set the SELinux label; then audit the fact that we hand a new
file over to qemu. Alas, releasing the lease and label at the end
of the mirroring is a trickier prospect (we would have to trace the
backing chain of both source and destination, and be sure not to
revoke rights to any part of the chain that is shared), so for now,
virDomainBlockJobAbort still leaves things with additional access
granted (as block-pull and block-commit have the same problem of
not clamping access after completion, a future cleanup would cover
all three commands).
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling.
Support the REUSE_EXT flag, in part by copying sanity checks from
snapshot code. This code introduces a case of probing an external
file for its type; such an action would be a security risk if the
existing file is supposed to be raw but the contents resemble some
other format; however, since the virDomainBlockRebase API has a
flag to force treating the file as raw rather than probe, we can
assume that probing is safe in all other instances. Besides, if
we don't probe or force raw, then qemu will.
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT
flag.
(qemuDomainBlockCopy): Wire up flag, and add some sanity checks.
Minimal patch to wire up all the pieces in the previous patches
to actually enable a block copy job. By minimal, I mean that
qemu creates the file (that is, no REUSE_EXT flag support yet),
SELinux must be disabled, a lock manager is not informed, and the
audit logs aren't updated. But those will be added as
improvements in future patches.
This patch is designed so that if we ever add a future API
virDomainBlockCopy with more bells and whistles (such as letting
the user specify a destination image format different than the
source), where virDomainBlockRebase is a wrapper around the
simpler portions of the new functionality, then the new API can
just reuse the new qemuDomainBlockCopy function and already
support _SHALLOW and _REUSE_EXT flags. Also note that libvirt.c
already filtered the new flags if _COPY is not present, so that
we are not impacting the case of BlockRebase being a wrapper
around BlockPull.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function.
(qemuDomainBlockRebase): Call it when appropriate.
Since libvirt drops locks between issuing a monitor command and
getting a response, it is possible for libvirtd to be restarted
before getting a response on a block-job-complete command; worse, it
is also possible for the guest to shut itself down during the window
while libvirtd is down, ending the qemu process. A management app
needs to know if the pivot happened (and the destination file
contains guest contents not in the source) or failed (and the source
file contains guest contents not in the destination), but since
the job is finished, 'query-block-jobs' no longer tracks the
status of the job, and if the qemu process itself has disappeared,
even 'query-block' cannot be checked to ask qemu its current state.
At the time of this patch, the design for persistent bitmap has not
been clarified, so a followup patch will be needed once qemu
actually figures out how to expose it, and we figure out how to use
it. In the meantime, we have a solution that avoids the worst of
the problem. [This problem was first analyzed with the RHEL 6.3
__com.redhat_drive-reopen command; which partly explains why
upstream qemu 1.3 ditched the drive-reopen idea and went with
block-job-complete plus persistent bitmap instead.]
If we surround 'drive-reopen' with a pause/resume pair, then we can
guarantee that the guest cannot modify either source or destination
files in the window of libvirtd uncertainty, and the management app
is guaranteed that either libvirt knows the outcome and reported it
correctly; or that on libvirtd restart, the guest will still be
paused and that the qemu process cannot have disappeared due to
guest shutdown; and use that as a clue that the management app must
implement recovery protocol, with both source and destination files
still being in sync and with 'query-block' still being an option as
part of that recovery. My testing shows that the pause window will
typically be only a fraction of a second.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Pause around
drive-reopen.
(qemuDomainBlockJobImpl): Update caller.
This is the bare minimum to end a copy job (of course, until a
later patch adds the ability to start a copy job, this patch
doesn't do much in isolation; I've just split the patches to
ease the review).
This patch intentionally avoids SELinux, lock manager, and audit
actions. Also, if libvirtd restarts at the exact moment that a
'block-job-complete' is in flight, the proposed proper way to
detect the outcome of that would be with a persistent bitmap and
some additional query commands when libvirtd restarts. This
patch is enough to test the common case of success when used
correctly, while saving the subtleties of proper cleanup for
worst-case errors for later.
When a mirror job is started, cancelling the job safely reverts back
to the source disk, regardless of whether the destination is in
phase 1 (streaming, in which case the destination is worthless) or
phase 2 (mirroring, in which case the destination is synced up to
the source at the time of the cancel). Our existing code does just
fine in either phase, other than some bookkeeping cleanup; this
implements live block copy.
Ideas for future enhancements via new flags:
Depending on when persistent bitmap support is added, it may be
worth adding a VIR_DOMAIN_REBASE_COPY_ATOMIC flag that fails up
front if we detect an older qemu with risky pivot operation.
Interesting side note: while snapshot-create --disk-only creates a
copy of the disk at a point in time by moving the domain on to a
new file (the copy is the file now in the just-extended backing
chain), blockjob --abort of a copy job creates a copy of the disk
while keeping the domain on the original file. There may be
potential improvements to the snapshot code to exploit block copy
over multiple disks all at one point in time. And, if
'block-job-cancel' were made part of 'transaction', you could
copy multiple disks at the same point in time without pausing
the domain. This also implies we may want to add a --quiesce flag
to virDomainBlockJobAbort, so that when breaking a mirror (whether
by cancel or pivot), the side of the mirror that we are abandoning
is at least in a stable state with regards to guest I/O.
* src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag.
(qemuDomainBlockPivot): New helper function.
(qemuDomainBlockJobImpl): Implement it.
Handle the new type of block copy event and info. Of course,
this patch does nothing until a later patch actually allows the
creation/abort of a block copy job.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_READY): New
block job status.
* src/libvirt.c (virDomainBlockRebase): Document the event.
* src/qemu/qemu_monitor_json.c (eventHandlers): New event.
(qemuMonitorJSONHandleBlockJobReady): New function.
(qemuMonitorJSONGetBlockJobInfoOne): Translate new job type.
(qemuMonitorJSONHandleBlockJobImpl): Handle new event and job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Recognize
the event to minimize snooping.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful
info query to save effort on a pivot request.
For now, disk migration via block copy job is not implemented in
libvirt. But when we do implement it, we have to deal with the
fact that qemu does not yet provide an easy way to re-start a qemu
process with mirroring still intact. Paolo has proposed an idea
for a persistent dirty bitmap that might make this possible, but
until that design is complete, it's hard to say what changes
libvirt would need. Even something like 'virDomainSave' becomes
hairy, if you realize the implications that 'virDomainRestore'
would be stuck with recreating the same mirror layout.
But if we step back and look at the bigger picture, we realize that
the initial client of live storage migration via disk mirroring is
oVirt, which always uses transient domains, and that if a transient
domain is destroyed while a mirror exists, oVirt can easily restart
the storage migration by creating a new domain that visits just the
source storage, with no loss in data.
We can make life a lot easier by being cowards for now, forbidding
certain operations on a domain. This patch guarantees that we
never get in a state where we would have to restart a domain with
a mirroring block copy, by preventing saves, snapshots, migration,
hot unplug of a disk in use, and conversion to a persistent domain
(thankfully, it is still relatively easy to 'virsh undefine' a
running domain to temporarily make it transient, run tests on
'virsh blockcopy', then 'virsh define' to restore the persistence).
Later, if the qemu design is enhanced, we can relax our code.
The change to qemudDomainDefine looks a bit odd for undoing an
assignment, rather than probing up front to avoid the assignment,
but this is because of how virDomainAssignDef combines both a
lookup and assignment into a single function call.
* src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype.
* src/conf/domain_conf.c (virDomainHasDiskMirror): New function.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous
actions while block copy is already in action.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
Upstream qemu 1.3 is adding two new monitor commands, 'drive-mirror'
and 'block-job-complete'[1], which can drive live block copy and
storage migration. [Additionally, RHEL 6.3 had backported an earlier
version of most of the same functionality, but under the names
'__com.redhat_drive-mirror' and '__com.redhat_drive-reopen' and with
slightly different JSON arguments, and has been using patches similar
to these upstream patches for several months now.]
The libvirt API virDomainBlockRebase as already committed for 0.9.12
is flexible enough to expose the basics of block copy, but some
additional features in the 'drive-mirror' qemu command, such as
setting error policy, setting granularity, or using a persistent
bitmap, may later require a new libvirt API virDomainBlockCopy. I
will wait to add that API until we know more about what qemu 1.3
will finally provide.
This patch caters only to the upstream qemu 1.3 interface, although
I have proven that the changes for RHEL 6.3 can be isolated to
just qemu_monitor_json.c, and the rest of this series will
gracefully handle either interface once the JSON differences are
papered over in a downstream patch.
For consistency with other block job commands, libvirt must handle
the bandwidth argument as MiB/sec from the user, even though qemu
exposes the speed argument as bytes/sec; then again, qemu rounds
up to cluster size internally, so using MiB hides the worst effects
of that rounding if you pass small numbers.
[1]https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04123.html
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR)
(QEMU_CAPS_DRIVE_REOPEN): New bits.
* src/qemu/qemu_capabilities.c (qemuCaps): Name them.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
them.
(qemuMonitorJSONDriveMirror, qemuMonitorDrivePivot): New functions.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror)
(qemuMonitorDrivePivot): Declare them.
* src/qemu/qemu_monitor.c (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): New passthroughs.
* src/qemu/qemu_monitor.h (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): Declare them.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=862515
which describes inconsistencies in dealing with duplicate mac
addresses on network devices in a domain.
(at any rate, it resolves *almost* everything, and prints out an
informative error message for the one problem that isn't solved, but
has a workaround.)
A synopsis of the problems:
1) you can't do a persistent attach-interface of a device with a mac
address that matches an existing device.
2) you *can* do a live attach-interface of such a device.
3) you *can* directly edit a domain and put in two devices with
matching mac addresses.
4) When running virsh detach-device (live or config), only MAC address
is checked when matching the device to remove, so the first device
with the desired mac address will be removed. This isn't always the
one that's wanted.
5) when running virsh detach-interface (live or config), the only two
items that can be specified to match against are mac address and model
type (virtio, etc) - if multiple netdevs match both of those
attributes, it again just finds the first one added and assumes that
is the only match.
Since it is completely valid to have multiple network devices with the
same MAC address (although it can cause problems in many cases, there
*are* valid use cases), what is needed is:
1) remove the restriction that prohibits doing a persistent add of a
netdev with a duplicate mac address.
2) enhance the backend of virDomainDetachDeviceFlags to check for
something that *is* guaranteed unique (but still work with just mac
address, as long as it yields only a single results.
This patch does three things:
1) removes the check for duplicate mac address during a persistent
netdev attach.
2) unifies the searching for both live and config detach of netdevices
in the subordinate functions of qemuDomainModifyDeviceFlags() to use the
new function virDomainNetFindIdx (which matches mac address and PCI
address if available, checking for duplicates if only mac address was
specified). This function returns -2 if multiple matches are found,
allowing the callers to print out an appropriate message.
Steps 1 & 2 are enough to fully fix the problem when using virsh
attach-device and detach-device (which require an XML description of
the device rather than a bunch of commandline args)
3) modifies the virsh detach-interface command to check for multiple
matches of mac address and show an error message suggesting use of the
detach-device command in cases where there are multiple matching mac
addresses.
Later we should decide how we want to input a PCI address on the virsh
commandline, and enhance detach-interface to take a --address option,
eliminating the need to use detach-device
* src/conf/domain_conf.c
* src/conf/domain_conf.h
* src/libvirt_private.syms
* added new virDomainNetFindIdx function
* removed now unused virDomainNetIndexByMac and
virDomainNetRemoveByMac
* src/qemu/qemu_driver.c
* remove check for duplicate max from qemuDomainAttachDeviceConfig
* use virDomainNetFindIdx/virDomainNetRemove instead
of virDomainNetRemoveByMac in qemuDomainDetachDeviceConfig
* use virDomainNetFindIdx instead of virDomainIndexByMac
in qemuDomainUpdateDeviceConfig
* src/qemu/qemu_hotplug.c
* use virDomainNetFindIdx instead of a homespun loop in
qemuDomainDetachNetDevice.
* tools/virsh-domain.c: modified detach-interface command as described
above
It turns out that the cpuacct results properly account for offline
cpus, and always returns results for every possible cpu, not just
the online ones. So there is no need to check the map of online
cpus in the first place, merely only a need to know the maximum
possible cpu. Meanwhile, virNodeGetCPUBitmap had a subtle change
from returning the maximum id to instead returning the width of
the bitmap (one larger than the maximum id) in commit 2f4c5338,
which made this code encounter some off-by-one logic leading to
bad error messages when a cpu was offline:
$ virsh cpu-stats dom
error: Failed to virDomainGetCPUStats()
error: An error occurred, but the cause is unknown
Cleaning this up unraveled a chain of other unused variables.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Drop
pointless check for cpumap changes, and use correct number of
cpus. Simplify signature.
(qemuDomainGetCPUStats): Adjust caller.
* src/nodeinfo.h (nodeGetCPUCount): New prototype.
(nodeGetCPUBitmap): Drop unused parameter.
* src/nodeinfo.c (nodeGetCPUBitmap): Likewise.
(nodeGetCPUMap): Adjust caller.
(nodeGetCPUCount): New function.
* src/libvirt_private.syms (nodeinfo.h): Export it.
Callers should not need to know what the name of the file to
be read in the Linux-specific version of nodeGetCPUmap;
furthermore, qemu cares about online cpus, not present cpus,
when determining which cpus to skip.
While at it, I fixed the fact that we were computing the maximum
online cpu id by doing a slow iteration, when what we really want
to know is the max available cpu.
* src/nodeinfo.h (nodeGetCPUmap): Rename...
(nodeGetCPUBitmap): ...and simplify signature.
* src/nodeinfo.c (linuxParseCPUmax): New function.
(linuxParseCPUmap): Simplify and alter signature.
(nodeGetCPUBitmap): Change implementation.
* src/libvirt_private.syms (nodeinfo.h): Reflect rename.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Update
caller.
On one hand, numad probably will manage the affinity of domain process
dynamically in future. On the other hand, even numad won't manage it,
it still could confusion. Let's make things simpler enough to avoid
the lair for now.
When the cpu placement model is "auto", it sets the affinity for
domain process with the advisory nodeset from numad, however,
creating cgroup for the domain process (called emulator thread
in some contexts) later overrides that with pinning it to all
available pCPUs.
How to reproduce:
* Configure the domain with "auto" placement for <vcpu>, e.g.
<vcpu placement='auto'>4</vcpu>
* % virsh start dom
* % cat /proc/$dompid/status
Though the emulator cgroup cause conflicts, but we can't simply
prohibit creating it, as other tunables are still useful, such
as "emulator_period", which is used by API
virDomainSetSchedulerParameter. So this patch doesn't prohibit
creating the emulator cgroup, but inherit the nodeset from numad,
and reset the affinity for domain process.
* src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator
to accept the passed nodenet
* src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset
Abstract the codes to prepare cpumap into a helper a function,
which can be used later.
* src/qemu/qemu_process.h: Declare qemuPrepareCpumap
* src/qemu/qemu_process.c: Implement qemuPrepareCpumap, and use it.
Transport Open vSwitch per-port data during live
migration by using the utility functions
virNetDevOpenvswitchGetMigrateData() and
virNetDevOpenvswitchSetMigrateData().
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Add the ability for the Qemu V3 migration protocol to
include transporting network configuration. A generic
framework is proposed with this patch to allow for the
transfer of opaque data.
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Laine Stump <laine@laine.org>
The snapshot code when reusing an existing file had hard-to-read
logic, as well as a missing sanity check: REUSE_EXT should require
the destination to already be present.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare): Require
destination on REUSE_EXT, rename variable for legibility.
Currently it's assumed that qemu always supports VNC, however it is
definitely possible to compile qemu without VNC support so we should at
the very least check for it and handle that correctly.
Yet another instance of where using plain open() mishandles files
that live on root-squash NFS, and where improving the API can
improve the chance of a successful probe.
* src/util/storage_file.h (virStorageFileProbeFormat): Alter
signature.
* src/util/storage_file.c (virStorageFileProbeFormat): Use better
method for opening file.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Update caller.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
In v2 migration protocol, XML is obtained by calling domainGetXMLDesc.
This includes the default USB controller in XML, which breaks migration
to older libvirt (before 0.9.2).
Commit 409b5f5495
qemu: Emit compatible XML when migrating a domain
only fixed this for v3 migration.
This patch uses the new VIR_DOMAIN_XML_MIGRATABLE flag (detected by
VIR_DRV_FEATURE_XML_MIGRATABLE) to obtain XML without the default controller,
enabling backward v2 migration.
As we switched to setting capabilities based on QMP communication,
qemu seamless-migration capability was not set. In the -help output
this knob is called seamless-migration=[on|off]. The equivalent in
QMP world is SPICE_MIGRATE_COMPLETED event (qemu upstream commit
2fdd16e2).
Gcc with optimization warns:
../../src/qemu/qemu_driver.c: In function 'qemuDomainBlockCommit':
../../src/qemu/qemu_driver.c:12813:46: error: 'disk' may be used uninitialized in this function [-Werror=maybe-uninitialized]
../../src/qemu/qemu_driver.c:12698:25: note: 'disk' was declared here
cc1: all warnings being treated as errors
so obviously I had only been testing with optimization off.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Guard cleanup.
I finally have all the pieces in place to perform a block-commit with
SELinux enforcing. There's still missing cleanup work when the commit
completes, but doing that requires tracking both the backing chain and
the base and top files within that chain in domain XML across libvirtd
restarts. Furthermore, from a security standpoint, once you have
granted access, you must assume any damage that can be done will be
done; later revoking access is nice to minimize the window of damage,
but less important as it does not affect the fact that damage can be
done in the first place. Therefore, deferring the revoke efforts until
we have better XML tracking of what chain operations are in effect,
including across a libvirtd restart, is reasonable.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Label disks as
needed.
(qemuDomainPrepareDiskChainElement): Cast away const.
Previously, snapshot code did its own permission granting (lock
manager, cgroup device controller, and security manager labeling)
inline. But now that we are adding block-commit and block-copy
which also have to change permissions, it's better to reuse
common code for the task. While snapshot should fall back to
no access if read-write access failed, block-commit will want to
fall back to read-only access. The common code doesn't know
whether failure to grant read-write access should revert to no
access (snapshot, block-copy) or read-only access (block-commit).
This code can also be used to revoke access to unused files after
block-pull.
It might be nice to clean things up in a future patch by adding
new functions to the lock manager, cgroup manager, and security
manager that takes a single file name and applies context of a
disk to that file, rather than the current semantics of applying
context to the entire chain already associated to a disk. That
way, we could avoid the games this patch plays of temporarily
swapping out the disk->src and related fields of the disk. But
that would involve more code changes, so this patch really is
the smallest hack for doing the necessary work; besides, this
patch is more or less code motion (the hack was already employed
by the snapshot creation code, we are just making it reusable).
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateSingleDiskActive)
(qemuDomainSnapshotUndoSingleDiskActive): Refactor labeling hacks...
(qemuDomainPrepareDiskChainElement): ...into new function.
Now that we can crawl the chain of backing files, we can do
argument validation and implement the 'shallow' flag. In
testing this, I discovered that it can be handy to pass the
shallow flag and an explicit base, as a means of validating
that the base is indeed the file we expected.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Crawl through
chain to implement shallow flag.
* src/libvirt.c (virDomainBlockCommit): Relax API.
This is the bare minimum to kick off a block commit. In particular,
flags support is missing (shallow requires us to crawl the backing
chain to determine the file name to pass to the qemu monitor command;
delete requires us to track what needs to be deleted at the time
the completion event fires). Also, we are relying on qemu to do
error checking (such as validating 'top' and 'base' as being members
of the backing chain), including the fact that the current qemu code
does not support committing the active layer (although it is still
planned to add that before qemu 1.3). Since the active layer won't
change, we have it easy and do not have to alter the domain XML.
Additionally, this will fail if SELinux is enforcing, because we fail
to grant qemu proper read/write access to the files it will modify.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): New function.
(qemuDriver): Register it.
qemu 1.3 will be adding a 'block-commit' monitor command, per
qemu.git commit ed61fc1. It matches nicely to the libvirt API
virDomainBlockCommit.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCK_COMMIT): New bit.
* src/qemu/qemu_capabilities.c (qemuCapsProbeQMPCommands): Set it.
* src/qemu/qemu_monitor.h (qemuMonitorBlockCommit): New prototype.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockCommit):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorBlockCommit): Implement it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockCommit):
Likewise.
(qemuMonitorJSONHandleBlockJobImpl)
(qemuMonitorJSONGetBlockJobInfoOne): Handle new event type.
We used to walk the backing file chain at least twice per disk,
once to set up cgroup device whitelisting, and once to set up
security labeling. Rather than walk the chain every iteration,
which possibly includes calls to fork() in order to open root-squashed
NFS files, we can exploit the cache of the previous patch.
* src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter
signature.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller
to supply backing chain via disk, if recursion is desired.
* src/security/security_dac.c
(virSecurityDACSetSecurityImageLabel): Adjust caller.
* src/security/security_selinux.c
(virSecuritySELinuxSetSecurityImageLabel): Likewise.
* src/security/virt-aa-helper.c (get_files): Likewise.
* src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup)
(qemuTeardownDiskCgroup): Likewise.
(qemuSetupCgroup): Pre-populate chain.
Technically, we should not be re-probing any file that qemu might
be currently writing to. As such, we should cache the backing
file chain prior to starting qemu. This patch adds the cache,
but does not use it until the next patch.
Ultimately, we want to also store the chain in domain XML, so that
it is remembered across libvirtd restarts, and so that the only
kosher way to modify the backing chain of an offline domain will be
through libvirt API calls, but we aren't there yet. So for now, we
merely invalidate the cache any time we do a live operation that
alters the chain (block-pull, block-commit, external disk snapshot),
as well as tear down the cache when the domain is not running.
* src/conf/domain_conf.h (_virDomainDiskDef): New field.
* src/conf/domain_conf.c (virDomainDiskDefFree): Clean new field.
* src/qemu/qemu_domain.h (qemuDomainDetermineDiskChain): New
prototype.
* src/qemu/qemu_domain.c (qemuDomainDetermineDiskChain): New
function.
* src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive)
(qemuDomainChangeDiskMediaLive): Pre-populate chain.
(qemuDomainSnapshotCreateSingleDiskActive): Uncache chain before
snapshot.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Update
chain after block pull.
Requiring pre-allocation was an unusual idiom. It allowed iteration
over the backing chain to use fewer mallocs, but made one-shot
clients harder to read. Also, this makes it easier for a future
patch to move away from opening fds on every iteration over the chain.
* src/util/storage_file.h (virStorageFileGetMetadataFromFD): Alter
signature.
* src/util/storage_file.c (virStorageFileGetMetadataFromFD): Allocate
return value.
(virStorageFileGetMetadata): Update clients.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
This is the last use of raw strings for disk formats throughout
the src/conf directory.
* src/conf/snapshot_conf.h (_virDomainSnapshotDiskDef): Store enum
rather than string for disk type.
* src/conf/snapshot_conf.c (virDomainSnapshotDiskDefClear)
(virDomainSnapshotDiskDefParseXML, virDomainSnapshotDefFormat):
Adjust users.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateSingleDiskActive): Likewise.
Express the default disk type as an enum, for easier handling.
* src/conf/capabilities.h (_virCaps): Store enum rather than
string for disk type.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): Adjust
clients.
* src/qemu/qemu_driver.c (qemuCreateCapabilities): Likewise.
When an image has no backing file, using VIR_STORAGE_FILE_AUTO
for its type is a bit confusing. Additionally, a future patch
would like to reserve a default value for the case of no file
type specified in the XML, but different from the current use
of -1 to imply probing, since probing is not always safe.
Also, a couple of file types were missing compared to supported
code: libxl supports 'vhd', and qemu supports 'fat' for directories
passed through as a file system.
* src/util/storage_file.h (virStorageFileFormat): Add
VIR_STORAGE_FILE_NONE, VIR_STORAGE_FILE_FAT, VIR_STORAGE_FILE_VHD.
* src/util/storage_file.c (virStorageFileMatchesVersion): Match
documentation when version probing not supported.
(cowGetBackingStore, qcowXGetBackingStore, qcow1GetBackingStore)
(qcow2GetBackingStoreFormat, qedGetBackingStore)
(virStorageFileGetMetadataFromBuf)
(virStorageFileGetMetadataFromFD): Take NONE into account.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/conf/storage_conf.c (virStorageVolumeFormatFromString): New
function.
(poolTypeInfo): Use it.
Relabeling tapfd right after the tap device is created.
qemuPhysIfaceConnect is common function called both for static
netdevs and for hotplug netdevs.
Having hostuuid in migration cookie is a nice bonus since it provides an
easy way of detecting migration to the same host. However, requiring it
breaks backward compatibility with older libvirt releases.
Recently, patches were added support for (managed)saving, restoring, and
migrating domains with host USB devices. However, qemu driver would
still forbid migration of such domains because qemuMigrationIsAllowed
was not updated.
If we can't probe the architecture from QMP we parse the architecture
from the qemu binaries name. This results in the architecture being i386
instead of i686 which then results in QEMU_CAPS_PCI_MULTIBUS being unset
which gives a broken qemu command line.
This probably didn't show up earlier since most of the time there's also
a /usr/bin/qemu around which results in i686 capabilities.
When libvirt cannot find a suitable CPU model for host CPU (easily
reproducible by running libvirt in a guest), it would not provide CPU
topology in capabilities XML either. Even though CPU topology is known
and can be queried by virNodeGetInfo. With this patch, CPU topology will
always be provided in capabilities XML regardless on the presence of CPU
model.
Currently we query-spice after the main migration has completed
before moving to next state. Qemu reports this as boolean (not
enclosed within quotes). Therefore it is not correct to use
virJSONValueObjectGetString but virJSONValueObjectGetBoolean instead.
The machine in the last output line of <qemu-binary> -M ?
was always reported as default machine even if this wasn't the
actual default. Trivial fix.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
According to our recent changes (clarifications), we should be pinning
qemu's emulator processes using the <vcpu> 'cpuset' attribute in case
there is no <emulatorpin> specified. This however doesn't work
entirely as expected and this patch should resolve all the remaining
issues.
When p2p migration fails early because qemuMigrationIsAllowed or
qemuMigrationIsSafe say migration should be cancelled, we fail to clear
the migration-out async job. As a result of that, further APIs called
for the same domain may fail with Timed out during operation: cannot
acquire state change lock.
Reported by Guido Winkelmann.
It should relabel tapfd of virtual network of type VIR_DOMAIN_NET_TYPE_DIRECT
rather than VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE
(commit ae368ebfcc introduced this bug)
Caution: The context of the two hunks is identical other than indentation.
Please be extremely cautious of where the patch gets applied.
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=851981
When using macvtap, a character device gets first created by
kernel with name /dev/tapN, its selinux context is:
system_u:object_r:device_t:s0
Shortly, when udev gets notification when new file is created
in /dev, it will then jump in and relabel this file back to the
expected default context:
system_u:object_r:tun_tap_device_t:s0
There is a time gap happened.
Sometimes, it will have migration failed, AVC error message:
type=AVC msg=audit(1349858424.233:42507): avc: denied { read write } for
pid=19926 comm="qemu-kvm" path="/dev/tap33" dev=devtmpfs ino=131524
scontext=unconfined_u:system_r:svirt_t:s0:c598,c908
tcontext=system_u:object_r:device_t:s0 tclass=chr_file
This patch will label the tapfd device before qemu process starts:
system_u:object_r:tun_tap_device_t:MCS(MCS from seclabel->label)
This patch adds support for SUSPEND_DISK event; both lifecycle and
separated. The support is added for QEMU, machines are changed to
PMSUSPENDED, but as QEMU sends SHUTDOWN afterwards, the state changes
to shut-off. This and much more needs to be done in order for libvirt
to work with transient devices, wake-ups etc. This patch is not
aiming for that functionality.
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=805071
to the extent that it can be resolved with current qemu functionality.
It attempts to detect as many situations as possible when the simple
operation of disconnecting an existing tap device from one bridge and
attaching it to another will satisfy the change requested in
virDomainUpdateDeviceFlags() for a network device. Before this patch,
that situation could only be detected if the pre-change interface
*and* the post-change interface definition were both "type='bridge'".
After this patch, it can also be detected if the before or after
interfaces are any combination of type='bridge' and type='network'
(the networks can be <forward mode='nat|route|bridge'>, as long as
they use a Linux host bridge and not macvtap connections).
This extra effort is especially useful since the recent discovery that
a netdev_del+netdev_add combo (to reconnect the network device with
completely different hostside configuration) doesn't work properly
with current qemu (1.2) unless it is accompanied by the matching
device_del+device_add - see this mailing list message for details:
http://lists.nongnu.org/archive/html/qemu-devel/2012-10/msg02355.html
(A slight modification of the patch referenced there has been prepared
to apply on top of this patch, but won't be pushed until qemu can be
made to work with it.)
* qemuDomainChangeNet needs access to the virDomainDeviceDef that
holds the new netdef (so that it can clear out the virDomainDeviceDef
if it ends up using the NetDef to replace the original), so the
virDomainNetDefPtr arg is replaced with a virDomainDeviceDefPtr.
* qemuDomainChangeNet previously checked for *some* changes to the
interface config, but this check was by no means complete. It was also
a bit disorganized.
This refactoring of the code is (I believe) complete in its check of
all NetDef attributes that might be changed, and either returns a
failure (for changes that are simply impossible), or sets one of three
flags:
needLinkStateChange - if the device link state needs to go up/down
needBridgeChange - if everything else is the same, but it needs
to be connected to a difference linux host
bridge
needReconnect - if the entire host side of the device needs
to be torn down and reconstructed (currently
non-working, as mentioned above)
Note that this function will refuse to make any change that requires
the *guest* side of the device to be detached (e.g. changing the PCI
address or mac address). Those would be disruptive enough to the guest
that it's reasonable to require an explicit detach/attach sequence
from the management application.
* As mentioned above, qemuDomainChangeNet also does its best to
understand when a simple change in attached bridge for the existing
tap device will work vs. the need to completely tear down/reconstruct
the host side of the device (including tap device).
This patch *does not* implement the "reconnect" code anyway - there is
a placeholder that turns that into an error. Rather, the purpose of
this patch is to replicate existing behavior with code that is ready
to have that functionality plugged in in a later patch.
* The expanded uses for qemuDomainChangeNetBridge meant that it needed
to be enhanced as well - it no longer replaces the original brname
string in olddev with the new brname; instead, it relies on the
caller to replace the *entire* olddev with newdev (since we've gone
to great lengths to assure they are functionally identical other
than the name of the bridge, this is now not only safe, but more
correct). Additionally, qemuDomainNetChangeBridge can now set the
bridge for type='network' interfaces as well as plain type='bridge'
interfaces. (Note that I had to make this change simultaneous to the
reorganization of qemuDomainChangeNet because the two are too
closely intertwined to separate).
The onlined vcpu pinning policy should inherit def->cpuset if
it's not specified explicitly, and the affinity should be set
in this case. Oppositely, the offlined vcpu pinning policy should
be free()'ed.
Various APIs use cgroup to either set or get the statistics of
host or guest. Hotplug or hot unplug new vcpus without creating
or removing the cgroup for the vcpus could cause problems for
those APIs. E.g.
% virsh vcpucount dom
maximum config 10
maximum live 10
current config 1
current live 1
% virsh setvcpu dom 2
% virsh schedinfo dom --set vcpu_quota=1000
Scheduler : posix
error: Unable to find vcpu cgroup for rhel6.2(vcpu: 1): No such file or
directory
This patch fixes the problem by creating cgroups for each of the
onlined vcpus, and destroying cgroups for each of the offlined
vcpus.
The comment stated that you may call qemuDomainObjBeginJobWithDriver
without passing qemud_driver to signal it's not locked.
qemuDomainObjBeginJobWithDriver still accesses the qemud_driver
structure and the lock singaling is done through a separate parameter.
Save/restore with passed through USB devices currently only works if the
USB device can be found at the same USB address where it used to be
before saving a domain. This makes sense in case a user explicitly
configure the USB address in domain XML. However, if the device was
found automatically by vendor/product identification, we should try to
search for that device when restoring the domain and use any device we
find as long as there is only one available. In other words, the USB
device can now be removed and plugged again or the host can be rebooted
between saving and restoring the domain.
Using VIR_DOMAIN_XML_MIGRATABLE flag, one can request domain's XML
configuration that is suitable for migration or save/restore. Such XML
may contain extra run-time stuff internal to libvirt and some default
configuration may be removed for better compatibility of the XML with
older libvirt releases.
This flag may serve as an easy way to get the XML that can be passed
(after desired modifications) to APIs that accept custom XMLs, such as
virDomainMigrate{,ToURI}2 or virDomainSaveFlags.
All USB device lookup functions emit an error when they cannot find the
requested device. With this patch, their caller can choose if a missing
device is an error or normal condition.
The code which looks up a USB device specified by hostdev is duplicated
in two places. This patch creates a dedicated function that can be
called in both places.
When adding variants of parameter setting APIs which accepted
flags, the existing APIs were all adapted internally to pass
VIR_DOMAIN_AFFECT_CURRENT to the new API. The QEMU impl
qemuSetSchedularParameters was an exception, which instead
used VIR_DOMAIN_AFFECT_LIVE. Change this to match other
compatibility scenarios, so that calling
virDomainSetSchedularParameters(dom, params, nparams);
Has the same semantics as
virDomainSetSchedularParametersFlags(dom, params, nparams, 0);
And
virDomainSetSchedularParametersFlags(dom, params, nparams, VIR_DOMAIN_AFFECT_CURRENT);
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As a side effect of changes in the functions virGetUserID and
virGetGroupID, the user and group configurations for DAC in qemu.conf
are now able to accept both names and IDs, supporting a leading plus
sign to ensure that a numeric value will not be interpreted as a name.
This patch updates the comments in qemu.conf, including a description of
this new behavior.
With the recent introduction of QMP capabilities probing, libvirt failed
to detect support for QXL graphics in QEMU 1.2 and newer. In addition to
fixing that, this patch also causes libvirt to detect QXL support for
qemu-kvm-0.13.0, which doesn't advertise it in -help output but mentions
it in device list. Since qemu-kvm-0.13.0 supported -spice, it looks like
not having qxl in -help was a bug.
When both kvmclock and kvm_pv_eoi are configured (either disabled or
enabled) libvirt will generate invalid CPU specification due to the
fact that even though kvmclock causes the CPU to be specified, it
doesn't set have_cpu flag to true (and the new kvm_pv_eoi as well).
This patch fixes the issue and adds a test exactly for that to show
that it is fixed correctly (and also to keep it that way in the future
of course).
Introduced in commit 0caccb58.
CC libvirt_driver_qemu_impl_la-qemu_capabilities.lo
../../src/qemu/qemu_capabilities.c: In function 'qemuCapsInitQMP':
../../src/qemu/qemu_capabilities.c:2327:13: error: format '%d' expects argument of type 'int', but argument 8 has type 'const char *' [-Werror=format]
* src/qemu/qemu_capabilities.c (qemuCapsInitQMP): Use correct format.
Since libvirt switched to QMP capabilities probing recently, it starts
QEMU process used for this probing with -daemonize, which means
virCommandAbort can no longer reach these processes. As a result of
that, restarting libvirtd will leave several new QEMU processes behind.
Let's use QEMU's -pidfile and use it to kill the process when QMP caps
probing is done.
When doing snapshots, the filesystem freeze function used the agent
entering function that expects the qemud_driver unlocked. This might
cause a deadlock of the qemu driver if the agent does not respond.
The only call path of this function has the qemud_driver locked, so this
patch changes the entering functions to those expecting the driver
locked.
Start a QEMU process using
$QEMU -S -no-user-config -nodefaults \
-nographic -M none -qmp unix:/some/path,server,nowait
and talk QMP over stdio to discover what capabilities the
binary supports. This works for QEMU 1.2.0 or later and
for older QEMU automatically fallback to the old approach
of parsing -help and related command line args.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some architectures provide the query-cpu-definitions command,
but are set to always return a "GenericError" from it :-(
Catch this & treat it as if there was an empty list of CPUs
returned
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After calling qemuMonitorClose(), it is still possible for
the QEMU monitor I/O event callback to get invoked. This
will trigger an error message because mon->fd has been set
to -1 at this point. Silently ignore the case where mon->fd
is -1, likewise for mon->watch being zero.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuMonitorOpen method only needs a virDomainObjPtr in order
to access the QEMU pid. This is not critical when detecting the
QEMU capabilties, so can easily be skipped
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The help output for QEMU 1.2.0 changed 'pci-assign' to 'kvm-pci-assign'.
Since the new capabilities code does exact device name matching
instead of substring matching, this caused the capabilities to go
missing.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the qemuCapsParseDeviceStr method has a bunch of open
coded string searches/comparisons to detect devices and their
properties. Soon this data will be obtained from QMP queries
instead of -device help output. Maintaining the list of device
and properties in two places is undesirable. Thus the existing
qemuCapsParseDeviceStr() method needs to be refactored to
separate the device types and properties from the actual
search code.
Thus the -device help output is now parsed to construct a
list of device names, and device properties. These are then
checked against a set of datatables to set the capability
flags
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If QEMU reports CommandNotFound for the 'query-events' command,
we must treat that as success, returning a zero-length array
of events
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuMonitorSetCapabilities() API is used to initialize the QMP
protocol capabilities. It has since been abused to initialize some
libvirt internal capabilities based on command/event existance too.
Move the latter code out into qemuCapsProbeQMP() in the QEMU
capabilities source file instead
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemu monitor does not require qemu_conf.h, and the
qemu capabilities code actually wants bitmap.h
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetTargetArch() method to support invocation
of the 'query-target' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetObjectProps() method to support invocation
of the 'device-list-properties' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetObjectTypes() method to support invocation
of the 'qom-list-types' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetEvents() method to support invocation
of the 'query-events' JSON monitor command. No HMP equivalent
is required, since this will only be used when JSON is available
The existing qemuMonitorJSONCheckEvents() method is refactored
to use this new method
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetCPUCommands() method to support invocation
of the 'query-commands' JSON monitor command. No HMP equivalent
is required, since this will only be used when JSON is available
The existing qemuMonitorJSONCheckCommands() method is refactored
to use this new method
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetCPUDefinitions() method to support invocation
of the 'query-cpu-definitions' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetMachines() method to support invocation
of the 'query-machines' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetVersion() method to support invocation
of the 'query-version' JSON monitor command. No HMP equivalent
is provided, since this will only be used for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuCapsProbeMachineTypes & qemuCapsProbeCPUModels methods
do not need to be invoked directly anymore. Make them static
and refactor them to directly populate the qemuCapsPtr object
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When launching a QEMU guest the binary is probed to discover
the list of supported CPU names. Remove this probing with a
simple lookup of CPU models in the qemuCapsPtr object. This
avoids another invocation of the QEMU binary during the
startup path.
As a nice benefit we can now remove all the nasty hacks from
the test suite which were done to avoid having to exec QEMU
on the test system. The building of the -cpu command line
can just rely on data we pre-populate in qemuCapsPtr.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When XML for a new guest is received, the machine type is
immediately canonicalized into the version specific name.
This involves probing QEMU for supported machine types.
Replace this probing with a lookup of the machine types
in the (hopefully cached) qemuCapsPtr object
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Remove all use of the existing APIs for querying QEMU
capability flags. Instead obtain a qemuCapsPtr object
from the global cache. This avoids the execution of
'qemu -help' (and related commands) when launching new
guests.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When building up a virCapsPtr instance, the QEMU driver
was copying the list of machine types across from the
previous virCapsPtr instance, if the QEMU binary had not
changed. Replace this ad-hoc caching of data with use
of the new qemuCapsCache global cache.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a qemuCapsCachePtr object to provide a global cache
of capabilities for QEMU binaries. The cache auto-populates
on first request for capabilities about a binary, and will
auto-refresh if the binary has changed since a previous cache
was populated
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
For historical compat we use 'itanium' as the arch name, so
if the QEMU binary suffix is 'ia64' we need to translate it
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the qemuAgentClose method is called from a place which holds
the domain lock, it is theoretically possible to get a deadlock
in the agent destroy callback. This has not been observed, but
the equivalent code in the QEMU monitor destroy callback has seen
a deadlock.
Remove the redundant locking while unrefing the object and the
bogus assignment
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Many parts of virDomainDefPtr were using 'int' variables as
array length counts. Replace all these with size_t and update
various format strings & API signatures to adapt
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some users report (very rarely) seeing a deadlock in the QEMU
monitor callbacks
Thread 10 (Thread 0x7fcd11e20700 (LWP 26753)):
#0 0x00000030d0e0de4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000030d0e09ca6 in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00000030d0e09ba8 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fcd162f416d in virMutexLock (m=<optimized out>)
at util/threads-pthread.c:85
#4 0x00007fcd1632c651 in virDomainObjLock (obj=<optimized out>)
at conf/domain_conf.c:14256
#5 0x00007fcd0daf05cc in qemuProcessHandleMonitorDestroy (mon=0x7fcccc0029e0,
vm=0x7fcccc00a850) at qemu/qemu_process.c:1026
#6 0x00007fcd0db01710 in qemuMonitorDispose (obj=0x7fcccc0029e0)
at qemu/qemu_monitor.c:249
#7 0x00007fcd162fd4e3 in virObjectUnref (anyobj=<optimized out>)
at util/virobject.c:139
#8 0x00007fcd0db027a9 in qemuMonitorClose (mon=<optimized out>)
at qemu/qemu_monitor.c:860
#9 0x00007fcd0daf61ad in qemuProcessStop (driver=driver@entry=0x7fcd04079d50,
vm=vm@entry=0x7fcccc00a850,
reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED, flags=flags@entry=0)
at qemu/qemu_process.c:4057
#10 0x00007fcd0db323cf in qemuDomainDestroyFlags (dom=<optimized out>,
flags=<optimized out>) at qemu/qemu_driver.c:1977
#11 0x00007fcd1637ff51 in virDomainDestroyFlags (
domain=domain@entry=0x7fccf00c1830, flags=1) at libvirt.c:2256
At frame #10 we are holding the domain lock, we call into
qemuProcessStop() to cleanup QEMU, which triggers the monitor
to close, which invokes qemuProcessHandleMonitorDestroy() which
tries to obtain the domain lock again. This is a non-recursive
lock, hence hang.
Since qemuMonitorPtr is a virObject, the unref call in
qemuProcessHandleMonitorDestroy no longer needs mutex
protection. The assignment of priv->mon = NULL, can be
instead done by the caller of qemuMonitorClose(), thus
removing all need for locking.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If QEMU quits immediately after we opened the monitor it was
possible we would skip the clearing of the SELinux process
socket context
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In the cgroups APIs we have a virCgroupKillPainfully function
which does the loop sending SIGTERM, then SIGKILL and waiting
for the process to exit. There is similar functionality for
simple processes in qemuProcessKill, but it is tangled with
the QEMU code. Untangle it to provide a virProcessKillPainfuly
function
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When calling qemuProcessKill from the virDomainDestroy impl
in QEMU, do not ignore the return value. This ensures that
if QEMU fails to respond to SIGKILL, the caller will know
about the failure.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Recently, there have been some improvements made to qemu so it
supports seamless migration or something very close to it.
However, it requires libvirt interaction. Once qemu is migrated,
the SPICE server needs to send its internal state to the destination.
Once it's done, it fires SPICE_MIGRATE_COMPLETED event and this
fact is advertised in 'query-spice' output as well.
We must not kill qemu until SPICE server finishes the transfer.
SELinux wants all log files opened with O_APPEND. When
running non-root though, libvirtd likes to use O_TRUNC
to avoid log files growing in size indefinitely. Instead
of using O_TRUNC though, we can use O_APPEND and then
call ftruncate() which keeps SELinux happier.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are a number of process related functions spread
across multiple files. Start to consolidate them by
creating a virprocess.{c,h} file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In most of the snapshot API's there's no need to hold the driver lock
the whole time.
This patch adds helper functions that get the domain object in functions
that don't require the driver lock and simplifies call paths from
snapshot-related API's.
maxcpu and hostcpus are defined and calculated in qemudDomainPinVcpuFlags()
and qemudDomainPinEmulator(), but never used. So remove them including nodeinfo.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Disk hotplug is a two phase action: qemuMonitorAddDrive followed by
qemuMonitorAddDevice. When the first part succeeds but the second one
fails, we need to rollback the drive addition.
https://www.gnu.org/licenses/gpl-howto.html recommends that
the 'If not, see <url>.' phrase be a separate sentence.
* tests/securityselinuxhelper.c: Remove doubled line.
* tests/securityselinuxtest.c: Likewise.
* globally: s/; If/. If/
The "dump-guest-core' option is new option for the machine type
(-machine pc,dump-guest-core) that controls whether the guest memory
will be marked as dumpable.
While testing this, I've found out that the value for the '-M' options
is not parsed correctly when additional parameters are used. However,
when '-machine' is used for the same options, it gets parsed as
expected. That's why this patch also modifies the parsing and creating
of the command line, so both '-M' and '-machine' are recognized. In
QEMU's help there is only mention of the 'machine parameter now with
no sign of the older '-M'.
This patch cleans up building the "-boot" parameter and while on that
fixes one inconsistency by modifying these things:
- I completed the unfinished virDomainBootMenu enum by specifying
LAST, declaring it and also declaring the TypeFromString and
TypeToString parameters.
- Previously mentioned TypeFromString and TypeToString are used when
parsing the XML.
- Last, but not least, visible change is that the "-boot" parameter
is built and parsed properly:
- The "order=" prefix is used only when additional parameters are
used (menu, etc.).
- It's rewritten in a way that other parameters can be added
easily in the future (used in following patch).
- The "order=" parameter is properly parsed regardless to where it
is placed in the string (e.g. "menu=on,order=nc").
- The "menu=" parameter (and others in the future) are created
when they should be (i.e. even when bootindex is supported and
used, but not when bootloader is selected).
Currently, we mark domain PAUSED (but not emit an event)
just before we issue 'stop' on monitor; This command can
take ages to finish, esp. when domain's doing a lot of
IO - users can enforce qemu to open files with O_DIRECT
which doesn't return from write() until data reaches the
block device. Having said that, we report PAUSED even if
domain is not paused yet.
This series adds support to run QEMU with seccomp sandbox enabled. It can be
configured in qemu.conf to on, off, or the QEMU default, which is off in 1.2.
Default value is the QEMU default.
On agent EOF the qemuProcessHandleAgentEOF() callback is called
which locks virDomainObjPtr. Then qemuAgentClose() is called
(with domain object locked) which eventually calls qemuAgentDispose()
and qemuProcessHandleAgentDestroy(). This tries to lock the
domain object again. Hence the deadlock.
All of ide-drive, ide-hd, ide-cd, scsi-disk, scsi-hd, and scsi-cd
supports wwn property. (NB, scsi-block doesn't support to set wwn).
* src/qemu/qemu_command.c: Error out if underlying QEMU doesn't
support wwn property for the device; Set wwn for the device otherwise.
* tests/qemuxml2argvdata/qemuxml2argv-disk-ide-wwn.args: New test
* tests/qemuxml2argvdata/qemuxml2argv-disk-ide-wwn.xml: Likewise
* tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-disk-wwn.args: Likewise
* tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-disk-wwn.xml: Likewise
* tests/qemuxml2argvtest.c: Add the new tests.
This assumes ide-drive.wwn, ide-hd.wwn, ide-cd.wwn were supported
at the same time, similar for scsi-disk.wwn, scsi-hd.wwn, and
scsi-cd.wwn. So only two new caps (QEMU_CAPS_IDE_DRIVE_WWN,
and QEMU_CAPS_SCSI_DISK_WWN) are introduced.
Upstream qemu has raised a concern about whether dumping guest
memory by reading guest paging tables is a security hole:
https://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02607.html
While auditing libvirt to see if we would be impacted, I noticed
that we had some dead code. It is simpler to nuke the dead code
and limit our monitor code to just the subset we make use of.
* src/qemu/qemu_monitor.h (QEMU_MONITOR_DUMP): Drop poorly named
and mostly-unused enum.
* src/qemu/qemu_monitor.c (qemuMonitorDumpToFd): Drop arguments.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDump): Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDump): Likewise.
* src/qemu/qemu_driver.c (qemuDumpToFd): Update caller.
If the qemuBuildCommandLine method raised an error before the
virCommandPtr instance was created, the local var would not
be initialized, resulting in a possible SEGV in the error
cleanup branch. Also add some debugging of the method params
Introduce a qemuCapsNewForBinary() API which creates a new
QEMU capabilities object, populated with data relating to
a specific QEMU binary. The qemuCaps object is also given
a timestamp, which makes it possible to detect when the
cached capabilities for a binary are out of date
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Don't bother checking for the existance of the HMP passthrough
command. Just try to execute it, and propagate the failure.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuMonitorHMPCommand() API and things it calls will report
a wide variety of errors. The QEMU text monitor should not be
overwriting these errors
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch adds full support for EOI setting for domains. Because this
is CPU feature (flag), the model needs to be added even when it's not
specified. Fortunately this problem was already solved with kvmclock,
so this patch simply abuses that.
And due to the size of the patch (17 lines) I dared to include the tests.
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=843372
when qemu supports the 'transaction' monitor command,
and libvirt's --reuse-ext flag was not specified, libvirt created
a stub file with zero size in first place. After the failure of
QEMU transaction command performing qcow2 snapshots on more than
one drives, the stub file is left behind with non-empty
by the QEMU transaction command.
In order to unlink the file, the patch removes the file size checking.
Steps to reproduce the issue:
Steps:
1, Create a qemu instance with two drive images of qcow2 type (root user)
/usr/libexec/qemu-kvm -m 1024 -smp 1 -name "rhel6u1" \
-drive file=/var/lib/libvirt/images/firstqcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-drive file=/var/lib/libvirt/images/secondqcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 -qmp stdio
2, Initialize qemu qmp
{"execute":"qmp_capabilities"}
3, Remove the second drive image file
rm -f /var/lib/libvirt/images/secondqcow2
4, Run 'transaction' command with snapshot qemu commands in.
{"execute":"transaction","arguments":
{"actions":
[{"type":"blockdev-snapshot-sync","data":
{"device":"drive-virtio-disk0","snapshot-file":"/var/lib/libvirt/images/firstqcow2-snapshot.img","format":"qcow2"}
},
{"type":"blockdev-snapshot-sync","data":
{"device":"drive-virtio-disk1","snapshot-file":"/var/lib/libvirt/images/secondqcow2-snapshot.img","format":"qcow2"}
}]
},
"id":"libvirt-6"}
5, Got the error as follows:
{"id": "libvirt-6",
"error": {"class": "OpenFileFailed", "desc": "Could not open '/var/lib/libvirt/images/secondqcow2-snapshot.img'",
"data": {"filename": "/var/lib/libvirt/images/secondqcow2-snapshot.img"}
}
}
6, List first newly-created snapshot file:
-rw-r--r--. 1 root root 262144 Sep 13 11:43 firstqcow2-snapshot.img
The QEMU capabilities APIs used a misc of 'int' and
'unsigned int' for variables relating to array sizes.
Change all these to use 'size_t'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow each VM instance to record additional capabilities
without affecting other VMs, there needs to be a way to do
a deep copy of the qemuCapsPtr object
Add struct fields and APIs to allow the qemu capabilities object
to store version, arch, machines & cpu names, etc
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current qemu capabilities are stored in a virBitmapPtr
object, whose type is exposed to callers. We want to store
more data besides just the flags, so we need to move to a
struct type. This object will also need to be reference
counted, since we'll be maintaining a cache of data per
binary. This change introduces a 'qemuCapsPtr' virObject
class. Most of the change is just renaming types and
variables in all the callers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Technically speaking we should wait until we receive the QMP
greeting message before attempting to send any QMP monitor
commands. Mostly we've got away with this, but there is a race
in some QEMU which cause it to SEGV if you sent it data too
soon after startup. Waiting for the QMP greeting avoids the
race
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a qemu flag for USB redirection filter support.
The output:
usb-redir.chardev=chr
usb-redir.debug=uint8
usb-redir.filter=string
usb-redir.port=string
I got an off-list report about a bad diagnostic:
Target network card mac 52:54:00:49:07:ccdoes not match source 52:54:00:49:07:b8
True to form, I've added a syntax check rule to prevent it
from recurring, and found several other offenders.
* cfg.mk (sc_require_whitespace_in_translation): New rule.
* src/conf/domain_conf.c (virDomainNetDefCheckABIStability): Add
space.
* src/esx/esx_util.c (esxUtil_ParseUri): Likewise.
* src/qemu/qemu_command.c (qemuCollectPCIAddress): Likewise.
* src/qemu/qemu_driver.c (qemuDomainSetMetadata)
(qemuDomainGetMetadata): Likewise.
* src/qemu/qemu_hotplug.c (qemuDomainChangeNetBridge): Likewise.
* src/rpc/virnettlscontext.c
(virNetTLSContextCheckCertDNWhitelist): Likewise.
* src/vmware/vmware_driver.c (vmwareDomainResume): Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainGetXMLDesc, vboxAttachDrives):
Avoid false negatives.
* tools/virsh-domain.c (info_save_image_dumpxml): Reword.
Based on a report by Luwen Su.
Currently qemuMonitorOpen() requires an address of the QEMU
monitor. When doing QMP based capabilities detection it is
easier if a pre-opened FD can be provided, since then the
monitor can be run on the STDIO console. Add a new API
qemuMonitorOpenFD() for such usage
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add some non-null annotations to qemuMonitorOpen and also
check that the error callback is set, since it is mandatory
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The quota and period tunables for cpu scheduler accept only a certain
range of values. When changing the live configuration invalid values get
rejected. This check is not performed when changing persistent config.
This patch adds a separate range check, that improves error messages
when changing live config and adds the check for persistent config.
This check is done only when using the API. It is still possible to
specify invalid values in the XML.
This patch tries to clean the code up a little bit and shorten very long
lines.
The apparent semantic change from moving the condition before calling
the setter function is a non-issue here as the setter function is a
no-op when called with both arguments zero.
This is another fix for the emulator-pin series. When going through
the cputune pinning settings, the current code is trying to pin all
the CPUs, even when not all of them are specified. This causes error
in the subsequent function which, of course, cannot find the cpu to
pin. Since it's enough to pass the correct VCPU ID to the function,
the fix is trivial.
Only VNC_{{DIS,}CONNECTED,INITIALIZED} and SPICE_INITIALIZED events are
documented to support server/auth field and even there it is marked as
optional. Emit "" auth scheme in case QEMU didn't send it.
After discussion with DB we decided to rename the new iolimit
element as it creates the impression it would be there to
limit (i.e. throttle) I/O instead of specifying immutable
characteristics of a block device.
This is also backed by the fact that the term I/O Limits has
vanished from newer storage admin documentation.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When reboot using qemu guest agent was requested, qemu driver kept
waiting for SHUTDOWN event from qemu. However, such event is never
emitted during guest reboot and qemu driver would keep waiting forever.
This patch adds support for running qemu guests with the required
parameters to forcefully enable or disable BIOS advertising of S3 and
S4 states. The support for this is added to capabilities and there is
also a qemu command parameter parsing implemented.
Implementation of iolimits for the qemu driver with
capability probing for block size attribute and
command line generation for block sizes.
Including testcase for qemuxml2argvtest.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
After fixing the last review comments on remote port searching (commit
a14b4aea51), the commit right after that
wasn't modified accordingly, therefore two values weren't changed as
they should and the configurable ports don't work as expected.
This simple commit changes last two values missed and fixes the issue.
When domain XML contains any of the elements for setting up CPU
scheduling parameters (period, quota, emulator_period, or
emulator_quota) we need cpu cgroup to enforce the configuration.
However, the existing code would just ignore silently such settings if
either cgroups were not available at all cpu cgroup was not available.
Moreover, APIs for manipulating CPU scheduler parameters were already
failing if cpu cgroup was not available. This patch makes cpu cgroup
mandatory for all domains that use CPU scheduling elements in their XML.
The variable max_id is initialized again in the step of
getting cpu mapping variable map2. But in the next for loop
we still expect original value of max_id, the bug will
crash libvirtd when using on NUMA machine with big number
of cpus.
virDomainVcpuPinAdd does a realloc on vcpupin_list if the new vcpu pin
definition doesn't fit into the array. The list is an array of pointers
but the function definition didn't support returning the changed pointer
to the caller if it was realloced. This caused segfaults if realloc
would change the base pointer.
Commit d0c0e79ac6 left behind some dead
code (hasDAC can't be efectively set to true, because
virSecurityManagerNew fails to load the "dac" driver).
This patch also enhances the condition for adding the default
auto-detected security manager if the manager array is allocated but
empty.
Also the configuration file for qemu driver still contains reference to
the DAC driver that can't be enabled manualy.
Before commit 05447e3af4, qemuAgentCommand
blocked until it got a reply or appropriate event. When new parameter
was added to qemuAgentCommand in the above commit, all existing callers
of it were updated in a wrong way changing them from blocking to
5-seconds timeout.
This bug was revealed by the crash described in
https://bugzilla.redhat.com/show_bug.cgi?id=852383
The vlan info pointer sent to virNetDevOpenvswitchAddPort should never
be non-NULL unless there is at least one tag. The factthat such a vlan
info pointer was receveid pointed out that a caller was passing the
wrong pointer. Instead of sending &net->vlan, the result of
virDomainNetGetActualVlan(net) should be sent - that function will
look for vlan info in net->data.network.actual->vlan, and in cany case
return NULL instead of a pointer if the vlan info it finds has no
tags.
Aside from causing the crash, sending a hardcoded &net->vlan has the
effect of ignoring vlan info from a <network> or <portgroup> config.
If no 'security_driver' config option was set, then the code
just loaded the 'dac' security driver. This is a regression
on previous behaviour, where we would probe for a possible
security driver. ie default to SELinux if available.
This changes things so that it 'security_driver' is not set,
we once again do probing. For simplicity we also always
create the stack driver, even if there is only one driver
active.
The desired semantics are:
- security_driver not set
-> probe for selinux/apparmour/nop
-> auto-add DAC driver
- security_driver set to a string
-> add that one driver
-> auto-add DAC driver
- security_driver set to a list
-> add all drivers in list
-> auto-add DAC driver
It is not allowed, or possible to specify 'dac' in the
security_driver config param, since that is always
enabled.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The security driver loading code in qemu has a flaw that causes it to
register the DAC security driver twice. This causes problems (machines
unable to start) as the two DAC drivers clash together.
This patch refactors the code to allow loading the DAC driver even if
its specified in configuration (it can't be registered as a common
security driver), and does not add the driver twice.
If cgroups are enabled in general but cpu cgroup is disabled in
qemu.conf or not mounted at all, libvirt would refuse to start any
domain even though scheduler parameters are not set in domain XML.
This patch makes cpu cgroup mandatory only for domains that actually
want to use it.
With current flow in qemudDomainDefine we might lose data
when updating an existing domain. We parse given XML and
overwrite the configuration. Then we try to save the new
config. However, this step may fail and we don't perform any
roll back. In fact, we remove the domain from the list of
domains held up by qemu driver. This is okay as long as the
domain was brand new one.
Currently, when guest agent is configured but not responsive
(e.g. due to appropriate service not running in the guest)
we return VIR_ERR_INTERNAL_ERROR. Both are wrong. Therefore
we need to introduce new error code to reflect this case.
With the latest patches libvirt supports qemu agent monitor
passthrough. However, function in qemu driver is called
qemuDrvDomainAgentCommand. s/Drv// as used in all other names.
In my quest for reusing variables I failed to edit one variable when
fixing details between two patch versions. That results in a failure
to start qemu with autoport and spice tls, because qemu is trying to
bind two sockets to the same port.
Commit 4b03d59167 changed the pinning
behavior in a way that makes some machines non-startable.
The comment mentioning that we cannot control each vcpu when there is
not VCPU<-> PID mapping available is true, however, this isn't
necessarily an error, because this can be caused by old QEMU without
support for "query-cpus" command as well as a software emulated
machines that don't create more than one process.