Minimal patch to wire up all the pieces in the previous patches
to actually enable a block copy job. By minimal, I mean that
qemu creates the file (that is, no REUSE_EXT flag support yet),
SELinux must be disabled, a lock manager is not informed, and the
audit logs aren't updated. But those will be added as
improvements in future patches.
This patch is designed so that if we ever add a future API
virDomainBlockCopy with more bells and whistles (such as letting
the user specify a destination image format different than the
source), where virDomainBlockRebase is a wrapper around the
simpler portions of the new functionality, then the new API can
just reuse the new qemuDomainBlockCopy function and already
support _SHALLOW and _REUSE_EXT flags. Also note that libvirt.c
already filtered the new flags if _COPY is not present, so that
we are not impacting the case of BlockRebase being a wrapper
around BlockPull.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function.
(qemuDomainBlockRebase): Call it when appropriate.
Since libvirt drops locks between issuing a monitor command and
getting a response, it is possible for libvirtd to be restarted
before getting a response on a block-job-complete command; worse, it
is also possible for the guest to shut itself down during the window
while libvirtd is down, ending the qemu process. A management app
needs to know if the pivot happened (and the destination file
contains guest contents not in the source) or failed (and the source
file contains guest contents not in the destination), but since
the job is finished, 'query-block-jobs' no longer tracks the
status of the job, and if the qemu process itself has disappeared,
even 'query-block' cannot be checked to ask qemu its current state.
At the time of this patch, the design for persistent bitmap has not
been clarified, so a followup patch will be needed once qemu
actually figures out how to expose it, and we figure out how to use
it. In the meantime, we have a solution that avoids the worst of
the problem. [This problem was first analyzed with the RHEL 6.3
__com.redhat_drive-reopen command; which partly explains why
upstream qemu 1.3 ditched the drive-reopen idea and went with
block-job-complete plus persistent bitmap instead.]
If we surround 'drive-reopen' with a pause/resume pair, then we can
guarantee that the guest cannot modify either source or destination
files in the window of libvirtd uncertainty, and the management app
is guaranteed that either libvirt knows the outcome and reported it
correctly; or that on libvirtd restart, the guest will still be
paused and that the qemu process cannot have disappeared due to
guest shutdown; and use that as a clue that the management app must
implement recovery protocol, with both source and destination files
still being in sync and with 'query-block' still being an option as
part of that recovery. My testing shows that the pause window will
typically be only a fraction of a second.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Pause around
drive-reopen.
(qemuDomainBlockJobImpl): Update caller.
This is the bare minimum to end a copy job (of course, until a
later patch adds the ability to start a copy job, this patch
doesn't do much in isolation; I've just split the patches to
ease the review).
This patch intentionally avoids SELinux, lock manager, and audit
actions. Also, if libvirtd restarts at the exact moment that a
'block-job-complete' is in flight, the proposed proper way to
detect the outcome of that would be with a persistent bitmap and
some additional query commands when libvirtd restarts. This
patch is enough to test the common case of success when used
correctly, while saving the subtleties of proper cleanup for
worst-case errors for later.
When a mirror job is started, cancelling the job safely reverts back
to the source disk, regardless of whether the destination is in
phase 1 (streaming, in which case the destination is worthless) or
phase 2 (mirroring, in which case the destination is synced up to
the source at the time of the cancel). Our existing code does just
fine in either phase, other than some bookkeeping cleanup; this
implements live block copy.
Ideas for future enhancements via new flags:
Depending on when persistent bitmap support is added, it may be
worth adding a VIR_DOMAIN_REBASE_COPY_ATOMIC flag that fails up
front if we detect an older qemu with risky pivot operation.
Interesting side note: while snapshot-create --disk-only creates a
copy of the disk at a point in time by moving the domain on to a
new file (the copy is the file now in the just-extended backing
chain), blockjob --abort of a copy job creates a copy of the disk
while keeping the domain on the original file. There may be
potential improvements to the snapshot code to exploit block copy
over multiple disks all at one point in time. And, if
'block-job-cancel' were made part of 'transaction', you could
copy multiple disks at the same point in time without pausing
the domain. This also implies we may want to add a --quiesce flag
to virDomainBlockJobAbort, so that when breaking a mirror (whether
by cancel or pivot), the side of the mirror that we are abandoning
is at least in a stable state with regards to guest I/O.
* src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag.
(qemuDomainBlockPivot): New helper function.
(qemuDomainBlockJobImpl): Implement it.
Handle the new type of block copy event and info. Of course,
this patch does nothing until a later patch actually allows the
creation/abort of a block copy job.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_READY): New
block job status.
* src/libvirt.c (virDomainBlockRebase): Document the event.
* src/qemu/qemu_monitor_json.c (eventHandlers): New event.
(qemuMonitorJSONHandleBlockJobReady): New function.
(qemuMonitorJSONGetBlockJobInfoOne): Translate new job type.
(qemuMonitorJSONHandleBlockJobImpl): Handle new event and job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Recognize
the event to minimize snooping.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful
info query to save effort on a pivot request.
For now, disk migration via block copy job is not implemented in
libvirt. But when we do implement it, we have to deal with the
fact that qemu does not yet provide an easy way to re-start a qemu
process with mirroring still intact. Paolo has proposed an idea
for a persistent dirty bitmap that might make this possible, but
until that design is complete, it's hard to say what changes
libvirt would need. Even something like 'virDomainSave' becomes
hairy, if you realize the implications that 'virDomainRestore'
would be stuck with recreating the same mirror layout.
But if we step back and look at the bigger picture, we realize that
the initial client of live storage migration via disk mirroring is
oVirt, which always uses transient domains, and that if a transient
domain is destroyed while a mirror exists, oVirt can easily restart
the storage migration by creating a new domain that visits just the
source storage, with no loss in data.
We can make life a lot easier by being cowards for now, forbidding
certain operations on a domain. This patch guarantees that we
never get in a state where we would have to restart a domain with
a mirroring block copy, by preventing saves, snapshots, migration,
hot unplug of a disk in use, and conversion to a persistent domain
(thankfully, it is still relatively easy to 'virsh undefine' a
running domain to temporarily make it transient, run tests on
'virsh blockcopy', then 'virsh define' to restore the persistence).
Later, if the qemu design is enhanced, we can relax our code.
The change to qemudDomainDefine looks a bit odd for undoing an
assignment, rather than probing up front to avoid the assignment,
but this is because of how virDomainAssignDef combines both a
lookup and assignment into a single function call.
* src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype.
* src/conf/domain_conf.c (virDomainHasDiskMirror): New function.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous
actions while block copy is already in action.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
Upstream qemu 1.3 is adding two new monitor commands, 'drive-mirror'
and 'block-job-complete'[1], which can drive live block copy and
storage migration. [Additionally, RHEL 6.3 had backported an earlier
version of most of the same functionality, but under the names
'__com.redhat_drive-mirror' and '__com.redhat_drive-reopen' and with
slightly different JSON arguments, and has been using patches similar
to these upstream patches for several months now.]
The libvirt API virDomainBlockRebase as already committed for 0.9.12
is flexible enough to expose the basics of block copy, but some
additional features in the 'drive-mirror' qemu command, such as
setting error policy, setting granularity, or using a persistent
bitmap, may later require a new libvirt API virDomainBlockCopy. I
will wait to add that API until we know more about what qemu 1.3
will finally provide.
This patch caters only to the upstream qemu 1.3 interface, although
I have proven that the changes for RHEL 6.3 can be isolated to
just qemu_monitor_json.c, and the rest of this series will
gracefully handle either interface once the JSON differences are
papered over in a downstream patch.
For consistency with other block job commands, libvirt must handle
the bandwidth argument as MiB/sec from the user, even though qemu
exposes the speed argument as bytes/sec; then again, qemu rounds
up to cluster size internally, so using MiB hides the worst effects
of that rounding if you pass small numbers.
[1]https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04123.html
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR)
(QEMU_CAPS_DRIVE_REOPEN): New bits.
* src/qemu/qemu_capabilities.c (qemuCaps): Name them.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
them.
(qemuMonitorJSONDriveMirror, qemuMonitorDrivePivot): New functions.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror)
(qemuMonitorDrivePivot): Declare them.
* src/qemu/qemu_monitor.c (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): New passthroughs.
* src/qemu/qemu_monitor.h (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): Declare them.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=862515
which describes inconsistencies in dealing with duplicate mac
addresses on network devices in a domain.
(at any rate, it resolves *almost* everything, and prints out an
informative error message for the one problem that isn't solved, but
has a workaround.)
A synopsis of the problems:
1) you can't do a persistent attach-interface of a device with a mac
address that matches an existing device.
2) you *can* do a live attach-interface of such a device.
3) you *can* directly edit a domain and put in two devices with
matching mac addresses.
4) When running virsh detach-device (live or config), only MAC address
is checked when matching the device to remove, so the first device
with the desired mac address will be removed. This isn't always the
one that's wanted.
5) when running virsh detach-interface (live or config), the only two
items that can be specified to match against are mac address and model
type (virtio, etc) - if multiple netdevs match both of those
attributes, it again just finds the first one added and assumes that
is the only match.
Since it is completely valid to have multiple network devices with the
same MAC address (although it can cause problems in many cases, there
*are* valid use cases), what is needed is:
1) remove the restriction that prohibits doing a persistent add of a
netdev with a duplicate mac address.
2) enhance the backend of virDomainDetachDeviceFlags to check for
something that *is* guaranteed unique (but still work with just mac
address, as long as it yields only a single results.
This patch does three things:
1) removes the check for duplicate mac address during a persistent
netdev attach.
2) unifies the searching for both live and config detach of netdevices
in the subordinate functions of qemuDomainModifyDeviceFlags() to use the
new function virDomainNetFindIdx (which matches mac address and PCI
address if available, checking for duplicates if only mac address was
specified). This function returns -2 if multiple matches are found,
allowing the callers to print out an appropriate message.
Steps 1 & 2 are enough to fully fix the problem when using virsh
attach-device and detach-device (which require an XML description of
the device rather than a bunch of commandline args)
3) modifies the virsh detach-interface command to check for multiple
matches of mac address and show an error message suggesting use of the
detach-device command in cases where there are multiple matching mac
addresses.
Later we should decide how we want to input a PCI address on the virsh
commandline, and enhance detach-interface to take a --address option,
eliminating the need to use detach-device
* src/conf/domain_conf.c
* src/conf/domain_conf.h
* src/libvirt_private.syms
* added new virDomainNetFindIdx function
* removed now unused virDomainNetIndexByMac and
virDomainNetRemoveByMac
* src/qemu/qemu_driver.c
* remove check for duplicate max from qemuDomainAttachDeviceConfig
* use virDomainNetFindIdx/virDomainNetRemove instead
of virDomainNetRemoveByMac in qemuDomainDetachDeviceConfig
* use virDomainNetFindIdx instead of virDomainIndexByMac
in qemuDomainUpdateDeviceConfig
* src/qemu/qemu_hotplug.c
* use virDomainNetFindIdx instead of a homespun loop in
qemuDomainDetachNetDevice.
* tools/virsh-domain.c: modified detach-interface command as described
above
It turns out that the cpuacct results properly account for offline
cpus, and always returns results for every possible cpu, not just
the online ones. So there is no need to check the map of online
cpus in the first place, merely only a need to know the maximum
possible cpu. Meanwhile, virNodeGetCPUBitmap had a subtle change
from returning the maximum id to instead returning the width of
the bitmap (one larger than the maximum id) in commit 2f4c5338,
which made this code encounter some off-by-one logic leading to
bad error messages when a cpu was offline:
$ virsh cpu-stats dom
error: Failed to virDomainGetCPUStats()
error: An error occurred, but the cause is unknown
Cleaning this up unraveled a chain of other unused variables.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Drop
pointless check for cpumap changes, and use correct number of
cpus. Simplify signature.
(qemuDomainGetCPUStats): Adjust caller.
* src/nodeinfo.h (nodeGetCPUCount): New prototype.
(nodeGetCPUBitmap): Drop unused parameter.
* src/nodeinfo.c (nodeGetCPUBitmap): Likewise.
(nodeGetCPUMap): Adjust caller.
(nodeGetCPUCount): New function.
* src/libvirt_private.syms (nodeinfo.h): Export it.
Callers should not need to know what the name of the file to
be read in the Linux-specific version of nodeGetCPUmap;
furthermore, qemu cares about online cpus, not present cpus,
when determining which cpus to skip.
While at it, I fixed the fact that we were computing the maximum
online cpu id by doing a slow iteration, when what we really want
to know is the max available cpu.
* src/nodeinfo.h (nodeGetCPUmap): Rename...
(nodeGetCPUBitmap): ...and simplify signature.
* src/nodeinfo.c (linuxParseCPUmax): New function.
(linuxParseCPUmap): Simplify and alter signature.
(nodeGetCPUBitmap): Change implementation.
* src/libvirt_private.syms (nodeinfo.h): Reflect rename.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Update
caller.
On one hand, numad probably will manage the affinity of domain process
dynamically in future. On the other hand, even numad won't manage it,
it still could confusion. Let's make things simpler enough to avoid
the lair for now.
When the cpu placement model is "auto", it sets the affinity for
domain process with the advisory nodeset from numad, however,
creating cgroup for the domain process (called emulator thread
in some contexts) later overrides that with pinning it to all
available pCPUs.
How to reproduce:
* Configure the domain with "auto" placement for <vcpu>, e.g.
<vcpu placement='auto'>4</vcpu>
* % virsh start dom
* % cat /proc/$dompid/status
Though the emulator cgroup cause conflicts, but we can't simply
prohibit creating it, as other tunables are still useful, such
as "emulator_period", which is used by API
virDomainSetSchedulerParameter. So this patch doesn't prohibit
creating the emulator cgroup, but inherit the nodeset from numad,
and reset the affinity for domain process.
* src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator
to accept the passed nodenet
* src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset
Abstract the codes to prepare cpumap into a helper a function,
which can be used later.
* src/qemu/qemu_process.h: Declare qemuPrepareCpumap
* src/qemu/qemu_process.c: Implement qemuPrepareCpumap, and use it.
Transport Open vSwitch per-port data during live
migration by using the utility functions
virNetDevOpenvswitchGetMigrateData() and
virNetDevOpenvswitchSetMigrateData().
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Add the ability for the Qemu V3 migration protocol to
include transporting network configuration. A generic
framework is proposed with this patch to allow for the
transfer of opaque data.
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Laine Stump <laine@laine.org>
The snapshot code when reusing an existing file had hard-to-read
logic, as well as a missing sanity check: REUSE_EXT should require
the destination to already be present.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare): Require
destination on REUSE_EXT, rename variable for legibility.
Currently it's assumed that qemu always supports VNC, however it is
definitely possible to compile qemu without VNC support so we should at
the very least check for it and handle that correctly.
Yet another instance of where using plain open() mishandles files
that live on root-squash NFS, and where improving the API can
improve the chance of a successful probe.
* src/util/storage_file.h (virStorageFileProbeFormat): Alter
signature.
* src/util/storage_file.c (virStorageFileProbeFormat): Use better
method for opening file.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Update caller.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
In v2 migration protocol, XML is obtained by calling domainGetXMLDesc.
This includes the default USB controller in XML, which breaks migration
to older libvirt (before 0.9.2).
Commit 409b5f5495
qemu: Emit compatible XML when migrating a domain
only fixed this for v3 migration.
This patch uses the new VIR_DOMAIN_XML_MIGRATABLE flag (detected by
VIR_DRV_FEATURE_XML_MIGRATABLE) to obtain XML without the default controller,
enabling backward v2 migration.
As we switched to setting capabilities based on QMP communication,
qemu seamless-migration capability was not set. In the -help output
this knob is called seamless-migration=[on|off]. The equivalent in
QMP world is SPICE_MIGRATE_COMPLETED event (qemu upstream commit
2fdd16e2).
Gcc with optimization warns:
../../src/qemu/qemu_driver.c: In function 'qemuDomainBlockCommit':
../../src/qemu/qemu_driver.c:12813:46: error: 'disk' may be used uninitialized in this function [-Werror=maybe-uninitialized]
../../src/qemu/qemu_driver.c:12698:25: note: 'disk' was declared here
cc1: all warnings being treated as errors
so obviously I had only been testing with optimization off.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Guard cleanup.
I finally have all the pieces in place to perform a block-commit with
SELinux enforcing. There's still missing cleanup work when the commit
completes, but doing that requires tracking both the backing chain and
the base and top files within that chain in domain XML across libvirtd
restarts. Furthermore, from a security standpoint, once you have
granted access, you must assume any damage that can be done will be
done; later revoking access is nice to minimize the window of damage,
but less important as it does not affect the fact that damage can be
done in the first place. Therefore, deferring the revoke efforts until
we have better XML tracking of what chain operations are in effect,
including across a libvirtd restart, is reasonable.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Label disks as
needed.
(qemuDomainPrepareDiskChainElement): Cast away const.
Previously, snapshot code did its own permission granting (lock
manager, cgroup device controller, and security manager labeling)
inline. But now that we are adding block-commit and block-copy
which also have to change permissions, it's better to reuse
common code for the task. While snapshot should fall back to
no access if read-write access failed, block-commit will want to
fall back to read-only access. The common code doesn't know
whether failure to grant read-write access should revert to no
access (snapshot, block-copy) or read-only access (block-commit).
This code can also be used to revoke access to unused files after
block-pull.
It might be nice to clean things up in a future patch by adding
new functions to the lock manager, cgroup manager, and security
manager that takes a single file name and applies context of a
disk to that file, rather than the current semantics of applying
context to the entire chain already associated to a disk. That
way, we could avoid the games this patch plays of temporarily
swapping out the disk->src and related fields of the disk. But
that would involve more code changes, so this patch really is
the smallest hack for doing the necessary work; besides, this
patch is more or less code motion (the hack was already employed
by the snapshot creation code, we are just making it reusable).
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateSingleDiskActive)
(qemuDomainSnapshotUndoSingleDiskActive): Refactor labeling hacks...
(qemuDomainPrepareDiskChainElement): ...into new function.
Now that we can crawl the chain of backing files, we can do
argument validation and implement the 'shallow' flag. In
testing this, I discovered that it can be handy to pass the
shallow flag and an explicit base, as a means of validating
that the base is indeed the file we expected.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Crawl through
chain to implement shallow flag.
* src/libvirt.c (virDomainBlockCommit): Relax API.
This is the bare minimum to kick off a block commit. In particular,
flags support is missing (shallow requires us to crawl the backing
chain to determine the file name to pass to the qemu monitor command;
delete requires us to track what needs to be deleted at the time
the completion event fires). Also, we are relying on qemu to do
error checking (such as validating 'top' and 'base' as being members
of the backing chain), including the fact that the current qemu code
does not support committing the active layer (although it is still
planned to add that before qemu 1.3). Since the active layer won't
change, we have it easy and do not have to alter the domain XML.
Additionally, this will fail if SELinux is enforcing, because we fail
to grant qemu proper read/write access to the files it will modify.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): New function.
(qemuDriver): Register it.
qemu 1.3 will be adding a 'block-commit' monitor command, per
qemu.git commit ed61fc1. It matches nicely to the libvirt API
virDomainBlockCommit.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCK_COMMIT): New bit.
* src/qemu/qemu_capabilities.c (qemuCapsProbeQMPCommands): Set it.
* src/qemu/qemu_monitor.h (qemuMonitorBlockCommit): New prototype.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockCommit):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorBlockCommit): Implement it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockCommit):
Likewise.
(qemuMonitorJSONHandleBlockJobImpl)
(qemuMonitorJSONGetBlockJobInfoOne): Handle new event type.
We used to walk the backing file chain at least twice per disk,
once to set up cgroup device whitelisting, and once to set up
security labeling. Rather than walk the chain every iteration,
which possibly includes calls to fork() in order to open root-squashed
NFS files, we can exploit the cache of the previous patch.
* src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter
signature.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller
to supply backing chain via disk, if recursion is desired.
* src/security/security_dac.c
(virSecurityDACSetSecurityImageLabel): Adjust caller.
* src/security/security_selinux.c
(virSecuritySELinuxSetSecurityImageLabel): Likewise.
* src/security/virt-aa-helper.c (get_files): Likewise.
* src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup)
(qemuTeardownDiskCgroup): Likewise.
(qemuSetupCgroup): Pre-populate chain.
Technically, we should not be re-probing any file that qemu might
be currently writing to. As such, we should cache the backing
file chain prior to starting qemu. This patch adds the cache,
but does not use it until the next patch.
Ultimately, we want to also store the chain in domain XML, so that
it is remembered across libvirtd restarts, and so that the only
kosher way to modify the backing chain of an offline domain will be
through libvirt API calls, but we aren't there yet. So for now, we
merely invalidate the cache any time we do a live operation that
alters the chain (block-pull, block-commit, external disk snapshot),
as well as tear down the cache when the domain is not running.
* src/conf/domain_conf.h (_virDomainDiskDef): New field.
* src/conf/domain_conf.c (virDomainDiskDefFree): Clean new field.
* src/qemu/qemu_domain.h (qemuDomainDetermineDiskChain): New
prototype.
* src/qemu/qemu_domain.c (qemuDomainDetermineDiskChain): New
function.
* src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive)
(qemuDomainChangeDiskMediaLive): Pre-populate chain.
(qemuDomainSnapshotCreateSingleDiskActive): Uncache chain before
snapshot.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Update
chain after block pull.
Requiring pre-allocation was an unusual idiom. It allowed iteration
over the backing chain to use fewer mallocs, but made one-shot
clients harder to read. Also, this makes it easier for a future
patch to move away from opening fds on every iteration over the chain.
* src/util/storage_file.h (virStorageFileGetMetadataFromFD): Alter
signature.
* src/util/storage_file.c (virStorageFileGetMetadataFromFD): Allocate
return value.
(virStorageFileGetMetadata): Update clients.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
This is the last use of raw strings for disk formats throughout
the src/conf directory.
* src/conf/snapshot_conf.h (_virDomainSnapshotDiskDef): Store enum
rather than string for disk type.
* src/conf/snapshot_conf.c (virDomainSnapshotDiskDefClear)
(virDomainSnapshotDiskDefParseXML, virDomainSnapshotDefFormat):
Adjust users.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateSingleDiskActive): Likewise.
Express the default disk type as an enum, for easier handling.
* src/conf/capabilities.h (_virCaps): Store enum rather than
string for disk type.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): Adjust
clients.
* src/qemu/qemu_driver.c (qemuCreateCapabilities): Likewise.
When an image has no backing file, using VIR_STORAGE_FILE_AUTO
for its type is a bit confusing. Additionally, a future patch
would like to reserve a default value for the case of no file
type specified in the XML, but different from the current use
of -1 to imply probing, since probing is not always safe.
Also, a couple of file types were missing compared to supported
code: libxl supports 'vhd', and qemu supports 'fat' for directories
passed through as a file system.
* src/util/storage_file.h (virStorageFileFormat): Add
VIR_STORAGE_FILE_NONE, VIR_STORAGE_FILE_FAT, VIR_STORAGE_FILE_VHD.
* src/util/storage_file.c (virStorageFileMatchesVersion): Match
documentation when version probing not supported.
(cowGetBackingStore, qcowXGetBackingStore, qcow1GetBackingStore)
(qcow2GetBackingStoreFormat, qedGetBackingStore)
(virStorageFileGetMetadataFromBuf)
(virStorageFileGetMetadataFromFD): Take NONE into account.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/conf/storage_conf.c (virStorageVolumeFormatFromString): New
function.
(poolTypeInfo): Use it.
Relabeling tapfd right after the tap device is created.
qemuPhysIfaceConnect is common function called both for static
netdevs and for hotplug netdevs.
Having hostuuid in migration cookie is a nice bonus since it provides an
easy way of detecting migration to the same host. However, requiring it
breaks backward compatibility with older libvirt releases.
Recently, patches were added support for (managed)saving, restoring, and
migrating domains with host USB devices. However, qemu driver would
still forbid migration of such domains because qemuMigrationIsAllowed
was not updated.
If we can't probe the architecture from QMP we parse the architecture
from the qemu binaries name. This results in the architecture being i386
instead of i686 which then results in QEMU_CAPS_PCI_MULTIBUS being unset
which gives a broken qemu command line.
This probably didn't show up earlier since most of the time there's also
a /usr/bin/qemu around which results in i686 capabilities.
When libvirt cannot find a suitable CPU model for host CPU (easily
reproducible by running libvirt in a guest), it would not provide CPU
topology in capabilities XML either. Even though CPU topology is known
and can be queried by virNodeGetInfo. With this patch, CPU topology will
always be provided in capabilities XML regardless on the presence of CPU
model.
Currently we query-spice after the main migration has completed
before moving to next state. Qemu reports this as boolean (not
enclosed within quotes). Therefore it is not correct to use
virJSONValueObjectGetString but virJSONValueObjectGetBoolean instead.
The machine in the last output line of <qemu-binary> -M ?
was always reported as default machine even if this wasn't the
actual default. Trivial fix.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
According to our recent changes (clarifications), we should be pinning
qemu's emulator processes using the <vcpu> 'cpuset' attribute in case
there is no <emulatorpin> specified. This however doesn't work
entirely as expected and this patch should resolve all the remaining
issues.
When p2p migration fails early because qemuMigrationIsAllowed or
qemuMigrationIsSafe say migration should be cancelled, we fail to clear
the migration-out async job. As a result of that, further APIs called
for the same domain may fail with Timed out during operation: cannot
acquire state change lock.
Reported by Guido Winkelmann.
It should relabel tapfd of virtual network of type VIR_DOMAIN_NET_TYPE_DIRECT
rather than VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE
(commit ae368ebfcc introduced this bug)
Caution: The context of the two hunks is identical other than indentation.
Please be extremely cautious of where the patch gets applied.
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=851981
When using macvtap, a character device gets first created by
kernel with name /dev/tapN, its selinux context is:
system_u:object_r:device_t:s0
Shortly, when udev gets notification when new file is created
in /dev, it will then jump in and relabel this file back to the
expected default context:
system_u:object_r:tun_tap_device_t:s0
There is a time gap happened.
Sometimes, it will have migration failed, AVC error message:
type=AVC msg=audit(1349858424.233:42507): avc: denied { read write } for
pid=19926 comm="qemu-kvm" path="/dev/tap33" dev=devtmpfs ino=131524
scontext=unconfined_u:system_r:svirt_t:s0:c598,c908
tcontext=system_u:object_r:device_t:s0 tclass=chr_file
This patch will label the tapfd device before qemu process starts:
system_u:object_r:tun_tap_device_t:MCS(MCS from seclabel->label)
This patch adds support for SUSPEND_DISK event; both lifecycle and
separated. The support is added for QEMU, machines are changed to
PMSUSPENDED, but as QEMU sends SHUTDOWN afterwards, the state changes
to shut-off. This and much more needs to be done in order for libvirt
to work with transient devices, wake-ups etc. This patch is not
aiming for that functionality.
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=805071
to the extent that it can be resolved with current qemu functionality.
It attempts to detect as many situations as possible when the simple
operation of disconnecting an existing tap device from one bridge and
attaching it to another will satisfy the change requested in
virDomainUpdateDeviceFlags() for a network device. Before this patch,
that situation could only be detected if the pre-change interface
*and* the post-change interface definition were both "type='bridge'".
After this patch, it can also be detected if the before or after
interfaces are any combination of type='bridge' and type='network'
(the networks can be <forward mode='nat|route|bridge'>, as long as
they use a Linux host bridge and not macvtap connections).
This extra effort is especially useful since the recent discovery that
a netdev_del+netdev_add combo (to reconnect the network device with
completely different hostside configuration) doesn't work properly
with current qemu (1.2) unless it is accompanied by the matching
device_del+device_add - see this mailing list message for details:
http://lists.nongnu.org/archive/html/qemu-devel/2012-10/msg02355.html
(A slight modification of the patch referenced there has been prepared
to apply on top of this patch, but won't be pushed until qemu can be
made to work with it.)
* qemuDomainChangeNet needs access to the virDomainDeviceDef that
holds the new netdef (so that it can clear out the virDomainDeviceDef
if it ends up using the NetDef to replace the original), so the
virDomainNetDefPtr arg is replaced with a virDomainDeviceDefPtr.
* qemuDomainChangeNet previously checked for *some* changes to the
interface config, but this check was by no means complete. It was also
a bit disorganized.
This refactoring of the code is (I believe) complete in its check of
all NetDef attributes that might be changed, and either returns a
failure (for changes that are simply impossible), or sets one of three
flags:
needLinkStateChange - if the device link state needs to go up/down
needBridgeChange - if everything else is the same, but it needs
to be connected to a difference linux host
bridge
needReconnect - if the entire host side of the device needs
to be torn down and reconstructed (currently
non-working, as mentioned above)
Note that this function will refuse to make any change that requires
the *guest* side of the device to be detached (e.g. changing the PCI
address or mac address). Those would be disruptive enough to the guest
that it's reasonable to require an explicit detach/attach sequence
from the management application.
* As mentioned above, qemuDomainChangeNet also does its best to
understand when a simple change in attached bridge for the existing
tap device will work vs. the need to completely tear down/reconstruct
the host side of the device (including tap device).
This patch *does not* implement the "reconnect" code anyway - there is
a placeholder that turns that into an error. Rather, the purpose of
this patch is to replicate existing behavior with code that is ready
to have that functionality plugged in in a later patch.
* The expanded uses for qemuDomainChangeNetBridge meant that it needed
to be enhanced as well - it no longer replaces the original brname
string in olddev with the new brname; instead, it relies on the
caller to replace the *entire* olddev with newdev (since we've gone
to great lengths to assure they are functionally identical other
than the name of the bridge, this is now not only safe, but more
correct). Additionally, qemuDomainNetChangeBridge can now set the
bridge for type='network' interfaces as well as plain type='bridge'
interfaces. (Note that I had to make this change simultaneous to the
reorganization of qemuDomainChangeNet because the two are too
closely intertwined to separate).
The onlined vcpu pinning policy should inherit def->cpuset if
it's not specified explicitly, and the affinity should be set
in this case. Oppositely, the offlined vcpu pinning policy should
be free()'ed.
Various APIs use cgroup to either set or get the statistics of
host or guest. Hotplug or hot unplug new vcpus without creating
or removing the cgroup for the vcpus could cause problems for
those APIs. E.g.
% virsh vcpucount dom
maximum config 10
maximum live 10
current config 1
current live 1
% virsh setvcpu dom 2
% virsh schedinfo dom --set vcpu_quota=1000
Scheduler : posix
error: Unable to find vcpu cgroup for rhel6.2(vcpu: 1): No such file or
directory
This patch fixes the problem by creating cgroups for each of the
onlined vcpus, and destroying cgroups for each of the offlined
vcpus.
The comment stated that you may call qemuDomainObjBeginJobWithDriver
without passing qemud_driver to signal it's not locked.
qemuDomainObjBeginJobWithDriver still accesses the qemud_driver
structure and the lock singaling is done through a separate parameter.
Save/restore with passed through USB devices currently only works if the
USB device can be found at the same USB address where it used to be
before saving a domain. This makes sense in case a user explicitly
configure the USB address in domain XML. However, if the device was
found automatically by vendor/product identification, we should try to
search for that device when restoring the domain and use any device we
find as long as there is only one available. In other words, the USB
device can now be removed and plugged again or the host can be rebooted
between saving and restoring the domain.
Using VIR_DOMAIN_XML_MIGRATABLE flag, one can request domain's XML
configuration that is suitable for migration or save/restore. Such XML
may contain extra run-time stuff internal to libvirt and some default
configuration may be removed for better compatibility of the XML with
older libvirt releases.
This flag may serve as an easy way to get the XML that can be passed
(after desired modifications) to APIs that accept custom XMLs, such as
virDomainMigrate{,ToURI}2 or virDomainSaveFlags.
All USB device lookup functions emit an error when they cannot find the
requested device. With this patch, their caller can choose if a missing
device is an error or normal condition.
The code which looks up a USB device specified by hostdev is duplicated
in two places. This patch creates a dedicated function that can be
called in both places.
When adding variants of parameter setting APIs which accepted
flags, the existing APIs were all adapted internally to pass
VIR_DOMAIN_AFFECT_CURRENT to the new API. The QEMU impl
qemuSetSchedularParameters was an exception, which instead
used VIR_DOMAIN_AFFECT_LIVE. Change this to match other
compatibility scenarios, so that calling
virDomainSetSchedularParameters(dom, params, nparams);
Has the same semantics as
virDomainSetSchedularParametersFlags(dom, params, nparams, 0);
And
virDomainSetSchedularParametersFlags(dom, params, nparams, VIR_DOMAIN_AFFECT_CURRENT);
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As a side effect of changes in the functions virGetUserID and
virGetGroupID, the user and group configurations for DAC in qemu.conf
are now able to accept both names and IDs, supporting a leading plus
sign to ensure that a numeric value will not be interpreted as a name.
This patch updates the comments in qemu.conf, including a description of
this new behavior.
With the recent introduction of QMP capabilities probing, libvirt failed
to detect support for QXL graphics in QEMU 1.2 and newer. In addition to
fixing that, this patch also causes libvirt to detect QXL support for
qemu-kvm-0.13.0, which doesn't advertise it in -help output but mentions
it in device list. Since qemu-kvm-0.13.0 supported -spice, it looks like
not having qxl in -help was a bug.
When both kvmclock and kvm_pv_eoi are configured (either disabled or
enabled) libvirt will generate invalid CPU specification due to the
fact that even though kvmclock causes the CPU to be specified, it
doesn't set have_cpu flag to true (and the new kvm_pv_eoi as well).
This patch fixes the issue and adds a test exactly for that to show
that it is fixed correctly (and also to keep it that way in the future
of course).
Introduced in commit 0caccb58.
CC libvirt_driver_qemu_impl_la-qemu_capabilities.lo
../../src/qemu/qemu_capabilities.c: In function 'qemuCapsInitQMP':
../../src/qemu/qemu_capabilities.c:2327:13: error: format '%d' expects argument of type 'int', but argument 8 has type 'const char *' [-Werror=format]
* src/qemu/qemu_capabilities.c (qemuCapsInitQMP): Use correct format.
Since libvirt switched to QMP capabilities probing recently, it starts
QEMU process used for this probing with -daemonize, which means
virCommandAbort can no longer reach these processes. As a result of
that, restarting libvirtd will leave several new QEMU processes behind.
Let's use QEMU's -pidfile and use it to kill the process when QMP caps
probing is done.
When doing snapshots, the filesystem freeze function used the agent
entering function that expects the qemud_driver unlocked. This might
cause a deadlock of the qemu driver if the agent does not respond.
The only call path of this function has the qemud_driver locked, so this
patch changes the entering functions to those expecting the driver
locked.
Start a QEMU process using
$QEMU -S -no-user-config -nodefaults \
-nographic -M none -qmp unix:/some/path,server,nowait
and talk QMP over stdio to discover what capabilities the
binary supports. This works for QEMU 1.2.0 or later and
for older QEMU automatically fallback to the old approach
of parsing -help and related command line args.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some architectures provide the query-cpu-definitions command,
but are set to always return a "GenericError" from it :-(
Catch this & treat it as if there was an empty list of CPUs
returned
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After calling qemuMonitorClose(), it is still possible for
the QEMU monitor I/O event callback to get invoked. This
will trigger an error message because mon->fd has been set
to -1 at this point. Silently ignore the case where mon->fd
is -1, likewise for mon->watch being zero.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuMonitorOpen method only needs a virDomainObjPtr in order
to access the QEMU pid. This is not critical when detecting the
QEMU capabilties, so can easily be skipped
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The help output for QEMU 1.2.0 changed 'pci-assign' to 'kvm-pci-assign'.
Since the new capabilities code does exact device name matching
instead of substring matching, this caused the capabilities to go
missing.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the qemuCapsParseDeviceStr method has a bunch of open
coded string searches/comparisons to detect devices and their
properties. Soon this data will be obtained from QMP queries
instead of -device help output. Maintaining the list of device
and properties in two places is undesirable. Thus the existing
qemuCapsParseDeviceStr() method needs to be refactored to
separate the device types and properties from the actual
search code.
Thus the -device help output is now parsed to construct a
list of device names, and device properties. These are then
checked against a set of datatables to set the capability
flags
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If QEMU reports CommandNotFound for the 'query-events' command,
we must treat that as success, returning a zero-length array
of events
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuMonitorSetCapabilities() API is used to initialize the QMP
protocol capabilities. It has since been abused to initialize some
libvirt internal capabilities based on command/event existance too.
Move the latter code out into qemuCapsProbeQMP() in the QEMU
capabilities source file instead
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemu monitor does not require qemu_conf.h, and the
qemu capabilities code actually wants bitmap.h
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetTargetArch() method to support invocation
of the 'query-target' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetObjectProps() method to support invocation
of the 'device-list-properties' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetObjectTypes() method to support invocation
of the 'qom-list-types' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetEvents() method to support invocation
of the 'query-events' JSON monitor command. No HMP equivalent
is required, since this will only be used when JSON is available
The existing qemuMonitorJSONCheckEvents() method is refactored
to use this new method
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetCPUCommands() method to support invocation
of the 'query-commands' JSON monitor command. No HMP equivalent
is required, since this will only be used when JSON is available
The existing qemuMonitorJSONCheckCommands() method is refactored
to use this new method
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetCPUDefinitions() method to support invocation
of the 'query-cpu-definitions' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetMachines() method to support invocation
of the 'query-machines' JSON monitor command. No HMP equivalent
is required, since this will only be present for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new qemuMonitorGetVersion() method to support invocation
of the 'query-version' JSON monitor command. No HMP equivalent
is provided, since this will only be used for QEMU >= 1.2
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuCapsProbeMachineTypes & qemuCapsProbeCPUModels methods
do not need to be invoked directly anymore. Make them static
and refactor them to directly populate the qemuCapsPtr object
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When launching a QEMU guest the binary is probed to discover
the list of supported CPU names. Remove this probing with a
simple lookup of CPU models in the qemuCapsPtr object. This
avoids another invocation of the QEMU binary during the
startup path.
As a nice benefit we can now remove all the nasty hacks from
the test suite which were done to avoid having to exec QEMU
on the test system. The building of the -cpu command line
can just rely on data we pre-populate in qemuCapsPtr.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When XML for a new guest is received, the machine type is
immediately canonicalized into the version specific name.
This involves probing QEMU for supported machine types.
Replace this probing with a lookup of the machine types
in the (hopefully cached) qemuCapsPtr object
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Remove all use of the existing APIs for querying QEMU
capability flags. Instead obtain a qemuCapsPtr object
from the global cache. This avoids the execution of
'qemu -help' (and related commands) when launching new
guests.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When building up a virCapsPtr instance, the QEMU driver
was copying the list of machine types across from the
previous virCapsPtr instance, if the QEMU binary had not
changed. Replace this ad-hoc caching of data with use
of the new qemuCapsCache global cache.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce a qemuCapsCachePtr object to provide a global cache
of capabilities for QEMU binaries. The cache auto-populates
on first request for capabilities about a binary, and will
auto-refresh if the binary has changed since a previous cache
was populated
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
For historical compat we use 'itanium' as the arch name, so
if the QEMU binary suffix is 'ia64' we need to translate it
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the qemuAgentClose method is called from a place which holds
the domain lock, it is theoretically possible to get a deadlock
in the agent destroy callback. This has not been observed, but
the equivalent code in the QEMU monitor destroy callback has seen
a deadlock.
Remove the redundant locking while unrefing the object and the
bogus assignment
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Many parts of virDomainDefPtr were using 'int' variables as
array length counts. Replace all these with size_t and update
various format strings & API signatures to adapt
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some users report (very rarely) seeing a deadlock in the QEMU
monitor callbacks
Thread 10 (Thread 0x7fcd11e20700 (LWP 26753)):
#0 0x00000030d0e0de4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000030d0e09ca6 in _L_lock_840 () from /lib64/libpthread.so.0
#2 0x00000030d0e09ba8 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fcd162f416d in virMutexLock (m=<optimized out>)
at util/threads-pthread.c:85
#4 0x00007fcd1632c651 in virDomainObjLock (obj=<optimized out>)
at conf/domain_conf.c:14256
#5 0x00007fcd0daf05cc in qemuProcessHandleMonitorDestroy (mon=0x7fcccc0029e0,
vm=0x7fcccc00a850) at qemu/qemu_process.c:1026
#6 0x00007fcd0db01710 in qemuMonitorDispose (obj=0x7fcccc0029e0)
at qemu/qemu_monitor.c:249
#7 0x00007fcd162fd4e3 in virObjectUnref (anyobj=<optimized out>)
at util/virobject.c:139
#8 0x00007fcd0db027a9 in qemuMonitorClose (mon=<optimized out>)
at qemu/qemu_monitor.c:860
#9 0x00007fcd0daf61ad in qemuProcessStop (driver=driver@entry=0x7fcd04079d50,
vm=vm@entry=0x7fcccc00a850,
reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED, flags=flags@entry=0)
at qemu/qemu_process.c:4057
#10 0x00007fcd0db323cf in qemuDomainDestroyFlags (dom=<optimized out>,
flags=<optimized out>) at qemu/qemu_driver.c:1977
#11 0x00007fcd1637ff51 in virDomainDestroyFlags (
domain=domain@entry=0x7fccf00c1830, flags=1) at libvirt.c:2256
At frame #10 we are holding the domain lock, we call into
qemuProcessStop() to cleanup QEMU, which triggers the monitor
to close, which invokes qemuProcessHandleMonitorDestroy() which
tries to obtain the domain lock again. This is a non-recursive
lock, hence hang.
Since qemuMonitorPtr is a virObject, the unref call in
qemuProcessHandleMonitorDestroy no longer needs mutex
protection. The assignment of priv->mon = NULL, can be
instead done by the caller of qemuMonitorClose(), thus
removing all need for locking.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If QEMU quits immediately after we opened the monitor it was
possible we would skip the clearing of the SELinux process
socket context
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In the cgroups APIs we have a virCgroupKillPainfully function
which does the loop sending SIGTERM, then SIGKILL and waiting
for the process to exit. There is similar functionality for
simple processes in qemuProcessKill, but it is tangled with
the QEMU code. Untangle it to provide a virProcessKillPainfuly
function
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When calling qemuProcessKill from the virDomainDestroy impl
in QEMU, do not ignore the return value. This ensures that
if QEMU fails to respond to SIGKILL, the caller will know
about the failure.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Recently, there have been some improvements made to qemu so it
supports seamless migration or something very close to it.
However, it requires libvirt interaction. Once qemu is migrated,
the SPICE server needs to send its internal state to the destination.
Once it's done, it fires SPICE_MIGRATE_COMPLETED event and this
fact is advertised in 'query-spice' output as well.
We must not kill qemu until SPICE server finishes the transfer.
SELinux wants all log files opened with O_APPEND. When
running non-root though, libvirtd likes to use O_TRUNC
to avoid log files growing in size indefinitely. Instead
of using O_TRUNC though, we can use O_APPEND and then
call ftruncate() which keeps SELinux happier.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
There are a number of process related functions spread
across multiple files. Start to consolidate them by
creating a virprocess.{c,h} file
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In most of the snapshot API's there's no need to hold the driver lock
the whole time.
This patch adds helper functions that get the domain object in functions
that don't require the driver lock and simplifies call paths from
snapshot-related API's.
maxcpu and hostcpus are defined and calculated in qemudDomainPinVcpuFlags()
and qemudDomainPinEmulator(), but never used. So remove them including nodeinfo.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Disk hotplug is a two phase action: qemuMonitorAddDrive followed by
qemuMonitorAddDevice. When the first part succeeds but the second one
fails, we need to rollback the drive addition.
https://www.gnu.org/licenses/gpl-howto.html recommends that
the 'If not, see <url>.' phrase be a separate sentence.
* tests/securityselinuxhelper.c: Remove doubled line.
* tests/securityselinuxtest.c: Likewise.
* globally: s/; If/. If/
The "dump-guest-core' option is new option for the machine type
(-machine pc,dump-guest-core) that controls whether the guest memory
will be marked as dumpable.
While testing this, I've found out that the value for the '-M' options
is not parsed correctly when additional parameters are used. However,
when '-machine' is used for the same options, it gets parsed as
expected. That's why this patch also modifies the parsing and creating
of the command line, so both '-M' and '-machine' are recognized. In
QEMU's help there is only mention of the 'machine parameter now with
no sign of the older '-M'.
This patch cleans up building the "-boot" parameter and while on that
fixes one inconsistency by modifying these things:
- I completed the unfinished virDomainBootMenu enum by specifying
LAST, declaring it and also declaring the TypeFromString and
TypeToString parameters.
- Previously mentioned TypeFromString and TypeToString are used when
parsing the XML.
- Last, but not least, visible change is that the "-boot" parameter
is built and parsed properly:
- The "order=" prefix is used only when additional parameters are
used (menu, etc.).
- It's rewritten in a way that other parameters can be added
easily in the future (used in following patch).
- The "order=" parameter is properly parsed regardless to where it
is placed in the string (e.g. "menu=on,order=nc").
- The "menu=" parameter (and others in the future) are created
when they should be (i.e. even when bootindex is supported and
used, but not when bootloader is selected).
Currently, we mark domain PAUSED (but not emit an event)
just before we issue 'stop' on monitor; This command can
take ages to finish, esp. when domain's doing a lot of
IO - users can enforce qemu to open files with O_DIRECT
which doesn't return from write() until data reaches the
block device. Having said that, we report PAUSED even if
domain is not paused yet.
This series adds support to run QEMU with seccomp sandbox enabled. It can be
configured in qemu.conf to on, off, or the QEMU default, which is off in 1.2.
Default value is the QEMU default.
On agent EOF the qemuProcessHandleAgentEOF() callback is called
which locks virDomainObjPtr. Then qemuAgentClose() is called
(with domain object locked) which eventually calls qemuAgentDispose()
and qemuProcessHandleAgentDestroy(). This tries to lock the
domain object again. Hence the deadlock.
All of ide-drive, ide-hd, ide-cd, scsi-disk, scsi-hd, and scsi-cd
supports wwn property. (NB, scsi-block doesn't support to set wwn).
* src/qemu/qemu_command.c: Error out if underlying QEMU doesn't
support wwn property for the device; Set wwn for the device otherwise.
* tests/qemuxml2argvdata/qemuxml2argv-disk-ide-wwn.args: New test
* tests/qemuxml2argvdata/qemuxml2argv-disk-ide-wwn.xml: Likewise
* tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-disk-wwn.args: Likewise
* tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-disk-wwn.xml: Likewise
* tests/qemuxml2argvtest.c: Add the new tests.
This assumes ide-drive.wwn, ide-hd.wwn, ide-cd.wwn were supported
at the same time, similar for scsi-disk.wwn, scsi-hd.wwn, and
scsi-cd.wwn. So only two new caps (QEMU_CAPS_IDE_DRIVE_WWN,
and QEMU_CAPS_SCSI_DISK_WWN) are introduced.
Upstream qemu has raised a concern about whether dumping guest
memory by reading guest paging tables is a security hole:
https://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02607.html
While auditing libvirt to see if we would be impacted, I noticed
that we had some dead code. It is simpler to nuke the dead code
and limit our monitor code to just the subset we make use of.
* src/qemu/qemu_monitor.h (QEMU_MONITOR_DUMP): Drop poorly named
and mostly-unused enum.
* src/qemu/qemu_monitor.c (qemuMonitorDumpToFd): Drop arguments.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDump): Likewise.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDump): Likewise.
* src/qemu/qemu_driver.c (qemuDumpToFd): Update caller.
If the qemuBuildCommandLine method raised an error before the
virCommandPtr instance was created, the local var would not
be initialized, resulting in a possible SEGV in the error
cleanup branch. Also add some debugging of the method params
Introduce a qemuCapsNewForBinary() API which creates a new
QEMU capabilities object, populated with data relating to
a specific QEMU binary. The qemuCaps object is also given
a timestamp, which makes it possible to detect when the
cached capabilities for a binary are out of date
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Don't bother checking for the existance of the HMP passthrough
command. Just try to execute it, and propagate the failure.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The qemuMonitorHMPCommand() API and things it calls will report
a wide variety of errors. The QEMU text monitor should not be
overwriting these errors
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch adds full support for EOI setting for domains. Because this
is CPU feature (flag), the model needs to be added even when it's not
specified. Fortunately this problem was already solved with kvmclock,
so this patch simply abuses that.
And due to the size of the patch (17 lines) I dared to include the tests.
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=843372
when qemu supports the 'transaction' monitor command,
and libvirt's --reuse-ext flag was not specified, libvirt created
a stub file with zero size in first place. After the failure of
QEMU transaction command performing qcow2 snapshots on more than
one drives, the stub file is left behind with non-empty
by the QEMU transaction command.
In order to unlink the file, the patch removes the file size checking.
Steps to reproduce the issue:
Steps:
1, Create a qemu instance with two drive images of qcow2 type (root user)
/usr/libexec/qemu-kvm -m 1024 -smp 1 -name "rhel6u1" \
-drive file=/var/lib/libvirt/images/firstqcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-drive file=/var/lib/libvirt/images/secondqcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 -qmp stdio
2, Initialize qemu qmp
{"execute":"qmp_capabilities"}
3, Remove the second drive image file
rm -f /var/lib/libvirt/images/secondqcow2
4, Run 'transaction' command with snapshot qemu commands in.
{"execute":"transaction","arguments":
{"actions":
[{"type":"blockdev-snapshot-sync","data":
{"device":"drive-virtio-disk0","snapshot-file":"/var/lib/libvirt/images/firstqcow2-snapshot.img","format":"qcow2"}
},
{"type":"blockdev-snapshot-sync","data":
{"device":"drive-virtio-disk1","snapshot-file":"/var/lib/libvirt/images/secondqcow2-snapshot.img","format":"qcow2"}
}]
},
"id":"libvirt-6"}
5, Got the error as follows:
{"id": "libvirt-6",
"error": {"class": "OpenFileFailed", "desc": "Could not open '/var/lib/libvirt/images/secondqcow2-snapshot.img'",
"data": {"filename": "/var/lib/libvirt/images/secondqcow2-snapshot.img"}
}
}
6, List first newly-created snapshot file:
-rw-r--r--. 1 root root 262144 Sep 13 11:43 firstqcow2-snapshot.img
The QEMU capabilities APIs used a misc of 'int' and
'unsigned int' for variables relating to array sizes.
Change all these to use 'size_t'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow each VM instance to record additional capabilities
without affecting other VMs, there needs to be a way to do
a deep copy of the qemuCapsPtr object
Add struct fields and APIs to allow the qemu capabilities object
to store version, arch, machines & cpu names, etc
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current qemu capabilities are stored in a virBitmapPtr
object, whose type is exposed to callers. We want to store
more data besides just the flags, so we need to move to a
struct type. This object will also need to be reference
counted, since we'll be maintaining a cache of data per
binary. This change introduces a 'qemuCapsPtr' virObject
class. Most of the change is just renaming types and
variables in all the callers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Technically speaking we should wait until we receive the QMP
greeting message before attempting to send any QMP monitor
commands. Mostly we've got away with this, but there is a race
in some QEMU which cause it to SEGV if you sent it data too
soon after startup. Waiting for the QMP greeting avoids the
race
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>