This introduces a hash table for qemu driver, to store the shared
disk's info as (@major:minor, @ref_count). @ref_count is the number
of domains which shares the disk.
Since we only care about if the disk support unprivileged SG_IO
commands, and the SG_IO commands only make sense for block disk,
this patch only manages (add/remove hash entry) the shared disk for
block disk.
* src/qemu/qemu_conf.h: (Add member 'sharedDisks' of type
virHashTablePtr; Declare helpers
qemuGetSharedDiskKey, qemuAddSharedDisk
and qemuRemoveSharedDisk)
* src/qemu/qemu_conf.c (Implement the 3 helpers)
* src/qemu/qemu_process.c (Update 'sharedDisks' when domain
starting and shutdown)
* src/qemu/qemu_driver.c (Update 'sharedDisks' when attaching
or detaching disk).
When the disk alignment check done while redefining an existing snapshot
failed, the qemu driver attempted to free the existing snapshot. As in
the cleanup path the definition of the snapshot wasn't assigned, the
cleanup code dereferenced a NULL pointer.
This patch changes the behavior on error paths while redefining snapshot
in two ways:
1) On failure, modifications done on the snapshot definition object are
rolled back.
2) The previous definition of the data isn't freed until it's certain it
won't be needed any more.
This change avoids the segfault and additionally the snapshot doesn't
vanish if redefinition fails for some reason.
This also changes the function signature to take a
virDomainChrSourceDefPtr instead of just a path, since it needs to
differentiate behavior based on source->type.
The functionality provided in virchrdev.c (previously virconsole.c) is
applicable to other types of character devices besides consoles, such
as channels. This patch is just code motion, renaming things such as
"console" or "pty", instead using more general terms such as
"character device" or "device path".
Since 4c993d8a we failed to set this important capability, which
allows starting a domain with QXL video card. We set DEVICE_QXL
capability bit instead, which is not necessary wrong. Anyway, if
qemu supports the new '-device qxl' it supports older '-vga qxl'
as well. The latter is used for the primary (the first) qxl video
card, the former for other video cards.
Commit b3f2b4ca5c left buf unallocated in
the case of QMP capability probing being used, leading to a segfault in
strlen in the cleanup path.
This patch opens the log and allocates the buffer if QMP probing was
used, so we can display the helpful error message.
Despite our great effort we still parsed qemu log output.
We wouldn't notice unless upcoming qemu 1.4 changed the
format of the logs slightly. Anyway, now we should gather
all interesting knobs like pty paths from monitor. Moreover,
since for historical reasons the first console can be just
an alias to the first serial port, we need to check this and
copy the pty path if that's the case to the first console.
This reverts commit 28224c4d2a
which shouldn't be needed at all because with current qemu
we obtain all paths from 'query-chardev' output. We ought
not parse log output at all anymore.
Since 586502189edf9fd0f89a83de96717a2ea826fdb0 qemu commit, the log
lines reporting chardev's path has changed from:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device redirected to /dev/pts/5
char device redirected to /dev/pts/6
char device redirected to /dev/pts/7
to:
$ ./x86_64-softmmu/qemu-system-x86_64 -serial pty -serial pty -monitor pty
char device compat_monitor0 redirected to /dev/pts/5
char device serial0 redirected to /dev/pts/6
char device serial1 redirected to /dev/pts/7
However, with current code we are not prepared for such change, which
results in us being unable to start any domain.
Many internal qemu APIs must find domain object from passed
virDomainPtr. And with function Peter's introduced, we can use it
instead of copying multiple lines among code.
Since we switched to QMP probing, the object types are spelled out
explicitly, i.e. virtio-net-pci. This has effectively disabled
the capability detection of s390 virtio devices. The trivial fix
is to add the s390 virtio types explicitly to qemuCapsObjectProps.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=888426
The code for doing a block-copy was supposed to track the destination
file in drive->mirror, but was set up to do all mallocs prior to
starting the copy so that OOM wouldn't leave things partially started.
However, the wrong variable was being written; later in the code we
silently did 'disk->mirror = mirror' which was still NULL, and thus
leaking memory and leaving libvirt to think that the mirror job was
never started, which prevented a pivot operation after a copy.
Problem introduced in commit 35c7701c6.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Initialize correct
variable.
To bring in line with new naming practice, rename the=
src/util/cgroup.{h,c} files to vircgroup.{h,c}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, it only considers PTY backend serial devices for pseries.
It need to support all kinds of serial devices.
This patch is to fix the problem which is that it doesn't work
when specifying source type as file.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
ACPI is only supported on x86 platform, PPC can't support it.
So QEMU_CAPS_NO_ACPI shouldn't be set.
This patch is to remove QEMU_CAPS_NO_ACPI capability for
non-x86 platform.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Historically there was an inconsistency in handling of the
itanium arch. The xen driver & CPU model code treated it
as 'ia64' but the QEMU capabilities code used 'itanium'. On
the grounds that no one has ever seriously used itanium
with QEMU, while RHEL shipped itanium with Xen, we should
favour 'ia64' as the canonical format
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of datatype change
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When LXC labels USB devices during hotplug, it is running in
host context, so it needs to pass in a vroot path to the
container root.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
'-device VGA' maps to '-vga std'
'-device cirrus-vga' maps to '-vga cirrus'
'-device qxl-vga' maps to '-vga qxl'
(there is also '-device qxl' for secondary devices)
'-device vmware-svga' maps to '-vga vmware'
For qemu(>=1.2), we can use -device to replace -vga for video
device. For the primary video device, the patch tries to use 0x2
slot for matching old qemu. If the 0x2 slot is allocated already,
the addr property could help for using any available slot.
For qemu(< 1.2), we keep using -vga for primary device.
QEMU_CAPS_DEVICE_QXL -device qxl
QEMU_CAPS_DEVICE_VGA -device VGA
QEMU_CAPS_DEVICE_CIRRUS_VGA -device cirrus-vga
QEMU_CAPS_DEVICE_VMWARE_SVGA -device vmware-svga
QEMU_CAPS_DEVICE_VIDEO_PRIMARY /* safe to use -device XXX
for primary video device */
Fix a typo in qemuCapsObjectTypes, the string 'qxl' here
should be -device qxl rather than -vga [...|qxl|..]
Noticed these while building on FreeBSD.
* src/qemu/qemu_monitor.c (qemuMonitorBlockInfoLookup): Rename
variable to avoid 'devname' collision.
* src/qemu/qemu_driver.c (qemuDomainInterfaceStats): Mark unused
variable.
When a network device's bridge connection is changed by
virDomainUpdateDevice, libvirt first removes the netdev's tap from its
old bridge, then adds it to the new bridge. Sometimes, due to a
network being destroyed while a guest device is still attached, the
tap may already be "removed" from the old bridge (or the old bridge
may not even exist any more); the existing code was needlessly failing
the update when this happened, making it impossible to recover from
the situation without completely detaching (i.e. removing) the netdev
from the guest and re-attaching.
Instead of failing the entire operation when removal of the tap from
the old bridge fails, this patch changes qemuDomainChangeNetBridge to
just log a warning and continue, allowing a reasonable recover from
the situation.
(you'll appreciate this change if you ever accidentally destroy a
network while your guests are still using it).
Refactor virLockManagerPluginNew() so that the caller does
not need to pass in the config file path itself - just the
config directory and driver name.
Fix QEMU to actually pass in a config file when creating the
default lock manager plugin, rather than NULL.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
* Autotools changes:
- Don't assume Qemu is Linux-only
- Check Linux headers only on Linux
- Disable firewalld on FreeBSD
* Initctl:
Initctl seem to present only on Linux, so stub it on other platforms
* Raw I/O: Linux-only as well
* Headers cleanup
This patch adds a new domain lookup helper qemuDomObjFromDomainDriver
that lookups the domain and leaves the driver locked. The driver is
returned as the second argument of that function. If the lookup fails
the driver is unlocked to help avoid cleanup codepaths.
This patch also improves docs for the helpers.
When a qemu domain is backed by huge pages, apparmor needs to grant the domain
rw access to files under the hugetlbfs mount point. Add a hook, called in
qemu_process.c, which ends up adding the read-write access through
virt-aa-helper. Qemu will be creating a randomly named file under the
mountpoint and unlinking it as soon as it has mmap()d it, therefore we
cannot predict the full pathname, but for the same reason it is generally
safe to provide access to $path/**.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
This patch exports qemuMigrationIsAllowed and adds a new parameter to it
to denote if it's a remote migration or a local migration. Local
migrations are used in snapshots and saving of the machine state and
have fewer restrictions. This patch also adjusts callers of the function
and tweaks some error messages to be more universal.
Currently there is no way to detect it via QMP and requesting "-sandbox
off" works correctly even if it was compiled out, so this will work
unless someone both requests the sandbox in qemu.conf and builds QEMU
without the support for it.
These classes can borrow unused bandwidth. Basically,
only egress qdsics can have classes, therefore we can
do this kind of traffic shaping only on host's outgoing,
that is domain's incoming traffic.
When the disk snapshot part of an external system checkpoint fails the
memory image is retained. This patch adds code to remove the image in
such case.
In case the snapshot code isn't able to restart CPUs after an external
checkpoint we would leak a copy of the domains XML definition. This
patch fixes the cleanup path.
False positive, but it breaks the build with gcc-4.6.3.
qemu/qemu_migration.c:2931:37: error: 'offline' may be used
uninitialized in this function [-Werror=uninitialized]
qemu/qemu_migration.c:2887:10: note: 'offline' was declared here
When restarting CPUs after an external snapshot, the restarting function
was called without the appropriate async job type. This caused that a
new sync job wasn't created and allowed races in the monitor.
Offline migration transfers inactive definition of a domain (which may
or may not be active). After successful completion, the domain remains
in its current state on source host and is defined but inactive on
destination host. It's a bit more clever than virDomainGetXMLDesc() on
source host followed by virDomainDefineXML() on destination host, as
offline migration will run pre-migration hook to update the domain XML
on destination host. Currently, copying non-shared storage is not
supported during offline migration.
Offline migration can be requested with a new migration flag called
VIR_MIGRATE_OFFLINE (which has to be combined with
VIR_MIGRATE_PERSIST_DEST flag).
This fixes a problem that showed up during testing of:
https://bugzilla.redhat.com/show_bug.cgi?id=881480
Due to a logic error in the function that gets the name of the bridge
an interface connects to, any time a bridge was specified directly
(type='bridge') rather than indirectly (type='network'), An error
would be logged (although the operation would then complete
successfully):
Network type 6 is not supported
The final virReportError() in the function
qemuDomainNetGetBridgeName() was apparently avoided in the past with a
"goto cleanup" at the end of each case, but the case of bridge somehow
no longer has that final goto cleanup.
The proper solution is anyway to not rely on goto's, but put the error
log inside an else {} clause, so that it's executed only if the type
is neither bridge nor network (in reality, this function should only
ever be called for those two types, that's why this is an internal
error).
While making this change, the error message was also tuned to be more
correct (since it's not really the type of the network, but the type
of the interface, and it *is* otherwise supported, it's just that the
interface type in question doesn't *have* a bridge device associated
with it, or at least we don't know how to get it).
If a network interface model is not specified, libvirt will run
into an unchecked NULL pointer coredump. On the other hand if
the empty model is ignored, a PCI bus address would be generated,
which is not supported by S390.
Since the only valid network type model for S390 is virtio,
we use this as the default value, which is the same for QEMU.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Things are supposed to look like:
<machine canonical='pc-0.12'>pc</machine>
But are currently swapped. This can cause many VMs to revert to having
machine type='pc' which will affect save/restore across qemu upgrades.
QEMU supports setting vendor and product strings for disk since
1.2.0 (only scsi-disk, scsi-hd, scsi-cd support it), this patch
exposes it with new XML elements <vendor> and <product> of disk
device.
Unmanaged PCI devices were only leaked if pciDeviceListAdd failed but
managed devices were always leaked. And leaking PCI device is likely to
leave PCI config file descriptor open. This patch fixes
qemuReattachPciDevice to either free the PCI device or add it to the
inactivePciHostdevs list.
An attempt to attach device that is already attached to a domain results
in the following error:
virsh # attach-device rhel6 pci2 --persistent
error: Failed to attach device from pci2
error: invalid argument: device is already in the domain configuration
The "invalid argument" error code looks wrong, we usually use "operation
invalid" when the action cannot be done in current state.
Only one error in qemu_monitor was already using the relatively
new OPERATION_UNSUPPORTED error, even though it is a better fit
for all of the messages related to options that are unsupported
due to the version of qemu in use rather than due to a user's
XML or .conf file choice. Suggested by Osier Yang.
* src/qemu/qemu_monitor.c (qemuMonitorSendFileHandle)
(qemuMonitorAddHostNetwork, qemuMonitorRemoveHostNetwork)
(qemuMonitorAttachDrive, qemuMonitorDiskSnapshot)
(qemuMonitorDriveMirror, qemuMonitorTransaction)
(qemuMonitorBlockCommit, qemuMonitorDrivePivot)
(qemuMonitorBlockJob, qemuMonitorSystemWakeup)
(qemuMonitorGetVersion, qemuMonitorGetMachines)
(qemuMonitorGetCPUDefinitions, qemuMonitorGetCommands)
(qemuMonitorGetEvents, qemuMonitorGetKVMState)
(qemuMonitorGetObjectTypes, qemuMonitorGetObjectProps)
(qemuMonitorGetTargetArch): Use better error category.
Without this patch, attempts to create a disk snapshot when qemu
is too old results in a cryptic message:
virsh # snapshot-create 23 --disk-only
error: operation failed: Failed to take snapshot: unknown command: 'snapshot_blkdev'
Now it reports:
virsh # snapshot-create 23 --disk-only
error: unsupported configuration: live disk snapshot not supported with this QEMU binary
All versions of qemu that support live disk snapshot also support
QMP (basically upstream qemu 1.1 and later, and backports to RHEL 6.2).
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DISK_SNAPSHOT): New
capability.
* src/qemu/qemu_capabilities.c (qemuCaps): Track it.
(qemuCapsProbeQMPCommands): Set it.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateDiskActive): Use
it.
* src/qemu/qemu_monitor.c (qemuMonitorDiskSnapshot): Simplify.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskSnapshot):
Likewise.
* src/qemu/qemu_monitor_text.h (qemuMonitorTextDiskSnapshot):
Delete.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextDiskSnapshot):
Likewise.
Currently to deal with auto-shutdown libvirtd must periodically
poll all stateful drivers. Thus sucks because it requires
acquiring both the driver lock and locks on every single virtual
machine. Instead pass in a "inhibit" callback to virStateInitialize
which drivers can invoke whenever they want to inhibit shutdown
due to existance of active VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since we can't (currently) rely on the ability to provide blanket
support for all possible network changes by calling the toplevel
netdev hostside disconnect/connect functions (due to qemu only
supporting a lockstep between initialization of host side and guest
side of devices), in order to support live change of an interface's
nwfilter we need to make a special purpose function to only call the
nwfilter teardown and setup functions if the filter for an interface
(or its parameters) changes. The pattern is nearly identical to that
used to change the bridge that an interface is connected to.
This patch was inspired by a request from Guido Winkelmann
<guido@sagersystems.de>, who tested an earlier version.
The fact that only the guest agent, or ACPI flag can be used
when requesting reboot/shutdown is merely a limitation of the
QEMU driver impl at this time. Thus it should not be in
libvirt.c code
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The default machine type must be stored in the first element of
the caps->machineTypes array. This was done for help output
parsing but not for QMP probing.
Added a helper function qemuSetDefaultMachine to apply the same
fix up for both probing methods.
Further, it was necessary to set caps->nmachineTypes after QMP
probing.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
https://bugzilla.redhat.com/show_bug.cgi?id=872292
Libvirt should not attempt to call a QMP command that has not been
documented in qemu.git - if future qemu introduces a command by the
same name but with subtly different semantics, then libvirt will be
broken when trying to use that command.
We also had some code that could never be reached - some of our
commands have an alternate for new vs. old qemu HMP commands; but
if we are new enough to support QMP, we only need a fallback to
the new HMP counterpart, and don't need to try for a QMP counterpart
for the old HMP version.
See also this attempt to convert the three snapshot commands to QMP:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01597.html
although it looks like that will still not happen before qemu 1.3.
That thread eventually decided that qemu would use the name
'save-vm' rather than 'savevm', which mitigates the fact that
libvirt's attempt to use a QMP 'savevm' would be broken, but we
might not be as lucky on the other commands.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONSetCPU)
(qemuMonitorJSONAddDrive, qemuMonitorJSONDriveDel)
(qemuMonitorJSONCreateSnapshot, qemuMonitorJSONLoadSnapshot)
(qemuMonitorJSONDeleteSnapshot): Use only HMP fallback for now.
(qemuMonitorJSONAddHostNetwork, qemuMonitorJSONRemoveHostNetwork)
(qemuMonitorJSONAttachDrive, qemuMonitorJSONGetGuestDriveAddress):
Delete; QMP implies QEMU_CAPS_DEVICE, which prefers AddNetdev,
RemoveNetdev, and AddDrive anyways (qemu_hotplug.c has all callers).
* src/qemu/qemu_monitor.c (qemuMonitorAddHostNetwork)
(qemuMonitorRemoveHostNetwork, qemuMonitorAttachDrive): Reflect
deleted commands.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONAddHostNetwork)
(qemuMonitorJSONRemoveHostNetwork, qemuMonitorJSONAttachDrive):
Likewise.
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
Commit 1b2ebf95 overlooked that there were two spots affected.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice):
Transfer backing chain before deletion.
* src/qemu/qemu_driver.c (qemuDomainDetachDeviceDiskLive): Fix
spacing (partly to ensure a different-looking patch).
This patch adds two labels and gets rid of a ton of duplicated code.
This patch also fixes some error message and switches most of them to
proper error reporting functions.
This patch adds macros to help retrieve configuration values from qemu
driver's configuration. Some configuration options are grouped
together in the process.
The virStateInitialize method and several cgroups methods were
using an 'int privileged' parameter or similar for dual-state
values. These are better represented with the bool type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
As of 1a50ba2cb0 we fail to connect to the
monitor instead of getting an exit status != 0 from qemu itself. This
breaks capabilities probing for the non QMP case.
Replace the following names
* struct qemu_snap_remove with virQEMUSnapRemovePtr
* struct qemu_snap_reparent with virQEMUSnapReparentPtr
* struct qemu_save_header with virQEMUSaveHeaderPtr
* enum qemu_save_formats with virQEMUSaveFormat
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Remove the obsolete 'qemud' naming prefix and underscore
based type name. Introduce virQEMUDriverPtr as the replacement,
in common with LXC driver naming style
Throughout the code, we've always used VIR_DOMAIN_SHUTDOWN* flags
even for virDomainReboot() API and its implementation. Fortunately,
the appropriate macros has the same value. But if we want to keep
things consistent, we should be using the correct macros. This
patch doesn't break anything, luckily.
Error messages produced while dispatching guest agent commands didn't
have an apparent reference to the fact that they are dealing with guest
agent commands. This patch fixes up some of the messages to contain that
reference.
using qemu guest agent. As said in previous patch,
@mountPoint must be NULL and @flags zero because
qemu guest agent doesn't support these arguments
yet. If qemu learns them, we can start supporting
them as well.
With QMP capability probing, the version was not set.
virsh version returns:
...
Cannot extract running QEMU hypervisor version
This is fixed by computing caps->version from QMP major,
minor, micro values.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
QMP Capability probing will fail if QEMU cannot bind to the
QMP monitor socket in the qemu_driver->libDir directory.
That's because the child process is stripped of all
capabilities and this directory is chown'ed to the configured
QEMU user/group (normally qemu:qemu) by the QEMU driver.
To prevent this from happening, the driver startup will now pass
the QEMU uid and gid down to the capability probing code.
All capability probing invocations of QEMU will be run with
the configured QEMU uid instead of libvirtd's.
Furter, the pid file handling is moved to libvirt, as QEMU
cannot write to the qemu_driver->runDir (root:root). This also
means that the libvirt daemonizing must be used.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
If qemuMonitorOpenUnix is called without a related pid, i.e. for
QMP probing, a connect failure can happen as the result of a race.
Without a pid there is no retry and thus we give up too early.
This changes the code to retry if no pid is supplied.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Change some legacy function names to use 'qemu' as their
prefix instead of 'qemud' which was a hang over from when
the QEMU driver ran inside a separate daemon
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876828
Commit 38c4a9cc introduced a regression in hot unplugging of disks
from qemu, where cgroup device ACLs were no longer being revoked
(thankfully not a security hole: cgroup ACLs only prevent open()
of the disk; so reverting the ACL prevents future abuse but doesn't
stop abuse from an fd that was already opened before the ACL change).
The actual regression is due to a latent bug. The hot unplug code
was computing the set of files needing cgroup ACL revocation based
on the XML passed in by the user, rather than based on the domain's
details on which disk was being deleted. As long as the revoke
path was always recomputing the backing chain, this didn't really
matter; but now that we want to compute the chain exactly once and
remember that computation, we need to hang on to the backing chain
until after the revoke has happened.
* src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice):
Transfer backing chain before deletion.
This patch introduces the RNG schema and updates necessary data strucutures
to allow various hypervisors to make use of Gluster protocol as one of the
supported network disk backend. Next patch will add support to make use of
this feature in Qemu since it now supports Gluster protocol as one of the
network based storage backend.
Two new optional attributes for <host> element are introduced - 'transport'
and 'socket'. Valid transport values are tcp, unix or rdma. If none specified,
tcp is assumed. If transport is unix, socket specifies path to unix socket.
This patch allows users to specify disks on gluster backends like this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume1/image'>
<host name='example.org' port='6000' transport='tcp'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<source protocol='gluster' name='Volume2/image'>
<host transport='unix' socket='/path/to/sock'/>
</source>
<target dev='vdb' bus='virtio'/>
</disk>
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Although we require various C99 features, we don't yet require a
complete C99 compiler. On RHEL 5, compilation complained:
qemu/qemu_command.c: In function 'qemuBuildGraphicsCommandLine':
qemu/qemu_command.c:4688: error: 'for' loop initial declaration used outside C99 mode
* src/qemu/qemu_command.c (qemuBuildGraphicsCommandLine): Declare
variable sooner.
* src/qemu/qemu_process.c (qemuProcessInitPasswords): Likewise.
The error "... but the cause is unknown" appeared for XMLs similar to
this:
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/dev/zero'/>
<target dev='sr0'/>
</disk>
Notice unsupported disk type (for the driver), but also no address
specified. The first part is not a problem and we should not abort
immediately because of that, but the combination with the address
unknown was causing an unspecified error.
While fixing this, I added an error to one place where this return
value was not managed properly.
I have been testing libvirt v1.0.0 for deployment within my
organization, and in the process discovered what appears to be a bug
that breaks virsh attach-device, when attaching an RBD volume to an
instance. First, here is the error presented, with v1.0.0 (this worked
in v0.10.2):
[root@host ~]# virsh attach-device W5APQ8 G84VV1.xml
error: Failed to attach device from G84VV1.xml
error: cannot open file 'dc3-1-test/G84VV1': No such file or directory
Using git bisect, I narrowed the problem down to this as the first
commit to break this setup:
4d34c92947 is the first bad commit
Upcoming patches for revert-and-clone branching of snapshots need
to be able to copy a domain definition; make this step reusable.
* src/conf/domain_conf.h (virDomainDefCopy): New prototype.
* src/conf/domain_conf.c (virDomainObjCopyPersistentDef): Split...
(virDomainDefCopy): ...into new function.
(virDomainObjSetDefTransient): Use it.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Use it.
This patch adds a helper to determine if snapshots are external and uses
the helper to fix detection of those in snapshot deletion code.
Snapshots are external if they have an external memory image or if the
disk locations are external. As mixed snapshots are forbidden for now
we need to check just one disk to know.
Currently, if user calls virDomainAbortJob we just issue
'migrate_cancel' and hope for the best. However, if user calls
the API in wrong phase when migration hasn't been started yet
(perform phase) the cancel request is just ignored. With this
patch, the request is remembered and as soon as perform phase
starts, migration is cancelled.
For S390, the default console target type cannot be of type 'serial'.
It is necessary to at least interpret the 'arch' attribute
value of the os/type element to produce the correct default type.
Therefore we need to extend the signature of defaultConsoleTargetType
to account for architecture. As a consequence all the drivers
supporting this capability function must be updated.
Despite the amount of changed files, the only change in behavior is
that for S390 the default console target type will be 'virtio'.
N.B.: A more future-proof approach could be to to use hypervisor
specific capabilities to determine the best possible console type.
For instance one could add an opaque private data pointer to the
virCaps structure (in case of QEMU to hold capsCache) which could
then be passed to the defaultConsoleTargetType callback to determine
the console target type.
Seems to be however a bit overengineered for the use case...
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
When the libvirt daemon is restarted it tries to reconnect to running
qemu domains. Since commit d38897a5d4 the
re-connection code runs in separate threads. In the original
implementation the maximum of domain ID's (that is used as an
initializer for numbering guests created next) while libvirt was
reconnecting to the guest.
With the threaded implementation this opens a possibility for race
conditions with the thread that is autostarting guests. When there's a
guest running with id 1 and the daemon is restarted. The autostart code
is reached first and spawns the first guest that should be autostarted
as id 1. This results into the following unwanted situation:
# virsh list
Id Name State
----------------------------------------------------
1 guest1 running
1 guest2 running
This patch extracts the detection code before the re-connection threads
are started so that the maximum id of the guests being reconnected to is
known.
The only semantic change created by this is if the guest with greatest ID
quits before we are able to reconnect it's ID is used anyway as the
greatest one as without this patch the greatest ID of a process we could
successfuly reconnect to would be used.
This patch adds support for external disk snapshots of inactive domains.
The snapshot is created by calling using qemu-img by calling:
qemu-img create -f format_of_snapshot -o
backing_file=/path/to/src,backing_fmt=format_of_backing_image
/path/to/snapshot
in case the backing image format is known or probing is allowed and
otherwise:
qemu-img create -f format_of_snapshot -o backing_file=/path/to/src
/path/to/snapshot
on each of the disks selected for snapshotting. This patch also modifies
the snapshot preparing function to support creating external snapshots
and to sanitize arguments. For now the user isn't able to mix external
and internal snapshots but this restriction might be lifted in the
future.
Some operations, APIs needs domain to be paused prior operation can be
performed, e.g. (managed-) save of a domain. The processors should be
restored in the end. However, if 'cont' fails for some reason, we log a
message but this is not sufficient as an event should be emitted as
well. Mgmt application can then decide what to do.
The code that was split out into the qemuDomainSaveMemory expands the
pointer containing the XML description of the domain that it gets from
higher layers. If the pointer changes the old one is invalid and the
upper layer function tries to free it causing an abort.
This patch changes the expansion of the original string to a new
allocation and copy of the contents.
qemu is sensitive to the order of arguments passed. Hence, if a
device requires a controller, the controller cmd string must
precede device cmd string. The same apply for controllers, when
for instance ccid controller requires usb controller. So
controllers create partial ordering in which they should be added
to qemu cmd line.
Some of the pre-snapshot check have restrictions wired in regarding
configuration options that influence taking of external checkpoints.
This patch removes restrictions that would inhibit taking of such a
snapshot.
This patch adds support to take external system checkpoints.
The functionality is layered on top of the previous disk-only snapshot
code. When the checkpoint is requested the domain memory is saved to the
memory image file using migration to file. (The user may specify to
take the memory image while the guest is live with the
VIR_DOMAIN_SNAPSHOT_CREATE_LIVE flag.)
The memory save image shares format with the image created by
virDomainSave() API.
Before now, libvirt supported only internal snapshots for active guests.
This patch renames this function to qemuDomainSnapshotCreateActiveInternal
to prepare the grounds for external active snapshots.
The new external system checkpoints will require an async job while the
snapshot is taken. This patch adds QEMU_ASYNC_JOB_SNAPSHOT to track this
job type.
The code that saves domain memory by migration to file can be reused
while doing external checkpoints of a machine. This patch extracts the
common code and places it in a separate function.
When pausing the guest while migration is running (to speed up
convergence) the virDomainSuspend API checks if the migration job is
active before entering the job. This could cause a possible race if the
virDomainSuspend is called while the job is active but ends before the
Suspend API enters the job (this would require that the migration is
aborted). This would cause a incorrect event to be emitted.
Both system checkpoint snapshots and disk snapshots were iterating
over all disks, doing a final sanity check before doing any work.
But since future patches will allow offline snapshots to be either
external or internal, it makes sense to share the pass over all
disks, and then relax restrictions in that pass as new modes are
implemented. Future patches can then handle external disks when
the domain is offline, then handle offline --disk-snapshot, and
finally, combine with migration to file to gain a complete external
system checkpoint snapshot of an active domain without using 'savevm'.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotIsAllowed): Merge...
(qemuDomainSnapshotPrepare): ...into one function.
(qemuDomainSnapshotCreateXML): Update caller.
Now that the XML supports listing internal snapshots, it is worth
always populating the <memory> and <disks> element to match.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Always
parse disk info and set memory info.
The libvirt coding standard is to use 'function(...args...)'
instead of 'function (...args...)'. A non-trivial number of
places did not follow this rule and are fixed in this patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=871273
when using virsh qemu-attach to attach an existing qemu process,
if it misses the -M option in qemu command line, libvirtd crashed
because the NULL value of def->os.machine in later use.
Example:
/usr/libexec/qemu-kvm -name foo \
-cdrom /var/lib/libvirt/images/boot.img \
-monitor unix:/tmp/demo,server,nowait \
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
This patch tries to set default machine type if the value of
def->os.machine is still NULL after qemu command line parsing.
Per the code comment in qemuCapsInitQMPBasic() and commit 43e23c7, we
should only use QMP for capabilities probing starting with 1.2 and
newer. The old code had dead logic that probed on 1.0 and newer.
Signed-off-by: Eric Blake <eblake@redhat.com>
The string comparison logic was inverted and matched the first drive
that does *not* have the name we search for.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The QEMU -drive id= begins with libvirt's QEMU host drive prefix
("drive-"), which is stripped off in several places two convert between
host ("-drive") and guest ("-device") device names.
In the case of BlkIoTune it is unnecessary to strip the QEMU host drive
prefix because we operate on "info block"/"query-block" output that uses
host drive names.
Stripping the prefix incorrectly caused string comparisons to fail since
we were comparing the guest device name against the host device name.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
QEMU uses 'i386' for its 32-bit x86 architecture, but libvirt
wants that to be 'i686', so we must fix it up
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=871756
Commit cd1e8d1 assumed that systems new enough to have journald
also have mkostemp; but this is not true for uclibc.
For that matter, use of mkstemp[s] is unsafe in a multi-threaded
program. We should prefer mkostemp[s] in the first place.
* bootstrap.conf (gnulib_modules): Add mkostemp, mkostemps; drop
mkstemp and mkstemps.
* cfg.mk (sc_prohibit_mkstemp): New syntax check.
* tools/virsh.c (vshEditWriteToTempFile): Adjust caller.
* src/qemu/qemu_driver.c (qemuDomainScreenshot)
(qemudDomainMemoryPeek): Likewise.
* src/secret/secret_driver.c (replaceFile): Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainScreenshot): Likewise.
https://bugzilla.redhat.com/show_bug.cgi?id=871312
Recent fixes made almost all the right steps to make emulator pinned
to the cpuset of the whole domain in case <emulatorpin> isn't
specified, but qemudDomainGetEmulatorPinInfo still reports all the
CPUs even when cpuset is specified. This patch fixes that.
When there is no 'qemu-kvm' binary and the emulator used for a machine
is, for example, 'qemu-system-x86_64' that, by default, runs without
kvm enabled, libvirt still supplies '-no-kvm' option to this process,
even though it does not recognize such option (making the start of a
domain fail in that case).
This patch fixes building a command-line for QEMU machines without KVM
acceleration and is based on following assumptions:
- QEMU_CAPS_KVM flag means that QEMU is running KVM accelerated
machines by default (without explicitly requesting that using a
command-line option). It is the closest to the truth according to
the code with the only exception being the comment next to the
flag, so it's fixed in this patch as well.
- QEMU_CAPS_ENABLE_KVM flag means that QEMU is, by default, running
without KVM acceleration and in case we need KVM acceleration it
needs to be explicitly instructed to do so. This is partially
true for the past (this option essentially means that QEMU
recognizes the '-enable-kvm' option, even though it's almost the
same).
Currently, we use iohelper when saving/restoring a domain.
However, if there's some kind of error (like I/O) it is not
propagated to libvirt. Since it is not qemu who is doing
the actual write() it will not get error. The iohelper does.
Therefore we should check for iohelper errors as it makes
libvirt more user friendly.
In the XML warning, we print a virsh command line that can be used to
edit that XML. This patch prints UUIDs if the entity name contains
special characters (like shell metacharacters, or "--" that would break
parsing of the XML comment). If the entity doesn't have a UUID, just
print the virsh command that can be used to edit it.
When using block copy to pivot over to a new chain, the backing files
for the new chain might still need labeling (particularly if the user
passes --reuse-ext with a relative backing file name). Relabeling a
file that is already labeled won't hurt, so this just labels the entire
chain at the point of the pivot. Doing the relabel of the chain uses
the fact that we already safely probed the file type of an external
file at the start of the block copy.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Relabel chain before
asking qemu to pivot.
Use the recent addition of qemuDomainPrepareDiskChainElement to
obtain locking manager lease, permit a block device through cgroups,
and set the SELinux label; then audit the fact that we hand a new
file over to qemu. Alas, releasing the lease and label at the end
of the mirroring is a trickier prospect (we would have to trace the
backing chain of both source and destination, and be sure not to
revoke rights to any part of the chain that is shared), so for now,
virDomainBlockJobAbort still leaves things with additional access
granted (as block-pull and block-commit have the same problem of
not clamping access after completion, a future cleanup would cover
all three commands).
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling.
Support the REUSE_EXT flag, in part by copying sanity checks from
snapshot code. This code introduces a case of probing an external
file for its type; such an action would be a security risk if the
existing file is supposed to be raw but the contents resemble some
other format; however, since the virDomainBlockRebase API has a
flag to force treating the file as raw rather than probe, we can
assume that probing is safe in all other instances. Besides, if
we don't probe or force raw, then qemu will.
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT
flag.
(qemuDomainBlockCopy): Wire up flag, and add some sanity checks.
Minimal patch to wire up all the pieces in the previous patches
to actually enable a block copy job. By minimal, I mean that
qemu creates the file (that is, no REUSE_EXT flag support yet),
SELinux must be disabled, a lock manager is not informed, and the
audit logs aren't updated. But those will be added as
improvements in future patches.
This patch is designed so that if we ever add a future API
virDomainBlockCopy with more bells and whistles (such as letting
the user specify a destination image format different than the
source), where virDomainBlockRebase is a wrapper around the
simpler portions of the new functionality, then the new API can
just reuse the new qemuDomainBlockCopy function and already
support _SHALLOW and _REUSE_EXT flags. Also note that libvirt.c
already filtered the new flags if _COPY is not present, so that
we are not impacting the case of BlockRebase being a wrapper
around BlockPull.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function.
(qemuDomainBlockRebase): Call it when appropriate.
Since libvirt drops locks between issuing a monitor command and
getting a response, it is possible for libvirtd to be restarted
before getting a response on a block-job-complete command; worse, it
is also possible for the guest to shut itself down during the window
while libvirtd is down, ending the qemu process. A management app
needs to know if the pivot happened (and the destination file
contains guest contents not in the source) or failed (and the source
file contains guest contents not in the destination), but since
the job is finished, 'query-block-jobs' no longer tracks the
status of the job, and if the qemu process itself has disappeared,
even 'query-block' cannot be checked to ask qemu its current state.
At the time of this patch, the design for persistent bitmap has not
been clarified, so a followup patch will be needed once qemu
actually figures out how to expose it, and we figure out how to use
it. In the meantime, we have a solution that avoids the worst of
the problem. [This problem was first analyzed with the RHEL 6.3
__com.redhat_drive-reopen command; which partly explains why
upstream qemu 1.3 ditched the drive-reopen idea and went with
block-job-complete plus persistent bitmap instead.]
If we surround 'drive-reopen' with a pause/resume pair, then we can
guarantee that the guest cannot modify either source or destination
files in the window of libvirtd uncertainty, and the management app
is guaranteed that either libvirt knows the outcome and reported it
correctly; or that on libvirtd restart, the guest will still be
paused and that the qemu process cannot have disappeared due to
guest shutdown; and use that as a clue that the management app must
implement recovery protocol, with both source and destination files
still being in sync and with 'query-block' still being an option as
part of that recovery. My testing shows that the pause window will
typically be only a fraction of a second.
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Pause around
drive-reopen.
(qemuDomainBlockJobImpl): Update caller.
This is the bare minimum to end a copy job (of course, until a
later patch adds the ability to start a copy job, this patch
doesn't do much in isolation; I've just split the patches to
ease the review).
This patch intentionally avoids SELinux, lock manager, and audit
actions. Also, if libvirtd restarts at the exact moment that a
'block-job-complete' is in flight, the proposed proper way to
detect the outcome of that would be with a persistent bitmap and
some additional query commands when libvirtd restarts. This
patch is enough to test the common case of success when used
correctly, while saving the subtleties of proper cleanup for
worst-case errors for later.
When a mirror job is started, cancelling the job safely reverts back
to the source disk, regardless of whether the destination is in
phase 1 (streaming, in which case the destination is worthless) or
phase 2 (mirroring, in which case the destination is synced up to
the source at the time of the cancel). Our existing code does just
fine in either phase, other than some bookkeeping cleanup; this
implements live block copy.
Ideas for future enhancements via new flags:
Depending on when persistent bitmap support is added, it may be
worth adding a VIR_DOMAIN_REBASE_COPY_ATOMIC flag that fails up
front if we detect an older qemu with risky pivot operation.
Interesting side note: while snapshot-create --disk-only creates a
copy of the disk at a point in time by moving the domain on to a
new file (the copy is the file now in the just-extended backing
chain), blockjob --abort of a copy job creates a copy of the disk
while keeping the domain on the original file. There may be
potential improvements to the snapshot code to exploit block copy
over multiple disks all at one point in time. And, if
'block-job-cancel' were made part of 'transaction', you could
copy multiple disks at the same point in time without pausing
the domain. This also implies we may want to add a --quiesce flag
to virDomainBlockJobAbort, so that when breaking a mirror (whether
by cancel or pivot), the side of the mirror that we are abandoning
is at least in a stable state with regards to guest I/O.
* src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag.
(qemuDomainBlockPivot): New helper function.
(qemuDomainBlockJobImpl): Implement it.
Handle the new type of block copy event and info. Of course,
this patch does nothing until a later patch actually allows the
creation/abort of a block copy job.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_READY): New
block job status.
* src/libvirt.c (virDomainBlockRebase): Document the event.
* src/qemu/qemu_monitor_json.c (eventHandlers): New event.
(qemuMonitorJSONHandleBlockJobReady): New function.
(qemuMonitorJSONGetBlockJobInfoOne): Translate new job type.
(qemuMonitorJSONHandleBlockJobImpl): Handle new event and job type.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Recognize
the event to minimize snooping.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful
info query to save effort on a pivot request.
For now, disk migration via block copy job is not implemented in
libvirt. But when we do implement it, we have to deal with the
fact that qemu does not yet provide an easy way to re-start a qemu
process with mirroring still intact. Paolo has proposed an idea
for a persistent dirty bitmap that might make this possible, but
until that design is complete, it's hard to say what changes
libvirt would need. Even something like 'virDomainSave' becomes
hairy, if you realize the implications that 'virDomainRestore'
would be stuck with recreating the same mirror layout.
But if we step back and look at the bigger picture, we realize that
the initial client of live storage migration via disk mirroring is
oVirt, which always uses transient domains, and that if a transient
domain is destroyed while a mirror exists, oVirt can easily restart
the storage migration by creating a new domain that visits just the
source storage, with no loss in data.
We can make life a lot easier by being cowards for now, forbidding
certain operations on a domain. This patch guarantees that we
never get in a state where we would have to restart a domain with
a mirroring block copy, by preventing saves, snapshots, migration,
hot unplug of a disk in use, and conversion to a persistent domain
(thankfully, it is still relatively easy to 'virsh undefine' a
running domain to temporarily make it transient, run tests on
'virsh blockcopy', then 'virsh define' to restore the persistence).
Later, if the qemu design is enhanced, we can relax our code.
The change to qemudDomainDefine looks a bit odd for undoing an
assignment, rather than probing up front to avoid the assignment,
but this is because of how virDomainAssignDef combines both a
lookup and assignment into a single function call.
* src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype.
* src/conf/domain_conf.c (virDomainHasDiskMirror): New function.
* src/libvirt_private.syms (domain_conf.h): Export it.
* src/qemu/qemu_driver.c (qemuDomainSaveInternal)
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous
actions while block copy is already in action.
* src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise.
* src/qemu/qemu_migration.c (qemuMigrationIsAllowed): Likewise.
Upstream qemu 1.3 is adding two new monitor commands, 'drive-mirror'
and 'block-job-complete'[1], which can drive live block copy and
storage migration. [Additionally, RHEL 6.3 had backported an earlier
version of most of the same functionality, but under the names
'__com.redhat_drive-mirror' and '__com.redhat_drive-reopen' and with
slightly different JSON arguments, and has been using patches similar
to these upstream patches for several months now.]
The libvirt API virDomainBlockRebase as already committed for 0.9.12
is flexible enough to expose the basics of block copy, but some
additional features in the 'drive-mirror' qemu command, such as
setting error policy, setting granularity, or using a persistent
bitmap, may later require a new libvirt API virDomainBlockCopy. I
will wait to add that API until we know more about what qemu 1.3
will finally provide.
This patch caters only to the upstream qemu 1.3 interface, although
I have proven that the changes for RHEL 6.3 can be isolated to
just qemu_monitor_json.c, and the rest of this series will
gracefully handle either interface once the JSON differences are
papered over in a downstream patch.
For consistency with other block job commands, libvirt must handle
the bandwidth argument as MiB/sec from the user, even though qemu
exposes the speed argument as bytes/sec; then again, qemu rounds
up to cluster size internally, so using MiB hides the worst effects
of that rounding if you pass small numbers.
[1]https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg04123.html
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR)
(QEMU_CAPS_DRIVE_REOPEN): New bits.
* src/qemu/qemu_capabilities.c (qemuCaps): Name them.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set
them.
(qemuMonitorJSONDriveMirror, qemuMonitorDrivePivot): New functions.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror)
(qemuMonitorDrivePivot): Declare them.
* src/qemu/qemu_monitor.c (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): New passthroughs.
* src/qemu/qemu_monitor.h (qemuMonitorDriveMirror)
(qemuMonitorDrivePivot): Declare them.
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=862515
which describes inconsistencies in dealing with duplicate mac
addresses on network devices in a domain.
(at any rate, it resolves *almost* everything, and prints out an
informative error message for the one problem that isn't solved, but
has a workaround.)
A synopsis of the problems:
1) you can't do a persistent attach-interface of a device with a mac
address that matches an existing device.
2) you *can* do a live attach-interface of such a device.
3) you *can* directly edit a domain and put in two devices with
matching mac addresses.
4) When running virsh detach-device (live or config), only MAC address
is checked when matching the device to remove, so the first device
with the desired mac address will be removed. This isn't always the
one that's wanted.
5) when running virsh detach-interface (live or config), the only two
items that can be specified to match against are mac address and model
type (virtio, etc) - if multiple netdevs match both of those
attributes, it again just finds the first one added and assumes that
is the only match.
Since it is completely valid to have multiple network devices with the
same MAC address (although it can cause problems in many cases, there
*are* valid use cases), what is needed is:
1) remove the restriction that prohibits doing a persistent add of a
netdev with a duplicate mac address.
2) enhance the backend of virDomainDetachDeviceFlags to check for
something that *is* guaranteed unique (but still work with just mac
address, as long as it yields only a single results.
This patch does three things:
1) removes the check for duplicate mac address during a persistent
netdev attach.
2) unifies the searching for both live and config detach of netdevices
in the subordinate functions of qemuDomainModifyDeviceFlags() to use the
new function virDomainNetFindIdx (which matches mac address and PCI
address if available, checking for duplicates if only mac address was
specified). This function returns -2 if multiple matches are found,
allowing the callers to print out an appropriate message.
Steps 1 & 2 are enough to fully fix the problem when using virsh
attach-device and detach-device (which require an XML description of
the device rather than a bunch of commandline args)
3) modifies the virsh detach-interface command to check for multiple
matches of mac address and show an error message suggesting use of the
detach-device command in cases where there are multiple matching mac
addresses.
Later we should decide how we want to input a PCI address on the virsh
commandline, and enhance detach-interface to take a --address option,
eliminating the need to use detach-device
* src/conf/domain_conf.c
* src/conf/domain_conf.h
* src/libvirt_private.syms
* added new virDomainNetFindIdx function
* removed now unused virDomainNetIndexByMac and
virDomainNetRemoveByMac
* src/qemu/qemu_driver.c
* remove check for duplicate max from qemuDomainAttachDeviceConfig
* use virDomainNetFindIdx/virDomainNetRemove instead
of virDomainNetRemoveByMac in qemuDomainDetachDeviceConfig
* use virDomainNetFindIdx instead of virDomainIndexByMac
in qemuDomainUpdateDeviceConfig
* src/qemu/qemu_hotplug.c
* use virDomainNetFindIdx instead of a homespun loop in
qemuDomainDetachNetDevice.
* tools/virsh-domain.c: modified detach-interface command as described
above
It turns out that the cpuacct results properly account for offline
cpus, and always returns results for every possible cpu, not just
the online ones. So there is no need to check the map of online
cpus in the first place, merely only a need to know the maximum
possible cpu. Meanwhile, virNodeGetCPUBitmap had a subtle change
from returning the maximum id to instead returning the width of
the bitmap (one larger than the maximum id) in commit 2f4c5338,
which made this code encounter some off-by-one logic leading to
bad error messages when a cpu was offline:
$ virsh cpu-stats dom
error: Failed to virDomainGetCPUStats()
error: An error occurred, but the cause is unknown
Cleaning this up unraveled a chain of other unused variables.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Drop
pointless check for cpumap changes, and use correct number of
cpus. Simplify signature.
(qemuDomainGetCPUStats): Adjust caller.
* src/nodeinfo.h (nodeGetCPUCount): New prototype.
(nodeGetCPUBitmap): Drop unused parameter.
* src/nodeinfo.c (nodeGetCPUBitmap): Likewise.
(nodeGetCPUMap): Adjust caller.
(nodeGetCPUCount): New function.
* src/libvirt_private.syms (nodeinfo.h): Export it.
Callers should not need to know what the name of the file to
be read in the Linux-specific version of nodeGetCPUmap;
furthermore, qemu cares about online cpus, not present cpus,
when determining which cpus to skip.
While at it, I fixed the fact that we were computing the maximum
online cpu id by doing a slow iteration, when what we really want
to know is the max available cpu.
* src/nodeinfo.h (nodeGetCPUmap): Rename...
(nodeGetCPUBitmap): ...and simplify signature.
* src/nodeinfo.c (linuxParseCPUmax): New function.
(linuxParseCPUmap): Simplify and alter signature.
(nodeGetCPUBitmap): Change implementation.
* src/libvirt_private.syms (nodeinfo.h): Reflect rename.
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Update
caller.
On one hand, numad probably will manage the affinity of domain process
dynamically in future. On the other hand, even numad won't manage it,
it still could confusion. Let's make things simpler enough to avoid
the lair for now.
When the cpu placement model is "auto", it sets the affinity for
domain process with the advisory nodeset from numad, however,
creating cgroup for the domain process (called emulator thread
in some contexts) later overrides that with pinning it to all
available pCPUs.
How to reproduce:
* Configure the domain with "auto" placement for <vcpu>, e.g.
<vcpu placement='auto'>4</vcpu>
* % virsh start dom
* % cat /proc/$dompid/status
Though the emulator cgroup cause conflicts, but we can't simply
prohibit creating it, as other tunables are still useful, such
as "emulator_period", which is used by API
virDomainSetSchedulerParameter. So this patch doesn't prohibit
creating the emulator cgroup, but inherit the nodeset from numad,
and reset the affinity for domain process.
* src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator
to accept the passed nodenet
* src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset
Abstract the codes to prepare cpumap into a helper a function,
which can be used later.
* src/qemu/qemu_process.h: Declare qemuPrepareCpumap
* src/qemu/qemu_process.c: Implement qemuPrepareCpumap, and use it.
Transport Open vSwitch per-port data during live
migration by using the utility functions
virNetDevOpenvswitchGetMigrateData() and
virNetDevOpenvswitchSetMigrateData().
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Add the ability for the Qemu V3 migration protocol to
include transporting network configuration. A generic
framework is proposed with this patch to allow for the
transfer of opaque data.
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Laine Stump <laine@laine.org>
The snapshot code when reusing an existing file had hard-to-read
logic, as well as a missing sanity check: REUSE_EXT should require
the destination to already be present.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare): Require
destination on REUSE_EXT, rename variable for legibility.
Currently it's assumed that qemu always supports VNC, however it is
definitely possible to compile qemu without VNC support so we should at
the very least check for it and handle that correctly.
Yet another instance of where using plain open() mishandles files
that live on root-squash NFS, and where improving the API can
improve the chance of a successful probe.
* src/util/storage_file.h (virStorageFileProbeFormat): Alter
signature.
* src/util/storage_file.c (virStorageFileProbeFormat): Use better
method for opening file.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Update caller.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
In v2 migration protocol, XML is obtained by calling domainGetXMLDesc.
This includes the default USB controller in XML, which breaks migration
to older libvirt (before 0.9.2).
Commit 409b5f5495
qemu: Emit compatible XML when migrating a domain
only fixed this for v3 migration.
This patch uses the new VIR_DOMAIN_XML_MIGRATABLE flag (detected by
VIR_DRV_FEATURE_XML_MIGRATABLE) to obtain XML without the default controller,
enabling backward v2 migration.
As we switched to setting capabilities based on QMP communication,
qemu seamless-migration capability was not set. In the -help output
this knob is called seamless-migration=[on|off]. The equivalent in
QMP world is SPICE_MIGRATE_COMPLETED event (qemu upstream commit
2fdd16e2).
Gcc with optimization warns:
../../src/qemu/qemu_driver.c: In function 'qemuDomainBlockCommit':
../../src/qemu/qemu_driver.c:12813:46: error: 'disk' may be used uninitialized in this function [-Werror=maybe-uninitialized]
../../src/qemu/qemu_driver.c:12698:25: note: 'disk' was declared here
cc1: all warnings being treated as errors
so obviously I had only been testing with optimization off.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Guard cleanup.
I finally have all the pieces in place to perform a block-commit with
SELinux enforcing. There's still missing cleanup work when the commit
completes, but doing that requires tracking both the backing chain and
the base and top files within that chain in domain XML across libvirtd
restarts. Furthermore, from a security standpoint, once you have
granted access, you must assume any damage that can be done will be
done; later revoking access is nice to minimize the window of damage,
but less important as it does not affect the fact that damage can be
done in the first place. Therefore, deferring the revoke efforts until
we have better XML tracking of what chain operations are in effect,
including across a libvirtd restart, is reasonable.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Label disks as
needed.
(qemuDomainPrepareDiskChainElement): Cast away const.
Previously, snapshot code did its own permission granting (lock
manager, cgroup device controller, and security manager labeling)
inline. But now that we are adding block-commit and block-copy
which also have to change permissions, it's better to reuse
common code for the task. While snapshot should fall back to
no access if read-write access failed, block-commit will want to
fall back to read-only access. The common code doesn't know
whether failure to grant read-write access should revert to no
access (snapshot, block-copy) or read-only access (block-commit).
This code can also be used to revoke access to unused files after
block-pull.
It might be nice to clean things up in a future patch by adding
new functions to the lock manager, cgroup manager, and security
manager that takes a single file name and applies context of a
disk to that file, rather than the current semantics of applying
context to the entire chain already associated to a disk. That
way, we could avoid the games this patch plays of temporarily
swapping out the disk->src and related fields of the disk. But
that would involve more code changes, so this patch really is
the smallest hack for doing the necessary work; besides, this
patch is more or less code motion (the hack was already employed
by the snapshot creation code, we are just making it reusable).
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateSingleDiskActive)
(qemuDomainSnapshotUndoSingleDiskActive): Refactor labeling hacks...
(qemuDomainPrepareDiskChainElement): ...into new function.
Now that we can crawl the chain of backing files, we can do
argument validation and implement the 'shallow' flag. In
testing this, I discovered that it can be handy to pass the
shallow flag and an explicit base, as a means of validating
that the base is indeed the file we expected.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Crawl through
chain to implement shallow flag.
* src/libvirt.c (virDomainBlockCommit): Relax API.
This is the bare minimum to kick off a block commit. In particular,
flags support is missing (shallow requires us to crawl the backing
chain to determine the file name to pass to the qemu monitor command;
delete requires us to track what needs to be deleted at the time
the completion event fires). Also, we are relying on qemu to do
error checking (such as validating 'top' and 'base' as being members
of the backing chain), including the fact that the current qemu code
does not support committing the active layer (although it is still
planned to add that before qemu 1.3). Since the active layer won't
change, we have it easy and do not have to alter the domain XML.
Additionally, this will fail if SELinux is enforcing, because we fail
to grant qemu proper read/write access to the files it will modify.
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): New function.
(qemuDriver): Register it.
qemu 1.3 will be adding a 'block-commit' monitor command, per
qemu.git commit ed61fc1. It matches nicely to the libvirt API
virDomainBlockCommit.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_BLOCK_COMMIT): New bit.
* src/qemu/qemu_capabilities.c (qemuCapsProbeQMPCommands): Set it.
* src/qemu/qemu_monitor.h (qemuMonitorBlockCommit): New prototype.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONBlockCommit):
Likewise.
* src/qemu/qemu_monitor.c (qemuMonitorBlockCommit): Implement it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockCommit):
Likewise.
(qemuMonitorJSONHandleBlockJobImpl)
(qemuMonitorJSONGetBlockJobInfoOne): Handle new event type.
We used to walk the backing file chain at least twice per disk,
once to set up cgroup device whitelisting, and once to set up
security labeling. Rather than walk the chain every iteration,
which possibly includes calls to fork() in order to open root-squashed
NFS files, we can exploit the cache of the previous patch.
* src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter
signature.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller
to supply backing chain via disk, if recursion is desired.
* src/security/security_dac.c
(virSecurityDACSetSecurityImageLabel): Adjust caller.
* src/security/security_selinux.c
(virSecuritySELinuxSetSecurityImageLabel): Likewise.
* src/security/virt-aa-helper.c (get_files): Likewise.
* src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup)
(qemuTeardownDiskCgroup): Likewise.
(qemuSetupCgroup): Pre-populate chain.
Technically, we should not be re-probing any file that qemu might
be currently writing to. As such, we should cache the backing
file chain prior to starting qemu. This patch adds the cache,
but does not use it until the next patch.
Ultimately, we want to also store the chain in domain XML, so that
it is remembered across libvirtd restarts, and so that the only
kosher way to modify the backing chain of an offline domain will be
through libvirt API calls, but we aren't there yet. So for now, we
merely invalidate the cache any time we do a live operation that
alters the chain (block-pull, block-commit, external disk snapshot),
as well as tear down the cache when the domain is not running.
* src/conf/domain_conf.h (_virDomainDiskDef): New field.
* src/conf/domain_conf.c (virDomainDiskDefFree): Clean new field.
* src/qemu/qemu_domain.h (qemuDomainDetermineDiskChain): New
prototype.
* src/qemu/qemu_domain.c (qemuDomainDetermineDiskChain): New
function.
* src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive)
(qemuDomainChangeDiskMediaLive): Pre-populate chain.
(qemuDomainSnapshotCreateSingleDiskActive): Uncache chain before
snapshot.
* src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Update
chain after block pull.
Requiring pre-allocation was an unusual idiom. It allowed iteration
over the backing chain to use fewer mallocs, but made one-shot
clients harder to read. Also, this makes it easier for a future
patch to move away from opening fds on every iteration over the chain.
* src/util/storage_file.h (virStorageFileGetMetadataFromFD): Alter
signature.
* src/util/storage_file.c (virStorageFileGetMetadataFromFD): Allocate
return value.
(virStorageFileGetMetadata): Update clients.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/storage/storage_backend_fs.c (virStorageBackendProbeTarget):
Likewise.
This is the last use of raw strings for disk formats throughout
the src/conf directory.
* src/conf/snapshot_conf.h (_virDomainSnapshotDiskDef): Store enum
rather than string for disk type.
* src/conf/snapshot_conf.c (virDomainSnapshotDiskDefClear)
(virDomainSnapshotDiskDefParseXML, virDomainSnapshotDefFormat):
Adjust users.
* src/qemu/qemu_driver.c (qemuDomainSnapshotDiskPrepare)
(qemuDomainSnapshotCreateSingleDiskActive): Likewise.
Express the default disk type as an enum, for easier handling.
* src/conf/capabilities.h (_virCaps): Store enum rather than
string for disk type.
* src/conf/domain_conf.c (virDomainDiskDefParseXML): Adjust
clients.
* src/qemu/qemu_driver.c (qemuCreateCapabilities): Likewise.
When an image has no backing file, using VIR_STORAGE_FILE_AUTO
for its type is a bit confusing. Additionally, a future patch
would like to reserve a default value for the case of no file
type specified in the XML, but different from the current use
of -1 to imply probing, since probing is not always safe.
Also, a couple of file types were missing compared to supported
code: libxl supports 'vhd', and qemu supports 'fat' for directories
passed through as a file system.
* src/util/storage_file.h (virStorageFileFormat): Add
VIR_STORAGE_FILE_NONE, VIR_STORAGE_FILE_FAT, VIR_STORAGE_FILE_VHD.
* src/util/storage_file.c (virStorageFileMatchesVersion): Match
documentation when version probing not supported.
(cowGetBackingStore, qcowXGetBackingStore, qcow1GetBackingStore)
(qcow2GetBackingStoreFormat, qedGetBackingStore)
(virStorageFileGetMetadataFromBuf)
(virStorageFileGetMetadataFromFD): Take NONE into account.
* src/conf/domain_conf.c (virDomainDiskDefForeachPath): Likewise.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Likewise.
* src/conf/storage_conf.c (virStorageVolumeFormatFromString): New
function.
(poolTypeInfo): Use it.
Relabeling tapfd right after the tap device is created.
qemuPhysIfaceConnect is common function called both for static
netdevs and for hotplug netdevs.
Having hostuuid in migration cookie is a nice bonus since it provides an
easy way of detecting migration to the same host. However, requiring it
breaks backward compatibility with older libvirt releases.
Recently, patches were added support for (managed)saving, restoring, and
migrating domains with host USB devices. However, qemu driver would
still forbid migration of such domains because qemuMigrationIsAllowed
was not updated.
If we can't probe the architecture from QMP we parse the architecture
from the qemu binaries name. This results in the architecture being i386
instead of i686 which then results in QEMU_CAPS_PCI_MULTIBUS being unset
which gives a broken qemu command line.
This probably didn't show up earlier since most of the time there's also
a /usr/bin/qemu around which results in i686 capabilities.