Basically a followup of the previous patch about balloon desactivation
if desactivated, to not ask for balloon information to qemu as we will
just get an error back.
This can make a huge difference in the time needed for domain
information or list when a machine is loaded, and balloon has been
desactivated in the guests.
* src/qemu/qemu_driver.c: do not get the balloon info if the balloon
suppor is disabled
The balloon device is automatically added to qemu guests if supported,
but it may be useful to desactivate it. The simplest to not change the
existing behaviour is to allow
<memballoon type="none"/>
as an extra option to desactivate it (it is automatically added if the
memballoon construct is missing for the domain).
The following simple patch just adds the extra option and does not
change the default behaviour but avoid creating a balloon device if
type="none" is used.
* docs/schemas/domain.rng: add the extra type attribute value
* src/conf/domain_conf.c src/conf/domain_conf.h: add the extra enum
value
* src/qemu/qemu_conf.c: if enum is NONE, don't activate the device,
i.e. don't pass the args to qemu/kvm
device_del command is not synchronous for PCI devices, it merely asks
the guest to release the device and returns. If the host wants to use
that device before the guest actually releases it, we are in big
trouble. To avoid this, we already added a loop which waits up to 10
seconds until the device is actually released before we do anything else
with that device. But we only added this loop for managed PCI devices
before we try reattach them back to the host.
However, we need to wait even for non-managed devices. We don't reattach
them automatically, but we still want to prevent the host from using it.
This was revealed thanks to sVirt: when we relabel sysfs files
corresponding to the PCI device before the guest finished releasing the
device, qemu is no longer allowed to access those files and if it wants
(as a result of guest's request) to write anything to them, it just
exits, which kills the guest.
This is not a proper fix and needs some further work both on libvirt and
qemu side in the future.
Fix the error checking to use the return value from brAddTap() instead
of checking the current errno value which might have been changed by
clean up calls inside of brAddTap().
Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
Added a more detailed error message when adding a tap devices fails and
the kernel is missing tun support.
Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
the followup on the boot=on problem, basically it's not needed to
specify it when booting out of IDE devices when using KVM
* src/qemu/qemu_conf.c: do not use boot=on for IDE devices
* tests/qemuxml2argvdata/qemuxml2argv*.args: this changes the output
for 5 of the tests
Patch version revamped by Eric Blake <eblake@redhat.com> of Jiri
Denemark <jdenemar@redhat.com> original patch
When attaching a PCI device which doesn't explicitly set its PCI
address, libvirt allocates the address automatically. The problem is
that when checking which PCI address is unused, we only check for those
with slot number higher than the highest slot number ever used.
Thus attaching/detaching such device several times in a row (31 is the
theoretical limit, less then 30 tries are enough in practise) makes any
further device attachment fail. Furthermore, attaching a device with
predefined PCI address to 0:0:31 immediately forbids attachment of any
PCI device without explicit address.
This patch changes the logic so that we always check all PCI addresses
before we say there is no PCI address available.
Modifications from v1: revert back to remembering the last slot
reserved, but allow wraparound to not be limited by the end.
In this way, slots are still assigned in the same order as
before the patch, rather than filling in the gaps closest to
0 and risking making windows guests mad.
* src/qemu/qemu_conf.c: fix pci reservation code to do a round-robbin
check of all available PCI splot availability before failing.
Basically the 'boot=on' boot selection device is something present in
KVM but not in upstream QEmu, as a result if we boot a QEmu domain
without KVM acceleration we must disable boot=on ... even if the front
end kvm binary expose that capability in the help page.
* src/qemu/qemu_conf.c: in qemudBuildCommandLine if -no-kvm
is passed, then deactivate QEMUD_CMD_FLAG_DRIVE_BOOT
ADD_ARG_LIT should only be used for literal arguments,
since it duplicates the memory. Since virBufferContentAndReset
is already allocating memory, we should only use ADD_ARG.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
If detecting the FLR flag of a pci device fails, then we
could run into the situation of trying to close a file
descriptor twice, once in pciInitDevice() and once in pciFreeDevice().
Fix that by removing the pciCloseConfig() in pciInitDevice() and
just letting pciFreeDevice() handle it.
Thanks to Chris Wright for pointing out this problem.
While we are at it, fix an error check. While it would actually
work as-is (since success returns 0), it's still more clear to
check for < 0 (as the rest of the code does).
Signed-off-by: Chris Lalancette <clalance@redhat.com>
All <console> devices now export a <target> type attribute. QEMU defaults
to 'serial', UML defaults to 'uml, xen can be either 'serial' or 'xen'
depending on fullvirt. Understandably there is lots of test fallout.
This will be used to differentiate between a serial vs. virtio console for
QEMU.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
targetType only tracks the actual <target> format we are parsing. Currently
we only fill abide this value for channel devices.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
There is actually a difference between the character device type (serial,
parallel, channel, ...) and the target type (virtio, guestfwd). Currently
they are awkwardly conflated.
Start to pull them apart by renaming targetType -> deviceType. This is
an entirely mechanical change.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
QEMU has had two different syntax for disk cache options
Old: on|off
New: writeback|writethrough|none
QEMU recently added another 'unsafe' option which broke the
libvirt check. We can avoid this & future breakage, if we
do a negative check for the old syntax, instead of a positive
check for the new syntax
* src/qemu/qemu_conf.c: Invert cache option check
Add a new element to the <os> block:
<bootmenu enable="yes|no"/>
Which maps to -boot,menu=on|off on the QEMU command line.
I decided to use an explicit 'enable' attribute rather than just make the
bootmenu element boolean. This allows us to treat lack of a bootmenu element
as 'use hypervisor default'.
When doing a PCI secondary bus reset, we must be sure that there are no
active devices on the same bus segment. The active device tracking is
designed to only track host devices that are active in use by guests.
This ignores host devices that are actively in use by the host. So the
current logic will reset host devices.
Switch this logic around and allow sbus reset when we are assigning all
devices behind a bridge to the same guest at guest startup or as a result
of a single attach-device command.
* src/util/pci.h: change signature of pciResetDevice to add an
inactive devices list
* src/qemu/qemu_driver.c src/xen/xen_driver.c: use (or not) the new
functionality of pciResetDevice() depending on the place of use
* src/util/pci.c: implement the interface and logic changes
- src/qemu/qemu_driver.c: Eliminate code duplication by using the new
helpers qemuPrepareHostdevPCIDevices and qemuDomainReAttachHostdevDevices.
This reduces the number of open coded calls to pciResetDevice.
- src/qemu/qemu_driver.c: These new helpers take hostdev list and count
directly rather than getting them indirectly from domain definition.
This will allow reuse for the attach-device case.
- src/qemu/qemu_driver.c: Update qemuGetPciHostDeviceList to take a
hostdev list and count directly, rather than getting this indirectly
from domain definition. This will allow reuse for the attach-device case.
Thanks to DV for knocking together the Relax-NG changes
quickly for me.
Changes since v1:
- Change the domain.rng to correspond to the new schema
- Don't allocate caps->ns in testQemuCapsInit since it is a static table
Changes since v2:
- Change domain.rng to add restrictions on allowed environment names
Changes since v3:
- Remove a bogus comment in the tests
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Implement the qemu driver's virDomainQemuMonitorCommand
and hook it into the API entry point.
Changes since v1:
- Rename the (external) qemuMonitorCommand to qemuDomainMonitorCommand
- Add virCheckFlags to qemuDomainMonitorCommand
Changes since v2:
- Drop ATTRIBUTE_UNUSED from the flags
Changes since v3:
- Add a flag to priv so we only print out monitor command warning once. Note
that this has not been plumbed into qemuDomainObjPrivateXMLFormat or
qemuDomainObjPrivateXMLParse, which means that if you run a monitor command,
restart libvirtd, and then run another monitor command, you may get an
an erroneous VIR_INFO. It's a pretty minor matter, and I didn't think it
warranted the additional code.
- Add BeginJob/EndJob calls around EnterMonitor/ExitMonitor
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Add the library entry point for the new virDomainQemuMonitorCommand()
entry point. Because this is not part of the "normal" libvirt API,
it gets its own header file, library file, and will eventually
get its own over-the-wire protocol later in the series.
Changes since v1:
- Go back to using the virDriver table for qemuDomainMonitorCommand, due to
linking issues
- Added versioning information to the libvirt-qemu.so
Changes since v2:
- None
Changes since v3:
- Add LGPL header to libvirt-qemu.c
- Make virLibConnError and virLibDomainError macros instead of function calls
Changes since v4:
- Move exported symbols to libvirt_qemu.syms
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Now that we have the ability to specify arbitrary qemu
command-line parameters in the XML, use it to handle unknown
command-line parameters when doing a native-to-xml conversion.
Changes since v1:
- Rename num_extra to num_args
- Fix up a memory leak on an error path
Changes since v2:
- Add a VIR_WARN when adding the argument via qemu:arg
Changes since v3:
- None
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Implement the qemu hooks for XML namespace data. This
allows us to specify a qemu XML namespace, and then
specify:
<qemu:commandline>
<qemu:arg value='arg'/>
<qemu:env name='name' value='value'/>
</qemu:commandline>
In the domain XML.
Changes since v1:
- Change the <qemu:arg>arg</qemu:arg> XML to <qemu:arg value='arg'/> XML
- Fix up some memory leaks in qemuDomainDefNamespaceParse
- Rename num_extra and extra to num_args and args, respectively
- Fixed up some error messages
- Make sure to escape user-provided data in qemuDomainDefNamespaceFormatXML
Changes since v2:
- Add checking to ensure environment variable names are valid
- Invert the logic in qemuDomainDefNamespaceFormatXML to return early
Changes since v3:
- Change strspn() to c_isalpha() check of first letter of environment variable
Signed-off-by: Chris Lalancette <clalance@redhat.com>
A Linux software bridge will assume the MAC address of the enslaved
interface with the numerically lowest MAC addr. When the bridge
changes MAC address there is a period of network blackout, so a
change should be avoided. The kernel gives TAP devices a completely
random MAC address. Occassionally the random TAP device MAC is lower
than that of the physical interface (eth0, eth1etc) that is enslaved,
causing the bridge to change its MAC.
This change sets an explicit MAC address for all TAP devices created
using the configured MAC from the XML, but with the high byte set
to 0xFE. This should ensure TAP device MACs are higher than any
physical interface MAC.
* src/qemu/qemu_conf.c, src/uml/uml_conf.c: Pass in a MAC addr
for the TAP device with high byte set to 0xFE
* src/util/bridge.c, src/util/bridge.h: Set a MAC when creating
the TAP device to override random MAC
The PCI slot 1 must be reserved at all times, since PIIX3 is
always present, even if no IDE device is in use for guest disks
* src/qemu/qemu_conf.c: Always reserve slot 1 for PIIX3
virFileOperation previously returned 0 on success, or the value of
errno on failure. Although there are other functions in libvirt that
use this convention, the preferred (and more common) convention is to
return 0 on success and -errno (or simply -1 in some cases) on
failure. This way the check for failure is always (ret < 0).
* src/util/util.c - change virFileOperation and virFileOperationNoFork to
return -errno on failure.
* src/storage/storage_backend.c, src/qemu/qemu_driver.c
- change the hook functions passed to virFileOperation to return
-errno on failure.
To try and ensure that people upgrading from old QEMU get guests
with the same PCI device ordering, change the way we assign addrs
to match QEMU's default order. This should make Windows less
annoyed.
* src/qemu/qemu_conf.c: Follow QEMU's default PCI ordering
logic when assigning addresses
* tests/*.args: Update for changed PCI addresses
To allow compatibility with older QEMU PCI device slot assignment
it is necessary to explicitly track the balloon device in the
XML. This introduces a new device
<memballoon model='virtio|xen'/>
It can also have a PCI address, auto-assigned if necessary.
The memballoon will be automatically added to all Xen and QEMU
guests by default.
* docs/schemas/domain.rng: Add <memballoon> element
* src/conf/domain_conf.c, src/conf/domain_conf.h: parsing
and formatting for memballoon device. Always add a memory
balloon device to Xen/QEMU if none exists in XML
* src/libvirt_private.syms: Export memballoon model APIs
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Honour the
PCI device address in memory balloon device
* tests/*: Update to test new functionality
The first VGA and IDE devices need to have fixed PCI address
reservations. Currently this is handled inline with the other
non-primary VGA/IDE devices. The fixed virtio balloon device
at slot 3, ensures auto-assignment skips the slots 1/2. The
virtio address will shortly become configurable though. This
means the reservation of fixed slots needs to be done upfront
to ensure that they don't get re-used for other devices.
This is more or less reverting the previous changeset:
commit 83acdeaf17
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Wed Feb 3 16:11:29 2010 +0000
Fix restore of QEMU guests with PCI device reservation
The difference is that this time, instead of unconditionally
reserving the address, we only reserve the address if it was
initially type=none. Addresses of type=pci were handled
earlier in process by qemuDomainPCIAddressSetCreate(). This
ensures restore step doesn't have problems
* src/qemu/qemu_conf.c: Reserve first VGA + IDE address
upfront
The VIR_ERR_NO_SUPPORT refers to an API which is not implemented.
There is a separate VIR_ERR_CONFIG_UNSUPPORTED for XML config
options that are not available with the current hypervisor.
* src/qemu/qemu_conf.c, src/qemu/qemu_driver.c: Remove
many VIR_ERR_NO_SUPPORT replace with VIR_ERR_CONFIG_UNSUPPORTED
If you try to execute two concurrent migrations p2p
from A->B and B->A, the two libvirtd's will deadlock
trying to perform the migrations. The reason for this is
that in p2p migration, the libvirtd's are responsible for
making the RPC Prepare, Migrate, and Finish calls. However,
they are currently holding the driver lock while doing so,
which basically guarantees deadlock in this scenario.
This patch fixes the situation by adding
qemuDomainObjEnterRemoteWithDriver and
qemuDomainObjExitRemoteWithDriver helper methods. The Enter
take an additional object reference, then drops both the
domain object lock and the driver lock. The Exit takes
both the driver and domain object lock, then drops the
reference. Adding calls to these Enter and Exit helpers
around remote calls in the various migration methods
seems to fix the problem for me in testing.
This should make the situation safe. The additional domain
object reference ensures that the domain object won't disappear
while this operation is happening. The BeginJob that is called
inside of qemudDomainMigratePerform ensures that we can't execute a
second migrate (or shutdown, or save, etc) job while the
migration is active. Finally, the additional check on the state
of the vm after we reacquire the locks ensures that we can't
be surprised by an external event (domain crash, etc).
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Record a default driver name/type in capabilities struct. Use this
when parsing disks if value is not set in XML config.
* src/conf/capabilities.h: Record default driver name/type for disks
* src/conf/domain_conf.c: Fallback to default driver name/type
when parsing disks
* src/qemu/qemu_driver.c: Set default driver name/type to raw
Disk format probing is now disabled by default. A new config
option in /etc/qemu/qemu.conf will re-enable it for existing
deployments where this causes trouble
The implementation of security driver callbacks often needs
to access the security driver object. Currently only a handful
of callbacks include the driver object as a parameter. Later
patches require this is many more places.
* src/qemu/qemu_driver.c: Pass in the security driver object
to all callbacks
* src/qemu/qemu_security_dac.c, src/qemu/qemu_security_stacked.c,
src/security/security_apparmor.c, src/security/security_driver.h,
src/security/security_selinux.c: Add a virSecurityDriverPtr
param to all security callbacks
Update the QEMU cgroups code, QEMU DAC security driver, SELinux
and AppArmour security drivers over to use the shared helper API
virDomainDiskDefForeachPath().
* src/qemu/qemu_driver.c, src/qemu/qemu_security_dac.c,
src/security/security_selinux.c, src/security/virt-aa-helper.c:
Convert over to use virDomainDiskDefForeachPath()
Require the disk image to be passed into virStorageFileGetMetadata.
If this is set to VIR_STORAGE_FILE_AUTO, then the format will be
resolved using probing. This makes it easier to control when
probing will be used
* src/qemu/qemu_driver.c, src/qemu/qemu_security_dac.c,
src/security/security_selinux.c, src/security/virt-aa-helper.c:
Set VIR_STORAGE_FILE_AUTO when calling virStorageFileGetMetadata.
* src/storage/storage_backend_fs.c: Probe for disk format before
calling virStorageFileGetMetadata.
* src/util/storage_file.h, src/util/storage_file.c: Remove format
from virStorageFileMeta struct & require it to be passed into
method.
* src/qemu/qemu_driver.c (qemuConnectMonitor): Correct erroneous
parenthesization in two expressions. Without this fix, failure
to set or clear SELinux security context in the monitor would go
undiagnosed. Also correct a diagnostic and split some long lines.
In case qemu supports -nodefconfig, libvirt adds uses it when launching
new guests. Since this option may affect CPU models supported by qemu,
we need to use it when probing for available models.
An indentation mistake meant that a check for return status
was not properly performed in all cases. This could result
in a crash on NULL pointer in a following line.
* src/qemu/qemu_monitor_json.c: Fix check for return status
when processing JSON for blockstats
Some, but not all, codepaths in the qemuMonitorOpen() method
would trigger the destroy callback. The caller does not expect
this to be invoked if construction fails, only during normal
release of the monitor. This resulted in a possible double-unref
of the virDomainObjPtr, because the caller explicitly unrefs
the virDomainObjPtr if qemuMonitorOpen() fails
* src/qemu/qemu_monitor.c: Don't invoke destroy callback from
qemuMonitorOpen() failure paths
Make sure to *not* call qemuDomainPCIAddressReleaseAddr if
QEMUD_CMD_FLAG_DEVICE is *not* set (for older qemu). This
prevents a crash when trying to do device detachment from
a qemu guest.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
In the current libvirt PCI code, there is no checking whether
a PCI device is in use by a guest when doing node device
detach or reattach. This causes problems when a device is
assigned to a guest, and the administrator starts issuing
nodedevice commands. Make it so that we check the list
of active devices when trying to detach/reattach, and only
allow the operation if the device is not assigned to a guest.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
This code was just recently added (by me) and didn't account for the
fact that stdin_path is sometimes NULL. If it's NULL, and
SetSecurityAllLabel fails, a segfault would result.
When the saved domain image is on an NFS share, at least some part of
domainSetSecurityAllLabel will fail (for example, selinux labels can't
be modified). To allow domain restore to still work in this case, just
ignore the errors.
Also restore the label to its original value after qemu is finished
with the file.
Prior to this patch, qemu domain restore did not function properly if
selinux was set to enforce.
If an active migration operation fails, or is cancelled by the
admin, the QEMU on the destination is shutdown and the one on
the source continues running. It is important in shutting down
the QEMU on the destination, the security drivers don't reset
the file labelling/permissions.
* src/qemu/qemu_driver.c: Don't reset labelling/permissions
on migration abort
The patches for shared storage migration were not correctly written
for json mode. Thus the 'blk' and 'inc' parameters were never being
set. In addition they didn't set the QEMU_MONITOR_MIGRATE_BACKGROUND
so migration was synchronous. Due to multiple bugs in QEMU's JSON
impl this wasn't noticed because it treated the sync migration requst
as asynchronous anyway. Finally 'background' parameter was converted
to take arbitrary flags but not renamed, and not all uses were changed
to unsigned int.
* src/qemu/qemu_driver.c: Set QEMU_MONITOR_MIGRATE_BACKGROUND in
doNativeMigrate
* src/qemu/qemu_monitor_json.c: Process QEMU_MONITOR_MIGRATE_NON_SHARED_DISK
and QEMU_MONITOR_MIGRATE_NON_SHARED_INC flags
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.h, src/qemu/qemu_monitor_text.c,
src/qemu/qemu_monitor_text.h: change 'int background' to
'unsigned int flags' in migration APIs. Add logging of flags
parameter
During incoming migration the QEMU monitor is not able to be
used. The incoming migration code did not keep hold of the
job lock because migration is split across multiple API calls.
This meant that further monitor commands on the guest would
hang until migration finished with no timeout.
In this change the qemuDomainMigratePrepare method sets the
job flag just before it returns. The qemuDomainMigrateFinish
method checks for this job flag & clears it once done. This
prevents any use of the monitor between prepare+finish steps.
The qemuDomainGetJobInfo method is also updated to refresh
the job elapsed time. This means that virsh domjobinfo can
return time data during incoming migration
* src/qemu/qemu_driver.c: Keep a job active during incoming
migration. Refresh job elapsed time when returning job info
When configuring serial, parallel, console or channel devices
with a file, dev or pipe backend type, it is necessary to label
the file path in the security drivers. For char devices of type
file, it is neccessary to pre-create (touch) the file if it does
not already exist since QEMU won't be allowed todo so itself.
dev/pipe configs already require the admin to pre-create before
starting the guest.
* src/qemu/qemu_security_dac.c: set file ownership for character
devices
* src/security/security_selinux.c: Set file labeling for character
devices
* src/qemu/qemu_driver.c: Add character devices to cgroup ACL
We previously assumed that if the -device option existed in qemu, that
-nodefconfig would also exist. It turns out that isn't the case, as
demonstrated by qemu-kvm-0.12.3 in Fedora 13.
*/src/qemu/qemu_conf.[hc] - add a new QEMUD_CMD_FLAG, set it via the
help output, and check it before adding
-nodefconfig to the qemu commandline.
The domain XML parsing code autogenerates disk address and
controller elements when they are not explicitly specified.
The code assumes a narrow SCSI bus (7 units per bus). ESX
uses a wide SCSI bus (16 units per bus).
This is a step towards controller support for the ESX driver.
We already use the '-nodefaults' command line arg with QEMU to stop
it adding any default devices to guests. Unfortunately, QEMU will
load global config files from /etc/qemu that may also add default
devices. These aren't blocked by '-nodefaults', so we need to also
add the '-nodefconfig' arg to prevent that.
Unfortunately these global config files are also used to define
custom CPU models. So in blocking global hardware device addition
we also block definitions of new CPU models. Libvirt doesn't know
about these custom CPU models though, so it would never make use
of them anyway. Thus blocking them via -nodefconfig isn't a show
stopping problem. We would need to expand libvirt's own CPU model
XML database to support these instead.
* src/qemu/qemu_conf.c: Add '-nodefconfig' if available
* tests/qemuxml2argvdata/: Add '-nodefconfig' to all data files which
have '-nodefaults' present
The current code pattern requires that callers of qemuMonitorClose
check for the return value == 0, and if so, set priv->mon = NULL
and release the reference held on the associated virDomainObjPtr
The change d84bb6d6a3 violated that
requirement, meaning that priv->mon never gets set to NULL, and
a reference count is leaked on virDomainObjPtr.
This design was a bad one, so remove the need to check the return
valueof qemuMonitorClose(). Instead allow registration of a
callback that's invoked just when the last reference on qemuMonitorPtr
is released.
Finally there was a potential reference leak in qemuConnectMonitor
in the failure path.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add a destroy
callback invoked from qemuMonitorFree
* src/qemu/qemu_driver.c: Use the destroy callback to release the
reference on virDomainObjPtr when the monitor is freed. Fix other
potential reference count leak in connecting to monitor
Before issuing monitor commands it is neccessary to check whether
the guest is still running. Most places use virDomainIsActive()
correctly, but a few relied on 'priv->mon != NULL'. In theory
these should be equivalent, but the release of the last reference
count on priv->mon can be delayed a small amount of time until
the event handler is finally deregistered. A further ref counting
bug also means that priv->mon might be never released. In such a
case, code could mistakenly issue a monitor command and wait for
a response that will never arrive, effectively leaving the QEMU
driver waiting on virCondWait() forever..
To protect against these possibilities, make sure all code uses
virDomainIsActive(), not 'priv->mon != NULL'
* src/qemu/qemu_driver.c: Replace 'priv->mon != NULL' with
calls to 'priv->mon != NULL'()
Following Daniel Berrange's multiple helpful suggestions for improving
this patch and introducing another driver interface, I now wrote the
below patch where the nwfilter driver registers the functions to
instantiate and teardown the nwfilters with a function in
conf/domain_nwfilter.c called virDomainConfNWFilterRegister. Previous
helper functions that were called from qemu_driver.c and qemu_conf.c
were move into conf/domain_nwfilter.h with slight renaming done for
consistency. Those functions now call the function expored by
domain_nwfilter.c, which in turn call the functions of the new driver
interface, if available.
If VM startup fails early enough (can't find a referenced USB device),
libvirtd will crash trying to clear the VNC port bit, since port = 0,
which overflows us out of the bitmap bounds.
Fix this by being more defensive in the bitmap operations, and only
clearing a previously set VNC port.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Followup to https://bugzilla.redhat.com/show_bug.cgi?id=599091,
commit 20206a4b, to reduce disk waste in padding.
* src/qemu/qemu_monitor.h (QEMU_MONITOR_MIGRATE_TO_FILE_BS): Drop
back to 4k.
(QEMU_MONITOR_MIGRATE_TO_FILE_TRANSFER_SIZE): New macro.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Update comment.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextMigrateToFile): Use
two invocations of dd to output non-aligned large blocks.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMigrateToFile):
Likewise.
Match earlier change for qemu pause support with virDomainCreateXML.
* src/qemu/qemu_driver.c (qemudDomainObjStart): Add parameter; all
callers changed.
(qemudDomainStartWithFlags): Implement flag support.
When a disk is on a root squashed NFS server, it may not be
possible to stat() the disk file in virCgroupAllowDevice.
The virStorageFileGetMeta method may also fail to extract
the parent backing store. Both of these errors have to be
ignored to avoid breaking NFS deployments
* src/qemu/qemu_driver.c: Ignore errors in cgroup setup to
keep root squash NFS happy
https://bugzilla.redhat.com/show_bug.cgi?id=589465
Some guests (eg with badly configured grub, or Windows' installation cd)
require quick response from the console user. That's why we have a
"launchPaused" option in vdsm.
To implement it via libvirt, we need to ask libvirt not to call
qemuMonitorStartCPUs() after starting qemu. Calling virDomainStop
immediately after the domain is up is inherently raceful.
* src/qemu/qemu_driver.c (qemudStartVMDaemon): Add new parameter;
all callers adjusted.
(qemudDomainCreate): Implement support for new flag.
When an attempt to hotplug a PCI device to a guest fails,
the device was left attached to pci-stub. It is neccessary
to reset the device and then attach it to the host driver
again.
* src/qemu/qemu_driver.c: Reattach PCI device to host if
hotadd fails
Any output at all from device_add indicates an error in the
command execution. Thus it needs to check for reply != ""
* src/qemu/qemu_monitor_text.c: Fix reply check for errors
to treat any output as an error
When SELinux is running in MLS mode, libvirtd will have a
different security level to the VMs. For libvirtd to be
able to connect to the monitor console, the client end of
the UNIX domain socket needs a different label. This adds
infrastructure to set the socket label via the security
driver framework
* src/qemu/qemu_driver.c: Call out to socket label APIs in
security driver
* src/qemu/qemu_security_stacked.c: Wire up socket label
drivers
* src/security/security_driver.h: Define security driver
entry points for socket labelling
* src/security/security_selinux.c: Set socket label based on
VM label
To ensure that the device addressing scheme is stable across
hotplug/unplug, all virtio serial channels needs to have an
associated port number in their address. This is then specified
to QEMU using the nr=NNN parameter
* src/conf/domain_conf.c, src/conf/domain_conf.h: Parsing
for port number in vioserial address types.
* src/qemu/qemu_conf.c: Set 'nr=NNN' parameter with virtio
serial port number
* tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.args,
tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.xml: Expand
data set to ensure coverage of port addressing
QEMU upstream decided against adding a 'reason' field to
the block IO event in QMP. Disable this code to remove a
annoying warning message. It will be renabled when the
error string reason is re-introduced in QEMU
Adjust args to qemudStartVMDaemon() to also specify path to stdin_fd,
so this can be passed to the AppArmor driver via SetSecurityAllLabel().
This updates all calls to qemudStartVMDaemon() as well as setting up
the non-AppArmor security driver *SetSecurityAllLabel() declarations
for the above. This is required for the following
"apparmor-fix-save-restore" patch since AppArmor resolves the passed
file descriptor to the pathname given to open().
See https://bugzilla.redhat.com/show_bug.cgi?id=599091
Saving a paused 512MB domain took 3m47s with the old block size of 512
bytes. Changing the block size to 1024*1024 decreased the time to 56
seconds. (Doubling again to 2048*1024 yielded 0 improvement; lowering
to 512k increased the save time to 1m10s, about 20%)
The pointer to the xml describing the domain is saved into an object
prior to calling VIR_REALLOC_N() to make the size of the memory it
points to a multiple of QEMU_MONITOR_MIGRATE_TO_FILE_BS. If that
operation needs to allocate new memory, the pointer that was saved is
no longer valid.
To avoid this situation, adjust the size *before* saving the pointer.
(This showed up when experimenting with very large values of
QEMU_MONITOR_MIGRATE_TO_FILE_BS).
This patch that adds support for configuring 802.1Qbg and 802.1Qbh
switches. The 802.1Qbh part has been successfully tested with real
hardware. The 802.1Qbg part has only been tested with a (dummy)
server that 'behaves' similarly to how we expect lldpad to 'behave'.
The following changes were made during the development of this patch:
- Merging Scott's v13-pre1 patch
- Fixing endptr related bug while using virStrToLong_ui() pointed out
by Jim Meyering
- Addressing Jim Meyering's comments to v11
- requiring mac address to the vpDisassociateProfileId() function to
pass it further to the 802.1Qbg disassociate part (802.1Qbh untouched)
- determining pid of lldpad daemon by reading it from /var/run/libvirt.pid
(hardcode as is hardcode alson in lldpad sources)
- merging netlink send code for kernel target and user space target
(lldpad) using one function nlComm() to send the messages
- adding a select() after the sending and before the reading of the
netlink response in case lldpad doesn't respond and so we don't hang
- when reading the port status, in case of 802.1Qbg, no status may be
received while things are 'in progress' and only at the end a status
will be there.
- when reading the port status, use the given instanceId and vf to pick
the right IFLA_VF_PORT among those nested under IFLA_VF_PORTS.
- never sending nor parsing IFLA_PORT_SELF type of messages in the
802.1Qbg case
- iterating over the elements in a IFLA_VF_PORTS to pick the right
IFLA_VF_PORT by either IFLA_PORT_PROFILE and given profileId
(802.1Qbh) or IFLA_PORT_INSTANCE_UUID and given instanceId (802.1Qbg)
and reading the current status in IFLA_PORT_RESPONSE.
- recycling a previous patch that adds functionality to interface.c to
- get the vlan identifier on an interface
- get the flags of an interface and some convenience function to
check whether an interface is 'up' or not (not currently used here)
- adding function to determine the root physical interface of an
interface. For example if a macvtap is linked to eth0.100, it will
find eth0. Also adding a function that finds the vlan on the 'way to
the root physical interface'
- conveying the root physical interface name and index in case of 802.1Qbg
- conveying mac address of macvlan device and vlan identifier in
IFLA_VFINFO_LIST[ IFLA_VF_INFO[ IFLA_VF_MAC(mac), IFLA_VF_VLAN(vlan) ] ]
to (future) lldpad via netlink
- To enable build with --without-macvtap rename the
[dis|]associatePortProfileId functions, prepend 'vp' before their
name and make them non-static functions.
- Renaming variable multicast to nltarget_kernel and inverting
the logic
- Addressing Jim Meyering's comments; this also touches existing
code for example for correcting indentation of break statements or
simplification of switch statements.
- Renamed occurrencvirVirtualPortProfileDef to virVirtualPortProfileParamses
- 802.1Qbg part prepared for sending a RTM_SETLINK and getting
processing status back plus a subsequent RTM_GETLINK to
get IFLA_PORT_RESPONSE.
Note: This interface for 802.1Qbg may still change
- [David Allan] move getPhysfn inside IFLA_VF_PORT_MAX to avoid
compiler
warning when latest if_link.h isn't available
- move from Stefan's 802.1Qb{g|h} XML v8 to v9
- move hostuuid and vf index calcs to inside doPortProfileOp8021Qbh
- remove debug fprintfs
- use virGetHostUUID (thanks Stefan!)
- fix compile issue when latest if_link.h isn't available
- change poll timeout to 10s, at 1/8 intervals
- if polling times out, log msg and return -ETIMEDOUT
- Add Stefan's code for getPortProfileStatus
- Poll for up to 2 secs for port-profile status, at 1/8 sec intervals:
- if status indicates error, abort openMacvtapTap
- if status indicates success, exit polling
- if status is "in-progress" after 2 secs of polling, exit
polling loop silently, without error
My patch finishes out the 802.1Qbh parts, which Stefan had mostly complete.
I've tested using the recent kernel updates for VF_PORT netlink msgs and
enic for Cisco's 10G Ethernet NIC. I tested many VMs, each with several
direct interfaces, each configured with a port-profile per the XML. VM-to-VM,
and VM-to-external work as expected. VM-to-VM on same host (using same NIC)
works same as VM-to-VM where VMs are on diff hosts. I'm able to change
settings on the port-profile while the VM is running to change the virtual
port behaviour. For example, adjusting a QoS setting like rate limit. All
VMs with interfaces using that port-profile immediatly see the effect of the
change to the port-profile.
I don't have a SR-IOV device to test so source dev is a non-SR-IOV device,
but most of the code paths include support for specifing the source dev and
VF index. We'll need to complete this by discovering the PF given the VF
linkdev. Once we have the PF, we'll also have the VF index. All this info-
mation is available from sysfs.
Since the macvtap device needs active tear-down and the teardown logic
is based on the interface name, it can happen that if for example 1 out
of 3 interfaces was successfully created, that during the failure path
the macvtap's target device name is used to tear down an interface that
is doesn't own (owned by another VM).
So, in this patch, the target interface name is reset so that there is
no target interface name and the interface name is always cleared after
a tear down.
The hotplug methods still had the qemuCmdFlags variable declared
as an int, instead of unsigned long long. This caused flag checks
to be incorrect for flags > 31
* src/qemu/qemu_driver.c: Fix integer overflow in hotplug
This allows libvirt to open the PCI device sysfs config file prior
to dropping privileges so qemu can access the full config space.
Without this, a de-privileged qemu can only access the first 64
bytes of config space.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Detect support
for pci-assign.configfd option. Use this option when formatting
PCI device string if possible
* src/qemu/qemu_driver.c: Pre-open PCI sysfs config file and pass
to QEMU
We've been running into a lot of situations where
virGetHostname() is returning "localhost", where a plain
gethostname() would have returned the correct thing. This
is because virGetHostname() is *always* trying to canonicalize
the name returned from gethostname(), even when it doesn't
have to.
This patch changes virGetHostname so that if the value returned
from gethostname() is already FQDN or localhost, it returns
that string directly. If the value returned from gethostname()
is a shortened hostname, then we try to canonicalize it. If
that succeeds, we returned the canonicalized hostname. If
that fails, and/or returns "localhost", then we just return
the original string we got from gethostname() and hope for
the best.
Note that after this patch it is up to clients to check whether
"localhost" is an allowed return value. The only place
where it's currently not is in qemu migration.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
This patch parses the following two XML descriptions, one for
802.1Qbg and one for 802.1Qbh, and stores the data internally.
The actual triggering of the switch setup protocol has not been
implemented here but the relevant code to do that should go into
the functions associatePortProfileId() and disassociatePortProfileId().
<interface type='direct'>
<source dev='eth0.100' mode='vepa'/>
<model type='virtio'/>
<virtualport type='802.1Qbg'>
<parameters managerid='12' typeid='0x123456' typeidversion='1'
instanceid='fa9b7fff-b0a0-4893-8e0e-beef4ff18f8f'/>
</virtualport>
<filterref filter='clean-traffic'/>
</interface>
<interface type='direct'>
<source dev='eth0.100' mode='vepa'/>
<model type='virtio'/>
<virtualport type='802.1Qbh'>
<parameters profileid='my_profile'/>
</virtualport>
</interface>
I'd suggest to use this patch as a base for triggering the setup
protocol with the 802.1Qb{g|h} switch.
Several rounds of changes were made to this patch. The
following is a list of these changes.
- Renamed structure virVirtualPortProfileDef to virVirtualPortProfileParams
as per Daniel Berrange's request
- Addressing Daniel Berrange's comments:
- removing macvtap.h's dependency on domain_conf.h by
moving the virVirtualPortProfileDef structure into macvtap.h
and not passing virtDomainNetDefPtr to any functions in
macvtap.c
- Addressed most of Chris Wright's comments:
- indicating error in case virtualport XML node cannot be parsed
properly
- parsing hex and decimal numbers using virStrToLong_ui() with
parameter '0' for base
- tgifname (target interface name) variable wasn't necessary
to pass to openMacvtapTap function anymore
- assigning the virtual port data structure to the virDomainNetDef
only if it was previously parsed
- make sure that the error code returned by openMacvtapTap() is a negative n
in case the associatePortProfileId() function failed.
- renaming vsi in the XML to virtualport
- replace all occurrences of vsi in the source as well
- removing mode and MAC address parameters from the functions that
will communicate with the hareware diretctly or indirectly
- moving the associate and disassociate functions to the end of the
file for subsequent patches to easier make them generally available
for export
- passing the macvtap interface name rather than the link device since
this otherwise gives funny side effects when using netlink messages
where IFLA_IFNAME and IFLA_ADDRESS are specified and the link dev
all of a sudden gets the MAC address of the macvtap interface.
- Removing rc = -1 error indications in the case of 802.1Qbg|h setup in case
we wanted to use hook scripts for the setup and so the setup doesn't fail
here.
- if instance ID UUID is not supplied it will automatically be generated
- adapted schema to make instance ID UUID optional
- added test case
- parser and XML generator have been separated into their own
functions so they can be re-used elsewhere (passthrough case
for example)
- Adapted XML parser and generator support the above shown type
(802.1Qbg, 802.1Qbh).
- Adapted schema to above XML
- Adapted test XML to above XML
- Passing through the VM's UUID which seems to be necessary for
802.1Qbh -- sorry no host UUID
- adding virtual function ID to association function, in case it's
necessary to use (for SR-IOV)
Allow for a host UUID in the capabilities XML. Local drivers
will initialize this from the SMBIOS data. If a sanity check
shows SMBIOS uuid is invalid, allow an override from the
libvirtd.conf configuration file
* daemon/libvirtd.c, daemon/libvirtd.conf: Support a host_uuid
configuration option
* docs/schemas/capability.rng: Add optional host uuid field
* src/conf/capabilities.c, src/conf/capabilities.h: Include
host UUID in XML
* src/libvirt_private.syms: Export new uuid.h functions
* src/lxc/lxc_conf.c, src/qemu/qemu_driver.c,
src/uml/uml_conf.c: Set host UUID in capabilities
* src/util/uuid.c, src/util/uuid.h: Support for host UUIDs
* src/node_device/node_device_udev.c: Use the host UUID functions
* tests/confdata/libvirtd.conf, tests/confdata/libvirtd.out: Add
new host_uuid config option to test
The cgroups ACL code was only allowing the primary disk image.
It is possible to chain images together, so we need to search
for backing stores and add them to the ACL too. Since the ACL
only handles block devices, we ignore the EINVAL we get from
plain files. In addition it was missing code to teardown the
cgroup when hot-unplugging a disk
* src/qemu/qemu_driver.c: Allow backing stores in cgroup ACLs
and add missing teardown code in unplug path
Basic live migration was broken by the commit that added
non-shared block support in two ways:
1) It added a virCheckFlags() to doNativeMigrate(). Besides
the fact that typical usage of virCheckFlags() is in driver
entry points, and doNativeMigrate() is not an entry point,
it was missing important flags like VIR_MIGRATE_LIVE. Move
the virCheckFlags to the top-level qemuDomainMigratePrepare2
and friends.
2) It also added a memory leak in qemuMonitorTextMigrate()
by not freeing the memory used by virBufferContentAndReset().
This is fixed by storing the pointer in a temporary variable
and freeing it at the end.
With this patch in place, normal live migration works again.
v3: Instead of the churn for virCheckFlagsUI and UL, instead
always promote flags to an unsigned long and always use %lx
for the fprintf.
v2: Add back flags check, which required adding virCheckFlagsUI
and virCheckFlagsUL
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Currently all host audio backends are disabled if a VM is using VNC, in
favor of the QEMU VNC audio extension. Unfortunately no released VNC
client supports this extension, so users have no way of getting audio
to work if using VNC.
Add a new config option in qemu.conf which allows changing libvirt's
behavior, but keep the default intact.
v2: Fix doc typos, change name to vnc_allow_host_audio
The device path doesn't make use of guestAddr, so the memcpy corrupts
the guest info struct.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The virDomainGetBlockInfo API allows query physical block
extent and allocated block extent. These are normally the
same value unless storing a special format like qcow2
inside a block device. In this scenario we can query QEMU
to get the actual allocated extent.
Since last time:
- Return fatal error in text monitor
- Only invoke monitor command for block devices
- Fix error handling JSON code
* src/qemu/qemu_driver.c: Fill in block aloction extent when VM
is running
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
API to query the highest block extent via info blockstats
The qemu driver contains a subtle race in the logic to find next
available vnc port. Currently it iterates through all available ports
and returns the first for which bind(2) succeeds. However it is possible
that a previously issued port has not yet been bound by qemu, resulting
in the same port used for a subsequent domain.
This patch addresses the race by using a simple bitmap to "reserve" the
ports allocated by libvirt.
V2:
- Put port bitmap in struct qemud_driver
- Initialize bitmap in qemudStartup
V3:
- Check for failure of virBitmapGetBit
- Additional check for port != -1 before calling virbitmapClearBit
V4:
- Check for failure of virBitmap{Set,Clear}Bit
We need a common internal function for starting managed domains to be
used during autostart. This patch factors out relevant code from
qemudDomainStart into qemudDomainObjStart and makes it use the
refactored code for domain restore instead of calling qemudDomainRestore
API directly.
We need to be able to restore a domain which we already locked and
started a job for it without undoing these steps. This patch factors
out internals of qemudDomainRestore into separate functions which work
for locked objects.
* src/qemu/qemu_conf.c (qemudParseHelpStr): Fix errors that made
it impossible to diagnose invalid minor and micro version number
components.
Signed-off-by: Chris Wright <chrisw@redhat.com>
The current cleanup: in StartVMDaemon path is a poor duplication.
qemuShutdownVMDaemon can handle teardown for inactive VMs, so let's use it.
v2: Remove old abort: label, only use cleanup:
* src/qemu/qemu_conf.c (QEMU_VERSION_STR_1, QEMU_VERSION_STR_2):
Define these instead of...
(QEMU_VERSION_STR): ... this. Remove definition.
(qemudParseHelpStr): Check first for the new, shorter prefix,
"QEMU emulator version", and then for the old one,
"QEMU PC emulator version" when trying to parse the version number.
Based on a patch by Chris Wright.
There doesn't seem to be anything specific to tap devices for this
array of file descriptors which need to stay open of the guest to use.
Rename then for others to make use of.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Do not require each caller of virStorageFileGetMetadata and
virStorageFileGetMetadataFromFD to first clear the storage of the
"meta" buffer. Instead, initialize that storage in
virStorageFileGetMetadataFromFD.
* src/util/storage_file.c (virStorageFileGetMetadataFromFD): Clear
"meta" here, not before each of the following callers.
* src/qemu/qemu_driver.c (qemuSetupDiskCgroup): Don't clear "meta" here.
(qemuTeardownDiskCgroup): Likewise.
* src/qemu/qemu_security_dac.c (qemuSecurityDACSetSecurityImageLabel):
Likewise.
* src/security/security_selinux.c (SELinuxSetSecurityImageLabel):
Likewise.
* src/security/virt-aa-helper.c (get_files): Likewise.
Approximately 60 messages were marked. Since these diagnostics are
intended solely for developers and maintainers, encouraging translation
is deemed to be counterproductive:
http://thread.gmane.org/gmane.comp.emulators.libvirt/25050/focus=25052
Run this command:
git grep -l VIR_WARN|xargs perl -pi -e \
's/(VIR_WARN0?)\s*\(_\((".*?")\)/$1($2/'
There were three very similar uses of qemuMonitorAddDrive.
This change makes the three 17-line sequences identical.
* src/qemu/qemu_driver.c (qemudDomainAttachPciDiskDevice): Detect
failure. Add VIR_WARN and braces.
(qemudDomainAttachSCSIDisk): Add VIR_WARN and braces.
(qemudDomainAttachUsbMassstorageDevice): Likewise.
History has shown that there are frequent bugs in the QEMU driver
code leading to the monitor being invoked with a NULL pointer.
Although the QEMU driver code should always report an error in
this case before invoking the monitor, as a safety net put in a
generic check in the monitor code entry points.
* src/qemu/qemu_monitor.c: Safety net to check for NULL monitor
object
Any method which intends to invoke a monitor command must have
a check for virDomainObjIsActive() before using the monitor to
ensure that priv->mon != NULL.
There is one subtle edge case in this though. If a method invokes
multiple monitor commands, and calls qemuDomainObjExitMonitor()
in between two of these commands then there is no guarentee that
priv->mon != NULL anymore. This is because the QEMU process may
exit or die at any time, and because qemuDomainObjEnterMonitor()
releases the lock on virDomainObj, it is possible for the background
thread to close the monitor handle and thus qemuDomainObjExitMonitor
will release the last reference allowing priv->mon to become NULL.
This affects several methods, most notably migration but also some
hotplug methods. This patch takes a variety of approaches to solve
the problem, depending on the particular usage scenario. Generally
though it suffices to add an extra virDomainObjIsActive() check
if qemuDomainObjExitMonitor() was called during the method.
* src/qemu/qemu_driver.c: Fix multiple potential NULL pointer flaws
in usage of the monitor
* src/qemu/qemu_driver.c (qemudDomainSetVcpus): Upon look-up failure,
i.e., vm==NULL, goto cleanup, rather than to "endjob", superficially
since the latter would dereference vm, but more fundamentally because
we certainly don't want to call qemuDomainObjEndJob before we've
even attempted qemuDomainObjBeginJob.
(gdb) p/x QEMUD_CMD_FLAG_VNET_HOST
$7 = 0xffffffff80000000
Oops - that meant we were incorrectly setting QEMU_CMD_FLAG_RTC_TD_HACK
for qemu-kvm-0.12.3 (and probably botching a few other settings as well).
Fixes Red Hat BZ#592070
* src/qemu/qemu_conf.h (QEMUD_CMD_FLAG_VNET_HOST): Avoid sign
extension.
* tests/qemuhelpdata/qemu-kvm-0.12.3: New file.
* tests/qemuhelptest.c (mymain): Add another case.
A fedora translator filed:
https://bugzilla.redhat.com/show_bug.cgi?id=580816
Pointing out these two error messages as unclear: "write save" sounds
like a typo without context, and lack of a colon made the second message
difficult to parse.
virFileResolveLink was returning a positive value on error,
thus confusing callers that assumed failure was < 0. The
confusion is further evidenced by callers that would have
ended up calling virReportSystemError with a negative value
instead of a valid errno.
Fixes Red Hat BZ #591363.
* src/util/util.c (virFileResolveLink): Live up to documentation.
* src/qemu/qemu_security_dac.c
(qemuSecurityDACRestoreSecurityFileLabel): Adjust callers.
* src/security/security_selinux.c
(SELinuxRestoreSecurityFileLabel): Likewise.
* src/storage/storage_backend_disk.c
(virStorageBackendDiskDeleteVol): Likewise.
qemuReadLogOutput early VM death detection is racy and won't always work.
Startup then errors when connecting to the VM monitor. This won't report
the emulator cmdline output which is typically the most useful diagnostic.
Check if the VM has died at the very end of the monitor connection step,
and if so, report the cmdline output.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=581381
* src/qemu/qemu_driver.c (qemudDomainSetVcpus): Avoid NULL-deref
upon unknown UUID. Call qemuDomainObjBeginJob(vm) only after
ensuring that vm != NULL, not before. This potential NULL-deref
was introduced by commit 2c555d87b0.
The code specifies driver->cacheDir as the format string,
but it usually doesn't contain '%s', so the subsequent
argument, "/qemu.mem.XXXXXX", is always ignored.
The patch fixes the misuse.
Setting dynamic_ownership=0 in /etc/libvirt/qemu.conf prevents
libvirt's DAC security driver from setting uid/gid on disk
files when starting/stopping QEMU, allowing the admin to manage
this manually. As a side effect it also stopped setting of
uid/gid when saving guests to a file, which completely breaks
save when QEMU is running non-root. Thus saved state labelling
code must ignore the dynamic_ownership parameter
* src/qemu/qemu_security_dac.c: Ignore dynamic_ownership=0 when
doing save/restore image labelling
When QEMU runs with its disk on NFS, and as a non-root user, the
disk is chownd to that non-root user. When migration completes
the last step is shutting down the QEMU on the source host. THis
normally resets user/group/security label. This is bad when the
VM was just migrated because the file is still in use on the dest
host. It is thus neccessary to skip the reset step for any files
found to be on a shared filesystem
* src/libvirt_private.syms: Export virStorageFileIsSharedFS
* src/util/storage_file.c, src/util/storage_file.h: Add a new
method virStorageFileIsSharedFS() to determine if a file is
on a shared filesystem (NFS, GFS, OCFS2, etc)
* src/qemu/qemu_driver.c: Tell security driver not to reset
disk labels on migration completion
* src/qemu/qemu_security_dac.c, src/qemu/qemu_security_stacked.c,
src/security/security_selinux.c, src/security/security_driver.h,
src/security/security_apparmor.c: Add ability to skip disk
restore step for files on shared filesystems.
The cgroups ACL code was only allowing the primary disk image.
It is possible to chain images together, so we need to search
for backing stores and add them to the ACL too. Since the ACL
only handles block devices, we ignore the EINVAL we get from
plain files. In addition it was missing code to teardown the
cgroup when hot-unplugging a disk
* src/qemu/qemu_driver.c: Allow backing stores in cgroup ACLs
and add missing teardown code in unplug path
QEMU is gaining a new monitor command netdev_add for hotplugging
NICs using the netdev backend code. We already support this on
the command this, though it is disabled. This adds support for
hotplug too, also to remain disabled until 0.13 QEMU is released
* src/qemu/qemu_driver.c: Support netdev hotplug for NICs
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
support for netdev_add and netdev_remove commands
When closing a monitor using qemuMonitorClose(), we are aware of
the possibility the monitor is still being used somewhere:
/* NB: ordinarily one might immediately set mon->watch to -1
* and mon->fd to -1, but there may be a callback active
* that is still relying on these fields being valid. So
* we merely close them, but not clear their values and
* use this explicit 'closed' flag to track this state */
but since we call virEventAddHandle() on that monitor without increasing
its ref counter, the monitor is still freed which makes possible users
of it quite unhappy. The unhappiness can lead to a hang if qemuMonitorIO
tries to lock mutex which no longer exists.
With the introduction of the generic qemu device model, unplugging
SCSI disks works like a charm, so support it in libvirt.
* src/qemu/qemu_driver.c: Add qemudDomainDetachSCSIDiskDevice() to do the
unplugging, extend qemudDomainDetachDeviceAdd().
Signed-off-by: Wolfgang Mauerer <wolfgang.mauerer@siemens.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Support for live migration between hosts that do not share storage was
added to qemu-kvm release 0.12.1.
It supports two flags:
-b migration without shared storage with full disk copy
-i migration without shared storage with incremental copy (same base image
shared between source and destination).
I tested the live migration without shared storage (both flags) for native
and p2p with and without tunnelling. I also verified that the fix doesn't
affect normal migration with shared storage.
WIN32 is always defined when __MINGW32__ is defined, but the
converse is not true. WIN32 is more generic, if someone were
to ever attempt porting to a microsoft compiler. This does
not affect Cygwin, which intentionally does not define WIN32.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Use more
generic flag macro.
* src/storage/storage_backend.c
(virStorageBackendUpdateVolTargetInfoFD)
(virStorageBackendRunProgRegex): Likewise.
* tools/console.h (vshRunConsole): Likewise.
This introduces a new event type
VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON
This event is the same as the previous VIR_DOMAIN_ID_IO_ERROR
event, but also includes a string describing the cause of
the event.
Thus there is a new callback definition for this event type
typedef void (*virConnectDomainEventIOErrorReasonCallback)(virConnectPtr conn,
virDomainPtr dom,
const char *srcPath,
const char *devAlias,
int action,
const char *reason,
void *opaque);
This is currently wired up to the QEMU block IO error events
* daemon/remote.c: Dispatch IO error events to client
* examples/domain-events/events-c/event-test.c: Watch for
IO error events
* include/libvirt/libvirt.h.in: Define new IO error event ID
and callback signature
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle IO error events
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for block IO errors and emit a libvirt IO error event
* src/remote/remote_driver.c: Receive and dispatch IO error
events to application
* src/remote/remote_protocol.x: Wire protocol definition for
IO error events
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c: Watch for BLOCK_IO_ERROR event
from QEMU monitor
When using -device syntax, the IO event will have a different
prefix, 'drive-' that needs to be skipped over before matching
against the libvirt disk alias
* src/qemu/qemu_driver.c: Skip QEMU_DRIVE_HOST_PREFIX in IO event
This defines the internal driver API and stubs out each driver
* src/driver.h: Define virDrvDomainGetBlockInfo signature
* src/libvirt.c, src/libvirt_public.syms: Glue public API to drivers
* src/esx/esx_driver.c, src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/phyp/phyp_driver.c,
src/test/test_driver.c, src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c, src/xenapi/xenapi_driver.c: Stub out driver
qemuDomainPCIAddressSetFree was freeing up the hash
table for the pci addresses, but not freeing up the addr
structure. Looking over the callers of this function, it
seems like they expect it to also free up the structure,
so do that here.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The previous commit changes a goto from 'endjob' to 'cleanup',
leaving the endjob label unused. Remove it to avoid compile
warning.
* src/qemu/qemu_driver.c: Remove 'endjob' label
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): When setting
"vm" to NULL, jump over vm-dereferencing code to "cleanup".
(qemuDomainRevertToSnapshot): Likewise.
In cases where the security driver failed to restore a label after a
guest has saved, we mistakenly jumped to the error cleanup paths.
This is not good, because the operation has in fact completed and
cannot be rolled back completely. Label restore is non-critical, so
just log the problem instead. Also add a missing restore call in
the error cleanup path
* src/qemu/qemu_driver.c: Fix handling of security driver
restore failures in QEMU domain save
When cgroups is enabled, access to block devices is likely to be
restricted to a whitelist. Prior to saving a guest to a block device,
it is necessary to add the block device to the whitelist. This is
not required upon restore, since QEMU reads from stdin
* src/qemu/qemu_driver.c: Add block device to cgroups whitelist
if neccessary during domain save.
The save process was relying on use of the shell >> append
operator to ensure the save data was placed after the libvirt
header + XML. This doesn't work for block devices though.
Replace this code with use of 'dd' and its 'seek' parameter.
This means that we need to pad the header + XML out to a
multiple of dd block size (in this case we choose 512).
The qemuMonitorMigateToCommand() monitor API is used for both
save/coredump, and migration via UNIX socket. We can't simply
switch this to use 'dd' since this causes problems with the
migration usage. Thus, create a dedicated qemuMonitorMigateToFile
which can accept an filename + offset, and remove the filename
from the current qemuMonitorMigateToCommand() API
* src/qemu/qemu_driver.c: Switch to qemuMonitorMigateToFile
for save and core dump
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Create
a new qemuMonitorMigateToFile, separate from the existing
qemuMonitorMigateToCommand to allow handling file offsets
It is possible to use block devices with domain save/restore. Upon
failure QEMU unlinks the path being saved to. This isn't good when
it is a block device !
* src/qemu/qemu_driver.c: Don't unlink block devices if save fails
If a transient QEMU crashes during save attempt, then the virDomainPtr
object may be freed. If a persistent QEMU crashes during save, then
the 'priv->mon' field is no longer valid since it will be inactive.
* src/qemu/qemu_driver.c: Fix two crashes when QEMU exits
during a save attempt
In particular I was forgetting to take the qemuMonitorPrivatePtr
lock (via qemuDomainObjBeginJob), which would cause problems
if two users tried to access the same domain at the same time.
This patch also fixes a problem where I was forgetting to remove
a transient domain from the list of domains.
Thanks to Stephen Shaw for pointing out the problem and testing
out the initial patch.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
With JSON qemu monitor, we get a STOP event from qemu whenever qemu
stops guests CPUs. The downside of it is that vm->state is changed to
PAUSED and a new generic paused event is send to applications. However,
when we ask qemu to stop the CPUs we are not really interested in qemu
event and we usually want to issue a more specific event.
By setting vm->status to PAUSED before actually sending the request to
qemu (and resetting it back if the request fails) we can ignore the
event since the event handler does nothing when the guest is already
paused. This solution is quite hacky but unfortunately it's the best
solution which I was able to come up with and it doesn't introduce a
race condition.
While doing some testing of the snapshot code I noticed that
if qemuDomainSnapshotLoad failed, it would print a NULL as
part of the error. That's not desirable, so leave the
full_path variable around until after we are done printing
errors.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
It's not needed and is currently ignored, but this is a bug.
It will get fixed soon and QMP will return an error for keys
it doesn't know about, this will break libvirt.
* src/qemu/qemu_monitor_json.c: remove qemuMonitorJSONCommandAddTimestamp()
and the place where it's invoked in qemuMonitorJSONMakeCommand()
* The error messages coming from qemu's DAC support contain strings
from the original SELinux security driver code. This just removes
references to "security context" and other SELinux-isms from the DAC
code.
Signed-off-by: Spencer Shimko <sshimko@tresys.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The hang fix in d376b7d63e was incomplete
since it left quite a few {Enter,Exit}Monitor calls which require driver
to be unlocked. Since the driver is locked throughout the whole
function, {Enter,Exit}MonitorWithDriver need to be used instead to
ensure driver is not locked when issuing monitor commands.
The comment in qemuDomainWaitForMigrationComplete says we are polling
every 50ms but the code sleeps only for 50us. This was already discussed
during review but apparently forgotten when the series was pushed.
The text monitor code was checking for a '\n' prefix on several
places. Previously this would work, but since the monitor code
re-write the '\n' is already stripped off, so mustn't be checked
for.
* src/qemu/qemu_monitor_text.c: Fix monitor error checking
Probably as a result of a merge error, the CPU hotplug command
names were completely wrong.
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_text.c: Fix
the CPU hotplug command names
Adds ability to provide a preferred CPU model for CPUID data decoding.
Such model would be considered as the best possible model (if it's
supported by hypervisor) regardless on number of features which have to
be added or removed for describing required CPU.
So far, when CPUID data were converted into CPU model and features, the
features can only be added to the model. As a result, when a guest asked
for something like "qemu64,-svm" it would get a qemu32 plus a bunch of
additional features instead.
This patch adds support for removing feature from the base model.
Selection algorithm remains the same: the best CPU model is the model
which requires lowest number of features to be added/removed from it.
Qemu committed a patch which list some CPU names in [] when asked for
supported CPUs (qemu -cpu ?). Yet, it needs such CPUs to be passed
without those square braces. When probing for supported CPU models, we
can just strip the square braces and pretend we have never seen them.
First, inital VCPU pinning is set correctly but then it is reset by
assigning qemu process to a new cgroup (which contains all CPUs). It's
easily fixed by swapping these two actions.
The initial boot of VMs uses -device for NICs where available. The
corresponding monitor command is device_add, but the network hotplug
code was still using device_del by mistake.
* src/qemu/qemu_driver.c: Use device_add for NIC hotplug where
available
If either of the getfd or host_net_add monitor commands return
any text, this indicates an error condition. Don't ignore this!
* src/qemu/qemu_monitor_text.c: Report errors for getfd and
host_net_add
The 'device_del' command expects a parameter called 'id' but we
were passing 'config'.
* src/qemu/qemu_monitor_json.c: Fix device_del command parameter
Disk devices in QEMU have two parts, the guest device and the host
backend driver. Historically these two parts have had the same
"unique" name. With the switch to using -device though, they now
have separate names. Thus when changing CDROM media, for guests
using -device syntax, we need to prepend the QEMU_DRIVE_HOST_PREFIX
constant
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Add helper function
qemuDeviceDriveHostAlias() for building a host backend alias
* src/qemu/qemu_driver.c: Use qemuDeviceDriveHostAlias() to determine
the host backend alias for performing eject/change commands in the
monitor
The device_add command was added in JSON mode in a way I didn't
expect. Instead of passing the normal device string to the JSON
command:
{ "execute": "device_add", "arguments": { "device": "ne2k_pci,id=nic.1,netdev=net.1" } }
We need to split up the device string into a full JSON object
{ "execute": "device_add", "arguments": { "driver": "ne2k_pci", "id": "nic.1", "netdev": "net.1" } }
* src/qemu/qemu_conf.h, src/qemu/qemu_conf.c: Rename the
qemuCommandLineParseKeywords method to qemuParseKeywords
and export it to monitor
* src/qemu/qemu_monitor_json.c: Split up device string into
a JSON object for device_add command
The parameter for the qemuMonitorDeviceDel() is a device alias,
not a device config string. Rename the parameter reflect this
and avoid confusion to readers.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h:
Rename devicestr to devalias in qemuMonitorDeviceDel()
The QEMU developers have stated that they will not be porting
the commands 'pci_add', 'pci_del', 'usb_add', 'usb_del' to the
JSON mode monitor, since they're obsoleted by 'device_add'
and 'device_del'. libvirt has (untested) code that would have
supported those commands in theory, but since we already use
device_add/del where available, there's no need to keep the
legacy stuff anymore.
The text mode monitor keeps support for all commands for sake
of historical compatability.
* src/qemu/qemu_monitor_json.c: Remove 'pci_add', 'pci_del',
'usb_add', 'usb_del' commands
The QEMU driver is mistakenly calling directly into the text
mode monitor for the domain memory stats query.
* src/qemu/qemu_driver.c: Replace qemuMonitorTextGetMemoryStats with
qemuMonitorGetMemoryStats
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add the new
wrapper for qemuMonitorGetMemoryStats
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h: Add
qemuMonitorJSONGetMemoryStats implementation
Instead of reporting VIR_ERR_INTERNAL_ERROR use the more specific
VIR_ERR_CONFIG_UNSUPPORTED
* src/qemu/qemu_conf.c: Report VIR_ERR_CONFIG_UNSUPPORTED for
unsupported video adapters
To avoid race-conditions, the tear down of a filter has to happen before
the tap interface disappears and another tap interface with the same
name can re-appear. This patch tries to fix this. In one place, where
communication with the qemu monitor may fail, I am only tearing the
filters down after knowing that the function did not fail.
I am also moving the tear down functions into an include file for other
drivers to reuse.
* src/qemu/qemu_driver.c (qemudDomainAttachSCSIDisk):
Initialize "cont" to NULL, so clang knows it's set.
Add an sa_assert so it knows it's non-NULL when dereferenced.
In a couple of cases typos meant we were firing the wrong type
of event. In the python code my previous commit accidentally
missed some chunks of the code.
* python/libvirt-override-virConnect.py: Add missing python glue
accidentally left out of previous commit
* src/conf/domain_event.c, src/qemu/qemu_monitor_json.c: Fix typos
in event name / method name to invoke
Currently when we attempt to change the cdrom in a qemu VM the monitor
doesn't generate an error if the target filename doesn't exist. I've
submitted a patch[1] for this. This patch is the libvirt qemu-driver
side which catches the error message from the monitor and reportes the
error to libvirt. This means that virsh attach-disk cdrom commands
won't appear to succeed when qemu change command actually failed.
* src/qemu/qemu_monitor_text.c: in qemuMonitorTextChangeMedia() look
for failure to access the new data
When qemu libvirt driver doesn't support guest CPU selection with given
qemu binary, guests requiring specific CPU should fail to start instead
of being silently supplied with a default CPU.
* src/qemu/qemu_driver.c (qemudStartVMDaemon): Initialize "logfile"
to ensure that we don't use it uninitialized -- thus closing an
arbitrary file descriptor -- in the cleanup block.
When starting up qemu VNC autoport guests, we were
only looking through ports 5900 to 6000, meaning we
were limited to 100 total clients. Increase that
limit to 65535 (the last available port), so we can
have up to 59635 VNC autoport guests.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The images are saved in /var/lib/libvirt/qemu/save/
and named $domainname.save . The directory is created appropriately
at daemon startup. When a domain is started while a saved image is
available, libvirt will try to load this saved image, and start the
domain as usual in case of failure. In any case the saved image is
discarded once the domain is created.
* src/qemu/qemu_conf.h: adds an extra save path to the driver config
* src/qemu/qemu_driver.c: implement the 3 new operations and handling
of the image directory
virDomainManagedSave() is to be run on a running domain. Once the call
complete, as in virDomainSave() the domain is stopped upon completion,
but there is no restore counterpart as any order to start the domain
from the API would load the state from the managed file, similary if
the domain is autostarted when libvirtd starts.
Once a domain has restarted his managed save image is destroyed,
basically managed save image can only exist for a stopped domain,
for a running domain that would be by definition outdated data.
* include/libvirt/libvirt.h.in src/libvirt.c src/libvirt_public.syms:
adds the new entry points virDomainManagedSave(),
virDomainHasManagedSaveImage() and virDomainManagedSaveRemove()
* src/driver.h src/esx/esx_driver.c src/lxc/lxc_driver.c
src/opennebula/one_driver.c src/openvz/openvz_driver.c
src/phyp/phyp_driver.c src/qemu/qemu_driver.c src/vbox/vbox_tmpl.c
src/remote/remote_driver.c src/test/test_driver.c src/uml/uml_driver.c
src/xen/xen_driver.c: add corresponding new internal drivers entry
points
The clock timer XML is being updated in the following ways (based on
further off-list discussion that was missed during the initial
implementation):
1) 'wallclock' is changed to 'track', and the possible values are 'boot'
(corresponds to old 'host'), 'guest', and 'wall'.
2) 'mode' has an additional value 'smpsafe'
3) when tickpolicy='catchup', there can be an optional sub-element of
timer called 'catchup':
<catchup threshold=123 slew=120 limit=10000/>
Those three values are all longs, always optional, and if they are present,
they are positive. Internally, 0 indicates "unspecified".
* docs/schemas/domain.rng: updated RNG definition to account for changes
* src/conf/domain_conf.h: change the C struct and enums to match changes.
* src/conf/domain_conf.c: timer parse and format functions changed to
handle the new selections and new element.
* src/libvirt_private.syms: *TimerWallclock* changes to *TimerTrack*
* src/qemu/qemu_conf.c: again, account for Wallclock --> Track change.
virFileReadLimFD is a poor fit for reading the header
of the restore file. The problem is that virFileReadLimFD
returns an error when there is more data after the amount
you ask to read, but that is *expected* in this case.
This patch is essentially a revert of
1a4d5c9543, but I don't think
that commit does what it says anyway. It purports to prevent
an unwarranted OOM error, but since virFileReadLimFD will
allocate memory up to the maximum anyway, the upper limit
on the total amount of memory allocated is the same for either
the old version or the new version. Since the old saferead
actually works and virFileReadLimFD does not, revert to
using saferead.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
When a watchdog/IO error occurs, one of the possible actions that
QEMU might take is to pause the guest. In this scenario libvirt
needs to update its internal state for the VM, and emit a
lifecycle event:
VIR_DOMAIN_EVENT_SUSPENDED
with a detail being one of:
VIR_DOMAIN_EVENT_SUSPENDED_IOERROR
VIR_DOMAIN_EVENT_SUSPENDED_WATCHDOG
To future proof against possible QEMU support for multiple monitor
consoles, this patch also hooks into the 'STOPPED' event in QEMU
and emits a generic VIR_DOMAIN_EVENT_SUSPENDED_PAUSED event
* include/libvirt/libvirt.h.in: Add VIR_DOMAIN_EVENT_SUSPENDED_IOERROR
* src/qemu/qemu_driver.c: Update VM state to paused when IO error
or watchdog events occurrs
* src/qemu/qemu_monitor_json.c: Fix typo in disk IO event name
virStrToLong* guarantees (via strtol) that the end pointer will be set
to the point at which parsing stopped (even on failure, this point is
the start of the input string).
* src/esx/esx_driver.c (esxGetVersion): Remove pointless
conditional.
* src/qemu/qemu_conf.c (qemuParseCommandLinePCI)
(qemuParseCommandLineUSB, qemuParseCommandLineSmp): Likewise.
* src/qemu/qemu_monitor_text.c
(qemuMonitorTextGetMigrationStatus): Likewise.
Since the timers are defined to cover all possible config cases for
several different hypervisors, many of these possibilities generate an
error on qemu. Here is what is currently supported:
RTC: If the -rtc commandline option is available, allow setting
"clock=host"
or "clock=vm" based on the rtc timer clock='host|guest' value. Also
add "driftfix=slew" if the tickpolicy is 'catchup', or add nothing
if
tickpolicy is 'delay'. (Other tickpolicies will raise an error).
If -rtc isn't available, but -rtc-td-hack is, add that option
if the tickpolicy is 'catchup', add -rtc-td-hack, if it is 'delay'
add nothing, and if it's anything else, raise an error.
PIT: If -no-kvm-pit-reinjection is available, and tickpolicy is
'delay', add that option. if tickpolicy is 'catchup', do
nothing. Anything else --> raise an error.
If -no-kvm-pit-reinjection *isn't* available, but -tdf is, when
tickpolicy is 'catchup' add -tdf. If it's 'delay', do
nothing. Anything else --> raise an error.
If neither of those commandline options is available, and
tickpolicy is anything other than 'delay' (or unspecified), raise
an error.
HPET: If -no-hpet flag is available and present='no', add -no-hpet.
If -no-hpet is not available, and present='yes', raise an error.
If present is unspecified, the default is to do whatever this
particular qemu does by default, so don't raise an error.
All other timer types are unsupported by QEMU, so they will raise an
error.
* src/qemu/qemu_conf.c: extend qemuBuildClockArgStr() to generate the
command line arguments for the new options
* src/qemu/qemu_conf.h: define 4 new flags
* src/qemu/qemu_conf.c: check the help text of qemu for presence of
features indicated by each flag.
* tests/qemuhelptest.c: add appropriate flags into the masks for each test
The QEMU cpu affinity is used in NUMA scenarios to ensure that
guest memory is allocated from a specific node. Normally memory
is allocate on demand in vCPU threads, but when using hugepages
the initial thread leader allocates memory upfront. libvirt was
not setting affinity of the thread leader, or I/O threads. This
patch changes the code to set the process affinity in between
the fork()/exec() of QEMU. This ensures that every single QEMU
thread gets the affinity
* src/qemu/qemu_driver.c: Set affinity on entire QEMU process
at startup
Right now this implements only 2 basic hooks:
- before the qemu process is being launched
- after the qemu process is terminated
the XML description of the domain is passed to the hook script stdin
/etc/libvirt/hook/qemu
* src/qemu/qemu_driver.c: implement synchronous script hooks for QEmu
at domain startup and end
This flag is used in migration prepare step to send updated XML
definition of a guest.
Also ``virsh dumpxml --update-cpu [--inactive] guest'' command can be
used to see the updated CPU requirements.
When a domain is defined on host1, migrated to host2 and then migrated
back to host1, its current configuration would overwrite the libvirtd's
in-memory copy of persistent configuration of that domain. This is not
desired as we want to preserve the persistent configuration untouched.
This patch introduces new 'live' parameter to virDomainAssignDef.
Passing 'true' for 'live' means the configuration passed to
virDomainAssignDef describes a configuration of live instance of the
domain. This applies for saved domains which are being restored or for
incoming domains during migration.
All callers have been changed to pass the appropriate value.
* Fixes per feedback from Dan and Daniel
* Added test datafiles
* Re-disabled JSON flags
* Added code to print the error policy attribute when generating XML
* Re-add empty tag
Add support for Qemu to have firewall rules applied and removed on VM
startup and shutdown respectively. This patch also provides support for
the updating of a filter that causes all VMs that reference the filter
to have their ebtables/iptables rules updated.
Signed-off-by: Stefan Berger <stefanb@us.ibm.com>
To find out where the net type 'direct' needs to be handled I introduced
the 'enum virDomainNetType' in the virDomainNetDef structure and let the
compiler tell me where the case statement is missing. Then I added the
unhandled device statement to the UML driver.
* src/conf/domain_conf.h: change _virDomainNetDef type from int to
virDomainNetType enum
* src/conf/domain_conf.c src/lxc/lxc_driver.c src/qemu/qemu_conf.c
src/uml/uml_conf.c: make sure all enum cases are properly handled
in switches
Use the new virDomainUpdateDeviceFlags API to allow the VNC password
to be changed on the fly
* src/internal.h: Define STREQ_NULLABLE() which is like STREQ()
but does not crash if either argument is NULL, and treats two
NULLs as equal.
* src/libvirt_private.syms: Export virDomainGraphicsTypeToString
* src/qemu/qemu_driver.c: Support VNC password change on a live
machine
* src/qemu/qemu_monitor.c: Disable crazy debugging info. Treat a
NULL password as "" (empty string), allowing passwords to be
disabled in the monitor
To allow the new virDomainUpdateDeviceFlags() API to be universally
used with all drivers, this patch adds an impl to all the current
drivers which support CDROM or Floppy disk media change via the
current virDomainAttachDeviceFlags API
* src/qemu/qemu_driver.c, src/vbox/vbox_tmpl.c,
src/xen/proxy_internal.c, src/xen/xen_driver.c,
src/xen/xend_internal.c: Implement media change via the
virDomainUpdateDeviceFlags API
* src/xen/xen_driver.h, src/xen/xen_hypervisor.c,
src/xen/xen_inotify.c, src/xen/xm_internal.c,
src/xen/xs_internal.c: Stubs for Xen driver entry points
The current virDomainAttachDevice API can be (ab)used to change
the media of an existing CDROM/Floppy device. Going forward there
will be more devices that can be configured on the fly and overloading
virDomainAttachDevice for this is not too pleasant. This patch adds
a new virDomainUpdateDeviceFlags() explicitly just for modifying
existing devices.
* include/libvirt/libvirt.h.in: Add virDomainUpdateDeviceFlags
* src/driver.h: Internal API for virDomainUpdateDeviceFlags
* src/libvirt.c, src/libvirt_public.syms: Glue public API to
driver API
* src/esx/esx_driver.c, src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/phyp/phyp_driver.c, src/qemu/qemu_driver.c,
src/remote/remote_driver.c, src/test/test_driver.c, src/uml/uml_driver.c,
src/vbox/vbox_tmpl.c, src/xen/xen_driver.c, src/xenapi/xenapi_driver.c: Add
stubs for new driver entry point
This introduces a new event type
VIR_DOMAIN_EVENT_ID_GRAPHICS
The same event can be emitted in 3 scenarios
typedef enum {
VIR_DOMAIN_EVENT_GRAPHICS_CONNECT = 0,
VIR_DOMAIN_EVENT_GRAPHICS_INITIALIZE,
VIR_DOMAIN_EVENT_GRAPHICS_DISCONNECT,
} virDomainEventGraphicsPhase;
Connect/disconnect are triggered at socket accept/close.
The initialize phase is immediately after the protocol
setup and authentication has completed. ie when the
client is authorized and about to start interacting with
the graphical desktop
This event comes with *a lot* of potential information
- IP address, port & address family of client
- IP address, port & address family of server
- Authentication scheme (arbitrary string)
- Authenticated subject identity. A subject may have
multiple identities with some authentication schemes.
For example, vencrypt+sasl results in a x509dname
and saslUsername identities.
This results in a very complicated callback :-(
typedef enum {
VIR_DOMAIN_EVENT_GRAPHICS_ADDRESS_IPV4,
VIR_DOMAIN_EVENT_GRAPHICS_ADDRESS_IPV6,
} virDomainEventGraphicsAddressType;
struct _virDomainEventGraphicsAddress {
int family;
const char *node;
const char *service;
};
typedef struct _virDomainEventGraphicsAddress virDomainEventGraphicsAddress;
typedef virDomainEventGraphicsAddress *virDomainEventGraphicsAddressPtr;
struct _virDomainEventGraphicsSubject {
int nidentity;
struct {
const char *type;
const char *name;
} *identities;
};
typedef struct _virDomainEventGraphicsSubject virDomainEventGraphicsSubject;
typedef virDomainEventGraphicsSubject *virDomainEventGraphicsSubjectPtr;
typedef void (*virConnectDomainEventGraphicsCallback)(virConnectPtr conn,
virDomainPtr dom,
int phase,
virDomainEventGraphicsAddressPtr local,
virDomainEventGraphicsAddressPtr remote,
const char *authScheme,
virDomainEventGraphicsSubjectPtr subject,
void *opaque);
The wire protocol is similarly complex
struct remote_domain_event_graphics_address {
int family;
remote_nonnull_string node;
remote_nonnull_string service;
};
const REMOTE_DOMAIN_EVENT_GRAPHICS_IDENTITY_MAX = 20;
struct remote_domain_event_graphics_identity {
remote_nonnull_string type;
remote_nonnull_string name;
};
struct remote_domain_event_graphics_msg {
remote_nonnull_domain dom;
int phase;
remote_domain_event_graphics_address local;
remote_domain_event_graphics_address remote;
remote_nonnull_string authScheme;
remote_domain_event_graphics_identity subject<REMOTE_DOMAIN_EVENT_GRAPHICS_IDENTITY_MAX>;
};
This is currently implemented in QEMU for the VNC graphics
protocol, but designed to be usable with SPICE graphics in
the future too.
* daemon/remote.c: Dispatch graphics events to client
* examples/domain-events/events-c/event-test.c: Watch for
graphics events
* include/libvirt/libvirt.h.in: Define new graphics event ID
and callback signature
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle graphics events
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for VNC events and emit a libvirt graphics event
* src/remote/remote_driver.c: Receive and dispatch graphics
events to application
* src/remote/remote_protocol.x: Wire protocol definition for
graphics events
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c: Watch for VNC_CONNECTED,
VNC_INITIALIZED & VNC_DISCONNETED events from QEMU monitor
This introduces a new event type
VIR_DOMAIN_EVENT_ID_IO_ERROR
This event includes the action that is about to be taken
as a result of the watchdog triggering
typedef enum {
VIR_DOMAIN_EVENT_IO_ERROR_NONE = 0,
VIR_DOMAIN_EVENT_IO_ERROR_PAUSE,
VIR_DOMAIN_EVENT_IO_ERROR_REPORT,
} virDomainEventIOErrorAction;
In addition it has the source path of the disk that had the
error and its unique device alias. It does not include the
target device name (/dev/sda), since this would preclude
triggering IO errors from other file backed devices (eg
serial ports connected to a file)
Thus there is a new callback definition for this event type
typedef void (*virConnectDomainEventIOErrorCallback)(virConnectPtr conn,
virDomainPtr dom,
const char *srcPath,
const char *devAlias,
int action,
void *opaque);
This is currently wired up to the QEMU block IO error events
* daemon/remote.c: Dispatch IO error events to client
* examples/domain-events/events-c/event-test.c: Watch for
IO error events
* include/libvirt/libvirt.h.in: Define new IO error event ID
and callback signature
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle IO error events
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for block IO errors and emit a libvirt IO error event
* src/remote/remote_driver.c: Receive and dispatch IO error
events to application
* src/remote/remote_protocol.x: Wire protocol definition for
IO error events
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c: Watch for BLOCK_IO_ERROR event
from QEMU monitor
This introduces a new event type
VIR_DOMAIN_EVENT_ID_WATCHDOG
This event includes the action that is about to be taken
as a result of the watchdog triggering
typedef enum {
VIR_DOMAIN_EVENT_WATCHDOG_NONE = 0,
VIR_DOMAIN_EVENT_WATCHDOG_PAUSE,
VIR_DOMAIN_EVENT_WATCHDOG_RESET,
VIR_DOMAIN_EVENT_WATCHDOG_POWEROFF,
VIR_DOMAIN_EVENT_WATCHDOG_SHUTDOWN,
VIR_DOMAIN_EVENT_WATCHDOG_DEBUG,
} virDomainEventWatchdogAction;
Thus there is a new callback definition for this event type
typedef void (*virConnectDomainEventWatchdogCallback)(virConnectPtr conn,
virDomainPtr dom,
int action,
void *opaque);
* daemon/remote.c: Dispatch watchdog events to client
* examples/domain-events/events-c/event-test.c: Watch for
watchdog events
* include/libvirt/libvirt.h.in: Define new watchdg event ID
and callback signature
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle watchdog events
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for watchdogs and emit a libvirt watchdog event
* src/remote/remote_driver.c: Receive and dispatch watchdog
events to application
* src/remote/remote_protocol.x: Wire protocol definition for
watchdog events
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c: Watch for WATCHDOG event
from QEMU monitor
This introduces a new event type
VIR_DOMAIN_EVENT_ID_RTC_CHANGE
This event includes the new UTC offset measured in seconds.
Thus there is a new callback definition for this event type
typedef void (*virConnectDomainEventRTCChangeCallback)(virConnectPtr conn,
virDomainPtr dom,
long long utcoffset,
void *opaque);
If the guest XML configuration for the <clock> is set to
offset='variable', then the XML will automatically be
updated with the new UTC offset value. This ensures that
during migration/save/restore the new offset is preserved.
* daemon/remote.c: Dispatch RTC change events to client
* examples/domain-events/events-c/event-test.c: Watch for
RTC change events
* include/libvirt/libvirt.h.in: Define new RTC change event ID
and callback signature
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle RTC change events
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for RTC changes and emit a libvirt RTC change event
* src/remote/remote_driver.c: Receive and dispatch RTC change
events to application
* src/remote/remote_protocol.x: Wire protocol definition for
RTC change events
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c: Watch for RTC_CHANGE event
from QEMU monitor
The reboot event is not a normal lifecycle event, since the
virtual machine on the host does not change state. Rather the
guest OS is resetting the virtual CPUs. ie, the QEMU process
does not restart. Thus, this does not belong in the current
lifecycle events callback.
This introduces a new event type
VIR_DOMAIN_EVENT_ID_REBOOT
It takes no parameters, besides the virDomainPtr, so it can
use the generic callback signature.
* daemon/remote.c: Dispatch reboot events to client
* examples/domain-events/events-c/event-test.c: Watch for
reboot events
* include/libvirt/libvirt.h.in: Define new reboot event ID
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle reboot events
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for reboots and emit a libvirt reboot event
* src/remote/remote_driver.c: Receive and dispatch reboot
events to application
* src/remote/remote_protocol.x: Wire protocol definition for
reboot events
The libvirtd daemon impl will need to switch over to using the
new event APIs. To make this simpler, ensure all drivers currently
providing events support both the new APIs and old APIs.
* src/lxc/lxc_driver.c, src/qemu/qemu_driver.c, src/test/test_driver.c,
src/vbox/vbox_tmpl.c, src/xen/xen_driver.c: Implement the new
virConnectDomainEvent(Dereg|Reg)isterAny driver entry points
The internal domain events APIs are designed to handle the lifecycle
events. This needs to be refactored to allow arbitrary new event
types to be handled.
* The signature of virDomainEventDispatchFunc changes to use
virConnectDomainEventGenericCallback instead of the lifecycle
event specific virConnectDomainEventCallback
* Every registered callback gains a unique ID to allow its
removal based on ID, instead of function pointer
* Every registered callback gains an 'eventID' to allow callbacks
for different types of events to be distinguished
* virDomainEventDispatch is adapted to filter out callbacks
whose eventID does not match the eventID of the event being
dispatched
* virDomainEventDispatch is adapted to filter based on the
domain name and uuid, if this filter is set for a callback.
* virDomainEvent type/detail fields are moved into a union to
allow different data fields for other types of events to be
added later
* src/conf/domain_event.h, src/conf/domain_event.c: Refactor
to allow handling of different types of events
* src/lxc/lxc_driver.c, src/qemu/qemu_driver.c,
src/remote/remote_driver.c, src/test/test_driver.c,
src/xen/xen_driver.c: Change dispatch function signature
to use virConnectDomainEventGenericCallback
The current API for domain events has a number of problems
- Only allows for domain lifecycle change events
- Does not allow the same callback to be registered multiple times
- Does not allow filtering of events to a specific domain
This introduces a new more general purpose domain events API
typedef enum {
VIR_DOMAIN_EVENT_ID_LIFECYCLE = 0, /* virConnectDomainEventCallback */
...more events later..
}
int virConnectDomainEventRegisterAny(virConnectPtr conn,
virDomainPtr dom, /* Optional, to filter */
int eventID,
virConnectDomainEventGenericCallback cb,
void *opaque,
virFreeCallback freecb);
int virConnectDomainEventDeregisterAny(virConnectPtr conn,
int callbackID);
Since different event types can received different data in the callback,
the API is defined with a generic callback. Specific events will each
have a custom signature for their callback. Thus when registering an
event it is neccessary to cast the callback to the generic signature
eg
int myDomainEventCallback(virConnectPtr conn,
virDomainPtr dom,
int event,
int detail,
void *opaque)
{
...
}
virConnectDomainEventRegisterAny(conn, NULL,
VIR_DOMAIN_EVENT_ID_LIFECYCLE,
VIR_DOMAIN_EVENT_CALLBACK(myDomainEventCallback)
NULL, NULL);
The VIR_DOMAIN_EVENT_CALLBACK() macro simply does a "bad" cast
to the generic signature
* include/libvirt/libvirt.h.in: Define new APIs for registering
domain events
* src/driver.h: Internal driver entry points for new events APIs
* src/libvirt.c: Wire up public API to driver API for events APIs
* src/libvirt_public.syms: Export new APIs
* src/esx/esx_driver.c, src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/phyp/phyp_driver.c,
src/qemu/qemu_driver.c, src/remote/remote_driver.c,
src/test/test_driver.c, src/uml/uml_driver.c,
src/vbox/vbox_tmpl.c, src/xen/xen_driver.c,
src/xenapi/xenapi_driver.c: Stub out new API entries
Before, this function would blindly accept an invalid def->dst
and then abuse the idx=-1 it would get from virDiskNameToIndex,
when passing it invalid strings like "xvda:disk" and "sda1".
Now, this function returns -1 upon failure.
* src/conf/domain_conf.c (virDomainDiskDefAssignAddress): as above.
Update callers.
* src/conf/domain_conf.h: Update prototype.
* src/qemu/qemu_conf.c: Update callers.
"virsh dominfo <vm>" crashes if there's no primary security driver set
since we only intialize the secmodel.model and secmodel.doi if we have
one. Attached patch checks for securityPrimaryDriver instead of
securityDriver since the later is always set in qemudSecurityInit().
Closes: http://bugs.debian.org/574359
Attempt to turn on vhost-net mode for devices of type NETWORK, BRIDGE,
and DIRECT (macvtap).
* src/qemu/qemu_conf.h: add vhostfd to qemuBuildHostNetStr prototype
add qemudOpenVhostNet prototype new flag to set when :,vhost=" found in
qemu help
* src/qemu/qemu_conf.c: * set QEMUD_CMD_FLAG_VNET_HOST is ",vhost=" found
in qemu help
- qemudOpenVhostNet - opens /dev/vhost-net to pass to qemu if everything
is in place to use it.
- qemuBuildHostNetStr - add vhostfd to commandline if it's not empty
(higher levels decide whether or not to fill it in)
- qemudBuildCommandLine - if /dev/vhost-net is successfully opened, add
its fd to tapfds array so it isn't closed on qemu exec, and populate
vhostfd_name to be passed in to commandline builder.
* src/qemu/qemu_driver.c: add filler 0 for new arg to qemuBuildHostNetStr,
along with a note that this must be implemented in order for hot-plug of
vhost-net virtio devices to work properly (once qemu "netdev_add" monitor
command is implemented).
Currently no command can be sent to a qemu process while another job is
active. This patch adds support for signaling long-running jobs (such as
migration) so that other threads may request predefined operations to be
done during such jobs. Two signals are defined so far:
- QEMU_JOB_SIGNAL_CANCEL
- QEMU_JOB_SIGNAL_SUSPEND
The first one is used by qemuDomainAbortJob.
The second one is used by qemudDomainSuspend for suspending a domain
during migration, which allows for changing live migration into offline
migration. However, there is a small issue in the way qemudDomainSuspend
is currently implemented for migrating domains. The API calls returns
immediately after signaling migration job which means it is asynchronous
in this specific case.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
* src/qemu/qemu_driver.c (qemudDomainAttachSCSIDisk): The ".controller"
member is an index, and *may* be 0. As such, the commit that we're
reverting broke SCSI disk hot-plug on controller 0.
Reported by Wolfgang Mauerer.
We need to call PrepareHostdevs to determine the USB device path before
any security calls. PrepareHostUSBDevices was also incorrectly skipping
all USB devices.
* src/qemu/qemu_conf.c: add the ",readonly=on" for read-only disks
and also parse it back in qemuParseCommandLineDisk()
* tests/qemuxml2argvtest.c
tests/qemuxml2argvdata/qemuxml2argv-disk-drive-readonly-disk.args
tests/qemuxml2argvdata/qemuxml2argv-disk-drive-readonly-disk.xml:
add a specific regression test
Currently if you dump the core of a qemu guest with
qemudDomainCoreDump, subsequent commands will hang
up libvirtd. This is because qemudDomainCoreDump
uses qemuDomainWaitForMigrationComplete, which expects
the qemuDriverLock to be held when it's called. This
patch does the simple thing and moves the qemuDriveUnlock
to the end of the qemudDomainCoreDump so that the driver
lock is held for the entirety of the call (as it is done
in qemudDomainSave). We will probably want to make the
lock more fine-grained than that in the future, but
we can fix both qemudDomainCoreDump and qemudDomainSave
at the same time.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The code to add job support into libvirtd caused a problem
in qemudDomainSetVcpus. In particular, a qemuDomainObjEndJob()
call was added at the end of the function, but a
corresponding qemuDomainObjBeginJob() was not. Additionally,
a call to qemuDomainObj{Enter,Exit}Monitor() was also missed
in qemudDomainHotplugVcpus(). These missing calls conspired to
cause a hang in the libvirtd process after the command was
finished. Fix this by adding the missing calls.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
As previously discussed[1], this patch removes the
qemudDomainSetMaxMemory() function, since it doesn't
work. This means that instead of getting somewhat
cryptic errors, you will now get:
error: Unable to change MaxMemorySize
error: this function is not supported by the hypervisor: virDomainSetMaxMemory
Which describes the situation perfectly.
[1] https://www.redhat.com/archives/libvir-list/2010-February/msg00928.html
Signed-off-by: Chris Lalancette <clalance@redhat.com>
When using the JSON monitor, qemuMonitorJSONExtractCPUInfo
was returning 0 on success. Unfortunately, higher levels of
the cpuinfo code expect that it returns the number of CPUs
it found on success. This one-line patch fixes it so that
it returns the correct number. This makes "virsh vcpuinfo <domain>"
work when using the JSON monitor.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
A few more non-literal format strings in error log messages have crept
in. Fix them in the standard way - turn the format string into "%s"
with the original string as the arg.
When adding domainMemoryStats API support for the qemu driver, I didn't
follow the locking rules exactly. The job condition must be held when
executing monitor commands. This corrects the segfaults I was seeing
when calling domainMemoryStats in a multi-threaded environment.
* src/qemu/qemu_driver.c: in qemudDomainMemoryStats() add missing
calls to qemuDomainObjBeginJob/qemuDomainObjEndJob
doTunnelSendAll function (used by QEMU migration) uses a 64k buffer on
the stack, which could be problematic. This patch replaces that with a
buffer from the heap.
While in the neighborhood, this patch also improves error reporting in
the case that saferead fails - previously, virStreamAbort() was called
(resetting errno) before reporting the error. It's been changed to
report the error first.
* src/qemu/qemu_driver.c: fix doTunnelSendAll() to use a malloc'ed
buffer
* src/qemu/qemu_driver.c (qemudDomainAttachSCSIDisk): Handle
the (theoretical) case of an empty controller list, so that
clang does not think the subsequent dereference of "cont"
would dereference an undefined variable (due to preceding
loop not iterating even once).
* src/qemu/qemu_driver.c (qemudDomainRestore): A corrupt save file
(in particular, a too-large header.xml_len value) would cause an
unwarranted out-of-memory error. Do not trust the just-read
header.xml_len. Instead, merely use that as a hint, and
read/allocate up to that number of bytes from the file.
Also verify that header.xml_len is positive; if it were negative,
passing it to virFileReadLimFD could cause trouble.
The code erroneously searched the entire "reply" for a comma, when
its intent was to search only that portion after "balloon: actual="
* src/qemu/qemu_monitor_text.c (qemuMonitorTextGetMemoryStats):
Search for "," only starting *after* the BALLOON_PREFIX string.
Otherwise, we'd be more prone to false positives.
Changeset
commit 5073aa994a
Author: Cole Robinson <crobinso@redhat.com>
Date: Mon Jan 11 11:40:46 2010 -0500
Added support for product/vendor based passthrough, but it only
worked at the security driver layer. The main guest XML config
was not updated with the resolved bus/device ID. When the QEMU
argv refactoring removed use of product/vendor, this then broke
launching guests.
THe solution is to move the product/vendor resolution up a layer
into the QEMU driver. So the first thing QEMU does is resolve
the product/vendor to a bus/device and updates the XML config
with this info. The rest of the code, including security drivers
and QEMU argv generated can now rely on bus/device always being
set.
* src/util/hostusb.c, src/util/hostusb.h: Split vendor/product
resolution code out of usbGetDevice and into usbFindDevice.
Add accessors for bus/device ID
* src/security/virt-aa-helper.c, src/security/security_selinux.c,
src/qemu/qemu_security_dac.c: Remove vendor/product from the
usbGetDevice() calls
* src/qemu/qemu_driver.c: Use usbFindDevice to resolve vendor/product
into a bus/device ID
The pci_del command is not being ported to QMP. Convert all the
QEMU hotplug code over to use device_del whenever it is available
to avoid the pci_del problem
* src/qemu/qemu_driver.c: Convert unplug code to device_del
Previously hot-unplug could not be supported for USB devices
in QEMU, since usb_del required the guest visible address
which libvirt never knows. With 'device_del' command we can
now unplug based on device alias, so support that.
* src/qemu/qemu_driver.c: Use device_del to remove USB devices
QEMU has a monitor command 'set_cpu' which allows a specific
CPU to be toggled between online& offline state. libvirt CPU
hotplug does not work in terms of individual indexes CPUs.
Thus to support this, we iteratively toggle the online state
when the total number of vCPUs is adjusted via libvirt
NB, currently untested since QEMU segvs when running this!
* src/qemu/qemu_driver.c: Toggle online state for CPUs when
doing hotplug
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
monitor API for toggling a CPU's online status via 'set_cpu
The QMP code was running query-migration instead of query-migrate.
This doesn't work so well
* src/qemu/qemu_monitor_json.c: s/query-migration/query-migrate/
The code to remove the cgroup after QEMU failed to startup could
be obscuring a real error from earlier on. It is not neccessary
to raise an error in this case, so tell cgroups to keep quiet
* src/qemu/qemu_driver.c: Don't raise cgroups error in QEMU start
cleanup code.
The QEMU hotunplug code for PCI devices was looking at host
devices in the guest config without first filtering non
PCI devices. This means it was reading garbage
* src/qemu/qemu_driver.c: Filter out non-PCI devices
Commit 3c12a67b76 added
a dependency on the NFS_SUPER_MAGIC macro, which is
defined in linux/magic.h. Unfortunately linux/magic.h
is not available in RHEL-5, and causes a compile error.
Just define it locally, since this is something that
can't change.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Move *all* file operations related to creation and writing of libvirt
header to the domain save file into a hook function that is called by
virFileOperation. First try to call virFileOperation as root. If that
fails with EACCESS, and (in the case of Linux) statfs says that we're
trying to save the file on an NFS share, rerun virFileOperation,
telling it to fork a child process and setuid to the qemu user. This
is the only way we can successfully create a file on a root-squashed
NFS server.
This patch (along with setting dynamic_ownership=0 in qemu.conf)
makes qemudDomainSave work on root-squashed NFS.
* src/qemu/qemu_driver.c: provide new qemudDomainSaveFileOpHook()
utility, use it in qemudDomainSave() if normal creation of the
file as root failed, and after checking the filesystem type for
the storage is NFS. In that case we also bypass the security
driver, as this would fail on NFS.
If qemudDomainRestore fails to open the domain save file, create a
pipe, then fork a process that does setuid(qemu_user) and opens the
file, then reads this file and stuffs it into the pipe. the parent
libvirtd process will use the other end of the pipe as its fd, then
reap the child process after it's done reading.
This makes domain restore work on a root-squash NFS share that is only
visible to the qemu user.
* src/qemu/qemu_driver.c: add new qemudOpenAsUID() helper function,
and use it in qemudDomainRestore() if reading as root directly failed.
The USB/PCI device hotplug code for the QEMU driver was forgetting
to allocate a unique device alias.
* src/qemu/qemu_driver.c: Fill in device alias for USB/PCI devices
The code assumed that 'device_add' returned an empty string upon
success. This is not true, it sometimes prints random debug info.
THus we need to check for an explicit fail string
* src/qemu/qemu_monitor_text.c: Fix error checking of the device_add
monitor command
When a VM save attempt failed, the VM would be left in a paused
state. It is neccessary to resume CPU execution upon failure
if it was running originally
* src/qemu/qemu_driver.c: Resume CPUs upon save failure
This supports cancellation of jobs for the QEMU driver against
the virDomainMigrate, virDomainSave and virDomainCoreDump APIs.
It is not yet supported for the virDomainRestore API, although
it is desirable.
* src/qemu/qemu_driver.c: Issue 'migrate_cancel' command if
virDomainAbortJob is issued during a migration operation
* tools/virsh.c: Add a domjobabort command
This provides the internal glue for the driver API
* src/driver.h: Internal API contract
* src/libvirt.c, src/libvirt_public.syms: Connect public API
to driver API
* src/esx/esx_driver.c, src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/phyp/phyp_driver.c,
src/qemu/qemu_driver.c, src/remote/remote_driver.c,
src/test/test_driver.c src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c: Stub out entry points
Introduce support for virDomainGetJobInfo in the QEMU driver. This
allows for monitoring of any API that uses the 'info migrate' monitor
command. ie virDomainMigrate, virDomainSave and virDomainCoreDump
Unfortunately QEMU does not provide a way to monitor incoming migration
so we can't wire up virDomainRestore yet.
The virsh tool gets a new command 'domjobinfo' to query status
* src/qemu/qemu_driver.c: Record virDomainJobInfo and start time
in qemuDomainObjPrivatePtr objects. Add generic shared handler
for calling 'info migrate' with all migration based APIs.
* src/qemu/qemu_monitor_text.c: Fix parsing of 'info migration' reply
* tools/virsh.c: add new 'domjobinfo' command to query progress
The internal glue layer for the new pubic API
* src/driver.h: Define internal driver API contract
* src/libvirt.c, src/libvirt_public.syms: Wire up public
API to internal driver API
* src/esx/esx_driver.c, src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/phyp/phyp_driver.c,
src/qemu/qemu_driver.c, src/remote/remote_driver.c,
src/test/test_driver.c, src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c: Stub new entry point
A number of the error messages raised when parsing USB devices
refered to PCI devices by mistake
* src/qemu/qemu_conf.c: s/PCI/USB/ in qemuParseCommandLineUSB()
when the underlying qemu supports the drive/device model and the
controller has been added this way.
* src/qemu/qemu_driver.c: use qemuMonitorDelDevice() when detaching
PCI controller and if supported
* src/qemu/qemu_monitor.[ch]: add new qemuMonitorDelDevice() function
* src/qemu/qemu_monitor_json.[ch]: JSON backend for DelDevice command
* src/qemu/qemu_monitor_text.[ch]: Text backend for DelDevice command
* src/qemu/qemu_driver.c: in qemudDomainDetachPciControllerDevice()
when a controller is not present in the system anymore, the PCI
address must be deleted from libvirt's hashtable because it can
be re-used for other purposes.
* src/qemu/qemu_driver.c: in qemudDomainAttachDevice(), one must not
delete the data part when the operation succeeds because it is
required later on. The correct pattern to handlethe parsed
representation of the device information on success
is dev->data.controller = NULL; virDomainDeviceDefFree(dev);,
which leaves the structure pointed at by data in memory.
Allow an arbitrary timezone with QEMU by setting the $TZ environment
variable when launching QEMU
* src/qemu/qemu_conf.c: Set TZ environment variable if a timezone
is requested
* tests/qemuxml2argvtest.c: Add test case for timezones
* tests/qemuxml2argvdata/qemuxml2argv-clock-france.xml,
tests/qemuxml2argvdata/qemuxml2argv-clock-france.args: Data
for timezone tests
This allows QEMU guests to be started with an arbitrary clock
offset
The test case can't actually be enabled, since QEMU argv expects
an absolute timestring, and this will obviously change every
time the test runs :-( Hopefully QEMU will allow a relative
time offset in the future.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Use the -rtc arg
if available to support variable clock offset mode
* tests/qemuhelptest.c: Add QEMUD_CMD_FLAG_RTC for qemu 0.12.1
* qemuxml2argvdata/qemuxml2argv-clock-variable.args,
qemuxml2argvdata/qemuxml2argv-clock-variable.xml,
qemuxml2argvtest.c: Test case, except we can't actually enable
it yet.
The XML will soon be extended to allow more than just a simple
localtime/utc boolean flag. This change replaces the plain
'int localtime' with a separate struct to prepare for future
extension
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add a new
virDomainClockDef structure
* src/libvirt_private.syms: Export virDomainClockOffsetTypeToString
and virDomainClockOffsetTypeFromString
* src/qemu/qemu_conf.c, src/vbox/vbox_tmpl.c, src/xen/xend_internal.c,
src/xen/xm_internal.c: Updated to use new structure for localtime
If the hostname as returned by "gethostname" resolves
to "localhost" (as it does with the broken Fedora-12
installer), then live migration will fail because the
source will try to migrate to itself. Detect this
situation up-front and abort the live migration before
we do any real work.
* src/util/util.h src/util/util.c: add a new virGetHostnameLocalhost
with an optional localhost check, and rewire virGetHostname() to use
it
* src/libvirt_private.syms: expose the new function
* src/qemu/qemu_driver.c: use it in qemudDomainMigratePrepare2()
This patch sets or unsets the IFF_VNET_HDR flag depending on what device
is used in the VM. The manipulation of the flag is done in the open
function and is only fatal if the IFF_VNET_HDR flag could not be cleared
although it has to be (or if an ioctl generally fails). In that case the
macvtap tap is closed again and the macvtap interface torn.
* src/qemu/qemu_conf.c src/qemu/qemu_conf.h: pass qemuCmdFlags to
qemudPhysIfaceConnect()
* src/util/macvtap.c src/util/macvtap.h: add vnet_hdr boolean to
openMacvtapTap(), and private function configMacvtapTap()
* src/qemu/qemu_driver.c: add extra qemuCmdFlags when calling
qemudPhysIfaceConnect()
Support virtio-serial controller and virtio channel in QEMU backend.
Will output
the following for virtio-serial controller:
-device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4,max_ports=16,vectors=4
and the following for a virtio channel:
-chardev pty,id=channel0 \
-device
virtserialport,bus=virtio-serial0.0,chardev=channel0,name=org.linux-kvm.port.0
* src/qemu/qemu_conf.c: Add argument output for virtio
* tests/qemuxml2argvtest.c
tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.args: Add test for
QEMU command line generation
Add support for virtio-serial by defining a new 'virtio' channel target type
and a virtio-serial controller. Allows the following to be specified in a
domain:
<controller type='virtio-serial' index='0' ports='16' vectors='4'/>
<channel type='pty'>
<target type='virtio' name='org.linux-kvm.port.0'/>
<address type='virtio-serial' controller='0' bus='0'/>
</channel>
* docs/schemas/domain.rng: Add virtio-serial controller and virtio
channel type.
* src/conf/domain_conf.[ch]: Domain parsing/serialization for
virtio-serial controller and virtio channel.
* tests/qemuxml2xmltest.c
tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.xml: add domain xml
parsing test
* src/libvirt_private.syms src/qemu/qemu_conf.c:
virDomainDefAddDiskControllers() renamed to
virDomainDefAddImplicitControllers()
Currently we just error with ex. 'virbr0: No such device'.
Since we are using public API calls here, we need to ensure that any
raised error is properly saved and restored, since API entry points
always reset messages.
Rework and simplification of teardown of the macvtap device.
Basically all devices with the same MAC address and link device are kept
alive and not attempted to be torn down. If a macvtap device linked to a
physical interface with a certain MAC address 'M' is to be created it
will automatically fail if the interface is 'up'ed and another macvtap
with the same properties (MAC addr 'M', link dev) happens to be 'up'.
This will prevent the VM from starting or the device from being attached
to a running VM. Stale interfaces are assumed to be there for some
reason and not stem from libvirt.
In the VM shutdown path, it's assuming that an interface name is always
available so that if the device type is DIRECT it can be torn down
using its name.
* src/util/macvtap.h src/libvirt_macvtap.syms: change of deleting routine
* src/util/macvtap.c: cleanups and change of deleting routine
* src/qemu/qemu_driver.c: change cleanup on shutdown
* src/qemu/qemu_conf.c: don't delete Macvtap in qemudPhysIfaceConnect()
The QEMU JSON monitor changed balloon commands to return/accept
bytes instead of kilobytes. Update libvirt to cope with this
* src/qemu/qemu_monitor_json.c: Expect/use bytes for ballooning
Similar to the Set*Mem commands, this implementation was bogus and
misleading. Make it clear this is a hotplug only operation, and that the
hotplug piece isn't even implemented.
Also drop the overkill maxvcpus validation: we don't perform this check
at XML define time so clearly no one is missing it, and there is
always the risk that our info will be out of date, possibly preventing
legitimate CPU values.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
SetMem and SetMaxMem are hotplug only APIs, any persistent config
changes are supposed to go via XML definition. The original implementation
of these calls were incorrect and had the nasty side effect of making
a psuedo persistent change that would be lost after libvirtd restart
(I didn't know any better).
Fix these APIs to rightly reject non running domains.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
The plain QEMU tree does not include 'thread_id' in the JSON
output. Thus we need to treat it as non-fatal if missing.
* src/qemu/qemu_monitor_json.c: Treat missing thread_id as non-fatal
A typo in the check for the primary IDE controller could cause
a crash on restore depending on the exact guest config.
* src/qemu/qemu_conf.c: Fix s/video/controller/ typo & slot
number typo
Current error reporting for JSON mode returns the full JSON
command string and full JSON error string. This is not very
user friendly, so this change makes the error report only
contain the basic command name, and friendly error message
description string. The full JSON data is logged instead.
* src/qemu/qemu_monitor_json.c: Always return the 'desc' field from
the JSON error message to users.
When in JSON mode, QEMU requires that 'qmp_capabilities' is run as
the first command in the monitor. This is a no-op when run in the
text mode monitor
* src/qemu/qemu_driver.c: Run capabilities negotiation when
connecting to the monitor
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h: Add
support for the 'qmp_capabilities' command, no-op in text mode.
This part adds support for qemu making a macvtap tap device available
via file descriptor passed to qemu command line. This also attempts to
tear down the macvtap device when a VM terminates. This includes support
for attachment and detachment to/from running VM.
* src/qemu/qemu_conf.[ch] src/qemu/qemu_driver.c: add support in the
QEmu driver
Current PCI addresses are allocated at time of VM startup.
To make them truely persistent, it is neccessary to do this
at time of virDomainDefine/virDomainCreate. The code in
qemuStartVMDaemon still remains in order to cope with upgrades
from older libvirt releases
* src/qemu/qemu_driver.c: Rename existing qemuAssignPCIAddresses
to qemuDetectPCIAddresses. Add new qemuAssignPCIAddresses which
does auto-allocation upfront. Call qemuAssignPCIAddresses from
qemuDomainDefine and qemuDomainCreate to assign PCI addresses that
can then be persisted. Don't clear PCI addresses at shutdown if
they are intended to be persistent
The old text mode monitor prompts for a password when disks are
encrypted. This interactive approach doesn't work for JSON mode
monitor. Thus there is a new 'block_passwd' command that can be
used.
* src/qemu/qemu_driver.c: Split out code for looking up a disk
secret from findVolumeQcowPassphrase, into a new method
getVolumeQcowPassphrase. Enhance qemuInitPasswords() to also
set the disk encryption password via the monitor
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
support for the 'block_passwd' monitor command.
Since c26cb9234f, the dname
parameter has been ignored by these two functions. Use it.
* src/qemu/qemu_driver.c (qemudDomainMigratePrepareTunnel): Honor dname
parameter once again.
(qemudDomainMigratePrepare2): Likewise.
All other libvirt functions use array first and then number of elements
in that array. Let's make cpuDecode follow this rule.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
With QEMU >= 0.12 the host and guest side of disks no longer have
the same naming convention. Specifically the host side will now
get a 'drive-' prefix added to its name. The 'info blockstats'
monitor command returns the host side name, so it is neccessary
to strip this off when looking up stats since libvirt stores the
guest side name !
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Move 'drive-' prefix
string to a defined constant
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_text.c: Strip
off 'drive-' prefix (if found) when looking up disk stats
Currently the timeout for reading startup output is 3 seconds. If the
host is under any sort of load, we can easily trigger this. Lets bump
it to 30 seconds.
Since the polling loop checks to see if the process has died, we shouldn't
erroneously hit this timeout if qemu bombs (only if it is stuck in some
infinite loop).
Use the ATTRIBUTE_NONNULL annotation to mark some virConnectPtr
args as mandatory non-null so the compiler can warn of mistakes
* src/conf/domain_event.h: All virConnectPtr args must be non-null
* src/qemu/qemu_conf.h: qemudBuildCommandLine and
qemudNetworkIfaceConnect() must be given non-null connection
* tests/qemuxml2argvtest.c: Provide a non-null (dummy) connection to
qemudBuildCommandLine()
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in cpu_conf.{h,c} and update all callers to
match
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in node_device_conf.{h,c} and update all callers to
match
All callers now pass a NULL virConnectPtr into the USB/PCi device
iterator functions. Therefore the virConnectPtr arg can now be
removed from these functions
* src/util/hostusb.h, src/util/hostusb.c: Remove virConnectPtr
from usbDeviceFileIterate
* src/util/pci.c, src/util/pci.h: Remove virConnectPtr arg from
pciDeviceFileIterate
* src/qemu/qemu_security_dac.c, src/security/security_selinux.c: Update
to drop redundant virConnectPtr arg
The QEMU flags are commonly stored as a signed or unsigned int,
allowing only 31 flags. This limit is rather close, so to aid
future patches, change it to a 64-bit int
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h, src/qemu/qemu_driver.c,
tests/qemuargv2xmltest.c, tests/qemuhelptest.c, tests/qemuxml2argvtest.c:
Use 'unsigned long long' for QEMU flags
The virConnectPtr is no longer required for error reporting since
that is recorded in a thread local. Remove use of virConnectPtr
from all APIs in security_driver.{h,c} and update all callers to
match
The security driver was mistakenly initialized before the QEMU
config file was loaded. This prevents it being turned off again.
The capabilities XML was also getting the wrong security driver
name, due to the stacked driver arrangement.
* src/qemu/qemu_driver.c: Fix initialization order and capabilities
model name
When restoring from a saved guest image, the XML would already
contain the PCI slot ID of the IDE controller & video card.
The attempt to explicitly reserve this upfront would thus fail
everytime.
* src/qemu/qemu_conf.c: Reserve IDE controller / video card
slot at time of need, rather than upfront
If the primary security driver (SELinux/AppArmour) was disabled
then the secondary QEMU DAC security driver was also disabled.
This is mistaken, because the latter must be active at all times
* src/qemu/qemu_driver.c: Ensure DAC driver is always active
To allow devices to be hot(un-)plugged it is neccessary to ensure
they all have a unique device aliases. This fixes the hotplug
methods to assign device aliases before invoking the monitor
commands which need them
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Expose methods
for assigning device aliases for disks, host devices and
controllers
* src/qemu/qemu_driver.c: Assign device aliases when hotplugging
all types of device
* tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address-device.args,
tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device.args:
Update for changed hostdev naming scheme
This patch re-arranges the QEMU device alias assignment code to
make it easier to call into the same codeblock when performing
device hotplug. The new code has the ability to skip over already
assigned names to facilitate hotplug
* src/qemu/qemu_driver.c: Call qemuAssignDeviceNetAlias()
instead of qemuAssignNetNames
* src/qemu/qemu_conf.h: Export qemuAssignDeviceNetAlias()
instead of qemuAssignNetNames
* src/qemu/qemu_driver.c: Merge the legacy disk/network alias
assignment code into the main methods
The current way of assigning names to the host network backend and
NIC device in QEMU was over complicated, by varying naming scheme
based on the NIC model and backend type. This simplifies the naming
to simply be 'net0' and 'hostnet0', allowing code to easily determine
the host network name and vlan based off the primary device alias
name 'net0'. This in turn allows removal of alot of QEMU specific
code from the XML parser, and makes it easier to assign new unique
names for NICs that are hotplugged
* src/conf/domain_conf.c, src/conf/domain_conf.h: Remove hostnet_name
and vlan fields from virNetworkDefPtr
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h, src/qemu/qemu_driver.c:
Use a single network alias naming scheme regardless of NIC type
or backend type. Determine VLANs from the alias name.
* tests/qemuxml2argvdata/qemuxml2argv-net-eth-names.args,
tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.args,
tests/qemuxml2argvdata/qemuxml2argv-net-virtio-netdev.args: Update
for new simpler naming scheme
The QEMU 0.12.x tree has the -netdev command line argument, but not
corresponding monitor command. We can't enable the former, without
the latter since it will break hotplug/unplug.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Disable -netdev usage
until 0.13 at earliest
* tests/qemuxml2argvtest.c: Add test for -netdev syntax
* tests/qemuxml2argvdata/qemuxml2argv-net-virtio-netdev.args,
tests/qemuxml2argvdata/qemuxml2argv-net-virtio-netdev.xml: Test
data files for -netdev syntax
PCI disk, disk controllers, net devices and host devices need to
have PCI addresses assigned before they are hot-plugged
* src/qemu/qemu_conf.c: Add APIs for ensuring a device has an
address and releasing unused addresses
* src/qemu/qemu_driver.c: Ensure all devices have addresses
when hotplugging.
The current QEMU code allocates PCI addresses incrementally starting
at 4. This is not satisfactory because the user may have given some
addresses in their XML config, which need to be skipped over when
allocating addresses to remaining devices.
It is thus neccessary to maintain a list of already allocated PCI
addresses and then only allocate ones that remain unused. This is
also required for domain device hotplug to work properly later.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Add APIs for creating
list of existing PCI addresses, and allocating new addresses.
Refactor address assignment to use this code
* src/qemu/qemu_driver.c: Pull PCI address assignment up into the
qemuStartVMDaemon() method, as a prelude to moving it into the
'define' method. Update list of allocated addresses when connecting
to a running VM at daemon startup.
* tests/qemuxml2argvtest.c, tests/qemuargv2xmltest.c,
tests/qemuxml2xmltest.c: Remove USB product test since all
passthrough is done based on address
* tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-product.args,
tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-product.xml: Kil
unused data files
Since QEMU startup uses the new -device argument, the hotplug
code needs todo the same. This converts disk, network and
host device hotplug to use the device_add command
* src/qemu/qemu_driver.c: Use new device_add monitor APIs
whereever possible
The way QEMU is started has been changed to use '-device' and
the new style '-drive' syntax. This needs to be mirrored in
the hotplug code, requiring addition of two new APIs.
* src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Define APIs
qemuMonitorAddDevice() and qemuMonitorAddDrive()
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h:
Implement the new monitor APIs
To allow for better code reuse from hotplug methods, the code for
generating PCI/USB hostdev arg values is split out into separate
methods
* qemu/qemu_conf.h, qemu/qemu_conf.c: Introduce new APis for
qemuBuildPCIHostdevPCIDevStr, qemuBuildUSBHostdevUsbDevStr
and qemuBuildUSBHostdevDevStr
All the helper functions for building command line arguments
now return a 'char *', instead of acepting a 'char **' or
virBufferPtr argument
* qemu/qemu_conf.c: Standardize syntax for building args
* qemu/qemu_conf.h: Export all functions for building args
* qemu/qemu_driver.c: Update for changed syntax for building
NIC/hostnet args
Similar to the race fixed by
be34c3c7ef, make sure
to wait around for KVM to release the resources from
a hot-detached PCI device before attempting to
rebind that device to the host driver.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The QEMU driver contained code to generate a -device string for piix4-ide, but
wasn't using it. This change removes this string generation. It also adds a
comment explaining why IDE and FDC controllers don't generate -device strings.
The change also generates an error if a sata controller is specified for a QEMU
domain, as this isn't supported.
* src/qemu/qemu_conf.c: Remove VIR_DOMAIN_CONTROLLER_TYPE_IDE handler in
qemuBuildControllerDevStr(). Ignore IDE and FDC controllers. Error if
SATA controller is discovered. Add comments.
On RHEL-5 the qemu-kvm binary is located in /usr/libexec.
To reduce confusion for people trying to run upstream libvirt
on RHEL-5 machines, make the qemu driver look in /usr/libexec
for the qemu-kvm binary.
To make this work, I modified virFindFileInPath to handle an
absolute path correctly. I also ran into an issue where
NULL was sometimes being passed for the file parameter
to virFindFileInPath; it didn't crash prior to this patch
since it was building paths like /usr/bin/(null). This
is non-standard behavior, though, so I added a NULL
check at the beginning.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
If you shutdown libvirtd while a domain with PCI
devices is running, then try to restart libvirtd,
libvirtd will crash.
This happens because qemuUpdateActivePciHostdevs() is calling
pciDeviceListSteal() with a dev of 0x0 (NULL), and then trying
to dereference it. This patch fixes it up so that
qemuUpdateActivePciHostdevs() steals the devices after first
Get()'ting them, avoiding the crash.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
* src/qemu/qemu_monitor_text.c (qemuMonitorTextAttachDrive): Most other
failures in this function would "goto cleanup", but one mistakenly
returned directly, skipping the cleanup and resulting in a leak.
In addition, iterating the "try_command" loop would clobber, and
thus leak, the "cmd" allocated on the first iteration,
so be careful to free it in addition to "reply" beforehand.
The KVM build of QEMU includs the thread ID of each vCPU in the
'query-cpus' output. This is required for pinning guests to
particular host CPUs
* src/qemu/qemu_monitor_json.c: Extract 'thread_id' from CPU info
* src/util/json.c, src/util/json.h: Declare returned strings
to be const
* src/qemu/qemu_monitor.c: Wire up JSON mode for qemuMonitorGetPtyPaths
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h: Fix
const correctness. Add missing error message in the function
qemuMonitorJSONGetAllPCIAddresses. Add implementation of the
qemuMonitorGetPtyPaths function calling 'query-chardev'.
Certain hypervisors (like qemu/kvm) map the PCI bar(s) on
the host when doing device passthrough. This can lead to a race
condition where the hypervisor is still cleaning up the device while
libvirt is trying to re-attach it to the host device driver. To avoid
this situation, we look through /proc/iomem, and if the hypervisor is
still holding onto the bar (denoted by the string in the matcher variable),
then we can wait around a bit for that to clear up.
v2: Thanks to review by DV, make sure we wait the full timeout per-device
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Fix a small problem with the qemu memory stats parsing algorithm. If qemu
reports a stat that libvirt does not recognize, skip past it so parsing can
continue. This corrects a potential infinite loop in the parsing code that can
only be triggered if new statistics are added to qemu.
* src/qemu/qemu_monitor_text.c: qemuMonitorParseExtraBalloonInfo add a
skip for extra ','
The loop looking for the controller associated with a SCI drive had
an off by one, causing it to miss the last controller.
* src/qemu/qemu_driver.c: Fix off-by-1 in searching for SCSI
drive hotplug
The hotplug code in QEMU was leaking memory because although the
inner device object was being moved into the main virDomainDefPtr
config object, the outer container virDomainDeviceDefPtr was not.
* src/qemu/qemu_driver.c: Clarify code to show that the inner
device object is owned by the main domain config upon
successfull attach.
Add the ability to turn off dynamic management of file permissions
for libvirt guests.
* qemu/libvirtd_qemu.aug: Support 'dynamic_ownership' flag
* qemu/qemu.conf: Document 'dynamic_ownership' flag.
* qemu/qemu_conf.c: Load 'dynamic_ownership' flag
* qemu/test_libvirtd_qemu.aug: Test 'dynamic_ownership' flag
The hotplug code was not correctly invoking the security driver
in error paths. If a hotplug attempt failed, the device would
be left with VM permissions applied, rather than restored to the
original permissions. Also, a CDROM media that is ejected was
not restored to original permissions. Finally there was a bogus
call to set hostdev permissions in the hostdev unplug code
* qemu/qemu_driver.c: Fix security driver usage in hotplug/unplug
If there is a problem with VM startup, PCI devices may be left
assigned to pci-stub / pci-back. Adding a call to reattach
host devices in the cleanup path is required.
* qemu/qemu_driver.c: qemuDomainReAttachHostDevices() when
VM startup fails
Remove all the QEMU driver calls for setting file ownership and
process uid/gid. Instead wire in the QEMU DAC security driver,
stacking it ontop of the primary SELinux/AppArmour driver.
* qemu/qemu_driver.c: Switch over to new DAC security driver
This new security driver is responsible for managing UID/GID changes
to the QEMU process, and any files/disks/devices assigned to it.
* qemu/qemu_conf.h: Add flag for disabling automatic file permission
changes
* qemu/qemu_security_dac.h, qemu/qemu_security_dac.c: New DAC driver
for QEMU guests
* Makefile.am: Add new files
Pulling the disk labelling code out of the exec hook, and into
libvirtd will allow it to access shared state in the daemon. It
will also make debugging & error reporting easier / more reliable.
* qemu/qemu_driver.c: Move initial disk labelling calls up into
libvirtd. Add cleanup of disk labels upon failure
If a VM fails to start, we can't simply free the security label
strings, we must call the domainReleaseSecurityLabel() method
otherwise the reserved 'mcs' level will be leaked in SElinux
* src/qemu/qemu_driver.c: Invoke domainReleaseSecurityLabel()
when domain fails to start
The current security driver architecture has the following
split of logic
* domainGenSecurityLabel
Allocate the unique label for the domain about to be started
* domainGetSecurityLabel
Retrieve the current live security label for a process
* domainSetSecurityLabel
Apply the previously allocated label to the current process
Setup all disk image / device labelling
* domainRestoreSecurityLabel
Restore the original disk image / device labelling.
Release the unique label for the domain
The 'domainSetSecurityLabel' method is special because it runs
in the context of the child process between the fork + exec.
This is require in order to set the process label. It is not
required in order to label disks/devices though. Having the
disk labelling code run in the child process limits what it
can do.
In particularly libvirtd would like to remember the current
disk image label, and only change shared image labels for the
first VM to start. This requires use & update of global state
in the libvirtd daemon, and thus cannot run in the child
process context.
The solution is to split domainSetSecurityLabel into two parts,
one applies process label, and the other handles disk image
labelling. At the same time domainRestoreSecurityLabel is
similarly split, just so that it matches the style. Thus the
previous 4 methods are replaced by the following 6 new methods
* domainGenSecurityLabel
Allocate the unique label for the domain about to be started
No actual change here.
* domainReleaseSecurityLabel
Release the unique label for the domain
* domainGetSecurityProcessLabel
Retrieve the current live security label for a process
Merely renamed for clarity.
* domainSetSecurityProcessLabel
Apply the previously allocated label to the current process
* domainRestoreSecurityAllLabel
Restore the original disk image / device labelling.
* domainSetSecurityAllLabel
Setup all disk image / device labelling
The SELinux and AppArmour drivers are then updated to comply with
this new spec. Notice that the AppArmour driver was actually a
little different. It was creating its profile for the disk image
and device labels in the 'domainGenSecurityLabel' method, where as
the SELinux driver did it in 'domainSetSecurityLabel'. With the
new method split, we can have consistency, with both drivers doing
that in the domainSetSecurityAllLabel method.
NB, the AppArmour changes here haven't been compiled so may not
build.
The QEMU driver is doing 90% of the calls to check for static vs
dynamic labelling. Except it is forgetting todo so in many places,
in particular hotplug is mistakenly assigning disk labels. Move
all this logic into the security drivers themselves, so the HV
drivers don't have to think about it.
* src/security/security_driver.h: Add virDomainObjPtr parameter
to virSecurityDomainRestoreHostdevLabel and to
virSecurityDomainRestoreSavedStateLabel
* src/security/security_selinux.c, src/security/security_apparmor.c:
Add explicit checks for VIR_DOMAIN_SECLABEL_STATIC and skip all
chcon() code in those cases
* src/qemu/qemu_driver.c: Remove all checks for VIR_DOMAIN_SECLABEL_STATIC
or VIR_DOMAIN_SECLABEL_DYNAMIC. Add missing checks for possibly NULL
driver entry points.
* src/lxc/lxc_container.c src/lxc/lxc_controller.c src/lxc/lxc_driver.c
src/network/bridge_driver.c src/qemu/qemu_driver.c
src/uml/uml_driver.c: virFileMakePath returns 0 for success, or the
value of errno on failure, so error checking should be to test
if non-zero, not if lower than 0
The test expected all environment variables copied in qemudBuildCommandLine
to have known values. So all of them have to be either set to a known value
or be unset. SDL_VIDEODRIVER and QEMU_AUDIO_DRV are not handled at all but
should be handled. Unset both, otherwise the test will fail if they are set
in the testing environment.
* src/qemu/qemu_conf.c: add a comment about copied environment variables
and qemuxml2argvtest
* tests/qemuxml2argvtest.c: unset SDL_VIDEODRIVER and QEMU_AUDIO_DRV
Invoking the virConnectGetCapabilities() method causes the QEMU
driver to rebuild its internal capabilities object. Unfortunately
it was forgetting to register the custom domain status XML hooks
again.
To avoid this kind of error in the future, the code which builds
capabilities is refactored into one single method, which can be
called from all locations, ensuring reliable rebuilds.
* src/qemu/qemu_driver.c: Fix rebuilding of capabilities XML and
guarentee it is always consistent
I noticed some debug messages are printed with an empty lines after
them. This patch removes these empty lines from all invocations of the
following macros:
VIR_DEBUG
VIR_DEBUG0
VIR_ERROR
VIR_ERROR0
VIR_INFO
VIR_WARN
VIR_WARN0
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
QEMU's command line equivalent for the following domain XML fragment
<vcpus>2</vcpus>
<cpu ...>
...
<topology sockets='1' cores='2', threads='1'/>
</cpu>
is
-smp 2,sockets=1,cores=2,threads=1
This syntax was introduced in QEMU-0.12.
Version 2 changes:
- -smp argument build split into a separate function
- always add ",sockets=S,cores=C,threads=T" to -smp if qemu supports it
- use qemuParseCommandLineKeywords for command line parsing
Version 3 changes:
- ADD_ARG_LIT => ADD_ARG and line reordering in qemudBuildCommandLine
- rebased
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Current version expects name=value,... list and when an incorrect string
such as "a,b,c=d" would be parsed as "a,b,c" keyword with "d" value
without reporting any error, which is probably not the expected
behavior.
This patch adds an extra argument called allowEmptyValue, which if
non-zero will permit keywords with no value; "a,b=c,,d=" will be parsed
as follows:
keyword value
"a" NULL
"b" "c"
"" NULL
"d" ""
In case allowEmptyValue is zero, the string is required to contain
name=value pairs only; retvalues is guaranteed to contain non-NULL
pointers. Now, "a,b,c=d" will result in an error.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Replace
-balloon virtio
With
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
This allows it to get correct assigned PCI address as declared in
previous patch
* src/qemu/qemu_conf.c: Convert Virtio ballon to -device and
give it an explicit PCI address
* tests/qemuxml2argvdata/qemuxml2argv-*args: Add in virtio balloon
where appropriate
Instead of relying on QEMU to assign PCI addresses and then querying
them with 'info pci', manually assign all PCI addresses before starting
the guest. These addresses are not stable across reboots. That will
come in a later patch
NB, the PIIX3 (IDE, FDC, ISA-Bridge) will always have slot 1 and
VGA will always have slot 2. We declare the Virtio Balloon gets
slot 3, and then all remaining slots are for configured devices.
* src/qemu/qemu_conf.c: If -device is supported, then assign all PCI
addresses when building the command line
* src/qemu/qemu_driver.c: Don't query monitor for PCI addresses if
they have already been assigned
* tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address-device.args,
tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.args,
tests/qemuxml2argvdata/qemuxml2argv-sound-device.args,
tests/qemuxml2argvdata/qemuxml2argv-watchdog-device.args: Update
to include PCI slot/bus information
QEMU always configures a VGA card. If no video card is included in
the libvirt XML, it is neccessary to explicitly turn off the default
using -vga none
* src/qemu/qemu_conf.c: Pass -vga none if no video card is configured
* tests/qemuargv2xmltest.c, tests/qemuxml2argvtest.c: Test for
handling -vga none.
* tests/qemuxml2argvdata/qemuxml2argv-nographics-vga.args,
tests/qemuxml2argvdata/qemuxml2argv-nographics-vga.xml: Test
data files
Not all QEMU builds default to SDL graphics for their display.
Newer QEMU now has an explicit -sdl flag, which we can use to
explicitly request SDL intead of relying on the default. This
protects libvirt against unexpected changes in graphics default
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Probe for -sdl
flag and use it if it is found
* tests/qemuhelptest.c: Add SDL flag to tests
The old syntax was
-chardev SOMECONFIG
-nic user,guestfwd=tcp:IP:PORT-chardev:CHARDEV
The new syntax is
-chardev SOMECONFIG
-netdev user,guestfwd=tcp:IP:PORT,chardev=ID,id=user-ID
The old syntax was
-usbdevice host:PRODUCT:VENDOR
Or
-usbdevice host:BUS.DEV
The new syntax is
-device usb-host,product=PRODUCT,vendor=VENDOR
Or
-device usb-host,hostbus=BUS,hostaddr=DEV
The previous syntax was severely limited in its options
-usbdevice disk:/home/berrange/output.img
The new syntax is the same as for other disk types
-drive file=/home/berrange/output.img,if=none,id=usb-1,index=1
-device usb-storage,drive=usb-1
Again, the index= arg is wrong here, and will be removed in a
later merge
The current syntax uses a pair of args
-net nic,macaddr=52:54:00:56:6c:55,vlan=3,model=pcnet,name=pcnet.0
-net user,vlan=3,name=user.0
The new syntax does not need the vlan craziness anymore, and
so has a simplified pair of args
-netdev user,id=user.0
-device pcnet,netdev=user.0,id=pcnet.0,mac=52:54:00:56:6c:55,addr=<PCI SLOT>
The current preferred syntax for disk drives uses
-drive file=/vms/plain.qcow,if=virtio,index=0,boot=on,format=qcow
The new syntax splits this up into a pair of linked args
-drive file=/vms/plain.qcow,if=none,id=drive-virtio-0,format=qcow2
-device virtio-blk-pci,drive=drive-virtio-0,id=virtio-0,addr=<PCI SLOT>
SCSI/IDE devices also get a bus property linking them to the
controller
-device scsi-disk,drive=drive-scsi0-0-0,id=scsi0-0-0,bus=scsi0.0,scsi-id=0
-device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bus=ide0,unit=0
The current syntax for audio devices is a horrible multiplexed
arg
-soundhw sb16,pcspk,ac97
The new syntax is
-device sb16,id=sound0
or
-device AC97,id=sound1,addr=<PCI SLOT>
NB, pcspk still uses the old -soundhw syntax
The current character device syntax uses either
-serial tty,path=/dev/ttyS2
Or
-chardev tty,id=serial0,path=/dev/ttyS2 -serial chardev:serial0
With the new -device support, we now prefer
-chardev file,id=serial0,path=/tmp/serial.log -device isa-serial,chardev=serial0
This patch changes the existing -chardev syntax to use this new
scheme, and fallbacks to the old plain -serial syntax for old
QEMU.
The monitor device changes to
-chardev socket,id=monitor,path=/tmp/test-monitor,server,nowait -mon chardev=monitor
In addition, this patch adds --nodefaults, which kills off the
default serial, parallel, vga and nic devices. THis avoids the
need for us to explicitly turn each off
When starting a guest, give every device a unique alias. This will
be used for the 'id' parameter in -device args in later patches.
It can also be used to uniquely identify devices in the monitor
For old QEMU without -device, assign disk names based on QEMU's
historical naming scheme.
* src/qemu/qemu_conf.c: Assign unique device aliases
* src/qemu/qemu_driver.c: Remove obsolete qemudDiskDeviceName
and use the device alias in eject & blockstats commands
Probe for the new -device flag and if available set the -nodefaults
flag, instead of using -net none, -serial none or -parallel none.
Other device types will be converted to use -device in later patches.
The -nodefaults flag will help avoid unwelcome surprises from future
QEMU releases
* src/qemu/qemu_conf.c: Probe for -device. Add -nodefaults flag.
Remove -net none, -serial none or -parallel none
* src/qemu/qemu_conf.h: Define QEMU_CMD_FLAG_DEVICE
* tests/qemuhelpdata/qemu-0.12.1: New data file for 0.12.1 QEMU
* tests/qemuhelptest.c: Test feature extraction from 0.12.1 QEMU
This patch introduces the support for giving all devices a short,
unique name, henceforth known as a 'device alias'. These aliases
are not set by the end user, instead being assigned by the hypervisor
if it decides it want to support this concept.
The QEMU driver sets them whenever using the -device arg syntax
and uses them for improved hotplug/hotunplug. it is the intent
that other APIs (block / interface stats & device hotplug) be
able to accept device alias names in the future.
The XML syntax is
<alias name="video0"/>
This may appear in any type of device that supports device info.
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add a 'alias'
field to virDomainDeviceInfo struct & parse/format it in XML
* src/libvirt_private.syms: Export virDomainDefClearDeviceAliases
* src/qemu/qemu_conf.c: Replace use of "nic_name" field with the
standard device alias
* src/qemu/qemu_driver.c: Clear device aliases at shutdown
The PCI device addresses are only valid while the VM is running,
since they are auto-assigned by QEMU. After shutdown they must
all be cleared. Future QEMU driver enhancement will allow for
persistent PCI address assignment
* src/conf/domain_conf.h, src/conf/domain_conf.c, src/libvirt_private.syms
Add virDomainDefClearPCIAddresses() method for wiping out auto assigned
PCI addresses
* src/qemu/qemu_driver.c: Clear PCI addresses at VM shutdown
Existing applications using libvirt are not aware of the disk
controller concept. Thus, after parsing the <disk> definitions
in the XML, it is neccessary to create <controller> elements
to satisfy all requested disks, as per their defined drive
addresses
* src/conf/domain_conf.c, src/conf/domain_conf.h,
src/libvirt_private.syms: Add virDomainDefAddDiskControllers()
method for populating disk controllers, and call it after
parsing disk definitions.
* src/qemu/qemu_conf.c: Call virDomainDefAddDiskControllers()
when doing ARGV -> XML conversion
* tests/qemuxml2argvdata/qemuxml2argv*.xml: Add disk controller
data to all data files which don't have it already
Hotunplug of devices requires that we know their PCI address. Even
hotplug of SCSI drives, required that we know the PCI address of
the SCSI controller to attach the drive to. We can find this out
by running 'info pci' and then correlating the vendor/product IDs
with the devices we booted with.
Although this approach is somewhat fragile, it is the only viable
option with QEMU < 0.12, since there is no way for libvirto set
explicit PCI addresses when creating devices in the first place.
For QEMU > 0.12, this code will not be used.
* src/qemu/qemu_driver.c: Assign all dynamic PCI addresses on
startup of QEMU VM, matching vendor/product IDs
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
API for fetching PCI device address mapping
The current SCSI hotplug support attaches a brand new SCSI controller
for every disk. This is broken because the semantics differ from those
used when starting the VM initially. In the latter case, each SCSI
controller is filled before a new one is added.
If the user specifies an high drive index (sdazz) then at initial
startup, many intermediate SCSI controllers may be added with no
drives.
This patch changes SCSI hotplug so that it exactly matches the
behaviour of initial startup. First the SCSI controller number is
determined for the drive to be hotplugged. If any controller upto
and including that controller number is not yet present, it is
attached. Then finally the drive is attached to the last controller.
NB, this breaks SCSI hotunplug, because there is no 'drive_del'
command in current QEMU. Previous SCSI hotunplug was broken in
any case because it was unplugging the entire controller, not
just the drive in question.
A future QEMU will allow proper SCSI hotunplug of a drive.
This patch is derived from work done by Wolfgang Mauerer on disk
controllers.
* src/qemu/qemu_driver.c: Fix SCSI hotplug to add a drive to
the correct controller, instead of just attaching a new
controller.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
support for 'drive_add' command
This patch allows for explicit hotplug/unplug of SCSI controllers.
Ordinarily this is not required, since QEMU/libvirt will attach
a new SCSI controller whenever one is required. Allowing explicit
hotplug of controllers though, enables the caller to specify a
static PCI address, instead of auto-assigning the next available
PCI slot. Or it will when we have static PCI addressing.
This patch is derived from Wolfgang Mauerer's disk controller
patch series.
* src/qemu/qemu_driver.c: Support hotplug & unplug of SCSI
controllers
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
new API for attaching PCI SCSI controllers
qemudFindCharDevicePTYsMonitor reports an error if 'info chardev' didn't
provide information for a requested device, even if the log output parsing
had found the pty path for that device. This makes pty assignment fail for
older QEMU/KVM versions. For example KVM 72 on Debian doesn't support
'info chardev', so qemuMonitorTextGetPtyPaths cannot parse any useful
information and the hash for device-id-to-pty-path mapping stays empty.
Make qemudFindCharDevicePTYsMonitor report an error only if the log output
parsing and the 'info chardev' parsing failed to provide the pty path.
The current code for using -drive simply sets the -drive 'index'
parameter. QEMU internally converts this to bus/unit depending
on the type of drive. This does not give us precise control over
the bus/unit assignment though. This change switches over to make
libvirt explicitly calculate the bus/unit number.
In addition bus/unit/index are actually irrelevant for VirtIO
disks, since each virtio disk is a separate PCI device. No disk
controller is involved.
Doing the conversion to bus/unit in libvirt allows us to correctly
attach SCSI controllers when required.
* src/qemu/qemu_conf.c: Specify bus/unit instead of index for
disks
* tests/qemuxml2argvdata/qemuxml2argv-disk*.args: Switch over from
using index=NNNN, to bus=NN, unit=NN for SCSI/IDE/Floppy disks
To enable it to be called from multiple locations, split out
the code for building the -drive arg string. This will be needed
by later patches which do drive hotplug, the conversion to use
-device, and the conversion to controller/bus/unit addressing
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Add qemuBuildDriveStr
for building -drive arg string
Convert the QEMU monitor APIs over to use virDomainDeviceAddress
structs for passing addresses in/out, instead of individual bits.
This makes the number of parameters smaller & easier to deal with.
No functional change
* src/qemu/qemu_driver.c, src/qemu/qemu_monitor.c,
src/qemu/qemu_monitor.h, src/qemu/qemu_monitor_text.c,
src/qemu/qemu_monitor_text.h: Change monitor hotplug APIs to
take an explicit address ptr for all host/guest addresses
When parsing the <disk> element specification, if no <address>
is provided for the disk, then automatically assign one based on
the <target dev='sdXX'/> device name. This provides for backwards
compatability with existing applications using libvirt, while also
allowing new apps to have complete fine grained control.
* src/conf/domain_conf.h, src/conf/domain_conf.c,
src/libvirt_private.syms: Add virDomainDiskDefAssignAddress()
for assigning a controller/bus/unit address based on disk target
* src/qemu/qemu_conf.c: Call virDomainDiskDefAssignAddress() after
generating XML from ARGV
* tests/qemuxml2argvdata/*.xml: Add in drive address information
to all XML files
All guest devices now use a common device address structure
summarized by:
enum virDomainDeviceAddressType {
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE,
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI,
};
struct _virDomainDevicePCIAddress {
unsigned int domain;
unsigned int bus;
unsigned int slot;
unsigned int function;
};
struct _virDomainDeviceInfo {
int type;
union {
virDomainDevicePCIAddress pci;
} addr;
};
This replaces the anonymous structs in Disk/Net/Hostdev data
structures. Where available, the address is *always* printed
in the XML file, instead of being hidden in the internal state
file.
<address type='pci' domain='0x0000' bus='0x1e' slot='0x07' function='0x0'/>
The structure definition is based on Wolfgang Mauerer's disk
controller patch series.
* docs/schemas/domain.rng: Define the <address> syntax and
associate it with disk/net/hostdev devices
* src/conf/domain_conf.h, src/conf/domain_conf.c,
src/libvirt_private.syms: APIs for parsing/formatting address
information. Also remove the QEMU specific 'pci_addr' attributes
* src/qemu/qemu_driver.c: Replace use of 'pci_addr' attrs with
new standardized format.
Based off how QEMU does it, look through /sys/bus/usb/devices/* for
matching vendor:product info, and if found, use info from the surrounding
files to build the device's /dev/bus/usb path.
This fixes USB device assignment by vendor:product when running qemu
as non-root (well, it should, but for some reason I couldn't reproduce
the failure people are seeing in [1], but it appears to work properly)
[1] https://bugzilla.redhat.com/show_bug.cgi?id=542450
qemudWaitForMonitor calls qemudReadLogOutput with qemudFindCharDevicePTYs
as callback. qemudFindCharDevicePTYs calls qemudExtractTTYPath to assign
a string to chr->data.file.path. Afterwards qemudWaitForMonitor may call
qemudFindCharDevicePTYsMonitor that overwrites chr->data.file.path without
freeing the old value. This results in leaking the memory allocated by
qemudExtractTTYPath.
Report an OOM error if the strdup in qemudFindCharDevicePTYsMonitor fails.
The -mem-prealloc flag should be used when using large pages
This ensures qemu tries to allocate all required memory immediately,
rather than when first used. The latter mode will crash qemu
if hugepages aren't available when accessed, while the former
should gracefully fallback to non-hugepages.
* src/qemu/qemu_conf.c: add -mem-prealloc flag to qemu command line
when using large pages
The behavior for the qemu balloon device has changed. Formerly, a virtio
balloon device was provided by default. Now, '-balloon virtio' must be
specified on the command line to enable it. This patch causes libvirt to
add '-balloon virtio' to the command line whenever the -balloon option is
available.
* src/qemu/qemu_conf.c src/qemu/qemu_conf.h: check for the new flag and
add "-baloon vitio" to qemu command when needed
* tests/qemuhelptest.c: add the new flag for detection
This change makes the 'info chardev' parser ignore any trailing
whitespace on a line. This fixes a specific problem handling a '\r\n'
line ending.
* src/qemu/qemu_monitor_text.c: Ignore trailing whitespace in
'info chardev' output.
* src/qemu/qemu_driver.c (qemudDomainMigratePrepare2): Remove useless
test of always-non-NULL uri_out parameter. Use ATTRIBUTE_NONNULL to
inform tools.
* src/qemu/qemu_conf.h: Remove QEMU_CMD_FLAG_0_12 and just leave
the lone JSON flag
* src/qemu/qemu_conf.c: Enable JSON on QEMU 0.13 or later, but
leave it disabled for now
The XML XPath for detecting JSON in the running VM statefile was
wrong causing all VMs to get JSON mode enabled at libvirtd restart.
In addition if a VM was running a JSON enabled QEMU once, and then
altered to point to a non-JSON enabled QEMU later the 'monJSON'
flag would not get reset to 0.
* src/qemu/qemu_driver.c: Fix setting/detection of JSON mode
Support for memory statistics reporting is accepted for qemu inclusion.
Statistics are reported via the monitor command 'info balloon' as a comma
seprated list:
(qemu) info balloon
balloon: actual=1024,mem_swapped_in=0,mem_swapped_out=0,major_page_faults=88,minor_page_faults=105535,free_mem=1017065472,total_mem=1045229568
Libvirt, qemu, and the guest operating system may support a subset of the
statistics defined by the virtio spec. Thus, only statistics recognized by
components will be reported.
* src/qemu/qemu_driver.c src/qemu/qemu_monitor_text.[ch]: implement the
new entry point by using info balloon monitor command
Set up the types for the domainMemoryStats function and insert it into the
virDriver structure definition. Because of static initializers, update
every driver and set the new field to NULL.
* include/libvirt/libvirt.h.in: new API
* src/driver.h src/*/*_driver.c src/vbox/vbox_tmpl.c: add the new
entry to the driver structure
* python/generator.py: fix compiler errors, the actual python binding is
implemented later
* src/driver.h: add an extra entry point in the structure
* src/esx/esx_driver.c src/lxc/lxc_driver.c src/opennebula/one_driver.c
src/openvz/openvz_driver.c src/phyp/phyp_driver.c src/qemu/qemu_driver.c
src/remote/remote_driver.c src/test/test_driver.c src/uml/uml_driver.c
src/vbox/vbox_tmpl.c src/xen/xen_driver.c: add NULL entry points for
all drivers
* src/qemu/qemu_driver.c (doNonTunnelMigrate): Don't let a
NULL "uri_out" provoke a NULL-dereference in doNativeMigrate:
supply omitted goto-after-qemudReportError.
* src/qemu/qemu_driver.c (qemudDomainMigratePrepareTunnel): Upon an
out of memory error, we would end up with unixfile==NULL and attempt
to unlink(NULL). Skip the unlink when it's NULL.
* src/qemu/qemu_driver.c (qemudDomainMigrateFinish2): Set
"event" to NULL after qemuDomainEventQueue frees it, so a
subsequent free (after endjob label) upon qemuMonitorStartCPUs
failure does not cause a double free.
If there are no references remaining to the object, vm is set to NULL
and vm->persistent cannot be accessed. Fixed by this trivial patch.
* src/qemu/qemu_driver.c (qemudDomainCoreDump): Avoid possible
NULL pointer dereference on --crash dump.
This is trivial for QEMU since you just have to not stop the vm before
starting the dump. And for Xen, you just pass the flag down to xend.
* include/libvirt/libvirt.h.in (virDomainCoreDumpFlags): Add VIR_DUMP_LIVE.
* src/qemu/qemu_driver.c (qemudDomainCoreDump): Support live dumping.
* src/xen/xend_internal.c (xenDaemonDomainCoreDump): Support live dumping.
* tools/virsh.c (opts_dump): Add --live. (cmdDump): Map it to VIR_DUMP_LIVE.
This patch adds the --crash option (already present in "xm dump-core")
to "virsh dump". virDomainCoreDump already has a flags argument, so
the API/ABI is untouched.
* include/libvirt/libvirt.h.in (virDomainCoreDumpFlags): New flag for
CoreDump
* src/test/test_driver.c (testDomainCoreDump): Do not crash
after dump unless VIR_DUMP_CRASH is given.
* src/qemu/qemu_driver.c (qemudDomainCoreDump): Shutdown the domain
instead of restarting it if --crash is passed.
* src/xen/xend_internal.c (xenDaemonDomainCoreDump): Support --crash.
* tools/virsh.c (opts_dump): Add --crash.
(cmdDump): Map it to flags for virDomainCoreDump and pass them.
1) qemuMigrateToCommand uses ">>" so we have to truncate the file
before starting the migration;
2) the command wasn't updated to chown the driver and set/restore
the security lavels;
3) the VM does not have to be resumed if migration fails;
4) the file is not removed when migration fails.
* src/qemu/qemu_driver.c (qemuDomainCoreDump): Truncate file before
dumping, set/restore ownership and security labels for the file.
Those were pointed by DanB in his review but not yet fixed
* src/qemu/qemu_driver.c: qemudWaitForMonitor() use EnterMonitorWithDriver()
and ExitMonitorWithDriver() there
* src/qemu/qemu_monitor_text.c: checking fro strdu failure and hash
table add error in qemuMonitorTextGetPtyPaths()
This change makes the QEMU driver get pty paths from the output of the
monitor 'info chardev' command. This output is structured, and contains
both the name of the device and the path on the same line. This is
considerably more reliable than parsing the startup log output, which
requires the parsing code to know which order QEMU will print pty
information in.
Note that we still need to parse the log output as the monitor itself
may be on a pty. This should be rare, however, and the new code will
replace all pty paths parsed by the log output method once the monitor
is available.
* src/qemu/qemu_monitor.(c|h) src/qemu_monitor_text.(c|h): Implement
qemuMonitorGetPtyPaths().
* src/qemu/qemu_driver.c: Get pty path information using
qemuMonitorGetPtyPaths().
Change -monitor, -serial and -parallel output to use -chardev if it is
available.
* src/qemu/qemu_conf.c: Update qemudBuildCommandLine to use -chardev where
available.
* tests/qemuxml2argvtest.c tests/qemuxml2argvdata/: Add -chardev equivalents
for all current serial and parallel tests.
Even if qemudStartVMDaemon suceeds, an error was logged such as
'qemuRemoveCgroup:1778 : internal error Unable to find cgroup for'.
This is because qemudStartVMDaemon calls qemuRemoveCgroup to
ensure that old cgroup does not remain. This workaround makes
sense but leaving an error message may confuse users.
* src/qemu/qemu_driver.c: a an option to the function to suppress the
error being logged
This adds a new flag, VIR_MIGRATE_PAUSED, that mandates pausing
the migrated VM before starting it.
* include/libvirt/libvirt.h.in (virDomainMigrateFlags): Add VIR_MIGRATE_PAUSED.
* src/qemu/qemu_driver.c (qemudDomainMigrateFinish2): Handle VIR_MIGRATE_PAUSED.
* tools/virsh.c (opts_migrate): Add --suspend. (cmdMigrate): Handle it.
* tools/virsh.pod (migrate): Document it.
This makes a small change on the failed-migration path. Up to now,
all VMs that failed non-live migration after the "stop" command
were restarted. This must not be done when the VM was paused in
the first place.
* src/qemu/qemu_driver.c (qemudDomainMigratePerform): Do not restart
a paused VM that fails migration. Set paused state after "stop",
reset it after failure.
Replace free(virBufferContentAndReset()) with virBufferFreeAndReset().
Update documentation and replace all remaining calls to free() with
calls to VIR_FREE(). Also add missing calls to virBufferFreeAndReset()
and virReportOOMError() in OOM error cases.
The QEMU 0.10.0 release (and possibly other 0.10.x) has a bug where
it sometimes/often forgets to display the initial monitor greeting
line, soley printing a (qemu). This in turn confuses the text
console parsing because it has a '(qemu)' it is not expecting. The
confusion results in a negative malloc. Bad things follow.
This re-writes the text console handling to be more robust. The key
idea is that it should only look for a (qemu), once it has seen the
original command echo'd back. This ensures it'll skip the bogus stray
(qemu) with broken QEMUs.
* src/qemu/qemu_monitor.c: Add some (disabled) debug code
* src/qemu/qemu_monitor_text.c: Re-write way command replies
are detected
Since the monitor I/O is processed out of band from the main
thread(s) invoking monitor commands, the virDomainObj may be
deleted by the I/O thread. The qemuDomainObjBeginJob takes an
extra reference to protect against final deletion, but this
reference is released by the corresponding EndJob call. THus
after the EndJob call it may not be valid to reference the
virDomainObj any more. To allow callers to detect this, the
EndJob call is changed to return the remaining reference count.
* src/conf/domain_conf.c: Make virDomainObjUnref return the
remaining reference count
* src/qemu/qemu_driver.c: Avoid referencing virDomainObjPtr
after qemuDomainObjEndJob if it has been deleted.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add callbacks
for reset, shutdown, poweroff and stop events. Add convenience
methods for emiting those events
With addition of events there will be alot of callbacks.
To avoid having to add many APIs to register callbacks,
provide them all at once in a big table
* src/qemu/qemu_driver.c: Pass in a callback table to QEMU
monitor code
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h Replace
the EOF and disk secret callbacks with a callback table
Initial support for the new QEMU monitor protocol using JSON
as the data encoding format instead of plain text
* po/POTFILES.in: Add src/qemu/qemu_monitor_json.c
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Hack to turn on QMP
mode. Replace with a version number check on >= 0.12 later
* src/qemu/qemu_monitor.c: Delegate to json monitor if enabled
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h: Add
impl of QMP protocol
* src/Makefile.am: Add src/qemu/qemu_monitor_json.{c,h}
Now that drivers are using a private domain object state blob,
the virDomainObjFormat/Parse methods are no longer able to
directly serialize all neccessary state to/from XML. It is
thus neccessary to introduce a pair of callbacks fo serializing
private state.
The code for serializing vCPU PIDs and the monitor device
config can now move out of domain_conf.c and into the
qemu_driver.c where they belong.
* src/conf/capabilities.h: Add callbacks for serializing private
state to/from XML
* src/conf/domain_conf.c, src/conf/domain_conf.h: Remove the
monitor, monitor_chr, monitorWatch, nvcpupids and vcpupids
fields from virDomainObjPtr. Remove code that serialized
those fields
* src/libvirt_private.syms: Export virXPathBoolean
* src/qemu/qemu_driver.c: Add callbacks for serializing monitor
and vcpupid data to/from XML
* src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Pass monitor
char device config into qemuMonitorOpen directly.
The code to start CPUs executing has nothing todo with CPU
affinity masks, so pull it out of the qemudInitCpuAffinity()
method and up into qemudStartVMDaemon()
* src/qemu/qemu_driver.c: Pull code to start CPUs executing out
of qemudInitCpuAffinity()
The current QEMU disk media change does not support setting the
disk format. The new JSON monitor will support this, so add an
extra parameter to pass this info in
* src/qemu/qemu_driver.c: Pass in disk format when changing media
* src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h:
Add a 'format' arg to qemuMonitorChangeMedia()
The qemuMonitorEscape() method, and the VIR_ENUM for migration
status will be needed by the JSON monitor too, so move that code
into the shared qemu_monitor.c file instead of qemu_monitor_text.c
* src/qemu/qemu_monitor.h: Declare qemuMonitorMigrationStatus enum
and qemuMonitorEscapeArg and qemuMonitorEscapeShell methods
* src/qemu/qemu_monitor.c: Implement qemuMonitorMigrationStatus enum
and qemuMonitorEscapeArg and qemuMonitorEscapeShell methods
* src/qemu/qemu_monitor_text.c: Remove above methods/enum
If QEMU shuts down while we're in the middle of processing a
monitor command, the monitor will be freed, and upon cleaning
up we attempt to do qemuMonitorUnlock(priv->mon) when priv->mon
is NULL.
To address this we introduce proper reference counting into
the qemuMonitorPtr object, and hold an extra reference whenever
executing a command.
* src/qemu/qemu_driver.c: Hold a reference on the monitor while
executing commands, and only NULL-ify the priv->mon field when
the last reference is released
* src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Add reference
counting to handle safe deletion of monitor objects
* src/conf/domain_conf.c: don't call virDomainObjUnlock twice
* src/qemu/qemu_driver.c: relock driver lock if an error occurs in
qemuDomainObjBeginJobWithDriver, enter/exit monitor with driver
in qemudDomainSave
Introduce a new type="dir" mode for <disks> that allows use of
QEMU's virtual FAT block device driver. eg
<disk type='dir' device='floppy'>
<source dir='/tmp/test'/>
<target dev='fda' bus='fdc'/>
<readonly/>
</disk>
gets turned into
-drive file=fat:floppy:/tmp/test,if=floppy,index=0
Only read-only disks are supported with virtual FAT mode
* src/conf/domain_conf.c, src/conf/domain_conf.h: Add type="dir"
* docs/schemas/domain.rng: Document new disk type
* src/xen/xend_internal.c, src/xen/xm_internal.c: Raise error for
unsupported disk types
* tests/qemuxml2argvdata/qemuxml2argv-disk-cdrom-empty.args: Fix
empty disk file handling
* tests/qemuxml2argvdata/qemuxml2argv-disk-drive-fat.args,
tests/qemuxml2argvdata/qemuxml2argv-disk-drive-fat.xml,
tests/qemuxml2argvdata/qemuxml2argv-floppy-drive-fat.args,
tests/qemuxml2argvdata/qemuxml2argv-floppy-drive-fat.xml
tests/qemuxml2argvtest.c: Test QEMU vitual FAT driver
* src/qemu/qemu_conf.c: Support generating fat:/some/dir type
disk args
* src/security/security_selinux.c: Temporarily skip labelling
of directory based disks
* src/Makefile.am: Add processinfo.h/processinfo.c
* src/util/processinfo.c, src/util/processinfo.h: Module providing
APIs for getting/setting process CPU affinity
* src/qemu/qemu_driver.c: Switch over to new APIs for schedular
affinity
* src/libvirt_private.syms: Export virProcessInfoSetAffinity
and virProcessInfoGetAffinity to internal drivers
Recent qemu releases require command option '-enable-qemu' in order
for the kvm functionality be activated. Libvirt needs to pass this flag
to qemu when starting a domain. Note that without the option,
even if both the kernel and qemu support KVM, KVM will not be activated
and VMs will be very slow.
* src/qemu/qemu_conf.h src/qemu/qemu_conf.c: parse the extra command
line option from help and add it when running kvm
* tests/qemuhelptest.c: this modified the flags output for qemu-0.10.5
and qemu-kvm-0.11.0-rc2 regression tests
The qemudStartVMDaemon() and several functions it calls use
the QEMU monitor. The QEMU driver is locked while this function
is executing, so it is rquired to release the driver lock and
reacquire it either side of issuing a monitor command. It
failed todo so, leading to deadlock
* qemu/qemu_driver.c: Release driver when in qemudStartVMDaemon
and things it calls
The QEMU monitor open method would not take a reference on
the virDomainObjPtr until it had successfully opened the
monitor. The cleanup code upon failure to open though would
call qemuMonitorClose() which would in turn decrement the
reference count. This caused the virDoaminObjPtr to be mistakenly
freed and then the whole driver crashes
* src/qemu/qemu_monitor.c: Fix reference counting in
qemuMonitorOpen
There is currently no way to determine the libvirt version of a remote
libvirtd we are connected to. This is a useful piece of data to enable
feature detection.
* src/conf/domain_conf.h src/conf/domain_conf.c: add the new entry in
the enum and lists of virDomainDiskBus
* src/qemu/qemu_conf.c: same for virDomainDiskQEMUBus
When running qemu:///system instance, libvirtd runs as root,
but QEMU may optionally be configured to run non-root. When
then saving a guest to a state file, the file is initially
created as root, and thus QEMU cannot write to it. It is also
missing labelling required to allow access via SELinux.
* src/qemu/qemu_driver.c: Set ownership on save image before
running migrate command in virDomainSave impl. Call out to
security driver to set save image labelling
* src/security/security_driver.h: Add driver APIs for setting
and restoring saved state file labelling
* src/security/security_selinux.c: Implement saved state file
labelling for SELinux
Introduce a number of new APIs to expose some boolean properties
of objects, which cannot otherwise reliably determined, nor are
aspects of the XML configuration.
* virDomainIsActive: Checking virDomainGetID is not reliable
since it is not possible to distinguish between error condition
and inactive domain for ID of -1.
* virDomainIsPersistent: Check whether a persistent config exists
for the domain
* virNetworkIsActive: Check whether the network is active
* virNetworkIsPersistent: Check whether a persistent config exists
for the network
* virStoragePoolIsActive: Check whether the storage pool is active
* virStoragePoolIsPersistent: Check whether a persistent config exists
for the storage pool
* virInterfaceIsActive: Check whether the host interface is active
* virConnectIsSecure: whether the communication channel to the
hypervisor is secure
* virConnectIsEncrypted: whether any network based commnunication
channels are encrypted
NB, a channel can be secure, even if not encrypted, eg if it does
not involve the network, like a UNIX socket, or pipe.
* include/libvirt/libvirt.h.in: Define public API
* src/driver.h: Define internal driver API
* src/libvirt.c: Implement public API entry point
* src/libvirt_public.syms: Export API symbols
* src/esx/esx_driver.c, src/lxc/lxc_driver.c,
src/interface/netcf_driver.c, src/network/bridge_driver.c,
src/opennebula/one_driver.c, src/openvz/openvz_driver.c,
src/phyp/phyp_driver.c, src/qemu/qemu_driver.c,
src/remote/remote_driver.c, src/test/test_driver.c,
src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c: Stub out driver tables
* src/libvirt.c src/lxc/lxc_conf.c src/lxc/lxc_container.c
src/lxc/lxc_controller.c src/node_device/node_device_hal.c
src/openvz/openvz_conf.c src/qemu/qemu_driver.c
src/qemu/qemu_monitor_text.c src/remote/remote_driver.c
src/storage/storage_backend_disk.c src/storage/storage_driver.c
src/util/logging.c src/xen/sexpr.c src/xen/xend_internal.c
src/xen/xm_internal.c: Steve Grubb <sgrubb@redhat.com> sent a code
review and those are the fixes correcting the problems
Some monitor commands may take a very long time to complete. It is
not desirable to block other incoming API calls forever. With this
change, if an existing API call is holding the job lock, additional
API calls will not wait forever. They will time out after a short
period of time, allowing application to retry later.
* include/libvirt/virterror.h, src/util/virterror.c: Add new
VIR_ERR_OPERATION_TIMEOUT error code
* src/qemu/qemu_driver.c: Change to a timed condition variable
wait for acquiring the monitor job lock
QEMU monitor commands may sleep for a prolonged period of time.
If the virDomainObjPtr or qemu driver lock is held this will
needlessly block execution of many other API calls. it also
prevents asynchronous monitor events from being dispatched
while a monitor command is executing, because deadlock will
ensure.
To resolve this, it is neccessary to release all locks while
executing a monitor command. This change introduces a flag
indicating that a monitor job is active, and a condition
variable to synchronize access to this flag. This ensures that
only a single thread can be making a state change or executing
a monitor command at a time, while still allowing other API
calls to be completed without blocking
* src/qemu/qemu_driver.c: Release driver and domain lock when
running monitor commands. Re-add locking to disk passphrase
callback
* src/qemu/THREADS.txt: Document threading rules
Change the QEMU monitor file handle watch to poll for both
read & write events, as well as EOF. All I/O to/from the
QEMU monitor FD is now done in the event callback thread.
When the QEMU driver needs to send a command, it puts the
data to be sent into a qemuMonitorMessagePtr object instance,
queues it for dispatch, and then goes to sleep on a condition
variable. The event thread sends all the data, and then waits
for the reply to arrive, putting the response / error data
back into the qemuMonitorMessagePtr and notifying the condition
variable.
There is a temporary hack in the disk passphrase callback to
avoid acquiring the domain lock. This avoids a deadlock in
the command processing, since the domain lock is still held
when running monitor commands. The next commit will remove
the locking when running commands & thus allow re-introduction
of locking the disk passphrase callback
* src/qemu/qemu_driver.c: Temporarily don't acquire lock in
disk passphrase callback. To be reverted in next commit
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Remove
raw I/O functions, and a generic qemuMonitorSend() for
invoking a command
* src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h:
Remove all low level I/O, and use the new qemuMonitorSend()
API. Provide a qemuMonitorTextIOProcess() method for detecting
command/reply/prompt boundaries in the monitor data stream
In preparation of the monitor I/O process becoming fully asynchronous,
it is neccessary to ensure all access to internals of the qemuMonitorPtr
object is protected by a mutex lock.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add mutex for locking
monitor.
* src/qemu/qemu_driver.c: Add locking around all monitor commands
Change the QEMU driver to not directly invoke the text mode monitor
APIs. Instead add a generic wrapper layer, which will eventually
invoke either the text or JSON protocol code as needed. Pass an
qemuMonitorPtr object into the monitor APIs instead of virDomainObjPtr
to complete the de-coupling of the monitor impl from virDomainObj
data structures
* src/qemu/qemu_conf.h: Remove qemuDomainObjPrivate definition
* src/qemu/qemu_driver.c: Add qemuDomainObjPrivate definition.
Pass qemuMonitorPtr into all monitor APIs instead of the
virDomainObjPtr instance.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add thin
wrappers for all qemuMonitorXXX command APIs, calling into
qemu_monitor_text.c/h
* src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h:
Rename qemuMonitor -> qemuMonitorText & update to accept
qemuMonitorPtr instead of virDomainObjPtr
Decouple the monitor code from the virDomainDefPtr structure
by moving the disk encryption lookup code back into the
qemu_driver.c file. Instead provide a function callback to
the monitor code which can be invoked to retrieve encryption
data as required.
* src/qemu/qemu_driver.c: Add findDomainDiskEncryption,
and findVolumeQcowPassphrase. Pass address of the method
findVolumeQcowPassphrase into qemuMonitorOpen()
* src/qemu/qemu_monitor.c: Associate a disk
encryption function callback with the qemuMonitorPtr
object.
* src/qemu/qemu_monitor_text.c: Remove findDomainDiskEncryption
and findVolumeQcowPassphrase.
Introduce a new qemuDomainObjPrivate object which is used to store
the private QEMU specific data associated with each virDomainObjPtr
instance. This contains a single member, an instance of the new
qemuMonitorPtr object which encapsulates the QEMU monitor state.
The internals of the latter are private to the qemu_monitor* files,
not to be shown to qemu_driver.c
* src/qemu/qemu_conf.h: Definition of qemuDomainObjPrivate.
* src/qemu/qemu_driver.c: Register a functions for creating
and freeing qemuDomainObjPrivate instances with the domain
capabilities. Remove the qemudDispatchVMEvent() watch since
I/O watches are now handled by the monitor code itself. Pass
a new qemuHandleMonitorEOF() callback into qemuMonitorOpen
to allow notification when the monitor quits.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Introduce
the 'qemuMonitor' object. Temporarily add new APIs
qemuMonitorWrite, qemuMonitorRead, qemuMonitorWaitForInput
to allow text based monitor impl to perform I/O.
* src/qemu/qemu_monitor_text.c: Call APIs for reading/writing
to monitor instead of accessing the file handle directly.
The qemu_driver.c code should not contain any code that interacts
with the QEMU monitor at a low level. A previous commit moved all
the command invocations out. This change moves out the code which
actually opens the monitor device.
* src/qemu/qemu_driver.c: Remove qemudOpenMonitor & methods called
from it.
* src/Makefile.am: Add qemu_monitor.{c,h}
* src/qemu/qemu_monitor.h: Add qemuMonitorOpen()
* src/qemu/qemu_monitor.c: All code for opening the monitor
* src/util/pci.c, src/util/pci.h: Make the pciDeviceList struct
opaque to callers of the API. Add accessor methods for managing
devices in the list
* src/qemu/qemu_driver.c: Update to use APIs instead of directly
accessing pciDeviceList fields
As it was basically unimplemented and more confusing than useful
at the moment.
* src/libvirt_private.syms: remove from internal symbols list
* src/qemu/qemu_bridge_filter.c src/util/ebtables.c: remove code and
one use of the unimplemented function
allows the following to be specified in a domain:
<channel type='pipe'>
<source path='/tmp/guestfwd'/>
<target type='guestfwd' address='10.0.2.1' port='4600'/>
</channel>
* proxy/Makefile.am: add network.c as dep of domain_conf.c
* docs/schemas/domain.rng src/conf/domain_conf.[ch]: extend the domain
schemas and the parsing/serialization side for the new construct
QEmu support will add the following on the qemu command line:
-chardev pipe,id=channel0,path=/tmp/guestfwd
-net user,guestfwd=tcp:10.0.2.1:4600-chardev:channel0
* src/qemu/qemu_conf.c: Add argument output for channel
* tests/qemuxml2(argv|xml)test.c: Add test for <channel> domain syntax
A character device's target (it's interface in the guest) had only a
single property: port. This patch is in preparation for adding targets
which require other properties.
Since this changes the conf type for character devices this affects
a number of drivers:
* src/conf/domain_conf.[ch] src/esx/esx_vmx.c src/qemu/qemu_conf.c
src/qemu/qemu_driver.c src/uml/uml_conf.c src/uml/uml_driver.c
src/vbox/vbox_tmpl.c src/xen/xend_internal.c src/xen/xm_internal.c:
target properties are moved into a union in virDomainChrDef, and a
targetType field is added to identify which union member should be
used. All current code which touches a virDomainChrDef is updated both
to use the new union field, and to populate targetType if necessary.
* src/qemu/qemu.conf src/qemu/qemu_conf.c src/qemu/qemu_conf.h: there is
a new config type option for mac filtering
* src/qemu/qemu_bridge_filter.[ch]: new module for the ebtable entry points
* src/qemu/qemu_driver.c: plug the MAC filtering at the right places
in the domain life cycle
* src/Makefile.am po/POTFILES.in: add the new module
- Don't duplicate SystemError
- Use proper error code in domain_conf
- Fix a broken error call in qemu_conf
- Don't use VIR_ERR_ERROR in security driver (isn't a valid code in this case)
All drivers have copy + pasted inadequate error reporting which wraps
util.c:virGetHostname. Move all error reporting to this function, and improve
what we report.
Changes from v1:
Drop the driver wrappers around virGetHostname. This means we still need
to keep the new conn argument to virGetHostname, but I think it's worth
it.
introduced on commit 9231aa7d95
* src/qemu/qemu_driver.c: in qemudRemoveDomainStatus fix a reference
to an undefined variable buf and free up an allocated string
When building with --disable-nls, I got a few messages like this:
storage/storage_backend.c: In function 'virStorageBackendCreateQemuImg':
storage/storage_backend.c:571: warning: format not a string literal and no format arguments
Fix these up.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
qemudShutdownVMDaemon() calls qemudRemoveDomainStatus(), which
then calls virFileDeletePID(). qemudShutdownVMDaemon() then
unnecessarily calls virFileDeletePID() again. Remove this second
usage of it, and also slightly refactor qemudRemoveDomainStatus()
to VIR_WARN appropriate error messages.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The LXC driver was mistakenly returning -1 for lxcStartup()
in scenarios that are not an error. This caused the libvirtd
to quit for unprivileged users. This fixes the return code
of LXC driver, and also adds a "name" field to the virStateDriver
struct and logging to make it easier to find these problems
in the future
* src/driver.h: Add a 'name' field to state driver to allow
easy identification during failures
* src/libvirt.c: Log name of failed driver for virStateInit
failures
* src/lxc/lxc_driver.c: Don't return a failure code for
lxcStartup() if LXC is not available on this host, simply
disable the driver.
* src/network/bridge_driver.c, src/node_device/node_device_devkit.c,
src/node_device/node_device_hal.c, src/opennebula/one_driver.c,
src/qemu/qemu_driver.c, src/remote/remote_driver.c,
src/secret/secret_driver.c, src/storage/storage_driver.c,
src/uml/uml_driver.c, src/xen/xen_driver.c: Fill in name
field in virStateDriver struct
Rename virDomainIsActive to virDomainObjIsActive, and
virInterfaceIsActive to virInterfaceObjIsActive and finally
virNetworkIsActive to virNetworkObjIsActive.
* src/conf/domain_conf.c, src/conf/domain_conf.h,
src/conf/interface_conf.h, src/conf/network_conf.c,
src/conf/network_conf.h, src/lxc/lxc_driver.c,
src/network/bridge_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/qemu/qemu_driver.c,
src/test/test_driver.c, src/uml/uml_driver.c: Update for
renamed APIs.
Nearly all of the methods in src/util/util.h have error codes that
must be checked by the caller to correct detect & report failure.
Add ATTRIBUTE_RETURN_CHECK to ensure compile time validation of
this
* daemon/libvirtd.c: Add explicit check on return value of virAsprintf
* src/conf/domain_conf.c: Add missing check on virParseMacAddr return
value status & report error
* src/network/bridge_driver.c: Add missing OOM check on virAsprintf
and report error
* src/qemu/qemu_conf.c: Add missing check on virParseMacAddr return
value status & report error
* src/security/security_selinux.c: Remove call to virRandomInitialize
that's done in libvirt.c already
* src/storage/storage_backend_logical.c: Add check & log on virRun
return status
* src/util/util.c: Add missing checks on virAsprintf/Run status
* src/util/util.h: Annotate all methods with ATTRIBUTE_RETURN_CHECK
if they return an error status code
* src/vbox/vbox_tmpl.c: Add missing check on virParseMacAddr
* src/xen/xm_internal.c: Add missing checks on virAsprintf
* tests/qemuargv2xmltest.c: Remove bogus call to virRandomInitialize()
The virDomainObjPtr object stores state about a running domain.
This object is shared across all drivers so it is not appropriate
to include driver specific state here. This patch adds the ability
to request a blob of private data per domain object instance. The
driver must provide a allocator & deallocator for this purpose
THis patch abuses the virCapabilitiesPtr structure for storing the
allocator/deallocator callbacks, since it is already being abused
for other internal things relating to parsing. This should be moved
out into a separate object at some point.
* src/conf/capabilities.h: Add privateDataAllocFunc and
privateDataFreeFunc fields
* src/conf/domain_conf.c: Invoke the driver allocators / deallocators
when creating/freeing virDomainObjPtr instances.
* src/conf/domain_conf.h: Pass virCapsPtr into virDomainAssignDef
to allow access to the driver specific allocator function
* src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/qemu/qemu_driver.c,
src/test/test_driver.c, src/uml/uml_driver.c: Update for
change in virDomainAssignDef contract
The current virDomainObjListPtr object stores domain objects in
an array. This means that to find a particular objects requires
O(n) time, and more critically acquiring O(n) mutex locks.
The new impl replaces the array with a virHashTable, keyed off
UUID. Finding a object based on UUID is now O(1) time, and only
requires a single mutex lock. Finding by name/id is unchanged
in complexity.
In changing this, all code which iterates over the array had
to be updated to use a hash table iterator function callback.
Several of the functions which were identically duplicating
across all drivers were pulled into domain_conf.c
* src/conf/domain_conf.h, src/conf/domain_conf.c: Change
virDomainObjListPtr to use virHashTable. Add a initializer
method virDomainObjListInit, and rename virDomainObjListFree
to virDomainObjListDeinit, since its not actually freeing
the container, only its contents. Also add some convenient
methods virDomainObjListGetInactiveNames,
virDomainObjListGetActiveIDs and virDomainObjListNumOfDomains
which can be used to implement the correspondingly named
public API entry points in drivers
* src/libvirt_private.syms: Export new methods from domain_conf.h
* src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_conf.c, src/openvz/openvz_driver.c,
src/qemu/qemu_driver.c, src/test/test_driver.c,
src/uml/uml_driver.c, src/vbox/vbox_tmpl.c: Update all code
to deal with hash tables instead of arrays for domains
If the the qemu and kvm binaries are the same, we don't include machine
types in the kvm domain info.
However, the code which refreshes the machine types info from the
previous capabilities structure first looks at the kvm domain's info,
finds it matches and then copies the empty machine types list over
for the top-level qemu domain.
That doesn't make sense, we shouldn't copy an empty machin types list.
* src/qemu/qemu_conf.c: qemudGetOldMachinesFromInfo(): don't copy an
empty machine types list.
Normally, when you migrate a domain from host A to host B,
the domain on host A remains defined but shutoff and the domain
on host B remains running but is a "transient". Add a new
flag to virDomainMigrate() to allow the original domain to be
undefined on source host A, and a new flag to virDomainMigrate() to
allow the new domain to be persisted on the destination host B.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The logic for running the decompression programs was broken in
commit f238709304, so that for
non-raw formats the decompression program was never run, and
for raw formats, it tried to exec an argv[] with initial NULL
in the program name.
* src/qemu/qemu_driver.c: Fix logic in runing decompression program
Introduces several new public API options for migration
- VIR_MIGRATE_PEER2PEER: With this flag the client only
invokes the virDomainMigratePerform method, expecting
the source host driver to do whatever is required to
complete the entire migration process.
- VIR_MIGRATE_TUNNELLED: With this flag the actual data
for migration will be tunnelled over the libvirtd RPC
channel. This requires that VIR_MIGRATE_PEER2PEER is
also set.
- virDomainMigrateToURI: This is variant of the existing
virDomainMigrate method which does not require any
virConnectPtr for the destination host. Given suitable
driver support, this allows for all the same modes as
virDomainMigrate()
The URI for VIR_MIGRATE_PEER2PEER must be a valid libvirt
URI. For non-p2p migration a hypervisor specific migration
URI is used.
virDomainMigrateToURI without a PEER2PEER flag is only
support for Xen currently, and it involves XenD talking
directly to XenD, no libvirtd involved at all.
* include/libvirt/libvirt.h.in: Add VIR_MIGRATE_PEER2PEER
flag for migration
* src/libvirt_internal.h: Add feature flags for peer to
peer migration (VIR_FEATURE_MIGRATE_P2P) and direct
migration (VIR_MIGRATE_PEER2PEER mode)
* src/libvirt.c: Implement support for VIR_MIGRATE_PEER2PEER
and virDomainMigrateToURI APIs.
* src/xen/xen_driver.c: Advertise support for DIRECT migration
* src/xen/xend_internal.c: Add TODO item for p2p migration
* src/libvirt_public.syms: Export virDomainMigrateToURI
method
* src/qemu/qemu_driver.c: Add support for PEER2PEER and
migration, and adapt TUNNELLED migration.
* tools/virsh.c: Add --p2p and --direct args and use the
new virDomainMigrateToURI method where possible.
Re-arrange the doTunnelMigrate method putting all non-QEMU local
state setup steps first. This maximises chances of success before
then starting destination QEMU for receiving incoming migration.
Altogether this can reduce the number of goto cleanup labels to
something more managable.
* qemu/qemu_driver.c: Re-order steps in doTunnelMigrate
Simplify the doTunnelMigrate code by pulling out the code for
sending all tunnelled data into separate helper
* qemu/qemu_driver.c: introduce doTunnelSendAll() method
Simplify the doTunnelMigrate() method by pulling out the code
which opens/closes the virConnectPtr object into a parent
method
* qemu/qemu_driver.c: Add doPeer2PeerMigrate which then calls
doTunnelMigrate with dconn & dom_xml
virStreamAbort is needed when the caller wishes to terminate
the stream early, not when virStreamSend fails.
* qemu/qemu_driver.c: Fix calling of virStreamAbort during
tunnelled migration
The code for tunnelled migration was added in a dedicated method,
but the native migration code is still inline in the top level
qemudDomainMigratePerform() API. Move the native code out into
a dedicated method too to make things more maintainable.
* src/qemu/qemu_driver.c: Pull code for performing a native
QEMU migration out into separate method
Since virMigratePrepareTunnel() is used for migration over the
native libvirt connection, there is never any need to pass the
target URI to this method.
* daemon/remote.c, src/driver.h, src/libvirt.c, src/libvirt_internal.h,
src/qemu/qemu_driver.c, src/remote/remote_driver.c,
src/remote/remote_protocol.c, src/remote/remote_protocol.h,
src/remote/remote_protocol.x: Remove 'uri_in' parameter from
virMigratePrepareTunnel() method
When James Morris originally submitted his sVirt patches (as seen in
libvirt 0.6.1), he did not require on disk labelling for
virSecurityDomainRestoreImageLabel. A later commit[2] changed this
behavior to assume on disk labelling, which halts implementations for
path-based MAC systems such as AppArmor and TOMOYO where
vm->def->seclabel is required to obtain the label.
* src/security/security_driver.h src/qemu/qemu_driver.c
src/security/security_selinux.c: adds the 'virDomainObjPtr vm'
argument back to *RestoreImageLabel
Fix migration, broken in two different ways by the QEMU monitor
abstraction. Note that the QEMU console emits a "\r\n" as the
line-ending.
* src/qemu/qemu_monitor_text.c (qemuMonitorGetMigrationStatus):
Fix "info migrate" command and its output's parsing.
Implementation of tunnelled migration, using a Unix Domain Socket
on the qemu backend. Note that this requires very new versions of
qemu (0.10.7 at least) in order to get the appropriate bugfixes.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The upcoming tunnelled migration needs to be able to set
a migration in progress in the background, as well as
be able to cancel a migration when a problem has happened.
This patch allows for both of these to properly work.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
When using VNC for graphics + keyboard + mouse, we shouldn't
then use the host OS for audio. Audio should go back over
VNC.
When using SDL for graphics, we should use the host OS for
audio since that's where the display is. We need to allow
certain QEMU env variables to be passed through to guest
too to allow choice of QEMU audio backend.
* qemud/libvirtd.sysconf: Mention QEMU/SDL audio env vars
* src/qemu_conf.c: Passthrough QEMU/SDL audio env for SDL display,
disable host audio for VNC display
* src/qemu/qemu_monitor_text.c: Always print command and reply
in qemuMonitorCommandWithHandler. Print all args in each monitor
command API & remove redundant relpy printing
* src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Add new
qemuMonitorRemoveHostNetwork() command for removing host
networks
* src/qemu/qemu_driver.c: Convert NIC hotplug methods over
to use qemuMonitorRemoveHostNetwork()
* src/qemu/qemu_conf.h, src/qemu/qemu_conf.c: Remove prefix arg
from qemuBuildHostNetStr which is no longer required
* src/qemu/qemu_driver.c: Refactor to use qemuMonitorAddHostNetwork()
API for adding host network
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
qemuMonitorAddHostNetwork() method for adding host networks
* src/qemu/qemu_conf.c: Remove separator from qemuBuildNicStr()
args, and remove hardcoded 'nic' prefix. Leave it upto callers
instead
* src/qemu/qemu_driver.c: Switch over to using the new
qemuMonitorAddPCINetwork() method for NIC hotplug
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
qemuMonitorAddPCINetwork API for PCI network device hotplug
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
qemuMonitorCloseFileHandle and qemuMonitorSendFileHandle
APIs for processing file handles
* src/qemu/qemu_driver.c: Convert NIC hotplug method over to
use qemuMonitorCloseFileHandle and qemuMonitorSendFileHandle
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
API qemuMonitorAddPCIDisk()
* src/qemu/qemu_driver.c: Convert over to using the new
qemuMonitorAddPCIDisk() method, and remove now obsolete
qemudEscape() method
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new API
qemuMonitorRemovePCIDevice() for removing PCI device
* src/qemu/qemu_driver.c: Convert all places removing PCI devices
over to new qemuMonitorRemovePCIDevice() API
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
API qemuMonitorAddPCIHostDevice()
* src/qemu/qemu_driver.c: Switch to using qemuMonitorAddPCIHostDevice()
for PCI host device hotplug
One API adds an exact device based on bus+dev, the other adds
any device matching vendor+product
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
qemuMonitorAddUSBDeviceExact() and qemuMonitorAddUSBDeviceMatch()
commands.
* src/qemu/qemu_driver.c: Switch over to using the new
qemuMonitorAddUSBDeviceExact() and qemuMonitorAddUSBDeviceMatch()
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
qemuMonitorAddUSBDisk() API
* src/qemu/qemu_driver.c: Switch USB disk hotplug to the new
src/qemu/qemu_driver.c API.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
qemuMonitorMigrateToCommand() API
* src/qemu/qemu_driver.c: Switch over to using the
qemuMonitorMigrateToCommand() API for core dumps and save
to file APIs
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new API
qemuMonitorMigrateToHost() for doing TCP migration
* src/qemu/qemu_driver.c: Convert to use qemuMonitorMigrateToHost().
Also handle proper URIs (tcp:// as well as tcp:)
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add new
qemuMonitorGetMigrationStatus() command.
* src/qemu/qemu_driver.c: Use new qemuMonitorGetMigrationStatus()
command to check completion status.
* src/qemu/qemu_driver.c: Use new qemuMonitorSetMigrationSpeed()
API during migration
* src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Add new
qemuMonitorSetMigrationSpeed() API