Add a new element to the <os> block:
<bootmenu enable="yes|no"/>
Which maps to -boot,menu=on|off on the QEMU command line.
I decided to use an explicit 'enable' attribute rather than just make the
bootmenu element boolean. This allows us to treat lack of a bootmenu element
as 'use hypervisor default'.
When doing a PCI secondary bus reset, we must be sure that there are no
active devices on the same bus segment. The active device tracking is
designed to only track host devices that are active in use by guests.
This ignores host devices that are actively in use by the host. So the
current logic will reset host devices.
Switch this logic around and allow sbus reset when we are assigning all
devices behind a bridge to the same guest at guest startup or as a result
of a single attach-device command.
* src/util/pci.h: change signature of pciResetDevice to add an
inactive devices list
* src/qemu/qemu_driver.c src/xen/xen_driver.c: use (or not) the new
functionality of pciResetDevice() depending on the place of use
* src/util/pci.c: implement the interface and logic changes
- src/qemu/qemu_driver.c: Eliminate code duplication by using the new
helpers qemuPrepareHostdevPCIDevices and qemuDomainReAttachHostdevDevices.
This reduces the number of open coded calls to pciResetDevice.
- src/qemu/qemu_driver.c: These new helpers take hostdev list and count
directly rather than getting them indirectly from domain definition.
This will allow reuse for the attach-device case.
- src/qemu/qemu_driver.c: Update qemuGetPciHostDeviceList to take a
hostdev list and count directly, rather than getting this indirectly
from domain definition. This will allow reuse for the attach-device case.
Thanks to DV for knocking together the Relax-NG changes
quickly for me.
Changes since v1:
- Change the domain.rng to correspond to the new schema
- Don't allocate caps->ns in testQemuCapsInit since it is a static table
Changes since v2:
- Change domain.rng to add restrictions on allowed environment names
Changes since v3:
- Remove a bogus comment in the tests
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Implement the qemu driver's virDomainQemuMonitorCommand
and hook it into the API entry point.
Changes since v1:
- Rename the (external) qemuMonitorCommand to qemuDomainMonitorCommand
- Add virCheckFlags to qemuDomainMonitorCommand
Changes since v2:
- Drop ATTRIBUTE_UNUSED from the flags
Changes since v3:
- Add a flag to priv so we only print out monitor command warning once. Note
that this has not been plumbed into qemuDomainObjPrivateXMLFormat or
qemuDomainObjPrivateXMLParse, which means that if you run a monitor command,
restart libvirtd, and then run another monitor command, you may get an
an erroneous VIR_INFO. It's a pretty minor matter, and I didn't think it
warranted the additional code.
- Add BeginJob/EndJob calls around EnterMonitor/ExitMonitor
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Add the library entry point for the new virDomainQemuMonitorCommand()
entry point. Because this is not part of the "normal" libvirt API,
it gets its own header file, library file, and will eventually
get its own over-the-wire protocol later in the series.
Changes since v1:
- Go back to using the virDriver table for qemuDomainMonitorCommand, due to
linking issues
- Added versioning information to the libvirt-qemu.so
Changes since v2:
- None
Changes since v3:
- Add LGPL header to libvirt-qemu.c
- Make virLibConnError and virLibDomainError macros instead of function calls
Changes since v4:
- Move exported symbols to libvirt_qemu.syms
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Now that we have the ability to specify arbitrary qemu
command-line parameters in the XML, use it to handle unknown
command-line parameters when doing a native-to-xml conversion.
Changes since v1:
- Rename num_extra to num_args
- Fix up a memory leak on an error path
Changes since v2:
- Add a VIR_WARN when adding the argument via qemu:arg
Changes since v3:
- None
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Implement the qemu hooks for XML namespace data. This
allows us to specify a qemu XML namespace, and then
specify:
<qemu:commandline>
<qemu:arg value='arg'/>
<qemu:env name='name' value='value'/>
</qemu:commandline>
In the domain XML.
Changes since v1:
- Change the <qemu:arg>arg</qemu:arg> XML to <qemu:arg value='arg'/> XML
- Fix up some memory leaks in qemuDomainDefNamespaceParse
- Rename num_extra and extra to num_args and args, respectively
- Fixed up some error messages
- Make sure to escape user-provided data in qemuDomainDefNamespaceFormatXML
Changes since v2:
- Add checking to ensure environment variable names are valid
- Invert the logic in qemuDomainDefNamespaceFormatXML to return early
Changes since v3:
- Change strspn() to c_isalpha() check of first letter of environment variable
Signed-off-by: Chris Lalancette <clalance@redhat.com>
A Linux software bridge will assume the MAC address of the enslaved
interface with the numerically lowest MAC addr. When the bridge
changes MAC address there is a period of network blackout, so a
change should be avoided. The kernel gives TAP devices a completely
random MAC address. Occassionally the random TAP device MAC is lower
than that of the physical interface (eth0, eth1etc) that is enslaved,
causing the bridge to change its MAC.
This change sets an explicit MAC address for all TAP devices created
using the configured MAC from the XML, but with the high byte set
to 0xFE. This should ensure TAP device MACs are higher than any
physical interface MAC.
* src/qemu/qemu_conf.c, src/uml/uml_conf.c: Pass in a MAC addr
for the TAP device with high byte set to 0xFE
* src/util/bridge.c, src/util/bridge.h: Set a MAC when creating
the TAP device to override random MAC
The PCI slot 1 must be reserved at all times, since PIIX3 is
always present, even if no IDE device is in use for guest disks
* src/qemu/qemu_conf.c: Always reserve slot 1 for PIIX3
virFileOperation previously returned 0 on success, or the value of
errno on failure. Although there are other functions in libvirt that
use this convention, the preferred (and more common) convention is to
return 0 on success and -errno (or simply -1 in some cases) on
failure. This way the check for failure is always (ret < 0).
* src/util/util.c - change virFileOperation and virFileOperationNoFork to
return -errno on failure.
* src/storage/storage_backend.c, src/qemu/qemu_driver.c
- change the hook functions passed to virFileOperation to return
-errno on failure.
To try and ensure that people upgrading from old QEMU get guests
with the same PCI device ordering, change the way we assign addrs
to match QEMU's default order. This should make Windows less
annoyed.
* src/qemu/qemu_conf.c: Follow QEMU's default PCI ordering
logic when assigning addresses
* tests/*.args: Update for changed PCI addresses
To allow compatibility with older QEMU PCI device slot assignment
it is necessary to explicitly track the balloon device in the
XML. This introduces a new device
<memballoon model='virtio|xen'/>
It can also have a PCI address, auto-assigned if necessary.
The memballoon will be automatically added to all Xen and QEMU
guests by default.
* docs/schemas/domain.rng: Add <memballoon> element
* src/conf/domain_conf.c, src/conf/domain_conf.h: parsing
and formatting for memballoon device. Always add a memory
balloon device to Xen/QEMU if none exists in XML
* src/libvirt_private.syms: Export memballoon model APIs
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Honour the
PCI device address in memory balloon device
* tests/*: Update to test new functionality
The first VGA and IDE devices need to have fixed PCI address
reservations. Currently this is handled inline with the other
non-primary VGA/IDE devices. The fixed virtio balloon device
at slot 3, ensures auto-assignment skips the slots 1/2. The
virtio address will shortly become configurable though. This
means the reservation of fixed slots needs to be done upfront
to ensure that they don't get re-used for other devices.
This is more or less reverting the previous changeset:
commit 83acdeaf17
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Wed Feb 3 16:11:29 2010 +0000
Fix restore of QEMU guests with PCI device reservation
The difference is that this time, instead of unconditionally
reserving the address, we only reserve the address if it was
initially type=none. Addresses of type=pci were handled
earlier in process by qemuDomainPCIAddressSetCreate(). This
ensures restore step doesn't have problems
* src/qemu/qemu_conf.c: Reserve first VGA + IDE address
upfront
The VIR_ERR_NO_SUPPORT refers to an API which is not implemented.
There is a separate VIR_ERR_CONFIG_UNSUPPORTED for XML config
options that are not available with the current hypervisor.
* src/qemu/qemu_conf.c, src/qemu/qemu_driver.c: Remove
many VIR_ERR_NO_SUPPORT replace with VIR_ERR_CONFIG_UNSUPPORTED
If you try to execute two concurrent migrations p2p
from A->B and B->A, the two libvirtd's will deadlock
trying to perform the migrations. The reason for this is
that in p2p migration, the libvirtd's are responsible for
making the RPC Prepare, Migrate, and Finish calls. However,
they are currently holding the driver lock while doing so,
which basically guarantees deadlock in this scenario.
This patch fixes the situation by adding
qemuDomainObjEnterRemoteWithDriver and
qemuDomainObjExitRemoteWithDriver helper methods. The Enter
take an additional object reference, then drops both the
domain object lock and the driver lock. The Exit takes
both the driver and domain object lock, then drops the
reference. Adding calls to these Enter and Exit helpers
around remote calls in the various migration methods
seems to fix the problem for me in testing.
This should make the situation safe. The additional domain
object reference ensures that the domain object won't disappear
while this operation is happening. The BeginJob that is called
inside of qemudDomainMigratePerform ensures that we can't execute a
second migrate (or shutdown, or save, etc) job while the
migration is active. Finally, the additional check on the state
of the vm after we reacquire the locks ensures that we can't
be surprised by an external event (domain crash, etc).
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Record a default driver name/type in capabilities struct. Use this
when parsing disks if value is not set in XML config.
* src/conf/capabilities.h: Record default driver name/type for disks
* src/conf/domain_conf.c: Fallback to default driver name/type
when parsing disks
* src/qemu/qemu_driver.c: Set default driver name/type to raw
Disk format probing is now disabled by default. A new config
option in /etc/qemu/qemu.conf will re-enable it for existing
deployments where this causes trouble
The implementation of security driver callbacks often needs
to access the security driver object. Currently only a handful
of callbacks include the driver object as a parameter. Later
patches require this is many more places.
* src/qemu/qemu_driver.c: Pass in the security driver object
to all callbacks
* src/qemu/qemu_security_dac.c, src/qemu/qemu_security_stacked.c,
src/security/security_apparmor.c, src/security/security_driver.h,
src/security/security_selinux.c: Add a virSecurityDriverPtr
param to all security callbacks
Update the QEMU cgroups code, QEMU DAC security driver, SELinux
and AppArmour security drivers over to use the shared helper API
virDomainDiskDefForeachPath().
* src/qemu/qemu_driver.c, src/qemu/qemu_security_dac.c,
src/security/security_selinux.c, src/security/virt-aa-helper.c:
Convert over to use virDomainDiskDefForeachPath()
Require the disk image to be passed into virStorageFileGetMetadata.
If this is set to VIR_STORAGE_FILE_AUTO, then the format will be
resolved using probing. This makes it easier to control when
probing will be used
* src/qemu/qemu_driver.c, src/qemu/qemu_security_dac.c,
src/security/security_selinux.c, src/security/virt-aa-helper.c:
Set VIR_STORAGE_FILE_AUTO when calling virStorageFileGetMetadata.
* src/storage/storage_backend_fs.c: Probe for disk format before
calling virStorageFileGetMetadata.
* src/util/storage_file.h, src/util/storage_file.c: Remove format
from virStorageFileMeta struct & require it to be passed into
method.
* src/qemu/qemu_driver.c (qemuConnectMonitor): Correct erroneous
parenthesization in two expressions. Without this fix, failure
to set or clear SELinux security context in the monitor would go
undiagnosed. Also correct a diagnostic and split some long lines.
In case qemu supports -nodefconfig, libvirt adds uses it when launching
new guests. Since this option may affect CPU models supported by qemu,
we need to use it when probing for available models.
An indentation mistake meant that a check for return status
was not properly performed in all cases. This could result
in a crash on NULL pointer in a following line.
* src/qemu/qemu_monitor_json.c: Fix check for return status
when processing JSON for blockstats
Some, but not all, codepaths in the qemuMonitorOpen() method
would trigger the destroy callback. The caller does not expect
this to be invoked if construction fails, only during normal
release of the monitor. This resulted in a possible double-unref
of the virDomainObjPtr, because the caller explicitly unrefs
the virDomainObjPtr if qemuMonitorOpen() fails
* src/qemu/qemu_monitor.c: Don't invoke destroy callback from
qemuMonitorOpen() failure paths
Make sure to *not* call qemuDomainPCIAddressReleaseAddr if
QEMUD_CMD_FLAG_DEVICE is *not* set (for older qemu). This
prevents a crash when trying to do device detachment from
a qemu guest.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
In the current libvirt PCI code, there is no checking whether
a PCI device is in use by a guest when doing node device
detach or reattach. This causes problems when a device is
assigned to a guest, and the administrator starts issuing
nodedevice commands. Make it so that we check the list
of active devices when trying to detach/reattach, and only
allow the operation if the device is not assigned to a guest.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
This code was just recently added (by me) and didn't account for the
fact that stdin_path is sometimes NULL. If it's NULL, and
SetSecurityAllLabel fails, a segfault would result.
When the saved domain image is on an NFS share, at least some part of
domainSetSecurityAllLabel will fail (for example, selinux labels can't
be modified). To allow domain restore to still work in this case, just
ignore the errors.
Also restore the label to its original value after qemu is finished
with the file.
Prior to this patch, qemu domain restore did not function properly if
selinux was set to enforce.
If an active migration operation fails, or is cancelled by the
admin, the QEMU on the destination is shutdown and the one on
the source continues running. It is important in shutting down
the QEMU on the destination, the security drivers don't reset
the file labelling/permissions.
* src/qemu/qemu_driver.c: Don't reset labelling/permissions
on migration abort
The patches for shared storage migration were not correctly written
for json mode. Thus the 'blk' and 'inc' parameters were never being
set. In addition they didn't set the QEMU_MONITOR_MIGRATE_BACKGROUND
so migration was synchronous. Due to multiple bugs in QEMU's JSON
impl this wasn't noticed because it treated the sync migration requst
as asynchronous anyway. Finally 'background' parameter was converted
to take arbitrary flags but not renamed, and not all uses were changed
to unsigned int.
* src/qemu/qemu_driver.c: Set QEMU_MONITOR_MIGRATE_BACKGROUND in
doNativeMigrate
* src/qemu/qemu_monitor_json.c: Process QEMU_MONITOR_MIGRATE_NON_SHARED_DISK
and QEMU_MONITOR_MIGRATE_NON_SHARED_INC flags
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.h, src/qemu/qemu_monitor_text.c,
src/qemu/qemu_monitor_text.h: change 'int background' to
'unsigned int flags' in migration APIs. Add logging of flags
parameter
During incoming migration the QEMU monitor is not able to be
used. The incoming migration code did not keep hold of the
job lock because migration is split across multiple API calls.
This meant that further monitor commands on the guest would
hang until migration finished with no timeout.
In this change the qemuDomainMigratePrepare method sets the
job flag just before it returns. The qemuDomainMigrateFinish
method checks for this job flag & clears it once done. This
prevents any use of the monitor between prepare+finish steps.
The qemuDomainGetJobInfo method is also updated to refresh
the job elapsed time. This means that virsh domjobinfo can
return time data during incoming migration
* src/qemu/qemu_driver.c: Keep a job active during incoming
migration. Refresh job elapsed time when returning job info
When configuring serial, parallel, console or channel devices
with a file, dev or pipe backend type, it is necessary to label
the file path in the security drivers. For char devices of type
file, it is neccessary to pre-create (touch) the file if it does
not already exist since QEMU won't be allowed todo so itself.
dev/pipe configs already require the admin to pre-create before
starting the guest.
* src/qemu/qemu_security_dac.c: set file ownership for character
devices
* src/security/security_selinux.c: Set file labeling for character
devices
* src/qemu/qemu_driver.c: Add character devices to cgroup ACL
We previously assumed that if the -device option existed in qemu, that
-nodefconfig would also exist. It turns out that isn't the case, as
demonstrated by qemu-kvm-0.12.3 in Fedora 13.
*/src/qemu/qemu_conf.[hc] - add a new QEMUD_CMD_FLAG, set it via the
help output, and check it before adding
-nodefconfig to the qemu commandline.
The domain XML parsing code autogenerates disk address and
controller elements when they are not explicitly specified.
The code assumes a narrow SCSI bus (7 units per bus). ESX
uses a wide SCSI bus (16 units per bus).
This is a step towards controller support for the ESX driver.
We already use the '-nodefaults' command line arg with QEMU to stop
it adding any default devices to guests. Unfortunately, QEMU will
load global config files from /etc/qemu that may also add default
devices. These aren't blocked by '-nodefaults', so we need to also
add the '-nodefconfig' arg to prevent that.
Unfortunately these global config files are also used to define
custom CPU models. So in blocking global hardware device addition
we also block definitions of new CPU models. Libvirt doesn't know
about these custom CPU models though, so it would never make use
of them anyway. Thus blocking them via -nodefconfig isn't a show
stopping problem. We would need to expand libvirt's own CPU model
XML database to support these instead.
* src/qemu/qemu_conf.c: Add '-nodefconfig' if available
* tests/qemuxml2argvdata/: Add '-nodefconfig' to all data files which
have '-nodefaults' present
The current code pattern requires that callers of qemuMonitorClose
check for the return value == 0, and if so, set priv->mon = NULL
and release the reference held on the associated virDomainObjPtr
The change d84bb6d6a3 violated that
requirement, meaning that priv->mon never gets set to NULL, and
a reference count is leaked on virDomainObjPtr.
This design was a bad one, so remove the need to check the return
valueof qemuMonitorClose(). Instead allow registration of a
callback that's invoked just when the last reference on qemuMonitorPtr
is released.
Finally there was a potential reference leak in qemuConnectMonitor
in the failure path.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add a destroy
callback invoked from qemuMonitorFree
* src/qemu/qemu_driver.c: Use the destroy callback to release the
reference on virDomainObjPtr when the monitor is freed. Fix other
potential reference count leak in connecting to monitor
Before issuing monitor commands it is neccessary to check whether
the guest is still running. Most places use virDomainIsActive()
correctly, but a few relied on 'priv->mon != NULL'. In theory
these should be equivalent, but the release of the last reference
count on priv->mon can be delayed a small amount of time until
the event handler is finally deregistered. A further ref counting
bug also means that priv->mon might be never released. In such a
case, code could mistakenly issue a monitor command and wait for
a response that will never arrive, effectively leaving the QEMU
driver waiting on virCondWait() forever..
To protect against these possibilities, make sure all code uses
virDomainIsActive(), not 'priv->mon != NULL'
* src/qemu/qemu_driver.c: Replace 'priv->mon != NULL' with
calls to 'priv->mon != NULL'()
Following Daniel Berrange's multiple helpful suggestions for improving
this patch and introducing another driver interface, I now wrote the
below patch where the nwfilter driver registers the functions to
instantiate and teardown the nwfilters with a function in
conf/domain_nwfilter.c called virDomainConfNWFilterRegister. Previous
helper functions that were called from qemu_driver.c and qemu_conf.c
were move into conf/domain_nwfilter.h with slight renaming done for
consistency. Those functions now call the function expored by
domain_nwfilter.c, which in turn call the functions of the new driver
interface, if available.
If VM startup fails early enough (can't find a referenced USB device),
libvirtd will crash trying to clear the VNC port bit, since port = 0,
which overflows us out of the bitmap bounds.
Fix this by being more defensive in the bitmap operations, and only
clearing a previously set VNC port.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Followup to https://bugzilla.redhat.com/show_bug.cgi?id=599091,
commit 20206a4b, to reduce disk waste in padding.
* src/qemu/qemu_monitor.h (QEMU_MONITOR_MIGRATE_TO_FILE_BS): Drop
back to 4k.
(QEMU_MONITOR_MIGRATE_TO_FILE_TRANSFER_SIZE): New macro.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Update comment.
* src/qemu/qemu_monitor_text.c (qemuMonitorTextMigrateToFile): Use
two invocations of dd to output non-aligned large blocks.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONMigrateToFile):
Likewise.
Match earlier change for qemu pause support with virDomainCreateXML.
* src/qemu/qemu_driver.c (qemudDomainObjStart): Add parameter; all
callers changed.
(qemudDomainStartWithFlags): Implement flag support.
When a disk is on a root squashed NFS server, it may not be
possible to stat() the disk file in virCgroupAllowDevice.
The virStorageFileGetMeta method may also fail to extract
the parent backing store. Both of these errors have to be
ignored to avoid breaking NFS deployments
* src/qemu/qemu_driver.c: Ignore errors in cgroup setup to
keep root squash NFS happy
https://bugzilla.redhat.com/show_bug.cgi?id=589465
Some guests (eg with badly configured grub, or Windows' installation cd)
require quick response from the console user. That's why we have a
"launchPaused" option in vdsm.
To implement it via libvirt, we need to ask libvirt not to call
qemuMonitorStartCPUs() after starting qemu. Calling virDomainStop
immediately after the domain is up is inherently raceful.
* src/qemu/qemu_driver.c (qemudStartVMDaemon): Add new parameter;
all callers adjusted.
(qemudDomainCreate): Implement support for new flag.
When an attempt to hotplug a PCI device to a guest fails,
the device was left attached to pci-stub. It is neccessary
to reset the device and then attach it to the host driver
again.
* src/qemu/qemu_driver.c: Reattach PCI device to host if
hotadd fails
Any output at all from device_add indicates an error in the
command execution. Thus it needs to check for reply != ""
* src/qemu/qemu_monitor_text.c: Fix reply check for errors
to treat any output as an error
When SELinux is running in MLS mode, libvirtd will have a
different security level to the VMs. For libvirtd to be
able to connect to the monitor console, the client end of
the UNIX domain socket needs a different label. This adds
infrastructure to set the socket label via the security
driver framework
* src/qemu/qemu_driver.c: Call out to socket label APIs in
security driver
* src/qemu/qemu_security_stacked.c: Wire up socket label
drivers
* src/security/security_driver.h: Define security driver
entry points for socket labelling
* src/security/security_selinux.c: Set socket label based on
VM label
To ensure that the device addressing scheme is stable across
hotplug/unplug, all virtio serial channels needs to have an
associated port number in their address. This is then specified
to QEMU using the nr=NNN parameter
* src/conf/domain_conf.c, src/conf/domain_conf.h: Parsing
for port number in vioserial address types.
* src/qemu/qemu_conf.c: Set 'nr=NNN' parameter with virtio
serial port number
* tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.args,
tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.xml: Expand
data set to ensure coverage of port addressing
QEMU upstream decided against adding a 'reason' field to
the block IO event in QMP. Disable this code to remove a
annoying warning message. It will be renabled when the
error string reason is re-introduced in QEMU
Adjust args to qemudStartVMDaemon() to also specify path to stdin_fd,
so this can be passed to the AppArmor driver via SetSecurityAllLabel().
This updates all calls to qemudStartVMDaemon() as well as setting up
the non-AppArmor security driver *SetSecurityAllLabel() declarations
for the above. This is required for the following
"apparmor-fix-save-restore" patch since AppArmor resolves the passed
file descriptor to the pathname given to open().
See https://bugzilla.redhat.com/show_bug.cgi?id=599091
Saving a paused 512MB domain took 3m47s with the old block size of 512
bytes. Changing the block size to 1024*1024 decreased the time to 56
seconds. (Doubling again to 2048*1024 yielded 0 improvement; lowering
to 512k increased the save time to 1m10s, about 20%)
The pointer to the xml describing the domain is saved into an object
prior to calling VIR_REALLOC_N() to make the size of the memory it
points to a multiple of QEMU_MONITOR_MIGRATE_TO_FILE_BS. If that
operation needs to allocate new memory, the pointer that was saved is
no longer valid.
To avoid this situation, adjust the size *before* saving the pointer.
(This showed up when experimenting with very large values of
QEMU_MONITOR_MIGRATE_TO_FILE_BS).
This patch that adds support for configuring 802.1Qbg and 802.1Qbh
switches. The 802.1Qbh part has been successfully tested with real
hardware. The 802.1Qbg part has only been tested with a (dummy)
server that 'behaves' similarly to how we expect lldpad to 'behave'.
The following changes were made during the development of this patch:
- Merging Scott's v13-pre1 patch
- Fixing endptr related bug while using virStrToLong_ui() pointed out
by Jim Meyering
- Addressing Jim Meyering's comments to v11
- requiring mac address to the vpDisassociateProfileId() function to
pass it further to the 802.1Qbg disassociate part (802.1Qbh untouched)
- determining pid of lldpad daemon by reading it from /var/run/libvirt.pid
(hardcode as is hardcode alson in lldpad sources)
- merging netlink send code for kernel target and user space target
(lldpad) using one function nlComm() to send the messages
- adding a select() after the sending and before the reading of the
netlink response in case lldpad doesn't respond and so we don't hang
- when reading the port status, in case of 802.1Qbg, no status may be
received while things are 'in progress' and only at the end a status
will be there.
- when reading the port status, use the given instanceId and vf to pick
the right IFLA_VF_PORT among those nested under IFLA_VF_PORTS.
- never sending nor parsing IFLA_PORT_SELF type of messages in the
802.1Qbg case
- iterating over the elements in a IFLA_VF_PORTS to pick the right
IFLA_VF_PORT by either IFLA_PORT_PROFILE and given profileId
(802.1Qbh) or IFLA_PORT_INSTANCE_UUID and given instanceId (802.1Qbg)
and reading the current status in IFLA_PORT_RESPONSE.
- recycling a previous patch that adds functionality to interface.c to
- get the vlan identifier on an interface
- get the flags of an interface and some convenience function to
check whether an interface is 'up' or not (not currently used here)
- adding function to determine the root physical interface of an
interface. For example if a macvtap is linked to eth0.100, it will
find eth0. Also adding a function that finds the vlan on the 'way to
the root physical interface'
- conveying the root physical interface name and index in case of 802.1Qbg
- conveying mac address of macvlan device and vlan identifier in
IFLA_VFINFO_LIST[ IFLA_VF_INFO[ IFLA_VF_MAC(mac), IFLA_VF_VLAN(vlan) ] ]
to (future) lldpad via netlink
- To enable build with --without-macvtap rename the
[dis|]associatePortProfileId functions, prepend 'vp' before their
name and make them non-static functions.
- Renaming variable multicast to nltarget_kernel and inverting
the logic
- Addressing Jim Meyering's comments; this also touches existing
code for example for correcting indentation of break statements or
simplification of switch statements.
- Renamed occurrencvirVirtualPortProfileDef to virVirtualPortProfileParamses
- 802.1Qbg part prepared for sending a RTM_SETLINK and getting
processing status back plus a subsequent RTM_GETLINK to
get IFLA_PORT_RESPONSE.
Note: This interface for 802.1Qbg may still change
- [David Allan] move getPhysfn inside IFLA_VF_PORT_MAX to avoid
compiler
warning when latest if_link.h isn't available
- move from Stefan's 802.1Qb{g|h} XML v8 to v9
- move hostuuid and vf index calcs to inside doPortProfileOp8021Qbh
- remove debug fprintfs
- use virGetHostUUID (thanks Stefan!)
- fix compile issue when latest if_link.h isn't available
- change poll timeout to 10s, at 1/8 intervals
- if polling times out, log msg and return -ETIMEDOUT
- Add Stefan's code for getPortProfileStatus
- Poll for up to 2 secs for port-profile status, at 1/8 sec intervals:
- if status indicates error, abort openMacvtapTap
- if status indicates success, exit polling
- if status is "in-progress" after 2 secs of polling, exit
polling loop silently, without error
My patch finishes out the 802.1Qbh parts, which Stefan had mostly complete.
I've tested using the recent kernel updates for VF_PORT netlink msgs and
enic for Cisco's 10G Ethernet NIC. I tested many VMs, each with several
direct interfaces, each configured with a port-profile per the XML. VM-to-VM,
and VM-to-external work as expected. VM-to-VM on same host (using same NIC)
works same as VM-to-VM where VMs are on diff hosts. I'm able to change
settings on the port-profile while the VM is running to change the virtual
port behaviour. For example, adjusting a QoS setting like rate limit. All
VMs with interfaces using that port-profile immediatly see the effect of the
change to the port-profile.
I don't have a SR-IOV device to test so source dev is a non-SR-IOV device,
but most of the code paths include support for specifing the source dev and
VF index. We'll need to complete this by discovering the PF given the VF
linkdev. Once we have the PF, we'll also have the VF index. All this info-
mation is available from sysfs.
Since the macvtap device needs active tear-down and the teardown logic
is based on the interface name, it can happen that if for example 1 out
of 3 interfaces was successfully created, that during the failure path
the macvtap's target device name is used to tear down an interface that
is doesn't own (owned by another VM).
So, in this patch, the target interface name is reset so that there is
no target interface name and the interface name is always cleared after
a tear down.
The hotplug methods still had the qemuCmdFlags variable declared
as an int, instead of unsigned long long. This caused flag checks
to be incorrect for flags > 31
* src/qemu/qemu_driver.c: Fix integer overflow in hotplug
This allows libvirt to open the PCI device sysfs config file prior
to dropping privileges so qemu can access the full config space.
Without this, a de-privileged qemu can only access the first 64
bytes of config space.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Detect support
for pci-assign.configfd option. Use this option when formatting
PCI device string if possible
* src/qemu/qemu_driver.c: Pre-open PCI sysfs config file and pass
to QEMU
We've been running into a lot of situations where
virGetHostname() is returning "localhost", where a plain
gethostname() would have returned the correct thing. This
is because virGetHostname() is *always* trying to canonicalize
the name returned from gethostname(), even when it doesn't
have to.
This patch changes virGetHostname so that if the value returned
from gethostname() is already FQDN or localhost, it returns
that string directly. If the value returned from gethostname()
is a shortened hostname, then we try to canonicalize it. If
that succeeds, we returned the canonicalized hostname. If
that fails, and/or returns "localhost", then we just return
the original string we got from gethostname() and hope for
the best.
Note that after this patch it is up to clients to check whether
"localhost" is an allowed return value. The only place
where it's currently not is in qemu migration.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
This patch parses the following two XML descriptions, one for
802.1Qbg and one for 802.1Qbh, and stores the data internally.
The actual triggering of the switch setup protocol has not been
implemented here but the relevant code to do that should go into
the functions associatePortProfileId() and disassociatePortProfileId().
<interface type='direct'>
<source dev='eth0.100' mode='vepa'/>
<model type='virtio'/>
<virtualport type='802.1Qbg'>
<parameters managerid='12' typeid='0x123456' typeidversion='1'
instanceid='fa9b7fff-b0a0-4893-8e0e-beef4ff18f8f'/>
</virtualport>
<filterref filter='clean-traffic'/>
</interface>
<interface type='direct'>
<source dev='eth0.100' mode='vepa'/>
<model type='virtio'/>
<virtualport type='802.1Qbh'>
<parameters profileid='my_profile'/>
</virtualport>
</interface>
I'd suggest to use this patch as a base for triggering the setup
protocol with the 802.1Qb{g|h} switch.
Several rounds of changes were made to this patch. The
following is a list of these changes.
- Renamed structure virVirtualPortProfileDef to virVirtualPortProfileParams
as per Daniel Berrange's request
- Addressing Daniel Berrange's comments:
- removing macvtap.h's dependency on domain_conf.h by
moving the virVirtualPortProfileDef structure into macvtap.h
and not passing virtDomainNetDefPtr to any functions in
macvtap.c
- Addressed most of Chris Wright's comments:
- indicating error in case virtualport XML node cannot be parsed
properly
- parsing hex and decimal numbers using virStrToLong_ui() with
parameter '0' for base
- tgifname (target interface name) variable wasn't necessary
to pass to openMacvtapTap function anymore
- assigning the virtual port data structure to the virDomainNetDef
only if it was previously parsed
- make sure that the error code returned by openMacvtapTap() is a negative n
in case the associatePortProfileId() function failed.
- renaming vsi in the XML to virtualport
- replace all occurrences of vsi in the source as well
- removing mode and MAC address parameters from the functions that
will communicate with the hareware diretctly or indirectly
- moving the associate and disassociate functions to the end of the
file for subsequent patches to easier make them generally available
for export
- passing the macvtap interface name rather than the link device since
this otherwise gives funny side effects when using netlink messages
where IFLA_IFNAME and IFLA_ADDRESS are specified and the link dev
all of a sudden gets the MAC address of the macvtap interface.
- Removing rc = -1 error indications in the case of 802.1Qbg|h setup in case
we wanted to use hook scripts for the setup and so the setup doesn't fail
here.
- if instance ID UUID is not supplied it will automatically be generated
- adapted schema to make instance ID UUID optional
- added test case
- parser and XML generator have been separated into their own
functions so they can be re-used elsewhere (passthrough case
for example)
- Adapted XML parser and generator support the above shown type
(802.1Qbg, 802.1Qbh).
- Adapted schema to above XML
- Adapted test XML to above XML
- Passing through the VM's UUID which seems to be necessary for
802.1Qbh -- sorry no host UUID
- adding virtual function ID to association function, in case it's
necessary to use (for SR-IOV)
Allow for a host UUID in the capabilities XML. Local drivers
will initialize this from the SMBIOS data. If a sanity check
shows SMBIOS uuid is invalid, allow an override from the
libvirtd.conf configuration file
* daemon/libvirtd.c, daemon/libvirtd.conf: Support a host_uuid
configuration option
* docs/schemas/capability.rng: Add optional host uuid field
* src/conf/capabilities.c, src/conf/capabilities.h: Include
host UUID in XML
* src/libvirt_private.syms: Export new uuid.h functions
* src/lxc/lxc_conf.c, src/qemu/qemu_driver.c,
src/uml/uml_conf.c: Set host UUID in capabilities
* src/util/uuid.c, src/util/uuid.h: Support for host UUIDs
* src/node_device/node_device_udev.c: Use the host UUID functions
* tests/confdata/libvirtd.conf, tests/confdata/libvirtd.out: Add
new host_uuid config option to test
The cgroups ACL code was only allowing the primary disk image.
It is possible to chain images together, so we need to search
for backing stores and add them to the ACL too. Since the ACL
only handles block devices, we ignore the EINVAL we get from
plain files. In addition it was missing code to teardown the
cgroup when hot-unplugging a disk
* src/qemu/qemu_driver.c: Allow backing stores in cgroup ACLs
and add missing teardown code in unplug path
Basic live migration was broken by the commit that added
non-shared block support in two ways:
1) It added a virCheckFlags() to doNativeMigrate(). Besides
the fact that typical usage of virCheckFlags() is in driver
entry points, and doNativeMigrate() is not an entry point,
it was missing important flags like VIR_MIGRATE_LIVE. Move
the virCheckFlags to the top-level qemuDomainMigratePrepare2
and friends.
2) It also added a memory leak in qemuMonitorTextMigrate()
by not freeing the memory used by virBufferContentAndReset().
This is fixed by storing the pointer in a temporary variable
and freeing it at the end.
With this patch in place, normal live migration works again.
v3: Instead of the churn for virCheckFlagsUI and UL, instead
always promote flags to an unsigned long and always use %lx
for the fprintf.
v2: Add back flags check, which required adding virCheckFlagsUI
and virCheckFlagsUL
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Currently all host audio backends are disabled if a VM is using VNC, in
favor of the QEMU VNC audio extension. Unfortunately no released VNC
client supports this extension, so users have no way of getting audio
to work if using VNC.
Add a new config option in qemu.conf which allows changing libvirt's
behavior, but keep the default intact.
v2: Fix doc typos, change name to vnc_allow_host_audio
The device path doesn't make use of guestAddr, so the memcpy corrupts
the guest info struct.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The virDomainGetBlockInfo API allows query physical block
extent and allocated block extent. These are normally the
same value unless storing a special format like qcow2
inside a block device. In this scenario we can query QEMU
to get the actual allocated extent.
Since last time:
- Return fatal error in text monitor
- Only invoke monitor command for block devices
- Fix error handling JSON code
* src/qemu/qemu_driver.c: Fill in block aloction extent when VM
is running
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
API to query the highest block extent via info blockstats
The qemu driver contains a subtle race in the logic to find next
available vnc port. Currently it iterates through all available ports
and returns the first for which bind(2) succeeds. However it is possible
that a previously issued port has not yet been bound by qemu, resulting
in the same port used for a subsequent domain.
This patch addresses the race by using a simple bitmap to "reserve" the
ports allocated by libvirt.
V2:
- Put port bitmap in struct qemud_driver
- Initialize bitmap in qemudStartup
V3:
- Check for failure of virBitmapGetBit
- Additional check for port != -1 before calling virbitmapClearBit
V4:
- Check for failure of virBitmap{Set,Clear}Bit
We need a common internal function for starting managed domains to be
used during autostart. This patch factors out relevant code from
qemudDomainStart into qemudDomainObjStart and makes it use the
refactored code for domain restore instead of calling qemudDomainRestore
API directly.
We need to be able to restore a domain which we already locked and
started a job for it without undoing these steps. This patch factors
out internals of qemudDomainRestore into separate functions which work
for locked objects.
* src/qemu/qemu_conf.c (qemudParseHelpStr): Fix errors that made
it impossible to diagnose invalid minor and micro version number
components.
Signed-off-by: Chris Wright <chrisw@redhat.com>
The current cleanup: in StartVMDaemon path is a poor duplication.
qemuShutdownVMDaemon can handle teardown for inactive VMs, so let's use it.
v2: Remove old abort: label, only use cleanup:
* src/qemu/qemu_conf.c (QEMU_VERSION_STR_1, QEMU_VERSION_STR_2):
Define these instead of...
(QEMU_VERSION_STR): ... this. Remove definition.
(qemudParseHelpStr): Check first for the new, shorter prefix,
"QEMU emulator version", and then for the old one,
"QEMU PC emulator version" when trying to parse the version number.
Based on a patch by Chris Wright.
There doesn't seem to be anything specific to tap devices for this
array of file descriptors which need to stay open of the guest to use.
Rename then for others to make use of.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Do not require each caller of virStorageFileGetMetadata and
virStorageFileGetMetadataFromFD to first clear the storage of the
"meta" buffer. Instead, initialize that storage in
virStorageFileGetMetadataFromFD.
* src/util/storage_file.c (virStorageFileGetMetadataFromFD): Clear
"meta" here, not before each of the following callers.
* src/qemu/qemu_driver.c (qemuSetupDiskCgroup): Don't clear "meta" here.
(qemuTeardownDiskCgroup): Likewise.
* src/qemu/qemu_security_dac.c (qemuSecurityDACSetSecurityImageLabel):
Likewise.
* src/security/security_selinux.c (SELinuxSetSecurityImageLabel):
Likewise.
* src/security/virt-aa-helper.c (get_files): Likewise.
Approximately 60 messages were marked. Since these diagnostics are
intended solely for developers and maintainers, encouraging translation
is deemed to be counterproductive:
http://thread.gmane.org/gmane.comp.emulators.libvirt/25050/focus=25052
Run this command:
git grep -l VIR_WARN|xargs perl -pi -e \
's/(VIR_WARN0?)\s*\(_\((".*?")\)/$1($2/'
There were three very similar uses of qemuMonitorAddDrive.
This change makes the three 17-line sequences identical.
* src/qemu/qemu_driver.c (qemudDomainAttachPciDiskDevice): Detect
failure. Add VIR_WARN and braces.
(qemudDomainAttachSCSIDisk): Add VIR_WARN and braces.
(qemudDomainAttachUsbMassstorageDevice): Likewise.
History has shown that there are frequent bugs in the QEMU driver
code leading to the monitor being invoked with a NULL pointer.
Although the QEMU driver code should always report an error in
this case before invoking the monitor, as a safety net put in a
generic check in the monitor code entry points.
* src/qemu/qemu_monitor.c: Safety net to check for NULL monitor
object
Any method which intends to invoke a monitor command must have
a check for virDomainObjIsActive() before using the monitor to
ensure that priv->mon != NULL.
There is one subtle edge case in this though. If a method invokes
multiple monitor commands, and calls qemuDomainObjExitMonitor()
in between two of these commands then there is no guarentee that
priv->mon != NULL anymore. This is because the QEMU process may
exit or die at any time, and because qemuDomainObjEnterMonitor()
releases the lock on virDomainObj, it is possible for the background
thread to close the monitor handle and thus qemuDomainObjExitMonitor
will release the last reference allowing priv->mon to become NULL.
This affects several methods, most notably migration but also some
hotplug methods. This patch takes a variety of approaches to solve
the problem, depending on the particular usage scenario. Generally
though it suffices to add an extra virDomainObjIsActive() check
if qemuDomainObjExitMonitor() was called during the method.
* src/qemu/qemu_driver.c: Fix multiple potential NULL pointer flaws
in usage of the monitor
* src/qemu/qemu_driver.c (qemudDomainSetVcpus): Upon look-up failure,
i.e., vm==NULL, goto cleanup, rather than to "endjob", superficially
since the latter would dereference vm, but more fundamentally because
we certainly don't want to call qemuDomainObjEndJob before we've
even attempted qemuDomainObjBeginJob.
(gdb) p/x QEMUD_CMD_FLAG_VNET_HOST
$7 = 0xffffffff80000000
Oops - that meant we were incorrectly setting QEMU_CMD_FLAG_RTC_TD_HACK
for qemu-kvm-0.12.3 (and probably botching a few other settings as well).
Fixes Red Hat BZ#592070
* src/qemu/qemu_conf.h (QEMUD_CMD_FLAG_VNET_HOST): Avoid sign
extension.
* tests/qemuhelpdata/qemu-kvm-0.12.3: New file.
* tests/qemuhelptest.c (mymain): Add another case.
A fedora translator filed:
https://bugzilla.redhat.com/show_bug.cgi?id=580816
Pointing out these two error messages as unclear: "write save" sounds
like a typo without context, and lack of a colon made the second message
difficult to parse.
virFileResolveLink was returning a positive value on error,
thus confusing callers that assumed failure was < 0. The
confusion is further evidenced by callers that would have
ended up calling virReportSystemError with a negative value
instead of a valid errno.
Fixes Red Hat BZ #591363.
* src/util/util.c (virFileResolveLink): Live up to documentation.
* src/qemu/qemu_security_dac.c
(qemuSecurityDACRestoreSecurityFileLabel): Adjust callers.
* src/security/security_selinux.c
(SELinuxRestoreSecurityFileLabel): Likewise.
* src/storage/storage_backend_disk.c
(virStorageBackendDiskDeleteVol): Likewise.
qemuReadLogOutput early VM death detection is racy and won't always work.
Startup then errors when connecting to the VM monitor. This won't report
the emulator cmdline output which is typically the most useful diagnostic.
Check if the VM has died at the very end of the monitor connection step,
and if so, report the cmdline output.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=581381
* src/qemu/qemu_driver.c (qemudDomainSetVcpus): Avoid NULL-deref
upon unknown UUID. Call qemuDomainObjBeginJob(vm) only after
ensuring that vm != NULL, not before. This potential NULL-deref
was introduced by commit 2c555d87b0.
The code specifies driver->cacheDir as the format string,
but it usually doesn't contain '%s', so the subsequent
argument, "/qemu.mem.XXXXXX", is always ignored.
The patch fixes the misuse.
Setting dynamic_ownership=0 in /etc/libvirt/qemu.conf prevents
libvirt's DAC security driver from setting uid/gid on disk
files when starting/stopping QEMU, allowing the admin to manage
this manually. As a side effect it also stopped setting of
uid/gid when saving guests to a file, which completely breaks
save when QEMU is running non-root. Thus saved state labelling
code must ignore the dynamic_ownership parameter
* src/qemu/qemu_security_dac.c: Ignore dynamic_ownership=0 when
doing save/restore image labelling
When QEMU runs with its disk on NFS, and as a non-root user, the
disk is chownd to that non-root user. When migration completes
the last step is shutting down the QEMU on the source host. THis
normally resets user/group/security label. This is bad when the
VM was just migrated because the file is still in use on the dest
host. It is thus neccessary to skip the reset step for any files
found to be on a shared filesystem
* src/libvirt_private.syms: Export virStorageFileIsSharedFS
* src/util/storage_file.c, src/util/storage_file.h: Add a new
method virStorageFileIsSharedFS() to determine if a file is
on a shared filesystem (NFS, GFS, OCFS2, etc)
* src/qemu/qemu_driver.c: Tell security driver not to reset
disk labels on migration completion
* src/qemu/qemu_security_dac.c, src/qemu/qemu_security_stacked.c,
src/security/security_selinux.c, src/security/security_driver.h,
src/security/security_apparmor.c: Add ability to skip disk
restore step for files on shared filesystems.
The cgroups ACL code was only allowing the primary disk image.
It is possible to chain images together, so we need to search
for backing stores and add them to the ACL too. Since the ACL
only handles block devices, we ignore the EINVAL we get from
plain files. In addition it was missing code to teardown the
cgroup when hot-unplugging a disk
* src/qemu/qemu_driver.c: Allow backing stores in cgroup ACLs
and add missing teardown code in unplug path
QEMU is gaining a new monitor command netdev_add for hotplugging
NICs using the netdev backend code. We already support this on
the command this, though it is disabled. This adds support for
hotplug too, also to remain disabled until 0.13 QEMU is released
* src/qemu/qemu_driver.c: Support netdev hotplug for NICs
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
support for netdev_add and netdev_remove commands
When closing a monitor using qemuMonitorClose(), we are aware of
the possibility the monitor is still being used somewhere:
/* NB: ordinarily one might immediately set mon->watch to -1
* and mon->fd to -1, but there may be a callback active
* that is still relying on these fields being valid. So
* we merely close them, but not clear their values and
* use this explicit 'closed' flag to track this state */
but since we call virEventAddHandle() on that monitor without increasing
its ref counter, the monitor is still freed which makes possible users
of it quite unhappy. The unhappiness can lead to a hang if qemuMonitorIO
tries to lock mutex which no longer exists.
With the introduction of the generic qemu device model, unplugging
SCSI disks works like a charm, so support it in libvirt.
* src/qemu/qemu_driver.c: Add qemudDomainDetachSCSIDiskDevice() to do the
unplugging, extend qemudDomainDetachDeviceAdd().
Signed-off-by: Wolfgang Mauerer <wolfgang.mauerer@siemens.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Support for live migration between hosts that do not share storage was
added to qemu-kvm release 0.12.1.
It supports two flags:
-b migration without shared storage with full disk copy
-i migration without shared storage with incremental copy (same base image
shared between source and destination).
I tested the live migration without shared storage (both flags) for native
and p2p with and without tunnelling. I also verified that the fix doesn't
affect normal migration with shared storage.
WIN32 is always defined when __MINGW32__ is defined, but the
converse is not true. WIN32 is more generic, if someone were
to ever attempt porting to a microsoft compiler. This does
not affect Cygwin, which intentionally does not define WIN32.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Use more
generic flag macro.
* src/storage/storage_backend.c
(virStorageBackendUpdateVolTargetInfoFD)
(virStorageBackendRunProgRegex): Likewise.
* tools/console.h (vshRunConsole): Likewise.
This introduces a new event type
VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON
This event is the same as the previous VIR_DOMAIN_ID_IO_ERROR
event, but also includes a string describing the cause of
the event.
Thus there is a new callback definition for this event type
typedef void (*virConnectDomainEventIOErrorReasonCallback)(virConnectPtr conn,
virDomainPtr dom,
const char *srcPath,
const char *devAlias,
int action,
const char *reason,
void *opaque);
This is currently wired up to the QEMU block IO error events
* daemon/remote.c: Dispatch IO error events to client
* examples/domain-events/events-c/event-test.c: Watch for
IO error events
* include/libvirt/libvirt.h.in: Define new IO error event ID
and callback signature
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle IO error events
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for block IO errors and emit a libvirt IO error event
* src/remote/remote_driver.c: Receive and dispatch IO error
events to application
* src/remote/remote_protocol.x: Wire protocol definition for
IO error events
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c: Watch for BLOCK_IO_ERROR event
from QEMU monitor
When using -device syntax, the IO event will have a different
prefix, 'drive-' that needs to be skipped over before matching
against the libvirt disk alias
* src/qemu/qemu_driver.c: Skip QEMU_DRIVE_HOST_PREFIX in IO event
This defines the internal driver API and stubs out each driver
* src/driver.h: Define virDrvDomainGetBlockInfo signature
* src/libvirt.c, src/libvirt_public.syms: Glue public API to drivers
* src/esx/esx_driver.c, src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/phyp/phyp_driver.c,
src/test/test_driver.c, src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c, src/xenapi/xenapi_driver.c: Stub out driver
qemuDomainPCIAddressSetFree was freeing up the hash
table for the pci addresses, but not freeing up the addr
structure. Looking over the callers of this function, it
seems like they expect it to also free up the structure,
so do that here.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The previous commit changes a goto from 'endjob' to 'cleanup',
leaving the endjob label unused. Remove it to avoid compile
warning.
* src/qemu/qemu_driver.c: Remove 'endjob' label
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): When setting
"vm" to NULL, jump over vm-dereferencing code to "cleanup".
(qemuDomainRevertToSnapshot): Likewise.
In cases where the security driver failed to restore a label after a
guest has saved, we mistakenly jumped to the error cleanup paths.
This is not good, because the operation has in fact completed and
cannot be rolled back completely. Label restore is non-critical, so
just log the problem instead. Also add a missing restore call in
the error cleanup path
* src/qemu/qemu_driver.c: Fix handling of security driver
restore failures in QEMU domain save
When cgroups is enabled, access to block devices is likely to be
restricted to a whitelist. Prior to saving a guest to a block device,
it is necessary to add the block device to the whitelist. This is
not required upon restore, since QEMU reads from stdin
* src/qemu/qemu_driver.c: Add block device to cgroups whitelist
if neccessary during domain save.
The save process was relying on use of the shell >> append
operator to ensure the save data was placed after the libvirt
header + XML. This doesn't work for block devices though.
Replace this code with use of 'dd' and its 'seek' parameter.
This means that we need to pad the header + XML out to a
multiple of dd block size (in this case we choose 512).
The qemuMonitorMigateToCommand() monitor API is used for both
save/coredump, and migration via UNIX socket. We can't simply
switch this to use 'dd' since this causes problems with the
migration usage. Thus, create a dedicated qemuMonitorMigateToFile
which can accept an filename + offset, and remove the filename
from the current qemuMonitorMigateToCommand() API
* src/qemu/qemu_driver.c: Switch to qemuMonitorMigateToFile
for save and core dump
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Create
a new qemuMonitorMigateToFile, separate from the existing
qemuMonitorMigateToCommand to allow handling file offsets
It is possible to use block devices with domain save/restore. Upon
failure QEMU unlinks the path being saved to. This isn't good when
it is a block device !
* src/qemu/qemu_driver.c: Don't unlink block devices if save fails
If a transient QEMU crashes during save attempt, then the virDomainPtr
object may be freed. If a persistent QEMU crashes during save, then
the 'priv->mon' field is no longer valid since it will be inactive.
* src/qemu/qemu_driver.c: Fix two crashes when QEMU exits
during a save attempt
In particular I was forgetting to take the qemuMonitorPrivatePtr
lock (via qemuDomainObjBeginJob), which would cause problems
if two users tried to access the same domain at the same time.
This patch also fixes a problem where I was forgetting to remove
a transient domain from the list of domains.
Thanks to Stephen Shaw for pointing out the problem and testing
out the initial patch.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
With JSON qemu monitor, we get a STOP event from qemu whenever qemu
stops guests CPUs. The downside of it is that vm->state is changed to
PAUSED and a new generic paused event is send to applications. However,
when we ask qemu to stop the CPUs we are not really interested in qemu
event and we usually want to issue a more specific event.
By setting vm->status to PAUSED before actually sending the request to
qemu (and resetting it back if the request fails) we can ignore the
event since the event handler does nothing when the guest is already
paused. This solution is quite hacky but unfortunately it's the best
solution which I was able to come up with and it doesn't introduce a
race condition.
While doing some testing of the snapshot code I noticed that
if qemuDomainSnapshotLoad failed, it would print a NULL as
part of the error. That's not desirable, so leave the
full_path variable around until after we are done printing
errors.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
It's not needed and is currently ignored, but this is a bug.
It will get fixed soon and QMP will return an error for keys
it doesn't know about, this will break libvirt.
* src/qemu/qemu_monitor_json.c: remove qemuMonitorJSONCommandAddTimestamp()
and the place where it's invoked in qemuMonitorJSONMakeCommand()
* The error messages coming from qemu's DAC support contain strings
from the original SELinux security driver code. This just removes
references to "security context" and other SELinux-isms from the DAC
code.
Signed-off-by: Spencer Shimko <sshimko@tresys.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The hang fix in d376b7d63e was incomplete
since it left quite a few {Enter,Exit}Monitor calls which require driver
to be unlocked. Since the driver is locked throughout the whole
function, {Enter,Exit}MonitorWithDriver need to be used instead to
ensure driver is not locked when issuing monitor commands.
The comment in qemuDomainWaitForMigrationComplete says we are polling
every 50ms but the code sleeps only for 50us. This was already discussed
during review but apparently forgotten when the series was pushed.
The text monitor code was checking for a '\n' prefix on several
places. Previously this would work, but since the monitor code
re-write the '\n' is already stripped off, so mustn't be checked
for.
* src/qemu/qemu_monitor_text.c: Fix monitor error checking
Probably as a result of a merge error, the CPU hotplug command
names were completely wrong.
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_text.c: Fix
the CPU hotplug command names
Adds ability to provide a preferred CPU model for CPUID data decoding.
Such model would be considered as the best possible model (if it's
supported by hypervisor) regardless on number of features which have to
be added or removed for describing required CPU.
So far, when CPUID data were converted into CPU model and features, the
features can only be added to the model. As a result, when a guest asked
for something like "qemu64,-svm" it would get a qemu32 plus a bunch of
additional features instead.
This patch adds support for removing feature from the base model.
Selection algorithm remains the same: the best CPU model is the model
which requires lowest number of features to be added/removed from it.
Qemu committed a patch which list some CPU names in [] when asked for
supported CPUs (qemu -cpu ?). Yet, it needs such CPUs to be passed
without those square braces. When probing for supported CPU models, we
can just strip the square braces and pretend we have never seen them.
First, inital VCPU pinning is set correctly but then it is reset by
assigning qemu process to a new cgroup (which contains all CPUs). It's
easily fixed by swapping these two actions.
The initial boot of VMs uses -device for NICs where available. The
corresponding monitor command is device_add, but the network hotplug
code was still using device_del by mistake.
* src/qemu/qemu_driver.c: Use device_add for NIC hotplug where
available
If either of the getfd or host_net_add monitor commands return
any text, this indicates an error condition. Don't ignore this!
* src/qemu/qemu_monitor_text.c: Report errors for getfd and
host_net_add
The 'device_del' command expects a parameter called 'id' but we
were passing 'config'.
* src/qemu/qemu_monitor_json.c: Fix device_del command parameter
Disk devices in QEMU have two parts, the guest device and the host
backend driver. Historically these two parts have had the same
"unique" name. With the switch to using -device though, they now
have separate names. Thus when changing CDROM media, for guests
using -device syntax, we need to prepend the QEMU_DRIVE_HOST_PREFIX
constant
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Add helper function
qemuDeviceDriveHostAlias() for building a host backend alias
* src/qemu/qemu_driver.c: Use qemuDeviceDriveHostAlias() to determine
the host backend alias for performing eject/change commands in the
monitor
The device_add command was added in JSON mode in a way I didn't
expect. Instead of passing the normal device string to the JSON
command:
{ "execute": "device_add", "arguments": { "device": "ne2k_pci,id=nic.1,netdev=net.1" } }
We need to split up the device string into a full JSON object
{ "execute": "device_add", "arguments": { "driver": "ne2k_pci", "id": "nic.1", "netdev": "net.1" } }
* src/qemu/qemu_conf.h, src/qemu/qemu_conf.c: Rename the
qemuCommandLineParseKeywords method to qemuParseKeywords
and export it to monitor
* src/qemu/qemu_monitor_json.c: Split up device string into
a JSON object for device_add command
The parameter for the qemuMonitorDeviceDel() is a device alias,
not a device config string. Rename the parameter reflect this
and avoid confusion to readers.
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h:
Rename devicestr to devalias in qemuMonitorDeviceDel()
The QEMU developers have stated that they will not be porting
the commands 'pci_add', 'pci_del', 'usb_add', 'usb_del' to the
JSON mode monitor, since they're obsoleted by 'device_add'
and 'device_del'. libvirt has (untested) code that would have
supported those commands in theory, but since we already use
device_add/del where available, there's no need to keep the
legacy stuff anymore.
The text mode monitor keeps support for all commands for sake
of historical compatability.
* src/qemu/qemu_monitor_json.c: Remove 'pci_add', 'pci_del',
'usb_add', 'usb_del' commands
The QEMU driver is mistakenly calling directly into the text
mode monitor for the domain memory stats query.
* src/qemu/qemu_driver.c: Replace qemuMonitorTextGetMemoryStats with
qemuMonitorGetMemoryStats
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Add the new
wrapper for qemuMonitorGetMemoryStats
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h: Add
qemuMonitorJSONGetMemoryStats implementation
Instead of reporting VIR_ERR_INTERNAL_ERROR use the more specific
VIR_ERR_CONFIG_UNSUPPORTED
* src/qemu/qemu_conf.c: Report VIR_ERR_CONFIG_UNSUPPORTED for
unsupported video adapters
To avoid race-conditions, the tear down of a filter has to happen before
the tap interface disappears and another tap interface with the same
name can re-appear. This patch tries to fix this. In one place, where
communication with the qemu monitor may fail, I am only tearing the
filters down after knowing that the function did not fail.
I am also moving the tear down functions into an include file for other
drivers to reuse.
* src/qemu/qemu_driver.c (qemudDomainAttachSCSIDisk):
Initialize "cont" to NULL, so clang knows it's set.
Add an sa_assert so it knows it's non-NULL when dereferenced.
In a couple of cases typos meant we were firing the wrong type
of event. In the python code my previous commit accidentally
missed some chunks of the code.
* python/libvirt-override-virConnect.py: Add missing python glue
accidentally left out of previous commit
* src/conf/domain_event.c, src/qemu/qemu_monitor_json.c: Fix typos
in event name / method name to invoke
Currently when we attempt to change the cdrom in a qemu VM the monitor
doesn't generate an error if the target filename doesn't exist. I've
submitted a patch[1] for this. This patch is the libvirt qemu-driver
side which catches the error message from the monitor and reportes the
error to libvirt. This means that virsh attach-disk cdrom commands
won't appear to succeed when qemu change command actually failed.
* src/qemu/qemu_monitor_text.c: in qemuMonitorTextChangeMedia() look
for failure to access the new data
When qemu libvirt driver doesn't support guest CPU selection with given
qemu binary, guests requiring specific CPU should fail to start instead
of being silently supplied with a default CPU.
* src/qemu/qemu_driver.c (qemudStartVMDaemon): Initialize "logfile"
to ensure that we don't use it uninitialized -- thus closing an
arbitrary file descriptor -- in the cleanup block.
When starting up qemu VNC autoport guests, we were
only looking through ports 5900 to 6000, meaning we
were limited to 100 total clients. Increase that
limit to 65535 (the last available port), so we can
have up to 59635 VNC autoport guests.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The images are saved in /var/lib/libvirt/qemu/save/
and named $domainname.save . The directory is created appropriately
at daemon startup. When a domain is started while a saved image is
available, libvirt will try to load this saved image, and start the
domain as usual in case of failure. In any case the saved image is
discarded once the domain is created.
* src/qemu/qemu_conf.h: adds an extra save path to the driver config
* src/qemu/qemu_driver.c: implement the 3 new operations and handling
of the image directory
virDomainManagedSave() is to be run on a running domain. Once the call
complete, as in virDomainSave() the domain is stopped upon completion,
but there is no restore counterpart as any order to start the domain
from the API would load the state from the managed file, similary if
the domain is autostarted when libvirtd starts.
Once a domain has restarted his managed save image is destroyed,
basically managed save image can only exist for a stopped domain,
for a running domain that would be by definition outdated data.
* include/libvirt/libvirt.h.in src/libvirt.c src/libvirt_public.syms:
adds the new entry points virDomainManagedSave(),
virDomainHasManagedSaveImage() and virDomainManagedSaveRemove()
* src/driver.h src/esx/esx_driver.c src/lxc/lxc_driver.c
src/opennebula/one_driver.c src/openvz/openvz_driver.c
src/phyp/phyp_driver.c src/qemu/qemu_driver.c src/vbox/vbox_tmpl.c
src/remote/remote_driver.c src/test/test_driver.c src/uml/uml_driver.c
src/xen/xen_driver.c: add corresponding new internal drivers entry
points
The clock timer XML is being updated in the following ways (based on
further off-list discussion that was missed during the initial
implementation):
1) 'wallclock' is changed to 'track', and the possible values are 'boot'
(corresponds to old 'host'), 'guest', and 'wall'.
2) 'mode' has an additional value 'smpsafe'
3) when tickpolicy='catchup', there can be an optional sub-element of
timer called 'catchup':
<catchup threshold=123 slew=120 limit=10000/>
Those three values are all longs, always optional, and if they are present,
they are positive. Internally, 0 indicates "unspecified".
* docs/schemas/domain.rng: updated RNG definition to account for changes
* src/conf/domain_conf.h: change the C struct and enums to match changes.
* src/conf/domain_conf.c: timer parse and format functions changed to
handle the new selections and new element.
* src/libvirt_private.syms: *TimerWallclock* changes to *TimerTrack*
* src/qemu/qemu_conf.c: again, account for Wallclock --> Track change.
virFileReadLimFD is a poor fit for reading the header
of the restore file. The problem is that virFileReadLimFD
returns an error when there is more data after the amount
you ask to read, but that is *expected* in this case.
This patch is essentially a revert of
1a4d5c9543, but I don't think
that commit does what it says anyway. It purports to prevent
an unwarranted OOM error, but since virFileReadLimFD will
allocate memory up to the maximum anyway, the upper limit
on the total amount of memory allocated is the same for either
the old version or the new version. Since the old saferead
actually works and virFileReadLimFD does not, revert to
using saferead.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
When a watchdog/IO error occurs, one of the possible actions that
QEMU might take is to pause the guest. In this scenario libvirt
needs to update its internal state for the VM, and emit a
lifecycle event:
VIR_DOMAIN_EVENT_SUSPENDED
with a detail being one of:
VIR_DOMAIN_EVENT_SUSPENDED_IOERROR
VIR_DOMAIN_EVENT_SUSPENDED_WATCHDOG
To future proof against possible QEMU support for multiple monitor
consoles, this patch also hooks into the 'STOPPED' event in QEMU
and emits a generic VIR_DOMAIN_EVENT_SUSPENDED_PAUSED event
* include/libvirt/libvirt.h.in: Add VIR_DOMAIN_EVENT_SUSPENDED_IOERROR
* src/qemu/qemu_driver.c: Update VM state to paused when IO error
or watchdog events occurrs
* src/qemu/qemu_monitor_json.c: Fix typo in disk IO event name
virStrToLong* guarantees (via strtol) that the end pointer will be set
to the point at which parsing stopped (even on failure, this point is
the start of the input string).
* src/esx/esx_driver.c (esxGetVersion): Remove pointless
conditional.
* src/qemu/qemu_conf.c (qemuParseCommandLinePCI)
(qemuParseCommandLineUSB, qemuParseCommandLineSmp): Likewise.
* src/qemu/qemu_monitor_text.c
(qemuMonitorTextGetMigrationStatus): Likewise.
Since the timers are defined to cover all possible config cases for
several different hypervisors, many of these possibilities generate an
error on qemu. Here is what is currently supported:
RTC: If the -rtc commandline option is available, allow setting
"clock=host"
or "clock=vm" based on the rtc timer clock='host|guest' value. Also
add "driftfix=slew" if the tickpolicy is 'catchup', or add nothing
if
tickpolicy is 'delay'. (Other tickpolicies will raise an error).
If -rtc isn't available, but -rtc-td-hack is, add that option
if the tickpolicy is 'catchup', add -rtc-td-hack, if it is 'delay'
add nothing, and if it's anything else, raise an error.
PIT: If -no-kvm-pit-reinjection is available, and tickpolicy is
'delay', add that option. if tickpolicy is 'catchup', do
nothing. Anything else --> raise an error.
If -no-kvm-pit-reinjection *isn't* available, but -tdf is, when
tickpolicy is 'catchup' add -tdf. If it's 'delay', do
nothing. Anything else --> raise an error.
If neither of those commandline options is available, and
tickpolicy is anything other than 'delay' (or unspecified), raise
an error.
HPET: If -no-hpet flag is available and present='no', add -no-hpet.
If -no-hpet is not available, and present='yes', raise an error.
If present is unspecified, the default is to do whatever this
particular qemu does by default, so don't raise an error.
All other timer types are unsupported by QEMU, so they will raise an
error.
* src/qemu/qemu_conf.c: extend qemuBuildClockArgStr() to generate the
command line arguments for the new options
* src/qemu/qemu_conf.h: define 4 new flags
* src/qemu/qemu_conf.c: check the help text of qemu for presence of
features indicated by each flag.
* tests/qemuhelptest.c: add appropriate flags into the masks for each test
The QEMU cpu affinity is used in NUMA scenarios to ensure that
guest memory is allocated from a specific node. Normally memory
is allocate on demand in vCPU threads, but when using hugepages
the initial thread leader allocates memory upfront. libvirt was
not setting affinity of the thread leader, or I/O threads. This
patch changes the code to set the process affinity in between
the fork()/exec() of QEMU. This ensures that every single QEMU
thread gets the affinity
* src/qemu/qemu_driver.c: Set affinity on entire QEMU process
at startup
Right now this implements only 2 basic hooks:
- before the qemu process is being launched
- after the qemu process is terminated
the XML description of the domain is passed to the hook script stdin
/etc/libvirt/hook/qemu
* src/qemu/qemu_driver.c: implement synchronous script hooks for QEmu
at domain startup and end
This flag is used in migration prepare step to send updated XML
definition of a guest.
Also ``virsh dumpxml --update-cpu [--inactive] guest'' command can be
used to see the updated CPU requirements.
When a domain is defined on host1, migrated to host2 and then migrated
back to host1, its current configuration would overwrite the libvirtd's
in-memory copy of persistent configuration of that domain. This is not
desired as we want to preserve the persistent configuration untouched.
This patch introduces new 'live' parameter to virDomainAssignDef.
Passing 'true' for 'live' means the configuration passed to
virDomainAssignDef describes a configuration of live instance of the
domain. This applies for saved domains which are being restored or for
incoming domains during migration.
All callers have been changed to pass the appropriate value.
* Fixes per feedback from Dan and Daniel
* Added test datafiles
* Re-disabled JSON flags
* Added code to print the error policy attribute when generating XML
* Re-add empty tag
Add support for Qemu to have firewall rules applied and removed on VM
startup and shutdown respectively. This patch also provides support for
the updating of a filter that causes all VMs that reference the filter
to have their ebtables/iptables rules updated.
Signed-off-by: Stefan Berger <stefanb@us.ibm.com>
To find out where the net type 'direct' needs to be handled I introduced
the 'enum virDomainNetType' in the virDomainNetDef structure and let the
compiler tell me where the case statement is missing. Then I added the
unhandled device statement to the UML driver.
* src/conf/domain_conf.h: change _virDomainNetDef type from int to
virDomainNetType enum
* src/conf/domain_conf.c src/lxc/lxc_driver.c src/qemu/qemu_conf.c
src/uml/uml_conf.c: make sure all enum cases are properly handled
in switches