When a disk is on a root squashed NFS server, it may not be
possible to stat() the disk file in virCgroupAllowDevice.
The virStorageFileGetMeta method may also fail to extract
the parent backing store. Both of these errors have to be
ignored to avoid breaking NFS deployments
* src/qemu/qemu_driver.c: Ignore errors in cgroup setup to
keep root squash NFS happy
https://bugzilla.redhat.com/show_bug.cgi?id=589465
Some guests (eg with badly configured grub, or Windows' installation cd)
require quick response from the console user. That's why we have a
"launchPaused" option in vdsm.
To implement it via libvirt, we need to ask libvirt not to call
qemuMonitorStartCPUs() after starting qemu. Calling virDomainStop
immediately after the domain is up is inherently raceful.
* src/qemu/qemu_driver.c (qemudStartVMDaemon): Add new parameter;
all callers adjusted.
(qemudDomainCreate): Implement support for new flag.
* This patch is a modification of a patch submitted by Nigel Jones.
It fixes several memory leaks on device addition/removal:
1. Free the virNodeDeviceDefPtr in udevAddOneDevice if the return
value is non-zero
2. Always release the node device reference after the device has been
processed.
* Refactored for better readability per the suggestion of clalance
A look at the QEMU source revealed the missing bits of info about
the VPC file format, so we can enable this now
* src/util/storage_file.c: Enable VPC format, providing version
and disk size offset fields
When an attempt to hotplug a PCI device to a guest fails,
the device was left attached to pci-stub. It is neccessary
to reset the device and then attach it to the host driver
again.
* src/qemu/qemu_driver.c: Reattach PCI device to host if
hotadd fails
The restore code is done in places where errors cannot be
raised, since they will overwrite over pre-existing errors.
* src/security/security_selinux.c: Only warn about failures
in label restore, don't report errors
Any output at all from device_add indicates an error in the
command execution. Thus it needs to check for reply != ""
* src/qemu/qemu_monitor_text.c: Fix reply check for errors
to treat any output as an error
HAL is deprecated and UDEV is the future. Thus if both
options are compiled, we should prefer use of UDEV over
HAL
* src/node_device/node_device_driver.c: Switch init
order to try UDEV first, then HAL
When SELinux is running in MLS mode, libvirtd will have a
different security level to the VMs. For libvirtd to be
able to connect to the monitor console, the client end of
the UNIX domain socket needs a different label. This adds
infrastructure to set the socket label via the security
driver framework
* src/qemu/qemu_driver.c: Call out to socket label APIs in
security driver
* src/qemu/qemu_security_stacked.c: Wire up socket label
drivers
* src/security/security_driver.h: Define security driver
entry points for socket labelling
* src/security/security_selinux.c: Set socket label based on
VM label
The network driver is not doing correct checking for
duplicate UUID/name values. This introduces a new method
virNetworkObjIsDuplicate, based on the previously
written virDomainObjIsDuplicate.
* src/conf/network_conf.c, src/conf/network_conf.c,
src/libvirt_private.syms: Add virNetworkObjIsDuplicate,
* src/network/bridge_driver.c: Call virNetworkObjIsDuplicate
for checking uniqueness of uuid/names
The storage pool driver is mistakenly using the error code
VIR_ERR_INVALID_STORAGE_POOL which is for diagnosing invalid
pointers. This patch switches it to use VIR_ERR_NO_STORAGE_POOL
which is the correct code for cases where the storage pool does
not exist
* src/storage/storage_driver.c: Replace VIR_ERR_INVALID_STORAGE_POOL
with VIR_ERR_NO_STORAGE_POOL
The storage pool driver is not doing correct checking for
duplicate UUID/name values. This introduces a new method
virStoragePoolObjIsDuplicate, based on the previously
written virDomainObjIsDuplicate.
* src/conf/storage_conf.c, src/conf/storage_conf.c,
src/libvirt_private.syms: Add virStoragePoolObjIsDuplicate,
* src/storage/storage_driver.c: Call virStoragePoolObjIsDuplicate
for checking uniqueness of uuid/names
The domain parsing code would auto-add a virtio serial controller
if it saw any virtio serial channel defined. Unfortunately it
always added a controller with index=0, even if the channel address
specified an index != 0. It only added one controller, even if
multiple controllers were referenced by channels. Finally, it let
the ports+vectors parameters initialize to zero instead of -1, which
prevented the controllers accepting any ports.
* src/conf/domain_conf.c: Initialize ports+vectors when adding
virtio serial controllers. Add all neccessary virtio serial
controllers, instead of hardcoding controller 0
* qemuxml2argvdata/qemuxml2argv-channel-virtio.args,
qemuxml2argvdata/qemuxml2argv-channel-virtio.xml: Expand to
test controller auto-add behaviour
To ensure that the device addressing scheme is stable across
hotplug/unplug, all virtio serial channels needs to have an
associated port number in their address. This is then specified
to QEMU using the nr=NNN parameter
* src/conf/domain_conf.c, src/conf/domain_conf.h: Parsing
for port number in vioserial address types.
* src/qemu/qemu_conf.c: Set 'nr=NNN' parameter with virtio
serial port number
* tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.args,
tests/qemuxml2argvdata/qemuxml2argv-channel-virtio.xml: Expand
data set to ensure coverage of port addressing
QEMU upstream decided against adding a 'reason' field to
the block IO event in QMP. Disable this code to remove a
annoying warning message. It will be renabled when the
error string reason is re-introduced in QEMU
Adjust args to qemudStartVMDaemon() to also specify path to stdin_fd,
so this can be passed to the AppArmor driver via SetSecurityAllLabel().
This updates all calls to qemudStartVMDaemon() as well as setting up
the non-AppArmor security driver *SetSecurityAllLabel() declarations
for the above. This is required for the following
"apparmor-fix-save-restore" patch since AppArmor resolves the passed
file descriptor to the pathname given to open().
See https://bugzilla.redhat.com/show_bug.cgi?id=599091
Saving a paused 512MB domain took 3m47s with the old block size of 512
bytes. Changing the block size to 1024*1024 decreased the time to 56
seconds. (Doubling again to 2048*1024 yielded 0 improvement; lowering
to 512k increased the save time to 1m10s, about 20%)
The pointer to the xml describing the domain is saved into an object
prior to calling VIR_REALLOC_N() to make the size of the memory it
points to a multiple of QEMU_MONITOR_MIGRATE_TO_FILE_BS. If that
operation needs to allocate new memory, the pointer that was saved is
no longer valid.
To avoid this situation, adjust the size *before* saving the pointer.
(This showed up when experimenting with very large values of
QEMU_MONITOR_MIGRATE_TO_FILE_BS).
Fixes for issues in commit 211dd1e9 noted by by Jim Meyering.
1. Allocate content buffer of size content_length + 1 to ensure
NUL-termination.
2. Limit content buffer size to 64k
3. Fix whitespace issue
V2:
- Add comment to clarify allocation of content buffer
- Add ATTRIBUTE_NONNULL where appropriate
- User NULLSTR macro
There are cases when a response from xend can exceed 4096 bytes, in
which case anything beyond 4096 is ignored. This patch changes the
current fixed-size, stack-allocated buffer to a dynamically allocated
buffer based on Content-Length in HTTP header.
* It appears that the udev event for HBA creation arrives before the
associated sysfs data is fully populated, resulting in bogus data
for the nodedev entry until the entry is refreshed. This problem is
particularly troublesome when creating NPIV vHBAs because it results
in libvirt failing to find the newly created adapter and waiting for
the full timeout period before erroneously failing the create
operation. This patch forces an update before any attempt to use
any scsi_host nodedev entry.
This patch that adds support for configuring 802.1Qbg and 802.1Qbh
switches. The 802.1Qbh part has been successfully tested with real
hardware. The 802.1Qbg part has only been tested with a (dummy)
server that 'behaves' similarly to how we expect lldpad to 'behave'.
The following changes were made during the development of this patch:
- Merging Scott's v13-pre1 patch
- Fixing endptr related bug while using virStrToLong_ui() pointed out
by Jim Meyering
- Addressing Jim Meyering's comments to v11
- requiring mac address to the vpDisassociateProfileId() function to
pass it further to the 802.1Qbg disassociate part (802.1Qbh untouched)
- determining pid of lldpad daemon by reading it from /var/run/libvirt.pid
(hardcode as is hardcode alson in lldpad sources)
- merging netlink send code for kernel target and user space target
(lldpad) using one function nlComm() to send the messages
- adding a select() after the sending and before the reading of the
netlink response in case lldpad doesn't respond and so we don't hang
- when reading the port status, in case of 802.1Qbg, no status may be
received while things are 'in progress' and only at the end a status
will be there.
- when reading the port status, use the given instanceId and vf to pick
the right IFLA_VF_PORT among those nested under IFLA_VF_PORTS.
- never sending nor parsing IFLA_PORT_SELF type of messages in the
802.1Qbg case
- iterating over the elements in a IFLA_VF_PORTS to pick the right
IFLA_VF_PORT by either IFLA_PORT_PROFILE and given profileId
(802.1Qbh) or IFLA_PORT_INSTANCE_UUID and given instanceId (802.1Qbg)
and reading the current status in IFLA_PORT_RESPONSE.
- recycling a previous patch that adds functionality to interface.c to
- get the vlan identifier on an interface
- get the flags of an interface and some convenience function to
check whether an interface is 'up' or not (not currently used here)
- adding function to determine the root physical interface of an
interface. For example if a macvtap is linked to eth0.100, it will
find eth0. Also adding a function that finds the vlan on the 'way to
the root physical interface'
- conveying the root physical interface name and index in case of 802.1Qbg
- conveying mac address of macvlan device and vlan identifier in
IFLA_VFINFO_LIST[ IFLA_VF_INFO[ IFLA_VF_MAC(mac), IFLA_VF_VLAN(vlan) ] ]
to (future) lldpad via netlink
- To enable build with --without-macvtap rename the
[dis|]associatePortProfileId functions, prepend 'vp' before their
name and make them non-static functions.
- Renaming variable multicast to nltarget_kernel and inverting
the logic
- Addressing Jim Meyering's comments; this also touches existing
code for example for correcting indentation of break statements or
simplification of switch statements.
- Renamed occurrencvirVirtualPortProfileDef to virVirtualPortProfileParamses
- 802.1Qbg part prepared for sending a RTM_SETLINK and getting
processing status back plus a subsequent RTM_GETLINK to
get IFLA_PORT_RESPONSE.
Note: This interface for 802.1Qbg may still change
- [David Allan] move getPhysfn inside IFLA_VF_PORT_MAX to avoid
compiler
warning when latest if_link.h isn't available
- move from Stefan's 802.1Qb{g|h} XML v8 to v9
- move hostuuid and vf index calcs to inside doPortProfileOp8021Qbh
- remove debug fprintfs
- use virGetHostUUID (thanks Stefan!)
- fix compile issue when latest if_link.h isn't available
- change poll timeout to 10s, at 1/8 intervals
- if polling times out, log msg and return -ETIMEDOUT
- Add Stefan's code for getPortProfileStatus
- Poll for up to 2 secs for port-profile status, at 1/8 sec intervals:
- if status indicates error, abort openMacvtapTap
- if status indicates success, exit polling
- if status is "in-progress" after 2 secs of polling, exit
polling loop silently, without error
My patch finishes out the 802.1Qbh parts, which Stefan had mostly complete.
I've tested using the recent kernel updates for VF_PORT netlink msgs and
enic for Cisco's 10G Ethernet NIC. I tested many VMs, each with several
direct interfaces, each configured with a port-profile per the XML. VM-to-VM,
and VM-to-external work as expected. VM-to-VM on same host (using same NIC)
works same as VM-to-VM where VMs are on diff hosts. I'm able to change
settings on the port-profile while the VM is running to change the virtual
port behaviour. For example, adjusting a QoS setting like rate limit. All
VMs with interfaces using that port-profile immediatly see the effect of the
change to the port-profile.
I don't have a SR-IOV device to test so source dev is a non-SR-IOV device,
but most of the code paths include support for specifing the source dev and
VF index. We'll need to complete this by discovering the PF given the VF
linkdev. Once we have the PF, we'll also have the VF index. All this info-
mation is available from sysfs.
Fedora bug https://bugzilla.redhat.com/show_bug.cgi?id=598272
Some files under /sys/bus/usb/devices/ have the format 'usbX', where
X is the USB bus number. Use STRPREFIX to correctly parse the bus numbers.
If a directory pool contains pipes or sockets, a pool start can fail or hang:
https://bugzilla.redhat.com/show_bug.cgi?id=589577
We already try to avoid these special files, but only attempt after
opening the path, which is where the problems lie. Unify volume opening
into helper functions, which use the proper open() flags to avoid error,
followed by fstat to validate storage mode.
Previously, virStorageBackendUpdateVolTargetInfoFD attempted to enforce the
storage mode check, but allowed callers to detect this case and silently
continue. In practice, only the FS backend was using this feature, the rest
were treating unknown mode as an error condition. Unfortunately the InfoFD
function wasn't raising an error message here, so error reporting was
busted.
This patch adds 2 functions: virStorageBackendVolOpen, and
virStorageBackendVolOpenModeSkip. The latter retains the original opt out
semantics, the former now throws an explicit error.
This patch maintains the previous volume mode checks: allowing specific
modes for specific pool types requires a bit of surgery, since VolOpen
is called through several different helper functions.
v2: Use ATTRIBUTE_NONNULL. Drop stat check, just open with
O_NONBLOCK|O_NOCTTY.
v3: Move mode check logic back to VolOpen. Use 2 VolOpen functions with
different error semantics.
v4: Make second VolOpen function more extensible. Didn't opt to change
FS backend defaults, this can just be to fix the original bug.
v5: Prefix default flags with VIR_, use ATTRIBUTE_RETURN_CHECK
Since the macvtap device needs active tear-down and the teardown logic
is based on the interface name, it can happen that if for example 1 out
of 3 interfaces was successfully created, that during the failure path
the macvtap's target device name is used to tear down an interface that
is doesn't own (owned by another VM).
So, in this patch, the target interface name is reset so that there is
no target interface name and the interface name is always cleared after
a tear down.
* If a nodedev has a parent that we don't want to display, we should
continue walking up the udev device tree to see if any of its
earlier ancestors are devices that we display. It makes the tree
much nicer looking than having a whole lot of devices hanging off
the root node.
These files are borrowed from upstream release versions, and should
not need further edits in the context of libvirt (instead, a new
upstream vbox release would entail adding a new header file). We do
not re-generate these files as part of libvirt, nor do we want to lose
our minor edits (such as cppi cleanups).
* src/vbox/vbox_CAPI_v2_2.h: Clarify file origins.
* src/vbox/vbox_CAPI_v3_0.h: Likewise.
* src/vbox/vbox_CAPI_v3_1.h: Likewise.
* src/vbox/vbox_CAPI_v3_2.h: Likewise. Reindent with cppi.
Fedora bug https://bugzilla.redhat.com/show_bug.cgi?id=235961
If using the default virtual network, an easy way to lose guest network
connectivity is to install libvirt inside the VM. The autostarted
default network inside the guest collides with host virtual network
routing. This is a long standing issue that has caused users quite a
bit of pain and confusion.
On network startup, parse /proc/net/route and compare the requested
IP+netmask against host routing destinations: if any matches are found,
refuse to start the network.
v2: Drop sscanf, fix a comment typo, comment that function could use
libnl instead of /proc
v3: Consider route netmask. Compare binary data rather than convert to
string.
v4: Return to using sscanf, drop inet functions in favor of virSocket,
parsing safety checks. Don't make parse failures fatal, in case
expected format changes.
v5: Try and continue if we receive unexpected. Delimit parsed lines to
prevent scanning past newline
'listen' isn't a valid qemu-dm option, as reported a long time ago here:
https://bugzilla.redhat.com/show_bug.cgi?id=492958
Matches the near identical logic in qemu_conf.c
v2: When parsing sexpr, only match on ",server", rather than
full ',server,nowait'.
* Incorporated Jim's feedback (v1 & v2)
* Moved case of DEVTYPE == "wlan" up as it's definitive that we have a network interface.
* Made comment more detailed about the wired case to explain better
how it differentiates between wired network interfaces and USB
devices.
Eliminate almost all backward jumps by replacing this common pattern:
int
some_random_function(void)
{
int result = 0;
...
cleanup:
<unconditional cleanup code>
return result;
failure:
<cleanup code in case of an error>
result = -1;
goto cleanup
}
with this simpler pattern:
int
some_random_function(void)
{
int result = -1;
...
result = 0;
cleanup:
if (result < 0) {
<cleanup code in case of an error>
}
<unconditional cleanup code>
return result;
}
Add a bool success variable in functions that don't have a int result
that can be used for the new pattern.
Also remove some unnecessary memsets in error paths.
The hotplug methods still had the qemuCmdFlags variable declared
as an int, instead of unsigned long long. This caused flag checks
to be incorrect for flags > 31
* src/qemu/qemu_driver.c: Fix integer overflow in hotplug
This allows libvirt to open the PCI device sysfs config file prior
to dropping privileges so qemu can access the full config space.
Without this, a de-privileged qemu can only access the first 64
bytes of config space.
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Detect support
for pci-assign.configfd option. Use this option when formatting
PCI device string if possible
* src/qemu/qemu_driver.c: Pre-open PCI sysfs config file and pass
to QEMU
We've been running into a lot of situations where
virGetHostname() is returning "localhost", where a plain
gethostname() would have returned the correct thing. This
is because virGetHostname() is *always* trying to canonicalize
the name returned from gethostname(), even when it doesn't
have to.
This patch changes virGetHostname so that if the value returned
from gethostname() is already FQDN or localhost, it returns
that string directly. If the value returned from gethostname()
is a shortened hostname, then we try to canonicalize it. If
that succeeds, we returned the canonicalized hostname. If
that fails, and/or returns "localhost", then we just return
the original string we got from gethostname() and hope for
the best.
Note that after this patch it is up to clients to check whether
"localhost" is an allowed return value. The only place
where it's currently not is in qemu migration.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The virVirtualPortProfileFormat just went below the
virVirtualPortProfileParamsParseXML function and got inside the
The attached patch moves virVirtualPortProfileFormat below the #ifndef
PROXY block.
Allows listing existing pools and requesting information about them.
Alter the esxVI_ProductVersion enum in a way that allows to check for
product type by masking.
This patch parses the following two XML descriptions, one for
802.1Qbg and one for 802.1Qbh, and stores the data internally.
The actual triggering of the switch setup protocol has not been
implemented here but the relevant code to do that should go into
the functions associatePortProfileId() and disassociatePortProfileId().
<interface type='direct'>
<source dev='eth0.100' mode='vepa'/>
<model type='virtio'/>
<virtualport type='802.1Qbg'>
<parameters managerid='12' typeid='0x123456' typeidversion='1'
instanceid='fa9b7fff-b0a0-4893-8e0e-beef4ff18f8f'/>
</virtualport>
<filterref filter='clean-traffic'/>
</interface>
<interface type='direct'>
<source dev='eth0.100' mode='vepa'/>
<model type='virtio'/>
<virtualport type='802.1Qbh'>
<parameters profileid='my_profile'/>
</virtualport>
</interface>
I'd suggest to use this patch as a base for triggering the setup
protocol with the 802.1Qb{g|h} switch.
Several rounds of changes were made to this patch. The
following is a list of these changes.
- Renamed structure virVirtualPortProfileDef to virVirtualPortProfileParams
as per Daniel Berrange's request
- Addressing Daniel Berrange's comments:
- removing macvtap.h's dependency on domain_conf.h by
moving the virVirtualPortProfileDef structure into macvtap.h
and not passing virtDomainNetDefPtr to any functions in
macvtap.c
- Addressed most of Chris Wright's comments:
- indicating error in case virtualport XML node cannot be parsed
properly
- parsing hex and decimal numbers using virStrToLong_ui() with
parameter '0' for base
- tgifname (target interface name) variable wasn't necessary
to pass to openMacvtapTap function anymore
- assigning the virtual port data structure to the virDomainNetDef
only if it was previously parsed
- make sure that the error code returned by openMacvtapTap() is a negative n
in case the associatePortProfileId() function failed.
- renaming vsi in the XML to virtualport
- replace all occurrences of vsi in the source as well
- removing mode and MAC address parameters from the functions that
will communicate with the hareware diretctly or indirectly
- moving the associate and disassociate functions to the end of the
file for subsequent patches to easier make them generally available
for export
- passing the macvtap interface name rather than the link device since
this otherwise gives funny side effects when using netlink messages
where IFLA_IFNAME and IFLA_ADDRESS are specified and the link dev
all of a sudden gets the MAC address of the macvtap interface.
- Removing rc = -1 error indications in the case of 802.1Qbg|h setup in case
we wanted to use hook scripts for the setup and so the setup doesn't fail
here.
- if instance ID UUID is not supplied it will automatically be generated
- adapted schema to make instance ID UUID optional
- added test case
- parser and XML generator have been separated into their own
functions so they can be re-used elsewhere (passthrough case
for example)
- Adapted XML parser and generator support the above shown type
(802.1Qbg, 802.1Qbh).
- Adapted schema to above XML
- Adapted test XML to above XML
- Passing through the VM's UUID which seems to be necessary for
802.1Qbh -- sorry no host UUID
- adding virtual function ID to association function, in case it's
necessary to use (for SR-IOV)
This patch introduces a dependency on libnl, which subsequent patches
will then use.
Changes from V1 to V2:
- added diffstats
- following changes in tree
Spurious / in a pool target path makes life difficult for apps using the
GetVolByPath, and doing other path based comparisons with pools. This
has caused a few issues for virt-manager users:
https://bugzilla.redhat.com/show_bug.cgi?id=494005https://bugzilla.redhat.com/show_bug.cgi?id=593565
Add a new util API which removes spurious /, virFileSanitizePath. Sanitize
target paths when parsing pool XML, and for paths passed to GetVolByPath.
v2: Leading // must be preserved, properly sanitize path=/, sanitize
away /./ -> /
v3: Properly handle starting ./ and ending /.
v4: Drop all '.' handling, just sanitize / for now.
Allow for a host UUID in the capabilities XML. Local drivers
will initialize this from the SMBIOS data. If a sanity check
shows SMBIOS uuid is invalid, allow an override from the
libvirtd.conf configuration file
* daemon/libvirtd.c, daemon/libvirtd.conf: Support a host_uuid
configuration option
* docs/schemas/capability.rng: Add optional host uuid field
* src/conf/capabilities.c, src/conf/capabilities.h: Include
host UUID in XML
* src/libvirt_private.syms: Export new uuid.h functions
* src/lxc/lxc_conf.c, src/qemu/qemu_driver.c,
src/uml/uml_conf.c: Set host UUID in capabilities
* src/util/uuid.c, src/util/uuid.h: Support for host UUIDs
* src/node_device/node_device_udev.c: Use the host UUID functions
* tests/confdata/libvirtd.conf, tests/confdata/libvirtd.out: Add
new host_uuid config option to test
The cgroups ACL code was only allowing the primary disk image.
It is possible to chain images together, so we need to search
for backing stores and add them to the ACL too. Since the ACL
only handles block devices, we ignore the EINVAL we get from
plain files. In addition it was missing code to teardown the
cgroup when hot-unplugging a disk
* src/qemu/qemu_driver.c: Allow backing stores in cgroup ACLs
and add missing teardown code in unplug path
Basic live migration was broken by the commit that added
non-shared block support in two ways:
1) It added a virCheckFlags() to doNativeMigrate(). Besides
the fact that typical usage of virCheckFlags() is in driver
entry points, and doNativeMigrate() is not an entry point,
it was missing important flags like VIR_MIGRATE_LIVE. Move
the virCheckFlags to the top-level qemuDomainMigratePrepare2
and friends.
2) It also added a memory leak in qemuMonitorTextMigrate()
by not freeing the memory used by virBufferContentAndReset().
This is fixed by storing the pointer in a temporary variable
and freeing it at the end.
With this patch in place, normal live migration works again.
v3: Instead of the churn for virCheckFlagsUI and UL, instead
always promote flags to an unsigned long and always use %lx
for the fprintf.
v2: Add back flags check, which required adding virCheckFlagsUI
and virCheckFlagsUL
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Currently all host audio backends are disabled if a VM is using VNC, in
favor of the QEMU VNC audio extension. Unfortunately no released VNC
client supports this extension, so users have no way of getting audio
to work if using VNC.
Add a new config option in qemu.conf which allows changing libvirt's
behavior, but keep the default intact.
v2: Fix doc typos, change name to vnc_allow_host_audio
The device path doesn't make use of guestAddr, so the memcpy corrupts
the guest info struct.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The virDomainGetBlockInfo API allows query physical block
extent and allocated block extent. These are normally the
same value unless storing a special format like qcow2
inside a block device. In this scenario we can query QEMU
to get the actual allocated extent.
Since last time:
- Return fatal error in text monitor
- Only invoke monitor command for block devices
- Fix error handling JSON code
* src/qemu/qemu_driver.c: Fill in block aloction extent when VM
is running
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
API to query the highest block extent via info blockstats
* src/lxc/lxc_driver.c (lxcSetSchedulerParameters): Ensure that
"->field" is "cpu_shares" before possibly giving a diagnostic about
a type for a "cpu_shares" value.
Also, virCgroupSetCpuShares could fail without evoking a diagnostic.
Add one.
Volume detection in the scsi backend was duplicating code already
present in storage_backend.c. Let's drop the duplicate code.
Also, change the shared function name to be less generic, and remove
some error squashing in the other call site.
We were squashing error messages in a few cases. Recode to follow common
ret = -1 convention.
v2: Handle more error squashing issues further up in MakeNewVol and
CreateVols. Use ret = -1 convention in MakeVols.
The qemu driver contains a subtle race in the logic to find next
available vnc port. Currently it iterates through all available ports
and returns the first for which bind(2) succeeds. However it is possible
that a previously issued port has not yet been bound by qemu, resulting
in the same port used for a subsequent domain.
This patch addresses the race by using a simple bitmap to "reserve" the
ports allocated by libvirt.
V2:
- Put port bitmap in struct qemud_driver
- Initialize bitmap in qemudStartup
V3:
- Check for failure of virBitmapGetBit
- Additional check for port != -1 before calling virbitmapClearBit
V4:
- Check for failure of virBitmap{Set,Clear}Bit
V2:
- Move bitmap impl to src/util/bitmap.[ch]
- Use CHAR_BIT instead of explicit '8'
- Use size_t instead of unsigned int
- Fix calculation of bitmap size in virBitmapAlloc
- Ensure bit is within range of map in the set, clear, and get
operations
- Use bool in virBitmapGetBit
- Add virBitmapFree to free-like funcs in cfg.mk
V3:
- Check for overflow in virBitmapAlloc
- Fix copy and paste bug in virBitmapAlloc
- Use size_t in prototypes
- Add ATTRIBUTE_NONNULL in prototypes where appropriate
and remove NULL check from impl
V4:
- Add ATTRIBUTE_RETURN_CHECK in prototypes where appropriate.
We shouldn't be checking validity in domain_conf, since
it can be used by multiple different hosts and hypervisors.
Remove the check completely.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
We need a common internal function for starting managed domains to be
used during autostart. This patch factors out relevant code from
qemudDomainStart into qemudDomainObjStart and makes it use the
refactored code for domain restore instead of calling qemudDomainRestore
API directly.
We need to be able to assign new def to an existing virDomainObj which
is already locked. This patch factors out the relevant code from
virDomainAssignDef into virDomainObjAssignDef.
We need to be able to restore a domain which we already locked and
started a job for it without undoing these steps. This patch factors
out internals of qemudDomainRestore into separate functions which work
for locked objects.
* src/qemu/qemu_conf.c (qemudParseHelpStr): Fix errors that made
it impossible to diagnose invalid minor and micro version number
components.
Signed-off-by: Chris Wright <chrisw@redhat.com>
The current cleanup: in StartVMDaemon path is a poor duplication.
qemuShutdownVMDaemon can handle teardown for inactive VMs, so let's use it.
v2: Remove old abort: label, only use cleanup:
* src/qemu/qemu_conf.c (QEMU_VERSION_STR_1, QEMU_VERSION_STR_2):
Define these instead of...
(QEMU_VERSION_STR): ... this. Remove definition.
(qemudParseHelpStr): Check first for the new, shorter prefix,
"QEMU emulator version", and then for the old one,
"QEMU PC emulator version" when trying to parse the version number.
Based on a patch by Chris Wright.
* src/lxc/lxc_controller.c (ignorable_epoll_accept_errno): New function.
(lxcControllerMain): Handle a failed accept carefully:
most errno values indicate legitimate failure and must be fatal.
However, ignore a special case: that in which an incoming client quits
between the poll() indicating its presence, and our accept() which
is trying to process it.
There doesn't seem to be anything specific to tap devices for this
array of file descriptors which need to stay open of the guest to use.
Rename then for others to make use of.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Chris Lalancette <clalance@redhat.com>
These files may be useful for anyone making modifications to
source files in a tarball distribution.
* src/Makefile.am (EXTRA_DIST): Add THREADS.txt.
* daemon/Makefile.am (EXTRA_DIST): Add THREADING.txt.
This test was failing on systems using pdwtags from dwarves-1.3.
Reported by Matthias Bolte.
Two-pronged fix:
- use --verbose to work also with dwarves-1.3; adapt regular
expressions to handle now-varying separators
- require a minimum number of post-split clauses, in order to
skip upon any future format change.
Currently there are 318; if there are 300 or fewer,
give a warning similar to when pdwtags is missing.
* src/Makefile.am (remote_protocol-structs): Use pdwtags' --verbose
option to make 1.3 emit member sizes and offsets.
Consistently output WARNING messages to stderr.
Do not require each caller of virStorageFileGetMetadata and
virStorageFileGetMetadataFromFD to first clear the storage of the
"meta" buffer. Instead, initialize that storage in
virStorageFileGetMetadataFromFD.
* src/util/storage_file.c (virStorageFileGetMetadataFromFD): Clear
"meta" here, not before each of the following callers.
* src/qemu/qemu_driver.c (qemuSetupDiskCgroup): Don't clear "meta" here.
(qemuTeardownDiskCgroup): Likewise.
* src/qemu/qemu_security_dac.c (qemuSecurityDACSetSecurityImageLabel):
Likewise.
* src/security/security_selinux.c (SELinuxSetSecurityImageLabel):
Likewise.
* src/security/virt-aa-helper.c (get_files): Likewise.
Approximately 60 messages were marked. Since these diagnostics are
intended solely for developers and maintainers, encouraging translation
is deemed to be counterproductive:
http://thread.gmane.org/gmane.comp.emulators.libvirt/25050/focus=25052
Run this command:
git grep -l VIR_WARN|xargs perl -pi -e \
's/(VIR_WARN0?)\s*\(_\((".*?")\)/$1($2/'
There were three very similar uses of qemuMonitorAddDrive.
This change makes the three 17-line sequences identical.
* src/qemu/qemu_driver.c (qemudDomainAttachPciDiskDevice): Detect
failure. Add VIR_WARN and braces.
(qemudDomainAttachSCSIDisk): Add VIR_WARN and braces.
(qemudDomainAttachUsbMassstorageDevice): Likewise.
History has shown that there are frequent bugs in the QEMU driver
code leading to the monitor being invoked with a NULL pointer.
Although the QEMU driver code should always report an error in
this case before invoking the monitor, as a safety net put in a
generic check in the monitor code entry points.
* src/qemu/qemu_monitor.c: Safety net to check for NULL monitor
object
Any method which intends to invoke a monitor command must have
a check for virDomainObjIsActive() before using the monitor to
ensure that priv->mon != NULL.
There is one subtle edge case in this though. If a method invokes
multiple monitor commands, and calls qemuDomainObjExitMonitor()
in between two of these commands then there is no guarentee that
priv->mon != NULL anymore. This is because the QEMU process may
exit or die at any time, and because qemuDomainObjEnterMonitor()
releases the lock on virDomainObj, it is possible for the background
thread to close the monitor handle and thus qemuDomainObjExitMonitor
will release the last reference allowing priv->mon to become NULL.
This affects several methods, most notably migration but also some
hotplug methods. This patch takes a variety of approaches to solve
the problem, depending on the particular usage scenario. Generally
though it suffices to add an extra virDomainObjIsActive() check
if qemuDomainObjExitMonitor() was called during the method.
* src/qemu/qemu_driver.c: Fix multiple potential NULL pointer flaws
in usage of the monitor
* src/qemu/qemu_driver.c (qemudDomainSetVcpus): Upon look-up failure,
i.e., vm==NULL, goto cleanup, rather than to "endjob", superficially
since the latter would dereference vm, but more fundamentally because
we certainly don't want to call qemuDomainObjEndJob before we've
even attempted qemuDomainObjBeginJob.
(gdb) p/x QEMUD_CMD_FLAG_VNET_HOST
$7 = 0xffffffff80000000
Oops - that meant we were incorrectly setting QEMU_CMD_FLAG_RTC_TD_HACK
for qemu-kvm-0.12.3 (and probably botching a few other settings as well).
Fixes Red Hat BZ#592070
* src/qemu/qemu_conf.h (QEMUD_CMD_FLAG_VNET_HOST): Avoid sign
extension.
* tests/qemuhelpdata/qemu-kvm-0.12.3: New file.
* tests/qemuhelptest.c (mymain): Add another case.
A fedora translator filed:
https://bugzilla.redhat.com/show_bug.cgi?id=580816
Pointing out these two error messages as unclear: "write save" sounds
like a typo without context, and lack of a colon made the second message
difficult to parse.
virFileResolveLink was returning a positive value on error,
thus confusing callers that assumed failure was < 0. The
confusion is further evidenced by callers that would have
ended up calling virReportSystemError with a negative value
instead of a valid errno.
Fixes Red Hat BZ #591363.
* src/util/util.c (virFileResolveLink): Live up to documentation.
* src/qemu/qemu_security_dac.c
(qemuSecurityDACRestoreSecurityFileLabel): Adjust callers.
* src/security/security_selinux.c
(SELinuxRestoreSecurityFileLabel): Likewise.
* src/storage/storage_backend_disk.c
(virStorageBackendDiskDeleteVol): Likewise.
Fix the cygwin regression introduced in commit 48445ccff, but
without repeating the fresh build regression of commit
2d550542e.
* src/Makefile.am (libvirt_test_la_LIBADD): Split out subset of
locally-built libraries...
(libvirt_test_la_BUILT_LIBADD): ...into new variable.
(libvirt_test_la_DEPENDENCIES): Depend only on the subset that
automake would have given us for free if we didn't have to add our
own extra file.
Matthias noted that the line:
virt_aa_helper_LDFLAGS = $(WARN_CFLAGS)
looks inconsistent, so I did an audit.
Currently, the set of compiler warning flags passed to gcc as $CC are
equally permitted as the set of linker flags passed to gcc as $LD, so
there was no problem with that usage. But if we ever get in a
situation where $CC and $LD treat particular flags differently, using
the right variable form will make it easier.
In the process, I spotted a couple of typos that were omitting useful
flags, as well as specifying a -l under the wrong variable.
* acinclude.m4 (LIBVIRT_COMPILE_WARNINGS): Define WARN_LDFLAGS as
an alias for WARN_CFLAGS.
* tools/Makefile.am (virsh_LDFLAGS): Use more canonical spelling.
* proxy/Makefile.am (libvirt_proxy_LDFLAGS): Likewise. Move
library...
(libvirt_proxy_LDADD): ...here.
* src/Makefile.am (virt_aa_helper_LDFLAGS): Use more canonical
spelling of WARN_LDFLAGS.
(libvirt_parthelper_LDFLAGS, libvirt_lxc_LDFLAGS): Likewise. Use
correct spelling of COVERAGE_LDFLAGS.
Reported by Matthias Bolte.
* configure.ac: Check for <linux/magic.h>.
* src/util/storage_file.c: Include <linux/magic.h> only if present.
Linux kernels prior to 2.6.19 lacked it.
[__linux__] (NFS_SUPER_MAGIC): Define if not already defined.
qemuReadLogOutput early VM death detection is racy and won't always work.
Startup then errors when connecting to the VM monitor. This won't report
the emulator cmdline output which is typically the most useful diagnostic.
Check if the VM has died at the very end of the monitor connection step,
and if so, report the cmdline output.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=581381
* src/qemu/qemu_driver.c (qemudDomainSetVcpus): Avoid NULL-deref
upon unknown UUID. Call qemuDomainObjBeginJob(vm) only after
ensuring that vm != NULL, not before. This potential NULL-deref
was introduced by commit 2c555d87b0.
This reverts commit 2d550542ee.
The patch worked for incremental builds, but broke fresh
builds, because it interfered with automake's automatic
dependency generation. Until I figure out how to make
automake do what we want, I'd rather leave cygwin broken
but fresh Linux builds working.
make[3]: *** No rule to make target `-lxml2', needed by `libvirt.la'. Stop.
Due to treating the wrong string as a dependency.
* src/Makefile.am (libvirt_la_DEPENDENCIES): Depend only on
locally-built file, not on strings that might resolve as '-lxml2'.
The code specifies driver->cacheDir as the format string,
but it usually doesn't contain '%s', so the subsequent
argument, "/qemu.mem.XXXXXX", is always ignored.
The patch fixes the misuse.
Setting dynamic_ownership=0 in /etc/libvirt/qemu.conf prevents
libvirt's DAC security driver from setting uid/gid on disk
files when starting/stopping QEMU, allowing the admin to manage
this manually. As a side effect it also stopped setting of
uid/gid when saving guests to a file, which completely breaks
save when QEMU is running non-root. Thus saved state labelling
code must ignore the dynamic_ownership parameter
* src/qemu/qemu_security_dac.c: Ignore dynamic_ownership=0 when
doing save/restore image labelling
When QEMU runs with its disk on NFS, and as a non-root user, the
disk is chownd to that non-root user. When migration completes
the last step is shutting down the QEMU on the source host. THis
normally resets user/group/security label. This is bad when the
VM was just migrated because the file is still in use on the dest
host. It is thus neccessary to skip the reset step for any files
found to be on a shared filesystem
* src/libvirt_private.syms: Export virStorageFileIsSharedFS
* src/util/storage_file.c, src/util/storage_file.h: Add a new
method virStorageFileIsSharedFS() to determine if a file is
on a shared filesystem (NFS, GFS, OCFS2, etc)
* src/qemu/qemu_driver.c: Tell security driver not to reset
disk labels on migration completion
* src/qemu/qemu_security_dac.c, src/qemu/qemu_security_stacked.c,
src/security/security_selinux.c, src/security/security_driver.h,
src/security/security_apparmor.c: Add ability to skip disk
restore step for files on shared filesystems.
The cgroups ACL code was only allowing the primary disk image.
It is possible to chain images together, so we need to search
for backing stores and add them to the ACL too. Since the ACL
only handles block devices, we ignore the EINVAL we get from
plain files. In addition it was missing code to teardown the
cgroup when hot-unplugging a disk
* src/qemu/qemu_driver.c: Allow backing stores in cgroup ACLs
and add missing teardown code in unplug path
If the IO error event does not include a reason, then there
is a possible crash dispatching the event
* src/conf/domain_event.c: Missing check for a NULL reason before
strduping allows for a crash
QEMU is gaining a new monitor command netdev_add for hotplugging
NICs using the netdev backend code. We already support this on
the command this, though it is disabled. This adds support for
hotplug too, also to remain disabled until 0.13 QEMU is released
* src/qemu/qemu_driver.c: Support netdev hotplug for NICs
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Add
support for netdev_add and netdev_remove commands
When closing a monitor using qemuMonitorClose(), we are aware of
the possibility the monitor is still being used somewhere:
/* NB: ordinarily one might immediately set mon->watch to -1
* and mon->fd to -1, but there may be a callback active
* that is still relying on these fields being valid. So
* we merely close them, but not clear their values and
* use this explicit 'closed' flag to track this state */
but since we call virEventAddHandle() on that monitor without increasing
its ref counter, the monitor is still freed which makes possible users
of it quite unhappy. The unhappiness can lead to a hang if qemuMonitorIO
tries to lock mutex which no longer exists.
First calling REMOTE_PROC_CLOSE and then removing watches might lead to
a hang as HANGUP event can be triggered before the watches are actually
removed but after virConnectPtr is already freed. As a result of that
remoteDomainEventFired() would try to lock uninitialized mutex, which
would hang for ever.
Allow debugging of GNUTLS interactions by setting
LIBVIRT_GNUTLS_DEBUG=10 LIBVIRT_DEBUG=1 virsh
* src/remote/remote_driver.c: Use LIBVIRT_GNUTLS_DEBUG to
enable gnutls debugging
Some shells warn about missing programs before redirection;
the idiomatic way to silence them is to run the program check
inside a subshell, with the redirections outside the subshell.
But a subshell is only needed in places where it is reasonable
to expect the use of such a noisy shell in the first place.
* src/Makefile.am (remote_protocol-structs): Use subshell, for
FreeBSD 8.0 /bin/sh.
* cfg.mk (sc_preprocessor_indentation): Avoid subshell, since the
only users running cfg.mk can be assumed to have decent tools.
For printf("%*s",foo,bar), clang complains if foo is not int:
warning: field width should have type 'int', but argument has
type 'unsigned int' [-Wformat]
* src/conf/storage_encryption_conf.c
(virStorageEncryptionSecretFormat, virStorageEncryptionFormat):
Use correct type.
* src/conf/storage_encryption_conf.h (virStorageEncryptionFormat):
Likewise.
Now, if you update remote_protocol.x without also updating
remote_protocol-structs to match, then "make check" will fail.
* src/Makefile.am (remote_protocol-structs): Extract list of
structs and member names from remote_protocol.o.
(check-local): Depend on it.
* src/remote_protocol-structs: New file.
This reverts commit b5b8a6db69.
That commit was not necessary. The problem is fixed by commit
0e9b3a269b, but I didn't rebuild
it properly after pulling in the commit and didn't notice it.
Per automake, LDFLAGS is used early in the line, and LIBADD
(libraries) or LDADD (programs) is used late. On platforms like
cygwin, without lazy linking, this order matters. Therefore, libtool
commands, -L, and similar should be in LDFLAGS, but -l should be in
L*ADD.
* src/Makefile.am (*_LDFLAGS): Move libraries...
(*_LIBADD): ...to their LIBADD counterpart.
Change 965466c1 added a new field to struct remote_error, which broke
the RPC protocol. Fortunately the new field is unused, so this change
simply removes it again.
* src/remote/remote_protocol.(c|h|x): Remove remote_nwfilter from struct
remote_error
With the introduction of the generic qemu device model, unplugging
SCSI disks works like a charm, so support it in libvirt.
* src/qemu/qemu_driver.c: Add qemudDomainDetachSCSIDiskDevice() to do the
unplugging, extend qemudDomainDetachDeviceAdd().
Signed-off-by: Wolfgang Mauerer <wolfgang.mauerer@siemens.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Gnulib can guarantee that pthread.h exists, but for now, it is a dummy
header with no support for most pthread_* functions. Modify our
use of pthread to use function checks, rather than header checks,
to determine how much pthread support is present.
* bootstrap.conf (gnulib_modules): Add pthread.
* configure.ac: Drop all pthread.h checks. Optimize function
checks. Add check for pthread functions.
* src/Makefile.am (libvirt_lxc_LDADD): Ensure proper link.
* src/remote/remote_driver.c (remoteIOEventLoop): Depend on
pthread_sigmask, now that gnulib guarantees pthread.h.
* src/util/util.c (virFork): Likewise.
* src/util/threads.c (threads-pthread.c): Depend on
pthread_mutexattr_init, as a witness of full pthread support.
* src/util/threads.h (threads-pthread.h): Likewise.
Detected by clang. POSIX requires that the second argument to
va_start be the name of the last variable; and in some implementations,
passing *path instead of path would dereference bogus memory instead
of pulling arguments off the stack.
* src/util/util.c (virBuildPathInternal): Use correct argument to
va_start.
Support for live migration between hosts that do not share storage was
added to qemu-kvm release 0.12.1.
It supports two flags:
-b migration without shared storage with full disk copy
-i migration without shared storage with incremental copy (same base image
shared between source and destination).
I tested the live migration without shared storage (both flags) for native
and p2p with and without tunnelling. I also verified that the fix doesn't
affect normal migration with shared storage.
Add an empty body for virCondWaitUntil and move virPipeReadUntilEOF
out of the '#ifndef WIN32' block, because it compiles fine with MinGW
in combination with gnulib.
Necessary on cygwin, where uid_t and gid_t are 4-byte long rather
than int, causing gcc -Wformat warnings.
* src/util/util.c (virFileOperationNoFork, virDirCreateNoFork)
(virFileOperation, virDirCreate, virGetUserEnt): Cast uid_t and
gid_t before passing to printf.
* .gitignore: Ignore Windows executables.
When a filter is updated, only those interfaces must have their old
rules cleared that either reference the filter directly or indirectly
through another filter. Remember between the different steps of the
instantiation of the filters which interfaces must be skipped. I am
using a hash map to remember the names of the interfaces and store a
bogus pointer to ~0 into it that need not be freed.
For the decision on whether to instantiate the rules, the check for a
pending IP address learn request is not sufficient since then only the
thread could instantiate the rules. So, a boolean needs to be passed
when the thread instantiates the filter rules late and the IP address
learn request is still pending in order to override the check for the
pending learn request. If the rules are to be updated while the thread
is active, this will not be done immediately but the thread will do that
later on.
WIN32 is always defined when __MINGW32__ is defined, but the
converse is not true. WIN32 is more generic, if someone were
to ever attempt porting to a microsoft compiler. This does
not affect Cygwin, which intentionally does not define WIN32.
* src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Use more
generic flag macro.
* src/storage/storage_backend.c
(virStorageBackendUpdateVolTargetInfoFD)
(virStorageBackendRunProgRegex): Likewise.
* tools/console.h (vshRunConsole): Likewise.
[Error message]
error: Failed to start domain lxc_test1
error: internal error Failed to create veth device pair: 512
The reason of the failure is that lxc driver unexpectedly re-uses
an auto-assigned veth name and tries to create the created veth
again. The failure will happen when a domain has multiple network
interfaces and the names of those are not specified in XML.
The patch fixes the problem by resetting buffers of veth names
in every iteration of creating veth.
* src/lxc/lxc_driver.c: prevent re-using auto-assigned veth name
Reported by Kumar L Srikanth-B22348.
<hostdev> address parsing previously attempted to detect the number
base: currently it is hardcoded to base 16, which can break PCI
assignment via virt-manager. Revert to the previous behavior.
* src/conf/domain_conf.c: virDomainDevicePCIAddressParseXML, switch to
virStrToLong_ui(bus, NULL, 0, ...) to autodetect base
This introduces a new event type
VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON
This event is the same as the previous VIR_DOMAIN_ID_IO_ERROR
event, but also includes a string describing the cause of
the event.
Thus there is a new callback definition for this event type
typedef void (*virConnectDomainEventIOErrorReasonCallback)(virConnectPtr conn,
virDomainPtr dom,
const char *srcPath,
const char *devAlias,
int action,
const char *reason,
void *opaque);
This is currently wired up to the QEMU block IO error events
* daemon/remote.c: Dispatch IO error events to client
* examples/domain-events/events-c/event-test.c: Watch for
IO error events
* include/libvirt/libvirt.h.in: Define new IO error event ID
and callback signature
* src/conf/domain_event.c, src/conf/domain_event.h,
src/libvirt_private.syms: Extend API to handle IO error events
* src/qemu/qemu_driver.c: Connect to the QEMU monitor event
for block IO errors and emit a libvirt IO error event
* src/remote/remote_driver.c: Receive and dispatch IO error
events to application
* src/remote/remote_protocol.x: Wire protocol definition for
IO error events
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c: Watch for BLOCK_IO_ERROR event
from QEMU monitor
Introduce a function to notify the IP address learning
thread to terminate and thus release the lock on the interface.
Notify the thread before grabbing the lock on the interface
and tearing down the rules. This prevents a 'virsh destroy' to
tear down the rules that the IP address learning thread has
applied.
The functions invoked by the IP address learning thread
that apply some basic filtering rules did not clean up
any previous filtering rules that may still be there
(due to a libvirt restart for example). With the
patch below all the rules are cleaned up first.
Also, I am introducing a function to drop all traffic
in case the IP address learning thread could not apply
the rules.
The local DHCP server on virtbr0 sends DHCP ACK messages when a VM is
started and requests an IP address while the initial DHCP lease on the
VM's MAC address hasn't expired. So, also pick the IP address of the VM
if that type of message is seen.
Thanks to Gerhard Stenzel for providing a test case for this.
Changes from V1 to V2:
- cleanup: replacing DHCP option numbers through constants
When using -device syntax, the IO event will have a different
prefix, 'drive-' that needs to be skipped over before matching
against the libvirt disk alias
* src/qemu/qemu_driver.c: Skip QEMU_DRIVE_HOST_PREFIX in IO event
* daemon/remote.c: Server side dispatcher
* daemon/remote_dispatch_args.h, daemon/remote_dispatch_prototypes.h,
daemon/remote_dispatch_ret.h, daemon/remote_dispatch_table.h: Update
with new API
* src/remote/remote_driver.c: Client side dispatcher
* src/remote/remote_protocol.c, src/remote/remote_protocol.h: Update
* src/remote/remote_protocol.x: Define new wire protocol
This defines the internal driver API and stubs out each driver
* src/driver.h: Define virDrvDomainGetBlockInfo signature
* src/libvirt.c, src/libvirt_public.syms: Glue public API to drivers
* src/esx/esx_driver.c, src/lxc/lxc_driver.c, src/opennebula/one_driver.c,
src/openvz/openvz_driver.c, src/phyp/phyp_driver.c,
src/test/test_driver.c, src/uml/uml_driver.c, src/vbox/vbox_tmpl.c,
src/xen/xen_driver.c, src/xenapi/xenapi_driver.c: Stub out driver
qemuDomainPCIAddressSetFree was freeing up the hash
table for the pci addresses, but not freeing up the addr
structure. Looking over the callers of this function, it
seems like they expect it to also free up the structure,
so do that here.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
We were over-writing a pointer without freeing it in
case of a disk device, leading to a memory leak.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
When building on Ubuntu with make -j3 (or more), it would always
fail when trying to build virt-aa-helper. I'm not an expert in
automake by any means, but I think the entry for virt-aa-helper
is mis-using LDADD; it shouldn't be putting direct paths to
libvirt_conf.la and libvirt_util.la, but instead referencing those
names. With this patch in place, I'm able to successfully build
on Ubuntu 9.04 with make -j3.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The previous commit changes a goto from 'endjob' to 'cleanup',
leaving the endjob label unused. Remove it to avoid compile
warning.
* src/qemu/qemu_driver.c: Remove 'endjob' label
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): When setting
"vm" to NULL, jump over vm-dereferencing code to "cleanup".
(qemuDomainRevertToSnapshot): Likewise.
use /var/lib/libvirt/dnsmasq since /var/lib/libvirt/network is
unreadable by the dnsmasq binary
* src/network/bridge_driver.c: update DNSMASQ_STATE_DIR
* src/Makefile.am: create it on make install
* libvirt.spec.in: take the new directory into account
In cases where the security driver failed to restore a label after a
guest has saved, we mistakenly jumped to the error cleanup paths.
This is not good, because the operation has in fact completed and
cannot be rolled back completely. Label restore is non-critical, so
just log the problem instead. Also add a missing restore call in
the error cleanup path
* src/qemu/qemu_driver.c: Fix handling of security driver
restore failures in QEMU domain save
When cgroups is enabled, access to block devices is likely to be
restricted to a whitelist. Prior to saving a guest to a block device,
it is necessary to add the block device to the whitelist. This is
not required upon restore, since QEMU reads from stdin
* src/qemu/qemu_driver.c: Add block device to cgroups whitelist
if neccessary during domain save.
The save process was relying on use of the shell >> append
operator to ensure the save data was placed after the libvirt
header + XML. This doesn't work for block devices though.
Replace this code with use of 'dd' and its 'seek' parameter.
This means that we need to pad the header + XML out to a
multiple of dd block size (in this case we choose 512).
The qemuMonitorMigateToCommand() monitor API is used for both
save/coredump, and migration via UNIX socket. We can't simply
switch this to use 'dd' since this causes problems with the
migration usage. Thus, create a dedicated qemuMonitorMigateToFile
which can accept an filename + offset, and remove the filename
from the current qemuMonitorMigateToCommand() API
* src/qemu/qemu_driver.c: Switch to qemuMonitorMigateToFile
for save and core dump
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h,
src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_json.h,
src/qemu/qemu_monitor_text.c, src/qemu/qemu_monitor_text.h: Create
a new qemuMonitorMigateToFile, separate from the existing
qemuMonitorMigateToCommand to allow handling file offsets
It is possible to use block devices with domain save/restore. Upon
failure QEMU unlinks the path being saved to. This isn't good when
it is a block device !
* src/qemu/qemu_driver.c: Don't unlink block devices if save fails
If a transient QEMU crashes during save attempt, then the virDomainPtr
object may be freed. If a persistent QEMU crashes during save, then
the 'priv->mon' field is no longer valid since it will be inactive.
* src/qemu/qemu_driver.c: Fix two crashes when QEMU exits
during a save attempt
In particular I was forgetting to take the qemuMonitorPrivatePtr
lock (via qemuDomainObjBeginJob), which would cause problems
if two users tried to access the same domain at the same time.
This patch also fixes a problem where I was forgetting to remove
a transient domain from the list of domains.
Thanks to Stephen Shaw for pointing out the problem and testing
out the initial patch.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
This patch adds support for the RARP protocol. This may be needed due to
qemu sending out a RARP packet (at least that's what it seems to want to
do even though the protocol id is wrong) when migration finishes and
we'd need a rule to let the packets pass.
Unfortunately my installation of ebtables does not understand -p RARP
and also seems to otherwise depend on strings in /etc/ethertype
translated to protocol identifiers. Therefore I need to pass -p 0x8035
for RARP. To generally get rid of the dependency of that file I switch
all so far supported protocols to use their protocol identifier in the
-p parameter rather than the string.
I am also extending the schema and added a test case.
changes from v1 to v2:
- added test case into patch
With JSON qemu monitor, we get a STOP event from qemu whenever qemu
stops guests CPUs. The downside of it is that vm->state is changed to
PAUSED and a new generic paused event is send to applications. However,
when we ask qemu to stop the CPUs we are not really interested in qemu
event and we usually want to issue a more specific event.
By setting vm->status to PAUSED before actually sending the request to
qemu (and resetting it back if the request fails) we can ignore the
event since the event handler does nothing when the guest is already
paused. This solution is quite hacky but unfortunately it's the best
solution which I was able to come up with and it doesn't introduce a
race condition.
* virStorageEncryptionFormat is called from both
virDomainDiskDefFormat and virStorageVolTargetDefFormat. The proper
indentation in the generated XML depends on the caller. My earlier
patch to fix the incorrect indentation for the domain XML broke the
indentation for the storage XML. This patch adopts Laine's
suggestion of requring the caller of virStorageEncryptionFormat to
provide an unsigned int with the number of spaces the output should
be indented. The patch modifies both callers to provide the
additional argument.
* Add a regression test for the domain XML
* src/conf/domain_conf.c src/conf/storage_conf.c
src/conf/storage_encryption_conf.c src/conf/storage_encryption_conf.h:
change the indentation code
* tests/qemuxml2xmltest.c
tests/qemuxml2argvdata/qemuxml2argv-encrypted-disk.args
tests/qemuxml2argvdata/qemuxml2argv-encrypted-disk.xml: add a regression test
Cygwin's XDR implementation defines xdr_u_int64_t instead of
xdr_uint64_t and lacks IXDR_PUT_INT32/IXDR_GET_INT32.
Alter the IXDR_GET_LONG regex in rpcgen_fix.pl so it doesn't destroy
the #define IXDR_GET_INT32 IXDR_GET_LONG in remote_protocol.x.
Also fix the remote_protocol.h regex in rpcgen_fix.pl.
With this patch I want to enable hex number inputs in the filter XML. A
number that was entered as hex is also printed as hex unless a string
representing the meaning can be found.
I am also extending the schema and adding a test case. A problem with
the DSCP value is fixed on the way as well.
Changes from V1 to V2:
- using asHex boolean in all printf type of functions to select the
output format in hex or decimal format
This patch makes libvirtd start the dnsmasq daemon with a
--dhcp-hostsfile option instead of --dhcp-host options for each
'//ip/dhcp/host' entries defined in network xml file.
the dnsmasq host file is stored into /var/lib/libvirt/network
* src/network/bridge_driver.c: define the directory for the hostfiles
and save/delete them to be used by dnsmasq
* po/POTFILES.in: the new module contains translatable strings
* src/Makefile.am: include the files in the utils set
* src/libvirt_private.syms: exports the symbols internally
It implements an idea to save dhcp hosts' macaddr vs. ipaddr mappings to
static file and make dnsmasq loading it with "--dhcp-hostsfile" option,
originally suggested by Dan, and can address the problem that too
many "--dhcp-host" args hitting ARG_MAX limit
* src/util/dnsmasq.h src/util/dnsmasq.c: adds the 2 new files
While doing some testing of the snapshot code I noticed that
if qemuDomainSnapshotLoad failed, it would print a NULL as
part of the error. That's not desirable, so leave the
full_path variable around until after we are done printing
errors.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
We were freeing the virDomainSnapshotDefPtr, but not
the virDomainSnapshotObjPtr in virDomainSnapshotObjFree.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
It's not needed and is currently ignored, but this is a bug.
It will get fixed soon and QMP will return an error for keys
it doesn't know about, this will break libvirt.
* src/qemu/qemu_monitor_json.c: remove qemuMonitorJSONCommandAddTimestamp()
and the place where it's invoked in qemuMonitorJSONMakeCommand()
The nwfilterDriverActive() could de-reference a NULL pointer
if it hadn't be started at the point it was called. It was
also not thread safe, since it lacked locking around data
accesses.
* src/nwfilter/nwfilter_driver.c: Fix locking & NULL checks
in nwfilterDriverActive()
The user probably doesn't care what the gai error numbers are, as
much as what the failed conversion IP address was.
* src/remote/remote_driver.c (addrToString): Mention which address
could not be converted.
* daemon/remote.c (addrToString): Likewise.
* The error messages coming from qemu's DAC support contain strings
from the original SELinux security driver code. This just removes
references to "security context" and other SELinux-isms from the DAC
code.
Signed-off-by: Spencer Shimko <sshimko@tresys.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
- using INT_BUFSIZE_BOUND() to determine the length of the buffersize
for printing and integer into
- not explicitly initializing static var threadsTerminate to false
anymore, since that's done automatically
Changes after V2:
- removed while looks in case of OOM error
- removed on ifaceDown() call
- preceding one ifaceDown() call with an ifaceCheck() call
Since the name of an interface can be the same between stops and starts
of different VMs I have to switch the IP address learning thread to use
the index of the interface to determine whether an interface is still
available or not - in the case of macvtap the thread needs to listen for
traffic on the physical interface, thus having to time out periodically
to check whether the VM's macvtap device is still there as an indication
that the VM is still alive. Previously the following sequence of 2 VMs
with macvtap device
virsh start testvm1; virsh destroy testvm1 ; virsh start testvm2
would not terminate the thread upon testvm1's destroy since the name of
the interface on the host could be the same (i.e, macvtap0) on testvm1
and testvm2, thus it was easily race-able. The thread would then
determine the IP address parameter for testvm2 but apply the rule set
for testvm1. :-(
I am also introducing a lock for the interface (by name) that the thread
must hold while it listens for the traffic and releases when it
terminates upon VM termination or 0.5 second thereafter. Thus, the new
thread for a newly started VM with the same interface name will not
start while the old one still holds the lock. The only other code that I
see that also needs to grab the lock to serialize operation is the one
that tears down the firewall that were established on behalf of an
interface.
I am moving the code applying the 'basic' firewall rules during the IP
address learning phase inside the thread but won't start the thread
unless it is ensured that the firewall driver has the ability to apply
the 'basic' firewall rules.
The hang fix in d376b7d63e was incomplete
since it left quite a few {Enter,Exit}Monitor calls which require driver
to be unlocked. Since the driver is locked throughout the whole
function, {Enter,Exit}MonitorWithDriver need to be used instead to
ensure driver is not locked when issuing monitor commands.
The comment in qemuDomainWaitForMigrationComplete says we are polling
every 50ms but the code sleeps only for 50us. This was already discussed
during review but apparently forgotten when the series was pushed.
The text monitor code was checking for a '\n' prefix on several
places. Previously this would work, but since the monitor code
re-write the '\n' is already stripped off, so mustn't be checked
for.
* src/qemu/qemu_monitor_text.c: Fix monitor error checking
Probably as a result of a merge error, the CPU hotplug command
names were completely wrong.
* src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_text.c: Fix
the CPU hotplug command names
Adds ability to provide a preferred CPU model for CPUID data decoding.
Such model would be considered as the best possible model (if it's
supported by hypervisor) regardless on number of features which have to
be added or removed for describing required CPU.
So far, when CPUID data were converted into CPU model and features, the
features can only be added to the model. As a result, when a guest asked
for something like "qemu64,-svm" it would get a qemu32 plus a bunch of
additional features instead.
This patch adds support for removing feature from the base model.
Selection algorithm remains the same: the best CPU model is the model
which requires lowest number of features to be added/removed from it.
Qemu committed a patch which list some CPU names in [] when asked for
supported CPUs (qemu -cpu ?). Yet, it needs such CPUs to be passed
without those square braces. When probing for supported CPU models, we
can just strip the square braces and pretend we have never seen them.
First, inital VCPU pinning is set correctly but then it is reset by
assigning qemu process to a new cgroup (which contains all CPUs). It's
easily fixed by swapping these two actions.
An empty root snapshot list was considered as error condition. Creating a
new snapshot would fail if the domain didn't have snapshots yet, because
the snapshot-create function tries to lookup the list of existing snapshots
in order to verify that the snapshot name is unique. This fails if the
domain doesn't have snapshots yet.
Removing the NULL check from esxVI_LookupRootSnapshotTreeList fixes this.