19c6ad9a (qemu: Refactor qemuDomainSetMemoryParameters) introduced
a new macro, VIR_GET_LIMIT_PARAMETER(PARAM, VALUE). But if statement
in the macro is not correct and so set_XXXX flags are set to false
in the wrong. As a result, libvirt ignores all memtune parameters.
This patch fixes the conditional expression to work correctly.
Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
BZ:https://bugzilla.redhat.com/show_bug.cgi?id=912021
Without error handler set, virDefaultErrorFunc will be called, the
error message is prefixed with "libvir:". It become a little better
by using prefix "libvirt:" when working with upper application.
For example:
1, stop libvirtd daemon
2, run virt-top.
libvir: XML-RPC error : Failed to connect \
socket to '/var/run/libvirt/libvirt-sock-ro': \
No such file or directory
libvirt: VIR_ERR_SYSTEM_ERROR: VIR_FROM_RPC: \
Failed to connect socket to '/var/run/libvirt/libvirt-sock-ro': \
No such file or directory
Commit f506a4c1 changed virSetUIDGID() to be a noop
when uid/gid are -1, while it used to be a noop when
they are <= 0.
The changes in this commit broke creating new VMs in GNOME Boxes
as qemuDomainCheckDiskPresence gets called during domain creation/startup,
which in turn calls virFileAccessibleAs which fails after calling
virSetUIDGID(0, 0) (Boxes uses session libvirtd). virSetUIDGID is called with
(0, 0) as these are the default user/group values in virQEMUDriverConfig
for session libvirtd.
This commit changes virQEMUDriverConfigNew to use -1 as the unpriviledged
uid/gid. I've also looked at the various places where cfg->user is used,
and they all seem to handle -1 correctly.
Currently, after we removed the qemu driver lock, it may happen
that two or more threads will start up a machine with macvlan and
race over virNetDevMacVLanCreateWithVPortProfile(). However,
there's a racy section in which we are generating a sequence of
possible device names and detecting if they exits. If we found
one which doesn't we try to create a device with that name.
However, the other thread is doing just the same. Assume it will
succeed and we must therefore fail. If this happens more than 5
times (which in massive parallel startup surely will) we return
-1 without any error reported. This patch is a simple hack to
both of these problems. It introduces a mutex, so only one thread
will enter the section, and if it runs out of possibilities,
error is reported. Moreover, the number of retries is raised to 20.
This reverts the hack done in
commit 568a6cda27
Author: Jiri Denemark <jdenemar@redhat.com>
Date: Fri Feb 15 15:11:47 2013 +0100
qemu: Avoid deadlock in autodestroy
since we now have a fix which avoids the deadlock scenario
entirely
There is a lock ordering problem in the QEMU close callback
APIs.
When starting a guest we have a lock on the VM. We then
set a autodestroy callback, which acquires a lock on the
close callbacks.
When running auto-destroy, we obtain a lock on the close
callbacks, then run each callbacks - which obtains a lock
on the VM.
This causes deadlock if anyone tries to start a VM, while
autodestroy is taking place.
The fix is to do autodestroy in 2 phases. First obtain
all the callbacks and remove them from the list under
the close callback lock. Then invoke each callback
from outside the close callback lock.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When the auto-destroy callback runs it is supposed to return
NULL if the virDomainObjPtr is no longer valid. It was not
doing this for transient guests, so we tried to virObjectUnlock
a mutex which had been freed. This often led to a crash.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
qemuProcessStart expects to be run with a job already set and every
caller except for qemuMigrationPrepareAny use it correctly. This bug can
be observed in libvirtd logs during incoming migration as
warning : qemuDomainObjEnterMonitorInternal:979 : This thread seems
to be the async job owner; entering monitor without asking for a
nested job is dangerous
With the apparmor security driver enabled, qemu instances fail
to start
# grep ^security_driver /etc/libvirt/qemu.conf
security_driver = "apparmor"
# virsh start test-kvm
error: Failed to start domain test-kvm
error: internal error security label already defined for VM
The model field of virSecurityLabelDef object is always populated
by virDomainDefGetSecurityLabelDef(), so remove the check for a
NULL model when verifying if a label is already defined for the
instance.
Checking for a NULL model and populating it later in
AppArmorGenSecurityLabel() has been left in the code to be
consistent with virSecuritySELinuxGenSecurityLabel().
As pointed out in
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1034661
The sentence
"The function of PCI device addresses must less than 8"
does not quite make sense. Update that to read
"The function of PCI device addresses must be less than 8"
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Currently, qemuDomainShutdownFlags() chooses the agent method of
shutdown whenever the agent is configured. However, this
assumption is not enough as the guest agent may be unresponsive
at the moment. So unless guest agent method has been explicitly
requested, we should fall back to the ACPI method.
The unitialized local variable qemuVersion can cause an random value
to be returned for the hypervisor version, observable with virsh version.
Introduced by commit b46f7f4a0b
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
disk->src is still used for disks->hosts->name, do not free it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The QEMU driver has a list of devices nodes that are whitelisted
for all guests. The kernel has recently started returning an
error if you try to whitelist a device which does not exist.
This causes a warning in libvirt logs and an audit error for
any missing devices. eg
2013-02-27 16:08:26.515+0000: 29625: warning : virDomainAuditCgroup:451 : success=no virt=kvm resrc=cgroup reason=allow vm="vm031714" uuid=9d8f1de0-44f4-a0b1-7d50-e41ee6cd897b cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm031714/" class=path path=/dev/kqemu rdev=? acl=rw
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code for putting the emulator threads in a separate cgroup
would spam the logs with warnings
2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 3
2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 4
2013-02-27 16:08:26.732+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 6
This is because it has only created child cgroups for 3 of the
controllers, but was trying to move the processes from all the
controllers. The fix is to only try to move threads in the
controllers we actually created. Also remove the warning and
make it return a hard error to avoid such lazy callers in the
future.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virQEMUCloseCallbacksRunOne method was passing a uuid string
to virDomainObjListFindByUUID, when it actually expected to get
a raw uuid buffer. This was not caught by the compiler because
the method was using a 'void *uuid' instead of first casting
it to the expected type.
This regression was accidentally caused by refactoring in
commit 568a6cda27
Author: Jiri Denemark <jdenemar@redhat.com>
Date: Fri Feb 15 15:11:47 2013 +0100
qemu: Avoid deadlock in autodestroy
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=896092 mentions that
qemu 1.4 and earlier only accept a simple start-stop range for
the cpu=... argument of -numa. Libvirt would attempt to use
-numa cpu=1,3 for a disjoint range, which did not work as intended.
Upstream qemu will be adding a new syntax for disjoint cpu ranges
in 1.5; but the design for that syntax is still under discussion
at the time of this patch. So for libvirt 1.0.3, it is safest to
just reject attempts to build an invalid qemu command line; in the
future, we can add a capability bit and translate to the final
accepted design for selecting a disjoint cpu range in numa.
* src/qemu/qemu_command.c (qemuBuildNumaArgStr): Reject disjoint
ranges.
This reverts commit 383ebc4694.
We decided the xml for this feature needed more thought to make sure
we are doing it the best way, in particular wrt option values that
have multiple items.
These two flags in fact are mutually exclusive. Requesting them both
doesn't make any sense regardless of hypervisor driver. Hence, we have
to make it within libvirt.c file instead of fixing it in each driver.
Eugene Marcotte reported that if gcrypt-devel (a prereq of
gnutls-devel) is not present, then compilation fails due to
an unconditional use of <gcrypt.h>.
* src/libvirt.c (includes): Properly guard use of gcrypt.h.
This reverts commit 0bbbd42c30.
The design for this feature is not complete, and may change the
name of the 'schid' attribute. Revert requested by Viktor Mihajlovski.
The parallels storage driver declared some loop variables
inside the for(;;). This is not allowed by libvirt coding
standards
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This change tried to fix a crash with changing CDROM media but
failed to actually do so
commit d0172d2b1b
Author: Osier Yang <jyang@redhat.com>
Date: Tue Feb 19 20:27:45 2013 +0800
qemu: Remove the shared disk entry if the operation is ejecting or updating
It was still accessing disk->src, when the entire 'disk' object
has been free'd already. Even if it weren't free'd, accessing
the 'src' value of virDomainDiskDef is not allowed without
first validating disk->type is file or block. Just remove the
broken code entirely.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
VIR_ERR_NO_CONNECT already contains "no connection driver available".
This patch changes:
no connection driver available for No connection for URI hello
to:
no connection driver available for hello
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=851413
Currently we call virSetDeviceUnprivSGIO with val == 0 if a block device
has an sgio attribute. But for sgio='filtered', we know that a
kernel with no unpriv_sgio support will always behave as the user
wanted. In this case, there is no need to call the function and
report a (bogus) error.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If virCondInit fails (okay, so that's unlikely), then we end up
attempting a virObjectUnlock() on the cleanup path, even though
we don't hold a lock. This is not guaranteed to be safe. While
at it, I noticed a couple places where we were referencing mon->fd
outside locks.
* src/qemu/qemu_monitor.c (qemuMonitorOpenInternal): Minimize lock
duration. mon->watch doesn't need clean up on error.
(qemuMonitorGetBlockExtent, qemuMonitorBlockResize): Don't
dereference fd outside of lock.
I built without yajl support, and noticed a strange failure message
in qemumonitorjsontest:
2013-02-22 16:12:37.503+0000: 19812: error : virJSONValueToString:1119 : internal error No JSON parser implementation is available
2013-02-22 16:12:37.503+0000: 19812: error : qemuMonitorJSONCommandWithFd:253 : out of memory
While a later patch will fix the test to skip when json is not present,
this patch avoids overriding the more useful error message from
virJSONValueToString returning NULL.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCommandWithFd):
Don't override message.
(qemuMonitorJSONCheckError): Don't print NULL.
* src/qemu/qemu_agent.c (qemuAgentCommand): Don't override message.
(qemuAgentCheckError): Don't print NULL.
(qemuAgentArbitraryCommand): Properly fail on OOM.
The new TypedParam helper APIs allow to simplify this function
significantly.
This patch integrates the fix in 75e5bec97b
by correctly ordering the setting functions instead of reordering the
parameters.
The bridge device was showing the vnet devices created for the domains
as connected to the bridge. libvirt should only show host devices when
trying to get the interface definition rather than the domain devices as
well.
uid_t and gid_t are opaque types, ranging from s32 to u32 to u64.
Explicitly cast the magic -1 to the appropriate type.
Signed-off-by: Philipp Hahn <hahn@univention.de>
uid_t and gid_t are opaque types, ranging from s32 to u32 to u64.
Explicitly cast them to unsigned int for printing.
Signed-off-by: Philipp Hahn <hahn@univention.de>
The uid_t|gid_t values are explicitly casted to "unsigned long", but the
printf() still used "%d", which is for signed values.
Change the format to "%u".
Signed-off-by: Philipp Hahn <hahn@univention.de>
This patch adds a new capability bit QEMU_CAPS_OBJECT_RNG_EGD and code
to support the egd backend for the VirtIO RNG device.
The device is added by 3 qemu command line options:
-chardev socket,id=charrng0,host=1.2.3.4,port=1234 (communication
backend)
-object rng-egd,chardev=charrng0,id=rng0 (RNG protocol client)
-device virtio-rng-pci,rng=rng0,bus=pci.0,addr=0x4 (the RNG device)
This patch implements support for the virtio-rng-pci device and the
rng-random backend in qemu.
Two capabilities bits are added to track support for those:
QEMU_CAPS_DEVICE_VIRTIO_RNG - for the device support and
QEMU_CAPS_OBJECT_RNG_RANDOM - for the backend support.
qemu is invoked with these additional parameters if the device is
enabled:
-object rng-random,id=rng0,filename=/test/phile (to add the backend)
-device virtio-rng-pci,rng=rng0,bus=pci.0,addr=0x4 (to add the device)
This patch adds basic configuration support for the RNG device
supporting the virtio model with the "random" and "egd" backend types as
described in the schema in the previous patch.
This patch adds a fake switch statement to force the compiler to warn
after a new device type was added. This should remind the contributor to
add the new device also to this iterator function.
Originally, only a host name was used to associate a
DHCPv6 request with a specific IPv6 address. Further testing
demonstrates that this is an unreliable method and, instead,
a client-id or DUID needs to be used. According to DHCPv6
standards, this id can be a duid-LLT, duid-LL, or duid-UUID
even though dnsmasq will accept almost any text string.
Although validity checking of a specified string makes sure it is
hexadecimal notation with bytes separated by colons, there is no
rigorous check to make sure it meets the standard.
Documentation and schemas have been updated.
Signed-off-by: Gene Czarcinski <gene@czarc.net>
Signed-off-by: Laine Stump <laine@laine.org>
For really old qemu-img binaries which do not support specifying
the format of the backing file, display a DEBUG message instead of
INFO that this can't be done.
This function does the source part of NBD magic. It
invokes drive-mirror on each non shared and RW disk with
a source and wait till the mirroring process completes.
When it does we can proceed with migration.
Currently, an active waiting is done: every 500ms libvirt
asks qemu if block-job is finished or not. However, once
the job finishes, qemu doesn't report its progress so we
can only assume if the job finished successfully or not.
The better solution would be to listen to the event which
is sent as soon as the job finishes. The event does
contain the result of job.
We need to start NBD server and feed it with all non-<shared/>,
RW and source-full disks. Moreover, with new virPortAllocator we
must ensure the borrowed port for NBD server will be returned if
either migration completes or qemu process is torn down.
This will be used with new migration scheme.
This patch creates basically just monitor stub
functions. Wiring them into something useful
is done in later patches.
This will be used with new migration scheme.
This patch creates basically just monitor stub
functions. Wiring them into something useful
is done in later patches.
This migration cookie is meant for two purposes. The first is to be sent
in begin phase from source to destination to let it know we support new
implementation of VIR_MIGRATE_NON_SHARED_{DISK,INC} so destination can
start NBD server. Then, the second purpose is, destination can let us
know, on which port the NBD server is running.
This patch adds support for a new <option>-Tag in the <dhcp> block of
network configs, based on a subset of the fifth proposal by Laine
Stump in the mailing list discussion at
https://www.redhat.com/archives/libvir-list/2012-November/msg01054.html.
Any such defined option will result in a dhcp-option=<number>,"<value>"
statement in the generated dnsmasq configuration file.
Currently, DHCP options can be specified by number only and there is
no whitelisting or blacklisting of option numbers, which should
probably be added.
Signed-off-by: Pieter Hollants <pieter@hollants.com>
Signed-off-by: Laine Stump <laine@laine.org>
The bfree and blocks fields are supposed to be in units of frsize. We were
calculating capacity correctly using those units, but the available
calculation was using bsize instead. Most file systems report these as the
same value specifically because many programs are buggy, but that is no
reason to rely on that behavior, or to behave inconsistently.
This bug has been present since e266ded (2008) and aa296e6c, when the code
was originally introduced (the latter via cut and paste).
Signed-off-by: Sage Weil <sage@newdream.net>
On FreeBSD, I got a 'make check' failure:
GEN check-symsorting
Symbol block at ./libvirt_atomic.syms:4: viratomic.h not found
* src/Makefile.am (SYM_FILES): New define.
(check-symsorting): Check on all symfiles, even when not used.
* src/libvirt_atomic.syms: Fix offender.
As a side effect, this also fixes reporting disk migration process.
It was added to memory migration progress, which was wrong. Disk
progress has dedicated fields in virDomainJobInfo structure.
remoteDeserializeTypedParameters can now be called with either
preallocated params array (size of which is announced by nparams) or it
can allocate params array according to the number of parameters received
from the server.
Refactored the interface device type identification to make it more
clear about the operations. Add support for udev devtype to detect
VLANs on Linux 3.7 and newer. Move VLAN detection based on device
name to fallback case.
Mechanical move to break up udevIfaceGetIfaceDef() into different
helpers for each of the interface types to hopefully make the code
easier to follow. This moves the vlan code to
udevIfaceGetIfaceDefVlan().
Based on feedback from Laine Stump, improve a number of the error
handling cases to report the issue to the user instead of not generating
data or giving vague errors. Added the bridge device name to every error
message as well to make it clear which bridge failed.
Mechanical move to break up udevIfaceGetIfaceDef() into different
helpers for each of the interface types to hopefully make the code
easier to follow. This moves the bridge code to
udevIfaceGetIfaceDefBridge().
https://bugzilla.redhat.com/show_bug.cgi?id=896685 points out
a regression caused by commit 38c4a9c - libvirt only labels
the backing chain if the backing chain cache is populated, but
the code to populate the cache was only conditionally performed
if cgroup labeling was necessary.
* src/qemu/qemu_cgroup.c (qemuSetupCgroup): Hoist cache setup...
* src/qemu/qemu_process.c (qemuProcessStart): ...earlier into
caller, where it is now unconditional.
To avoid having to hold the qemu driver lock while iterating through
close callbacks and calling them. This fixes a real deadlock when a
domain which is being migrated from another host gets autodestoyed as a
result of broken connection to the other host.
The max value of number of cpus to compute(id) should not
be equal or greater than max cpu number.
The bug ocurrs when id value is equal to max cpu number which
leads to the off-by-one error in the following for loop.
# virsh cpu-stats guest --start 1
error: Failed to virDomainGetCPUStats()
error: internal error cpuacct parse error
Don't allow interval to be > MAX_INT/1000 in virKeepAliveStart()
Guard against possible overflow in virKeepAliveTimeout() by setting the
timeout to be MAX_INT/1000 since the math following will multiply it by 1000.
Currently, if lzop decompression binary produces a warning, it
doesn't exit with zero status but 2 instead. Terrifying, but
true. However, warnings may be ignored using '--ignore-warn'
command line argument. Moreover, in which case, the exit status
will be zero.
For both AttachDevice and UpdateDevice APIs, if the disk device
is 'cdrom' or 'floppy', the operations could be ejecting, updating,
and inserting. For either ejecting or updating, the shared disk
entry of the original disk src has to be removed, because it's
not useful anymore.
And since the original disk def will be changed, new disk def passed
as argument will be free'ed in qemuDomainChangeEjectableMedia, so
we need to copy the orignal disk def before
qemuDomainChangeEjectableMedia, to use it for qemuRemoveSharedDisk.
The disk def could be free'ed by qemuDomainChangeEjectableMedia,
which can thus cause crash if we reference the disk pointer. On
the other hand, we have to remove the added shared disk entry from
the table on error codepath.
The hash entry is changed from "ref" to {ref, @domains}. With this, the
caller can simply call qemuRemoveSharedDisk, without afraid of removing
the entry belongs to other domains. qemuProcessStart will obviously
benifit from it on error codepath (which calls qemuProcessStop to do
the cleanup).
Based on moving various checking into qemuAddSharedDisk, this
avoids the caller using it in wrong ways. Also this adds two
new checking for qemuCheckSharedDisk (disk device not 'lun'
and kernel doesn't support unpriv_sgio simply returns 0).
Automating a sorting check is the only way to ensure we don't
regress. Suggested by Dan Berrange.
* src/check-symsorting.pl (check_sorting): Add a parameter,
validate that groups are in order, and that files exist.
* src/Makefile.am (check-symsorting): Adjust caller.
* src/libvirt_private.syms: Fix typo.
* src/libvirt_linux.syms: Fix file name.
* src/libvirt_vmx.syms: Likewise.
* src/libvirt_xenxs.syms: Likewise.
* src/libvirt_sasl.syms: Likewise.
* src/libvirt_libssh2.syms: Likewise.
* src/libvirt_esx.syms: Mention file name.
* src/libvirt_openvz.syms: Likewise.
Due to "feature"/"features" nasty typo, any features marked as mandatory
by one side of a migration are silently considered optional by the other
side. The following is the code that formats mandatory features in
migration cookie:
for (i = 0 ; i < QEMU_MIGRATION_COOKIE_FLAG_LAST ; i++) {
if (mig->flagsMandatory & (1 << i))
virBufferAsprintf(buf, " <feature name='%s'/>\n",
qemuMigrationCookieFlagTypeToString(i));
}
Some functions were using virDomainDeviceInfo where virDevicePCIAddress
would suffice. Some were only using integers for slots and functions,
assuming the bus numbers are always 0.
Switch from virDomainDeviceInfoPtr to virDevicePCIAddressPtr:
qemuPCIAddressAsString
qemuDomainPCIAddressCheckSlot
qemuDomainPCIAddressReserveAddr
qemuDomainPCIAddressReleaseAddr
Switch from int slot to virDevicePCIAddressPtr:
qemuDomainPCIAddressReserveSlot
qemuDomainPCIAddressReleaseSlot
qemuDomainPCIAddressGetNextSlot
Deleted functions (they would take the same parameters
as ReserveAddr/ReleaseAddr do now.)
qemuDomainPCIAddressReserveFunction
qemuDomainPCIAddressReleaseFunction
Recent renames were not reflected into the comments of
libvirt_private.syms; furthermore, since we mix private headers from
several directories into this file, knowing where the file lives
can be helpful.
* src/libvirt_private.sym: Reflect recent names.
We pass over the address/port start/end values many times so we put
them in structs.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Laine Stump <laine@laine.org>
Let users set the port range to be used for forward mode NAT:
...
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
...
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Laine Stump <laine@laine.org>
Support setting which public ip to use for NAT via attribute
address in subelement <nat> in <forward>:
...
<forward mode='nat'>
<address start='1.2.3.4' end='1.2.3.10'/>
</forward>
...
This will construct an iptables line using:
'-j SNAT --to-source <start>-<end>'
instead of:
'-j MASQUERADE'
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Laine Stump <laine@laine.org>
We need to drop the server lock before calling virObjectUnlock(client)
since in case we had the last reference to the client, its dispose
callback would be called and that could possibly try to lock the server
and cause a deadlock. This is exactly what happens when there is only
one QEMU domain running and it is marked to be autodestroyed when the
connection dies. This results in qemuProcessAutoDestroy ->
qemuProcessStop -> virNetServerRemoveShutdownInhibition call sequence,
where the last function locks the server.
The function does not report any errors so there should be no need too
reset an existing error first. Moreover, virTypedParamsFree is mostly
called in cleanup phase where it has the potential to reset any useful
reported earlier.
Gcc lets you do:
int ATTRIBUTE_NONNULL(1) foo(void *param);
int foo(void *param) ATTRIBUTE_NONNULL(1);
int ATTRIBUTE_NONNULL(1) foo(void *param) { ... }
but chokes on:
int foo(void *param) ATTRIBUTE_NONNULL(1) { ... }
However, since commit eefb881, we have intentionally been disabling
ATTRIBUTE_NONNULL because of lame gcc handling of the attribute (that
is, gcc doesn't do decent warning reporting, then compiles code that
mysteriously fails if you break the contract of the attribute, which
is surprisingly easy to do), leaving it on only for Coverity (which
does a much better job of improved static analysis when the attribute
is present).
But completely eliding the macro makes it too easy to write code that
uses the fourth syntax option, if you aren't using Coverity. So this
patch forces us to avoid syntax errors, even when not using the
attribute under gcc. It also documents WHY we disable the warning
under gcc, rather than forcing you to find the commit log.
* src/internal.h (ATTRIBUTE_NONNULL): Expand to empty attribute,
rather than nothing, when on gcc.
The conversion to qemuCaps dropped the ability with qemu{,-kvm} 1.2 and
newer to set the lost tick policy for the PIT. While the
-no-kvm-pit-reinjection option is depreacated, it is still supported at
least through 1.4, it is better to not lose the functionality.
Coverity found the DACGenLabel was checking for mgr == NULL after a
possible dereference; however, in order to get into the function the
virSecurityManagerGenLabel would have already dereferenced sec_managers[i]
so the check was unnecessary. Same check is made in SELinuxGenSecurityLabel.
Between revision 65fb9d49 and before this patch, an upgrade of libvirt while
VMs are running and instantiating iptables filtering rules due to nwfilter
rules, may leave stray iptables rules behind when shutting VMs down.
Left-over iptables rules may look like this:
Chain FP-vnet0 (1 references)
target prot opt source destination
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp spt:122
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
[...]
Chain libvirt-out (1 references)
target prot opt source destination
FO-vnet0 all -- 0.0.0.0/0 0.0.0.0/0 [goto] PHYSDEV match --physdev-out vnet0
The reason is that the recent nwfilter code only removed filtering rules in
the libvirt-out chain that contain the --physdev-is-bridged parameter.
Older rules didn't match and were not removed.
Note that the user-defined chain FO-vnet0 could not be removed due to the
reference from the rule in libvirt-out.
Often the work around may be done through
service iptables restart
kill -SIGHUP $(pidof libvirtd)
This patch now also removes older libvirt versions' iptables rules.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
If you have a qcow2 file /path1/to/file pointed to by symlink
/path2/symlink, and pass qemu /path2/symlink, then qemu treats
a relative backing file in the qcow2 metadata as being relative
to /path2, not /path1/to. Yes, this means that it is possible
to create a qcow2 file where the choice of WHICH directory and
symlink you access its contents from will then determine WHICH
backing file (if any) you actually find; the results can be
rather screwy, but we have to match what qemu does.
Libvirt and qemu default to creating absolute backing file
names, so most users don't hit this. But at least VDSM uses
symlinks and relative backing names alongside the
--reuse-external flags to libvirt snapshot operations, with the
result that libvirt was failing to follow the intended chain of
backing files, and then backing files were not granted the
necessary sVirt permissions to be opened by qemu.
See https://bugzilla.redhat.com/show_bug.cgi?id=903248 for
more gory details. This fixes a regression introduced in
commit 8250783.
I tested this patch by creating the following chain:
ls /home/eblake/Downloads/Fedora.iso # raw file for base
cd /var/lib/libvirt/images
qemu-img create -f qcow2 \
-obacking_file=/home/eblake/Downloads/Fedora.iso,backing_fmt=raw one
mkdir sub
cd sub
ln -s ../one onelink
qemu-img create -f qcow2 \
-obacking_file=../sub/onelink,backing_fmt=qcow2 two
mv two ..
ln -s ../two twolink
qemu-img create -f qcow2 \
-obacking_file=../sub/twolink,backing_fmt=qcow2 three
mv three ..
ln -s ../three threelink
then pointing my domain at /var/lib/libvirt/images/sub/threelink.
Prior to this patch, I got complaints about missing backing
files; afterwards, I was able to verify that the backing chain
(and hence DAC and SELinux relabels) of the entire chain worked.
* src/util/virstoragefile.h (_virStorageFileMetadata): Add
directory member.
* src/util/virstoragefile.c (absolutePathFromBaseFile): Drop,
replaced by...
(virFindBackingFile): ...better function.
(virStorageFileGetMetadataInternal): Add an argument.
(virStorageFileGetMetadataFromFD, virStorageFileChainLookup)
(virStorageFileGetMetadata): Update callers.
Prior to this patch, we had the callchains:
external users
\-> virStorageFileGetMetadataFromFD
\-> virStorageFileGetMetadataFromBuf
virStorageFileGetMetadataRecurse
\-> virStorageFileGetMetadataFromFD
\-> virStorageFileGetMetadataFromBuf
However, a future patch wants to add an additional parameter to
the bottom of the chain, for use by virStorageFileGetMetadataRecurse,
without affecting existing external callers. Since there is only a
single caller of the internal function, we can repurpose it to fit
our needs, with this patch giving us:
external users
\-> virStorageFileGetMetadataFromFD
\-> virStorageFileGetMetadataInternal
virStorageFileGetMetadataRecurse /
\-> virStorageFileGetMetadataInternal
* src/util/virstoragefile.c (virStorageFileGetMetadataFromFD):
Move most of the guts...
(virStorageFileGetMetadataFromBuf): ...here, and rename...
(virStorageFileGetMetadataInternal): ...to this.
(virStorageFileGetMetadataRecurse): Use internal helper.
virStorageFileGetMetadataFromFD is the only caller of
virStorageFileGetMetadataFromBuf; and it doesn't care about the
difference between a return of 0 (total success) or 1
(metadata was inconsistent, but pointer was populated as best
as possible); only about a return of -1 (could not read metadata
or out of memory). Changing the return type, and normalizing
the variable names used, will make merging the functions easier
in the next commit.
* src/util/virstoragefile.c (virStorageFileGetMetadataFromBuf):
Change return value, and rename some variables.
(virStorageFileGetMetadataFromFD): Rename some variables.
No semantic change; done so the next patch doesn't need a forward
declaration of a static function.
* src/util/virstoragefile.c (virStorageFileProbeFormatFromBuf):
Hoist earlier.
More mingw build failures:
CCLD libvirt-lxc.la
/usr/lib64/gcc/i686-w64-mingw32/4.7.2/../../../../i686-w64-mingw32/bin/ld: cannot find libvirt_lxc.def: No such file or directory
CC virportallocatortest-virportallocatortest.o
../../tests/virportallocatortest.c: In function 'main':
../../tests/virportallocatortest.c:195:1: error: implicit declaration of function 'setenv' [-Werror=implicit-function-declaration]
* src/Makefile.am (GENERATED_SYM_FILES): Also generate
libvirt_lxc.def.
* bootstrap.conf (gnulib_modules): Import setenv.
CC libvirt_util_la-vircommand.lo
../../src/util/vircommand.c:2358:1: error: 'virCommandHandshakeChild' defined but not used [-Werror=unused-function]
The function is only implemented inside #ifndef WIN32.
* src/util/vircommand.c (virCommandHandshakeChild): Hoist earlier,
so that win32 build doesn't hit an unused forward declaration.
No need to use HAVE_REGEX_H - our use of gnulib guarantees that
the header exists and works, regardless of platform. Similarly,
we can unconditionally assume a compiling <sys/wait.h> (although
the mingw version of this header is not full-featured).
* src/storage/storage_backend.c: Drop useless conditional.
* tests/testutils.c: Likewise.
virCommand was previously calling virSetUIDGID() to change the uid and
gid of the child process, then separately calling
virSetCapabilities(). This did not work if the desired uid was != 0,
since a setuid to anything other than 0 normally clears all
capabilities bits.
The solution is to use the new virSetUIDGIDWithCaps(), sending it the
uid, gid, and capabilities bits. This will get the new process setup
properly.
Since the static functions virSetCapabilities() and
virClearCapabilities are no longer called, they have been removed.
NOTE: When combined with "filecap $path-to-qemu sys_rawio", this patch
will make CAP_SYS_RAWIO (which is required for passthrough of generic
scsi commands to a guest - see commits e8daeeb, 177db08, 397e6a7, and
74e0349) be retained by qemu when necessary. Apparently that
capability has been broken for non-root qemu ever since it was
originally added.
Normally when a process' uid is changed to non-0, all the capabilities
bits are cleared, even those explicitly set with calls to
capng_update()/capng_apply() made immediately before setuid. And
*after* the process' uid has been changed, it no longer has the
necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities
bits, it is necessary to either call capng_change_id() (which
unfortunately doesn't currently call initgroups to setup auxiliary
group membership), or to perform the small amount of calisthenics
contained in the new utility function virSetUIDGIDWithCaps().
Another very important difference between the capabilities
setting/clearing in virSetUIDGIDWithCaps() and virCommand's
virSetCapabilities() (which it will replace in the next patch) is that
the new function properly clears the capabilities bounding set, so it
will not be possible for a child process to set any new
capabilities.
A short description of what is done by virSetUIDGIDWithCaps():
1) clear all capabilities then set all those desired by the caller (in
capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed
to change the capabilities bounding set).
2) call prctl(), telling it that we want to maintain current
capabilities across an upcoming setuid().
3) switch to the new uid/gid
4) again call prctl(), telling it we will no longer want capabilities
maintained if this process does another setuid().
5) clear the capabilities that we added to allow us to
setuid/setgid/change the bounding set (unless they were also requested
by the caller via the virCommand API).
Because the modification/maintaining of capabilities is intermingled
with setting the uid, this is necessarily done in a single function,
rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at
time of execve) for a process that has uid != 0, the *file*
capabilities of the binary being executed must also have the desired
capabilities bit(s) set (see "man 7 capabilities"). This can be done
with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio").
This is an interim measure to make sure everything still works in this
order. The next step will be to perform capabilities drop and
setuid/gid as a single operation (which is the only way to keep any
capabilities when switching to a non-root uid).
The qemu driver had been calling virSecurityManagerSetProcessLabel()
from a "pre-exec hook" function that is run after the child is forked,
but before exec'ing qemu. This is problematic because the uid and gid
of the child are set by the security driver, but capabilities are
dropped by virCommand - such separation doesn't work; the two
operations must be done together or the capabilities do not transfer
properly to the child process.
This patch switches to using virSecurityManagerSetChildProcessLabel(),
which is called prior to virCommandRun() (rather than being called
*during* virCommandrun() by the hook function), and doesn't set the
UID/GID/security label directly, but instead merely informs virCommand
what it should set them all to when the time is appropriate.
This lets virCommand choose to do the uid/gid and caps dropping all at
the same time if it wants (it does *want* to, but isn't doing so yet;
that's for an upcoming patch).
The existing virSecurityManagerSetProcessLabel() API is designed so
that it must be called after forking the child process, but before
exec'ing the child. Due to the way the virCommand API works, that
means it needs to be put in a "hook" function that virCommand is told
to call out to at that time.
Setting the child process label is a basic enough need when executing
any process that virCommand should have a method of doing that. But
virCommand must be told what label to set, and only the security
driver knows the answer to that question.
The new virSecurityManagerSet*Child*ProcessLabel() API is the way to
transfer the knowledge about what label to set from the security
driver to the virCommand object. It is given a virCommandPtr, and each
security driver calls the appropriate virCommand* API to tell
virCommand what to do between fork and exec.
1) in the case of the DAC security driver, it calls
virCommandSetUID/GID() to set a uid and gid that must be set for the
child process.
2) for the SELinux security driver, it calls
virCommandSetSELinuxLabel() to save a copy of the char* that will be
sent to setexeccon_raw() *after forking the child process*.
3) for the AppArmor security drivers, it calls
virCommandSetAppArmorProfile() to save a copy of the char* that will
be sent to aa_change_profile() *after forking the child process*.
With this new API in place, we will be able to remove
virSecurityManagerSetProcessLabel() from any virCommand pre-exec
hooks.
(Unfortunately, the LXC driver uses clone() rather than virCommand, so
it can't take advantage of this new security driver API, meaning that
we need to keep around the older virSecurityManagerSetProcessLabel(),
at least for now.)
virCommand gets two new APIs: virCommandSetSELinuxLabel() and
virCommandSetAppArmorProfile(), which both save a copy of a
null-terminated string in the virCommand. During virCommandRun, if the
string is non-NULL and we've been compiled with AppArmor and/or
SELinux security driver support, the appropriate security library
function is called for the child process, using the string that was
previously set. In the case of SELinux, setexeccon_raw() is called,
and for AppArmor, aa_change_profile() is called.
This functionality has been added so that users of virCommand can use
the upcoming virSecurityManagerSetChildProcessLabel() prior to running
a child process, rather than needing to setup a hook function to be
called (and in turn call virSecurityManagerSetProcessLabel()) *during*
the setup of the child process.
This makes it simpler to include the necessary system security driver
libraries for a particular system. For this patch, several existing
conditional sections from the Makfile were replaced; I'll later be
adding SECDRIVER_LIBS to libvirt_util_la_LIBADD, because vircommand.c
will be calling a function from $securitylib.
Setting the uid/gid of the child process was the only thing done by
the hook function in this case, and that can now be done more simply
with virCommandSetUID/GID.
Rather than treating uid:gid of 0:0 as a NOP, we blindly pass that
through to the lower layers. However, we *do* check for a requested
value of "-1" to mean "don't change this setting". setregid() and
setreuid() already interpret -1 as a NOP, so this is just an
optimization, but we are also calling getpwuid_r and initgroups, and
it's unclear what the former would do with a uid of -1.
If a uid and/or gid is specified for a command, it will be set just
after the user-supplied post-fork "hook" function is called.
The intent is that this can replace user hook functions that set
uid/gid. This moves the setting of uid/gid and dropping of
capabilities closer to each other, which is important since the two
should really be done at the same time (libcapng provides a single
function that does both, which we will be unable to use, but want to
mimic as closely as possible).
All args except "cmd" in the call to virExec are now redundant, since
they can all be found in cmd, so remove the args and reference the
data directly in cmd. One exception to this is that "infd" was being
modified within virExec, and modifying the original in cmd caused make
check failures, so cmd->infd is copied to a local, and the local is
used during virExec().
virExecWithHook is only called from one place, so it always has the
same "hook" function (virHookCommand), and the data sent to that
function is always a virCommandPtr, so eliminate the function and
generic data from the arglist, and replace it with "virCommandPtr
cmd". The call to (hook)(data) is replaced with
"virHookCommand(cmd)". Finally, virExecWithHook is renamed to virExec.
Indentation has been updated only for code that will remain after the
next patch, which will remove all other args to virExec (since they
are now redundant, as they're all members of virCommandPtr).
With the majority of fields in the virQEMUDriverPtr struct
now immutable or self-locking, there is no need for practically
any methods to be using the QEMU driver lock. Only a handful
of helper APIs in qemu_conf.c now need it
Currently, if a command wants to do asynchronous IO, a callback
is registered in the libvirtd eventloop to handle writes and
reads. However, there's a race in virCommandWait. The eventloop
may already be executing the callback, while virCommandWait is
mangling internal state of virCommand. To deal with it, we need
to either introduce locking or spawn a separate thread where we
poll() on stdio from child. The former, however, requires to
unlock all mutexes held, as the event loop may execute other
callbacks which tries to lock one of the mutexes, deadlock and
thus never wake us up. So it's safer to spawn a separate thread.
Commit 8b55992f added some Coverity comments to silence what was
a real bug in the code. Since then, we've had a miserable run
of trying to fix the underlying problem (commits c059cde and
ba5193c), and still have a problem on 32-bit machines.
This fixes the problem for once and for all, by realizing that
on older xen, cpumap_t is identical to uint64_t, and using the
new virendian.h to do the transformation from the API (documented
to be little-endian) to the host structure.
* src/xen/xen_hypervisor.c (virXen_setvcpumap): Do the conversion
correctly. Finally.
This makes code easier to read, by avoiding lines longer than
80 columns and removing the repetition from the callers.
* src/util/virstoragefile.c (qedGetHeaderUL, qedGetHeaderULL):
Delete in favor of more generic macros.
(qcow2GetBackingStoreFormat, qcowXGetBackingStore)
(qedGetBackingStore, virStorageFileMatchesVersion)
(virStorageFileGetMetadataInternal): Use new macros.
* src/cpu/cpu_x86.c (x86VendorLoad): Likewise.
We have several cases where we need to read endian-dependent
data regardless of host endianness; rather than open-coding
these call sites, it will be nicer to funnel things through
a macro.
The virendian.h file can be expanded to add writer functions,
and/or 16-bit access patterns, if needed. Also, if we need
to turn things into a function to avoid multiple evaluations
of buf, that can be done later. But for now, a macro worked.
* src/util/virendian.h: New file.
* src/Makefile.am (UTIL_SOURCES): Ship it.
* tests/virendiantest.c: New test.
* tests/Makefile.am (test_programs, virendiantest_SOURCES): Run
the test.
* .gitignore: Ignore built file.
When removing a VM from the virDomainObjListPtr, we must not
be holding the VM lock while acquiring the list lock. Re-order
code to ensure that we can release the VM lock early.
The hook scripts used by virCommand must be careful wrt
accessing any mutexes that may have been held by other
threads in the parent process. With the recent refactoring
there are 2 potential flaws lurking, which will become real
deadlock bugs once the global QEMU driver lock is removed.
Remove use of the QEMU driver lock from the hook function
by passing in the 'virQEMUDriverConfigPtr' instance directly.
Add functions to the virSecurityManager to be invoked before
and after fork, to ensure the mutex is held by the current
thread. This allows it to be safely used in the hook script
in the child process.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
On RHEL 5, I got:
security/security_selinux.c: In function 'getContext':
security/security_selinux.c:971: warning: unused parameter 'mgr' [-Wunused-parameter]
* src/security/security_selinux.c (getContext): Mark potentially
unused parameter.
Add necessary handling code for the new s390 CCW address type to
virDomainDeviceInfo. Further, introduce memory management, XML
parsing, output formatting and range validation for the new
virDomainDeviceCCWAddress type.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
This just simply changes nodeDeviceLookupByWWN to be not static,
and its name into nodeDeviceLookupSCSIHostByWWN. And use that for
udev and HAL backends.
Like virNodeDeviceCreateXML, virNodeDeviceLookupSCSIHostByWWN
has to be treated specially when generating the RPC codes. Also
new rules are added in fixup_name to keep the name SCSIHostByWWN.
Since the name (like scsi_host10) is not stable for vHBA, (it can
be changed either after recreating or system rebooting), current
API virNodeDeviceLookupByName is not nice to use for management app
in this case. (E.g. one wants to destroy the vHBA whose name has
been changed after system rebooting, he has to find out current
name first).
Later patches will support the persistent vHBA via storage pool,
with which one can identify the vHBA stably by the wwnn && wwpn
pair.
So this new API comes.
The security manager drivers are not allowed to call back
out to top level security manager APIs, since that results
in recursive mutex acquisition and thus deadlock. Remove
calls to virSecurityManagerGetModel from SELinux / AppArmor
drivers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Turns out the issue regarding ptr_arith and sign_exension weren't false
positives. When shifting an 'unsigned char' as a target, it gets promoted
to an 'int'; however, that 'int' cannot be shifted 32 bits which was how
the algorithm was written. For the ptr_arith rather than index into the
cpumap, change the to address as necessary and assign directly.
Arguments for driver entry points are checked in libvirt.c, so no need to
check again. Make function entry points consistent. Don't type caste the
privateData.
Arguments for driver entry points are checked in libvirt.c, so no need to
check again. Make function entry points consistent. Don't type caste the
privateData.
Arguments for driver entry points are checked in libvirt.c, so no need to
check again. Make function entry points consistent. Don't type caste the
privateData.
Currently the APIs for managing the shared disk list take
a virHashTablePtr as the primary argument. This is bad
because it requires the caller to deal with locking of
the QEMU driver. Switch the APIs to take the full
virQEMUDriverPtr instance
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add locking to virSecurityManagerXXX APIs, so that use of the
security drivers is internally serialized. This avoids the need
to rely on the global driver locks to achieve serialization
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To enable locking to be introduced to the security manager
objects later, turn virSecurityManager into a virObjectLockable
class
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of creating an iptables command in one shot, do it in steps
so we can add conditional options like physdev and protocol.
This removes code duplication while keeping existing behaviour.
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
From qemu's point of view these are still just tap devices, so there's
no reason they shouldn't work with vhost-net; as a matter of fact,
Raja Sivaramakrishnan <srajag00@yahoo.com> verified on libvir-list
that at least the qemu_command.c part of this patch works:
https://www.redhat.com/archives/libvir-list/2012-December/msg01314.html
(the hotplug case is extrapolation on my part).
The 'driver->caps' pointer can be changed on the fly. Accessing
it currently requires the global driver lock. Isolate this
access in a single helper, so a future patch can relax the
locking constraints.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To avoid confusion between 'virCapsPtr' and 'qemuCapsPtr'
do some renaming of various fucntions/variables. All
instances of 'qemuCapsPtr' are renamed to 'qemuCaps'. To
avoid that clashing with the 'qemuCaps' typedef though,
rename the latter to virQEMUCaps.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To enable virCapabilities instances to be reference counted,
turn it into a virObject. All cases of virCapabilitiesFree
turn into virObjectUnref
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupPtr instance APIs are safe to use without locking
in the QEMU driver, since all internal state they rely on is
immutable. Update the comment to reflect this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We are requesting for stderr catching for all cases in
virFileWrapperFdNew(). There is no need to have a separate
function just to report an error, esp. when we can do it in
virFileWrapperFdClose().
The qemuParseGlusterString() replaced dst->src without a VIR_FREE() of
what was in there before.
The qemuBuildCommandLine() did not properly free the boot_buf depending
on various usages.
The qemuParseCommandLineDisk() had numerous paths that didn't clean up
the virDomainDiskDefPtr def properly. Adjust the logic to go through an
error: label before cleanup in order to free the resource.
Commit 2025356 missed uses of PCI functions in the older HAL-related
code, probably because hal-devel is no longer available in latest Fedora.
* src/node_device/node_device_hal.c (gather_pci_cap): Reflect
function rename.
We had an easy way to iterate set bits, but not for iterating
cleared bits.
* src/util/virbitmap.h (virBitmapNextClearBit): New prototype.
* src/util/virbitmap.c (virBitmapNextClearBit): Implement it.
* src/libvirt_private.syms (bitmap.h): Export it.
* tests/virbitmaptest.c (test4): Test it.
The conditional setting of cmdout in networkBuildDhcpDaemonCommandLine()
caused Coverity to complain that 'cmd' could be leaked if !cmdout. Since
the function is local and only called with cmdout being passed those checks
have been removed.
Currently the activePciHostdevs, inactivePciHostdevsd and
activeUsbHostdevs lists are all implicitly protected by the
QEMU driver lock. Now that the lists all inherit from the
virObjectLockable, we can make the locking explicit, removing
the dependency on the QEMU driver lock for correctness.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow modifications to the lists to be synchronized, convert
virPCIDeviceList and virUSBDeviceList into virObjectLockable
classes. The locking, however, will not be self-contained. The
users of these classes will have to call virObjectLock/Unlock
in the critical regions.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When iterating over USB host devices to setup cgroups, the
usbDevice object was leaked in both LXC and QEMU driers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The QEMU driver struct has a 'qemuVersion' field that was previously
used to cache the version lookup from capabilities. With the recent
QEMU capabilities rewrite the caching happens at a lower level so
this field is pointless. Removing it avoids worries about locking
when updating it.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Switch virDomainObjList to inherit from virObjectLockable and
make all the APIs acquire/release the mutex when running. This
makes virDomainObjList completely self-locking and no longer
reliant on the hypervisor driver locks
The duplicate VM checking should be done atomically with
virDomainObjListAdd, so shoud not be a separate function.
Instead just use flags to indicate what kind of checks are
required.
This pair, used in virDomainCreateXML:
if (virDomainObjListIsDuplicate(privconn->domains, def, 1) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, false)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
NULL)))
goto cleanup;
This pair, used in virDomainRestoreFlags:
if (virDomainObjListIsDuplicate(privconn->domains, def, 1) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, true)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
NULL)))
goto cleanup;
This pair, used in virDomainDefineXML:
if (virDomainObjListIsDuplicate(privconn->domains, def, 0) < 0)
goto cleanup;
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def, false)))
goto cleanup;
Changes to
if (!(dom = virDomainObjListAdd(privconn->domains,
privconn->caps,
def,
0, NULL)))
goto cleanup;
Otherwise, we get a lot of scary (but harmless) noise in the logs:
2013-02-05 15:35:48.555+0000: 8637: error : qemuMonitorJSONCheckError:353 : internal error unable to execute QEMU command 'add-fd': Parameter 'fdset-id' expects an existing fdset-id
one for every qemu 1.2 binary that we probe.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONAddFd): During
probe, avoid logging failures.
As a step towards making virDomainObjList thread-safe turn it
into an opaque virObject, preventing any direct access to its
internals.
As part of this a new method virDomainObjListForEach is
introduced to replace all existing usage of virHashForEach
Currently the virQEMUDriverPtr struct contains an wide variety
of data with varying access needs. Move all the static config
data into a dedicated virQEMUDriverConfigPtr object. The only
locking requirement is to hold the driver lock, while obtaining
an instance of virQEMUDriverConfigPtr. Once a reference is held
on the config object, it can be used completely lockless since
it is immutable.
NB, not all APIs correctly hold the driver lock while getting
a reference to the config object in this patch. This is safe
for now since the config is never updated on the fly. Later
patches will address this fully.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If a compression binary prints something to stderr, currently
it is discarded. However, it can contain useful data from
debugging POV, so we should catch it.
If a decompression binary prints something to stderr, currently
it is discarded. However, it can contain useful data from
debugging POV, so we should catch it.
Commit 34e8f63a32 introduced support for catching errors from
libvirt iohelper. However, at those times there wasn't such fancy
API as virCommandDoAsyncIO(), so everything has to be implemented
on our own. But since we do have the API now, we can use it and
drop our implementation then.
Currently, if we want to feed stdin, or catch stdout or stderr of a
virCommand we have to use virCommandRun(). When using virCommandRunAsync()
we have to register FD handles by hand. This may lead to code duplication.
Hence, introduce an internal API, which does this automatically within
virCommandRunAsync(). The intended usage looks like this:
virCommandPtr cmd = virCommandNew*(...);
char *buf = NULL;
...
virCommandSetOutputBuffer(cmd, &buf);
virCommandDoAsyncIO(cmd);
if (virCommandRunAsync(cmd, NULL) < 0)
goto cleanup;
...
if (virCommandWait(cmd, NULL) < 0)
goto cleanup;
/* @buf now contains @cmd's stdout */
VIR_DEBUG("STDOUT: %s", NULLSTR(buf));
...
cleanup:
VIR_FREE(buf);
virCommandFree(cmd);
Note, that both stdout and stderr buffers may change until virCommandWait()
returns.
libvirt.c calls curl_global_init() if WITH_CURL is defined and thus it
should be linked with libcurl. This fixes link failure in case neither
xenapi nor esx driver is enabled (they are the only users of libcurl).
QEMU is fully capable of handling VDI images and we just refuse to
work with them. As qemu-img knows and supports this, there should be
no problem with this addition.
This is of course, just basic functionality, without searching for any
backing files, etc.
Some files have the magic shifted to some offset other than 0, so we
have to support that. I also cleaned up some lines to be more
readable and added missing magic for iso file format.
Commit 4445e16bfa changed the signature
of esxConnectToHost and esxConnectToVCenter by replacing the esxPrivate
pointer with virConnectPtr. The esxPrivate pointer was then retrieved
again from virConnectPtr's privateData. This resulted in a NULL pointer
dereference, because the privateData pointer was not yet initialized at
the point where esxConnectToHost and esxConnectToVCenter are called.
This was fixed in commit b126715a48 that
moved the initialization of privateData before the problematic calls.
Simplify the logic by making the call to esxFreePrivate unconditional and
changing esxConnectToHost and esxConnectToVCenter back to take a esxPrivate
pointer directly. This allows to assign esxPrivate to the virConnectPtr's
privateData pointer as one of the last steps in esxOpen making it more
obvious that it is not initialized during the earlier steps of esxOpen.
Commit 6094ad7b (0.9.3 release) promoted several functions from
internal to public, but forgot to fix the documentation generator
to provide details about those functions.
For an example of what this fixes, look at:
file:///path/to/libvirt/docs/html/libvirt-libvirt.html#virEventAddHandle
before and after the patch.
* docs/apibuild.py (ignored_functions): Don't ignore functions
that were turned into official API.
* src/util/virevent.c: Fix comments to pass through parser.
Add support for QEMU -add-fd command line parameter detection.
This intentionally rejects qemu 1.2, where 'add-fd' QMP did
not allow full control of set ids, and where there was no command
line counterpart, but accepts qemu 1.3.
Signed-off-by: Eric Blake <eblake@redhat.com>
Add entry points for calling the qemu 'add-fd' and 'remove-fd'
monitor commands. There is no entry point for 'query-fdsets';
the assumption is that a developer can use
virsh qemu-monitor-command domain '{"execute":"query-fdsets"}'
when debugging issues, and that meanwhile, libvirt is responsible
enough to remember what fds it associated with what fdsets.
Likewise, on the 'add-fd' command, it is assumed that libvirt
will always pass a set id, rather than letting qemu autogenerate
the next available id number.
* src/qemu/qemu_monitor.c (qemuMonitorAddFd, qemuMonitorRemoveFd):
New functions.
* src/qemu/qemu_monitor.h (qemuMonitorAddFd, qemuMonitorRemoveFd):
New prototypes.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONAddFd)
(qemuMonitorJSONRemoveFd): New functions.
* src/qemu/qemu_monitor_json.h (qemuMonitorJSONAddFd)
(qemuMonitorJSONRemoveFd): New prototypes.
Way back when I started making changes for Coverity messages my first set
were to a bunch of CHECKED_RETURN errors. In particular virAsprintf() had
a few callers that Coverity noted didn't check their return (although some
did check if the buffer being printed to was NULL or not).
It was suggested at the time as a further patch an ATTRIBUTE_RETURN_CHECK
should be added to virAsprintf(), see:
https://www.redhat.com/archives/libvir-list/2013-January/msg00120.html
This patch does that and fixes a few more instances not found by Coverity
that failed the check.
When a disk-only snapshot is requested the domain is treated as if it
was offline. This forbids to mix memory checkpoints with the DISK_ONLY
flag.
This patch improves the error message and mentions the restriction in
the virsh man page.
Commit 60b176c3d0 introduced a bug that
when editing an XML with cputune similar to this:
...
<vcpu placement='static' current='1'>2</vcpu>
<cputune>
<vcpupin vcpu="1" cpuset="0"/>
</cputune>
...
results in formatted XML that looks like this:
...
<vcpu placement='static' current='1'>2</vcpu>
<cputune>
</cputune>
...
That is caused by a condition depending on def->cputune.vcpupin being
set rather than checking def->cputune.nvcpupin. Notice that nvcpupin
can be 0 and vcpupin can still be allocated since it's a pointer to an
array, so no harm done there.
I also changed it on other places in the code where it depended on the
wrong variable.
Setting the log output prefix to 0 is not supported and in fact results
in the following message:
warning : virLogParseOutputs:1021 : Ignoring invalid log output setting.
https://bugzilla.redhat.com/show_bug.cgi?id=894723
Currently, if qemuProcessStart() succeeds, but it's decompression
binary that returns nonzero status, we don't kill the qemu process,
but remove it from internal domain list, leaving the qemu process
hanging around totally uncontrolled.
This patch resolves CVE-2013-0170:
https://bugzilla.redhat.com/show_bug.cgi?id=893450
When reading and dispatching of a message failed the message was freed
but wasn't removed from the message queue.
After that when the connection was about to be closed the pointer for
the message was still present in the queue and it was passed to
virNetMessageFree which tried to call the callback function from an
uninitialized pointer.
This patch removes the message from the queue before it's freed.
* rpc/virnetserverclient.c: virNetServerClientDispatchRead:
- avoid use after free of RPC messages
https://bugzilla.redhat.com/show_bug.cgi?id=892289
It seems like with new udev within guest OS, the tray is locked,
so we need to:
- 'eject'
- wait for tray to open
- 'change'
Moreover, even when doing bare 'eject', we should check for
'tray_open' as guest may have locked the tray. However, the
waiting phase shouldn't be unbounded, so I've chosen 10 retries
maximum, each per 500ms. This should give enough time for guest
to eject a media and open the tray.
Adjust the macros to free memory allocated during various calls to
perform the check if parameter is NULL prior to really freeing and to
set the pointer to NULL after done freeing.
Resolve a false positive from 'vboxIIDFromUUID_v2_x()'. The code sets
'iid->value = &iid->backing' unconditionally prior to calling 'nsIDFromChar()'.
The 'vboxIIDUnalloc_v2_x()' checks iid->value to not be &iid->backing. The
iid->backing is a static buffer within the initialized structure.
Since libxl provides the domain ID in the event handler callback,
find the domain object based on the ID. This approach prevents
processing the callback on a domain that has already been reaped.
Also, similar to the xl implementation, ignore the SUSPEND shutdown
reason. By calling libxl_domain_suspend(), we know a shutdown
event with SUSPEND reason will be generated, but it can be safely
ignored since any subsequent cleanup will be done by the callers.
libxlDoDomainSave() was removing non-persistent domains, but
required callers to have the virDomainObj locked. Callers could
potentially unlock an already freed virDomainObj. Move this
logic to the callers of libxlDoDomainSave().
I've noticed that libxl can invoke timeout reregister/modify hooks
after returning from libxl_ctx_free. Explicitly remove the
timeouts before freeing the libxl ctx to avoid executing hooks on
stale objects.
It is possible to destroy and cleanup a VM, resulting in freeing the
libxlDomainObjPrivate object and associated libxl ctx, before all fds and
timeouts have been deregistered and destroyed.
Fix this race by incrementing the reference count on libxlDomainObjPrivate
for each fd and timeout registration. Only when all fds and timeouts are
deregistered and destroyed will the libxlDomainObjPrivate be destroyed.
The libxl driver is racy in it's interactions with libxl and libvirt's
event loop. The event loop can invoke callbacks after libxl has
deregistered the event, and possibly access freed data associated with
the event.
This patch fixes the race by converting libxlDomainObjPrivate to a
virObjectLockable, and locking it while executing libxl upcalls and
libvirt event loop callbacks.
Note that using the virDomainObj lock is not satisfactory since it may
be desirable to hold the virDomainObj lock even when libxl events such
as reading and writing to xenstore need processed.
xen-unstable changeset 26469 makes changes wrt modifying and deregistering
timeouts.
First, timeout modify callbacks will only be invoked with an
abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this
commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be
compatible before and after changeset 26469.
While at it, fix a potential overflow in the timeout register callback.
While working with a pmsuspend vs. snapshot issue, I noticed that
the state file in /var/run/libvirt/qemu/dom.xml contained a rather
suspicious "(null)" string, which does not round-trip well through
a libvirtd restart. Had I been on a platform other than glibc
where printf("%s",NULL) crashes instead of printing (null), we might
have noticed the problem much sooner.
And in fixing that problem, I also noticed that we had several
missing states, because we were #defining several *_LAST names
to a value _different_ than what they were already given as enums
in libvirt.h. Yuck. I got rid of default: labels in the case
statements, because they get in the way of gcc's -Wswitch helping
us ensure we cover all enum values.
* src/conf/domain_conf.c (virDomainStateReasonToString)
(virDomainStateReasonFromString): Fill in missing domain states;
rewrite case statement to let compiler enforce checking.
(VIR_DOMAIN_NOSTATE_LAST, VIR_DOMAIN_RUNNING_LAST)
(VIR_DOMAIN_BLOCKED_LAST, VIR_DOMAIN_PAUSED_LAST)
(VIR_DOMAIN_SHUTDOWN_LAST, VIR_DOMAIN_SHUTOFF_LAST)
(VIR_DOMAIN_CRASHED_LAST): Drop dead defines.
(VIR_DOMAIN_PMSUSPENDED_LAST): Drop dead define.
(virDomainPMSuspendedReason): Add missing enum function.
(virDomainRunningReason, virDomainPausedReason): Add missing enum
value.
* src/conf/domain_conf.h (virDomainPMSuspendedReason): Declare
missing functions.
* src/libvirt_private.syms (domain_conf.h): Export them.
With our code, we fail to query for tray-open attribute currently.
That's because in HMP it is 'tray-open' and in QMP it's 'tray_open'.
It always has been. However, we got it exactly the opposite.
A logic bug meant we reported KVM was possible for every
architecture, merely based on whether the query-kvm command
exists. We should instead have been doing it based on whether
the query-kvm command returns 'present: 1'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently QEMU capabilities are initialized before the QEMU driver
sets ownership on its various directories. The upshot is that if
you change the user/group in the qemu.conf file, libvirtd will fail
to probe QEMU the first time it is run after the config change.
Moving QEMU capabilities initialization to after the chown() calls
fixes this
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This previous commit
commit 1a50ba2cb0
Author: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Date: Mon Nov 26 15:17:13 2012 +0100
qemu: Fix QMP Capabability Probing Failure
which attempted to make sure the QEMU process used for probing
ran as the right user id, caused serious performance regression
and unreliability in probing. The -daemonize switch in QEMU
guarantees that the monitor socket is present before the parent
process exits. This means libvirtd is guaranteed to be able to
connect immediately. By switching from -daemonize to the
virCommandDaemonize API libvirtd was no longer synchronized with
QEMU's startup process. The result was that the QEMU monitor
failed to open and went into its 200ms sleep loop. This happened
for all 25 binaries resulting in 5 seconds worth of sleeping
at libvirtd startup. In addition sometimes when libvirt connected,
QEMU would be partially initialized and crash causing total
failure to probe that binary.
This commit reverts the previous change, ensuring we do use the
-daemonize flag to QEMU. Startup delay is cut from 7 seconds
to 2 seconds on my machine, which is on a par with what it was
prior to the capabilities rewrite.
To deal with the fact that QEMU needs to be able to create the
pidfile, we switch pidfile location fron runDir to libDir, which
QEMU is guaranteed to be able to write to.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there is no reason to hold qemu driver locked
throughout whole API execution. Moreover, we can use the
new qemuDomObjFromDomain() internal API to lookup domain then.
Hosts for rbd are ceph monitor daemons. These have fixed IP addresses,
so they are often referenced by IP rather than hostname for
convenience, or to avoid relying on DNS. Using IPv4 addresses as the
host name works already, but IPv6 addresses require rbd-specific
escaping because the colon is used as an option separator in the
string passed to qemu.
Escape these colons, and enclose the IPv6 address in square brackets
so it is distinguished from the port, which is currently mandatory.
Acked-by: Osier Yang <jyang@redhat.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
https://bugzilla.redhat.com/show_bug.cgi?id=876829 complains that
if a guest is put into S3 state (such as via virsh dompmsuspend)
and then an external snapshot is taken, qemu forcefully transitions
the domain to paused, but libvirt doesn't reflect that change
internally. Thus, a user has to use 'virsh suspend' to get libvirt
back in sync with qemu state, and if the user doesn't know this
trick, then the guest appears hung.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateActiveExternal):
Track fact that qemu wakes up a suspended domain on migration.