THREADS.txt states that the contents of vm should not be read or
modified while the vm lock is not held, but that the lock must not
be held while performing a monitor command. This fixes all the
offenders that I could find.
* src/qemu/qemu_process.c (qemuProcessStartCPUs)
(qemuProcessInitPasswords, qemuProcessStart): Don't modify or
refer to vm state outside lock.
* src/qemu/qemu_driver.c (qemudDomainHotplugVcpus): Likewise.
* src/qemu/qemu_hotplug.c (qemuDomainChangeGraphicsPasswords):
Likewise.
This is detailed in:
https://bugzilla.redhat.com/show_bug.cgi?id=688957
Since radvd is executed by daemonizing it, the attempt to exec the
radvd binary doesn't happen until after libvirtd has already received
an exit code from the intermediate forked process, so no error is
detected or logged by __virExec().
We can't require radvd as a prerequisite for the libvirt package (many
installations don't use IPv6, so they don't need it), so instead we
add in a check to verify there is an executable radvd binary prior to
trying to exec it.
When SASL is active, it was possible that we read and decoded
more data off the wire than we initially wanted. The loop
processing this data terminated after only one message to
avoid delaying the calling thread, but this could delay
event delivery. As long as there is decoded SASL data in
memory, we must process it, before returning to the poll()
event loop.
This is a counterpart to the same kind of issue solved in
commit 68d2c3482f
in a different area of the code
* src/remote/remote_driver.c: Process all pending SASL data
virExec would only resolved the binary to $PATH if no env
variables were being set. Since there is no execvep() API
in POSIX, we use virFindFileInPath to manually resolve
the binary and then use execv() instead of execvp().
Add a new xen driver based on libxenlight [1], which is the primary
toolstack starting with Xen 4.1.0. The driver is stateful and runs
privileged only.
Like the existing xen-unified driver, the libxenlight driver is
accessed with xen:// URI. Driver selection is based on the status
of xend. If xend is running, the libxenlight driver will not load
and xen:// connections are handled by xen-unified. If xend is not
running *and* the libxenlight driver is available, xen://
connections are deferred to the libxenlight driver.
V6:
- Address several code style issues noted by Daniel Veillard
- Make drive work with xen:/// URI
- Hold domain object reference while domain is injected in
libvirt event loop. Race found and fixed by Markus Groß.
V5:
- Ensure events are unregistered when domain private data
is destroyed. Discovered and fixed by Markus Groß.
V4:
- Handle restart of libvirtd, reconnecting to previously
started domains
- Rebased to current master
- Tested against Xen 4.1 RC7-pre (c/s 22961:c5d121fd35c0)
V3:
- Reserve vnc port within driver when autoport=yes
V2:
- Update to Xen 4.1 RC6-pre (c/s 22940:5a4710640f81)
- Rebased to current master
- Plug memory leaks found by Stefano Stabellini and valgrind
- Handle SHUTDOWN_crash domain death event
[1] http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00436.html
Calling most hash APIs is not safe from inside of an iterator callback.
Exceptions are APIs that do not modify the hash table and removing
current hash entry from virHashFroEach callback.
This patch make all APIs which are not safe fail instead of just relying
on the callback being nice not calling any unsafe APIs.
Steps to reproduce this bug:
# cat test.sh
#! /bin/bash -x
virsh start domain
sleep 5
virsh qemu-monitor-command domain 'cpu_set 2 online' --hmp
# while true; do ./test.sh ; done
Then libvirtd will crash.
The reason is that:
we add a reference of obj when we open the monitor. We will reduce this
reference when we free the monitor.
If the reference of monitor is 0, we will free monitor automatically and
the reference of obj is reduced.
But in the function qemuDomainObjExitMonitorWithDriver(), we reduce this
reference again when the reference of monitor is 0.
It will cause the obj be freed in the function qemuDomainObjEndJob().
Then we start the domain again, and libvirtd will crash in the function
virDomainObjListSearchName(), because we pass a null pointer(obj->def->name)
to strcmp().
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
This bug was reported by Shi Jin(jinzishuai@gmail.com):
=============
# virsh attach-disk RHEL6RC /var/lib/libvirt/images/test3.img vdb \
--driver file --subdriver qcow2
Disk attached successfully
# virsh save RHEL6RC /var/lib/libvirt/images/memory.save
Domain RHEL6RC saved to /var/lib/libvirt/images/memory.save
# virsh restore /var/lib/libvirt/images/memory.save
error: Failed to restore domain from /var/lib/libvirt/images/memory.save
error: internal error unsupported driver name 'file'
for disk '/var/lib/libvirt/images/test3.img'
=============
We check the driver name when we start or restore VM, but we do
not check it while attaching a disk. This adds the same check on disk
driverName used in qemuBuildCommandLine to qemudDomainAttachDevice.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
As pointed out, locking the buffer from the signal handler
cannot been guaranteed to be safe, so to avoid any hazard
we prefer the trade off of dumping logs possibly messed up
by concurrent logging activity rather than risk a daemon
crash.
* src/util/logging.c: change virLogEmergencyDumpAll() to not
take any lock on the log buffer but reset buffer content variables
to an empty set before starting the actual dump.
Steps to reproduce this bug:
# virsh qemu-monitor-command domain 'cpu_set 2 online' --hmp
The domain has 2 cpus, and we try to set the third cpu online.
The qemu crashes, and this command will hang.
The reason is that the refs is not 1 when we unwatch the monitor.
We lock the monitor, but we do not unlock it. So virCondWait()
will be blocked.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
* Correct the documentation for cgroup: the swap_hard_limit indicates
mem+swap_hard_limit.
* Change cgroup private apis to: virCgroupGet/SetMemSwapHardLimit
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
I'm proposing we make use of $PCIDIR/reset in qemu-kvm to reset
devices on VM reset. We need to add it to libvirt's list of
files that get ownership for device assignment.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
xen-unstable c/s 21118:28e5409e3fb3 bumped sysctl version to 8.
xen-unstable c/s 21212:de94884a669c introduced CPU pools feature,
adding another member to xen_domctl_getdomaininfo struct. Add a
corresponding domctl v7 struct in xen hypervisor sub-driver and
detect sysctl v8 during initialization.
The virCond of the remote_thread_call struct was leaked in some
places. This results in leaking the underlying mutex. Which in turn
leaks a handle on Windows.
Reported by Aliaksandr Chabatar and Ihar Smertsin.
A bug in libnl (see https://bugzilla.redhat.com/show_bug.cgi?id=677724
and https://bugzilla.redhat.com/show_bug.cgi?id=677725) makes it very
easy to create a failure to connect to the netlink socket when trying
to open a macvtap network device ("type='direct'" in domain interface
XML). When that error occurred (during a call to libnl's nl_connect()
from libvirt's nlComm(), there was no log message, leading virsh (for
example) to report "unknown error".
There were two other cases in nlComm where an error in a libnl
function might return with failure but no error reported. In all three
cases, this patch logs a message which will hopefully be more useful.
Note that more detailed information about the failure might be
available from libnl's nl_geterror() function, but it calls
strerror(), which is not threadsafe, so we can't use it.
If pool xml has no definition for "port", then "Segmentation fault"
happens when jumping to "cleanup:" to do "VIR_FREE(port)", as "port"
was not initialized in this situation.
* src/conf/storage_conf.c
POSIX states about dd:
If the bs=expr operand is specified and no conversions other than
sync, noerror, or notrunc are requested, the data returned from each
input block shall be written as a separate output block; if the read
returns less than a full block and the sync conversion is not
specified, the resulting output block shall be the same size as the
input block. If the bs=expr operand is not specified, or a conversion
other than sync, noerror, or notrunc is requested, the input shall be
processed and collected into full-sized output blocks until the end of
the input is reached.
Since we aren't using conv=sync, there is no zero-padding, but our
use of bs= means that a short read results in a short write. If
instead we use ibs= and obs=, then short reads are collected and dd
only has to do a single write, which can make dd more efficient.
* src/qemu/qemu_monitor.c (qemuMonitorMigrateToFile):
Avoid 'dd bs=', since it can cause short writes.
The virCommandNewArgs() method would free the virCommandPtr
if it failed to add the args. This meant errors reported in
virCommandAddArgSet() were lost. Simply removing the check
for errors from the constructor means they can be reported
correctly later
The virCommandAddEnvPassCommon() method failed to check for
errors before reallocating the cmd->env array, causing a
potential SEGV if cmd was NULL
The virCommandAddArgSet() method needs to validate that at
least 1 element in 'val's parameter is non-NULL, otherwise
code like
cmd = virCommandNew(binary)
virCommandAddAtg(cmd, "foo")
Would end up trying todo execve("foo"), if binary was
NULL.
The virSetNonBlock() API only allows enabling non-blocking
operations. It doesn't allow turning blocking back on. Add
a new API to allow arbitrary toggling.
* src/libvirt_private.syms, src/util/util.h
src/util/util.c: Add virSetBlocking
This patch fix a simple bug in virDomainSetMemoryFlags function.
The patch sent before lacks the consideration of the case
where the driver doesn't support virDomainSetMemoryFlags API.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
The current LXC I/O controller looks for HUP to detect
when a guest has quit. This isn't reliable as during
initial bootup it is possible that 'init' will close
the console and let mingetty re-open it. The shutdown
of containers was also flakey because it only killed
the libvirt I/O controller and expected container
processes to gracefully follow.
Change the I/O controller such that when it see HUP
or an I/O error, it uses kill($PID, 0) to see if the
process has really quit.
Change the container shutdown sequence to use the
virCgroupKillPainfully function to ensure every
really goes away
This change makes the use of the 'cpu', 'devices'
and 'memory' cgroups controllers compulsory with
LXC
* docs/drvlxc.html.in: Document that certain cgroups
controllers are now mandatory
* src/lxc/lxc_controller.c: Check if PID is still
alive before quitting on I/O error/HUP
* src/lxc/lxc_driver.c: Use virCgroupKillPainfully
This is the part allowing to dynamically resize the debug log
buffer from it's default 64kB size. The buffer is now dynamically
allocated.
It adds a new API virLogSetBufferSize() which resizes the buffer
If passed a zero size, the buffer is deallocated and we do the small
optimization of not formatting messages which are not output anymore.
On the daemon side, it just adds a new option log_buffer_size to
libvirtd.conf and call virLogSetBufferSize() if needed
* src/util/logging.h src/util/logging.c src/libvirt_private.syms:
make buffer dynamic and add virLogSetBufferSize() internal API
* daemon/libvirtd.conf: document the new log_buffer_size option
* daemon/libvirtd.c: read and use the new log_buffer_size option
Outgoing migration still uses a Unix socket and or exec netcat until
the next patch.
* src/qemu/qemu_migration.c (qemuMigrationPrepareTunnel):
Replace Unix socket with simpler pipe.
Suggested by Paolo Bonzini.
The newly added call to qemuAuditNetDevice in qemuPhysIfaceConnect was
assuming that res_ifname (the name of the macvtap device) was always
valid, but this isn't the case. If openMacvtapTap fails, it always
returns NULL, which would result in a segv.
Since the audit log only needs a record of devices that are actually
sent to qemu, and a failure to open the macvtap device means that no
device will be sent to qemu, we can solve this problem by only doing
the audit if openMacvtapTap is successful (in which case res_ifname is
guaranteed valid).
Normally dnsmasq will send a default route (the address of the host in
the network definition) to any client requesting an address via
DHCP. On an isolated network this makes no sense, as we have iptables
to prevent any traffic going out via that interface, so anything sent
that way would be dropped anyway.
This extra/unusable default route becomes problematic if you have
setup a guest with multiple network interfaces, with one connected to
an isolated network and another that provides connectivity to the
outside (example - one interface directly connecting to a physical
interface via macvtap, with a second connected to an isolated network
so that the host and guest can communicate (macvtap doesn't support
guest<->host communication without an external switch that supports
vepa, or reflecting all traffic back)). In this case, if the guest
chooses the default route of the isolated network, the guest will not
be able to get network traffic beyond the host.
To prevent dnsmasq from sending a default route, you can tell it to
send 0 bytes of data for the default route option (option number 3)
with --dhcp-option=3 (normally the data to send for the option would
follow the option number; no extra data means "don't send this option").
I have checked on RHEL5 (a good representative of the oldest supported
libvirt platforms) and its version of dnsmasq (2.45) does support
--dhcp-option, so this shouldn't create any compatibility problems.
As pointed on CVE-2011-1146, some API forgot to check the read-only
status of the connection for entry point which modify the state
of the system or may lead to a remote execution using user data.
The entry points concerned are:
- virConnectDomainXMLToNative
- virNodeDeviceDettach
- virNodeDeviceReAttach
- virNodeDeviceReset
- virDomainRevertToSnapshot
- virDomainSnapshotDelete
* src/libvirt.c: fix the above set of entry points to error on read-only
connections
By default, all dnsmasq processes share the same leases file. libvirt
also uses the --dhcp-lease-max option to control the maximum number of
leases allowed. The problem is that libvirt puts in a number equal to
the number of addresses in the range for the one network handled by a
single instance of dnsmasq, but dnsmasq checks the total number of
leases in the file (which could potentially contain many more).
The solution is to tell each instance of dnsmasq to create and use its
own leases file. (/var/lib/libvirt/network/<net-name>.leases).
This file is created by dnsmasq when it starts, but not deleted when
it exists. This is fine when the network is just being stopped, but if
the leases file was left around when a network was undefined, we could
end up with an ever-increasing number of dead files - instead, we
explicitly unlink the leases file when a network is undefined.
Note that Ubuntu carries a patch against an older version of libvirt for this:
hhttps://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/713071
ttp://bazaar.launchpad.net/~serge-hallyn/ubuntu/maverick/libvirt/bugall/revision/109
I was certain I'd also seen discussion of this on libvir-list or
libvirt-users, but couldn't find it.
This fixes a regression introduced in commit ad48df, and reported on
the libvirt-users list:
https://www.redhat.com/archives/libvirt-users/2011-March/msg00018.html
The problem in that commit was that we began searching a list of ip
address definitions (rather than just having one) to look for a dhcp
range or static host; when we didn't find any, our pointer (ipdef) was
left at NULL, and when ipdef was NULL, we returned without starting up
dnsmasq.
Previously dnsmasq was started even without any dhcp ranges or static
entries, because it's still useful for DNS services.
Another problem I noticed while investigating was that, if there are
IPv6 addresses, but no IPv4 addresses of any kind, we would jump out
at an ever higher level in the call chain.
This patch does the following:
1) networkBuildDnsmasqArgv() = all uses of ipdef are protected from
NULL dereference. (this patch doesn't change indentation, to make
review easier. The next patch will change just the
indentation). ipdef is intended to point to the first IPv4 address
with DHCP info (or the first IPv4 address if none of them have any
dhcp info).
2) networkStartDhcpDaemon() = if the loop looking for an ipdef with
DHCP info comes up empty, we then grab the first IPv4 def from the
list. Also, instead of returning if there are no IPv4 defs, we just
return if there are no IP defs at all (either v4 or v6). This way a
network that is IPv6-only will still get dnsmasq listening for DNS
queries.
3) in networkStartNetworkDaemon() - we will startup dhcp not just if there
are any IPv4 addresses, but also if there are any IPv6 addresses.
Currently a single storage volume with a broken backing file will disable the
whole storage pool. This can happen when the backing file is on some
unavailable network storage or if the backing volume is deleted, while the
storage volumes using it remain.
Since the storage pool can not be re-activated, re-creating the missing
or deleting the now useless volumes using libvirt only is not possible.
Fixing this is a little bit tricky:
1. virStorageBackendProbeTarget() only detects the missing backing file,
if the backing file format is not explicitly specified. If the
backing file is created using
kvm-img create -f qcow2 -o backing_fmt=qcow2,backing_file=... ...
no error is detected at this stage.
The new return code -3 signals that the backing file could not be
opened.
2. The backingStore.format must be >= 0, since values < 0 would break
virStorageVolTargetDefFormat() when dumping the XML data such as
<format type='...'/>
Because of this the format is faked as VIR_STORAGE_FILE_RAW.
3. virStorageBackendUpdateVolTargetInfo() always opens the backing file
and thus always detects a missing backing file.
Since it "only" updates the capacity, allocation, owner, group, mode
and SELinux label, just ignore errors at this stage, print an error
message and continue.
4. Using vol-dump on a broken volume still doesn't work, but at least
vol-destroy and pool-refresh do work now.
To reproduce:
dir=$(mktemp -d)
virsh pool-create-as tmp dir '' '' '' '' "$dir"
virsh vol-create-as --format qcow2 tmp back 1G
virsh vol-create-as --format qcow2 --backing-vol-format qcow2 --backing-vol back tmp cow 1G
virsh vol-delete --pool tmp back
virsh pool-refresh tmp
After the last step, the pool will be gone (because it was not persistent). As
long as the now broken image stays in the directory, you will not be able to
re-create or re-start the pool.
Signed-off-by: Philipp Hahn <hahn@univention.de>
This patch introduces a new libvirt API (virDomainSetMemoryFlags) and
a flag (virDomainMemoryModFlags).
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Opening raw network devices with the intent of passing those fds to
qemu is worth an audit point. This makes a multi-part audit: first,
we audit the device(s) that libvirt opens on behalf of the MAC address
of a to-be-created interface (which can independently succeed or
fail), then we audit whether qemu actually started the network device
with the same MAC (so searching backwards for successful audits with
the same MAC will show which fd(s) qemu is actually using). Note that
it is possible for the fd to be successfully opened but no attempt
made to pass the fd to qemu (for example, because intermediate
nwfilter operations failed) - no interface start audit will occur in
that case; so the audit for a successful opened fd does not imply
rights given to qemu unless there is a followup audit about the
attempt to start a new interface.
Likewise, when a network device is hot-unplugged, there is only one
audit message about the MAC being discontinued; again, searching back
to the earlier device open audits will show which fds that qemu quits
using (and yes, I checked via /proc/<qemu-pid>/fd that qemu _does_
close out the fds associated with an interface on hot-unplug). The
code would require much more refactoring to be able to definitively
state which device(s) were discontinued at that point, since we
currently don't record anywhere in the XML whether /dev/vhost-net was
opened for a given interface.
* src/qemu/qemu_audit.h (qemuAuditNetDevice): New prototype.
* src/qemu/qemu_audit.c (qemuAuditNetDevice): New function.
* src/qemu/qemu_command.h (qemuNetworkIfaceConnect)
(qemuPhysIfaceConnect, qemuOpenVhostNet): Adjust prototype.
* src/qemu/qemu_command.c (qemuNetworkIfaceConnect)
(qemuPhysIfaceConnect, qemuOpenVhostNet): Add audit points and
adjust parameters.
(qemuBuildCommandLine): Adjust caller.
* src/qemu/qemu_hotplug.c (qemuDomainAttachNetDevice): Likewise.
Since libvirt always passes /dev/net/tun to qemu via fd, we should
never trigger the cases where qemu tries to directly open the
device. Therefore, it is safer to deny the cgroup device ACL.
* src/qemu/qemu_cgroup.c (defaultDeviceACL): Remove /dev/net/tun.
* src/qemu/qemu.conf (cgroup_device_acl): Reflect this change.