Reported by Jason Helfman as a build-breaker on FreeBSD.
* src/conf/domain_conf.c (virDomainFSDefParseXML): Use POSIX
spelling.
* src/openvz/openvz_conf.c (openvzReadFSConf): Likewise.
Below patch fixes this coverity report:
/libvirt/src/conf/nwfilter_conf.c:382:
leaked_storage: Variable "varAccess" going out of scope leaks the storage it points to.
Since we are mounting a new /dev in the container, we must
remove any sub-mounts like /dev/shm, /dev/mqueue, etc,
otherwise they'll be recorded in /proc/mounts, but not be
accessible to applications.
Hello,
This is a patch to fix vm's outbound traffic control problem.
Currently, vm's outbound traffic control by libvirt doesn't go well.
This problem was previously discussed at libvir-list ML, however
it seems that there isn't still any answer to the problem.
http://www.redhat.com/archives/libvir-list/2011-August/msg00333.html
I measured Guest(with virtio-net) to Host TCP throughput with the
command "netperf -H".
Here are the outbound QoS parameters and the results.
outbound average rate[kilobytes/s] : Guest to Host throughput[Mbit/s]
======================================================================
1024 (8Mbit/s) : 4.56
2048 (16Mbit/s) : 3.29
4096 (32Mbit/s) : 3.35
8192 (64Mbit/s) : 3.95
16384 (128Mbit/s) : 4.08
32768 (256Mbit/s) : 3.94
65536 (512Mbit/s) : 3.23
The outbound traffic goes down unreasonably and is even not controled.
The cause of this problem is too large mtu value in "tc filter" command run by
libvirt. The command uses burst value to set mtu and the burst is equal to
average rate value if it's not set. This value is too large. For example
if the average rate is set to 1024 kilobytes/s, the mtu value is set to 1024
kilobytes. That's too large compared to the size of network packets.
Here libvirt applies tc ingress filter to Host's vnet(tun) device.
Tc ingress filter is implemented with TBF(Token Buckets Filter) algorithm. TBF
uses mtu value to calculate the amount of token consumed by each packet. With too
large mtu value, the token consumption rate is set too large. This leads to
token starvation and deterioration of TCP throughput.
Then, should we use the default mtu value 2 kilobytes?
The anser is No, because Guest with virtio-net device uses 65536 bytes
as mtu to transmit packets to Host, and the tc filter with the default mtu
value 2k drops packets whose size is larger than 2k. So, the most packets
is droped and again leads to deterioration of TCP throughput.
The appropriate mtu value is 65536 bytes which is equal to the maximum value
of network interface device defined in <linux/netdevice.h>. The value is
not so large that it causes token starvation and not so small that it
drops most packets.
Therefore this patch set the mtu value to 64kb(== 65535 bytes).
Again, here are the outbound QoS parameters and the TCP throughput with
the libvirt patched.
outbound average rate[kilobytes/s] : Guest to Host throughput[Mbit/s]
======================================================================
1024 (8Mbit/s) : 8.22
2048 (16Mbit/s) : 16.42
4096 (32Mbit/s) : 32.93
8192 (64Mbit/s) : 66.85
16384 (128Mbit/s) : 133.88
32768 (256Mbit/s) : 271.01
65536 (512Mbit/s) : 547.32
The outbound traffic conforms to the given limit.
Thank you,
Signed-off-by: Eiichi Tsukata <eiichi.tsukata.xh@hitachi.com>
If the user specified invalid protocol type in a network's SRV record
the error path ended up in freeing uninitialized pointers causing a
daemon crash.
*network_conf.c: virNetworkDNSSrvDefParseXML(): initialize local
variables
VirtualBox doesn't use the common virDomainObj implementation so this
patch adds a separate implementation using the VirtualBox API.
This driver implementation supports all currently defined flags. As
VirtualBox does not support transient guests, managed save images and
autostarting we assume all guests are persistent, don't have a managed
save image and are not autostarted. Filtering for existence of those
properities results in empty list.
mnt_fsname can not be the same, as we check the duplicate pool
sources earlier before, means it can't be the same pool, moreover,
a pool can't be started if it's already active anyway. So no reason
to act as success.
virConnectDomainEventRegisterAny() takes a domain as an argument.
So it should be possible to register the same event (be it
VIR_DOMAIN_EVENT_ID_LIFECYCLE for example) for two different domains.
That is, we need to take domain into account when searching for
duplicate event being already registered.
In order to retrieve some sysinfo data we need to parse /proc/sysinfo and
/proc/cpuinfo.
Signed-off-by: Thang Pham <thang.pham@us.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
For the s390x architecture the sysfs core_id alone is not unique. As a
result it can happen that libvirt thinks there are less host CPUs available
than really present.
Currently, a logical CPU is equivalent to a core for s390x. We therefore
produce a fake core id from the CPU number.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Minimal CPU "parser" for s390 to avoid compile time warning.
Signed-off-by: Thang Pham <thang.pham@us.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Adding CPU encoder/decoder for s390 to avoid runtime error messages.
Signed-off-by: Thang Pham <thang.pham@us.ibm.com>
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Starting a KVM guest on s390 fails immediately. This is because
"qemu --help" reports -no-acpi even for the s390(x) architecture but
-no-acpi isn't supported there.
Workaround is to remove QEMU_CAPS_NO_ACPI from the capability set
after the version/capability extraction.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
We used to prefix 'rbd:' to volume names, this is not necessary.
Qemu takes RBD devices in this way, like: qemu -drive rbd:pool/image
When attaching a network disk like RBD to a guest we however do not use this prefix.
Currently you can't map a RBD volume name directly to a domain without removing the prefix.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
If no 'listen' attribute or <listen> element is set in the
guest XML, the default driver configured listen address is
used. There is no way to client applications to determine
what this address is though. When starting the guest, we
should update the live XML to include this default listen
address
Storage is one of the last domains in libvirt where we don't fully
utilize inactive and live XML. Okay, it might be because we don't
have support for that. So implement such support. However, we need
to fallback when talking to old daemon which doesn't support this
new flag called VIR_STORAGE_XML_INACTIVE.
Currently, we share the idea of old & new def with domains. Users can
*-edit an object (domain, pool) which spawns a new internal
representation for them. This is referenced via
{domainObj,poolObj}->newDef [compared to ->def]. However, for pool we
were never overwriting def with newDef. This must be done on
pool-destroy (like we do analogically in domain detroy).
Currently libvirt-lxc checks to see if the destination exists and is a
directory. If it is not a directory then the mount fails. Since
libvirt-lxc can bind mount files on an inode, this patch is needed to
allow us to bind mount files on files. Currently we want to bind mount
on top of /etc/machine-id, and /etc/adjtime
If the destination of the mount point does not exists, it checks if the
src is a directory and then attempts to create a directory, otherwise it
creates an empty file for the destination. The code will then bind mount
over the destination.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some GNULIB headers (eg unistd.h) will often need to include
winsock2.h for various symbols. There is a rule that winsock2.h
must be included before windows.h. This means that any file
which does
#ifdef WIN32
#include <windows.h>
#endif
#include <unistd.h>
is potentially broken. A simple rule is that /all/ includes of
windows.h must be matched with a preceding include of winsock2.h
regardless of whether unistd.h is used currently
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A sanlock lease can be marked as shared (rather
than exclusive) using SANLK_RES_SHARED flag. This
adds support for that flag and ensures that in auto
disk mode, any shared disks use shared leases. This
also makes any read-only disks be completely
ignored.
These changes remove the need for the option
ignore_readonly_and_shared_disks
so that is removed
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently you can configure LXC to bind a host directory to
a guest directory, but not to bind a guest directory to a
guest directory. While the guest container init could do
this itself, allowing it in the libvirt XML means a stricter
SELinux policy can be written
Introduce a new syntax for filesystems to allow use of a RAM
filesystem
<filesystem type='ram'>
<source usage='10' units='MiB'/>
<target dir='/mnt'/>
</filesystem>
The usage units default to KiB to limit consumption of host memory.
* docs/formatdomain.html.in: Document new syntax
* docs/schemas/domaincommon.rng: Add new attributes
* src/conf/domain_conf.c: Parsing/formatting of RAM filesystems
* src/lxc/lxc_container.c: Mounting of RAM filesystems
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The test of ref count is not protected by lock, which is unsafe because
the ref count may have been changed by other threads during the test.
This patch fixes this.
When shutting down libvirtd, the virNetServer shutdown can deadlock
if there are in-flight jobs being handled by virNetServerHandleJob().
virNetServerFree() will acquire the virNetServer lock and call
virThreadPoolFree() to terminate the workers, waiting for the workers
to finish. But in-flight workers will attempt to acquire the
virNetServer lock, resulting in deadlock.
Fix the deadlock by unlocking the virNetServer lock before calling
virThreadPoolFree(). This is safe since the virNetServerPtr object
is ref-counted and only decrementing the ref count needs to be
protected. Additionally, there is no need to re-acquire the lock
after virThreadPoolFree() completes as all the workers have
terminated.
The lxc contoller eventually makes use of virRandomBits(), which was
segfaulting since virRandomInitialize() is never invoked.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff554d560 in random_r () from /lib64/libc.so.6
(gdb) bt
0 0x00007ffff554d560 in random_r () from /lib64/libc.so.6
1 0x0000000000469eaa in virRandomBits (nbits=32) at util/virrandom.c:80
2 0x000000000045bf69 in virHashCreateFull (size=256,
dataFree=0x4aa2a2 <hashDataFree>, keyCode=0x45bd40 <virHashStrCode>,
keyEqual=0x45bdad <virHashStrEqual>, keyCopy=0x45bdfa <virHashStrCopy>,
keyFree=0x45be37 <virHashStrFree>) at util/virhash.c:134
3 0x000000000045c069 in virHashCreate (size=0, dataFree=0x4aa2a2 <hashDataFree>)
at util/virhash.c:164
4 0x00000000004aa562 in virNWFilterHashTableCreate (n=0)
at conf/nwfilter_params.c:686
5 0x00000000004aa95b in virNWFilterParseParamAttributes (cur=0x711d30)
at conf/nwfilter_params.c:793
6 0x0000000000481a7f in virDomainNetDefParseXML (caps=0x702c90, node=0x7116b0,
ctxt=0x7101b0, bootMap=0x0, flags=0) at conf/domain_conf.c:4589
7 0x000000000048cc36 in virDomainDefParseXML (caps=0x702c90, xml=0x710040,
root=0x7103b0, ctxt=0x7101b0, expectedVirtTypes=16, flags=0)
at conf/domain_conf.c:8658
8 0x000000000048f011 in virDomainDefParseNode (caps=0x702c90, xml=0x710040,
root=0x7103b0, expectedVirtTypes=16, flags=0) at conf/domain_conf.c:9360
9 0x000000000048ee30 in virDomainDefParse (xmlStr=0x0,
filename=0x702ae0 "/var/run/libvirt/lxc/x.xml", caps=0x702c90,
expectedVirtTypes=16, flags=0) at conf/domain_conf.c:9310
10 0x000000000048ef00 in virDomainDefParseFile (caps=0x702c90,
filename=0x702ae0 "/var/run/libvirt/lxc/x.xml", expectedVirtTypes=16, flags=0)
at conf/domain_conf.c:9332
11 0x0000000000425053 in main (argc=5, argv=0x7fffffffe2b8)
at lxc/lxc_controller.c:1773
The comment says:
/* Now create the final dir in the path with the uid/gid/mode
* requested in the config. If the dir already exists, just set
* the perms.
*/
However, virDirCreate is only invoked if the target path doesn't
exist yet (which is opposite with the comment), or the uid from
the config is not -1 (I don't understand why, think it's just
another mistake). And the result is the perms of the pool won't
be changed if one tries to build the pool with different perms
again.
Besides these logic error fix, if no uid and gid are specified in
the config, the practical used uid, gid are reflected.
The two new APIs are rather trivial; based on bits and pieces of
other existing APIs. But rather than blindly return 0 or 1 for
HasMetadata, I chose to first validate that the snapshot in
question in fact exists.
* src/esx/esx_driver.c (esxDomainSnapshotIsCurrent)
(esxDomainSnapshotHasMetadata): New functions.
* src/vbox/vbox_tmpl.c (vboxDomainSnapshotIsCurrent)
(vboxDomainSnapshotHasMetadata): Likewise.
Blindly returning success is misleading if the object no longer
exists; it is a bit better to check for existence up front before
returning information about that object. This pattern matches the
fact that most of our other APIs check for existence as a side
effect prior to getting at the real piece of information being
queried.
* src/esx/esx_driver.c (esxDomainIsUpdated, esxDomainIsPersistent):
Add existence checks.
* src/vbox/vbox_tmpl.c (vboxDomainIsPersistent)
(vboxDomainIsUpdated): Likewise.
This patch adds support for listing all domains into drivers that use
the common virDomainObj implementation: libxl, lxc, openvz, qemu, test,
uml, vmware.
For drivers that don't support managed save images the guests are
treated as if they had none, so filtering guests that do have such an
image on this driver succeeds and produces 0 results.
The two new functions are very similar to the existing functions;
just a matter of different arguments and a call to a different
helper function.
* src/qemu/qemu_driver.c (qemuDomainSnapshotListNames)
(qemuDomainSnapshotNum, qemuDomainSnapshotListChildrenNames)
(qemuDomainSnapshotNumChildren): Support new flags.
(qemuDomainListAllSnapshots): New functions.
Wraps the conversion from 'char *name' to virDomainSnapshotPtr in
a reusable manner.
* src/conf/virdomainlist.h (virDomainListSnapshots): New declaration.
* src/conf/virdomainlist.c (virDomainListSnapshots): Implement it.
* src/libvirt_private.syms (virdomainlist.h): Export it.
The generator doesn't handle lists of virDomainSnapshotPtr, so
this commit requires a bit more work than some RPC additions.
* src/remote/remote_protocol.x
(REMOTE_PROC_DOMAIN_LIST_ALL_SNAPSHOTS)
(REMOTE_PROC_DOMAIN_SNAPSHOT_LIST_ALL_CHILDREN): New RPC calls,
with corresponding structs.
* daemon/remote.c (remoteDispatchDomainListAllSnapshots)
(remoteDispatchDomainSnapshotListAllChildren): New functions.
* src/remote/remote_driver.c (remoteDomainListAllSnapshots)
(remoteDomainSnapshotListAllChildren): Likewise.
* src/remote_protocol-structs: Regenerate.
There was an inherent race between virDomainSnapshotNum() and
virDomainSnapshotListNames(), where an additional snapshot could
be created in the meantime, or where a snapshot could be deleted
before converting the name back to a virDomainSnapshotPtr. It
was also an awkward name: the function operates on domains, not
domain snapshots. virDomainSnapshotListChildrenNames() suffered
from the same inherent race, although its naming was nicer.
This patch makes things nicer by grabbing a snapshot list
atomically, in the format most useful to the user.
* include/libvirt/libvirt.h.in (virDomainListAllSnapshots)
(virDomainSnapshotListAllChildren): New declarations.
* src/libvirt.c (virDomainSnapshotListNames)
(virDomainSnapshotListChildrenNames): Add cross-references.
(virDomainListAllSnapshots, virDomainSnapshotListAllChildren):
New functions.
* src/libvirt_public.syms (LIBVIRT_0.9.13): Export them.
* src/driver.h (virDrvDomainListAllSnapshots)
(virDrvDomainSnapshotListAllChildren): New callbacks.
* python/generator.py (skip_function): Prepare for later
hand-written versions.
It turns out that one-bit filtering makes it hard to select the inverse
set, so it is easier to provide filtering groups. For back-compat,
omitting all bits within a group means the group is not used for
filtering, and by definition of a group (each snapshot matches exactly
one bit within the group, and the set of bits in the group covers all
snapshots), selecting all bits also makes the group useless.
Unfortunately, virDomainSnapshotListChildren defined the bit
VIR_DOMAIN_SNAPSHOT_LIST_DESCENDANTS as an expansion rather than a
filter, so we cannot make it part of a filter group, so that bit
(and its counterpart VIR_DOMAIN_SNAPSHOT_LIST_ROOTS for
virDomainSnapshotList) remains a single control bit.
* include/libvirt/libvirt.h.in (virDomainSnapshotListFlags): Add a
couple more flags.
* src/libvirt.c (virDomainSnapshotNum)
(virDomainSnapshotNumChildren): Document them.
(virDomainSnapshotListNames, virDomainSnapshotListChildrenNames):
Likewise, and add thread-safety caveats.
* src/conf/virdomainlist.h (VIR_DOMAIN_SNAPSHOT_FILTERS_*): New
convenience macros.
* src/conf/domain_conf.c (virDomainSnapshotObjListCopyNames)
(virDomainSnapshotObjListCount): Support the new flags.
Until now, it was possible to crash libvirtd when defining domain with
channel device with missing source element.
When creating new virDomainChrDef, target.port is set to -1, but
unfortunately it is an union with addresses that virDomainChrDefFree
tries to free in case the deviceType is channel. Having the port set
to -1 is intended, however the cleanest way to get around the problems
with the crash seems to be renumbering the VIR_DOMAIN_CHR_CHANNEL_
target types to cover new NONE type (with value 0) being the default
(no target type yet).
Macro virCheckNullArgGoto is supposed to check for NULL argument but
checks non-NULL instead.
Macro virCheckNonNullArgReturn reports error as if the argument should
be NULL when it shouldn't.
when lxcContainerIdentifyCGroups failed, the memory it allocated
has been freed, so we should not free this memory again in
lxcContainerSetupPivortRoot and lxcContainerSetupExtraMounts.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Another case where we can do the same amount of work with fewer
lines of redundant code, which will make adding new filters easier.
* src/conf/domain_conf.c (virDomainSnapshotNameData): Adjust
struct.
(virDomainSnapshotObjListCount): Delete, now taken care of...
(virDomainSnapshotObjListCopyNames): ...here.
(virDomainSnapshotObjListGetNames): Adjust caller to handle
counting.
(virDomainSnapshotObjListNum): Simplify.