Commit Graph

9983 Commits

Author SHA1 Message Date
Guido Günther
41f1db6a0c Introduce filesystem limits to virDomainFSDef 2012-05-24 11:35:02 +02:00
Guido Günther
b46e005459 Introduce virDomainParseScaledValue
and use it for virDomainParseMemory. This allows to parse arbitrary
scaled value, not only memory related values as needed for the
filesystem limits code following later in this series.
2012-05-24 11:35:01 +02:00
Daniel P. Berrange
4c7973e184 Remove more bogus systemd service dependencies
Adding syslog.target is obsolete, avahi.target does not
exist and dbus.target is also obsolete

Reported-by: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-23 12:03:06 +01:00
Jiri Denemark
63643f67ab Revert "rpc: Discard non-blocking calls only when necessary"
This reverts commit b1e374a7ac, which was
rather bad since I failed to consider all sides of the issue. The main
things I didn't consider properly are:

- a thread which sends a non-blocking call waits for the thread with
  the buck to process the call
- the code doesn't expect non-blocking calls to remain in the queue
  unless they were already partially sent

Thus, the reverted patch actually breaks more than what it fixes and
clients (which may even be libvirtd during p2p migrations) will likely
end up in a deadlock.
2012-05-22 23:33:11 +02:00
Peter Krempa
db19417fc0 qemu_hotplug: Don't free the PCI device structure after hot-unplug
The pciDevice structure corresponding to the device being hot-unplugged
was freed after it was "stolen" from activeList. The pointer was still
used for eg-inactive list. This patch removes the free of the structure
and frees it only if reset fails on the device.
2012-05-22 18:21:29 +02:00
Laine Stump
3404729e58 util: export virBufferTrim
This was forgotten in commit cdb87b1c4b.
2012-05-22 11:36:04 -04:00
Eric Blake
cdb87b1c4b virBuffer: add way to trim back extra text
I'm tired of writing:

bool sep = false;
while (...) {
    if (sep)
       virBufferAddChar(buf, ',');
    sep = true;
    virBufferAdd(buf, str);
}

This makes it easier, allowing one to write:

while (...)
    virBufferAsprintf(buf, "%s,", str);
virBufferTrim(buf, ",", -1);

to trim any remaining comma.

* src/util/buf.h (virBufferTrim): Declare.
* src/util/buf.c (virBufferTrim): New function.
* tests/virbuftest.c (testBufTrim): Test it.
2012-05-21 16:01:43 -06:00
Wido den Hollander
74951eadef storage backend: Add RBD (RADOS Block Device) support
This patch adds support for a new storage backend with RBD support.

RBD is the RADOS Block Device and is part of the Ceph distributed storage
system.

It comes in two flavours: Qemu-RBD and Kernel RBD, this storage backend only
supports Qemu-RBD, thus limiting the use of this storage driver to Qemu only.

To function this backend relies on librbd and librados being present on the
local system.

The backend also supports Cephx authentication for safe authentication with
the Ceph cluster.

For storing credentials it uses the built-in secret mechanism of libvirt.

Signed-off-by: Wido den Hollander <wido@widodh.nl>
2012-05-21 12:37:38 -06:00
Eric Blake
b8e6021e7b build: fix unused variable after last patch
The previous commit (2cb0899) left a dead variable behind.

* src/libxl/libxl_driver.c (libxlClose): Drop dead variable.
2012-05-21 12:36:50 -06:00
Daniel P. Berrange
2cb0899eec Fix potential events deadlock when unref'ing virConnectPtr
When the last reference to a virConnectPtr is released by
libvirtd, it was possible for a deadlock to occur in the
virDomainEventState functions. The virDomainEventStatePtr
holds a reference on virConnectPtr for each registered
callback. When removing a callback, the virUnrefConnect
function is run. If this causes the last reference on the
virConnectPtr to be released, then virReleaseConnect can
be run, which in turns calls qemudClose. This function has
a call to virDomainEventStateDeregisterConn which is intended
to remove all callbacks associated with the virConnectPtr
instance. This will try to grab a lock on virDomainEventState
but this lock is already held. Deadlock ensues

Thread 1 (Thread 0x7fcbb526a840 (LWP 23185)):

Since each callback associated with a virConnectPtr holds a
reference on virConnectPtr, it is impossible for the qemudClose
method to be invoked while any callbacks are still registered.
Thus the call to virDomainEventStateDeregisterConn must in fact
be a no-op. Thus it is possible to just remove all trace of
virDomainEventStateDeregisterConn and avoid the deadlock.

* src/conf/domain_event.c, src/conf/domain_event.h,
  src/libvirt_private.syms: Delete virDomainEventStateDeregisterConn
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
  src/qemu/qemu_driver.c, src/uml/uml_driver.c: Remove
  calls to virDomainEventStateDeregisterConn
2012-05-21 18:50:47 +01:00
Jim Fehlig
651d712452 Fix build when configuring with polkit0
Commit 2223ea98 removed the only use of 'server' param in
remoteDispatchAuthPolkit().  Mark the parameter with ATTRIBUTE_UNUSED
to fix the build when configuring with polkit0.
2012-05-21 09:23:41 -06:00
Stefan Berger
a3f3ab4c9c nwfilter: Add support for ipset
This patch adds support for the recent ipset iptables extension
to libvirt's nwfilter subsystem. Ipset allows to maintain 'sets'
of IP addresses, ports and other packet parameters and allows for
faster lookup (in the order of O(1) vs. O(n)) and rule evaluation
to achieve higher throughput than what can be achieved with
individual iptables rules.

On the command line iptables supports ipset using

iptables ... -m set --match-set <ipset name> <flags> -j ...

where 'ipset name' is the name of a previously created ipset and
flags is a comma-separated list of up to 6 flags. Flags use 'src' and 'dst'
for selecting IP addresses, ports etc. from the source or
destination part of a packet. So a concrete example may look like this:

iptables -A INPUT -m set --match-set test src,src -j ACCEPT

Since ipset management is quite complex, the idea was to leave ipset 
management outside of libvirt but still allow users to reference an ipset.
The user would have to make sure the ipset is available once the VM is
started so that the iptables rule(s) referencing the ipset can be created.

Using XML to describe an ipset in an nwfilter rule would then look as
follows:

  <rule action='accept' direction='in'>
    <all ipset='test' ipsetflags='src,src'/>
  </rule>

The two parameters on the command line are also the two distinct XML attributes
'ipset' and 'ipsetflags'.

FYI: Here is the man page for ipset:

https://ipset.netfilter.org/ipset.man.html

Regards,
    Stefan
2012-05-21 06:26:34 -04:00
Eric Blake
e8314e78f9 build: fix virnetlink on glibc 2.11
We were being lazy - virnetlink.c was getting uint32_t as a
side-effect from glibc 2.14's <unistd.h>, but older glibc 2.11
does not provide uint32_t from <unistd.h>.  In fact, POSIX states
that <unistd.h> need only provide intptr_t, not all of <stdint.h>,
so the bug really is ours.  Reported by Jonathan Alescio.

* src/util/virnetlink.h: Include <stdint.h>.
2012-05-18 09:42:25 -06:00
Hu Tao
fe0aac0503 Adds support to param 'vcpu_time' in qemu_driver.
This involves setting the cpuacct cgroup to a per-vcpu granularity,
as well as summing the each vcpu accounting into a common array.
Now that we are reading more than one cgroup file, we double-check
that cpus weren't hot-plugged between reads to invalidate our
summing.

Signed-off-by: Eric Blake <eblake@redhat.com>
2012-05-18 08:53:49 -06:00
Hu Tao
d29a7aaa1a Add a new param 'vcpu_time' to virDomainGetCPUStats
Currently virDomainGetCPUStats gets total cpu usage, which consists
of:

  1. vcpu usage: the physical cpu time consumed by virtual cpu(s) of
     domain
  2. hypervisor: `total cpu usage' - `vcpu usage'

The param 'vcpu_time' is for getting vcpu usages.
2012-05-17 12:42:06 -06:00
Marc-André Lureau
d9a269bc74 tests: add ich6 codec type test to qemuxml2argv-sound-device
Test new codec type element.
2012-05-17 11:43:35 -06:00
Marc-André Lureau
a7675a6ba5 qemu: honour sound <codec> sub-elements
With ICH6 audio device, allow to specify codecs.
By default, for compatibility reasons, if no codec is specified,
"hda-duplex" will be used.
2012-05-17 11:40:36 -06:00
Marc-André Lureau
988e85a51e domain: add <codec> sound sub-element
Allow specifying sound device codecs. See formatdomain.html for
more details.
2012-05-17 11:40:11 -06:00
Marc-André Lureau
0aaebd7abc qemu: test CAPS_HDA_MICRO 2012-05-17 11:12:40 -06:00
Michal Privoznik
9c484e3dc5 qemu: Don't delete USB device on failed qemuPrepareHostdevUSBDevices
If qemuPrepareHostdevUSBDevices fail it will roll back devices added
to the driver list of used devices. However, if it may fail because
the device is being used already. But then again - with roll back.
Therefore don't try to remove a usb device manually if the function
fail. Although, we want to remove the device if any operation
performed afterwards fail.
2012-05-17 13:40:52 +02:00
Eric Blake
5a8262a0ae nodeinfo: test more details
Make it obvious why we need Osier's patch in commit 10d9038b
to fix NUMA parsing of an AMD machine with two cores sharing
a socket id.

* tests/nodeinfotest.c (linuxTestCompareFiles): Enhance the test.
* tests/nodeinfodata/linux-nodeinfo-sysfs-test-*-output.txt: Update.
2012-05-16 10:23:06 -06:00
Daniel P. Berrange
e7df360d56 Add a virLogMessage alternative taking va_list args
Allow the logging APIs to be called with a va_list for format
args, instead of requiring var-args usage.

* src/util/logging.h, src/util/logging.c: Add virLogVMessage
2012-05-16 17:13:13 +01:00
Eric Blake
3337ba6dc7 build: fix recent syntax-check breakage
The use of readlink() in lxc_container.c is intentional; we don't
want an absolute pathname there.

* src/util/cgroup.h (VIR_CGROUP_SYSFS_MOUNT): Indent properly.
* cfg.mk (exclude_file_name_regexp--sc_prohibit_readlink): Add
exemption.
2012-05-16 09:52:44 -06:00
Michal Privoznik
2f5fdc886e qemu: Rollback on used USB devices
One of our latest USB device handling patches
05abd1507d introduced a regression.
That is, we first create a temporary list of all USB devices that
are to be used by domain just starting up. Then we iterate over and
check if a device from the list is in the global list of currently
assigned devices (activeUsbHostdevs). If not, we add it there and
continue with next iteration then. But if a device from temporary
list is either taken already or adding to the activeUsbHostdevs fails,
we remove all devices in temp list from the activeUsbHostdevs list.
Therefore, if a device is already taken we remove it from
activeUsbHostdevs even if we should not. Thus, next time we allow
the device to be assigned to another domain.
2012-05-16 17:10:28 +02:00
Daniel P. Berrange
7ba66ef285 Fix build compat with older libselinux for LXC
Most versions of libselinux do not contain the function
selinux_lxc_contexts_path() that the security driver
recently started using for LXC. We must add a conditional
check for it in configure and then disable the LXC security
driver for builds where libselinux lacks this function.

* configure.ac: Check for selinux_lxc_contexts_path
* src/security/security_selinux.c: Disable LXC security
  if selinux_lxc_contexts_path() is missing
2012-05-16 15:38:29 +01:00
Daniel P. Berrange
51bcb09fe9 Reject any non-option command line arguments
Due to a bug in editing /etc/sysconfig/libvirtd, VDSM was causing
libvirt processes to run with the following command line args

   /usr/sbin/libvirtd --listen '#' 'by vdsm'

While it correctly rejects any invalid option flags, libvirtd
was not rejecting any non-option command line arguments

* daemon/libvirtd.c: Reject non-option argv
2012-05-16 12:03:02 +01:00
Daniel P. Berrange
a8c0b2fed0 Remount cgroups controllers after setting up new /sys in LXC
Normal practice is for cgroups controllers to be mounted at
/sys/fs/cgroup. When setting up a container, /sys is mounted
with a new sysfs instance, thus we must re-mount all the
cgroups controllers. The complexity is that we must mount
them in the same layout as the host OS. ie if 'cpu' and 'cpuacct'
were mounted at the same location in the host we must preserve
this in the container. Also if any controllers are co-located
we must setup symlinks from the individual controller name to
the co-located mount-point

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 11:37:40 +01:00
Daniel P. Berrange
c529b47a75 Trim /proc & /sys subtrees before mounting new instances
Both /proc and /sys may have sub-mounts in them from the host
OS. We must explicitly unmount them all before mounting the
new instance over that location. If we don't then /proc/mounts
will show the sub-mounts as existing, even though nothing will
be able to access them, due to the over-mount.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 11:27:29 +01:00
Daniel P. Berrange
c16b4c43fc Avoid LXC pivot root in the root source is still /
If the LXC config has a filesystem

  <filesystem>
     <source dir='/'/>
     <target dir='/'/>
  </filesystem>

then there is no need to go down the pivot root codepath.
We can simply use the existing root as needed.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:47 +01:00
Daniel P. Berrange
e8639920ac Mount fresh instance of sysfs/selinux in LXC
Currently to make sysfs readonly, we remount the existing
instance and then bind it readonly. Unfortunately this means
sysfs is still showing device objects wrt the host OS namespace.
We need it to reflect the container namespace, so we must mount
a completely new instance of it. Do the same for selinuxfs since
there is no benefit to bind mounting & this lets us simplify
the code.

* src/lxc/lxc_container.c: Mount fresh sysfs instance

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:47 +01:00
Daniel Walsh
8dd5794f81 Convert the LXC driver to use the security driver API for mount options
Instead of hardcoding use of SELinux contexts in the LXC driver,
switch over to using the official security driver API.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:47 +01:00
Daniel Walsh
abf2ebbd27 Add security driver APIs for getting mount options
Some security drivers require special options to be passed to
the mount system call. Add a security driver API for handling
this data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:47 +01:00
Daniel Walsh
6844ceadb4 Add support for LXC specific SELinux configuration
The SELinux policy for LXC uses a different configuration file
than the traditional svirt one. Thus we need to load
/etc/selinux/targeted/contexts/lxc_contexts which contains
something like this:

 process = "system_u:system_r:svirt_lxc_net_t:s0"
 file = "system_u:object_r:svirt_lxc_file_t:s0"
 content = "system_u:object_r:virt_var_lib_t:s0"

cleverly designed to be parsable by virConfPtr

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:47 +01:00
Daniel Walsh
fa5e68ffbf Use private data struct in SELinux driver
Currently the SELinux driver stores its state in a set of global
variables. This switches it to use a private data struct instead.
This will enable different instances to have their own data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:46 +01:00
Daniel Walsh
cf36c23bc9 Don't enable the AppArmour security driver with LXC
The AppArmour driver does not currently have support for LXC
so ensure that when probing, it claims to be disabled

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:46 +01:00
Daniel Walsh
73580c60d1 Pass the virt driver name into security drivers
To allow the security drivers to apply different configuration
information per hypervisor, pass the virtualization driver name
into the security manager constructor.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:46 +01:00
Daniel P. Berrange
6cfc3f8f4f Remove bogus udev.target dep from libvirtd unit
There is no 'udev.target' unit in systemd (only 'udev.service')
yet libvirtd's unit file had a dep on one. There's no compelling
reason for a dep on udev, so remove it altogether.

Reported-by: Avi Kivity <avi@redhat.com>
2012-05-16 10:04:59 +01:00
Jiri Denemark
63b4243624 qemu: Add support for -no-user-config
Thanks to this new option we are now able to use modern CPU models (such
as Westmere) defined in external configuration file.

The qemu-1.1{,-device} data files for qemuhelptest are filled in with
qemu-1.1-rc2 output for now. I will update those files with real
qemu-1.1 output once it is released.
2012-05-15 20:29:12 +02:00
Daniel P. Berrange
03b804a200 Set a sensible default master start port for ehci companion controllers
The uhci1, uhci2, uhci3 companion controllers for ehci1 must
have a master start port set. Since this value is predictable
we should set it automatically if the app does not supply it
2012-05-15 17:07:34 +01:00
Daniel P. Berrange
1ebd52cb87 Fix logic for assigning PCI addresses to USB2 companion controllers
Currently each USB2 companion controller gets put on a separate
PCI slot. Not only is this wasteful of PCI slots, but it is not
in compliance with the spec for USB2 controllers. The master
echi1 and all companion controllers should be in the same slot,
with echi1 in function 7, and uhci1-3 in functions 0-2 respectively.

* src/qemu/qemu_command.c: Special case handling of USB2 controllers
  to apply correct pci slot assignment
* tests/qemuxml2argvdata/qemuxml2argv-usb-ich9-ehci-addr.args,
  tests/qemuxml2argvdata/qemuxml2argv-usb-ich9-ehci-addr.xml: Expand
  test to cover automatic slot assignment
2012-05-15 17:07:34 +01:00
Daniel P. Berrange
2c195fdbf3 Fix virDomainDeviceInfoIsSet() to check all struct fields
The virDomainDeviceInfoIsSet API was only checking if an
address or alias was set in the struct. Thus if only a
rom bar setting / filename, boot index, or USB master
value was set, they could be accidentally dropped when
formatting XML
2012-05-15 17:07:34 +01:00
Daniel P. Berrange
b3567ef37c Remove redundant trailing slash in user dir paths
Callers of virGetUser{Config,Runtime,Cache}Directory all
append further path component. We should not be
adding a trailing slash in the return path otherwise we
get paths containing '//'

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-15 17:07:18 +01:00
Daniel P. Berrange
548563956e Allow stack traces to be included with log messages
Sometimes it is useful to see the callpath for log messages.
This change enhances the log filter syntax so that stack traces
can be show by setting '1:+NAME' instead of '1:NAME'.

This results in output like:

2012-05-09 14:18:45.136+0000: 13314: debug : virInitialize:414 : register drivers
/home/berrange/src/virt/libvirt/src/.libs/libvirt.so.0(virInitialize+0xd6)[0x7f89188ebe86]
/home/berrange/src/virt/libvirt/tools/.libs/lt-virsh[0x431921]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3a21e21735]
/home/berrange/src/virt/libvirt/tools/.libs/lt-virsh[0x40a279]

2012-05-09 14:18:45.136+0000: 13314: debug : virRegisterDriver:775 : driver=0x7f8918d02760 name=Test
/home/berrange/src/virt/libvirt/src/.libs/libvirt.so.0(virRegisterDriver+0x6b)[0x7f89188ec717]
/home/berrange/src/virt/libvirt/src/.libs/libvirt.so.0(+0x11b3ad)[0x7f891891e3ad]
/home/berrange/src/virt/libvirt/src/.libs/libvirt.so.0(virInitialize+0xf3)[0x7f89188ebea3]
/home/berrange/src/virt/libvirt/tools/.libs/lt-virsh[0x431921]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3a21e21735]
/home/berrange/src/virt/libvirt/tools/.libs/lt-virsh[0x40a279]

* docs/logging.html.in: Document new syntax
* configure.ac: Check for execinfo.h
* src/util/logging.c, src/util/logging.h: Add support for
  stack traces
* tests/testutils.c: Adapt to API change

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-15 17:01:40 +01:00
Daniel P. Berrange
905be03d20 Move user libvirtd socket out of abstract namespace
The current unprivileged user libvirtd sockets are in the abstract
namespace. This has a number of problems

 - You can't connect to them remotely using the nc/ssh tunnel
 - This is not portable for OS-X, BSD & probably others
 - Parent directory permissions don't apply
2012-05-15 16:29:55 +01:00
Daniel P. Berrange
2adda523ea Add openvz_util.c to POTFILES 2012-05-15 16:27:08 +01:00
Daniel P. Berrange
3247b63ba9 Add bundled(gnulib) to RPM specfile
According to Fedora guidelines, because we bundle gnulib we
need to add a virtual Provides: bundled(gnulib).

https://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries#Requirement_if_you_bundle
2012-05-15 16:25:30 +01:00
Guido Günther
80fd8367c9 openvz: determine kb/pages only once
to save some syscalls (as suggested by Eric Blake)
2012-05-15 14:39:14 +02:00
Osier Yang
c086af6b9b libvirt-guests: Remove LISTFILE if it's empty when stopping service
$LISTFILE is created even no domain is running, and the empty
$LISTFILE could cause improper service status.

    stopped ,with saved guests

Which is not right, as there is no domain was saved.
2012-05-15 16:22:28 +08:00
Osier Yang
10d9038b74 nodeinfo: Get the correct CPU number on AMD Magny Cours platform
"Instead of developing one CPU with 12 cores, the Magny Cours is
actually two 6 core “Bulldozer” CPUs combined in to one package"

I.e, each package has two NUMA nodes, and the two numa nodes share
the same core ID set (0-6), which means parsing the cores number
from sysfs doesn't work in this case.

And the wrong CPU number could cause three problems for libvirt:

1) performance lost

  A domain without "cpuset" or "placement='auto'" (to drive numad)
specified will be only pinned to part of the CPUs.

2) domain can be started

  If a domain uses numad, and the advisory nodeset returned from
numad contains node which exceeds the range of wrong total CPU
number. The domain will fail to start, as the bitmask passed to
sched_setaffinity could be fully filled with zero.

3) wrong CPU number affects lots of stuffs.

  E.g. for command "virsh vcpuinfo", "virsh vcpupin", it will always
output with the truncated CPU list.

For more details:

https://www.redhat.com/archives/libvir-list/2012-May/msg00607.html

This patch is to fix the problem by parsing /proc/cpuinfo to get
the value of field "cpu cores", and use it as nodeinfo->cores if
it's greater than the cores number from sysfs.
2012-05-15 10:19:49 +08:00
Osier Yang
be9f6ecb28 qemu: Set memory policy using cgroup if placement is auto
Like for 'static' placement, when the memory policy mode is
'strict', set the memory policy by writing the advisory nodeset
returned from numad to cgroup file cpuset.mems,
2012-05-15 10:11:14 +08:00