684 Commits

Author SHA1 Message Date
Daniel P. Berrange
849272dee8 Move check for cgroup devices ACL upfront in LXC hotplug
The check for whether the cgroup devices ACL is available is
done quite late during LXC hotplug - in fact after the device
node is already created in the container in some cases. Better
to do it upfront so we fail immediately.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit c3eb12cace868884393d35c23278653634d81c70)
2014-02-18 13:18:46 +00:00
Daniel P. Berrange
51fe7d9de8 Disks are always block devices, never character devices
The LXC disk hotplug code was allowing block or character devices
to be given as disk. A disk is always a block device.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit d24e6b8b1eb87daa6ee467b76cf343725468949c)
2014-02-18 13:18:46 +00:00
Daniel P. Berrange
57097be96d Fix reset of cgroup when detaching USB device from LXC guests
When detaching a USB device from an LXC guest we must remove
the device from the cgroup ACL. Unfortunately we were telling
the cgroup code to use the guest /dev path, not the host /dev
path, and the guest device node had already been unlinked.
This was, however, fortunate since the code passed &priv->cgroup
instead of priv->cgroup, so would have crash if the device node
were accessible.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 2c2bec94d27ccd070bee18a6113b1cfea6d80126)
2014-02-18 13:18:46 +00:00
Daniel P. Berrange
d2fc280c9b Record hotplugged USB device in LXC live guest config
After hotplugging a USB device, the LXC driver forgot
to add the device def to the virDomainDefPtr.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit a537827d15516f2b59afb23ce2d50b8a88d7f090)
2014-02-18 13:18:46 +00:00
Daniel P. Berrange
79754d40bf Fix path used for USB device attach with LXC
The LXC code missed the 'usb' component out of the path
/dev/bus/usb/$BUSNUM/$DEVNUM, so it failed to actually
setup cgroups for the device. This was in fact lucky
because the call to virLXCSetupHostUsbDeviceCgroup
was also mistakenly passing '&priv->cgroup' instead of
just 'priv->cgroup'. So once the path is fixed, libvirtd
would then crash trying to access the bogus virCgroupPtr
pointer. This would have been a security issue, were it
not for the bogus path preventing the pointer reference
being reached.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit c3648972222d4eb056e6e667c193ba56a7aa3557)
2014-02-18 13:18:46 +00:00
Eric Blake
b19f326964 event: move event filtering to daemon (regression fix)
https://bugzilla.redhat.com/show_bug.cgi?id=1058839

Commit f9f56340 for CVE-2014-0028 almost had the right idea - we
need to check the ACL rules to filter which events to send.  But
it overlooked one thing: the event dispatch queue is running in
the main loop thread, and therefore does not normally have a
current virIdentityPtr.  But filter checks can be based on current
identity, so when libvirtd.conf contains access_drivers=["polkit"],
we ended up rejecting access for EVERY event due to failure to
look up the current identity, even if it should have been allowed.

Furthermore, even for events that are triggered by API calls, it
is important to remember that the point of events is that they can
be copied across multiple connections, which may have separate
identities and permissions.  So even if events were dispatched
from a context where we have an identity, we must change to the
correct identity of the connection that will be receiving the
event, rather than basing a decision on the context that triggered
the event, when deciding whether to filter an event to a
particular connection.

If there were an easy way to get from virConnectPtr to the
appropriate virIdentityPtr, then object_event.c could adjust the
identity prior to checking whether to dispatch an event.  But
setting up that back-reference is a bit invasive.  Instead, it
is easier to delay the filtering check until lower down the
stack, at the point where we have direct access to the RPC
client object that owns an identity.  As such, this patch ends
up reverting a large portion of the framework of commit f9f56340.
We also have to teach 'make check' to special-case the fact that
the event registration filtering is done at the point of dispatch,
rather than the point of registration.  Note that even though we
don't actually use virConnectDomainEventRegisterCheckACL (because
the RegisterAny variant is sufficient), we still generate the
function for the purposes of documenting that the filtering
takes place.

Also note that I did not entirely delete the notion of a filter
from object_event.c; I still plan on using that for my upcoming
patch series for qemu monitor events in libvirt-qemu.so.  In
other words, while this patch changes ACL filtering to live in
remote.c and therefore we have no current client of the filtering
in object_event.c, the notion of filtering in object_event.c is
still useful down the road.

* src/check-aclrules.pl: Exempt event registration from having to
pass checkACL filter down call stack.
* daemon/remote.c (remoteRelayDomainEventCheckACL)
(remoteRelayNetworkEventCheckACL): New functions.
(remoteRelay*Event*): Use new functions.
* src/conf/domain_event.h (virDomainEventStateRegister)
(virDomainEventStateRegisterID): Drop unused parameter.
* src/conf/network_event.h (virNetworkEventStateRegisterID):
Likewise.
* src/conf/domain_event.c (virDomainEventFilter): Delete unused
function.
* src/conf/network_event.c (virNetworkEventFilter): Likewise.
* src/libxl/libxl_driver.c: Adjust caller.
* src/lxc/lxc_driver.c: Likewise.
* src/network/bridge_driver.c: Likewise.
* src/qemu/qemu_driver.c: Likewise.
* src/remote/remote_driver.c: Likewise.
* src/test/test_driver.c: Likewise.
* src/uml/uml_driver.c: Likewise.
* src/vbox/vbox_tmpl.c: Likewise.
* src/xen/xen_driver.c: Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 11f20e43f1388d5f8f8c0bfac8c9cda6160a106b)

Conflicts:
	daemon/remote.c - not backporting network events
	src/conf/network_event.c - likewise
	src/conf/network_event.h - likewise
	src/network/bridge_driver.c - likewise
	src/conf/domain_event.c - revert back to pre-CVE state
	src/conf/domain_event.h - likewise
	src/libxl/libxl_driver.c - likewise
	src/lxc/lxc_driver.c - likewise
	src/remote/remote_driver.c - likewise
	src/test/test_driver.c - likewise
	src/uml/uml_driver.c - likewise
	src/xen/xen_driver.c - likewise
2014-02-05 08:32:33 -07:00
Daniel P. Berrange
f3aa956f32 Push nwfilter update locking up to top level
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.

In the virNWFilter{Define,Undefine} codepaths the lock ordering
is

  1. nwfilter driver lock
  2. virt driver lock
  3. nwfilter update lock
  4. domain object lock

In the VM guest startup paths the lock ordering is

  1. virt driver lock
  2. domain object lock
  3. nwfilter update lock

As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.

The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of

  1. nwfilter driver lock
  2. nwfilter update lock
  3. virt driver lock
  4. domain object lock

and VM start using

  1. nwfilter update lock
  2. virt driver lock
  3. domain object lock

This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.

These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb)
2014-02-04 15:45:24 +02:00
Eric Blake
eb7ec2312b event: filter global events by domain:getattr ACL [CVE-2014-0028]
Ever since ACL filtering was added in commit 7639736 (v1.1.1), a
user could still use event registration to obtain access to a
domain that they could not normally access via virDomainLookup*
or virConnectListAllDomains and friends.  We already have the
framework in the RPC generator for creating the filter, and
previous cleanup patches got us to the point that we can now
wire the filter through the entire object event stack.

Furthermore, whether or not domain:getattr is honored, use of
global events is a form of obtaining a list of networks, which
is covered by connect:search_domains added in a93cd08 (v1.1.0).
Ideally, we'd have a way to enforce connect:search_domains when
doing global registrations while omitting that check on a
per-domain registration.  But this patch just unconditionally
requires connect:search_domains, even when no list could be
obtained, based on the following observations:
1. Administrators are unlikely to grant domain:getattr for one
or all domains while still denying connect:search_domains - a
user that is able to manage domains will want to be able to
manage them efficiently, but efficient management includes being
able to list the domains they can access.  The idea of denying
connect:search_domains while still granting access to individual
domains is therefore not adding any real security, but just
serves as a layer of obscurity to annoy the end user.
2. In the current implementation, domain events are filtered
on the client; the server has no idea if a domain filter was
requested, and must therefore assume that all domain event
requests are global.  Even if we fix the RPC protocol to
allow for server-side filtering for newer client/server combos,
making the connect:serach_domains ACL check conditional on
whether the domain argument was NULL won't benefit older clients.
Therefore, we choose to document that connect:search_domains
is a pre-requisite to any domain event management.

Network events need the same treatment, with the obvious
change of using connect:search_networks and network:getattr.

* src/access/viraccessperm.h
(VIR_ACCESS_PERM_CONNECT_SEARCH_DOMAINS)
(VIR_ACCESS_PERM_CONNECT_SEARCH_NETWORKS): Document additional
effect of the permission.
* src/conf/domain_event.h (virDomainEventStateRegister)
(virDomainEventStateRegisterID): Add new parameter.
* src/conf/network_event.h (virNetworkEventStateRegisterID):
Likewise.
* src/conf/object_event_private.h (virObjectEventStateRegisterID):
Likewise.
* src/conf/object_event.c (_virObjectEventCallback): Track a filter.
(virObjectEventDispatchMatchCallback): Use filter.
(virObjectEventCallbackListAddID): Register filter.
* src/conf/domain_event.c (virDomainEventFilter): New function.
(virDomainEventStateRegister, virDomainEventStateRegisterID):
Adjust callers.
* src/conf/network_event.c (virNetworkEventFilter): New function.
(virNetworkEventStateRegisterID): Adjust caller.
* src/remote/remote_protocol.x
(REMOTE_PROC_CONNECT_DOMAIN_EVENT_REGISTER)
(REMOTE_PROC_CONNECT_DOMAIN_EVENT_REGISTER_ANY)
(REMOTE_PROC_CONNECT_NETWORK_EVENT_REGISTER_ANY): Generate a
filter, and require connect:search_domains instead of weaker
connect:read.
* src/test/test_driver.c (testConnectDomainEventRegister)
(testConnectDomainEventRegisterAny)
(testConnectNetworkEventRegisterAny): Update callers.
* src/remote/remote_driver.c (remoteConnectDomainEventRegister)
(remoteConnectDomainEventRegisterAny): Likewise.
* src/xen/xen_driver.c (xenUnifiedConnectDomainEventRegister)
(xenUnifiedConnectDomainEventRegisterAny): Likewise.
* src/vbox/vbox_tmpl.c (vboxDomainGetXMLDesc): Likewise.
* src/libxl/libxl_driver.c (libxlConnectDomainEventRegister)
(libxlConnectDomainEventRegisterAny): Likewise.
* src/qemu/qemu_driver.c (qemuConnectDomainEventRegister)
(qemuConnectDomainEventRegisterAny): Likewise.
* src/uml/uml_driver.c (umlConnectDomainEventRegister)
(umlConnectDomainEventRegisterAny): Likewise.
* src/network/bridge_driver.c
(networkConnectNetworkEventRegisterAny): Likewise.
* src/lxc/lxc_driver.c (lxcConnectDomainEventRegister)
(lxcConnectDomainEventRegisterAny): Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit f9f56340539d609cdc2e9d4ab812b9f146c3f100)

Conflicts:
	src/conf/object_event.c - not backporting event refactoring
	src/conf/object_event_private.h - likewise
	src/conf/network_event.c - not backporting network events
	src/conf/network_event.h - likewise
	src/network/bridge_driver.c - likewise
	src/access/viraccessperm.h - likewise
	src/remote/remote_protocol.x - likewise
	src/conf/domain_event.c - includes code that upstream has in object_event
	src/conf/domain_event.h - context
	src/libxl/libxl_driver.c - context
	src/lxc/lxc_driver.c - context
	src/remote/remote_driver.c - context, not backporting network events
	src/test/test_driver.c - context, not backporting network events
	src/uml/uml_driver.c - context
	src/xen/xen_driver.c - context
2014-01-15 14:41:47 -07:00
Martin Kletzander
99d1b7eded Fix crash in lxcDomainSetMemoryParameters
The function doesn't check whether the request is made for active or
inactive domain.  Thus when the domain is not running it still tries
accessing non-existing cgroups (priv->cgroup, which is NULL).

I re-made the function in order for it to work the same way it's qemu
counterpart does.

Reproducer:
 1) Define an LXC domain
 2) Do 'virsh memtune <domain> --hard-limit 133T'

Backtrace:
 Thread 6 (Thread 0x7fffec8c0700 (LWP 26826)):
 #0  0x00007ffff70edcc4 in virCgroupPathOfController (group=0x0, controller=3,
     key=0x7ffff75734bd "memory.limit_in_bytes", path=0x7fffec8bf718) at util/vircgroup.c:1764
 #1  0x00007ffff70e9206 in virCgroupSetValueStr (group=0x0, controller=3,
     key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffe409f360 "1073741824")
     at util/vircgroup.c:669
 #2  0x00007ffff70e98b4 in virCgroupSetValueU64 (group=0x0, controller=3,
     key=0x7ffff75734bd "memory.limit_in_bytes", value=1073741824) at util/vircgroup.c:740
 #3  0x00007ffff70ee518 in virCgroupSetMemory (group=0x0, kb=1048576) at util/vircgroup.c:1904
 #4  0x00007ffff70ee675 in virCgroupSetMemoryHardLimit (group=0x0, kb=1048576)
     at util/vircgroup.c:1944
 #5  0x00005555557d54c8 in lxcDomainSetMemoryParameters (dom=0x7fffe40cc420,
     params=0x7fffe409f100, nparams=1, flags=0) at lxc/lxc_driver.c:774
 #6  0x00007ffff72c20f9 in virDomainSetMemoryParameters (domain=0x7fffe40cc420,
     params=0x7fffe409f100, nparams=1, flags=0) at libvirt.c:4051
 #7  0x000055555561365f in remoteDispatchDomainSetMemoryParameters (server=0x555555eb7e00,
     client=0x555555ec4b10, msg=0x555555eb94e0, rerr=0x7fffec8bfb70, args=0x7fffe40b8510)
     at remote_dispatch.h:7621
 #8  0x00005555556133fd in remoteDispatchDomainSetMemoryParametersHelper (server=0x555555eb7e00,
     client=0x555555ec4b10, msg=0x555555eb94e0, rerr=0x7fffec8bfb70, args=0x7fffe40b8510,
     ret=0x7fffe40b84f0) at remote_dispatch.h:7591
 #9  0x00007ffff73b293f in virNetServerProgramDispatchCall (prog=0x555555ec3ae0,
     server=0x555555eb7e00, client=0x555555ec4b10, msg=0x555555eb94e0)
     at rpc/virnetserverprogram.c:435
 #10 0x00007ffff73b207f in virNetServerProgramDispatch (prog=0x555555ec3ae0,
     server=0x555555eb7e00, client=0x555555ec4b10, msg=0x555555eb94e0)
     at rpc/virnetserverprogram.c:305
 #11 0x00007ffff73a4d2c in virNetServerProcessMsg (srv=0x555555eb7e00, client=0x555555ec4b10,
     prog=0x555555ec3ae0, msg=0x555555eb94e0) at rpc/virnetserver.c:165
 #12 0x00007ffff73a4e8d in virNetServerHandleJob (jobOpaque=0x555555ec3e30, opaque=0x555555eb7e00)
     at rpc/virnetserver.c:186
 #13 0x00007ffff7187f3f in virThreadPoolWorker (opaque=0x555555eb7ac0) at util/virthreadpool.c:144
 #14 0x00007ffff718733a in virThreadHelper (data=0x555555eb7890) at util/virthreadpthread.c:161
 #15 0x00007ffff468ed89 in start_thread (arg=0x7fffec8c0700) at pthread_create.c:308
 #16 0x00007ffff3da26bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
(cherry picked from commit 9faf3f2950aed1643ab7564afcb4c693c77f71b5)
2013-12-20 13:00:21 +00:00
Martin Kletzander
705f388bce CVE-2013-6436: fix crash in lxcDomainGetMemoryParameters
The function doesn't check whether the request is made for active or
inactive domain.  Thus when the domain is not running it still tries
accessing non-existing cgroups (priv->cgroup, which is NULL).

I re-made the function in order for it to work the same way it's qemu
counterpart does.

Reproducer:
 1) Define an LXC domain
 2) Do 'virsh memtune <domain>'

Backtrace:
 Thread 6 (Thread 0x7fffec8c0700 (LWP 13387)):
 #0  0x00007ffff70edcc4 in virCgroupPathOfController (group=0x0, controller=3,
     key=0x7ffff75734bd "memory.limit_in_bytes", path=0x7fffec8bf750) at util/vircgroup.c:1764
 #1  0x00007ffff70e958c in virCgroupGetValueStr (group=0x0, controller=3,
     key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffec8bf7c0) at util/vircgroup.c:705
 #2  0x00007ffff70e9d29 in virCgroupGetValueU64 (group=0x0, controller=3,
     key=0x7ffff75734bd "memory.limit_in_bytes", value=0x7fffec8bf810) at util/vircgroup.c:804
 #3  0x00007ffff70ee706 in virCgroupGetMemoryHardLimit (group=0x0, kb=0x7fffec8bf8a8)
     at util/vircgroup.c:1962
 #4  0x00005555557d590f in lxcDomainGetMemoryParameters (dom=0x7fffd40024a0,
     params=0x7fffd40027a0, nparams=0x7fffec8bfa24, flags=0) at lxc/lxc_driver.c:826
 #5  0x00007ffff72c28d3 in virDomainGetMemoryParameters (domain=0x7fffd40024a0,
     params=0x7fffd40027a0, nparams=0x7fffec8bfa24, flags=0) at libvirt.c:4137
 #6  0x000055555563714d in remoteDispatchDomainGetMemoryParameters (server=0x555555eb7e00,
     client=0x555555ebaef0, msg=0x555555ebb3e0, rerr=0x7fffec8bfb70, args=0x7fffd40024e0,
     ret=0x7fffd4002420) at remote.c:1895
 #7  0x00005555556052c4 in remoteDispatchDomainGetMemoryParametersHelper (server=0x555555eb7e00,
     client=0x555555ebaef0, msg=0x555555ebb3e0, rerr=0x7fffec8bfb70, args=0x7fffd40024e0,
     ret=0x7fffd4002420) at remote_dispatch.h:4050
 #8  0x00007ffff73b293f in virNetServerProgramDispatchCall (prog=0x555555ec3ae0,
     server=0x555555eb7e00, client=0x555555ebaef0, msg=0x555555ebb3e0)
     at rpc/virnetserverprogram.c:435
 #9  0x00007ffff73b207f in virNetServerProgramDispatch (prog=0x555555ec3ae0,
     server=0x555555eb7e00, client=0x555555ebaef0, msg=0x555555ebb3e0)
     at rpc/virnetserverprogram.c:305
 #10 0x00007ffff73a4d2c in virNetServerProcessMsg (srv=0x555555eb7e00, client=0x555555ebaef0,
     prog=0x555555ec3ae0, msg=0x555555ebb3e0) at rpc/virnetserver.c:165
 #11 0x00007ffff73a4e8d in virNetServerHandleJob (jobOpaque=0x555555ebc7e0, opaque=0x555555eb7e00)
     at rpc/virnetserver.c:186
 #12 0x00007ffff7187f3f in virThreadPoolWorker (opaque=0x555555eb7ac0) at util/virthreadpool.c:144
 #13 0x00007ffff718733a in virThreadHelper (data=0x555555eb7890) at util/virthreadpthread.c:161
 #14 0x00007ffff468ed89 in start_thread (arg=0x7fffec8c0700) at pthread_create.c:308
 #15 0x00007ffff3da26bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
(cherry picked from commit f8c1cb90213508c4f32549023b0572ed774e48aa)
2013-12-20 13:00:15 +00:00
Daniel P. Berrange
262157f651 LXC: Ensure security context is set when mounting images
When setting up filesystems backed by block devices or file
images, the SELinux mount options must be used to ensure the
correct context is set

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-29 12:19:37 +00:00
Daniel P. Berrange
d45b833d14 Pull lxcContainerGetSubtree out into shared virfile module
Move the code for lxcContainerGetSubtree into the virfile
module creating 2 new functions

  int virFileGetMountSubtree(const char *mtabpath,
                             const char *prefix,
                             char ***mountsret,
                             size_t *nmountsret);
  int virFileGetMountReverseSubtree(const char *mtabpath,
                                    const char *prefix,
                                    char ***mountsret,
                                    size_t *nmountsret);

Add a new virfiletest.c test case to validate the new code.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-28 11:49:01 +00:00
Daniel P. Berrange
c60a2713d6 Introduce standard methods for sorting strings with qsort
Add virStringSortCompare and virStringSortRevCompare as
standard functions to use with qsort.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-28 11:29:46 +00:00
Gao feng
f575fda748 LXC: don't unmount mounts for shared root
Also after commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942
vfs: Lock in place mounts from more privileged users,

unprivileged user has no rights to umount the mounts that
inherited from parent mountns.

right now, I have no good idea to fix this problem, we need
to do more research. this patch just skip unmounting these
mounts for shared root.

BTW, I think when libvirt lxc enables user namespace, the
configuation that shares root with host is very rara.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-11-26 15:55:47 +00:00
Gao feng
46f2d16f07 LXC: fix the problem that libvirt lxc fail to start on latest kernel
After kernel commit 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942
vfs: Lock in place mounts from more privileged users,

unprivileged user has no rights to move the mounts that
inherited from parent mountns. we use this feature to move
the /stateDir/domain-name.{dev, devpts} to the /dev/ and
/dev/pts directroy of container. this commit breaks libvirt lxc.

this patch changes the behavior to bind these mounts when
user namespace is enabled and move these mounts when user
namespace is disabled.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-11-26 12:22:25 +00:00
Chen Hanxiao
55d1285ef4 lxc: don't do duplicate work when getting pagesize
Don't do duplicate work when getting pagesize.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
2013-11-25 10:52:50 +01:00
Eric Blake
64b2335c2a maint: fix comma style issues: remaining drivers
Most of our code base uses space after comma but not before;
fix the remaining uses before adding a syntax check.

* src/lxc/lxc_container.c: Consistently use commas.
* src/openvz/openvz_driver.c: Likewise.
* src/openvz/openvz_util.c: Likewise.
* src/remote/remote_driver.c: Likewise.
* src/test/test_driver.c: Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-11-20 09:14:55 -07:00
Daniel P. Berrange
784bb73eaa Add missing 'return 0;' in stub lxcStartFuse() method impl.
Without a 'return 0' in the stub lxcStartFuse() method, the
compiler warns:

lxc/lxc_fuse.c:374: error: control reaches end of non-void function
[-Wreturn-type]

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-18 16:12:39 +00:00
Daniel P. Berrange
3563c51d3e Avoid async signal safety problem in glibc's setxid
The glibc setxid is supposed to be async signal safe, but
libc developers confirm that it is not. This causes a problem
when libvirt_lxc starts the FUSE thread and then runs clone()
to start the container. If the clone() was done before the
FUSE thread has completely started up, then the container
will hang in setxid after clone().

The fix is to avoid creating any threads until after the
container has been clone()'d. By avoiding any threads in
the parent, the child is no longer required to run in an
async signal safe context, and we thus avoid the glibc
bug.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-18 15:36:23 +00:00
Daniel P. Berrange
5087a5a009 Fix busy wait loop in LXC container I/O handling
If the host side of an LXC container console disconnected
and the guest side continued to write data, until the PTY
buffer filled up, the LXC controller would busy wait. It
would repeatedly see POLLHUP from poll() and not disable
the watch.

This was due to some bogus logic detecting blocking
conditions. Upon seeing a POLLHUP we must disable all
reading & writing from the PTY, and setup the epoll to
wake us up again when the connection comes back.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-12 11:14:49 +00:00
Peter Krempa
de7b5faf43 conf: Refactor storing and usage of feature flags
Currently we were storing domain feature flags in a bit field as the
they were either enabled or disabled. New features such as paravirtual
spinlocks however can be tri-state as the default option may depend on
hypervisor version.

To allow storing tri-state feature state in the same place instead of
having to declare dedicated variables for each feature this patch
refactors the bit field to an array.
2013-11-08 09:44:42 +01:00
Daniel P. Berrange
9ecbd38c4c Skip any files which are not mounted on the host
Currently the LXC container tries to skip selinux/securityfs
mounts if the directory does not exist in the filesystem,
or if SELinux is disabled.

The former check is flawed because the /sys/fs/selinux
or /sys/kernel/securityfs directories may exist in sysfs
even if the mount type is disabled. Instead of just doing
an access() check, use an virFileIsMounted() to see if
the FS is actually present in the host OS. This also
avoids the need to check is_selinux_enabled().

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-05 15:51:48 +08:00
Daniel P. Berrange
bf8874025e Add flag to lxcBasicMounts to control use in user namespaces
Some mounts must be skipped if running inside a user namespace,
since the kernel forbids their use. Instead of strcmp'ing the
filesystem type in the body of the loop, set an explicit flag
in the lxcBasicMounts table.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-05 15:51:47 +08:00
Daniel P. Berrange
6d5fdde3dd Remove duplicate entries in lxcBasicMounts array
Currently the lxcBasicMounts array has separate entries for
most mounts, to reflect that we must do a separate mount
operation to make mounts read-only. Remove the duplicate
entries and instead set the MS_RDONLY flag against the main
entry. Then change lxcContainerMountBasicFS to look for the
MS_RDONLY flag, mask it out & do a separate bind mount.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-05 15:51:47 +08:00
Daniel P. Berrange
f567a583f3 Remove pointless 'srcpath' variable in lxcContainerMountBasicFS
The 'srcpath' variable is initialized from 'mnt->src' and never
changed thereafter. Some places continue to use 'mnt->src' and
others use 'srcpath'. Remove the pointless 'srcpath' variable
and use 'mnt->src' everywhere.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-05 15:51:47 +08:00
Daniel P. Berrange
c6b84a9dee Remove unused 'opts' field from LXC basic mounts struct
The virLXCBasicMountInfo struct contains a 'char *opts'
field passed onto the mount() syscall. Every entry in the
list sets this to NULL though, so it can be removed to
simplify life.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-11-05 15:51:47 +08:00
Gao feng
919374c73e LXC: don't free tty before using it in lxcContainerSetupDevices
Introduced by commit 0f31f7b.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
2013-10-29 15:44:56 +01:00
Giuseppe Scrivano
b51038a4cd capabilities: add baselabel per sec driver/virt type to secmodel
Expand the "secmodel" XML fragment of "host" with a sequence of
baselabel's which describe the default security context used by
libvirt with a specific security model and virtualization type:

<secmodel>
  <model>selinux</model>
  <doi>0</doi>
  <baselabel type='kvm'>system_u:system_r:svirt_t:s0</baselabel>
  <baselabel type='qemu'>system_u:system_r:svirt_tcg_t:s0</baselabel>
</secmodel>
<secmodel>
  <model>dac</model>
  <doi>0</doi>
  <baselabel type='kvm'>107:107</baselabel>
  <baselabel type='qemu'>107:107</baselabel>
</secmodel>

"baselabel" is driver-specific information, e.g. in the DAC security
model, it indicates USER_ID:GROUP_ID.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2013-10-29 07:06:04 -06:00
Chen Hanxiao
8e1336fea9 Skip debug message in lxcContainerSetID if no map is set.
The lxcContainerSetID() method prints a misleading log
message about setting the uid/gid when no ID map is
present in the XML config. Skip the debug message in
this case.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
2013-10-28 11:19:20 +00:00
Daniel P. Berrange
9b0af09240 Remove (nearly) all use of getuid()/getgid()
Most of the usage of getuid()/getgid() is in cases where we are
considering what privileges we have. As such the code should be
using the effective IDs, not real IDs.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-21 14:03:52 +01:00
Daniel P. Berrange
1e4a02bdfe Remove all direct use of getenv
Unconditional use of getenv is not secure in setuid env.
While not all libvirt code runs in a setuid env (since
much of it only exists inside libvirtd) this is not always
clear to developers. So make all the code paranoid, even
if it only ever runs inside libvirtd.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-21 14:03:52 +01:00
Daniel P. Berrange
9b8f307c6a Make virCommand env handling robust in setuid env
When running setuid, we must be careful about what env vars
we allow commands to inherit from us. Replace the
virCommandAddEnvPass function with two new ones which do
filtering

  virCommandAddEnvPassAllowSUID
  virCommandAddEnvPassBlockSUID

And make virCommandAddEnvPassCommon use the appropriate
ones

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-21 14:03:52 +01:00
Daniel P. Berrange
0894ce863f Fix typo breaking cgroups for NBD backed filesystems
A typo in the setup of NBD backed filesystems meant the
/dev/nbdN device would not be added to the cgroups device
ACL.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-16 12:22:40 +01:00
Daniel P. Berrange
8f132ef1b1 Add some logging to LXC disk/fs nbd/loop setup
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-16 12:22:40 +01:00
Daniel P. Berrange
1d8afffecd Add logging to LXC cgroup devices setup
To facilitate debugging, add some more logging to LXC cgroup
devices ACL setup.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-16 12:22:40 +01:00
Hongwei Bi
dcd0f6d724 fix typo in lxc_driver.c and virsh-nodedev.c 2013-10-15 06:47:24 -06:00
Eric Blake
33aec50684 maint: avoid 'const fooPtr' in all remaining places
'const fooPtr' is the same as 'foo * const' (the pointer won't
change, but it's contents can).  But in general, if an interface
is trying to be const-correct, it should be using 'const foo *'
(the pointer is to data that can't be changed).

Fix up all remaining offenders.

* src/lxc/lxc_process.c (virLXCProcessSetupInterfaceBridged): Drop
needless const.
* src/uml/uml_driver.c (umlMonitorCommand): Use intended type.
(umlMonitorAddress): Fix fallout.
* src/xen/xm_internal.c (xenXMDomainSearchForUUID): Use intended type.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-10-14 14:34:38 -06:00
Eric Blake
d24677090f maint: avoid 'const fooPtr' in domain_conf
'const fooPtr' is the same as 'foo * const' (the pointer won't
change, but it's contents can).  But in general, if an interface
is trying to be const-correct, it should be using 'const foo *'
(the pointer is to data that can't be changed).

Fix up offenders in src/conf/domain_conf, and their fallout.

Several things to note: virObjectLock() requires a non-const
argument; if this were C++, we could treat the locking field
as 'mutable' and allow locking an otherwise 'const' object, but
that is a more invasive change, so I instead dropped attempts
to be const-correct on domain lookup.  virXMLPropString and
friends require a non-const xmlNodePtr - this is because libxml2
is not a const-correct library.  We could make the src/util/virxml
wrappers cast away const, but I figured it was easier to not
try to mark xmlNodePtr as const.  Finally, virDomainDeviceDefCopy
was a rather hard conversion - it calls virDomainDeviceDefPostParse,
which in turn in the xen driver was actually modifying the domain
outside of the current device being visited.  We should not be
adding a device on the first per-device callback, but waiting until
after all per-device callbacks are complete.

* src/conf/domain_conf.h (virDomainObjListFindByID)
(virDomainObjListFindByUUID, virDomainObjListFindByName)
(virDomainObjAssignDef, virDomainObjListAdd): Drop attempt at
const.
(virDomainDeviceDefCopy): Use intended type.
(virDomainDeviceDefParse, virDomainDeviceDefPostParseCallback)
(virDomainVideoDefaultType, virDomainVideoDefaultRAM)
(virDomainChrGetDomainPtrs): Make const-correct.
* src/conf/domain_conf.c (virDomainObjListFindByID)
(virDomainObjListFindByUUID, virDomainObjListFindByName)
(virDomainDeviceDefCopy, virDomainObjListAdd)
(virDomainObjAssignDef, virDomainHostdevSubsysUsbDefParseXML)
(virDomainHostdevSubsysPciOrigStatesDefParseXML)
(virDomainHostdevSubsysPciDefParseXML)
(virDomainHostdevSubsysScsiDefParseXML)
(virDomainControllerModelTypeFromString)
(virDomainTPMDefParseXML, virDomainTimerDefParseXML)
(virDomainSoundCodecDefParseXML, virDomainSoundDefParseXML)
(virDomainWatchdogDefParseXML, virDomainRNGDefParseXML)
(virDomainMemballoonDefParseXML, virDomainNVRAMDefParseXML)
(virSysinfoParseXML, virDomainVideoAccelDefParseXML)
(virDomainVideoDefParseXML, virDomainHostdevDefParseXML)
(virDomainRedirdevDefParseXML)
(virDomainRedirFilterUsbDevDefParseXML)
(virDomainRedirFilterDefParseXML, virDomainIdMapEntrySort)
(virDomainIdmapDefParseXML, virDomainVcpuPinDefParseXML)
(virDiskNameToBusDeviceIndex, virDomainDeviceDefCopy)
(virDomainVideoDefaultType, virDomainHostdevAssignAddress)
(virDomainDeviceDefPostParseInternal, virDomainDeviceDefPostParse)
(virDomainChrGetDomainPtrs, virDomainControllerSCSINextUnit)
(virDomainSCSIDriveAddressIsUsed)
(virDomainDriveAddressIsUsedByDisk)
(virDomainDriveAddressIsUsedByHostdev): Fix fallout.
* src/openvz/openvz_driver.c (openvzDomainDeviceDefPostParse):
Likewise.
* src/libxl/libxl_domain.c (libxlDomainDeviceDefPostParse):
Likewise.
* src/qemu/qemu_domain.c (qemuDomainDeviceDefPostParse)
(qemuDomainDefaultNetModel): Likewise.
* src/lxc/lxc_domain.c (virLXCDomainDeviceDefPostParse):
Likewise.
* src/uml/uml_driver.c (umlDomainDeviceDefPostParse): Likewise.
* src/xen/xen_driver.c (xenDomainDeviceDefPostParse): Split...
(xenDomainDefPostParse): ...since per-device callback is not the
time to be adding a device.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-10-14 14:34:38 -06:00
Daniel P. Berrange
5a1cb1075a Improve log filtering in virLXCProcessReadLogOutputData
Make the virLXCProcessReadLogOutputData method ignore the log
lines about the container startup argv, ignore the generic
error message from libvirt_lxc when lxcContainerMain fails
and skip over blank lines.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-14 15:38:20 +01:00
Daniel P. Berrange
01100c7f60 Ensure lxcContainerResolveSymlinks reports errors
The lxcContainerResolveSymlinks method merely logged some errors
as debug messages, rather than reporting them as proper errors.
This meant startup failures were not diagnosed at all.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-14 15:38:20 +01:00
Daniel P. Berrange
558546fb8f Ensure lxcContainerMain reports errors on stderr
Ensure the lxcContainerMain method reports any errors that
occur during setup to stderr, where libvirtd will pick them
up.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-14 15:38:20 +01:00
Daniel P. Berrange
97973ebb7a Initialize threading & error layer in LXC controller
In Fedora 20, libvirt_lxc crashes immediately at startup with a
trace

 #0  0x00007f0cddb653ec in free () from /lib64/libc.so.6
 #1  0x00007f0ce0e16f4a in virFree (ptrptr=ptrptr@entry=0x7f0ce1830058) at util/viralloc.c:580
 #2  0x00007f0ce0e2764b in virResetError (err=0x7f0ce1830030) at util/virerror.c:354
 #3  0x00007f0ce0e27a5a in virResetLastError () at util/virerror.c:387
 #4  0x00007f0ce0e28858 in virEventRegisterDefaultImpl () at util/virevent.c:233
 #5  0x00007f0ce0db47c6 in main (argc=11, argv=0x7fff4596c328) at lxc/lxc_controller.c:2352

Normally virInitialize calls virErrorInitialize and
virThreadInitialize, but we don't link to libvirt.so
in libvirt_lxc, and nor did we ever call the error
or thread initializers.

I have absolutely no idea how this has ever worked, let alone
what caused it to stop working in Fedora 20.

In addition not all code paths from virLogSetFromEnv will
ensure virLogInitialize is called correctly, which is another
possible crash scenario.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-14 12:16:23 +01:00
Daniel P. Berrange
1815e2d081 Improve error reporting with LXC controller
The LXC code would read the log file if an LXC guest failed to
startup. There were a number of failure cases where the guest
will not start and libvirtd never gets as far as looking at the
log file.

Fix this by replacing some earlier generic errors with messages
from the log.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-14 10:33:07 +01:00
Daniel P. Berrange
13c011c337 Fix exit status of lxc controller
The LXC controller main() method initialized 'rc' to 1
rather than '-1'. In the cleanup path it will print any
error to stderr, if-and-only-if rc < 0. Hence the incorrect
initialization caused errors to be lost.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-14 10:31:01 +01:00
Daniel P. Berrange
ae9a0485ae Make LXC controller use a private dbus connection & close it
The LXC controller uses dbus to talk to systemd to create
cgroups. This means that each LXC controller instance has
a dbus connection. The DBus daemon is limited to 256
connections by default and we want to be able to run many
1000 of containers.

While the dbus limit could be raised in the config files,
it is simpler to make libvirt LXC controller close its
dbus connection once everything is configured.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-14 10:31:01 +01:00
Chen Hanxiao
2c9ccd1e0c lxc: Fix an improper comment in lxc_process.c
Fix the improper comment for the "release" hook.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
2013-10-14 16:15:14 +08:00
Ján Tomko
3f029fb531 LXC: Fix handling of RAM filesystem size units
Since 76b644c when the support for RAM filesystems was introduced,
libvirt accepted the following XML:
<source usage='1024' unit='KiB'/>

This was parsed correctly and internally stored in bytes, but it
was formatted as (with an extra 's'):
<source usage='1024' units='KiB'/>
When read again, this was treated as if the units were missing,
meaning libvirt was unable to parse its own XML correctly.

The usage attribute was documented as being in KiB, but it was not
scaled if the unit was missing. Transient domains still worked,
because this was balanced by an extra 'k' in the mount options.

This patch:
Changes the parser to use 'units' instead of 'unit', as the latter
was never documented (fixing persistent domains) and some programs
(libvirt-glib, libvirt-sandbox) already parse the 'units' attribute.

Removes the extra 'k' from the tmpfs mount options, which is needed
because now we parse our own XML correctly.

Changes the default input unit to KiB to match documentation, fixing:
https://bugzilla.redhat.com/show_bug.cgi?id=1015689
2013-10-09 17:44:45 +02:00
Daniel P. Berrange
999d72fbd5 Remove use of virConnectPtr from all remaining nwfilter code
The virConnectPtr is passed around loads of nwfilter code in
order to provide it as a parameter to the callback registered
by the virt drivers. None of the virt drivers use this param
though, so it serves no purpose.

Avoiding the need to pass a virConnectPtr means that the
nwfilterStateReload method no longer needs to open a bogus
QEMU driver connection. This addresses a race condition that
can lead to a crash on startup.

The nwfilter driver starts before the QEMU driver and registers
some callbacks with DBus to detect firewalld reload. If the
firewalld reload happens while the QEMU driver is still starting
up though, the nwfilterStateReload method will open a connection
to the partially initialized QEMU driver and cause a crash.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-07 14:19:10 +01:00
Daniel P. Berrange
8766e9b5a5 Avoid deleting NULL veth device name
If veth device allocation has a fatal error, the veths
array may contain NULL device names. Avoid calling the
virNetDevVethDelete function on such names.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-03 11:28:08 +01:00
Daniel P. Berrange
f5eae57086 Don't set netdev offline in container cleanup
During container cleanup there is a race where the kernel may
have destroyed the veth device before we try to set it offline.
This causes log error messages. Given that we're about to
delete the device entirely, setting it offline is pointless.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-10-03 11:25:20 +01:00