Commit Graph

675 Commits

Author SHA1 Message Date
Gao feng
1c7037cff4 LXC: don't try to mount selinux filesystem when user namespace enabled
Right now we mount selinuxfs even user namespace is enabled and
ignore the error. But we shouldn't ignore these errors when user
namespace is not enabled.

This patch skips mounting selinuxfs when user namespace enabled.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-09-12 15:18:01 +01:00
Daniel P. Berrange
75235a52bc Ensure root filesystem is recursively mounted readonly
If the guest is configured with

    <filesystem type='mount'>
      <source dir='/'/>
      <target dir='/'/>
      <readonly/>
    </filesystem>

Then any submounts under / should also end up readonly, except
for those setup as basic mounts. eg if the user has /home on a
separate volume, they'd expect /home to be readonly, but we
should not touch the /sys, /proc, etc dirs we setup ourselves.

Users can selectively make sub-mounts read-write again by
simply listing them as new mounts without the <readonly>
flag set

    <filesystem type='mount'>
      <source dir='/home'/>
      <target dir='/home'/>
    </filesystem>

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-09-12 12:01:49 +01:00
Daniel P. Berrange
f27f5f7edd Move array of mounts out of lxcContainerMountBasicFS
Move the array of basic mounts out of the lxcContainerMountBasicFS
function, to a global variable. This is to allow it to be referenced
by other methods wanting to know what the basic mount paths are.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-09-12 11:52:12 +01:00
Gao feng
66e2adb2ba LXC: introduce lxcContainerUnmountForSharedRoot
Move the unmounting private or useless filesystems for
container to this function.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-09-11 13:09:31 +01:00
Gao feng
4142bf46b8 LXC: umount the temporary filesystem created by libvirt
The devpts, dev and fuse filesystems are mounted temporarily.
there is no need to export them to container if container shares
the root directory with host.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-09-11 13:09:31 +01:00
Hongwei Bi
46c9bce4c8 LXC: Free variable vroot in lxcDomainDetachDeviceHostdevUSBLive()
The variable vroot should be freed in label cleanup.
2013-09-09 10:40:13 +02:00
Chen Hanxiao
744fb50831 LXC: fix typos in lxc_container.c
Fix docs and error message typos in lxc_container.c

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
2013-09-06 12:14:00 +01:00
Gao feng
1583dfda7c LXC: Don't mount securityfs when user namespace enabled
Right now, securityfs is disallowed to be mounted in non-initial
user namespace, so we must avoid trying to mount securityfs in
a container which has user namespace enabled.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-09-05 12:00:07 +01:00
Daniel P. Berrange
c13a2c282b Ensure that /dev exists in the container root filesystem
If booting a container with a root FS that isn't the host's
root, we must ensure that the /dev mount point exists.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-08-13 16:26:44 +01:00
Daniel P. Berrange
2d07f84302 Honour root prefix in lxcContainerMountFSBlockAuto
The lxcContainerMountFSBlockAuto method can be used to mount the
initial root filesystem, so it cannot assume a prefix of /.oldroot.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-08-13 14:04:28 +01:00
Dan Walsh
6807238d87 Ensure securityfs is mounted readonly in container
If securityfs is available on the host, we should ensure to
mount it read-only in the container. This will avoid systemd
trying to mount it during startup causing SELinux AVCs.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-08-08 14:25:50 +01:00
Michal Privoznik
1199edb1d4 Introduce max_queued_clients
This configuration knob lets user to set the length of queue of
connection requests waiting to be accept()-ed by the daemon. IOW, it
just controls the @backlog passed to listen:

  int listen(int sockfd, int backlog);
2013-08-05 11:03:01 +02:00
Daniel P. Berrange
1166eeba61 Fix crashing upgrading from older libvirts with running guests
If upgrading from a libvirt that is older than 1.0.5, we can
not assume that vm->def->resource is non-NULL. This bogus
assumption caused libvirtd to crash

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-08-02 15:32:26 +01:00
Daniel P. Berrange
2fe2470181 Enable support for systemd-machined in cgroups creation
Make the virCgroupNewMachine method try to use systemd-machined
first. If that fails, then fallback to using the traditional
cgroup setup code path.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-31 19:29:19 +01:00
Yuri Chornoivan
5b4c035b08 Fix minor typos in messages and docs
Signed-off-by: Eric Blake <eblake@redhat.com>
2013-07-30 07:07:33 -06:00
Daniel P. Berrange
35fe8d97c0 Set default partition in libvirtd instead of libvirt_lxc
By setting the default partition in libvirt_lxc it is not
visible when querying the live XML. Move setting of the
default partition into libvirtd virLXCProcessStart

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-26 17:46:22 +01:00
John Ferlan
cefb97fb81 virStateDriver - Separate AutoStart from Initialize
Adjust these drivers to handle their Autostart functionality after each
of the drivers has gone through their Initialization functions
2013-07-26 09:30:53 -04:00
Daniel P. Berrange
5ec5a22493 Add 'controllers' arg to virCgroupNewDetect
When detecting cgroups we must honour any controllers
whitelist the driver may have.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:55:47 +01:00
Daniel P. Berrange
a45b99ead9 Introduce a more convenient virCgroupNewDetectMachine
Instead of requiring drivers to use a combination of calls
to virCgroupNewDetect and virCgroupIsValidMachine, combine
the two into virCgroupNewDetectMachine

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 19:47:30 +01:00
Daniel P. Berrange
f6c5f9077c Convert LXC driver to use virCgroupNewMachine
Convert the LXC driver code to use the new atomic API
for setup of cgroups

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-25 11:42:48 +01:00
Michal Privoznik
4e5f0dd2d3 virLXCMonitorClose: Unlock domain while closing monitor
There's a race in lxc driver causing a deadlock. If a domain is
destroyed immediately after started, the deadlock can occur. When domain
is started, the even loop tries to connect to the monitor. If the
connecting succeeds, virLXCProcessMonitorInitNotify() is called with
@mon->client locked. The first thing that callee does, is
virObjectLock(vm). So the order of locking is: 1) @mon->client, 2) @vm.

However, if there's another thread executing virDomainDestroy on the
very same domain, the first thing done here is locking the @vm. Then,
the corresponding libvirt_lxc process is killed and monitor is closed
via calling virLXCMonitorClose(). This callee tries to lock @mon->client
too. So the order is reversed to the first case. This situation results
in deadlock and unresponsive libvirtd (since the eventloop is involved).

The proper solution is to unlock the @vm in virLXCMonitorClose prior
entering virNetClientClose(). See the backtrace as follows:

Thread 25 (Thread 0x7f1b7c9b8700 (LWP 16312)):
0  0x00007f1b80539714 in __lll_lock_wait () from /lib64/libpthread.so.0
1  0x00007f1b8053516c in _L_lock_516 () from /lib64/libpthread.so.0
2  0x00007f1b80534fbb in pthread_mutex_lock () from /lib64/libpthread.so.0
3  0x00007f1b82a637cf in virMutexLock (m=0x7f1b3c0038d0) at util/virthreadpthread.c:85
4  0x00007f1b82a4ccf2 in virObjectLock (anyobj=0x7f1b3c0038c0) at util/virobject.c:320
5  0x00007f1b82b861f6 in virNetClientCloseInternal (client=0x7f1b3c0038c0, reason=3) at rpc/virnetclient.c:696
6  0x00007f1b82b862f5 in virNetClientClose (client=0x7f1b3c0038c0) at rpc/virnetclient.c:721
7  0x00007f1b6ee12500 in virLXCMonitorClose (mon=0x7f1b3c007210) at lxc/lxc_monitor.c:216
8  0x00007f1b6ee129f0 in virLXCProcessCleanup (driver=0x7f1b68100240, vm=0x7f1b680ceb70, reason=VIR_DOMAIN_SHUTOFF_DESTROYED) at lxc/lxc_process.c:174
9  0x00007f1b6ee14106 in virLXCProcessStop (driver=0x7f1b68100240, vm=0x7f1b680ceb70, reason=VIR_DOMAIN_SHUTOFF_DESTROYED) at lxc/lxc_process.c:710
10 0x00007f1b6ee1aa36 in lxcDomainDestroyFlags (dom=0x7f1b5c002560, flags=0) at lxc/lxc_driver.c:1291
11 0x00007f1b6ee1ab1a in lxcDomainDestroy (dom=0x7f1b5c002560) at lxc/lxc_driver.c:1321
12 0x00007f1b82b05be5 in virDomainDestroy (domain=0x7f1b5c002560) at libvirt.c:2303
13 0x00007f1b835a7e85 in remoteDispatchDomainDestroy (server=0x7f1b857419d0, client=0x7f1b8574ae40, msg=0x7f1b8574acf0, rerr=0x7f1b7c9b7c30, args=0x7f1b5c004a50) at remote_dispatch.h:3143
14 0x00007f1b835a7d78 in remoteDispatchDomainDestroyHelper (server=0x7f1b857419d0, client=0x7f1b8574ae40, msg=0x7f1b8574acf0, rerr=0x7f1b7c9b7c30, args=0x7f1b5c004a50, ret=0x7f1b5c0029e0) at remote_dispatch.h:3121
15 0x00007f1b82b93704 in virNetServerProgramDispatchCall (prog=0x7f1b8573af90, server=0x7f1b857419d0, client=0x7f1b8574ae40, msg=0x7f1b8574acf0) at rpc/virnetserverprogram.c:435
16 0x00007f1b82b93263 in virNetServerProgramDispatch (prog=0x7f1b8573af90, server=0x7f1b857419d0, client=0x7f1b8574ae40, msg=0x7f1b8574acf0) at rpc/virnetserverprogram.c:305
17 0x00007f1b82b8c0f6 in virNetServerProcessMsg (srv=0x7f1b857419d0, client=0x7f1b8574ae40, prog=0x7f1b8573af90, msg=0x7f1b8574acf0) at rpc/virnetserver.c:163
18 0x00007f1b82b8c1da in virNetServerHandleJob (jobOpaque=0x7f1b8574dca0, opaque=0x7f1b857419d0) at rpc/virnetserver.c:184
19 0x00007f1b82a64158 in virThreadPoolWorker (opaque=0x7f1b8573cb10) at util/virthreadpool.c:144
20 0x00007f1b82a63ae5 in virThreadHelper (data=0x7f1b8574b9f0) at util/virthreadpthread.c:161
21 0x00007f1b80532f4a in start_thread () from /lib64/libpthread.so.0
22 0x00007f1b7fc4f20d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f1b83546740 (LWP 16297)):
0  0x00007f1b80539714 in __lll_lock_wait () from /lib64/libpthread.so.0
1  0x00007f1b8053516c in _L_lock_516 () from /lib64/libpthread.so.0
2  0x00007f1b80534fbb in pthread_mutex_lock () from /lib64/libpthread.so.0
3  0x00007f1b82a637cf in virMutexLock (m=0x7f1b680ceb80) at util/virthreadpthread.c:85
4  0x00007f1b82a4ccf2 in virObjectLock (anyobj=0x7f1b680ceb70) at util/virobject.c:320
5  0x00007f1b6ee13bd7 in virLXCProcessMonitorInitNotify (mon=0x7f1b3c007210, initpid=4832, vm=0x7f1b680ceb70) at lxc/lxc_process.c:601
6  0x00007f1b6ee11fd3 in virLXCMonitorHandleEventInit (prog=0x7f1b3c001f10, client=0x7f1b3c0038c0, evdata=0x7f1b8574a7d0, opaque=0x7f1b3c007210) at lxc/lxc_monitor.c:109
7  0x00007f1b82b8a196 in virNetClientProgramDispatch (prog=0x7f1b3c001f10, client=0x7f1b3c0038c0, msg=0x7f1b3c003928) at rpc/virnetclientprogram.c:259
8  0x00007f1b82b87030 in virNetClientCallDispatchMessage (client=0x7f1b3c0038c0) at rpc/virnetclient.c:1019
9  0x00007f1b82b876bb in virNetClientCallDispatch (client=0x7f1b3c0038c0) at rpc/virnetclient.c:1140
10 0x00007f1b82b87d41 in virNetClientIOHandleInput (client=0x7f1b3c0038c0) at rpc/virnetclient.c:1312
11 0x00007f1b82b88f51 in virNetClientIncomingEvent (sock=0x7f1b3c0044e0, events=1, opaque=0x7f1b3c0038c0) at rpc/virnetclient.c:1832
12 0x00007f1b82b9e1c8 in virNetSocketEventHandle (watch=3321, fd=54, events=1, opaque=0x7f1b3c0044e0) at rpc/virnetsocket.c:1695
13 0x00007f1b82a272cf in virEventPollDispatchHandles (nfds=21, fds=0x7f1b8574ded0) at util/vireventpoll.c:498
14 0x00007f1b82a27af2 in virEventPollRunOnce () at util/vireventpoll.c:645
15 0x00007f1b82a25a61 in virEventRunDefaultImpl () at util/virevent.c:273
16 0x00007f1b82b8e97e in virNetServerRun (srv=0x7f1b857419d0) at rpc/virnetserver.c:1097
17 0x00007f1b8359db6b in main (argc=2, argv=0x7ffff98dbaa8) at libvirtd.c:1512
2013-07-24 17:53:00 +02:00
John Ferlan
8134b37d34 lxc: Resolve Coverity warning
Commit 'c8695053' resulted in the following:

Coverity error seen in the output:
    ERROR: REVERSE_INULL
    FUNCTION: lxcProcessAutoDestroy

Due to the 'dom' being checked before 'dom->persistent' since 'dom'
is already dereferenced prior to that.
2013-07-23 19:04:48 -04:00
Daniel P. Berrange
da704c8782 Create + setup cgroups atomically for LXC process
Currently the LXC driver creates the VM's cgroup prior to
forking, and then libvirt_lxc moves the child process
into the cgroup. This won't work with systemd whose APIs
do the creation of cgroups + attachment of processes atomically.

Fortunately we simply move the entire cgroups setup into
the libvirt_lxc child process. We make it take place before
fork'ing into the background, so by the time virCommandRun
returns in the LXC driver, the cgroup is guaranteed to be
present.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:46:31 +01:00
Daniel P. Berrange
87b2e6fa84 Auto-detect existing cgroup placement
Use the new virCgroupNewDetect function to determine cgroup
placement of existing running VMs. This will allow the legacy
cgroups creation APIs to be removed entirely

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-23 22:46:31 +01:00
Daniel P. Berrange
0d7f45aea7 Convert remainder of cgroups code to report errors
Convert the remaining methods in vircgroup.c to report errors
instead of returning errno values.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
Daniel P. Berrange
3260fdfab0 Convert the virCgroupKill* APIs to report errors
Instead of returning errno values, change the virCgroupKill*
APIs to fully report errors.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
Daniel P. Berrange
b64dabff27 Report full errors from virCgroupNew*
Instead of returning raw errno values, report full libvirt
errors in virCgroupNew* functions.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 13:09:58 +01:00
Daniel P. Berrange
3aac4e5632 LXC: Set default driver for image backed filesystems
If no explicit driver is set for an image backed filesystem,
set it to use the loop driver (if raw) or nbd driver (if
non-raw)

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 12:32:25 +01:00
Daniel P. Berrange
2e832b18d6 LXC: Fix some error reporting in filesystem setup
A couple of places in LXC setup for filesystems did not do
a "goto cleanup" after reporting errors. While fixing this,
also add in many more debug statements to aid troubleshooting

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-22 12:32:07 +01:00
Michal Privoznik
dbeb04a65c Introduce lxcDomObjFromDomain
Similarly to qemu driver, we can use a helper function to
lookup a domain instead of copying multiple lines around.
2013-07-18 14:16:54 +02:00
Michal Privoznik
eb150c86b4 Remove lxcDriverLock from almost everywhere
With the majority of fields in the virLXCDriverPtr struct
now immutable or self-locking, there is no need for practically
any methods to be using the LXC driver lock. Only a handful
of helper APIs now need it.
2013-07-18 14:16:54 +02:00
Michal Privoznik
2a82171aff lxc: Make activeUsbHostdevs use locks
The activeUsbHostdevs item in LXCDriver are lockable, but the lock has
to be called explicitly. Call the virObject(Un)Lock() in order to
achieve mutual exclusion once lxcDriverLock is removed.
2013-07-18 14:16:54 +02:00
Michal Privoznik
64ec738e58 Stop accessing driver->caps directly in LXC driver
The 'driver->caps' pointer can be changed on the fly. Accessing
it currently requires the global driver lock. Isolate this
access in a single helper, so a future patch can relax the
locking constraints.
2013-07-18 14:16:54 +02:00
Michal Privoznik
c86950533a lxc: switch to virCloseCallbacks API 2013-07-18 14:16:54 +02:00
Michal Privoznik
4deeb74d01 Introduce annotations for virLXCDriverPtr fields
Annotate the fields in virLXCDriverPtr to indicate the locking
rules for their use.
2013-07-18 14:16:54 +02:00
Michal Privoznik
29bed27eb4 lxc: Use atomic ops for driver->nactive 2013-07-18 14:16:54 +02:00
Michal Privoznik
7fca37554c Introduce a virLXCDriverConfigPtr object
Currently the virLXCDriverPtr struct contains an wide variety
of data with varying access needs. Move all the static config
data into a dedicated virLXCDriverConfigPtr object. The only
locking requirement is to hold the driver lock, while obtaining
an instance of virLXCDriverConfigPtr. Once a reference is held
on the config object, it can be used completely lockless since
it is immutable.

NB, not all APIs correctly hold the driver lock while getting
a reference to the config object in this patch. This is safe
for now since the config is never updated on the fly. Later
patches will address this fully.
2013-07-18 14:16:53 +02:00
Michal Privoznik
7e94a1a4ea virLXCDriver: Drop unused @cgroup
It is not used anywhere, so it makes no sense to have it there.
2013-07-18 14:16:53 +02:00
Daniel P. Berrange
040d996342 Merge virCommandPreserveFD / virCommandTransferFD
Merge the virCommandPreserveFD / virCommandTransferFD methods
into a single virCommandPasFD method, and use a new
VIR_COMMAND_PASS_FD_CLOSE_PARENT to indicate their difference
in behaviour

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-18 12:18:24 +01:00
Daniel P. Berrange
11693bc6f0 LXC: Wire up the virDomainCreate{XML}WithFiles methods
Wire up the new virDomainCreate{XML}WithFiles methods in the
LXC driver, so that FDs get passed down to the init process.

The lxc_container code needs to do a little dance in order
to renumber the file descriptors it receives into linear
order, starting from STDERR_FILENO + 1.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-18 12:07:51 +01:00
Michal Privoznik
192a86cadf lxc_container: Don't call virGetGroupList during exec
Commit 75c1256 states that virGetGroupList must not be called
between fork and exec, then commit ee777e99 promptly violated
that for lxc.

Patch originally posted by Eric Blake <eblake@redhat.com>.
2013-07-17 14:26:09 +02:00
Michal Privoznik
37d96498c6 lxcCapsInit: Allocate primary security driver unconditionally
Currently, if the primary security driver is 'none', we skip
initializing caps->host.secModels. This means, later, when LXC domain
XML is parsed and <seclabel type='none'/> is found (see
virSecurityLabelDefsParseXML), the model name is not copied to the
seclabel. This leads to subsequent crash in virSecurityManagerGenLabel
where we call STREQ() over the model (note, that we are expecting model
to be !NULL).
2013-07-17 12:36:45 +02:00
Gao feng
129d25dcd9 LXC: Change the owner of live attached host devices
The owner of this host devices should be the root user of container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-16 09:59:41 -06:00
Gao feng
7a8212aac9 LXC: Change the owner of host devices to the root of container
These host devices are created for container,
the owner should be the root user of container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-16 09:59:29 -06:00
Gao feng
f87be04fd8 LXC: Create host devices for container on host side
Otherwise the container will fail to start if we
enable user namespace, since there is no rights to
do mknod in uninit user namespace.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-16 09:59:24 -06:00
Gao feng
4f41a8e5b2 LXC: Change the owner of live attached disk device
The owner of this disk device should be the root user of container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-16 09:59:20 -06:00
Gao feng
14a0c4084d LXC: Move virLXCControllerChown to lxc_container.c
lxc driver will use this function to change the owner
of hot added devices.

Move virLXCControllerChown to lxc_container.c and Rename
it to lxcContainerChown.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-16 09:59:14 -06:00
Gao feng
ae4e916f04 LXC: controller: change the owner of disk to the root of container
These disk devices are created for container,
the owner should be the root user of container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-16 09:58:53 -06:00
Gao feng
7161f0a385 LXC: Setup disks for container on host side
Since mknod in container is forbidden, we should setup disks
on host side.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-16 09:57:38 -06:00
Daniel P. Berrange
f45dbdb213 Add a couple of debug statements to LXC driver
When failing to start a container due to inaccessible root
filesystem path, we did not log any meaningful error. Add a
few debug statements to assist diagnosis

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-12 11:06:08 +01:00
Eric Blake
ee777e9949 util: make virSetUIDGID async-signal-safe
https://bugzilla.redhat.com/show_bug.cgi?id=964358

POSIX states that multi-threaded apps should not use functions
that are not async-signal-safe between fork and exec, yet we
were using getpwuid_r and initgroups.  Although rare, it is
possible to hit deadlock in the child, when it tries to grab
a mutex that was already held by another thread in the parent.
I actually hit this deadlock when testing multiple domains
being started in parallel with a command hook, with the following
backtrace in the child:

 Thread 1 (Thread 0x7fd56bbf2700 (LWP 3212)):
 #0  __lll_lock_wait ()
     at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
 #1  0x00007fd5761e7388 in _L_lock_854 () from /lib64/libpthread.so.0
 #2  0x00007fd5761e7257 in __pthread_mutex_lock (mutex=0x7fd56be00360)
     at pthread_mutex_lock.c:61
 #3  0x00007fd56bbf9fc5 in _nss_files_getpwuid_r (uid=0, result=0x7fd56bbf0c70,
     buffer=0x7fd55c2a65f0 "", buflen=1024, errnop=0x7fd56bbf25b8)
     at nss_files/files-pwd.c:40
 #4  0x00007fd575aeff1d in __getpwuid_r (uid=0, resbuf=0x7fd56bbf0c70,
     buffer=0x7fd55c2a65f0 "", buflen=1024, result=0x7fd56bbf0cb0)
     at ../nss/getXXbyYY_r.c:253
 #5  0x00007fd578aebafc in virSetUIDGID (uid=0, gid=0) at util/virutil.c:1031
 #6  0x00007fd578aebf43 in virSetUIDGIDWithCaps (uid=0, gid=0, capBits=0,
     clearExistingCaps=true) at util/virutil.c:1388
 #7  0x00007fd578a9a20b in virExec (cmd=0x7fd55c231f10) at util/vircommand.c:654
 #8  0x00007fd578a9dfa2 in virCommandRunAsync (cmd=0x7fd55c231f10, pid=0x0)
     at util/vircommand.c:2247
 #9  0x00007fd578a9d74e in virCommandRun (cmd=0x7fd55c231f10, exitstatus=0x0)
     at util/vircommand.c:2100
 #10 0x00007fd56326fde5 in qemuProcessStart (conn=0x7fd53c000df0,
     driver=0x7fd55c0dc4f0, vm=0x7fd54800b100, migrateFrom=0x0, stdin_fd=-1,
     stdin_path=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
     flags=1) at qemu/qemu_process.c:3694
 ...

The solution is to split the work of getpwuid_r/initgroups into the
unsafe portions (getgrouplist, called pre-fork) and safe portions
(setgroups, called post-fork).

* src/util/virutil.h (virSetUIDGID, virSetUIDGIDWithCaps): Adjust
signature.
* src/util/virutil.c (virSetUIDGID): Add parameters.
(virSetUIDGIDWithCaps): Adjust clients.
* src/util/vircommand.c (virExec): Likewise.
* src/util/virfile.c (virFileAccessibleAs, virFileOpenForked)
(virDirCreate): Likewise.
* src/security/security_dac.c (virSecurityDACSetProcessLabel):
Likewise.
* src/lxc/lxc_container.c (lxcContainerSetID): Likewise.
* configure.ac (AC_CHECK_FUNCS_ONCE): Check for setgroups, not
initgroups.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-07-11 15:46:42 -06:00
John Ferlan
8283ef9ea2 testutils: Resolve Coverity issues
Recent changes uncovered a NEGATIVE_RETURNS in the return from sysconf()
when processing a for loop in virtTestCaptureProgramExecChild() in
testutils.c

Code review uncovered 3 other code paths with the same condition that
weren't found by Covirity, so fixed those as well.
2013-07-11 14:18:11 -04:00
Gao feng
46a46563ca LXC: remove some incorrect setting ATTRIBUTE_UNUSED
these parameters shouldn't be marked as ATTRIBUTE_UNUSED.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-11 13:43:31 +02:00
Daniel P. Berrange
a4b57dfb9e Convert 'int i' to 'size_t i' in src/lxc/ files
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-10 17:55:16 +01:00
Michal Privoznik
56965922ab Adapt to VIR_ALLOC and virAsprintf in src/lxc/* 2013-07-10 11:07:32 +02:00
Michal Privoznik
8290cbbc38 viralloc: Report OOM error on failure
Similarly to VIR_STRDUP, we want the OOM error to be reported in
VIR_ALLOC and friends.
2013-07-10 11:07:31 +02:00
Gao feng
468ee0bc4d LXC: hostdev: create parent directory for hostdev
Create parent directroy for hostdev automatically when we
start a lxc domain or attach a hostdev to a lxc domain.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-09 11:16:20 +01:00
Gao feng
c0d8c7c885 LXC: hostdev: introduce lxcContainerSetupHostdevCapsMakePath
This helper function is used to create parent directory for
the hostdev which will be added to the container. If the
parent directory of this hostdev doesn't exist, the mknod of
the hostdev will fail. eg with /dev/net/tun

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-09 11:15:11 +01:00
Richard Weinberger
9a0ac6d9c2 LXC: Create /dev/tty within a container
Many applications use /dev/tty to read from stdin.
e.g. zypper on openSUSE.

Let's create this device node to unbreak those applications.
As /dev/tty is a synonym for the current controlling terminal
it cannot harm the host or any other containers.

Signed-off-by: Richard Weinberger <richard@nod.at>
2013-07-09 11:05:14 +01:00
Daniel P. Berrange
763973607d Add access control filtering of domain objects
Ensure that all APIs which list domain objects filter
them against the access control system.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-03 15:54:53 +01:00
Gao feng
350fd95f40 LXC: blkio: allow to setup weight_device
libivrt lxc can only set generic weight for container,
This patch allows user to setup per device blkio
weigh for container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-03 12:35:54 +01:00
Gao feng
e7b3349f5a LXC: fix memory leak when userns configuration is incorrect
We forgot to free the stack when Kernel doesn't
support user namespace.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-03 12:19:50 +01:00
Daniel P. Berrange
1165e39ca3 Add some misc debugging to LXC startup
Add some debug logging of LXC wait/continue messages
and uid/gid map update code.
2013-07-02 14:00:13 +01:00
Daniel P. Berrange
293f717028 Ignore failure to mount SELinux filesystem in container
User namespaces will deny the ability to mount the SELinux
filesystem. This is harmless for libvirt's LXC needs, so the
error can be ignored.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-07-02 14:00:13 +01:00
Gao feng
5daa1b0132 LXC: fuse: Change files owner to the root user of container
The owner of the /proc/meminfo in container should
be the root user of container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-02 11:20:05 +01:00
Gao feng
6c7665e150 LXC: controller: change the owner of /dev/pts and ptmx to the root of container
These files are created for container,
the owner should be the root user of container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-02 11:20:05 +01:00
Gao feng
a591ae6068 LXC: controller: change the owner of devices created on host
Since these devices are created for the container.
the owner should be the root user of the container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-02 11:20:05 +01:00
Gao feng
40a8fe6d25 LXC: controller: change the owner of /dev to the root user of container
container will create /dev/pts directory in /dev.
the owner of /dev should be the root user of container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-02 11:20:05 +01:00
Gao feng
ff1a6019e9 LXC: controller: change the owner of tty devices to the root user of container
Since these tty devices will be used by container,
the owner of them should be the root user of container.

This patch also adds a new function virLXCControllerChown,
we can use this general function to change the owner of
files.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-02 11:20:04 +01:00
Gao feng
e1d32bb955 LXC: Creating devices for container on host side
user namespace doesn't allow to create devices in
uninit userns. We should create devices on host side.

We first mount tmpfs on dev directroy under state dir
of container. then create devices under this dev dir.

Finally in container, mount the dev directroy created
on host to the /dev/ directroy of container.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-02 11:20:04 +01:00
Gao feng
9a085a228c LXC: introduce virLXCControllerSetupUserns and lxcContainerSetID
This patch introduces new helper function
virLXCControllerSetupUserns, in this function,
we set the files uid_map and gid_map of the init
task of container.

lxcContainerSetID is used for creating cred for
tasks running in container. Since after setuid/setgid,
we may be a new user. This patch calls lxcContainerSetUserns
at first to make sure the new created files belong to
right user.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-02 11:20:04 +01:00
Gao feng
8b58336eec LXC: enable user namespace only when user set the uidmap
User namespace will be enabled only when the idmap exist
in configuration.

If you want disable user namespace,just remove these
elements from XML.

If kernel doesn't support user namespace and idmap exist
in configuration file, libvirt lxc will start failed and
return "Kernel doesn't support user namespace" message.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-07-02 11:20:04 +01:00
Jiri Denemark
c40ed4168a Rename virTypedParameterArrayValidate as virTypedParamsValidate 2013-06-25 00:38:24 +02:00
Daniel P. Berrange
279866d550 Add ACL checks into the LXC driver
Insert calls to the ACL checking APIs in all LXC driver
entrypoints.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-06-24 15:25:43 +01:00
John Ferlan
38ada092d1 lxc: Resolve issue with GetScheduler APIs for non running domain
As a consequence of the cgroup layout changes from commit 'cfed9ad4', the
lxcDomainGetSchedulerParameters[Flags]()' and lxcGetSchedulerType() APIs
failed to return data for a non running domain.  This can be seen through
a 'virsh schedinfo <domain>' command which returns:

Scheduler      : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted

Prior to that change a non running domain would return:

Scheduler      : posix
cpu_shares     : 0
vcpu_period    : 0
vcpu_quota     : 0
emulator_period: 0
emulator_quota : 0

This patch will restore the capability to return configuration only data
for a non running domain regardless of whether cgroups are available.
2013-06-19 15:01:48 -04:00
Richard Weinberger
1133404c73 LXC: s/chroot/chdir in lxcContainerPivotRoot()
...fixes a trivial copy&paste error.

Signed-off-by: Richard Weinberger <richard@nod.at>
2013-06-14 11:24:41 +02:00
Ján Tomko
e557766c3b Replace two-state local integers with bool
Found with 'git grep "= 1"'.
2013-06-06 17:22:53 +02:00
Daniel P. Berrange
922ebe4ead Ensure non-root can read /proc/meminfo file in LXC containers
By default files in a FUSE mount can only be accessed by the
user which created them, even if the file permissions would
otherwise allow it. To allow other users to access the FUSE
mount the 'allow_other' mount option must be used. This bug
prevented non-root users in an LXC container from reading
the /proc/meminfo file.

https://bugzilla.redhat.com/show_bug.cgi?id=967977

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-06-05 14:02:20 +01:00
Daniel P. Berrange
61e672b23e Remove legacy code for single-instance devpts filesystem
Earlier commit f7e8653f dropped support for using LXC with
kernels having single-instance devpts filesystem from the
LXC controller. It forgot to remove the same code from the
LXC container setup.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-06-05 14:01:54 +01:00
Eric Blake
1add9c78da maint: don't use config.h in .h files
Enforce the rule that .h files don't need to (redundantly)
include <config.h>.

* cfg.mk (sc_prohibit_config_h_in_headers): New rule.
(_virsh_includes): Delete; instead, inline a smaller number of
exclusions...
(exclude_file_name_regexp--sc_require_config_h)
(exclude_file_name_regexp--sc_require_config_h_first): ...here.
* daemon/libvirtd.h (includes): Fix offenders.
* src/driver.h (includes): Likewise.
* src/gnutls_1_0_compat.h (includes): Likewise.
* src/libxl/libxl_conf.h (includes): Likewise.
* src/libxl/libxl_driver.h (includes): Likewise.
* src/lxc/lxc_conf.h (includes): Likewise.
* src/lxc/lxc_driver.h (includes): Likewise.
* src/lxc/lxc_fuse.h (includes): Likewise.
* src/network/bridge_driver.h (includes): Likewise.
* src/phyp/phyp_driver.h (includes): Likewise.
* src/qemu/qemu_conf.h (includes): Likewise.
* src/util/virnetlink.h (includes): Likewise.

Signed-off-by: Eric Blake <eblake@redhat.com>
2013-06-05 05:53:25 -06:00
Osier Yang
1ea88abd7e src/lxc: Remove the whitespace before ";" 2013-05-21 23:41:45 +08:00
Gao feng
7adfda0d6d LXC: move the comments to the proper place
The comments is for virLXCControllerSetupPrivateNS.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-20 12:45:02 -06:00
Gao feng
2a3466fafb LXC: fix memory leak in virLXCControllerSetupDevPTS
We forgot to free the mount_options.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-20 12:45:02 -06:00
Gao feng
eae1c286a1 LXC: remove unnecessary check on root filesystem
After commit c131525bec
"Auto-add a root <filesystem> element to LXC containers on startup"
for libvirt lxc, root must be existent.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-20 12:45:01 -06:00
Daniel P. Berrange
63ea1e5432 Re-add selinux/selinux.h to lxc_container.c
Re-add the selinux header to lxc_container.c since other
functions now use it, beyond the patch that was just
reverted.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-17 10:59:25 +01:00
Daniel P. Berrange
7bebd88871 Revert "Change label of fusefs mounted at /proc/meminfo in lxc containers"
This reverts commit 940c6f1085.
2013-05-17 10:22:54 +01:00
Daniel P. Berrange
95c6cc344b Don't mount selinux fs in LXC if selinux is disabled
Before trying to mount the selinux filesystem in a container
use is_selinux_enabled() to check if the machine actually
has selinux support (eg not booted with selinux=0)

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-16 16:28:53 +01:00
Daniel P. Berrange
d7d7581b03 Fix LXC startup when /var/run is an absolute symlink
During startup, the LXC driver uses paths such as

  /.oldroot/var/run/libvirt/lxc/...

to access directories from the previous root filesystem
after doing a pivot_root(). Unfortunately if /var/run
is an absolute symlink to /run, instead of a relative
symlink to ../run, these paths break.

At least one Linux distro is known to use an absolute
symlink for /var/run, so workaround this, by resolving
all symlinks before doing the pivot_root().

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-16 16:28:53 +01:00
Dan Walsh
940c6f1085 Change label of fusefs mounted at /proc/meminfo in lxc containers
We do not want to allow contained applications to be able to read fusefs_t.
So we want /proc/meminfo label to match the system default proc_t.

Fix checking of error codes
2013-05-15 17:39:22 +02:00
Daniel P. Berrange
7bb7510de7 Remove obsolete skipRoot flag in LXC driver
The lxcContainerMountAllFS method had a 'bool skipRoot'
flag to control whether it mounts the / filesystem. Since
removal of the non-pivot root container setup codepaths,
this flag is obsolete as the only caller always passes
'true'.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-15 17:29:35 +02:00
Daniel P. Berrange
31453a837b Stop passing around old root directory prefix
Many methods accept a string parameter specifying the
old root directory prefix. Since removal of the non-pivot
root container setup codepaths, this parameter is obsolete
in many methods where the callers always pass "/.oldroot".

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-15 17:29:35 +02:00
Daniel P. Berrange
37cebfec92 Remove obsolete pivotRoot flag in LXC driver
The lxcContainerMountBasicFS method had a 'bool pivotRoot'
flag to control whether it mounted a private /dev. Since
removal of the non-pivot root container setup codepaths,
this flag is obsolete as the only caller always passes
'true'.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-15 17:29:35 +02:00
Daniel P. Berrange
6b5f12c805 Support NBD backed disks/filesystems in LXC driver
The LXC driver can already configure <disk> or <filesystem>
devices to use the loop device. This extends it to also allow
for use of the NBD device, to support non-raw formats.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-13 13:15:19 +01:00
Daniel P. Berrange
c8fa7e8c55 Re-arrange code setting up ifs/disk loop devices for LXC
The current code for setting up loop devices to LXC disks first
does a switch() based on the disk format, then looks at the
disk driver name. Reverse this so it first looks at the driver
name, and then the disk format. This is more useful since the
list of supported disk formats depends on what driver is used.

The code for setting loop devices for LXC fs entries also needs
to have the same logic added, now the XML schema supports this.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-13 13:15:19 +01:00
Michal Privoznik
a96d7f3c8f Adapt to VIR_STRDUP and VIR_STRNDUP in src/lxc/* 2013-05-09 14:00:45 +02:00
John Ferlan
649ecb704f lxc: Coverity false positive USE_AFTER_FREE 2013-05-08 06:16:53 -04:00
Daniel P. Berrange
a605b7e041 Unmerge attach/update/modify device APIs in drivers
The LXC, QEMU, and LibXL drivers have all merged their handling of
the attach/update/modify device APIs into one large

  'xxxxDomainModifyDeviceFlags'

which then does a 'switch()' based on the actual API being invoked.
While this saves some lines of code, it is not really all that
significant in the context of the driver API impls as a whole.

This merger of the handling of different APIs creates pain when
wanting to automated analysis of the code and do things which
are specific to individual APIs. The slight duplication of code
from unmerged the API impls, is preferrable to allow for easier
automated analysis.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-08 10:47:48 +01:00
Daniel P. Berrange
4a044d0256 Separate internal node suspend APIs from public API
The individual hypervisor drivers were directly referencing
APIs in virnodesuspend.c in their virDriverPtr struct. Separate
these methods, so there is always a wrapper in the hypervisor
driver. This allows the unused virConnectPtr args to be removed
from the virnodesuspend.c file. Again this will ensure that
ACL checks will only be performed on invocations that are
directly associated with public API usage.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-08 10:47:47 +01:00
Daniel P. Berrange
1c6d4ca557 Separate internal node device APIs from public API
The individual hypervisor drivers were directly referencing
APIs in src/nodeinfo.c in their virDriverPtr struct. Separate
these methods, so there is always a wrapper in the hypervisor
driver. This allows the unused virConnectPtr args to be
removed from the nodeinfo.c file. Again this will ensure that
ACL checks will only be performed on invocations that are
directly associated with public API usage.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-08 10:47:47 +01:00
Daniel P. Berrange
ead630319d Separate virGetHostname() API contract from driver APIs
Currently the virGetHostname() API has a bogus virConnectPtr
parameter. This is because virtualization drivers directly
reference this API in their virDriverPtr tables, tieing its
API design to the public virConnectGetHostname API design.

This also causes problems for access control checks since
these must only be done for invocations from the public
API, not internal invocation.

Remove the bogus virConnectPtr parameter, and make each
hypervisor driver provide a dedicated function for the
driver API impl. This will allow access control checks
to be easily inserted later.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-05-08 10:47:47 +01:00