Commit Graph

177 Commits

Author SHA1 Message Date
Daniel P. Berrange
ef38965c30 Convert HAVE_CAPNG to WITH_CAPNG
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-01-14 13:25:06 +00:00
Daniel P. Berrange
63f18f3786 Convert HAVE_SELINUX to WITH_SELINUX
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2013-01-14 13:24:49 +00:00
Gao feng
ae9874e471 libvirt: lxc: fix incorrect parameter of lxcContainerMountProcFuse
when we has no host's src mapped to container.
there is no .oldroot dir,so libvirt lxc will fail
to start when mouting meminfo.

in this case,the parameter srcprefix of function
lxcContainerMountProcFuse should be NULL.and make
this method handle NULL correctly.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-01-09 15:08:42 +01:00
John Ferlan
36ac6e37be lxc: Avoid possible NULL dereference on *root prior to opendir().
If running on older Linux without mounted cgroups then its possible that
*root would be NULL.
2013-01-07 17:11:57 -07:00
Daniel P. Berrange
f24404a324 Rename virterror.c virterror_internal.h to virerror.{c,h} 2012-12-21 11:19:50 +00:00
Daniel P. Berrange
e861b31275 Rename uuid.{c,h} to viruuid.{c,h} 2012-12-21 11:19:49 +00:00
Daniel P. Berrange
44f6ae27fe Rename util.{c,h} to virutil.{c,h} 2012-12-21 11:19:49 +00:00
Daniel P. Berrange
ab9b7ec2f6 Rename memory.{c,h} to viralloc.{c,h} 2012-12-21 11:17:14 +00:00
Daniel P. Berrange
936d95d347 Rename logging.{c,h} to virlog.{c,h} 2012-12-21 11:17:14 +00:00
Daniel P. Berrange
ebc8db5189 Rename hostusb.{c,h} to virusb.{c,h} 2012-12-21 11:17:13 +00:00
Daniel P. Berrange
04d9510f50 Rename command.{c,h} to vircommand.{c,h} 2012-12-21 11:17:13 +00:00
Daniel P. Berrange
c25c18f71b Convert capabilities / domain_conf to use virArch
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of datatype change

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-12-18 16:53:03 +00:00
Daniel P. Berrange
83a9c93807 Add support for misc host device passthrough with LXC
This extends support for host device passthrough with LXC to
cover misc devices. In this case all we need todo is a
mknod in the container's /dev and whitelist the device in
cgroups

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-12-17 17:50:51 +00:00
Daniel P. Berrange
313669d1c1 Add support for storage host device passthrough with LXC
This extends support for host device passthrough with LXC to
cover storage devices. In this case all we need todo is a
mknod in the container's /dev and whitelist the device in
cgroups

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-12-17 17:50:51 +00:00
Daniel P. Berrange
95fef5f407 Add support for USB host device passthrough with LXC
This adds support for host device passthrough with the
LXC driver. Since there is only a single kernel image,
it doesn't make sense to pass through PCI devices, but
USB devices are fine. For the latter we merely need to
make the /dev/bus/usb/NNN/MMM character device exist
in the container's /dev

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-12-17 17:50:51 +00:00
Daniel P. Berrange
368e341ac1 Add support for disks with LXC
Currently LXC guests can be given arbitrary pre-mounted
filesystems, however, for some usecases it is more appropriate
to provide block devices which the container can mount itself.
This first impl only allows for <disk type='block'>, in other
words exposing a host disk device to a container. Since LXC
does not have device namespace virtualization, we are cheating
a little bit. If the XML specifies /dev/sdc4 to be given to
the container as /dev/sda1, when we do the mknod /dev/sda1
in the container's /dev, we actually use the major:minor
number of /dev/sdc4, not /dev/sda1.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-12-17 17:50:51 +00:00
Gao feng
df33ecdd9e mount fuse's meminfo file to container's /proc/meminfo
we already have virtualize meminfo for container through fuse filesystem,
add function lxcContainerMountProcFuse to mount this meminfo file to
the container's /proc/meminfo.

So we can isolate container's /proc/meminfo from host now.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-11-28 10:28:49 +00:00
Daniel P. Berrange
f999e2fdce Pass virSecurityManagerPtr object further down into LXC setup code
Currently the lxcContainerSetupMounts method uses the
virSecurityManagerPtr instance to obtain the mount options
string and then only passes the string down into methods
it calls. As functionality in LXC grows though, those
methods need to have direct access to the virSecurityManagerPtr
instance. So push the code down a level.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-11-27 16:45:09 +00:00
Daniel P. Berrange
3f6470f753 Fix error handling in virSecurityManagerGetMountOptions
The impls of virSecurityManagerGetMountOptions had no way to
return errors, since the code was treating 'NULL' as a success
value. This is somewhat pointless, since the calling code did
not want NULL in the first place and has to translate it into
the empty string "". So change the code so that the impls can
return "" directly, allowing use of NULL for error reporting
once again

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-11-27 16:45:04 +00:00
Daniel P. Berrange
1c04f99970 Remove spurious whitespace between function name & open brackets
The libvirt coding standard is to use 'function(...args...)'
instead of 'function (...args...)'. A non-trivial number of
places did not follow this rule and are fixed in this patch.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-11-02 13:36:49 +00:00
Dan Walsh
2e03b08ead Linux Containers are not allowed to create device nodes.
This needs to be done before the container starts. Turning
off the mknod capability is noticed by systemd, which will
no longer attempt to create device nodes.

This eliminates SELinux AVC messages and ugly failure messages in the journal.
2012-11-01 15:14:25 -06:00
Daniel P. Berrange
9467ab6074 Move virProcess{Kill,Abort,TranslateStatus} into virprocess.{c,h}
Continue consolidation of process functions by moving some
helpers out of command.{c,h} into virprocess.{c,h}

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-09-26 10:09:57 +01:00
Daniel P. Berrange
0fb58ef5cd Rename virPid{Abort,Wait} to virProcess{Abort,Wait}
Change "Pid" to "Process" to align with the virProcessKill
API naming prefix

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-09-26 10:09:57 +01:00
Daniel P. Berrange
1532bd498a Fix start of containers with custom root filesystem
A prefix change to unmount the SELinux filesystem broke starting
of LXC containers with a custom root filesystem
2012-09-26 10:09:50 +01:00
Daniel P. Berrange
2b9189e8ad Improve some debugging log messages in LXC mount setup
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-09-21 10:43:25 +01:00
Daniel P. Berrange
c15d893252 Ensure existing selinux mount is removed before mounting new one in LXC
Some kernel versions (at least RHEL-6 2.6.32) do not let you over-mount
an existing selinuxfs instance with a new one. Thus we must unmount the
existing instance inside our namespace.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-09-21 10:27:42 +01:00
Eric Blake
4ecb723b9e maint: fix up copyright notice inconsistencies
https://www.gnu.org/licenses/gpl-howto.html recommends that
the 'If not, see <url>.' phrase be a separate sentence.

* tests/securityselinuxhelper.c: Remove doubled line.
* tests/securityselinuxtest.c: Likewise.
* globally: s/;  If/.  If/
2012-09-20 16:30:55 -06:00
Daniel P. Berrange
a4fd740561 Don't assume use of /sys/fs/cgroup
The introduction of /sys/fs/cgroup came in fairly recent kernels.
Prior to that time distros would pick a custom directory like
/cgroup or /dev/cgroup. We need to auto-detect where this is,
rather than hardcoding it

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-09-07 13:30:20 +01:00
Eric Blake
6f926c5ef6 build: fix build without HAVE_CAPNG
Otherwise, a build may fail with:

lxc/lxc_conatiner.c: In function 'lxcContainerDropCapabilities':
lxc/lxc_container.c:1662:46: error: unused parameter 'keepReboot' [-Werror=unused-parameter]

* src/lxc/lxc_container.c (lxcContainerDropCapabilities): Mark
parameter unused.
2012-07-30 11:59:25 -06:00
Daniel P. Berrange
b46b1c762a Allow CAP_SYS_REBOOT on new enough kernels
Check whether the reboot() system call is virtualized, and if
it is, then allow the container to keep CAP_SYS_REBOOT.

Based on an original patch by Serge Hallyn

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-07-30 13:07:45 +01:00
Daniel P. Berrange
4343fee0a8 Replace use of lxcError with virReportError
Update all LXC code to use virReportError instead of the custom
lxcError macro

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-07-30 12:35:08 +01:00
Osier Yang
f9ce7dad60 Desert the FSF address in copyright
Per the FSF address could be changed from time to time, and GNU
recommends the following now: (http://www.gnu.org/licenses/gpl-howto.html)

  You should have received a copy of the GNU General Public License
  along with Foobar.  If not, see <http://www.gnu.org/licenses/>.

This patch removes the explicit FSF address, and uses above instead
(of course, with inserting 'Lesser' before 'General').

Except a bunch of files for security driver, all others are changed
automatically, the copyright for securify files are not complete,
that's why to do it manually:

  src/security/security_selinux.h
  src/security/security_driver.h
  src/security/security_selinux.c
  src/security/security_apparmor.h
  src/security/security_apparmor.c
  src/security/security_driver.c
2012-07-23 10:50:50 +08:00
Dan Walsh
9f5ef4d9b3 lxcContainerMountCGroups also mounts a tmpfs that needs to be labeled.
This patch passes down the sec_mount_options to the
lxcContainerMountCGroups function and then mounts the tmpfs with
the correct label.
2012-07-18 20:52:18 +01:00
Daniel J Walsh
e00184291e Mount all tmpfs filesystems with correct SELinux label
Basically within a Secure Linux Container (virt-sandbox) we want all content
that the process within the container can write to be labeled the same.  We
are labeling the physical disk correctly but when we create "RAM" based file
systems
libvirt is not labeling them, and they are defaulting to tmpfs_t, which will
will not allow the processes to write.  This patch labels the RAM based file
systems correctly.
2012-07-18 19:49:22 +01:00
Daniel P. Berrange
6068754670 Only ummount /proc, /sys, /dev if the root source is '/'
Previous commits added code to unmount the existing /proc,
/sys and /dev hierarchies on the root filesystem of the
container. This should only have been done if the container's
root filesystem was the same as the host's root. ie if
the root source is '/'.   As it is, this causes LXC containersr
to fail to start if their root source is not '/'

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-07-05 17:40:52 +01:00
Daniel P. Berrange
ba797c73e6 Move veth device management into virLXCControllerPtr object
Move the veth device name state into the virLXCControllerPtr
object and stop passing it around. Also use size_t instead
of unsigned int for the array length parameters.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-07-05 10:46:09 +01:00
Daniel P. Berrange
5bb83236c9 Remove sub-mounts under /dev when starting an LXC container
Since we are mounting a new /dev in the container, we must
remove any sub-mounts like /dev/shm, /dev/mqueue, etc,
otherwise they'll be recorded in /proc/mounts, but not be
accessible to applications.
2012-06-29 16:29:33 +01:00
Daniel J Walsh
465c055f4a Support bind mounting host files, as well as directories in LXC
Currently libvirt-lxc checks to see if the destination exists and is a
directory.  If it is not a directory then the mount fails.  Since
libvirt-lxc can bind mount files on an inode, this patch is needed to
allow us to bind mount files on files.  Currently we want to bind mount
on top of /etc/machine-id, and /etc/adjtime

If the destination of the mount point does not exists, it checks if the
src is a directory and then attempts to create a directory, otherwise it
creates an empty file for the destination.  The code will then bind mount
over the destination.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-06-25 10:56:38 +01:00
Daniel P. Berrange
3b1ddec1ef Add support for guest bind mounts with LXC
Currently you can configure LXC to bind a host directory to
a guest directory, but not to bind a guest directory to a
guest directory. While the guest container init could do
this itself, allowing it in the libvirt XML means a stricter
SELinux policy can be written
2012-06-25 10:17:56 +01:00
Daniel P. Berrange
76b644c362 Add support for RAM filesystems for LXC
Introduce a new syntax for filesystems to allow use of a RAM
filesystem

   <filesystem type='ram'>
      <source usage='10' units='MiB'/>
      <target dir='/mnt'/>
   </filesystem>

The usage units default to KiB to limit consumption of host memory.

* docs/formatdomain.html.in: Document new syntax
* docs/schemas/domaincommon.rng: Add new attributes
* src/conf/domain_conf.c: Parsing/formatting of RAM filesystems
* src/lxc/lxc_container.c: Mounting of RAM filesystems

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-06-25 10:17:56 +01:00
Gao feng
00828bebda LXC: avoid useless duplicate memory free
when lxcContainerIdentifyCGroups failed, the memory it allocated
has been freed, so we should not free this memory again in
lxcContainerSetupPivortRoot and lxcContainerSetupExtraMounts.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-06-19 16:09:47 +08:00
Gao feng
3477e6b0ab LXC: fix incorrect DEBUG info
print debug info "container support is enabled"
when host support the user or net namespace.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-06-18 10:21:27 -06:00
Gao feng
0896265cf7 LXC: fix memory leak in lxcContainerSetupExtraMounts
kill the "return 0;" code, it will cause memory leak.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-06-18 10:21:05 -06:00
Eric Blake
3c42abe661 build: fix whitespace damage
Introduced in commit 1f8c33b67.

* src/lxc/lxc_container.c (lxcContainerGetSubtree): Avoid TAB.
2012-06-18 10:13:57 -06:00
Gao feng
1f8c33b672 LXC: fix memory leak in lxcContainerGetSubtree
when libvirt_lxc trigger oom error in lxcContainerGetSubtree
we should free the alloced memory for mounts.

so when lxcContainerGetSubtree failed,we should do some
memory cleanup in lxcContainerUnmountSubtree.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-06-18 21:28:21 +08:00
Gao feng
73e2d646fb LXC: fix memory leak in lxcContainerMountFSBlockHelper
we alloc the memory for format in lxcContainerMountDetectFilesystem
but without free it in lxcContainerMountFSBlockHelper.

this patch just call VIR_FREE to free it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-06-18 21:28:13 +08:00
Daniel P. Berrange
e9d8861e58 Always pivot_root event if the new root source is '/'
This reverts

  commit c16b4c43fc
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Fri May 11 15:09:27 2012 +0100

    Avoid LXC pivot root in the root source is still /

This commit broke setup of /dev, because the code which
deals with setting up a private /dev and /dev/pts only
works if you do a pivotroot.

The original intent of avoiding the pivot root was to
try and ensure the new root has a minimumal mount
tree. The better way todo this is to just unmount the
bits we don't want (ie old /proc & /sys subtrees.
So apply the logic from

  commit c529b47a75
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Fri May 11 11:35:28 2012 +0100

    Trim /proc & /sys subtrees before mounting new instances

to the pivot_root codepath as well
2012-06-14 12:02:03 -04:00
Gao feng
e49d792f29 LXC: fix memory leak in lxcContainerMountFSBlockAuto
we forgot to free fslist,just add VIR_FREE(fslist).

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-06-11 14:18:55 +08:00
Gao feng
0cb787bd3c LXC: fix incorrect parameter of mount in lxcContainerMountFSBind
when do remount,the source and target should be the same
values specified in the initial mount() call.

So change fs->dst to src.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-06-11 13:03:48 +08:00
Gao feng
a80bb970fc LXC: Delete unused variable src in lxcContainerMountBasicFS
There is no code use the variable "src" in lxcContainerMountBasicFS.
so delete it and VIR_FREE.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
2012-06-11 11:50:53 +08:00
Daniel P. Berrange
a8c0b2fed0 Remount cgroups controllers after setting up new /sys in LXC
Normal practice is for cgroups controllers to be mounted at
/sys/fs/cgroup. When setting up a container, /sys is mounted
with a new sysfs instance, thus we must re-mount all the
cgroups controllers. The complexity is that we must mount
them in the same layout as the host OS. ie if 'cpu' and 'cpuacct'
were mounted at the same location in the host we must preserve
this in the container. Also if any controllers are co-located
we must setup symlinks from the individual controller name to
the co-located mount-point

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 11:37:40 +01:00
Daniel P. Berrange
c529b47a75 Trim /proc & /sys subtrees before mounting new instances
Both /proc and /sys may have sub-mounts in them from the host
OS. We must explicitly unmount them all before mounting the
new instance over that location. If we don't then /proc/mounts
will show the sub-mounts as existing, even though nothing will
be able to access them, due to the over-mount.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 11:27:29 +01:00
Daniel P. Berrange
c16b4c43fc Avoid LXC pivot root in the root source is still /
If the LXC config has a filesystem

  <filesystem>
     <source dir='/'/>
     <target dir='/'/>
  </filesystem>

then there is no need to go down the pivot root codepath.
We can simply use the existing root as needed.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:47 +01:00
Daniel P. Berrange
e8639920ac Mount fresh instance of sysfs/selinux in LXC
Currently to make sysfs readonly, we remount the existing
instance and then bind it readonly. Unfortunately this means
sysfs is still showing device objects wrt the host OS namespace.
We need it to reflect the container namespace, so we must mount
a completely new instance of it. Do the same for selinuxfs since
there is no benefit to bind mounting & this lets us simplify
the code.

* src/lxc/lxc_container.c: Mount fresh sysfs instance

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:47 +01:00
Daniel Walsh
8dd5794f81 Convert the LXC driver to use the security driver API for mount options
Instead of hardcoding use of SELinux contexts in the LXC driver,
switch over to using the official security driver API.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-05-16 10:05:47 +01:00
Daniel P. Berrange
07cf96ecc7 Make lxcContainerSetStdio the last thing to be called in container startup
Once lxcContainerSetStdio is invoked, logging will not work as
expected in libvirt_lxc. So make sure this is the last thing to
be called, in particular after setting the security process label
2012-05-01 16:05:03 +01:00
Daniel P. Berrange
ec8cae93db Consistent style for usage of sizeof operator
The code is splattered with a mix of

  sizeof foo
  sizeof (foo)
  sizeof(foo)

Standardize on sizeof(foo) and add a syntax check rule to
enforce it

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2012-03-30 11:47:24 +01:00
Daniel P. Berrange
c91cff255f Add support for setting init argv for LXC
Pass argv to the init binary of LXC, using a new <initarg> element.

* docs/formatdomain.html.in: Document <os> usage for containers
* docs/schemas/domaincommon.rng: Add <initarg> element
* src/conf/domain_conf.c, src/conf/domain_conf.h: parsing and
  formatting of <initarg>
* src/lxc/lxc_container.c: Setup LXC argv
* tests/Makefile.am, tests/lxcxml2xmldata/lxc-systemd.xml,
  tests/lxcxml2xmltest.c, tests/testutilslxc.c,
  tests/testutilslxc.h: Test parsing/formatting of LXC related
  XML parts
2012-03-27 15:52:25 +01:00
Daniel P. Berrange
eb8f31c16b Detect location fo selinux mount point
The SELinux mount point moved from /selinux to /sys/fs/selinux
when systemd came along.

* configure.ac: Probe for SELinux mount point
* src/lxc/lxc_container.c: Use SELinux mount point determined
  by configure.ac
2012-03-27 15:52:25 +01:00
Daniel P. Berrange
10a8b1f958 Add support for forcing a private network namespace for LXC guests
If no <interface> elements are included in an LXC guest XML
description, then the LXC guest will just see the host's
network interfaces. It is desirable to be able to hide the
host interfaces, without having to define any guest interfaces.

This patch introduces a new feature flag <privnet/> to allow
forcing of a private network namespace for LXC. In the future
I also anticipate that we will add <privuser/> to force a
private user ID namespace.

* src/conf/domain_conf.c, src/conf/domain_conf.h: Add support
  for <privnet/> feature. Auto-set <privnet> if any <interface>
  devices are defined
* src/lxc/lxc_container.c: Honour request for private network
  namespace
2012-03-15 17:00:39 +00:00
Daniel P. Berrange
6e6aa000c6 Add container_uuid env variable to LXC guests
Systemd has declared that all container virtualization technologies
should set 'container_uuid' to identify themselves.

http://cgit.freedesktop.org/systemd/systemd/commit/?id=09b967eaa51a39dabb7f238927f67bd682466dbc
2012-03-15 11:20:20 +00:00
Martin Kletzander
6ba4b300b0 lxc: Cleaner fix for compilation without SELinux
Just a cleanup of commit 32f881c6c4.
2012-02-29 14:55:32 +01:00
Daniel P. Berrange
d474dbadde Populate /dev/std{in,out,err} symlinks in LXC containers
Some applications expect /dev/std{in,out,err} to exist. Populate
them during container startup as symlinks to /proc/self/fd
2012-02-08 19:50:15 +00:00
Philipp Hahn
99d24ab2e0 virterror.c: Fix several spelling mistakes
compat{a->i}bility
erron{->e}ous
nec{c->}essary.
Either "the" or "a".

Signed-off-by: Philipp Hahn <hahn@univention.de>
2012-02-03 11:32:51 -07:00
Martin Kletzander
32f881c6c4 Fixed connection definition for non-SELinux builds
This patch fixes the access of variable "con" in two files where the
variable was declared only on SELinux builds and thus the build failed
without SELinux. It's a rather nasty fix but helps fix the build
quickly and without any major changes to the code.
2012-02-03 16:13:45 +01:00
Daniel P. Berrange
5df67cdcd3 Set a security context on /dev and /dev/pts mounts
To allow the container to access /dev and /dev/pts when under
sVirt, set an explicit mount option. Also set a max size on
the /dev mount to prevent DOS on memory usage

* src/lxc/lxc_container.c: Set /dev mount context
* src/lxc/lxc_controller.c: Set /dev/pts mount context
2012-02-02 17:45:19 -07:00
Daniel P. Berrange
0f01192e7e Add support for sVirt in the LXC driver
For the sake of backwards compat, LXC guests are *not*
confined by default. This is because it is not practical
to dynamically relabel containers using large filesystem
trees. Applications can create confined containers though,
by giving suitable XML configs

* src/Makefile.am: Link libvirt_lxc to security drivers
* src/lxc/libvirtd_lxc.aug, src/lxc/lxc_conf.h,
  src/lxc/lxc_conf.c, src/lxc/lxc.conf,
  src/lxc/test_libvirtd_lxc.aug: Config file handling for
  security driver
* src/lxc/lxc_driver.c: Wire up security driver functions
* src/lxc/lxc_controller.c: Add a '--security' flag to
  specify which security driver to activate
* src/lxc/lxc_container.c, src/lxc/lxc_container.h: Set
  the process label just before exec'ing init.
2012-02-02 17:44:39 -07:00
Eric Blake
16dc4ade7a lxc: export container=lxc-libvirt for systemd
Systemd detects containers based on whether they have
an environment variable starting with 'container=lxc';
using a longer name fits the expectations, while also
allowing detection of who created the container.

Requested by Lennart Poettering, in response to
https://bugs.freedesktop.org/show_bug.cgi?id=45175

* src/lxc/lxc_container.c (lxcContainerBuildInitCmd): Add another
env-var.
2012-01-25 08:25:37 -07:00
Daniel P. Berrange
c30a78c398 Don't bind mount onto a char device for /dev/ptmx in LXC
The current setup code for LXC is bind mounting /dev/pts/ptmx
on top of a character device /dev/ptmx. This is denied by SELinux
policy and is just wrong. The target of a bind mount should just
be a plain file

* src/lxc/lxc_container.c: Don't bind /dev/pts/ptmx onto
  a char device
2012-01-25 14:11:08 +00:00
Daniel P. Berrange
c53ba61b21 Fix startup of LXC containers with filesystems containing symlinks
Given an LXC guest with a root filesystem path of

  /export/lxc/roots/helloworld/root

During startup, we will pivot the root filesystem to end up
at

  /.oldroot/export/lxc/roots/helloworld/root

We then try to open

  /.oldroot/export/lxc/roots/helloworld/root/dev/pts

Now consider if '/export/lxc' is an absolute symlink pointing
to '/media/lxc'. The kernel will try to open

  /media/lxc/roots/helloworld/root/dev/pts

whereas it should be trying to open

  /.oldroot//media/lxc/roots/helloworld/root/dev/pts

To deal with the fact that the root filesystem can be moved,
we need to resolve symlinks in *any* part of the filesystem
source path.

* src/libvirt_private.syms, src/util/util.c,
  src/util/util.h: Add virFileResolveAllLinks to resolve
  all symlinks in a path
* src/lxc/lxc_container.c: Resolve all symlinks in filesystem
  paths during startup
2012-01-18 13:34:42 +00:00
Daniel P. Berrange
428cffb1e7 Move LXC veth.c code into shared utility APIs
Move the virNetDevSetName and virNetDevSetNamespace APIs out
of LXC's veth.c and into virnetdev.c.

Move the remaining content of the file to src/util/virnetdevveth.c

* src/lxc/veth.c: Rename to src/util/virnetdevveth.c
* src/lxc/veth.h: Rename to src/util/virnetdevveth.h
* src/util/virnetdev.c, src/util/virnetdev.h: Add
  virNetDevSetName and virNetDevSetNamespace
* src/lxc/lxc_container.c, src/lxc/lxc_controller.c,
  src/lxc/lxc_driver.c: Update include paths
2011-11-15 10:28:02 +00:00
Daniel P. Berrange
29b242ad80 Rename the LXC veth management APIs and delete duplicated APIs
The src/lxc/veth.c file contains APIs for managing veth devices,
but some of the APIs duplicate stuff from src/util/virnetdev.h.
Delete thed duplicate APIs and rename the remaining ones to
follow virNetDevVethXXXX

* src/lxc/veth.c, src/lxc/veth.h: Rename APIs & delete duplicates
* src/lxc/lxc_container.c, src/lxc/lxc_controller.c,
  src/lxc/lxc_driver.c: Update for API renaming
2011-11-15 10:28:02 +00:00
Eric Blake
e55ec69de6 build: drop useless dirent.h includes
* .gnulib: Update to latest, for improved syntax-check.
* src/lxc/lxc_container.c (includes): Drop unused include.
* src/network/bridge_driver.c: Likewise.
* src/node_device/node_device_linux_sysfs.c: Likewise.
* src/openvz/openvz_driver.c: Likewise.
* src/qemu/qemu_conf.c: Likewise.
* src/storage/storage_backend_iscsi.c: Likewise.
* src/storage/storage_backend_mpath.c: Likewise.
* src/uml/uml_conf.c: Likewise.
* src/uml/uml_driver.c: Likewise.
2011-11-11 14:12:37 -07:00
Daniel P. Berrange
0f31f7b794 Add support for multiple consoles in LXC
Currently the LXC controller only supports setup of a single
text console. This is wired up to the container init's stdio,
as well as /dev/console and /dev/tty1. Extending support for
multiple consoles, means wiring up additional PTYs to /dev/tty2,
/dev/tty3, etc, etc. The LXC controller is passed multiple open
file handles, one for each console requested.

* src/lxc/lxc_container.c, src/lxc/lxc_container.h: Wire up
  all the /dev/ttyN links required to symlink to /dev/pts/NN
* src/lxc/lxc_container.h: Open more container side /dev/pts/NN
  devices, and adapt event loop to handle I/O from all consoles
* src/lxc/lxc_driver.c: Setup multiple host side PTYs
2011-11-03 12:01:13 +00:00
Daniel P. Berrange
26798492e3 Add support for probing filesystem with libblkid
The LXC code for mounting container filesystems from block devices
tries all filesystems in /etc/filesystems and possibly those in
/proc/filesystems. The regular mount binary, however, first tries
using libblkid to detect the format. Add support for doing the same
in libvirt, since Fedora's /etc/filesystems is missing many formats,
most notably ext4 which is the default filesystem Fedora uses!

* src/Makefile.am: Link libvirt_lxc to libblkid
* src/lxc/lxc_container.c: Probe filesystem format with libblkid
2011-11-01 18:40:37 +00:00
Daniel P. Berrange
6828535669 Fix error message when failing to detect filesystem
If we looped through /etc/filesystems trying to mount with each
type and failed all options, we forget to actually raise an
error message.

* src/lxc/lxc_container.c: Raise error if unable to detect
  the filesystems. Also fix existing error message
2011-11-01 18:40:37 +00:00
Daniel P. Berrange
878cc33a6a Workaround for broken kernel autofs mounts
The kernel automounter is mostly broken wrt to containers. Most
notably if you start a new filesystem namespace and then attempt
to unmount any autofs filesystem, it will typically fail with a
weird error message like

  Failed to unmount '/.oldroot/sys/kernel/security':Too many levels of symbolic links

Attempting to detach the autofs mount using umount2(MNT_DETACH)
will also fail with the same error. Therefore if we get any error on
unmount()ing a filesystem from the old root FS when starting a
container, we must immediately break out and detach the entire
old root filesystem (ignoring any mounts below it).

This has the effect of making the old root filesystem inaccessible
to anything inside the container, but at the cost that the mounts
live on in the kernel until the container exits. Given that SystemD
uses autofs by default, we need LXC to be robust this scenario and
thus this tradeoff is worthwhile.

* src/lxc/lxc_container.c: Detach root filesystem if any umount
  operation fails.
2011-11-01 18:40:37 +00:00
Daniel P. Berrange
a02f57faa9 Correctly handle '*' in /etc/filesystems
The /etc/filesystems file can contain a '*' on the last line to
indicate that /proc/filessystems should be tried next. We have
a check that this '*' only occurs on the last line. Unfortunately
when we then start reading /proc/filesystems, we mistakenly think
we've seen '*' in /proc/filesystems and fail

* src/lxc/lxc_container.c: Skip '*' validation when we're reading
  /proc/filesystems
2011-11-01 18:40:37 +00:00
Daniel P. Berrange
065ecf5162 Ensure errno is valid when returning from lxcContainerWaitForContinue
Only some of the return paths of lxcContainerWaitForContinue will
have set errno. In other paths we need to set it manually to avoid
the caller getting a random stale errno value

* src/lxc/lxc_container.c: Set errno in lxcContainerWaitForContinue
2011-11-01 18:40:37 +00:00
Eric Blake
69d044c034 waitpid: improve safety
Based on a report by Coverity.  waitpid() can leak resources if it
fails with EINTR, so it should never be used without checking return
status.  But we already have a helper function that does that, so
use it in more places.

* src/lxc/lxc_container.c (lxcContainerAvailable): Use safer
virWaitPid.
* daemon/libvirtd.c (daemonForkIntoBackground): Likewise.
* tests/testutils.c (virtTestCaptureProgramOutput, virtTestMain):
Likewise.
* src/libvirt.c (virConnectAuthGainPolkit): Simplify with virCommand.
2011-10-24 15:42:52 -06:00
Eric Blake
dbbe16c26e maint: typo fixes
I noticed a couple typos in recent commits, and fixed the remaining
instances of them.

* docs/internals/command.html.in: Fix spelling errors.
* include/libvirt/libvirt.h.in (virConnectDomainEventCallback):
Likewise.
* python/libvirt-override.py (virEventAddHandle): Likewise.
* src/lxc/lxc_container.c (lxcContainerChild): Likewise.
* src/util/hash.c (virHashCreateFull): Likewise.
* src/storage/storage_backend_logical.c
(virStorageBackendLogicalMakeVol): Likewise.
* src/esx/esx_driver.c (esxFormatVMXFileName): Likewise.
* src/vbox/vbox_tmpl.c (vboxIIDIsEqual_v3_x): Likewise.
2011-10-10 14:02:06 -06:00
Daniel P. Berrange
652f887144 Allow passing of command line args to LXC container
When booting a virtual machine with a kernel/initrd it is possible
to pass command line arguments using the <cmdline>...args...</cmdline>
element in the guest XML. These appear to the kernel / init process
in /proc/cmdline.

When booting a container we do not have a custom /proc/cmdline,
but we can easily set an environment variable for it. Ideally
we could pass individual arguments to the init process as a
regular set of 'char *argv[]' parameters, but that would involve
libvirt parsing the <cmdline> XML text. This can easily be added
later, even if we add the env variable now

* docs/drvlxc.html.in: Document env variables passed to LXC
* src/conf/domain_conf.c: Add <cmdline> to be parsed for
  guests of type='exe'
* src/lxc/lxc_container.c: Set LIBVIRT_LXC_CMDLINE env var
2011-10-04 14:15:09 +01:00
Michal Privoznik
45ad3d6962 debug: Annotate some variables as unused
as they are not used with debugging turned off.
2011-09-27 10:16:46 +02:00
Serge Hallyn
c1665ba872 Create ptmx as a device
Hi,

I'm seeing an issue with udev and libvirt-lxc.  Libvirt-lxc creates
/dev/ptmx as a symlink to /dev/pts/ptmx.  When udev starts up, it
checks the device type, sees ptmx is 'not right', and replaces it
with a 'proper' ptmx.

In lxc, /dev/ptmx is bind-mounted from /dev/pts/ptmx instead of being
symlinked, so udev sees the right device type and leaves it alone.

A patch like the following seems to work for me.  Would there be
any objections to this?

>From 4c5035de52de7e06a0de9c5d0bab8c87a806cba7 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@domU-12-31-39-14-F0-B3.compute-1.internal>
Date: Wed, 31 Aug 2011 18:15:54 +0000
Subject: [PATCH 1/1] make ptmx a bind mount rather than symlink

udev on some systems checks the device type of /dev/ptmx, and replaces it if
not as expected.  The symlink created by libvirt-lxc therefore gets replaced.
By creating it as a bind mount, the device type is correct and udev leaves it
alone.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
2011-09-01 20:11:50 -06:00
Eric Blake
3a52b864dd maint: fix comment typos
* src/qemu/qemu_driver.c (qemuDomainSaveInternal): Fix typo.
* src/conf/domain_event.c (virDomainEventDispatchMatchCallback):
Likewise.
* daemon/libvirtd.c (daemonRunStateInit): Likewise.
* src/lxc/lxc_container.c (lxcContainerChildMountSort): Likewise.
* src/util/virterror.c (virCopyError, virRaiseErrorFull): Likewise.
* src/xenxs/xen_sxpr.c (xenParseSxprSound): Likewise.
2011-08-23 11:31:28 -06:00
Daniel P. Berrange
5f5c6fde00 Allow use of file images for LXC container filesystems
A previous commit gave the LXC driver the ability to mount
block devices for the container filesystem. Through use of
the loopback device functionality, we can build on this to
support use of plain file images for LXC filesytems.

By setting the LO_FLAGS_AUTOCLEAR flag we can ensure that
the loop device automatically disappears when the container
dies / shuts down

* src/lxc/lxc_container.c: Raise error if we see a file
  based filesystem, since it should have been turned into
  a loopback device already
* src/lxc/lxc_controller.c: Rewrite any filesystems of
  type=file, into type=block, by binding the file image
  to a free loop device
2011-08-08 11:38:09 +01:00
Daniel P. Berrange
77791dc0e1 Allow use of block devices for guest filesystem
Currently the LXC driver can only populate filesystems from
host filesystems, using bind mounts. This patch allows host
block devices to be mounted. It autodetects the filesystem
format at mount time, and adds the block device to the cgroups
ACL. Example usage is

    <filesystem type='block' accessmode='passthrough'>
      <source dev='/dev/sda1'/>
      <target dir='/home'/>
    </filesystem>

* src/lxc/lxc_container.c: Mount block device filesystems
* src/lxc/lxc_controller.c: Add block device filesystems
  to cgroups ACL
2011-08-08 11:38:05 +01:00
Daniel P. Berrange
b6bd2d3466 Don't mount /dev for application containers
An application container shouldn't get a private /dev. Fix
the regression from 6d37888e6a

* src/lxc/lxc_container.c: Don't mount /dev for app containers
2011-08-08 11:24:35 +01:00
Daniel P. Berrange
b3ad9b9b80 Honour filesystem readonly flag & make special FS readonly
A container should not be allowed to modify stuff in /sys
or /proc/sys so make them readonly. Make /selinux readonly
so that containers think that selinux is disabled.

Honour the readonly flag when mounting container filesystems
from the guest XML config

* src/lxc/lxc_container.c: Support readonly mounts
2011-07-22 15:31:11 +01:00
Daniel P. Berrange
6d37888e6a Refactor mounting of special filesystems
Even in non-virtual root filesystem mode we should be mounting
more than just a new /proc. Refactor lxcContainerMountBasicFS
so that it does everything except for /dev and /dev/pts moving
that into lxcContainerMountDevFS. Pass in a source prefix
to lxcContainerMountBasicFS() so it can be used in both shared
root and private root modes.

* src/lxc/lxc_container.c: Unify mounting code for special
  filesystems
2011-07-22 15:31:11 +01:00
Daniel P. Berrange
66a00e61a4 Pull code for doing a bind mount into separate method
The bind mount setup is about to get more complicated.
To avoid having to deal with several copies, pull it
out into a separate lxcContainerMountFSBind method.

Also pull out the iteration over container filesystems,
so that it will be easier to drop in support for non-bind
mount filesystems

* src/lxc/lxc_container.c: Pull bind mount code out into
  lxcContainerMountFSBind
2011-07-22 15:31:07 +01:00
Eric Blake
8e22e08935 build: rename files.h to virfile.h
In preparation for a future patch adding new virFile APIs.

* src/util/files.h, src/util/files.c: Move...
* src/util/virfile.h, src/util/virfile.c: ...here, and rename
functions to virFile prefix.  Macro names are intentionally
left alone.
* *.c: All '#include "files.h"' uses changed.
* src/Makefile.am (UTIL_SOURCES): Reflect rename.
* cfg.mk (exclude_file_name_regexp--sc_prohibit_close): Likewise.
* src/libvirt_private.syms: Likewise.
* docs/hacking.html.in: Likewise.
* HACKING: Regenerate.
2011-07-21 10:34:51 -06:00
Eric Blake
5037cea55e lxc: reject unknown flags
* src/lxc/lxc_driver.c (lxcOpen, lxcDomainSetMemoryParameters)
(lxcDomainGetMemoryParameters): Reject unknown flags.
* src/lxc/lxc_container.c (lxcContainerStart): Rename flags to
cflags to reflect that it is not tied to libvirt.
2011-07-13 14:42:05 -06:00
Matthias Bolte
e123e1ee6b Fix return value semantic of virFileMakePath
Some callers expected virFileMakePath to set errno, some expected
it to return an errno value. Unify this to return 0 on success and
-1 on error. Set errno to report detailed error information.

Also optimize virFileMakePath if stat fails with an errno different
from ENOENT.
2011-07-06 09:27:06 +02:00
Cole Robinson
f9e8d6a065 lxc: Ensure container <init> actually exists
Since we can't really get useful error reporting from virCommandExec since
it needs to be the last thing we do.
2011-06-07 14:38:54 -04:00
Cole Robinson
a7e2dd1c32 lxc: controller: Improve container error reporting
Add a handshake with the cloned container process to try and detect
if it fails to start.
2011-06-07 14:38:54 -04:00
Cole Robinson
965a957ccc lxc: Improve guest startup error reporting
Add a simple handshake with the lxc_controller process so we can detect
process startup failures. We do this by adding a new --handshake cli arg
to lxc_controller for passing a file descriptor. If the process fails to
launch, we scrape all output from the logfile and report it to the user.
2011-06-07 14:38:39 -04:00
Cole Robinson
6973594ca8 lxc: Don't report error in Wait/SendContinue
We will reuse these shortly, and each use should have a different error
message.
2011-06-07 14:32:03 -04:00
Cole Robinson
eee1763c8c lxc: Drop container stdio as late as possible
Makes it more likely we get useful error output in the logs
2011-06-07 14:32:03 -04:00
Lai Jiangshan
b65f37a4a1 libvirt,logging: cleanup VIR_XXX0()
These VIR_XXXX0 APIs make us confused, use the non-0-suffix APIs instead.

How do these coversions works? The magic is using the gcc extension of ##.
When __VA_ARGS__ is empty, "##" will swallow the "," in "fmt," to
avoid compile error.

example: origin				after CPP
	high_level_api("%d", a_int)	low_level_api("%d", a_int)
	high_level_api("a  string")	low_level_api("a  string")

About 400 conversions.

8 special conversions:
VIR_XXXX0("") -> VIR_XXXX("msg") (avoid empty format) 2 conversions
VIR_XXXX0(string_literal_with_%) -> VIR_XXXX(%->%%) 0 conversions
VIR_XXXX0(non_string_literal) -> VIR_XXXX("%s", non_string_literal)
  (for security) 6 conversions

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2011-05-11 12:41:14 -06:00