Move the virNetDevSetName and virNetDevSetNamespace APIs out
of LXC's veth.c and into virnetdev.c.
Move the remaining content of the file to src/util/virnetdevveth.c
* src/lxc/veth.c: Rename to src/util/virnetdevveth.c
* src/lxc/veth.h: Rename to src/util/virnetdevveth.h
* src/util/virnetdev.c, src/util/virnetdev.h: Add
virNetDevSetName and virNetDevSetNamespace
* src/lxc/lxc_container.c, src/lxc/lxc_controller.c,
src/lxc/lxc_driver.c: Update include paths
The src/lxc/veth.c file contains APIs for managing veth devices,
but some of the APIs duplicate stuff from src/util/virnetdev.h.
Delete thed duplicate APIs and rename the remaining ones to
follow virNetDevVethXXXX
* src/lxc/veth.c, src/lxc/veth.h: Rename APIs & delete duplicates
* src/lxc/lxc_container.c, src/lxc/lxc_controller.c,
src/lxc/lxc_driver.c: Update for API renaming
Currently the LXC controller only supports setup of a single
text console. This is wired up to the container init's stdio,
as well as /dev/console and /dev/tty1. Extending support for
multiple consoles, means wiring up additional PTYs to /dev/tty2,
/dev/tty3, etc, etc. The LXC controller is passed multiple open
file handles, one for each console requested.
* src/lxc/lxc_container.c, src/lxc/lxc_container.h: Wire up
all the /dev/ttyN links required to symlink to /dev/pts/NN
* src/lxc/lxc_container.h: Open more container side /dev/pts/NN
devices, and adapt event loop to handle I/O from all consoles
* src/lxc/lxc_driver.c: Setup multiple host side PTYs
The LXC code for mounting container filesystems from block devices
tries all filesystems in /etc/filesystems and possibly those in
/proc/filesystems. The regular mount binary, however, first tries
using libblkid to detect the format. Add support for doing the same
in libvirt, since Fedora's /etc/filesystems is missing many formats,
most notably ext4 which is the default filesystem Fedora uses!
* src/Makefile.am: Link libvirt_lxc to libblkid
* src/lxc/lxc_container.c: Probe filesystem format with libblkid
If we looped through /etc/filesystems trying to mount with each
type and failed all options, we forget to actually raise an
error message.
* src/lxc/lxc_container.c: Raise error if unable to detect
the filesystems. Also fix existing error message
The kernel automounter is mostly broken wrt to containers. Most
notably if you start a new filesystem namespace and then attempt
to unmount any autofs filesystem, it will typically fail with a
weird error message like
Failed to unmount '/.oldroot/sys/kernel/security':Too many levels of symbolic links
Attempting to detach the autofs mount using umount2(MNT_DETACH)
will also fail with the same error. Therefore if we get any error on
unmount()ing a filesystem from the old root FS when starting a
container, we must immediately break out and detach the entire
old root filesystem (ignoring any mounts below it).
This has the effect of making the old root filesystem inaccessible
to anything inside the container, but at the cost that the mounts
live on in the kernel until the container exits. Given that SystemD
uses autofs by default, we need LXC to be robust this scenario and
thus this tradeoff is worthwhile.
* src/lxc/lxc_container.c: Detach root filesystem if any umount
operation fails.
The /etc/filesystems file can contain a '*' on the last line to
indicate that /proc/filessystems should be tried next. We have
a check that this '*' only occurs on the last line. Unfortunately
when we then start reading /proc/filesystems, we mistakenly think
we've seen '*' in /proc/filesystems and fail
* src/lxc/lxc_container.c: Skip '*' validation when we're reading
/proc/filesystems
Only some of the return paths of lxcContainerWaitForContinue will
have set errno. In other paths we need to set it manually to avoid
the caller getting a random stale errno value
* src/lxc/lxc_container.c: Set errno in lxcContainerWaitForContinue
Based on a report by Coverity. waitpid() can leak resources if it
fails with EINTR, so it should never be used without checking return
status. But we already have a helper function that does that, so
use it in more places.
* src/lxc/lxc_container.c (lxcContainerAvailable): Use safer
virWaitPid.
* daemon/libvirtd.c (daemonForkIntoBackground): Likewise.
* tests/testutils.c (virtTestCaptureProgramOutput, virtTestMain):
Likewise.
* src/libvirt.c (virConnectAuthGainPolkit): Simplify with virCommand.
When booting a virtual machine with a kernel/initrd it is possible
to pass command line arguments using the <cmdline>...args...</cmdline>
element in the guest XML. These appear to the kernel / init process
in /proc/cmdline.
When booting a container we do not have a custom /proc/cmdline,
but we can easily set an environment variable for it. Ideally
we could pass individual arguments to the init process as a
regular set of 'char *argv[]' parameters, but that would involve
libvirt parsing the <cmdline> XML text. This can easily be added
later, even if we add the env variable now
* docs/drvlxc.html.in: Document env variables passed to LXC
* src/conf/domain_conf.c: Add <cmdline> to be parsed for
guests of type='exe'
* src/lxc/lxc_container.c: Set LIBVIRT_LXC_CMDLINE env var
Hi,
I'm seeing an issue with udev and libvirt-lxc. Libvirt-lxc creates
/dev/ptmx as a symlink to /dev/pts/ptmx. When udev starts up, it
checks the device type, sees ptmx is 'not right', and replaces it
with a 'proper' ptmx.
In lxc, /dev/ptmx is bind-mounted from /dev/pts/ptmx instead of being
symlinked, so udev sees the right device type and leaves it alone.
A patch like the following seems to work for me. Would there be
any objections to this?
>From 4c5035de52de7e06a0de9c5d0bab8c87a806cba7 Mon Sep 17 00:00:00 2001
From: Ubuntu <ubuntu@domU-12-31-39-14-F0-B3.compute-1.internal>
Date: Wed, 31 Aug 2011 18:15:54 +0000
Subject: [PATCH 1/1] make ptmx a bind mount rather than symlink
udev on some systems checks the device type of /dev/ptmx, and replaces it if
not as expected. The symlink created by libvirt-lxc therefore gets replaced.
By creating it as a bind mount, the device type is correct and udev leaves it
alone.
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
A previous commit gave the LXC driver the ability to mount
block devices for the container filesystem. Through use of
the loopback device functionality, we can build on this to
support use of plain file images for LXC filesytems.
By setting the LO_FLAGS_AUTOCLEAR flag we can ensure that
the loop device automatically disappears when the container
dies / shuts down
* src/lxc/lxc_container.c: Raise error if we see a file
based filesystem, since it should have been turned into
a loopback device already
* src/lxc/lxc_controller.c: Rewrite any filesystems of
type=file, into type=block, by binding the file image
to a free loop device
Currently the LXC driver can only populate filesystems from
host filesystems, using bind mounts. This patch allows host
block devices to be mounted. It autodetects the filesystem
format at mount time, and adds the block device to the cgroups
ACL. Example usage is
<filesystem type='block' accessmode='passthrough'>
<source dev='/dev/sda1'/>
<target dir='/home'/>
</filesystem>
* src/lxc/lxc_container.c: Mount block device filesystems
* src/lxc/lxc_controller.c: Add block device filesystems
to cgroups ACL
An application container shouldn't get a private /dev. Fix
the regression from 6d37888e6a
* src/lxc/lxc_container.c: Don't mount /dev for app containers
A container should not be allowed to modify stuff in /sys
or /proc/sys so make them readonly. Make /selinux readonly
so that containers think that selinux is disabled.
Honour the readonly flag when mounting container filesystems
from the guest XML config
* src/lxc/lxc_container.c: Support readonly mounts
Even in non-virtual root filesystem mode we should be mounting
more than just a new /proc. Refactor lxcContainerMountBasicFS
so that it does everything except for /dev and /dev/pts moving
that into lxcContainerMountDevFS. Pass in a source prefix
to lxcContainerMountBasicFS() so it can be used in both shared
root and private root modes.
* src/lxc/lxc_container.c: Unify mounting code for special
filesystems
The bind mount setup is about to get more complicated.
To avoid having to deal with several copies, pull it
out into a separate lxcContainerMountFSBind method.
Also pull out the iteration over container filesystems,
so that it will be easier to drop in support for non-bind
mount filesystems
* src/lxc/lxc_container.c: Pull bind mount code out into
lxcContainerMountFSBind
* src/lxc/lxc_driver.c (lxcOpen, lxcDomainSetMemoryParameters)
(lxcDomainGetMemoryParameters): Reject unknown flags.
* src/lxc/lxc_container.c (lxcContainerStart): Rename flags to
cflags to reflect that it is not tied to libvirt.
Some callers expected virFileMakePath to set errno, some expected
it to return an errno value. Unify this to return 0 on success and
-1 on error. Set errno to report detailed error information.
Also optimize virFileMakePath if stat fails with an errno different
from ENOENT.
Add a simple handshake with the lxc_controller process so we can detect
process startup failures. We do this by adding a new --handshake cli arg
to lxc_controller for passing a file descriptor. If the process fails to
launch, we scrape all output from the logfile and report it to the user.
These VIR_XXXX0 APIs make us confused, use the non-0-suffix APIs instead.
How do these coversions works? The magic is using the gcc extension of ##.
When __VA_ARGS__ is empty, "##" will swallow the "," in "fmt," to
avoid compile error.
example: origin after CPP
high_level_api("%d", a_int) low_level_api("%d", a_int)
high_level_api("a string") low_level_api("a string")
About 400 conversions.
8 special conversions:
VIR_XXXX0("") -> VIR_XXXX("msg") (avoid empty format) 2 conversions
VIR_XXXX0(string_literal_with_%) -> VIR_XXXX(%->%%) 0 conversions
VIR_XXXX0(non_string_literal) -> VIR_XXXX("%s", non_string_literal)
(for security) 6 conversions
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Using the 'personality(2)' system call, we can make a container
on an x86_64 host appear to be i686. Likewise for most other
Linux 64bit arches.
* src/lxc/lxc_conf.c: Fill in 32bit capabilities for x86_64 hosts
* src/lxc/lxc_container.h, src/lxc/lxc_container.c: Add API to
check if an arch has a 32bit alternative
* src/lxc/lxc_controller.c: Set the process personality when
starting guest
When spawning 'init' in the container, set
LIBVIRT_LXC_UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
LIBVIRT_LXC_NAME=YYYYYYYYYYYY
to allow guest software to detect & identify that they
are in a container
* src/lxc/lxc_container.c: Set LIBVIRT_LXC_UUID and
LIBVIRT_LXC_NAME env vars
Done mechanically with:
$ git grep -l '\bDEBUG0\? *(' | xargs -L1 sed -i 's/\bDEBUG0\? *(/VIR_&/'
followed by manual deletion of qemudDebug in daemon/libvirtd.c, along
with a single 'make syntax-check' fallout in the same file, and the
actual deletion in src/util/logging.h.
* src/util/logging.h (DEBUG, DEBUG0): Delete.
* daemon/libvirtd.h (qemudDebug): Likewise.
* global: Change remaining clients over to VIR_DEBUG counterpart.
Until now, user namespaces have not done much, but (for that
reason) have been innocuous to glob in with other CLONE_
flags. Upcoming userns development, however, will make tasks
cloned with CLONE_NEWUSER far more restricted. In particular,
for some time they will be unable to access files with anything
other than the world access perms.
This patch assumes that noone really needs the user namespaces
to be enabled. If that is wrong, then we can try a more
baroque patch where we create a file owned by a test userid with
700 perms and, if we can't access it after setuid'ing to that
userid, then return 0. Otherwise, assume we are using an
older, 'harmless' user namespace implementation.
Comments appreciated. Is it ok to do this?
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
The /dev/console device inside the container must NOT map
to the real /dev/console device node, since this allows the
container control over the current host console. A fun side
effect of this is that starting a container containing a
real Fedora OS will kill off your X server.
Remove the /dev/console node, and replace it with a symlink
to the primary console TTY
* src/lxc/lxc_container.c: Replace /dev/console with a
symlink to /dev/pty/0
* src/lxc/lxc_controller.c: Remove /dev/console from cgroups
ACL
Using automated replacement with sed and editing I have now replaced all
occurrences of close() with VIR_(FORCE_)CLOSE() except for one, of
course. Some replacements were straight forward, others I needed to pay
attention. I hope I payed attention in all the right places... Please
have a look. This should have at least solved one more double-close
error.
Previously, the functions in src/lxc/veth.c could sometimes return
positive values on failure rather than -1. This made accurate error
reporting difficult, and led to one failure to catch an error in a
calling function.
This patch makes all the functions in veth.c consistently return 0 on
success, and -1 on failure. It also fixes up the callers to the veth.c
functions where necessary.
Note that this patch may be related to the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=607496.
It will not fix the bug, but should unveil what happens.
* po/POTFILES.in - add veth.c, which previously had no translatable strings
* src/lxc/lxc_controller.c
* src/lxc/lxc_container.c
* src/lxc/lxc_driver.c - fixup callers to veth.c, and remove error logs,
as they are now done in veth.c
* src/lxc/veth.c - make all functions consistently return -1 on error.
* src/lxc/veth.h - use ATTRIBUTE_NONNULL to protect against NULL args.
The function is expected to return negative value on failure,
however, it returns positive value when either setInterfaceName
or vethInterfaceUpOrDown fails. Because the function returns
the return value of either as is, however, the two functions
may return positive value on failure.
The patch fixes the defects and add error messages.
Approximately 60 messages were marked. Since these diagnostics are
intended solely for developers and maintainers, encouraging translation
is deemed to be counterproductive:
http://thread.gmane.org/gmane.comp.emulators.libvirt/25050/focus=25052
Run this command:
git grep -l VIR_WARN|xargs perl -pi -e \
's/(VIR_WARN0?)\s*\(_\((".*?")\)/$1($2/'
Upstart crashes & burns in a heap if $TERM environment variable
is missing. Presumably the kernel always sets this when booting
init on a real machine, so libvirt should set it for containers
too.
To make a typical inittab / mingetty setup happier, we need to
symlink the primary console /dev/pts/0 to /dev/tty1.
Improve logging in certain scenarios to make troubleshooting
easier
* src/lxc/lxc_container.c: Create /dev/tty1 and set $TERM
Two files were using functions from <sys/stat.h> but not including
in. Most of the time they got this automatically via another header,
but certain build flag combinations can reveal the problem
* src/lxc/lxc_container.c, src/node_device/node_device_linux_sysfs.c:
Add <sys/stat.h>