Since these tty devices will be used by container,
the owner of them should be the root user of container.
This patch also adds a new function virLXCControllerChown,
we can use this general function to change the owner of
files.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
user namespace doesn't allow to create devices in
uninit userns. We should create devices on host side.
We first mount tmpfs on dev directroy under state dir
of container. then create devices under this dev dir.
Finally in container, mount the dev directroy created
on host to the /dev/ directroy of container.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
This patch introduces new helper function
virLXCControllerSetupUserns, in this function,
we set the files uid_map and gid_map of the init
task of container.
lxcContainerSetID is used for creating cred for
tasks running in container. Since after setuid/setgid,
we may be a new user. This patch calls lxcContainerSetUserns
at first to make sure the new created files belong to
right user.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
After commit c131525bec
"Auto-add a root <filesystem> element to LXC containers on startup"
for libvirt lxc, root must be existent.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
The LXC driver can already configure <disk> or <filesystem>
devices to use the loop device. This extends it to also allow
for use of the NBD device, to support non-raw formats.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The current code for setting up loop devices to LXC disks first
does a switch() based on the disk format, then looks at the
disk driver name. Reverse this so it first looks at the driver
name, and then the disk format. This is more useful since the
list of supported disk formats depends on what driver is used.
The code for setting loop devices for LXC fs entries also needs
to have the same logic added, now the XML schema supports this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
http://www.uhv.edu/ac/newsletters/writing/grammartip2009.07.01.htm
(and several other sites) give hints that 'onto' is best used if
you can also add 'up' just before it and still make sense. In many
cases in the code base, we really want the two-word form, or even
a simplification to just 'on' or 'to'.
* docs/hacking.html.in: Use correct 'on to'.
* python/libvirt-override.c: Likewise.
* src/lxc/lxc_controller.c: Likewise.
* src/util/virpci.c: Likewise.
* daemon/THREADS.txt: Use simpler 'on'.
* docs/formatdomain.html.in: Better usage.
* docs/internals/rpc.html.in: Likewise.
* src/conf/domain_event.c: Likewise.
* src/rpc/virnetclient.c: Likewise.
* tests/qemumonitortestutils.c: Likewise.
* HACKING: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Instead of calling virCgroupForDomain every time we need
the virCgrouPtr instance, just do it once at Vm startup
and cache a reference to the object in virLXCDomainObjPrivatePtr
until shutdown of the VM. Removing the virCgroupPtr from
the LXC driver state also means we don't have stale mount
info, if someone mounts the cgroups filesystem after libvirtd
has been started
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This allows a container-type domain to have exclusive access to one of
the host's NICs.
Wire <hostdev caps=net> with the lxc_controller - when moving the newly
created veth devices into a new namespace, also look for any hostdev
devices that should be moved. Note: once the container domain has been
destroyed, there is no code that moves the interfaces back to the
original namespace. This does happen, though, probably due to default
cleanup on namespace destruction.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
This patch is the result of running:
for i in $(git ls-files | grep -v html | grep -v \.po$ ); do
sed -i -e "s/virDomainXMLConf/virDomainXMLOption/g" -e "s/xmlconf/xmlopt/g" $i
done
and a few manual tweaks.
Early on kernel support for private devpts was not widespread,
so we had compatibiltiy codepaths. Such old kernels are not
seriously used for LXC these days, so the compat code can go
away
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the lxc controller sets up the devpts instance on
$rootfsdef->src, but this only works if $rootfsdef is using
type=mount. To support type=block or type=file for the root
filesystem, we must use /var/lib/libvirt/lxc/$NAME.devpts
for the temporary devpts mount in the controller
The 'nodeset' variable was never initialized, causing a later
VIR_FREE(nodeset) to free uninitialized memory.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Intend to reduce the redundant code,use virNumaSetupMemoryPolicy
to replace virLXCControllerSetupNUMAPolicy and
qemuProcessInitNumaMemoryPolicy.
This patch also moves the numa related codes to the
file virnuma.c and virnuma.h
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Allow lxc using the advisory nodeset from querying numad,
this means if user doesn't specify the numa nodes that
the lxc domain should assign to, libvirt will automatically
bind the lxc domain to the advisory nodeset which queried from
numad.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
The LXC controller is closing loop devices as soon as the
container has started. This is fine if the loop device
was setup as a mounted filesystem, but if we're just passing
through the loop device as a disk, nothing else is keeping
it open. Thus we must keep the loop device FDs open for as
long the libvirt_lxc process is running.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the LXC controller creates the cgroup, configures the
resources and adds the task all in one go. This is not sufficiently
flexible for the forthcoming NBD integration. We need to make sure
the NBD process gets into the right cgroup immediately, but we can
not have limits (in particular the device ACL) applied at the point
where we start qemu-nbd. So create a virLXCCgroupCreate method
which creates the cgroup and adds the current task to be called
early, and leave virLXCCgroupSetup to only do resource config.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The naming used in the RPC protocols for the LXC monitor and
lock daemon confused the script used to generate systemtap
helper functions. Rename the LXC monitor protocol symbols to
reduce confusion. Adapt the gensystemtap.pl script to cope
with the LXC monitor / lock daemon naming conversions.
This has no functional impact on RPC wire protocol, since
names are only used in the C layer
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCaps structure gathered a ton of irrelevant data over time that.
The original reason is that it was propagated to the XML parser
functions.
This patch aims to create a new data structure virDomainXMLConf that
will contain immutable data that are used by the XML parser. This will
allow two things we need:
1) Get rid of the stuff from virCaps
2) Allow us to add callbacks to check and add driver specific stuff
after domain XML is parsed.
This first attempt removes pointers to private data allocation functions
to this new structure and update all callers and function that require
them.
When setting up disks with loop devices for LXC, one of the
switch cases was missing a 'break' causing it to fallthrough
to an error condition.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
By using a loopback device, disks backed by plain files can
be made available to LXC containers. We make no attempt to
auto-detect format if <driver type="raw"/> is not set,
instead we unconditionally treat that as meaning raw. This
is to avoid the security issues inherent with format
auto-detection
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently we rely on a VIR_ERROR message being logged by the
virRaiseError function to report LXC startup errors. This gives
the right message, but is rather ugly and can be truncated
if lots of log messages are written. Change the LXC controller
to explicitly print any virErrorPtr message to stderr. Then
change the driver to skip over anything that looks like a log
message.
The result is that this
error: Failed to start domain busy
error: internal error guest failed to start: 2013-03-04 19:46:42.846+0000: 1734: info : libvirt version: 1.0.2
2013-03-04 19:46:42.846+0000: 1734: error : virFileLoopDeviceAssociate:600 : Unable to open /root/disk.raw: No such file or directory
changes to
error: Failed to start domain busy
error: internal error guest failed to start: Unable to open /root/disk.raw: No such file or directory
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To enable locking to be introduced to the security manager
objects later, turn virSecurityManager into a virObjectLockable
class
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To enable virCapabilities instances to be reference counted,
turn it into a virObject. All cases of virCapabilitiesFree
turn into virObjectUnref
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To avoid confusion between the LXC driver <-> controller
monitor RPC protocol and the libvirt-lxc.so <-> libvirtd public
RPC protocol, rename the former to lxc_monitor_protocol.x
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The code for setting up a private /dev/pts for the containers
is also responsible for making the LXC controller have a
private mount namespace. Unfortunately the /dev/pts code is
not run if launching a container without a custom root. This
causes the LXC FUSE mount to leak into the host FS.
Convert the host capabilities and domain config structs to
use the virArch datatype. Update the parsers and all drivers
to take account of datatype change
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
this patch addes fuse support for libvirt lxc.
we can use fuse filesystem to generate sysinfo dynamically,
So we can isolate /proc/meminfo,cpuinfo and so on through
fuse filesystem.
we mount fuse filesystem for every container.
the mount name is libvirt,mount point is
localstatedir/run/libvirt/lxc/containername.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
When no security driver is specified libvirt_lxc segfaults as a debug
message tries to access security labels for the container that are not
present.
This problem was introduced in commit 6c3cf57d6c.
Early jumps to the cleanup label caused a crash of the libvirt_lxc
container helper as the cleanup section called
virLXCControllerDeleteInterfaces(ctrl) without checking the ctrl argument
for NULL. The argument was de-referenced soon after.
$ /usr/libexec/libvirt_lxc
/usr/libexec/libvirt_lxc: missing --name argument for configuration
Segmentation fault
The virLXCControllerClientCloseHook method was mistakenly
assuming that the private data associated with the network
client was the virLXCControllerPtr. In fact it was just a
dummy int, so we were derefencing a bogus struct. The
frequent result of this was that we would never quit, because
we tried to arm a non-existant timer.
Fix the code by removing the dummy private data and just
using the virLXCControllerPtr instance as private data
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the LXC driver logs audit messages when a container
is started or stopped. These audit messages, however, contain
the PID of the libvirt_lxc supervisor process. To enable
sysadmins to correlate with audit messages generated by
processes /inside/ the container, we need to include the
container init process PID.
We can't do this in the main 'start' audit message, since
the init PID is not available at that point. Instead we output
a completely new audit record, that lists both PIDs.
type=VIRT_CONTROL msg=audit(1353433750.071:363): pid=20180 uid=0 auid=501 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='virt=lxc op=init vm="busy" uuid=dda7b947-0846-1759-2873-0f375df7d7eb vm-pid=20371 init-pid=20372 exe="/home/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd" hostname=? addr=? terminal=pts/6 res=success'
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The LXC controller code currently directly invokes the
libvirt main loop code. The problem is that this misses
the cleanup of virNetServerClient connections that
virNetServerRun takes care of.
The result is that when libvirtd is stopped, the
libvirt_lxc controller process gets stuck in a I/O loop.
When libvirtd is then started again, it fails to connect
to the controller and thus kills off the entire domain.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add two new APIs virNetServerClientNewPostExecRestart and
virNetServerClientPreExecRestart which allow a virNetServerClientPtr
object to be created from a JSON object and saved to a
JSON object, for the purpose of re-exec'ing a process.
This includes serialization of the connected socket associated
with the client
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Many parts of virDomainDefPtr were using 'int' variables as
array length counts. Replace all these with size_t and update
various format strings & API signatures to adapt
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Continue consolidation of process functions by moving some
helpers out of command.{c,h} into virprocess.{c,h}
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
https://www.gnu.org/licenses/gpl-howto.html recommends that
the 'If not, see <url>.' phrase be a separate sentence.
* tests/securityselinuxhelper.c: Remove doubled line.
* tests/securityselinuxtest.c: Likewise.
* globally: s/; If/. If/
This patch updates the structures that store information about each
domain and each hypervisor to support multiple security labels and
drivers. It also updates all the remaining code to use the new fields.
Signed-off-by: Marcelo Cerri <mhcerri@linux.vnet.ibm.com>
Currently there is a hook function that is invoked when a
new client connection comes in, which allows an app to
setup private data. This setup will make it difficult to
serialize client state during process re-exec(). Change to
a model where the app registers a callback when creating
the virNetServerPtr instance, which is used to allocate
the client private data immediately during virNetClientPtr
construction.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
All callers used the same initialization seed (well, the new
viratomictest forgot to look at getpid()); so we might as well
make this value automatic. And while it may feel like we are
giving up functionality, I documented how to get it back in the
unlikely case that you actually need to debug with a fixed
pseudo-random sequence. I left that crippled by default, so
that a stray environment variable doesn't cause a lack of
randomness to become a security issue.
* src/util/virrandom.c (virRandomInitialize): Rename...
(virRandomOnceInit): ...and make static, with one-shot call.
Document how to do fixed-seed debugging.
* src/util/virrandom.h (virRandomInitialize): Drop prototype.
* src/libvirt_private.syms (virrandom.h): Don't export it.
* src/libvirt.c (virInitialize): Adjust caller.
* src/lxc/lxc_controller.c (main): Likewise.
* src/security/virt-aa-helper.c (main): Likewise.
* src/util/iohelper.c (main): Likewise.
* tests/seclabeltest.c (main): Likewise.
* tests/testutils.c (virtTestMain): Likewise.
* tests/viratomictest.c (mymain): Likewise.
The reboot() syscall is allowed by new kernels for LXC containers.
The LXC controller can detect whether a reboot was requested
(instead of a normal shutdown) by looking at the "init" process
exit status. If a reboot was triggered, the exit status will
record SIGHUP as the kill reason.
The LXC controller has cleared all its capabilities, and the
veth network devices will no longer exist at this time. Thus
it cannot restart the container init process itself. Instead
it emits an event which is picked up by the LXC driver in
libvirtd. This will then re-create the container, using the
same configuration as it was previously running with (ie it
will not activate 'newDef').
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This defines a new RPC protocol to be used between the LXC
controller and the libvirtd LXC driver. There is only a
single RPC message defined thus far, an asynchronous "EXIT"
event that is emitted just before the LXC controller process
exits. This provides the LXC driver with details about how
the container shutdown - normally, or abnormally (crashed),
thus allowing the driver to emit better libvirt events.
Emitting the event in the LXC controller requires a few
little tricks with the RPC service. Simply calling the
virNetServiceClientSendMessage does not work, since this
merely queues the message for asynchronous processing.
In addition the main event loop is no longer running at
the point the event is emitted, so no I/O is processed.
Thus after invoking virNetServiceClientSendMessage it is
necessary to mark the client as being in "delayed close"
mode. Then the event loop is run again, until the client
completes its close - this happens only after the queued
message has been fully transmitted. The final complexity
is that it is not safe to run virNetServerQuit() from the
client close callback, since that is invoked from a
context where the server is locked. Thus a zero-second
timer is used to trigger shutdown of the event loop,
causing the controller to finally exit.
* src/Makefile.am: Add rules for generating RPC protocol
files and dispatch methods
* src/lxc/lxc_controller.c: Emit an RPC event immediately
before exiting
* src/lxc/lxc_domain.h: Record the shutdown reason
given by the controller
* src/lxc/lxc_monitor.c, src/lxc/lxc_monitor.h: Register
RPC program and event handler. Add callback to let
driver receive EXIT event.
* src/lxc/lxc_process.c: Use monitor exit event to decide
what kind of domain event to emit
* src/lxc/lxc_protocol.x: Define wire protocol for LXC
controller monitor.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Per the FSF address could be changed from time to time, and GNU
recommends the following now: (http://www.gnu.org/licenses/gpl-howto.html)
You should have received a copy of the GNU General Public License
along with Foobar. If not, see <http://www.gnu.org/licenses/>.
This patch removes the explicit FSF address, and uses above instead
(of course, with inserting 'Lesser' before 'General').
Except a bunch of files for security driver, all others are changed
automatically, the copyright for securify files are not complete,
that's why to do it manually:
src/security/security_selinux.h
src/security/security_driver.h
src/security/security_selinux.c
src/security/security_apparmor.h
src/security/security_apparmor.c
src/security/security_driver.c
Move the cgroup setup code out of the lxc_controller.c file
and into lxc_cgroup.{c,h}. This reduces the size of the
lxc_controller.c file and paves the way to invoke cgroup
setup from lxc_driver.c instead of lxc_controller.c in the
future
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Since we are not yet using the virNetServerPtr object for running
the event loop, we can't use virNetServerQuit(). Instead set the
global 'quit' flag in libvirt_lxc
In preparation for introducing a full RPC protocol for
libvirt_lxc, switch over to using the virNetServer APIs
for the monitor connection
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
While it is not currently used elsewhere in libvirt, the code
for finding a free loop device & associating a file with it
is not LXC specific. Move it into the viffile.{c,h} file where
potentially shared code is more commonly kept.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Move the cgroup object into virLXCControllerPtr and rename
all the setup methods to include 'Cgroup' in their name
if appropriate
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Move the monitor FDs into the virLXCControllerPtr object
removing the need for the 'struct lxcMonitor' object
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virLXCControllerRun method is getting a little too large,
and about 50% of its code is related to setting up a /dev/pts
mount. Move the latter out into a dedicated method
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Move the security manager object into the virLXCControllerPtr
object. Also simplify the code creating it in the first place
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Move the list of loop device FDs into the virLXCControllerPtr
object and make sure that virLXCControllerStopInit will
close them all
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Turn 'struct lxc_console' into virLXCControllerConsolePtr and make it
a part of virLXCControllerPtr
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keep a record of the init PID in the virLXCController object
and create a virLXCControllerStopInit method for killing this
process
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Move the veth device name state into the virLXCControllerPtr
object and stop passing it around. Also use size_t instead
of unsigned int for the array length parameters.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The LXC controller code is having to pass around an ever increasing
number of parameters between methods. To make the code more managable
introduce a virLXCControllerPtr to hold all this state, starting with
the container name and virDomainDefPtr object
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The lxc contoller eventually makes use of virRandomBits(), which was
segfaulting since virRandomInitialize() is never invoked.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff554d560 in random_r () from /lib64/libc.so.6
(gdb) bt
0 0x00007ffff554d560 in random_r () from /lib64/libc.so.6
1 0x0000000000469eaa in virRandomBits (nbits=32) at util/virrandom.c:80
2 0x000000000045bf69 in virHashCreateFull (size=256,
dataFree=0x4aa2a2 <hashDataFree>, keyCode=0x45bd40 <virHashStrCode>,
keyEqual=0x45bdad <virHashStrEqual>, keyCopy=0x45bdfa <virHashStrCopy>,
keyFree=0x45be37 <virHashStrFree>) at util/virhash.c:134
3 0x000000000045c069 in virHashCreate (size=0, dataFree=0x4aa2a2 <hashDataFree>)
at util/virhash.c:164
4 0x00000000004aa562 in virNWFilterHashTableCreate (n=0)
at conf/nwfilter_params.c:686
5 0x00000000004aa95b in virNWFilterParseParamAttributes (cur=0x711d30)
at conf/nwfilter_params.c:793
6 0x0000000000481a7f in virDomainNetDefParseXML (caps=0x702c90, node=0x7116b0,
ctxt=0x7101b0, bootMap=0x0, flags=0) at conf/domain_conf.c:4589
7 0x000000000048cc36 in virDomainDefParseXML (caps=0x702c90, xml=0x710040,
root=0x7103b0, ctxt=0x7101b0, expectedVirtTypes=16, flags=0)
at conf/domain_conf.c:8658
8 0x000000000048f011 in virDomainDefParseNode (caps=0x702c90, xml=0x710040,
root=0x7103b0, expectedVirtTypes=16, flags=0) at conf/domain_conf.c:9360
9 0x000000000048ee30 in virDomainDefParse (xmlStr=0x0,
filename=0x702ae0 "/var/run/libvirt/lxc/x.xml", caps=0x702c90,
expectedVirtTypes=16, flags=0) at conf/domain_conf.c:9310
10 0x000000000048ef00 in virDomainDefParseFile (caps=0x702c90,
filename=0x702ae0 "/var/run/libvirt/lxc/x.xml", expectedVirtTypes=16, flags=0)
at conf/domain_conf.c:9332
11 0x0000000000425053 in main (argc=5, argv=0x7fffffffe2b8)
at lxc/lxc_controller.c:1773
Instead of hardcoding use of SELinux contexts in the LXC driver,
switch over to using the official security driver API.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
To allow the security drivers to apply different configuration
information per hypervisor, pass the virtualization driver name
into the security manager constructor.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virLogSetFromEnv call was done too late in startup to
catch many log messages (eg from security driver initialization).
To assist debugging also explicitly log the security details
at startup
Currently the libvirt_lxc process uses VIR_DOMAIN_XML_INACTIVE
when loading the XML for the container. This means it loses
any dynamic data such as the, just allocated, SELinux label.
Further there is an inconsistency in the libvirt LXC driver
whereby it saves the live config XML and then later overwrites
the file with the live status XML instead. Add a comment about
this for future reference.
* src/lxc/lxc_controller.c: Remove VIR_DOMAIN_XML_INACTIVE
when loading XML
* src/lxc/lxc_driver.c: Add comment about inconsistent
config file formats
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Error: UNINIT:
/libvirt/src/lxc/lxc_driver.c:1412:
var_decl: Declaring variable "fd" without initializer.
/libvirt/src/lxc/lxc_driver.c:1460:
uninit_use_in_call: Using uninitialized value "fd" when calling "virFileClose".
/libvirt/src/util/virfile.c:50:
read_parm: Reading a parameter value.
Error: DEADCODE:
/libvirt/src/lxc/lxc_controller.c:960:
dead_error_condition: On this path, the condition "ret == 4" cannot be true.
/libvirt/src/lxc/lxc_controller.c:959:
at_most: After this line, the value of "ret" is at most -1.
/libvirt/src/lxc/lxc_controller.c:959:
new_values: Noticing condition "ret < 0".
/libvirt/src/lxc/lxc_controller.c:961:
dead_error_line: Execution cannot reach this statement "continue;".
Error: UNINIT:
/libvirt/src/lxc/lxc_controller.c:1104:
var_decl: Declaring variable "consoles" without initializer.
/libvirt/src/lxc/lxc_controller.c:1237:
uninit_use: Using uninitialized value "consoles".
This patch fixes the access of variable "con" in two files where the
variable was declared only on SELinux builds and thus the build failed
without SELinux. It's a rather nasty fix but helps fix the build
quickly and without any major changes to the code.
To allow the container to access /dev and /dev/pts when under
sVirt, set an explicit mount option. Also set a max size on
the /dev mount to prevent DOS on memory usage
* src/lxc/lxc_container.c: Set /dev mount context
* src/lxc/lxc_controller.c: Set /dev/pts mount context
For the sake of backwards compat, LXC guests are *not*
confined by default. This is because it is not practical
to dynamically relabel containers using large filesystem
trees. Applications can create confined containers though,
by giving suitable XML configs
* src/Makefile.am: Link libvirt_lxc to security drivers
* src/lxc/libvirtd_lxc.aug, src/lxc/lxc_conf.h,
src/lxc/lxc_conf.c, src/lxc/lxc.conf,
src/lxc/test_libvirtd_lxc.aug: Config file handling for
security driver
* src/lxc/lxc_driver.c: Wire up security driver functions
* src/lxc/lxc_controller.c: Add a '--security' flag to
specify which security driver to activate
* src/lxc/lxc_container.c, src/lxc/lxc_container.h: Set
the process label just before exec'ing init.
Currently the LXC controller attempts to deal with EOF on a
tty by spawning a thread to do an edge triggered epoll_wait().
This avoids the normal event loop spinning on POLLHUP. There
is a subtle mistake though - even after seeing POLLHUP on a
master PTY, it is still perfectly possible & valid to write
data to the PTY. There is a buffer that can be filled with
data, even when no client is present.
The second mistake is that the epoll_wait() thread was not
looking for the EPOLLOUT condition, so when a new client
connects to the LXC console, it had to explicitly send a
character before any queued output would appear.
Finally, there was in fact no need to spawn a new thread to
deal with epoll_wait(). The epoll file descriptor itself
can be poll()'d on normally.
This patch attempts to deal with all these problems.
- The blocking epoll_wait() thread is replaced by a poll
on the epoll file descriptor which then does a non-blocking
epoll_wait() to handle events
- Even if POLLHUP is seen, we continue trying to write
any pending output until getting EAGAIN from write.
- Once write returns EAGAIN, we modify the epoll event
mask to also look for EPOLLOUT
* src/lxc/lxc_controller.c: Avoid stalled I/O upon
connected to an LXC console
To make lxcSetContainerResources smaller, pull the mem tune
and device ACL setup code out into separate methods
* src/lxc/lxc_controller.c: Introduce lxcSetContainerMemTune
and lxcSetContainerDeviceACL
While LXC does not have the concept of VCPUS, so we can't do
per-VCPU pCPU placement, we can support the VM level CPU
placement. Todo this simply set the CPU affinity of the LXC
controller at startup. All child processes will inherit this
affinity.
* src/lxc/lxc_controller.c: Set process affinity
Move the virNetDevSetName and virNetDevSetNamespace APIs out
of LXC's veth.c and into virnetdev.c.
Move the remaining content of the file to src/util/virnetdevveth.c
* src/lxc/veth.c: Rename to src/util/virnetdevveth.c
* src/lxc/veth.h: Rename to src/util/virnetdevveth.h
* src/util/virnetdev.c, src/util/virnetdev.h: Add
virNetDevSetName and virNetDevSetNamespace
* src/lxc/lxc_container.c, src/lxc/lxc_controller.c,
src/lxc/lxc_driver.c: Update include paths
The src/lxc/veth.c file contains APIs for managing veth devices,
but some of the APIs duplicate stuff from src/util/virnetdev.h.
Delete thed duplicate APIs and rename the remaining ones to
follow virNetDevVethXXXX
* src/lxc/veth.c, src/lxc/veth.h: Rename APIs & delete duplicates
* src/lxc/lxc_container.c, src/lxc/lxc_controller.c,
src/lxc/lxc_driver.c: Update for API renaming
Based on a Coverity report - the return value of waitpid() should
always be checked, to avoid problems with leaking resources.
* src/lxc/lxc_controller.c (lxcControllerRun): Use simpler virPidAbort.
Currently the LXC controller only supports setup of a single
text console. This is wired up to the container init's stdio,
as well as /dev/console and /dev/tty1. Extending support for
multiple consoles, means wiring up additional PTYs to /dev/tty2,
/dev/tty3, etc, etc. The LXC controller is passed multiple open
file handles, one for each console requested.
* src/lxc/lxc_container.c, src/lxc/lxc_container.h: Wire up
all the /dev/ttyN links required to symlink to /dev/pts/NN
* src/lxc/lxc_container.h: Open more container side /dev/pts/NN
devices, and adapt event loop to handle I/O from all consoles
* src/lxc/lxc_driver.c: Setup multiple host side PTYs
The current I/O code for LXC uses a hand crafted event loop
to forward I/O between the container & host app, based on
epoll to handle EOF on PTYs. This event loop is not easily
extensible to add more consoles, or monitor other types of
file descriptors.
Remove the custom event loop and replace it with a normal
libvirt event loop. When detecting EOF on a PTY, disable
the event watch on that FD, and fork off a background thread
that does a edge-triggered epoll() on the FD. When the FD
finally shows new incoming data, the thread re-enables the
watch on the FD and exits.
When getting EOF from a read() on the PTY, the existing code
would do waitpid(WNOHANG) to see if the container had exited.
Unfortunately there is a race condition, because even though
the process has closed its stdio handles, it might still
exist.
To deal with this the new event loop uses a SIG_CHILD handler
to perform the waitpid only when the container is known to
have actually exited.
* src/lxc/lxc_controller.c: Rewrite the event loop to use
the standard APIs.
Previous commit clears number of items alocated in lxcSetupLoopDevices
if VIR_REALLOC_N fails. In that case, the pointer is not NULL, and
causes leaking FDs that have been allocated.
* src/lxc/lxc_controller.c: revert zeroing array size
If the function lxcSetupLoopDevices(def, &nloopDevs, &loopDevs) failed,
the variable loopDevs will keep a initial NULL value, however, the
function VIR_FORCE_CLOSE(loopDevs[i]) will directly deref it.
This patch also fixes returning a bogous number of devices from
lxcSetupLoopDevices on an error path.
* rc/lxc/lxc_controller.c: fixed a null pointer dereference.
Signed-off-by: Alex Jia <ajia@redhat.com>
The glibc ones (intentionally) cannot handle ptys opened in a
devpts not mounted at /dev/pts.
Drop the (un-exported, unused) virFileOpenTtyAt.
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Detected by Coverity. We want to increment the size_t counter,
not the pointer to the counter. Bug present since 5f5c6fde (0.9.5).
* src/lxc/lxc_controller.c (lxcSetupLoopDevices): Use correct
precedence.
The functions for manipulating pidfiles are in util/util.{c,h}.
We will shortly be adding some further pidfile related functions.
To avoid further growing util.c, this moves the pidfile related
functions into a dedicated virpidfile.{c,h}. The functions are
also all renamed to have 'virPidFile' as their name prefix
* util/util.h, util/util.c: Remove all pidfile code
* util/virpidfile.c, util/virpidfile.h: Add new APIs for pidfile
handling.
* lxc/lxc_controller.c, lxc/lxc_driver.c, network/bridge_driver.c,
qemu/qemu_process.c: Add virpidfile.h include and adapt for API
renames
A previous commit gave the LXC driver the ability to mount
block devices for the container filesystem. Through use of
the loopback device functionality, we can build on this to
support use of plain file images for LXC filesytems.
By setting the LO_FLAGS_AUTOCLEAR flag we can ensure that
the loop device automatically disappears when the container
dies / shuts down
* src/lxc/lxc_container.c: Raise error if we see a file
based filesystem, since it should have been turned into
a loopback device already
* src/lxc/lxc_controller.c: Rewrite any filesystems of
type=file, into type=block, by binding the file image
to a free loop device
Currently the LXC driver can only populate filesystems from
host filesystems, using bind mounts. This patch allows host
block devices to be mounted. It autodetects the filesystem
format at mount time, and adds the block device to the cgroups
ACL. Example usage is
<filesystem type='block' accessmode='passthrough'>
<source dev='/dev/sda1'/>
<target dir='/home'/>
</filesystem>
* src/lxc/lxc_container.c: Mount block device filesystems
* src/lxc/lxc_controller.c: Add block device filesystems
to cgroups ACL
Although most functions in libvirt return 0 on success and < 0 on
failure, there are a few functions lingering around that return errno
(a positive value) on failure, and sometimes code calling those
functions incorrectly assumes the <0 standard. I noticed one of these
the other day when auditing networkStartDhcpDaemon after Guido Gunther
found a place where success was improperly returned on failure (that
patch has been acked and is pending a push). The problem was that it
expected the return value from virFileReadPid to be < 0 on failure,
but it was actually positive (it was also neglected to set the return
code in this case, similar to the bug found by Guido).
This all led to the fact that *all* of the virFile*Pid functions in
util.c are returning errno on failure. This patch remedies that
problem by changing them all to return -errno on failure, and makes
any necessary changes to callers of the functions. (In the meantime, I
also properly set the return code on failure of virFileReadPid in
networkStartDhcpDaemon).
The drivers were accepting domain configs without checking if those
were actually meant for them. For example the LXC driver happily
accepts configs with type QEMU.
Add a check for the expected domain types to the virDomainDefParse*
functions.
Some callers expected virFileMakePath to set errno, some expected
it to return an errno value. Unify this to return 0 on success and
-1 on error. Set errno to report detailed error information.
Also optimize virFileMakePath if stat fails with an errno different
from ENOENT.
Add a simple handshake with the lxc_controller process so we can detect
process startup failures. We do this by adding a new --handshake cli arg
to lxc_controller for passing a file descriptor. If the process fails to
launch, we scrape all output from the logfile and report it to the user.
These VIR_XXXX0 APIs make us confused, use the non-0-suffix APIs instead.
How do these coversions works? The magic is using the gcc extension of ##.
When __VA_ARGS__ is empty, "##" will swallow the "," in "fmt," to
avoid compile error.
example: origin after CPP
high_level_api("%d", a_int) low_level_api("%d", a_int)
high_level_api("a string") low_level_api("a string")
About 400 conversions.
8 special conversions:
VIR_XXXX0("") -> VIR_XXXX("msg") (avoid empty format) 2 conversions
VIR_XXXX0(string_literal_with_%) -> VIR_XXXX(%->%%) 0 conversions
VIR_XXXX0(non_string_literal) -> VIR_XXXX("%s", non_string_literal)
(for security) 6 conversions
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
* Correct the documentation for cgroup: the swap_hard_limit indicates
mem+swap_hard_limit.
* Change cgroup private apis to: virCgroupGet/SetMemSwapHardLimit
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
The current LXC I/O controller looks for HUP to detect
when a guest has quit. This isn't reliable as during
initial bootup it is possible that 'init' will close
the console and let mingetty re-open it. The shutdown
of containers was also flakey because it only killed
the libvirt I/O controller and expected container
processes to gracefully follow.
Change the I/O controller such that when it see HUP
or an I/O error, it uses kill($PID, 0) to see if the
process has really quit.
Change the container shutdown sequence to use the
virCgroupKillPainfully function to ensure every
really goes away
This change makes the use of the 'cpu', 'devices'
and 'memory' cgroups controllers compulsory with
LXC
* docs/drvlxc.html.in: Document that certain cgroups
controllers are now mandatory
* src/lxc/lxc_controller.c: Check if PID is still
alive before quitting on I/O error/HUP
* src/lxc/lxc_driver.c: Use virCgroupKillPainfully
Adding audit points showed that we were granting too much privilege
to qemu; it should not need any mknod rights to recreate any
devices. On the other hand, lxc should have all device privileges.
The solution is adding a flag parameter.
This also lets us restrict write access to read-only disks.
* src/util/cgroup.h (virCgroup*Device*): Adjust prototypes.
* src/util/cgroup.c (virCgroupAllowDevice)
(virCgroupAllowDeviceMajor, virCgroupAllowDevicePath)
(virCgroupDenyDevice, virCgroupDenyDeviceMajor)
(virCgroupDenyDevicePath): Add parameter.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Update clients.
* src/lxc/lxc_controller.c (lxcSetContainerResources): Likewise.
* src/qemu/qemu_cgroup.c: Likewise.
(qemuSetupDiskPathAllow): Also, honor read-only disks.
Using the 'personality(2)' system call, we can make a container
on an x86_64 host appear to be i686. Likewise for most other
Linux 64bit arches.
* src/lxc/lxc_conf.c: Fill in 32bit capabilities for x86_64 hosts
* src/lxc/lxc_container.h, src/lxc/lxc_container.c: Add API to
check if an arch has a 32bit alternative
* src/lxc/lxc_controller.c: Set the process personality when
starting guest
Normal practice for /dev/pts is to have it mode=620,gid=5
but LXC was leaving mode=000,gid=0 preventing unprivilegd
users in the guest use of PTYs
* src/lxc/lxc_controller.c: Fix /dev/pts setup
Done mechanically with:
$ git grep -l '\bDEBUG0\? *(' | xargs -L1 sed -i 's/\bDEBUG0\? *(/VIR_&/'
followed by manual deletion of qemudDebug in daemon/libvirtd.c, along
with a single 'make syntax-check' fallout in the same file, and the
actual deletion in src/util/logging.h.
* src/util/logging.h (DEBUG, DEBUG0): Delete.
* daemon/libvirtd.h (qemudDebug): Likewise.
* global: Change remaining clients over to VIR_DEBUG counterpart.
Per the gettext developer:
http://lists.gnu.org/archive/html/bug-gnu-utils/2010-10/msg00019.htmlhttp://lists.gnu.org/archive/html/bug-gnu-utils/2010-10/msg00021.html
gettext() doesn't work correctly on all platforms unless you have
called setlocale(). Furthermore, gnulib's gettext.h has provisions
for setting up a default locale, which is the preferred method for
libraries to use gettext without having to call textdomain() and
override the main program's default domain (virInitialize already
calls bindtextdomain(), but this is insufficient without the
setlocale() added in this patch; and a redundant bindtextdomain()
in this patch doesn't hurt, but serves as a good example for other
packages that need to bind a second translation domain).
This patch is needed to silence a new gnulib 'make syntax-check'
rule in the next patch.
* daemon/libvirtd.c (main): Setup locale and gettext.
* src/lxc/lxc_controller.c (main): Likewise.
* src/security/virt-aa-helper.c (main): Likewise.
* src/storage/parthelper.c (main): Likewise.
* tools/virsh.c (main): Fix exit status.
* src/internal.h (DEFAULT_TEXT_DOMAIN): Define, for gettext.h.
(_): Simplify definition accordingly.
* po/POTFILES.in: Add src/storage/parthelper.c.
The /dev/console device inside the container must NOT map
to the real /dev/console device node, since this allows the
container control over the current host console. A fun side
effect of this is that starting a container containing a
real Fedora OS will kill off your X server.
Remove the /dev/console node, and replace it with a symlink
to the primary console TTY
* src/lxc/lxc_container.c: Replace /dev/console with a
symlink to /dev/pty/0
* src/lxc/lxc_controller.c: Remove /dev/console from cgroups
ACL
Using automated replacement with sed and editing I have now replaced all
occurrences of close() with VIR_(FORCE_)CLOSE() except for one, of
course. Some replacements were straight forward, others I needed to pay
attention. I hope I payed attention in all the right places... Please
have a look. This should have at least solved one more double-close
error.
Adding parsing code for memory tunables in the domain xml file
also change the internal define structures used for domain memory
informations
Adds a new specific test
Previously, the functions in src/lxc/veth.c could sometimes return
positive values on failure rather than -1. This made accurate error
reporting difficult, and led to one failure to catch an error in a
calling function.
This patch makes all the functions in veth.c consistently return 0 on
success, and -1 on failure. It also fixes up the callers to the veth.c
functions where necessary.
Note that this patch may be related to the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=607496.
It will not fix the bug, but should unveil what happens.
* po/POTFILES.in - add veth.c, which previously had no translatable strings
* src/lxc/lxc_controller.c
* src/lxc/lxc_container.c
* src/lxc/lxc_driver.c - fixup callers to veth.c, and remove error logs,
as they are now done in veth.c
* src/lxc/veth.c - make all functions consistently return -1 on error.
* src/lxc/veth.h - use ATTRIBUTE_NONNULL to protect against NULL args.
Init process may remain after sending SIGTERM for some reason.
For example, if original init program is used, it is definitely
not killed by SIGTERM.
* src/lxc/lxc_controller.c: kill with SIGKILL if SIGTERM wasn't
sufficient
* src/lxc/lxc_controller.c (ignorable_epoll_accept_errno): New function.
(lxcControllerMain): Handle a failed accept carefully:
most errno values indicate legitimate failure and must be fatal.
However, ignore a special case: that in which an incoming client quits
between the poll() indicating its presence, and our accept() which
is trying to process it.
Approximately 60 messages were marked. Since these diagnostics are
intended solely for developers and maintainers, encouraging translation
is deemed to be counterproductive:
http://thread.gmane.org/gmane.comp.emulators.libvirt/25050/focus=25052
Run this command:
git grep -l VIR_WARN|xargs perl -pi -e \
's/(VIR_WARN0?)\s*\(_\((".*?")\)/$1($2/'
When using the 'ns' cgroup controller, the moment a process calls
'unshare(CLONE_NEWNS)', it will be given a private cgroup tree
under its current location. This really messages up the LXC
controller process, because it ends up creating the containers'
cgroup in the wrong place. The fix is fairly easy, just move
the cgroup setup before the code which calls unshare(). The
'ns' controller will still create extra undesired cgroups, but
they at least won't break libvirt's setup now.
The patch also adds a missing cgroups allow rule for /dev/tty
device node