The LXC, QEMU, and LibXL drivers have all merged their handling of
the attach/update/modify device APIs into one large
'xxxxDomainModifyDeviceFlags'
which then does a 'switch()' based on the actual API being invoked.
While this saves some lines of code, it is not really all that
significant in the context of the driver API impls as a whole.
This merger of the handling of different APIs creates pain when
wanting to automated analysis of the code and do things which
are specific to individual APIs. The slight duplication of code
from unmerged the API impls, is preferrable to allow for easier
automated analysis.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The individual hypervisor drivers were directly referencing
APIs in virnodesuspend.c in their virDriverPtr struct. Separate
these methods, so there is always a wrapper in the hypervisor
driver. This allows the unused virConnectPtr args to be removed
from the virnodesuspend.c file. Again this will ensure that
ACL checks will only be performed on invocations that are
directly associated with public API usage.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The individual hypervisor drivers were directly referencing
APIs in src/nodeinfo.c in their virDriverPtr struct. Separate
these methods, so there is always a wrapper in the hypervisor
driver. This allows the unused virConnectPtr args to be
removed from the nodeinfo.c file. Again this will ensure that
ACL checks will only be performed on invocations that are
directly associated with public API usage.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the virGetHostname() API has a bogus virConnectPtr
parameter. This is because virtualization drivers directly
reference this API in their virDriverPtr tables, tieing its
API design to the public virConnectGetHostname API design.
This also causes problems for access control checks since
these must only be done for invocations from the public
API, not internal invocation.
Remove the bogus virConnectPtr parameter, and make each
hypervisor driver provide a dedicated function for the
driver API impl. This will allow access control checks
to be easily inserted later.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
Ensure that all drivers implementing public APIs use a
naming convention for their implementation that matches
the public API name.
eg for the public API virDomainCreate make sure QEMU
uses qemuDomainCreate and not qemuDomainStart
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ensure that the driver struct field names match the public
API names. For an API virXXXX we must have a driver struct
field xXXXX. ie strip the leading 'vir' and lowercase any
leading uppercase letters.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The change in commit aed4986322fe77bdf718e31a0587d00f04f3d97a
was incomplete, missing a couple of cases of /system. This
caused failure to start VMs.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After discussions with systemd developers it was decided that
a better default policy for resource partitions is to have
3 default partitions at the top level
/system - system services
/machine - virtual machines / containers
/user - user login session
This ensures that the default policy isolates guest from
user login sessions & system services, so a mis-behaving
guest can't consume 100% of CPU usage if other things are
contending for it.
Thus we change the default partition from /system to
/machine
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
http://www.uhv.edu/ac/newsletters/writing/grammartip2009.07.01.htm
(and several other sites) give hints that 'onto' is best used if
you can also add 'up' just before it and still make sense. In many
cases in the code base, we really want the two-word form, or even
a simplification to just 'on' or 'to'.
* docs/hacking.html.in: Use correct 'on to'.
* python/libvirt-override.c: Likewise.
* src/lxc/lxc_controller.c: Likewise.
* src/util/virpci.c: Likewise.
* daemon/THREADS.txt: Use simpler 'on'.
* docs/formatdomain.html.in: Better usage.
* docs/internals/rpc.html.in: Likewise.
* src/conf/domain_event.c: Likewise.
* src/rpc/virnetclient.c: Likewise.
* tests/qemumonitortestutils.c: Likewise.
* HACKING: Regenerate.
Signed-off-by: Eric Blake <eblake@redhat.com>
The LXC driver currently has code to detect cgroups mounts
and then re-mount them inside the new root filesystem. Replace
this fragile code with a call to virCgroupIsolateMount.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupNewDriver method had a 'bool privileged' param.
If a false value was ever passed in, it would simply not
work, since non-root users don't have any privileges to create
new cgroups. Just delete this broken code entirely and make
the QEMU driver skip cgroup setup in non-privileged mode
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Historically QEMU/LXC guests have been placed in a cgroup layout
that is
$LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME
This is bad for a number of reasons
- The cgroup hierarchy gets very deep which seriously
impacts kernel performance due to cgroups scalability
limitations.
- It is hard to setup cgroup policies which apply across
services and virtual machines, since all VMs are underneath
the libvirtd service.
To address this the default cgroup location is changed to
be
/system/$VMNAME.{lxc,qemu}.libvirt
This puts virtual machines at the same level in the hierarchy
as system services, allowing consistent policy to be setup
across all of them.
This also honours the new resource partition location from the
XML configuration, for example
<resource>
<partition>/virtualmachines/production</partitions>
</resource>
will result in the VM being placed at
/virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt
NB, with the exception of the default, /system, path which
is intended to always exist, libvirt will not attempt to
auto-create the partitions in the XML. It is the responsibility
of the admin/app to configure the partitions. Later libvirt
APIs will provide a way todo this.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A resource partition is an absolute cgroup path, ignoring the
current process placement. Expose a virCgroupNewPartition API
for constructing such cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rename all the virCgroupForXXX methods to use the form
virCgroupNewXXX since they are all constructors. Also
make sure the output parameter is the last one in the
list, and annotate all pointers as non-null. Fix up
all callers, and make sure they use true/false not 0/1
for the boolean parameters
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Instead of calling virCgroupForDomain every time we need
the virCgrouPtr instance, just do it once at Vm startup
and cache a reference to the object in virLXCDomainObjPrivatePtr
until shutdown of the VM. Removing the virCgroupPtr from
the LXC driver state also means we don't have stale mount
info, if someone mounts the cgroups filesystem after libvirtd
has been started
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the user requests a mount for /run, this may hide any existing
mounts that are lower down in /run. The result is that the
container still sees the mounts in /proc/mounts, but cannot
access them
sh-4.2# df
df: '/run/user/501/gvfs': No such file or directory
df: '/run/media/berrange/LIVE': No such file or directory
df: '/run/media/berrange/SecureDiskA1': No such file or directory
df: '/run/libvirt/lxc/sandbox': No such file or directory
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg_t500wlan-lv_root 151476396 135390200 8384900 95% /
tmpfs 1970888 3204 1967684 1% /run
/dev/sda1 194241 155940 28061 85% /boot
devfs 64 0 64 0% /dev
tmpfs 64 0 64 0% /sys/fs/cgroup
tmpfs 1970888 1200 1969688 1% /etc/libvirt-sandbox/scratch
Before mounting any filesystem at a particular location, we
must recursively unmount anything at or below the target mount
point
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Ensure lxcContainerUnmountSubtree is at the top of the
lxc_container.c file so it is easily referenced from
any other method. No functional change
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This allows a container-type domain to have exclusive access to one of
the host's NICs.
Wire <hostdev caps=net> with the lxc_controller - when moving the newly
created veth devices into a new namespace, also look for any hostdev
devices that should be moved. Note: once the container domain has been
destroyed, there is no code that moves the interfaces back to the
original namespace. This does happen, though, probably due to default
cleanup on namespace destruction.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
The virCgroupMounted method is badly named, since a controller can be
mounted, but disabled in the current object. Rename the method to be
virCgroupHasController. Also make it tolerant to a NULL virCgroupPtr
and out-of-range controller index, to avoid duplication of these
checks in all callers
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently when getting an instance of virCgroupPtr we will
create the path in all cgroup controllers. Only at the virt
driver layer are we attempting to filter controllers. This
is bad because the mere act of creating the dirs in the
controllers can have a functional impact on the kernel,
particularly for performance.
Update the virCgroupForDriver() method to accept a bitmask
of controllers to use. Only create dirs in the controllers
that are requested. When creating cgroups for domains,
respect the active controller list from the parent cgroup
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupGetAppRoot is not clear in its meaning. Change
to virCgroupForSelf to highlight that this returns the
cgroup config for the caller's process
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch refactors various places to allow removing of the
defaultConsoleTargetType callback from the virCaps structure.
A new console character device target type is introduced -
VIR_DOMAIN_CHR_CONSOLE_TARGET_TYPE_NONE - to mark that no type was
specified in the XML. This type is at the end converted to the standard
VIR_DOMAIN_CHR_CONSOLE_TARGET_TYPE_SERIAL. Other types that are
different from this default have to be processed separately in the
device post parse callback.
Use the virDomainXMLConf structure to hold this data and tweak the code
to avoid semantic change.
Without configuration the KVM mac prefix is used by default. I chose it
as it's in the privately administered segment so it should be usable for
any purposes.
This patch removes the emulatorRequired field and associated
infrastructure from the virCaps object. Instead the driver specific
callbacks are used as this field isn't enforced by all drivers.
This patch implements the appropriate callbacks in the qemu and lxc
driver and moves to check to that location.
This patch adds instrumentation that will allow hypervisor drivers to
fill and validate domain and device definitions after parsed by the XML
parser.
With this patch, after the XML is parsed, a callback to the driver is
issued requesting to fill and validate driver specific details of the
configuration. This allows to use sensible defaults and checks on a per
driver basis at the time the XML is parsed.
Two callback pointers are stored in the new virDomainXMLConf object:
* virDomainDeviceDefPostParseCallback (devicesPostParseCallback)
- called for a single device parsed and for every single device in a
domain config. A virDomainDeviceDefPtr is passed along with the
domain definition and virCaps.
* virDomainDefPostParseCallback, (domainPostParseCallback)
- A callback that is meant to process the domain config after it's
parsed. A virDomainDefPtr is passed along with virCaps.
Both types of callbacks support arbitrary opaque data passed for the
callback functions.
Errors may be reported in those callbacks resulting in a XML parsing
failure.
This patch is the result of running:
for i in $(git ls-files | grep -v html | grep -v \.po$ ); do
sed -i -e "s/virDomainXMLConf/virDomainXMLOption/g" -e "s/xmlconf/xmlopt/g" $i
done
and a few manual tweaks.
This reverts commit c9c87376f2b2197ad774533ad6a6dd2f631ca105.
Now that we force all containers to have a root filesystem,
there is no way the host's /dev is ever exposed
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the LXC container code has two codepaths, depending on
whether there is a <filesystem> element with a target path of '/'.
If we automatically add a <filesystem> device with src=/ and dst=/,
for any container which has not specified a root filesystem, then
we only need one codepath for setting up the filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Early on kernel support for private devpts was not widespread,
so we had compatibiltiy codepaths. Such old kernels are not
seriously used for LXC these days, so the compat code can go
away
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
For a root filesystem with type=file or type=block, the LXC
container was forgetting to actually mount it, before doing
the pivot root step.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the lxc controller sets up the devpts instance on
$rootfsdef->src, but this only works if $rootfsdef is using
type=mount. To support type=block or type=file for the root
filesystem, we must use /var/lib/libvirt/lxc/$NAME.devpts
for the temporary devpts mount in the controller
Instead of using /var/lib/libvirt/lxc/$NAME for the FUSE
filesystem, use /var/lib/libvirt/lxc/$NAME.fuse. This allows
room for other temporary mounts in the same directory
Some of the LXC callbacks did not lock the virDomainObjPtr
instance. This caused transient errors like
error: Failed to start domain busy-mount
error: cannot rename file '/var/run/libvirt/lxc/busy-mount.xml.new' as '/var/run/libvirt/lxc/busy-mount.xml': No such file or directory
as 2 threads tried to update the status file concurrently
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The 'nodeset' variable was never initialized, causing a later
VIR_FREE(nodeset) to free uninitialized memory.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Intend to reduce the redundant code,use virNumaSetupMemoryPolicy
to replace virLXCControllerSetupNUMAPolicy and
qemuProcessInitNumaMemoryPolicy.
This patch also moves the numa related codes to the
file virnuma.c and virnuma.h
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Allow lxc using the advisory nodeset from querying numad,
this means if user doesn't specify the numa nodes that
the lxc domain should assign to, libvirt will automatically
bind the lxc domain to the advisory nodeset which queried from
numad.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
The LXC controller is closing loop devices as soon as the
container has started. This is fine if the loop device
was setup as a mounted filesystem, but if we're just passing
through the loop device as a disk, nothing else is keeping
it open. Thus we must keep the loop device FDs open for as
long the libvirt_lxc process is running.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the LXC controller creates the cgroup, configures the
resources and adds the task all in one go. This is not sufficiently
flexible for the forthcoming NBD integration. We need to make sure
the NBD process gets into the right cgroup immediately, but we can
not have limits (in particular the device ACL) applied at the point
where we start qemu-nbd. So create a virLXCCgroupCreate method
which creates the cgroup and adds the current task to be called
early, and leave virLXCCgroupSetup to only do resource config.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The naming used in the RPC protocols for the LXC monitor and
lock daemon confused the script used to generate systemtap
helper functions. Rename the LXC monitor protocol symbols to
reduce confusion. Adapt the gensystemtap.pl script to cope
with the LXC monitor / lock daemon naming conversions.
This has no functional impact on RPC wire protocol, since
names are only used in the C layer
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If an LXC domain failed to start because of a bogus SELinux
label, virLXCProcessStart would call VIR_CLOSE(0) by mistake.
This is because the code which initializes the member of the
ttyFDs array to -1 got moved too far away from the place where
the array is first allocated.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In some startup failure modes, the fuse thread may get itself
wedged. This will cause the entire libvirt_lxc process to
hang trying to the join the thread. There is no compelling
reason to wait for the thread to exit if the whole process
is exiting, so just daemonize the fuse thread instead.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virDomainGetSecurityLabel method is currently (mistakenly)
showing the label of the libvirt_lxc process:
...snip...
Security model: selinux
Security DOI: 0
Security label: system_u:system_r:virtd_t:s0-s0:c0.c1023 (permissive)
when it should be showing the init process label
...snip...
Security model: selinux
Security DOI: 0
Security label: system_u:system_r:svirt_t:s0:c724,c995 (permissive)
The virCaps structure gathered a ton of irrelevant data over time that.
The original reason is that it was propagated to the XML parser
functions.
This patch aims to create a new data structure virDomainXMLConf that
will contain immutable data that are used by the XML parser. This will
allow two things we need:
1) Get rid of the stuff from virCaps
2) Allow us to add callbacks to check and add driver specific stuff
after domain XML is parsed.
This first attempt removes pointers to private data allocation functions
to this new structure and update all callers and function that require
them.