If we don't properly clean up all processes in the
machine-<vmname>.scope, systemd won't remove the cgroup, and subsequent VM
starts fail with
'CreateMachine: File exists'
Additional processes can e.g. be added via
echo $PID > /sys/fs/cgroup/systemd/machine.slice/machine-${VMNAME}.scope/tasks
but there are other cases like
http://bugs.debian.org/761521
Invoke TerminateMachine to be on the safe side since systemd tracks the
cgroup anyway. This is a noop if all processes have terminated already.
Add a <capabilities> element to the <features> section of the LXC domain
configuration. This section can contain elements named after the
capabilities, like:
  <mknod state="on"/>        keep the CAP_MKNOD capability
  <sys_chroot state="off"/>  drop the CAP_SYS_CHROOT capability
Using this mechanism, users can drop capabilities from, or grant
capabilities beyond, the default set.
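A minimal sketch of how a per-capability state="on"/"off" override could be applied to a default capability mask. All names here are hypothetical, the bit positions are hard-coded stand-ins for the CAP_* constants from <linux/capability.h>, and an unknown or missing state is simply treated as "keep the default" for brevity; real parsing would likely reject unknown values.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit positions; real code would use CAP_SYS_CHROOT and
 * CAP_MKNOD from <linux/capability.h>. */
enum { HYP_CAP_SYS_CHROOT = 18, HYP_CAP_MKNOD = 27 };

/* Tri-state taken from a <capname state="..."/> element. */
typedef enum { CAP_STATE_DEFAULT, CAP_STATE_ON, CAP_STATE_OFF } capState;

static capState
parseCapState(const char *state)
{
    if (state && strcmp(state, "on") == 0)
        return CAP_STATE_ON;
    if (state && strcmp(state, "off") == 0)
        return CAP_STATE_OFF;
    return CAP_STATE_DEFAULT;   /* missing/unknown: leave default alone */
}

/* Apply one override to a capability bitmask. */
static uint64_t
applyCapOverride(uint64_t mask, int cap, capState state)
{
    if (state == CAP_STATE_ON)
        mask |= UINT64_C(1) << cap;      /* keep/grant */
    else if (state == CAP_STATE_OFF)
        mask &= ~(UINT64_C(1) << cap);   /* drop */
    return mask;
}
```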
Commit a48f445100 introduced a helper
function to convert a cgroup device mode to a string. The function was
only compiled on platforms that support cgroups. This broke the
build when attempting to export the symbol:
CCLD libvirt.la
Cannot export virCgroupGetDevicePermsString: symbol not defined
Move the function out of the ifdef, as it doesn't really depend on the
cgroup code being present.
The cgroup code uses VIR_CGROUP_DEVICE_* flags to specify the mode, but in
the end the mode needs to be converted to a string. Add a helper to do
this and use it in the cgroup code before introducing it to the rest of
the code.
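A minimal sketch of such a flags-to-string helper, assuming read, write and mknod are bits 1, 2 and 4 (the flag names below are hypothetical stand-ins for VIR_CGROUP_DEVICE_*). The strings use the 'r', 'w', 'm' letters that the cgroup v1 devices controller accepts, e.g. in "c 1:3 rwm". A static lookup table avoids any allocation.

```c
#include <stddef.h>

/* Assumed flag values, mirroring the devices controller's r/w/m. */
enum {
    DEV_READ  = 1 << 0,
    DEV_WRITE = 1 << 1,
    DEV_MKNOD = 1 << 2,
    DEV_RWM   = DEV_READ | DEV_WRITE | DEV_MKNOD,
};

/* Return a static string such as "rw" for a flag combination,
 * or NULL if bits outside DEV_RWM are set. */
static const char *
devicePermsString(int perms)
{
    static const char *const table[] = {
        "", "r", "w", "rw", "m", "rm", "wm", "rwm",
    };

    if (perms & ~DEV_RWM)
        return NULL;
    return table[perms];
}
```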
In making the conversion to the new API, I fixed a couple of bugs:
virSCSIDeviceGetSgName would leak memory if a directory
unexpectedly contained multiple entries;
virNetDevTapGetRealDeviceName could report a spurious error
from a stale errno inherited before starting the readdir search.
The decision on whether to store the result of virDirRead into
a variable is based on whether the end of the loop falls through
to cleanup code automatically. In some cases, we have loops that
are documented to return NULL on failure, and which raise an
error on most failure paths but not in the case where the directory
was unexpectedly empty; it may be worth a followup patch to
explicitly report an error if readdir was successful but the
directory was empty, so that a NULL return always has an error set.
* src/util/vircgroup.c (virCgroupRemoveRecursively): Use new
interface.
(virCgroupKillRecursiveInternal, virCgroupSetOwner): Report
readdir failures.
* src/util/virfile.c (virFileLoopDeviceOpenSearch)
(virFileNBDDeviceFindUnused, virFileDeleteTree): Use new
interface.
* src/util/virnetdevtap.c (virNetDevTapGetRealDeviceName):
Properly check readdir errors.
* src/util/virpci.c (virPCIDeviceIterDevices)
(virPCIDeviceFileIterate, virPCIGetNetName): Report readdir
failures.
(virPCIDeviceAddressIOMMUGroupIterate): Use new interface.
* src/util/virscsi.c (virSCSIDeviceGetSgName): Report readdir
failures, and avoid memory leak.
(virSCSIDeviceGetDevName): Report readdir failures.
* src/util/virusb.c (virUSBDeviceSearch): Report readdir
failures.
* src/util/virutil.c (virGetFCHostNameByWWN)
(virFindFCHostCapableVport): Report readdir failures.
Signed-off-by: Eric Blake <eblake@redhat.com>
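The stale-errno bug comes from how readdir() signals errors: it returns NULL both at the end of the directory and on failure, so errno must be cleared before each call. A minimal sketch of a wrapper with an assumed 1/0/-1 contract (entry / end / error) in the spirit of virDirRead; the name dirReadSketch and its exact contract are hypothetical, and it does not report errors the way libvirt would.

```c
#include <dirent.h>
#include <errno.h>

/* Hypothetical wrapper: 1 = got an entry, 0 = end of directory,
 * -1 = readdir() failed. */
static int
dirReadSketch(DIR *dir, struct dirent **ent)
{
    errno = 0;                  /* clear any stale errno first */
    *ent = readdir(dir);
    if (!*ent && errno != 0)
        return -1;              /* genuine failure, not end of dir */
    return *ent ? 1 : 0;
}

/* Count directory entries other than "." and "..";
 * returns -1 if opening or reading the directory fails. */
static int
countEntries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    int rc, n = 0;

    if (!dir)
        return -1;
    while ((rc = dirReadSketch(dir, &ent)) > 0) {
        if (ent->d_name[0] == '.' &&
            (ent->d_name[1] == '\0' ||
             (ent->d_name[1] == '.' && ent->d_name[2] == '\0')))
            continue;
        n++;
    }
    closedir(dir);
    return rc < 0 ? -1 : n;
}
```

Storing the return value in rc is what lets the code after the loop distinguish "ran out of entries" from "readdir failed".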
The iterator is checked for being less than or equal to need_cpus.
The 'n' variable is incremented need_cpus + 1 times.
Simplify the computation of need_cpus and make its value one larger,
to let it be used instead of 'n' and compared without the equal sign
in loop conditions.
Just index the sum_cpu_time array instead of using a helper variable.
Start the loop at start_cpu instead of continuing for all lower values.
total_cpus is the total number of CPUs on the host
need_cpus is the number of CPUs we need to look at
(need_cpus can be larger than ncpus, because we need to look
at CPUs before start_cpu too, even if we aren't reporting
their stats)
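A minimal sketch of the loop bounds described above, assuming the per-CPU times arrive as a space-separated string such as the contents of cpuacct.usage_percpu. The function name and signature are hypothetical. need_cpus is one past the last CPU of interest, so loops compare with '<'; values below start_cpu are parsed (to advance the cursor) but not stored.

```c
#include <errno.h>
#include <stdlib.h>

/* Store the times of CPUs [start_cpu, need_cpus) into out[].
 * Returns the number stored, or -1 on a parse error. */
static int
parsePercpuTimes(const char *buf, int total_cpus, int start_cpu,
                 int ncpus, unsigned long long *out)
{
    int need_cpus = start_cpu + ncpus < total_cpus ? start_cpu + ncpus
                                                   : total_cpus;
    const char *pos = buf;
    int i;

    for (i = 0; i < need_cpus; i++) {
        char *end;
        unsigned long long val;

        errno = 0;
        val = strtoull(pos, &end, 10);
        if (end == pos || errno != 0)
            return -1;
        pos = end;
        if (i < start_cpu)
            continue;           /* looked at, but not reported */
        out[i - start_cpu] = val;
    }
    return need_cpus > start_cpu ? need_cpus - start_cpu : 0;
}
```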
Currently, virCgroupGetPercpuStats is only used by the LXC driver,
filling out the CPUTIME stats. qemuDomainGetPercpuStats does this
and also fills out VCPUTIME stats.
Extend virCgroupGetPercpuStats to also report VCPUTIME stats if
nvcpupids is non-zero. In the LXC driver, we don't have cpupids.
In the QEMU driver, there is at least one cpupid for a running domain,
so the behavior shouldn't change for QEMU either.
Also rename getSumVcpuPercpuStats to virCgroupGetPercpuVcpuSum.
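A minimal sketch of the kind of summation a helper like virCgroupGetPercpuVcpuSum performs, assuming each vCPU provides its own space-separated per-CPU usage string (for example read from that vCPU's cgroup). The function below is hypothetical, takes the strings directly instead of reading cgroup files, and expects the caller to zero sum[] beforehand.

```c
#include <errno.h>
#include <stdlib.h>

/* Add every vCPU's per-CPU usage into sum[0..need_cpus).
 * Returns 0 on success, -1 if any string fails to parse. */
static int
sumVcpuPercpu(const char *const *vcpu_bufs, int nvcpus,
              unsigned long long *sum, int need_cpus)
{
    for (int v = 0; v < nvcpus; v++) {
        const char *pos = vcpu_bufs[v];

        for (int i = 0; i < need_cpus; i++) {
            char *end;
            unsigned long long val;

            errno = 0;
            val = strtoull(pos, &end, 10);
            if (end == pos || errno != 0)
                return -1;
            sum[i] += val;      /* time CPU i spent running vCPU v */
            pos = end;
        }
    }
    return 0;
}
```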
When multiple VMs are started concurrently and any of the cgroup
directories named 'machine' does not exist yet, a VM start may fail
because creating the directory failed:
Unable to initialize /machine cgroup: File exists
When the errno returned by mkdir in virCgroupMakeGroup is EEXIST,
we should ignore it and continue starting the VM.
Signed-off-by: Wang Yufei <james.wangyufei@huawei.com>
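A minimal sketch of the race-tolerant mkdir described above; makeGroupDir is a hypothetical name, not the libvirt function. If another VM start creates the directory between our existence check and our own mkdir(), EEXIST is treated as success.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a cgroup directory; "already exists" counts as success
 * because a concurrent VM start may have won the race. */
static int
makeGroupDir(const char *path)
{
    if (mkdir(path, 0755) < 0 && errno != EEXIST)
        return -1;
    return 0;
}
```

Note that EEXIST is also returned when a non-directory file occupies the path; the sketch does not distinguish that case.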
Any source file which calls the logging APIs now needs
to have a VIR_LOG_INIT("source.name") declaration at
the start of the file. This provides a static variable
of the virLogSource type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
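A minimal sketch of the pattern: a per-file macro that defines a static log source carrying the file's name, which the file's logging calls then use implicitly. SKETCH_LOG_INIT, logSource and logFormat are hypothetical stand-ins, not libvirt's virLogSource or VIR_LOG_INIT definitions; the "" n "" trick only forces the argument to be a string literal.

```c
#include <stdio.h>

/* Hypothetical stand-in for virLogSource. */
typedef struct {
    const char *name;
} logSource;

/* One static log source per translation unit. */
#define SKETCH_LOG_INIT(n) \
    static logSource logSelf = { .name = "" n "" }

SKETCH_LOG_INIT("util.cgroup");

/* Format a message prefixed with this file's log source name. */
static int
logFormat(char *buf, size_t len, const char *msg)
{
    return snprintf(buf, len, "%s: %s", logSelf.name, msg);
}
```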
Commit a1cbe4b5 added a check for spaces around assignments and this
patch extends it to check for spaces around '=='. One exception is
virAssertCmpInt, where a comma after '==' is acceptable (since it is a
macro and '==' is its argument).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Running ./autobuild.sh detected a mingw failure:
CCLD libvirt.la
Cannot export virCgroupGetPercpuStats: symbol not defined
Cannot export virCgroupSetOwner: symbol not defined
* src/util/vircgroup.c (virCgroupGetPercpuStats)
(virCgroupSetOwner): Implement stubs.
Signed-off-by: Eric Blake <eblake@redhat.com>
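The mingw failure happens because an exported symbol must exist on every platform. A minimal sketch of the stub pattern, assuming a simple __linux__ test as the feature switch (SKETCH_CGROUP_SUPPORTED and cgroupSetOwnerSketch are hypothetical; libvirt uses its own conditional). The real branch is reduced to a single chown() for brevity.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifdef __linux__
# define SKETCH_CGROUP_SUPPORTED 1
#endif

#ifdef SKETCH_CGROUP_SUPPORTED
int
cgroupSetOwnerSketch(const char *path, uid_t uid, gid_t gid)
{
    /* Simplified real implementation. */
    return chown(path, uid, gid);
}
#else /* !SKETCH_CGROUP_SUPPORTED */
int
cgroupSetOwnerSketch(const char *path, uid_t uid, gid_t gid)
{
    /* Stub: keeps the symbol exportable, always fails. */
    (void)path;
    (void)uid;
    (void)gid;
    errno = ENOSYS;
    fprintf(stderr, "Control groups not supported on this platform\n");
    return -1;
}
#endif
```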
This function is needed for user namespaces, where we need to chown()
the cgroup to the initial uid/gid so that systemd is allowed to
use the cgroup.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 6515889 broke the build on FreeBSD:
In function `qemuDomainGetCPUStats':
/../../src/qemu/qemu_driver.c:16102:
undefined reference to `virCgroupGetDomainTotalCpuStats'
This patch introduces virCgroupSetBlkioDeviceReadIops,
virCgroupSetBlkioDeviceWriteIops,
virCgroupSetBlkioDeviceReadBps and
virCgroupSetBlkioDeviceWriteBps;
these interfaces can be used to set up blkio throttling
in a domain's cgroup.
This patch also adds the new blkio throttle
elements to the test XML.
Signed-off-by: Guan Qiang <hzguanqiang@corp.netease.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
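A minimal sketch of what such a setter has to produce, assuming the cgroup v1 throttle files (e.g. blkio.throttle.read_iops_device) take lines of the form "<major>:<minor> <value>". Both helpers are hypothetical; major() and minor() come from <sys/sysmacros.h> on glibc.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() on glibc */

/* Format one throttle entry, e.g. "8:0 1000". */
static int
formatThrottleEntry(char *buf, size_t len, unsigned int maj,
                    unsigned int min, unsigned long long value)
{
    return snprintf(buf, len, "%u:%u %llu", maj, min, value);
}

/* Look up the device numbers of a block device node such as /dev/sda;
 * fails for anything that is not a block device. */
static int
blockDevNumbers(const char *devpath, unsigned int *maj, unsigned int *min)
{
    struct stat sb;

    if (stat(devpath, &sb) < 0 || !S_ISBLK(sb.st_mode))
        return -1;
    *maj = major(sb.st_rdev);
    *min = minor(sb.st_rdev);
    return 0;
}
```

A caller would then write the formatted string into the corresponding throttle file of the domain's cgroup.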
Since kernel 3.12 (commit 34ff8dc08956098563989d8599840b130be81252 in
linux-stable.git in particular) the value for 'unlimited' in cgroup
memory limits changed from LLONG_MAX to ULLONG_MAX. Due to the rather
unfortunate choice of our VIR_DOMAIN_MEMORY_PARAM_UNLIMITED constant
(which we transfer as an unsigned long long in Kibibytes), we ended up
with the situation described below (applies to x86_64):
- 2^64-1 (ULLONG_MAX) -- "unlimited" in kernel >= 3.12
- 2^63-1 (LLONG_MAX) -- "unlimited" in kernel < 3.12
- 2^63-1024 -- our PARAM_UNLIMITED scaled to Bytes
- 2^53-1 -- our PARAM_UNLIMITED unscaled (in Kibibytes)
This means that when any number within (2^63-1, 2^64-1] is read from
memory cgroup, we are transferring that number instead of "unlimited".
Unfortunately, changing VIR_DOMAIN_MEMORY_PARAM_UNLIMITED would break
ABI compatibility and thus we have to resort to a different solution.
With this patch, every value greater than PARAM_UNLIMITED means
"unlimited". Even though this may seem misleading, we are already in
such an unclear situation when running a 3.12 kernel with memory limits
set to 2^63.
One example showing most of the problems at once (with kernel 3.12.2):
# virsh memtune asdf --hard-limit 9007199254740991 --swap-hard-limit -1
# echo 12345678901234567890 >\
/sys/fs/cgroup/memory/machine/asdf.libvirt-qemu/memory.soft_limit_in_bytes
# virsh memtune asdf
hard_limit : 18014398509481983
soft_limit : 12056327051986884
swap_hard_limit: 18014398509481983
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
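A minimal sketch of the clamping rule, assuming the value read from the cgroup is in bytes and is scaled to KiB by shifting right by 10. The constant 9007199254740991 is 2^53-1, the PARAM_UNLIMITED value listed above; the macro and function names are hypothetical. The static assertion checks that LLONG_MAX scaled to KiB equals exactly that constant.

```c
#include <limits.h>

/* 2^53-1 KiB, i.e. the unscaled PARAM_UNLIMITED from the list above. */
#define SKETCH_PARAM_UNLIMITED 9007199254740991ULL

_Static_assert(SKETCH_PARAM_UNLIMITED == (unsigned long long)LLONG_MAX >> 10,
               "LLONG_MAX bytes scales to PARAM_UNLIMITED KiB");

/* Convert a memory limit in bytes to KiB, mapping anything above
 * PARAM_UNLIMITED (e.g. ULLONG_MAX on >= 3.12) to "unlimited". */
static unsigned long long
limitBytesToKiB(unsigned long long bytes)
{
    unsigned long long kib = bytes >> 10;

    return kib > SKETCH_PARAM_UNLIMITED ? SKETCH_PARAM_UNLIMITED : kib;
}
```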
The range of valid values for cgroup tunables has
changed in the past and may change again in future
kernels. Avoid hardcoding range checks in libvirt
code, delegating range checking to the kernel itself.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
When EINVAL is returned while changing a cgroup value, tell the
user which value is invalid for which field.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
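Taken together, the two commits above mean libvirt writes the value unchecked and interprets the kernel's answer. A minimal sketch, assuming a plain open()/write() on the tunable file; setCgroupValue and the message wording are hypothetical, and treating a short write as EIO is a choice made only for this sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write value to a cgroup tunable; the kernel does the range check. */
static int
setCgroupValue(const char *file, const char *value)
{
    size_t len = strlen(value);
    ssize_t n;
    int err;
    int fd = open(file, O_WRONLY | O_TRUNC);

    if (fd < 0) {
        fprintf(stderr, "Unable to open '%s': %s\n", file, strerror(errno));
        return -1;
    }
    n = write(fd, value, len);
    err = errno;                 /* save before close() can clobber it */
    close(fd);
    if (n == (ssize_t)len)
        return 0;
    if (n >= 0)
        err = EIO;               /* short write (sketch's choice) */
    if (err == EINVAL)
        fprintf(stderr, "Invalid value '%s' for '%s'\n", value, file);
    else
        fprintf(stderr, "Unable to write '%s' to '%s': %s\n",
                value, file, strerror(err));
    return -1;
}
```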
Debian systems may run the 'systemd-logind' daemon, which causes the
/sys/fs/cgroup/systemd mount to be set up, but no other cgroup
controllers are created. While the LXC driver considers cgroups to
be mandatory, the QEMU driver is supposed to accept them as optional.
We detect whether they are present by looking in /proc/mounts for
any mounts of type 'cgroup', but this is not sufficient. We need to
skip any named mounts (as seen by a name=XXX string in the mount
options), so that we only detect actual resource controllers.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=721979
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
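A minimal sketch of the detection rule, using glibc's setmntent()/getmntent() on a mounts table: only mounts of type "cgroup" count, and any mount whose options contain "name=" is skipped as a named hierarchy. countResourceCgroupMounts is a hypothetical name; the path is a parameter so the rule can be exercised against a sample file.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Count cgroup mounts that carry resource controllers.
 * Returns -1 if the mounts table cannot be opened. */
static int
countResourceCgroupMounts(const char *mtab)
{
    FILE *fp = setmntent(mtab, "r");
    struct mntent *ent;
    int n = 0;

    if (!fp)
        return -1;
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_type, "cgroup") != 0)
            continue;
        if (strstr(ent->mnt_opts, "name=") != NULL)
            continue;           /* e.g. name=systemd: no controllers */
        n++;
    }
    endmntent(fp);
    return n;
}
```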
Some users in Ubuntu/Debian seem to have a setup where all the
cgroup controllers are mounted on /sys/fs/cgroup rather than
any /sys/fs/cgroup/<controller> name. In the loop which detects
which controllers are present for a mount point, we were modifying
the 'mnt_dir' field of the 'struct mntent' variable, but not always
restoring the original value. This caused detection to break in the all-in-one
mount setup.
Fix that logic bug and add test case coverage for this mount
setup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
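A minimal sketch of the safe version of that pattern: truncate the mount directory in place to build a sibling path, then restore the '/' on every path before returning, including the error path. siblingPath is a hypothetical helper written to illustrate the fix, not the actual libvirt code.

```c
#include <stdio.h>
#include <string.h>

/* Build "<parent of mnt_dir>/<name>" into out. mnt_dir is modified
 * temporarily and always restored before returning. */
static int
siblingPath(char *mnt_dir, const char *name, char *out, size_t outlen)
{
    char *slash = strrchr(mnt_dir, '/');
    int n;

    if (!slash)
        return -1;
    *slash = '\0';
    n = snprintf(out, outlen, "%s/%s", mnt_dir, name);
    *slash = '/';               /* restore before checking the result */
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```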
- Convert virCgroupGet* to VIR_CGROUP_SUPPORTED
- Convert virCgroup(Get|Set)FreezerState to VIR_CGROUP_SUPPORTED
Signed-off-by: Eric Blake <eblake@redhat.com>
- Introduce VIR_CGROUP_SUPPORTED conditional
- Convert virCgroupKill* to use it
- Convert virCgroupIsolateMount() to use it
- Convert virCgroupRemoveRecursively to VIR_CGROUP_SUPPORTED
Signed-off-by: Eric Blake <eblake@redhat.com>
Make future patches smaller by matching a sane header listing in
the first place. No semantic change.
* src/util/vircgroup.h: Move free next to new, and controller
functions next to each other.
* src/util/vircgroup.c (virCgroupFree, virCgroupHasController)
(virCgroupPathOfController, virCgroupRemoveRecursively)
(virCgroupRemove): Sort implementation to be closer to header.
Signed-off-by: Eric Blake <eblake@redhat.com>
Avoid a forward declaration of a static function.
* src/util/vircgroup.c (virCgroupPartitionNeedsEscaping)
(virCgroupPartitionEscape): Move up.
Signed-off-by: Eric Blake <eblake@redhat.com>
Format all functions with two blank lines between them, and the return
type on a separate line from the function name. Also break some lines
longer than 80 columns. This makes the subsequent macro refactoring
less noisy.
* src/util/vircgroup.c: Match prevailing style.
Signed-off-by: Eric Blake <eblake@redhat.com>
Make the virCgroupNewMachine method try to use systemd-machined
first. If that fails, then fall back to using the traditional
cgroup setup code path.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
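A minimal sketch of the try-then-fall-back control flow, with the two strategies passed in as function pointers so the logic can be shown on its own. All names are hypothetical; the two stub strategies exist only to demonstrate the flow, and, as the text says, any failure of the first method triggers the fallback.

```c
/* A machine-creation strategy: 0 on success, -1 on failure. */
typedef int (*createMachineFn)(const char *name);

/* Try systemd-machined first, then the traditional cgroup path. */
static int
newMachineSketch(const char *name, createMachineFn viaMachined,
                 createMachineFn viaCgroups)
{
    if (viaMachined(name) == 0)
        return 0;
    return viaCgroups(name);
}

/* Demo stubs standing in for the real code paths. */
static int
stubFail(const char *name)
{
    (void)name;
    return -1;
}

static int
stubOk(const char *name)
{
    (void)name;
    return 0;
}
```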
When systemd is involved in managing processes, it may start
killing off & tearing down cgroups associated with the process
while we're still doing virCgroupKillPainfully. We must
explicitly check for ENOENT and treat it as if we had finished
killing processes.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
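A minimal sketch of the ENOENT rule, assuming the kill loop reads the PIDs still listed in a cgroup "tasks" file. countRemainingTasks is a hypothetical helper: if the file has vanished because systemd already removed the cgroup, it reports zero remaining tasks instead of an error.

```c
#include <errno.h>
#include <stdio.h>

/* Count PIDs listed in a cgroup tasks file. A missing file means the
 * cgroup is gone, so there is nothing left to kill. */
static int
countRemainingTasks(const char *tasksPath)
{
    FILE *fp = fopen(tasksPath, "r");
    long pid;
    int n = 0;

    if (!fp)
        return errno == ENOENT ? 0 : -1;
    while (fscanf(fp, "%ld", &pid) == 1)
        n++;
    fclose(fp);
    return n;
}
```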
Systemd uses a named cgroup mount for tracking processes. Add
it as another type of controller, albeit one which we have to
special case in a number of places. In particular we must
never create/delete directories there, nor add tasks. Essentially
the systemd mount is to be considered read-only for libvirt.
With this change both the virCgroupDetectPlacement and
virCgroupCopyPlacement methods must be invoked. The copy
placement method will copy setup for resource controllers
only. The detect placement method will probe for any
named controllers, or resource controllers not already
setup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
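A minimal sketch of treating the systemd hierarchy as a read-only named controller alongside the resource controllers. The controller list is illustrative and the helper names are hypothetical; the point is the guard that any code creating or deleting directories, or adding tasks, would consult first.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative controller table; the last entry is systemd's named
 * hierarchy, which must only ever be read. */
static const char *const controllers[] = {
    "cpu", "cpuacct", "cpuset", "memory", "devices",
    "freezer", "blkio", "net_cls", "perf_event", "name=systemd",
};

static bool
isNamedController(const char *name)
{
    return strncmp(name, "name=", 5) == 0;
}

/* Guard before mkdir/rmdir or writing to "tasks". */
static bool
mayModifyController(const char *name)
{
    return !isNamedController(name);
}

static size_t
countModifiableControllers(void)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof(controllers) / sizeof(controllers[0]); i++)
        if (mayModifyController(controllers[i]))
            n++;
    return n;
}
```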
The previous patch was incomplete.
CC libvirt_util_la-vircgroup.lo
../../src/util/vircgroup.c:70:12: error: 'virCgroupPartitionEscape' declared 'static' but never defined [-Werror=unused-function]
static int virCgroupPartitionEscape(char **path);
^
* src/util/vircgroup.c (virCgroupPartitionEscape): Move forward
declaration inside conditional.
Signed-off-by: Eric Blake <eblake@redhat.com>
The virCgroupValidateMachineGroup method calls some functions
which are only conditionally compiled, thus it too must be
made conditional. This fixes the build on non-Linux hosts.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>