This reverts commit 0f80c71822.
Turns out, our code relies on virCgroupFree(&var) setting
var = NULL.
Conflicts:
src/util/vircgroup.c: context because 94f1855f09 is not
reverted.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Modify virCgroupFree function signature to take a value of type
virCgroupPtr instead of virCgroupPtr * as the parameter.
Change the argument type in all calls to virCgroupFree function
from virCgroupPtr * to virCgroupPtr. This is a step towards
having consistent function signatures for Free helpers so that
they can be used with VIR_AUTOPTR cleanup macro.
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Right-aligning backslashes when defining macros or using complex
commands in Makefiles looks cute, but as soon as any changes is
required to the code you end up with either distractingly broken
alignment or unnecessarily big diffs where most of the changes
are just pushing all backslashes a few characters to one side.
Generated using
$ git grep -El '[[:blank:]][[:blank:]]\\$' | \
grep -E '*\.([chx]|am|mk)$$' | \
while read f; do \
sed -Ei 's/[[:blank:]]*[[:blank:]]\\$/ \\/g' "$f"; \
done
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Currently the scan of the /proc/mounts file used to find cgroup mount
points doesn't take into account that mount points may hidden by other
mount points. For, example in certain Kubernetes environments the
/proc/mounts contains the following lines:
cgroup /sys/fs/cgroup/net_prio,net_cls cgroup ...
tmpfs /sys/fs/cgroup tmpfs ...
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup ...
In this particular environment the first mount point is hidden by the
second one. The correct mount point is the third one, but libvirt will
never process it because it only checks the first mount point for each
controller (net_cls in this case). So libvirt will try to use the first
mount point, which doesn't actually exist, and the complete detection
process will fail.
To avoid that issue this patch changes the virCgroupDetectMountsFromFile
function so that when there are duplicates it takes the information from
the last line in /proc/mounts. This requires removing the previous
explicit condition to skip duplicates, and adding code to free the
memory used by the processing of duplicated lines.
Related-To: https://bugzilla.redhat.com/1468214
Related-To: https://github.com/kubevirt/libvirt/issues/4
Signed-off-by: Juan Hernandez <jhernand@redhat.com>
Move all APIs with a virHostCPU name prefix out into new
util/virhostcpu.h & util/virhostcpu.c files
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
In preparation for moving all the CPU related APIs out of
the nodeinfo file, give them a virHostCPU name prefix.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Nearly all the methods in the nodeinfo file are given a
'const char *sysfs_prefix' parameter to override the
default sysfs path (/sys/devices/system). Every single
caller passes in NULL for this, except one use in the
unit tests. Furthermore this parameter is totally
Linux-specific, when the APIs are intended to be cross
platform portable.
This removes the sysfs_prefix parameter and instead gives
a new method linuxNodeInfoSetSysFSSystemPath for use by
the test suite.
For two of the methods this hardcodes use of the constant
SYSFS_SYSTEM_PATH, since the test suite does not need to
override the path for thos methods.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We have macros for both positive and negative string matching.
Therefore there is no need to use !STREQ or !STRNEQ. At the same
time as we are dropping this, new syntax-check rule is
introduced to make sure we won't introduce it again.
Signed-off-by: Ishmanpreet Kaur Khera <khera.ishman@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
gcc 4.1.2 (hello RHEL 5) on 32-bit platforms complains:
vircgrouptest.c: In function 'testCgroupGetPercpuStats':
vircgrouptest.c:627: warning: integer constant is too large for 'long' type
vircgrouptest.c:628: warning: this decimal constant is unsigned only in ISO C90
vircgrouptest.c:634: warning: integer constant is too large for 'long' type
vircgrouptest.c:635: warning: this decimal constant is unsigned only in ISO C90
vircgrouptest.c:636: warning: this decimal constant is unsigned only in ISO C90
vircgrouptest.c:644: warning: integer constant is too large for 'long' type
* tests/vircgrouptest.c (testCgroupGetPercpuStats): Use ULL suffix.
Signed-off-by: Eric Blake <eblake@redhat.com>
This new internal API checks if given CGroup controller is
available. It is going to be needed later when we need to make a
decision whether pin domain memory onto NUMA nodes using cpuset
CGroup controller or using numa_set_membind().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Coverity reports that my commit af1c98e introduced
two memory leaks:
the cpumap if ncpus == 0 in virCgroupGetPercpuStats
and the params array in the test of the function.
My commit af1c98e4 broke the build on RHEL-6:
vircgrouptest.c: In function 'testCgroupGetPercpuStats':
vircgrouptest.c:566: error: nested extern declaration of
'_gl_verify_function2' [-Wnested-externs]
The only thing that needs checking is that the array size
is at least EXPECTED_NCPUS, to prevent access beyond the array.
We can ensure the minimum size also by specifying the array
size upfront.
Per-cpu stats are only shown for present CPUs in the cgroups,
but we were only parsing the largest CPU number from
/sys/devices/system/cpu/present and looking for stats even for
non-present CPUs.
This resulted in:
internal error: cpuacct parse error
Currently, virCgroupGetPercpuStats is only used by the LXC driver,
filling out the CPUTIME stats. qemuDomainGetPercpuStats does this
and also filles out VCPUTIME stats.
Extend virCgroupGetPercpuStats to also report VCPUTIME stats if
nvcpupids is non-zero. In the LXC driver, we don't have cpupids.
In the QEMU driver, there is at least one cpupid for a running domain,
so the behavior shouldn't change for QEMU either.
Also rename getSumVcpuPercpuStats to virCgroupGetPercpuVcpuSum.
Any source file which calls the logging APIs now needs
to have a VIR_LOG_INIT("source.name") declaration at
the start of the file. This provides a static variable
of the virLogSource type.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit a1cbe4b5 added a check for spaces around assignments and this
patch extends it to checks for spaces around '=='. One exception is
virAssertCmpInt where comma after '==' is acceptable (since it is a
macro and '==' is its argument).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
vircgrouptest.c: In function 'testCgroupGetPercpuStats':
vircgrouptest.c:543: warning: integer constatnt is too large for 'long' type
Signed-off-by: Eric Blake <eblake@redhat.com>
The test case average timing code has not been used by any test
case ever. Delete it to remove complexity.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Debian systems may run the 'systemd-logind' daemon, which causes the
/sys/fs/cgroup/systemd mount to be setup, but no other cgroup
controllers are created. While the LXC driver considers cgroups to
be mandatory, the QEMU driver is supposed to accept them as optional.
We detect whether they are present by looking in /proc/mounts for
any mounts of type 'cgroups', but this is not sufficient. We need to
skip any named mounts (as seen by a name=XXX string in the mount
options), so that we only detect actual resource controllers.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=721979
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some users in Ubuntu/Debian seem to have a setup where all the
cgroup controllers are mounted on /sys/fs/cgroup rather than
any /sys/fs/cgroup/<controller> name. In the loop which detects
which controllers are present for a mount point we were modifying
'mnt_dir' field in the 'struct mntent' var, but not always restoring
the original value. This caused detection to break in the all-in-one
mount setup.
Fix that logic bug and add test case coverage for this mount
setup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Systemd uses a named cgroup mount for tracking processes. Add
it as another type of controller, albeit one which we have to
special case in a number of places. In particular we must
never create/delete directories there, nor add tasks. Essentially
the systemd mount is to be considered read-only for libvirt.
With this change both the virCgroupDetectPlacement and
virCgroupCopyPlacement methods must be invoked. The copy
placement method will copy setup for resource controllers
only. The detect placement method will probe for any
named controllers, or resource controllers not already
setup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupNewDomainDriver and virCgroupNewDriver methods
are obsolete now that we can auto-detect existing cgroup
placement. Delete them to reduce code bloat.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Convert the type of loop iterators named 'i', 'j', k',
'ii', 'jj', 'kk', to be 'size_t' instead of 'int' or
'unsigned int', also santizing 'ii', 'jj', 'kk' to use
the normal 'i', 'j', 'k' naming
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, the controllers argument to virCgroupDetect acts both as
a result filter and a required controller specification, which is
a bit overloaded. If both functionalities are needed, it would be
better to have them seperated into a filter and a requirement mask.
The only situation where it is used today is to ensure that only
CPU related controllers are used for the VCPU directories. But here
we clearly do not want to enforce the existence of cpu, cpuacct and
specifically not cpuset at the same time.
This commit changes the semantics of controllers to "filter only".
Should a required mask ever be needed, more work will have to be done.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
The source code base needs to be adapted as well. Some files
include virutil.h just for the string related functions (here,
the include is substituted to match the new file), some include
virutil.h without any need (here, the include is removed), and
some require both.
If a user cgroup name begins with "cgroup.", "_" or with any of
the controllers from /proc/cgroups followed by a dot, then they
need to be prefixed with a single underscore. eg if there is
an object "cpu.service", then this would end up as "_cpu.service"
in the cgroup filesystem tree, however, "waldo.service" would
stay "waldo.service", at least as long as nobody comes up with
a cgroup controller called "waldo".
Since we require a '.XXXX' suffix on all partitions, there is
no scope for clashing with the kernel 'tasks' and 'release_agent'
files.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If the partition named passed in the XML does not already have
a suffix, ensure it gets a '.partition' added to each component.
The exceptions are /machine, /user and /system which do not need
to have a suffix, since they are fixed partitions at the top
level.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Recently we changed to create VM cgroups with the naming pattern
$VMNAME.$DRIVER.libvirt. Following discussions with the systemd
community it was decided that only having a single '.' in the
names is preferrable. So this changes the naming scheme to be
$VMNAME.libvirt-$DRIVER. eg for LXC 'mycontainer.libvirt-lxc' or
for KVM 'myvm.libvirt-qemu'.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
If a cgroup controller is co-mounted with another, eg
/sys/fs/cgroup/cpu,cpuacct
Then it is a requirement that there exist symlinks at
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
pointing to the real mount point. Add support to virCgroupPtr
to detect and track these symlinks
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The virCgroupNewDriver method had a 'bool privileged' param.
If a false value was ever passed in, it would simply not
work, since non-root users don't have any privileges to create
new cgroups. Just delete this broken code entirely and make
the QEMU driver skip cgroup setup in non-privileged mode
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
A resource partition is an absolute cgroup path, ignoring the
current process placement. Expose a virCgroupNewPartition API
for constructing such cgroups
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently the virCgroupPtr struct contains 3 pieces of
information
- path - path of the cgroup, relative to current process'
cgroup placement
- placement - current process' placement in each controller
- mounts - mount point of each controller
When reading/writing cgroup settings, the path & placement
strings are combined to form the file path. This approach
only works if we assume all cgroups will be relative to
the current process' cgroup placement.
To allow support for managing cgroups at any place in the
heirarchy a change is needed. The 'placement' data should
reflect the absolute path to the cgroup, and the 'path'
value should no longer be used to form the paths to the
cgroup attribute files.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some aspects of the cgroups setup / detection code are quite subtle
and easy to break. It would greatly benefit from unit testing, but
this is difficult because the test suite won't have privileges to
play around with cgroups. The solution is to use monkey patching
via LD_PRELOAD to override the fopen, open, mkdir, access functions
to redirect access of cgroups files to some magic stubs in the
test suite.
Using this we provide custom content for the /proc/cgroup and
/proc/self/mounts files which report a fixed cgroup setup. We
then override open/mkdir/access so that access to the cgroups
filesystem gets redirected into files in a temporary directory
tree in the test suite build dir.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>