Virsh freecell --all was not only getting wrong NUMA nodes count, but
even the NUMA nodes IDs. They doesn't have to be continuous, as I've
found out during testing this. Therefore a modification of
nodeGetCellsFreeMemory() error message.
When running 'make check' under a multi-cpu Dom0 xen machine,
nodeinfotest had a spurious failure it was reading from
/sys/devices/system/cpu, but xen has no notion of topology. The test
was intended to be isolated from reading any real system files; the
regression was introduced in Mar 2010 with commit aa2f6f96dd.
Fix things by allowing an early exit for the testsuite.
* src/nodeinfo.c (linuxNodeInfoCPUPopulate): Add parameter.
(nodeGetInfo): Adjust caller.
* tests/nodeinfotest.c (linuxTestCompareFiles): Likewise.
The nodeinfo structure includes
nodes : the number of NUMA cell, 1 for uniform mem access
sockets : number of CPU socket per node
cores : number of core per socket
threads : number of threads per core
which does not work well for NUMA topologies where each node does not
consist of integral number of CPU sockets.
We also have VIR_NODEINFO_MAXCPUS macro in public libvirt.h which
computes maximum number of CPUs as (nodes * sockets * cores * threads).
As a result, we can't just change sockets to report total number of
sockets instead of sockets per node. This would probably be the easiest
since I doubt anyone is using the field directly. But because of the
macro, some apps might be using sockets indirectly.
This patch leaves sockets to be the number of CPU sockets per node (and
fixes qemu driver to comply with this) on machines where sockets can be
divided by nodes. If we can't divide sockets by nodes, we behave as if
there was just one NUMA node containing all sockets. Apps interested in
NUMA should consult capabilities XML, which is what they probably do
anyway.
This way, the only case in which apps that care about NUMA may break is
on machines with funky NUMA topology. And there is a chance libvirt
wasn't able to start any guests on those machines anyway (although it
depends on the topology, total number of CPUs and kernel version).
Nothing changes at all for apps that don't care about NUMA.
Similarly to deprecating close(), I am now deprecating fclose() and
introduce VIR_FORCE_FCLOSE() and VIR_FCLOSE(). Also, fdopen() is replaced with
VIR_FDOPEN().
Most of the files are opened in read-only mode, so usage of
VIR_FORCE_CLOSE() seemed appropriate. Others that are opened in write
mode already had the fclose()< 0 check and I converted those to
VIR_FCLOSE()< 0.
I did not find occurrences of possible double-closed files on the way.
When finding a sparse NUMA topology, libnuma will return ENOENT
the first time it is invoked. On subsequent invocations it
will return success, but with an all-1's CPU mask. Check for
this, to avoid polluting the capabilities XML with 4096 bogus
CPUs
* src/nodeinfo.c: Check for all-1s CPU mask
https://bugzilla.redhat.com/622515 - When hot-unplugging CPUs,
libvirt failed to start a guest that had been pinned to CPUs that
were still online.
Tested on a dual-core laptop, where I also discovered that, per
http://www.cyberciti.biz/files/linux-kernel/Documentation/cpu-hotplug.txt,
/sys/devices/system/cpu/cpu0/online does not exist on systems where it
cannot be hot-unplugged.
* src/nodeinfo.c (linuxNodeInfoCPUPopulate): Ignore CPUs that are
currently offline. Detect readdir failure.
(parse_socket): Move guts...
(get_cpu_value): ...to new function, shared with...
(cpu_online): New function.
The nodeGetInfo code was always assuming that machine had a
single NUMA node, which is not correct. The good news is that
libnuma gives us this information pretty easily, so let's
properly report it.
NOTE: With recent hardware starting to support CPU hot-add
and hot-remove, both this code and the nodeCapsInitNUMA()
code are quickly going to become obsolete. We'll have to
think of a more dynamic solution for dealing with NUMA
nodes and CPUs that can come and go at will.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
As pointed out by eblake, I made a real hash of the
nodeinfo code with commit
aa2f6f96ddd7a57011c3d25586d588100651feb2. This patch
cleans it up:
1) Do more work at compile time instead of runtime (minor)
2) Properly handle the hex digits that come from
/sys/devices/system/cpu/cpu*/topology/thread_siblings
3) Fix up some error paths that could cause SEGV
4) Used unsigned's for the cpu numbers (cpu -1 doesn't
make any sense)
Along with the recent patch from jdenemar to zero out
the nodeinfo structure, I've re-tested this on the
machines having the problems, and it seems to be good.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
The nodeinfo structure wasn't initialized in qemu driver and with the
recent change in CPU topology parsing, old value of nodeinfo->sockets
could be used and incremented giving totally bogus results.
Let's just wipe the structure completely.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The current code for "nodeinfo" is pretty naive
about socket and thread information. To determine the
sockets, it just takes the number of cpus and divides
by the number of cores. For the thread count, it always
sets it to 1. With more recent Intel machines, however,
hyperthreading is again an option, meaning that these
heuristics no longer work and give bogus numbers. This
patch goes through /sys to get the additional
information so we properly report it.
Note that I had to edit the tests not to report on
socket and thread counts, since these are determined
dynamically now.
v2: As pointed out by Eric Blake, gnulib provides
count-one-bits (which is LGPLv2+). Use it instead
of a hand-coded popcnt.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Add the virStrncpy function, which takes a dst string, source string,
the number of bytes to copy and the number of bytes available in the
dest string. If the source string is too large to fit into the
destination string, including the \0 byte, then no data is copied and
the function returns NULL. Otherwise, this function copies n bytes
from source into dst, including the \0, and returns a pointer to the
dst string. This function is intended to replace all unsafe uses
of strncpy in the code base, since strncpy does *not* guarantee that
the buffer terminates with a \0.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
by running this command:
git ls-files -z | xargs -0 perl -pi -0777 -e 's/\n\n+$/\n/'
This is in preparation for a more strict make syntax-check
rule that will detect trailing blank lines.
qemudCapsInitNUMA and umlCapsInitNUMA were identical, so this change
factors them into a new function, virCapsInitNUMA, and puts it in
nodeinfo.c.
In addition to factoring out the duplicates, this change also
adjusts that function definition (along with its macros) so
that it works with Fedora 9's numactl version 1, and makes it
so the code will work even if someone builds the kernel with
CONFIG_NR_CPUS > 4096.
Finally, also perform this NUMA initialization for the lxc
and openvz drivers.
* src/nodeinfo.c: Include <stdint.h>, <numa.h> and "memory.h".
(virCapsInitNUMA): Rename from qemudCapsInitNUMA and umlCapsInitNUMA.
(NUMA_MAX_N_CPUS): Define depending on NUMA API version.
(n_bits, MASK_CPU_ISSET): Define, adjust, use uint64 rather than long.
* src/nodeinfo.h: Include "capabilities.h".
(virCapsInitNUMA): Declare it.
* examples/domain-events/events-c/Makefile.am:
* src/Makefile.am: Add $(NUMACTL_CFLAGS) and $(NUMACTL_LIBS) to various
compile/link-related variables.
* src/qemu_conf.c: Include "nodeinfo.h".
(qemudCapsInitNUMA): Remove duplicate code. Adjust caller.
* src/uml_conf.c (umlCapsInitNUMA): Likewise.
Include "nodeinfo.h".
* src/lxc_conf.c: Include "nodeinfo.h".
(lxcCapsInit): Perform NUMA initialization here, too.
* src/openvz_conf.c (openvzCapsInit): And here.
Include "nodeinfo.h".
* src/libvirt_sym.version.in: Add virCapsInitNUMA so that libvirtd
can link to this function.
Up to now, we've been avoiding ctype functions like isspace, isdigit,
etc. because they are locale-dependent. Now that we have the c-ctype
functions, we can start using *them*, to make the code more readable
with changes like these:
- /* This may not work on EBCDIC. */
- if ((*p >= 'a' && *p <= 'z') ||
- (*p >= 'A' && *p <= 'Z') ||
- (*p >= '0' && *p <= '9'))
+ if (c_isalnum(*p))
- while ((*cur >= '0') && (*cur <= '9')) {
+ while (c_isdigit(*cur)) {
Also, some macros in conf.c used names that conflicted with
standard meaning of "BLANK" and "SPACE", so I've adjusted them
to be in line with the definition of e.g., isblank.
In addition, I've wrapped those statement macros with do {...} while (0),
so that we can't forget the ";" after a use. There was one like that
already (fixed below). The missing semicolon would mess up automatic
indenting.
* src/buf.c (virBufferURIEncodeString):
* src/conf.c (IS_EOL, SKIP_BLANKS_AND_EOL, SKIP_BLANKS)
(virConfParseLong, virConfParseValue, virConfParseName)
(virConfParseSeparator, virConfParseStatement, IS_BLANK, IS_CHAR)
(IS_DIGIT, IS_SPACE, SKIP_SPACES):
* src/nodeinfo.c:
* src/qemu_conf.c (qemudParseInterfaceXML):
* src/qemu_driver.c (qemudDomainBlockStats):
* src/sexpr.c:
* src/stats_linux.c:
* src/util.c (virParseNumber, virDiskNameToIndex):
* src/uuid.c (hextobin, virUUIDParse):
* src/virsh.c:
* src/xml.c (parseCpuNumber, virParseCpuSet):
# Convert uses of isspace to c_isspace, isdigit to c_isdigit, etc.
re=$(man isspace|grep is.....,.is|sed 's/ -.*//' \
|tr -s ', \n' \||sed 's/^|//;s/|$//')
git grep -l -E "$re"|grep -Ev 'Chan|gnulib' \
|xargs perl -pi -e 's/\b('"$re"')\b/c_$1/g'
# Remove all uses of to_uchar
git grep -l to_uchar|xargs perl -pi -e 's/to_uchar\((.*?)\)/$1/g'
* src/util.h (to_uchar): Remove definition.
(TOLOWER): Remove definition.
(__virMacAddrCompare): Use c_tolower, not TOLOWER.
Globally:
Where needed, change <ctype.h> to <c-ctype.h>.
Remove unnecessary inclusion of <ctype.h>.
Ensure the global changes are never needed again:
* Makefile.maint (sc_avoid_ctype_macros): Prohibit use of ctype
macros. Recommend c-ctype.h instead.
(sc_prohibit_c_ctype_without_use): New rule.
(sc_prohibit_ctype_h): New rule. Disallow use of <ctype.h>.
* src/internal.h: move xstrol() variants from here ...
* src/util.[ch]: ... to here and rename to virStrToLong()
* src/libvirt_sym.version: export __virStrToLong_i() for
virsh and qemud.
* src/nodeinfo.c, src/stats_linux.c, src/virsh.c,
src/xend_internal.c, qemud/qemud.c: replace xstrtol()
calls with virStrToLong()
* src/nodeinfo.h: don't include internal.h, which was only
needed for xstrtol(), but instead include libvirt.h which
is suffificient for the declarations in the header.
Use <config.h>, not "config.h", per autoconf documentation.
* Makefile.cfg (local-checks-to-skip) [sc_require_config_h]: Enable.
* .x-sc_require_config_h: New file, to list exempted files.
* Makefile.am (EXTRA_DIST): Add .x-sc_require_config_h.
* src/.cvsignore: Ignore *.loT files (generated under Windows).
* proxy/libvirt_proxy.c: Bail out earlier --without-xen.
* src/proxy_internal.c: Don't build proxy client side if
configured --without-xen.
* src/iptables.c, src/iptables.h: Disable this code if
configured --without-qemu.
* src/nodeinfo.c: If no 'uname' function, set model name to
empty string (for Windows).
* src/xen_unified.h, src/util.c, src/test.c: Include <winsock2.h>
on Windows.
* src/util.c: Disable virExec* and virFileLinkPointsTo on
MinGW.
New files go into these directories:
gnulib/lib
gnulib/m4
gnulib/tests
* bootstrap: A wrapper around gnulib-tool.
* configure.in: Invoke gl_EARLY and gl_INIT, being careful to put gl_EARLY
before any macro that uses AC_COMPILE_IFELSE.
(AC_OUTPUT): Add lib/Makefile and gl-tests/Makefile. Remove m4/Makefile.
* Makefile.am (SUBDIRS): Add gnulib/lib and remove m4. Add gnulib/tests
early enough that those tests run before any libvirt unit tests.
* m4/Makefile.am: Remove file. Not needed.
* src/Makefile.am (INCLUDES): Add -I$(top_srcdir)/gnulib/lib -I../gnulib/lib.
(LDADDS, libvirt_la_LIBADD): Add ../gnulib/lib/libgnu.la.
* src/nodeinfo.c: Include "physmem.h".
* qemud/qemud.c, src/remote_internal.c: Include "getaddrinfo.h".
(MEMINFO_PATH, linuxNodeInfoMemPopulate): Remove definitions.
(virNodeInfoPopulate): Use physmem_total, not linuxNodeInfoMemPopulate.
* tests/Makefile.am (INCLUDES): Add -I$(top_srcdir)/gnulib/lib -I../gnulib/lib.
(LDADDS): Add ../gnulib/lib/libgnu.la.
* qemud/Makefile.am (libvirtd_LDADD): Add ../gnulib/lib/libgnu.la.
* tests/nodeinfotest.c (linuxTestCompareFiles): No longer read total
memory from a file.
Update expected output not to include "Memory: NNNN"
* tests/nodeinfodata/linux-nodeinfo-1.txt:
* tests/nodeinfodata/linux-nodeinfo-2.txt:
* tests/nodeinfodata/linux-nodeinfo-3.txt:
* tests/nodeinfodata/linux-nodeinfo-4.txt:
* tests/nodeinfodata/linux-nodeinfo-5.txt:
* tests/nodeinfodata/linux-nodeinfo-6.txt:
* src/test.c [WITH_TEST]: Remove definition of _GNU_SOURCE that
would conflict with the one now in "config.h".
* autogen.sh: Add -I gnulib/m4.
* src/conf.c, src/sexpr.c: Don't define _GNU_SOURCE.
Instead, include "config.h".
* qemud/qemud.c: Remove definition of _GNU_SOURCE.
* src/openvz_driver.c: Likewise.
* src/qemu_driver.c: Likewise.
* src/remote_internal.c: Likewise.
* configure.in: Use AC_CONFIG_AUX_DIR(build-aux), so that a bunch
of gettextize-generated files go into build-aux/, rather than in
the top-level directory.
* .cvsignore: Adjust.
* build-aux/.cvsignore: New file.
Author: Jim Meyering <meyering@redhat.com>
* qemud/qemud.c: Replace uses of strtol with uses of xstrtol_i.
Avoid overflow for very large --timeout=N values.
* src/nodeinfo.c: In linuxNodeInfoMemPopulate and
linuxNodeInfoCPUPopulate, use xstrtol_i rather than strtol.
Unlike in qemud.c, here we allow trailing "isspace", and in
the case of "cpuinfo cpu MHz", also allow a "." terminator,
since we ignore the decimal and any following digits.
* src/internal.h: Define xstrtol_ui, too.
Author: Jim Meyering <meyering@redhat.com>