200 Commits

Author SHA1 Message Date
Michal Privoznik
f55d1316ad sysconf: Include unistd.h
The manpage for sysconf() suggests including unistd.h as the
function is declared there. Even though we are not hitting any
compile issues currently, let's include the correct header file
instead of relying on some hidden include chain.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2015-12-24 18:03:50 +01:00
Andrea Bolognani
88c4c32af1 nodeinfo: Fix build failure when KVM headers are not available
Compiler error:

  ../../src/nodeinfo.c: In function 'nodeGetThreadsPerSubcore':
  ../../src/nodeinfo.c:2393: error: label 'out' defined but not used [-Wunused-label]
  ../../src/nodeinfo.c:2352: error: unused parameter 'arch' [-Wunused-parameter]
2015-08-03 17:14:16 +02:00
Shivaprasad G Bhat
014208c4d0 nodeinfo: Fix output on PPC64 KVM hosts
The nodeinfo is reporting incorrect number of cpus and incorrect host
topology on PPC64 KVM hosts. The KVM hypervisor on PPC64 needs only
the primary thread in a core to be online, and the secondaries offlined.
While scheduling a guest in, the kvm scheduler wakes up the secondaries to
run in guest context.

The host scheduling of the guests happens at the core level (as only primary
thread is online). The kvm scheduler exploits as many threads of the core
as needed by guest. Further, starting POWER8, the processor allows splitting
a physical core into multiple subcores with 2 or 4 threads each. Again, only
the primary thread in a subcore is online in the host. The KVM-PPC
scheduler allows guests to exploit all the offline threads in the subcore,
by bringing them online when needed.
(Kernel patches on split-core http://www.spinics.net/lists/kvm-ppc/msg09121.html)

The recent dynamic micro-threading changes in ppc-kvm make sure that all
the offline cpus are utilized across guests, and across guests with
different cpu topologies.
(https://www.mail-archive.com/kvm@vger.kernel.org/msg115978.html)

Since the offline cpus are brought online in the guest context, it is safe
to count them as online. Nodeinfo today discounts these offline cpus from
cpu count/topology calculation, so the nodeinfo output is not of any help
and the host appears overcommitted when it is actually not.

The patch carefully counts those offline threads whose primary threads are
online. The host topology displayed by the nodeinfo is also fixed when the
host is in valid kvm state.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
2015-08-03 08:38:46 -04:00
Andrea Bolognani
f86c45ca0c nodeinfo: Check for errors when reading core_id
2015-07-23 12:01:19 +02:00
Andrea Bolognani
6395ec1cf0 nodeinfo: Calculate present and online CPUs only once
Move the calls to the respective functions from virNodeParseNode(),
which is executed once for every NUMA node, to
linuxNodeInfoCPUPopulate(), which is executed just once per host.
2015-07-22 10:50:53 +02:00
Andrea Bolognani
05be606282 nodeinfo: Use a bitmap to keep track of node CPUs
Keep track of what CPUs belong to the current node while walking
through the sysfs node entry, so we don't need to do it a second
time immediately afterwards.

This also allows us to loop through all CPUs that are part of a
node in guaranteed ascending order, which is something that is
required for some upcoming changes.
2015-07-22 10:37:25 +02:00
Andrea Bolognani
b909e9fb2c nodeinfo: Use nodeGetOnlineCPUBitmap() when parsing node
No need to look up the online status of each CPU separately when we
can get all the information in one go.
2015-07-22 10:37:20 +02:00
Andrea Bolognani
b7b506475c nodeinfo: Phase out cpu_set_t usage
Swap out all instances of cpu_set_t and replace them with virBitmap,
which some of the code was already using anyway.

The changes are pretty mechanical, with one notable exception: an
assumption has been added on the max value we can run into while
reading either socket_id or core_id.

While this specific assumption was not in place before, we were
using cpu_set_t improperly by not making sure not to set any bit
past CPU_SETSIZE or explicitly allocating bigger bitmaps; in fact
the default size of a cpu_set_t, 1024, is way too low to run our
testsuite, which includes core_id values in the 2000s.
2015-07-22 10:14:02 +02:00
Andrea Bolognani
c1df42d734 nodeinfo: Rename nodeGetCPUBitmap() to nodeGetOnlineCPUBitmap()
The new name makes it clear that the returned bitmap contains the
information about which CPUs are online, not eg. which CPUs are
present.

No behavioral change.
2015-07-22 10:14:02 +02:00
Andrea Bolognani
ccd0ea7ef5 nodeinfo: Remove out parameter from nodeGetCPUBitmap()
Not all users of this API will need the size of the returned
bitmap; those who do can simply call virBitmapSize() themselves.
2015-07-22 10:14:01 +02:00
Andrea Bolognani
37f73e4ad5 nodeinfo: Add old kernel compatibility to nodeGetPresentCPUBitmap()
If the cpu/present file is not available, we assume that the kernel
is too old to support non-consecutive CPU ids and return a bitmap
with all the bits set to represent this fact. This assumption is
already exploited in nodeGetCPUCount().

This means users of this API can expect the information to always
be available unless an error has occurred, and no longer need to
treat the NULL return value as a special case.

The error message has been updated as well.
2015-07-22 10:14:01 +02:00
Andrea Bolognani
a2e2add1f1 nodeinfo: Rename linuxParseCPUmax() to linuxParseCPUCount()
The original name was confusing because the function returns the number
of CPUs, not the maximum CPU id. The comment above the function has
been updated to reflect this.

No behavioral changes.
2015-07-22 10:14:01 +02:00
Andrea Bolognani
6fecc4017d nodeinfo: Introduce linuxGetCPUOnlinePath()
2015-07-22 10:14:01 +02:00
Andrea Bolognani
bd87f07c25 nodeinfo: Introduce linuxGetCPUGlobalPath()
This is just a more generic version of linuxGetCPUPresentPath(),
which is now implemented by calling the new function appropriately.
2015-07-22 10:14:01 +02:00
Andrea Bolognani
2a6801892a nodeinfo: Fix nodeGetCPUBitmap()'s fallback code path
During the recent refactoring/cleanups, a bug has been introduced
that caused all CPUs to be reported as online unless the sysfs
cpu/present file was available.

This commit fixes the fallback code path by building the directory
path passed to virNodeGetCpuValue() correctly.
2015-07-22 09:57:57 +02:00
Roman Bogorodskiy
e46791e003 nodeinfo: fix build on FreeBSD
Currently, build fails on FreeBSD with:

  CC       libvirt_driver_la-nodeinfo.lo
nodeinfo.c:1941:56: error: use of undeclared identifier 'SYSFS_SYSTEM_PATH'
    const char *prefix = sysfs_prefix ? sysfs_prefix : SYSFS_SYSTEM_PATH;
                                                       ^
1 error generated.

This is caused by commit b97b3048 that added sysfs_prefix to
nodeCapsInitNUMA and used SYSFS_SYSTEM_PATH.

Fix it by unconditionally defining SYSFS_SYSTEM_PATH instead of defining it
under #ifdef __linux__.
2015-07-20 14:01:49 +03:00
Andrea Bolognani
aa6c3fee86 nodeinfo: Formatting changes
2015-07-14 17:11:36 -04:00
Andrea Bolognani
75f6f54546 nodeinfo: Make sysfs_prefix usage more consistent
Make sure sysfs_prefix, when present, is always the first argument
to a function; don't use a different name to refer to it; check
whether it is NULL, and hence SYSFS_SYSTEM_PATH should be used, only
when using it directly and not just passing it down to another
function; always pass down the same value we've been passed when
calling another function.
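
The fallback rule described above can be sketched in C as follows. This is only an illustration, not libvirt's code: build_cpu_present_path() is a hypothetical helper, and the value given to SYSFS_SYSTEM_PATH here is an assumption.

```c
#include <stdio.h>

/* Assumed default location; the real definition lives in nodeinfo.c. */
#define SYSFS_SYSTEM_PATH "/sys/devices/system"

/* Hypothetical helper: write "<prefix>/cpu/present" into buf.
 * sysfs_prefix is the first argument and is checked for NULL only here,
 * at the point where it is used directly.
 * Returns 0 on success, -1 if the path does not fit into buf. */
int
build_cpu_present_path(const char *sysfs_prefix, char *buf, size_t buflen)
{
    const char *prefix = sysfs_prefix ? sysfs_prefix : SYSFS_SYSTEM_PATH;
    int n = snprintf(buf, buflen, "%s/cpu/present", prefix);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

A caller that was itself handed a sysfs_prefix would pass it down unchanged, NULL included, and leave the fallback to the function that finally builds the path.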
2015-07-14 17:11:36 -04:00
Kothapally Madhu Pavan
bb31f4532b nodeinfo: fix to parse present cpus rather than possible cpus
This patch resolves a situation where a core is defective and is not
in the present mask during boot. Optionally, a host can have empty sockets
that could be brought online if the socket is added. In this case the present
mask contains the cpu's that are actually there in the sockets even though
they might be offline for some reason. This patch excludes the cpu's that
are offline because the socket is defective/empty by checking the present
mask before reading the cpu directory. Otherwise, the nodeinfo on such
hosts always displays wrong output which includes the defective/empty
sockets as set of offline cpu's.

Signed-off-by: Kothapally Madhu Pavan <kmp@linux.vnet.ibm.com>
2015-07-13 16:07:44 -04:00
John Ferlan
c71f0654fc nodeinfo: Add sysfs_prefix to nodeGetMemoryStats
Add the sysfs_prefix argument to the call to allow for setting the
path for tests to something other than SYSFS_SYSTEM_PATH.
2015-07-13 15:59:32 -04:00
John Ferlan
b97b30480d nodeinfo: Add sysfs_prefix to nodeCapsInitNUMA
Add the sysfs_prefix argument to the call to allow for setting the
path for tests to something other than SYSFS_CPU_PATH which is a
derivative of SYSFS_SYSTEM_PATH

Use cpupath for nodeCapsInitNUMAFake and remove SYSFS_CPU_PATH
2015-07-13 15:59:32 -04:00
John Ferlan
29e4f2243f nodeinfo: Add sysfs_prefix to nodeGetInfo
Add the sysfs_prefix argument to the call to allow for setting the
path for tests to something other than SYSFS_SYSTEM_PATH.
2015-07-13 15:59:32 -04:00
John Ferlan
f1c6179f0d nodeinfo: Add sysfs_prefix to nodeGetCPUMap
Add the sysfs_prefix argument to the call to allow for setting the
path for tests to something other than SYSFS_SYSTEM_PATH.
2015-07-13 15:59:32 -04:00
John Ferlan
f220a3e5a8 nodeinfo: Add sysfs_prefix to nodeGetCPUBitmap
Add the sysfs_prefix argument to the call to allow for setting the
path for tests to something other than SYSFS_SYSTEM_PATH.
2015-07-13 15:59:32 -04:00
John Ferlan
51281dcb90 nodeinfo: Add sysfs_prefix to nodeGetPresentCPUBitmap
Add the sysfs_prefix argument to the call to allow for setting the
path for tests to something other than SYSFS_SYSTEM_PATH.
2015-07-13 15:59:32 -04:00
John Ferlan
f1a43a0f91 nodeinfo: Add sysfs_prefix to nodeGetCPUCount
Add the sysfs_prefix argument to the call to allow for setting the
path for tests to something other than SYSFS_SYSTEM_PATH.
2015-07-13 15:59:32 -04:00
John Ferlan
3119e05e26 nodeinfo: Introduce local linuxGetCPUPresentPath
The API will print the path to the /cpu/present file using the sysfs_prefix.

NB: This is setup for future patches which will allow local/test sysfs paths.
2015-07-13 15:59:32 -04:00
Ján Tomko
18eb727fe9 Simplify virNodeCountThreadSiblings
Use a for loop instead of while.

Do not opencode c_isxdigit and virHexToBin.
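
For illustration, a for-loop bit count over a hexadecimal mask could look like the sketch below. It is not the libvirt function: count_mask_bits() is a hypothetical name, standard <ctype.h> calls stand in for c_isxdigit and virHexToBin, and the input is assumed to be a sysfs-style mask such as "00000000,00000003".

```c
#include <ctype.h>
#include <stddef.h>

/* Count the set bits in a hexadecimal CPU mask string.
 * Characters that are not hex digits (commas, a trailing newline)
 * are skipped. */
unsigned int
count_mask_bits(const char *mask)
{
    unsigned int count = 0;
    size_t i;

    for (i = 0; mask[i] != '\0'; i++) {
        unsigned char c = (unsigned char) mask[i];
        unsigned int nibble;

        if (!isxdigit(c))
            continue;

        /* Convert one hex digit to its 4-bit value. */
        nibble = isdigit(c) ? (unsigned int) (c - '0')
                            : (unsigned int) (tolower(c) - 'a' + 10);

        /* Add up the bits set in this digit. */
        for (; nibble != 0; nibble >>= 1)
            count += nibble & 1U;
    }
    return count;
}
```

With the example mask above, the function counts two set bits, that is, two thread siblings.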
2015-06-02 16:13:14 +02:00
Ján Tomko
e37bcbd9b8 Report errors in virNodeCountThreadSiblings
Use virFileReadAll which reports an error when the file is larger
than the specified maximum.

https://bugzilla.redhat.com/show_bug.cgi?id=1207849
2015-06-02 16:13:10 +02:00
Kothapally Madhu Pavan
6074f8316c virsh: Fix to list online cpus using virsh capabilities
Virsh capabilities will list offline cpus as online when
libvirt is compiled with the numactl option disabled. This
fix will list the correct set of online cpus.
2015-05-28 17:23:53 +02:00
Wei Huang
c13de01691 nodeinfo: Increase the num of CPU thread siblings to a larger value
Current libvirt can only handle up to 1023 bytes when it
reads Linux sysfs topology/thread_siblings. This isn't enough for
Linux distributions that support a large value. This patch fixes
the problem by using VIR_ALLOC()/VIR_FREE(), instead of using a
fixed-size (1024) local char array. In the meantime,
SYSFS_THREAD_SIBLINGS_LIST_LENGTH_MAX is increased to 8192, which
should be large enough for the foreseeable future.

Signed-off-by: Wei Huang <wei@redhat.com>
2015-03-27 10:20:56 +01:00
Ján Tomko
22fd3ac38f Introduce virBitmapIsBitSet
A helper that never returns an error and treats bits out of bitmap range
as false.

Use it everywhere we use ignore_value on virBitmapGetBit, or loop over
the bitmap size.
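
A minimal sketch of such a helper, assuming a simplified bitmap layout: struct bitmap and bitmap_is_bit_set() are illustrative names, not the virBitmap implementation.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for a bitmap object. */
struct bitmap {
    size_t nbits;              /* number of valid bits */
    const unsigned long *map;  /* nbits bits, packed into words */
};

/* Never fails: a NULL bitmap or a bit beyond the bitmap's range
 * simply reads as "not set". */
bool
bitmap_is_bit_set(const struct bitmap *b, size_t bit)
{
    if (b == NULL || bit >= b->nbits)
        return false;

    return (b->map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}
```

Because the result is a plain boolean, callers can use it directly in a condition without discarding an error code or checking the index against the bitmap size first.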
2015-03-13 15:31:33 +01:00
Ján Tomko
af1c98e406 Fix virCgroupGetPercpuStats with non-continuous present CPUs
Per-cpu stats are only shown for present CPUs in the cgroups,
but we were only parsing the largest CPU number from
/sys/devices/system/cpu/present and looking for stats even for
non-present CPUs.
This resulted in:
internal error: cpuacct parse error
2015-01-22 17:01:11 +01:00
Jincheng Miao
a5c7ea4536 nodeinfo: report error when failure in nodeSetMemoryParameters
nodeSetMemoryParameters() will call nodeSetMemoryParameterValue()
to set parameters. But it only treats the return code '-2' as a
failure. Instead, we should report an error whenever rc is negative.

https://bugzilla.redhat.com/show_bug.cgi?id=1161541

Signed-off-by: Jincheng Miao <jmiao@redhat.com>
2014-11-10 15:06:57 +01:00
Michal Privoznik
0228fa11c0 nodeinfo: Implement nodeAllocPages
And add stubs to other drivers like: lxc, qemu, uml and vbox.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-09-25 10:24:45 +02:00
Michal Privoznik
4aa8a68faa nodeGetFreePages: Push forgotten change
In the previous patch I've changed the for loop bounds but forgot
to 'git add' changes that adapt the rest of the code.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-09-25 10:24:44 +02:00
Jincheng Miao
8baf0f025f nodeinfo: fix nodeGetFreePages when max node is zero
In nodeGetFreePages, if startCell is given as '0'
and the max node number is '0' too, the for-loop
is never executed.
So convert it to a while-loop.

Before:
> virsh freepages --cellno 0 --pagesize 4
error: internal error: no suitable info found

After:
> virsh freepages --cellno 0 --pagesize 4
4KiB: 472637

Signed-off-by: Jincheng Miao <jmiao@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-09-24 13:33:43 +02:00
Michal Privoznik
f8857c8f88 nodeinfo: Prefer MIN in nodeGetFreePages
It's better to use a macro instead of if-else construct.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-09-23 11:34:06 +02:00
Jincheng Miao
7db1936642 nodeinfo: report error when given node is out of range
https://bugzilla.redhat.com/show_bug.cgi?id=1145050

Signed-off-by: Jincheng Miao <jmiao@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-09-23 10:23:20 +02:00
John Ferlan
34476d720f nodeinfo: Resolve Coverity NEGATIVE_RETURNS
If the virNumaGetNodeCPUs() call fails with -1, then jumping to cleanup
with 'cpus == NULL' and calling virCapabilitiesClearHostNUMACellCPUTopology
will cause issues.

Signed-off-by: John Ferlan <jferlan@redhat.com>
2014-09-11 08:10:14 -04:00
Michal Privoznik
f4c87a0c35 nodeCapsInitNUMA: Avoid @cpumap leak
In case the host has 2 or more NUMA nodes, we fetch CPU map for each
node. However, we need to free the CPU map in between loops:

==29513== 96 (72 direct, 24 indirect) bytes in 3 blocks are definitely lost in loss record 951 of 1,264
==29513==    at 0x4C2A700: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==29513==    by 0x52AD24B: virAlloc (viralloc.c:144)
==29513==    by 0x52AF0E6: virBitmapNew (virbitmap.c:78)
==29513==    by 0x52FB720: virNumaGetNodeCPUs (virnuma.c:294)
==29513==    by 0x53C700B: nodeCapsInitNUMA (nodeinfo.c:1886)
==29513==    by 0x11759708: vboxCapsInit (vbox_common.c:398)
==29513==    by 0x11759CC4: vboxConnectOpen (vbox_common.c:514)
==29513==    by 0x53C965F: do_open (libvirt.c:1147)
==29513==    by 0x53C9EBC: virConnectOpen (libvirt.c:1317)
==29513==    by 0x142905: remoteDispatchConnectOpen (remote.c:1215)
==29513==    by 0x126ADF: remoteDispatchConnectOpenHelper (remote_dispatch.h:2346)
==29513==    by 0x5453D21: virNetServerProgramDispatchCall (virnetserverprogram.c:437)

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-08-20 16:15:00 +02:00
Michal Privoznik
3499eedd4b virNumaGetPageInfo: Take huge pages into account
On the Linux kernel, if huge pages are allocated the size they cut off
from memory is accounted under the 'MemUsed' in the meminfo file.
However, we want the sum to be subtracted from 'MemTotal'. This patch
implements this feature. After this change, we can enable reporting
of the ordinary system pages in the capability XML:

<capabilities>

  <host>
    <uuid>01281cda-f352-cb11-a9db-e905fe22010c</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Haswell</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='1' threads='1'/>
      <feature/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>
    <power_management/>
    <migration_features/>
    <topology>
      <cells num='4'>
        <cell id='0'>
          <memory unit='KiB'>4048248</memory>
          <pages unit='KiB' size='4'>748382</pages>
          <pages unit='KiB' size='2048'>3</pages>
          <pages unit='KiB' size='1048576'>1</pages>
          <distances/>
          <cpus num='1'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
          </cpus>
        </cell>
        ...
      </cells>
    </topology>
  </host>
</capabilities>

You can see the beautiful thing about this: if you sum up all the
<pages/> you'll get <memory/>.
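
To check that claim against the cell 0 values in the XML above, a minimal C sketch can multiply each pool's page size by its page count and add the results. The page_pool struct and cell_pages_total_kib() are illustrative names, not libvirt code.

```c
#include <stddef.h>

/* One <pages/> entry of a NUMA cell: page size in KiB and pool size in pages. */
struct page_pool {
    unsigned long long size_kib;
    unsigned long long count;
};

/* Sum the memory covered by all page pools of a cell, in KiB. */
unsigned long long
cell_pages_total_kib(const struct page_pool *pools, size_t npools)
{
    unsigned long long total = 0;
    size_t i;

    for (i = 0; i < npools; i++)
        total += pools[i].size_kib * pools[i].count;

    return total;
}
```

For cell 0 this gives 4 * 748382 + 2048 * 3 + 1048576 * 1 = 4048248 KiB, the value reported in <memory/>.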

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-06-24 11:50:31 +02:00
Michal Privoznik
f4dc812c9e virNodeParseSocket: Take ARM into account
The virNodeParseSocket() function tries to get the socket ID from the
'topology/physical_package_id' file. However, on some architectures
the file contains the -1 constant, which in turn makes libvirt think
the info extraction was unsuccessful. If that's the case, we need to
overwrite the obtained integer with zero like we are doing for other
architectures.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-06-20 15:59:08 +02:00
Michal Privoznik
9571eaaa63 virNodeParseNode: Propagate host architecture
As in the previous commit, there are again some places where we can make
the decision at runtime instead of compile time. This time it's whether
'topology/physical_package_id' is allowed to contain '-1' or not.
Also, the core ID is parsed differently on s390(x) than on the rest of
the architectures.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-06-20 15:50:56 +02:00
Michal Privoznik
e808357528 nodeinfo: Introduce @arch to linuxNodeInfoCPUPopulate
So far, we are doing compile time decisions on which architecture is
used. However, for testing purposes it's much easier if we pass host
architecture as parameter and then let the function decide which code
snippet for extracting host CPU info will be used.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-06-20 15:46:52 +02:00
Michal Privoznik
38fa03f4b0 nodeinfo: Implement nodeGetFreePages
And add stubs to other drivers like: lxc, qemu, uml and vbox.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-06-19 15:10:50 +02:00
Michal Privoznik
02129b7c0e virCaps: expose pages info
There are two places where you'll find info on page sizes. The first
one is under <cpu/> element, where all supported pages sizes are
listed. Then the second one is under each <cell/> element which refers
to concrete NUMA node. At this place, the size of page's pool is
reported. So the capabilities XML looks something like this:

<capabilities>

  <host>
    <uuid>01281cda-f352-cb11-a9db-e905fe22010c</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Westmere</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='1' threads='1'/>
      ...
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>
    ...
    <topology>
      <cells num='4'>
        <cell id='0'>
          <memory unit='KiB'>4054408</memory>
          <pages unit='KiB' size='4'>1013602</pages>
          <pages unit='KiB' size='2048'>3</pages>
          <pages unit='KiB' size='1048576'>1</pages>
          <distances/>
          <cpus num='1'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>4071072</memory>
          <pages unit='KiB' size='4'>1017768</pages>
          <pages unit='KiB' size='2048'>3</pages>
          <pages unit='KiB' size='1048576'>1</pages>
          <distances/>
          <cpus num='1'>
            <cpu id='1' socket_id='0' core_id='0' siblings='1'/>
          </cpus>
        </cell>
        ...
      </cells>
    </topology>
    ...
  </host>

  <guest/>

</capabilities>

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-06-19 15:10:49 +02:00
Michal Privoznik
99a63aed2d nodeinfo: Rename nodeGetFreeMemory to nodeGetMemory
For future work we want to get info for not only the free memory
but the overall memory size too. That's why the function must have
a new signature too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-06-19 15:10:49 +02:00
Eric Blake
10c10f4380 nodeinfo: avoid uninitialized variable on error
Commit 8ba0a58 introduced a compiler warning that I hit during
a run of ./autobuild.sh:

../../src/nodeinfo.c: In function 'nodeCapsInitNUMA':
../../src/nodeinfo.c:1853:43: error: 'nsiblings' may be used uninitialized in this function [-Werror=maybe-uninitialized]
         if (virCapabilitiesAddHostNUMACell(caps, n, memory,
                                           ^

Sure enough, nsiblings starts uninitialized, and is set by a call
to virNodeCapsGetSiblingInfo, but that function fails to assign
through the pointer if virNumaGetDistances fails.

* src/nodeinfo.c (nodeCapsInitNUMA): Initialize nsiblings.

Signed-off-by: Eric Blake <eblake@redhat.com>
2014-06-10 16:27:34 -06:00
Michal Privoznik
8ba0a58f8d virCaps: Expose distance between host NUMA nodes
If a user or a management application wants to create a guest,
it may be useful to know the cost of internode latencies
before the guest resources are pinned. For example:

<capabilities>

  <host>
    ...
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>4004132</memory>
          <distances>
            <sibling id='0' value='10'/>
            <sibling id='1' value='20'/>
          </distances>
          <cpus num='2'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>4030064</memory>
          <distances>
            <sibling id='0' value='20'/>
            <sibling id='1' value='10'/>
          </distances>
          <cpus num='2'>
            <cpu id='1' socket_id='0' core_id='0' siblings='1'/>
            <cpu id='3' socket_id='0' core_id='2' siblings='3'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    ...
  </host>
  ...
</capabilities>

We can see the distance from node1 to node0 is 20 and within nodes 10.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
2014-06-04 09:35:55 +02:00