This is just a small helper that will be used later.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Links between NUMA nodes can have different latencies and
bandwidths. This info is newly defined in ACPI 6.2 under
Heterogeneous Memory Attribute Table (HMAT) table. Linux kernel
learned how to report these values under sysfs and thus we can
expose them in our capabilities XML. The sysfs interface is
documented in kernel's Documentation/admin-guide/mm/numaperf.rst.
Long story short, two nodes can be in initiator-target
relationship. A node can be initiator if it has a CPU or a device
that's capable of initiating memory transfer. Therefore a node
that has just memory can only be target. An initiator-target link
can then have any combination of {bandwidth, latency} - {access,
read, write} attribute (6 in total). However, the standard says
access is applicable iff read and write values are the same.
Therefore, we really have just four combinations of attributes:
bandwidth-read, bandwidth-write, latency-read, latency-write.
This is the combination that kernel reports anyway.
Then, under /sys/system/devices/node/nodeX/acccessN/initiators we
find values for those 4 attributes and also symlinks named
"nodeN" which then represent initiators to nodeX. For instance:
/sys/system/node/node1/access1/initiators/node0 -> ../../node0
/sys/system/node/node1/access1/initiators/read_bandwidth
/sys/system/node/node1/access1/initiators/read_latency
/sys/system/node/node1/access1/initiators/write_bandwidth
/sys/system/node/node1/access1/initiators/write_latency
This means that node0 is initiator and node1 is target and values
of the interconnect can be read.
In theory, there can be separate links to memory side caches too
(e.g. one link from node X to node Y's main memory, another from
node X to node Y's L1 cache, another one to L2 cache and so on).
But sysfs does not express this relationship just yet.
The "accessN" means either "access0" or "access1". The difference
is that while the former expresses the best interconnect between
two nodes including CPUS and I/O devices (such as GPUs and NICs),
the latter includes only CPUs and thus is what we need.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1786309
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Memory on a NUMA node can have a side caches. Configuring these
for a domain was implemented in v6.6.0-rc1~249 and friends.
However, up until now mgmt applications did not really know what
values to pass because we were not exposing caches of the host.
With recent enough kernel these are exposed under sysfs and with
a bit of parsing we can extend our capabilities XML. The sysfs
structure is documented in kernel's
Documentation/admin-guide/mm/numaperf.rst and basically maps in
1:1 fashion to our virNumaCache structure.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
After previous patches we have two structures:
virCapsHostNUMACellDistance and virNumaDistance which express the
same thing. And have the exact same members (modulo their names).
Drop the former in favor of the latter.
This change means that distances with value of 0 are no longer
printed out into capabilities XML, because domain XML code allows
partial distance specification and thus threats value of 0 as
unspecified by user (see virDomainNumaGetNodeDistance() which
returns the default LOCAL/REMOTE distance for value of 0).
Also, from ACPI 6.1 specification, section 5.2.17 System Locality
Distance Information Table (SLIT):
Distance values of 0-9 are reserved and have no meaning.
Thus we shouldn't be ever reporting 0 in neither domain nor
capabilities XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The virCapsHostNUMACellSiblingInfo structure really represents
distance to other NUMA node. Rename the structure and variables
of that type to make it more obvious.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
What this function really does it takes ownership of all pointers
passed (well, except for the first one - caps - to which it
registers new NUMA node). But since all info is passed as a
single pointer it's hard to tell (and use g_auto*). Let's use
double pointers to make the ownership transfer obvious.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Historically, we declared pointer type to our types:
typedef struct _virXXX virXXX;
typedef virXXX *virXXXPtr;
But usefulness of such declaration is questionable, at best.
Unfortunately, we can't drop every such declaration - we have to
carry some over, because they are part of public API (e.g.
virDomainPtr). But for internal types - we can do drop them and
use what every other C project uses 'virXXX *'.
This change was generated by a very ugly shell script that
generated sed script which was then called over each file in the
repository. For the shell script refer to the cover letter:
https://listman.redhat.com/archives/libvir-list/2021-March/msg00537.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
QEMU has the ability to mark machine types as deprecated. This should be
exposed to management applications in the capabilities.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Update the host CPU code to report the die_id in the NUMA topology
capabilities. On systems with multiple dies, this fixes the bug
where CPU cores can't be distinguished:
<cpus num='12'>
<cpu id='0' socket_id='0' core_id='0' siblings='0'/>
<cpu id='1' socket_id='0' core_id='1' siblings='1'/>
<cpu id='2' socket_id='0' core_id='0' siblings='2'/>
<cpu id='3' socket_id='0' core_id='1' siblings='3'/>
</cpus>
Notice how core_id is repeated within the scope of the same socket_id.
It now reports
<cpus num='12'>
<cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0'/>
<cpu id='1' socket_id='0' die_id='0' core_id='1' siblings='1'/>
<cpu id='2' socket_id='0' die_id='1' core_id='0' siblings='2'/>
<cpu id='3' socket_id='0' die_id='1' core_id='1' siblings='3'/>
</cpus>
So core_id is now unique within a (socket_id, die_id) pair.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The NUMA cells are stored directly in the virCapsHostPtr
struct. This moves them into their own struct allowing
them to be stored independantly of the rest of the host
capabilities. The change is used as an excuse to switch
the representation to use a GPtrArray too.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The XML parser currently calls virCapabilitiesDomainDataLookup during
parsing to find the domain capabilities matching the triple
(virt type, os type, arch)
This is, however, bogus with the QEMU driver as it assumes that there
is an emulator known to the default driver capabilities that matches
this triple. It is entirely possible for the driver to be parsing an
XML file with a custom emulator path specified pointing to a binary
that doesn't exist in the default driver capabilities. This will,
for example be the case on a RHEL host which only installs the host
native emulator to /usr/bin. The user can have built a custom QEMU
for non-native arches into $HOME and wish to use that.
Aside from validation, this call is also used to fill in a machine type
for the guest if not otherwise specified. Again, this data may be
incorrect for the QEMU driver because it is not taking account of
the emulator binary that is referenced.
To start fixing this, move the validation to the post-parse callbacks
where more intelligent driver specific logic can be applied.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Remove the need to pass around strings and switch to the enum values
instead.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The capabilities are declared in the XML schema so passing feature names
as strings from hypervisor drivers makes no sense.
Additionally some of the features expose so called 'toggles' while
others not. This knowledge was encoded by a bunch of 'STREQ's in the
formatter.
Change all of this by declaring the features as an enum and use it
instead of a dynamically allocated array.
Presence of 'toggles' is encoded together with the conversion strings
rather than in the formatter directly.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Libvirt currently uses the VIR_AUTOUNREF macro for auto cleanup of
virObject instances. GLib approaches things differently with GObject,
reusing their g_autoptr() concept.
This introduces support for g_autoptr() with virObject, to facilitate
the conversion to GObject.
Only virObject classes which are currently used with VIR_AUTOREF are
updated. Any others should be converted to GObject before introducing
use of autocleanup.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Now that virDomainXMLNamespace matches virXMLNamespace,
we no longer need to keep both around.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
Neither the xmlDocPtr nor the root xmlNode (also passed
in the XPathContext) are interesting to the callees.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Jiri Denemark <jdenemar@redhat.com>
As explained in the previous patch, collecting pointer typedefs into a
common header makes it easier to avoid circular inclusions. Continue
the efforts by pulling the appropriate typedefs from capabilities.h
into the new header.
This patch is just straight code motion (all typedefs are listed in
the same order before and after the patch); a later patch will sort
things for legibility.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Introduce the bare bones functions to processing capability
data for the storage driver.
Since there will be no need for the <host> output, we need
to filter that data.
Signed-off-by: John Ferlan <jferlan@redhat.com>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Require that all headers are guarded by a symbol named
LIBVIRT_$FILENAME
where $FILENAME is the uppercased filename, with all characters
outside a-z changed into '_'.
Note we do not use a leading __ because that is technically a
namespace reserved for the toolchain.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In many files there are header comments that contain an Author:
statement, supposedly reflecting who originally wrote the code.
In a large collaborative project like libvirt, any non-trivial
file will have been modified by a large number of different
contributors. IOW, the Author: comments are quickly out of date,
omitting people who have made significant contribitions.
In some places Author: lines have been added despite the person
merely being responsible for creating the file by moving existing
code out of another file. IOW, the Author: lines give an incorrect
record of authorship.
With this all in mind, the comments are useless as a means to identify
who to talk to about code in a particular file. Contributors will always
be better off using 'git log' and 'git blame' if they need to find the
author of a particular bit of code.
This commit thus deletes all Author: comments from the source and adds
a rule to prevent them reappearing.
The Copyright headers are similarly misleading and inaccurate, however,
we cannot delete these as they have legal meaning, despite being largely
inaccurate. In addition only the copyright holder is permitted to change
their respective copyright statement.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This patch is introducing cache monitor(CMT) to cache and
memory bandwidth monitor(MBM) for monitoring CPU memory
bandwidth.
The host capability of the two monitors is also introduced
in this patch.
For CMT, the host capability is shown like:
<host>
...
<cache>
<bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
<control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
</bank>
<monitor level='3' 'reuseThreshold'='270336' maxMonitors='176'>
<feature name='llc_occupancy'/>
</monitor>
</cache>
...
</host>
For MBM, the capability is shown like this:
<host>
...
<memory_bandwidth>
<node id='1' cpus='6-11'>
<control granularity='10' min ='10' maxAllocs='4'/>
</node>
<monitor maxMonitors='176'>
<feature name='mbm_total_bytes'/>
<feature name='mbm_local_bytes'/>
</monitor>
</memory_bandwidth>
...
</host>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Move memory bandwidth capability nodes into one data structure,
this allows us to add a monitor for memory bandwidth.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Move all cache banks into one data structure, this allows
us to add other cache component, such as cache monitor.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add new XML section to report host's memory bandwidth allocation
capability. The format as below example:
<host>
.....
<memory_bandwidth>
<node id='0' cpus='0-19'>
<control granularity='10' min ='10' maxAllocs='8'/>
</node>
</memory_bandwidth>
</host>
granularity ---- granularity of memory bandwidth, unit percentage.
min ---- minimum memory bandwidth allowed, unit percentage.
maxAllocs ---- maximum memory bandwidth allocation group supported.
Signed-off-by: Bing Niu <bing.niu@intel.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1569678
On some large systems (with ~400GB of RAM) it is possible for
unsigned int to overflow in which case we report invalid number
of 4K pages pool size. Switch to unsigned long long.
We hit overflow in virNumaGetPages when doing:
huge_page_sum += 1024 * page_size * page_avail;
because although 'huge_page_sum' is an unsigned long long, the
page_size and page_avail are both unsigned int, so the promotion
to unsigned long long doesn't happen until the sum has been
calculated, by which time we've already overflowed.
Turning page_avail into a unsigned long long is not strictly
needed until we need ability to represent more than 2^32
4k pages, which equates to 16 TB of RAM. That's not
outside the realm of possibility, so makes sense that we
change it to unsigned long long to avoid future problems.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This way later patches can add another structures with virResctrl
prefix without the meaning being even more confusing than it needs to
be.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
It doesn't access anything from conf/ and ti will be needed to use
from other util/ places. This split makes the separation clearer.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
On some platforms the number of bits in the cbm_mask might not be
divisible by 4 (and not even by 2), so we need to properly count the
bits. Similar file, min_cbm_bits, is properly parsed and used, but if
the number is greater than one, we lose the information about
granularity when reporting the data in capabilities. For that matter
always report granularity, but if it is not the same as the minimum,
add that information in there as well.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
We're only adding only info about L3 caches, we can add more
later (just by changing one line), but for now that's more than enough
without overwhelming anyone.
XML snippet of how this should look like (also seen as part of the commit):
<cache>
<bank id='0' level='3' type='both' size='8192' unit='KiB' cpus='0-7'/>
</cache>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This way more drivers can utilize the functionality without copying
the code. And we can therefore test it in one place for all of them.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
There is no "node driver" as there was before, drivers have to do
their own ACL checking anyway, so they all specify their functions and
nodeinfo is basically just extending conf/capablities. Hence moving
the code to src/conf/ is the right way to go.
Also that way we can de-duplicate some code that is in virsysfs and/or
virhostcpu that got duplicated during the virhostcpu.c split. And
Some cleanup is done throughout the changes, like adding the vir*
prefix etc.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Guests are handled in callers, but if something goes wrong (when it
cannot be added to virCapabilities, for example), there's no way for
them to free it properly.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Usage of this keyword in front of function declaration that is exported via a
header file is unnecessary, since internally, this has been the default for most
compilers for quite some time.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
In the reverted commit d2e5538b1, the libxl driver was changed to copy
interface names autogenerated by libxl to the corresponding network def
in the domain's virDomainDef object. The copied name is freed when the
domain transitions to the shutoff state. But when migrating a domain,
the autogenerated name is included in the XML sent to the destination
host. It is possible an interface with the same name already exists on
the destination host, causing migration to fail.
This patch defines a new capability for setting the network device
prefix that will be used in the driver. Valid prefixes are
VIR_NET_GENERATED_PREFIX or the one announced by the driver.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Introduce VIR_DOMAIN_VIRT_NONE to give domaintype the default value of zero.
This is specially helpful in constructing better error messages
when we don't want to look up the default emulator by virtType.
The test data in vircapstest.c is also modified to reflect this change.
This revealed that GuestDefaultEmulator was a bit buggy, capable
of returning an emulator that didn't match the passed domain type. Fix
up the test suite input to continue to pass.
This is a helper function to look up all capabilities data for all
the OS bits that are relevant to <domain>. This is
- os type
- arch
- domain type
- emulator
- machine type
This will be used to replace several functions in later commits.
But the internal API stays the same, and we just convert the value as
needed. Not useful yet, but this is the beginning step of using an enum
for ostype throughout the code.