Links between NUMA nodes can have different latencies and
bandwidths. This info is newly defined in ACPI 6.2 under
Heterogeneous Memory Attribute Table (HMAT) table. Linux kernel
learned how to report these values under sysfs and thus we can
expose them in our capabilities XML. The sysfs interface is
documented in kernel's Documentation/admin-guide/mm/numaperf.rst.
Long story short, two nodes can be in initiator-target
relationship. A node can be initiator if it has a CPU or a device
that's capable of initiating memory transfer. Therefore a node
that has just memory can only be target. An initiator-target link
can then have any combination of {bandwidth, latency} - {access,
read, write} attribute (6 in total). However, the standard says
access is applicable iff read and write values are the same.
Therefore, we really have just four combinations of attributes:
bandwidth-read, bandwidth-write, latency-read, latency-write.
This is the combination that kernel reports anyway.
Then, under /sys/system/devices/node/nodeX/acccessN/initiators we
find values for those 4 attributes and also symlinks named
"nodeN" which then represent initiators to nodeX. For instance:
/sys/system/node/node1/access1/initiators/node0 -> ../../node0
/sys/system/node/node1/access1/initiators/read_bandwidth
/sys/system/node/node1/access1/initiators/read_latency
/sys/system/node/node1/access1/initiators/write_bandwidth
/sys/system/node/node1/access1/initiators/write_latency
This means that node0 is initiator and node1 is target and values
of the interconnect can be read.
In theory, there can be separate links to memory side caches too
(e.g. one link from node X to node Y's main memory, another from
node X to node Y's L1 cache, another one to L2 cache and so on).
But sysfs does not express this relationship just yet.
The "accessN" means either "access0" or "access1". The difference
is that while the former expresses the best interconnect between
two nodes including CPUS and I/O devices (such as GPUs and NICs),
the latter includes only CPUs and thus is what we need.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1786309
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Expose virNumaInterconnect XML formatter so that it can be
re-used by other parts of the code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
There's nothing domain specific about NUMA interconnects. Rename
the virDomainNumaInterconnect* structures and enums to
virNumaInterconnect*.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Memory on a NUMA node can have a side caches. Configuring these
for a domain was implemented in v6.6.0-rc1~249 and friends.
However, up until now mgmt applications did not really know what
values to pass because we were not exposing caches of the host.
With recent enough kernel these are exposed under sysfs and with
a bit of parsing we can extend our capabilities XML. The sysfs
structure is documented in kernel's
Documentation/admin-guide/mm/numaperf.rst and basically maps in
1:1 fashion to our virNumaCache structure.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Expose virNumaCache XML formatter so that it can be re-used by
other parts of the code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
There's nothing domain specific about NUMA memory caches. Rename the
virDomainCache* structures and enums to virNumaCache*.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
The way we format <cpu/> element for capabilities is not ideal,
because if there are no CPUs, i.e. no child elements, we still
output opening and closing element. To solve this,
virXMLFormatElement() could be used but that would introduce more
variables into the loop. Therefore, move the formatter into a
separate function and use virXMLFormatElement().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
In a few places we take 1 and shift it left repeatedly. So much
that it won't longer fit into signed integer. The problem is that
this is undefined behaviour. Switching to 1U makes us stay within
boundaries.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Tim Wiederhake <twiederh@redhat.com>
In a few places it may happen that the array we want to sort is
still NULL (e.g. because there were no leases found, no paths for
secdriver to lock or no cache banks). However, passing NULL to
qsort() is undefined and even though glibc plays nicely we
shouldn't rely on undefined behaviour.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Tim Wiederhake <twiederh@redhat.com>
I've identified some places (mostly by looking for
virBufferUse()) that can use virXMLFormatElement() instead of
open coded version of it. I'm sure there are many more places
that could use the same treatment. Let's cure them some other
time.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Before the mentioned commit we always parsed the whole disk definition
for qemuDomainBlockCopy API but we only used the @src part. Based on
that assumption the code was changed to parse only the disk <source>
element.
Unfortunately that is not correct as we need to parse some parts of
<driver> element as well.
Fixes: 0202467c4ba8663db2304b140af609f93a9b3091
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Attribute `type` and sub-element `metadata_cache` are internally stored
in the `virStorageSource` structure. Sometimes we only care about the
disk source bits so we need a dedicated helper for that.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
We supported autostart of node devices via an xml element, but this
is not consistent with other libvirt objects which use an explicit API
for setting autostart status. So revert this and implement it as an
official API in a future commit.
The initial support was refactored after merging, so this commit reverts
both of those previous commits.
Revert "virNodeDevCapMdevParseXML: Use virXMLPropEnum() for ./start/@type"
This reverts commit 9d4cd1d1cda84aa15b77a506f2ad6362a74edf1a.
Revert "nodedev: support auto-start property for mdevs"
This reverts commit 42a558549935336cbdb7cbfe8b239ffb0e3442e3.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
In one of my recent commits I've done some renaming. But whilst
doing so I also mistakenly replaced 'goto cleanup' with 'return
-1' in virCapabilitiesHostNUMAInitReal() which was incorrect.
Fixes: fe25224fdaa53bbeceed3ddeef1b3a150665e656
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
virDomainNetRemove() requires the index of the net device you want to
remove from the list, but in some cases you may not have the index
handy, only the object itself (or the object may not have been added
to the domain's list). virDomainNetRemoveByObj() first tries to find
the given object in the nets list, and deletes that if it is found.
As with virDomainNetRemove() it always unconditionally tries to remove
the device from the hostdevs list (in case it is the ridiculous
combined net+hostdev device created for <interface type='hostdev'>).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
So far, we have to places where we format <metadata/> into XMLs:
domain and network. Bot places share the same code. Move it into
a helper function and just call it from those places.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
In case the user wants to share the disk image between multiple VMs the
qemu driver needs to hotplug such disks to instantiate the backends.
Since that doesn't work for all disk configs add a switch to force this
behaviour.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Replace the last use of the function by virDomainDiskInsert and remove
the unused helper.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
None of them are currently needed to pass our upstream CI, most were
either for ancient clang versions or coverity for silencing false
positives.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
They were added mostly randomly and we don't really want to keep working
around of false positives.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
After previous patches we have two structures:
virCapsHostNUMACellDistance and virNumaDistance which express the
same thing. And have the exact same members (modulo their names).
Drop the former in favor of the latter.
This change means that distances with value of 0 are no longer
printed out into capabilities XML, because domain XML code allows
partial distance specification and thus threats value of 0 as
unspecified by user (see virDomainNumaGetNodeDistance() which
returns the default LOCAL/REMOTE distance for value of 0).
Also, from ACPI 6.1 specification, section 5.2.17 System Locality
Distance Information Table (SLIT):
Distance values of 0-9 are reserved and have no meaning.
Thus we shouldn't be ever reporting 0 in neither domain nor
capabilities XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Expose virNumaDistance XML formatter so that it can be re-used by
other parts of the code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
There's nothing domain specific about NUMA distances. Rename the
virDomainNumaDistance structure to just virNumaDistance.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The virCapsHostNUMACellSiblingInfo structure really represents
distance to other NUMA node. Rename the structure and variables
of that type to make it more obvious.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The default value hard-coded in QEMU (64KiB) is not always the ideal.
Having a possibility to set the cluster_size by user may in specific
use-cases improve performance for QCOW2 images.
QEMU internally has some limits, the value has to be between 512B and
2048KiB and must by power of two, except when the image has Extended L2
Entries the minimal value has to be 16KiB.
Since qemu-img ensures the value is correct and the limit is not always
the same libvirt will not duplicate any of these checks as the error
message from qemu-img is good enough:
Cluster size must be a power of two between 512 and 2048k
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/154
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Clang complains:
../libvirt/src/conf/node_device_conf.c:1945:74: error: result of comparison of unsigned enum expression < 0 is always false [-Werror,-Wtautological-unsigned-enum-zero-compare]
if ((mdev->start = virNodeDevMdevStartTypeFromString(starttype)) < 0) {
Fixes: 42a55854993
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`ULLONG_MAX + value + 1`) for attribute `reg`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute, as it
refers to a 32 bit address space.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attribute `aw_bits`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attribute `id`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attribute `latency`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This adds a new element to the mdev capabilities xml schema that
represents the start policy for a defined mediated device. The actual
auto-start functionality is handled behind the scenes by mdevctl, but it
wasn't yet hooked up in libvirt.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attribute `number`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attribute `bufferCount`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attribute `bufferCount`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attribute `port`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attribute `timeout`. Allowing negative
numbers to be interpreted this way makes no sense for this attribute.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
This strictens the parser to disallow negative values (interpreted as
`UINT_MAX + value + 1`) for attributes `cyls`, `heads` and `secs`.
Allowing negative numbers to be interpreted this way makes no sense for
these attributes.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>