virPCIGetNetName is used to get the name of the netdev associated with
a particular PCI device. This is used when we have a VF name, but need
the PF name in order to send a netlink command (e.g. in order to
get/set the MAC address of the VF).
In simple cases there is a single netdev associated with any PCI
device, so it is easy to figure out the PF netdev for a VF - just look
for the PCI device that has the VF listed in its "virtfns" directory;
the only name in the "net" subdirectory of that PCI device's sysfs
directory is the PF netdev that is upstream of the VF in question.
In some cases there can be more than one netdev in a PCI device's net
directory though. In the past, the only case of this was for SR-IOV
NICs that could have multiple PF's per PCI device. In this case, all
PF netdevs associated with a PCI address would be listed in the "net"
subdirectory of the PCI device's directory in sysfs. At the same time,
all VF netdevs and all PF netdevs have a phys_port_id in their sysfs,
so the way to learn the correct PF netdev for a particular VF netdev
is to search through the list of devices in the net subdirectory of
the PF's PCI device, looking for the one netdev with a "phys_port_id"
matching that of the VF netdev.
But starting in kernel 5.8, the NVIDIA Mellanox driver began linking
the VFs' representor netdevs to the PF PCI address [1], and so the VF
representor netdevs would also show up in the net
subdirectory. However, all of the devices that do so also only have a
single PF netdev for any given PCI address.
This means that the net directory of the PCI device can still hold
multiple net devices, but only one of them will be the PF netdev (the
others are VF representors):
$ ls '/sys/bus/pci/devices/0000:82:00.0/net'
ens1f0 eth0 eth1
In this case the way to find the PF device is to look at the
"phys_port_name" attribute of each netdev in sysfs. All PF devices
have a phys_port_name matching a particular regex
(p[0-9]+$)|(p[0-9]+s[0-9]+$)
Since there can only be one PF in the entire list of devices, once we
match that regex, we've found the PF netdev.
[1] - https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/
commit/?id=123f0f53dd64b67e34142485fe866a8a581f12f1
Co-Authored-by: Moshe Levi <moshele@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Adrian Chiris <adrianc@nvidia.com>
Reviewed-by: Laine Stump <laine@redhat.com>
This commit add virNetDevGetPhysPortName to read netdevice
phys_port_name from sysfs. It also refactor the code so
virNetDevGetPhysPortName and virNetDevGetPhysPortID will use
same method to read the netdevice sysfs.
Signed-off-by: Moshe Levi <moshele@nvidia.com>
Reviewed-by: Laine Stump <laine@redhat.com>
The code handles XML bits and internal definition and should be
in conf directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The code handles XML bits and internal definition and should be
in conf directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Same as virStorageFileBackend, it doesn't belong into util directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
It's used only by storage file code so it doesn't make sense to have
it in util directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Up until now we had a runtime code and XML related code in the same
source file inside util directory.
This patch takes the runtime part and extracts it into the new
storage_file directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This code is not directly relevant to virStorageSource so move it to
separate file.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
This will allow following patches to move virStorageSource into conf
directory and virStorageDriverData into a new storage_file directory.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Since Hyper-V allows multiple VMs to be created with the same name,
some commands produce unpredictable results due to
hypervDomainLookupByName's WMI query selecting the wrong domain.
For example, this prevents `virsh dumpxml` from outputting XML for the
wrong domain.
Signed-off-by: Matt Coleman <matt@datto.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Extract common code as helper function virNetlinkTalk, then simplify
the functions virNetlink[DumpLink|NewLink|DelLink|GetNeighbor].
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Introduce a macro NETLINK_MSG_APPEND to wrap nlmsg_append and
simplify code. Remove those labels 'buffer_too_small', since they
are now useless.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Move macros NETLINK_MSG_[NEST_START|NEST_END|PUT] from .h into .c;
within these macros, replace 'goto' with reporting error and returning;
simplify virNetlinkDumpLink and virNetlinkDelLink by using NETLINK_MSG_PUT.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
NLM_F_CREATE and NLM_F_EXCL are invalid for RTM_DELLINK,
so remove them.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
So far we assumed that any vhostuser interface is plugged into an
OVS bridge and thus 'ovs-vsctl' exists. But this is not always
true. In testing scenarios it is possible to create a vhostuser
interface with this tool dpdk-testpmd (part of dpdk RPM) which
creates/connects to UNIX socket needed for vhostuser. Of course,
since there is no OVS then there is no interface name in which
case virNetDevOpenvswitchGetVhostuserIfname() should return 0.
The rest of APIs that assume OVS are not 'fixed' because we still
want them to fail (e.g. getting statistics, plugging interface
into an OVS bridge, unplugging it from an OVS bridge, ...).
The only API that is fixed is
virNetDevOpenvswitchGetVhostuserIfname() because it is called
explicitly when starting a guest (and callers are okay if no name
was found).
The other way to fix this bug seems to be to simply require
'ovs-vsctl' on spec file level, but that is too heavy gun given
that vhostuser is used by a small set of our users (assumption
made on requirements for vhostuser). Also, this way would drag in
yet another dependency for all users (even those who want minimal
libvirt).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1913156
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
dnsmasq usually prints out a version string like this:
Dnsmasq version 2.82 [...]
but a user reported that the build of dnsmasq included with pihole has
a version string like this:
Dnsmasq version pi-hole-2.81 [...]
We parse the dnsmasq version number to figure out if the dnsmasq
binary supports certain features. Since we expect the version number
(and it must be only numbers!) to start on the first non-space after
the string "Dnsmasq version", we fail to parse this format of the
version string.
Rather than spending a bunch of time trying to get pihole to change
that, we can just make our parsing more permissive - after searching
for "Dnsmasq version", we'll skip ahead to the first decimal digit,
rather than just the first non-space.
(NB: The features we're checking for purely by looking at version
number have been in all releases of dnsmasq since at least 2012, so we
could actually just remove the reading of the version number
completely. However it's possible (although *highly* unlikely)
that some new feature would be added to dnsmasq in the future and we
would need to add that code back.)
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/29
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This function skips over the beginning of a string until it reaches a
decimal digit (0-9) or the NULL at the end of the string. The original
pointer is modified in place (similar to virSkipSpaces()).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
In certain specific cases it might be beneficial to be able to control
the metadata caching of storage image format drivers of a hypervisor.
Introduce XML machinery to set the maximum size of the metadata cache
which will be used by qemu's qcow2 driver.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Enable parsing of backing store strings containing the native 'nfs'
protocol specification.
Signed-off-by: Ryan Gahagan <rgahagan@cs.utexas.edu>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
'nfs_user'/'nfs_group' represents the XML configuration.
'nfs_uid'/'nfs_gid' is internal store when libvirt looks up the user's
uid/gid in the system.
Signed-off-by: Ryan Gahagan <rgahagan@cs.utexas.edu>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
virJSONValueObjectRemoveKey can be used as direct replacement. Fix the
one caller and remove the duplicate function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The lookup didn't do anything apart from comparing the sysfs paths
anyway since that's what makes each mdev unique.
The most ridiculous usage of the old logic was in
virHostdevReAttachMediatedDevices where in order to drop an mdev
hostdev from the list of active devices we first had to create a new
mdev and use it in the lookup call. Why couldn't we have used the
hostdev directly? Because the hostdev and mdev structures are
incompatible.
The way mdevs are currently removed is via a write to a specific sysfs
attribute. If you do it while the machine which has the mdev assigned
is running, the write call may block (with a new enough kernel, with
older kernels it would return a write error!) until the device
is no longer in use which is when the QEMU process exits.
The interesting part here comes afterwards when we're cleaning up and
call virHostdevReAttachMediatedDevices. The domain doesn't exist
anymore, so the list of active hostdevs needs to be updated and the
respective hostdevs removed from the list, but remember we had to
create an mdev object in the memory in order to find it in the list
first which will fail because the write to sysfs had already removed
the mdev instance from the host system.
And so the next time you try to start the same domain you'll get:
"Requested operation is not valid: mediated device <path> is in use by
driver QEMU, domain <name>"
Fixes: https://gitlab.com/libvirt/libvirt/-/issues/119
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
virDeviceHasPCIExpressLink() wasn't checking that pcie_cap_pos was
valid before attempting to use it, which could lead to reading the
byte at offset 0 + PCI_CAP_ID_EXP instead of [valid offset] +
PCI_CAP_ID_EXP. In particular, this could happen for "integrated" PCI
devices (those that are on the PCIe root complex). If it happened that
the byte from the wrong address had the "right" bit set, then it would
lead to us innappropriately believing that Express Link info was
available when it wasn't, and the node device driver would then log an
error like this:
virPCIDeviceGetLinkCapSta:2754 :
internal error: pci device 0000:00:18.0 is not a PCI-Express device
during a libvirtd restart. (this didn't ever occur until after
virPCIDeviceIsPCIExpress() was made more intelligent in commit
c00b6b1ae, which hasn't yet been in any official release)
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The only reason why virstoragefile.h needs to be included in virfile.h
is that virFileNBDDeviceAssociate() takes virStorageFileFormat argument.
The function doesn't need the enum value as it converts the value to
string and uses only that.
Change the argument to string which will allow us to remove that
include.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
All these headers are indirectly included provided by virfile.h having
virstoragefile.h which will be removed in the following patch.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The function doesn't take virStorageSource as argument and has nothing
in common with virStorageSource or storage file.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Function virQEMUBuildQemuImgKeySecretOpts is not used anywhere else
so there is no need to have it in util.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The last usage outside of tests was removed by commit
<780f8c94ca8b3dee7eb59c1bfbc32f672f965df8>.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The last user was removed by commit
<40f0e0348dfc84f28a500e262c4953b0d3b44fa0>.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When adding a new lease by our leaseshelper then virLeaseNew() is
called. Here, we check for DNSMASQ_LEASE_EXPIRES environment
variable which is the expiration time for the lease. For infinite
lease time the value is zero. However, our code is not prepared
for that and adds "expiry-time" into the JSON file only if lease
expiry time is non-zero. This breaks the assumption that the
"expiry-time" attribute is always present (as can be seen in
virLeaseReadCustomLeaseFile() and virLeasePrintLeases()).
Store "expiry-time" always.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
In virLeaseNew() we are trying to remove trailing space (per
comment it may happen that older versions of dnsmasq put it into
an env variable). Well, instead of open coding it, we can use
virTrimSpaces().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
There are some variables which are used only inside the single
loop the function has. Let's declare them inside the loop body to
make that obvious. Also, fix indendation.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
During testing of my patch v6.10.0-rc1~221 it was found that
'ovs-vsctl get Interface $name name' or
'ovs-vsctl find Interface options:vhost-server-path=$path'
may return a string in double quotes, e.g. "vhost-user1". Later
investigation of openvswitch code showed, that early versions
(like 1.3.0) have somewhat restrictive set of safe characters
(isalpha() || '_' || '-' || '.'), which is then refined with
increasing version. For instance, version 2.11.4 has: isalnum()
|| '_' || '-' || '.'. If the string that ovs-vsctl wants to
output contains any other character it is escaped. You want to be
looking at ovsdb_atom_to_string() which handles outputting of a
single string and calls string_needs_quotes() and possibly
json_serialize_string() in openvswitch code base.
Since the interfaces are usually named "vhost-userN" we are
facing a problem where with one version we get the name in double
quotes and with another we get plain name without funny business.
Because of json involved I thought, let's make ovs-vsctl output
into JSON format and then use our JSON parser, but guess what -
ovs-vsctl ignores --format=json. But with a little help of
g_strdup_printf() it can be turned into JSON.
Fixes: e4c29e2904
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1767013
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
In v6.10.0-rc1~221 I wanted to make virNetDevOpenvswitchGetVhostuserIfname()
lookup interface name even for vhostuser interfaces with mode='server'. For
these, we are given a socket path which is then created by QEMU and to which
OpenVSwitch connects to and creates an interface. Because of this, we don't
know the name of the interface upfront (when starting QEMU) and have to use
the path to query OpenVSwitch later (using ovs-vsctl). What I intended to use
was:
ovs-vsctl --no-headings --columns=name find Interface options:vhost-server-path=$path
But what my code does is:
ovs-vsctl --no-headings --columns=name find Interface options:vhost-server-path=path
and it's all because the argument to the function is named "path"
which I then enclosed in double quotes while it should have been
used as a variable.
Fixes: e4c29e2904
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1767013
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
The comment about auto-generating names was obsoleted by recent
changes, and there was an unnecessary set of braces around a single
line conditional body.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since commit 282d135ddb the parser for <interface> has cleared out
any interface name from the input XML that used the macvtap/macvlan
name as a prefix. Along with that, the switch to use the new
virNetDevGenerateName() function for auto-generating macvtap/macvlan
device names (commit 9b5d741a9), has realized two facts:
1) virNetDevGenerateName() can be called with a name already filled
in, and in that case it is an effective NOP.
2) because virNetDevGenerate() will always find an unused name, there
is no need to retry device creation in a loop - if it fails the
first time, it would fail any subsequent time as well.
that, combined with the aforementioned parser change allow us to
simplify virNetDevMacVLanCreateWithVPortProfile() - we no longer need
any extra code to determine if a template "AutoName" was requested,
and don't need a separate code path for creating the device in the
case that a specific name was given in the XML - all we need to do is
log any requested name, and then call exactly the same code as we
would if no name was given.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The Linux implementation of virNetDevCreate() doesn't require a
template ifname (e.g. "vnet%d") when it is called, but just generates
a new name if ifname is empty. The FreeBSD implementation requires
that the caller actually fill in a template ifname, and will fail if
ifname is empty. Since we want to eliminate all the special code in
callers that is setting the template name, we need to make the
behavior of the FreeBSD virNetDevCreate() match the behavior of the
Linux virNetDevCreate().
The simplest way to do this is to use the new virNetDevGenerateName()
function - if ifname is empty it generates a new name with the proper
prefix, and if it's not empty, it leaves it alone.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
When netlink is supported, use netlink to create veth device pair
rather than 'ip link' command.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Simplify virNetDevVethCreate by using common GenerateName/ReserveName
functions.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Simplify ReserveName/GenerateName for macvlan and macvtap by using
common functions.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Simplify GenerateName/ReserveName for netdevtap by using common
functions.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Extract ReserveName/GenerateName from netdevtap and netdevmacvlan as
common helper functions.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Laine Stump <laine@redhat.com>
Until now there has been an extra bit of code in
qemuDomainDeviceCalculatePCIConnectFlag() (one of the two callers of
virPCIDeviceIsPCIExpress()) that tries to determine if a device is
PCIe by looking at the *length* of its sysfs config file; it only does
this when libvirt is running as a non-root process.
This patch takes advantage of our newfound ability to tell the
difference between "I read a 0 from the device PCI config file" and "I
couldn't read the PCI Express Capabilities because I don't have
sufficient permission" to put the file length check down in
virPCIDeviceIsPCIExpress(), and do that check any time we fail while
reading the config file (not only when the process is non-root).
Fixes: https://bugzilla.redhat.com/1901685
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Previously there was no way to differentiate between this function 1)
encountering an error while reading the pci config, and 2) determining
that the device in question is a conventional PCI device, and so has
no Express Capabilities.
The difference between these two conditions is important, because an
unprivileged libvirtd will be unable to read all of the pci config (it
can only read the first 64 bytes, and will get ENOENT when it tries to
seek past that limit) even though the device is in fact a PCIe device.
This patch changes virPCIDeviceFindCapabilityOffset() to put the
determined offset into an argument of the function (rather than
sending it back as the return value), and to return the standard "0 on
success, -1 on failure". Failure is determined by checking the value
of errno after each attemptd read of the config file (which can only
work reliably if errno is reset to 0 before each read, and after
virPCIDeviceFindCapabilityOffset() has finished examining it).
(NB: if the config file is read successfully, but no Express
Capabilities are found, then the function returns success, but the
returned offset will be 0 (which is an impossible offset for Express
Capabilities, and so easily recognizeable).
An upcoming patch will take advantage of the change made here.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The new message is more verbose/useful, but only logged at debug level
instead of as a warning (since it could easily happen in a non-error
situation).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This function returned an int, but would only return 0 or 1, and the
one place it was called would just use !! to convert that value to a
bool. Change the function to directly return bool instead.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This function returned an int, and that int was being checked for < 0
in its solitary caller, but within the function it would only ever
return 0 or 1. Change the function itself to return a bool, and the
caller to just directly set the flag in the virPCIDevice.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
We previous added code for passing FDs which was explicitly derived from
gnulib's passfd code:
commit 17460825f3
Author: Daniel P. Berrangé <berrange@redhat.com>
Date: Fri Jan 17 11:57:17 2020 +0000
src: implement APIs for passing FDs over UNIX sockets
This is a simplified variant of gnulib's passfd module
without the portability code that we do not require.
while the license was unchanged, we mistakenly failed to copy the FSF
copyright header which is required by the license terms.
Reported-by: Bruno Haible <bruno@clisp.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This macro checks whether given number is an integer power of
two. At the same time, I've identified two places where we check
for pow2 and I'm replacing them with the macro.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Han Han <hhan@redhat.com>
The only test we do when checking for UUID validity is that
whether all bytes are the same (invalid UUID) or not (valid
UUID). The algorithm we use is needlessly complicated.
Also, the checked UUID is not modified and hence the argument can
be of 'const' type.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Han Han <hhan@redhat.com>
In this previous commit:
commit 65491a2dfe
Author: Martin Kletzander <mkletzan@redhat.com>
Date: Thu Nov 12 13:58:53 2020 +0100
Do not disable incompatible-pointer-types-discards-qualifiers
We selectively rewrite G_DEFINE_TYPE to avoid warnings about
mismatched volatile/non-volatile pointers that appeared with
CLang when using GLib2 >= 2.67
We have now just hit the reverse problem, GCC >= 11 has started
warning about mismatched volatile/non-volatile pointers but only
with GLib2 < 2.67. The new GLib2 avoids the warning, as does
older GCC.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Since 032548c4 @cmd was never autofree'd. Perhaps as a result of
VIR_AUTOPTR type changes occurring at roughly the same time so the
copy pasta missed this.
Found by Coverity.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Now that no one uses VIR_AUTOSTRINGLIST it can be dropped.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Glib provides g_auto(GStrv) which is in-place replacement of our
VIR_AUTOSTRINGLIST.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
This will open an opportunity to modernize virDomainDiskDefParseXML()
in the next patch.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Just like virCommandPassFD, but it also returns an index of
the passed FD in the FD set.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
The virJSONValueObjectGetStringArray() function is given a @key
which is supposed to be an array inside given @object. Well, if
it's not then an error state is returned (NULL), but no error
message is set.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The original logic is incorrect. We would delete the device entry
from eBPF map only if the newval would be same as current val in the
map. In case that the device was allowed only as read-only but later
we remove all permissions for that device it would remain in the table
with empty values.
The old code would still deny the device but it's not working as
intended. Instead we will update the value in advance. If the updated
value is 0 it means that we are removing all permissions so it should
be removed from the map, otherwise we will update the value in map.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1810356
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Kernel commit <d505b8af58912ae1e1a211fabc9995b19bd40828> added proper
check for cpu quota maximum limit to prevent internal overflow.
Even though this change is not present in all kernels it makes sense
to enforce the same limit in libvirt.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1750315
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Tested-by: Han Han <hhan@redhat.com>
I previously did a workaround for a glib event loop race
that causes crashes:
commit 0db4743645
Author: Daniel P. Berrangé <berrange@redhat.com>
Date: Tue Jul 28 16:52:47 2020 +0100
util: avoid crash due to race in glib event loop code
it turns out that the workaround has a significant performance
penalty on I/O intensive workloads. We thus need to avoid the
workaround if we know we have a new enough glib to avoid the
race condition.
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When libvirt added support for firewalld, we were unable to use
firewalld's higher level rules, because they weren't detailed enough
and could not be applied to the iptables FORWARD or OUTPUT chains
(only to the INPUT chain). Instead we changed our code so that rather
than running the iptables/ip6tables/ebtables binaries ourselves, we
would send these commands to firewalld as "passthrough commands", and
firewalld would run the appropriate program on our behalf.
This was done under the assumption that firewalld was somehow tracking
all these rules, and that this tracking was benefitting proper
operation of firewalld and the system in general.
Several years later this came up in a discussion on IRC, and we
learned from the firewalld developers that, in fact, adding iptables
and ebtables rules with firewalld's passthrough commands actually has
*no* advantage; firewalld doesn't keep track of these rules in any
way, and doesn't use them to tailor the construction of its own rules.
Meanwhile, users have been complaining for some time that whenever
firewalld is restarted on a system with libvirt virtual networks
and/or nwfilter rules active, the system logs would be flooded with
warning messages whining that [lots of different rules] could not be
deleted because they didn't exist. For example:
firewalld[3536040]: WARNING: COMMAND_FAILED:
'/usr/sbin/iptables -w10 -w --table filter --delete LIBVIRT_OUT
--out-interface virbr4 --protocol udp --destination-port 68
--jump ACCEPT' failed: iptables: Bad rule
(does a matching rule exist in that chain?).
(See https://bugzilla.redhat.com/1790837 for many more examples and a
discussion)
Note that these messages are created by iptables, but are logged by
firewalld - when an iptables/ebtables command fails, firewalld grabs
whatever is in stderr of the program, and spits it out to the system
log as a warning. We've requested that firewalld not do this (and
instead leave it up to the calling application to do the appropriate
logging), but this request has been respectfully denied.
But combining the two problems above ( 1) firewalld doesn't do
anything useful when you use it as a proxy to add/remove iptables
rules, 2) firewalld often insists on logging lots of
annoying/misleading/useless "error" messages when you use it as a
proxy to remove iptables rules that don't already exist), leads to a
solution - simply stop using firewalld to add and remove iptables
rules. Instead, exec iptables/ip6tables/ebtables directly in the same
way we do when firewalld isn't active.
We still need to keep track of whether or not firewalld is active, as
there are some things that must be done, e.g. we need to add some
actual firewalld rules in the firewalld "libvirt" zone, and we need to
take notice when firewalld restarts, so that we can reload all our
rules.
This patch doesn't remove the infrastructure that allows having
different firewall backends that perform their functions in different
ways, as that will very possibly come in handy in the future when we
want to have an nftables direct backend, and possibly a "pure"
firewalld backend (now that firewalld supports more complex rules, and
can add those rules to the FORWARD and OUTPUT chains). Instead, it
just changes the action when the selected backend is "firewalld" so
that it adds rules directly rather than through firewalld, while
leaving as much of the existing code intact as possible.
In order for tests to still pass, virfirewalltest also had to be
modified to behave in a different way (i.e. by capturing the generated
commandline as it does for the DIRECT backend, rather than capturing
dbus messages using a mocked dbus API).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
When it is starting up, firewalld will delete all existing iptables
rules and chains before adding its own rules. If libvirtd were to try
to directly add iptables rules during the time before firewalld has
finished initializing, firewalld would end up deleting the rules that
libvirtd has just added.
Currently this isn't a problem, since libvirtd only adds iptables
rules via the firewalld "passthrough command" API, and so firewalld is
able to properly serialize everything. However, we will soon be
changing libvirtd to add its iptables and ebtables rules by directly
calling iptables/ebtables rather than via firewalld, thus removing the
serialization of libvirtd adding rules vs. firewalld deleting rules.
This will especially apparent (if we don't fix it in advance, as this
patch does) when libvirtd is responding to the dbus NameOwnerChanged
event, which is used to learn when firewalld has been restarted. In
that case, dbus sends the event before firewalld has been able to
complete its initialization, so when libvirt responds to the event by
adding back its iptables rules (with direct calls to
/usr/bin/iptables), some of those rules are added before firewalld has
a chance to do its "remove everything" startup protocol. The usual
result of this is that libvirt will successfully add its private
chains (e.g. LIBVIRT_INP, etc), but then fail when it tries to add a
rule jumping to one of those chains (because in the interim, firewalld
has deleted the new chains).
The solution is for libvirt to preface it's direct calling to iptables
with a iptables command sent via firewalld's passthrough command
API. Since commands sent to firewalld are completed synchronously, and
since firewalld won't service them until it has completed its own
initialization, this will assure that by the time libvirt starts
calling iptables to add rules, that firewalld will not be following up
by deleting any of those rules.
To minimize the amount of extra overhead, we request the simplest
iptables command possible: "iptables -V" (and aside from logging a
debug message, we ignore the result, for good measure).
(This patch is being done *before* the patch that switches to calling
iptables directly, so that everything will function properly with any
fractional part of the series applied).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Even though *we* don't call ebtables/iptables/ip6tables (yet) when the
firewalld backend is selected, firewalld does, so these binaries need
to be there; let's check for them. (Also, the patch after this one is
going to start execing those binaries directly rather than via
firewalld).
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
iptables and ip6tables have had a "-w" commandline option to grab a
systemwide lock that prevents two iptables invocations from modifying
the iptables chains since 2013 (upstream commit 93587a04 in
iptables-1.4.20). Similarly, ebtables has had a "--concurrent"
commandline option for the same purpose since 2011 (in the upstream
ebtables commit f9b4bcb93, which was present in ebtables-2.0.10.4).
Libvirt added code to conditionally use the commandline option for
iptables/ip6tables in upstream commit ba95426d6f (libvirt-1.2.0,
November 2013), and for ebtables in upstream commit dc33e6e4a5
(libvirt-1.2.11, November 2014) (the latter actually *re*-added the
locking for iptables/ip6tables, as it had accidentally been removed
during a refactor of firewall code in the interim).
I say "conditionally" because a check was made during firewall module
initialization that tried executing a test command with the
-w/--concurrent option, and only continued using it for actual
commands if that test command completed successfully. At the time the
code was added this was a reasonable thing to do, as it had been less
than a year since introduction of -w to iptables, so many distros
supported by libvirt were still using iptables (and possibly even
ebtables) versions too old to have the new commandline options.
It is now 2020, and as far as I can discern from repology.org (and
manually examining a RHEL7.9 system), every version of every distro
that is supported by libvirt now uses new enough versions of both
iptables and ebtables that they all have support for -w/--concurrent.
That means we can finally remove the conditional code and simply
always use them.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
All the unit tests that use iptables/ip6tables/ebtables have been
written to omit the locking/exclusive use primitive on the generated
commandlines. Even though none of the tests actually execute those
commands (and so it doesn't matter for purposes of the test whether or
not the commands support these options), it still made sense when some
systems had these locking options and some didn't.
We are now at a point where every supported Linux distro has supported
the locking options on these commands for quite a long time, and are
going to make their use non-optional. As a first step, this patch uses
the virFirewallSetLockOverride() function, which is called at the
beginning of all firewall-related tests, to set all the bools
controlling whether or not the locking options are used to true. This
means that all the test cases must be updated to include the proper
locking option in their commandlines.
The change to make actual execs of the commands unconditionally use
the locking option will be in an upcoming patch - this one affects
only the unit tests.
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Commit 912c6b22fc added abort() when the
'val' parameter is NULL along with setting the error variable for the
command. We don't want to abort in this case, just set the error.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Using virtCgroupNewSelf() is not correct with cgroups v2 because the
the virt-host-validate process is executed from from the same cgroup
context as the terminal and usually not all controllers are enabled
by default.
To do a proper check we need to use the root cgroup to see what
controllers are actually available. Libvirt or systemd ensures that
all controllers are available for VMs as well.
This still doesn't solve the devices controller with cgroups v2 where
there is no controller as it was replaced by eBPF. Currently libvirt
tries to query eBPF programs which usually works only for root as
regular users will get permission denied for that operation.
Fixes: https://gitlab.com/libvirt/libvirt/-/issues/94
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
This reverts commit b3710e9a2a.
That check is very valuable for our code, but it causes issue with glib >=
2.67.0 when building with clang.
The reason is a combination of two commits in glib, firstly fdda405b6b1b which
adds a g_atomic_pointer_{set,get} variants that enforce stricter type
checking (by removing an extra cast) for compilers that support __typeof__, and
commit dce24dc4492d which effectively enabled the new variant of glib's atomic
code for clang. This will not be necessary when glib's issue #600 [0] (8 years
old) is fixed. Thankfully, MR #1719 [1], which is supposed to deal with this
issue was opened 3 weeks ago, so there is a slight sliver of hope.
[0] https://gitlab.gnome.org/GNOME/glib/-/issues/600
[1] https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1719
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Introduced by commit <22494556542c676d1b9e7f1c1f2ea13ac17e1e3e> which
fixed a CVE.
If the @path passed to virDMSanitizepath() is not a DM name or not a
path to DM name this function could return incorrect sanitized path as
it would always be the first device under /dev/mapper/.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
While it's certainly good to log events like "failed to close fd"
and "tried to close invalid fd", which are likely to be the
consequence of some bug in libvirt, logging a message every single
time a file descriptor is closed successfully is perhaps excessive
and can lead to useful information being missed among the noise.
Log filters don't help in this situation, because filtering out all
of util.file is too big a hammer and would cause important messages
to be left out as well.
To give an idea of just how much noise this single debug statement
can cause, here's a real life example from a quite large libvirtd
log I had to look at recently:
$ grep virFile libvirt.log | wc -l
1307
$ grep virFile libvirt.log | grep -v 'Closed fd' | wc -l
343
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Before commit 24d8968c, virDirClose took a DIR**, and that was never
NULL, so its declaration included ATTRIBUTE_NONNULL(1). Since that
commit, virDirClose takes a DIR*, and it may be NULL (e.g. if the DIR*
is initialized to NULL and was never closed).
Even though virDirClose() is currently only called implicitly (as the
cleanup for a g_autoptr(DIR)), and (as I've just newly learned) the
autocleanup function g_autoptr will only be called if the pointer in
question is non-null (see the definition of
_GLIB_AUTOPTR_CLEAR_FUNC_NAME in
/usr/include/glib-2.0/glib/gmacros.h), it does still cause Coverity to
complain that it *could* be called with a NULL, and it's also possible
that in the future someone might add code that explicitly calls
virDirClose.
To eliminate the Coverity complaints, and protect against the
hypothetical future where someone both explicitly calls virDirClose()
with a potentially NULL value, *and* re-enables the nonnull directive
when not building with Coverity (disabled by commit eefb881) this
patch removes the ATTRIBUTE_NONNULL(1) from the declaration of
virDirClose().
Fixes: 24d8968cd0
Reported-by: John Ferlan <jferlan@redhat.com>
Details-Research-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Laine Stump <laine@redhat.com>