This is a return argument that is to be compared against NULL on
successful return. However, it is not initialized and therefore
relies on callers setting it to NULL prior to calling the function.
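A minimal sketch of the fix pattern, assuming a hypothetical helper
demo_lookup() rather than the actual libvirt function: the return
argument is set to NULL at the very top, so callers can compare it
against NULL after a successful return without pre-initializing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a libvirt API.
 * Returns 0 on success and -1 on error. On success *name may still be
 * NULL, meaning "nothing found". */
static int
demo_lookup(int state, const char **name)
{
    assert(name != NULL);

    *name = NULL;            /* initialize the return argument first */

    if (state < 0)
        return -1;           /* error: *name is already NULL */
    if (state > 0)
        *name = "eth0";      /* found something */
    return 0;
}
```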
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Removing redundant sections of the code
Signed-off-by: Radoslaw Biernacki <radoslaw.biernacki@linaro.org>
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
libvirt wrongly assumes that VF netdev has to have the
netdev assigned to PF. There is no such requirement in SRIOV standard.
This patch changes the virNetDevSwitchdevFeature() function to deal
with SRIOV devices which do not have a netdev on the PF. It also corrects
one comment about the PF netdev assumption.
One example of such devices is ThunderX VNIC.
By applying this change, VF device is used for virNetlinkCommand() as
it is the only netdev assigned to VNIC.
Signed-off-by: Radoslaw Biernacki <radoslaw.biernacki@linaro.org>
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
In many files there are header comments that contain an Author:
statement, supposedly reflecting who originally wrote the code.
In a large collaborative project like libvirt, any non-trivial
file will have been modified by a large number of different
contributors. IOW, the Author: comments are quickly out of date,
omitting people who have made significant contributions.
In some places Author: lines have been added despite the person
merely being responsible for creating the file by moving existing
code out of another file. IOW, the Author: lines give an incorrect
record of authorship.
With this all in mind, the comments are useless as a means to identify
who to talk to about code in a particular file. Contributors will always
be better off using 'git log' and 'git blame' if they need to find the
author of a particular bit of code.
This commit thus deletes all Author: comments from the source and adds
a rule to prevent them reappearing.
The Copyright headers are similarly misleading and inaccurate, however,
we cannot delete these as they have legal meaning, despite being largely
inaccurate. In addition only the copyright holder is permitted to change
their respective copyright statement.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
There seems to be no need to add the ignore_value wrapper or
cast with (void) to the unlink() calls, so let's just remove
them. I assume at one point in time Coverity complained. So,
let's just be consistent - those that care to check the return
status can and those that don't can just have the naked unlink.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
nlmsg_append from the libnl library provides exactly the same
functionality, so we should rely on that instead. This also allows us to
drop the aforementioned function completely.
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
There's a single user for it which takes an existing
virPCIDeviceAddress, passes its various bits to the
function which in turn constructs a virPCIDevice and
then copies the string representation for the caller
to use: we can use virPCIDeviceAddressAsString()
instead and avoid creating the virPCIDevice in the
first place. Since the function ends up having no
users after the change, we can just drop it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Commits 7b706f33ac and 4acb7887e4 introduced some compound type *Free
wrappers in order to use them with VIR_DEFINE_AUTOPTR_FUNC. However,
since those were not used in the code right away, Clang complained about
unused functions (static ones that are defined by the macro above).
This patch puts the defined functions in use.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Using the new VIR_DEFINE_AUTOPTR_FUNC macro defined in
src/util/viralloc.h, define a new wrapper around an existing
cleanup function which will be called when a variable declared
with VIR_AUTOPTR macro goes out of scope. Also, drop the redundant
viralloc.h include, since that has moved from the source module into
the header.
When variables of type virNetDevRxFilterPtr and virNetDevMcastEntryPtr
are declared using VIR_AUTOPTR, the functions virNetDevRxFilterFree
and virNetDevMcastEntryFree, respectively, will be run
automatically on them when they go out of scope.
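As a rough illustration of the mechanism, here is a sketch built on the
GCC/Clang cleanup attribute. All DEMO_* and demo_* names are
hypothetical and only mimic the shape of VIR_DEFINE_AUTOPTR_FUNC and
VIR_AUTOPTR; they are not the definitions from viralloc.h.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_filter { char *name; };

static int demo_free_calls;   /* counts frees, for demonstration only */

static void
demo_filter_free(struct demo_filter *f)
{
    if (!f)
        return;
    free(f->name);
    free(f);
    demo_free_calls++;
}

/* Defines a wrapper that takes a pointer to the variable, as the
 * cleanup attribute requires, and forwards to the real free function. */
#define DEMO_DEFINE_AUTOPTR_FUNC(type, func) \
    static void type##_autoptr_cleanup(struct type **p) { func(*p); }

/* Declares a pointer whose wrapper runs when it goes out of scope. */
#define DEMO_AUTOPTR(type) \
    __attribute__((cleanup(type##_autoptr_cleanup))) struct type *

DEMO_DEFINE_AUTOPTR_FUNC(demo_filter, demo_filter_free)

static int
demo_use_filter(void)
{
    DEMO_AUTOPTR(demo_filter) f = calloc(1, sizeof(*f));

    if (!f)
        return -1;
    assert(f->name == NULL);
    return 0;                 /* f is freed automatically here */
}
```

No explicit free call or cleanup label is needed on any return path of
demo_use_filter().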
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Currently, the functions return a pointer to the
destination buffer on success or NULL on failure.
Not only does this kind of error handling look quite
alien in the context of libvirt, where most functions
return zero on success and a negative int on failure,
but it's also somewhat pointless because unless there's
been a failure the returned pointer will be the same
one passed in by the user, thus offering no additional
value.
Change the functions so that they return an int
instead.
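The new convention can be sketched as follows. demo_strcpy() is a
hypothetical stand-in written for this illustration, not libvirt's
virStrcpy(); it only shows the "0 on success, negative on failure"
return style.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in: copy src into dest, which holds destbytes
 * bytes. Returns 0 on success, -1 if src (plus its NUL) does not fit. */
static int
demo_strcpy(char *dest, const char *src, size_t destbytes)
{
    size_t n;

    assert(dest && src);

    n = strlen(src);
    if (n >= destbytes)       /* no room for the terminating NUL */
        return -1;

    memcpy(dest, src, n + 1); /* copy the string and its NUL */
    return 0;
}
```

A caller then checks "if (demo_strcpy(buf, src, sizeof(buf)) < 0)"
like it would for most other libvirt-style functions.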
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
virStrncpy() allows us to copy a substring, but if we're
going to copy the entire thing it's much more convenient
to use virStrcpy() instead.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
This convenience macro was created for the simple cases
where the length of the source string and the size of the
destination buffer can be figured out with strlen() and
sizeof() respectively, so we should use it wherever
possible instead of open-coding parts of it.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
This makes it easier to see why libvirt has decided it must re-attach
a tap device to its bridge.
Signed-off-by: Laine Stump <laine@laine.org>
ACKed-by: Michal Privoznik <mprivozn@redhat.com>
Commit 8708ca01c added virNetDevSwitchdevFeature() to check if a network
device has Switchdev capabilities. virNetDevSwitchdevFeature() attempts
to retrieve the PCI device associated with the network device, ignoring
non-PCI devices. It does so via the following call chain
virNetDevSwitchdevFeature()->virNetDevGetPCIDevice()->
virPCIGetDeviceAddressFromSysfsLink()
For non-PCI network devices (qeth, Xen vif, etc),
virPCIGetDeviceAddressFromSysfsLink() will report an error when
virPCIDeviceAddressParse() fails. virPCIDeviceAddressParse() also
logs an error. After commit 8708ca01c there are now two errors reported
for each non-PCI network device even though the errors are harmless.
To avoid the errors, introduce virNetDevIsPCIDevice() and use it in
virNetDevGetPCIDevice() before attempting to retrieve the associated
PCI device. virNetDevIsPCIDevice() uses the 'subsystem' property of the
device to determine if it is PCI. See the sysfs rules in kernel
documentation for more details
https://www.kernel.org/doc/html/latest/admin-guide/sysfs-rules.html
Right-aligning backslashes when defining macros or using complex
commands in Makefiles looks cute, but as soon as any change is
required to the code you end up with either distractingly broken
alignment or unnecessarily big diffs where most of the changes
are just pushing all backslashes a few characters to one side.
Generated using
$ git grep -El '[[:blank:]][[:blank:]]\\$' | \
grep -E '*\.([chx]|am|mk)$$' | \
while read f; do \
sed -Ei 's/[[:blank:]]*[[:blank:]]\\$/ \\/g' "$f"; \
done
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Commit id '8708ca01c' added a check to determine whether the NIC had
Switchdev capabilities; however, in doing so inadvertently would cause
network devices without a PCI device to not be added to the node device
database. Thus, network devices having a "computer" as a parent, such
as "net_lo*", "net_virbr*", "net_tun*", "net_vnet*", etc. were not added.
Alter the check to not even check for Switchdev bits if no PCI device is found.
After commit 8708ca01c0 libvirtd consistently aborts with "stack
smashing detected" when nodedev driver is initialized.
This is caused by nlmsg_parse() being told that its array of nlattr*
has CTRL_CMD_MAX (10) entries, when in fact it is declared to have
CTRL_ATTR_MAX (8) entries. Since all the entries are initialized to
NULL, the result is that nlmsg_parse is overwriting 2*(sizeof(nlattr*))
bytes outside the array.
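The mismatch can be illustrated without libnl. In this sketch
demo_parse() is a hypothetical stand-in that, like the parser described
above, writes slots 0 through maxtype; DEMO_TB_MAXTYPE() derives the
limit from the array itself so the two cannot drift apart.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ATTR_MAX 8   /* hypothetical; plays the role of CTRL_ATTR_MAX */

/* Largest attribute type an array can hold: it has maxtype + 1 slots. */
#define DEMO_TB_MAXTYPE(tb) ((int)(sizeof(tb) / sizeof((tb)[0])) - 1)

/* Stand-in for the parser: sets tb[0] .. tb[maxtype] to NULL, so the
 * caller's array must really have maxtype + 1 entries. */
static void
demo_parse(const void **tb, int maxtype)
{
    assert(tb && maxtype >= 0);
    for (int i = 0; i <= maxtype; i++)
        tb[i] = NULL;
}
```

Passing a larger constant than the one used in the array declaration,
as in the bug above, makes the loop run past the end of the array.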
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Instead of checking for all possible constants that every
kernel header with devlink support should have (and defining
HAVE_DECL_DEVLINK as 1 if any of them is present due to the
way AC_CHECK_DECLS works), only check for DEVLINK_CMD_ESWITCH_GET.
This is the name of the constant since kernel 4.11. Between 4.8
and 4.11, the now deprecated spelling DEVLINK_CMD_ESWITCH_MODE_GET
was used.
Assume DEVLINK_ESWITCH_MODE_SWITCHDEV is available, since it was
introduced along with the deprecated spelling.
Adding functionality to libvirt that will allow querying the interface
for the availability of switchdev Offloading NIC capabilities.
The switchdev mode was introduced in kernel 4.8. The iproute2 devlink
command can retrieve the switchdev NIC feature, for example:
devlink dev eswitch show pci/0000:03:00.0
This feature is needed for OpenStack so that it can make a scheduling decision
based on whether the NIC is in Hardware Offload (switchdev) or regular SR-IOV
(legacy) mode, and select the appropriate hypervisors with the requested
capability, see [1].
[1] - https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/enable-sriov-nic-features.html
Reviewed-by: Laine Stump <laine@laine.org>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Commit 81fb440b further qualified an if statement by adding the
boolean saveVlan to the condition. Coverity pointed out that this
change in the logic eliminated the need to check saveVlan in an
argument to virAsprintf().
When using a VF from an SRIOV-capable network card in a guest (either
in macvtap passthrough mode, or via VFIO PCI device assignment), the
associated PF netdev must be online in order for the VF to be usable
by the guest. The guest, however, is not able to change the state of
the PF. And libvirt *could* set the PF online as needed, but that
could lead to the host receiving unexpected IPv6 traffic (since the
default for an unconfigured interface is to participate in IPv6
autoconf). For this reason, before assigning a VF to a guest, libvirt
verifies that the related PF netdev is online - if it isn't, then we
log an error and don't allow the guest startup to continue.
Until now, this check was done during virNetDevSetNetConfig(). This
works nicely because the same function is called both for macvtap
passthrough and for VFIO device assignment. But in the case of VFIO,
the VF has already been unbound from its netdev driver by the time we
get to virNetDevSetNetConfig(), and in the case of dual port Mellanox
NICs that have their VFs setup in single port mode, the *only* way to
determine the proper PF netdev to query for online status is via the
"phys_port_id" file that is in the VF netdev's sysfs directory. *BUT*
if we've unbound the VF from the netdev driver, then it doesn't *have*
a netdev sysfs directory.
So, in order to check the correct PF netdev for online status, this
patch moved the check earlier in the setup, into
virNetDevSaveNetConfig(), which is called *before* unbinding the VF
from its netdev driver.
(Note that this implies that if you are using VFIO device assignment
for the VFs of a Mellanox NIC that has the VFs programmed in single
port mode, you must let the VFs be bound to their net driver and use
"managed='yes'" in the device definition. To be more specific, this is
only true if the VFs in single port mode are using port *2* of the PF
- if the VFs are using only port 1, then the correct PF netdev will be
arrived at by default/chance.)
This resolves: https://bugzilla.redhat.com/267191
virHostdevRestoreNetConfig() calls virNetDevReadNetConfig() to try and
read the "original config" of a netdev, and if that fails, it tries
again with a different directory/netdev name. This achieves the
desired effect (we end up finding the config wherever it may be), but
for each failure, virNetDevReadNetConfig() places a nice error message
in the system logs. Experience has shown that false-positive error
logs like this lead to erroneous bug reports, and can often mislead
those searching for *real* bugs.
This patch changes virNetDevReadNetConfig() to explicitly check if the
file exists before calling virFileReadAll(); if it doesn't exist,
virNetDevReadNetConfig() returns a success, but leaves all the
variables holding the results as NULL. (This makes sense if you define
the purpose of the function as "read a netdev's config from its config
file *if that file exists*").
To take advantage of that change, the caller,
virHostdevRestoreNetConfig() is modified to fail immediately if
virNetDevReadNetConfig() returns an error, and otherwise to try the
different directory/netdev name if adminMAC & vlan & MAC are all NULL
after the preceding attempt.
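The "missing file is not an error" behavior described for
virNetDevReadNetConfig() can be sketched like this. demo_read_config()
is a hypothetical helper using plain stdio rather than
virFileReadAll(), and it returns the raw file content instead of parsed
MAC/vlan values.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 0 on success, -1 on a real read error. If the file does not
 * exist, this is still a success and *content stays NULL. */
static int
demo_read_config(const char *path, char **content)
{
    FILE *fp;
    long len;

    assert(path && content);
    *content = NULL;

    if (access(path, F_OK) != 0)
        return 0;             /* no file: nothing to read, no error */

    if (!(fp = fopen(path, "r")))
        return -1;
    if (fseek(fp, 0, SEEK_END) < 0 || (len = ftell(fp)) < 0) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    if (!(*content = calloc((size_t)len + 1, 1)) ||
        fread(*content, 1, (size_t)len, fp) != (size_t)len) {
        free(*content);
        *content = NULL;
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}
```

A caller in the style of virHostdevRestoreNetConfig() would fail
immediately on -1, and try the alternative location only when the call
succeeded but *content is still NULL.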
This patch updates functions in netdev.c to pay attention to
phys_port_id. It uses the new function virNetDevGetPhysPortID() to
learn the phys_port_id of a VF or PF, then sends that info to
virPCIGetNetName(), which has newly been modified to take an optional
phys_port_id.
A single PCI device may have multiple netdevs associated with it. Each
of those netdevs will have a different phys_port_id entry in
sysfs. This patch modifies virPCIGetNetName() to allow selecting one
of the potential many netdevs in two different ways:
1) by setting the "idx" argument, the caller can select the 1st (0),
2nd (1), etc. netdev from the PCI device's net subdirectory.
2) If the physPortID arg is set (to a null-terminated string) then
virPCIGetNetName() returns the netdev that has that phys_port_id in
the sysfs file of the same name in the netdev's directory.
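The two selection modes can be sketched over an in-memory list instead
of a sysfs directory. The struct and function names below are
hypothetical; the sketch assumes that physPortID, when given, takes
precedence over idx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_netdev {
    const char *name;
    const char *phys_port_id;   /* NULL if the driver exposes none */
};

/* Returns the selected netdev name, or NULL if nothing matches. */
static const char *
demo_pick_netdev(const struct demo_netdev *devs, size_t ndevs,
                 size_t idx, const char *physPortID)
{
    assert(devs || ndevs == 0);

    for (size_t i = 0; i < ndevs; i++) {
        if (physPortID) {
            /* mode 2: match on phys_port_id, ignore idx */
            if (devs[i].phys_port_id &&
                strcmp(devs[i].phys_port_id, physPortID) == 0)
                return devs[i].name;
        } else if (i == idx) {
            /* mode 1: plain positional selection */
            return devs[i].name;
        }
    }
    return NULL;
}
```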
On Linux each network device *can* (but not necessarily *does*) have
an attribute called phys_port_id which can be read from the file of
that name in the netdev's sysfs directory. The examples I've seen have
been a many-digit hexadecimal number (as an ASCII string).
This value can be useful when a single PCI device is associated with
multiple netdevs (e.g a dual port Mellanox SR-IOV NIC - this card has
a single PCI Physical Function (PF), and that PF has two netdevs
associated with it (the "net" subdirectory of the PF in sysfs has two
links rather than the usual single link to a netdev directory). Each
of the PF netdevs has a different phys_port_id. The Virtual Functions
(VF) are similar - the PF (a PCI device) has "n" VFs (also each of
these is a PCI device), each VF has two netdevs, and each of the VF
netdevs points back to the VF PCI device (with the "device" entry in
its sysfs directory) as well as having a phys_port_id matching the PF
netdev it is associated with.
virNetDevGetPhysPortID() simply attempts to read the phys_port_id for
the given netdev and return it to the caller. If this particular
netdev driver doesn't support phys_port_id, it returns NULL (*not* a
NULL-terminated string, but a NULL pointer) but still counts it as a
success.
Change the settings from qemuDomainUpdateDeviceLive() as otherwise the
call would succeed even though nothing has changed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1414627
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Currently, virNetDevSetCoalesce() stub is always returning error. As
it's used by virNetDevTapCreateInBridgePort(), it essentially breaks
bridged networking if coalesce is not supported.
To make it work, relax the stub to trigger error only when its
coalesce argument is not NULL, otherwise report success.
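A sketch of what such a relaxed stub could look like. The struct and
function names are hypothetical placeholders, not the libvirt
declarations, and the use of ENOSYS is an assumption of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_coalesce { unsigned int rx_usecs; };

/* Stub for platforms without coalesce support: only fail when the
 * caller actually asked for coalesce settings. */
static int
demo_set_coalesce_stub(const char *ifname, struct demo_coalesce *coalesce)
{
    assert(ifname);

    if (!coalesce)
        return 0;            /* nothing requested, nothing to do */

    errno = ENOSYS;          /* requested but not supported here */
    return -1;
}
```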
Commit f4ef3a71 made a variation of virNetDevSetMAC that would return
without logging an error message if errno was set to
EADDRNOTAVAIL. This errno is set by some SRIOV VF drivers (in
particular igbvf) when they fail to set the device's MAC address due
to the PF driver refusing the request. This is useful if we want to
try a different method of setting the VF MAC address before giving up
(Commit 86556e16 actually does this, setting the desired MAC address
to the "admin MAC" in the PF, then detaching and reattaching the VF
netdev driver to force a reinit of the MAC address).
During testing of Bug 1442040 it was discovered that the ixgbe driver
returns EPERM in this situation, so this patch changes the exception
case for silent+non-terminal failure to account for this difference.
Completes resolution to: https://bugzilla.redhat.com/1415609 (RHEL 7.4)
https://bugzilla.redhat.com/1442040 (RHEL 7.3.z)
The current fallback stub for virNetDevSetCoalesce is inside an
earlier conditional block. This deals with the feature being
missing on older Linux platforms. We need a second fallback stub
though, outside the top level conditional, to ensure builds work
on Win32/FreeBSD platforms too.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
That function is able to configure coalesce settings for an interface,
similarly to 'ethtool -C'. This function also updates back the
structure so that it contains actual data on the device (if the device
doesn't support some settings kernel might just return 0 and not set
whatever is not supported), so this way we'll have up-to-date
information in the live domain XML.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
If if_indextoname is not defined, the whole function using it should
not be defined either. Add stub to fix build on mingw.
Caused by 5dd607059d
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Having this information available will make it easier to determine the
culprit when MAC or vlan tag appear to not be set, eg.:
https://bugzilla.redhat.com/1364073
(This patch doesn't fix that bug, just makes it easier to diagnose)
If an SRIOV VF has previously been used for VFIO device assignment,
the "admin MAC" that is stored in the PF driver's table of VF info
will have been set to the MAC address that the virtual machine wanted
the device to have. Setting the admin MAC for a VF also sets a flag in
the PF that is loosely called the "administratively set" flag. Once
that flag is set, it is no longer possible for the net driver of the
VF (either on the host or in a virtual machine) to directly set the
VF's MAC again; this flag isn't reset until the *PF* driver is
restarted, and that requires taking *all* VFs offline, so it's not
really feasible to do.
If the same SRIOV VF is later used for macvtap passthrough mode, the
VF's MAC address must be set, but normally we don't unbind the VF from
its host net driver (since we actually need the host net driver in
this case). Since setting the VF MAC directly will fail, in the past
"we" ("I") had tried to fix the problem by simply setting the admin MAC
(via the PF) instead. This *appeared* to work (and might have at one
time, due to promiscuous mode being turned on somewhere or something),
but it currently creates a non-working interface because only the
value for admin MAC is set to the desired value, *not* the actual MAC
that the VF is using.
Earlier patches in this series reverted that behavior, so that we once
again set the MAC of the VF itself for macvtap passthrough operation,
not the admin MAC. But that brings back the original bug - if the
interface has been used for VFIO device assignment, you can no longer
use it for macvtap passthrough.
This patch solves that problem by noticing when virNetDevSetMAC()
fails for a VF, and in that case it sets the desired MAC to the admin
MAC via the PF, then "bounces" the VF driver (by unbinding and then
immediately rebinding it to the VF). This causes the VF's MAC to be
reinitialized from the admin MAC, and everybody is happy (until the
*next* time someone wants to set the VF's MAC address, since the
"administratively set" bit is still turned on).
Some PF drivers allow setting the admin MAC (that is the MAC address
that the VF will be initialized to the next time the VF's driver is
loaded) to 00:00:00:00:00:00, and some don't. Multiple drivers
initialize the admin MACs to all 0, but don't allow setting it to that
very same value. It has been an uphill battle convincing the driver
people that it's reasonable to expect that to work. The argument that's used is
that an all 0 device MAC address on a device is invalid; however, from
an outsider's point of view, when the admin MAC is set to 0 at the
time the VF driver is loaded, the VF's MAC is *not* set to 0, but to a
random non-0 value. But that's beside the point - even if I could
convince one or two SRIOV driver maintainers to permit setting the
admin MAC to 0, there are still several other drivers.
So rather than fighting that losing battle, this patch checks for a
failure to set the admin MAC due to an all 0 value, and retries it
with 02:00:00:00:00:00. That won't result in a random value being set
in the VF MAC at next VF driver init, but that's okay, because we
always want to set a specific value anyway. Rather, the "almost 0"
setting makes it easy to visually detect from the output of "ip link
show" which VFs are currently in use and which are free.
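The retry logic can be sketched as below. Everything here is
hypothetical: demo_strict_pf_driver() merely simulates a PF driver that
rejects an all-0 admin MAC, and the sketch retries after any failure
with an all-0 value rather than inspecting the specific error.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAC_LEN 6

typedef int (*demo_admin_mac_setter)(const unsigned char *mac);

static unsigned char demo_last_admin_mac[DEMO_MAC_LEN];

/* Simulated PF driver: refuses 00:00:00:00:00:00, accepts the rest. */
static int
demo_strict_pf_driver(const unsigned char *mac)
{
    static const unsigned char zero[DEMO_MAC_LEN] = { 0 };

    if (memcmp(mac, zero, DEMO_MAC_LEN) == 0)
        return -1;
    memcpy(demo_last_admin_mac, mac, DEMO_MAC_LEN);
    return 0;
}

/* Set the admin MAC; if an all-0 value is rejected, retry with
 * 02:00:00:00:00:00. */
static int
demo_set_admin_mac(demo_admin_mac_setter setter, const unsigned char *mac)
{
    static const unsigned char zero[DEMO_MAC_LEN] = { 0 };
    static const unsigned char almost_zero[DEMO_MAC_LEN] = { 0x02 };

    assert(setter && mac);

    if (setter(mac) == 0)
        return 0;
    if (memcmp(mac, zero, DEMO_MAC_LEN) != 0)
        return -1;           /* a real failure, do not mask it */

    return setter(almost_zero);
}
```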
The global functions virNetDevReplaceMacAddress(),
virNetDevReplaceNetConfig(), virNetDevRestoreMacAddress(), and
virNetDevRestoreNetConfig() are no longer used, as their functionality
has been replaced by virNetDev(Save|Read|Set)NetConfig().
The static functions virNetDevReplaceVfConfig() and
virNetDevRestoreVfConfig() were only used by the above-named global
functions that were removed.
These three functions are destined to replace
virNetDev(Replace|Restore)NetConfig() and
virNetDev(Replace|Restore)MacAddress(), which both do the save and set
together as a single step. We need to separate the save, read, and set
steps because there will be situations where we need to do something
else in between (in particular, we will need to rebind a VF's driver
after save but before set).
This patch creates the new functions, but doesn't call them - that
will come in a subsequent patch. Note that the new functions to
read/write the file that stores the original network config now use
JSON rather than plaintext (it still recognizes the old format as well
though, so it won't get confused during an upgrade).
Fix typo in virNetDevPFGetVF() stub:
ATTRUBUTE_UNUSED -> ATTRIBUTE_UNUSED.
While here, use common indent style for arguments in
virNetDevGetVirtualFunctionIndex() stub.
Given an SRIOV PF netdev name (e.g. "enp2s0f0") and VF#, this new
function returns the netdev name of the referenced VF device
(e.g. "enp2s11f6"), or NULL if the device isn't bound to a net driver.
We will want to allow silent failure of virNetDevSetMAC() in the case
that the SIOCSIFHWADDR ioctl fails with errno == EADDRNOTAVAIL. (Yes,
that is very specific, but we really *do* want a logged failure in all
other circumstances, and don't want to duplicate code in the caller
for the other possibilities).
This patch renames the 3 different virNetDevSetMAC() functions to
virNetDevSetMACInternal(), adding a 3rd arg called "quiet" and making
them static (because this extra control will only be needed within
virnetdev.c). A new global virNetDevSetMAC() is defined that calls
whichever of the three *Internal() functions gets compiled with quiet
= false. Callers in virnetdev.c that want to notice a failure with
errno == EADDRNOTAVAIL and retry with a different strategy rather than
immediately failing, can call virNetDevSetMACInternal(..., true).
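The shape of that split can be sketched as follows. The demo_*
functions are hypothetical; the ioctl is replaced by a simulated errno
value passed in by the caller so that the example stays
self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int demo_logged_errors;   /* stands in for the error log */

/* Internal variant: with quiet == true, an EADDRNOTAVAIL failure is
 * returned without logging; every other failure is still logged. */
static int
demo_set_mac_internal(const char *ifname, int simulated_errno, bool quiet)
{
    assert(ifname);

    if (simulated_errno == 0)
        return 0;

    if (!(quiet && simulated_errno == EADDRNOTAVAIL)) {
        fprintf(stderr, "cannot set MAC on '%s' (errno %d)\n",
                ifname, simulated_errno);
        demo_logged_errors++;
    }

    errno = simulated_errno;     /* let the caller inspect the cause */
    return -1;
}

/* Public variant: always logs failures. */
static int
demo_set_mac(const char *ifname, int simulated_errno)
{
    return demo_set_mac_internal(ifname, simulated_errno, false);
}
```

A caller that wants to retry with another strategy calls the internal
variant with quiet set to true and checks errno on failure.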
...and cleanup the callers to report it when it *is* an error.
In many cases it's useful for virPCIGetNetName() to not log an error
and simply return a NULL pointer when the given device isn't bound to
a net driver (e.g. we're looking at a VF that is permanently bound to
vfio-pci). The existing code would silently return an error in this
case, which could eventually lead to the dreaded "An error occurred
but the cause is unknown" log message.
This patch changes virPCIGetNetName() to still return success if the
device simply isn't bound to a net driver, and adjusts all the callers
that require a non-null netname to check for that condition and log an
error when it happens.
This function is only called in two places, and the ifindex,
nltarget_kernel, and getPidFunc args are never used (and never will
be).
ifindex - we always know the name of the device, and never know the
ifindex - if we really did need the ifindex we would have to get it
from the name using virNetDevGetIndex(). In practice, we just send -1
to virNetDevSetVfConfig(), which doesn't bother to learn the real
ifindex (you only need a name *or* an ifindex for the netlink command
to succeed, not both).
nltarget_kernel - messages to set the config of an SRIOV VF will
always go to netlink in the kernel, not to another user process, so
this arg is always true (there are other uses of netlink messages
where the message might need to go to another user process, but never
in the case of RTM_SETLINK for SRIOV).
getPidFunc - this arg is only used if nltarget_kernel is false, and it
never is.
None of this has any functional effect, it just makes it easier to
follow what's happening when virNetDevSetVfConfig() is called.
virNetDevParseVfConfig() assumed that both the MAC address and VLAN
tag pointers were valid, so even if you only wanted one or the other,
you would need a variable to hold the returned value for both. This
patch checks each for a NULL pointer before filling it in.
This function provides the bridge/bond device that the given network
device is attached to. The return value is 0 or -1, and the master
device is a char** argument to the function - this is needed in order
to allow for a "success" return from a device that has no master.
The only reason that the ethtool features weren't being retrieved in
an unprivileged libvirtd was because they required ioctl(), and the
ioctl was using an AF_PACKET socket, which requires root. Now that we
are using AF_UNIX for ioctl(), this restriction can be removed.
The exact family of the socket created for the fd used by ioctl(2)
doesn't matter, it just needs to be a socket and not a file. But for
some reason when macvtap support was added, it used
AF_PACKET/SOCK_DGRAM sockets for its ioctls; we later used the same
AF_PACKET/SOCK_DGRAM socket for new ioctls we added, and eventually
modified the other pre-existing ioctl sockets (for creating/deleting
bridges) to also use AF_PACKET/SOCK_DGRAM (that code originally used
AF_UNIX/SOCK_STREAM).
The problem with using AF_PACKET (intended for sending/receiving "raw"
packets, i.e. packets that can be some protocol other than TCP or UDP)
is that it requires root privileges. This meant that none of the
ioctls in virnetdev.c or virnetdevip.c would work when running
libvirtd unprivileged.
This patch solves that problem by changing the family to AF_UNIX when
creating the socket used for any ioctl().
Before 9c17d665fd (v1.3.2 - I know, right?) it was possible to
have the following interface configuration:
<interface type='ethernet'>
<script path=''/>
</interface>
This resulted in -netdev tap,script=,.. Fortunately, qemu helped
us to get away with this as it just ignored the empty script
path. However, after the commit mentioned above it's libvirtd
who is executing the script. Unfortunately, it does so without
special-casing an empty script path.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This patch splits virnetdev.[ch] into multiple files, with the new
virnetdevip.[ch] containing all the functions related to setting and
retrieving IP-related info for a device (both addresses and routes).
These had been declared in conf/device_conf.h, but then used in
util/virnetdev.c, meaning that we had to #include conf/device_conf.h
in virnetdev.c (which we have for a long time said shouldn't be done).
This caused a bigger problem when I tried to #include util/virnetdev.h
in a file in src/conf (which is allowed) - for some reason this produced the
"device_conf.h: File not found" error.
The solution is to move the data types and functions used in util
sources from conf to util. Some names were adjusted during the move
("virInterface" --> "virNetDevIf", and "VIR_INTERFACE" -->
"VIR_NETDEV_IF")
virNetDevLinkDump should have been in virnetlink.c, but that file
didn't exist yet when the function was created. It didn't really
matter until now - I found that having virnetlink.h included by
virnetdev.h caused build problems when trying to #include virnetdev.h
in a .c file in src/conf (due to missing directory in -I). Rather than
fix that to further institutionalize the incorrect placement of this
one function, this patch moves the function.
The directories we iterate over are unlikely to contain any entries
starting with a dot, other than '.' and '..' which are already skipped
by virDirRead.
Commit b3d069872c added peer address setting to the low level
virNetDevSetIPAddress() function, but ended up causing a segfault in
cases where the caller passed NULL for peer address.
Commit a3510e33d3 fixed the segfault, but managed to cause us to
skip setting the broadcast address when setting an interface's IP
address. The result is that the broadcast address is 0.0.0.0 for all
libvirt-created bridges (and interfaces in lxc containers with IP
addresses set by libvirt).
This was reported on the mailing list:
https://www.redhat.com/archives/libvir-list/2016-June/msg00027.html
but I was too busy to investigate at the time. I found it by accident
today while refactoring virNetDevSetIPAddress(). Since this regression
is present in the 1.3.5 release, I'm sending the bugfix as a separate
patch from my larger refactoring patchset.
SRIOV VFs used in macvtap passthrough mode can take advantage of the
SRIOV card's transparent vlan tagging. All the code was there to set
the vlan tag, and it has been used for SRIOV VFs used for hostdev
interfaces for several years, but for some reason, the vlan tag for
macvtap passthrough devices was stubbed out with a -1.
This patch moves a bit of common validation down to a lower level
(virNetDevReplaceNetConfig()) so it is shared by hostdev and macvtap
modes, and updates the macvtap caller to actually send the vlan config
instead of -1.
virSocketAddrFormat() wants a single pointer, not a double pointer.
Fixes the following compilation error on FreeBSD:
util/virnetdev.c:1448:72: error: incompatible pointer types passing
'virSocketAddr **' to parameter of type 'const virSocketAddr *';
remove & [-Werror,-Wincompatible-pointer-types]
if (VIR_SOCKET_ADDR_VALID(peer) && !(peerstr = virSocketAddrFormat(&peer)))
^~~~~
./util/virsocketaddr.h:92:48: note: passing argument to parameter 'addr' here
char *virSocketAddrFormat(const virSocketAddr *addr);
^
virNetDevIsVirtualFunction() returns 1 if the interface is a
virtual function, 0 if it isn't and -1 on error. This means that,
despite the name suggesting otherwise, using it as a predicate is
not correct.
Fix two callers that were doing so by adding an explicit check on
the return value.
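The pitfall and the explicit check can be sketched as follows.
demo_is_virtual_function() is a hypothetical stand-in that simply
returns the tri-state value it is given, so that all three outcomes can
be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a tri-state query: 1 = VF, 0 = not a VF, -1 = error. */
static int
demo_is_virtual_function(int simulated)
{
    assert(simulated >= -1 && simulated <= 1);
    return simulated;
}

/* Correct usage: handle the error case explicitly and compare against
 * 1. A bare "if (demo_is_virtual_function(...))" would treat -1 as
 * "yes, it is a VF". */
static int
demo_check_vf(int simulated, bool *is_vf)
{
    int rc;

    assert(is_vf);

    rc = demo_is_virtual_function(simulated);
    if (rc < 0)
        return -1;

    *is_vf = (rc == 1);
    return 0;
}
```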
I noticed in a log file that we had failed to set a MAC address. The
log said which interface we were trying to set, but didn't give the
offending MAC address, which could have been useful in determining the
source of the problem. This patch modifies all three places in the
code that set MAC addresses to report the failed MAC as well as
interface.
Commit 0f7436ca54 "network: wait for DAD to finish for bridge IPv6 addresses"
results in:
CC util/libvirt_util_la-virnetdevmacvlan.lo
util/virnetdev.c: In function 'virNetDevParseDadStatus':
util/virnetdev.c:1319:188: error: cast increases required alignment of target type [-Werror=cast-align]
util/virnetdev.c:1332:41: error: cast increases required alignment of target type [-Werror=cast-align]
util/virnetdev.c:1334:92: error: cast increases required alignment of target type [-Werror=cast-align]
cc1: all warnings being treated as errors
on at least ARM platforms.
The three macros involved (NLMSG_NEXT, IFA_RTA and RTA_NEXT) all appear to
correctly take care of alignment, therefore suppress Wcast-align around their
uses.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Maxim Perevedentsev <mperevedentsev@virtuozzo.com>
Cc: Laine Stump <laine@laine.org>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
A PCI device may have the capability to setup virtual functions (VFs)
but have them currently all disabled. Prior to this patch, if that was
the case, the node device XML for the device wouldn't report any
virtual_functions capability.
With this patch, if a file called "sriov_totalvfs" is found in the
device's sysfs directory, its contents will be interpreted as a
decimal number, and that value will be reported as "maxCount" in a
capability element of the device's XML, e.g.:
<capability type='virtual_functions' maxCount='7'/>
This will be reported regardless of whether or not any VFs are
currently enabled for the device.
NB: sriov_numvfs (the number of VFs currently active) is also
available in sysfs, but that value is implied by the number of items
in the list that is inside the capability element, so there is no
reason to explicitly provide it as an attribute.
sriov_totalvfs and sriov_numvfs are available in kernels at least as far
back as the 2.6.32 that is in RHEL6.7, but in the case that they
simply aren't there, libvirt will behave as it did prior to this patch
- no maxCount will be displayed, and the virtual_functions capability
will be absent from the device's XML when 0 VFs are enabled.
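The lookup described above could be sketched roughly as follows. The
function names and return conventions are assumptions made for this
example and do not reflect the actual libvirt helpers; only the file
name "sriov_totalvfs" and the decimal interpretation come from the
description.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a sysfs attribute as a decimal number.
 * Returns 0 and stores the value on success, -1 on malformed input. */
int parse_decimal_attr(const char *text, unsigned int *value)
{
    char *end = NULL;
    unsigned long v;

    assert(text != NULL && value != NULL);

    if (text[0] < '0' || text[0] > '9')
        return -1;  /* reject empty input, signs and leading whitespace */

    errno = 0;
    v = strtoul(text, &end, 10);
    if (errno != 0 || v > UINT_MAX)
        return -1;
    if (*end != '\n' && *end != '\0')
        return -1;  /* trailing garbage */
    *value = (unsigned int)v;
    return 0;
}

/* Returns 1 and sets *max_count if the file exists and parses,
 * 0 if the file is absent (e.g. an older kernel), -1 on other errors. */
int read_total_vfs(const char *sysfs_dir, unsigned int *max_count)
{
    char path[4096];
    char buf[32];
    FILE *fp;
    int n;

    n = snprintf(path, sizeof(path), "%s/sriov_totalvfs", sysfs_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return errno == ENOENT ? 0 : -1;

    if (fgets(buf, sizeof(buf), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_decimal_attr(buf, max_count) == 0 ? 1 : -1;
}
```

A return of 0 corresponds to the fallback case: no maxCount is
reported and the previous behavior is kept.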
Use virNetDevSetupControl instead of open coding using socket(AF_LOCAL...)
and clearing virIfreq.
By using virNetDevSetupControl, the socket is then opened using
AF_PACKET which requires being privileged (effectively root) in
order to complete successfully. Since that's now a requirement,
then the ioctl(SIOCETHTOOL) should not fail with EPERM, thus it
is removed from the filtered list of failure codes.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Since the SIOCETHTOOL ioctl only works for privileged daemons, if called
when not root, then virNetDevGetFeatures will VIR_DEBUG a message and
return 0 as if the functions were not available for the architecture.
This effectively returns an empty bitmap indicating no features available.
Introduced by commit id 'c9027d8f4'
Signed-off-by: John Ferlan <jferlan@redhat.com>
In commit id 'c9027d8f4' when updating the posted patch to generate
a bitmap instead of an array of named feature bits, adjustment of
the args was missed.
Recently reverted commit id '6f2a0198' showed a need to add extra
comments when dealing with filtering of potential "non-issues".
Scanning through upstream patch postings indicates early on the
reasons for the filtering of specific ioctl failures were provided;
however, when converted from causing an error to VIR_DEBUG's the
reasons were missing. A future read/change of the code incorrectly
assumed they could or should be removed.
This reverts commit 6f2a0198e9.
This commit removed error reporting from virNetDevSendEthtoolIoctl
pushing responsibility onto the callers. This is wrong, however,
since virNetDevSendEthtoolIoctl calls virNetDevSetupControl
which can still report errors. So as a result virNetDevSendEthtoolIoctl
may or may not report errors depending on which bit of it fails, and as
a result callers now overwrite some errors.
It also introduced a regression causing unprivileged libvirtd to
spew error messages to the console due to inability to query the
NIC features, an error which was previously ignored.
virNetDevSetupControlFull:148 : Cannot open network interface control socket: Operation not permitted
virNetDevFeatureAvailable:3062 : Cannot get device wlp3s0 flags: Operation not permitted
virNetDevSetupControlFull:148 : Cannot open network interface control socket: Operation not permitted
virNetDevFeatureAvailable:3062 : Cannot get device wlp3s0 flags: Operation not permitted
virNetDevSetupControlFull:148 : Cannot open network interface control socket: Operation not permitted
virNetDevFeatureAvailable:3062 : Cannot get device wlp3s0 flags: Operation not permitted
virNetDevSetupControlFull:148 : Cannot open network interface control socket: Operation not permitted
virNetDevFeatureAvailable:3062 : Cannot get device wlp3s0 flags: Operation not permitted
Looking back at the original posting I see no explanation of why
this refactoring was needed, so reverting the clearly broken
error reporting logic looks like the best option.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Rather than "if (virNetDevFeatureAvailable(ifname, &cmd))" change the
success criteria to "if (virNetDevFeatureAvailable(ifname, &cmd) == 1)".
The called helper returns -1 on failure, 0 on not found, and 1 on found.
Thus a failure was setting bits.
Introduced by commit ac3ed20 which changed the helper's return
values without adjusting its callers
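To illustrate why the "== 1" comparison matters when filling a bitmap,
here is a small sketch. The enum, probe_feature() and
collect_features() are hypothetical names invented for the example;
only the -1/0/1 convention is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

enum { FEATURE_RX, FEATURE_TX, FEATURE_SG, FEATURE_LAST };

/* Hypothetical probe: 1 = found, 0 = not found, -1 = failure. */
static int probe_feature(int feature)
{
    switch (feature) {
    case FEATURE_RX: return 1;
    case FEATURE_TX: return 0;
    default:         return -1;   /* simulate a failed query */
    }
}

/* Build a feature bitmap, setting a bit only for a definite "found". */
uint32_t collect_features(void)
{
    uint32_t bitmap = 0;
    int i;

    for (i = 0; i < FEATURE_LAST; i++) {
        assert(i < 32);
        if (probe_feature(i) == 1)
            bitmap |= UINT32_C(1) << i;
    }
    return bitmap;
}
```

With a bare "if (probe_feature(i))" the failed FEATURE_SG query would
also set its bit, giving 0x5 instead of 0x1.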
Signed-off-by: John Ferlan <jferlan@redhat.com>
This was originally set to 5 seconds, but times of 5.5 to 7 seconds
were experienced. Since it's an arbitrary number intended to prevent
an infinite hang, having it a bit too high won't hurt anything, and 20
seconds looks to be adequate (i.e. I think/hope we don't need to make
it tunable in libvirtd.conf)
If DAD has not finished in 5 seconds, the user will get an
unknown error like this:
# virsh net-start ipv6
error: Failed to start network ipv6
error: An error occurred, but the cause is unknown
Call virReportError to set an error.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Build on non-Linux fails because the virNetDevWaitDadFinish() stub
has unused parameters. Fix by adding appropriate ATTRIBUTE_UNUSED
for these parameters.
Pushing under build-breaker rule.
commit db488c79 assumed that dnsmasq would complete IPv6 DAD before
daemonizing, but in reality it doesn't wait, which creates problems
when libvirt's bridge driver sets the matching "dummy tap device" to
IFF_DOWN prior to DAD completing.
This patch waits for DAD completion by periodically polling the kernel
using netlink to check whether there are any IPv6 addresses assigned
to the bridge which have a 'tentative' state (if there are any in this
state, then DAD hasn't yet finished). After DAD is finished, execution
continues. To avoid an endless hang in case something was wrong with
the kernel's DAD, we wait a maximum of 5 seconds.
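The shape of such a bounded polling loop might look like the sketch
below. The callback type and function names are assumptions for
illustration; in the patch the check is a netlink query for tentative
addresses, which is replaced here by a simple countdown so the example
stays self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Callback returning 1 while something is still pending, 0 when done,
 * -1 on error. Hypothetical placeholder for the real netlink query. */
typedef int (*pending_fn)(void *opaque);

/* Poll until check() reports completion or roughly timeout_ms of
 * sleeping has accumulated. Returns 0 when done, -1 on error or
 * timeout. Elapsed time is approximated by counting sleep intervals. */
int wait_until_done(pending_fn check, void *opaque,
                    long timeout_ms, long interval_ms)
{
    struct timespec delay;
    long waited = 0;

    assert(check != NULL && interval_ms > 0);

    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (;;) {
        int rc = check(opaque);

        if (rc <= 0)
            return rc < 0 ? -1 : 0;  /* query failed, or finished */
        if (waited >= timeout_ms)
            return -1;               /* give up rather than hang forever */
        nanosleep(&delay, NULL);
        waited += interval_ms;
    }
}

/* Demo callback: reports "pending" until the counter reaches zero. */
int countdown_pending(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

A caller would pass the timeout in milliseconds (5000 for the value
mentioned above) and should report an error of its own when -1 is
returned.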
These functions were made static as a part of commit cbfe38c since
they were no longer called from outside virnetdev.c. We once again
need to call them from another file, so this patch makes them once
again public.
This fixes the crash described here:
https://www.redhat.com/archives/libvir-list/2015-August/msg00162.html
In short, we were calling ioctl(SIOCETHTOOL) pointing to a too-short
object that was a local on the stack, resulting in the memory past the
end of the object being overwritten. This was because the struct used
by the ETHTOOL_GFEATURES command of SIOCETHTOOL ends with a 0-length
array, but we were telling ethtool that it could use 2 elements on the
array.
The fix is to allocate the necessary memory with VIR_ALLOC_VAR(),
including the extra length needed for a 2 element array at the end.
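As an illustration of the allocation pattern, the sketch below uses
plain calloc() and stand-in struct definitions rather than
VIR_ALLOC_VAR() and the kernel's ethtool structs, so that it compiles
on its own. The type and function names are assumptions for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in types shaped like a request with a fixed header followed
 * by a variable number of trailing blocks. */
struct feature_block {
    uint32_t available;
    uint32_t requested;
    uint32_t active;
    uint32_t never_changed;
};

struct gfeatures {
    uint32_t cmd;
    uint32_t size;                  /* number of blocks that follow */
    struct feature_block blocks[];  /* flexible array member */
};

/* Allocate the header plus room for nblocks trailing elements, zeroed.
 * Returns NULL on allocation failure. */
struct gfeatures *gfeatures_alloc(uint32_t nblocks)
{
    struct gfeatures *g;
    size_t bytes;

    assert(nblocks > 0);

    bytes = sizeof(*g) + (size_t)nblocks * sizeof(g->blocks[0]);
    g = calloc(1, bytes);
    if (g != NULL)
        g->size = nblocks;          /* never claim more than was allocated */
    return g;
}
```

A local "struct gfeatures g;" on the stack has room for the header
only, which is why telling the kernel that two blocks are available
overwrote adjacent memory.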
There is no guarantee that an enum start is mapped onto a value
of zero. However, we are guaranteed that enum items are
consecutive integers. Moreover, it's a pity to define an enum to
avoid using magical constants but then using them anyway.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Commit id 'ac3ed2085' causes 'virsh nodedev-list --cap net' to fail
on any system without SYSFS_INFINIBAND_DIR (/sys/class/infiniband).
Rather than assume it's there and fail on the attempt to open the
non-existent directory, check if it's there - if not, return
success and move on. Also fix caller to check < 0 upon return.
As reported by Suren Hajyan <shajyan@redhat.com> from run of unit tests
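The "missing directory is not an error" handling could look like this
sketch. count_optional_dir() is a hypothetical helper written for the
example, not the function touched by the patch.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Count entries in a directory that may legitimately not exist.
 * Returns the count (0 if the directory is absent), or -1 on error. */
int count_optional_dir(const char *path)
{
    DIR *dir;
    struct dirent *ent;
    int count = 0;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL)
        return errno == ENOENT ? 0 : -1;

    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}
```

A caller then treats only a negative return as failure, matching the
"check < 0" fix mentioned above.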
Commit ac3ed20 breaks build on FreeBSD with:
CC util/libvirt_util_la-virnetdev.lo
util/virnetdev.c:2967:1: error: unused function 'virNetDevRDMAFeature' [-Werror,-Wunused-function]
virNetDevRDMAFeature(const char *ifname,
^
So hide virNetDevRDMAFeature function under the #ifdef 'SIOCETHTOOL'
and 'HAVE_STRUCT_IFREQ' section.
Pushed under the build breaker rule.
Add functionality to libvirt that allows it to query the interface
for the availability of the RDMA and tx-udp_tnl-segmentation
offloading NIC capabilities.
Here is an example of the feature XML definition:
<device>
  <name>net_eth4_90_e2_ba_5e_a5_45</name>
  <path>/sys/devices/pci0000:00/0000:00:03.0/0000:08:00.1/net/eth4</path>
  <parent>pci_0000_08_00_1</parent>
  <capability type='net'>
    <interface>eth4</interface>
    <address>90:e2:ba:5e:a5:45</address>
    <link speed='10000' state='up'/>
    <feature name='rx'/>
    <feature name='tx'/>
    <feature name='sg'/>
    <feature name='tso'/>
    <feature name='gso'/>
    <feature name='gro'/>
    <feature name='rxvlan'/>
    <feature name='txvlan'/>
    <feature name='rxhash'/>
    <feature name='rdma'/>
    <feature name='txudptnl'/>
    <capability type='80203'/>
  </capability>
</device>
There were a couple of problems with the style fixes applied to the original
patch:
1.) virFileReadAllQuiet comparison was incorrectly parenthesized when moved
into a condition, causing the len to be set to the result of comparison. This,
together with the removed underflow check would underflow the phy buffer.
2.) The logic was broken. Failure to call "ip" would abort the function, thus
the "iw" branch would never be reached.
This aims to fix the issues and work around possible style complaints :)
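The parenthesization problem from item 1 can be reproduced in
isolation. fake_read() is a hypothetical stand-in for the file-reading
helper and always succeeds with a five-byte result.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reader: copies a fixed string and returns its length,
 * or -1 on failure. Stands in for a "read whole file" helper. */
static int fake_read(char *buf, size_t buflen)
{
    static const char data[] = "phy0\n";

    if (buflen < sizeof(data))
        return -1;
    memcpy(buf, data, sizeof(data));
    return (int)(sizeof(data) - 1);
}

/* Buggy: '<' binds tighter than '=', so len receives the result of the
 * comparison (0 or 1) instead of the length. */
int read_len_buggy(void)
{
    char buf[16];
    int len;

    if ((len = fake_read(buf, sizeof(buf)) < 0))
        return -1;
    return len;
}

/* Fixed: parenthesize the assignment, then compare. */
int read_len_fixed(void)
{
    char buf[16];
    int len;

    if ((len = fake_read(buf, sizeof(buf))) < 0)
        return -1;
    assert(len > 0 && buf[len - 1] == '\n');
    return len;
}
```

In the buggy variant len ends up as 0 on success, so a later
"buf[len - 1]" would index before the start of the buffer.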
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
If an SRIOV PF is offline, the kernel won't complain if you set the
mac address and vlan tag for a VF via this PF, and it will even let
you assign the VF to a guest using PCI device assignment or macvtap
passthrough. But in this case (the PF isn't online), the device won't
be usable in the guest.
Silently setting the PF online would solve the connectivity problem,
but as pointed out by Dan Berrange, when an interface is set online
with no associated config, the kernel will by default turn on IPv6
autoconf, which could create unexpected security problems for the
host. For this reason, this patch instead logs an error and fails the
operation.
This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=893738
Originally filed against RHEL6, but present in every version of
libvirt until today.
Build fails on non-Linux systems with this error:
CC util/libvirt_util_la-virnetdev.lo
util/virnetdev.c:364:1: error: unused function 'virNetDevReplaceMacAddress' [-Werror,-Wunused-function]
virNetDevReplaceMacAddress(const char *linkdev,
^
util/virnetdev.c:406:1: error: unused function 'virNetDevRestoreMacAddress' [-Werror,-Wunused-function]
virNetDevRestoreMacAddress(const char *linkdev,
^
2 errors generated.
The virNetDev{Restore,Replace}MacAddress() functions are only used
by VF-related routines that are available on Linux only. So move these
functions under the same #ifdef.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1113474
When we set the MAC address of a network device as a part of setting
up macvtap "passthrough" mode (where the domain has an emulated netdev
connected to a host macvtap device that has exclusive use of the
physical device, and sets the device MAC address to match its own,
i.e. "<interface type='direct'> <source mode='passthrough' .../>"), we
use ioctl(SIOCSIFHWADDR) giving it the name of that device. This is
true even if it is an SRIOV Virtual Function (VF).
But, when we are setting the MAC address / vlan ID of a VF in
preparation for "hostdev network" passthrough (this is where we set
the MAC address and vlan id of the VF after detaching the host net
driver and before assigning the device to the domain with PCI
passthrough, i.e. "<interface type='hostdev'>"), we do the setting via
a netlink RTM_SETLINK message for that VF's Physical Function (PF),
telling it the VF# we want to change. This sets an "administratively
changed MAC" flag for that VF in the PF's driver, and from that point
on (until the PF driver is reloaded, *not* merely the VF driver) that
VF's MAC address can't be changed using ioctl(SIOCSIFHWADDR) - the
only way to change it is via the PF with RTM_SETLINK.
This means that if a VF is used for hostdev passthrough, it will have
the admin flag set, and future attempts to use that VF for macvtap
passthrough will fail.
The solution to this problem is to check if the device being used for
macvtap passthrough is actually a VF; if so, we use the netlink
RTM_SETLINK message to the PF to set the VF's mac address instead of
ioctl(SIOCSIFHWADDR) directly to the VF; if not, behavior does not
change from previously.
There are three pieces to making this work:
1) virNetDevMacVLan(Create|Delete)WithVPortProfile() now call
virNetDev(Replace|Restore)NetConfig() rather than
virNetDev(Replace|Restore)MacAddress() (simply passing -1 for VF#
and vlanid).
2) virNetDev(Replace|Restore)NetConfig() check to see if the device is
a VF. If so, they find the PF's name and VF#, allowing them to call
virNetDev(Replace|Restore)VfConfig().
3) To prevent mixups when detaching a macvtap passthrough device that
had been attached while running an older version of libvirt,
virNetDevRestoreVfConfig() is potentially given the preserved name
of the VF, and if the proper statefile for a VF can't be found in
the stateDir (${stateDir}/${pfname}_vf${vfid}),
virNetDevRestoreMacAddress() is called instead (which will look in
the file named ${stateDir}/${vfname}).
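The statefile selection in item 3 could be sketched as below. Only the
two path layouts come from the description; the function names, return
codes and the use of access() are assumptions for the example, and the
real functions do considerably more than pick a file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format "<state_dir>/<pfname>_vf<vfid>". Returns 0, or -1 if truncated. */
int vf_statefile_path(char *out, size_t outlen, const char *state_dir,
                      const char *pfname, int vfid)
{
    int n = snprintf(out, outlen, "%s/%s_vf%d", state_dir, pfname, vfid);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Pick the file to restore from: prefer the per-VF file; if it is
 * absent and a preserved VF name is available, fall back to
 * "<state_dir>/<vfname>". Returns 1 for the per-VF file, 2 for the
 * legacy file, -1 on error. */
int choose_restore_file(char *out, size_t outlen, const char *state_dir,
                        const char *pfname, int vfid, const char *vfname)
{
    int n;

    assert(out != NULL && state_dir != NULL && pfname != NULL);

    if (vf_statefile_path(out, outlen, state_dir, pfname, vfid) < 0)
        return -1;
    if (access(out, F_OK) == 0)
        return 1;
    if (vfname == NULL || strlen(vfname) == 0)
        return -1;

    n = snprintf(out, outlen, "%s/%s", state_dir, vfname);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 2;
}
```

The legacy branch covers a device that was attached by an older
libvirt, which saved the original MAC under the VF's own name.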
This problem has existed in every version of libvirt that has both
macvtap passthrough and interface type='hostdev'. Fortunately people
seem to use one or the other though, so it hasn't caused any real
world problem reports.