libvirt can retrieve traffic stats for emulated interfaces that are
backed by tap or macvtap devices, but this information wasn't
available for hostdev interfaces (those that are implemented by
assigning an SR-IOV VF device to a guest using vfio):
#virsh domifstat instance --interface=52:54:00:2d:b2:35
error: Failed to get interface stats instance 52:54:00:2d:b2:35
error: internal error: Interface name not provided
For some SR-IOV VF devices this information is available via the
netlink VFINFO_LIST request/response, and that is what this patch uses
to implement stats retrieval for VF. Not that this is dependent on
support in the PF driver - for example, the Mellanox ConnectX-4 Lx
(mlx5) driver reports usable stats, while Intel 82599 (ixgbe) and
82576 (igb) just report all stats as 0. (this is the same result as
"ip -s link show").
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Laine Stump <laine@redhat.com>
By default, pfifo_fast queueing discipline (qdisc) is set on
newly created interfaces (including TAPs). This qdisc has three
queues and packets that want to be sent through given NIC are
placed into one of the queues based on TOS field. Queues are then
emptied based on their priority allowing interactive sessions
stay interactive whilst something else is downloading a large
file.
Obviously, this means that kernel has to be involved and some
locking has to happen (when placing packets into queues). If
virtualization is taken into account then the above algorithm
happens twice - once in the guest and the second time in the
host.
This is arguably not optimal as it burns host CPU cycles
needlessly. Guest already made it choice and sent packets in the
order it wants.
To resolve this, Linux kernel offers 'noqueue' qdisc which can be
applied on virtual interfaces and in fact for 'lo' it is by
default:
lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
Set it for other TAP devices we create for domains too. With this
change I was able to squeeze 1Mbps more from a macvtap attached
to a guest and to my 1Gbps LAN (as measured by iperf3).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1329644
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This helper changes the root qdisc on given interface.
Ideally, it would be written using netlink but my attempts to
write the code were not successful and thus I've fallen back to
virCommand() + tc.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Currently, we are mixing: #if HAVE_BLAH with #if WITH_BLAH.
Things got way better with Pavel's work on meson, but apparently,
mixing these two lead to confusing and easy to miss bugs (see
31fb929eca for instance). While we were forced to use HAVE_
prefix with autotools, we are free to chose our own prefix with
meson and since WITH_ prefix appears to be more popular let's use
it everywhere.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
The net/if.h is not portable so we must check for its
existance and avoid using it when missing. Some use
of net/if.h was redundant and could be removed.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This is needed if we want to call the function when the
virDomainNetDef* we have is a const.
Since virDomainNetGetActualVlan returns a pointer to memory that is
within the virDomainNetDefPtr arg, the returned pointer must also be
made const. This leads to a cascade of other virNetDevVlanPtr's that
must be changed to "const virNetDevVlan *".
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Now that we no longer use any of the macros from this file, remove it.
This also removes a typo.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Since commit 44e7f02915
util: rewrite auto cleanup macros to use glib's equivalent
VIR_AUTOPTR aliases to g_autoptr. Replace all uses of VIR_DEFINE_AUTOPTR_FUNC
with G_DEFINE_AUTOPTR_CLEANUP_FUNC in preparation for replacing the
rest.
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
If managed='no', then the tap device must already exist, and setting
of MAC address and online status (IFF_UP) is skipped.
NB: we still set IFF_VNET_HDR and IFF_MULTI_QUEUE as appropriate,
because those bits must be properly set in the TUNSETIFF we use to set
the tap device name of the handle we've opened - if IFF_VNET_HDR has
not been set and we set it the request will be honored even when
running libvirtd unprivileged; if IFF_MULTI_QUEUE is requested to be
different than how it was created, that will result in an error from
the kernel. This means that you don't need to pay attention to
IFF_VNET_HDR when creating the tap devices, but you *do* need to set
IFF_MULTI_QUEUE if you're going to use multiple queues for your tap
device.
NB2: /dev/vhost-net normally has permissions 600, so it can't be
opened by an unprivileged process. This would normally cause a warning
message when using a virtio net device from an unprivileged
libvirtd. I've found that setting the permissions for /dev/vhost-net
permits unprivileged libvirtd to use vhost-net for virtio devices, but
have no idea what sort of security implications that has. I haven't
changed libvrit's code to avoid *attempting* to open /dev/vhost-net -
if you are concerned about the security of opening up permissions of
/dev/vhost-net (probably a good idea at least until we ask someone who
knows about the code) then add <driver name='qemu'/> to the interface
definition and you'll avoid the warning message.
Note that virNetDevTapCreate() is the correct function to call in the
case of an existing device, because the same ioctl() that creates a
new tap device will also open an existing tap device.
Resolves: https://bugzilla.redhat.com/1723367 (partially)
Signed-off-by: Laine Stump <laine@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
virutil.(c|h) is a very gross collection of random code. Remove the enum
handlers from there so we can limit the scope where virtutil.h is used.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
'viralloc.h' does not provide any type or macro which would be necessary
in headers. Prevent leakage of the inclusion.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Keeping them with viralloc.h forcibly pulls in the other stuff from
viralloc.h into other header files. This in turn creates a mess
as more and more headers pull in the 'viral' header file.
If we want to make 'viralloc.h' omnipresent we should pick a different
approach.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
For consistency, let's use the semicolon for all definitions.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Missing semicolon at the end of macros can confuse some analyzers
(like cppcheck <filename>), and we have a mix of semicolon and
non-semicolon usage through the code. Let's standardize on using
a semicolon for VIR_ENUM_DECL calls.
Drop the semicolon from the final statement of the macro, so
the compiler will require callers to add a semicolon.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Require that all headers are guarded by a symbol named
LIBVIRT_$FILENAME
where $FILENAME is the uppercased filename, with all characters
outside a-z changed into '_'.
Note we do not use a leading __ because that is technically a
namespace reserved for the toolchain.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
In many files there are header comments that contain an Author:
statement, supposedly reflecting who originally wrote the code.
In a large collaborative project like libvirt, any non-trivial
file will have been modified by a large number of different
contributors. IOW, the Author: comments are quickly out of date,
omitting people who have made significant contribitions.
In some places Author: lines have been added despite the person
merely being responsible for creating the file by moving existing
code out of another file. IOW, the Author: lines give an incorrect
record of authorship.
With this all in mind, the comments are useless as a means to identify
who to talk to about code in a particular file. Contributors will always
be better off using 'git log' and 'git blame' if they need to find the
author of a particular bit of code.
This commit thus deletes all Author: comments from the source and adds
a rule to prevent them reappearing.
The Copyright headers are similarly misleading and inaccurate, however,
we cannot delete these as they have legal meaning, despite being largely
inaccurate. In addition only the copyright holder is permitted to change
their respective copyright statement.
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Using the new VIR_DEFINE_AUTOPTR_FUNC macro defined in
src/util/viralloc.h, define a new wrapper around an existing
cleanup function which will be called when a variable declared
with VIR_AUTOPTR macro goes out of scope. Also, drop the redundant
viralloc.h include, since that has moved from the source module into
the header.
When variables of type virNetDevRxFilterPtr and virNetDevMcastEntryPtr
are declared using VIR_AUTOPTR, the functions virNetDevRxFilterFree
and virNetDevMcastEntryFree, respectively, will be run
automatically on them when they go out of scope.
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Adding functionality to libvirt that will allow querying the interface
for the availability of switchdev Offloading NIC capabilities.
The switchdev mode was introduced in kernel 4.8, the iproute2-devlink
command to retrieve the switchdev NIC feature with command example:
devlink dev eswitch show pci/0000:03:00.0
This feature is needed for Openstack so we can do a scheduling decision
if the NIC is in Hardware Offload (switchdev) or regular SR-IOV (legacy) mode.
And select the appropriate hypervisors with the requested capability see [1].
[1] - https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/enable-sriov-nic-features.html
Reviewed-by: Laine Stump <laine@laine.org>
Reviewed-by: John Ferlan <jferlan@redhat.com>
On Linux each network device *can* (but not necessarily *does*) have
an attribute called phys_port_id which can be read from the file of
that name in the netdev's sysfs directory. The examples I've seen have
been a many-digit hexadecimal number (as an ASCII string).
This value can be useful when a single PCI device is associated with
multiple netdevs (e.g a dual port Mellanox SR-IOV NIC - this card has
a single PCI Physical Function (PF), and that PF has two netdevs
associated with it (the "net" subdirectory of the PF in sysfs has two
links rather than the usual single link to a netdev directory). Each
of the PF netdevs has a different phys_port_id. The Virtual Functions
(VF) are similar - the PF (a PCI device) has "n" VFs (also each of
these is a PCI device), each VF has two netdevs, and each of the VF
netdevs points back to the VF PCI device (with the "device" entry in
its sysfs directory) as well as having a phys_port_id matching the PF
netdev it is associated with.
virNetDevGetPhysPortID() simply attempts to read the phys_port_id for
the given netdev and return it to the caller. If this particular
netdev driver doesn't support phys_port_id, it returns NULL (*not* a
NULL-terminated string, but a NULL pointer) but still counts it as a
success.
This reverts commit e4b980c853.
When a binary links against a .a archive (as opposed to a shared library),
any symbols which are marked as 'weak' get silently dropped. As a result
when the binary later runs, those 'weak' functions have an address of
0x0 and thus crash when run.
This happened with virtlogd and virtlockd because they don't link to
libvirt.so, but instead just libvirt_util.a and libvirt_rpc.a. The
virRandomBits symbols was weak and so left out of the virtlogd &
virtlockd binaries, despite being required by virHashTable functions.
Various other binaries like libvirt_lxc, libvirt_iohelper, etc also
link directly to .a files instead of libvirt.so, so are potentially
at risk of dropping symbols leading to a later runtime crash.
This is normal linker behaviour because a weak symbol is not treated
as undefined, so nothing forces it to be pulled in from the .a You
have to force the linker to pull in weak symbols using -u$SYMNAME
which is not a practical approach.
This risk is silent bad linkage that affects runtime behaviour is
not acceptable for a fix that was merely trying to fix the test
suite. So stop using __weak__ again.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently all mockable functions are annotated with the 'noinline'
attribute. This is insufficient to guarantee that a function can
be reliably mocked with an LD_PRELOAD. The C language spec allows
the compiler to assume there is only a single implementation of
each function. It can thus do things like propagating constant
return values into the caller at compile time, or creating
multiple specialized copies of the function body each optimized
for a different caller. To prevent these optimizations we must
also set the 'noclone' and 'weak' attributes.
This fixes the test suite when libvirt.so is built with CLang
with optimization enabled.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Change the settings from qemuDomainUpdateDeviceLive() as otherwise the
call would succeed even though nothing has changed.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1414627
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
... with VIR_NET_GENERATED_MACV???_PREFIX, which is defined in
util/virnetdevmacvlan.h.
Since VIR_NET_GENERATED_PREFIX is used for plain tap devices, it is
renamed to VIR_NET_GENERATED_TAP_PREFIX and moved to virnetdev.h
That function is able to configure coalesce settings for an interface,
similarly to 'ethtool -C'. This function also updates back the
structure so that it contains actual data on the device (if the device
doesn't support some settings kernel might just return 0 and not set
whatever is not supported), so this way we'll have up-to-date
information in the live domain XML.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
CLang's optimizer is more aggressive at inlining functions than
gcc and so will often inline functions that our tests want to
mock-override. This causes the test to fail in bizarre ways.
We don't want to disable inlining completely, but we must at
least prevent inlining of mocked functions. Fortunately there
is a 'noinline' attribute that lets us control this per function.
A syntax check rule is added that parses tests/*mock.c to extract
the list of functions that are mocked (restricted to names starting
with 'vir' prefix). It then checks that src/*.h header file to
ensure it has a 'ATTRIBUTE_NOINLINE' annotation. This should prevent
use from bit-rotting in future.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The global functions virNetDevReplaceMacAddress(),
virNetDevReplaceNetConfig(), virNetDevRestoreMacAddress(), and
virNetDevRestoreNetConfig() are no longer used, as their functionality
has been replaced by virNetDev(Save|Read|Set)NetConfig().
The static functions virNetDevReplaceVfConfig() and
virNetDevRestoreVfConfig() were only used by the above-named global
functions that were removed.
These three functions are destined to replace
virNetDev(Replace|Restore)NetConfig() and
virNetDev(Replace|Restore)MacAddress(), which both do the save and set
together as a single step. We need to separate the save, read, and set
steps because there will be situations where we need to do something
else in between (in particular, we will need to rebind a VF's driver
after save but before set).
This patch creates the new functions, but doesn't call them - that
will come in a subsequent patch. Note that the new functions to
read/write the file that stores the original network config now uses
JSON rather than plaintext (it still recognizes the old format as well
though, so it won't get confused during an upgrade).
Given an SRIOV PF netdev name (e.g. "enp2s0f0") and VF#, this new
function returns the netdev name of the referenced VF device
(e.g. "enp2s11f6"), or NULL if the device isn't bound to a net driver.
This function provides the bridge/bond device that the given network
device is attached to. The return value is 0 or -1, and the master
device is a char** argument to the function - this is needed in order
to allow for a "success" return from a device that has no master.
Proposed formal coding conventions encourage defining typedefs for
vir[Blah] and vir[Blah]Ptr separately from the associated struct named
_vir[Blah]:
typedef struct _virBlah virBlah;
typedef virBlah *virBlahPtr;
struct _virBlah {
...
};
At some point in the past, I had submitted several patches using a
more compact style that I prefer, and they were accepted:
typedef struct _virBlah {
...
} virBlah, *virBlahPtr;
Since these are by far a minority among all struct definitions, this
patch changes all those definitions to reflect the style prefered by
the proposal so that there is 100% consistency.
Commit cf0568b0af moved a bunch of functions from virNetDev
to the more specific virNetDevIP; however, not all of the
existing uses were moved properly, causing build failures on
FreeBSD.
Complete the transition to the new names and drop the
obsolete declarations from the header file while at it.
These had been declared in conf/device_conf.h, but then used in
util/virnetdev.c, meaning that we had to #include conf/device_conf.h
in virnetdev.c (which we have for a long time said shouldn't be done.
This caused a bigger problem when I tried to #include util/virnetdev.h
in a file in src/conf (which is allowed) - for some reason the
"device_conf.h: File not found" error.
The solution is to move the data types and functions used in util
sources from conf to util. Some names were adjusted during the move
("virInterface" --> "virNetDevIf", and "VIR_INTERFACE" -->
"VIR_NETDEV_IF")
virNetDevLinkDump should have been in virnetlink.c, but that file
didn't exist yet when the function was created. It didn't really
matter until now - I found that having virnetlink.h included by
virnetdev.h caused build problems when trying to #include virnetdev.h
in a .c file in src/conf (due to missing directory in -I). Rather than
fix that to further institutionalize the incorrect placement of this
one function, this patch moves the function.
SRIOV VFs used in macvtap passthrough mode can take advantage of the
SRIOV card's transparent vlan tagging. All the code was there to set
the vlan tag, and it has been used for SRIOV VFs used for hostdev
interfaces for several years, but for some reason, the vlan tag for
macvtap passthrough devices was stubbed out with a -1.
This patch moves a bit of common validation down to a lower level
(virNetDevReplaceNetConfig()) so it is shared by hostdev and macvtap
modes, and updates the macvtap caller to actually send the vlan config
instead of -1.
A PCI device may have the capability to setup virtual functions (VFs)
but have them currently all disabled. Prior to this patch, if that was
the case the the node device XML for the device wouldn't report any
virtual_functions capability.
With this patch, if a file called "sriov_totalvfs" is found in the
device's sysfs directory, its contents will be interpreted as a
decimal number, and that value will be reported as "maxCount" in a
capability element of the device's XML, e.g.:
<capability type='virtual_functions' maxCount='7'/>
This will be reported regardless of whether or not any VFs are
currently enabled for the device.
NB: sriov_numvfs (the number of VFs currently active) is also
available in sysfs, but that value is implied by the number of items
in the list that is inside the capability element, so there is no
reason to explicitly provide it as an attribute.
sriov_totalvfs and sriov_numvfs are available in kernels at least as far
back as the 2.6.32 that is in RHEL6.7, but in the case that they
simply aren't there, libvirt will behave as it did prior to this patch
- no maxCount will be displayed, and the virtual_functions capability
will be absent from the device's XML when 0 VFs are enabled.
Commit id '0f7436ca' added virNetDevWaitDadFinish using ATTRIBUTE_NONNULL
for both arguments, although one is a non-null argument. A Coverity build
balks at that.
commit db488c79 assumed that dnsmasq would complete IPv6 DAD before
daemonizing, but in reality it doesn't wait, which creates problems
when libvirt's bridge driver sets the matching "dummy tap device" to
IFF_DOWN prior to DAD completing.
This patch waits for DAD completion by periodically polling the kernel
using netlink to check whether there are any IPv6 addresses assigned
to bridge which have a 'tentative' state (if there are any in this
state, then DAD hasn't yet finished). After DAD is finished, execution
continues. To avoid an endless hang in case something was wrong with
the kernel's DAD, we wait a maximum of 5 seconds.
These functions were made static as a part of commit cbfe38c since
they were no longer called from outside virnetdev.c. We once again
need to call them from another file, so this patch makes them once
again public.
Adding functionality to libvirt that will allow
it query the interface for the availability of RDMA and
tx-udp_tnl-segmentation Offloading NIC capabilities
Here is an example of the feature XML definition:
<device>
<name>net_eth4_90_e2_ba_5e_a5_45</name>
<path>/sys/devices/pci0000:00/0000:00:03.0/0000:08:00.1/net/eth4</path>
<parent>pci_0000_08_00_1</parent>
<capability type='net'>
<interface>eth4</interface>
<address>90:e2:ba:5e:a5:45</address>
<link speed='10000' state='up'/>
<feature name='rx'/>
<feature name='tx'/>
<feature name='sg'/>
<feature name='tso'/>
<feature name='gso'/>
<feature name='gro'/>
<feature name='rxvlan'/>
<feature name='txvlan'/>
<feature name='rxhash'/>
<feature name='rdma'/>
<feature name='txudptnl'/>
<capability type='80203'/>
</capability>
</device>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1113474
When we set the MAC address of a network device as a part of setting
up macvtap "passthrough" mode (where the domain has an emulated netdev
connected to a host macvtap device that has exclusive use of the
physical device, and sets the device MAC address to match its own,
i.e. "<interface type='direct'> <source mode='passthrough' .../>"), we
use ioctl(SIOCSIFHWADDR) giving it the name of that device. This is
true even if it is an SRIOV Virtual Function (VF).
But, when we are setting the MAC address / vlan ID of a VF in
preparation for "hostdev network" passthrough (this is where we set
the MAC address and vlan id of the VF after detaching the host net
driver and before assigning the device to the domain with PCI
passthrough, i.e. "<interface type='hostdev'>", we do the setting via
a netlink RTM_SETLINK message for that VF's Physical Function (PF),
telling it the VF# we want to change. This sets an "administratively
changed MAC" flag for that VF in the PF's driver, and from that point
on (until the PF driver is reloaded, *not* merely the VF driver) that
VF's MAC address can't be changed using ioctl(SIOCSIFHWADDR) - the
only way to change it is via the PF with RTM_SETLINK.
This means that if a VF is used for hostdev passthrough, it will have
the admin flag set, and future attempts to use that VF for macvtap
passthrough will fail.
The solution to this problem is to check if the device being used for
macvtap passthrough is actually a VF; if so, we use the netlink
RTM_SETLINK message to the PF to set the VF's mac address instead of
ioctl(SIOCSIFHWADDR) directly to the VF; if not, behavior does not
change from previously.
There are three pieces to making this work:
1) virNetDevMacVLan(Create|Delete)WithVPortProfile() now call
virNetDev(Replace|Restore)NetConfig() rather than
virNetDev(Replace|Restore)MacAddress() (simply passing -1 for VF#
and vlanid).
2) virNetDev(Replace|Restore)NetConfig() check to see if the device is
a VF. If so, they find the PF's name and VF#, allowing them to call
virNetDev(Replace|Restore)VfConfig().
3) To prevent mixups when detaching a macvtap passthrough device that
had been attached while running an older version of libvirt,
virNetDevRestoreVfConfig() is potentially given the preserved name
of the VF, and if the proper statefile for a VF can't be found in
the stateDir (${stateDir}/${pfname}_vf${vfid}),
virNetDevRestoreMacAddress() is called instead (which will look in
the file named ${stateDir}/${vfname}).
This problem has existed in every version of libvirt that has both
macvtap passthrough and interface type='hostdev'. Fortunately people
seem to use one or the other though, so it hasn't caused any real
world problem reports.
Throughout the code, we have several places need to construct a path
somewhere in /sys/class/net/... They are not consistent and nearly
each code piece invents its own way how to do it. So unify this by:
1) use virNetDevSysfsFile() wherever possible
2) At least use common macro SYSFS_NET_DIR declared in virnetdev.h at
the rest of places which can't go with 1)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>