If we are just ejecting media, ret == -1 even after the retry loop
determines that the tray is open, as requested. This means media
disconnect always report's error.
Fix it, and fix some other mini issues:
- Don't overwrite the 'eject' error message if the retry loop fails
- Move the retries decrement inside the loop, otherwise the final loop
might succeed, yet retries == 0 and we will raise error
- Setting ret = -1 in the disk->src check is unneeded
- Fix comment typos
cc: mprivozn@redhat.com
(cherry picked from commit 406d8a9809)
In 84c59ffa I've tried to fix changing ejectable media process. The
process should go like this:
1) we need to call 'eject' on the monitor
2) we should wait for 'DEVICE_TRAY_MOVED' event
3) now we can issue 'change' command
However, while waiting in step 2) the domain monitor was locked. So
even if qemu reported the desired event, the proper callback was not
called immediately. The monitor handling code needs to lock the
monitor in order to read the event. So that's the first lock we must
not hold while waiting. The second one is the domain lock. When
monitor handling code reads an event, the appropriate callback is
called then. The first thing that each callback does is locking the
corresponding domain as a domain or its device is about to change
state. So we need to unlock both monitor and VM lock. Well, holding
any lock while sleep()-ing is not the best thing to do anyway.
(cherry picked from commit 543af79a14)
Conflicts:
src/qemu/qemu_hotplug.c
https://bugzilla.redhat.com/show_bug.cgi?id=892289
It seems like with new udev within guest OS, the tray is locked,
so we need to:
- 'eject'
- wait for tray to open
- 'change'
Moreover, even when doing bare 'eject', we should check for
'tray_open' as guest may have locked the tray. However, the
waiting phase shouldn't be unbounded, so I've chosen 10 retries
maximum, each per 500ms. This should give enough time for guest
to eject a media and open the tray.
(cherry picked from commit 84c59ffaec)
This patch is in relation to Bug 966449:
https://bugzilla.redhat.com/show_bug.cgi?id=966449
This is a patch addressing the coredump.
Thread 1 must be calling nwfilterDriverRemoveDBusMatches(). It does so with
nwfilterDriverLock held. In the patch below I am now moving the
nwfilterDriverLock(driverState) further up so that the initialization, which
seems to either take a long time or is entirely stuck, occurs with the lock
held and the shutdown cannot occur at the same time.
Remove the lock in virNWFilterDriverIsWatchingFirewallD to avoid
double-locking.
(cherry picked from commit 0ec376c20a)
qemu-img resize will fail with "The new size must be a multiple of 512"
if libvirt doesn't round it first.
This fixes rhbz#951495
Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
(cherry picked from commit 9a8f39d097)
Typically when you get EOF on a stream, poll will return
POLLIN|POLLHUP at the same time. Thus when we deal with
stream reads, if we see EOF during the read, we can then
clear the VIR_STREAM_EVENT_HANGUP & VIR_STREAM_EVENT_ERROR
event bits.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit e7b0382945)
Reported by Anthony Messina in
https://bugzilla.redhat.com/show_bug.cgi?id=904692
Present since introduction of smartcard support in commit f5fd9baa
* src/qemu/qemu_command.c (qemuBuildCommandLine): Match qemu spelling.
* tests/qemuxml2argvdata/qemuxml2argv-smartcard-host-certificates.args:
Fix broken test.
(cherry picked from commit 6f7e4ea359)
The previous commit was an incomplete backport of commit 83e4c775,
and as a result made any attempt to start a domain when cgroups
are enabled go into an infinite loop. This fixes the botched
backport.
Signed-off-by: Eric Blake <eblake@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=965169 documents a
problem starting domains when cgroups are enabled; I was able
to reliably reproduce the race about 5% of the time when I added
hooks to domain startup by 3 seconds (as that seemed to be about
the length of time that qemu created and then closed a temporary
thread, probably related to aio handling of initially opening
a disk image). The problem has existed since we introduced
virCgroupMoveTask in commit 9102829 (v0.10.0).
There are some inherent TOCTTOU races when moving tasks between
kernel cgroups, precisely because threads can be created or
completed in the window between when we read a thread id from the
source and when we write to the destination. As the goal of
virCgroupMoveTask is merely to move ALL tasks into the new
cgroup, it is sufficient to iterate until no more threads are
being created in the old group, and ignoring any threads that
die before we can move them.
It would be nicer to start the threads in the right cgroup to
begin with, but by default, all child threads are created in
the same cgroup as their parent, and we don't want vcpu child
threads in the emulator cgroup, so I don't see any good way
of avoiding the move. It would also be nice if the kernel were
to implement something like rename() as a way to atomically move
a group of threads from one cgroup to another, instead of forcing
a window where we have to read and parse the source, then format
and write back into the destination.
* src/util/vircgroup.c (virCgroupAddTaskStrController): Ignore
ESRCH, because a thread ended between read and write attempts.
(virCgroupMoveTask): Loop until all threads have moved.
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 83e4c77547)
Conflicts:
src/util/cgroup.c - refactoring in commit 56f27b3bb is too big
to take in entirety; but I did inline its changes to the cleanup label
The code for putting the emulator threads in a separate cgroup
would spam the logs with warnings
2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 3
2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 4
2013-02-27 16:08:26.732+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 6
This is because it has only created child cgroups for 3 of the
controllers, but was trying to move the processes from all the
controllers. The fix is to only try to move threads in the
controllers we actually created. Also remove the warning and
make it return a hard error to avoid such lazy callers in the
future.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 279336c5d8)
The QEMU driver has a list of devices nodes that are whitelisted
for all guests. The kernel has recently started returning an
error if you try to whitelist a device which does not exist.
This causes a warning in libvirt logs and an audit error for
any missing devices. eg
2013-02-27 16:08:26.515+0000: 29625: warning : virDomainAuditCgroup:451 : success=no virt=kvm resrc=cgroup reason=allow vm="vm031714" uuid=9d8f1de0-44f4-a0b1-7d50-e41ee6cd897b cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm031714/" class=path path=/dev/kqemu rdev=? acl=rw
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 7f544a4c8f)
When given a CA cert with basic constraints to set non-critical,
and key usage of 'key signing', this should be rejected. Version
of GNUTLS < 3 do not rejecte it though, so we never noticed the
test case was broken
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 0204d6d7a0)
CVE-2013-1962
remoteDispatchStoragePoolListAllVolumes wasn't freeing the pool.
The pool also held a reference to the connection, preventing it from
getting freed and closing the netcf interface driver, which held two
sockets open.
(cherry picked from commit ca697e90d5)
https://bugzilla.redhat.com/show_bug.cgi?id=924501 tracks a
problem that occurs if uid 107 is already in use at the time
libvirt is first installed. In response that problem, Fedora
packaging guidelines were recently updated. This fixes the
spec file to comply with the new guidelines:
https://fedoraproject.org/wiki/Packaging:UsersAndGroups
* libvirt.spec.in (daemon): Follow updated Fedora guidelines.
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit a2584d58f6)
Conflicts:
libvirt.spec.in - no backport of c8f79c9b %if reindents
When a changelog entry references an RPM macro, % needs to be escaped so
that it does not appear expanded in package changelog.
Fri Mar 4 2009 is incorrect since Mar 4 was Wednesday. Since
libvirt-0.6.1 was released on Mar 4 2009, we should change Fri to Wed.
(cherry picked from commit 53657a0abe)
The macro was made to help installing broken packages that did not use
DESTDIR correctly by overriding individual path variables (prefix,
sysconfdir, ...). Newer rpm provides fixed make_install macro that calls
make install with just the correct DESTDIR, however it is not available
everywhere (e.g., RHEL 5 does not have it). On the other hand the
make_install macro is simple and straightforward enough for us to use
its expansion directly.
(cherry picked from commit d45066a55f)
https://bugzilla.redhat.com/show_bug.cgi?id=922186
Commit d04916fa introduced a regression in audit quality - even
though the code was computing the proper escaped name for a
path, it wasn't feeding that escaped name on to the audit message.
As a result, /var/log/audit/audit.log would mention a pair of
fields class=path path=/dev/hpet instead of the intended
class=path path="/dev/hpet", which in turn caused ausearch to
format the audit log with path=(null).
* src/conf/domain_audit.c (virDomainAuditCgroupPath): Use
constructed encoding.
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 31c6bf35b9)
When virStorageBackendLogicalCreateVol() creates a snapshot for a
logical volume with backingStore element, it fails with the message
below:
2013-01-17 03:10:18.869+0000: 1967: error : virCommandWait:2345 :
internal error Child process (/sbin/lvcreate --name lvm-snapshot -L 51200K
-s=/dev/lvm-pool/lvm-volume) unexpected exit status 3: /sbin/lvcreate:
invalid option -- '=' Error during parsing of command line.
This is because virCommandAddArgPair() uses '=' to connect the two
parameters, it's unsuitable for -s option of the lvcreate.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
(cherry picked from commit ffee627a4a)
Avoid requesting information such as identity or power state when it
is not necessary.
Lookup virtual machine list with the required fields (configStatus,
name, and config.uuid) to make esxVI_GetVirtualMachineIdentity work.
No need to call esxVI_GetNumberOfSnapshotTrees. rootSnapshotTreeList
can be tested for emptiness by checking it for NULL.
esxVI_LookupRootSnapshotTreeList already does the error reporting,
don't overwrite it.
Check if autostart is enabled at all before looking up the individual
autostart setting of a virtual machine.
Reorder VIR_EXPAND_N(doms, ndoms, 1) to avoid leaking the result of
the call to virGetDomain if VIR_EXPAND_N fails.
Replace VIR_EXPAND_N by VIR_RESIZE_N to avoid quadratic scaling, as in
the Hyper-V version of the function.
If virGetDomain fails it already reports an error, don't overwrite it
with an OOM error.
All items in doms up to the count-th one are valid, no need to double
check before freeing them.
Finally, don't leak autoStartDefaults and powerInfoList.
(cherry picked from commit 5fc663d8be)
Normally libvirtd should run with a SELinux label
system_u:system_r:virtd_t:s0-s0:c0.c1023
If a user manually runs libvirtd though, it is sometimes
possible to get into a situation where it is running
system_u:system_r:init_t:s0
The SELinux security driver isn't expecting this and can't
parse the security label since it lacks the ':c0.c1023' part
causing it to complain
internal error Cannot parse sensitivity level in s0
This updates the parser to cope with this, so if no category
is present, libvirtd will hardcode the equivalent of c0.c1023.
Now this won't work if SELinux is in Enforcing mode, but that's
not an issue, because the user can only get into this problem
if in Permissive mode. This means they can now start VMs in
Permissive mode without hitting that parsing error
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 1732c1c629)
Conflicts:
src/security/security_selinux.c
Pull the code which parses the current process MCS range
out of virSecuritySELinuxMCSFind and into a new method
virSecuritySELinuxMCSGetProcessRange.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 4a92fe4413)
Conflicts:
src/security/security_selinux.c
The body of the loop in virSecuritySELinuxMCSFind would
directly 'return NULL' on OOM, instead of jumping to the
cleanup label. This caused a leak of several local vars.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit f2d8190cfb)
Since we switched from direct host migration scheme to the one,
where we connect to the destination and then just pass a FD to a
qemu, we have uncovered a qemu bug. Qemu expects migration FD to
block. However, we are passing a nonblocking one which results in
cryptic error messages like:
qemu: warning: error while loading state section id 2
load of migration failed
The bug is already known to Qemu folks, but we should workaround
already released Qemus. Patch has been originally proposed by Stefan
Hajnoczi <stefanha@gmail.com>
(cherry picked from commit ceb31795af)
Commit c308a9ae was incomplete; it resolved the configure failure,
but not a later build failure.
* src/util/virnetdevbridge.c: Include pre-req header.
* configure.ac (AC_CHECK_HEADERS): Prefer standard in.h over
non-standard ip6.h.
(cherry picked from commit 1bf661caf4)
I got this scary warning during ./configure on rawhide:
checking linux/if_bridge.h usability... no
checking linux/if_bridge.h presence... yes
configure: WARNING: linux/if_bridge.h: present but cannot be compiled
configure: WARNING: linux/if_bridge.h: check for missing prerequisite headers?
configure: WARNING: linux/if_bridge.h: see the Autoconf documentation
configure: WARNING: linux/if_bridge.h: section "Present But Cannot Be Compiled"
configure: WARNING: linux/if_bridge.h: proceeding with the compiler's result
configure: WARNING: ## ------------------------------------- ##
configure: WARNING: ## Report this to libvir-list@redhat.com ##
configure: WARNING: ## ------------------------------------- ##
checking for linux/if_bridge.h... no
* configure.ac (AC_CHECK_HEADERS): Provide struct in6_addr, since
linux/if_bridge.h uses it without declaring it.
(cherry picked from commit c308a9ae15)
(cherry picked from commit 7ae53f1593)
If securityselinuxtest was run on a system with newer SELinux
policy it would fail, due to using svirt_tcg_t instead of
svirt_t. Fixing the domain type to be KVM avoids this issue.
(cherry picked from commit 32df483f1d)
This resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=895294
The symptom was that attempts to modify a network device using
virDomainUpdateDeviceFlags() would fail if the original device had a
<boot> element (e.g. "<boot order='1'/>"), even if the updated device
had the same <boot> element. Instead, the following error would be logged:
cannot modify network device boot index setting
It's true that it's not possible to change boot order (internally
known as bootIndex) of a live device; qemuDomainChangeNet checks for
that, but the problem was that the information it was checking was
incorrect.
Explanation:
When a complete domain is parsed, a global (to the domain) "bootMap"
is passed down to the parse for each device; the bootMap is used to
make sure that devices don't have conflicting settings for their boot
orders.
When a single device is parsed by itself (as in the case of
virDomainUpdateDeviceFlags), there is no global bootMap that would be
appropriate to send, so NULL is sent instead. However, although the
lowest level function that parses just the boot order *does* simply
skip the sanity check in that case, the next higher level
"virDomainDeviceInfoParseXML" function refuses to call down to the
lower "virDomainDeviceBootParseXML" if bootMap is NULL. So, the boot
order is never set in the "new" device object, and when it is compared
to the original (which does have a boot order), they don't match.
The fix is to patch virDomainDeviceInfoParseXML to not care about
bootMap, and just always call virDomainDeviceInfoBootParseXML whenever
there is a <boot> element. When we are only parsing a single device,
we don't care whether or not any specified boot order is consistent
with the rest of the domain; we will always do this check later (in
the current case, we do it by verifying that the net bootIndex exactly
matches the old bootIndex).
The current SELinux policy only works for KVM guests, since
TCG requires the 'execmem' privilege. There is a 'virt_use_execmem'
boolean to turn this on globally, but that is unpleasant for users.
This changes libvirt to automatically use a new 'svirt_tcg_t'
context for TCG based guests. This obsoletes the previous
boolean tunable and makes things 'just work(tm)'
Since we can't assume we run with new enough policy, I also
make us log a warning message (once only) if we find the policy
lacks support. In this case we fallback to the normal label and
expect users to set the boolean tunable
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 77d3a80974)
There's been a few bugs about an expected error from polkit:
https://bugzilla.redhat.com/show_bug.cgi?id=873799https://bugzilla.redhat.com/show_bug.cgi?id=872166
The error is:
Authorization requires authentication but no agent is available.
The error means that polkit needs a password, but there is no polkit
agent registered in your session. Polkit agents are the bit of UI that
pop up and actually ask for your password.
Preface the error with the string 'polkit:' so folks can hopefully
make more sense of it.
(cherry picked from commit 96a108c993)
According to Eric Paris this is slightly more efficient because it
only loads the regular expressions in libselinux once.
(cherry picked from commit 6159710ca1)
Conflicts:
src/security/security_selinux.c
The virSecurityManager{Set,Restore}AllLabel methods are invoked
at domain startup/shutdown to relabel resources associated with
a domain. This works fine with QEMU, but with LXC they are in
fact both currently no-ops since LXC does not support disks,
hostdevs, or kernel/initrd files. Worse, when LXC gains support
for disks/hostdevs, they will do the wrong thing, since they
run in host context, not container context. Thus this patch
turns then into a formal no-op when used with LXC. The LXC
controller will call out to specific security manager labelling
APIs as required during startup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 89c5a9d0e8)
Commit id a994ef2d1 changed the mechanism to store/update the default
security label from using disk->seclabels[0] to allocating one on the
fly. That change allocated the label, but never saved it. This patch
will save the label. The new virDomainDiskDefAddSecurityLabelDef() is
a copy of the virDomainDefAddSecurityLabelDef().
(cherry picked from commit 05cc035189)
Conflicts:
src/conf/domain_conf.h
This patch resolves CVE-2013-0170:
https://bugzilla.redhat.com/show_bug.cgi?id=893450
When reading and dispatching of a message failed the message was freed
but wasn't removed from the message queue.
After that when the connection was about to be closed the pointer for
the message was still present in the queue and it was passed to
virNetMessageFree which tried to call the callback function from an
uninitialized pointer.
This patch removes the message from the queue before it's freed.
* rpc/virnetserverclient.c: virNetServerClientDispatchRead:
- avoid use after free of RPC messages
(cherry picked from commit 46532e3e8e)
https://bugzilla.redhat.com/show_bug.cgi?id=903184
Although the nwfilter driver skips startup when running in a
session libvirtd, it did not skip reload or shutdown. This
caused errors to be reported when sending SIGHUP to libvirtd,
and caused an abort() in libdbus on shutdown due to trying
to remove a dbus filter that was never added
(cherry picked from commit abbec81bd0)
Conflicts:
src/nwfilter/nwfilter_driver.c - earlier changes f4ea67f and
79b8a56 related to using bool and auto-shutdown of drivers are not backported
When running virDomainDestroy, we need to make sure that no other
background thread cleans up the domain while we're doing our work.
This can happen if we release the domain object while in the
middle of work, because the monitor might detect EOF in this window.
For this reason we have a 'beingDestroyed' flag to stop the monitor
from doing its normal cleanup. Unfortunately this flag was only
being used to protect qemuDomainBeginJob, and not qemuProcessKill
This left open a race condition where either libvirtd could crash,
or alternatively report bogus error messages about the domain already
having been destroyed to the caller
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 81621f3e6e)
Conflicts:
src/qemu/qemu_driver.c - virReportError had been removed from
upstream in cases where qemuProcessKill failed, creating
different context.
When building libvirt rpms on rhel5, I got the following error:
File must begin with "/": rm
File must begin with "/": -f
File must begin with "/": $RPM_BUILD_ROOT/etc/sysctl.d/libvirtd
Installed (but unpackaged) file(s) found:
/etc/sysctl.d/libvirtd
It is triggerd by the %files list of libvirt daemon:
%if 0%{?fedora} >= 14 || 0%{?rhel} >= 6
%config(noreplace) %{_prefix}/lib/sysctl.d/libvirtd.conf
%else
rm -f $RPM_BUILD_ROOT%{_prefix}/lib/sysctl.d/libvirtd.conf
%endif
After checking document of rpm spec file, I think it would be better
to move the file deleting line from %files list to %install script.
Bug introduced in commit a1fd56c.
(cherry picked from commit daef7c9e9c)
In a non-systemd environment the post and preun scripts of libvirt-client
fail, since the required files are in libvirt-daemon. Moved them to client.
Doing that I noticed %{_unitdir}/libvirt-guests.service was contained in
both libvirt-client and libvirt-daemon, which I don't think was intended.
Removed the extra copy from daemon.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
(cherry picked from commit b7159dca8b)
Conflicts:
libvirt.spec.in - no virtlockd service
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on domain definition and magic
equation and set it upon the domain startup. The rationale behind was,
if there's a memory leak or exploit in qemu, we should prevent the
host system trashing. However, the equation was too tightening, as it
didn't reflect what the kernel counts into the memory used by a
process. Since many hosts do have a swap, nobody hasn't noticed
anything, because if hard memory limit is reached, process can
continue allocating memory on a swap. However, if there is no swap on
the host, the process gets killed by OOM killer. In our case, the qemu
process it is.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel way of accounting the memory
for process. That is, even the kernel caches are counted within the
memory used by a process (within cgroups at least). Hence the magic
equation has to be changed:
limit = 1.5 * (domain memory + total video memory) + (32MB for cache
per each disk) + 200MB
(cherry picked from commit 3c83df679e)
This is an adjustment to the fix for
https://bugzilla.redhat.com/show_bug.cgi?id=889319
to account for two bonehead mistakes I made.
commit ac2797cf2a attempted to fix a
problem with netlink in newer kernels requiring an extra attribute
with a filter flag set in order to receive an IFLA_VFINFO_LIST from
netlink. Unfortunately, the #ifdef that protected against compiling it
in on systems without the new flag went a bit too far, assuring that
the new code would *never* be compiled, and even if it had, the code
was incorrect.
The first problem was that, while some IFLA_* enum values are also
their existence at compile time, IFLA_EXT_MASK *isn't* #defined, so
checking to see if it's #defined is not a valid method of determining
whether or not to add the attribute. Fortunately, the flag that is
being set (RTEXT_FILTER_VF) *is* #defined, and it is never present if
IFLA_EXT_MASK isn't, so it's sufficient to just check for that flag.
And to top it off, due to the code not actually compiling when I
thought it did, I didn't realize that I'd been given the wrong arglist
to nla_put() - you can't just send a const value to nla_put, you have
to send it a pointer to memory containing what you want to add to the
message, along with the length of that memory.
This time I've actually sent the patch over to the other machine
that's experiencing the problem, applied it to the branch being used
(0.10.2) and verified that it works properly, i.e. it does fix the
problem it's supposed to fix. :-/
(cherry picked from commit 7c36650699)
This patch fixes the lack of error messages when libvirt fails to find
VFINFO in a returned netlinke response message.
https://bugzilla.redhat.com/show_bug.cgi?id=827519#c10 is an example
of the error message that was previously logged when the
IFLA_VFINFO_LIST object was missing from the netlink response. The
reason for this failure is detailed in
https://bugzilla.redhat.com/show_bug.cgi?id=889319
Even though that root problem has been fixed, the experience of
finding the root cause shows us how important it is to properly log an
error message in these cases. This patch *seems* to replace the entire
function, but really most of the changes are due to moving code that
was previously inside an if() statement out to the top level of the
function (the original if() was reversed and made to log an error and
return).
(cherry picked from commit 846770e5ff)
This patch resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=889319
When assigning an SRIOV virtual function to a guest using "intelligent
PCI passthrough" (<interface type='hostdev'>, which sets the MAC
address and vlan tag of the VF before passing its info to qemu),
libvirt first learns the current MAC address and vlan tag by sending
an NLM_F_REQUEST message for the VF's PF (physical function) to the
kernel via a NETLINK_ROUTE socket (see virNetDevLinkDump()); the
response message's IFLA_VFINFO_LIST section is examined to extract the
info for the particular VF being assigned.
This worked fine with kernels up until kernel commit
115c9b81928360d769a76c632bae62d15206a94a (first appearing in upstream
kernel 3.3) which changed the ABI to not return IFLA_VFINFO_LIST in
the response until a newly introduced IFLA_EXT_MASK field was included
in the request, with the (newly introduced, of course) RTEXT_FILTER_VF
flag set.
The justification for this ABI change was that new fields had been
added to the VFINFO, causing NLM_F_REQUEST messages to fail on systems
with large numbers of VFs if the requesting application didn't have a
large enough buffer for all the info. The idea is that most
applications doing an NLM_F_REQUEST don't care about VFINFO anyway, so
eliminating it from the response would lower the requirements on
buffer size. Apparently, the people who pushed this patch made the
mistaken assumption that iproute2 (the "ip" command) was the only
package that used IFLA_VFINFO_LIST, so it wouldn't break anything else
(and they made sure that iproute2 was fixed.
The logic of this "fix" is debatable at best (one could claim that the
proper fix would be for the applications in question to be fixed so
that they properly sized the buffer, which is what libvirt does
(purely by virtue of using libnl), but it is what it is and we have to
deal with it.
In order for <interface type='hostdev'> to work properly on systems
with a kernel 3.3 or later, libvirt needs to add the afore-mentioned
IFLA_EXT_MASK field with RTEXT_FILTER_VF set.
Of course we also need to continue working on systems with older
kernels, so that one bit of code is compiled conditionally. The one
time this could cause problems is if the libvirt binary was built on a
system without IFLA_EXT_MASK which was subsequently updated to a
kernel that *did* have it. That could be solved by manually providing
the values of IFLA_EXT_MASK and RTEXT_FILTER_VF and adding it to the
message anyway, but I'm uncertain what that might actually do on a
system that didn't support the message, so for the time being we'll
just fail in that case (which will very likely never happen anyway).
(cherry picked from commit ac2797cf2a)