Just like in previous commit, qemu-pr-helper might want to open
/dev/mapper/control under certain circumstances. Therefore we
have to allow it in cgroups.
The change virdevmapper.c might look spurious but it isn't. After
6dd84f6850 any path that we're allowing in deivces CGroup is
subject to virDevMapperGetTargets() inspection. And libdevmapper
returns ENXIO for the path from subject.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
For command line we need two things:
1) -object pr-manager-helper,id=$alias,path=$socketPath
2) -drive file.pr-manager=$alias
In -object pr-manager-helper we tell qemu which socket to connect
to, then in -drive file-pr-manager we just reference the object
the drive in question should use.
For managed PR helper the alias is always "pr-helper0" and socket
path "${vm->priv->libDir}/pr-helper0.sock".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Couple of reasons for that:
a) there's no monitor command to change path where the pr-helper
connects to, or
b) there's no monitor command to introduce a new pr-helper for a
disk that already exists.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This is a definition that holds information on SCSI persistent
reservation settings. The XML part looks like this:
<reservations enabled='yes' managed='no'>
<source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/>
</reservations>
If @managed is set to 'yes' then the <source/> is not parsed.
This design was agreed on here:
https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Rather than have virJSONValueArraySize return a -1 when the input
is not an array and then splat an error message, let's check for
an array before calling and then change the return to be a size_t
instead of ssize_t.
That means using the helper virJSONValueIsArray as well as using a
more generic error message such as "Malformed <something> array".
In some cases we can remove stack variables and when we cannot,
those variables should be size_t not ssize_t. Alter a few references
of if (!value) to be if (value == 0) instead as well.
Some callers can already assume an array is being worked on based
on the previous call, so there's less to do.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Use virFileIsCDROM to detect whether a block device is a cdrom drive and
store it in virStorageSource. This will be necessary to correctly create
the 'host_cdrom' backend in qemu when using -blockdev.
We assume that host_cdrom makes only sense when used directly as a raw
image, but if a backing chain would be put in front of it, libvirt will
use 'host_device' in that case.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add detection mechanism which will allow to check whether a path to a
block device is a physical CDROM drive. This will be useful once we will
need to pass it to hypervisors.
The linux implementation uses an ioctl to do the detection, while the
fallback uses a simple string prefix match.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Add a flag denoting that a virStorageSource is going to be used as a
floppy image. This will be useful in cases where the user passes in
files which shall be exposed as an image to the guest.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Few things which are currently stored the virDomainDiskDef structure are
actually relevant for the storage source as well. Add the fields with a
note that they are just mirror of the values from the disk.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Everything besides the top of the chain is readonly. Track this when
parsing the XML and detecting the chain from the disk. Also fix the
state when taking snapshots.
All other cases where the top image is changed already preserve the
readonly state from the original image.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We want to make sure our wrapper is used instead in order
to keep the test suite working.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The latter is impossible to mock on platforms that use the
gnulib implementation, such as FreeBSD, while the former
doesn't suffer from this limitation.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
It's a trivial wrapper around canonicalize_file_name(),
which we need in order to fully mock file access on non-Linux
platforms.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The virStorageFileLoadBackendModule method is only used if either
fs or gluster storage is built in, which doesn't happen on mingw
leading to warning of an unused static function.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The storage file drivers are currently loaded as a side effect of
loading the storage driver. This is a bogus dependancy because the
storage file code has no interaction with the storage drivers, and
even ultimately be running in a completely separate daemon.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The virStorageFileSupportsSecurityDriver and
virStorageFileSupportsAccess currently just return a boolean
value. This is ok because they don't have any failure scenarios
but a subsequent patch is going to introduce potential failure
scenario. This changes their return type from a boolean to an
int with values -1, 0, 1.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The virStorageFileGetBackingStoreStr method has overloaded the NULL
return value to indicate both no backing available and a fatal
error dealing with it.
The caller is thus not able to correctly propagate the error
messages.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The driver.{c,h} files are primarily targetted at loading hypervisor
drivers and some helper functions in that area. It also, however,
contains a generically useful function for loading extension modules
that is called by the storage driver. Split that functionality off
into a new virmodule.{c,h} file to isolate it.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This reverts commit 8daa593b07.
There are two undesirable aspects to the impl
- Only a bare wildcard is permitted
- The wildcard match is not performed in the order listed
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
virNetDevTapGetRealDeviceName() is used on FreeBSD because interface
names (such as one sees in output of tools like ifconfig(8)) might not
match their /dev entity names, and for bhyve we need the latter.
Current implementation is not very efficient because in order to find
/dev name, it goes through all /dev/tap* entries and tries to issue
TAPGIFNAME ioctl on it. Not only this is slow, but also there's a bug in
this implementation when more than one NIC is passed to a VM: once we
find the tap interface we're looking for, we set its state to UP because
opening it for issuing ioctl sets it DOWN, even if it was UP before.
When we have more than 1 NIC for a VM, we have only last one UP because
others remain DOWN after unsuccessful attempts to match interface name.
New implementation just uses sysctl(3), so it should be faster and
won't make interfaces go down to get name.
Signed-off-by: Roman Bogorodskiy <bogorodskiy@gmail.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
The virFileFindResource method merely builds up the expected fully
qualified path to the resource. It does not actually check if it exists
on disk. The loadable module callers were mistakenly thinking a NULL
indicates the file doesn't exist on disk, whereas it in fact indicates
an out of memory error.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1569678
On some large systems (with ~400GB of RAM) it is possible for
unsigned int to overflow in which case we report invalid number
of 4K pages pool size. Switch to unsigned long long.
We hit overflow in virNumaGetPages when doing:
huge_page_sum += 1024 * page_size * page_avail;
because although 'huge_page_sum' is an unsigned long long, the
page_size and page_avail are both unsigned int, so the promotion
to unsigned long long doesn't happen until the sum has been
calculated, by which time we've already overflowed.
Turning page_avail into a unsigned long long is not strictly
needed until we need ability to represent more than 2^32
4k pages, which equates to 16 TB of RAM. That's not
outside the realm of possibility, so makes sense that we
change it to unsigned long long to avoid future problems.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Now that every caller is using virDomainObjListFindByUUIDRef,
let's just remove it and keep the name as virDomainObjListFindByUUID.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Historically we have relied on autopoint/gettextize to install a
standard po/Makefile.in.in. There is very limited scope for customizing
this and it also causes a bunch of extra stuff to be pulled into
configure.ac which potentially clashes with gnulib. Writing make rules
for po file management is no more difficult than any other rules libvirt
has, so stop using autopoint/gettextize.
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Currently it is not used in backing chains and does not seem that we
will need to use it so return it back to the disk definition. Thankfully
most accesses are done via the accessors.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Our virObject code relies heavily on the fact that the first
member of the class struct is type of virObject (or some
derivation of if). Let's check for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
So far we are repeating the following lines over and over:
if (!(virSomeObjectClass = virClassNew(virClassForObject(),
"virSomeObject",
sizeof(virSomeObject),
virSomeObjectDispose)))
return -1;
While this works, it is impossible to do some checking. Firstly,
the class name (the 2nd argument) doesn't match the name in the
code in all cases (the 3rd argument). Secondly, the current style
is needlessly verbose. This commit turns example into following:
if (!(VIR_CLASS_NEW(virSomeObject,
virClassForObject)))
return -1;
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Whenever we declare a new object the first member of the struct
has to be virObject (or any other member of that family). Now, up
until now we did not care about the name of the struct member.
But lets unify it so that we can do some checks at compile time
later.
The unified name is 'parent'.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This is the responsability of the caller to apply the correct lock
before using these functions. Moreover, the use of a simple boolean
was still racy: two threads may check the boolean and "lock" it
simultaneously.
Users of functions from src/util/virhash.c have to be checked for
correctness. Lookups and iteration should hold a RO
lock. Modifications should hold a RW lock.
Most important uses seem to be covered. Callers have now a greater
responsability, notably the ability to execute some operations while
iterating were reliably forbidden before are now accepted.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Since virCloseCallbacksRun was ignoring the value anyway, let's
just change it to be a void function.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Since the introduction of log tuning capabilities to virt-admin by
@06b91785, this has been a much needed missing improvement on the way to
deprecate the global 'log_level'.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
When preparing for migration, the libxl driver creates a new TCP listen
socket for the incoming migration by calling virNetSocketNewListenTCP,
passing the destination host name. virNetSocketNewListenTCP calls
virSocketAddrParse to check if the host name is a wildcard address, in
which case it avoids adding the AI_ADDRCONFIG flag to the hints passed to
getaddrinfo. If the host name is not an IP address, virSocketAddrParse
reports an error
error : virSocketAddrParseInternal:121 : Cannot parse socket address
'myhost.example.com': Name or service not known
But virNetSocketNewListenTCP succeeds regardless and the overall migration
operation succeeds.
Introduce virSocketAddrParseAny and use it when simply testing if a host
name/addr is parsable.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This helper fetches dependencies for given device mapper target.
At the same time, we need to provide a dummy log function because
by default libdevmapper prints out error messages to stderr which
we need to suppress.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Sometimes it's desired to get a JSON number as string. Add a helper.
This will help in cases where we'd want to convert the internal type from
string to something else.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
It was not possible to determine whether virJSONValueObjectAddVArgs and
the functions using it would consume a virJSONValue or not when used
with the 'a' or 'A' modifier depending on when the loop failed.
Fix this by passing in a pointer to the pointer so that it can be
cleared once it's successfully consumed and the callers don't have to
second-guess leaving a chance of leaking or double freeing the value
depending on the ordering.
Fix all callers to pass a double pointer too.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The set of arguments was changed a long time ago (040d996342
which dates back to July 2013) but the corresponding
documentation was not updated.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
The flags passed to virCommandPassFD() are unnamed and
documentation to this function doesn't list them either.
Give them name and mention it in documentation to functions
using them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Laine Stump <laine@laine.org>
What one currently gets is:
failed to read '/sys/bus/mdev/devices/<UUID>/mdev_type/device_api': No
such file or directory
This indicates that something is missing within the device's sysfs tree
which likely might be not be the case here because the device simply
doesn't exist yet. So, when creating our internal mdev obj, let's check
whether the device exists first prior to trying to verify the
user-provided model within domain XML.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
==16451== 32,768 bytes in 2 blocks are definitely lost in loss record 1,007 of 1,013
==16451== at 0x4C2AF0F: malloc (vg_replace_malloc.c:299)
==16451== by 0x7CADB40: nl_recv (in /usr/lib64/libnl-3.so.200.23.0)
==16451== by 0x532DFAC: virNetlinkDumpCommand (virnetlink.c:363)
==16451== by 0x53236AE: virNetDevIPCheckIPv6Forwarding (virnetdevip.c:641)
==16451== by 0xE3E4A1A: networkStartNetworkVirtual (bridge_driver.c:2490)
==16451== by 0xE3E55F5: networkStartNetwork (bridge_driver.c:2832)
==16451== by 0xE3DFFE5: networkAutostartConfig (bridge_driver.c:531)
==16451== by 0x53F47E0: virNetworkObjListForEachHelper (virnetworkobj.c:1412)
==16451== by 0x52FE69F: virHashForEach (virhash.c:606)
==16451== by 0x53F4857: virNetworkObjListForEach (virnetworkobj.c:1439)
==16451== by 0xE3E0BF4: networkStateAutoStart (bridge_driver.c:808)
==16451== by 0x55689CE: virStateInitialize (libvirt.c:758)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
We have to use VIR_WARNINGS_NO_CAST_ALIGN to avoid clang warning
about increased required alignment caused by some netlink macros.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
And at the same time, do that from .c rather than .h file.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Chen Hanxiao<chenhanxiao@gmail.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
introduce helper to parse RTM_GETNEIGH query message and
store it in struct virArpTable.
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Some fields reported by dmidecode have plenty of useless spaces
(in fact some have nothing but spaces). To deal with this we have
introduced virSkipSpacesBackwards() and use it in
virSysinfoParseX86Processor() and virSysinfoParseX86Memory().
However, other functions (e.g. virSysinfoParseX86Chassis()) don't
use it at all and thus we are reporting nonsense:
<sysinfo type='smbios'>
<chassis>
<entry name='manufacturer'>FUJITSU</entry>
<entry name='version'> </entry>
<entry name='serial'> </entry>
<entry name='asset'> </entry>
<entry name='sku'>Default string</entry>
</chassis>
</sysinfo>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Policy-Kit has been replaced by polkit (referred to, respectively,
as POLKIT0 and POLKIT1 in our Makefiles).
The last build fix with old Policy-Kit was in May 2013:
commit <442eb2ba> and build with -Wunused-label was broken
since April 2016: commit <8437130>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Probably due to copy-paste error we're storing asset tag into
def->sku which we even use in the next step to store SKU number
and thus the asset tag leaks.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
This time around it's not enough to just pick the latest commit,
because with aed87bb2aa6ed83b49574eb982e3bdd4c36acf17 keycodemapdb
renamed the 'rfb' keycode to 'qnum' and we need to accept the new
name while maintaining backwards compatibility.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
The function used the 'cleanup' label only in error cases. This patch
makes the code pass the cleanup label in every case and removes few
unnecessary VIR_FREEs.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
When commit 3545cbef moved the sysfs attribute reading logic from
_udev.c module to virmdev.c, it had to replace our udev read wrappers
with the ones available from virfile.c. The problem is that the original
logic worked correctly with udev read wrappers which don't return an
error code for a missing attribute, virfile.c readers however - not so
much. Therefore add another parameter to the macro, so we can again
accept the fact that optional attributes may be missing.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This flag is only used for tests. Let's instead overload bind syscall
in mocks where it is not done yet.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Range check in virPortAllocatorSetUsed is not useful anymore
when we manage ports for entire unsigned short range values.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Range check in virPortAllocatorSetUsed is not useful anymore
when we manage ports for entire unsigned short range values.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Host tcp4/tcp6 ports is a global resource thus we need to make
port accounting also global or we have issues described in [1] when
port allocator ranges of different instances are overlapped (which
is by default for qemu for example).
Let's have only one global port allocator object that take care
of the entire ports range (0 - 65535) and introduce port range object
for clients to specify desired auto allocation band.
[1] https://www.redhat.com/archives/libvir-list/2017-December/msg00600.html
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Ensure all enum cases are listed in switch statements.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
To ensure we have standardized error messages when reporting problems
with enum values being out of a range, add virReportEnumRangeError().
virReportEnumRangeError(virDomainState, 34);
results in a message
"internal error: Unexpected enum value 34 for virDomainState"
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This function returns nothing but zero. Therefore it makes no
sense to have it returning an integer.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Commit 7e62c4cd26 (first appearing in libvirt-3.9.0 as a resolution
to rhbz #1343919) added a "generated" attribute to virMacAddr that was
set whenever a mac address was auto-generated by libvirt. This
knowledge was used in a single place - when trying to match a NetDef
from the Domain to Delete with user-provided XML. Since the XML parser
always auto-generates a MAC address for NetDefs when none is provided,
it was previously impossible to make a search where the MAC address
isn't significant, but the addition of the "generated" attribute made
it possible for the search function to ignore auto-generated MACs.
This implementation had a problem though - it was adding a field to a
"low level" struct - virMacAddr - which is used in other places with
the assumption that it contains exactly a 6 byte MAC address and
nothing else. In particular, virNWFilterSnoopEthHdr uses virMacAddr as
part of the definition of an ethernet packet header, whose layout must
of course match an actual ethernet packet. Adding the extra bools into
virNWFilterSnoopEthHdr caused the nwfilter driver's "IP discovery via
DHCP packet snooping" functionality to mysteriously stop working.
In order to fix that behavior, and prevent potential future similar
odd behavior, this patch moves the "generated" member out of
virMacAddr (so that it is again really is just a MAC address) into
virDomainNetDef, and sets it only when virDomainNetGenerateMAC() is
called from virDomainNetDefParseXML() (which is the only time we care
about it).
Resolves: https://bugzilla.redhat.com/1529338
(It should also be applied to any maintenance branch that applies
commit 7e62c4cd26 and friends to resolve
https://bugzilla.redhat.com/1343919)
Signed-off-by: Laine Stump <laine@laine.org>
This type of information defines attributes of a system
chassis, such as SMBIOS Chassis Asset Tag.
access inside VM (for example)
Linux: /sys/class/dmi/id/chassis_asset_tag.
Windows: (Get-WmiObject Win32_SystemEnclosure).SMBIOSAssetTag
wirhin Windows PowerShell.
As an example, add the following to the guest XML
<chassis>
<entry name='manufacturer'>Dell Inc.</entry>
<entry name='version'>2.12</entry>
<entry name='serial'>65X0XF2</entry>
<entry name='asset'>40000101</entry>
<entry name='sku'>Type3Sku1</entry>
</chassis>
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We can't really detect all the authentication data in a sane manner for
disk backing chains. Since the old RBD parser parses it in some cases as
the argv->XML convertor requires it, we can't just drop it.
Instead clear any detected authentication data in the code paths related
to disk backing chain lookup and fix the tests to cope with the change.
https://bugzilla.redhat.com/show_bug.cgi?id=1544659
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The documentation for the JSON/qapi type 'UnixSocketAddress' states that
the unix socket path field is named 'path'. Unfortunately qemu uses
'socket' in case of the gluster driver (despite documented otherwise).
Add logic which will format the correct fields while keeping support of
the old spelling.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544325
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
The fix for CVE-2018-6764 introduced a potential deadlock scenario
that gets triggered by the NSS module when virGetHostname() calls
getaddrinfo to resolve the hostname:
#0 0x00007f6e714b57e7 in futex_wait
#1 futex_wait_simple
#2 __pthread_once_slow
#3 0x00007f6e71d16e7d in virOnce
#4 0x00007f6e71d0997c in virLogInitialize
#5 0x00007f6e71d0a09a in virLogVMessage
#6 0x00007f6e71d09ffd in virLogMessage
#7 0x00007f6e71d0db22 in virObjectNew
#8 0x00007f6e71d0dbf1 in virObjectLockableNew
#9 0x00007f6e71d0d3e5 in virMacMapNew
#10 0x00007f6e71cdc50a in findLease
#11 0x00007f6e71cdcc56 in _nss_libvirt_gethostbyname4_r
#12 0x00007f6e724631fc in gaih_inet
#13 0x00007f6e72464697 in __GI_getaddrinfo
#14 0x00007f6e71d19e81 in virGetHostnameImpl
#15 0x00007f6e71d1a057 in virGetHostnameQuiet
#16 0x00007f6e71d09936 in virLogOnceInit
#17 0x00007f6e71d09952 in virLogOnce
#18 0x00007f6e714b5829 in __pthread_once_slow
#19 0x00007f6e71d16e7d in virOnce
#20 0x00007f6e71d0997c in virLogInitialize
#21 0x00007f6e71d0a09a in virLogVMessage
#22 0x00007f6e71d09ffd in virLogMessage
#23 0x00007f6e71d0db22 in virObjectNew
#24 0x00007f6e71d0dbf1 in virObjectLockableNew
#25 0x00007f6e71d0d3e5 in virMacMapNew
#26 0x00007f6e71cdc50a in findLease
#27 0x00007f6e71cdc839 in _nss_libvirt_gethostbyname3_r
#28 0x00007f6e71cdc724 in _nss_libvirt_gethostbyname2_r
#29 0x00007f6e7248f72f in __gethostbyname2_r
#30 0x00007f6e7248f494 in gethostbyname2
#31 0x000056348c30c36d in hosts_keys
#32 0x000056348c30b7d2 in main
Fortunately the extra stuff virGetHostname does is totally irrelevant to
the needs of the logging code, so we can just inline a call to the
native hostname() syscall directly.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Some of function comments don't have the right named parameters
and others are not consistent with the description alignment.
This patch fixes this.
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
The QEMU driver loadable module needs to be able to resolve all ELF
symbols it references against libvirt.so. Some of its symbols can only
be resolved against the storage_driver.so loadable module which creates
a hard dependancy between them. By moving the storage file backend
framework into the util directory, this gets included directly in the
libvirt.so library. The actual backend implementations are still done as
loadable modules, so this doesn't re-add deps on gluster libraries.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
At later point it might not be possible or even safe to use getaddrinfo(). It
can in turn result in a load of NSS module.
Notably, on a LXC container startup we may find ourselves with the guest
filesystem already having replaced the host one. Loading a NSS module
from the guest tree would allow a malicous guest to escape the
confinement of its container environment because libvirt will not yet
have locked it down.
Note the fact that the unused portion of the last element in the bitmap
needs to be cleared, since we use functions which process only full-size
elements and don't really deal with individual bits.
The function only reduces the size of the bitmap thus we can use the
appropriate shrinking function which also does not have any return
value.
Since virBitmapShrink now does not return any value callers need to be
fixed as well.
The virBitmap code uses VIR_RESIZE_N to do quadratic scaling, which
means that along with the number of requested map elements we also need
to keep the number of actually allocated elements for the scaling
algorithm to work properly.
The shrinking code did not fix 'map_alloc' thus virResizeN might
actually not expand the bitmap properly after called on a previously
shrunk bitmap.
'max_bit' is misleading as the value is set to the first invalid bit
as it's used as the number of bits in the bitmap. Rename it to a more
descriptive name.
Just in case someone re-mounted /sys/fs/resctrl with different mount
options (cdp), add a check here.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Some of the other functions depend on the fact that unused bits and longs are
always zero and it's less error-prone to clear it than fix the other functions.
It's enough to zero out one piece of the map since we're calling realloc() to
get rid of the rest (and updating map_len).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540817
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Add new error code to be able to allow consumers (such as Nova) to be
able to key of a specific error code rather than needing to search the
error message."
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Some platforms/toolchains will complain about casting
sockaddr_storage to sockaddr_un because it breaks strict
aliasing rule
../../src/util/virutil.c: In function 'virGetUNIXSocketPath':
../../src/util/virutil.c:2005: error: dereferencing pointer 'un' does break strict-aliasing rules [-Wstrict-aliasing]
Change the code to use a union, in the same way that the
virsocketaddr.h header does.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When starting an LXC container, the /dev entries are created
under temp root (/var/run/libvirt/lxc/$name.dev), relabelled and
then the root is pivoted. However, when it comes to USB devices
which keep path to the device in the structure we need a way to
override the default /dev/usb/... path because we want to work
with the one under temp root. That's what @vroot argument is for
in virUSBDeviceNew. However, what is being passed there is:
vroot = /var/run/libvirt/lxc/lxc_0.dev/bus/usb
Therefore, constructed path is wrong:
dev->path = //var/run/libvirt/lxc/lxc_0.dev/bus/usb//dev/bus/usb/002/002
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
When receiving multiple socket FDs from systemd, it is critical to know
what socket address each corresponds to so we can setup the right
protocols on each.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Due to confusing naming the pointer to the mask got copied which must not
happen, so use UpdateMask instead of SetMask. That also means we can get
completely rid of SetMask.
Also don't clear the free bits since it is not used again (leftover from
previous versions).
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Introduce virResctrlAllocCopyMasks() and use that to initially copy the default
group schemata to the allocation before reserving any parts of the cache. The
reason for this is that when new group is created the schemata will have unknown
data in it. If there was previously group with the same CLoS ID, it will have
the previous valies, if not it will have all bits set. And we need to set all
unspecified (in the XML) allocations to the same one as the default group.
Some non-Linux functions now need to be made public due to this change.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1289368
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
While the QEMU QAPI schema describes 'lun' as a number, the code dealing
with JSON strings does not strictly adhere to this schema and thus
formats the number back as a string. Use the new helper to retrieve both
possibilities.
Note that the formatting code is okay and qemu will accept it as an int.
Tweak also one of the test strings to verify that both formats work
with libvirt.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540290
The helper is useful in cases when the JSON we have to parse may contain
one of the two due to historical reasons and the number value itself
would be stored as a string.
We are skipping non-directories under /sys/fs/resctrl/(info/) since those are not
interesting for us. However in tests it can sometimes happen that ent->d_type
is 0 instead of 4 (DT_DIR) for directories.
I've seen it fail on two machines. Different machines, different systems, I
cannot reproduce it even using the same setup. So one of the ways how to work
around this is call stat() on it. The other one is not checking if it is a
directory since we'll find out eventually when we want to read some files
underneath it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This wil be used in the future, but it makes sense for now as well. It makes
sure there is no mask leftover that would leak.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Pointed out during review on one or two places, but it actually appears in lot
more places. So let's be consistent.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
When working on the CAT series one of the changes was that the pointer got
allocated in another part of the code, even when resctrl was not available on
the host system. However this one particular place neglected that so it needs
to be fixed in order to get the proper error message when requesting
<cachetune/> on HW with no support for it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commits f83c7c88 and 6eb1f2b9 broke the build on FreeBSD and OSX because
of symbols being undefined for those platforms.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This is a replacement for the existing udevPCIGetMdevTypesCap which is
static to the udev backend. This simple helper constructs the sysfs path
from the device's base path for each mdev type and queries the
corresponding attributes of that type.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This should serve as a replacement for the existing udevFillMdevType
which is responsible for fetching the device type's attributes from the
sysfs interface. The problem with the existing solution is that it's
tied to the udev backend.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This is later going to replace the existing virNodeDevCapMdevType, since:
1) it's going to couple related stuff in a single module
2) util is supposed to contain helpers that are widely accessible across
the whole repository.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
With this commit we finally have a way to read and manipulate basic resctrl
settings. Locking is done only on exposed functions that read/write from/to
resctrlfs. Not in functions that are exposed in virresctrlpriv.h as those are
only supposed to be used from tests.
More information about how resctrl works:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/intel_rdt_ui.txt
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
This will make the current functions obsolete and it will provide more
information to the virresctrl module so that it can be used later.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The OEM strings table in SMBIOS allows the vendor to pass arbitrary
strings into the guest OS. This can be used as a way to pass data to an
application like cloud-init, or potentially as an alternative to the
kernel command line for OS installers where you can't modify the install
ISO image to change the kernel args.
As an example, consider if cloud-init and anaconda supported OEM strings
you could use something like
<oemStrings>
<entry>cloud-init:ds=nocloud-net;s=http://10.10.0.1:8000/</entry>
<entry>anaconda:method=http://dl.fedoraproject.org/pub/fedora/linux/releases/25/x86_64/os</entry>
</oemStrings>
use of a application specific prefix as illustrated above is
recommended, but not mandated, so that an app can reliably identify
which of the many OEM strings are targetted at it.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Commit 8708ca01c added virNetDevSwitchdevFeature() to check if a network
device has Switchdev capabilities. virNetDevSwitchdevFeature() attempts
to retrieve the PCI device associated with the network device, ignoring
non-PCI devices. It does so via the following call chain
virNetDevSwitchdevFeature()->virNetDevGetPCIDevice()->
virPCIGetDeviceAddressFromSysfsLink()
For non-PCI network devices (qeth, Xen vif, etc),
virPCIGetDeviceAddressFromSysfsLink() will report an error when
virPCIDeviceAddressParse() fails. virPCIDeviceAddressParse() also
logs an error. After commit 8708ca01c there are now two errors reported
for each non-PCI network device even though the errors are harmless.
To avoid the errors, introduce virNetDevIsPCIDevice() and use it in
virNetDevGetPCIDevice() before attempting to retrieve the associated
PCI device. virNetDevIsPCIDevice() uses the 'subsystem' property of the
device to determine if it is PCI. See the sysfs rules in kernel
documentation for more details
https://www.kernel.org/doc/html/latest/admin-guide/sysfs-rules.html
Let's also parse the available processor frequency information on S390
so that it can be utilized by virsh sysinfo:
# virsh sysinfo
<sysinfo type='smbios'>
...
<processor>
<entry name='family'>2964</entry>
<entry name='manufacturer'>IBM/S390</entry>
<entry name='version'>00</entry>
<entry name='max_speed'>5000</entry>
<entry name='serial_number'>145F07</entry>
</processor>
...
</sysinfo>
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Since kernel version 4.7, processor frequency information is available
on S390. Let's adjust the parser so this information shows up for virsh
nodeinfo:
# virsh nodeinfo
CPU model: s390x
CPU(s): 8
CPU frequency: 5000 MHz
CPU socket(s): 1
Core(s) per socket: 8
Thread(s) per core: 1
NUMA cell(s): 1
Memory size: 16273908 KiB
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Some ARM platforms, such as the original Raspberry Pi, report the
CPU frequency in the BogoMIPS field of /proc/cpuinfo, so libvirt
parsed that field and returned it through its API.
However, not only many more boards don't report any value there,
but several - including ARMv8-based server hardware, and even the
more recent Raspberry Pi 3 - use this field as originally intended:
to report the BogoMIPS value instead of the CPU frequency.
Since we have no way of detecting how the field is being used,
it's better to report no information at all rather than something
ludicrous like "your shiny 96-core aarch64 virtualization host's
CPUs are running at a whopping 100 MHz".
Partially-resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1206353
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Make the parser both more strict, by not ignoring errors reported
by virStrToLong_ui(), and more permissive, by not failing due to
unrelated fields which just happen to have a know prefix and
accepting any amount of whitespace before the numeric value.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Instead of a generic "your architecture", print the actual
architecture name.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
All different architectures use the same copy-pasted code to parse
processor frequency information from /proc/cpuinfo. Let's extract that
code into a function to avoid repetition.
We now also tolerate if the parsing of /proc/cpuinfo is not successful
and just report a warning instead of bailing out and abandoning the rest
of the CPU information.
Reviewed-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
This new API reads host's CPU microcode version from /proc/cpuinfo.
Unfortunately, there is no other way of reading microcode version which
would be usable from both system and session daemon.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
There are a few more description-related issues that commit @9026d115
forgot to address.
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
There's no argument named @result, use @matches instead.
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Signed-off-by: Erik Skultety <eskultet@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1523564
If the vhost-scsi device file cannot be found, the generic error
"error: An error occurred, but the cause is unknown"
is returned. Let's add a real error message to make it clear
why the failure occurred.
We cannot be sure someone initialized the passed *vhostfd and we
certainly don't want or need to be calling VIR_FORCE_CLOSE on what
probably is -1. So let's just return -1 immediately.
Replace the error message during startup of libvirtd with an info
message if audit_level < 2 and audit is not supported by the
kernel. Audit is not supported by the current kernel if the kernel
does not have audit compiled in or if audit is disabled (e.g. by the
kernel cmdline).
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Since commit 5e5019bf, we've no longer use
VIR_ERR_AGENT_UNSYNCED anymore.
Mark it as DEPRECATED.
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
The manual page clearly states that
gettid() is Linux-specific and should not be used in programs
that are intended to be portable.
Unfortunately, it looks like macOS implemented the functionality
and defined SYS_gettid accordingly, only to deprecate syscall()
altogether with 10.12 (Sierra), released last late year.
To avoid compilation errors, call gettid() on Linux only.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Some drive backends allow output of debugging information which can be
configured using properties of the image. Add fields to virStorageSource
which will allow configuring them.
Sometimes the size of the bitmap matters and it might not be guessed correctly
when parsing from some type of input. For example virBitmapNewData() has Byte
granularity, virBitmapNewString() has nibble granularity and so on.
virBitmapParseUnlimited() can be tricked into creating huge bitmap that's not
needed (e.g.: "0-2,^99999999"). This function provides a way to shrink the
bitmap. It is not supposed to free any memory.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Already introduced in the past with 9479642fd3, but then renamed to
virBitmapIntersect by a908e9e45e. This time we'll really use it.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Our bitmaps can be represented as data (raw bytes for which we have
virBitmapNewData() and virBitmapToData()), human representation (list
of numbers in a string for which we have virBitmapParse() and
virBitmapFormat()) and hexadecimal string (for which we have only
virBitmapToString()). So let's add the missing complement for the
last one so that we can parse hexadecimal strings.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Truncate the output so that it is only as big as is needed to fit all
the bits, not all the units from the map. This will be needed in the
future in order to properly format bitmaps for kernel's sysfs files.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
It is literally only a wrapper around virBitmapNewData() and
virBitmapFormat(), only the naming was wrong since it was introduced.
And because we have virBitmap*String functions where the meaning of
the 'String' is constant, this might confuse someone.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
This follows the virBitmapToData() function and, similarly to
virBitmapNewData(), we'll be able to have virBitmapNewString() later
on without name confusion.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
We can't output better memory sizes if we want to be compatible with libvirt
older than the one which introduced /memory/unit, but for new things we can just
output nicer capacity to the user if available. And this function enables that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
libvirt reports a fake NUMA topology in virConnectGetCapabilities
even if built without numactl support. The fake NUMA topology consists
of a single cell representing the host's cpu and memory resources.
Currently this is the case for ARM and s390[x] RPM builds.
A client iterating over NUMA cells obtained via virConnectGetCapabilities
and invoking virNodeGetMemoryStats on them will see an internal failure
"NUMA isn't available on this host" from virNumaGetMaxNode. An example
for such a client is VDSM.
Since the intention seems to be that libvirt always reports at least
a single cell it is necessary to return "fake" node memory statistics
matching the previously reported fake cell in case NUMA isn't supported
on the system.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
There was a recent report of the xen-xl converter not handling
config files missing an ending newline
https://www.redhat.com/archives/libvir-list/2017-October/msg01353.html
Commit 3cc2a9e0 fixed a similar problem when parsing content of a
file but missed parsing in-memory content. But AFAICT, the better
fix is to properly set the end of the content when initializing the
virConfParserCtxt in virConfParse().
This commit reverts the part of 3cc2a9e0 that appends a newline to
files missing it, and fixes setting the end of content when
initializing virConfParserCtxt. A test is also added to check
parsing in-memory content missing an ending newline.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Split on the last colon and avoid parsing port if the split remainder
contains the closing square bracket, so that IPv6 addresses are
interpreted correctly.
Libvirt historically stores storage source path including the volume as
one string in the XML, but that is not really flexible enough when
dealing with the fields in the code. Previously we'd store the slash
separating the two as part of the image name. This was fine for gluster
but it's not necessary and does not scale well when converting other
protocols.
Don't store the slash as part of the path. The resulting change from
absolute to relative path within the gluster driver should be okay,
as the root directory is the default when accessing gluster.
Right-aligning backslashes when defining macros or using complex
commands in Makefiles looks cute, but as soon as any changes is
required to the code you end up with either distractingly broken
alignment or unnecessarily big diffs where most of the changes
are just pushing all backslashes a few characters to one side.
Generated using
$ git grep -El '[[:blank:]][[:blank:]]\\$' | \
grep -E '*\.([chx]|am|mk)$$' | \
while read f; do \
sed -Ei 's/[[:blank:]]*[[:blank:]]\\$/ \\/g' "$f"; \
done
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
One of the usecases of iohelper is to read from pipe and write
to file with O_DIRECT. As we read from pipe we can have partial
read and then we fail to write this data because output file
is open with O_DIRECT and buffer size is not aligned.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
The function virEventRegisterImpl() checks the attempt to replace the
registered events. But there is a duplicate variable inside the IF statement.
The variable 'removeHandleImpl' was wrongly repeated. One of them needs to be
replaced by 'removeTimeoutImpl'.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Since the virStorageEncryptionPtr encryption; is a member of
_virStorageSource it really should be allowed to be a subelement
of the disk <source> for various disk formats:
Source{File|Dir|Block|Volume}
SourceProtocol{RBD|ISCSI|NBD|Gluster|Simple|HTTP}
NB: Simple includes sheepdog, ftp, ftps, tftp
That way we can set up to allow the <encryption> element to be
formatted within the disk source, but we still need to be wary
from whence the element was read - see keep track and when it
comes to format the data, ensure it's written in the correct place.
Modify the qemuxml2argvtest to add a parse failure when there is an
<encryption> as a child of <disk> *and* an <encryption> as a child
of <source>.
The virschematest will read the new test files and validate from a
RNG viewpoint things are fine.
Since the virStorageAuthDefPtr auth; is a member of _virStorageSource
it really should be allowed to be a subelement of the disk <source>
for the RBD and iSCSI prototcols. That way we can set up to allow
the <auth> element to be formatted within the disk source.
Since we've allowed the <auth> to be a child of <disk>, we'll need
to keep track of how it was read so that when writing out we'll know
whether to format as child of <disk> or <source>. For the argv2xml
parsing, let's format under <source> as a preference. Do not allow
<auth> to be both a child of <disk> and <source>.
Modify the qemuxml2argvtest to add a parse failure when there is an
<auth> as a child of <disk> *and* an <auth> as a child of <source>.
Add tests to validate that if the <auth> was found in <source>, then
the resulting xml2xml and xml2arg works just fine. The two new .args
file are exact copies of the non "-source" version of the file.
The virschematest will read the new test files and validate from a
RNG viewpoint things are fine
Update the virstoragefile, virstoragetest, and args2xml file to show
the "preference" to place <auth> as a child of <source>.
Introduce the bare necessities to add privateData to _virStorageSource.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Since we have a number of places where we workaround timing issues with
devices, attributes (files in general) not being available at the time
of processing them by calling usleep in a loop for a fixed number of
tries, we could as well have a utility function that would do that.
Therefore we won't have to duplicate this ugly workaround even more.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
There were a bunch of commentary blocks that were literally useless in
terms of describing what the code following them does, since most of
them were documenting "the obvious" or it just wouldn't help at all.
Signed-off-by: Erik Skultety <eskultet@redhat.com>
So we have a syntax-check rule to catch all tab indents but it naturally
can't catch tab spacing, i.e. as a delimiter. This patch is a result of
running 'vim -en +retab +wq'
(using tabstop=8 softtabstop=4 shiftwidth=4 expandtab) on each file from
a list generated by the following:
find . -regextype gnu-awk \
-regex ".*\.(rng|syms|html|s?[ch]|py|pl|php(\.code)?)(\.in)?" \
| xargs git grep -lP "\t"
Signed-off-by: Erik Skultety <eskultet@redhat.com>
When formatting an inactive or migratable XML we will need to suppress
backing chain members which were detected from the disk to keep
semantics straight. This means we need to record, whether a
virStorageSource originates from autodetection.
Express a properly terminated backing chain by putting a
virStorageSource of type VIR_STORAGE_TYPE_NONE in the chain. The newly
used helpers simplify this greatly.
The change fixes a bug as formatting an incomplete backing chain and
parsing it back would end up in expressing a terminated chain since
src->backingStoreRaw was not populated. By relying on the terminator
object this can be now processed appropriately.
Add helpers that will simplify checking if a backing file is valid or
whether it has backing store. The helper virStorageSourceIsBacking
returns true if the given virStorageSource is a valid backing store
member. virStorageSourceHasBacking returns true if the virStorageSource
has a backing store child.
Adding these functions creates a central points for further refactors.
The backing store indexes were not bound to the storage sources in any
way. To allow us to bind a given alias to a given storage source we need
to save the index in virStorageSource. The backing store ids are now
generated when detecting the backing chain.
Since we don't re-detect the backing chain after snapshots, the
numbering needs to be fixed there.
The code was vulnerable to SQL injection. Likely not a security issue due to
WMI SQL and other constraints but still lame. For example:
virsh # dominfo \"
error: failed to get domain '"'
error: internal error: SOAP fault during enumeration: code 's:Sender', subcode
'n:CannotProcessFilter', reason 'The data source could not process the filter.
The filter might be missing or it might be invalid. Change the filter and try
the request again. ', detail 'The WS-Management service cannot process the
request. The WQL query is invalid. '
This commit fixes the Hyper-V driver by escaping all WMI SQL string parameters.
The same command with the fix:
virsh # dominfo \"
error: failed to get domain '"'
error: Domain not found: No domain with name "
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
The API makes a deep copy of a NULL-terminated string list.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Commit id '8708ca01c' added a check to determine whether the NIC had
Switchdev capabilities; however, in doing so inadvertently would cause
network devices without a PCI device to not be added to the node device
database. Thus, network devices having a "computer" as a parent, such
as "net_lo*", "net_virbr*", "net_tun*", "net_vnet*", etc. were not added.
Alter the check to not even check for Switchdev bits if no PCI device found.
This commit fixes the deadlock introduced by commit
0980764dee. The call getgrouplist() of
the glibc library isn't safe to be called in between fork and
exec (see commit 75c125641a).
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Fixes: 0980764dee ("util: share code between virExec and virCommandExec")
Reviewed-by: Bjoern Walk <bwalk@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
These functions are used by an upcoming commit.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Similarly to previous patch, for some types of interface domain
and host are on the same side of RX/TX barrier. In that case, we
need to set up the QoS differently. Well, swapped.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1497410
The comment in virNetDevTapInterfaceStats() implementation for
Linux states that packets transmitted by domain are received by
the host and vice versa. Well, this is true but not for all types
of interfaces. For instance, for macvtaps when TAP device is
hooked right onto a physical device any packet that domain sends
looks also like a packet sent to the host. Therefore, we should
allow caller to chose if the stats returned should be straight
copy or swapped.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: John Ferlan <jferlan@redhat.com>
It will come handy to know if the MAC address was generated (e.g.
during XML parse) or if it was parsed since provided by user in
the XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Introduce a function to setup any TLS needs for a disk source.
If there's a configuration or other error setting up the disk source
for TLS, then cause the domain startup to fail.
For VxHS, follow the chardevTLS model where if the src->haveTLS hasn't
been configured, then take the system/global cfg->haveTLS setting for
the storage source *and* mark that we've done so via the tlsFromConfig
setting in storage source.
Next, if we are using TLS, then generate an alias into a virStorageSource
'tlsAlias' field that will be used to create the TLS object and added to
the disk object in order to link the two together for QEMU.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Add an optional virTristateBool haveTLS to virStorageSource to
manage whether a storage source will be using TLS.
Sample XML for a VxHS disk:
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source protocol='vxhs' name='eb90327c-8302-4725-9e1b-4e85ed4dc251' tls='yes'>
<host name='192.168.0.1' port='9999'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
Additionally add a tlsFromConfig boolean to control whether the TLS
setting was due to domain configuration or qemu.conf global setting
in order to decide whether to Format the haveTLS setting for either
a live or saved domain configuration file.
Update the qemuxml2xmltest in order to add a test to show the proper
parsing.
Also update the docs to describe the tls attribute.
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
Calling fallocate on the new (smaller) capacity ensures
that the whole file is allocated, but it does not reduce
the file size.
Also call ftruncate after fallocate.
https://bugzilla.redhat.com/show_bug.cgi?id=1366446
We have been trying to implement the ALLOCATE flag to mean
"the volume should be fully allocated after the resize".
Since commit b0579ed9 we do not allocate from the existing
capacity, but from the existing allocation value.
However this value is a total of all the allocated bytes,
not an offset.
For a sparsely allocated file:
$ perl -e 'print "x"x8192;' > vol1
$ fallocate -p -o 0 -l 4096 vol1
$ virsh vol-info vol1 default
Capacity: 8.00 KiB
Allocation: 4.00 KiB
Treating allocation as an offset would result in an incompletely
allocated file:
$ virsh vol-resize vol1 --pool default 16384 --allocate
Capacity: 16.00 KiB
Allocation: 12.00 KiB
Call fallocate from zero on the whole requested capacity to fully
allocate the file. After that, the volume is fully allocated
after the resize:
$ virsh vol-resize vol1 --pool default 16384 --allocate
$ virsh vol-info vol1 default
Capacity: 16.00 KiB
Allocation: 16.00 KiB
Introduce a new function virFileAllocate that will call the
non-destructive variants of safezero, essentially reverting
my commit 1390c268
safezero: fall back to writing zeroes even when resizing
back to the state as of commit 18f0316
virstoragefile: Have virStorageFileResize use safezero
This means that _ALLOCATE flag will no longer work on platforms
without the allocate syscalls, but it will not overwrite data
either.
Seeing a log message saying 'flags=93' is ambiguous & confusing unless
you happen to know that libvirt always prints flags as hex. Change our
debug messages so that they always add a '0x' prefix when printing flags,
and '0' prefix when printing mode. A few other misc places gain a '0x'
prefix in error messages too.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
After commit 8708ca01c0 libvirtd consistently aborts with "stack
smashing detected" when nodedev driver is initialized.
This is caused by nlmsg_parse() being told that its array of nlattr*
has CTRL_CMD_MAX (10) entries, when in fact it is declared to have
CTRL_ATTR_MAX (8) entries. Since all the entries are initialized to
NULL, the result is that nlmsg_parse is overwriting 2*(sizof(nlattr*))
bytes outside the array.
Signed-off-by: Laine Stump <laine@laine.org>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Commit id '5604c056' used the wrong API to generate the
<secret type='%s'..." field. The previous code used the
correct API as was done in commit id '6887af39'. The data
is actually a usage type not an auth type even though the
result is the same.
The iohelper currently calls saferead() to get data from the
underlying file. This has a problem with O_DIRECT when hitting
end-of-file. saferead() is asked to read 1MB, but the first
read() it does may return only a few KB, so it'll try another
read() to fill the remaining buffer. Unfortunately the buffer
pointer passed into this 2nd read() is likely not aligned
to the extent that O_DIRECT requires, so rather than seeing
'0' for end-of-file, we'll get -1 + EINVAL due to misaligned
buffer.
The way the iohelper is currently written, it already handles
getting short reads, so there is actually no need to use
saferead() at all. We can simply call read() directly. The
benefit of this is that we can now write() the data immediately
so when we go into the subsequent reads() we'll always have a
correctly aligned buffer.
Technically the file position ought to be aligned for O_DIRECT
too, but this does not appear to matter when at end-of-file.
Tested-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add the backing parse and a test case to verify parsing of VxHS
backing storage.
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
Add a new virStorageNetProtocol for Veritas HyperScale (VxHS) disks
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
The mlx4 (Mellanox) netdev driver implements the sysfs phys_port_id
file for both VFs and PFs, so you can find the VF netdev plugged into
the same physical port as any given PF netdev by comparing the
contents of phys_port_id of the respective netdevs. That's what
libvirt does when attempting to find the PF netdev for a given VF
netdev (or vice versa).
Most other netdev's drivers don't implement phys_port_id, so the file
is visible in sysfs directory listing, but attempts to read it result
in ENOTSUPP. In these cases, libvirt is unable to read phys_port_id of
either the PF or the VF, so it just returns the first entry in the
PF/VF's list of netdevs.
But we've found that the i40e driver is in between those two
situations - it implements phys_port_id for PF netdevs, but doesn't
implement it for VF netdevs. So libvirt would successfully read the
phys_port_id of the PF netdev, then try to find a VF netdev with
matching phys_port_id, but would fail because phys_port_id is NULL for
all VFs. This would result in a message like the following:
Could not find network device with phys_port_id '3cfdfe9edc39'
under PCI device at /sys/class/net/ens4f1/device/virtfn0
To solve this problem in a way that won't break functionality for
anyone else, this patch saves the first netdev name we find for the
device, and returns that if we fail to find a netdev with the desired
phys_port_id.
Instead of checking for all possible constants that every
kernel header with devlink support should have (and defining
HAVE_DECL_DEVLINK as 1 if any of them is present due to the
way AC_CHECK_DECLS works), only check for DEVLINK_CMD_ESWITCH_GET.
This is the name of the constant since kernel 4.11. Between 4.8
and 4.11, the now deprecated spelling DEVLINK_CMD_ESWITCH_MODE_GET
was used.
Assume DEVLINK_ESWITCH_MODE_SWITCHDEV is available, since it was
introduced along with the deprecated spelling.
Adding functionality to libvirt that will allow querying the interface
for the availability of switchdev Offloading NIC capabilities.
The switchdev mode was introduced in kernel 4.8, the iproute2-devlink
command to retrieve the switchdev NIC feature with command example:
devlink dev eswitch show pci/0000:03:00.0
This feature is needed for Openstack so we can do a scheduling decision
if the NIC is in Hardware Offload (switchdev) or regular SR-IOV (legacy) mode.
And select the appropriate hypervisors with the requested capability see [1].
[1] - https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/enable-sriov-nic-features.html
Reviewed-by: Laine Stump <laine@laine.org>
Reviewed-by: John Ferlan <jferlan@redhat.com>
Some cleanup paths overwrite a usefull error message with a less useful
one and we then try to preserve the original message. The handlers added
in this patch will simplify the operations since they are designed right
for the purpose.
TPM 2 does not implement sysfs files for cancellation of commands.
We therefore use /dev/null for the cancel path passed to QEMU.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Tested-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
Although not previously explicitly documented, the expectation for
the libvirt event loop is that an implementation is registered early
in application startup, before calling any libvirt APIs and then
run forever after. Replacing a previously registered event loop is
not safe & subject to races even if virConnectClose has been called
on open handles, due to delayed deregistration of callbacks during
conenction close.
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
We can now check for the error and not care about the return value as
it will be properly handled in virBufferContentAndReset() anyway.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
The function is useful even without using the return value. And if
needed, the return value can be obtained by other calls as well. The
potential for clean-up can be seen in the following patch.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit 94c465d0 refactored the logging setup phase but introduced an
issue, where the daemon ignores verbose mode when there are no outputs
defined and the default must be used. The problem is that the default
output was determined too early, thus ignoring the potential '--verbose'
option taking effect. This patch postpones the creation of the default
output to the very last moment when nothing else can change. Since the
default output is only created during the init phase, it's safe to leave
the pointer as NULL for a while, but it will be set eventually, thus not
affecting runtime.
Patch also adjusts both the other daemons.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1442947
Signed-off-by: Erik Skultety <eskultet@redhat.com>
This helper allows you to better structurize the code if some element
may or may not contains attributes and/or child elements.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The helper returns true if a string contains any of the given chars.
virStringHasControlChars can be reimplemented using that helper.
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
It's equivalent of calling virXPathString("string(.)", ctxt) but it
doesn't have to use the XPath resolving and parsing.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
The virXMLPropStringLimit is an equivalent of virXPathStringLimit
which should be preferred if you already have a XML dom node or
if you need to parse more than one property.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Move networkMacMgrFileName into src/util/virmacmap.c and rename to
virMacMapFileName. We're about to move some more MacMgr processing
files into virnetworkobj and it doesn't make sense to have this helper
in the driver or in virnetworkobj.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than assuming that what's passed to virObject{Ref|Unref}
would be a virObjectPtr as long as it's not NULL, let's do the
similar checks virObjectIsClass in order to prevent a possible
increment or decrement to some field at the obj->u.s.refs offset.
Signed-off-by: John Ferlan <jferlan@redhat.com>
The virObjectIsClass API has only ever checked object validity
based on if the @obj is not NULL and it was derived from some class.
While this has worked well in general, there is one additional
check that could be made prior to calling virClassIsDerivedFrom
which loops through the classes checking the magic number against
the klass expected magic number.
If by chance a non virObject is passed, rather than assuming the
void * @obj is a _virObject and thus offsetting to obj->klass,
obj->magic, and obj->parent, let's check that the void * @obj
has at least the "base part" of the magic number in the right
place and generate a more specific VIR_WARN message if not.
There are many consumers to virObjectIsClass, include the locking
primitives virObject{Lock|Unlock}, virObjectRWLock{Read|Write},
and virObjectRWUnlock. For those callers, the locking call will
not fail, but it also will not attempt a virMutex* call which
will "most likely" fail since the &obj->lock is used.
In order to avoid some possible future wrap on the 0xCAFExxxx
value, add a check during initialization that some new class
won't cause the wrap. Should be good for a few years at least!
It is still left up to the caller to handle the failed API calls
just as it would be if it passed a NULL opaque pointer anyobj.
If virObjectIsClass fails "internally" to virobject.c, create a
macro to generate the VIR_WARN describing what the problem is.
Also improve the checks and message a bit to indicate which was
the failure - whether the obj was NULL or just not the right class
Signed-off-by: John Ferlan <jferlan@redhat.com>
Rather than overload virObjectUnlock as commit id '77f4593b' has
done, create a separate virObjectRWUnlock API that will force the
consumers to make the proper decision regarding unlocking the
RWLock's. Similar to the RWLockRead and RWLockWrite, use the
virObjectGetRWLockableObj helper. This restores the virObjectUnlock
code to using the virObjectGetLockableObj.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Introduce a helper to handle the error path more cleanly. The same
as virObjectGetLockableObj in order to essentially follow the original
logic of commit 'b545f65d' to ensure that the input argument at least
has some validity before using.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Now that virObjectRWLockWrite exists to handle the virObjectRWLockable
objects, let's restore virObjectLock to only handle virObjectLockable
class locks. There still exists the possibility that the input @anyobj
isn't a valid object and the resource isn't truly locked, but that
also exists before commit id '77f4593b'.
This also restores some logic that commit id '77f4593b' removed
with respect to a common code path that commit id '10c2bb2b' had
introduced as virObjectGetLockableObj. This code path merely does
the same checks as the original virObjectLock commit 'b545f65d',
but in callable/reusable helper to ensure the @obj at least has
some validity before using.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Instead of making virObjectLock be the entry point for two
different types of locks, let's create a virObjectRWLockWrite API
which will only handle the virObjectRWLockableClass objects.
Use the new virObjectRWLockWrite for the virdomainobjlist code
in order to handle the Add, Remove, Rename, and Load operations
that need to be very synchronous.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Since the class it represents is based on virObjectRWLockableClass
and in order to make sure we differentiate just in case anyone somehow
believes they could use virObjectLockRead for a virObjectLockableClass,
let's rename the API to use the RW in the name. Besides the RW locks
refer to pthread_rwlock_{init|rdlock|wrlock|unlock|destroy} while the
other locks refer to pthread_mutex_{init|lock|unlock|destroy}.
Signed-off-by: John Ferlan <jferlan@redhat.com>
This way later patches can add another structures with virResctrl
prefix without the meaning being even more confusing than it needs to
be.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
That means that returning negative values means error and non-negative
values differ in meaning, but are all successful.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
It doesn't access anything from conf/ and ti will be needed to use
from other util/ places. This split makes the separation clearer.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Commit 81fb440b further qualified an if statement by adding the
boolean saveVlan to the condition. Coverity pointed out that this
change in the logic eliminated the need to check saveVlan in an
argument to virAsprintf().
Commit 9a94af6d restructured virHostdevReadNetConfig() so that it
would manually set ret = 0 after successfully reading the device's
config, but Coverity pointed out that "ret = 0" was erroneously placed
outside of an "else" clause, meaning that the the value of ret set in
the "if" clause was unnecessarily and incorrectly overwritten.
This patch moves ret = 0 into the else clause, which should silence
Coverity.
When using a VF from an SRIOV-capable network card in a guest (either
in macvtap passthrough mode, or via VFIO PCI device assignment), The
associated PF netdev must be online in order for the VF to be usable
by the guest. The guest, however, is not able to change the state of
the PF. And libvirt *could* set the PF online as needed, but that
could lead to the host receiving unexpected IPv6 traffic (since the
default for an unconfigured interface is to participate in IPv6
autoconf). For this reason, before assigning a VF to a guest, libvirt
verifies that the related PF netdev is online - if it isn't, then we
log an error and don't allow the guest startup to continue.
Until now, this check was done during virNetDevSetNetConfig(). This
works nicely because the same function is called both for macvtap
passthrough and for VFIO device assignment. But in the case of VFIO,
the VF has already been unbound from its netdev driver by the time we
get to virNetDevSetNetConfig(), and in the case of dual port Mellanox
NICs that have their VFs setup in single port mode, the *only* way to
determine the proper PF netdev to query for online status is via the
"phys_port_id" file that is in the VF netdev's sysfs directory. *BUT*
if we've unbound the VF from the netdev driver, then it doesn't *have*
a netdev sysfs directory.
So, in order to check the correct PF netdev for online status, this
patch moved the check earlier in the setup, into
virNetDevSaveNetConfig(), which is called *before* unbinding the VF
from its netdev driver.
(Note that this implies that if you are using VFIO device assignment
for the VFs of a Mellanox NIC that has the VFs programmed in single
port mode, you must let the VFs be bound to their net driver and use
"managed='yes'" in the device definition. To be more specific, this is
only true if the VFs in single port mode are using port *2* of the PF
- if the VFs are using only port 1, then the correct PF netdev will be
arrived at by default/chance))
This resolves: https://bugzilla.redhat.com/267191
virHostdevRestoreNetConfig() calls virNetDevReadNetConfig() to try and
read the "original config" of a netdev, and if that fails, it tries
again with a different directory/netdev name. This achieves the
desired effect (we end up finding the config wherever it may be), but
for each failure, virNetDevReadNetConfig() places a nice error message
in the system logs. Experience has shown that false-positive error
logs like this lead to erroneous bug reports, and can often mislead
those searching for *real* bugs.
This patch changes virNetDevReadNetConfig() to explicitly check if the
file exists before calling virFileReadAll(); if it doesn't exist,
virNetDevReadNetConfig() returns a success, but leaves all the
variables holding the results as NULL. (This makes sense if you define
the purpose of the function as "read a netdev's config from its config
file *if that file exists*).
To take advantage of that change, the caller,
virHostdevRestoreNetConfig() is modified to fail immediately if
virNetDevReadNetConfig() returns an error, and otherwise to try the
different directory/netdev name if adminMAC & vlan & MAC are all NULL
after the preceding attempt.
Mellanox ConnectX-3 dual port SRIOV NICs present a bit of a challenge
when assigning one of their VFs to a guest using VFIO device
assignment.
These NICs have only a single PCI PF device, and that single PF has
two netdevs sharing the single PCI address - one for port 1 and one
for port 2. When a VF is created it can also have 2 netdevs, or it can
be setup in "single port" mode, where the VF has only a single netdev,
and that netdev is connected either to port 1 or to port 2.
When the VF is created in dual port mode, you get/set the MAC
address/vlan tag for the port 1 VF by sending a netlink message to the
PF's port1 netdev, and you get/set the MAC address/vlan tag for the
port 2 VF by sending a netlink message to the PF's port 2 netdev. (Of
course libvirt doesn't have any way to describe MAC/vlan info for 2
ports in a single hostdev interface, so that's a bit of a moot point)
When the VF is created in single port mode, you can *set* the MAC/vlan
info by sending a netlink message to *either* PF netdev - the driver
is smart enough to understand that there's only a single netdev, and
set the MAC/vlan for that netdev. When you want to *get* it, however,
the driver is more accurate - it will return 00:00:00:00:00:00 for the
MAC if you request it from the port 1 PF netdev when the VF was
configured to be single port on port 2, or if you request if from the
port 2 PF netdev when the VF was configured to be single port on port
1.
Based on this information, when *getting* the MAC/vlan info (to save
the original setting prior to assignment), we determine the correct PF
netdev by matching phys_port_id between VF and PF.
(IMPORTANT NOTE: this implies that to do PCI device assignment of the
VFs on dual port Mellanox cards using <interface type='hostdev'>
(i.e. if you want the MAC address/vlan tag to be set), not only must
the VFs be configured in single port mode, but also the VFs *must* be
bound to the host VF net driver, and libvirt must use managed='yes')
By the time libvirt is ready to set the new MAC/vlan tag, the VF has
already been unbound from the host net driver and bound to
vfio-pci. This isn't problematic though because, as stated earlier,
when a VF is created in single port mode, commands to configure it can
be sent to either the port 1 PF netdev or the port 2 PF netdev.
When it is time to restore the original MAC/vlan tag, again the VF
will *not* be bound to a host net driver, so it won't be possible to
learn from sysfs whether to use the port 1 or port 2 PF netdev for the
netlink commands. And again, it doesn't matter which netdev you
use. However, we must keep in mind that we saved the original settings
to a file called "${PF}_${VFNUM}". To solve this problem, we just
check for the existence of ${PF1}_${VFNUM} and ${PF2}_${VFNUM}, and
use whichever one we find (since we know that only one can be there)
This patch updates functions in netdev.c to pay attention to
phys_port_id. It uses the new function virNetDevGetPhysPortID() to
learn the phys_port_id of a VF or PF, then sends that info to
virPCIGetNetName(), which has newly been modified to take an optional
phys_port_id.
A single PCI device may have multiple netdevs associated with it. Each
of those netdevs will have a different phys_port_id entry in
sysfs. This patch modifies virPCIGetNetName() to allow selecting one
of the potential many netdevs in two different ways:
1) by setting the "idx" argument, the caller can select the 1st (0),
2nd (1), etc. netdev from the PCI device's net subdirectory.
2) If the physPortID arg is set (to a null-terminated string) then
virPCIGetNetName() returns the netdev that has that phys_port_id in
the sysfs file of the same name in the netdev's directory.
On Linux each network device *can* (but not necessarily *does*) have
an attribute called phys_port_id which can be read from the file of
that name in the netdev's sysfs directory. The examples I've seen have
been a many-digit hexadecimal number (as an ASCII string).
This value can be useful when a single PCI device is associated with
multiple netdevs (e.g a dual port Mellanox SR-IOV NIC - this card has
a single PCI Physical Function (PF), and that PF has two netdevs
associated with it (the "net" subdirectory of the PF in sysfs has two
links rather than the usual single link to a netdev directory). Each
of the PF netdevs has a different phys_port_id. The Virtual Functions
(VF) are similar - the PF (a PCI device) has "n" VFs (also each of
these is a PCI device), each VF has two netdevs, and each of the VF
netdevs points back to the VF PCI device (with the "device" entry in
its sysfs directory) as well as having a phys_port_id matching the PF
netdev it is associated with.
virNetDevGetPhysPortID() simply attempts to read the phys_port_id for
the given netdev and return it to the caller. If this particular
netdev driver doesn't support phys_port_id, it returns NULL (*not* a
NULL-terminated string, but a NULL pointer) but still counts it as a
success.
We are given a string in @machinename, we never allocate it, just
merely use it for reading. We should not free it otherwise it
leads to double free:
==32191== Thread 17:
==32191== Invalid free() / delete / delete[] / realloc()
==32191== at 0x4C2D1A0: free (vg_replace_malloc.c:530)
==32191== by 0x54BBB84: virFree (viralloc.c:582)
==32191== by 0x2BC04499: qemuProcessStop (qemu_process.c:6313)
==32191== by 0x2BC500FF: processMonitorEOFEvent (qemu_driver.c:4724)
==32191== by 0x2BC502FC: qemuProcessEventHandler (qemu_driver.c:4769)
==32191== by 0x5550640: virThreadPoolWorker (virthreadpool.c:167)
==32191== by 0x554FBCF: virThreadHelper (virthread.c:206)
==32191== by 0x8F913D3: start_thread (in /lib64/libpthread-2.23.so)
==32191== by 0x928DE3C: clone (in /lib64/libc-2.23.so)
==32191== Address 0x31893d70 is 0 bytes inside a block of size 1,100 free'd
==32191== at 0x4C2D1A0: free (vg_replace_malloc.c:530)
==32191== by 0x54BBB84: virFree (viralloc.c:582)
==32191== by 0x54C1936: virCgroupValidateMachineGroup (vircgroup.c:343)
==32191== by 0x54C4B29: virCgroupNewDetectMachine (vircgroup.c:1550)
==32191== by 0x2BBDDA29: qemuConnectCgroup (qemu_cgroup.c:972)
==32191== by 0x2BC05DA7: qemuProcessReconnect (qemu_process.c:6822)
==32191== by 0x554FBCF: virThreadHelper (virthread.c:206)
==32191== by 0x8F913D3: start_thread (in /lib64/libpthread-2.23.so)
==32191== by 0x928DE3C: clone (in /lib64/libc-2.23.so)
==32191== Block was alloc'd at
==32191== at 0x4C2BE80: malloc (vg_replace_malloc.c:298)
==32191== by 0x4C2E35F: realloc (vg_replace_malloc.c:785)
==32191== by 0x54BB492: virReallocN (viralloc.c:245)
==32191== by 0x54BEDF2: virBufferGrow (virbuffer.c:150)
==32191== by 0x54BF3B9: virBufferVasprintf (virbuffer.c:408)
==32191== by 0x54BF324: virBufferAsprintf (virbuffer.c:381)
==32191== by 0x55BB271: virDomainGenerateMachineName (domain_conf.c:27078)
==32191== by 0x2BBD5B8F: qemuDomainGetMachineName (qemu_domain.c:9595)
==32191== by 0x2BBDD9B4: qemuConnectCgroup (qemu_cgroup.c:966)
==32191== by 0x2BC05DA7: qemuProcessReconnect (qemu_process.c:6822)
==32191== by 0x554FBCF: virThreadHelper (virthread.c:206)
==32191== by 0x8F913D3: start_thread (in /lib64/libpthread-2.23.so)
Moreover, make the @machinename 'const char *' to mark it
explicitly that we are not changing the passed string.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>